Crawled web sites
Since 21 July 2006, the system has evolved to capture a growing number of websites systematically and effectively. The storage space needed to keep these data and give access to them has grown accordingly.
For the purposes of these statistics, a “website” (or “web”) is a resource published on the Internet and identified by its own URL. A “capture” (or harvest) is each copy of a website made at a given point in time. A “file” is each of the technical files or archives that make up a website. Other technical data that may interest PADICAT users are also included.
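To make the three units concrete, here is a minimal Python sketch of how they relate. The class names (`Website`, `Capture`, `ArchivedFile`), their fields and the `repository_counts` helper are hypothetical and do not describe PADICAT's actual storage format. The sketch only shows that one website can have many captures, each made of many files, which is why the tables report more harvests than websites.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ArchivedFile:
    """One technical file stored in a capture (hypothetical model)."""
    url: str
    mime_type: str
    size_bytes: int


@dataclass
class Capture:
    """One harvest of a website at a given point in time (hypothetical model)."""
    captured_on: date
    files: list[ArchivedFile] = field(default_factory=list)


@dataclass
class Website:
    """A resource identified by its own seed URL (hypothetical model)."""
    seed_url: str
    captures: list[Capture] = field(default_factory=list)


def repository_counts(sites: list[Website]) -> dict[str, int]:
    """Count websites, captures and files under this assumed model."""
    n_captures = sum(len(site.captures) for site in sites)
    n_files = sum(len(c.files) for site in sites for c in site.captures)
    return {"websites": len(sites), "captures": n_captures, "files": n_files}


# Example: one site captured twice; the second capture holds two files.
site = Website("http://www.example.cat/", [
    Capture(date(2010, 11, 1), [
        ArchivedFile("http://www.example.cat/", "text/html", 5120),
    ]),
    Capture(date(2010, 11, 15), [
        ArchivedFile("http://www.example.cat/", "text/html", 5300),
        ArchivedFile("http://www.example.cat/logo.png", "image/png", 2048),
    ]),
])
print(repository_counts([site]))  # {'websites': 1, 'captures': 2, 'files': 3}
```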
Repository contents:
Indicator | Total |
---|---|
Number of websites | 146,460 |
Number of harvests | 383,871 |
Total space (TB) | 54.72 |
Origin of harvests
The resources deposited in the repository come from four sources: captures of the .cat domain; resources compiled to create monographic collections; websites recommended by PADICAT users; and digital resources of institutions that have signed a cooperation agreement with the Biblioteca de Catalunya.
Origin | Number of websites | Number of harvests |
---|---|---|
Agreements | 723 | 7,288 |
Recommended | 12,142 | 86,509 |
Monographics | 7,184 | 95,228 |
.cat | 53,186 | 132,267 |
Total | 72,241 | 321,292 |
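The per-origin figures can be turned into ratios with a few lines of Python. This minimal sketch copies the numbers from the table above (the variable names are illustrative). It checks that the harvests column adds up to the Total row, then prints, for each origin, the average number of harvests per website and that origin's share of all harvests.

```python
# (websites, harvests) per origin, copied from the table above.
origins = {
    "Agreements": (723, 7_288),
    "Recommended": (12_142, 86_509),
    "Monographics": (7_184, 95_228),
    ".cat": (53_186, 132_267),
}

total_harvests = sum(harvests for _, harvests in origins.values())
assert total_harvests == 321_292  # matches the Total row

for name, (sites, harvests) in origins.items():
    per_site = harvests / sites                 # average harvests per website
    share = 100 * harvests / total_harvests     # % of all harvests
    print(f"{name:<12} {per_site:6.2f} harvests/site  {share:5.2f}% of harvests")
```

On these figures, monographic collections average about 13 harvests per website and the .cat domain about 2.5. The websites column is not summed: its four rows add up to 73,235 rather than the 72,241 in the Total row, and the source does not say whether some websites are counted under more than one origin.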
Distribution of file types in the PADICAT repository
Type | Files | Share |
---|---|---|
text/html | 470,031,467 | 69.71% |
image/jpeg | 94,859,253 | 14.07% |
image/png | 16,853,933 | 2.50% |
image/gif | 13,146,830 | 1.95% |
application/rss+xml | 9,048,245 | 1.34% |
application/pdf | 8,267,872 | 1.23% |
application/atom+xml | 6,382,130 | 0.95% |
text/xml | 6,274,786 | 0.93% |
text/css | 5,721,966 | 0.85% |
application/json | 5,449,566 | 0.81% |
application/javascript | 5,020,227 | 0.74% |
text/dns | 4,922,473 | 0.73% |
text/plain | 4,870,847 | 0.72% |
application/javascript | 3,930,964 | 0.58% |
application/http | 2,214,667 | 0.33% |
text/javascript | 1,809,139 | 0.27% |
application/x-javascript | 1,735,910 | 0.26% |
application/xml | 1,613,883 | 0.24% |
application/opensearchdescription+xml | 1,362,813 | 0.20% |
Others | 10,792,070 | 1.60% |
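The share column can be recomputed from the file counts, and related MIME types can be grouped by their top-level media type (`text`, `image`, `application`). The sketch below is an illustration using the counts from the table; grouping by top-level type is an analysis choice made here, not a PADICAT category. It keeps the rows in a list because `application/javascript` appears on two rows, and a dictionary keyed by MIME type would silently merge them.

```python
from collections import defaultdict

# (MIME type, number of files), copied from the table above.
rows = [
    ("text/html", 470_031_467), ("image/jpeg", 94_859_253),
    ("image/png", 16_853_933), ("image/gif", 13_146_830),
    ("application/rss+xml", 9_048_245), ("application/pdf", 8_267_872),
    ("application/atom+xml", 6_382_130), ("text/xml", 6_274_786),
    ("text/css", 5_721_966), ("application/json", 5_449_566),
    ("application/javascript", 5_020_227), ("text/dns", 4_922_473),
    ("text/plain", 4_870_847), ("application/javascript", 3_930_964),
    ("application/http", 2_214_667), ("text/javascript", 1_809_139),
    ("application/x-javascript", 1_735_910), ("application/xml", 1_613_883),
    ("application/opensearchdescription+xml", 1_362_813),
    ("Others", 10_792_070),
]

total = sum(count for _, count in rows)


def share(count: int) -> float:
    """Percentage of all files in the repository."""
    return 100 * count / total


# Group by top-level media type (the part before "/"); "Others" has none.
by_top_level: dict[str, int] = defaultdict(int)
for mime, count in rows:
    top = mime.split("/", 1)[0] if "/" in mime else "other"
    by_top_level[top] += count

for top, count in sorted(by_top_level.items(), key=lambda kv: -kv[1]):
    print(f"{top:<12} {count:>12,} {share(count):6.2f}%")
```

With these counts the total is 674,309,041 files, and dividing by it reproduces every share in the table to two decimal places (69.71% for `text/html`, 14.07% for `image/jpeg`, and so on). Grouped by top-level type, `text/*` accounts for about 73% of files, `image/*` for about 18.5% and `application/*` for about 6.7%.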
Evolution of PADICAT monographic (topic) collections
PADICAT has created ten monographic collections: Catalan museums; folk-rock music in Catalonia; and the election campaigns for the Parliament of Catalonia (2006, 2010 and 2012), local councils (2007 and 2011), the Spanish Congress and Senate (2008 and 2011), and the European Parliament (2009).
Collection | Number of new websites | Number of harvests | Number of files | Space (GB) |
---|---|---|---|---|
Catalan Elections 2006 | 81 | 775 | 4,953,215 | 175 |
Local Elections 2007 | 531 | 1,747 | 13,641,991 | 457 |
Folk-rock music | 56 | 56 | 1,148,312 | 22 |
Spanish Elections 2008 | 129 | 896 | 3,117,638 | 135.11 |
European Elections 2009 | 170 | 613 | 5,404,291 | 233.05 |
Catalan museums | 1,523 | 1,550 | 2,146,133 | 147.49 |
Catalan Elections 2010 | 967 | 31,210 | 17,202,999 | 707.65 |
Local Elections 2011 | 3,346 | 47,429 | 17,202,999 | 1,127 |
Spanish Elections 2011 | 304 | 939 | 1,764,159 | 276 |
Catalan Elections 2012 | 77 | 10,013 | 16,890,655 | 328 |
Total | 7,184 | 95,228 | 88,036,225 | 3,608.3 |
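A quick consistency check for a table like this is to add up each column and compare it with the Total row. The sketch below does this with the figures from the table; the function and variable names are illustrative, and the space column is compared with a small tolerance because it holds decimal values.

```python
# (collection, new websites, harvests, files, space in GB), from the table above.
collections = [
    ("Catalan Elections 2006", 81, 775, 4_953_215, 175.0),
    ("Local Elections 2007", 531, 1_747, 13_641_991, 457.0),
    ("Folk-rock music", 56, 56, 1_148_312, 22.0),
    ("Spanish Elections 2008", 129, 896, 3_117_638, 135.11),
    ("European Elections 2009", 170, 613, 5_404_291, 233.05),
    ("Catalan museums", 1_523, 1_550, 2_146_133, 147.49),
    ("Catalan Elections 2010", 967, 31_210, 17_202_999, 707.65),
    ("Local Elections 2011", 3_346, 47_429, 17_202_999, 1_127.0),
    ("Spanish Elections 2011", 304, 939, 1_764_159, 276.0),
    ("Catalan Elections 2012", 77, 10_013, 16_890_655, 328.0),
]
stated_total = (7_184, 95_228, 88_036_225, 3_608.3)
columns = ("new websites", "harvests", "files", "space (GB)")


def column_sums(rows):
    """Sum every numeric column (all fields after the collection name)."""
    return [sum(row[i] for row in rows) for i in range(1, 5)]


sums = column_sums(collections)
for name, computed, stated in zip(columns, sums, stated_total):
    ok = abs(computed - stated) < 0.01  # tolerance for the float GB column
    status = "OK" if ok else "MISMATCH"
    print(f"{name:<13} computed {computed:>14,.2f}  stated {stated:>14,.2f}  {status}")
```

On these figures the new-websites, harvests and space columns add up to their Total row. The files column sums to 83,472,392 rather than the stated 88,036,225; the source does not explain the gap, although Catalan Elections 2010 and Local Elections 2011 list the same file count (17,202,999).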
For more on PADICAT's monographic collections of election campaigns, see the following article (in Spanish):
Ciro Llueca; Daniel Cócera; Natalia Torres; Gerard Suades; Ricard de la Vega (2011). “A ritmo de tweet: archivando elecciones 2.0”. El profesional de la información, vol. 20, no. 3.
http://eprints.rclis.org/handle/10760/15764