Crawled websites

Since 21 July 2006, the system has evolved to capture a growing number of websites systematically and effectively. The space required to store these data and provide access to them has grown accordingly.

In the statistics presented here, a “website” (or “web”) is a resource published on the Internet and identified by its own URL; a “capture” (harvest) is each copy of a website made at a particular point in time; and a “file” is each of the technical files that make up a captured website. Other technical data that may be of interest to PADICAT users are also included.
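
As an illustration of how these three units relate, the following minimal Python sketch models a website as a URL with a list of captures, each capture holding the files stored for it, and counts the repository totals from that model. The class names, fields, example URLs and dates are hypothetical, and counting a file once per capture that stores it is an assumption; PADICAT's actual ARC storage and index are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Capture:
    """One harvest of a website at a given moment (hypothetical model)."""
    timestamp: str                                  # illustrative date string
    files: list[str] = field(default_factory=list)  # files stored for this capture


@dataclass
class Website:
    """A resource identified by its own URL, with all of its captures."""
    url: str
    captures: list[Capture] = field(default_factory=list)


def repository_counts(sites: list[Website]) -> dict[str, int]:
    """Count websites (distinct URLs), harvests (captures) and files across captures."""
    return {
        "websites": len({site.url for site in sites}),
        "harvests": sum(len(site.captures) for site in sites),
        # Assumption: a file is counted once for every capture that stores it.
        "files": sum(len(cap.files) for site in sites for cap in site.captures),
    }


# Hypothetical example data, not taken from PADICAT.
sites = [
    Website("http://example.org/", [
        Capture("2009-06-01", ["index.html", "logo.gif"]),
        Capture("2009-06-07", ["index.html", "logo.gif", "report.pdf"]),
    ]),
    Website("http://example.com/", [Capture("2011-05-15", ["index.html"])]),
]

print(repository_counts(sites))
```

For the sample data this prints `{'websites': 2, 'harvests': 3, 'files': 6}`.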

Repository contents:

Concept                    Total
Number of websites        63.145
Number of harvests       277.819
Number of files      372.932.876
ARC space (TB)              13,5
Indexing space (TB)            1
Total space (TB)            14,5
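
The totals in this table follow from simple arithmetic on figures written in European notation (a point for thousands, a comma for decimals). A minimal Python sketch, assuming the values are copied by hand from the table, that parses them, recomputes the total space and derives two averages that PADICAT does not publish itself:

```python
def parse_eu(value: str) -> float:
    """Parse a number written with '.' for thousands and ',' for decimals ("13,5" -> 13.5)."""
    return float(value.replace(".", "").replace(",", "."))


# Figures copied from the repository table.
repository = {
    "websites": "63.145",
    "harvests": "277.819",
    "files": "372.932.876",
    "arc_tb": "13,5",
    "index_tb": "1",
}
v = {key: parse_eu(text) for key, text in repository.items()}

total_tb = v["arc_tb"] + v["index_tb"]             # ARC space + indexing space
harvests_per_site = v["harvests"] / v["websites"]  # simple average, not a published figure
files_per_harvest = v["files"] / v["harvests"]     # simple average, not a published figure

print(f"Total space: {total_tb} TB")
print(f"Average harvests per website: {harvests_per_site:.2f}")
print(f"Average files per harvest: {files_per_harvest:.0f}")
```

The first line printed is `Total space: 14.5 TB`, matching the Total space row; the averages (roughly 4.4 harvests per website) are illustrative only.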

Origin of harvests

The resources deposited in the repository come from four sources: captures of the .cat domain; resources compiled to create monographic collections; websites recommended by PADICAT users; and the digital resources of institutions that have signed a cooperation agreement with the Biblioteca de Catalunya.

Concept         Number of websites  Number of harvests
Agreements                     598               4.138
Recommended                 11.507              69.655
Monographics                 7.184              95.228
.cat                        43.856             108.798
Total                       63.145             277.819
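
Each source's share of the repository can be read off this table by dividing by the Total row. The following Python sketch, with the figures typed in by hand, checks that the four rows add up to the totals and prints each source's share of websites and harvests, plus a harvests-per-website ratio (a derived figure, not one published by PADICAT):

```python
# (websites, harvests) per source, copied from the table.
ORIGINS = {
    "Agreements": (598, 4_138),
    "Recommended": (11_507, 69_655),
    "Monographics": (7_184, 95_228),
    ".cat": (43_856, 108_798),
}

total_sites = sum(sites for sites, _ in ORIGINS.values())
total_harvests = sum(harvests for _, harvests in ORIGINS.values())

# The per-source rows should add up to the Total row (63.145 websites, 277.819 harvests).
assert (total_sites, total_harvests) == (63_145, 277_819)

for source, (sites, harvests) in ORIGINS.items():
    print(
        f"{source:<13}"
        f"{sites / total_sites:7.1%} of websites"
        f"{harvests / total_harvests:7.1%} of harvests"
        f"{harvests / sites:6.1f} harvests/website"
    )
```

With these figures, .cat sites account for about 70% of the websites but under 40% of the harvests, while monographic-collection sites are captured most often per site.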

 

Distribution of file types in the PADICAT repository

Type                                 Files   Share
text/html                      282.840.290  75,84%
image/jpeg                      42.854.857  11,49%
image/gif                        9.520.230   2,55%
image/png                        7.836.033   2,10%
application/pdf                  5.661.480   1,52%
application/atom+xml             4.020.413   1,08%
text/xml                         2.704.612   0,73%
application/rss+xml              2.464.696   0,66%
text/css                         2.226.545   0,60%
text/plain                       1.786.454   0,48%
application/javascript           1.666.166   0,45%
text/dns                         1.441.111   0,39%
application/x-shockwave-flash    1.339.610   0,36%
application/xml                    972.565   0,26%
application/x-javascript           869.638   0,23%
no-type                            525.092   0,14%
application/octet-stream           400.676   0,11%
application/msword                 322.765   0,09%
application/http                   319.700   0,09%
image/pjpeg                        268.921   0,07%
Other                            2.891.022   0,78%
Total                          372.932.876
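
The Share column is each type's file count divided by the repository total of 372.932.876 files, rounded to two decimals. A Python sketch that reproduces it, assuming the counts are copied from the table and reusing the table's European number format for the output:

```python
# Published file counts per MIME type, copied from the table.
FILE_TYPES = {
    "text/html": 282_840_290,
    "image/jpeg": 42_854_857,
    "image/gif": 9_520_230,
    "image/png": 7_836_033,
    "application/pdf": 5_661_480,
    "application/atom+xml": 4_020_413,
    "text/xml": 2_704_612,
    "application/rss+xml": 2_464_696,
    "text/css": 2_226_545,
    "text/plain": 1_786_454,
    "application/javascript": 1_666_166,
    "text/dns": 1_441_111,
    "application/x-shockwave-flash": 1_339_610,
    "application/xml": 972_565,
    "application/x-javascript": 869_638,
    "no-type": 525_092,
    "application/octet-stream": 400_676,
    "application/msword": 322_765,
    "application/http": 319_700,
    "image/pjpeg": 268_921,
    "Other": 2_891_022,
}


def eu_int(n: int) -> str:
    """Format an integer with '.' as the thousands separator (372932876 -> '372.932.876')."""
    return f"{n:,}".replace(",", ".")


def eu_percent(share: float) -> str:
    """Format a fraction as a percentage with two decimals and a decimal comma."""
    return f"{share * 100:.2f}".replace(".", ",") + "%"


def distribution(counts: dict[str, int]) -> list[tuple[str, str, str]]:
    """Return (type, files, share) rows, each share taken over the grand total."""
    total = sum(counts.values())
    return [(t, eu_int(n), eu_percent(n / total)) for t, n in counts.items()]


for mime, files, share in distribution(FILE_TYPES):
    print(f"{mime:<30}{files:>12}  {share:>7}")
print(f"{'Total':<30}{eu_int(sum(FILE_TYPES.values())):>12}")
```

Run against these counts, every computed share matches the published column.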

Evolution of PADICAT monographic (topic) collections

PADICAT has created ten monographic collections: Catalan museums; folk-rock music in Catalonia; the European Parliament election campaign (2009); elections to the Parliament of Catalonia (2006, 2010 and 2012); elections to the Spanish Congress and Senate (2008 and 2011); and local elections (2007 and 2011).

Collection               New websites  Harvests       Files  Space (GB)
Catalan Elections 2006             81       775   4.953.215         175
Local Elections 2007              531     1.747  13.641.991         457
Folk-rock music                    56        56   1.148.312          22
Spanish Elections 2008            129       896   3.117.638      135,11
European Elections 2009           170       613   5.404.291      233,05
Catalan museums                 1.523     1.550   2.146.133      147,49
Catalan Elections 2010            967    31.210  17.202.999      707,65
Local Elections 2011            3.346    47.429  17.202.999       1.127
Spanish Elections 2011            304       939   1.764.159         276
Catalan Elections 2012             77    10.013  16.890.655         328
Total                           7.184    95.228  88.036.225     3.608,3
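
Because the Space column mixes thousands separators and decimal commas (for example 1.127 and 135,11), summing it by hand is error-prone. A Python sketch, assuming the values are typed in from the table, that totals the new websites, harvests and space columns with `decimal.Decimal` so that no floating-point rounding creeps in:

```python
from decimal import Decimal

# (new websites, harvests, space in GB) per collection, copied from the table.
# Values keep the table's notation: '.' for thousands, ',' for decimals.
COLLECTIONS = {
    "Catalan Elections 2006": ("81", "775", "175"),
    "Local Elections 2007": ("531", "1.747", "457"),
    "Folk-rock music": ("56", "56", "22"),
    "Spanish Elections 2008": ("129", "896", "135,11"),
    "European Elections 2009": ("170", "613", "233,05"),
    "Catalan museums": ("1.523", "1.550", "147,49"),
    "Catalan Elections 2010": ("967", "31.210", "707,65"),
    "Local Elections 2011": ("3.346", "47.429", "1.127"),
    "Spanish Elections 2011": ("304", "939", "276"),
    "Catalan Elections 2012": ("77", "10.013", "328"),
}


def parse_eu(value: str) -> Decimal:
    """Exact parse: '1.127' -> 1127 and '135,11' -> 135.11 (no float rounding)."""
    return Decimal(value.replace(".", "").replace(",", "."))


# Sum each of the three columns across all collections.
sites, harvests, space_gb = (
    sum(parse_eu(row[i]) for row in COLLECTIONS.values()) for i in range(3)
)

print(f"New websites: {sites}")
print(f"Harvests: {harvests}")
print(f"Space: {space_gb} GB")
```

The sums come to 7184 new websites, 95228 harvests and 3608.30 GB, which agree with the Total row for these three columns.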

 

For more on PADICAT's monographic collections of election campaigns, see the following article (in Spanish):

Ciro Llueca; Daniel Cócera; Natalia Torres; Gerard Suades; Ricard de la Vega (2011). “A ritmo de tweet: archivando elecciones 2.0”. El profesional de la información, vol. 20, no. 3.
http://eprints.rclis.org/handle/10760/15764