FAQs



What is PADICAT?


It is an initiative of the Biblioteca de Catalunya to capture, process and provide permanent access to all Catalan output of a cultural, scientific and general nature in digital format. In short, its aim is to archive the Catalan internet.

A full, detailed explanation of PADICAT's aims, objectives and operation can be found in the section What is PADICAT?

--------------------------------------------------------------------------------

What can I do to get my website to appear in PADICAT?

PADICAT captures websites in several ways:

  • systematic capture of websites in the .cat domain;
  • capture of the websites of institutions with which the Biblioteca de Catalunya has signed a collaboration agreement;
  • capture of websites judged relevant during searches made while browsing;
  • capture of websites recommended by users, which are added to the collection once their relevance has been confirmed.

If you would like your website to be included in the PADICAT collection, you can recommend it by completing a short form in the section Propose a website.

Once a website becomes part of the repository, it is captured at least twice a year; this frequency may increase in the future.

--------------------------------------------------------------------------------

What can I do to avoid my website appearing in PADICAT?

You can keep your website out of the collection simply by adding a robots.txt file that prevents our robot from visiting it.

The robot we use identifies itself as PADICAT and follows the Standard for Robot Exclusion (SRE): it does not enter any website, or any part of a website, that is protected in this way, unless the institution concerned and the Biblioteca de Catalunya have previously agreed otherwise.
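As an illustration of how this exclusion works, the sketch below uses Python's standard `urllib.robotparser` module to check a robots.txt file that shuts out only the PADICAT robot. It assumes the robot matches a `User-agent: PADICAT` group, as the name given above suggests; the robots.txt content and URLs are hypothetical examples, and this is not PADICAT's own code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: exclude the PADICAT robot from the whole site,
# while leaving every other robot unrestricted (an empty Disallow allows all).
ROBOTS_TXT = """\
User-agent: PADICAT
Disallow: /

User-agent: *
Disallow:
"""


def padicat_allowed(robots_txt: str, url: str, agent: str = "PADICAT") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "http://www.example.cat/menu.html"
    print(padicat_allowed(ROBOTS_TXT, url))                  # False: PADICAT is excluded
    print(padicat_allowed(ROBOTS_TXT, url, "SomeOtherBot"))  # True: other robots allowed
```

To exclude only part of a site, `Disallow: /` can be replaced with a narrower path such as `Disallow: /private/` (a hypothetical directory).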

--------------------------------------------------------------------------------

When I visit some captured websites, why can't I see some images or follow some of the links?

The purpose of PADICAT is to preserve websites exactly as they were at the moment of capture. At the same time, it aims to let users browse the captured websites just as they would on the live Internet.

However, some elements often make it difficult to view these websites properly or to follow their hyperlinks. Three basic tips for avoiding some of these display problems in captured websites are:


Don't use full URLs (including the domain) to link to pages and files on the same site. So, instead of:

http://www.example.cat/imagenes/logotipo.jpg

or

http://www.example.cat/menu.html

it is better to use:

/imagenes/logotipo.jpg

and

/menu.html


Don't use the HTML meta refresh tag to redirect to another page. For example, avoid:

<html>
<head>
<!-- ... -->
<meta http-equiv="refresh" content="2;url=http://example.cat">
<!-- ... -->
</head>
<!-- ... -->
</html>


Don't embed content from external websites, such as images, scripts or other resources.
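Before publishing, a webmaster could check pages for the three patterns above. The following is a minimal sketch using Python's standard `html.parser`; the class name `CaptureAuditor` and the example page are hypothetical, the checks are deliberately simple (only an exact host match counts as "same site", so `example.cat` and `www.example.cat` are treated as different hosts), and it is not a tool provided by PADICAT.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class CaptureAuditor(HTMLParser):
    """Flag the three patterns described above (simplified, hypothetical checks)."""

    def __init__(self, site_host: str):
        super().__init__()
        self.site_host = site_host.lower()
        self.absolute_same_site = []  # full URLs that point back to this site
        self.meta_refresh = []        # content values of <meta http-equiv="refresh">
        self.external_resources = []  # embedded files (img, script, css...) from other hosts

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute names arrive lower-cased; values may be None
        if tag == "meta" and (attrs.get("http-equiv") or "").lower() == "refresh":
            self.meta_refresh.append(attrs.get("content") or "")
        for name in ("href", "src"):
            value = attrs.get(name)
            if not value:
                continue
            parts = urlsplit(value)
            if parts.scheme not in ("http", "https"):
                continue  # relative paths, mailto: links, etc. are not flagged
            if parts.hostname == self.site_host:
                self.absolute_same_site.append(value)
            elif name == "src" or tag == "link":
                # Embedded from another site; plain <a href> links to other sites are fine.
                self.external_resources.append(value)


PAGE = """<html><head>
<meta http-equiv="refresh" content="2;url=http://example.cat">
<script src="http://cdn.other.example/lib.js"></script>
</head><body>
<img src="http://www.example.cat/imagenes/logotipo.jpg">
<a href="/menu.html">Menu</a>
</body></html>"""

auditor = CaptureAuditor("www.example.cat")
auditor.feed(PAGE)
print(auditor.absolute_same_site)  # ['http://www.example.cat/imagenes/logotipo.jpg']
print(auditor.meta_refresh)        # ['2;url=http://example.cat']
print(auditor.external_resources)  # ['http://cdn.other.example/lib.js']
```

Each flagged absolute URL on the site's own host could then be rewritten as a path such as `/imagenes/logotipo.jpg`, and each external resource copied to the site's own server.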

--------------------------------------------------------------------------------

I'm trying to visit a harvested website, but it can't be loaded. Is there any solution?

Occasionally, a harvested website can't be loaded. Sometimes this happens because JavaScript is enabled in the browser (Firefox, Internet Explorer, etc.). If you disable JavaScript in your browser, you will be able to access most of these harvested websites.

Instructions for enabling or disabling JavaScript in your browser:

http://support.google.com/adsense/bin/answer.py?hl=en&answer=12654

--------------------------------------------------------------------------------

What does PADICAT capture from each website?

PADICAT captures only the websites, and parts of websites, that are publicly accessible on the Internet. In addition to respecting any restrictions that website owners may impose (see What can I do to avoid my website appearing in PADICAT?), PADICAT does not enter or capture any website that requires a password, a form or similar, such as areas reserved for members of a professional association or for subscribers to a publication.

--------------------------------------------------------------------------------

I recommended my website for the collection, but I can't find it in the database. Why?

PADICAT currently has four ProLiant DL360 G4p servers working at 100% of their capacity around the clock. Even so, the large number of resources to be captured creates queues, which can delay the capture of proposed websites.

--------------------------------------------------------------------------------

What volume of data does PADICAT hold?


The volume of data stored in PADICAT is shown in the What do we have section of our website, where the figures are updated periodically.

--------------------------------------------------------------------------------

Can PADICAT correctly capture and display any kind of website?

Owing to limitations of the viewing software and gaps that arise when websites are archived (e.g. robots.txt exclusions), some websites may not display correctly (external links, forms or search boxes, broken images) or may redirect to the live version of the website.

Websites that follow HTML language and accessibility standards should not have any problems being captured or viewed once they are stored in PADICAT. However, certain elements can complicate both the capture of resources and, above all, their subsequent viewing within the collection. Some recommendations:

For the capture of a website:

  • robots.txt: as a general rule, PADICAT respects websites that use exclusion directives.

For browsing and viewing of the captured version:

Links:

  • embedded resources: images, scripts, etc. hosted on other, external websites will not be displayed correctly once your website has been captured by PADICAT. We recommend saving these files (for example, logos) in the image directory of your own server and referencing them with relative paths.
  • use relative paths (e.g. menu.html) or absolute paths (e.g. /menu.html) to build links, rather than the complete URL.
  • don’t use scripts to build links dynamically.
  • avoid embedding Flash objects that contain absolute links.
  • avoid using the base href tag.
  • avoid using links to URLs that redirect to another site.
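The advice not to build links with scripts follows from how link discovery usually works: a crawler that reads only the HTML markup sees links written as `<a href>` tags, but not links that a script would create when run in a browser. The sketch below models such a markup-only crawler with Python's standard `html.parser`; it is a simplified assumption for illustration, since real crawlers may apply additional heuristics to scripts.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect <a href> targets the way a markup-only crawler might."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


PAGE = """<body>
<a href="/menu.html">Menu</a>
<script>
  // This link only exists after the script runs in a browser.
  document.write('<a href="/news.html">News</a>');
</script>
</body>"""

extractor = LinkExtractor()
extractor.feed(PAGE)
print(extractor.links)  # ['/menu.html']: /news.html is never discovered
```

The parser treats the contents of `<script>` as raw text, so the link written by `document.write` is invisible to it, just as it may be to an archiving robot.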

Interpreted languages:

  • avoid using server-side variables that change how the site is displayed, for instance to switch language or to change menus dynamically.

Encoding:

  • PADICAT uses UTF-8 encoding to display characters. Websites that use a different encoding (e.g. Latin-1) and do not declare it may display incorrectly (for example, accented characters). We therefore recommend declaring the encoding used by your website.
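The following sketch shows why an undeclared encoding causes problems. It models a viewer that honours a declared charset and otherwise assumes UTF-8, as described above; `render_text` is a hypothetical helper and this is a simplified model, not PADICAT's viewing software. Catalan text saved as Latin-1 and read as UTF-8 loses its accented characters, while the same bytes decode correctly once the charset is declared.

```python
from typing import Optional


def render_text(raw: bytes, declared_charset: Optional[str] = None) -> str:
    """Decode page bytes with the declared charset, falling back to UTF-8.

    Byte sequences that are invalid in the chosen encoding are replaced
    with U+FFFD (the replacement character) instead of raising an error.
    """
    return raw.decode(declared_charset or "utf-8", errors="replace")


original = "Benvinguts a l'àrea de cèl·lules"
latin1_bytes = original.encode("latin-1")  # page saved in Latin-1

undeclared = render_text(latin1_bytes)           # no charset declared: UTF-8 assumed
declared = render_text(latin1_bytes, "latin-1")  # e.g. <meta charset="ISO-8859-1">

print(undeclared == original)  # False: accented characters become U+FFFD
print(declared == original)    # True
```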

Accessibility recommendations:

  • we recommend avoiding frames, as they can complicate the indexing of the website and therefore its later retrieval through text search.
  • we recommend offering alternative ways to access the information on pages that use JavaScript, since some devices do not support it and some users have it disabled in their browser.

Other recommendations for webmasters:

  • avoid very heavy pages.
  • do not overload a single page with images.
  • follow accessibility standards (frames, encoding, etc.).
  • do not use spaces in filenames.
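A space is not a valid character in a URL, so a file name containing one has to be percent-encoded (`%20`), which can make links fragile when they are typed, copied or rewritten. The small checker below is a hypothetical helper built on Python's standard `urllib.parse.quote`; suggesting hyphens as the replacement is just one common convention.

```python
from urllib.parse import quote


def check_filename(name: str) -> list[str]:
    """Return warnings for a file name that contains spaces (hypothetical helper)."""
    warnings = []
    if " " in name:
        warnings.append(
            f"{name!r} contains spaces: it appears in URLs as {quote(name)!r}; "
            f"consider {name.replace(' ', '-')!r}"
        )
    return warnings


for filename in ["menu principal.html", "menu-principal.html"]:
    print(filename, check_filename(filename))
```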

--------------------------------------------------------------------------------

Does the language I use to search have a bearing on the search results?

The indexes that the software generates from the captured websites, and which are used for keyword searches, are unique: they are independent of the interface language the user chooses in PADICAT and depend solely on the language in which each captured website is written.

Therefore, the results for a given search term do not depend on the language in which the user is browsing PADICAT. Even so, more results will be obtained if the search terms are in Catalan.

--------------------------------------------------------------------------------

Help with searching

Tips for searching

  • To search by free text, use the search by word.
  • To search by specific domain, use the search by URL.

Tips for the advanced search

  • Type in one or more search terms.
  • If appropriate, specify the domain on which you wish to perform the search.
  • To limit the search to a period of time, specify the start and end dates.
  • To limit the results to a particular file type, specify it.
  • To search within a single event, select the corresponding collection; to search all resources, select "All".

Combined and / or expert searches

  • Words can be typed in full or truncated (e.g. coun finds council and councilor).
  • If you type in more than one search term, the system retrieves the resources that contain all of them.
  • Use the AND operator to retrieve resources that contain all of the words typed in (e.g. councilor AND elections).
  • Use the OR operator to retrieve resources that contain either of the words typed in (e.g. education OR formation).
  • Use inverted commas (“ ”) to search for an exact phrase (e.g. “roda de ter”).
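To make the rules above concrete, here is a toy model of how such queries could be evaluated against the text of a single page. It is an illustration under simplifying assumptions (OR splits the whole query into alternatives, truncated words match by prefix, comparison ignores case and punctuation); the names `matches` and `tokens` are hypothetical, and this is not PADICAT's search engine, whose exact behaviour may differ.

```python
import re

# A query item is a quoted phrase ("..." or “...”) or a single bare term.
ITEM = re.compile(r'"([^"]+)"|“([^”]+)”|(\S+)')


def tokens(text: str) -> list[str]:
    """Lower-case words, ignoring punctuation."""
    return re.findall(r"\w+", text.lower())


def matches(query: str, text: str) -> bool:
    """Toy evaluation of a query against one page's text."""
    words = tokens(text)

    def term_ok(term: str) -> bool:
        # Truncation: "coun" matches "council" and "councilor".
        return any(word.startswith(term) for word in words)

    def phrase_ok(phrase: str) -> bool:
        target = tokens(phrase)
        n = len(target)
        return any(words[i:i + n] == target for i in range(len(words) - n + 1))

    # OR separates alternatives; within an alternative every item must match,
    # so "a b" and "a AND b" behave the same.
    for alternative in re.split(r"\s+OR\s+", query.strip()):
        checks = []
        for m in ITEM.finditer(alternative):
            phrase = m.group(1) or m.group(2)
            if phrase:
                checks.append(phrase_ok(phrase))
            elif m.group(3) != "AND":
                checks.extend(term_ok(t) for t in tokens(m.group(3)))
        if checks and all(checks):
            return True
    return False


print(matches("coun", "Minutes of the city council"))                  # True
print(matches("councilor AND elections", "The councilor resigned"))    # False
print(matches("education OR formation", "Teacher formation courses"))  # True
print(matches("“roda de ter”", "Festa major a Roda de Ter"))           # True
```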

--------------------------------------------------------------------------------

What is the content of PADICAT?

The Crawled web sites section shows the number of websites contained in PADICAT and the number of captures of these websites made on different dates. It also shows the number of files that make up each capture held in the repository. These files are mainly web pages: approximately 70% are HTML, 10% images, 2% PDF, etc. (for an exact breakdown of the file types that make up the websites in PADICAT, see our press release).
Finally, it shows the storage space occupied, which includes the size of the compressed ARC files that store the captures and of their indexes.

This data is updated automatically when new resources are added to the collection.

--------------------------------------------------------------------------------

Questions and suggestions

If you have a question that has not been answered here, or a suggestion for us, please use the following form.
