CRDO - Sharing oral resources for research
The CRDO project (Resource Center for the Description of Oral) started in 2006 under the banner of CNRS, the National Center for Scientific Research in France. Now in full swing, CRDO offers labs and independent scholars a free-of-charge service for the storage, sharing and long-term preservation of sound/video recordings and their related material, in compliance with the OAIS model.
Items belonging to four categories are available on the site:
- Primary data, basically oral/video corpora and any speech-related signal;
- Resources such as annotations of corpora, lexicons, reference databases, systems of representation, grammars…;
- Tools for linguistic research;
- Collections binding together several items of the defined types.
The crdo.fr website is hosted by Université de Provence. Currently, packages are distributed via CC-IN2P3 and preserved on the archival platform of CINES.
Read CRDO guidelines for the sharing and long-term preservation of oral resources
Features specific to CRDO
- Multi-language access to data: navigation is possible in four languages (English/Spanish/French/Chinese); descriptions, keywords and tables of contents may also be entered in one optional language in addition to the navigation languages.
- CRDO is entirely at the service of producers of speech resources, i.e. laboratories (see current list) and individual producers, irrespective of language/geographical boundaries. Current development aims at allowing producer laboratories to install their own web services for the display/streaming/analysis of data distributed via CRDO.
- The range of contributions is as wide as possible, from experimental linguistics (laboratory data) to contact linguistics (field data).
- Contributors do their best to supplement data with resources and tools that will facilitate their processing. The aim is to offer a whole set of devices, from primary acoustic signals to the editing and processing of these signals. This service should give access to information, tools and methods allowing data analysis as well as annotations produced by these tools.
- Each item may be linked to publications, teams and research programs.
- A sound resource may be stored with CRDO as a personal deposit or in connection with the institution/laboratory with which its author was affiliated at the time of its production. Each institution may declare a path from which its information system will be able to deal with instructions from CRDO that may launch processes with respect to a particular item. (See details and the list of institutions).
- Thanks to the CRDO licence, downloads are traced and may be followed up. Users commit themselves (1) to mention their use of downloaded items in every publication and (2) to enter on the CRDO website the references of publications based on this use. In this way, the relevance and utility of each corpus, tool or resource distributed by CRDO may be assessed by the speech research community.
- Producers of any item distributed by CRDO are granted access to its download report as well as the contact details, professional affiliation and fields of interest of users who downloaded items shared on CRDO. This feature is complementary to the sharing of publications. We hope that it will promote the emergence of communities of producers and users (Web 2.0 approach) collaborating on research projects and making optimal use of available resources.
CRDO: Current state of the art (July 2010)
CRDO reached a critical stage this year as we switched to production mode for long-term preservation. This is the outcome of two years of collaborative work on a pilot project coordinated by TGE-Adonis, a "Major Facility" launched by CNRS "to connect Humanities and Social Sciences". In this project we designed long-term preservation procedures fully compliant with the OAIS framework, which had earlier proved successful for data sharing between major space agencies worldwide.
Adapting OAIS to oral resources turned out to be a tedious task because of features specific to linguistic research and the expectations of scholars engaged in the production and processing of these resources. Notably,
- There is an apparent contradiction between the required stability of long-term storage and the life cycle of projects in which annotations, translations and analytical material are likely to change over time.
- Accuracy of metadata in the field of humanities implies updates at unpredictable moments.
- Because of the cultural sensitivity of speech data, it is necessary to implement access rights that can evolve over time, both for items (stored objects) and for individual files within items. Speakers may for instance decide that a fragment of the corpus is eligible for immediate sharing with the public (or specific groups of users) whereas other fragments shall remain confidential. The system must be able to cope with changes in decisions of this kind.
- This implementation of OAIS combines long-term preservation with a flexible framework for "work in progress". For instance, URLs pointing to open-access files remain stable across versions and do not depend on the actual location of the distribution site. (Read details on this page.)
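The last point, stable URLs that survive both versioning and a change of distribution host, can be illustrated with a minimal sketch. It is not the CRDO implementation: the class `StableLinkResolver`, its methods, the `/v<number>/` path layout and the example host names are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class StableLinkResolver:
    """Hypothetical resolver: stable public path -> current physical URL."""

    base_url: str  # current distribution host; assumed free to change
    latest: dict[str, int] = field(default_factory=dict)  # path -> latest version

    def publish(self, path: str) -> int:
        """Register a new version of a file and return its version number."""
        self.latest[path] = self.latest.get(path, 0) + 1
        return self.latest[path]

    def resolve(self, path: str) -> str:
        """Map the stable path to the physical URL of the latest version."""
        version = self.latest[path]  # raises KeyError if never published
        return f"{self.base_url}/v{version}/{path}"


resolver = StableLinkResolver(base_url="https://distribution.example.org")
resolver.publish("corpus-a/recording.wav")
resolver.publish("corpus-a/recording.wav")  # a revised version is deposited
first = resolver.resolve("corpus-a/recording.wav")

resolver.base_url = "https://mirror.example.org"  # the distribution site moves
second = resolver.resolve("corpus-a/recording.wav")
```

In this sketch the path cited by users (`corpus-a/recording.wav`) never changes; only the resolver's internal state does, which is the property the bullet describes.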
All these issues have been raised and solved. To get an idea of the level of flexibility that we implemented on CRDO, please consult the page on packaging.
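As an illustration of access rights that change over time at the level of individual files, here is a minimal sketch. It only models the idea described above; the class `AccessPolicy`, the audience labels `"public"` and `"confidential"`, and the file names are hypothetical and do not reflect the actual CRDO data model.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class AccessPolicy:
    """Hypothetical per-item policy with dated per-file overrides."""

    item_default: str = "confidential"  # applies when no file rule exists
    # file name -> list of (effective date, audience), in order of entry
    file_rules: dict[str, list[tuple[date, str]]] = field(default_factory=dict)

    def set_file_access(self, file_name: str, audience: str, effective: date) -> None:
        """Record a decision; earlier decisions are kept as history."""
        self.file_rules.setdefault(file_name, []).append((effective, audience))

    def audience_for(self, file_name: str, on: date) -> str:
        """Return the audience in force for a file on a given day."""
        rules = [r for r in self.file_rules.get(file_name, []) if r[0] <= on]
        if not rules:
            return self.item_default
        # Stable sort: among decisions with the same date, the last entered wins.
        return sorted(rules, key=lambda r: r[0])[-1][1]


policy = AccessPolicy()
policy.set_file_access("story.wav", "public", date(2010, 1, 1))
policy.set_file_access("ritual.wav", "public", date(2010, 1, 1))
# The speaker later decides that this fragment must become confidential.
policy.set_file_access("ritual.wav", "confidential", date(2010, 7, 1))
```

Keeping the dated history, rather than overwriting a single flag, is one way to let the system answer "who could access this file on a given day" after a decision has been reversed.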
After the signature of legal documents between CNRS and the French National Archive, CRDO acquired the right to store information packages on the long-term preservation platform of CINES and have them transferred for distribution by a platform hosted by CC-IN2P3. This process is entirely transparent to users and producers of shared resources, as they keep interacting with the CRDO website while actual processes are channeled through secure background links to the large computing centers.
The legal transaction gave us the green light to start long-term preservation. Thus, we are now in a position to transfer source material to these centers, which have potentially unlimited storage space, and this in turn will allow us to receive more material on the CRDO website.
- Recent changes on the CRDO site (from the RSS feed)
- Steering committee
- Legal aspects
- OLAC
- Collaborations of our group
- Data format
- Current development
- CRDO page on the TGE-Adonis wiki
- Links
CrdoWiki
This wiki space is dedicated to the production and sharing of information about:
- projects related to corpora, tools and resources distributed or documented by CRDO;
- teams working on these projects;
- scholars taking part in these teams;
- documentation on corpora, tools and resources distributed by CRDO (direct links may be retrieved from records stored on the CRDO site).
We strongly recommend using Wikipedia as a priority for publishing material of encyclopedic value.
