
Speech & Language Data Repository

http://sldr.org



Visible items: 235
Documents: 301507
Members: 374 (47 countries)
Spoken languages: 160

Speech & Language Data Repository (SLDR)

SLDR (pronounced ‘SpLanDR’) is replacing CRDO-Aix at the end of its experimental phase. The acronyms CRDO, CRDO-Aix and CRDO-Paris are therefore out of date and should gradually be replaced in all documents. Nonetheless, we are maintaining redirections from ‘crdo.fr’ so that items remain accessible via old identifiers.

Speech & Language Data Repository (SLDR) is a Trusted Digital Repository offering labs and scholars a service for sharing their oral/linguistic data and archiving it following procedures compliant with the OAIS model for long-term preservation. Its holdings are referenced in international repositories such as OLAC (Open Language Archives Community) and the Virtual Language Observatory. Currently, packages are distributed via the TGE-Adonis grid hosted by CC-IN2P3 and preserved on the platform of CINES, an institutional archive site that holds the Data Seal of Approval.
SLDR for speech, and CNRTL for text, are the resource centres from which a French sub-network of CLARIN centres is being built: the ORTOLANG project associated with the CORPUS VLRI.

All items and documents are labelled with persistent identifiers facilitating their access regardless of their location, version and status.
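
As an illustration of how such identifiers can be used, the sketch below turns an identifier of the form shown in the listing on this page (for example hdl:11041/sldr000725) into a web address. It assumes that the identifiers are resolved through the public Handle System proxy at hdl.handle.net; the function name and the constant are hypothetical and are not part of any SLDR interface.

```python
from urllib.parse import quote

# Assumption: the public Handle System proxy is used as the resolver.
HANDLE_PROXY = "https://hdl.handle.net/"


def handle_to_url(identifier: str) -> str:
    """Turn an 'hdl:prefix/suffix' identifier into a resolver URL (hypothetical helper)."""
    if not identifier.startswith("hdl:"):
        raise ValueError(f"not a handle identifier: {identifier!r}")
    handle = identifier[len("hdl:"):]
    # A handle is a naming-authority prefix and a local suffix joined by '/'.
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"malformed handle: {identifier!r}")
    # Percent-encode the suffix so that unusual characters survive in a URL.
    return HANDLE_PROXY + prefix + "/" + quote(suffix, safe="")


if __name__ == "__main__":
    print(handle_to_url("hdl:11041/sldr000725"))
```

The resulting address stays the same whatever the current location, version or status of the item; only the resolver needs to know where the item actually lives.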

Items of four kinds are available on this site:

  • Primary data: sound/video/image/text corpora and any language-related signal;
  • Resources: annotations of corpora, lexicons, reference databases, systems of representation, grammars, etc.;
  • Tools for linguistic research;
  • Collections of items as defined above.

Read our flyer, our guidelines for the sharing and long-term preservation of oral resources, and visit our slideshow!

The latest deposits (106)
Collection sldr000725 OMProDat
Laboratoire parole et langage - UMR 7309 (LPL, Aix-en-Provence FR)
An Open Source initiative of recordings based on the EUROM1 protocol
(computational_linguistics)
hdl:11041/sldr000725
2013-05-07
Version 1
source data
[tempARK] Collection ldc000828 Treebanks
Laboratoire parole et langage - UMR 7309 (LPL, Aix-en-Provence FR)
A collection of Treebanks
(computational_linguistics)
hdl:11041/ldc000828
2013-04-15
Version 1
medium-term preservation
Primary data (corpus) ldc000826 Arabic Treebank : Part 1 v 4.1 (Mohamed Maamouri, Ann Bies, Seth Kulick, Fatma Gaddeche, Wigdan Mekki, Sondos Krouna, Basma Bouziri, Wadji Zaghouani)
Linguistic Data Consortium (LDC, Philadelphia US)
Arabic Treebank: Part 1 (ATB1) v 4.1, Linguistic Data Consortium (LDC) catalog number LDC2010T13 and ISBN 1-58563-566-9, was developed at LDC. It consists of 734 newswire stories from Agence France Presse (AFP) with part-of-speech (POS), morphology, gloss and syntactic treebank annotation in accordance with the Penn Arabic Treebank (PATB) Guidelines developed in 2008 and 2009.
(computational_linguistics)
Arabic (عربي)
hdl:11041/ldc000826
2013-04-11
Version 1
source data
• Defense Advanced Research Projects Agency, GALE Program Grant No. HR0011-06-1-0003
Collection prax000822 Cyberbase Gradignan
Praxiling - UMR 5267 (Montpellier FR)
Telem - EA 4195 (Bordeaux FR)
The "Cyberbase Gradignan" corpus was collected from July 2010 to June 2012 as part of the Cyber-base® Justice experiment carried out at the Gradignan remand prison (Maison d'Arrêt de Gradignan), which was aimed at access to information, computer training and education.
It consists of audiovisual recordings covering, on the one hand, activities in the prison's computer room and, on the other hand, interviews with the various participants.

(anthropological_linguistics, applied_linguistics)
French
hdl:11041/prax000822
2013-03-27
Version 1
source data
[tempARK] Primary data (corpus) sldr000018 Aboriginal people in Taiwan: Amis and Chinese speakers in urban areas (Francois DE SULAUZE)
Wenzao Ursuline College of Languages (WTUC, Taiwan TW)
The Amis are one of the aboriginal peoples of Taiwan. The Amis language belongs to the Austronesian languages (Formosan group) and is spoken by more than 100,000 Amis people living in Taiwan. Because the Government of Taiwan has imposed the Chinese language for more than 50 years, Amis can today be considered an endangered language. This corpus comes from fieldwork with Amis people living in an urban context; it describes their use of three languages: Amis, Chinese, and Minnan. (Minnan people make up about 70% of the population of Taiwan.) The content is 25 tapes recorded in two cities (Hualien and Taipei) from January 2002 to August 2003. It consists of interviews (in both Chinese and Amis), free conversations and Christian liturgy.
(sociolinguistics, anthropological_linguistics, language_documentation)
Amis (Pangcah)

Ms Dibus (in Chinese)

Ms Dibus (in Amis)

Mr Lin (in Chinese)

Mr Lin (in Amis)
catalogue.pdf
hdl:11041/sldr000018
2013-03-21
Version 2
long-term preservation
(Publications)

This material is Open Data
[tempARK] Tool sldr000526 Anonymise sound files (Daniel HIRST)
Laboratoire parole et langage - UMR 7309 (LPL, Aix-en-Provence FR)
PRAAT script.
Purpose: replace the portions of a long sound that are labelled with a keyword on the accompanying TextGrid with a hum that has the same prosodic characteristics as the original sound.
The original long sound can be mono or stereo; the anonymised sound will be the same.

(applied_linguistics, cognitive_science, language_documentation, speech_prosody, computational_linguistics)

>> Collection LPL tools lpl-000763
Anonymise sound files by Daniel Hirst is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
hdl:11041/sldr000526
2013-03-11
Version 5
long-term preservation

This material is Open Data
Collection sldr000805 OpenProDat
Laboratoire parole et langage - UMR 7309 (LPL, Aix-en-Provence FR)
A collection of corpora inspired by EUROM1
(computational_linguistics)
hdl:11041/sldr000805
2013-03-05
Version 1
source data
Collection sldr000804 Travaux d'étudiants Master LEX
Laboratoire parole et langage - UMR 7309 (LPL, Aix-en-Provence FR)
Département de sciences du langage, Université d'Aix-Marseille (Aix-en-Provence FR)
Work by Master LEX students at the Université d'Aix-Marseille
(computational_linguistics)
hdl:11041/sldr000804
2013-03-04
Version 1
source data
Secondary data (resource) sldr000803 ORCHID.fr (Laurent PREVOT)
Laboratoire parole et langage - UMR 7309 (LPL, Aix-en-Provence FR)
Département de sciences du langage, Université d'Aix-Marseille (Aix-en-Provence FR)
A dataset composed of: (i) Primary resources extracted from the CID corpus (1h30 of Narrative Sequences), (ii) Annotation produced in the framework of ORCHID and OTIM projects.
(general_linguistics, phonetics, phonology, speech_prosody, applied_linguistics)
French; Chinese
hdl:11041/sldr000803
2013-02-25
Version 1
source data
[tempARK] Primary data (corpus) sldr000732 MAPTASK-AIX (Ellen Bard, Corine Astésano, Cheryl Frenck-Mestre, Mariapaola D'imperio, Alice Turk, Noël Nguyen)
Laboratoire parole et langage - UMR 7309 (LPL, Aix-en-Provence FR)
A set of French dialogues elicited with the MAPTASK protocol.
Alignment at the utterance level.
The recording and the transcription were carried out within the framework of Corine Astésano's Marie-Curie Fellowship.

French (français)
hdl:11041/sldr000732
2013-02-08
Version 2
long-term preservation
(Publications)
Secondary data (resource) sldr000801 Toponymes de Valjouffrey (Julien GAILLARD, Robert RAMOS)
Individual contribution
Drawings of the mountains visible from the Valjouffrey valley, with toponyms annotated according to their former names.
(language_documentation, lexicography, sociolinguistics, anthropological_linguistics)

>> Collection Valjouffrey valjouffrey-000007
hdl:11041/sldr000801
2013-01-17
Version 1
source data

This material is Open Data
Primary data (corpus) sldr000799 Corpus de parole semi-spontanée versus lue, chez des sujets dyslexiques versus lambda (Ambre DENIS)
Département de linguistique et phonétique générales, Université d'Aix-Marseille (Aix-en-Provence FR)
This corpus contains 9 recordings of three different speakers. Each speaker performs three tasks: (1) narrating a comic strip (http://www.espacegraphique.com/blog/carton/dessins/bd-sur-la-colline-593), (2) reading a text (L’aigle, after Georgette Barthélémy, Les animaux et leurs secrets, F. Nathan, publisher), (3) narrating a video (http://www.simonscat.com/Films/Cat-Chat/).
Two of the speakers are dyslexic (H25 and F52).

(computational_linguistics)
French (français)

F19 semi-spontaneous speech (telling a cartoon)

F52 reading (dyslexic speaker)

H25 semi-spontaneous speech (telling a video)
hdl:11041/sldr000799
2013-01-16
Version 1
source data

The 8 most frequently downloaded items under SLDR licence
Primary data (corpus) Videos of CID - Laboratoire parole et langage - UMR 7309 (LPL, Aix-en-Provence FR): downloaded 58 times
Secondary data (resource) Annotations of CID - Roxane BERTRAND: downloaded 51 times
Primary data (corpus) Aix-MARSEC database - Daniel Hirst, Céline De Looze, Cyril Auran, Caroline Bouzon: downloaded 48 times
Secondary data (resource) VfrLPL - Stéphane RAUZY: downloaded 27 times
Secondary data (resource) Grammar of French language (GP) - Marie-Laure GUÉNOT: downloaded 20 times
Primary data (corpus) ANGLISH - Anne Tortel, Daniel Hirst: downloaded 16 times
Primary data (corpus) Apéro-Toulouse - Apéro-Toulouse - Laurent PREVOT: downloaded 14 times
Primary data (corpus) EUROM1_fr - Institut de la communication parlée (ICP, Grenoble FR), Laboratoire parole et langage - UMR 7309 (LPL, Aix-en-Provence FR): downloaded 11 times

The 8 most frequently downloaded open-access items (since 4/4/2013)
Tool MOMEL - Daniel HIRST: downloaded 45 times
Tool Sampa2Praat - Daniel HIRST: downloaded 41 times
Tool Bol Processor BP2 - Anthony Kozar, Bernard Bel, Srikumar Karaikudi Subramanian: downloaded 38 times
Tool Unpack PRAAT collection - Daniel HIRST: downloaded 35 times
Tool Phonedit SIGNAIX - Alain Ghio, Robert Espesser: downloaded 34 times
Tool IPA-Sampa - Daniel HIRST: downloaded 33 times
Tool Anonymise sound files - Daniel HIRST: downloaded 33 times
Secondary data (resource) SldrWiki - Laboratoire parole et langage - UMR 7309 (LPL, Aix-en-Provence FR): downloaded 31 times

This site was declared to the Commission Nationale de l’Informatique et des Libertés (CNIL) under agreement Nr. 1222972 on 26 March 2008. Under French law, any person cited by name has the right to access, modify, correct and delete data relating to him/her (art. 34 of the « Informatique et Libertés » act of 6 January 1978). To exercise this right, send a message to webmaster(at)sldr.org.

This site is optimized for Firefox or any browser with the 'tabs' option set.