Speech & Language Data Repository (SLDR)   http://crdo.up.univ-aix.fr

Sponsored by :
• National Science Foundation (BCS-98009, KDI, SBE)
• TalkBank project
The Open ANC (OANC)
Department of Computer Science, Vassar College (New York US)
http://crdo.up.univ-aix.fr/sldr000770/en
oai:sldr.org:sldr000770 (olac - oai_dc - VLO - language-archives)
ARK: http://crdo.up.univ-aix.fr/ark:/87895/1.4-183691
ARK: http://crdo.up.univ-aix.fr/ark:/87895/1.4-183706
ARK: http://crdo.up.univ-aix.fr/ark:/87895/1.4-183705
ARK: http://crdo.up.univ-aix.fr/ark:/87895/1.4-183707
ARK: http://crdo.up.univ-aix.fr/ark:/87895/1.4-183709
ARK: http://crdo.up.univ-aix.fr/ark:/87895/1.4-183708
ARK: http://crdo.up.univ-aix.fr/ark:/87895/1.4-183710
http://crdo.up.univ-aix.fr/wiki/crdo000770

Reference :
The Open ANC (OANC) (Nancy IDE, Randi REPPEN, Keith SUDERMAN). Primary data (corpus). Department of Computer Science, Vassar College (New York US). Created 2011-05-29. Speech & Language Data Repository. Reference oai:sldr.org:sldr000770 - Archived ark:/87895/1.4-183691

This material is Open Data
 
Type of item: Primary data (corpus)
Identifier: sldr000770 (version 1/1)
Status: long-term preservation
Paid-basis distribution (LDC): The ANC has so far released 22 million words of American English, which are available from the Linguistic Data Consortium.
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003T20
Table of contents
The following corpora are included:

Spoken
- Charlotte
- Switchboard

Written
- Eggan (fiction)
- Slate
- Verbatim
- ICIC
- OUP
- 911 Report
- Biomed
- Government
- PLOS
- Berlitz

The following annotations are also included:
- Structural markup (divisions, paragraphs, etc.) down to the paragraph level.
- Sentence boundaries.
- Tokens with Hepple (Penn) part of speech annotations.
- Noun chunks
- Verb chunks
Description: The American National Corpus (ANC) project is creating a massive electronic collection of American English, including texts of all genres and transcripts of spoken data produced from 1990 onward. The ANC will provide the most comprehensive picture of American English ever created, and will serve as a resource for education, linguistic and lexicographic research, and technology development. This open portion of the American National Corpus (OANC) contains approximately 15 million words from the full corpus.
Language of the corpus: English -> English, American (American English)
OLAC discourse type: narrative
OLAC linguistic data type: primary_text
Linguistic domain(s): text_and_corpus_linguistics, discourse_analysis, language_documentation
SLDR contact: Nancy IDE
Contributors: Nancy IDE, Randi REPPEN, Keith SUDERMAN
Link to the wiki page: http://crdo.up.univ-aix.fr/wiki/crdo000770
Keywords: Information Retrieval, Parsing, Sense Disambiguation, Discourse Modeling, Language Teaching, Text Databases, Human Machine Communication
Version history: Version 1 of this archive was published on 2010-10-01
Specific extensions of text files: anc, xml
Users' community: crdo.up.univ-aix.fr/sldr000770/com
Relations
(see documentation)
isPartOf http://www.AmericanNationalCorpus.org
hasVersion http://www.anc.org/OANC/OANC_GrAF.zip
hasVersion http://www.anc.org/OANC/OANC_GrAF.tgz
hasFormat http://www.cs.vassar.edu/~ide/papers/LAF.pdf
isReferencedBy http://www.cs.vassar.edu/~ide/pubs.html
Approximate file(s) size (Mb): 3260
Data coverage (spatial) (2-char country code): US
Roles
(see documentation)
sponsor: National Science Foundation (BCS-98009, KDI, SBE)
sponsor: TalkBank project
Derogation to the principle of open access to public archives (see documentation): AR042 (25 years) - Documents developed under a contract for the provision of services performed on behalf of one or more specific persons. (Code du Patrimoine, art. L. 213-2, I, 1, b)
Deadline for next update of access rights: 2036-05-29
Compliance with current policy on public archives: 100%
Copyright: Ide, Nancy, and Suderman, Keith (2007). The Open American National Corpus (OANC). http://www.AmericanNationalCorpus.org/OANC
First deposit on: 2011-05-29
This item was last modified on: 2011-05-31
Archival resource key(s) for current version: 7 segment(s) in current version
ark:/87895/1.4-183705
ark:/87895/1.4-183706
ark:/87895/1.4-183707
ark:/87895/1.4-183708
ark:/87895/1.4-183709
ark:/87895/1.4-183710
ark:/87895/1.4-183691
Archived with CINES on: Wed, 01 Jun 2011 11:07:00 GMT
Version #1
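
The archival resource keys listed in this record follow the general ARK pattern `ark:/NAAN/Name`, where the NAAN is the number of the name-assigning authority. As an illustration only, the following minimal Python sketch extracts those two parts from a bare ARK or from a resolver URL such as the ones shown above. The names `Ark` and `parse_ark` are hypothetical helpers introduced for this example, not part of any SLDR tool, and the sketch assumes a purely numeric NAAN, as in this record.

```python
import re
from typing import NamedTuple


class Ark(NamedTuple):
    """Hypothetical container for the two parts of an ARK."""
    naan: str  # Name Assigning Authority Number, e.g. "87895"
    name: str  # identifier assigned by that authority, e.g. "1.4-183691"


# Assumption: the NAAN is numeric and the name runs up to the next
# whitespace, slash, query, or fragment delimiter.
_ARK_RE = re.compile(r"ark:/(?P<naan>\d+)/(?P<name>[^\s/?#]+)")


def parse_ark(text: str) -> Ark:
    """Find the first ARK in a bare identifier or a resolver URL."""
    match = _ARK_RE.search(text)
    if match is None:
        raise ValueError(f"no ARK found in {text!r}")
    return Ark(match["naan"], match["name"])


if __name__ == "__main__":
    ark = parse_ark("http://crdo.up.univ-aix.fr/ark:/87895/1.4-183691")
    print(ark.naan, ark.name)
```

Because the pattern is searched rather than anchored, the same function accepts both the `ark:/...` form used in this list and the full `http://crdo.up.univ-aix.fr/ark:/...` form used near the top of the record.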

This site was declared to the Commission Nationale de l’Informatique et des Libertés (CNIL) under agreement No. 1222972 on 26 March 2008. Under French law, any person cited by name has the right to access, modify, correct, and delete data relating to them (art. 34 of the « Informatique et Libertés » act of 6 January 1978). To exercise this right, send a message to webmaster(at)sldr.org.

