| Type of item | Primary data (corpus) |
| Identifier | sldr000770 (version 1/1) |
| Status | long-term preservation |
| Paid-basis distribution (LDC) | |
|---|
Table of contents | The following corpora are included:
Spoken: Charlotte, Switchboard
Written: Eggan (fiction), Slate, Verbatim, ICIC, OUP, 911 Report, Biomed, Government, PLOS, Berlitz
The following annotations are also included: structural markup (divisions, paragraphs, etc.) down to the paragraph level; sentence boundaries; tokens with Hepple (Penn) part-of-speech annotations; noun chunks; verb chunks. |
|---|
| Description | The American National Corpus (ANC) project is creating a massive electronic collection of American English, including texts of all genres and transcripts of spoken data produced from 1990 onward. The ANC will provide the most comprehensive picture of American English ever created, and will serve as a resource for education, linguistic and lexicographic research, and technology development. This open portion of the American National Corpus (OANC) contains approximately 15 million words from the full corpus. |
|---|
| Language of the corpus | English -> English, American (American English) |
|---|
| OLAC discourse type | narrative |
|---|
| OLAC linguistic data type | primary_text |
|---|
| Linguistic domain(s) | text_and_corpus_linguistics discourse_analysis language_documentation |
| SLDR contact |
|
| Contributors | Nancy IDE Randi REPPEN Keith SUDERMAN |
|---|
| Link to the wiki page | http://crdo.up.univ-aix.fr/wiki/crdo000770 |
|---|
| Keywords | Information Retrieval, Parsing, Sense Disambiguation, Discourse Modeling, Language Teaching, Text Databases, Human Machine Communication |
|---|
| Version history | Version 1 of this archive was published on 2010-10-01 |
|---|
| Specific extensions of text files | anc, xml |
|---|
| Users' community | crdo.up.univ-aix.fr/sldr000770/com |
|---|
Relations (see documentation) | isPartOf http://www.AmericanNationalCorpus.org hasVersion http://www.anc.org/OANC/OANC_GrAF.zip hasVersion http://www.anc.org/OANC/OANC_GrAF.tgz hasFormat http://www.cs.vassar.edu/~ide/papers/LAF.pdf isReferencedBy http://www.cs.vassar.edu/~ide/pubs.html
|
|---|
| Approximate file(s) size (Mb) | 3260 |
| Data coverage (spatial) (2-char country code) | US |
|---|
Roles (see documentation) | sponsor: National Science Foundation (BCS-98009, KDI, SBE) sponsor: TalkBank project
|
|---|
| Derogation to the principle of open access to public archives (see documentation) | AR042 (25 years) - Documents developed under a contract for the provision of services performed on behalf of one or more specific persons. (Code du Patrimoine, art. L. 213-2, I, 1, b) |
|---|
| Deadline for next update of access rights | 2036-05-29 |
|---|
| Compliance with current policy on public archives | 100% |
|---|
| Copyright | Ide, Nancy, and Suderman, Keith (2007). The Open American National Corpus (OANC). http://www.AmericanNationalCorpus.org/OANC |
|---|
| First deposit on | 2011-05-29 |
|---|
| This item was last modified on | 2011-05-31
|
|---|
| Archival resource key(s) for current version | 7 segment(s) in current version ark:/87895/1.4-183705 ark:/87895/1.4-183706 ark:/87895/1.4-183707 ark:/87895/1.4-183708 ark:/87895/1.4-183709 ark:/87895/1.4-183710 ark:/87895/1.4-183691
|
|---|
| Archived with CINES on | Wed, 01 Jun 2011 11:07:00 GMT Version #1 |