Linguistics Links

Language(s), linguistic analysis

Ethnologue (SIL International)

Glottolog (Max Planck Institute for Evolutionary Anthropology)

WALS Online (World Atlas of Language Structures)

Atlas of Pidgin and Creole Language Structures (APiCS)

The Rosetta Project (Stanford University Libraries)


MYLANGUAGES (exercises and information on 90+ languages)

The Leipzig Glossing Rules: Conventions for interlinear morpheme-by-morpheme glosses 

Glossary of linguistic terms (SIL International)


Linguistic Society of America

Doulos SIL font (Unicode font that includes IPA symbols, available for Mac, Windows, and Linux)

Unified style sheet for linguistics

LSA style sheet

Language Resources Area (LINGUIST List)

Endangered Languages Database (World Oral Literature Project)

The World Phonotactics Database (Australian National University, Canberra)

Discourse studies, functional linguistics

Discourse & Society and Resources for discourse studies (Teun van Dijk)

What is meant by discourse analysis? (Stef Slembrouck)

Transcription in Action: Resources for the Representation of Linguistic Interaction (John Du Bois, Mary Bucholtz, UCSB)

Reporter's Recording Guide: A State-by-State Guide to Taping Phone Calls and In-Person Conversations in the 50 States and D.C.

Interactional Linguistics Bibliography (Sandra Thompson)

Scott DeLancey's Functional Syntax lectures (LSA Summer Institute 2001)

Word Frequency Lists (free list contains lemma and part of speech for top 5,000 words in American English; Mark Davies & Dee Gardner, BYU)

Sociolinguistics (English dialects, gender and language, language policy)

(software for creating online surveys; basic subscription is free)

IDEA International Dialects of English Archive (free, online archive of primary source dialect and accent recordings; Paul Meier)

The Speech Accent Archive (George Mason University)

Varieties of English (University of Arizona, Language Samples Project)

Dictionary of American Regional English

Language policy website & emporium (James Crawford; language policy issues in the U.S.)

Studies on LGBTQ Language: A Partial Bibliography (Gregory Ward)

Some publicly available corpora and electronic texts - English

TalkBank (various: conversation [Santa Barbara Corpus of Spoken American English, SwitchBoard], child language [CHILDES], meetings, tutorials; read Ground Rules for database sharing)

Corpus of Contemporary American English (COCA) (400+ million words, 1990-2010, spoken and written English; Mark Davies, BYU)

Corpus of Historical American English (COHA) (400+ million words, 1810-2009; Mark Davies, BYU)

Google Books: American English (155 billion words in over 1.3 million books from the 1810s-2000s; Mark Davies, BYU) 

Santa Barbara Corpus of Spoken American English (use restricted to ODU faculty, staff, and students). Audio files can be downloaded here

International Corpus of English (ICE) (spoken and written data from worldwide varieties of English; most corpora are free to download after filling out a license agreement)

British National Corpus

    Mark Davies search interface for the British National Corpus

Michigan Corpus Linguistics:

    MICASE (Michigan Corpus of Academic Spoken English)

    MICUSP (Michigan Corpus of Upper-Level Student Papers)

VOICE 1.0 Online: The Vienna-Oxford International Corpus of English (English as a lingua franca; Barbara Seidlhofer et al.; free registration)

CallFriend American English Speech Corpus Southern Dialect (use restricted to ODU faculty, staff, and students)

Linguistic Data Consortium (if not a member, establishing a guest account permits access to several online corpora: Brown Corpus, Switchboard)

CHAINS: Characterizing Individual Speakers (speaker identification corpus; University College Dublin)

Travel agent dialogue (transcripts; SRI International)

Dialogue Diversity Corpus (various; William Mann)

Business Letter Corpus

American Rhetoric/Online Speech Bank

Oral Histories in the Perry Library (transcripts and audio files of interviews with former and present administrators, faculty, and staff of Old Dominion University)

Presidential tapes

White House Tapes Collections (Nixon Presidential Library & Museum; audio files; includes Watergate-related tapes)

American Slave Narratives (sample narratives, sample audio clips)

American Life Histories, Manuscripts from the Federal Writers' Project, 1936-1940 (Manuscript Division, Library of Congress)

Oral Histories of the American South (Southern Oral History Program; transcripts and audio)

Louie B. Nunn Center for Oral History corpus (University of Kentucky library; transcripts and audio)

StoryCorps (oral history collection; audio files for hundreds of narratives available on website)

Immigrant Archive Project (video narratives)

Oral histories from witnesses to 9/11 WTC attacks

The Oxford Text Archive ("holds several thousand electronic texts and linguistic corpora, in a variety of languages") (various: fiction, nonfiction, presidential inaugural addresses, Sapir's Language)

Scottish Corpus of Texts & Speech (University of Glasgow)

Some publicly available corpora - Multilingual, non-English languages

Backbone (European Commission; video, audio, transcripts of interviews in English, French, German, Polish, Spanish, Turkish and English as a lingua franca)

An Crúbadán (downloadable text corpora for a large number of under-resourced languages; Kevin Scannell, Saint Louis University)

National Center for Sign Language and Gesture Resources (American Sign Language, video and transcripts of sentences; stories in preparation, Boston University)

Linguistic Data Consortium (if not a member, establishing a guest account permits access to several online corpora: Arabic and Chinese news texts)

SEALang Library (Thai, Burmese, Khmer, Lao, Shan, Karen, Mon and Vietnamese corpora)

Corpus del Español (Spanish, Mark Davies)

Real Academia Española (Spanish): El Corpus de Referencia del Español Actual (CREA); El Corpus Diacrónico del Español (CORDE)

Corpus do Português (Portuguese, Mark Davies, Michael Ferreira)

Português Falado - Variedades Geográficas e Sociais (1995-1997 - Programa LÍNGUA/SÓCRATES, Comissão Europeia DGXXII)

CORIS/CODIS (a corpus of written Italian, R. Rossini Favretti)

Czech National Corpus (Institute of the Czech National Corpus, Charles University)

The Lancaster Corpus of Mandarin Chinese (Lancaster University)

Language Engineering Resources for the Indigenous Minority Languages of the British Isles and Ireland (Scottish Gaelic, Welsh, Lancaster University)

Kotonoha: Corpus of Modern Japanese (National Institute for Japanese Language and Linguistics [NINJAL], Tokyo)

Hebrew Corpus of Arutz 7 Newswires (Israel National News)

Korean National Corpus (21st Century Sejong Project)

IPI PAN Corpus-Polish (Linguistic Engineering Group at the Institute of Computer Science, Polish Academy of Sciences)

KorpusDK (written Danish) 

FidaPLUS Korpus slovenskega jezika (Slovenian)

Global Recordings Network (website contains audio files of Bible stories and readings in many understudied languages)

Lists of corpora and electronic texts

Bookmarks for corpus-based linguists (collection of links relevant to corpus linguistics including lists of free and fee-based corpora, David Lee)

LINGUIST LIST list of corpora

Free Concordancers

CasualConc (concordancer for Mac; Yasu Imao)

AntConc (concordancer for Windows, Mac, and Linux; Laurence Anthony)

Simple Concordance Program (concordance and word listing program, reads texts in many languages, available for Windows and Mac; Alan Reed)

Web Concordancer (various: Brown, LOB, student writing, Starr Report; Virtual Language Centre)