Speech Accent Archive

by user1
03 March, 2022

Parallel English speech samples from 177 countries

Context:

Everyone who speaks a language speaks it with an accent. A particular accent essentially reflects a person’s linguistic background. When people listen to someone who speaks with an accent different from their own, they notice the difference, and they may even make biased social judgments about the speaker.

The speech accent archive was established to present, in a uniform way, a large set of speech accents from a variety of language backgrounds. Native and non-native speakers of English all read the same English paragraph and are carefully recorded. The archive is intended as both a teaching tool and a research tool. It is meant to be used by linguists as well as by anyone who simply wishes to listen to and compare the accents of different English speakers.

This dataset allows you to compare the demographic and linguistic backgrounds of the speakers in order to determine which variables are key predictors of each accent. The speech accent archive demonstrates that accents are systematic rather than merely mistaken speech.

All of the linguistic analyses of the accents are available for public scrutiny. We welcome comments on the accuracy of our transcriptions and analyses.

Content:

This dataset contains 2140 speech samples, each from a different talker reading the same passage. Talkers come from 177 countries and have 214 different native languages. Every talker speaks in English.

This dataset contains the following files:

reading-passage.txt: the text all speakers read
speakers_all.csv: demographic information on every speaker
recording: a zipped folder containing .mp3 files with speech
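
As a starting point for exploring the demographic file, the following minimal Python sketch reads a CSV of speaker metadata and counts how many speakers share each value in a chosen column. The helper names (read_speakers, count_by, summarize_file) are made up for this illustration, and the column name "native_language" used in the note below is an assumption about the header of speakers_all.csv, so check the actual header row before relying on it.

```python
import csv
from collections import Counter
from typing import Iterable, Mapping, TextIO


def read_speakers(handle: TextIO) -> list[dict[str, str]]:
    """Parse speaker metadata from an open CSV file into a list of row dicts."""
    return list(csv.DictReader(handle))


def count_by(rows: Iterable[Mapping[str, str]], column: str) -> Counter:
    """Count rows per distinct value of `column`, skipping blank or missing cells."""
    counts: Counter = Counter()
    for row in rows:
        value = (row.get(column) or "").strip()
        if value:
            counts[value] += 1
    return counts


def summarize_file(path: str, column: str, top: int = 10) -> list[tuple[str, int]]:
    """Return the `top` most frequent values of `column` in the CSV at `path`.

    Both `path` and `column` are supplied by the caller; nothing about the
    real file layout is assumed here.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return count_by(read_speakers(handle), column).most_common(top)
```

With the dataset unpacked in the working directory, a call such as summarize_file("speakers_all.csv", "native_language") would list the most common native languages, provided a column with that name exists.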

Acknowledgements:

This dataset was collected by many individuals (full list here) under the supervision of Steven H. Weinberger. The most up-to-date version of the archive is hosted by George Mason University. If you use this dataset in your work, please include the following citation:

Weinberger, S. (2013). Speech accent archive. George Mason University.

This dataset is distributed under a CC BY-NC-SA 2.0 license.

Inspiration:

The following types of people may find this dataset interesting:

ESL teachers who instruct non-native speakers of English
Actors who need to learn an accent
Engineers who train speech recognition machines
Linguists who do research on foreign accents
Phoneticians who teach phonetic transcription
Speech pathologists
Anyone who finds foreign accents interesting

Size: 885791 KB Price: Free Author: Rachael Tatman Data source: kaggle.com