English Word Frequency

⅓ Million Most Frequent English Words on the Web

License: Other (specified in description)

Tags: internet, linguistics, languages

Context:

How frequently a word occurs in a language is an important piece of information for natural language processing and for linguists. In natural language processing, very frequent words tend to be less informative than less frequent ones and are often removed during preprocessing. Human language users are also sensitive to word frequency: how often a word is used affects how people process it. For example, very frequent words are read and understood more quickly and are easier to understand in background noise.
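As a rough illustration of the preprocessing step mentioned above, the following Python sketch drops the most frequent words from a token list. The counts are toy values invented for the example, and the function name `remove_frequent_words` and the `top_n` cutoff are hypothetical choices, not part of this dataset or of any particular library.

```python
# Toy counts, invented for illustration only; not taken from the dataset.
toy_counts = {"the": 1000, "of": 600, "and": 500, "corpus": 12, "linguist": 3}


def remove_frequent_words(tokens, counts, top_n):
    """Drop tokens that are among the top_n most frequent words in counts."""
    # Rank words from most to least frequent and keep the top_n as a set.
    ranked = sorted(counts, key=counts.get, reverse=True)
    frequent = set(ranked[:top_n])
    # Compare case-insensitively; words absent from counts are kept.
    return [t for t in tokens if t.lower() not in frequent]


tokens = "the linguist studied the corpus of words".split()
filtered = remove_frequent_words(tokens, toy_counts, top_n=3)
print(filtered)
```

In practice the cutoff is a judgment call: too small and uninformative words remain, too large and useful content words are lost.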

Content:

This dataset contains the counts of the 333,333 most commonly used single words on the English-language web, as derived from the Google Web Trillion Word Corpus.
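A minimal sketch of how such word counts might be turned into relative frequencies is shown below. It assumes a two-column CSV layout with a `word,count` header, which is an assumption about the file format rather than something stated here, and it uses a small invented sample instead of the real data.

```python
import csv
import io

# Hypothetical sample in an assumed "word,count" layout; values are invented.
sample = "word,count\nthe,1000\nof,600\nand,400\n"


def load_counts(handle):
    """Read word counts from a file-like object with a word,count header."""
    reader = csv.DictReader(handle)
    return {row["word"]: int(row["count"]) for row in reader}


counts = load_counts(io.StringIO(sample))

# Relative frequency here is each count divided by the total of the listed
# words only, not by the size of the full underlying corpus.
total = sum(counts.values())
relative = {word: count / total for word, count in counts.items()}
print(relative)
```

To use a real file, the `io.StringIO(sample)` argument could be replaced with an opened file handle, provided the file actually follows the assumed layout.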

Acknowledgements:

Data files were derived from the Google Web Trillion Word Corpus (as described by Thorsten Brants and Alex Franz, and distributed by the Linguistic Data Consortium) by Peter Norvig. You can find more information on these files and the code used to generate them here.

The code used to generate this dataset is distributed under the MIT License.

Inspiration:
