1.3 million labelled comments from Reddit
License: Data files © Original Authors
Tags: arts and entertainment, internet, online communities, social science
Context: This dataset contains […]

Can you identify duplicate questions?
License: Other (specified in description)
Tags: software, linguistics, artificial intelligence, languages
Context: Quora’s first public dataset is related to the […]

A complete history of major league baseball stats from 1871 to 2015
License: CC BY-SA 3.0
Tags: baseball, history
Baffled why your team […]

⅓ Million Most Frequent English Words on the Web
License: Other (specified in description)
Tags: internet, linguistics, languages
Context: How frequently a word occurs […]

Questions from 2016-2020 classified into three categories based on their quality
License: Data files © Original Authors
Tags: text data, nlp, text mining
This […]

A collection of book ratings
License: CC0: Public Domain
Tags: arts and entertainment, online communities, literature
Content: Contains 278,858 users (anonymized but with demographic […]