May 2015 Reddit Comments

user1

3 years ago

Get personal with a dataset of comments from May 2015

Tags: internet, online communities, linguistics

Recently, Reddit released an enormous dataset containing all ~1.7 billion of its publicly available comments. The full dataset is an unwieldy 1+ terabytes uncompressed, so we’ve decided to host a small portion of the comments here for Kagglers to explore. (You don’t even need to leave your browser!)

You can find all the comments from May 2015 in Kaggle Scripts, ready for your natural language processing pleasure. What had redditors laughing, bickering, and NSFW-ing this spring?

Who knows? Top visualizations may just end up on Reddit.

Data Description

The database has one table, May2015, with the following fields:

created_utc
ups
subreddit_id
link_id
name
score_hidden
author_flair_css_class
author_flair_text
subreddit
id
removal_reason
gilded
downs
archived
author
score
retrieved_on
body
distinguished
edited
controversiality
parent_id
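As a hedged sketch of how the May2015 table might be queried, the Python snippet below uses the standard-library sqlite3 module. It assumes the database is SQLite (the description above does not name the engine) and, to stay self-contained, builds a tiny in-memory stand-in table with only five of the listed columns and three made-up rows. Against the real data you would connect to the downloaded file instead. It also assumes created_utc holds Unix epoch seconds.

```python
import sqlite3
from datetime import datetime, timezone


def top_subreddits(conn, limit=3):
    """Return (subreddit, comment_count) pairs, most active subreddit first."""
    cur = conn.execute(
        "SELECT subreddit, COUNT(*) AS n FROM May2015 "
        "GROUP BY subreddit ORDER BY n DESC, subreddit ASC LIMIT ?",
        (limit,),
    )
    return cur.fetchall()


def utc_to_datetime(created_utc):
    """Convert created_utc (assumed Unix epoch seconds, int or str) to a UTC datetime."""
    return datetime.fromtimestamp(int(created_utc), tz=timezone.utc)


# Hypothetical stand-in: an in-memory table with a subset of the May2015 columns
# and made-up rows, used only so the example runs without the real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE May2015 ("
    "created_utc INTEGER, subreddit TEXT, author TEXT, score INTEGER, body TEXT)"
)
rows = [
    (1430438400, "AskReddit", "alice", 5, "first!"),
    (1430438460, "AskReddit", "bob", 2, "hello"),
    (1430438520, "funny", "carol", 10, "lol"),
]
conn.executemany("INSERT INTO May2015 VALUES (?, ?, ?, ?, ?)", rows)

print(top_subreddits(conn))                   # [('AskReddit', 2), ('funny', 1)]
print(utc_to_datetime(rows[0][0]).isoformat())  # 2015-05-01T00:00:00+00:00
```

Doing the grouping in SQL rather than in Python keeps memory use low, which matters once the query runs over a full month of comments instead of three sample rows.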