Get personal with a dataset of comments from May 2015
License: Reddit API Terms
Tags: internet, online communities, linguistics
Recently Reddit released an enormous dataset containing all ~1.7 billion of their publicly available comments. The full dataset is an unwieldy 1+ terabyte uncompressed, so we’ve decided to host a small portion of the comments here for Kagglers to explore. (You don’t even need to leave your browser!)
You can find all the comments from May 2015 on scripts for your natural language processing pleasure. What had redditors laughing, bickering, and NSFW-ing this spring?
Who knows? Top visualizations may just end up on Reddit.
Data Description
The database has one table, May2015, with the following fields:
- created_utc
- ups
- subreddit_id
- link_id
- name
- score_hidden
- author_flair_css_class
- author_flair_text
- subreddit
- id
- removal_reason
- gilded
- downs
- archived
- author
- score
- retrieved_on
- body
- distinguished
- edited
- controversiality
- parent_id
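
As a rough starting point, the sketch below shows one way to query the May2015 table from Python. It assumes the database is a SQLite file, which this page does not state. To stay self-contained it builds a tiny in-memory stand-in with only a handful of the fields listed above; the rows and the helper name top_subreddits are made up for illustration.

```python
import sqlite3


def top_subreddits(conn, limit=5):
    """Return (subreddit, comment_count, avg_score) rows, busiest first."""
    query = """
        SELECT subreddit, COUNT(*) AS n_comments, AVG(score) AS avg_score
        FROM May2015
        GROUP BY subreddit
        ORDER BY n_comments DESC, subreddit ASC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


# Stand-in database (assumption: SQLite). Only a subset of the listed
# fields is created here, and every row below is invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE May2015 "
    "(id TEXT, subreddit TEXT, author TEXT, score INTEGER, body TEXT)"
)
conn.executemany(
    "INSERT INTO May2015 VALUES (?, ?, ?, ?, ?)",
    [
        ("c1", "AskReddit", "user_a", 10, "made-up comment"),
        ("c2", "AskReddit", "user_b", 2, "another made-up comment"),
        ("c3", "funny", "user_a", 5, "placeholder text"),
    ],
)
conn.commit()

rows = top_subreddits(conn)
for subreddit, n_comments, avg_score in rows:
    print(subreddit, n_comments, avg_score)
```

To run this against the real data, replace the in-memory connection with a connection to the actual database file, whose name and location are not given here. Aggregating in SQL rather than loading every comment into memory keeps the query practical on a table this large.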