May 2015 Reddit Comments

  • by user1
  • 28 February, 2022

Get personal with a dataset of comments from May 2015

License: Reddit API Terms

Tags: internet, online communities, linguistics

Reddit recently released an enormous dataset containing all ~1.7 billion of its publicly available comments. The full dataset is an unwieldy 1+ terabytes uncompressed, so we’ve decided to host a small portion of the comments here for Kagglers to explore. (You don’t even need to leave your browser!)

You can find all the comments from May 2015 on Kaggle Scripts for your natural language processing pleasure. What had redditors laughing, bickering, and NSFW-ing this spring?

Who knows? Top visualizations may just end up on Reddit.

Data Description

The database has one table, May2015, with the following fields:

  • created_utc
  • ups
  • subreddit_id
  • link_id
  • name
  • score_hidden
  • author_flair_css_class
  • author_flair_text
  • subreddit
  • id
  • removal_reason
  • gilded
  • downs
  • archived
  • author
  • score
  • retrieved_on
  • body
  • distinguished
  • edited
  • controversiality
  • parent_id
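
As a rough illustration of how the May2015 table might be explored, here is a minimal sketch that counts comments per subreddit. It assumes the database is a SQLite file, which the description above does not state, and the helper name top_subreddits is hypothetical. Only the table name May2015 and the subreddit field come from the list above.

```python
import sqlite3


def top_subreddits(conn, limit=10):
    """Return (subreddit, comment_count) rows, most active subreddits first.

    Assumes `conn` is an open sqlite3 connection to a database that
    contains the May2015 table with a `subreddit` column.
    """
    query = """
        SELECT subreddit, COUNT(*) AS n_comments
        FROM May2015
        GROUP BY subreddit
        ORDER BY n_comments DESC, subreddit ASC
        LIMIT ?
    """
    # The limit is passed as a bound parameter rather than formatted
    # into the SQL string.
    return conn.execute(query, (limit,)).fetchall()
```

With a real copy of the data, something like conn = sqlite3.connect("database.sqlite") followed by top_subreddits(conn, 5) would list the five busiest subreddits. The file name here is an assumption, not something the description specifies.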

  • Size: 20926840 KB
  • Price: Free
  • Author: Kaggle
  • Data source: kaggle.com