Transcriptions of 400,000 handwritten names
License: CC0: Public Domain
Tags: music, image data, text data, nlp, deep learning, and 1 more
Overview
This dataset consists of more than four hundred thousand handwritten names collected through charity projects.
Character recognition uses image processing to convert characters in scanned documents into digital form. It typically performs well on machine-printed fonts. Handwritten characters, however, remain difficult for machines to recognize because individual writing styles vary so widely.
There are 206,799 first names and 207,024 surnames in total. The data is divided into a training set (331,059), a testing set (41,382), and a validation set (41,382).
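As a quick sanity check on the figures above, the following sketch confirms that the two name counts and the three split sizes add up to the same total, and prints the share of each split. The variable names are illustrative only and are not part of the dataset.

```python
# Counts copied from the overview text; variable names are illustrative.
first_names = 206_799
surnames = 207_024
splits = {"training": 331_059, "testing": 41_382, "validation": 41_382}

total = first_names + surnames

# The split sizes should account for every name exactly once.
assert total == sum(splits.values())

# Share of the whole dataset held by each split.
fractions = {name: count / total for name, count in splits.items()}
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%}")
```

The counts are consistent with each other and correspond to roughly an 80/10/10 split.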
Content
The input data are hundreds of thousands of images of handwritten names. In the Data section, you'll find the transcribed images split into test, training, and validation sets.
Image labels follow the naming format below, which lets you extend the dataset with your own data.
Image | URL | ||||
---|---|---|---|---|---|
D2M | 15 | 0010079F | 0002 | 1 | first name.jpg |
D2M | 15 | 0010079F | 0002 | 1 | surname.jpg |
D2M | 15 | 0010079F | 0003 | 2 | surname.jpg |
D2M | 15 | 0010079F | 0004 | 3 | first name.jpg |
D2M | 15 | 0010079F | 0004 | 3 | surname.jpg |
D2M | 15 | 0010079F | 0005 | 4 | first name.jpg |
D2M | 15 | 0010079F | 0006 | 5 | first name.jpg |
D2M | 15 | 0010079F | 0006 | 5 | surname.jpg |
D2M | 15 | 0010079F | 0007 | 6 | first name.jpg |
Inspiration
The inspiration for this dataset is to explore the task of classifying handwritten text and converting it into digital format using a variety of approaches.