This page is currently in development. Below are a number of helpful resources, many of which come from Northeastern’s Lab resources and Miriam Posner’s crowdsourced “Corpora of Interest” document.
Getting Started
- Finding data sets for text analysis
- “Where to Start,” courtesy of Ted Underwood
- Stanford’s Introduction, from “Tooling Up for Digital Humanities”
- Stanford’s “Natural Language Processing with Deep Learning”
- Julia Silge and David Robinson’s “Text Mining with R: A Tidy Approach” textbook
- Brendan O’Connor, et al., “Computational Text Analysis for Social Science”
- Duke University’s Introduction to Text Analysis libguide
Python
- Folgert Karsdorp, “Python Programming for the Humanities”
- Charles Severance, “Python for Informatics,” an applied but comprehensive introductory Python text with sections on text parsing
- Download and install Python
- Download and install PyCharm, an Integrated Development Environment (IDE) for Python
- Download and install IPython, an interactive shell for Python
- The Hitchhiker’s Guide to Python
- Neal Caren’s tutorials on Python and text analysis
R (Programming Language)
- Matthew Jockers, “Text Analysis With R for Students of Literature” (PDF available for download via the NEU Library)
- Download and install R
- Download and install RStudio, an Integrated Development Environment (IDE) for R
- RSeek, a search tool for finding resources on R
- Simple data types in R
Topic Modeling
- JDH’s Special Issue on Topic Modeling (2012)
- Megan R. Brett’s “Basic Introduction” (conceptual)
- Ted Underwood, “Topic modeling made just simple enough”
- Scott Weingart’s “Guided Tour” (comprehensive, lots of links)
- Ben Schmidt’s article about the limitations of latent Dirichlet allocation (LDA), also from the JDH special issue
- MALLET, an open-source, Java-based package for latent Dirichlet allocation (LDA)
- Shawn Graham, Scott Weingart, and Ian Milligan’s tutorial for setting up a command line environment for using MALLET
- Ben Schmidt’s R package wrapping MALLET
- GUI tools that use MALLET
- Stanford Topic Modeling Toolbox (an alternative to MALLET)
Word Embedding Models
- Ben Schmidt’s blog post on vector space models
- Ben Schmidt’s R package wrapping word2vec (word2vec itself is written in C)