Counting Words in Social Science
Social scientists are embracing 'text as data' as a way to quantify, measure, and discover social concepts. I'll give a brief history of how this strategy has worked and evolved, and present the massive multinomial regression models that serve as a basis for text analysis. Using a series of applications (tweets about politicians, reviews on yelp.com, congressional speech), we'll cover the how and why of this approach. The "how" touches on distributed computing and regularized estimation techniques. The "why" considers questions of prediction, treatment effect estimation, and inference about the content of text itself. Although all three goals rest on the same model, we'll see that each involves a different set of assumptions and challenges.