Counting Words in Social Science


Econometrics Seminar
University of Pennsylvania

3718 Locust Walk
103 McNeil

Philadelphia, PA

United States

Social scientists are embracing the idea of using "text as data" as a way to quantify, measure, and discover social concepts. I’ll discuss a brief history of how this strategy has worked and evolved, and present the massive multinomial regression models that serve as a basis for text analysis. Illustrated with a series of applications (tweets about politicians, reviews on, congressional speech), we'll cover the how and why of this approach. The "how" touches on distributed computing and regularized estimation techniques. The "why" considers questions of prediction, treatment effect estimation, and inference about the content of the text itself. Although all of these goals are based on the same model, we'll see that each involves a different set of assumptions and challenges.



Matt Taddy

University of Chicago, Booth School of Business