Counting Words in Social Science

Econometrics Seminar
University of Pennsylvania

3718 Locust Walk
103 McNeil

Philadelphia, PA
19104

United States

Social scientists are embracing 'text as data' as a way to quantify, measure, and discover social concepts. I’ll give a brief history of how this strategy has worked and evolved, and present the massive multinomial regression models that serve as a basis for text analysis. Through a series of applications (tweets about politicians, reviews on yelp.com, congressional speech), we'll cover the how and why of this approach. The "how" touches on distributed computing and regularized estimation techniques. The "why" considers questions of prediction, treatment effect estimation, and inference about the content of the text itself. Although all three goals rest on the same model, we'll see that each involves a different set of assumptions and challenges.

Matt Taddy

University of Chicago, Booth School of Business