Counting Words in Social Science

Monday, March 24, 2014, 4:30pm - 6:00pm

Econometrics Seminar

University of Pennsylvania

3718 Locust Walk
103 McNeil

Philadelphia, PA
19104

United States

Social scientists are embracing the idea of using 'text as data' as a way to quantify, measure, and discover social concepts. I’ll give a brief history of how this strategy has worked and evolved, and present the massive multinomial regression models that serve as a basis for text analysis. Using a series of applications (tweets about politicians, reviews on yelp.com, congressional speech), we'll cover the how and why of this approach. The "how" touches on distributed computing and regularized estimation techniques. The "why" considers questions of prediction, treatment effect estimation, and inference about the content of the text itself. Although all three goals rest on the same model, we'll see that each involves a different set of assumptions and challenges.

Matt Taddy

University of Chicago, Booth School of Business