A Computational Approach to Analyzing College Facebook Confessions

Written by Eleanor Lin
Illustrated by Aeja Rosette

Whether soliciting advice in a newspaper column or posting on social media, people often feel more comfortable discussing sensitive topics under the cloak of anonymity [1]. For college students in particular, Facebook confessions pages are a common venue for discussing controversial topics. But is there a way to move beyond qualitative observations of confessions boards toward a quantitative understanding of what topics students post about, how their audiences respond, and how these topics are influenced by campus environments and global events? These are the questions that Soubhik Barari, a computational social science researcher, set out to answer in his 2018 article “Analyzing Latent Topics in Student Confessions Communities on Facebook” [2, 3]. Drawing together a dataset of 170,000 posts from American universities’ Facebook confessions pages, statistics on those universities’ campus environments, and current events from Twitter, Barari reaches several notable conclusions about this distinctive social media ecosystem.

First, to determine which topics appeared in the dataset and to label each post’s main topic, Barari used an approach called human-in-the-loop topic modeling. Using a probabilistic method called latent Dirichlet allocation, combined with manual selection, he grouped frequently co-occurring keywords from the posts into sets, each representing a different topic [4]. Each post was then labeled with the single topic whose keyword set accounted for the most words in that post. Posts were also labeled according to whether they referred to global or campus events. Finally, each post was scored by counting how many of its words reflected a given cognitive state (e.g., “anger” or “anxiety”).

Some results, such as the higher proportion of anxiety-related words on the pages of top-ranked universities, may not surprise readers familiar with Ivy League stress culture [5]. Other findings are more striking. For instance, pages that discussed romantic and sexual topics more often discussed mental and physical health less often, and vice versa. Posts at colleges with higher average tuition had higher odds of mentioning race and ethnicity, but not of mentioning socioeconomics. Furthermore, less racially or ethnically diverse colleges did not have significantly fewer posts related to race and ethnicity. However, using only the proportions of different topics on a college’s confessions board, a machine learning algorithm was able to predict with 84% accuracy whether a school was very white (defined as having a student body consisting of