
College of Computing & Informatics
Duy Hoang

College of Computing & Informatics
Computer Science
Faculty Mentor: Dr. Jake Ryland Williams
Information Science
Developing an automated document editing system with collaborative editing data
Imagine if there were a tool that could not only proofread writing in terms of syntax, grammar, and style, but could also review content and update it with new information. This research aims to build an automated document editing system using a dataset constructed from Wikipedia’s collaborative article editing process. We’re constructing this dataset by tracking differences between versions of Wikipedia articles. For each page’s revision history, we compare each pair of consecutive revisions to identify the revised content in the newer version and the deleted content in the older version.
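The revision comparison described above can be illustrated with a minimal sketch. It assumes whitespace tokenization and uses Python's standard `difflib.SequenceMatcher`; the project's actual diffing method is not stated here, and the helper names `revision_diff` and `consecutive_pairs`, along with the sample revisions, are hypothetical.

```python
import difflib


def revision_diff(old_text, new_text):
    """Return (deleted, added) token spans between two revisions.

    Assumption: whitespace tokenization and difflib stand in for
    whatever diffing the real pipeline uses.
    """
    old_tokens = old_text.split()
    new_tokens = new_text.split()
    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    deleted, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "replace" contributes to both lists: old span out, new span in.
        if tag in ("delete", "replace"):
            deleted.append(" ".join(old_tokens[i1:i2]))
        if tag in ("insert", "replace"):
            added.append(" ".join(new_tokens[j1:j2]))
    return deleted, added


def consecutive_pairs(revisions):
    """Pair each revision with the one that follows it."""
    return list(zip(revisions, revisions[1:]))


# Hypothetical revision history for a single page, oldest first.
revisions = [
    "The city has a population of 5000 people .",
    "The city has a population of 6200 people .",
    "The city has a population of 6200 people as of 2019 .",
]

edits = [revision_diff(old, new) for old, new in consecutive_pairs(revisions)]
for deleted, added in edits:
    print("deleted:", deleted, "| added:", added)
```

Each resulting pair of deleted and added spans could serve as one training example, with the older sentence as input and the newer sentence as the target.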
The intended architecture for this system is based on a neural sequence-to-sequence model, consisting of two long short-term memory networks (LSTMs): one LSTM maps the input sentence to a fixed-size vector while the second maps this vector to the output sentence. We train the models on different Wikipedia content area categories. Following a performance evaluation, we will make available pre-trained and category-specific models with code for public use.
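To make the encoder-decoder idea concrete, here is a rough, untrained sketch of the forward pass in plain Python. It is not the project's implementation: the class names `LSTMCell` and `Seq2Seq`, the toy vocabulary, the hidden size, the one-hot inputs, and the greedy decoding loop are all assumptions chosen for illustration, and a real system would use a deep learning library with learned weights.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def one_hot(index, size):
    return [1.0 if k == index else 0.0 for k in range(size)]


class LSTMCell:
    """One LSTM cell with random (untrained) weights; illustrative only."""

    def __init__(self, input_size, hidden_size, rng):
        self.n = hidden_size
        rows, cols = 4 * hidden_size, input_size + hidden_size
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
        self.b = [0.0] * rows

    def step(self, x, h, c):
        z = x + h  # concatenate input with previous hidden state
        pre = [sum(w * v for w, v in zip(row, z)) + b
               for row, b in zip(self.W, self.b)]
        n = self.n
        i = [sigmoid(p) for p in pre[0:n]]            # input gate
        f = [sigmoid(p) for p in pre[n:2 * n]]        # forget gate
        g = [math.tanh(p) for p in pre[2 * n:3 * n]]  # candidate cell values
        o = [sigmoid(p) for p in pre[3 * n:4 * n]]    # output gate
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


class Seq2Seq:
    """Encoder LSTM -> fixed-size state -> decoder LSTM (hypothetical sketch)."""

    def __init__(self, vocab, hidden_size=8, seed=0):
        rng = random.Random(seed)
        self.vocab = vocab
        self.index = {tok: k for k, tok in enumerate(vocab)}
        self.hidden_size = hidden_size
        self.encoder = LSTMCell(len(vocab), hidden_size, rng)
        self.decoder = LSTMCell(len(vocab), hidden_size, rng)
        # Projects the decoder hidden state to one score per vocabulary token.
        self.W_out = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
                      for _ in vocab]

    def encode(self, tokens):
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for tok in tokens:
            h, c = self.encoder.step(one_hot(self.index[tok], len(self.vocab)), h, c)
        return h, c  # same size regardless of input length

    def decode(self, state, max_len=10):
        h, c = state
        tok, output = "<s>", []
        for _ in range(max_len):
            h, c = self.decoder.step(one_hot(self.index[tok], len(self.vocab)), h, c)
            scores = [sum(w * v for w, v in zip(row, h)) for row in self.W_out]
            tok = self.vocab[scores.index(max(scores))]  # greedy choice
            if tok == "</s>":
                break
            output.append(tok)
        return output


vocab = ["<s>", "</s>", "the", "city", "has", "5000", "6200", "people"]
model = Seq2Seq(vocab)
state = model.encode(["the", "city", "has", "5000", "people"])
output = model.decode(state)
```

Because the weights are random, `output` is meaningless until the model is trained. The sketch only shows how the encoder's final hidden and cell states act as the fixed-size vector that the decoder starts from.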
College of Computing & Informatics
Siddharth Srinivasan
College of Computing & Informatics
Software Engineering
Faculty Mentor: Dr. Jake Ryland Williams
Information Science
Identifying Violent Protest Activity Using Data Mining and Machine Learning
In this project, I am establishing a data collection stream and access system to employ a computational framework designed to understand collective action using machine learning. An original prototype system used a basic machine learning algorithm on a sample of a static database of more than 600 million geo-tagged Tweets from around the world. However, the need for better performance and changes to Twitter’s data (exact lat-lon tweet locations are now rarely provided) require more agile ML development and data access. Ultimately, project output will likely be of interest to social science researchers and government officials, in addition to individuals who wish to understand the nature and aggregation of political activity occurring at fine levels of time and location.