College of Computing & Informatics

Duy Hoang

College of Computing & Informatics

Computer Science

Faculty Mentor: Dr. Jake Ryland Williams

Information Science

Developing an automated document editing system with collaborative editing data

Imagine if there were a tool that could not only proofread writing in terms of syntax, grammar, and style, but could also review content and update it with new information. This research aims to build an automated document editing system using a dataset constructed from Wikipedia’s collaborative article editing process. We’re constructing this dataset by tracking differences between versions of Wikipedia articles. For each page’s revision history, we compare each pair of consecutive revisions to identify the content added in the newer version and the content deleted from the older one.
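As a rough illustration of the revision comparison described above, the sketch below pairs up consecutive revisions and extracts the deleted and inserted word spans using Python's standard difflib module. The function names (consecutive_pairs, revision_edits), the word-level tokenization, and the sample revisions are assumptions made for this example; the project's actual diffing method is not specified here.

```python
import difflib


def consecutive_pairs(revisions):
    """Pair each revision with the one that follows it (oldest first)."""
    return list(zip(revisions, revisions[1:]))


def revision_edits(old_text, new_text):
    """Return (deleted, inserted) word spans between two revisions.

    Hypothetical helper: splits on whitespace and diffs at the word level.
    """
    old_words = old_text.split()
    new_words = new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged text is not an edit
        deleted = " ".join(old_words[i1:i2])   # content removed from the older revision
        inserted = " ".join(new_words[j1:j2])  # content added in the newer revision
        edits.append((deleted, inserted))
    return edits


# Made-up revision history for a single page, oldest first.
history = [
    "The city has a population of 5000 people",
    "The city has a population of 6200 people",
    "The city has a population of 6200 people as of 2019",
]

# One list of (deleted, inserted) pairs per consecutive revision pair.
dataset = [revision_edits(old, new) for old, new in consecutive_pairs(history)]
```

Each entry in dataset holds the edits for one step in the page history, which is the kind of before/after pair an editing model could be trained on.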

The intended architecture for this system is based on a neural sequence-to-sequence model, consisting of two long short-term memory networks (LSTMs): one LSTM maps the input sentence to a fixed-size vector while the second maps this vector to the output sentence. We train the models on different Wikipedia content area categories. Following a performance evaluation, we will make available pre-trained and category-specific models with code for public use.
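To make the encoder-decoder idea concrete, here is a toy sketch in plain Python of two LSTMs connected through a fixed-size state. Everything in it is an assumption for illustration: the weights are random and untrained, the vocabulary size, hidden size, and token ids are made up, and decoding is a simple greedy loop. It shows only how data flows through such a model, not the project's implementation.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def one_hot(index, size):
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


class LSTMCell:
    """A single LSTM cell with randomly initialised (untrained) weights."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                      for _ in range(hidden_size)] for g in "ifoc"}
        self.b = {g: [0.0] * hidden_size for g in "ifoc"}

    def _gate(self, name, z, activation):
        return [activation(sum(w * v for w, v in zip(row, z)) + bias)
                for row, bias in zip(self.W[name], self.b[name])]

    def step(self, x, h, c):
        z = x + h  # concatenate the input with the previous hidden state
        i = self._gate("i", z, sigmoid)
        f = self._gate("f", z, sigmoid)
        o = self._gate("o", z, sigmoid)
        cand = self._gate("c", z, math.tanh)
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, cand)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


def encode(cell, token_ids, vocab_size):
    """Read the whole input; the final (h, c) state is the fixed-size vector."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    for token in token_ids:
        h, c = cell.step(one_hot(token, vocab_size), h, c)
    return h, c


def decode(cell, W_out, state, start_id, end_id, vocab_size, max_len=10):
    """Greedily emit tokens, starting from the encoder's final state."""
    h, c = state
    token, output = start_id, []
    for _ in range(max_len):
        h, c = cell.step(one_hot(token, vocab_size), h, c)
        scores = [sum(w * v for w, v in zip(row, h)) for row in W_out]
        token = max(range(vocab_size), key=scores.__getitem__)
        if token == end_id:
            break
        output.append(token)
    return output


rng = random.Random(0)
VOCAB, HIDDEN = 6, 8   # made-up sizes
START, END = 0, 1      # made-up special token ids
encoder = LSTMCell(VOCAB, HIDDEN, rng)
decoder = LSTMCell(VOCAB, HIDDEN, rng)
W_out = [[rng.uniform(-0.1, 0.1) for _ in range(HIDDEN)] for _ in range(VOCAB)]

state = encode(encoder, [2, 3, 4], VOCAB)
predicted = decode(decoder, W_out, state, START, END, VOCAB)
```

The encoder's final state has the same size whatever the input length, which is what lets the decoder start from it. With untrained weights the predicted tokens are meaningless; training would adjust the weights so that the output matches the revised sentence.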

College of Computing & Informatics

Siddharth Srinivasan

College of Computing & Informatics

Software Engineering

Faculty Mentor: Dr. Jake Ryland Williams

Information Science

Identifying Violent Protest Activity Using Data Mining and Machine Learning

In this project, I am establishing a data collection stream and access system to employ a computational framework designed to understand collective action using machine learning. An original prototype system used a basic machine learning algorithm on a sample of a static database of more than 600 million geo-tagged Tweets from around the world. However, the need for better performance and changes to Twitter’s data (exact lat-lon tweet locations are now rarely provided) require more agile ML development and data access. Ultimately, project output will likely be of interest to social science researchers and government officials, in addition to individuals who wish to understand the nature and aggregation of political activity occurring at fine levels of time and location.
