Computer Science - Computing & Information Technology


Tanya Berger-Wolf, Computer Science, UIC; Daniel Rubenstein, Ecology and Evolutionary Biology, Princeton; Jared Saia, Computer Science, U New Mexico

Problem Statement and Motivation

Recent breakthroughs in data collection technology, such as GPS and other mobile sensors, are giving biologists access to data about social interactions of wild populations on a scale never seen before. Such data offer the promise of answering some of the big questions in population biology. Unfortunately, in this domain, our ability to analyze data lags substantially behind our ability to collect it. In particular, current methods for analysis of social interactions are mostly static. Our goal is to design a computational framework for analysis of dynamic social networks and validate it by applying it to equid populations (zebras, horses, onagers).

Technical Approach

• Collect explicitly dynamic social data: sensor collars on animals, synthetic population simulations, cellphone and email communications, …
• Represent a time series of observation snapshots as a series of networks.
• Use machine learning, data mining, and algorithm design techniques to identify critical individuals, communities, and patterns in dynamic networks.
• Validate theoretical predictions derived from the abstract graph representation by simulations on collected data and by controlled and quasi-experiments on real populations.
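The snapshot-to-network representation described in the approach can be sketched as follows. This is a minimal illustration, assuming observations arrive as (time step, individual, individual) triples; the function names and the persistence measure are hypothetical and are not taken from the framework itself.

```python
from collections import defaultdict

def snapshots_to_networks(observations):
    """Group (time, a, b) observations into one undirected edge set per time step."""
    networks = defaultdict(set)
    for t, a, b in observations:
        networks[t].add(frozenset((a, b)))
    return [networks[t] for t in sorted(networks)]

def edge_persistence(networks):
    """Fraction of snapshots in which each interaction (edge) is observed."""
    counts = defaultdict(int)
    for edges in networks:
        for edge in edges:
            counts[edge] += 1
    return {edge: n / len(networks) for edge, n in counts.items()}

# Hypothetical observations of three zebras over three time steps.
observations = [
    (0, "z1", "z2"), (0, "z2", "z3"),
    (1, "z1", "z2"),
    (2, "z2", "z1"),
]
networks = snapshots_to_networks(observations)
persistence = edge_persistence(networks)
```

Keeping one edge set per time step, rather than a single aggregated graph, is what allows questions about persistence or periodicity to be asked at all.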

Key Achievements and Future Goals

Done:
• Formal computational framework for analysis of dynamic social networks
• Scalable methods for
  • identifying dynamic communities
  • identifying periodic patterns
  • predicting part of network structure
  • identifying individuals critical for initiating and blocking spreading processes

Future:
• Validate methods on biological data
• Extend methods from networks of unique individuals to classes of individuals


Tanya Berger-Wolf and Bhaskar DasGupta, Computer Science, UIC; Mary Ashley, Biology, UIC; Wanpracha Chaovalitwongse, Industrial Engineering, Rutgers

Problem Statement and Motivation

[Figure: microsatellite alleles and genotypes. Allele #1: CACACACA; allele #2: CACACACACACA; allele #3: CACACACACACACA. Genotypes: 1/1, 2/2, 3/3, 1/2, 1/3, 2/3.]

Falcons and other birds of prey are extremely secretive about their lives. Sharks are hard to catch in the open ocean. Cowbirds leave eggs in other birds’ nests and let them raise the cowbird chicks. One of the things common to all these species is that it is difficult to study their mating system. It is even difficult to identify which animals are siblings. Yet, this simple fact is necessary for conservation, animal management, and understanding of evolutionary mechanisms.

New technologies for collecting genotypic data from natural populations open the possibilities of investigating many fundamental biological phenomena. Yet full utilization of the genotypic data is only possible if statistical and computational approaches keep pace with our ability to sample organisms and obtain their genotypes.

Our goal is to develop robust computational methods for reconstructing kinship relationships from microsatellite data.

[Photos: young lemon sharks (Negaprion brevirostris) during sampling in Bimini, Bahamas; cowbird (Molothrus ater) nestling with a song sparrow nestmate.]

Technical Approach

• Use Mendelian constraints to form potential feasible family groups.
• Use combinatorial optimization of the covering problem with various parsimony objectives to find the best sets of family groups containing all individuals. Typically there is more than one optimal or near-optimal solution.
• Use consensus techniques to combine solutions that are optimal, come from different methods, or result from perturbations allowing for errors in the data into one robust, error-tolerant solution.
• All resulting optimization problems are NP-hard and provably hard to approximate. We use the commercial optimization package CPLEX to find optimal solutions.

Key Achievements and Future Goals

The following methods are available, or are becoming available, as a web-based service (http://kinalyzer.cs.uic.edu):
• Reconstruction of sibling groups + error identification
• Reconstruction of parental genotype
• Reconstruction of half-sibling relationships

Future:
• Incorporation of partial information
• Multi-generation pedigree reconstruction
• Non-diploid species
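To make the Mendelian constraint from the technical approach concrete, the sketch below tests whether a set of genotypes at a single locus could belong to one full-sibling group, that is, whether some pair of diploid parents could have produced all of them. It is a brute-force illustration only; the function name is hypothetical, and this is not the optimization formulation solved with CPLEX.

```python
from itertools import combinations_with_replacement

def feasible_sibling_group(children):
    """Return True if one father/mother pair can explain every child genotype.

    `children` is a list of (allele, allele) pairs at a single locus.
    Each child must receive one allele from each parent (Mendelian inheritance).
    Searching only over observed alleles is sufficient, because an unobserved
    parental allele is never needed to explain a child.
    """
    alleles = sorted({a for child in children for a in child})
    parent_options = list(combinations_with_replacement(alleles, 2))
    for father in parent_options:
        for mother in parent_options:
            if all(
                (a in father and b in mother) or (b in father and a in mother)
                for a, b in children
            ):
                return True
    return False

# Genotypes written as allele numbers, as in the microsatellite figure.
heterozygotes = [(1, 2), (1, 3), (2, 3)]   # explained by parents 1/2 and 2/3
homozygotes = [(1, 1), (2, 2), (3, 3)]     # would need three alleles per parent
```

In a multi-locus setting the same check would be applied at every locus, and only groups that pass at all loci would be kept as candidate family groups for the covering step.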


Investigator: Bhaskar DasGupta, Computer Science Prime Grant Support: NSF

Problem Statement and Motivation

We investigate fundamental graph-theoretic problems with significant applications in analyzing biological, social and financial networks. Some example categories of such problems include:
• graph partitioning and community detection in social networks,
• graph sparsification to address degeneracy and redundancy issues in biological networks, and
• stability of financial network models.

Technical Approach

• We formulate precise computational problems, study their properties, use novel algorithmic tools to design efficient algorithms, and implement the resulting algorithms to test their accuracy and efficiency.
• A primary focus of our technical approach is the use of combinatorial algorithmic techniques.

Key Achievements and Future Goals

Some examples of key achievements include:
• analysis of the computational complexity of Newman's modularity maximization approach for biological and social networks,
• analysis of the stability of shock propagation models in financial networks, and
• development of methodologies for synthesis, inference and simplification of biological signal transduction networks.

For further information, see www.cs.uic.edu/~dasgupta
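For readers unfamiliar with the modularity objective mentioned above, the sketch below computes Newman's modularity Q for a given partition of an undirected, unweighted graph without self-loops, using Q = Σ_c [e_c/m − (d_c/2m)²], where e_c is the number of edges inside community c, d_c is the total degree of its nodes, and m is the number of edges. The function name and the example graph are illustrative assumptions; the code evaluates Q for a fixed partition and does not attempt the maximization problem.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman's modularity of a partition of an undirected simple graph.

    edges: list of (u, v) pairs; community: dict mapping node -> community id.
    """
    m = len(edges)
    degree = defaultdict(int)
    internal_edges = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal_edges[community[u]] += 1
    community_degree = defaultdict(int)
    for node, k in degree.items():
        community_degree[community[node]] += k
    return sum(
        internal_edges[c] / m - (community_degree[c] / (2 * m)) ** 2
        for c in community_degree
    )

# Two triangles joined by a single edge (hypothetical example graph).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {node: "A" for node in range(6)}
```

Splitting the graph into its two triangles scores higher than placing every node in one community, which is the behaviour the objective is designed to reward.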


Philip S. Yu, Computer Science, UIC with other researchers outside UIC Primary Grant Support: NSF IIS-0905215, DBI-0960443

[Figure: a chemical compound represented as a graph object (atoms C, H, N, O as nodes) with the label "anti-cancer activity".]

Problem Statement and Motivation

• Graph/network mining is an emerging technology, but has not yet been applied to drug discovery data.
• Drug discovery is a time-consuming and costly process.
• Graph mining has the potential to drastically reduce the cost and time needed by identifying highly likely candidate chemical compounds.

Technical Approach

• Adopt subgraph-based features to characterize graph objects, i.e., the chemical compounds
• Mine discriminative subgraph features that can distinguish the class labels
• Introduce scoring functions to rank the features' effectiveness
• Explore the anti-monotonic property to speed up the mining process

Key Achievements and Future Goals

• Devised new subgraph-based feature construction techniques for chemical compounds
• Made good predictions of the effectiveness of chemical compounds in treating disease
• Developed novel approaches to reduce the number of training examples needed, which have to be obtained via costly experiments
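One way to read "scoring functions to rank the features' effectiveness" is sketched below. The source does not name a scoring function, so information gain is used here purely as an assumed example; each subgraph feature is reduced to a boolean "compound contains this subgraph" indicator, and all names are hypothetical.

```python
import math

def entropy(pos, neg):
    """Binary entropy (in bits) of a set with `pos` positive and `neg` negative items."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h

def information_gain(has_feature, labels):
    """Score a subgraph feature by how much it reduces label entropy.

    has_feature[i] is True if compound i contains the subgraph;
    labels[i] is 1 (active) or 0 (inactive).
    """
    n = len(labels)
    pos = sum(labels)
    with_pos = sum(1 for f, y in zip(has_feature, labels) if f and y)
    with_total = sum(1 for f in has_feature if f)
    with_neg = with_total - with_pos
    without_pos = pos - with_pos
    without_neg = (n - with_total) - without_pos
    conditional = (
        with_total / n * entropy(with_pos, with_neg)
        + (n - with_total) / n * entropy(without_pos, without_neg)
    )
    return entropy(pos, n - pos) - conditional

labels = [1, 1, 0, 0]
discriminative = [True, True, False, False]   # present only in active compounds
uninformative = [True, False, True, False]    # unrelated to activity
```

Features can then be ranked by this score, with the highest-scoring subgraphs kept as inputs to the classifier.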


Philip S. Yu, Computer Science, UIC with other researchers outside UIC Primary Grant Support: DBI-0960443

Problem Statement and Motivation

• Early detection of brain diseases (Alzheimer's disease, ADHD and HIV) is critical for medical treatment
• Brain diseases generally result in anomalies in brain connectivity
• Functional connectivity derived from fMRI images is noisy

Technical Approach

• Explore novel graph mining techniques instead of traditional image classification approaches
• Represent fMRI brain images as uncertain graphs
• Identify the relationship between uncertain graph structures and labels
• Use graph classification to identify anomalous brain networks

Key Achievements and Future Goals

• Developed novel feature selection and classification algorithms for uncertain graphs
• Captured the temporal dimension of fMRI
• Demonstrated the effectiveness of the network/graph-based approach for detecting anomalies in brain connectivity based on patient records
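As a rough illustration of the uncertain-graph representation, the sketch below stores each connection between two brain regions with an existence probability and computes how likely a given subgraph feature is to be present. It assumes edges exist independently of one another, which is a simplifying assumption made for this example; the region names and probabilities are invented.

```python
from math import prod

def feature_probability(edge_prob, feature_edges):
    """Probability that every edge of a subgraph feature exists.

    edge_prob maps frozenset({u, v}) -> existence probability.
    Assumes edges are independent; edges missing from the map have probability 0.
    """
    return prod(edge_prob.get(frozenset(edge), 0.0) for edge in feature_edges)

def expected_edge_count(edge_prob):
    """Expected number of edges in a graph sampled from the uncertain graph."""
    return sum(edge_prob.values())

# Hypothetical uncertain brain network over three regions.
edge_prob = {
    frozenset(("regionA", "regionB")): 0.9,
    frozenset(("regionB", "regionC")): 0.5,
}
path_feature = [("regionA", "regionB"), ("regionC", "regionB")]
```

Feature probabilities of this kind can serve as inputs when judging how strongly a subgraph is associated with a class label despite noisy connectivity estimates.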


Lenore D Zuck (contract with U of Pittsburgh– DARPA funded)

Problem Statement and Motivation

• DARPA’s System F6 program aims at developing a new space architecture in which clusters of small, cheap, wirelessly connected satellites replace the current satellite architecture.
• The project will design, evaluate, develop, and fully verify asynchronous distributed system protocols to create a secure, robust, real-time, and reliable protocol suite capable of facilitating application-level communication within DARPA’s F6 project.

Technical Approach

• Construction of robust, fully verified protocols that are fault tolerant, privacy preserving, highly efficient, and consume little power and memory
• Use step-wise refinement to guarantee that the implementation follows the specification and preserves all properties of the protocols
• Expand basic protocols to more sophisticated situations (that are anticipated in such a satellite cluster) and repeat the above steps

UIC’s role is to provide the formal framework that allows for the verification:
• Automatic verification of systems with arbitrary nodes connected in arbitrary topologies
• Development of methods of verification for fault tolerance, power, memory, and privacy properties

Key Achievements and Future Goals

• The project started in May 2011.
• A protocol was developed for attaining secure aggregation of data in the networks that are the topic of the project.
• The protocol was formally modeled and its properties were formally specified.
• The properties were formally verified using the (real-time) model checker UPPAAL on small, realistic network topologies.
• In the near future we expect to expand the methodologies to apply to arbitrary topologies.


Maxine Brown, UIC Computer Science; Thomas DeFanti and Tajana Rosing, University of California, San Diego; Joe Mambretti, Northwestern University Primary Grant Support: National Science Foundation

Problem Statement and Motivation

The TransLight/StarLight team focuses on experimentation with next-generation network infrastructure technologies to better understand the emerging requirements of e-Science and other advanced applications that have yet to be supported in production environments with today's international research networks. The team expands upon and enhances innovative communication services in support of global science research and education as they relate to specific applications: GreenLight International, Science Cloud Communication Services, CineGrid, High-Performance Digital Media Network, the international Global Environment for Network Innovations (iGENI), and SAGE™ (Scalable Adaptive Graphics Environment).

GLIF, the Global Lambda Integrated Facility, is an international virtual organization supporting persistent data-intensive scientific research and middleware development on advanced optical networks. (GLIF map 2011 – www.glif.is)

Technical Approach

The goal is to continue to expand upon experimental networking technologies through the development of several international communication services and advanced applications, leveraging existing collaborations to make significant scientific impact. To accomplish this, we use regional, national and international optical networking infrastructure, as available, to:
• Integrate applications, middleware and new technologies across geographically distributed sites.
• Focus on end-to-end connections and services of leading-edge sites and facilities.
• Focus on experimentation to meet the emerging requirements of e-Science and other advanced application domains.

Key Achievements and Future Goals

TransLight/StarLight leadership has established international partnerships, communication channels, forums, and processes to ensure ongoing successful interactions among its constituents. The management team continually works with domain scientists to better understand application requirements and the need for customized services. All science is not well served by one protocol at one network layer. Through its aggressive use of networks to conduct end-to-end experiments, TransLight/StarLight will discover new methods and technologies that motivate services and capabilities to be customized for individual science disciplines.

www.startap.net/translight


Maxine Brown, Andrew Johnson, Luc Renambot, UIC Computer Science; Jason Leigh, University of Hawaiʻi at Mānoa Primary Grant Support: National Science Foundation

[Figure: The UIC Cyber-Commons 3D wall runs SAGE, which enables users to simultaneously display 3D as well as 2D windows.]

Problem Statement and Motivation

SAGE and tiled display walls create global collaborative visualization environments that enable virtual teams of researchers to manage the scale and complexity of 2D and 3D data and work with one another.
• Scientists can view ultra-resolution images and create “cybermashups,” or juxtapositions of information – a critical component of data analysis – to make informed observations and discoveries.
• Technology-enhanced classrooms, such as UIC’s Cyber-Commons, teach students to collaborate within a university and among universities, and to solve problems within a discipline and among multiple disciplines.

Current funding is helping transition SAGE from a research prototype to a hardened technology, to nurture the growth of the SAGE User Community, and to create new open services for visualization and collaboration utilizing shared cyberinfrastructure.

Technical Approach

SAGE is cross-platform, open-source middleware that provides a common operating environment, or framework, to access, stream and juxtapose 2D and 3D data objects – whether digital cinema animations, high-resolution images, video-teleconferencing, presentation slides, documents, spreadsheets or laptop screens – on one or more tiled display walls. SAGE’s network-centered architecture allows collaborators to simultaneously run various applications (such as 3D rendering, remote desktop, video streams and 2D maps) on local and/or remote computers and clusters, and share them by streaming the pixels of each application over ultra-high-speed networks to large tiled displays. Users manipulate content in real time using a keyboard, laptop, Gyromouse, joystick, trackball, 6 degree-of-freedom magnetic tracker, Nintendo Wiimote, touch screen, and/or MS Kinect.

Key Achievements and Future Goals

• SAGE is having a profound and transformative effect on data visualization, data exploration and collaboration, and is making cyberinfrastructure more accessible to end systems and to end users, both in the laboratory and in the classroom.
• Currently, users at over 90 sites worldwide rely on SAGE and tiled display walls to provide them with a globally integrated collaborative work environment to facilitate data analysis and high productivity, in such diverse fields as geoscience, homeland security, bioscience, cosmology, atmospheric science, chemistry, computer science, medicine, and cultural heritage. SAGE is used to support several classes and seminars taught in the UIC Computer Science, Art and Design, and Physics departments.

www.sagecommons.org

SAGE is a trademark of the University of Illinois Board of Trustees.


Ugo Buy, Computer Science Primary Grant Support: NIST

Problem Statement and Motivation

• Control programs are hard to write and maintain
• Flexible manufacturing demands rapid reconfiguration
• Possibility of deadlock, mutex violations, deadline violations

[Figure: tool architecture. Labels: GUI; Constraints; SFCs; Plant spec; Translator; TPNs; Supervisor generator; Refined TPNs; Code generator; Control code.]

Technical Approach

• Avoid verification complexity with supervisory control
• Petri nets vs. finite state automata
• Synthesis of deadline-enforcing supervisors using net unfolding
• Compositional methods (e.g., hierarchical control)

Key Achievements and Future Goals

• System for enforcing deadlines on transition firing in time Petri nets
• Framework for compositional control
• Integration of methods for enforcing mutual exclusion and freedom from deadlock
• Generation of target code
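The properties named above (mutual exclusion, freedom from deadlock) can be illustrated on a very small Petri net. The sketch below uses an ordinary, untimed Petri net and exhaustive state exploration; it does not model time Petri nets, net unfolding, or supervisor synthesis, and the net itself (two processes sharing one mutex token) is an invented example.

```python
from collections import deque

# Each transition maps to (input places, output places) with token counts.
TRANSITIONS = {
    "enter1": ({"idle1": 1, "mutex": 1}, {"crit1": 1}),
    "exit1": ({"crit1": 1}, {"idle1": 1, "mutex": 1}),
    "enter2": ({"idle2": 1, "mutex": 1}, {"crit2": 1}),
    "exit2": ({"crit2": 1}, {"idle2": 1, "mutex": 1}),
}
INITIAL = {"idle1": 1, "idle2": 1, "mutex": 1}

def enabled(marking, name):
    """A transition is enabled if every input place holds enough tokens."""
    inputs, _ = TRANSITIONS[name]
    return all(marking.get(place, 0) >= n for place, n in inputs.items())

def fire(marking, name):
    """Consume input tokens, produce output tokens, and drop empty places."""
    inputs, outputs = TRANSITIONS[name]
    new = dict(marking)
    for place, n in inputs.items():
        new[place] = new.get(place, 0) - n
    for place, n in outputs.items():
        new[place] = new.get(place, 0) + n
    return {place: n for place, n in new.items() if n > 0}

def reachable(initial):
    """Breadth-first exploration of all markings reachable from `initial`."""
    start = frozenset(initial.items())
    seen = {start}
    queue = deque([start])
    while queue:
        marking = dict(queue.popleft())
        for name in TRANSITIONS:
            if enabled(marking, name):
                nxt = frozenset(fire(marking, name).items())
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return [dict(m) for m in seen]

markings = reachable(INITIAL)
mutex_ok = all(not (m.get("crit1") and m.get("crit2")) for m in markings)
deadlock_free = all(any(enabled(m, t) for t in TRANSITIONS) for m in markings)
```

Exhaustive exploration like this grows quickly with net size, which is one reason to enforce the properties through supervisory control rather than verify them after the fact.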


Isabel F. Cruz, Ouri Wolfson (Computer Science) and Aris Ouksel (Information and Decision Sciences). In collaboration with Roberto Tamassia (Brown U.) and Peter Scheuermann (Northwestern U.)

Problem Statement and Motivation

• Architecture of a new system, CASSIS, to provide comprehensive support for context-aware applications in the Health Domain as provided by the Alliance of Chicago
• Testing on operational scenarios of public health management applications:
  • Daily operations of health care providers
  • Epidemic occurrences (e.g., meningitis)
  • Crisis situations (e.g., terrorist attacks, natural disasters)

[Figure: CASSIS architecture with user, service, application, and database layers. Labels: Context and Profile Manager; Application Server; biological and chemical sensors; web services, on-line libraries, emergency info; on-line cameras with recording device; city maps, floor plans of buildings; GIS data; environmental db (hospital states, sensor states, etc.); police, firemen, healthcare, and FBI profile dbs; aggregated user profiles; police station; fire house; hospital, clinic; subway control center; users (police officer, fireman, doctor, travelling businessman); dynamic info (e.g., GPS; operating at full capacity).]

Technical Approach

• Peer-to-peer and mediated semantic data integration
• Dynamic data as collected by sensor networks
• Matching of user profiles to services
• Competitive environment management
• Security and privacy
• Performance and scalability (e.g., caching and data aggregation)

Key Achievements and Future Goals

• Peer to Peer Semantic Integration of XML and RDF Data Sources [Cruz, Xiao, Hsu, AP2PC 2004]
• Opportunistic Resource Exchange in Inter-Vehicle Ad-Hoc Networks [Xu, Ouksel, Wolfson, MDM 2004, Best Paper Award]
• An Economic Model for Resource Exchange in Mobile Peer-to-Peer Networks [Wolfson, Xu, Sistla, SSDBM 2004]
• Multicast Authentication in Fully Adversarial Networks [Lysyanskaya, Tamassia, Triandopoulos, IEEE Security and Privacy, 2004]
• Personal Service Areas for Location-Based Wireless Web Applications [Pashtan, Heusser, Scheuermann, IEEE Internet Computing, 2004]


Isabel F. Cruz, Computer Science, in collaboration with Nancy Wiegand, U. Wisconsin-Madison Primary Grant Support: NSF

Problem Statement and Motivation

• Geospatial data are complex and highly heterogeneous, having been developed independently by various levels of government and the private sector
• Portals created by the geospatial community disseminate data but lack the capability to support complex queries on heterogeneous data
• Complex queries on heterogeneous data will support information discovery, decision, or emergency response

Technical Approach

• Data integration using ontologies
• Ontology representation
• Algorithms for the alignment and merging of ontologies
• Semantic operators and indexing for geospatial queries
• User interfaces for
  • Ontology alignment
  • Display of geospatial data

Key Achievements and Future Goals

• Create a geospatial cyberinfrastructure for the web to
  • Automatically locate data
  • Match data semantically to other relevant data sources using automatic methods
• Provide an environment for exploring and querying heterogeneous data for emergency managers and government officials
• Develop a robust and scalable framework that encompasses techniques and algorithms for integrating heterogeneous data sources using an ontology-based approach
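To give a feel for what ontology alignment involves, the sketch below matches concept labels from two ontologies by string similarity alone. This is a deliberately simple stand-in and not the project's alignment algorithm, which the source does not describe; the concept names, the threshold, and the `align` function are assumptions made for illustration. It uses `difflib.SequenceMatcher` from the Python standard library.

```python
from difflib import SequenceMatcher

def align(source_concepts, target_concepts, threshold=0.7):
    """Map each source concept to its most similar target concept label.

    Similarity is the SequenceMatcher ratio on lower-cased labels; concepts
    whose best match falls below `threshold` are left unaligned.
    """
    alignment = {}
    for source in source_concepts:
        best, best_score = None, 0.0
        for target in target_concepts:
            score = SequenceMatcher(None, source.lower(), target.lower()).ratio()
            if score > best_score:
                best, best_score = target, score
        if best_score >= threshold:
            alignment[source] = best
    return alignment

# Hypothetical land-use vocabularies from two agencies.
county_terms = ["Road", "Wetland", "Hydrant"]
state_terms = ["Roadway", "Wetlands", "Parcel"]
mapping = align(county_terms, state_terms)
```

Label similarity misses synonyms such as "Lake" and "Water body", which is why practical alignment methods also draw on ontology structure and semantics.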


Piotr Gmytrasiewicz, Computer Science Primary Grant Support: National Science Foundation

Problem Statement and Motivation

Problem: Allow artificial agents to make optimal decisions while interacting with the world and possibly other agents

• Artificial agents: Robots, softbots, unmanned systems
• Hard-coding control actions is impractical
• Let’s design agents that can decide what to do
• One approach: Decision theory, not applicable when other agents are present
• Another approach: Game theory, not applicable when the agent is acting alone

[Figure: agent-environment loop. Labels: Agent(s); Beliefs; actions; observation; Environment State.]

Technical Approach

• Combine decision-theoretic framework with elements of game theory
• Use decision-theoretic solution concept
• Agent’s beliefs encompass other agents present
• Solutions tell the agent what to do, given its beliefs
• Computing solutions is hard (intractable), but approximate solutions are possible
• Solution algorithms are variations of known decision-theoretic exact and approximate solutions
• Convergence results and other properties are analogous to decision-theoretic ones

Key Achievements and Future Goals

• A single approach to controlling autonomous agents is applicable in single- and multi-agent settings
• Unites decision-theoretic control with game theory
• Gives rise to a family of exact and approximate control algorithms with anytime properties
• Applications: Autonomous control, agents, human-machine interactions
• Future work: Provide further formal properties; improve on approximation algorithms; develop a number of solutions to dynamic interactive decision-making settings
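The idea that an agent's beliefs encompass the other agents present, and that a solution tells the agent what to do given those beliefs, can be sketched in a single decision step. The example below is a toy illustration under stated assumptions: the other agent has one of two hypothetical types, the belief is updated by Bayes' rule after one observation, and the action with the highest expected utility is chosen. All names and numbers are invented, and this is not the project's solution algorithm.

```python
def update_belief(belief, likelihood, observation):
    """Bayes' rule: P(model | obs) is proportional to P(obs | model) * P(model)."""
    posterior = {m: belief[m] * likelihood[m][observation] for m in belief}
    normalizer = sum(posterior.values())
    return {m: p / normalizer for m, p in posterior.items()}

def best_action(belief, utility):
    """Pick the action maximizing expected utility under the current belief."""
    def expected(action):
        return sum(belief[m] * utility[action][m] for m in belief)
    return max(utility, key=expected)

# Belief over the other agent's type (hypothetical models).
prior = {"cooperative": 0.5, "adversarial": 0.5}
# How likely each type is to produce each observable behaviour.
likelihood = {
    "cooperative": {"helps": 0.8, "blocks": 0.2},
    "adversarial": {"helps": 0.3, "blocks": 0.7},
}
# Payoff of our action against each type.
utility = {
    "share": {"cooperative": 10, "adversarial": -5},
    "guard": {"cooperative": 1, "adversarial": 2},
}

action_before = best_action(prior, utility)
posterior = update_belief(prior, likelihood, "blocks")
action_after = best_action(posterior, utility)
```

Seeing the other agent block shifts the belief toward the adversarial model, and the preferred action changes with it.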


Andrew Johnson, Jason Leigh, Maxine Brown, Tom Peterka, Computer Science Primary Grant Support: National Science Foundation and Department of Energy

Problem Statement and Motivation

CAVE2, the next-generation virtual-reality environment, is a hybrid system that merges the benefits of both scalable-resolution display walls and virtual-reality systems to create a single unified environment.
• Virtual environments immerse people in worlds too large, too small, too dangerous, too remote, or too complex to be viewed otherwise.
• Tiled display walls create virtual “project rooms” in which people can display very-large images and/or simultaneously juxtapose more information, and better spatially organize, see and infer relationships among the data.

The seamless 2D/3D CAVE2 environment supports information-rich analysis as well as 3D simulation exploration at a resolution matching human visual acuity.

[Figure: The NASA-funded ENDURANCE project uses CAVE2 and SAGE to further planetary science research (UIC Electronic Visualization Lab, UIC Earth & Environmental Sciences Dept., Stone Aerospace, NASA Ames and Montana State University).]

Technical Approach

CAVE2 is built with polarized stereo LCD displays with ultra-thin bezels. UIC partnered with U.S. company Planar Systems, Inc., to design and build the desired display screens. CAVE2 is programmable with a variety of application programming tools; notably:
• UIC’s OmegaLib middleware enables the development of applications on scalable virtual-reality and hybrid systems, and can be integrated with third-party toolkits. It also supports Omicron, a library that handles input from a number of novel input devices – such as multi-touch, 3D hand/body gesturing, head tracking, and mobile and tablet devices.
• UIC’s SAGE™ (Scalable Adaptive Graphics Environment) enables CAVE2’s wall to be partitioned into “windows” – enabling one or many 2D and 3D windows of information to simultaneously be displayed.

Key Achievements and Future Goals

• CAVE2 is the world’s first flat-panel-based, high-resolution CAVE (Cave Automatic Virtual Environment, which UIC built and successfully commercialized in the 1990s). It provides users with a 320-degree panoramic environment for displaying information at 37 Megapixels in 3D or 74 Megapixels in 2D with a horizontal visual acuity of 20/20 – almost 10 times the 3D resolution of the original CAVE.
• CAVE2 enables computer scientists to study a wide range of new problems at the intersection of human-computer interaction, virtual reality, computer graphics, high-performance computing, high-speed networking, and computer-supported cooperative work. CAVE2 transforms scientific workflows by providing researchers with new and more intuitive ways of interacting with their data.

http://www.evl.uic.edu/cave2

CAVE2 and SAGE are trademarks of University of Illinois’ Board of Trustees.


Robert Kenyon, Steve Jones, Stellan Ohlsson, Andrew Johnson, Eulalia Abril, UIC Computer Science, Communications, and Psychology Depts; Jason Leigh, University of Hawaiʻi at Mānoa; Giselle Mosnaim, Rush University Medical Center Primary Grant Support: UIC CCTS Fall 2011 Pilot Grant Program

[Figure: UIC students are designing a computer-enhanced asthma doser device based on commercial platforms such as Arduino to gather and wirelessly transmit data from the embedded sensors to a prototype Health Cloud data server. This Cloud anonymously stores the information and then presents personalized Persuasive Visualizations to individuals in the targeted user group.]

Problem Statement and Motivation

The overarching goal of this project is to apply emerging trends in computing technologies to transform healthcare. Basic principles of computer modeling, health communications, and behavior change theory are translated into actionable strategies and practices to help individuals change health behaviors and improve health outcomes. The hypothesis is that if individuals are able to easily monitor their health status 24/7 and receive personally tailored, persuasive, and actionable feedback and suggestions at the right times, they can be continuously coached towards healthier living; e.g., reducing unhealthy practices such as sedentary living or cigarette use, or being reminded to take daily medications. Specifically, this research aims to help reduce emergency room visits and hospitalizations for acute exacerbations of asthma in inner-city African American adolescents who fail to take their preventive medications.

Technical Approach

• UIC exploits emerging trends in computing technologies (sensors, cloud computing, mobile computing and visualization) to transform healthcare.
• UIC is prototyping a healthcare ecosystem that consists of handheld asthma devices, Health Cloud computing to monitor and capture data, and avatar-based Persuasive Visualization feedback delivered via social networking services to motivate recipients to adhere to daily medication schedules.
• The technologies being developed must be generalizable to other healthcare areas, and to individuals with or without health risks.
• Small clinical studies will quantitatively assess if these technologies improve this group’s asthma outcomes, and qualitatively evaluate the social and psychological benefits of applying Human Augmentics to asthma self-management.

Key Achievements and Future Goals

• The vision is to create the infrastructure for a “lifelong coach” that monitors an individual’s health status, makes predictions of their health future, and provides tailored, persuasive and actionable recommendations and encouragement to help them remain healthy long into their old age. This “coach” will dramatically affect their health and potentially reduce the costs of medical care and insurance.
• Can human behavior be impacted? Can healthcare be transformed from reactive and hospital-centered to preventive, proactive, evidence-based, person-centered wellbeing?
• Human Augmentics refers to the field of study that employs information technologies to amplify human capabilities.

www.uic.edu/depts/mcam/CCTS/about/pilotgrantfundedfall-2011.shtml#
www.augmentics.org


Ajay Kshemkalyani, Computer Science Primary Grant Support: none

Problem Statement and Motivation

• Advance theoretical foundations of
  • Distributed computing, and
  • Network design
• Understand inherent limitations: upper and lower bounds, and solvability
• Subareas: sensor networks, peer-to-peer networks, mobile, ad-hoc, and wireless networks

Technical Approach

• Design of distributed algorithms
• Prove upper and lower bounds
• Experimental evaluation, where necessary
• More info: see publications at http://www.cs.uic.edu/~ajayk/int/dsnl.html

Key Achievements and Future Goals

• Design of routing and multicast algorithms
• Advance understanding of:
  • Causality and time; temporal modalities
  • Synchronization and monitoring mechanisms
  • Predicate detection algorithms for distributed systems
  • Web and internet performance
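Causality and time in distributed systems are commonly illustrated with vector clocks, a standard textbook mechanism. The sketch below is offered in that spirit and is not a description of this group's algorithms; the `Process` class and helper names are hypothetical.

```python
def happened_before(a, b):
    """Vector timestamp a causally precedes b: a <= b componentwise and a != b."""
    return all(x <= y for x, y in zip(a, b)) and a != b

def concurrent(a, b):
    """Neither event causally precedes the other."""
    return not happened_before(a, b) and not happened_before(b, a)

class Process:
    """A process that timestamps its events with a vector clock."""

    def __init__(self, pid, n_processes):
        self.pid = pid
        self.clock = [0] * n_processes

    def local_event(self):
        self.clock[self.pid] += 1
        return tuple(self.clock)

    def send(self):
        # Sending is an event; the returned timestamp travels with the message.
        return self.local_event()

    def receive(self, message_timestamp):
        # Merge knowledge from the sender, then count the receive event itself.
        self.clock = [max(a, b) for a, b in zip(self.clock, message_timestamp)]
        self.clock[self.pid] += 1
        return tuple(self.clock)

p0, p1 = Process(0, 2), Process(1, 2)
e_send = p0.send()            # event on p0
e_local = p1.local_event()    # independent event on p1
e_recv = p1.receive(e_send)   # p1 receives p0's message
```

Comparing timestamps this way distinguishes causally ordered events from concurrent ones, which is the basic building block for detecting predicates over a distributed execution.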


John Lillis, Computer Science Primary Grant Support: NSF, IBM

Problem Statement and Motivation

• Today, circuit performance is determined by wiring more than logic
• Optimizations made by traditional logic synthesis tools correlate poorly with post-layout performance
• Need for functionality-preserving circuit perturbations at the physical level
• Candidate: Logic Replication

[Figure: placement example before and after replication. Cell labels: A, B, C, CR, D, E. Captions: "Inherently non-monotone paths"; "All paths near-monotone after replication".]

Technical Approach

• Extract timing-critical sub-circuit
• Induce equivalent logic tree by replication
• Optimally embed tree in context of current placement by Dynamic Programming
• Embedding objective includes replication cost to prevent excessive replication
• Mechanism applied iteratively

Key Achievements and Future Goals

• Very large reductions in clock period (up to 40%) observed in FPGA domain with minimal overhead [DAC 2004]
• Adapts easily to graph-based architectures common in modern FPGAs. Many conventional placers are ill-suited to this environment.
• Generalizations deal with limitations resulting from reconvergence [IWLS 2004]
• Ongoing work includes: application to commercial FPGAs; simultaneous remapping of logic; study of lower bounds on achievable clock period; integrated timing optimization based on Shannon factorization.


Bing Liu, Computer Science Primary Grant Support: National Science Foundation

Problem Statement and Motivation

• Given a set of positive examples P and a set of unlabeled examples U, we want to build a classifier.
• The key feature of this problem is that we do not have labeled negative examples. This makes traditional classification learning algorithms not directly applicable.
• The main motivation for studying this learning model is to solve many practical problems where it is needed. Labeling of negative examples can be very time consuming.

[Figure: positive training data and unlabeled data → learning algorithm → classifier.]

Technical Approach

We have proposed three approaches.
• Two-step approach: The first step finds some reliable negative data from U. The second step uses an iterative algorithm based on naïve Bayesian classification and support vector machines (SVM) to build the final classifier.
• Biased SVM: This method models the problem with a biased SVM formulation and solves it directly. A new evaluation method is also given, which allows us to tune biased SVM parameters.
• Weighted logistic regression: The problem can be regarded as a one-sided error problem, and thus a weighted logistic regression method is proposed.

Key Achievements and Future Goals

• In (Liu et al. ICML-2002), it was shown theoretically that P and U provide sufficient information for learning, and the problem can be posed as a constrained optimization problem.
• Some of our algorithms are reported in (Liu et al. ICML-2002; Liu et al. ICDM-2003; Lee and Liu ICML-2003; Li and Liu IJCAI-2003).
• Our future work will focus on two aspects:
  • Deal with the problem when P is very small
  • Apply it to the bioinformatics domain, where many problems require this type of learning.
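The shape of the two-step approach can be sketched in a few lines. Note the substitutions made for brevity: step 1 picks "reliable negatives" as the unlabeled points farthest from the positive centroid, and step 2 uses a nearest-centroid classifier in place of the iterative naïve Bayes / SVM procedure described above. Both choices, the `fraction` parameter, and all names are assumptions of this example rather than the published algorithms.

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length numeric vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def two_step_pu(positives, unlabeled, fraction=0.5):
    """Learn a classifier from positive and unlabeled examples only.

    Step 1: treat the `fraction` of unlabeled points farthest from the
            positive centroid as reliable negatives.
    Step 2: classify by the nearer of the positive and negative centroids.
    """
    pos_center = centroid(positives)
    ranked = sorted(
        unlabeled, key=lambda x: squared_distance(x, pos_center), reverse=True
    )
    k = max(1, int(len(unlabeled) * fraction))
    neg_center = centroid(ranked[:k])

    def classify(x):
        return squared_distance(x, pos_center) < squared_distance(x, neg_center)

    return classify

# Hypothetical 2-D data: U mixes hidden positives and hidden negatives.
P = [(1.0, 1.0), (1.2, 0.9)]
U = [(1.1, 1.0), (5.0, 5.0), (5.2, 4.8), (0.9, 1.1)]
classify = two_step_pu(P, U)
```

The key point carried over from the text is that no labeled negative example is ever supplied; negatives are inferred from U.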


Tom Moher, Computer Science Primary Grant Support: National Science Foundation

Problem Statement and Motivation

• K-12 learners have insufficient opportunity to engage in “patient science” involving extended observation, manipulation of variables, and aggregation of evidence.
• “Ubiquitous computing” is often associated with personal computational devices; embedded phenomena explore the “other side” of ubiquitous computing: ambient media embedded in the physical environment.
• Use of conventional classroom computers running standard browsers creates opportunities for widespread adoption on the installed school technology base.

Key Achievements and Future Goals

Technical Approach •

Simulated phenomena are “mapped” onto the physical space of the classroom.

Four applications: RoomQuake (seismology), HelioRoom (astronomy), RoomBugs and WallCology (population ecologies).

The state of the simulation is represented through conventional computers located around the classroom serving as “portals” into that phenomenon.

“Phenomenon Server” allows teachers to configure and schedule phenomena for delivery to their classrooms.

Students conduct investigations of the phenomenon by monitoring and manipulating the state of the simulation through those portals.

Field trials and investigation of student learning in over two dozen classrooms.


The simulations are persistent, running concurrently with the regular instructional flow for periods of days and weeks.

Best paper, ACM Conference on Human Factors in Computing Systems (CHI 2006): “Embedded Phenomena: Supporting Science Learning with Classroom-sized Distributed Simulations.”


Peter Nelson, CS; Xin Li, CS; Chi Zhou, Motorola Inc. Primary Grant Support: Physical Realization Research Center of Motorola Labs

Genotype: sqrt.*.+.*.a.*.sqrt.a.b.c./.1.-.c.d

Phenotype: [expression tree, not reproduced]

Problem Statement and Motivation

Real world data mining tasks: large data set, high dimensional feature set, non-linear form of hidden knowledge; in need of effective algorithms.

Gene Expression Programming (GEP): a new evolutionary computation technique for the creation of computer programs; capable of producing solutions of any possible form.

Research goal: applying and enhancing the GEP algorithm to fulfill complex data mining tasks.

Mathematical form: $\sqrt{(a + bc) \cdot a \cdot \sqrt{\frac{1}{c - d}}}$

Figure 1. Representations of solutions in GEP
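The relationship between the genotype string and its mathematical form can be sketched in code. The following is a minimal illustration, assuming the genotype is read breadth-first (level by level), with each function symbol taking as many children as its arity. The function names and the test values for a, b, c, d are hypothetical, and this is not the authors' implementation.

```python
import math
from collections import deque

# Function symbols and their implementations; anything else is a terminal.
OPS = {
    "sqrt": math.sqrt,
    "+": lambda x, y: x + y,
    "-": lambda x, y: x - y,
    "*": lambda x, y: x * y,
    "/": lambda x, y: x / y,
}
ARITY = {"sqrt": 1, "+": 2, "-": 2, "*": 2, "/": 2}

def decode(genotype):
    """Build an expression tree [symbol, children] by filling levels in order."""
    nodes = [[s, []] for s in genotype.split(".")]
    queue = deque([nodes[0]])
    next_index = 1
    while queue:
        node = queue.popleft()
        for _ in range(ARITY.get(node[0], 0)):
            child = nodes[next_index]
            next_index += 1
            node[1].append(child)
            queue.append(child)
    return nodes[0]

def to_infix(node):
    symbol, children = node
    if not children:
        return symbol
    if len(children) == 1:
        return f"{symbol}({to_infix(children[0])})"
    return f"({to_infix(children[0])} {symbol} {to_infix(children[1])})"

def evaluate(node, env):
    symbol, children = node
    if symbol in OPS:
        return OPS[symbol](*(evaluate(c, env) for c in children))
    # Terminals are either variables found in env or numeric constants.
    return env[symbol] if symbol in env else float(symbol)

tree = decode("sqrt.*.+.*.a.*.sqrt.a.b.c./.1.-.c.d")
expression = to_infix(tree)
value = evaluate(tree, {"a": 1.0, "b": 2.0, "c": 5.0, "d": 1.0})
```

Decoding stops once every function symbol has its arguments, so any trailing symbols in a longer gene would simply go unused.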

Key Achievements and Future Goals

Technical Approach

Overview: improving the problem solving ability of the GEP algorithm by preserving and utilizing the self-emergence of structures during its evolutionary process.

Constant Creation Methods for GEP: local optimization of constant coefficients given the evolved solution structures to speed up the learning process.

A new hierarchical genotype representation: natural hierarchy in forming the solution and more protective genetic operation for functional components.

Dynamic substructure library: defining and reusing self-emergent substructures in the evolutionary process.

Have finished the initial implementation of the proposed approaches.

Preliminary testing has demonstrated the feasibility and effectiveness of the implemented methods: constant creation methods have achieved significant improvement in the fitness of the best solutions; dynamic substructure library helps identify meaningful building blocks to incrementally form the final solution following a faster fitness convergence curve.

Future work includes investigation of parametric constants, exploration of higher-level emergent structures, and comprehensive benchmark studies.


John Dillenburg, Pete Nelson, Ouri Wolfson, Computer Science Primary Grant Support: NSF, Chicago Area Transportation Study, Illinois Department of Transportation

Problem Statement and Motivation

Vehicles increase, roads do not

Congestion costs the U.S. economy over $100 billion/year

Vehicle occupancy has dropped 7% in the last two decades

[Figure: vehicle miles traveled (VMT) and US highway miles, indexed to 1980 = 100, for the years 1980 to 1997.]

[Figure: system diagram showing travelers and ride share partners with Travel Assistant devices, the Global Positioning System, transit, the Internet, and the Central Travel Information Computer.]

Key Achievements and Future Goals

Technical Approach

We envision a convenient mobile device capable of planning multi-modal (car, bus, train, ferry, taxi, etc.) travel itineraries for its user

Partnered with Regional Transportation Authority on multi-modal trip planner system project sponsored by FTA

The devices communicate with each other and with a central database of travel information via a peer-to-peer ad-hoc network

Prime developer of Gateway traveler information system sponsored by IDOT

Trips with other users could be shared via dynamic ride sharing

Fares and payment are negotiated electronically

Prime developer of Ride Match System 21 car and van pooling system sponsored by CATS

Traffic prediction is used to determine the best route

Realistic, full scale micro simulation of ITA system

Persistent location management is used to track device locations

Test bed deployment for Chicago metro area

Trajectory management is used to predict the future location of a device for planning purposes
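Predicting a device's future location, as the trajectory management item describes, can be illustrated with a very small sketch. It assumes straight-line motion at constant speed between the last two position fixes, in planar coordinates measured in meters. The function name and the `(time, x, y)` fix format are hypothetical, and the project's actual trajectory model is not specified in this summary.

```python
def predict_position(fix_a, fix_b, t_future):
    """Linearly extrapolate a position from two (time, x, y) fixes.

    Assumes constant velocity; real trajectory models would also use
    the road network and traffic prediction.
    """
    (t0, x0, y0), (t1, x1, y1) = fix_a, fix_b
    if t1 == t0:
        raise ValueError("fixes must have different timestamps")
    vx = (x1 - x0) / (t1 - t0)
    vy = (y1 - y0) / (t1 - t0)
    dt = t_future - t1
    return (x1 + vx * dt, y1 + vy * dt)

# Two fixes 10 s apart, predicting 10 s past the second one.
predicted = predict_position((0.0, 0.0, 0.0), (10.0, 100.0, 50.0), 20.0)
```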


A. Prasad Sistla, Computer Science Primary Grant Support: NSF

Problem Statement and Motivation

[Figure: a model checker takes a concurrent system spec and a correctness spec as input and outputs yes/no, with a counterexample.]

The project develops tools for debugging and verifying hardware/software systems.

Errors in hardware/software analysis occur frequently

Can have enormous economic and social impact

Can cause serious security breaches

Such errors need to be detected and corrected


Key Achievements and Future Goals

Technical Approach

Model Checking based approach

Developed SMC (Symmetry Based Model Checker)

Correctness specified in a suitable logical framework

Employed to find bugs in the FireWire protocol

Employs State Space Exploration

Also employed in analysis of security protocols

Different techniques for containing state space explosion are used

Need to extend to embedded systems and general software systems

Need to combine static analysis methods with model checking
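State space exploration with a yes/no answer and a counterexample can be sketched as a breadth-first search over reachable states. This is a generic illustration under stated assumptions: the toy two-process mutual exclusion system, the function names, and the state encoding are all hypothetical, and the sketch does not implement the symmetry reduction that SMC uses to contain state space explosion.

```python
from collections import deque

def check_safety(initial, successors, is_bad):
    """Explore reachable states; return (True, None) or (False, counterexample)."""
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if is_bad(state):
            # Walk parent links back to the initial state to build the trace.
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return False, trace[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return True, None

# Toy system: each process cycles idle -> want -> crit -> idle.
NEXT = {"idle": "want", "want": "crit", "crit": "idle"}

def make_successors(guarded):
    def successors(state):
        for i in (0, 1):
            target = NEXT[state[i]]
            if guarded and target == "crit" and state[1 - i] == "crit":
                continue  # guard: do not enter while the other is critical
            new_state = list(state)
            new_state[i] = target
            yield tuple(new_state)
    return successors

def both_critical(state):
    return state == ("crit", "crit")

start = ("idle", "idle")
ok_unguarded, trace = check_safety(start, make_successors(False), both_critical)
ok_guarded, _ = check_safety(start, make_successors(True), both_critical)
```

Because the search is breadth-first, the counterexample returned for the unguarded system is a shortest path to the violation.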


Robert H. Sloan (Computer Science) and György Turán (Mathematics—MSCS) Primary Grant Support: National Science Foundation

Problem Statement and Motivation

Key Achievements and Future Goals

Technical Approach

Key mathematical tools for most of the research are complexity theory and combinatorics.

Undergraduate students helping explore current capabilities of the largest implemented knowledge base systems.

Use of large network analysis to understand large knowledge bases as very large directed graphs.

Developing new algorithms for knowledge revision.

All areas of Artificial Intelligence (AI) rely on large bodies of knowledge, large parts of which change over time. As opposed to a database containing facts that can be queried, a knowledge base contains general statements that can be used to derive further implications. Developing a knowledge base, in particular, a knowledge base containing commonsense knowledge that can be used for commonsense reasoning, is a fundamental task of AI. We study the key problems of reasoning, updating (revising), and learning for such knowledge bases, especially those in the computationally efficient Horn form.
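Deriving implications from a Horn knowledge base can be illustrated with forward chaining. The sketch below is a naive version that repeatedly fires rules until nothing new is derived; the rule encoding, the function name, and the toy commonsense rules are assumptions made for illustration only.

```python
def forward_chain(facts, rules):
    """Return all atoms derivable from facts using definite Horn rules.

    Each rule is (body, head): if every atom in body is known, head follows.
    This naive loop rescans the rules until a full pass adds nothing.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in known and body <= known:
                known.add(head)
                changed = True
    return known

# Hypothetical commonsense rules.
rules = [
    (frozenset({"bird"}), "has_wings"),
    (frozenset({"has_wings", "healthy"}), "can_fly"),
    (frozenset({"can_fly", "penguin"}), "contradiction"),
]
derived = forward_chain({"bird", "healthy"}, rules)
```

Each rule can fire at most once, so the loop always terminates, which hints at why inference in Horn form is computationally efficient.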


Analysis of the standard framework for knowledge revision for the important (because computationally efficient) case of Horn knowledge bases. Published papers including the first in a flurry of ~20 papers in this area in the past few years.

Analysis of the properties, especially inference, of random collections of Horn formulas, treating them as random hypergraphs.

Empirical work measuring the actual verbal IQ of a commonsense knowledge base (MIT’s ConceptNet 4). Result: VIQ 100, the average verbal IQ of a child aged 4 years 0 months. See figure in opposite corner.

Gathered significant press attention.


Robert H. Sloan, Computer Science (In collaboration with Richard Warner, Chicago–Kent College of Law) Primary grant support: National Science Foundation

Problem Statement and Motivation

To develop technologically realistic and sophisticated privacy policies to bind private companies in the 21st century, very much including rules concerning threats to security from data loss.

Most interactions among people and companies are governed in part by long-standing social norms, in addition to formal rules, but in this area social norms are lacking.

One major goal is to contribute to the development of social norms concerning privacy that will simultaneously shape and inform both the development of appropriate technologies and appropriate business practices and laws.

Key Achievements and Future Goals

Technical Approach

This project is inherently interdisciplinary. The interdisciplinary approach is surprisingly unusual: There are remarkably few interdisciplinary examinations of privacy that effectively combine legal and computer science expertise.

First compilation of all 50 states, state-by-state, of major privacy and data security statutes, and analysis of technical cost and efficacy

Proposal for a new legal liability regime, intermediate between negligence and strict product liability, for producers of mass-market software containing security vulnerabilities, and game-theoretic analysis of the effects of this on the market.

Analysis of the ability of major ISPs to reduce spread of malware and bots.

Short-term goal: Extend our analysis to cover at least US government action.

Traditional technical Computer Science analysis, legal analysis, economic analysis, and occasionally philosophy play a role.


V.N. Venkatakrishnan, Computer Science Primary Grant Support: NSF

Problem Statement and Motivation

Technical Approach

ESP-IGERT is an interdisciplinary PhD training program that includes faculty from the CS, ECE, Communication, IDS, and Public Health Departments at UIC. The program combines technological, human, enterprise, and legal expertise from the faculty members in those departments to develop interdisciplinary research tackling information privacy using multiple considerations. ESP-IGERT will support approximately 30 PhD students and engage them in six interdisciplinary classes, team-taught by faculty from different departments, and two international research summer internships, as well as in multidisciplinary groups contributing to and enriching each other’s perspectives.

Electronic security and information privacy are central issues in today’s digital age. The ecosystem where private and sensitive information resides is composed of many IT subsystems belonging to individuals, organizations, and governments, who are driven by different, and often conflicting, motivations, policies, and practices. Thus, effective solutions for privacy protection must take into consideration all these aspects. They must be easily usable by end users, easily adoptable by businesses, not conflicting with their business goals, and in line with current legislation. To produce such solutions and derive general principles and best practices for individuals, businesses, and public policy makers, an interdisciplinary approach is needed.

Key Achievements and Future Goals

Future Goals:
• A set of broad scientific principles that constitutes a systemic, deeper understanding of fundamental issues in Electronic Security and Privacy
• A set of usable methods, tools, and policies that can be employed by end users, technologists, and policy makers


Ouri Wolfson and Bo Xu, Computer Science Primary Grant Support: NSF

Problem Statement and Motivation

Currently, while on the move, people cannot efficiently search for local resources, particularly if the resources have a short life, e.g. an available parking slot, or an available workstation in a large convention hall.

Applications in matchmaking and resource discovery in many domains, including • social networks • transportation and emergency response • mobile electronic commerce.

[Figure: four mobile peers A, B, C, and D, each holding a resource-query and resources: A holds resources 1 to 3, B holds resources 4 and 5, C holds resources 6 and 7, D holds resource 8.]

Key Achievements and Future Goals

Technical Approach

Use Database and Publish/Subscribe technology to specify profiles of interest and resource information

Peer-to-Peer information exchange among mobile devices, such as cell phones and PDAs, that form an ad hoc network


Exchange uses short-range, unlicensed wireless communication spectrum including 802.11 and Bluetooth.

Exchanged information is prioritized according to a spatial-temporal relevance function to reduce bandwidth consumption and cope with unreliable wireless connections.

Adaptive push/pull of resource information


Developed and analyzed search algorithms for different mobility environments and communication technologies.

Designed a comprehensive simulation system that enables selection of a search algorithm.

Built a prototype system.

Published 6 papers, received $250k in NSF support, delivered two keynote addresses on the subject.

Submitted provisional patent application.

Future goals: design complete local search system, combine with cellular communication to central server, test technology in real environment, transfer to industry.
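Prioritizing exchanged information by spatial-temporal relevance can be sketched as follows. The functional form (exponential decay in both report age and distance), the parameters `half_life_s` and `scale_m`, and the report fields are assumptions for illustration; the summary does not specify the project's actual relevance function.

```python
import math

def relevance(report, now, here, half_life_s=120.0, scale_m=500.0):
    """Score in (0, 1]: halves every half_life_s seconds and decays with distance."""
    age = max(0.0, now - report["time"])
    dist = math.hypot(report["x"] - here[0], report["y"] - here[1])
    return 0.5 ** (age / half_life_s) * math.exp(-dist / scale_m)

def select_for_exchange(reports, now, here, budget):
    """Keep only the `budget` most relevant reports to save bandwidth."""
    ranked = sorted(reports, key=lambda r: relevance(r, now, here), reverse=True)
    return ranked[:budget]

# Hypothetical parking reports: fresh and near, old and near, fresh and far.
reports = [
    {"id": "fresh-far", "time": 100.0, "x": 5000.0, "y": 0.0},
    {"id": "old-near", "time": 0.0, "x": 0.0, "y": 0.0},
    {"id": "fresh-near", "time": 100.0, "x": 0.0, "y": 0.0},
]
chosen = select_for_exchange(reports, now=100.0, here=(0.0, 0.0), budget=2)
chosen_ids = [r["id"] for r in chosen]
```

With a budget of two, the distant report is dropped even though it is fresh, which is the bandwidth-saving behavior the relevance function is meant to provide.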


Clement Yu, Computer Science Primary Grant Support: NSF

Problem Statement and Motivation

Retrieve, on behalf of each user request, the most accurate and most up-to-date information from the Web.

The Web is estimated to contain 500 billion pages. Google indexed 8 billion pages. A search engine, based on crawling technology, cannot access the Deep Web and may not get most up-to-date information.

[Figure: users send queries to the metasearch engine, which sends queries to Search Engine 1 through Search Engine N and returns the results.]

Key Achievements and Future Goals

Technical Approach

A metasearch engine connects to numerous search engines and can retrieve any information which is retrievable by any of these search engines.

On receiving a user request, automatically selects just a few search engines that are most suitable to answer the query.

Connects to search engines automatically and maintains the connections automatically.

Extracts results returned from search engines automatically.

Merges results from multiple search engines automatically.


Optimal selection of search engines to answer accurately a user’s request.

Automatic connection to search engines to reduce labor cost.

Automatic extraction of query results to reduce labor cost.

Has a prototype to retrieve news from 50 news search engines.

Has received 2 regular NSF grants and 1 phase 1 NSF SBIR grant.

Has just submitted a phase 2 NSF SBIR grant proposal to connect to at least 10,000 news search engines.

Plans to extend to do cross language (English-Chinese) retrieval.
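Merging results from multiple search engines can be illustrated with reciprocal rank fusion, a generic rank-merging technique used here as a stand-in; the summary does not say which merging method the project uses. The constant `k=60`, the function name, and the sample URLs are assumptions.

```python
from collections import defaultdict

def merge_results(ranked_lists, k=60):
    """Reciprocal rank fusion: each engine adds 1 / (k + rank) to a result's score."""
    scores = defaultdict(float)
    for results in ranked_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical result lists from two selected engines.
engine_a = ["u1", "u2", "u3"]
engine_b = ["u2", "u4", "u1"]
merged = merge_results([engine_a, engine_b])
```

Results returned by several engines accumulate score, so they rise above results that only one engine found.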


Clement Yu, Computer Science Primary Grant Support: National Science Foundation

Problem Statement and Motivation

Many companies sell the same type of products (e.g. computers) or services (e.g. life insurance) via the Web.

Looking for the best product or service (e.g. lowest price and meeting specifications) requires excessive checking of many Web search engines.

This imposes too much burden on a user.

The aim is to allow a user seeking a product or a service to submit a single query and to receive the results ranked in descending order of desirability.

[Figure: a user query selects the appropriate query interface from a repository of query interfaces (airline reservation, rent a car, real estate); the returned interface is used to formulate the query.]

[Figure: the metasearch engine sends subquery 1 through subquery n to Search Engine 1 through Search Engine n, each over a Web database, and merges the results into final ranked results.]

Key Achievements and Future Goals

Technical Approach

Companies selling products or services via the Web have different user interfaces.

Most steps in the construction of the integrated user interface have been automated.

Create a user interface that integrates the features of each individual user interface and organizes them such that the integrated interface is easily understood.

The same technique can be applied in other areas (e.g. construct generalized forms): • For selling a car online multiple forms need to be filled in • Create a generalized form applicable to multiple sellers.

A user query submitted against the integrated interface is translated into subqueries against individual interfaces.

Preliminary results have also been obtained to determine the proper search engines to invoke for each given user query.

Will produce metasearch engines for various products and services.

It is possible to determine for each user query, which search engines should be invoked: • based on the previously processed queries


Clement Yu, Computer Science Primary Grant Support: National Science Foundation

Problem Statement and Motivation

Given a collection of documents and a query, the proposed system finds documents which are relevant to the query and are opinionated

The proposed system can advise consumers about the sentiments of a given product or service. It can suggest hints for advertisements.

The system can also analyze political opinions as well as compare the political viewpoints of different parties.

Key Achievements and Future Goals

Technical Approach

Accurate retrieval by identifying concepts in queries and documents

Identifying opinionated features

Classifying sentences into opinionative sentences

Determine whether opinions are relevant to the query topic

Determine whether the opinion is positive, negative or mixed (positive and negative)

Achieved the highest effectiveness scores for title queries in the Blog Track of TREC (Text Retrieval Conference) in 2006 and 2007. The tasks include retrieving relevant opinionated documents as well as classifying them into positive, negative or mixed categories.

Plan to build various systems to have higher effectiveness, higher efficiency and satisfy different needs.


Philip S. Yu, Computer Science

Problem Statement and Motivation

Co-author network

Data are accumulating at an exponential rate across all organizations, all domains, and all geographies

These data are often not in structured record format; we focus on graphs and networks

Need to be able to mine the vast amount of data to get useful information and knowledge

Yeast protein interaction network

Key Achievements and Future Goals

Technical Approach

Identify distinctive or discriminative substructures in the graph as features

Devise new similarity measures on graphs

Explore graph compression to reduce a huge graph into a smaller one for further analysis

Conduct community mining from multi-relational networks

Capture dynamic and evolutional behavior of networks

Develop real-time processing capability to address monitoring type applications

Graph indexing methods

Similarity search methods for graphs

Data Integration, cleaning and validation techniques in Information Networks

Online Analytical Processing paradigms for Information Networks

Algorithms for mining Information Networks, including social networks

Real-time stream mining algorithms


Philip S. Yu, Computer Science, UIC Primary Grant Support : DBI-0960443, CNS-1115234, OISE-1129076, Army Grant W911NF-12-1-0066

Problem Statement and Motivation • Data being generated at a very high rate • Need for instant response • Many applications: surveillance, ad placement, high-frequency trading, outbreak control, etc.

Real-time monitoring & mining of multiple streams

• The challenge on real-time stream processing • One-pass • Resource constraint • Evolving nature with concept drift • Noisy data

Key Achievements and Future Goals

Technical Approach • Adapt OLAP type approach to separate on-line and off-line operations • Develop summarization approaches to reduce on-line storage and processing

Devised real-time scalable mining algorithms on clustering, classification, frequent patterns, outliers, spam, community detection, etc.

Developed new approaches to fuse data streams from multiple heterogeneous sources

Handled concept drift, noise, and incomplete data

Received IEEE ICDM 2013 10-Year Highest-Impact Paper Award

• Capture evolving patterns and abnormality • Introduce resource adaptive computation to match the depth of the computation with the rate of the data stream which can be bursty • Address data uncertainty in designing mining algorithms
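The one-pass and resource constraints can be illustrated with a small summarization sketch: a running mean and variance that uses constant memory no matter how long the stream is. This uses Welford's well-known update rule as a generic example; it is not one of the project's algorithms, and the class name and sample stream are hypothetical.

```python
class RunningStats:
    """One-pass summary of a numeric stream using constant memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        # Welford's update: each item is seen once and then discarded.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Population variance of the items seen so far.
        return self.m2 / self.n if self.n else 0.0

stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(value)
```

Only three numbers are stored, so the on-line storage cost stays fixed while the stream keeps growing.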


Philip S. Yu, Computer Science, UIC with other researchers outside UIC Primary Grant Support: DBI-0960443, CNS-1115234, OISE-1129076

Problem Statement and Motivation

Many social networks with different focuses - Facebook, Twitter, Foursquare

Many people participate in multiple social networks

By fusing the information scattered in different networks, prediction power can be greatly improved.

Key Achievements and Future Goals

Technical Approach

Identify some corresponding accounts across networks referred to as anchor points

Developed effective anchor link prediction algorithms to link up the corresponding accounts across networks

Use anchor points as a base to transfer knowledge across networks

Developed novel algorithms to transfer knowledge across networks to help predict user behavior, including social links and location links

Address heterogeneous node and link types, including social, spatial, temporal and text information


Philip S. Yu, Computer Science, UIC with other researchers outside UIC Primary Grant Support : NSF IIS-0905215

Problem Statement and Motivation • Information networks are ubiquitous • Chemical compounds, biological networks, social networks, the World Wide Web, cyber-physical networks • Each node and link may have attributes, labels, and weights

[Figure: HIN of Movie Data]

Technical Approach • Recognize the path semantics of a heterogeneous information network (HIN) • Author-Paper-Conference-Paper-Author (authors with papers in the same conference) is different from Author-Paper-Author (co-authorship)

• HIN allows multiple object and link types • E.g., Medical network: patients, doctors, disease, contacts, treatments

Key Achievements and Future Goals • Developed path based mining algorithms for HIN • Clustering, classification, similarity search, recommendation, etc. • Developed network based OLAP to handle large networks

• Introduce the concept of meta path based similarity measure to guide knowledge discovery • Integrate linkage and node information for more effective mining

• Applied HIN to solve various application problems from bioinformatics to social networks
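The difference between the Author-Paper-Author and Author-Paper-Conference-Paper-Author meta paths can be made concrete by counting path instances on a toy network. The sketch below also normalizes the counts in the PathSim style (twice the number of paths between x and y, divided by the number of paths from x to itself plus from y to itself), which is assumed here as one example of a meta path based similarity measure. The data and function names are hypothetical.

```python
from collections import Counter

# Hypothetical bibliographic network: papers with authors and a venue.
papers = {
    "p1": {"authors": {"ann", "bob"}, "venue": "KDD"},
    "p2": {"authors": {"ann"}, "venue": "KDD"},
    "p3": {"authors": {"cat"}, "venue": "KDD"},
    "p4": {"authors": {"cat"}, "venue": "ICDM"},
}

def apa_count(x, y):
    """Author-Paper-Author instances: papers written by both x and y."""
    return sum(1 for p in papers.values()
               if x in p["authors"] and y in p["authors"])

def venue_profile(x):
    """Number of papers by x at each venue."""
    return Counter(p["venue"] for p in papers.values() if x in p["authors"])

def apcpa_count(x, y):
    """Author-Paper-Conference-Paper-Author instances via shared venues."""
    px, py = venue_profile(x), venue_profile(y)
    return sum(px[v] * py[v] for v in px)

def path_similarity(count, x, y):
    """PathSim-style normalization of a symmetric meta path count."""
    denom = count(x, x) + count(y, y)
    return 2.0 * count(x, y) / denom if denom else 0.0

coauthor_sim = path_similarity(apa_count, "ann", "cat")
same_venue_sim = path_similarity(apcpa_count, "ann", "cat")
```

Here ann and cat never co-author, so the co-authorship similarity is zero, yet they publish at the same conference, so the longer meta path still relates them.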


Philip S. Yu, Computer Science, UIC with other researchers outside UIC Primary Grant Support : NSF IIS-0905215, Google, MITRE

Problem Statement and Motivation • Social networks are gaining ever increasing popularity • An abundance of information is captured • How to effectively take advantage of this information remains a challenge • The information is noisy, with spam and malicious postings • How to utilize the wisdom of the crowd is unclear

Technical Approach • Devise scalable algorithms to handle large social networks • Develop novel models to capture insights from social science research

Key Achievements and Future Goals • Devised novel influence propagation models • Developed new models to detect magnet communities to understand the talent flow • Developed new approaches to detect review spam

• Find innovative ways to utilize the wisdom of the crowds in the social network • Address noisy and incomplete information • Utilize the heterogeneous information network approach

• Devised novel models to detect network shakers to handle the cascading effect of “too big to fail” entities as in the financial network


Philip S. Yu, Computer Science Primary Grant Support : NSF IIS-0914934, CNS-1115234

Problem Statement and Motivation • The large amount of data being captured and digitized has made privacy an important issue

Anonymization

• In many cases, users are not willing to divulge personal information unless privacy is assured • Many industries need to access vast amount of personal data to advance the products or services, e.g. from personalized medicine to product recommendation

[Figure: k²-degree anonymization (k=2)]

Technical Approach • To preserve privacy on network/graph data • Not only node attributes, but also connection information needs to be anonymized or perturbed

Key Achievements and Future Goals • Identified the friendship attack in a network, where the degrees of two vertices connected by an edge are utilized to re-identify related victims in a published network, and devised the k²-degree anonymization technique.

• Identify weakness of current privacy protection methods

• Devise new privacy attack models

• Proposed the concept of structural diversity to protect the anonymity of the network community identities and developed the k-SDA technique

• Develop novel privacy protection methods accordingly • Received EDBT 2014 Test of Time Award
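The friendship attack relies on the degrees of the two endpoints of an edge, so a published graph can be checked for how many vertices share each degree pair. The sketch below assumes one reading of the protection goal: for every degree pair (d1, d2) that appears on some edge, at least k vertices of degree d1 must have a neighbor of degree d2. That reading, the function name, and the toy graphs are assumptions; the sketch only checks the property and does not perform the anonymization itself.

```python
from collections import defaultdict

def is_k2_degree_anonymous(edges, k):
    """Check that every edge degree pair is shared by at least k vertices."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    degree = {v: len(neighbors) for v, neighbors in adj.items()}
    # Map (own degree, neighbor degree) -> vertices having such an edge.
    holders = defaultdict(set)
    for v, neighbors in adj.items():
        for u in neighbors:
            holders[(degree[v], degree[u])].add(v)
    return all(len(vertices) >= k for vertices in holders.values())

# In a 4-cycle every vertex looks alike; in a 3-vertex path the middle
# vertex is the only one with degree pair (2, 1).
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
path = [("a", "b"), ("b", "c")]
cycle_ok = is_k2_degree_anonymous(cycle, 2)
path_ok = is_k2_degree_anonymous(path, 2)
```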


Philip S. Yu, Computer Science, UIC with other researchers outside UIC Primary Grant Support : DBI-0960443, OISE-1129076, OIA-0963278, Army grant W911NF-12-1-0066

Problem Statement and Motivation

• The ever increasing amount of data being captured and digitized creates the big data challenge

• Big data is being recognized as a valuable asset

• Getting the value out of the big data remains a challenge • Volume • Velocity • Variety

[Figure: big data diagram labeled Volume, Velocity, Variety, and Scalable Mining Algorithm, listing Data Stream with Concept Drift, High Dimensional Data, Heterogeneous Data Sources, Unconventional Data Types, and Uncertain Data.]

Technical Approach

Key Achievements and Future Goals

• Adapt OLAP type approach to separate on-line and off-line operations

• Devised real-time data stream mining algorithms to handle concept drift

• Develop summarization approaches to reduce on-line storage and processing

• Developed new approaches to handle novel data types, such as graphs/networks

• Use matrix factorization approaches to achieve dimensional reduction

• Developed new approaches to integrate data from heterogeneous data sources

• Consider data uncertainty in designing mining algorithms

• Received IEEE Computer Society 2013 Technical Achievement Award on Big Data


Philip S. Yu, Computer Science Primary Grant Support : DBI-0960443, OISE-1129076, Army Grant W911NF-12-1-0066

Problem Statement and Motivation

Difficulties for new ventures to get funding

Crowdfunding emerges as a new way to raise funding via web technology

Need to understand the impact of social media on crowdfunding

Success of crowdfunding will help create more new ventures and jobs to grow the economy

Key Achievements and Future Goals

Technical Approach

Develop prediction models on the number of backers and the success of the fund raising

Studied the effect of early promotion on social media on the success rate of fund raising

Identify key factors that will affect the success rate of the fund raising

Identified strategies to improve the success rate

Improved understanding on how to participate in crowdfunding

Collect Twitter data regarding Kickstarter to develop and evaluate the models


Lenore Zuck, Computer Science Primary Grant Support: NSF, ONR, and SRC

Problem Statement and Motivation

Translation Validation • Backward Compatibility of successive generations of software • Formal proofs that optimizing compilers maintain semantics of programs

Termination proofs of Pointer programs

Property Verification of parameterized systems (bus protocols, cache coherence, &c)

Key Achievements and Future Goals

Technical Approach

Translation validation verifies each run of the system. Verification conditions that are automatically created are sent to theorem provers

Combination of model checking and deductive methods allows pushing the envelope of automatic verification of infinite-state systems (for both pointer programs and protocols)

Based on the methodology developed, Intel is using MicroFormal to verify backward compatibility of microprograms (between RISC & CISC)

(Need to develop better methodologies to prove theories that have bit vectors)

IIV is a new tool that allows automatic verification of safety properties of parameterized systems (nothing bad will ever happen)

Researchers at MSR have expressed interest in integrating pointer analysis in their verification tool

