Meeting of the Minds, 2014

Meeting of the Minds
Undergraduate Research Symposium



Meeting of the Minds is an annual symposium at Carnegie Mellon University that gives students an opportunity to present their research and project work to a wide audience of faculty, fellow students, family members, industry representatives and the larger community. Students use posters, videos and other visual aids to present their work in a manner that can be easily understood by both experts and non-experts.

Through this experience, students learn how to bridge the gap between conducting research and presenting it to a wider audience. A review committee consisting of industry experts and faculty members from other universities reviews the presentations and chooses the best projects and posters. Awards and certificates are presented to the winners.



Table of Contents

External Judges 1

Senior Thesis Projects 2

Biological Science Posters
Q1 Adherence of Pathogenic Fungi Isolated in the Qatari Clinical Setting 4
Q2 Morphology of Pathogenic Fungi in the Qatari Clinical Setting 6

Business Administration Posters
Q3 Value Premium in the GCC Markets 8
Q4 Factors Affecting LNG Prices for Qatar: A Study Using Past Data 10
Q5 Ambiguity in Choices and the Reflection Effect 12
Q6 Automated Course Scheduling 14

Computational Biology Poster
Q7 SNV-check: A Quality Control Tool for Familial Exome Sequencing Data Based on the Sharing of Rare Genetic Mutations 16

Computer Science Posters
Q8 Twitter Sentiment Analysis 18
Q9 Using Technology to Help People Save Food Effortlessly 20
Q10 SmartPC: Adding 4th Dimension to Computer Input 22
Q11 Contextual Spellchecker to Improve Human Robot Interaction 24
Q12 Enhancing Agent Gaze in Uncontrolled Environments 26
Q13 Seeing is Learning: Accessible Technologies for Universal Learning 28
Q14 Supercharging Hadoop for Efficient Big Data Analytics 30
Q15 Descriptive Minicomplexity 32

Information Systems Posters
Q16 Is an Accessible Website a More Usable One? 34
Q17 Effect of Website Localization on Impulse Buying Behavior of Arab Shoppers 36
Q18 On the Relevance of Cultural Intelligence for Technology Acceptance 38
Q19 Flipped Learning for Educational Content Delivery: The Case of Introductory Programming Courses 40
Q20 Evaluating the Use of Emerging Technologies in Education 42
Q21 Studying the Sociotechnical Barriers to Using Augmented Reality Technologies for E-commerce 44
Q22 Teaching / Learning with iPads 46

Post-Graduate Poster

QG1 DREAM: Distributed RDF Engine with Adaptive Query Optimization & Minimal Communication 48


External Judges

Dr. Hadi Abderrahim, Managing Director, Qatar BioBank
Dr. Rashid Al-Kuwari, Advisor, Minister’s Office, Ministry of Environment
Maha Al-Mannai, University Collaboration Manager, Qatar Shell
Dr. Amer Al-Saigh, Head of Networks, Vodafone
Dr. Marco Ameduri, Professor and Associate Dean, WCMC-Qatar
Fady Azar, General Manager, PC DealNet
Dr. Pitre Bourdon, Head of Research, Aspire
Dr. Peter Chomowicz, Associate Dean for Research & Development, VCU-Q
Dr. Sebti Foufou, Professor and Head, CS & Engineering Dept., Qatar University
Dr. Mohamed Hefeeda, Principal Scientist, QCRI
Ashraf Ismael, National Information Assurance Manager, ictQatar
Dr. Dirar Khoury, Executive Director, Special Projects, QF Research Division
Dr. Hilal Lashuel, Executive Director, QBRI
Bryan Munro, CIO, Vodafone
Dr. Munir Tag, Program Manager, ICT, QNRF
Mohamad Takriti, CEO, iHorizons
Dr. Barak Yehya, Expert, Ministry of Development Planning and Statistics
Dr. Thomas Zacharia, Executive VP, QF Research Division



Senior Thesis Projects

Business Administration

Nada M. Salem, Ambiguity in Choices and the Reflection Effect

Advisor: Peter Stüttgen, Ph.D.

Information Systems

Sarah Mustafa, Is an Accessible Website a More Usable One?

Advisor: Selma Limam Mansar, Ph.D.

Noora Al-Maslamani, Effect of Website Localization on Impulse Buying Behavior of Arab Shoppers

Advisor: Divakaran Liginlal, Ph.D.

Muhammad Jaasim Polin, On the Relevance of Cultural Intelligence for Technology Acceptance

Advisor: Selma Limam Mansar, Ph.D.

Haya Thowfeek, Flipped Learning for Educational Content Delivery: The Case of Introductory Programming Courses

Advisor: Selma Limam Mansar, Ph.D.

Daniel Cheweiky, Evaluating the Use of Emerging Technologies in Education

Advisor: Divakaran Liginlal, Ph.D.

Afrah Hassan, Studying the Sociotechnical Barriers to Using Augmented Reality Technologies for E-commerce

Advisor: Divakaran Liginlal, Ph.D.

Aliya Hashim, Teaching / Learning with iPads

Advisor: Divakaran Liginlal, Ph.D.



Tuesday, April 29, 2014, 4:00 pm - 6:00 pm
Carnegie Mellon University, Education City



Adherence of Pathogenic Fungi Isolated in the Qatari Clinical Setting

Author Rula Al-Baghdadi (BS 2014)

Faculty Advisor Jonathan Finkel, Ph.D.

Category Biological Sciences

Abstract Fungal infections due to Trichosporon species have risen 10-fold in the past 4 years at Hamad Medical Corporation in Qatar (HMC), rapidly becoming the second most frequent non-Candida infection isolated in the clinical setting. Systemic infections resulting from Trichosporon species have mortality rates as high as 80%, due to the low efficacy of the most commonly used anti-fungals. These fungi are normally commensal with the host, but become pathogenic under immunocompromised conditions, complications due to diabetes, and in the presence of implanted devices. The fungal species attack the host by adherence to catheters, implanted devices, and mucosal membranes. By adhering to these abiotic surfaces, Trichosporon species form a biofilm composed of yeast, pseudohyphal, hyphal, and arthroconidial cells. The formation of a biofilm allows for the growth and dispersion of the fungal infection throughout the host. Biofilm formation results in increased resistance to anti-fungals, further leading to increased difficulty in treatment. The most effective form of treatment to ultimately prevent biofilm formation would be disruption of the first step of biofilm formation: adherence. In order to investigate the role of adherence in Trichosporon-associated infections, clinically isolated strains of Trichosporon were obtained from patients at HMC. These strains were tested for their relative strength of adherence using an established, reproducible adherence assay. The data were further analyzed by examining the change in adherence due to the site of infection. This study aims to identify possible differences in adherence due to species or site of infection, with the goal of rapid species identification that would lead to more effective treatment of fungal infections.





Morphology of Pathogenic Fungi in the Qatari Clinical Setting

Author Fatima Al-Saygh (BS 2014)

Faculty Advisor Jonathan Finkel, Ph.D.

Category Biological Sciences

Abstract: Pathogenic fungi cause secondary systemic infections in immunocompromised patients, diabetic patients, and patients who have undergone device-associated surgeries. The fungal species infect the host by adhering to abiotic surfaces such as catheters and artificial joints. Device-associated infections are correlated with the ability of fungi to form surface-associated microbial communities called biofilms. Previous studies focused primarily on Candida albicans, but with decreasing rates of infection by Candida species due to effective antifungal treatment, formerly rare fungal species are being isolated with alarming frequency in the clinical setting. Increasing rates of infection have been correlated with Trichosporon and Geotrichum species at Hamad Medical Corporation in Qatar (HMC). With an 80% mortality rate associated with such infections, and little known about these fungi, intense study is required for the rapid identification of these species. Here, the morphology of Trichosporon and Geotrichum strains isolated at HMC was determined using fluorescence microscopy. Unlike Candida species, which form yeast, pseudohyphae, and hyphae, Trichosporon species were observed to also form arthroconidia cells. Moreover, Geotrichum species do not form yeast or pseudohyphal cells, but form arthroconidia and hyphal cells. Due to their distinct morphology and infection profiles, Trichosporon and Geotrichum species require treatment with different antifungals. Delays in rapid identification result in the administration of the wrong antifungal, prolonging recovery or resulting in increased mortality.





Value Premium in the GCC Markets

Authors Tanzeel Huda (BA 2015) and Noor-ul-huda Admaney (BA 2015)

Faculty Advisor John Gasper, Ph.D.

Category Business Administration

Abstract: In technical terms, the Price-to-Earnings or PE ratio is defined as the market price per share divided by the net profit per share. In less technical terms, the PE ratio is an indication of how well a company is growing and how risk-free it is to invest in it. Companies with a high PE ratio are called growth companies, while companies with a low PE ratio are either “bad” companies or value companies (undervalued companies expected to have high growth in the future). The value premium is the difference in stock returns between value and growth stocks. We used archival data of the GCC’s (Qatar, UAE and Saudi Arabia) publicly listed companies to test for the presence of a value premium in these countries’ markets. We found that, despite the vast differences between the GCC markets and markets where a value premium has been documented (US, Canadian and Singaporean), a value premium also exists in the GCC markets. Additional analyses led us to conclude that the financial crisis of 2008 affected the growth rate of the value premium after the crisis.




Value Premium in the GCC Markets

By Tanzeel Huda and Noor-ul-huda Admaney
Advisor: John Gasper

Abstract

We used archival data of the GCC’s (Qatar, UAE and Saudi Arabia) publicly listed companies to test for the presence of a value premium in these countries’ markets. We found that, despite the vast differences between the GCC markets and markets where a value premium has been documented (US, Canadian and Singaporean), a value premium also exists in the GCC markets. Additional analyses led us to conclude that the financial crisis of 2008 affected the growth rate of the value premium after the crisis.

Methodology

1) Data Collection

• The initial stock tickers were picked from Bloomberg, but we soon found that the data we required (PE ratios, EBITDA margins, daily prices, etc.) was not completely available, so we switched to S&P Capital IQ for the data extraction.
• We initially wanted to look at a time period of 20 years but, due to lack of data, cut the timeline to ten years, from 2004 to 2013. Empirical research has shown that a timeline of ten years or more is also relevant for finding the existence of a value premium.
• In order to cover the whole of the GCC markets, we used archival data from Qatar, Saudi Arabia, UAE, Bahrain, Oman and Kuwait to give us a wide range of stocks to consider. Due to the similarity of these markets, we decided to amalgamate the results and treat them as one.
• The number of stocks considered was 211.

2) Data Processing

• After getting the PE ratios, we sorted them in descending order and divided the results into quartiles. The first quartile consisted of growth companies (high PE) while the fourth quartile consisted of bad and value companies (low PE).
• The next part was to exclude the bad companies from the low PE stocks. To filter out the bad companies, we relied on a few metrics, one of them being EBITDA margin. This measure gives the earnings of a company from its core operations and is thus a close proxy for a company’s ability to generate cash. We decided that companies with high EBITDA and net income margins are value companies.
• Financial institutions were excluded from the set of value stocks as they had unusually high EBITDA margins.
• After eliminating these outliers, we did an extensive check on the remaining companies to confirm that they had been performing well in years prior to 2003, so that their good margins were not just the result of one exceptional year.
• We then calculated and plotted the returns of companies with high PE ratios and low PE ratios to see whether a value premium exists in the GCC markets. For the returns, we took 2003 as the base year, and all returns are represented as a percentage of the returns of the base year.

Conclusion

The results of this research oppose our null hypothesis that a value premium does not exist in the GCC markets, despite the GCC markets differing from the US and Canadian markets in having less analyst coverage and less historical data. We found considerable evidence of a value premium, which can be seen in the accompanying graphs. To summarize, we found that the coefficient of low PE stocks was higher than the coefficient of high PE stocks. The negative coefficient of high PE stocks led us to another important finding: the 2008 financial crisis not only caused a dip in the value premium, but the dip was much deeper for companies with low PE ratios, slowing down the growth rate of the value premium.

[Figures: "Value Premium Exists", "Value Premium Before Crisis", "Value Premium After Crisis".]
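
To make the quartile-based methodology above concrete, here is a minimal illustrative sketch in Python (not part of the original poster); the record fields "pe" and "ret" are hypothetical stand-ins for a stock's PE ratio and return.

```python
# Illustrative sketch (not the authors' code): sort stocks by PE ratio, split into
# quartiles, and estimate the value premium as the difference in average returns
# between the low-PE and high-PE quartiles.

def value_premium(stocks):
    """stocks: list of dicts like {"ticker": "X", "pe": 12.3, "ret": 0.08}."""
    ranked = sorted(stocks, key=lambda s: s["pe"], reverse=True)  # descending PE
    q = len(ranked) // 4
    growth = ranked[:q]           # first quartile: high-PE (growth) stocks
    value = ranked[-q:]           # fourth quartile: low-PE (bad + value) stocks
    avg = lambda xs: sum(x["ret"] for x in xs) / len(xs)
    return avg(value) - avg(growth)

if __name__ == "__main__":
    sample = [{"ticker": f"S{i}", "pe": pe, "ret": ret}
              for i, (pe, ret) in enumerate([(25, 0.05), (18, 0.06),
                                             (9, 0.11), (6, 0.13)])]
    print(round(value_premium(sample), 3))  # positive value indicates a value premium
```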


Factors Affecting LNG Prices for Qatar: A Study Using Past Data

Authors Rafay Abbasi (BA 2014) and Saad Ahmed (BA 2014)

Faculty Advisor Fuad Farooqi, Ph.D.

Category Business Administration

Abstract: Qatar’s natural gas reserves stood at approximately 890 trillion cubic feet (Tcf) as of January 1st, 2013, holding 13% of the total world natural gas reserves, the third largest after Russia and Iran (Oil & Gas Journal, 2013). Given that Qatar’s economic future is highly dependent on natural gas (50% of GDP, 85% of export earnings and 70% of governmental revenues (USQBC, 2013)), Qatar needs to ensure that, when pricing natural gas, all important factors are given due importance. This research looks into the factors that affect the price of LNG for Qatar. Monthly data from 2010 to 2012 on U.S. LNG imports from Qatar were analyzed using a multiple regression model, with the price of U.S. LNG imports from Yemen, the crude oil price, the national average temperature and national heating degree-days as the primary variables.



Meeting of the Minds 2014, Carnegie Mellon University

FACTORS AFFECTING LNG PRICES FOR QATAR: A Study Using Past Data

Rafay Abbasi & Saad Ahmed
Faculty Advisor: Fuad Farooqi

Problem Statement Qatar’s economic future is highly dependent on natural gas reserves; 50% of GDP, 85% of export earnings and 70% of governmental revenue. We found no model to predict changes in price of LNG.

Proposed Solution Build a regression model to determine influential factors for Qatar’s LNG export to the U.S., using % change in price of U.S. LNG imports from Yemen, % change in crude oil prices, % change in national average temperature and % change in national heating degree-days, by analyzing monthly data from the year 2010 to 2012.

Interesting Find The “Rule of Thumb” (Brown & Yücel, 2007) states that the ratio of natural gas and crude oil prices is fairly constant at 10:1, for the United States. A similar trend for Qatar LNG export price with global crude oil prices was found.

Regression Model (first differences for dependent and independent variables):

Yt = 0.0061 + 0.30374 β1 + 0.43511 β2 − 0.22 β3 − 0.027 β4

where Yt is the price of U.S. LNG imports from Qatar (U.S. dollars per thousand cubic feet), and the explanatory variables are:
• Price of U.S. LNG imports from Yemen (U.S. dollars per thousand cubic feet)
• Crude oil prices (U.S. dollars per barrel)
• National average temperature (°F)
• National heating degree days

Results:
• There seem to be no factors that influence the LNG export price for Qatar. The p-value is greater than 0.05 for all factors and none of the factors are statistically significant.
• The covariance chart shows no relationship among the variables themselves.
• Qatar has entered long-term contracts with Japan, India and the United States. The price range at which Qatar sells LNG is set on a pre-negotiated price. The factors may play a vital role in deciding the world’s natural gas prices, but they do not affect the LNG export price for Qatar. Hence, Qatar has hedged its position as a global LNG exporter through long-term contracts.
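
As an illustration of this kind of model (not the authors' code), here is a minimal sketch of an OLS regression on first-differenced monthly series using pandas and statsmodels; the DataFrame column names are assumptions.

```python
# Illustrative sketch (assumptions: a pandas DataFrame `df` with the hypothetical
# monthly columns named below; this is not the original study's code or data).
import pandas as pd
import statsmodels.api as sm

def fit_lng_model(df: pd.DataFrame):
    cols = ["qatar_lng_price", "yemen_lng_price", "crude_oil_price",
            "avg_temperature", "heating_degree_days"]
    diffs = df[cols].diff().dropna()          # first differences, as in the poster
    y = diffs["qatar_lng_price"]
    X = sm.add_constant(diffs[cols[1:]])      # add the intercept term
    model = sm.OLS(y, X).fit()
    return model                              # inspect model.params and model.pvalues
```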



Ambiguity in Choices and the Reflection Effect

Author Nada M. Salem (BA 2014)

Faculty Advisor Peter Stüttgen, Ph.D.

Category Business Administration

Abstract In many real-life situations, decisions involve uncertain and sometimes delayed consequences. Many works have studied the effect of ambiguity, both in the outcome and in the probability, on decision-making. Yet, empirical research on ambiguity preferences has mainly focused on the gain domain, with little research done on the loss domain. To close this gap in the literature, this paper examines not only ambiguity in the present-gain and future-gain domains; it also examines ambiguity attitudes towards loss prospects in both the present and the future domains. On this basis, we first distinguish between risk and ambiguity, and between ambiguous probabilities and outcomes. Then, we establish the effect of time on decision-making in accordance with construal level theory. Finally, we discuss the implications of ambiguity in the loss domain. We hypothesize, in accordance with the reflection effect, a reversal of preferences in the loss domain, where a vague-probability prospect is preferred to an all-known prospect, which is preferred to a vague-outcome prospect, and the same effect when the lotteries are extended into the future. Our results show that (1) on the aggregate level, a reflection effect is exhibited in the loss domain, where individuals have opposite preferences in the loss domain than they do in the gain domain, and (2) on the individual level, the reflection effect exists; however, the order of preferences is not as we predicted for the loss domain. For the future lotteries, we demonstrate that individuals are less averse towards imprecise probabilities, and more seeking towards imprecise outcomes, in the gain domain, with the reversed effect for the loss domain. The paper ends with a discussion of the implications of the reflection effect for future studies and research.



Ambiguity in Choices and the Reflection Effect

Ambiguity conditions:
• All Known (AK), non-ambiguous: 50% chance of winning exactly QR100. (Coin flip: you have a 50/50 chance of either heads or tails.)
• Vague Probability (VP), ambiguous probability: 25%-75% chance of winning exactly QR100. (Speeding: you do not know the probability of being caught, but you know what the fine would be if caught.)
• Vague Outcome (VO), ambiguous outcome: 50% chance of winning between QR50-QR150. (Card deck: you do not know which suit you will get, but you know the probability of getting each suit.)

Previous Findings
• VO is preferred to AK in the gains domain
• AK is preferred to VP in the gains domain
  - VO > AK > VP in the gains domain
• Little research on preferences for ambiguous gambles in the loss domain
• Theory suggests a reflection effect for risks: having opposite preferences for gambles involving losses vs. gains
  - A well-known effect for non-ambiguous gambles

Research Question
Does the reflection effect exist for ambiguous gambles?

Experimental Design
• 50 results from Carnegie Mellon students aged 18-22
• Individuals offered 12 different raffle tickets
• Survey design: 3 (Ambiguity Level) x 2 (Gain/Loss) x 2 (Present/Future)
• More hypotheses are presented in the research paper

Hypotheses & Results
• Previous findings replicated (VO > AK > VP)
• Reflection effect:
  - Aggregate-level analysis: H: VP > AK > VO; Result: directional support
  - Individual-level analysis: H: opposite preferences for losses vs. gains; Result: 64% exhibit the reflection effect

Conclusion
The reflection effect does exist for ambiguity. In the loss domain, individuals prefer VP > AK > VO, as opposed to VO > AK > VP in the gains domain.

Nada Salem - nada@cmu.edu, Advisor: Peter Stüttgen - pstuettg@andrew.cmu.edu, Honors thesis in Business Administration, Tepper School of Business



Automated Course Scheduling

Author Aniish Sridhar (BA 2015)

Faculty Advisor John Gasper, Ph.D.

Category Business Administration

Abstract Classroom scheduling is an essential part of the planning process for functioning universities. Limited availability of classrooms and varying time slots pose a significant challenge for scheduling courses. Changes in course policies, customized classroom space requirements, and rigid time requirements for each course suggest an automated approach to the problem. Carnegie Mellon University in Qatar, which offers 4 major programs along with a wide variety of minors, needs an automated classroom-scheduling function that caters to the needs of both faculty and students. We (Aniish Sridhar and Amalan Raymond Roshan), under the guidance of Professor John T. Gasper, decided to approach the classroom scheduling problem using Integer Linear Programming. The scheduling process uses Microsoft Excel as the user interface and Visual Basic as the programming platform to execute the scheduling function. We also incorporated the Open Solver engine, which takes in the constraints associated with the scheduling problem and ensures that they are satisfied before the final output is delivered. The scheduler maximizes the efficiency of course scheduling by optimizing the utilization of classrooms and reducing the time spent by the user in scheduling.
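
The project itself was built in Excel/VBA with OpenSolver; purely as an illustration of the integer-programming formulation, here is a minimal analogous sketch in Python using PuLP, with hypothetical course, room and time-slot data.

```python
# Illustrative sketch only (the project used Excel/VBA with OpenSolver); an analogous
# integer program with PuLP. All courses, rooms, capacities and slots are hypothetical.
import pulp

courses = {"70122": 40, "70208": 25}            # course -> enrollment
rooms = {"1202": 30, "2163": 60}                # room -> capacity
slots = ["Sun 08:30", "Sun 10:30"]

prob = pulp.LpProblem("course_scheduling", pulp.LpMinimize)
x = pulp.LpVariable.dicts(
    "assign", [(c, r, t) for c in courses for r in rooms for t in slots], cat="Binary")

# Objective: minimize total spare capacity (a proxy for the capacity/enrollment ratio).
prob += pulp.lpSum(x[c, r, t] * (rooms[r] - courses[c])
                   for c in courses for r in rooms for t in slots)

for c in courses:                               # each course gets exactly one room/slot
    prob += pulp.lpSum(x[c, r, t] for r in rooms for t in slots) == 1
    for r in rooms:
        if rooms[r] < courses[c]:               # capacity must cover enrollment
            for t in slots:
                prob += x[c, r, t] == 0
for r in rooms:                                 # no two courses share a room and slot
    for t in slots:
        prob += pulp.lpSum(x[c, r, t] for c in courses) <= 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([(c, r, t) for (c, r, t), v in x.items() if v.value() == 1])
```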




AUTOMATED COURSE SCHEDULING

Andrew ID: aniishs@qatar.cmu.edu
Student Name: ANIISH SRIDHAR
Project Advisor: JOHN GASPER
Department Name: BUSINESS ADMINISTRATION

BACKGROUND

Course scheduling is a critical requirement for functioning universities. Limited classrooms and varying time slots pose a big challenge, and an automated approach would be the best solution. Carnegie Mellon University in Qatar, now providing 5 majors with a variety of minors, needs to develop an automated course-scheduling function.

CONSTRAINTS IDENTIFIED

• Two classes scheduled at the same time-slot should be assigned different classrooms.
• Faculty teaching more than one course on a given day should be allotted different time-slots for each course.
• The duration of a course offered each day should be aligned with the duration suggested by the administrative department.
• Every course must be offered at least once on its preferred days.
• The total number of courses offered in a given time-slot should be less than the number of classrooms.
• Course enrollment should be smaller than classroom capacity.
• Core courses across majors and cohorts should not be scheduled at the same time-slot.

PROCESS

• The model first picks courses that are required to be scheduled on Sunday.
• The algorithm creates an arc-flow model, with courses being mapped onto time-slots.
• Using an Integer Linear Programming approach, the model picks the appropriate classroom and time-slot for all courses based on their enrollment.
• The process continues until a schedule for all five days of the week is formulated.
• The result is an optimized course schedule.

MODEL

Inputs to the model are the courses, the number of available classrooms, the enrollments in each course, the teaching faculty, and the feasible time slots; the output is the optimum solution.

Let X be the variable denoting a course, j the set of all time slots available, and i the classroom allocated for a given course. Let "c" denote the capacity of the classroom assigned to course X and "e" the enrollment for course X.

• Our objective was to minimize the ratio between assigned classroom capacity and course enrollment.
• To ensure that classroom capacity is larger than course enrollment, we programmed the constraint c / e >= 1.
• The constraint Σi Xij = 1 ensures that each course is assigned one classroom and there are no classroom overlaps within the same time-slot.
• The constraint Σj Xj = 1 helps to ensure that each course is offered at least once on its preferred day.

OUTPUT

[Sample generated schedule listing each course (e.g., 70122, 70201, 70321, ...) with its meeting days, start and ending times, assigned classroom, duration, and instructor.]

RESULTS

• Maximization of classroom and space utilization on any given day
• Optimization of matching course enrollment to classroom capacity is achieved
• Re-assigning of courses to classrooms will be easier

BENEFITS

• Reduces time spent by the administrative department in scheduling courses
• Ensures efficient utilization of classrooms at any given point of time
• Powerful resource for capacity planning and understanding the scope for adding more courses
• Re-allocation of courses to classrooms will be easier

SCOPE FOR FUTURE WORK

• Incorporating faculty preferences into the scheduling process
• Reducing time overlaps for core courses across majors and cohorts
• Developing a comprehensive algorithm that will suggest preferred days for offering a course


SNV-check: A Quality Control Tool for Familial Exome Sequencing Data Based on the Sharing of Rare Genetic Mutations

Author Noora J. Al-Muftah (CB 2016)

Faculty Advisor Khalid A. Fakhro, Ph.D. (Weill Cornell Medical College in Qatar)

Category Computational Biology

Abstract The recent explosion of next generation sequencing (NGS) technologies is poised to completely transform the study of disease genetics. In Qatar, the high rates of consanguinity and the large pedigree sizes set the landscape for clustering of diseases within families. Discovering the causative mutation(s) in these families can fundamentally enhance our understanding of disease biology in humans. As the practice of NGS becomes more routine in the clinic, so will the need for efficient pipelines to process and analyze the data. While many such software suites exist in the open-source sphere, there is very little software tailor-made for Quality Control (QC) analysis of family-based exome data. One of the more important uses of QC software, for example, is to check that the individual labels indeed represent the correct individuals in the family tree. Though not common, it is definitely possible that a sample is mislabeled by human error, and the risk is real that an analysis of the family may proceed with incorrect samples being used as affected or unaffected, yielding incorrect results in disease-gene identification. Such errors would render the entire analysis invalid unless caught early enough in the pipeline. We present SNV-check, a robust and lightweight sample-fidelity pipeline which performs the necessary QC on variant files to assure the quality and fidelity of family data in NGS analysis. At its core, SNV-check uses the rare-variant sharing phenomenon to calculate the likelihood that two samples are related, and the type of relationship (e.g. parent-child, sibling, first or second cousin, or unrelated). As part of the implementation, the program allows the user to specify which files are to be compared, to give a quality score cut-off if wanted, and to specify a common-variant file against which all vcf files are compared to ensure that only rare variants are carried into the pairwise comparison. If a common-variant file is not provided, the program contains a subroutine that allows such a file to be generated as the intersection of variants from all samples to be compared. The program also optionally allows the user to generate a position-based list of all pairwise comparisons, useful in troubleshooting if estimated relationships are not as expected.
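
As a rough illustration of the rare-variant sharing idea (not the SNV-check implementation), here is a minimal Python sketch that loads variant sites from VCF files, removes common variants, and computes a pairwise sharing fraction; the file paths and the exact sharing metric are assumptions.

```python
# Illustrative sketch (not the SNV-check implementation): estimate relatedness of two
# samples as the fraction of rare variants they share. Variants are represented as
# (chromosome, position) tuples parsed from VCF files; file names are hypothetical.

def load_variants(vcf_path, common_variants=frozenset()):
    """Return the set of (chrom, pos) variant sites in a VCF, excluding common ones."""
    sites = set()
    with open(vcf_path) as fh:
        for line in fh:
            if line.startswith("#"):
                continue
            chrom, pos = line.split("\t")[:2]
            if (chrom, pos) not in common_variants:
                sites.add((chrom, pos))
    return sites

def sharing_fraction(a, b):
    """Fraction of the smaller sample's rare variants that also appear in the other."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

# Usage (hypothetical files):
# rare_a = load_variants("sample_a.vcf", common_sites)
# rare_b = load_variants("sample_b.vcf", common_sites)
# print(sharing_fraction(rare_a, rare_b))  # high for parent-child/siblings, low otherwise
```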



SNV-check [Single Nucleotide Variant]

A quality control tool for familial Exome Sequencing data based on the sharing of rare genetic mutations

Noora J. Al-Muftah (nmuftah@qatar.cmu.edu)
Research Advisor: Khalid A. Fakhro, Ph.D. (WCMC-Q)

Problem and Motivation

• The growth of Next Generation Sequencing data has provided numerous opportunities to study disease-causing mutations.
• In Qatar, there is a high rate of consanguinity between families, which makes mutations concentrated within family clusters.
• An important step before undergoing any analysis is to validate that the genetic variant samples for each individual are accurate and not mislabeled as belonging to another individual or family.
• There is a lack of Quality Control (QC) software for familial Exome Sequencing data that checks this condition.

Solution: SNV-check

• A computational tool that checks the relationship between two individuals as either parent-child, sibling-sibling, cousin, or unrelated.
• Input: a set of vcf files for a group of individuals, an optional quality score cut-off, and an optional database of common mutations (dbSNP 138).
• Output: a table mapping every two individuals to the percentage of sharing of rare SNPs between them.

Methodology (I)

• Elimination of low-quality-score mutations, Indels, and common SNPs. For each individual, the set of mutations taken to be pairwise-compared is the intersection of three sets.
• Pairwise matching between individual a and individual b: individual a with sets (A1, A2, A3) and individual b with sets (B1, B2, B3), where a match is counted on the chromosome and the base pair position; similarly for the percentage of [Chr, Pos, GT] matches.

Methodology (II)

• Rare SNP comparison as a measure of genetic similarity: the percentage of sharing of rare SNPs between any two individuals is high for siblings and parent-child pairs, and low for first/second/third-degree cousins and unrelated individuals.

Sample results

• Pairwise matching of 19 individuals belonging to four different families.
• [Graph: percentage of matches for each of the 91 possible pairs; siblings and parent-child pairs show a high percentage of matches, unrelated individuals a low percentage.]

Future work

• Finding the degrees of consanguinity of the individual.
• Checking whether each individual belongs to a rare or common ancestry.
• Developments on the application: improve the run-time and efficiency for loading the ~2GB dbSNP 138 data set for each run of the program.



Twitter Sentiment Analysis

Authors Sabih Bin Wasi (CS 2015) and Rukhsar Neyaz Khan (CS 2015)

Faculty Advisor Behrang Mohit, Ph.D.

Category Computer Science

Abstract Sentiment analysis is the task of automatic prediction of sentiments and emotions in natural language. With the rapid growth of the Internet and social networks, sentiment analysis systems are widely used to estimate various types of opinions on consumer products, political speeches, media products, etc. This project attempts to use machine learning (classification) to build a state-of-the-art sentiment analysis system for text from Twitter. Given a tweet, the system is expected to analyze and predict its sentiment polarity (positive vs. negative vs. neutral). The system performs two subtasks: (a) phrase-level sentiment analysis; (b) complete tweet analysis. We use Support Vector Machines (SVM) as the learning framework. We develop an intuitive set of lexical, syntactic and polarity-based features. Our system is participating in the Semantic Evaluation Shared Task on Sentiment Analysis (SemEval-2014). The system has been trained, tuned and tested on the 2013 shared task data and is outperforming all 30+ participating systems. Our team’s focus is now to develop applications for our system.
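
For readers unfamiliar with this setup, here is a minimal illustrative sketch (not the authors' SemEval system) of an SVM tweet-polarity classifier with simple lexical features, using scikit-learn and toy data.

```python
# Illustrative sketch (not the authors' system): a linear SVM tweet-polarity classifier
# with word and bigram TF-IDF features. The training data shown is toy data only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

tweets = ["I love this phone", "worst service ever", "new update released today"]
labels = ["positive", "negative", "neutral"]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), lowercase=True),  # lexical features
    LinearSVC(),                                           # SVM learning framework
)
model.fit(tweets, labels)
print(model.predict(["I really love the camera"]))  # predicted polarity label (toy data)
```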





Using Technology to Help People Save Food Effortlessly

Author Sabih Bin Wasi (CS 2015)

Faculty Advisor Thierry Sans, Ph.D.

Category Computer Science

Abstract According to the UN Food and Agriculture Organization, consumers waste around a quarter of their perfectly edible food, with figures rising to almost 50% in developed countries. This has not only resulted in a $1 trillion loss to the global economy but has also increased food demand, escalating food prices all over the world. When zoomed in to consumer lifestyle, user behavior analysis identified that although people realize that food waste is both morally and economically inapt, it requires immense effort to manage food expiry dates manually. Therefore, this research attempts to build an ecosystem for consumers that enables them to save food effortlessly. The system relies on a new encoding for barcodes that would include the expiry of the product. Hence, the system is sourced at the supermarket, another stakeholder in the food waste crisis, where each food item is tagged with LifeTags using a conventional stock management system (SMS). During our research, we successfully integrated this with a popular SMS. At every transaction at the POS, the expiry dates of perishable items are then transferred effortlessly to the user’s smartphone through a cloud service. In the subsequent phase of research, we attempted to utilize this information on the consumer’s smartphone. We designed the SpreadTheWord feature, through which information about potential ‘food waste’ is appropriately shared with the consumer’s neighborhood so anyone, including food-dependent not-for-profit organizations, can claim it. We also use the A* algorithm to construct the shortest path for these organizations to collect food economically. As an additional incentive, our proposed system uses web data extraction and data mining to suggest recipes based on consumer taste, utilizing the food in their kitchen. Overall, the system attempts to integrate into the consumer’s kitchen life and lets them rescue food waste effortlessly. Simultaneously, the system benefits supermarkets by helping them save 47% of their earnings, resulting in a profit of $26B. Taking into account UNFAO figures, if the system is used by the top four grocery retailers in the UK and the USA, it could bring the global food price hike down drastically.
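
As an illustration of the routing step mentioned above (not the project's code), here is a minimal A* shortest-path sketch in Python over a small hypothetical road graph connecting a collecting organization to a food donor.

```python
# Illustrative sketch (not the project's code): A* shortest path on a small weighted
# graph, as could be used to route a pickup from an organization to a food donor.
# The graph, coordinates, and node names are hypothetical.
import heapq, math

graph = {"org": {"a": 2.0, "b": 5.0}, "a": {"donor": 4.0}, "b": {"donor": 2.0}}
coords = {"org": (0, 0), "a": (1, 1), "b": (3, 1), "donor": (4, 0)}  # for the heuristic

def heuristic(u, v):
    (x1, y1), (x2, y2) = coords[u], coords[v]
    return math.hypot(x1 - x2, y1 - y2)          # straight-line distance (admissible)

def a_star(start, goal):
    frontier = [(heuristic(start, goal), 0.0, start, [start])]
    best = {start: 0.0}
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        for nxt, w in graph.get(node, {}).items():
            ng = g + w
            if ng < best.get(nxt, float("inf")):
                best[nxt] = ng
                heapq.heappush(frontier, (ng + heuristic(nxt, goal), ng, nxt, path + [nxt]))
    return float("inf"), []

print(a_star("org", "donor"))  # -> (6.0, ['org', 'a', 'donor'])
```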





SmartPC: Adding 4th Dimension to Computer Input

Authors Afrozul Haq Aziz (CS 2015), Sabih Bin Wasi (CS 2015), and Mohammad Abdullah Zafar (CS 2015)

Faculty Advisor Yaser Sheikh, Ph.D.

Category Computer Science

Abstract: Since the inception of personal computing, Human-Computer Interaction (HCI) has been a focus of the research community. From typing on a keyboard to clicking with a mouse to tapping on a touch screen, there has been a constant push to make user input as easy as possible for everyday users. While these input modalities get the job done, they are unimodal and context-blind. Therefore, in order to make interaction more intuitive for everyday users, this project proposes a fourth dimension to HCI using voice, gesture, and face-recognition modalities. In order to keep our system within reach of every laptop user, we built it using the upcoming standard Intel webcam. The webcam, together with its SDK, provided SmartPC with streams of raw data for each modality. Using multithreading, we built a framework that could handle OS authentication, music controls, and basic window actions like sound control and opening internet browsers. SmartPC was also an attempt to attach AI to HCI to interpret sound commands and gestures in the given user context and in-focus applications. Our study showed that the concept of the system was widely praised. In the usage test, which asked participants to perform a set of routine tasks on the laptop, over 5 times more clicks were needed to perform the tasks without SmartPC. Not only that, the time taken to complete the tasks was halved by using SmartPC. The real strength of the system, we believe, comes from its extensibility. Capitalizing on the potential of the functions we developed, we released an API for our system to let other programmers use the three modalities and develop applications. The API was unanimously appreciated within the Perceptual Computing course offered on campus.
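
Purely as an illustration of the multithreaded design described above (the actual system is built on the Intel webcam SDK, which is not shown here), here is a minimal Python sketch in which one worker thread per modality pushes recognized events onto a queue and a dispatcher maps them to actions; all event names and actions are hypothetical.

```python
# Illustrative sketch (not the SmartPC implementation): one worker thread per input
# modality pushes recognized events onto a shared queue, and a dispatcher maps them to
# actions for the current context. The modality sources here are simulated stand-ins
# for an actual webcam/SDK stream.
import queue, threading, time

events = queue.Queue()

def modality_worker(name, samples):
    for s in samples:                     # stand-in for reading an SDK stream
        events.put((name, s))
        time.sleep(0.01)

ACTIONS = {("voice", "play music"): "start media player",
           ("gesture", "swipe_left"): "previous window",
           ("face", "owner_recognized"): "unlock session"}

def dispatcher(expected):
    handled = 0
    while handled < expected:
        modality, value = events.get()
        action = ACTIONS.get((modality, value), "ignored")
        print(f"{modality}:{value} -> {action}")
        handled += 1

workers = [threading.Thread(target=modality_worker, args=("voice", ["play music"])),
           threading.Thread(target=modality_worker, args=("gesture", ["swipe_left"])),
           threading.Thread(target=modality_worker, args=("face", ["owner_recognized"]))]
for w in workers:
    w.start()
dispatcher(expected=3)
for w in workers:
    w.join()
```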





Contextual Spellchecker to Improve Human Robot Interaction

Author Naassih Gopee (CS 2016)

Faculty Advisor Majd F. Sakr, Ph.D.

Category Computer Science

Abstract This work focuses on developing a contextual spellchecker to improve the correctness of input queries to a multi-lingual, cross-cultural robot receptionist system. Queries that have fewer misspellings will improve the robot’s ability to answer them and in turn improve the effectiveness of the human-robot interaction. We focus on developing an n-gram model based contextual spellchecker to correct misspellings and increase the query-hit rate of the robot. Our test bed is a bi-lingual, cross-cultural robot receptionist, Hala, deployed at CMU-Q. Hala can accept typed input queries in Arabic and English and speak responses in both languages as she interacts with users. All input queries to Hala are logged. These logs allow the study of multilingual aspects, the influence of socio-cultural norms and the nature of human-robot interaction within a multicultural, yet primarily ethnic Arab, setting. A recent statistical analysis has shown that 26.3% of Hala’s queries are missed. The missed queries are due to either Hala not having the required answer in the knowledge base or to misspellings; we have measured that 50% are due to misspellings. We designed, developed and assessed a custom spellchecker based on an n-gram model, focusing our efforts on the English mode of Hala. We trained our system on our existing language corpus, consisting of valid input queries, making the spellchecker more specific to Hala. Finally, we adjusted the n in the n-gram model and evaluated the correctness of the spellchecker in the context of Hala. Our system makes use of Hunspell, an engine that uses an algorithm based on n-gram similarity, rule- and dictionary-based pronunciation data, and morphological analysis. Misspelled words are passed through the Hunspell spellchecker and the output is a list of possible words. Utilizing this list of words, we apply our n-gram model algorithm to find which word is best suited to a particular context. The model calculates the conditional probability P(w|s) of a word w given the previous sequence of words s, that is, predicting the next word based on the preceding n-1 words. To assess the effectiveness of our system, we evaluate it using 5 different cases of misspelled word location. The poster shows our results: ‘correct’ indicates that a sentence is correctly spellchecked, and ‘incorrect’ that the sentence did not change after passing through the spellchecker, included transliterated Arabic, or was incorrectly spellchecked, resulting in loss of semantics. We observed that context makes the spellchecking of a sentence more sensible, which results in a higher hit rate in Hala’s knowledge base. For case 5, despite having more context than the previous cases, the hit rate is lower. This is because other sources of error were introduced, such as the use of SMS language or a mixture of English and Arabic. In our future work we would like to tackle the above-mentioned problems and also work on a Part-of-Speech tagging system that would help in correcting real-word mistakes.
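
As a simplified illustration of the candidate-selection step (not Hala's spellchecker), here is a minimal Python sketch that picks among Hunspell-style candidates using bigram counts from a toy corpus.

```python
# Illustrative sketch (not Hala's spellchecker): choose among spelling candidates
# (e.g., as returned by Hunspell) using bigram counts from a small corpus.
# The corpus and candidate list here are toy examples.
from collections import Counter

corpus = ["where is the computer science faculty",
          "where is the library",
          "what is your job"]
bigrams = Counter()
for sentence in corpus:
    words = sentence.split()
    bigrams.update(zip(words, words[1:]))

def best_candidate(prev_word, candidates):
    """Pick the candidate w maximizing the count of the bigram (prev_word, w)."""
    return max(candidates, key=lambda w: bigrams[(prev_word, w)])

# "compter" is misspelled after "the"; the candidate list stands in for Hunspell output:
print(best_candidate("the", ["copter", "computer", "compote"]))  # -> computer
```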



Contextual Spellchecker to Improve Human Robot Interaction
Naassih Gopee & Majd Sakr, Carnegie Mellon University Qatar
{naassih,msakr}@cmu.edu

1. Problem

• Multi-lingual human-robot interaction introduces new challenges: misspellings of input to the robot could be due to multi-lingual aspects, context, or transliteration.
• We study how to improve the correctness of input queries by using a contextual spell checker.

2. Background

• Testbed: Hala, a bi-lingual robot receptionist.
• Previous studies show: 26.3% of input queries are missed by Hala; 50% of the latter are due to misspellings.
• Standard spell checkers are ineffective since they do not account for context (e.g.: "Whre is the compter science facuty" → "Where is the copter science faculty").

3. Motivation

• Reduce user frustration due to misspellings.
• Account for misspellings in named entities.
• Increase the query hit rate.
• Increase the number of turns in an interaction.
• Improve human-robot interaction.

4. Statistics of Interaction for 1246 days [1] (October 2009 - February 2013)

• Number of interactions (English / Arabic / Mixed): 85% / 12% / 3%.
• Total number of queries (English / Arabic): 90% / 10%.

5. Approach and Experimental Setup

How our contextual spellchecker works:
• Misspelled words are identified and passed to Hunspell. The Hunspell engine uses an algorithm based on n-gram similarity, as well as rule- and dictionary-based pronunciation data, to provide a list of the closest matching correct words.
• To choose a correct word, we use the surrounding context: the n-gram model calculates the conditional probability P(w|s) of a word w given the previous sequence of words s.
• The corpus for the n-gram model used in this work is based on Hala's current knowledge base of answers to potential questions.
• The best matching word is then chosen for the given context.
• Pipeline: input sentence → identify misspelled word → Hunspell computes list of possible words → identify most suitable word → spellchecked sentence submitted to Hala.

Experimental setup:
• Classified 24,659 missed queries into 5 cases:
  1. Single misspelled word sentence (e.g.: Helloo → hello).
  2. Misspelled word followed by a correct word (e.g.: thsnk you → thank you).
  3. Correct word followed by a misspelled word (e.g.: good mornig → good morning).
  4. Misspelled word located between 2 correct words; evaluate bigram & trigram (e.g.: are ypu beautiful → are you beautiful).
  5. Long sentence (3 < # of words < 7); includes a sequence of Case 4 and Case 2 (e.g.: whate is youre job → what is your job).
• For each case, a sample of 100 queries was tested: the samples were run through the spellchecker, and each sample was analyzed for the number of sentences correctly spellchecked.

6. Results (Cases 1-5)

Percentage of sentences corrected after passing through the contextual spellchecker (correct / incorrect):
• Case 1: 43 / 57
• Case 2: 53 / 47
• Case 3: 69 / 31
• Case 4 (Bigram): 66 / 34
• Case 4 (Trigram): 66 / 34
• Case 5: 50 / 50

• Case 1: no context is utilized.
• Case 2: context results in a 10% increase in the number of corrected sentences.
• Case 3: more context than Case 2; corrected sentences increase by 16%.
• Cases 4 & 5: despite having more context, accuracy is reduced since other sources of error are introduced.
• Increasing the n-gram did not improve accuracy in Case 4.

7. Conclusions

• Context enables more accurate spellchecking of a sentence, which leads to an improved hit rate: a 60% improvement in accuracy over no context (43% → 69%).
• Other multi-lingual aspects lead to a reduced hit rate: use of abbreviated, borrowed and informal terms; mixing transliterated (Romanized form) Arabic expressions within English text.

8. Future Work

• Implement a Part-of-Speech tagging system that will enable the correction of real-word mistakes.
• Implement an Arabic spellchecker for Hala.
• Devise a solution for problems due to transliterated input and abbreviated/informal input.

Reference: [1] Gopee, N., Mulaffer, L., Naguib, J., Sakr, M., & Ziadee, M. (2013). Interaction Analysis of a Multi-lingual Robot Receptionist. Carnegie Mellon University Qatar Meeting of the Minds Undergraduate Research Symposium, 19-20.



Enhancing Agent Gaze in Uncontrolled Environments

Author Mahmoud Al-Ismail (CS 2016)

Faculty Advisor Majd F. Sakr, Ph.D.

Category Computer Science

Abstract

Directing the gaze of a robot receptionist (Hala) towards the interlocutor under changing light conditions in uncontrolled environments presents a challenging problem. The difficulty lies in the severe changes of daylight during the day in our uncontrolled environment, which reduces the ability of a single RGB-D sensor to capture the features needed to identify faces. In this work we evaluate the accuracy of a single RGB-D sensor coupled with computer vision techniques to identify faces and direct Hala’s gaze in real time under severe light change conditions. This problem has been addressed by two approaches, which detect faces based on indoor depth information using Microsoft’s Kinect sensor [1] and by using edge detection [2]. However, those solutions are inadequate for our setting as the Kinect’s depth sensor has a limited range. Further, [1] and [2] require a controlled environment where the interlocutors’ faces are well illuminated and have suitable non-varying backgrounds. For our approach, we utilize the Viola-Jones algorithm [3] and develop a set of configuration parameters that are suitable for our setting’s changing light conditions. Our Viola-Jones detector utilizes a fixed window size (in pixels) to detect faces captured by the Microsoft Kinect’s RGB sensor. We evaluate the accuracy of our face detection system by capturing 90-second video frames of interlocutors over 13 time periods (8AM – 9PM) in three different locations in our uncontrolled environment as well as in an augmented controlled environment. The controlled environment we designed blocks backlight illuminations and illuminates the interlocutors’ faces well. The results of comparing the system’s accuracy in controlled and uncontrolled environments suggest that utilizing a fixed window size of (84x84) pixels achieves real-time performance of processing 30 fps. However, the results also indicate a severe degradation in accuracy between 12PM and 5PM due to changes in light conditions in both environments. For our uncontrolled environment, the Viola-Jones algorithm with a single RGB-D sensor provides low accuracy during daylight hours. The results led us to three conclusions: (1) utilizing a single RGB-D sensor and state-of-the-art computer vision techniques is ineffective at detecting faces under severe light changes, (2) blocking the backlight illuminations from the interlocutor’s face is not enough to improve accuracy between 12PM and 5PM, and (3) placing the sensor at different locations produces improved results for different time periods. In the future, we aim to evaluate the efficacy of other face detection algorithms for severe changes in light conditions and study the effectiveness of utilizing several RGB-D sensors placed at multiple locations. References [1] L. Xia, C. Chen, and J. Aggarwal, Human Detection Using Depth Information by Kinect, Proc. Int. Work. on Human Activity Understanding from 3D (HAU3D), pp. 22-15, June 2011 [2] R. Louban. Image Processing of Edge and Surface Defects Theoretical Basis 2009 [3] P. Viola and M. Jones, Robust Real-Time Face Detection. 2001
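
As an illustration of the detection setup described above (not the project's code), here is a minimal OpenCV sketch of Viola-Jones face detection constrained to a fixed window size; the camera index and cascade file are assumptions.

```python
# Illustrative sketch (not the project's implementation): Viola-Jones face detection
# with OpenCV, restricted to a fixed detection window as described in the abstract.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture(0)                      # RGB stream (e.g., the Kinect's RGB camera)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                     minSize=(84, 84), maxSize=(84, 84))  # fixed window
    for (x, y, w, h) in faces:
        # the face center (x + w // 2, y + h // 2) is what would drive the robot's gaze
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("faces", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```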



Enhancing Agent Gaze In Uncontrolled Environments

Mahmoud Al-Ismail (mahmoudi@cmu.edu), Majd Sakr (msakr@cmu.edu)

Problem

• Improve natural interaction with robots in everyday settings (e.g. robot receptionist)
• Direct the gaze of a robot receptionist (Hala) towards the interlocutor in real time
• Allow the robot to use gaze and face gestures to improve the interaction
• Hala is placed in an uncontrolled environment
• Severe changes of daylight during the day make the problem of identifying faces challenging
• Evaluate the accuracy of face detection with a low cost sensor and existing face detection techniques in our uncontrolled environment

Motivation

• Direct the gaze of a robot receptionist towards the interlocutor
• Enable a robot receptionist to sense its environment under changing conditions
• Mitigate the accuracy implications of changes in lighting and shadows
• State of the art computer vision techniques face difficulties in such environments

[Figure: the uncontrolled environment during the day, at 9:00 AM, 11:00 AM, 12:30 PM, 2:00 PM, 5:00 PM and 8:00 PM.]

Existing Solutions

• Detecting people based on indoor depth information using Kinect [1]
  - Works well in uncontrolled environments, but the Kinect's depth sensor range is inadequate for our setting
• Face detection using edge detection [2]
  - Requires a controlled environment where the interlocutor's face is well-illuminated and has a suitable background

Our Solution

• Evaluate the accuracy of a single RGB-D sensor with the Viola-Jones real-time face detection algorithm [3]
• Identify the light change conditions for our setting
• Develop a set of parameters for Viola-Jones that are suitable for our environment
• Identify a small fixed window size designed to detect the interlocutor's face
• Evaluate and compare accuracy in a controlled environment and an uncontrolled environment
• Compare the impact of the location of the RGB-D sensor on the accuracy of face detection
• Achieve real-time face detection (performance of 30 fps)

Evaluation

• Capture 90-second video frames of the interlocutors over 13 time periods (8AM - 9PM)
  - 60-minute periods (8AM - 12PM, 2PM - 5PM, and 7PM - 9PM)
  - 30-minute periods (12PM - 2PM and 5PM - 7PM)
  - Data collected in the month of July
• Capture the video in 3 different locations to evaluate accuracy
• Evaluate the accuracy of the face detection system in a controlled environment by blocking backlight illuminations and illuminating the interlocutor's face
• Evaluate the accuracy of the face detection system under our uncontrolled environment setting
• Compare the accuracy of the face detection system for controlled and uncontrolled environments for the 3 locations of the RGB-D sensor

[Figure: camera view of the controlled environment and placement of the Kinect at Locations 1, 2 and 3.]

Results

• Use a fixed window of size (84x84) pixels to detect the interlocutor's face to achieve real-time face detection
  - Reduces the computational load
  - Reduces the time to detect the interlocutor's face per frame
• Under the controlled environment:
  - Locations 1 and 2 are far superior to Location 3; in Location 3, the RGB-D sensor is subjected to more daylight, making it ineffective at capturing facial features
  - Degradation in accuracy between 12PM and 5PM due to severe changes in light conditions
  - Better detection accuracy in the morning period than in other periods
• Under the uncontrolled environment:
  - The Viola-Jones algorithm with a single RGB-D sensor provided low accuracy during daylight hours
  - Different sensor locations produce different accuracies for different time periods
  - Larger window sizes and tuning Viola-Jones parameters did not improve performance

[Figures: detector accuracy over the time of day for the controlled environment at all locations, and comparisons of controlled and uncontrolled environments at Locations 1, 2 and 3.]

Conclusions

• A single RGB-D sensor and a standard computer vision technique are ineffective at detecting faces in uncontrolled conditions with severe light changes
• Given our challenging environment, controlling the lighting conditions by blocking backlight illumination from the interlocutor's face is not enough to improve the accuracy during severe light changes between 12PM - 5PM
• Placing the sensor at different locations produces improved results for different time periods
• Face detection in uncontrolled environments with severe lighting conditions is a very challenging problem

Future Work

• Evaluate the efficacy of other face detection algorithms for severe changes in light conditions • Evaluate the accuracy of other inexpensive RGB-D sensors • Study the effectiveness of utilizing several RGB-D sensors placed at multiple locations

References

• [1] L. Xia, C. Chen, and J. Aggarwal, Human Detection Using Depth Information by Kinect, Proc. Int. Work. on Human Activity Understanding from 3D (HAU3D), pp. 22-15, June 2011 • [2] R. Louban. Image Processing of Edge and Surface Defects: Theoretical Basis of Adaptive Algorithms with Numerous Practical Applications, volume 123, chapter Edge Detection, pages 29–9. Springer Berlin Heidelberg, 2009 • [3] P. Viola and M. Jones, Robust Real-Time Face Detection. 2001

Funded and Sponsored by QSIURP



Seeing is Learning: Accessible Technologies for Universal Learning

Author Kenrick Fernandes (CS 2014)

Faculty Advisor Divakaran Liginlal, Ph.D.

Category Computer Science

Abstract Qatari citizens and residents are increasingly reliant on information and communication technology services such as Hukoomi and e-banking to acquire critical information and conduct commercial and personal transactions. The related end-user applications are usually designed with the average computer-literate user in mind. However, two key segments of the population are inhibited from using them in the Qatari context: people with disabilities and an aging population with functional disabilities. The Qatar Statistics Authority revealed that the number of persons with disabilities grew 23% during the 2007-2009 period and that the number of elderly (65+) persons doubled in the last decade. Local organizations such as ictQatar and MADA, Qatar’s Accessibility Centre, are working to address this issue at both the policy and technology levels. This research showcases a framework for effectively integrating assistive technologies such as gaze tracking to provide an intuitive, easily navigable experience for the above-mentioned segments of the population. We designed two prototypes using state-of-the-art gaze tracking technology to demonstrate how the framework can be implemented for the specific tasks of learning the Arabic language and music. We plan to evaluate these prototypes with students first, simulating real constraints that the applications would face. Further iterations will be tested in the field, such as in a Hukoomi session, in collaboration with MADA. Our framework can be applied generally for creating assisted experiences in a constrained virtual space, including online e-commerce interactions, desktop software and other gaze-reliant human-computer interactions.
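
As a simplified illustration of how gaze coordinates can drive such an interface (the actual framework is written in C# against the Tobii REX eye tracker, which is not shown here), here is a minimal Python sketch that hit-tests a gaze point against hypothetical on-screen controls and confirms the selection with a key press.

```python
# Illustrative sketch (not the project's C# framework): map a gaze point reported by an
# eye tracker to the on-screen control it falls on. Control names and rectangles are
# hypothetical; a real system would read coordinates from the eye tracker's SDK.

CONTROLS = {                      # name -> (x, y, width, height) in screen pixels
    "letter_alif": (100, 200, 150, 150),
    "letter_ba":   (300, 200, 150, 150),
    "play_note":   (500, 200, 150, 150),
}

def control_under_gaze(gx, gy):
    """Return the name of the control containing the gaze point, or None."""
    for name, (x, y, w, h) in CONTROLS.items():
        if x <= gx < x + w and y <= gy < y + h:
            return name
    return None

def on_key_press(gaze_point):
    """Confirm the gazed-at control with a key press (as in the prototypes)."""
    target = control_under_gaze(*gaze_point)
    if target is not None:
        print(f"activating {target}")   # e.g., play the sound for that control

on_key_press((350, 260))  # -> activating letter_ba
```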



SEEING IS LEARNING:

ACCESSIBLE TECHNOLOGIES FOR UNIVERSAL LEARNING

Student

Kenrick Fernandes CS 2014 kenrick@cmu.edu

Faculty Advisor

Divakaran Liginlal IS Department liginlal@cmu.edu

Qatar : ICT Leader

Our Framework


We designed an out-of-the-box framework in C# using Microsoft Visual Studio. The framework interfaces with the Tobii REX Eyetracker to create a gaze-controlled application session. The application's functionality is content independent and can easily be modified for other contexts.

• Is 1st in GDP/capita worldwide
• Internet Penetration Rate ~ 90%
• Mobile Penetration Rate > 166%
• Qatar ranked 27th of 193 in UN E-Government Survey
• Hukoomi used by > 25% of the population
• 101 personal computing devices per 100 employees in government
• Major telecom and banking services available online

Functional Framework
The eye tracker detects and sends gaze coordinates to the application; the current point of gaze is drawn on screen; the user selects visually accessible, interactive on-screen controls with gaze.

Qatar’s Accessibility Challenges

Learning Arabic and Music

1. Disabled Persons

We applied the framework in the contexts of learning the Arabic alphabet and learning music. Gaze-selected controls on the screen can be “pressed” via a keyboard interaction. Pressing a control plays the sound for that control. The gaze point is drawn on the screen at all times so the user is aware of his/her own “location”.
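As a rough illustration only (the project's framework is written in C# against the Tobii REX; this is not its code), the following Python sketch shows the kind of gaze hit-testing such an application performs. The Control layout, the on_gaze_sample hook, and play_sound are hypothetical placeholders.

from dataclasses import dataclass

@dataclass
class Control:
    """A rectangular on-screen control with an associated sound (hypothetical)."""
    name: str
    x: int
    y: int
    width: int
    height: int
    sound_file: str

    def contains(self, gx: int, gy: int) -> bool:
        # True if the gaze point (gx, gy) falls inside this control's rectangle.
        return (self.x <= gx <= self.x + self.width
                and self.y <= gy <= self.y + self.height)

def control_under_gaze(controls, gx, gy):
    # Return the control currently selected by gaze, or None.
    for c in controls:
        if c.contains(gx, gy):
            return c
    return None

# Hypothetical layout: two Arabic-letter tiles.
CONTROLS = [
    Control("alif", x=0, y=0, width=200, height=200, sound_file="alif.wav"),
    Control("ba", x=220, y=0, width=200, height=200, sound_file="ba.wav"),
]

def on_gaze_sample(gx, gy, key_pressed, play_sound):
    # Called for every gaze sample reported by the eye tracker (hypothetical hook).
    selected = control_under_gaze(CONTROLS, gx, gy)
    if selected is not None and key_pressed:
        play_sound(selected.sound_file)  # "press" the gaze-selected control
    return selected  # the caller highlights this control and draws the gaze point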

• Approximately 12% of the population faces a disability of some kind
• The number of disabled persons grew 23% between 2007 and 2009
• Non-disclosure is still a major challenge; the numbers are an under-estimate

2. Ageing Population
• Qatari population living longer due to high quality healthcare
• 2010 census showed numbers of the elderly (65 and above) doubled in the last decade
• Functional disabilities are a concern among this segment

Most services are developed for the fully functional, computer-literate end-user. There is a need to improve the accessibility of these applications for these segments of the Qatari population. Eye-tracking is a widely used technology for making applications more accessible on a personal level.

No Button Selected

Gaze-Selected Button

Highlighting the selected control is location-dependent. If the user has to look elsewhere to confirm the selection, the application has failed to serve its purpose. With the Seeing is Learning framework, users with functional disabilities can interact efficiently and intuitively with location-dependent applications.

Future Work
A key area for improvement lies in eliminating the need for other input devices like the keyboard. This also requires robust algorithms to detect the difference between input noise and intended action. Such algorithms will depend on training data from specific user profiles and must be tested in the field. User testing with students in a controlled environment has already been carried out to refine the user experience. We intend to conduct in-field user testing next, building up to deployment. This testing will be carried out in collaboration with subject-matter experts including ictQatar and MADA.

A demo of the application is below; try it out.

Sources
1. Qatar Statistics Authority
2. MADA, Qatar's Accessibility Centre



Supercharging Hadoop for Efficient Big Data Analytics Author Kenrick Fernandes (CS 2014)

Faculty Advisor Mohammad Hammoud, Ph.D.

Category Computer Science

Abstract MapReduce is now a pervasive analytics engine for Big Data. Hadoop is an open source implementation of MapReduce and is currently enjoying wide popularity. Hadoop offers a large number of configuration parameters, which makes it difficult for practitioners to conduct efficient and cost-effective analytics on the cloud. In this work we observe that MapReduce performance is highly impacted by the concurrency of the map phase. Consequently, we propose a standalone predictor that supercharges Hadoop by guiding map concurrency configuration and expediting MapReduce performance. More precisely, our predictor can accurately estimate the optimal settings of the three main factors that dictate map concurrency, the HDFS block, the dataset and the cluster sizes. A user simply provides minimal information about her/his workload and our predictor can rapidly offer suggestions for optimally configuring map concurrency for the workload. Unlike many of the related schemes, our predictor does not employ simulation, dynamic instrumentation, and/or static analysis of unmodified MapReduce code to achieve its objectives. In contrast, it relies on a mathematical model which leverages internal MapReduce characteristics that influence map concurrency. We have implemented our predictor and conducted comprehensive experiments on a private cloud and on Amazon EC2 using Hadoop 1.2.0. Our results show that our scheme can correctly predict the best map concurrency configurations for the tested benchmarks and provide users with up to 2.2X speedup in runtime.

30



Supercharging Hadoop for Efficient Big Data Analytics Kenrick Fernandes and Mohammad Hammoud {kenrick,mhhamoud}@cmu.edu

Problem

Hadoop MapReduce Hadoop MapReduce is now a pervasive cloud analytics engine. The growing list of users currently includes mammoths like Google, Amazon and Ebay.

■ Hadoop has > 190 configurable parameters, out of which 10-20 have significant impact on job performance
■ The challenge is to: 1. Run MapReduce applications economically 2. While still achieving good performance
■ This challenge can be tackled by effectively configuring Hadoop, currently a burden on Hadoop users.

Our Focus

Characterizing Map Concurrency

Our work focuses on two influential parameters: the numbers of Map tasks and Map slots, which together define Map concurrency.

■ What are the tradeoffs as map concurrency is varied? As the number of Map waves is increased:
■ Map Setup Time increases - Cost
■ Data Shuffling starts earlier (i.e., earlier Early Shuffle) - Opportunity
(The relation below shows how the number of Map waves follows from the configuration.)
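As background, and independent of this work's specific model, the number of Map waves follows from the dataset size D, the HDFS block size B, the number of Map slots per node s, and the number of nodes N (standard Hadoop behavior, stated here only to make the tradeoff concrete):

\[ \text{Map tasks} = \left\lceil \frac{D}{B} \right\rceil, \qquad \text{Map waves} = \left\lceil \frac{\lceil D/B \rceil}{s \cdot N} \right\rceil . \]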

Mathematical Model
Inputs: the MapReduce job, the dataset, and an objective. The model's factors include the shuffle data size, the shuffle rate, the single Map wave time (MST), the reduce time, and the initial number of Map slots (times measured in microseconds).

Our Predictor
The predictor outputs the best Map concurrency in terms of:
■ the best HDFS block size for given cluster and dataset sizes,
■ or the best cluster size for given HDFS block and dataset sizes,
■ or the best dataset size for given HDFS block and cluster sizes.

Supercharging Hadoop: Applying the Mathematical Model
We can use our mathematical model to predict the best map concurrency for any given MapReduce application by (a sketch follows below):
■ fixing all of the model's factors except the number of Map waves,
■ measuring the runtime for a range of Map wave numbers,
■ and selecting the minimum runtime.
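A minimal Python sketch of that sweep, assuming a caller-supplied predict_runtime function that stands in for the paper's calibrated model (the model itself and its factor values are not reproduced here):

import math

def num_map_waves(dataset_bytes, block_bytes, slots_per_node, nodes):
    """Map tasks run in waves of size (slots_per_node * nodes)."""
    map_tasks = math.ceil(dataset_bytes / block_bytes)
    return math.ceil(map_tasks / (slots_per_node * nodes))

def best_block_size(dataset_bytes, slots_per_node, nodes,
                    candidate_block_sizes, predict_runtime):
    """Fix every factor except the number of Map waves (varied here via the HDFS
    block size), predict the runtime for each setting, and keep the minimum."""
    best = None
    for block in candidate_block_sizes:
        waves = num_map_waves(dataset_bytes, block, slots_per_node, nodes)
        runtime = predict_runtime(waves)
        if best is None or runtime < best[1]:
            best = (block, runtime)
    return best  # (block size, predicted runtime)

# Hypothetical usage with a toy runtime model (illustration only, in microseconds).
toy_model = lambda waves: 30_000_000 * waves + 90_000_000 / waves
print(best_block_size(dataset_bytes=64 * 2**30, slots_per_node=2, nodes=8,
                      candidate_block_sizes=[32*2**20, 64*2**20, 128*2**20, 256*2**20],
                      predict_runtime=toy_model))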

Conclusion

Results
[Charts: HDFS block size (KMeans benchmark), cluster size (WordCount benchmark), and dataset size (WordCount benchmark).]

■ Map concurrency impacts MapReduce performance tremendously
■ To predict map concurrency, we developed a mathematical model that exploits two main MapReduce characteristics, data shuffling and map setup time
■ Our model works successfully on a private cloud and on Amazon EC2



Descriptive Minicomplexity Authors F. Lamana Mulaffer (CS 2015)

Faculty Advisor Christos A. Kapoutsis, Ph.D.

Category Computer Science

Abstract

Minicomplexity theory is a relatively new parallel to standard complexity theory. In both theories, we analyze the complexity of theoretical computational models. In standard complexity theory, the computational model we use is the Turing Machine and we measure the time it takes to solve a problem. In minicomplexity theory, the computational model we use is the Finite Automaton (FA) and we measure the size, namely the number of states, it requires to solve a problem. This research is on Descriptive Minicomplexity, the term used for defining logical formulae ("descriptors") that are equivalent to the computational model under consideration, in our case the FA. A descriptor is equivalent to an FA if and only if it describes exactly the strings accepted by the FA. In our research we wanted to answer the following two questions: Given a "small" FA of some type, can we build a "small" equivalent formula? And is this process reversible? We chose to develop descriptors for two-way nondeterministic finite automata (2NFAs). The descriptors are called graph accessibility formulae in DNF (GA/DNF). These are formulae in First-Order logic with Successor plus Transitive Closure (FO[S]+TC). A GA/DNF is a formula with the following structure: TC[φ](a, b). This formula is true if and only if there is a path from vertex a to vertex b in the graph described by φ(x, y). We were able to prove the following theorem: Theorem 1. For every family of problems L: L has small 2NFAs if and only if L has small and "weak" GA/DNFs. A 2NFA is small if it has a polynomial number of states. A GA/DNF is small if it has a polynomial number of symbols. It is "weak" if it obeys some restrictions on what kinds of clauses we can include in the formula when talking about certain positions in the string.

32



Descriptive Minicomplexity

F. Lamana Mulaffer (CS 2015) Christos A. Kapoutsis (Adviser)

Context: Computational Complexity vs. Descriptive Complexity

Problems can be solved either by algorithms or by logical formulae. Example: Imagine five policemen were assigned to guard all roads. Are there five roundabouts where they can stand and achieve this?

Abstractly, given a graph G = (V,E ) and a number k, are there k vertices that cover all edges?

for every set S of k vertices:
    for every edge e:
        if e is touched by S: mark e
    if all edges are marked: return TRUE
return FALSE

Researchers have established equivalences between some types of algorithms and some types of logical formulae. Example: NP = ∃SO (Fagin's Theorem).

Problem: Nondeterministic Read-Only Algorithms

In our research, we focused on a special kind of algorithms: "small" Nondeterministic Read-Only (NRO) Algorithms. We asked what kind of formulae are equivalent to these algorithms. We identified such a class and proved the equivalence. Example: Imagine that many roads in West Bay are blocked due to construction. Is there a way to enter from ENTRANCE and exit from EXIT?

Abstractly, given a graph G = (V,E ) and vertices s and t, is there a path from s to t in G ?

start at v = s
repeat:
    if v = t: return TRUE
    if v has no neighbours: return FALSE
    nondeterministically:
        select a neighbour u
        set v to u
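A deterministic computer cannot make the nondeterministic choice directly, but the same reachability check can be simulated by exploring all neighbours. A small illustrative Python sketch (not part of the poster; the road map is hypothetical):

def reachable(graph, s, t):
    """Is there a path from s to t?  graph maps each vertex to its neighbours.
    The nondeterministic 'select a neighbour' step is simulated by trying them all."""
    seen, stack = {s}, [s]
    while stack:
        v = stack.pop()
        if v == t:
            return True
        for u in graph.get(v, []):
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return False

# Example: a blocked-roads map as an adjacency list.
roads = {"ENTRANCE": ["a"], "a": ["b"], "b": ["EXIT"]}
print(reachable(roads, "ENTRANCE", "EXIT"))  # True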

What kinds of formulae are equivalent to these algorithms?

Solution: 2Way Nondeterministic Finite Automata vs. 1D Graph-Accessibility DNF We modelled NRO Algorithms using 2NFAs.

Every 2NFA with k states can be converted to a 1D GADNF of length polynomial in k.

A 1D GADNF is a formula of the following form:

2NFA

A 2NFA encodes the problem in a string and determines the solution by scanning along the string multiple times.

Every 1D GADNF of length k can be converted to a 2NFA where the number of states is polynomial in k.

The formula is true iff there’s a path from vertex +1 to vertex -1 in the graph described by φ.
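In symbols, following the form given in the abstract (the poster's exact rendering of the formula is not reproduced here), a 1D GADNF can be sketched as

\[ \mathrm{TC}\bigl[\,\varphi(x,y)\,\bigr](+1,\,-1), \]

where φ(x, y) is a formula in disjunctive normal form over FO[S], and the whole formula is true iff the graph described by φ has a path from vertex +1 to vertex -1.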

Future research includes finding formulae that are equivalent to Deterministic Algorithms.



Is an Accessible Website a More Usable One? Author Sarah Mustafa (IS 2014)

Faculty Advisor Selma Limam Mansar, Ph.D.

Category Information Systems

Abstract The effect of the World Wide Web is noticeable in organizations, businesses, societies, and individuals. When websites are evaluated, two important website qualities are examined: accessibility and usability. E-accessibility is "a measure of the extent to which a product or service can be used by a person with a disability as effectively as it can be used by a person without that disability for purposes of accessing or using ICT related products or services" (Qatar's e-Accessibility Policy, 2011). Several studies have explored the accessibility status of websites and studied the effect of accessible websites on users, especially users with special needs. Many tools have been developed to automatically assess accessibility. It remains difficult to convince organizations to make their websites accessible. The authors are motivated to explore whether accessibility makes websites more usable. If such a claim turns out to be true, it will become economically persuasive to convince website owners to invest in accessibility. This study focuses on the education sector because universities and schools have a social responsibility to serve students and faculty with special needs and provide them with the same level of education as other, abled people. We present a five-stage methodology to study the impact of a website's accessibility on its usability. A total of four tools will be used to assess both the accessibility and the usability of educational websites. Then, a framework will be produced to act as a single tool for website usability measurement.

34



Is an Accessible Website a More Usable One?
Sarah Mustafa . smustafa@qatar.cmu.edu . Advisor: Selma Limam Mansar . Honors Thesis in Information Systems, Dietrich College . April, 2014

Background
Accessible websites are ones that disabled people are able to interact with despite their disabilities. The Web Accessibility Initiative (W3C, 2005) sets up guidelines and standards for the Web. Governments, including Qatar (MADA, 2013), have been pushing organizations and website owners to make their websites accessible. Until laws are passed, convincing website owners is proving to be a challenge. On the other hand, website owners seek to make their websites usable to increase interest and traffic on their sites. There is hardly any effort needed to convince website owners to invest in usability testing. This thesis investigates whether it is possible to demonstrate that an accessible website is a more usable one. If such a claim appears to be true, website owners might be willing to invest in making their websites more accessible to gain in usability.

Problem
- While there are standards, guidelines (e.g. WCAG 2.0), and assessment tools for accessibility, there is no one method that allows websites to be scored or ranked based on accessibility.
- While there are standards for web usability (e.g. ISO 9241-11, 1998) and tools and methods to evaluate usability (e.g. the QUIM model by Seffah, Donyaee, & Klein, 2006), there is no standardized and unified framework for measuring a website's usability.

Research Question
- Accessibility: For each one of the standards, which tools should be used to conduct an accessibility study? How can the results be turned into a score to rank a website? Are all the elements equally important?
- Usability: For each one of the factors and criteria used in the QUIM model, which ones are relevant to our study? Which ones can be quantitatively measured, and how? Which ones need user evaluation, and can we develop heuristics to test them?

Methodology
Stage 1: Select websites with related themes for assessment.
Stage 2: Assess accessibility through two tools: WAVE (WebAIM, http://wave.webaim.org) and Color Contrast Analyser (The Paciello Group). Element checks include: language check; color contrast below the minimum ratio of 4.5:1; use of tables for webpage layout; missing ALT text for linkable images; nested heading elements; and other elements. The outcome is the website's Accessibility Score.
Stage 3: Produce a unified usability framework from the usability factors and criteria in QUIM (Seffah, Donyaee, & Klein, 2006), using Multi-Attribute Utility Theory (MAUT) and Principal Component Analysis (PCA) on collected data. [Table: QUIM matrix mapping factors (Efficiency, Effectiveness, Productivity, Satisfaction, Learnability, Safety, Trustfulness, Accessibility, Universality, Usefulness) to criteria (Time Behavior, Resources Utilization, Attractiveness, Likeability, Flexibility, Minimal Action, Minimal Memory Load, Operability, User Guidance, Consistency, Self-descriptiveness, Feedback, Accuracy, Completeness, Fault-tolerance, Resource Safety, Readability, Controllability, Navigability, Simplicity, Privacy, Security, Insurance, Familiarity, Loading Time).]
Stage 4: Perform the usability assessment through the usability framework. Sample questions (weighted between 0.1 and 0.3): it is easy to identify where I am in the website; clickable items stylistically indicate that they are clickable; links are labeled with anchor text that provides a clear indication of where they lead; global navigation is on every page; the use of colors is balanced on the page; the page layout is structured and symmetrical; media (photos, videos, and audio) is well used. The outcome is the website's Usability Score.
Stage 5: Rank all websites from most accessible to least accessible based on the Accessibility Score, and from most usable to least usable based on the Usability Score. Compare the two ranked lists and find the correlation between accessibility and usability.

Contribution
- The overall goal of this thesis was to propose a method for comparing a website's usability performance to its accessibility performance.
- We have proposed a framework for comparing websites' accessibility status.
- We have proposed a framework for measuring and testing a website's usability.

Future Work
1- Complete the usability framework.
2- Find the correlation between accessibility and usability.
3- Build a usability assessment interface.
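As a rough illustration of the MAUT-style scoring in Stage 4 (a minimal sketch; the question weights and ratings below are hypothetical, since the poster's exact weight-to-question mapping is not recoverable from the source):

def usability_score(ratings, weights):
    """MAUT-style score: weighted sum of per-question ratings.
    ratings: question -> score in [0, 1]; weights: question -> weight (summing to 1)."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights should sum to 1"
    return sum(weights[q] * ratings[q] for q in weights)

# Hypothetical example with three of the poster's heuristic questions.
weights = {"easy to identify where I am": 0.3,
           "clickable items look clickable": 0.4,
           "global navigation on every page": 0.3}
ratings = {"easy to identify where I am": 0.8,
           "clickable items look clickable": 0.6,
           "global navigation on every page": 1.0}
print(round(usability_score(ratings, weights), 2))  # ~0.78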



Effect of Website Localization on Impulse Buying Behavior of Arab Shoppers Author Noora Al-Maslamani (IS 2014)

Faculty Advisor Divakaran Liginlal, Ph.D.

Category Information Systems

Abstract This research work studies online impulse buying behavior of Arab shoppers on e-commerce websites in the Middle East. Specifically, it focuses on culturally attuned e-commerce websites and their potential effects on impulse buying behaviors. The influence of cultural cues is significant and we believe they are more likely to drive Arab consumers to buy impulsively from these e-commerce websites. A total of 32 university students of Qatari and other Arab nationalities participated in the study, which aims to understand the influence of perceived usefulness, perceived enjoyment, perceived cultural affinity and their relation to a consumer’s urge to buy impulsively on e-commerce websites. A questionnaire primarily based on established measures from extant literature was administered to participants to understand their perceptions of the design of a preselected set of websites. The results reveal that differences in websites’ localization levels do not significantly affect their impulse buying behavior. A prominent reason might be that the websites’ sample was selected from existing e-commerce websites, which might not be effectively constructed to stimulate Arab consumers’ impulse buying behavior. A further qualitative study conducted on these websites based on participant observations and a tool that predicts the areas of interest to a user provides better explanation for this. Future research will aim to use stimuli materials that are purposefully constructed to reflect different levels of localization.

36



The Effect of Website Localization on Impulse Buying Behavior of Arab Shoppers Impulse buying is any purchase that is not planned in advance. Website localization implies that the website embraces cultural markers that characterize a certain culture.

Research Method
• 32 university students of Arab nationalities
• Random Arabic e-commerce websites were presented

Research Objectives

Three different levels of localization for the websites

This research focuses on the importance of integrating cultural cues into the design of Arabic e-commerce websites to enhance Arab shoppers’ impulse buying behavior.

Hypotheses, Model & Results H1: Perceived Cultural Affinity will affect Perceived Usefulness positively. H2: Perceived Cultural Affinity will affect Perceived Enjoyment positively. H3: Perceived Cultural Affinity will affect Urge to Buy Impulsively positively.

[Research model diagram: Perceived Cultural Affinity linked to Perceived Usefulness, Perceived Enjoyment, and Urge to Buy Impulsively; paths marked as Supported, Not supported, or New findings.]

A questionnaire was administered to participants. Quantitative and qualitative results were analyzed.

Cross-validation Study
EyeQuant was used to evaluate the websites at three levels of localization (low, medium, high). The Areas of Interest (AOI), which are the focal points of user eye activity, were used as a metric for comparison with the results of the main study.

Conclusions

The presence of cultural cues does not necessarily induce an impulse urge to buy among Arab shoppers. Likely cause: not all of the cultural elements on the websites can be easily and immediately seen by users, and only a few of them are recognized by EyeQuant.

[Table: average AOIs, cultural elements, and cultural elements within AOIs compared across the low, medium, and high localization levels.]

Future Research

Experimental studies are necessary to measure online impulse buying behavior of Arab shoppers on e-commerce websites specifically customized to stimulate impulse buying behavior. Note: The samples selected from existing e-commerce websites might not have been effective at stimulating impulse buying behavior.

Noora Al-Maslamani nmaslama@qatar.cmu.edu Advisor: Prof. Divakaran Liginlal, Information Systems



On the Relevance of Cultural Intelligence for Technology Acceptance Authors Muhammad Jaasim Polin (IS 2014)

Faculty Advisor Selma Limam Mansar, Ph.D.

Category Information Systems

Abstract Technology acceptance has been widely studied in the field of information systems (IS) (Venkatesh et al., 2003) and several theories have been developed explaining behavioral characteristics of individuals in accepting information technology. With the growth of globalization, technology is overcoming cultural barriers and is no longer isolated in usage and design. In the context of technology usage, technologies are being developed to be used globally (Gregory et al., 2003). Studies have been conducted to explain how culture affects technology acceptance, but most are focused on the individual's national culture. We were not able to find studies that link openness to other cultures (often referred to as "cultural intelligence" (CQ)) to technology acceptance. This study explores the concept of cultural intelligence and its impact on technology acceptance. A survey consisting of the Unified Theory of Acceptance and Use of Technology and the CQ-Scale is presented to explore the impact of CQ on technology acceptance, if any. The research intends to investigate various technologies that are being used on a higher education campus, how they are being accepted by students, and the effect of cultural intelligence, if any, on the acceptance rate. In the context of ensuring successful technology adoption, this research will give recommendations to organizations and institutions to provide the appropriate environment and training.

38



On the relevance of cultural intelligence for technology acceptance Introduction

Information technology tools are used globally by institutions and organizations irrespective of culture and in various professional settings. Studies have been conducted to understand the cultural effects on the use of technology and its development (Linjun et al., 2003). According to Strite (2006), technology acceptance across cultures is an important factor to reap IT benefits in modern globalized businesses or organizations. Hence, the ability to cope with cross-cultural differences (Cultural Intelligence)[CQ] is becoming a vital aptitude in modern-day organizations.

Problem Statement • Organizations cannot guarantee maximum efficiency of technologies, although implementation can be successful (Zakour, 2004) • The effects of culture on technology acceptance have been widely studied, however the effects of cultural intelligence on technology acceptance have not been covered.

Technology and Cultural Intelligence

Culture vs. Cultural Intelligence
• Culture: patterned ways in which people think, feel, and react to various situations and actions.
• Cultural Intelligence (CQ): the capability of an individual to function and manage effectively in culturally diverse settings (Ang et al., 2008). It is not fixed and is not tied to any particular culture.

WHY

Technology Acceptance • Relies on ‘intention to use’ and ‘actual use’ of technology • Two most widely used and validated models in literature: TAM and UTAUT (Unified Theory of Acceptance and Use of Technology)

Cultural Intelligence: CQ-Scale (Soon Ang et al., 2007)

Technology Acceptance: UTAUT (Venkatesh et al., 2003)

Metacognitive CQ

Performance Expectancy

Cognitive CQ

Effort Expectancy CQ-Score

Behavioral Intention

Cultural Intelligence

Motivational CQ

Social Influence

Behavioral CQ

Facilitating Conditions

Gender

Age

Use Behavior

Voluntariness

Moderating Variables

*CQ:Cultural Intelligence

Results

36 Participants

Participant Characteristics 1. CMUQ Students taking specific courses* 2. Age 18 and above 3. Study Period: 20/03/14 - 01/04/14

61% Female

39% Male

Cultural Intelligence

Methods 1. Survey by Van Dyne et al.(2007) to assess level of cultural intelligence 2. Survey based on UTAUT theoretical framework by Venkatesh et al. (2003)

Technology Acceptance

Technologies Studied

CQ Survey Summary 1. Average CQ-score: 20/28 2. Highest CQ-score: 27/28 3. Lowest CQ-score: 11/28 4. Model Reliability: 0.93 (Cronbach)

1. Tablet PC (iPad) 2. Project Management Software (Redbooth) 3. ERP Software (SAP) 4. Accessibility Software (JAWS)

0.22 Pearson Correlation

Most Correlated Constructs

• CQ mostly correlated with Facilitating Conditions and Social Influence of UTAUT • Metacognitive CQ and Motivational CQ moderately correlated with Social Influence and Facilitating Conditions

• An overall cultural intelligence score has a weak* but positive relevance to technology acceptance (Pearson Correlation: 0.2) • An overall CQ-score had a particular influence on Social Influence (SI) and Facilitating Conditions (FC) of the UTAUT
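For reference, a minimal Python sketch of the two statistics reported here (Cronbach's alpha for the model reliability and the Pearson correlation between CQ and acceptance), computed from hypothetical toy data rather than the study's responses:

import numpy as np

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents x n_items) score matrix."""
    items = np.asarray(items, float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical toy data: 5 respondents, 4 survey items, plus CQ/acceptance pairs.
survey = [[3, 4, 4, 5], [2, 2, 3, 3], [5, 5, 4, 5], [1, 2, 2, 1], [4, 4, 5, 4]]
cq_scores, acceptance = [20, 14, 26, 11, 23], [3.2, 2.8, 4.5, 2.1, 4.0]
print(round(cronbach_alpha(survey), 2), round(pearson_r(cq_scores, acceptance), 2))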

Conclusions

Limitations

This study shows :

• Cultural Intelligence has a positive relationship to Technology Acceptance • A person with higher cultural intelligence will perceive that a technology is useful, especially if he or she is influenced by other people in his or her environment • Acceptance of technology depends considerably on the social environment of the individual

• Results should be taken within the context of the study and considering the small sample size (N=36) • The soundness of answers can also be questioned: university students are already familiar with these technologies.

Future Directions • The study can be repeated in different environments and settings, e.g. with technologies that are completely new to participants • The culture or social group of individuals could also be studied

Muhammad Jaasim Polin | jaasim@cmu.edu Faculty Advisor: Selma Limam Mansar April 2014



Flipped Learning for Educational Content Delivery: The Case of Introductory Programming Courses Author Haya Thowfeek (IS 2014)

Faculty Advisors Selma Limam Mansar, Ph.D.

Category Information Systems

Abstract Modern technology has undeniably increased the ease of access to technical knowledge for students. It has also affected the learning process, creating an environment where students no longer need to rely solely on their instructors or on online resources, but can also learn from their peers. Studies have shown that Flipped Learning, a concept in which the traditional lectured classroom is inverted so that the students themselves become peer instructors, can be a successful learning methodology. This Honors Thesis describes a study that aims to evaluate how different technologies influence and support the Flipped Learning methodology when applied in the context of programming with Python. The purpose of this study is to test different technologies used to facilitate a peer-instruction environment as an effective method of educational content delivery in a flipped-learning classroom.

40



FLIPPED LEARNING FOR EDUCATIONAL CONTENT DELIVERY

THE CASE OF INTRODUCTORY PROGRAMMING COURSES

RESEARCH QUESTION

HYPOTHESIS

Flipped Learning is a concept whereby students learn subject content outside of the classroom and use lecture time to participate in activities that enhance their learning in groups via Peer instruction.

Peer Instruction is effective to teach programming concepts.

Is Flipped Learning effective in the context of introductory programming, and what types of technologies are effective in supporting the Peer Instruction process?

Some technologies are able to support peer instruction more effectively than others.

RESEARCH METHOD
Question: individual vote (1) → small-group discussion → question: individual revote (1R) → isomorphic question: individual vote (1.2) → classroom discussion

SUBJECTS
15-110 Intro. to Programming: freshmen, 60 enrolled students, Spring 2014

RESEARCH EXPERIMENTS & RESULTS
Session 1a: iClickers, While Loops, ~40 students (larger cluster)
Session 1b: iClickers, While Loops, ~20 students (smaller cluster)
Session 2: Socrative, Basic Recursion, ~30 students
Session 3: LectureTools, Advanced Recursion, ~11 students
[Chart: % of students who answered correctly at each voting stage.]

SURVEY DATA

Students were asked to fill out an online questionnaire at the end of each session, and a final questionnaire at the end of the study about what they liked. [Chart: the percentage of students who liked the technology used at the sessions, claimed it was an improvement over a regular class, enjoyed the process of PI and having their friends help them learn, and liked the process used during the sessions.]

RESULTS

Peer Instruction is effective, but students have to be exposed to the material by an expert prior to a session. Students indicated no preference for specific tools, but the instructor preferred LectureTools. Flipped learning augments learning but cannot replace the role of the expert.

HAYA THOWFEEK hthowfee@qatar.cmu.edu
Dietrich College of Humanities and Social Sciences, Senior Honors Thesis submission
Advisor: Selma Limam Mansar, Ph.D., Information Systems
Co-investigator: Davide Fossati, Ph.D., Computer Science



Evaluating the Use of Emerging Technologies in Education Authors Daniel Cheweiky (IS 2014)

Faculty Advisor Divakaran Liginlal, Ph.D.

Category Information Systems

Abstract This research aims to study the use of augmented reality (AR) in education with particular focus on integrating augmented reality into textbooks designed for children. The research method involves comparing how students learn with augmented reality text books to how students learn with regular print textbooks. Thirty-three children between the ages of 7 to 12 years, selected through snowball sampling, were recruited for the study. Each participant was asked to read a book on dinosaurs both in print media and in augmented reality form and the corresponding learning process was evaluated with the Fun Toolkit. The results suggest that children are more satisfied and engaged with the content material in AR-text books in comparison with traditional methods of learning using printed books. However, in terms of the ability to recall what the children learned, the results do not conclusively establish the superiority of AR textbooks. This highlights the need for further research with a variety of learning materials based on AR. In summary, this research provides deep analyses and important insights into what makes an emerging technology such as augmented reality a better medium for educating young children.

42



Evaluating the Use of Emerging Technologies in Education Daniel Cheweiky Carnegie Mellon University Cheweiky@cmu.edu

RESEARCH QUESTION
Will AR provide a better medium for education than the traditional method of education, which is reading textbooks?

Methodology

Research Significance

Compared the two methods in terms of three aspects (Satisfaction , Engagement and Endurability)

- Understand the key determinants of enhanced learning with augmented reality. - Compare the effectiveness of Augmented Reality-based textbooks with traditional e-books in stimulating learning among children.

Research Participants

Again-Again table (Satisfaction)

Target audience is children between the ages of 7 to 12. 33 participants were recruited for this research.

Research Model

Smileyometer (Satisfaction)

Results

- Satisfaction represents how satisfied an individual is with using a certain technology. Studying through AR generated an average rating of 4.5, while studying through regular educational books generated an average of 3.7. A t-test comparing the two satisfaction levels yielded a significance level of 0.00002, which indicates with 95% confidence that the satisfaction levels are different.

- Engagement determines whether an individual is interested in a certain technology. Participants became more interactive with the material and looked forward to the next page to learn more.

- Endurability is an indicator of what an individual remembers from an experiment. Images of the dinosaurs stayed in participants' memories more when they studied using AR than with regular textbooks.

Conclusion
Augmented Reality resulted in more satisfaction (an increase from 74% to 90%) than traditional books, and students became more engaged with the material. As a result, we can say that Augmented Reality yields more satisfaction and engagement than regular traditional books.
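For reference, a small Python sketch of the kind of two-sample comparison reported above, using hypothetical ratings rather than the study's data (a paired test, scipy.stats.ttest_rel, may be more appropriate since each child experienced both media):

from scipy import stats

# Hypothetical Smileyometer ratings (1-5) for the two conditions.
ar_scores    = [5, 4, 5, 5, 4, 5, 4, 5, 5, 4]
print_scores = [4, 3, 4, 4, 3, 4, 4, 3, 4, 4]

# Independent two-sample t-test; a small p-value means the mean
# satisfaction levels differ between AR and print textbooks.
t_stat, p_value = stats.ttest_ind(ar_scores, print_scores)
print(f"t = {t_stat:.2f}, p = {p_value:.5f}")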



Studying the Sociotechnical Barriers to Using Augmented Reality Technologies for E-commerce Authors Afrah Hassan (IS 2014)

Faculty Advisor Divakaran Liginlal, Ph.D.

Category Information Systems

Abstract: This research examines the sociotechnical barriers which impact purchasing behavior at an e-commerce website that uses augmented reality technologies. The sociotechnical barriers investigated in the study include perceived risks, privacy concerns, trust, and perceived usefulness. The first phase of the research involved analyzing, from a sociotechnical systems perspective, the Ray-Ban virtual mirror, an augmented reality (AR) technology used for e-commerce. The insights gained from the preliminary study helped to design an experiment involving an e-commerce shopping scenario on the Ray-Ban website. The experimental study evaluated the behavior of 30 e-shoppers through observations and a questionnaire designed to measure the sociotechnical barriers associated with the integration of AR technology. The results suggest that the integration of AR technologies, such as the virtual mirror on e-commerce websites, leads to enhanced privacy concerns among e-shoppers. On the other hand, only a weak relationship exists between trust and purchase intention and between perceived risks and purchase intention. The results of this research will help businesses, especially in Arab countries, better understand how to deploy emerging technologies such as AR to generate competitive advantage.

44



STUDYING THE SOCIOTECHNICAL BARRIERS TO USING AUGMENTED REALITY TECHNOLOGIES FOR E-COMMERCE Afrah A. Hassan Carnegie Mellon University in Qatar ahousain@qatar.cmu.edu

QUESTIONS

SOCIOTECHNICAL ANALYSIS

How do sociotechnical barriers to the use of AR technology influence the purchasing behavior of Arab shoppers on e-commerce websites?

[Goal-model diagram: the e-shopper's goal "Buy sunglasses" with sub-goals "Find suitable sunglasses" (achieved either by choosing sunglasses after using the virtual mirror or by choosing sunglasses from a list), "Find money", and "Close sale" (pay through credit card), connected by AND/OR refinements and delegations.]

RESEARCH SIGNIFICANCE

110 million Internet users in the MENA region. 30 million are already shopping online. 350 million in the region are potential users in the longer term.

A sociotechnical system has:

1- Actors, goals, and agents. 2- An actor has to achieve a goal, which can be refined into two sub-goals; the actor can decide to achieve a goal by itself or delegate it to another agent. For the Ray-Ban website: 1- the consumer is an actor; 2- Ray-Ban is an agent; 3- the consumer's desire to buy sunglasses is a goal.

It is important to design systems that fulfill people's goals and overcome barriers (Bryl et al., 2009).

METHODOLOGY
Two tasks were given to the participants:
Task One: Choose sunglasses from a list of items on the Ray-Ban website (control condition).
Task Two: Use the virtual mirror to select sunglasses (treatment condition).
The participants filled out a questionnaire after each task.

RESULTS
1- The results of the control condition supported the hypotheses.
2- The results of the treatment condition supported the hypotheses except for privacy concerns.
[Path diagrams relating Trust, Perceived Risks, Privacy Concerns, and Perceived Benefits to Purchase Intention; reported standardized betas: -0.32, -0.26, 0.61, and -0.286 for the control condition, and -0.64, 0.3, -0.4, and -0.36 for the treatment condition.]

CONCLUSION
The research suggests that AR helps reduce the privacy concerns of e-commerce online shoppers.
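A minimal sketch of how standardized coefficients like those above can be estimated by regressing purchase intention on z-scored predictors; the data here are hypothetical and this is not the thesis's analysis:

import numpy as np

def standardized_betas(X, y):
    """OLS on z-scored variables; the coefficients are standardized betas."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    A = np.column_stack([np.ones(len(yz)), Xz])   # intercept + predictors
    coefs, *_ = np.linalg.lstsq(A, yz, rcond=None)
    return coefs[1:]                              # drop the intercept

# Hypothetical questionnaire averages for 6 shoppers:
# columns = trust, perceived risks, privacy concerns, perceived benefits
X = np.array([[4, 2, 2, 5], [3, 3, 3, 4], [5, 1, 2, 5],
              [2, 4, 4, 2], [3, 3, 4, 3], [4, 2, 3, 4]], dtype=float)
y = np.array([5, 3, 5, 2, 3, 4], dtype=float)     # purchase intention
print(np.round(standardized_betas(X, y), 2))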

IMPLICATIONS One of the concerns in the Middle East, and especially in Qatar, is privacy. Due to cultural and religious constraints, people hesitate to use e-commerce websites. The results of this research help show how privacy concerns can be reduced, which can result in more positive behavior toward the use of e-commerce websites.

SENIOR HONOR THESIS | ADVISOR : DR. DIVAKARAN LIGINLAL



Teaching / Learning with iPads Authors Aliya Hashim (IS 2014)

Faculty Advisor Divakaran Liginlal, Ph.D.

Category Information Systems

Abstract

The primary aim of this research is to reconcile students’ and instructors’ attitudes toward the use of iPads to support their learning/teaching processes. The study, conducted at Carnegie Mellon University in Qatar, involves measuring the effect of attitude, behavioral control, subjective norm, and self-efficacy on the intention to use iPads. Twenty five undergraduate students, all of whom were using iPads in their classroom, and twenty five instructors, some of whom were using iPads in the classrooms, participated. The analysis revealed that instructors’ attitude and subjective norm have an effect on their intention to use iPads. On the other hand, for the students, attitude has a significant effect on intention, whereas self-efficacy has a weak relationship with intention. The qualitative part of the study revealed that the students appeared quite enthusiastic about iPad usage as it is a new technology and they were eager to use it. The instructors, on the other hand, are hesitant to adopt iPads as they are not confident of its impact on learning effectiveness and also felt they lacked preparation for integrating iPads into their classroom. The results have consequent implications to pedagogical methods that use technology in the classroom.

46



TEACHING/LEARNING WITH IPADS Students' and instructors' attitudes toward the use of iPads in undergraduate education Background "New and innovative method of learning through the use of tablets is creating a new model of teaching and learning that can occur anytime, anywhere" (Gitsaki, 2013). - The United Arab Emirates (UAE) experimented with iPads in 3 universities in 2012. - Carnegie Mellon University in Qatar (CMUQ) distributed iPads to all freshmen in 2013.

Research Questions 1. How do we reconcile the perceptions of students and teachers toward the use of technology in the classroom?

Methods

Model

Instructors: Interview & Questionnaire sample size: 25 instructors

Attitude, behavioral control, subjective norm, & self-efficacy

Students: Questionnaire Sample size: 25 students

Intention to use iPads

2. How will the information obtained influence pedagogical methods used for iPad-based teaching?

Pedagogical methods

Results The thesis analyzes both students’ and instructors’ perception toward the use of iPads by measuring four independent variables and relating them to the intention to use iPads. Those variables are attitude, subjective norms, self-efficacy, and behavioral control. The major conclusions of this research may be stated as follows.

[Chart: instructors' responses about iPads as a technological solution.]

1. Applied to instructors, the attitude toward the use of iPads and the intention to use them are correlated and statistically significant. The subjective norm and the intention to use iPads are also correlated and statistically significant.

[Charts: students' preferences in the classroom and at home.]

2. Applied to students, the attitude toward the use of iPads and the intention to use them have a correlation that is statistically significant. However, self-efficacy and the intention to use iPads have only a weak correlation.

Aliya Hashim ahashim@qatar.cmu.edu Senior Honors Thesis Advisor: Divakaran Liginlal Information Systems



DREAM: Distributed RDF Engine with Adaptive Query Optimization & Minimal Communication Author Dania Abed Rabbou (CS 2012)

Faculty Advisor Mohammad Hammoud, Ph.D.

Category Post-Graduate

Abstract

The Resource Description Framework (RDF) is building up a strong momentum among various fields including science, bioinformatics, business intelligence and social networks, to mention a few. RDF models data items as triples, each of the form (Subject, Predicate, Object), indicating a relationship between a Subject and an Object captured by a Predicate. RDF repositories can be searched using SPARQL queries. A basic SPARQL query is much like an RDF triple; except that Subjects, Predicates and/or Objects can be variables or literals (RDF triples employ only literals). RDF triples and SPARQL triple patterns can be modeled as directed graphs, with vertices representing Subjects and Objects, and edges representing Predicates. As such, the problem of satisfying SPARQL queries morphs naturally into a sub-graph pattern matching problem. Due to the large-scale of RDF data (i.e., billions of triples) and the need for high-performing systems, RDF engines are usually distributed. Consequently, RDF datasets are partitioned among distributed machines using various partitioning methods such as hash or graph partitioning. After partitioning datasets, complex SPARQL queries are satisfied by typically accessing various partitions at different machines. Studies show that this strategy might cause large intermediate data shuffling and tremendous performance degradation. In this work, we propose DREAM, a distributed RDF engine with adaptive query optimization and minimal communication. In particular, we suggest precluding data partitioning altogether while maintaining parallelism, thus achieving minimal data shuffling and maximal performance. We promote a general and novel framework for approaching the RDF problem. Our framework suggests that RDF systems can be built using four different ways, one of which is to distribute SPARQL queries (as opposed to data) while storing RDF datasets unsliced at all machines. To our knowledge, this distribution strategy has not yet been explored in RDF literature. By storing data unsliced at each machine and distributing queries, DREAM entails no data shuffling (only meta-data traffic) and caches disk-resident data across aggregate memories of the cluster machines. DREAM models every query as a graph and decomposes it into a set of join vertices (i.e., vertices with degrees greater than one, thus involving joins). Join vertices are assigned sub-queries according to specific rules and a plan for each query is generated with near-optimal selectivity through a query plan optimizer. Afterwards, each join vertex in the generated query plan is mapped to a machine, executed independently, and eventually joined with other join vertices to reconstruct the original query graph and produce the final result. DREAM can adaptively select either a centralized (i.e., a single join vertex) or a distributed (i.e., multiple join vertices) query plan per each submitted query based on the query’s complexity estimated by the query optimizer. We implemented DREAM and conducted comprehensive experiments on a private cloud and on Amazon EC2. Our results demonstrate the superiority of DREAM as compared to related schemes. 48



DREAM: Distributed RDF Engine with Adaptive Query Optimization & Minimal Communication Dania Abed Rabbou and Mohammad Hammoud {dabedrab,mhhammou}@qatar.cmu.edu

Resource Description Framework
[Sample RDF graph: FIFA World Cup tournaments (2014, 2022) connected to countries (Qatar, Brazil), the region South America, and the capital cities Doha and Brasilia through predicates such as Type, Awarded To, Located In, Capital Of, and Will Host.]

Querying RDF: SPARQL
A SPARQL query and its graph:

SELECT ?tournament ?city WHERE {
  ?tournament type      'World Cup'  .
  ?tournament awardedTo ?country     .
  ?city       capitalOf ?country     .
  ?city       willHost  ?tournament  .
}
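As a small illustration of the query decomposition described in the abstract (a join vertex is a query-graph vertex with degree greater than one, and it is assigned the sub-query of its incident triple patterns), a Python sketch over the sample query above; this is illustrative and not DREAM's implementation:

from collections import defaultdict

# Triple patterns of the sample SPARQL query: (subject, predicate, object).
patterns = [("?tournament", "type",      "'World Cup'"),
            ("?tournament", "awardedTo", "?country"),
            ("?city",       "capitalOf", "?country"),
            ("?city",       "willHost",  "?tournament")]

# Build the query graph: vertices are subjects/objects, edges are predicates.
degree = defaultdict(int)
incident = defaultdict(list)
for s, p, o in patterns:
    for v in (s, o):
        degree[v] += 1
        incident[v].append((s, p, o))

# Join vertices have degree > 1; each gets the sub-query of its incident patterns.
join_vertices = {v: incident[v] for v in degree if degree[v] > 1}
for v, sub_query in join_vertices.items():
    print(v, "->", sub_query)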

GENERAL FRAMEWORK
[Diagram: to expedite queries, current RDF engines are either centralized (no data shuffling, but low computational power) or distributed over partitioned data (high computational power, but high data shuffling). DREAM instead distributes queries while keeping the dataset whole on each machine.]
[Example query graph G over variables ?w, ?x, ?y, ?z and predicates q1-q4, with join nodes and non-join nodes marked, and database statistics used for planning.]


OUR SCHEME: DREAM
[Architecture diagram: a client submits a query to a proxy running a query planner; the query is decomposed into sub-queries (e.g., Sx = {q1, q4}, Sy = {q4, q2, q3}, Sz = {q1, q4}), the least-cost plan over {Sx, Sy, Sz} is chosen using database statistics, and the sub-queries are mapped to worker machines, each holding the full dataset in an RDF-3X store; workers exchange only intermediate meta-data (no data shuffling).]

Keeping the dataset unsliced on every machine is feasible because a big graph is not necessarily big data: roughly 140 billion connections take only about 1 TB.

PRIMARY EXPERIMENT
[Chart: comparison against distributed semi-joins with hash-based partitioning of the LUBM-40 dataset, over queries Q1-Q14 (shared and exclusive queries).]

Qatar Foundation



As a global leader in education, Carnegie Mellon University is known for its creativity, collaboration across disciplines, and top programs in business, technology and the arts. The university has been home to some of the world’s most important thinkers, among them 19 Nobel Laureates and 11 Turing Award winners. In 2004, Qatar Foundation invited Carnegie Mellon to join Education City, a groundbreaking center for scholarship and research. As Carnegie Mellon Qatar celebrates its 10th anniversary, the campus continues to grow, providing a prestigious education to 400 students from 42 countries. The university offers five undergraduate degree programs in Biological Sciences, Business Administration, Computational Biology, Computer Science and Information Systems. Students in Qatar join more than 12,000 Carnegie Mellon students across the globe, who will become the next generation of leaders tackling tomorrow’s challenges. The university’s 95,000 alumni are recruited by some of the world’s most innovative organizations. To learn more, visit www.qatar.cmu.edu and follow us on twitter @ CarnegieMellonQ






