Meeting of the Minds 2015




Meeting of the Minds is an annual symposium at Carnegie Mellon University that gives students an opportunity to present their research and project work to a wide audience of faculty, fellow students, family members, industry representatives and the larger community. Students use posters, videos and other visual aids to present their work in a manner that can be easily understood by both experts and non-experts.

Through this experience, students learn how to bridge the gap between conducting research and presenting it to a wider audience. A review committee consisting of industry experts and faculty members from other universities reviews the presentations and chooses the best projects and posters. Awards and certificates are presented to the winners.


Table of Contents

External Judges

Biological Sciences Posters

• A Step Towards the Treatment of Tuberculosis
• Bacteriophage Diversity in the Ecology of Qatar
• Comparison of Genetically Modified Soy Products in Malaysia, Qatar and France
• Evaluation of Genetic Modifications in Unlabeled, “Organic”-Labeled, and “GM Certified”-Labeled Food Products
• Is Your Food Genetically Modified?
• Isolating Fluorogen Activating Proteins (FAPs) that Activate Patent Blue V Dye and Structurally Similar Dyes
• Particulate Air Pollution in Qatar and the Air Quality Index
• Testing for Genetic Modification in Corn Derived Products across Varying Regions
• The Effects of Aspartame and Methanol on Kidney Cells

Business Administration Posters

• Forecasting Bank Performance in the GCC Through Market-based Metrics
• Robust Return Model

Computer Science Posters

• Developing a Benchmark to Assess and Improve the Detection of Cross-Site Scripting (XSS) Vulnerabilities
• Doha’s Architecture on Tablet
• MANET and DTN Communication Protocols Evaluation for Real-World Scenarios
• Metis: Smart Academic Planner for College Students
• Real-time Dialectal Arabic Speech Recognition Application
• Strengthening the Security of Qatari Websites
• Twitter Sentiment Analysis

General Education Posters

• Stress, Depression, Anxiety, and Social Media Use Among Arabic-Speaking Undergraduates
• Validity and Reliability of the Arabic Version of the Perceived Stress Scale (PSS-10)
• Effect of the Initial Conditions on the Interfacial and Bulk Dynamics in Richtmyer-Meshkov Instability Under Conditions of High Energy Density

Information Systems Posters

• Designing Qatar’s Infrastructure in a Human Centered Way
• Factors Influencing the Adoption of Dermatology Diagnoses Through Mobile Applications
• Malicious Online Behavior
• Voices of Al-Khor: A Study in Digital Culture Heritage

Post-Graduate Posters

• AL-BLEU: A Metric and Dataset for Evaluation of Arabic Machine Translation
• Alice in the Middle East Learning Computational Thinking in K-12
• ‘Arabiyyatii An Innovative Technology-Based Curriculum for Teaching Arabic to Native Speakers
• Chi-Qat Tutor – Using Worked-Out Examples in an Intelligent Tutoring System
• CoMingle: Distributed Logic Programming for Decentralized Android Applications
• Correction Annotation for Non-Native Arabic Texts in the Qatar Arabic Language Bank
• Reasoning with Relations – How do people do?


External Judges

Dr. Hadi Abderrahim, Managing Director, Qatar BioBank
Ms. Haya Al-Ghanim, Innovation Director, QSTP
Dr. Marco Ameduri, Prof. and Assoc. Dean, WCMC-Qatar
Dr. Omar Boukhris, Postaward Administrator, QNRF
Dr. Marios Kambouris, Principal Scientist, QCRI
Dr. Dirar Khoury, Exec. Director, Special Projects, QF Research Division
Mr. Yacine Messaoui, CIO, Al-Jazeera
Ms. Batool Mohammed, Engineer, Vodafone
Dr. Alessandro Moschitti, Principal Scientist, QCRI
Dr. Klaus Schoenback, Associate Dean for Research & Professor in Residence, Northwestern University in Qatar
Dr. Munir Tag, Program Manager, ICT, QNRF
Mr. Mohamad Takriti, CEO, iHorizons
Dr. Sarah Vieweg, Scientist, QCRI
Dr. Stephan Vogel, Research Director, QCRI
Dr. Ingmar Weber, Senior Scientist, QCRI
Dr. Barak Yehya, Expert, Ministry of Development, Planning and Statistics



Tuesday, April 28, 2015, 4:00 pm - 6:00 pm CARNEGIE MELLON UNIVERSITY, EDUCATION CITY


A Step Towards the Treatment of Tuberculosis

Authors Maryam Aghadi (BS 2016), Sherif Mostafa (BS 2018)

Faculty Advisors Annette Vincent Ph.D., Valentin Ilyin Ph.D.

Category Biological Sciences, Computational Biology

Abstract The goal of this project was to isolate and annotate a bacteriophage that is effective in targeting Mycobacterium smegmatis, which is genetically similar to the bacterium responsible for causing tuberculosis. To isolate and purify the phage and its DNA, we used soil enrichment, pour-plating, vacuum filtration and phenol-chloroform extraction; to annotate the phage’s genome, we used computer programs including DNA Master, velveth, velvetg and SOAP. Electron microscopy imaging showed isolation of one bacteriophage with a head and a tail, while spectrophotometry results confirmed optimal purity and yield of the extracted DNA. While annotating the genome, we extracted contigs (sets of overlapping DNA segments that together represent a consensus region of DNA) and ran BLAST searches on them using DNA Master to identify genes that code for different proteins. The annotated genome contained genes that code for several proteins that are essential for lysing M. smegmatis, including DNA polymerase I and an HTH DNA-binding protein.



Introduction:

Bacteriophages are viruses that are effective in destroying bacteria. They do so by driving the bacterium through a lytic or lysogenic cycle. In the lytic cycle, the injected viral DNA gives rise to proteins that degrade the bacterial DNA, and bacterial nucleotides are used to build new bacteriophages. The newly assembled bacteriophages are released into the surroundings when the bacterial cell bursts, which gives rise to indentations, or plaques, in the agar. The morphology of a phage correlates with the morphology of its plaques, which allows phages to be characterized.

Methods:

Goal 1 : Obtain high concentration of phages

Goal 2 : Phage isolation

Computational Biology methodology

• We used Illumina's MiSeq instrument to sequence the four samples.
• When the samples were submitted they were named MA, DA, HA and OS; that nomenclature was kept, although the meaning of the names is not recorded.
• For these samples we performed a paired 150 bp sequencing run. Each fragment in the NGS library was sequenced 150 bp from both ends, which is why there are two files for each sample (R1 and R2).
• The four phage samples were sequenced on a single MiSeq run. Each sample was tagged with a unique identifier (UID), the four were pooled and sequenced, and the reads were then separated into independent fastq files using the UIDs. The undetermined files (R1 and R2) contain the reads that the software could not unequivocally sort using the UIDs.
• We worked with the sequence files from the command-line terminal, using velveth and velvetg to produce the contigs, which we then annotated.
• We also used the NCBI website to run BLAST searches on the contigs (i.e. to find genes that code for specific protein sequences).
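The pooling and separation step described in the sequencing notes can be illustrated with a minimal Python sketch. It is not the MiSeq software's implementation: the UID sequences, the read tuples and the function name `demultiplex` are hypothetical, and only the sample names (MA, DA, HA, OS) and the idea of an "undetermined" bin come from the notes above.

```python
from collections import defaultdict

# Hypothetical UID-to-sample table; the real index sequences are not given in the poster.
UID_TO_SAMPLE = {"ACGT": "MA", "TGCA": "DA", "GATC": "HA", "CTAG": "OS"}


def demultiplex(reads, uid_to_sample=UID_TO_SAMPLE):
    """Group (uid, sequence) pairs by sample name.

    Reads whose UID does not exactly match a known sample are placed in
    an 'undetermined' bin, mirroring the undetermined fastq files.
    """
    bins = defaultdict(list)
    for uid, sequence in reads:
        sample = uid_to_sample.get(uid, "undetermined")
        bins[sample].append(sequence)
    return dict(bins)


# Toy pooled run: two MA reads, one OS read, and one read with an unreadable UID.
pooled_reads = [
    ("ACGT", "TTGACC"),
    ("CTAG", "GGATCA"),
    ("ACGN", "CCTTAG"),
    ("ACGT", "AAGGTC"),
]
bins = demultiplex(pooled_reads)
```

In a paired-end run the same assignment would be applied to both the R1 and R2 read of each fragment, so each sample ends up with two files.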

Goal 3 : Phage Purification

The phage sample was vacuum filtered through a 0.22 µm filter, which prevents the passage of any bacteria larger than 0.22 µm.

Goal 4 : DNA extraction

Goal 5 : DNA Analysis

The DNA was scanned in a spectrophotometer from 200 to 400 nm to determine its purity ratio and concentration.
Purity ratio = Abs260 / Abs280 = 1.78 (optimal: 1.7-1.9)
DNA concentration: 1 Abs260 unit = 50 ng/µl; total yield = 12 µg
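The spectrophotometric DNA analysis (purity ratio Abs260/Abs280, and the conversion of 1 Abs260 unit to 50 ng/µl) can be written out as a short Python sketch. The absorbance readings and the dilution factor below are hypothetical values chosen only so that the ratio reproduces the reported 1.78; the poster does not list the raw readings or the elution volume behind the 12 µg yield.

```python
NG_PER_UL_PER_ABS260 = 50.0  # 1 Abs260 unit = 50 ng/ul, as stated on the poster


def purity_ratio(abs260, abs280):
    """Return the Abs260/Abs280 purity ratio."""
    return abs260 / abs280


def dna_concentration_ng_per_ul(abs260, dilution_factor=1.0):
    """Convert an Abs260 reading into a DNA concentration in ng/ul."""
    return abs260 * NG_PER_UL_PER_ABS260 * dilution_factor


def is_optimal_purity(ratio, low=1.7, high=1.9):
    """Check the ratio against the optimal 1.7-1.9 window quoted on the poster."""
    return low <= ratio <= high


# Hypothetical readings (not from the poster).
abs260, abs280 = 0.89, 0.50
ratio = purity_ratio(abs260, abs280)
concentration = dna_concentration_ng_per_ul(abs260)
print(f"purity ratio = {ratio:.2f}, optimal = {is_optimal_purity(ratio)}")
print(f"concentration = {concentration:.1f} ng/ul")
```

Multiplying the concentration by the sample volume gives the total yield, which is how a figure such as 12 µg would be obtained once the volume is known.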

Results:

Figure 8: Electron microscope image of the phage (labels: Head, Tail)

Gene mapping of the first 4 contigs, in the order in which the contigs were found on the genome (green represents forward sequence; red represents backward sequence):

1st largest contig (with 37 genes, 20175 bases)
2nd largest contig (with 14 genes, 8626 bases)
3rd largest contig (with 8 genes, 4355 bases)
4th largest contig (with 11 genes)

• We then analyzed 2 proteins that we thought were crucial to the life cycle (lytic cycle) of the phage.

DNA Polymerase I: % Aligned: 100%; % Similarity: 100%; % Sequence identity: 100% (to Taurus & Marie)

During DNA replication in prokaryotic cells, DNA Polymerase I uses its 5’-3’ exonuclease activity on the lagging strand to remove RNA primers, and replaces the RNA nucleotides with DNA nucleotides.

Figure 4: Blasted using NCBI nr

References:

PHAGE TECHNOLOGY: http://www.micreosfoodsafety.com/en/technology.aspx
DNA polymerase structure: http://www.hindawi.com/journals/jna/2010/457176/fig1/
Description of HTH motifs: http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/GetPage.pl?doc=TRUE&pdbcode=n/a&profunc=TRUE&template=doc_hth.html
Helix-turn-helix: http://en.wikipedia.org/wiki/Helix-turn-helix#/media/File:Lambda_repressor_1LMB.png

Conclusion:

• One bacteriophage effective in targeting Mycobacterium smegmatis was isolated.
• DNA of optimal purity (purity ratio 1.78) and optimal yield (12 µg) was isolated from the phage.
• DNA Polymerase I and the HTH DNA-binding protein are essential to the phage’s life cycle and may be among the major proteins that play a role in bacterial lysis (bursting). Increasing the expression of the genes coding for these 2 proteins could therefore allow more bacterial cells to be lysed, and at a faster rate.

HTH DNA Binding protein: % Aligned: 100%; % Similarity: 100%; % Sequence identity: 100% (to Mycobacterium phage Jobu08)

The helix-turn-helix (HTH) is one of the most common motifs that proteins use to bind to DNA (being found in about one third of DNA-binding protein structure families). It is relatively small (around 20 residues in length) and, in its simplest form, consists of two perpendicular helices joined by a short linker, or turn. Where DNA binding is sequence-specific, the recognition is performed by the second, or C-terminal, helix binding in the major groove of the DNA.

Figure 3: Blasted using NCBI pdb
Figure 5: DNA polymerase structure
Figure 6: Blasted using NCBI nr
Figure 7: Blasted using NCBI pdb
Figure 8: HTH motifs
Figure 9: Helix-turn-helix

Sherif Mostafa Mariam Aghadi Dr. Annette Vincent Dr. Valentin Ilyin

A step towards treatment of Tuberculosis


Bacteriophage Diversity in the Ecology of Qatar

Author Umm-Kulthum Ismail Umlai (BS 2016)

Faculty Advisor Annette Vincent, Ph.D.

Category Biological Sciences

Abstract The Science Education Alliance (SEA) Phage Hunters Advancing Genomics and Evolutionary Science (PHAGES) is a national initiative in the United States, funded by the Howard Hughes Medical Institute (HHMI) and led by Professor Graham Hatfull from the University of Pittsburgh. The project is built around the discovery and analysis of mycobacteriophages in the biodiversity of the United States. For the first time, this research is being done in Qatar, at Carnegie Mellon University, to investigate whether bacteriophages exist within the soil and sand of this Gulf State. The aim was to isolate and purify the phages present within the local ecology for use as subjects of phage-related medicinal therapy. Mycobacterium smegmatis (M. smegmatis) was used as the host bacterial strain to identify the presence of its phages within the sand and soil. This host was chosen because it is nonpathogenic, fast growing and a universal host. M. smegmatis also has physiological similarities with pathogenic species from the same family: M. tuberculosis causes tuberculosis, and M. leprae causes leprosy. Such efforts are intended to help us better understand how to treat such bacterial diseases using phage therapy rather than increasingly ineffective drugs. Although M. smegmatis failed to act as a host for the phages isolated from local environmental samples, future work will assess the suitability of other bacterial hosts, such as Arthrobacter sp., for these phages. Furthermore, the microbial composition of environmental samples will be analyzed to identify specific bacterial hosts that exist in Qatar’s ecology.



Bacteriophage Diversity in the Ecology of Qatar

Umm-Kulthum Umlai and Annette Vincent, Ph.D.
Biological Sciences Program, Carnegie Mellon University in Qatar, Doha, Qatar

Introduction

The Science Education Alliance (SEA) Phage Hunters Advancing Genomics and Evolutionary Science (PHAGES) is a national initiative in the United States, funded by the Howard Hughes Medical Institute (HHMI) and led by Professor Graham Hatfull from the University of Pittsburgh. This project is built around the discovery and analysis of mycobacteriophages in the biodiversity of the USA. Bacteriophages are viruses that are parasitic to bacteria. Each bacteriophage species is specific in the type of bacteria it can infect. Identifying these phages and using them as a therapeutic tool against specific antibiotic-resistant, disease-causing bacteria could therefore have substantial impact. This research is being done in Qatar at Carnegie Mellon University to discover and identify the possible existence of bacteriophages within the ecology of this Gulf State. The aim is to isolate and characterise the phages using soil bacteria that are physiologically similar to the bacteria causing tuberculosis and infective endocarditis. [1]

Objectives

• To find and isolate any bacteriophages present in the ecology of Qatar, specifically the sand.
• To purify and characterize the bacteriophages.

Materials and Methods

Collect Sand Bacteriophage [2]
1) To obtain the phage particles, a sample of sand was enriched with Luria Broth (LB) culture media and a suitable bacterial host, and grown at 37°C in a shaking incubator.
2) After 48 hours, the phage particles are expected to have amplified in number and concentrated in the culture.
3) This phage culture is filtered to remove micro-organisms other than phage.
4) The phage lysate at this point can be stored in a refrigerator at 4°C.

Isolate Sand Bacteriophage [2]
1) A serial dilution is carried out with the phage lysate, made up with phage buffer.
2) The phage lysate dilutions are mixed with saturated bacterial host culture and top agar.
3) After allowing some time for infection and lysis to occur, these mixtures are pour plated onto LB plates and incubated at 37°C for 48 hours.

Figure 1: Diagram showing a summary of phage infection, replication and finally lysis of the bacterium

Figure 2: Diagram showing the process of plaque formation on an agar plate (panels: agar surface before pour plating; agar surface with added top agar; top agar surface with plaques)

Purify Sand Bacteriophage [2]
If plaques are formed, each plaque of a different morphology is picked and re-plated until only a single phage/plaque type is present on each plate.

Results

• Sand samples enriched with a bacterial host physiologically similar to M. tuberculosis returned negative results: no plaques were observed.
• When the procedure was repeated using a soil sample from Pittsburgh, PA, United States of America, distinct plaques were observed.
• Upon plating with host bacteria physiologically similar to A. woluwensis, plaques with three distinct morphologies were observed, indicating three different types of phages in the sample.
• Each plaque was picked and re-plated, and the phage titer (PFU/mL) was calculated.

Data Analysis

Figure 1: Plaque morphology of the three phages isolated from sand. 0.5 mL of saturated host bacteria physiologically similar to A. woluwensis was infected with 90 µL of isolated phage (100-fold diluted) and pour plated onto LB plates. Plates were incubated for 48 h at 37°C. (A) Phage Type 1; (B) Phage Type 2; (C) Phage Type 3

Table 1: Phage Titers of the Plaques Formed Using Host Similar to A. woluwensis

                 Phage lysate dilution, plate count (e)     Phage titer
                 10^0      10^-1     10^-2     10^-3        (PFU/mL) (b, c, d)
Phage Type 1     CL (a)    CL        CL        18           2.00 x 10^5
Phage Type 2     CL        CL        136       10           1.11 x 10^5
Phage Type 3     CL        CL        3         0            3.33 x 10^3

a. CL = Confluent lysis; over 200 plaques were observed, more than can be accurately counted
b. Calculated by: plate count / dilution factor / volume assessed (0.09 mL)
c. PFU/mL = plaque forming units per milliliter
d. Plates with fewer than 100 plaques were used for phage titer calculations since they are most accurately counted
e. These LB plates were incubated at 37°C
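The titer formula in footnote b of Table 1 (plate count / dilution factor / volume assessed) can be checked with a short Python sketch. The function names and the use of `None` to stand for confluent lysis are illustrative choices, and the rule of taking the least-diluted countable plate is an assumption; with the Table 1 data only one plate per phage type has between 1 and 99 plaques, so the choice does not affect the results.

```python
CL = None  # confluent lysis: too many plaques to count


def phage_titer(plaque_count, dilution_factor, volume_ml=0.09):
    """Titer in PFU/mL = plate count / dilution factor / volume assessed (mL)."""
    return plaque_count / dilution_factor / volume_ml


def titer_from_series(counts_by_dilution, volume_ml=0.09, max_count=100):
    """Pick a countable plate (0 < count < max_count) and return its titer.

    If several plates are countable, the least-diluted one is used
    (an assumption, not stated in the poster).
    """
    countable = {
        dilution: count
        for dilution, count in counts_by_dilution.items()
        if count is not CL and 0 < count < max_count
    }
    if not countable:
        return None
    dilution = max(countable)
    return phage_titer(countable[dilution], dilution, volume_ml)


# Plate counts from Table 1, keyed by dilution factor.
table_1 = {
    "Phage Type 1": {1e0: CL, 1e-1: CL, 1e-2: CL, 1e-3: 18},
    "Phage Type 2": {1e0: CL, 1e-1: CL, 1e-2: 136, 1e-3: 10},
    "Phage Type 3": {1e0: CL, 1e-1: CL, 1e-2: 3, 1e-3: 0},
}
for phage, counts in table_1.items():
    print(f"{phage}: {titer_from_series(counts):.2e} PFU/mL")
```

For Phage Type 1, for example, 18 plaques at a 10^-3 dilution in 0.09 mL give 18 / 0.001 / 0.09, or about 2.00 x 10^5 PFU/mL, which matches the tabulated value.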

Discussion

• The isolation of these phage particles and their subsequent purification has been a great success so far. It is clear that several bacteriophages are present within the sand collected in Doha.
• The first phage (Type 1) has the greatest phage titer (Table 1: 2.00 x 10^5 PFU/mL) and has a glittery appearance that is difficult to notice at first, but upon closer inspection shallow round dents can be observed.
• The second phage (Type 2) has a phage titer of 1.11 x 10^5 PFU/mL, producing fewer plaques than the first but clearer dents in the top agar that are easy to notice.
• The last phage (Type 3) is very similar in shape and size to the second but has a lower phage titer.

Conclusion

• The experiment did not work with the host similar to M. tuberculosis and the local sand sample, but it did work with the same bacteria and a foreign soil sample. This suggested that either there were no bacteriophages in the sand or the sand phages simply could not infect that particular host.
• As a result, the host bacteria was changed and a strain linked to A. woluwensis was used instead. This produced positive results, meaning that the sand phages infect bacteria similar to this host.
• Future work aims to characterize the viral DNA sequence using Next Generation Sequencing and to carry out computational analysis to annotate the sequence. Electron microscopy will be used to visualize the structure of these phages.
• The project is the first of its kind to be conducted in this region and will allow us to understand the properties and phylogenetics of the sand bacteriophages in the region.
• The project is significant because the phages could be used for therapeutic (most likely related to treating bacterial infections, e.g. infective endocarditis), genetic, epidemiological or industrial purposes.

References

1. Bernasconi, E., Valsangiacomo, C., Peduzzi, R., Carota, A., Moccetti, T., & Funke, G. (2004). Arthrobacter woluwensis subacute infective endocarditis: case report and review of the literature. Clinical Infectious Diseases: an official publication of the Infectious Diseases Society of America, 38(4): e27–31. doi:10.1086/381436
2. Howard Hughes Medical Institute (HHMI), 2013. Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science (SEA-PHAGES) Mycobacterium Smegmatis. [Laboratory Manual]. Chevy Chase, Maryland.
3. Mycobacterium Database- Phagesdb.org. 2014. Phagehunting on Arthrobacter sp. ATCC 21022. [Online]. Available from: http://phagesdb.org/media/workflow/protocols/pdfs/ArthrobacterGrowingConditions.pdf.

Acknowledgements

The following people and institutions, without whom this would not have been possible, deserve special thanks and recognition for their efforts: Carnegie Mellon University Qatar, Howard Hughes Medical Institute, Maya Kemaldean, Maria Navarro, Hassan Al Mana, Susmita Mate, Rayan Mahmoud, Aya Abd Elaal, Omair Al-Nuaimi, Maryam Aghadi and Fatima Amir.


Comparison of Genetically Modified Soy Products in Malaysia, Qatar and France

Authors Bilal Jaradat (BS 2015), Rayan Mahmoud (BS 2016)

Faculty Advisor Annette Vincent, Ph.D.

Category Biological Sciences

Abstract The goal of this project is to compare GMO-labeled and unlabeled soybean products currently available on the market from Malaysia, Qatar and France for the GTS-40-3-2 cassette. Our hypothesis is that the labeled GMO-free vanilla soymilk product from France is not genetically modified, and that the unlabeled Qatari soybeans and Malaysian soymilk are also not genetically modified. The PCR showed the absence of the CP4-EPSPS gene in all food samples, including the positive control, while ELISA showed that the percentage of CP4-EPSPS in all food samples is below 0.4%. In conclusion, the PCR results were inconclusive, while the ELISA results supported our hypothesis that all the food samples were GMO-free for the GTS-40-3-2 cassette.



Introduction

The goal of this project is to compare GMO-labeled and unlabeled soybean products currently available on the market from Malaysia, Qatar and France for the GTS-40-3-2 cassette. Our hypothesis is that the labeled GMO-free vanilla soymilk product from France is not genetically modified, and that the unlabeled Qatari soybeans and Malaysian soymilk are also not genetically modified. The PCR showed the absence of the CP4-EPSPS gene in all food samples, including the positive control, while ELISA showed that the percentage of CP4-EPSPS in all food samples is below 0.4%. In conclusion, the PCR results were inconclusive, while the ELISA results supported our hypothesis that all the food samples were GMO-free for the GTS-40-3-2 cassette.

Figure 1. Map of GTS-40-3-2 inserted in GM soy (cera-gmc.org)

Methods

DNA isolation – Genomic DNA was isolated using the NucleoSpin® kit. Concentration was assessed using a spectrophotometer.

PCR – Genes within the GTS-40-3-2 cassette were detected using PCR. Cycling conditions followed James et al. 2003.

ELISA – The presence of CP4-EPSPS protein was detected using specific antibody-antigen interactions.

Data

Figure 2. Spectrophotometric scanning measured over a range of 200 to 400 nm.

Table 1. DNA Concentrations from Spec. Scans
Columns: Sample; [DNA] (μg/mL) (1); Purity Ratio (Abs260/Abs280) (2)
Samples: Al-Ansari Soybeans (Qatar); Soy Milk (Malaysia); Vanilla Soy Milk (France); Positive Control (1.0% GTS 40-3-2); Negative Control (0.7% GTS 40-3-2)
[DNA] (μg/mL) values (row order not preserved): 6.75, 231, 57.7, 102, 156
Purity ratio values (row order not preserved): 1.00, 1.32, 1.93, 0.923, 1.08
1. A sample calculation for the Positive Control: Abs260/Abs280 = 0.156/0.169 = 0.923.

Figure 3. Uniplex PCR for detection of CaMV35S

Figure 4. Multiplex PCR for genes in GTS-40-3-2

Table 2. Expected and actual sizes of target genes

Gene                Expected size (bp)    Actual size (bp)
CaMV35S Promt.      158                   N/A
CP4-EPSPS           145                   N/A
Soybean-lectin      118                   119.1
NOS Term.           125                   N/A

Figure 5. Standard curve for ELISA using QualiPlate Kit.

Table 3. ELISA Results for CP4-EPSPS Protein Using QualiPlate Kit

Sample                          Abs. at 450 nm    % of CP4-EPSPS (1)
Al-Ansari Soybeans (Qatar)      0.0190            < 0.4
Soy Milk (Malaysia)             0.0180            < 0.4
Vanilla Soy Milk (France)       0.0140            < 0.4

1. The absorbances of the food products were below the lowest standard absorbance reading (0.023 for the <0.4% standard). Therefore the standard equation from Figure 4 cannot be used to determine the CP4-EPSPS percentage.
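The reasoning in the Table 3 footnote (a sample whose absorbance at 450 nm is below the lowest standard reading of 0.023 can only be reported as "< 0.4%") can be sketched in Python. The linear form of the standard curve and the function name `report_cp4_epsps` are assumptions made for illustration; the poster does not give the fitted standard equation.

```python
LOWEST_STANDARD_ABS = 0.023  # absorbance of the lowest standard (from the Table 3 footnote)
LOWEST_STANDARD_PCT = 0.4    # percentage of CP4-EPSPS represented by that standard


def report_cp4_epsps(abs450, curve=None):
    """Return the reportable % of CP4-EPSPS for one ELISA reading.

    Below the lowest standard the curve cannot be applied, so the result is
    reported as '< 0.4'. Above it, a linear standard curve given as a
    (slope, intercept) pair is assumed: abs450 = slope * percent + intercept.
    """
    if abs450 < LOWEST_STANDARD_ABS:
        return f"< {LOWEST_STANDARD_PCT}"
    if curve is None:
        raise ValueError("a fitted standard curve is required above the lowest standard")
    slope, intercept = curve
    return f"{(abs450 - intercept) / slope:.2f}"


# Absorbance readings reported in Table 3.
readings = [0.0190, 0.0180, 0.0140]
results = [report_cp4_epsps(a) for a in readings]
print(results)
```

All three reported readings fall below 0.023, so each is reported as "< 0.4" without any curve being applied.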

Bilal Jaradat, Rayan Mahmoud, and Annette Vincent, Ph.D.
Biological Sciences Program, Carnegie Mellon University in Qatar

Analysis

Multiplex PCR, Figure 4:
• None of the genes in the GTS-40-3-2 cassette were detected in the food samples tested.
• Multiplex PCR successfully detected a band corresponding to the size of the soybean lectin gene.
• A possible explanation is failure of the primers for the GM genes to anneal to the template DNA.

ELISA test, Table 3:
• CP4-EPSPS was absent from the food samples, since we obtained a percentage of less than 0.4%.
• It is possible that CP4-EPSPS proteins were denatured during the processing of these products and are therefore not measured by ELISA. [6]

Conclusion

The only tested food sample that was labelled was the soymilk from France. Since the product is European, it was labelled as < 0.9% genetically modified [7]. The Multiplex PCR indicated the absence of CP4-EPSPS (Figure 3 - Lane 3), while the ELISA test in Table 3 shows that there is less than 0.4% CP4-EPSPS in the product.

The Multiplex PCR is not reliable as an indicator of the presence of the GTS 40-3-2 cassette, because the target genes, including CaMV35S, NOS and CP4-EPSPS, were not detected in the Positive Control (Figure 3 - Lane 5). On the other hand, the ELISA test supports our hypothesis that the labelled and unlabelled food products were GMO-free. Yet it is important to consider the possibility that CP4-EPSPS proteins were denatured during food processing.

References

1. Center for Environmental Risk Assessment. 2010. A Review of the Environmental Safety of the CP4 EPSPS Protein. Washington, USA.
2. Rubin, J. 1984. Glyphosate Inhibition of from Suspension-Cultured Cells of Nicotiana silvestris.
3. James, D. 2003. Reliable Detection and Identification of Genetically Modified Maize, Soybean, and Canola by Multiplex PCR Analysis. BC, Canada.
4. Genetics rights foundation. http://en.biosafetyscanner.org/. Accessed on 29/March/2015.
5. Henegariu, O. 1997. Multiplex PCR: Critical parameters and Step-by-Step protocol.
6. Brandner, D. 2002. PCR-Based Detection of Genetically Modified foods.
7. Center for food safety. 2015. http://www.centerforfoodsafety.org. Accessed on 29/March/2015.

Comparison of Genetically Modified Soy Products in Malaysia, Qatar, and France


Evaluation of Genetic Modifications in Unlabeled, “Organic”-Labeled, and “GM Certified”-Labeled Food Products

Authors Manar Naboulsi (BS 2016), Maryam Al Sulaiti (BS 2016)

Faculty Advisor Annette Vincent, Ph.D.

Category Biological Sciences

Abstract Although genetically modified (GM) food is currently sold in many places around the world, the regulations on labeling genetic modifications vary from one country to another. Our goal was to analyze the presence of commonly inserted genes in food products that are unlabeled, or labeled as “organic” or “GM certified.” An internal control, the soy lectin gene, was expected to be present in all products. It codes for a carbohydrate-binding protein that plays a role in germination and cell survival. In GM soy products, numerous genes may have been inserted to provide the plant with various advantages. The genes tested were CP4-EPSPS, PAT, and Cry9C Universal. These are usually inserted in a cassette that includes a CaMV35S Promoter and a NOS Terminator. Therefore, detecting the presence of the NOS Terminator or CaMV35S indicates the existence of genetic modifications. Our hypothesis was that “organic” products do not contain modifications, “GM certified” products do, and unlabeled products may or may not. Spectrophotometry, PCRs for the various genes, and ELISA were used to carry out our experiment, and the results were consistent with our hypothesis. GMO flour contained CP4-EPSPS whilst organic shortbread did not contain any genetic modifications; of the unlabeled food products, one (soybeans) contained CP4-EPSPS while the other (“Lucky me” Noodles soy sauce) did not.




Is Your Food Genetically Modified?

Authors Maryam Aghadi (BS 2016), Mohammed El Allam (BS 2016)

Faculty Advisor Annette Vincent, Ph.D.

Category Biological Sciences

Abstract Genetic modification has given rise to crops with high yield and defensive qualities. The purpose of this project was to test canned corn for genetic modification through qualitative analysis of the transgenes using multiplex PCR and ELISA. For this project, 10% MON-810 corn, organic Carrefour corn and Lulu corn were analyzed. DNA was extracted from the food samples using the NucleoSpin kit, followed by simultaneous amplification of the CAMV35S, CRY1AB and maize invertase genes using the CV35(F/R), CRY1AB(F/R) and IVR(F/R) primers via multiplex PCR. The presence of amplified products was assessed by gel electrophoresis of the PCR samples. Bands corresponding to the sizes of the CAMV35S, CRY1AB and maize invertase genes (156, 492 and 226 bp respectively) were expected for the 10% MON-810 positive food control, while a band corresponding to the size of maize invertase was expected for the organic Carrefour corn. Lulu corn was the experimental food product, so bands corresponding to the size of either CAMV35S or CRY1AB would indicate that it is genetically modified. The gels showed bands corresponding to the sizes of CRY1AB and CAMV35S (492 bp and 156 bp respectively) in the positive control and the experimental food product, indicating their genetic modification. No bands were seen for the negative control samples, validating them as organic. Surprisingly, however, no maize invertase bands appeared for any of the food samples. One possible explanation is that the primers could not access the template because of DNA secondary structure induced by the high GC content of the maize genome. This hypothesis requires further experimental trials for validation.
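The band interpretation described in this abstract can be sketched in Python: observed gel band sizes are matched against the expected amplicon sizes (156, 492 and 226 bp), and a sample is called genetically modified if a CAMV35S or CRY1AB band is present. The 10 bp matching tolerance, the function names and the example band sizes are assumptions for illustration; the abstract does not state how closely a band had to match.

```python
# Expected amplicon sizes in bp, as given in the abstract.
EXPECTED_BP = {"CAMV35S": 156, "CRY1AB": 492, "maize invertase": 226}
TRANSGENES = {"CAMV35S", "CRY1AB"}


def detect_genes(band_sizes_bp, tolerance_bp=10):
    """Return the set of genes whose expected size matches an observed band."""
    detected = set()
    for gene, expected in EXPECTED_BP.items():
        if any(abs(size - expected) <= tolerance_bp for size in band_sizes_bp):
            detected.add(gene)
    return detected


def is_genetically_modified(band_sizes_bp, tolerance_bp=10):
    """A sample is called GM if any transgene band is detected."""
    return bool(detect_genes(band_sizes_bp, tolerance_bp) & TRANSGENES)


# Hypothetical band readings for three lanes.
lanes = {
    "positive control": [155, 490],
    "experimental sample": [158, 495],
    "negative control": [],
}
for name, bands in lanes.items():
    print(name, sorted(detect_genes(bands)), is_genetically_modified(bands))
```

A lane showing only the maize invertase band would be called non-GM under this rule, since that gene serves as the internal corn control.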




Isolating Fluorogen Activating Proteins (FAPs) That Activate Patent Blue V Dye and Structurally Similar Dyes

Author Clinton Royce Cunha (BS 2016)

Faculty Advisors Marcel Bruchez, Ph.D. (CMU), Alison Dempsey (CMU), Chris Szent-Gyorgyi (CMU)

Category Biological Sciences

Abstract Patent Blue V (PBV) is a commonly used dye in lymphatic mapping, and it is possible to extend the usability of PBV through fluorogen activating proteins (FAPs). There is a clinical need to enhance detection of tumours and disease within the human body, and PBV coupled with FAPs can serve as a potential solution. FAPs generate fluorescence from molecules that do not otherwise exhibit fluorescence. The goal is to find a protein, a FAP, that activates fluorescence in PBV through binding, at nanomolar concentrations of PBV. PBV is currently used for fluorescence detection using serum albumin, which forms a complex with modest brightness under whole body imaging detection (Tellier, 2012, p. 2311). Utilizing FAPs will increase the fluorescence of PBV to the level of standard fluorescent protein emissions, which will provide a cheaper, safer and clinically compatible procedure for analyzing specific tumour sites. Several laboratory techniques are used, including yeast DNA extraction, agarose gel electrophoresis and extraction of DNA, PCR amplification, yeast transformation, flow cytometry, UV spectroscopy, and DNA purification and quantitation. Finding a FAP for PBV will not only be important in clinical applications, but will also help in the analysis of different secretory pathway proteins and different cell surfaces through flow cytometry and fluorescence microscopy approaches. Since PBV is approved by the FDA (in the U.S.) for clinical use, it is much more probable that FAPs that activate PBV will be used in clinical procedures within the next couple of years. By identifying different FAP isolates that activate PBV, it is possible to have a wide array of FAPs to use in clinical applications.



Isolating Fluorogen Activating Proteins (FAPs) That Activate Patent Blue V Dye and Structurally Similar Dyes

Clinton Royce Cunha, Alison Dempsey, Chris Szent-Gyorgyi, Marcel Bruchez

Figure 1: Flow Cytometry Screen of Patent Blue V

Methods A library of yeast cells from Pacific Northwest National Laboratory (PNNL) that express human single chain variable fragments (scFvs) was screened for cells that show fluorescence activation of PBV dye. Approximately 360 colonies were plated (Figure 1). Twenty individual colonies were manually isolated and tested using the Accuri flow cytometer. The DNA of the 20 high-PBV-expressing isolates was extracted and purified using Zymoprep™ Yeast Plasmid Miniprep II. The purified DNA samples were sent to GeneWiz for sequencing. The results showed similar light chains but different heavy chains (Figure 3). The purified light chain DNA was transformed into Jar200 yeast cells. Transformed cells were sorted using FACS to determine whether the transformation was successful.

Figure 2: Flow Cytometry Screen of Five Different Dyes with PBV FAP

[Figure labels recovered from extraction: 10 nM Patent Blue V; 10 nM Patent Blue Violet; 10 nM MG-2p ("Single isolate?"); Red Channel]

Introduction Patent Blue V (PBV) is a commonly used dye in lymphatic mapping, and its usability can be extended through fluorogen activating proteins (FAPs). There is a clinical need to enhance detection of tumours and disease within the human body, and PBV coupled with FAPs is a potential solution. FAPs generate fluorescence from molecules that do not otherwise exhibit fluorescence. The goal is to find a FAP that activates fluorescence in PBV through binding at nanomolar concentrations. PBV is currently used for fluorescence detection with serum albumin, which forms a complex of modest brightness under whole-body imaging (Tellier, 2012, p. 2311).

[Figure 1 annotation: ~360 independent isolates]

[Figure 1 axis label: Green Channel (Alexa-488 anti-c-myc)]

Results and Discussion Four other dyes (Figure 4) were tested for possible fluorescence activation. The Accuri flow cytometer produced the data in Figure 2, where each dye was mixed with isolates of PBV cells that expressed high PBV fluorescence. The data show that Patent Blue V and Patent Blue Violet have similar fluorescence levels, owing to their similarity in structure. With the other dyes at a concentration of 1 µM there were many loose binders and low fluorescence, which is not a promising result. Two derivatives of L9 proteins engineered by Ming Zhang were tested with the five different dyes. A rough spectral determination of the dyes mixed with the proteins, in which the dyes were saturated with protein, showed that Brilliant Blue FCF and Lissamine Green had no fluorescence.

Figure 4: Triarylmethane dyes tested on PBV and LG FAPs

[Chemical structure drawings are not recoverable from the extracted text. Labeled structures: Patent Blue Violet, Iso-sulfan Blue, Patent Blue V, Lissamine Green, Malachite Green, Brilliant Blue FCF.]

[Figure 2 labels recovered from extraction: 10 nM Lissamine Green; 10 nM Brilliant Blue; axes: red (fluorogen), log scale and green (AM2-2), linear scale; annotation: "Almost all MG isolates do not cross-react!"]

Figure 3: [caption not recoverable from the extracted text]

Conclusion One of the bigger objectives is to test the PBV FAP in a biological system. That will identify whether or not the FAP can work in a real-time environment. Future steps will include isolating the FAP from the yeast cells, creating a covalent homodimer between two light chains, and codon optimizing the FAP.

References
1. Szent-Gyorgyi, C. Fluorogen-activating single-chain antibodies for imaging cell surface proteins. Nature Biotechnology, 235-240.
2. Tellier, F. Sentinel lymph nodes fluorescence detection and imaging using Patent Blue V bound to human serum albumin. Biomedical Optics Express, 2306.
3. Yeast Display scFv Antibody Library User’s Manual. (n.d.). Retrieved June 26, 2014, from http://www.sysbio.org/dataresources/usermanual031113.pdf


Particulate Air Pollution in Qatar and the Air Quality Index

Authors Syed Abbas Mehdi (BS 2018), Nourhan ElKhatib (BS 2018)

Faculty Advisor Terrance Murphy, Ph.D.

Category Biological Sciences

Abstract With growing concern for Qatar’s ambient air quality, this study analyzes particulate matter (PM2.5 and PM10) concentrations for 379 days from March 2014 until early April 2015. These concentrations were measured by two Met One BAM-1020 instruments. The Air Quality Index (AQI) was calculated from these data and indicated that there were no good days according to the U.S. Environmental Protection Agency (EPA) AQI standards. The annual PM average exceeded the annual guidelines of the WHO, the EPA, and Qatar’s National Standards. The aim of this study is to bring these data to public attention.
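The abstract does not spell out how the AQI was computed. As a hedged illustration, the EPA index is a piecewise linear interpolation between concentration breakpoints; the sketch below applies it to a 24-hour PM2.5 average. The breakpoint table is assumed to be the EPA PM2.5 table as revised in 2012, and the names `pm25_aqi` and `aqi_category` are hypothetical.

```python
# (C_low, C_high, I_low, I_high): PM2.5 24-hour breakpoints in ug/m3 and the
# AQI range they map to. Assumed to follow the EPA table revised in 2012.
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.4, 401, 500),
]

CATEGORIES = [
    (50, "Good"),
    (100, "Moderate"),
    (150, "Unhealthy for Sensitive Groups"),
    (200, "Unhealthy"),
    (300, "Very Unhealthy"),
    (500, "Hazardous"),
]


def pm25_aqi(concentration):
    """Return the AQI for a 24-hour PM2.5 average, or None if out of range."""
    if concentration < 0:
        return None
    # Truncate to one decimal place, as the breakpoints are given to 0.1 ug/m3.
    c = int(concentration * 10 + 1e-9) / 10
    for c_low, c_high, i_low, i_high in PM25_BREAKPOINTS:
        if c_low <= c <= c_high:
            # Linear interpolation inside the matching breakpoint interval.
            return round((i_high - i_low) / (c_high - c_low) * (c - c_low) + i_low)
    return None


def aqi_category(aqi):
    """Return the category name for an AQI value."""
    for upper, name in CATEGORIES:
        if aqi <= upper:
            return name
    return "Beyond the AQI"


print(pm25_aqi(55.5), aqi_category(pm25_aqi(55.5)))
```

Under this table a day counts as "Good" only when the index is 50 or lower, which is the threshold behind the "no good days" finding.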




Testing for Genetic Modification in Corn Derived Products Across Varying Regions

Authors Aya Gaballa (BS 2016), Fatima Amir (BS 2016)

Faculty Advisor Annette Vincent, Ph.D.

Category Biological Sciences

Abstract The purpose of this project is to test for genetically modified gene insertions in food products and to relate the results to the genetic modification laws in their respective countries of origin. The first product is Quaker Corn Meal made in the United States, which has very loose laws regarding genetically modified food products. We expect to find genetic modification in this product. The second product is corn Cheetos, packaged in Saudi Arabia, which has strict laws regarding genetically modified food products. We should observe no genetic modification in the Cheetos. The final product tested was Green Giant organic corn, a product of France, which has very strict laws and regulations that ban the harvesting and selling of genetically modified food for human consumption. We expect to find no genetic modification in it. To test our hypothesis, we conducted uniplex, multiplex, and bead PCR, and ELISA to determine the presence of our tested genes. Our results show increased transgene presence in the USA-based product and minimal modification in the France- and KSA-based products. We therefore conclude that our data are consistent with the genetic modification laws and regulations of the respective countries.




The Effects of Aspartame and Methanol on Kidney Cells

Authors Aya Abd Elaal (BS 2016), Rayan Mahmoud (BS 2016)

Faculty Advisor Annette Vincent, Ph.D.

Category Biological Sciences

Abstract The aim of this project is to investigate the effects of aspartame and its derivative methanol on proximal tubular kidney cells. Our hypothesis is that aspartame and methanol affect the viability of the kidney cells. The final results showed that aspartame causes an increase in kidney cell viability over short-term exposures (30 minutes to 1 hour), while methanol causes an increase in cell viability over long-term exposures (24 to 48 hours). In addition, an apoptosis assay showed fragmentation of the kidney cells’ DNA after exposure to aspartame (5 and 250 µg/ml) and methanol (0.5 and 25 µg/ml), which indicated cell death.




Forecasting Bank Performance in the GCC Through Market-based Metrics

Author Lana Al-Kahala (BA 2015)

Faculty Advisors John O’Brien, Ph.D., Muhammad Fuad Farooqi, Ph.D.

Category Business Administration

Abstract Capital adequacy requirements are measures taken to protect banks from insolvency and excess leverage risk. They have done a poor job of predicting and preventing financial crises because of their complexity, lack of robustness and time insensitivity. Hence, it becomes important to identify metrics that can better predict bank performance. This is where market-based metrics come in, providing a simple measure to track the performance of banks, both Islamic and Conventional, globally. The main focus of this paper is the applicability of these metrics as a measure of banks’ health specifically in the GCC region. The main research question is whether market-based metrics can predict bank stress in the Gulf region in a more robust and timely manner than capital adequacy standards do. The two main ratios used for this study are the market-based capital ratio and the market-based equity ratio. We consider market-based metrics to offer a simpler and more timely measure of bank performance than the current book value measures do. These measures can be incorporated into current capital adequacy requirements through the issuance of Contingent Convertibles, otherwise known as CoCos. Contingent convertibles are similar to convertible debt; however, they are converted from debt to equity when certain triggers occur. The triggers in this case can be the two measures included in this study. Moreover, different levels of triggering can be incorporated, so that the capital structure of the bank expands gradually as it goes deeper into financial trouble.



Forecasting bank performance in the GCC through market-based metrics Lana AlKahala Class of 2015

John O’Brien, Ph.D. Advisor

Fuad Farooqi, Ph.D. Advisor

MOTIVATION

The 2008 crisis caused indices in the GCC to decline by around 40% and led to around $2.68bn in write-downs in four different banks alone.

[Chart: XLF Returns, 2003 to 2014, with 2008 marked; vertical axis from -0.5 to 0.6. Plotted values are not recoverable from the extracted text.]

RESEARCH QUESTION

Can market-based metrics predict bank stress in the GCC region in a more timely manner than capital adequacy standards do? Is there a difference between Islamic and Conventional banks in that aspect?

DATA

The data collected was for 30 Conventional and 18 Islamic banks across the GCC region, extending between 2002-

METHODOLOGY This study looks at market-based measures of solvency broken down by country as well as by type of bank. These metrics rely on the market rather than the book value of capital derived from Basel requirements. The book value commonly used is the Tier 1 Capital Ratio*.

The two metrics I use in this study are:

1. Market-based capital ratio: ratio of a bank’s market capitalisation to its total assets
2. Market-based equity ratio: ratio of a bank’s market capitalisation to its total equity
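The two ratios defined above, and the idea of graduated CoCo triggers, can be sketched as follows. The balance-sheet figures and trigger levels are hypothetical and chosen only to make the arithmetic visible; the poster does not state specific thresholds.

```python
def market_based_capital_ratio(market_cap, total_assets):
    """Market capitalisation divided by total assets."""
    return market_cap / total_assets


def market_based_equity_ratio(market_cap, total_equity):
    """Market capitalisation divided by total (book) equity."""
    return market_cap / total_equity


def coco_tranches_triggered(ratio, trigger_levels):
    """Count the trigger levels that the ratio has fallen below.

    Each level stands for one tranche of contingent convertibles that
    would convert from debt to equity. The levels are illustrative.
    """
    return sum(1 for level in trigger_levels if ratio < level)


# Hypothetical bank, figures in the same currency unit.
capital_ratio = market_based_capital_ratio(market_cap=12.0, total_assets=100.0)
equity_ratio = market_based_equity_ratio(market_cap=12.0, total_equity=10.0)
print(capital_ratio, equity_ratio)

# Hypothetical graduated triggers: a ratio of 0.06 is below two of three levels.
print(coco_tranches_triggered(0.06, (0.10, 0.07, 0.04)))
```

Because both ratios use market capitalisation in the numerator, they move with the share price rather than waiting for book values to be restated.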

FINDINGS

[Charts: Market-based equity ratio and Market-based capital ratio, each plotted for Islamic and Conventional banks over 2004 to 2014. Plotted values are not recoverable from the extracted text.]

RESULTS & RECOMMENDATIONS Market-based metrics provide timelier signals of bank stress, making it easier to detect before write-downs occur.

[...] framework? Banks should begin to issue Contingent Convertibles, “CoCos”. Contingent Convertibles should have triggers based on market measures of solvency. These convertibles begin graduating when the bank is under stress, expanding the [...] cy ratios and decreases unnecessary hold up of capital.

*Tier 1 Capital Ratio: Tier 1/Risk Weighted


Robust Return Model

Authors Vanessa Fernandes (BA 2015), Zainab Irshad Baqri (BA 2015)

Faculty Advisors John O’Brien, Ph.D., Muhammad Fuad Farooqi, Ph.D.

Category Business Administration

Abstract For an investor to have a diversified portfolio, it is important to arrive at an estimate of his or her risk. While investing, the biggest decision factor is how much return an investor can earn given the risk. With Qatar now classified as an emerging market, many passive investors, who generally follow indices, are beginning to shift their funds into Qatar. As a result of the emerging market status, many active investors, who try to outperform indices, have also entered the local market, anticipating that the upgrade would attract more buyers and increase demand for Qatari stocks. Investors therefore need to be well informed of their choices in order to make wise decisions in the long run. Since the Qatari market remains relatively underexplored, this research aims to offer investors guidance on investing in the stocks of the local market by looking at companies’ fundamental and technical measures that help in predicting returns.



Your return foreteller. Current return estimation models do not take a company’s financial health into account, as they depend solely on market returns. Our research aims to bring out the significance of combining fundamental and technical analysis of individual stocks in predicting returns. With Qatar classified as an emerging market, many active and passive investors have been moving their funds into the local market. A Robust Return Model would thus serve as a helpful tool for predicting future returns, guiding investors toward optimizing their portfolios. In all, the major motivation was to obtain a model that would help an investor forecast returns better.

Return

Return is defined as the gain or loss of a security in a particular period. The return consists of the income and the capital gains relative to an investment. It is usually quoted as a percentage. The general rule is that the more risk you take, the greater the potential for higher return, and for loss.

Return Formula

Total Stock Return = ((P1 - P0) + D) / P0

P0 = Initial Stock Price
P1 = Ending Stock Price
D = Dividends

Returns = β + β1*Fundamentals + β2*Technicals

[Diagram labels: Returns, Fundamentals, Technicals]
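As a worked instance of the total stock return formula above, the following minimal sketch uses hypothetical prices and dividends; only the formula itself comes from the poster.

```python
def total_stock_return(initial_price, ending_price, dividends=0.0):
    """Total Stock Return = ((P1 - P0) + D) / P0."""
    if initial_price <= 0:
        raise ValueError("initial_price must be positive")
    return ((ending_price - initial_price) + dividends) / initial_price


# Hypothetical stock: bought at 100, now at 108, paid 2 in dividends.
print(total_stock_return(100.0, 108.0, 2.0))  # (8 + 2) / 100 = 0.1, i.e. 10%
```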

[Methodology flowchart, labels recovered from extraction. Stages: Data Collection & Research; Statistical Modeling; Model Testing. Other labels: 5 years, Quarterly; Industry Classification; Technicals; Fundamentals; Collinearity Check; Auto Regressions; Modeling; Estimating next quarter returns using Robust Return Model + Comparing results with existing return models.]

Fundamentals are defined as the qualitative and quantitative information that contribute to the economic well-being and financial valuation of a company (revenue, earnings, assets, etc.)

Technical Analysis Technical analysis is defined as the method of evaluating securities by analyzing statistics generated by market activity, such as past prices and volumes.

CAPM The Capital Asset Pricing Model (CAPM) is a model that describes the relationship between risk and expected return and that is used in the pricing of risky securities. Ra = Rf + β*(MRP)
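A minimal sketch of the CAPM relation Ra = Rf + β*(MRP), together with one standard way of estimating β as the covariance of stock and market returns divided by the variance of market returns. The return series and rates below are made up for illustration, and the function names are hypothetical; the poster does not say how β was estimated.

```python
def estimate_beta(stock_returns, market_returns):
    """Beta = cov(stock, market) / var(market), from paired return series."""
    n = len(market_returns)
    mean_stock = sum(stock_returns) / n
    mean_market = sum(market_returns) / n
    covariance = sum(
        (s - mean_stock) * (m - mean_market)
        for s, m in zip(stock_returns, market_returns)
    )
    variance = sum((m - mean_market) ** 2 for m in market_returns)
    return covariance / variance


def capm_expected_return(risk_free_rate, beta, market_risk_premium):
    """Ra = Rf + beta * MRP."""
    return risk_free_rate + beta * market_risk_premium


# Hypothetical quarterly returns: the stock moves twice as much as the market.
market = [0.01, 0.02, -0.01, 0.03]
stock = [0.02, 0.04, -0.02, 0.06]
beta = estimate_beta(stock, market)
print(beta, capm_expected_return(0.03, beta, 0.05))
```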

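[Model comparison tables, flattened during extraction. Industry labels on the poster: Banking, Holding Companies, Telecom, Retail. Values are listed in extraction order under the column headers Our Model, SRM, CAPM. Which industry each table belongs to, and which columns the two R-Square values belong to, cannot be recovered with certainty.]

Table 1: MSE 0.0002, 0.024, 0.125; R-Square 0.98, 0.73
Table 2: MSE 0.005, 0.01, 0.75; R-Square 0.82, 0.25
Table 3: MSE 1.08, 1.064, 0.75; R-Square 0.006, 0.67
Table 4: MSE 0, 0.02, 0.48; R-Square 0.80, 0.79

Our estimates for next quarter returns were better than the returns estimated by CAPM and SRM. Our team was also successful in predicting returns with negligible error (approx. zero).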

Research by: Vanessa Fernandes and Zainab Baqri Advisors: John O’Brien & Muhammad Fuad Farooqi

• In future, returns will also depend upon how long the company has been trading. • The method could be replicated in GCC and other emerging markets for conclusive results.

Ra = Return on stock
Rf = Risk-Free Rate
Beta = Systematic Risk
MRP = Market Risk Premium

SRM The Simple Regression Model (SRM) is the least squares estimator of a linear regression model with a single explanatory variable.
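To make the SRM definition and the two comparison metrics (MSE and R-Square) concrete, here is a minimal sketch of a single-variable least squares fit with both metrics computed from scratch. The data points are invented and lie exactly on a line, so the fit is perfect; this is not the authors' data or code.

```python
def fit_simple_regression(x, y):
    """Least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mean_squared_error(actual, predicted):
    """Average of the squared prediction errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Invented data lying exactly on y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_simple_regression(x, y)
predicted = [intercept + slope * xi for xi in x]
print(intercept, slope, mean_squared_error(y, predicted), r_squared(y, predicted))
```

A lower MSE and a higher R-Square both indicate a closer fit, which is how the poster's tables compare the three models.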


Developing a Benchmark to Assess and Improve the Detection of Cross-Site Scripting (XSS) Vulnerabilities

Author Yusuf Musleh (CS 2016)

Faculty Advisor Thierry Sans, Ph.D.

Category Computer Science

Abstract Cross-Site Scripting (XSS) is currently the most widespread vulnerability in web applications, allowing an attacker to steal user credentials. Some existing tools that aim to detect XSS vulnerabilities have poor detection rates because XSS vulnerabilities come in various forms. As a research project, we proposed to build a benchmark of both safe and vulnerable code to better understand the types of XSS vulnerabilities that current tools fail to detect. This benchmark will also help us develop new tools that are more reliable and accurate in detecting different types of XSS vulnerabilities.
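To illustrate what a pair of "safe" and "vulnerable" benchmark cases might look like, here is a minimal sketch of a reflected XSS example. The rendering functions, the payload, and the naive check are hypothetical and are not taken from the authors' benchmark; `html.escape` is the Python standard library function for HTML-escaping text.

```python
import html

PAYLOAD = "<script>alert(1)</script>"


def render_greeting_vulnerable(name):
    # User input is concatenated into HTML unchanged: reflected XSS.
    return "<p>Hello, " + name + "!</p>"


def render_greeting_safe(name):
    # User input is HTML-escaped before being placed in the page.
    return "<p>Hello, " + html.escape(name) + "!</p>"


def reflects_payload(render):
    """Naive dynamic check: does the payload come back unescaped?"""
    return PAYLOAD in render(PAYLOAD)


# Each benchmark case pairs a page with its ground-truth label.
BENCHMARK = [
    (render_greeting_vulnerable, True),
    (render_greeting_safe, False),
]

for render, is_vulnerable in BENCHMARK:
    print(render.__name__, reflects_payload(render) == is_vulnerable)
```

A benchmark of this shape lets a detection tool be scored on both missed vulnerabilities and false alarms, since every case carries a known label.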




Doha’s Architecture on a Tablet

Author Posha Dave (CS 2017)

Faculty Advisors Thierry Sans, Ph.D., Rami el Samahy, APA, Kelly Hutzell, AIA

Category Computer Science

Abstract 4dDoha: Buildings is an educational web-based application that serves as a digital “architectural guidebook.” The goal of this project is to create both Android and iOS applications for this web application. The main challenge is to ensure that adding or changing a feature in the web application does not require additional changes to the two mobile applications. So, instead of making native Android and iOS applications, we decided to adopt a hybrid approach that consists of wrapping the web application in a mobile application using WebView in Android and NSURL in iOS. By doing so, any change in the web application is reflected in the two mobile applications without any additional changes to their code.




MANET and DTN Communication Protocols Evaluation for Real-World Scenarios

Author Hasan Al-Jawaheri (CS 2016)

Faculty Advisor Khaled Harras, Ph.D.

Category Computer Science

Abstract Communication is fundamental to a very wide variety of projects, and the choice of protocol and implementation for a particular scenario is critical. Among many communication schemes, this research focuses on MANET and DTN communication. We tested and evaluated implementations of many MANET and DTN protocols in multiple scenarios to assess the usability, robustness and performance of the available solutions for those scenarios. We evaluated a MANET implementation called OLSRd that could perform multi-hop ad-hoc communication reliably. We also discovered that some DTN solutions, such as IBR-DTN, can compete with, or even outperform, end-to-end protocols like FDT and UDT in single-hop scenarios, which enables us to use the DTN protocol in scenarios where one-hop communication is fundamental but delay tolerance is a plus.



MANET and DTN Communication Protocols Evaluation for Real-World Scenarios

Hasan Al-Jawaheri, Khaled A. Harras
Carnegie Mellon University Qatar

I. INTRODUCTION TO MANET
• In the networks we have at home, devices (or nodes) on the network are connected to a router, which connects the nodes to each other and to the rest of the world
• In Ad-Hoc mode, nodes connect to one another and each one acts as a router in the network

[Figure: Typical setup vs. Ad-hoc setup]
Ad Hoc network setup, src: http://images.books24x7.com/bookimages/id_25422/fig437_01.jpg
Typical network setup, src: http://www.thelifenetwork.org/images/adhoc.png

II. INTRODUCTION TO DTN
• The idea of delay tolerant networks is similar to MANETs, except that an end-to-end connection is not always available
• Nodes attempt to send data to their neighbors, hoping that the neighbors can somehow deliver the data to the destination when a connection becomes available as nodes move
• Research on DTNs aims to minimize replication of data when sending to neighbors by building more knowledge about the network
• Building knowledge causes overhead
• A balance between the trade-offs has to be found
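The store-carry-forward idea behind DTNs can be illustrated with a toy simulation of flooding ("epidemic") forwarding, in which every contact between two nodes copies the message. This is a simplified sketch for intuition only: the contact list is invented, and this is not how JDTN or IBR-DTN are implemented.

```python
def epidemic_delivery(contacts, source, destination):
    """Simulate flooding a single message over timed pairwise contacts.

    contacts: iterable of (time, node_a, node_b) meetings.
    Returns (delivery_time or None, number of nodes holding a copy).
    """
    carriers = {source}
    if destination in carriers:
        return 0, len(carriers)
    for time, node_a, node_b in sorted(contacts):
        # If either node carries the message, both carry it after the contact.
        if node_a in carriers or node_b in carriers:
            carriers.update((node_a, node_b))
        if destination in carriers:
            return time, len(carriers)
    return None, len(carriers)


# Invented contact trace: A never meets D, and no end-to-end path
# exists at any single moment, yet the message still arrives.
contacts = [(1, "A", "B"), (2, "C", "D"), (3, "B", "C"), (4, "C", "D")]
print(epidemic_delivery(contacts, "A", "D"))  # (4, 4)
```

The second value in the result is the number of copies created, which is the replication cost that DTN routing research tries to reduce by using knowledge about the network.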

III. MOTIVATION
• Learn about and explore DTNs and MANETs
• A lot of research exists without stable real-life solutions
• Many implementations are neither evaluated nor publicly advertised, and thus not many people use them
• Al-Jazeera is working on a project that requires ad-hoc and DTN solutions to deliver content to areas with weak connections, and thus needs a tailored solution that performs well under those challenged network conditions
• There are many applications that need proper MANET and DTN services to operate, such as:
  • Space networks
  • Sensor networks
  • Drone-based disaster rescue
  • Firechat (Chinese Ad-Hoc-based messaging app)

IV. MANET EVALUATION
• We performed experiments on laptops and Raspberry Pis
• Experiments were streams of UDP and TCP packets on a 1-hop pre-test setup, then a 2-hop setup
• The goal was to observe the reliability and robustness of the protocols:
  • Click DSR Project - failed
  • OLSRd - succeeded
  • The Grid - failed
  • B.A.T.M.A.N. - failed
• The failed implementations had issues such as being old and incompatible, or having very poor performance that makes evaluation impossible
• Experiments were done in the CMUQ building as shown
• Maximum sending rate is ~2 Mbps, so we know nodes are capable of sending at rates up to 2 Mbps
• We observe an almost-linear curve for transfer up until a certain point (around 1.5 Mbps), then the reception rate starts to decrease arbitrarily
• The same issue happens on both systems, making it likely to be a flaw in the implementation

[Figures: Set transfer rate vs. Actual transfer rate; Transfer rate at A vs. Reception rate at C]

V. DTN EVALUATION
• Experiments consisted of sending 10 MB files from one node to another
• The experiments do not evaluate DTN-specific aspects, namely delay tolerance
• The first goal was to test the DTN implementations against common TCP/UDP file transfer protocols to know whether DTN solutions are viable and whether they can replace normal, end-to-end file-transfer protocols
• The second goal was to discover how those protocols behave under certain network conditions
• We chose two DTN implementations, JDTN and IBR-DTN, and compared them against UDT and FDT (common file transfer protocols)
• IBR-DTN competes with FDT and wins in every experiment except high packet-loss and high delay

VII. CONCLUSION & FUTURE WORK
• OLSRd has shown the ability to perform ad-hoc communication
• DTN solutions are very competent compared to normal end-to-end solutions and provide several extra features
• DTN solutions were only tested for a small set of parameters
• More parameters can be tested
• More nodes in MANET and DTN setups
• Migrate solutions to work with Android
• Attempt using Linux on Android to run the software on phones
• Evaluate DTNs on the Raspberry Pis

References

[1] Evan P.C. Jones and Paul A.D. Ward. Routing strategies for delay-tolerant networks. 2006.
[2] Azzedine Boukerche, Begumhan Turgut, Nevin Aydin, Mohammad Z. Ahmad, Ladislau Boloni, and Damla Turgut. Routing protocols in ad hoc networks: A survey. 2011.
[3] David B. Johnson, David A. Maltz, and Josh Broch. DSR: The dynamic source routing protocol for multi-hop wireless ad hoc networks. 2001.
[4] T. Clausen, P. Jacquet, A. Laouiti, P. Muhlethaler, A. Qayyum, and L. Viennot. Optimized link state routing protocol for ad hoc networks.


Metis: Smart Academic Planner for College Students

Authors Rukhsar Neyaz Khan (CS 2015), Sabih Bin Wasi (CS 2015)

Faculty Advisor Mark Stehlik

Category Computer Science

Abstract According to a study conducted by completecollege.org across US universities, only one in three college students graduates on time. This results in a potential loss of $60,000 for each student who misses timely graduation. Students’ lack of knowledge about course offerings, too many conflicting resources, and conflicting schedules were identified as contributing to this problem. To address this issue, we developed a web application to guide students through their college careers. The core of our system relies on the Modular Requirement Framework (MRF), which embeds a variety of degree requirements into a graph that can later be traversed to check for the remaining requirements a student needs to satisfy. MRF checks all possible combinations of course-requirement mappings to achieve completeness. Together with MRF, we also developed the Course Relevance Ranking System (CRRS), which recommends courses to students based on their academic context and the course offerings each semester. Using MRF and CRRS, we built features that can directly impact students’ course choices in their academic careers. We analyzed students’ engagement with our system during the alpha-launch phase. On average, students spent more than 17 minutes in each of their sessions. Also, two-thirds of registered students returned to our system after using it for the first time. Since our system is independent of a university’s course and curriculum structure, Metis can be extended to other universities easily.
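The abstract notes that MRF checks all possible course-requirement mappings. The sketch below shows why an exhaustive search matters, using plain backtracking over a dictionary rather than the authors' graph-based framework, whose details are not given here. The requirement names and course numbers are hypothetical.

```python
def remaining_requirements(requirements, taken):
    """Return the smallest list of unmet requirements found by exhaustive search.

    requirements: dict mapping requirement name -> set of acceptable courses.
    taken: set of courses the student has completed.
    Each course may be counted toward at most one requirement.
    """
    names = list(requirements)
    best = list(names)  # worst case: nothing is satisfied

    def search(index, used, unmet):
        nonlocal best
        if index == len(names):
            if len(unmet) < len(best):
                best = list(unmet)
            return
        name = names[index]
        # Try every completed, still-unused course that fits this requirement.
        for course in requirements[name]:
            if course in taken and course not in used:
                search(index + 1, used | {course}, unmet)
        # Also try leaving this requirement unmet.
        search(index + 1, used, unmet + [name])

    search(0, frozenset(), [])
    return best


# Hypothetical degree: one course fits both requirements, so a greedy
# assignment could wrongly report "Systems" as unmet.
requirements = {
    "Programming": {"15-112", "15-122"},
    "Systems": {"15-122"},
}
print(remaining_requirements(requirements, {"15-112", "15-122"}))  # []
```

Exhaustive search is exponential in the worst case, so a production system would likely need pruning or a matching algorithm; the point here is only that trying every mapping avoids false "unmet" reports.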




Real-time Dialectal Speech Recognition Application

Authors Aliaa Essameldin (CS 2017), Husam Yasser (TAMUQ, EE, 2017)

Advisors Ahmed Ali (QCRI), Yifan Zhang (QCRI)

Category Computer Science

Abstract With the rapid advance of smart devices and Natural Language Understanding, Automatic Speech Recognition (ASR) has become a crucial technology. A challenge facing ASR today is training machines to recognize different dialects of a language. In our research, we focused on finding ways to optimize ASR for the Egyptian Arabic dialect. Our aim was to come up with general solutions that can improve recognition for all dialects, not just Egyptian. We did this by developing a real-time Arabic ASR iPhone application and using it to test the online speech decoder’s performance. We looked at a variety of solutions that involved sound streaming optimization methods on the client side and different combinations of language and acoustic models on the server side.




Strengthening the Security of Qatari Websites

Author Omar Abou Selo (CS 2015)

Faculty Advisor Thierry Sans, Ph.D.

Category Computer Science

Abstract Qatari websites are increasingly becoming the target of cyber-attacks. Websites with common vulnerabilities are at risk. Our goal is to detect common vulnerabilities in the webstack of Qatari websites (the webstack is the collection of software components used to build and run a website) and to suggest improvements. We built tools to detect the technologies used by websites and their versions, and then to identify the common vulnerabilities affecting them. Using these tools, we compiled statistics that give insight into the security of the Qatari web.



Strengthening the Security of Qatari Websites By Omar Abou Selo with advisory from Professor Thierry Sans

Problem:

The Qatari web infrastructure is becoming a tempting target for cyber attacks. Websites with common vulnerabilities are at risk.

Solution:

Our goal is to detect common vulnerabilities in Qatari websites and help the owners to adopt better security practices in the future.

Step 1, Analyze the Webstack:

We have developed a tool that analyzes the webstack. The webstack refers to the collection of software components used to build and run a website.

For example, qatar.cmu.edu:

JavaScript jQuery 1.7.1
PHP version 5.3.3
MySQL version 5.0.2
Apache version 2.2.15
Red Hat Linux version 2.6

Image source: https://www.newspindigital.com/nsd-tech-primer-the-traditional-stack-a-k-a-the-lamp-stack/

Step 2, Identify Vulnerabilities:

We have built a tool that identifies the common vulnerabilities affecting the website using the Common Vulnerabilities and Exposures (CVE) database and the results obtained from the webstack analyzer (step 1).

For example, PHP/5.3.3:

Total of 66 Vulnerabilities
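The poster does not describe how its tools work internally. As one plausible sketch of steps 1 and 2, the code below parses product strings of the kind servers commonly report in HTTP headers such as `Server` or `X-Powered-By`, and then classifies them using the categories that appear in the poster's charts. The lookup table is a stand-in for a CVE database query; its single entry reuses the PHP/5.3.3 example above, and all function names are hypothetical.

```python
# Stand-in for a CVE lookup: vulnerability counts per (product, version).
# The single entry reuses the poster's PHP/5.3.3 example.
KNOWN_VULNERABILITIES = {("PHP", "5.3.3"): 66}


def parse_product(header_value):
    """Split a value such as 'Apache/2.2.15 (Red Hat)' into (name, version)."""
    parts = header_value.strip().split()
    if not parts:
        return None, None
    name, _, version = parts[0].partition("/")
    return name or None, version or None


def classify(header_value):
    """Classify a header value using the category labels from the charts."""
    name, version = parse_product(header_value)
    if name is None:
        return "no name"
    if version is None:
        # Product known but version hidden: cannot rule vulnerabilities out.
        return "potential vulnerability"
    if KNOWN_VULNERABILITIES.get((name, version), 0) > 0:
        return "existing vulnerability"
    return "no vulnerability"


for value in ["PHP/5.3.3", "Apache", ""]:
    print(repr(value), "->", classify(value))
```

In this sketch "no vulnerability" only means that the product and version are absent from the lookup table, so the quality of the result depends entirely on the vulnerability data behind it.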

Step 3, Analyze Qatari websites:

Results and Findings:

[Charts flattened during extraction; the percentages cannot be matched to their categories and are omitted. Chart titles: Server detection; Server side scripting detection; Servers used; Server side scripting used; Vulnerability detection rate. Detection categories: No name; Name; No version; Version; Potential vulnerability; Existing vulnerability; No vulnerability. Vulnerability categories: Websites with Identified Vulnerabilities; Sites with Possible Vulnerabilities; Sites with no vulnerabilities detected. Servers: Apache; nginx; Cloudflare-nginx; Microsoft-IIS; Oracle Application Server; Others (GWS, IBM HTTP Server, …). Server side scripting: ASP.NET; PHP; Express; Plesklin; Servlet.]


Twitter Sentiment Analysis

Author Sabih Bin Wasi (CS 2015)

Faculty Advisors Kemal Oflazer, Ph.D., Alexander W. Cheek, M.Des.

Category Computer Science

Abstract With more than 500M tweets containing opinions sent each day, Twitter has become a goldmine for companies to analyze their brand performance and public sentiment about their products. However, because of the sheer volume, it is challenging to analyze public sentiment comprehensively by hand. In this project, we achieve this task automatically by building a system that classifies a given tweet as positive, negative or neutral based on the sentiment it reflects. We built this system using machine learning techniques. An industry-accepted algorithm, the Support Vector Machine (SVM), was used as the learner, with unigram features as a baseline. Our approach was error-analysis driven, and only feature sets that enhanced system performance were added to the final model. Lexical, Twitter-specific and emoticon-based features were analyzed. A new approach using polarity buckets was used to capture lexical features. To avoid the curse of dimensionality, only the top 2000 features were selected for our final model. We also tuned our model to get the maximum performance gain without overfitting. We found that all major sentiment analysis systems performed significantly worse on negative tweets. This problem was handled by incorporating cost-sensitive classification, which introduced a bias towards the negative class. Ultimately, we achieved a statistically significant improvement over the baseline. To explore the applications of our system, we also developed an interactive visualization tool that can analyze and visualize millions of tweets in less than a second. We achieved that by using a distributed search-index architecture.
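Two of the ideas in this abstract, a capped unigram vocabulary and cost-sensitive class weighting, can be sketched with the standard library alone. This is not the authors' pipeline: selecting the top features by raw frequency and weighting classes inversely to their frequency are assumptions made here for illustration, and the resulting vectors and weights would still have to be passed to a learner such as an SVM.

```python
from collections import Counter


def tokenize(tweet):
    """Lowercase whitespace tokenization (a deliberate simplification)."""
    return tweet.lower().split()


def top_unigrams(tweets, k):
    """Keep the k most frequent unigrams as the feature vocabulary."""
    counts = Counter(token for tweet in tweets for token in tokenize(tweet))
    return [token for token, _ in counts.most_common(k)]


def featurize(tweet, vocabulary):
    """Binary presence vector over the vocabulary."""
    present = set(tokenize(tweet))
    return [1 if token in present else 0 for token in vocabulary]


def class_weights(labels):
    """Inverse-frequency weights: rarer classes cost more to misclassify."""
    counts = Counter(labels)
    total, n_classes = len(labels), len(counts)
    return {label: total / (n_classes * count) for label, count in counts.items()}


vocabulary = top_unigrams(["good good day", "bad day"], k=2)
print(vocabulary, featurize("a good one", vocabulary))
print(class_weights(["pos", "pos", "pos", "neg"]))
```

Giving the under-represented class a larger weight is one common way to bias a classifier toward it, which matches the abstract's description of introducing a bias towards the negative class.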




Stress, Depression, Anxiety and Social Media Use Among Arabic-speaking Undergraduates

Authors Bayan Khaled (BA 2017), Alaa Khader (CS 2016), Aya Gaballa (BS 2016), Fatima Amir (BS 2016), Maryam Al-Sulaiti (BS 2016)

Faculty Advisor Crista Crittenden, Ph.D., MPH

Category General Education

Abstract This survey study explores the levels of perceived stress, depression, anxiety, and social media use in a small sample of Arabic-speaking undergraduates. Participants were asked to fill out a number of questionnaires in Arabic. A total of 38 participants completed questionnaires. Of those, 63% were women and the mean age was 21.29 (SD = 1.94). The results of this preliminary study indicate that Arabic-speaking undergraduates may suffer from more perceived stress than their U.S. counterparts. Not surprisingly, increased levels of perceived stress were associated with increased levels of anxious and depressive symptoms. In terms of social media, Twitter was the only venue in which increased use was associated with more stress and anxiety. The gender differences in perceived stress and depression may be the most important findings and the ones most worthy of further investigation. If female Arabic speakers are more stressed and depressed than their male counterparts, determining the reasons why will allow us to develop suitable interventions to make their time at university more enjoyable and rewarding.



Stress, Depression, Anxiety and Social Media Use among Arabic-speaking Undergraduates

Bayan Khaled, Alaa Khader, Aya Gaballa, Fatima Amir, Maryam Al-Sulaiti, Crista Crittenden
Carnegie Mellon University Qatar Psychology Lab

INTRODUCTION Very little research has been done to explore the stress levels and psychological correlates of undergraduate Arabic speakers. It is also unclear whether the everyday use of social media, such as Facebook and Twitter, has any effect on the stress and psychological well-being of those who use it in this population. Only one study to date has looked at the perceived stress levels of Arabic-speaking undergraduates (Chaaya et al., 2010), and it included only women. While stress levels among this group were shown to be high, it is unknown whether they have any association with psychological distress such as anxiety. This survey study explores the levels of perceived stress, depression, anxiety, and social media use in a small sample of Arabic-speaking undergraduates.

METHODS
Arabic-speaking undergraduate students from Carnegie Mellon University-Qatar were asked to participate. The study was approved by the CMU Institutional Review Board. Participants were asked to fill out a number of questionnaires in Arabic. The questionnaires consisted of:

Perceived Stress Scale (PSS-10; Cohen, Kamarck, & Mermelstein, 1983)
The PSS-10 is an extremely well-validated self-report measure of perceived stress. It has both positively and negatively worded questions; therefore items 4, 5, 7 and 8 are reverse-scored. Scores range from 0 to 40, with higher scores indicating more perceived stress.

Generalized Anxiety Disorder Scale (GAD-7; Spitzer, Kroenke, Williams, & Lowe, 2006)
The GAD-7 is a brief, well-validated self-report measure of anxiety. Participants report on the frequency, over the past two weeks, of experiencing symptoms such as being so restless they are unable to sit still. Scores range from 0 to 21, with higher scores indicating more anxiety.

Patient Health Questionnaire (PHQ-9; Kroenke, Spitzer, & Williams, 2001)
The PHQ-9 is a well-validated scale for assessing depressive symptoms. Participants report how often over the past two weeks they have experienced certain states, such as feeling little interest in doing things. Scores range from 0 to 27, with higher scores indicating more depressive symptoms.

Social Media Use
Participants were also asked to indicate how many hours a week they spent on e-mail and different social media sites, which included Facebook, Twitter, Instagram, Snapchat, blogging, or any other site they could name.
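The reverse-scoring rule described for the PSS-10 can be illustrated with a short sketch. It assumes each of the 10 items is answered on a 0 to 4 scale, which is consistent with the stated 0 to 40 total but is not spelled out on the poster; the function name `score_pss10` is hypothetical.

```python
# Minimal sketch of PSS-10 scoring, assuming each item is rated 0-4.
# Items 4, 5, 7 and 8 (1-based) are reverse-scored, as stated in the text.
REVERSED_ITEMS = {4, 5, 7, 8}


def score_pss10(responses):
    """Return the total PSS-10 score (0-40) for ten item responses."""
    if len(responses) != 10:
        raise ValueError("PSS-10 requires exactly 10 responses")
    total = 0
    for item_number, value in enumerate(responses, start=1):
        if not 0 <= value <= 4:
            raise ValueError(f"item {item_number} must be between 0 and 4")
        # Reverse-scored items map 0->4, 1->3, 2->2, 3->1, 4->0.
        total += (4 - value) if item_number in REVERSED_ITEMS else value
    return total
```

Under this rule, a respondent who answers 0 to every item scores 16 rather than 0, because the four positively worded items are flipped.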


RESULTS

A total of 38 participants completed questionnaires. Of those, 63% were women and the mean age was 21.29 (SD=1.94).

Gender Differences: Females more Stressed and Depressed
Gender differences were investigated using independent samples t-tests. Among social media use, the only gender difference found was that males spend more time on Facebook (8.58, SD = 2.85) than their female counterparts (2.85, SD = 1.01; t = 2.30, p = .028). Arabic-speaking females were also found to have higher levels of perceived stress (20.83 vs. 17.00; t = -2.04, p = .049) and depressive symptoms (9.54 vs. 4.91; t = -2.76, p = .009) than males. See the Gender Differences figure.
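As an illustration of the independent samples t-test named above, the following sketch computes the pooled-variance (Student) t statistic for two groups. The poster does not say whether equal variances were assumed, so the pooled form is an assumption here, and the function name `pooled_t` is hypothetical. No study data are reproduced.

```python
import math
import statistics


def pooled_t(group_a, group_b):
    """Return (t, degrees_of_freedom) for a pooled-variance two-sample t-test."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    mean_a = statistics.mean(group_a)
    mean_b = statistics.mean(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    if standard_error == 0:
        raise ValueError("both groups have zero variance")
    # The sign follows the order of the arguments: mean_a minus mean_b.
    return (mean_a - mean_b) / standard_error, df
```

With the first group as males and the second as females, a negative t corresponds to females scoring higher, which matches the sign convention of the values reported above.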

Mean scores for the stress and psychological questionnaires, as well as for hours spent on social media, are presented in the following table:

Questionnaire (Range)    Mean Score
PSS-10 (0-40)            19.45 (SD=5.63)
GAD-7 (0-21)             6.84 (SD=4.07)
PHQ-9 (0-27)             7.94 (SD=5.04)

Social Media Use         Mean Hours/Week
E-mail                   5.47 (SD=5.89)
Facebook                 4.87 (SD=7.55)
Twitter                  4.00 (SD=16.94)
Instagram                6.58 (SD=14.59)
Snapchat                 9.48 (SD=17.73)
Blogging                 2.75 (SD=9.66)
Other                    5.92 (SD=13.79)

[Figure: Gender Differences. Bar chart comparing Males and Females; vertical axis from 0 to 25.]

Higher Perceived Stress Versus US Sample
Anxiety and depression were within normal ranges among the entire sample. Scores on perceived stress were similar to those found by Chaaya et al. (2010), who found a mean of 20.3 (SD=4.8) among a group of 58 Arabic-speaking undergraduate women. This, however, is higher than the mean PSS-10 score of 16.78 (SD=6.86) found in 2009 in a large sample of people under 25 in the United States (Cohen & Janicki-Deverts, 2012).

More Perceived Stress Associated with more Anxiety and Depression
Levels of perceived stress, anxiety and depression were all positively correlated with one another, as shown in the table below:

Measure    PSS-10    GAD-7     PHQ-9
PSS-10     X         .550**    .643**
GAD-7                X         .614**
PHQ-9                          X

** p<.01

Twitter Use Associated with Increased Stress and Anxiety
Overall, social media use was not associated with stress, anxiety or depression. However, Twitter use was found to be correlated with perceived stress (r = .323, p = .048), such that the more hours spent on Twitter, the more perceived stress. Twitter use was also associated with anxiety (r = .370, p = .022), with those spending more time on Twitter experiencing more anxiety.
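The r values reported above are correlation coefficients. Assuming they are Pearson product-moment correlations (the poster does not name the coefficient), a minimal sketch of the computation looks like this; `pearson_r` is a hypothetical helper and the numbers in the usage note are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)
```

For example, `pearson_r(hours_on_twitter, pss_scores)` would take one list of weekly hours and one list of PSS-10 totals, in the same participant order.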

[Figure bar groups: Facebook Use (Hours), Perceived Stress Score, PHQ (Depression) Score]

CONCLUSIONS
The results of this preliminary study indicate that Arabic-speaking undergraduates may suffer from more perceived stress than their U.S. counterparts. However, this may be due in part to the U.S. data including anyone under 25, which may not include people who are in high-stress situations such as university. Not surprisingly, increased levels of perceived stress were associated with increased levels of anxious and depressive symptoms. In terms of social media, Twitter was the only venue in which increased use was associated with more stress and anxiety. The gender differences in perceived stress and depression may be the most important findings and the ones most worthy of further investigation. If female Arabic speakers are more stressed than their male counterparts, determining the reasons why may help make their time at university more rewarding and enjoyable.

REFERENCES

1. Chaaya, M., Osman, H., Naasan, G., & Mahfoud, Z. (2010). Validation of the Arabic version of the Cohen perceived stress scale (PSS-10) among pregnant and postpartum women. BMC Psychiatry, 10, 111-117.
2. Cohen, S., Kamarck, T., & Mermelstein, R. (1983). A global measure of psychological stress. Journal of Health and Social Behavior, 24, 385-396.
3. Spitzer, R.L., Kroenke, K., Williams, J.B.W., & Lowe, B. (2006). A brief measure for assessing generalized anxiety disorder: the GAD-7. Archives of Internal Medicine, 166, 1092-1097.
4. Kroenke, K., Spitzer, R.L., & Williams, J.B.W. (2001). The PHQ-9: Validity of a brief depression severity measure. Journal of General Internal Medicine, 16, 606-613.
5. Cohen, S., & Janicki-Deverts, D. (2012). Who’s stressed? Distributions of psychological stress in the United States in probability samples from 1983, 2006 and 2009. Journal of Applied Social Psychology, 42(6), 1320-1334.


Validity and Reliability of the Arabic Version of the Perceived Stress Scale (PSS-10) Authors Maryam Al-Sulaiti (BS 2016), Fatima Amir (BS 2016), Aya Gaballa (BS 2016), Alaa Khader (CS 2016), Bayan Khaled (BA 2017)

Faculty Advisor Crista Crittenden, Ph.D., MPH

Category General Education

Abstract Introduction: The Perceived Stress Scale (PSS-10, Cohen, Kamarck & Mermelstein, 1983) is one of the most widely used self-report measures of stress (current citations = 8,559). The PSS-10 measures the degree to which a person perceives their world to be uncontrollable, unpredictable and overwhelming. Scores on the PSS have been found to predict a wide range of mental and physical outcomes, including cancer and diabetes. This current study seeks to validate an Arabic version of the PSS-10, in order to ensure its appropriateness in a different cultural setting. It is hoped that this will lead to future research in determining whether the stress effects described above are generalizable to non-English speakers. Methods: Multiple psychological questionnaires, including the PSS-10, were completed in Arabic. Results: 38 participants (63% female, mean age 21) completed the questionnaires. The PSS-10 was found to be highly correlated with measures of anxiety and depression, and negatively correlated with life satisfaction. It was also found to have good internal reliability. Conclusion: The Arabic version of the PSS-10 is both valid and reliable, though further research is needed to verify results.

41



Effect of the Initial Conditions on the Interfacial and Bulk Dynamics in Richtmyer-Meshkov Instability Under Conditions of High Energy Density Author Arun Pandian (CS 2015)

Faculty Advisor Snezhana Abarzhi, Ph.D.

Category General Education

Abstract: This research deals with the effect of the initial conditions on the interfacial and bulk dynamics in Richtmyer-Meshkov instability (RMI), which develops at the interface of two fluids with different densities when a shock wave refracts through it. While previous work in this field has focused on the effect of various initial parameters, such as the wavelength and amplitude of a symmetric interface, the initial relative phase of a multi-wave interface has been largely ignored. Using the Smooth Particle Hydrodynamics Code (SPHC), we vary the relative phase between the interfacial sinusoidal waves and observe the effects of this on the structures formed at the interface in the form of bubbles and spikes. By doing so we have observed a number of qualitative and quantitative effects on the interfacial dynamics. We hypothesize that symmetry of the interface is important in creating a pressure gradient at the front. This pressure gradient in turn causes the formation of the aforementioned structures at the interface. Evidence so far is consistent with this hypothesis, as more symmetry at the interface leads to faster-growing spikes compared to asymmetric cases.

43



Effect of the initial conditions on the interfacial and bulk dynamics in Richtmyer-Meshkov instability under conditions of high energy density
Arun Pandian, Carnegie Mellon University – Qatar
Robert F. Stellingwerf, Stellingwerf Consulting, Inc., USA
Snejana I. Abarji, Carnegie Mellon University – Qatar

We studied Richtmyer-Meshkov instability (RMI) induced by strong shocks by means of Smooth Particle Hydrodynamic Simulations (SPH) to determine the effect of the initial conditions on RMI evolution.

Our new and unexpected result is the strong dependence of fluid transport on relative phases of perturbation waves.

RMI occurs when a shock passes through a perturbed interface between two fluids with different values of acoustic impedance (fluid densities). RMI plays an important role in various phenomena, from inertial confinement fusion and supernova explosions to scramjets and material mixing. After the shock passage from the light to the heavy fluid, the fluid motion is a superposition of two motions: the motion in the bulk and the growth of the interface.

RMI Evolution
Bubbles (green and red) and spikes (blue) form as the shock passes through the interface from left to right. Predictions of the highly nonlinear theory are confirmed, e.g., at late times the bubble fronts flatten due to their effective deceleration.

SPHC
SPHC is a Smooth Particle Hydrocode developed over a period of roughly 20 years. SPHC is being used for high velocity impact modeling, gasdynamic flows, multi-phase flows, shock analysis, shaped charge design, and other challenging problems.

Effects of the amplitudes and relative phases of the perturbation waves on RMI evolution (1)
M = 5, A = 0.8, a_0/λ = 0.33, phase = 2π
• Initial perturbation has two wavelengths.
• Wavelength ratio = 2; amplitude ratio = 0.1; phase difference = 2π (same phase).
• Strong effect of the second wave (amplitude) on the spike structure (e.g. total mass).

Effects of the amplitudes and relative phases of the perturbation waves on RMI evolution (2)
M = 5, A = 0.8, a_0/λ = 0.33, phase = π
• Initial perturbation has two wavelengths.
• Wavelength ratio = 2; amplitude ratio = 0.1; phase difference = π (out of phase).
• Strong effect of the second wave phase on the spike mass transport is observed.

We studied the effect of initial perturbation on the evolution of Richtmyer-Meshkov instability driven by strong shocks. We found:
- In the multi-mode case, the RMI evolution depends on the amplitude and the relative phases of the waves.
- These factors strongly influence the overall structure, symmetry, and mass transport of the flow.

Implications
- Temperature distribution indicates regions where heavy mass elements can be generated in a supernova explosion.
- Can be used in mitigation and control of mixing in inertial confinement fusion.
- Can generate more efficient combustors (scramjet ignition).
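To make the two-wave initial perturbation concrete, the sketch below evaluates an interface profile built from two cosine waves with the stated wavelength ratio (2), amplitude ratio (0.1) and a_0/λ = 0.33. The functional form, the choice that the second wave has half the wavelength of the first, and the name `interface_height` are assumptions for illustration only; the poster does not give the exact expression used in SPHC.

```python
import math


def interface_height(x, wavelength=1.0, a0_over_lambda=0.33,
                     amplitude_ratio=0.1, phase=0.0):
    """Height of a two-wave interface at position x (assumed cosine form)."""
    k = 2 * math.pi / wavelength          # wavenumber of the primary wave
    a0 = a0_over_lambda * wavelength      # primary amplitude from a_0 / lambda
    primary = math.cos(k * x)
    # Second wave: half the wavelength, scaled amplitude, shifted by `phase`.
    secondary = amplitude_ratio * math.cos(2 * k * x + phase)
    return a0 * (primary + secondary)


# Sample one wavelength for the in-phase (2*pi) and out-of-phase (pi) cases.
xs = [i / 8 for i in range(9)]
in_phase = [interface_height(x, phase=2 * math.pi) for x in xs]
out_of_phase = [interface_height(x, phase=math.pi) for x in xs]
```

Under these assumptions the two cases differ only through the second wave: at x = 0 the waves add when in phase and partially cancel when out of phase, which changes the initial crest height by a factor of (1 + 0.1) versus (1 - 0.1).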


Designing Qatar’s Infrastructure in a Human Centered Way Author Noshin Anjum Nisa (IS 2015)

Faculty Advisor Alexander Cheek, M.Des.

Category Information Systems

Abstract: Qatar ranks among the top two countries for fatality rate from car crashes. Qatar is a fast-growing country with constant development of roads, infrastructure and cities. These developments have increased the number of cars and people in the country, but two important questions emerge: ‘Are the infrastructure and roads well designed to accommodate the growing number of cars?’ and ‘Are they safe enough for people to walk from one place to another?’ Past records show that road accidents play a major role in the country’s fatality rate, which implies that it is highly important for Qatar to focus on road safety to create a better and safer driving environment. This environment can only be achieved through urban design that takes the existing roads and infrastructure into account. The main purpose of this project was to look at the traffic problems that Qatar currently faces from a top view and suggest solutions that already exist. We looked at well-developed countries and learned about their strategies for mitigating traffic problems and promoting a better lifestyle. Using these strategies, we redesigned existing roads in ways that will not only promote a healthier lifestyle but also help ease traffic congestion. This study took into consideration all past and existing approaches to minimizing car accidents. It found that urban design is a proven solution to this issue, which is very complex and requires different groups of people to get involved in mitigating it. The recommendations proposed by this study are applicable to the current road system, and we believe that they will promote a safer and healthier lifestyle in Qatar.

45



Factors Influencing the Adoption of Dermatology Diagnoses Through Mobile Applications Author Abdulrahman Takiddin (IS 2015)

Faculty Advisors Selma Limam-Mansar, Ph.D., Divakaran Liginlal, Ph.D.

Category Information Systems

Abstract In this research study, we focus on mobile applications in the dermatology field. We collaborated with physicians at the Dermatology Department of Hamad Medical Corporation in Qatar. Their problem is that physicians are not readily available to provide a second opinion to a colleague on patients’ cases, as they have their own duties. We proposed a mobile application that would allow them to share diagnostic information and pictures among themselves. Our goal was to find out whether using the mobile application would enable faster and more convenient interactions between physicians.

47



Malicious Online Behaviour Author Sama Kanbour (IS 2015)

Faculty Advisors Thierry Sans, Ph.D., Mohammad Hammoud, Ph.D.

Category Information Systems

Abstract In 2014, Checkpoint reported that a user accesses a malicious site every minute, and that malware is downloaded every 10 minutes. If we are to prevent malicious browsing habits, it is necessary to go beyond purely technical considerations and instead understand how users put themselves at risk. Our study analyzes the web browsing behavior of the population of Qatar by mining web analytics data for the most popular websites. One of our first conclusions was that the population of Qatar is more likely to fall victim to malicious attacks than the rest of the world. Our results, structured as a visual story, reveal further characteristics and behaviors tied to this issue.

49


MALICIOUS ONLINE BEHAVIOR
SAMA KANBOUR, THIERRY SANS
SAMAKANBOUR.GITHUB.IO/MALICIOUS

EXPRESSING MOTIVATION

PROBLEM
Are people in Qatar more exposed to online attacks than the rest of the world? In response to this question, we studied the 3,330 most visited sites in Qatar and in the world during the month of October 2014. Using the analytical service providers Alexa Top Sites, Similar Web, Web of Trust and Virus Total, we were able to extract, classify and study them.

OBJECTIVE
The better we understand cyber threats in Qatar, the more we can prevent unsafe browsing behavior and protect our privacy.

APPROACH
• Take a snapshot of popular sites in October
• Extract metadata from service providers
• Clean and cluster data to identify characteristics
• Link characteristics to malicious behavior
• Summarize findings with data visualization

JUSTIFYING THREAT IN QATAR
BASED ON WEB OF TRUST AND VIRUS TOTAL

NEGATIVE TRUST
Websites browsed in Qatar are less trusted than those browsed in the rest of the world. 10% of untrusted sites in Qatar have a negative* evaluation, compared to 5% in the rest of the world.
* Negative sites are subject to malware, viruses, phishing or scam attacks.
[Chart legend: NEGATIVE, QUESTIONABLE, NEUTRAL]

THREAT REACH
Qatar population is 3 times more exposed to malicious sites than the rest of the world.
[Chart: QATAR vs. WORLD by threat type: Virus, Malware, Phishing, Scam, Potentially Illegal, Unethical, Suspicious, Hate, Spam, Ads / Pop-ups. Values shown: 2.7%, 2.1%, .5%, .8%, .7%, 2.2%, 6.9%, 6.8%]

UNDERSTANDING THREAT ROOT
BASED ON SIMILAR WEB

THREAT ORIGINS
Malicious sites browsed in Qatar are mainly located in North America and Western Europe.
[Map labels: USA, NLD, GBR, PRT, CHE, DEU, ROU, BEL, IRL, ISR, CRI]

MALICIOUS SITES CATEGORIES
Most viruses come from ads, downloading audios and videos, file sharing, as well as blocked sites (lottery and adult).
[Chart labels: Ad Network, Search Engine, Music and Audio, Visas, Software, Advertising, Adult, Movies, File Sharing, Investing, Lottery. Values shown: 5.7%, 1.9%]

KEYWORDS SEARCHED AND SOCIAL REFERRALS
Among the top words that lead to malicious sites are: download, watch, free, movie, love. Malicious sites are mostly shared via Facebook and YouTube.
[Chart labels: VK, Ask, Facebook, Tagged, Reddit, DailyMotion, Odnoklassniki, Badoo, Netlog, Inst, Twitter, YouTube]

RESOURCES
Alexa Top Sites, Python NLTK, Free Geo IP, Virus Total, Web of Trust, Similar Web, Stanford NER, WordNet, Goslate, Anaconda
Special thanks to Virus Total for sponsoring this research.


Voices Of Al-Khor: A Study In Digital Culture Heritage Author Aisha Al-Missned (IS 2015)

Faculty Advisor Divakaran Liginlal, Ph.D.

Category Information Systems

Abstract Arab culture, and specifically Qatari culture, has always been an oral culture. Many researchers have documented the significance of “hearing” over “seeing” in Arab culture. This study examines how Arab cultural traits, such as the importance of the aural senses and the oral tradition, influence digital methods of cultural preservation. The research focuses on the case of Al-Khor, a coastal city in northern Qatar. The key objective is to explore ways of building authentic soundscapes that portray Qatari culture and to demonstrate the results through a prototype of an augmented storybook featuring the sounds around the Al-Khor shoreline. The larger objective of the research is to preserve the fragile and fading Qatari culture and to ensure that it will live on for future generations.

51



AL-BLEU: A Metric and Dataset for Evaluation of Arabic Machine Translation Authors Houda Bouamor, Ph.D., Hanan Alshikhabobakr, Behrang Mohit, Ph.D., Kemal Oflazer, Ph.D.

Category Postgraduate

Abstract Evaluation of Machine Translation (MT) continues to be a challenging research problem. There is an ongoing effort to find simple and scalable metrics with rich linguistic analysis. A wide range of metrics have been proposed and evaluated, mostly for European target languages. These metrics are usually evaluated based on their correlation with human judgments on a set of MT output. While there has been growing interest in building systems for translating into Arabic, the evaluation of Arabic MT is still an under-studied problem. Standard MT metrics such as BLEU or TER have been widely used for evaluating Arabic MT. These metrics use strict word and phrase matching between the MT output and reference translations. For morphologically rich target languages such as Arabic, such criteria are too simplistic and inadequate. In this work, we present a human judgments dataset and an adapted metric for evaluation of Arabic machine translation: the Arabic Language BLEU (AL-BLEU), an extension of the BLEU score for Arabic MT evaluation. Our medium-scale annotated dataset is the first of its kind for Arabic with high annotation quality. It is composed of the output of six MT systems with texts from a diverse set of topics. A group of ten native Arabic speakers annotated this corpus with a high level of inter- and intra-annotator agreement. Our AL-BLEU metric uses a rich set of morphological, syntactic and lexical features to extend the evaluation beyond exact matching. An automatic evaluation metric is said to be successful if it is shown to have high agreement with human-performed evaluations. We use Kendall’s tau (τ), a coefficient that measures the correlation between the system rankings and the human judgments at the sentence level.
We conducted a set of experiments to compare the correlation of AL-BLEU against state-of-the-art MT evaluation metrics and demonstrated that our metric has a stronger average correlation with human judgments than the BLEU and METEOR scores. The size and diversity of the topics in our dataset, along with its relatively high annotation quality (measured by IAA scores), make it a useful resource for future research on Arabic MT and its evaluation.
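A rough sketch of the Kendall's tau computation mentioned in the abstract is given below: it compares every pair of items and counts whether the metric orders the pair the same way as the human judge. How ties are handled is an assumption here (tied pairs are skipped and not counted in the total), and the function name `kendall_tau` is hypothetical.

```python
from itertools import combinations


def kendall_tau(human_ranks, metric_ranks):
    """Kendall's tau between two rankings of the same items (ties skipped)."""
    if len(human_ranks) != len(metric_ranks):
        raise ValueError("both rankings must cover the same items")
    concordant = 0
    discordant = 0
    for i, j in combinations(range(len(human_ranks)), 2):
        human_diff = human_ranks[i] - human_ranks[j]
        metric_diff = metric_ranks[i] - metric_ranks[j]
        if human_diff == 0 or metric_diff == 0:
            continue  # assumption: tied pairs contribute to neither count
        if (human_diff > 0) == (metric_diff > 0):
            concordant += 1
        else:
            discordant += 1
    total = concordant + discordant
    if total == 0:
        raise ValueError("no comparable pairs")
    return (concordant - discordant) / total
```

A value of 1 means the metric reproduces the human ordering exactly, and -1 means it reverses it.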

53


AL-BLEU: A Metric and Dataset for Evaluation of Arabic Machine Translation

Houda Bouamor, Hanan Alshikhabobakr, Behrang Mohit and Kemal Oflazer Carnegie Mellon University in Qatar http://nlp.qatar.cmu.edu/

BLEU and Arabic

• BLEU [Papineni et al., 2002] continues to be the de-facto MT evaluation metric
• Too strict comparison between MT output and reference translations
• Simplistic and ineffective for languages with flexible word order and rich morphology

Arabic in MT Evaluation

• BLEU heavily penalizes Arabic

Source: France plans to attend ASEAN emergency summit.
(The Arabic reference and MT translations in this example were lost in extraction.)

                      RankBLEU   RankHuman
MT translation (1)    2          1
MT translation (2)    1          2

• MT evaluation for Arabic is an under-studied problem
• There is no human judgment dataset for Arabic MT
• How can we adapt BLEU to support Arabic morphological, syntactic and lexical features?

Human Judgment Dataset

Data

• We annotated a corpus composed of different text genres:
  • English-Arabic NIST 2005 corpus: commonly used for MT evaluations and composed of news (1056 sentences)
  • MEDAR corpus [Maegaard et al., 2010]: texts related to climate change (509 sentences)
  • In-house Arabic translations of 7 Wikipedia articles (327 sentences)

Systems

• We use six state-of-the-art English-to-Arabic MT systems, including:
  • Four research-oriented PBMT systems with:
    • various morphological and syntactic features
    • different Arabic tokenization schemes
  • Two commercial off-the-shelf systems

Judgment collection: a ranking problem

• Rank the sentences relative to each other from the best to the worst
• Adapt the framework of the WMT 2011 Shared Task on evaluating MT metrics [Callison-Burch et al., 2011]
• Assess the quality of each system using ten bilingual annotators
• Use the Appraise Toolkit to rank sentences [Federmann, 2012]

Annotation quality

• Head-to-head pairwise agreement using Cohen’s kappa

           English-Arabic   Average EN-EU   English-Czech
κ inter    0.57             0.41            0.40
κ intra    0.62             0.57            0.54

• Reliable and consistent annotation quality
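The agreement figures above are Cohen's kappa values. As a hedged illustration of how such a value is obtained for two annotators making head-to-head pairwise decisions, the sketch below uses the standard definition κ = (p_o − p_e) / (1 − p_e); the label values and the function name `cohens_kappa` are hypothetical, and the exact WMT-style procedure used for the poster may differ.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("need two non-empty label lists of equal length")
    # Observed agreement: fraction of items with identical labels.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    p_expected = sum(
        (counts_a[label] / n) * (counts_b[label] / n)
        for label in set(counts_a) | set(counts_b)
    )
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)
```

Here each label could encode one pairwise decision, for example "<" when system 1 is ranked above system 2 and ">" otherwise.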

AL-BLEU

• Extend BLEU to deal with Arabic rich morphology in MT evaluation
• Update the n-gram scores with partial credits for partial matches
  • morphological, syntactic and lexical matches
• Compute a new matching score, for each hypothesis and reference:

$$\mathrm{match}(t_h, t_r) = \begin{cases} 1, & \text{if } t_h = t_r \\ w_s + \sum_{i=1}^{5} w_{f_i}, & \text{otherwise} \end{cases}$$

Where:
t_h: hypothesis token
t_r: reference token
w_s: stem weight
w_{f_i}: morph. weights

• Morphological features: POS tag, gender, number, person, definiteness
• MADA [Habash et al., 2009] provides for stem and morph. features
• Weights are optimized towards improvement of correlation with judgments
• Hill-climbing algorithm is used on a development set
• AL-BLEU is a geometric mean of the different matched n-grams

SRC: France Plans To Attend ASEAN Emergency Summit
REF: فرنسا تخطط لحضور القمة الطارئة للأسيان
HYP: فرنسا تعتزم حضور قمة الاسيان الطارئة
Match types: exact; stem & morph.; stem only; morph. only

Evaluation and Results

• Measure correlation between AL-BLEU, BLEU and human judgments at the sentence level
• Kendall’s τ as a correlation metric

$$\tau = \frac{\#\text{ of concordant pairs} - \#\text{ of discordant pairs}}{\text{total pairs}}$$

Where: concordant pair: rank (judge) = rank (metric)

• Compare the correlation of AL-BLEU vs. BLEU and METEOR [Denkowski and Lavie, 2011]
• Use 900 sentences extracted from the dataset: 600 Dev and 300 Test
• Avoid the zero n-gram counts and use a smoothed version of BLEU and AL-BLEU

                 Dev      Test
BLEU             0.3361   0.3162
METEOR           0.3331   0.3426
AL-BLEU_Morph    0.3746   0.3535
AL-BLEU_Stem     0.3732   0.3564
AL-BLEU          0.3759   0.3521

AL-BLEU correlates with human judgments better than BLEU

Conclusion

• We provide an annotated corpus of human judgments for evaluation of Arabic MT.
• We adapt BLEU into AL-BLEU for evaluation of Arabic MT.
• AL-BLEU uses morphological, syntactic and lexical matching.
• AL-BLEU could be improved by using richer linguistic information in evaluation of Arabic MT.

http://nlp.qatar.cmu.edu/resources/AL-BLEU/

Acknowledgment

This publication was made possible by grants YSREP-1-018-1-004 and NPRP-09-1140-1-177 from the Qatar National Research Fund (a member of the Qatar Foundation).
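The token-level matching score described on the poster can be sketched as follows. This is an illustrative reading, not the authors' implementation: it assumes that the stem weight and each morphological feature weight are credited only when that stem or feature agrees between hypothesis and reference. The weights, the `Token` class and the hand-written analyses are hypothetical (the poster says MADA supplies the real analyses and that the weights are tuned by hill climbing).

```python
from dataclasses import dataclass

# The five morphological features listed on the poster.
FEATURES = ("pos", "gender", "number", "person", "definiteness")


@dataclass(frozen=True)
class Token:
    surface: str
    stem: str
    morph: dict  # feature name -> value; hand-written here, hypothetically


def match(hyp, ref, stem_weight, feature_weights):
    """Partial-credit match score between a hypothesis and a reference token."""
    if hyp.surface == ref.surface:
        return 1.0  # exact match gets full credit
    score = stem_weight if hyp.stem == ref.stem else 0.0
    for name in FEATURES:
        value = hyp.morph.get(name)
        # Credit a feature only if it is present and agrees with the reference.
        if value is not None and value == ref.morph.get(name):
            score += feature_weights[name]
    return score


# Hypothetical weights chosen so that a partial match never reaches 1.0.
weights = {name: 0.05 for name in FEATURES}
hyp = Token("قمة", "قمة", {"pos": "noun", "gender": "f", "number": "sg",
                           "definiteness": "indef"})
ref = Token("القمة", "قمة", {"pos": "noun", "gender": "f", "number": "sg",
                             "definiteness": "def"})
partial = match(hyp, ref, stem_weight=0.5, feature_weights=weights)
```

In this made-up example the two tokens share a stem, POS tag, gender and number but differ in definiteness, so the pair earns the stem weight plus three feature weights instead of the zero credit that exact matching would give.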


Alice in the Middle East - Learning Computational Thinking in K-12 Authors Huda Gedawy, Saquib Razak, Ph.D.

Category Postgraduate

Abstract Alice is a programming environment designed for novice programmers to create 3D virtual worlds, including animations and games. Alice uses the context of animation to introduce computing, logic, and communication skills to students in secondary schools. We anticipate that Alice will increase students’ awareness of and interest in computing fields and provide them with the skills needed to succeed in the workplace as well as in higher education pursuits. Furthermore, Alice will support the Qatar National Vision of having a modern world-class educational system. This research focuses on adapting Alice for the Middle East with 3D models that are relevant to Qatari culture and familiar to young students. In addition, we focused on developing curricular materials in Arabic and English that introduce concepts of logical thinking and problem-solving skills in the context of creating locally acceptable animations that preserve traditions. Finally, we provided online access to the accompanying curricular materials for teachers and students. Three schools in Qatar collaborated with CMU on Alice, and around 320 students in three different grades are taking Alice as part of their ICT curriculum. In the future, we plan to spread Alice ME within the public and private schools throughout Qatar and the ME region.

55


Alice in the Middle East: Learning Computational Thinking in K-12
Huda Gedawy, Saquib Razak

Alice is a programming environment designed to enable students to learn the basics of logic and computing in the context of animations and games.

Creating Models

• Designing and developing 3D models that are culturally relevant and interesting for local students
• Customizing the learning experience

Developing Curriculum

• Developing curricular materials that focus on concepts of analytic and logical thinking and problem-solving skills
• Providing a modern world-class educational system that preserves local traditions
• Creating learning resources in Arabic
• Providing access to all resources on the project website

Pilot

3 Schools, 3 Grades (7-9), 320 Students

• Al-Arqam Academy
• Khaled bin Al-Waleed Secondary School
• Ali bin Abi-Taleb Secondary School

“I hope we study it in grade 9 also” - a student
“We want to study Alice again because it’s very useful for the future and enjoying” - a student

Future Work

• Translating Alice models into Arabic
• Creating an English textbook for local schools
• Applying Alice in all schools of Qatar and the ME region

Visit alice.qatar.cmu.edu


‘Arabiyyatii: An Innovative Technology-Based Curriculum for Teaching Arabic to Native Speakers Authors Hanan Alshikhabobakr, Zeinab Ibrahim, Ph.D., Pantelis Papadopoulos, Ph.D. (Aarhus University), Andreas Karatsolis, Ph.D. (MIT)

Category Postgraduate

Abstract This paper presents original research that aims to provide a path towards addressing a long-standing problem in the teaching of the Arabic language to native speakers. Using an interdisciplinary approach, this research focused on the development of a technology-based curriculum for teaching Modern Standard Arabic (MSA) to 5-6 year-old native Arabic speakers (kindergarteners). Utilizing the affordances of tabletop touchscreen surface technology, we report on the development of a framework for language instruction based on combining three innovative approaches: (a) a student-centered curriculum based on storytelling, (b) physical classroom reconfiguration, and (c) interactive software centered on multi-player, collaborative games. Following the deployment of the new pedagogy into a pre-elementary school language-learning curriculum, with both the curricular and technological innovations working together, we report on the significant changes in attitudes towards MSA by students, as well as the learning gains they experienced over the course of a semester. Finally, we report on modifications and improvements that resulted from the outcomes of the actual experiment. Keywords: Arabic language learning, Arabic language teaching, technology, educational games

57



ChiQat Tutor: Using Worked-Out Examples in an Intelligent Tutoring System Authors Nick Green, Ph.D. (University of Illinois at Chicago), Barbara Di Eugenio, Ph.D. (University of Illinois at Chicago), Davide Fossati, Ph.D., Omar AlZoubi, Ph.D., Lin Chen, Ph.D. (University of Illinois at Chicago)

Category Postgraduate

Abstract Worked-out examples have been proven to be an effective tutoring strategy. In worked-out examples, tutors define a problem, show the steps to solve it, and discuss the solution. Our previous analysis of tutoring sessions in Computer Science also showed the relevance of worked-out examples. We developed an intelligent tutoring system, ChiQat-Tutor, which uses educational techniques to help students learn Computer Science fundamentals. ChiQat-Tutor has a modular architecture allowing tutoring of several concepts such as linked lists, binary search trees, and recursion. Using worked-out examples is one of the teaching strategies that ChiQat can employ. Examples are created by a human tutor within a custom-built editor. They are defined by a series of actions that the tutor can take. Such actions include (a) written messages describing concepts or operations; (b) points of interest shown by drawing and pointing on a graph; and (c) steps that are executed within the solution. Our current focus is on using worked-out examples within a tutoring session on linked lists, a fundamental Computer Science data structure. During the lesson, students can hit the ‘Example’ button on the tutoring window to start executing an example. ChiQat-Tutor supports contextual example execution, where an example relevant to what the student is learning is automatically selected by the system. For example, if the student is working on a problem where a node needs to be added to a list, then the given example will also involve adding a node to another list. We are preparing to evaluate the value that worked-out examples bring to the tutoring of linked lists within our system by deploying ChiQat-Tutor in an undergraduate Computer Science data structures course. Students will be split into two conditions: half of the group will use ChiQat-Tutor with worked-out examples enabled, and the other half without. Students will take a pre-test before using the system and a post-test after an hour of using it. Learning gains will be calculated from the pre/post-tests. Additionally, ChiQat-Tutor will collect a detailed log of the sessions. From these logs we will analyse how the use of the system, and of worked-out examples in particular, affects learning gains.
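As a rough illustration of how learning gains could be derived from the pre/post-tests, the sketch below computes a raw gain (post-test minus pre-test) and averages it per condition. The abstract does not say which gain formula is used, so the raw difference, the condition names and the scores are all assumptions made up for this example.

```python
from statistics import mean


def learning_gain(pre, post):
    """Raw learning gain: post-test score minus pre-test score (assumed formula)."""
    return post - pre


def mean_gain_by_condition(records):
    """Average raw gain per condition.

    `records` is an iterable of (condition, pre_score, post_score) tuples.
    """
    gains = {}
    for condition, pre, post in records:
        gains.setdefault(condition, []).append(learning_gain(pre, post))
    return {condition: mean(values) for condition, values in gains.items()}


# Hypothetical percentage scores, for illustration only (not study data).
sample = [
    ("examples_on", 40, 70),
    ("examples_on", 50, 60),
    ("examples_off", 45, 55),
    ("examples_off", 30, 50),
]
result = mean_gain_by_condition(sample)
print(result)
```

Comparing the two averages is only a first look; a real analysis would also test whether the difference between conditions is statistically significant.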



ChiQat Tutor – Using Worked-Out Examples in an Intelligent Tutoring System
Nick Green1, Barbara Di Eugenio1, Davide Fossati2, Omar AlZoubi2, and Lin Chen1
1 Computer Science, University of Illinois at Chicago, Chicago, IL, USA
2 Computer Science, Carnegie Mellon University in Qatar, Doha, Qatar

Goals

Worked-Out Examples (WOE) demonstrate a step-by-step solution of a problem [4]. Using previous analysis [2] on tutoring sessions, we are developing an ITS, ChiQat-Tutor [1], to use this strategy.

Our questions:
• Which conditions trigger WOEs?
• How do tutors structure WOEs?
• How effective can WOEs be?

ChiQat Tutor

ChiQat is a tutoring system offering lessons on linked lists, binary search trees and recursion [1]. In each lesson, students are presented with problems that need to be solved. ChiQat can help students solve the problem using proactive and reactive hints.

[Figure: ChiQat – Linked List Tutorial]

Worked-Out Examples in ChiQat

Worked-out examples have been included in ChiQat. Examples are created by a human tutor within a custom-built editor.

Worked-out examples define actions that the intelligent tutor can take while playing the example:
• Writing messages describing a step
• Graphically showing a point of interest
• How to progress onto the next step of the example

[Figure: Worked-Out Example Editor]

A Worked-Out Example in Action

The student can request an example by clicking on the ‘Example’ button in the interface. An example will then be executed based on the problem which they are studying. The example can be related to the problem, yet different enough to discourage cheating.

[Figure panels: a. Writing messages describing a step; b. Graphically showing a point of interest; c. How to progress onto the next step of the example]

The Experiment

Two experiments were conducted with undergraduate computer science students enrolled in a Data Structures course at the University of Illinois at Chicago (UIC):
• Participants: a total of 133 students were split into two groups; one used ChiQat with WOE enabled, the other had it disabled.
• Protocol: All students completed a pre-test, followed by using the system for 40 minutes with no guidance from researchers. The session concluded with a post-test.

The pre/post-test marks will be analysed, along with the individual student log data, to get some insight into how the strategy was used and whether there were any differing outcomes between the two groups.

Data Acquisition

All ChiQat lessons utilise a common logging system, whereby student and tutor actions are recorded. Such actions include:
• Lesson and problem events, e.g. a problem has been solved
• Instructions and prompts given by the tutor
• All user interaction, such as button clicks

Log data can give additional insights into students' behaviour in using the system.

Acknowledgements

This work is supported by award NPRP 5-939-1-155 from the Qatar National Research Fund.

References

[1] AlZoubi, O., Green, N., Fossati, D., and Di Eugenio, B. ChiQat: An intelligent tutoring system for learning computer science. Qatar Foundation Annual Research Conference, November 2013.
[2] Di Eugenio, B., Chen, L., Green, N., Fossati, D., and AlZoubi, O. Worked Out Examples in Computer Science Tutoring. In AIED 2013, 16th International Conference on Artificial Intelligence in Education. Memphis, TN, July 2013.
[3] Pirolli, P. L., and Anderson, J. R. The role of learning from examples in the acquisition of recursive programming skills. Canadian Journal of Psychology/Revue canadienne de psychologie, 39(2), 240.
[4] Sweller, J. The worked example effect and human cognition. Learning and Instruction, 16(2), 165-169.
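The three kinds of actions that a worked-out example can contain (messages, points of interest, and steps) could be modelled as a simple sequence that a player walks through. The sketch below is only an illustration of that idea on a linked list: the `Action` class, the `play` function and the example content are hypothetical and are not taken from the ChiQat implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    """A singly linked list node."""
    value: int
    next: Optional["Node"] = None


def to_list(head):
    """Collect the values of a linked list into a Python list."""
    values = []
    while head is not None:
        values.append(head.value)
        head = head.next
    return values


@dataclass
class Action:
    """One action of a worked-out example (hypothetical representation)."""
    kind: str                                    # "message", "point" or "step"
    text: str                                    # what the tutor says or highlights
    run: Optional[Callable[[dict], None]] = None  # only used by "step" actions


def play(actions, state):
    """Play the actions in order, executing steps, and return a transcript."""
    transcript = []
    for action in actions:
        if action.kind == "step" and action.run is not None:
            action.run(state)
        transcript.append(f"[{action.kind}] {action.text}")
    return transcript


def link_new_node(state):
    """Insert a node holding 2 right after the first node of the list."""
    first = state["head"]
    first.next = Node(2, first.next)


state = {"head": Node(1, Node(3))}
example = [
    Action("message", "We want to insert 2 after the first node."),
    Action("point", "node 1"),
    Action("step", "Create node 2 and link it after node 1.", link_new_node),
]
transcript = play(example, state)
print(transcript)
print(to_list(state["head"]))
```

Keeping the example as plain data makes it easy for an editor to author it and for a logging system to record which action was shown at which time.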


CoMingle: Distributed Logic Programming for Decentralized Android Applications Authors Edmund S. L. Lam, Ph.D., Iliano Cervesato, Ph.D.

Category Postgraduate Faculty Advisor Divakaran Liginlal, Ph.D.

Abstract CoMingle is a logic programming framework aimed at simplifying the development of applications distributed over multiple mobile devices. Applications are written as a single declarative program (in a system-centric way) rather than in the traditional node-centric manner, where separate communicating code is written for each participating node. CoMingle is based on committed-choice multiset rewriting and is founded on linear logic. We describe a prototype targeting the Android operating system and illustrate how CoMingle is used to program distributed mobile applications. As a proof of concept, we discuss several such applications orchestrated using CoMingle.



CoMingle: Distributed Logic Programming for Decentralized Android Applications
Edmund S. L. Lam and Iliano Cervesato
Carnegie Mellon University, Qatar

1. Distributed Programming

Computations that run at more than one place at once

Now more popular than ever
- Cloud computing
- Modern webapps
- Mobile device applications

Hard to get right
- Concurrency bugs (race conditions, deadlocks, . . . )
- Communication bugs (type mismatch between sender and receiver, . . . )
- Traditional programming woes (non-correctness, non-termination, . . . )

Two views
- Node-centric: program each node separately
- System-centric: program the distributed system as a whole (compiled to node-centric code)

2. What is CoMingle?

A programming language for distributed mobile apps that is
- Declarative
- High-level and concise
- Based on linear logic

Enables high-level system-centric abstraction
- Specifies distributed computations as ONE declarative program
- Compiles into node-centric fragments, executed by each node

Designed for mobile apps that run across Android devices

Programming in CoMingle:
- Is NOT an alternative to existing frameworks
- It augments existing mobile development platform (Java + Android SDK)

3. CoMingle Program by Example

    module comingle.lib.ExtLib import {
        size :: A -> int.
    }
    predicate swap    :: (loc,int) -> trigger.
    predicate item    :: int -> fact.
    predicate display :: (string,A) -> actuator.

    rule pivotSwap :: [X]swap(Y,P),
                      {[X]item(D)|D->Xs. D >= P},
                      {[Y]item(D)|D->Ys. D <= P}
                        --o [X]display(Msg,size(Ys),Y), {[X]item(D)|D<-Ys},
                            [Y]display(Msg,size(Xs),X), {[Y]item(D)|D<-Xs}
                        where Msg = "Received %s items from %s".

4. CoMingle Runtime by Example

Let s = swap, i = item and d = display

Initial state
    Node n1: s(n2, 5), i(4), i(6), i(8)
    Node n2: i(3), i(20)
    Node n3: s(n2, 10), i(18)

Step 1 (n1 swaps with n2)
    Consumed: [n1] s(n2,5), i(6), i(8)    [n2] i(3)
    Produced: [n1] d("1 from n2"), i(3)   [n2] d("2 from n1"), i(6), i(8)

State after step 1
    Node n1: d("1 from n2"), i(3), i(4)
    Node n2: d("2 from n1"), i(6), i(8), i(20)
    Node n3: s(n2, 10), i(18)

Step 2 (n3 swaps with n2)
    Consumed: [n3] s(n2,10), i(18)             [n2] i(6), i(8)
    Produced: [n3] d("2 from n2"), i(6), i(8)  [n2] d("1 from n3"), i(18)

State after step 2
    Node n1: d("1 from n2"), i(4), i(3)
    Node n2: d("2 from n1"), d("1 from n3"), i(18), i(20)
    Node n3: d("2 from n2"), i(6), i(8)

5. CoMingle Architecture

Application Runtime:
- “Main” Android app (Java + Android SDK)
- Performs local operations (e.g., UI, local computations)

Rewrite Runtime:
- Executes CoMingle programs
- Orchestrates communication between Android devices

6. Compilation of CoMingle programs

System-centric specification
- High-level, concise
- Allows distributed events

↓ Choreographic Transformation

Node-centric specification
- Match facts within a node
- Handles lower-level concurrency
- Synchronization
- Progress
- Atomicity and Isolation

↓ Imperative Compilation

Low-level imperative compilation
- Java code
- Low-level network calls
- Operationalize multiset rewriting
- Trigger and actuator interfaces (to/from Java + Android SDK)

7. Example App Implemented with CoMingle: Drag Racing

- Inspired by Chrome Racer (www.chrome.com/racer)
- Race across a group of mobile devices
- Decentralized communication (over Wifi-Direct)

    rule init :: [I]initRace(Ls)
                   --o {[A]next(B)|(A,B)<-Cs}, [E]last(),
                       {[I]has(P), [P]all(Ps), [P]at(I), [P]rendTrack(Ls) | P<-Ps}
                   where (Cs,E) = makeChain(I,Ls), Ps = list2mset(Ls).

    rule start :: [X]all(Ps) \ [X]startRace() --o {[P]release()|P<-Ps}.

    rule tap :: [X]at(Y) \ [X]sendTap() --o [Y]recvTap(X).

    rule trans :: [X]next(Z) \ [X]exiting(Y), [Y]at(X) --o [Z]has(Y), [Y]at(Z).

    rule win :: [X]last() \ [X]all(Ps), [X]exiting(Y) --o {[P]decWinner(Y) | P <- Ps}.

+ 862 lines of properly indented Java code
- 700++ lines of local operations (e.g., display and UI operations)
- < 100 lines for initializing CoMingle runtime

8. Current Status

Prototype available at https://github.com/sllam/comingle

Proof-of-concept apps
- Drag Racing
- Battleships
- P2P Wifi-Direct Directory
- Swarbble

Current team:
- Compiler & Language Architects: Edmund Lam and Iliano Cervesato
- Android App Developers: Ali Elgazar and Zeeshan Hanif
- Alumni Developer: Nabeeha Fatima

http://www.qatar.cmu.edu/~sllam/

∗ This work will appear in proceedings of International Federated Conference on Distributed Computing Techniques (Coordination’2015)
∗ Funded by the Qatar National Research Fund as project JSREP 4-003-2-001 (Effective Parallel and Distributed Programming via Join Pattern with Guards, Propagation)

sllam@qatar.cmu.edu
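To make the effect of the pivotSwap rule (Section 3) and its runtime trace (Section 4) easier to follow, here is a small single-process simulation in Python. It is a sketch only: the `pivot_swap` function and the dictionary layout are invented for illustration, and it does not reflect how the CoMingle compiler or runtime actually distribute or synchronize the rewriting across devices.

```python
from collections import Counter


def pivot_swap(nodes, x, y, pivot):
    """Simulate one pivotSwap firing triggered by swap(y, pivot) at node x.

    Node x gives away its items >= pivot, node y gives away its items <= pivot,
    and each node records a display message counting what it received.
    """
    xs = [d for d in nodes[x]["items"].elements() if d >= pivot]
    ys = [d for d in nodes[y]["items"].elements() if d <= pivot]
    # Consume the matched items (the rule's left-hand side) ...
    nodes[x]["items"] -= Counter(xs)
    nodes[y]["items"] -= Counter(ys)
    # ... and produce the swapped items and messages (the right-hand side).
    nodes[x]["items"] += Counter(ys)
    nodes[y]["items"] += Counter(xs)
    nodes[x]["display"].append(f"{len(ys)} from {y}")
    nodes[y]["display"].append(f"{len(xs)} from {x}")


# Initial state from the poster's runtime example.
nodes = {
    "n1": {"items": Counter([4, 6, 8]), "display": []},
    "n2": {"items": Counter([3, 20]), "display": []},
    "n3": {"items": Counter([18]), "display": []},
}

pivot_swap(nodes, "n1", "n2", 5)    # trigger s(n2, 5) at n1
pivot_swap(nodes, "n3", "n2", 10)   # trigger s(n2, 10) at n3

for name, node in nodes.items():
    print(name, sorted(node["items"].elements()), node["display"])
```

The multiset of items is modelled with `Counter` so that duplicate values would be handled correctly; the final state printed here matches the last row of the trace in Section 4.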


Correction Annotation for Non-Native Arabic Texts in the Qatar Arabic Language Bank Authors Wajdi Zaghouani, Ph.D., Nizar Habash, Ph.D. (New York University in Abu Dhabi), Houda Bouamor, Ph.D., Alla Rozovskaya, Ph.D. (Columbia University), Ossama Obeid, Kemal Oflazer, Ph.D.

Category Postgraduate Faculty Advisor Divakaran Liginlal, Ph.D.

Abstract The Qatar Arabic Language Bank (QALB) is a corpus of naturally written, unedited Arabic and its manually edited corrections. QALB has about 1.5 million words of text written and post-edited by native speakers. The corpus was the focus of a shared task on automatic spelling correction in the Arabic Natural Language Processing Workshop, held in conjunction with the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) in Doha, with nine research teams from around the world competing. In this poster we discuss some of the challenges of extending QALB to include non-native Arabic text. Our overarching goal is to use QALB data to develop components for automatic detection and correction of language errors that can be used to help Standard Arabic learners (native and non-native) improve the quality of the Arabic text they produce. The QALB annotation guidelines have focused on native speaker text. Learners of Arabic as a second language (L2 speakers) typically have to adapt to a different script and a different vocabulary with new grammatical rules. These factors contribute to the propagation of errors made by L2 speakers that are of a different nature than those produced by native speakers (L1 speakers), who are mostly affected by their dialects, levels of education and use of Standard Arabic. Our extended L2 guidelines build on our L1 guidelines with a focus on the types of errors usually found in the L2 writing style and how to deal with problematic ambiguous cases. Annotated examples are provided in the guidelines to illustrate the various annotation rules and their exceptions. As with the L1 guidelines, the L2 texts should be corrected with a minimum number of edits that produce semantically coherent (accurate) and grammatically correct (fluent) Arabic. The guidelines also define a priority order for corrections that prefers less intrusive edits, starting with inflection, then cliticization, derivation, preposition correction, word choice correction, and finally word insertion. This project is supported by the National Priority Research Program (NPRP grant 4-1058-1-168) of the Qatar National Research Fund (a member of the Qatar Foundation). The statements made herein are solely the responsibility of the authors.
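The priority order described above can be read as a simple ranking over edit types. The sketch below shows one way to pick the least intrusive of several candidate corrections. In QALB this choice is made by human annotators following the guidelines; the `PRIORITY` list, the type labels and the `least_intrusive` function are hypothetical names introduced only for this example.

```python
# Edit types from least to most intrusive, following the order in the abstract.
PRIORITY = [
    "inflection",
    "cliticization",
    "derivation",
    "preposition",
    "word_choice",
    "word_insertion",
]


def least_intrusive(candidates):
    """Return the candidate whose edit type comes first in PRIORITY.

    `candidates` is a list of (edit_type, corrected_text) pairs.
    An unknown edit type raises ValueError.
    """
    return min(candidates, key=lambda candidate: PRIORITY.index(candidate[0]))


# Placeholder candidates; the texts stand in for alternative corrections.
candidates = [
    ("word_choice", "correction A"),
    ("derivation", "correction B"),
    ("preposition", "correction C"),
]
best = least_intrusive(candidates)
print(best)
```

When two candidates share the same edit type, `min` keeps the first one listed, so any finer tie-breaking would have to be added separately.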



Correction Annotation for Non-Native Arabic Texts in the Qatar Arabic Language Bank
Wajdi Zaghouani1, Nizar Habash2, Houda Bouamor1, Alla Rozovskaya3, Ossama Obeid1 and Kemal Oflazer1
1 Carnegie Mellon University Qatar
2 New York University Abu Dhabi
3 Columbia University

http://nlp.qatar.cmu.edu/qalb

The Context

QALB: Qatar Arabic Language Bank
• 2 million words corpus of Arabic language errors with manually annotated corrections.
• Essays of non-native writers (L2 speakers).
• Annotation guidelines for native Arabic speakers text.

ACLE: Automatic Correction of Language Errors
• Models of automatic correction of Arabic language errors
• Unsupervised
• Supervised models

The Problem

Learners of Arabic as a second language (L2 speakers) have to adapt to a different script and a different vocabulary with new grammatical rules.

The Solution

QALB L2 Annotation Guidelines
• Addition to the main set of guidelines
• Address unique aspects of the L2 error types.
• Errors corrected with a minimum number of edits

Challenges
• Not easy to strictly apply the guidelines to the L2 texts as there are sometimes many ways to correct the text.
• L2 texts include a lot of unclear or unknown words.

L2 Guidelines Correction Order

1. Inflection correction

The annotator should try first to correct the error by limiting the change to the inflectional level.

Original: كنت قد بدأنا في العام الماضي رحلة إلى مكة.
Corrected: كنت قد بدأت في العام الماضي رحلة إلى مكة.

2. Cliticization correction

The annotator should then try to correct the wrong word by adding or changing the clitics.

Original: ذكريات هذه الرحلة لا أنسى في حياتي أبدا.
Corrected: ذكريات هذه الرحلة لا أنساها في حياتي أبدا.

3. Derivation correction

Correct the wrong word by only changing the derivational morphology and keeping the same root.

Original: وقفت الحافلة عند الميقات ونزلنا منها وغسلنا ولبسنا ملابس الإحرام.
Corrected: وقفت الحافلة عند الميقات ونزلنا منها واغتسلنا ولبسنا ملابس الإحرام.

4. Preposition correction (or insertion)

Add a missing preposition to correct the meaning or the word order.

Original: لقد ذهبنا الحج هذا العام.
Corrected: لقد ذهبنا إلى الحج هذا العام.

5. Wrong word choice correction

The annotator may have to completely change a word to another.

Original: سأضع المرآة لكي أقرأ الكتاب.
Corrected: سأضع النظارات لكي أقرأ الكتاب.

Evaluation

• 10 files (1,628 words) with 3 annotators.

Inter Annotator Agreement
Average Word Error Rate       19.45%
Average Pairwise Agreement    90.48%
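For readers unfamiliar with word error rate (WER), the sketch below computes a word-level edit distance between two annotators' corrected texts and averages the resulting WER over all annotator pairs. This is the standard textbook definition of WER; whether the figures above were computed in exactly this way is an assumption, and the sample strings are placeholders rather than QALB data.

```python
from itertools import combinations


def word_edit_distance(ref, hyp):
    """Levenshtein distance between two lists of words."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,                          # deletion
                cur[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word)  # substitution or match
            ))
        prev = cur
    return prev[-1]


def wer(ref_text, hyp_text):
    """Word error rate of hyp_text against ref_text (ref_text must be non-empty)."""
    ref, hyp = ref_text.split(), hyp_text.split()
    return word_edit_distance(ref, hyp) / len(ref)


def average_pairwise_wer(annotations):
    """Mean WER over all pairs, using the first text of each pair as reference."""
    pairs = list(combinations(annotations, 2))
    return sum(wer(a, b) for a, b in pairs) / len(pairs)


# Placeholder outputs of three annotators for the same source sentence.
annotations = ["a b c d", "a x c d", "a b c d"]
print(average_pairwise_wer(annotations))
```

WER is not symmetric when the two texts differ in length, so a careful evaluation would fix which annotator acts as the reference or average both directions.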

Acknowledgment: This research was supported by Qatar National Research Fund (QNRF), NPRP grant 4-1058-1-168


Reasoning with Relations – How do people do? Authors Shikhar Kumar Ph.D. (University of Arizona), Iliano Cervesato, Ph.D., Askerali Maruthullathil, Hau-yu Wong (CMU), Cleotilde Gonzalez, Ph.D. (CMU)

Category Postgraduate Faculty Advisor Divakaran Liginlal, Ph.D.

Abstract The goal of this work is to study how people perform relational reasoning, such as selecting the grades of all students in a class with a GPA (Grade Point Average) greater than 3.5. The literature on the psychology of human reasoning offers little insight into how people solve relational problems. We present two studies that look at human performance on relational problems that use basic relational operators. Our results present the first evidence of the role of problem complexity in performance, as determined by accuracy and discrimination rates. We also look at the role of familiarity with tabular representations of information, as found in spreadsheets for example, and of other factors in relational reasoning, and show that familiarity does not play a significant role in determining performance in relational problem solving, which we found counterintuitive. The two studies explore subjects' ability to carry out the most basic forms of relational inference: projection, union, difference and join. Participants were recruited on Amazon Mechanical Turk (mTurk) and received $0.50 for participation, with no bonus for performance. Every subject was assigned one of four problems at random and was given a set of tables with sample data and a list of 8 randomized steps; the task was to select the 4 correct steps out of the 8 and arrange them in the correct logical order. Experiment 1 was run with basic relational operations, and experiment 2 with combinations of relational operations such as projection, union, difference and join. After the experiment, participants were asked to report the difficulty level of their problem and their confidence in the solution on 7-point scales (1 = not difficult/not confident, 7 = very difficult/very confident), and to answer a questionnaire of 10 questions gauging their familiarity with spreadsheets, computer programs, mathematics, logical reasoning and problem solving (1 = no familiarity, 7 = high familiarity). The conclusions of the studies are as follows: 1. The two studies looked at human performance in solving relational problems composed of basic relational operations; 2. Basic relational operations differed in complexity and led to different accuracy and discrimination across problems; 3. Problems with high accuracy have high discrimination; 4. Discrimination rate is positively correlated with subject familiarity only for the join operation.



Reasoning with Relations – How do people do?
Shikhar Kumar, Iliano Cervesato, Askerali Maruthullathil, Hau-yu Wong & Cleotilde Gonzalez

Abstract
• Relational reasoning operations have different complexity and utilize different cognitive processing
• This research describes two relational problem solving experiments where we looked at human performance in relational problems, involving elementary relational operations and combinations of such operations
• Operation complexity impacts performance
• Familiarity doesn’t play a significant role

Background
• Relational reasoning problems have only been studied from the perspective of computer science; here we are taking a psychological perspective
• We aim to find the role of relational problem complexity and of subjects' domain familiarity in solving relational problems

What is relational reasoning?

A relation can be visualized as a table consisting of rows and columns, with no duplicate rows. Each column, or attribute, holds data with a consistent meaning. Each row, or record, contains specific data in the relation, for example the name, grade, and major of a specific student in a class. Relational inference computes new relations on the basis of relations we already know, for example the students with a GPA greater than 3.5 together with their major. Every relational inference can be expressed as a combination of union, projection, selection, Cartesian product, difference and recursion.

Student
Name     Age    GPA
John     22     3.5
Jerry    23     4.2

Address
Name     Address
John     New York
Jerry    Pittsburg

Projection deletes some attributes/columns from a relation (and removes any duplicate record that may ensue). For example, a professor may need to make a list of student names and their respective grades for some exam. List Grade A students.

Projection – Name, Age
Name     Age
John     22
Jerry    23

Union combines two (or more) relations with the same attributes into a single relation. For example, the professor may have two grade sheets, one for each section of the same class, and may need to look at the grades of all the students in the class.

Join: given two sets of records with a common attribute, join combines the records that share the same value for this attribute. For example, if a professor has a list of students and the classes they take and another list of students and the sports they play, she may need the list of all students with their respective classes and sports.

Join – Student, Address
Name     Age    Address
John     22     New York
Jerry    23     Pittsburg

Difference retains the records that are in one relation but not in a second one. Both relations should have the exact same attributes. For example, a professor with separate grade sheets for the two sections of her class may want to examine the performance of the students coming to the morning section only (knowing that some students attend both the morning and the afternoon section).

Experiments
• Two studies explore subject ability to carry out the most basic forms of relational inference: projection, union, difference and join
• Participants were recruited on Amazon Mechanical Turk (mTurk): $0.50 for participation with no bonus for performance
• Assigned one of four problems at random
• Provided set of tables with sample data and list of 8 randomized steps
• Select 4 correct steps out of 8 and organize in correct logical order [1-4]
• Discrimination and accuracy are performance measures
• Experiment 1: basic relational operations
• Experiment 2: combination of relational operations

After the experiment
• Participants were asked to report the difficulty level of their problem and their confidence in the solution (1 = “Not Difficult/Not Confident”, 7 = “Very Difficult/Very Confident”)
• Given a questionnaire with 10 questions to gauge their familiarity with spreadsheets, computer programs, mathematics, logical reasoning and problem solving (1 = no familiarity, 7 = high familiarity)

Results

Experiment 1
• N = 398 (57.25% female, mean age = 33.57)
• Experiment 1: elementary relational operations

Experiment 2
• N = 403 (42% female, mean age = 34.45)
• Experiment 2: combination of relational operations

[Figures: bar charts titled "Comparison of Accuracy rates across different problem" and "Comparison of Discrimination rates across different problem", one pair per experiment. Vertical axes: Accuracy Rate and Discrimination Rate. Horizontal axes: Projection, Union, Join, Difference (Experiment 1) and PI, PII, PIII, PIV (Experiment 2). Data labels that survive from the charts: 63.59, 51.88, 42.53 and 29.08; their assignment to individual bars is not recoverable.]

Acknowledgements

This paper was made possible by grant NPRP 4-341-1059, Usable automated data inference for end-users, from the Qatar National Research Fund (a member of the Qatar Foundation). The statements made herein are solely the responsibility of the authors.
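The four relational operations studied in the experiments (projection, union, difference and join) can be written down compactly in code. The following is a minimal sketch that represents a relation as a list of dictionaries and reuses the Student and Address tables from the poster; the function names and the grade-sheet data are illustrative assumptions and are not part of the study materials.

```python
def _row_key(row):
    """Hashable, order-independent key for a row (attribute/value pairs)."""
    return tuple(sorted(row.items()))


def _dedupe(relation):
    """Remove duplicate rows while keeping the original order."""
    seen, result = set(), []
    for row in relation:
        key = _row_key(row)
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


def project(relation, attributes):
    """Keep only the given attributes and drop duplicate rows that ensue."""
    return _dedupe([{a: row[a] for a in attributes} for row in relation])


def union(r, s):
    """All rows that appear in r or in s (same attributes assumed)."""
    return _dedupe(r + s)


def difference(r, s):
    """Rows of r that do not appear in s (same attributes assumed)."""
    s_keys = {_row_key(row) for row in s}
    return [row for row in _dedupe(r) if _row_key(row) not in s_keys]


def join(r, s, attribute):
    """Combine rows of r and s that share the same value for `attribute`."""
    return [{**a, **b} for a in r for b in s if a[attribute] == b[attribute]]


student = [
    {"Name": "John", "Age": 22, "GPA": 3.5},
    {"Name": "Jerry", "Age": 23, "GPA": 4.2},
]
address = [
    {"Name": "John", "Address": "New York"},
    {"Name": "Jerry", "Address": "Pittsburg"},
]

names_and_ages = project(student, ["Name", "Age"])
student_address = project(join(student, address, "Name"), ["Name", "Age", "Address"])

# Made-up grade sheets for two sections of the same class.
morning = [{"Name": "John", "Grade": "A"}]
afternoon = [{"Name": "John", "Grade": "A"}, {"Name": "Jerry", "Grade": "B"}]
all_grades = union(morning, afternoon)
afternoon_only = difference(afternoon, morning)

print(names_and_ages)
print(student_address)
print(all_grades)
print(afternoon_only)
```

The poster's join table shows only Name, Age and Address, so the sketch follows the join with a projection to reproduce it.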


As a global leader in education, Carnegie Mellon University is known for its creativity, collaboration across disciplines, and top programs in business, technology and the arts. The university has been home to some of the world’s most important thinkers, among them 19 Nobel Laureates and 12 Turing Award winners. In 2004, Qatar Foundation invited Carnegie Mellon to join Education City, a groundbreaking center for scholarship and research. The campus continues to grow, now providing a prestigious education to more than 400 students from 40 countries. The university offers five undergraduate degree programs in Biological Sciences, Business Administration, Computational Biology, Computer Science and Information Systems. Students in Qatar join more than 13,000 Carnegie Mellon students across the globe, who will become the next generation of leaders tackling tomorrow’s challenges. The university’s 98,000 alumni are recruited by some of the world’s most innovative organizations. To learn more, visit www.qatar.cmu.edu and follow us on Twitter @CarnegieMellonQ.



