D5.5 - Advanced incentive design and decision making: specification, implementation, validation


SmartSociety - Hybrid and Diversity-Aware Collective Adaptive Systems: When People Meet Machines to Build a Smarter Society

Grant Agreement No.: 600854
Deliverable: D5.5
Work package: WP5
Deliverable title: Advanced incentive design and decision making: specification, implementation, validation
Dissemination level (Confidentiality)1: PU
Delivery date in Annex I: 30/3/2016
Actual delivery date:
Status2: F
Total number of pages:
Keywords: Incentive design, decision-making, online mechanism

1 PU: Public; RE: Restricted to Group; PP: Restricted to Programme; CO: Consortium Confidential as specified in the Grant Agreement
2 F: Final; D: Draft; RD: Revised Draft


Disclaimer This document contains material, which is the copyright of SmartSociety Consortium parties, and no copying or distributing, in any form or by any means, is allowed without the prior written agreement of the owner of the property rights. The commercial use of any information contained in this document may require a license from the proprietor of that information. Neither the SmartSociety Consortium as a whole, nor a certain party of the SmartSociety Consortium warrant that the information contained in this document is suitable for use, nor that the use of the information is free from risk, and accepts no liability for loss or damage suffered by any person using this information. This document reflects only the authors’ view. The European Community is not liable for any use that may be made of the information contained herein.

Full project title: SmartSociety - Hybrid and Diversity-Aware Collective Adaptive Systems: When People Meet Machines to Build a Smarter Society
Project Acronym: SmartSociety
Grant Agreement Number: 600854
Number and title of work package: 5 - Incentive Design and Decision-Making Strategies
Document title: Advanced incentive design and decision making: specification, implementation, validation
Work-package leader: Kobi Gal, BGU
Deliverable owner: Kobi Gal
Quality Assessor: Mark Hartswood, OXF


List of contributors

Partner Acronym    Contributor
BGU                Avi Segal
BGU                Kobi Gal


Executive summary

Deliverable 5.5 of WP5 focuses on advanced incentive design and decision-making: specification, implementation and validation. Deliverable 5.4 focused on the implementation of an incentive and decision-making framework in CAS. As part of that work we built an Incentive Server framework that serves multiple applications, analyzing their user behavior data, recommending incentives for their users and adapting its recommendations over time. It therefore includes mechanisms for defining incentive types and messages, for receiving and storing behavioral data, for applying different algorithms to these data, for deciding on conditions that require interventions, and for recommending interventions to the applications that use it. We then used this framework to model and develop a solution for real-time disengagement handling in citizen science. To this end we developed a prediction model for disengagement in citizen science and implemented several incentive strategies based on Self-Determination Theory (SDT), a general theory of motivation that seeks to systematically explicate the dynamics of human needs, motivation, and well-being within the immediate social context. At the completion of deliverable 5.4 we were awaiting approval from the Citizen Science team to start the intervention3 experiments.

In this deliverable we report on the deployment, execution and testing of the citizen science intervention platform in a large-scale system. We present a general methodology for exploring and extending engagement in citizen science by combining machine learning with intervention design. We first describe in detail the platform for using real-time predictions about forthcoming disengagement to guide interventions. We then focus on a set of experiments that deliver messages to users based on the expected proximity to the time of disengagement. The messages address motivational factors that prior studies have found to influence users' engagement. These approaches are evaluated on Galaxy Zoo, the largest citizen science application on the web, where we traced the behavior and contributions of thousands of users who received intervention messages over a period of several months. We found that the amount of user contributions is sensitive to both the timing and the nature of the message. Specifically, a message emphasizing the helpfulness of individual users significantly increased users' contributions when delivered according to predicted times of disengagement, but not when delivered at random times. The influence of the message on users' contributions became more pronounced as additional user data was collected and made available to the classifier.

Following these large-scale experiments we moved on to develop more principled approaches to intervention design. For this we used our experimental data together with a sequential decision-making approach, combined with off-policy policy evaluation, to devise optimal strategies for multi-message interventions. Finally, we analysed our data for diversity-aware biases, and present initial findings pointing to such effects.

Our work has thus resulted in a working large-scale incentive server platform that was tested, both as an implementation and as a research instrument, in a real-world large-scale system. We combined ML and AI strategies with incentive design and demonstrated the joint effect of intervention timing and intervention content on user behaviour. These results were achieved while considering and working closely with the "humans in the loop", a core value and goal of the FP7 project we are part of. Our diversity-aware analysis emphasizes the importance of considering different biases in incentive design and stresses the need for further research and projects to this end. Appendix B describes the architecture and API specification of the incentive server developed for the project. A paper based on this work was recently accepted for publication at IJCAI 2016.

3 Note: in this document we use the terms "intervention" and "incentive" interchangeably.


Table of Contents

1 Introduction ......................................................... 5
2 Problem and Approach ................................................. 7
3 Predicting Disengagement ............................................. 9
4 Designing Intervention Strategies .................................... 10
5 Empirical Studies .................................................... 11
6 Sequential Planning for Multi-Step Interventions ..................... 16
7 Diversity Analysis ................................................... 18
8 Conclusion ........................................................... 20
Appendix A: Related Work ............................................... 21
Appendix B: Intervention Server's Architecture and REST API ............ 21
Appendix C: Intervention Dimensions .................................... 30

1 Introduction

Volunteer crowdsourcing projects such as citizen science efforts rely on volunteers, rather than paid workers, to perform tasks. For example, Zooniverse, the world's most popular platform for citizen science, engages millions of active volunteers who accelerate scientific discovery by analyzing data as individuals and as groups [Simpson et al., 2014]. User contributions in crowdsourcing platforms typically follow a power-law distribution, where the majority of participants make very few contributions [Preece and Shneiderman, 2009]. Thus, extending and sustaining user engagement in crowdsourcing platforms is an active and important area of research [Eveleigh et al., 2014, Segal et al., 2015].

Paid crowdsourcing platforms such as Amazon Mechanical Turk or CrowdFlower use monetary incentives to motivate workers to stay active in the system and to make high-quality contributions [Ipeirotis, 2010, Horton and Chilton, 2010]. In contrast, volunteer-based platforms rely on intrinsically motivated participants who do not get paid for their contributions. While some volunteers become recurrent contributors, the vast majority of volunteer participants are "dabblers" [Eveleigh et al., 2014], who make a small number of contributions before disengaging and never returning. Despite this casual and non-committed participation pattern, dabblers contribute a substantial amount of the overall effort in these platforms. Even a small increase in the contribution rates of these users can lead to a significant improvement in productivity for the platform.

We report on a study of intervention strategies aimed at increasing engagement in volunteer-based crowdsourcing. We employ machine learning to build predictive models that can be used in real time to identify users who are at risk of disengaging soon from the system. We describe a real-time platform that uses predictions of disengagement to guide the delivery of messages designed to increase users' motivation to contribute. The approach addresses two key challenges of intervention design. First, we evaluate different intervention strategies with respect to balancing the potential disruption of delivering messages against the benefits of the expected improvement in users' productivity. Second, we explore the use of predictive models to identify the best time to intervene: intervening too early may not address the loss of motivation that precedes disengagement, while intervening too late may miss participants who disengaged before this time.



Figure 1: Distribution over contributions per user in the Galaxy Zoo project, in number of classifications.

We evaluated our approach on Galaxy Zoo, one of the largest citizen science projects on the web, in which volunteers provide classifications of celestial bodies. Galaxy Zoo has attracted over 120,000 users who have classified over 20 million galaxies since the inception of the project's fourth version in 2012. However, over 30% of users complete fewer than 10 classifications (see Figure 1) and most users do not come back for a second session.

We used three types of motivational messages that were designed to address different motivational issues identified in citizen science by prior work. These messages emphasized how users contribute to the project, their sense of community, or the tolerance of the project to individuals' potential mistakes. We compared the effects of these different messages on users' behavior in the system when guided by policies that use predictions about forthcoming disengagement versus when delivered at random times. Our experiments were designed collaboratively with Galaxy Zoo leads and administrators. We found that the message emphasizing the value of users' individual contributions to the platform, when timed based on disengagement predictions, significantly increased the contribution of users without decreasing the quality of their work or increasing the amount of time spent in the system. Using the same message, but displaying it at random times, was not effective. A follow-up study compared the effects of using different thresholds on the likelihood of forthcoming disengagement of users to initiate intervention messages, which consequently influenced when users received interventions.

Following these experiments, we use a sequential decision-making approach to learn optimal policies for multi-step interventions in CAS. For this we use the data gathered in our experiments and apply a model-based approach combined with off-policy policy evaluation to learn and evaluate the best policies. Lastly, we take a first look at intervention diversity biases, to inform next-step experiments and projects with enhanced diversity awareness.

We make four contributions. First, we provide a methodology and platform for investigating and controlling influences on engagement in crowdsourcing, aimed at extending engagement by coupling machine learning and inference with intervention design. Second, we report on a set of experiments with message content and timing that explores the efficacy of alternate strategies when applied to a live volunteer-based crowdsourcing system. Third, we show that it is important to consider both message content and the timing of the delivery of messages. Fourth, we use the collected data to move to a multi-step intervention strategy, informed by off-policy policy evaluation practices.


2 Problem and Approach

The problem of if, when, and how to generate interventions in order to increase users' engagement in volunteer-based crowdsourcing can be formalized as sequential decision-making under uncertainty. With an intervention strategy centering on the use of real-time messaging, we need to balance the potential distraction of presenting an intervention message to the user against the expected short- and long-term benefit to their contributions. The state in the problem formalization is a tuple that represents a user's interactions with the platform in current and prior sessions, including information about past interventions that were administered for this user. The action set defines a rich design space of intervention strategies: when to intervene (e.g., periodically, as soon as the risk factor is identified, etc.); how to channel the intervention (e.g., email, pop-up, etc.); how the intervention is administered (e.g., text, audio, graphical, video, mixed, etc.); the content of the intervention (e.g., motivational message, performance feedback, community message, etc.); and the duration of the intervention (e.g., for a given time period, until the user has acknowledged the intervention, etc.).

Formulating, parameterizing, and solving this decision-making problem to identify ideal intervention policies introduces a number of challenges. First, learning about the outcomes of different intervention policies would entail studying and probing volunteer crowdworkers over a large space of state and action combinations. Second, we do not have an understanding of the influence of interventions on contributions. Thus, it is difficult to formulate an objective function for use in a formal sequential decision-making analysis. For example, a focus on optimizing the number of tasks completed by users may lead to a large number of generated interventions but ultimately reduce the quality of users' contributions.

Our approach is to reduce the general decision-making problem to a simpler one: we use predictions about forthcoming disengagement to control the timing of the display of different messages. Specifically, this approach consists of the following three steps. First, we employ supervised learning to build a model that is used in real time to predict whether a user is about to disengage from the system within a given time window. The learned model uses task-independent features relating to users' past and present interactions with the system. Second, we employ an intervention strategy that targets users whose predicted likelihood of disengagement within the window exceeds a given threshold. The intervention consists of different messages designed to address the motivational issues that have been identified in prior studies [Raddick et al., 2010, Reed et al., 2013, Eveleigh et al., 2014, Segal et al., 2015] as reasons that users disengage from the system. Third, we perform a controlled study to evaluate different intervention strategies in a live citizen science platform. We used three types of motivational messages aimed at emphasizing: (1) the contributions of users to the system, (2) users' sense of community, and (3) the tolerance of the project to an individual's potential mistakes. We compared the effects of these messages on users' behavior in the system when guided by policies that use predictions about forthcoming disengagement versus when delivered at random times.
We hypothesized the following: (1) Users in the prediction-based intervention groups would be more productive (as measured by contributions and time in the system) than users in the random intervention conditions (and a control group receiving no interventions), without harming the quality of their contributions. (2) The influence of the intervention on users' productivity depends on the message content, the timing of the message, and the confidence threshold on the likelihood of forthcoming disengagement.

The approach we take here is not limited to a disengagement scenario, nor is it restricted to the messages discussed here. Rather, it should be viewed as a general methodology in which we focus on key critical points requiring intervention, learn the characteristics of these points, use ML and AI to predict these points, and then, based on hard evidence from the social sciences, act to intervene to alleviate these critical points, all in large-scale, real-world systems using an adequate enterprise-scale intervention platform.
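To illustrate how these three steps combine at run time, the following minimal sketch shows a decision rule of the kind described above, written in Python: a trained classifier estimates the probability that the user disengages within the next five minutes, and a message is scheduled only when that probability exceeds a confidence threshold and the user has not yet been interrupted in the current session. The names (UserSessionState, predict_disengagement_proba) and the stub predictor are illustrative, not part of the actual implementation.

from dataclasses import dataclass

DISENGAGE_THRESHOLD = 0.5    # confidence threshold used in the first experiment
WINDOW_SECONDS = 5 * 60      # prediction window for forthcoming disengagement

@dataclass
class UserSessionState:
    """Hypothetical per-user state kept while the user is active."""
    features: dict                         # task-independent features of the current session
    intervened_this_session: bool = False  # at most one message per session
    opted_out: bool = False                # user asked not to receive further messages

def should_intervene(state, predict_disengagement_proba):
    """Return True if a motivational message should be delivered now.

    predict_disengagement_proba maps the feature dict to the estimated probability
    that the user disengages within WINDOW_SECONDS."""
    if state.opted_out or state.intervened_this_session:
        return False
    return predict_disengagement_proba(state.features) >= DISENGAGE_THRESHOLD

# Example with a stub predictor standing in for the trained classifier.
stub_predictor = lambda feats: 0.62
state = UserSessionState(features={"session_count": 1, "avg_dwell_time": 28.0})
print(should_intervene(state, stub_predictor))   # True -> deliver the chosen message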


3 Predicting Disengagement

We now describe the first step of our approach, in which a predictive model is used to identify users who are at risk of disengaging from the platform. We follow the formalization of this prediction problem by Mao et al. [2013]. We assume a general crowdsourcing platform in which users work individually to solve tasks. A session of a user is defined as a contiguous period of time during which the user is engaged with the platform and is completing tasks. The prediction problem is a binary classification problem: given the current state of a user (encapsulating both user history and the current session), predict whether the user will disengage and end the current session within a given time window. We consider all tasks that are 30 minutes or less apart from each other as belonging to the same session of activity. We use 5 minutes as the disengagement time window in our prediction problem.

Our dataset included 1,000,000 classifications from 13,475 unique users who were active in Galaxy Zoo in the 12 months preceding the experiment. To illustrate the casual participation behavior in our population, the average number of sessions per user was 2.6, and 67% of the users stayed for only one session. An instance to be used in learning or inference comprises features describing the user's past interactions up to the present task, and a binary label determining whether the user will disengage from the system within the 5-minute time window. From the above dataset we used 500,000 instances for training, 250,000 instances for validation and 250,000 instances for testing. The instances in the validation and test sets followed those in the training set in chronological order, preserving temporal consistency.

In training our model, we used the 16 most informative features identified by Mao et al. We confirmed that using this smaller feature set did not significantly affect the predictive performance when compared to the results they reported. The top five features included in the predictive model are, in decreasing order of importance: the user's average session time over all sessions so far; the user's average dwell time in the current session (measured as the number of seconds between two consecutive tasks); the user's session count; the number of seconds elapsed in the current session; and the difference between the number of tasks performed by the user in the current session and the median number of tasks in the previous 10 sessions of this user (or null if the user has completed fewer than 10 sessions).

Figure 2 (left) shows the predictive performance of the classifier in terms of area under the receiver-operator characteristic curve (AUC), as a function of the number of session histories available for users. As shown in the figure, the trained classifier performs better than a baseline based on selecting the most likely class (AUC of .5), and the performance increases with the inclusion of increasing amounts of historical data about users' sessions.

Figure 2: Performance in predicting disengagement. In top figure, S > x means number of sessions is larger than x.
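Before turning to the threshold analysis, the session and label definitions above can be made concrete. Assuming the raw log is a time-sorted list of (user_id, timestamp) classification events, the sketch below groups events into sessions separated by gaps of more than 30 minutes and labels each event with whether its session ends within the next 5 minutes; the data layout is illustrative, not the actual Galaxy Zoo schema.

from collections import defaultdict
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)   # tasks at most 30 minutes apart share a session
LABEL_WINDOW = timedelta(minutes=5)   # label: "will the user disengage within 5 minutes?"

def sessionize(events):
    """events: list of (user_id, datetime) pairs sorted by time.
    Returns {user_id: [[t1, t2, ...], ...]}, one inner list per session."""
    sessions = defaultdict(list)
    for user, ts in events:
        user_sessions = sessions[user]
        if user_sessions and ts - user_sessions[-1][-1] <= SESSION_GAP:
            user_sessions[-1].append(ts)
        else:
            user_sessions.append([ts])
    return sessions

def label_instances(sessions):
    """Yield (user_id, timestamp, label); label is 1 iff the session's last task
    falls within LABEL_WINDOW of this task, i.e. disengagement is imminent."""
    for user, user_sessions in sessions.items():
        for session in user_sessions:
            end = session[-1]
            for ts in session:
                yield user, ts, int(end - ts <= LABEL_WINDOW)

# Tiny example: three tasks in one session, then a new session after a long gap.
events = [("u1", datetime(2015, 8, 10, 12, 0)),
          ("u1", datetime(2015, 8, 10, 12, 10)),
          ("u1", datetime(2015, 8, 10, 12, 12)),
          ("u1", datetime(2015, 8, 10, 13, 0))]
print(list(label_instances(sessionize(events))))   # labels: 0, 1, 1, 1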


Figure 2 (right) shows the effect of different disengagement thresholds on prediction performance, measured in terms of precision (percentage of correctly classified dropouts out of all predicted dropouts), recall (percentage of correctly classified dropouts out of all relevant dropouts), and accuracy (percentage of correctly classified dropouts and continued engagements out of all predictions). As displayed in the figure, increasing the confidence threshold also increases the precision, because the predictor becomes more conservative about deciding whether a user is at risk of disengagement within the target window. However, this raised threshold also reduces recall, because a conservative strategy will miss relevant dropouts. The accuracy of the predictor steadily increases as the confidence threshold grows, before leveling out around 0.75. Based on these results, we selected a confidence threshold of 0.5 to initiate interventions in our experiments, which was expected to provide a good balance between precision and recall without significantly compromising the prediction accuracy.
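A threshold sweep like the one behind Figure 2 (right) can be reproduced with a few lines of scikit-learn, under the assumption that held-out disengagement labels and predicted probabilities are available; the arrays below are synthetic placeholders, not our experimental data.

import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Synthetic stand-ins: y_true are held-out disengagement labels, y_proba are
# the classifier's predicted probabilities of disengagement within 5 minutes.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
y_proba = np.clip(0.4 * y_true + 0.6 * rng.random(1000), 0.0, 1.0)

for threshold in (0.3, 0.5, 0.7):
    y_pred = (y_proba >= threshold).astype(int)
    print(f"threshold={threshold:.1f}  "
          f"precision={precision_score(y_true, y_pred):.2f}  "
          f"recall={recall_score(y_true, y_pred):.2f}  "
          f"accuracy={accuracy_score(y_true, y_pred):.2f}")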

4 Designing Intervention Strategies

We now describe the second step of our approach, centering on the design of intervention messages that are promising for increasing the motivation of users in the platform. The intervention messages (shown in Table 1) were developed together with the administrators of Galaxy Zoo, and directly address issues that have been shown in prior work to reduce the motivation of volunteers in citizen science. The helpfulness-type message emphasized to individual users that their contributions are valuable to the Galaxy Zoo project. The community-type message emphasized the collective nature of the Galaxy Zoo project and its sense of community. The anxiety-reduction-type message emphasized the tolerance to individual mistakes by volunteers, addressing the fear of making mistakes in classification, which has been documented in several studies on motivation in citizen science [Eveleigh et al., 2014, Segal et al., 2015].

We made the following decisions to minimize the disruption to participants associated with the delivered messages, in accordance with guidance from the Galaxy Zoo administrators. First, the messages were shown for 15 seconds or until the message was closed by the user. Second, we generated an intervention message at most once per session for each user. Third, the intervention was introduced using a window that smoothly integrated with the Galaxy Zoo GUI using a "slide-in" animation (see Figure 3). Lastly, in accordance with the Galaxy Zoo administrators, the intervention window included an option to opt out of receiving any additional messages. The study received ethics approval from the Institutional Review Boards (IRB) of the participating institutions.


Table 1: Intervention messages used in the first study

Type: Helpful
Message: Please don't stop just yet. You've been extremely helpful so far. Your votes are really helping us to understand deep mysteries about galaxies.
Cohorts: Random-Helpful, Predicted-Helpful

Type: Community
Message: Thousands of people are taking part in the project every month. Visit Talk at talk.galaxyzoo.org to discuss the images you see with them.
Cohorts: Random-Community, Predicted-Community

Type: Anxiety
Message: We use statistical techniques to get the most from every answer; so, you don't need to worry about being "right". Just tell us what you see.
Cohorts: Random-Anxiety, Predicted-Anxiety

Figure 3: Intervention message overlaid on Galaxy Zoo screen (partial view).

5 Empirical Studies

In this section, we describe the third step of our approach, consisting of two studies for evaluating the effect of different intervention strategies on users’ behavior in the system.

5.1 Effect of Timing and Message Content

The first experiment compared the effects of different message contents and the timing of the intervention on the contributions of users in the platform. We created two cohorts of users for each message type. For the first cohort, the timing of the message was guided using the predictive model. For the second, the timing of the message was distributed randomly. Thus, users in the Predicted-Helpful ("P-Helpful") cohort were targeted by the helpfulness intervention message when the model predicted that they would disengage within the target horizon, and users in the Random-Helpful ("R-Helpful") cohort were targeted by the helpfulness intervention message at random times (and similarly for the Community and Anxiety cohorts).



Figure 4: Comparison of number of tasks performed overall (top) and by prior sessions for all cohorts (bottom). In bottom figure, S > x means number of sessions is larger than x.

The amount of time to wait in the random condition was determined by fitting a Poisson distribution to the prior session history we obtained from Galaxy Zoo. The intervention time was then drawn uniformly between the limits of 0 (i.e., intervene immediately) and a session length that was sampled from this distribution. Note that in practice, the user may already have disengaged from the system by the determined intervention time. To keep the number of interventions equal for each cohort, we ran a pre-experiment simulation to compute the number of interventions that would be generated by the prediction-based approach on past data and normalized the random condition to match this number of interventions. Using this strategy we were able to generate a nearly equal number of interventions between the different cohorts.

A total of 3,377 users were considered in the study, of which 2,544 (75%) were new users with no prior history of interaction in the system. The study took place between August 10 and September 20, 2015. Users logging on to the system during this time period were randomly divided between the six cohorts described above and an additional cohort (the control group) which received no interventions. All cohorts included 567 volunteers each, except the Predicted-Anxiety cohort, which included 565 volunteers. In total, 4,168 interventions were generated across all of the intervention cohorts. At the request of the Galaxy Zoo administrators, we left out of the study a small minority of super-users whose contribution rate was greater than three standard deviations from the mean contribution rate for all cohorts. This sub-population included 33 users (0.1% of total participants) with an average contribution rate of 456 tasks per session (the remaining user population averaged fewer than 50 tasks per session). These super-users were removed from the study since they had already established themselves as persistent contributors with significantly different contribution patterns, and they were not a target population for our study.

Figure 4 (left) shows the average contribution rates for each of the cohorts. There was no statistically significant difference between the contribution rates of users in the random intervention cohorts and the control group. The figure also shows that the users in the P-Helpful cohort generated 19.6% more contributions than users in the R-Helpful cohort, which saw the same message (p < 0.05, analysis of variance). Figure 4 (right) shows the evolution of contribution rates for the seven conditions as the number of sessions for each user increases. As shown in the figure, the contribution rate for the users in the P-Helpful cohort steadily increases as the users in this cohort complete more sessions over time and engagement predictions become more informative as a result. In contrast, none of the other conditions showed any significant increase in contribution rates. We note that, although the contribution of the P-Community cohort increased over time, this trend was not statistically significant.

Contrary to our expectations, none of the cohorts stayed longer in the system when compared to the control (871 seconds). Therefore, to explain the higher contribution rates for the users in the P-Helpful cohort, we analyzed the dwell time (the average number of seconds between task submissions) for the different intervention cohorts. We found that the dwell time following intervention for the P-Helpful cohort (26 seconds) was significantly shorter than the dwell time of the control group (33.5 seconds). A possible consequence of the faster turnaround time for users solving tasks in this cohort is a decrease in the quality of their contributions. Since gold-standard answers to Galaxy Zoo tasks are not available, we instead used user agreement as the metric for quality. User agreement is commonly tracked as a quality metric in crowdsourcing platforms and is the basis for aggregation algorithms such as Dawid-Skene [Ipeirotis et al., 2010]. We computed the agreement score for each cohort by iterating over all Galaxy Zoo tasks worked on by the users in that cohort during the experiment. For each task, we computed the KL divergence between the distribution of classifications collected from the cohort and the distribution of classifications collected from Galaxy Zoo since its launch in 2012 until the start of our experimentation, and then averaged over all tasks. Our analysis showed no statistically significant difference between the KL divergence of the different cohorts. This analysis supports the conclusion that the quality of the work for users in the P-Helpful cohort was not different from that of the other cohorts and that the speed-up from this intervention did not lead to a decrease in the quality of work.

To summarize the first study, we demonstrated that both the timing and the content of messages are important design choices in an intervention strategy aimed at improving volunteer engagement. One potential explanation of the influence of the helpful message is that it resonates with participants' interest in making a socially beneficial contribution, with its emphasis on the principle of volunteerism. Nonetheless, without controlling the timing of the intervention based on predictions of forthcoming disengagement, this message is not effective (see the performance of the R-Helpful cohort). In contrast, the community-type message, which encouraged users to go and explore when they are predicted to be readying to disengage (and perhaps losing interest), may actually stimulate users to exit the task stream at the focus of attention; the message is coupled with a link to a community home page. Further, we suspect that overcoming "classification anxiety" may require more intensive psychological support and modulation (e.g., through practice and reassurance on a number of occasions) than a single supportive message. Lastly, we attribute the increase in contributions over time for users in the P-Helpful cohort to the improvement in predictor performance as additional data is collected. When paired with the right intervention message, collecting more history improves the effectiveness of the intervention.
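The agreement score used above can be written down explicitly. Assuming that for each task we have the cohort's distribution over answer categories and the corresponding historical Galaxy Zoo distribution, the sketch below computes the average KL divergence that serves as the quality proxy; the smoothing constant and data structures are illustrative.

import math

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) for two distributions over the same ordered answer categories."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def cohort_agreement(cohort_dists, historical_dists):
    """Average, over tasks, of KL(cohort distribution || historical distribution).
    Both arguments map task_id -> list of class probabilities in the same order."""
    scores = [kl_divergence(cohort_dists[t], historical_dists[t])
              for t in cohort_dists if t in historical_dists]
    return sum(scores) / len(scores)

# Toy example with two tasks and three answer categories.
cohort = {"t1": [0.5, 0.3, 0.2], "t2": [0.1, 0.8, 0.1]}
historical = {"t1": [0.6, 0.3, 0.1], "t2": [0.2, 0.7, 0.1]}
print(cohort_agreement(cohort, historical))   # lower values = closer to the historical consensus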




5.2 Effect of Threshold

We now report on a second experiment that we performed to explore the sensitivity of results to changes in the probability threshold used to control the display of messages. That is, we selected different thresholds on the inferred likelihood of forthcoming disengagement required to initiate interventions. We aimed at understanding the ideal trade-off between precision and recall in guiding the delivery of intervention messages. Based on the results from the previous study, all cohorts were targeted only with the helpful intervention message based on disengagement predictions, but differed in the threshold value used to predict whether users would disengage within the designated time frame of 5 minutes. Users in the low-threshold, medium-threshold and high-threshold cohorts were assigned a predictor using threshold values of 0.3, 0.5, and 0.7 respectively. These values were chosen because they were shown to have a significant impact on the number of interventions generated in a simulation that we ran in advance (Figure 5, top).

Figure 5 (top) shows the simulation of the number of interventions that would be generated for different threshold values and available histories of users (computed on the data collected from the control condition of experiment 1). As shown by the figure, varying the threshold used by the disengagement predictor affects how many interventions are generated for different users, depending on their number of past interactions. The number of generated interventions decreases for all histories as the confidence threshold of the predictor increases.

The study was conducted between December 18th, 2015 and January 5th, 2016. A total of 1,290 users participated in the study, of which 837 (65%) were new users with no prior history of interaction in the system. The low- and high-threshold cohorts included 322 volunteers each, while the medium-threshold cohort included 323 volunteers. In total, 1,529 interventions were generated across all cohorts. Figure 5 (bottom) shows the contribution rates for the different cohorts. As shown in the figure, the rate of contributions for the medium-threshold cohort was significantly higher than that of the other cohorts (p < 0.05, analysis of variance). This result demonstrates the effect of the prediction threshold on users' contribution rates for interventions using the helpful message type. While all three intervention cohorts outperformed the no-intervention cohort, the 0.5 prediction threshold, the one used in the first experiment, achieved the best results.



Figure 5: Effect of session history on number of interventions for different prediction thresholds (top) and of prediction thresholds on contributions (bottom). In top figure, S means number of sessions.

6 Sequential Planning for Multi-Step Interventions

We now turn to learning an optimal policy for our intervention strategy, based on our experiments so far. We use the data collected in our previous experiments, specifically the unbiased data from our random cohorts, to build a model of the state and action spaces for the intervention challenge at hand. Using the existing data and a model of the world, we are able to compute the transition and reward functions of our current Galaxy Zoo environment. This in turn enables us to compute an optimal policy. To verify this policy, we will use an off-policy evaluation approach.

6.1 MDP Model

Our episodic MDP model considers each user session separately. The actual model states are not observable. Moreover, approximating the state space using the continuous features at hand is also not pragmatically possible. Thus, a state-space approximation/reduction is needed. At this stage we use an approximation scheme based on choosing a subgroup of features and performing a continuous-to-categorical division of each feature, based (for example) on quartile analysis.

Figure 6: Intervention Model


The outcome model is presented in Figure 6. States S1..Sn are obtained using feature approximation. St is the episode termination state. The action space consists of the intervention actions (denoted I in the figure) and a non-intervention action (not I). An intervention action leads to episode termination while acquiring all rewards up to episode termination (as learned from the data), while a non-intervention action leads to termination with the probability predicted by the disengagement predictor, and to another next state with one minus that probability. The reward in both cases is 1, as the model captures the classification leading to this state. State transitions are learned from the data.
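To make the model concrete, the following sketch implements a small episodic MDP of this shape and solves it by value iteration. The number of states, the per-state dropout probabilities and the expected extra reward collected when intervening are toy placeholders; in our setting these quantities are estimated from the random-cohort data.

import numpy as np

# Toy episodic model: states 0..N-1 are approximated user states; state N is terminal.
N = 4
GAMMA = 0.99

# Probability that a "no intervention" step ends the episode, per state
# (in the real system this comes from the disengagement predictor).
p_dropout = np.array([0.1, 0.2, 0.4, 0.7])
# Expected extra reward collected when intervening in each state
# (learned from the random-cohort data in the real system; toy numbers here).
r_intervene = np.full(N, 1.5)
r_step = 1.0  # each step corresponds to one classification

def value_iteration(tol=1e-8):
    V = np.zeros(N + 1)                                   # V[N] (terminal) stays 0
    while True:
        next_values = np.append(V[1:N], 0.0)              # toy transition: s -> s+1
        q_wait = r_step + GAMMA * (1.0 - p_dropout) * next_values
        q_intervene = r_step + r_intervene                # intervening ends the episode
        V_new = np.append(np.maximum(q_wait, q_intervene), 0.0)
        if np.max(np.abs(V_new - V)) < tol:
            policy = np.where(q_intervene >= q_wait, "intervene", "wait")
            return V_new, policy
        V = V_new

V, policy = value_iteration()
print(dict(zip(range(N), policy)))   # e.g. wait in early states, intervene when dropout risk is high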

6.2 Feature Approximation

To enable feature approximation from large-scale experimental data we have developed a dedicated tool. The Features2MDP tool is presented in Figure 7. For any experiment data provided in CSV format, the following actions are supported: data loading, basic statistical analysis, state-space reduction by elimination, continuous-to-categorical feature transitions, action selection, reward allocation, and transition table computation. Once all these preparatory steps are completed, Features2MDP computes the optimal policy over the target state and action spaces, given the training data.

Figure 7: Features2MDP Tool
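A minimal sketch of the continuous-to-categorical step performed by such a tool: each selected feature is cut at its training-set quartiles, and a discrete state is the tuple of quartile bins. The pandas-based binning below illustrates the scheme only and is not the Features2MDP implementation.

import pandas as pd

def quartile_binning(df, feature_columns):
    """Map selected continuous features to quartile indices 0..3.
    Returns the binned frame plus the bin edges, so the same cuts can be
    reused when discretizing new (test-time) data."""
    binned, edges = {}, {}
    for col in feature_columns:
        binned[col], edges[col] = pd.qcut(df[col], q=4, labels=False,
                                          retbins=True, duplicates="drop")
    states = pd.DataFrame(binned)
    # A discrete state is the tuple of bins, e.g. (2, 0).
    states["state"] = list(zip(*[states[c] for c in feature_columns]))
    return states, edges

# Toy example with two of the predictor's features.
df = pd.DataFrame({"avg_session_time": [120, 300, 45, 900, 600, 210, 75, 480],
                   "dwell_time": [10, 25, 5, 60, 40, 18, 8, 33]})
states, edges = quartile_binning(df, ["avg_session_time", "dwell_time"])
print(states["state"].tolist())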

6.3 Off-Policy Policy Evaluation

We will evaluate the resulting policies using an off-policy policy evaluation scheme. In off-policy policy evaluation, one wishes to learn about policies other than the one followed by the agent when creating the dataset at hand. The policy used to create the data (in our case, the random policy) is called the "behavior policy", while the policy one is interested in evaluating is called the "target policy". A detailed discussion of off-policy evaluation is beyond the scope of this document. We will perform off-policy evaluation in two ways: in the first approach, we will use importance sampling [Precup et al., 2000]; in the second approach, we will compute the average discounted reward on the derived model (a rollouts approach). The target policy to be implemented for our multi-step intervention strategy will be derived from the policy maximizing the off-policy evaluation.
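For reference, the ordinary per-episode importance sampling estimator can be sketched as follows; it assumes logged episodes that record, for each step, the action taken, the behavior policy's probability of that action, and the reward, together with a target policy that assigns its own action probabilities. All data structures and numbers are illustrative.

def importance_sampling_estimate(episodes, target_policy, gamma=0.99):
    """Ordinary per-episode importance sampling (cf. Precup et al., 2000).

    episodes: list of trajectories; each trajectory is a list of
              (state, action, behavior_prob, reward) tuples.
    target_policy: function (state, action) -> probability under the evaluated policy.
    Returns an estimate of the target policy's expected discounted return.
    """
    estimates = []
    for trajectory in episodes:
        weight, discounted_return = 1.0, 0.0
        for t, (state, action, behavior_prob, reward) in enumerate(trajectory):
            weight *= target_policy(state, action) / behavior_prob
            discounted_return += (gamma ** t) * reward
        estimates.append(weight * discounted_return)
    return sum(estimates) / len(estimates)

# Toy usage: the behavior policy chose actions uniformly (prob 0.5 each);
# the target policy always intervenes.
episodes = [
    [("s1", "wait", 0.5, 1.0), ("s2", "intervene", 0.5, 3.0)],   # weight drops to 0
    [("s1", "intervene", 0.5, 2.0)],                             # weight becomes 2
]
always_intervene = lambda state, action: 1.0 if action == "intervene" else 0.0
print(importance_sampling_estimate(episodes, always_intervene))  # (0 + 2 * 2.0) / 2 = 2.0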

7 Diversity Analysis

The GalaxyZoo data we receive from the University of Oxford contains a country indication for each (anonymized) classification record. This gives us the opportunity to inspect contribution patterns per country, as well as to take a first look at the impacts and biases of our interventions once geo-location data is taken into account.

7.1 Contribution per Geographical Location

Figure 8 shows the contribution per country for the top contributing countries during the data collection period. As can be seen from the graph, the US tops the chart, performing approximately 66% of all classifications on GalaxyZoo.

Figure 8: Contribution per Country

When we look at classifications per country normalized by the number of residents in each country, the picture is somewhat different. Figure 9 presents the contribution per country (red) and the contribution normalized by the population size of each country (blue). After normalization, several European countries lead the chart, with the US in 9th place.
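The normalization behind Figure 9 is straightforward: each country's classification count is divided by its population. A small sketch with placeholder numbers (not the actual figures from the report):

# Placeholder counts and populations (not the actual figures behind Figures 8 and 9).
classifications = {"US": 660000, "UK": 90000, "Germany": 45000, "Australia": 30000}
population_millions = {"US": 320, "UK": 65, "Germany": 81, "Australia": 24}

per_capita = {c: classifications[c] / population_millions[c] for c in classifications}
for country, rate in sorted(per_capita.items(), key=lambda kv: -kv[1]):
    print(f"{country}: {rate:.0f} classifications per million residents")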


Figure 9: Contribution per country (red) and normalized per population (blue).

These results led us to take a closer look at the impact of interventions on different populations.

7.2 Intervention Impact on Different Geographical Locations

Table 2 shows the impact of the different intervention messages as compared to one another and to the control group. This is given for the number of tasks (count) and for the session duration (seconds) for several leading countries in the collected dataset. The top results are emphasized in red.

Table 2: Impact of Different Interventions per Geo-Location

As can be seen from the table, the Individual Contribution message is the one correlated with higher contributions for volunteers coming from the USA. As this group is the largest contributing group in GalaxyZoo (at least during our experiment and data collection phase), this result matches the result reported earlier. Nonetheless, a deeper look reveals additional interesting phenomena. For example, it seems that the European members from the UK and Germany present different characteristics. Specifically, these members' increase in contribution is only pronounced when a community-based intervention is present, a message emphasizing the collaborative nature of the project. The Australian volunteers demonstrate similar patterns, while the Brazilian project members are closer in their response to interventions to their US counterparts. These results, while very preliminary, demonstrate the need for additional research and projects that are diversity-focused and delve deeper into this interesting and important ground.


8 Conclusion

We described a methodology and experiments aimed at exploring challenges and opportunities in extending the engagement of users in crowdsourcing and CAS platforms. We focused on the Galaxy Zoo system, one of the largest crowdsourcing efforts on the web. We constructed a predictive model from a large corpus of data about Galaxy Zoo volunteers, focused on predicting when users would soon disengage from the system based on observations about their activities and their histories. We used the model to generate real-time interventions based on the inferred likelihood that volunteers engaged with the system would soon disengage. The interventions were messages designed to address recognized challenges with the motivation of volunteers assisting with citizen science problems.

Our evaluations highlighted the joint effect of intervention timing and message content on the contributions of Galaxy Zoo users. We found that a message emphasizing that users' individual contributions were important was able to significantly increase the contribution of users without decreasing the quality of their work, but that this messaging intervention had a significant effect only when guided by the predictive model. Following these large-scale experiments we moved on to develop more principled approaches to intervention design. For this we used past data and a sequential decision-making approach, and performed off-policy policy evaluation to devise optimal strategies for multi-message interventions. Finally, we analysed our data for diversity-aware biases, and presented initial findings pointing to future interest in diversity-aware research.

Our work has resulted in a large-scale incentive server platform that was tested, both as an implementation and as a research instrument, in a real-world large-scale system. We have combined ML and AI strategies with incentive design and have demonstrated the joint effect of intervention timing and intervention content on user behaviour in CAS. These results were achieved while considering and working closely with the "humans in the loop", the large-scale community of our target projects, thus fulfilling a core goal of our FP7 project. Our diversity-aware analysis emphasizes the importance of considering different biases in incentive design and stresses the need for further research and projects to this end.


Appendix A: Related Work

Multiple studies have documented the long-tail distribution of users' behavior in volunteer-based platforms such as Wikipedia [Preece and Shneiderman, 2009] and citizen science [Eveleigh et al., 2014, Sauermann and Franzoni, 2015]. Mao et al. [2013] developed a predictor for disengagement in volunteer-based crowdsourcing. They used 150 features that included statistics about volunteers' characteristics, the tasks they solved, and their history of prior sessions in the system for developing predictive models. They demonstrated the effects of different session lengths and window sizes on the prediction accuracy. Their model was tested in an offline setting with holdout data, assuming multiple session histories are available for each user. Inspired by their call for real-time use of their approach, we extended this earlier work in two ways: first, we adapted their model to a real-time setting in which users may have a limited amount of history (or none at all), choosing a minimal number of features that provide reasonable predictor performance; second, we implemented the predictive model and a set of interventions in a large live project on the web.

The study of intervention mechanisms for improving users' productivity is receiving increasing attention from the social and computational sciences. Some works have focused on increasing users' intrinsic motivation by generating messages to users, whether by framing a task as helping others [Rogstadius et al., 2011], reminding users of their uniqueness [Ling et al., 2005] or making a direct call for action [Savage et al., 2015]. Anderson et al. [2013] developed a model for the influence of merit badges in the Stack Overflow platform. They showed how community behavior changes once users get closer to the badge frontier, and gave insights on the optimal placement of badges in such a system. Segal et al. [2015] demonstrated that they could increase the return rate of volunteers to a citizen science system by sending motivational emails several days after they stopped making contributions.

Lastly, we review work on modeling and managing interruptions associated with notifications. Horvitz and Apacible [2003] used machine learning with Bayesian structure search to build models that infer the cost of interrupting users over time based on their interactions with information including computing devices and activities, visual and acoustical analyses, and data drawn from online calendars. Shrot et al. [2014] used collaborative filtering approaches to predict the cost of interruption by exploiting the similarities between users, and used this model to inform an interruption management algorithm. Kamar et al. [2013] showed that modeling interruptions as a planning under uncertainty problem can improve agents' performance when interacting with people.

Appendix B: Intervention Server's Architecture and REST API

Architecture

The Incentive Server is built to serve multiple applications, which use it for analyzing their user behaviour data, recommending incentives for these users and adapting its recommendations over time. As such, it includes mechanisms for defining incentive types and messages, for receiving and storing behavioural data, for applying different algorithms to these behavioural data, for deciding on conditions that require interventions, and for recommending interventions to the applications that use it. Figure 1 describes the main components of the WP5 Incentive Server architecture. Following is a short description of the server's main architecture components:

- Data Store: The data store is the central repository of the IS. It stores all the historical information about the users of the different applications. Each application's user information is stored separately. The minimal data point for an application is a tuple <timestamp, userid>, which denotes that a user performed an activity in this application. Such minimal information is already useful for various intervention strategies, e.g. for intervening in cases of predicted disengagement. This is described in more detail later in this report. The data store also stores all the incentive types supported by the system, the incentives recommended for each user in each application, and the algorithmic models used by the IS for its analysis and recommendations. The current implementation of the data store is based on an open-source MySQL database.

- Algorithms for predicting and shaping behavior: The IS uses algorithms for offline analysis and training as well as for online analysis and recommendation. These algorithms rely on behavioral models created beforehand and adapted in real time. The algorithmic sub-system of the IS includes an offline component which handles the training and storing of models on a periodical basis, as well as a component which is responsible for computing attributes and features in real time for analysis and intervention recommendation.

- Stream Reader: A stream-based API is defined to enable external applications to pass user behavioural data to the IS. The stream reader component listens on application streams, verifies the consistency of the information received and stores it in the IS internal data store for further analysis.

- Predictor: The predictor is the main real-time analysis component of the IS. It is tasked with analyzing the real-time data arriving from the applications, applying the various models defined for the applications, and recommending the appropriate interventions if needed. Different predictors may be defined for different intervention scenarios.

- Dashboard: A detailed dashboard is supplied for the human administrator of the IS. This dashboard handles the presentation of events received by the IS and the recommended interventions generated by it. The dashboard may include different monitoring screens for different applications. In the next chapter we showcase such a dashboard that is in operation for the "Predictive Modeling in CAS" first application.

The IS (Incentive Server) uses REST APIs to communicate with other components. These are used both for input and for output. The APIs are described in detail in the next section. Additionally, the server listens to data streams supplied by external components (servers, applications). These streams are used by the components to pass user behaviour data to the IS. This data forms the basis of the information used by the IS to decide on interventions and incentives.


Figure 1: Incentive Server Architecture

REST APIs

Following is a detailed specification of the API supported by this server:

Home Page:
  URL: /
  Method: GET
  Parameters: none
  Description: Auto-generated home page.

Admin Page:
  URL: /admin/
  Permission: Logged in.
  Method: GET
  Parameters: none
  Description: Auto-generated admin page.

Login:
  URL: /login/
  Method: POST
  Parameters:
    username (String): The username.
    password (String): The password.
  Usage:
    POST /login/ HTTP/1.1
    Content-Type: application/json
    Host: 127.0.0.1:8000
    Connection: close
    User-Agent: Paw/2.1.1 (Macintosh; OS X/10.10.2) GCDHTTPRequest
    Content-Length: 35

    { "username": "dor", "password": "123" }
  Response:
    { 'Token': 'ec1db856e230a61cc12ad5040a554c2a312fcee9' }
  Description: Login action. Used by external applications to obtain access to the Intervention Server (IS). Must be performed before any other operation on the IS.
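For illustration, a hypothetical Python client could combine the /login/ and /getIncUser/ endpoints specified in this appendix as follows; the host, credentials and the token-based Authorization header are assumptions about deployment details not fixed by this specification.

import requests

BASE = "http://127.0.0.1:8000"   # illustrative host, as in the Usage example above

# 1. Obtain an access token from the IS.
login = requests.post(f"{BASE}/login/", json={"username": "dor", "password": "123"})
token = login.json()["Token"]

# 2. Ask the IS for the best incentive for a given user (pull mode).
#    The token-based Authorization header is an assumption about the auth scheme.
response = requests.get(f"{BASE}/getIncUser/",
                        params={"userID": "17303"},
                        headers={"Authorization": f"Token {token}"})
print(response.json())   # e.g. {"schemeID": "123"}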

Get All Incentives:
  URL: /api/incentive/
  Method: GET
  Permission: Logged in.
  Parameters: none
  Response:
    {
      "count": 2,
      "next": null,
      "previous": null,
      "results": [
        {
          "owner": "dor",
          "schemeName": "SendEmail",
          "schemeID": 123123,
          "text": "Hello World!",
          "typeID": 1212,
          "typeName": "sending",
          "status": true,
          "ordinal": 21312,
          "tags": [],
          "modeID": 21312,
          "groupIncentive": false,
          "condition": "if the user is older than 21"
        },
        {
          "owner": "dor",
          "schemeName": "DoNoting",
          "schemeID": 14232,
          "text": "Hello World!",
          "typeID": 1212,
          "typeName": "sending",
          "status": true,
          "ordinal": 21312,
          "tags": [],
          "modeID": 21312,
          "groupIncentive": false,
          "condition": "if the user is older than 21"
        }
      ]
    }
  Description: Get all incentive types from the IS.

Add Incentive:
  URL: /api/incentive/
  Method: POST
  Permission: Logged in.
  Parameters:
    schemeName (String): The scheme name.
    schemeID (Int): The scheme ID.
    text (String): The text of the incentive.
    typeID (Int): The type ID.
    typeName (String): The type name.
    status (Boolean): The status.
    ordinal (Int): The ordinal number.
    tags (Tag[]): List of tags for this incentive (array of Tags).
    modeID (Int): The mode ID.
    groupIncentive (Boolean): Whether this is a group incentive.
    condition (String): The condition for this incentive.
  Response:
    HTTP 201 CREATED
    Content-Type: application/json
    Vary: Accept
    Allow: GET, POST, HEAD, OPTIONS

    {
      "owner": "dor",
      "schemeName": "SendEmail",
      "schemeID": 123123,
      "text": "Hello World!",
      "typeID": 1212,
      "typeName": "sending",
      "status": true,
      "ordinal": 21312,
      "tags": [],
      "modeID": 21312,
      "groupIncentive": false,
      "condition": "if the user is older than 21"
    }
  Description: Add an incentive type to the IS.

Get Incentive for User:
  URL: /getIncUser/
  Permission: Logged in.
  Method: GET
  Parameters:
    userID (String): The user ID.
  Response:
    { "schemeID": "123" }
  Description: Get the best incentive for this user. Pull operation used by an external application to obtain an incentive for a user.

View Live Interventions:
  URL: /dash/pages/dash.html
  Method: GET
  Parameters: none
  Description: Dashboard page; once loaded, it presents in real time the events received by the IS and the interventions generated by it.

Get Interventions by Date:
  URL: /ask_by_date/
  Method: GET
  Parameters:
  Response (JSON):
    {
      "id": id,
      "user_id": str(user_id),
      "created_at": str(created_at),
      "intervention_id": str(intervention_id),
      "preconfigured_id": str(preconfigured_id),
      "cohort_id": str(cohort_id),
      "algo_info": str(algo_info),
      "country_name": str(country_name)
    }
  Description: Get the intervention list from the given date.

Get Interventions by ID:
  URL: /ask_by_id/
  Method: GET
  Parameters:
    record_id (int): ID of the first wanted record.
  Response (JSON object or JSON list):
    {
      "id": id,
      "user_id": str(user_id),
      "created_at": str(created_at),
      "intervention_id": str(intervention_id),
      "preconfigured_id": str(preconfigured_id),
      "cohort_id": str(cohort_id),
      "algo_info": str(algo_info),
      "country_name": str(country_name)
    }
  Description: Intervention or intervention list from the given ID.

Posting an Incentive to an External Server

In addition to the pull mode, the IS supports sending incentives to an application by calling that application's REST API. For this, the IS expects the following REST API to be supported by the external application:

  URL: "http://<External Application Server>/users/" + user_id + "/interventions"
  Method: POST
  Parameters:
    project (string): project name
    intervention_type (string): type
    text_message (string)
    cohort_id (int)
    time_duration (int)
    presentation_duration (int)
    intervention_channel (string)
    take_action (string)
    preconfigured_id (int)
    experiment_name (string)
  Response:
    {
      "cohort_id": 1,
      "created_at": "2015-07-07T10:38:37.703+00:00",
      "details": null,
      "experiment_name": "Zooniverse-MSR-BGU GalaxyZoo Experiment 1",
      "intervention_channel": "web message",
      "intervention_type": "prompt user about talk",
      "preconfigured_id": 1,
      "presentation_duration": 30,
      "project": "galaxy_zoo",
      "state": "active",
      "take_action": "after_next_classification",
      "text_message": "please return",
      "time_duration": 120,
      "updated_at": "2015-07-07T10:38:37.703+00:00",
      "user_id": "17303",
      "id": "559bac2d3031650001140000"
    }
  Description: Post an incentive to the external application by calling this REST API.

Listening to Behavioural Data

Page 28 of (35)

http://www.smart-society-project.eu/


Deliverable D5.3

Š SmartSociety Consortium 2013 - 2017

The Incentive Server supports stream delivery of behavioural data from external applications. For this, it expects the external application to respond to the curl query below with behavioural data in JSON format. Details:

URL: curl -H "Accept: application/<param>+json" <URL>
Response fields:
  user_id (string) - mandatory
  created_at (string) - mandatory
  action (string) - optional
  other fields (string) - optional
Description: Listen to behaviour event streams.
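A minimal Python sketch of such a listener is shown below, assuming the requests library, a concrete stream URL and media-type parameter, and line-delimited JSON framing of the events; only the Accept header and the field semantics are taken from the specification above.

import json
import requests

def listen_to_events(stream_url, media_type_param):
    """Sketch of the behavioural-data listener described above.

    Issues the documented request (Accept: application/<param>+json) and
    consumes the reply as a stream. Treating the stream as line-delimited
    JSON is an assumption about the external application's framing.
    """
    headers = {"Accept": "application/{}+json".format(media_type_param)}
    with requests.get(stream_url, headers=headers, stream=True) as response:
        response.raise_for_status()
        for line in response.iter_lines():
            if not line:
                continue
            event = json.loads(line)
            # user_id and created_at are mandatory; action and other fields are optional.
            handle_event(event["user_id"], event["created_at"], event.get("action"))

def handle_event(user_id, created_at, action):
    print(user_id, created_at, action)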


Appendix C: Intervention Dimensions

As mentioned earlier in our reports, an incentivizing and intervening framework for CAS should address multiple dimensions of interaction with its users if it is to have an effective and positive impact on the intervened party. Table 1 summarizes these dimensions and points to prior work on each of them; the "Table References" section below lists this prior work.

Table 1: Intervention Strategy Dimensions

Target
  Description: target population receiving the intervention
  Examples: all; newcomers (identified as a risk factor); a subgroup identified with predictive modeling
  References: [27], [28], [1], [9]

Time
  Description: intervention timing
  Examples: periodical; as soon as a risk factor is identified; one day after disengagement
  References: [1], [9], [16]

Duration
  Description: intervention (interruption) duration
  Examples: for x minutes; until the intervention is acknowledged; while the user is on the webpage
  References: [18], [19], [20], [26]

Media Type
  Description: method of presentation
  Examples: text; graphics; audio; video; mixed
  References: [7], [8], [13], [25]

Channel
  Description: medium of delivery
  Examples: email; web page; modal message; Facebook
  References: [22], [23]

Mechanism
  Description: the method used for the intervention
  Examples: explanation; encouragement; help system; task routing; competition; achievements; progress in a ladder of responsibility; micro breaks; links to additional info; asking for feedback; gamification elements (badges, levels, etc.)
  References: [2], [3], [4], [5], [6], [10], [11], [12], [17], [21], [24]

Message
  Description: content of the intervention communication
  Examples: "...If GalaxyZoo didn't suit you, then check out all of the other Zooniverse citizen science projects at www.zooniverse.org..."; a checkmark image; fun facts or tasks chosen as favorites by others
  References: [1], [14], [15]

These dimensions span a wide range of activities, all of which are needed when planning and administering interventions to populations in CAS. From deciding on the target population and obtaining ongoing information about its behaviour, to choosing the intervention mechanism and the channel it is delivered on, to deciding on the intervention's timing and length, these activities are part of every incentive strategy. Moreover, many of them generalize across different incentive scenarios and situations. For example, the incentive server needs information about the population's behaviour regardless of the specific incentive mechanism, channel or message.
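To make these dimensions concrete, the sketch below shows one possible way of capturing an intervention strategy as a Python configuration record. The class and field names are purely illustrative and are not part of the Incentive Server implementation.

from dataclasses import dataclass, field
from typing import List

@dataclass
class InterventionStrategy:
    """Illustrative record covering the dimensions of Table 1 (names are not part of the IS API)."""
    target: str                 # e.g. "all", "newcomers", "predicted-at-risk subgroup"
    timing: str                 # e.g. "periodical", "on risk factor", "one day after disengagement"
    duration: str               # e.g. "for 5 minutes", "until acknowledged", "while on page"
    media_type: str             # text, graphics, audio, video, mixed
    channel: str                # email, web page, modal message, Facebook
    mechanism: str              # explanation, encouragement, task routing, badges, ...
    message: str                # the content shown to the user
    references: List[int] = field(default_factory=list)  # pointers into the table references

# Example instance, using values drawn from Table 1:
strategy = InterventionStrategy(
    target="newcomers",
    timing="as soon as risk factor is identified",
    duration="until intervention is acknowledged",
    media_type="text",
    channel="web page",
    mechanism="encouragement",
    message="please return",
    references=[1, 9],
)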

Table References:

1. A. Segal, R. Simpson, K. Gal, V. Homsy, M. Hartswood, K. Page, M. Jirotka. "Motivating the Zooniverse: Improving Productivity in Citizen Science through Controlled Intervention" (in submission).
2. Kamar, Ece, Ashish Kapoor, and Eric Horvitz. "Lifelong learning for acquiring the wisdom of the crowd." Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence. AAAI Press, 2013.
3. Lin, Christopher H., Ece Kamar, and Eric Horvitz. "Signals in the Silence: Models of Implicit Feedback in a Recommendation System for Crowdsourcing." 2014.
4. Kamar, Ece, Severin Hacker, and Eric Horvitz. "Combining human and machine intelligence in large-scale crowdsourcing." Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems, Volume 1. International Foundation for Autonomous Agents and Multiagent Systems, 2012.
5. Nov, Oded, Ofer Arazy, and David Anderson. "Dusting for science: motivation and participation of digital citizen science volunteers." Proceedings of the 2011 iConference. ACM, 2011.
6. Prestopnik, Nathan, Kevin Crowston, and Jun Wang. "Exploring Data Quality in Games With a Purpose." 2014.
7. Fletcher, J. D., and Sigmund Tobias. "The multimedia principle." The Cambridge Handbook of Multimedia Learning 117 (2005): 133.
8. Clark, Ruth C., and Richard E. Mayer. E-learning and the Science of Instruction: Proven Guidelines for Consumers and Designers of Multimedia Learning. John Wiley & Sons, 2011.
9. Mao, Andrew, Ece Kamar, and Eric Horvitz. "Why Stop Now? Predicting Worker Engagement in Online Crowdsourcing." First AAAI Conference on Human Computation and Crowdsourcing. 2013.
10. Mao, Andrew, et al. "Volunteering Versus Work for Pay: Incentives and Tradeoffs in Crowdsourcing." First AAAI Conference on Human Computation and Crowdsourcing. 2013.
11. Bragg, Jonathan, Andrey Kolobov, and Daniel S. Weld. "Parallel Task Routing for Crowdsourcing." Second AAAI Conference on Human Computation and Crowdsourcing. 2014.
12. Kaufmann, Nicolas, Thimo Schulze, and Daniel Veit. "More than fun and money. Worker motivation in crowdsourcing: a study on Mechanical Turk." 2011.
13. Lasecki, Walter Stephen, Ece Kamar, and Dan Bohus. "Conversations in the Crowd: Collecting Data for Task-Oriented Dialog Learning." First AAAI Conference on Human Computation and Crowdsourcing. 2013.
14. Shaw, Aaron D., John J. Horton, and Daniel L. Chen. "Designing incentives for inexpert human raters." Proceedings of the ACM 2011 Conference on Computer Supported Cooperative Work. ACM, 2011.
15. Evans, Laurel, et al. "Self-interest and pro-environmental behaviour." Nature Climate Change 3.2 (2013): 122-125.
16. O'Brien, Heather L., and Elaine G. Toms. "What is user engagement? A conceptual framework for defining user engagement with technology." Journal of the American Society for Information Science and Technology 59.6 (2008): 938-955.
17. Haaranen, Lassi, et al. "How (not) to introduce badges to online exercises." Proceedings of the 45th ACM Technical Symposium on Computer Science Education. ACM, 2014.



18. Horvitz, Eric, and Johnson Apacible. "Learning and reasoning about interruption." Proceedings of the 5th International Conference on Multimodal Interfaces. ACM, 2003.
19. Shrot, Tammar, et al. "CRISP: an interruption management algorithm based on collaborative filtering." Proceedings of the 32nd Annual ACM Conference on Human Factors in Computing Systems. ACM, 2014.
20. Adamczyk, Piotr D., and Brian P. Bailey. "If not now, when?: the effects of interruption at different moments within task execution." Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2004.
21. Anderson, Ashton, et al. "Steering user behavior with badges." Proceedings of the 22nd International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2013.
22. Papadimitriou, Panagiotis, et al. "Display advertising impact: Search lift and social influence." Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2011.
23. Marshall, Bryan, et al. "An Exploratory Study of the Impact of Formatting on Email Effectiveness and Recall." Communications of the IIMA 9.4 (2014): 1.
24. Dow, Steven, et al. "Shepherding the crowd yields better work." Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work. ACM, 2012.
25. Jackson, Corey, et al. "Motivations for Sustained Participation in Citizen Science: Case Studies on the Role of Talk." 17th ACM Conference on Computer Supported Cooperative Work & Social Computing. 2014.
26. Kamar, Ece, and Barbara Grosz. "Applying MDP approaches for estimating outcome of interaction in collaborative human-computer settings." 2007.
27. Eveleigh, Alexandra, et al. "Designing for dabblers and deterring drop-outs in citizen science." Proceedings of the 32nd Annual ACM Conference on Human Factors in Computing Systems. ACM, 2014.
28. Preece, Jennifer, and Ben Shneiderman. "The reader-to-leader framework: Motivating technology-mediated social participation." AIS Transactions on Human-Computer Interaction 1.1 (2009): 13-32.
29. Mao, Andrew, Ece Kamar, and Eric Horvitz. "Why Stop Now? Predicting Worker Engagement in Online Crowdsourcing." First AAAI Conference on Human Computation and Crowdsourcing. 2013.
30. Deci, E. L., and Ryan, R. M. Intrinsic Motivation and Self-Determination in Human Behavior. New York: Plenum, 1985.
31. Reeve, J., and Jang, H. "What teachers say and do to support students' autonomy during a learning activity." Journal of Educational Psychology 98.1 (2006): 209-218.



http://www.smart-society-project.eu/


Deliverable D5.3

Š SmartSociety Consortium 2013 - 2017

Š SmartSociety Consortium 2013 - 2017

Page 35 of (35)


Issuu converts static files into: digital portfolios, online yearbooks, online catalogs, digital photo albums and more. Sign up and create your flipbook.