Activity data ● Tom Franklin, Franklin Consulting ● David Kay, Sero ● Mark van Harmelen, HedTek ● Helen Harrop, Sero ● Rob Moores, LMU
Introduction: What is activity data? What can it be used for? What are the benefits of using activity data? What challenges does it raise?
Star-TRAK: NG - activity data in support of student success
Discussion of potential uses: consider use cases that may spark your own ideas
Building a business case (TF / HH): How can you build a case to make use of activity data in your institution? Consideration of examples from the 2011 JISC Activity Data programme
Working with activity data: Analyse some exemplar activity data and identify the potential value and challenges of the data
Feedback & Conclusions: What does this mean for learning technologists? What you can do next
Using activity data to support your users ● Tom Franklin, University of Manchester and Franklin Consulting ● email@example.com
What is activity data? Short answer: anything in a log file. Longer answer: ● Every login ● Every search ● Every access of a resource ● Every submission of a document ● Every access of a web page ● Any action in the VLE
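Since "anything in a log file" counts, the first practical step is turning raw log lines into structured events. The sketch below assumes a hypothetical one-line log format (timestamp, user, IP, action, target); real VLE, library or web server logs will each need their own pattern.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical log format, assumed purely for illustration:
# "2011-10-03T09:15:02 user=jbloggs ip=192.0.2.10 action=search target=climate+change"
# A user of "-" means the visitor was not logged in.
LINE_RE = re.compile(
    r"(?P<ts>\S+) user=(?P<user>\S+) ip=(?P<ip>\S+) "
    r"action=(?P<action>\S+) target=(?P<target>\S+)"
)


@dataclass
class ActivityEvent:
    timestamp: datetime
    user: Optional[str]  # None when the visitor was not logged in
    ip: str
    action: str  # e.g. login, search, access, submit
    target: str  # the resource, page or query acted on


def parse_line(line: str) -> Optional[ActivityEvent]:
    """Turn one log line into a structured event; return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return ActivityEvent(
        timestamp=datetime.fromisoformat(m["ts"]),
        user=None if m["user"] == "-" else m["user"],
        ip=m["ip"],
        action=m["action"],
        target=m["target"],
    )
```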
It only matters if: We know who you are
Not quite: a proxy will do. ● Logged-in users: we know who you are ● IP address: we can assume you are the same person ● But proxy servers will cause problems, because many users can appear behind a single IP address
What really matters is the ability to link activities together, so that we can look at patterns of behaviour.
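Linking activities can be sketched as grouping events by the best available identity (the login if present, otherwise the IP address as a proxy) and ordering each group by time. The event tuple layout is a hypothetical assumption; note that the IP fallback inherits the proxy-server problem described above.

```python
from collections import defaultdict


def identity_key(user_id, ip):
    # Prefer the login identity; fall back to the IP address as a weaker proxy.
    # Behind a proxy server many people share one IP, so IP-based sequences may mix users.
    return f"user:{user_id}" if user_id else f"ip:{ip}"


def link_activities(events):
    """events: hypothetical (timestamp, user_id or None, ip, action) tuples.
    Returns {identity: [actions in time order]} so behaviour patterns can be compared."""
    sequences = defaultdict(list)
    for timestamp, user_id, ip, action in sorted(events, key=lambda e: e[0]):
        sequences[identity_key(user_id, ip)].append(action)
    return dict(sequences)
```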
Who is doing it?
What is activity data useful for? ● Student recruitment ● Student retention ● Research impact ● Resource management
http://www.flickr.com/photos/joshmaz/2539433150/ http://www.flickr.com/photos/photoplod/5480574102/ http://www.flickr.com/photos/48722974@N07/4523952050/
Student recruitment: What are potential students doing on your website? Where do they go? What are their routes? Can you tie schools’ IP addresses to UCAS applications? How can you use this information to: ● Support candidates while they are on the site? ● Improve the usability of the site?
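Tying visits to schools could start by matching visitor IP addresses against known school network ranges. The sketch below uses Python's standard `ipaddress` module; the school names and ranges are hypothetical (taken from the documentation-only address blocks), and a reliable source for real school ranges would still be needed before any link to UCAS applications could be made.

```python
import ipaddress
from collections import Counter
from typing import Optional

# Hypothetical school network ranges, for illustration only.
SCHOOL_NETWORKS = {
    ipaddress.ip_network("192.0.2.0/24"): "Example High School",
    ipaddress.ip_network("198.51.100.0/25"): "Sample Academy",
}


def school_for_ip(ip: str) -> Optional[str]:
    """Return the school whose network contains this address, if any."""
    addr = ipaddress.ip_address(ip)
    for network, school in SCHOOL_NETWORKS.items():
        if addr in network:
            return school
    return None


def visits_by_school(visitor_ips):
    """Count website visits per known school; unmatched addresses are ignored."""
    return Counter(s for s in map(school_for_ip, visitor_ips) if s is not None)
```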
How are they doing?
Demonstrating value Library Impact Data Project ● Relationship between library use and results
Using a variety of data ● Turnstile activity ● Library management system ● EZProxy service ● Student record system
Relationship demonstrated (not causality) http://library.hud.ac.uk/blogs/projects/lidp/
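The kind of relationship described here can be explored by joining two data sources on a student identifier and summarising one against the other. This is a minimal sketch, not the Library Impact Data Project's own method: the dictionaries and classification labels are hypothetical, and, as the slide says, such a summary shows a relationship, not causality.

```python
from collections import defaultdict
from statistics import mean


def mean_loans_by_class(loans, results):
    """loans: {student_id: books borrowed}; results: {student_id: degree classification}.
    Only students present in both sources are included (an inner join on student ID)."""
    grouped = defaultdict(list)
    for student_id, classification in results.items():
        if student_id in loans:
            grouped[classification].append(loans[student_id])
    return {c: mean(v) for c, v in grouped.items()}
```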
[Figure: Book borrowing and degree classification: number of books borrowed across the degree, by year of graduation. Source: http://library.hud.ac.uk/blogs/projects/lidp/2011/07/15/huddersfield-borrowing-year-on-year/]
Supporting students Exposing VLE data ● Identify patterns of behaviour associated with success ● Identify students who are struggling ● Change academics' attitudes towards the institutional VLE
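One simple way to "identify students who are struggling" is to flag those whose VLE activity falls well below the cohort average. The sketch below uses a z-score rule with a hypothetical threshold of one standard deviation below the mean; a real early-warning rule should be derived from the patterns actually associated with success.

```python
from statistics import mean, pstdev


def flag_low_activity(activity, threshold=-1.0):
    """activity: {student_id: VLE actions in a period}. Returns students whose activity is
    more than |threshold| standard deviations below the cohort mean (hypothetical rule)."""
    values = list(activity.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # everyone is equally active, so nobody stands out
    return sorted(s for s, v in activity.items() if (v - mu) / sigma < threshold)
```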
Research impact: beyond Google Analytics ● Increase use of the institutional repository ● Help people find the information that they need ● People who accessed x also accessed y ● People who searched for a accessed b
http://www.jmorganmarketing.com/wp-content/uploads/2010/07/impact_solution_logo.jpg
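"People who searched for a accessed b" can be sketched by counting, for each search term, the items accessed later in the same session. The session structure and event labels are hypothetical assumptions.

```python
from collections import Counter, defaultdict


def search_to_access(sessions):
    """sessions: list of event lists, each event ("search", term) or ("access", item).
    Counts, per search term, which items were accessed later in the same session."""
    counts = defaultdict(Counter)
    for events in sessions:
        terms = []  # search terms seen so far in this session
        for kind, value in events:
            if kind == "search":
                terms.append(value)
            elif kind == "access":
                for term in set(terms):
                    counts[term][value] += 1
    return counts


def suggest(counts, term, n=3):
    """Items most often accessed after searching for this term."""
    return [item for item, _ in counts[term].most_common(n)]
```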
AEIOU across the Welsh Repository Network ● Recommender system: people who accessed x also accessed y ● Increase the number of articles accessed per session ● Combine with similarity-based recommendations
http://techcomtnd.blogspot.com/2010_08_01_archive.html
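A basic "people who accessed x also accessed y" recommender can be built from co-occurrence counts within sessions. This is a generic sketch, not AEIOU's implementation; session contents are hypothetical, and it covers only the co-access part, not the similarity-based recommendations the slide mentions combining it with.

```python
from collections import Counter, defaultdict
from itertools import permutations


def co_access(sessions):
    """sessions: iterable of collections of item IDs accessed in one session.
    Returns {item: Counter of other items accessed in the same sessions}."""
    pairs = defaultdict(Counter)
    for items in sessions:
        for x, y in permutations(set(items), 2):
            pairs[x][y] += 1
    return pairs


def also_accessed(pairs, item, n=3):
    """Items most often accessed alongside the given item."""
    return [other for other, _ in pairs[item].most_common(n)]
```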
Resource utilisation What resources are being used? By whom? What are the patterns? Can you support users to make better use of resources? Are the subscriptions optimal?
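One way to ask whether subscriptions are optimal is to compare cost per use. The sketch assumes hypothetical annual costs and usage counts per resource; unused subscriptions are given an infinite cost per use so they sort to the top.

```python
def cost_per_use(subscriptions, usage):
    """subscriptions: {resource: annual cost}; usage: {resource: accesses in the year}.
    Returns (resource, cost per use) pairs, most expensive per use first."""
    result = []
    for resource, cost in subscriptions.items():
        uses = usage.get(resource, 0)
        result.append((resource, cost / uses if uses else float("inf")))
    return sorted(result, key=lambda pair: pair[1], reverse=True)
```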
Using activity data: Challenges ● Mark van Harmelen, University of Manchester Computer Science and HedTek Ltd
What do you want to do? Two immediately obvious uses for activity data: ● To recommend resources ● To provide learning analytics data
Determine the challenges: deal first with those specific to recommending resources, then with the general challenges common to both uses
Anonymisation: under the Data Protection Act, recommenders must not reveal any personally identifiable information. Strategy: remove small data sources ● Items only occasionally loaned / referenced ● Small cohort data
Data collection: your educational institution has a veritable ‘gold mine’ of information. But can you get to any of the gold? ● Library data: circulation, downloads and turnstile information ● Student registry data: student attendance and academic results ● Diverse system use: VLE-derived data!
Dealing with data quantities: potentially (maybe not now, but after a few years) millions of records. Will your databases grind to a halt, or become less responsive? ● SQL databases can cope with loading the data, but are not good for some recommender purposes, depending on the recommender ● NoSQL offers various database technologies that may help: MongoDB is one favourite
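The "remove small data sources" strategy can be sketched as suppressing every item used by fewer than a minimum number of distinct people before it reaches a recommender. The threshold of 5 is a hypothetical assumption, and this is one simple safeguard rather than a complete anonymisation scheme; the same idea applies to small cohorts.

```python
from collections import defaultdict


def suppress_small_counts(events, min_users=5):
    """events: (user_id, item_id) pairs. Drops all events for items used by fewer than
    min_users distinct people, so rarely used items cannot point back to an individual."""
    users_per_item = defaultdict(set)
    for user, item in events:
        users_per_item[item].add(user)
    keep = {item for item, users in users_per_item.items() if len(users) >= min_users}
    return [(user, item) for user, item in events if item in keep]
```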
Facebook EdgeRank algorithm: while RISE used conventional databases to good effect, the underlying data format sometimes needs improvement. EdgeRank is an example of a complex recommender algorithm using social data. For each item and each viewing user compute
$$I = \sum_{e} u_e \, w_e \, d_e$$
where the sum runs over the edges $e$ attached to the item, and ● $u_e$ is the affinity score between the viewing user and the edge creator ● $w_e$ is the weight for this edge type (create, comment, like, tag, etc.) ● $d_e$ is a time-decay factor based on how long ago the edge was created. Then rank items according to their score $I$.
Expensive with SQL, almost trivial with a NoSQL graph database like Neo4j
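The score $I = \sum_e u_e w_e d_e$ can be computed directly once the edges are in memory. In this sketch the edge-type weights and the exponential half-life decay are hypothetical choices (Facebook did not publish its actual values), and affinities are supplied per edge creator for a single viewing user.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Edge:
    item: str
    creator: str
    kind: str  # create, comment, like, tag, ...
    age_hours: float  # how long ago the edge was created


# Hypothetical edge-type weights (w_e) and decay half-life.
EDGE_WEIGHTS = {"create": 1.0, "comment": 0.8, "tag": 0.6, "like": 0.4}
HALF_LIFE_HOURS = 24.0


def decay(age_hours):
    # d_e: exponential time decay, 1.0 for a brand-new edge, 0.5 after one half-life.
    return 0.5 ** (age_hours / HALF_LIFE_HOURS)


def rank_items(edges, affinity):
    """affinity: {creator: u_e for the viewing user}.
    Returns (item, I) pairs ordered by I = sum(u_e * w_e * d_e), highest first."""
    scores = defaultdict(float)
    for e in edges:
        u = affinity.get(e.creator, 0.0)
        w = EDGE_WEIGHTS.get(e.kind, 0.0)
        scores[e.item] += u * w * decay(e.age_hours)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```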
Learning algorithmics ● What is it? A method of predicting which students are at risk of failing or leaving education ● How does it work? 1. Find activity data that differentiates between students at risk and not at risk (statistical significance needed) 2. In subsequent years, look for similar measures in the activity data
Challenges: find the differentiators!
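The two steps above can be sketched as follows. Step 1 checks whether a candidate measure (for example, weekly VLE logins) differs significantly between students who were and were not at risk; a permutation test is used here as one standard-library option, not necessarily the method used in practice. Step 2 uses a hypothetical midpoint cut-off and assumes that lower values indicate risk.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(at_risk, not_at_risk, n_permutations=10_000, seed=0):
    """Step 1: two-sided permutation test of the difference in means between groups.
    A small p-value suggests the measure is a candidate differentiator."""
    rng = random.Random(seed)
    observed = abs(mean(at_risk) - mean(not_at_risk))
    pooled = list(at_risk) + list(not_at_risk)
    k = len(at_risk)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign the group labels
        if abs(mean(pooled[:k]) - mean(pooled[k:])) >= observed:
            extreme += 1
    # The +1 terms count the observed labelling itself and keep p above zero.
    return (extreme + 1) / (n_permutations + 1)


def flag_at_risk(previous_at_risk, previous_not_at_risk, current):
    """Step 2 (hypothetical rule): in a later year, flag students whose measure falls
    below the midpoint between the two group means from the earlier year."""
    cutoff = (mean(previous_at_risk) + mean(previous_not_at_risk)) / 2
    return sorted(s for s, value in current.items() if value < cutoff)
```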
Summary: three top challenges 1. Get hold of the data sources you need 2. Ensure that, from both a collection and a processing point of view, you are using the right data storage technology 3. With learning algorithmics, find the differentiators
Rob Moores, Associate Director, IMTS
[Figure: STAR-TRAK: NG data feeds and users. Data feeds: personal, course and grades data; Resource Manager (AV loans, fines); search, loans and fines; usage and assignment status / mark. Users: Student, Module Tutor, Student Liaison Officer, Other Pastoral, Module Leader, Level Leader, Faculty Admin, Subject Group Head, Corporate Planner, Parent (with student permission).]
[Figure: STAR-TRAK: NG architecture. Components: Web Service Interface; Leeds Metropolitan University Network; Eclipse BIRT (analytical reporting); SOAP Opera 2 (data warehouse); Talend Open Studio (extract, transformation, loading); Star Trak NG Staging Tables; on-premises BizTalk Server 2010 with BizTalk Enterprise Service Bus Toolkit 2010; Banner Student Information System.]
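The architecture suggests data passing from source systems through an extract-transform-load layer into staging tables ahead of the data warehouse. The sketch below illustrates only that generic staging step using Python's built-in `sqlite3`; it is not how Talend Open Studio or BizTalk work, and the table and field names are hypothetical.

```python
import sqlite3


def load_staging(conn, source_rows):
    """Extract-transform-load sketch: normalise hypothetical source rows and load them
    into a staging table. Returns the number of rows loaded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging_loans "
        "(student_id TEXT, item_id TEXT, loan_date TEXT)"
    )
    # Transform: trim and upper-case student IDs, skip rows without a student ID.
    cleaned = [
        (r["student"].strip().upper(), r["item"].strip(), r["date"])
        for r in source_rows
        if r.get("student")
    ]
    conn.executemany("INSERT INTO staging_loans VALUES (?, ?, ?)", cleaned)
    conn.commit()
    return len(cleaned)
```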
INTERIM: Qualitative: feedback on perceived value from students and staff through focus groups ● PILOT: Qualitative / Quantitative: feedback from staff and students on actual value through focus groups; usage statistics
LONG-TERM: Quantitative: analysis of retention rates; NSS scores
Activity Data Use Cases [David Kay, Sero Consulting] ● Student recruitment ● Student retention ● Research impact ● Resource management ● More?
1. Student recruitment
2. Student retention
3. Student choice
4. Process improvement
5. Systems optimisation
6. Teaching & Learning quality
7. Research impact
8. Research collaboration
9. Resource recommendation
10. Resource management
First-order challenge: scenario definition
Objective: Library resource recommendation (Student)
Benefit: Direction to useful, value-added and available print and electronic resources
List of options <Students on Student course / your unit who unit, relevant borrowed this reading list, peer borrowed this loans before / next>
Second Order Challenges
Questions: ● Quality of data (veracity, completeness, etc.) ● Critical mass of data ● Applications involved ● Local or above campus ● Legal & corporate compliance
Opportunities: ● Platform approach ● Available and emerging synergies ● User contribution
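Data quality questions can be made concrete with simple checks. The sketch below measures completeness, the share of records with a non-empty value for each field; record layout and field names are hypothetical, and veracity would need separate checks against a trusted source.

```python
def completeness(records, fields):
    """Share of records with a non-empty value for each field (0.0 for no records)."""
    total = len(records)
    return {
        f: (sum(1 for r in records if r.get(f) not in (None, "")) / total if total else 0.0)
        for f in fields
    }
```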
Legal issues Three key ones ● Data protection ● Freedom of information ● Sharing data
Also ● Licensing
Building a business case: a concept useful whenever asking for funding ● Applies to bidding for funds, e.g. to JISC ● Useful project preparation
Building a business case: Introduction. Who is it for? ● Write it in their language ● Write to their level of knowledge ● You are selling your project to them
Building a business case: Options For each option: ● What is it? ● Benefits of each approach ● Costs of each approach ● Summary of reasons for accepting / rejecting each
Building a business case: benefits. Benefits to the funder, in terms that are meaningful to them: ● Money ● Student retention ● Library usage ● …
Building a business case: ● Costs ● Project plan ● Risks
Risk | Owner | Cost | Amelioration
Data formats incompatible | IT manager | 5 days | Map between formats
Sued for breach of privacy | Data protection officer | | Ensure agreements are in place and signed by students