7 March 2011

OU libezproxy journal recommender project - first month review

RISE project blog updated - first month progress report database, recommendations & programme mtg

couple of new blog posts over wknd - ipads/analytics and RISE project thoughts

Project update February
MAR 07, 2011 08:38A.M.

Project startup

The project is now into its second month and we’ve covered quite a lot of ground already. Apart from setting up the usual project processes and blogs and getting everything running, we’ve pushed on with starting the technical build work with the RISE developer, Paul.

Database

So far we’ve designed the recommendations database to store the EZProxy log files. We are processing the full log files and now have all our log files back to December ingested into the database. We’ve got a test feed of data that lets us add course codes to the database so we can relate searches to courses. That will let us show searchers what people on their courses are searching for. We’ve identified a tweak needed to the file, as it currently just gives us the first course that a student is studying. It might not be an issue for other institutions, but OU students might be studying several courses at a time.

Our EZProxy log files contain records coming from both our Ebsco Discovery Solution and from SFX, so we’ve been investigating how to get more data about the entries in the log files. The approach we have been taking is to use the log files as the base layer of data and then query other systems to pull in further information to add to the database to help the recommendations. So we are using the EDS API to draw in subject data and looking at the SFX API to do something similar. Through a combination of ISSNs, DOIs and other techniques we are able to add in data such as journal titles and article titles.


two new blog posts - thoughts on mobile use & on JISC Activity Data programme meeting

Recommendations

For recommendations we’ve settled initially on three levels:

• Level one provides recommendations based on a course you are associated with, e.g. ‘people on your course are searching for these articles’.
• Level two recommendations are based on association, so a connection is assumed between documents searched by a user closely together in a session.
• Level three makes connections based on subject data about the article by comparing subject terms and making recommendations on the basis of best matches between articles.
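As a rough illustration of the level two (association) idea, the sketch below counts how often pairs of documents turn up in the same session and ranks the most frequent companions. It assumes each session is already available as a list of document identifiers; the function names, the sample identifiers and the plain co-occurrence count are assumptions made for illustration, not the RISE implementation.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(sessions):
    """Count how often each pair of documents appears in the same session."""
    related = defaultdict(Counter)
    for docs in sessions:
        # Use a set so repeat views within one session count only once.
        for a, b in combinations(sorted(set(docs)), 2):
            related[a][b] += 1
            related[b][a] += 1
    return related


def recommend(related, doc_id, limit=3):
    """Return the documents most often seen alongside doc_id, best first."""
    ranked = sorted(related[doc_id].items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc for doc, _ in ranked[:limit]]


# Hypothetical sessions: each list holds the documents one user viewed.
sessions = [
    ["doc-a", "doc-b", "doc-c"],
    ["doc-a", "doc-b"],
    ["doc-b", "doc-d"],
]
related = build_cooccurrence(sessions)
print(recommend(related, "doc-a"))
```

Ties are broken alphabetically here only to keep the output stable; a real service would more likely combine the count with the relevance ranking and user ratings.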

Recommendations will be relevance ranked and reinforced by a rating system that asks users to rate how useful they are to them. We’ve been working on technical diagrams and documentation that will get added to the blog as part of the main technical post.


one of the activity data projects is looking at the correlation between library use and outcomes

Code design

Code design is taking account of the need to build code that can be released publicly. So we are setting it up so users can easily add in the details of which APIs they want to use and configure details of settings such as their authentication system. We’ve also agreed the wireframe for the user interface and started some discussions with colleagues about the Google Gadget version of search.

On the evaluation side we’ve agreed with colleagues carrying out focus groups on the EDS system that they will ask for views about the value of recommendations.

Finally this month, Liz and Richard went to the JISC Activity Data Programme Startup meeting in Birmingham. Apart from the chance to hear about the Programme, the other projects and the synthesis project that will be working to draw together the wider aspects of the projects, there were several useful sessions. Particularly helpful was the technical discussion which looked at challenges, issues and solutions covering not only technical aspects but also IPR and anonymity, data issues and user interfaces. As well as uncovering some of the issues it also helped to pull out some common ground between the projects and areas where we might be able to collaborate. So we’ve a number of areas to follow up with several projects and have already started to do so.


Current JISC Projects related to learning data/analytics

The day was also the chance to talk about the hypothesis that is at the heart of the Activity Data projects. Evaluation of the hypothesis is very much seen as key and building up evidence to support or disprove it is critical. Interestingly the hypothesis approach means that it is much more evident that the project is experimental. It isn’t expected to build a fully sustainable service in a six month period. JISC see this period as phase one and a key part of the synthesis project is to help to tease out what comes next. All in all a really useful day.

intro post on new @Copac and @JRUL activity data project: 'Surfacing the Academic Long Tail'

intro post on new @Copac and John Rylands UL activity data project. 'Surfacing the Academic Long Tail'



Current JISC Projects of Possible Interest to #LAK11 Attendees

This project will provide an application (STAR-Trak:NG) to highlight and manage interventions with students who are at risk of dropping out, identified primarily by mining student activity data held in corporate systems. STAR-Trak:NG is an extension of the outputs of two JISC projects: our current STAR-Trak (Student Tracking And Retention) proof-of-concept application, and MCMS (Mining Course Management System) developed by TVU. Whilst far more sophisticated than STAR-Trak, MCMS is currently tightly bound to TVU’s IT architecture, source systems for activity data, business processes and culture, and hence is not yet easily reusable by other institutions. In summary, our project will:

• Draw on the concepts, functionality and architecture of MCMS and build these into STAR-Trak, to work with the user activity data, business processes, structures and cultures at Leeds Metropolitan
• Provide a stepping-stone to creating an open source application that can, with future development and support through JISC and OSS-Watch, be shared across the HE and FE sectors


JISC activity data projects kick off event tagged earlier today under #jiscad maybe of interest to #lak11 folk



JISC Activity Data Theme – Kick-off Meeting

MAR 02, 2011 10:03A.M.

Meeting for all eight projects in the Activity Data theme, held in Birmingham (2nd March 2011).

JISCAD - TWITTER SEARCH

Presentation available for AGtivity project.



AEIOU Project Plan MAR 01, 2011 01:38A.M.

MAR 03, 2011 12:56A.M. Based in Aberystwyth University, the project will utilise existing relationships and build on the national repository infrastructure established through the Welsh Repository Network (WRN), a collaborative venture between the twelve Higher Education Institutions (HEIs) in Wales.


Overview of #jiscad projects available at: Aberystwyth Uni to look at Welsh repository use.

Aims, Objectives and Final Outputs of the project

The AEIOU project aims to increase the visibility and usage of academic research by aggregating Welsh institutional repository activity data to provide a “Frequently viewed together” recommendation service, such as those used by Amazon and many other e-commerce websites. We hope that this will lead to increased use, a greater awareness of ongoing Welsh research interests and improved collaboration between Welsh Institutes.

Building on the partnerships developed through the Welsh Repositories network project, we will focus on a core of six Welsh HEI repositories. A user group will be set up, meetings held and data gathered will be used for project evaluation.


The main outputs will be:

commercial organisations.

• Activity Data Analysis Report
• Recommendation service: model, Open Source software and documentation that can be applied to other institutions and networks of repositories

Project Team Relationships and End User Engagement

• Project Manager - Jo Spikes: Jo Spikes is currently a member of Aberystwyth University Information Services and works in the Research Support Team. Jo has no prior experience of JISC projects directly but has worked on EU Frameworks V-VII, on ERDF and on WEFO-funded projects in the past. A previous role as Publications Officer for a BBSRC-funded institute has led to knowledge of the importance of disseminating research outputs to a wider audience. Jo will be responsible for workpackages 1 and 5, and parts of 4. End user engagement: Jo will deal directly with the partner institutes’ personnel, establishing a user group, arranging meetings, and liaising with the evaluation consultant.

• Information event for the community
• User focus group exemplars
• Final report

The data audit is already under way and will identify and document current activity data gathering across partner repositories to establish a baseline that will be used to compare data throughout the course of the project.

• Project Developer - Antony Corfield: Antony Corfield is also Repository Support Officer (Technical) for the WRN EP, where he is responsible for providing technical advice, development and deployment for Institutional Repositories throughout Wales, including the e-theses harvesting project. Antony has a background in IT with a Masters in Computing Science and over 10 years’ experience as a Java developer within the HE sector, working on several Information Management and eLearning projects. Antony will be responsible for workpackages 2 and 3 and parts of 4, and will be primarily involved in all technical development and deployment with the core institutions.

Outputs will be largely via this blog and development activity will be managed through our Google code project. Final documents will also be linked to the project website.

Risk Analysis and Success Plan

Projected Timeline, Workplan & Overall Project Methodology

Workpackage 1 - Activity Data Audit

We will identify and document current activity data gathering (including Google Analytics) across partner repositories and analyse the relevance of any activity data to the current project (item views, file downloads etc.). We hope to establish baseline activity data from all the partner institutions for comparison and evaluation throughout the project, and to provide an assessment of the activity data collected through the six core partners over the life of the project to prove or disprove the hypothesis.


Workpackage 2 - Recommendation Service (specification)

Investigate and specify system architecture and components for both aggregating activity data and a shared recommendation service, using open standards / protocols where appropriate (e.g. OAI, RSS). Specify the activity data model required to fulfil the goals of the project and a simple API for the recommendation service.

All document outputs will be released under a Creative Commons license (attribution CC BY), which is recommended for maximum dissemination and re-use of materials. Software will be released under the Apache foundation license (version 2.0), which allows for modification and distribution of source code. The aim is to promote re-use of all outputs and encourage collaborative development across both nonprofit and commercial organisations.


Workpackage 3 - Recommendation Service (development and deployment)

Development of a shared service for repository activity data, based on the specification and using open source solutions where possible. Development and deployment of activity data feeds at six core institutional partner repositories. Development work to display related item recommendations with associated metadata and links in each core partner repository.


Weekly Tech Meeting #2 and #3 FEB 28, 2011 02:38P.M. Combined Techy meeting – Monday 21st and 28th February

Workpackage 4 – Evaluation

We will report on the recommendation service model and provide an activity data overview. A focus group will be established with the core partners to undertake a user group analysis in order to explore the potential of the recommendation service and its impact on repository users.

• Blog entries Hypothesis and Previous 1990′s Alternative Pepys’ Diary

Workpackage 5 – Project Management

Project management will be undertaken according to JISC guidelines. This will include project planning, report and blog writing, dissemination activities and scheduling and attending meetings.

• … includes: data checks for room_booking, Green values and QA

• Data gathering and • … analysis issues

• Created and tested improved backup system
• MAGIC archive for attendance figures to be analysed
• Recontact with programme mgr. regarding deliverables for SustainedMAGIC

The project methodology will involve identifying and documenting current activity data across repositories. This is being done using a questionnaire and follow-up emails and conversations with partners. Once the service is deployed, data gathering will be automated through harvesting, and the harvested data will be compared with this initial baseline activity data. We will establish a user focus group with core partners in order to explore the potential of the recommendation service and its impact on repository users. Software development will use an Agile approach and, where possible, work closely with other JISC projects to build upon and share development effort.

• Sustainable Manchester event to attend Invitation to be sent to the Synthesis Project.


Exposing VLE Activity Data The Project Plan FEB 25, 2011 09:01A.M.

Budget

CARET (University of Cambridge), in conjunction with Hull University and the University of Oxford, will be working on a JISC-funded project to bring together activity and attention data for our collective institutional VLE environments.

We’ve already set up the Google code site for the project here. Over the next few posts, we expect to explore our project plan, and a few of the early experiences in collecting and processing data we’ve already got stored.

Aims, Objectives and Final Outputs of the project -

From this project we aim to analyse the logging data to help us produce behavioural activity reports and statistical data. It will also highlight the ways in which the VLE platforms are being used and for what purpose.

documenting those concerns for the benefit of the wider community

Risk of disengagement of senior stakeholders: Our senior stakeholders may not be interested in or engaged by the results we derive from activity data. Thus there is a moderate risk that the problems we wish to solve using this data may remain unsolved, if we cannot persuade management to take action on the basis of the data. However, we’ll still have gained useful tools for our production team, and for the Sakai community as a whole.

A small amount of background: Cambridge’s VLE (Virtual Learning Environment) is called CamTools, and is based on the Sakai software used by universities and colleges worldwide.

Our objectives are to find out more about:

• how people are using our VLE. This will allow us to look at potential areas for growth in VLE use in our institution

Risk: over-ambitious hypotheses. We may find that we’re over-ambitious in the hypotheses we wish to test. However, we can continue the project to some extent using institutional funds, once the project infrastructure has been created. So while this is a moderate risk, we can make sure that its impact is low.

• how well support requests reflect usage patterns, so we can improve our support services
• what information is already available to us in our event logs, and how we can present this to management

Technology risk: low, as we’ve already conducted a brief feasibility study, and there are plenty of data visualisation tools available.

Final outputs will include:

• this blog, which will contain detailed methods and reflections on the tools, data, and our experiences. This can then be used by the community; hopefully Hull and Oxford Universities will already be doing so by the end of the project

Project management risks: low, as we will use a lightweight management methodology to track risks as we go. We are proactive about identifying our targets, and have an aggressive timescale for meeting these, so we can reduce the risk of going over time or budget constraints.

• the hypotheses about VLE and support service improvements, which we will have tested, informing people about the value of activity data to improve institutional services.

Our criteria for success are encapsulated in our expected final outputs, as mentioned above.

• activity information datasets, released so that other people can conduct research into this area.

IPR Risk Analysis and Success Plan -

Staffing risk: low, as we already have staff in post, and the project team has worked together before. Alternative staff are available for all roles if substitution is necessary (we have already demonstrated this, as our project manager who created the initial bid for this project has left, and we’ve been able to replace her with Tony).

All software outputs will be released under an Apache2 licence (as mentioned above, it’s all going in our Google code site), and all documents under a Creative Commons “BY” (Attribution) licence. This means that people can reuse our outputs, even commercially, which can support the creation of business models for more sustainable systems, including collaborative development across both nonprofit and commercial organisations.

Risk that we will not be able to release datasets: Senior stakeholders in the VLE may not approve the release of datasets, even if we anonymise those datasets. This is a moderate risk, but we can alleviate its impact by documenting those concerns for the benefit of the wider community.

Project Team Relationships and End User Engagement -


Workplan 3: Data Analysis Phase 1: this will use the data from the previous phase to create powerful visualisations of log data.

Project Manager : Tony Stevenson - Tony has led and delivered many projects throughout his career, varying in size and methodology. Whilst this is his first time managing a JISC project, he has the experience to lead the project.

Workplan 4: Data Harvesting Phase 2: this will include hand examination of the CamTools sites, to categorise them. This is the phase that will also include user questionnaires and interviews.

Workplan 5: Data Analysis Phase 2: analysis of the data collected in Workplan 4.

Script Developer : Raad Al-Rawi - Raad is not only the lead developer for the Cambridge institutional VLE, he is also a respected Sakai community member. Raad will work with Tony and the other members of this project team to help identify, access and use the VLE activity data from within Cambridge.

Workplan 6: Evaluation and Write-up: We will present the results to senior management, and evaluate the results of our data.

Workplan 7: Dissemination and Engagement. Throughout the period we will be writing our blog, engaging with users and management, and the JISC and Sakai communities.

Technical Support : Daniel Parry - Daniel is a member of the operational team within CARET and will be able to help the team with technical issues arising in obtaining the data and in its analysis.

Researchers : Verity Allan & Katy Cherry. The researchers will be our primary point of contact, alongside Tony, with the Cambridge user base. We expect the researchers will offer invaluable insights into the way that the institutional VLE is being used, by whom and for what purpose. Katy is an experienced research assistant, practised at communicating with academics, and at producing communication materials. Verity is also an experienced researcher, with extensive expertise in supporting academics using CamTools, the Cambridge VLE platform.

Our project methodology combines statistical analysis of large datasets with the collection of individual information. Thus we will analyse our existing datasets of events, and convert them into useful information. We will be looking at ways to anonymise our datasets. We will also be hand-inspecting all sites in our VLE to classify them as teaching, research, social, administrative or other sites. We will consult senior stakeholders to find out what reports on activity data they would most value, and will be working with them to try to secure release of anonymised datasets for further research. Once we’ve analysed our datasets, we will create visualisations of them to produce activity information which is meaningful to humans. And we’ll be sharing our methodologies and (hopefully) our data with the community.

End user engagement will likely take many forms; it is not entirely clear what methods will work best initially. So we will use this blog to report on the methods we used, and which worked best.

Projected Timeline, Workplan & Overall Project Methodology-

Budget -

Workplan 1: Project Management. This will continue throughout the project.

Workplan 2: Data Harvesting Phase 1: this will involve collating our existing logs, and starting work on finding appropriate visualisation tools.


Incidental Data

The hypothesis

Each of the projects in the Activity Data programme strand was asked to establish a hypothesis that it would test throughout the project. For RISE the hypothesis we chose is:

Pepys’ diary ideas from Xerox back in the 90′s can help to inspire reasons for carrying out AGtivity. The 1991 paper

“That recommender systems can enhance the student experience in new generation e-resource discovery services”

From Alan Dix’s ideas of Incidental Interaction – In Xerox’ Cambridge laboratories a few years ago, everyone was issued with ‘active badges’. These used small infra-red transmitters to broadcast their location to receivers throughout the office building. At the end of each day the location data was analysed to produce personalised diaries for each person. It knew about the office layout so it could say “went to Paul’s office”, but also could use the fact that, say, several people were in a room together to say “had meeting with Allan and Victoria”.

This hypothesis was chosen quite carefully for a number of reasons. We’ve only recently implemented our Ebsco Discovery Solution aggregated search system so we are still in an evaluation stage and are really still assessing how students at the OU will get the best out of the new system. We have a particular perspective at the Open University in that the use that students make of our search systems varies widely from course to course. So we will particularly want to look at whether there is variation between the levels of students in their reaction to recommendations.


Post 2 – Hypothesis “AGtivity”

How do we plan to evaluate the hypothesis?

We are planning to approach the evaluation in three ways.

FEB 18, 2011 08:06P.M.

The hypothesis is that by combining the usage data with external sources, the UK Access Grid community will be able to evaluate their usage more accurately, in terms of the time nodes are used, audience sizes and environmental impact, and that they will see an overall improvement in Advanced Video Conferencing meetings through more targeted support by the AGSC staff of potentially failing nodes and meetings.

1. By establishing some website metrics to allow us to assess how user behaviour is affected by the recommendations. We expect to build two versions of the search interface, one with recommendations and one without. This will allow us to A/B test the interfaces, so we can track the impact that different options have on users’ behaviour. Using Google Analytics we will track where users click and where they go.

We believe the mining and communication of activity data can help to improve the management of, and service provision through, advanced collaborative video tools. We hypothesise that:

2. We will track the use that users make of the rating feature to see whether there is evidence that it is actively being used.
3. We will actively encourage user feedback on the tools, carry out surveys with students and run a short series of focus groups to test the hypothesis.
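One way the A/B split between the two RISE search interfaces could be kept consistent is to derive each user’s variant from a hash of their identifier, so the same user always sees the same version. This is only a sketch under assumptions: the function name, the experiment label and the even 50/50 split are hypothetical, and the post does not describe the mechanism the project will actually use.

```python
import hashlib


def assign_variant(user_id, experiment="rise-recommendations"):
    """Deterministically place a user in the 'with' or 'without' group."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # Even hash values see recommendations, odd values do not.
    if int(digest, 16) % 2 == 0:
        return "with_recommendations"
    return "without_recommendations"


# Hypothetical user identifiers, for illustration only.
for user in ["user-1", "user-2", "user-3"]:
    print(user, assign_variant(user))
```

The variant label could then be recorded alongside the analytics events so that clicks from the two interfaces can be compared.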

1. By reporting the usage of their nodes back to owners, they will see an improvement in the management of their advanced video conferencing nodes in terms of future planning and procurement.

As part of our evaluation work we will be looking to assess whether there are variations that can be ascribed to course or course level in how useful students find the recommendations to be, and whether there are circumstances where they are not useful. We will also be testing a variety of different types of recommendations and will aim to assess which are found to be most useful.

2. By reporting the attendance of their lectures in terms of the number of nodes in each session, distance learning course administrators will be able to determine which courses are most successful and which are less attended.
3. By reporting to the AGSC when nodes are having a series of apparently small meetings, and meshing this with quality assurance information about the nodes, users will experience an improvement in their meetings made with Advanced Video Conferencing tools.

The evaluation report will detail the activities and results of the work to test the hypothesis and we will look at using Quora to record evidence.

4. By reporting the savings in CO2 back to users, they will be in a better position to evaluate the environmental savings of using Advanced Video Conferencing over face-to-face meetings.

• A series of reports including a report on the issues of releasing the data openly, a report on the feasibility of using recommendations within a gadget and an evaluation report of the benefits of using the tools that have been developed.


The Project Plan

• This blog will be used to provide updates on project activities. Other dissemination activities will also be explored and used to provide updates on the project work.

FEB 17, 2011 08:18P.M.

Aims, Objectives and Final Outputs of the project

The overall objectives of the RISE project are to:

Risk Analysis and Success Plan

A risk assessment table is shown below:

• Establish a process to collect and analyse attention data about the use of electronic resources from the EZProxy proxy referral system.

Risk | Probability | Impact | Mitigation
Lack of availability of key staff | Low | High | Plan work for this project into staff workplans. Ensure staffing time is realistic
Loss of key staff | Low | High | Ensure documentation is kept up to date and that knowledge is shared across the Systems Development team
Unexpected technical difficulties encountered | Medium | Medium | Work is carefully scoped to ensure it is realistic
Unable to release search results data owing to reluctance in organisation | Medium | Medium | Business case approved by Library Leadership team prior to bid submission. RISE will engage with other projects and with key stakeholders
Unable to test Google Gadget in OU VLE environment | Medium | Low | Gadget will be tested in a range of environments both OU-controlled and external
Lack of engagement from stakeholders | Medium | Low | The Project Manager will draw up a Communications Plan to engage with stakeholders to ensure that they are aware of the project and their needs are met

• Create a recommender service using attention data to answer questions such as ‘people on my course are looking at these resources’
• Identify metrics to detect changes in user behaviour as a result of service use.
• RISE will create a personal recommendations service, MyRecommendations, for OU users of the EBSCO Discovery Solution (EDS).
• It will explore issues (of anonymity, privacy, licensing and data format/standards) around making this data available openly and will aim to release it openly so it can be re-used by the wider community in innovative ways.

The success of the project will be measured in several ways: the response of users, the take-up of the tools, and the feedback from the HE community.

• RISE will use the EDS API to create a Google Gadget for the OU Google Apps environment and will aim to test in the OU Moodle Virtual Learning Environment (VLE) using features developed by the JISC DOULS project.

Measure | How measured | What success looks like
User response | Survey and informal feedback from students and academics. Analytics data. | Majority of users agree that recommendations are useful and enhanced their use of the search system. Analytics shows positive impact.
Take-up of tools and data | Usage of tools and data, downloads of tools and data. | Tools are being downloaded several times a week and there are some comments about the tools.
Community feedback | Feedback. | Wider discussions with community about potential of tools & ways to use the data.

• RISE will evaluate the pros and cons of providing recommender data to students of an e-resource discovery service.
• Overall RISE will provide the wider community with an example of the benefits to users of discovery solutions of using e-resource activity data, will aim to make that data available to the wider community, and will provide a tool that can be adapted and reused.

• Open release of data with a Creative Commons Universal CC0 license (where this is possible).

Intellectual Property Rights

The code developed as part of the project will be released as Open Source through a Google Code (or SourceForge) site. This will be supported through a forum which will be monitored by project staff and by Library Systems Development staff to ensure the sustainability of support for the code. The intention is that any data released as part of the project will be made available as CC0 Creative Commons Universal. Advice will be sought from the Lucero project to ensure that practices are consistent. OSSWatch will be approached for comment to identify an appropriate code licence.

• Creation and open release of a Google Gadget using the EBSCO Discovery Solution API

All project outputs will be made available free at the point of use to the UK and international academic communities. If monitoring of project output downloads indicates high levels of activity, the project will attempt to ensure that project outputs are available through multiple channels to avoid excessive load on individual services. The project will aim to provide some support to adopters but will also try to engage with wider communities of interest.

Project outputs will include:

• The release of database schema, documentation, algorithms and code for a recommendations service using EZProxy data as Open Source through a code website such as SourceForge or Google Code.

ensuring that developments are appropriately documented and processes are in place for on-going support and sustainability.

Workpackage 2: MyRecommendations service creation and implementation

A short period of user requirements gathering will take place to look at the appropriate data to be used and how the database will be structured. A MySQL database will be set up to take data from the archived EZProxy log files and other OU systems to record details such as: who is searching, what course they are on, what they searched for, and what resources have been looked at and when.
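To make the ingest step more concrete, here is a small sketch of parsing one proxy log line and storing who looked at what and when. It assumes the log lines follow a common-log-style layout (host, ident, user, timestamp, request, status, bytes); the real layout depends on how EZProxy logging is configured locally. The table and column names, the sample line and the use of SQLite in place of MySQL are all assumptions chosen so the example runs on its own.

```python
import re
import sqlite3
from datetime import datetime

# Assumed layout: host, ident, user, [timestamp], "request", status, bytes.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return (user, ISO timestamp, url, status), or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    when = datetime.strptime(match["when"], "%d/%b/%Y:%H:%M:%S %z")
    return match["user"], when.isoformat(), match["url"], int(match["status"])


# Made-up sample line, for illustration only.
sample = (
    '192.0.2.10 - student42 [07/Mar/2011:08:38:00 +0000] '
    '"GET http://example.org/article?issn=1234-5678 HTTP/1.1" 200 5120'
)

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.execute(
    "CREATE TABLE resource_views (user_id TEXT, viewed_at TEXT, url TEXT, status INTEGER)"
)
row = parse_line(sample)
if row is not None:
    conn.execute("INSERT INTO resource_views VALUES (?, ?, ?, ?)", row)
print(conn.execute("SELECT user_id, viewed_at, url FROM resource_views").fetchall())
```

Course codes and search terms would come from other systems and could be joined to this table on the user identifier.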

Project Team Relationships and End User Engagement

Members of the Project Team and responsibilities

• Richard Nurse – Project Director. Responsible for ensuring the overall direction of the project and the fulfillment of project objectives;

RISE will then build a service, ‘MyRecommendations’, that allows users to log in to a personalised webpage that contains a search box to search the EDS system, feeds back data to them about their activity and provides recommendations based on the activity of other users. For example it can show their recent searches, what other people on their course have searched for, and what popular keywords are being used to search for electronic resources. The system will also allow rating of resources. We will also explore the feasibility of linking to the MyReferences tool developed by the JISC-funded TELSTAR project to allow users to record their references. We plan to consult with other interested parties working in this area to identify suitable algorithms and processes. The new service will be trialled through the library website as an alternative to the standard EDS interface.

• Elizabeth Mallett – Project Manager. Responsible for ensuring that the project is managed effectively and delivers objectives, outputs and reporting to target and budget; • Paul Grand – Developer. Responsible for developing the RISE MyRecommendations system, the anonymisation processes and Google Gadget developments. • Hassan Sheikh – Technical Consultant. Responsible for providing technical guidance on library technical environment including EDS. • James McNulty – Technical Consultant. Responsible for providing technical guidance on library technical environment including EZProxy.

Workpackage 3: Opening up recommender data for electronic resources RISE will aim to make the search data available openly. To do so it will carry out an investigation of the issues, such as privacy, data ownership, licensing, data formats and anonymisation, involved in releasing the data. The project will liaise with the JISC-funded LUCERO project to draw on their experience of releasing linked data through and for guidance on the use of course codes and licensing. The project will follow the guidelines set out in the MOSAIC project for anonymising data. The project will look at options to anonymise the data such as hash values or affinity strings.
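To make the "hash values" option concrete, here is a minimal sketch of replacing a user identifier with a keyed hash before data is released. It uses HMAC-SHA256 from the Python standard library. This is a generic technique chosen for illustration, not a description of the MOSAIC guidelines or of the process RISE will adopt; the function name and the key handling are assumptions.

```python
import hashlib
import hmac


def pseudonymise(user_id: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for user_id.

    The same user always maps to the same value, so activity can still be
    grouped per user, but the original identifier is not in the output.
    The secret key must stay private, otherwise identifiers could be
    recovered by hashing candidate values.
    """
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical key and identifiers, for illustration only.
key = b"replace-with-a-private-random-key"
alias_a = pseudonymise("student-0001", key)
alias_b = pseudonymise("student-0002", key)
```

Hashing alone does not guarantee anonymity: rare combinations of course and search terms can still identify a person, which is one reason the investigation of privacy issues matters.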

• The Project Team will work closely with other library staff from the Learning and Teaching and Marketing teams to engage with users RISE will be engaging with end users through a number of mechanisms. We are planning focus groups with students to assess student needs, will provide students with feedback and support mechanisms to gauge the value of the tools to students, and will be carrying out small-scale usability testing. Within a six month project there are practical limitations around the amount of engagement that is possible.

Workpackage 4: Development of a Google Gadget e-resource search and recommendations tool A Google Gadget search tool will be created using the EDS API. We will explore and test how recommendations can be provided to users in this format. We will then make the gadget available openly via the OU library website and to the OU Google Apps environment. We will also work with the JISC DOULS project, which is exploring integration between Google Apps and the VLE. If possible we will test the gadget within the VLE.

Projected Timeline, Workplan & Overall Project Methodology

RISE comprises four main activities: creating the recommendation system; exposing e-resource search data; creating a Google Gadget; and evaluation.

Workpackage 5: Evaluation RISE will carry out a short evaluation to investigate the differences between search, discovery and downloading behaviour of users and nonusers of the developments. This will use both web analytics and surveying. A small-scale usability test will also be undertaken.

Workpackage 1: Project Management This covers the project start-up, including set up of internal and external reporting and project management arrangements, creating the detailed project plan, setting up the project website/blog. It will include day-to-day management of the project. There will be a project wind-down stage

Workpackage 6: Dissemination and engagement with programme The project will undertake a programme of dissemination activities to engage with the community. These will include internal staff presentations within the OU Library and across the OU and wider activities such as a Library Seminar, and the project will look to promote the work at appropriate sector events and channels. Dr Tony Hirst who blogs at, is involved with RISE and likely to blog about the project. RISE intends to use regular blog posts, on both the project blog and the library news blog, to provide updates on the work. Many of the project team members are active users of Twitter and so will promote the project using #OUrise as well as #inf11 and #jiscad. The project looks forward to engaging with JISC programme events and is experienced at doing so with previous JISC projects. Time for engaging with the Programme and Synthesis project has been built into the staff time for RISE.


Project Plan FEB 17, 2011 11:10A.M. UCIAD intends to realise something relatively ambitious, namely setting up a software infrastructure for the user-centric integration of activity data, within a rather short period of time. This stresses the importance of setting up a suitable work plan from the start of the project, ensuring that outputs are delivered and can be taken up as early as possible. Aims, Objectives and Final Output(s) of the project The overall aim of UCIAD is to investigate the use of ontologies and semantic technologies for integrating the different data about the interaction of a user with different systems and websites in an organization. More specifically, to achieve this aim we plan:

Project Management Arrangements and Governance RISE will be managed using the standard OU Project Management processes, which follow a modified PRINCE2 methodology. These processes include the standard project documentation and the risk and issue management processes. An experienced project manager, Elizabeth Mallett, has been assigned to manage the project. The project team will meet weekly during the project.

1. To investigate and develop the ontological models needed to integrate user activity data. The objective here is to develop a set of ontologies that can be used to integrate logs and traces of activities existing in a variety of formats, depending on the originating system. Such ontologies will provide a common, meaningful and reusable activity data model for capturing user-centric activity data.

A Project Board will be established to oversee RISE. This will be chaired by Gill Needham, Associate Director of Library Services, with membership including Judith Pickering – Project Manager for the OU DOULS project, Dr Tony Hirst – Lecturer in Telematics, and two Heads of Faculty Teams in the library, Judy Thomas and Clari Gosling.

2. To prototype a reusable, pluggable framework to integrate user activity data across different user facing systems within a large organization, relying on the developed ontological models. Such a framework will be based on semantic data management components available in KMi or externally (as open source software) to aggregate data coming from various systems. In order to accommodate an extensible variety of log formats and activity databases, it will implement a pluggable architecture, where plugins implementing a mapping between a particular source/format and our ontological model can be easily added to the framework.


3. To test and scope the applicability of such a framework within realistic scenarios at The Open University. A complete case study integrating logs from various systems at The Open University, especially access and search logs from The Open University’s main website, specific logs from The Open University’s virtual learning environment, the linked open data platform of The Open University, the seminar system of The Open University, websites and user facing systems from various research projects at the Knowledge Media Institute (e.g.,,,,, etc.) will be used to test the UCIAD framework.

4. To demonstrate how the UCIAD activity data framework can benefit the users in their interaction with the organization. Initial requirements, components and guidelines on exploiting the framework to the benefit of the user, regarding in particular GUI issues, ownership and export of the data will be devised by the end of the project, ensuring short-term potential deployment of the results of the project.

a research direction concerning the use of Semantic Web technologies for the purpose of personal information management.

Risk Analysis and Success Plan • Prof. Enrico Motta is Professor of Knowledge Technologies at KMi and a leading international scientist in the area of Semantic Technologies, with extensive experience of both fundamental and applied research. Professor Motta will act in the project as the chair of the steering group.

Considering the ambitious goals of the project, the major risks relate to the maturity and robustness of semantic technologies, related to their ability to handle very large amounts of user activity data across multiple websites, and to support the user-centric interpretation of this data. The team involved in the project has extensive experience in working with such technologies, in large scale projects.

• Salman Elahi is a research assistant at KMi, and a part time PhD Student working on aspects of user-centric identity and personal information management.

The primary goal of UCIAD being the realisation of an open software platform relying on ontologies to integrate and interpret user activity data, the main success criteria include the successful, documented application of this platform on a large variety of websites at the Open University, and possibly outside. The outputs of the project will be released as open source, and we expect uptake from external organisations to take place towards the end, or after the project.

• Stuart Brown is Web Developments and Online Communities manager at The Open University. He is in particular involved in the overall management of the Open University’s content management systems. Stuart Brown will act as a member of the UCIAD steering group, in charge of the liaison between the project team and the Open University’s online services.

IPR Dissemination will be realised through a variety of channels (blog, twitter, etc.) as well as through direct engagement with the community (users and website developers at The Open University, other researchers and developers through seminars, conferences and dedicated workshops). Several aspects of evaluation will be considered. The ontologies and software framework developed as part of the project will be evaluated both formally (using ontology evaluation frameworks and software validation methods) and through usage in our case study. The overall outcome of the project will be evaluated based on adoption at The Open University and by external parties.

In order not to infringe the privacy-related expectations of users of the considered websites, the activity data considered as part of the project will be kept private. The ontologies to model and integrate such data will be made available under an open license (CC0), for reuse and extension by the community. Some technologies employed in the project have been developed by external organizations and are available as open source software. Code realized as part of UCIAD will also be released under an open source license (LGPL). The code will be made available through UCIAD’s repositories on GitHub. All documentation produced, including reports, blogs and system documentation will be made available under a Creative Commons license (CC-BY).

Projected Timeline, Workplan & Overall Project Methodology

Project Team Relationships and End User Engagement

Based on the aim and objectives described above, we divide the workplan of UCIAD into 5 workpackages:

UCIAD is realised and managed at the Knowledge Media Institute (KMi) of the Open University, which is an 84-strong interdisciplinary research laboratory founded at The Open University in 1995. KMi has established itself as a world-class R&D centre at the leading edge of the Web, semantic, learning, and new media technologies. The research areas in KMi include cognitive sciences, new media technologies for learners, human computer interaction, Semantic Web and Web services, multimedia analysis and information retrieval.

WP1 – Ontologies as Semantic Models for Integrating User Activity Data: The goal of this workpackage is to produce the foundational data models for the project, by developing the ontologies to be used to integrate activity data from various sources. Here, we will employ ontology design methodologies developed in KMi, combining reuse of existing ontologies, data-driven modelling and knowledge engineering techniques.

The project team includes:

Deliverables: A set of documented and reusable user activity data ontologies.

• Dr. Mathieu d’Aquin is a Research Fellow working in the Semantic Web area at the Knowledge Media Institute. Dr. d’Aquin is leading the research and development around approaches to exploit semantic technologies and semantic data. Dr. d’Aquin has in particular been working on concrete solutions for the realization of applications producing and consuming linked data (see for example the JISC-funded LUCERO project which he is directing), and is currently leading the realization of the Open University’s linked data Web – Dr. d’Aquin is also involved in

WP2 – Prototype Ontology Based Architecture for Cross-Organization User Activity Data: The goal of this workpackage is to prototype the architecture for aggregating user activity data based on the ontologies developed in WP1. This architecture will mostly consist of a semantic data management system (triple store, reasoner and query engine), and a plug-in based framework to realise the mapping between logs and activity databases and user activity ontologies.


Deliverables: An open-source, pluggable user activity data framework and documentation.
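The plug-in idea in WP2, where each plugin maps one source format onto a shared activity model, can be sketched as a small registry of parser functions. Everything named below is hypothetical: the two source formats, the field names, and the use of plain dictionaries in place of the ontology-based model and triple store that the project actually plans.

```python
from typing import Callable, Dict, Iterable, List

ActivityRecord = Dict[str, str]
Plugin = Callable[[str], ActivityRecord]

# Registry mapping a source name to the plugin that understands its format.
PLUGINS: Dict[str, Plugin] = {}


def register(source: str) -> Callable[[Plugin], Plugin]:
    """Decorator that adds a parser function to the registry."""
    def decorator(func: Plugin) -> Plugin:
        PLUGINS[source] = func
        return func
    return decorator


@register("search_csv")
def parse_search_csv(line: str) -> ActivityRecord:
    # Assumed format: user,timestamp,query (the query may contain commas).
    user, timestamp, query = line.split(",", 2)
    return {"agent": user, "time": timestamp, "action": "search", "object": query}


@register("page_tsv")
def parse_page_tsv(line: str) -> ActivityRecord:
    # Assumed format: timestamp<TAB>user<TAB>url
    timestamp, user, url = line.split("\t")
    return {"agent": user, "time": timestamp, "action": "view", "object": url}


def integrate(source: str, lines: Iterable[str]) -> List[ActivityRecord]:
    """Convert raw log lines from one source into the common record shape."""
    plugin = PLUGINS[source]
    return [plugin(line) for line in lines]


records = integrate("search_csv", ["u1,2011-02-17T11:10:00,semantic web, ontologies"])
records += integrate("page_tsv", ["2011-02-17T11:12:00\tu1\t/library/home"])
```

Supporting a new system then means writing one more decorated function; the aggregation code does not change.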

Directly incurred staff: £28,569 (includes research assistant and director of the project)
Directly incurred non-staff: £4,000 (includes travel and equipment)
Directly allocated: £6,994 (includes staff and estates)
Indirect cost: £31,614
Total: £71,178
JISC contribution: £49,824
OU contribution: £21,353

WP3 – Case Study using Multiple Sources of Activity Data: The goal of this workpackage is to deploy the architecture developed in WP2 in a concrete, realistic scenario. We will in particular set up the architecture with a set of plugins to aggregate data from several websites of The Open University and the Knowledge Media Institute (see list of considered systems and websites in Paragraph 14). Initial agreements with the administrators of the considered systems and websites at The Open University’s online services and Knowledge Media Institute have already been obtained.


Weekly Tech Meeting #1 FEB 15, 2011 09:41P.M. Combined Techy meeting – Monday 14th February • Blog entries inc Project Plans

Deliverables: A set of plugins for the relevant websites/systems (including for example a plugin for access logs of Apache Web servers), with documentation regarding the development of these plugins and the deployment of the UCIAD framework.
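As an example of what a plugin for Apache access logs has to do at its core, the sketch below parses one line in the Apache Common Log Format into a dictionary. The regular expression and field names are this example's own choices, the sample line is invented, and the sketch says nothing about how the UCIAD plugin itself will be written.

```python
import re
from datetime import datetime

# Common Log Format: host ident user [time] "method path protocol" status size
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_access_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = CLF.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Timestamps look like 17/Feb/2011:11:10:00 +0000
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # Apache writes "-" when no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Invented sample line using a documentation-only IP address.
sample = (
    '192.0.2.1 - - [17/Feb/2011:11:10:00 +0000] '
    '"GET /library/search?q=ontology HTTP/1.1" 200 5120'
)
parsed = parse_access_log_line(sample)
```

Returning None for unparseable lines lets the caller count and inspect rejects instead of failing on the first malformed entry.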

• Registration; kick-off in Birmingham, and Conference in Liverpool • Data gathering and

WP4 – User Centric Interfaces to Activity Data: The goal of this workpackage is to analyse the requirements and implement initial components for user interfaces to the UCIAD framework. In order to reduce development cost, we plan to reuse components of the open source Piwik web analytics engine, to provide user-centric, ontology-based analytics across organizational websites, instead of website-centric analytics.

• … analysis issues Live data feeds to be backed up, and data sets readable by developers. Issue to be monitored regarding encryption and use of server traffic data. Letters of intent to be emailed to core room node owners.

AGPROJECTS Deliverables: An initial set of components (widgets) for a prototype graphical interface to the UCIAD framework.

Semester 2 – 2011 MAGIC starts

WP5 – Dissemination and Project Evaluation: The goal of the project is to investigate and prototype a pluggable framework for user activity data. It is therefore essential for the project to engage with potential users and developers of this framework, to ensure adoption and further extension. We will realise this through extensive and frequent communication across a variety of channels (project website, blog, twitter, seminar and conferences). The evaluation of the results of the project will be realised through demonstrating in a realistic case study, the benefit and quality of the developed components (ontologies, architecture, plugins, interface).

FEB 11, 2011 05:37P.M. Next set of MAGIC lectures have commenced from 31st January 2011: involves sixteen different courses running over the following weeks. The automatic harvester job running on Monday mornings has been extracting data semi-successfully; this specifies when to record these events over the coming week. Unfortunately, this has needed a bit of manual intervention.

Deliverables: Documented dissemination activities and user-based tests.

Two early documents have been released on the wiki: 1. Sustaining MAGIC project notes 2. MAGIC Recordings on Memetic



3. Improvements to the UK Advanced Video Conferencing Community: Users occasionally have difficulty connecting to meetings and may take a few attempts to connect fully. By analysing the data, it is proposed that these users can be automatically detected and contacted, and provided with online testing and given a quick training guide if required. The AGSC already performs a number of Quality Assurance tests but these can now be cross checked to either encourage nodes that are not tested to take a test, or to see what might have changed at a node since they passed the test.

Post 1 – The Project Plan “AGtivity” FEB 10, 2011 07:10P.M. Aims, Objectives and Final Output(s) This project aims to solve a number of problems related to Advanced Video Conferencing use in the UK by integrating and presenting amalgamated data sources.

4. Green CO2 Monitoring: By looking at the activity data and the geographic location of the nodes involved, users can be automatically sent details of the actual mileage saved for all their meetings that actually took place as opposed to those that were proposed but that did not happen. This informed carbon saving estimate again can be used to monitor and justify future investment.
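The mileage-saved idea in item 4 can be sketched from node coordinates alone. The example below estimates round-trip distance avoided using the haversine great-circle formula. Treating straight-line distance as a proxy for travel, the function names and the approximate city coordinates are all assumptions of this sketch; converting miles to a carbon figure would need an emissions factor that the post does not give.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def miles_saved(host, attendee_sites):
    """Round-trip miles avoided if each remote site had travelled to the host.

    host and each attendee site are (latitude, longitude) tuples.
    """
    return sum(2 * haversine_miles(*site, *host) for site in attendee_sites)


# Approximate coordinates, for illustration only.
manchester = (53.48, -2.24)
birmingham = (52.48, -1.90)
one_way = haversine_miles(*manchester, *birmingham)
saved = miles_saved(birmingham, [manchester])
```

Only meetings that actually took place should be passed in, matching the post's distinction between held and merely proposed meetings.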

Four individual studies are being considered to create a set of objectives in order to achieve this aim. These are 1. that the users do not know when their nodes are being used and so cannot evaluate whether they need additional capacity; 2. that administrators of groups offering teaching using these technologies cannot tell who is attending their courses;

IPR All scripts developed in this project will be available on an open source basis, licensed for free non-commercial use and development and will be available to the UK HE and FE community in perpetuity. All the data will also be licensed for free use by the community.

3. that users may not report that they are having problems with their nodes, resulting in a perception of poor quality of Advanced Video Conferencing meetings; 4. and that users do not have access to the potential environmental savings of using these technologies over travelling to conferences.

It is still to be decided which specific open source license will be used, but it is likely to be a google.code version. A future blog post will announce the choice and the reasons for it.

This will be achieved by selecting a key set of responsive users, targeted in each of the above areas, to exploit the regular diary reports produced by this amalgamation of data sources.

Project Team Relationships and End User Engagement 1. Ian Dennell is working in the Access Grid Support Centre and is heavily involved with extracting and supplying data on a regular basis to the AGSC funders JANET (UK) as well as providing bespoke data to users and groups when requested. He is the key architect of AG statistics and data analysis, and also one of the main contacts with the user groups.

Risk Analysis and Success Plan In such a short project there is a risk around engagement from the service providers and users. To mitigate this risk, four types of reporting system are proposed, to be developed and delivered as described below. Each is semi-independent.

2. Martin Turner is the External Service Developments Manager within RCS, and has worked as project manager for various visualization and video based projects, including both the AGtivity and the related SustainedMAGIC projects. He is also an active user within the university and UK AG user groups.

1. Aiding Video Conferencing Room Node Management: Formatted data will be available to individual room node owners as an automated report on their video conferencing room’s usage. Trends in regular meetings can be seen thus giving the room owners raw data allowing for informed future capacity planning.

3. James Perrin and Tobias Schiebeck are currently Visualization Software Engineers within Research Computing Services; their main knowledge is in data filtering, mining and classification, as well as presentation on various web based platforms.

2. Informing Teaching Attendance: There are a number of teaching groups using the technologies on a regular basis which can benefit by receiving a historical list of usage. By showing the attendance figures for each lecture series the group can see which courses are popular and possibly more importantly which courses are less popular; and then use this data to rewrite or republicise the less popular courses. The data can also be used to show how successful certain lecture series are and therefore help them expand their distance learning activities.

Users and support contacts will be listed in future blogs. Projected Timeline, Workplan & Overall Project Methodology Initial timeline although likely to swop order in WPs. In such a short timetable there is time only for Agile development with feedback being delivered on a weekly basis. The project is managed around weekly tech meetings to allow changes to be proposed.

is #AGtivity • 2 March 2011 start-up meeting in Birmingham: Presentation of hypothesis, Introduction of synthesis projects, Aim for a group of common themes • 14-15 March 2011 JISC conference, meeting face-to-face Fortnightly techy meetings to be planned



New JISC projects started

The Direct Incurred costs are £27,266 for this project; with 1.9% contributing to travel and dissemination. This maximises the effort to the task in hand.

FEB 04, 2011 09:02A.M. First preparation for the start of two new JISC projects started with the creation of this Blog.


Launch Meeting of AGtivity


FEB 04, 2011 01:44P.M.


A Skype meeting on the 1st February 2011 was used to launch the AGtivity project. This included the following resources:

JAN 31, 2011 09:55A.M. This is the blog for the Library Impact Data Project, which is part of the JISC Activity Data programme. The project will run from 1 February 2011 – 31 July 2011.

1. website, 2. wiki

In addition we’ll be tweeting using the #lidp hashtag and will be archiving it at:

3. and this blog. Action List


• Feedback: need to have a blog for communication system

This project aims to prove a statistically significant correlation between library usage and student attainment. Using activity data from three separate systems and matching these against student records which are held in a fourth system, this project will build on in-house research previously undertaken at the University of Huddersfield. By identifying subject areas or courses which exhibit low usage of library resources, service improvements can be targeted. Those subject areas or courses which exhibit high usage of library resources can be used as models of good practice.

• Starting – Documentation and communication: virtually all blog related: six headings for the process 1. Project-plan proposal and items 2. Hypothesis blog post (one at start and one at end); and posts are how answered etc. 3. Technical standards etc…

The partner Universities represent a cross-section of size and mission and will provide a rich data set on which to work.

4. Final blog post; output, outcomes and lessons learned Hypothesis 5. Extra final questionnaire to be completed and a separate budget report

There is a statistically significant correlation across a number of universities between library activity data and student attainment
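To show what testing such a hypothesis involves numerically, the sketch below computes a Pearson correlation coefficient and the usual t statistic for it. The choice of Pearson's r is an assumption of this example: the post does not say which statistical test the project will use, and attainment recorded as ordered grades might call for a rank-based measure instead. No project data is used here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom.

    Not defined for a perfect correlation (r equal to 1 or -1).
    """
    return r * math.sqrt((n - 2) / (1 - r * r))
```

The t value is then compared against a t distribution with n - 2 degrees of freedom to decide significance. A significant correlation on its own would not show that library use causes higher attainment.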

• Synthesis Project (UoM) running simultaneously (Mark van Harmelen – CS) – Virtual conference organisation

Project Partners

• Communication is Twitter based; #jiscad is the main tag feed and local

• University of Huddersfield


• University of Bradford

the benefits to users of discovery solutions of using e-resource activity data, will aim to make that data available to the wider community, and will provide a tool that can be adapted and reused.

• De Montfort University • University of Exeter

As the project progresses we will be updating this blog with progress reports and providing links to project outputs.

• University of Lincoln • Liverpool John Moores University


• University of Salford

What is UCIAD JAN 21, 2011 06:51P.M.

• Teesside University

UCIAD (User Centric Integration of Activity Data) is a new 6 month JISC-funded project addressing the integration of activity data spread across various systems in an organization, and exploring how this integration can both benefit users and improve transparency in an organization.


Welcome to the RISE project JAN 28, 2011 11:13A.M. Welcome to the first blog post for the new JISC-funded RISE project at the Open University Library. RISE (Recommendations Improve the Search Experience) will be investigating activity data from use of electronic resources provided by the Open University Library. The overall objectives of the RISE project are to: • Establish a process to collect and analyse attention data about the use of electronic resources from the EZProxy proxy referral system.

Both research and commercial developments in the area of user activity data analysis have until now mostly focused on logging user visits to specific websites and systems, primarily in order to support recommendation, or to gather feedback data from users. However, data concerning a single user are generally fragmented across many different systems and logs, from website access logs to search data in different departments and as a result organizations typically are not able to maintain an integrated overview of the various activities of a given user, thus affecting their ability to provide optimal service to their users. Hence, a key tenet of the UCIAD project is that developing a coherent picture of the interactions between the user and the organization would be beneficial both to an organization and to its users.

• Create a recommender service using attention data to answer questions such as ‘people on my course are looking at these resources’ • Identify metrics to detect changes in user behaviour as a result of service use. • RISE will create a personal recommendations service, MyRecommendations for OU users of the EBSCO Discovery Solution (EDS). • It will explore issues (of anonymity, privacy, licensing and data format/ standards) around making this data available openly and will aim to release it openly so it can be re-used by the wider community in innovative ways.

Specifically, the objective of UCIAD is to provide the conceptual and computational foundations to support user-centric analyses of activity data, with the aim of producing results which can be customized for and deployed in different organizations. Ontologies represent semantic models of a particular domain, and can be used to annotate and integrate data from heterogeneous sources. The project will therefore investigate ontological models for the integration of user activity data, how such models can be used as a basis for a pluggable data framework aggregating user activity data, and how such an infrastructure can be used for the benefit of the users, providing meaningful (and exportable) overviews of their interaction with the organization.

• RISE will use the EDS API to create a Google Gadget for the OU Google Apps environment and will aim to test in the OU Moodle Virtual Learning Environment (VLE) using features developed by the JISC DOULS project. • RISE will evaluate the pros and cons of providing recommender data to students of an e-resource discovery service.

• Overall RISE will provide the wider community with an example of

The project, led and managed by Mathieu d’Aquin, will contribute to the current strand of research in KMi, which focuses on the use of semantic technologies to support online personal information management and is carried out by Dr d’Aquin and Salman Elahi. The solutions developed in the project will be tested on a number of Open University systems, in collaboration with the Open University’s communication services.


