
Your Personal Newspaper 10 May, 2011 | created using fivefilters.org


But is this what the users want? May 9, 2011 09:05AM I’ll admit it, I’m prepared to out myself: I’ve just finished a postgraduate research degree and more than once I have used the Amazon book recommender. In fact, when I say more than once, over the course of my studies we’re possibly getting into double figures. I’m not ashamed (I may be about using Wikipedia, but let’s not go there), because I did and so did many of my peers. There may be more traditional methods to conduct academic research, but sometimes, with a deadline looming and very little time for a physical trip to the library to speak to a librarian, finding resources in one or two clicks is just too attractive. My hunch is many other scholars also use this method to conduct research. Recently on another Copac project we facilitated some focus groups. The participants in the groups were postgraduate researchers, a mix of humanities and STEM. Some had used Copac before, others had not. Although the focus groups were answering another hypothesis, I couldn’t resist asking the gathered group if they would find merit in a book recommender on Copac based on 10 years of library circulation data from a world class research library. It’s not often you see a group of students become visibly excited at the thought of a new research tool, but they did that night. The group felt a book recommender would make a positive impact on their research practices, and greeted the idea with enthusiasm. I thought it was worth mentioning this incident because, when the going gets tough and we are drowning under data, it might be worth remembering that users really want this to happen.

May 10, 2011 02:28PM Just had team meeting with Tom Franklin for #jiscsalt #jiscad. Ended up visualisations. See ‘visualcomplexity.com’ Nice: http://bit.ly/V6bdE

May 10, 2011 11:29AM in other news, my @tabbloid e-paper thingy has failed to arrive for a third week .. anyone recommend an alternative collater? #jiscad #drats

AEIOU — The Business Case


May 9, 2011 07:58AM The AEIOU project had an interesting visit from Tom Franklin on 21st April, who helped us to develop two technical ‘recipes’ for the Activity Strand cookbook. These still need to be refined but it was an informative exercise, particularly for me as project manager, as it helped me to understand the processes involved in software development.

May 9, 2011 01:00PM

Tom also took a look at the business case I am putting together and gave me some extremely useful advice about not trying to oversell the benefits as this weakens the message. I do need all the advice I can get given the current financial climate as it is not an easy task to convince a management team, trying to find ways to save money, of the benefits of a product. I suspect this will apply to all the projects.

Planning my jaunt to Llandrindod for #cilipw11 where I’ll be talking about #jiscad #ildp


Open Recommender data


May 6, 2011 01:31PM

12345678 [Ebsco Accession number]

One of the aspirations of the RISE project is to be able to release the data in our recommendations database openly. So we’ve been thinking recently about how we might go about that. A critical step will be for us to anonymise the data robustly before we make any data openly available and we will post about those steps at a later date. 

Cyr, Andre

Once we have a suitably anonymised dataset our current thinking is to make it available in two ways:

as an XML file; and,
as a prepopulated MySQL database.

The idea is that for people who are already working with activity data then an XML file is most likely to be of use to them. For people who haven’t been using activity data and want to start using the code that we are going to be releasing for RISE then providing a base level of data may be a useful starting point for them. We’d be interested in thoughts from people working with this type of data about what formats and structures would be most useful.

AI-SIMCOG: a simulator for spiking neurons and multiple animats’ behaviours http://www.???.??/etc

2009

12 3 6

User context data


XML format For the XML format we’ve taken as a starting point the work done by Mark van Harmelen for the MOSAIC project and were fortunately able to talk to him about the format when he visited to do the Synthesis project ‘Recipes’ work. We’ve kept as close to that original format as possible but there are some totally new elements that we are dealing with such as search terms that we need to include. The output in this format makes the assumption that re-users of this data will be able to make their own subject, relationship and search recommendations by using the user/resource/search term relationship within the XML data.

UG2 [F, UG1, UG2, UG3, UG4, M, PhD1, PhD2, PhD3+ (F is for foundation year) ]

anonymised UserID 1 [Note: sequence number already stored within database] For students: [propose to map to a subject ]

For staff Staff Retrieved from artificial intelligence End record, more records

Proposed RISE record XML format

We are interested in any feedback or comments on whether this format makes sense or would be useful or whether there are changes you think we should make. You can either leave a comment on the blog or email us at Rise-project

Start Basic data: Institution, year and dates Open University

Project update April

2010/2011 May 4, 2011 12:37PM 2011

RISE search interface April was the month that saw a lot of the technical developments come together. The RISE search interface went ‘live’ at http://library.open.ac.uk/rise/ in the middle of the month just before Easter. We’ve managed to pick a fairly quiet time of year to launch it, but that’s the way things go with a short project, you can’t always pick the best time of year to launch. But since the launch we’ve already had over 300 page views and 95 unique visitors. 

4 19 OURISE Resource data Article

RISE and Google Analytics tracking We are using Google Analytics to track use of the search tool. By using a Custom dimension we are able to track how many times each type of recommendation is being used as you can see from the screenshot below. 

10.1007/s00521-009‑0254-2 or 09410643


Synthesis project visit 20th April

The three types of recommendation we are making (relationship, course and search) are all identified separately. Analytics tells us how many times each are being clicked on and also gives you the ability to be able to segment the results with different options. So, for example, you can see how the behaviour of new and returning visitors differs.

Apr 21, 2011 08:15AM Activity data cookbook The RISE team had a really good session yesterday working with Mark van Harmelen from Hedtek to go through and develop a series of ‘recipes’ for the Activity Data ‘cookbook’. These recipes describe the processes involved in the software that we have created to handle activity data. Over the course of three hours we managed to get five processes down on paper, covering the three types of recommendations we are making in RISE, the processes for parsing the EZProxy log files and the process we use to get course information into MyRecommendations.

We have also set up Google Analytics so we can see which recommendations are chosen by users from the list. See the screenshot below. Unsurprisingly the top recommendation is the most commonly used for course and relationship recommendations. But for search, the second recommendation is most commonly used. We’ll be doing some detailed analysis of what analytics tells us about the behaviour of users of the recommendations later in the project. RISE Google Gadget We’ve also completed the creation of our RISE Google Gadget. This is a slightly cut-down version of the main RISE interface to fit into a gadget-size but it includes most of the key features, as you can see from the screenshot below. In the main we have simply reduced the number of recommendations and search results that are shown to users. So users will see five search results rather than ten, for example.

We actually found it to be quite a good way of describing and documenting what the project software is doing. It seemed to be a bit easier to do than we’d expected and it was certainly a useful discipline to have to explain to someone from outside the project how things worked. It also provided a useful challenge to us as it uncovered at least one issue that we need to do some more thinking about. This relates to how we handle courses in MyRecommendations and specifically what we do when users change courses. We currently take a feed from our systems that tells us which courses a student is studying (and for the OU, where students study a module at a time, that can be several modules). Talking through things yesterday made it clearer that we need to be able to store the historical course data, so that the system keeps the link between the course the user was studying at the time and the resources viewed. So we need to come up with a solution to this that doesn’t overwrite the courses a student was taking last year with the courses they are taking this year.
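One way to picture the "don't overwrite last year's courses" requirement is to key each enrolment on the academic year as well as the user and course. The following is only a minimal sketch under that assumption, not the RISE schema: the table name, column names and course codes are hypothetical, and it uses Python's built-in sqlite3 module purely so that it runs on its own.

```python
import sqlite3


def open_store():
    """Create an in-memory database with a history-preserving enrolment table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE enrolment (
               user_id       INTEGER NOT NULL,
               course_code   TEXT    NOT NULL,
               academic_year TEXT    NOT NULL,
               PRIMARY KEY (user_id, course_code, academic_year)
           )"""
    )
    return db


def load_feed(db, academic_year, feed):
    """Add this year's (user_id, course_code) rows; earlier years are left untouched."""
    db.executemany(
        "INSERT OR IGNORE INTO enrolment VALUES (?, ?, ?)",
        [(user_id, course, academic_year) for user_id, course in feed],
    )


def courses_for(db, user_id, academic_year):
    """Return the courses a user was studying in a given academic year."""
    rows = db.execute(
        "SELECT course_code FROM enrolment "
        "WHERE user_id = ? AND academic_year = ? ORDER BY course_code",
        (user_id, academic_year),
    )
    return [row[0] for row in rows]
```

With a table like this, a resource view recorded with its date can still be tied back to the course the user was on at the time, even after a new feed has been loaded.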

There’s an abbreviated results description so you just see the first part of the search results title and we will be interested to see how much of a disadvantage that might be. When we tested the full interface with library staff initially we had some feedback about what information beyond the title they thought it would be useful to see. We have also dropped the ability to rate recommendations from the gadget as it was tricky to develop and likely not to be very intuitive to use.  The gadget still searches the Ebsco Discovery Solution in the same way and brings back results within the gadget, with the full text being shown in a new window. We’ve included links to the Privacy policy and some FAQs.

It still leaves us with having to think carefully about how we handle searches for users who are studying multiple courses at a time but we already knew that. That may well be something that is unique to the OU anyway. Our current thinking is that the number of students studying courses that are widely different is likely to be low, it is more likely that they would be studying related courses. In any case a student might well search for things entirely unrelated to their course. So our default would be to associate the searches with all the student’s courses and rely on the relevance ranking process to make sure that unrelated articles don’t appear high in the recommendations list. If testing finds this to be a problem then we could look at the approach Dave Pattern is suggesting in LIDAP and use a threshold discounting one-off relationships.
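The default described above (credit a view to every course the student is on, then discount one-off relationships) could be sketched roughly as follows. This is an illustration only: it is neither RISE code nor Dave Pattern's LIDAP implementation, and the function name, the `min_count` threshold and the input shape are all assumptions.

```python
from collections import Counter


def course_recommendations(views, min_count=2):
    """Rank resources per course, ignoring resources seen fewer than min_count times.

    views: iterable of (courses, resource_id), where courses lists every
    course the viewer is currently studying.
    """
    counts = {}
    for courses, resource in views:
        # Associate the view with all of the student's courses.
        for course in courses:
            counts.setdefault(course, Counter())[resource] += 1
    # Keep only relationships at or above the threshold, most viewed first.
    return {
        course: [res for res, n in counter.most_common() if n >= min_count]
        for course, counter in counts.items()
    }
```

A view by a student on two unrelated courses is counted against both, but it only surfaces as a recommendation on a course where other viewers back it up.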

The gadget is currently being tested with library staff and will be made available for wider use very soon. Evaluation and testing Now the developments have been completed we are working more on the evaluation and testing stage. Our research with students has been approved by the appropriate panel and we’re expecting to be able to start contacting people for our evaluation shortly. 

Data release formats It was also a good opportunity to talk to Mark about XML data formats for the data we hope to release openly. Mark wrote the original MOSAIC project data collection guide which outlined an XML format for user activity data. For RISE we’ve revisited this format and tweaked it a bit to handle the e-resource data that we’re concerned with. There are a few things we would need to change about the course data and the resource information descriptions. Mark offered the really valuable insight that we only really needed to be able to provide user, resource and search term data. We didn’t need to make explicit


recommendations within the data we released as people could use the data to build their own. That’s been really helpful and we are revising the draft format and plan to post it on here in the near future and talk to people about it.

available for harvesting. However, I wanted to avoid the hassle of harvesting via OAI-PMH so looked closer at the tracker code. This is a neat solution and uses Spring injection to create a listener on the DSpace Event service to capture downloads. With a little hacking to also capture item views I created an AEIOU activity class. The beauty of this is that all that is required to update the DSpace code is a configuration of the Spring context (an XML file) and the addition of a Java jar file.

Recommendation service Apr 21, 2011 02:15AM

Who What Why and When?

Do I really want to write a SOAP service (SUSHI uses SOAP but there don’t seem to be mature open-source Java clients/servers available for hacking!)? How about a REST service? This could be a neat solution using something like Apache CXF.

Apr 21, 2011 01:46AM How best to represent the activity data we’re gathering and passing around? Several projects (PIRUS2, OA-Statistics, SURE, NEEO) have already considered this and based their exchange of data (as XML) on the OpenURL Context Object — the standard was recommended in the JISC Usage Statistics Final Report. Knowledge Exchange have produced international guidelines for the aggregation and exchange of usage statistics (from a repository to a central server using OAI-PMH) in an attempt to harmonise any subtle differences.

Either of these would be great but as I only have a few simple data requests to execute, I’ve decided to go for a quick and easy solution — Apache XML-RPC deployed in a servlet. Behind this I’m using a MySQL database with Apache DBCP handling connections and queries. I was going to use the lightweight mybatis data mapper framework (formerly known as iBatis) but again, as I’m only using a couple of queries it isn’t really worth the overheads for the flexibility it provides.

Obviously then, OpenURL Context Objects are the way to go but

The test set-up is working so now all I’ve got to do is tidy it up, deploy the server as a service and deploy clients within the six DSpace institutional repositories. How long have I got?

how far can I bend the standard without breaking it? Should I encrypt the Requester IP address and do I really need to provide the C-class Subnet address and country code? If we have the IP addresses we can determine subnet and country code. Fortunately the recommendations from Knowledge Exchange realised this and don’t require it.
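To make the IP address question concrete, here is a minimal Python sketch. It is not AEIOU code, and it swaps encryption for a keyed hash (HMAC-SHA256), which gives a stable pseudonym that cannot be reversed without guessing addresses. The function names and the secret key are hypothetical. The subnet helper shows that the C-class (/24) network can be derived from the raw address, so it would have to be computed before the address is hashed if it were wanted at all.

```python
import hashlib
import hmac
import ipaddress


def pseudonymise_ip(ip, secret_key):
    """Return a keyed hash of the IP: same IP and key always give the same token."""
    return hmac.new(secret_key, ip.encode("ascii"), hashlib.sha256).hexdigest()


def c_class_subnet(ip):
    """Return the /24 network address that contains an IPv4 address."""
    network = ipaddress.ip_network(f"{ip}/24", strict=False)
    return str(network.network_address)
```

Because the token is stable, repeat requests from one address can still be grouped for recommendations without storing the address itself.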

Consuming and Querying data

So for the needs of this project where we’re concerned with a closed system within a National context, I think I can bend the standard a little and not lose any information. I can use an authenticated service. I also want to include some metadata — the resource title and author maybe.

Apr 21, 2011 02:01AM So what service should I use? My first thoughts were to push data to a SQL database, then I thought of Solr. Solr is fast and efficient, it’s great for powerful full-text and faceted search, hit highlighting and rich document (e.g., Word, PDF) handling. So I pushed OpenURL Context Objects from DSpace to Solr and used simple queries to view the captured activity data.

So here’s the activity data mapped to a Context Object:

Timestamp (Request time) Mandatory

Then I thought again. What data do I want returned from a recommendation service? I just want a few item handles and some metadata as suggestions to view alongside the current resource. I found I could do this using an SQL query on a test database but wasn’t sure if I could construct queries with inner joins using Solr. I’m not that familiar with Solr and couldn’t find what I wanted. A patch was available for the latest release that could do this but then again, maybe this isn’t one of Solr’s strengths ..or maybe I don’t have the right data structure.

Referent identifier (The URL of the object file or the metadata record that is requested) Mandatory

My thoughts turned back to basing a service on a SQL database.
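The kind of inner-join query meant here might look like the sketch below: "people who viewed this item also viewed". It is an assumption-laden illustration rather than the AEIOU query. The `activity` and `item` tables, their columns and the helper names are hypothetical, and sqlite3 stands in for MySQL only so the example is self-contained.

```python
import sqlite3


def demo_db(rows):
    """Build an in-memory database from (requester, handle) view events."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE item (handle TEXT PRIMARY KEY, title TEXT)")
    db.execute("CREATE TABLE activity (requester TEXT, handle TEXT)")
    handles = sorted({handle for _, handle in rows})
    db.executemany(
        "INSERT INTO item VALUES (?, ?)",
        [(handle, f"Title of {handle}") for handle in handles],
    )
    db.executemany("INSERT INTO activity VALUES (?, ?)", rows)
    return db


def also_viewed(db, handle, limit=5):
    """Return (handle, title, viewers) for items seen by people who viewed `handle`."""
    sql = """
        SELECT other.handle, item.title,
               COUNT(DISTINCT other.requester) AS viewers
        FROM activity AS seen
        INNER JOIN activity AS other
                ON other.requester = seen.requester
               AND other.handle <> seen.handle
        INNER JOIN item ON item.handle = other.handle
        WHERE seen.handle = ?
        GROUP BY other.handle, item.title
        ORDER BY viewers DESC, other.handle
        LIMIT ?
    """
    return db.execute(sql, (handle, limit)).fetchall()
```

The self-join on `activity` does the work: one side finds who viewed the current item, the other finds what else those requesters viewed, and the join to `item` supplies the metadata to show alongside.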

Resolver identifier (The baseURL of the repository) Mandatory

Referent other identifier (The URI of the object file or the metadata record that is requested) Mandatory if applicable
Referring Entity (Referrer URL) Mandatory if applicable
Requester Identifier (Request IP address, possibly encrypted!) Mandatory
Service type (objectFile or descriptiveMetadata) Mandatory

Hunting and Gathering data Apr 21, 2011 01:56AM The PIRUS2 project has conveniently produced a patch for DSpace (and EPrints) for capturing activity data and either making it available via OAI-PMH or pushing it to a tracker service. I’m grateful to Paul Needham from Cranfield who gave me an insight into the architecture they were using. I patched the DSpace code and was soon making usage data


MyRecommendations tool goes live today

Feedback link on the interface. Don’t have an OU Computer login? If you are not an OU student or member of staff, the first thing you’ll see will be an OU log in screen. You will need to create a login username and password in order to see the interface. To do this, go to the link on the right side of the screen which says:

Apr 20, 2011 03:10PM The MyRecommendations tool is now live, at http://library.open.ac.uk/rise/. To use MyRecommendations you will need to have an Open University Computer User login [If you aren’t an OU student or staff member you can create a login to test the system. See the details at the end of this blog post].

New visitor? Create a free Open University account here. This will enable you to look at MyRecommendations, search the database, see the results and some of the recommendations. However, you won’t be able to access the actual e-resources as these are restricted by the terms of our licenses to OU students and staff.

Background
• MyRecommendations is a prototype system designed to test the hypothesis that “recommender systems can enhance the student experience in new generation e-resource discovery services”.
• It uses a simplified search interface to search EBSCO Discovery Service, which the OU Library is calling “One-Stop Search”.
• MyRecommendations searches the same content as One-Stop but has been set to search ‘full-text’ content only.
• When you first search you may not see many recommendations, but the system learns as it goes along and makes recommendations based on what articles are being looked at, so the more searches that it records and the more articles that are viewed the better the recommendations become.
• If you are studying with the OU at the moment you should see recommendations based on what other users on your module have searched. Additional functionality coming soon will allow the user to select which module they’re currently working on.
• Other recommendations include articles that people viewed after using the search term you have used; and people who looked at this article also looked at this article

What’s next for RISE The project is also testing a Google Gadget version of the search system and we expect to be able to release this in the next few weeks.

Recommendation types Apr 19, 2011 12:07PM RISE – Recommendation Types The RISE MyRecommendations search system is going to provide three types of recommendations: 1. Course-Based “People on your course(s) viewed” This type of recommendation is designed to show the user what people studying their module have viewed recently. At the moment this only picks up the first module that a student is studying but we are planning a future enhancement that will include all the modules that are being studied with a feature to allow users to flag which module they are currently looking for resources for. The recommendations are generated by analyzing the resources most viewed by people studying a particular module.

A full explanation of the different kinds of recommendations given can be found at http://www.open.ac.uk/blogs/RISE/2011/04/19/recommendation-types/ . If you are studying at the OU, or are a member of staff, here is how to use the full functionality:
1. Go to http://library.open.ac.uk/rise .
2. If you are studying with the OU you should immediately see some recommendations based on what others on your module have searched for.
3. Type in a keyword search. Try artificial intelligence for example, and then click Go.
4. MyRecommendations will search One-Stop and bring back the first 10 results. At the bottom of the list are some recommendations based on articles that people viewed when doing a similar search. You can browse through the One-Stop results and click any of them to look at the full text article (this opens up in a new window). The One-Stop results are already relevance-ranked by One-Stop so the most relevant results should be near the top.
5. Click New Search and try another search. When you see the results note that it now shows articles you’ve looked at recently.
6. If you choose one of the recommendations you will be asked to rate the recommendation so the system can learn how useful it is.

2. Relationship Based “These resources may be related to others you viewed recently” These recommendations are generated for a resource-pair. For example, if users commonly visit resource B after viewing resource A, the system considers this to be a relationship, and will recommend resource B to people viewing resource A in the future. As the system doesn’t host resources internally, it instead looks at a user’s previously viewed resources (most recent), and then checks for the most often viewed resources by users who’ve also viewed the same (most recent) resources. 3. Search Based “People using similar search terms often viewed” We have limited data on search terms used, from the EZProxy logfiles so we are using the searches carried out in MyRecommendations to build search recommendations. Using this we associate search terms used with the resources most often visited as a result of such a search. For example, if people searching for ‘XYZ’ most often visit the 50th result returned from Ebsco, this part of the recommendation algorithm will pick up on this. Hence in future when people search for ‘XYZ’, that

Please send any comments or questions to the RISE mailbox at Rise-Project@open.ac.uk, or complete the survey using the


particular result will appear top of the list of recommendations for users in a “People searching for similar search terms often viewed” section.
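The search-based recommendation described above comes down to counting which resources people click through to after each search term. A minimal sketch of that idea, assuming click-through events arrive as (search term, resource) pairs, is given below. The function names are hypothetical, and the lower-casing of terms is an added assumption rather than documented RISE behaviour.

```python
from collections import Counter, defaultdict


def build_search_index(events):
    """Count resource views per search term from (search_term, resource_id) pairs."""
    index = defaultdict(Counter)
    for term, resource in events:
        index[term.strip().lower()][resource] += 1
    return index


def search_recommendations(index, term, top_n=3):
    """Return the resources most often viewed after searching for `term`."""
    counter = index.get(term.strip().lower(), Counter())
    return [resource for resource, _ in counter.most_common(top_n)]
```

So if most people searching for ‘XYZ’ end up viewing the 50th result, that resource accumulates the highest count and comes first in the recommendations for ‘XYZ’, whatever its position in the original result list.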

Working through the SALT hypothesis

Data analysis update

Apr 15, 2011 02:14PM I’m currently project managing SALT, but my own area of interest is evaluation and user behaviour, so I’m going to be taking on an active role in putting what we develop in front of the right users (we’re thinking academics here at the University) to see what their reactions might be. As I think this over, a number of questions and issues come to mind. Are we more likely to look on things favourably if they are recommended by a friend? If we think about what music we listen to, films we go and see, TV we watch and books we read, are we far more likely to do any of those things should we receive a recommendation from someone we trust, or someone we know likes the same things that we like? If you think the answer to this is yes, then is there any reason that we wouldn’t do the same thing should a colleague or peer recommend a book to us that would help us in our research? In fact, more so? Going to see a film that a friend recommends and that turns out to be, well, average has far fewer lasting consequences than completing a dissertation that fails to acknowledge some key texts. As a researcher would you value a service which could suggest to you other books which relate to the books you’ve just searched for in your library?

Apr 19, 2011 09:07AM Whilst we wait for all of the data from the project partners to arrive, Bryony and I have done a quick & dirty analysis of the data we’ve received so far. The good news (touch wood!) is that we’re still on track to prove the project hypothesis: “There is a statistically significant correlation across a number of universities between library activity data and student attainment” The data we’ve looked at so far has a small Pearson correlation (in the region of –0.2) that has a high statistical significance (with a p-value of below 0.01). The reason we’re seeing a negative correlation is due to the values we’ve assigned to the degree results (1=first, 2=upper second, 3=lower second, 4=third, etc). We suspect one of the reasons for the small Pearson correlation is the level of non & low usage (which is something we’ve looked at previously in Huddersfield’s data). Within each degree level, there are sizeable minorities of students who either never made use of a library service (e.g. they never borrowed any books) or who only made low use (e.g. they borrowed less than 5 books), and it’s this which seems partly responsible for lowering the Pearson correlation. However, the data shows that:

We know library users very rarely take out one book. Researchers borrowing library books tend to search for them centrifugally, one book leads to another, as they dig deeper into the subject area, finding rarer items and more niche materials. So if those materials have been of use to them, could they not also be of use to other people researching in the same area? The University of Manchester’s library is stocked with rare and niche collections, but are they turning up within traditional searching, or are they hidden down at that long end of the tail? By recommending books to humanities researchers that other humanities researchers have borrowed from the library I’m really hoping we can help improve the quality of research – we know that solid research means going beyond the prescribed reading list, and discussing new or different works. Maybe a recommender function can support this (even if it potentially undermines the authority of the supervisor prescribed list – as one academic has recently suggested to us: “isn’t this the role of the supervisor?”).

students who gained a first are less likely to be in that set of non & low users than those who gained a lower grade
students who gained the highest grades are more likely to be in the set of high library usage than those who gained lower grades
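To see why coding the degree results as 1=first through 4=third produces a negative Pearson correlation when better grades go with more borrowing, here is a small Python sketch. The numbers in the usage note are made up purely for illustration and are not project data. The function is a plain implementation of the standard Pearson formula.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)
```

For example, with invented grades `[1, 1, 2, 2, 3, 3, 4, 4]` and loans `[40, 35, 30, 32, 20, 22, 10, 5]`, `pearson(grades, loans)` is negative, because a smaller grade code (a better degree) goes with more loans. Adding many students with zero or very low loans at every grade level would pull the coefficient towards zero, which is the effect described above.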

The SALT project plan Apr 15, 2011 04:36PM The SALT project plan is now available

Here’s how I’m thinking we’ll run our evaluation: Once the recommender tool is ready, we’ll ask a number of subject librarians to do the first test of the tool, to see if it recommends what they would expect to see linked to their original search. They will be asked to search the library catalogue for something they know well. When the catalogue returns their search, does the recommender tool suggest further reading which seems like a good choice to them? As they choose more unusual books, does the recommender then start suggesting things which are logically linked, but also more underused materials? Does it start to suggest collections which are rarely used, but nevertheless just as valuable? Or does it just recommend randomly unrelated items? And can some of the randomness support serendipity?

The SALT Project Plan

We’ll then run the same test with humanities researchers (it’ll be interesting to see if librarians and academics have similar responses). As testing facilitators, we’ll also be gauging people’s


reactions to the way in which their activity data is used. The question is, do users see this as an invasion of their privacy, or a good way to use the data? Do the benefits of the recommender tool outweigh the concerns over privacy?

If someone else were running the VLE, what would we want to know about it?

The testing of the hypothesis will be a crucial indicator of the legitimacy of the project. Positive results from the user testing will (hopefully) take this project on to the next level, and help us move towards some kind of shared service. But we really need to gauge whether this segment of more ‘advanced’ users can see the benefit. If they believe that the tool has the ability to make a positive impact on their research, then we hope to extend the project and encourage further libraries to participate. With more support from other libraries, researchers will hopefully be one step closer to receiving a library book recommender.

If a charismatic leader were to rouse academics or students to come to our door bearing pitchforks and burning torches, demanding VLE data, what would be the rhetoric — what would they be demanding?

If we could get secret, spy-style access to our deadliest rival institution (identity an exercise for the reader) what would we want to find out to make our VLE more awe-inspiring than theirs?


If we bear these (and similar) questions in mind when we are steering, we shouldn’t go far wrong. Let’s not get caught producing a series of odd, disconnected charts, they need to inspire thought and change. We need charts, data and stats that connect with the machinery of change.

Apr 15, 2011 07:37AM

In terms of the data, what we have is:

We hypothesise that “The provision of a shared recommendation service will increase the visibility and usage of Welsh research outputs”.

who does what So to do a meaningful analysis we have two axes: Who and What. While we’ll give away as much raw data as is possible, we need

This will be demonstrated through quantitative and qualitative assessments:

to provide supporting mappings. Who is dps1001? What is site 85? We also need to make sure, when we anonymise that we don’t lose those aspects that enable external people to ask questions.

1. By a [significant] increase in attention and usage data for items held within the six core institutional repositories

We’re working out how we should take a first stab at Who and What, and are looking at finding sources. I imagine that when we’ve done this first round of analysis we’ll discover the world doesn’t divide up how we imagine. That seems to be the near universal experience of user experience analysis, certainly we learnt in our JISC Academic Networking project that the world of networking isn’t divided up in quite the way we imagined. As we discover this from the activity data, we will iterate around, trying again and again.

2. By establishing a user focus group to explore the potential of the recommendation service and its impact on repository users

The story so far… Apr 15, 2011 07:32AM Sorry about the quietness here over the past couple of weeks: you must be wondering what we were up to.

It might even be worth applying Bayesian Clustering or Entropy-Based Tree Building to see how a machine would cluster behaviour. All very exciting (to me, anyway!). See pages 15–21 of this powerpoint by Allan Neymark at SJSU to see all this simply explained in terms of Simpsons characters.

We’ve been extracting the data from Sakai, which was more difficult than it sounds. Sakai stores its events in a massive SQL table, one after the other, so that it’s tens of millions of rows long before very long at all. Merging tables, fixing corrupt old data, that kind of thing. Anyway, all done now.

Exciting times. At the same time, extremely tedious for the guys doing the database extraction and normalisation. Personally, I seem to have escaped that bit for this project. Phew!

We’re investigating tools to help us analyse the data. Pentaho looks very promising.

Beginning the data capture

But all this is just detail (albeit time-consuming, irritating detail) around the core issue of what data have we got and what can we do with it. To that end we’ve had a few internal workshops, sent out a few emails, bent some ears, and so on.

Apr 13, 2011 08:05AM Further to Dave’s post about grabbing the data, we’ve also had successful sample data from Teesside and De Montfort. Check out Fulup’s blog about Stitching together library data with Excel for more details on DMU’s experience.

Though none of this should be treated as doctrine, and we’re still definitely open to ideas, we thought it was time to do some initial data investigations, now that we have the data. The key structuring concept for me is: Who will be interested in our data, and what would they like to know? Some easy-to-imagine, but not entirely encompassing, situations are these.


5 years of book loans and grades at Huddersfield

Some points raised and thoughts: How to recover activity data harvested from other repositories? Should the combined activity be collated, and if so how?

Apr 12, 2011 09:35PM

Questions were mainly on finding and defining the usage and benefit that access gives the user (as opposed to thinking about the ‘risk’ first).

I’m just starting to pull our data out for the JISC Library Impact Data Project and I thought it might be interesting to look at 5 years of grades and book loans. Unfortunately, our e-resource usage data and our library visits data only go back as far as 2005, but our book loan data goes back to the mid 1990s, so we can look at a full 3 years of loans for each graduating student.

Examples of cross-data usage, for example download frequencies for important documents, were shown. Anecdotally, some of the best presentations had captured data use, graphs showing outcomes etc. from repository activity. Inherently, analysis of repository Activity Data should be a staple for any project.

The following graph shows the average number of books borrowed by undergrad students who graduated with a specific honour (1, 2:1, 2:2 or 3) in that particular academic year…
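The averages behind a graph like this can be sketched in a few lines. This is only an illustration of the grouping step, not the script used at Huddersfield: the record layout, the function name `average_loans` and all of the figures are invented for the example.

```python
from collections import defaultdict

# Hypothetical records: (graduation_year, degree_class, books_borrowed).
# The values are invented purely to exercise the code.
students = [
    ("2009/10", "1", 60),
    ("2009/10", "1", 40),
    ("2009/10", "2:2", 30),
    ("2009/10", "2:2", 20),
    ("2008/09", "3", 10),
]


def average_loans(records):
    """Return {(year, degree_class): mean number of loans per student}."""
    totals = defaultdict(lambda: [0, 0])  # key -> [sum of loans, student count]
    for year, degree_class, loans in records:
        totals[(year, degree_class)][0] += loans
        totals[(year, degree_class)][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


averages = average_loans(students)
for (year, degree_class), mean in sorted(averages.items()):
    print(year, degree_class, mean)
```

Each (year, degree class) pair becomes one point on the graph, so plotting is then a matter of handing the dictionary to whatever charting tool is in use.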

A couple of links and related projects for data exchange standards:

…and, to try and tease out any trends, here’s a line graph version….

1. PIRUS (Publisher and Institutional Repository Usage Statistics) project,

Just a couple of general comments: the usage & grade correlation (see original blog post) for books seems to be fairly consistent over the last 5 years, although there is a widening between usage by the lowest &

2. Journal Usage Statistics Portal (JUSP)


highest grades. The usage by 2:2 and 3 students seems to be in gradual decline, whilst usage by those who gain the highest grade (1) seems to be on the increase.

Apr 6, 2011 02:37PM In the long term, and assuming that we get agreement to implement STAR-Trak, we anticipate the following benefits from this project:
- Reduction in non-completion rates and increase in student learning performance
- Reduction in student administration time spent by teaching staff
- The ability to model and undertake scenario analysis using Business Intelligence (BI) applications and the data warehouse cubes (a type of database structure for BI) containing the activity data
- The creation of a longitudinal repository of student activity data that over time might otherwise be lost
- A platform to support harvesting & analysis of data from resource discovery and management tools

Let the data flow… Apr 11, 2011 08:39AM I’m happy to report that the first set of sample data recently emerged from the library management system (LMS) at John Rylands. This process was not as complex as anticipated since nearly all of the relevant data is in one Sybase table which can easily be exported. Each loaned document in this data is identified by a Talis specific ITEM_ID so a little extra work is required to pull the corresponding ISBN from another table. However, this task is believed to be straightforward. For info, the sample data was for just a one hour period on one particular day – 9-10am on Tuesday 8th March 2011 since you ask – and comprised details of 839 transactions, amongst which were 159 new loans.
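The ITEM_ID to ISBN lookup described above amounts to a join between the loans table and an item table. A minimal sketch follows, using an in-memory SQLite database as a stand-in for the Sybase LMS; apart from ITEM_ID, every table name, column name and value here is an assumption made for illustration.

```python
import sqlite3

# In-memory stand-in for the LMS database. Table and column names
# (other than item_id) are hypothetical, as is the sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE loans (item_id INTEGER, borrower_id TEXT, loaned_at TEXT);
    CREATE TABLE items (item_id INTEGER PRIMARY KEY, isbn TEXT);
    INSERT INTO loans VALUES (101, 'b1', '2011-03-08 09:05');
    INSERT INTO loans VALUES (102, 'b2', '2011-03-08 09:40');
    INSERT INTO loans VALUES (103, 'b1', '2011-03-08 09:55');
    INSERT INTO items VALUES (101, '9780000000001');
    INSERT INTO items VALUES (102, '9780000000002');
""")


def loans_with_isbn(connection):
    """Attach an ISBN to each loan; loans with no matching item keep isbn = None."""
    rows = connection.execute("""
        SELECT l.item_id, i.isbn, l.borrower_id
        FROM loans AS l
        LEFT JOIN items AS i ON i.item_id = l.item_id
        ORDER BY l.loaned_at
    """)
    return rows.fetchall()


result = loans_with_isbn(conn)
for row in result:
    print(row)
```

A LEFT JOIN is used so that loans of items without an ISBN are still counted rather than silently dropped, which matters when the totals are checked against the raw transaction log.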

GoToMeeting Session #1 Apr 6, 2011 02:28PM

That just leaves us with 3.5 million records to go!

Interesting session – although txt could have been easier; worked in the end. It is recorded, which may be unusual to listen to.

As a bloke wiser than myself once said “A journey of a thousand miles begins with a single step.”

From AGtivity we have a range of new results for the user communities. This is the start for visual themes to create an autobiography for video conferencing room nodes: initial linking to attendance rates through remote lectures.

Activity Data Analysis at #inf11

A virtual venue called ‘MAGIC’ is used for semester postgraduate lectures. A couple of data mining views are shown below that start to highlight individual course components. The first shows duration values over time, allowing general usage to be considered (allows test periods and extra-curricular items to be seen to aid quality control);

Apr 7, 2011 12:54PM The JISC IE (Information Environment) Programme held a workshop in Aston University (7/4/11, #inf11) and there were a series of references to the Activity Data programme which was not inappropriate.

and the second is connection periods.


Cancelled lectures, potentially rapidly increasing or poor attendance, as well as extra lectures are visible.

4. We then worked down through the prioritised list until we hit the development budget and time limits for the project.

Code and Data Discussion

Ideally we would have liked to have a further round of workshops to confirm the requirements. However project timescales and the other commitments of our key business users have meant that we have had to move straight into the development phase. Risks around this have been mitigated by continuing discussion with users as the detail of requirements is fleshed out.

There was a good discussion on development environments used; a key component was data type choices, and whether there is any universality. There was concern over how to create time-stamped data items that would cross-relate. The main issue was Risk-Benefit regarding confidentiality of data, e.g. http://obd.jisc.ac.uk/rights-and-licensing, but the next is how to cross-index them. Other links from Franklin Consulting regarding IPR are repeated here;

Hypothesis Apr 6, 2011 02:00PM

Licenses: there is a JISC sponsored IPR and licensing module that can be found at http://www.web2rights.com/SCAIPRModule/ . Within that you might be particularly interested in

Our hypothesis is that by harvesting user activity (usage and attention) data that already resides in existing institutional systems, combining it with demographic information and presenting it in a single portal-type application, we can improve our support services by revealing new information, providing students, tutors and student support officers with a broader picture of a student’s engagement with the University at both an academic and social level.

1. Introduction to licensing and IPR http://xerte.plymouth.ac.uk/play.php?template_id=352
2. Creative Commons license: http://xerte.plymouth.ac.uk/play.php?template_id=344

Data protection: there is some information at http://www.jisclegal.ac.uk/LegalAreas/DataProtection.aspx

Agree with cookbook format system in order to mix-and-match results from different projects – Action for this project to follow-up.

Furthermore, being able to predict students at risk of dropping out, based on lack of engagement, will enable us to develop targeted personalised interventions appropriate to the type and level of non-engagement, and do so in a more joined-up and timely manner.

Test Scenario

We plan to evaluate our hypothesis over three time periods:

An interesting mash-up proposal, which requires use of multiple projects’ data with a para-data (linked data) format. As seen in the image above, the maths students were released from lectures earlier in the first half of the Semester’s lecture series; now, with a mash-up on top of the same graph, we could access both the VLE access and relevant research book loans activity data, and see if there are correlations with early finishing lectures.

INTERIM: Qualitative: Feedback on perceived value from students and staff through focus groups
PILOT: Qualitative/Quantitative: Feedback from staff and students on actual value through focus groups; usage statistics
LONG-TERM: Quantitative: Analysis of retention rates; NSS scores

At the end of the project we will add to this post, summarising the evidence we have gathered during the project and reflecting on whether we think the hypothesis has been successfully tested.


Project Plan

Apr 6, 2011 02:19PM We have had a tremendous reaction to the project from academic and administrative staff. The input has helped us further our understanding of how the application might be used within HEIs to support retention. The strength of STAR-Trak is in facilitating a face-to-face discussion between staff member and student regarding any potential issues around engagement and retention.

Apr 6, 2011 01:42PM Aims, Objectives and Final Outputs of the project The aim of our project is to test our hypothesis that retention rates and student satisfaction can be improved by facilitating a more informed dialogue between students and staff, based on a rich joined-up picture of a student’s academic and extra-curricular engagement with the University. You can download a pdf copy of the details by clicking here: STAR-Trak NG Aims, Objectives and Final Outputs (opens in new window).

User requirements have been elicited by running workshops with key business users. We are in the fortunate position of already having a proof of concept application. Having something to look at makes it far easier for end users to grasp the potential uses for the application and thus come up with requirements. To elicit the final set of requirements the following steps were taken: 1. Requirements from each workshop were captured, reviewed and then transposed into a single spreadsheet. 2. At this point a further review synthesised several requirements, and further detail was added so that the relative effort in implementing each could be assessed. 3. MoSCoW (Must, Should, Could and Would) prioritisation was then applied to the requirements.
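The prioritisation steps above, followed by working down the ordered list until the budget is reached, can be sketched as below. The requirement names, effort estimates and the ten-day budget are all invented for illustration; the real spreadsheet will have held rather more detail.

```python
# Priority labels follow the MoSCoW expansion given in the text above.
PRIORITY_ORDER = {"Must": 0, "Should": 1, "Could": 2, "Would": 3}

# Hypothetical requirements with rough effort estimates in days.
requirements = [
    {"name": "Export report", "priority": "Could", "effort_days": 3},
    {"name": "Student dashboard", "priority": "Must", "effort_days": 5},
    {"name": "Email alerts", "priority": "Should", "effort_days": 4},
    {"name": "Mobile view", "priority": "Would", "effort_days": 6},
]


def select_within_budget(reqs, budget_days):
    """Work down the prioritised list until the next item would exceed the budget."""
    ordered = sorted(reqs, key=lambda r: PRIORITY_ORDER[r["priority"]])
    selected, spent = [], 0
    for req in ordered:
        if spent + req["effort_days"] > budget_days:
            break  # stop at the first item that does not fit
        selected.append(req["name"])
        spent += req["effort_days"]
    return selected, spent


chosen, days_used = select_within_budget(requirements, budget_days=10)
print(chosen, days_used)
```

Stopping at the first item that does not fit, rather than skipping it and trying cheaper lower-priority items, keeps the selection strictly in priority order; a different policy would be an equally reasonable choice.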

Risk Analysis and Success Plan Project risks are recorded in the Risk Register, reported to the Project Board and managed on a day to day basis by the project manager. We have taken the wider definition of risk that includes the potential for positive as well as negative outcomes. You can download a pdf copy of the risk register by clicking here: STAR-Trak NG Risk Register (opens in new window).

IPR


21 March 2011 – Combined Tech meeting

Our proposal stated that Leeds Metropolitan will be pleased to comply with the IPR requirements of the call. Specifically all outputs would be made available, at no cost, to the JISC community. As it is our hope that the STAR-Trak:NG software will be further developed into a shared service, it will be published in accordance with JISC’s Open Source Software Policy. At the time of publishing, we have no plans to change our intentions with respect to IPR. 

Items: JISC conference review update and discussion from meetings with Jorum etc. Initial graph outputs were presented for possible web orientated reports. Static reports were produced using Excel and gnuplot, with processing.org to be considered as a longer term solution as a development environment.

Project Team Relationships and End User Engagement This section explains who is on the project team and what responsibilities they have, also information on how engagement with end users will be facilitated. You can download a pdf copy of the details by clicking here: STAR-Trak NG Project Team, Relationships and End User Engagement (opens in new window).

Some open questions which users may ask were raised. What are the correlations in activity data we are wishing to highlight? For example: where do users access data from; is there a relationship with cancelled lectures; do data sets match calendar entries; how to drill down (visual analytics) for a particular course?

Projected Timeline, Workplan & Overall Project Methodology – This section provides information on workpackages, timescales and project methods. You can download a pdf copy of the details by clicking here: Projected Timeline, Workplan & Overall Project Methodology (opens in new window).

Datasets actions: Permission being sought from; from IOCOM, and from AGTkit Script and permission from front-end: Booking Data

Budget This section provides information on the project budget. You can download a pdf copy of the budget by clicking here: STAR-Trak NG Budget (opens in new window).

AGSC data: QA test files, … and GeoTags for nodes API for CO2 units sites to be retrieved Student booking on MAGIC

Weekly Tech Meetings #4-#7

28 March 2011 – Combined Tech Meeting Full staffing is now in place for the AGtivity project with both Robert Frank and James Perrin collaborating.

Apr 5, 2011 10:10AM

The first RISE Project Board meeting

Combined Techy meetings – for the month of March

4 March 2011 – SustainedMAGIC Tech meeting

Initial actions in progress:

Apr 5, 2011 09:43AM

Complete harvesting of Semester 2 lectures (2010–2011). This data gathering has commenced and is due to complete on Friday 1st April 2011.

The first RISE Project Board met on Friday 1st April at the OU Library. Chaired by Project Sponsor Gill Needham, Associate Director of Library Services, the meeting was attended by Judith Pickering representing the DOULS project; Richard Nurse – RISE Project Director; Liz Mallett – RISE Project Manager; Paul Grand – RISE Technical Developer; Hassan Sheikh – Head of Library IT Systems Development; Judy Thomas – Learning and Teaching Librarian Team Leader (Science and Health & Social Care) representing the Improving the Student Experience programme; and Clari Gosling – Head of Faculty Team (Maths, Computing and Technology and OU Business School).

New secondary backup – temporary space to be created – following on a relocation of current Network storage facility

Database (MySQL) for MAGIC archive to extract attendance figures and GWT front end to be built

Asides: An inspiration from a JORUM user: http://open.jorum.ac.uk/xmlui/handle/123456789/2167 Dr Patrick J O’Malley from Physics at the University of Manchester records his voice and screen, most likely using Camtasia.

Richard and Liz gave a presentation outlining the objectives of the RISE Project and gave some background on other projects in the JISC Activity Data programme. Here’s the presentation: RISE background for project board mtg 2011 04–01

7 March 2011 – Open Planning meeting with Synthesis Project


Project aims and hypothesis were described to Tom Franklin from the Synthesis Project: a full AGtivity project review http://www.franklin-consulting.co.uk

Liz gave her Project Manager’s report on work package progress to date and budget status. Judith gave some suggestions as to how the RISE Google gadget might be promoted alongside the other OU gadgets, and informed the group of some gadgets that were being discussed to provide recommendations which may

See previous blog on hypothesis etc.


dovetail with what RISE is doing.

Getting this particular bit of data from our Academic Planning department was straightforward. An anonymised list of all 2010 graduating students, along with their programme of study and degree classification, and most importantly their unique id number, was duly provided.

There was a discussion among the group as to how best to promote the recommender tool (MyRecommendations) in order to ensure we get some useful feedback on top of the student interviews which are planned. One suggestion was to publish a library website news item. There was some concern that this might confuse students as there is already a news item on the website about the new One-Stop search. Therefore care needs to be taken with timing. Other suggestions were to target particular groups of modules and also talk to Student Services about the best direction in which to promote.

Hurdle one - Our security system does indeed log and track student movement through the use of the swipe card entry system, but we are unable to get the system to report on this. All data from the system is archived by the system supplier and is subsequently not readily available to us. This means that entry into the Learning Resource Centres is not going to be something we can report upon on this occasion.

Paul gave a demonstration of the MyRecommendations interface. The board members were impressed by how much had been achieved in such a short time. The search functionality was shown, demonstrating the search results, the resulting recommendations and the ratings functionality. It was commented that the interface was clear, attractive and easy to understand. A decision was made to pass a parameter to the EBSCO API so that the results list only shows those items where full text is available.

Hurdle two - Our Network team systematically delete student network accounts upon graduation, which means the record that links an individual’s unique student ID number, Athens account, security number and Library barcode is not available for the students whose library usage we wished to analyse! There were about 4,500 students who graduated from LJMU in 2010 with undergraduate degrees, but unfortunately, by the time I got to speak to the network manager, 3,000 of these had been deleted, as is our institutional practice and policy.

It was mentioned that there is some user testing on new tools happening imminently on campus, and a suggestion was made that RISE might use this as an opportunity to do some extra RISE testing if there are any free slots in the schedule for April.

The upshot of all this is that we are only going to be able to provide data for a third of the potential students that we could have provided data for if we had thought to ask these questions earlier on. But at least we are still able to contribute.

There was a discussion about Privacy issues. The RISE Privacy Statement is written and includes an “opt-out” button which allows users to indicate that they would not like their search data to be included in the recommender system.

Focus Groups – I am hoping that the organisation and co-ordination of some student focus groups will be more fruitful, but early indicators suggest that the timing of this is not particularly good as we are now in a reading week which will be followed by end of semester exams and coursework submissions, along with an Easter Bank Holiday weekend and Royal Wedding to be squeezed in. In effect, this is the busiest time of the year for our students. However, we have a great relationship with our student union and they are normally very helpful and responsive so I am hoping we will have something organised very soon.

A number of suggestions were made as to where to link to the privacy statement from. These were:
- Link from the Policies page on the Library website.
- Link on the EBSCO Discovery Service page.
- Link from SFX.

The RISE Privacy Statement sits alongside and is additional to the main OU Website Privacy statement (http://www8.open.ac.uk/about/main/admin-and-governance/policies-and-statements/website-privacy-the-ou).

What would we do differently? — the lessons learnt in this instance are to do with internal partnerships and communication. When first approached about the project we thought that we had asked the right questions of the right people within the University. However, it is obvious to us now that we should have made sure that we discussed our plans in more detail with the Head of Networks and the Head of Security as they are our means of access to two of the key systems that we require in order for us to obtain the required data. Discussions with key stakeholders are of the utmost importance as they highlight local practices and procedures as well as potential difficulties with systems and contracts (as is the case with our security system)

Our next Project Board meeting will be in June 2011.

Initial hurdles – the LJMU experience Apr 5, 2011 09:33AM Energised by the initial project team meeting, the LDIP team at LJMU set about gathering the required data in order to make our contribution to the project. Having already had a few discussions we were fairly confident that we would be able to gather the required data. We had access to student records and numbers, Athens data, library usage data from Aleph and we were aware that our security system (gate entry into the library) kept a historic record of each individual’s entry and exit into the buildings which are serviced through the swipe card entry system. We just needed to pull all this together through the unique student number.

On a positive note all our stakeholders are excited to be involved in the project and do wish that we could provide more data. Our networks manager has already indicated that he would be happy to delay future network account deletions if we wanted to obtain similar data for our 2011 graduates. To sum up, an interesting couple of weeks at LJMU in our quest to get the LIDP data, and I hope that this post brings with it a few words to the wise……


Project update March

next steps. Over the next few weeks we will be letting people test MyRecommendations, organising our evaluation sessions, doing a few tweaks to the recommendations system and working on the Google Gadget.

Apr 4, 2011 03:06PM MyRecommendations The key achievement for March has been that we now have our RISE MyRecommendations website up and running on the ‘live’ server. It’s a major step for the project as although there is stuff going on in the background it isn’t until you get the search screen working and functioning that you get a real sense of the progress.

Surfacing the Academic Long Tail — Announcing new work with activity data Mar 31, 2011 09:36AM

A screenshot of the front page is shown below. This uses our standard corporate style (with appropriate links to the RISE project and JISC) and provides a search box for our Ebsco Discovery Solution.

We’re pleased to announce that JISC has funded us to work on the SALT (Surfacing the Academic Long Tail) Project, which we’re undertaking with the University of Manchester, John Rylands University Library.

The search page has a suggestions feature, so as you start to type your search into the search box, it will offer you suggested search terms based on previous search terms seen by the system.
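A suggestions feature of this kind can be approximated by prefix-matching what has been typed against a log of earlier searches and ranking by how often each term has been seen. The sketch below is an assumption about the general approach, not the RISE implementation; the sample searches and the function names are invented.

```python
from collections import Counter

# Hypothetical log of earlier searches seen by the system.
previous_searches = [
    "climate change",
    "climate policy",
    "climate change",
    "clinical trials",
    "Climate Change",
]


def build_index(searches):
    """Count how often each normalised search term has been seen."""
    return Counter(term.strip().lower() for term in searches)


def suggest(index, typed, limit=5):
    """Return earlier search terms starting with what has been typed so far,
    most frequent first, ties broken alphabetically."""
    prefix = typed.strip().lower()
    if not prefix:
        return []
    matches = [(term, count) for term, count in index.items()
               if term.startswith(prefix)]
    matches.sort(key=lambda pair: (-pair[1], pair[0]))
    return [term for term, _ in matches[:limit]]


index = build_index(previous_searches)
suggestions = suggest(index, "cli")
print(suggestions)
```

A linear scan is fine for a small log; a larger service would more likely keep the terms in a sorted list or trie so that each keystroke does not touch every stored term.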

Over the next six months the SALT project will be building a recommender prototype for Copac and the JRUL OPAC interface, which will be tested by the communities of users of those services. Following on from the invaluable work undertaken at the University of Huddersfield, we’ll be working with ten years+ of aggregated and anonymised circulation data

Once you have searched RISE you will start to get some recommendations being fed back to you. Where articles that result from the same search that you entered have already been looked at they will appear immediately below the search results

amassed by JRUL. Our approach will be to develop an API onto that data, which in turn we’ll use to develop the recommender functionality in both services. Obviously, we’re indebted to the previous knowledge acquired by a similar project at the University of Huddersfield and the SALT project will work closely with colleagues at Huddersfield (Dave Pattern and Graham Stone) to see what happens when we apply this concept in the research library and national library service contexts.

from EDS as ‘People Searching for similar search terms often viewed’. If you are studying at the OU you should also see recommendations for articles that other people on your course have been looking at. We’ve put together a quick screencast to give you an idea of how the system works in practice.

MyRecommendations_prototype

Other activities

We’ve now met with the JISC-funded LUCERO and DOULS projects here at the Open University. With LUCERO we’ve been talking about their experience of releasing data openly. Although we are talking about different types of data in RISE it has provided us with some useful contacts within the institution. The discussion with DOULS covered two areas: firstly, whether their work with Google Gadgets would help us with the challenge of tackling authentication through the OU’s SAMS system; and secondly, whether our Google Gadget could be listed in the ones they will be recommending to users of the OU’s Google Apps/iGoogle environment. Unfortunately, DOULS authentication is linked in with the OU’s Moodle Virtual Learning Environment so doesn’t help RISE with authentication. But yes we can promote our Gadget through their list.

Our overall aim is that by working collaboratively with other institutions and Research Libraries UK, the SALT project will advance our knowledge and understanding of how best to support research in the 21st century. Libraries are a rich source of valuable information, but sometimes the sheer volume of materials they hold can be overwhelming even to the most experienced researcher, and we know that researchers’ expectations of how to discover content are shifting in an increasingly personalised digital world. We know that library users, particularly those researching niche or specialist subjects, are often seeking content based on a recommendation from a contemporary, a peer, colleagues or academic tutors. The SALT Project aims to provide libraries with the ability to provide users with that information. Similar to Amazon’s ‘customers who bought this item also bought…’, the recommenders on this system will appear on a local library catalogue and on Copac and will be based on circulation data which has been gathered over the past 10 years at The University of Manchester’s internationally renowned research library.
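One simple way to get ‘people who borrowed this also borrowed’ suggestions out of circulation data is to count how often two items appear in the same borrower’s history. The sketch below illustrates that co-occurrence idea only; it is not the SALT API, and the borrower and item identifiers are invented (real data would be anonymised and far larger).

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical anonymised circulation records: (borrower, item).
loans = [
    ("u1", "A"), ("u1", "B"), ("u1", "C"),
    ("u2", "A"), ("u2", "B"),
    ("u3", "A"), ("u3", "C"),
    ("u4", "A"), ("u4", "B"),
]


def build_cooccurrence(loan_records):
    """Count, for every pair of items, how many borrowers took out both."""
    items_by_borrower = defaultdict(set)
    for borrower, item in loan_records:
        items_by_borrower[borrower].add(item)
    counts = defaultdict(lambda: defaultdict(int))
    for items in items_by_borrower.values():
        for a, b in combinations(sorted(items), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(counts, item, limit=3):
    """Items most often borrowed alongside `item`, strongest link first."""
    related = counts.get(item, {})
    ranked = sorted(related.items(), key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in ranked[:limit]]


cooccurrence = build_cooccurrence(loans)
recommendations = recommend(cooccurrence, "A")
print(recommendations)
```

Raw counts favour items that are popular everywhere, so a service interested in the long tail would probably want to normalise by each item’s overall loan count or apply a minimum threshold; that is left out here to keep the sketch short.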

Following up on Google Gadget authentication we’ve been talking to the corporate IT and SocialLearn teams at the OU. So we’ve now worked out a method of handling the authentication and have documented what we want the Gadget to do. We should be starting the build of the Gadget in the next week.

How effective will this model prove to be for users, particularly humanities researchers?

Our application for the evaluation work we plan to do with students has been submitted into the Student Research Project Panel. This is a formal OU process that checks that the surveying and research is appropriate. Once approved we will then be able to get details of students that we can contact to ask to test the system.

Here’s what we want to find out: Will researchers in the field of humanities benefit from receiving book recommendations, and if so, in what ways? Will the users go beyond the reading list and be exposed to rare and niche collections — will new paths of discovery be opened up?

We’ve also had our first Project Board meeting which we will cover in another blog post. It was good to get the chance to show a demonstration of MyRecommendations and agree some of the


Will collections in the library, previously undervalued and underused, find a new appreciative audience? Will the Long Tail be exposed and exploited for research?

For anyone who doesn’t know, the project hypothesis states that:

“There is a statistically significant correlation across a number of universities between library activity data and student attainment”
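To make the hypothesis concrete, here is a sketch of one way such a correlation might be measured: Pearson’s r with its accompanying t statistic. The choice of Pearson’s r is an assumption for illustration (the project does not say here which test it will use, and a rank-based measure may suit degree classifications better), and the loan counts and marks are invented.

```python
import math

# Hypothetical data: loans per student and final mark (%), invented for illustration.
loans = [10, 20, 30, 40, 50]
marks = [50, 60, 65, 60, 65]


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    if abs(r) == 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))


r = pearson_r(loans, marks)
t = t_statistic(r, len(loans))
print(round(r, 4), round(t, 4))
```

Whether the result counts as statistically significant depends on comparing t with the Student’s t distribution for n - 2 degrees of freedom at the chosen significance level, which needs a statistics package or a table and is not done here. A significant r would still say nothing about cause, which is the point the following paragraphs go on to make.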

Will researchers see new links in their studies, possibly in other disciplines?

The first obvious thing here is that we realise there are other factors in attainment! We do know that the library is only one piece in the jigsaw that makes a difference to what kind of grades students achieve. However, we do feel we’ll find a correlation in there somewhere (ideally a positive one!). Having thought about it beyond a basic level of “let’s find out”, the more I pondered, the more extra considerations leapt to mind!

We also want to consider if there are other potential beneficiaries. By highlighting rarer collections, valuing niche items and bringing to the surface less popular but nevertheless worthy materials, libraries will have the leverage they need to ensure the preservation of these rich materials. Can such data or services assist in decision-making around collections management? We will be consulting with Leeds University Library and the White Rose Consortium, as well as UKRR in this area.

Do we need to look at module level or overall degree? There are all kinds of things that can happen that are module specific, so students may not be required to produce work that would link into library resources, but still need to submit something for marking. Some modules may be based purely on their own reflection or creativity. Would those be significant enough to need noting in overall results? Probably not, but some degrees may have more of these types of modules than others, so could be worth remembering.

And finally, as part of our sustainability planning, we want to look at how scalable this approach might be for developing a shared aggregation service of circulation data for UK University Libraries. We’re working with potential data contributors such as Cambridge University Library, University of Sussex Library, and the M25 consortium as well as RLUK to trial and provide feedback on the project outputs, with specific attention to the sustainability of an API service as a national shared service for HE/FE that supports academic excellence and drives institutional efficiencies.

My next thought was how much library resource usage counts as supportive for attainment. Depending on the course, students may only need a small amount of material to achieve high grades. Students on health sciences/medicine courses at Huddersfield are asked to work a lot at evidence based assignments, which would mean a lot of searching through university subscribed electronic resources, whereas a student on a history course might prefer to find primary sources outside of our subscriptions. 

Focus Group Mar 31, 2011 03:31AM

On top of these, there are all kinds of confounding factors that may play with how we interpret our results:

We have established a user Focus Group with six core partners in order to explore the potential of the recommendation service and its impact on repository users. The institutions involved are:

What happens if a student transfers courses or universities, and we can’t identify that?

- Aberystwyth University
- Bangor University
- University of Wales, Trinity Saint David
- University of Glamorgan
- University of Wales Institute Cardiff
- University of Wales, Newport

What if teaching facilities in some buildings are poor and have an impact on student learning/grades? Maybe a university has facilities other than the library through the library gates and so skews footfall statistics? How much usage of the library facilities is for socialising rather than studying?

The first Focus Group meeting will take place on 7th June at Aberystwyth University.

Certain groups of students may have an impact on data, such as distance learners and placement students, international students, or students with any personal specific needs. For example some students may be more likely to use one specific kind of resource a lot out of necessity. Will they be of a large enough number to skew results?

Along with repository managers, we have also identified and invited a small ‘cross-section’ of users from these institutions, i.e. undergraduate and postgraduate students, an academic, a researcher and an administrative staff member. An evaluation consultant will also be attending the meeting. The recommendation service software will have been deployed at the six institutional repositories by the end of May and the service will be demonstrated and evaluated during the meeting.

Some student groups are paid to attend courses and may have more incentive to participate in information literacy related elements e.g. nurses, who have information literacy classes with lots of access to e-resources as a compulsory part of their studies.

Hypothesis musings. Mar 24, 2011 10:07AM

A key thing emerging here is that lots of resource access doesn’t always mean quality use of materials, critical thinking, good writing skills… And even after all this we need to think about sample sizes – our samples are self-selected, and involve varying

Since the project began, I’ve been thinking about all the issues surrounding our hypothesis, and the kind of things we’ll need to consider as we go through our data collection and analysis.


sizes of universities with various access routes to resources. Will these differences between institutions be a factor as well? All we can do for now is take note of these and remember them when we start getting data back, but for now I set to thinking about how I’d revise the hypothesis if we could do it again, with what is admittedly a tiny percentage of these issues considered within it: “There is a statistically significant correlation between library activity and student attainment at the point of final degree result”. So it considers library usage overall, degree result overall, and leaves a lot of other factors to think about while we work on our data!


Profile for Helen Harrop

JISC AD MakePDF: 10 May 2011  

JISC Activity Data project blogs and #jiscad tweets
