JISC AD Tabbloid: 16 May 2011 by Helen Harrop

16 May 2011

Today’s Tabbloid PERSONAL NEWS FOR helen.harrop@sero.co.uk

JISCAD - TWITTER SEARCH

many thanks for all the #cilipw11 tweet comments & questions! will try & collate and answer them on the #jiscad #lidp project blog :-)

#jiscad Library Impact Data Project currently being presented on at #cilipw11 by @daveyp. See http://j.mp/kaXMoo for presentation

MAY 13, 2011 12:19P.M.

MAY 13, 2011 11:17A.M.

JISCAD - TWITTER SEARCH

Now doing project with partners (tags #lidp #jiscad ) Looking at 5 years of data to identify trends. Interesting results. #cilipw11

JISCAD - TWITTER SEARCH

2011 looks set to be the year of the Cook Book Metaphor: http://bit.ly/j9bLta and http://bit.ly/kHkJLG #opendata #jiscad

MAY 13, 2011 11:18A.M.

MAY 12, 2011 05:26P.M.


JISCAD - TWITTER SEARCH

Further details of how to find and use the Gadget are included on our new Search Interfaces page. This page provides details of both the main RISE search interface at http://library.open.ac.uk/rise and the Google Gadget.

RISE project Google Gadget now available http://bit.ly/k97KOa and new Search Interfaces page added http://bit.ly/kZguBm #jiscad #ourise

JISCAD - TWITTER SEARCH

Phew — finished exporting & munging Hudders #jiscad #lidp data for @librarygirlknit to analyse. Usage data for a total of 46,575 graduates!

MAY 12, 2011 12:25P.M.

MAY 12, 2011 08:47A.M. RISE

Google Gadget and Search Interfaces page

ACTIVITY DATA

MAY 12, 2011 12:23P.M.

Information Commissioner’s Office publishes UK code of practice on data sharing

MAY 11, 2011 09:45A.M.

Those of you thinking of sharing or publishing personal data as a result of these projects may be interested in the “Data sharing code of practice” from the Information Commissioner’s Office. It runs to a mere 59 pages and is available as a PDF from the Information Commissioner’s Office. A few quotes may give you a little of the flavour:

The RISE prototype Google Gadget is now available for use. This is a Google Gadget version of the main RISE interface that allows you to search our One-Stop e-resources service and see recommendations provided by RISE.

“As I said in launching the public consultation on the draft of this code, under the right circumstances and for the right reasons, data sharing across and between organisations can play a crucial role in providing a better, more efficient service to customers in a range of sectors – both public and private. But citizens’ and consumers’ rights under the Data Protection Act must be respected.”

It can be downloaded from the Google Gadgets directory here, or added by manually adding this link into the Add Stuff > Add feed or Gadget feature on your iGoogle desktop.

“Organisations that don’t understand what can and cannot be done legally are as likely to disadvantage their clients through excessive caution as they are by carelessness.”

The first time you use the Gadget it will ask you to sign in to the Open University using your computer login (external users can create a computer login and will be able to see search results and recommendations but won’t be able to connect to licensed resources).

“the code isn’t really about ‘sharing’ in the plain English sense. It’s more about different types of disclosure, often involving many organisations and very complex information chains; chains that grow ever longer, crossing organisational and even national boundaries.”

• What risk does the data sharing pose? ....

The code covers activities such as:

• Could the objective be achieved without sharing the data or by anonymising it? [my emphasis] It is not appropriate to use personal data to plan service provision, for example, where this could be done with information that does not amount to personal data.

• two departments of a local authority exchanging information to promote one of the authority’s services;

• a school providing information about pupils to a research organisation;

• Do I need to update my notification?

By ‘data sharing’ we mean the disclosure of data from one or more organisations to a third party organisation or organisations, or the sharing of data between different parts of an organisation. Data sharing can take the form of:

• Will any of the data be transferred outside of the European Economic Area (EEA)?

• a reciprocal exchange of data;

Whilst consent will provide a basis on which organisations can share personal data, the ICO recognises that it is not always achievable or even desirable.

• one or more organisations providing data to a third party or parties;

If you are going to rely on consent as your condition you must be sure that individuals know precisely what data sharing they are consenting to and understand its implications for them. They must also have genuine control over whether or not the data sharing takes place.

• several organisations pooling information and making it available to each other;

• several organisations pooling information and making it available to a third party or parties;

It goes on to say where consent is most appropriate and what other conditions allow sharing (p14-15), with some examples of what is permissible.

• different parts of the same organisation making data available to each other.

The general rule in the DPA is that individuals should, at least, be aware that personal data about them has been, or is going to be, shared – even if their consent for the sharing is not needed.

When we talk about ‘data sharing’ most people will understand this as sharing data between organisations. However, the data protection principles also apply to the sharing of information within an organisation – for example between the different departments of a local authority or financial services company.

The Data Protection Act (DPA) requires organisations to have appropriate technical and organisational measures in place when sharing personal data.

When deciding whether to enter into an arrangement to share personal data (either as a provider, a recipient or both) you need to identify the objective that it is meant to achieve. You should consider the potential benefits and risks, either to individuals or society, of sharing the data. You should also assess the likely results of not sharing the data. You should ask yourself:

followed by lots of useful guidance on this area covering both physical and technical security

• What is the sharing meant to achieve? ...

• What information needs to be shared? ....

It is good practice to have a data sharing agreement in place, and to review it regularly, particularly where information is to be shared on a large scale, or on a regular basis.

• Who requires access to the shared personal data? .....

and outlines what should be covered by the agreement (p25)

• When should it be shared? ....

it is good practice to carry out a privacy impact assessment.

• How should it be shared? ....

Agree common retention periods and deletion arrangements for the data you send and receive.

• How can we check the sharing is achieving its objectives? ....


Things to avoid

usual.

• Misleading individuals about whether you intend to share their information.

So anyway onto the round up of the latest happenings in the world of #JISCAD ... if you look beyond my ‘curses to all technology’ headline tweets you will be treated to a rather uplifting post from the #JISCSALT project blog which reports a positively excited response from users to the prospect of academic library recommendations ... I don’t have a data server but if I did then I would print out their article and tape it above :)

• Sharing excessive or irrelevant information about people.

• Sharing personal data when there is no need to do so.

• Not taking reasonable steps to ensure that information is accurate and up to date before you share it.

The #OURISE project have been dipping their toes into the robust anonymisation pool and delving beyond into the technical depths to look at how they will release their recommender data openly. They’re looking for feedback as to the most useful format for the data they release but their current thinking is both XML and as a MySQL database. They’re also soliciting feedback on their XML record format (which is based on the one developed by Mark van Harmelen as part of the MOSAIC project) so it looks like we have the makings of another ‘recipe’ emerging for our cookbook.

• Using incompatible information systems to share personal data, resulting in the loss, corruption or degradation of the data.

• Having inappropriate security measures in place.

Section 14 is on data sharing agreements (pp41-3).

Section 15 provides a data sharing checklist (p46).

The #OURISE project have also shared some useful information regarding how they’re making use of Google analytics to segment the behaviour of their users. And as if that wasn’t enough, they reported that development of the RISE Google Gadget is complete and ready to be put through its paces in the user evaluations. I don’t have the authority to hand out gold stars but if I did then the RISE project team would get one this week ;-)

the case study on p 55 covers research using data from other organisations

ACTIVITY DATA

Tabbloid is dead, long live the tabloid

The rest of the stories in this week’s newspaper are from previous weeks and you’ll be glad to know that I won’t be treating you to a re-synthesis of those stories. Hopefully by next week technology will be behaving more co-operatively!

MAY 11, 2011 02:57A.M.

JISCAD - TWITTER SEARCH

oh fruitloops ... looks like issuu.com doesn't like the #jiscad MakePDF file I uploaded: http://bit.ly/kI3g86 #shakesfistattechnology

MAY 10, 2011 11:40P.M.

Technology has not been a friend of mine these past few weeks and after several attempts to resuscitate the failed weekly Tabbloid service I accepted defeat and looked for an alternative. So it is with great pleasure that I introduce you to the new weekly digest using FiveFilters’ open source MakePDF PDF newspaper maker. Alas, it rendered as a series of mostly blank pages when I uploaded it to Issuu.com so the battle is not quite won, but if you click on the image above then it will dynamically generate the PDF on the spot for you. I’ll also be emailing a copy out as


JISCAD - TWITTER SEARCH

the fivefilters open source pdf newspaper maker has come to the rescue: http://bit.ly/F6NQp #goodbyetabbloid #jiscad

Planning my jaunt to Llandrindod for #cilipw11 where I'll be talking about #jiscad #ildp

MAY 10, 2011 05:06P.M.

MAY 09, 2011 02:00P.M.

JISCAD - TWITTER SEARCH

SALT - SURFACING THE ACADEMIC LONG TAIL

Just had team meeting with Tom Franklin for #jiscsalt #jiscad. Ended up visualisations. See 'visualcomplexity.com' Nice: http://bit.ly/V6bdE

janinerigby

MAY 09, 2011 10:05A.M.

I’ll admit it, I’m prepared to out myself: I’ve just finished a postgraduate research degree and more than once I have used the Amazon book recommender. In fact when I say more than once, possibly over the course of my studies we’ll be getting into double figures. I’m not ashamed (I may be about using Wikipedia, but let’s not go there), because I did and so did many of my peers. There may be more traditional methods to conduct academic research, but sometimes, with a deadline looming and very little time for a physical trip to the library to speak to a librarian, finding resources in one or two clicks is just too attractive. My hunch is many other scholars also use this method to conduct research.

Recently on another Copac project we facilitated some focus groups. The participants in the groups were postgraduate researchers, a mix of humanities and STEM. Some had used Copac before, others had not. Although the focus groups were answering another hypothesis, I couldn’t resist asking the gathered group if they would find merit in a book recommender on Copac which was based on 10 years of library circulation data from a world class research library. It’s not often you see a group of students become visibly excited at the thought of a new research tool, but they did that night. A book recommender would make a positive impact on their research practices and was greeted with enthusiasm from the group.

I thought it was worth mentioning this incident, because when the going gets tough, and we are drowning under data, it might be worth remembering that users really want this to happen.

MAY 10, 2011 03:28P.M.

JISCAD - TWITTER SEARCH

in other news, my @tabbloid epaper thingy has failed to arrive for a third week .. anyone recommend an alternative collater? #jiscad #drats

MAY 10, 2011 12:29P.M.


ACTIVITY DATA TO ENHANCE & INCREASE OPEN-ACCESS USAGE

Proposed RISE record XML format

AEIOU - The Business Case

Start <useRecordCollection> <useRecord>

MAY 09, 2011 08:58A.M.

The AEIOU project had an interesting visit from Tom Franklin on 21st April, who helped us to develop two technical ‘recipes’ for the Activity Strand cookbook. These still need to be refined but it was an informative exercise, particularly for me as project manager, as it helped me to understand the processes involved in software development.

Basic data: Institution, year and dates <from> <institution>Open University </institution>

Tom also took a look at the business case I am putting together and gave me some extremely useful advice about not trying to oversell the benefits as this weakens the message. I do need all the advice I can get given the current financial climate as it is not an easy task to convince a management team, trying to find ways to save money, of the benefits of a product. I suspect this will apply to all the projects.

RISE

Open Recommender data

MAY 06, 2011 02:31P.M.

</extractedOn>

One of the aspirations of the RISE project is to be able to release the data in our recommendations database openly. So we’ve been thinking recently about how we might go about that. A critical step will be for us to anonymise the data robustly before we make any data openly available and we will post about those steps at a later date.

<source>OURISE </source> </from>

Once we have a suitably anonymised dataset our current thinking is to make it available in two ways:

Resource data <resource>

<media>Article </media>

• as an XML file; and,

• as a prepopulated MySQL database.

The idea is that an XML file is most likely to be of use to people who are already working with activity data. For people who haven’t been using activity data and want to start using the code that we are going to be releasing for RISE, providing a base level of data may be a useful starting point. We’d be interested in thoughts from people working with this type of data about what formats and structures would be most useful.

XML format

For the XML format we’ve taken as a starting point the work done by Mark van Harmelen for the MOSAIC project and were fortunately able to talk to him about the format when he visited to do the Synthesis project ‘Recipes’ work. We’ve kept as close to that original format as possible but there are some totally new elements that we are dealing with, such as search terms, that we need to include. The output in this format makes the assumption that re-users of this data will be able to make their own subject, relationship and search recommendations by using the user/resource/search term relationship within the XML data.
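As a rough illustration of how a re-user might pull the user/resource/search term relationship out of records in this format, here is a minimal sketch using Python’s standard library. The sample document is assembled from the fragments shown in this post; the nesting of the elements is an assumption, elements whose values are not given in the post are left out, and the helper names are hypothetical rather than part of any RISE code.

```python
import xml.etree.ElementTree as ET

# Sample assembled from the fragments in this post. The nesting is an
# assumption; elements whose values are not given in the post are left out.
SAMPLE = """
<useRecordCollection>
  <useRecord>
    <from>
      <institution>Open University</institution>
      <source>OURISE</source>
    </from>
    <resource>
      <media>Article</media>
      <globalID type="ISSN">09410643</globalID>
      <author>Cyr, Andre</author>
      <title>AI-SIMCOG: a simulator for spiking neurons and multiple animats' behaviours</title>
    </resource>
    <context>
      <user>anonymised UserID</user>
      <sequenceNumber>1</sequenceNumber>
      <courseCode type="subject">Engineering</courseCode>
      <progression>UG2</progression>
    </context>
    <retrievedFrom>
      <searchTerm>artificial intelligence</searchTerm>
    </retrievedFrom>
  </useRecord>
</useRecordCollection>
"""


def text_of(parent, path):
    """Return the stripped text at `path` under `parent`, or None if absent."""
    node = parent.find(path)
    if node is None or node.text is None:
        return None
    return node.text.strip()


def read_use_records(xml_text):
    """Yield one (user, resource ID, search term) triple per useRecord."""
    root = ET.fromstring(xml_text)
    for record in root.iter("useRecord"):
        yield (
            text_of(record, "context/user"),
            text_of(record, "resource/globalID"),
            text_of(record, "retrievedFrom/searchTerm"),
        )


triples = list(read_use_records(SAMPLE))
print(triples)
```

Triples like these are the raw material from which a re-user could build their own relationship, course or search recommendations.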

or <globalID type="ISSN">09410643 </globalID>

<globalID type="EDSN">12345678 [Ebsco Accession number] </globalID>

<author>Cyr, Andre </author> <title>AI-SIMCOG: a simulator for spiking neurons and multiple animats’ behaviours </title>

</retrievedFrom>

End record, more records </useRecord>

<journalTitle>Nature </journalTitle>

<!-- more useRecords here if need be -->

We are interested in any feedback or comments on whether this format makes sense or would be useful or whether there are changes you think we should make. You can either leave a comment on the blog or email us at Rise-project

JISCAD - TWITTER SEARCH

SO what sort(s) of data do universities collect? http://bit.ly/keDxxI #jiscad

</resource> User context data <context>

MAY 05, 2011 11:46A.M.

<user> anonymised UserID </user> <sequenceNumber>1 [Note: sequence number already stored within database] </sequenceNumber>

JISCAD - TWITTER SEARCH

RT @richardn2009: RISE project blog April update http://bit.ly/mabobf #jiscad #ourise

</useDate> For students: [propose to map to a subject ] <courseCode type="subject">Engineering </courseCode>

MAY 04, 2011 02:32P.M.

<progression>UG2 [F, UG1, UG2, UG3, UG4, M, PhD1, PhD2, PhD3+ (F is for foundation year) ] </progression> For staff <progression>Staff </progression> </context> Retrieved from <retrievedFrom> <searchTerm>artificial intelligence </searchTerm>


JISCAD - TWITTER SEARCH

RISE project blog April update http://bit.ly/mabobf #jiscad #ourise

MAY 04, 2011 02:28P.M.

The three types of recommendation we are making (relationship, course and search) are all identified separately. Analytics tells us how many times each is being clicked on and also lets you segment the results with different options. So, for example, you can see how the behaviour of new and returning visitors differs.

JISCAD - TWITTER SEARCH

RT @lorcanD Have any libraries got the equivalent of vendor forums to allow their users discuss issues/solutions among themselves? [#jiscad]

MAY 04, 2011 01:54P.M.

We have also setup Google Analytics so we can see which recommendations are chosen by users from the list. See the screenshot below. Unsurprisingly the top recommendation is the most commonly used for course and relationship recommendations. But for search, the second recommendation is most commonly used. We’ll be doing some detailed analysis of what analytics tells us about the behaviour of users of the recommendations later in the project.

RISE

Project update April

MAY 04, 2011 01:37P.M.

RISE search interface

April was the month that saw a lot of the technical developments come together. The RISE search interface went ‘live’ at http://library.open.ac.uk/rise/ in the middle of the month just before Easter. We’ve managed to pick a fairly quiet time of year to launch it, but that’s the way things go with a short project, you can’t always pick the best time of year to launch. But since the launch we’ve already had over 300 page views and 95 unique visitors.

RISE Google Gadget

RISE and Google Analytics tracking

We are using Google Analytics to track use of the search tool. By using a Custom dimension we are able to track how many times each type of recommendation is being used, as you can see from the screenshot below.

We’ve also completed the creation of our RISE Google Gadget. This is a slightly cut-down version of the main RISE interface to fit into a gadget size but it includes most of the key features, as you can see from the screenshot below. In the main we have simply reduced the number of recommendations and search results that are shown to users. So users will see five search results rather than ten for example.

JISCAD - TWITTER SEARCH

my synthesis of last week's #jiscad blogging highlights: http://bit.ly/kX0U4u [sadly Tabbloid free this week]

There’s an abbreviated results description so you just see the first part of the search results title and we will be interested to see how much of a disadvantage that might be. When we tested the full interface with library staff initially we had some feedback about what information beyond the title they thought it would be useful to see. We have also dropped the ability to rate recommendations from the gadget as it was tricky to develop and likely not to be very intuitive to use.

MAY 04, 2011 01:35P.M.

ACTIVITY DATA

Set doors to manual

MAY 04, 2011 05:56A.M.

This week’s round-up of activity on the project blogs and twitter will have a distinctly rustic and hand-cranked feel because the Tabbloid service that usually does a mighty fine job of collating it all for me appears to be on strike.

First off I’d like to send a, slightly belated, message of congratulations to the OU RISE team who went live with their My Recommendations tool towards the end of April. It was also pleasing to see that they found Mark van Harmelen’s synthesis project visit a useful process to go through. It sounds like they’ve provided Mark with plenty of ‘food for thought’ for the synthesis Cookbook (don’t worry, there’s plenty more cookery-based puns where that came from).

The gadget still searches the Ebsco Discovery Solution in the same way and brings back results within the gadget, with the full text being shown in a new window. We’ve included links to the Privacy policy and some FAQs.

While I’m on the topic of the Cookbook it’s probably a good time to bring Mark’s explanation of the cookery metaphor to your attention. Tom Franklin kindly submitted a preliminary recipe for chocolate fudge brownies which hopefully you’ll all have a go at ... and I hasten to add that I will gladly volunteer for any user testing that you carry out on that particular recipe. Joking aside, we will be releasing and refining the ‘recipes’ over the coming months and your input will be much appreciated.

The gadget is currently being tested with library staff and will be made available for wider use very soon.

Evaluation and testing

Now the developments have been completed we are working more on the evaluation and testing stage. Our research with students has been approved by the appropriate panel and we’re expecting to be able to start contacting people for our evaluation shortly.

The AEIOU project team have been in a ponderous mood on their blog where they’ve been contemplating what will be the best tool to use for digesting and regurgitating data for their recommender service. The conclusion of their pondering appears to be an FLA soup served on a bed of SQL database (n.b. it’s *really hard* to drop the cookery metaphor once you start). The AEIOU project have been able to make use of a DSpace/EPrints patch that the PIRUS2 project released - to be honest the technical side of what they’re doing is slightly beyond me but it is good to see a project benefitting from a previous JISC project’s open innovation in this way.

The twitter feed for #jiscad has been quiet - which is understandable given the ratio of bank holidays to work days over the last couple of weeks. One of the (tangential) things I tweeted about that I think is worth highlighting again is the Digging into Data Challenge which is open for applications until 16 June 2011. There is also a free conference in June but unfortunately it’s in Washington DC - hopefully there will be a livestream or at least a good deal of tweeting that we can follow. The challenge has been covered on the Times Higher website where they highlight the challenge of technology as enabling researchers to navigate the vast ocean of data that technology is making available through digitisation etc. This places data in an ecosystem where the data needs to be made usable in order for a cycle of virtuosity to be unleashed.

ACTIVITY DATA

Free report from Martin Butler Research on Social Media Analytics Comparison

APR 27, 2011 03:13P.M.

There is a free report from Martin Butler on Social Media Vendor Analytics.

JISCAD - TWITTER SEARCH

Included in this brief report are profiles of eleven social media analytics solutions.

@daveyp I'm in a state of prechurning confusion. But it does at least look as if we're going to have some churnable material. #jiscad

• Alterian: Sophisticated offering suitable for medium to large organisations, with excellent sentiment analytics.

• Brandwatch: High quality social data and good sentiment analysis tools.

• Coremetrics: Very sophisticated solution with deep integration into IBM’s WebSphere technologies.

• IBM Social Media: Combination of services and solutions typically for large corporations.

• Lithium: Enterprise level solution with strong collaboration features.

• MutualMind: Closed loop model for quick conversion of information into action.

• NM Incite: Top-end offering from Nielsen and McKinsey. Many unique features - at a price.

• Radian6: Very rich visual analysis environment - just been acquired by Salesforce.

• SAS: Large, complex solution with opportunity to perform almost any kind of analysis.

• SocialSprout: Easy to use - ideal for agencies, small businesses and individuals.

• uberVU: Excellent all-round capability, providing a very cost effective solution.

• Viralheat: Very sophisticated analytics for a very modest price.

APR 28, 2011 02:05P.M.

JISCAD - TWITTER SEARCH

@mcleod you'll find some ppl via jisc work on activity data #jiscad & business intelligence, & in “learning analytics” #lak /cc @dajbelshaw APR 27, 2011 09:04P.M.

Report available at http://www.martinbutlerresearch.com/social-mediaanalytics/


JISCAD - TWITTER SEARCH

handle courses in MyRecommendations and specifically what we do when users change courses. We currently take a feed from our systems that tells us which courses a student is studying (and for the OU, where students study a module at a time, that can be several modules). Talking through things yesterday made it clearer that we need to be able to store the historical course data, so that the system keeps the link between the course the user is currently studying and the resources viewed. So we need to come up with a solution to this that doesn’t overwrite the courses a student was taking last year with the courses they are taking this year.
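One way to keep last year’s courses when this year’s feed arrives is to treat the course table as append-only, keyed on user, course and year. The sketch below uses an in-memory SQLite database to show the idea; the table, column and function names and the course codes are hypothetical and not taken from the RISE code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE enrolment (
        user_id TEXT NOT NULL,
        course_code TEXT NOT NULL,
        year INTEGER NOT NULL,
        PRIMARY KEY (user_id, course_code, year)
    )
    """
)


def load_course_feed(conn, year, feed):
    """Add this year's (user, course) rows without touching earlier years."""
    conn.executemany(
        "INSERT OR IGNORE INTO enrolment (user_id, course_code, year) "
        "VALUES (?, ?, ?)",
        [(user, course, year) for user, course in feed],
    )
    conn.commit()


def courses_for(conn, user, year):
    """Return the courses a user was studying in a given year."""
    rows = conn.execute(
        "SELECT course_code FROM enrolment "
        "WHERE user_id = ? AND year = ? ORDER BY course_code",
        (user, year),
    )
    return [row[0] for row in rows]


# Hypothetical feeds for two consecutive years.
load_course_feed(conn, 2010, [("u1", "COURSE-A")])
load_course_feed(conn, 2011, [("u1", "COURSE-B"), ("u1", "COURSE-C")])
print(courses_for(conn, "u1", 2010), courses_for(conn, "u1", 2011))
```

Because each load only inserts rows for its own year, a resource viewed in 2010 can still be linked to the course the user was studying at the time.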

OU RISE blog post on synthesis project visit http://bit.ly/hOtn40 - recipes for the Activity Data cookbook and xml formats #jiscad #ourise

It still leaves us with having to think carefully about how we handle searches for users who are studying multiple courses at a time, but we already knew that. That may well be something that is unique to the OU anyway. Our current thinking is that the number of students studying courses that are widely different is likely to be low; it is more likely that they would be studying related courses. In any case a student might well search for things entirely unrelated to their course. So our default would be to associate the searches with all the student’s courses and rely on the relevance ranking process to make sure that unrelated articles don’t appear high in the recommendations list. If testing finds this to be a problem then we could look at the approach Dave Pattern is suggesting in LIDP and use a threshold discounting one-off relationships.
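The default described above, crediting a selection to every course the student is taking and then ignoring one-off pairings, might be sketched as follows. The threshold value, the function name and the sample data are illustrative assumptions, not the RISE or LIDP implementation.

```python
from collections import Counter


def course_resource_counts(selections, courses_by_user, min_count=2):
    """Count (course, resource) pairs, dropping pairs seen fewer than min_count times.

    selections: iterable of (user, resource) pairs, one per resource chosen.
    courses_by_user: dict mapping a user to the list of courses they study.
    """
    counts = Counter()
    for user, resource in selections:
        # Credit the selection to every course the user is studying.
        for course in courses_by_user.get(user, []):
            counts[(course, resource)] += 1
    # Discount one-off relationships by applying the threshold.
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Hypothetical data: u1 studies two courses at once.
courses_by_user = {"u1": ["C1", "C2"], "u2": ["C1"], "u3": ["C2"]}
selections = [("u1", "r1"), ("u2", "r1"), ("u3", "r2")]
kept = course_resource_counts(selections, courses_by_user)
print(kept)
```

In this sample only the pairing of course C1 with resource r1 is seen twice, so it is the only one that survives the threshold.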

APR 21, 2011 09:24A.M.

RISE

Synthesis project visit 20th April APR 21, 2011 09:15A.M.

Data release formats

It was also a good opportunity to talk to Mark about XML data formats for the data we hope to release openly. Mark wrote the original MOSAIC project data collection guide which outlined an XML format for user activity data. For RISE we’ve revisited this format and tweaked it a bit to handle the e-resource data that we’re concerned with. There are a few things we would need to change about the course data and the resource information descriptions. Mark offered the really valuable insight that we only really needed to be able to provide user, resource and search term data. We didn’t need to make explicit recommendations within the data we released as people could use the data to build their own. That’s been really helpful and we are revising the draft format and plan to post it on here in the near future and talk to people about it.

Activity data cookbook

The RISE team had a really good session yesterday working with Mark van Harmelen from Hedtek to go through and develop a series of ‘recipes’ for the Activity Data ‘cookbook’. These recipes describe the processes involved in the software that we have created to handle activity data. Over the course of three hours we managed to get five processes down on paper, covering the three types of recommendations we are making in RISE, the processes for parsing the EZProxy log files and the process we use to get course information into MyRecommendations. We actually found it to be quite a good way of describing and documenting what the project software is doing. It seemed to be a bit easier to do than we’d expected and it was certainly a useful discipline to have to explain to someone from outside the project how things worked.
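As an illustration of what a log-parsing recipe might involve, the sketch below pulls the user, timestamp and requested URL out of one proxy log line. It assumes lines in the NCSA common log format (`%h %l %u %t "%r" %s %b`); the actual log format used by RISE is not given here, the sample line is invented, and the function name is hypothetical.

```python
import re
from datetime import datetime

# Assumed layout: host ident user [timestamp] "METHOD url PROTOCOL" status bytes
LINE_RE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_line(line):
    """Return a dict of the fields of interest, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "user": m.group("user"),
        "when": datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "url": m.group("url"),
        "status": int(m.group("status")),
    }


# Invented sample line, for illustration only.
sample = (
    '192.0.2.1 - user123 [20/Apr/2011:10:15:30 +0100] '
    '"GET http://example.org/article/1 HTTP/1.1" 200 5120'
)
record = parse_line(sample)
print(record)
```

Lines that do not match the assumed format return `None`, so a real recipe would need to decide whether to skip or log them.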

It also provided a useful challenge to us as it uncovered at least one issue that we need to do some more thinking about. This relates to how we


ACTIVITY DATA

particular and noting their interest in ingredients: Firstly, Elizabeth David, doyenne of Mediterranean and French cookery who …

The cookery metaphor APR 21, 2011 08:19A.M.

“Born to an upper-class family, David rebelled against social norms of the day. She studied art in Paris, became an actress, and ran off with a married man with whom she sailed in a small boat to Greece. They were nearly trapped by the German invasion of Greece in 1940 but escaped to Egypt where they parted. She then worked for the British government, running a library in Cairo. While there she married, but the marriage was not long lived. After the war, David returned to England, and, dismayed by the gloom and bad food, wrote a series of articles about Mediterranean food that caught the public imagination. Books on French and Italian cuisine followed, and within ten years David was a major influence on British cooking. She was deeply hostile to second-rate cooking and to bogus substitutes for classic dishes and ingredients. She introduced a generation of British cooks to Mediterranean food hitherto barely known in Britain, such as pasta, Parmesan cheese, olive oil, salami, aubergines, red and green peppers, and courgettes.”

I had a great day today, visiting the Open University projects RISE and UCIAD. RISE has implemented three kinds of recommendations that appear on search pages and has rolled this out to library users. The recommendations are for serial items, and the three types of recommendation are based on

Wikipedia on Elizabeth David

1. Choices made from prior search results for the same search term

No mean feat, to change the eating habits of a country.

2. Choices made by users taking the same course

More to the point, I remember reading her cookbooks and enjoying lengthy discussions about good ingredients, a proper concern for a cookbook. The second author is Shizuo Tsuji, who wrote a standout work, Japanese Cooking: A Simple Art, and he properly devotes space to ingredients too. I’d show you pictures but it’s all copyrighted, so if you want to, scroll down at Google Books till you get to illustrations of ingredients.

3. What users went on to select after making the same choice as the current user’s last choice

These are shown on as many pages as they are applicable on, e.g. if the user is staff, 2 is not shown; if there are no results, nothing is shown.

UCIAD is developing the infrastructure to gather, and ontologies to reason over, activity data obtained from multiple sources of different kinds, e.g. web sites, blogs, library services and VLEs. Interestingly, the UCIAD trace ontology is being built from the bottom up while examining linked data created from the logs (using a KMi tool, NeOn). While visiting UCIAD and hearing more about it, I felt that this is an important longer-term project for activity data, in that the linked data approach will allow great flexibility over queries that are made over activity data and, via the ontology, the ability to infer user-centric information. For example, following up an interest of mine, we discussed how this approach could be leveraged to “find me people who are interested in learning the same kinds of things as I am.” (Answer: for general purpose uses, it needs a bit of technology to ‘understand’ text, certainly available in the future, we felt.)

So what’s an ingredient? Something you can use. In the UCIAD case, the first example is their UCIAD ontology. But an ingredient can be a data set too, like LIDP’s activity data that is collected from multiple institutions, or RISE’s database contents, or UCIAD’s triple store contents. And there are more. Back in Manchester, I thought about the NeOn toolkit (produced by the NeOn Project). This is a tool being used in the construction of UCIAD’s ontology. For a cookbook metaphor, what about tools, where do they fit? Clearly as cooking utensils, and of course my cookbook heroes do also talk about utensils. I remember Elizabeth David discussing copper omelet pans, and Tsuji writing about chopsticks for cooking tempura, thick in diameter so as to handle the tempura more gently than standard

While we were talking, I felt that the cookbook approach needed another component, an ingredient. I remember reading two cookbook authors in


diameter cooking chopsticks.

What I am hoping is that the process of eliciting and gathering recipes will elicit a good amount of data about the systems being developed. For example, it was revelatory to assist in recording three recipes for the three types of recommendations in RISE: by recording salient features of each of the three processes, they suddenly stood, at a conceptual level, in sharp relief to each other by virtue of differences explicated in the recipes.

Of course NeOn is in this class of objects: one uses it to prepare something, an ingredient, the UCIAD ontology. So basically what we are using here is a metaphor. It turns out that we can use the metaphor of cookbook, recipes, ingredients, utensils, and dishes (we shouldn’t forget dishes!) for some useful computer system concepts:

And that’s all folks, have a happy Easter break! ----

• Utensils – programs or scripts
• Ingredient – data, be it usage data or data that describes usage data (eg an ontology).

Footnote: For completeness from a computer scientist, but perhaps ignore this, an ingredient could also parameterise a process or specify how a program or script operates.

• Dish – something for the user to consume (or interact with)
• Recipes – description of a process whereby ingredients are transformed to (or change) another ingredient or produce some dish

But something to mull over (vegetarians excepted, and apologies to you):

• Cookbook – a bunch of descriptions of processes.

Pretty good, although we didn’t intend to have as complete an activity-data computer-system description as offered by the metaphor, but those canny cooks have thought of it all. Perhaps it’s a consequence of the nature of making something; perhaps we could have chosen any process of making something and used that as the metaphor. For example, we might have used flat-pack furniture construction instructions as the central part of the metaphor. Then in the metaphor we also have tools, bits of disassembled furniture, the piece of furniture assembled by following the instructions, and of course a collection of flat-pack assembly instructions. (Don’t ask, but yes, really, honest I lie not :) this is a believable metaphor, of course I keep all my flat-pack instructions just in case I find a need to disassemble and re-assemble in the proper order.) No, don’t believe me? Me neither! So it doesn’t take any consideration: the cooking metaphor is viable, perhaps because food is a feature of everyday life, and flat-pack assembly as a metaphor is plainly just weird, probably because those instructions are often close to useless, and, additionally, how often do we get new flat-pack furniture?

ACTIVITY DATA TO ENHANCE & INCREASE OPEN-ACCESS USAGE

Who What Why and When? APR 21, 2011 04:20A.M. How best to represent the activity data we’re gathering and passing around? Several projects (PIRUS2, OA-Statistics, SURE, NEEO) have already considered this and based their exchange of data (as XML) on the OpenURL Context Object - the standard was recommended in the JISC Usage Statistics Final Report. Knowledge Exchange have produced international guidelines for the aggregation and exchange of usage statistics (from a repository to a central server using OAI-PMH) in an attempt to harmonise any subtle differences. Obviously then, OpenURL Context Objects are the way to go but how far can I bend the standard without breaking it? Should I encrypt the Requester IP address and do I really need to provide the C-class Subnet address and country code? If we have the IP addresses we can determine subnet and country code. Fortunately the recommendations from Knowledge Exchange realised this and don’t require it.

The point to all of this meandering is that cookbooks fulfill, for the synthesis team, a way of looking at project outputs, surfacing information that will inform whatever synthesis we do. One of the interesting things about synthesis activities is that one can’t say where the activity ends up, much depends on the way the informing information flows together and the patterns it reveals. Of course one can say things like “we can do some work in architecture”, or “we can attempt to build a taxonomy of activity data based systems”, or “we can pull together different projects’ user feedback into recommendations”, but one can’t foretell precisely what will turn up, and what turns the synthesis will take.

So for the needs of this project where we’re concerned with a closed system within a National context, I think I can bend the standard a little and not lose any information. I can use an authenticated service. I also want to include some metadata - the resource title and author maybe. So here’s the activity data mapped to a Context Object


• Timestamp (Request time) Mandatory

ACTIVITY DATA TO ENHANCE & INCREASE OPEN-ACCESS USAGE

Recommendation service

• Referent identifier (The URL of the object file or the metadata record that is requested) Mandatory

APR 21, 2011 03:17A.M.

• Referent other identifier (The URI of the object file or the metadata record that is requested) Mandatory if applicable

Do I really want to write a SOAP service (SUSHI uses SOAP but doesn’t seem to have mature Java open-source clients/servers available for hacking!)? How about a REST service? This could be a neat solution using something like Apache CXF.

• Referring Entity (Referrer URL) Mandatory if applicable
• Requester Identifier (Request IP address - encrypted possibly!) Mandatory

Either of these would be great but, as I only have a few simple data requests to execute, I’ve decided to go for a quick and easy solution: Apache XML-RPC deployed in a servlet.

• Service type (objectFile or descriptiveMetadata) Mandatory

Behind this I’m using a MySQL database with Apache DBCP handling connections and queries. I was going to use the lightweight MyBatis data mapper framework (formerly known as iBatis) but again, as I’m only using a couple of queries, it isn’t really worth the overhead for the flexibility it provides.

• Resolver identifier (The baseURL of the repository) Mandatory
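As a rough illustration of the mapping listed above, here is a minimal Python sketch of one activity record. It is not the project's code (the posts describe a Java/DSpace set-up), and the class name `ActivityRecord`, the helper `pseudonymise_ip`, the salt and the example URLs are all hypothetical. Note too that the helper uses a salted one-way hash as a stand-in for the "encrypted possibly" requester IP; that is a different technique from reversible encryption.

```python
import hashlib
from dataclasses import asdict, dataclass
from typing import Optional

SERVICE_TYPES = ("objectFile", "descriptiveMetadata")


def pseudonymise_ip(ip_address: str, salt: str) -> str:
    """Salted SHA-256 digest of an IP address (one-way hash, not encryption)."""
    return hashlib.sha256((salt + ip_address).encode("utf-8")).hexdigest()


@dataclass
class ActivityRecord:
    # Mandatory fields
    timestamp: str       # request time
    referent_id: str     # URL of the requested object file or metadata record
    requester_id: str    # request IP address, pseudonymised
    service_type: str    # "objectFile" or "descriptiveMetadata"
    resolver_id: str     # base URL of the repository
    # Mandatory only if applicable
    referent_other_id: Optional[str] = None  # URI of the requested object
    referring_entity: Optional[str] = None   # referrer URL

    def __post_init__(self) -> None:
        mandatory = ("timestamp", "referent_id", "requester_id",
                     "service_type", "resolver_id")
        missing = [name for name in mandatory if not getattr(self, name)]
        if missing:
            raise ValueError(f"missing mandatory fields: {missing}")
        if self.service_type not in SERVICE_TYPES:
            raise ValueError(f"unknown service type: {self.service_type!r}")


# Hypothetical example values, for illustration only
record = ActivityRecord(
    timestamp="2011-04-21T04:20:00Z",
    referent_id="http://repository.example.ac.uk/bitstream/123/1/paper.pdf",
    requester_id=pseudonymise_ip("192.0.2.10", salt="example-salt"),
    service_type="objectFile",
    resolver_id="http://repository.example.ac.uk",
)
print(asdict(record))
```

Serialising such a record to the Context Object XML is left out here, since the exact element layout should come from the standard and the Knowledge Exchange guidelines rather than from a guess.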

ACTIVITY DATA TO ENHANCE & INCREASE OPEN-ACCESS USAGE

Consuming and Querying data

The test set-up is working so now all I’ve got to do is tidy it up, deploy the server as a service and deploy clients within the six DSpace institutional repositories. How long have I got?

APR 21, 2011 04:04A.M. So what service should I use? My first thoughts were to push data to a SQL database, then I thought of Solr. Solr is fast and efficient, it’s great for powerful full-text and faceted search, hit highlighting and rich document (e.g., Word, PDF) handling. So I pushed OpenURL Context Objects from DSpace to Solr and used simple queries to view the captured activity data.

ACTIVITY DATA TO ENHANCE & INCREASE OPEN-ACCESS USAGE

Hunting and Gathering data APR 21, 2011 02:59A.M.

Then I thought again. What data do I want returned from a recommendation service? I just want a few item handles and some metadata as suggestions to view alongside the current resource. I found I could do this using an SQL query on a test database but wasn’t sure if I could construct queries with inner joins using Solr. I’m not that familiar with Solr and couldn’t find what I wanted. A patch was available for the latest release that could do this but then again, maybe this isn’t one of Solr’s strengths ..or maybe I don’t have the right data structure.

The PIRUS2 project has conveniently produced a patch for DSpace (and EPrints) for capturing activity data and either making it available via OAI-PMH or pushing it to a tracker service. I’m grateful to Paul Needham from Cranfield, who gave me an insight into the architecture they were using. I patched the DSpace code and was soon making usage data available for harvesting. However, I wanted to avoid the hassle of harvesting via OAI-PMH, so I looked closer at the tracker code. This is a neat solution and uses Spring injection to create a listener on the DSpace Event service to capture downloads. With a little hacking to also capture item views, I created an AEIOU activity class. The beauty of this is that all that is required to update the DSpace code is a configuration of the Spring context (an XML file) and the addition of a Java jar file.

My thoughts turned back to basing a service on a SQL database.
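To make the SQL idea a little more concrete, here is a minimal sketch of a "people who viewed this item also viewed" query using an inner self-join. It is an assumption-laden illustration, not the project's query: it uses Python's built-in `sqlite3` with an in-memory database as a stand-in for MySQL, and the table names (`items`, `activity`), column names and handles are all hypothetical.

```python
import sqlite3


def build_test_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with made-up activity data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE items (handle TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE activity (requester TEXT, handle TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO items VALUES (?, ?)",
        [("hdl/1", "Paper one"), ("hdl/2", "Paper two"),
         ("hdl/3", "Paper three"), ("hdl/4", "Paper four")],
    )
    conn.executemany(
        "INSERT INTO activity VALUES (?, ?)",
        [("u1", "hdl/1"), ("u1", "hdl/2"), ("u1", "hdl/1"),  # u1 viewed hdl/1 twice
         ("u2", "hdl/1"), ("u2", "hdl/2"), ("u2", "hdl/3"),
         ("u3", "hdl/1"), ("u3", "hdl/4")],
    )
    return conn


def recommend(conn: sqlite3.Connection, handle: str, limit: int = 5):
    """Return (handle, title, viewers) for items viewed by requesters of `handle`."""
    sql = """
        SELECT b.handle, i.title, COUNT(DISTINCT b.requester) AS viewers
        FROM activity AS a
        INNER JOIN activity AS b
            ON a.requester = b.requester AND b.handle <> a.handle
        INNER JOIN items AS i
            ON i.handle = b.handle
        WHERE a.handle = ?
        GROUP BY b.handle, i.title
        ORDER BY viewers DESC, b.handle ASC
        LIMIT ?
    """
    return conn.execute(sql, (handle, limit)).fetchall()


conn = build_test_db()
suggestions = recommend(conn, "hdl/1", limit=2)
print(suggestions)
```

Counting distinct requesters rather than rows means repeat views by the same person do not inflate an item's score; whether that is the right weighting is a design choice the posts do not settle.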


JISCAD - TWITTER SEARCH

http://www.open.ac.uk/blogs/RISE/2011/04/19/recommendationtypes/ .

RT @LizMallett: OU RISE project MyRecommendations prototype live today. Have a play! http://library.open.ac.uk/rise #jiscad #ourise

If you are studying at the OU, or are a member of staff, here is how to use the full functionality:
1. Go to http://library.open.ac.uk/rise .
2. If you are studying with the OU you should immediately see some recommendations based on what others on your module have searched for.
3. Type in a keyword search. Try artificial intelligence for example, and then click Go.
4. MyRecommendations will search One-Stop and bring back the first 10 results. At the bottom of the list are some recommendations based on articles that people viewed when doing a similar search. You can browse through the One-Stop results and click any of them to look at the full text article (this opens up in a new window). The One-Stop results are already relevance-ranked by One-Stop so the most relevant results should be near the top.
5. Click New Search and try another search. When you see the results note that it now shows articles you’ve looked at recently.
6. If you choose one of the recommendations you will be asked to rate the recommendation so the system can learn how useful it is.

APR 20, 2011 06:44P.M.

RISE

MyRecommendations tool goes live today APR 20, 2011 04:10P.M.

Please send any comments or questions to the RISE mailbox at RiseProject@open.ac.uk, or complete the survey using the Feedback link on the interface.

The MyRecommendations tool is now live, at http://library.open.ac.uk/rise/. To use MyRecommendations you will need to have an Open University Computer User login [If you aren’t an OU student or staff member you can create a login to test the system. See the details at the end of this blog post].

Don’t have an OU Computer login? If you are not an OU student or member of staff, the first thing you’ll see will be an OU log in screen. You will need to create a login username and password in order to see the interface. To do this, go to the link on the right side of the screen which says:

Background
• MyRecommendations is a prototype system designed to test the hypothesis that “recommender systems can enhance the student experience in new generation e-resource discovery services”.
• It uses a simplified search interface to search EBSCO Discovery Service, which the OU Library is calling “One-Stop Search”.
• MyRecommendations searches the same content as One-Stop but has been set to search ‘full-text’ content only.
• When you first search you may not see many recommendations, but the system learns as it goes along and makes recommendations based on what articles are being looked at, so the more searches that it records and the more articles that are viewed the better the recommendations become.
• If you are studying with the OU at the moment you should see recommendations based on what other users on your module have searched. Additional functionality coming soon will allow the user to select which module they’re currently working on.
• Other recommendations include articles that people viewed after using the search term you have used; and people who looked at this article also looked at this article

New visitor? Create a free Open University account here. This will enable you to look at MyRecommendations, search the database, see the results and some of the recommendations. However, you won’t be able to access the actual e-resources as these are restricted by the terms of our licenses to OU students and staff.

What’s next for RISE

The project is also testing a Google Gadget version of the search system and we expect to be able to release this in the next few weeks.

A full explanation of the different kinds of recommendations given can be found at


JISCAD - TWITTER SEARCH

RT @librarygirlknit: Interesting blog post on DMU's data from @fulup http://bit.ly/fxjjCB #lidp #jiscad we are planning to look at courses in more detail...

my synthesis of last week's JISC Activity Data blogs and tweets: http://bit.ly/eAZf2D [warning: may contain @daveyp] #jiscad APR 20, 2011 11:42A.M.

APR 20, 2011 02:15P.M. JISCAD - TWITTER SEARCH

Interesting blog post on DMU's data from @fulup http://bit.ly/fxjjCB #lidp #jiscad we are planning to look at courses in more detail... APR 20, 2011 09:37A.M.

JISCAD - TWITTER SEARCH

the JISC-sponsored http://www.diggingintodata.org/ conference & challenge looks a tad exciting #jiscad [via http://oreil.ly/gXoGnC] APR 20, 2011 12:20P.M.

JISCAD - TWITTER SEARCH

Awaiting the arrival of Mark Van Harmelen from the JISC Activity Data Synthesis project http://tinyurl.com/3ryst9h #jiscad #ourise

JISCAD - TWITTER SEARCH

RT @iamcreative: my synthesis of last week's JISC Activity Data blogs and tweets: http://bit.ly/eAZf2D #jiscad <- An interesting week...

APR 20, 2011 09:36A.M.

APR 20, 2011 11:50A.M.


ACTIVITY DATA

Tabbloid #6: 18 Apr 2011

JISCAD - TWITTER SEARCH

APR 20, 2011 04:22A.M.

However, the data set that includes PC logins has a *huge* correlation between visits & logins! (Pearson cor=0.85 / pvalue=0) #lidp #jiscad

Open publication - Free publishing

There is so much activity data related aceness in this week’s Tabbloid that it’s hard to know where to begin ... << deep breath >> Dave Pattern tweeted about the gender differences he’d noticed in one of the #LIDP datasets: - females have a stronger book & grade correlation than males [src]; and - males have a stronger e-resource usage & grade correlation than the females [src].

APR 19, 2011 05:05P.M.

Serendipitously, Paul Bacsich shared a link to Elly Broos’ discussion paper: ‘Gender Perspective on e-learning and information sharing’ which adds some additional context and apparently is generating a good level of debate on the Instructional Technology Forum email list.

JISCAD - TWITTER SEARCH

@daveyp Possibly wide implications then? Or the need for some in depth work on usage of library space? #jiscad #lidp

The #LIDP project team have been having all sorts of fun with data this past week:
- Dave Pattern has swapped his self-proclaimed ‘shambrarian’ title for ‘shamistician’ and has been playing around on the ‘R Project for Statistical Computing’ website and also sharing some interesting graphs.
- De Montfort shared their guide to ‘stitching together library data with Excel’ which makes it look so simple that I’m almost tempted to have a go myself.

APR 19, 2011 05:03P.M.

The #JISCSALT project team shared their news that, somewhat surprisingly, extracting a sample of data from the LMS at John Rylands had been easier than expected, which bodes well for extracting the remaining 3.5 million records. Janine Rigby also shared her thoughts on what shape the user evaluation of their recommender tool will take: the plan is to gauge users’ attitudes to data privacy as much as their thoughts on the tool itself, and to gain a deeper understanding of the subtle hierarchies of trust and perception of value that are in play when users evaluate recommendations.

JISCAD - TWITTER SEARCH

@daveyp Third library lucky? #jiscad #lidp APR 19, 2011 05:02P.M.

Finally, a couple of ‘wider world’ links worth taking a look at:
- Dave Pattern (again, I know!) flagged up a virtual event being run by The National Federation of Advanced Information Services (NFAIS). The event was called ‘Information Access and Usage Behavior in Today’s Academic Environment’ and it’s worth taking a browse through their archived tweets from April 15th to eavesdrop on some of their interesting discussions - I particularly like the fact that one of the attendees managed to reference Flaubert.
- This Chronicle of Higher Education article on ‘higher education’s Netflix Effect’ profiles a ‘number-crunching provost’ in the US who has embraced the power of data to provide course recommendations to students. [spoiler alert: I’m giving them extra brownie points for including the word ‘zeitgeist’ at the end of the article]


JISCAD - TWITTER SEARCH

We've got library entry stats from 2 libraries now and neither shows a correlation between number of visits & final grade #jiscad #lidp

New blog post on RISE project recommendation types http://tinyurl.com/6drsmuk #ourise #jiscad APR 19, 2011 01:50P.M.

APR 19, 2011 05:02P.M.

RISE

Recommendation types

JISCAD - TWITTER SEARCH

APR 19, 2011 01:07P.M.

RT @mdaquin: finally written a bit about the (ongoing work of building) ontologies for #uciad #activitydata http://goo.gl/gstd0 #jiscad

RISE – Recommendation Types The RISE MyRecommendations search system is going to provide three types of recommendations:

JISCAD - TWITTER SEARCH

1. Course-Based “People on your course(s) viewed” This type of recommendation is designed to show the user what people studying their module have viewed recently. At the moment this only picks up the first module that a student is studying but we are planning a future enhancement that will include all the modules that are being studied with a feature to allow users to flag which module they are currently looking for resources for. The recommendations are generated by analyzing the resources most viewed by people studying a particular module.

RT @LizMallett: New blog post on RISE project recommendation types http://tinyurl.com/6drsmuk #ourise #jiscad

2. Relationship Based “These resources may be related to others you viewed recently” These recommendations are generated for a resource-pair. For example, if users commonly visit resource B after viewing resource A, the system considers this to be a relationship, and will recommend resource B to people viewing resource A in the future. As the system doesn’t host resources internally, it instead looks at a user’s previously viewed resources (most recent), and then checks for the most often viewed resources by users who’ve also viewed the same (most recent) resources.

APR 19, 2011 03:38P.M.

APR 19, 2011 01:54P.M.

3. Search Based “People using similar search terms often viewed” We have limited data on search terms used from the EZProxy logfiles, so we are using the searches carried out in MyRecommendations to build search recommendations. Using this we associate search terms used with the resources most often visited as a result of such a search. For example, if people searching for ‘XYZ’ most often visit the 50th result returned from Ebsco, this part of the recommendation algorithm will pick up on this. Hence in future when people search for ‘XYZ’, that particular result will appear top of the list of recommendations for users in a “People searching for similar search terms often viewed” section.
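The relationship-based and search-based ideas described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the RISE implementation: the function names and the input shapes (per-user view sequences, and pairs of search term and resource viewed) are hypothetical, and "similar search terms" is simplified here to an exact match after lower-casing and trimming.

```python
from collections import Counter


def _ranked(counts: Counter, top_n: int) -> list:
    """Most frequent first; ties broken alphabetically for stable output."""
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [resource for resource, _ in ordered[:top_n]]


def relationship_recommendations(view_sequences, current, top_n=3):
    """Resources most often viewed directly after `current`.

    view_sequences: one ordered list of resource ids per user.
    """
    followers = Counter()
    for sequence in view_sequences:
        for earlier, later in zip(sequence, sequence[1:]):
            if earlier == current and later != current:
                followers[later] += 1
    return _ranked(followers, top_n)


def search_recommendations(search_log, term, top_n=3):
    """Resources most often viewed after searching for `term`.

    search_log: iterable of (search_term, resource_viewed) pairs.
    """
    wanted = term.strip().lower()
    viewed = Counter(
        resource for logged_term, resource in search_log
        if logged_term.strip().lower() == wanted
    )
    return _ranked(viewed, top_n)


# Made-up data for illustration
sequences = [["A", "B", "C"], ["A", "B"], ["A", "C"], ["D", "A", "B"]]
log = [("xyz", "result-50"), ("XYZ", "result-50"),
       ("xyz", "result-01"), ("abc", "result-02")]

print(relationship_recommendations(sequences, "A"))
print(search_recommendations(log, "xyz"))
```

In this toy data, a result that sits far down the original result list still rises to the top of the recommendations once enough searchers have chosen it, which is the behaviour the ‘XYZ’ example describes.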


JISCAD - TWITTER SEARCH

finally written a bit about the (ongoing work of building) ontologies for #uciad #activitydata http://goo.gl/gstd0 #jiscad

LIBRARY IMPACT DATA PROJECT

Data analysis update APR 19, 2011 10:07A.M. Whilst we wait for all of the data from the project partners to arrive, Bryony and I have done a quick & dirty analysis of the data we’ve received so far. The good news (touch wood!) is that we’re still on track to prove the project hypothesis:

APR 18, 2011 05:10P.M.

“There is a statistically significant correlation across a number of universities between library activity data and student attainment”

ACTIVITY DATA

Mind maps of problems and solutions from Birmingham

The data we’ve looked at so far has a small Pearson correlation (in the region of -0.2) that has a high statistical significance (with a p-value of below 0.01).

APR 18, 2011 11:15A.M.

The reason we’re seeing a negative correlation is due to the values we’ve assigned to the degree results (1=first, 2=upper second, 3=lower second, 4=third, etc).
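The effect of the grade coding on the sign of the correlation can be seen with a small Python sketch. The numbers below are invented purely for illustration and are not LIDP data; the `pearson` helper is a plain textbook implementation rather than whatever tool the project used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Invented example: book loans per student and degree result
# coded as in the post (1=first, 2=upper second, 3=lower second, 4=third)
loans = [40, 25, 30, 10, 12, 3]
grade_code = [1, 1, 2, 2, 3, 4]

r_coded = pearson(loans, grade_code)

# Reversing the coding so that a higher number means a better degree
# flips the sign but leaves the strength of the relationship unchanged.
grade_score = [5 - code for code in grade_code]
r_reversed = pearson(loans, grade_score)

print(r_coded, r_reversed)
```

So a negative coefficient under this coding means that higher usage goes with better degree results, which is the direction the hypothesis expects.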

We’ve built MindMeister mind maps of all the impediments and solutions that were surfaced at the Birmingham programme inception meeting.

We suspect one of the reasons for the small Pearson correlation is the level of non & low usage (which is something we’ve looked at previously in Huddersfield’s data). Within each degree level, there are sizeable minorities of students who either never made use of a library service (e.g. they never borrowed any books) or who only made low use (e.g. they borrowed less than 5 books), and it’s this which seems partly responsible for lowering the Pearson correlation. However, the data shows that:

The main map shows all identified impediments and solutions. You can jump to a project specific map for the project that contributed a particular item or solution by clicking on a node (that contains a small arrow).

• students who gained a first are less likely to be in that set of non & low users than those who gained a lower grade
• students who gained the highest grades are more likely to be in the set of high library usage than those who gained lower grades

Some interesting items were potential sources of impediment/solution crossover, as in these (clickable) screenshots

• Visualisation crossover


There are also a gratifying number of known solutions in the data: There were about 20 known solutions vs 29 impediments/problems.

• ‘This article is like that article’ crossover (which is very interesting to me outside of the Activity Data Programme for an e-learning application)

Please add a comment below if you see any similarities and patterns; this is very interesting for our synthesis activities (and, who knows, may lead to active project collaboration now or in the future).

Project mind maps are
• Aberystwyth AEIOU Activity data to Enhance and Increase Open-access Usage Wales
• Cambridge EVAD Exposing VLE activity data
• Edinburgh OpenURL Using OpenURL Activity Data

I can’t but feel that there are some interesting scalability lessons to be learned, though at a guess most are outside of the current timeframe. I note here that UCIAD is somewhat specialist, with triple store based experience; however, I believe that there is a growth path for activity data into the realms of triple stores with query operations via SPARQL end points.

• Huddersfield LIDP Library Impact Data Project
• Leeds Met STAR-Trak STAR-Trak Next Generation
• Manchester AGtivity Exploiting Access Grid Activity Data
• Manchester SALT Surfacing the Academic Long Tail
• Open University RISE Recommendations Improve the Search Experience
• Open University UCIAD User Centric Integration of Activity Data

And our previous activity data project has a mind map too, of the most relevant things for the current purposes
• Sero and Hedtek MOSAIC Making Our Scholarly Activity Information Count

Here is an example of two projects sharing a similar impediment, there are other examples in the data


ACTIVITY DATA

Chocolate fudge brownies APR 18, 2011 10:14A.M.

Originators/Authors
Tom Franklin

Purpose
To earn brownie points for your project.

Background
Most programme managers like chocolate, chocolate cake etc. However to make the point clear we recommend that you serve brownies to indicate that you deserve brownie points.

Ingredients
• 100g Chocolate (plain or milk chocolate to taste)
• 100g Butter or Margarine (if you must or are vegan)
• 200g Self-raising flour
• 200g Sugar (Soft brown sugar works best but caster sugar still works well)
• 3 large eggs
• 2 tablespoons cocoa powder

Assumptions
You have the ingredients and that you are making these shortly before the arrival of the programme manager so that they are still fresh.
You are not vegan as this recipe contains eggs.

Warnings
Do not serve when old, stale or mouldy.
Allergy advice:
• This recipe contains eggs
• This recipe contains gluten

Method
• Melt the chocolate and butter together in a sauce pan over a low heat
• Turn off the heat and gradually stir in the sugar and eggs
• Add the self-raising flour and cocoa powder
• Pour the mix into greased 30cm x 35cm square tin
• Bake for about 15 minutes at 180°C, the middle should be soft.
• Let the brownies cool in the tin.
• Cut into rectangles

Individual steps
See method

Output data
42 5cm x 5cm brownies

Appendix A: Sample output
See cook for a sample of the output data, or come to one of our workshops


JISCAD - TWITTER SEARCH

RT @daveyp: ... as (if Huddersfield is anything to go by) students on computing courses often don't (or rarely) use the library #jiscad #lidp

RT @daveyp: Hmm — found a slight gender difference in one of the #jiscad #lidp sets: females have a stronger book & grade correlation than males ...

APR 16, 2011 01:39P.M.

APR 16, 2011 01:38P.M.

JISCAD - TWITTER SEARCH

RT @daveyp: Seems fairly common for computing courses not to have a library usage & grade correlation — need to check actual usage levels #jiscad #lidp

... as (if Huddersfield is anything to go by) students on computing courses often don't (or rarely) use the library #jiscad #lidp APR 16, 2011 12:16P.M.

APR 16, 2011 01:39P.M.

JISCAD - TWITTER SEARCH

RT @daveyp: ... whereas the males have a stronger eresource usage & grade correlation than the females #jiscad #lidp

Seems fairly common for computing courses not to have a library usage & grade correlation — need to check actual usage levels #jiscad #lidp

APR 16, 2011 01:38P.M.

APR 16, 2011 12:15P.M.

JISCAD - TWITTER SEARCH


JISCAD - TWITTER SEARCH

RT @daveyp: ... whereas the males have a stronger eresource usage & grade correlation than the females #jiscad #lidp

Hmm — found a slight gender difference in one of the #jiscad #lidp sets: females have a stronger book & grade correlation than males ...

APR 16, 2011 11:16A.M.

APR 16, 2011 11:14A.M.

JISCAD - TWITTER SEARCH

RT @daveyp: Hmm — found a slight gender difference in one of the #jiscad #lidp sets: females have a stronger book & grade correlation than males ...

Getting my head round Pearson & Spearman Correlations and playing with the #jiscad #lidp data APR 16, 2011 09:15A.M.

APR 16, 2011 11:16A.M.

JISCAD - TWITTER SEARCH

Looking at R http://bit.ly/8tUTI after @psychemedia talked about it in the pub last night. Will have a play around with it for #jiscad #lipd

JISCAD - TWITTER SEARCH

... whereas the males have a stronger e-resource usage & grade correlation than the females #jiscad #lidp APR 16, 2011 11:14A.M.

APR 15, 2011 08:51P.M.


SALT - SURFACING THE ACADEMIC LONG TAIL

borrowing library books tend to search for them centrifugally, one book leads to another, as they dig deeper into the subject area, finding rarer items and more niche materials. So if those materials have been of use to them, could they not also be of use to other people researching in the same area? The University of Manchester’s library is stocked with rare and niche collections, but are they turning up within traditional searching, or are they hidden down at that long end of the tail? By recommending books to humanities researchers that other humanities researchers have borrowed from the library I’m really hoping we can help improve the quality of research – we know that solid research means going beyond the prescribed reading list, and discussing new or different works. Maybe a recommender function can support this (even if it potentially undermines the authority of the supervisor prescribed list – as one academic has recently suggested to us: “isn’t this the role of the supervisor?”).

janinerigby APR 15, 2011 05:36P.M. The SALT project plan is now available The SALT Project Plan

JISCAD - TWITTER SEARCH

RT @daveyp: Of interest to #jiscad ? Sounds like library usage data and targeted suggestions/recommendations are being discussed at #nfais

Here’s how I’m thinking we’ll run our evaluation: once the recommender tool is ready, we’ll ask a number of subject librarians to do the first test of the tool, to see if it recommends what they would expect to see linked to their original search. They will be asked to search the library catalogue for something they know well: when the catalogue returns their search, does the recommender tool suggest further reading which seems like a good choice to them? As they choose more unusual books, does the recommender then start suggesting things which are logically linked, but also more underused materials? Does it start to suggest collections which are rarely used, but nevertheless just as valuable? Or does it just recommend randomly unrelated items? And can some of the randomness support serendipity?

APR 15, 2011 03:53P.M.

We’ll then run the same test with humanities researchers (it’ll be interesting to see if librarians and academics have similar responses). As testing facilitators, we’ll also be gauging people’s reactions to the way in which their activity data is used. The question is, do users see this as an invasion of their privacy, or a good way to use the data? Do the benefits of the recommender tool outweigh the concerns over privacy?

SALT - SURFACING THE ACADEMIC LONG TAIL

janinerigby APR 15, 2011 03:14P.M. I’m currently project managing SALT, but my own area of interest is evaluation and user behaviour, so I’m going to be taking on an active role in putting what we develop in front of the right users (we’re thinking academics here at the University) to see what their reactions might be. As I think this over, a number of questions and issues come to mind. Are we more likely to look on things favourably if they are recommended by a friend? If we think about what music we listen to, films we go and see, TV we watch and books we read, are we far more likely to do any of those things should we receive a recommendation from someone we trust, or someone we know likes the same things that we like? If you think the answer to this is yes, then is there any reason that we wouldn’t do the same thing should a colleague or peer recommend a book to us that would help us in our research? In fact, more so? Going to see a film that a friend recommends that is, well, average has far less lasting consequences than completing a dissertation that fails to acknowledge some key texts. As a researcher, would you value a service which could suggest to you other books which relate to the books you’ve just searched for in your library?

The testing of the hypothesis will be a crucial indicator of the legitimacy of the project. Positive results from the user testing will (hopefully) take this project on to the next level, and help us move towards some kind of shared service. But we really need to gauge whether this segment of more ‘advanced’ users can see the benefit. If they believe that the tool can make a positive impact on their research, then we hope to extend the project and encourage further libraries to participate. With more support from other libraries, researchers will hopefully be one step closer to receiving a library book recommender.

We know library users very rarely take out one book. Researchers


JISCAD - TWITTER SEARCH

Interesting conference happening in US today which may be of interest to #jiscad: http://bit.ly/eG3wQL Follow #nfais for tweets

Of interest to #jiscad ? Sounds like library usage data and targeted suggestions/recommendations are being discussed at #nfais

APR 15, 2011 03:06P.M.

APR 15, 2011 02:27P.M.

JISCAD - TWITTER SEARCH

SALT - SURFACING THE ACADEMIC LONG TAIL

RT @daveyp: Of interest to #jiscad ? Sounds like library usage data and targeted suggestions/recommendations are being discussed at #nfais

janinerigby APR 15, 2011 11:50A.M.

I’m currently project managing SALT, but my own area of interest is evaluation and user behaviour, so I’m going to be taking on an active role in putting what we develop in front of the right users (we’re thinking academics here at the University) to see what their reactions might be. As I think this over, a number of questions and issues come to mind. Are we more likely to look on things favourably if they are recommended by a friend? If we think about what music we listen to, films we go and see, TV we watch and books we read, are we far more likely to do any of those things should we receive a recommendation from someone we trust, or someone we know likes the same things that we like? If you think the answer to this is yes, then is there any reason that we wouldn’t do the same thing should a colleague or peer recommend a book to us that would help us in our research? In fact, more so? Going to see a film that a friend recommends and that turns out to be, well, average has far less lasting consequences than completing a dissertation that fails to acknowledge some key texts. As a researcher, would you value a service which could suggest to you other books which relate to the books you’ve just searched for in your library?

APR 15, 2011 03:00P.M.

JISCAD - TWITTER SEARCH

@daveyp thanks for the heads up about #nfais, definitely looks useful for #jiscad APR 15, 2011 03:00P.M.

We know library users very rarely take out one book. Researchers borrowing library books tend to search for them centrifugally: one book leads to another as they dig deeper into the subject area, finding rarer items and more niche materials. So if those materials have been of use to them, could they not also be of use to other people researching in the same area? The University of Manchester’s library is stocked with rare and niche collections, but are they turning up within traditional searching, or are they hidden down at that long end of the tail? By recommending books to humanities researchers that other humanities researchers have borrowed from the library, I’m really hoping we can help improve the quality of research. We know that solid research means going beyond the prescribed reading list, and discussing new or different works. Maybe a recommender function can support this (even if it potentially undermines the authority of the supervisor-prescribed list; as one academic has recently suggested to us: “isn’t this the role of the supervisor?”).
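To make the idea of recommending what other researchers have borrowed more concrete, here is a minimal sketch of a "people who borrowed this also borrowed" count. It is not the SALT project's actual method: the function names, the `(borrower, book)` input format and the sample identifiers are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(loans):
    """Count how often two books were borrowed by the same person.

    `loans` is an iterable of (borrower_id, book_id) pairs; this input
    format is an assumption for illustration, not a real library export.
    """
    books_by_borrower = defaultdict(set)
    for borrower, book in loans:
        books_by_borrower[borrower].add(book)

    co = defaultdict(Counter)
    for books in books_by_borrower.values():
        # Every pair of books on one borrower's record counts once.
        for a, b in combinations(sorted(books), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(co, book, limit=3):
    """Return up to `limit` books most often co-borrowed with `book`."""
    neighbours = co.get(book, Counter())
    # Sort by descending count, then by book id so ties are stable.
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:limit]]


if __name__ == "__main__":
    # Made-up loans: three researchers, four books.
    sample = [
        ("r1", "A"), ("r1", "B"), ("r1", "C"),
        ("r2", "A"), ("r2", "B"),
        ("r3", "A"), ("r3", "D"),
    ]
    print(recommend(build_cooccurrence(sample), "A"))
```

A raw count like this favours popular titles, so surfacing the long tail would probably need some weighting towards rarely borrowed items; that choice is left open here.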

EXPOSING VLE ACTIVITY DATA

The story so far...

APR 15, 2011 09:18A.M.

Sorry about the quietness here over the past couple of weeks: you must be wondering what we were up to.

• We’ve been extracting the data from Sakai, which was more difficult than it sounds. Sakai stores its events in a massive SQL table, one after the other, so the table reaches tens of millions of rows before very long at all. Then there was merging tables, fixing corrupt old data, that kind of thing. Anyway, all done now.

• We’re investigating tools to help us analyse the data. Pentaho looks very promising.

But all this is just detail (albeit time-consuming, irritating detail) around the core issue of what data we have got and what we can do with it. To that end we’ve had a few internal workshops, sent out a few emails, bent some ears, and so on.

Though none of this should be treated as doctrine, and we’re still definitely open to ideas, we thought it was time to do some initial data investigations, now that we have the data. The key structuring concept for me is:

Who will be interested in our data, and what would they like to know?

Some easy-to-imagine, though not all-encompassing, situations are these:

• If someone else were running the VLE, what would we want to know about it?

• If we could get secret, spy-style access to our deadliest rival institution (identity an exercise for the reader), what would we want to find out to make our VLE more awe-inspiring than theirs?

• If a charismatic leader were to rouse academics or students to come to our door bearing pitchforks and burning torches, demanding VLE data, what would be the rhetoric? What would they be demanding?

If we bear these (and similar) questions in mind when we are steering, we shouldn’t go far wrong. Let’s not get caught producing a series of odd, disconnected charts: they need to inspire thought and change. We need charts, data and stats that connect with the machinery of change.

In terms of the data, what we have is: who does what.

So to do a meaningful analysis we have two axes: Who and What. While we’ll give away as much raw data as is possible, we need to provide supporting mappings. Who is dps1001? What is site 85? We also need to make sure, when we anonymise, that we don’t lose those aspects that enable external people to ask questions.

We’re working out how we should take a first stab at Who and What, and are looking at finding sources. I imagine that when we’ve done this first round of analysis we’ll discover the world doesn’t divide up how we imagine. That seems to be the near universal experience of user experience analysis. Certainly we learnt in our JISC Academic Networking project that the world of networking isn’t divided up in quite the way we imagined. As we discover this from the activity data, we will iterate around, trying again and again.

It might even be worth applying Bayesian Clustering or Entropy-Based Tree Building to see how a machine would cluster behaviour. All very exciting (to me, anyway!). See pages 15-21 of this powerpoint by Allan Neymark at SJSU to see all this simply explained in terms of Simpsons characters.
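As a rough illustration of the Who and What axes, the sketch below replaces user identifiers with keyed pseudonyms while keeping the mappings (role, site type) that let outsiders still ask questions of the data. Everything here is an assumption for illustration: the `(user, site, action)` event layout, the role and site-type labels, the action names and the secret key are hypothetical and are not taken from Sakai or from the project's real extraction.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret; in practice it would be kept private so that
# pseudonyms cannot be reversed by hashing guessed user ids.
SECRET_KEY = b"replace-with-a-private-key"


def pseudonym(user_id, key=SECRET_KEY):
    """Return a stable, non-reversible label for a user id."""
    digest = hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return "u_" + digest[:12]


def anonymise_events(events, who_lookup, what_lookup, key=SECRET_KEY):
    """Swap user ids for pseudonyms but keep the Who/What mappings.

    `events` is an iterable of (user_id, site_id, action) tuples, an
    assumed layout rather than the real event table.
    """
    rows = []
    for user_id, site_id, action in events:
        rows.append({
            "user": pseudonym(user_id, key),
            "role": who_lookup.get(user_id, "unknown"),        # Who
            "site": site_id,
            "site_type": what_lookup.get(site_id, "unknown"),  # What
            "action": action,
        })
    return rows


def count_by_role_and_site_type(rows):
    """Tally events along the two axes: Who (role) and What (site type)."""
    return Counter((row["role"], row["site_type"]) for row in rows)


if __name__ == "__main__":
    events = [("dps1001", "85", "read"), ("dps1001", "85", "read")]
    rows = anonymise_events(events, {"dps1001": "student"}, {"85": "course"})
    print(count_by_role_and_site_type(rows))
```

Using a keyed hash rather than a plain one means the same person keeps the same pseudonym across the data set, so behaviour can still be followed over time without publishing the original identifier.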

Exciting times. At the same time, extremely tedious for the guys doing the database extraction and normalisation. Personally, I seem to have escaped that bit for this project. Phew!

ACTIVITY DATA TO ENHANCE & INCREASE OPEN-ACCESS USAGE

APR 15, 2011 08:44A.M.

JISCAD - TWITTER SEARCH

RT @daveyp: Exciting stuff! The 2 sets of #jiscad #lidp data we received yesterday have correlations for final grade / book loans & e-resource usage :-)

Hypothesis

We hypothesise that “The provision of a shared recommendation service will increase the visibility and usage of Welsh research outputs”.

This will be demonstrated through quantitative and qualitative assessments:

1. By a [significant] increase in attention and usage data for items held within the six core institutional repositories
2. By establishing a user focus group to explore the potential of the recommendation service and its impact on repository users

JISCAD - TWITTER SEARCH

Exciting stuff! The 2 sets of #jiscad #lidp data we received yesterday have correlations for final grade / book loans & e-resource usage :-)

APR 15, 2011 09:07A.M.

JISCAD - TWITTER SEARCH

RT @daveyp: Exciting stuff! The 2 sets of #jiscad #lidp data we received yesterday have correlations for final grade / book loans & e-resource usage :-)

APR 15, 2011 08:28A.M.

JISCAD - TWITTER SEARCH

Second set of data through for #jiscad #lidp :-)

APR 15, 2011 09:00A.M.

APR 14, 2011 03:18P.M.


JISCAD - TWITTER SEARCH

1st set of data merged for #lidp, with a few glitches and lessons learned. Just waiting for Shibboleth data now...#jiscad

RT @andy_land: More data on its way to MIMAS for recommender service prep - this is all suspiciously straightforward so far! #jiscad #jiscsalt

APR 14, 2011 02:53P.M.

APR 13, 2011 11:22A.M.

JISCAD - TWITTER SEARCH

First full set of data through for #jiscad #lidp :-)

JISCAD - TWITTER SEARCH

APR 14, 2011 11:31A.M.

RT @daveyp: A bit too tired to be blogging, but here's some book usage & final grade graphs http://bit.ly/gxyPC6 #jiscad #lidp

JISCAD - TWITTER SEARCH

Technical blog post update for Open University RISE project http://tinyurl.com/6l6a2pc #ourise #jiscad

APR 13, 2011 10:46A.M.

APR 14, 2011 10:46A.M.

LIBRARY IMPACT DATA PROJECT

Beginning the data capture

APR 13, 2011 09:05A.M.

Further to Dave’s post about grabbing the data, we’ve also had successful sample data from Teesside and De Montfort. Check out Fulup’s blog about Stitching together library data with Excel for more details on DMU’s experience.


JISCAD - TWITTER SEARCH

RT @daveyp: A bit too tired to be blogging, but here's some book usage & final grade graphs http://bit.ly/gxyPC6 #jiscad #lidp -fun stuff!

@daveyp interesting, I'm working on #jiscad at the moment too - starting to pull together some information for the #inf11 evaluation

APR 13, 2011 12:34A.M.

APR 12, 2011 10:39P.M.

JISCAD - TWITTER SEARCH

MT @daveyp some book usage & final grade graphs http://bit.ly/gxyPC6 #jiscad #lidp // look! awesome data!

RT @daveyp: A bit too tired to be blogging, but here's some book usage & final grade graphs http://bit.ly/gxyPC6 #jiscad #lidp

APR 12, 2011 10:52P.M.

APR 12, 2011 10:39P.M.

JISCAD - TWITTER SEARCH

RT @daveyp: A bit too tired to be blogging, but here's some book usage & final grade graphs http://bit.ly/gxyPC6 #jiscad #lidp

JISCAD - TWITTER SEARCH

RT @daveyp: A bit too tired to be blogging, but here's some book usage & final grade graphs http://bit.ly/gxyPC6 #jiscad #lidp

APR 12, 2011 10:44P.M.

APR 12, 2011 10:38P.M.


JISCAD - TWITTER SEARCH

A bit too tired to be blogging, but here's some book usage & final grade graphs http://bit.ly/gxyPC6 #jiscad #lidp APR 12, 2011 10:37P.M.

JISCAD - TWITTER SEARCH

Data quality from the partners samples looking good #lidp #jiscad #doesthatsoundfunny?

APR 12, 2011 03:17P.M.

LIBRARY IMPACT DATA PROJECT

5 years of book loans and grades at Huddersfield

APR 12, 2011 10:35P.M.

I’m just starting to pull our data out for the JISC Library Impact Data Project and I thought it might be interesting to look at 5 years of grades and book loans. Unfortunately, our e-resource usage data and our library visits data only go back as far as 2005, but our book loan data goes back to the mid 1990s, so we can look at a full 3 years of loans for each graduating student.

The following graph shows the average number of books borrowed by undergrad students who graduated with a specific honour (1, 2:1, 2:2 or 3) in that particular academic year…

…and, to try and tease out any trends, here’s a line graph version…

Just a couple of general comments:

• the usage & grade correlation (see original blog post) for books seems to be fairly consistent over the last 5 years, although there is a widening between usage by the lowest & highest grades

• the usage by 2:2 and 3 students seems to be in gradual decline, whilst usage by those who gain the highest grade (1) seems to be on the increase
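The figure behind those graphs, the average number of books borrowed per graduating year and honours class, could be computed along the following lines. This is only a sketch: the `(year, grade, loans)` record layout and the function name are assumptions, and the sample numbers are invented, not Huddersfield data.

```python
from collections import defaultdict


def average_loans(records):
    """Average loans per (graduation year, grade).

    `records` is an iterable of (year, grade, loans) tuples, one per
    graduating student; this layout is assumed for illustration.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [sum of loans, students]
    for year, grade, loans in records:
        bucket = totals[(year, grade)]
        bucket[0] += loans
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


if __name__ == "__main__":
    # Invented figures purely to show the shape of the calculation.
    sample = [
        ("2009/10", "1", 40),
        ("2009/10", "1", 60),
        ("2009/10", "3", 10),
    ]
    for (year, grade), mean in sorted(average_loans(sample).items()):
        print(year, grade, mean)
```

Each (year, grade) pair then becomes one bar, or one point on a line per grade, in graphs of the kind described.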


JISCAD - TWITTER SEARCH

@librarygirlknit and I are putting together a blog post about legal stuff and the #lidp project #jiscad

in the latest of my #jiscad blogposts I appear to have invented a new phrase: http://bit.ly/hcRsVx #hidetheriskstick

APR 12, 2011 02:48P.M.

APR 12, 2011 11:53A.M.

JISCAD - TWITTER SEARCH

Talking to the University of Wollongong Library, Australia about the #lidp project #jiscad APR 12, 2011 02:03P.M.

JISCAD - TWITTER SEARCH

More data on its way to MIMAS for recommender service prep - this is all suspiciously straightforward so far! #jiscad #jiscsalt APR 12, 2011 12:15P.M.