
Information literacies for online learning (P01545)

End of course essay

The Google Book Search Library Project – Implications and Opportunities for Libraries and Publishers

He who receives an idea from me, receives instruction himself without lessening mine; as he who lights his taper at mine, receives light without darkening me. That ideas should freely spread from one to another over the globe, for the moral and mutual instruction of man, and improvement of his condition, seems to have been peculiarly and benevolently designed by nature, when she made them, like fire, expansible over all space, without lessening their density in any point, and like the air in which we breathe, move, and have our physical being, incapable of confinement or exclusive appropriation.
Thomas Jefferson, Letter to Isaac McPherson, August 13, 1813

Henry Keil, April 2008

1. Introduction

1.1. The Google/library connection

The origins of the Google Book Search Project can be traced back to research undertaken in the Computer Science Department at Stanford University in the mid-1990s, where Sergey Brin and Lawrence Page were studying for their PhDs as part of the Stanford Integrated Digital Library Project, one of several digital library initiatives funded by the NSF, ARPA and NASA (Evans, 1995). This seminal work resulted in the development of Google, a search engine built around an innovative algorithm that measures the popularity of Web pages by the number of links pointing to them and generates a ranking value for each page in the process (Brin and Page, 1998). The algorithm, called PageRank, lies at the heart of the Google search engine and laid the foundation for the company’s enormous commercial success. During the following years Google combined PageRank with contextually targeted advertising tools such as Google AdWords (in 2000), later integrated into Google AdSense (in 2003), a product that provided access to the lucrative online advertising market. Further capital was raised through a successful flotation on the stock market in 2004, which brought in about $1.66 billion (ZDNet 2004). Throughout that time Google kept in touch with its roots at Stanford University, and in 2004, with considerable resources at its disposal, it decided to return its attention to the earlier Digital Library Project.
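
The link-counting idea behind PageRank can be illustrated with a small sketch. The code below is a simplified, normalised variant of the iteration described by Brin and Page (1998), not Google's production algorithm. The function name `pagerank`, the four-page toy web and the handling of pages without outgoing links are assumptions made for illustration; the damping factor of 0.85 is the value Brin and Page report as typical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Return a rank per page for a dict mapping page -> list of linked pages."""
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal ranks

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on in equal parts to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page without outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: A links to B and C, B to C, C to A, D to C.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy web, page C ends up with the highest score because three of the four pages link to it, while D, which nothing links to, keeps only the baseline share.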

1.2. The history of mass digitisation

Large-scale digitisation of print material predates the emergence of the Web by some time. In 1971 Michael Hart conceived Project Gutenberg with the intent to create digital e-texts by scanning public domain works for subsequent searching, copying and distribution (originally on floppy disks) at minimal cost. The progress of Project Gutenberg still depends on volunteers across the world and it is run in a decentralised way. So far it has managed to digitise around 25,000 texts in 37 years, still comprising less than 0.1% of the total estimated world literature (Project Gutenberg News, 2008). Throughout the late 1990s and early 2000s several large US university libraries ran their own digitisation projects, mainly to support preservation-based programmes for their collections. An example of such a project is the Wisconsin-Madison Libraries Digital Collection Center, founded in 2000 to store online not only text-based material but also photographic images, maps, posters and multimedia files (UWDCC, 2008). Against this background the University of Michigan was approached in 2002 by one of its alumni, Google’s Lawrence Page, who proposed a collaborative mass digitization project covering the University’s entire library book stock using Google’s newly developed proprietary non-destructive scanning technology (U-M University Library, 2004). Two years later Google announced a collaboration with several research-centred universities, the so-called Google 5 or G5, a grouping that in addition to the University of Michigan included the Stanford Library mentioned earlier,

the libraries of Harvard University, Oxford (Bodleian) and the New York Public Library. Under the Google Print for Libraries Initiative the objective was to create a searchable online card catalogue of those combined collections, totalling about 18 million items, which amounted to 57% of the WorldCat total of 32 million book titles (Lavoie et al. 2005). The G5 project is expected to take 10 years, with estimated costs of around 20 dollars per book, affordable given Google’s considerable resources. Two months earlier the company had announced the Google Print for Publishers Programme (now renamed the Google Book Search Partner Program), in which publishers (but not libraries) were invited to participate. In this project Google is creating a separate book index to be queried by its search engine, with context-specific ads placed on the search pages. Any revenues generated would be shared between Google and the participating publishers, and Google also agreed to place links to book retailers on its Web pages. Whilst many publishers welcomed the Partner Program, they remained deeply hostile towards the Google Print for Libraries Project, now known as the Google Book Search Library Project or Google Library Project for short. The controversy surrounding this program will be explored in the next section.

2. The legal issues and their implications

2.1. The scanning process

Google’s intention in the Google Library Project is to digitize all works regardless of their copyright status unless it is explicitly requested not to do so (the ‘owner opt out’ provision). Copyright owners interpret this as running counter to the basic tenet of copyright law, which requires a user to seek explicit permission from the rights owner before copying and using copyright material. As part of its scanning process Google produces two complete copies of any work. One is donated to the library as payment ‘in kind’ for making the source material available, whilst the second is held by Google for optical character recognition (OCR) conversion to text, followed by indexing and storage in a specialist database to be queried subsequently by Google’s search engine. If a user’s search term identifies a book in the public domain the entire work is made available online, but if it points to a protected work the search results page will display only a sentence or two of text surrounding the search term. Additional links are displayed on the returned Web page, pointing either to the library’s collection or to a book retailer for subsequent purchase.

Under the terms of the agreement libraries are not permitted to make their ‘Google’ copy available to other commercial search engine providers for indexing. This grants Google only a temporary, non-exclusive right to the G5 book collection index, since libraries are free to let other companies re-scan their book stock to create an additional, separate copy in parallel or at a later stage. In fact the Open Content Alliance (OCA), founded by Yahoo and the Internet Archive in October 2005, offers a separate scanning service focussing on the 20%

or so of titles in the public domain, and several libraries are working with both Google and the OCA (Open Content Alliance, 2007). This project will open up the book index to other search engines, thus breaking the Google monopoly. Ultimately it is the libraries who provide Google with the source material, and at this point in time they give the highest priority to damaged and fragile material available in the public domain, followed by the remaining public domain works, followed by ‘out of copyright’ stock, followed by selected ‘out of print’ books, amounting in the case of the G5 libraries to about 15% of their combined stock (Band, 2006). Of the remaining books in copyright around 20% are in print and available for sale via normal retail channels, leaving a staggering 80% of copyright material out of print.

2.2. The Google point of view

Google’s defence in the Library Project centres around two restrictions within US copyright law:

1. the implied license principle
2. the fair use exclusion principle

The implied license principle, successfully applied to Web-page searches, is based on the assumption that Web sites are publicly available by default. If a copyright owner does not want his/her Web content to be indexed, (s)he must use the robots.txt file to prevent the search engine’s robots from accessing the site (effectively amounting to an ‘opt out’). This principle has not been challenged in court and is used widely by all major search engine providers as they copy large amounts of data into their databases without explicit permission of the Web-site operators. Google seeks to extend this principle to digitized books.
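
How the robots.txt opt-out works in practice can be sketched with the robots.txt parser in Python's standard library. The rules shown, the crawler name ExampleBot and the example.com addresses are hypothetical and serve only to illustrate the mechanism; real crawlers may interpret robots.txt files with their own extensions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is asked to stay out of /private/.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())  # parse the rules without any network access

# A well-behaved crawler checks each address before fetching it.
allowed = parser.can_fetch("ExampleBot", "https://www.example.com/index.html")
blocked = parser.can_fetch("ExampleBot", "https://www.example.com/private/draft.html")

print("index page may be indexed:", allowed)
print("private page may be indexed:", blocked)
```

Everything the site owner has not explicitly excluded remains open to indexing, which is the ‘opt out’ default that Google wants to carry over to scanned books.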
The fair use exclusion principle of US copyright law (called fair dealing under the equivalent UK law) centres around the purpose of the work (whether it is used commercially or not), its character (whether it is used for criticism, comment, teaching, scholarship or research), its nature (whether it has been published, whether it is factual or fictional and the amount being used) and whether its use is ‘transformative’ or consumptive (US Copyright Office, 2006). Google claims that it undertakes transformative action on the copyrighted material and also points to the non-commercial, social use of the final product. Google’s legal stance has been strengthened by recent case law (Kelly v. Arriba Soft, 2003), in which an artist (Kelly) sued an Internet search engine provider (Arriba Soft) for putting thumbnail images of his work on the Internet without permission. The court acknowledged that in this instance sufficient transformative action had taken place to warrant application of the ‘fair use exclusion’ of the copyright law (Band, 2006). In August 2005 Google announced a three-month moratorium on the Library Project to give copyright owners the opportunity to select which books they wished to withhold from the full-text scanning program, but then resumed activity in November 2005.

2.3. The rights owners’ point of view

In October 2005 five large publishing companies, supported by the Association of American Publishers, filed a lawsuit against Google seeking an injunction on the scanning of copyrighted material, but not against the participating libraries for making these copies available. In addition to the controversial ‘opt out’ clause, copyright owners argue that the process of unambiguously determining copyright ownership is notoriously cumbersome, in particular for ‘orphan’ works, and insist that these costs should be borne by Google. The vast majority of G5 resources appear to consist of ‘orphan’ works, material where the rights owner cannot be determined or located, or fails to respond. Establishing the legal status of an orphan work requires considerable investigative work and is estimated to cost around $1000 per book, about 40-50 times the cost of scanning and indexing (Band, 2006).

Furthermore, publishers state that by digitizing their works Google is preempting the copyright holders from licensing their books to other search engine providers for inclusion in other searchable indexes, thus denying them additional, potentially lucrative business opportunities (Hanratty, 2005). Finally, they also point to the potential misuse of the second copy held by the library, a third party not bound by the legal dispute (Thatcher, 2005). All G5 libraries have signed separate contracts with Google and at present only one has been made publicly available (University of Wisconsin Google Agreement, 2006). Publishers fear that libraries may be inclined to deposit parts of their copy onto the University’s e-reserve system or Virtual Learning Environments to allow student access, thus potentially undermining copyright laws (Thatcher, 2005).
Clearly copyright owners would prefer a system where they could ‘opt in’, as they do in the Partner Program, but Google fears that only a small proportion would do so, making its book index incomplete and thus a less useful resource. In any case, given that less than 5% of all books are in print, the incentive for publishers to ‘opt in’ would be minimal (Band, 2006).

2.4. The orphan book dilemma

An increasing number of universities and large public libraries are now participating in the Google Library Project, despite the uncertainties surrounding the legality of the proposed large-scale scanning of copyrighted material. Prior to 1978 rights owners had to renew their copyright with the US Copyright Office after an initial term of 28 years, otherwise the work would fall into the public domain. In 1998 the renewal term was extended to a total of 67 years, requiring the tracking of copyright ownership over a much longer period. It is now recognised that this situation runs contrary to the US Copyright Act, because failure to identify the rights owner often means that the public is deprived of access to a work even if the owner would have no objection to its use. It is assumed that a large proportion of orphan books have effectively been ‘abandoned’ by their rights owners and are likely to be lost to scholars and the public at large (Band, 2006).

In the light of the new technological developments arising from the Google Library Project it has been recommended that the US Congress should reconsider its current policy on copyright law, and some initial steps have been undertaken (Hanratty, 2005; Library of Congress, 2008). In the UK, under the Copyright, Designs and Patents Act (1988), copyright protection is similarly extensive: copyright in literary works lasts for seventy years from the death of the author, and for orphan works copyright expires seventy years after the work was first made public. The extent of orphan works is unknown but it is likely to be highly significant, in particular for older works. For example, it is estimated that around 75% of motion pictures from the 1920s are probably not currently owned by anybody but are not in the public domain either (Report on Orphan Works, 2006), and a similar figure is given for print material (Lavoie et al. 2005), thus making up a large proportion of the Twilight Zone shown in fig. 1.

Fig. 1. Comparison of in print versus possible orphan versus public domain status (O’Reilly, 2007)

3. Libraries in the post Google Library Project era

3.1. New opportunities

The most tangible benefit arising from the Library Project concerns a library’s responsibilities for document preservation and archiving, which can make use of the digital copy it retains. Proponents also state that mass digitization will lead to increased usage of library resources, as Google’s virtual card catalogue will make finding relevant material much easier, with libraries likely to evolve into ‘information brokers’. But how will users access the source material in future? Instead of continuing to sacrifice precious shelf space for print stock, libraries may start building up a repository of e-reserves of both current ‘born digital’ material (PhD theses, e-journals and e-books) and scanned-in public domain print material. In addition libraries could also seek to become the custodians of orphan works to be offered via the fair use exclusion principle of the Copyright

Law to be used for scholarly work. Whilst honouring copyright law there may be the expectation, in particular from the fee-paying students of University libraries, to release many of the new digital resources on the institutional intranet as supplementary course material. Spaces for books and photocopiers will diminish and libraries will need to be redesigned to offer a variety of information service spaces such as online screen and text-to-speech readers, flexible printing facilities and additional collaborative/interactive learning spaces. Some libraries may even, through their University Presses, be tempted to retail works in e-book format or provide print/bind-on-demand services for their customers. Additional opportunities may arise in making available streaming multimedia content as part of the JISC-funded JORUM project (JORUM, 2008). All these resources are likely to be adapted, remixed and re-purposed within a different context and in different formats as part of scholarly and non-scholarly activities. This is likely to lead to the development of enriched new art forms based on the synergy of science, engineering, the arts and humanities. It is hoped that in future copyright will be defined less in terms of reproduction and more in terms of commercial exploitation, with less frequent use of the ‘© all rights reserved’ option and more use of the flexible Creative Commons (CC) ‘some rights reserved’ licenses such as Share Alike (Creative Commons, 2008).

3.2. Accessibility, delivery and presentation of content

There is evidence to suggest that any material that is not part of a searchable Web-based index and obtainable online as a full-text document, at minimal or no cost, will be ignored. This statement is particularly applicable to Digital Natives or the Google Generation with their instant gratification-centred Web-surfing behaviour and ‘shallow browsing’ learning styles (Prensky, 2001; CIBER briefing paper, 2008). Some new book retailers like Safari Books Online encourage this trend by allowing online browsing of book pages prior to purchase. Given the increasing reliance on online information seeking it is paramount that these users are taught comprehensive information literacy skills at an early stage, something libraries may choose to offer as part of their training services. From an accessibility point of view digital text and images can be readily reformatted (font type/size/colour), enriched (audio) and presented (paper, e-book, mobile device) in a variety of ways depending on the user’s needs and learning styles. As digital material will be reproduced and distributed at virtually no cost it is hoped that most if not all of the licensing costs for accessing in-copyright material will be borne via Google’s advertiser-funded business model. There is further concern about the lack of standardised book-specific metadata records across different online catalogues, including date, publisher, keywords, abstract and ISBN, and this is now being addressed by several initiatives (JISC Collections 2007). In an interesting project Google intends to hyperlink a book’s content in the same way as Web pages are linked, thus creating a powerful book cross-referencing service (Roush, 2005).

Mass digitisation of the world literature may also help to reduce existing barriers across cultural and ethnic boundaries, assisting in saving for posterity the cultural identities of numerous small and isolated communities currently under threat from Western cultural dominance.

3.3. Evolving business models

The commercial exploitation of the ‘long tail’, so successfully pioneered by the online book retailer Amazon, will become more important as the tail gets even longer with the inclusion of orphan works. Recent research into online book viewing and retail behaviour suggests that there is a demand for technical books currently unavailable as hard copies, and that the ‘long tail’ effect of online material might be 2-3 times that of the physical book (O’Reilly, 2006). With mass digitization nothing will ever go ‘out of print’ and flexible print-on-demand services as offered by Amazon will flourish. Unbundling, so successfully implemented in the music industry by Apple’s iTunes, might easily be applied to multi-author reading material such as conference proceedings, editorials, book volumes or Festschriften, taking account of the targeted reading approach of the younger generation. Another interesting concept is a pre-publishing service as offered by Rough Cuts (2008), where users have access to the manuscript at an early stage with the possibility of influencing its outcome, effectively turning the book into a public wiki and thus combining online authoring with Web 2.0 technology (see also below).

4. ‘Information wants to be free’

This expression, popularised by John Perry Barlow 14 years ago, was perceived at the time as highly provocative (Barlow 1994). It states that information containers such as books, DVDs or photographic paper prints have no particular intrinsic value and that the information therein, if offered in digital format and disseminated at a cost that approaches zero, could just as well be ‘free’, benefiting its originator in some other way. What has value instead is the ‘conveyance of information’, for example the live performance of an actor or a rock band, the exhibition of one’s photographic works, giving a guest lecture or consulting on an open-source software product. He notes that good ideas thrive on abundance and not on exclusion and that information becomes more valuable the more it is spread, contrary to our everyday experience of physical commodities where rare items tend to be more valued. Ultimately, if implemented fully, Barlow’s concept will lead to a global competition amongst ideas and information. Their availability in digital format will permit the direct measurement of popularity, be it via the PageRank tool of the Google search engine (‘getting into the top 20 hits’) or via online feedback buttons and voting systems as employed in the ratings of blog entries, images (Flickr) or movie clips (YouTube), thus providing synergies between Web 2.0 technologies (user-generated content and social networking) and the Google Library Project.

The scenario of competing ideas had been expressed five years earlier in ‘The Selfish Gene’ (Dawkins, 1989), which depicts them as memes as opposed to genes. Mass digitisation will provide the ultimate ex vivo replicator machine without being restricted to identical reproduction. Instead it provides opportunities for ‘mutation’ and modification, (re)combination and (re)shuffling, thus leading to the enrichment of existing works, analogous to life adapting to the ecosystem it inhabits and evolving into higher life forms.

5. Conclusion

The Google Book Search Project will be a truly transforming endeavour comparable to other large-scale projects such as the Human Genome Project (Baltimore, 2001). Its impact on the information needs and creativity of society will be as profound as the invention of the printing press in the 15th century. There are many issues to be resolved and the main stakeholders (librarians, rights owners/authors, publishers, copyright lawyers, retailers and users) will need to adapt to these changes. Ultimately the benefits will far outweigh any perceived shortcomings, leaving behind a legacy of human ingenuity and creativity, a long-lasting gift to mankind.

6. References

Baltimore, D. (2001) Our genome unveiled. Nature 409, 814-816; viewed 27 March 2008, http://www.nature.com/nature/journal/v409/n6822/index.html#human
Band, J. (2006) The Google Library Project: Both Sides of the Story. Plagiary: Cross-Disciplinary Studies in Plagiarism, Fabrication and Falsification, 1 (2), 117.
Barlow, J.P. (1994) The Economy of Ideas. Wired 2.03, 1-13; viewed 22 March 2008, http://www.wired.com/wired/archive/2.03/economy.ideas.html?topic=&topic_set=

Brin, S. and Page, L. (1998) The Anatomy of a Large-Scale Hypertextual Web Search Engine; viewed 21 March 2008, http://infolab.stanford.edu/~backrub/google.html
CIBER briefing paper (2008) Information behaviour of the researcher of the future. UCL London; viewed 27 March 2008, http://www.bl.uk/news/2008/pressrelease20080116.html
Creative Commons (2008) License your work; viewed 27 March 2008, http://creativecommons.org/license/results-one?license_code=by-sa
Dawkins, R. (1989) The Selfish Gene. Oxford University Press. ISBN 0192860925
Evans, P.E. (1995) Digital Libraries are much more than Digitized Collections. Educom Review Vol. 30 No. 4; viewed 21 March 2008, http://www.educause.edu/pub/er/review/reviewArticles/30411.html
Hanratty, E. (2005) Google Library: Beyond fair use? Duke Law and Technology Review No. 10; viewed 21 March 2008, http://www.law.duke.edu/journals/dltr/articles/2005dltr0010.html
JISC Collections (2007) Metadata for e-books; viewed 3 April 2008, www.jisc-collections.ac.uk/media/documents/jisc_collections/factfilesebooks01v4.pdf
JORUM (2008); viewed 29 March 2008, http://www.jorum.ac.uk/
Kelly v. Arriba Soft, 336 F.3d 811 (9th Cir. 2003); viewed 28 March, http://homepages.law.asu.edu/~dkarjala/cyberlaw/KelllyvArriba(9C2003).htm
Lavoie, B., Connaway, L.S. and Dempsey, L. (2005) Anatomy of Aggregate Collections – The Example of Google Print for Libraries. D-Lib Magazine, 11 (9); viewed 21 March 2008, http://www.dlib.org/dlib/september05/lavoie/09lavoie.html

Library of Congress, US Copyright Office; viewed 25 March 2008, http://www.copyright.gov/orphan/
Open Content Alliance (2007); viewed 10 April 2008, http://www.opencontentalliance.org/
O’Reilly, T. (2007) Reading 2.0. O’Reilly Media Inc.; viewed 4 April 2008, http://deepblue.lib.umich.edu/bitstream/2027.42/57301/3/oreilly.pdf
Prensky, M. (2001) Digital Natives, Digital Immigrants. On the Horizon, NCB University Press, 9, No. 5, 1-6.
Project Gutenberg News (2008); viewed 30 March 2008, http://www.pgnews.org/statistics/
Report on Orphan Works (2006) A Report of the Register of Copyrights, Library of Congress, U.S. Copyright Office, Washington, DC; viewed 23 March 2008, www.copyright.gov/orphan/orphan-report-full.pdf
Rough Cuts (2008) Get Behind the Scenes to Stay Ahead of the Curve; viewed 3 April 2008, http://www.oreilly.com/roughcuts/
Roush, W. (2005) The Infinite Library. MIT Technology Review; viewed 20 March 2008, http://www.technologyreview.com/printer_friendly_article.aspx?id14408
Thatcher, S.G. (2005) Fair Use in Theory and Practice: Reflections on Its History and the Google Case. Penn State University Press. Prepared for the NACUA conference on “The Wired University: Legal Issues at the Copyright, Computer Law, and Internet Intersection,” Arlington, VA, November 10, 2005; viewed 25 March 2008, http://counsel.cua.edu/Copyright/publications/Thatcher.doc
U-M University Library (2004) Google Digitization Project: A Brief Overview; viewed 23 March 2008, http://www.lib.umich.edu/mdp/overview.pdf
University of Michigan (2006) Mass Digitization: Implications for Information Policy. Report from Scholarship and Libraries in Transition: A Dialogue about the Impacts of Mass Digitization Projects. Symposium held on March 10-11, 2006, Ann Arbor MI.
University of Wisconsin Digital Collection Center; viewed 25 March 2008, http://uwdcc.library.wisc.edu/
University of Wisconsin Google Agreement (2006); viewed 24 March 2008, http://www.library.wisc.edu/digitization/agreement.html
US Copyright Office Basics (2006) Circular 1; viewed 25 March 2008, http://www.copyright.gov/circs/circ1.html

ZDNet News (2004) Google Shares make strong start; viewed 23 March 2008, http://news.zdnet.co.uk/itmanagement/0,1000000308,39164117,00.htm
