The new cooperative cataloging


The current issue and full text archive of this journal is available at www.emeraldinsight.com/0737-8831.htm

LHT 27,1

THEME ARTICLE

The new cooperative cataloging Tom Steele


University of Oklahoma, Norman, Oklahoma, USA

Received 13 October 2008
Revised 4 November 2008
Accepted 17 November 2008

Abstract
Purpose – This paper aims to examine the social phenomenon known as tagging and its use in libraries’ online catalogs, discussing folksonomies, social bookmarking, and tagging web sites. The paper also seeks to weigh the advantages and disadvantages of a controlled vocabulary such as the Library of Congress Subject Headings, and how tagging can assist the LCSH in information retrieval. LibraryThing and the University of Pennsylvania’s PennTags are examined.
Design/methodology/approach – Review of recent literature in print and online, as well as browsing library OPACs using tagging, was the basis for the paper.
Findings – The paper concludes that access to information is the main purpose of cataloging, and that using both traditional methods of cataloging and interactive methods such as tagging is a valid approach to reaching library users of the future.
Originality/value – The paper lists many problems and concerns of which to be aware if a library should choose to adopt tagging for its catalog. It looks at the options of using outside web sites to provide the tags as well as creating tagging systems on the library’s web site itself. The focus of the paper is how libraries can use tagging, as opposed to the phenomenon of tagging itself, as well as a discussion of how tagging compares with controlled vocabularies.
Keywords Tagging, Networking, Cataloging
Paper type Viewpoint

Library Hi Tech, Vol. 27 No. 1, 2009, pp. 68-77. © Emerald Group Publishing Limited, 0737-8831. DOI 10.1108/07378830910942928

In December 2006, Time Magazine named its person of the year: you. In doing so, it gave Web 2.0 as the reason. Wikipedia, MySpace, and YouTube are a few of the many web sites that rely heavily on user interaction. Today’s information seeker expects to participate, instead of only receive. Meanwhile, the online public access catalog (OPAC) of the majority of libraries is composed of records made by librarians. The user is invited to look, but not touch. Searching relies heavily on the user’s knowledge of a thesaurus, such as the Library of Congress Subject Headings (LCSH).

This is beginning to change. At the University of Pennsylvania, users can tag internet resources with their own terms, either for future reference or to help others find them. LibraryThing is a web site that allows members to catalog their own resources, be they books, media, or other web sites. These sites and others like them rely on user-created taxonomies instead of the traditional thesauri. These taxonomies are known as folksonomies, a term created by Thomas Vander Wal by combining taxonomy with folk.

What exactly is described by folksonomy? Unlike a taxonomy, a folksonomy is created by those who are actually using the resources it classifies. These resources are web-based, and may range from links to online photographs. The classification is done with open-ended labels called tags. Because this tagging is done by the user, it is more likely to match the vocabulary those interested in the resource will use. Like subject headings, the tags link the resource to similar resources. Tagging works through a process called social bookmarking. The resources are bookmarked, and the user creates tags to label these bookmarks.
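The bookmark-and-tag mechanism described above can be sketched as a small data model. This is only a minimal illustration under stated assumptions, not how any particular bookmarking site is implemented: the class name BookmarkStore, its methods, the user names, and the example.org URLs are all hypothetical.

```python
class BookmarkStore:
    """Toy in-memory store of bookmarks (hypothetical, for illustration)."""

    def __init__(self):
        # Each label records who tagged which resource with which term.
        self._labels = set()  # {(user, url, tag), ...}

    def bookmark(self, user, url, tags):
        """Save a bookmark for `user`, labeled with free-form tags."""
        for tag in tags:
            self._labels.add((user, url, tag.strip().lower()))

    def tagged(self, tag):
        """All resources that any user labeled with `tag`."""
        wanted = tag.strip().lower()
        return sorted({url for _, url, t in self._labels if t == wanted})

    def related(self, url):
        """Other resources sharing at least one tag with `url`."""
        tags = {t for _, u, t in self._labels if u == url}
        return sorted({u for _, u, t in self._labels if t in tags and u != url})


store = BookmarkStore()
store.bookmark("ann", "http://example.org/beagles", ["dogs", "Beagle"])
store.bookmark("ben", "http://example.org/hounds", ["dogs", "hounds"])
store.bookmark("ben", "http://example.org/robins", ["birds"])

print(store.tagged("dogs"))
print(store.related("http://example.org/beagles"))
```

Lowercasing the tags is a design choice made for this sketch. A real system might keep the tagger's original spelling for display and normalize only for matching.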


Bookmarking alone does not fill the needs of today’s web users. A study done by William Jones, Susan Dumais, and Harry Bruce found that bookmarks made in internet browsers end up becoming “information closets.” Jones explains that bookmarking requires users to organize the links into folders, weed out the dead links, and remember why the links were saved in the first place (Gordon-Murnane, 2006). This is why Joshua Schachter created del.icio.us. Del.icio.us is a social bookmarking site that allows users to share their bookmarks with other users, and tag them for future retrieval. The users can work together in organizing the links. The same bookmark can be labeled several times, and the more popular it is, the easier it is to retrieve in the future.

Vander Wal defined two kinds of folksonomies: broad and narrow. Del.icio.us is an example of a broad folksonomy. The user creates the metadata, and then del.icio.us aggregates the tags and makes them searchable. Users can then see what terms other users have assigned, and get ideas for their own tags for either searching or labeling. Hadley Reynolds, senior analyst and head of research for Perot Systems’ Delphi Group, describes this process as “eavesdropping on someone else’s thought pattern” (Dye, 2006). On the other hand, the narrow folksonomy is created by an individual user mainly for their use alone. An example of this is Flickr, a photo-sharing web site. The user tags their own photos, for their own retrieval later. Although these tags can be public, the social aspect of the narrow folksonomy is not as rich as the broad (Dye, 2006).

Gene Smith (2008), in his book Tagging: People-powered Metadata for the Social Web, defines four characteristics of a folksonomy. First, the tagging is done independently. The users must be allowed to create their own tags, and not forced to choose from a selection. While the system can offer suggestions, the option to add their own still must exist.
Second, once these tags are created, they must be aggregated; that is, they are all pulled together via automation. If the tags are selected instead of taken en masse to create a taxonomy, it is not a folksonomy. The third characteristic is an inferred relationship. Relationships do not rely on strictly defined terms, but instead on their use. It is up to the user to decide the meaning of the tag, so the use reflects actual behavior. The last characteristic is that any inference method is valid. While these methods have been used to form a controlled vocabulary, what makes it a folksonomy is that the users themselves determine these methods.

Smith also described seven kinds of tags. Descriptive tags are similar to the subject headings supplied by a controlled vocabulary. Tags may also name the kind of resource, such as “book,” “blog,” “photo” and so forth. Another kind is tags for the ownership or source, such as the author or publisher. Tags may also put forth the tagger’s opinion of the resource (“useful,” “good,” “amusing,” “odd”). A user may also make a tag of self-reference just for future retrieval, like “myarticle.” Similarly, tags may be task organizing. If a student wanted to use an article for a class assignment, she could tag it with the course number. On the opposite end, tags can be used for play. An example on Flickr is the tag “squaredcircle.”

Tagging can offer users advantages that are not offered by a traditional library classification system based on a controlled vocabulary. These systems operate on the assumption that there is a proper place for everything. It is the cataloger’s job to place resources in these pre-existing categories. In a folksonomy, however, these pre-determined hierarchies do not exist. A resource may belong in any number of places, be it a single hierarchy, or many different hierarchies, or none at all. In a taxonomy, these hierarchies are created from the top down. Folksonomies start from the bottom and work their way up. As a result, they reflect the users’ interests in real time (Noruzi, 2007).

Information is growing at an exponential rate. Web content grows faster than bots can extract keywords and fit them into a search engine’s hierarchy. In the same manner, libraries receive content, especially electronic content, faster than it can be cataloged. Tagging can help meet this need for metadata. Mark Fralic, del.icio.us VP of business marketing, says tagging works well because it’s “the first few words that come to mind when you’re in a particular frame of mind”, and as a result “tends to be how you’ll want to remember or discover the same thing or similar things in the future” (Dye, 2006).

Recall is important for locating these resources in the future. Experiments have shown recall is fastest at the basic level. When shown pictures of dogs and birds, people were more likely to use the term “dog” or “bird” instead of “beagle” or “robin.” Conversely, people asked “Is this a dog?” were able to answer more quickly than when asked “Is this a beagle?” (Golder and Huberman, 2006). This is because dog is, for most people, a basic level. Variation in knowledge of the subject, or in social or cultural background, has an effect on what the basic level is. With tagging, the users can relate to their own basic level, whether it is “beagle” or “dog.” A controlled vocabulary uses a hierarchy instead, which may or may not match the users’ basic level.

And tagging has taken off. In January 2007, the Pew Internet & American Life Project released a study that found 42 million Americans have created tags for some form of online content. Ten million of these Americans do it on a daily basis (Smith, 2008). This study also found these taggers fit the profile of those who are often early adopters of technology: younger, affluent, and more than half have attended college (Smith, 2008). Almost two-thirds are either Hispanic or African American (Smith, 2008).

Why is tagging so popular? It allows everyone to do what used to be the job of the “experts.” As Ellyssa Kroski puts it:

No longer do the experts have a monopoly on this domain; in the new age users have been empowered to determine their own cataloging needs. Metadata is now in the realm of the Everyman (Gordon-Murnane, 2006).

Tagging is more current, and captures changes as soon as they happen instead of waiting for the controlled vocabulary to add new terms or amend them. Timo Hannay, director of Web publishing at Nature Publishing Group, explains:

People discover the same gene simultaneously and call it two different things, and suddenly they realize it’s the same one and say “Why don’t we try to give it one name?” And so the post hoc categorization or tagging of content by using a community-driven approach rather than relying on it being done centrally has some particular potential in science (Smith, 2008).

One of the most important reasons libraries should consider the use of tags is the benefit of evolution and growth. Their patrons are changing, and expect to be able to participate and interact online. The OPAC should be no different. The introduction of a destabilizing force can promote an ecosystem by maintaining diversity. In this case, the ecosystem is today’s library, and the destabilizing force is tagging (Smith, 2008). Thunder Bay Public Library is one library willing to shake up its ecosystem. Joanna Aegard, the library’s head of virtual library services, explains that when the users notice the library is using del.icio.us, “our hope is that they will recognize our position in the community as information providers, visit our web site, work with our virtual collection, and become engaged library users” (Rethlefsen, 2007).


Of course, as with most new technologies, there are critics of tagging. Although some of the tension is caused by placing metadata creation in the hands of the masses, the professionals have more concerns than just loss of control of their records. Is tagging here to stay or just a fad? Will the masses be willing to continue to tag if it becomes the main source of cataloging? The critics feel controlled vocabularies should not be dumped for the “latest hot new thing” (Gordon-Murnane, 2006).

One of the problems with tagging is its lack of a hierarchy. Users can place tags on a resource, but there is no guarantee that every aspect is covered. In a controlled vocabulary, a user who searches for a term at any point in a hierarchy is more likely to find all resources in that category. For example, in the Library of Congress Subject Headings, if a user searches for dachshunds, the authority file tells them they can also search under the terms “miniature dachshunds” and “wirehaired dachshunds”. The authority file also tells them the broader term is “hounds”. With tagging, it is up to the user to tag for both the broader and narrower terms if the resources are to be retrieved under each.

A similar problem with tagging is synonymy. While this can be something as simple as using the tag “TV” instead of “television,” many terms have even more synonyms, such as “pop”, “soda,” “coke”, “soft drink”, or even “soda pop”. A controlled vocabulary again handles this issue with an authority file.

Polysemy is another problem with tagging related to the tagger’s vocabulary selection. In this case, however, the user may select a word that has more than one related meaning. For example, a window may be an opening in a wall or the glass in the opening (Golder and Huberman, 2006). While this problem is similar to that of a word with many unrelated meanings (homonymy), it is harder to get around.
Homonyms can be avoided by adding more related search terms to make it more likely the resources retrieved are actually about the desired subject. With polysemous terms, many retrieved resources could be related, but not actually what the user desires (Golder and Huberman, 2006).

Authority files also correct the tagging problem of plurality. In the earlier dachshunds example, if a user had simply entered “dachshund”, they would be directed to enter “dachshunds” instead. Tagging would rely on the user to search both the singular and plural forms, since the original tagger would be likely to enter the tag in only one of the forms.

Another danger of tagging is the user’s intent. An example of bad intent has been labeled spagging, a portmanteau of the words “spam” and “tagging” (Arch, 2007). Just as people send spam e-mail to create profit or to cause damage, the same can be done by creating tags. In a library, tags are less likely to be added just for profit, but a user can still cause harm by tagging resources with inappropriate terms. This can be remedied by having users report the tags and librarians remove them, but this still takes time and energy.

All the problems of tagging really come down to the user. Will the user participate and create enough tags for it to be useful? Will the user create too many tags? With any participatory system there is a small minority who can dominate the majority by the sheer number of tags they can create. This can be counteracted in many ways. Librarians could manually remove tags of the atypical highly active user. If the system rates tags, it can drop the high and low tags and rate the middle. Another option is to weight tags by the number of different users adding the tag (Smith, 2008).

Because of these problems it is evident that a controlled vocabulary has advantages over tagging. What role should the thesaurus play in the future OPAC? Controlled vocabularies can be used as a reference tool by users when they create tags. Or authority files can be used to help retrieve results from a search without the user even having to use the thesaurus. Also, a user could use tags to first start a search, then use the subject headings in the records retrieved to find related records (Noruzi, 2007).

A thesaurus like the Library of Congress Subject Headings can assist users creating tags in many ways. Users can use it to find the standard terms in their subject field, or they can use it to find new concepts. Once a user does a search with tags, and gets too many or too few results, they can then move on to the LCSH to narrow or broaden their search within the hierarchy. If a tag contains a typographical error, this can also be corrected by a thesaurus. Finally, a thesaurus can provide “See Also” references for tags (Noruzi, 2007).

Thomas Mann argues that we need the LCSH now more than ever. He gives the subject browse feature available on most OPACs as one reason. With other search methods, users are limited by the terms they can think of. While tagging assists in providing more terms, browsing subject headings will give the most complete list of materials a library owns in that particular subject. Also, because of free floating subdivisions, browsing the LCSH itself will not give the user the search terms (Mann, 2003). Free floating subdivisions are like tags in this respect, as there is no “master list” of tag combinations.

The biggest advantage that LCSH has over tagging is its longevity. Libraries have cataloged millions of volumes using the LCSH, and it would take years to tag all the items in the catalog. Some materials could probably never be tagged. Therefore, a tag search is going to come up with only the most recent or popular information. But this advantage can also be a disadvantage. The larger and older a collection is, the more frequently some terms are used.
Catalogers tend to classify items with other items of their kind. As a result, the terms become less effective as a retrieval tool, as the terms become less specific and more generalized (Linnea Marshall, 2003).

One problem the LCSH and other controlled vocabularies share with tagging is, once again, the human element. Cory Doctorow claims that metadata creators (even the expert catalogers) are lazy, stupid, dishonest, and self-ignorant (Smith, 2008). Catalogers cannot be expected to be experts in every bit of subject matter a library can acquire. And being only human, they are bound to have their bad days. But, of course, taggers are only human too. Doctorow says, “Requiring everyone to use the same vocabulary to describe their material denudes the cognitive landscape, enforces homogeneity in ideas. And that’s just not right” (Smith, 2008). Since no system can ever be perfect, tagging at least helps more terms to be entered, and more access points to be created.

Tagging is now being done at many web sites, mainly to tag web pages. Some examples of social bookmarking web sites are:
• BlinkList (www.blinklist.com/).
• delicious (http://delicious.com/).
• citeulike (www.citeulike.org/).
• connotea (www.connotea.org/).
• Flickr (www.flickr.com/).
• Furl (www.furl.net/).
• Scuttle (http://sourceforge.net/projects/scuttle).
• Simpy (www.simpy.com/).
• Spurl.net (www.spurl.net/).
• Yahoo!’s MyWeb (http://myweb.yahoo.com/).
• LibraryThing (www.librarything.com/).
• Tagzania (www.tagzania.com/).
• Technorati (http://technorati.com/).
• YouTube (www.youtube.com/).

Libraries can use these sites to tag their collection by creating accounts on these sites, and then either allowing librarians to create tags to form bibliographies, or allowing all patrons of the library access to the library’s account. Some of these sites require software to be downloaded to the web browser, which may or may not be appropriate for the library. Some libraries are using these sites to post tag clouds on their web sites, such as Thunder Bay Public Library in Ontario, and the Nashville Public Library (Rethlefsen, 2007). A tag cloud is a common feature of a social bookmarking site. Tags are displayed in a group (or cloud) instead of in a long list. Each tag’s font size is determined by the software, usually from the number of times the tag is clicked on or added, or the number of resources linked to the tag.

Other sites are for tagging only image resources or videos, which could be useful for a library’s special collections. In fact, the Library of Congress is one library that is doing this. Many images in the Library of Congress’s collection have descriptions that are lacking in information. The library hopes that by allowing people to create tags for these images, the descriptions can be enriched. The project is described at www.loc.gov/rr/print/flickr_pilot.html. The images themselves can be viewed on Flickr at www.flickr.com/photos/Library_of_Congress.

While tagging for the large majority is focused on internet links and web resources, LibraryThing is a site that allows its users to tag books. A feature of LibraryThing is widgets, which libraries can use to inform patrons of new books added to their collection. LibraryThing also has images of book jackets, which can enhance a library’s web pages. Some libraries use the tagging feature of LibraryThing to create recommended book lists, add genres, and call numbers (Rethlefsen, 2007). One library using LibraryThing is the Ohio State University Libraries.
As seen in the profile (Figure 1), the library has used the tag “leisurereading” to create a list of books the library owns that patrons can read for leisure. One can view the entire list (Figure 2) or view individual books. While OSU Libraries have only used the one tag, once the patron clicks on the individual book, they can see several other tags created by other LibraryThing members (Figure 3). The web page for the book has reviews, as well as recommendations for other books made by LibraryThing. There is also a place for other users to make their own recommendations. A tag cloud displays the tags in sizes based on the number of members assigning the particular tag to the book. “Humor” is the largest tag, but there are also smaller tags for “comedy”, “funny”, “humour” and “humorous”, which shows how users can think of the same term differently.

LibraryThing attempts to reduce the problem of synonymy by allowing annual and lifetime members to combine tags, as explained on their web site (www.librarything.com/wiki/index.php/Tag_combining). The site recommends only combining tags that are always the same in use and meaning. LibraryThing also discourages combining acronyms with non-acronyms, since acronyms stand for many different terms. While plural and singular terms may cause problems in tagging, LibraryThing does not combine these tags because some words change their intended meaning in the plural form. For example, LibraryThing explains “prayer” may mean the concept of prayer, while “prayers” may be a set of prayers.

Figure 1.

Figure 2.

Figure 3.

An alternative to libraries making accounts on public tagging sites is to host their own tagging system. One library doing this is the University of Pennsylvania. PennTags allows members of the Penn community to tag web sites, articles in the library’s database, and records in both the video catalog and Franklin, the library’s OPAC. Tagging is done by clicking an “Add to PennTags” link available at the bottom of the page of any resource in the library’s system. Users can also download a bookmarklet to add to their web browser that will enable them to add any page on the World Wide Web to PennTags. Because PennTags is a closed system limited to members of the Penn community, it has fewer tags than other systems like LibraryThing. However, PennTags allows users to add projects, which serve as bibliographies with access to the references themselves (Figure 4). Besides the “Add to PennTags” link, the tags are not on Penn Libraries’ OPAC itself. Users cannot search Franklin by tags. Those more comfortable with the traditional search experience are not distracted by tag clouds or other unfamiliar features of tagging sites. This way, the library caters both to those seeking a new way to interact with the catalog and to the traditional user.
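The tag combining and member-weighted tag cloud described earlier for LibraryThing can be illustrated with a short sketch. It is not LibraryThing's implementation: the ALIASES table, the function names, the member names, and the 10 to 28 point font range are all assumptions chosen for illustration. Tags are folded together only through an explicit alias list, singular and plural forms are left separate, and each tag is weighted by the number of distinct members who assigned it.

```python
from collections import defaultdict

# Hypothetical alias table: each key is folded into its canonical tag.
# Singular/plural pairs are deliberately NOT listed, so "prayer" and
# "prayers" stay separate, as in the guidance described above.
ALIASES = {"humour": "humor"}


def canonical(tag):
    """Normalize case and whitespace, then apply an explicit alias if any."""
    key = tag.strip().lower()
    return ALIASES.get(key, key)


def member_counts(assignments):
    """Count distinct members per canonical tag.

    `assignments` is an iterable of (member, tag) pairs for one book.
    """
    members = defaultdict(set)
    for member, tag in assignments:
        members[canonical(tag)].add(member)
    return {tag: len(who) for tag, who in members.items()}


def font_sizes(counts, smallest=10, largest=28):
    """Scale counts linearly onto a font-size range (in points)."""
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = (high - low) or 1  # avoid dividing by zero when all counts match
    return {
        tag: round(smallest + (n - low) * (largest - smallest) / span)
        for tag, n in counts.items()
    }


assignments = [
    ("ann", "Humor"), ("ben", "humour"), ("cy", "humor"),
    ("ann", "funny"), ("dee", "comedy"), ("dee", "funny"),
    ("ben", "prayers"),
]
counts = member_counts(assignments)
sizes = font_sizes(counts)
print(counts)
print(sizes)
```

Counting distinct members rather than raw tag assignments means one very active tagger cannot inflate a tag's size by repeating it, which is one of the counterweights to dominant users mentioned earlier in the article.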



Figure 4.

The Ann Arbor District Library (AADL) integrates its tagging system into the OPAC itself, in what their developer John Blyberg calls the SOPAC, or social online public access catalog (Rethlefsen, 2007). Tags found on the right side of the catalog’s home page (www.aadl.org/catalog) include the ten most popular tags, ten most recent tags, and ten random tags. There is also a link to view the catalog’s tag cloud. When viewing the catalog record itself, any tags available are seen just above the MARC (Machine-Readable Cataloging) data. Anyone can sign up for an account to create tags for the AADL. Because those signing up are most likely associated with the AADL, once again, the number of tags is drastically smaller than on a web site like LibraryThing.

The key to making tagging work for a local library catalog like Ann Arbor or Penn is participation. LibraryThing founder Tim Spalding says, “People are not as motivated to tag in a library catalog as they would be in something like LibraryThing” (Rethlefsen, 2007). Gene Smith (2008) gives several suggestions for increasing participation. The system should first make it easy for people to contribute tags. The library has to make sure the tagging system helps people manage their information well; otherwise it could become just another “information closet” like bookmarks. Encouraging collaboration, self-expression, and play is another way to ensure patron participation. In other words, make tagging fun and useful, not just a chore.

Today’s consumer of information is expecting new interactive ways to obtain this information. With social bookmarking and tags, these users can see the library as more than just a building full of books. People are also enjoying the democratic nature of Web 2.0 and are no longer expecting to follow the rules of the experts to find the information they want. Therefore, the traditional metadata creator like the catalog librarian should play the role of helper, not authoritarian. But tagging is not the best way to find this information. Controlled vocabularies like the LCSH have been around a long time, and will continue to play a major role in the library catalog. Authority files help users find information by reducing the problems of synonymy, polysemy, and singular versus plural terms. Sometimes a user needs a hierarchical system to let them know they have found all information the library has on a particular subject. Other times, they just need to find the solution to their problem quickly.

It all comes down to access points. By adding a tagging system to their OPAC, a library creates more access points, and more ways to get users to find what their library has to offer. Meanwhile, the catalogers can continue to add the traditional access points to aid in information retrieval where the new methods fail. No system is perfect, but by offering as many tools as possible, libraries can continue to be information providers in the Web 2.0 environment.

References
Arch, X. (2007), “Creating the academic library folksonomy: put social tagging to work at your institution”, College & Research Libraries News, Vol. 68, pp. 80-1.
Dye, J. (2006), “Folksonomy: a game of high-tech (and high-stakes) tag”, EContent, Vol. 29, pp. 38-43.
Golder, S.A. and Huberman, B.A. (2006), “Usage patterns of collaborative tagging systems”, Journal of Information Science, Vol. 32, pp. 198-208.
Gordon-Murnane, L. (2006), “Social bookmarking, folksonomies, and Web 2.0 tools”, Searcher, Vol. 14, pp. 26-38.
Linnea Marshall, M.A. (2003), “Specific and generic subject headings: increasing subject access to library materials”, Cataloging & Classification Quarterly, Vol. 36, pp. 59-87.
Mann, T. (2003), “Why LC subject headings are more important than ever”, American Libraries, Vol. 34, pp. 52-4.
Noruzi, A. (2007), “Editorial”, Webology, Vol. 4, available at: www.webology.ir/2007/v4n2/editorial12.html (accessed 30 September 2008).
Rethlefsen, M.L. (2007), “Tags help make libraries del.icio.us”, Library Journal, Vol. 132, pp. 26-8.
Smith, G. (2008), Tagging: People-powered Metadata for the Social Web, New Riders, Berkeley, CA.

About the author
Tom Steele graduated with a BA in broadcast journalism from Oklahoma State and wound up with a library paraprofessional job in the cataloging department at the university’s Edmon Low Library. Eventually this led him to realize he was a lover of information, not journalism, and he received a Master’s in Library and Information Science from his alma mater’s in-state rival, the University of Oklahoma. A third-generation OSU Cowboy, he is ironically now the Science and Technology Cataloger at OU (University of Oklahoma). He can be contacted at: Thomas.D.Steele-1@ou.edu

To purchase reprints of this article please e-mail: reprints@emeraldinsight.com Or visit our web site for further details: www.emeraldinsight.com/reprints


