
A SEMANTIC (FACETED) WEB? KATHRYN LA BARRE

This paper is a theoretic-historical response to the central questions of the current issue: Where do we come from? Where are we currently? Where are we going? This examination is situated in the context of a set of heritage theories and practices most commonly known as Faceted Classification or facet theory. The locus of this exploration is the Semantic Web. The chief focus is upon Semantic Web implementations that employ, adapt, or misconstrue the theory or practice of facet analysis and Faceted Classification. A secondary focus is upon suggestions for the creation of operational definitions and functional requirements for facet theory that may serve to enhance, amplify or extend current understandings and practices in Semantic Web implementations.

DOI:10.3166/LCN.6.3.103-131 © 2010 Lavoisier, Paris



LCN n° 3/2010. Organisation des connaissances et web 2.0

Background

This paper seeks to amplify a conversation about facet theory between Brian Vickery [1] and the author by exploring the following conjecture:

A structure such as facet may validly represent certain aspects of a field, but only limited aspects. Speaking of thesauri, Soergel writes that ‘These very rudimentary relationships are not powerful enough to guide a user in meaningful information discovery on the Web or to support inference. They do not reflect the conceptual relationships that people know and that can be used by a system to suggest concepts for expanding the query or making it more specific’ (Soergel, et al., 2004). The same may be true of facet. Do we in fact need a much richer representation of semantic relationships, such as some ontologies are now trying to achieve? (Personal communication, 7 October 2005).

Here Vickery refers to a proposal to re-engineer AGROVOC, a “full fledged and semantically rich” thesaurus, into an ontology suitable for application in automated indexing or query formulation and expansion. Through the use of Semantic Web technologies such as RDF (a metadata model for web resource description) and XML (a markup language that defines, validates, organizes and facilitates the sharing of Web resources), this domain-specific ontology would be fully interoperable with a variety of Semantic Web applications (Soergel, et al., 2004). This use case illustrates the crossroads at which traditional and emergent knowledge organization systems (KOS) are situated today, especially in the context of the Semantic Web.

This discussion requires traversal of contested territory, as understandings of both entities central to this paper, the Semantic Web and facets, are remarkably disparate. First, the broad parameters of perspectives about the Semantic Web will be explored. Next, variant understandings of facets in theory and practice will be examined. These contexts will provide the background necessary to evaluate Vickery’s conjecture about whether or not facets are robust enough for ontology development and, by extension, for application on the Semantic Web. To more fully explore this conjecture, a sample of representative instantiations will be examined.

[1] Brian Vickery ranks among the most highly cited and well-regarded KO researchers in the world. As a founding member of the Classification Research Group, Vickery and others contributed to the development and extension of Faceted Classification in theory and practice. Up until his death in 2009, he remained actively involved in discussions about the present and future of KO. Our correspondence began in 2003. His constructive criticism and insightful challenges were a formative influence and provide continual guidance.



What is the semantic web?

Is the Semantic Web a “pipe dream, founded on self-delusion, nerd hubris and hysterically inflated market opportunities” (Doctorow, 2001), “political philosophy masquerading as code” (Shirky, 2003), or a way to leverage ontologies to “add a layer of meaning on top of the existing web” (Markoff, 2006)? Some scholars attribute these conflicting understandings to the existence of factions that perceive only one potential aspect of the Semantic Web instead of a unified whole. For some, the Semantic Web is a universal library; others envision an environment where artificial agents augment human effort; others focus instead on the existence of a stack of data linking and description technologies such as RDF, XML and OWL (Marshall and Shipman, 2003).

Today, skeptics and optimists are equally abundant. Asked to predict the course of the Semantic Web over the next ten years, many of the 895 participants in a recent Pew Internet survey expressed doubt that the Semantic Web would make much difference to “average users,” or reach full potential (Anderson and Rainie, 2010). To find firm ground, this paper relies upon official definitions from the World Wide Web Consortium (W3C), an international community of individuals and member organizations that develops “interoperable standards, specifications, guidelines, software and tools” that will “lead the Web to its full potential” through the creation of a “Web of Linked Data, Services and Applications.” The ultimate goal of the W3C is to make relationships and linkages between and among web resources easy for both humans and machines to understand and follow (W3C: About).

With that in mind, we return to the use case of Soergel’s AGROVOC vocabulary project.
By following interoperable Semantic Web design principles such as the use of RDF and XML, this project sought to create an ontology that could be integrated into an automated classification system for agricultural materials and put to a variety of uses, including query expansion in search, resolving ambiguous language use in resources, and promoting serendipitous discovery of new links among and between documents. In short, the AGROVOC ontology would allow machines to access, convert, or reuse web content from a variety of domain-specific and general resources (Soergel, et al., 2004).

One extant example of this type of effort is the DBpedia knowledge base, often considered to be the most powerful demonstration of Semantic Web design principles. Through use of the DBpedia ontology, Wikipedia content (in RDF format) is extracted and repurposed to allow “users to ask sophisticated queries… and [automatically] link to other data sets on the Web.” The DBpedia ontology has classified 1.5 million entities and helps provide access to 1 billion “pieces of information (RDF triples)” (DBpedia website). DBpedia automatically enhances Wikipedia content by linking to external resources including 800,000 images, over 5 million websites and over 9 million datasets (DBpedia website).

Thus the Semantic Web refers to W3C’s vision of the ‘Web of Linked data’, composed of documents and a group of technology standards (such as RDF and XML) and vocabularies (such as OWL and SKOS) that support computers and people in “developing systems that can support trusted interactions over the network” (W3C: Semantic Web). To extend our examination of the role of facets in Semantic Web applications, the next section offers traditional definitions of the concepts central to facet theory.

What is a facet?

Contemporary organization and access systems seek to reflect both fluid associative and formally structured interrelationships among and between information resources. In service of this goal, some Semantic Web applications use facets to create dynamic and responsive information search and discovery systems. Just as there are many different perspectives on what constitutes the Semantic Web and its goals, there are many conflicting understandings of facets.

An historically grounded survey of traditional concepts guides the ensuing analytical trajectory of this inquiry. Facets are evident in the classifications of knowledge described by Henry Bliss (1929), Paul Otlet (1934) and S.R. Ranganathan (1937). The first formal and theoretical statement of the principles of facet analysis appeared in 1957 (Ranganathan, 1957), although these principles were evident far earlier in the framework of the Colon Classification (Ranganathan, 1933).
In the context of this paper, facet theory encompasses Ranganathan’s writings, which fully enumerate the theory and practice of facets, facet analysis and Faceted Classification, along with the extensions created by members of the British Classification Research Group, which counted Vickery among its members. Here, a facet is a “generic term used to denote any component of a compound subject, also its ranked forms, terms and numbers” (Ranganathan, 1967, p. 88). Facets are homogeneous groupings of terms or concepts that are the end result of the technique of facet analysis (Broughton and Slavic, 2004). Facets are commonly used to represent a component of a subject, or an attribute of an object, and can be combined (or synthesized) to represent complex subjects. This analytical technique foregrounds user interests by considering “what entities [and] what aspects of those entities are of interest to the user group” throughout the process of conceptual analysis (Vickery, 1960, p. 11).

The technique of facet analysis relies upon a list of fundamental categories or distinguishing characteristics. Ranganathan envisioned five: Personality, Matter, Energy, Space and Time (PMEST). Vickery’s own list included: things and entities; parts, components and structure; materials and constituents; attributes (qualities, properties, processes or behaviors); operations (experimental or mental); place; and condition (1960, p. 20, 29). Fundamental categories serve as “a provisional guide” for facet analysis and are not meant to be used “mechanically and imposed on [a] subject” but instead suggest “possible characteristics which should not be overlooked” (Vickery, 1960, p. 23-24).

The technique of facet analysis follows seven basic steps. The first three steps are pure facet analysis; the last four demonstrate how facet analysis may assist in the creation of the structure of a faceted classification (Vickery, 1960, p. 12-13).

Facet analysis:
– Defines the domain and interests of domain participants.
– Formulates facets by examining representative domain material by:
  - collecting terms that reflect domain participant interests, and
  - sorting terms into homogeneous, mutually exclusive groupings (facets).
– Structures each facet in a hierarchical order to coalesce synonyms and to identify misaligned terms and gaps in the system.

Faceted Classification:
– Creates scope notes to describe the meaning and use of each facet.
– Determines a helpful order within each facet.
– Builds notation.
– Fits notation, and creates a schedule of facet terms.

Facet analysis is traditionally conducted as the first procedure in the construction of a Faceted Classification. This process begins with facet analysis, is followed by the creation of a Faceted Classification with a schedule of terms and notation, which is then tested through the process of facet analysis to see that no gaps, errors or omissions have occurred.
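The steps listed above can be loosely sketched in code. The following Python fragment is only an illustration under stated assumptions, not Vickery's procedure: the agricultural sample terms, the function names, and the toy letter-plus-number notation are all hypothetical, and alphabetical sorting merely stands in for the analyst's judgment about "helpful order". The facet names are borrowed from Vickery's list of fundamental categories.

```python
from collections import defaultdict

# Hypothetical sample: (term, fundamental category) pairs that an analyst
# might collect from representative domain material.
COLLECTED_TERMS = [
    ("wheat", "things"),
    ("maize", "things"),
    ("root", "parts"),
    ("leaf", "parts"),
    ("harvesting", "operations"),
    ("irrigation", "operations"),
    ("Punjab", "place"),
]


def sort_into_facets(pairs):
    """Group collected terms into facets keyed by fundamental category."""
    facets = defaultdict(list)
    for term, category in pairs:
        facets[category].append(term)
    return dict(facets)


def find_clashes(facets):
    """Return terms assigned to more than one facet.

    An empty result means the facets are mutually exclusive.
    """
    seen = {}
    clashes = set()
    for name, terms in facets.items():
        for term in terms:
            if term in seen and seen[term] != name:
                clashes.add(term)
            seen[term] = name
    return clashes


def build_schedule(facets):
    """Order each facet and attach a toy notation (facet letter + number).

    Alphabetical order is an assumption standing in for 'helpful order'.
    """
    schedule = {}
    for letter, name in zip("ABCDEFGHIJKLMNOPQRSTUVWXYZ", sorted(facets)):
        ordered = sorted(facets[name], key=str.lower)
        for number, term in enumerate(ordered, start=1):
            schedule[f"{letter}{number}"] = (name, term)
    return schedule


facets = sort_into_facets(COLLECTED_TERMS)
clashes = find_clashes(facets)
schedule = build_schedule(facets)

for notation, (facet_name, term) in sorted(schedule.items()):
    print(notation, facet_name, term)
```

The clash check corresponds to the final testing step described above, in which the finished schedule is re-examined for gaps, errors or omissions; a real analysis would also involve scope notes and hierarchy within each facet, which this sketch omits.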
By the 1960s, in part through the efforts of the British Classification Research Group, facet analysis was readily known as an effective technique for constructing controlled vocabularies (Aitchison and Clarke, 2006). Examples of faceted thesauri include: Thesaurofacet (Aitchison, Gomersall and Ireland, 1969), the UNESCO Thesaurus (Aitchison, 1977) and the Art and Architecture Thesaurus (Petersen, 1990). Descriptions of facet analysis can also be found in the ANSI/NISO standard Z39.19 for the Construction, Format and Management of Monolingual Thesauri (NISO, 2005), the British Standards Institution’s Structured Vocabularies for Information Retrieval, Part 2 for Thesauri (BSI, 2005), as well as the draft of the new ISO standard, Thesauri for Information Retrieval (ISO DIS 25964-1, 2010).

The way forward?

Thus far, Vickery’s 2008 observation from his article in Axiomathes holds true:

Once upon a time, over 50 years ago, I collaborated in writing an article entitled The need for a faceted classification as the basis of all methods of information retrieval (Classification Research Group, 1957). A bold, brash claim! Since then we have had thesauri, relational data bases, taxonomies, ontologies, knowledge bases, topic maps, knowledge architectures and more. Yet in many of them we may detect signs of the old facet mole burrowing away… Clearly, it was an exaggeration to say that facets are the basis of all. To combine two facets is to imply a relation between them, but it is the explicit naming of relations between terms that is another theme in knowledge organisation, found in thesauri and ontologies but not in facets as such (Vickery, 2008, p. 148).

In his correspondence with me, Vickery was keenly interested in the role that facet theory might play in the further development of ontologies, and what limitations might be uncovered. The following section offers suggestions that may move this conversation to a wider audience.

A need for operational definitions

Although Ranganathan and the Classification Research Group initially used the terminology of facet theory in a precise fashion, variant understandings began to multiply over the intervening fifty years. As early as 1955, de Grolier observed divergences in term use even among CRG members. For example, D.J.
Foskett considered category an apt synonym for facet, and facet analysis as ‘…analysis of a subject in its entirety into a certain number of … categories of things.’ By 1960 Vickery and others commonly used the term conceptual categories ‘of high generality and application that can be used to group other concepts.’ Attributing these divergences to ‘Ranganathian language,’ de Grolier described the “extremely specialized term fundamental categories to reference each facet of a subject, as well as each division of a facet” as a locus of confusion (de Grolier, 1962, p. 15). He dryly observed that Ranganathan’s habit of “using words in a particular sense…certainly does not facilitate comprehension” (de Grolier, 1962, p. 44). In a critical analysis of the ambitious Colon Classification, de Grolier traced the evolution of a preponderance of vague and inexact central definitions, and observed that in “practical realization” facet theory is “singularly empirical and often very arbitrary” (de Grolier, 1962, p. 57).

Some of the differences in vocabulary and understanding may also spring from the geographical and cultural separation of the three research groups that promoted facet theory in the 1950s: the Library Research Circle in India, the British Classification Research Group, and the North American Classification Research Group (Brownson, 1960, p. 1930). Regardless of source, terminological and practical confusion is now as rampant as it is well documented (Spiteri, 1998; Maniez, 1999; La Barre, 2006, 2010).

For developers who experiment with facet theory outside the confines of LIS, understanding and application of facet theory evince even greater variability. In the United States the term facets, as used by Information Architects (La Barre, 2006) and in some Next Generation Catalogs (La Barre, 2010), more closely resembles ad hoc categories and often bears faint resemblance to facets created through the process of facet analysis. Some Web developers consider any information organization system with elements of synthesis to be a Faceted Classification, and many Information Architects loosely refer to the use of faceted access structures for objects as ‘Faceted Classification.’ Here, some members of the British tradition of facet theory draw a sharp distinction, in that facet theory only supports the use of Faceted Classification for subjects. Other differences in understanding relate to whether or not elements of a faceted navigation display can properly be considered fundamental categories, or something else entirely, such as principles of division.
Emblematic of this contentious situation, the OCLC research project FAST (Faceted Access to Subject Terminology) is a product intended to simplify and reduce the cost of indexing of bibliographic and other material. The FAST schema has eight facets: Personal names, Corporate names, Geographic names, Events, Titles, Time periods, Topics, and Form/Genre. Here LCSH [Library of Congress Subject Headings] has been assigned a “simplified syntax … retain[ing] the very rich vocabulary of LCSH while making the schema easier to understand, control, apply, and use… [A]ny valid set of LC subject headings can be converted to FAST headings” (OCLC FAST website). Although some discern uniformity in the North American tradition of facet theory, many researchers readily agree with Broughton’s contention that FAST “makes some progress along the road to consistent analytico-synthesis, although it is not faceted in the sense that most UK professionals would recognise” (Broughton, 2006, p. 58). Many also concur with Broughton’s assessment that the Z39.19 definition of facet as “attributes of content objects encompassing various non-semantic aspects of a document” and listing of facets [such as] “topic, author, location, format, language, and place of publication” has more in common with database fields than traditional facets (Broughton, 2006, p. 58). Broughton is quite correct when she observes that this “has a great deal in common with the FAST project … where ‘topic’ (or subject) is regarded as one facet among a list of non-subject elements of bibliographic description.” Even though some North American researchers might consider these to be equivalent to “fundamental facets,” many would instead recognize their genesis in MARC and Dublin Core fields, not as actual or necessarily appropriate products of rigorous facet analysis.

This superficial notion of facet (akin to leveraging existing data fields) also appears in many Next Generation Catalogs for bibliographic holdings in libraries. It is clear that facet analysis has yet to become a standard part of OPAC system design. While the practices embedded in Next Generation Catalogs may be unique to the United States, this situation is not an indicator of unanimity in understanding. Rather, it is an expediency eagerly grasped by a few OPAC software designers. Few North American facet theorists would disagree with the statement that the true strength of facet analysis lies in the way it can “peel the onion of an idea” (Vickery, 1966, p. 13-14). Too often, concerns about time and money work against deep subject analysis that could be most fully displayed in the context of an OPAC display designed to fully leverage facets.

In sum, it is clear that French, British, North American, and Indian facet theorists are equally concerned with lax terminological usage. A recent exchange with a member of the Indian school revealed finely nuanced differences in understanding (Raghavan, personal communication, 2010).
For Raghavan, “a facet is just one aspect of a multidimensional subject or an attribute of a physical carrier.” The distinction between facets for subjects and objects, made by some members of the British tradition, does not seem to be held equally by members of the Indian tradition. For Raghavan a more important distinction exists:

– categories are at a higher level and are arrived at by a process of “logical abstraction”.
– the nature and number of categories would, according to some, depend on the purpose, while for some others it is possible to arrive at a set of fundamental categories that are applicable to all branches of knowledge (universals)… It is this categorical approach that characterizes a faceted classification; In other words we try to identify all the different facets that occur or are likely to occur in a domain/discipline and try to group these together based on what they represent (the process of logical abstraction).

As Maniez (1999) indicated earlier, facet theorists must work to find common ground, and to regularize term use wherever practicable. Once freed from the reigning confusion, attention can be directed to promoting facet theory among researchers in cognate areas. As such, it may be useful to operationalize agreed-upon definitions and begin work on a set of functional requirements for facet analysis. Both deliverables are potentially useful to researchers hard at work creating formal representations of facet analytical approaches. The next section describes another critical step towards supporting creation of more robust and theoretically grounded Semantic (Faceted) Web applications: the creation of functional requirements for developers and researchers interested in applying facet theory to the enormous terrain of the Semantic Web.

Functional Requirements of Facet Analysis

In 2003, Phil Murray indicated that one motivation for the formation of the Faceted Classification listserv was his interest in expanding a preliminary set of functional requirements for facet theory:

1. What is the most effective way to model the process of facet analysis?
2. Is there a recognized way to design and model Faceted Classification?
3. How should a human or a machine index with facets?
4. What interchange formats are best for capturing facets and facet relations?
5. What software or metadata tools are best for faceted implementations and applications?
6. What is the best approach to selecting automated categorization tools for sharing schemas, supporting facets?
7. What approaches are similar to Faceted Classification or facet analysis? (Murray, 2003).
These need refinement, but are a good start to a process of identifying mission-critical features and functional requirements that are accessible for those who would seek to use facet theory. Slavic, who motivated the post by Murray, has also proposed the importance of agreed-upon functional requirements in support of online faceted classifications to better assist in maintenance and management, the creation of indexing tools, and as a way to “improve standards for the use and exchange of knowledge organization systems” (Slavic, 2008, p. 258).

Other preliminary attempts to identify functional requirements reside in unexpected places. One source is Kashyap’s comparison of Ranganathan’s facet postulates and principles to Chen’s entity-relationship modeling. The analysis here helps bridge the understanding barrier between facet theorists and those working in cognate traditions (Kashyap, 2001).

During an exchange about defining facets and fundamental categories on the SKOS list, Tudhope answered Vickery’s question, “Do we in fact need a much richer representation of semantic relationships such as some ontologies are now trying to achieve?” Tudhope asserted that the W3C standard OWL (Web Ontology Language) provides an appropriate vehicle for encoding facets and capturing facet relations because it provides “additional vocabulary along with formal semantics” (Tudhope, 2004, message 0051). By reminding discussants that OWL supports “describing properties and classes: among others, relations between classes (e.g. disjointness), cardinality (e.g. "exactly one"), equality, richer typing of properties, characteristics of properties (e.g. symmetry), and enumerated classes” (W3C website), Tudhope reiterated the utility of facet theory in ontology development.

In the same vein, Broughton identified shared and complementary roles for facets in ontology building. Both provide excellent vocabulary control structures, support term disambiguation, enhance browsing and searching, and can be used to build site navigation frameworks.
This is true primarily because Faceted Classifications rely upon the use of mathematically based formal coding to express content and content element relations, with the result that the notation and facet indicators can be leveraged by search and access systems. This use of formal coding has the potential to enable manipulation and integration or conversion of a Faceted Classification into a fully developed ontology (Broughton, 2006, p. 65-66).

To take this line of reasoning a bit further, because facets represent aspects or viewpoints from which an entity may be analyzed, Sigel and others promote the use of facet analysis of concepts into basic categories, not simply because it is an efficient way to identify the central concepts in a domain, or to make explicit all essential aspects of a concept, but because it can also uncover the relationships between concepts in a domain (Foskett, 1977; Sigel, 2003, p. 405; Soergel, 1985, p. 257, 280; Vickery, 1966). Because of this robust mutability, Sigel observes that facet analysis can also support semantic factoring, or analysis of categories into primitives or basic level concepts at any level of an ontology (2003). Soergel also views semantic factoring as an equivalent process to facet analysis. He envisions the stepwise process of semantic factoring as a facet framework. If one conceives of each facet as a question, each answer thus represents one essential aspect of a given concept. This functional approach is demonstrated by the following example:

– Of which class is the concept or object a member or subclass?
– Is the object in a specific state, condition or circumstance?
– What is it capable of doing? Does it have a specific purpose?
– Does the object or concept cause, influence, produce or act upon another?
– Is X a means by which to achieve something else?
– Is it a theory or aspect? Is it a specific aspect or viewpoint?
– Is it accompanied by something or does it accompany something?
(Soergel, 1985, p. 258)

Given such a framework, it is a small step to envision embedding formalized knowledge representations (such as a set of IF-THEN rules) into a semi-automated routine designed to assist with ontology creation. John Sowa (undated) provides support for this musing with an observation that “the techniques of semantic factoring can be applied to any level of an ontology from the highest, most general concept types to the lowest, most specialized types. The methods can be automated, as in formal concept analysis, which is a systematic technique for deriving a lattice of concept types from low-level data about individual instances.” Building on this suggestion, Priss (2008) presented a mathematical model for facets that utilizes Formal Concept Analysis (FCA) to support the creation of graphical representations of faceted systems. Might such an approach potentially serve as a functional model for formalized facet analysis as well?
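To make Sowa's remark about deriving "a lattice of concept types from low-level data about individual instances" more concrete, the following Python sketch enumerates the formal concepts of a very small context by brute force. It is a minimal illustration of the general FCA idea only, not Priss's model of facets; the three documents, their index terms, and all function names are hypothetical, and the exhaustive enumeration is practical only for toy data.

```python
from itertools import combinations

# Hypothetical formal context: objects (documents) mapped to the
# attributes (index terms) they carry.
CONTEXT = {
    "doc1": {"wheat", "irrigation"},
    "doc2": {"wheat", "harvesting"},
    "doc3": {"maize", "irrigation"},
}


def common_attributes(objects, context):
    """Intent: the attributes shared by every object in the given set.

    For an empty set of objects this is the full attribute set.
    """
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return frozenset(shared)


def objects_with(attributes, context):
    """Extent: the objects that carry every attribute in the given set."""
    return frozenset(o for o, attrs in context.items() if attributes <= attrs)


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every object subset.

    Exponential in the number of objects, so suitable for toy data only.
    """
    concepts = set()
    names = sorted(context)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            intent = common_attributes(subset, context)
            extent = objects_with(intent, context)
            concepts.add((extent, intent))
    return concepts


def is_subconcept(lower, upper):
    """Lattice order: a concept is below another if its extent is contained."""
    return lower[0] <= upper[0]


concepts = formal_concepts(CONTEXT)
for extent, intent in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), sorted(intent))
```

Each printed pair is one node of the concept lattice: a set of documents together with exactly the terms they all share. Whether such a lattice corresponds to facets in the traditional sense is precisely the open question raised above.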
Both Priss and Sowa reference Wille (1992) as a fundamental source for understanding FCA, and a potentially fruitful area for those interested in promoting and extending the use of facet theory for ontology development.

Facets on the semantic web?

A recent special issue of Axiomathes engaged deeply with the role of facet analysis in cognate fields, and examined extensions and applications of facet analysis in the Semantic (and wider) Web. Tudhope and Binding (2008, p. 211-222) discussed the role of faceted thesauri in ontology building. Vickery (2008, p. 145-160) provided an updated primer in facet analysis, addressed concerns and provided solutions to problems faced by developers as they work with facet analysis and Faceted Classification in a multitude of Web environments. The remaining sections of this paper will build on the previous operational definitions and possible functional requirements through principled analysis of a sample of current understandings and applications of facets on the Semantic Web.

Facets in semantic web standards

Facets caught the attention of practitioners who create websites and Web applications (La Barre, 2006). The following definitions are drawn from such efforts in order to demonstrate a variety of approaches and understandings on the Semantic Web.

In the context of faceted Semantic search of Biomedical research papers: [F]acets “represent the characteristics of the information elements. These facets are then used to select or filter the relevant elements in a certain information space, leading users to the exact information required” (Fuentes-Lorenzo, Morato and Gomez, 2009, p. 476).

In the context of a faceted Semantic Web browser: Faceted filtering is a method that “helps the user to get an overview of a given set of resources. The basic concept of facets is to partition the information space using orthogonal conceptual dimensions of the data” (Kobilarov and Dickinson, 2008, p. 3).

In the context of faceted interface design for Wikipedia: A faceted interface for a set of objects is a set of category hierarchies, where each hierarchy corresponds to an individual facet (dimension, attribute, property) of the objects (Li, et al., 2010, p. 651).

The present inquiry draws upon traditional descriptions of facets, facet analysis and Faceted Classification as a framework of analysis for Semantic Web applications that seem to use the terminology and processes of facet theory. In Semantic Web applications, it is interesting to note that the term facet appears in the W3C XML Schemas standard. As we shall see, several different understandings of this term were part of an extended discussion in 2004 about another W3C standard, SKOS (Simple Knowledge Organization System).



Facets, or what came to be known as scopes, also play an instrumental part in Topic Maps, a third standard, currently under consideration by the W3C. In the section that follows, each of these technologies will be defined and discussed in the context of their use of the term facet.

Facets in XFML

XML (eXtensible Markup Language) “is a simple text-based format for representing structured information: documents, data, configuration, books, transactions, invoices, and much more” (Ray, 2003). Since the W3C approval in 1998 of Version 1.0 of XML, it has become “one of the most widely used formats for sharing structured information today: between programs, between people, between computers and people, both locally and across networks” (W3C, XML is 10!).

In 2003, Peter Van Dijck introduced an extension to XML called XFML (eXchangeable Faceted Metadata), an open format for exchanging and publishing metadata, and for indexing web documents. XFML is based on Topic Maps, an ISO standard that will be discussed in a later section. According to Van Dijck, “The real power of XFML lies in the concept of directly connecting topics. This allows authors to reuse existing indexing efforts” (Van Dijck, 2003). In XFML, facets were defined as “the top node of each tree. The nodes in the tree are called topics. XFML can define multiple hierarchies, and each hierarchy is a facet” (Van Dijck, website). The following example of XFML offers a way to encode music resources by type and location:

<facet id="city"> City </facet>
<facet id="place"> Type of place </facet>
<facet id="music"> Type of music </facet>
<topic id="ny" facetid="city"><name> New York </name></topic>
<topic id="la" facetid="city"><name> Los Angeles </name></topic>
<topic id="bar" facetid="place"><name> bar </name></topic>
<topic id="restaurant" facetid="place"><name> restaurant </name></topic>
<topic id="blues" facetid="music"><name> blues </name></topic>
<topic id="latin" facetid="music"><name> latin </name></topic>
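As a rough illustration of how a fragment like the one above might be consumed, the following Python sketch parses the facets and topics with the standard library, checks that every topic belongs to exactly one declared facet, and then filters a handful of venues by a combination of topics. Several things here are assumptions made only for the sketch: the enclosing root element named xfml is added so that the fragment parses as a single document, the three venues and their topic assignments are invented, and the function names are hypothetical rather than part of any XFML tooling.

```python
import xml.etree.ElementTree as ET

# The facet and topic elements are copied from the example in the text.
# The enclosing <xfml> root is an assumption added so the fragment parses.
XFML_FRAGMENT = """
<xfml>
  <facet id="city"> City </facet>
  <facet id="place"> Type of place </facet>
  <facet id="music"> Type of music </facet>
  <topic id="ny" facetid="city"><name> New York </name></topic>
  <topic id="la" facetid="city"><name> Los Angeles </name></topic>
  <topic id="bar" facetid="place"><name> bar </name></topic>
  <topic id="restaurant" facetid="place"><name> restaurant </name></topic>
  <topic id="blues" facetid="music"><name> blues </name></topic>
  <topic id="latin" facetid="music"><name> latin </name></topic>
</xfml>
"""

root = ET.fromstring(XFML_FRAGMENT.strip())

# Map facet id -> facet label.
facets = {f.get("id"): f.text.strip() for f in root.findall("facet")}


def read_topics(root, facets):
    """Map topic id -> (facet id, topic name), enforcing one facet per topic."""
    topics = {}
    for topic in root.findall("topic"):
        topic_id = topic.get("id")
        facet_id = topic.get("facetid")
        if facet_id not in facets:
            raise ValueError(f"topic {topic_id!r} names unknown facet {facet_id!r}")
        if topic_id in topics:
            raise ValueError(f"topic {topic_id!r} is declared more than once")
        topics[topic_id] = (facet_id, topic.findtext("name").strip())
    return topics


topics = read_topics(root, facets)

# Hypothetical venues, each indexed with topic ids from the fragment.
VENUES = {
    "Venue A": {"ny", "restaurant", "blues"},
    "Venue B": {"ny", "bar", "latin"},
    "Venue C": {"la", "restaurant", "latin"},
}


def find_venues(venues, *topic_ids):
    """Return venues tagged with all requested topics, in any facet order."""
    wanted = set(topic_ids)
    unknown = wanted - set(topics)
    if unknown:
        raise KeyError(f"unknown topics: {sorted(unknown)}")
    return sorted(name for name, tagged in venues.items() if wanted <= tagged)


print(find_venues(VENUES, "blues", "restaurant", "ny"))
print(find_venues(VENUES, "latin"))
```

Because the query is a set of topic ids, the order in which facets are consulted does not matter, which is one way of reading the claim that faceted encoding frees content from a single fixed hierarchy.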

In this extension of XML, the developer faithfully adheres to traditional understandings of facets: "[f]acets are mutually exclusive containers that contain hierarchies of topics. Mutually exclusive means that a certain topic can only possibly belong to one facet" (Van Dijck, website). In the above example,
location and type of music are mutually exclusive because a music type facet attribute (Latin) would never be a location type facet attribute (New York). The developer maintained the importance of adhering to the principle of mutual exclusivity – part of traditional facet analysis – in that someone using XFML “should separate out a new facet when describing topics that can be usefully combined.” Encoding in XFML allows browsing of content in a nonhierarchical way. For example, perhaps a traveler wants to listen to blues music in a restaurant in New York, or is interested in finding all venues that host Latin music. Information encoded in this way can be displayed in the same order in which the query is formulated (by music type, instead of location), which frees the information from whatever order in which it originally appeared. By 2007, the developer ceased further development of XFML because of low demand, but XFML remains an exemplar for Web developers interested in facet applications.

Facets in SKOS (Simple Knowledge Organization System)

In 2009 SKOS was adopted as a W3C standard to integrate traditional thesauri, classification schemes, subject heading systems and taxonomies encoded in RDF into “distributed, decentralized metadata applications.” Several familiar thesauri or controlled vocabularies serve as SKOS use cases, including the Library of Congress Subject Headings (LCSH), AGROVOC, and UMTHES (Umweltthesaurus, of the Federal Environment Agency Germany) (SKOS website). Of interest to the present discussion is an announcement in 2004 to the listserv public-esw-thes that, because of the contentious nature of facet definitions and the variety of opinions about proper modeling approaches, “skos:Facet class and the skos:inFacet and skos:facetMember properties in SKOS-Core 1.0 have been dropped for now” (Miles, Facets in SKOS-Core 1.0, 2004). At the core of the disagreement were two definitions of facet in the SKOS documentation:

1. “A concept is a member of a facet” (SKOS website, 2004)
2. “Facets provide a means of organising concepts along orthogonal dimensions. A facet is treated as a concept. A facet may have member concepts. A concept may be a member of only one facet.” (SKOS-Core RDF Schema, 2004).

While the discussion finally came to a consensus that facet represented “mutually exclusive groupings with internal hierarchies of concepts” (Tudhope, personal correspondence, 2003), debate continued on other issues:



1. What is the difference between a fundamental facet, and a characteristic of division or array?
2. What is the difference between facets and member concepts?
3. Should OWL (Web Ontology Language) instead of SKOS be considered as the more appropriate vehicle to facilitate advanced reasoning with faceted schemes:
   – As a way to specify different sets of fundamental categories.
   – As a way to bootstrap fundamental categories as a bridge between different SKOS.
   – As a way to specify rules for synthesis of facet concepts.
   – As a way to better relate facets to a higher level ontology (Tudhope and Binding, personal correspondence, 2004)
4. “[I]s a 'facet' best modelled as a type of concept or as a separate entity, when considering future development of OWL?” (Tudhope, personal correspondence, 2004).

The most recent SKOS primer (2009) indicates that “Concepts (not facets) are the central modeling primitive of SKOS” (SKOS primer, website). Because SKOS maintains adherence to the most current ISO standards for monolingual thesauri (ISO-2788:1986) and multilingual thesauri (ISO-5964:1985), terms in ISO standards correspond to labels of SKOS concepts. This advice is offered to those who want to use SKOS for faceted thesauri:

SKOS allows the representation of groupings of concepts. But it focuses on the conceptual level, and no construct is given that biases towards a specific display strategy. As a result, collections in SKOS are not explicitly related to one "parent" concept. This link must be (re-)created via a specific display algorithm, or by using an ad-hoc extension (SKOS primer, 2009).

This means that SKOS implementations of faceted classifications or thesauri must rely on an external device that is not currently part of the standard in order to model facet relationships.

One exemplar approach to encoding faceted vocabulary in SKOS is Tudhope’s STAR project (Hypermedia Research Unit, website), which uses SKOS representations of English Heritage vocabularies to enable search query expansion. Another complementary proposal to accommodate faceted subject vocabulary in SKOS demonstrates the use of an extended RDF schema as a core feature of a concept annotation system that utilizes an extended SKOS/RDF version of POPSI (POstulate based Permuted Subject Index) (Prasad and Guha, 2008).
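The primer's remark that the link between a collection and its "parent" concept must be re-created by a display algorithm can be made concrete with a small Python sketch. It is illustrative only and rests on assumptions: the triples are held as plain tuples rather than in an RDF store, the ex: resources and their labels are sample data, and the rule used to infer the parent (the broader concept shared by all members of the collection) is one possible display strategy, not one prescribed by SKOS. The property names skos:Collection, skos:member, skos:broader and skos:prefLabel are taken from the SKOS vocabulary.

```python
# Sample data as (subject, predicate, object) tuples; the ex: names are
# assumptions made for this sketch.
TRIPLES = [
    ("ex:milk", "skos:prefLabel", "milk"),
    ("ex:cowMilk", "skos:prefLabel", "cow milk"),
    ("ex:goatMilk", "skos:prefLabel", "goat milk"),
    ("ex:cowMilk", "skos:broader", "ex:milk"),
    ("ex:goatMilk", "skos:broader", "ex:milk"),
    ("ex:milkBySource", "rdf:type", "skos:Collection"),
    ("ex:milkBySource", "skos:prefLabel", "milk by source animal"),
    ("ex:milkBySource", "skos:member", "ex:cowMilk"),
    ("ex:milkBySource", "skos:member", "ex:goatMilk"),
]


def objects(triples, subject, predicate):
    """Return every object of the given subject and predicate, in order."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def infer_parents(triples, collection):
    """Re-create the missing link: broader concepts shared by all members."""
    members = objects(triples, collection, "skos:member")
    broader_sets = [set(objects(triples, m, "skos:broader")) for m in members]
    return set.intersection(*broader_sets) if broader_sets else set()


def render(triples, collection):
    """Lay out parent, node label and members as an indented display."""
    def label(resource):
        return objects(triples, resource, "skos:prefLabel")[0]

    lines = []
    for parent in sorted(infer_parents(triples, collection)):
        lines.append(label(parent))
        lines.append("  <" + label(collection) + ">")
        for member in objects(triples, collection, "skos:member"):
            lines.append("    " + label(member))
    return lines


print("\n".join(render(TRIPLES, "ex:milkBySource")))
```

The inference lives entirely in the display code, which is the point of the primer's advice: nothing in the data itself states that the collection sits under a particular concept.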



Facets in Topic Maps

Topic Maps (ISO/IEC 13250, 2003) were originally designed to “handle the construction of indexes, glossaries, thesauri and tables of contents” (Park, 2003, p. 8). The initial data model for Topic Maps had two constructs: topics, which have three characteristics (names, associations and occurrences), and relationships between topics. Facets were initially introduced as a construct to allow filtering of topics by domain, language, security and version. Facets were later subsumed by the construct scope, a unique way for topic maps to “incorporate diverse world views, and diverse languages, without loss of usefulness to specific users in specific contexts” (Park, 2003, p. 38). A scope is a means of filtering information that is based on the properties of the topics in a given map. The most common use of scoping is when topics have names in different languages. In this case, scoping enables names to be converted from a variety of languages to the language of the searcher. The following examples are drawn from the standard specification:

‘Suomi’ is the name of the country Finland in Finnish. This corresponds to assigning the topic name ‘Suomi’ to a topic representing Finland, and scoping it with a topic representing Finnish. According to expert X, the Tibetan script is an instance of the script type ‘abugida’ whereas according to expert Y it is an ‘alphasyllabary’. This corresponds to having two "type-instance" associations, each scoped with a topic representing the relevant authority (ISO/IEC 13250, Section 5.3.3).

At least one facet theorist, Sigel, views scope as strongly related to facet because it can index resources according to different viewpoints, and supports “filtered and adapted result sets according to user profiles” (Sigel, 2003, p. 427). In 2006, the W3C began soliciting proposals to explore standardized guidelines to ensure the interoperability of RDF and Topic Maps. With the publication of an XML Syntax standard for Topic Maps in 2007 (ISO/IEC 13250, 2007), it is possible that the Topic Maps standard is poised for support by both ISO and the W3C (W3C, RDF/Topic Maps proposals, 2006).

This discussion concludes the section on facets in use in semantic standards. Understandings are variable and, as we have seen, facet theory has made some inroads. Conflicting definitions seem to have derailed further progress, though several facet theorists, Tudhope and Prasad among them, are hard at work to redress these issues. In the next section, this analysis will continue with an examination of a few representative Semantic Web applications.
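The scoping mechanism described in this section, using the ‘Suomi’ and Tibetan script examples from the specification, can be sketched informally in Python. This is an assumption-laden illustration rather than an implementation of the Topic Maps data model: the class and function names are hypothetical, the French name is an added sample value, and the matching rule (a statement applies when all of its scoping topics are in the user's context, and an empty scope always applies) is a simplification.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ScopedName:
    value: str
    scope: frozenset  # topics in whose context this name is valid


@dataclass
class Topic:
    id: str
    names: list = field(default_factory=list)


def name_for(topic, context):
    """Prefer a name whose scope fits the context, else an unscoped name."""
    context = frozenset(context)
    scoped = [n for n in topic.names if n.scope and n.scope <= context]
    if scoped:
        return scoped[0].value
    unscoped = [n for n in topic.names if not n.scope]
    return unscoped[0].value if unscoped else None


finland = Topic("finland", [
    ScopedName("Finland", frozenset()),             # unconstrained scope
    ScopedName("Suomi", frozenset({"finnish"})),    # example from the standard
    ScopedName("Finlande", frozenset({"french"})),  # added sample value
])

# Type-instance associations, each scoped with the relevant authority.
ASSOCIATIONS = [
    ("tibetan-script", "abugida", frozenset({"expert-x"})),
    ("tibetan-script", "alphasyllabary", frozenset({"expert-y"})),
]


def types_of(instance, context):
    """Return the types asserted for an instance within the given context."""
    context = frozenset(context)
    return [t for i, t, scope in ASSOCIATIONS if i == instance and scope <= context]


print(name_for(finland, {"finnish"}))
print(types_of("tibetan-script", {"expert-y"}))
```

Seen this way, scope acts as a filter over statements in the map, which is the property that leads Sigel to relate it to facets.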



Semantic Web applications with facets

Evaluating applications on the Semantic Web is notoriously difficult according to a number of researchers (Ossenbruggen, Amin and Hildebrand, 2008; Clarkson, Navathe and Foley, 2009). Two evaluative frameworks were considered as possible approaches for the current paper. Ossenbruggen et al. began their examination with a search and annotation engine for cultural heritage implementations. Evaluation often relies on assessments of quality. After a review of similar implementations, the authors proposed that quality could be measured on three axes: 1. underlying dataset, 2. search and inference software and 3. user interface. A similar evaluative review of faceted browsers, some with Semantic Web functionality, proposed three dimensions along which to evaluate quality and functionality: 1. visual design, 2. interaction design and 3. structural design (Clarkson, Navathe and Foley, 2009). For the purpose of analysis here, these dimensions did not suffice to assess the extent to which the three representative implementations reflect awareness of traditional facet theory. Nevertheless, these dimensions are offered as a potential framework for future analysis.

Because Semantic Web applications that utilize ‘facets’ often exist in a proof of concept state, with little accompanying documentation, it can be difficult to discern how facets are generated or the mechanics of implementation. Here, three representative applications are examined. Two are Semantic Web implementations: the Humboldt faceted browser and Facetedpedia. These exemplars were selected from a steadily growing universe of ‘faceted’ Semantic Web applications because they represent state of the art research by projects outside the field of LIS, and offer ample public documentation of the underlying rationale for the algorithms and operational definitions that make facets manifest, despite the fact that some of the interfaces can only be viewed in the context of the printed page. To place these in context, a suite of open source applications that was not designed to rely on Semantic Web datasets or technologies will also be discussed, because an overwhelming majority of ‘faceted’ Semantic Web applications reference publications from this project. The following analysis begins with this project.



Bailando’s FLAMENCO interface and Castanet algorithm

Developed over the past ten years as part of Marti Hearst’s Bailando project at the University of California School of Information, FLAMENCO (Flexible information Access using MEtadata) provides a search interface that uses “hierarchical faceted metadata.” Castanet, a companion to FLAMENCO, is an algorithm for semi-automatic generation of hierarchical faceted metadata from a monolingual textual corpus (Flamenco project, website). Many Semantic Web applications that incorporate ‘facets’ draw inspiration and operational definitions from publications associated with this project. Developers often directly cite Hearst’s definition of facets as “orthogonal sets of categories which together may be used to describe a topic” (Hearst, 2000). Another more recent and heavily cited example defines facets as “categories used to characterize information items in a collection” (Hearst, 2006, p. 1). Over time Hearst’s references to facet and her description of faceted navigation have remained consistent:

The main idea is to build a set of category hierarchies, each of which corresponds to a different facet (dimension or feature type) that is relevant to the collection to be navigated. Each facet has a hierarchy of terms associated with it. After the facet hierarchies are designed, each item in the collection can be assigned any number of labels from the facet hierarchies. The resulting interface is known as faceted navigation (Hearst, 2009, p. 189).

Hearst (2009) recently provided a few clues into the source of her own understanding of the meaning of facet:

The term faceted was chosen by this project [Flamenco] to reflect the underlying spirit of the idea from library science. Ranganathan (1933) is often credited with introducing the idea with his Colon Classification system, which suggested describing information items by multiple classes, and Bates (1988) advocated for faceted library catalog representations in the 1980s (Hearst, 2009, p. 189-190).

In this project, facets are produced by the Castanet algorithm, which was created to reduce the amount of time needed to produce facets from a set of texts, or object descriptions. Castanet operates by generating hierarchical faceted metadata through the use of a lexical database, in this case WordNet, a browsable “network of conceptual semantic and lexical relations” (WordNet website). Though WordNet captures a variety of lexical relations, the Castanet algorithm only uses hypernymy (IS-A) relations (Stoica, Hearst and Richardson, 2007, p. 245). Castanet evaluates textual descriptions for frequently occurring nouns (or targets), references WordNet for appropriate synsets that match the target terms, and ensures term disambiguation. Facet selection proceeds as follows: “the goal is to create a moderate set of facets, each of which has
moderate depth and breadth at each level, in order to enhance the navigability of the categories. Pruning the top levels can be automated, but a manual editing pass over the outcome will produce the best results” (Stoica, Hearst and Richardson, 2007, p. 246). Corpus documents are assigned to facet hierarchies by referencing the same textual descriptions that were examined as part of the initial process. Figure 1 is an example of a Flamenco interface for Nobel Prize Winners. The facets [gender, affiliation, prize and year] were drawn from the textual corpus of individual descriptions by the Castanet algorithm, and then further refined by project researchers.

Figure 1. Flamenco display of Nobel Prize winners

Hearst’s work with hierarchical faceted metadata and faceted navigation brought facets to the attention of a generation of Web developers, and may be the spark that reignited an explosive decade of experimentation with facets as information retrieval devices. Castanet’s dependence on a textual corpus for semi-automated generation of facets imitates the practice of traditional facet analysis, which has long relied on literary warrant, by requiring the assembly of a body of material that reflects current domain language to analyze for facet candidates.
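The general shape of corpus-driven facet generation described above (select frequent target terms, look up their hypernym chains, merge the chains into hierarchies, then prune) can be sketched in Python. This is a deliberately simplified illustration and not the Castanet implementation: a small hand-written table (HYPERNYMS) stands in for WordNet, membership in that table stands in for noun detection, disambiguation is omitted, and the pruning rule (skip any node below a facet root that has a single child) is only one plausible heuristic assumed here.

```python
from collections import Counter

# Toy stand-in for WordNet hypernym (IS-A) chains, most general term first.
# The entries are assumptions made for this sketch.
HYPERNYMS = {
    "physicist": ["person", "scientist", "physicist"],
    "chemist": ["person", "scientist", "chemist"],
    "novelist": ["person", "writer", "novelist"],
    "university": ["organization", "institution", "university"],
}


def select_targets(descriptions, min_count=2):
    """Keep known terms that occur at least min_count times in the corpus."""
    counts = Counter(
        word
        for text in descriptions
        for word in text.lower().split()
        if word in HYPERNYMS
    )
    return [word for word, count in counts.items() if count >= min_count]


def build_tree(targets):
    """Merge the hypernym chains of all targets into one nested dictionary."""
    tree = {}
    for target in targets:
        node = tree
        for term in HYPERNYMS[target]:
            node = node.setdefault(term, {})
    return tree


def compress(children):
    """Replace any node that has exactly one child by that child."""
    pruned = {}
    for term, sub in children.items():
        sub = compress(sub)
        if len(sub) == 1:
            pruned.update(sub)  # skip the intermediate level
        else:
            pruned[term] = sub
    return pruned


def candidate_facets(descriptions, min_count=2):
    """Return one candidate hierarchy per top-level term, pruned below the root."""
    tree = build_tree(select_targets(descriptions, min_count))
    return {root: compress(sub) for root, sub in tree.items()}


corpus = [
    "physicist at a university",
    "chemist and physicist",
    "chemist at a university",
    "novelist",
    "novelist and chemist",
]
print(candidate_facets(corpus))
```

Even in this reduced form, the output depends entirely on what the lexical table contains and on the corpus supplied, which is why a manual editing pass and a representative body of material remain important.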



Adhering to the spirit of facet analysis, Hearst foregrounds the user by paying close attention to information seeking experiences. This project provides a lodestone for those seeking data about user interactions with faceted systems. However, entirely missing from the documentation for FLAMENCO or Castanet is a recognition of the facet theoretical dictate to assess user interests prior to assembling a representative set of documents. Also missing is an operationalized set of fundamental categories to guide Castanet in the automated selection of facets from the corpus, or the humans who refine the results (Ranganathan, 1933, 1957; Vickery, 1960, 2008). Close inspection of the Castanet algorithm may yield a set of principles similar in nature to those stated by Vickery, but nowhere are these made explicit:

[t]he essence of facet analysis is the sorting of terms in a given field of knowledge into homogeneous, mutually exclusive facets, each derived from the parent universe by a single characteristic of division. We may look upon these facets as groups of terms derived by taking each term and defining it, per genus et differentiam, with respect to its parent class (Vickery, 1960, p. 12).

Hearst’s definition of facets as ‘orthogonal sets of categories’ seems to be a nod in the general direction of facet theory, but the precise meaning of orthogonal is nowhere defined in the FLAMENCO documentation. In a loose sense of the word, orthogonal may be taken to mean mutually independent, non-redundant or irrelevant. WordNet lists three semantically related terms: irrelevant, statistically unrelated or rectangular, and lists unrelated – ‘lacking in logical or causal relations’ – as the next most similar term (WordNet, website). This lack of definition is a critical omission.

The preceding example points to another area that needs closer attention: the reliance of Castanet on WordNet, or another similar lexical database, as part of the semi-automatic facet generation process. In recognition of facet theory, care must be taken that the domain and the domain language of interest are adequately represented by the lexical database; otherwise it may be possible for the algorithm to miss important or emergent topics. One facet theorist, Perez-Carballo, sought to address these limitations with the FFID (Fast Facet ID) clustering algorithm, an application of semi-automated facet generation that is well grounded in facet theory (Perez-Carballo, 2009).

Humboldt faceted browser

This faceted application is typical of the type of browser projects that use facets – as defined by Hearst – to “partition the information space using orthogonal conceptual dimensions of the data” (Yee, et al., 2003). Because Humboldt is designed to query RDF data, it permits “multiple query-responsive representations of data according to different dimensions in the data or in the
query” (Kobilarov and Dickinson, 2008). Facets drawn from the underlying dataset appear alongside the result set as a way to enable searchers to continue browsing. This operation, termed pivot by the developers, treats facet values as result sets, and dynamically builds a faceted view around whichever facet the user selects from the initial result set, and each subsequent time the user selects a facet. Each new view has a set of facets that dynamically reflect the new results. User queries build implicitly as each successive result set is browsed, and thus the “implicit construction of queries” is supported (Kobilarov and Dickinson, 2008). In Figure 2, two screens appear.

Figure 2. Faceted filtering in Humboldt browser example

The screen on the left is the initial result set [Films]. The display offers the facets [actors], [directors] and [companies] to users as a way to further refine the query. The right hand side of Figure 2 is the result of a user selecting the [actors] facet from the first result set. This new result set offers two more facet refinements, [films] and [cities], presenting users with new ways to drill down even further into the dataset. For example, if a user were to select [films], a new set of facets would be generated from the new result set and might include: [directed in], [has produced], [has written], [has edited], [won award]. The facets are not preset, which means that if there are no results to fill a given facet, it will not
appear, thereby preventing a null set from being offered to the user. Here it is important to note that each time a user filters by a given facet value, “the interface uses the filter result to build the next view” (Kobilarov and Dickinson, 2008). Though building on the framework of Hearst’s research, Humboldt facets have a simple genesis: they are drawn from the underlying dataset by an algorithm that identifies the rdf:type of the resources in the result list. Humboldt also provides a linear display of the user’s browsing history in order to allow the user to quickly reorient or to easily backtrack. This browser is a classic example of using faceted browsing to support exploratory search tasks, and it reflects an understanding of facets that springs from Hearst’s framework. As such it suffers from the same limitations, and is subject to the same cautionary notes. Heavy reliance on rdf:types makes this prototype a further step or two removed from user interests and domain language, and moves it further away from traditional faceted principles.
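The pivot behaviour described above can be illustrated with a short Python sketch over a handful of toy triples. It is a hedged approximation built only from the description given here, not Humboldt's code: the resources, the property names (starring, directedBy, bornIn) and the function names are hypothetical, and facets are formed simply by grouping the resources linked to the current result set according to their rdf:type.

```python
from collections import defaultdict

# Toy RDF-like data as (subject, predicate, object) tuples; all names
# other than rdf:type are assumptions made for this sketch.
TRIPLES = [
    ("film:1", "rdf:type", "Film"),
    ("film:2", "rdf:type", "Film"),
    ("actor:a", "rdf:type", "Actor"),
    ("actor:b", "rdf:type", "Actor"),
    ("director:d", "rdf:type", "Director"),
    ("city:c", "rdf:type", "City"),
    ("film:1", "starring", "actor:a"),
    ("film:2", "starring", "actor:a"),
    ("film:2", "starring", "actor:b"),
    ("film:1", "directedBy", "director:d"),
    ("actor:a", "bornIn", "city:c"),
]


def rdf_type(resource):
    """Return the declared type of a resource, or None if it has none."""
    for s, p, o in TRIPLES:
        if s == resource and p == "rdf:type":
            return o
    return None


def facets(results):
    """Group resources linked to the result set by their rdf:type.

    Only types with at least one linked resource are returned, so an
    empty facet is never offered.
    """
    found = defaultdict(set)
    for s, p, o in TRIPLES:
        if p == "rdf:type":
            continue
        if s in results:
            found[rdf_type(o)].add(o)
        if o in results:
            found[rdf_type(s)].add(s)
    return dict(found)


def pivot(results, facet):
    """Treat the values of the chosen facet as the next result set."""
    return facets(results)[facet]


films = {"film:1", "film:2"}
print(sorted(facets(films)))    # facets offered for the films
actors = pivot(films, "Actor")  # the user selects the actors facet
print(sorted(facets(actors)))   # a new view is built around the actors
```

Each call to pivot narrows or redirects the view without the user ever typing a query, which is what the developers mean by the implicit construction of queries. The sketch also makes the limitation visible: the facets offered are whatever types happen to be present in the data.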

Facetedpedia

Li and colleagues proposed the development of Facetedpedia, a faceted information retrieval system capable of generating sets of query-dependent facets for Wikipedia articles. Marti Hearst’s (2006) work served as the basis of their definition of faceted interfaces: “A faceted interface for a set of objects is a set of category hierarchies, where each hierarchy corresponds to an individual facet (dimension, attribute, property) of the objects” (Li, et al., 2010, p. 651). The authors are well aware of Wikipedia users’ behavior and find that visitors most often want to explore a set of topics, but will avoid searching for an article on a particular topic because of typically lengthy and cluttered result sets. Thus, this project proposes a faceted retrieval system for Wikipedia articles as a way to provide more robust searching and retrieval. The authors note that while most faceted retrieval systems use a static set of facets generated from an unchanging set of material, the facets in Facetedpedia would be automatically generated based on the initial query, and drawn from the dataset as it exists at the time of the query. Because of this, no existing faceted interfaces “could be effectively applied in place of Facetedpedia, because none is fully automatic in both facet identification and hierarchy construction” (Li, et al., 2010, p. 653). This approach extends traditional facet analysis from a text-based world into a digital world of constantly updated, updating and updateable material. The authors note that two questions must be addressed: (1) what are the facets of a Wikipedia article, and (2) where does the category hierarchy of a facet come from? A second challenge is that the system needs to be capable of
measuring facet “goodness” in order to ensure usefulness (Li, et al., 2010, p. 652). The authors acknowledge related projects such as: CompleteSearch, which supports refinement of queries in Wikipedia via three dimensions: query completions matching the query terms; category names matching the query terms; and categories of result articles attributes; and DBPedia, a faceted Wikipedia search interface, which uses facets that “appear to be query-independently extracted from common Wikipedia info box attributes, although the underlying method remains to be proprietary at this moment” (Li, et al., 2010, p. 653). According to the researchers, the Wikipedia category system is a user-generated vocabulary that should be considered an essential element in the development of faceted browsing options. They further explain: “collaborative vocabulary represents the collective intelligence of many users and rich semantic information, and thus constitutes the promising basis for faceted interfaces.” This project faithfully adheres to facet theory in that the domain vocabulary contained in the user-generated category structure is leveraged, thereby foregrounding user interests and maintaining the use of familiar domain language. In addition to use of the user-generated category system, Wikipedia articles hyperlinked from a search result article are also exploited to enrich article attributes with qualifiers such as significance or relevance. This proposal relies heavily on set theory, and is well worth further study. Of additional interest is the fact that the article makes numerous comparisons to the CASTANET algorithm, which the authors treat as the approach most readily comparable to Facetedpedia. An added dimension for comparison is the fact that, in contrast to CASTANET, which works on a static document set, Facetedpedia is designed to work with a dynamically growing set.

Concluding remarks

From the outset, this paper has wrestled with a series of questions:
– Where do we come from?
– Where are we currently?
– Where are we going?
– Do we in fact need a much richer representation of semantic relationships, such as some ontologies are now trying to achieve?

This theoretic-historical response has considered a terrain of contested definitions and understandings of facets, by resorting to a brief exploration of canonical facet literature (Ranganathan, 1933, 1957, 1967; Vickery, 1960, 1966).



Facet theory has been the object of periodic experimentation over the last fifty years and has long been considered a useful way to assist in the creation of controlled vocabularies and Faceted Classifications. Today, the Semantic Web reflects a welter of splintered understandings of facets that have resulted in a variety of incompletely realized instantiations. Not all efforts at faceting are well considered, and many evince little to no understanding or awareness of traditional facet theory. It seems certain that experimentation with facet theory will continue, whether for building of ontologies, enhancing search and browsing, or as part of Semantic Web standards, especially with regard to vocabularies and ontologies. Much remains to be done, including regularization of the vocabulary of facet theory, and preliminary work to create a set of functional requirements that would make facet theory more accessible to developers. Together, workable and agreed-upon operational definitions and functional requirements would make inclusion of facet theory possible in Semantic Web standards and further current work in automating ontology generation from existing vocabularies. It would also be useful to conduct rigorous analysis of facet use in the context of Semantic Web applications in order to identify exemplars or successful adaptations and extensions. Along the way renewed attention might be directed to seemingly forgotten or little known work, such as the view-based systems of Pollitt (1997), and work in the Indian tradition such as Bhattacharyya’s (1979) POPSI (POstulate based Permuted Subject Index) system and Classaurus (1982), especially in the context of semantic annotation and ontology building. Tudhope and others are answering Vickery’s question with another: Is it possible for an ontology to provide rich semantic relationships through more robust use of facet theory? 
Each of the suggestions in this paper represents a step toward building a more robust understanding of the implications and potential power of facet theory on the Semantic Web and calls upon facet theorists to work towards this internationally focused agenda.

Bibliography

Aitchison J., UNESCO thesaurus: a structured list of descriptors for indexing and retrieving literature in the fields of education, science, social science, culture and communication, Paris, Unesco, 1977.
Aitchison J., Clarke S.D., “The thesaurus: A historical viewpoint with a look to the future”, Cataloging & Classification Quarterly, vol. 37, n° 3, 2006, p. 5-21.
Aitchison J., Gomersall A., Ireland R., “Thesaurofacet: A Thesaurus and Faceted Classification for Engineering and Related Subjects”, Whetstone, En. Elec. Co., 1969.



Anderson J., Ranie L., “The fate of the Semantic Web”, Imagining the Internet Project, Pew Internet & American Life, http://pewinternet.org/Reports/2010/Semantic-Web, 4 May 2010.
Bates M., “How to use controlled vocabularies more effectively in online searching”, Online Review, vol. 12, n° 6, 1988, p. 45-56.
Bhattacharyya G., “POPSI: Its fundamentals and procedure based on a general theory of subject indexing languages”, Library Science with a slant to Documentation, vol. 16, n° 1, 1979, p. 1-34.
Bhattacharyya G., “Classaurus: Its fundamentals, design and use”, in Universal classification, subject analysis and ordering systems, Proceedings of the 4th International Study Conference on Classification Research, Frankfurt, Indeks Verlag, 1982, p. 139-152.
Bliss H. E., The organization of knowledge and the system of the sciences, New York, Henry Holt, 1929.
British Standards Institute (BSI), Structured vocabularies for information retrieval, Part 2: Thesauri, London, BSI, 2005.
Broughton V., “The need for faceted classification as the basis of all methods of information retrieval”, Aslib Proceedings, vol. 58, n° 3, 2006, p. 49-72.
Broughton V., Slavic A., Facet analytical theory in managing knowledge structures for the humanities (FATKS), 2004, http://www.ucl.ac.uk/fatks/fat.htm.
Brownson H., “Research on handling scientific information”, Science, vol. 132, n° 3444, 1960, p. 1922-1931.
Clarkson E.C., Navathe S.B., Foley J.D., “Generalized formal models for faceted user interfaces”, Proc. of the 9th ACM/IEEE-CS Joint Conf. on Digital Lib., 2009, p. 125-134.
Classification Research Group, “The need for faceted classification as the basis of all methods of information retrieval”, Proceedings of the International Study Conference on Classification for Information Retrieval, London, Aslib, 1957, p. 137-147.
DBpedia website, available: http://dbpedia.org.
de Grolier E., A study of general categories applicable to classification and coding in documentation, Translation by the National Science Foundation, Deventer Holland, Ysel Press/UNESCO, 1962, p. 15.
Doctorow C., Metacrap: Putting the Torch to Seven Straw-men of the Meta-utopia, Version 1.3, 26 August 2001, http://www.well.com/~doctorow/metacrap.htm.
Flamenco search interface project website: http://flamenco.berkeley.edu/index.html.
Foskett D. J., Subject Approach to Information, 3rd ed., London, Clive Bingley, 1977.



Fuentes-Lorenzo D., Morato J., Gomez J.M., “Knowledge management in biomedical libraries: A semantic web approach”, Information Systems Frontiers, vol. 11, n° 4, 2009, p. 471-480.
Hearst M.A., “Next generation Web search: Setting our sites”, IEEE Data Engineering Bulletin, special issue on Next Generation Web search, 2000, p. 1-11.
Hearst M.A., “Design Recommendations for Hierarchical Faceted Search Interfaces”, ACM SIGIR Workshop on Faceted Search, 2006, available: http://flamenco.berkeley.edu/papers/faceted-workshop06.pdf.
Hearst M.A., Search user interfaces, Cambridge, Cambridge University Press, 2009.
Hendler J., “Dark side of the semantic web”, Intelligent Systems, vol. 22, n° 1, 2007, p. 24.
Hypermedia Research Unit website, STAR project publications, available: http://hypermedia.research.glam.ac.uk/kos/star/publications.
ISO 2788:1986 Guidelines for the establishment and development of monolingual thesauri, Geneva, Switzerland, 1986, and ISO 5964:1985 Guidelines for the establishment and development of multilingual thesauri, Geneva, Switzerland, 1986.
ISO/DIS 25964-1, Thesauri and interoperability with other vocabularies, Part 1: Draft for comment: Thesauri for information retrieval. British Standards Institution, London, BSI, 2010.
ISO/IEC 13250:2003, Topic Maps 2nd ed., available: http://www1.y12.doe.gov/capabilities/sgml/sc34/document/0322.htm.
ISO/IEC 13250:2007, Topic Maps: XML Syntax, available: http://www1.y12.doe.gov/capabilities/sgml/sc34/document/0322.htm.
Kashyap M., “Similarity Between the Ranganathan’s Postulates and Chen’s Entity Relationship Approach to Data Modelling and Analysis”, DESIDOC Bulletin of Information Technology, vol. 21, n° 3, 2001, p. 3-16.
Kobilarov G., Dickinson I., Humboldt: Exploring Linked Data, Hewlett-Packard Labs Technical report, 2008, HPL-2008-23, available: http://www.hpl.hp.com/techreports/2008/HPL-2008-23.html.
La Barre K., The Use of Faceted Analytico-Synthetic Theory in the Practice of Website Construction and Design, Doctoral thesis, Indiana University, 2006.
La Barre K., “Facet analysis”, Annual review of Information Science and Technology, vol. 44, 2010, Medford NJ, Information Today, p. 243-284.
Li et al., Facetedpedia: Dynamic Generation of Query-Dependent Faceted Interfaces for Wikipedia, in Proceedings of the 19th international conference on World Wide Web, 2010, p. 651-660, http://doi.acm.org/10.1145/1772690.1772757.



Maniez J., « Des classifications aux thésaurus : du bon usage des facettes », Documentaliste Sciences de l’information, 1999, vol. 36, n° 4-5.

Markoff J., “Entrepreneurs See a Web Guided by Common Sense”, New York Times, http://www.nytimes.com/2006/11/12/business/12web.html?_r=1, 12 November 2006.

Marshall C., Shipman F., “Which Semantic Web?”, Proceedings of the fourteenth ACM conference on Hypertext and hypermedia, New York, ACM, 2003, p. 57-66.

Miles A.J., Facets in SKOS-Core 1.0, email correspondence to org.w3.public-esw-thes, 16 March, 2004, available: http://markmail.org/message/5p7ojoriglxovycu.

Murry P., Faceted Classification List, “Viewpoint (scope of FC list)”, message 97, 10 January 2003, available: http://www.oclc.org/research/activities/fast/default.htm.

NISO Z39.19-2005, Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies, 2005, http://www.niso.org/kst/reports/standards/.

OCLC FAST website, Faceted Application of Subject Terminology, http://www.oclc.org/research/activities/fast/default.htm.

Ossenbruggen van J.R., Amin A.K., Hildebrand M., “Why evaluating Semantic Web applications is difficult”, Proceedings of the CHI Semantic Web User Interface Workshop 2008, p. 1-4, http://repository.cwi.nl/search/fullrecord.php?publnr=12425.

Otlet P., Traité de documentation, Bruxelles, Editions Mundaneum, 1934.

Park J., “XML Topic Maps: Creating and using Topic Maps for the Web”, Boston, Addison, 2003.

Perez-Carballo J., “Semi-automated identification of categories from large text corpora”, Academy of Information and Management Sciences Journal, vol. 12, n° 1, 2009, http://www.thefreelibrary.com/Academy+of+Information+and+Management+Sciences+Journal/2009/January/1-p52151, retrieved 10 July 2010.

Petersen T., Art and Architecture Thesaurus, Oxford, Oxford University Press, 1990.

Pollitt A.S., “The Key Role of Classification and Indexing in View-based Searching”, Presented at the 63rd IFLA General Conference, 1997, http://www.ifla.org/IV/ifla63/63polst.pdf.

Prasad A.R.D., Guha N., “Concept naming vs concept categorization”, Online Information Review, vol. 32, n° 4, 2008, p. 500-515.

Priss U., “Facet-like structures in Comp. Science”, Axiomathes, vol. 18, 2008, p. 243-255.

Ranganathan S. R., Colon Classification, Madras, Madras Library Association, 1933.

Ranganathan S. R., Prolegomena to library science, New York, Asia Publishing, 1937 and 1967.



Ranganathan S. R., “Library Classification, a discipline”, Proc. of the Int. Conference on Classification for Information Retrieval, Dorking, England, London, Aslib, 1957, p. 3-14.

Ray E., Learning XML, 2nd ed., Sebastopol, Calif., O’Reilly, 2003.

Shirky C., The Semantic Web, Syllogism, and Worldview, [First published on the Networks, Economics, and Culture mailing list.], 2003, http://www.shirky.com/writings/semantic_syllogism.html.

Sigel A., “KO as a use case for TMs”, XML Topic Maps: Creating and using topic maps for the Web, 2003.

SKOS (Simple Knowledge Organization System), http://www.w3.org/2004/02/skos/, 2004.

SKOS-Core RDF Schema, 2004, available: http://www.w3.org/2004/02/skos/core.

SKOS primer, 2009, available: http://www.w3.org/TR/skos-primer.

Slavic A., “Faceted Classification: Management and Use”, Axiomathes, vol. 18, n° 2, 2008, p. 257-271.

Spiteri L., “A simplified model for facet analysis”, Canadian Journal of Information and Library Science, vol. 23, 1998, p. 1-30.

Soergel D., Organizing Information: Principles of Database and Retrieval Systems, San Diego, Calif., Academic Press, 1985.

Soergel D., Lauser B., Liang A., Fisseha F., Keizer J., Katz S., “Re-engineering thesauri for new applications”, Journal of Digital Information, vol. 4, n° 4, 2004, available: http://journals.tdl.org/jodi/article/viewarticle/112/111.

Stoica E., Hearst M.A., Richardson M., “Automating Creation of Hierarchical Faceted Metadata Structures”, Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference, Association for Computational Linguistics, 2007, p. 244-251, http://flamenco.berkeley.edu/papers/castanet.pdf.

Sowa J., NCITS T2 Com listserv posting on Information Interchange and Interpretation, available: http://www.jfsowa.com/ontology/gloss.htm.

Tudhope D., personal correspondence, public-esw-thes, message 0051, 2004, available: http://lists.w3.org/Archives/Public/public-esw-thes/2004Feb/0051.html.

Tudhope D., personal correspondence, public-esw-thes, message 0063, 2004, available: http://lists.w3.org/Archives/Public/public-esw-thes/2004Feb/0063.html.

Tudhope D., Binding C., personal correspondence, public-esw-thes, message 0046, 2004, available: http://lists.w3.org/Archives/Public/public-esw-thes/2004Feb/0046.html.

Tudhope D., Binding C., “Faceted thesauri”, Axiomathes, vol. 18, n° 2, 2008, p. 211-212.

Van Dijck P., eXchangeable Faceted Metadata Language website, available: http://petervandijck.com/xfml/.



Vickery B., Faceted classification: A guide to construction and use of special schemes, London, Aslib, 1960.

Vickery B., “Faceted classification schemes”, Rutgers Series on Systems for the Intellectual Organization of Information, vol. 5, New Brunswick, NJ, Graduate School of Library Science at Rutgers University, 1966.

Vickery B. C., “Faceted Classification for the Web”, Axiomathes, vol. 18, n° 2, 2008, p. 145-160.

W3C (World Wide Web Consortium) website, About the W3C, available: http://www.w3.org/Consortium/.

W3C website, RDF/Topic Maps interoperability proposals, 2006, available: http://www.w3.org/TR/rdftm-survey/.

W3C website, Semantic Web, available: http://www.w3c.org/standards/semanticweb/.

W3C website, XML is Ten!, http://www.w3.org/2008/xml10/xml10-pressrelease.

Wille R., “Concept lattices and conceptual knowledge systems”, Computers and mathematics with applications, vol. 23, n° 6-9, 1992, p. 493-515.

WordNet website, available: http://wordnet.princeton.edu/.

XML Schema Part 2: Datatypes 2nd ed., 2004, Section 2.1 Datatypes, Section 2.4: Facets, available: http://www.w3.org/TR/xmlschema-2/.

Yee K.P., Swearingen K., Li K., Hearst M.A., “Faceted metadata for image search and browsing”, In CHI ’03: Proceedings of the SIGCHI conference on Human factors in computing systems, New York, ACM, 2003, p. 401-408.

