
METADATA MODELS Human-enhanced time-aware multimedia search

CUBRIK Project IST-287704 Deliverable D2.1 WP2

Deliverable V1.0 – 31 March 2012 Document. ref.: cubrik.D2.1.POLMI.WP2.V1.0


Programme Name: IST
Project Number: 287704
Project Title: CUbRIK
Partners: Coordinator: ENG (IT); Contractors: UNITN, TUD, QMUL, LUH, POLMI, CERTH, NXT, MICT, ATN, FRH, INN, HOM, CVCE, EIPCM
Document Number: cubrik.D21.POLMI.WP2.V.1.0.doc
Work-Package: WP2
Deliverable Type: Report
Contractual Date of Delivery: 31 March 2012
Actual Date of Delivery: 31 March 2012
Title of Document: D2.1 Metadata Models
Author(s): Alessandro Bozzon (POLMI), Martha Larson (TUD), Piero Fraternali (POLMI), Roula Karam (POLMI), Patrick Aichroth (FRH), Luca Galli (POLMI), Ismail Sengor Altingovde (LUH), Apostolos Axenopoulos (CERTH), Naeem Ramzan (QMUL), Uladzimir Kharkevich (UNITN)
Summary of this report: Definition of a conceptual model for CUbRIK applications; survey of existing metadata representation formats.
Availability: This report is public.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. This work is partially funded by the EU under grant IST-FP7-287704.



Disclaimer This document contains confidential information in the form of the CUbRIK project findings, work and products and its use is strictly regulated by the CUbRIK Consortium Agreement and by Contract no. FP7-ICT-287704. Neither the CUbRIK Consortium nor any of its officers, employees or agents shall be responsible or liable in negligence or otherwise howsoever in respect of any inaccuracy or omission herein. The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7-ICT-2011-7) under grant agreement n° 287704. The contents of this document are the sole responsibility of the CUbRIK consortium and can in no way be taken to reflect the views of the European Union.



Table of Contents

EXECUTIVE SUMMARY .......................................................................................... 1
1. PLATFORM MODEL ............................................................................................ 2
   1.1 UNCERTAIN INFORMATION REPRESENTATION IN CUBRIK ............................ 4
   1.2 CONTENT AND CONTENT DESCRIPTION MODELS .......................................... 5
       1.2.1 The Content Model ................................................................................ 5
       1.2.2 The Content Description Model ............................................................. 9
       1.2.3 Entity Model Description ..................................................................... 13
   1.3 PROVENANCE AND RIGHTS MODEL ............................................................ 15
   1.4 USER AND SOCIAL MODELS ........................................................................ 20
   1.5 ACTION MODEL ........................................................................................... 24
   1.6 GAMING MODEL .......................................................................................... 26
   1.7 CONFLICT RESOLUTION MODEL ................................................................. 29
   1.8 CONTENT PROCESSING MODEL .................................................................. 34
2. SURVEY OF EXISTING REPRESENTATION FORMATS FOR MULTIMEDIA CONTENTS ............ 36
   2.1 RICH MULTIMEDIA REPRESENTATION FORMATS ........................................ 36
       2.1.1 Descriptions on different levels of abstraction and different realizations ... 37
       2.1.2 Temporal and spatial linking and aggregation ...................................... 38
       2.1.3 Relationships between items ................................................................ 39
       2.1.4 Key Case: The Rich Unified Content Description (RUCoD) Format ...... 40
   2.2 CONTENT LICENSE / COPYRIGHT REPRESENTATION FORMATS ................. 43
   2.3 SOCIAL AND USERS REPRESENTATION FORMATS ...................................... 44
   2.4 GAMING REPRESENTATION FORMATS ........................................................ 48
   2.5 CONFLICT RESOLUTION REPRESENTATION FORMATS AND CONFLICT MANAGEMENT ... 49
3. TOWARD POTENTIAL FUTURE EXTENSIONS ................................................... 52
BIBLIOGRAPHY ..................................................................................................... 55


Executive Summary

The Metadata Models (D2.1) deliverable is a product of the activities developed within WP2 (Requirements and Models), which is in charge of defining the functional specifications and the underlying data models of the CUbRIK platform. In the context of WP2, this document has a twofold purpose:

1. To define the CUbRIK Platform Model. The Platform Model is the data model on which the platform components are founded. It describes the logical structure of the data processed by the platform in terms of entities and relationships, following the UML class diagram modelling approach. Chapter 1 details the proposed model and provides a unified vision of the main concepts managed by the platform, thus serving as a common dictionary for all the components and artifacts handled by CUbRIK applications.

2. To review the state of the art in the metadata representation formats of interest for the CUbRIK platform. Chapter 2 describes the main metadata representation formats for the platform model sub-domains, assessing their suitability to represent the metadata handled by the CUbRIK platform and applications.

This deliverable addresses the requirements captured during the early phases of the project. However, further specifications may emerge as the project proceeds; we therefore acknowledge the possibility of updates to the platform model definition by dedicating Chapter 3 to the description of potential extensions.



1. Platform Model

In this chapter, we define the Platform Model of the CUbRIK platform. The Platform Model is the data model on which the platform components are founded. It describes the logical structure of the data processed by the platform in terms of entities and relationships, following the UML class diagram modelling approach. The Platform Model embodies a coherent configuration of the components in the system, such that the business requirements captured in the early phase of the project are satisfied. It includes the information model that external actors relate to, i.e. content providers, content owners, users, etc. Understanding the Platform Model is essential to anyone developing an application on top of the CUbRIK Platform.

The Platform Model is composed of seven parts which, taken together, give a conceptual view of the essential objects that make up a human computation platform for multimedia content processing and querying. Figure 1-1 gives an overview of the organization of the Platform Model, which comprises:

• The Content and Content Description Model: contains the concepts that denote the multimedia assets (e.g. images, videos, etc.) analyzed and queried by the platform, and the metadata (aka annotations) that describe those objects.

• The Provenance and Rights Model: zooms in on a specific aspect of the metadata that characterize content assets: their provenance and the access rights for their utilization.

• The User and Social Model: introduces humans into the conceptual model, by expressing the roles they can play as content consumers and performers of human computation tasks. Furthermore, the embedding of humans in social networks is considered by modelling the relations of users to communities and the most common properties that characterize a social activity profile.

• The Action Model: describes the range of actions that a user can perform; its central notion is that of Task, a unit of work that can be executed, with a variety of approaches, to solve a problem.

• The Gaming Model: focuses on a specific class of actions, which are deployed in the form of a game with a purpose (GWAP), and expresses the engagement and rewarding mechanisms typical of gaming.

• The Conflict Resolution Model: expresses the purpose of Tasks, that is, the aspects associated with the reduction of the uncertainty about content annotations by means of human or automatic actions.

• The Content Processing Model: focuses on the technical aspects of executing human and automatic tasks, by providing an essential description of pipelines and of the components that constitute them. The idea, however, is to keep the model lightweight, so as not to duplicate the information that can be represented by a full-fledged orchestration language (e.g. BPEL or BPMN), which remains the primary means for describing the workflow structure of a content processing, query and feedback pipeline.

Section 1.1 provides a brief introduction to a crosscutting modelling aspect of CUbRIK, i.e. the representation of uncertain information. The Content and Content Description Model, discussed in Section 1.2, deals with the content being managed by the platform, and specifies a taxonomy for content items, how content items relate to each other, and how they can be described through metadata. The Provenance and Rights Model, discussed in Section 1.3, deals with the structure of the permissions that content providers grant to the platform in order to analyze and view content. The User and Social Model (discussed in Section 1.4) provides a taxonomy of the types of users who interact with the platform, and discusses the interactions that users have with communities and other users in the social space. The Action Model, in Section 1.5, describes the actions performed by users, while the Gaming Model of Section 1.6 presents the concepts needed for deploying tasks in the form of GWAPs. The Conflict Resolution Model, described in Section 1.7, deals with human-executed tasks and interaction, so as to define how users interact with human computation within CUbRIK. Finally, Section 1.8 describes the Content Processing Model, which is the conceptualization of the main CUbRIK architectural artifacts involved in content processing activities.

Please notice that, throughout the document, all diagrams adopt the following colour convention to denote the domain of the represented entities:

• WHITE: entities belonging to the Content Model Taxonomy
• YELLOW: entities belonging to the Content Description Model
• DARK GREEN: entities belonging to the Provenance and Rights Model
• LIGHT GREEN: entities belonging to the User Taxonomy
• ORANGE: entities belonging to the User and Social Model
• GREY: entities belonging to the Action Model
• PINK: entities belonging to the Gaming Model
• CYAN: entities belonging to the Conflict Resolution Model
• VIOLET: entities belonging to the Content Processing Model

Figure 1-1 Bird's-eye view of the CUbRIK platform model.



1.1 Uncertain information representation in CUbRIK

An important issue when dealing with human and automatic computation applied to multimedia content is the management of uncertain information, because both algorithms and users' contributions are approximate, and their trust level can be appraised only probabilistically. Uncertainty can be related to several concepts in the system and is typically the result of an approximate approach to the determination of a given fact. For instance, textual annotations produced by automatic classification algorithms are commonly associated with a trust value, i.e. a number that estimates the correctness of the given classification: in SIFT image similarity processing, the trust value can be the percentage of the characteristic points of the query image that have been identified in the target image. At query time, search engines evaluate the relevance of the query with respect to a set of content objects, describing it with a numeric value; Human Computation performers can be selected according to their fitness for the execution of a given task. Figure 1-2 describes how uncertainty is modelled within CUbRIK. Under the generic term Confidence, we define the degree of uncertainty associated with a piece of information, and we allow such a degree to be expressed as a Confidence Value (e.g. 0.8), as a Confidence Interval (e.g. [0.6, 0.8]), or as a Probability Distribution of a given type (e.g. normal, Poisson, etc.).
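As a concrete illustration, the following sketch renders the Confidence concept in Python. It is a minimal sketch with illustrative class and attribute names; only the three forms of uncertainty come from the model.

```python
# Minimal sketch of the CUbRIK Confidence concept (names are assumptions).
from dataclasses import dataclass
from typing import Union

@dataclass
class ConfidenceValue:
    value: float                # e.g. 0.8

@dataclass
class ConfidenceInterval:
    lower: float                # e.g. 0.6
    upper: float                # e.g. 0.8

@dataclass
class ProbabilityDistribution:
    kind: str                   # e.g. "normal", "poisson"
    parameters: dict            # e.g. {"mean": 0.7, "stddev": 0.05}

# A Confidence is any one of the three representations above.
Confidence = Union[ConfidenceValue, ConfidenceInterval, ProbabilityDistribution]

# Example: the trust value produced by a SIFT-based similarity component.
sift_confidence: Confidence = ConfidenceValue(0.8)
```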

Figure 1-2 Modelling uncertain information in CUbRIK



1.2 Content and Content Description Models

The CUbRIK project will provide a platform for multimedia processing enhanced by human computation. The aim of modelling such a platform using UML class diagrams is to be able to develop, in an integrated way, the different components and pipelines that extract knowledge from multimedia assets, based upon a coherent view of data and metadata. The Content and Content Description Model described in this section is inspired by the RUCoD content description model, as defined in the I-SEARCH FP7 project [Daras,2012], the PHAROS content description model, as defined in the PHAROS FP6 project [PHAROS], and the MPEG-7 [Nack1999] and Media RSS (http://en.wikipedia.org/wiki/Media_RSS) multimedia metadata formats. The Content and Content Description Model aims at providing the minimum set of concepts and relationships able to cover the main requirements of content description and representation in CUbRIK, and it can be split into three main sub-models:

• The Content Model (Section 1.2.1), which describes 1) the concept of ContentObject, i.e. an abstract entity representing a piece of information that can be accessed through some kind of storage system; 2) the relationships that may occur between ContentObjects; and 3) a taxonomy of the main types of ContentObject that can be managed by CUbRIK. The Content Model is depicted in Figure 1-3.

• The Content Description Model (Section 1.2.2), which describes 1) the concept of Annotation, i.e. a piece of metadata about a given ContentObject; 2) how the description of contents can be split in order to represent spatial/temporal/logical segments; and 3) how Annotations can be aggregated and structured so as to represent complex organizations of metadata. The Content Description Model is depicted in Figure 1-5.

• The Entity Model (Section 1.2.3), which describes the organization of semantically well-defined classes of metadata that correspond to "real world objects or occurrences". The Entity Model is depicted in Figure 1-7.

To ease the discussion, the description of the models will be accompanied by a running example built around a query to a video collection dealing with "Obama Speeches in 2012" (Figure 1-4 and Figure 1-6).

1.2.1 The Content Model

The model element ContentObject (shown in Figure 1-3) denotes an abstract entity representing a piece of information that can be accessed through some kind of storage system (e.g. relational, NoSQL, or graph databases). Each ContentObject (for example the Obama2012Video1, Obama2012Video2, O2012Video1Metadata, and O2012Video2Metadata objects in Figure 1-4) is defined by:

• ID: an identifier used to uniquely refer to a piece of content; it is mandatory.

• MediaLocator: a string that unambiguously identifies the location of the ContentObject in the storage system, or its URI (for example Entitypedia.org/media/video/video1.mpg).

• Descriptions: the set of ContentDescriptions (i.e. structured collections of Annotations) that describe the ContentObject. In the example of Figure 1-4, the Obama2012Video1 object is described by the O2012Video1Descriptor object. More details about ContentDescriptions are provided in Section 1.2.2.

• Provider: the CUbRIKContentProvider that owns the ContentObject. This could be a web provider such as YouTube or Google Images, or a person/organization. More explanation is given in the Provenance and Rights Model of Section 1.3.

• Collections: an attribute denoting the Content Collections a ContentObject belongs to. A Content Collection is a meaningful aggregate of ContentObjects, and it is described by an ID and a Name. A Content Collection can be defined a priori by a content provider (e.g., the set of freely available videos vs. the set of premium videos), by a system administrator of the CUbRIKPlatform (e.g. all the music objects), or by a user, for instance when querying the system. Figure 1-4 contains an example of a QueryResultList, i.e. a collection of ContentObjects created after the execution of a user query. We assume the collection ObamaSOUR to be created after the submission of the "Obama speech on the state of the union 2012" query.

• StatusHistory: a reference to the StatusHistory objects that record the lifecycle of a ContentObject in terms of verification quality. More explanation is given in the Provenance and Rights Model of Section 1.3.

• Permissions: an attribute denoting the permissions on a given ContentObject (e.g. read only, store, transmit, etc.) as granted by its CUbRIKContentProvider. In the running example, Obama2012Video2 is a restricted version of Obama2012Video1: O2012Video1Permissions can be read/edit/store/transmit, while O2012Video2Permissions can be restricted to read/store only. More information about content permissions is given in Section 1.3 (Provenance and Rights Model).

• MimeType: specifies the actual file format of the content object (e.g. MPEG-2, MP3, etc.).

• Provenance: an attribute denoting the provenance history (sources) of each content object, for the purpose of estimating its data quality (trustworthiness and reliability). In the running example, objects come from the EntityPedia database. The Provenance (EntityPedia) can differ from the Provider (YouTube or SpeechesVideoProvider1). More explanation is given in the Provenance and Rights Model of Section 1.3.

The Content Model Taxonomy of Figure 1-3 also develops a categorization of the types of ContentObject that will be processed by the CUbRIK platform; the taxonomy includes (but is not limited to): AudioVisualObjects, such as AudioObjects (e.g. music tracks, speeches, etc.), VideoObjects and ImageObjects; TextualObjects (e.g. web documents, wikis, blogs); and StructuredDataObjects, such as SyndicationObjects, i.e. objects used to publish information about contents or content collections (e.g. RSSObjects or MediaRSSObjects), or DescriptionObjects, i.e. objects that contain descriptions (in terms of annotations) of ContentObjects (e.g. MPEG7Objects, which contain annotations encoded according to the MPEG-7 [Nack1999] standard, or RUCoDObjects, which contain annotations encoded according to the RUCoD format [Daras,2012]).

ContentObjects can be connected to each other; such a connection, which materializes in ContentRelationship objects, may be motivated by the presence of physical or logical relationships, possibly existing before the management within the CUbRIK platform. An example of such pre-existing relationships is the one that connects a VideoObject with the HTMLObject representing the HTML page that contained its description. An example of a relationship created within the CUbRIK platform is the one that connects a VideoObject with the thumbnails (i.e. ImageObjects) generated from its segmentation. The ContentRelType enumeration of Figure 1-3 contains a (non-exhaustive) list of the relationships that can occur between ContentObjects within CUbRIK. The relationship of a video with its HTML page of origin is an example of a CrawledDescription relationship; the relationship of a video with its thumbnails is an example of a DerivedObject relationship. When two ContentObjects represent alternative versions of the same physical object, they are related by a VersionOf relationship (e.g. a down-sampled version of a video, or its audio track). The DescriptionOf relationship describes a connection between a ContentObject (e.g. a video) and another ContentObject that contains its metadata. For instance, in Figure 1-4, O2012Video1Metadata and O2012Video2Metadata are two RUCoDObjects that contain the annotations associated with the Obama2012Video1 and Obama2012Video2 objects, respectively. Finally, DuplicateOf and PerceptualDuplicateOf respectively represent duplicate and near-duplicate relationships between objects. In the running example of Figure 1-4, the two ContentObjects Obama2012Video2 and Obama2012Video1 are associated through a VersionOf ContentRelationship, as the former video is a different version of the latter.
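To make the Content Model tangible, here is a minimal, hypothetical Python rendering of a ContentObject and a typed ContentRelationship, reusing the running example; only the concept names come from the model, while the attribute names, enum members, and the video2 locator are illustrative assumptions.

```python
# Sketch of ContentObject and ContentRelationship (attribute names assumed).
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional

class ContentRelType(Enum):
    CRAWLED_DESCRIPTION = "CrawledDescription"
    DERIVED_OBJECT = "DerivedObject"
    VERSION_OF = "VersionOf"
    DESCRIPTION_OF = "DescriptionOf"
    DUPLICATE_OF = "DuplicateOf"
    PERCEPTUAL_DUPLICATE_OF = "PerceptualDuplicateOf"

@dataclass
class ContentObject:
    id: str                                   # mandatory unique identifier
    media_locator: str                        # storage location or URI
    mime_type: Optional[str] = None           # e.g. "video/mpeg"
    collections: List[str] = field(default_factory=list)
    permissions: List[str] = field(default_factory=list)

@dataclass
class ContentRelationship:
    source: ContentObject
    target: ContentObject
    rel_type: ContentRelType

# Obama2012Video2 is a restricted version of Obama2012Video1.
v1 = ContentObject("Obama2012Video1",
                   "Entitypedia.org/media/video/video1.mpg", "video/mpeg",
                   collections=["ObamaSOUR"],
                   permissions=["read", "edit", "store", "transmit"])
v2 = ContentObject("Obama2012Video2",
                   "Entitypedia.org/media/video/video2.mpg", "video/mpeg",
                   permissions=["read", "store"])
rel = ContentRelationship(v2, v1, ContentRelType.VERSION_OF)
```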

Figure 1-3 Content Model Taxonomy



Figure 1-4 Content Model Taxonomy: Obama Speech Example



1.2.2 The Content Description Model

The Content Description Model comprises a set of entities and relationships that express knowledge about a ContentObject. This knowledge, typically expressed as a metadata Annotation, can be automatically or manually produced, and helps describe the ContentObject for search and retrieval purposes. In essence, the Content Description Model contains entities for 1) content object descriptions, 2) media segment descriptions, 3) media segment relationships, 4) annotation descriptions, and 5) annotation aggregations.

Content Object Descriptions

The Content Description Model supports the description of a ContentObject as a whole and/or as a composite artefact made of parts or of sub-objects. A ContentObject can feature zero or more ContentDescriptions, where each ContentDescription is characterized by a unique ID and by a Name that helps identify the scope of the description (e.g., the same content can be described multiple times by several parties). In the running example of Figure 1-6, the O2012Video1Descriptor object is a descriptor of the Obama2012Speech1Metadata object which, in turn, is the container of the metadata associated with the Obama2012Video1 content object. A ContentDescription is composed of zero or more MediaSegmentDescriptions. A MediaSegmentDescription is the representation of a meaningful content segmentation, and it is characterized by a unique ID, a Title, and a SegmentationCriteria (i.e. a string that uniquely identifies the segmentation process that led to the creation of the MediaSegmentDescription). In the running example of Figure 1-6, the O2012Video1Descriptor object is composed of five MediaSegmentDescription objects of different types.

Media Segment Descriptions and Relationships

To describe the decomposition of ContentObjects into media segments, the Content Description Model includes the classes of MediaSegmentDescription types that are deemed to be useful for CUbRIK. In more detail, two main sub-classes of media segment descriptions are allowed:

1. SpatialSegmentDescriptions, i.e. descriptions of media segments that are defined in the spatial domain of the described ContentObject. An example of a spatial segment description is the one identified by the bounding box of a face in an ImageObject; in such a case we talk about an ImageSegmentDescription.

2. TemporalSegmentDescriptions, i.e. descriptions of media segments that are defined in the temporal domain of the described ContentObject. Within TemporalSegmentDescriptions it is possible to identify two additional sub-classes of media segment descriptions:
   o VideoSegmentDescriptions, i.e. descriptions of media segments that are defined in the temporal domain of a video object. Examples of video segment descriptions are those identified by a shot detection algorithm (ShotSegmentDescription) or by a scene detection algorithm (SceneSegmentDescription). In the running example of Figure 1-6, the O2012Video1Descriptor object contains three contiguous, non-overlapping SceneSegmentDescriptions (O2012Video1Seg1Desc, O2012Video1Seg2Desc, O2012Video1Seg3Desc) created using a VideoSpeakerTurn shot detection algorithm.
   o AudioSegmentDescriptions, i.e. descriptions of media segments that are defined in the temporal domain of an object that contains an aural part (e.g. a video or a music track). Examples of audio segment descriptions are those identified by a speech/noise/music segmentation algorithm. In the running example of Figure 1-6, the O2012Video1Descriptor object contains two contiguous, non-overlapping SpeechSegmentDescriptions (O2012Video1SpeechSegDesc1, O2012Video1SpeechSegDesc2) created using an AudioSpeakerTurn shot detection algorithm.

MediaSegmentDescriptions may be related by means of SegmentRelationships which, according to the type of the involved media segment descriptions, specialize into three classes:

1. TemporalRelationships, i.e. relationships that involve TemporalSegmentDescriptions. TemporalRelationships can be modelled in several ways. A good example of temporal relationship modelling (and reasoning) is represented by Allen's interval algebra (http://en.wikipedia.org/wiki/Allen%27s_interval_algebra). The algebra defines the possible relations between time intervals and provides a composition table that can be used as a basis for reasoning about temporal descriptions of events. Examples of temporal relationships are takes place before, meets, overlaps, and during. Examples of meets TemporalRelationships are the ones that, in Figure 1-6, relate the SpeechSegmentDescriptions and SceneSegmentDescriptions, as the above-mentioned segments provide a full coverage of the video through non-overlapping, adjacent temporal segments.

2. SpatialRelationships, i.e. relationships that involve SpatialSegmentDescriptions. SpatialRelationships can also be represented in several ways. For instance, the region connection calculus (http://en.wikipedia.org/wiki/Region_connection_calculus) abstractly describes regions (in a Euclidean space, or in a topological space) by their possible relations to each other. Examples of spatial relationships are equal (EQ), partially overlapping (PO), and disconnected (DC).

3. LogicalRelationships, i.e. relationships that involve MediaSegmentDescriptions by describing logical properties like membership, order, etc.

Content Annotations

Annotations express metadata that describe a ContentDescription, or a MediaSegmentDescription thereof. Annotations related to a ContentDescription object are said to be defined at item level (i.e., they describe the ContentObject as a whole), while Annotations related to a MediaSegmentDescription are defined at segment level (i.e., they describe just a part of the ContentObject). Annotation objects are characterized by an AnnotationScheme (which uniquely identifies the type of annotation in a CUbRIK system according, for instance, to the annotation component that generated it), a Name, a CreationTimeStamp, and a textual Description (or a DescriptionURI that points to an external description). The AnnotationQuality attribute identifies a set of AnnotationConfidence objects that state the level of uncertainty associated with the truth-value of an annotation (note that an Annotation is assumed to be associated with several AnnotationConfidence objects, so as to model situations where the same Annotation is evaluated according to several points of view). Within CUbRIK, the main sub-classes of Annotation are:

1. TextAnnotations, i.e. Annotations that contain textual values in a given Language. Examples of TextAnnotations are speech-to-text transcriptions, user comments or tags, etc. Figure 1-6 contains several examples of TextAnnotations (e.g. O2012SSD1SpeechToText1, O2012SSD1SpeechToText2, etc.) extracted through a speech-to-text annotation component. TextAnnotations can further specialize into two sub-classes:
   o The AnnotationClass entity identifies an annotation type for which the allowed values are constrained by a ClassificationScheme, i.e. a vocabulary or an ontology.
   o The StructuredTextualAnnotation entity, instead, represents textual annotations that are themselves structured according to a given AnnotationStructureType. Examples of StructuredTextualAnnotations are parse trees or part-of-speech taggings of sentences.

2. Low-Level Features, i.e. Annotations that contain array(s) of numerical values, typically representing the result of a numerical analysis of the content item. Examples of Low-Level Features are image histograms for the tonal distribution of a digital image, or the keypoints extracted by a Scale-Invariant Feature Transform (SIFT) algorithm.

3. Entities, i.e. semantically defined metadata that correspond to "real world objects or occurrences". Entity annotations are described in more detail in Section 1.2.3.

Annotations are created by a CreationAction. In CUbRIK, an Action is an event, happening in a given time span, that involves the interaction with, the processing of, or the creation of Content Objects and Annotations (a more detailed description of Actions is provided in Section 1.5). For instance, in Figure 1-6, the O2012SSD1SpeechToText1 annotation has been created by the CUbRIKSpeechToText45 action, which lasted 10 seconds. Annotations can belong to an AnnotationAggregate, to denote the fact that:

1. Multiple annotations have been created by the same Action (e.g., a set of image tags created by an instance of a GamePlay with an image tagging GWAP), or

2. There is a logical or functional dependency (expressed by the AggregateType attribute) between annotations (DerivedAnnotations and SourceAnnotations attributes). For instance, in Figure 1-6 the O2012SSD1SpeechToText1 and O2012SSD1SpeechToText2 annotations are related to the SpeechRefinement96 AnnotationAggregate object to denote the fact that O2012SSD1SpeechToText2 ("To win the future") is a refinement (AggregateType=MediaRefinement) of O2012SSD1SpeechToText1, which is incorrect ("To win the suture").
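As an illustration of how a meets TemporalRelationship could be verified over two temporal segment descriptions, consider the sketch below; the model does not prescribe how segment boundaries are encoded, so the start/end attributes (in seconds) and the boundary values are assumptions.

```python
# Sketch: checking Allen's "meets" relation between temporal segments.
from dataclasses import dataclass

@dataclass
class TemporalSegmentDescription:
    id: str
    start: float   # seconds from the beginning of the media (assumed encoding)
    end: float

def meets(a: TemporalSegmentDescription, b: TemporalSegmentDescription) -> bool:
    """True if segment a ends exactly where segment b starts (Allen's 'meets')."""
    return a.end == b.start

# Two adjacent, non-overlapping speech segments from the running example
# (boundary values invented for illustration).
seg1 = TemporalSegmentDescription("O2012Video1SpeechSegDesc1", 0.0, 42.5)
seg2 = TemporalSegmentDescription("O2012Video1SpeechSegDesc2", 42.5, 97.0)
assert meets(seg1, seg2)
```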



Figure 1-5 Content Description Model



Figure 1-6 Content Description Model: Obama Speech Example

1.2.3 Entity Model Description

Entities specialize Annotations to express a semantically stronger class of metadata, associated with a multimedia object, that corresponds to "real world objects or occurrences". Thanks to entities, media objects can be annotated with the places, events, and people they refer to. Furthermore, entities constitute a flexible and expandable type system, which makes it possible to annotate media objects with metadata of new types, when the need arises, just by integrating novel kinds of entities.



An entity is any object of the real world that is important enough to be denoted with a name, for example the city of Trento or the French Revolution event. Each Entity can also be used as a data type. For instance, the attribute Seat of an Organization is of type Location. This allows "linking" entities of different types. Among the types of entities considered within CUbRIK are:

• Location: an entity for which the spatial dimension is central. It refers to spatial objects, real entities occupying regions of space (e.g., regions, cities, boundaries, parcels of land, water bodies, roads, buildings, bridges, etc.). Latitude and Longitude are in WGS84 decimal format, and Altitude is in meters.

• Organization: corporations, agencies, and other groups of people defined by an established organizational structure are all examples of organizations.

• Event: something that happens at a given location and time, for example a conference, a party or a battle.

• Person: a human being, for instance Obama, the president of the USA.

Entities can also represent media files, such as:

• Photo: a picture of something or somebody recorded by a camera on light-sensitive material or digital memory. Each photo has a corresponding ImageObject file used to store it. The attributes of a photo correspond to those of the original image that are invariant to possible manipulations of the image (e.g., resizing or colour balancing). In particular, we keep track of all the camera settings used to take the photo, as well as geo-temporal information. Geo-temporal information includes GPS information (latitude, longitude and altitude), direction (reference and orientation) and universal time. Each of the image variations can be encoded as a different image (i.e. stored as a different file) and independently tagged using the tag attribute, which allows selecting part of the image using a bounding box.

• Video: any recording, e.g. a professional movie, a song video clip or another generic clip. Similarly to photos, each video has a corresponding VideoObject file used to store it, and we allow individual frames to be tagged.

Note that the Entity Model is an open schema model, i.e., in addition to the required attributes, any number of attributes can be added to describe an entity. Moreover, it can be extended with new entity types. For instance, in Figure 1-7 we show how a new type Monument (any structure erected to commemorate persons or events) can be defined by extending the type Location. The entity model uses the traditional Boolean, Integer, Long, Float, String and URL data types for qualifying entity properties. In addition, the Moment and Duration data types are used for temporal properties; they follow the ISO 8601 standard (http://en.wikipedia.org/wiki/ISO_8601) and codify, respectively, a moment or a duration in time (with 1 ms precision). A Moment is encoded as an interval given by two values that specify the start point (S) and the end point (E) on the timeline. A Duration is encoded as two values that specify the minimum (dtm) and the maximum (dtM) amount of time (e.g., a meeting may last about one hour, more or less 10 minutes). The value of an entity attribute can be associated with provenance, confidence, and validity information, as shown in Figure 1-8, to denote the source and quality of that specific piece of data. Also notice that Entities can contain references to ContentObjects managed by the platform: for instance, the Photo entity contains an attribute Artifact that refers to an ImageObject. Figure 1-6 depicts several examples of usage of the Entity Model. For instance, the O2012Video1Descriptor object (a ContentDescription), by referring to a State of the Union speech made by Obama in 2012, is annotated with the State Of The Union 2012 Speech Entity, to represent the specific event. The State Of The Union 2012 Speech annotation is also related to several other entities in the Entity Model: for instance, the Barack Obama, Michelle Obama, and Jason Furman entities of type Person are marked as Participants of the Event.
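The sketch below illustrates, under assumed attribute names and values, how the open-schema Entity and the Moment and Duration temporal types described above could be rendered in Python.

```python
# Sketch of the open-schema Entity model with Moment and Duration types.
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Any, Dict

@dataclass
class Moment:
    start: datetime    # start point (S) of the interval, ISO 8601
    end: datetime      # end point (E) of the interval, ISO 8601

@dataclass
class Duration:
    dtm: timedelta     # minimum amount of time
    dtM: timedelta     # maximum amount of time

@dataclass
class Entity:
    name: str
    entity_type: str
    # Open schema: any number of extra attributes can be attached.
    attributes: Dict[str, Any] = field(default_factory=dict)

# A meeting lasting about one hour, more or less 10 minutes.
meeting_length = Duration(timedelta(minutes=50), timedelta(minutes=70))

# Extending the type system: a Monument defined by extending Location
# (coordinate values are illustrative).
monument = Entity("Trento Cathedral", "Monument",
                  {"Latitude": 46.066, "Longitude": 11.121})
```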



Figure 1-7 Entity Model

[Figure 1-8 shows the Entity, Some_Entity, Provenance_Source, Provenance_Info, and Validity_Info classes: a Provenance_Source has Name and Description strings plus Created/Modified Moments; a Provenance_Info attaches a provenance reference and a Confidence (Float) to an attribute value; a Validity_Info attaches Start/Stop Moments and a Duration to an attribute value.]

Figure 1-8 Entity attribute provenance and validity information

1.3 Provenance and Rights Model

The Provenance and Rights Model serves three main purposes:

1. Tracking of copyright information, to inform users about authorship, usage conditions and licensing (as required e.g. by Creative Commons licenses).

2. Communication with rights holders and crowds to track, complement and modify rights, license and provenance information, and to resolve possible conflicts.

3. Allowing the system to address copyright requirements with respect to content approval, storage, annotation, transformation, presentation and distribution to end users and external systems in a partially automated manner. More specifically:
• to approve content, by using relevant information (contextual and otherwise) to assess the trustworthiness of the content provider;
• to use rules and relevant metadata (e.g. license information) to derive permissions on how content and derived information should be managed in the various platform domains;
• to solve resulting conflicts by interaction with content providers and users and, afterwards, to communicate permissions to the relevant system domains for interpretation/enforcement.

Figure 1-9 depicts the Provenance and Rights Model. With respect to this model, each ContentObject in the system is associated with a ContentProvider (a type of CUbRIKUser; more details about the User Taxonomy in CUbRIK are given in Section 1.4). Within CUbRIK, ContentProviders are the class of users in charge of providing content to be processed and, eventually, retrieved. A ContentProvider can be a human being, or software that automatically generates content to be fed to the platform. Creators, instead, are the original authors of a work; a Creator, too, can be either a human or a component. To mark the distinction, the model contains an attribute isRightsHolder that, when set to "true", helps identify a human creator. By acting as a delegate for a Creator with respect to the CUbRIK platform, ContentProviders are able to assign UsagePermissions either explicitly, or implicitly by providing LicenseInformation (from which permissions are derived, possibly requiring interaction with the ContentProvider for clarification purposes). The UsagePermission entity specifies a predefined set of possible usages (PermissionTypes), e.g. viewing, deleting, transcoding, cutting. Moreover, in order to gather information about the trustworthiness of ContentProviders, there is the Trustworthiness entity, which defines a trust level (expressed as a confidence value) and a time stamp. The trust level can be determined (and, where necessary, adapted) depending on the "quality" of the provided content and the reputation of the ContentProvider, and considering possible conflicts / cases of misuse.

Each ContentObject is also characterized by a ContentProvenance entity, which stores (or refers to) information related to the aggregation process, such as:
• AggregationContext: describes how a ContentObject has been provided to the system, e.g. via user upload or via crawling (determined by the system);
• UploadTime: the time a ContentObject has been provided to the system (determined by the system);
• Hash: a cryptographic hash used for identifying a ContentObject, ensuring integrity and exact duplicate detection (generated by the system);
• Fingerprint: a perceptual fingerprint for perceptual duplicate detection / robust identification (generated by the system);
• WorkTitle: the title of the work (provided by the Creator, e.g. extracted via content metadata / license information such as a Creative Commons license);
• CreationTime: the time of creation of the work (provided by the Creator, e.g. via a Creative Commons license);
• AttributionURL: the URL that has to be mentioned by users of the work, including the platform (provided by the Creator, e.g. via a Creative Commons license);
• AttributionName: a name that has to be mentioned by users of the work, including the platform (provided by the Creator, e.g. via a Creative Commons license);
• SourceURL: in case the provided ContentObject has been derived from another work, this information points to the URL of the "original" work (provided by the Creator).

The same ContentProvenance information can be shared by different ContentObjects, as the process of content provisioning may involve collections of objects. Moreover, ContentProvenance information can be related by derivation relationships, so as to reflect use cases where ContentProviders delegate the provision of new contents to other ContentProviders (e.g. when a human content provider delegates the CUbRIK platform), or to express derivative works.

License information is described by the LicenseInformation class, which stores information extracted from the provided licenses, including:
• LicenseType: a unique identifier for the type of the license, e.g. "cc-by", a Creative Commons license type which allows sharing, remixing and making commercial use of a work under the condition of attribution of the original author;
• LicenseInfoURL: a URL pointing to information about the license, e.g. http://creativecommons.org/licenses/by/2.0/de/;
• LicenseText: a human-readable license text;
• MorePermissionsURL: a contact point for getting information / negotiating further usage of the work or details beyond the described license (e.g. a request for commercial use for a CC license that only allows non-commercial use);
• MachineReadableText: a machine-readable version of the license, e.g. in XML format.

An important part of the provision process is the Approval of the content, i.e. the verification of the content with respect to the claimed ownership and copyrights. The status of approval of a ContentObject can be tracked using Approval entities, which store the approval Level (e.g. "1.0" for fully approved or "0.0" for not approved); thanks to the possibility of defining several Approval objects, and to the presence of a date attribute, it is also possible to track the evolution of a ContentObject's approval status.

To provide an example of provenance and rights management in CUbRIK, Figure 1-10 describes the following scenario:

• There is a well-known ContentProvider ("Jamendo") that has reached the highest possible trust level ("1.0").
• The ContentProvider uploads two different ContentObjects (the AudioObject and the VideoObject).
• The ContentProvider also provides licenses for the ContentObjects, which contain information about the Creators of the ContentObjects as well as information that is used by the system to derive and set actual UsagePermissions. In the given example the system only "knows" two possible UsagePermissions: "transcode" and "view". Based on the provided licenses (in both cases CC BY), the system associates the existing UsagePermissions with the ContentObjects.
• Based on the given LicenseInformation of the VideoObject, the system finds a derivation between the VideoObject and the AudioObject, thus establishing a relationship between the ContentObjects.
• The Creator of the AudioObject is also a registered CUbRIKEndUser, so the system is able to establish a relationship between the Creator and the system user.
• An additional ContentObject ("Transcoded") is derived from the AudioObject by a SystemComponent (the "Transcoder"). The "Transcoder" has a "transcode" permission (per definition).
• Based on the ContentProvenance and LicenseInformation of the AudioObject, the system derives a new LicenseInformation instance for the Transcoded object.
• Moreover, there is a CUbRIKEndUser, "John Doe", who has the permission to view ContentObjects (per definition).
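To illustrate how UsagePermissions could be derived from LicenseInformation, here is a toy sketch; the rule table is an invented example mirroring the "view"/"transcode" scenario above, not an actual CUbRIK rule set.

```python
# Sketch: deriving UsagePermissions from LicenseInformation (toy rules).
from dataclasses import dataclass
from typing import List

@dataclass
class LicenseInformation:
    license_type: str      # e.g. "cc-by"
    license_info_url: str  # e.g. "http://creativecommons.org/licenses/by/2.0/de/"

# Toy rule table: the permissions each known license type implies.
PERMISSION_RULES = {
    "cc-by": ["view", "transcode"],
    "cc-by-nc-nd": ["view"],
}

def derive_permissions(license_info: LicenseInformation) -> List[str]:
    """Return the UsagePermissions implied by a license; an unknown license
    yields none and would trigger interaction with the ContentProvider."""
    return PERMISSION_RULES.get(license_info.license_type, [])

cc_by = LicenseInformation("cc-by",
                           "http://creativecommons.org/licenses/by/2.0/de/")
print(derive_permissions(cc_by))   # ['view', 'transcode']
```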



Figure 1-9 Provenance and Rights Model



Figure 1-10 An Example of the Provenance and Rights Model



1.4 User and Social Models

Humans play a central role in CUbRIK. Therefore, there is the need to capture the properties of the people who interact inside or at the perimeter of the platform. The CUbRIK platform is mainly devoted to the provision, processing and retrieval of multimedia objects. However, due to the human-enacted nature of several CUbRIK processes, people are also considered as members of social communities that drive, or take part in, computation activities. Figure 1-11 depicts the User Taxonomy model, i.e. a model that details the set of roles that humans can assume within the CUbRIK platform.

The main concept in the user and social models is the User, i.e. a human being who plays a role for CUbRIK, but not necessarily in CUbRIK: with this distinction we remark that we are also interested in modelling users that exist outside CUbRIK (e.g. users in a social network) but that, for other reasons (e.g. analysis, context, or dataset creation), are used by CUbRIK. A User is a Person (as described in the Entity Model) and, hence, she inherits all the properties that describe a person (e.g. Name, Surname, Gender, Date_of_Birth). A CUbRIKUser is a User that plays an active role within CUbRIK and, therefore, has a unique identifier (PlatformID) within the platform and a standard set of credentials (username and password). CUbRIKUsers can be further divided into three categories: CUbRIKAdministrators, CUbRIKContentProviders, and CUbRIKEndUsers. As their names imply, CUbRIKAdministrators are the platform administrators with full control of the system, while CUbRIKContentProviders are the providers of Content Objects along with certain usage permissions (see Section 1.3). The CUbRIKEndUser entity represents the class of CUbRIKUsers that 1) consume resources (i.e. Content Objects or Annotations) provided by the platform, or 2) produce new resources under specific interaction constraints. Consumption and production can occur in two main scenarios:

• Users consume resources when they interact with a CUbRIK instance through a CUbRIKApplication, i.e. a software module that exploits CUbRIK to provide some added-value functionality. Examples of CUbRIKApplications are horizontal search engines (such as the ones developed in the application domains of the horizontal demos used to showcase CUbRIK technical capacity in WP5-WP8) or ad-hoc content analysis and interaction tools, such as the ones developed in the vertical user scenarios of CUbRIK (in WP12). Users of CUbRIKApplications are called CUbRIKAppUsers, and they can be characterized by application-specific properties (e.g., the ApplicationUserID).

• Users consume/produce resources when requested to solve some human computation task for conflict resolution. In such a case, CUbRIKUsers specialize into CUbRIKPerformers. Please notice that, in CUbRIK, games play a central role both as consumers and producers of resources (i.e., games can render Content Items or produce new Annotations). Therefore, a GamePlayer is characterized as a specific type of user that plays a game and, therefore, can both consume and produce resources, thus possibly being a CUbRIKAppUser and a CUbRIKPerformer at the same time.

More information about how Users, CUbRIKAppUsers, CUbRIKPerformers, and GamePlayers relate to the overall platform model can be found in the Social Model (described next), in the Action Model of Section 1.5, in the Gaming Model of Section 1.6, and in the Conflict Resolution Model of Section 1.7, respectively.



Figure 1-11 User Taxonomy Model

Figure 1-12 depicts another important representation viewpoint for Users, that is, their relationships and interactions in the social space. Please notice that Figure 1-12 always refers to Users, i.e., the topmost user type of the taxonomy in Figure 1-11; this design choice is motivated by the need to represent people in their social space regardless of their affiliation to CUbRIK. One or more ConflictResolutionPlatforms, i.e. interaction platforms where users can perform conflict resolution tasks, can coexist at any given time. For the sake of the CUbRIK description, SocialNetworks (e.g. Facebook, Google+, LinkedIn, Twitter) are specific types of ConflictResolutionPlatforms. A User can be subscribed to zero or more ConflictResolutionPlatforms; each subscription goes with a ConflictResolutionPlatformMembership, i.e. an entity that contains the main authentication credentials for the social platform (e.g., username, password or AuthenticationToken), plus some metadata that describes the User on the platform. Examples of such metadata are:

• The SocialPlatformProfile, i.e. the set of personal user details stored within the platform. The SocialPlatformProfile includes the Nickname of the user, a public Profile_URL, plus an open set of SocialProfileAttributes (e.g. birthdate, hometown, preferences, etc.).

• A set of PlatformMetrics, i.e. measures of the importance of the User within the social network space delimited by her set of friends and acquaintances. In CUbRIK such metrics are named UserNetworkRoles; examples are classic micro-level analysis measures [Wiki:SocialNetworkAnalysis] such as Centrality, Prestige, and Authority.

• A set of TopicalAffinities, i.e. topical relationships with a given (set of) topics. An affinity is represented by the UserTopicalAffinity, which embodies a pointer to topics described as Entities.

Within a ConflictResolutionPlatform, Users are related to each other through UserRelationships of a given UserRelationshipType [XFN1.1SPEC] (e.g., friendship, physical, professional, geographical, etc.).

Another central concept for a social model is that of Community, i.e., "a group of interacting people, living in some proximity (i.e., in space, time, or relationship). Community usually refers to a social unit … that shares common values and has social cohesion" (http://en.wikipedia.org/wiki/Community). A Community is characterized by a Name and by a set of Topics that define the common values tying together the Users belonging to it (Topics are described by Entities). A CommunityMembership, i.e. an entity that contains some metadata about the user subscription, describes the affiliation of a User to a Community. Examples of such metadata are:

• A set of CommunityMetrics, i.e. measures of the importance of the User within the Community;

• A set of TopicalAffinities, i.e. topical relationships with a given (set of) topics. Users may have affinities only to a sub-set of the Topics that describe the community, and such affinities can involve other Users.

Communities can be real, that is, they belong to a SocialNetwork as SocialNetworkCommunities (e.g. groups), or they can be DerivedCommunities, i.e. communities created outside the social platform (possibly cross-platform) according, for instance, to some topical affinity or geographical perimeter.

Finally, the User and Social Model allows for the definition of Users' GlobalNetworkMetrics, i.e. metrics that measure the importance of a User across social networks and communities, thus allowing the representation of network analysis calculations done over several ConflictResolutionPlatforms.
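As an illustration of a PlatformMetric, the sketch below computes degree centrality (one possible UserNetworkRole) over a toy friendship graph using the networkx library; the graph and the choice of metric are assumptions made for illustration only.

```python
# Sketch: a centrality-based UserNetworkRole over a toy friendship graph.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("alice", "bob"),     # each edge is a "friendship" UserRelationship
    ("alice", "carol"),
    ("bob", "carol"),
    ("carol", "dave"),
])

# Degree centrality: the fraction of possible ties each user actually has.
centrality = nx.degree_centrality(G)
print(max(centrality, key=centrality.get))   # "carol" is the most central user
```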



Figure 1-12 User & Social Model



1.5 Action Model

Basic action theory typically describes an action as behaviour caused by an agent in a particular situation (http://en.wikipedia.org/wiki/Action_theory_%28philosophy%29). In CUbRIK, an Action has a more specific meaning: it is an event, happening in a given time span (delimited by a StartTime and an EndTime), that involves the interaction with, the processing of, or the creation of Content Objects and Annotations. Each Action can be associated with one or more QualityIndicator values, i.e. values that denote the quality of the given action according to a quality Scheme (e.g. annotation quality, evaluation quality, etc.). Two types of Actions can be identified:

• AutomaticActions, i.e., actions performed by software components like analysis or classification software;

• HumanActions, i.e., actions performed by Users, possibly on a ConflictResolutionPlatform (including social networks and crowdsourcing platforms).

Human Actions can be executed outside or inside the CUbRIK platform. The former case comprises actions executed on ExternalActivityObjects that are not under the control of the CUbRIK platform; for instance, images or videos published on a social network by a given user through her SocialNetworkProfile. Examples of this kind of action are typical social interactions like Rating, Tagging, Commenting, and Bookmarking. These kinds of external actions are modelled because they might provide information that is important to characterize a social network or a User, for instance through social network analysis techniques. Actions executed within the CUbRIK platform are called CUbRIKActions, and we assume that a CUbRIKEndUser executes them. We identify three main archetypes of CUbRIKActions: Retrieval, Query and Task. The first two are typical examples of interactions performed in a CUbRIKApplication, that is, querying or consuming a collection of content items; therefore, we assume that Retrieval and Query actions are executed by a CUbRIKAppUser. Tasks, instead, relate specifically to human problem-solving activities and, therefore, are executed by CUbRIKPerformers. Actions that produce annotations, like rating, tagging, etc., are currently modelled as Task actions. A GamePlay is a specific type of Task executed implicitly in a gaming application by a GamePlayer. More information about the gaming model is provided in Section 1.6.
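Schematically, the taxonomy just described can be summarized as a small class hierarchy; in the sketch below only the class names come from the model, while the attributes are assumptions.

```python
# Sketch of the Action taxonomy as a Python class hierarchy.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Action:
    start_time: datetime   # the time span delimiting the event
    end_time: datetime

class AutomaticAction(Action): ...   # performed by software components
class HumanAction(Action): ...       # performed by Users

class CUbRIKAction(HumanAction): ... # executed within the platform
class Retrieval(CUbRIKAction): ...   # performed by a CUbRIKAppUser
class Query(CUbRIKAction): ...       # performed by a CUbRIKAppUser
class Task(CUbRIKAction): ...        # performed by a CUbRIKPerformer
class GamePlay(Task): ...            # executed implicitly by a GamePlayer
```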



Figure 1-13 User Action Model



1.6 Gaming Model

CUbRIK leverages the entertainment capabilities of online games in order to motivate the users of the platform, providing an engaging experience that is used to solve human computation tasks. In order to provide a full social experience through game components, it is necessary to model the games that will be used as CUbRIK Applications and also to extend the model of users with enriched metadata, so as to express their interactions with each other and within a game. Moreover, to drive the motivation of the players, a model for an achievement system has been provided. The achievement system model represents data that is exploited to fulfil several roles: player retention, social challenges and interaction, and player profiling. The gaming data model is depicted in Figure 1-14.

Game is the core entity of the gaming model. The Mode attribute represents the possible gameplay modes of the game (Single Player, Multi Player, Cooperative, etc.), while the Genre attribute identifies its genre (e.g. Puzzle, Educational, etc. [GamesGenres,2012]). Associated with a game, an Achievement is defined as a means to foster an entertaining experience for the user and a way to profile him. An Achievement is a specific challenge or task that the player can perform in order to get a reward in terms of points or other special features (in-game items, artworks, behind-the-scenes videos, etc.). Once a player reaches the goals of a listed achievement, he gains a Badge related to that specific achievement. An Achievement is thus an abstract concept tied to a game, stating the actions that a player has to perform in order to obtain it, while a Badge is an instance of an achievement obtained by a specific player for a specific game. An Achievement has an Icon, which is used to describe it in a visual way; a Category, which specifies the kind of task the achievement is associated with (General, Quest, Exploration, Puzzle solving, etc.); an attribute called PointsGiven, which contains the amount of points to be awarded when the requirements for the achievement have been met; and a flag called OfTheDay, which defines whether or not the achievement is a special achievement to be completed on a specific day in order to obtain virtual goods, more points, levels, etc.

The gaming model extends the CUbRIKAppUser with a Player entity, in order to accommodate all the social features needed to create a gaming experience and to give the player a sense of self-accomplishment. An Avatar allows the user to be recognizable through a custom image, while Motto, Biography and GamingRig are used to give him a sense of customization and differentiation from the other users, and also to reinforce his identity. Status is used to show whether the player is online, offline or occupied or, if he is playing something, which game is being played. Screenshots and Videos are used to show exciting moments that the player wants to share with the community. A player can also choose his FavouriteGame and view stats regarding which games have been played or are being played.

Finally, in order to motivate the players and profile them, the gaming model conceptualizes a set of game-relevant statistics. The PlayerLevel is used to represent the proficiency and experience of a player, as a means to aggregate in a compact way the points gathered, the hours spent playing, or particularly difficult tasks completed. The ExperiencePoints attribute stores all the points that have been achieved by the player in all the games after completing particular Achievements. ObtainedBadges represent the Achievements that have been unlocked by the player and shown on his profile. PlayerTitle is a special recognition given to the player for his actions, like a chivalry role, while the PlayerType (e.g. Achievers, Explorers, etc. [Bartle,1996]) is used to associate the player with a particular cluster of gamer types.



A GameBadge is used to relate a Player with an Achievement he has obtained. The CompletionPercentage field shows how much the player has already achieved towards completing a specific task. StartDate and EndDate are the date on which the player started to work on the achievement's goals and the date on which the achievement was obtained. TrialsN is used to track how many times the user tried to fulfil the achievement's requirements before reaching the goal. If an achievement requires only a single specific action in order to be obtained, the start and end date will be the same, the completion percentage will simply jump from 0% to 100%, and the TrialsN field will be set to 1. GameStats are stored in order to keep track of the HoursPlayed by a Player on a specific Game and, eventually, the Score he has obtained in that particular game. Finally, to associate a specific gaming session with the human computation actions that the user has performed while playing a specific game, the notion of HumanComputationAction is extended with a GamePlayAction, which associates a specific Player with a Gameplay; this action records the StartDate and EndDate of the gaming session for a player on a specific game and the GamePlay actions performed by the player in that specific time frame.
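The sketch below illustrates, under assumed field semantics, how a GameBadge could track a player's progress towards an achievement, including the single-action special case described above.

```python
# Sketch: tracking a GameBadge (field names mirror the model, logic assumed).
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class GameBadge:
    player: str
    achievement: str
    completion_percentage: float = 0.0
    trials_n: int = 0
    start_date: Optional[date] = None
    end_date: Optional[date] = None

    def record_trial(self, progress: float) -> None:
        """Register one attempt and the resulting completion percentage."""
        if self.start_date is None:
            self.start_date = date.today()
        self.trials_n += 1
        self.completion_percentage = min(100.0, progress)
        if self.completion_percentage >= 100.0:
            self.end_date = date.today()   # the achievement has been obtained

badge = GameBadge("player42", "Tag 100 images")
badge.record_trial(40.0)
badge.record_trial(100.0)   # badge earned on the second trial
print(badge.trials_n, badge.completion_percentage)   # 2 100.0
```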

Figure 1-14 Game Model

1.7 Conflict Resolution Model

The ability to involve humans in multimedia content processing and querying is one of the main features of CUbRIK. Such involvement is not an end in itself; rather, humans are brought in so as to take the quality of content analysis and querying to the next level, typically by overcoming the limitations of state-of-the-art content analysis components. These kinds of activities are typically referred to as "Human Computation", i.e., a computation model where the interaction with/among users is harnessed to help in the cooperative solution of tasks; in CUbRIK, "Human Computation" processes are named "Conflict Resolution" tasks, to highlight their usage in solving conflicts that might arise in the evaluation or analysis of multimedia contents. The Conflict Resolution model is depicted in Figure 1-15, while Figure 1-16 depicts a running example. As expected, the main concept in the model is the Conflict, i.e. a situation during the analysis of a given ContentObject where the absence of, or contradictions among, facts recorded in Annotations may arise. Conflicts typically happen in the following scenarios [ChengYu,2006]:

• Missing Annotation: a conflict happens when, during a given AnnotationAction, the performer (an annotation component or a human) is not able to find a suitable Annotation for the analysed content.

• Uncertain Annotation: a conflict happens when, during a given AnnotationAction, the performer creates an Annotation having an AnnotationConfidence within a given interval of confidence, thus leaving uncertainty about the truthfulness of such a value. Referring to the running example of Figure 1-16, an annotation component for face recognition may report the presence of a face with a 0.6 confidence value (the FaceAnnotation1 object in orange); as the confidence is not low enough to suggest a false recognition, this annotation can be classified as uncertain. Uncertainty might also arise when the AnnotationAction has been performed within a given interval of AnnotationActionQuality, thus raising doubts about the actual quality of the annotation activity. For instance, this scenario may occur when the AnnotationAction, performed by a CUbRIKPerformer, has been marked (automatically by the system, or manually by another user) as poorly executed.

• Inconsistent Annotations: a conflict may happen when merging the Annotations of two different annotators for the same ContentObject. For example, some Annotations could be associated with a high AnnotationConfidence and yet lead to a wrong conclusion when they are put together, or they may contradict a fact that the system does not know yet. Referring to the running example of Figure 1-16, a conflict arises when two AnnotationComponents for face identification classify differently the bounding box of appearance of a face in the same Image (the FaceAnnotation1 and FaceAnnotation2 objects in orange).

According to the definition above, a Conflict is therefore characterized by an ID (identifier), by a (set of) ConflictualContentObjects, and by a (possibly empty) set of related ConflictualAnnotations, i.e., the set of Annotations that generated the conflict. In the running example of Figure 1-16, the ConflictFaceAnnotationObject1 is related to the conflictual object ImageObject1 and to the conflictual annotations FaceAnnotation1 and FaceAnnotation2, both produced by analysing ImageObject1.
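As a purely illustrative sketch (the confidence interval and the dictionary layout below are assumptions, not values prescribed by the model), the three conflict scenarios can be detected over a set of annotations as follows:

# Illustrative conflict detection; LOW/HIGH bounds are assumed, not CUbRIK values.
LOW, HIGH = 0.3, 0.8    # assumed "uncertain" confidence interval

def detect_conflicts(annotations):
    """annotations: list of dicts like {'label': str, 'confidence': float}."""
    conflicts = []
    if not annotations:                                  # Missing Annotation
        conflicts.append(("MissingAnnotation", None))
    for a in annotations:                                # Uncertain Annotation
        if LOW <= a["confidence"] < HIGH:
            conflicts.append(("UncertainAnnotation", a["label"]))
    if len({a["label"] for a in annotations}) > 1:       # Inconsistent Annotations
        conflicts.append(("InconsistentAnnotations",
                          tuple(a["label"] for a in annotations)))
    return conflicts

# FaceAnnotation1 (confidence 0.6) vs. FaceAnnotation2 (confidence 0.9):
print(detect_conflicts([{"label": "FaceAnnotation1", "confidence": 0.6},
                        {"label": "FaceAnnotation2", "confidence": 0.9}]))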

When a Conflict occurs, the CUbRIK system reacts by instantiating the set of human-enacted activities that may lead to the resolution of such a Conflict. These activities are defined in a ConflictResolutionTask, i.e. a MacroTask that consists of a set of human-enacted atomic Tasks. MacroTasks and Tasks are instances of human computation activity archetypes defined in the HUMAN COMPUTATION TASK METAMODEL, the part of the Conflict Resolution Model that expresses the types of the tasks that can be found in CUbRIK (see Figure 1-15).

A TaskType is the abstraction of a piece of work that needs to be completed by a specific number of workers in a given period of time. A TaskType is described by an ID (identifier), a Name, and an Example (a textual description of the activities associated with the type of task). A TaskType typically details some constraints on the execution of the type of task, for instance the MinDuration and MaxDuration allowed for the overall task execution, the MaxCost allocated for the task, etc. TaskType specialises into MicroTaskType and MacroTaskType.

A MicroTaskType represents a unit of human computation activity performed by a given CUbRIKPerformer; in the human computation literature, micro tasks are defined as highly fractioned tasks that do not require specialized skills and can be completed in a small amount of time. A MicroTaskType is characterized by a MicroTaskMetaType, i.e. a specific human computation activity type. Examples of MicroTaskMetaTypes are preference tasks and data manipulation tasks [Bozzon,2012]. The former correspond to typical social interactions (like, dislike, comment, tag), while the latter (create, order, complete, find, cluster) abstract simple and classical primitives of relational query languages that are common in human computation and social computation activities. In the running example of Figure 1-16 we provide an ObjectVerificationMicroTask type, a type of micro task that requires users to tag an object with a true/false value to verify the correctness of a given annotation.

A MacroTaskType represents an aggregation of one or more micro tasks, organized in a workflow in order to achieve a high-level goal. An example of MacroTaskType is the classical human computation Improve/Edit/Fix pattern, where performers iteratively work to improve the description of a given ContentObject. Within a MacroTaskType aggregation, MicroTaskTypes present precedence relationships that define the order of the micro tasks. The Improve/Edit/Fix pattern, for instance, can be instantiated by pipelining three Complete micro tasks. In the running example of Figure 1-16, a FaceAnnotationVerificationMacroTask type is presented; the macro task type requires the execution of a single micro task (an ObjectVerificationMicroTask), but it also requires a minimum of two performers for its execution and a maximum duration of 100 seconds. The macro task FaceConflictMacroTask1 of Figure 1-16 is related to the same object as the conflict ConflictFaceAnnotationObject1 that generated it, and it instantiates the FaceAnnotationVerificationMacroTask type in order to try to resolve the uncertainty produced by FaceAnnotation1 and FaceAnnotation2.

A MacroTask is typically executed on one or more ConflictResolutionPlatforms (e.g. social networks or human computation frameworks) by a CUbRIKApp component, that is, a CUbRIK conflict resolution application. The selection of the involved ConflictResolutionPlatforms is done through the application of a given PlatformSelectionStrategy, i.e. a numerical, logical, or heuristic method that decides which are the best platforms to adopt for the solution of a Conflict; for instance, if the conflict resolution task involves the analysis of fashion pictures, then CUbRIK may decide that the best platform to tap for human computation is Facebook, rather than Twitter or LinkedIn.
Likewise, a UserSelectionStrategy is a method that decides, for the selected ConflictResolutionPlatforms, which are the best performers to involve in order to satisfy the constraints defined in the MacroTaskType definition. Finally, the ConflictResolutionStrategy decides how to split the execution of Tasks among the selected performers, determining, for instance, which conflictual Annotations or ContentObjects will be assigned to each performer; moreover, the ConflictResolutionStrategy dictates the result aggregation method (e.g. Majority Vote) to use for creating the final output of a MacroTask, which typically consists of one or more new Annotation objects.
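For instance, a majority-vote ConflictResolutionStrategy could aggregate performer verdicts as in the following sketch (the function name and data shapes are illustrative assumptions, not part of the model):

# Illustrative majority-vote aggregation of performer verdicts.
from collections import Counter

def majority_vote(verdicts):
    """verdicts: list of (performer_id, value) pairs; returns the value
    supported by most performers and its share of the votes."""
    counts = Counter(value for _, value in verdicts)
    value, support = counts.most_common(1)[0]
    return value, support / len(verdicts)

# Two performers both end up supporting FaceAnnotation2:
verdicts = [("CUbRIKPerformer1", "FaceAnnotation2"),
            ("CUbRIKPerformer2", "FaceAnnotation2")]
print(majority_vote(verdicts))    # ('FaceAnnotation2', 1.0)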

The decisions undertaken by the PlatformSelectionStrategy, UserSelectionStrategy, and ConflictResolutionStrategy are directly mapped onto the Tasks that compose the given MacroTask, as each Task is executed on a ConflictResolutionPlatform, by a (possibly singleton) set of CUbRIKPerformers, operating on a (possibly overlapping) subset of the ContentObjects and Annotations assigned to the related MacroTask. For instance, Figure 1-16 depicts a deployment scenario where the FaceAnnotationVerificationMacroTask is associated with two tasks, FaceVerificationTask1 and FaceVerificationTask2. Each task is assigned to a distinct performer, respectively CUbRIKPerformer1 and CUbRIKPerformer2, assuming that they were selected from a pool of available performers; we also assume that a majority vote conflict resolution strategy is adopted, so that the annotations produced by each FaceVerificationTask will be aggregated into a single, new Annotation object.

It is worth noticing that a Task is a specific type of ManualAnnotationActivity, from which it derives several properties, such as the execution StartTime and EndTime, and the Quality; moreover, being a ManualAnnotationActivity, a Task typically results in the creation of one or more new Annotations related to the considered ContentObjects. In addition, a Task relates to a (possibly overlapping) subset of the ConflictualObjects and ConflictualAnnotations of the resolved conflict; the allocation of the ContentObjects and Annotations to a given Task is performed according to the ConflictResolutionStrategy of the macro task. In the example of Figure 1-16, the FaceVerificationTask1 and FaceVerificationTask2 tasks are both assigned to the conflictual ImageObject1, but each task is devoted to the verification of a single annotation, respectively FaceAnnotation1 and FaceAnnotation2.

As a Task can be assigned to several CUbRIKPerformers, each execution is associated with a TaskExecution, a class that contains information about the StartTime, EndTime, and QualityMetrics of the work performed by the single CUbRIKPerformer, plus references to the Annotations created during the specific execution. In the example of Figure 1-16, the FaceVerificationTask1 and FaceVerificationTask2 tasks are respectively assigned to CUbRIKPerformer1 and CUbRIKPerformer2, who performed two task executions, respectively FaceVerificationTaskExecution1 and FaceVerificationTaskExecution2. The former execution produced a new textual annotation FaceVerificationAnnotation2 (highlighted in blue in the picture) that denies the truthfulness of FaceAnnotation1; the latter execution, instead, produced a new textual annotation FaceVerificationAnnotation1 (highlighted in blue in the picture) that confirms FaceAnnotation2.

Please notice that an Annotation (or a set thereof) created during a MacroTask can be, by definition, conflictual, and thus be the source of a new Conflict within the platform. The decision about the right course of action to undertake is typically related to the selected ConflictResolutionStrategy. In our example, the application of a majority vote policy leads to the creation of a new AggregateFaceAnnotation (highlighted in green in the picture) that borrows the face bounding box of FaceAnnotation2, but associates it with a confidence value of 1.0, so as to reflect the fact that the annotation has been confirmed by a human verifier.
Also notice how the model, through the FaceVerificationAggregate object, allows tracing the provenance of an annotation (in our case, the AggregateFaceAnnotation) by keeping references to the annotations and actions that led to its creation.

Figure 1-15 Conflict Resolution Model

Figure 1-16 An example instantiation of the Conflict Resolution Model

1.8 Content Processing Model

CUbRIK aims at providing a platform for multimedia content processing and management. One of the most important processes in the platform is the one devoted to the analysis of multimedia artefacts in order to enable their retrieval through text- or content-based search engines. In this section we describe the main concepts related to the content analysis process; Figure 1-17 provides the Content Processing Model. The content processing model is meant to be a lightweight, technology-independent description of the content processes that take place in a CUbRIK platform instantiation. More specific descriptions, e.g., the details of the orchestration of components that carry out a given content annotation process, are outside the scope of the CUbRIK data models and can be represented with a full-fledged orchestration language, e.g., BPEL.

As described in Section 1.2, ContentObjects are described by means of (sets of) Annotations, created by AnnotationActions, i.e. actions performed by a CUbRIK actor to change the current state of the platform by modifying the description of some content artefact. An AnnotationAction can be performed manually by some Users, or automatically. In the former case, we talk about a ManualAnnotationAction, performed at content provisioning time (see Section 1.3), or during a conflict resolution (see Section 1.7) or a social/gaming interaction (see Section 1.4). AutomaticAnnotationActions, instead, are executed by platform AnnotationComponents, i.e., software components devoted to the analysis of multimedia artefacts with the purpose of extracting some compact representation useful for search and retrieval. Examples of AnnotationComponents are SpeechToText transcribers, FaceRecognition or FaceIdentification software, etc.

The content analysis process in CUbRIK is performed using SMILA, a framework for the management of unstructured information. In SMILA, contents are analysed by executing Pipelines; a Pipeline is a logical container for analysis components that serves the purpose of orchestrating the execution of AnnotationComponents in order to create a ContentObject description. In CUbRIK, Pipelines can also contain CUbRIKApps, conflict resolution applications that allow human interventions in the content analysis process (see the example of Figure 1-16 and the associated description).

Pipelines, AnnotationComponents, and CUbRIKApps qualify as SystemComponents, i.e., software modules of the CUbRIK platform. A SystemComponent is described by a unique identifier ID, a Name, and a set of QualityIndicators, i.e., quantitative attributes that define a specific quality property of the component; each ComponentQuality property is characterized by a Scheme (that indicates the nature of the property, e.g. Precision, Reliability, or others) and a Value that represents the numerical value of the indicator. SystemComponents are also associated with a set of ExecutionPermissions, i.e., rules defined by a CUbRIKContentProvider in order to specify the set of Actions (including AnnotationActions) that can be performed on a given ContentObject. Finally, it is worth noticing that a SystemComponent may have the ability to create new ContentObjects, thus assuming the role of a CUbRIKContentProvider and, hence, defining the permissions of usage for such new artefacts.
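The following sketch illustrates the orchestration idea in plain Python; the interface is an assumption made for illustration only, since actual CUbRIK pipelines are executed by SMILA.

# Illustrative Pipeline/AnnotationComponent interplay; the interface is an
# assumption for illustration, not the SMILA API.
class AnnotationComponent:
    def __init__(self, name, quality_indicators):
        self.name = name
        self.quality_indicators = quality_indicators   # e.g. {"Precision": 0.8}

    def annotate(self, content_object):
        # a real component would run, e.g., speech-to-text or face detection
        return {"producer": self.name, "target": content_object,
                "annotation": "..."}

class Pipeline:
    """Logical container orchestrating AnnotationComponents in sequence."""
    def __init__(self, components):
        self.components = components

    def run(self, content_object):
        return [c.annotate(content_object) for c in self.components]

pipeline = Pipeline([AnnotationComponent("SpeechToText", {"Precision": 0.8}),
                     AnnotationComponent("FaceRecognition", {"Precision": 0.7})])
print(pipeline.run("video-0001.mp4"))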

Figure 1-17 Content Processing Model

2. Survey of Existing Representation Formats for Multimedia Contents

In this section, a survey is provided of existing metadata representation formats that can be used to complement and specialize the conceptual models defined in the previous section. The relevant aspects of each format are highlighted and gaps with respect to CUbRIK requirements are pointed out. These formats contributed to the motivation for the CUbRIK Platform Model. Further, they can also be used as ContentObjects within the platform (cf. MPEG7Objects and RuCoDObjects mentioned above). The next subsections relate to the following models in the overall CUbRIK Platform Model discussed in Section 1: Content and Content Annotation Model, Provenance and Rights Model, User and Social Model, Gaming Model and Conflict Resolution Model.

2.1 Rich Multimedia Representation Formats

Rich representation formats for multimedia are formats that make it possible to describe multimedia content in more than one way. Rich representation is covered in the CUbRIK Platform Model by the Content and Content Annotation Model. An example requiring a rich representation format is rich audio transcripts, which describe an audio signal as spoken sentences, spoken words, different speakers talking to each other, and ambient noise such as door slams and applause. The scope and variety of the information encoded in rich representation formats makes it possible to use the representation within systems that are designed to fulfil a wide variety of requirements, such as those arising within real-world use scenarios. In this section, we select three aspects of rich multimedia representation that can be considered particularly important for multimedia search systems in general and for CUbRIK pipelines and applications in particular. Their importance is related to the increasing frequency with which they occur as multimedia collections become larger and more diverse.

1. Descriptions of multimedia content can take place on different levels of abstraction (e.g., low versus high level), but also on different realizations or versions. Different levels of abstraction are particularly important within CUbRIK, since CUbRIK combines both human and machine computation. Machine computation conventionally involves features at a lower level of abstraction, while human computation involves features at a higher level of abstraction. These topics are discussed in Section 2.1.1.

2. We can describe content as a whole, but also in parts. For the latter we need ways of referencing parts of the multimedia item. Referencing parts of multimedia items is important within CUbRIK in order to be able to implement access to time-continuous media such as video and audio with a full range of granularity. Time granularity can be of particular importance when implementing time-aware multimedia search applications. An overview of referencing (possibly multiple) fragments of an item in the temporal and spatial domain is given in Section 2.1.2.

3. Multimedia content items stand in relationships to each other. Examples are duplicate relationships or collection relationships. These aspects are particularly important in CUbRIK, since human computation opens up the possibility of defining new types of relationships not yet in wide use in metadata standards (e.g., various forms of semantic near duplicates). Multimedia relationships are treated further in Section 2.1.3.

We end this section (Section 2.1.4) with a key case: RUCoD, a rich multimedia representation format borrowing ideas from MPEG-7 and extending it.

2.1.1 Descriptions on different levels of abstraction and different realizations

We can describe multimedia based on the medium on which it is stored, based on the physical features (e.g., wave length) the content consists of and how they are perceived (e.g., color), and we can transcribe the content (e.g., a speech transcript). These lower-level descriptions represent largely independent views of the data. Multimedia objects can also be described architecturally, by relating lower-level descriptions (e.g., relationships between segments of a multimedia item). Finally, descriptions can be of an annotative nature (e.g., comments produced by humans), attached to instances of the description classes just mentioned [Nack1999]. This enumeration demonstrates the different levels of abstraction in existence. The need for supporting both high-level as well as low-level descriptions has been discussed before, e.g. in [Saathoff2010].

As discussed in Section 1.2, "Content and Content Description Model", the CUbRIK Platform Model is inspired by RUCoD, which encodes both lower-level and higher-level descriptors, as shown in Figure 2-4 ("The RUCoD General format"). RUCoD in turn builds on MPEG-7, which constitutes the classic case of a metadata standard that undertakes to encode metadata representations on multiple levels. Here, we focus on discussing the MPEG-7 standard and how it combines encoding of both high-level and low-level representations.

The developers of MPEG-7 realized it was necessary to reconcile approaches from different communities if they were to deliver a standard that addressed interoperability, globalization of metadata resources, and data management flexibility [Ossenbruggen2004]. For example, the signal processing community's main interest lay in the standardization of low-level content features and feature-detection algorithms, while other communities, e.g., the digital library community, stressed the need for high-level descriptions of audio-visual content. In MPEG-7, the strata of descriptions are made possible through Descriptors (D) and Description Schemes (DS). Descriptors deal with the representation of features and are combined using Description Schemes to build richer descriptions. In turn, Description Schemes themselves can be building blocks for other Description Schemes, as shown in the figure below.

Figure 2-1 Relations between Descriptors and Description Schemes [Nack1999b]

In order to achieve flexibility, the MPEG-7 standard allows the creation of new Descriptors and Description Schemes that are not covered by the standard, through the use of the Description Definition Language (DDL). This enables MPEG-7 to cover unforeseen needs.

With the DDL, we can achieve a wider spectrum of descriptions by expanding the world of Descriptors and Description Schemes horizontally, but we can also build on top of existing schemes and expand vertically. The danger, however, is that this flexibility might harm compatibility. A proliferation of new Descriptors and Description Schemes will make it harder for computers to understand MPEG-7 metadata. Computers may not know the semantics of newly introduced Descriptors, and an unaddressed need might give rise to several competing but incompatible Descriptors [Nack1999b].

2.1.2 Temporal and spatial linking and aggregation

For rich temporal and spatial multimedia search, it is necessary to be able to refer to parts of multimedia content. This section describes work on this topic. The CUbRIK platform model handles temporal and spatial linking via two classes of MediaSegmentDescription types, TemporalSegmentDescriptions and SpatialSegmentDescriptions. A wide variety of metadata standards are designed to encode temporal and spatial decomposition for the purpose of linking or describing media fragments individually. Here, we give a brief overview of standards with functionality similar to that provided by the CUbRIK platform model, putting an emphasis on the representation of temporal decompositions.

Media Fragments is a W3C candidate recommendation for referring to fragments of multimedia files [MF]. It supports fragments in the temporal as well as in the spatial domain, tracks (e.g., the English audio track in a movie), and ids (i.e., named temporal fragments), and uses URI fragments to accomplish this. An example is listed below.

http://host/video.mp4#t=10,20

This example refers to a fragment of the video that starts at the 10th second and ends at second 20. Media Fragments is already supported in WebKit [WebKit], a major browser engine used by, e.g., Safari and Chrome.

Annodex is a format for annotating and indexing continuous multimedia files that allows for hyperlinks to and from the media [Annodex]. Annodex hyperlinks can point to and from arbitrary points or intervals in a similar way as Media Fragments, using URIs. Annodex only supports fragments in the temporal domain, but is more expressive in this respect than Media Fragments: it supports concatenation of several intervals.

MPEG-21: Part 17 of the MPEG-21 standard specifies a URI fragment syntax for identifying parts of an MPEG resource [W3CMPEG]. The syntax is rich and complex, but, unlike Media Fragments and Annodex, it only applies to a select few MPEG MIME types. A complex example, denoting a moving region between 10 and 30 seconds, is shown below.

http://host/video.mp4#mp(/~time('npt','10','30')/~movingregion(rect(0,0,5,5),pt(10,10,t(5)),pt(20,20)))

The previously listed approaches identify fragments through URIs. An alternative approach is to use XML. MPEG-7's TemporalSegmentLocator can describe the location of temporal multimedia [W3CMPEG]. MPEG-7 also provides the TemporalDecomposition and SpatialDecomposition Description Schemes that can be used for the purpose of fragment identification.
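A minimal sketch of extracting the temporal dimension from a Media Fragments URI is shown below; it handles only the plain t=start,end form of the example above, not the full specification.

# Minimal sketch: extract (start, end) seconds from a '#t=10,20' fragment.
from urllib.parse import urlparse, parse_qs

def temporal_fragment(uri):
    """Return (start, end) in seconds, or None if no temporal fragment."""
    fragment = urlparse(uri).fragment        # e.g. 't=10,20'
    values = parse_qs(fragment).get('t')
    if not values:
        return None
    start, _, end = values[0].partition(',')
    return (float(start) if start else 0.0,
            float(end) if end else None)

print(temporal_fragment('http://host/video.mp4#t=10,20'))   # (10.0, 20.0)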

Figure 2-2 Relationships between packages in the MXF format; in this way, MXF aligns multimedia content along a timeline (Figure from [MXF])

The Material eXchange Format (MXF) is a wrapper that encodes metadata for multimedia streams. It is an SMPTE (Society of Motion Picture and Television Engineers) standard and is used in the broadcast industry. MXF can carry time code data or any other supplemental data. MXF uses a set of "Material packages" to represent the desired output drawn from a series of "File packages" representing the source material (Figure 2-2). Structural Metadata (SM) is a key component of MXF and makes it possible to relate packages along a timeline. In practice, implementations of MXF have encountered interoperability issues. Because of the high complexity of the information that MXF represents, it is difficult to establish consistency across the different varieties of MXF used in different operational settings [MXF]. It can be anticipated that, in the future, wider spread of the standard will help to overcome these issues. A particular strength of MXF is that it can represent "Real Time" data, e.g., insert information about the parameters of the camera along the video stream. Such information is important for 3D applications that need information about associated depth maps.

2.1.3 Relationships between items

In addition to different levels of descriptions, we can also consider different relationships between items. The CUbRIK platform model represents relationships between items in the Content Model, which allows relationships to be defined between ContentObjects. Here, we discuss the issues involved in representing relationships between items by making reference to the Multimedia Metadata Ontology (M3O) [Saathoff2010]. M3O is a generic modelling framework for annotating rich, structured multimedia presentations. Although the specific emphasis is on representing multimedia for the purpose of creating presentations, the considerations taken into account also apply to multimedia representation in other contexts. One important type of relationship exists between different realizations of multimedia objects, something that is generally not addressed by existing models [Saathoff2010]. Examples of information objects are songs, stories and narrative structures. Examples of different realizations of multimedia content include an image being available in different resolutions and a song being available in different bitrates. Semantically, the different realizations are the same and they may only differ in low-level features. A rich metadata format would be able to distinguish between the concept of a multimedia object and its realizations, such that descriptions can be applied to each. In M3O, information objects are represented separately from information realizations in the "Information Realization Pattern", meaning that this difference is encoded in the metadata at a fundamental level (see Figure 2-3, Part b).

Figure 2-3 Examples from the M3O Multimedia Metadata Ontology illustrating the importance of situation and of the separation between multimedia objects and their realizations. Both issues are important for representing relationships among multimedia content items (Figure excerpted from [Saathoff2010]).

Different realizations are considered duplicates of each other. However, near duplicates and other semantically related items are also important to represent within multimedia information systems. Annotations can be added that help to establish which items are semantically related. For example, two items that have been assigned the same semantic class label can be considered similar (for example "History of the Olympics"). However, not all relationships hold in all situations. In this case, it is important to also represent information about the situation in which an item has been labelled. In M3O, the situation is encoded with the "Description and Situation" pattern. In this way, it is possible to make distinctions between the recent history of the Olympic games and the history of the origins of the Olympic games.
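The separation between an information object and its realizations can be sketched as follows; the class names are hypothetical stand-ins for the M3O pattern, not its actual vocabulary.

# Hypothetical sketch of the Information Realization pattern: one abstract
# information object, several realizations differing only in low-level features.
from dataclasses import dataclass, field
from typing import List

@dataclass
class InformationRealization:        # one concrete encoding of the work
    uri: str
    bitrate_kbps: int

@dataclass
class InformationObject:             # the song/story as an abstract work
    title: str
    realizations: List[InformationRealization] = field(default_factory=list)

song = InformationObject("Example Song")
song.realizations += [InformationRealization("http://host/song-128.mp3", 128),
                      InformationRealization("http://host/song-320.mp3", 320)]
# Both realizations are semantically the same object, hence duplicates:
print(len(song.realizations), "realizations of", song.title)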

2.1.4 Key Case: The Rich Unified Content Description (RUCoD) Format

In Section 1.2.1 the ContentObject has been defined as an abstract entity representing a piece of information, which can have different types, such as Audio-VisualObject, XMLObject, TextualObject and HTMLObject. A format that could be used for content metadata representation is the Rich Unified Content Description (RUCoD) format. RUCoD [Daras,2012] has been introduced in the context of the FP7-funded project I-SEARCH7 as a description format for Content Objects, where a Content Object within I-SEARCH is a formal representation of rich media content. The RUCoD specification has borrowed several elements from the MPEG-7 standard, especially those related to the description of multimedia items. Although RUCoD was initially designed to fit the specific needs of I-SEARCH, it can easily be extended in order to meet the requirements of the CUbRIK content model. An overview of RUCoD's basic features is given below. The general form of the RUCoD structure is given in Figure 2-4.

7 http://www.isearch-project.eu/isearch/
Figure 2-4 The RUCoD General format

The RUCoD descriptions of Content Objects can be expressed as valid XML files, defined using appropriate XML schema documents. An example of a RUCoD description file, in XML, is given in Figure 2-5. In the example, the RUCoD description corresponds to the Content Object entitled "My Barking Bulldog", and it consists of the following main parts:

• Header: includes general information about the ContentObject, such as the type, name, ID and creation information. Moreover, the RUCoD Header encloses some general information about the different media (3D, images, sounds, videos, text) and accompanying information (real-world data, user-related cues) that constitute the ContentObject.

• Description: the core part of the RUCoD, including detailed information about the corresponding media and contextual information (real-world, user-related). It consists of a) the L-Descriptors part, where the low-level descriptors, extracted from each separate medium (3D, images, sounds, videos, text), are presented; b) the R-Descriptors part, which maintains descriptors extracted from real-world sensors, representing time, weather, location, etc.; and c) the U-Descriptors part, where descriptors related to user behaviour (emotions, expressions) are stored.

The interrelations among the several parts of the RUCoD are given in the conceptual model of Figure 2-6. At the centre of the model is the ContentObject, which has a one-to-one relation with the RUCoD. A CO has one-to-many relations with the constituting media items, while it has a one-to-one relation with the user-related information and the real-world information. All the above media and accompanying information are connected with their corresponding descriptors in a one-to-one relation. Finally, each media item may produce one or more artefacts that can be used for descriptor extraction. Artefacts (or derived objects) are usually items that are (automatically) extracted from media in order to assist both descriptor extraction and visualisation. As an example, the multiple views of a 3D object or the multiple key-frames of a video are artefacts that will be used for low-level descriptor extraction. Similarly, a low-resolution image, a snapshot of a 3D object and a representative key-frame are artefacts that can be used for a quick visualisation of images, 3D objects and videos, respectively.

Figure 2-5 The overall structure of a RUCoD file

Figure 2-6 Conceptual Model presenting CO and relations with constituting elements.

The RUCoD Header provides information about the ContentObject, the different media associated with it, real-world information, and user-related information. The L-Descriptors part, instead, includes information about LowLevelAnnotations, clustered according to the content type, and TextualAnnotations. Examples of LowLevelAnnotations are: Shape3DDescription, which encloses descriptors of a specific 3D object; ImageDescription, which encloses descriptors of a specific image (e.g. Edge Orientation Histogram (Eoh_32), probability weighted histogram (Probrgb_6_2), Laplacian weighted histogram (Laplrgb_6), HSV standard histogram (Histohsv_std) or SIFT local descriptors (sift_desc)); VideoDescription, which describes a set of visual objects present in several key-frames of the video and, for each object, a list of information on the key-frame images containing this visual object (time code of the image, position of the visual object inside the image); and AudioDescription, which defines a set of parameters for the audio signal, such as its fingerprint (for computing audio similarity), its rhythmic pattern and its melodic profile. Examples of TextualAnnotations are TextDescription, RelatedSemanticConcepts (concepts identified by a URI like http://dbpedia.org/resource/Bulldog), Language, NamedEntities, Sentiment, etc.

RUCoD appears to be the ideal content metadata representation format for CUbRIK. However, in order to fully comply with the Content and Content Description model of Section 1.2.1, several improvements should be introduced. For instance, in RUCoD the definition of relationships among ContentObjects is only supported through containment in the XML file. Such a limitation does not allow, for instance, the representation of relationships such as DuplicateOf, which are of paramount importance for an open platform such as CUbRIK. Moreover, RUCoD does not support the description of (temporal/spatial/logical) segments, thus hindering the ability to represent the outcome of time- or spatial-aware content analysis components.
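To give a concrete flavour of the structure described above, the following sketch emits a minimal RUCoD-like skeleton; the element names follow the textual description of the Header and descriptor parts, not the official XML schema.

# Minimal RUCoD-like skeleton; element names follow the description above,
# not the official RUCoD XML schema.
import xml.etree.ElementTree as ET

rucod = ET.Element("RUCoD")
header = ET.SubElement(rucod, "Header")
ET.SubElement(header, "ContentObjectType").text = "Audio-VisualObject"
ET.SubElement(header, "ContentObjectName").text = "My Barking Bulldog"
ET.SubElement(header, "ContentObjectID").text = "co-0001"

description = ET.SubElement(rucod, "Description")
ET.SubElement(description, "L-Descriptors")   # low-level media descriptors
ET.SubElement(description, "R-Descriptors")   # real-world sensor descriptors
ET.SubElement(description, "U-Descriptors")   # user-related descriptors

print(ET.tostring(rucod, encoding="unicode"))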

2.2 Content License / Copyright Representation Formats

The main purpose of the use of copyright and license metadata within CUbRIK is to aggregate and track information about copyright and licensing, and about where content originates. The goal is to use copyright and license information and other data to inform users, and to derive permissions that control how content needs to be handled and managed within the platform, and how content may be presented and delivered to platform users. This can include controlling access to content (both by platform users and by platform components), but it does not include the enforcement of usage rules outside of the platform / on the user side, such as with so-called Digital Rights Management (DRM) systems. In that sense, there are several standards and formats that can be considered interesting for CUbRIK.

Probably the most relevant license scheme for CUbRIK is Creative Commons / CC (http://creativecommons.org), which is commonly used for so-called "some rights reserved" licensing of content, including A/V content. All CC licenses grant access rights for non-commercial use (i.e. they do not need access control) and require attribution of the rights holder. However, depending on the chosen CC license, other rights such as commercial use, derivation/modification etc. may be allowed or not (i.e. they need to be negotiated separately). In addition, CC licenses can also express a Public Domain "no rights reserved" license (http://wiki.creativecommons.org/Public_domain), which does not include any restrictions, but is only valid in certain jurisdictions (such as the US). CC licenses are easy to apply and, even more importantly, easy for users to understand, and are commonly used. CC is complemented by the so-called ccREL (http://wiki.creativecommons.org/CcREL), a specification describing how CC licenses may be described using RDF and how they can be attached to or embedded into different file types.

While the philosophy of CC licenses is similar to that of the GNU General Public License in the software domain, CC licenses are not all copyleft, in contrast to the ArtLibre license (http://artlibre.org/licence/lal/en), a copyleft license including the right to modify creative works. As a result, and similarly to the domain of Open Source Software, some licenses are not compatible with each other, which can create problems especially for remixing; this may be the case even if all the licenses involved are copyleft.

With respect to a pragmatic, automatic copyright-aware processing of content, things get of course much more difficult for "all rights reserved" content. Several Rights Expression Languages (REL), such as ODRL (http://odrl.net/) or XrML (http://www.xrml.org/), provide a rich repertoire for expression, but tend to be much more complex to use and interpret, and are less commonly used. As an alternative, so-called permission protocols could be applied. They are based on the idea that permissions can be communicated directly from the rights holder to a system (such as the CUbRIK platform) by providing information about what may or may not be done with a specific content item. The Automated Content Access Protocol (ACAP) (http://www.theacap.org), an extension / successor of the Robots Exclusion Protocol (REP) (http://www.robotstxt.org/), represents such an approach. However, both ACAP and REP have focused on textual data and, at least until now, cannot handle A/V content.

Finally, relevant information for license and copyright purposes can of course be extracted from generic metadata formats such as Dublin Core, RSS/Media-RSS, AudioMD, VideoMD, METS, P/META, MPEG-7, and via tagging formats, including MP4 iTunes Tagging, MP3 ID3, MKV Tags, Vorbis Comments, EXIF, BWF, XMP and many others. For instance, the Dublin Core Metadata Element Set, or DCMES8, provides a vocabulary of fifteen properties (such as "contributor") for describing entities.
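As an illustration of deriving platform permissions from license information, the sketch below maps CC license identifiers to a few permission flags; the flag set and the conservative default are assumptions, not a CUbRIK policy.

# Illustrative mapping from CC license ids to derived permissions; the flag
# set and the conservative default for unknown licenses are assumptions.
CC_PERMISSIONS = {
    # id:          (commercial use, derivatives allowed, share-alike required)
    "CC-BY":       (True,  True,  False),
    "CC-BY-SA":    (True,  True,  True),
    "CC-BY-NC":    (False, True,  False),
    "CC-BY-NC-SA": (False, True,  True),
    "CC-BY-ND":    (True,  False, False),
    "CC-BY-NC-ND": (False, False, False),
}

def may_derive(license_id):
    """May the platform create derivative works? Unknown licenses are
    treated conservatively as 'all rights reserved'."""
    _, derivatives, _ = CC_PERMISSIONS.get(license_id, (False, False, False))
    return derivatives

print(may_derive("CC-BY-SA"), may_derive("CC-BY-ND"))   # True False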

2.3 Social and Users Representation Formats

While there has always been an interest in the literature in modelling users for various application areas, with the advent of social networks and user-generated content on the Web many researchers have revisited and enriched user models with social aspects. We summarize a number of user modelling approaches, extending the survey presented in [Daras,2012], and commenting on the conformance of each model to the requirements of CUbRIK, as expressed in the User and Social Models of Section 1.4:

• Friend of a Friend (FOAF)9: FOAF is a project that aims to describe users, their links and activities within Web pages using RDF/XML. In Figure 2-7 we provide an example of a FOAF document. While FOAF can capture basic user-related information, it lacks the expressive power to represent social interactions (commenting, etc.), social cognition and meta information. It allows specifying some social relations, like group membership or "knows" relations to other FOAF profiles, but the relation is not typed; such a choice has been motivated by the empirical observation that relationship types vary from community to community and, thus, must be deduced from the surrounding environment.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:name>Leigh Dodds</foaf:name>
    <foaf:firstName>Leigh</foaf:firstName>
    <foaf:surname>Dodds</foaf:surname>
    <foaf:mbox_sha1sum>71b88e951cb5f07518d69e5bb49a45100fbc</foaf:mbox_sha1sum>
    <foaf:knows>
      <foaf:Person>
        <foaf:name>Dan Brickley</foaf:name>
        <foaf:mbox_sha1sum>241021fb9f92815fc210f9e9137262</foaf:mbox_sha1sum>
        <rdfs:seeAlso rdf:resource="http://rdfweb.org/people/danbri/foaf.rdf"/>
      </foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>

8 http://dublincore.org/documents/dces/
9 http://www.foaf-project.org

Figure 2-7 Example FOAF document with a "knows" relationship between two users10

User profile information is also only shallowly described in FOAF. For instance, concepts such as social activities, spoken language, events and participations, address, sexual orientation, relationship status, etc. are missing.

• OpenSocial API11: OpenSocial is a recent effort with the goal of providing a common set of APIs to access major social networking applications (such as Myspace, Orkut, Yahoo!, etc.). To achieve this goal, OpenSocial provides a very abstract and generic data model, so as to be able to fit information coming from several social networks. In OpenSocial, relationships between users are described using the XFN taxonomy (e.g. contact, acquaintance, parent, etc.) or personalized values.

• Semantically-Interlinked Online Communities (SIOC)12 is an RDF/OWL representation to link and describe online community sites (e.g., message boards, wikis, weblogs, etc.). SIOC is composed of three modules: Access, Services and Types. The Access module is used to describe users' permissions and the status of content Items. The Services module is used to indicate that a web service is associated with (located on) a sioc:Site, or is a part of it, without describing the web service in detail. The Types module is used for forum and post description. Depending on application needs, SIOC documents may use other existing ontologies to enrich the information described, such as Dublin Core metadata, FOAF, RSS 1.0, etc. For example, more information about the creator of a post can be described using FOAF, while the rich content of the post (e.g., its HTML representation) can be described using Atom OWL or RSS 1.0.

• General User Model Ontology (GUMO) [Heckmann, 2005]: a general and comprehensive approach to user modelling based on OWL. It captures user profiles (and associated metadata) to support the uniform interpretation of distributed user models in intelligent, semantic-web-enriched environments. The GUMO ontology is divided into four parts: (1) the GUMO-Basic ontology, which models user dimensions like personality traits and user characteristics; (2) GUMO-Context, which models the world directly around the user, including events as well as the environmental context; (3) the GUMO-Domain ontology, which stores domain-specific concepts and includes a general interest model in the user model; and (4) the GUMO-Extended ontology, which collects the ranges, rating dimensions and predicates that address other attributes. GUMO does not support the modelling of social relationships.

• User Role Model (URM) [Zhang,2006]: another ontology-based approach for modelling users and their access roles to support cross-system personalization. While this model covers the social relationships of users, it does not capture profile information.

• Grapple User Modelling Framework (GRAPPLE) [Aroyo,2010][GRAPPLE D2.1]: GRAPPLE is an EU 7th Framework Programme project; one of its goals is the "Definition of an ontology-based user model format", specially targeted at e-learning systems.

10 From FOAF Examples page http://wiki.foaf-project.org/w/UsingFoafKnows
11 http://www.opensocial.org
12 http://rdfs.org/sioc/spec/

In the Grapple User Modelling Framework (GUMF), a user profile consists of a set of statements, which are called Grapple Statements (or just statements). In a broad sense, Grapple statements can be considered like the statements humans formulate in their everyday life. GRAPPLE statements basically consist of a subject (usually the user), a predicate (some property of the subject), and an object (the value of the predicate). Each Grapple statement has a globally unique ID as well as some other additional properties that further describe the statement itself, as well as the circumstances in which the statement was made. The Grapple Core Ontology specifies the lingua franca for exchanging user profile information and user observations in the Grapple user-modelling infrastructure. It follows the approach of the General User Model Ontology (GUMO), as it is built upon the notion of reified subject-predicate-object statements. The GRAPPLE framework appears very suited to the representation of user profiles and, being based on an open RDF model, can be freely extended; however, the framework does not cover social relationships and social actions.

• The PIMO13 ontology can be used to express Personal Information Models of individuals. It is based on RDF and NRL (the NEPOMUK Representational Language) and can be extended so as to embed other ontologies. PIMO is kept sufficiently simple so that it can be applied in any relevant domain. Being based on RDF, it can easily be extended to include, e.g., the BibTeX entries of a user's scientific publications or his special topics/interests in the multimedia domain.

• Social Web User Model (SWUM) [Plumbaum, 2011]: a generic model that focuses on the Social Web domain; its goal is to support data sharing and aggregation of social network user models (e.g., Facebook, Twitter, LinkedIn, etc.) by aligning concepts, attributes (e.g., personal characteristics, interests and preferences, needs and goals, knowledge and background), user behaviours (e.g., previous search behaviour using the Google dashboard), location and time.

• TweetUM14: the Twitter-based User Modeling Framework developed by TU Delft. It allows application developers to obtain Twitter-based profiles in RDF format. It creates semantically enriched user profiles as inferred from users' tweets, retrieving all the topics that the user is interested in (e.g., basketball), entity information (e.g., the city where he was born), the top hash tags (e.g., keywords starting with # for academic conferences) or the top entities cloud (e.g., Facebook, Twitter, etc.).

• GeniUS15: another project from TU Delft that can be used to enrich the semantics of social data. Given, for example, a stream of messages from users' statuses, it can generate the topic and some information about the user.

The great amount of available user and social representation formats calls for an additional conceptualization effort aimed at providing a unified model. TheHiddenU project16 is an attempt in this direction, as it compared several models in terms of their coverage of different dimensions of user and social concepts, providing a UML description of a proposed reference model [Kapsammer,2011a] [Kapsammer,2011b]. Figure 2-8 (adapted from [Kapsammer,2011b]) shows that the models described above fail to cover some of the concepts that can be used for representing user profiles. It can be observed that two models, GUMO and OpenSocial, have a broader focus: GUMO supports most of the concepts except social relationships, whereas OpenSocial is weaker in social cognition concepts but provides a strong coverage of communication and social interaction concepts.

13 http://www.semanticdesktop.org/ontologies/pimo/
14 http://wis.ewi.tudelft.nl/tweetum/
15 http://www.wis.ewi.tudelft.nl/genius/
16 http://social-nexus.net/

Figure 2-8 Comparison of social networks (left) and user models (right) (from [Kapsammer,2011b])

TheHiddenU project proposes a reference model that can cover arbitrary resources and relations; Figure 2-9 provides the conceptual reference model, described in UML (adapted from [Kapsammer,2011a]), for major social networking applications like Facebook, LinkedIn and Google+. While a Resource can be specialized as a user or an object, a Relation can capture user-user or user-resource relationships. Furthermore, meta information representing privacy, provenance, quality and context can also be associated with a resource (or a relation, which is indeed the reason for defining relations as an extension of resources). [Kapsammer,2011a] and [Kapsammer,2011b] identify four subclasses of the proposed Relation class oriented to social network modelling. The first two are called structural, since they model a static property of the network:

• SocialRelation: from User to User. Friendship and similar relationships.

• SocialCognition: from User to Resource. Interests and tags for the User.

These relations are already covered by the User and Social model, in the form of the UserRelationship and UserTopicalAffinity classes. The other two are instead called behavioural, as they capture events published on a social network:

• SocialInteraction: from User to User, e.g., messages (public and private) and comments. Social interaction types are borrowed from their RDF definitions in the SIOC ontology. Most of them are subclasses of ns#Forum17 (e.g. Discussion, Channel, MailingList, Board, Blog), while the others are subclasses of ns#Post (e.g., Comment, InstantMessage, MailMessage, etc.).

• SocialActivity: from User to Resource (e.g. the user works at X, the user has checked in at Y). SocialActivity is subclassed by several real-world habits (e.g., SmokingHabits/DrinkingHabits) and activities (hobbies listed by the user, e.g. Game and Application, or Tool and Service).

17 http://rdfs.org/sioc/ns

Figure 2-9 Main packages and classes in the reference model of TheHiddenU project (from [Kapsammer,2011a]).

2.4 Gaming Representation Formats

Considering that social gaming platforms have been introduced only in the past few years, and that the phenomenon has gained significant importance only after the introduction of the (proprietary) Xbox Live! Achievement system, the information that can be gathered from the literature is scarce. The model that has been developed takes into consideration several existing platforms and gaming communities in order to obtain a description that is as general as possible while remaining useful for the rewarding and player retention purposes needed by the platform.

To build the description of Game, the source of information that has been taken into consideration was VideoGameGeek [VGG], one of the largest videogame databases in the world, which contains an exhaustive list of game genres, modes and themes. To define the useful characteristics that can describe a player, several social gaming platforms have been analysed, such as Microsoft's Xbox Live! [XboxLive], Sony Computer Entertainment's Playstation Network [PSN], Valve's Steam [Steam], Kongregate [Kongregate] and GamersBook [GamersBook]. From the analysis of these platforms, a common set of the most relevant features that can be used to drive the interest of players and their participation in a community has been extracted.

In order to describe a possible model for a rewarding system, the architecture of the aforementioned systems has been analysed, along with the information obtained through a set of research articles and patents. In Achievement Unlocked [Curley,2010], Vince Curley, one of the heads behind the creation of the Xbox's Gamer Profile, discusses the implementation of Achievements inside the Xbox Live! platform, while the U.S. Patent [Michal Bortnik,2011] describes systems and

methods for providing a game achievement system where players are rewarded with game achievements based on mastering certain in-game facets of the games they play; these would later turn into the Xbox Live! platform that is well known nowadays and from which all the other existing platforms took inspiration. In [Montoya,2009], Markus Montola et al. used a custom achievement system in order to enhance the user experience in a photo sharing service. Bunchball's Gamification 101 whitepaper [Bunchball,2010] describes techniques used in gamification that can be applied in the development of achievement systems to influence and motivate groups of people. The taxonomy used to categorize players is based on the work of Richard Bartle [Bartle,1996] on Multi User Dungeons.

2.5 Conflict Resolution Representation Formats and Conflict Management

A conflict can be defined as a "competitive or opposing action of incompatibles: antagonistic state or action (as of divergent ideas, interests, or persons)"18. Conflict management is an activity that requires the implementation of strategies to limit the negative aspects of conflict and to increase its positive aspects, at a level equal to or higher than where the conflict is taking place19. The problems arising from the presence of conflicts have been extensively studied in several fields. For instance, [Fitzgerald, 2008] explores the problem of requirements conflict resolution, while [Ouertani,2006] elaborates on conflict resolution in multi-actor interaction scenarios in the field of engineering and design. Conflicts are also typical in knowledge merging and resource competition scenarios, where they can be categorized as naming conflicts (e.g. synonyms and homonyms), scaling conflicts (e.g. a domain mismatch in the price unit), and confounding conflicts (concepts that are equated but are actually different) [MAIF,2007].

Conflicts are a big concern also in the Model-Driven Engineering field, where models can be developed in a distributed environment; once the models are merged, conflicts and inconsistencies should be detected in a convenient way so that they can be reconciled. In [Cicchetti,a] [Cicchetti,b], the authors suggested a metamodel for conflict detection, depicted in Figure 2-10, that is based on a difference metamodel. It contains left and right pattern definitions for the two input models; each pattern includes one or more difference elements. Patterns are described in the OCL language.

18 http://www.merriam-webster.com/dictionary/conflict
19 http://en.wikipedia.org/wiki/Conflict_management

Figure 2-10 Conflict Metamodel according to [Cicchetti,a]

In [Pordel,2009], the authors exploited the conflict metamodel of [Cicchetti,a] to develop an Eclipse plug-in for creating conflict models in the language of the conflict metamodel. In the context of multimedia processing and management, a conflict is a situation during the analysis of a given content object where the absence of, or contradictions among, annotations or descriptions may arise. In [Cheng-Yu,2006] the authors apply the notion of conflict to the merging of image annotations, and they assume that conflicts appear in the following phases of the content analysis process:

• Conversion phase: a conflict happens in the translation of a manually or automatically produced annotation description into the internal representation of the system.

• Merging phase: a conflict happens in merging the annotations of two different annotators (or of an annotator and a software agent) for the same image.

• Consistency of annotation: a conflict happens in the annotations after the merging process. The annotations could still contain potential contradictions produced by a single annotator or by the merging process (inference conflicts). For example, some descriptions could be right when examined individually, yet lead to a wrong conclusion when they are put together, or they may contradict a fact that the system might not know yet.

Although conflicts represent a very important concern in multimedia analysis application domains, no representation format has been explicitly devised, thus leaving space for a contribution in the field of domain-specific models and languages.

In CUbRIK, the conflict management activity is performed through CUbRIKApps components that exploit existing human computation and/or social network platforms. Human Computation [von Ahn, 2009] is a paradigm applied in business, entertainment and science, where the interaction among users is harnessed to help in the cooperative solution of tasks. According to [Quinn,2011], a system belongs to the area of Human Computation when



According to [Quinn,2011], a system belongs to the area of Human Computation when human collaboration is facilitated by the computer system and not by the initiative of the participants. In the context of content processing for multimedia search applications, the task is typically to automatically classify non-textual assets (audio, images, video) to enable information retrieval and similarity search, for example finding songs similar to a tune whistled by the user, or images with content resembling a given picture.

Crowdsourcing is a facilitator of human computation: it addresses the distributed assignment of work to a vast community of executors through a structured platform [Howe, 2006]. A typical crowdsourcing application has a Web interface that can be used by two kinds of people: work providers, who enter into the system the specification of a piece of work they need (e.g., collecting addresses of businesses, classifying products by category, geo-referencing location names); and work performers, who enrol, declare their skills, and take up and perform pieces of work. The application manages the work life cycle: performer assignment, time and price negotiation, result submission and verification, and payment. In some cases, the application is also able to split complex tasks into micro-tasks that can be assigned independently, e.g., breaking a complex form into sub-forms that can be filled in by different workers.

The approach of CUbRIK to the definition of conflict resolution applications aligns with the characteristics of crowdsourcing applications, but it can also be regarded as a generalization of systems for crowd-enabled data management such as CrowdDB [Franklin,2011], Qurk [Marcus,2011], and sCOOP [Parameswaran,2011], which transparently combine data and human sources. These systems integrate with crowdsourcing engines (specifically Mechanical Turk) and not with social networks; they advocate transparent optimization, while we advocate the conscious interaction of the query master; and they involve users only in data completion, while we also plan to involve users in other classical social responses, such as liking, ranking, and tagging.
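The work life cycle and the micro-task splitting described above can be illustrated with a small, self-contained Python sketch. The field names, the chunking rule, and the majority-vote verification are invented for the example and do not describe any specific crowdsourcing engine:

# Toy sketch of the crowdsourcing work life cycle: a work provider posts a
# task, the platform splits it into micro-tasks, performers submit results,
# and the platform verifies by simple agreement. All names and the
# verification rule are invented for illustration.

from collections import Counter

def split_task(form_fields, chunk_size=2):
    """Split a complex form into micro-tasks that independent workers can fill."""
    return [form_fields[i:i + chunk_size]
            for i in range(0, len(form_fields), chunk_size)]

def verify(answers):
    """Accept a micro-task result only if a majority of performers agree."""
    winner, votes = Counter(answers).most_common(1)[0]
    return winner if votes > len(answers) / 2 else None   # None = unresolved

micro_tasks = split_task(["name", "address", "city", "category"])
print(micro_tasks)   # [['name', 'address'], ['city', 'category']]

# Three performers answer the same micro-task; the majority wins.
print(verify(["restaurant", "restaurant", "bar"]))   # restaurant
print(verify(["restaurant", "bar"]))                 # None -> reassign or escalate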



3. TOWARD POTENTIAL FUTURE EXTENSIONS

The platform model set out in Section 1 describes the logical structure of the data represented in the CUbRIK platform. Although the model can be expected to provide the foundation for a large number of CUbRIK components, it is also anticipated that requirements will arise during the course of the project that will make it necessary to extend it. Such requirements will emerge from the application scenarios in CUbRIK, both from within the Domains of Practice and from within the application domains of the horizontal demos. Since the development of these requirements will follow an iterative cycle extending from M1 to M24, it is critical to take into account that the platform model may need to be extended in order to accommodate them. In this section, a series of dimensions is described that we have identified as potentially important for inclusion in the platform model, but that cannot be filled out without further input from the domains of application. The section concludes with an observation about what these dimensions have in common.

The platform model in Section 1 was established by making use of three sources of information:

1. Previous experience in the group with multimedia search systems;
2. Reflection on the logical completeness and consistency of the system; and
3. A minimal amount of input from the early stages of requirements formulation from the Domains of Practice.

During the process of establishing the model, a number of areas came up that were either difficult to fit into the model, or difficult to define due to a lack of information concerning the ultimate scenarios of use. These areas were discussed; many were found to be addressable in the model, or were considered too detailed to warrant inclusion at a conceptual level. However, there were also aspects that exist on a high conceptual level and were deemed potentially relevant within CUbRIK, depending on additional information that becomes available concerning the requirements for the CUbRIK Apps and horizontal demos arising from within the Domains of Practice and the ultimate scenarios of use. These areas are discussed in more detail in this section. This discussion will provide the basis for an extension of the metadata model, should any of these areas emerge as essential.

Duplication: Duplication within the system represents a particular challenge. Duplication is covered in the "Content and Content Description Models" by allowing ContentObjects to be related to each other via ContentRelationships; for duplicates, the value of the ContentRelationship is +DuplicateOf. It is clear that CUbRIK will need to handle duplication, since the scenarios of use will involve multiple items drawn from multiple collections. However, the Platform Model does not represent a confidence value on the presence of duplicates in the system. Such a value would require retaining information on when the last global de-duplication had been carried out. Further, in some cases, equating certain entities or items with other entities or items might be a matter of expert opinion; here the metadata would need to represent whether de-duplication hypotheses have been put forward, or whether no de-duplication has been attempted. A hypothetical sketch of such an extension is given below.
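Should this extension be adopted, a confidence-bearing DuplicateOf relationship might look like the following hypothetical Python sketch; all extension attributes (confidence, asserted_by, asserted_on) are invented here and are not part of the current platform model.

# Hypothetical extension sketch: a DuplicateOf ContentRelationship carrying a
# confidence score and de-duplication provenance. The current platform model
# stores only the relationship type; the extra attributes are invented.

from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class ContentRelationship:
    source_id: str
    target_id: str
    rel_type: str                       # e.g. "DuplicateOf"
    confidence: Optional[float] = None  # None = no de-duplication attempted
    asserted_by: Optional[str] = None   # algorithm run or expert opinion
    asserted_on: Optional[date] = None  # when the last de-duplication ran

# An automatic near-duplicate detector is fairly sure; an expert could later
# raise the confidence to 1.0 or retract the hypothesis.
dup = ContentRelationship("img_042", "img_317", "DuplicateOf",
                          confidence=0.87, asserted_by="near-dup-run-12",
                          asserted_on=date(2012, 3, 1))
print(dup)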
Entity status: CUbRIK focuses on time-aware search and also puts emphasis on location. Some points in space have the status of place entities (e.g., addresses, cities) and others do not. Likewise, some points in time are associated with events (e.g., treaties, inaugurations) and others are not. As set out above, within CUbRIK "an entity is any object of the real world that is so important to be denoted with a name". The decision concerning which points in space are important enough to be places, and which time-space coordinates are important enough to be events, is assumed to have been made before the data is encoded into the Platform Model. In other words, the model does not support the emergence of an arbitrary point in time-space as an event as a result of, for example, a historian interacting with the system and discovering its importance. The system is, however, able to handle entities of ambiguous status by simply encoding both alternatives. For example, "Berlin Brandenburg International Airport" can exist in the system in parallel to "Schönefeld". A link might identify the two as identical. However, the two entities are not necessarily collapsed into one, since an archivist labelling a photo needs to be able to indicate that the photo was taken at "Schönefeld" without implying that it was taken at "Berlin Brandenburg International Airport". The CUbRIK metadata cannot encode the decision of the archivist that "Schönefeld" is important enough to be promoted to an entity. The point is particularly critical for geo-political entities of disputed status.

Authority and expertise: The platform model contains a basic representation of authority and expertise. Both the notion of expertise deriving from knowledge or station and the notion of expertise derived from track record (for example, by doing) are represented. However, the manner in which expertise and authority relate to particular domains, and the manner in which various sorts of expertise interact, are not represented. The point is important because in some cases one user will trust another user on an unknown topic simply because the user is a known friend. In other cases, experts in one knowledge domain (e.g., impressionist painting) cannot be assumed to be experts in another knowledge domain (e.g., culinary innovation). The metadata model is able to cover the topical aspects of expertise by using the UserTopicalAffinity property of the User in the User Model. However, this representation might need to be extended in order to allow it to develop over time, and also to represent the reason why a person has a particular topical affinity (personal interest or professional qualifications).

We close this section by stepping back to take a wide perspective on the extensions. Seen from this perspective, the areas of the platform model that we have identified as most likely to need extensions have something interesting in common. All of the areas are related to the need to identify the appropriate balance within the platform model between the retention of the maximum possible underlying information (e.g., every single user interaction with an item) and the summarization of information into either more general annotations or annotations in which contradictions have been resolved. In order to make this point clearer, we consider the two extremes.

At one extreme, the platform model could be designed to store everything. For example, for a given image, every tag that has ever been assigned to that image by a user or a crowdsourcing worker could be represented. The application itself would then be in charge of resolving the contradictions between the tags, or of choosing the most reliable tag for the purposes of finding the image or for display with the image. There are several distinct disadvantages to storing a maximum amount of information and letting the application perform this computation. First, the stored annotations can consume a lot of storage space. Second, the computation could be too computationally expensive to be performed on-line at application run time. Third, the application would need access to information concerning how the annotations were created in order to resolve contradictions (for example, one tag assigned by an expert outweighs three tags assigned by a non-expert); if such information is missing, the application is unable to make use of the metadata. A minimal sketch of such application-side resolution is given below.
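As a minimal, hypothetical sketch of application-side resolution under the "store everything" extreme, the following Python fragment weights tags by annotator role; the weights, roles, and tag values are invented for the example.

# Minimal sketch of application-side tag resolution over fully retained
# annotations, as discussed above. The expertise weights are invented; the
# point is that resolution needs provenance (who tagged) to be stored.

from collections import defaultdict

WEIGHT = {"expert": 4.0, "non-expert": 1.0}   # one expert outweighs three non-experts

def resolve(tags):
    """tags: list of (tag, annotator_role). Return the highest-weighted tag."""
    scores = defaultdict(float)
    for tag, role in tags:
        scores[tag] += WEIGHT[role]
    return max(scores, key=scores.get)

annotations = [("heron", "expert"),
               ("stork", "non-expert"), ("stork", "non-expert"), ("stork", "non-expert")]
print(resolve(annotations))   # heron: 4.0 beats stork: 3.0

Swapping the weights implements the opposite policy, in which three non-expert tags outweigh one expert tag; the essential requirement is that the provenance of each tag is retained in the metadata.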
At the other extreme, the platform model could be designed to store only annotations, items and entities that have been resolved and are unambiguous. There are also several distinct disadvantages to this approach. First, in an application aimed at supporting users in discovery, it can be anticipated that new entities will emerge during the course of user interaction with the data. For example, a historian may discover the significance of a particular exchange of letters that was not previously considered a historical event. The historian should be able to have the new event represented in the platform model, and to let it retain some degree of ambiguity in its event status until it can be validated by other historians. The identification or emergence of new entities is not possible in a platform model in which all contradictions have been resolved so as to make all entities perfectly unambiguous. Further, in some scenarios or applications, different forms of resolution will be more appropriate. For example, for some applications the tags of three non-experts should outweigh the tag of the expert (for instance, in a movie rating scenario, a film that received high critical acclaim but was disliked by the general public should not necessarily be considered a good film). If the platform model resolves all contradictions, application-specific forms of resolution are not possible. Finally, the traceability of the path by which the system arrived at a particular conclusion is important in many application scenarios. The very first discussions with the Domains of Practice identified the importance of this point: archivists find it important to be able to trace back the route along which facts included in the archive have been established. This requirement exists not only among professional users, but also among non-professional users, following a general trend of supporting users by offering explanations along with retrieval or recommendation results.

In sum, although it is impossible to predict the ways in which the current platform model might reveal itself as ill-suited for covering the components necessary in the CUbRIK vertical applications and horizontal demos, the situation is not entirely bleak. A platform model with a high level of coverage has been established, and a series of areas has been identified in which extensions are seen as most likely to be necessary. Although the list is long, a common thread runs through all the areas: maintaining a balance between representing all possible information in the platform model and representing only information on a summarized level, or on a level at which contradictions have been resolved. We anticipate that this insight into the nature of the extensions that might later be necessary for the platform data model will help to guide the instantiation of the data model into individual metadata standards for use in CUbRIK applications.



Bibliography

[Agathos,2009] Agathos A., Pratikakis I., Papadakis P., Perantonis S., Azaridis P., Sapidis N. Retrieval of 3D Articulated Objects using a graph-based representation. In Eurographics Workshop on Shape Retrieval (2009), pp. 1-8.

[Annodex] Silvia Pfeiffer, Conrad Parker, and Claudia Schremmer. Annodex: a simple architecture to enable hyperlinking, search and retrieval of time-continuous data on the web. In 5th ACM SIGMM International Workshop on Multimedia Information Retrieval, pages 87-93, 2003.

[Aroyo,2010] L. Aroyo and G.-J. Houben. User modeling and adaptive Semantic Web. Semantic Web, 1(1):105-110, 2010.

[Bartle,1996] Richard Bartle. Hearts, Clubs, Diamonds, Spades: Players who suit MUDs. http://www.mud.co.uk/richard/hcds.htm

[Bozzon, 2012] Alessandro Bozzon, Marco Brambilla, Stefano Ceri. Answering Search Queries with CrowdSearcher. In WWW 2012 conference, Lyon, France.

[Bunchball,2010] Gamification 101. www.bunchball.com/gamification/gamification101.pdf

[Cheng-Yu,2006] Cheng-Yu Lee, Von-Wun Soo. The conflict detection and resolution in knowledge merging for image annotation. Information Processing & Management, 42(4), July 2006, pages 1030-1055. DOI 10.1016/j.ipm.2005.09.004.

[Cicchetti,a] Antonio Cicchetti. Difference Representation and Conflict Management in Model-Driven Engineering. PhD Thesis, University of L'Aquila, Italy.

[Cicchetti,b] Antonio Cicchetti, Alessandro Rossini. Weaving models in conflict detection specifications. SAC 2007: 1035-1036.

[Curley,2010] Vincent Curley. Achievement Unlocked. http://www.xbox.com/en-AU/Live/EngineeringBlog/071510-AchievementsUnlocked

[Daras,2012] P. Daras, A. Axenopoulos, V. Darlagiannis, D. Tzovaras, X. Le Bourdon, L. Joyeux, A. Verroust-Blondet, V. Croce, T. Steiner, A. Massari, A. Camurri, S. Morin, A-D. Mezaour, L. Sutton, S. Spiller. Introducing a Unified Framework for Content Object Description. International Journal of Multimedia Intelligence and Security, Special Issue on "Challenges in Scalable Context Aware Multimedia Computing", 2(3-4), pages 351-375, January 2012. DOI 10.1504/IJMIS.2011.044765.

[Dey,1999] Dey A. K., Abowd G. D. Towards a Better Understanding of Context and Context-Awareness. In HUC '99: Proceedings of the 1st International Symposium on Handheld and Ubiquitous Computing, 1999.

[Deselaers,2007] T. Deselaers, D. Keysers and H. Ney. Features for image retrieval: An experimental comparison. Information Retrieval, 11(2) (2007), pages 77-107.

[GamesGenres,2012] VideoGames Genres. http://videogamegeek.com/browse/videogamegenre

[Fitzgerald,2008] Camilo Fitzgerald. Requirements Conflict Resolution in Collaborative Environments. First year VIVA Report for the degree of Doctor of Philosophy at University College London. http://sreresearch.cs.ucl.ac.uk/FirstYearVIVAReport.doc

[Franklin2011] Franklin M.J. et al. CrowdDB: answering queries with crowdsourcing. In Proceedings of the 2011 International Conference on Management of Data (SIGMOD '11). ACM, New York, NY, USA, 61-72.

[GamersBook] GamersBook community for gamers. http://www.gamersbook.com/

[GRAPPLE D2.1] www.grapple-project.org/.../D2.1-WP2-UserProfileFormat-v1.0.pdf

[Heckmann, 2005] D. Heckmann, T. Schwartz, B. Brandherm, M. Schmitz, and M. von Wilamowitz-Moellendor. GUMO - The General User Model Ontology. In 10th Int. Conf. on User Modeling, pages 428-432. Springer, 2005.

[Howe, 2006] Howe, J. The rise of crowdsourcing. Wired 14(6), 2006.

[Kapsammer,2011a] E. Kapsammer, S. Lechner, S. Mitsch, B. Pröll, W. Retschitzegger, W. Schwinger, M. Wimmer, and M. Wischenbart. In Int. Workshop on Personalized Access, Profile Management, and Context Awareness in Databases, at 37th International Conference on Very Large Data Bases (VLDB), 2011.

[Kapsammer,2011b] E. Kapsammer, S. Mitsch, B. Pröll, W. Schwinger, M. Wimmer, and M. Wischenbart. A First Step Towards a Conceptual Reference Model for Comparing Social User Profiles. In Int. Workshop on User Profile Data on the Social Semantic Web, at 8th Extended Semantic Web Conference (ESWC), 2011.

[Kongregate] Kongregate. http://www.kongregate.com/

[Lidy,2005] T. Lidy and A. Rauber. Evaluation of feature extractors and psycho-acoustic transformations for music genre classification. In Proc. ISMIR, pages 34-41, London, UK, September 11-15, 2005.

[MAIF,2007] Multimedia Annotation Interoperability Framework. http://www.w3.org/2005/Incubator/mmsem/XGR-interoperability/

[Marcus2011] Marcus, A. et al. Crowdsourced Databases: Query Processing with People. Conference on Innovative Data Systems Research (CIDR) 2011 (Asilomar, CA, 2011), 211-214.

[MF] Media Fragments. http://www.w3.org/TR/media-frags

[Michal Bortnik,2011] Michal Bortnik et al. Game achievements system. http://www.google.com/patents/US7887419

[Mirex,2010] http://www.music-ir.org/mirex/wiki/2010:Audio_Music_Similarity_and_Retrieval_Results

[MXF] Pedro Ferreira (23 July 2010). MXF - a progress report (2010). http://tech.ebu.ch/docs/techreview/trev_2010-Q3_MXF-1.pdf

[Montoya,2009] Markus Montola, Timo Nummenmaa, Andrés Lucero, Marion Boberg, and Hannu Korhonen. Applying game achievement systems to enhance user experience in a photo sharing service. In Proceedings of the 13th International MindTrek Conference: Everyday Life in the Ubiquitous Era (MindTrek '09). ACM, New York, NY, USA, 94-97, 2009.

[Nack1999] Nack, F.; Lindsay, A.T. Everything you wanted to know about MPEG-7: Part 1. IEEE MultiMedia, 6(3), pages 65-77, Jul-Sep 1999.

[Nack1999b] Frank Nack and Adam T. Lindsay. Everything You Wanted to Know About MPEG-7: Part 2. IEEE MultiMedia, 6(4) (October 1999), 64-73. DOI 10.1109/93.809235. http://dx.doi.org/10.1109/93.809235

[Ossenbruggen2004] Jacco van Ossenbruggen, Frank Nack, and Lynda Hardman. That Obscure Object of Desire: Multimedia Metadata on the Web, Part 1. IEEE MultiMedia, 11(4) (October 2004), 38-48.

[Ouertani,2006] Ouertani, M.Z., Gzara Yesilbas, L., Lombard, M. Managing data dependencies to support conflict management. In 16th CIRP International Design Seminar: Design and Innovation for a Sustainable Society, 16-19 July 2006, Kananaskis, Alberta, Canada.

[Plumbaum, 2011] Till Plumbaum, Songxuan Wu, Ernesto William De Luca, Sahin Albayrak. User Modeling for the Social Semantic Web. In 2nd Workshop on Semantic Personalized Information Management: Retrieval and Recommendation, in conjunction with ISWC 2011, Bonn, Germany, 2011.

[Parameswaran,2011] Parameswaran, A. and Polyzotis, N. Answering Queries using Databases, Humans and Algorithms. Conference on Innovative Data Systems Research (CIDR) 2011 (Asilomar, CA, 2011), 160-166.

[PHAROS] The PHAROS (Platform for searcHing of Audiovisual Resources across Online Spaces) project, an Integrated Project co-financed by the European Union under the Information Society Technologies Programme (6th Framework Programme). http://www.pharos-audiovisual-search.eu/

[Pordel,2009] Mostafa Pordel. A Metamodel Independent Approach for Conflict Detection to Support Distributed Development in MDE. Master Thesis, 2009.

[PSN] Playstation Network. http://uk.playstation.com/psn/

[Quinn,2011] A. J. Quinn and B. B. Bederson. Human computation: a survey and taxonomy of a growing field. In Proceedings of the 2011 Annual Conference on Human Factors in Computing Systems (CHI '11), pages 1403-1412, 2011.

[Saathoff2010] Carsten Saathoff and Ansgar Scherp. Unlocking the semantics of multimedia presentations in the web with the multimedia metadata ontology. In Proceedings of the 19th International Conference on World Wide Web (WWW '10). ACM, New York, NY, USA, 831-840, 2010.

[STEAM] Steam Digital Delivery Platform. http://store.steampowered.com/

[XboxLive] Xbox Live! Platform. www.xbox.com/live

[VGG] VideoGame Geek. http://videogamegeek.com/

[WebKit] Media Fragments support in WebKit. http://trac.webkit.org/changeset/104197

[W3CMPEG] Ian Burnett, Stephen Davis, Julie Lofton (University of Wollongong, Australia/Hotato, Inc). Relevant MPEG Metadata Technologies. http://www.w3.org/2007/08/video/positions/Burnett.pdf

[Zhang,2006] F. Zhang, Z. Song, and H. Zhang. Web Service Based Architecture and Ontology Based User Model for Cross-System Personalization. In Int. Conf. on Web Intelligence (WI '06), pages 849-852. IEEE, 2006.
