http://www.tracingisdrawing.com/Docs/002


Pasquale J. Festa
Rosangela Briscese
INF384C – Organizing and Providing Access to Information
Prof. Efron
3.1.2007

NSDL Assignment

For our NSDL project we chose to look at the Atomic Archives (setSpec: 920106) and Prentice Hall Online Gallery (setSpec: 1537388) projects in the NSDL. Each project was relatively small and did not require resumption tokens to retrieve additional records from the collection. Atomic Archives held a total of 44 records; Prentice Hall held a total of 127 records. Below you will find a metadata analysis, in both tabular and graphical form, exploring which elements the two projects utilized in their Dublin Core records.

I. Quantitative Metadata Analysis:

1. Which DC Elements do the Project Records Use?

For the most part, both projects utilized the same Dublin Core elements in their metadata. Two elements (contributor and source) were absent from both projects. In addition, Atomic Archives did not use the rights element, and Prentice Hall did not use the date, coverage, and relation elements. The table below was compiled from Harvester output analyzing the XML of each project.

Table 1: Dublin Core Elements Utilized by Projects

Dublin Core Element    Atomic Archives    Prentice Hall
Title                  Utilized           Utilized
Creator                Utilized           Utilized
Contributor            Not Utilized       Not Utilized
Subject                Utilized           Utilized
Description            Utilized           Utilized
Date                   Utilized           Not Utilized
Source                 Not Utilized       Not Utilized
Identifier             Utilized           Utilized
Coverage               Utilized           Not Utilized
Rights                 Not Utilized       Utilized
Relation               Utilized           Not Utilized
Type                   Utilized           Utilized
Language               Utilized           Utilized
Format                 Utilized           Utilized
Publisher              Utilized           Utilized
Non-Zero               Utilized           Utilized
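The kind of tally behind Table 1 can be sketched in a few lines of Python. This is only an illustration, not the Harvester tool used for the assignment: the `<records>`/`<record>` wrapper and the sample data are hypothetical, and a real OAI-PMH response wraps each record in additional namespaced elements. The only detail taken as given is the standard Dublin Core namespace URI.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Standard namespace for the 15 simple Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = ["title", "creator", "contributor", "subject", "description",
               "date", "source", "identifier", "coverage", "rights",
               "relation", "type", "language", "format", "publisher"]


def count_elements(record):
    """Count how many times each DC element occurs inside one record."""
    counts = Counter()
    prefix = "{" + DC_NS + "}"
    for node in record.iter():
        if node.tag.startswith(prefix):
            counts[node.tag[len(prefix):]] += 1
    return counts


def elements_used(xml_text, record_tag="record"):
    """Map each DC element to True if at least one record uses it."""
    root = ET.fromstring(xml_text)
    used = set()
    for record in root.iter(record_tag):
        used.update(count_elements(record))
    return {name: name in used for name in DC_ELEMENTS}


# Hypothetical two-record sample, not data from either collection.
SAMPLE = """<records xmlns:dc="http://purl.org/dc/elements/1.1/">
  <record><dc:title>A</dc:title><dc:subject>x</dc:subject><dc:subject>y</dc:subject></record>
  <record><dc:title>B</dc:title><dc:date>2007</dc:date></record>
</records>"""

usage = elements_used(SAMPLE)
for name in DC_ELEMENTS:
    print(f"{name:12} {'Utilized' if usage[name] else 'Not Utilized'}")
```

Keeping the per-record counts (rather than only a yes/no flag) means the same pass over the XML can feed the proportion and frequency figures in the next section.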


2. Do all DC Records use the same Elements?

A. Atomic Archives

For Atomic Archives, certain elements are used by every record (title, creator, description, identifier, type, language, format, and publisher). While the average number of times each of these elements is used differs from element to element, each appears in 100% of the records. This data can be found in the “Atomic Archives – Proportions” worksheet in the nsdlDataWorkbook.xls Excel workbook.

The subject element, which is of great interest, varies in usage from not being utilized at all to being utilized 16 times by one particular record in the collection. On average the element is used approximately 6 times per record, and it appears in 95% of the collection: two of the 44 records did not utilize the subject element.

Other elements of interest are date, coverage, and relation. These elements seem to be used haphazardly, with their proportion of use in the collection ranging from 31% to 68%. The reason for this inconsistency is unknown, but as a result a good proportion of the records cannot be searched on these criteria while the rest can. This data is provided in the “Atomic Archives – Proportions” worksheet.

Table 2: Maximum Number of Times an Element is Utilized in Atomic Archives

Dublin Core Element                   Max. Number of Use
Title                                 2
Creator                               1
Contributor                           0
Subject                               16
Description                           1
Date                                  1
Source                                0
Identifier                            1
Coverage                              1
Rights                                0
Relation                              1
Type                                  1
Language                              1
Format                                2
Publisher                             1
Non-Zero (count of elements used)     12
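The three figures reported per element (proportion of records using it, average uses per record, and maximum uses in any record) can be computed from per-record counts as in the sketch below. The function name `summarize` and the four sample records are hypothetical and are not taken from the nsdlDataWorkbook.xls workbook.

```python
def summarize(per_record_counts, element):
    """Return proportion, average, and maximum use of one element.

    per_record_counts is a list of dicts, one per record, mapping an
    element name to the number of times it occurs in that record.
    """
    values = [counts.get(element, 0) for counts in per_record_counts]
    n = len(values)
    return {
        "proportion": sum(1 for v in values if v > 0) / n,  # share of records using it
        "average": sum(values) / n,                         # mean uses per record
        "maximum": max(values),                             # most uses in one record
    }


# Hypothetical per-record counts for a four-record collection.
records = [
    {"title": 1, "subject": 3, "date": 1},
    {"title": 1, "subject": 0},
    {"title": 2, "subject": 5, "date": 1},
    {"title": 1, "subject": 4},
]

for element in ("title", "subject", "date"):
    print(element, summarize(records, element))
```

In this toy data the title element behaves like the always-present elements above (proportion 1.0), while date appears in only half of the records.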


Graph 1: Proportion of Element Usage in Atomic Archives
[Figure. Y-axis: Percentage of Use (0.00% to 120.00%). X-axis: Element (1 to 16).]

Graph 2: Average Frequency of Element Use in Atomic Archives
[Figure. Y-axis: Average Number of Times Utilized (0 to 12). X-axis: Element (1 to 16).]


B. Prentice Hall

For Prentice Hall, not one element was used 100% of the time. Instead, every element that was utilized at all appeared in 98% of the records. Here, unlike Atomic Archives, we find relatively consistent usage of the Dublin Core elements. However, because the utilized elements appeared only 98% of the time, it is apparent that some records in this collection lack Dublin Core metadata altogether and are, essentially, unsearchable. This data can be found in the “Prentice Hall – Proportions” worksheet.

Again, as with Atomic Archives, certain elements were utilized more than others. The subject element is again used more than once per record on average (1.9 times, to be exact). The creator element is used 1.3 times on average. The rest of the utilized elements (title, description, identifier, rights, type, language, format, and publisher) are used an average of 0.98 times, the same as the proportion of use found above. This is partly because each of these elements is “single use” (i.e., one title for a record, not two). However, because some sources in this collection have no Dublin Core record, these elements are not used exactly once per record: the blank metadata pulls the average below one. Nonetheless, 98% is quite consistent element use for a records collection.

Table 3: Maximum Number of Times an Element is Utilized in Prentice Hall

Dublin Core Element                   Max. Number of Use
Title                                 1
Creator                               4
Contributor                           0
Subject                               2
Description                           1
Date                                  0
Source                                0
Identifier                            1
Coverage                              0
Rights                                1
Relation                              0
Type                                  1
Language                              1
Format                                1
Publisher                             1
Non-Zero (count of elements used)     10
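The reasoning that blank records pull a “single use” element's average down to its proportion of use can be checked with a small worked example. The collection below is entirely hypothetical (50 records, one with no metadata); the numbers were chosen only so that the results land near the 0.98 and 1.3 figures discussed above, and they are not the Prentice Hall data.

```python
# Hypothetical collection: 49 records with metadata, 1 completely blank record.
# 16 of the 49 list two creators; every non-blank record has exactly one title.
records = (
    [{"title": 1, "creator": 2}] * 16
    + [{"title": 1, "creator": 1}] * 33
    + [{}]  # the blank record contributes 0 to every element
)
n = len(records)


def proportion(element):
    """Share of records in which the element appears at least once."""
    return sum(1 for r in records if r.get(element, 0) > 0) / n


def average(element):
    """Mean number of uses per record, counting blank records as 0."""
    return sum(r.get(element, 0) for r in records) / n


# For a single-use element the two measures coincide.
print("title  ", proportion("title"), average("title"))
# For a repeatable element the average can exceed the proportion.
print("creator", proportion("creator"), average("creator"))
```

Because a single-use element contributes either 0 or 1 per record, its sum of uses equals the number of records using it, so average and proportion are the same fraction.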


Graph 3: Proportion of Use of Elements in Prentice Hall
[Figure. Y-axis: Percentage of Use (0.00% to 120.00%). X-axis: Element.]

Graph 4: Average Frequency of Use of Elements in Prentice Hall
[Figure. Y-axis: Average Number of Times Utilized (0 to 12). X-axis: Element (1 to 16).]


3. Distribution of the Subject Element

A histogram of the distribution of the subject element across both the Atomic Archives and Prentice Hall projects shows that the most frequent usage was 2 instances per record, which was the case in 121 of the records. It is important to recall that, as stated above, Atomic Archives had an average rate of use of the subject element of around 6 (i.e., on average, each record used the subject element around 6 times), while the Prentice Hall average was around 2 (1.9, to be exact).

When all the data is combined, the fact that the Prentice Hall collection is substantially bigger than the Atomic Archives collection comes into play. While Atomic Archives utilized the subject element to a greater degree than Prentice Hall, the sheer fact that Prentice Hall is about 3 times larger explains why an analysis of the frequency of the subject element in both collections yields such a low number (2). It could be argued that this frequency across the two collections is not statistically significant, because the two collections are not close enough in size and therefore give us disproportionate information. If the Atomic Archives collection were closer in size to the Prentice Hall collection, then the frequency distribution of the subject element across the two collections could be said to be statistically significant.

Graph 5: Distribution of the Subject Element in Prentice Hall Collection

[Figure: Histogram Prentice Hall. X-axis: Bin (0 to 16). Y-axis: Frequency.]

Graph 6: Distribution of the Subject Element in Atomic Archives Collection
[Figure: Histogram Atomic Archives. X-axis: Bin (0 to 16). Y-axis: Frequency.]

Graph 7: Distribution of the Subject Element in both Collections
[Figure: Histogram. X-axis: Bin (0 to 16). Y-axis: Frequency (0 to 140).]
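How strongly the larger collection dominates a combined figure can be illustrated with a size-weighted mean. The sketch below uses the collection sizes and averages reported in this paper, with 6.0 standing in for "around 6" as an assumption. The per-record lists used for the most-frequent-value check are hypothetical and are not the actual subject counts.

```python
from collections import Counter


def pooled_mean(groups):
    """Mean over all records, given (record_count, mean_per_record) pairs."""
    total_records = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_records


def mode(values):
    """Most frequent value in a list, i.e. the tallest histogram bin."""
    return Counter(values).most_common(1)[0][0]


# 44 records averaging about 6 subjects, 127 records averaging 1.9.
pooled = pooled_mean([(44, 6.0), (127, 1.9)])
unweighted = (6.0 + 1.9) / 2
print(f"pooled mean per record:       {pooled:.2f}")
print(f"unweighted mean of the means: {unweighted:.2f}")

# Hypothetical subject counts: a small, subject-rich collection and a
# larger one where nearly every record has exactly two subjects.
small = [0, 5, 6, 7, 16]
large = [2] * 12 + [1, 1, 0]
print("most frequent count:", mode(small + large))
```

Under these assumptions the pooled mean sits much closer to the larger collection's average than to the smaller one's, and the most frequent count in the combined data is set entirely by the larger collection.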


II. Qualitative Metadata Analysis:

To evaluate the accuracy of the DC records, randomly selected records were compared to the web pages which they described. At the outset of this investigation, it was decided that examining five records from each collection would be sufficient to determine accuracy. As the process went on, it seemed fitting to examine an additional record from the Prentice Hall Online Gallery, owing to the problems present in that collection. These problems are explained later in this document, in the Prentice Hall Online Gallery section.

A. Atomic Archives

The records in this collection correspond to web pages or groups of web pages within the site www.atomicarchive.com. The site was built by A.J. Software and Multimedia, and Christopher Griffith is credited as the “Producer/Technical Lead.” These specifics are worth noting because they inform the data present in the DC records. For example, “Christopher Griffith” always appeared in the creator field.

That said, it can be concluded that the DC records connected to Atomic Archives are accurate: upon examination, they link to the correct pages, and the pages reflect the content enumerated in the record. However, the specific words that form the contents of the DC elements do not necessarily appear within the text of the page itself. For example, the creator information typically does not appear on each page, but rather on the “about us” page. Likewise, the words present in subject are not always present on the page. The description, however, always contained detailed information taken directly from the page.

Overall, it appears that the Atomic Archives DC records would be helpful for resource discovery. The subject elements look like they could be standard cataloging terms. It could be argued that they cast a somewhat wide net, but they do not appear to veer off-topic. Rather, they seem to represent the contents of the pages from different angles or at different levels of specificity.

For example, the subject headings listed on the Manhattan Project page (http://www.atomicarchive.com/Docs/ManhattanProject/index.shtml) are:

    Science - - Physics - - Atomic Physics
    History
    Political Science
    History of Science
    History - - World War II
    Atomic Bomb
    Nuclear Fission

The description field provides additional detail in all examined cases. For example, in the case of the Manhattan Project, the description field reads as follows:

    These documents chronicle the establishment of a secret program, which came to be known as the Manhattan Project, to develop an atomic bomb, a powerful explosive nuclear weapon. Principal documents include: “The Quebec Agreement,” The Roosevelt-Churchill "Tube Alloys" Deal, Interim Committee's Report, “Report of the Committee on Political and Social Problems (The Franck Report), and “Atomic Energy for Military Purposes (The Smyth Report).”

Based on the high level of attention which seems to have been devoted to these records, it seems safe to conclude that the DC records for the Atomic Archives were generated by a person, possibly even an individual with an information science background. In particular, the presence of subject headings which reflect the gist of the content, and not necessarily exact phrases within it, is strong evidence for this conclusion.

B. Prentice Hall Online Gallery

At their best, the DC records for the Prentice Hall Online Gallery point to textbook information pages with useful online resources such as PowerPoint slides and test questions. At their worst, they do not even link to the correct pages. The poor performance, so to speak, of the Prentice Hall Online Gallery in the DC record/web page comparison led to the inspection of an additional record, out of pity and in case particularly problematic records had been randomly selected.

Three of the examined records were deemed accurate, i.e., the links corresponded to the correct pages, titles and creators matched, and subject listings and descriptions seemed appropriate. However, three other records were quite problematic. The first of these, which listed only an author’s last name (“Dykstra”) in the creator field, contained an identifier which led to a page presenting two different books by two different Dykstras. After selecting the Dykstra text which matched the original record, a user may note discrepancies between the title as presented in the DC record, the title on the book icon (on the page with the two Dykstras), and the title as presented upon finally arriving at the correct web page.
These varying titles are: Physical Chemistry; Introduction to Quantum Chemistry; and Physical Chemistry, a Modern Introduction, respectively. Once found, the resource itself is accurately represented by the subject and description.

The two other problematic records were even more frustrating because the web pages could not be found. One identifier inexplicably linked to a completely unrelated site (contact lens ordering), whereas the other led to a page displaying the message “this companion website is no longer available.” The problem in the latter case is therefore not so much in the DC record creation as in its maintenance. In the former case, the error likely occurred during the DC record creation. This odd record describes the third edition of a text which has since been published in fourth and fifth editions, according to our brief online research. Whether or not this relates to the incorrect linkage is unknown. The bizarre link is: http://www.nuvisionmiami.com/books/asm/

Upon going to that link, it is quite obvious that it does not correspond to Assembly Language for Intel-Based Computers, 3/e, with alleged “resources for instructors including answers to even-numbered review questions, solutions to programming assignments, PowerPoint(r) slides for all the chapters, sample tests and quizzes, and official Microsoft(r) Assembler Manuals.”

As it was impossible to verify the accuracy of the other DC elements when the identifier was incorrect, the most useful observation that may be made about these records is that, as with the accurate records, the subject listings are rather limited (e.g., “Science; Computer Science,” “Science; Physics”) and the descriptions vary greatly in length and detail (e.g., from five words to five sentences). The brief list of subject headings could possibly be helpful for locating textbooks for a class, but an instructor would still likely be in for an arduous search. And, as the descriptions vary greatly, the degree to which they are helpful is unreliable. Most striking of all is the fact that the Prentice Hall DC records do not contain the date element.

As the Prentice Hall Online Gallery contains representations of published works, the title and author typically do appear within the content of the page. The description also often contains words that appear on the page, for it typically summarizes the teaching tools available on the site, and the site itself has links for those teaching tools. The subject headings, as in the case of the Atomic Archives, generally do not appear on the web page itself.

There is great inconsistency in the Prentice Hall records, which makes it more difficult to determine whether they were human or machine generated. Are the errors attributable to machine malfunction or human slips? Certainly, some of the descriptions contain fairly well-crafted prose, but these descriptions could have been inserted from some sort of database. Perhaps a hybrid method was used, if such a thing exists. Or, if people did create these DC records, they were not working against a uniform standard.

