IMS2603 Information Management in Organisations
Lecture 17
MaRC as Metadata

Revision
Last lecture looked at philosophical bases for thinking about metadata, in particular at ontology as an approach that might allow smart software [rather than people] to enable the semantic web.

Outline
MaRC: what it is
MaRC tags
MaRC header data
MaRC as a database structure
MaRC and XML

Reading
Tag list: Bibliographic http://www.itsmarc.com/crs/bib1468.htm
Library of Congress MARC Standards http://www.loc.gov/marc/

MaRC: what it is
Machine-Readable Cataloging
A standard for moving catalog records from one place to another
A tape standard [more about this later]
Created before relational data structures were defined by Codd
A very important precursor to discussions about metadata

Card Catalog
[keyword? you wish!]

Catalog Card

Catalog Card and data elements, revised

Data Elements I
Take a few minutes for this:
Start to design a relational structure to deal with the data elements you can see on a catalog card.

Data Elements II
What size are the data fields?
Are all fields always used?
How many tables are needed?

Data Structure for cataloguing data
Variable length
Sophisticated [well, complex] content
Either a very empty structure [i.e. allow all possible data elements to be present, though very few will actually hold data in any given record], or the data structure is part of the record

Now let's think about tape
Tape was a common format for storing and moving data before ubiquitous networks
What do you think are some features of tape?
What are some of the problems?

Getting the two problem areas together
Because the data are on tape, the length of the record should be part of the record [i.e. did you get all the data? can the system move to the nth record on the tape?]
The head of the record contains some metadata, and also the structure of that particular record [is this meta meta data?]

MARC tags [or fields] are grouped
000s - metadata [about metadata?]
100s - author stuff
200s - title/publisher stuff
300s - physical description
400s - series statements
500s - notes
600s - subject things
700s - added entries [authors, titles]
800s - series added entries
900s - local data

MaRC tags
Brought to you by librarians and their systems people, so a decimal system?
E.g.
100 = personal author
245 = title
500 = notes
650 = subject heading

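The hundreds grouping above can be sketched as a simple lookup on the first digit of the tag. This is only an illustration: the group labels are copied from the slide, and the function name `tag_group` is made up for this example rather than taken from any MARC library.

```python
# Group labels as given on the "MARC tags [or fields] are grouped" slide.
TAG_GROUPS = {
    0: "metadata",
    1: "author stuff",
    2: "title/publisher stuff",
    3: "physical description",
    4: "series statements",
    5: "notes",
    6: "subject things",
    7: "added entries",
    8: "series added entries",
    9: "local data",
}


def tag_group(tag: str) -> str:
    """Return the slide's group label for a three-digit numeric tag.

    Hypothetical helper for illustration only.
    """
    if len(tag) != 3 or not (tag.isascii() and tag.isdigit()):
        raise ValueError(f"expected a three-digit tag, got {tag!r}")
    # The first digit selects the hundreds group.
    return TAG_GROUPS[int(tag[0])]


if __name__ == "__main__":
    for tag in ("100", "245", "500", "650"):
        print(tag, "->", tag_group(tag))
```

Note that the grouping tells you only the broad area a tag belongs to; the meaning of an individual tag such as 245 still has to be looked up in the tag list.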
Subfields
E.g. 100 is the Personal Name [main entry]
Each subfield is used for an aspect of the personal name: $a for the name itself, $d for dates, $c for titles, etc.

Subfields cont'd
Subfields are not used consistently from one tag to another.
For other personal names [700 tag] the same subfields apply as for 100, but subfields in the 245 [title] tag have totally different meanings [$a = title, $b = remainder of title [subtitle?]]
In the 260 [publication] tag, $a is place, $b is name of publisher and $c is date of publication

Indicators
These are two optional additional entries, placed between the tag and its content.
E.g.
245 14 $aThe wind in the willows
means
A title entry [245], for which a title added entry is made [1] [a filing point], and for which filing skips the first 4 characters ["The "], so the item files under "w"

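To make the tag / indicator / subfield layout concrete, here is a minimal sketch that takes a field written the way it appears on the Indicators slide and splits it apart. It assumes the display form "tag, space, two indicator characters, space, $-prefixed subfields", that blank indicators are written as "#", and that "$" never occurs inside the data. The function names are invented for this example.

```python
def parse_field(line: str):
    """Split a display-form field such as '245 14 $aThe wind in the willows'.

    Returns (tag, indicators, [(subfield_code, value), ...]).
    Hypothetical helper; assumes '$' only ever introduces a subfield.
    """
    tag, indicators, content = line.split(" ", 2)
    # Each chunk after a '$' starts with the one-character subfield code.
    subfields = [
        (chunk[0], chunk[1:].strip())
        for chunk in content.split("$")
        if chunk
    ]
    return tag, indicators, subfields


def filing_title(line: str) -> str:
    """Drop the nonfiling characters counted by the second indicator of a 245."""
    tag, indicators, subfields = parse_field(line)
    if tag != "245":
        raise ValueError("this sketch only handles the 245 title field")
    # A non-numeric second indicator is treated as 'skip nothing'.
    skip = int(indicators[1]) if indicators[1].isdigit() else 0
    title = dict(subfields)["a"]
    return title[skip:]


if __name__ == "__main__":
    example = "245 14 $aThe wind in the willows"
    print(parse_field(example))
    print(filing_title(example))
```

The same `parse_field` works for any tag, but what the subfield codes mean still depends on the tag, which is exactly the inconsistency noted above.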
Leader
More tape stuff.
The first 24 characters are about the record
00-04 - Record length [five numeric characters]
Coded entries for things like
06 - Type of record
07 - Bibliographic level
17 - Encoding level
etc. etc.

Directory
A computer-generated index to the location of the variable control and data fields within a record.
The Directory immediately follows the Leader at character position 24 and consists of a series of fixed-length (12 character positions) entries

Directory cont'd
Each entry consists of the following character positions
00-02 - Tag
Three numeric or alphabetic (uppercase or lowercase, but not both) characters that identify an associated field.
03-06 - Field length
Four numeric characters that indicate the length of the field, including indicators, subfield codes, data, and the field terminator. The number is right justified and each unused position contains a zero.
07-11 - Starting character position
Five numeric characters that indicate the starting character position of the field relative to the Base address of data (Leader/12-16) of the record. The number is right justified and each unused position contains a zero.

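The Leader and Directory layout can be tried out on a toy record. The sketch below first builds a record from (tag, data) pairs and then reads it back using only the Leader (positions 00-04 and 12-16) and the 12-character Directory entries. Several things here are assumptions for illustration: the function names, the placeholder values in the other Leader positions, the sample field contents, and the use of character counts where a real record counts bytes.

```python
FIELD_TERMINATOR = "\x1e"   # ends every field and the directory
RECORD_TERMINATOR = "\x1d"  # ends the record
SUBFIELD_DELIMITER = "\x1f"  # stands where '$' is shown on the slides


def build_record(fields):
    """Assemble a toy record from (tag, data) pairs. Illustrative only."""
    directory, body = "", ""
    for tag, data in fields:
        field = data + FIELD_TERMINATOR
        # Directory entry: tag (3) + field length (4) + start position (5).
        directory += f"{tag}{len(field):04d}{len(body):05d}"
        body += field
    directory += FIELD_TERMINATOR
    base_address = 24 + len(directory)
    record_length = base_address + len(body) + 1  # +1 for record terminator
    # Leader 05-11 and 17-23 are filled with placeholder codes in this sketch.
    leader = f"{record_length:05d}" + "nam a22" + f"{base_address:05d}" + " a 4500"
    return leader + directory + body + RECORD_TERMINATOR


def parse_record(record):
    """Read a record back using only the Leader and Directory."""
    leader = record[:24]
    record_length = int(leader[0:5])    # Leader/00-04
    base_address = int(leader[12:17])   # Leader/12-16
    # The directory sits between the leader and its own terminator.
    directory = record[24:base_address - 1]
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3]
        length = int(entry[3:7])
        start = int(entry[7:12])
        begin = base_address + start
        # The stored length includes the field terminator, so drop it.
        fields.append((tag, record[begin:begin + length - 1]))
    return {
        "length": record_length,
        "type_of_record": leader[6],
        "bibliographic_level": leader[7],
        "encoding_level": leader[17],
        "fields": fields,
    }


if __name__ == "__main__":
    sample = [
        ("245", "14" + SUBFIELD_DELIMITER + "aThe wind in the willows"),
        ("500", "  " + SUBFIELD_DELIMITER + "aIllustrative note"),
    ]
    record = build_record(sample)
    print(repr(record[:24]))
    for tag, data in parse_record(record)["fields"]:
        print(tag, repr(data))
```

Because the record length and the position of every field are written into the record itself, a reader of the tape can check that it received the whole record and can skip straight to the next one, which is the point made on the "Getting the two problem areas together" slide.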
MaRC header data
Fields of data, often fixed length. Typically contain coded data.
E.g. 008 is a 40 character field
In a record for a computer file, character position 26 holds a one-character code for the type of file: g is a game, b is a program.
A textual description of the file will also be given in the notes [500s] area, at tag 516

MaRC as a database structure
Fields [tags] are repeatable [there can be more than one added personal author, more than one subject tag]
Not all fields are used
Some fields are specific to particular record types [computer file, monograph, serial, etc.]

MARC and library systems
Because the MARC standard exists, and existed,
And because that's how records get moved around
And because systems get updated about every 7-10 years
MARC is [or at least has been] often used as the database structure

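Two of the points above, coded positions in the 008 field and repeatable tags, can be sketched in a few lines. The code table holds only the two codes mentioned on the slide; the sample record values and the function names are invented for illustration, and positions are counted from zero as in the Leader and Directory slides.

```python
from collections import defaultdict

# Only the two codes named on the slide; the real code list is longer.
COMPUTER_FILE_TYPES = {"b": "program", "g": "game"}


def computer_file_type(field_008: str) -> str:
    """Decode character position 26 of a 40-character 008 field."""
    if len(field_008) != 40:
        raise ValueError("008 should be 40 characters long")
    return COMPUTER_FILE_TYPES.get(field_008[26], "code not in this sketch")


def index_by_tag(fields):
    """Group (tag, value) pairs by tag, keeping every repeat of a tag."""
    by_tag = defaultdict(list)
    for tag, value in fields:
        by_tag[tag].append(value)
    return dict(by_tag)


if __name__ == "__main__":
    # A made-up 008 with 'g' at position 26 and blanks elsewhere.
    print(computer_file_type(" " * 26 + "g" + " " * 13))

    # A made-up record: 650 and 700 repeat, most tags are simply absent.
    record = [
        ("245", "The wind in the willows"),
        ("650", "First subject heading"),
        ("650", "Second subject heading"),
        ("700", "First added author"),
        ("700", "Second added author"),
    ]
    by_tag = index_by_tag(record)
    print(by_tag["650"])
    print(by_tag.get("516", []))
```

Storing only the tags that are present, with a list per tag, is one way of carrying the structure in the record rather than keeping a very empty fixed structure.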
MARC and access to MARC databases
MARC is OK as a basis for a flat-file database system
Which is subsequently indexed extensively
So that users can retrieve records

Relational files and MARC
Suppose you want the level of control that relational files can create, so that there is just one entry for the author William Shakespeare, or just one entry for the title The wind in the willows [whether film, book, cartoon, television series, etc.]
Then tables are needed for authors, titles, publishers, places, subject headings, subheadings, sub-sub-headings, etc.
And perhaps a single table for the records themselves.

MaRC and XML
Library of Congress has developed a number of XML schemas for MARC, for authority records
They have also developed a specific DTD for use with transmission of MARC/XML records
XML-native databases and cross-walks to other XML structures are then possible.

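As a rough idea of what a MARC field looks like once it is carried in XML, the sketch below writes the 245 example from the Indicators slide as a datafield element using Python's standard library. The element and attribute names and the namespace follow the Library of Congress MARC 21 "slim" schema as I understand it; treat them as an assumption to check against the published schema. The function name `field_to_xml` is invented for this example.

```python
import xml.etree.ElementTree as ET

# Namespace assumed from the Library of Congress MARC 21 "slim" schema.
MARCXML_NS = "http://www.loc.gov/MARC21/slim"


def field_to_xml(tag, indicators, subfields):
    """Build a datafield element from a tag, two indicators and subfields."""
    datafield = ET.Element(
        f"{{{MARCXML_NS}}}datafield",
        {"tag": tag, "ind1": indicators[0], "ind2": indicators[1]},
    )
    for code, value in subfields:
        # One subfield element per (code, value) pair, in order.
        subfield = ET.SubElement(
            datafield, f"{{{MARCXML_NS}}}subfield", {"code": code}
        )
        subfield.text = value
    return datafield


if __name__ == "__main__":
    ET.register_namespace("marc", MARCXML_NS)
    record = ET.Element(f"{{{MARCXML_NS}}}record")
    record.append(
        field_to_xml("245", "14", [("a", "The wind in the willows")])
    )
    print(ET.tostring(record, encoding="unicode"))
```

The tag, indicators and subfield codes survive as attributes, so the XML form keeps the MARC structure rather than replacing it, which is what makes cross-walks to other XML structures possible.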
Will this be on the exam?
Sort of: you should have some knowledge of MARC when you are discussing options for metadata schema.

Conclusion
MARC is now here [to stay?].
A tape standard abused into being a base data [and database?] standard.
Is complex and inconsistent and very very detailed.
Is used for a variety of data formats, with different data elements.
Has been given extended life by being migrated into new XML environments.