LC and the Semantic Web

by

Early in 2008, the Library of Congress Working Group on the Future of Bibliographic Control published the report On the Record. This Report, directed at both the Library of Congress and the American bibliographic community at large, was to serve as a “‘call to action’ that informs and broadens participation in discussion and debate, conveys a sense of urgency, stimulates collaboration, and catalyzes thoughtful and deliberate action.” (pp. 3) The Working Group recognized that the “future of bibliographic control will be collaborative, decentralized, international in scope, and Web-based.” (pp.4) Pursuant to this assertion, the Working Group made a number of recommendations that pushed libraries ever closer to participation within the Semantic Web. The Semantic Web, with its inherent organization and utilization of linked data, is an ideal venue for libraries and, more importantly, library data. By putting library data into the structure of the web itself, and thus allowing this data to be used by a wide variety of information communities, libraries can gain a greater sense of relevance and importance in this increasingly digital world.

This push of library practice and standards towards the Semantic Web, together with the clear articulation of both the benefits of the change and the issues inherent in the status quo, is perhaps one of the most important outcomes of the Report since its publication. Yet this drive has not been without difficulty or setbacks. While the Report calls for sweeping change and increased collaboration and communication, the actions of the Library of Congress and the reluctance of the library community do not necessarily echo this charge. As discussed in the Report and on the message boards for this course, the fate of libraries as significant information providers hangs on their ability to follow their users onto the web. The third recommendation of the Report exhorts libraries to position themselves and their technology for the future “by recognizing that the World Wide Web is both our technology platform and the appropriate platform for the delivery of our standards.” (pp. 5) Additionally, the library community must recognize that “machine applications” are also potential users of “the data we produce in the name of bibliographic control”. (pp.5) Currently, libraries, via their catalogs, are on the web. However, the data that libraries produce are locked within their databases. This data is not in the web, in that it cannot be utilized, shared, mashed up, or effectively linked to (or, at least, not with any real ease). By remaining on the outskirts of what has become a flourishing information and communication platform, libraries do themselves, and their patrons, a great disservice.

The Working Group focuses the efforts of libraries and LC on entering the Semantic Web by advocating for a change in the current standards libraries use to maintain and share their data. One standard in particular is MARC. I have already written about what I perceive to be the limitations of MARC in the Semantic Web. In 3.1.1, the Working Group recognizes that “Z39.2/MARC are no longer fit” standards for metadata and calls on the Library of Congress to “work with the library and other interested communities” to create a metadata carrier that will be amenable to libraries and that will allow libraries to exchange data with other information communities. (pp.25) By moving away from the MARC “stack” and by actively collaborating with other information communities, libraries will be well placed to interact within a web environment.

This is a fairly bold statement, particularly coming from an institution such as the Library of Congress, which currently holds the responsibility for maintaining MARC21 (pp.7). While this is something that other librarians and information professionals had been discussing, coming from the Library of Congress it carries a certain amount of weight. Even if LC does not have the mandate (pp.6), and matching funding, to be the “National” library, it has undertaken this role and it certainly leads by example. In this Report, LC demonstrates its open-mindedness and practicality by looking the future square in the eye.

However, I am unconvinced that LC has made much headway in this area, a movement that I recognize as easier said than done. While bibliographic utilities such as OCLC [cite] can convert library data into other, more interoperable standards, the library community as a whole is still MARC-based. On April 21st, LC released more information on its testing of RDA. This testing is still inherently MARC-based: additional fields are added to bibliographic records to indicate manifestations, while works and expressions will be created as MARC authority records. I am saddened that LC is not taking the opportunity, with the emergence of the new cataloging code, to embrace a new carrier instead of adding more complexity to the already dated MARC standard. This also, as indicated in class discussion by Elizabeth Mitchell, directly contradicts 3.1.3 of the Report, which calls for the entire library community to “include standard identifiers for individual data elements in bibliographic records.” MARC currently favors textual strings, not the URIs from which the Semantic Web draws its power. While I acknowledge that the blow of a new standard might be mitigated by encoding it in the familiar way, this could be a setback for libraries in their effort to enter the web.

The Library of Congress has been much more successful in preparing and releasing its vocabularies in forms more conducive to sharing via the web. This is a very important and very impressive move, as the controlled vocabularies utilized by libraries are what have led, in part, to the richness and coherence of our bibliographic descriptions. The web-friendly version of LCSH is located here: http://id.loc.gov/authorities/. In the “About” section, LC acknowledges the “Linked Data” community and provides a list of other vocabularies or codes that will soon receive the same treatment. Benefits, for both users and machines, are outlined. Users can download entire vocabularies in RDF/XML or N-Triples. Here, LC follows its own suggestions and embraces the power of the URI. Thus “Fencing coaches” can be found at http://id.loc.gov/authorities/sh93010603, a location based on an alphanumeric identifier instead of the usual textual string matching. Other communities on the web can now use this concept via this link or its RDF/XML format, forge a link between this particular URI and similar or related concepts, and generally enhance what is already a pretty powerful tool. This tool also raises the profile of libraries by not only bringing the data out into the web, but also demonstrating that libraries are now willing and interested in sharing and playing with the rest of the information community.
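
To make the difference between string matching and URI-based identity a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the two N-Triples lines are written by hand for this post and are not copied from LC's actual download, the example.org URI stands in for some hypothetical outside vocabulary, and the tiny regular-expression parser handles only the simple triples shown here, not the full N-Triples syntax.

```python
import re

SKOS = "http://www.w3.org/2004/02/skos/core#"
LC_CONCEPT = "http://id.loc.gov/authorities/sh93010603"

# Hand-written, illustrative N-Triples (an assumption, not LC's real data).
# The second line is a hypothetical outside community linking to the LC concept.
SAMPLE = """\
<http://id.loc.gov/authorities/sh93010603> <http://www.w3.org/2004/02/skos/core#prefLabel> "Fencing coaches"@en .
<http://example.org/sports/fencing-instructor> <http://www.w3.org/2004/02/skos/core#closeMatch> <http://id.loc.gov/authorities/sh93010603> .
"""

# Matches "<subject> <predicate> <object-uri> ." or
# "<subject> <predicate> "literal"@lang ." and nothing more elaborate.
TRIPLE = re.compile(
    r'^<([^>]+)>\s+<([^>]+)>\s+(?:<([^>]+)>|"([^"]*)"(?:@[\w-]+)?)\s*\.$'
)


def parse(text):
    """Return a list of (subject, predicate, object) tuples."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = TRIPLE.match(line)
        if match is None:
            continue  # skip anything this simplified parser cannot read
        subject, predicate, obj_uri, obj_literal = match.groups()
        obj = obj_uri if obj_uri is not None else obj_literal
        triples.append((subject, predicate, obj))
    return triples


def label_for(triples, uri):
    """Look up the preferred label attached to a concept URI."""
    for subject, predicate, obj in triples:
        if subject == uri and predicate == SKOS + "prefLabel":
            return obj
    return None


def linked_from(triples, uri):
    """List outside URIs that declare a close match to the concept URI."""
    return [
        subject
        for subject, predicate, obj in triples
        if obj == uri and predicate == SKOS + "closeMatch"
    ]


if __name__ == "__main__":
    triples = parse(SAMPLE)
    print(label_for(triples, LC_CONCEPT))
    print(linked_from(triples, LC_CONCEPT))
```

The point of the sketch is that the outside vocabulary never has to spell “Fencing coaches” the way LC does; it only has to point at the URI, and the label can change without breaking the link.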

Yet this move was not without seeming difficulty and controversy. LC launched this particular system only after it asked LC cataloger Ed Summers to shut down his own SKOS-generated version of LCSH, formerly located at http://lcsh.info. (More information on the creation of this vocabulary can be found here: http://arxiv.org/abs/0805.2855). Though LC's version hit the web less than a year after Summers’ site came down, by terminating this innovative service, a service that was already in use by others in the metadata and library communities, the Library of Congress looked somewhat reactionary, if not backwards (http://lcsh.info/comments1.html). I do understand that LC might want to have more centralized control over bibliographic tools that it developed over years, but I am not fully convinced that this aggregated information, added to for years by librarians around the country, is solely within its domain. Despite the legal consideration, LC’s action, as a commenter on Summers’ closing post indicated, seemed to fly in the face of the Working Group’s recommendation that LC consider its strengths and priorities and allow others in the community to pick up the slack and innovate for it. While LC eventually joined the Semantic Web party, so to speak, it made clear that it would be doing so only on its own terms. Its actions might also discourage innovators who might otherwise have taken the time, on their own, to prepare other LC tools or tools involving LC data for the Semantic Web.

Clearly, LC is trying to continue its work in bringing libraries into the Semantic Web and I applaud this commitment. In furtherance of this goal, I would like to see it move towards adopting or integrating the RDA vocabularies to augment or supplement its existing vocabularies and resources. Initially in their Report, the Working Group did call for the cessation of RDA development (3.2.5). This was due to the unsatisfactory business case, a lack of confidence in the benefits of the new code, and a sense that FRBR was perhaps too untried for straight implementation (pp.25). Later LC moved towards recommending a period of testing instead (See LC’s comment on the Report here). In the Report, the Working Group called for RDA/DCMI collaboration and development of a “Bibliographic Description Vocabulary” (pp.25). As seen in LC’s response to the Report, LC is still committed to supporting this work and to developing other vocabularies along the same lines (LC response, pp.41). By helping incorporate the RDA Vocabularies into RDA testing and implementation, LC could truly start moving towards cataloging in a web environment. This step could only be improved by including solid efforts towards finding a replacement carrier for data, ideally one that will interoperate with MARC. While this will undoubtedly be easier said than done, it is necessary to the future of not only bibliographic control, but of libraries themselves.

One Response to “LC and the Semantic Web”

  1. eawm Says:

    Thank you, Maggie! I learned still more from this post about LC’s ambivalence toward linked data, Semantic Web, and its own leadership role. Too bad they shut down the SKOS-generated LCSH.
