LC and the Semantic Web

May 1, 2010

Early in 2008, the Library of Congress Working Group on the Future of Bibliographic Control published its report, On the Record. The Report, directed at both the Library of Congress and the American bibliographic community at large, was to serve as a “‘call to action’ that informs and broadens participation in discussion and debate, conveys a sense of urgency, stimulates collaboration, and catalyzes thoughtful and deliberate action” (p. 3). The Working Group recognized that the “future of bibliographic control will be collaborative, decentralized, international in scope, and Web-based” (p. 4). Pursuant to this assertion, the Working Group made a number of recommendations that push libraries ever closer to participation in the Semantic Web. The Semantic Web, with its inherent organization and use of linked data, is an ideal venue for libraries and, more importantly, for library data. By putting library data into the structure of the web itself, and thus allowing these data to be used by a wide variety of information communities, libraries can gain a greater sense of relevance and importance in an increasingly digital world.

This push of library practice and standards toward the Semantic Web, together with the clear articulation of both the benefits of change and the problems inherent in the status quo, is perhaps one of the most important outcomes of the Report since its publication. Yet this push has not been without difficulties or setbacks. While the Report calls for sweeping change and increased collaboration and communication, the actions of the Library of Congress and the reluctance of the library community do not necessarily echo this charge. As discussed in the Report and on the message boards for this course, the fate of libraries as significant information providers hangs on their ability to follow their users into the web. The third recommendation of the Report exhorts libraries to position themselves and their technology for the future “by recognizing that the World Wide Web is both our technology platform and the appropriate platform for the delivery of our standards” (p. 5). Additionally, the library community must recognize that “machine applications” are also potential users of “the data we produce in the name of bibliographic control” (p. 5). Currently, libraries, via their catalogs, are on the web. However, the data that libraries produce are locked within their databases. These data are not in the web, in that they cannot be used, shared, mashed up, or effectively linked to (or, at least, not with any real ease). By remaining on the outskirts of what has become a flourishing information and communication platform, libraries do themselves, and their patrons, a great disservice.

The Working Group focuses the efforts of libraries and LC on entering the Semantic Web by advocating a change in the standards libraries currently use to maintain and share their data. One standard in particular is MARC. I have already written about what I perceive to be the limitations of MARC in the Semantic Web. In 3.1.1, the Working Group recognizes that “Z39.2/MARC are no longer fit” standards for metadata and calls on the Library of Congress to “work with the library and other interested communities” to create a metadata carrier that will be amenable to libraries and that will allow libraries to exchange data with other information communities (p. 25). By moving away from the MARC “stack” and actively collaborating with other information communities, libraries will be well placed to interact within a web environment.

This is a fairly bold statement, particularly coming from an institution such as the Library of Congress, which currently holds responsibility for maintaining MARC 21 (p. 7). While this is something that other librarians and information professionals had been discussing, coming from the Library of Congress it carries a certain amount of weight. Even if LC does not have the mandate (p. 6), or the matching funding, to be the “national” library, it has undertaken this role, and it certainly leads by example. In this Report, LC demonstrates its open-mindedness and practicality by looking the future square in the eye.

However, I am unconvinced that LC has made much headway in this area, a shift that I recognize is easier said than done. While bibliographic utilities such as OCLC [cite] can convert library data into other, more interoperable standards, the library community as a whole is still MARC-based. On April 21st, LC released more information on its testing of RDA. This testing is still inherently MARC-based, with additional fields added to bibliographic records to indicate manifestations, while works and expressions will be created as MARC authority records. I am saddened that LC is not taking the opportunity presented by the new cataloging code to embrace a new carrier, instead of adding more complexity to the already dated MARC standard. This also, as Elizabeth Mitchell pointed out in class discussion, directly contradicts 3.1.3 of the Report, which calls for the entire library community to “include standard identifiers for individual data elements in bibliographic records.” MARC currently favors textual strings, not the URIs from which the Semantic Web draws its power. While I acknowledge that the blow of a new standard might be softened by encoding it in a familiar way, this could be a setback for libraries in their effort to enter the web.

The Library of Congress has been much more successful in preparing and releasing its vocabularies in forms more conducive to sharing via the web. This is a very important and very impressive move, as the controlled vocabularies used by libraries are, in part, what has led to the richness and coherence of our bibliographic descriptions. The web-friendly version of LCSH is located here: http://id.loc.gov/authorities/. In the “About” section, LC acknowledges the “Linked Data” community and provides a list of other vocabularies and codes that will soon receive the same treatment. Benefits for both users and machines are outlined. Users can download entire vocabularies in RDF/XML or N-Triples. Here, LC follows its own suggestions and embraces the power of the URI. Thus “Fencing coaches” can be found at http://id.loc.gov/authorities/sh93010603, a location based on an alphanumeric identifier rather than the usual matching of textual strings. Other communities on the web can now use this concept via this link or its RDF/XML format, forge links between this particular URI and similar or related concepts, and generally enhance what is already a pretty powerful tool. This tool also raises the profile of libraries, not only by bringing library data out into the web but also by demonstrating that libraries are now willing and eager to share and play with the rest of the information community.
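To make the difference between string matching and URI-based identification a little more concrete, here is a minimal sketch in Python that writes a few N-Triples statements about “Fencing coaches.” It assumes the address cited above stands in for the concept (the exact URI form LC uses for the concept itself may differ), uses the W3C SKOS terms skos:Concept, skos:prefLabel, and skos:closeMatch, and invents a hypothetical concept at example.org to play the role of another community’s vocabulary. The helper functions iri, literal, and triple are illustrative only, not part of any LC tool.

```python
# Minimal sketch (assumptions noted): serialize a few N-Triples statements
# that identify the LCSH heading "Fencing coaches" by URI, not by its text.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Address cited in the post; LC's canonical concept URI form may differ.
LCSH_CONCEPT = "http://id.loc.gov/authorities/sh93010603"
# Hypothetical concept from some other community's vocabulary.
LOCAL_CONCEPT = "http://example.org/vocab/fencing-coach"


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text: str, lang: str = "en") -> str:
    """Quote a language-tagged label, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def triple(subject: str, predicate: str, obj: str) -> str:
    """Join already-serialized terms into one N-Triples line."""
    return f"{subject} {predicate} {obj} ."


statements = [
    triple(iri(LCSH_CONCEPT), iri(RDF_TYPE), iri(SKOS + "Concept")),
    triple(iri(LCSH_CONCEPT), iri(SKOS + "prefLabel"), literal("Fencing coaches")),
    # Another community links its own concept to LC's URI instead of
    # trying to match on the heading string.
    triple(iri(LOCAL_CONCEPT), iri(SKOS + "closeMatch"), iri(LCSH_CONCEPT)),
]

print("\n".join(statements))
```

The design point is that the heading text appears only once, as a label attached to the identifier; every other statement, including the hypothetical link from outside the library world, points at the URI, so a change in wording or spelling of the heading would not break the link.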

Yet this move was not without apparent difficulty and controversy. LC launched this particular system only after it asked LC employee Ed Summers to shut down his own SKOS-generated version of LCSH, formerly located at http://lcsh.info. (More information on the creation of this vocabulary can be found here: http://arxiv.org/abs/0805.2855.) Though LC’s version went live less than a year after Summers’ site came down, by terminating this innovative service, one already in use by others in the metadata and library communities, the Library of Congress looked somewhat reactionary, if not backward (http://lcsh.info/comments1.html). I understand that LC might want more centralized control over bibliographic tools it has developed over many years, but I am not fully convinced that this aggregated information, added to for years by librarians around the country, is solely within its domain. Legal considerations aside, LC’s action, as a commenter on Summers’ closing post indicated, seemed to fly in the face of the Working Group’s recommendation that LC consider its strengths and priorities and allow others in the community to pick up the slack and innovate for it. While LC eventually joined the Semantic Web party, so to speak, it made clear that it would do so only on its own terms. Its actions might also discourage innovators who might otherwise have taken the time, on their own, to prepare other LC tools, or tools involving LC data, for the Semantic Web.

Clearly, LC is trying to continue its work in bringing libraries into the Semantic Web, and I applaud this commitment. In furtherance of this goal, I would like to see it move toward adopting or integrating the RDA vocabularies to augment or supplement its existing vocabularies and resources. Initially, in the Report, the Working Group called for the cessation of RDA development (3.2.5). This was due to an unsatisfactory business case, a lack of confidence in the benefits of the new code, and a sense that FRBR was perhaps too untried for immediate implementation (p. 25). LC later moved toward recommending a period of testing instead (see LC’s comment on the Report here). In the Report, the Working Group also called for RDA/DCMI collaboration and the development of a “Bibliographic Description Vocabulary” (p. 25). As seen in LC’s response to the Report, it remains committed to supporting this work and to developing other vocabularies along the same lines (LC response, p. 41). By helping incorporate the RDA Vocabularies into RDA testing and implementation, LC could truly start moving toward cataloging in a web environment. This step could only be improved by solid efforts toward finding a replacement carrier for data, ideally one that will interoperate with MARC. While this will undoubtedly be easier said than done, it is necessary to the future not only of bibliographic control but of libraries themselves.

A “Dear MARC” Letter…

April 16, 2010

Yee’s 2004 article on the issues surrounding the use of MARC 21 provides a much-needed analysis of an oft-maligned, and perhaps somewhat misunderstood, standard. Admittedly, I am someone who has taken MARC cataloging as a given in libraries and who has mistakenly, like many in the library community, conflated the data structure standard (MARC) with the data content standard (AACR2R) (p. 4). As Yee amply demonstrates, this conflation has led people to lay all of the ills and woes of cataloging today at MARC’s complicated and confusing feet. Yet the fact that this conflation of MARC and AACR2 is so prevalent in the library community is quite worrisome, especially when considering a future of cataloging in which at least one of those standards is already slated to be replaced. It is small wonder that there is some resistance to a MARC-less cataloging environment if MARC is, to many, cataloging in all of its complex and arcane glory. But if we as a community are to cooperate and innovate with those outside of libraries, an act that many see as necessary to the very survival of libraries, our standards must become simpler, or at least the lines between them less obscured. What this means for MARC, however, isn’t quite clear.

Yee also raises an important challenge to the current notion of shared cataloging and demonstrates how its implementation is hampering future innovation within the library community (p. 10). Throughout her carefully researched and clearly composed article, Yee cites numerous ways that MARC could be adjusted or better implemented to assist this transition and to improve life for our users. While Yee questions the way the library community currently uses MARC and how vendors shape this use, she never questions whether it is the right standard for libraries or for information organization in general. For all her suggestions and concerns, she still advocates for MARC. I find this troubling, as MARC is a standard that, despite its “machine readability,” is deeply steeped in the history of the card catalog. MARC was designed to carry information based on the card, information that is necessarily heavy on textual strings (a huge issue in the digital age, though Yee proposes excellent ways to mitigate it). If we are moving forward with a new cataloging content standard (RDA), does this necessitate a new data carrier? If MARC is already so woefully underused in our current practice, and if our vendors cannot seem to make it work for us in this new environment (though that is perhaps an entirely separate post), why bring it forward with us? Unless the shared cataloging community and vendors can change in the ways that Yee suggests, I am worried that MARC might continue to hamper the library community for years to come.

Additionally, the notion of what constitutes a shared cataloging community is another point where I differ with Yee. I did not leave Yee’s article with the sense that the shared cataloging community should be expanded to include systems or people from outside the library (and thus MARC-based) community. In her final paragraph (p. 29), Yee notes that libraries, through MARC, have been developing a semantic web for years, and thus we should “be careful not to destroy what we have in a rush to emulate the rest of the world, which may be on the threshold of recognizing its own need to develop solutions similar to the ones we in the library world already employ.” While I agree with many of the points on her “shopping list” of improvements to both MARC use and shared bibliographic systems, I can’t help but wonder whether MARC is still the way to go, which is easy for me to say, since I have known MARC for only a few short years, not a lifetime. Moving forward and taking into account the innovation of “the rest of the world” is perhaps where we need to be as a community, for “the rest” aren’t hampered by MARC or the card. They’re already playing nicely with each other. If we can’t get MARC to play along as well, maybe it’s time to retire a standard that still seems so imperfect and inflexible.

****

Yee, M. (2004). New perspectives on the shared cataloging environment and a MARC 21 shopping list. Library Resources & Technical Services, 48(3), 165-178. Retrieved from http://escholarship.org/uc/item/6z76m6p9