
The future of bibliographic control: Data infrastructure

May 3, 2010

In January 2008, the Library of Congress Working Group (LCWG) on the Future of Bibliographic Control issued a report with findings on the state of bibliographic control, recommendations for the Library of Congress as well as libraries of all types across the country, and predictions of what might happen if recommended changes do not take place. [1] Recommendations ranged from increasing the efficiency of bibliographic record production, to positioning cataloging technology – and the cataloging community as a whole – for the future, to strengthening the library and information science profession. As someone who works with other library technologies but has no experience with cataloging (other than MLIS coursework), my interest in this topic lies primarily in the future of cataloging technology. As I see it, the future of bibliographic control is tied to data infrastructure.

I don’t think I did a satisfactory job of explaining my position in an earlier post; my criticism of RDA is based not on the vocabularies or the not-yet-released text but on the decision to retain the MARC21 standard while implementing RDA rules based on the FRBR model. I strongly believe that updating the metadata infrastructure will have benefits in several of the areas discussed in the LCWG report, including sharing bibliographic data, eliminating redundancies in cataloging, and strengthening the LIS community’s place in the information arena. Even before the report’s release two years ago, the cataloging community was calling for a more extensible metadata infrastructure that would permit data sharing both within and outside libraryland. [2] An important outcome (perhaps the most important outcome?) of the LCWG report is the increased discussion of the metadata infrastructure issue among the cataloging community in the literature, the blogosphere, and email listservs. [3, 4]

Reducing redundancy and increasing efficiency by sharing metadata

The first set of recommendations in the LCWG report dealt with eliminating redundancies. This goal has not yet been accomplished, but the cataloging community’s discussions about formatting RDA records to facilitate sharing with entities within and outside libraryland are a start. Among the LCWG recommendations to increase the use of bibliographic data available earlier in the supply chain were recommendation 1.1.1.2:

“All: Analyze cataloging standards and modify them as necessary to ensure their ability to support data sharing with publisher and vendor partners;”

and recommendation 1.1.1.5:

“All: Work with publishers and other resource providers to coordinate data sharing in a way that works well for all partners.”

Not to minimize concerns about loss of bibliographic control, but libraries might as well take advantage of the metadata created elsewhere by trusted partners; if partners are selected carefully, the benefits should outweigh the risks. Shared metadata could reduce the number of redundant records and, by distributing the responsibility (or burden, if you wish) of creating metadata among more players, each party might reclaim time, funding, or manpower to put toward other efforts. Of course, these arguments are essentially theoretical until they can be tested. Getting libraries to a point where we can try sharing more data with other entities and information sources will require a shift in attitudes and comfort zones as well as a change in the technology supporting our records.

Positioning our technology for the future

Section 3 of the LCWG report called for the greater cataloging community to “Position Our Technology for the Future.” [1] The first recommendation in this section was to “develop a more flexible, extensible metadata carrier,” including:

“3.1.1.1 LC: Recognizing that Z39.2/MARC are no longer fit for the purpose, work with the library and other interested communities to specify and implement a carrier for bibliographic information that is capable of representing the full range of data of interest to libraries, and of facilitating the exchange of such data both within the library community and with related communities.”

One potential replacement for the MARC standard is RDF/XML. The RDF (Resource Description Framework) data model and XML (eXtensible Markup Language) syntax existed well before the release of the LCWG report but are getting more attention from the cataloging community as discussion turns to data management and the Semantic Web. [5] Although other languages might prove suitable, XML is well established and in wide use, including by many of our potential partners in metadata sharing.
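
To give a flavor of the RDF data model without committing to any particular serialization, here is a minimal sketch in Python. The identifiers and prefixes (book:1, dc:title) are placeholders of my own invention, though dc: is meant to suggest the real Dublin Core vocabulary:

```python
# Minimal sketch of the RDF data model: every statement is a
# (subject, predicate, object) triple. Identifiers are placeholders,
# not real URIs.
triples = [
    ("book:1", "dc:title", "On the Record"),
    ("book:1", "dc:creator", "Library of Congress Working Group"),
    ("book:1", "dc:date", "2008"),
]

# Any party can add statements about the same subject, and a machine
# can answer simple questions by pattern-matching over the triples.
def objects_for(subject, predicate):
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects_for("book:1", "dc:title"))  # -> ['On the Record']
```

Because the model is just statements about identified things, records from different sources can be merged by pooling their triples, which is exactly the kind of sharing the LCWG report calls for.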

XML resembles HTML in its use of tags for formatting (“markup”) but is far more adaptable (“extensible”) because the user can define his or her own tags to serve as “containers” for different types of data. Programs can then manipulate and display the data by referencing those tags, using unique identifiers to pull records from Web-accessible databases. Essentially, XML enables computers to read and process data, which is one of the main principles of the Semantic Web.

MARC was designed to make metadata readable by machines, too (hence the name: Machine Readable Cataloging), but no one outside of libraries, publishers, and distributors uses MARC21. XML, on the other hand, is not only machine-readable but also machine-actionable, and it isn’t limited to libraries and related industries; it’s used in all kinds of fields. What does this have to do with the future of bibliographic control? Packaging our metadata in a format that is flexible, machine-actionable, and, perhaps most importantly, used by others outside of libraries but within the information arena would permit more give-and-take in record creation, hopefully resulting in less duplication of effort and more accurate records (as long as everyone uses the same tags, which recommendations 3.1.1.2 and 3.1.3 of the LCWG report touch on and which is another discussion unto itself). By letting the machines do the heavy lifting, so to speak, we could use the data more efficiently and with more confidence, to the benefit of both the cataloging community and our users.
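
The “tags as containers” idea can be sketched in a few lines of Python using the standard library’s XML parser. The element names here (record, title, creator) are hypothetical illustrations, not an actual cataloging schema:

```python
# Sketch of XML as a "container" for bibliographic data, using
# Python's standard library. The element names are hypothetical,
# not a real cataloging schema.
import xml.etree.ElementTree as ET

# A user-defined set of tags holding one record.
xml_record = """
<record>
  <title>Perceptions of Libraries and Information Resources</title>
  <creator>De Rosa, Cathy</creator>
  <identifier>urn:isbn:0000000000</identifier>
</record>
"""

root = ET.fromstring(xml_record)

# Because the tags name the data they contain, a program can pull
# out individual fields without knowing anything about MARC.
title = root.findtext("title")
creator = root.findtext("creator")
print(title)
print(creator)
```

The point is that any general-purpose XML tool, in any industry, can read this record, whereas parsing MARC21 requires software written specifically for libraries.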

Go where the patrons are, or: How I learned to stop worrying and love Web 2.0

Library users’ search strategies are changing; many now look for bibliographic information in sources like Amazon.com and Google instead of the OPAC. [6] Web-based tools like LibraryThing pull in bibliographic metadata, reviews, and references to the item found elsewhere online (such as on Wikipedia). Sources like Amazon.com and Google are often more intuitive than a typical OPAC, so it’s not surprising that users gravitate toward what they are comfortable with. Instead of watching from the sidelines, libraries should join in and take advantage of the metadata that’s already available on the Web. The phrase “Go where your patrons/users/customers are” is often applied to libraries’ use of Web-based technologies and social media, and it applies here too.

In addition to importing jacket cover images and professionally generated reviews from non-library sources, some library OPACs are also satisfying users’ desire to contribute content like ratings, reviews, and comments. Despite the growth of user-generated content, and users’ evident desire to create it, libraries want to maintain bibliographic control by not permitting users to edit catalog data. Although maintaining control in this manner is understandable, given that most users have no cataloging training, it seems that libraries could harvest some data from users – with limitations on what can be edited – with less effort than doing all original cataloging and without sacrificing the integrity of data created by trained catalogers. In other words, wouldn’t some help be better than no help? I don’t think the question can be answered adequately without giving it a shot. It is in the best interest of the LIS profession to implement and embrace the Web 2.0 features our patrons want; we can benefit from the give-and-take of metadata with patrons and other sources while keeping ourselves relevant as an online source of information.
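
As a purely hypothetical sketch (not drawn from any real ILS), “limitations on what can be edited” could be as simple as an allowlist of patron-editable fields:

```python
# Hypothetical sketch of limiting patron edits to an allowlist of
# fields, so cataloger-created data stays untouched. Field names
# and the record structure are invented for illustration.
EDITABLE_BY_PATRONS = {"rating", "review", "tags"}

def apply_patron_edit(record, field, value):
    """Accept a patron edit only for allowlisted fields."""
    if field not in EDITABLE_BY_PATRONS:
        return False  # reject edits to controlled fields
    record[field] = value
    return True

record = {"title": "Cataloging rules", "rating": None}
print(apply_patron_edit(record, "rating", 4))   # accepted
print(apply_patron_edit(record, "title", "x"))  # rejected
```

Even a crude gate like this would let libraries experiment with harvesting patron contributions while keeping bibliographic control over the fields that matter.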

Still waiting for the right outcome

In these areas, the outcome of the LCWG report has been more discussion than decision. In addition to a data container that will work with others inside and outside libraryland, a new data structure would, ideally, give catalogers linked access to standard vocabularies and provide for newer forms of metadata like user-generated ratings, reviews, and tags. Developing standards is such an intricate and complex process, though, that it is better to take the time to examine the situation thoroughly and try to get it right the first time than to rush into a “solution” that does not support the desired functions and lacks long-term viability. That was part of the reasoning behind the LCWG’s recommendation 3.2.5 to suspend work on RDA – “Assurance that RDA is based on practical realities as well as on theoretical constructs will improve support for the code in the bibliographic control community” (p. 30) – a recommendation which has not been adopted by the Joint Steering Committee for Development of RDA. The retention of MARC21 will have implications for libraries’ ability to incorporate other LCWG recommendations, which might be realized sooner with the proper metadata infrastructure.

Notes

  1. Library of Congress Working Group. (2008). Report of the Library of Congress Working Group on the Future of Bibliographic Control. On the Record, January 2008. Retrieved from: http://www.loc.gov/bibliographic-future/news/lcwg-ontherecord-jan08-final.pdf.
  2. See, for example: Coyle, K. & Hillmann, D. (2007). Resource Description and Access (RDA): Cataloging rules for the 20th century. D-Lib Magazine, 13(1/2). Retrieved from: http://www.dlib.org/dlib/january07/coyle/01coyle.html.
  3. Coyle, K. (2010). RDA vocabularies for a twenty-first-century data environment. Library Technology Reports, 46(2). Retrieved from: http://alatechsource.metapress.com/content/k7r37m5m8252/?p=f27fdbe2e2904acfbea08ee4c96e8ad8&pi=1 (links to each of the six chapters/articles available here).
  4. RDA-L (RDA email listserv). Retrieved from: http://www.mail-archive.com/rda-l@listserv.lac-bac.gc.ca/.
  5. See, for example: Coyle, K. (2010). Understanding the Semantic Web: Bibliographic Data and Metadata. Library Technology Reports, 46(1). Retrieved from: http://alatechsource.metapress.com/content/g212v1783607/?p=a596ecbea377451cbc6a72c8e28bb711&pi=2 (links to each of the three chapters/articles available here).
  6. De Rosa, C. et al. (2005). Perceptions of Libraries and Information Resources. Dublin, OH: OCLC Online Computer Library Center. Retrieved from: http://www.oclc.org/reports/pdfs/Percept_all.pdf.

Let’s be brave!

April 17, 2010

From what I’ve read and heard in the course so far, I’m finding many reasons to think RDA is not the new standard we need; it simply isn’t enough of a change. Karen Coyle and Diane Hillmann’s 2007 paper makes a complete and concise argument against it. [1]

Trying to fit RDA to MARC feels like we’re holding ourselves back. Don’t get me wrong; MARC served its purpose in moving records from the card catalog into machine-readable form. I’m sure that was heady stuff in its day. But technology has developed so much since then – if we’re building new systems, why not use new models and new tools? I thought libraries were supposed to be on the cutting edge of technology, incorporating it into our collections and introducing it to our patrons. Some of the resistance to a new standard – one that’s not so closely tied to the current way of doing things – feels like fear of the unknown. Let’s be brave!

Count me among those who wish to see libraries enter the digital age like we mean it. As Coyle and Hillmann point out, a library’s signature service is the catalog. [1] If libraries are to compete with services like Google and Amazon, our catalogs must offer the same ease of use and ability to connect users with the materials they seek. Ultimately the patron just wants to find what he or she is looking for. I’m all for the value-added services a librarian can provide in helping a patron locate materials, but we also need a cataloging standard that will support the functions of identifying and finding materials without an expert guide. And, from the cataloger’s perspective, we need a standard that is efficient and easy to use. Thanks to keyword searching, we’re no longer limited in our choice of access points. At PLA last month, I heard the same thing over and over in the sessions I attended: meet your patrons where they are. New cataloging standards should reflect the way users search for data.

Patron search behavior isn’t the only thing that has changed with the digital era, of course; today libraries regularly collect more than print materials (true, many special libraries have always collected more than print – art and music, for example – but here I refer to other types of libraries as well). DVDs, CDs, electronic periodicals, websites, and web-based databases have joined the books and print journals. The current standards are no longer adequate for non-book materials, especially those without title pages or colophons, and for resources published in several formats. [1] Schneider, as cited by Coyle and Hillmann, touched on this as well: “The ‘multiple versions problem,’ [is] one of the more glaring ways that current cataloging rules no longer serve the library’s users, and even hinder the ability of systems designers to provide an efficient service for library catalog users.” [2]

Trying to fit RDA (or any new standard) into the old infrastructure seems like a waste of time, money, and brainpower in the long run because MARC will limit what we can do. A markup language like XML, on the other hand, is more flexible and customizable. It can be adapted by other industries or entities outside libraryland, in keeping with the Joint Steering Committee’s (JSC’s) goals: “The new standard is being developed for use primarily in libraries, but consultations are being undertaken with other communities (archives, museums, publishers, etc.) in an effort to attain an effective level of alignment between RDA and the metadata standards used in those communities.” [3] I agree with Coyle and Hillmann’s assertion that it is “[f]ar better not to ‘stay the course’ on RDA, but to set a new goal to achieve consensus on the top layer: model, basic principles and general rules, and leave the details to the specialized communities.” [1]
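
To make the flexibility point concrete, here is a small sketch in Python: a new, user-defined element is appended to an XML record without breaking software that only knows the original tags. The element names are hypothetical, not part of any real schema:

```python
# Sketch of XML's extensibility: adding a new, user-defined element
# to a record does not break consumers that only know the original
# tags. Element names are hypothetical.
import xml.etree.ElementTree as ET

record = ET.fromstring(
    "<record>"
    "<title>How OPACs Suck</title>"
    "<format>website</format>"
    "</record>"
)

# A library later decides to carry patron-generated ratings: just
# define a new container element and append it.
rating = ET.SubElement(record, "patronRating")
rating.text = "4"

# An older consumer that only looks for <title> still works...
print(record.findtext("title"))
# ...while a newer one can read the added field.
print(record.findtext("patronRating"))
```

In MARC, by contrast, carrying a new kind of data means negotiating a change to a fixed, library-specific field structure, which is exactly the rigidity being argued against here.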

With the amount of resources invested in the development of new standards, why aren’t we aiming higher? I’m not a cataloger; I don’t have years of experience to inform my opinion. But I think we’re setting ourselves up for a system that will need another revision sooner than later.

1. Coyle, K. & Hillmann, D. (2007). Resource Description and Access (RDA): Cataloging rules for the 20th century. D-Lib Magazine, 13(1/2). Retrieved from: http://www.dlib.org/dlib/january07/coyle/01coyle.html.

2. Schneider, K. (2006). How OPACs suck. ALA TechSource. Retrieved from: http://www.techsource.ala.org/blog/2006/03/how-opacs-suck-part-1-relevance-rank-or-the-lack-of-it.html (Part 1); http://www.techsource.ala.org/blog/2006/04/how-opacs-suck-part-2-the-checklist-of-shame.html (Part 2); http://www.techsource.ala.org/blog/2006/05/how-opacs-suck-part-3-the-big-picture.html (Part 3).

3. Joint Steering Committee for the Revision of Anglo-American Cataloguing Rules. RDA: Resource Description and Access: Prospectus. Retrieved from: http://www.collectionscanada.ca/jsc/rdaprospectus.html.