The future of bibliographic control: Data infrastructure

In January 2008, the Library of Congress Working Group (LCWG) on the Future of Bibliographic Control issued a report with findings on the state of bibliographic control, recommendations for the Library of Congress and for libraries of all types across the country, and predictions of what might happen if the recommended changes do not take place. [1] Recommendations ranged from increasing the efficiency of bibliographic record production, to positioning cataloging technology – and the cataloging community as a whole – for the future, to strengthening the library and information science profession. As someone who works with other library technologies but has no cataloging experience (other than MLIS coursework), I am drawn primarily to the future of cataloging technology. As I see it, the future of bibliographic control is tied to data infrastructure.

I don’t think I did a satisfactory job of explaining my position in an earlier post; my criticism of RDA is not based on the vocabularies or the not-yet-released text but on the decision to retain the MARC21 standard while implementing RDA rules based on the FRBR model. I strongly believe that updating the metadata infrastructure will have benefits in several of the areas discussed in the LCWG report, including sharing bibliographic data, eliminating redundancies in cataloging, and strengthening the LIS community’s place in the information arena. Even before the report’s release two years ago, the cataloging community was calling for a more extensible metadata infrastructure that would permit data sharing both within and outside libraryland. [2] An important outcome (perhaps the most important outcome?) of the LCWG report is the increased discussion of the metadata infrastructure issue among the cataloging community in the literature, the blogosphere, and email listservs. [3, 4]

Reducing redundancy and increasing efficiency by sharing metadata

The first set of recommendations in the LCWG report dealt with eliminating redundancies; this goal has not yet been accomplished, but the cataloging community’s discussions about formatting records in RDA to facilitate sharing among entities within and outside libraryland are a start. Among the LCWG recommendations to increase the use of bibliographic data available earlier in the supply chain were recommendation 1.1.1.2:

“All: Analyze cataloging standards and modify them as necessary to ensure their ability to support data sharing with publisher and vendor partners;”

and recommendation 1.1.1.5:

“All: Work with publishers and other resource providers to coordinate data sharing in a way that works well for all partners.”

Not to minimize concerns about loss of bibliographic control, but libraries might as well take advantage of the metadata created elsewhere by trusted partners; if partners are selected carefully, the benefits should outweigh the risks. Shared metadata could reduce the number of redundant records and, by distributing the responsibility (or burden, if you wish) of creating metadata among more players, each party might reclaim time, funding, or manpower to put toward other efforts. Of course, these arguments are essentially theoretical until they can be tested. Getting libraries to a point where we can try sharing more data with other entities and information sources will require a shift in attitudes and comfort zones as well as a change in the technology supporting our records.

Positioning our technology for the future

Section 3 of the LCWG report called for the greater cataloging community to “Position Our Technology for the Future.” [1] The first recommendation in this section was to “develop a more flexible, extensible metadata carrier,” including:

“3.1.1.1 LC: Recognizing that Z39.2/MARC are no longer fit for the purpose, work with the library and other interested communities to specify and implement a carrier for bibliographic information that is capable of representing the full range of data of interest to libraries, and of facilitating the exchange of such data both within the library community and with related communities.”

One potential replacement for the MARC standard is RDF/XML. The RDF (Resource Description Framework) data model and XML (eXtensible Markup Language) syntax existed well before the release of the LCWG report, but they are getting more attention from the cataloging community as discussion turns to data management and the Semantic Web. [5] Although other languages might prove suitable, XML is well established and widely used, including by many of our potential partners in metadata sharing.

XML resembles HTML in that both use tags for markup, but XML is far more adaptable (“extensible”) because the user can define his or her own tags to serve as “containers” for different types of data. Those tags can then drive data manipulation and display, and unique identifiers make it possible to pull the referenced data from Web-accessible databases. Essentially, XML enables computers to read and process data, which is one of the main principles of the Semantic Web. MARC was designed to make metadata readable by machines, too (hence the name Machine Readable Cataloging), but the problem is that no one outside of libraries, publishers, and distributors is using MARC21. XML, on the other hand, is not only machine-readable but also machine-actionable, and it isn’t limited to libraries and related industries; it’s used by players in all kinds of fields.

What does this have to do with the future of bibliographic control? Packaging our metadata in an arrangement that is flexible, machine-accessible, and, perhaps more importantly, used by others outside of libraries but within the information arena would permit more give-and-take in record creation, hopefully resulting in less duplication of effort and more accurate records (as long as everyone uses the same tags, which was touched on by recommendations 3.1.1.2 and 3.1.3 in the LCWG report and is another discussion unto itself). By letting the machines do the heavy lifting, so to speak, we could then use the data more efficiently and with more confidence. This would have benefits both for the cataloging community and for our users.
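
As a rough sketch (not an excerpt from any actual catalog or standard), a simple bibliographic description expressed in RDF/XML might look like the following. The record identifier and the choice of Dublin Core element names are illustrative assumptions on my part, used only because Dublin Core is a widely recognized vocabulary with short names:

    <?xml version="1.0" encoding="UTF-8"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <!-- rdf:about holds the unique identifier for the thing being described;
           the URI below is a made-up placeholder -->
      <rdf:Description rdf:about="http://example.org/bib/12345">
        <dc:title>On the Record</dc:title>
        <dc:creator>Library of Congress Working Group on the Future of Bibliographic Control</dc:creator>
        <dc:date>2008</dc:date>
        <dc:type>Text</dc:type>
      </rdf:Description>
    </rdf:RDF>

Because each element name comes from a published vocabulary identified by a URI, any RDF-aware application can recognize the same fields without MARC-specific tooling; that is what “machine-actionable” buys us.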

Go where the patrons are, or: How I learned to stop worrying and love Web 2.0

Library users are demonstrating different search strategies than in the past; now, users often look for bibliographic information in sources like Amazon.com and Google instead of the OPAC. [6] Web-based tools like LibraryThing pull in bibliographic metadata, reviews, and references to the item found elsewhere online (such as on Wikipedia). Sources like Amazon.com and Google are often more intuitive than a typical OPAC, so it’s not surprising that users gravitate toward what they are comfortable with. Instead of watching from the sidelines, libraries should join in and take advantage of the metadata that’s already available on the Web. The phrase “Go where your patrons/users/customers are” is often applied to libraries’ use of Web-based technologies and social media, and it applies here too.

In addition to importing jacket cover images and professionally generated reviews from non-library sources, some library OPACs are also satisfying users’ desire to contribute content like ratings, reviews, and comments. Despite the increase in user-generated content, and users’ evident desire to create it, libraries want to maintain bibliographic control by not permitting users to edit catalog data. Maintaining control in this manner is understandable, given that most users have no cataloging training, but it seems that libraries could harvest some data from users – with limits on what can be edited – with less effort than doing all original cataloging and without sacrificing the integrity of data created by trained catalogers. In other words, wouldn’t some help be better than no help? I don’t think the question can be answered adequately without giving it a shot. It is in the best interest of the LIS profession to implement and embrace the Web 2.0 features our patrons want; we can benefit from the give-and-take of metadata with patrons and other sources while keeping ourselves relevant as an online source of information.
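
One hypothetical way to allow patron contributions without giving up control of cataloger-created data is to keep the two kinds of fields in separate namespaces, so a system could open the latter to edits while locking the former. Everything under the “ugc” namespace below is invented for illustration and is not part of any existing schema:

    <?xml version="1.0" encoding="UTF-8"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/"
             xmlns:ugc="http://example.org/user-generated#">
      <rdf:Description rdf:about="http://example.org/bib/12345">
        <!-- Cataloger-created data: locked against patron edits -->
        <dc:title>On the Record</dc:title>
        <dc:creator>Library of Congress Working Group on the Future of Bibliographic Control</dc:creator>
        <!-- Patron-contributed data: editable, kept under a made-up "ugc" namespace -->
        <ugc:tag>cataloging</ugc:tag>
        <ugc:rating>4</ugc:rating>
        <ugc:comment>Worth reading before the RDA debates heat up again.</ugc:comment>
      </rdf:Description>
    </rdf:RDF>

The point of the split is simply that “some help” can be accepted and displayed without ever overwriting the fields trained catalogers created.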

Still waiting for the right outcome

In these areas, the outcome of the LCWG report so far has been more discussion than decision. In addition to a data container that will work with others inside and outside libraryland, a new data structure would, ideally, give catalogers linked access to standard vocabularies and make room for newer forms of metadata like user-generated ratings, reviews, and tags. Developing standards is such an intricate and complex process, though, that it is better to take the time to examine the situation thoroughly and try to get it right the first time than to rush into a “solution” that does not support the desired functions and lacks long-term viability. That was part of the reasoning behind LCWG’s recommendation 3.2.5 to suspend work on RDA – “Assurance that RDA is based on practical realities as well as on theoretical constructs will improve support for the code in the bibliographic control community” (p. 30) – a recommendation which has not been adopted by the Joint Steering Committee for Development of RDA. The retention of MARC21 will have implications for libraries’ ability to act on other LCWG recommendations which might be realized sooner with the proper metadata infrastructure.
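
As a sketch of what “linked access to standard vocabularies” could look like, a subject could be recorded as a pointer to an identifier published by the vocabulary’s maintainer rather than as a locally keyed text string. The URI below is a placeholder of my own, not a real authority identifier:

    <?xml version="1.0" encoding="UTF-8"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description rdf:about="http://example.org/bib/12345">
        <!-- The subject is a link to the vocabulary's own identifier (placeholder URI),
             so the preferred label and cross-references are maintained at the source
             and simply referenced here -->
        <dc:subject rdf:resource="http://example.org/vocab/subjects/cataloging"/>
      </rdf:Description>
    </rdf:RDF>

If every catalog pointed at the same identifier, a change to the heading would only need to be made once, at the source, rather than re-keyed in every local record.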

Notes

  1. Library of Congress Working Group on the Future of Bibliographic Control. (2008). On the Record: Report of the Library of Congress Working Group on the Future of Bibliographic Control. January 2008. Retrieved from: http://www.loc.gov/bibliographic-future/news/lcwg-ontherecord-jan08-final.pdf.
  2. See, for example: Coyle, K. & Hillmann, D. (2007). Resource Description and Access (RDA): Cataloging rules for the 20th century. D-Lib Magazine, 13(1/2). Retrieved from: http://www.dlib.org/dlib/january07/coyle/01coyle.html.
  3. Coyle, K. (2010). RDA vocabularies for a twenty-first-century data environment. Library Technology Reports, 46(2). Retrieved from: http://alatechsource.metapress.com/content/k7r37m5m8252/?p=f27fdbe2e2904acfbea08ee4c96e8ad8&pi=1 (links to each of the six chapters/articles available here).
  4. RDA-L (RDA email listserv). Retrieved from: http://www.mail-archive.com/rda-l@listserv.lac-bac.gc.ca/.
  5. See, for example: Coyle, K. (2010). Understanding the Semantic Web: Bibliographic Data and Metadata. Library Technology Reports, 46(1). Retrieved from: http://alatechsource.metapress.com/content/g212v1783607/?p=a596ecbea377451cbc6a72c8e28bb711&pi=2 (links to each of the three chapters/articles available here).
  6. De Rosa, C. et al. (2005). Perceptions of Libraries and Information Resources. Dublin, OH: OCLC Online Computer Library Center. Retrieved from: http://www.oclc.org/reports/pdfs/Percept_all.pdf.