Author Archive

User-generated cataloguing and its expanding role

May 1, 2010

Although it provides a set of guidelines for the library world as a whole, “On the Record”, the report of the Library of Congress Working Group on the Future of Bibliographic Control, was primarily directed at the Library of Congress itself. By June of the same year, the Library of Congress had published a report responding to the recommendations of “On the Record”. That report is available online; it makes the point early that it “is not an official program statement from the Library of Congress, nor is it an implementation plan”. However, the report is generally enthusiastic about the vision outlined in “On the Record”.

Among the recommendations of “On the Record” were three suggestions in section 4.1, “Design for Today’s and Tomorrow’s User”, regarding greater integration of library bibliographic data with external sources. The changes discussed are not those related to the possible use of FRBR to make library bibliographic data part of the Semantic Web; here “On the Record” deals with user-created content, such as ‘tags’ and user-supplied reviews. The Library of Congress responded to these suggestions by naming a number of small projects focused on similar goals, and resolving to support their work and seek out more projects. Among the projects named were:

  • The Library of Congress’s own Bibliographic Enrichment Advisory Team (BEAT), which attempts to add data such as tables of contents and reviews to bibliographic records.
  • The WPopac Project at Plymouth State University, which supports user-generated tags for records.
  • PennTags at the University of Pennsylvania Libraries, similar to WPopac.
  • The Library of Congress’s Prints and Photographs Division Flickr project, which allows users to tag photos, with significant guidelines regarding how to tag.

These projects already existed before “On the Record” appeared, and the Library of Congress did not propose any new projects in response to the report.

The fate of these projects has been mixed:

  • The BEAT project’s website has not been updated since 2008, and the project itself is not mentioned in any other articles I have been able to locate since that date; as the website describes it as an all-volunteer project, it’s possible that the volunteers involved have moved on without finding anyone to take up the slack.
  • WPopac has been renamed Scriblio, and is still being worked on; the software is in use by several libraries. It is not clear if Scriblio still supports the addition of user-generated content.
  • PennTags still exists, and the University of Pennsylvania’s Franklin Library online catalog still includes a link to “Add to PennTags” at the bottom of each record. However, I was unable to locate any records which included PennTags content. Either the system has not been utilized enough to make tags available for all or most records, or the tags are only visible to those logged into the website as members of the UPenn community; while only allowing community members to contribute tags makes sense, it seems odd that only community members would be allowed to use them for browsing.
  • However, the Library of Congress’s Prints and Photographs Flickr project has been healthy and successful. In October of 2008, a report was released discussing the progress of the project. At that point, of the close to 5000 images the LOC had uploaded to Flickr, almost all had been tagged, and more than 65,000 tags had been added in total to the collection. Users had also added helpful comments to many photos, including information such as detailed descriptions of locations and events, and links to related photos.

The Prints and Photographs project had one additional outcome: it inspired the creation of the Flickr Commons, a broader project to allow Internet users at large to assist in describing and tagging image collections of historical or cultural importance. This project currently includes collections from over 40 institutions, and is an apt demonstration of the power of user-supplied content.

The success of the Flickr Commons compared to other projects involving user-generated content suggests that the guidance recommended in “On the Record” is a vital component in developing useful user-generated content. Other factors which might have contributed to its success were a connection with an already thriving collaborative community, as Flickr has a large user base and publicized the project heavily, and ease of contributing, as photographs are relatively uncontroversial and easy to identify. Collaborating with existing sites might be a worthwhile strategy for libraries attempting to add user-generated content.

The “Response” pointed out that “the relationship of entry vocabulary to controlled terms is a challenge for all catalogs”, and that accordingly, much guidance will need to be provided to allow users to add useful and meaningful tags. It mentioned the extensive guidance provided in the Flickr project as a reason for its success. This is an important point that will need to be remembered. The “Response” also mentions the ongoing debate about pre-coordination and post-coordination of Library of Congress subject headings; the implications of user-generated content as related to subject headings are too numerous to discuss in detail here.

Although the “Response” shows good intentions, it seems that little has in fact been accomplished in response to its recommendations regarding community interaction. The Flickr Commons project arose independently of “On the Record”, although the Commons is aligned with the community-related goals stated in it. However, its success demonstrates that user-generated content can be used effectively to enhance a catalogue, and that, indeed, given a sufficiently large and motivated community, it can become a catalogue in itself.

Why have more libraries not made the leap to include user-generated content? A possible reason is identified in the “Response” itself, although not as an answer to that question: “preserving the library-created data is essential to both access and reuse in the future”. Libraries may be concerned that user-generated content would not be sufficiently differentiated from content originating with catalogers, and that the content might not be of the best possible quality. This is a legitimate concern. However, accepting user-generated data “without interfering with the integrity of library-created data” (as “On the Record” puts it) is a technical problem, not a systemic problem. Appropriate OPAC interface design should allow segregation by content source, and perhaps even allow users to hide content from external sources if they do not think it helps them locate information.
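To make the idea of segregation by content source concrete, here is a minimal sketch in Python. It is not how any real OPAC, Scriblio, or PennTags works; the names `Source`, `Annotation`, `CatalogRecord`, and `visible` are all hypothetical, chosen only for illustration. The assumption is simply that every piece of descriptive data carries a label saying where it came from, so that library-created data is never overwritten and user-supplied data can be hidden on request.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Source(Enum):
    """Where a piece of descriptive data came from (hypothetical labels)."""
    LIBRARY = "library"  # created by cataloguers
    USER = "user"        # tags, reviews, or comments supplied by patrons


@dataclass(frozen=True)
class Annotation:
    """One subject heading, tag, or comment, labelled with its source."""
    text: str
    source: Source


@dataclass
class CatalogRecord:
    """A record that stores all annotations but never mixes their sources."""
    title: str
    annotations: List[Annotation] = field(default_factory=list)

    def add(self, text: str, source: Source) -> None:
        # Adding user content appends a new entry; existing
        # library-created entries are left untouched.
        self.annotations.append(Annotation(text, source))

    def visible(self, show_user_content: bool = True) -> List[Annotation]:
        # A user who finds external content unhelpful can switch it off
        # and see only what cataloguers supplied.
        return [
            a for a in self.annotations
            if show_user_content or a.source is Source.LIBRARY
        ]
```

A catalogue interface built along these lines could display the two kinds of data in separate areas of the record, and offer a simple toggle that calls `visible(show_user_content=False)` for users who want only the library-created description.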

Included as well under the general heading of “positioning our community for the future” were the broad suggestion of additional testing of FRBR (section 4.2) and a number of specific proposals related to reshaping the Library of Congress Subject Headings (under section 4.3). One proposal was to “Make LCSH openly available for use by library and non-library stakeholders”; another was to “Transform LCSH into a tool that provides a more flexible means to create and modify subject authority data.” Again, an existing project was singled out to implement the suggestion: in this case SACO, the Subject Authority Cooperative Program, which provides a means for libraries to submit proposed changes to the LCSH quickly and easily. This project is still ongoing and is working very well; more than 50,000 proposals had been submitted as of January 2010. Currently, only libraries can participate; they are required to have access to LOC’s online authority files. Making LCSH more openly available might facilitate the goal of improving it, as it would allow more public review and suggestions for improvement. A further proposal suggested creating more linkages between LCSH and other subject authority files; the “Response” said this was technically unfeasible, although desirable. If such subject linkages were created with the assistance of interested members of the public, the unmanageably large task might become possible to complete.

The possibilities of user-generated content for enhancing catalogues are well known, and “On the Record” acknowledged and encouraged them. Although the “Response” agreed that the proposals were good ideas, so far little has been done to implement them. In that sense, the report cannot be said to have had results at all. Nonetheless, that it brought these ideas into the public eye and that LOC agreed with them in principle suggests that user-generated data has not been banished from the world of cataloguing, only put aside in the face of ongoing struggles to keep up with technology, and the ongoing RDA debate.


RDA’s ‘Legacy’ Approach

April 19, 2010

Hillmann and Coyle’s article on the problems of RDA raises a number of important points. However, unlike at least one classmate, I disagree with their conclusion that a fundamental restructuring is necessary. By using ‘legacy’ forms, RDA becomes a development of cataloging standards, rather than a completely new form of cataloging; the more different RDA is from existing forms, the more difficult the transition will be. The article was written in 2007, and since then, the standard has changed; however, the idea that RDA is ‘not different enough’ seems to be alive and well.

The other idea that is alive and well is that RDA will be too difficult to implement: that the cost of a subscription alone will prevent small libraries from using RDA, and that the complexity will scare off potential users. I don’t think that complexity is necessarily a problem in itself. As a number of commenters have pointed out (Roy Tennant, for example), cataloguing as it exists is already very complex; MARC has fields that very few people understand or use, and more are being added to deal with RDA-specific information. The problem comes when cataloguers familiar with AACR2 are asked to learn a new system from scratch, when its benefits are not readily apparent. Problems will also arise when attempting to convert existing records, if the differences between the two systems are too great; as Tennant’s post points out, some of the extra complexity is to assure ‘a smoother transition’.

It’s great to expand the cataloging universe, but we shouldn’t forget that most of a library catalog’s contents are already present and accounted for. I wince when I see articles discussing how the catalog must change if it’s to compete with Google. It isn’t. Google serves its functions well, but for anything beyond known-item searches – that is to say, for a website you’ve already visited, or that you know must exist even if you’re unsure of the URL – it’s far less than ideal, a fact all but the most casual users are already aware of. Google even has its own version of the ‘multiple-versions problem’, in the form of webpages that borrow content from Wikipedia; unlike library catalogs, it has made no attempt to solve it, and seems doomed to index the web at the ‘item’ level and only the ‘item’ level. Catalogs have a much smaller world to index, and do it better. The structured format of catalogs allows more precise searching; subject headings are even better than keyword searching at precise retrieval.

There are problems with the content of catalogs as they exist, but they do exist, and do their job well; transitioning to a new format is a Herculean enough task without tossing out what we already have. Burton stated that “Trying to fit RDA (or any new standard) into the old infrastructure seems like a waste of time, money, and brainpower in the long run because MARC will limit what we can do.” Similarly, problems have been raised with the very idea of MARC, because its foundations in the card catalog left so much of a mark. (Pun not intended.)

The trouble is that new solutions will have to be built on the old infrastructure. MARC can be, and should be, expanded to deal with additional types of information, without reducing its present usefulness. Backwards compatibility is an important goal for software design, because users of a new version will inevitably have old files they need to keep using. We already have library catalogs, and the contents of existing catalogs will continue to be the resources most catalog users are searching for. By sticking closely to current standards, RDA is ensuring that instead of a ‘clean break’, we can have a smooth transition.