A Pain in the Metadata

Seb points to an interesting presentation on metadata written by Stefano Mazzocchi.

The presentation dances around an issue we ran into like a brick wall: quality metadata is needed to provide quality search and retrieval on large collections of material. However, the amount of human effort needed to create the metadata is directly proportional to the size of the collection.

That kind of effort does not scale well, so unless you can afford a room of librarians on staff, the work has to be done in a distributed way. The problem with distributed metadata creation is one of training. Expecting our usual web content contributors to be experts in applying our full thesaurus is not realistic. Hell, I’m not an expert in applying it either.

So what to do? I’m open to suggestions!

We are experimenting with targeting specific subcollections of high knowledge value for in-depth metadata tagging by a central expert. A ‘shallow’ representation of the full thesaurus would be used by distributed content contributors for indexing normal content on the web site.

The idea is that the high-value resources, typically used in academic research, allow for the most finely tuned searching while less valuable content is tagged in much less detail. Together, the two tiers should be supportable with existing staff resources.
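To make the ‘shallow’ idea a little more concrete, here is a rough sketch of one way it could work: collapse every thesaurus term to its ancestor at a fixed depth, so contributors only ever pick from the top couple of levels. The thesaurus below is made up for illustration, and the `BROADER` mapping, the function names, and the `max_depth` cutoff are all my assumptions rather than anything we have actually built.

```python
# Hypothetical thesaurus for illustration only: each term maps to its
# broader (parent) term, and top-level terms map to None.
BROADER = {
    "Science": None,
    "Biology": "Science",
    "Genetics": "Biology",
    "Population genetics": "Genetics",
    "History": None,
    "Medieval history": "History",
}


def path_to_root(term, broader=BROADER):
    """Return the chain of terms from the top-level term down to `term`."""
    path = []
    while term is not None:
        path.append(term)
        term = broader[term]
    return path[::-1]


def shallow_term(term, max_depth=2, broader=BROADER):
    """Collapse `term` to its ancestor that is at most `max_depth` levels deep."""
    path = path_to_root(term, broader)
    return path[min(len(path), max_depth) - 1]


def shallow_vocabulary(max_depth=2, broader=BROADER):
    """The reduced term list a distributed contributor would choose from."""
    return sorted({shallow_term(t, max_depth, broader) for t in broader})


if __name__ == "__main__":
    print(shallow_term("Population genetics"))
    print(shallow_vocabulary())
```

In a scheme like this, the central expert would tag the high-value subcollections with the full-depth terms, and a search on a deep term could still fall back to its shallow ancestor to pick up the lightly tagged content.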

I also want to explore allowing our users to rate the value of individual pages/items and see if that provides better rankings than we can do internally.
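I have not settled on how the ratings would be turned into a ranking. One common approach, sketched below purely as an assumption, is a Bayesian-style average that pulls items with only a few ratings toward a neutral prior, so a single enthusiastic vote does not outrank a page many users found useful. The 1-to-5 scale, the `prior_mean` and `prior_weight` values, and the function names are hypothetical.

```python
def bayesian_average(ratings, prior_mean=3.0, prior_weight=5):
    """Average the ratings as if `prior_weight` extra votes of `prior_mean` had been cast."""
    return (prior_mean * prior_weight + sum(ratings)) / (prior_weight + len(ratings))


def rank_items(ratings_by_item, prior_mean=3.0, prior_weight=5):
    """Return item ids ordered from highest to lowest smoothed rating."""
    return sorted(
        ratings_by_item,
        key=lambda item: bayesian_average(ratings_by_item[item], prior_mean, prior_weight),
        reverse=True,
    )


if __name__ == "__main__":
    # Made-up ratings on an assumed 1-to-5 scale.
    ratings = {
        "page-a": [5],                       # one very high rating
        "page-b": [4, 4, 5, 4, 5, 4, 4, 5],  # many good ratings
        "page-c": [],                        # unrated
    }
    print(rank_items(ratings))
```

Whether something like this beats the rankings we can produce internally is exactly what would need testing.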