Thesauri and Web Logs

A common tool used in knowledge management is the thesaurus. Definitions vary, but I’ll use this one for our purposes here:

Thesaurus: The vocabulary of a controlled indexing language, formally organized so that the a priori relationships between concepts (for example as “broader” and “narrower”) are made explicit. (ISO 2788, 1986:2)

A thesaurus is not only a list of keywords (or terms) and their synonyms: it also embodies an overall hierarchy of related terms. These relationships can be compared to Yahoo!’s branching subject index.  An XML DTD already exists to document these relationships between terms in a thesaurus.
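To make the structure concrete, here is a minimal sketch of how terms, synonyms, and broader/narrower links might be held in memory. The class names, method names, and sample terms are all hypothetical; this is not the XML DTD mentioned above, just one plausible way to model the same relationships.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Term:
    """One preferred term plus its relationships (hypothetical model)."""
    label: str
    synonyms: Set[str] = field(default_factory=set)
    broader: Set[str] = field(default_factory=set)
    narrower: Set[str] = field(default_factory=set)


class Thesaurus:
    def __init__(self) -> None:
        self.terms: Dict[str, Term] = {}
        # Lowercased label or synonym -> preferred label.
        self._lookup: Dict[str, str] = {}

    def add_term(self, label, synonyms=(), broader=()) -> Term:
        term = self.terms.setdefault(label, Term(label))
        self._lookup[label.lower()] = label
        for synonym in synonyms:
            term.synonyms.add(synonym)
            self._lookup[synonym.lower()] = label
        for parent in broader:
            # Record the link in both directions so that "narrower"
            # is always the mirror image of "broader".
            parent_term = self.terms.setdefault(parent, Term(parent))
            self._lookup.setdefault(parent.lower(), parent)
            term.broader.add(parent)
            parent_term.narrower.add(label)
        return term

    def preferred(self, word: str) -> Optional[str]:
        """Map a label or synonym to its preferred term, if known."""
        return self._lookup.get(word.lower())

    def ancestors(self, label: str) -> List[str]:
        """All broader terms reachable from `label`, sorted by name."""
        seen: Set[str] = set()
        stack = list(self.terms[label].broader)
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.terms[current].broader)
        return sorted(seen)
```

With terms such as "Weblogs" (synonym "blogs") filed under "Knowledge management", `preferred("Blogs")` would resolve to "Weblogs", which is the behaviour that lets everyone keyword with the same vocabulary.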

The importance of a thesaurus to knowledge management is that it gives a common language to users who are keywording content for an index. If everyone agrees to use the same terms for the same meaning then metadata indexes become much more effective. Consistent relationships can then be inferred among documents and other content.

Thesauri have to be living documents if they are to remain effective. New terms must be added as the language of a particular field changes. Existing terms may need to be refined or even retired if they fall out of use. This requires a human to manage the thesaurus based on feedback from the users of that thesaurus.

So how could a thesaurus be used with a blog network?  Here are some ideas:

  • Intranet bloggers use thesaurus terms to create categories for their web logs. Readers on the intranet could then see blog posts made by anyone on the network for a particular thesaurus term.  Links to related, broader and narrower categories could be created automatically.  The result is essentially a meta-blog of content based on commonly used thesaurus terms.
  • The preceding idea could also be done by assigning thesaurus terms to individual blog entries and then indexing that metadata.
  • A hierarchical subject index of blogs could be created based on the categories that are used by individual blog writers. Blogs are added to more categories as their authors write content in those areas.
  • A Yahoo-like directory/index of an intranet could be created based on the thesaurus, which then indexes a blogged set of content. The Google-bombing effect of blogs then raises more relevant content to the top of the search results list.
  • Indexing blogs by a structured thesaurus makes it much easier to find other blogs that talk about similar topics without having to rely on the bloggers themselves to create the association via direct links. This could be a supplemental tool to the referrals that currently drive traffic between blogs.
  • A thesaurus manager could monitor related weblogs for new language being used that should be entered into the thesaurus as a formal term.
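Several of these ideas can be sketched together in a few lines. The example below is only an illustration under stated assumptions: the `BROADER` table, the sample entries, and the function names are all hypothetical, and a real intranet would pull entries from its blogging tool rather than a hard-coded list. It indexes entries by thesaurus term, builds a category page with automatic broader/narrower links, and counts unrecognized terms as candidates for the thesaurus manager to review.

```python
from collections import Counter, defaultdict

# Hypothetical mini-thesaurus: preferred term -> broader term (None at the top).
BROADER = {
    "Knowledge management": None,
    "Taxonomies": "Knowledge management",
    "Thesauri": "Taxonomies",
    "Weblogs": "Knowledge management",
}

# Hypothetical blog entries: (blog, entry title, terms assigned by the author).
ENTRIES = [
    ("alice", "Why we need a controlled vocabulary", ["Thesauri"]),
    ("bob", "Blogging behind the firewall", ["Weblogs"]),
    ("carol", "Thesauri and web logs", ["Thesauri", "Weblogs"]),
    ("dave", "Notes on tagging", ["Folksonomies"]),
]


def build_index(entries):
    """Group entries by thesaurus term; count terms the thesaurus lacks."""
    index = defaultdict(list)
    candidates = Counter()
    for blog, title, terms in entries:
        for term in terms:
            if term in BROADER:
                index[term].append((blog, title))
            else:
                # Not a formal term yet: flag it for the thesaurus manager.
                candidates[term] += 1
    return index, candidates


def narrower_terms(term):
    """Terms whose broader term is `term`, sorted by name."""
    return sorted(t for t, parent in BROADER.items() if parent == term)


def category_page(term, index):
    """Data for one meta-blog page, with links derived from the thesaurus."""
    return {
        "term": term,
        "broader": BROADER[term],
        "narrower": narrower_terms(term),
        "entries": index.get(term, []),
    }


index, candidates = build_index(ENTRIES)
page = category_page("Thesauri", index)
```

Here the "Thesauri" page would list the entries from alice and carol with a link up to "Taxonomies", even though neither blogger linked to the other, while "Folksonomies" shows up in `candidates` as new language worth considering.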

Those are only a few ideas, and I am sure there are many more creative applications out there.  The biggest challenge I see is learning how to merge a formal document such as a thesaurus with the very informal, hierarchy-busting dynamic of a weblog.  Even so, a structured thesaurus could be a powerful supplemental tool for bloggers to use.