From a post on the IA-CMS list: Thesaurus::RDF — The RDF Thesaurus descriptor standard for the Zthes thesaurus format.
This article by Claire Hudson in FirstMonday proposes 7 different categories of functions performed by hyperlinks within web sites:
Further definitions of each category are provided in the article. This view of links could be useful when trying to troubleshoot why certain links are not followed by users. Perhaps you intend it as a mode-changing link but your users tend to identify it as authorizing (and thus ignore it :).
Here is an interesting quote from that same article that applies to the weblogging world:
In other words, no hypertext – whether static or dynamic, explicit or implicit, and strongly or weakly authored – can be divorced from the subjectivity of human choice.
Hypertexts, then, are a social/cultural phenomenon, based on the ideologies of the particular communities – for example, a corporation, government department, non-profit organization – from which they emerge. These ideologies work to create, enhance, and restrict users’ access to information.
This furthers the theme that collections of links in a weblog, even without explicit editorial comments, do convey the editorial opinions of the weblog author.
(Originally posted on the XFML Yahoo! Group.)
The true power of faceted metadata, imho, is that you can triangulate on resources by searching with terms from multiple facets.
Let’s assume I have a thesaurus of communication disorders. One facet might be “Disorders”. By picking a term within Disorders, say ‘Dysphagia’, I can identify resources related to that particular disorder. However, what if I was only interested in that disorder as it affects newborn children? I could use an “Age” facet and add the ‘Newborn’ term from that facet to my search criteria. That will then help me narrow further. Let’s add a “Geography” facet and select ‘North America’. Now we have results for ‘Newborns’ in ‘North America’ that have ‘Dysphagia’. This requires accurate indexing of material to be effective.
That, I think, is the biggest advantage of faceted metadata over a plain list of keywords. The Flamenco project is a good example of how it might look in a deployed web site.
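To make the triangulation idea concrete, here is a minimal Python sketch. The records, titles, and the `facet_search` function are made up for illustration (they are not taken from Flamenco or XFML), and it assumes each resource has been indexed with exactly one term per facet.

```python
# Hypothetical, hand-indexed resources: one term per facet.
RESOURCES = [
    {"title": "Feeding difficulties in neonates",
     "Disorders": "Dysphagia", "Age": "Newborn", "Geography": "North America"},
    {"title": "Swallowing after stroke",
     "Disorders": "Dysphagia", "Age": "Adult", "Geography": "North America"},
    {"title": "Neonatal dysphagia survey",
     "Disorders": "Dysphagia", "Age": "Newborn", "Geography": "Europe"},
    {"title": "Early stuttering",
     "Disorders": "Stuttering", "Age": "Child", "Geography": "North America"},
]


def facet_search(resources, criteria):
    """Return the resources that match every selected facet term."""
    return [
        resource
        for resource in resources
        if all(resource.get(facet) == term for facet, term in criteria.items())
    ]


if __name__ == "__main__":
    # Each added facet term narrows the result set further.
    criteria = {}
    for facet, term in [("Disorders", "Dysphagia"),
                        ("Age", "Newborn"),
                        ("Geography", "North America")]:
        criteria[facet] = term
        hits = facet_search(RESOURCES, criteria)
        print(criteria, "->", [hit["title"] for hit in hits])
```

The search is only as good as the indexing: a resource tagged with the wrong Age term simply drops out of the narrowed results.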
XFML is an XML spec for representing a lightweight, faceted topic map. Peter is getting close to finalizing the 1.0 version.
Another approach to a logical data model for a thesaurus could look something like this:
term_id (primary key)
(values: broader_narrower, related)
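Only fragments of that model are shown here, so the following is a rough sketch of how such a structure might be filled out, using Python's built-in `sqlite3` module. Only `term_id` and the two relationship values (`broader_narrower`, `related`) come from the outline; the table names, the other columns, and the sample terms are my own assumptions, not the schema of any particular product.

```python
import sqlite3

# Assumed layout: one table of terms, one table of typed links between terms.
SCHEMA = """
CREATE TABLE term (
    term_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE term_relationship (
    parent_term_id    INTEGER NOT NULL REFERENCES term(term_id),
    child_term_id     INTEGER NOT NULL REFERENCES term(term_id),
    relationship_type TEXT NOT NULL
        CHECK (relationship_type IN ('broader_narrower', 'related')),
    PRIMARY KEY (parent_term_id, child_term_id, relationship_type)
);
"""


def build_demo():
    """Create an in-memory database with a few made-up terms."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO term (term_id, name) VALUES (?, ?)",
        [(1, "Disorders"), (2, "Dysphagia"), (3, "Feeding")],
    )
    conn.executemany(
        "INSERT INTO term_relationship VALUES (?, ?, ?)",
        [(1, 2, "broader_narrower"), (2, 3, "related")],
    )
    conn.commit()
    return conn


def narrower_terms(conn, name):
    """List the terms directly narrower than the named term."""
    rows = conn.execute(
        """
        SELECT child.name
        FROM term AS parent
        JOIN term_relationship AS rel ON rel.parent_term_id = parent.term_id
        JOIN term AS child ON child.term_id = rel.child_term_id
        WHERE parent.name = ? AND rel.relationship_type = 'broader_narrower'
        ORDER BY child.name
        """,
        (name,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(narrower_terms(build_demo(), "Disorders"))
```

Storing broader/narrower as a single row (broader term as parent, narrower term as child) means the inverse relationship never has to be kept in sync by hand.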
More CMS vendors need to provide support for thesauri management and integration (the database structure above is based on what is used in Documentum). Don’t take my word for it. Listen to Lou:
Content management system vendors take note: your bloody products might actually provide value if 1) you enabled manual indexing by integrating thesaurus management capabilities; and 2) that manual indexing stuff is “real work” too, so start figuring out how to better integrate it within your work flow support.
At first glance it looks like it will give you the ability to add pointers to related information and/or topics on your own web site or elsewhere. Taking a weblog as an example, you could add category-specific archives links to individual posts in an RSS feed. A news reader could then render links to your category archive for a particular post which the user could then follow if they want to see what else you have said on the overall subject.
A new XML standard has been proposed for publishing faceted metadata: XFML.
eXchangable Faceted Metadata Language. XFML is an open XML format to publish and share faceted metadata for websites. It allows for easy creation of advanced, automatically generated navigation for your website. You can even automatically generate links to related topics on other websites. It also allows for merging of metadata between different websites.
This looks promising for publishing metadata that can be used by other web sites and/or user client software. After reviewing the information on the site it does not appear that they borrowed anything from the Zthes DTD for XML representation of a thesaurus. It seems to me that creating linkages between the two could make both standards stronger.
A common tool used in knowledge management is the thesaurus. There are a variety of definitions out there but I’ll use this one for our purpose here:
Thesaurus — The vocabulary of a controlled indexing language, formally organized so that the a priori relationships between concepts (for example as “broader” and “narrower”) are made explicit. (ISO 2788, 1986:2)
A thesaurus is not only a list of keywords (or terms) and their synonyms: it also embodies an overall hierarchy of related terms. These relationships can be compared to Yahoo!’s branching subject index. An XML DTD already exists to document these relationships between terms in a thesaurus.
The importance of a thesaurus to knowledge management is that it gives a common language to users who are keywording content for an index. If everyone agrees to use the same terms for the same meaning then metadata indexes become much more effective. Consistent relationships can then be inferred among documents and other content.
Thesauri have to be living documents if they are to remain effective. New terms must be added as the language of a particular field changes. Existing terms may need to be refined or even retired if they fall out of use. This requires a human to manage the thesaurus based on feedback from the users of that thesaurus.
So how could a thesaurus be used with a blog network? Here are some ideas:
Intranet bloggers use thesaurus terms to create categories for their web log. Readers on an intranet, for example, could then see blog posts made by anyone on the network for a particular thesaurus term. Links to related, broader and narrower categories could be created automatically. Essentially a meta-blog of content based on commonly used thesaurus terms.
The preceding idea could also be done by assigning thesaurus terms to individual blog entries and then indexing that metadata.
A hierarchical subject index of blogs could be created based on the categories that are used by individual blog writers. They are added to more categories as they write content in those areas.
A Yahoo-like directory/index of an intranet could be created based on the thesaurus which then indexes a blogged set of content. The google-bombing effect of blogs then raises more relevant content to the top of the search results list.
Blogs indexed by a structured thesaurus make it much easier to find other blogs that talk about similar topics without having to rely on the bloggers themselves to create the association via direct links. This could be a supplemental tool to the referrals that currently drive traffic between blogs.
A thesaurus manager could monitor related weblogs for new language being used that should be entered into the thesaurus as a formal term.
Those are only a few ideas and I am sure there are many more creative applications out there. The biggest challenge I see is learning how to merge a more formal document such as a thesaurus with the very informal and hierarchy-busting dynamic of a weblog. However, a structured thesaurus could be a potentially powerful supplemental tool for bloggers to use.
The Flamenco Search System project is exploring how to best create web-based search interfaces based on faceted thesauri. The demo interface they have built for an architecture image collection is excellent. It allows the user to triangulate a set of results by selecting terms from multiple facets. This triangulation allows a user to quickly narrow down to a small set of specific records even within a large overall set of records.
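As a rough illustration of the first and last ideas in that list, here is a small Python sketch of a "meta-blog" index. Everything in it is hypothetical: the thesaurus fragment, the authors, the post titles, and the function names are invented for the example, and it assumes each post carries a list of terms chosen by its author.

```python
from collections import defaultdict

# Hypothetical thesaurus fragment: term -> broader term (None for a top term).
BROADER = {
    "Knowledge management": None,
    "Thesauri": "Knowledge management",
    "Faceted metadata": "Knowledge management",
}
RELATED = {
    "Thesauri": ["Faceted metadata"],
    "Faceted metadata": ["Thesauri"],
}

# Hypothetical posts from several intranet weblogs: (author, title, terms).
POSTS = [
    ("alice", "Managing a living thesaurus", ["Thesauri"]),
    ("bob", "Facets on the intranet", ["Faceted metadata", "Topic maps"]),
    ("carol", "Why KM needs shared vocabulary", ["Knowledge management", "Thesauri"]),
]


def build_index(posts):
    """Group posts by thesaurus term; collect unknown terms as candidates."""
    index = defaultdict(list)
    candidates = set()
    for author, title, terms in posts:
        for term in terms:
            if term in BROADER:
                index[term].append((author, title))
            else:
                # New language the thesaurus manager may want to review.
                candidates.add(term)
    return dict(index), sorted(candidates)


def category_page(term, index):
    """Assemble the posts and the automatic links for one term."""
    narrower = sorted(t for t, broader in BROADER.items() if broader == term)
    return {
        "term": term,
        "posts": index.get(term, []),
        "broader": BROADER[term],
        "narrower": narrower,
        "related": RELATED.get(term, []),
    }


if __name__ == "__main__":
    index, candidates = build_index(POSTS)
    print(category_page("Thesauri", index))
    print("Candidate terms:", candidates)
```

The bloggers only pick terms; the broader, narrower, and related links fall out of the thesaurus, so nobody has to link to anyone else's weblog by hand.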
The site also has several articles that give the background on how they created their design.