Semantic Islands

From Column Two's post "Death of keywords":

To me, this really highlights the challenges (futility?) of the so-called “semantic web”, where everything describes itself, cross-linking happens automatically and accurately, and search engines only return useful results…

If we can’t get even simple keyword tags to work in practice, what hope is there for RDF and the rest?

My own opinion is that any activity or tool that requires consistent, similar behavior across the entire Web (such as accurate keywording of web pages) will not happen.

However, that doesn’t mean the keyword metatag is dead. It can still be an effective tool for a collection of content whose authors/owners are willing to invest the time and effort required for accurate searching and indexing. The Web might evolve into small, organized clusters of content that form semantic islands in a chaotic sea.

Classification of Links According to Primary Function

This article by Claire Hudson in FirstMonday proposes seven categories of functions performed by hyperlinks within web sites:

  • Authorizing
  • Commenting
  • Enhancing
  • Exemplifying
  • Mode-Changing
  • Referencing/Citing
  • Self-Selecting

Further definitions of each category are provided in the article. This view of links could be useful when troubleshooting why users don’t follow certain links. Perhaps you intend a link as mode-changing, but your users read it as authorizing (and thus ignore it :).

Here is an interesting quote from that same article that applies to the weblogging world:

In other words, no hypertext – whether static or dynamic, explicit or implicit, and strongly or weakly authored – can be divorced from the subjectivity of human choice.

Hypertexts, then, are a social/cultural phenomenon, based on the ideologies of the particular communities – for example, a corporation, government department, non-profit organization – from which they emerge. These ideologies work to create, enhance, and restrict users’ access to information.

This furthers the theme that collections of links in a weblog, even without explicit editorial comments, do convey the editorial opinions of the weblog author.

Cloaking Defined

andersja’s blog provides a definition of cloaking:

a server-side hack that allows the webserver to serve different content to search engine spiders and visitors; for example while the visitor sees a whiz-bang flash animation, a search engine may see a plain vanilla HTML chucked full of crosslinks, keywords, meta- and header-tags — just the way they (the search engines) like it.

He points out that this approach is somewhat risky: you might get kicked out of a search engine’s index if it discovers you are gaming it with content other than what you show your human readers.

The other frustration: imagine a site with very usable, accessible content that makes it available only to bots!

DTD Example for Scholarly Journals

article.dtd is an XML document type definition for use with scientific and scholarly articles.

I’ve been investigating XML for publishing scholarly journals off and on over the past year, and this was a recent find. It seems our organization is getting close to the point where we can begin moving to XML as a long-term storage format for our academic journal content. That will open up worlds of opportunity for re-using and republishing that content.

DB Table Structure for Thesauri

Here is a nice DB structure shared by Dale Mead on the IA-CMS list (a Yahoo group that started up recently):

Another approach to Logical Data Model for a thesaurus could look something like this:

Table: Term
______________________
term_id (primary key)
term_name
term_description

Table: Relation_Dict
_______________________
relation_type
(values: broader_narrower, related)

Table: Relation
_________________
relation_id
parent_id
child_id
relation_type

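To see how these three tables fit together, here is a minimal sketch in Python using an in-memory SQLite database. The table and column names follow Dale Mead's outline above; the column types, constraints, and sample terms are my own assumptions for illustration.

```python
import sqlite3

# Build the three thesaurus tables from the outline above.
# Types and foreign keys are assumptions; sample data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Term (
    term_id          INTEGER PRIMARY KEY,
    term_name        TEXT NOT NULL,
    term_description TEXT
);
CREATE TABLE Relation_Dict (
    relation_type TEXT PRIMARY KEY  -- 'broader_narrower' or 'related'
);
CREATE TABLE Relation (
    relation_id   INTEGER PRIMARY KEY,
    parent_id     INTEGER REFERENCES Term(term_id),
    child_id      INTEGER REFERENCES Term(term_id),
    relation_type TEXT REFERENCES Relation_Dict(relation_type)
);
""")
conn.executemany("INSERT INTO Relation_Dict VALUES (?)",
                 [("broader_narrower",), ("related",)])
conn.executemany("INSERT INTO Term VALUES (?, ?, ?)", [
    (1, "Publishing", None),
    (2, "Scholarly journals", None),
])
conn.execute("INSERT INTO Relation VALUES (1, 1, 2, 'broader_narrower')")

# Walk the hierarchy: find narrower terms of 'Publishing'.
rows = conn.execute("""
    SELECT t.term_name
      FROM Relation r JOIN Term t ON t.term_id = r.child_id
     WHERE r.parent_id = 1 AND r.relation_type = 'broader_narrower'
""").fetchall()
print(rows)  # [('Scholarly journals',)]
```

One nice property of this model: broader/narrower and related-term links live in the same Relation table, distinguished only by relation_type, so adding a new relation type is just a new row in Relation_Dict.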
More CMS vendors need to provide support for thesaurus management and integration (the database structure above is based on the one used in Documentum). Don’t take my word for it. Listen to Lou:

Content management system vendors take note: your bloody products might actually provide value if 1) you enabled manual indexing by integrating thesaurus management capabilities; and 2) that manual indexing stuff is “real work” too, so start figuring out how to better integrate it within your work flow support.

ClickTracks

I learned about a new web traffic analysis tool, called ClickTracks, via Phil Windley’s weblog.

In a nutshell, the tool superimposes data from your web server logfiles onto your web pages, indicating the percentage of traffic that clicks each link on a page. There are additional features for slicing and dicing the data, but that’s the core of it. Here is a screen shot of an analysis of a couple of days of traffic for High Context. (The percentages are rather low because my RSS feed is the most requested file on my site.)
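The core calculation is simple enough to sketch. Assuming you have already parsed each logfile line into a (requested URL, referring page) pair, the share of a page's visitors who followed each link can be estimated by counting requests whose referer is that page. This is my own rough approximation of the idea, not ClickTracks' actual method; the log entries below are invented.

```python
from collections import Counter

# Parsed log entries: (requested_url, referer). Invented sample data;
# real code would parse these out of the server's access log.
log = [
    ("/", None),
    ("/", None),
    ("/archive.html", "/"),
    ("/about.html", "/"),
    ("/archive.html", "/"),
    ("/", None),
]

page = "/"
# Total views of the page whose links we are analyzing.
pageviews = sum(1 for url, _ in log if url == page)
# Requests that arrived by clicking a link on that page.
clicks = Counter(url for url, ref in log if ref == page)

for url, n in clicks.most_common():
    print(f"{url}: {100 * n / pageviews:.0f}% of visitors")
```

Note the caveat this exposes: traffic that never sends a referer (RSS readers, bookmarks, some proxies) inflates the pageview denominator, which is exactly why my own percentages come out low.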

Very cool and innovative software. But how can this visual data be analyzed to improve your web site? ClickTracks doesn’t offer any suggestions on their site (they should, for marketing purposes alone).

A couple of thoughts I have on how to use the results:

  • Identifying which regions of your page design tend to get the most clicks;
  • Analyzing click patterns after a user observation session (you would have to isolate the traffic on a test site so other traffic doesn’t get into the data set);
  • Visual display of data for the quantitatively challenged.

It certainly isn’t a replacement for standard log analysis reports, but it could be a useful tool for usability studies and alternate displays of data. It might even be worth the $495 they are charging for it.