Category Archives: Taxonomies

On pegging down taxonomy

Tonight I watched The Collectors on ABC TV – good to have the show back in 2008. One of the featured collections was the peg collection of Mike Bradley. Yes, that’s right, a collection of clothes pegs!

There were some really interesting moments (yes, truly) in this segment on pegs. Firstly, Mike Bradley was terrific at telling his story about the whys and wherefores of his collection. He postulated that his penchant for pegs may have stemmed from his gypsy heritage (gypsies introduced pegs to the world, apparently). The stories were personal, interesting, and humorous.

He told a great story about a trip to India and the purchase of some unusual pegs – the guide/translator telling the storekeeper about “this idiot who loves pegs” and then arranging a higher price than normal and splitting the difference! Mike also related a shopping trip to “plastic city” in China where he bought a load of plastic pegs with different designs. Returning to Australia, the customs officer wanted to know about the metal clips showing up in the X-ray image. Mike replied that they were pegs he was bringing back from overseas, to which the customs officer replied: “don’t we make pegs in Australia?”.

Mike reckons he has the largest collection of pegs in the world, albeit only about 50% of the total number of different pegs out there. And he admitted to not being shy in swapping an ordinary peg for one not in his collection if he comes across such a specimen on someone’s backyard clothesline! You have been warned.

Mike turned a potentially lacklustre story into a great feature on collecting; turning the mundane into something special with his natural storytelling abilities. The storytelling worked. [Note to ABC TV – a podcast or videocast of such segments would be really worthwhile].

Then there was the in-studio discussion between Mike and Collectors panelists Nicole and “The Professor” (and this strikes at the heart of taxonomy, can you believe it?). Nicole admitted to hanging clothes on the line with a pair of pegs that had to be the same colour. She wanted to reassemble and order Mike’s peg collection by colour. Mike actually ordered his collection by size and type of clip. The Professor had no such preference for peg order. And now let me confess that when I hang out the washing, the pair of pegs for each article of clothing must be of the same type – no mix and match here!

Now if there were a manual for the “correct” way of pairing pegs or assembling pegs in a collection, what would be the one-size-fits-all determining taxonomy? Would it be chronological (historical or purchase date?), or colour, or size, or type, or shape, or country of origin, or type of use, or complete randomness (perhaps the order being determined by the position of different clothes on the clothesline itself)? You see, it all depends on what means the most to the individual, if in fact it means anything at all.

The moral of the story: don’t put a square peg in a round hole … or can you?


On tagging, the grey side

My last two posts have been about tagging based on my presentation last week at the conference in Sydney, “Enhancing search and retrieval capabilities and performance”.

I want to look at some of the perceived disadvantages of tagging that I briefly mentioned in my presentation:

  1. Lack of specificity – refers to the fact that an item can have innumerable headings (tags) and there is no fixed agreement as to the most suitable term. A formal taxonomy and classification system have been the traditional ways of assigning specific terms to items.
  2. Ambiguity and inconsistency – because anyone can apply a tag to an item, there will be a multitude of tags that do not clearly and consistently apply to a specific item. One person may tag something as “locomotive” and another as “train”. The same person may use “locomotive” now but three weeks previously used the term “train”. And “train” may in fact not refer to a locomotive at all (with or without carriages or wagons) but to a wedding dress, a series of thoughts, or to an adult education class.
  3. Lack of structure – The traditional relationship between broad and specific terms (the parent-child tree structure that historically organised information into “like things”) is not there in a tagging system. Weinberger refers to a tagging system as one that looks at the leaves on a tree rather than just the branches.
  4. Problems with stemming or truncation – variant forms of the same word, such as plurals, or words that can be spelled with either an s or a z.
  5. Ceding control of search terminology to the “inexperienced” – using the correct terms is an important exercise not to be trifled with by amateurs and the inexperienced professional.
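
Points 2 and 4 in the list above can be softened a little in software. The following is a minimal Python sketch, assuming a very crude set of rules; `normalise_tag` is a hypothetical helper written for illustration and is not taken from any tagging product. It folds case, strips a simple plural and treats s/z spelling variants as the same word.

```python
def normalise_tag(tag: str) -> str:
    """Reduce a free-text tag to a rough common form (illustrative only)."""
    t = tag.strip().lower()
    # Naive plural handling: drop a trailing "s", but leave "ss" endings
    # and very short words alone.
    if len(t) > 3 and t.endswith("s") and not t.endswith("ss"):
        t = t[:-1]
    # Treat "-isation"/"-ise" and "-ization"/"-ize" as the same spelling.
    t = t.replace("isation", "ization")
    if t.endswith("ise"):
        t = t[:-3] + "ize"
    return t


print(normalise_tag(" Trains "))      # train
print(normalise_tag("Organises"))     # organize
print(normalise_tag("digitisation"))  # digitization
```

Rules this simple will mangle some words (“analysis” loses its final s, “exercise” gains a z), so a real system would want a proper stemmer. And no amount of normalisation resolves whether “train” means a locomotive or a wedding dress; that ambiguity still needs context or a controlled term.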

It is true that there will be imprecision in tag terms and inconsistency in the application of tags to items that look to be the same things. It is also true that the same individual may use different tags over time to describe essentially the same thing. And tagging might thus be perceived as a mess, needing an experienced taxonomist and library professional to make sense of it for us. People in the information business who like order and structure have a long historical paradigm to work from.

Yet all is not lost. Tagging will become self-refining, gradually highlighting preferred terms (perhaps through a tag cloud) or via suggested or similar headings. Collaborative tagging and folksonomies will help shape a form of group consensus leading to a meaningful sense of order. And technologies will improve to cater for some of the weaknesses of current tagging systems. One example is Raw Sugar.

Overall, tagging will continue to grow simply because digital information will grow at time-warp-like speed. The sheer scale of the digital world, and the cost of ordering that digital information, will not easily permit formal and timely classification. Just imagine trying to keep up with all the blogs in the world, let alone the individual blog posts from each of them. 

Tagging will become more important and self-fulfilling due to both the technology and the demographic changes in society, responsive to the digital world and the need to make sense in it for individuals and their peers. The changing nature of information, and the new consumers and producers of that information, means that change is inevitable.

Interestingly, a recent article highlighted the changing nature of reading – the development of an information browsing culture among the digital natives. The impact of the digital world should not be underestimated.

In looking at tagging so far, perhaps one could say we are in a period of transition from the structure and hierarchy of giving order to physical information (like books, journal articles and celluloid film) to one where digital information allows for innumerable access points, innumerable tags and descriptors, and is seemingly available from anywhere.

[Of interest, check out this podcast from Beth Jefferson on transforming public libraries’ online catalogues into environments for social discovery of resources that are catalogued not only by librarians, but also by patrons. A salient quote on social cataloguing – collaborative tagging if you like: “the metadata people create by cataloguing content is what enables social search and discovery”. Beth Jefferson wants to enhance social search and discovery across North American public libraries through collaborative cataloguing, whether by evaluative comment or by description. Tagging and thesauri may indeed coexist.]

So the question remains – is the traditional way of ordering information and establishing a single authority for fixed terms appropriate in the modern digital world? And practically speaking, what is the right balance between order and miscellany in any given context?

I will feature one more blog post on the tagging issue looking at how the enterprise (the firm, not the fictional space ship), might take to the tagging phenomenon. Stay tuned…

On tagging (2)

I previously made some comments about tagging. I believe tagging has its place, as do controlled vocabularies. Jon Udell’s blog post yesterday on tagging and foldering made the point that: “On the desktop as well as on the web, we’re in the midst of a long transition from container-based to query-based storage and retrieval”.

In container-based storage you look for what you want by going to the container and seeing what is in it. In a search-based world, the container is irrelevant so long as its contents can be searched, and search becomes even more powerful when it can run across multiple containers. And even the notion of containers is becoming obsolete as digital content becomes miscellaneous.

Interestingly, I was left thinking about the notion of access points that we looked at years ago in my librarianship training. Traditionally, access points were different ways of accessing a library catalogue but now access points relate also to the digital domain. The fact is that now we can have an enormous number of access points and these access points can now be determined by users with user-generated content and tagging.

The Udell blog post reignited some thoughts on my own plans for my home digitisation project to convert several thousand hard copy prints and slides into digital images. The workflow includes using cataloguing software for categorising and searching my photo collection (where are the digital images located on my computer and external drives and what terms will I use to be able to search and find the ones I want?).

The issue for me is that I need a controlled vocabulary to ensure consistent and accurate description and searchability of my photo collection. In addition, I will be undertaking this categorisation myself so there is little benefit gained from tagging since I am not saving time by having others do the categorisation for me. And certainly, there is no user-generated aggregation as there could be if I used Flickr as my host and archive.

And this is the point: tagging works best in aggregate for two reasons: Firstly, aggregation enables some semblance of preference that gives a general consensus from which patterns emerge (folksonomies) – a kind of user-generated thesaurus. Secondly, tagging works because aggregation also takes place at the actual labeling end of the workflow – individuals tagging upon production and subsequently by use, a scale issue that traditional thesaurus-based cataloguing cannot compete with. In other words, there is so much digital content out there that changes all the time that a consistent, centrally-determined traditional classification scheme and workflow is impossible.

But at home, I can generate my own controlled vocabulary to ensure accuracy and consistency across my photo collection, make reference to it for future additions, and find what I want in a reliable manner. If I were tagging, in the end I would probably have a de facto controlled vocabulary, but something less than consistent and no more meaningful.
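
A home-grown controlled vocabulary of this kind can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the terms in `VOCABULARY`, and the helper names `to_preferred` and `describe`, are hypothetical and do not come from any cataloguing software. The idea is simply that each photo may only be described with a preferred term, that known variants are mapped to it, and that anything else is rejected rather than silently added.

```python
# Hypothetical vocabulary: preferred term -> non-preferred variants.
VOCABULARY = {
    "locomotive": {"train", "engine"},
    "beach": {"seaside", "shore"},
}


def to_preferred(term: str) -> str:
    """Map a term to its preferred form, or fail if it is not in the vocabulary."""
    t = term.strip().lower()
    if t in VOCABULARY:
        return t
    for preferred, variants in VOCABULARY.items():
        if t in variants:
            return preferred
    raise ValueError(f"'{term}' is not in the controlled vocabulary")


def describe(photo_terms):
    """Return the sorted, de-duplicated preferred terms for one photo."""
    return sorted({to_preferred(t) for t in photo_terms})


print(describe(["Train", "seaside", "locomotive"]))  # ['beach', 'locomotive']
```

Rejecting unknown terms is the design choice that separates this from tagging: a new term has to be added to the vocabulary deliberately, which is workable for one person's photo collection but not at web scale.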

The future may yet bring, however, the opportunity for improved tagging that generates greater consistency and reliability while still maximising scale. Even so, for my home project that’s not needed.

On tagging

I have been giving some attention of late to tagging, partly because of some research I am doing for university, and partly in response to a challenge Matt Moore gave me a while back to start putting some of my photos up on Flickr.

A key feature of Flickr is tagging, but tagging has become much more widespread. US research indicates that tagging is a popular user-generated activity, with 28% of internet users having tagged online content.

Thomas Vander Wal has written a great post on tagging. In it, he describes the history and current state of tagging and what improvements he’d like to see (stemming to see different versions of the same word, for example).

What I find interesting, coming from a background in librarianship and functional thesauri, is that there now seems to be more interest in organising tags so they become more meaningful and less ambiguous. Ambiguity is a real issue for modern libraries, particularly structuring folksonomy tags in public libraries.

Tagging works well with scale because scale gives more weight to popular tags than to others. The most popular tag becomes the de facto preferred term that a thesaurus might recommend in a controlled vocabulary environment. However, popular tags may have even greater weight and value if they are agglomerated with like tags (tags that mean the same or nearly the same thing but use a different word or spelling, for example).
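
The effect of agglomerating like tags can be shown with a small Python sketch. The tag list and the `like_tags` mapping below are invented for illustration, and `tag_weights` is a hypothetical helper; only `collections.Counter` comes from the standard library. Counting raw tags picks one preferred term, while merging a spelling variant first picks another.

```python
from collections import Counter


def tag_weights(tags, like_tags=None):
    """Count tags, optionally folding variant tags into a shared form first."""
    like_tags = like_tags or {}
    counts = Counter()
    for tag in tags:
        key = tag.strip().lower()
        counts[like_tags.get(key, key)] += 1
    return counts


# Invented sample: seven tags applied by different users to similar photos.
tags = ["harbour", "bridge", "harbor", "bridge", "harbour", "bridge", "harbor"]

print(tag_weights(tags).most_common(1))                         # [('bridge', 3)]
print(tag_weights(tags, {"harbor": "harbour"}).most_common(1))  # [('harbour', 4)]
```

On the raw counts “bridge” is the most popular tag, because the votes for the harbour are split between two spellings. Once the variants are merged, “harbour” carries the most weight.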

One initiative that has some promise is FaceTag, a semantic collaborative tagging tool, described in a recent article in ASIS&T Bulletin. It’s early days but FaceTag may be on the right road in looking at relational and hierarchical issues within tagging folksonomies.

On taxonomies

Patrick Lambe presented this evening on taxonomies at the NSW KM Forum in Sydney – not sure if the slides will be available or not (I will need to check later with Patrick or James).

Firstly, Patrick should be congratulated for making a potentially dry topic most interesting and informative. Of note was Patrick’s “Map of Findability” that gave a visual and powerful metaphor for the different types of taxonomies and how they are used. The map really enhanced Patrick’s story.

Patrick delved into lists (clusters of related things with limited scale), into different types of tree structures for organising knowledge (for example, the hierarchical thesaurus), and into more complex forms such as facets (matrices beyond three dimensions that explore the linkages and intersections between multiple category lists in one related context). A good example of a facet taxonomy was an online wine merchant that used type of wine (red, white, rosé, sparkling, etc.), region (Australia, New Zealand, Chile, Bordeaux, etc.) and price points ($10-$20, $21-$30, etc.), which a customer could use to navigate and then search within and between any combination of the three.
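
The wine merchant example lends itself to a short Python sketch of faceted navigation. The catalogue entries, the `PRICE_BANDS` table and the `filter_wines` function are all hypothetical, made up here to illustrate the idea and not taken from Patrick's presentation or any real merchant's site. Each facet is an independent category list, and the customer may combine any of them in any order.

```python
# Hypothetical catalogue; names and prices are invented for illustration.
WINES = [
    {"name": "Wine A", "type": "red", "region": "Australia", "price": 15},
    {"name": "Wine B", "type": "red", "region": "Bordeaux", "price": 28},
    {"name": "Wine C", "type": "white", "region": "New Zealand", "price": 18},
    {"name": "Wine D", "type": "sparkling", "region": "Australia", "price": 25},
    {"name": "Wine E", "type": "red", "region": "Chile", "price": 12},
]

# Price facet expressed as inclusive (low, high) bands.
PRICE_BANDS = {"$10-$20": (10, 20), "$21-$30": (21, 30)}


def filter_wines(wines, wine_type=None, region=None, price_band=None):
    """Return the names of wines matching every facet that was supplied."""
    results = []
    for wine in wines:
        if wine_type is not None and wine["type"] != wine_type:
            continue
        if region is not None and wine["region"] != region:
            continue
        if price_band is not None:
            low, high = PRICE_BANDS[price_band]
            if not low <= wine["price"] <= high:
                continue
        results.append(wine["name"])
    return results


print(filter_wines(WINES, wine_type="red", price_band="$10-$20"))  # ['Wine A', 'Wine E']
print(filter_wines(WINES, region="Australia"))                     # ['Wine A', 'Wine D']
```

Unlike a tree structure, no facet is the “parent” of another: a customer can start from region, type or price and narrow down in whatever order suits them.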

Patrick also discussed folksonomies and how they can be effective when scale is sufficiently large to enhance confidence and predictability in search results (Flickr is one of the best known examples). Folksonomies with user-generated tagging also require favourable participatory environments to make them work effectively.

For me, it was good to reconsider the various forms of taxonomies and how each suits some situations better than others (especially the issues related to scale and levels of abstraction). My professional training in library and information science (in the 1980s) certainly focused on controlled vocabularies and thesauri. Naturally, the development of folksonomies in more recent times has made me more aware of alternative taxonomies and the possibilities they have to offer.

Patrick concluded his presentation with a nice illustration of the taxonomy continuum. At one end, a taxonomy could be low design/low cost/low precision/high ambiguity and at the other end of the continuum there would be high design/high cost/high precision/low ambiguity (and of course the various combinations in between). This enables us to determine our own requirements and to judge what form of taxonomy is preferred in the prevailing context.

I also attended the second day of KM Australia 2007 today. I will make some comments on some of the presentations from this conference tomorrow.