Category Archives: Folksonomies

On tagging and the enterprise (and RSS)

I want to conclude my blog summary from the presentation I gave last week on tagging and the enterprise. The previous three entries should be read in conjunction with this instalment, if you haven’t followed the story so far…

I used IBM’s dogear as an example of an enterprise using tagging within the firm. However, instead of me explaining all about it, I have listed here three sources that explain the way in which social bookmarking and tagging may be used within the enterprise, including dogear at IBM:

Am I being lazy? Well, the web is all about links so I may as well use them!

Finally, as an aside, I discovered today a way of using RSS feeds to populate a newsletter. Yes, it is an interesting combination of web 2.0 (RSS) and the old way of communication (newsletters) but it may well work as a valuable bridge for people still not accustomed to the full array of web 2.0 communication channels. The product is and it’s relatively new. It is definitely worth a look if you want to mesh RSS content within a newsletter format.

And if anyone knows about other services like this, please advise with a comment!


On tagging, the grey side

My last two posts have been about tagging based on my presentation last week at the conference in Sydney, “Enhancing search and retrieval capabilities and performance”.

I want to look at some of the perceived disadvantages of tagging that I briefly mentioned in my presentation:

  1. Lack of specificity – refers to the fact that an item can have innumerable headings (tags) and there is no fixed agreement as to the most suitable term. A formal taxonomy and classification system have been the traditional ways of assigning specific terms to items.
  2. Ambiguity and inconsistency – because anyone can apply a tag to an item, there will be a multitude of tags that do not clearly and consistently apply to a specific item. One person may tag something as “locomotive” and another as “train”. The same person may use “locomotive” now but have used “train” three weeks earlier. And “train” may in fact not refer to a locomotive at all (with or without carriages or wagons) but to part of a wedding dress, a series of thoughts, or an adult education class.
  3. Lack of structure – The traditional relationship between broad and specific terms (the parent-child tree structure that historically organised information into “like things”) is not there in a tagging system. Weinberger refers to a tagging system as one that looks at the leaves on a tree rather than just the branches.
  4. Problems with stemming or truncation – variant forms of the same word, such as plurals or spellings that differ by an s or a z, are treated as different tags.
  5. Ceding control of search terminology to the “inexperienced” – using the correct terms is an important exercise not to be trifled with by amateurs and the inexperienced professional.
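
To make point 4 concrete, here is a minimal Python sketch of the kind of normalisation a tagging system might apply before comparing tags. The function name and both rules are illustrative assumptions, not taken from any particular tagging product, and the rules are deliberately crude.

```python
import re


def normalise_tag(tag: str) -> str:
    """Reduce a raw tag to a crude canonical form (illustrative rules only)."""
    t = tag.strip().lower()
    # Fold s/z spelling variants: "organise" -> "organize",
    # "digitisation" -> "digitization".
    t = re.sub(r"is(e|ed|es|ing|ation|ations)$", r"iz\1", t)
    # Naive plural stripping: "trains" -> "train", but leave "glass" alone.
    if len(t) > 3 and t.endswith("s") and not t.endswith("ss"):
        t = t[:-1]
    return t


variants = ["Trains", "train", "organise", "Organizes"]
canonical = {normalise_tag(v) for v in variants}  # two distinct tags remain
```

Rules this simple also over-reach: the same spelling rule turns “promise” into “promize”, which is exactly why stemming remains a weak spot for tagging systems.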

It is true that there will be imprecision in tag terms and inconsistency in the application of tags to items that look to be the same things. It is also true that the same individual may use different tags over time to describe essentially the same thing. And tagging might thus be perceived as a mess, needing an experienced taxonomist and library professional to make sense of it for us. People in the information business who like order and structure have a long historical paradigm to work from.

Yet all is not lost. Tagging will become self-refining, gradually highlighting preferred terms (perhaps through a tag cloud) or via suggested or similar headings. Collaborative tagging and folksonomies will help shape a form of group consensus leading to a meaningful sense of order. And technologies will improve to cater for some of the weaknesses of current tagging systems. One example is Raw Sugar.

Overall, tagging will continue to grow simply because digital information will grow at time-warp-like speed. The sheer scale of the digital world, and the cost of ordering that digital information, will not easily permit formal and timely classification. Just imagine trying to keep up with all the blogs in the world, let alone the individual blog posts from each of them. 

Tagging will become more important and self-fulfilling due to both the technology and the demographic changes in society, responsive to the digital world and the need to make sense in it for individuals and their peers. The changing nature of information, and the new consumers and producers of that information, means that change is inevitable.

Interestingly, a recent article highlighted the changing nature of reading – the development of an information browsing culture among the digital natives. The impact of the digital world should not be underestimated.

In looking at tagging so far, perhaps one could say we are in a period of transition from the structure and hierarchy of giving order to physical information (like books, journal articles and celluloid film) to one where digital information allows for innumerable access points, innumerable tags and descriptors, and is seemingly available from anywhere.

[Of interest, check out this podcast from Beth Jefferson on transforming public libraries’ online catalogues into environments for social discovery of resources that are catalogued not only by librarians, but also by patrons. A salient quote on social cataloguing – collaborative tagging if you like: “the metadata people create by cataloguing content is what enables social search and discovery”. Beth Jefferson wants to enhance social search and discovery across North American public libraries through collaborative cataloguing, whether by evaluative comment or by description. Tagging and thesauri may indeed coexist.]

So the question remains – is the traditional way of ordering information and establishing a single authority for fixed terms appropriate in the modern digital world? And practically speaking, what is the right balance between order and miscellany in any given context?

I will feature one more blog post on the tagging issue looking at how the enterprise (the firm, not the fictional space ship) might take to the tagging phenomenon. Stay tuned…

On the positive side of tagging

In the light of what I discussed yesterday with respect to my conference presentation on Tuesday, I want to move on to tagging. Tagging is essentially unstructured metadata that is assigned by the content creator and the readers/users of the content, the latter called collaborative tagging. The user-generated classification that emerges is called a folksonomy.

Examples of digital content using tags include Flickr, LibraryThing, Technorati, and YouTube. Even the web-based news services are using tags, like the ABC in Australia.

In addition to the tags themselves and the act of tagging content, a collection of tags into a group showing relative emphasis or popularity is called a tag cloud.
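
A minimal Python sketch of how a tag cloud can be derived from raw tags: count how often each tag is used and scale a font size between a minimum and a maximum. The function name, the size range and the sample tags are all assumptions made for illustration; real services may weight tags quite differently.

```python
from collections import Counter


def tag_cloud(tags, min_size=10, max_size=30):
    """Map each tag to a font size proportional to how often it was used."""
    counts = Counter(tags)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return {
        tag: round(min_size + (n - lo) / span * (max_size - min_size))
        for tag, n in counts.items()
    }


# Hypothetical sample: "train" used three times, "sydney" twice, "locomotive" once.
sample = ["train", "train", "train", "locomotive", "sydney", "sydney"]
cloud = tag_cloud(sample)
```

The most frequently used tag gets the largest size and the least used the smallest, which is what gives a cloud its visual sense of relative emphasis.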

There are a number of benefits from using tagging and they can be broadly summarised as the following:

  1. terms meaningful to the content creator and/or readers (and not just those terms allowed by a single classification authority)
  2. establishes relationships between content and the people connected to the content (both content creators and readers)
  3. is inexpensive to undertake, especially in relation to traditional cataloguing and thesaurus construction
  4. scales exceptionally well, thereby suiting the miscellany of digital space
  5. aggregates especially well, thereby harnessing the so-called wisdom of crowds
  6. permits multiple access points to information instead of just bibliographic data
  7. permits discovery of a range of other items tagged by other content creators and readers
  8. overcomes the lack of currency when using traditional fixed forms of metadata (like the established classification systems)
  9. is highly participatory in that people freely choose the relevant tags they regard as appropriate to their own content and to the content of others
  10. as more applications make tagging available, and as the new digital generations increasingly enter the workforce, tagging will become the established norm in the digital information environment (we can see how blogs may offer such an opportunity)

Point 10 is especially important. There is already some evidence of tagging popularity from a Pew Internet Report showing that nearly one-third of US internet users tagged content. As tagging becomes more familiar and mainstream, new opportunities will open up to enhance the popularity of tagging – what I have called “the tagging locomotive”.

I’ll stop here (but with another post to come) with some recommended readings:

Everything is miscellaneous by David Weinberger

Ontology is overrated by Clay Shirky

Folksonomies: power to the people by Emanuele Quintarelli

On tagging (3)

Chance encounters often reveal positive results. I came across this November 2006 blog post by Joshua Porter on why scale matters in tagging systems.

A point I want to tag onto (pun intended) is the one about the rights of the individual to tag anything with any tag the individual likes. Joshua illustrates with his comment about the New York Yankees. Some people will say (and tag) that the New York Yankees are the best baseball team in the US and some will disagree, and some (like me) couldn’t care less – test cricket is far superior!

Joshua says: “Even if a few people tag things incorrectly, most people won’t. This doesn’t have to do with the fact that most people are Good, it’s just that if we ask enough people the same question or have them observe the same phenomenon, where their experiences overlap will tend to be the reality of the situation” – the wisdom of crowds phenomenon.

Actually, it is the same argument promoted by the free-market economist Adam Smith with his concept of the invisible hand – each individual acts to maximise self-interest but in aggregate, society benefits. But does this really happen in practice – if the majority of people in rich countries want to continue to pollute the planet, is this a good thing for society or not?

But what has this to do with communication and knowledge management? Well, besides the tagging phenomenon itself, the concern in this aggregation and crowd argument is that opinions and thoughts that lie outside the “consensus” view are too easily ignored.

We also need to listen to and hear what people outside the crowd are saying because all too often, there is something special and innovative there that the pack of individuals in the crowd missed or hadn’t thought of, not to mention the danger of Groupthink!

And knowledge management needs to deal with both the consensus view and the outliers. How we can do this effectively all the time is indeed a challenge.

On tagging (2)

I previously made some comments about tagging. I believe tagging has its place, as do controlled vocabularies. Jon Udell’s blog post yesterday on tagging and foldering made the point that: “On the desktop as well as on the web, we’re in the midst of a long transition from container-based to query-based storage and retrieval”.

In container-based storage you look for what you want by going to the container and seeing what is in it. In a search-based world, the container is irrelevant so long as its contents can be searched, and search becomes even more powerful when it runs across multiple containers. And even the notion of containers is becoming obsolete as digital content becomes miscellaneous.
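
The difference can be sketched in a few lines of Python. Everything here (the folder names, file names, tags and the `build_index` and `search` helpers) is hypothetical and only meant to illustrate the two models side by side.

```python
from collections import defaultdict

# Container-based: each item lives in exactly one folder, and you find it
# by knowing which folder to open.
folders = {
    "holidays/2006": ["beach.jpg", "harbour.jpg"],
    "work/conference": ["slides.ppt"],
}

# Query-based: each item carries any number of tags, wherever it is stored.
item_tags = {
    "beach.jpg": {"sydney", "holiday"},
    "harbour.jpg": {"sydney", "bridge"},
    "slides.ppt": {"sydney", "tagging"},
}


def build_index(item_tags):
    """Invert item -> tags into tag -> items so tags can be queried."""
    index = defaultdict(set)
    for item, tags in item_tags.items():
        for tag in tags:
            index[tag].add(item)
    return index


def search(index, *tags):
    """Return the items that carry every one of the given tags."""
    sets = [index.get(tag, set()) for tag in tags]
    return set.intersection(*sets) if sets else set()


index = build_index(item_tags)
everything_sydney = search(index, "sydney")       # spans both folders
bridge_shots = search(index, "sydney", "bridge")  # narrows by a second tag
```

A query on “sydney” cuts across both folders, which is the sense in which the container stops mattering once the contents can be searched.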

Interestingly, I was left thinking about the notion of access points that we looked at years ago in my librarianship training. Traditionally, access points were different ways of accessing a library catalogue but now access points relate also to the digital domain. The fact is that now we can have an enormous number of access points and these access points can now be determined by users with user-generated content and tagging.

The Udell blog post reignited some thoughts on my own plans for my home digitisation project to convert several thousand hard copy prints and slides into digital images. The workflow includes using cataloguing software for categorising and searching my photo collection (where are the digital images located on my computer and external drives and what terms will I use to be able to search and find the ones I want?).

The issue for me is that I need a controlled vocabulary to ensure consistent and accurate description and searchability of my photo collection. In addition, I will be undertaking this categorisation myself so there is little benefit gained from tagging since I am not saving time by having others do the categorisation for me. And certainly, there is no user-generated aggregation as there could be if I used Flickr as my host and archive.

And this is the point: tagging works best in aggregate for two reasons. Firstly, aggregation enables some semblance of preference that gives a general consensus from which patterns emerge (folksonomies) – a kind of user-generated thesaurus. Secondly, tagging works because aggregation also takes place at the actual labelling end of the workflow – individuals tagging upon production and subsequently by use, a scale issue that traditional thesaurus-based cataloguing cannot compete with. In other words, there is so much digital content out there that changes all the time that a consistent, centrally-determined traditional classification scheme and workflow is impossible.

But at home, I can generate my own controlled vocabulary to ensure accuracy and consistency across my photo collection, make reference to it for future additions, and find what I want in a reliable manner. If I were tagging, in the end I would probably have a de facto controlled vocabulary, but something less than consistent and no more meaningful.
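
As a rough Python sketch of what a personal controlled vocabulary does in practice: every term entered is either mapped to a preferred term or rejected. The vocabulary entries and the `describe` helper below are invented for illustration and do not reflect any particular cataloguing software.

```python
# Hypothetical controlled vocabulary: preferred term -> non-preferred variants.
VOCABULARY = {
    "locomotive": {"train", "engine"},
    "sydney harbour bridge": {"harbour bridge", "the coathanger"},
}

# Reverse lookup so that any variant resolves to its preferred term.
PREFERRED = {term: term for term in VOCABULARY}
for term, variants in VOCABULARY.items():
    for variant in variants:
        PREFERRED[variant] = term


def describe(raw_terms):
    """Split raw terms into accepted preferred terms and rejected terms."""
    accepted, rejected = set(), set()
    for raw in raw_terms:
        key = raw.strip().lower()
        if key in PREFERRED:
            accepted.add(PREFERRED[key])
        else:
            rejected.add(raw)
    return accepted, rejected


accepted, rejected = describe(["Train", "Harbour Bridge", "sunset"])
```

Here “Train” and “Harbour Bridge” resolve to their preferred terms, while “sunset” is rejected until it is deliberately added to the vocabulary. That gatekeeping step is what free tagging leaves out.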

The future may yet bring, however, the opportunity for improved tagging that generates greater consistency and reliability while still maximising scale. Even so, for my home project that’s not needed.

On tagging

I have been giving some attention of late to tagging, partly because of some research I am doing for university, and partly in response to a challenge Matt Moore gave me a while back to start putting some of my photos up on Flickr.

A key feature of Flickr is tagging, but tagging has become much more widespread. US research indicates that tagging is a popular user-generated activity with 28% of internet users having tagged online content.

Thomas Vander Wal has written a great post on tagging. In it, he describes the history and current state of tagging and what improvements he’d like to see (stemming to see different versions of the same word, for example).

What I find interesting, coming from a background in librarianship and functional thesauri, is that there now seems to be more interest in organising tags so they become more meaningful and less ambiguous. Ambiguity is a real issue for modern libraries, particularly structuring folksonomy tags in public libraries.

Tagging works well with scale because scale gives more weight to popular tags than to others. The most popular tag becomes the de facto preferred term that a thesaurus might recommend in a controlled vocabulary environment. However, popular tags may have even greater weight and value if they are agglomerated with like tags (tags that mean the same thing but use a different word or spelling, for example).
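
A minimal Python sketch of that idea: group tags that differ only in case, spacing or punctuation, sum their counts, and let the most used variant in each group stand as the preferred term. The grouping rule, the function name and the sample tags are assumptions for illustration; grouping true synonyms would need a thesaurus-like mapping that this sketch does not attempt.

```python
from collections import Counter, defaultdict


def grouping_key(tag):
    """Crude likeness test: ignore case, spaces and punctuation."""
    return "".join(ch for ch in tag.lower() if ch.isalnum())


def merge_like_tags(tags, key=grouping_key):
    """Combine like tags, labelling each group with its most used variant."""
    groups = defaultdict(Counter)
    for tag in tags:
        groups[key(tag)][tag] += 1
    merged = {}
    for variants in groups.values():
        preferred, _ = variants.most_common(1)[0]
        merged[preferred] = sum(variants.values())
    return merged


# Hypothetical raw tags as different people might have typed them.
raw = ["web2.0", "Web 2.0", "web 2.0", "web 2.0",
       "folksonomy", "Folksonomy", "folksonomy"]
merged = merge_like_tags(raw)
```

Taken separately, no spelling of “web 2.0” is used more than twice; merged, the group carries a weight of four and outranks “folksonomy”.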

One initiative that has some promise is FaceTag, a semantic collaborative tagging tool, described in a recent article in ASIS&T Bulletin. It’s early days but FaceTag may be on the right road in looking at relational and hierarchical issues within tagging folksonomies.