Tag Sorting: Another tool in an information architect’s toolbox

Marketers have already started advising companies to pay attention to tags. That got me thinking – what would an information architect do with the wealth of information in del.icio.us / Flickr / Technorati tags?

The first thing that comes to mind is to use tags as a proxy for free-listing. Information architects (or anyone else researching a domain) can perform a card sort on tags instead of using free-listing to generate items for a card-sort exercise.

Did that make sense? Let me explain. Card sorting is used as a method for understanding users' mental models. Prior to asking users to sort cards, you need to generate a list of relevant items for the exercise. We (at Uzanto) often use free-listing at this stage. In case this is the first time you are hearing about free-listing – it's a lightweight technique to understand what lies within a domain. An example: ask someone to list all the foods at Starbucks. Repeat this exercise with 20 people. Write down the frequency of each term mentioned – it will give you a pretty good idea of the scope and boundaries of the domain “Starbucks”. If you must know more, go read my article at Boxes and Arrows.
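For anyone who prefers to script the tallying step, here is a minimal sketch in Python. The responses are hypothetical (three made-up participants rather than 20), and the function name `mention_share` is my own. It assumes you want the share of participants who mentioned each item, counting an item once per person.

```python
from collections import Counter

# Hypothetical free-listing responses, one list per participant (illustration only).
RESPONSES = [
    ["muffin", "scone", "bagel"],
    ["scone", "cookie", "Muffin"],
    ["muffin", "brownie", "muffin"],
]

def mention_share(responses):
    """Return the fraction of participants who mentioned each item at least once."""
    mentions = Counter()
    for items in responses:
        # A set counts each item once per participant, however often they repeat it.
        mentions.update({item.strip().lower() for item in items})
    return {item: n / len(responses) for item, n in mentions.items()}

share = mention_share(RESPONSES)
for item, fraction in sorted(share.items(), key=lambda pair: -pair[1]):
    print(f"{item}: {fraction:.2f}")
```

Items mentioned by most participants sit at the core of the domain; items mentioned by only one person mark its edges.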

Tags bear an uncanny resemblance to freelisting data. Navigating by tags is remarkably like looking through free-listing data, except that you don’t need to incentivize a bunch of participants or design and execute a research study. The taggers of the internet have already done that work for us. Thank you, Joshua Schachter and the kind folks at Ludicorp.

Enough background – how does one go about tag sorting? First step: find a tag for a topic you are interested in. I was interested in understanding how people think about “apple”.

I found the tag “apple” on Delicious. Next, I looked up the related tags. Here they are.

1) mac
2) osx
3) ipod
4) software
5) itunes
6) music
7) history
8) technology
9) windows
10) macintosh
11) hardware

Next, I looked up the related tags for each of the above tags. I ended up with a list of all tags that were one or two degrees away. Below is the list showing the above related tags (going down vertically) and related tags for each of those tags (going across horizontally).

1) mac osx software apple windows linux music itunes unix tips howto ipod
2) osx software apple unix audio itunes music tips howto apps music linux
3) ipod apple music mp3 mac itunes software shuffle linux hacks audio osx
4) software mac tools windows linux programming web osx free opensource development music
5) itunes music apple mac osx mp3 software ipod audio hacks drm sync
6) music mp3 audio google search software radio video art free blog cool
7) history photography war reference design art web culture photos internet politics photo
8) technology blog news web design software science tools internet art google music
9) windows software linux tools security mac free freeware xp microsoft programming osx
10) macintosh apple software osx mac computers ipod raskin macosx history audio hack
11) hardware linux software mac howto apple tools computer hacks technology music hack
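I collected these rows by hand, but the same lookup can be sketched in a few lines of Python. Everything here is an assumption for illustration: `RELATED` is a hypothetical lookup table holding a small slice of the related-tag lists (not a real del.icio.us API), and `two_degree_rows` is a name I made up.

```python
# Hypothetical lookup table: tag -> related tags, in the order the site lists them.
# Only a small, shortened slice of the lists above is reproduced here.
RELATED = {
    "apple": ["mac", "osx", "ipod"],
    "mac": ["osx", "software", "apple"],
    "osx": ["software", "apple", "unix"],
    "ipod": ["apple", "music", "mp3"],
}

def two_degree_rows(seed, related):
    """Build one row per first-degree tag: the tag itself, then its own related tags."""
    return [[tag] + related.get(tag, []) for tag in related.get(seed, [])]

rows = two_degree_rows("apple", RELATED)
for number, row in enumerate(rows, start=1):
    print(f"{number}) {' '.join(row)}")
```

Repeats are kept on purpose: a tag that shows up in many rows is exactly what the frequency count in the next step picks up.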

Next, I did what I do with freelisting data at this stage. I mapped each word by frequency. There were 132 words in all – 11 directly related tags, and 121 related at a second degree. There were 51 unique words. Here is the list of all the unique words, along with the frequency, starting with the most frequent word.

1) software: 10
2) mac: 8
3) music: 8
4) apple: 7
5) osx: 7
6) linux: 6
7) audio: 5
8) hack: 5
9) ipod: 4
10) itunes: 4
11) tools: 4
12) art: 3
13) free: 3
14) howto: 3
15) mp3: 3
16) web: 3
17) windows: 3
18) blog: 2
19) computer: 2
20) design: 2
21) google: 2
22) history: 2
23) internet: 2
24) photo: 2
25) programming: 2
26) technology: 2
27) tips: 2
28) unix: 2
29) cool: 1
30) culture: 1
31) development: 1
32) drm: 1
33) hardware: 1
34) macintosh: 1
35) macosx: 1
36) microsoft: 1
37) news: 1
38) opensource: 1
39) photography: 1
40) politics: 1
41) radio: 1
42) raskin: 1
43) reference: 1
44) science: 1
45) search: 1
46) security: 1
47) shuffle: 1
48) sync: 1
49) video: 1
50) war: 1
51) xp: 1
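A tally like the one above is easy to script. The sketch below uses only the first three rows of the two-degree list, so its counts are smaller than the full ones. The `ALIASES` table is my assumption about how variants such as “hack” and “hacks” were merged; adjust it to taste.

```python
from collections import Counter

# The first three rows of the two-degree list, copied as they appear in the post.
ROWS = [
    "mac osx software apple windows linux music itunes unix tips howto ipod",
    "osx software apple unix audio itunes music tips howto apps music linux",
    "ipod apple music mp3 mac itunes software shuffle linux hacks audio osx",
]

# Assumed merges of spelling variants (hypothetical; not a rule stated in the post).
ALIASES = {"hacks": "hack", "photos": "photo", "computers": "computer"}

def tally(rows, aliases=ALIASES):
    """Count how often each tag occurs across all rows, after merging variants."""
    counts = Counter()
    for row in rows:
        for word in row.split():
            counts[aliases.get(word, word)] += 1
    return counts

counts = tally(ROWS)
# Tags that occur exactly once form the long tail of the distribution.
long_tail = sorted(tag for tag, n in counts.items() if n == 1)

for tag, n in counts.most_common(5):
    print(f"{tag}: {n}")
print("tags mentioned once:", long_tail)
```

Feeding in all 11 rows instead of three gives the full frequency table, subject to whichever merges you choose.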

The list looks very much like a freelisting list, except that it is a little tech-heavy and lacks the range of general free-listing data. Remarkably, on del.icio.us apple does not appear even once as a fruit. Nor do you see any reference to the redness of an apple. Interestingly, both fruit and redness do appear as tags directly related to the apple tag on Flickr. Here are the first-degree related tags for apple on Flickr: powerbook, ibook, computer, imac, fruit, music, food, red, store, macro.

Back to the del.icio.us data. It shows another characteristic typical of freelisting data – the long tail at the end (note all the tags with a frequency of “1”).

I suspect, however, that I would have got a richer dataset if I had asked a group of people to freelist on “apple”. I think what we are seeing is the mental models of “Apple” among the early del.icio.us adopters, who are by no means a representative sample. Additionally, all the tags are attached to links that they found worth saving. All of this biases the data in a certain direction.

Another observation – none of the tags surprised me. Generally, in a freelisting dataset, there are at least some associations that surprise. I think that there is simply not enough variance here for unique, idiosyncratic associations to emerge. I expect that this will change as tagging becomes a more mainstream activity. It is also possible that my mental model closely matches those of the del.icio.us taggers, which is why I found all the tags very predictable.

Now for the sorting. I signed up one individual to sort these tags into groups of his choosing. (If you are doing a real study, please ask more than one individual to do the sorting! Aggregate multiple sortings to create a group mental model.) The groups that popped up are shown below. We can immediately see just from looking at the groupings how Apple is strongly associated with the production and consumption of art / music / media. The paucity of Microsoft-oriented tags and the large number of Linux / OSS tags that emerged point to a warming relationship between OSS hackers and Apple users (the fact that OSS, Linux, and hacking were grouped together says a lot about the brand identity of Linux). On the basis of this preliminary tag-sorting, I think that “Hacking”, “Media”, and “Art” would be excellent top-level categories for an Apple-oriented site.
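If you do collect sortings from several people, one common way to aggregate them is to count how often each pair of tags lands in the same group. The sketch below assumes three hypothetical participants and invented groupings; it is not the data from my single sorter, and `co_occurrence` is a name chosen for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sorts from three participants; each sort is a list of groups of tags.
SORTS = [
    [{"linux", "unix", "hack"}, {"music", "mp3", "itunes"}],
    [{"linux", "unix"}, {"hack", "music", "mp3", "itunes"}],
    [{"linux", "unix", "hack"}, {"music", "mp3"}, {"itunes"}],
]

def co_occurrence(sorts):
    """Count, for each pair of tags, how many participants grouped them together."""
    pairs = Counter()
    for groups in sorts:
        for group in groups:
            # Sorting the group gives every pair a stable (a, b) key with a < b.
            for a, b in combinations(sorted(group), 2):
                pairs[(a, b)] += 1
    return pairs

pairs = co_occurrence(SORTS)
for (a, b), n in pairs.most_common():
    print(f"{a} + {b}: grouped together by {n} of {len(SORTS)} participants")
```

Pairs that almost everyone groups together are candidates for the same category; pairs that split the participants are worth a closer look.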

So am I ready to give up freelisting with a group of participants? Are tags a perfect replacement for freelists? Not quite yet. I will make sure to check both del.icio.us and Flickr if my project is related to any topics they cover. Also, I think it would make sense to go up to three degrees of relatedness to get a broader variety of associations. Most importantly, I think that tagging is currently not mainstream enough to use exclusively, or even as a primary research data stream.

Tag Sorting is an example of the type of method I hope we will see more of. There is so much structured data out there. It is time we learnt how to utilize it to understand people.

Please carry on the tag-sorting experiment. If you try something, please report back to this blog.

6 thoughts on “Tag Sorting: Another tool in an information architect’s toolbox”

oso

March 7, 2005 at 3:32 pm

Hi Rashmi, we met at Berkman Center last November or December and today I arrived here via a link of a link of a del.icio.us bookmark I clicked on. Some smart stuff here. I really like the diagram. My thoughts on tags and folksonomies have been exhausted, but they will definitely present both a challenge and a resource to information architects and marketing agencies.
avi

May 5, 2005 at 12:12 pm

What a nice parallel and a great way to get at people’s intuitive categorizing.
One issue: Tags on del.icio.us are only as representative as the site’s visitors. Hence your finding: lots of computer-ese, no red, no fruit. As you say, this sort of service is not mainstream, so its tags won’t be either. Conclusions as to mental models have to be related to specific groups.
One idea: And this is probably something you are already all over – could you provide people (who you sample) with content they are familiar with (a document, an image) and ask what “tags” or descriptors they’d apply? A sort of prompted free-listing/tagging.
One follow-up: see 43things.com, which has a broader focus than del.icio.us and appears to me to draw from a wider population.

I like your blog.
Lars Pind

May 5, 2005 at 11:54 pm

On the lack of surprise … could this be due to the definition of related tags employed by del.icio.us? Most likely something’s only related if there are a certain minimum number of links tagged with both tags, so that minimum serves as a filter against your surprises?
rashmi

May 6, 2005 at 5:09 pm

Lars, the minimum definitely serves as a filter. But I think something deeper is also happening here highlighting the difference between Flickr and Del.icio.us.

On Flickr, people tag objects and scenes from their OWN life – as such it reflects the idiosyncrasy and subjectivity of individuals. On the other hand, items tagged on del.icio.us are mostly links to articles or professional material.

I think we will see a greater variety of tags when people tag personal stuff, as opposed to mostly professional stuff.
rashmi

May 6, 2005 at 5:18 pm

Avi,

Thanks for your kind words about this blog.

What you are suggesting – giving people objects, documents and asking them what tags it makes them think of, is pretty similar to the freelisting method. You can read more here: http://www.uzanto.com/research/archives/2003_02/freelisting.html

It's interesting that you mention this – one of my realizations is that tagging and many of the research methods used in information architecture are similar and complementary. Tagging does on a large scale what many of us already do for our projects on a small scale using free-listing.
rickjdavies

July 2, 2008 at 5:53 pm

You can represent the data collected via card sorting and from tagging samples as a network, using social network analysis.
Here is a network version of the tagging lists above
http://www.scribd.com/word/download_preview/3788937?secret_password=1khtgskmk3sywcad089j
regards, rick davies