Proj #1: Clustering tool for Cloud Tags

Supervisor: Adrian O’Riordan

 

This project involves developing a partition clustering tool for website cloud tags.

 

Background

 

Cloud tags are a visual way of representing popular tags on websites.

 

Tags represent a certain subject field, topic, or category of topics. Popular websites such as Flickr and blog sites such as Technorati have pioneered the use of tag clouds. Most blogging software (e.g. Movable Type, WordPress, and TypePad) supports tag categories directly. Links can also be added manually, e.g. for Technorati, a tag can be embedded as follows: <a href="http://technorati.com/tag/[tagname]" rel="tag">[tagname]</a>.

 

 

[Figure: Example Cloud Tag]

 

Frequently, tags are portrayed in a larger font or otherwise emphasized to indicate popularity, but they are usually displayed in alphabetical order. This ordering ignores the connections between tags and makes the tag cloud appear quite random. What would be useful is software to cluster or group tags into more meaningful sets. Flickr does some clustering, presenting a simple list of non-overlapping tag clusters.

 

Project

 

This project proposes the development of a 2-dimensional partition clustering in which the distance between tags gives an indication of their similarity. This is called a topological map of the data. Topological maps can be built with statistical algorithms (e.g. K-means) or neural network algorithms. K-means is the simplest and most common approach to partition clustering. The K-means algorithm assigns each point to the cluster whose centre (also called the centroid) is nearest, recomputes each centre as the mean of the points assigned to it, and repeats these two steps until the assignments stop changing.
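
As a rough illustration of the assignment step described above, the following is a minimal Python sketch of K-means on 2-dimensional points, not a proposed implementation for the project. The tag names and their coordinates are hypothetical, invented only for the example; how tags would actually be mapped to positions is left open here. The function and variable names (k_means, tag_positions, and so on) are likewise assumptions.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean (centroid) of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, iterations=100, seed=0):
    """Partition points into k clusters; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centre.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the partition is stable
        assignment = new_assignment
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centre
                centroids[c] = mean_point(members)
    return assignment, centroids


# Hypothetical 2-D positions for six tags (illustrative values only).
tag_positions = {
    "beach": (0.0, 0.1), "sea": (0.2, 0.0), "sand": (0.1, 0.3),
    "python": (5.0, 5.1), "code": (5.2, 4.9), "linux": (4.8, 5.0),
}
tags = list(tag_positions)
labels, centres = k_means([tag_positions[t] for t in tags], k=2)

# Group tag names by the cluster label they received.
clusters = {}
for tag, label in zip(tags, labels):
    clusters.setdefault(label, []).append(tag)
print(clusters)
```

Because K-means depends on its random starting centres, results on less clearly separated data can vary between runs; the fixed seed here only makes the sketch repeatable.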

 

 

Software is freely available to compute K-means clustering, e.g. flexclust: http://cran.r-project.org/src/contrib/Descriptions/flexclust.html