Displaying information clearly is important for anyone who works with databases. Visual mechanisms such as navigation maps or diagrams make data easier to understand by transforming it into visual structures that combine graphical properties with the spatial organization of the data. This article shows how to implement an information visualization technique called the tag cloud, which uses tags to make content easier to classify, organize and search. It describes what tag clouds are, how they are used, and how to build one from information stored in a SQL Server table. Finally, it outlines how to generate a dynamic web page containing a tag cloud.
Tag clouds
The growth of web content has made sorting and searching a real need for developers. A common way to classify web content is to use tags (labels). Tags are keywords assigned to content to make it easier to classify, organize and search.
However, although tags make content easier to handle, simply listing all the tags says nothing about the popularity of the content they classify. Information about popularity and frequency can be a determining factor when a user chooses which content to view.
To make navigating a list of tags more user-friendly, a technique called the tag cloud was developed; it conveys more information than a plain list. The technique builds a web page with the tags placed side by side and highlights the most frequently used ones: the more often a tag is used, the larger its font. In addition, each tag is a link that takes the user to another page listing all the content classified with that tag.
Tag clouds became popular with the rise of blogs, which can be described as virtual diaries. The owner of the blog, the blogger, writes texts on various subjects. These texts, called posts, can contain text, images, video, hyperlinks, Flash animations and any other type of content that can be placed on a web page.
To make a post easier to find and classify, the blogger can assign one or more tags to it. For example, a post about the launch of Apple's new mobile phone, the iPhone, could be tagged 'technology' and 'telephony'.
Blogs use tag clouds to make searching easier and to show which tags are used most. With a tag cloud the reader can quickly see the most frequently used tags and therefore the blog's most popular subjects.
The blogosphere, i.e. the set of all bloggers of a region, spread the use of tag clouds widely. However, tag clouds are not unique to blogs. For instance, you can build a tag cloud of the most searched words in a search engine, where each search word becomes a tag. Another application is textual analysis: putting all the words of a text into a tag cloud quickly reveals the words the author uses most, and can also present grammatical information such as the frequency of verbs, adjectives and other grammatical constructs. In general, tag clouds are used to display the frequency of words in textual content.
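The textual-analysis use can be illustrated with a minimal Python sketch that counts how often each word occurs in a text. The function name `word_frequencies`, the sample sentence and the tokenization rule are assumptions made for this illustration; they are not part of the article's SQL Server example.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count how often each word occurs in a text, ignoring case."""
    # Naive tokenization (an assumption): lowercase the text and
    # keep every run of word characters as one word.
    words = re.findall(r"\w+", text.lower())
    return Counter(words)

# Hypothetical sample text, used only to exercise the function.
sample = "The tag cloud shows the tags; the bigger the tag, the more it is used."
for word, count in word_frequencies(sample).most_common(3):
    print(word, count)
```

Each word then plays the role of a tag, and its count is the frequency that drives the font size.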
Tag clouds can also display tags created collaboratively, which links the technique to what is known as folksonomy. If taxonomy is the study of the classification of things, folksonomy refers to classification done by people (folks). Social bookmarking sites such as Del.icio.us and digital photo storage sites such as Flickr both let users classify content however they want, using tags they choose themselves. With tag clouds, these sites show users not only the tags most commonly used to classify content but also an overview of the entire content of the site.
Besides changing the font size of each tag according to its frequency, tag clouds can implement other features. For instance, instead of displaying the tags in alphabetical order, one can order them by frequency, so that they appear in decreasing font size. Another variation arranges the words in lines according to their number of letters, with shorter words first, which makes the tags easier to order. Using different colors is also common: each color, or each shade of the same color, represents a different attribute. For example, one can use a different color for tags in languages other than English.
A tag cloud by itself has little functionality, so each tag usually contains a hyperlink to a page that displays all the content classified by that tag. It is also usual to show the tag's frequency in its tooltip. A tooltip is the short message displayed when the user rests the mouse cursor for a few seconds on a piece of text or a control without clicking it.
The tooltip can indicate how many items the tag classifies, which makes its number of occurrences, or frequency, easy to see. Figure 1 shows a tag cloud taken from a blog. When the mouse pointer rests on a tag, its frequency is displayed as a tooltip along with the number of clicks the tag has received.
Figure 1. A Tag cloud with Portuguese words from a blog.
Figure 2 shows a tag cloud from the service named Google News Cloud. In addition to changing the font size according to the frequency of words in news headlines, this tag cloud changes the background color of the words: when the user positions the mouse over a tag, all tags from the same news item change their background color, which is red in the example.
Figure 2. The Tag cloud of the Google News Cloud service.
In summary, the tag cloud technique allows content to be visualized and classified at a glance, providing quick access to the most popular content. Tag clouds can also be seen as an example of visual data mining. Among data mining techniques, visual data mining is the most intuitive, since it relies on the human ability to interpret visual scenes quickly. Visual data mining techniques map data to visual elements such as squares, circles or lines in order to make the data easier to interpret and to facilitate the discovery of patterns.
Building a tag cloud
This section shows how to create a simple tag cloud in which the size of each tag varies with its frequency. All tags used in this example are stored in a SQL Server table called TB_TAGS. The details of this table are irrelevant to this article; it is enough to know that the SELECT statement shown in Listing 1 gets the frequency of each tag, i.e. the number of times each tag was used in our example.
/* THE FOLLOWING STATEMENT GETS THE FREQUENCY OF EACH TAG */
SELECT TAG, COUNT(*) AS QTD_TAG
FROM TB_TAGS
GROUP BY TAG
ORDER BY COUNT(*) DESC
Listing 1. SELECT statement that gets the tags and their frequencies.
The graph in Figure 3 shows the frequency distribution of the tags stored in the table TB_TAGS. Because the table contains many tags, the graph shows the names of only some of them on the X axis. The plot confirms that some tags are more popular than others.
Figure 3. Bar graph showing the amount of each tag and its frequency.
To create the tag cloud, one must first decide how many different font sizes to use, which depends on the frequency distribution of the tags. The Y axis of the graph helps here. The chart shows that the most frequent tag ('New Media') appears 53 times and the least frequent tags appear only once. The Y axis, which represents quantity, is divided into steps of ten units up to a maximum of 60, which gives six intervals. This division makes the information easy to read, so the example tag cloud we are going to create will use six different font sizes.
The number of font sizes influences how the tag cloud is perceived. If too many font sizes are used, users will have difficulty telling the frequency difference between tags. If too few are used, users will get the impression that certain tags are always used and others never. It is therefore important to choose a reasonable number of font sizes, one that makes it easy to distinguish the popular tags, the moderately popular tags and the unpopular tags.
Developers basically have two ways to use different font sizes in HTML. The first sets the size attribute of the <font> element to indicate the font size directly. The second uses the style attribute of an HTML element, or a style class, to specify the font size. For aesthetic reasons, the tag cloud built in this article uses the second approach.
Having decided how many font sizes to use for the sample data, one must assign a font size to each tag. To divide the frequency range evenly we first obtain DELTA, i.e. the width of the frequency interval covered by each font size, calculated with the formula below:
DELTA = (MAX. FREQUENCY - MIN. FREQUENCY) / (NUMBER OF FONT SIZES)
In our example the maximum frequency is 53, because the tag 'New Media' appears 53 times. The minimum frequency is 1, because several tags, such as 'Education', appear only once. As stated earlier, six different font sizes will be used. The value of DELTA is therefore:
DELTA = (53 – 1) / 6 = 8.667
DELTA is used to generate the limit of each font size. The upper bound of each font size is the minimum frequency plus DELTA multiplied by the font size number. In our example:
• Tags with a frequency less than or equal to 9.667 (1 + DELTA * 1) use font size 1.
• Tags with a frequency between 9.668 and 18.334 (1 + DELTA * 2) use font size 2.
• Tags with a frequency between 18.335 and 27.001 (1 + DELTA * 3) use font size 3.
• Tags with a frequency between 27.002 and 35.668 (1 + DELTA * 4) use font size 4.
• Tags with a frequency between 35.669 and 44.335 (1 + DELTA * 5) use font size 5.
• Tags with a frequency between 44.336 and 53.002 (1 + DELTA * 6) use font size 6.
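The limits listed above can be reproduced with a short Python sketch. The function name `font_size_limits` is an assumption made for illustration, and `Decimal` is used so that DELTA is rounded to three decimal places, as in the figures above.

```python
from decimal import Decimal

def font_size_limits(min_freq, max_freq, num_sizes):
    """Return the upper frequency bound of each font size, 1..num_sizes."""
    # DELTA rounded to three decimal places, matching the article's 8.667.
    delta = (Decimal(max_freq - min_freq) / Decimal(num_sizes)).quantize(
        Decimal("0.001")
    )
    # Limit i is the minimum frequency plus DELTA multiplied by i.
    return [min_freq + delta * i for i in range(1, num_sizes + 1)]

# Sample data from the article: frequencies from 1 to 53, six font sizes.
for size, limit in enumerate(font_size_limits(1, 53, 6), start=1):
    print(size, limit)
```

Because DELTA is rounded up slightly, the last limit (53.002) ends just above the maximum frequency, so the most frequent tag still falls inside the largest size.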
These limits determine which font size each tag receives. However, for aesthetic reasons the sizes 1, 2, 3, 4, 5 and 6 will not be used directly; the font size is set through the style property instead. The values font-size: 10, font-size: 12, font-size: 14, font-size: 16, font-size: 18 and font-size: 20 are used in place of 1, 2, 3, 4, 5 and 6, respectively.
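As a rough sketch of this mapping, the Python function below takes a tag's frequency and returns the style font size. The name `font_size_px` and its parameters are assumptions for illustration; the defaults (six sizes, starting at 10, in steps of 2) come from the example values in the text.

```python
from decimal import Decimal

def font_size_px(freq, min_freq, max_freq, num_sizes=6, base_px=10, step_px=2):
    """Map a tag frequency to a CSS font size (10, 12, ..., 20 by default)."""
    # DELTA rounded to three decimal places, as in the worked example.
    delta = (Decimal(max_freq - min_freq) / Decimal(num_sizes)).quantize(
        Decimal("0.001")
    )
    # Pick the first size whose upper limit covers the frequency.
    for level in range(1, num_sizes + 1):
        if freq <= min_freq + delta * level:
            return base_px + (level - 1) * step_px
    # If rounding DELTA down left max_freq above the last limit,
    # fall back to the largest size (a choice made for this sketch).
    return base_px + (num_sizes - 1) * step_px

# Sample data from the article: 'Education' appears once, 'New Media' 53 times.
print(font_size_px(1, 1, 53))
print(font_size_px(53, 1, 53))
```

The fallback at the end is a defensive choice of this sketch: it guarantees that every tag gets a size even when rounding puts the last limit slightly below the maximum frequency.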
To simplify the calculations and the assignment of font sizes to tags, the stored procedure ST_SIZE_TAG, shown in Listing 2, calculates the limit of each font size and then assigns a font size to each tag. The stored procedure takes the number of font sizes as its first parameter and the initial font size, which is 10 in our example, as its second parameter.
/* THE STORED PROCEDURE BELOW ASSIGNS FONT SIZES TO THE TAGS */
CREATE PROCEDURE ST_SIZE_TAG @SIZES INT, @TAM_FONT_INICIAL INT
AS
BEGIN
    DECLARE @DELTA NUMERIC(10,3)
    DECLARE @MIN INT
    /* GET THE TAGS AND THEIR FREQUENCIES. */
    /* THE RESULT IS STORED IN THE TEMPORARY TABLE #TB_TOTALS */
    SELECT TAG, COUNT(*) AS QTD
    INTO #TB_TOTALS
    FROM TB_TAGS
    GROUP BY TAG
    /* THE COLUMN TYPES BELOW ARE ASSUMED; THE ORIGINAL DEFINITION IS NOT SHOWN */
    CREATE TABLE #TB_LIMITS (ID_LIMIT INT, MARK_LIMIT NUMERIC(10,3))
    /* NOW WE GET THE MINIMUM FREQUENCY AND THE DELTA */
    SELECT @MIN = MIN(QTD),
           @DELTA = (MAX(QTD)-MIN(QTD)) / CONVERT(NUMERIC(10,3),@SIZES)
    FROM #TB_TOTALS
    /* THE LOOP CALCULATES THE UPPER LIMIT OF EACH FONT SIZE */
    DECLARE @I INT
    SET @I = 1
    WHILE @I <= @SIZES
    BEGIN
        INSERT #TB_LIMITS VALUES(@I, @MIN + (@DELTA*@I))
        SET @I = @I + 1
    END
    /* NOW WE SET THE FONT SIZE USING THE SECOND PARAMETER */
    /* ORDER BY MAKES TOP 1 RETURN THE LOWEST MATCHING LIMIT */
    SELECT TAG, QTD, @TAM_FONT_INICIAL + ( SELECT TOP 1 (ID_LIMIT - 1)*2
                                           FROM #TB_LIMITS B
                                           WHERE B.MARK_LIMIT >= A.QTD
                                           ORDER BY B.ID_LIMIT ) AS SIZE_FONT
    FROM #TB_TOTALS A
    ORDER BY QTD DESC
END
Listing 2. Stored procedure that assigns font sizes to the tags.
First, the stored procedure ST_SIZE_TAG retrieves all tags stored in the table TB_TAGS together with their frequencies and stores the result in the temporary table #TB_TOTALS. Then it gets the minimum tag frequency and calculates the value of DELTA, which is used to calculate the limit of each font size. The stored procedure contains a loop that computes the limit of each font size and stores it in the temporary table #TB_LIMITS. Loops inside stored procedures are discouraged by many developers because of performance issues, but note that this loop does not iterate over a cursor: it runs only once per font size, driven by a local variable.
Finally, each tag receives its font size through a SELECT statement that uses a subquery. The subquery returns only the first limit that is greater than or equal to the frequency of the tag, and the font size is calculated from the initial size given by the parameter @TAM_FONT_INICIAL.
Table 1 shows the first fifteen rows of the result of executing the stored procedure with the values 6 and 10 assigned to the parameters @SIZES and @TAM_FONT_INICIAL, respectively. The graph in Figure 4 shows the frequency distribution of the tags in the sample data along with the limits of each font size.
Table 1. The result of the stored procedure with the example data.
Figure 4. Bar graph with the amount of each tag and its font size.
To actually build the tag cloud, one needs a programming language that can assemble a web page dynamically, such as ASP.NET or PHP. The dynamic page should call the stored procedure ST_SIZE_TAG and build the tag cloud from the result set returned. For each row in the result set, the dynamic page should emit the following HTML fragment:
<span style="font-size: [FONT SIZE COLUMN]px;"><a href=" ...">[TAG]</a></span> |
Here [FONT SIZE COLUMN] and [TAG] are replaced dynamically by the values of the columns SIZE_FONT and TAG, respectively, from the result set of the stored procedure. The developer can also use the QTD column to build the tooltip for each tag (see Figure 1), showing the number of items classified with the tag. The developer must also fill in the href attribute of the anchor <a> with the address of the page that lists all the content associated with the tag. Other design features, such as different styles, colors or fonts, can be added to give the cloud a professional touch. The tag cloud built from the sample data is shown in Figure 5.
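The article names ASP.NET and PHP for this step; purely as an illustration, the Python sketch below assembles the same HTML fragment from rows shaped like the stored procedure's result set (TAG, QTD, SIZE_FONT). The function name `render_tag_cloud`, the `/tags/` URL prefix and the tooltip wording are assumptions, not part of the original example. The rows are passed in directly, so no database call is shown.

```python
from html import escape
from urllib.parse import quote

def render_tag_cloud(rows, base_url="/tags/"):
    """Build tag cloud HTML from (tag, qtd, size_font) rows."""
    parts = []
    for tag, qtd, size_font in rows:
        # Hypothetical link target: one page per tag under base_url.
        href = base_url + quote(tag)
        parts.append(
            f'<span style="font-size: {int(size_font)}px;">'
            # The title attribute produces the tooltip with the frequency.
            f'<a href="{escape(href)}" title="Frequency: {int(qtd)}">'
            f"{escape(tag)}</a></span>"
        )
    # Tags are placed side by side, separated by a vertical bar.
    return " | ".join(parts)

# Two rows taken from the article's sample data.
print(render_tag_cloud([("New Media", 53, 20), ("Education", 1, 10)]))
```

Escaping the tag text and URL-encoding the link keeps tags that contain characters such as `&` or spaces from breaking the generated page.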
Figure 5. Tag cloud with the sample data of this article.
Conclusion
This article has presented an information visualization technique called the tag cloud. From a set of keywords, called tags, one can build a list in which each tag receives a font size according to its popularity. This list is called a tag cloud, and it helps with searching, ordering and classifying internet content.
The article also discussed some applications of tag clouds, as well as variations in how the tags are presented. Next, it presented an example of creating a tag cloud from a set of tags stored in a SQL Server table. Finally, it outlined how to generate a dynamic web page from the result of a stored procedure that automates the choice of font size for each tag.
Del.icio.us Tag Cloud
Flickr Tag Cloud
Google News Cloud