For our example, the maximum frequency is 53, because the tag 'New Media' appears 53 times. The minimum frequency is 1, because several tags, such as 'Education', appear only once. As presented previously, six different font sizes will be used. The value of DELTA is then calculated as:
DELTA = (53 - 1) / 6 = 8.667
The value of DELTA is used to generate the limits of each font size. The upper limit of each font size is the minimum frequency plus DELTA multiplied by the font size number. In our example:
• Tags with a frequency less than or equal to 9.667 (1 + DELTA * 1) use font size 1.
• Tags with a frequency between 9.668 and 18.334 (1 + DELTA * 2) use font size 2.
• Tags with a frequency between 18.335 and 27.001 (1 + DELTA * 3) use font size 3.
• Tags with a frequency between 27.002 and 35.668 (1 + DELTA * 4) use font size 4.
• Tags with a frequency between 35.669 and 44.335 (1 + DELTA * 5) use font size 5.
• Tags with a frequency between 44.336 and 53.002 (1 + DELTA * 6) use font size 6.
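The arithmetic behind these limits can be reproduced with a short Python sketch. It is only an illustration: the function name font_size_limits is hypothetical, and DELTA is rounded to three decimal places on the assumption that this matches the rounding used in the figures above.

```python
from decimal import Decimal, ROUND_HALF_UP

def font_size_limits(min_freq, max_freq, sizes):
    """Return DELTA and the upper frequency limit of each font size."""
    # DELTA = (max - min) / number of sizes, rounded to three decimals
    delta = (Decimal(max_freq - min_freq) / Decimal(sizes)).quantize(
        Decimal("0.001"), rounding=ROUND_HALF_UP
    )
    # Upper limit of size i = minimum frequency + DELTA * i
    limits = [Decimal(min_freq) + delta * i for i in range(1, sizes + 1)]
    return delta, limits

# Values from the example: minimum 1, maximum 53, six font sizes
delta, limits = font_size_limits(1, 53, 6)
print("DELTA =", delta)
for size, limit in enumerate(limits, start=1):
    print("font size", size, "upper limit", limit)
```

Decimal is used instead of float so that the printed limits carry exactly three decimal places.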
These limits identify which font size each tag will use. However, the sizes 1, 2, 3, 4, 5 and 6 will not be used directly, for aesthetic reasons; instead, the style property will set the font size. The values font-size: 10, font-size: 12, font-size: 14, font-size: 16, font-size: 18 and font-size: 20 will be used in place of 1, 2, 3, 4, 5 and 6, respectively.
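The translation from the size numbers 1 to 6 to the font-size values 10 to 20 is a simple linear mapping. A minimal sketch, assuming a step of 2 between consecutive sizes (as the listed values imply) and using a hypothetical helper name css_font_size:

```python
def css_font_size(size_index, initial_size=10, step=2):
    """Translate a 1-based font size number into a CSS font-size value."""
    if size_index < 1:
        raise ValueError("size_index starts at 1")
    # Size 1 maps to the initial size; each further size adds one step
    return initial_size + (size_index - 1) * step

sizes = [css_font_size(i) for i in range(1, 7)]
print(sizes)  # [10, 12, 14, 16, 18, 20]
```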
To simplify the calculations and the assignment of font sizes to tags, the stored procedure ST_SIZE_TAG, shown in Listing 2, calculates the limits of each font size and then assigns a font size to each tag. This stored procedure takes the number of font sizes as its first parameter and the initial font size, which in our example is 10, as its second parameter.
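/* THE STORED PROCEDURE BELOW ASSIGNS FONT SIZES TO THE TAGS */
CREATE PROCEDURE ST_SIZE_TAG @SIZES INT, @TAM_FONT_INICIAL INT
AS
BEGIN
    DECLARE @DELTA NUMERIC(10,3)
    DECLARE @MIN INT
    /* GET THE TAGS AND THEIR FREQUENCIES. */
    /* THE RESULT IS STORED IN THE TEMPORARY TABLE #TB_TOTALS */
    SELECT TAG, COUNT(*) AS QTD
    INTO #TB_TOTALS
    FROM TB_TAGS
    GROUP BY TAG
    /* ONE ROW PER FONT SIZE: ITS NUMBER AND ITS UPPER FREQUENCY LIMIT. */
    /* (COLUMN TYPES ARE ASSUMED; THE NAMES COME FROM THE FINAL QUERY) */
    CREATE TABLE #TB_LIMITS (ID_LIMIT INT, MARK_LIMIT NUMERIC(10,3))
    /* NOW WE GET THE MINIMUM FREQUENCY AND THE DELTA */
    SELECT @MIN = MIN(QTD),
           @DELTA = (MAX(QTD)-MIN(QTD)) / CONVERT(NUMERIC(10,3),@SIZES)
    FROM #TB_TOTALS
    /* THE LOOP CALCULATES THE UPPER LIMIT OF EACH FONT SIZE */
    DECLARE @I INT
    SET @I = 1
    WHILE @I <= @SIZES
    BEGIN
        INSERT #TB_LIMITS VALUES(@I, @MIN + (@DELTA*@I))
        SET @I = @I + 1
    END
    /* NOW WE SET THE CORRECT VALUE USING THE SECOND PARAMETER. */
    /* ORDER BY MAKES TOP 1 RETURN THE LOWEST LIMIT THAT FITS; ISNULL COVERS */
    /* A MAXIMUM FREQUENCY ABOVE THE LAST LIMIT WHEN @DELTA IS ROUNDED DOWN */
    SELECT TAG, QTD, @TAM_FONT_INICIAL + ISNULL(( SELECT TOP 1 (ID_LIMIT -1)*2
                                                  FROM #TB_LIMITS B
                                                  WHERE B.MARK_LIMIT >= A.QTD
                                                  ORDER BY B.MARK_LIMIT ), (@SIZES-1)*2) AS SIZE_FONT
    FROM #TB_TOTALS A
    ORDER BY QTD DESC
END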
Listing 2. Stored procedure that assigns font sizes to the tags.
First, the stored procedure ST_SIZE_TAG counts the occurrences of each tag stored in the table TB_TAGS and stores this result in the temporary table #TB_TOTALS. Then it gets the minimum tag frequency and calculates the value of DELTA, which is used to calculate the limit of each font size. The stored procedure contains a loop that computes the limit of each font size and stores these data in the temporary table #TB_LIMITS. Many developers discourage loops inside stored procedures because of performance issues; note, however, that this loop does not iterate over a cursor but only over local variables of the stored procedure, and it runs just once per font size.
Finally, each tag receives its font size through a SELECT statement that uses a subquery. The subquery returns only the first limit that is greater than or equal to the frequency of the tag, and the font size is calculated from it using the initial size given by the parameter @TAM_FONT_INICIAL.
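For readers who want to trace the logic outside the database, the same steps can be sketched in Python. This is an illustration rather than a replacement for the stored procedure: the function name assign_font_sizes is hypothetical, the step of 2 mirrors the (ID_LIMIT - 1) * 2 expression, and the tag 'Hypothetical Tag' with frequency 20 is invented for the example. Defaulting to the largest size when no limit fits is an addition of this sketch, guarding against the case where rounding DELTA leaves the last limit slightly below the maximum frequency.

```python
from decimal import Decimal, ROUND_HALF_UP

def assign_font_sizes(frequencies, sizes, initial_size, step=2):
    """Return {tag: font size}, following the steps described above."""
    min_freq = min(frequencies.values())
    max_freq = max(frequencies.values())
    # DELTA rounded to three decimals, like NUMERIC(10,3)
    delta = (Decimal(max_freq - min_freq) / Decimal(sizes)).quantize(
        Decimal("0.001"), rounding=ROUND_HALF_UP
    )
    # One (size number, upper limit) pair per font size, in ascending order
    limits = [(i, min_freq + delta * i) for i in range(1, sizes + 1)]
    result = {}
    for tag, freq in frequencies.items():
        # First limit that is >= the frequency; default to the largest size
        index = next((i for i, limit in limits if limit >= freq), sizes)
        result[tag] = initial_size + (index - 1) * step
    return result

# 'New Media' and 'Education' come from the example; the third tag is invented
sample = {"New Media": 53, "Education": 1, "Hypothetical Tag": 20}
font_sizes = assign_font_sizes(sample, sizes=6, initial_size=10)
print(font_sizes)
```

With the limits of the example, a frequency of 20 falls between 18.335 and 27.001, so the invented tag receives size 3, which corresponds to font-size 14.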
Table 1 shows the first fifteen rows of the result after executing the stored procedure with the values 6 and 10 assigned to the parameters @SIZES and @TAM_FONT_INICIAL, respectively. The graph in Figure 4 shows the frequency distribution of tags in the sample data with the limits for each font size.
Table 1. The result of the stored procedure with the example data.
Figure 4. Bar graph with the amount of each tag and its font size.
To actually build a tag cloud, it is necessary to use a technology that can assemble a web page dynamically, such as ASP.NET or PHP. This dynamic page should call the stored procedure ST_SIZE_TAG and build the tag cloud from the result set returned. For each row in the result set, the dynamic page should assemble the following HTML statement:
<span style="font-size: [FONT SIZE COLUMN]px;"><a href=" ...">[TAG]</a></span> |
Here, [FONT SIZE COLUMN] and [TAG] should be replaced dynamically by the values of the columns SIZE_FONT and TAG, respectively, obtained from the result set of the stored procedure. The developer can also use the QTD column to build the tooltip for each tag (see Figure 1). This tooltip can show the number of items associated with the tag, making the number of occurrences easy to see. The developer must also fill in the href attribute of the anchor <a> with the address of the page that lists all the content associated with the tag. Other design features, such as different styles, colors or fonts, can be applied to give the tag cloud a professional touch. The tag cloud built with the sample data is shown in Figure 5.
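The assembly step can be sketched in Python, although the article's dynamic page could equally be written in ASP.NET or PHP. The sketch makes several assumptions: the rows are (TAG, QTD, SIZE_FONT) tuples already fetched from the stored procedure, the URL prefix /tags/ is a hypothetical placeholder for the real address of the tag page, the tooltip is produced with the HTML title attribute, and the " | " in the statement above is treated as a separator between tags. The function name tag_cloud_html is also hypothetical.

```python
from html import escape
from urllib.parse import quote

def tag_cloud_html(rows, url_prefix="/tags/"):
    """Build the tag cloud markup from (TAG, QTD, SIZE_FONT) rows."""
    parts = []
    for tag, qtd, size_font in rows:
        # Percent-encode the tag for the URL, then escape it for the attribute
        href = escape(url_prefix + quote(tag, safe=""), quote=True)
        parts.append(
            f'<span style="font-size: {int(size_font)}px;">'
            f'<a href="{href}" title="Occurrences: {int(qtd)}">'
            f'{escape(tag)}</a></span>'
        )
    return " | ".join(parts)

# Two rows taken from the example data
rows = [("New Media", 53, 20), ("Education", 1, 10)]
cloud = tag_cloud_html(rows)
print(cloud)
```

Escaping the tag text matters because tags are user-supplied content: a tag such as "R&D" would otherwise produce invalid markup.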
Figure 5. Tag cloud with the sample data of this article.
Conclusion
This paper has presented an information visualization technique called the tag cloud. From a set of keywords, called tags, one can build a list in which each tag receives a font size according to its popularity. This list is called the tag cloud, and it helps users search, sort and classify internet content.
The article also discussed some applications of tag clouds, as well as variations in how the tags are presented. Next, an example of creating a tag cloud was presented, based on a set of tags stored in a SQL Server table. Finally, it presented the idea of how to generate a dynamic web page from the result of a stored procedure that automates the process of choosing the font size for each tag.
Del.icio.us Tag Cloud
Flickr Tag Cloud
Google News Cloud