The popularization of the Web as a platform for publishing content is a major resource for the dissemination and creation of new ideas, projects, applications, virtual shops and other ways to have an online presence. At a time when many believe that content is king and that the transmission medium is the message conveyed, much of what is produced in different types of web sites like blogs, institutional sites, online stores, applications, portals and others is supported by a single web technology pillar: the platforms for generating and publishing content.
Driven by the free software movement, the main platforms for content generation and publication, also known as CMS (Content Management System), are gaining more followers either through the catalog of services provided by hosting companies or by users who decide build their your own web hosting infrastructure. Among the main features that have popularized the adoption of CMS we can cite: the free and open licensing model, the ease of use for both users browsing the website and for those who create content, the management and dissemination content and the possibility of almost infinite customization by inserting components and templates freely available and produced by the community.
Currently there are several solutions for content management freely available for those wishing to publish web content. To simplify the nomenclature, these tools will be referenced only as CMS throughout this article. The most popular CMS are: WordPress, Joomla, Drupal and Magento. These take the form of software frameworks ready for use after the customization and configuration of certain features. Its main elements are the web page design templates ready for use, the administrative panel, access security, what you see is what you get editor for content, configuration files, images, and also a data model, whose analysis is the main objective of this article.
Knowing the data model of a CMS is important in various situations. For instance: when you want to do some customization, improve performance, perform maintenance on the database, migrate the platform to ensure that resources are being well used and other tasks that support the web site built with CMS. Also, knowing the technical details related to the CMS data model used can facilitate the execution of tasks on a daily basis and also increase the level of experience and expertise of the professionals involved.
This first part will talk about what is a CMS, how they work and will explain the details of the WordPress data model, the most popular CMS used nowadays.
Content production with CMS
From the early popularization of the Internet until the end of the era that became known as the Internet bubble, the period roughly between 1995 and 2000, the main solutions for managing online content had a high total cost of ownership (TCO - Total Cost of Ownership, see DevMan Note 1), which covers both licenses and other financial aspects. Some analysts and historians believe that this high cost of ownership made it difficult to publish content on the web thus restricting the publication to only those who had had the resources or expertise to use one of the commercial CMSs available at the time.
DevMan Note 1. TCO – Total cost of Ownership
TCO (Total Cost of Ownership) is a financial estimate designed for consumers and business managers to evaluate the direct and indirect costs related to purchase of any capital investment, such as software and hardware, and the expense inherent of such products to keep them in operation, i.e. the costs involved to maintain what was purchased. A TCO assessment ideally offers a final statement reflecting not only the cost of purchase but all aspects of the use and maintenance of equipment, device, or system. This includes the costs of maintenance personnel, training of users, costs associated with failure, incidents, security, disaster preparedness and recovery, space, electricity, cost of development, infrastructure, testing, quality assurance, incremental growth, deactivation of equipment and others.
After the Internet bubble period some projects of CMS based on the ideas of free software began to emerge. These projects began to gain popularity, because once they are installed and configured they allow people without much technical knowledge to start publishing content. This period was characterized by the emergence and popularization of various blogs and also companies providing blog hosting services for free or for a small fee.
Analyzing what is a CMS one can see that it is a relatively simple web application. The main concept is to provide a platform that allows the customization of a web page layout in such a way that is easy to add content to be presented to users visiting the site. Next we are going to see the existing CMS meets these two objectives.
The possibility to customize the web site layout is one of the main attractions for those who develop for the web, because the CMS allows the pages viewed by the site users have virtually any layout or format. This is accomplished as follows: the CMS has a standard layout with certain pre-defined areas on the page. These areas contain source code that will search the database for the content to be placed in the correct area when the page is visualized by the user. The creation of a new layout for the CMS required these pre-defined areas. A layout with these areas (and the HTML, CSS, images and other) ready to be used by a CMS is called a template. There are several repositories on the web with a wide variety of templates created by the community and ready for use without additional cost. In addition to templates for layouts of sites, each CMS also has components created and made freely available by the community that extend the functionality of CMS. These components are known as extensions, plug-ins and add-ons. They also free and found on the official website of the specific CMS.
Note that the dynamic content, and some of CMS’s own settings, were initially stored in text files. Later the CMS’s developers adopted a database for storage, which guaranteed access to dynamic, fast and suitable for the management of content data and settings of the CMS.
The production of content itself is provided by an administrative area of the CMS that is accessed through a login. This content can vary, but generally it is based on text or images inserted by an editor with many text formatting features (bold, alignment, inserting links, etc.). For the CMSs that support e-commerce the content takes the form of products, categories, payment terms, prices and other common elements of a virtual store. Either way, the simplification and ease of insertion of content is the key to non-technical users publish web content, manage this content, and interact with users by responding to comments and other forms of dialogue with the site's visitors.
The content inserted by the site administrators, the CMS settings, and the interactions with the user are recorded in the tables of the CMS data model. There are also other types of information stored in the database, such as images used in advertising (banners), access information, purchase data and records that make the data model more complex and extensive.
In general, the use of a CMS does not require direct manipulation of the database because the platform is designed to simplify the production of content and minimize the necessary technical expertise to use it. However, this does not mean that we must forget the database. When a need arises that is not directly served by CMS or its extensions we need to change the source code of pages and also manipulate the database. Note that as the CMS generate HTML dynamically by using languages like PHP, ASP.NET and others, it is possible to perform several customizations that go beyond modifying the page layout and in these situations you need to know the data model used by the CMS.
Besides the need for modification of the CMS it is also necessary to know the database when there is a considerable amount of data to be stored and when it is necessary to optimize the performance of CMS. There are also several administrative tasks that must be performed on any database that works with official data on production such as: performing backups, monitoring of resources, re-indexing, monitoring growth of data files and log files, purge, replication, implementing fault tolerance, etc. These typical activities that are performed by a DBA should also be performed on the servers of databases used by a CMS. The focus of this paper is to study only the data models of some CMSs, but the reader who is interested in the common administrative tasks required by a database can find several articles that address the planning and execution of the main tasks administrative necessary to a database in many specialized publications.
The next sections of this paper discusses in detail the data models of the CMSs named WordPress 2.9, Joomla 1.6, Drupal 7 and Magento 1.12, as these four CMSs are the mostly used free software for publishing content on the web today. Then the next section presents a technical comparison between the CMS and its data models.
Before beginning to comment on the details of the models, it is worth noting that they were generated with the modeling tool MySQL Workbench, but the choice of this tool is purely for your convenience. Any CASE tool can be used as long as it can work with the data model properly. The models presented in the figures of this article contain only tables without relationships in order to facilitate the visualization of the entities.
The WordPress data model
WordPress is the most popular CMS today and can be used basically in two ways: 1) By the wordpress.com domain where you can create a blog for free and without much technical knowledge of its inner workings, and 2) by downloading and installing the WordPress available on the official website http://wordpress.org/.
An important detail is that during the WordPress installation you can choose what will be the prefix of the tables of the data model. The prefix wp_ for all the tables is automatically chosen by the installer if the user does not change it. Note that the official WordPress can only be used with the MySQL database server. Figure 1 presents the data model used in version 2.9 of WordPress with 11 tables.
Figure 1. The data model of WordPress 2.9.
The first point that stands out in the WordPress data model is the lack of foreign keys. This data model has no foreign key due to the fact that it was created using the MyISAM engine instead of the InnoDB engine. The lack of foreign keys mean that the relationship between the tables is not maintained by the database, but by the application. This lack of foreign key constraints and other database happens in WordPress and Joomla, but not in Drupal and Magento.
The lack of foreign keys allows inconsistent data in the database. For instance: a comment that has no associated blog post. Although this scenario is fictitious and only occur when someone changes the data directly from the database and not thought the WordPress admin panel, the possibility of inconsistency can cause problems, especially when it comes to data migration and security.
Another aspect that can greatly affect the performance of WordPress is the data type used in the tables. Suffice to say that the data types used in this model are too exaggerated, which can generate various performance problems when the database becomes charged. Special mention goes to the type unsigned BIGINT (20), used in primary keys and indexes. This data type stores values from 0 to 18,446,744,073,709,551,615 or 18 quintillion 446 quadrillion 744 trillion 709 billion 73 million 551 thousand and 615 about 10 raised to the power of 18 store in 20 bytes. I suppose that rarely a blog or other type of application reaches this number of posts or comments. This involves more overhead in the data pages and index, since all model tables contain a primary key with the BIGINT (20) data type.
Apart from typing too expensive there are also data redundancies since virtually all information related to data and time is stored in two columns with different data types. For example: the wp_comments table, which stores the posts' commentaries, contains the columns comment_date and comment_date_gmt, both typed with the DATETIME data type. If the goal is to store the date why not choose just only one column that stores the date in GMT format? Again, there are probably two columns due to compatibility with the legacy versions of WordPress.
Another aspect of the model is the lack of standardization because the WordPress data model is not standardized to the third normal form. This is because there are columns that bring aggregated and repeated data. For instance: the column comment_count of the table wp_posts. This column stores the amount of post's commentaries, a value that can be calculated by counting the wp_comments table rows in a SQL statement. The reason to store this information is probably related to the performance gain you get with a certain degree of model's denormalization.
Another interesting detail is that the data model has no domain integrity or stored procedures that restrict what is inserted in the data tables. Though not mandatory, this practice opens room for problems such as SQL injection where a malicious user may try to invade the site by using special characters in the URL or in the data input fields. Moreover, the lack of data integrity may invalidate the application. For example: insert the value -2 in column menu_order of the wp_posts table will result in a runtime error when the menu is created dynamically by the CMS. There are also some table columns where the NULL value is allowed, requiring an additional check on the application. While some may argue that it is the responsibility of the application, it is not uncommon to find databases with this type of domain integrity implemented, thus providing another layer of security and data consistency.
Another aspect to be mentioned in the model is the file storage attached the blog post. According to the model, the types of files are stored in the column post_mine_type of the table wp_posts which is the main table of the model. However, the files themselves are placed in a directory of WordPress installation. While there are advantages and disadvantages of storing files internally in the database, there are issues such as security, disk space, concurrent access and update the contents of these files that are outside the database realm, thus creating an additional task for the administrator .
In summary we can say that the WordPress model meets the blog posting needs but it is not an ideal model in terms of formal modeling. However, WordPress is still a relatively new application and with the support of the community can become even better, particularly in relation to its database model.
The production and content management for the Internet is currently carried out by content management tools known as CMS. These tools are ready platforms that can be customized to provide a suitable layout and also to meet other needs not covered by CMS or its components.
Because CMS is a ready platform to be used, they need a database for storing data related to the content, layout and its internal configurations. Based on the currently used major CMS (WordPress, Joomla, Drupal and Magento) this article presents their data models and discusses some technical aspects that may be of interest to those who work using these platforms for content management.
The article details various aspects of models such as typing, naming, organization, relationship, use of constraints, modeling techniques, storage and availability of each of the data models. From the explanations and criticisms of the models provided by this article, readers can begin to familiarize yourself with the layout and the data storage mechanism used by each CMS because this knowledge is useful when there is a need to implement new features in CMS or if it is necessary to perform some administrative task in the database that contains the CMS' data model.
This first part talked about the production of web content by using a CMS. Next, the article discussed about the details of the WordPress data model highlighting some positive and negative aspects.
In part two, we will see the Joomla data model.