Free Online Courses for Software Developers - MrBool
× Please, log in to give us a feedback. Click here to login

You must be logged to download. Click here to login


MrBool is totally free and you can help us to help the Developers Community around the world

Yes, I'd like to help the MrBool and the Developers Community before download

No, I'd like to download without make the donation


MrBool is totally free and you can help us to help the Developers Community around the world

Yes, I'd like to help the MrBool and the Developers Community before download

No, I'd like to download without make the donation

Simulated data for complex relationships – Populating the database – Part 4

See this article a discussion of alternatives to fill up the database with test data

There is a pre-requisite to use a database: data. These data are stored according to the specification of a data model that is designed from the modeling generated from the analysis of the storage requirements. For example, a system that controls the access of employees in a company that replaces the old timecard will have data about people, times, dates, authorization, etc.

Most models of data is filled through the use of an interface system that allows the capture of input performed by manual typists/users or automatically performed by sensors or specialized equipment. But what to do when it is necessary to fill the database with simulated data? This situation is very common during validations, approvals, performance testing, exportation of the database and other situations. Unfortunately, the available databases in the market do not have adequate resources and tools to generate simulated data suitable for real data models.

Based on this scenario, this article will present some open-source alternatives to allow the generation of simulated data, also known as mass screening. The article also shows how to populate data in tables of three partial data models that in order to illustrate the needs of simulated data. To task of generate simulated data is crucial for any professional who works with developing and managing the database, since data generation is closely linked with the process of testing and approval.

This last part present two examples of simple data models that require simulated data. The models contain relationships with cardinalities 1:N and N:M and are discussed in the context o data generation.

Example 2: Relationship with the 1:N cardinality

This example discuss how to generate simulated data for two tables whose relationship has the cardinality 1:N. The tables chosen are VEHICLE and VEHICLE_USE belonging to a model that tracks information about the use of vehicles for a fleet management system in a big company. Figure 1 shows the tables VEHICLE_USE and VEHICLES related by foreign key and primary key constraints on the column ID_VEHICLE placed on both tables.

Figure 1. Tables VEHICLE_USE and VEHICLES in the project database to control fleet usage.

The generation of simulated data for these two tables must be made in two steps: first it is necessary to generates the data for the table VEHICLES and then insert the data in the table VEHICLE_USER. In this way we avoid generating data that are inconsistent with the constraint imposed by the foreign key.

The first step for generating data for the table VEHICLES is set up a generator to auto-increment the column ID_VEHICLE as this will cause no duplication of value for the primary key. Moreover, it is worth to have generators that are based on lists of values   for the columns MANUFACTURER, MODEL_YEAR, COLOR, INSURANCE_COMPANY, and TYPE_VEHICLE. The column NUMBER_INS_POLICY must be generated in accordance with the format used by insurance policy numbers. The values for the column PURCHASE_DATE must be created from generators that accepct a start date and a end date with the exception of specific dates where the purchase can not be made (weekends and holidays). The other columns can be generated typically by regular expressions and random values  .

With the data generated for the VEHICLES table now we must must fill up the table VEHICLE_USE. Sequential values    automatically incremented can be placed in the column ID_USE. It is required to get a list with the values   previously registered in the table VEHICLE and only then choose amont them to insert data in the column ID_VEHICLE of the table VEHICLE_USE. Other than this detail, the table VEHICLE_USE has no restrictions other than the following rule to the column value DATE_RETURN: the return date values must be greater than the value of the column DATE_GET because the returning date of the vehicle indicates when the vehicle was being used as a benefit. This constraint is easily specified when a range of start and end dates are defined in the tools.

Example 3: Relationship with the M:N cardinality

This example features the classic M:N relationship between a table that store orders and a table that store products. This type of relationship requires a third additional table to relate which products belong to which orders. The orders table is simple and it's called ORDERS, the products table is called PRODUCTS table and the relationship between then is made by a table called ORDER_ITENS. Figure 2 shows these three tables, their columns and relationships.

Figure 2. Classic parent-child relationship between the tables ORDERS, ORDER_ITENS and PRODUCTS.

To create simulated data for the tables in this type of relationship is necessary to separate the generation of simulated data in three steps: first you must generate the data to PRODUCTS, then the ORDERS and finally put the data in table ORDER_ITENS in order to comply with its two foreign keys placed in the columns ID_ORDER and ID_PRODUCT.

The generation of data for the PRODUCTS table has no major difficulties. It takes a sequential number generator self-countable for the column ID_PRODUCT, and generators for simple description (PROD_DESCRIPTION) and PROD_PRICE. The most difficult step is the use of lists of values   from one file to full up the description columns and also the specification of numerical values   with decimal places to the maximum and minimum prices. The ORDERS table is also simple and requires a sequential number generator for the ID_ORDER and a random value generator   for the dates in the ORDER _DATE column.

To create the relationship between the two tables and populate the ORDER_ITENS table one must respect two rules: the foreign keys and also the primary key. The foreign keys can be generated from the list of values   already created in the ORDERS and PRODUCTS tables without problems. But to guarantee that there will be no repetition is something a little more complex to be done with the tools to generate data. In this situation it is recommended to first generate the data with possible duplicated values and then perform a simple post-processing with a DELETE statement that search for duplications and removes these values before creating the primary key in the table ORDER_ITENS.


This article presented a discussion and examples of how to generate simulated data to a database. The generation of simulated data is important in various situations such as testing, approval, and transfers of the database without disclosure of sensitive data and others.

The first part of the article covered the theoretical aspects that should be considered prior to generating data. We discussed the need for representation of data, the quality and quantity, the environment, the implications and technical aspects involved in this type of task such as import options,  growth, and purging the transaction log data.

The second part discussed some alternatives to simulated data: how to work with custom scripts, some ready databases and presentation of two open source tools that can be used to automate the generation of simulated data: GenerateData and Spawner

The third part presented the dgMaster, a more complex Java tool used to perform complex generation of data. We also talked about a simple table and how to proceed when generating data for this table.

Finally, this last part presents two examples where it may be necessary to generate simulated data:  two tables with a relationship with cardinality 1:N and three tables involved in a relationship with cardinality N:M. The main details about the columns and specific types of data were discussed in order to generate the specific data in each data model.

To see part three, go to: 

Mauro Pichiliani has the Master of Science degree on collaborative systems by the Aeronatics Institute of Technology (ITA) in Brazil. He is a specialist on database technologies with more than 8 years of experience on the industry...

What did you think of this post?
To have full access to this post (or download the associated files) you must have MrBool Credits.

  See the prices for this post in Mr.Bool Credits System below:

Individually – in this case the price for this post is US$ 0,00 (Buy it now)
in this case you will buy only this video by paying the full price with no discount.

Package of 10 credits - in this case the price for this post is US$ 0,00
This subscription is ideal if you want to download few videos. In this plan you will receive a discount of 50% in each video. Subscribe for this package!

Package of 50 credits – in this case the price for this post is US$ 0,00
This subscription is ideal if you want to download several videos. In this plan you will receive a discount of 83% in each video. Subscribe for this package!

> More info about MrBool Credits
You must be logged to download.

Click here to login