The W3C allowed other technologies to be combined with HTML in order to build dynamic web pages that offer more interactivity and richer presentation. Even with these additions, HTML has limitations. Because it is not extensible, it cannot be used for special purposes such as presenting chemical or mathematical formulas. HTML is also weak at describing the meaning and structure of a document's content. XML addresses these problems. It can define arbitrary sets of tags to describe a document's structure, so it can be used to create new markup languages. In other words, XML is a metalanguage that lets users create custom tags suited to their specific application domain.
XML documents are mainly used to describe and exchange data. Typical application areas include document exchange in e-commerce, electronic data exchange, and B2B transactions. In this discussion, we introduce some of the most important features of XML.
Well-Formed XML Documents
To be processed correctly, every XML document must follow a set of rules that define a well-formed document. These rules are:
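- Every XML document should start with an XML declaration that states the XML version in use. XML 1.0 does not strictly require the declaration, but when it is present it must be the first thing in the document.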
- Each element must be delimited by an opening tag and a closing tag. The content between these tags is called the element's value and can be text or other elements. An element with no content can instead be written as a single tag ending with "/" (e.g., <figure file = "figure-sec1.jpg"/> in Listing 1); this is called an empty tag.
- All elements must be nested inside a single root element, and an outer element cannot be closed before its inner elements are closed.
- Elements may contain attributes, whose values must be delimited by quotes (" " or ' ').
A well-formed XML document representing the data of a scientific article is given in Listing 1 below:
Listing 1: A well-formed XML document
<?xml version = "1.0" ?>
<article>
  <title> An example of well-formed XML document to represent data about a scientific article </title>
  <author> Alok Shuklal </author>
  <author> Ravindra Panday </author>
  <publishing category = "journal" year = "2013">
    <publication> Where ever it can be published </publication>
  </publishing>
  <section title = "Introduction">
    <text> Here we provide the text to introduce the article </text>
    <figure file = "figure-sec1.jpg"/>
  </section>
  <section title = "Rationale and Background">
    <text> Text used for the section Rationale and Background </text>
    <codeExample> Code Example </codeExample>
  </section>
  ... ... ... ...
  <section title = "References" type = "bibliography">
    <text> Text for the section References </text>
  </section>
</article>
In Listing 1, the document starts with the XML version declaration, followed by the root element <article>, which contains the other elements that define the structure of the document.
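The well-formedness rules can be checked mechanically by any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are illustrative assumptions, not part of the original listings.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Illustrative samples (assumptions made for this sketch).
GOOD = '<?xml version="1.0"?><article><title>T</title><figure file="f.jpg"/></article>'
BAD_NESTING = "<article><title>T</article></title>"       # outer closed before inner
BAD_QUOTES = "<article><figure file=f.jpg/></article>"    # attribute value not quoted
TWO_ROOTS = "<article/><article/>"                        # more than one root element

if __name__ == "__main__":
    samples = {
        "good": GOOD,
        "bad nesting": BAD_NESTING,
        "bad quotes": BAD_QUOTES,
        "two roots": TWO_ROOTS,
    }
    for name, doc in samples.items():
        print(name, is_well_formed(doc))
```

Each failing sample breaks exactly one of the rules listed above, so the parser rejects it before any application code sees the data. Note that this checks well-formedness only, not validity against a DTD or schema.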
Document Type Definition (DTD)
A DTD defines which elements can be used in an XML document and which values are permitted in those elements. It addresses the validity of an XML document: a DTD associated with a document defines the format for that class of documents. Listing 2 shows a DTD for the XML document of Listing 1:
Listing 2: Representing the DTD for XML document of Listing 1
<!ELEMENT article (title, author+, publishing?, section+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT publishing (publication)>
<!ELEMENT publication (#PCDATA)>
<!ELEMENT section (text+, figure*, codeExample*)>
<!ELEMENT text (#PCDATA)>
<!ELEMENT figure EMPTY>
<!ELEMENT codeExample (#PCDATA)>
<!ATTLIST publishing
  year CDATA #REQUIRED
  category (conference|journal|book) #IMPLIED>
<!ATTLIST section
  title CDATA #REQUIRED
  type (abstract|bodySection|bibliography) #IMPLIED>
<!ATTLIST figure file CDATA #REQUIRED>
In this DTD, the <article> element is defined as a complex element composed of an ordered sequence of sub-elements: exactly one <title>, one or more <author> elements (denoted by "+"), zero or one <publishing> element (denoted by "?"), and one or more <section> elements. Similarly, "*" denotes zero or more occurrences, as for the <figure> and <codeExample> sub-elements of <section>.
Constraints on attribute values are defined by ATTLIST clauses. For example, in this DTD the attribute year of the <publishing> element consists of character data (CDATA) and is mandatory (#REQUIRED), whereas the attribute category is optional (#IMPLIED) and limited to the listed values.
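A DTD content model behaves much like a regular expression over the names of an element's children. The sketch below illustrates that idea in Python by translating two content models from Listing 2 by hand. It is not a DTD validator (the standard xml.etree.ElementTree module does not validate against DTDs), and the names CONTENT_MODELS and children_match are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models from Listing 2 (an assumption of this sketch).
# Each child name is followed by a space, so the DTD operators ?, + and *
# map directly onto the regex operators of the same name.
CONTENT_MODELS = {
    "article": r"(title )(author )+(publishing )?(section )+",
    "section": r"(text )+(figure )*(codeExample )*",
}


def children_match(element: ET.Element) -> bool:
    """Check an element's direct children against its content model, if any."""
    model = CONTENT_MODELS.get(element.tag)
    if model is None:
        return True  # no model recorded for this element in the sketch
    child_names = "".join(child.tag + " " for child in element)
    return re.fullmatch(model, child_names) is not None


VALID = ET.fromstring(
    "<article><title>T</title><author>A</author><author>B</author>"
    "<section><text>x</text></section></article>"
)
NO_AUTHOR = ET.fromstring(
    "<article><title>T</title><section><text>x</text></section></article>"
)
WRONG_ORDER = ET.fromstring(
    "<article><author>A</author><title>T</title>"
    "<section><text>x</text></section></article>"
)

if __name__ == "__main__":
    print(children_match(VALID))        # title, two authors, one section
    print(children_match(NO_AUTHOR))    # author+ needs at least one author
    print(children_match(WRONG_ORDER))  # the sequence fixes the order
```

A validating parser does this kind of check for every element, and additionally enforces the ATTLIST constraints, which this sketch leaves out.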
A DTD can be embedded in an XML document, or it can be an external file referenced from the document. For example, the code below references an external DTD file from an XML document:
<?xml version="1.0"?>
<!DOCTYPE article SYSTEM "article.dtd">
<article>
  ...
</article>
In the example above, the second line references the external DTD file article.dtd.
XML Schema Definition
DTDs have some limitations. A DTD cannot declare data types: the content of an XML element can be defined only as text, so a validating parser cannot check, for example, that an element's content or an attribute's value is a number. A DTD can state how often an element appears only through the coarse operators "?", "*", and "+"; it cannot state exact bounds such as "between two and five occurrences". Content models are also rigid about order: allowing a fixed set of child elements to appear exactly once each in any order requires spelling out every permitted ordering. Another problem is that an XML document can reference only one DTD, which makes it hard to build new documents by combining vocabularies defined elsewhere. Finally, the syntax of a DTD is different from XML, so developers have to learn a separate notation.
To solve these problems, the XML Schema Definition (XSD) was introduced. Like a DTD, an XSD defines the structure of the elements of an XML document. Unlike a DTD, an XSD is itself a valid XML document that uses a standard set of tags to declare elements.
Listing 3 provides an XSD that specifies the syntax of the XML document of Listing 1:
Listing 3: An XSD of XML document of listing 1
<?xml version = "1.0" encoding = "UTF-8"?>
<xs:schema xmlns:xs = "http://www.w3.org/2001/XMLSchema">
  <xs:element name = "article">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref = "title"/>
        <xs:element ref = "author" maxOccurs = "unbounded"/>
        <xs:element ref = "publishing" minOccurs = "0"/>
        <xs:element ref = "section" maxOccurs = "unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name = "title" type = "xs:string"/>
  <xs:element name = "author" type = "xs:string"/>
  <xs:element name = "publishing">
    <xs:complexType>
      <xs:sequence>
        <xs:element name = "publication" type = "xs:string"/>
      </xs:sequence>
      <xs:attribute name = "category" type = "xs:NCName"/>
      <xs:attribute name = "year" use = "required" type = "xs:integer"/>
    </xs:complexType>
  </xs:element>
  <xs:element name = "section">
    <xs:complexType>
      <xs:sequence>
        <xs:element name = "text" type = "xs:string"/>
        <xs:choice minOccurs = "0">
          <xs:element name = "codeExample" type = "xs:string"/>
          <xs:element name = "figure">
            <xs:complexType>
              <xs:attribute name = "file" use = "required" type = "xs:NCName"/>
            </xs:complexType>
          </xs:element>
        </xs:choice>
      </xs:sequence>
      <xs:attribute name = "title" use = "required"/>
      <xs:attribute name = "type" type = "xs:NCName"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
A namespace provides an identifier that uniquely references a set of names for elements and attributes. This identifier distinguishes names that belong to different sets. To use elements from different sets in one document, each namespace, and the schema that defines it, must be referenced.
In the example code of Listing 4, the xmlns:xsi attribute binds the prefix xsi to the XML Schema instance namespace, which supplies the schemaLocation attribute used to point to the XSD files. The xmlns:art and xmlns:biblio attributes declare the namespaces the document refers to, and the prefixes art and biblio are used within the document as namespace identifiers. An article section of type bibliography is then represented using the two declared namespaces, art and biblio.
Listing 4: Representation of the Namespace in the article XML document
<article xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
         xmlns:art = "http://www.mysite.com/article-xml/article"
         xmlns:biblio = "http://www.mysite.com/article-xml/bibliography"
         xsi:schemaLocation = "http://www.mysite.com/article-xml/article http://www.dominio.it/xml/article/article.xsd
                               http://www.mysite.com/article-xml/bibliography http://www.dominio.it/xml/bibliography/biblio.xsd">
  ... ... ... ...
  <art:section title = "References" type = "bibliography">
    <biblio:bibliography>
      <biblio:author> Ravindra Panday </biblio:author>
      <biblio:author> Alok Shukla </biblio:author>
      <biblio:title> Mashing Up Context-Aware Web Applications </biblio:title>
      <biblio:year> 2013 </biblio:year>
    </biblio:bibliography>
  </art:section>
</article>
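To see how a parser treats the prefixes, the following sketch parses a reduced version of Listing 4 with Python's standard xml.etree.ElementTree module. The schema references and elided parts are left out, and the variable names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace URIs taken from Listing 4.
ART = "http://www.mysite.com/article-xml/article"
BIBLIO = "http://www.mysite.com/article-xml/bibliography"

# Reduced version of Listing 4 (schema references and elided parts omitted).
DOC = f"""<article xmlns:art="{ART}" xmlns:biblio="{BIBLIO}">
  <art:section title="References" type="bibliography">
    <biblio:bibliography>
      <biblio:author>Ravindra Panday</biblio:author>
      <biblio:author>Alok Shukla</biblio:author>
      <biblio:title>Mashing Up Context-Aware Web Applications</biblio:title>
      <biblio:year>2013</biblio:year>
    </biblio:bibliography>
  </art:section>
</article>"""

root = ET.fromstring(DOC)
section = root[0]

# The parser replaces each prefix with its namespace URI in braces,
# so the element name becomes "{namespace-uri}local-name".
print(section.tag)

# Lookups may use any prefix, as long as it is mapped to the same URI.
ns = {"b": BIBLIO}
authors = [author.text for author in root.findall(".//b:author", ns)]
print(authors)
```

The prefix itself carries no meaning: what identifies an element is the pair of namespace URI and local name, which is why the lookup works with the prefix b even though the document uses biblio.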
In this discussion, we looked at several aspects of XML: why it is used, its general structure, DTD and XSD, and finally namespaces. Because XML is itself used to create markup languages, it is a must-know topic for every developer who wants to be well versed in engineering web applications.