Written by Ramón Saquete
XML, an acronym for eXtensible Markup Language, is a markup language designed to be easy to use by both machines and humans.
A markup language adds marks, tags or elements to a document in order to attach additional information to it. XML is therefore a computer language for markup, not for programming. Expressions such as “programming an XML” or “XML programming” are incorrect, since programming languages are used to execute algorithms by means of control structures that direct the flow of a program, which is much more complex than simply adding information to a document.
Example of XML document
Let’s look at an example of a sitemap, an XML document used to list the URLs of a site for Google. In the example, the tags (the text between angle brackets) tell us which part of the text is the URL and which part is the last modification date:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
        xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd http://www.google.com/schemas/sitemap-image/1.1 http://www.google.com/schemas/sitemap-image/1.1/sitemap-image.xsd"
        xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.humanlevel.com/blog.html</loc>
    <lastmod>2019-10-02T10:15:48+02:00</lastmod>
  </url>
</urlset>
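As a rough illustration of how a machine reads these tags, the following Python sketch extracts the URL and the last modification date from a trimmed copy of the sitemap above. The names SITEMAP, NS and read_sitemap are made up for this example, and the schema attributes of the root element are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sitemap above (schemaLocation attributes omitted).
SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.humanlevel.com/blog.html</loc>
    <lastmod>2019-10-02T10:15:48+02:00</lastmod>
  </url>
</urlset>"""

# Every element lives in the sitemap namespace, so lookups must name it.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def read_sitemap(xml_text):
    """Return a list of (url, last_modified) pairs found in a sitemap."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    entries = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        entries.append((loc, lastmod))
    return entries


for loc, lastmod in read_sitemap(SITEMAP):
    print(loc, lastmod)
```

Note that the element names have to be qualified with the namespace declared in the root element; a lookup for a plain "url" would find nothing.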
Difference between valid and well-formed XML
Every XML document must follow a set of rules to be considered well formed. For example, every XML document must have a single root tag (in the example, <urlset>). To be considered valid, it must in addition comply with the rules specified for the type of XML document being written (XHTML, Sitemap, etc.).
Commonly, these rules are defined in documents that are themselves written in XML, following the XML Schema standard (in the example, the XML Schema of Sitemaps is referenced in the root element, next to the namespace declaration). XML is therefore a meta-language: it allows other languages to be defined, which is why its name contains the word extensible.
Given an XML document and the XML Schema document that defines its format, we can check its correctness with an automatic tool. This makes XML well suited to being interpreted by machines. Only a well-formed document can be validated against the rules of its format and, if it meets them, be considered valid.
The validation tool will tell us whether the document is well formed and, if so, whether it is valid or not.
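The first of these two checks can be sketched with Python's standard library, which includes a strict XML parser but no XML Schema validator. The function name is_well_formed is invented for this example; checking validity against a schema would require a schema-aware tool, which is not shown here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# A single root element with correctly nested tags is well formed.
print(is_well_formed("<urlset><url><loc>a</loc></url></urlset>"))

# A tag that is never closed breaks well-formedness.
print(is_well_formed("<urlset><url></urlset>"))

# Two root elements are not allowed either.
print(is_well_formed("<a/><b/>"))
```

A document that passes this check is only well formed. Whether it is also a valid sitemap, for instance, depends on the rules of the sitemap schema.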
Relationship to XHTML
In the past, the DTD (Document Type Definition) standard was also widely used, instead of XML Schema, as the language for defining the rules of the different XML formats. This was especially true before the advent of HTML5, when each XHTML document specified at the beginning which set of rules it followed by giving the URL of the DTD where they were defined:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!-- the XHTML document body starts here-->
<html xmlns="http://www.w3.org/1999/xhtml">
...
</html>
DTDs are not written in XML syntax, but in a declaration syntax inherited from SGML (Standard Generalized Markup Language), an earlier language that is more powerful and complex than XML. SGML is also a meta-language, and XML was defined as a simplified subset of it.
In HTML5 no DTD is referenced. Only a bare declaration remains, which indicates that the current version of the specification must be followed:
<!DOCTYPE html>
However, HTML5 documents are not XML: although HTML5 is a similar markup language, it allows a looser syntax than XML and tolerates malformed documents.
When we follow strict XML syntax rules in an HTML5 document, we can validate it to confirm that it is XML and serve it to the user as such. This is called XHTML5. It is a technology that is rarely used because, although the browser's parser can process it faster, a single undetected error or a document only half downloaded because of a connection outage will cause the user to see a blank page or an error page.
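The difference in error handling can be imitated with two parsers from Python's standard library. This is only an analogy: html.parser is not a browser's HTML5 parser, and the names TRUNCATED, COMPLETE, lenient_text and strict_text are invented for the example. The truncated string stands in for a page cut off by a connection outage.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

COMPLETE = "<html><body><p>Hello, world</p></body></html>"
TRUNCATED = "<html><body><p>Hello, world"  # download cut off halfway


class TextCollector(HTMLParser):
    """Lenient HTML parser that keeps whatever text it manages to read."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def lenient_text(markup):
    """Parse as HTML: missing closing tags are tolerated."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


def strict_text(markup):
    """Parse as XML: any well-formedness error rejects the whole document."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        return None
    return "".join(root.itertext())


print(lenient_text(TRUNCATED))  # the text is still recovered
print(strict_text(TRUNCATED))   # None: the document is rejected
print(strict_text(COMPLETE))    # the same text, from the complete document
```

The lenient parser still recovers the text of the incomplete page, while the strict one has nothing to show, which mirrors the blank or error page described above.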
Practical applications
Using XML as a common format for exchanging information between different data sources is very common, and it makes possible solutions that would otherwise be impractical without an intermediate document format between the source and the target.
For example, if Google's developers had to read the data displayed in Google Merchant directly from the databases of the stores that offer it, they would have to implement a separate data transformation into the Google Merchant database for each type of store database. By establishing a common XML format instead, they only need to implement one transformation, which translates Google Merchant’s XML format into their own database.
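The idea of a single transformation can be sketched as follows. The feed format used here (products, product, id, title, price) is a made-up stand-in and not the real Google Merchant format, and STORE_A, STORE_B and import_feed are hypothetical names. The point is only that one importer serves every store that exports the agreed format.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Two stores with different internal databases export the same
# hypothetical feed format.
STORE_A = """<products>
  <product><id>A-1</id><title>Blue mug</title><price currency="EUR">7.50</price></product>
</products>"""

STORE_B = """<products>
  <product><id>B-9</id><title>Desk lamp</title><price currency="USD">24.00</price></product>
  <product><id>B-10</id><title>Notebook</title><price currency="USD">3.20</price></product>
</products>"""


def import_feed(xml_text):
    """The single transformation: common XML feed -> rows for our own storage."""
    rows = []
    for product in ET.fromstring(xml_text).iter("product"):
        price = product.find("price")
        rows.append({
            "id": product.findtext("id"),
            "title": product.findtext("title"),
            "price": Decimal(price.text),
            "currency": price.get("currency"),
        })
    return rows


# The same function handles every store, whatever database sits behind it.
catalog = import_feed(STORE_A) + import_feed(STORE_B)
for row in catalog:
    print(row["id"], row["title"], row["price"], row["currency"])
```

Each store is responsible for exporting its own data to the common format, so adding a new store requires no change on the importing side.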
Its ease of use and extensibility are the reasons why XML is used in a multitude of open standards (RSS, SVG, MathML, XHTML, Sitemaps, Google Merchant Feed, the SOAP protocol used by websites to communicate with web services, etc.), as well as in open and proprietary formats for transferring internal information (information on available hotel reservations, products for dropshipping, etc.).