July, 2003

Fun with XML

"XML is the Extensible Markup Language.  It is designed to improve the functionality of the Web by providing more flexible and adaptable information identification. It is called extensible because it is not a fixed format like HTML (a single, predefined markup language).  Instead, XML is actually a 'metalanguage' — a language for describing other languages — which lets you design your own customized markup languages for limitless different types of documents.  XML can do this because it's written in SGML, the international standard metalanguage for text markup systems (ISO 8879). XML is intended 'to make it easy and straightforward to use SGML on the Web: easy to define document types, easy to author and manage SGML-defined documents, and easy to transmit and share them across the Web.' It defines 'an extremely simple dialect of SGML which is completely described in the XML Specification.  The goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML.' 'For this reason, XML has been designed for ease of implementation, and for interoperability with both SGML and HTML' [Quotes are from the XML specification].  XML is not just for Web pages: it can be used to store any kind of structured information, and to enclose or encapsulate information in order to pass it between different computing systems which would otherwise be unable to communicate. SGML is the Standard Generalized Markup Language (ISO 8879:1985), the international standard for defining descriptions of the structure of different types of electronic document.  There is an SGML FAQ at http://lamp.man.deakin.edu.au/sgml/sgmlfaq.txt which is posted every month to the comp.text.sgml newsgroup, and the SGML Web pages are at http://xml.coverpages.org/. SGML is very large, powerful, and complex.  It has been in heavy industrial and commercial use for over a decade, and there is a significant body of expertise and software to go with it.  
XML is a lightweight cut-down version of SGML which keeps enough of its functionality to make it useful but removes all the optional features which make SGML too complex to program for in a Web environment." - taken from the FAQs at the XML FAQ site run by Peter Flynn (see first link in the bulleted list).

All of the above is nice, but what can XML do for you? What it does best is give structure to the information that needs to be displayed. A simple piece of raw XML looks something like this:

<CityDesk>
  <File CRC="-115249870">DoYouUnix.htm</File>
  <File CRC="714009886">ExtremeProgramming.htm</File>
  <File CRC="2088314137">YousayDotNETtheysay.NET.htm</File>
  <File CRC="-1493489515">SkillstoWatchin2003.htm</File>
  <File CRC="1109290001">TheJoyofPHP.htm</File>
  <File CRC="-61477954">fog0000000017.htm</File>
  <File CRC="-1771152429">fog0000000016.htm</File>
  <File CRC="866860536">fog0000000015.htm</File>
  <File CRC="1082219730">index.html</File>
</CityDesk>

You will probably recognize this as the XML file at the root of the Hot Skill of the Month section. XML can also be used for interoperability. For example, OpenOffice.org is working on an XML format for its documents. This means that if you created a spreadsheet in one application but wanted to open it with another company's product, you wouldn't have to go through all the tedious mucking about with converters and filters. The other application could read the XML schema, parse your file correctly, and display its contents. (For more information on this project, check out this link.)
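To give a feel for what "parse your file correctly" can look like in practice, here is a minimal sketch in Python that reads a shortened copy of the CityDesk listing above and pulls out each file name with its CRC value. It uses the standard library's xml.etree.ElementTree module. The names CITYDESK_XML and list_files are made up for this illustration, and treating the CRC attribute as a signed integer is an assumption based only on how the values look in the listing.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the listing from the article, embedded as a string
# so the example is self-contained (hypothetical variable name).
CITYDESK_XML = """<CityDesk>
  <File CRC="-115249870">DoYouUnix.htm</File>
  <File CRC="714009886">ExtremeProgramming.htm</File>
  <File CRC="1082219730">index.html</File>
</CityDesk>"""


def list_files(xml_text):
    """Return a list of (file name, CRC) pairs, one per <File> element.

    Assumption: the CRC attribute holds a signed decimal integer,
    which is how it appears in the listing.
    """
    root = ET.fromstring(xml_text)  # root is the <CityDesk> element
    return [(f.text, int(f.get("CRC"))) for f in root.findall("File")]


if __name__ == "__main__":
    for name, crc in list_files(CITYDESK_XML):
        print(name, crc)
```

The point is that the program never has to guess where one entry ends and the next begins. The tags and attributes carry that structure, so any application with an XML parser can read the same file.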

Here is an assortment of links about XML - have fun!