UNIVERSITY OF MARYLAND UNIVERSITY COLLEGE
Web Engineer Bob Betterton
RDB PRIME INC.
Thoughts and Ideas About XML
August 20, 2003
Robert D. Betterton
XML is Not HTML
If you've had some experience writing HTML documents, you should pay close
attention to XML's rules for elements. Shortcuts you can get away with in
HTML, like forgetting a closing tag, are not allowed in XML. Some important
changes you should take note of include:
- Element names are case-sensitive in XML. HTML allows you to write tags
in whatever case you want.
- In XML, container elements always require both a start and an end tag.
In HTML, on the other hand, you can drop the end tag in some
cases.
- Empty XML elements require a slash before the right bracket (e.g.,
<example/>), whereas HTML uses a lone start tag with no final
slash.
- XML elements treat whitespace as part of the content, preserving it unless
they are explicitly told not to. But in HTML, most elements throw away
extra spaces and line breaks when formatting content in the
browser.
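The differences listed above can be observed with any XML parser. The
following is a minimal sketch that assumes Python's standard-library
xml.etree.ElementTree module as the XML application; the element names Note,
doc, and line are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Case sensitivity: <Note> and </note> are two different names in XML,
# so the end tag does not match the start tag and parsing fails.
try:
    ET.fromstring("<Note>hello</note>")
    case_mismatch_accepted = True
except ET.ParseError:
    case_mismatch_accepted = False

# Empty element: the trailing slash closes the element in a single tag.
doc = ET.fromstring("<doc><example/></doc>")
empty_text = doc.find("example").text  # None, because there is no content

# Whitespace: the parser hands the text to the application unchanged.
line = ET.fromstring("<line>  two   words\n</line>")
kept_text = line.text

print(case_mismatch_accepted, empty_text, repr(kept_text))
```

Whether the preserved spaces and line breaks are later displayed, collapsed,
or discarded is up to the application or stylesheet that consumes the parsed
data.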
Unlike many HTML elements, XML elements are based strictly on function, and
not on format. You should not assume any kind of formatting or presentational
style based on markup alone. Instead, XML leaves presentation for stylesheets,
which are separate documents that map the elements to styles.
Unlearning Bad Habits
Whereas HTML browsers often ignore simple errors in documents, XML applications
are not nearly as forgiving. If you are coming from HTML, there are a few bad
habits you should unlearn first:
Attribute values must be in quotation marks -- You can't specify an attribute
value such as <picture src=/images/blueball.gif>, an error that HTML
browsers often overlook. An attribute value must be inside single or double
quotation marks, or the XML parser will flag it as an error. Here is the
correct way to specify such a tag:
<picture src="/images/blueball.gif"/>
A non-empty element must have an opening and closing tag -- Each element that
specifies an opening tag must have a closing tag that matches it. If it
does not, and it is not an empty element, the XML parser generates an error.
In other words, you cannot do the following:
<Paragraph>
This is a paragraph
<Paragraph>
This is another paragraph.
Instead, you must have an opening and closing tag for each paragraph element:
<Paragraph>This is a paragraph.</Paragraph>
<Paragraph>This is another paragraph.</Paragraph>
Tags must be nested correctly -- It is illegal to do the following:
<Italic><Bold>This is incorrect</Italic></Bold>
The closing tag for the Bold element should be inside the closing tag for
the Italic element, to match the nearest opening tag and preserve the
correct element nesting. Correct nesting is essential because the application
parsing your XML relies on the hierarchy of the elements:
<Italic><Bold>This is correct</Bold></Italic>
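To make the hierarchy concrete, the sketch below parses both versions with
Python's standard-library xml.etree.ElementTree module (an assumption; any
XML parser should behave similarly). The outline helper is hypothetical and
only lists element names by depth.

```python
import xml.etree.ElementTree as ET

def outline(element, depth=0):
    """Return the element hierarchy as a list of indented element names."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

# Correct nesting: Bold closes before Italic, so Bold is a child of Italic.
good = ET.fromstring("<Italic><Bold>This is correct</Bold></Italic>")
hierarchy = outline(good)

# Incorrect nesting: </Italic> arrives while Bold is still open.
try:
    ET.fromstring("<Italic><Bold>This is incorrect</Italic></Bold>")
    nesting_error = None
except ET.ParseError as err:
    nesting_error = str(err)  # message wording comes from the parser

print(hierarchy)
print(nesting_error)
```

The parser stops at the first mismatched end tag, so no partial tree is
returned for the incorrect version.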
These syntactic rules are the source of many common errors in XML, especially
given that HTML browsers tolerate some of these mistakes. An XML
document that adheres to these rules (and a few others) is said to be
well-formed.
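A well-formedness check can be sketched in a few lines. The example below
assumes Python's standard-library xml.etree.ElementTree module; the
is_well_formed helper is hypothetical, and the Body wrapper element is added
only because an XML document must have a single root element, which is one of
the "few others" among the rules.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

samples = {
    # Attribute value without quotation marks
    "unquoted attribute": "<picture src=/images/blueball.gif/>",
    "quoted attribute": '<picture src="/images/blueball.gif"/>',
    # Paragraph elements opened but never closed
    "missing end tags": (
        "<Body><Paragraph>This is a paragraph."
        "<Paragraph>This is another paragraph.</Body>"
    ),
    "matching end tags": (
        "<Body><Paragraph>This is a paragraph.</Paragraph>"
        "<Paragraph>This is another paragraph.</Paragraph></Body>"
    ),
}

results = {name: is_well_formed(text) for name, text in samples.items()}
for name, ok in results.items():
    print(name, "->", "well-formed" if ok else "not well-formed")
```

A check like this only confirms that the syntax rules are met; it says
nothing about whether the element names or structure make sense for a
particular application.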