Microsoft XML Core Services (MSXML) 5.0 for Microsoft Office - XML Developer's Guide

Starter Kit

On the surface, XML looks like HTML. Both are derived from the Standard Generalized Markup Language (SGML). Tools that generate HTML can often be reused to generate XML.

XML differs from HTML in two key areas: syntax and semantics.

XML Syntax for Well-formed Documents

Both HTML and XML use < and > to delimit the tags that form element and attribute structures, and & to begin entity references. While HTML browsers tolerate or silently ignore malformed markup, XML parsers and the applications built on them are less forgiving. An error in XML syntax halts document processing, and users or applications receive an error message rather than a best-guess interpretation of the document structure.

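XML documents must be well-formed. That is, they must follow rules for identifying document parts and creating nested element structures. These rules include the following: a document has exactly one root element; every start tag has a matching end tag, unless the element uses empty-element syntax; elements nest without overlapping; attribute values are enclosed in quotation marks; and element and attribute names are case-sensitive.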
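
As a rough illustration of this strictness, the following sketch checks a few sample documents for well-formedness. It uses Python's standard xml.etree.ElementTree parser rather than MSXML (an assumption made so the example runs anywhere); the helper name is_well_formed and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # The parser stops at the first syntax error instead of guessing.
        return False


# Hypothetical sample documents, one per rule being illustrated.
samples = {
    "properly nested": "<note><to>Ana</to><body>Hello</body></note>",
    "missing end tag": "<note><to>Ana</note>",
    "overlapping elements": "<note><b><i>Hello</b></i></note>",
    "unquoted attribute": "<note priority=high>Hello</note>",
    "two root elements": "<note>One</note><note>Two</note>",
}

for label, text in samples.items():
    verdict = "well-formed" if is_well_formed(text) else "rejected"
    print(f"{label}: {verdict}")
```

Only the first sample parses. Each of the others raises a parse error at the point of the mistake, where an HTML browser would typically attempt a repair.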

XML Semantics

Although XML is unforgiving about syntax, it offers developers more options for defining meaning in XML documents. HTML is basically one vocabulary with a few variations; <b> always means the same thing to an HTML processor. With XML, you can create your own markup vocabulary or choose from markup vocabularies appropriate to your industry or project type. Schemas and document type definitions (DTDs) let you describe these vocabularies, but you can also create documents using vocabularies without formal definitions. Namespaces help you identify the vocabulary you are using.
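
To make the role of namespaces more concrete, here is a minimal sketch in which one document mixes two vocabularies that both define a title element. It uses Python's standard xml.etree.ElementTree parser, not MSXML; the urn:example:inventory URI, the element names, and the helper titles_by_vocabulary are assumptions invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary URI, chosen only for this illustration.
INVENTORY_NS = "urn:example:inventory"
# The XHTML namespace URI.
HTML_NS = "http://www.w3.org/1999/xhtml"

DOC = f"""<inv:item xmlns:inv="{INVENTORY_NS}" xmlns:h="{HTML_NS}">
  <inv:title>Widget</inv:title>
  <h:title>Widget product page</h:title>
</inv:item>"""


def titles_by_vocabulary(xml_text: str) -> dict:
    """Map each namespace URI to the text of its <title> child element."""
    root = ET.fromstring(xml_text)
    result = {}
    for child in root:
        # ElementTree expands a prefixed name to "{namespace-uri}local-name".
        if not child.tag.startswith("{"):
            continue  # element is in no namespace; skip it
        uri, _, local = child.tag[1:].partition("}")
        if local == "title":
            result[uri] = child.text
    return result


for uri, title in titles_by_vocabulary(DOC).items():
    print(f"{uri}: {title}")
```

The two title elements share a local name, but the namespace URI tells a processor which vocabulary each one belongs to, so neither is mistaken for the other.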

This approach requires architectures different from those used by browsers. HTML browsers have a built-in understanding of what HTML markup means and how it is to be presented; developers cannot count on XML applications to have the same understanding of a custom vocabulary. Browsers can still present XML, but they require a style sheet to format it to your specifications. These style sheets are written using Cascading Style Sheets (CSS) or XSL Transformations (XSLT). Some browsers, including Internet Explorer 5.0 and later, include a default style sheet, but it is designed more for diagnostics than for presenting information to end users.

XML applications can also bring their own logic to XML vocabularies, rather than relying on style sheets. This logic may take the form of simple scripts or binding to particular presentation modes, or it may involve writing an entire application from scratch. These applications can take advantage of their built-in knowledge of the labeled structures contained in XML documents to process the information in those documents, present them to users, connect them with other data sources, or redirect them to other appropriate consumers.
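
The "simple script" case might look something like the sketch below, in which the application itself knows what the labeled structures mean and computes with them directly. It uses Python's standard library rather than MSXML; the order vocabulary (order, line, sku, quantity, unitPrice) and the function order_total are hypothetical, made up for this example.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical order document in a made-up vocabulary.
ORDER = """<order id="A-100">
  <line sku="pen" quantity="3" unitPrice="1.50"/>
  <line sku="pad" quantity="2" unitPrice="4.25"/>
</order>"""


def order_total(xml_text: str) -> Decimal:
    """Sum quantity * unitPrice over every <line> element in the order."""
    root = ET.fromstring(xml_text)
    total = Decimal("0")
    for line in root.iter("line"):
        # The application, not a style sheet, decides what these
        # attributes mean: a count and a price per unit.
        quantity = int(line.get("quantity"))
        unit_price = Decimal(line.get("unitPrice"))
        total += quantity * unit_price
    return total


print(order_total(ORDER))
```

No style sheet is involved here. The meaning of quantity and unitPrice lives entirely in the application code, which is the trade-off the paragraph above describes.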
