Microsoft XML Core Services (MSXML) 5.0 for Microsoft Office - XML Developer's Guide

Elements

Elements form the backbone of XML documents, creating structures you can manipulate with programs or style sheets. Elements identify named sections of information and are built using markup tags that identify the name, start, and end of the element.

Elements can also contain attribute names and values, which provide additional information about your content. For more information, see Attributes.

Element Names

All elements must have names. Element names are case-sensitive and must start with a letter or underscore. An element name can contain letters, digits, hyphens, underscores, and periods.

Note   Colons are reserved for use with namespaces. For more information about which Unicode characters are acceptable letters and digits, see Appendix B of the XML specification.
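The naming rules above can be approximated with a short check. The following Python sketch is illustrative only and is not part of MSXML: the pattern and the function name is_simple_element_name are hypothetical, and the check covers only ASCII letters and digits, whereas the XML specification permits many more Unicode characters. Names that contain a colon are rejected because colons are reserved for namespaces.

```python
import re

# ASCII-only approximation of the naming rules described above.
# Assumption: the full XML specification also accepts many non-ASCII
# letters and digits, which this simplified pattern rejects.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def is_simple_element_name(name: str) -> bool:
    """Return True if name fits the simplified ASCII naming rules."""
    return NAME_PATTERN.fullmatch(name) is not None


print(is_simple_element_name("givenName"))  # True
print(is_simple_element_name("_item-1.a"))  # True
print(is_simple_element_name("1stItem"))    # False: starts with a digit
print(is_simple_element_name("-item"))      # False: starts with a hyphen
```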

Start Tags, End Tags, and Empty Tags

Tags establish boundaries around the content, if any, of the element.

Start tags indicate the beginning of an element and use the following general syntax.

<elementName att1Name="att1Value" att2Name="att2Value"...>

For elements that do not have attributes, the start tag can be reduced.

<elementName>

End tags indicate the end of an element and cannot contain attributes. End tags always take the following form.

</elementName>

An element is generally considered to include the start and end tags, and everything in between.

<person><givenName>Peter</givenName> <familyName>Kress</familyName></person>

In this case, the <person> element contains two other elements, <givenName> and <familyName>, along with a space that separates them. The <givenName> element contains the text Peter while the <familyName> element contains the text Kress.
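The containment described above can be inspected programmatically. The following sketch uses Python's standard xml.etree.ElementTree module as a stand-in for MSXML, which is an assumption made for illustration; the variable names are hypothetical. In ElementTree, text that follows a child element's end tag is exposed as that child's tail, which is where the separating space appears.

```python
import xml.etree.ElementTree as ET

person = ET.fromstring(
    "<person><givenName>Peter</givenName> <familyName>Kress</familyName></person>"
)

# Iterating over an element yields its direct child elements in document order.
child_names = [child.tag for child in person]
child_texts = [child.text for child in person]

# The space between the two child elements is stored as the tail of the first child.
space_between = person[0].tail

print(child_names)          # ['givenName', 'familyName']
print(child_texts)          # ['Peter', 'Kress']
print(repr(space_between))  # ' '
```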

Empty tags are used to indicate elements that have no content, though they can have attributes. The HTML img and br elements are examples of empty elements. Empty tags can be used as a shortcut when there is no content between the start and end tags of an element. Empty tags look like start tags, except that they contain a slash (/) before the closing >.

<elementName att1Name="att1Value" att2Name="att2Value".../>

In XML, you can indicate an empty element with start and end tags and no white space or content in between, for example, <giggle></giggle>, or you can use an empty tag, for example, <giggle/>. The two forms produce identical results in an XML parser.
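The equivalence of the two forms can be checked with a parser. The following sketch again assumes Python's standard xml.etree.ElementTree module rather than MSXML, and the helper name describe is hypothetical. It compares what the parser reports for each form.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<giggle></giggle>")
short_form = ET.fromstring("<giggle/>")


def describe(element):
    """Summarize an element as (name, text, number of children, attributes)."""
    return (element.tag, element.text, len(element), dict(element.attrib))


# Both forms yield an element with no text, no children, and no attributes.
same = describe(long_form) == describe(short_form)
print(same)  # True
```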

Element Relationships

Relationships between elements are described using either family or tree metaphors. An XML document must contain exactly one document element (also known as a root element). Although it can be preceded and followed by other markup, such as declarations, processing instructions, comments, and white space, the root element must contain all of the content considered to be part of the document itself. For example, the following code can be an XML document with <person> as its root element.

<person><givenName>Stephanie</givenName> <familyName>Bourne</familyName></person>

The following fragment cannot be an XML document because it has multiple root elements.

<givenName>Stephanie</givenName>
<familyName>Bourne</familyName>

Note   Document fragments can be useful as parts of an XML document but should not be passed to the parser on their own. The parser will report an error when it encounters the second element or text outside of an element.
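The parser behavior described above can be observed directly. The following sketch assumes Python's standard xml.etree.ElementTree module in place of MSXML, so the exact error text differs from what MSXML reports; the function name is_well_formed_document is hypothetical.

```python
import xml.etree.ElementTree as ET

fragment = "<givenName>Stephanie</givenName>\n<familyName>Bourne</familyName>"


def is_well_formed_document(text: str) -> bool:
    """Return True if text parses as a complete XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # Raised here when a second top-level element follows the first.
        return False
    return True


print(is_well_formed_document(fragment))                           # False
print(is_well_formed_document("<person>" + fragment + "</person>"))  # True
```

Wrapping the fragment in a single root element, as in the second call, is what turns it into a document the parser accepts.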

In the tree metaphor, the leaves refer to elements that do not contain any other elements, like leaves on the end of a branch. Leaf elements are generally elements containing only text or nothing at all; leaf nodes are generally empty elements or text. In the sample in Document Map, all of the text describing the books is stored in leaf elements; the text itself is the leaf node.

Family metaphors, such as parent, child, ancestor, descendant, and sibling, are used to describe relationships between elements relative to each other, not necessarily to the entire document. The following abstract sample document illustrates the relationships between elements.

<a>
 <b>
   <c>
    <d/><e/><f/>
   </c>
 </b>
</a>

The <a> element contains the <b> element, which contains the <c> element, which contains the <d>, <e>, and <f> elements. Using the tree metaphor, <a> is the root element, and <d>, <e>, and <f> are leaf elements. Although <b> and <c> might be considered trunks or branches, these descriptions are rarely used.
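The tree metaphor maps naturally onto a tree traversal. The following sketch assumes Python's standard xml.etree.ElementTree module rather than MSXML and treats an element as a leaf when it has no child elements; the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<a><b><c><d/><e/><f/></c></b></a>")

# iter() walks the root and all elements below it in document order;
# len(element) counts child elements only, so 0 means a leaf element.
leaves = [element.tag for element in root.iter() if len(element) == 0]

print(root.tag)  # a
print(leaves)    # ['d', 'e', 'f']
```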

The family metaphors provide more levels of description. The only siblings in this document are the <d>, <e>, and <f> elements, all of which are contained by the <c> element. The <c> element is the parent of the <d>, <e>, and <f> elements; the <d>, <e>, and <f> elements are the child elements of the <c> element. In the same way, the <b> element is the parent of the <c> element and the <c> element is the child of the <b> element, while the <a> element is the parent of the <b> element and the <b> element is the child of the <a> element.

Ancestors and descendants are defined in a way similar to parents and children, except that they do not have to contain or be contained directly. The <a> element is the parent of the <b> element, and the ancestor of every other element in the document. The <d>, <e>, and <f> elements are descendants of the <a>, <b>, and <c> elements.
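The family relationships can be computed from the same sample. ElementTree elements do not store a reference to their parent, so the following sketch builds a child-to-parent map first. It assumes Python's standard xml.etree.ElementTree module rather than MSXML, and the helper names ancestors and siblings are hypothetical.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<a><b><c><d/><e/><f/></c></b></a>")

# Map every element to its parent; the root has no entry.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(element):
    """Return the names of all ancestors, nearest first."""
    found = []
    while element in parent_of:
        element = parent_of[element]
        found.append(element.tag)
    return found


def siblings(element):
    """Return the names of the other children of the same parent."""
    parent = parent_of.get(element)
    if parent is None:
        return []
    return [child.tag for child in parent if child is not element]


d = root.find(".//d")
descendants_of_a = [element.tag for element in root.iter() if element is not root]

print(parent_of[d].tag)   # c
print(ancestors(d))       # ['c', 'b', 'a']
print(siblings(d))        # ['e', 'f']
print(descendants_of_a)   # ['b', 'c', 'd', 'e', 'f']
```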

See Also

Document Map

Other Resources

Appendix B, Character Classes