The following are some frequently asked questions about the Simple API for XML (SAX).
What are the system requirements for SAX2?
Can I pass a BSTR instead of a URL to the SAXXMLReader?
Why does the MXValidator CoClass contain methods and properties that are not implemented?
Why is white space reported as characters()? Why isn't ignorableWhitespace called?
How do I get XML header information?
How can I use SAXXMLWriter from scripts?
Can I continue parsing if my Visual Basic program breaks because the XML file has errors?
Can I use the same instance of SAXXMLReader to parse XML files sequentially?
How can I write from SAXXMLWriter to a memory buffer in a non-Unicode encoding?
How can I reset SAXXMLWriter to create a new string?
How do I avoid appending a new XML document to the previous one?
How can I tell if attribute values have an entity reference?
Can I find the order of element attributes?
How do I handle errors with SAX?
The system requirements for SAX2 are the same as those listed in Installing and Registering the MSXML 5.0 SDK.
Yes, SAX supports validation. With MSXML 5.0, SAX supports validation against XSD schemas but does not support validation using Document Type Definition (DTD) files.
To validate documents when using SAX, set the validation flag on the SAXXMLReader through the putFeature method. The feature name is "schema-validation" and its value is set to True, like this:
oReader.putFeature("schema-validation", True)
This feature is read-only during parsing and read/write otherwise.
For more information, see Validate Documents Using SAX.
SAXXMLReader and MXValidator behave in very similar ways when used for validating SAX document streams. One difference is that the SAX reader might provide some additional controls that MXValidator does not. Another is that, unlike the SAX reader, which allows you to toggle the error-reporting mode, MXValidator always operates in "exhaustive-errors" mode and reports all errors.
MXValidator is most useful when the data to be parsed is not in XML form, such as comma-delimited text, a Microsoft Word document, or a C-style data structure. The SAX reader cannot read any of these formats. If you need to parse non-XML data, you can implement your own data transformer using SAX interfaces, and connect it to MXValidator to verify the data stream. For an example, see the putProperty Method (MXValidator) sample.
MXValidator inherits from the IMXFilter interface, which is intended to provide a generic front end for SAX event filtering. For MXValidator, only the ability to get and put features and properties is needed. Currently, only the contentHandler and errorHandler properties are implemented, but other SAX filters might be developed in the future.
You can pass a VARIANT containing a BSTR to ISAXXMLReader::parse(VARIANT). In this case, the encoding is UTF-16.
White space can occur in several places, for example, in an element without character data, which contains only child nodes and white space. To ignore white space, the parser must be able to distinguish those cases. The SAX parser is a nonvalidating parser and cannot distinguish those cases, so ignorableWhitespace() never gets called. Nonvalidating parsers treat all white space between elements as characters.
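The behavior described above can be observed with any nonvalidating SAX parser. The following is a minimal sketch that uses Python's standard xml.sax module (backed by expat) as a stand-in, not MSXML. The WhitespaceProbe class name is made up for illustration, and the MSXML handler interfaces differ in detail.

```python
import xml.sax


class WhitespaceProbe(xml.sax.ContentHandler):
    """Records which callback receives white space between elements."""

    def __init__(self):
        super().__init__()
        self.chars = []       # text delivered through characters()
        self.ignorable = []   # text delivered through ignorableWhitespace()

    def characters(self, content):
        self.chars.append(content)

    def ignorableWhitespace(self, whitespace):
        self.ignorable.append(whitespace)


probe = WhitespaceProbe()
# <root> contains only a child element and white space.
xml.sax.parseString(b"<root>\n  <child/>\n</root>", probe)

# The parser may deliver text in several chunks, so join them first.
reported_as_characters = "".join(probe.chars)
print(repr(reported_as_characters), probe.ignorable)
```

An application that does not want this white space can discard character data that is empty after trimming, provided it knows that the enclosing element has element-only content.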
The XML header contains version and encoding information, for example, <?xml version="1.0" encoding="UTF-8"?>. To get XML header information, call ISAXXMLReader::getProperty([in] const wchar_t * pwchName, [out, retval] VARIANT * pvarValue) and pass one of the following three property names:
"xmldecl-encoding"
"xmldecl-version"
"xmldecl-standalone"
Note: The "xmldecl-encoding", "xmldecl-version", and "xmldecl-standalone" properties provide information about the presence and content of the XML header. The information is available only while SAXXMLReader reads and parses the XML document. After processing, control returns to the application, and this information is no longer available.
XML header information was designed for low-level reader and parser use, not for applications.
To get the processing instruction, implement a ContentHandler that supports ISAXContentHandler and handles the processingInstruction event.
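As an illustration of handling the processingInstruction event, here is a minimal sketch using Python's standard xml.sax module rather than MSXML. The PICollector class and the sample document are assumptions made for the example. In this Python parser the XML declaration is not delivered as a processing instruction, so only the xml-stylesheet instruction is collected.

```python
import xml.sax


class PICollector(xml.sax.ContentHandler):
    """Collects (target, data) pairs from processingInstruction events."""

    def __init__(self):
        super().__init__()
        self.instructions = []

    def processingInstruction(self, target, data):
        self.instructions.append((target, data))


collector = PICollector()
document = (
    b'<?xml version="1.0"?>'
    b'<?xml-stylesheet href="style.css"?>'
    b"<root/>"
)
xml.sax.parseString(document, collector)
print(collector.instructions)
```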
SAXXMLReader implements IVBSAXXMLReader, which is accessible from scripts. You can call handler events directly from SAXXMLWriter and generate XML without the reader.
All parser errors are fatal (E_FAIL). However, the classification of parsing errors in SAX is independent from the classification of errors in Microsoft® Visual Basic®. In SAX, a fatal parsing error means only that parsing cannot continue; in Visual Basic, a fatal error means that the application cannot continue. Use the On Error statement in Visual Basic to trap the parsing error so that the application can continue.
You can use the same instance of SAXXMLReader to parse two XML files sequentially, but not in different threads. The MSXML implementation does not support multithreaded use. AddRef/Release are not multithread-safe and there is no locking on any of the API entry points.
However, you can use two instances of SAXXMLReader in two threads, and parse two different XML files, as long as nothing gets shared.
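The sequential-reuse pattern can be sketched as follows. This example uses Python's standard xml.sax module as a stand-in for SAXXMLReader, so it shows the pattern rather than the MSXML API. The RootNameHandler class and the two in-memory documents are assumptions made for the example.

```python
import io
import xml.sax


class RootNameHandler(xml.sax.ContentHandler):
    """Records the root element name of each document it sees."""

    def __init__(self):
        super().__init__()
        self.roots = []
        self._depth = 0

    def startDocument(self):
        # Reset per-document state so that one handler can be reused.
        self._depth = 0

    def startElement(self, name, attrs):
        if self._depth == 0:
            self.roots.append(name)
        self._depth += 1

    def endElement(self, name):
        self._depth -= 1


reader = xml.sax.make_parser()   # one reader instance
handler = RootNameHandler()
reader.setContentHandler(handler)

# Parse two documents one after the other, on a single thread.
for document in (b"<first><a/></first>", b"<second/>"):
    reader.parse(io.BytesIO(document))

print(handler.roots)
```

Resetting per-document state in startDocument keeps the handler safe to reuse across parses. For parallel work, the equivalent of the guidance above is one reader and one handler per thread.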
If "memory buffer" is a Visual Basic string, you cannot write to it because it is always in Unicode format. However, you can provide an IStream/ISequentialStream object, which writes to a memory buffer. XML will be generated the same way as for output into a file.
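The same idea, a writer that is driven by SAX events and emits encoded bytes into an in-memory stream, can be sketched with Python's standard library. Here xml.sax.saxutils.XMLGenerator stands in for SAXXMLWriter and io.BytesIO stands in for the IStream object; the element name and text are made up for the example.

```python
import io
from xml.sax.saxutils import XMLGenerator

# In-memory byte stream that receives the encoded output.
buffer = io.BytesIO()
writer = XMLGenerator(buffer, encoding="iso-8859-1")

# Drive the writer directly with SAX events; no reader is involved.
writer.startDocument()
writer.startElement("name", {})
writer.characters("caf\u00e9")   # U+00E9 is a single byte in ISO-8859-1
writer.endElement("name")
writer.endDocument()

encoded = buffer.getvalue()
print(encoded)
```

The bytes in the buffer are already in the requested encoding, which is why a byte stream is used here and not a string.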
To reset the XML writer to create a new string, reset the output property. Internally, the flush method of IMXXMLWriter will be called.
There is no indication of whether attribute values have an entity reference.
The order of attributes is not significant in XML, and is therefore not exposed. Enumerating the attributes may follow their original order, but you should not rely on it.
ISAXErrorHandler/IVBSAXErrorHandler provides the basic interface for handling parsing errors. Currently, all errors are fatal.
In C++, a fatal error results in the parse or parseURL method returning an HRESULT other than S_OK.
In Visual Basic, the On Error statement handles exceptions.
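The overall pattern, an error handler that is notified of the fatal error and a caller that traps the failure and carries on, can be sketched with Python's standard xml.sax module. This is an analogy and not the MSXML API: CollectingErrorHandler and try_parse are hypothetical names, and the try/except block plays the role that On Error plays in Visual Basic.

```python
import xml.sax


class CollectingErrorHandler(xml.sax.ErrorHandler):
    """Records fatal errors, then lets parsing stop."""

    def __init__(self):
        self.fatal = []

    def fatalError(self, exception):
        self.fatal.append(exception.getMessage())
        raise exception   # parsing cannot continue after a fatal error


def try_parse(data):
    """Returns (succeeded, fatal_messages) without ending the program."""
    errors = CollectingErrorHandler()
    try:
        xml.sax.parseString(data, xml.sax.ContentHandler(), errors)
    except xml.sax.SAXParseException:
        return False, errors.fatal
    return True, errors.fatal


ok, messages = try_parse(b"<root><unclosed></root>")   # not well-formed
print(ok, messages)
print(try_parse(b"<root/>"))   # the application keeps running
```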