This topic suggests best practices to make your MSXML applications more robust, and to reduce their vulnerability to malicious intruders.
Set the resolveExternals property to False when you create new DOM documents.
Be careful when handling file input and output.
Be aware of inherited security contexts from Internet Explorer and other host applications.
Check the length of character input and validate against a permitted range of characters.
Implement parse error handling in your code.
When you create a new DOMDocument object, the default value for the resolveExternals property is True. This allows files that contain external definitions to be included and resolved as part of the XML document stream at parse time. For example, document type definition (DTD) external subsets and external entity references might be resolved and incorporated into your parsed document.
Unless you need or expect this behavior, you should set this property explicitly to False.
Note: Setting the resolveExternals property to False does not prevent your document from being validated upon parsing. Validation is determined by the value of the validateOnParse property.
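The following is a minimal Visual Basic sketch of this setting, not a definitive implementation. It assumes a project reference to Microsoft XML, v3.0 and reuses the books.xml sample file from later in this topic. The procedure name LoadWithoutExternals is hypothetical.

```vb
' Sketch only: LoadWithoutExternals is a hypothetical name.
Sub LoadWithoutExternals()
    Dim xmlDoc As New Msxml2.DOMDocument30

    ' Load synchronously so that parseError is set when load returns.
    xmlDoc.async = False

    ' Do not resolve external definitions at parse time.
    xmlDoc.resolveExternals = False

    ' validateOnParse is left unchanged; it is independent of resolveExternals.
    If xmlDoc.load("books.xml") Then
        MsgBox "Loaded root element: " & xmlDoc.documentElement.nodeName
    Else
        MsgBox "Books.xml did not load. " & xmlDoc.parseError.reason
    End If
End Sub
```

Set the property before calling load, because external definitions are resolved at parse time.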
MSXML provides two DOM methods for working with file input and output:
The load method on a DOMDocument object.
The save method on a DOMDocument object.
Before you write code that deals with file input and output, you should be familiar with the details of how to design file handling code for the APIs you plan to use with these methods. In particular, you should understand the possibilities for loading or working with IStream objects if you reference and use them in your design. Because IStream objects can be marshaled to other processes, the data you store with them could potentially be cloned or shared with other applications, with unintended consequences.
For more information about working with the IStream interface, see "IStream Compound File Implementation" in the Platform SDK.
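As a hedged illustration of careful file handling with these two methods, the following Visual Basic sketch checks the Boolean result of load and traps the run-time error that save raises when the file cannot be written. The procedure name LoadAndSaveCopy and the output file name books-copy.xml are assumptions made for this example.

```vb
' Sketch only: LoadAndSaveCopy and books-copy.xml are hypothetical names.
Sub LoadAndSaveCopy()
    Dim xmlDoc As New Msxml2.DOMDocument30

    ' Load synchronously so that the result of load is meaningful.
    xmlDoc.async = False

    ' load returns False when the file is missing or is not well formed.
    If Not xmlDoc.load("books.xml") Then
        MsgBox "Books.xml did not load. " & xmlDoc.parseError.reason
        Exit Sub
    End If

    ' save raises a run-time error on failure, for example when the
    ' destination is read-only or the path does not exist.
    On Error GoTo SaveFailed
    xmlDoc.save "books-copy.xml"
    Exit Sub

SaveFailed:
    MsgBox "The copy was not saved. " & Err.Description
End Sub
```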
XSL Transformations (XSLT) might appear to be a style sheet language, but it is actually a programming language. Therefore, many programs that are typically written in script or in languages such as Visual Basic or C/C++ could potentially be designed and written in XSLT.
To prevent problems, you should test your XSLT files as thoroughly as you would any other script or code module against corrupt or accidental input, such as unanticipated XML document types. Debug as necessary, and design and implement good error handling in your XSLT files.
In particular, safeguard your template designs against the possibility of an infinite recursion loop, in which two templates are written that match and point to each other. The XSLT processor in MSXML does not have a timeout, so when loops occur the application must be manually terminated to stop execution.
MSXML inherits its first level of security from Internet Explorer, or from another immediate host application running under Windows. If that security is not set or in effect, MSXML imposes security based on the source context of the URL provided to locate a file.
For example, the following are three different contexts for loading a sample XML file, books.xml. The first is a local file system, the second is an intranet site, and the third is an Internet site.
C:\temp\books.xml
http://MyWorkgroupServer/books.xml
http://www.example.com/books.xml
For the first URL, MSXML assumes complete trust of the local file system. Access and control of the file are determined solely by the currently configured Windows file security settings, or by the system defaults.
For the second URL, the file is browseable (read-only), because the source is a local Web server on the same local intranet.
For the third URL, the source is an external Web server located using a DNS domain name on the Internet. In this case, MSXML blocks cross-domain interaction. For example, if example.com was the DNS domain requested in the URL, you would not be able to interact with another domain, such as microsoft.com.
For more information about the Internet Explorer security model, see Internet Explorer Help.
Many attacks on applications succeed because string input goes unchecked or the buffer used to store it is overrun. In the worst case, Windows returns an access violation and the application stops responding. If a malicious user or application attempts to overrun a text input control on an application form, be aware that the MSXML parser fails without an error if more than 32 kilobytes of character or string input is passed to it. However, you might want to implement additional safeguards in your own form validation code.
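One possible form of such a safeguard is sketched below in Visual Basic. The function name IsInputAcceptable, its parameters, and the idea of passing the permitted characters as a string are assumptions made for this example, not part of MSXML.

```vb
' Sketch only: IsInputAcceptable is a hypothetical helper.
' Returns True when inputText is not empty, is no longer than maxLength
' characters, and contains only characters found in allowedChars.
Function IsInputAcceptable(ByVal inputText As String, _
                           ByVal maxLength As Long, _
                           ByVal allowedChars As String) As Boolean
    Dim i As Long

    IsInputAcceptable = False

    ' Reject empty input and input that exceeds the permitted length.
    If Len(inputText) = 0 Or Len(inputText) > maxLength Then Exit Function

    ' Reject any character that is not in the permitted set.
    For i = 1 To Len(inputText)
        If InStr(1, allowedChars, Mid$(inputText, i, 1), vbBinaryCompare) = 0 Then
            Exit Function
        End If
    Next i

    IsInputAcceptable = True
End Function
```

A form might call this helper before passing text to the parser, for example IsInputAcceptable(txtTitle.Text, 64, "abcdefghijklmnopqrstuvwxyz "), where txtTitle is a hypothetical text box and the limit of 64 characters is arbitrary.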
Many simple applications that can be written using MSXML assume that DOM documents load successfully. For example, consider the following Visual Basic code. This code loads two documents, an XML file and an XSLT style sheet, and then performs a transformation using both files.
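Sub LoadButDoNotCheck()
    Dim xmlDoc As New Msxml2.DOMDocument30
    Dim xslDoc As New Msxml2.DOMDocument30
    ' Load synchronously so that each load finishes before the next line runs.
    xmlDoc.async = False
    xslDoc.async = False
    xmlDoc.load "books.xml"
    xslDoc.load "stylesheet.xsl"
    MsgBox xmlDoc.transformNode(xslDoc)
End Sub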
In many cases this code might run without problems. However, it makes two assumptions that might not always be correct:
That books.xml loads without a parse error.
That stylesheet.xsl also loads without a parse error before the call to the transformNode method. This method call requires both documents.
If either of these conditions is untrue, the subsequent lines of code fail, but in some instances they are unnecessarily executed anyway. You can rewrite this subroutine as follows, so that it handles errors as they occur:
Sub LoadButCheckAndReportParseErrors()
    Dim xmlDoc As New MSXML2.DOMDocument30
    Dim xslDoc As New MSXML2.DOMDocument30
    ' Load synchronously so that parseError is set before it is checked.
    xmlDoc.async = False
    xslDoc.async = False
    xmlDoc.load "books.xml"
    If xmlDoc.parseError.errorCode = 0 Then
        xslDoc.load "stylesheet.xsl"
        If xslDoc.parseError.errorCode = 0 Then
            MsgBox xmlDoc.transformNode(xslDoc)
        Else
            MsgBox "Stylesheet.xsl did not load. " & _
                xslDoc.parseError.reason
        End If
    Else
        MsgBox "Books.xml did not load. " & _
            xmlDoc.parseError.reason
    End If
End Sub
Whenever possible, you should include this kind of parse error handling in code that loads and works with DOMDocument objects. Robust code takes longer to write, but it is easier and more efficient to maintain.
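When the same checks are repeated for several documents, the reporting can be gathered into one place. The following Visual Basic sketch is one way to do that, under the assumption that a message containing the error location is useful. The function name DescribeParseError is hypothetical. The errorCode, url, line, linepos, and reason properties belong to the parseError object.

```vb
' Sketch only: DescribeParseError is a hypothetical helper.
' Returns an empty string when the document parsed without error;
' otherwise returns a message that includes the error location.
Function DescribeParseError(ByVal doc As Msxml2.DOMDocument30) As String
    With doc.parseError
        If .errorCode = 0 Then
            DescribeParseError = ""
        Else
            DescribeParseError = "Error " & .errorCode & " in " & .url & _
                " at line " & .Line & ", position " & .linepos & ": " & .reason
        End If
    End With
End Function
```

A caller can test the result after each load, for example If DescribeParseError(xmlDoc) <> "" Then, and display or log the returned message.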