Microsoft XML Core Services (MSXML) 5.0 for Microsoft Office - XML Developer's Guide

Best Practices for Securing MSXML Code

This topic suggests best practices to make your MSXML applications more robust, and to reduce their vulnerability to malicious intruders.

Set the resolveExternals property to False when you create new DOM documents.

Be careful when handling file input and output.

Remember that XSLT is code.

Be aware of inherited security contexts from Internet Explorer and other host applications.

Check the length of character input and validate against a permitted range of characters.

Implement parse error handling in your code.

Set the resolveExternals property to False when you create new DOM documents.

When you create a new DOMDocument object, the default value of the resolveExternals property is True. This allows external definitions to be resolved and included in the XML document stream at parse time. For example, resolvable namespaces, DTD external subsets, and external entity references might be resolved and incorporated into your parsed document.

Unless you need or expect this behavior, you should set this property explicitly to False.

Note   Setting the resolveExternals property to False does not prevent your document from being validated upon parsing. This is determined by the value of the validateOnParse property.
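As a minimal sketch of this recommendation, the following Visual Basic subroutine turns off external resolution before a document is loaded. The subroutine name is hypothetical, and the file name books.xml is borrowed from the examples later in this topic. The sketch also sets async to False, an assumption made here so that load returns only after parsing has finished.

```vb
Sub LoadWithoutResolvingExternals()
   Dim xmlDoc As New MSXML2.DOMDocument30
   ' Load synchronously so that the result of load is final.
   xmlDoc.async = False
   ' Override the default (True) before any document is loaded.
   xmlDoc.resolveExternals = False
   ' load returns True only if the document was parsed successfully.
   If xmlDoc.load("books.xml") Then
      MsgBox "Loaded without resolving external definitions."
   Else
      MsgBox "Books.xml did not load. " & xmlDoc.parseError.reason
   End If
End Sub
```

Setting the property immediately after the object is created, rather than just before a particular load call, keeps every later load on the same object covered.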

Be careful when handling file input and output.

MSXML provides two DOM methods for working with file input and output: load and save.

Before you write code that deals with file input and output, you should be familiar with the details of how to design file handling code for the APIs you plan to use with these methods. In particular, you should understand the possibilities for loading or working with IStream objects if you reference and use them in your design. Because IStream objects can be marshaled to other processes, the data you store in them could be cloned or shared with other applications, with unintended consequences.

For more information about working with the IStream interface, see "IStream – Compound File Implementation" in the Platform SDK.

Remember that XSLT is code.

XSL Transformations (XSLT) might appear to be a style sheet language, but it is actually a programming language. Therefore, many programs that are typically written in script or in languages such as Visual Basic or C/C++ could potentially be designed and written in XSLT.

To prevent problems, you should test your XSLT files as thoroughly as you would any other script or code module against corrupt or accidental input, such as unanticipated XML document types. Debug as necessary, and design and implement good error handling in your XSLT files. For more information, see the following topics:

In particular, safeguard your template designs against the possibility of an infinite recursion loop, in which two templates are written that match and point to each other. The XSLT processor in MSXML does not have a timeout, so when loops occur the application must be manually terminated to stop execution.

Be aware of inherited security contexts from Internet Explorer and other host applications.

MSXML inherits its first level of security from Internet Explorer, or from another immediate host application running under Windows. If that security is not set or not in effect, MSXML imposes security based on the source context of the URL provided to locate a file.

For example, the following are three different contexts for loading a sample XML file, books.xml. The first is a local file system, the second is an intranet site, and the third is an Internet site.

C:\temp\books.xml
http://MyWorkgroupServer/books.xml
http://www.example.com/books.xml

For the first URL, MSXML assumes complete trust of the local file system. Access and control of the file are determined solely by the currently configured Windows file security settings, or by the system defaults.

For the second URL, the file is browseable (read-only), because the source is a local Web server on the same local intranet.

For the third URL, the source is an external Web server located using a DNS domain name on the Internet. In this case, MSXML blocks cross-domain interaction. For example, if example.com were the DNS domain requested in the URL, you would not be able to interact with another domain, such as microsoft.com.

For more information about the Internet Explorer security model, see the following topics in Internet Explorer Help:

Check the length of character input and validate against a permitted range of characters.

Many attacks on applications succeed because string input goes unchecked or because the buffer used to store it is overrun. In the worst case, Windows returns an access violation and the application stops responding. If a malicious user or application attempts to overrun a text input control on an application form, you should know that the MSXML parser fails without an error when more than 32 kilobytes of character or string input is passed to it. Even so, you might want to implement additional safeguards in your own form validation code for validating user input.
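One way to approach such form validation is sketched below in Visual Basic. The function name, its parameters, and the idea of passing the permitted characters as a string are all assumptions made for illustration; they are not part of MSXML.

```vb
' Hypothetical helper: returns True only if inputText is non-empty,
' is no longer than maxLength, and contains only characters that
' appear in allowedChars.
Function IsInputAcceptable(ByVal inputText As String, _
                           ByVal maxLength As Long, _
                           ByVal allowedChars As String) As Boolean
   Dim i As Long
   IsInputAcceptable = False
   ' Reject empty or overlong input before inspecting any characters.
   If Len(inputText) = 0 Or Len(inputText) > maxLength Then Exit Function
   For i = 1 To Len(inputText)
      ' vbBinaryCompare makes the character check case-sensitive.
      If InStr(1, allowedChars, Mid$(inputText, i, 1), _
               vbBinaryCompare) = 0 Then
         Exit Function
      End If
   Next i
   IsInputAcceptable = True
End Function
```

A form could call this helper on each text field before the value is placed in a DOM document, for example with a limit of 100 characters and a permitted set of letters, digits, and spaces. Those particular limits are illustrative only and should be chosen to suit the data you expect.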

Implement parse error handling in your code.

Many simple applications that can be written using MSXML assume that DOM documents load successfully. For example, consider the following Visual Basic code. This code loads two documents, an XML file and an XSLT style sheet, and then performs a transformation using both files.

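Sub LoadButDoNotCheck()
   Dim xmlDoc As New Msxml2.DOMDocument30
   Dim xslDoc As New Msxml2.DOMDocument30
   ' Load synchronously; with the default (async = True), load can
   ' return before the document has been fully parsed.
   xmlDoc.async = False
   xslDoc.async = False
   ' Neither load is checked for success before the transformation.
   xmlDoc.load "books.xml"
   xslDoc.load "stylesheet.xsl"
   MsgBox xmlDoc.transformNode(xslDoc)
End Sub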

In many cases this code might run without problems. However, it makes two assumptions that might not always be correct:

  1. Both the sample XML file (books.xml) and XSLT style sheet (stylesheet.xsl) are assumed to be available at the same path as the executing VBScript (.vbs) file or compiled Visual Basic application (.exe) file that contains this subroutine.
  2. Both the XML and XSLT documents are assumed to load successfully as well-formed XML before the call to the transformNode method. This method call requires both documents.

If either assumption is false, the calls that follow fail, yet the code runs them anyway. You can rewrite this subroutine as follows, so that it checks each load and reports any error:

Sub LoadButCheckAndReportParseErrors()
   Dim xmlDoc As New MSXML2.DOMDocument30
   Dim xslDoc As New MSXML2.DOMDocument30
   ' Load synchronously so that parseError is final when load returns.
   xmlDoc.async = False
   xslDoc.async = False
   xmlDoc.load "books.xml"
   If xmlDoc.parseError.errorCode = 0 Then
      xslDoc.load "stylesheet.xsl"
      ' Transform only if the style sheet also loaded without error.
      If xslDoc.parseError.errorCode = 0 Then
         MsgBox xmlDoc.transformNode(xslDoc)
      Else
         MsgBox "Stylesheet.xsl did not load. " & _
            xslDoc.parseError.reason
      End If
   Else
      MsgBox "Books.xml did not load. " & _
            xmlDoc.parseError.reason
   End If
End Sub

Whenever possible, you should include this kind of parse error handling in code that loads and works with DOMDocument objects. Robust code takes longer to write, but it is easier and more efficient to maintain.
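If several documents are loaded in the same application, the check can be factored into a helper so that it is not repeated at every call site. The following is one possible sketch; the names LoadOrReport and TransformWithHelper are hypothetical, and the message layout is only an example. It uses the reason, line, and linepos properties of the parseError object to say where parsing stopped.

```vb
' Hypothetical helper: loads fileName into doc and reports any parse
' error. Returns True if the document loaded successfully.
Function LoadOrReport(ByVal doc As MSXML2.DOMDocument30, _
                      ByVal fileName As String) As Boolean
   ' Load synchronously so that parseError is final when load returns.
   doc.async = False
   LoadOrReport = doc.load(fileName)
   If Not LoadOrReport Then
      With doc.parseError
         MsgBox fileName & " did not load." & vbCrLf & _
            "Reason: " & .reason & vbCrLf & _
            "Line: " & .line & ", position: " & .linepos
      End With
   End If
End Function

Sub TransformWithHelper()
   Dim xmlDoc As New MSXML2.DOMDocument30
   Dim xslDoc As New MSXML2.DOMDocument30
   ' Stop as soon as either document fails to load.
   If Not LoadOrReport(xmlDoc, "books.xml") Then Exit Sub
   If Not LoadOrReport(xslDoc, "stylesheet.xsl") Then Exit Sub
   MsgBox xmlDoc.transformNode(xslDoc)
End Sub
```

Returning a Boolean keeps the calling code flat, which can be easier to read than nested If blocks as the number of documents grows.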