Accessing and Manipulating XML Data
Date: Oct 3, 2003
Terms you'll need to understand:
- DiffGram
- Document Object Model (DOM)
- Document Type Definition (DTD)
- Valid XML
- Well-formed XML
- XPath
Techniques you'll need to master:
- Retrieving information from XML files by using the Document Object Model, XmlReader class, XmlDocument class, and XmlNode class
- Synchronizing DataSet data with XML via the XmlDataDocument class
- Executing XML queries with XPath and the XPathNavigator class
- Validating XML against XML Schema Design (XSD) and Document Type Definition (DTD) files
- Generating XML from SQL Server databases
- Updating SQL Server databases with DiffGrams
You can't use the .NET Framework effectively unless you're familiar with XML. XML is pervasive in .NET, and it's especially important for the distributed applications covered on the 70-310 exam. The System.Xml namespace contains classes to parse, validate, and manipulate XML. You can read and write XML, use XPath to navigate through an XML document, or check to see whether a particular document is valid XML by using the objects in this namespace.
NOTE
In this chapter, I've assumed that you're already familiar with the basics of XML, such as elements and attributes. If you need a refresher course on XML basics, refer to Appendix B, "XML Standards and Syntax."
Accessing an XML File
In this section, you'll learn how to extract information from an XML file. I'll start by showing you how you can use the XmlReader object to move through an XML file, extracting information as you go. Then you'll see how other objects, including the XmlNode and XmlDocument objects, provide a more structured view of an XML file.
I'll work with a very simple XML file named Books.xml that represents three books a computer bookstore might stock. Here's the raw XML file:
<?xml version="1.0" encoding="UTF-8"?> <Books> <Book Pages="1109"> <Author>Gunderloy, Mike</Author> <Title>Exam 70-306 Training Guide</Title> <Publisher>Que</Publisher> </Book> <Book Pages="357"> <Author>Wildermuth, Shawn</Author> <Title>Pragmatic ADO.NET</Title> <Publisher>Addison-Wesley</Publisher> </Book> <Book Pages="484"> <Author>Burton, Kevin</Author> <Title>.NET Common Language Runtime Unleashed</Title> <Publisher>Sams</Publisher> </Book> </Books>
Understanding the DOM
The Document Object Model, or DOM, is an Internet standard for representing the information contained in an HTML or XML document as a tree of nodes. Like many other Internet standards, the DOM is an official standard of the World Wide Web Consortium, better known as the W3C. You can find it at http://www.w3.org/DOM.
In its simplest form, the DOM defines an XML document as consisting as a tree of nodes. The root element in the XML file becomes the root node of the tree, and other elements become child nodes. The DOM provides the standard for constructing this tree, including a classification for individual nodes and rules for which nodes can have children.
TIP
In the DOM, attributes are not represented as nodes within the tree. Rather, attributes are considered to be properties of their parent elements. You'll see later in the chapter that this is reflected in the classes provided by the .NET Framework for reading XML files.
Using an XmlReader Object
The XmlReader class is designed to provide forward-only, read-only access to an XML file. This class treats an XML file similar to the way a cursor treats a resultset from a database. At any given time, there is one current node within the XML file, represented by a pointer that you can move around within the file. The class implements a Read method that returns the next XML node to the calling application. The XmlReader class has many other members, as shown in Table 3.1.
Table 3.1 Important Members of the XmlReader Class
Member |
Type |
Description |
Depth |
Property |
The depth of the current node in the XML document |
EOF |
Property |
A Boolean property that is True when the current node pointer is at the end of the XML file |
GetAttribute |
Method |
Gets the value of an attribute |
HasAttributes |
Property |
True when the current node contains attributes |
HasValue |
Property |
True when the current node is a type that has a Value property |
IsEmptyElement |
Property |
True when the current node represents an empty XML element |
IsStartElement |
Method |
Determines whether the current node is a start tag |
Item |
Property |
An indexed collection of attributes for the current node (if any) |
MoveToElement |
Method |
Moves to the element containing the current attribute |
MoveToFirstAttribute |
Method |
Moves to the first attribute of the current element |
MoveToNextAttribute |
Method |
Moves to the next attribute |
Name |
Property |
The qualified name of the current node |
NodeType |
Property |
The type of the current node |
Read |
Method |
Reads the next node from the XML file |
Skip |
Method |
Skips the children of the current element |
Value |
Property |
The value of the current node |
The XmlReader class is a purely abstract class. That is, this class is marked with the MustInherit modifier; you cannot create an instance of XmlReader in your own application. Generally, you'll use the XmlTextReader class instead. The XmlTextReader class implements XmlReader for use with text streams. Here's how you might use this class to dump the nodes of an XML file to a ListBox control:
Private Sub btnReadXml_Click( _ ByVal sender As System.Object, _ ByVal e As System.EventArgs) Handles btnReadXML.Click Dim intI As Integer Dim intJ As Integer Dim strNode As String ' Create a new XmlTextReader on the file Dim xtr As XmlTextReader = _ New XmlTextReader("Books.xml") ' Walk through the entire XML file Do While xtr.Read If (xtr.NodeType = XmlNodeType.Element) Or _ (xtr.NodeType = XmlNodeType.Text) Then strNode = "" For intI = 1 To xtr.Depth strNode &= " " Next strNode = strNode & xtr.Name & " " strNode &= xtr.NodeType.ToString If xtr.HasValue Then strNode = strNode & ": " & xtr.Value End If lbNodes.Items.Add(strNode) ' Now add the attributes, if any If xtr.HasAttributes Then While xtr.MoveToNextAttribute strNode = "" For intI = 1 To xtr.Depth strNode &= " " Next strNode = strNode & xtr.Name & " " strNode &= xtr.NodeType.ToString If xtr.HasValue Then strNode = strNode & ": " & _ xtr.Value End If lbNodes.Items.Add(strNode) End While End If End If Loop ' Clean up xtr.Close() End Sub
Figure 3.1 shows the view of the sample Books.xml file produced by this code.
Figure 3.1 An XML file translated into schematic form by an XmlTextReader object.NOTE
This and other examples in this chapter assume that the XML file is located in the bin folder of your Visual Basic .NET project.
The DOM includes nodes for everything in the XML file, including the XML declaration and any whitespace (such as the line feeds and carriage returns that separate lines of the files). On the other hand, the node tree doesn't include XML attributes, though you can retrieve them from the parent elements. However, the DOM and the XmlTextReader are flexible enough that you can customize their work as you like. Note the use of the NodeType property and the MoveToNextAttribute method in this example to display just the elements, text nodes, and attributes from the file.
CAUTION
Alternatively, you can retrieve attributes by using the Item property of the XmlTextReader. If the current node represents an element in the XML file, the following code will retrieve the value of the first attribute of the element:
xtr.Items(0)
This code will retrieve the value of an attribute named Page:
xtr.Item("Page")
The XMLNode Class
The code you saw in the previous example deals with nodes as part of a stream of information returned by the XmlTextReader object. But the .NET Framework also includes another class, XmlNode, that can be used to represent an individual node from the DOM representation of an XML document. If you instantiate an XmlNode object to represent a particular portion of an XML document, you can alter the properties of the object and then write the changes back to the original file. The DOM provides two-way access to the underlying XML in this case.
NOTE
In addition to XmlNode, the System.Xml namespace also contains a set of classes that represent particular types of nodes: XmlAttribute, XmlComment, XmlElement, and so on. These classes all inherit from the XmlNode class.
The XmlNode class has a rich interface of properties and methods. You can retrieve or set information about the entity represented by an XmlNode object, or you can use its methods to navigate the DOM. Table 3.2 shows the important members of the XmlNode class.
Table 3.2 Important Members of the XmlNode Class
Member |
Type |
Description |
AppendChild |
Method |
Adds a new child node to the end of this node's list of children |
Attributes |
Property |
Returns the attributes of the node as an XmlAttributeCollection |
ChildNodes |
Property |
Returns all child nodes of this node |
CloneNode |
Method |
Creates a duplicate of this node |
FirstChild |
Property |
Returns the first child node of this node |
HasChildNodes |
Property |
True if this node has any children |
InnerText |
Property |
The value of the node and all its children |
InnerXml |
Property |
The markup representing only the children of this node |
InsertAfter |
Method |
Inserts a new node after this node |
InsertBefore |
Method |
Inserts a new node before this node |
LastChild |
Property |
Returns the last child node of this node |
Name |
Property |
The name of the node |
NextSibling |
Property |
Returns the next child of this node's parent node |
NodeType |
Property |
The type of this node |
OuterXml |
Property |
The markup representing this node and its children |
OwnerDocument |
Property |
The XmlDocument object that contains this node |
ParentNode |
Property |
Returns the parent of this node |
PrependChild |
Method |
Adds a new child node to the beginning of this node's list of children |
PreviousSibling |
Property |
Returns the previous child of this node's parent node |
RemoveAll |
Method |
Removes all children of this node |
RemoveChild |
Method |
Removes a specified child of this node |
ReplaceChild |
Method |
Replaces a child of this node with a new node |
SelectNodes |
Method |
Selects a group of nodes matching an XPath expression |
SelectSingleNode |
Method |
Selects the first node matching an XPath expression |
WriteContentTo |
Method |
Writes all children of this node to an XmlWriter object |
WriteTo |
Method |
Writes this node to an XmlWriter |
The XmlDocument Class
You can't directly create an XmlNode object that represents an entity from a particular XML document. Instead, you can retrieve XmlNode objects from an XmlDocument object. The XmlDocument object represents an entire XML document. By combining the XmlNode and XmlDocument objects, you can navigate through the DOM representation of an XML document. For example, you can recursively dump the contents of an XML file to a ListBox control with this code:
Private Sub btnReadXML_Click( _ ByVal sender As System.Object, _ ByVal e As System.EventArgs) Handles btnReadXML.Click Dim intI As Integer Dim intJ As Integer Dim strNode As String ' Create a new XmlTextReader on the file Dim xtr As XmlTextReader = _ New XmlTextReader("Books.xml") ' Load the XML file to an XmlDocument xtr.WhitespaceHandling = WhitespaceHandling.None Dim xd As XmlDocument = New XmlDocument() xd.Load(xtr) ' Get the document root Dim xnodRoot As XmlNode = xd.DocumentElement ' Walk the tree and display it Dim xnodWorking As XmlNode If xnodRoot.HasChildNodes Then xnodWorking = xnodRoot.FirstChild While Not IsNothing(xnodWorking) AddChildren(xnodWorking, 0) xnodWorking = xnodWorking.NextSibling End While End If ' Clean up xtr.Close() End Sub Private Sub AddChildren(ByVal xnod As XmlNode, _ ByVal Depth As Integer) ' Add this node to the listbox Dim strNode As String Dim intI As Integer Dim intJ As Integer Dim atts As XmlAttributeCollection ' Only process Text and Element nodes If (xnod.NodeType = XmlNodeType.Element) Or _ (xnod.NodeType = XmlNodeType.Text) Then strNode = "" For intI = 1 To Depth strNode &= " " Next strNode = strNode & xnod.Name & " " strNode &= xnod.NodeType.ToString strNode = strNode & ": " & xnod.Value lbNodes.Items.Add(strNode) ' Now add the attributes, if any atts = xnod.Attributes If Not atts Is Nothing Then For intJ = 0 To atts.Count - 1 strNode = "" For intI = 1 To Depth + 1 strNode &= " " Next strNode = strNode & _ atts(intJ).Name & " " strNode &= atts(intJ).NodeType.ToString strNode = strNode & ": " & _ atts(intJ).Value lbNodes.Items.Add(strNode) Next End If ' And recursively walk ' the children of this node Dim xnodworking As XmlNode If xnod.HasChildNodes Then xnodworking = xnod.FirstChild While Not IsNothing(xnodworking) AddChildren(xnodworking, Depth + 1) xnodworking = xnodworking.NextSibling End While End If End If End Sub
The XmlDocument class includes a number of other useful members. Table 3.3 lists the most important of these.
Table 3.3 Important Members of the XmlDocument Class
Member |
Type |
Description |
CreateAttribute |
Method |
Creates an attribute node |
CreateElement |
Method |
Creates an element node |
CreateNode |
Method |
Creates an XmlNode object |
DocumentElement |
Property |
Returns the root XmlNode for this document |
DocumentType |
Property |
-Returns the node containing the DTD declaration for this document, if it has one |
ImportNode |
Method |
Imports a node from another XML document |
Load |
Method |
Loads an XML document into the XmlDocument |
LoadXml |
Method |
Loads the XmlDocument from a string of XML data |
NodeChanged |
Event |
Fires after the value of a node has been changed |
NodeChanging |
Event |
Fires when the value of a node is about to be changed |
NodeInserted |
Event |
Fires when a new node has been inserted |
NodeInserting |
Event |
Fires when a new node is about to be inserted |
NodeRemoved |
Event |
Fires when a node has been removed |
NodeRemoving |
Event |
Fires when a node is about to be removed |
PreserveWhitespace |
Property |
-True if whitespace in the document should be preserved when loading or saving the XML |
Save |
Method |
Saves the XmlDocument as a file or stream |
WriteTo |
Method |
Saves the XmlDocument to an XmlWriter |
The XmlDataDocument Class
The System.Xml namespace also includes the capability to automatically synchronize a DataSet with an equivalent XML file. The XmlDocument class is useful for working with XML via the DOM, but it's not a data-enabled class. To bring the DataSet class into the picture, you need to use an XmlDataDocument class, which inherits from the XmlDocument class. Table 3.4 shows the additional members the XmlDataDocument class adds to the XmlDocument class.
Table 3.4 Additional Members of the XmlDataDocument Class
Member |
Type |
Description |
DataSet |
Property |
Retrieves a DataSet representing the data in the XmlDataDocument |
GetElementFromRow |
Method |
Retrieves an XmlElement representing a specified DataRow |
GetRowFromElement |
Method |
Retrieves a DataRow representing a specified XmlElement |
Load |
Method |
Loads the XmlDataDocument and synchronizes it with a DataSet |
The XmlDataDocument class allows you to exploit the connections between XML documents and DataSets. You can do this by synchronizing the XmlDataDocument (and hence the XML document that it represents) with a particular DataSet. You can start the synchronization process with any of the following objects:
An XmlDataDocument
A full DataSet
A schema-only DataSet
If you have an XML file in an XmlDataDocument object, you can retrieve a DataSet object from its DataSet property. Here's how you might load a DataSet using this technique:
' Create a new XmlTextReader on a file Dim xtr As XmlTextReader = _ New XmlTextReader("Books.xml") ' Create an object to synchronize Dim xdd As XmlDataDocument = New XmlDataDocument() ' Retrieve the associated DataSet Dim ds As DataSet = xdd.DataSet ' Initialize the DataSet by reading the schema ' from the XML document ds.ReadXmlSchema(xtr) ' Reset the XmlTextReader xtr.Close() xtr = New XmlTextReader("Books.xml") ' Tell it to ignore whitespace xtr.WhitespaceHandling = WhitespaceHandling.None ' Load the synchronized object xdd.Load(xtr)
This code performs some extra setup to make sure the DataSet can hold the data from the XmlDataDocument. Even when you're creating the DataSet from the XmlDataDocument, you must still explicitly create the schema of the DataSet before it will contain data. That's because in this technique, you can also use a DataSet that represents only a portion of the XmlDataDocument. In this case, the code takes advantage of the ReadXmlSchema method of the DataSet object to automatically construct a schema that matches the XML document. Because the XmlTextReader object is designed for forward-only use, the code closes and reopens this object after reading the schema so that it can also be used to read the data.
CAUTION
When you use the ReadXmlSchema method of the DataSet object to construct an XML schema for the DataSet, both elements and attributes within the XML document become DataColumn objects in the DataSet.
A second way to end up with a DataSet synchronized to an XmlDataDocument is to start with a DataSet. To use this technique, you simply pass the DataSet (which you have already filled with data) to the XmlDataDocument object's constructor:
' Fill the DataSet SqlDataAdapter1.Fill(DsCustomers, "Customers") ' Retrieve the associated document Dim xdd As XmlDataDocument = _ New XmlDataDocument(DsCustomers)
The third method to synchronize the two objects is to follow a three-step recipe:
Create a new DataSet with the proper schema to match an XML document, but no data.
Create the XmlDataDocument from the DataSet.
Load the XML document into the XmlDataDocument.
One way to manage this is to use an XML schema file. An XML schema file describes the format of an XML file. For example, here's an XML schema description of Books.xml:
<?xml version="1.0" encoding="utf-8" ?> <xs:schema id="Books" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="Books"> <xs:complexType> <xs:choice maxOccurs="unbounded"> <xs:element name="Book"> <xs:complexType> <xs:sequence> <xs:element name="Author" type="xs:string" /> <xs:element name="Title" type="xs:string" /> </xs:sequence> <xs:attribute name="Pages" type="xs:string" /> </xs:complexType> </xs:element> </xs:choice> </xs:complexType> </xs:element> </xs:schema>
Given this schema file, with a name such as Books.xsd, you can construct the corresponding DataSet by calling the ReadXmlSchema method of a DataSet object and from there create the corresponding XmlDataDocument:
' Create a dataset with the desired schema Dim ds As DataSet = New DataSet() ds.ReadXmlSchema("Books.xsd") ' Create a matching document Dim xd As XmlDataDocument = _ New XmlDataDocument(ds) ' Load the XML xd.Load("Books.xml")
The advantage to using this technique is that you don't have to represent the entire XML document in the DataSet schema; the schema only needs to include the XML elements you want to work with. For example, in this case the DataSet does not contain the Publisher column, even though the XmlDataDocument includes that column (as you can verify by inspecting the information in the ListBox control).
Understanding XPath
To pass the exam, you should also have a basic knowledge of XPath. You can think of XPath as being a query language, conceptually similar to SQL. Just as SQL allows you to select a set of information from a table or group of tables, XPath allows you to select a set of nodes from the DOM representation of an XML document. By writing an appropriate XPath expression, you can select particular elements or attributes within an XML document.
The XPath Language
XPath starts with the notion of current context. The current context defines the set of nodes that will be inspected by an XPath query. In general, there are four choices to specify the current context for an XPath query:
./Uses the current node as the current context
/Uses the root of the XML document as the current context
.//Uses the entire XML hierarchy starting with the current node as the current context
//Uses the entire XML document as the current context
To identify a set of elements using XPath, you use the path down the tree structure to those elements, separating tags by forward slashes. For example, this XPath expression selects all the Author elements in the Books.xml file:
/Books/Book/Author
You can also select all the Author elements without worrying about the full path to get to them by using this expression:
//Author
You can use an asterisk (*) as a wildcard at any level of the tree. So, for example, the following expression selects all the Author nodes that are grandchildren of the Books node:
/Books/*/Author
XPath expressions select a set of elements, not a single element. Of course, the set might only have a single member, or no members at all. In the context of the XmlDocument object, an XPath expression can be used to select a set of XmlNode objects to operate on later.
To identify a set of attributes, you trace the path down the tree to the attributes, just as you do with elements. The only difference is that attribute names must be prefixed with the @ character. For example, this XPath expression selects all the Pages attributes from Book elements in the Books.xml file:
//Book/@Pages
Of course, in the Books.xml file, only Book elements have a Pages attribute. So in this particular context, the following XPath expression is equivalent to the previous one:
//@Pages
You can select multiple attributes with the @* operator. To select all attributes of Book elements anywhere in the XML, use this expression:
//Book/@*
XPath also offers a predicate language to allow you to specify smaller groups of nodes or even individual nodes in the XML tree. You might think of this as a filtering capability similar to a SQL WHERE clause. One thing you can do is specify the exact value of the node you'd like to work with. To find all Publisher nodes with the value "Que," you could use the following XPath expression:
/Books/Book/Publisher[.="Que"]
Here, the dot operator stands for the current node. Alternatively, you can find all books published by Que:
/Books/Book[./Publisher="Que"]
Note that there is no forward slash between an element and a filtering expression in XPath.
Of course, you can filter on attributes as well as elements. You can also use operators and Boolean expressions within filtering specifications. For example, you might want to find books that have 1,000 or more pages:
/Books/Book[./@Pages>=1000]
Because the current node is the default context, you can simplify this expression a little bit:
/Books/Book[@Pages>=1000]
XPath also supports a selection of filtering functions. For example, to find books whose title starts with A, you could use this XPath expression:
/Books/Book[starts-with(Title,"A")]
Table 3.5 lists some additional XPath functions.
Table 3.5 Selected XPath Functions
Function |
Description |
concat |
Concatenates strings |
contains |
Determines whether one string contains another |
count |
Counts the number of nodes in an expression |
last |
The last element in a collection |
normalize-space |
Removes whitespace from a string |
not |
Negates its argument |
number |
Converts its argument to a number |
position |
The ordinal of a node within its parent |
starts-with |
Determines whether one string starts with another |
string-length |
Returns the number of characters in a string |
substring |
Returns a substring from a string |
Square brackets are also used to indicate indexing. Collections are indexed starting at one. To return the first Book node, you'd use this expression:
/Books/Book[1]
To return the first author in the XML file (knowing that authors are children of books in this file), regardless of the book, you'd use this expression:
(/Books/Book/Author)[1]
The parentheses are necessary because the square brackets have a higher operator precedence than the path operators. Without the brackets, the expression would return the first author of every book in the file. There's also the last() function, which you can use to return the last element in a collection, without needing to know how many elements are in the collection:
/Books/Book[last()]
Another useful operator is the vertical bar, which is used to form the union of two sets of nodes. This expression returns all the authors for books published by Addison-Wesley or Que:
/Books/Book[./Publisher="Addison-Wesley"]/Author| /Books/Book[./Publisher="Que"]/Author
One way to see XPath in action is to use the SelectNodes method of the XmlDocument object. Here's how you could load the Books.xml file and select nodes matching a particular XPath expression:
' Load the Books.xml file Dim xtr As XmlTextReader = _ New XmlTextReader("Books.xml") xtr.WhitespaceHandling = WhitespaceHandling.None Dim xd As XmlDocument = _ New XmlDocument() xd.Load(xtr) ' Retrieve nodes to match the expression Dim xnl As XmlNodeList = _ xd.DocumentElement.SelectNodes("//Books/Book/Title")
The SelectNodes method of the XmlDocument takes an XPath expression and evaluates that expression over the document. The resulting nodes are returned in an XmlNodeList object, which is just a collection of XML nodes.
Using the XPathNavigator Class
You've seen how you can use the XmlReader class to move through an XML document. But the XmlReader allows only forward-only, read-only access to the document. The System.Xml.XPath namespace contains another set of navigation classes. In particular, the XPathNavigator class provides you with read-only, random access to XML documents.
You can perform two distinct tasks with an XPathNavigator object:
Selecting a set of nodes with an XPath expression
Navigating the DOM representation of the XML document
Selecting Nodes with XPath
To use the XPathNavigator class, you should start with an XmlDocument, XmlDataDocument, or XPathDocument object. In particular, if you're mainly interested in XPath operations, you should use the XPathDocument class. The XPathDocument class provides a representation of the structure of an XML document that is optimized for query operations. You can construct an XPathDocument object from a URI (including a local filename), a stream, or a reader containing XML.
The XPathDocument object has a single method of interest, CreateNavigator (you'll also find this method on the XmlDocument and XmlDataDocument objects). This method returns an XPathNavigator object that can perform operations with the XML document represented by the XPathDocument object. Table 3.6 lists the important members of the XPathNavigator object.
Table 3.6 Important Members of the XPathNavigator Class
Member |
Type |
Description |
Clone |
Method |
Creates a duplicate of this object with the current state |
ComparePosition |
Method |
Compares two XPathNavigator objects to determine whether they have the same current node |
Compile |
Method |
Compiles an XPath expression for faster execution |
Evaluate |
Method |
Evaluates an XPath expression |
HasAttributes |
Property |
Indicates whether the current node has any attributes |
HasChildren |
Property |
Indicates whether the current node has any children |
IsEmptyElement |
Property |
Indicates whether the current node is an empty element |
Matches |
Method |
Determines whether the current node matches an XSLT (Extensible Stylesheet Language Transform) pattern |
MoveToFirst |
Method |
Moves to the first sibling of the current node |
MoveToFirstAttribute |
Method |
Moves to the first attribute of the current node |
MoveToFirstChild |
Method |
Moves to the first child of the current node |
MoveToNext |
Method |
Moves to the next sibling of the current node |
MoveToNextAttribute |
Method |
Moves to the next attribute of the current node |
MoveToParent |
Method |
Moves to the parent of the current node |
MoveToPrevious |
Method |
Moves to the previous sibling of the current node |
MoveToRoot |
Method |
Moves to the root node of the DOM |
Name |
Property |
The qualified name of the current node |
Select |
Method |
Selects a set of nodes using an XPath expression |
Value |
Property |
The value of the current node |
CAUTION
Unlike the XmlReader class, the XPathNavigator class implements methods such as MovePrevious and MoveParent that can move backward in the DOM. The XPathNavigator class provides random access to the entire XML document.
Like the XmlReader class, the XPathNavigator class maintains a pointer to a current node in the DOM at all times. But the XPathNavigator brings additional capabilities to working with the DOM. For example, you can use this class to execute an XPath query by calling its Select method:
' Load the Books.xml file Dim xpd As XPathDocument = _ New XPathDocument("Books.xml") ' Get the associated navigator Dim xpn As XPathNavigator = _ xpd.CreateNavigator() ' Retrieve nodes to match the expression Dim xpni As XPathNodeIterator = _ xpn.Select("//Books/Book/Title") ' And dump the results lbNodes.Items.Clear() While xpni.MoveNext lbNodes.Items.Add( _ xpni.Current.NodeType.ToString _ & ": " & xpni.Current.Name & " = " & _ xpni.Current.Value) End While
The Select method of the XPathNavigator class returns an XPathNodeIterator object, which lets you visit each member of the selected set of nodes in turn.
Navigating Nodes with XPath
You can also use the XPathNavigator object to move around in the DOM. To see how this works, try following these steps:
-
Open a Visual Basic .NET Windows Application project and add a new form to the project.
-
Add four Button controls (btnParent, btnPrevious, btnNext, and btnChild) and a ListBox control named lbNodes to the form.
-
Double-click the Button control to open the form's module. Add this line of code at the top of the module:
-
Add code to load an XML document when you load the form:
-
Add code to handle events from the Button controls:
-
Set the form as the startup form for the project.
-
Run the project and then experiment with the buttons. You'll find that you can move around in the DOM, as shown in Figure 3.2.
Figure
3.2 Exploring an XML document with the XPathNavigator.
Imports System.Xml.XPath
Dim xpd As XPathDocument Dim xpn As XPathNavigator Private Sub StepByStep2_9_Load( _ ByVal sender As System.Object, _ ByVal e As System.EventArgs) Handles MyBase.Load ' Load the Books.xml file xpd = New XPathDocument("Books.xml") ' Get the associated navigator xpn = xpd.CreateNavigator() xpn.MoveToRoot() ListNode() End Sub Private Sub ListNode() ' Dump the current node to the listbox lbNodes.Items.Add( _ xpn.NodeType.ToString _ & ": " & xpn.Name & " = " & _ xpn.Value) End Sub
Private Sub btnParent_Click( _ ByVal sender As System.Object, _ ByVal e As System.EventArgs) Handles btnParent.Click ' Move to the parent of the current node If xpn.MoveToParent() Then ListNode() Else lbNodes.Items.Add("No parent node") End If End Sub Private Sub btnPrevious_Click( _ ByVal sender As System.Object, _ ByVal e As System.EventArgs) _ Handles btnPrevious.Click ' Move to the previous sibling of the current node If xpn.MoveToPrevious() Then ListNode() Else lbNodes.Items.Add("No previous node") End If End Sub Private Sub btnNext_Click( _ ByVal sender As System.Object, _ ByVal e As System.EventArgs) Handles btnNext.Click ' Move to the next sibling of the current node If xpn.MoveToNext() Then ListNode() Else lbNodes.Items.Add("No next node") End If End Sub Private Sub btnChild_Click( _ ByVal sender As System.Object, _ ByVal e As System.EventArgs) Handles btnChild.Click ' Move to the first child of the current node If xpn.MoveToFirstChild() Then ListNode() Else lbNodes.Items.Add("No child node") End If End Sub
The XPathNavigator class does not throw an error if you try to move to a nonexistent node. Instead, it returns False from the method call, and the current node remains unchanged.
Generating and Using XSD Schemas
In Chapter 2, "Creating and Manipulating DataSets," you learned how to create an XSD schema in the Visual Studio .NET user interface by dragging and dropping XML elements from the Toolbox. This method is useful when you need to create a schema from scratch. But there will be times when you want to create a schema to match an existing object. In this section, you'll learn about the methods that are available to programmatically generate XSD schemas.
Generating an XSD Schema
One obvious source for an XML schema is an XML file. An XML file can contain explicit schema information (in the form of an embedded schema), or it can contain implicit schema information in its structure. Here's a sample file, Products.xml, that contains embedded schema information:
<?xml version="1.0" encoding="UTF-8"?> <root xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:od="urn:schemas-microsoft-com:officedata"> <xsd:schema> <xsd:element name="dataroot"> <xsd:complexType> <xsd:choice maxOccurs="unbounded"> <xsd:element ref="Products"/> </xsd:choice> </xsd:complexType> </xsd:element> <xsd:element name="Products"> <xsd:annotation> <xsd:appinfo/> </xsd:annotation> <xsd:complexType> <xsd:sequence> <xsd:element name="ProductID" od:jetType="autonumber" od:sqlSType="int" od:autoUnique="yes" od:nonNullable="yes"> <xsd:simpleType> <xsd:restriction base="xsd:integer"/> </xsd:simpleType> </xsd:element> <xsd:element name="ProductName" minOccurs="0" od:jetType="text" od:sqlSType="nvarchar"> <xsd:simpleType> <xsd:restriction base="xsd:string"> <xsd:maxLength value="40"/> </xsd:restriction> </xsd:simpleType> </xsd:element> </xsd:sequence> </xsd:complexType> </xsd:element> </xsd:schema> <dataroot xmlns:xsi= "http://www.w3.org/2000/10/XMLSchema-instance"> <Products> <ProductID>1</ProductID> <ProductName>Chai</ProductName> </Products> </dataroot> </root>
If the file contains explicit schema information, you can use the DataSet object to read that information and create the corresponding schema as a separate file, as shown here:
' Load the XML file with inline schema info Dim xtr As XmlTextReader = _ New XmlTextReader("Products.xml") ' Read the schema (only) into a DataSet Dim ds As DataSet = New DataSet() ds.ReadXmlSchema(xtr) ' Write the schema out as a separate stream Dim sw As StringWriter = New StringWriter() ds.WriteXmlSchema(sw) txtSchema.Text = sw.ToString()
The DataSet object must have the capability to read an XML schema so that it can construct a matching data structure in memory. The .NET Framework designers thoughtfully exposed this capability to you through the ReadXmlSchema and WriteXmlSchema methods of the DataSet object. But what if the file does not contain explicit schema information? It turns out that you can still use the DataSet object because this object also has the capability to infer an XML schema based on the data in an XML file. For example, here's how to use the DataSet object to infer a schema for the Books.xml file:
' Load an XML file with no schema information Dim xtr As XmlTextReader = _ New XmlTextReader("Books.xml") ' Read the schema (only) into a DataSet Dim ds As DataSet = New DataSet() Dim ns As String() ds.InferXmlSchema(xtr, ns) ' Write the schema out as a separate stream Dim sw As StringWriter = New StringWriter() ds.WriteXmlSchema(sw) txtSchema.Text = sw.ToString()
You have at least four ways to obtain XSD files for your applications:
You can use a file generated by an external application such as Microsoft SQL Server or Microsoft Access.
You can create your own schema files from scratch using the techniques you learned in Chapter 2.
You can extract inline schema information from an XML file using the DataSet.ReadXmlSchema method.
You can infer schema information from an XML file using the DataSet.InferXmlSchema method.
Using an XSD Schema
The prime use of a schema file is to validate the corresponding XML file. Although any XML file that conforms to the syntactical rules for XML is well-formed, this does not automatically make the file valid. A valid XML file is one whose structure conforms to a specification. This specification can be in the form of an XML schema or a Document Type Definition (DTD), for example. Any valid XML file is well-formed, but not every well-formed XML file is valid. The .NET Framework provides good support for validating XML files.
To validate an XML document, you can use the XmlValidatingReader class. This class provides an additional layer between the XmlReader and the XmlDocument. The extra layer validates the document as it is read in to the XmlDocument object. To use the XmlValidatingReader object to validate an XML document with an inline schema, you should supply a handler for any validation errors, as in this code sample:
Private Sub ValidateIt() ' Load a document with an inline schema Dim xtr As XmlTextReader = _ New XmlTextReader("Products.xml") ' Prepare to validate it Dim xvr As XmlValidatingReader = _ New XmlValidatingReader(xtr) xvr.ValidationType = ValidationType.Schema ' Tell the validator what to do with errors AddHandler xvr.ValidationEventHandler, _ AddressOf ValidationHandler ' Load the document, thus validating Dim xd As XmlDocument = _ New XmlDocument() xd.Load(xvr) End Sub Private Sub ValidationHandler( _ ByVal sender As Object, _ ByVal e As ValidationEventArgs) ' Dump any validation errors to the UI MessageBox.Show(e.Message) End Sub
An inline schema cannot contain an entry for the root element of the document, so even when the document is otherwise valid, you'll get an error from that node. The XmlValidatingReader class is constructed so that it does not stop on validation errors. Rather, it continues processing the file but raises an event for each error. This enables your code to decide how to handle errors while still filling the XmlDocument object.
You can also validate an XML file against an external schema. To do this, you can load the schema and the XML file separately and tell the XmlValidatingReader class to compare one to the other:
Private Sub ValidateIt() ' Load a document with an external schema Dim xtr As XmlTextReader = _ New XmlTextReader("Books.xml") ' Prepare to validate it Dim xvr As XmlValidatingReader = _ New XmlValidatingReader(xtr) xvr.ValidationType = ValidationType.Schema ' Tell the validator what to do with errors AddHandler xvr.ValidationEventHandler, _ AddressOf ValidationHandler ' Load the schema Dim xsc As XmlSchemaCollection = _ New XmlSchemaCollection() xsc.Add("xsdBooks", "Books.xsd") ' Tell the validator which schema to use xvr.Schemas.Add(xsc) ' Load the document, thus validating Dim xd As XmlDocument = _ New XmlDocument() xd.Load(xvr) End Sub Private Sub ValidationHandler( _ ByVal sender As Object, _ ByVal e As ValidationEventArgs) ' Dump any validation errors to the UI MessageBox.Show(e.Message) End Sub
Schema files are not the only way to describe the structure of an XML file. An older standard for specifying structure is the Document Type Definition, or DTD. DTDs are part of the Standard Generalized Markup Language (SGML) standard, from which both HTML and XML derive. A DTD file lists the elements that may appear in an XML file, as in this example (Books.dtd):
<!ELEMENT Books (Book)* > <!ELEMENT Book (Author, Title, Publisher) > <!ATTLIST Book Pages CDATA #REQUIRED> <!ELEMENT Author (#PCDATA)> <!ELEMENT Title (#PCDATA)> <!ELEMENT Publisher (#PCDATA)>
To use a DTD file as the schema for an XML file, you include a DOCTYPE node in the XML file:
<?xml version="1.0" encoding="UTF-8" ?> <!DOCTYPE Books SYSTEM "books.dtd"> <Books> <Book Pages="1046"> <Author>Delaney, Kalen</Author> <Title>Inside Microsoft SQL Server 2000</Title> <Publisher>Microsoft Press</Publisher> </Book> <Book Pages="1000"> <Author>Gunderloy. Michael</Author> <Title>ADO and ADO.NET Programming</Title> <Publisher>Sybex</Publisher> </Book> <Book Pages="484"> <Author>Cooper, James W.</Author> <Title>Visual Basic Design Patterns</Title> <Publisher>Addison Wesley</Publisher> </Book> </Books>
The XmlValidatingReader class can validate an XML document for conformance with a DTD, as in the following sample code:
Private Sub ValidateIt() ' Load a document with an external schema Dim xtr As XmlTextReader = _ New XmlTextReader("Books.xml") ' Prepare to validate it Dim xvr As XmlValidatingReader = _ New XmlValidatingReader(xtr) xvr.ValidationType = ValidationType.DTD ' Tell the validator what to do with errors AddHandler xvr.ValidationEventHandler, _ AddressOf ValidationHandler ' Load the document, thus validating Dim xd As XmlDocument = _ New XmlDocument() xd.Load(xvr) End Sub Private Sub ValidationHandler( _ ByVal sender As Object, _ ByVal e As ValidationEventArgs) ' Dump any validation errors to the UI MessageBox.Show(e.Message) End Sub
The only difference between validating against a schema file and validating against a DTD is in the constant chosen for the ValidationType property of the XmlValidatingReader.
Using XML with SQL Server
Over the past several releases, Microsoft SQL Server has become increasingly integrated with XML. In the current release, you can generate XML with SQL statements, using Microsoft T-SQL extensions to the SQL standard query language. You can also update SQL Server tables by sending properly formed XML messages, called DiffGrams, to a SQL Server database. In this section, you learn the basics of interacting with SQL Server via XML.
Generating XML with SQL Statements
SQL Server allows you to retrieve the results of any query as XML rather than as a SQL resultset. To do this, you use the Microsoft-specific FOR XML clause. You can use a variety of options in the FOR XML clause to customize the XML that SQL Server generates.
The first option is FOR XML RAW. When you use raw mode with FOR XML, SQL Server returns one element (always named row) for each row of the resultset, with the individual columns represented as attributes. For example, consider this query:
SELECT Customers.CustomerID, Customers.CompanyName, Orders.OrderID, Orders.OrderDate FROM Customers INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID WHERE Country = 'Brazil' AND OrderDate BETWEEN '1997-03-15' AND '1997-04-15' FOR XML RAW
If you execute this query (for example, using SQL Query Analyzer) in the Northwind sample database, you'll get back these results:
<row CustomerID="RICAR" CompanyName="Ricardo Adocicados" OrderID="10481" OrderDate="1997-03-20T00:00:00"/> <row CustomerID="QUEEN" CompanyName="Queen Cozinha" OrderID="10487" OrderDate="1997-03-26T00:00:00"/> <row CustomerID="COMMI" CompanyName="Comércio Mineiro" OrderID="10494" OrderDate="1997-04-02T00:00:00"/> <row CustomerID="TRADH" CompanyName="Tradiçaõ Hipermercados" OrderID="10496" OrderDate="1997-04-04T00:00:00"/>
NOTE
SQL Query Analyzer returns XML results as one long string. I've reformatted these results for easier display on the printed page. If you have trouble seeing all the results in SQL Query Analyzer, select Tools, Options, Results and increase the Maximum Characters Per Column setting.
The second variant of the FOR XML clause is FOR XML AUTO. When you use auto mode with FOR XML, nested tables in the returned data are represented as nested elements in the XML. Columns are still represented as attributes. For example, here's a query that uses FOR XML AUTO:
SELECT Customers.CustomerID, Customers.CompanyName, Orders.OrderID, Orders.OrderDate FROM Customers INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID WHERE Country = 'Brazil' AND OrderDate BETWEEN '1997-03-15' AND '1997-04-15' FOR XML AUTO
Here's the corresponding returned data:
<Customers CustomerID="RICAR" CompanyName="Ricardo Adocicados"> <Orders OrderID="10481" OrderDate="1997-03-20T00:00:00"/> </Customers> <Customers CustomerID="QUEEN" CompanyName="Queen Cozinha"> <Orders OrderID="10487" OrderDate="1997-03-26T00:00:00"/> </Customers> <Customers CustomerID="COMMI" CompanyName="Comércio Mineiro"> <Orders OrderID="10494" OrderDate="1997-04-02T00:00:00"/> </Customers> <Customers CustomerID="TRADH" CompanyName="Tradiçaõ Hipermercados"> <Orders OrderID="10496" OrderDate="1997-04-04T00:00:00"/> </Customers>
Note that in this output, the Orders element is nested within the Customers element for each order. If there were multiple orders for a single customer, the Orders element would repeat as many times as necessary.
There's a second variant of FOR XML AUTO. You can include the ELEMENTS option to represent columns as elements rather than as attributes. Here's a query that uses this option:
SELECT Customers.CustomerID, Customers.CompanyName, Orders.OrderID, Orders.OrderDate FROM Customers INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID WHERE Country = 'Brazil' AND OrderDate BETWEEN '1997-03-15' AND '1997-04-15' FOR XML AUTO, ELEMENTS
Here's the corresponding output:
<Customers> <CustomerID>RICAR</CustomerID> <CompanyName>Ricardo Adocicados</CompanyName> <Orders> <OrderID>10481</OrderID> <OrderDate>1997-03-20T00:00:00</OrderDate> </Orders> </Customers> <Customers> <CustomerID>QUEEN</CustomerID> <CompanyName>Queen Cozinha</CompanyName> <Orders> <OrderID>10487</OrderID> <OrderDate>1997-03-26T00:00:00</OrderDate> </Orders> </Customers> <Customers> <CustomerID>COMMI</CustomerID> <CompanyName>Comércio Mineiro</CompanyName> <Orders> <OrderID>10494</OrderID> <OrderDate>1997-04-02T00:00:00</OrderDate> </Orders> </Customers> <Customers> <CustomerID>TRADH</CustomerID> <CompanyName>Tradiçaõ Hipermercados</CompanyName> <Orders> <OrderID>10496</OrderID> <OrderDate>1997-04-04T00:00:00</OrderDate> </Orders> </Customers>
The final variant of FOR XML is FOR XML EXPLICIT. In explicit mode, you must construct your query so as to create a resultset with the first column named Tag and the second column named Parent. These columns create a self-join in the results that is used to determine the hierarchy of the created XML file. Here's a relatively simple query in explicit mode:
SELECT 1 AS Tag, NULL AS Parent, Customers.CustomerID AS [Customer!1!CustomerID], Customers.CompanyName AS [Customer!1!CompanyName], NULL AS [Order!2!OrderID], NULL AS [Order!2!OrderDate] FROM Customers WHERE COUNTRY = 'Brazil' UNION ALL SELECT 2, 1, Customers.CustomerID, Customers.CompanyName, Orders.OrderID, Orders.OrderDate FROM Customers INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID WHERE Country = 'Brazil' AND OrderDate BETWEEN '1997-03-15' AND '1997-04-15' ORDER BY [Customer!1!CustomerID], [Order!2!OrderID] FOR XML EXPLICIT
The resulting XML from this query is as follows:
<Customer CustomerID="COMMI" CompanyName="Comércio Mineiro"> <Order OrderID="10494" OrderDate="1997-04-02T00:00:00"/> </Customer> <Customer CustomerID="FAMIA" CompanyName="Familia Arquibaldo"/> <Customer CustomerID="GOURL" CompanyName="Gourmet Lanchonetes"/> <Customer CustomerID="HANAR" CompanyName="Hanari Carnes"/> <Customer CustomerID="QUEDE" CompanyName="Que Delícia"/> <Customer CustomerID="QUEEN" CompanyName="Queen Cozinha"> <Order OrderID="10487" OrderDate="1997-03-26T00:00:00"/> </Customer> <Customer CustomerID="RICAR" CompanyName="Ricardo Adocicados"> <Order OrderID="10481" OrderDate="1997-03-20T00:00:00"/> </Customer> <Customer CustomerID="TRADH" CompanyName="Tradiçaõ Hipermercados"> <Order OrderID="10496" OrderDate="1997-04-04T00:00:00"/> </Customer><Customer CustomerID="WELLI" CompanyName="Wellington Importadora"/>
Note that in this case, even customers without orders in the specified time period are included because the first half of the query retrieves all customers from Brazil. Explicit mode allows you the finest control over the generated XML, but it's also the most complex mode to use in practice. You should stick to raw or auto mode whenever possible.
Finally, you can generate schema information as part of a SQL Server query by including the XMLDATA option in the query. You can do this in any of the FOR XML modes. For example, here's a query you saw earlier in this section with the XMLDATA option added:
SELECT Customers.CustomerID, Customers.CompanyName, Orders.OrderID, Orders.OrderDate FROM Customers INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID WHERE Country = 'Brazil' AND OrderDate BETWEEN '1997-03-15' AND '1997-04-15' FOR XML AUTO, ELEMENTS, XMLDATA
The resulting XML is as follows:
<Schema name="Schema1" xmlns="urn:schemas-microsoft-com:xml-data" xmlns:dt="urn:schemas-microsoft-com:datatypes"> <ElementType name="Customers" content="eltOnly" model="closed" order="many"> <element type="Orders" maxOccurs="*"/> <element type="CustomerID"/> <element type="CompanyName"/> </ElementType> <ElementType name="CustomerID" content="textOnly" model="closed" dt:type="string"/> <ElementType name="CompanyName" content="textOnly" model="closed" dt:type="string"/> <ElementType name="Orders" content="eltOnly" model="closed" order="many"> <element type="OrderID"/> <element type="OrderDate"/> </ElementType> <ElementType name="OrderID" content="textOnly" model="closed" dt:type="i4"/> <ElementType name="OrderDate" content="textOnly" \ model="closed" dt:type="dateTime"/>\ </Schema> <Customers xmlns="x-schema:#Schema1"> <CustomerID>RICAR</CustomerID> <CompanyName>Ricardo Adocicados</CompanyName> <Orders> <OrderID>10481</OrderID> <OrderDate>1997-03-20T00:00:00</OrderDate> </Orders> </Customers> <Customers xmlns="x-schema:#Schema1"> <CustomerID>QUEEN</CustomerID> <CompanyName>Queen Cozinha</CompanyName> <Orders> <OrderID>10487</OrderID> <OrderDate>1997-03-26T00:00:00</OrderDate> </Orders> </Customers> <Customers xmlns="x-schema:#Schema1"> <CustomerID>COMMI</CustomerID> <CompanyName>Comércio Mineiro</CompanyName> <Orders> <OrderID>10494</OrderID> <OrderDate>1997-04-02T00:00:00</OrderDate> </Orders> </Customers> <Customers xmlns="x-schema:#Schema1"> <CustomerID>TRADH</CustomerID> <CompanyName>Tradiçaõ Hipermercados</CompanyName> <Orders> <OrderID>10496</OrderID> <OrderDate>1997-04-04T00:00:00</OrderDate> </Orders> </Customers>
Using ExecuteXmlReader
ADO.NET provides a means to integrate SQL Server's XML capabilities with the .NET Framework classes. The ExecuteXmlReader method of the SqlCommand object allows you to retrieve an XmlReader directly from a SQL statement, provided that the SQL statement uses the FOR XML clause. Here's an example:
Dim cmd As SqlCommand = _ SqlConnection1.CreateCommand ' Create a command to retrieve XML cmd.CommandType = CommandType.Text cmd.CommandText = _ "SELECT Customers.CustomerID, " & _ "Customers.CompanyName," & _ "Orders.OrderID, Orders.OrderDate " & _ "FROM Customers INNER JOIN Orders " & _ "ON Customers.CustomerID = " & _ "Orders.CustomerID " & _ "WHERE Country = 'Brazil' AND " & _ "OrderDate BETWEEN '1997-03-15' " & _ "AND '1997-04-15' " & _ "FOR XML AUTO, ELEMENTS" SqlConnection1.Open() ' Read the XML into an XmlReader Dim xr As XmlReader = _ cmd.ExecuteXmlReader() ' Dump the contents of the reader Dim strNode As String Dim intI As Integer Do While xr.Read ' Do something with the nodes here Loop xr.Close() SqlConnection1.Close()
Updating SQL Server Data by Using XML
You can also update SQL Server data by using special XML messages called DiffGrams. You can think of a DiffGram as a before-and-after snapshot of a part of a SQL Server table. The .NET Framework uses DiffGrams internally as a means of serializing changes in a DataSet. For example, if you pass the changes in a DataSet from one tier to another, the .NET Framework will use a DiffGram to send the changes.
You can also use DiffGrams yourself to update data in a SQL Server. However, before you can do so, you'll need to install the SQLXML managed classes, an interface between SQL Server and .NET.
Installing SQLXML
Although SQL Server 2000 includes some XML support (for example, the FOR XML syntax is built in to the product), there have been many advances in XML since that version of SQL Server was released. Microsoft has kept SQL Server in tune with these advances by issuing a series of free upgrade packages with the general name of SQLXML. As of this writing, the current release of SQLXML is SQLXML 3.0 SP1.
To install SQLXML, you need to download the current release directly from Microsoft's Web site. You can always find the current release by starting at the SQLXML home page, http://msdn.microsoft.com/sqlxml.
Using DiffGrams
After you've installed SQLXML, you can use the SqlXmlCommand object to execute a DiffGram. A DiffGram is an XML file that includes changes to a DataSet. Here's a small example:
<?xml version="1.0" standalone="yes"?> <diffgr:diffgram xmlns:msdata="urn:schemas-microsoft-com:xml-msdata" xmlns:diffgr= "urn:schemas-microsoft-com:xml-diffgram-v1"> <NewDataSet> <Customers diffgr:id="Customers1" msdata:rowOrder="0" diffgr:hasChanges="modified"> <CustomerID>ALFKI</CustomerID> <ContactName>Maria Anderson</ContactName> </Customers> </NewDataSet> <diffgr:before> <Customers diffgr:id="Customers1" msdata:rowOrder="0"> <CustomerID>ALFKI</CustomerID> <ContactName>Maria Anders</ContactName> </Customers> </diffgr:before> </diffgr:diffgram>
In this case, the first part of the XML file lists a row in the Customers table and indicates that it has been modified. The second part of the DiffGram contains the original data from the SQL Server table. SQL Server can use this data to find the row to be modified.
In addition to the DiffGram, you'll also need a schema file that maps the element names in the DiffGram back to tables and columns in the SQL Server database. The sql:relation attribute in the schema file indicates the table mapping, whereas the sql:field attributes indicate the field mappings.
<xsd:schema xmlns: xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql="urn:schemas-microsoft-com:mapping-schema"> <xsd:element name="Customers" sql:relation="Customers" > <xsd:complexType> <xsd:sequence> <xsd:element name="CustomerID" sql:field="CustomerID" type="xsd:string" /> <xsd:element name="ContactName" sql:field="ContactName" type="xsd:string" /> </xsd:sequence> </xsd:complexType> </xsd:element> </xsd:schema>
Given these two pieces, you can use SQLXML to apply the DiffGram to a database with code such as this:
Private Sub ApplyDiffGram() ' Connect to the SQL Server database Dim sxc As SqlXmlCommand = _ New SqlXmlCommand("Provider=SQLOLEDB;" & _ "Server=(local);database=Northwind;" & _ "Integrated Security=SSPI") ' Set up the DiffGram sxc.CommandType = SqlXmlCommandType.DiffGram sxc.SchemaPath = "diffgram.xsd" sxc.CommandStream = _ New FileStream("diffgram.xml", FileMode.Open) ' And execute it sxc.ExecuteNonQuery() MessageBox.Show("Database was updated!") End Sub
DiffGrams can insert or delete data as well as modify data. For an insertion, the DiffGram will contain the data for the new row and no old data. For a deletion, the DiffGram will contain the row to be deleted but no new row.
Exam Prep Questions
Question 1
Your SQL Server database contains employee timesheet information. You want to retrieve some of this information into an XML document. You decide to use the FOR XML T-SQL statement to retrieve the information. Your application requires the EmployeeName and EmployeeNumber database columns to be presented as XML elements. Which clause can you use in your SQL statement?
-
FOR XML AUTO
-
FOR XML RAW
-
FOR XML EXPLICIT
-
FOR XML AUTO, XMLDATA
Answer C is correct. FOR XML EXPLICIT maps columns to elements. Answers A and D are incorrect because the AUTO mode of the FOR XML clause maps columns to attributes rather than elements, unless you include the ELEMENTS modifier. Answer B is incorrect because the RAW mode of the FOR XML clause also maps columns to attributes. The EXPLICIT mode can map columns to elements.
Question 2
You have loaded an XML file that represents orders from a trading partner and are using an XPathNavigator class to navigate through the file. The current node of the XPathNavigator is an <Order> node that is emptythat is, this node does not contain any child nodes. Your code calls the MoveFirstChild method of the XPathNavigator object. What is the result?
-
No error occurs. The <Order> node remains the current node.
-
No error occurs. The root node of the document becomes the current node.
-
A runtime error is thrown. The root node of the document becomes the current node.
-
A runtime error is thrown. The <Order> node remains the current node.
Answer A is correct. The XPathNavigator ignores attempts to move to a nonexistent node. Answers B and C are incorrect because the current node remains unchanged when you try to move to a nonexistent node. Answers C and D are incorrect because no error is thrown when you try to move to a nonexistent node.
Question 3
Which of the following operations requires you to have an XML schema file?
-
Updating a SQL Server database by sending a DiffGram through the SQLXML managed classes
-
Writing an XML file to disk with the XmlTextWriter class
-
Validating an XML file with the XmlValidatingReader class
-
Performing an XPath query with the XPathNavigator class
Answer A is correct. The SQLXML managed classes require a schema file to match the DiffGram information to columns in the database. Answer B is incorrect because writing an XML file does not require a schema. Answer C is incorrect because you can validate an XML file against non-schema information such as a DTD. Answer D is incorrect because the XPathNavigator class does not use schema information to perform queries.
Question 4
You have an XML file with the following structure:
<?xml version="1.0" encoding="utf-8" ?> <Orders> <Order> <Product>...</Product> <Product>...</Product> ... <Order> <Product>...</Product> <Product>...</Product> ... ... </Orders>
You want to retrieve all the <Product> nodes from this file with the best performance. How should you proceed?
-
Read the XML file into an XmlDataDocument object. Retrieve a DataSet object from the DataSet property of the XmlDataDocument object. Create a DataView object within the DataSet object to return the required nodes.
-
Read the XML file into an XPathDocument object. Use the CreateNavigator method of the XPathDocument object to return an XPathNavigator object. Use an XPath expression with the Select method of the XPathNavigator object to return the required nodes.
-
Read the XML file into an XmlDocument object. Use the CreateNavigator method of the XmlDocument object to return an XPathNavigator object. Use an XPath expression with the Select method of the XPathNavigator object to return the required nodes.
-
Read the XML file into an XPathDocument object. Use the CreateNavigator method of the XPathDocument object to return an XPathNavigator object. Use the Move methods of the XPathNavigator object to move through the document, accumulating the required nodes as you go.
Answer B is correct. The XPathDocument class is optimized to perform XPath operations quickly. Answer A is incorrect because the DataSet object adds unnecessary overhead in this scenario. Answer C is incorrect because the XmlDocument object is not designed to support fast XPath queries. Answer D is incorrect because querying for the required nodes is faster than visiting all the nodes.
Question 5
Your application contains an XML file, Employees.xml, with the following content:
<?xml version="1.0" encoding="utf-8" ?> <Employees> <Employee EmployeeID="4"> <EmployeeName>John Farrel</EmployeeName> </Employee> <Employee EmployeeID="7"> <EmployeeName>Melanie Runnion</EmployeeName> </Employee> </Employees>
Your application also contains a form with a Button control named btnProcess and a ListBox control named lbNodes. The event handler for the Button control has this code:
Private Sub btnProcess_Click( _ ByVal sender As System.Object, _ ByVal e As System.EventArgs) _ Handles btnProcess.Click Dim xtr As XmlTextReader = _ New XmlTextReader("Employees.xml") Do While xtr.Read If (xtr.NodeType = _ XmlNodeType.Attribute) _ Or (xtr.NodeType = _ XmlNodeType.Element) _ Or (xtr.NodeType = _ XmlNodeType.Text) Then If xtr.HasValue Then lbnodes.Items.Add( _ xtr.Value) End If End If Loop xtr.Close() End Sub
What will be the contents of the ListBox control after you click the Button control?
-
4 7
-
4 John Farrel 7 Melanie Runnion
-
John Farrel Melanie Runnion
-
Employees Employee 4 EmployeeName John Farrel Employee 7 EmployeeName Melanie Runnion
Answer C is correct. The Value property refers only to the actual data within XML elements. Answers A and B are incorrect because XML attributes are not treated as nodes within the DOM. Answer D is incorrect because it includes element names, which are not treated as values.
Question 6
Your application loads an XML file selected by the user and then allows him to move through the XML file. You need to allow the user to move up to the parent node, over to the next node, or down to the first child node, by selecting choices on the user interface. Which object can you use to implement this requirement?
-
XPathNavigator
-
XmlReader
-
XPathExpression
-
XmlDataDocument
Answer A is correct. The XPathNavigator class provides random navigation through an XML document. Answer B is incorrect because the XmlReader class provides forward-only access to the XML document. Answer C is incorrect because the XPathExpression class represents a single XPath expression, not an XML document. Answer D is incorrect because the XmlDataDocument class is used to synchronize an XML document and a DataSet but does not provide random access to the XML document.
Question 7
You load an XML file of employee information into an XmlDataDocument object and then retrieve the DataSet from that object. The name of the first employee as stored in the XML file is "James Doe." In the DataSet, you change the value of the EmployeeName column of the first employee to "John Doe." After that, in the XmlDataDocument object, you change the value of the corresponding Node to "James Doe Jr." Finally, you call the AcceptChanges method of the DataSet object. What is the value of the first EmployeeName in the XmlDataDocument after taking these actions?
-
James Doe.
-
John Doe.
-
James Doe Jr.
-
The value is undefined because an error occurs.
Answer C is correct. The XmlDataDocument object and its corresponding DataSet object present two views of the exact same data, so changes to either one affect the other. Answer A is incorrect because the original value has been changed twice. Answer B is incorrect because the DataSet change gets overwritten by the XmlDataDocument change. Answer D is incorrect because you can make as many changes as you like without throwing an error.
Question 8
One of your company's suppliers has delivered a new parts list to you in the form of an XML file without an inline schema. You'd like to create an XML schema file that corresponds to this XML file. How can you do this with the least effort?
-
Import the XML file to a SQL Server database and then use drag and drop from the Server Explorer to the schema designer within Visual Studio .NET.
-
Use the schema designer within Visual Studio .NET to create the schema file by dragging and dropping elements, attributes, keys, and relations.
-
Use the DataSet.ReadXmlSchema method to create the schema information from the XML file.
-
Use the DataSet.InferXmlSchema method to create the schema information from the XML file.
Answer D is correct. The InferXmlSchema method can create a schema file based on the elements and attributes within an XML file. Answers A and B are incorrect because building a schema for an existing file within Visual Studio .NET is more work than letting the DataSet class do it for you. Answer C is incorrect because the ReadXmlSchema method requires embedded schema information.
Question 9
Which of these operations can you perform by sending a DiffGram to SQL Server? [Select all correct answers.]
-
Add new data to a table
-
Delete existing data from a table
-
Execute a stored procedure
-
Update the index on a table
Answers A and B are correct. The DiffGram format is designed to carry information on data changes. You can add new data, delete existing data, or modify existing data with a DiffGram. Answers C and D are incorrect because the DiffGram format doesn't carry any information on stored procedures or indexes.
Need to Know More?
Griffin, John. XML and SQL Server. New Riders, 2002.
Gunderloy, Mike. ADO and ADO.NET Programming. Sybex, 2002.
Simpson, John E. XPath and XPointer. O'Reilly, 2002.
SQL Server Books Online, installed as part of SQL Server 2000.
XML.com: http://www.xml.com/.
XMLFiles.com DTD Tutorial: http://www.xmlfiles.com/dtd/.