Chapter 7. XML Validation and Dialects

7.1 Working with DTDs

Schemas and validation play a major role in reliable application communication.

Developing a firm understanding of how to express document relationships within a schema is crucial to using them effectively. In this chapter, we concentrate on DTDs, but the concepts presented here apply to all schema languages. See the discussion of alternate schema languages in Chapter 2 for pointers to Python modules that support schema languages other than the DTD language defined as part of XML 1.0.

A DTD can be supplied in the internal DTD subset, the external DTD subset, or a combination of the two. As the name suggests, the internal subset rides along inside the XML document instance, whereas the external subset is stored separately; the document carries only a reference that tells the parser where to find it.
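To make the distinction concrete, here is a minimal sketch that asks the standard library's expat bindings what a document type declaration contains. It is not part of xmlproc and it does not validate anything; the two `note` documents, the `note.dtd` identifier, and the `describe_doctype` helper are assumptions made up for this illustration.

```python
import xml.parsers.expat

# Hypothetical document carrying its declarations in the internal subset.
INTERNAL = """<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (#PCDATA)>
]>
<note>hello</note>
"""

# Hypothetical document that only points at an external subset.
EXTERNAL = """<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>hello</note>
"""


def describe_doctype(text):
    """Return the doctype name, system identifier, and internal-subset flag."""
    info = {}
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        info["name"] = name
        info["system_id"] = system_id  # None when no external subset is named
        info["has_internal_subset"] = bool(has_internal_subset)

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(text, True)  # expat does not fetch note.dtd by default
    return info


for label, text in (("internal", INTERNAL), ("external", EXTERNAL)):
    print(label, describe_doctype(text))
```

A document may use both forms at once, in which case the handler reports a system identifier and an internal subset together.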

The xmlproc package is a validating parser for Python. As of this writing, it is the only validating parser available for Python that is also implemented in Python. If you have the PyXML package installed, as we assume throughout this book, you already have xmlproc available and can use it in your programs. The xmlproc package can be imported from the xml.parsers package:

>>> from xml.parsers import xmlproc

7.1.1 Validating with the Internal DTD Subset

If you have worked with XML for a while, you can probably pick up the basic syntax of DTDs from a few examples. The xmlproc package includes a command-line script called xvcmd.py, a simple utility that tests documents for validity against their DTDs. You can use xvcmd.py to try out a few simple DTDs, both external and internal. Be sure that you have xvcmd.py in your path (it is typically located beneath your PyXML installation directory in xmldoc/demo/xmlproc/xvcmd.py).

Here is a small XML document called product.xml (Example 7-1), which shows an internal DTD subset. For illustration purposes, the document doesn't faithfully implement the DTD. You may not notice this just by glancing at the code; therefore it's good that we have xvcmd.py handy to actually test for validation.

Example 7-1. product.xml with a bad product element

<?xml version="1.0"?>

<!DOCTYPE product [

<!ELEMENT name (#PCDATA)>

<!ELEMENT price (#PCDATA)>

<!ELEMENT product (name, price)>

]>

<product>

<name>Bean Crusher</name>

</product>

Try out xvcmd.py (the validator) from your command line:

C:\>python c:\python20\xmldoc\demo\xmlproc\xvcmd.py product.xml

xmlproc version 0.70

Parsing 'product.xml'

E:product.xml:9:11: Element 'product' ended, but not finished

Parse complete, 1 error(s) and 0 warning(s)

As suspected, an error occurs. The problem is that in the DTD, we explicitly stated the content model for a product element. We stated that it must contain exactly one name element and one price element:

<!ELEMENT product (name, price)>

Furthermore, the DTD instructs that each of those elements (price and name) must contain only character data as shown in the following element declarations:

<!ELEMENT name (#PCDATA)>

<!ELEMENT price (#PCDATA)>
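The way a content model constrains a document can be sketched with the standard library alone. The following is not how xmlproc works internally and it is not a validator: it only handles plain sequences such as (name, price) with no occurrence indicators, and the helper names `collect_declarations`, `sequence_names`, and `sequence_problems` are invented for this illustration.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat as expat
from xml.parsers.expat import model

PRODUCT_XML = """<?xml version="1.0"?>
<!DOCTYPE product [
<!ELEMENT name (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT product (name, price)>
]>
<product>
<name>Bean Crusher</name>
</product>
"""


def collect_declarations(text):
    """Map each declared element name to the content model expat reports."""
    declarations = {}
    parser = expat.ParserCreate()

    def on_element_decl(name, content_model):
        declarations[name] = content_model

    parser.ElementDeclHandler = on_element_decl
    parser.Parse(text, True)
    return declarations


def sequence_names(content_model):
    """Return the child names of a plain sequence, or None for anything else."""
    kind, quant, _name, children = content_model
    if kind != model.XML_CTYPE_SEQ or quant != model.XML_CQUANT_NONE:
        return None
    names = []
    for child_kind, child_quant, child_name, _grandchildren in children:
        if child_kind != model.XML_CTYPE_NAME or child_quant != model.XML_CQUANT_NONE:
            return None
        names.append(child_name)
    return names


def sequence_problems(text):
    """List (element, expected children, actual children) for each mismatch."""
    declarations = collect_declarations(text)
    problems = []
    for element in ET.fromstring(text).iter():
        declared = declarations.get(element.tag)
        expected = sequence_names(declared) if declared is not None else None
        if expected is None:
            continue  # undeclared, or a model this sketch does not handle
        actual = [child.tag for child in element]
        if actual != expected:
            problems.append((element.tag, expected, actual))
    return problems


for tag, expected, actual in sequence_problems(PRODUCT_XML):
    print("%s: expected children %s, found %s" % (tag, expected, actual))
```

Elements declared as (#PCDATA) are reported by expat as mixed content, so the sketch skips them and only compares the children of product.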

We can correct the problem in the XML of Example 7-1. The product element needs a price element inside it, and this price element can contain only character data. Let's change the document product.xml from Example 7-1 to the following by adding a price element:

<?xml version="1.0"?>

<!DOCTYPE product [

<!ELEMENT name (#PCDATA)>

<!ELEMENT price (#PCDATA)>

<!ELEMENT product (name, price)>

]>

<product>

<name>Bean Crusher</name>

<price>3.95</price>

</product>

Now, return to the command line to try out the xvcmd.py validator once again:

C:\>python c:\python20\xmldoc\demo\xmlproc\xvcmd.py product.xml

xmlproc version 0.70

Parsing 'product.xml'

Parse complete, 0 error(s) and 0 warning(s)

This time the document validates, because the XML instance document now complies with the DTD. The DTD places strict control over the content model of the basic XML constructs (elements, attributes, and character data) allowed within any given XML document.

7.1.2 Validating with an External DTD Subset

We've looked at an internal DTD subset. Now let's explore an external DTD subset.

Typically, a DTD that is applied to many document instances is stored externally. By keeping the DTD external, you can maintain one DTD that applies to many documents. If you store your DTD within the document, each document instance needs its own copy. With a large collection of instance documents, reliably maintaining an internal DTD is problematic, and an external DTD is a better choice. The document refers to the external DTD through its document type declaration, as shown in Example 7-2.

Example 7-2. order.xml with an external DTD

<?xml version="1.0"?>

<!DOCTYPE order SYSTEM "order.dtd">

<order>

<customer_name>eDonkey Enterprises</customer_name>

<sku>343-3940938</sku>

<qty>4</qty>

<unit_price>39.95</unit_price>

<product_name>eDonkey Feed Bags</product_name>

</order>

Note that there is no internal DTD subset. The file order.dtd contains the document type definition and is shown in Example 7-3.

Example 7-3. order.dtd

<!ELEMENT customer_name (#PCDATA)>

<!ELEMENT sku (#PCDATA)>

<!ELEMENT qty (#PCDATA)>

<!ELEMENT unit_price (#PCDATA)>

<!ELEMENT product_name (#PCDATA)>

<!ELEMENT order (customer_name, sku, qty, unit_price, product_name)>

While the exact syntax of element type declarations is covered in the next section, it's worth explaining the general composition of the DTD here. In Example 7-3, five element types are declared, each with a character data content model. A sixth element type, order, is declared with a content model that requires exactly one of each of the other elements, in the order listed. Any valid document using this DTD must adhere to this structure. You can test the new document and DTD by running the xvcmd.py command, as shown here:

C:\>python c:\python20\xmldoc\demo\xmlproc\xvcmd.py order.xml

xmlproc version 0.70

Parsing 'order.xml'

Parse complete, 0 error(s) and 0 warning(s)

The document order.xml is valid. If you arbitrarily change the document, it no longer validates. Let's modify the order.xml document by deleting the qty and product_name elements and save the result as badorder.xml, so that it looks like the following. This ensures that the document fails validation:

<?xml version="1.0"?>

<!DOCTYPE order SYSTEM "order.dtd">

<order>

<customer_name>eDonkey Enterprises</customer_name>

<sku>343-3940938</sku>

<unit_price>39.95</unit_price>

</order>

In this case, the parser complains about the new document structure:

C:\>python c:\python20\xmldoc\demo\xmlproc\xvcmd.py badorder.xml

xmlproc version 0.70

Parsing 'badorder.xml'

E:badorder.xml:6:14: Element 'unit_price' not allowed here

E:badorder.xml:7:9: Element 'order' ended, but not finished

Parse complete, 2 error(s) and 0 warning(s)
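What happens when a parser follows the reference to an external subset can also be sketched with the standard library's expat bindings. This is only an illustration under stated assumptions, not xmlproc's implementation: the in-memory `DTD_FILES` dictionary stands in for reading order.dtd from disk, the helper names are invented, and the check covers only plain sequences with no occurrence indicators, so it reports one mismatch per element rather than the two errors xmlproc prints.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat as expat
from xml.parsers.expat import model

# Stand-in for the file system: system identifier -> DTD text (assumption).
DTD_FILES = {
    "order.dtd": """<!ELEMENT customer_name (#PCDATA)>
<!ELEMENT sku (#PCDATA)>
<!ELEMENT qty (#PCDATA)>
<!ELEMENT unit_price (#PCDATA)>
<!ELEMENT product_name (#PCDATA)>
<!ELEMENT order (customer_name, sku, qty, unit_price, product_name)>
""",
}

BAD_ORDER = """<?xml version="1.0"?>
<!DOCTYPE order SYSTEM "order.dtd">
<order>
<customer_name>eDonkey Enterprises</customer_name>
<sku>343-3940938</sku>
<unit_price>39.95</unit_price>
</order>
"""


def load_declarations(text):
    """Collect element declarations, following the external subset reference."""
    declarations = {}
    parser = expat.ParserCreate()
    # Ask expat to hand the external subset to our handler instead of skipping it.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)

    def on_element_decl(name, content_model):
        declarations[name] = content_model

    def on_external_entity(context, base, system_id, public_id):
        # Parse the referenced DTD text with a child parser.
        child = parser.ExternalEntityParserCreate(context)
        child.ElementDeclHandler = on_element_decl
        child.Parse(DTD_FILES[system_id], True)
        return 1  # nonzero tells expat the entity was handled

    parser.ElementDeclHandler = on_element_decl
    parser.ExternalEntityRefHandler = on_external_entity
    parser.Parse(text, True)
    return declarations


def sequence_names(content_model):
    """Return the child names of a plain sequence, or None for anything else."""
    kind, quant, _name, children = content_model
    if kind != model.XML_CTYPE_SEQ or quant != model.XML_CQUANT_NONE:
        return None
    names = []
    for child_kind, child_quant, child_name, _grandchildren in children:
        if child_kind != model.XML_CTYPE_NAME or child_quant != model.XML_CQUANT_NONE:
            return None
        names.append(child_name)
    return names


def sequence_problems(text):
    """List (element, expected children, actual children) for each mismatch."""
    declarations = load_declarations(text)
    problems = []
    for element in ET.fromstring(text).iter():
        declared = declarations.get(element.tag)
        expected = sequence_names(declared) if declared is not None else None
        if expected is not None:
            actual = [child.tag for child in element]
            if actual != expected:
                problems.append((element.tag, expected, actual))
    return problems


for tag, expected, actual in sequence_problems(BAD_ORDER):
    print("%s: expected children %s, found %s" % (tag, expected, actual))
```

Because the declarations are looked up by system identifier, the same DTD text serves every document that names order.dtd in its document type declaration.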

Generally, it's a good idea to store the DTD externally. This is far more flexible because it allows multiple document instances to be validated against a single DTD. For example, an external DTD works much better when your documents are published on the Internet. You can easily have XML instance documents scattered all over the world, but if their document type declarations point to the URL of the DTD, they can still be validated. Using a URL to identify the DTD allows you to keep a single copy of the DTD online.