2.6.2 Element Type Declarations

Element type declarations are used to constrain an element's content. They indicate what element types can be used as children of the element, and show how the children may be arranged.

Element type declarations may look like this:

<!ELEMENT br EMPTY>

<!ELEMENT generic ANY>

<!ELEMENT name (address+)>

<!ELEMENT para (#PCDATA | list | picture)*>

We can break the declaration into syntactic components, each with a specific purpose:

<!ELEMENT name content-model>

The text <!ELEMENT tells the parser that this is an element type declaration. name gives a name to the element type; this allows it to be referenced from elsewhere in the Document Type Definition. The content-model is used to specify what can appear as content of the element, whether it can contain character data, other elements, or both. No element type may be declared more than once.
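
As a rough illustration of where such declarations live, the following sketch places three declarations in the internal subset of a document type declaration. The sample text and the content models chosen for list and picture are assumptions made for this example only.

```xml
<?xml version="1.0"?>
<!-- Hypothetical document: the declarations sit in the internal DTD subset. -->
<!DOCTYPE para [
  <!-- keyword, then the element type name, then the content model -->
  <!ELEMENT para (#PCDATA | list | picture)*>
  <!ELEMENT list ANY>
  <!ELEMENT picture EMPTY>
]>
<para>Some text, then a picture: <picture/></para>
```

Each element type name appears in exactly one declaration, and the content model of para refers to list and picture by name.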

Note that an element type declaration has no place for declaring attributes. While attributes are associated with element types, they are defined using attribute declarations, described later in this chapter, in Section 2.6.3.

2.6.2.1 Content models

A content model describes what elements are allowed as children of the declared element type, in what order and combination they are allowed, and whether arbitrary character data is allowed.

The content models of all elements can be broken into two categories:

Element Content

This describes content made up only of elements. For example, you might define an address element that allows no character data and instead requires child elements. The specification defines content particles that "consist of names, choice lists of content particles, or sequence lists of content particles."

Mixed Content

This content may contain character data, optionally interspersed with child elements. This is the most common arrangement in text documents:

<news title="XML from Outer Space">

This article describes XML transmissions from outer space.

<h1>Not a Meteor</h1>

<para>Contrary to earlier reports, the XML that has landed from

outer space is not a meteor.</para>

</news>

In this example, elements and character data are mixed beneath the news element.

Elements that have a mixed content model are not required to allow other elements as content. In fact, an element type with only character data in the content model may be completely empty; there is no way to specify that there must be characters in the character data.
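
A minimal sketch of this point, assuming hypothetical titles and title element types that do not appear elsewhere in this section:

```xml
<?xml version="1.0"?>
<!DOCTYPE titles [
  <!ELEMENT titles (title*)>
  <!-- character data only: no child elements are allowed -->
  <!ELEMENT title (#PCDATA)>
]>
<titles>
  <title>Not a Meteor</title>
  <!-- both of the following are also valid: #PCDATA may match nothing -->
  <title></title>
  <title/>
</titles>
```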

Let's take another look at our example element declarations:

<!ELEMENT br EMPTY>

This element type declaration is simple. Its content model, EMPTY, can be used to describe an empty br element as found in XHTML. A br element can contain no child elements and no character data. For the document to be valid, the element must have no content at all: the XML specification rules out even comments, processing instructions, and white space between the start-tag and end-tag. An element type declared as EMPTY can be considered a degenerate special case of element content.
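
The following sketch shows the two ways an instance of br can be written. The enclosing lines element type is a hypothetical wrapper added only so that the example forms a complete document.

```xml
<?xml version="1.0"?>
<!DOCTYPE lines [
  <!ELEMENT lines (br*)>
  <!ELEMENT br EMPTY>
]>
<lines>
  <!-- valid: an empty-element tag -->
  <br/>
  <!-- also valid: a start-tag immediately followed by an end-tag -->
  <br></br>
  <!-- <br>text</br> would be invalid here, because br is declared EMPTY -->
</lines>
```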

<!ELEMENT generic ANY>

Next, we have an element named generic that can contain any kind of element defined in the document type (this does not allow undefined element types!). In addition to other elements, character data is allowed as well, so a content model of ANY is mixed content.
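
As an illustration, the sketch below declares generic alongside two other element types. The note element type and the sample text are assumptions made for this example.

```xml
<?xml version="1.0"?>
<!DOCTYPE generic [
  <!ELEMENT generic ANY>
  <!ELEMENT br EMPTY>
  <!ELEMENT note (#PCDATA)>
]>
<generic>Character data, a declared <note>note</note> element,<br/>
and even a nested <generic>generic</generic> element are all allowed.
<!-- an undeclared element such as <undeclared/> would make the document invalid -->
</generic>
```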

<!ELEMENT name (address+)>

The third example is simple, but very different from the others. Instead of a keyword such as ANY or EMPTY, the model is described by something that closely resembles a regular expression.

In this particular example, we have a name element that requires one or more address elements to be included. This form of content model is perhaps the most commonly used and allows for fine control. Content models can take on varying levels of complexity, but the goal is always the same: to define the content that is allowed or expected within the element.

The content model is written inside parentheses. Commas indicate a sequence, and vertical bar characters (|) indicate a choice. For example:

<!ELEMENT name (first, last)>

This element type requires a first child element followed by a last child element, and nothing else. If you want to offer a choice between first and last, but not allow both, use a vertical bar:

<!ELEMENT name (first | last)>

These expressions can be nested within each other as well:

<!ELEMENT order (sku, quantity, (account | name), price)>

The above order element requires a child sku element, followed by a quantity element, then followed by either an account or a name element, and finally followed by a price element.
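
A document that satisfies this declaration might look like the following sketch. The child element types are all declared here as character data only, which is an assumption made to keep the example short (in particular, name is given a simpler content model than in the earlier declarations), and the sample values are invented.

```xml
<?xml version="1.0"?>
<!DOCTYPE order [
  <!ELEMENT order (sku, quantity, (account | name), price)>
  <!ELEMENT sku (#PCDATA)>
  <!ELEMENT quantity (#PCDATA)>
  <!ELEMENT account (#PCDATA)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT price (#PCDATA)>
]>
<order>
  <sku>A-1001</sku>
  <quantity>2</quantity>
  <!-- either account or name may appear here, but not both -->
  <name>Jane Doe</name>
  <price>19.95</price>
</order>
```

Reordering the children, omitting one of them, or supplying both account and name would make the document invalid.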

Additionally, the operators +, *, and ? can be appended to an element name or a parenthesized group to indicate how many times it may occur: whether it is repeatable, and whether it is required at all. Without a modifier, the content must appear exactly once in that location. The operators are explained in the following list:

+

Content must appear one or more times.

*

Content may appear zero or more times.

?

Content may appear zero times or one time.

For example, to require an order element to contain exactly one account element, followed by one or more sku elements, then one or more price elements, and finally an optional shipping address (ship) that may appear at most once, you could use an element type declaration such as the following:

<!ELEMENT order (account, sku+, price+, ship?)>
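
The sketch below shows one document that this declaration would accept. The child element types are assumed to hold character data only, and the sample values are invented.

```xml
<?xml version="1.0"?>
<!DOCTYPE order [
  <!ELEMENT order (account, sku+, price+, ship?)>
  <!ELEMENT account (#PCDATA)>
  <!ELEMENT sku (#PCDATA)>
  <!ELEMENT price (#PCDATA)>
  <!ELEMENT ship (#PCDATA)>
]>
<order>
  <account>12345</account>
  <!-- sku+ : one or more -->
  <sku>A-1001</sku>
  <sku>B-2002</sku>
  <!-- price+ : one or more, after all of the sku elements -->
  <price>19.95</price>
  <price>4.50</price>
  <!-- ship? : may be omitted, or appear once -->
  <ship>12 Example Street</ship>
</order>
```

Because the model is a sequence, the sku and price elements cannot be interleaved: every sku must come before the first price.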

To mix character data and elements, use the choice operator (|) to specify mixed content, as shown here:

<!ELEMENT paragraph (#PCDATA | list | picture)*>

This paragraph element type allows character data, list elements, and picture elements to appear in any order and any number of times (the asterisk makes the choice repeatable). #PCDATA can be combined with elements only by using the choice operator in a group that has a * modifier. It must be the first item in that group, and the group must be the outermost parenthesized group of the content model.
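
As a rough example of such mixed content, the following sketch declares paragraph together with supporting element types. The content models for list, item, and picture are assumptions made for this example.

```xml
<?xml version="1.0"?>
<!DOCTYPE paragraph [
  <!ELEMENT paragraph (#PCDATA | list | picture)*>
  <!ELEMENT list (item+)>
  <!ELEMENT item (#PCDATA)>
  <!ELEMENT picture EMPTY>
]>
<paragraph>Text may come first, <picture/> then a picture,
<list><item>then a list</item></list> and more text, in any order.</paragraph>
```

A mixed content model constrains only which element types may appear, not their order or number.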