How does XML handle white-space in my documents?
Answer:
All white-space, including line breaks, TAB characters, and normal spaces, even between
‘structural’ elements where no text can ever appear, is passed by the parser unchanged to
the application (browser, formatter, viewer, converter, etc.). The parser also identifies
the context in which the white-space was found (element content, data content, or mixed
content), if this information is available to it, e.g. from a DTD or Schema. This means it
is the application's responsibility to decide what to do with such space, not the parser's:
* insignificant white-space between structural elements (space which occurs where only
element content is allowed, i.e. between other elements, where text data never occurs)
will get passed to the application (in SGML this white-space gets suppressed, which is
why you can put all that extra space in HTML documents and not worry about it)
* significant white-space (space which occurs within elements which can contain text and
markup mixed together, usually mixed content or PCDATA) will still get passed to the
application exactly as under SGML. It is the application's responsibility to handle it
correctly.
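As a rough illustration of an application making that decision, here is a minimal Python sketch using the standard `xml.dom.minidom` module. The helper name `drop_interelement_space` and the sample `<list>` document are hypothetical. With no DTD available, the sketch guesses that an element has element content when it has child elements and no non-blank text. This is only a heuristic, not something the parser reports.

```python
from xml.dom import minidom, Node

def drop_interelement_space(node):
    """Remove white-space-only text nodes from elements that appear to
    have element content (child elements, no non-blank text).
    Heuristic only: a DTD or Schema would give the real answer."""
    children = list(node.childNodes)
    has_elements = any(c.nodeType == Node.ELEMENT_NODE for c in children)
    has_text = any(
        c.nodeType == Node.TEXT_NODE and c.data.strip() for c in children
    )
    for child in children:
        if child.nodeType == Node.ELEMENT_NODE:
            drop_interelement_space(child)
        elif child.nodeType == Node.TEXT_NODE and has_elements and not has_text:
            node.removeChild(child)

# Hypothetical sample document with pretty-printing space between elements.
doc = minidom.parseString(
    "<list>\n  <item> one </item>\n  <item>two</item>\n</list>"
)
root = doc.documentElement

before = len(root.childNodes)   # text, item, text, item, text
drop_interelement_space(root)
after = len(root.childNodes)    # only the two item elements remain

first_item_text = root.firstChild.firstChild.data  # space inside <item> is untouched
```

The space inside each `<item>` is left alone, because there it is part of the text data and may be significant.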
The parser must inform the application that white-space has occurred in element content,
if it can detect it. (Users of SGML will recognize that this information is not in the ESIS,
but it is in the Grove.)
<chapter>
    <title>
        My title for
        Chapter 1.
    </title>
    <para>
        text
    </para>
</chapter>
In the example above, the application will receive all the pretty-printing line breaks,
TABs, and spaces between the elements, as well as those embedded in the chapter title. It
is the function of the application, not the parser, to decide which type of white-space to
discard and which to retain. Many XML applications have configurable options to allow
programmers or users to control how such white-space is handled.
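To see this in practice, the following sketch parses the same chapter with Python's standard `xml.etree.ElementTree` module and inspects what the application receives. The TAB indentation in `SOURCE` and the helper `normalize_space` are assumptions made for the illustration. Collapsing runs of white-space is only one policy an application might choose.

```python
import xml.etree.ElementTree as ET

# The chapter example, written out with explicit line breaks and TABs.
SOURCE = (
    "<chapter>\n"
    "\t<title>\n"
    "\t\tMy title for\n"
    "\t\tChapter 1.\n"
    "\t</title>\n"
    "\t<para>\n"
    "\t\ttext\n"
    "\t</para>\n"
    "</chapter>"
)

chapter = ET.fromstring(SOURCE)
title = chapter.find("title")

# Space between <chapter> and <title> arrives as chapter.text,
# space between </title> and <para> arrives as title.tail.
print(repr(chapter.text))  # '\n\t'
print(repr(title.tail))    # '\n\t'

# Space embedded in the title is delivered unchanged as well.
print(repr(title.text))    # '\n\t\tMy title for\n\t\tChapter 1.\n\t'

def normalize_space(text):
    """One possible application policy: trim the ends and collapse
    every run of white-space to a single space."""
    return " ".join(text.split())

print(normalize_space(title.text))  # My title for Chapter 1.
```

Nothing is discarded by the parser here. Dropping or collapsing the space happens only in `normalize_space`, which is application code.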