How can I handle embedded HTML in my XML?
Answer:
Apart from using CDATA Sections, there are two common occasions when people want
to handle embedded HTML inside an XML element:
1. when they have received (possibly poorly-designed) XML from somewhere else which
they must find a way to handle;
2. when they have an application which has been explicitly designed to store a string of
characters containing &lt; and &amp; character entity references with the objective of turning
them back into markup in a later process (eg FreeMind, Atom).
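For the second occasion, the round trip can be sketched in Python using the standard library's xml.etree.ElementTree as a stand-in for whatever later process does the conversion. The element name `content`, the dummy `<div>` wrapper and the helper `unescape_to_markup` are hypothetical; the idea is loosely modelled on Atom's `type="html"` content. The sketch assumes the escaped string is itself well-formed XML. Real-world HTML often is not, and would need an HTML parser instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical document storing markup as escaped character data.
SOURCE = '<entry><content type="html">Read &lt;b&gt;this&lt;/b&gt; now</content></entry>'


def unescape_to_markup(element):
    """Re-parse the character data of `element` as an XML fragment.

    The XML parser has already turned &lt; and &amp; back into < and &,
    so element.text holds the stored markup as a plain string. Wrapping
    it in a dummy <div> lets a fragment with several top-level nodes
    parse. Raises ET.ParseError if the string is not well-formed XML.
    """
    return ET.fromstring("<div>" + (element.text or "") + "</div>")


entry = ET.fromstring(SOURCE)
content = entry.find("content")
print(repr(content.text))                       # 'Read <b>this</b> now'
markup = unescape_to_markup(content)
print(ET.tostring(markup, encoding="unicode"))  # <div>Read <b>this</b> now</div>
```

Until the second parse, `<b>` is only a string. The later process has to parse it again before it can treat it as markup, which is exactly the extra step that makes this design fragile.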
Generally, you want to avoid this kind of trick, as it usually indicates that the document
structure and design have not been thought out sufficiently. However, there are occasions
when it becomes unavoidable, so if you really need or want to use embedded HTML
markup inside XML, and have it processable later as markup, there are several
techniques you may be able to use:
* Provide templates in your XSLT transformation (or whatever software you use) which
handle that markup by simply replicating what was there, eg
<xsl:template match="b">
  <b>
    <xsl:apply-templates/>
  </b>
</xsl:template>
* Use XSLT's ‘deep copy’ instruction, xsl:copy-of, which outputs the matched element and
all the well-formed markup nested inside it verbatim, eg
<xsl:template match="ol">
  <xsl:copy-of select="."/>
</xsl:template>
* As a last resort, use the disable-output-escaping attribute on XSLT's xsl:text element.
Processors are not required to support it, so it works only in some of them, eg
<xsl:text disable-output-escaping="yes"><![CDATA[<b>Now!</b>]]></xsl:text>
* Some processors (eg JX) are now providing their own equivalents for disabling output
escaping. Their proponents claim it is ‘highly desirable’ or ‘what most people want’, but
it still needs to be treated with care to prevent unwanted (possibly dangerous) arbitrary
code from being passed untouched through your system. It also adds another dependency
to your software.
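The difference between the first two techniques can be illustrated outside XSLT. The following Python sketch uses the standard library's xml.etree.ElementTree rather than an XSLT processor. `replicate` loosely stands in for a set of per-element templates, and `copy.deepcopy` stands in for xsl:copy-of. The whitelist `ALLOWED`, the helper names and the sample document are hypothetical. XSLT's built-in rules would still output the text of unmatched elements, but this simplified sketch drops unknown elements together with their content.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical whitelist: the elements we have "templates" for.
ALLOWED = {"b", "i", "ol", "li"}

SOURCE = '<note><b onclick="steal()">Now!</b> then <script>alert(1)</script>done</note>'


def _append_text(parent, text):
    """Attach text after the last child of parent (or as its text)."""
    if not text:
        return
    if len(parent):
        last = parent[-1]
        last.tail = (last.tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def replicate(node):
    """Rebuild a subtree element by element, keeping only ALLOWED children.

    Attributes are deliberately not copied. A skipped element loses its
    own content, but the text that follows it (its tail) is kept.
    """
    out = ET.Element(node.tag)
    out.text = node.text
    for child in node:
        if child.tag in ALLOWED:
            new_child = replicate(child)
            new_child.tail = child.tail
            out.append(new_child)
        else:
            _append_text(out, child.tail)
    return out


root = ET.fromstring(SOURCE)
print(ET.tostring(replicate(root), encoding="unicode"))
# <note><b>Now!</b> then done</note>
print(ET.tostring(copy.deepcopy(root), encoding="unicode"))
# <note><b onclick="steal()">Now!</b> then <script>alert(1)</script>done</note>
```

When you rebuild element by element, only markup you have explicitly provided for gets through. Here the onclick attribute and the script element are dropped. A deep copy passes everything through untouched, which is the same risk the last point above warns about.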