<html xmlns="http://www.w3.org/1999/xhtml"> <!--*- HTML -*-->
 <head>
  <title>XHTML and MathML in XML sent as text/html and parsed as tag soup HTML</title>
  <style type="text/css">
   .bias { font-size: smaller; }
  </style>
 </head>
 <body>
  <h1>The XHTML as <code>text/html</code> Mess</h1>
  <p>Note: I want this document to be as impartial as possible (except
  where <span class="bias">explicitly noted</span>). If you have any
  comments, additions, or questions I strongly recommend that you mail
  them to me (<a href="mailto:ian@hixie.ch">ian@hixie.ch</a>) and ask
  that you cc the www-talk mailing list (<a
  href="mailto:www-talk@w3.org">www-talk@w3.org</a>). Thanks.</p>
  <h2>The problems</h2>
  <p>There are several people who have several different (but similar)
  problems which all basically boil down to the same issue. The main
  goals that I recall are:</p>
  <dl>
   <dt>MathML</dt>
   <dd>
    <p>People want to send MathML embedded in HTML in such a way that
    suitably enabled browsers can view the equations correctly.</p>
    <p><a href="http://www.mozilla.org/projects/mathml/">Mozilla</a>
    is able to render MathML natively, and Windows IE has a plugin
    which allows it to render MathML. However, whereas Mozilla
    requires that the document be parsed as text/xml in order for it
    to recognise the MathML namespace and thus construct the right
    DOM, Windows IE requires that the document be parsed as text/html
    in order for it to construct its DOM.</p>
    <p>(Note that you cannot, while complying with the spirit of
    current W3C technologies, send XHTML containing non-XHTML
    namespaced content as text/html. Any XHTML document containing a
    mention of namespaces other than the xmlns attribute on the root
    &lt;html&gt; element is, of course, invalid XHTML to start with,
    but even taking that into account, section 5.1 of XHTML 1.0
    states that only documents that, by virtue of following Appendix
    C, are compatible with existing UAs may be sent as text/html.
    Documents containing namespaces are almost certainly not
    backwards compatible.)</p>
   </dd>
   <dt>Progress</dt>
   <dd>
    <p>People want to use XHTML (presumably because it is the "latest
    and greatest" version of HTML) without losing their target
    audience. The majority of deployed browsers do not support XHTML,
    but if the guidelines in <a
    href="http://www.w3.org/TR/xhtml1/#guidelines">Appendix C</a> are
    followed, then valid XHTML is compatible with existing browsers
    and thus can be sent to them. The problem is that authors want
    modern browsers to still use their XML parser on these
    documents.</p>
    <p>There is no normative reason why this would not be a valid
    thing to do. <span class="bias">However, <a href="#future">see my
    rebuttal below</a>.</span></p>
   </dd>
   <dt>More Reliable CSS Styling</dt>
   <dd>
    <p>See <a href="http://lists.w3.org/Archives/Public/www-talk/2001JulAug/0008.html">this post</a>.</p>
   </dd>
   <dt>Styling with XSL</dt>
   <dd>
    <p>To style a web page with XSL, the source has to be an XML
    document. The theory is that if those XHTML documents are sent
    with both XSL and CSS stylesheets and are sent as text/html, then
    non-XML browsers will be able to render the pages using the CSS,
    and browsers supporting XSL will be able to render the pages using
    XSL, treating the source as XML.</p>
    <p>In practice, this is unlikely to be a real issue: first, XHTML
    is more easily styled using CSS than using XSLT and XSL-FO, and
    second, by the time any web browser supports XSL, XML support is
    likely to be available on most desktops.</p>
   </dd>
  </dl>
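  <p>To make the parsing split in the MathML case concrete, here is a
  small sketch (my own illustration, not part of the original
  discussion) using Python's standard-library parsers: an XML parser
  constructs a DOM in which the MathML namespace is part of each
  element's identity, while a tag-soup-style HTML parser sees xmlns as
  just another attribute on an unknown element.</p>

```python
# Sketch: the same markup yields different trees depending on which
# parser the UA applies. (Illustrative only; browsers' internals differ.)
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

doc = ('<p>Area: <math xmlns="http://www.w3.org/1998/Math/MathML">'
       '<mi>x</mi></math></p>')

# An XML parser folds the namespace into the element name, so a
# MathML-aware UA can recognise the math element unambiguously.
root = ET.fromstring(doc)
math = root.find('{http://www.w3.org/1998/Math/MathML}math')
print(math is not None)  # True: namespace recognised

# A tag-soup HTML parser produces flat, namespace-unaware tags;
# xmlns is merely an attribute it carries along.
class TagCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tags = []
    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

c = TagCollector()
c.feed(doc)
print(c.tags)  # ['p', 'math', 'mi']
```

  <p>This is exactly the bind described above: Mozilla needs the
  namespaced (XML) tree, while Windows IE only builds the flat
  (text/html) one.</p>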
  <h2>The real world</h2>
  <p>Content sent as text/html is not only HTML4, but also Tag
  Soup. Tag Soup is a de facto standard that is only superficially
  related to SGML. UAs absolutely have to support Tag Soup or they
  will never gain enough market penetration for their standards
  support to matter.</p>
  <h2>The solutions</h2>
  <p>There are several existing solutions.</p>
  <ol>
   <li>
    <p>I have written <a
    href="http://software.hixie.ch/utilities/cgi/xhtml-for-ie/">a
    tool</a> that will sniff for the browser version and send IE the
    MIME type text/html and other browsers text/xml. In fact, this
    very document is being processed by this script. Unfortunately,
    this requires access to the server.</p>
   </li>
   <li>
    <p>Don't use XHTML yet. If you need validation, perform it
    yourself before publishing.</p>
   </li>
  </ol>
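  <p>The first solution can be sketched in a few lines (a minimal
  illustration under my own assumptions, not the actual script linked
  above): inspect the User-Agent string and pick the MIME type per
  browser.</p>

```python
# Sketch of server-side content negotiation: IE cannot handle XHTML
# sent as text/xml, so it gets text/html; everyone else gets text/xml.
def choose_content_type(user_agent):
    """Return the MIME type to serve for an XHTML document."""
    if 'MSIE' in user_agent:
        return 'text/html'
    return 'text/xml'

print(choose_content_type('Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)'))
# text/html
print(choose_content_type('Mozilla/5.0 (X11; U; Linux) Gecko'))
# text/xml
```

  <p>The obvious drawback, as noted above, is that this requires
  control of the server: a plain static host cannot vary the
  Content-Type header per request.</p>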
  <h2>The proposals</h2>
  <p>Given a text/html entity, how might a suitably capable user-agent
  determine unambiguously and reliably that the author intended it to
  apply an XML parser to the contents?</p>
  <p>People have proposed several ideas to make XML-aware UAs treat
  documents sent as text/html as text/xml.</p>
  <p>Note, however, that the only XHTML documents that are allowed to
  be sent as text/html according to the spec are those conforming to
  Appendix C, and documents conforming to Appendix C will look
  identical whether treated as text/html or text/xml.</p>
  <ol>
   <li>
    <p>Sniff for an XHTML DOCTYPE. Rebuttal: no algorithm has been
    proposed which could be executed fast enough (it is vital that
    page load times not be affected by sniffing for XHTML since this
    feature has such a small target audience compared to general fast
    surfing) while not stumbling on <a href="html-not-xml.html">valid
    HTML documents</a>.</p>
   </li>
   <li>
    <p>Sniff for an XML Declaration. Rebuttal: Appendix C says PIs
    should not be included in documents sent as text/html; they are
    also <a href="html-not-xml.html">hard to parse correctly</a> in a
    hurry.</p>
   </li>
   <li>
    <p>Sniff for an XHTML namespace declaration. Rebuttal: even harder
    to parse than an XHTML DOCTYPE.</p>
   </li>
   <li>
    <p>Only examine the file extension and treat .xhtml files as
    XML. Rebuttal: There are <a href="/404.xhtml">many files</a> with
    .xhtml extensions which are plain old HTML.</p>
   </li>
   <li>
    <p>Magic Comment String. Rebuttal: it is "wrong"; comments are
    defined to have no effect on processing.</p>
   </li>
  </ol>
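  <p>To illustrate why the sniffing proposals above fail, here is a
  deliberately naive prefix sniffer (my own sketch, not a proposed
  algorithm): it checks the first bytes for an XML declaration or an
  XHTML DOCTYPE, and promptly misfires on a perfectly valid HTML
  document.</p>

```python
# Naive sniffer: look for an XML declaration or an XHTML public
# identifier near the start of the file. Fast, but unreliable.
def looks_like_xhtml(prefix):
    head = prefix.lstrip().lower()
    return head.startswith('<?xml') or '//dtd xhtml' in head[:200]

# A valid HTML 4.01 document whose leading comment happens to mention
# XHTML trips the heuristic:
html4 = ('<!-- migrating from //DTD XHTML 1.0// soon -->'
         '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">')
print(looks_like_xhtml(html4))  # True -- a false positive on plain HTML
```

  <p>A sniffer robust against such inputs would have to actually parse
  comments, DOCTYPEs, and processing instructions before deciding,
  which is precisely the "too slow for the general case" objection
  raised in the first rebuttal.</p>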
  <h2 class="bias">The future (This is the explicitly biased
  section!)</h2>
  <p class="bias">Personally, I believe the way forward is for UAs to
  support XHTML (as some browsers are doing, WinIE being the major
  obstacle here), for these browsers to be distributed widely, and
  only then for XHTML to begin being used. There is no point running
  before we can crawl.</p>
  <p class="bias">I'm still looking for a good reason to write
  websites in XHTML <em>at the moment</em>, given that the majority of
  web browsers don't grok XHTML. The only reason I was given (<a
  href="http://lists.w3.org/Archives/Public/www-talk/2001MayJun/0031.html"
  >by Dan Connolly</a>) is that it makes managing the content using
  XML tools easier... but it would be just as easy to convert the XML
  to tag soup or HTML before publishing it, so I'm not sure I
  understand that. And even then, having the content as XML for
  content management is one thing, but why does that require a
  minority of web browsers to have to treat the document as XML
  instead of tag soup?  What's the advantage of doing that? And even
  <em>then</em>, if the person in control of the content is using XML
  tools and so on, they are almost certainly in control of the website
  as well, so why not do the content type munging on the server side
  instead of campaigning for UA authors to spend their already
  restricted resources on implementing content type sniffing?</p>
  <h2>Further reading</h2>
  <p><a
  href="http://lists.w3.org/Archives/Public/www-talk/2001MayJun/thread.html#0"
  >This thread</a> in www-talk speaks about many of these issues.</p>
  <h2>Thanks</h2>
  <p><a href="mailto:aray@q2.net">Arjun Ray</a> contributed to this document.</p>
 </body>
</html>