From rob at robburns.com Sun Feb 1 12:30:46 2009
From: rob at robburns.com (Robert J Burns)
Date: Sun, 1 Feb 2009 14:30:46 -0600
Subject: [html4all] HTML Specifications
Message-ID:

Hello 4All,

I've made some substantial progress on developing an HTML4All HTML specification/specifications. I've done a lot to re-factor the way HTML is framed: mostly to avoid the common confusion of technical terms (such as "HTML document", which can mean an HTML document regardless of its serialized form or can mean a non-XHTML document), but I think there are many other benefits too.

Based on earlier messages I sent to the HTML WG, I followed my own advice and separated the parsing from the vocabulary from the browser behavior. My thinking is that, other than parsing and the presentation of HTML vocabulary, I will mostly rely on referencing the HTML5 specification for browser behavior. For the presentation of HTML vocabulary, I expect to describe that mostly in CSS terms (though there are some things that have no CSS analog) so that any CSS-conforming UA will be able to easily support the presentation of our HTML (which I'm modestly calling HTML 4.1 for lack of another name).

[Attachment scrubbed by the archive: HTML4AllStack.png, image/png, 39152 bytes]

Parsing: The parsing adds many forward compatible features that Ian has rejected under the claim that no browser is going to make any changes to the parsing (so apparently we specify the incorrect parsing because we take the fatalist position that "oh well, no one will implement the HTML5 parsing algorithm anyway"). Some of the changes I added to the parsing algorithm are already supported in one browser or another. For example, I added the WebKit behavior of allowing a self-closing tag on 'script' elements with a 'src' attribute.
I've also added support for new and unknown elements in the head, which some browsers support (e.g., Opera, I think, off the top of my head). I added namespace aware parsing to the parsing algorithm so that not only are 'html', 'mathml', and presumably 'svg' elements added to their respective namespaces, but elements in any author declared namespace will also be added to the appropriate namespace. This namespace aware parsing is not all that different from IE's text/html parser (though IE admittedly is a bit less namespace aware in its resulting document). I'm also in the process of explicitly adding the HTML4All elements to the parsing algorithm, though a browser implementing the parsing algorithm will automatically work with our newly added elements (which is the forward compatibility feature I added already).

Serialization: In the spirit of separating implementation conformance from document conformance, the parsing algorithm is entirely about implementation conformance. Serialization, on the other hand, is entirely about document conformance (well, except for serializing implementations :-) ). My goal is to have a canonical HTML serialization (what I'm calling cHTML, with c for canonical) that basically follows the XHTML 1.0 Appendix C criteria and the new W3C Media Types note[1] with respect to HTML (and obviously not the script, DOM, and CSS criteria in that note). This serialization promotes what many of us insist are best practices in serialization.

Leif has raised with me the desire to allow some source minimization, and I think this could be done with alternate serialization specifications (which would have some corresponding conformance checking service). The possible alterations that a serialization separate from the cHTML serialization might add are (in order of best practice, where each item in the list is a slightly worse practice than the one before it, IMHO):

* element tag minimization, where some closing tags can be omitted (except for 'p') and both opening and closing tags can be omitted on 'html', 'head', 'body', 'tbody', and 'colgroup' (I expect to require explicit 'tbody' and 'colgroup' in the cHTML serialization)
* omission of the self-closing tag solidus "/" from elements defined as empty
* boolean attribute minimization (e.g., "..." instead of "...")
* omission of quotation marks from attribute values in certain circumstances
* 'p' element closing tag omission. I expect to require the 'p' closing tag in the cHTML serialization for forward compatibility reasons. It was clearly a mistake to ever allow 'p' close tag omission, and requiring the closing tag going forward means that there are no strange exceptional cases with new elements such as the 'section' element, which will work in many browsers automatically (and with the tHTML parsing algorithm) in a forward compatible way, but will not implicitly close the 'p' element as it should.

The last two items are particularly troublesome for various reasons I won't go into now, but I think that if we do define an alternate serialization to the cHTML serialization, it should not include the last two items in the list (though some authors "in the know" will know how to produce even more minimized syntax that still parses correctly). In any event, such a serialization will be compatible with the tHTML parser, but not with an XML parser, so authors can decide to maintain code only for tHTML and SGML parsing, or for XML, SGML, and tHTML parsing with the cHTML serialization.

Vocabulary: This is really the meat of the proposal. This is my attempt to do what I thought Ian should have been doing all along: listening to the members of the WG, engaging in dialog with them on and off list, and weaving their best suggestions into a new HTML vocabulary specification of which we could all be proud. There are a lot of new ideas in this vocabulary and therefore there is a lot to absorb in it.
However, I think the features will prove quite intuitive and simple for authors to use, and many things that today require complex scripting can be done in HTML 4.1 simply, with HTML-style declarative markup and a conforming browser. The vocabulary specification is currently a combination of document and implementation conformance criteria. It might be possible for us to split this out later if we wanted to, but I think in the meantime it is good to keep those norms together for editing purposes. I have already elaborated most of the new HTML4all originated features have

Strategy: Some have asked me how we can hope to influence change in this area without the support of the W3C. Ideally we might gain their support and perhaps be invited into a genuine public process to develop HTML5. However, I sincerely believe we can bring these changes about even without the support of the W3C. Here are the steps I have in mind:

* to publish an HTML recommendation (or recommendations) that put(s) the needs of users and authors first
* to provide various machine readable schemas (XSD, RelaxNG, DTD) in addition to the normative prose of the recommendation to support alternate UAs
  - also provide an online validation/conformance service so authors can check their documents' conformance to the specification
* to organize and support the implementation of this recommendation in at least one open source rendering engine
* to organize the tracking of feature requests to bring support for our HTML recommendation to all the major browsers
  - with one reference implementation in place, there should be significant pressure on other browsers to support these desirable features
* to develop JavaScript implementations of these features so that support can be added even to non-conforming implementations (at least the ones supporting JavaScript)
* to create tutorials to show authors how to use these new features (e.g., monikers and XForms on anyElement features)

Because of a vibrant open source rendering engine community (KHTML, WebKit, Mozilla) and several open source browsers (e.g., Shiira), I think we can lead the way both to better rendering engines and to better browsers. That is not to say that the commercial vendors will follow suit. However, much of what we propose degrades gracefully in other browsers.

Wiki editing approach: I welcome everyone to get involved with editing these specifications. I have not done the work of specifying many of the existing features drawn from HTML4. Leif and I have interacted somewhat about this privately, and I'm convinced that a wiki approach would be ideal for developing an HTML vocabulary specification that is as clear and concise as it can be. Ian's criticisms of HTML4 prose are sometimes justified, though HTML5 is often worse. However, I think starting from the HTML4 descriptions of the semantics of the elements and attributes and improving upon them in a wiki way will create an excellent specification. Also, on the new HTML 4.1 features, there's always room to improve upon my prose. Having new eyes read those and point out what is not clear and what is redundant would be quite helpful. Also, some new features are described on the original page for this project[4], but have not been re-created on the new draft page (such as the 'access' element and the 'marks' attribute).

Currently all of the various chapters (and modules) of the specification are gathered together on one wiki page[2]. That page has a corresponding discussion or talk page where deliberation can take place about the language of the specification (and even the substantive features). Normally on Wikipedia there is a "no original research" policy. However, in contrast, this effort is original research, so that policy does not apply here.
Instead, I think we should simply foster a collegial atmosphere where we work to build consensus and focus on good faith improvements to HTML. Though I welcome us to think about new and better ways to provide accessibility in documents, I do not anticipate that many of the controversies we've faced in the HTML WG regarding features such as summaries for tables will be a problem here at HTML4All.

As for the final presentation, order, and hierarchy of the prose, I feel we might change that before it reaches its final form (which may not be presented on a wiki). However, in the meantime I think the wiki approach will be an excellent way to shape the prose of this specification.

The parsing algorithm couldn't go on the wiki, since MediaWiki is rather mediaweak when it comes to support for mildly complex hierarchy. There is a wiki page about the algorithm[3], and the corresponding talk page would be a fine place to discuss that algorithm and changes to it. The algorithm is also quite a complex read with many interlocking dependencies, so filtering all the edits for it through one editor is not such a bad idea anyway.

Take care,
Rob

[1]:
[2]:
[3]:
[4]:

From rob at robburns.com Sat Feb 28 15:24:51 2009
From: rob at robburns.com (Robert J Burns)
Date: Sat, 28 Feb 2009 18:24:51 -0500
Subject: [html4all] removing document conformance from HTML5?
Message-ID:

Hello 4All,

Surprisingly, Maciej today raised the prospect of removing document conformance criteria from HTML5[1]. It's never clear what something like that might mean, especially with Maciej. However, I think this could go a long way toward addressing the problems with HTML5. HTML5 could make a fine UA behavior and interoperability specification. However, the changes in document conformance, both additions and subtractions, are a disaster. Rob Sayre has also been proposing something similar[2][3] within the Mozilla community, but has largely been shouted down and overwhelmed there by the WhatWG crew.
To me this could open up the possibility of a more positive HTML vocabulary specification developed elsewhere, without what I view as WhatWG sabotage. It's also possible that simply leaving HTML4 as the continuing specification for HTML vocabulary would be the default. As much as I'd like to see improvements to HTML (which is what drew me to the HTML WG in the first place), it seems that the WhatWG had a very different goal in starting this project.

While HTML4 could remain the document conformance specification in the meantime, the HTML5 specification could then refocus on UA conformance: specifically parsing of text/html, DOM behavior, rendering, and other UA behavior. Eventually perhaps XHTML2, HTML4All, or another effort could be started to advance the HTML vocabulary and develop newly evolving document conformance criteria.

Any thoughts? This idea keeps getting floated in various places and I'm interested in the opinions of this group.

Take care,
Rob

[1]:
[2]:
[3]: