Sometimes we, or our robots, have to read raw HTML. HTML is a W3C Recommendation, which approximates to the word “Standard”. Almost all of the HTML on the web isn’t. Originally it was authored by humans who couldn’t be relied on to get the end tags right, so browsers had to be forgiving. And this forgiveness is both the success of the web and the horror of current non-semantic browsers.
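To give a flavour of what "forgiving" means for a robot, here is a minimal sketch using Python's standard `html.parser`, which reports tags and text as it meets them and does not insist that anything is balanced. The class name `TextCollector`, the helper `extract_text` and the sample string `soup` are my own inventions for illustration, not anything taken from a real page.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect non-blank text chunks, ignoring whether tags are balanced."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text between tags, well-formed or not.
        text = data.strip()
        if text:
            self.chunks.append(text)


def extract_text(html):
    """Return the text chunks found in a (possibly malformed) HTML string."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical tag soup: mixed case, <li> never closed, a stray <P>.
soup = ("<H2>Physical data</H2><UL><LI>Melting point: -31 C"
        "<li>Boiling point: 156 C<P></UL>")
print(extract_text(soup))
# ['Physical data', 'Melting point: -31 C', 'Boiling point: 156 C']
```

The parser never complains about the missing end tags. That is convenient for scraping, but it also means the robot learns nothing about the structure the author intended.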
Much HTML is now written by machines. Much of it is truly awful. But very occasionally you come across HTML which is so bad it is worthy of the famous William McGonagall, one of the worst poets in the Galaxy. For those unfamiliar, a snippet:
- Beautiful Railway Bridge of the Silv’ry Tay!
- Alas! I am very sorry to say
- That ninety lives have been taken away
- On the last Sabbath day of 1879,
- Which will be remember’d for a very long time.
If McGonagall had written in HTML he might have produced something like this:
The remarkable thing is that this is written by a machine, not a human. I cannot fathom the thought processes of the person (or probably persons) who wrote the generator.
However, machines can do a good job as well. Near the top of the list is WordPress. I tried to save the above snippet as real code and got:
< H2>Physical dataH2>
< UL> Appearance: colourless liquid < br> Melting point: -31 C < br> Boiling point: 156 C < br> Vapour density: 5.4 (air = 1) < br> Vapour pressure: < br> Density (g cm< sup>-3sup>): 1.5 < BR> Flash point: 51 C < br> Explosion limits:
< br> Autoignition temperature: < BR> Water solubility: slight < P> UL>
This is bizarre - and the reason I haven't blogged any CML. But it ceases to be amusing in a few milliseconds, whereas the true human has a delicious creativity in the grotesque.
This is what the WordPress HTML ACTUALLY looks like:
Enjoy the nested spans, the empty codes, the missing start tags. Now you can see why I need a new version ++ ICE]
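If you would rather have a robot do the enjoying, the three symptoms can be counted mechanically. What follows is only a sketch under my own assumptions: `SoupLint`, `lint` and the `sample` string are hypothetical names and data, "empty" is taken to mean "closed without any text inside", and "missing start tag" means an end tag with no matching open element. It is not how WordPress or any browser actually treats the markup.

```python
from html.parser import HTMLParser

# Elements that never take an end tag, so they are not tracked as open.
VOID = {"br", "hr", "img", "input", "link", "meta"}


class SoupLint(HTMLParser):
    """Record span nesting depth, text-free elements and stray end tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []          # open elements as [tag, has_text] pairs
        self.max_span_depth = 0  # deepest run of open <span> elements
        self.empty = []          # explicitly closed elements with no text
        self.stray_ends = []     # end tags with no matching start tag

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        self.stack.append([tag, False])
        depth = sum(1 for name, _ in self.stack if name == "span")
        self.max_span_depth = max(self.max_span_depth, depth)

    def handle_data(self, data):
        if data.strip():
            # Text counts as content for every element currently open.
            for entry in self.stack:
                entry[1] = True

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        # Look for the nearest open element with this name.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                name, has_text = self.stack[i]
                del self.stack[i:]  # also drops anything left open inside
                if not has_text:
                    self.empty.append(name)
                return
        self.stray_ends.append(tag)


def lint(html):
    """Return a small report on one HTML string."""
    parser = SoupLint()
    parser.feed(html)
    parser.close()
    return {
        "max_span_depth": parser.max_span_depth,
        "empty": parser.empty,
        "stray_ends": parser.stray_ends,
    }


# Hypothetical sample: three nested spans, an empty <code>, a stray </b>.
sample = "<p><span><span><span>text</span></span></span><code></code></b></p>"
print(lint(sample))
# {'max_span_depth': 3, 'empty': ['code'], 'stray_ends': ['b']}
```

The report says nothing about why the generator produced such markup; it only makes the damage countable.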