Hacker News

I don't quite get Linus' problem with XML for document markup (for anything else - config files, build scripts - sure, XML is horrible). Does anyone know any more details about what his specific gripe is? For me, asciidoc (which looks very similar, conceptually, to markdown) suffers from one huge problem: it's incomplete. Substituting symbols for words results in a more limited vocabulary, if that vocabulary is to remain at all memorable.

Sure, XML can be nasty, but that's very much a function of the care taken to a) format the file sensibly and b) use appropriate structure (i.e. be as specific as necessary, and no more).



Document markup is the one place XML is a no-brainer - more specifically, long-form, highly structured documents (i.e., essentially books).

Without it, publishing would be stuck in a morass of nebulous, ill-documented proprietary messes, and a great deal of current learning would be at risk of being lost to posterity. The fact that there are associated open standards such as XSLT with which to transform it is just the icing on the cake as far as publishing is concerned.

This is why there's so much distaste for XML - people try to use it for applications where it isn't ideal (and there are many more of those than there are applications where it is ideal) because they've swallowed someone else's hype, and as a consequence they have a bad time. If not for the unbelievable exaggeration a few years back (I heard people claim without irony that XML - a markup language for god's sake - would literally change the world), the divisiveness wouldn't exist, and it would be a technology used by experts quietly getting on with the jobs it's best for.


>Document markup is the one place XML is a no-brainer

That's only true for minimally formatted documents. For anything that approaches professional typesetting requirements, XML is a nightmare.

By far the biggest problem is the requirement that inner elements must be closed before outer ones can be. This frequently means that the software must do a huge amount of read-ahead to figure out which aspect of the formatting changes first, so that it can make that formatting element innermost.

Sometimes, that's simply not possible to arrange and so you have to close a whole bunch of elements and then reopen all but one of them.

All this because of a constraint of the format.

Ideal formats, such as used by typesetting systems that don't use XML, allow you to say: keep this formatting trait on until it's switched off. There is no concept of every element needing to be a subset of its encompassing element.
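To make the close-and-reopen problem concrete, here is a minimal Python sketch. The function name `nest_spans` and the `(start, end, tag)` span representation are assumptions made for illustration, not any typesetting system's actual API, and the sketch assumes each tag name is active at most once at a time. It takes formatting ranges that are allowed to overlap and emits well-nested XML by closing inner tags early and reopening them.

```python
from xml.sax.saxutils import escape

def nest_spans(text, spans):
    """Render possibly overlapping (start, end, tag) spans as well-nested XML.

    When a span ends while a later-opened span is still active, the
    later-opened tags are closed first and then reopened.
    """
    # Every position where some span starts or ends, plus the text edges.
    boundaries = sorted({0, len(text)} | {p for s, e, _ in spans for p in (s, e)})
    out = []
    stack = []  # currently open tags, outermost first
    for pos, nxt in zip(boundaries, boundaries[1:] + [None]):
        ending = {t for _, e, t in spans if e == pos}
        if ending & set(stack):
            # Close everything from the outermost ending tag inwards.
            first = min(i for i, t in enumerate(stack) if t in ending)
            closed = stack[first:]
            for t in reversed(closed):
                out.append(f"</{t}>")
            del stack[first:]
            # Reopen tags that were only closed to keep the nesting valid.
            for t in closed:
                if t not in ending:
                    out.append(f"<{t}>")
                    stack.append(t)
        # Open the spans that start at this position.
        for s, e, t in spans:
            if s == pos and e > pos:
                out.append(f"<{t}>")
                stack.append(t)
        if nxt is not None:
            out.append(escape(text[pos:nxt]))
    return "".join(out)

# Bold covers "asdfqwerty", italic covers "qwerty123": the ranges overlap.
print(nest_spans("asdfqwerty123", [(0, 10, "b"), (4, 13, "i")]))
# prints: <b>asdf<i>qwerty</i></b><i>123</i>
```

The italic range ends up split into two elements, which is exactly the kind of bookkeeping a "keep this trait on until it's switched off" format avoids.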


<startbold/>asdf<startitalic/>qwerty<endbold/>123<enditalic/>


Yeah, we're still well in the backlash phase of XML's hype cycle.

I just hope that opinion of it as a markup language can be rehabilitated before someone reinvents it and kicks off a new hype cycle.


For those who missed it, here's what Linus wrote in the comments:

"+Aaron Traas no, XML isn't even good for document markup.

Use 'asciidoc' for document markup. Really. It's actually readable by humans, and easier to parse and way more flexible than XML.

XML is crap. Really. There are no excuses. XML is nasty to parse for humans, and it's a disaster to parse even for computers. There's just no reason for that horrible crap to exist.

As to JSON, it's certainly a better format than XML both for humans and computers, but it ends up sharing a lot of the same issues in the end: putting everything in one file is just not a good idea. There's a reason people end up using simple databases for a lot of things.

INI files are fine for simple config stuff. I still think that "git config" is a good implementation."
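For a sense of how little machinery simple INI-style config needs, here is a minimal sketch using Python's standard `configparser`. The section names, keys, and values are made up for illustration, and this is plain INI rather than a faithful reading of git's own config syntax.

```python
import configparser

# Hypothetical INI content; the keys and values are illustrative only.
SAMPLE = """\
[user]
name = Example User
email = user@example.com

[core]
editor = vim
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)

# Sections behave like dictionaries of string values.
name = config["user"]["name"]
editor = config.get("core", "editor")
pager = config.get("core", "pager", fallback="less")  # key absent, fallback used
print(name, editor, pager)
```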


Linus' aversion to XML also explains why parsing git's output is so abysmally inconsistent.

Subversion has really good XML output for its log command, which is a joy to use (and that's saying something if you work with XML), whereas with git you always have ugly format options that are underdocumented most of the time.


I disagree. It's actually quite simple, and fast.

Git's output was designed in the Unix spirit; you can parse it very quickly without needing a parser toolchain.

It's also extensively documented: git help log, etc
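As a rough illustration of parsing without a parser toolchain, the sketch below asks `git log` for tab-separated fields and splits each line. The `--pretty=format:` placeholders `%H`, `%an`, `%s` and `%x09` are standard git ones; the function names and the sample data are assumptions made for this example.

```python
import subprocess

# Tab-separated fields: full hash, author name, subject line.
# %x09 is git's placeholder for a literal tab byte.
LOG_FORMAT = "--pretty=format:%H%x09%an%x09%s"

def parse_log(output):
    """Split tab-separated `git log` output into a list of dicts."""
    commits = []
    for line in output.splitlines():
        if not line:
            continue
        # maxsplit=2 keeps any tabs inside the subject line intact.
        commit_hash, author, subject = line.split("\t", 2)
        commits.append({"hash": commit_hash, "author": author, "subject": subject})
    return commits

def read_log(repo_path="."):
    """Run git in repo_path and parse the result (requires git on PATH)."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", LOG_FORMAT],
        capture_output=True, text=True, check=True,
    )
    return parse_log(result.stdout)

# Hypothetical output, so the parsing step can be tried without a repository.
sample = "a1b2\tAlice\tFix parser\nc3d4\tBob\tAdd\ttabs in subject\n"
for commit in parse_log(sample):
    print(commit["hash"], commit["author"], commit["subject"])
```

The same split works from a shell pipeline with `cut` or `awk`, which is presumably the sense in which the output follows the Unix spirit.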


> Use 'asciidoc' for document markup.

I just had a quick scan of the user guide. It's very impressive. Looks like markdown but with all the edge cases thought out.


I've been working with reStructuredText a lot; it's a breeze as well. Seems quite similar.


Just a note, pandoc (implemented in Haskell) makes it almost a joy to work with various "dialects" of ReST/AsciiDoc/Markdown etc. It feels like what Python's ReST should have been (at least the last time I looked, it was pretty hard to get different HTML out of it, even if it is supposed to be extensible). If you have a dependency on Python, staying with the Python libs is probably best, but if you just want "a document system", I recommend having a look at pandoc.


Actually, I am using Sphinx and latexpdf to generate project documentation, and it is more than awesome to generate beautiful-looking LaTeX documents with high-res Graphviz graphs while only writing rst... :)


I like that the syntax for features is illustrative, so that the raw text representation doesn't end up as a mess of ugly tags. However it looks completely inflexible. It's a mishmash of special cases. How would I implement a new feature without breaking existing implementations? Or without having to write a new parser that in all likelihood will break on some subtle edge case?

At its core XML (if you ignore all the DTD, namespace and entity rubbish) is both simpler and more powerful than this. You have text, tags and attributes. What those tags and attributes mean is up to the application, but at the very least you can be sure that the document can always be reliably parsed into a form you can work with.
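A small sketch of that "text, tags and attributes" core, using Python's standard `xml.etree.ElementTree`. The sample document and the `walk` helper are invented for illustration; what the tags mean is left to the application, as the comment says.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the element and attribute names mean nothing to the parser.
SAMPLE = '<para id="p1">Use <emph role="strong">asciidoc</emph> for markup.</para>'

def walk(element, depth=0):
    """Yield (depth, tag, attributes, leading text) for element and its descendants.

    Only the text before the first child is reported; text that follows a
    child element lives in that child's .tail and is ignored here.
    """
    yield depth, element.tag, dict(element.attrib), (element.text or "").strip()
    for child in element:
        yield from walk(child, depth + 1)

root = ET.fromstring(SAMPLE)
for depth, tag, attrs, text in walk(root):
    print("  " * depth + f"{tag} {attrs} {text!r}")

# Badly nested input is rejected outright instead of being guessed at.
try:
    ET.fromstring("<b>x<i>y</b></i>")
    well_formed = True
except ET.ParseError:
    well_formed = False
```

The parser knows nothing about `para` or `emph`, yet the tree is always recoverable, and malformed input fails loudly rather than parsing into something ambiguous.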


> in the end: putting everything in one file is just not a good idea. There's a reason people end up using simple databases for a lot of things.

I'd really like to hear more about this perspective, if anyone feels like they can elaborate.


I think he overuses the single-word sentence "really" too much. Really.


This link needs to be posted again and again and again.

I'm sure quite a lot of people will easily recognize it. :^)

Subject: Re: S-exp vs XML, HTML, LaTeX (was: Why lisp is growing)

https://groups.google.com/forum/message/raw?msg=comp.lang.li...


That's a wonderful rant - I particularly appreciate the digression into anti-Bush rhetoric - but:

1. There's very little detail here; it's a nicely worded, emotionally charged piece that leaves a lot of detail unaddressed, e.g. "'I would like to hear why you think it is so bad, can you be more specific please?' If you really need more information, search the Net, please." That's not very helpful.

2. It argues for 'simpler' markup via the removal of attributes. Where possible, I totally agree, as at least hinted at in my original post. Sometimes, though, this would be impossible or unwieldy (e.g. HREF attribute on an A element).

3. Character entities vs. Unicode - totally agree. Wherever possible, I use proper Unicode characters rather than ugly character entities in my markup.

4. "But the one thing I would change the most from a markup language ... is to go for a binary representation." Linus would vehemently disagree on this point.


There are a whole lot more supplementary rants: http://www.xach.com/naggum/articles/search?q=xml

In particular: http://www.xach.com/naggum/articles/3224334063725870@naggum....

with the key words being "Whether what you are really after is foo, bar, or zot, depends on your application.".

His articles on SGML are mandatory reading too.

Several years ago someone posted these links, and they opened the wonderful world of Lisp to me. Not the language per se (there are many languages), but a whole other universe of how things could be done. I swear I jumped on my chair reading every page of the CL standard, seeing how brilliant it is on every level compared to C. Eventually it led me to rethink my attitude to C and Unix in general, core parts of which I now despise.

So here I am, returning the favor; maybe someone will follow these links too.

Thank you, Erik. Rest in peace.


I thought the politics part was totally out of place and made him sound like a nut, FWIW.



