In the early 2000s I was 100% sold on the idea of strict XHTML documents and the...

strogonoff · 2026-01-24T14:49:51 1769266191

As someone who got into the idea of the semantic Web long after XHTML was all the rage[0], I somewhat resent that the semantic Web and XML are so often lumped together[1]. After all, XML is just one serialisation mechanism for linked data.

[0] I don’t dislike XHTML. The snob in me loves the idea. Sure, had XHTML been The Standard, it would have been so much more difficult to publish my first website at the age of 14 that I’m not sure I would have gotten into building for the Web at all. But is it necessarily a good thing if our field is based on technology so forgiving of malformed input that a middle school pupil can pass for an engineer? And while I do omit closing tags when the spec allows it, are the savings worth remembering the complicated rules for when they can be omitted? Is it worth maintaining all the branching that lets parsers handle invalid markup when barely any HTML is hand-written these days?

[1] Usually it is to the detriment of the former: the latter tends to be ill-regarded by today’s average Web developer used to JSON (even as they hail various schema-related additions on top of JSON that essentially try to make it do things XML can, but worse).

PaulHoule · 2026-01-24T14:58:47 1769266727

The semantic web took on the XSD data types

https://www.w3.org/TR/xmlschema-2/

even though a lot of tools and standards (I'm looking at you, SPARQL) don't really support them. My favorite serialization for RDF is Turtle:

https://en.wikipedia.org/wiki/Turtle_(syntax)

strogonoff · 2026-01-24T15:16:19 1769267779

That is a good point. If you consider XSD, then that is an XML connection; it starts to become a bit complicated, and I see why people start to dislike it. I forget about that because to me it’s just about the idea of a graph, which is otherwise quite elegant. Why not have a type-free graph with just string literals? Much richer information about what kind of values go where can be provided through constraints, vocabularies, etc.

My favourite serialisation has got to be dumb triples (maybe quads). I don’t think writing graphs by hand is the future. However, when it comes to that, Turtle’s great.

PaulHoule · 2026-01-24T15:33:10 1769268790

Because the semantics of numbers and dates matters.

It's absurd that JSON defines numbers only as a textual syntax, with no specified precision or range, and has no specification for dates and times.

I believe we lose a lot of small-p programming talent (people who have other skills who could put them on wheels by "learning to code") the moment people have the 0.1 + 0.2 != 0.3 experience. Decimal numbers should just be at people's fingertips: they should be the default thing that non-professional programmers get, and IEEE doubles and floats should be as exotic as FP16.
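For anyone who hasn't hit it yet, here is a minimal Python sketch of that experience next to the decimal alternative. It only uses the standard library `decimal` module; the variable names are just for illustration.

```python
from decimal import Decimal

# Binary floating point: 0.1 and 0.2 have no exact base-2 representation.
float_sum = 0.1 + 0.2
print(float_sum == 0.3)   # False
print(repr(float_sum))    # 0.30000000000000004

# Decimal arithmetic, built from strings so no binary rounding sneaks in.
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(decimal_sum == Decimal("0.3"))  # True
print(decimal_sum)                    # 0.3
```

Note that `Decimal(0.1)` (from a float rather than a string) would carry the binary error along, which is one more way to get this wrong.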

As for dates, everyday applications written by everyday people that use JSON frequently have 5 or more different date formats used in different parts of the application and it is an everyday occurrence that people are scratching their heads over why the system says that some event that happened on Jan 24, 2026 happened on Jan 23, 2026 or Jan 25, 2026.
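A rough Python sketch of how the off-by-one day tends to happen. The UTC-5 offset and the event time are made-up values for illustration, not taken from any real system.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical event: late evening on Jan 24, 2026 in a UTC-5 zone.
local_zone = timezone(timedelta(hours=-5))
event_local = datetime(2026, 1, 24, 23, 30, tzinfo=local_zone)

# The same instant expressed in UTC falls on the next calendar day.
event_utc = event_local.astimezone(timezone.utc)
print(event_local.date())  # 2026-01-24
print(event_utc.date())    # 2026-01-25

# A date-only value read as midnight UTC, then displayed in UTC-5,
# slips back to the previous day.
date_only = datetime.fromisoformat("2026-01-24").replace(tzinfo=timezone.utc)
shown_local = date_only.astimezone(local_zone)
print(shown_local.date())  # 2026-01-23
```

Neither conversion is a bug on its own; the trouble starts when different parts of an application disagree about which of these readings a given string is supposed to have.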

Give people choices like that and they will make the wrong choices and face the consequences. Build answers for a few simple things that people screw up over and over and... they won't screw up!

strogonoff · 2026-01-24T15:52:36 1769269956

> Because the semantics of numbers and dates matters.

Type semantics is only a small part of what is needed for systems and humans to know how to adequately work with and display the data. All of that information, including the type but so much more, can be supplied in established ways (more graphs!) without having to sprinkle XSD types on your values.

For example, say you have a triple where the object is a number that, for whatever good reason, must lie between 1 and <value from elsewhere in the graph> in 0.1 increments. Knowing that it is a number and being able to do math on it is not that useful when 99% of math operations would yield an invalid value; you need more metadata, and if you have that, you also have the type.
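As a sketch of what I mean, here is that constraint checked in Python against a plain string literal. The function name `is_valid` and the `upper` parameter (standing in for the value from elsewhere in the graph) are hypothetical; in practice this would come from whatever constraint vocabulary you use.

```python
from decimal import Decimal, InvalidOperation

def is_valid(literal: str, upper: Decimal, step: Decimal = Decimal("0.1")) -> bool:
    """Check a string literal against: 1 <= value <= upper, in `step` increments from 1."""
    try:
        value = Decimal(literal)
    except InvalidOperation:
        return False  # not a number at all
    if not value.is_finite():
        return False  # rules out NaN and Infinity
    if not (Decimal("1") <= value <= upper):
        return False
    # The value must sit on the grid 1, 1 + step, 1 + 2 * step, ...
    return (value - Decimal("1")) % step == 0

upper_bound = Decimal("5")  # assumed to be looked up elsewhere in the graph
print(is_valid("2.2", upper_bound))   # True
print(is_valid("2.25", upper_bound))  # False
print(is_valid("7.0", upper_bound))   # False
print(is_valid("abc", upper_bound))   # False
```

The type check falls out of the constraint check for free: anything that passes is necessarily a number.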

Besides, the verbatim literal, as obtained, is the least lossy format. Say the user typed "2.2": today you round it to an integer, but tomorrow you support decimal places. If you keep the original, the system can magically get more precise and no one needs to repeat themselves. (You can obviously reject input at the entry stage if it’s outlandish, but when it comes to storage, the plain string is king.)
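Roughly what that looks like in Python, assuming the stored value is the raw string and interpretation happens at read time. The helper names and the half-up rounding policy are my own choices for the sketch.

```python
from decimal import Decimal, ROUND_HALF_UP

stored = "2.2"  # exactly what the user typed, kept as-is

def read_as_integer(literal: str) -> Decimal:
    """Today's policy: round the stored literal to a whole number."""
    return Decimal(literal).quantize(Decimal("1"), rounding=ROUND_HALF_UP)

def read_with_places(literal: str, places: int) -> Decimal:
    """Tomorrow's policy: keep `places` decimal places from the same literal."""
    quantum = Decimal(10) ** -places  # e.g. places=1 gives Decimal("0.1")
    return Decimal(literal).quantize(quantum, rounding=ROUND_HALF_UP)

print(read_as_integer(stored))      # 2
print(read_with_places(stored, 1))  # 2.2
```

Had the system stored the rounded `2` instead, the second reading would be unrecoverable.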

jraph · 2026-01-24T14:37:35 1769265455

> I'm still mildly annoyed when I see self-closing tags in HTML

Why? That's (mildly) bad for your health.

direwolf20 · 2026-01-24T14:47:58 1769266078

You're annoyed when people are trying to keep the dream alive?

Since HTML5 specifies how to handle all parse errors, and the handling of an XML self-closing tag is to ignore it unless it's part of an unquoted attribute value, it's valid HTML5.

GavinAnderegg · 2026-01-24T15:20:48 1769268048

I'm not annoyed by it when people are trying to make XML-compatible documents, but effectively no one is. Platforms like WordPress use self-closing image tags everywhere, but almost no one using WordPress cares about document validation. As a result, the `<img ... />` is just an empty gesture.