
Almost all of this is solved by basically putting quotes around strings.

Yaml has its use cases where you want things json doesn't do, like recursion or anchors/aliases/tags. Or at least it has had - perhaps cue/dhall/hcl solve things better. Jsonnet is another. I haven't tried them enough to judge how much better they are.



I feel like these two tenets - (1) yaml should require quotes & (2) the value in yaml is in recursion/anchors - are fundamentally the opposite of why yaml exists & why people use it.

The distinguishing draw of yaml is largely the "easiness" of not having explicit opening or - more importantly - closing delimiters. This is done using a combination of white-space delimiting for structure, & heuristic parsing for values. The latter is fundamentally flawed, but yaml fans think the flaws are a worthwhile trade-off. If you're going to bring delimiters in as a requirement, imho yaml loses its raison d'être.
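
To make "heuristic parsing for values" concrete, here is a minimal Python sketch of the kind of guessing a YAML 1.1-style parser does on unquoted scalars. It is an illustration under assumptions, not a real parser: the word lists and patterns are a simplified subset, and `resolve_scalar` is a hypothetical name.

```python
import re

# Illustrative subset of YAML 1.1-style implicit typing; not a real parser.
TRUE_WORDS = {"yes", "true", "on"}
FALSE_WORDS = {"no", "false", "off"}
NULL_WORDS = {"", "~", "null"}
INT_RE = re.compile(r"[-+]?(0|[1-9][0-9]*)")
FLOAT_RE = re.compile(r"[-+]?[0-9]+\.[0-9]+")


def resolve_scalar(text, quoted=False):
    """Guess the type of a scalar the way a heuristic parser might."""
    if quoted:
        return text  # quoting switches the guessing off entirely
    lowered = text.lower()  # simplification: real rules list specific casings
    if lowered in TRUE_WORDS:
        return True
    if lowered in FALSE_WORDS:
        return False
    if lowered in NULL_WORDS:
        return None
    if INT_RE.fullmatch(text):
        return int(text)
    if FLOAT_RE.fullmatch(text):
        return float(text)
    return text  # anything unrecognized stays a string


# A country code, a key-like word, a version number, and a plain word.
for raw in ["no", "on", "1.10", "hello"]:
    print(repr(raw), "->", repr(resolve_scalar(raw)),
          "| quoted ->", repr(resolve_scalar(raw, quoted=True)))
```

The trade-off is visible in the loop: the unquoted form is shorter to type, but only the quoted form is guaranteed to come back as the string that was written.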

Recursion/anchors/etc. on the other hand are optional extras that few use & some parsers don't even support. If they were the driving value of yaml they'd be more ubiquitous.

Disclaimer: I hate yaml & wish it didn't exist, but I do understand why it does & I frankly don't have a great suggestion for alternatives that would fill those needs. Toml is also flawed.


Genuinely curious - What major flaws does TOML have? I've used it before and it seems like a simple no-nonsense config language. Plenty of blog articles about the flaws behind YAML, I don't really see complaints about TOML!


INI-like formats are perfectly fine for config files with at most one layer of nesting/sections. TOML is a perfectly fine INI-like parser. Its definitions and support for strings, numbers, comments, sections and simple arrays are great. But its main claim to fame is extending INI to support arbitrary levels of nesting of arrays and dictionaries like JSON, and IMO it does a horrible job at it.

With JSON, YAML, XML and many other formats, the syntax for nesting has a visual appearance that matches the logical nesting. TOML does not. You have to maintain a mental model of the data structure, and slot the flat syntax into that structure.

Furthermore, there are multiple ways to express the same thing like

  [fruit.apple]
  color = "red"

or

  [fruit]
  apple.color = "red"

It isn't always obvious which approach is more appropriate for a task, and mixing them creates a big mess.

And the more nested the format becomes, with arrays of dicts, or dicts of arrays, the harder it is to follow.


> And the more nested the format becomes, with arrays of dicts, or dicts of arrays, the harder it is to follow.

While I have some minor annoyances with TOML, I counterintuitively consider it a strength of the format that nesting quickly becomes untenable, because it produces pressure on the designers of config file schemas to keep nesting to a minimum.

Maybe some projects have a legitimate need for something more complex, but IMO config files are at their best when they're just key-value pairs organized into sections.


As far as I can see, nobody originally constrained the problem to config files. So I guess the problem with TOML is that it's only good for config files, while JSON and YAML are general purpose.


Yes, I think that's a fair characterization. The priorities of config file formats are different than the priorities of human-readable arbitrary data serialization and transmission formats.


TOML is basically a formalization of the old INI format, which only existed in ad-hoc implementations. It's not really a "language", just a config data syntax like JSON. It doesn't have major footguns because it doesn't have a lot of surface area.

The various features it has for nesting and arrays make it convenient to write, but can make it harder to read. There is no canonical serialization of a TOML document, as far as I can tell; you could do it any number of ways.

So while TOML has its use for small config files you edit by hand, it doesn't really make sense for interchange, and it doesn't see much use outside of Rust afaik.


I believe TOML can always be serialized to JSON. And TOML is in the python standard library in newer pythons (`tomllib`, since 3.11). It's also the format of `pyproject.toml` in python.


> I don't really see complaints about TOML!

Sampling bias, there are no complaints about it because no-one uses it (jk).

It's subjective of course but despite the name TOML never really seemed that 'Obvious' to me, in particular the spec for tables. I also think the leniency in the syntax isn't necessarily a good feature and serves to make it less 'Minimal' than its name suggests.


toml is just not human friendly unless you're just using a super simple object with as little nesting as possible. As soon as you increase the nesting you need yaml or json


> The distinguishing draw of yaml is largely the "easiness" of not having explicit opening or - more importantly - closing delimiters.

Along with a coworker, I wrote the package manager for Dart, which uses YAML for its main manifest file (pubspec.yaml). The lack of delimiters is kind of nice but wasn't instrumental in the choice to use YAML.

It's because JSON doesn't have comments.

If there was a JSON+comments that was specified and widely compatible, we would have used that. YAML really is a brittle nightmare, and the lack of delimiters causes problems as often as it solves them. We wrote a YAML parser from scratch and I still get the indentation on lists wrong sometimes.

But YAML lets you actually, you know, comment out a line of text in it temporarily, and that's really fucking handy. I think if Crockford had left comments in JSON, YAML would be dead.


JSONC is JSON with comments (and trailing commas) and it's fairly widely supported, mainly because VS Code ships with support built in and they use it for all their config files. I've seen libraries for a number of languages.

VS code defaults to complaining about trailing commas though (the warnings can be turned off though (it feels like a hack and they didn't properly document it though (it is an officially sanctioned procedure though))).
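
For a sense of how little it takes to get "JSON + comments", here is a minimal Python sketch that strips `//` and `/* */` comments before handing the text to the standard `json` module. It is a toy under assumptions, not an implementation of JSONC: `strip_json_comments` is a hypothetical helper and it does not handle trailing commas.

```python
import json


def strip_json_comments(text):
    """Remove // and /* */ comments that appear outside string literals."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                out.append(text[i + 1])  # keep the escaped character as-is
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            while i < n and text[i] != "\n":
                i += 1  # skip to end of line, keep the newline itself
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


sample = """
{
  // temporarily disabled: "cache": true,
  "url": "http://example.com", /* the // inside the string must survive */
  "debug": true
}
"""

config = json.loads(strip_json_comments(sample))
print(config)
```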


> It's because JSON doesn't have comments.

This is a big plus but JSON5 has pretty widespread language library support - probably equal to that of YAML tbh (e.g. Swift has native JSON5 support, I don't know that anyone natively supports YAML). Any reason not to opt for it here?


I believe JSON5 didn't exist when we first wrote pub. If it did, it certainly wasn't widely known.

Obviously, migrating to it now when there are thousands and thousands of packages and dozens of tools all reading pubspecs would be much more trouble than it's worth.


Understandable. I just checked & JSON5 was just 1 year later but even then it would've taken a lot longer to gain sufficient traction to be well supported.


Most protocols defined in RFCs require the use of regular JSON. You don’t have a choice.


Not sure what context you're referring to but we're discussing configuration file formats, not data transports, so I doubt that would be a frequent issue.


I see where you are coming from but YAML anchors are definitely a great and powerful feature that deserves more attention. The other day I was refactoring a broken [1] k8s deployment based on a 3rd-party Helm chart and since I didn't have the time to migrate to a better chart, YAML anchors permitted me to easily reduce YAML duplication, with everything else (Helm, Kustomize, Flux, Kubernetes) completely unaware of anything. Just a standard YAML pattern.

[1] the broken part was due to an ex-coworker that cheated his way out of GitOps and left basically "fake code" committed, and modified by hand (with Lens) the deployment to make it work


Is - not effectively an opening delimiter?

If we want to avoid quoting in particular, then we could use - for strings and anything else for non-strings. But the heuristics suck.


> Almost all of this is solved by basically putting quotes around strings.

Yeah, that was my first thought as well. I personally don't mind YAML, but I've also made a habit out of quoting strings. And, I mean, you're quoting both keys and strings in JSON, so you're still saving approx. 2 double quotes per key/value pair in YAML if that's a metric that's important to you.


As the article points out with the `on` example, you really have to quote yaml keys as well, if you want the defense to work...


The argument was that most of the mentioned problems could be solved by quoting the values. I don't have a problem with avoiding "on" as a key, and I apparently haven't used it ever, because I've never run into this particular problem in my 15+ years using YAML.

So, sure, if you want to play it super safe, quote keys as well. But I'm personally fine with the trade-off in not quoting keys.


If you compare to JSON5 instead of JSON, you still get the benefit of unquoted keys, but you also get a guarantee the keys are strings, and it's harder to forget to quote a value.


from the article:

>Many of the problems with yaml are caused by unquoted things that look like strings but behave differently. This is easy to avoid: always quote all strings.


As a total noob who had to work with yaml to write pipelines for ADO over my summer internship, I didn't seem to encounter any of these oddities, nearly everything I worked with was wrapped in quotations.


Yeah and this is enforced by default in yamllint.

It's very fair to cry "why the hell do I need a linter for my trivial config file format", and these footguns are a valid reason to avoid YAML.

But overall YAML's sketchiness is a pretty easy problem to solve and if you have a good reason to keep/choose YAML, and a context where adding a linter is viable, it's not really a big deal IMO.

And as hinted in the post, there's really no well-established universal alternative. TOML is a good default but it's only usable for pretty straightforward stuff. I'm personally a fan of the "just use Nix" approach but you can't put a Nix interpreter everywhere. And Cue is way overpowered for most use cases.

I guess the tldr is that the takeaway isn't "don't use YAML" but just "beware of YAML footguns, know the alternatives".


JSON doesn’t do them as part of the spec, but there’s nothing stopping you from doing them as post-processing. Eg OpenAPI does it by using a special $ref key where the post processor swaps in the value referenced there.

That’s effectively what jsonnet/cue/hcl do, though as a preprocessor instead of a postprocessor.
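
A minimal Python sketch of that kind of post-processing, assuming a very reduced `$ref` convention: local references of the form `#/a/b` into the same document, dict keys only, no pointer escaping, no cycle detection. It is loosely modeled on the idea, not on OpenAPI's actual resolution rules, and `resolve_refs` plus the sample document are made up for illustration.

```python
import json


def resolve_refs(node, root):
    """Return a copy of node with every {"$ref": "#/..."} replaced by its target."""
    if isinstance(node, dict):
        if set(node) == {"$ref"}:
            target = root
            # Walk the path segment by segment (dict keys only in this sketch).
            for part in node["$ref"].removeprefix("#/").split("/"):
                target = target[part]
            return resolve_refs(target, root)
        return {key: resolve_refs(value, root) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_refs(item, root) for item in node]
    return node


document = json.loads("""
{
  "defaults": {"retries": 3, "timeout": 30},
  "services": {
    "api": {"$ref": "#/defaults"},
    "worker": {"$ref": "#/defaults"}
  }
}
""")

resolved = resolve_refs(document, document)
print(json.dumps(resolved["services"], indent=2))
```

This covers roughly the deduplication use of YAML anchors/aliases, with the difference that the file stays plain JSON and the meaning of `$ref` lives in the consumer rather than in the format.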


It's very counter-intuitive to me that 22:22 would need to be a quoted string, since functionally it's a K-V-pair. YAML itself even uses : in the Dict syntax!


The fact that it is effectively the dict syntax is precisely what makes it intuitive to me that it should be quoted if it’s going to be a value. I admit the sexagesimal parsing is not the result I expected but I would have certainly expected something odd to happen given that the value includes a “:” character.


It's a key-value pair only to whatever thing reads the YAML and then assigns some meaning to that string. In YAML you need to put a space between the colon and the value.
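
To spell out the arithmetic behind the 22:22 surprise, a small Python sketch. The pattern is roughly the base-60 integer form from the YAML 1.1 type descriptions (with underscores left out), and the mapping check is a deliberate simplification; both function names are hypothetical.

```python
import re

# Roughly the YAML 1.1 base-60 integer pattern, underscores omitted.
SEXAGESIMAL_RE = re.compile(r"[-+]?[1-9][0-9]*(:[0-5]?[0-9])+")


def parse_sexagesimal(text):
    """Return the base-60 value of text, or None if it does not match."""
    if not SEXAGESIMAL_RE.fullmatch(text):
        return None
    sign = -1 if text.startswith("-") else 1
    value = 0
    for part in text.lstrip("+-").split(":"):
        value = value * 60 + int(part)
    return sign * value


def looks_like_mapping_entry(line):
    # Simplified: a colon only separates key and value when it is
    # followed by whitespace or the end of the line.
    return re.search(r":(\s|$)", line) is not None


print(parse_sexagesimal("22:22"))          # 22 * 60 + 22
print(looks_like_mapping_entry("22:22"))   # no space after the colon
print(looks_like_mapping_entry("22: 22"))  # space after the colon
```

So an unquoted `22:22` is one scalar, not a key and a value, and under these rules that scalar reads as the integer 1342 unless it is quoted.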


Jsonnet is pretty nice but the library support isn't quite as good. There are some nice libraries for yaml that do round trip processing for example so you can modify a yaml programmatically and keep comments. Yaml certainly has some warts (and a few things that are just frankly moronic) but it deserves some credit for hitting the sweet spot in a bunch of ways.



