Yeah, both examples plus him not backing up his performance assertion showed he ...

dmpk2k · on Jan 23, 2016

I think Bryan Cantrill is referring to both STM and M:N in the context of high performance applications (aka, the norm for him, since he's a kernel engineer). In that sense, both of them do have problems.

Circa 2000, Solaris, Linux, and some (all?) BSDs had M:N threading, and all of them threw it away because there are numerous performance artifacts caused by two schedulers (OS and the M:N) conflicting with each other. Mr. Cantrill was there when this all happened, and even wrote a paper on it: ftp://ftp.cs.brown.edu/pub/techreports/96/cs96-19.pdf

This doesn't affect Erlang as much simply because Erlang is really slow. I think of it as a waterline -- the lower the overhead of the language, the more likely you will run into the serious problems that M:N typically inflicts. E.g. priority inversion, and long and unpredictable latencies when the OS swaps out a thread that M:N wanted running.

As for Haskell and STM, all I know on the topic is that some pretty damn smart engineers at IBM spent a good while trying to make STM work well, and gave up. See this ACM article: https://queue.acm.org/detail.cfm?id=1454466. Perhaps Haskell managed to pull it off through the language restrictions it allows, but I’d like to see some compelling articles on that by engineers who know a thing or two; just because a language has STM doesn’t mean it does it well.

Honestly, I think your comments about M:N and STM are more revealing of your ignorance rather that Cantrill's. While you might disagree with him, there’s sufficient literature to give his stance some legitimacy. Sweepingly calling his stance "full of crap" says a thing or two.

im_down_w_otp · on Jan 23, 2016

Erlang's M:N scheduling isn't what makes it slow. And not all of Erlang is slow either for that matter.

The reason it works for Erlang is becuase M:N is the only model. There's no alternate supported model trying to maintain some competing set of semantics to muck up the works.

You can induce problems by running code that doesn't fit inside the model such as NIF code. But as of R18 you can consume a separately maintained threadpool for that stuff.

When the paradigms collide there are sometimes bizarre non-deterministic interactions, but there's nothing wrong with M:N scheduling, craploads of core infrastructure systems are running on M:N scheduling kernels/runtimes.

If you've used money in any way whatsoever ever at any point in your life you've critically depended on this "fad" and will continue to do so for a long, long, long time.

joveian · on Jan 23, 2016

At least in NetBSD, M:N was abandoned not for performance reasons but because it was too complex and could not be fully debugged. I don't know the details, but I could at least easily imagine multiple models being the main issue.

nickpsecurity · on Jan 23, 2016

It's something that, if it could be optimal, would only be optimal in specific situations. A lot of tactics CompSci discovers are like that. So, we experiment with them, stash them, whatever. Mainstream and crowd effects in academia both have a tendency to occasionally jump on something like a crusade. Many promises are made, much money thrown at it, and it never delivers. Inability to integrate with models of legacy systems, either at all or with desired attributes, is often one result of this. I knew the moment I saw the activity that it couldn't be trusted without further review. It was at least useful in security kernel model but most of those are gone now. So, we stash it in case it benefits another problem one day or someone has a use for it likely in a clean-slate system.

That's how these things go. I wouldn't expect it to work on NetBSD or any UNIX. Even the good ideas often fail to integrate into UNIX. Half-ass ones like M:N more so. UNIX model is just too broken. Hence, my opposing it here.

4ad · on Jan 23, 2016

> In that sense, both of them do have problems.

POSIX thread scheduling implemented as M:N has problems. General M:N or 1:N schedulers don't necessarily have to have problems, as it is demonstrated in practice by virtualization. Virtualized kernels implement their own scheduler on top of the hypervisor (which also has a scheduler), and it all works great. Performance overhead of virtualization is usually in other places, not in scheduler interactions. But even when it is, people do use VMs, so it works well enough, certainly not "pathologically bad"

dmpk2k · on Jan 23, 2016

I'm not sure about virtualised kernels on hypervisors working well. Under the right conditions they can probably be made to, but I've seen cases where apps that could scale reasonably well to 32 cores on a bare-metal OS having difficulty go beyond 4 cores when inside hardware virtualisation; the hypervisor was migrating threads between cores, which the guest OS was unaware of.

My own experience with KVM has been that performance is problematic, thus I don't really find it compelling evidence that M:N schedulers can work well. Maybe other hypervisors can do it better, but I don't see how they could get around fundamental realities: there usually are more hypervisor threads than real cores available to service them, yet guest OS's aren't designed to gracefully handle virtual cores unexpectedly freezing for periods of time.

im_down_w_otp · on Jan 24, 2016

Scheduler binding topologies go great lengths to mitigate this problem.

Don't over-subscribe the cores, disable hyperthreading, and bind scheduler threads to physical cores if your platform supports it.

dmpk2k · on Jan 24, 2016

Unfortunately, that raises the question: what is over-subscription?

Unless you're prepared to have many potentially-idle cores around, due to 1-1 bindings with an unknown, erratic, or uncontrollable workload, we're suddenly in a pretty tricky realm. You can get around this, but only in very specific and known use-cases you have control over; and I suspect the majority of use-cases fall well outside this.

If you want to more efficiently take advantage of free resources with relatively unpredictable or uncontrollable workloads (e.g. most cloud providers), binding threads to cores is a problem.

nickpsecurity · on Jan 23, 2016

"I think Bryan Cantrill is referring to both STM and M:N in the context of high performance applications (aka, the norm for him, since he's a kernel engineer). In that sense, both of them do have problems."

"Honestly, I think your comments about M:N and STM are more revealing of your ignorance rather that Cantrill's."

Probably was. Idk who he is or what he does. I'm aware there were fads to claim the two could solve every problem under the Sun in their category. I'm one of those few that ignore fads to focus on any lessons we can learn from certain tech or research. He made a blanket assertion against at least three technologies in one article with no context. Anyone reading might think the techs don't work well in any context. So, I briefly called that.

As evidence, I gave an example of STM that worked. In OS's, it had mixed results unless hardware accelerated and one typically had to protect I/O stuff with locks due to irreversibility. Far as M:N, only found OS use for it in security kernels (eg GEMSOS) to help supress covert, timing channels. Didn't expect him to know that use-case, though, since the cloud products are usually full of covert channels. And then there's MirageOS on unikernel end using type- and memory-safety for robust, network applications already showing fewer 0-days than alternatives did.

So, all three already have real-world, beneficial use. Quite the opposite of his claim that leads one to believe that never happens.

"Sweepingly calling his stance "full of crap" says a thing or two."

In other comments, I did present data and specific examples to reject his claims about UNIX being a usable foundation for this sort of thing among others. His approach to such things was to ignore all counter evidence and/or leave the discussion. Eventually, a new article shows him making an adverti... err, a bunch of claims, some backed up and some assertions, about all kinds of tech in an anti-unikernel rant.

Strange that you're fine with him ignoring his critic's counterpoints... with references... to his major claims but have trouble with how I handle minor, tangential ones. Says a thing or two indeed.

dmpk2k · on Jan 23, 2016

I comment on the things I know about. Strange indeed.

nickpsecurity · on Jan 23, 2016

Did you know that bulletproof, useful STM implementations existed? And that covert channel suppression was a pre-requisite for secure, cloud applications? And that M:N had been used to help handle that? It seemed to me you didn't know these things.

Instead, you agreed with the blanket statement that those two techs were totally useless instead of a more qualified one that they overpromised and couldn't deliver for "legacy" systems whose models didn't fit. I also bet money that Cantrill's product is still vulnerable to covert, timing channels. I'll win by default because they show up every time a resource is shared, that's his business model, and countering them usually requires ditching UNIX or x86. Want to place that bet? ;)

dmpk2k · on Jan 23, 2016

Citations please. This is the second time I'm asking.

If there are bulletproof STM implementations, there should be some papers which back that up (preferably not academic toys). What throughtput and latency profiles did the STM implementations get, how did that scale with cores, what work profiles do they work well with, and how did all that compare with regular threads? How were livelock and side-effects handled? Was this put in production, and what shortcomings were there (there are always shortcomings)? Etc, etc, etc.

Also, reread my initial post more carefully. I qualified it at the very beginning with "in the context of high performance applications”, and I qualified it again near the end; I didn’t simply agree with any sweeping claims. Yet here I am, getting lectured by someone who says that "performance assertion showed he was full of crap"... which I have demonstrated isn't true.

Back your claims up; maybe I'll learn something interesting.

nickpsecurity · on Jan 24, 2016

"Citations please. This is the second time I'm asking."

I was giving you no references, only stuff you could Google, because you opened with an insult. I don't do homework for people that do that. That said, I think I figured out where this really started:

"by someone who says that "performance assertion showed he was full of crap"... which I have demonstrated isn't true"

This assertion wasn't about M:N or STM. Those came way after the performance assertion. In his third paragraph, he tried to dismiss unikernels with a blanket assertion about [basically] how their performance can't be good enough. Sums it up as saving context switches isn't enough, as if that's supporters' key metric. Other commenters wanted actual data on different types of unikernels and alternatives supporting his assertion. Instead of providing that, he says " let’s just say that the performance arguments to be made in favor of unikernels have some well-grounded counter-arguments and move on."

Seriously? He just opened a counter to unikernels by saying their bad performance makes them worthless compared to alternatives like his, gives context-switches as a counterpoint, and then asks the reader to just take his word otherwise that there's a good counter to anything they'd ask? Unreal... How could I not look at it as either arrogant or propaganda piece after that?

That was the performance assertion some others and I we're talking about. That's what I meant when I said "plus" the performance assertion. I can see how it could be read like it was connected to other two. The M:N and STM claim I made came from this sweeping statement he did for show far as I can tell:

"the impracticality of their approach for production systems. As such, they will join transactional memory and the M-to-N scheduling model as"

That was outside his section on performance toward the end after he talked about all kinds issues. So, he just says they were impractical for production systems. Despite being used in production systems beneficially. Not for ultra-high performance, like you said, but delivered on claims in some deployments and not purely a fad. So, he doesn't substantiate his prior claims on HN when asked (with references supporting counters), his opening claim there, and doesn't qualify those two. So, I called him out on all of it with the assumption this is an ego or ad piece more than any collection of solid, technical arguments. Only debugging claim really panned out and only partially.

I think we just got to talking cross-purposes from then on where the discussion conflated what the performance assertion was with my gripe about his overstatemetns on other two technologies. So, I'm dropping that tangent from here on now that I see what happened. Pointless as I don't disagree with your performance assertion about two technologies so much as his unsubstantiated assertion about unikernels and overstatements about two technologies. So, I'm done there.

Far as my original comment, we're still waiting on him to back up that performance assertion he made that unikernels provide no performance advantage despite reports to contrary from users. I'm also waiting for his security claims on OS vs hypervisors as I've seen more robust, even formally-verified, hypervisors than OS's in general with ZERO verified or secure UNIX's. Merely attempts. I've got citations going back to the 70's supporting a minimal-TCB, kernelized model as superior to unrestrained, monolithic kernels in security. Love to see what his counterpoint is on that. Well, counterpoints given the volume and quality of evidence on my side. :)

chris_wot · on Jan 23, 2016

I don't often see you commenting.