CSP is an interesting take on concurrency, but it lends a degree of opacity to the scheduling of tasks. So far (working in Go) it has been a non-issue in small programs or well-understood patterns, but I wonder what would happen at large scale. It is also worth noting that in the Go 1 runtime, switching from 1 "processor" to N can alter the behavior of your program: the same precise [1] binary may or may not hit CSP/Go's version of "deadlock" (i.e. "all goroutines are asleep"), depending on a single bit of difference in the binary. The current runtime also gives rise to some interesting (and very unintuitive) performance differences when hacking with the CSP mechanism, e.g. peppering Go sources with unnecessary sleeps and the like to hack the scheduler.
[1]: Well, nearly the same. Somewhere in the binary there is a call to set maxprocs (runtime.GOMAXPROCS). The programs are otherwise identical and differ only in the argument to this call (1 vs 2).
We (Tinkercad) have actually ended up running one process per core and limiting the number of concurrent OS threads. This gives significantly faster task-switching times and is more performant from a GC perspective. Given that our system is highly distributed anyway, shared memory would mostly be an optimization rather than a primary facility.
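The comment doesn't say how the processes are launched, so the following is only a sketch under assumptions: a Go supervisor re-executes its own binary `runtime.NumCPU()` times, and each child gets `GOMAXPROCS=1`, which the Go runtime reads at startup to cap the number of OS threads executing Go code at once (threads blocked in syscalls are not counted). The `WORKER_ID` variable, `childEnv`, and `runWorker` are hypothetical names for illustration, not Tinkercad's code.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"runtime"
	"strconv"
)

// workerEnvVar is a hypothetical variable telling a child which slot it owns.
const workerEnvVar = "WORKER_ID"

// childEnv copies the parent's environment and appends GOMAXPROCS=1 plus the
// worker id. os/exec uses the last value when a key appears more than once.
func childEnv(base []string, id int) []string {
	env := append([]string{}, base...)
	return append(env, "GOMAXPROCS=1", workerEnvVar+"="+strconv.Itoa(id))
}

// runWorker is a placeholder for the real per-core work.
func runWorker(id string) {
	fmt.Printf("worker %s: GOMAXPROCS=%d\n", id, runtime.GOMAXPROCS(0))
}

func main() {
	if id := os.Getenv(workerEnvVar); id != "" {
		runWorker(id) // child: do the work and exit
		return
	}
	exe, err := os.Executable()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Parent: start one single-threaded child per core, then wait for all.
	var children []*exec.Cmd
	for i := 0; i < runtime.NumCPU(); i++ {
		cmd := exec.Command(exe)
		cmd.Env = childEnv(os.Environ(), i)
		cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
		if err := cmd.Start(); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		children = append(children, cmd)
	}
	for _, cmd := range children {
		_ = cmd.Wait()
	}
}
```

Since the processes share no memory, any coordination between them would go over whatever messaging the distributed system already uses.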