Intel plans immersion lab to chill its power-hungry chips (theregister.com)
72 points by rntn on May 22, 2022 | hide | past | favorite | 78 comments


Can't say I'm terribly excited about another massive-scale usage of forever chemicals, aka fluorocarbons. Didn't Intel get the memo? We're trying to reduce usage of these (see e.g. the EU F-gas regulations), not increase it.

There's a coolant that's widely used, non-toxic, environmentally benign, cheap, abundant and non-flammable. Yeah, water. Granted, it's not dielectric, so it needs some engineering. But humanity has a decent track record of building systems with pipes, hoses, heat exchangers and so forth. The same can't be said for cleaning up Superfund sites.


The article did not say they would be using fluorocarbons. They probably have not decided what to use, so let's not judge them prematurely. Furthermore, not all fluorocarbons are toxic or environmentally dangerous.

But most importantly, they tend to be pricey. If this stuff is going to be successful in the datacenter industry, a lot of it will be required, so Intel will likely go with the cheaper option -- mineral oil.


> Same can't be said for cleaning up Superfund sites.

It's a pretty huge leap to go from "closed loop CFC cooling system for a computer" to "superfund sites".

What am I missing? If we're building a system with pipes and heat exchangers, why can't the coolant be a low-impact CFC rather than water? It's not a system where you just vent those cooling liquids in the atmosphere.


Some of these fluorinated compounds are almost eternal. They are so stable that no natural light frequency from the sun can break them apart.

Sulfur hexafluoride, used in high-voltage circuit breakers, has an atmospheric lifetime of roughly 3,200 years and a global warming potential 22,800 times that of CO2.

So you don't want to vent them, but any accident/leak can be considered a catastrophe.

That's just the physics of it: highly dielectric + stable often makes for a big greenhouse gas offender.


Yup, at least Superfund is contained.

This is much worse, and the scale of the consumer computer market is huge.


Could you use the heat? Like to pasteurise milk or something?


Whatever you’re heating up has to be a little cooler than the coolest chips in your system, because heat only transfers that way.

Semiconductor lifetime goes down as temperature goes up, with the failure rate roughly doubling for every 10 C increase.

So if you want to heat something, it can only be to a low temperature. Heating buildings can work, but pasteurization requires temperatures of at least ~72 C, which would reduce chip lifetime too much to be economical.
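As a rough illustration of that rule of thumb (my own back-of-envelope, not from the article), the relative failure-rate increase between two operating temperatures works out like this:

```python
# Rule of thumb from above: failure rate roughly doubles per 10 C rise.
def failure_acceleration(t_base_c: float, t_new_c: float) -> float:
    """Relative failure-rate multiplier, assuming 2x per 10 C."""
    return 2 ** ((t_new_c - t_base_c) / 10)

# Running coolant hot enough for ~72 C pasteurization vs. a typical
# ~40 C liquid-cooled baseline:
print(failure_acceleration(40, 72))  # ~9.2x the failure rate
```

Which is why district heating (30-60 C return water) pencils out while pasteurization doesn't.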


From a quick internet skim, fluorocarbons are uncommon; they are used for two-phase cooling (boiling and condensing). Single-phase immersion cooling is far more common, e.g. Electrosafe (used in Intel's 2012 tests [1]) is a hydrocarbon fluid.

Quote from googling: “Single phase immersion cooling fluids can come under several categories which include: hydrofluoroethers, hydrocarbons, silicon oils and water/glycol. Single phase immersion cooling has benefits over 2 phase immersion cooling, in that they tend to be less expensive both due to the liquid itself and the system used to contain them. The ease of implementation was highlighted by Varma [132] who compared Novec 7000, a 2-phase fluorocarbon-based fluid, to GRC Electrosafe, a single phase hydrocarbon based fluid”

[1] https://www.datacenterknowledge.com/archives/2012/09/04/inte...


Fresh (potable) water is going to be a precious resource too, and salt water is probably out of the question for its corrosiveness. So I'm not sure replacing fluorocarbons with water will be any better. Aren't there other liquids we can explore?


How much water do you really need to cool a computer? Isn't that negligible?


_maybe_ a liter a year, for consumer water loops. Yeah, there's about a thousand things I'd worry about before the water usage of cooling computers.


Hey, plastic straws are also negligible and pearl-clutchers made them an issue too.


Salt water is already routinely used to cool something: engines in boats. The key component is the heat exchanger that uses the salt water to cool the fresh water circulating through the engine itself. I don't see why you couldn't do something similar for a computer.


I run a number of large datacenters. The reason most of the world's datacenters just have air cooling is that it is simple, easy to deploy and easy to maintain. As soon as you complicate cooling systems at large scale, costs go up and so do failures. These are neat experiments, but you won't see them deployed in any meaningful way.


I'm an outsider to data center operations, but I work in machine learning. My suspicion is that this will be driven by either the next-gen or next-to-next-gen GPUs/AI accelerators. It's the curse of the field that no matter how fast the chips get, and how much memory bandwidth they get, it's never fast enough for state-of-the-art research. So it's probably naturally going to require some form of liquid cooling as the chips continue to get bigger and faster. And companies that build their business around machine learning will need to adopt these deployment patterns to stay on the cutting edge.


TPUv3 was supposed to use liquid cooling. Not sure if it actually is.



Yep, that's why the new "small" IBM mainframes are such a hit (no external water cooling anymore, just air, and the standard rack size).


Shouldn’t they instead figure out how to run cooler and with less power in general? That’s where it seems everyone else is going…


Intel (and everyone else) does work on improving compute efficiency.

But as the article points out, if 40% of your DC's power consumption is in cooling, then you'd be foolish not to target that slice.

Liquid and immersion cooling allows higher power density, which all things being equal (I know there's a lot of heavy lifting being done by this...) will be preferred. Why distribute your components over a rack if you could fit it into a single 4U board? Why distribute your components over an aisle if you could fit it into a rack?
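To put numbers on the article's 40% figure (my arithmetic, treating cooling and IT load as the whole facility draw), the implied PUE looks like this:

```python
# PUE = total facility power / IT power. If cooling eats 40% of total
# power, at most 60% is left for the IT gear, so:
it_fraction = 1.0 - 0.40
pue = 1.0 / it_fraction
print(round(pue, 2))  # 1.67

# Halving cooling's share to 20% of total power brings PUE down to:
print(round(1.0 / (1.0 - 0.20), 2))  # 1.25
```

So shaving the cooling slice moves the needle on the whole facility, not just the cooling plant.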


The other advantage of power density is that it creates stronger convection currents. Whereas data centers have traditionally been actively cooled, it’s not unreasonable to imagine open air dcs with air channels to support convection.


it's sort of been done, though there are still a lot of active fans to move the hot air.

https://www.google.com/search?channel=fs&client=ubuntu&q=chi...


I need… more powahhh…

No but seriously why don’t they build a 128-core atom server. That’s really all anybody wants. I don’t need the fastest most immersed cpu ever, just a bunch of decent ones at 30W or less.


128 cores at 30 watts isn't something I've seen anyone planning. What's more likely is 128 cores at 300-400+ watts, and scaling from there is most likely to increase power usage and core counts. Bergamo (AMD), Graviton (AWS), Sierra Forest (Intel), Grace (NVIDIA) are all going for that.

30 watts is low power mobile and "edge" compute.


You’re definitely right. I meant 30W per cpu or less which usually means maybe 4-8 cpus per host or ~300W. I did mean to compare it to Graviton and will definitely look at the other contenders in this space.


Oh OK. Recent server CPUs (Ice Lake) are roughly 8-12W (maybe up to 15W, there are a lot of SKUs) as a baseline. AMD's latest Milan generation brings it down to 3.5W/core, with some closer to 5W; Intel's Sapphire Rapids only goes down to ~ 6W.
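For rough intuition (my own arithmetic, ignoring uncore, I/O and memory-controller power), per-core power times core count gives a floor on package power:

```python
# Back-of-envelope floor on package power from per-core figures.
def package_floor_w(cores: int, watts_per_core: float) -> float:
    return cores * watts_per_core

# 128 cores at the ~3.5 W/core cited above lands right in the
# 300-400+ W range mentioned earlier in the thread:
print(package_floor_w(128, 3.5))  # 448.0
```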

After M1 and Graviton, there's a big rush for making denser server CPUs, pretty much bifurcating the market. Much more powerful than old Atom cores, but more power-efficient than regular ones, and targeted mostly at hyperscalers. AMD should have a dense variant of Zen 4 out next year, going up to 128 cores and reducing per-core power significantly. Intel will likely have a competitor out in 2024, and AMD might hit 256 cores then, although rumors of things that far out aren't very specific. On the ARM side of things, NVIDIA is launching CPUs like this next year. Qualcomm bought Nuvia, which was working on similar server CPUs, but probably switched focus to PCs. There are some other oddballs like Tenstorrent as well.

One interesting part with pushing per-core power so low is that communication between cores starts to be a serious part of total power consumption.


That was Larrabee/Xeon Phi, was it not? Discontinued for lack of sales.


That’s basically what I thought. However, IIRC they were marketed more towards high-performance computing (with AVX-512).

They were an uncomfortable middle ground though, between normal CPUs and GPUs. My benchmarks showed that there wasn’t much of an advantage over 20-ish normal xeon cores (for my HPC workloads).

(Memory is a little fuzzy - that was 4-6 years ago).


While not exactly what you are looking for, Intel Snow Ridge is a continuation of their Atom-based (next to their line of Core-based) networking processors. 8-24 cores.

https://www.intel.com/content/www/us/en/products/details/pro...

Though, unless you 100% need x86, there is the Ampere Altra 128-core Neoverse N1 chip.


That's almost exactly what the Xeon Phi was, both in coprocessor form and in machine form. It included MCDRAM, too, which was a high-bandwidth memory. Very efficient, but lacked sales; people didn't want it.

Part of that, I think, was lack of parallelism in applications: in order to fully take advantage of those cores, you need to have a nearly-embarrassingly-parallel problem. Otherwise, you're not going to get the performance that you'd expect (but you'll get power efficiency!).

The fact of the matter is, that's not "all anybody wants."


Also, if your problem is embarrassingly parallel, you can probably run it on a GPU and it will be faster.


Atoms generally use 3-5 watts per core. So the biggest CPUs they're making are on par with atoms.

They keep designing chips that use less and less per core. Why not cram as many cores as possible into a single server? Lower density means more wasted material, and once you go below 2GHz you're not saving very much power any more.


> Why not cram as many cores as possible into a single server?

Manufacturing yields


That's why you limit die size to a certain extent. But that's not a limiting factor on cores per server. You can cram many more 150mm² dies into your server's sockets than you can actually feed with power and cooling and RAM.


A major issue is cache coherency. If you don't have coherency, you effectively have an architecture that is just multiple distinct servers.


You can run coherency protocols across dies. How many cores you decide to make coherent is largely unrelated to how your die yields work out, as far as I know.

Not that it matters either way for this discussion? Whether it's "one server" in that 2U box, or "twenty servers" in that box, you're still shoving in lots of cores and lots of watts to get that density up.


That's AMD's plan with EPYC Zen 4 "Bergamo", which goes up to 128 Zen 4c cores.


That’d be a pretty unbalanced architecture - it may use less rack space than 128 servers, but with only one server's worth of IO, network, and PSUs, it'd be less reliable and maybe not even faster.


A single AWS graviton server is roughly 128 cores with, I believe, 2x or 4x 40gbit networking. It’s not crazy really.


One core is probably not pushing more than 100mbit on average. Scaling that much networking up a hundredfold is child's play.
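Sanity-checking the parent's estimate against the Graviton comment upthread (averages only; bursty traffic is another story):

```python
# 128 cores each averaging ~100 Mbit/s of traffic vs. the NIC
# capacity mentioned above (two 40 Gbit links).
cores = 128
per_core_mbit = 100
demand_gbit = cores * per_core_mbit / 1000
print(demand_gbit)  # 12.8

nic_gbit = 2 * 40
print(demand_gbit <= nic_gbit)  # True, with ~6x headroom
```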


There is only so much computation you can do per watt on current process nodes. To increase our computation per chip, which is the goal, we need to increase the amount of watts we consume per chip. The goal should be to make more powerful processors, not ones that do the same with less power.


> we need to increase the amount of watts we consume per chip.

Not sure we need that, except in niches. At scale you often want at least some efficiency, which is certainly not max TDP per core (because the best efficiency point is at lower frequencies and higher width, not the max frequency you can achieve). So that leaves the question of large numbers of cores, but at some point the silicon area also gets stupidly large. And you can use multiple packages, without sacrificing overall system density too much, and without departing from simpler designs with probably lower TCO and pollution.

For small systems it depends, but you actually often have an even more limited thermal budget, except again in niches where you are ready to tolerate the drawbacks (with stupid power requirements it even becomes hard to run just a few machines on a basic electrical circuit in standard homes or offices; high noise under load; and obviously a high TDP heats the room up a lot). But you have fewer space constraints, so if you really want absurd systems you already can have them.

So do we really need to e.g. double or triple the (electrical/thermal) power density at scale? Do we need 2 kW chips? Do we need to sacrifice efficiency now, and increase nominal consumption now, instead of waiting just a few years for node improvements? (And I could even ask: do we really need that much increase in processing power? Shouldn't we start to optimize for the total ecological cost instead? I've not tried to do any prospective analysis in that area, but maybe this would mean slowing down processing power growth...)


>So do we really need to e.g. double or triple the (electrical/thermal) power density at scale?

Will we ever be able to double or triple our (general purpose) compute per cpu any other way? Moore's law is essentially over. Node improvements aren't really happening outside of TSMC, which doesn't have enough manufacturing ability to supply everyone, and even then those node improvements are getting more and more incremental.

And regarding power consumption, I think we really need to be consuming more energy across most sectors of human activity. The most likely explanation for the "great stagnation" is that our energy consumption has basically flatlined since the 70s. It appears that on a civilizational scale, reaching greater levels of development and expression simply requires more Joules. If you disagree, I highly recommend the book Where is my Flying Car?


Since the 70s, lights have gotten about 6x more energy efficient, boats and planes are about 2x, and computers are incomparably more efficient. Also, it's pretty clear that before we significantly increase energy consumption, we need to be able to do so without wrecking the environment. Massive flooding in major cities and huge refugee crises from desertification are not conducive to progress.


I agree with you, and:

> computers are incomparably more efficient

Yet to be clear, total world energy consumption for computers has increased, which is sad because we could certainly cope with half the current total compute capacity but far better efficiency. Trying to get very high-TDP chips and/or density is likely going in the other direction (but I could be wrong for the datacenter).


Carbon capture is extremely energy intensive and will never work without significantly increasing our energy production. The only way out of the climate crisis is to massively increase, not decrease, the amount of energy available to us


Given that solar and wind are by far the cheapest type of carbon neutral power to deploy and they are already being deployed really quickly, how do you propose doing that?


Ideally, fossil fuels should be used in the near term to bootstrap fission and fusion. Needless to say, this would only be possible with complete destruction of the NRC and other entrenched anti-nuclear forces, so it won't happen of course. Note that fission alone is sufficient to supply our current energy needs for any foreseeable amount of time.[0] However, getting back on the Henry Adams curve will eventually require moving beyond fission and creating a Dyson swarm. This Dyson swarm can then be used to bootstrap the Caplan engine[1] which slowly disassembles the sun (interestingly, this actually prolongs the sun's lifespan) and uses its matter for a more efficient form of fusion, compared to the wasteful main sequence nuclear process.

This future probably won't happen, but it should.

[0] https://whatisnuclear.com/blog/2020-10-28-nuclear-energy-is-...

[1] https://www.sciencedirect.com/science/article/abs/pii/S00945...


So your short term plan is to use fossil fuel to run carbon capture? That's the stupidest thing I've heard in a while.


No, read my comment again.


Absolutely. This is the wrong direction. Lots of power to run the chips, then lots more power to cool them. The ARM server market is eventually going to eat them alive if they can't rearchitect and seriously up their performance/watt game. I hope this is just an intermediate term solution.


What ARM server market? There is at best one ARM server chip per hyperscaler, and most have zero. Conventional ARM server efforts failed because the ARM server vendors didn't want to talk to mortals, only a very limited set of customers, most of whom intend to design their own chips and obsolete them. The number of companies that failed to make their ARM servers available to the public is astonishing.

Linus Torvalds said that ARM needs to be widespread on the desktop, because you need a critical mass of developers targeting ARM. ARM server vendors don't want that critical mass, they want special deals with a handful of big companies which is obviously doomed to fail.


This is why I think silicon carbide based chips are going to be a huge deal: you can run them hot enough that you can actually run a heat engine off of the chip's waste heat to recuperate some of the electricity you spent on computation. Now if only I could figure out how to invest in companies developing this technology...


Wasn't the original research push for CVD diamonds for CPUs so you could run them 10x hotter?


Doesn't silicon carbide have unusual properties for making digital circuits? SiC diodes have forward voltages higher than regular Si, for example. And additionally, I'm not sure you can get the same performance from SiC at super-high temperatures, as even the metallization needs to be specialized to handle the heat without too much resistive voltage drop, etc.


I remember reading with surprise that the Motorola 68040 CPU, which was competing with the Intel 486, could have run hotter and at a faster clock speed -- but Motorola didn't want to specify the use of a heat sink. Seems like quite a change!


Uhh, I think you made a typo that made your post quite phallic.


What is the maximum performance % difference between optimizing for perf/$ and perf/watt? Sure, there are wafer-scale chips now, but the TDP for a phone is still ~5 W, average laptops have gone from ~15 to ~30 W, and desktops from ~300 to 600 W+. I suppose with Zen 4, there might actually be an apples-to-apples comparison barring ISA and uncore differences. If ADL is anything to go by, I imagine performance will be within ~15% of each other, but with a ~30% price difference if you care about a more efficient and cooler-running chip. Sure the efficiency gains add up, but so do the performance gains on the other side.


It can be a lot. Speculation requires executing operations before you're sure they'll be needed, which can double or triple power draw in order to increase instruction-level parallelism. And all the machinery needed to enable speculation, like branch predictors, draws power too.

This graph shows a factor of 100 between the highest-performing and most-efficient systems: https://en.wikipedia.org/wiki/Performance_per_watt#Examples


For general purpose compute relying more on ILP than DLP, I would think the range is going to be narrower. Narrow enough to be a meaningful choice for consumers using battery powered devices and wanting more efficient or higher performance laptops and desktops. For servers, it seems like Intel and AMD are choosing optimizing more for perf/$ over perf/watt in their roadmaps. HPC roadmaps might be different still.


> What is the maximum performance % difference between optimizing for perf/$ and perf/watt?

Alder Lake and M1 Pro are good demonstrations of those two approaches.


For a crude comparison, the new framework with Alder Lake is ~20-30% less than the 14" MBPs with M1 Pro. If the uncore becomes more important, maybe one approach will take precedence. It's too bad Apple Silicon isn't as well documented (ie. AMX) as Intel/AMD chips are.


If only they could be as beautiful as the Cray 2 was (and Cray 3 would be)...

As a note, I went down the single-phase immersion cooling rabbit hole (and designed a C-shaped tank for the build) but gave up when I read the instructions for disposal of the coolant (3M's Novec) and found them a little bit too complicated for my taste. I wouldn't want a fish tank of that in my living room.


TikTok is covered with these videos. I saw one with an entire server rack in a tank. https://www.tiktok.com/t/ZTdnJ8Yco/?k=1


I was also exposed to this on Tiktok. The RGB trend continues, now with fish tanks...


LTT had their mineral oil PC videos... 7 years ago.


There was a research project that sent a liquid through a chip, and the gates could get chemical energy from it. Then the liquid is circulated back, "recharged", and sent to the chip again. So in theory a chip could be liquid-powered. The same liquid could also do the cooling at the same time.

Electricity by wire is of course much more convenient to use, but I still wonder whether there could be use cases for this.

Unfortunately it's hard to find it again.


I wonder what other options there are for cooling.

What's wrong with water and cooling blocks? I'm sure they could develop some quick connect hardware, paired with sensors and valves, so that any leaks could be auto-stopped.

You could build the connectors such that pressing and holding the release button causes the whole loop to drain by suction, for near zero dripping as long as you wait a few seconds first.


There are quick-release water cooling designs for the data center. But water cooling has many issues: the loop needs to be perfectly sealed, as water is conductive and will instantly destroy your very expensive servers in the case of a fault. Pumps are a moving part and prone to failure. Everything needs to be manufactured to precise tolerances.

Meanwhile immersion cooling is the opposite - Just build a tank and drop your stuff in it. Modern immersion cooling fluids aren't like mineral oil so you don't end up with components permanently coated in oil.


At most, I would think it would destroy one server if there was a major defect. But solenoid valves are pretty reliable. It's easy to imagine some absorbent material as a fallback, a negative-pressure loop, and pressure/humidity sensors that could shut everything down within seconds of a minor leak.

There's no pressure to push anything off its connector, and in the event of anything failing and disconnecting in a major way, the loop starts draining the other way by atmospheric pressure.

Water pumps are pretty cheap, you could have triple redundant pumps per rack for an irrelevant cost.

Air pumps are also cheap. You could even enclose your connections to heat blocks in a second layer of slight negative pressure, that would both contain any drop-an-hour leaks that somehow happened, and funnel it to a humidity sensor.

If they use anything like Fluorinert, I'm pretty sure a whole rack of this kind of tech could cost less than a single gallon of that stuff.

I'm actually kind of surprised consumer water cooling setups aren't like this. It might be a pretty fun project, and you could probably use 3D printed parts depending on how much you trust the active controls.
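A minimal sketch of the fail-safe behavior described above; the thresholds and the valve/drain callbacks are hypothetical stand-ins, not any real driver API:

```python
# On any humidity or pressure anomaly, isolate the supply and drain the
# loop before water can pool. Real hardware would drive solenoid valves.
HUMIDITY_LIMIT = 60.0   # % RH inside the connector sleeve (made-up limit)
PRESSURE_MIN = -5.0     # kPa gauge; loop should stay below ambient

def check_and_respond(humidity_rh, loop_pressure_kpa, close_valve, drain_loop):
    """Return True if a fault was detected and handled, False otherwise."""
    if humidity_rh > HUMIDITY_LIMIT or loop_pressure_kpa > PRESSURE_MIN:
        close_valve()   # isolate the rack's supply
        drain_loop()    # suction the loop dry
        return True
    return False
```

In practice you'd run this in a tight polling loop per rack, with the shutdown path tested far more often than it ever fires.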


You should try building it. There’s a big market if you can make it reliable.

Keep in mind there are other hot chips besides the CPU. Memory, network interfaces, and power supplies all need cooling too. Most systems are hybrid: water for the few hottest chips and air for everything else.


Honestly I probably would, but it would probably cost $100-$250 at minimum to prototype, and it seems rather difficult to find a company that would be interested in the design.

I assume gaming PCs are where you'd start?

I've never actually had a gaming desktop or anything similar, so I'm not sure what that market is like or if anyone would pay several hundred dollars for one, or if any of those companies would ever hire someone for that without a background in this stuff.

I'm mostly in embedded controls and prop building, and have lots of incidental experience with water handling and compressed air, but I've never done anything related to data centers or anything significant with desktops.


>I've never actually had a gaming desktop

The waterblock alone (just acrylic with some fittings) retails around 130 euros[0]. So there is some market for enthusiasts, but it's not large, of course.

[0]. https://www.ekwb.com/shop/water-blocks/cpu-blocks/velocity2


It would be like optical fiber cables. Unless you hire a highly specialized technician with the right hardware, you are not going to terminate your own optical fibers. You just buy factory-terminated cables of the right length.

In this case you would need to sell the whole kit already sealed by the factory to guarantee the absence of operator error. Perhaps the OEM could install the cooling system from the get go with high quality quick connect ports integrated into the case.

Don't forget to polarize the damn connectors so that you can't mix up the inlet and outlet ports.


Yeah, that's what I was thinking. It's easy enough to do it yourself(Barb fittings are probably fine since this is slight negative pressure), but everyone would probably prefer not to risk the more-than-i-make-in-three-years server to a non-specialist.

I was thinking to not even use separate inlets and outlets at all. The connectors would be 3-port in/out/vacuum air, with the vacuum port meant to be very slightly leaky.

You'd wrap the whole connector in a silicone sleeve, so that any small leaks in the connector (if the negative pressure on the water itself failed) would be contained, sucked back through the vacuum hose, and trigger a shutdown.

For a prototype you'd just 3D print some connectors and wrap them with tape, and let a mechanical engineer figure it out if you ever get to production, since poor-quality, leaky parts are a good thing in a prototype, where the goal is to prove that the server stays dry even when things fail.


Dumb question, but why don’t we boil nitrogen for cooling?

That seems to have immersion properties, without having the same toxicity.


Sub-ambient cooling has problems with condensation and thermal stress. With larger amounts of liquid nitrogen there are concerns about leaks causing asphyxiation. And of course there's the power needed to condense the nitrogen again.


Clickbait title?



