
> So generally Epyc come with some multiple of 8 CPUs enabled (1 per chiplet)

Not quite: AMD does use values other than 8 cores per CCD in Epyc as well. Take the 7402P, a 24C SKU. If you did that as "3 8-core chiplets", you would only have 3 quadrants = 6 memory channels and 96 PCIe lanes. Instead, those are done with 4 chiplets of 6 cores each. Same for the 48C SKUs.
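The quadrant arithmetic above can be sketched like this. It assumes the simplified model used in this comment (4 IO-die quadrants, 2 of the 8 memory channels per quadrant, CCDs spread one per quadrant until all 4 are occupied), which is a mental model rather than AMD documentation. The helper names are hypothetical.

```python
# Simplified model from the comment above (an assumption, not AMD documentation):
# 4 IO-die quadrants, 2 memory channels per quadrant, and CCDs spread one per
# quadrant until all 4 quadrants are occupied.
QUADRANTS = 4
CHANNELS_PER_QUADRANT = 2


def active_memory_channels(ccd_count: int) -> int:
    """Memory channels reachable if only quadrants holding a CCD are active."""
    active_quadrants = min(ccd_count, QUADRANTS)
    return active_quadrants * CHANNELS_PER_QUADRANT


def describe(ccd_count: int, cores_per_ccd: int) -> str:
    """Summarize a hypothetical layout: total cores and reachable channels."""
    cores = ccd_count * cores_per_ccd
    channels = active_memory_channels(ccd_count)
    return f"{ccd_count}x{cores_per_ccd}C = {cores}C, {channels} memory channels"


print(describe(3, 8))  # 3x8C = 24C, 6 memory channels
print(describe(4, 6))  # 4x6C = 24C, 8 memory channels
```

Both layouts give 24 cores, but only the 4x6C one touches all four quadrants under this model. As the next comment points out, real parts blur this link somewhat.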

AMD also has a number of "frequency-optimized"/"cache-optimized" SKUs that are like 4C or even 2C per CCD, with all the cache enabled, to allow maximum frequency and maximum cache per thread. Those are for stuff like HFT, where there just is no substitute for A Few Threads Going Really Fast, or for playing games to minimize your software license costs, since licenses are often priced by core count.

However, the link between enabled chiplets and memory channels is certainly a little bit amorphous. Some Epyc SKUs only have 4 memory channels, but through some magic they can still access all the memory slots as if they were a full 8-channel part. I guess that means you have 2 quadrants active (2 memory controllers/PCIe controllers) that can access the PHYs belonging to the other two, disabled memory controllers. I'm not 100% sure how that is implemented, but it exists.

https://www.servethehome.com/amd-epyc-7002-rome-cpus-with-ha...

Generally Epyc doesn't like "unbalanced" configurations, though. They can incur severe performance penalties: if you only populate 3/4 or 6/8 of your sticks, you can lose something like 2/3 of your memory bandwidth. So the 4-channel trick gets used very sparingly, and only with "power of 2" resources, I'm guessing.

As a general rule of thumb, assume anything Epyc-specific (IO die, IF links, etc.) is designed to work with 4 of something, or at least 2 of something. All the "variation" happens at the CCD level. So a 24C is not 3x8C; it is always 4x6C as your baseline assumption (and it is). As mentioned, sometimes memory controllers can be 2x, but... generally four shall be the number that is counted, and the number of counting shall be four. Thou shalt not count to three, except that thou proceed to four. Five is right out.
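One way to apply that rule of thumb is a small enumerator of "balanced" layouts. This is a hypothetical helper, assuming the CCD count must be a power of two (2, 4 or 8) and every CCD carries the same number of cores, at most 8. It ignores finer constraints, such as how cores are split across CCXs inside a CCD, so treat its output as candidates, not a list of real SKUs.

```python
# Hypothetical helper applying the rule of thumb above: the CCD count is a
# power of two (2, 4 or 8), and each CCD carries the same number of cores,
# at most 8. Finer per-CCD constraints are deliberately ignored.
MAX_CCDS = 8
MAX_CORES_PER_CCD = 8


def balanced_layouts(total_cores: int) -> list[tuple[int, int]]:
    """Return (ccd_count, cores_per_ccd) pairs that split total_cores evenly."""
    layouts = []
    ccd_count = 2
    while ccd_count <= MAX_CCDS:
        if total_cores % ccd_count == 0:
            per_ccd = total_cores // ccd_count
            if 1 <= per_ccd <= MAX_CORES_PER_CCD:
                layouts.append((ccd_count, per_ccd))
        ccd_count *= 2  # only 2, 4, 8: "thou shalt not count to three"
    return layouts


print(balanced_layouts(24))  # [(4, 6), (8, 3)]
print(balanced_layouts(48))  # [(8, 6)]
```

Note that 3x8 never shows up for 24 cores, because 3 is not a power of two.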



Heh, right, I did say generally.

The 7453, 7443, 7413, and 7313P have fewer than 8 chiplets (the 7xx3 chips are Milan/Zen3). I don't believe any of them have less than full memory bandwidth, unlike the previous generation. The spec sheet mentioned PCIe x 128 for all of them as well.


> The spec sheet mentioned PCIe x 128 for all of them as well.

Due to the way they've sliced it, you always get the full PCIe PHYs (128 lanes), just like you get the full memory PHYs. They literally only gimped the memory bandwidth: the controllers are gone, but the PHYs remain and the other 4 controllers can use all of them. It's kinda weird; I don't think I've seen it done like that before.

Incidentally, this probably does mean some weirdness with locality at those extremes: half of your lanes don't have any CPU cores locally, and everything they do runs through the quadrant interconnect.


> They literally only gimped the memory bandwidth

As I mentioned, it looks like all the Epycs with fewer than 8 chiplets in the current Zen3/Milan generation have the full memory bandwidth.


No need to get defensive (on behalf of a multibillion-dollar corporation); I'm just more curious about how exactly they did the quad-channel SKUs in Rome.


Heh, didn't think it was defensive.

The 2nd-gen Epycs did have chips with reduced memory bandwidth; the 3rd gen didn't. I believe the posted servethehome URL has a pretty good explanation. My theory is that the 2nd-gen Epycs had a bottleneck in the chiplet uplink connections, so 4 chiplets couldn't manage the full bandwidth. So maybe the 3rd gen increased the bandwidth of those links so that even 4 chiplets could handle 100% of the available memory bandwidth.

It does boggle my mind that a $2k Apple desktop has 400GB/sec of memory bandwidth and a $4k Apple desktop has 800GB/sec.
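For scale, here is a back-of-the-envelope sketch of theoretical peak DRAM bandwidth for an 8-channel vs a 4-channel socket. It assumes DDR4-3200 (3200 MT/s) and a 64-bit (8-byte) data bus per channel, ignoring ECC bits; real sustained bandwidth is lower, and the 4-channel parts may not run at that speed. The function name is hypothetical.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers: int, bus_bytes: int = 8) -> float:
    """Theoretical peak in GB/s: channels x MT/s x bytes per transfer / 1000.

    Assumes a 64-bit (8-byte) data path per channel; ECC bits are not counted.
    """
    return channels * mega_transfers * bus_bytes / 1000


print(peak_bandwidth_gbs(8, 3200))  # 204.8 (full 8-channel socket, DDR4-3200)
print(peak_bandwidth_gbs(4, 3200))  # 102.4 (a 4-channel SKU at the same speed)
```

So even a full 8-channel socket at DDR4-3200 peaks at roughly half of the 400GB/sec figure, on paper.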



