Intel 80386, a Revolutionary CPU (xtof.info)
283 points by blakespot on Nov 5, 2023 | 176 comments


I think the 80386's final design benefitted tremendously from the Motorola 68000, then the m68020. Had Motorola not released a proper 32 bit CPU without compromises, it could be argued that Intel would've had yet another stop-gap after the 80286, which itself wasn't intended to be a proper successor to the 8086/8088.

As it is, the 80386 came with a number of compromises. For instance, there was no cache at all beyond a 16 byte instruction prefetch queue, whereas the m68020 had 256 bytes of instruction cache. There were no atomic instructions (LOCK wasn't useful for this), which is why many modern OSes support the 80486 but not the 80386. The fact that compatibility with the 8086 required real mode or VM86 meant that it took quite a long time before software started taking advantage of the 80386's new features.

It was an important chip, but it showed us early signs of what we've come to expect from Intel: attempts to create other markets at the expense of, or with the express desire not to compete with, the x86 (the iAPX 432 then, the Itanic twenty years later); the slapdash addition of "features", such as the additions to the 80286, which then had to be included forevermore as legacy support; and the rushing-to-catch-up when other vendors had features that everyone wanted (real, flat 32 bit support then, 64 bit support twenty years later).

Still, it's interesting history!


It was a 386 that Linus Torvalds wrote the first Linux kernel on, and support for the new features of the 386 from the start was one of the reasons Linux took off instantly.


More info:

"It uses every conceivable feature of the 386 I could find, as it was also a project to teach me about the 386"

https://www.cs.cmu.edu/~awb/linux.history.html


Which is probably one reason why he worked at Transmeta for a while.


I'm sorry, but I had to. "Could someone please try to finger me from overseas".


Yeah, a free (as in freedom, but also as in beer) OS running on hardware that was, by then, commoditized, what's there not to like? No wonder Google and others jumped on it...


Indeed! My first taste of Linux was Slackware on a 386


Not only Linux, all the current BSDs share a lot of “DNA” with 386BSD.


> There were no atomic instructions (LOCK wasn't useful for this), which is why many modern OSes support the 80486 but not the 80386.

Does the lack of atomics really matter, given that there was (AIUI) no SMP on the 386? You can always disable interrupts to make your operation 'atomic' in a uniprocessor context.
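
Roughly what I have in mind, as a sketch (assuming ring-0 code so cli/sti are legal, and that interrupts were enabled on entry; a real kernel would save and restore the flags instead):

  /* Hypothetical sketch: uniprocessor "atomic" increment by masking interrupts. */
  static void uniprocessor_atomic_inc(volatile unsigned int *counter)
  {
      __asm__ volatile ("cli");   /* mask interrupts (privileged) */
      (*counter)++;               /* nothing can preempt this read-modify-write */
      __asm__ volatile ("sti");   /* unmask interrupts again */
  }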


All Intel x86 CPUs, starting with the 8086, were intended to be usable in SMP systems.

Nevertheless, very few SMP systems used the early Intel CPUs before the 80486, mainly because those CPUs were still too weak in comparison with the contemporaneous minicomputers, so not even SMP would have made them competitive in performance, while the high price of SMP would have been incompatible with personal computers.

The Intel 80486 was used much more frequently in SMP systems, not only because it added the more convenient atomic fetch-and-add and compare-and-swap instructions, but also because Intel provided an APIC companion chip for it, i.e. a multi-processor interrupt controller, and because the 80486 was fast enough that an SMP system built around it was competitive with much more expensive computers.

The Intel 8086 was intended to be used in SMP systems via the atomic swap instruction proposed by Dijkstra, i.e. LOCK XCHG in Intel mnemonics.

The atomic swap is good enough to implement any kind of concurrent program, albeit at lower performance and with higher complexity than when the atomic instructions added by the 80486 (LOCK XADD and LOCK CMPXCHG) and the Pentium (LOCK CMPXCHG8B) are available.

The 80386 added atomic fetch-and-modify-bit instructions, which remain useful even today in wait-for-multiple-event scenarios (together with LZCNT, which can be used to find the highest-priority event that must be serviced).
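
For illustration, a sketch of that event-flag pattern in C11 atomics (names and details are mine; fetch_or compiles to a LOCK-prefixed read-modify-write on x86, and the leading-zero count picks the highest-priority pending bit):

  #include <stdatomic.h>

  static atomic_uint pending_events;   /* one bit per event source, bit 31 = highest priority */

  /* Called by each event producer (e.g. an interrupt handler). */
  void post_event(unsigned int n)
  {
      atomic_fetch_or_explicit(&pending_events, 1u << n, memory_order_release);
  }

  /* Single consumer: return the highest-priority pending event, or -1 if none. */
  int next_event(void)
  {
      unsigned int p = atomic_load_explicit(&pending_events, memory_order_acquire);
      if (p == 0)
          return -1;
      int n = 31 - __builtin_clz(p);   /* highest set bit (GCC/Clang builtin) */
      atomic_fetch_and_explicit(&pending_events, ~(1u << n), memory_order_acq_rel);
      return n;
  }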


I don't think exchange is enough to implement all (or even most) lock-free/wait-free algorithms. CAS (i.e. lock cmpxchg), or a primitive of equivalent power, is needed.

XCHG is enough to implement a mutex though, which is what most applications need.
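
A minimal test-and-set spinlock sketch in C11, for illustration (atomic_exchange maps to XCHG on x86, so nothing beyond the atomic swap is needed):

  #include <stdatomic.h>

  static atomic_int lock;   /* 0 = free, 1 = held */

  void spin_lock(void)
  {
      /* atomic exchange compiles to (LOCK) XCHG on x86: spin until we read back 0 */
      while (atomic_exchange_explicit(&lock, 1, memory_order_acquire) != 0)
          ;   /* busy-wait until the previous holder stores 0 */
  }

  void spin_unlock(void)
  {
      atomic_store_explicit(&lock, 0, memory_order_release);
  }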


Lock-free and wait-free algorithms are not necessary for any computing task.

They are only a performance enhancement, and they indeed require either compare-and-swap (single and double), from the IBM System/370 (1973) (later used by the Motorola MC68020; single compare-and-swap was then added to the 486 and double compare-and-swap to the Pentium), or load-locked + store-conditional, from the S-1 Advanced Architecture Processor (LLNL, 1987) (later used by MIPS, then by POWER, ARM and others).

Moreover, even for improving performance, lock-free and wait-free algorithms must be used with care, because they are based on optimistic assumptions that may fail to hold in many scenarios with heavily contended shared resources, where the performance of algorithms based on mutual exclusion is actually higher and more predictable (with mutual exclusion it is easy to guarantee FIFO access to a shared resource, which ensures that wait times are bounded and that at any moment at least one thread does useful work instead of retrying failed accesses to the shared resource).


Well, you mentioned that you can implement all programs; I just wanted to clarify that there are indeed some algorithms that cannot be implemented.

In any case this has nothing to do with performance, you might need lock-free algorithms for correctness when implementing some real-time systems or when you need code to be reentrant.


I am curious to hear an example where a lock-free algorithm is needed for correctness, because I have never encountered any such case and this does not seem possible.

Access to any kind of shared resource is always correct when only a single thread can access it.

With mutual exclusion it is very easy to guarantee correctness due to serialized accesses. With lock-free algorithms concurrent accesses are possible and the algorithms must be carefully analyzed to demonstrate their correctness.

Moreover, real-time systems are exactly where lock-free algorithms are undesirable. Any pure lock-free algorithm must detect a transaction failure and retry it. There is no guarantee of success and no limit on the number of retries, so hard real-time deadlines can be missed. Lock-free algorithms can be used in real-time systems only if they detect too many retries and then fall back to lock-based algorithms before it is too late.

The lock-free algorithms improve only the execution time of the typical case, but they increase the execution time for the worst case. For non-real-time applications the typical performance is more important, so lock-free algorithms are good, but for real-time applications the worst-case performance is the most important, which makes lock-free a.k.a. optimistic algorithms bad.

Also neither mutual exclusion nor lock-free/wait-free algorithms have any problem with reentrancy when implemented correctly. Problems with reentrancy appear only in programs where there are mutable variables that are shared between threads and which should not have been shared (like when using some of the old standard C library functions).

It is very common to have a very large number of threads that use reentrantly the same code that implements mutual exclusion for accessing some shared resource that is guarded by a lock.


Again, nothing to do with performance.

For example an interrupt handler (or a signal handler) that needs to modify some shared resource. It can't take a mutex or even a spin lock because it might be owned by the thread it just interrupted. There are ways around that of course, for example by having threads disable interrupts inside critical sections, but that's not always appropriate.

Similarly, for realtime systems, if you have threads with different priorities accessing the same data, to avoid deadlocks you either need mutexes with priority inversion (which has its own share of issues) or you use lock free code.

edit: in any case the point isn't that there are better ways to write a program. The point is that if you have a program that uses a CAS-based lock-free algorithm, porting it to the 386 is not just a matter of paying a performance penalty; you might need to rewrite it to preserve correctness.

edit2: > There is no guarantee of success and no limit for the number of retries, so any hard real-time deadlines can be missed. Lock-free algorithms can be used in real-time systems only if they detect too many retries and then they fall back to lock-based algorithms before it is too late

wait-free algos have guaranteed bounds.


An interrupt handler may use neither lock-based methods nor lock-free methods, because it may not remain stuck in a wait loop, as required by the former, and it may not use a CAS instruction, as required by the latter, because any CAS instruction must be retried when it fails.

Therefore an interrupt handler must always own whatever data structures it writes into, so it may write them at any time.

Lock-free methods are not used inside interrupt handlers, but they are used by the code that reads what the interrupt handlers write. However this is the special case of single writer with one or more readers and this special case of lock-free access does not need compare-and-swap instructions or equivalents.

In some particular cases, like counters that are updated by interrupt handlers, the readers can detect corrupt values and retry the read; in other cases, where the interrupt handler updates a more complex data structure, that structure can be guarded, for example, by a counter that is incremented both before and after the update, so that readers can detect when they have to retry the read.

This special case of lock-free access does not need compare-and-swap, but, depending on the CPU memory access model, it may need store barriers a.k.a. store fences and load barriers a.k.a. load fences. Such lock-free access was trivial to implement on Intel 8086 or even earlier CPUs, because it does not even need atomic read-modify-write instructions.
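
A sketch of that counter-guard pattern (essentially what Linux later called a seqlock), assuming a single writer such as an interrupt handler and any number of readers; the names are only illustrative:

  #include <stdatomic.h>

  struct stats { unsigned long packets, bytes; };

  static atomic_uint seq;        /* odd while the writer is mid-update */
  static struct stats shared;    /* the data published by the writer   */

  void writer_update(const struct stats *s)   /* e.g. from the interrupt handler */
  {
      atomic_fetch_add_explicit(&seq, 1, memory_order_acquire);   /* seq becomes odd */
      shared = *s;
      atomic_fetch_add_explicit(&seq, 1, memory_order_release);   /* seq even again  */
  }

  struct stats reader_snapshot(void)
  {
      struct stats copy;
      unsigned int s0, s1;
      do {
          s0 = atomic_load_explicit(&seq, memory_order_acquire);
          copy = shared;   /* NB: formally a data race; real code would copy via atomics */
          atomic_thread_fence(memory_order_acquire);
          s1 = atomic_load_explicit(&seq, memory_order_relaxed);
      } while ((s0 & 1) || s0 != s1);   /* retry if the writer was, or became, active */
      return copy;
  }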

Normally, when lock-free algorithms are discussed, it is assumed that there are multiple writers, in which case different algorithms are needed than for the single-writer case, and CAS instructions or equivalents are required.

There is no need for lock-free algorithms for avoiding deadlocks. The obvious solution to avoid deadlocks is to use a single lock protecting all shared data. Lock-free algorithms may provide a much better performance, but they are not necessary.

Moreover, even without lock-free methods, deadlocks can be avoided in most cases by reorganizing the shared data. Any program where it is necessary at any point to hold multiple locks is suspect of bad data organization, because this should not normally happen.

You are right that wait-free algorithms by definition have guaranteed bounds, but unfortunately they are seldom applicable, otherwise concurrent programming would have been much easier.


> i.e. LOCK XCHG with the Intel mnemonics.

Didn't all XCHG with memory operands have an implicit LOCK prefix on Intel 8086?


Only since Intel 80286 (1982).

On Intel 8086/8088 and 80186/80188 an explicit LOCK prefix is required.


Disabling interrupts is a privileged operation (you need IOPL for the ability to execute the cli or sti instructions). Atomics can work even outside of privileged code.

Otherwise you could from any user program disable interrupts and give the operating system no chance to take back control. Xadd, cmpxchg, bts, btr and btc are all prefixable with lock to make them atomic.
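
For instance, a plain user-mode atomic increment needs no privilege at all; a sketch in C11, where the compiler emits a LOCK-prefixed instruction (LOCK XADD here, since the old value is returned):

  #include <stdatomic.h>

  static atomic_long counter;

  long bump(void)
  {
      /* compiles to LOCK XADD on x86; works at any privilege level */
      return atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
  }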


  cli
  jmp far $-4


They were in big trouble then. The entire company was riding on it being great.

It could have easily gone awry as it did for Data General, Honeywell, CDC, AST, Tandy, Olivetti, Xerox, DEC Rainbow, AT&T Hobbit, Wang 2200 and Unisys. Strong survivorship bias on this one. Most of the once Titans are in or near the dustbin now, such as SDS, SDC and Fairchild.

Intel's history was primarily as a memory manufacturer. They're arguably near a similar fucked position now - effectively 0% of the mobile and home appliance market and getting slaughtered in their only remaining stronghold by NVIDIA, AMD and ARM pillaging their castle. Hopefully they'll squeeze out of this one.


Andy Grove moving Intel from memory to microprocessors was an excellent strategic move.

> "Business success contains the seeds of its own destruction. Success breeds complacency. Complacency breeds failure. Only the paranoid survive."

Something that Intel's leaders didn't pay attention to during the 2010s.


There is even a story plus a term for it.

The story is about asking: if we all went out of this room and came back as the new guys, what would we do?

"strategic inflection point" ...

They decided to move from being a memory company to being a CPU company.

Nikon at first had no camera of its own, being just a Leica-style lens maker for Canon (its lenses famously found by DDD and used in the Korean War); then it moved to its own rangefinders (though strangely following the Zeiss approach more), then jumped ship to SLRs by inventing the Nikon F (used in the Vietnam War). Crazy moves, but at least related ones, and it "survived" and moved on.

Or, like Fujifilm, move into cosmetics at one stage, since making film involves knowledge that can be re-used there. And survive and move on.


Nikon started off as a defense contractor making rifle scopes and artillery sights for the military. Camera lens was a side business that took over after the war.

While no longer on the cutting edge, they still have a significant presence in the semiconductor lithography business. So they will survive in one way or another even if they stop making cameras one day.


The DEC Rainbow would have been fine if they didn't do things like require special floppy disks that cost $5 each.

DEC should have packaged the LSI-11 into a consumer machine. They had all the software, which was top shelf.

I had an H-11, it was a great machine.


I'm a big pdp-11 fan but this wouldn't have worked because 128K address space isn't enough even in 1975. Listen to Dave Cutler talk about this in the 3h long interview.


The PDP-11 had virtual memory in later models and could address more real memory than the 8086 afaik.


Are you sure the PDP-11 had virtual memory in later models?

They improved the memory space because QBUS went from 64K (16 bits) to 256K (18 bits) to 4M (22 bits), but it was always usable only within a 64K window.

But I don't think it had a way to resume instruction execution when encountering a memory trap.

You must be thinking of the VAX-11 where virtual memory was an explicit goal in its architecture (hence its name).

https://en.wikipedia.org/wiki/VAX


The PDP-11 has virtual memory with 8k pages. There are separate 64k address spaces for kernel, supervisor and user mode, and for each of those, separate text/data spaces. This was extensively used with e.g. 2.11BSD to cram in programs that are way too large for a single 64k space. And yes, that includes the ability to resume on a memory trap.

You can read more on the Gunkies site: http://gunkies.org/wiki/PDP-11_Memory_Management


That's an interesting alternative-history thought experiment.

I wonder how early one could capture the complexity of the full PDP-11 microarchitecture on a single chip? Would it have been affordable? What about the support hardware?


Hmm – the smallest, most highly integrated PDP-11(-compatible) package ever made was the QFP 1806VM2, with around 135k transistors; it integrated the MMU, UART, parallel interface, keyboard controller, etc, but did floating-point instructions in interpretive microcode.

I think that same transistor count was reached by Motorola on the 68020, which would've been around 1984, but would have needed the peripheral controllers mentioned here.


The FPU for the 8086 was a separate chip, and the IBM PC still required a board full of chips to make a working computer.

(If I recall correctly, it wasn't until the 486 that the FPU was incorporated.)


>it wasn't until the 486 that the FPU was incorporated

Indeed and 486SX had the FPU disabled. I think that was the 1st time the binning/SKUs became a marketing strategy.


The IBM PC had an advantage in that most of the support parts were IIRC largely jellybean 74 series logic - could a PC-PDP have used similar COTS parts?

The larger issue with DEC was a strong NIH trend (almost as strong as IBM's); I don't know if they could have bucked that trend to successfully launch a market-winning PC. It probably would have looked like the DEC Professional, which I think counts as a market failure.

So the dreams of a 64 bit extension to the PDP-11 might be stillborn ;-)


I read an essay by a guy who worked on a project to produce a DEC minicomputer using ECL logic. And there was yet another group working on the DEC Alpha. At the time the main group was using most of the company's resources, betting the company on dethroning IBM in the mainframe arena, and was trying to stab the Alpha and minicomputer groups in the back.

So probably not.

I suspect IBM's skunkworks project was done as a hedge and for anti-trust reasons. Personal computers were going to take some business away from them; they wanted it to be their product, not someone else's. Anti-trust also meant it needed to use off-the-shelf stuff.


DEC always saw itself as the scrappy, sleek underdog in IBM's big iron market - an engineering company making relatively affordable tools for engineers, academics, and other qualified professionals. Where "affordable" meant six or seven figures instead of seven or eight.

Which was fine while it lasted. But with VAX, the mini market became corporatised, inward-looking, nostalgic, even arrogant. Big-iron hardware had much higher internal prestige than VLSI - unfortunate, because DEC's VLSI people were the best in the world, and the systems software people weren't far behind.

DEC had absolutely no concept of VLSI-based mass-market commodity computing - no idea how to design it, build it, distribute it, market it, or even imagine it. The Rainbow was as close as it got, and that was a disaster.

So even with Alpha it had no chance. Prism would barely have changed that, because the problem was a failure of imagination in upper management.

BigCos should always have an internal team of annoying curious generalists to challenge orthodoxy and report annually on "What trends and opportunities are we missing?" Sometimes the C-Suite has the talent to do that, but more often it just doesn't.


Between x86 and the Alpha I much preferred the Alpha. Unfortunately it was wickedly expensive and it was hard to get. But for 64 bit work it was immediately usable and rock solid. It also beat Intel/AMD by about a decade to the market and I'm still kind of surprised that DEC managed to squander that lead.


I'm pretty sure that if Motorola had rolled out the 68008 at the same time as the 68000, the x86 would be something few people remember. The hardware people chose the 8088 because it only required 8 DRAMs instead of 16.


If you read the commentary from the folks who did the PC, the largest things driving COTS were cost and time to market - while I'm sure anti-trust was a consideration at IBM, I don't think it was a primary one.


I can't imagine Alpha and ECL based designs being done anywhere within the same decade.


My memory of the 80's is that most stuff was fast but power-hungry NMOS. People were pushing things like GA, ECL and Silicon on Sapphire as the next thing to replace it.

I might be biased, but the company I worked for in the 80's switched to CMOS early[1]. I think the high-end guys were ignoring CMOS despite it closing the gap relentlessly. Worse for all the other technologies, as integration increased, heat became a relentless and eventually unsolvable problem. I remember seeing ECL datasheets for simple chips that would draw half a watt.

Notably, volume production of CMOS grew and grew while ECL and GA didn't really.

So yes, ECL for new business after 1985 was dumb.

[1] CEO realized enclosures, power supplies, and fans were a large fraction of the bom.


Alpha and VAX 9000

DEC also did a "300 MHz 125 W ECL microprocessor" in the 90s, though that seemed to be mostly about developing cooling.


> I think that same transistor count was reached by Motorola on the 68020, which would've been around 1984

Wikipedia says ~200k (https://en.wikipedia.org/wiki/Motorola_68020), so about 50% more than 135k.

I don’t know much of hardware, but part of that may have been because of (https://en.wikipedia.org/wiki/Motorola_68020#020_concept_eme...):

“A great debate broke out about how to refer to the underlying design of the new chip in marketing materials. Technically, the 020 was moving from the long-established NMOS logic design to a CMOS layout, which requires two transistors per gate. Common knowledge of the era suggested that CMOS cost four times as much as NMOS, and there was a significant amount of the market that believed "CMOS equals bad.”


68020 is much more complex than any pdp-11.


The 1806VM2 wasn't released until the late 1980s, when there were numerous superior options, even if your yardstick is transistor count (though naturally not in the USSR).


The VM1, though, appeared in 1983, and arguably the BK home computer was the smallest-ever PDP-11.


I think DEC's strong NIH trend (almost as strong as IBM's) almost certainly dooms a PC-PDP - IBM only made the PC by effectively creating a skunkworks within the company that was focused solely on 'go to market', compared to the normal IBM development process. I don't know if DEC could or would have done that.


I was a DEC-head in the 70's and 80's, and along with my fellow DEC-heads anxiously awaited DEC's entry into the PC arena. Everyone was excited to see the rollout of the Rainbow, sure it would be a killer machine like other DEC machines. After the presentation, we were all in shock. It did not play to any of DEC's strengths, and was just a crummy, proprietary x86 insult.

That was the end of our love affair with DEC. Very sad.


IMO, part of what killed DEC was what almost killed IBM - poor management and a somewhat loathed sales force.

DEC tried to go after IBM and in doing so structured itself after IBM. That led to multiple competing projects and groups doing similar work, multiple layers of management making the company unwieldy to manage, and a sales force that had trouble building customer relationships because of internal structures.

The same thing functionally killed Motorola, and will eventually harm Cisco too (Cisco is a company that in my opinion is ripe for 'disruption').


One of the worst own goals in the history of computing imnsho.


Soviet home computer BK https://en.wikipedia.org/wiki/Electronika_BK was the smallest pdp-11, but it was not a single chip one. Also MSP430 is a cutdown pdp-11.


If Intel could do it, DEC could do it.


DEC Alpha had the weakest memory model, which, while cool to write for with all its memory barriers/concurrency, is quite annoying to work with in the real world - a major reason why DEC failed.


Compared to the madness that was the x86 memory models, the Alpha was quite sane. I'm not sure what you base your 'weakest' on (it suggests a comparison with others), but when the Alpha was released there wasn't much to compare it to besides the R4000, and that was just as hard if not harder to source than the Alpha machines were. I had a bunch of them (SGI boxes of various plumage) and the OS hardly took advantage of the chip, but on the Alpha it all just worked at 64 bits out of the box.


>I'm not sure what you base your 'weakest'

The Linux kernel has memory barriers; Alpha is the exception in needing virtually all of them - on other architectures (esp. the total-store-order ones) some (most) of the barriers are no-ops.

Yet, overall, Alpha is famous for how crazy its memory model is. There is nothing like it any longer (personally I am happy with the Java Memory Model). Just to make sure it's understood: "weak", attributed to a memory model, just means how concurrent it is with regard to reads and writes.

A quote[0]:

  AND THEN THERE'S THE ALPHA
  --------------------------

  The DEC Alpha CPU is one of the most relaxed CPUs there is.  Not only that, some versions of the Alpha CPU have a split data cache, permitting them to have two semantically-related cache lines updated at separate times.  
  This is where the address-dependency barrier really becomes necessary as this synchronises both caches with the memory coherence system, thus making it seem like pointer changes vs new data occur in the right order.
[0]: https://www.kernel.org/doc/Documentation/memory-barriers.txt
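
The canonical example is publishing a pointer to freshly initialised data: on Alpha even the reader's dependent load needs a barrier, which is what that address-dependency machinery is about. A rough C11 sketch (names mine):

  #include <stdatomic.h>

  struct msg { int payload; };

  static struct msg slot;
  static _Atomic(struct msg *) published;

  void writer(int v)
  {
      slot.payload = v;
      /* release: the payload store is ordered before the pointer store */
      atomic_store_explicit(&published, &slot, memory_order_release);
  }

  int reader(void)
  {
      struct msg *p = atomic_load_explicit(&published, memory_order_acquire);
      /* On most CPUs the data dependency alone would order these two loads;
         Alpha's split cache banks can still return a stale payload, hence
         the acquire (or the old smp_read_barrier_depends) on the reader side. */
      return p ? p->payload : -1;
  }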


That's what allows the Alpha to be multi-processor.

And if you like the 'Java Memory Model' I'm not sure what your point really is, you're comparing a virtual machine with actual hardware.


The "oh we can't make the cache as big as we'd like so we just make it two fully independent banks and bake the lack of synchronization of them into the ISA" in DEC Alpha very clearly falls into the "baking restrictions of today into the ISA", which virtually always turns out to be a bad idea.


That I'll be happy to agree with but compared to the 640 k limit it's clearly a manageable one, and one that end users of the system normally do not have to concern themselves with.

I've written a (small) OS for x86/32 just prior to getting my hands on an Alpha, and compared to that the Alpha looked (and still looks) like a model of sanity to me (as does 68xxx). Just the number of tricks required to get an x86 system properly booted up is off the scale: you have to deal with a whole pile of memory insanity, including repeated switches between modes (and memory models), in order to get your stuff even loaded. I spent weeks debugging that loader, to the point that I had a reset switch connected to a musical instrument foot pedal so I didn't have to dive under the desk 10 times per hour to reset the box I was writing this on. If not for DJGPP to validate the 32 bit code ahead of time I doubt I would have been able to bring it up at all (note this was well before VMs became a thing on consumer hardware).


Java was one of the first machines (virtual or physical) to take a stab at defining its memory model in a modern way.

It's also pretty sane while also scaling to large systems as Azul's 768 core boxes showed.


>you're comparing a virtual machine with actual hardware.

The memory model is meant for developers - it determines how many, and which kinds of, memory barriers they have to issue.


Your typical Java developer has zero knowledge about this, and the UNIX versions that shipped on the Alpha (for me, at the time, that was RedHat) fairly elegantly hid that complexity, to the point that you could just forget about it unless you cared about extreme performance. In my case the ability to address large amounts of memory, and a filesystem that wasn't constrained by the 32 bit limitation, were the key factors, and while it took another decade for x86 to catch up I was happily shipping.


>Your typical Java developer has zero knowledge about this

That's a bit too much stereotyping; it depends, I guess - "Java Concurrency in Practice" is one of the best-selling books when it comes to Java, and it covers some of this quite well. I have met quite a few folks who understand these matters pretty well, admittedly as part of jsr-166. The "double-checked idiom" vs the original Java memory model is effectively what brought sane memory models even to C++.

>unless you cared about extreme performance

That's the whole purpose of the weak memory models. They did fail to deliver, though - very error prone, close to impossible to debug, and effectively outclassed by x86-64 and Sun SPARC (both being total store order); even arm-64 became "stronger".


> That's bit much of stereotyping, depends I guess - "Java Concurrency in Practice" is one of the most sold books (when it comes to Java), and it covers some parts quite well.

That's because Java concurrency in practice is harder than it should be. I've seen plenty of teams struggling with what should be trivial problems trying to work their way around the various limitations and/or bugs in the JVM.

> Albeit, they did fail to deliver - very error prone, close to impossible to debug, effectively outclassed by x86-64 and Sun Sparc (both being total store order), even arm-64 became "stronger".

That's true, but at that time those weren't shipping 64 bit systems. It was pretty much R4000, Alpha or bust and the Alpha - even taking into account those limitations - worked surprisingly well for real world problems. In fact the systems that I'm talking about worked non-stop for more than a decade until they got de-commissioned and I never heard a single peep from those that took over the project that the memory model of the Alpha was somehow either a problem in practice or a concern.

Could it have been done better: sure, but with the knowledge of the time these seemed to be pretty reasonable choices and compared to what the alternatives were they were doing great. If anything the failure of the Alpha architecture is one of marketing more than anything. Pjlmp has some good information elsewhere in this thread, which pretty much corresponds with my experience of the time. There were some systems on the drawing board that were better in theory and they were still better in theory 10 years later, meanwhile DEC was shipping.

That they squandered their head start has nothing to do with the memory model, but everything to do with how they ran their business.


Making the microprocessor is just one challenge - you have to produce a workable system that someone wants to buy (aka, one that is cheaper or more attractive than the other options). It does appear that DEC could have made a single-chip PDP-11 in 1980, and probably could have made one with an expanded address space to get around the memory limitations of the PDP-11.

Intel did not do that; IBM in this case managed to - and it had the cachet and name recognition to sell a product that was largely middling on a technical level over other options that were often technically superior and cheaper.

Intel made a decent enough microprocessor, yes it was kludgy (even the 8086 was), but it performed well enough (I've never seen a claim that the 68000 significantly outperformed the 8086), and was available in quantity. The rest of the things that made the IBM PC a system that stormed the world were COTS parts assembled in an attractive package.

Now, could DEC have done all of that with the PDP-11? Yes, absolutely - but the winds were prevailing against it because of the nature of DEC management. I fundamentally do not believe that DEC would have allowed itself to ship something as 'flawed' as the IBM PC was.


> I've never seen a claim that the 68000 significantly outperformed the 8086...

I think you made a typo and meant 6800. Too late to edit though.

I make this typo too because the 68k was such a significant chip in history.


No, I meant the 68000 - what I'm trying to say is that while the 68000 is a significantly more elegant microarchitecture, had more functionality, and did get to better performance faster, it's not really any better in performance than the 8086 when judged on a clock-for-clock basis.


Except the difference was Grove vs Olsen. Both were engineers, both built great hardware, but Grove feared his competitors and his customers' fickleness while Olsen was about beating / carving off business from IBM.

In the early 80s I moved from the MIT/128 ecosystem to the Stanford/101 ecosystem, and even to my business-ignorant eyes (I was still in research at the time) it was like night and day, though I didn't understand why for years.


> and getting slaughtered in their only remaining stronghold by NVIDIA, AMD and ARM pillaging their castle

NVIDIA doesn't make good CPUs; even their SoCs are ... not exactly state of the art. No mass-market adoption besides automotive (which doesn't care about anything but long availability of (spare) parts) and the Nintendo Switch, which likely only still uses the same 2015-era Tegra chipset because even someone as big as Nintendo couldn't kick enough arses at NVIDIA to bring up a new design. ARM doesn't make generally available server CPUs (the ones that do exist all get gobbled up by cloud providers), and outside of the Mac world there are no viable ARM desktop or laptop CPUs because Qualcomm completely fucked up that market for likely years to come - I don't see any way to ARM adoption in that market unless ARM comes up with a competing solution to Rosetta and Qualcomm comes out of the "if it works just barely, ship it" mindset that may be acceptable to smartphone vendors but not the PC/desktop market.

That leaves AMD as the sole remaining threat to Intel, and AMD doesn't have the fab space to be enough of a threat to Intel's moat.

Yes, I may or may not be extremely frustrated at the state of competition in general computing.


Arm snatched Intel's hold on the Apple market and we can invoke any Clayton Christensen book on the Raspberry PI series. Maybe not the Raspberry PI 5, but what about say, the imaginary 8 or the 9 a few years hence?

The Pi 400 isn't their final attempt into the PC market, only their first. Surely a slew of decent ARM based laptops from one of these SBC manufacturers is coming along eventually - and I don't mean chromebooks.


Yeah but that was only made possible by Apple's unique circumstances: their close control over the entire tech stack, their experience with architecture transitions (PPC->x86-32->x86-64->ARM) and the required tooling (Rosetta, fat binaries, compilers), their relatively small market size, their expertise in developing with/for ARM from iOS, and enough cash in hand to buy out the entire fab capacity of TSMC.

In the Windows world, no one comes even close to having the capabilities required for a transition to ARM: Microsoft doesn't have a fat-binary standard or the toolchains needed (that all went down the drain with the end of Microsoft Windows CE / Windows Phone, and even then it was a nightmare to develop for these), the third-party developers - especially the enterprise tailor-made application market - have zero experience with ARM and a lot of stuff used in enterprise was made by companies that went defunct long ago, Microsoft doesn't have any hold over what the device vendors do in terms of drivers (unlike Apple, who famously cut ties with NVIDIA because NV didn't want Apple to write drivers for their GPUs), and neither Microsoft itself nor the conglomerate of hardware OEMs has the cash in hand to take all the stuff people use on x86 Windows and make it work on ARM Windows (as they infamously discovered with the early Windows-on-Qualcomm stuff).

Oh, and the ARM vendors can't be bothered to get something as basic as PCIe working on a fundamental level beyond "if it works in my very specific use case, ship it" - just look at the RPi 4's issues, where people developed breakout boards for PCIe only to discover that crucial functionality was flat out broken [1]. And even with people complaining for years about PCIe issues on the Pi 4, it turns out the Pi 5 still managed to fuck things up [2].

The only player in town able to make ARM work on anything but smartphones and servers is Apple, and they don't (and never will, assuming regulators don't finally wake up and force them) sell to third parties.

> The Pi 400 isn't their final attempt into the PC market, only their first. Surely a slew of decent ARM based laptops from one of these SBC manufacturers is coming along eventually - and I don't mean chromebooks.

These things are and will be toys. The money is in getting corporate to switch over to ARM, and until the problems above (especially backwards compatibility and standards conformance) are worked out, which I don't see happening any time soon because it's so hard to break through the chicken-and-egg scenario, there will be no threat to Intel. Especially not if even many years of development and complaining are not enough to arse Broadcom into fixing PCIe.

[1] https://www.jeffgeerling.com/blog/2023/i-built-special-pcie-...

[2] https://www.jeffgeerling.com/blog/2023/testing-pcie-on-raspb...


You're wrong about Microsoft here. They too have loads of experience developing their SW for multiple architectures, even more so than Apple.

Windows NT shipped on about 7 or so architectures, including PowerPC for the Xbox 360. And they also have experience with emulation; that's how they got Xbox 360 emulation on the newer x86 models. Just read their papers on architecture emulation.

What they don't have, and Apple has, is 10+ years of experience shipping tailor-made ARM chips fit to their needs, because they always left chip design to their partners; they were always a SW company first, not a HW product company like Apple, and this isn't something they can start and catch up with at the snap of a finger.


The problem isn't even building NT for ARM/RISC-V/SH-4 or whatever, it's being able to reproduce enough of the surrounding universe that Windows on x86-64 has.

Apple has more leverage over devs; they can say "No more x86-64 in N years" and the developers basically have to move to ARM or abandon MacOS. This bootstraps the market; people who want MacOS have to suck it in and buy ARM because it's the only new hardware we'll be seeing in the future.

Microsoft doesn't control the hardware sector to put a hard deadline on new x86-64 products, and it would be suicidal to cut off the x86-64 software support at any time in the near future.

This means we'll see new x86-64 Windows machines in the store, and all the third-party apps supporting x86-64 Windows, for years to come. So as a consumer, why would I want an ARM-Windows machine? It has little exclusive software, is likely buggier and less mature, and probably runs the vast majority of x86-64 software in an emulation penalty box.


I've been using an ARM Windows machine as a daily driver (when I'm traveling) for a year. I do that because it has much better battery life than any similar Intel machine, and it has integrated 4G.


How is the software support for that? What kind of machine do you have?


> Apple has more leverage over devs; they can say "No more x86-64 in N years" and the developers basically have to move to ARM or abandon MacOS.

I think Microsoft could do the same thing as far as devs go. If Apple can exert that influence over devs with their extreme minority market share, I think MS could too. The problem for MS is their customers. Apple customers will buy whatever Apple puts out, because they're extremely loyal to the brand. The same isn't true for Microsoft, and I imagine that pressure would cause them to fold on any major changes.


>I think Microsoft could do the same thing as far as devs go

That's absolutely unrealistic. The main Windows selling point is the backward compatibility + the corporates. Nowadays Microsoft develops stuff written in Javascript, not even their own frameworks.


yeah, I wouldn't count microsoft or intel out.

arm cpu performance advantages at low power are great. i'm typing this on an m2.

but i'm sure intel will figure out how to have great cpu's at low power draws that perform well. amd already did. so i'm sure intel will.


> Windows NT shipped on about 7 or so architectures including PowerPC for the Xbox360. And they also have experience with emulation, that's how they got Xbox 360 emulation on the newer X86 models.

The entire NT multi-platform story has gone down the drain. The last non-x86 platforms were dropped around 2000 [1]; ARM only entered the picture in 2012, and even then it was mostly used for Windows Phone for many years, which itself got discontinued around 2017.

All these many thousand human-years of experience have long ago retired or went to other companies, their institutional knowledge is effectively lost for Microsoft. And that is the problem.

[1] https://en.wikipedia.org/wiki/Windows_NT#Supported_platforms


Windows 2000 shipped with x86 support only. But just one year later, Windows XP had an Itanic port at launch.


Cutler is still there :)


As per his last interview, he's nowadays busy porting GNU/Linux to the Xbox running on Azure, for AI workloads when they are idle.

Somehow there is a certain irony on that.


That is not the issue; rather, it's convincing the Windows developer community to actually care about ARM.

Traditionally Microsoft isn't like the others (Apple/Google) with their "take this or go away" attitude, which is why they became so big with enterprises in the first place.


They do have lots of experience porting Windows to multiple platforms. They don't have very good experience managing the user experience of Windows transitioning between platforms.

I was one of the early adopters of Windows on ARM, the Windows 10 native port to ARM64 (ARMv8). At release, practically the only native development tool was WinDbg -- neither Visual Studio nor Windows Performance Analyzer had been ported. You could install Visual Studio in x86 emulation mode but it wouldn't run reliably as the toolchain would keep throwing heap errors, so cross-compilation was required and debugging was harder. There was basically zero information about what was and wasn't supported -- you'd just start porting and run into something like there being no OpenGL acceleration support. Or even more fun, that there was no ARM64 version of the Visual C++ Redistributable published, so you couldn't distribute a program that was dynamically linked to the CRT -- and the Visual Studio support staff didn't even know what ARM64 was and pointed to the x64 redist. This didn't start getting ironed out until around three months after Windows on ARM machines had started shipping.

And assuming you got past these problems, Windows has no universal binary system, so it's your job to figure out how to properly get the right platform executable installed and launched without any support from the OS. This was really bad in the early days of x64, where XP would just say "invalid executable" when trying to launch a x64 program on x86; these days it displays a slightly less cryptic "Machine Type Mismatch" error dialog with no further help.

Microsoft is trying to fix these problems now, but they're years late and the amount of software available as native ARM64 is still very low. Oh, and they already dropped support for the early gen Snapdragon 835 and 850 devices in Windows 11, which means no x64 emulation or ARM64EC support, and an even tinier effective market. In contrast, Apple managed the ARM transition much, much better -- they had native tooling, documentation, and development systems lined up in advance and a much more polished user experience on day one.


Are you sure regarding ARM64EC? I doubt that they are dropping it any time soon, especially when they just announced the Arm Advisory Service.


They aren't dropping ARM64EC, it's still fully supported and how Visual Studio and Office run native. It is based on x64 emulation support, however, so it requires Windows 11. Snapdragon 835 is officially unsupported by Windows 11; 850 is conditionally supported, but IIRC at one point it didn't support hardware 3D acceleration in x64 apps.


Microsoft now has a hybrid format for ARM/x64, Arm64EC.

https://learn.microsoft.com/en-us/windows/arm/arm64ec

Besides, Windows NT has been designed for and used on multiple architectures for years; Windows CE and Pocket PC were not the only ones with ARM support - Windows IoT and the original Windows 8 WinRT tablets had it as well.

The biggest issue is the lack of incentives: for most businesses there is no ROI in installing ARM compilers alongside x64 for the .NET and C++ toolchains, and having yet another set of architectures to debug on and take into consideration.

It is as you say, unless they behave like Apple or Google, imposing a transition, most of those companies won't care.

They have recently put out a kind of ARM porting help center, but I doubt it will make any impact.

https://blogs.windows.com/windowsdeveloper/2023/10/16/window...


> unlike Apple, who famously cut ties with NVIDIA because NV didn't want Apple to write drivers for their GPUs

You're thinking about lack of support for NVidia eGPUs but the bad blood goes back to 2008 when NVidia screwed apple with failing GPUs in macbook pros and told Apple to pound sand.

https://hothardware.com/news/apple-admits-nvidia-gpu-defect-...


> The only player in town able to make ARM work on anything but smartphones and servers is Apple, and they don't (and never will, assuming regulators don't finally wake up and force them) sell to third parties.

I think that would be new territory for regulators. Have there ever been examples where they forced a vertical integrator to sell individual parts to other parties (either en masse to other manufacturers, or to consumers)?


> These things are and will be toys.

Heh, https://cdixon.org/2010/01/03/the-next-big-thing-will-start-...

«The reason big new things sneak by incumbents is that the next big thing always starts out being dismissed as a “toy.” This is one of the main insights of Clay Christensen’s “disruptive technology” theory.»

Tho Raspberry Pi computers are designed to a low price point, so they are aiming for the low end not the high end. But the 5 is about as powerful as the last-generation Intel MacBooks, despite using fairly old Arm cores. I don’t think they can be casually dismissed.


I still have an original RPi model B serving as a PiHole for my home network. It never was a toy.


I have a Pi 400 here and for the money it's most impressive. It runs pretty much the whole house in terms of heating and power management, with full autonomy using HA and a very limited bit of custom stuff. You can stick in anything at all and it 'just works' the only things that have given me a headache are Zigbee dongles, everything else worked without issue.


> there are outside of the Mac world no viable ARM desktop or laptop CPUs because Qualcomm completely fucked up that market for likely years to come

Huh citation needed? I would like to know more context on that


Qualcomm had an exclusivity deal with Microsoft for years [1], which they then used to deliver absolute crap to customers, which in combination with almost zero software being available on Windows for ARM [2] (even popular software such as Chrome...) led to these things being nice paperweights.

[1] https://www.xda-developers.com/qualcomm-exclusivity-deal-mic...

[2] https://www.digitaltrends.com/computing/why-windows-on-arm-c...


> no viable ARM desktop or laptop CPUs because Qualcomm completely fucked up that market for likely years to come

not a huge fan of Qualcomm but you might be interested in this:

"Qualcomm Snapdragon X Elite Performance Preview: A First Look at What’s to Come" https://www.anandtech.com/show/21112/qualcomm-snapdragon-x-e...


I'll believe it when I see it having been taken apart by actually independent nerds who look beyond benchmarks and actually take care if the fundamentals work properly.

The state of Qualcomm products over the last years has left me with absolutely zero confidence.


What’s their hope right now? Some sort of Quark-derived desktop CPU à la Dothan?


Coming out with better chips at lower prices. I know how obvious that sounds but it's true.

The difficulty is that they need to make nearly decade-long bets that are the size of small countries' economies, due to the complexity of manufacturing, and Intel has made a few bad ones recently.

I don't know who to listen to on what chip design will be a market win in 2030 either. AI applications are extremely resource intensive so that will be driving things for a while but how to solve that in an affordable chip created by a reliable efficient manufacturing process is beyond me. This stuff is phenomenally hard.

I'd say the winner is something like an NVIDIA graphics pipeline that is separated into a pile called "graphics" and a pile called "ai", and then has the graphics part gutted for a cheaper AI pipeline that can use system memory as opposed to preciously expensive graphics memory, and then gets integrated into their next-gen CPUs, taking Nvidia out of the loop and dealing a blow to AMD at the same time. They'd mop the floor with something like that, especially if you could just drop it into pytorch and have it work automagically. They could probably then just turn around and license it to ARM.

AMD and Nvidia wouldn't work together to mount a unified defense because of ATI and this would allow Intel to weasel their way back into the Apple money stream.

But I'm just some unemployed dude typing this on a 4 year old android. Don't listen to me.


Isn’t AI usually highly memory-bandwidth constrained? Designing a new powerful AI chip to rely on slow memory probably isn’t the best strategy.


I'd imagine when the AI dust clears there's going to be consumer and producer sides of AI and the consumer requirements will be a carveout of the producer.

This just appears to be the case with everything else. Making a video game, movie, song, computer program, etc, requires more resources than using one and there was a significant price delta between them for a long time.


wouldn’t graphics and AI both pretty much reduce to mad matrix operations?


Things like hardware raytracing probably aren't needed and there's shading units, texture mapping, and ROPs. I'm not a hardware engineer but there's probably some way Intel can rejigger its IGT (https://en.wikipedia.org/wiki/Intel_Graphics_Technology) to have the equivalent of Nvidia's "tensor cores".

They have an NVIDIA mainline competitor in their A770 but I think they should be exiting the direct nvidia assault strategy. I really don't know how that's going to work. They're basically a nonplayer https://www.videocardbenchmark.net/high_end_gpus.html

What's their winning move in that approach? It's just money in a volcano.


Graphics has more than that between texture pipelines, ROPs, more complex compute dispatch than needed for AI, etc. All of that comes with area and power costs even if you clock gate the blocks you can.


Doubling down on becoming a competitive foundry and becoming the western equivalent to TSMC. If their 18A process ships on time (2025), there's a chance that they could regain process leadership.


I can't imagine the US government letting Intel Foundry Services fail. Not so much in the banking "too big to fail" sense but a "national security interest" sense: not only are semiconductors critical to the domestic economy but having the latest nodes available for military application gives a major leg up. We see this elsewhere too; SkyWater is a kinda terrible fab (their yield for even very old processes is incredibly bad lol) but the DoD is still throwing gobs of money at them to developed their radhard process because they need some domestic, trusted vendor to turn to. IFS might not be competitive but they probably aren't going anywhere, even if they fail to deliver anywhere near on time.


Fingers crossed but they don’t have a good track record in the last 10 years. The world needs at least two good processes.


Imagine they got into making an ARM CPU competitive with Apple's? Or a server chip that is better than Ampere's, and sold at scale? I think they have the expertise and ability to do it.


A dual-mode ARM-x86 core would be an interesting CPU that they certainly have the skill to create. Thanks to the protected mode descriptor model, imagine having ARM64, ARM32, x86-32, and x86-64 code segments all coexisting in the same system, with no emulation nor virtualisation. There wouldn't be any overhead, because all instructions regardless of ISA get translated to uops anyway. They could add a RISC-V front-end too, just for completeness sake.


I can see AMD/Intel releasing RISC-V CPUs with x86 acceleration.

I do not see it with an ARM mode, because there's no such software moat stuck on that platform as there is with x86.


That sounds like dev and user experience nightmare.


I really wish people would stop applying the "flat 32-bit" revisionist history to the 386. That wasn't its obvious target; rather, it was picking up the important "capability" architecture features which were seen as the future before unix/c/risc/single-supervisor ideas destroyed the previous 30 years of mainframe/minicomputer OS research in things like security.

So, what this article fails to really clarify is that the segment registers were now basically "selector" indexes into tables with base+length (in either pages or bytes) fields, execution permission controls. And these selectors and the GDT/LDT/IDT/TSS/call gates/task gates/etc were all designed to support OSs with a 4 level permissions hierarchy, user/library/driver/kernel (or similar), passing around access selectors which could do things like enforce the size of data structures, etc. And to support this, they added FS/GS so that all the general purpose registers could have their own permissions masks.

Pause for a moment and consider that again: pointers (capabilities, aka selectors) can have not only a base address but a hardware-enforced limit, along with a permissions model, which means a function like strcpy() would be incapable of writing to any memory that wasn't the target buffer or part of its own scratch space. Languages/OSs could have enforced that called functions were unable to write to the caller's stack, or even had them run on their own completely separate stack. And that is just the beginning.
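
To make that concrete, here is a sketch (field packing per the 386 programmer's reference manual; names mine) of building one such descriptor. Load a selector for it into, say, FS, and every access through FS is then hardware-checked against the base, the limit and the DPL:

  #include <stdint.h>

  /* One 8-byte 80386 segment descriptor: base, 20-bit limit, access rights. */
  struct seg_desc {
      uint16_t limit_lo;        /* limit bits 15..0                        */
      uint16_t base_lo;         /* base  bits 15..0                        */
      uint8_t  base_mid;        /* base  bits 23..16                       */
      uint8_t  access;          /* P | DPL(2) | S | type(4)                */
      uint8_t  limit_hi_flags;  /* G | D/B | 0 | AVL | limit bits 19..16   */
      uint8_t  base_hi;         /* base  bits 31..24                       */
  } __attribute__((packed));

  /* Build a byte-granular, expand-up read/write data segment at the given DPL. */
  static struct seg_desc make_data_seg(uint32_t base, uint32_t limit, int dpl)
  {
      struct seg_desc d;
      d.limit_lo       = limit & 0xFFFF;
      d.base_lo        = base & 0xFFFF;
      d.base_mid       = (base >> 16) & 0xFF;
      d.access         = 0x80 | ((dpl & 3) << 5) | 0x10 | 0x02;  /* present, S=1, data R/W */
      d.limit_hi_flags = (limit >> 16) & 0x0F;                   /* G=0: limit counted in bytes */
      d.base_hi        = (base >> 24) & 0xFF;
      return d;
  }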

So, here, nearly 40 years later, the industry is still trying to recover from the mistakes of designing OSs and programming languages around flat memory models and simplistic user/supervisor permission models. The 386 provided hardware assistance for OS features that to this day aren't common.

ex: see CHERI.


> Pointers (capabilities, aka selectors) can have not only a base address, but a hardware enforced limit

There are only 8k possible pointers in the LDT, plus 8k in the GDT. The x86 segmented model isn't really suitable for implementing capabilities.


Sure, 40 years later, but for comparison my computer in 1990 had 1MB ram. Its replacement had 8M iirc a year or two later. I remember in the later 1990's having a problem with my socket7 computer because the caches couldn't physically tag more than 64M of ram, so everything above that was uncached. Linux of the mid 1990's would print a half dozen lines when one typed 'ps'.

A limit of 8 thousand different protection ranges would have been a lot for a program utilizing a few hundred KB of actual data and coming from a system where it was a PITA to access a data structure > 64K. It might not have been enough to do a super fine-grained implementation, but it was more than enough for the time period, and had any significant OSs used it in a meaningful way I'm sure it would have been extended when those limits were hit, as was everything else in the following products.

Oh, and also one could have reloaded the GDT, or swapped some number of LDTs at some boundary if needed. It wouldn't have really been much more disruptive in the 1980s than switching the page tables on task switch, like every modern OS.


8k LDT descriptors per task. Each LDT table also needed an entry in the GDT, limiting the number of tasks to another 8k as well. So 64k descriptors total. Thing is, most ia32 operating systems don't actually use the existing hardware model for multitasking, and instead use simpler approaches that cater to the flat memory model.


It's been a long time since I stopped following grsecurity, but I would not be surprised if segment registers are still used (e.g. PaX UDEREF).

https://forums.grsecurity.net/viewtopic.php?f=7&t=3046

https://pax.grsecurity.net/docs/PaXTeam-H2HC12-PaX-kernel-se...


I would be amazed if PaX still used segment registers, seeing as they don't really do much in long mode. In fact, they specifically call this out in the description of UDEREF in your second link.


The whole segmented protected mode stuff was introduced with the 286 already. The 386 only added FS/GS and grew segments to 32 bits.


I want to stress how important the 386SX was. My dad wanted a PC for me and asked a friend to build a 286 clone. My friend gave us a 386SX instead. "It's about the same price as a 286, but what you're getting now is a 32-bit CPU, make no mistake about it", he said, and he was right. I was able to run Win 3.11 with it. A 32-bit CPU for the price of a 286, and thus quite affordable? That was genius.


It's nice that the 286 gets some love here too; that chip was really underrated considering how much better its IPC was vs. its predecessors, which is largely the main ding against the 386. Running existing 16-bit code, the 386's IPC was basically the same as the 286's, and given its initial 12 MHz clock rate, it was pretty underwhelming. It wasn't until the clock really started to scale and people started using the 32-bit capabilities that it was anything more than an expensive DOS/286 competitor.


The 386 was released shortly before I bought my first computer as a teen. At the time, I saw both 286 and 386 PCs on the market, but the latter carried a large price premium. I wasn't sure what the difference was at the time, so I bought the 286 system. Within a couple of years I had learned the difference and greatly regretted not saving my money for a 386. Shortly after that, I started writing low-level assembler, including the system bootstrap for a toy OS, which now could not use 32-bit protected mode. In addition, 386-only games started to be released at some point, so I very much felt left out until I bought my first 486, but that wasn't until years later.


Intel evolved the x86 line beyond the 386, into the Pentium Pro and then on to amd64. Why did Motorola not do the same with the 68k?

I have seen discussion that treats it as assumed knowledge that the 68k was obsolete and needed to be replaced by PowerPC. But it seems like conjecture - I have not seen technical arguments, and the 68k seems like a cleaner architecture to ride forward than post-286 x86.


Not enough computers were sold using 68k to provide enough demand for the chips and so it couldn't compete with Intel on price and couldn't justify spending lots on research and development. Intel had the PC-compatible market and thus a huge demand for its chips.


Here's my 3rd world country perspective, from someone who grew up in the 80s:

We never had Intel chips. They were expensive. I had a Texas Instruments 286 and a Cyrix 486 years later. The computers in my dad's office were all "PC compatible". This was a university (the biology department in a God-forsaken, poor small city in Mexico). Someone got them bought because they were going to "change the world" (oh boy). We played TDCGA.EXE and PRINCE.EXE from the two 5¼-inch floppy drives. No HDD.

We heard about Apples, Commodores and other crazy computers, but they were for "gringos" or rich people.

At the same time, we shared (pirated) software as if there were no tomorrow. It spread like wildfire. And we played the heck out of id shareware.

Then you had the Mexican upper-middle class, whose dads bought a Macintosh or one of those others. They NOW had to spend all this money to get it to do something (buy software). Nobody was pirating/sharing programs for those, and the PC ones just didn't work.

So a virtuous cycle continued and we kept buying x86.

Great memories! I'm glad I was part of that dawn of the PC.


The problem wasn't technical; it was the lack of the "buckets of money" Intel had access to, combined with customers that demanded backwards compatibility.

Intel tried at least twice to move them away, to the i860 and the Itanic, and failed. So improving x86 was the winning bet.

https://news.ycombinator.com/item?id=37796469


>Were there any commercial efforts to build IDE, VESA or PCI systems around a 68k processor?

Sort of. There was the VME bus: an attempt at a standards-based bus that would work across vendors, and also for the 88k CPU. It wasn't wildly successful, but it was mildly successful.


There was also Apollo. Their Domain 3000 series was pretty close to PC architecture (ISA bus). Unfortunately, HP ate them.


Maybe it's just me, but this sounds pretty revisionist wrt 'most important'. If the 8086/8088 hadn't stumbled into ubiquity via the IBM PC, there probably never would have been an 80286, much less an 80386. YMMV.

That said...the 386 was a world-changing engineering achievement, and as much as I think in a just and fair timeline the 68030 would have taken over the world ( :-) ), you can't discount what Intel did.


The 386 maybe, but the 286 was already completed by the time the IBM PC launched if not before.


The IBM PC launched in mid 1981, which means the 8088 was fixed in the design at least, what, a year before if not more? The 286 shipped in 1984. Are you saying the 286 was a completed design in, say, 82-83, or that Intel was saying "this is what we think the 286 will look like" in that timeframe?


The 286 launched in early 1982, just a few months after the IBM PC. 1984 is when IBM put it in the PC AT.


The '386 was the first in that line to support demand-paged virtual memory, which opened up a lot of things an OS could do. IMO that's the most important thing the '386 provided. My second PC was a '386 that ran SCO UNIX. (The first was an 8080-based Heathkit H-8 that ran CP/M.)


> Bob Childs, one of the architects of the 286, worked underground to lay out some ideas of what could be a 32-bit extension to the 286. After about six months

How the hell does _that_ happen? I've never had a job in 30 years where I didn't have somebody breathing down my neck to produce something tangible every couple of _days_.


Interestingly, I've never owned a 386. I was not much of a DOS/Win PC person until 1994, when I got a 486 to run NEXTSTEP. I've had an i8088, NEC V20, i80286, i80486, AMD 5x86, P4, and then on with the Mac starting in 2006 with (well, just before) Intel Core microarch. In those early days I was more Amiga, ST, etc. ( https://bytecellar.com/the-list/ )

I felt more spiritually connected to the MC68K line back when, for lack of a better term.

Amusingly the i386 system I spent the most time with was in college in 1993/4 on a Sun 386 tower running SunOS or Solaris in the APCS lab.


While I got the 80386 programmer's manual as one of my teen birthday presents, I only ever actually did a little ASM programming, but I loved that book anyway and read it a lot.

Really annoyed at myself that I got rid of it in some fit of "well, I'll never use that again" cleaning at some point.. especially given I somehow still have "Sendmail, edition 2".

I might read the 386 book for nostalgia.. the Sendmail one..well, PTSD isn't something you get nostalgic about!


I’ll dream in M4 tonight, thank you…


How different are the instruction sets of the 80486 and Pentium from the 386? Put another way, had the instruction set been frozen as of the 386 (barring any required changes for 64-bit), would we notice any difference in performance today?


80486 and Pentium added relatively few instructions to the core instruction set, but a couple of pretty important tools were among them. The ones you'd miss the most would probably be:

* CMPXCHG (486). Central to multiprocessor synchronization and locking (see the sketch after this list).

* CPUID (Pentium). Admittedly, if the instruction set were frozen you wouldn't need this... but if not, it's how you detect what CPU you're running on and what it supports.

* RDMSR/WRMSR (Pentium, kernel only). A general-purpose mechanism for adding extra special-purpose registers to the CPU without having to allocate an instruction to each one.

* INVD/WBINVD/INVLPG (486, kernel only). This was the first Intel CPU to support cache; these instructions were used to manage it.
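
To give a sense of what CMPXCHG buys you, here is a minimal spinlock sketch in C (my illustration, nothing from the article) using the GCC/Clang __atomic builtins, which the compiler lowers to LOCK CMPXCHG on x86:

    typedef struct { int locked; } spinlock_t;

    static void spin_lock(spinlock_t *l) {
        for (;;) {
            int expected = 0;
            /* Atomically swap 0 -> 1; succeeds only if nobody holds the lock. */
            if (__atomic_compare_exchange_n(&l->locked, &expected, 1, 0,
                                            __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
                return;
            /* Spin on plain loads until the lock looks free, then retry the CAS. */
            while (__atomic_load_n(&l->locked, __ATOMIC_RELAXED))
                ;
        }
    }

    static void spin_unlock(spinlock_t *l) {
        __atomic_store_n(&l->locked, 0, __ATOMIC_RELEASE);
    }

On a 386 you would have had to build the equivalent out of XCHG-based test-and-set or by disabling interrupts, which is part of why so many kernels later drew the line at the 486.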


CPUID was available on late-model 486s too.


Yes.

For one, no one has mentioned that the 386 itself had no on-die hardware floating point. That seems like a huge one.

Even if only a few instructions were added to the "core", some are huge, like CMPXCHG, CMOV, and although not an instruction itself the LOCK prefix.

But the extensions are huge. We don't even use the floating point instructions of the 386/387 era any more. MMX was pretty lame, but SSE and AVX are critical. AES-NI is now necessary for most people, with FDE commonplace.
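
As a tiny illustration of why those vector extensions matter (a sketch of mine assuming any SSE-capable x86 compiler, nothing 386-era): one ADDPS performs four float additions at once.

    #include <stdio.h>
    #include <xmmintrin.h>

    int main(void) {
        __m128 a = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
        __m128 b = _mm_set_ps(40.0f, 30.0f, 20.0f, 10.0f);
        __m128 c = _mm_add_ps(a, b);   /* one ADDPS = four adds */

        float out[4];
        _mm_storeu_ps(out, c);
        printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]); /* 11 22 33 44 */
        return 0;
    }

AVX widens the same idea to eight floats per instruction (sixteen with AVX-512).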


> although not an instruction itself the LOCK prefix.

From what I can tell, the 386 had the LOCK prefix. Pin 26 (bottom left) is LOCK# driven by the LOCK prefix [1]. But CMPXCHG is very useful and wasn't available until 486, and Pentium added some other stuff that's important.

[1] https://www.eeeguide.com/intel-80386-pin-diagram-description...


The 8086 had the LOCK prefix. Intel was thinking of multiprocessing from the beginning.


My mistake.


The 486 added XADD, BSWAP, CMPXCHG, INVD, WBINVD, INVLPG to the instruction set.

The original Pentiums added CPUID, CMPXCHG8B, RDTSC, RDMSR, WRMSR, RSM to the instruction set.

Later Pentiums added the MMX instruction set.
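
A hedged sketch of how two of those Pentium-era additions surface from C today, using GCC's <cpuid.h> helper and the RDTSC intrinsic (feature-bit positions are those of CPUID leaf 1):

    #include <stdio.h>
    #include <cpuid.h>
    #include <x86intrin.h>

    int main(void) {
        unsigned eax, ebx, ecx, edx;
        if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
            printf("TSC:       %s\n", (edx & (1u << 4)) ? "yes" : "no");
            printf("CMPXCHG8B: %s\n", (edx & (1u << 8)) ? "yes" : "no");
        }

        unsigned long long t0 = __rdtsc();
        /* ... code being timed ... */
        unsigned long long t1 = __rdtsc();
        printf("cycles elapsed (roughly): %llu\n", t1 - t0);
        return 0;
    }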


Vector instructions are the obvious thing, bringing massive gains to media, cryptography, math, AI, graphics, and signal processing.

Beyond that there have been a few additions like CMOV (conditional move) that would be missed, though instruction fusion in pipelines can sometimes achieve the same speedup.

Lastly you would have to add some atomic instructions to support SMP.


CMOV is very important for high-performance programming as it greatly simplifies the design of branchless code. There are workarounds, but they either involve conditional branches (you don't want these) or increase the critical path latency significantly (the simplest workaround is to materialise the carry flag using SBB, then use that as a mask).
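
To illustrate with a sketch (mine, deliberately simplified): two branchless ways to compute min(a, b) in C. The ternary form is what compilers typically lower to CMP+CMOV; the mask form is the SBB-style workaround with the longer dependency chain described above.

    #include <stdint.h>

    static uint32_t min_cmov(uint32_t a, uint32_t b) {
        return (a < b) ? a : b;           /* usually compiles to CMP + CMOV */
    }

    static uint32_t min_mask(uint32_t a, uint32_t b) {
        /* mask is all-ones when a < b, all-zeros otherwise */
        uint32_t mask = -(uint32_t)(a < b);
        return (a & mask) | (b & ~mask);  /* extra ops on the critical path */
    }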


Even besides new instructions, the 486 and then the Pentium ran existing x86 code faster than the 386, clock for clock. Various new instructions did add capability, but just running existing code faster was a huge win on those subsequent chips.


Yeah. Not many new instructions, but many instructions required fewer cycles, and the 486 also got way bigger caches. And the integrated FPU (IIRC the FPU was an add-on 387 chip for the 386).

Switching from a 386 to a 486 was bringing a huge speedup back then.


I think the early Pentiums are pretty close to the instruction set of the 80386. However, there were many iterations of the Pentium that added new instructions, like MMX.


In specialised applications, you'd definitely miss AVX and SSE. Beyond that, I'm not sure.


I'm shocked nobody mentions it: 386 added the turbo button.


Because it's not true. I had a 286 with a turbo button. https://en.wikipedia.org/wiki/Turbo_button


I saw an 8088 that had a frequency switch, but it was on the back and not labeled as "Turbo".


That was already present in 286 and even higher clocked 8088 and 8086 machines.



The 386 (and actually, the 286) were great designs because they kept backward compatibility. 386 assembly isn't fun to write, and memory management with segment registers is gross. But backward compat is worth it.


I had forgotten how powerful the i960 was - and how this demonstrated that, despite that power, compatibility was king.


The i960 apparently had enough embedded use, e.g. in printers, switches, terminals. That is, where binary compatibility did not matter much.

We under-appreciate how little binary compatibility matters now, so that you can even develop something on an ARM-based machine and then rebuild and safely deploy to an x64-based machine (usually because it's Node, JVM, Python, etc).


The problem with the i960 was that (according to the last comment on https://www.righto.com/2023/07/the-complex-history-of-intel-... ) the 386 team got more resources and a better process node than the i960 team; the i960 was produced on a 1.0-micron process, which was already old at that time.


See also: Itanium. Volume customers care about software, not hardware.


Nice write-up. Re-reading about the evolution and complexity of x86 makes me wonder about attempts to modernize x86. Does anyone know how Intel’s x86-S proposal to do a cleaned-up 64-bit architecture has been received? I looked for updates in the media but haven’t been able to find anything.


I seem to remember I didn’t think it was radical enough but a good start.


So many times it is some small team working on an unknown, disregarded project, from Fallout to the Mac Intel transition, that a company is saved by. The best companies surely must know this and, knowing it, allow these tiny disregarded projects to exist on purpose.


The 80386 DX was a revolutionary CPU. It certainly foreshadowed the 486 and ultimately the Pentium. Most people I know only had an 80386 SX, which was still revolutionary but hid it well by being essentially a glorified (but slower) 80286 on the outside.


The SX may have been slower, but it could still run all 386 software, which was a huge advantage over the 286. I had a 16 MHz 286 and I so badly wanted a 16 MHz 386SX so I could run 32-bit software.


Superseding a ZX Spectrum, my first PC was luckily a Siemens-Nixdorf 386SX @ 16 MHz, no FPU, with 2 MB of RAM and a 40 MB hard drive: https://www.ebay.com/itm/172038842293

I did install Windows on it briefly from what I recall but wasn't impressed, there wasn't much to do with it. Games would be pure DOS and for programming I'd use Borland Pascal so again DOS.

But as a gaming machine it ran anything I could throw at it at the time, which was actually 286 games. Without realizing it, I had the absolute best "286" machine I could have; for DOS gaming it is apparently much better to play them on a 386 (Why you don't want a vintage 286 PC -- but I like mine anyway): https://www.youtube.com/watch?v=Htbvm5_NZHc


The 386SX could do everything the 386DX did, just slower. My first Linux box was a 386SX machine. Before Linux, I ran Coherent on it: https://en.wikipedia.org/wiki/Coherent_(operating_system)


There are dozens of us! Dozens!

I paid $99 for Coherent because Linux didn't support the fancy RLL hard drive in my work computer. Eventually Linux caught up (or I got a new computer) and Slackware replaced Coherent (for little things like X11, better networking etc.). Those were the days :)


Coherent sounds really cool, I've never heard of it before. I wish I had known about it back in the day.

I had a 386DX, 25 MHz with 4MB and ran Slackware 3.0 with kernel 1.2.13 on it. It worked pretty nicely for me, but I have to say that I spent most of my time on the console. X11 did run but it was too slow to be fun.


Coherent was quite impressive for the early 90's! It had some limitations though, the major one being no networking built in. I remember using a 3rd party app (KA9Q, I think?) to give me "user level" networking so I could connect to another system over SLIP. I had an early home network built out of serial cables.

Once Linux stabilized, the writing was on the wall...


Another Coherent fan here. That was an awesome system. And for once: awesome documentation. The Coherent book long outlived the Coherent system for me.


I totally agree! I still remember the cover of that book, with the picture of the shell on it. I learned most of the "POSIX" APIs from it before I moved to Linux.


I still have all my Coherent manuals and disks. I still referenced the manual for years after I moved on to Linux.


I always found it interesting how the 286 and 386 seem to have been designed with Multics in mind. The hardware is not a perfect match, but it's weirdly close.


I just found some other great articles on this website as well.


I came here to say that. Lots of good articles in there on retro systems.


crazy stuff



