
I used to hate Python too, but I completely 180'd in the past few years as I learned more about computer science.

If you look at compute in general, it can pretty much be summed up as add/multiply/load/store/compare/jump (straight from Jim Keller). All the other instructions are more specialized versions of those, with some having dedicated hardware in CPUs.
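
To make that concrete, here's a toy Python sketch (purely illustrative, the function and values are made up) showing how even a trivial dot product decomposes into those six primitives:

    # toy illustration: a dot product in terms of Keller's six primitives
    def dot(xs, ys):
        total = 0.0                     # store
        i = 0                           # store
        while i < len(xs):              # compare + jump (the loop branch)
            total += xs[i] * ys[i]      # load, load, multiply, add, store
            i += 1                      # add, store
        return total

    print(dot([1, 2, 3], [4, 5, 6]))    # -> 32.0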

If you need to do all six of those things as fast as possible on a single piece of hardware, you are most likely writing a video game. That's why game development is pretty much all C/C++, with a bit of Swift/C# sprinkled about.

Once the single-piece-of-hardware requirement goes away (i.e. you are writing a distributed system to serve a web app), hardware is cheaper than developer salary, and network latency dominates speed anyway. That's why Python took off: it's super quick to write and deploy applications in, and instead of paying a developer $10k+ a month you can spend half that on more EC2 instances that handle the load just fine, even if the end user waits 1.5 seconds instead of 1.1 for a result.

If you don't need compare/jump, your program is better suited to running on a GPU. OpenCL/CUDA came about because people realized that a lot of applications just need to do math without making decisions along the way, and GPUs are much better at that. The paradigm is that you write kernels and load them onto the GPU; the host side can be any language, since it really just has to launch the code. That's why Python, despite being slow, is the primary language for ML.
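
A minimal sketch of that host/kernel split, assuming CuPy is installed and a CUDA-capable GPU is present (the kernel name saxpy and the sizes are just picked for illustration):

    import cupy as cp  # assumption: CuPy + a CUDA GPU available

    # the kernel is plain CUDA C; the host language only compiles and launches it
    saxpy = cp.RawKernel(r'''
    extern "C" __global__
    void saxpy(const float* x, const float* y, float* out, float a, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = a * x[i] + y[i];   // pure multiply/add per element
    }
    ''', 'saxpy')

    n = 1 << 20
    x = cp.arange(n, dtype=cp.float32)
    y = cp.arange(n, dtype=cp.float32)
    out = cp.empty_like(x)
    threads = 256
    blocks = (n + threads - 1) // threads
    saxpy((blocks,), (threads,), (x, y, out, cp.float32(2.0), cp.int32(n)))
    print(out[:4])   # -> [0. 3. 6. 9.]

The slow Python part runs once to set things up; the hot loop is entirely on the GPU.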

Then there is multiply/add only, which you probably know best from the bitcoin-mining ASICs that blew GPUs out of the water. When you don't have general memory controllers and just load/store from predefined locations, your speed goes through the roof. This is the future of ML chips as well, where the compiler starts to look a lot like the Verilog/HDL toolchains for FPGAs.
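
A toy model of that fixed-schedule style (the coefficients are made up, and no real ASIC is programmed this way; it's only to show the idea of operands coming from hard-wired locations with no address arithmetic or branching):

    # burned-in coefficients, like an ASIC's ROM
    WEIGHTS = [0.5, -1.0, 2.0, 0.25]

    def fixed_mac(inputs):
        # the "schedule" is fully unrolled at design time:
        # four multiplies and three adds, nothing else
        return (inputs[0] * WEIGHTS[0]
                + inputs[1] * WEIGHTS[1]
                + inputs[2] * WEIGHTS[2]
                + inputs[3] * WEIGHTS[3])

    print(fixed_mac([1.0, 2.0, 3.0, 4.0]))   # -> 5.5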

Furthermore, with ML, the compare/jump and even the load/store are being rolled into multiply/add. You have seemingly complex algorithms like GPT that make decisions, but without any branching. Technically speaking, NAND gates are all you need to build a general-purpose CPU, and a single artificial neuron can compute NAND. So in principle you can build an entire general-purpose CPU from multiply/add.
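
Here's a minimal sketch of that last point: one sigmoid "neuron" with hand-picked (not learned) weights computing NAND using only multiplies, adds and a smooth activation, with no data-dependent branch anywhere:

    import math

    W = (-20.0, -20.0)   # hand-picked weights
    B = 30.0             # hand-picked bias

    def neuron_nand(a, b):
        z = W[0] * a + W[1] * b + B          # multiply/add only
        return 1.0 / (1.0 + math.exp(-z))    # smooth activation, still no branch

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, round(neuron_nand(a, b)))   # prints the NAND truth table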

So in the end, it's absolutely worth investing in Python and making it better. Languages like Julia are currently better suited to performance-sensitive work, but the need to write performant CPU code is slowly going away every day. It's better to have a high-level, general-purpose language that lets you put ideas into code as quickly as possible, and then hand the performance-critical pieces off to specialized backends.


