C code needs to be updated to be safe in a GIL-free execution environment. It is a lot of work! The pervasive problem is that mutable data structures (lists, dicts etc) could change at any arbitrary point while the C code is working with them, and an object's reference count could drop to zero while *anyone* still holds a borrowed reference to it (borrowed references are common for performance in CPython APIs). Previously the GIL constrained where those changes could happen. In simple cases the fix is adding a critical section, but often there are multiple data structures in play. As an example these are the changes that had to be done to the standard library json module:

https://github.com/python/cpython/pull/119438/files#diff-efe...

This is how much of the standard library has been audited:

https://github.com/python/cpython/issues/116738
The json changes above are in Python 3.15, not the just-released 3.14.
The consequences of the C changes not being made are crashes and corruption if unexpected mutation or object freeing happens. Web services are exposed to adversaries, so be *very* careful.
It would be a big help if CPython released a tool that could at least scan a C code base to detect free threaded issues, and ideally verify it is correct.
I think Java got this mostly right. On the threading front, very little is thread-safe or atomic (x += 1 is not thread-safe), so as soon as you expose something to threads, you have to think about safe access. For interacting with C code, your choices are either shared buffers or copying data between C and Java. It's painful, but it's needed for memory safety.
The core Python data structures are atomic to Python developers, e.g. there is no way you can corrupt a list or dictionary no matter how much concurrency you try to use. This was traditionally done under the protection of the global interpreter lock, which ensured that only one piece of C code at a time was operating on the internals of those objects. C code can also release the GIL, e.g. during I/O or operations in other libraries that aren't interacting with Python objects, allowing concurrency.
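To make that concrete, here is a minimal sketch: eight threads hammering one list. The list itself never corrupts or crashes the interpreter, with or without the GIL, and every append lands exactly once.

    import threading

    items = []

    def worker(n):
        for i in range(n):
            items.append(i)  # safe at the object level under GIL or per-object locks

    threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    assert len(items) == 8 * 100_000  # no torn state, nothing lost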
The free threaded implementation adds what amounts to individual object locks at the C level (critical sections). This still means developers writing Python code can do whatever they want, and they will not experience corruption or crashes. The base objects have all been updated.
Python is popular because of the many extensions written in C, including many in the standard library. Every single piece of that code must be updated to operate correctly in free threaded mode. That is a lot of work and is still in progress in the standard library. But in order to make the free threaded interpreter useful at this point, some modules have been marked as free-thread-safe when that is not yet the case.
So it's the worst of all possible worlds then. It has the poorest performance due to forced locking even when it isn't necessary, and if you load a library in another language (C), then you can still get corruption. If you really care about performance, it's probably best to avoid Python entirely, even when it's compiled like it is in CPython.
PS For extra fun, learn what the LD_PRELOAD environment variable does and how it can be used to abuse CPython (or other things that dynamically load shared objects).
A library written in another language would have a Python extension module wrapping it, which would still hold the GIL for the duration of the native call (it can be released, but this is opt-in not opt-out), so that is usually not the issue with this arrangement.
The bigger problem is that it teaches people dangerously misguided notions such as "I don't need to synchronize if I work with built-in Python collections". Which, of course, is only true if a single guaranteed-atomic operation on the collection actually corresponds to a single logical atomic operation in your algorithm. What often happens is people start writing code without locks and it works, so they keep doing it until at some point they do something that actually requires locking (like atomic remove from one collection & add to another) without realizing that they have crossed a line.
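A minimal sketch of that line being crossed: each list operation below is atomic on its own, so nothing crashes, but the invariant "every item is in exactly one of the two lists" is briefly false between the pop and the append, and another thread can observe that gap.

    import threading

    pending = list(range(1000))
    done = []
    lock = threading.Lock()

    def move_unsafe():
        # pop() and append() are each atomic, but the *pair* is not:
        # an item transiently belongs to neither list.
        while pending:
            try:
                item = pending.pop()
            except IndexError:
                break
            done.append(item)

    def move_safe():
        # Hold a lock so the two steps form one logical operation.
        while True:
            with lock:
                if not pending:
                    break
                done.append(pending.pop())

    threads = [threading.Thread(target=move_safe) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    assert sorted(done) == list(range(1000))  # nothing lost or duplicated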
Interestingly, we've been there before, multiple times even. The original design of Java collections entailed implicit locking on every operation, with the same exact outcome. Then .NET copied that design in its own collections. Both frameworks dropped it pretty fast, though - Java in v1.2 and .NET in v2.0. But, of course, they could do it because the locking was already specific to collections - it wasn't a global lock used for literally every language object, as in Python.
It is multiple fine-grained locks versus a single global lock. The latter requires less locking, but allows only a single thread of execution at a time. The former requires more locking but allows multiple concurrent threads of execution. There is no free lunch. But hardware has become parallel, so something has to be done to take advantage of that. The default Python remains the GIL version.
The locking is all about reading and writing Python objects. It is not applicable to outside things like external libraries. Python objects are implemented in C code, but Python users do not need to know or care about that.
As a Python user you cannot corrupt or crash things by code you write no matter how hard you try with mutation and concurrency. The locking ensures that. Another way of looking at Python is that it is a friendly syntax for calling code written in C, and that is why people use it - the C code can be where all the performance is, while retaining the ergonomic access.
C code has to opt in to free threading - see my response to this comment
It is true that more fine-grained locking can end up being done than is strictly necessary, but users' code is loaded at runtime, so you don't know in advance what could be omitted. And this is the beginning of the project - things will get better.
Aside: Yes you can use ctypes to crash things, other compiled languages can be used, concurrency is hard
It depends on how you define "corruption". You can't get a torn read or write, or mess up a collection to the point where attempts to use it will segfault, sure. You can still end up with corrupt data in a sense of not upholding the expected logic invariants, which is to say, it's still corrupt for any practical purpose (and may in turn lead to taking code paths that are not supposed to ever happen etc).
> If you really care about performance, probably best to avoid Python entirely
This has been true forever. Nothing more needs to be said. Please, avoid Python.
On the other hand, I’ve never had issues with Python performance, in 20 years of using it, for all the reasons that have been beaten to death.
It’s great that some people want to do some crazy stuff to CPython, but honestly, don’t hold your breath. Please don’t use Python if Python interpreter performance is your top concern.
Arguably, it's a step in the wrong direction. Share memory by communicating is already doable in Python with Pipe() and Queue() and side steps the issue entirely.
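For example, a minimal sketch with multiprocessing: the workers never touch shared state, all data moves through Queues, so there is nothing to lock and nothing to corrupt.

    from multiprocessing import Process, Queue

    def worker(inbox, outbox):
        while True:
            item = inbox.get()
            if item is None:      # sentinel: no more work
                return
            outbox.put(item * item)

    if __name__ == "__main__":
        inbox, outbox = Queue(), Queue()
        procs = [Process(target=worker, args=(inbox, outbox)) for _ in range(4)]
        for p in procs:
            p.start()
        for i in range(100):
            inbox.put(i)
        for _ in procs:
            inbox.put(None)       # one sentinel per worker
        results = [outbox.get() for _ in range(100)]
        for p in procs:
            p.join()
        assert sorted(results) == [i * i for i in range(100)]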
In Java the increment is not atomic whether 'someField' is volatile or not. The volatile just affects the load/store semantics of the GETFIELD/PUTFIELD ops. For atomic increment you have to go through something like AtomicInteger that will internally use an Unsafe instance to ensure it emits a platform-specific atomic increment instruction.
It is in Python and Javascript (in the browser) due to their single-threaded nature. C++ too, as long as you have a std::atomic on the left-hand side (since it overloads the operator).
There is NB_INPLACE_ADD... but I'm struggling to find enough details to be truly confident :\ possibly its existence is misleading other people (thus me) into thinking += is a single operation in bytecode.
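One way to check is to disassemble it. It comes out as several instructions (load, add, store), not one, so a thread switch (or, free-threaded, another core) can land between the read and the write:

    import dis

    def bump(x):
        x += 1
        return x

    dis.dis(bump)
    # Roughly, on CPython 3.12 (exact opcodes vary by version):
    #   LOAD_FAST   x
    #   LOAD_CONST  1
    #   BINARY_OP   13 (+=)   <- NB_INPLACE_ADD is the C-level slot behind this
    #   STORE_FAST  x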
> It would be a big help if CPython released a tool that could at least scan a C code base to detect free threaded issues, and ideally verify it is correct.
Create or extend a list of answers to:
What heuristics predict that code will fail in CPython's nogil "free threaded" mode?
But as an example, neither includes PySequence_Fast, which is in the json.c changes I pointed to. The folks doing the auditing of the stdlib do have an idea of what they are looking for, and so would be best suited to keep a list (and tool) up to date with what is needed.
A list of Issue and PR URLs that identify and fix free threading issues would likely also be of use for building a 2to3-like tool to lint and fix C extensions to work with CPython free threading nogil mode
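As a starting point, such a linter could be little more than a grep for C API calls that hand out borrowed references or assume GIL-protected invariants. A minimal sketch; the SUSPECT_CALLS list is my own illustrative guess, not an official heuristics list:

    import re
    import sys
    from pathlib import Path

    SUSPECT_CALLS = [
        "PySequence_Fast",       # items can mutate under you
        "PyList_GET_ITEM",       # unchecked macro, borrowed reference
        "PyList_GetItem",        # borrowed reference
        "PyDict_GetItem",        # borrowed reference
        "PyDict_GetItemString",  # borrowed reference
        "PyTuple_GET_ITEM",      # unchecked macro, borrowed reference
    ]

    pattern = re.compile(r"\b(" + "|".join(SUSPECT_CALLS) + r")\s*\(")

    def scan(path):
        hits = 0
        for lineno, line in enumerate(path.read_text(errors="replace").splitlines(), 1):
            if pattern.search(line):
                print(f"{path}:{lineno}: {line.strip()}")
                hits += 1
        return hits

    if __name__ == "__main__":
        # scan the directories given on the command line
        total = sum(scan(p) for d in sys.argv[1:] for p in Path(d).rglob("*.c"))
        print(f"{total} suspect call site(s) found")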
I agree and honestly it may as well be considered a form of ABI incompatibility. They should make this explicit such that existing C extensions need to be updated to use some new API call for initialization to flag that they are GILless-ready, so that older extensions cannot even successfully be loaded when GIL is disabled.
This has already been done. There is a 't' suffix in the ABI tag.
You have to explicitly compile the extension against a free threaded interpreter in order to get that ABI tag in your extension and even be able to load the extension. The extension then has to opt-in to free threading in its initialization.
If it does not opt-in then a message appears saying the GIL has been enabled, and the interpreter continues to run with the GIL.
This may seem a little strange but is helpful. It means the person running Python doesn't have to keep regular and free threaded Python around, and duplicate sets of extensions etc. They can just have the free threaded one, anything loaded that requires the GIL gives you the normal Python behaviour.
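You can check what you actually got from Python itself. A minimal sketch, assuming CPython 3.13+ where sysconfig's Py_GIL_DISABLED config var and sys._is_gil_enabled() are available:

    import sys
    import sysconfig

    # 1 on free-threaded builds, 0/None on regular builds
    print("free-threaded build:", bool(sysconfig.get_config_var("Py_GIL_DISABLED")))

    # False only if the GIL is truly off (no loaded extension re-enabled it)
    if hasattr(sys, "_is_gil_enabled"):
        print("GIL currently enabled:", sys._is_gil_enabled())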
What is a little more problematic is that some of the standard library is marked as supporting free threading, even though they still have the audit and update work outstanding.
Also the last time I checked, the compiler thread sanitizers can't work with free threaded Python.
the problem with that is it affects the entire application and makes the whole thing free-threading incompatible.
it's quite possible to make a python app that needs libraries A and B to be loadable into a free-threaded application, but which doesn't actually do any unsafe operations with them. we need to be able to let people load these libraries, but say: this thing may not be safe, add your own mutexes or whatever
SQLite has a builtin session extension that can be used to record and replay groups of changes, with all the necessary handling. I don't necessarily recommend session as your solution, but it is at least a good idea to see how it compares to others.
That provides a C level API. If you know Python and want to do some prototyping and exploration then you may find my SQLite wrapper useful as it supports the session extension. This is the example giving a feel for what it is like to use:
Unless you compile SQLite yourself, you'll find the maximum mmap size is 2GB. ie even with your pragma above, only the first 2GB of the database are memory mapped. It is defined by the SQLITE_MAX_MMAP_SIZE compile time constant. You can use pragma compile_options to see what the value is.
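For example, from the stdlib sqlite3 module (the exact options listed vary by build):

    import sqlite3

    con = sqlite3.connect(":memory:")
    opts = [row[0] for row in con.execute("PRAGMA compile_options")]
    print([o for o in opts if "MMAP" in o])
    # e.g. ['MAX_MMAP_SIZE=2147483648'] on a stock build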
SQLite has 32 bit limits. For example the largest string or blob it can store is 2GB. That could only be addressed by an incompatible file format change. Many APIs also use int in places again making limits be 32 bits, although there are also a smattering of 64 bit APIs.
Changing this default requires knowing it is a 64 bit platform when the C preprocessor runs, and would surprise anyone who was ok with the 2GB value.
There are two downsides of mmap - I/O errors can't be caught and handled by SQLite code, and buggy stray writes by other code in the process could corrupt the database.
It is best practice to include the SQLite amalgamation directly in your own projects, which allows you to control version updating and configuration.
The general test suite is not proprietary, and is a standard part of the code. You can run make test. It uses TCL to run the testing, and covers virtually everything.
There is a separate TH3 test suite which is proprietary. It generates C code of the tests so you can run the testing in embedded and similar environments, as well as coverage of more obscure test cases.
This isn't an issue as SQLite doesn't accept contributions because they don't want to risk someone submitting proprietary code and lying about its origin.
I've never understood why other large open-source projects are willing to accept contributions from anyone. What's the plan when someone copy-pastes code from some proprietary codebase and the rights holder finds it?
If the rights holder is particularly litigious then I could see them suing even if you agreed to take out their code, under the argument that you've distributed it and profited from it. I don't know if there have been any cases of this historically, but I'd be surprised if there haven't been.
I'd love to see an analysis of byte ordering impact on CPU implementation. Does little vs big endian make any difference to the complexity of the algorithms and circuits?
This is how Apple addresses its audio hardware, for both microphones and speakers. Instead of trying to make speakers that have the desired frequency response, or microphones that produce the desired signal, they let the analog hardware do whatever it does.
Then in software they use digital signal processing. For speakers they modify what gets sent to the hardware so that the actual output then does match the frequency response, and for the microphones they do this work to extract the desired signal.
If Linux addressed the speakers as is, you would get unpleasant sound, and if it read the microphones as is, it would get a lot of noise. That is why Asahi had to add digital signal processing to the audio input and output, to get the "correct" audio.
It does mean the processing is specific to the analogue audio hardware in each of the different Mac models.
The processing could be done in additional hardware, but why bother when you have a very good CPU that can do the work.
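For a feel of what that software correction looks like, here is a minimal sketch: a peaking-EQ biquad (coefficients per the Audio EQ Cookbook) that boosts a dip in the response before samples reach the DAC. The 2 kHz center, +6 dB gain, and Q value are made-up illustration numbers, not Apple's or Asahi's actual curves.

    import numpy as np
    from scipy.signal import lfilter

    def peaking_eq(fs, f0, gain_db, q):
        # Biquad peaking-EQ coefficients from the Audio EQ Cookbook
        A = 10 ** (gain_db / 40)
        w0 = 2 * np.pi * f0 / fs
        alpha = np.sin(w0) / (2 * q)
        b = np.array([1 + alpha * A, -2 * np.cos(w0), 1 - alpha * A])
        a = np.array([1 + alpha / A, -2 * np.cos(w0), 1 - alpha / A])
        return b / a[0], a / a[0]

    fs = 48_000
    t = np.arange(fs) / fs
    samples = 0.5 * np.sin(2 * np.pi * 2000 * t)   # test tone at the dip
    b, a = peaking_eq(fs, f0=2000, gain_db=6.0, q=1.0)
    corrected = lfilter(b, a, samples)             # what actually gets played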
> For speakers they modify what gets sent to the hardware so that the actual output then does match the frequency response
As I understand, this is not a magic pill: it probably won't help to pull up frequencies which are suppressed by 30-40 dB, and I assume that if the frequency response graph is too wavy (lots of narrow peaks and dips), it won't help either.
Also, you need to have calibration files to use this method, right?
Yes, you need calibration files for supported models. You can see the details and explanation at the asahi-audio repository. They also criticize the macOS curves, and point out how some Windows vendors are doing the same DSP approach.
By the way, I now realize that simply adding an equalizer before the amp might not be enough; speakers typically produce different sound in different directions, so for perfect sound you would need to somehow track the location of the head and adjust the filter curves.
SQLite has a session extension that can record changes on a local database into a changeset and you can replay those changes on another SQLite instance. Note that it replays what the changes were, not the queries that resulted in the changes. When applying changes you provide a conflict handler. (You can also invert changesets making a handy undo/redo feature.)
You can save conflicts to another changeset. There is also a rebaser to help deal with multiple way syncing.
there's also a CRDT version of this, which allows two databases to be synced with each other in real time (aka, updates to one will eventually make it to the other, and both databases would eventually contain the same data).
It's https://vlcn.io/docs/cr-sqlite/intro , and i find it amazing that this is doable in sqlite. It is perfect for small scale collaboration imho, but it also works to sync across local client and remote server (for a single db per user scenario).
Interesting link - it'd be great if their solution meets expectations.
Right now, the proof-of-concept they've provided seems simplistic. Their focus seems to have shifted from cr-sqlite to "Zero" instead. I'm guessing it has something to do with CRDTs being quite app-specific and hard to generalize.
I would want to see this library used in production first before hyping it
in a sense it is quite specific. In a different sense, this is as generic a CRDT as you can get - it's a CRDT on table(s). There's no merging of rows iirc (unless you write a custom merge, which is supported but probably needs some tweaking and could lead to poor results?).
My sweet spot is working as a developer in a small team that has to do all the work to ship the product, using Python for practical and productive development, and as glue, C for performance and lower levels, and domain specific tools and languages when necessary.