C code needs to be updated to be safe in a GIL-free execution environment. It is a lot of work! The pervasive problem is that mutable data structures (lists, dicts etc) could change at any arbitrary point while the C code is working with them, and an object's reference count could drop to zero while *anyone* still holds a borrowed reference to it (borrowed references are common for performance in CPython APIs). Previously the GIL constrained where those changes could happen. In simple cases the fix is adding a critical section, but often there are multiple data structures in play. As an example these are the changes that had to be done to the standard library json module:

https://github.com/python/cpython/pull/119438/files#diff-efe...

This is how much of the standard library has been audited:

https://github.com/python/cpython/issues/116738
The json changes above are in Python 3.15, not the just-released 3.14.
The consequences of the C changes not being made are crashes and corruption if unexpected mutation or object freeing happens. Web services are exposed to adversaries, so be *very* careful.
It would be a big help if CPython released a tool that could at least scan a C code base to detect free threaded issues, and ideally verify it is correct.
I think Java got this mostly right. On the threading front, very little is thread-safe or atomic (x += 1 is not thread-safe), so as soon as you expose something to threads, you have to think about safe access. For interacting with C code, your choices are either shared buffers or copying data between C and Java. It's painful, but it's needed for memory safety.
The core Python data structures are atomic to Python developers, e.g. there is no way you can corrupt a list or dictionary no matter how much concurrency you try to use. This was traditionally done under the protection of the global interpreter lock, which ensured that only one piece of C code at a time was operating on the internals of those objects. C code can also release the GIL, e.g. during I/O or operations in other libraries that aren't interacting with Python objects, allowing concurrency.
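To make that concrete, here is a minimal sketch: eight threads hammering one list. The list itself never corrupts or crashes the interpreter, with or without the GIL, and every append lands exactly once.

    import threading

    items = []

    def worker(n):
        for i in range(n):
            items.append(i)  # safe at the object level under GIL or per-object locks

    threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    assert len(items) == 8 * 100_000  # no torn state, nothing lost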
The free threaded implementation adds what amounts to individual object locks at the C level (critical sections). This still means developers writing Python code can do whatever they want, and they will not experience corruption or crashes. The base objects have all been updated.
Python is popular because of the many extensions written in C, including many in the standard library. Every single piece of that code must be updated to operate correctly in free threaded mode. That is a lot of work and is still in progress in the standard library. But in order to make the free threaded interpreter useful at this point, some modules have been marked as free-thread-safe when that is not yet the case.
So it's the worst of all possible worlds then. It has the poorest performance due to forced locking even when it isn't necessary, and if you load a library in another language (C), then you can still get corruption. If you really care about performance, it's probably best to avoid Python entirely, even when it's compiled like it is in CPython.
PS For extra fun, learn what the LD_PRELOAD environment variable does and how it can be used to abuse CPython (or other things that dynamically load shared objects).
A library written in another language would have a Python extension module wrapping it, which would still hold the GIL for the duration of the native call (it can be released, but this is opt-in not opt-out), so that is usually not the issue with this arrangement.
The bigger problem is that it teaches people dangerously misguided notions such as "I don't need to synchronize if I work with built-in Python collections". Which, of course, is only true if a single guaranteed-atomic operation on the collection actually corresponds to a single logical atomic operation in your algorithm. What often happens is people start writing code without locks and it works, so they keep doing it until at some point they do something that actually requires locking (like atomic remove from one collection & add to another) without realizing that they have crossed a line.
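A minimal sketch of that line being crossed: each list operation below is atomic on its own, so nothing crashes, but the invariant "every item is in exactly one of the two lists" is briefly false between the pop and the append, and another thread can observe that gap.

    import threading

    pending = list(range(1000))
    done = []
    lock = threading.Lock()

    def move_unsafe():
        # pop() and append() are each atomic, but the *pair* is not:
        # an item transiently belongs to neither list.
        while pending:
            try:
                item = pending.pop()
            except IndexError:
                break
            done.append(item)

    def move_safe():
        # Hold a lock so the two steps form one logical operation.
        while True:
            with lock:
                if not pending:
                    break
                done.append(pending.pop())

    threads = [threading.Thread(target=move_safe) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    assert sorted(done) == list(range(1000))  # nothing lost or duplicated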
Interestingly, we've been there before, multiple times even. The original design of Java collections entailed implicit locking on every operation, with the same exact outcome. Then .NET copied that design in its own collections. Both frameworks dropped it pretty fast, though - Java in v1.2 and .NET in v2.0. But, of course, they could do it because the locking was already specific to collections - it wasn't a global lock used for literally every language object, as in Python.
It is multiple fine-grained locks versus a single global lock. The latter requires less locking, but allows only a single thread of execution at a time. The former requires more locking but allows multiple concurrent threads of execution. There is no free lunch. But hardware has become parallel, so something has to be done to take advantage of that. The default Python remains the GIL version.
The locking is all about reading and writing Python objects. It is not applicable to outside things like external libraries. Python objects are implemented in C code, but Python users do not need to know or care about that.
As a Python user you cannot corrupt or crash things by code you write no matter how hard you try with mutation and concurrency. The locking ensures that. Another way of looking at Python is that it is a friendly syntax for calling code written in C, and that is why people use it - the C code can be where all the performance is, while retaining the ergonomic access.
C code has to opt in to free threading - see my response to this comment
It is true that more fine-grained locking can end up being done than is strictly necessary, but users' code is loaded at runtime, so you don't know in advance what could be omitted. And this is the beginning of the project - things will get better.
Aside: Yes you can use ctypes to crash things, other compiled languages can be used, concurrency is hard
It depends on how you define "corruption". You can't get a torn read or write, or mess up a collection to the point where attempts to use it will segfault, sure. You can still end up with corrupt data in a sense of not upholding the expected logic invariants, which is to say, it's still corrupt for any practical purpose (and may in turn lead to taking code paths that are not supposed to ever happen etc).
> If you really care about performance, probably best to avoid Python entirely
This has been true forever. Nothing more needs to be said. Please, avoid Python.
On the other hand, I’ve never had issues with Python performance, in 20 years of using it, for all the reasons that have been beaten to death.
It’s great that some people want to do some crazy stuff to CPython, but honestly, don’t hold your breath. Please don’t use Python if Python interpreter performance is your top concern.
Arguably, it's a step in the wrong direction. Share memory by communicating is already doable in Python with Pipe() and Queue() and side steps the issue entirely.
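For example, a minimal sketch with multiprocessing: the workers never touch shared state, all data moves through Queues, so there is nothing to lock and nothing to corrupt.

    from multiprocessing import Process, Queue

    def worker(inbox, outbox):
        while True:
            item = inbox.get()
            if item is None:      # sentinel: no more work
                return
            outbox.put(item * item)

    if __name__ == "__main__":
        inbox, outbox = Queue(), Queue()
        procs = [Process(target=worker, args=(inbox, outbox)) for _ in range(4)]
        for p in procs:
            p.start()
        for i in range(100):
            inbox.put(i)
        for _ in procs:
            inbox.put(None)       # one sentinel per worker
        results = [outbox.get() for _ in range(100)]
        for p in procs:
            p.join()
        assert sorted(results) == [i * i for i in range(100)]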
In Java the increment is not atomic whether 'someField' is volatile or not. The volatile just affects the load/store semantics of the GETFIELD/PUTFIELD ops. For atomic increment you have to go through something like AtomicInteger that will internally use an Unsafe instance to ensure it emits a platform-specific atomic increment instruction.
It is in Python and Javascript (in the browser) due to their single-threaded nature. C++ too, as long as you have a std::atomic on the left-hand side (since it overloads the operator).
There is NB_INPLACE_ADD... but I'm struggling to find enough details to be truly confident :\ possibly its existence is misleading other people (thus me) into thinking += is a single operation in bytecode.
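One way to check is to disassemble it. It comes out as several instructions (load, add, store), not one, so a thread switch (or, free-threaded, another core) can land between the read and the write:

    import dis

    def bump(x):
        x += 1
        return x

    dis.dis(bump)
    # Roughly, on CPython 3.12 (exact opcodes vary by version):
    #   LOAD_FAST   x
    #   LOAD_CONST  1
    #   BINARY_OP   13 (+=)   <- NB_INPLACE_ADD is the C-level slot behind this
    #   STORE_FAST  x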
> It would be a big help if CPython released a tool that could at least scan a C code base to detect free threaded issues, and ideally verify it is correct.
Create or extend a list of answers to:
What heuristics predict that code will fail in CPython's nogil "free threaded" mode?
But as an example, neither includes PySequence_Fast, which is in the json.c changes I pointed to. The folks doing the auditing of the stdlib do have an idea of what they are looking for, and so would be best suited to keep a list (and tool) up to date with what is needed.
A list of Issue and PR URLs that identify and fix free threading issues would likely also be of use for building a 2to3-like tool to lint and fix C extensions to work with CPython free threading nogil mode
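As a starting point, such a linter could be little more than a grep for C API calls that hand out borrowed references or assume GIL-protected invariants. A minimal sketch; the SUSPECT_CALLS list is my own illustrative guess, not an official heuristics list:

    import re
    import sys
    from pathlib import Path

    SUSPECT_CALLS = [
        "PySequence_Fast",       # items can mutate under you
        "PyList_GET_ITEM",       # unchecked macro, borrowed reference
        "PyList_GetItem",        # borrowed reference
        "PyDict_GetItem",        # borrowed reference
        "PyDict_GetItemString",  # borrowed reference
        "PyTuple_GET_ITEM",      # unchecked macro, borrowed reference
    ]

    pattern = re.compile(r"\b(" + "|".join(SUSPECT_CALLS) + r")\s*\(")

    def scan(path):
        hits = 0
        for lineno, line in enumerate(path.read_text(errors="replace").splitlines(), 1):
            if pattern.search(line):
                print(f"{path}:{lineno}: {line.strip()}")
                hits += 1
        return hits

    if __name__ == "__main__":
        # scan the directories given on the command line
        total = sum(scan(p) for d in sys.argv[1:] for p in Path(d).rglob("*.c"))
        print(f"{total} suspect call site(s) found")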
I agree and honestly it may as well be considered a form of ABI incompatibility. They should make this explicit such that existing C extensions need to be updated to use some new API call for initialization to flag that they are GILless-ready, so that older extensions cannot even successfully be loaded when GIL is disabled.
This has already been done. There is a 't' suffix in the ABI tag.
You have to explicitly compile the extension against a free threaded interpreter in order to get that ABI tag in your extension and even be able to load the extension. The extension then has to opt-in to free threading in its initialization.
If it does not opt-in then a message appears saying the GIL has been enabled, and the interpreter continues to run with the GIL.
This may seem a little strange but is helpful. It means the person running Python doesn't have to keep regular and free threaded Python around, and duplicate sets of extensions etc. They can just have the free threaded one, anything loaded that requires the GIL gives you the normal Python behaviour.
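You can check what you actually got from Python itself. A minimal sketch, assuming CPython 3.13+ where sysconfig's Py_GIL_DISABLED config var and sys._is_gil_enabled() are available:

    import sys
    import sysconfig

    # 1 on free-threaded builds, 0/None on regular builds
    print("free-threaded build:", bool(sysconfig.get_config_var("Py_GIL_DISABLED")))

    # False only if the GIL is truly off (no loaded extension re-enabled it)
    if hasattr(sys, "_is_gil_enabled"):
        print("GIL currently enabled:", sys._is_gil_enabled())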
What is a little more problematic is that some of the standard library is marked as supporting free threading, even though they still have the audit and update work outstanding.
Also the last time I checked, the compiler thread sanitizers can't work with free threaded Python.
the problem with that is it affects the entire application and makes the whole thing free-threading incompatible.
it's quite possible to make a python app that needs libraries A and B to be loadable into a free-threaded application, but which doesn't actually do any unsafe operations with them. we need to be able to let people load these libraries, but say: this thing may not be safe, add your own mutexes or whatever
SQLite has a builtin session extension that can be used to record and replay groups of changes, with all the necessary handling. I don't necessarily recommend session as your solution, but it is at least a good idea to see how it compares to others.
That provides a C level API. If you know Python and want to do some prototyping and exploration then you may find my SQLite wrapper useful as it supports the session extension. This is the example giving a feel for what it is like to use:
Unless you compile SQLite yourself, you'll find the maximum mmap size is 2GB. ie even with your pragma above, only the first 2GB of the database are memory mapped. It is defined by the SQLITE_MAX_MMAP_SIZE compile time constant. You can use pragma compile_options to see what the value is.
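For example, from the stdlib sqlite3 module (the exact options listed vary by build):

    import sqlite3

    con = sqlite3.connect(":memory:")
    opts = [row[0] for row in con.execute("PRAGMA compile_options")]
    print([o for o in opts if "MMAP" in o])
    # e.g. ['MAX_MMAP_SIZE=2147483648'] on a stock build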
SQLite has 32 bit limits. For example the largest string or blob it can store is 2GB. That could only be addressed by an incompatible file format change. Many APIs also use int in places again making limits be 32 bits, although there are also a smattering of 64 bit APIs.
Changing this default requires knowing it is a 64 bit platform when the C preprocessor runs, and would surprise anyone who was ok with the 2GB value.
There are two downsides of mmap - I/O errors can't be caught and handled by SQLite code, and buggy stray writes by other code in the process could corrupt the database.
It is best practice to include the SQLite amalgamation directly in your own projects, which allows you to control version updating and configuration.
The general test suite is not proprietary, and is a standard part of the code. You can run make test. It uses TCL to run the testing, and covers virtually everything.
There is a separate TH3 test suite which is proprietary. It generates C code of the tests so you can run the testing in embedded and similar environments, as well as coverage of more obscure test cases.
This isn't an issue as SQLite doesn't accept contributions because they don't want to risk someone submitting proprietary code and lying about its origin.
I've never understood why other large open-source projects are willing to accept contributions from anyone. What's the plan when someone copy-pastes code from some proprietary codebase and the rights holder finds it?
If the rights holder is particularly litigious then I could see them suing even if you agreed to take out their code, under the argument that you've distributed it and profited from it. I don't know if there have been any cases of this historically, but I'd be surprised if there haven't been.
I'd love to see an analysis of byte ordering impact on CPU implementation. Does little vs big endian make any difference to the complexity of the algorithms and circuits?
This is how Apple addresses its audio hardware, for both microphones and speakers. Instead of trying to make speakers that have the desired frequency response, or microphones that produce the desired signal, they let the analog hardware do whatever it does.
Then in software they use digital signal processing. For speakers they modify what gets sent to the hardware so that the actual output then does match the frequency response, and for the microphones they do this work to extract the desired signal.
If Linux addressed the speakers as is, you would get unpleasant sound, and if it read the microphones as is, it would get a lot of noise. That is why Asahi had to add digital signal processing to the audio input and output, to get the "correct" audio.
It does mean the processing is specific to the analogue audio hardware in each of the different Mac models.
The processing could be done in additional hardware, but why bother when you have a very good CPU that can do the work.
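For a feel of what that software correction looks like, here is a minimal sketch: a peaking-EQ biquad (coefficients per the Audio EQ Cookbook) that boosts a dip in the response before samples reach the DAC. The 2 kHz center, +6 dB gain, and Q value are made-up illustration numbers, not Apple's or Asahi's actual curves.

    import numpy as np
    from scipy.signal import lfilter

    def peaking_eq(fs, f0, gain_db, q):
        # Biquad peaking-EQ coefficients from the Audio EQ Cookbook
        A = 10 ** (gain_db / 40)
        w0 = 2 * np.pi * f0 / fs
        alpha = np.sin(w0) / (2 * q)
        b = np.array([1 + alpha * A, -2 * np.cos(w0), 1 - alpha * A])
        a = np.array([1 + alpha / A, -2 * np.cos(w0), 1 - alpha / A])
        return b / a[0], a / a[0]

    fs = 48_000
    t = np.arange(fs) / fs
    samples = 0.5 * np.sin(2 * np.pi * 2000 * t)   # test tone at the dip
    b, a = peaking_eq(fs, f0=2000, gain_db=6.0, q=1.0)
    corrected = lfilter(b, a, samples)             # what actually gets played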
> For speakers they modify what gets sent to the hardware so that the actual output then does match the frequency response
As I understand, this is not a magic pill: it probably won't help to pull up frequencies which are suppressed by 30-40 dB, and I assume that if the frequency response graph is too wavy (lots of narrow peaks and dips), it won't help either.
Also, you need to have calibration files to use this method, right?
Yes, you need calibration files for supported models. You can see the details and explanation at the asahi-audio repository. They also criticize the macOS curves, and point out how some Windows vendors are doing the same DSP approach.
By the way, I now realize that simply adding an equalizer before the amp might not be enough; speakers typically produce different sound in different directions, so for perfect sound you would need to somehow track the location of the head and adjust the filter curves.
SQLite has a session extension that can record changes on a local database into a changeset and you can replay those changes on another SQLite instance. Note that it replays what the changes were, not the queries that resulted in the changes. When applying changes you provide a conflict handler. (You can also invert changesets making a handy undo/redo feature.)
You can save conflicts to another changeset. There is also a rebaser to help deal with multiple way syncing.
there's also a CRDT version of this, which allows two databases to be synced with each other in real time (aka, updates to one will eventually make it to the other, and both databases would eventually contain the same data).
It's https://vlcn.io/docs/cr-sqlite/intro , and i find it amazing that this is doable in sqlite. It is perfect for small scale collaboration imho, but it also works to sync across local client and remote server (for a single db per user scenario).
Interesting link - it'd be great if their solution meets expectations.
Right now, the proof-of-concept they've provided seems simplistic. Their focus seems to have shifted from cr-sqlite to "Zero" instead. I'm guessing it has something to do with CRDTs being quite app-specific and hard to generalize.
I would want to see this library used in production first before hyping it
in a sense it is quite specific. In a different sense, this is as generic a CRDT as you can get - it's a CRDT on table(s). There's no merging of rows iirc (unless you write a custom merge, which is supported but probably needs some tweaking and could lead to poor results?).
My sweet spot is working as a developer in a small team that has to do all the work to ship the product, using Python for practical and productive development, and as glue, C for performance and lower levels, and domain specific tools and languages when necessary.