So many thoughts on this. The community has definitely ebbed and flowed on this for a while. A few varying pieces of insight, with no intention other than to share a bit more on the PG community. And I'm sure some current and former colleagues already in the comment threads are going to correct me on the nuance of a lot of this.
For several years there were no new committers at all. In recent years the team has tried to be a little more intentional about adding new ones and culling those no longer involved.
About 15 years ago there was a phase of letting a lot of younger people earn their commit bit. I can recall 3 people by name that all got a commit bit before the age of 25, and they may have actually all been under 22. One of those three moved on shortly after to work outside of the Postgres community, another was quietly busy on other things for over 10 years before coming back, and the third stayed actively involved going forward. I suspect there was some unease about folks getting a commit bit and then sort of falling off a cliff, so adding new folks slowed for a few years. Edit - sounds like it was less age driven, but the slowdown in new committers may still have been slightly related to some folks falling off. tldr - you're not getting a commit bit right out of college for Postgres.
What would be interesting to me, but likely hard to gather, is at what age people become a committer to Postgres. It wouldn't surprise me if the average age of getting a commit bit is closer to 45 than not. Many folks contributing come to Postgres after other systems work, or just don't consider contributing until they're a bit more seasoned because it feels intimidating. I mean, patches sent on a mailing list, who does that any more? Postgres, that's who.
I have the honor of working with a Postgres ~committer~ contributor who was just over 25 when they first contributed! The story about their first commit is great:
They were testing SQL behavior for Materialize and thought to check that both systems handle interval functions identically. Being thorough, they tried something like:
select interval '0.5 months 2147483647 days';
You can try it yourself on dbfiddle[0]. Instead of erroring, Postgres returned a bogus value, `{"days":-2147483634}`. You can read why here[1].
So naturally they decided to fix it in Postgres, which is why they contributed and why it's handled properly in 15+ [2]
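For anyone curious about the arithmetic, here is a rough sketch of how that exact bogus value can fall out. It assumes (my assumption, not something spelled out above) that the half month gets converted to 15 days at 30 days per month and is then added to a signed 32-bit day counter with no overflow check. The function name is made up for illustration and is not Postgres code.

```python
import ctypes

# Assumption: a fractional month spills into days at 30 days per month.
DAYS_PER_MONTH = 30


def spill_fractional_month(frac_months, days):
    """Add the day-equivalent of a fractional month to a day count,
    wrapping the sum the way an unchecked signed 32-bit integer would."""
    extra_days = int(frac_months * DAYS_PER_MONTH)
    # ctypes.c_int32 does no overflow checking, so out-of-range values wrap.
    return ctypes.c_int32(days + extra_days).value


# 2147483647 is the largest signed 32-bit value, so adding 15 wraps around.
print(spill_fractional_month(0.5, 2147483647))  # -2147483634
```

With a small day count, like `spill_fractional_month(0.5, 10)`, nothing wraps and you just get 25.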
> I have the honor of working with a Postgres committer ...
That's not a committer, that's someone who submitted a patch that got committed. A committer is the one who actually applies the patch and can push the branch into the mainline repo. Committers decide if something is worthy of being merged.
Now that aside, yes this plus reviewing patches to get a wider feel for the codebase is how you eventually become a committer.
Best way to eat an elephant is one bite at a time.
This is a common source of confusion for a ton of folks. Anyone can submit a patch, but commit bits are reserved for a much smaller list. The attitude is something like you commit it, you maintain it–so if bugs come in you'll spend your time fixing those for whatever time it takes vs. working on the next shiny feature that you're excited about for the next release.
There was sort of a fuzzy list of "major" contributors (https://www.postgresql.org/community/contributors/), meaning people who contributed major features, and then a list of other contributors. Depending on who you talk to, this is either dated or a pretty close, if imperfect, reflection of reality. In recent years they expanded the contributors list to include others who were contributing in non-code ways, though it's still a decent place to find people contributing to major feature sets.
Of course this is not to be confused with the core team–which is more like a steering committee. But not so much steering committee of code and feature sets.
The thing about becoming a PG contributor is that the barrier to entry is fairly high.
I love Postgres so much I have a PG tattoo, but from the perspective of the two ways you can contribute:
- As a random user, in your free time: There's not a ton of "Good first issue" type tickets where you can ease your way into PG dev by working on something that doesn't require context on many parts of the PG architecture and at least a little historical knowledge of why things are written the way they are. Also, it can be a bit intimidating to have your patches reviewed by the likes of Tom or Andres.
- As a developer for a paid PG company like EDB/PG Pros/Crunchy etc: It's a sort of Catch-22 scenario here, where it's difficult to get hired as a junior without having previous PG hacking experience, but the path to doing that is not the easiest thing in the world.
If I was going to work somewhere that wasn't $CURRENT_CO, it'd be somewhere doing PG work, but there's not a lot of viable avenues/inroads there.
PostgreSQL isn't that special as a codebase. Every codebase has its quirks, every project has its own processes and there's a learning curve. When you switch to a new job as a software engineer, you pick it up. PostgreSQL is no different: you can hire an engineer to work on PostgreSQL.
I'm not sure how well that path works in growing new contributors, though. In a usual company setting, the goals are better defined, and the company is in control. Once you reach the goals, mission accomplished. With an open source project it's more nebulous. Others might have different criteria and different priorities. You are not in control. Choosing the right problems to work on is important.
Other storage or database projects would be a good source of new contributors. If you have worked on another DBMS, you're already familiar with the domain, and the usual techniques and tradeoffs. But to stick around, you need some internal desire to contribute, not just achieve some specific goals.
The biggest hurdle I see is that it is a C project, unfortunately something we can do nothing about. It is so much harder to trust that a random piece of code won't have serious implications for the database. It will take ages for someone to get comfortable with the pg-code-base way of handling errors, basic string manipulation, memory alloc/free etc.
I want to highlight the difference between "making a non-core contribution" and "understanding database internals". It is not the latter but the former that is the first hurdle.
I wanted to reuse builtin pg code to parse the printed statements from logs - I ended up writing a parser (in a non-C language) myself which was faster.
Couple of points in this post, so will address a few of them:
"(Paraphrased) C is bad, and it takes forever to pick up the PG-specific C idioms"
There's probably not a productive conversation to be had about C as a language. I will say that as of C23, the language is not quite as barebones as it used to be and incorporates a lot of modern improvements.
On the topic of PG-specific C -- there are a handful of replacements for common operations that you use in PG. Things like "palloc/pfree", and the built-in macros for error and warning logging, etc.
I genuinely don't think it would take a motivated party more than a day or two to pick all of these up -- there aren't that many of them and they tend to map to things you're already used to.
"I wanted to reuse builtin pg code to parse the printed statements from logs - I ended up writing a parser (in a non-C language) myself which was faster."
It's true that the core PG code isn't written in a modular way that's friendly to integration piecemeal in other projects (outside of libpq).
For THIS PARTICULAR case, the pganalyze team has actually extracted the PG parser so you can include it in your own projects:
libpg_query is a godsend of a library. I spent a lot of time writing a custom parser before I found it - was very happy to replace the whole thing. A major boon was the fingerprinting ability - one of my needs was to track query versions in metadata.
I disagree on this. Yes it's C. But I've heard people comment "I don't like writing C, but I don't mind Postgres C".
The bigger hurdle which Peter mentioned in another thread is simply building up enough expertise with the system and having the right level of domain expertise.
I found that I learned a lot when trying to write a logical decoding plugin. So I guess if you are a user of Postgres and there’s some small friction you could reduce by writing a plugin, it’s a good way to get started. Scratch your own itch, you don’t have to publish the results :-)
I don't have the data for the average age, but I was recently in a conversation about how long it takes to become a committer after first getting involved in Postgres by writing code for it.
So, I wrote a couple git commands like below [1] to figure out when someone was first named in a commit message vs when they made their first commit (as a committer) for the last 10 people who became committers.
The average time of involvement was ~8.9 years (just comparing month / year), with the lowest being ~6.5 years.
Obviously one could do better analysis but my goal was just to get an approximate understanding.
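A minimal sketch of that kind of comparison, not the exact commands used. It assumes you run it against a local clone of the Postgres git repo, and that the person's name shows up in commit messages the same way it shows up in the committer field. The helper names are hypothetical.

```python
import subprocess
from datetime import datetime, timezone


def first_commit_date(repo, *filters):
    """Committer date of the oldest commit matching the given git log
    filters, or None if nothing matches."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--reverse", "--format=%ct", *filters],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    # %ct is a Unix timestamp; --reverse puts the oldest commit first.
    return datetime.fromtimestamp(int(out[0]), tz=timezone.utc) if out else None


def years_between(start, end):
    """Approximate number of years between two datetimes."""
    return (end - start).days / 365.25


def years_to_commit_bit(repo, name):
    """Years from first being named in a commit message to first
    appearing as the committer of a commit."""
    first_mention = first_commit_date(repo, "--fixed-strings", f"--grep={name}")
    first_as_committer = first_commit_date(repo, f"--committer={name}")
    if first_mention is None or first_as_committer is None:
        return None
    return years_between(first_mention, first_as_committer)
```

Usage would be something like `years_to_commit_bit("/path/to/postgres", "Some Name")`, averaged over whichever people you care about. Name spelling variations will throw it off, so treat the output as approximate.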
This is counting non-empty lines. It's definitely not a good measure of overall code size, as it includes things like regression tests "expected" files. But as that's true for all versions, it should still allow for a decent comparison.
8.3.0 was released 2008-02-01, with 2M non-empty lines, we're now at 3.4M.
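For reference, a rough sketch of what "counting non-empty lines" could mean in practice. Treating whitespace-only lines as empty is my assumption, and the function name is made up. The growth figure just divides the two numbers quoted above.

```python
def count_non_empty_lines(text):
    """Count lines that contain at least one non-whitespace character.
    Assumption: whitespace-only lines are treated as empty."""
    return sum(1 for line in text.splitlines() if line.strip())


sample = "int main(void)\n{\n\n    return 0;\n}\n"
print(count_non_empty_lines(sample))  # 4

# Growth implied by the figures above: 2M lines at 8.3.0, 3.4M now.
growth = 3.4e6 / 2.0e6
print(f"{growth:.1f}x")  # 1.7x
```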
great contribution here from Craig, in terms of the ebbs and flows and useful history. i had no idea about that cluster of folks under 22 with commit bits.