We updated our Git version, which made this change for the reasons explained. At the time we didn't foresee the impact. We're quickly rolling back the change now, as it's clear we need to look at this more closely to see if we can make the changes in a less disruptive way. Thanks for letting us know.
Consumers often mistake “hasn’t changed” for a commitment to never change: any sufficiently large product will be littered with these kinds of implicit commitments made by the product to consumers that nobody has visibility into. It’s unfortunate that we were all relying on this commitment you never made, but the quick reversion is the best we can hope for. People will theorise about how this could have been avoided, but c’est la vie; it’s an easy mistake that you’ve responded well to.
With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody.
At this point they'll be stuck on old git for all of eternity unless they just roll their own archive/compress step out of band so the old hashes still work. Yikes.
Of the 11,656 packages in OpenBSD’s package repos, 2,984 are built from source originally hosted on GitHub or Sourceforge. That’s a full 25%.
Moralize all you want about where these upstreams should host their software, but why claim that the downstream package manager is “poorly implemented” to fetch source code from those hosts? Your complaint wasn’t technical: you imply that the proprietary nature of Microsoft’s servers is the problem (though open source servers like GitLab have the same unstable-checksum problem), but HTTPS is HTTPS.
I'm fairly certain that Homebrew doesn't function without GitHub. Its index (and Cargo's index, and probably others) is hard-coded to be hosted on GitHub.
> Moralize all you want about where these upstreams should host their software, but why claim that the downstream package manager is “poorly implemented” to fetch source code from those hosts?
Because it should validate checksums of the contents of the tarball, instead of just the outer blob.
Then:
* you don't care about compression method or implementation
* you don't care about archive method or implementation
* your system works just as well for "download a tarball" as for "shallow-clone the remote repo"
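A minimal sketch of that idea, assuming a GNU userland and a placeholder URL: hash the extracted tree rather than the compressed blob, so the digest survives server-side changes to gzip or tar.

```sh
# Placeholder URL; this sketch ignores file modes, symlinks and empty dirs.
tmp=$(mktemp -d)
curl -sL https://github.com/example/project/archive/refs/tags/v1.2.3.tar.gz \
  | tar -xzf - -C "$tmp"
# Hash every file in a stable order, then hash the list of hashes.
(cd "$tmp" && find . -type f -print0 | sort -z | xargs -0 sha256sum | sha256sum)
```

Nixpkgs' fetchFromGitHub takes roughly this approach, hashing the unpacked tree rather than the downloaded archive.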
I am not sure how you could not rely on GitHub when packaging code that is hosted on GitHub.
My personal Gentoo ebuilds, for example, contain a URI variable that points to GitHub's auto-generated archives for projects that use GitHub for hosting. What am I supposed to do in this case?
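For illustration, a hypothetical fragment of such an ebuild (project name and version are placeholders); Portage's Manifest then pins the checksum of exactly this auto-generated blob:

```sh
# GitHub's auto-generated tag archive: the Manifest checksum is only as
# stable as GitHub's tar/gzip implementation.
SRC_URI="https://github.com/example/project/archive/refs/tags/v${PV}.tar.gz -> ${P}.tar.gz"
```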
The only option here is to set up a mirror and have a backup of the data. Packages that are in the official Gentoo repository do get mirrored, but overlays do not, and most people probably don't have the ability, time, or money to set up their own mirroring service.
I agree that relying on GitHub sucks, to be clear, but I don't think we can blame package managers for fetching code from it when people host their projects there!
> I think you meant _poorly implemented_ open source packaging systems.
Or under-resourced ones. If the upstream source only appears on GitHub, without formal release tarballs, your only options as a downstream packager are literally to fetch the source from GitHub or to host your own mirror of every source tarball you build.
Downloading a source tarball is significantly cheaper on both sides than a git clone. A source tarball can be served entirely from CDNs, whereas I don't believe the same is quite true for git (even over HTTPS).
Fetching via git is way more resource-intensive and much slower, which is why it's not preferred in Nixpkgs, for example.
But it's also vulnerable to the same problem in that your package manager's build system is still dependent on GitHub. It will take more to screw you up, but a whole GitHub outage, for example, will definitely still hurt.
Sadly, there has been a sharp uptick in software that provides no release tarballs anymore. With the rise of GitHub, many upstreams choose to make a tag and let people download the autogenerated tarballs, despite the fact that those won’t contain preprocessed autoconf output or (more importantly) any Git submodules.
The situation is deteriorating further as some projects make no releases at all, assuming users will add the project’s own package mirror to their trusted package repositories, or use Docker.
> many upstreams choose to make a tag and let people download the autogenerated tarballs
Which is fine as long as you rely on the hash of the tag rather than the hash of the tarball.
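For example, `git ls-remote` resolves a tag to its object id without cloning, and that id is stable no matter how the server compresses its archives (repo and tag here are placeholders):

```sh
# Annotated tags also print a peeled "^{}" line with the underlying commit.
git ls-remote --tags https://github.com/example/project v1.2.3
# 0123abc...  refs/tags/v1.2.3   <- compare against the id pinned at packaging time
```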
> despite the fact that they won’t contain preprocessed autoconf
This is a feature; run `autoreconf -vfi` at build time, so that you don't depend on the maintainer's idiosyncratic autotools setup and local macros, and so that you can reliably regenerate it all if you want to change configure.ac or Makefile.am.
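As a sketch, the build recipe then amounts to regenerating everything before configuring:

```sh
# Rebuild configure and the Makefile.in files from configure.ac/Makefile.am
# instead of trusting whatever the maintainer's machine happened to produce.
autoreconf -vfi
./configure
make
```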
> This is a feature; run `autoreconf -vfi` at build time, so that you don't depend on the maintainer's idiosyncratic autotools setup and local macros, and so that you can reliably regenerate it all if you want to change configure.ac or Makefile.am.
On a package bulk build machine, that’s a lot (like, a lot) of wasted CPU cycles multiplied by the thousands of packages that use autoconf. For the majority of packages that don’t patch configure.ac or Makefile.am, it’s nicer to use a preprocessed tarball and check that you can reproduce the same autoconf output when adding the package to the package manager, because then it only happens once.
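That one-time check could look something like the following sketch (paths are placeholders, and a clean diff assumes the same autotools versions upstream used):

```sh
# On import: confirm the shipped, preprocessed output matches what
# autoreconf regenerates from configure.ac/Makefile.am.
cp -R pkg-src pkg-regen
(cd pkg-regen && autoreconf -vfi)
diff -ru --exclude=autom4te.cache pkg-src pkg-regen
```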
There are a lot of things that waste cycles on a build machine, but they're worth the reproducibility. I think that problem would be better solved via caching, ideally.
I would hazard a guess that there are far fewer people these days who download a tarball and run `./configure && make && make install` than there are distros (who often need to patch) and developers (who will be working from git anyway).
Having autocruft in the default tarball is a design flaw in autotools; you should never ship prebuilt files in git or in source tarballs. I think `make distcheck` should by default put the autotools files into a separate foo-1.2.3.4-2023-01-31-autocruft.tar alongside the real source tarball generated by git-archive.
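The clean half of that split already exists: `git archive` produces a tarball of exactly the tagged tree, with no generated files (the tag name here is hypothetical):

```sh
# Archive exactly the tagged tree; nothing autogenerated is included.
git archive --format=tar.gz --prefix=foo-1.2.3.4/ v1.2.3.4 -o foo-1.2.3.4.tar.gz
```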
The distros always run `autoreconf` these days, so that they can verify that they can still build the build system from the source configure.ac/Makefile.am files.