> When every commit is pushed to the parent repo by default, it encourages a working style in which every commit is tested first. It encourages thinking before acting. We believe this is an inherently good thing.
Sounds like a bad thing to me. Sometimes I just want to make sure my work is saved in remote before quitting for the day, even if I haven't rigorously tested said changes.
Yeah, I often do dummy commits with half-baked work just for fear of losing it. I don't work as a programmer so it doesn't really affect other people, but I don't see a way around it to be honest.
That part of the article isn't talking about WIP branches, experimental branches, etc. My personal Fossil development philosophy is that you can go wild on those. The article's point applies primarily to long-lived development branches, things like what Git now calls "main", which Fossil calls "trunk". Commits to such branches should always build and pass all tests on all supported platforms.
Serious projects with a non-trivial amount of people working on them don't rewrite the Git history of "real" branches anyway, except maybe on very rare and targeted occasions (which are better understood as: let's build a new repository with some resemblance to the old one).
I could see projects with a very low number of devs AND users doing that, but even then they should probably not.
Does that mean that all the tooling to cherry pick / rebase / filter / import-export patch series, shall disappear? Certainly not, because there are things that are _not_ main branches. An alternative could be to develop with completely different tooling, but I'm not sure what would be the point.
So it actually looks like Fossil's and Git's basic good practices are quite similar, and I really don't see the point in pretending that Fossil's lack of support for some operations is an advantage. E.g. part of the preparation of a good patch series suitable for proper review often involves reordering, mixing, and splitting commits. That is easier to do if the tooling does not actively try to prevent you from doing it because of some kind of misapplied dogma...
I simply disagree with the position that it is better for a version control system not to have integrated tools for "history" rewriting, especially when that position is presented quite aggressively against VCSes that do have those tools, and when such tools are in practice overwhelmingly used correctly. (They would have to be overwhelmingly abused to become something worth getting rid of despite their valid use cases, which leaves a huge margin between the real situation and the straw man.) I actually would not even call that part of (most of) the tooling "history rewriting" so much as "patch preparation" tools, because that's what they are used for. In my book, conflating in-flow WIP changes with "real" commits has little value, but I see value in using some of the same tooling to handle the two. I also typically do not want to track false starts and WIP history long-term, but if I wanted to, I could with Git, and it would even be trivial to do.
It is fine for a VCS to not propose anything for history rewriting; it is not fine to criticize others with a discourse making it look as though the mere existence of such possibilities is a mistake that is often abused, when it is not.
BUT if some people have found value in that, and good alternative workflows for their projects with Fossil, good for them. It is just that for now, on my side, I'm (highly) unconvinced. And I'm not forced to use that. And I don't. And if limitations are to be considered a main advantage, I could just as well put a few aliases and config settings in place to forbid me from invoking some Git commands, and continue to happily use the rest...
The only difference between a shelveset and a WIP branch is that shelving happens after the work, and creating a WIP branch happens before (or at least before the first commit).
Then there's everything that branching can do that shelves cannot. For example: history, and undoing your mistakes/experiments. Shelvesets contain no history, and tacking dates/numbers onto the name is exactly the problem that a VCS sets out to solve. To name one example.
I'm a strong advocate of commit early, commit often, but not everything fits neatly into a 20-line change, particularly in a legacy C++ codebase. Shelving lets me pause and resume with a backup at the end of the day.
> history, and undoing your mistakes/experiments.
I don't need history or undoing mistakes when I'm shelving. I need a quick server-side backup to show someone else something, or to "save" my work at the end of the day when it's going to run 30 minutes into the next day.
Branches may be superior, but a kitchen knife is superior to a pocket knife, yet I don't pull out my 8-inch steel knife to open an Amazon parcel.
> Git strives to record what the development of a project should have looked like had there been no mistakes.
Hmm. I think I disagree. One can clean the commit history to look like that, but it's optional. You can merge all of your messy starts and restarts right into master if you want. Easier to do, in fact.
I've always advocated maintaining true, "messy" history (in git) amongst my teams. I know it's controversial, but anecdotally, verbose history has helped me understand the nuance of otherwise opaque changes and decisions so many times that I can't help but cringe when a bunch of context is lost to squashing for the sake of a "clean" history. Many of the technical arguments for squashing tend to be things that can be mitigated by simply traversing merge commits only.
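For what it's worth, the "traversing merge commits only" mitigation is just `git log --first-parent`. A toy illustration (repo layout and file names are made up):

```shell
# Toy repo: a messy feature branch merged into main with --no-ff.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name Dev

echo v1 > notes.txt
git add . && git commit -qm "initial"
git branch -m main

git checkout -qb messy
echo wip >> notes.txt && git add . && git commit -qm "wip: false start"
echo fix >> notes.txt && git add . && git commit -qm "fix the false start"
git checkout -q main
git merge -q --no-ff -m "add feature X" messy

git log --oneline | wc -l                 # all 4 commits, noise included
git log --oneline --first-parent | wc -l  # 2: just initial + the merge
```

The full "messy" history stays available for archaeology, but day-to-day log reading can skip straight along the merge commits.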
TBH that's because squashing is the quick and dirty option. If I have time, I prefer to rebase and organise the changes into a number of commits, each addressing one concern. The only things that get removed are bugs and typos which were introduced on that feature branch and then fixed again. No need for those to go upstream; they would just add noise.
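One way to do that mechanically in Git is `--fixup` plus `--autosquash`. A minimal sketch with invented file names, run non-interactively by pointing the sequence editor at `true`:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name Dev

echo base > lib.txt
git add . && git commit -qm "initial"

echo "helper (tpyo)" > lib.txt
git add . && git commit -qm "add helper"

# Later: fix the typo and mark the new commit as a fixup of the
# commit that introduced it.
echo "helper (typo fixed)" > lib.txt
git add . && git commit -q --fixup=HEAD

# Replay the branch with fixups folded in. Setting the sequence
# editor to `true` accepts the generated todo list unedited, so the
# interactive rebase runs without prompting.
GIT_SEQUENCE_EDITOR=true git rebase -i --autosquash HEAD~2

git log --oneline   # the typo fix never appears as its own commit
```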
Wouldn't _not_ squashing be sufficient, rather than having to keep all of the messy commits? That is, would you be satisfied with modifying history such that the commits are logically separate, even if it's not how the code actually evolved?
I don't understand the desire to squash every PR into one commit each either…
Sure, I'm not a total purist about it. What's important to me is maintaining the logical progression of a change, even if it includes some dead ends, non starters, or mistakes.
If you want to amend commits for a typo, code style fix, or otherwise inconsequential change that doesn't add any useful context for future developers, by all means go for it.
This is why the best Git workflows IMO involve rebasing and squashing the branch down to a couple of well-defined "logical commits", then merging that (with --no-ff of course) into master/main/trunk.
This forces individual programmers to think somewhat carefully about leaving a sensible commit history behind.
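A minimal sketch of that shape (branch and file names are placeholders):

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name Dev

echo base > app.txt
git add . && git commit -qm "initial"
git branch -m main

# A feature branch already rebased/squashed down to two logical commits.
git checkout -qb feature
echo refactor >> app.txt && git add . && git commit -qm "refactor: extract helper"
echo feature >> app.txt && git add . && git commit -qm "feat: add foo"

git checkout -q main
# --no-ff forces a merge commit even when main hasn't moved, so the
# feature's commits stay grouped under one merge bubble on trunk.
git merge -q --no-ff -m "merge: add foo feature" feature

git log --oneline --graph
```

A first-parent walk of main then sees only "initial" and the merge, while the two logical commits remain one level down.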
Right, I think there's a bit of subtlety lost here. "Squash and merge to trunk" is very different from "rebase on every pull on a feature branch" which is very different from "rebase and fixup/squash before merging to trunk" etc.
On a feature branch, whether you constantly rebase against master or merge from master probably should depend on the nature of the feature branch and what exactly has been happening on master. And it ultimately doesn't matter that much.
In my experience, as long as branches are ultimately merged to master with --no-ff, you can almost always recover a sensible commit history, and you should almost always end up with a sensible, tidy master to base new branches off of. If you rebase with squash/fixup on the feature branch before merging, all the better, because then you get nice "meaningful" commits on the feature branch. But even if that doesn't happen and the feature branch is a mess, at least you could write a meaningful merge commit message.
Maybe things get hairy with huge teams or huge projects that could necessitate a different workflow. But most developers are not working on projects like that.
Yep I agree. I tend to rebase my feature branches into a sensible history and then merge that in as I find that leaves the cleanest history that is still easily navigable.
I wish this workflow were better supported, but since it isn't, most people just default to squash-and-merge or rebase-and-merge.
I use Git every day, but I am curious about Fossil and have been researching it for a while. I am not planning on changing, as I need to get to the bottom of some data corruption events I have seen in their forums. Not saying it was the fault of Fossil, just that it requires more research on my side.
We should not forget that version management did not start with Git; it has a vibrant history that began with SCCS, RCS, CVS, ClearCase, Perforce, SVN, etc... As I have used all of the systems above except for SCCS and RCS, I would advise some caution here before thinking Git's core flows are the one true way. I know the authors of ClearCase thought the same way :-)
If the author of SQLite advances some arguments about the proper way of doing version management, I would advise erring on the side of caution and listening at least a second time. Do not throw away your own experience and insight, but at least listen carefully. Don't just dismiss the arguments as quickly as some are doing here.
I found this recent thread very useful not only as a features comparison but also as a core reminder of what the main use case for Fossil is:
"Fossil's reason for existence, plain and simple, is managing the source code for sqlite3. That's the scope of project it was initially envisioned for, and that size of project seems to be quite average for the FOSS world, falling clearly somewhere between "small" and "large." It does not scale well to projects with hundreds of thousands of files in any given version and/or tens of GB of data."
This is obviously written by someone who's never done any sort of review-based workflow. Trying to review the "what you did" with all the random false starts, bug fixes, and who knows what, would be a nightmare; as would just reviewing the megapatch squashed from the end of the series. Presenting the final result of your "feature branch" as a series of self-contained changes in a logical order is the only sustainable model for a project of any size.
EDIT: I really wanted to like Fossil when I first heard about it -- having it based on SQLite (which is AWESOME), having issue tracker / wiki / whatever built in, etc. sounded great. When I learned that it was basically incapable of rewriting your development branch into something more logical for someone to review because the author had never experienced a need for it I was quite disappointed.
> [Fossil is] basically incapable of rewriting your development branch into something more logical...
False. All you need to do is start a separate "presentation branch" and cherrypick and/or merge changes from your "development branch" in as necessary to make it logical for review. Not difficult to do.
The difference from Git is that in Git you end up discarding the development branch and keeping only the presentation branch, whereas Fossil preserves them both.
Fossil does anything Git will do, except for one thing: Fossil does not (easily) sync individual branches. (You can do it, but it is a pain.) Git comes with the idea that individual developers keep their own private branches and only share them after they've been cleaned up. The idea behind Fossil is that everybody shares everything.
By analogy: Git is like where everybody has their own office with doors that close, and developers emerge from their own office from time to time to share their work or collaborate. Fossil is more like an open office with no walls or doors.
There are advantages and disadvantages to both approaches. I encourage you to use whichever one works best for you. If you like the Git approach, then by all means use Git. I find that the Fossil approach works better on the projects that I manage, but that's just me. You do whatever works best for you.
Bottom line: Fossil was created to support SQLite development. It does this very, very well. If it never does anything else, it will have been a great success. Any use of Fossil beyond SQLite is just gravy. As it turns out, many other developers have found Fossil useful, too. But perhaps your usage patterns are different and the Fossil model does not work for you. That does not invalidate the fact that it works well for others.
All that said, there are many ideas in Fossil that could be copied into Git without changing the Git development model. So even if you don't like Fossil's development model, you should still look at Fossil, so that you can steal ideas (and/or source code) to import into Git and hence make Git better.
> Bottom line: Fossil was created to support SQLite development. It does this very, very well. ...But perhaps your usage patterns are different and the Fossil model does not work for you. That does not invalidate the fact that it works well for others.
So first, let me reiterate that SQLite is awesome; you and your team have probably had a larger positive impact on the world per man hour of coding than any software project in the history of the world. You've already given us an amazing self-contained database system; I'm hardly in a position to demand that you write me a version control system as well.
Secondly, when I wrote "this could only be written by", I was really mixing up the linked text with another article on your website, "Rebase considered harmful" [1], which is far more opinionated. I'm sorry for getting that slightly mixed up; but only slightly sorry, since the "original designer of Fossil" wrote both; I understand that person to be you.
I didn't actually say anything bad about Fossil or that it wasn't a useful system. What I did say was that presenting a change as a series of chunks presented in a logical order, targeted at easy reviewing, is necessary for a large project. I stand beside that assessment.
> Git comes with the idea that individual developers keep their own private branches and only share them after they've been cleaned up. The idea behind Fossil is that everybody shares everything.
And I can't see how this is remotely sustainable for a large project, or a project hoping for "drive-by" contributions. I understand SQLite to be developed by 4 people or so, who have worked closely together for years; I can see where having every development branch of every developer would be an advantage in your case. And since you don't generally accept external contributions (as I understand it), "drive-by" contributions aren't a consideration.
The previous release of my project, which represented only 8 months of development, had changesets from 61 different people; and we frequently get submissions from people who just have a single fix or a single feature they wanted to add, after which we never hear from them again. I can't imagine having every version of every development branch of every person who'd ever contributed to the project (or thought about contributing to the project) in the project log.
> By analogy: Git is like where everybody has their own office with doors that close, and developers emerge from their own office from time to time to share their work or collaborate. Fossil is more like an open office with no walls or doors.
Git was designed so that thousands of people in different workshops and in different towns could come together in an ad-hoc fashion to share what they'd been doing. Having four people working together in the same keycarded space sounds nice; but having thousands of people in one gigantic gym sounds like pandemonium, and having the doors open so anyone can walk in and start vandalizing the place is unworkable.
> False. All you need to do is start a separate "presentation branch" and cherrypick and/or merge changes from your "development branch" in as necessary to make it logical for review. Not difficult to do.
And if you need to iterate over that "presentation branch"? Do you make a new "presentation branch" every time you realize some change from patch A would logically go better in patch B or vice versa? Is the final version of the presentation branch automatically linked to the intermediate presentation branches, and the original development branch(es)?
In your documentation you seem to be aware that Fossil will "nudge" developers in certain directions -- including "nudging" people to delay checking in changes (e.g., before proper testing). Don't you think having all these previous versions dangling around will "nudge" developers into doing fewer iterations of their "presentation branch", resulting in a branch which is more difficult to review and/or do archaeology on? Don't you think the delaying of check-in, in particular with the knowledge that what you've done will be seen, will negatively affect the way people develop?
I mean, I don't inherently have a problem with keeping all my random development stuff locally. git actually does keep all your old commits around when you do a rebase -- they're just not linked anywhere, and they get garbage collected occasionally. (There's no reason someone couldn't build a porcelain on git to make branches for those commits and link them in the comments.) But I wouldn't want to have to manually hide all those branches -- I'd want them to disappear by default, as they do with git, unless I actively look for them. Having them not disappear would nudge me in a direction I don't want to go. And as a maintainer of an open-source project I don't want random people on the internet sending me all of the development versions of their patches; I want to see the actual change they're proposing to make to my project, and nothing else.
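The "still around, just not linked anywhere" behavior is easy to see. A small sketch (commit messages are hypothetical):

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name Dev

echo one > notes.txt
git add . && git commit -qm "first draft"
old=$(git rev-parse HEAD)

echo two > notes.txt
git add . && git commit -q --amend -m "polished commit"

# The draft no longer appears on the branch...
git log --oneline
# ...but its object still exists, reachable via the reflog, until
# it eventually gets garbage collected:
git cat-file -t "$old"      # prints "commit"
git log -1 --oneline "$old" # still shows "first draft"
```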
I'm willing to accept that maybe I just don't understand how your model works, and that if I did I'd be less skeptical of it. But having read [1], I'm quite sure you don't understand how my model works, or you wouldn't say it "has no offsetting benefits".
> This is obviously written by someone who's never done any sort of review-based workflow.
I think you meant to say the following:
> This is obviously written by someone who's never used my review-based workflow.
As is so often the case in these discussions, there's a conflation of what works best for one person and what works best in general. I very much doubt that the difference in viewpoints is simply ignorance.
Edit: I also wish HN had a rule against this type of dismissal:
> the only sustainable model for a project of any size
Fossil was designed for the development of SQLite. It does not help the conversation at all to have an anonymous commenter dismiss SQLite as a toy project.
> I very much doubt that the difference in viewpoints is simply ignorance.
You can usually tell if a difference in viewpoints is ignorance by how the other person talks about the idea they disagree with. If they don't mention any of the perceived advantages (or try to but miss the most obvious ones -- like, say, the ability to do bisection or review changes), there's a good chance that it's out of ignorance.
EDIT: It seems I was slightly mixed up; I was referring not to the linked story (which just talks about valuing the "messy" history) but to "Rebase considered harmful" [1], which, having just re-read it, I am confirmed in my opinion was written by someone ignorant of what they're arguing against.
> It does not help the conversation at all to have an anonymous commenter dismiss SQLite as a toy project.
First, I'm not anonymous -- I don't have my full name in my description, but if you wanted to track me down it shouldn't be too hard based on my comment history.
Second, I never said SQLite was a toy project; on the contrary, I said it was "AWESOME". What I said was that it was small; and what I meant was that it had a small development team. My understanding is that SQLite doesn't accept external contributions, and that the team developing it is 4 people (or something on that order of magnitude). With only 3 other people, all hand-picked, committing to your tree, you can afford to just look at the total diff of the branch before merging it (and/or presumably have a conversation about what's in the branch). That's simply not going to work if you're reviewing dozens of patches a day from random people on the internet.
> With only 3 other people, all hand-picked, committing to your tree, you can afford to just look at the total diff of the branch before merging it (and/or presumably have a conversation about what's in the branch). That's simply not going to work if you're reviewing dozens of patches a day from random people on the internet.
I don't understand this comment. What are you looking at, if not the total diff, when reviewing dozens of branches a day? Once the author has squashed (which as far as I can tell is prevalent), the total diff is all you've got. Fossil, on the other hand, keeps all that history, so you can look at the total diff or look at the path they took to get there. Obviously, git can do this too; just don't squash.
> Once the author has squashed (which as far as I can tell if prevalent) the total diff is all you've got. ... Obviously, git can do this too, just don't squash.
You've presented two options:
1. Include every random commit that a person made as they were developing a feature; including ones with bugs, ones they went back and overwrote / changed / whatever
2. Submit the final endpoint of a development branch as one giant squashed patch.
This is a false dichotomy. There's a third option: After you as the developer have your giant squashed patch, go through and break it down into individual, reviewable chunks in a logical order. "Reviewable chunk" doesn't mean short necessarily, but that it only contains one "idea" for the reviewer to keep in their head. That's what we insist on in our project.
As a kind of extreme example, consider this patch, which is hundreds of lines long, doesn't make any functional change, but is there to make future patches easier to review. The underlying code here is so complicated that either a squashed megapatch or the as-it-was-written development patches would be completely impossible to review.
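In Git terms, that third option is roughly: undo the squashed commit with a reset that keeps the work, then restage it in logical chunks (reaching for `git add -p` when the split runs through a single file). A toy sketch with invented file names:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name Dev

echo base > base.txt
git add . && git commit -qm "initial"

# One big squashed patch touching two concerns at once.
echo "helper extracted" > util.txt
echo "new feature" > feat.txt
git add . && git commit -qm "big squashed patch"

# Undo the commit but keep the work in the tree, then restage it as
# two reviewable commits in a logical order. For hunks inside one
# file you would use `git add -p` instead of whole files.
git reset -q HEAD~1
git add util.txt && git commit -qm "refactor: extract helper"
git add feat.txt && git commit -qm "feat: add foo using helper"

git log --oneline
```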
«This is a false dichotomy. There's a third option: After you as the developer have your giant squashed patch, go through and break it down into individual, reviewable chunks in a logical order. "Reviewable chunk" doesn't mean short necessarily, but that it only contains one "idea" for the reviewer to keep in their head.»
You can do that in Fossil simply by merging periodically to a separate branch for inspection.
You may not know this, but, in Git parlance, all merges in Fossil are "no-ff". When you merge into a branch, you don't get all the commits of the original branch moved to that branch: you merge only the diff. As branches are never deleted in Fossil, you can go check the original development branch if you want. But each merge is the result of all the changes made up to that point. (Unless you cherrypick, of course.)
yes agreed. sqlite is tiny in contrast to the kernel. a marvel, created by a small group at a steady pace.
in contrast to that, the linux kernel patch review workflow lives on history rewrites until all reviewers agree.
and most projects I know, kernel included, freeze the history after this alignment.
so it's not a "history rewrite all the time" but a commit cleanup rewrite, before final acceptance.
in "feature branches", if you wish, and just until the history is helpful for future readers. who cares about typo fixes? for heaven's sake, amend them...
> That's simply not going to work if you're reviewing dozens of patches a day from random people on the internet.
This sounds incapable of producing reliable code even with best tools. Do people actually work this way, as opposed to some deeper hierarchy where maintainers of smaller modules are likely to already know everyone who is going to send non-trivial patches this week?
maybe team size, project size and pace of change are different in the "home" projects of fossil and git.
the patch review workflow of the linux kernel community depends on being able to rework the history until all reviewers agree: this is how this new feature is done, and this is how it is exposed. and then it is added and its history will never be changed again.
is that type of workflow possible with fossil? e.g. reworking the history for a feature branch?
I wonder how they deal with the "bisect" flow? I use git bisect a lot and can't imagine how it would go with every WIP commit... (my answer: an unusable nightmare). Good luck with that.
Fossil's bisect works better than Git's in my experience, since Fossil's data model permits bidirectional tracing, whereas Git can go backwards in time only. This allows commands like "fossil bisect status" to show where you are in the decision tree, since it has to trace forward in time as well to provide the answer.
Fossil's strong resistance to changing committed history encourages a culture of not committing predictably-broken stuff, at least not to long-lived branches. This seems like an inherently good thing, to me. Every commit to long-lived development branches should pass all tests and should build on every supported platform. Fossil has tools for fixing this after the fact, but test-first development is your first line of defense for avoiding problems here.
If you're doing experimental work, you do it on a branch in Fossil, which unlike in Git is sync'd globally across all repo clones, so you can do things like check out the branch on another machine and test there before merging it down to the parent branch, presumably one of these long-lived development branches. This is useful in cross-platform development, so you catch portability problems.
(Incidentally, this merge point provides most of the benefits of Git's rebase without rewriting history and without losing the record of how you got to that merge point.)
Fossil just got a feature to avoid the need for a branch in that case: "fossil patch", which creates a Fossil repository (a specially-formatted SQLite database) with as little as one proposed commit, then gives you tools for schlepping that to another repo and integrating it temporarily so you can test on another machine before committing anything. These patch files solve all of the problems with "diff -u": preserve commit comments, file additions/deletions/renames, branches/tags, and so forth.
Fossil's shortest-path bisect algorithm can sometimes cause it to go down unstable branches, but there are ways to redirect it back to the stable branch or to skip broken commits.
Fossil is quite the antithesis of unusable and nightmarish in my experience.
Agreed. I'm entirely not following how a commit stream that may contain entirely broken commits aids in bisecting to find a specific flaw. Those would just be noise, at best. Even if you're just manually testing each commit, you do need it to build and run.
Does everyone test whether their commits break the build? I doubt that; e.g. if you want to commit before going home, there isn't always time to test, and more importantly, until you push to the master/target branch there is no need for it besides bisection, which I'm not sure is common.
Depends on your workflow of course. Having all commits on master always pass tests (and thus requiring to rebase the merged branches so their individual commits pass) has advantages (such as this easy bisect).
You would still be able to commit and push to your feature branch without breaking the CI. I guess it would be possible to enforce the rules on feature branches but that would be counter productive IMO.
OK, but if you commit on feature branches then those commits will also land on master (squashing is not popular and makes frequent commits pointless).
Frequent commits (without making sure they pass tests) allow better git blame/annotate (figuring out why a given line of code was added; a general comment like "introduce feature ABC" is not helpful, but a message like "add field to do something" is).
You can't make detailed commit messages if you have large changes.
But yeah, everything depends on the workflow one commits to.
When I worked with git I squashed/reordered my feature branch to make it pass tests on every commit before merging it with the master. This is usually not very hard and you can still do small commits. You can even split new tests into a separate commit.
Honestly I don’t really see why would you want to change code that breaks tests without updating them in the same commit. Updates in tests give strong signal to the PR reviewer to see that an API has changed. When fixing bugs it’s also nice to be able to reproduce it with a test case first, then fix it.
The problem is that bisect depends on being able to determine whether the commit you land on has the newly found bug or not. That almost certainly depends on it at least building, so that you can run it and check. If you can't do that then you can't determine if the bug exists on that commit, and you can't bisect effectively.
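Git's partial answer to unbuildable commits in a bisect is exit code 125 ("can't test, skip") from a `git bisect run` script. A contrived but runnable sketch, where commit 4 hypothetically "doesn't build" and commit 3 introduced the bug:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name Dev

# Seven commits; state.txt records which one we're at.
for i in 1 2 3 4 5 6 7; do
  echo "$i" > state.txt
  git add . && git commit -qm "commit $i"
done

# The bisect script: 125 = "can't test this commit, skip it",
# 1 = bug present, 0 = good.
cat > check.sh <<'EOF'
#!/bin/sh
n=$(cat state.txt)
[ "$n" -eq 4 ] && exit 125   # pretend commit 4 doesn't build
[ "$n" -ge 3 ] && exit 1     # bug introduced at commit 3
exit 0
EOF
chmod +x check.sh

git bisect start HEAD "$(git rev-list --max-parents=0 HEAD)"
git bisect run ./check.sh
bad_subject=$(git bisect log | sed -n 's/^# first bad commit: \[[0-9a-f]*\] //p')
git bisect reset >/dev/null
echo "$bad_subject"    # commit 3
```

Skipping works here because the unbuildable commit isn't adjacent to the good/bad boundary; when it is, bisect can only report a range of candidate commits.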
Completely agree. What on earth is the value in knowing that John decided to commit something at 11:24 29 days ago that may or may not have been complete rubbish?
The only reason I am aware of for keeping history is bisecting it to track regressions. The "development history" is completely useless for this purpose. What I want is the version history. That's why git is called a version control system.
If John's rubbish ended up in the merge down to trunk, then you absolutely do want to know about that individual commit.
If you've never received a 500-line patch with a one-line error in it, you haven't been developing collaboratively for very long. Me, I want all 10 commits so I can see from the commit message why the third commit broke the build on platform X, the one clearly tested only on platform Y. I don't want to disentangle the whole mess-o-hackage with a single vague commit comment ("added the foo feature") when faced with such problems.
Am I going to want each and every commit for all history? Of course not, but they're cheap to store, and Fossil makes it easy to ignore the ones not immediately in front of you.
> If you've never received a 500-line patch with a one-line error in it, you haven't been developing collaboratively for very long.
Which is why after I've finished doing the development of a big change, I want to rewrite the history into a series of logical patches, each exactly the right size for review.
> Trying to review the "what you did" with all the random false starts, bug fixes, and who knows what, would be a nightmare;
At some point there was a plugin for Google Docs where you could record the actual typing of a document; supposedly if good writers (even on the order of Spolsky/Graham) did it, we could stand to learn a lot.
Every git forge is able to show changes against target branch without anyone needing to rewrite history, though. Reviews can be done against changes, not individual patches.
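Those forge views are essentially `git diff` with the three-dot (merge-base) notation. A small sketch with made-up file names:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name Dev

echo base > base.txt
git add . && git commit -qm "initial"
git branch -m main

git checkout -qb feature
echo feature > feature.txt
git add . && git commit -qm "feat: add foo"

git checkout -q main
echo unrelated > mainonly.txt
git add . && git commit -qm "unrelated trunk work"

# Two dots would diff the two branch tips directly (dragging in
# mainonly.txt); three dots diff feature against the merge base,
# i.e. only the changes this branch actually proposes.
git diff --name-only main...feature    # feature.txt only
```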
Now imagine that you had the "mess" each contributor produced in the course of development in the history of the Linux Kernel - instead of just a clean set of final patches.
That's what git was written for.
On the other hand recording all the nitty gritty details might be nice for smaller projects.
The kernel uses a patch-based workflow. Someone sends a bunch of patches to a list, gets feedback, improves on the patches, and sends the new set of patches. This might have added patches to the set, or changes to the existing patches. Repository write access is not (generally) shared between developers.
This is a very, very different way of using git from how almost everyone else uses git.
For me, version management, like IDEs are just tools that help a software developer towards their goal of *delivering* software.
And, IMHO, towards that goal, Fossil does an incredible job.
Besides this, Fossil is extremely comprehensive with respect to the additional things that a developer might use. It has an amazing timeline view, a wiki, a ticketing system, and now even chat. Again, I find the overall experience is way better than using Git, and all that from a single executable.
More than once, while using Git, I've had to look up the arcane sequence of commands which I need to utter, to get out of a weird scenario. I've never had to do that with Fossil.
Fossil, like it says elsewhere in the site, is a *sane* version management system. At least it keeps me sane! And lets me get on with my work!
Does it not come down to the usual "cathedral vs. bazaar" opposition? SQLite, for which Fossil was originally built, lists three people on its developers page and looks pretty much like the definition of a "cathedral", whereas git was built by Linus Torvalds for Linux, which is the prototypical bazaar project.
It makes sense when you have a small team of people that know the project very well to record everything, and they can easily maintain stringent standards, like never committing anything that breaks the tests.
Whereas for a big project that involves thousands of people mailing patches around, some of them first-time contributors, you'd rather make sure that what ends up in the immutable log is clean enough.
> Like Git, Fossil has an amend command for modifying prior commits, but unlike in Git, this works not by replacing data in the repository, but by adding a correction record to the repository that affects how later Fossil operations present the corrected data.
I may be misreading this, but what if I accidentally commit (not push) a piece of sensitive data (say a private key), but notice before I push? In Git, I would simply remove it and amend, and it is now gone from the Git history. It sounds like with Fossil, it would remain in the history, or am I misunderstanding something?
There are times where being able to rewrite the history is critical.
Careful, amending doesn’t actually remove the bad commit, but it rather creates a new one and changes the appropriate branch pointer towards it. The old commit is still there, browsable if you know what you’re doing. This is fine if you’re doing it locally only, but incredibly misleading if you’ve already pushed the bad commit to a remote repository.
It’s possible to remove sensitive data from Git, but it requires commands rarely used in a day to day workflow. It’s better to assume that any committed credential is burned and re-issue them.
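The "amend doesn't actually remove the bad commit" point is easy to see with Git's reflog. A small self-contained demo (the file name and fake key are invented for illustration):

```shell
# Demo: 'git commit --amend' does not erase the old commit; it stays
# reachable through the reflog until garbage collection.
set -e
cd "$(mktemp -d)"
git init -q
git checkout -q -b main
git config user.name demo
git config user.email demo@example.com

# Accidentally commit a secret.
echo "SECRET_KEY=abc123" > config
git add config
git commit -qm "add config"

# "Remove" the secret with an amend...
echo "SECRET_KEY=redacted" > config
git commit -qa --amend -m "add config (redacted)"

# ...but the original content is still in the object store, via the reflog:
git show "HEAD@{1}:config"   # prints SECRET_KEY=abc123
```

This is why the practical advice stands: treat any committed credential as burned and rotate it, rather than trusting history surgery.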
In Fossil it will remain in your local repository, but you can put the hash of the data you accidentally committed on a "blacklist" so that it won't be pushed/pulled. I think you can also rebuild the local repository to purge the data, though I'm not 100% sure about that.
(though you can certainly make a local clone of it which won't clone the blacklisted data, but you may need to manually copy/configure some stuff that isn't cloned)
> what if I accidentally commit (not push) a piece of sensitive data (say a private key), but notice before I commit
Fossil has autosync by default, so the commit has probably reached another instance anyway. But it has a provision exactly for those cases, the shunning mechanism.
> We don't look at this difference as a workaround in Fossil for autosync, but instead as a test-first philosophical difference: fossil commit is a commitment. When every commit is pushed to the parent repo by default, it encourages a working style in which every commit is tested first. It encourages thinking before acting. We believe this is an inherently good thing.
I wonder how this works with tests that cannot be run locally (or ones that I'm not willing to sit through for the whole hour the test suite runs)?
Eager to hear from Fossil practitioners on this.
The ideals do sound very good - but they remind me of the difference between an academic and a practitioner. One gets things done while the other advances the state of the field. Depends on what you like.
Previously, the standard solution was to commit work with potential cross-platform issues to a short-lived experimental branch, so you can check out the branch on all your test machines and try it there before merging it down to trunk.
> I'm not willing to sit through for the whole 1 hour the test suite runs for
Hour-long test suite runs are a problem in and of themselves. That sets the shortest period of time your development organization can react to a test failure. Look to any application of control theory — feedback systems, process control, OODA loops... — for what happens when you have a slow-reacting feedback loop.
You either need multiple levels of testing so you get some feedback quickly, or you need some sort of parallel build system that breaks the tests up so that the whole set completes in a reasonable amount of time. Without that, you're going to have people either sitting around waiting on test runners or skipping the tests entirely.
> One gets things done
Effort and resources spent on a massively parallel CD buildbot sounds like a good way to get things done to me. Computers are cheap; developer time is expensive.
Awesome, for some reason I couldn't think of the workaround of committing to a temporary branch.
> Hour-long test suite runs are a problem in and of themselves.
I'm not sure I agree. Let's say I have a data integration system that has 70 modules. Each module has some integration tests and some baseline benchmarks.
At the time of development I'm only concerned with the 1 or 2 modules I change whose tests can finish in < 5m easily.
But when I push the change I want to be sure nothing else broke. Maybe because the change was in a base module used as a library by some other modules. Maybe I forgot to run tests on 2 of the dependent modules. So running the entire suite on merges (and optionally on PRs if required) is a necessity in my opinion.
Thanks for the link though - I'll take a look.
> You either need multiple levels of testing so you get some feedback quickly, or you need some sort
> of parallel build system that breaks the tests up so that the whole set completes in a reasonable
> amount of time. Without that, you're going to have people either sitting around waiting on test
> runners or skipping the tests entirely.
100% agreed. With most of these points you can now ignore my previous paragraph. By a 1 hour test suite I mean the wall-clock time of a single build-test cycle on CI runners. i.e. to run ~130 test suites in parallel against various containerised databases and some DBaaS (like maybe Redshift, BigQuery etc.) takes 1 hour where each individual module takes 5 to 10 mins. Some may take longer depending on the depth of things you are testing.
> Effort and resources spent on a massively parallel CD buildbot sounds like a good
> way to get things done to me. Computers are cheap; developer time is expensive.
100% agreed there again. In my opinion the organisations that treat CI improvements as technical debt and not as first class citizens of the software development flow are doomed to move slowly and have a hard time getting things done.
> I wonder how this works with tests that cannot be run locally (or ones that I'm not willing to sit through for the whole 1 hour the test suite runs for?)?
Create local, quick-running proxies for those tests.
I realise that this seems glib, but it is workable and the correct approach in the vast majority of cases, including the vast majority of those who claim that it's not possible for them.
Agreed. It's useful either way to have some sort of "smoke-test" version of the entire test suite so that you can have reasonable confidence that things are working and if needed you can still run the full in-depth suite.
I think the issue (which seems to exist in pretty much every VCS out there) is that there is a single flat timeline - even with branches (assuming they aren't treated as just separate parts of the tree) you get a single timeline per branch at best.
Wouldn't it be better to be able to "nest" or "group" commits so that, e.g. all the "intermediate" commits go under the final commit which is what is displayed by default and "traveled" through via tools like log, blame, etc?
Right now the best option (though it really doesn't solve the issue) is to have a branch per task for VCSs where branching is trivial, but you still get all the commits in your history anyway (I think some clients can display only the merge, but this is up to the tool, not an inherent feature).
There was a submission the other day about a project on GitHub where people noticed that the git history looked as if the author was making a new commit automatically every time he saved a source file, and most comments were about how bad that was. And I was thinking "Why? The issue here is that you can see those commits, not that they exist".
You could easily have something like
* Feature #1 submission
* Feature #2 submission
  * Quick prototype hack
    * Save at 12/13/14 09:30
    * Save at 12/13/14 09:31
    * Save at 12/13/14 09:41
    ...
  * Attempt #1
    * Save at blahblah
    ...
  * Attempt #2
    * Save at blahblah
    ...
  * Final Fix
    * Save at blahblah
    ...
* Bug fix for #8923
  * Attempt #1
  * Review update
  * Etc
and by default in the timeline/log you'd only see the top three commits until you drilled down, even to the automated ones from file saves. Similarly, when you looked at "blame" (or whatever) you'd only see those top-level commits. And of course all these tools should have a "max depth" option (default 1) to allow you to go into more detail. And yeah, you'd also need some functionality to specify and edit (preferably with its own history - Fossil already does that for other stuff like the wiki) the parent/child relationships, perhaps with some current state (i.e. under which commit all new commits will be placed).
With something like that in place i do not see any reason to not keep as much as possible around.
I've been testing out a new workflow in Git in some personal projects that basically works like this.
Basically, I make a single wip branch, which contains all of my messy/frequent commits, then when I feel that things are in a good state and I'm ready to cut a release off that branch, I tag that commit with "wip-vX.Y.Z".
Separately, there is a "release" branch (which is basically master/main/trunk) that only ever gets code that was first committed into wip. When there's a new release tag in the wip branch, I run the following command on the release branch: "git cherry-pick -X theirs -n wip-vA.B.C..wip-vX.Y.Z", where A.B.C is the tag of the previous wip version and X.Y.Z is the tag of the current new wip version. This has the effect of taking all of those changes from the wip branch and staging them to be committed on the release branch. I then commit them with a descriptive commit message and tag that commit "release-vX.Y.Z".
What you end up with, as a result, is a "release" branch with a very clean commit history and a "wip" branch with a very detailed commit history. If you want to run a more detailed blame or bisect, all you need to do is check out the "wip" branch. If you want to use code that's stable enough to be called a release, all you need to do is check out the "release" branch. Rather than squashing/amending wip commit history out of existence or maintaining it on a series of scattered branches, this workflow makes it all conveniently available in one place, without directly polluting the main branch.
As for the wip commits themselves, I do try to have them all be mostly atomic and to always build. But I feel much less concerned about having a couple of commits that try some idea and then a couple more later on that undo it. Polluting a main branch with these sorts of no-ops has never seemed particularly appealing to me, but neither has simply deleting them or tucking them away somewhere difficult to find. I've only been using this new workflow for a couple of weeks, but so far it seems to solve that problem and in general be working great.
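The workflow above can be sketched as shell commands. Everything here (branch names, tag names, file contents) is invented for the demo; only the cherry-pick incantation comes from the comment itself:

```shell
# Sketch of the wip/release workflow: messy commits on "wip",
# one clean commit per release on "release".
set -e
cd "$(mktemp -d)"
git init -q
git checkout -q -b wip
git config user.name demo
git config user.email demo@example.com

# Initial state shared by both branches.
echo base > f; git add f; git commit -qm "initial"
git tag wip-v0.0.0
git branch release

# Messy day-to-day commits on wip.
echo try1 > f;  git commit -qam "wip: first attempt"
echo final > f; git commit -qam "wip: settled on final version"
git tag wip-v0.1.0

# Fold everything since the last release into one clean, descriptive
# commit on release: -n stages without committing, -X theirs prefers
# the wip side on conflicts.
git checkout -q release
git cherry-pick -X theirs -n wip-v0.0.0..wip-v0.1.0
git commit -qm "Release 0.1.0: add the f feature"
git tag release-v0.1.0
```

After this, `release` has two commits (initial plus the release) while `wip` keeps the full detailed history.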
Dr. Hipp seems to be trying hard to promote Fossil as superior to Git. Perhaps it is, but that ship has sailed already. He recently commented in an interview that Fossil is tailored and perfectly suited for the needs of the SQLite project. That's a strong argument in favor of Fossil and all he really needs to say on the topic.
>"I looked at Git, I looked at Mercurial, and I looked at my requirements and I thought, “You know what? I’m just going to write my own,” so I wrote my own version control system, which is now a project unto itself, and that worked out very, very well, because think, Linus Torvalds wrote Git to support the Linux kernel, and it is perfectly designed to serve the needs of the Linux kernel development community. If you’re working on the Linux kernel, Git is absolutely the perfect version control system.
>Now, if you’re working on something else, though, maybe not so much, and so, it’s a perfectly atrocious version control system for working for SQLite.
>Fossil is absolutely the perfect version control system for working in SQLite, and I wrote it for that purpose, and so, because I wrote it myself, it exactly meets my needs and is the perfect product for what it’s doing, so by doing things yourself, you control your own destiny, you have more freedom, you’re not dependent upon third parties."
It sounds to me like promoting SQLite itself. There are lots of people deploying Postgres and friends when their best DB for their case would be SQLite. In the same way, we are all using Git even for microprojects that would be better served with Fossil.
I never had the impression of Dr. Hipp saying "Fossil > Git", but "Fossil might fit you better than Git".
Why does it matter if he can or cannot capture the majority of that market? The points can still be valid. A few thousand users is still a win for him.
> Why does it matter if he can or cannot capture the majority of that market?
Amen to that.
FWIW... i've actively contributed to fossil since 2008 and can say with some authority that we (the "regulars" on the project) have never done "PR pushes" for more users. It's simply not how any of us function. We welcome those who stop by, and we're happy to evangelize when approached, but we don't go actively hunting for more users. We're more of a "build it and they will come (or not)" project. Every couple of years someone pops up in the mailing list/forum proposing ways we could gain more users if we'd just do this and that and that other thing, but that's simply not a priority for any of us who actively work on Fossil. Continued work on Fossil is not about dominating the DVCS field (git has, for better or worse, long since done that). As Richard is quoted in earlier responses in this thread, Fossil is, first and foremost, a tool for maintaining sqlite. Similarly, i make use of Fossil for literally all of my own code because it's a great tool for the job, whereas git is... errr... less so.
PS: the overwhelming majority of FOSS projects are far closer in size and scope to sqlite than they are to the Linux kernel (for which git was created).
> PS: the overwhelming majority of FOSS projects are far closer in size and scope to sqlite than they are to the Linux kernel (for which git was created).
It's amusing if not ironic how many comments are something to the effect of "...but how messy and clumsy would the Fossil approach be when working with the Linux kernel that has 1000s of contributors pushing 1000s of commits..." when the sum total of projects approximating the size of the Linux kernel can be counted on one hand, and the authors of such comments are almost certainly not working on such projects.
Not to mention the author of Fossil has clearly stated that Git is great for Linux kernel development, and Fossil was not developed for such a project.
I see it as more of a "don't try to use Fossil as Git or Git like Fossil". If you are just using them for code control for your own project, then this might tell you which fits your style.
SQLite has had about 33 contributors (after you combine people using multiple login names). But 92% of the commits have been from just two people. 97% from just three people. Then there is a long tail of other contributors.
But aren't most projects like this? A few core developers are responsible for most changes and enhancements, and then there are lots of others that might contribute a patch or two here and there? I suspect there are exceptions to this (the Linux kernel comes to mind) but I think they are rare. Correct me if I'm wrong.
SQLite officially does not accept patches or contributions.
There are very few projects that are as openly hostile to contributors, which makes SQLite unique. To be honest, the link you posted shows that people have fixed some stuff (mostly security holes and build fixes, it seems) in the past.
We accept (quality) patches and contributions from people who have completed the necessary paperwork to put their patches/contributions in the public domain. This restriction is in place so that SQLite itself can remain in the public domain.
If we accepted drive-by patches from anonymous users, SQLite would have to be relicensed as GPL or similar, which would undermine many of its use cases.
>people who have completed the necessary paperwork to put their patches/contributions in the public domain.
In the US, authors cannot put their works in the public domain. This ability doesn't exist in statute, nor have the courts accepted the concept. Past attempts to allow this have failed in Congress. One can refuse to enforce their rights on creative works, but copyright still exists from the moment of creation and works won't enter public domain until the expiration of the copyright.
Furthermore, the US allows authors (except work-for-hire) and their heirs the right of termination for copyright transfers and licenses. This is an inalienable statutory right which cannot be restricted by contracts.
Like everything else in git, rebase feels like bits of seemingly-unrelated functionality bolted together, but I think I find basically all of the parts useful:
- Rebasing my local changes onto remote updates has always felt safer and more natural than creating merge commits containing actual diffs.
- The ability to trivially patch, drop, combine, and reorder commits has always felt like a superpower when it comes to telling the "story" of a change in a review, or breaking up a review into multiple smaller ones.
- It's a bit more roundabout compared to just "edit", but if I want to amend a commit that isn't the HEAD, I often find it easier to just make a new commit with the desired change and then use rebase -i to move it up and squash it into where it needs to go.
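The "new commit, then rebase -i it into place" trick in the last bullet is common enough that Git automates it with `--fixup` and `--autosquash`. A minimal sketch (all names invented; `GIT_SEQUENCE_EDITOR=true` just accepts the generated todo list so no editor opens):

```shell
# Demo of git commit --fixup + git rebase -i --autosquash.
set -e
cd "$(mktemp -d)"
git init -q
git checkout -q -b main
git config user.name demo
git config user.email demo@example.com

echo base > README; git add README; git commit -qm "initial"
echo one > a; git add a; git commit -qm "add feature A"
echo two > b; git add b; git commit -qm "add feature B"

# Fix feature A even though it is no longer HEAD: record the fix as a
# fixup of the target commit...
echo one-fixed > a
git commit -qa --fixup=HEAD~1

# ...then let autosquash reorder the todo list and squash it in place.
GIT_SEQUENCE_EDITOR=true git rebase -i --autosquash HEAD~3
```

The result is the same three-commit history, with the fix folded into "add feature A" and no "fixup!" commit left behind.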
That's like saying "Long ago X said he didn't eat cruise ship buffet food, so I never do that either when I'm on a cruise. I don't feel my cruise ship experience has suffered as a result"
It raises the question - how do you know it wouldn't improve your experience if you've never tried it?
1. Fossil syncs branching and tagging info among all clones; it isn't local state as with Git. Therefore, when you're looking at a timeline, others' branches and such are named, labelled, and tagged. If you need to filter away all but your current branch, that's easy, because you have all the tags. If you need to see what others are up to in parallel with you so there isn't a big surprise at merge time, Fossil does that readily, too.
2. There's a "hide" feature in Fossil. Rather than destroy information, it simply marks a given branch as not-to-be-displayed unless you go out of your way to unhide everything. It's useful for branches that never go anywhere and thus need to be forgotten.
3. For the intermediate case, where you have a branch that shouldn't be forgotten but doesn't need to be in your face all the time, you can "close" it, meaning it still shows up in the timeline when you scroll far enough back in history to see it, but it's tagged so you can filter away all closed branches, leaving only what's active. This is the default mode for "fossil branch" without arguments, for instance, where it gives only the list of open branches. Your next command will almost certainly have a branch from that list, not something closed long ago.
I could probably come up with more, but that should get you started.
I don't understand all of the ceremony around keeping developer experimental checkpoints ("test", "fix", "fix2", "change name") around. If you really want to do this in git, you can. But there's little value in preserving this information.
Keep the tiny commits, then squash when you merge to the main branch. You shouldn't have a feature branch for too long, so it won't be a problem. If there are merge commits, you can squash before rebase.
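That "tiny commits on the branch, one squashed commit on main" pattern is what `git merge --squash` does. A minimal sketch with invented branch and file names:

```shell
# Demo: messy commits on a feature branch, one clean commit on main.
set -e
cd "$(mktemp -d)"
git init -q
git checkout -q -b main
git config user.name demo
git config user.email demo@example.com
echo base > f; git add f; git commit -qm "initial"

# Tiny checkpoint commits on a short-lived branch.
git checkout -q -b feature
echo step1 > f; git commit -qam "wip: first try"
echo step2 > f; git commit -qam "wip: works now"

# Fold the whole branch into a single staged change on main, then
# commit it with one descriptive message.
git checkout -q main
git merge -q --squash feature
git commit -qm "add feature (one clean commit)"
```

Main ends up with two commits total; the checkpoint history survives on the feature branch for as long as you keep it around.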
I have to admit, in my career so far, I've never had to, or seen anyone have to, interact with a commit more than a sprint or two old. Almost all my workplaces enforced the usual standards regarding linear history, squash/rebases before PRs and commit naming, and then did absolutely nothing with the results of that effort. It probably means that I've had an atypical career, but I still wonder if all these ceremonies are as meaningful as we hold them to be.
For most projects, history gets used less and less the older it is. That's also true of the code. A lot gets abandoned or scarcely used.
But for the most important, widely used, long-lived projects, things are different. You come to appreciate history when you have legacy code that you want to change and you'd like to figure out what the authors were thinking when they wrote it. (Unfortunately, too often, the version control history wasn't kept at all.)
I think that we often assume our code is more important than it is, so we use ceremonies that aren't appropriate. But this is hard to judge in advance. The people writing long-lived code often didn't know that they were writing it.
How else do you answer questions like "How long has this been broken?" And without the answer to that, how do you know who needs to upgrade, and who doesn't?
I'm sure you can install old binaries to answer this sometimes, but I'd prefer to do it with a DVCS bisect, since that'll narrow the matter down to the individual commit that broke things.
Working in open source, I’ve interacted with commits that predate git and were imported into it a decade ago, whether to track a bug, see how changes correlated, or check the historical behavior of a function. I couldn’t work without the ability to do so.
I've successfully looked at commits years old to understand the context around a feature/bug that was confusing myself and a product owner. This context is where small, well-detailed commits come in so handy beyond just the initial review.
Same here, and I’ve worked at everything from mom-and-pops to a couple of household names. IME this (seemingly endless) discussion about how to manage commit history is the tabs-versus-spaces of source control.
Interesting - I've only worked on pretty small eng teams, but I couldn't live without git blame (or an editor/IDE feature using git blame). It's like another dimension, seeing the history of code line-by-line and learning why it was added, especially if those lines are months or years old. With clean linear commit history and clear commit messages you can easily see not just what was changed, but why it was changed, along with other commentary from the author and often a test plan.
FWIW i often had to go backwards in history (that was with Perforce, not Git) in previous jobs to figure out when a bug was introduced and by who so i can see what was the logic behind the change (either from the commit message or by asking the person directly, if they were still around). It can be helpful to avoid breaking something else by mistake when trying to fix something that had a reason to be as it is.
It's basically only done when something goes wrong, IME. All but the most recent history exists mainly for git-bisect and being able to branch historical versions to patch some old release-in-the-wild (if you're developing on the Web with a modern release-often style, the latter may rarely happen for, as you write, "a commit more than a sprint or two old", but it's common on software with longer release cycles or long-supported older versions)
I always enjoy these threads because I get to hear about all the crazy git commands and workflows that people use besides (clone, commit, checkout, merge, pull, push).
I don't understand the obsession some VCS have (such as fossil and mercurial) with forcibly preventing the user from altering history. Even if it's intended to be a safety rail that prevents losing work, I should be the one that decides if I want the safety rail or not. Just like I should be able to run `sudo rm -rf /` on my computer if I want without some "mother-knows-best" OS preventing me from doing so. While the tool might be well-meaning and looking out for my best interests, I should have control at the end of the day, not the tool.
Super contrived example: Say I'm writing a book in markdown and I'm using VCS to track changes. Maybe it's a private repo to start with, but I plan on eventually making it public and open source. Maybe it's a book that touches on racial issues, like "To Kill a Mockingbird". Maybe I don't want the whole world to be able to look back through iterations of paragraphs I wrote and rewrote over and over because I was trying to figure out how to strike the right tone. You're telling me with Fossil/Hg I can't ever truly erase mistakes I never intended people to see? What's stopping people from mining the repo's "trash commits", looking for offensive text to label me a racist on Twitter? With Git I'm in control, and before I make it public I'll prune the private repo to throw away any trash commits I never intended people to see. With Fossil I'd have to... create a brand new repo and rebuild it from the ground up with some automated tool that skips commits I don't want people to see? Or somehow use the "shunning" feature to painstakingly delete commits I don't want people to see?
Fossil seems like a great fit for SQLite. Heck, seems like you use SQLite while using Fossil so it's like dogfooding (like using Git to version control Git). And I have never used fossil, so I can't knock it until I've tried it. But, I am reluctant to try it when I've already learned Git. I don't need wikis and chat and issue tracking or anything else fossil offers because I already use GitHub/Discord/etc for that (not to mention bundling all those things into a single executable is against the Unix philosophy, which I mostly subscribe to). However, I admit it is kind of cool that the wiki, issue tracking, chat, etc. are all just tables in the fossil repo so I guess theoretically you could easily search them with SQL commands.
I’d love to have a robust and frictionless way for my working area (staged and un-staged) to be in sync with remote somehow on a press of some key combination.
A remote backup/copy of edited files without involving an external backup tool. Stash is just local. The next best thing is a WIP commit, but I usually only do that at the end of the day.
> You must do something drastic like git reset --hard to revert that rebase
This betrays a deep failure to understand git; `git reset --hard` is one of the least "drastic" things you can do to a repository. It's certainly a lot less drastic than `git commit`, because it does a subset of the things `git commit` does. All `git reset --hard` does (to the repo, I mean; it also has working tree effects in yet another example of bad git UX) is point a branch to a different commit. `git checkout -B` is another way to accomplish the same thing, as is `git branch -f` (did I mention the bad UX?).
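The "all of these just move a branch pointer" claim can be checked directly. A small demo (the `spare` branch is invented for the example, since Git refuses `branch -f` on the currently checked-out branch, so the third form has to move a second ref):

```shell
# Demo: reset --hard, checkout -B, and branch -f all reposition a branch
# ref; no commits are destroyed in the process.
set -e
cd "$(mktemp -d)"
git init -q
git checkout -q -b main
git config user.name demo
git config user.email demo@example.com

echo v1 > f; git add f; git commit -qm "first";  first=$(git rev-parse HEAD)
echo v2 > f; git commit -qam "second";           second=$(git rev-parse HEAD)

git reset -q --hard "$first"        # main -> first (also resets index + worktree)
git checkout -q -B main "$second"   # main -> second again, same kind of move
git branch -f spare "$first"        # move/create another ref, worktree untouched
```

Afterward `main` is back at the second commit with both commits intact, and `spare` points at the first; nothing was lost by any of the three commands.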