GitHub's Missing Merge Option

u801e · on Oct 1, 2022

>> An eternal war rages between team “git log should be clean” vs. team “git log should have an accurate history.”

Having a clean commit history makes it easier to utilize commands like git blame and git bisect. I'm not convinced those who want to have an accurate history really understand what they want. For example, if after every single keystroke, I saved the file, added it to the index and ran git commit -F - <(date), then I certainly could have a comprehensive history detailing every single keystroke I made as I edited the code, but the history would not be very useful when looking back on it.

>> Team Clean

>> Methods: git merge --ff-only or git rebase && git merge (extreme clean freaks add the --squash option)

>> Pros: Linear history, git log is easy to read, git revert requires no thought.

>> Cons: You’re erasing history—you can no longer tell if two commits were written together on a single feature branch.

Could the con not be addressed by using the --no-ff even if it's a fast-foward merge? A merge commit in that situation points to the same tree as the HEAD commit of the branch that was merged, but the merge commit also has the information about the base commit and the head commit of the branch, so you know what commits were in the branch even after it's merged.

jillesvangurp · on Oct 1, 2022

A lot of this stuff is written from the point of view of people pushing changes to a central repository. Git was actually designed from the point of view of people pulling changes between multiple forks and doing that at an enormous scale.

Team clean and team accurate both win here. If your git branch has an ugly history, you'll have a hard time convincing others to pull those ugly changes. They'll throw it back at you and tell you to clean that up using the tools that are available to you. That's why it's called a pull request and not a push demand. You are asking others to pull your changes and merge them.

Open source projects are a lot better at this than corporate projects where it is common for people to huddle around a central repository where everyone has write access like they are still using subversion 20 years ago.

With Linux, the only person that does not have to worry about people upstream from him is Linus Torvalds everybody else is pulling changes from others all the time and they are paranoid about making sure to not rewrite history. You only ever rebase your own changes and only before you share them with anyone else. Linus Torvalds runs a complex network of people from which lots of patches emerge (git patches, that is). If it's not clean, it won't get in. If it's not accurate, it won't get in. And they all pull changes from upstream. So if upstream no longer merges cleanly because you rewrote history on your fork, you have a problem. Simple solution: don't do that.

The linux commit history is an endless sequence of squashed merge commits (and some small commits). Some of them big, some of them small. It's both clean and accurate. And this simplifies the process of pulling those changes for the many forks. torvalds/linux actually has no branches, just an insane amount of forks (45k, and that's just on Github), which of course is the same thing in git. Some of those forks are where maintenance releases are created for older versions of linux.

seba_dos1 · on Oct 1, 2022

> The linux commit history is an endless sequence of squashed merge commits

Nothing is being squashed in Linux.

metadat · on Oct 1, 2022

IIRC, there are also no empty merge commits in the Linux kernel git logs.

seba_dos1 · on Oct 1, 2022

What do you mean by "empty merge commits"?

Because of the way Linux development is structured, it's probably highly unlikely for maintainer to ever merge a fast-forwardable branch into their tree, but conflict-less (empty?) merges sure do happen. Linux doesn't try to keep its history linear in any way, so merges actually represent the actual graph of merged branches.

Shorel · on Oct 1, 2022

I agree with you in the most general sense.

Someone's "accurate history" is just noise to everyone else.

Squash and cherry-pick your commits to a public repository, people.

Now, multiple people working in parallel is something else, but many times a long set of code experiments is not valuable history.

eurasiantiger · on Oct 1, 2022

It is valuable history: it is archaeological evidence of what has been attempted before.

u801e · on Oct 1, 2022

> It is valuable history: it is archaeological evidence of what has been attempted before.

Except that is not readily accessible. When looking at a particular section of code, you could either run git blame, or run git log and filter it to only list commits that updated that file.

Either way, you're not going to easily be able to find what previous implementations were tried, or why they were not used compared to the current implementation.

But if you make a clean commit with the working implementation and include an explanation of what other approaches were tried and why they weren't used, then someone could use git blame to get the information pretty easily.

seba_dos1 · on Oct 1, 2022

It sure is valuable, which is why it's retained in your reflog. You don't have to push it out anywhere though (or at least not somewhere where it pretends to be part of the project).

eurasiantiger · on Oct 1, 2022

My reflog is not accessible to other contributors. I could keep a decision log, but that is double work.

seba_dos1 · on Oct 1, 2022

My reflog can sometimes be useful for myself, but I can't imagine anyone ever taking time to go through someone else's (either literal or in form of commits) messy reflog. If that's your argument, then no, it's not exactly what I'd call "valuable".

eurasiantiger · on Oct 1, 2022

Reflog was your argument.

Going through a reflog is much more noisy than going through commits, which is why I would prefer the commits.

seba_dos1 · on Oct 1, 2022

My argument was that the only cases where going through unfiltered chain of WIP commits is somewhat valuable are well served by browsing through your reflog. I don't see any value in peeking at other people's unfinished stuff (unless doing something like pair programming or teaching). Any decisions that are worth noting go into commit messages anyway. That "archaeological evidence" has, in general, no reason to leave the local machine at all.

the_gipsy · on Oct 1, 2022

It can also be a burden pushed away from the author to reviewers (or future contributors), hence why OSS repos usually require clean/squashed PRs.

eurasiantiger · on Oct 1, 2022

How is it a burden? Just diff the HEADs of the branches like GitHub does by default?

gherkinnn · on Oct 1, 2022

- wip

- wip fix this

- wip revert

- wip almost done

- wip ruehejhejdhdhd

- wip tests

I don’t see the value here.

eurasiantiger · on Oct 1, 2022

That is a strawman argument: you are pretending that bad commit messages are the result of retaining commit history.

masklinn · on Oct 1, 2022

That is not a strawman argument, that is what routinely happens in development branches.

eurasiantiger · on Oct 1, 2022

Maybe it does, but couldn’t the developer write more descriptively?

The argument is a non sequitur too: It does not follow that a git merge strategy that keeps all commits is at fault for a culture of writing bad commit messages.

Presenting that argument as a rationale for avoiding straight merges while preferring squashing and/or rewriting history is the strawman.

masklinn · on Oct 1, 2022

> Maybe it does, but couldn’t the developer write more descriptively?

Commonly while you're fixing things up there's nothing of worth to be descriptive about. "made test x pass" is a useful checkpoint during development, it's not a useful step to conserve forever.

> The argument is a non sequitur too: It does not follow that a git merge strategy that keeps all commits is at fault for a culture of writing bad commit messages.

I see your accusation of strawmanning was just projection. Thank you for the insight, have a nice day.

eurasiantiger · on Oct 1, 2022

”Made test x pass” is still a bad commit message, as it says nothing about what changes were made to the codebase — which are always made when ”fixing things up”!

I did not intend for my comment to be taken as an accusation, as that would imply intention on your part. I simply meant to point out the logical fallacy in statements I read online, i.e. ”because someone on the Internet was wrong”.

Trying to gaslight me with personal accusations of psychological projection, however, is crossing a line. Do not do that to other people, it is violence.

You do not know what I think or feel, only what I write.

metadat · on Oct 1, 2022

Spending excessive time authoring high quality commit messages for low value commits [during development] is a waste of time. In general it will end up costing you in pure velocity, and reduce how much can get done in a day.

seba_dos1 · on Oct 1, 2022

You do know that temporary commits made locally that usually should never leave your development machine are not the ones that you push out to review and merge, right?

lamontcg · on Oct 1, 2022

That works when your developer laptop can actually run all the tests. If you're writing code which runs on something like AIX then generally you need to push code to CI in order to get it tested at all. And even for stuff which should test correctly both locally and in CI a lot of time gets burned yak shaving fixing the CI configuration, which can't be done locally. With something like Windows there's often issues between the configuration of the CI builders/testers and what you're running in some kind of virtual environment locally.

The idyllic world of "when it passes on my machine it always passes in CI" doesn't exist once you start doing sufficiently complicated testing and have a sufficiently large test matrix.

And as you pull in dependencies (different version of the underlying language/framwork, different versions of upstream or downstream deps that you need to test against) everything gets messy and even if you could reproduce the full matrix on your laptop it would take a day to run to completion.

seba_dos1 · on Oct 2, 2022

> That works when your developer laptop can actually run all the tests.

That's completely orthogonal. A remote branch that nobody looks at except you and your CI service is also part of what I'd consider "developer machine", you're just not necessarily sitting right in front of it. The important part is: these are not the commits that go anywhere further in the development process, and you certainly don't push them out to review. There's absolutely no reason for anyone to look at my "aaaa" and "fix" commits full of commented out code that serve as nothing more than an "undo" feature for what happened in my brain.

eurasiantiger · on Oct 2, 2022

Again, the real problem is in commit practice. Why commit commented-out code in the first place?

seba_dos1 · on Oct 2, 2022

Because why not? It's an undo feature for my brain, and a way to trigger remote CI process. Using tools to help with thinking is not "a problem", and even printf debugging has its perfectly valid uses.

My temporary commits can be as messy as it's reasonable for them to remain helpful, because they're not influencing the review in any way.

metadat · on Oct 1, 2022

"Should" is a funny word. I push funky commits to development branches often, because I always try to sync my work to the git host at the end of the workday in case I wakeup to a non-functioning laptop the next morning. Better to have a risk of ugly commits in a wip branch than risk losing work.

> Temporary commits .. are not the ones you push out to review, right?

Fully agree with this; I am a clean freak and groom the entire branch differential into sets of atomic or related commits before submitting a merge request.

seba_dos1 · on Oct 1, 2022

> I push funky commits to development branches often, because I always try to sync my work to the git host at the end of the workday

I do that too, and also to get CI artifacts - but I consider that an "implementation detail" that doesn't affect any other part of the development process ;)

eurasiantiger · on Oct 1, 2022

How often does someone go through your code commit-wise? What is the business value of grooming commits, if the reviewer only looks at the branch diff (as they very well should, lest they miss something in the big picture)?

u801e · on Oct 1, 2022

They are valuable when using git blame to see the line in the context of the change it was part of and the associated commit message. That information could help avoid introducing a regression for example.

seba_dos1 · on Oct 1, 2022

I go through individual commits during review quite regularly. Sometimes it's enough to look at the diff of the whole branch, but sometimes it's much more manageable to look at individual commits for meaningful review.

TomSwirly · on Oct 1, 2022

No, there's no reason to preserve commit messages you used during development.

When I am developing, I make many tiny commits with an automatically generated title ('Modify util/files.py') each time my tests pass, or really, when I do anything of value. (I use `git-infer`: https://github.com/rec/gitz/blob/master/git-infer)

This makes it impossible for me to lose work, and acts like a coarse-grained undo for me, where I can quickly move back and forth between spots that the tests worked if I decide I'm going the wrong way, or create a new branch, move back a bit, and make some changes and compare.

_Before anyone sees this code_ I rebase it down to a logical sequence extremely-carefully named and organized commits. (The word "manicured" has been used more than once.)

As I go through code review, I make tiny commits and at the end, rebase them into my carefully-named commits.

I create at least five commit IDs for each final commit I created. No one wants to see these.

I spend considerable time organizing everything so just the information you need to see is in the final commits. All the information should be there.

eurasiantiger · on Oct 1, 2022

Why do you do it? Does it bring some business value down the line? If, have you seen that value materialize?

u801e · on Oct 1, 2022

It helps in terms of maintenance. For example, when changing some code, you could look at the commits that arrested the lines of coffee you're going to modify. With well crafted commit messages, you may be able to avoid introducing a regression based on information in those past commit messages.

u801e · on Oct 2, 2022

s/arrested the lines of coffee/changed the lines of code/

another-dave · on Oct 1, 2022

If you keep all the intermediary commits, it's pushing more process onto the dev in the middle of coding. Some people may do this well, but more often than not I do just see "WIP (test broken)" etc.

With a squash/rebase, the development has to write an accurate commit message once & they're doing it when they've fully understood the solution they've written.

eurasiantiger · on Oct 1, 2022

It doesn’t seem logical to me that someone working on solving an intellectual puzzle would do so without having a consistent internal narrative of what they are doing.

If a developer can’t fit their change description on one line or don’t even know where to start describing the changes, chances are they could commit much more often.

The way I see it, a commit message like ”WIP (test broken)” is unprofessionalism bordering on self-sabotage.

seba_dos1 · on Oct 1, 2022

> it's pushing more process onto the dev in the middle of coding

Uhm, how exactly? Splitting & merging things into sensible, atomic commits with proper commit messages is usually what you do right before pushing it out for someone else to see, definitely not "in the middle of coding".

MrPatan · on Oct 1, 2022

Just always do a merge commit into your main branch, and you have the best of both worlds, don't you?

- A straight history of merge commits you can revert or bisect.

- Handy pointers to all the branches you merged, in case you ever need them (I never have, but I do use merge commits)

At some point I may learn to let go and go full squash-and-rebase, which is the same thing just without the handy pointers to the dev branches (which, if I ever needed them, I'm sure there's a git command to fish them out of the repo). But for now I'm OK

TomSwirly · on Oct 1, 2022

Merge commits add negative value to your commit history.

They are a memory of a commit ID and of a branch that no longer exist. They add complexity and empty information to your project for no useful value.

Also, for writing git tools, life is much easier if the commits form a tree. But a merge commit has _two_ parents. If you have even one merge commit, your commits are no longer a tree.

> Handy pointers to all the branches you merged, in case you ever need them (I never have

TOO MUCH INFORMATION.

No, get rid of it. Extract any information you might need into the code review or the branch.

I mean, that commit ID will probably float around your repo for the next six months, so it isn't like it's gone immediately.

It's hoarding! Let it go! :-)

seba_dos1 · on Oct 1, 2022

> It's hoarding! Let it go! :-)

Don't. There's no cost of retaining it and it's super useful to have it when needed. Are you trying to tell me that you never use tools like bisect or blame, and find reading commit messages with justification of the whole worth of a MR instead of a single logical atomic change that are then grouped into a MR more informative?

(unless you mean the exact pointer to the exact branch you worked on, in which case - sure, there's no need to keep it, rebase that branch while merging and let original commit IDs go, they're useless)

In a ff-only merge workflow with merge commits, all you get from a merge commit is grouping some commits together with some commit message. There's no cost to it since you can still easily consider the whole repo to be like a tree, and it makes browsing through commit logs easier for humans without making tools like bisect less useful, like squashing does.

Reading threads like this one, I get an impression that people who want to squash things away and remove information to "keep history clean" are simply not very proficient in using git as a tool that helps with development.

u801e · on Oct 1, 2022

> Handy pointers to all the branches you merged, in case you ever need them

The one thing that GitHub does when it generates the merge commit (which it does even for fast-forward merges) is that it adds a link to the PR associated with that commit. It's a handy way to go back to the PR and see the review of the branch before it was merged.

For projects that use the email patch workflow like the Linux kernel or git itself, the merge commit message contains the branch name which can be used to find the mailing list discussion. In the git project, you can find the "what's cooking" emails to find the branch name and references to the message-id of the patch series cover letter.

TomSwirly · on Oct 1, 2022

Hear, hear.

I have absolutely no interest in "accurate history".

WHY. Do people really want to see my minor spelling mistakes and goofs that got corrected in review?

It's worth spending the extra time to have a completely clean and perfect log, if only for being able to read your commit history like a novel, but also for using `bisect` to find regressions and behavior changes.

---

I have gone to extremes. At some point this summer, I realized that running the tests with one collection of flags led to breakage, and this breakage had been there for almost two months, and it was important that this flag work. (Yes, it was also a testing/CI issue, and we fixed that.)

I found the issue with `git bisect` and then patched that change two months ago and rebased everything - a couple of hundred commits.

We all feel it is of key importance that all the tests pass at each commit ID, so no one had any issues with that and it really didn't take very long.

To rebase without changing modification dates, use

`git -c rebase.instructionFormat='%s%nexec GIT_COMMITTER_DATE="%cD" git commit --amend --no-edit' rebase -i`

y4mi · on Oct 1, 2022

> Methods: git merge --ff-only or git rebase && git merge (extreme clean freaks add the --squash option)

>> Pros: Linear history, git log is easy to read, git revert requires no thought.

Does it require no thought because it's fundamentally impossible? if you're doing -ff without squashing it's gonna get hard to figure out which commits you'll have to revert I think. All histories get merged into a single stream after all.

unicornmama · on Oct 1, 2022

Merge commits provide best of both worlds. Too bad GitHub and friends refuse to render history with —first-parent. So we’re stuck with inane squash commits.

epage · on Oct 1, 2022

What I wish github supported was "git rebase && git merge --no-ff", or as Azure DevOps calls it "semi-linear merge".

A fairly linear history is a lot easier to read but it is nice to preserve the branch by having a merge-commit.

q7xvh97o2pDhNrh · on Oct 1, 2022

Genuinely asking: What's the value prop of having the merge-commit in the history for you? Have you ever capitalized on it? And if so, could you share the command-line-fu that's helped you take advantage of having such a commit in your repo history?

I'm asking because I used to do exactly this and feel much the same way as you — I felt it was nice to preserve the historical context.

At some point, it dawned on me that, amidst all the anecdata I'd acquired, I'd never once benefited from having those commits cluttering up the history.

These days, I usually advocate "squash & merge" for the 80%+ of teams that lack the discipline to write good commit messages for every one of their branch commits. And, for those rare teams where every engineer writes thoughtful commit messages on their branches (a true delight!), I savor being able to do "git rebase && git merge --ff".

Either way, we end up with a lovely linear history of (hopefully) atomic commits.

chrsig · on Oct 1, 2022

it's way easier to revert the merge commit than potentially every commit that was included in it.

having everything squashed into one commit can be annoying if the commit is too large, because it can be hard to identify why a specific change was made via git blame.

With a merge commit, I can make individual commits for logical changes, potentially rebasing, or squashing commits together. that also gives an opportunity to write more intentional and informative commit messages. once i have a good changeset, I can publish a single PR for review.

I've only really taken to this approach lately, for a long time I just didn't care. As the years go by, I care more and more about what git blame can tell me. I've been maintaining some projects for a long time, and I've had to pick up many projects that don't have their original authors available for consultation. In both cases, having a quality git commit log is just a time saver.

I don't care about preserving the history from when the code was initially typed, I care about the context the log can provide me about the change sets that have been applied. I've been debating starting to include it in my code reviews: "this commit log needs more details: can you edit the commit message and do a git push --force?"

If you're dealing with a more junior team, or people that don't care about the log, it's really a toss up for me. Roll your dice on maybe an informative message helping you, or sift through 15 commits saying "poke", or squash an enormous commit and get a single sentence about it.

q7xvh97o2pDhNrh · on Oct 1, 2022

> I've been debating starting to include it in my code reviews: "this commit log needs more details: can you edit the commit message and do a git push --force?"

Strongly agree with this. I hold the line on this for every team I lead (which, sadly, is not every team I consult for, so sometimes I have to keep my mouth shut and spend my political capital elsewhere).

Many junior/mid-level engineers have simply never seen commit messages and source control done well. Additionally, because many folks starting out in the field don't know how to truly use git beyond memorizing a few basic commands [1], they also never harvest any of the benefits of writing great commit messages.

I've got a stash of links that I find myself often sending to engineers who are on the way up, and the tpope article on git commit messages [2] is one I find myself sharing around constantly.

[1]: https://xkcd.com/1597/

[2]: https://tbaggery.com/2008/04/19/a-note-about-git-commit-mess...

Hamuko · on Oct 1, 2022

>I've been debating starting to include it in my code reviews: "this commit log needs more details: can you edit the commit message and do a git push --force?"

I've accepted my role as the team's villain and regularly bitch about commit messages being too vague. Luckily they can't spit in my drinks since I work from home.

gravlaks · on Oct 1, 2022

> having everything squashed into one commit can be annoying if the commit is too large, because it can be hard to identify why a specific change was made via git blame.

True, you can't via git blame. You can however go to the Github pull request, and look at the individual commits there. (If the the squashed commits containthes the PR/issue number.)

chrsig · on Oct 1, 2022

i've learned that anything not stored in the repo itself is ephemeral. org decides to leave github? hope you did a good job migrating all that. same goes for issues, wiki, etc.

if it's not in the repo, there's a good chance it wont survive the test of time

almostnormal · on Oct 1, 2022

> having everything squashed into one commit can be annoying if the commit is too large, because it can be hard to identify why a specific change was made via git blame.

Maybe it should be split into individual PRs. If there's dependencies between the changes there's tools to support stacked changes.

chrsig · on Oct 1, 2022

as with all issues, it depends :)

I've been struggling with finding the sweet spot between number of pull requests and size of pull requests.

The pipeline on my primary project takes an annoying amount of time to complete, ~30 mins or so if things go smoothly. It sounds like a lot, but it's got a very comprehensive set of tests.

When PRs are too granular, the merge/rebase/test pipeline/merge cycles can be quite time consuming.

Admittedly, I don't use stacked changes. They're something I only learned about somewhat recently, and haven't yet incorporated into my development cycles. Can you recommend any resources?

cyphar · on Oct 1, 2022

I believe that git bisect is a bit smarter when there are proper merge commits (it won't test a random patch in the middle of a patchset unless it's figured out that the issue is in that merge). You can also revert an entire merge in one shot (but this is also true for squash-and-merge).

However, given that you're comparing this to squash-and-merge, you might see most branches to merge as being something that is reasonable to make a single commit (in which case the above feature isn't useful to you because all of your merges are single commits anyway -- and in that case there is little benefit of having merge commits, though I also use them for tracking who reviewed a patch).

I suspect this comes down to the environment you are working in -- if you are working on an internal repo then it is much easier to make separate pull requests for smaller changes (and in that case squash-and-merge makes sense) but in open source projects where it makes more sense to batch sets of changes to reduce the amount of round-trips spent doing reviews, squash-and-merge would cause its own issues (you wouldn't be able to have small commits for logically separate changes). I regularly make pull requests that have >3 commits which cannot be reasonably squashed without making things more annoying down the line if we ever have to revert or bisect something (or even just doing git blame -- it reduces the amount of work when you're spelunking in the git history).

u801e · on Oct 1, 2022

> What's the value prop of having the merge-commit in the history for you?

The merge commit contains the sha values of the parent commits. For example, if merged a branch that could otherwise be a fast forward merge using the --no-ff option, then you get a merge commit, but the commit does not change any of the files compared to the HEAD commit of the branch you merged.

Having the parent commit sha values allows you to see the commits that were in the branch that was merged even after it's merged. For example

    git log merge-commit^1..merge-commit^2

Would show the commits made in that branch. Or you could run

    git diff merge-commit^1..merge-commit^2

to get the overall diff introduced by the commits in that branch.

> These days, I usually advocate "squash & merge"

The disadvantage of doing that is that those commits can't be reverted easily (without conflicts). If the commits were separate, it's easier to revert a single one since it affects fewer files or lines within a file.

MrPatan · on Oct 1, 2022

> Genuinely asking: What's the value prop of having the merge-commit in the history for you? Have you ever capitalized on it? And if so, could you share the command-line-fu that's helped you take advantage of having such a commit in your repo history?

I recently reverted one of those, and it was much easier because it was a single commit, I just had to pick the right parent to revert to.

I'm not bothered about dev branches commit messages, I consider them a backup of my local workspace anyway. But my dev branches are pretty short-lived, maybe that's significant.

seba_dos1 · on Oct 1, 2022

git log --first-parent

git log --graph --oneline

As long as we're not talking about single commit merges, these merge commits don't clutter the history; they enrich it.

saghm · on Oct 1, 2022

That's what I prefer to use when merging locally (often from a cleaned up version of my feature branch). I'd for for it to be added, but at this point I honestly just wish there were a way for me to manually push and then have it be tied to the PR in some way that shows it as "merged" rather than "closed", since that would essentially handle not only my preference but any possible one someone could have in the future. It would be cool if there were some flag I could specify with a PR id or something when pushing, but I'm not really sure that's feasible without some sort of standard protocol, since I don't expect (and wouldn't want!) git to put in Github-specific functionality. I'd certainly settle for some menu item in Github's PR view for a maintainer to be able to do something like select a recent push from a menu to then be attached to the PR's metadata for future viewers, but at this point I doubt it will ever be added.

sudo_chmod777 · on Oct 1, 2022

Vote here: https://github.com/community/community/discussions/8940

gorgoiler · on Oct 1, 2022

There are two kinds of history: private and public.

When you are writing your book, your drafts might be versioned and you might want to revert a chapter or section if it turns out to be of no value. Later on you might want to cherry pick that chapter out of the bin*. That’s all part of the writing process.

When your work is complete you have your editor give you feedback on which also use a revision control system. This isn’t quite public but it’s different to the private process in that the changes have semantics related to the decisions making you and your editor take part in. Those decisions need to be recorded in the history of the project. You eventually publish the first edition.

Longer periods of time see revisions to the first edition. A friend of mine just finished his third edition ten years after the previous one. These big changes will be made up of many small changes you do want to know about (decisions you and your editor came to) and many you don’t (“wip reverting section 43 / 45 ordering”.)

Software is the same. Some people want to publish their entire decision making process. This is a mistake — no one finds value in it, so it should be kept private. Most people are happy for the decision making process to be public (pull request arguments) or at least their outcomes to be visible (ff commits of approved PRs.)

Editions of the work (tags and/or release branches) shoul have a git log at the granularity of ideas that were agreed upon and made public as opposed to every single idea we ever had.

*Is a writer’s reflog their wastepaper basket or their brain?

madmax108 · on Oct 1, 2022

> This is a mistake — no one finds value in it, so it should be kept private.

Just anecdotal, but I disagree vehemently. In my experience, an easy way to follow Chesterton's fence ( to figure out why a piece of code is the way it is before modifying it's behaviour), is to understand the decision making process that was being followed and not just the blob of code that added it (Even more so on older repos where the original author may not be around).

Many a times, it becomes obvious that the code was written under a very different understanding of the what the system evolved to be, and as such can be changed accordingly, and inversely that the code makes certain strong assumptions when written which hold true even today.

This is why as a thumb rule, especially for junior developers just getting started with git, I spend time on getting them to understand good git commit messages instead of 'git add . ; git commit -m "some code"' because I've seen a tonne of value retrospectively in this.

gorgoiler · on Oct 2, 2022

You’re absolutely right. Version control should show the changes to the code as well as the reasoning for making the change. Our only disagreement is that you want to track that using the commits that led up to the final idea being complete, whereas I delete the history and rewrite the story of the idea, what was there before, what’s changing, and why in the final commit message.

When the problem you want to solve is to provide a historical account of new features for future developers fixing bugs — often you, yourself — then a written document of a few paragraphs, right there in the git log/ blame, provides the highest signal.

almostnormal · on Oct 1, 2022

> [...] I spend time on getting them to understand good git commit messages instead of 'git add . ; git commit -m "some code"'

Decisions could also be documented explicitly, explaining the pros and cons of possible solutions to future readers. However, in absense of this, commit logs may contain this information, better than nothing. But with all the commits kept from development the logs will contain a significant amount of non-interesting information, and make the history hard to read a few years later. And reading the log to find something will get slowed down by explanations that should be in docunents or comments in the code.

masklinn · on Oct 1, 2022

> Decisions could also be documented explicitly, explaining the pros and cons of possible solutions to future readers.

Such decisions would be documented in the commit log. Optionally linked to an expanded version off-log (e.g. mailing list threads, issues, tickets, ...)

lamontcg · on Oct 1, 2022

> understand the decision making process that was being followed

This should be documented in the PR in human language, hopefully with some backwards and forwards and questions asked and answered.

I'll frequently go back and document why the PR wound up the way it did even after everything has been merged and shipped because some discussion a day later makes me realize that the information had never been captured.

And those can be options that were never in the commit history because they weren't the path that was taken.

Then a few years later you can read the PR and see that there were different options, that some of them were rejected as not meeting the actual requirements, there might have been concern about breaking changes, the very issue that has cropped up later might have been anticipated but a simpler solution was used on the expectation that the complicated solution wouldn't be necessary (and now it is necessary). The "why" of a change is a lot more than the history of the person typing on the keyboard and making commits, there's a whole lot of thinking that goes into a change which is never reflected in a single line of code.

copirate · on Oct 1, 2022

The private/public distinction also matches Linus' view on when history can be rewritten: https://www.mail-archive.com/dri-devel@lists.sourceforge.net...

kayson · on Oct 1, 2022

I'm not sure I'd call Rebase & Merge clean... If you do it you get duplicate commits (the hash is different but the content is the same). It appears clean if you look at the commit history for a given branch on Github, but it's not all that clean in git itself.

I work on smaller scale projects and set up a work flow where development is done in a develop branch, and main is locked behind PRs so everything gets CI'd and reviewed before its CD'd. I'm definitely on "Team Clean", because I think it works better for projects of our size, but the only way to get a rebase and ff merge is to do it locally and push to main, which I've blocked for safety.

Maybe there's a solution but I couldn't figure one out.

masklinn · on Oct 1, 2022

> I'm not sure I'd call Rebase & Merge clean... If you do it you get duplicate commits

Which matters… how? The duplication will disappear when you nuke the development branch.

MathiasPius · on Oct 1, 2022

One way in which it is useful, is that you can tag your build artifacts (like docker images) with the commit id in the development branch, and just re-tag them when you want to promote the artifact to main or production, or whatever.

Without this option, you have to either rely on branch names when promoting, which means you can't tie your build to a specific commit in your main branch anymore[1], or you can rebuild your image from the new commit, potentially producing an entirely different artifact which might differ from the one you just tested and validated on your development branch.

[1] Unless you back-fill that information after merge, but the workflow is getting really convoluted at this point.

masklinn · on Oct 1, 2022

> One way in which it is useful, is that you can tag your build artifacts (like docker images) with the commit id in the development branch, and just re-tag them when you want to promote the artifact to main or production, or whatever.

I don't understand what you're talking about.

The development build artifacts are useless afterwards, you can't "promote" an artifact from development to main without rebuilding unless you're basically the only person working on it, as the integrated behaviour (and thus the corresponding artifacts) can differ from the pre-integration ones.

The only situation where you could reuse the development artifacts as-is with no risk is if the branch is fast-forwardable. Which the integration could just take in account by not rebasing branches which are already rebased.

> potentially producing an entirely different artifact which might differ from the one you just tested and validated on your development branch.

You need one anyway, even if the development branch validates it still has to be validated post-integration as it may fail at that point.

> [1] Unless you back-fill that information after merge, but the workflow is getting really convoluted at this point.

Is it? All of it is easily automatable. Why are you limiting yourself for things the computer takes care of?

seba_dos1 · on Oct 1, 2022

If your branch is not fast-forwardable and has to be rebased before merging, then you wouldn't be able to reuse the artifact from development branch anyway as your production branch is already somewhere else than your development branch was.

If it is fast-forwardable, then individual commit IDs don't change at all and you only get a single, effectively empty, merge commit on top of them, so you actually can reuse the artifact just fine.

kayson · on Oct 1, 2022

That's never going to happen if development is ongoing, though. Especially if multiple people are working on it

masklinn · on Oct 3, 2022

If development is ongoing, the branch is not merged, there is no duplication.

Ferret7446 · on Oct 1, 2022

Why do people think that having branches in their history is "unclean"? I'm struggling to understand what is the issue here.

generationP · on Oct 1, 2022

When merge commits involve conflict cleanup, they are very hard to debug (at least with standard tools like tig/gitk/diff; there might be something more advanced). I've seen lines of code disappear into nowhere in merge commits. You can only hope the conflicts were resolved well.

Even conflict-free merges can be scary -- you're trusting the system to find the right context. Of course, the same errors can happen on rebase, but at least the resulting commits can easily be inspected.

georgyo · on Oct 1, 2022

The problems here come from the fact that a "merge" commit is just a normal commit with multiple parents.

This means that anything can happen inside that merge commit, including add, remove, and modify all the code, even if not touched by a parent.

Merge commits are good place for a malicious actor to hide changes.

gregmac · on Oct 1, 2022

I can see it being an issue with people that don't bother to keep clean git history? Eg, that write commit messages like "fixed bug" "oops" "more stuff".

I'm on Team Write-Good-Commit-Messages. The key of being on this team is learning interactive rebase: of course everyone makes mistakes, but by rebasing before you push no one has to know you did. When the commits are clean, the history is clean.

Write good PRs, too, and make sure they translate to good merge commits. Then you can run `git log --first-parent master` and get a pretty decent high-level overview of changes, but with all the other benefits of keeping individual branch commits in history.

kortex · on Oct 1, 2022

Because it is visual noise. It's really hard to track, especially on the CLI. It's also a lot easier to reason about the codebase through time if it's basically a linear track.

We started off with merge commits and moved to squash rebase (team history to team clean), and I gotta say, I absolutely love team clean in the overwhelming majority of cases.

The only time team history is handy is as a sort of temporary space on feature branches where you are sorting out complex merges, debugging, or trying different patterns. But all of that gets squashed away onto main eventually.

seba_dos1 · on Oct 1, 2022

I can perfectly understand the appeal of keeping the history clean by requiring ff-only merge commits, but I fail to see any proper case where squash rebase is actually useful in any way. It only loses potentially useful information and doesn't give anything in return. The only usefulness I can image getting out of it is throwing away garbage commit messages and fixup commits made by people who didn't bother to get them right, which sounds like something to fix in the first place.

When you want to maintain linear history, FF-only merge commits allow you to have cake (keep properly atomic changes retained) and eat it (filter out the detail in git log when it's unwanted). Squashing (eating cake) or rebasing with no merge commit (having cake) only makes things worse from there.

Gigachad · on Oct 1, 2022

Every PR I have worked on starts with some good commits but then they become useless when review happens and a bunch of commits that are essentially “do the suggested thing”. Better to just squash it all in to one commit for the whole PR.

Ideally the PR is small enough to be logical as one commit.

seba_dos1 · on Oct 1, 2022

> Better to just squash it all in to one commit for the whole PR.

Absolutely not. It's better to edit your PR to contain suggested changes in a clear, atomic history before presenting it for another round of review. Why would I want to present commits that "do the suggested thing" to the reviewer in the first place? That doesn't make any sense and it shouldn't pass the review.

agsnu · on Oct 1, 2022

Having a commit in isolation that "does the suggested thing" reduces the cognitive load on the reviewer(s) because you can see the conversation history and the specific changes in response to that comment without having to re-parse the entire branch changeset from first principles.

Basically the trade off is, re-write history before review or after.

If you re-write before review, you lose some context about the evolution of the changeset - also the GitHub UI can get confusing (because some conversation comments will apply to commits that no longer exist on the PR branch). Keeping the pre-review iteration visible in the context of the PR can make it easier for someone else to come along and join the discussion later.

If your feature branches/PRs are short-lived, then the size of a squashed merge is still quite small, and the pre-review iteration is irrelevant. If for some reason the pre-review iteration is interesting, it's still there in the context of the PR, which is back-referenced from the commit message.

seba_dos1 · on Oct 1, 2022

> Having a commit in isolation that "does the suggested thing" reduces the cognitive load on the reviewer(s) because you can see the conversation history and the specific changes in response to that comment without having to re-parse the entire branch changeset from first principles.

When I do a review this way, conversation history is retained and I can easily see the diff between rounds of reviews in the UI. I'm mostly using GitLab. Are GitHub's reviews really this bad?

When working on a project that maintains linear history, you should imagine your merge requests to be like patchsets sent via e-mail. If you send a set of three patches to Linux and you get some feedback in return, you don't reply by sending a fourth patch on top. You reply by sending a V2 of your patchset. Previous discussion is still there attached to V1, and everything is clear for both the submitter and to reviewer, and even to bystanders reading the discussion in future. That's pretty much what you get when you force-push an updated patchset onto your MR branch in GitLab.

> then the size of a squashed merge is still quite small

In my experience, only trivial MRs fit comfortably into a single commit. Usually a good MR contains somewhere around 3-10 commits. More than that and it's probably worth splitting to make review easier, less than that and you're into single bug fix territory where this whole discussion doesn't matter much.

> and the pre-review iteration is irrelevant.

Pre-merge iteration, either before or during review, is always irrelevant for the repository. It should only be retained in your review tool (GitLab does that well) and referenced in merge commit (after all, its whole purpose is to do just that and group commits together).

kortex · on Oct 1, 2022

I think it depends a lot on your process as well. I should have added that we use Jira+Confluence and every merge commit starts with

    ENG-1234 | Does the needful with foobar

So most of the context is in the tickets, which then point to design docs. The code has very high percentage of docstrings, very high type coverage (for Python anyways), so there's really not much to be had in commit messages. We are a small (<20) team of engineers in a fast-moving year-old startup. Very tight slack comms. PRs to main are often single-digit lines, typically a few dozen, and anything more than a few hundred LOC is considered suboptimal.

I think if you are running an open-source project, with lots of disparate contributors, using Git as your main knowledge base, with bigger PRs, it's a very different ballgame. I think in that situation I would be team history.

ilammy · on Oct 1, 2022

> It's really hard to track, especially on the CLI.

    git log --first-parent --max-parents=1

Now you don't see merge commits.

Forbid merge commits with conflict resolution, and you're done.

seba_dos1 · on Oct 1, 2022

You usually don't want to use both arguments at the same time as that will hide the whole merges (both merge commits and merged commits).

You can however use one of those arguments to choose what you want to filter out, which is very handy. FWIW, `--max-parents=1` has an alias `--no-merges`.

Gigachad · on Oct 1, 2022

How do you forbid the ones with resolution?

Genuine question because it seems like merge commits are a black hole where anything could happen and no one would notice.

seba_dos1 · on Oct 1, 2022

Just require the merged branch to be fast-forwardable (but don't actually fast-forward it). If it's not, it will be obviously visible in `git log --graph`.

raggi · on Oct 1, 2022

git log --topo-order

yakkityyak · on Oct 1, 2022

I'm betting that a perfectly linear git history is probably easier/nicer to have for people who think of git as a sequence of deltas.

That is not how git works, but I can understand the appeal. For all my personal projects I squash and merge everything.

tsimionescu · on Oct 1, 2022

This depends less on how you conceive of Git, and more on the project's development model. For example, in a team using Git for an internal product, with a centralized repo a main branch and (relatively short-lived) feature branches, the conceptual history of the repo is linear. Seeing all of the merges that everyone did when getting into work is line noise, not likely to be meaningful - so rebasing feature beanches on master every day and then doing squash or FF commits when merging back into master makes the most sense.

Alternatively, for an open source project with no "blessed", and with long-lived branches that get contributed by various orgs, trying to make the history appear linear would help nobody.

So obviously this kind of model wouldn't work for Linux, but it works better for most in-house software using Git as a centralized VCS with some nice offline history features.

teraflop · on Oct 1, 2022

If you have a Git repository with merge commits that merge changes back into your main branch, all you have to do is "git log --first-parent -m" to get a log that looks exactly as if you had squashed and rebased every change.

I'm surprised I don't see more people talking about this option. Sure, the syntax is inelegant, but it seems like it gives you the best of both worlds: you can browse your history as if it was linear, and drill down to the individual commits in a branch only when necessary.

graton · on Oct 1, 2022

> all you have to do is "git log --first-parent -m"

What is the `-m` for? Looking at the man page it says: "-m will produce the output only if -p is given as well."

Did you mean "--max-parents=1" as a different commenter used with "--first-parent"?

teraflop · on Oct 1, 2022

The "-m" doesn't do anything by itself, but it's there in case you want to add an option like "-p", which otherwise wouldn't show a useful diff for a merge commit.

With "--first-parent -m -p", the log shows only the diff of the merge commit against its first parent, which is the same as what you would get from squashing.

seba_dos1 · on Oct 1, 2022

You rarely want to use `--max-parents=1` with `--first-parent` as that will hide all merge commits and merged branches entirely. `-m` is useless here, just use `git log --first-parent`.

tsimionescu · on Oct 1, 2022

Do Github, GitLab, BitBucket, or any other Git UIs support this option? Relatively few people use the command line for inspecting Git history. Not even sure Magit exposes this option directly in most places where it shows history, I believe.

zdw · on Oct 1, 2022

Gerrit has a way to look at all patches uploaded, but only take the last one which should be squashed/rebased, if it's configured that way (The "Fast forward only" option).

It results in very clean history, and unmodified commit hashes from a developer's commit, whereas Github's Squash and Merge will mangle the commit message by adding a PR # to it, as well as other comments, which changes the hash. PR #'s are not as good as clean hashes, IMO.

As someone who went from an org that was using Gerrit to mandatory Github usage, the latter is definitely a step backwards in code review. That said Github does have a better browsing interface than gitiles.

graton · on Oct 1, 2022

> Relatively few people use the command line for inspecting Git history.

Counterpoint: I use the command line all the time to look at Git history.

seba_dos1 · on Oct 1, 2022

Same here. Web UIs are pretty cumbersome compared to Git and only really useful for quick lookups in recent history. For anything else, git log is much more convenient.

dataflow · on Oct 1, 2022

Do you have a useful & understandable mental model for "delta"-based operations like "rebase", "revert", etc. for arbitrary commit DAGs? Admittedly I haven't sat down to think about it for very long, but I feel like I'm not the only one who feels like lots of operations (including those) become difficult if not impossible to reason about with nonlinear history.

codeapprove · on Oct 1, 2022

As someone who is a big proponent of "squash and merge" I am susprised to see myself labeled an extreme clean freak! I don't have very extreme git uses or opinions.

I want a few things in my team's workflow:

* A commit history that a human can mostly understand

* Effective code reviews where I can easily see what changed between rounds

* The ability to revert a whole Pull Request easily and cleanly

* A local workflow that does not get in my way and allows me to be messy when I need to be

For our team that means we commit whatever we want to our local branch and then squash and merge the whole thing at the end. The result is a single commit saying "Adds feature foo". I don't feel like I lost anything in the squash ... am I missing something? I really don't want to have "fix lint, address review comments" in the shared commit history.

jlokier · on Oct 1, 2022

> The result is a single commit saying "Adds feature foo". I don't feel like I lost anything in the squash ... am I missing something? I really don't want to have "fix lint, address review comments" in the shared commit history.

Neither.

If you're doing it then way it's done in Linux, you will end up with neither of those. Instead you will have: "Add a new geometric engine with tests", "Add the geometric engine to view API keeping backward compat", "Use the new view API extensions in the main window renderer", "Implement behaviours for feature foo in the renderer", "Hook up scripting extensions, keys and menu options to make feature foo user-accessible", "Add basic documentation for feature foo".

No "lint", "reviewer feedback", "fix typos" or "I changed my mind and reverted the last revert" trivia goes in the history.

You should not in general submit your minute by minute trivial actions, indeed the commits submitted are usually not even in the same order as they were written and refined

Nor are the above individually complex, logical, meaningfully revertable and bisectable changes squashed down to an unhelpfully crude one-liner "Add feature foo" with a 5,000 line diff that touches most files in the project.

Some would say the logical changes should be separate PRs. To some extent, but it depends how fast you wrote the feature, whether the round trips make sense, and how involved in the nitty gritty you want the feedback to be.

The ideal unit of review or feedback, ie PR size, is often larger than the unit of a logical change. That's because the "future why" is not so apparent with individual changes, so people tend to zoom into the weeds instead of using appropriate context. In so doing they also tend to use up their reviewer attention budget so the other changes don't get the same scrutiny.

Think about the feedback to the first big logical change "Add a new geometric engine with tests": "Why are you adding this? What is the motivation?", or "This works but the API has these unnecessary warts" (warts you have to explain will be necessary in 3 PRs time). You can either have a chat in the PR feedback where you have to persuade someone that the rest of your plan makes sense, not easy if they can't see how it does. Or you can present the later changes all together as a group and then the code speaks for itself, much more convincingly.

drewcoo · on Oct 1, 2022

Well if we're going to start adding wish list items . . .

I would like decent distributed bug tracking such that I could follow a bug from first detection through branching and merging and forking until fix and then follow the absence of bug through branching and merging and forking.

No more centralized bug trackers. Death to Jira!

mwint · on Oct 1, 2022

In a perfect TDD world, you could write a test for the bug and let bisect figure out where the bug first appeared and when it was fixed.

Hard to do, but when I was at a big company they had something similar.

masklinn · on Oct 1, 2022

Good luck with that.

The idea of a distributed bug tracker was in vogue in the first years of the DVCS rise.

The problem is that it doesn’t really make any sense, or at least nobody managed to make it make sense: a bug tracker is a synchronisation point, and a bug is a cross-cutting concern, and one which outlives it’s fix.

melonrusk · on Oct 1, 2022

I have no idea what this article is talking about, but if there are two plausible representations for the whatever-it-is, why not allow it to be represented both ways ?