Why do people think that having branches in their history is "unclean"? I'm stru...

generationP · on Oct 1, 2022

When merge commits involve conflict cleanup, they are very hard to debug (at least with standard tools like tig/gitk/diff; there might be something more advanced). I've seen lines of code disappear into nowhere in merge commits. You can only hope the conflicts were resolved well.

Even conflict-free merges can be scary -- you're trusting the system to find the right context. Of course, the same errors can happen on rebase, but at least the resulting commits can easily be inspected.

georgyo · on Oct 1, 2022

The problems here come from the fact that a "merge" commit is just a normal commit with multiple parents.

This means that anything can happen inside that merge commit, including add, remove, and modify all the code, even if not touched by a parent.

Merge commits are good place for a malicious actor to hide changes.

gregmac · on Oct 1, 2022

I can see it being an issue with people that don't bother to keep clean git history? Eg, that write commit messages like "fixed bug" "oops" "more stuff".

I'm on Team Write-Good-Commit-Messages. The key of being on this team is learning interactive rebase: of course everyone makes mistakes, but by rebasing before you push no one has to know you did. When the commits are clean, the history is clean.

Write good PRs, too, and make sure they translate to good merge commits. Then you can run `git log --first-parent master` and get a pretty decent high-level overview of changes, but with all the other benefits of keeping individual branch commits in history.

kortex · on Oct 1, 2022

Because it is visual noise. It's really hard to track, especially on the CLI. It's also a lot easier to reason about the codebase through time if it's basically a linear track.

We started off with merge commits and moved to squash rebase (team history to team clean), and I gotta say, I absolutely love team clean in the overwhelming majority of cases.

The only time team history is handy is as a sort of temporary space on feature branches where you are sorting out complex merges, debugging, or trying different patterns. But all of that gets squashed away onto main eventually.

seba_dos1 · on Oct 1, 2022

I can perfectly understand the appeal of keeping the history clean by requiring ff-only merge commits, but I fail to see any proper case where squash rebase is actually useful in any way. It only loses potentially useful information and doesn't give anything in return. The only usefulness I can image getting out of it is throwing away garbage commit messages and fixup commits made by people who didn't bother to get them right, which sounds like something to fix in the first place.

When you want to maintain linear history, FF-only merge commits allow you to have cake (keep properly atomic changes retained) and eat it (filter out the detail in git log when it's unwanted). Squashing (eating cake) or rebasing with no merge commit (having cake) only makes things worse from there.

Gigachad · on Oct 1, 2022

Every PR I have worked on starts with some good commits but then they become useless when review happens and a bunch of commits that are essentially “do the suggested thing”. Better to just squash it all in to one commit for the whole PR.

Ideally the PR is small enough to be logical as one commit.

seba_dos1 · on Oct 1, 2022

> Better to just squash it all in to one commit for the whole PR.

Absolutely not. It's better to edit your PR to contain suggested changes in a clear, atomic history before presenting it for another round of review. Why would I want to present commits that "do the suggested thing" to the reviewer in the first place? That doesn't make any sense and it shouldn't pass the review.

agsnu · on Oct 1, 2022

Having a commit in isolation that "does the suggested thing" reduces the cognitive load on the reviewer(s) because you can see the conversation history and the specific changes in response to that comment without having to re-parse the entire branch changeset from first principles.

Basically the trade off is, re-write history before review or after.

If you re-write before review, you lose some context about the evolution of the changeset - also the GitHub UI can get confusing (because some conversation comments will apply to commits that no longer exist on the PR branch). Keeping the pre-review iteration visible in the context of the PR can make it easier for someone else to come along and join the discussion later.

If your feature branches/PRs are short-lived, then the size of a squashed merge is still quite small, and the pre-review iteration is irrelevant. If for some reason the pre-review iteration is interesting, it's still there in the context of the PR, which is back-referenced from the commit message.

seba_dos1 · on Oct 1, 2022

> Having a commit in isolation that "does the suggested thing" reduces the cognitive load on the reviewer(s) because you can see the conversation history and the specific changes in response to that comment without having to re-parse the entire branch changeset from first principles.

When I do a review this way, conversation history is retained and I can easily see the diff between rounds of reviews in the UI. I'm mostly using GitLab. Are GitHub's reviews really this bad?

When working on a project that maintains linear history, you should imagine your merge requests to be like patchsets sent via e-mail. If you send a set of three patches to Linux and you get some feedback in return, you don't reply by sending a fourth patch on top. You reply by sending a V2 of your patchset. Previous discussion is still there attached to V1, and everything is clear for both the submitter and to reviewer, and even to bystanders reading the discussion in future. That's pretty much what you get when you force-push an updated patchset onto your MR branch in GitLab.

> then the size of a squashed merge is still quite small

In my experience, only trivial MRs fit comfortably into a single commit. Usually a good MR contains somewhere around 3-10 commits. More than that and it's probably worth splitting to make review easier, less than that and you're into single bug fix territory where this whole discussion doesn't matter much.

> and the pre-review iteration is irrelevant.

Pre-merge iteration, either before or during review, is always irrelevant for the repository. It should only be retained in your review tool (GitLab does that well) and referenced in merge commit (after all, its whole purpose is to do just that and group commits together).

kortex · on Oct 1, 2022

I think it depends a lot on your process as well. I should have added that we use Jira+Confluence and every merge commit starts with

    ENG-1234 | Does the needful with foobar

So most of the context is in the tickets, which then point to design docs. The code has very high percentage of docstrings, very high type coverage (for Python anyways), so there's really not much to be had in commit messages. We are a small (<20) team of engineers in a fast-moving year-old startup. Very tight slack comms. PRs to main are often single-digit lines, typically a few dozen, and anything more than a few hundred LOC is considered suboptimal.

I think if you are running an open-source project, with lots of disparate contributors, using Git as your main knowledge base, with bigger PRs, it's a very different ballgame. I think in that situation I would be team history.

ilammy · on Oct 1, 2022

> It's really hard to track, especially on the CLI.

    git log --first-parent --max-parents=1

Now you don't see merge commits.

Forbid merge commits with conflict resolution, and you're done.

seba_dos1 · on Oct 1, 2022

You usually don't want to use both arguments at the same time as that will hide the whole merges (both merge commits and merged commits).

You can however use one of those arguments to choose what you want to filter out, which is very handy. FWIW, `--max-parents=1` has an alias `--no-merges`.

Gigachad · on Oct 1, 2022

How do you forbid the ones with resolution?

Genuine question because it seems like merge commits are a black hole where anything could happen and no one would notice.

seba_dos1 · on Oct 1, 2022

Just require the merged branch to be fast-forwardable (but don't actually fast-forward it). If it's not, it will be obviously visible in `git log --graph`.

raggi · on Oct 1, 2022

git log --topo-order

yakkityyak · on Oct 1, 2022

I'm betting that a perfectly linear git history is probably easier/nicer to have for people who think of git as a sequence of deltas.

That is not how git works, but I can understand the appeal. For all my personal projects I squash and merge everything.

tsimionescu · on Oct 1, 2022

This depends less on how you conceive of Git, and more on the project's development model. For example, in a team using Git for an internal product, with a centralized repo a main branch and (relatively short-lived) feature branches, the conceptual history of the repo is linear. Seeing all of the merges that everyone did when getting into work is line noise, not likely to be meaningful - so rebasing feature beanches on master every day and then doing squash or FF commits when merging back into master makes the most sense.

Alternatively, for an open source project with no "blessed", and with long-lived branches that get contributed by various orgs, trying to make the history appear linear would help nobody.

So obviously this kind of model wouldn't work for Linux, but it works better for most in-house software using Git as a centralized VCS with some nice offline history features.

teraflop · on Oct 1, 2022

If you have a Git repository with merge commits that merge changes back into your main branch, all you have to do is "git log --first-parent -m" to get a log that looks exactly as if you had squashed and rebased every change.

I'm surprised I don't see more people talking about this option. Sure, the syntax is inelegant, but it seems like it gives you the best of both worlds: you can browse your history as if it was linear, and drill down to the individual commits in a branch only when necessary.

graton · on Oct 1, 2022

> all you have to do is "git log --first-parent -m"

What is the `-m` for? Looking at the man page it says: "-m will produce the output only if -p is given as well."

Did you mean "--max-parents=1" as a different commenter used with "--first-parent"?

teraflop · on Oct 1, 2022

The "-m" doesn't do anything by itself, but it's there in case you want to add an option like "-p", which otherwise wouldn't show a useful diff for a merge commit.

With "--first-parent -m -p", the log shows only the diff of the merge commit against its first parent, which is the same as what you would get from squashing.

seba_dos1 · on Oct 1, 2022

You rarely want to use `--max-parents=1` with `--first-parent` as that will hide all merge commits and merged branches entirely. `-m` is useless here, just use `git log --first-parent`.

tsimionescu · on Oct 1, 2022

Do Github, GitLab, BitBucket, or any other Git UIs support this option? Relatively few people use the command line for inspecting Git history. Not even sure Magit exposes this option directly in most places where it shows history, I believe.

zdw · on Oct 1, 2022

Gerrit has a way to look at all patches uploaded, but only take the last one which should be squashed/rebased, if it's configured that way (The "Fast forward only" option).

It results in very clean history, and unmodified commit hashes from a developer's commit, whereas Github's Squash and Merge will mangle the commit message by adding a PR # to it, as well as other comments, which changes the hash. PR #'s are not as good as clean hashes, IMO.

As someone who went from an org that was using Gerrit to mandatory Github usage, the latter is definitely a step backwards in code review. That said Github does have a better browsing interface than gitiles.

graton · on Oct 1, 2022

> Relatively few people use the command line for inspecting Git history.

Counterpoint: I use the command line all the time to look at Git history.

seba_dos1 · on Oct 1, 2022

Same here. Web UIs are pretty cumbersome compared to Git and only really useful for quick lookups in recent history. For anything else, git log is much more convenient.

dataflow · on Oct 1, 2022

Do you have a useful & understandable mental model for "delta"-based operations like "rebase", "revert", etc. for arbitrary commit DAGs? Admittedly I haven't sat down to think about it for very long, but I feel like I'm not the only one who feels like lots of operations (including those) become difficult if not impossible to reason about with nonlinear history.