This is incredibly timely and I, for one, appreciate the post. I'm just moving myself and a team from past experience with SVN into the world of git.
None of us know git, but we're all vaguely aware of it. Unfortunately the place we are at requires us to integrate with ClearCase and SVN, and to make matters worse we're all on Macs and Linux for which ClearCase tools don't appear to exist.
Our proposed solution to bring sanity to this is to use git with git-cc and git-svn to act as bridges to the legacy systems, whilst putting all new work natively into git.
Now... I'll be the first to admit that we're not full of confidence about this as it's the fear of the unknown, so if anyone else has solid advice on how to work with git and workflows and integrating into existing legacy systems I would be keen to hear it.
Are you trying to integrate ClearCase with svn (with git as the go-between bridge)?
That's a bit more than I've had to deal with (though I've heard of perforce->(git)->svn being done).
As for git-svn itself, I'll say what I said on the other thread - what finally got me to switch is that with git-svn, my local copy is a real live Git repo. Which means the code I write, my actual work, is all still there and accessible via standard Git commands that I can look up. The fix for the worst-case scenario? I do a fresh svn checkout to a new directory, do a git checkout of the version I want, copy the files across, and do an svn commit of that version of the files.
Far from ideal, but knowing that I could recover from the worst the system could throw at me put me quite at ease. (Yeah, yeah, git svn dcommit, but we're talking about a fearsome worst-case scenario.)
We all have experience with SVN, and that's been our preferred setup in our prior roles.
Here, ClearCase is used by the back end services team and they use that to push things through the different environments too. So at some level we need to access their code, and to contribute code to their repository.
SVN is used by other rogue developers to try and avoid ClearCase... very understandable really. So we also need to get some of their code and contribute to them.
So neither holds all knowledge, and IBM doesn't provide ClearCase software for Mac OS X.
So what we're looking at is a git-cc bridge to go in that direction, installed on a Windows box and then allowing us internally to use git to work and then just to periodically push upstream from the Win box.
And as we'll be using git, it seems desirable (though not required) to use the git-svn bridge if that works well as then we only need to use the one tool daily.
I've read a lot about how to set up the CC2Git bridge, and no solution seems great, but all appear adequate enough to work at some level. I haven't looked into git-svn too much, but what I have read was mostly good.
Our concern is this: once we're in the git world, how do you use the thing? I've seen a lot of mentions of the importance of understanding git and workflows, but not a great deal that describes git for SVN users with simple-to-follow guides that communicate not just the commands, but also a nice and solid workflow.
I assume you've seen the git svn crash course (http://git.or.cz/course/svn.html), which is a good overview but falls short on the subtleties of merging (particularly using git pull --rebase). Its side-by-side listing of the commands in the two VCSs is great. The ParrotVM wiki also has a git-svn guide (http://trac.parrot.org/parrot/wiki/git-svn-tutorial) that I've found quite helpful, and it includes an introductory workflow.
It seems to get linked with every git story on HN but schacon's Pro Git is unequivocally the best reference and tutorial I've found: http://progit.org/book/
As a git user, I really do think it's critical that people think about their workflows and talk over what strategy they're going to use with their fellow committers.
There's (definitely) more than one way to do it with Git, so it's best to read up on how other people use it.
"is that there are a set of high-level version control operations that you’d expect git to be able to handle in simple cases without a lot of fuss." True.
git is a "particular kind of" swiss knife of Version Control - tonnes of options, not all of them intuitive but extremely powerful and useful if you like to work in the git way.
I, for one, am a bit surprised about all this noise on version control workflows; you use what you think is best suited to you and your team.
It's as useful as a language war: not all languages are equal, but you use what suits your use-case and what you are most comfortable with. End of.
The reason language and version control wars exist is because these are the mechanisms by which developers interoperate. It is worthwhile for me to convince you to use Git instead of Subversion, or Python instead of C++, because I might one day work with you and have to use your tools. If there's consensus about what the best tools are, then I am less likely to experience pain when moving.
If we were talking about which editor to use, I am rarely bound to use a certain editor based on the team I'm on, so that's why editor wars are, indeed, largely noise.
"I might one day work with you and have to use your tools. If there's consensus about what the best tools are, then I am less likely to experience pain when moving."
Your point, though correct in the abstract, misses one crucial piece of information: is it the best tool for the job? It is not enough to simply say "use git" or "use Python" or <foo>. The context around it is very important.
For example, in my current team everyone is comfortable using git, as we have a largely decentralised development model. In my previous role, on the other hand, we had a very centralised, one-true-copy-of-the-source-code model, hence SVN.
The comparisons and wars that come out are devoid of this context and just say, "I use this and therefore you should too." My point is that there is no one true way, and attempting to posit one, though interesting, is largely futile.
When we get to the point where two tools have legitimate, reasonable tradeoffs, then I agree with you.
However, your example isn't a good one because Subversion has serious deficiencies and Git works perfectly well in a centralized model. (Just "bless" one repo on a server and have everyone push to it.) I can't think of a single use-case where Subversion is a preferred way to solve the version control problem.
"I can't think of a single use-case where Subversion is a preferred way to solve the version control problem".
Developer expertise.
My point is not that Subversion is a better VCS than git; it is not. I am a big fan of git and will use it everywhere I can, but I don't get to choose the "right" VCS every time. Sometimes you end up in places where factors other than technical merit are at play.
However, I think we are discussing a point orthogonal to the main topic at hand.
I like Yehuda's advice to use --rebase. I'll have to try that.
I used to really like svn, but after using git for about a year I am now feeling some cognitive dissonance using svn on a customer project. Great tools are better than good tools.
I understand why --rebase isn't the default, but in the overwhelming majority of cases, you want to --rebase when pulling. I always tell newbies that they should default to --rebasing.
Why isn't --rebase the default? Can anybody explain what it actually does and why it's not the default? The man pages are not clear to me: everything is explained by referring to a bunch of other things, and there's not a trace of a simple explanation.
Okay, so you have to think about repository state in order to understand rebasing.
You share some common history with your remote repository (for the sake of argument, let's assume you've got a single remote repo that you and your friends push to).
When you make changes and commit them locally, what you're doing is adding new commit nodes to the history tree of your local repository.
When you push to the remote repo, the chain/branch of commit nodes from your local repository gets added on to the remote repository's history.
BUT, if, while you were working and committing, someone else adds commits to the remote repository... There is now new history in the remote repository that you do not share.
What rebasing does, is say "take all the commit nodes from my local branch (remember them), and rewind the history back to the point where my repository was in synch with the remote repository. THEN update my repository with the remote changes, and try and apply all my local commits on top of my newly synched tree."
TL;DR:
if your commit tree looks like this (where 'H' are commits which you share with the remote repository. 'L' are local commits and 'R' are new commits to the remote repo, which you don't have):
H-H-H-H-H-L-L-L-L
Your friend pushes his changes so the remote repository now looks like
H-H-H-H-H-R-R-R-R
Rebasing does this:
1) H-H-H-H-H [snip] L-L-L-L
2) H-H-H-H-H << R-R-R-R
3) H-H-H-H-H-R-R-R-R << L-L-L-L
If there are conflicts, Git steps through each of your commits and lets you fix the problems as it goes. If there are no conflicts, it just replays them all on top without stopping.
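For anyone who finds the three steps easier to follow as code, here is a rough Python sketch of the same idea. It is only a toy model under stated assumptions: history is a plain list of labels (H, R, L as in the diagram), the `rebase` function is made up for illustration, and none of this is how Git actually stores or replays commits.

```python
# Toy model of the rebase steps above: history is a list of commit labels.
# The labels follow the H/R/L diagram; real Git stores a graph of commit objects.

def rebase(local, remote):
    """Replay local-only commits on top of the remote history (illustrative only)."""
    # Find the shared prefix: the 'H' commits both sides already have.
    shared = 0
    while (shared < len(local) and shared < len(remote)
           and local[shared] == remote[shared]):
        shared += 1
    # 1) Set the local-only commits aside.
    set_aside = local[shared:]
    # 2) Rewind to the shared history and take the remote's commits.
    # 3) Replay the set-aside commits on top. The trailing ' marks each
    #    replayed commit as a new commit rather than the original one.
    return remote + [commit + "'" for commit in set_aside]

local = ["H1", "H2", "H3", "L1", "L2"]
remote = ["H1", "H2", "H3", "R1", "R2"]
result = rebase(local, remote)
print("-".join(result))  # H1-H2-H3-R1-R2-L1'-L2'
```

The primes on L1' and L2' are the important part: the replayed commits carry the same changes but are not the same commits as before.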
Thanks knowtheory, that's exactly what I wanted to read, that git pull --rebase is "take all the commit nodes from my local branch (remember them), and rewind the history back to the point where my repository was in synch with the remote repository. THEN update my repository with the remote changes, and try and apply all my local commits on top of my newly synched tree." Now what's "git pull" explained in the same terms?
Is it "produce errors as if remote were in sync with local even if it's obvious it isn't"? Who needs that?
If it's not, then what is a proper description?
And is there any scenario in which remote was changed, you want to push there your changes and you wouldn't want "git pull --rebase"?
"Normal" git development (without rebasing) naturally creates a branching history when two people make changes based off a common root. As was explained above, rebasing causes this branching history to be rewritten into a linear one. The alternative is merging, where you cause Git to record what actually happened -- which is that two changes occurred simultaneously and you explicitly brought them in sync again.
Sometimes (often) this is what you want -- it's especially useful for longer-lived branches of work, where having the history of the branch can be valuable in itself.
Also, if you share a commit with someone, you can no longer safely rebase it! The reason is that part of the definition of a commit is its set of parents -- the commits it depends on (usually 1, but can be 2 or more for a merge commit). When you rebase, you are rewriting all the parents, so you end up creating a whole stream of new commits and discarding the old ones. If you then attempt to share history with someone who has the old ones, presumably Git will become confused. (I've never actually run into this problem, but I don't use rebasing much.)
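A small Python sketch may make the "parents are part of the commit" point concrete. The `commit_id` function below is a hypothetical stand-in, not Git's real hashing: an actual commit hash also covers the tree, author, committer and timestamps. The only assumption that matters here is that the identifier depends on the parent ids.

```python
import hashlib

def commit_id(message, parents):
    """Hypothetical stand-in for a commit hash: covers the message and parent ids."""
    payload = message + "|" + ",".join(parents)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()

base = commit_id("shared work", [])
remote_tip = commit_id("their change", [base])

# The original local commit, built on the shared base.
original = commit_id("my change", [base])
# The same change replayed on top of the remote tip, as a rebase would do.
rebased = commit_id("my change", [remote_tip])

# Same change, different parent, therefore a different commit.
print(original != rebased)  # True
```

Anyone who already fetched `original` still has it in their history, while your branch now contains `rebased` instead, which is why rebasing commits you have already shared causes trouble.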
Those are good arguments why "pull --rebase" is not a default action, thank you. Now I don't understand why either "pull" or "pull --rebase" is supposed to have any different behaviour when somebody does it! As far as I understand, git has enough information not to complain more in one case and less in another, and as far as I understood, the main argument of the main article was that "--rebase" somehow "eases some pain"!? How come?
I can't comment on the motivations of the maintainers, but in my experience normal merges can handle more conflict cases smoothly than rebase. Rebase is also controversial because technically it's rewriting history. I think for the special case of pulling from remote, though, that the tradeoffs are often worth it, but nevertheless, merge is the safer default.
REMOTE history is absolutely sacrosanct (A remote being any repository that someone is going to pull from), but you can do whatever you like with your local repository, so long as you only push from it.
OK, but I believe remote repo is there only to be pushed to, so then is there any example of benefit of pull without --rebase if the pull is anyway only done locally?
Says "If you rebase commits that have already been pushed publicly, and people may have based work on those commits, then you may be in for some frustrating trouble." but not how that can happen at all, provided you always "pull --rebase" and then "push", and if that can happen at all, then what to do.
Sacrosanct? No. But still useful. With a real merge, you get a special commit that summarizes what had to be done to complete the merge. That can come in handy, especially with the sort of bugs that creep in during merges. Rebase will actually change your original commit, which obscures the original intent of the commit and hides the merge. In those cases I'd rather have a slightly messier history but more information about what really happened.
If only somebody would explain what this really does! The man pages are not helpful!
git-pull(1) Manual Page says: "--rebase: Instead of a merge, perform a rebase after fetching. If there is a remote ref for the upstream branch, and this branch was rebased since last fetched, the rebase uses that information to avoid rebasing non-local changes. To make this the default for branch <name>, set configuration branch.<name>.rebase to true."
Oh how clear. So let's try git-rebase:
"If <branch> is specified, git rebase will perform an automatic git checkout <branch> before doing anything else. Otherwise it remains on the current branch.
All changes made by commits in the current branch but that are not in <upstream> are saved to a temporary area. This is the same set of commits that would be shown by git log <upstream>..HEAD (or git log HEAD, if --root is specified).
The current branch is reset to <upstream>, or <newbase> if the --onto option was supplied. This has the exact same effect as git reset --hard <upstream> (or <newbase>). ORIG_HEAD is set to point at the tip of the branch before the reset.
The commits that were previously saved into the temporary area are then reapplied to the current branch, one by one, in order. Note that any commits in HEAD which introduce the same textual changes as a commit in HEAD..<upstream> are omitted (i.e., a patch already accepted upstream with a different commit message or timestamp will be skipped)."
I'm lost!
Now can anybody explain, as nicely as Yehuda did about the workflow, what and why that
git pull
did "wrong", and what the hell it is that "--rebase" does "right"? Especially as here and there "--rebase" is described as something that "doesn't keep history" and is therefore "bad."
Note that I put "bad" "wrong" and "right" in quotes. I know that's all relative, I just want to get some simple explanation what's going on without reading "recursion see recursion" man pages!
I don't think "git pull" does anything particularly incorrect. It's just that when you have merge issues with a git pull, you have to resolve them all at one time (that is, resolve all the changes together).
Someone can correct me if I am wrong, but a "git pull --rebase" will put all your current local commits aside, pull everything from wherever you are pulling, apply those changes to the tree without your local changes, and then apply all your changes in order, one by one, until you are current with both the remote and local changes.
If there is a problem with a merge on one of your applied changes, your rebase pauses, you fix the conflict as you normally would, and then you continue the rebase (as outlined in the article).
Essentially the rebase takes all your changes and applies them in order on top of the changes you are pulling.
Once again, if I made any mistakes in this explanation, someone please correct me.
Pull with rebase essentially takes all of your local commits and applies each one in turn to the remote head as if each one was a patch. Plain "git pull" doesn't do anything wrong at all, rebase is just a different style. It lets you deal with conflicts one by one, in the context of the original commit.
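To illustrate the "one by one" versus "all at once" difference, here is a rough Python sketch. It rests on a big simplifying assumption: a conflict is modelled as "both sides touched the same file", whereas Git really works at the level of changed regions within files. The function names and file names are made up for the example.

```python
def stops_during_rebase(local_commits, remote_files):
    """List (commit, conflicting files) for each replayed commit that would pause."""
    stops = []
    for name, touched in local_commits:
        # Each local commit is replayed separately, so it is checked on its own.
        clash = sorted(set(touched) & remote_files)
        if clash:
            stops.append((name, clash))
    return stops

def conflicts_during_merge(local_commits, remote_files):
    """List every conflicting file at once, as a single merge would present them."""
    all_touched = set()
    for _, touched in local_commits:
        all_touched |= set(touched)
    return sorted(all_touched & remote_files)

# Hypothetical local commits (name, files touched) and files changed on the remote.
local_commits = [("L1", ["a.txt"]), ("L2", ["b.txt"]), ("L3", ["c.txt", "a.txt"])]
remote_files = {"a.txt", "c.txt"}

print(stops_during_rebase(local_commits, remote_files))
# [('L1', ['a.txt']), ('L3', ['a.txt', 'c.txt'])]
print(conflicts_during_merge(local_commits, remote_files))
# ['a.txt', 'c.txt']
```

In this model the rebase pauses twice, each time in the context of one commit, while the merge presents both files together in a single resolution step.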
I actually disagree with Yehuda on this one. I like pull-with-rebase in the case where there are no conflicts, but if there are conflicts, I'd rather have a record of the original work, plus a special commit that shows the work done to reconcile in the case of a conflict. But that's really a matter of taste.
Standard merge feathers commits together. Rebase looks at your new changes, puts them aside, brings in the remote branch's changes, and then replays all your new changes on top of the remote branch. This is great because it makes YOUR changes truly new, and you can handle all your conflicts once.
git fetch is what syncs the remote changes to your local repo. If you want to see these changes, they are referenced by so-called remote-tracking branches. To view them use:
git branch -r
Now, if your local branch (let's say it's master) has not diverged from the remote branch to which it corresponds (typically origin/master), then it doesn't matter whether you use rebase or merge. In the case where your local master is a superset of the remote master, there are no new remote changes to incorporate. In the case where your local master is a strict subset of the remote master, then either rebase or merge will perform a so-called "fast-forward" operation which basically just updates your local master to the same commit as the remote master.
However, if your local master has diverged from the remote master, then there is work to do:
Rebase will take all your new local commits, set them aside, reset your local master to the same commit as the remote master, then replay your commits one at a time. If there are conflicts, you will of course have to resolve them as you go.
Merge, on the other hand, will attempt to create a merge commit. A merge commit is one that has two parents. In this case, the parents are the tip of the remote master and the tip of the local master. Of course, here too there may be conflicts for you to resolve.
Regardless of whether you performed rebase or merge, when done, your local master is now a superset of the remote master and you can push out the change. (Unless of course, while you were rebasing/merging, someone else pushed their changes. In this case, wash, rinse, repeat…).
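The three cases described above (nothing to incorporate, fast-forward, diverged) can be sketched in Python as a comparison of ancestor sets. This is a toy model, not Git's implementation: the `parents` dictionary, the commit names and both functions are assumptions made up for illustration.

```python
def ancestors(tip, parents):
    """Return tip plus every commit reachable from it through parent links."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen

def pull_situation(local_tip, remote_tip, parents):
    """Classify how the local branch relates to the remote branch."""
    local = ancestors(local_tip, parents)
    remote = ancestors(remote_tip, parents)
    if remote <= local:
        return "nothing to incorporate"  # local is a superset of remote
    if local < remote:
        return "fast-forward"            # local is a strict subset of remote
    return "diverged"                    # rebase or merge has real work to do

# Hypothetical graph: each commit maps to the list of its parents.
parents = {"A": [], "B": ["A"], "L": ["B"], "R": ["B"]}

print(pull_situation("L", "B", parents))  # nothing to incorporate
print(pull_situation("B", "R", parents))  # fast-forward
print(pull_situation("L", "R", parents))  # diverged
```

Only the last case forces a choice between rebase and merge; in the first two the result is the same either way.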
Now, as to whether to use merge or rebase… there is no right answer. It very much depends on the workflow.
Rebase will give you a "cleaner" history in the sense that the history remains linear which some folks prefer. A linear history is easier to understand and if you ever need to use "git bisect" to find where something broke, a linear history is much easier to deal with.
However, some folks prefer to see the true history of development in the sense of "when so-and-so started this work, on which commit was it originally based?". In that case, merge preserves that information (as long as it wasn't a fast-forward -- but you can always force a merge commit with "git merge --no-ff").
In some cases, it makes sense to use both rebase and merge -- rebase for simpler changes and merge to incorporate long-lived topic-branches.
It pains me that the git man pages are so awful. Patches welcome (it is unfortunate that newbies provide the best perspective on how awful the docs are, while simultaneously being the least likely to be able to contribute improvements to them). Ultimately, git is actually based on some fairly simple concepts and I happen to think that once you have a conceptual understanding of it, all the commands (more or less…) start to make sense. So I strongly advocate trying to understand what the heck is going on under the hood.
That said, the Pro Git book is a much better place to start learning git than the git man pages.
"Merge, on the other hand, will attempt to create a merge commit. A merge commit is one that has two parents. In this case, the parents are the tip of the remote master and the tip of the local master."
Now what I understand from that is that "merge" (that is pull without rebase) should not be worse in any way!
In both scenarios git doesn't have less information to begin with, it doesn't start with any false assumption, it's just that without --rebase the resulting local repo is supposed to contain a bit more information.
Then why in the world is "pull --rebase" again "easy" to make the result and why is plain "pull" something that's "hard" to do for a beginner?
Now I think it must be some side effect of one or another that makes the whole story relevant?
I moved all my projects and code over to git from svn about a year ago. My workflow has improved tremendously as a result - especially working in larger teams.
That being said, resolving conflicts in git has almost always been more difficult and counter-intuitive than resolving conflicts in SVN. With git, you have to learn the "right way" to do it, which typically involves several steps or a "git reset --hard origin". I can understand the frustration coming from the guy in the linked post that this post is a response to.
I'm using git on AWS running Ubuntu Hardy. Is there a GUI tool I can use to look at my repository from a remote Windows box (where I use SecureCRT to access the instance)?
Comparing git with svn doesn't make sense. They're two different tools with different use-cases, and a transition from svn to git only makes sense for some (motivated) teams.
While there are quite a few different workflows, there are a number of similar operations (get a repository, get updates from a remote, push changes to the remote). My point was not to compare git (as a whole) to subversion, it was to demonstrate how the tools handle those operations.
I think it's reasonable for people who are moving from subversion to git to expect that commands for these operations will exist (and they, of course do).
Finally, even in my limited description, I managed to demonstrate a benefit of git (namely, that the slightly different default workflow handles the common problem of rollback much more elegantly).
I would have thought they're both basically version control, since one of the decisions in version control is whether to use svn or git or something else.
So what use cases is svn appropriate for (and not git), and what use cases is git appropriate for (and not svn)?
I think it can be useful to compare git and svn. True, it is comparing apples and oranges. However, that's a reasonable thing to do when you're considering migrating from apples to oranges.