My walk through the Git book
Now that I’ll be working with it full time (git is one of the “semi wildly adopted” SCMs at Google), I thought it’s time to take a closer look at some wisdom accumulated by other folks, so I finally cracked open the Git book and did a pass over it.
The book is great and usually very fluid. It begins by show-casing the simple use cases you’ll encouter with git, and is filled with short code snippets you can try (even on a train with no WiFi – this is a distributed source control system after all). Some of the examples weren’t crystal clear straight out of the box, and relied on some previous knowledge the authors had (after all, much of the book was pulled together from different sources, so I imagine it was relatively easy to accidentally assume a bit of knowledge that its readers don’t necessarily have at that point).
Here is a summary of questions I had while reading the book, followed by some cool stuff I found at the end. I recommend at least some knowledge of git for the rest of this article, best accompanied with a reading of the Git book itself. As usual, if you find a mistake, please let me know. Some more related recommended reading is the Git for beginners SO question.
What happens on double git add?
git add is used not just to add new files, but also to ‘add’ changes in existing files.
When I do:
echo v1 > foo git add foo echo v2 > foo git add foo git commit -m bar
Are both versions of foo added to the commit log, or just the latest?
The answer is that just the latest version is actually committed.
After I git merge without conflicts, is a git commit needed?
Coming from svn it was my expectation that after I merge changes into my local branch, I will have to commit them. Doing a quick experiment showed that in git this is not the case at all – if a merge is resolved without manual intervention (including concurrent edits to different places of the same file), then no commit is needed. If there are any conflicts that are resolved manually (by git adding the file after fixing the merge), then a git commit is required.
How does gitk work? Sometimes I see branches, sometimes I don’t … it’s very confusing
This one has been puzzling me for quite a long time. I found that I couldn’t trust gitk, the graphical tool for visualizing commits, branches and merges, because it kept giving me inconsistent results, and for the life of me I couldn’t understand why.
Now I did a few experiments and digging, and found that by default gitk will only show you the current branch, and any objects that are its descendants in the version graph. If you create a branch, switch back to master, and ran gitk, you would not see this branch. What confused me is that upon refreshing, gitk rescans the current branch and add any new nodes to its display, while retaining anything alreaday shown – meaning if you run gitk, switch to a new branch, and refresh gitk, the new branch and its relation to the previous will now be displayed in gitk.
Of course, like all things linux, gitk can be controlled to behave like you want it. Just follow the gitk command with the names of the branches you want shown, or simply add “–all” to see all the branches in your repository.
How can you see the ‘branch structure’ of a repository?
In svn, there is a well defined directed graph between branches. When a branch is created of its parent, this parent-child relation is created and maintained, and the tools readily show you this branch graph.
I could have guessed this, but sources on Stack Overflow confirmed that there is no direct equivalent in git. Instead of branches having parent-child relations, there is a parent-child relation between objects, and so individual files and directories can have multiple parents in the version graph, where other files on the same branch might have completely linear histories. The model is more complex, but more powerful, and it seems to be the core reasons why merges in git are supposed to be easier than in svn.
What does ‘fast forward’ really mean?
Using git, I often saw messages with the words “fast forward”, but never really understood what it meant. This bit is explained rather nicely in the Git book – a fast forward happens when you merged branch b1 to b2, resolved any possible conflicts, and then merge the result back to b1. b2 already contains a version that is a descendant of the “heads” of both b1 and b2, meaning all the “merge work” was already done in it. So, when this structure is merged back to b1, what actually happens is all the revisions and merge work that happened on b2 is copied to b1. After this copying, the b1 branch (a pointer into the revision DAG) is “fast forwarded” to a descendant node that is the head of b2. In effect, the merge’s result becomes the head of b1 in a clean and simple manner.
This is radically different than svn – I still have horror flashbacks sometimes about trying to merge a branch back to trunk. I always first merged trunk to the branch, had to work my ass off to resolve all the conflicts and make the build green, and then sometimes had to do double the work when merging back to trunk. With git, you’re assured that the conflict resolution work you do on your branch is presereved and used to make merging back to master (the git equivalent of trunk) is as easy as cake.
git pull, fetch, and what’s in between
It is said that “git pull” is equivalent to “git fetch”, followed by “git merge”.
The ability to immediately fetch all the content of any remote repository without forcing you to merge it right now is great – you’re free to do the actual merge work and conflict resolution separately, and you only need connectivity to the remote repository for the fetch phase. When I tried this using two local folders, git merge complained, and I failed to understand what arguments I should pass to “git merge” in this case?
This turned out to be a simple technical issue. To merge the changes manually after fetching from an arbitrary remote, simply run git merge FETCH_HEAD (sometimes you just have to know the magic words). Normally, you would fetch from origin (usually the branch you cloned off), or another remotely tracked named branch, so you would just specify its name as the parameter to “git merge”.
How does pushing actually work?
Let’s say I setup a local “common” repo (it has to be bare for reasons explained in the Git book)
mkdir bare cd bare git init --bare cd .. git clone bare alice cd alice touch a && git add a && git commit -m "Added a" git push # This fails
It turns out that the problem was I tried to push to an empty repository. If I do “git push origin master”, then subsequent “git push” with no arguments succeed.
And now, for some cool stuff:
git bisect ftw
Suppose you just found a critical bug, and have no idea when it was introduced. You write a simple (manual/automated) test for it, and reproduce it, but you’re not sure what it causing it. git bisect to the rescue!
git bisect allows you to do a binary search on your repository to find the exact commit that introduced the bug. While this is possible with other VCSs, it is so natural in git that it’s beautiful. You simply do “git bisect start”, followed by “git bisect good” to indicate the current version works, and “git bisect bad” to indicate it doesn’t, and git will direct you towards the correct half of the version graph until you find the exact version when things turned bad.
Configure your defaults for fun and profit
Here are some tweaks I found in the book that you might want to do (if you have any other tweaks you’d like to recommend, please comment!)
oneline log messages
If, like me, you find the “one liner” log messages easier to read, you can make it the default with
git config –global format.pretty oneline
Life is colorful
Make git status and other messages much easier to read with
git config –global color.ui true
Reference: My walk through the Git book from our JCG partner Ron Gross at the A Quantum Immortal blog.