Git Commit Illustrated: Simplicity Revealed by a Complex Exercise for Simplest Git Task


In this post I will try to explain the underlying commands, and to some extent the internal workings of git, involved in making a ‘commit’.
This post is actually a by-product of my research for another post about git (about .git/, actually). The research cleared so many myths about git’s complexity out of my mind that I decided to make it today’s post. Git appears much simpler to me now than it did yesterday.

Commit

The commit is the central piece of the git system. The git world is simply a collection of commit objects, each of which holds a tree, which in turn holds references to other trees and blobs. Branches, tags and HEAD are just fancy aliases for commits (more on these in another post, maybe the next one).
A commit is basically a snapshot of the project (strictly, of what is staged in the index) plus some metadata: author, committer, message and parent commits. I will save the details for a future post (it’s worth it).

Now let’s discuss what this post is about: revealing the secrets behind every git user’s ritual of ‘commit’ing, by performing a commit manually.

This should reveal quite a few details about the internal workings of git (no, you don’t need to run away; it’s not that deep).

You might already know the concepts, but knowing sex and having sex are kind of different things.

Ok! Let’s start the exercise for manual commit.

First we need to create an empty directory, call it ‘work’, and a simple file in it.
=> mkdir work
=> cd work
=> echo 'Hello world!' > hello_world
Initialize a git repo in it (we will add the ‘hello_world’ file in a moment):
=> git init
We will keep an eye on the changes that happen in the ‘.git’ directory throughout our exercise. For now, check what’s saved in HEAD:
=> cat .git/HEAD
=> ref: refs/heads/master
HEAD is basically a reference to the commit associated with the working tree. Here, though, it does not hold a commit hash directly: it is a symbolic reference naming the branch ‘master’. So one might guess that .git/refs/heads/master points to the tip of that branch. Let’s check it:
=> ls .git/refs/heads/
=>
=> git branch
=>
There is nothing in there. Since we have not made any commits yet, there are no branches: a branch is merely a name that points to a commit, and there is no commit yet for ‘master’ to point to.
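To see why nothing resolves yet, here is a minimal Python sketch that follows HEAD the way described above. `resolve_head` is a hypothetical helper, not part of git. It works on a throwaway directory laid out like a fresh `.git`, and it assumes loose refs only (real git also consults `.git/packed-refs`).

```python
import os
import tempfile


def resolve_head(git_dir):
    """Return the commit hash HEAD leads to, or None on an unborn branch.

    Sketch only: follows loose refs under refs/ and ignores packed-refs.
    """
    with open(os.path.join(git_dir, "HEAD")) as f:
        head = f.read().strip()
    if not head.startswith("ref: "):
        return head  # detached HEAD: the file holds a commit hash itself
    ref_file = os.path.join(git_dir, *head[len("ref: "):].split("/"))
    if not os.path.isfile(ref_file):
        return None  # the branch is named but has no commit yet
    with open(ref_file) as f:
        return f.read().strip()


# Recreate the state right after `git init` in a throwaway directory.
git_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(git_dir, "refs", "heads"))
with open(os.path.join(git_dir, "HEAD"), "w") as f:
    f.write("ref: refs/heads/master\n")
print(resolve_head(git_dir))  # None: refs/heads/master does not exist yet
```

Once a file `refs/heads/master` exists, the same function returns the hash stored in it.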
If you are feeling adventurous, you can try ‘git log’.
=> git log
=> fatal: bad default revision ‘HEAD’
Let’s now add our file to the staging area
=> git add hello_world
The staging area is the intermediate layer that holds our content after ‘git add’ and before ‘git commit’.
A blob is git’s representation of a file. It is not actually a file, just the content. A blob does not have a name or any other metadata; it is referenced from trees, which hold the metadata for their blobs.
A tree is the object that stores references to other trees and to blobs (as leaf nodes).
This command converted the content of the ‘hello_world’ file into a blob, stored it under .git/objects, and recorded it in the index (aka staging area). A ‘blob’ is how our content is represented in git. If you check the .git dir, you will see that a new file ‘index’ has been created. For every staged file, this file records its path, its mode and the hash of its blob.
=> ls .git
=> branches  config  description  HEAD  hooks *index*  info  objects  refs
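The ‘converted into a blob’ step can be made concrete. A blob’s name in git is the SHA-1 of a short header (`blob`, a space, the content length in bytes, a NUL byte) followed by the raw content. The Python sketch below assumes that format with SHA-1, the default object format, and uses a hypothetical `blob_id` helper. You can compare its result with `git hash-object hello_world`.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """SHA-1 object ID of a blob: header b"blob <size>\\0" plus the raw bytes."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# Only the bytes count; the file name is not part of the hash.
print(blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (git's empty blob)
print(blob_id(b"Hello world!\n"))  # compare with `git hash-object hello_world`
```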
At this point we would normally just ‘commit’ the ‘index’, but not this time. The ‘git commit’ command hides many details and is a great convenience; you’ll value it after this exercise.

Git stores all our content in the form of blobs. Blobs do not have any metadata attached to them (like a name or a creation date); they are just nameless ‘blobs’. To give a blob a name, a ‘tree’ references it as a leaf node. Different trees can reference the same blob with different metadata attached. But since a blob is identified by the hash of its content, a git repository keeps only one copy of any given content. This is one of the reasons git’s storage is so compact.
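The ‘one copy per content’ property follows from storing objects under their hash. This Python sketch writes loose objects the way git lays them out (header plus data, zlib-compressed, at `objects/<first two hex digits>/<remaining 38>`) and reads them back, roughly like `git cat-file`. `write_object` and `read_object` are hypothetical helpers working on a temporary directory, and packfiles are ignored.

```python
import hashlib
import os
import tempfile
import zlib


def write_object(git_dir, obj_type, data):
    """Store an object as a loose git object and return its SHA-1 ID."""
    raw = b"%s %d\0" % (obj_type.encode(), len(data)) + data
    oid = hashlib.sha1(raw).hexdigest()
    path = os.path.join(git_dir, "objects", oid[:2], oid[2:])
    if not os.path.exists(path):  # same content, same ID: nothing new to write
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(zlib.compress(raw))  # header and data are compressed together
    return oid


def read_object(git_dir, oid):
    """Return (type, data) of a loose object, like `git cat-file -t` / `-p`."""
    path = os.path.join(git_dir, "objects", oid[:2], oid[2:])
    with open(path, "rb") as f:
        raw = zlib.decompress(f.read())
    header, _, data = raw.partition(b"\0")
    obj_type, size = header.split(b" ")
    if int(size) != len(data):
        raise ValueError("corrupt object: size mismatch")
    return obj_type.decode(), data


git_dir = tempfile.mkdtemp()
a = write_object(git_dir, "blob", b"Hello world!\n")  # say, one file's content
b = write_object(git_dir, "blob", b"Hello world!\n")  # same bytes, another name
print(a == b)  # True: both "files" share one stored object
print(read_object(git_dir, a))  # ('blob', b'Hello world!\n')
```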

We can see the blob for the content of ‘hello_world’ in the staging area (index):
=> git ls-files --stage
=> 100644 802992c4220de19a90767f3000a79a31b98d0df7 0       hello_world
If you entered the same content as me, your hash should be the same as mine. We can check what type of object the above hash belongs to:
=> git cat-file -t 802992c
=> blob
The above blob is not referenced by any tree yet. It is referenced only from .git/index, which records the objects that make up our staging area.
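What `git ls-files --stage` prints can be reproduced by reading `.git/index` directly. The sketch below assumes index format version 2, which a fresh repository usually writes (settings such as `index.version` or `feature.manyFiles` can change that). That format is a 12-byte `DIRC` header, then per entry ten 32-bit stat fields, the 20-byte blob hash, 16 bits of flags and the NUL-padded path. `parse_index` is a hypothetical helper, not a git API.

```python
import struct


def parse_index(data: bytes):
    """Yield (mode, blob_id, stage, path) for each entry of a version-2 index.

    Sketch only: no v3 extended flags, no v4 path compression; trailing
    extensions and the final checksum are simply not read.
    """
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC" or version != 2:
        raise ValueError("expected a version-2 git index")
    pos = 12
    for _ in range(count):
        # ctime, mtime (seconds + nanoseconds each), dev, ino, mode, uid, gid, size
        stat = struct.unpack(">10I", data[pos:pos + 40])
        blob_id = data[pos + 40:pos + 60].hex()
        (flags,) = struct.unpack(">H", data[pos + 60:pos + 62])
        name_end = data.index(b"\0", pos + 62)
        path = data[pos + 62:name_end].decode()
        yield format(stat[6], "o"), blob_id, (flags >> 12) & 0b11, path
        # Each entry is padded with 1 to 8 NUL bytes to a multiple of 8 bytes.
        name_len = name_end - (pos + 62)
        pos += (62 + name_len + 8) & ~7
```

Run inside ‘work’, printing each tuple from `parse_index(open(".git/index", "rb").read())` should show the same mode, hash, stage and path as `git ls-files --stage`, provided the index really is version 2.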
A ‘commit’ object in git holds a single (top-level) tree. A tree may hold references to more trees or to blobs. So to ‘commit’ the blob created above, we need a tree.

So we now need to create a tree.
=> git write-tree
=> cdbf8e1f00e97366e01cbf2d73f3689a60107686
The ‘write-tree’ command makes a tree object from the contents of the ‘index’ and prints its hash.
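A tree object is just as plain. Each entry is serialized as `<mode> <name>`, a NUL byte, and the 20 raw bytes of the referenced object’s hash, with entries sorted by name. This Python sketch of a hypothetical `tree_id` helper hashes such a list. It does not read the index, so you pass the entries yourself, for example the mode, name and blob hash that `git ls-files --stage` showed.

```python
import hashlib


def tree_id(entries):
    """SHA-1 ID of a tree built from (mode, name, object_id) tuples.

    Each entry is serialized as b"<mode> <name>\\0" plus the 20 raw bytes of
    the object ID. Entries are sorted by name bytes, with a subdirectory
    (mode "40000") compared as if its name ended in "/", as git does.
    """
    def sort_key(entry):
        mode, name, _ = entry
        return name.encode() + (b"/" if mode == "40000" else b"")

    body = b"".join(
        b"%s %s\0" % (mode.encode(), name.encode()) + bytes.fromhex(oid)
        for mode, name, oid in sorted(entries, key=sort_key)
    )
    return hashlib.sha1(b"tree %d\0" % len(body) + body).hexdigest()


print(tree_id([]))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (git's empty tree)
```

Comparing `tree_id([("100644", "hello_world", <your blob hash>)])` with what `git write-tree` printed is a quick way to check the format.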
Now that we have the tree, let’s create a commit object with it.
=> echo "Initial commit" | git commit-tree cdbf8e1
=> a5a86835ba72e3ca7d5267c68c06c212392f9b7d
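That’s the hash of our commit object. Yours will differ, because a commit also records the author, the committer and timestamps, so use your own hash in the commands below. You can also run ‘git commit-tree cdbf8e1’ directly; the command will then wait for you to type the commit message, and ‘Ctrl-D’ lets it proceed.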
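Why your commit hash differs from mine becomes clear once the commit object is written out. It is a few text lines (`tree`, zero or more `parent`, `author`, `committer`), a blank line and the message, hashed with a `commit <size>` header. The `commit_id` helper and the identity below are hypothetical; the sketch assumes SHA-1 and no optional headers such as `encoding` or a signature.

```python
import hashlib


def commit_id(tree, parents, author, committer, message):
    """SHA-1 ID of a commit object with the usual headers and no extras.

    `author` and `committer` look like "Name <email> <unix-time> <+hhmm>".
    `message` is stored byte for byte, so include the trailing newline
    that `echo` would have supplied.
    """
    lines = ["tree " + tree]
    lines += ["parent " + p for p in parents]  # none for a first commit
    lines += ["author " + author, "committer " + committer]
    body = ("\n".join(lines) + "\n\n" + message).encode()
    return hashlib.sha1(b"commit %d\0" % len(body) + body).hexdigest()


# Hypothetical identity and time; one second later gives a different ID.
who = "A U Thor <author@example.com> 1700000000 +0000"
empty_tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
print(commit_id(empty_tree, [], who, who, "Initial commit\n"))
```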

So our commit object is ready, and we are done. Right? Not quite. What we have created is a so-called ‘unreachable commit’.
An unreachable commit is a commit that cannot be reached by following parent links from any reference: no branch in .git/refs/heads/ (nor any tag, HEAD or reflog entry) points to it or to one of its descendants. Whether the commit itself has parents does not matter. Such objects are eventually removed by git’s garbage collection (‘git gc’) after a grace period.
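Reachability is a plain graph walk: start at every reference and follow parent links, and whatever is never visited is unreachable. This Python sketch uses a made-up commit graph whose IDs are placeholders, not real hashes. Real git also walks trees and blobs and counts tags, HEAD, the reflog and the index as starting points, which this toy version leaves out.

```python
def unreachable(commits, refs):
    """Return the commits that no ref can reach by following parent links.

    `commits` maps commit ID -> list of parent IDs; `refs` maps ref name ->
    commit ID. Only commits are considered in this sketch.
    """
    seen = set()
    stack = list(refs.values())
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(commits.get(oid, []))  # walk towards older commits
    return set(commits) - seen


# Made-up IDs: c1 <- c2 <- c3 on master, a commit built on c3 that no ref
# names, and a parentless commit like the one we just created.
commits = {"c1": [], "c2": ["c1"], "c3": ["c2"],
           "dangling_child": ["c3"], "lone": []}
print(sorted(unreachable(commits, {"refs/heads/master": "c3"})))
# ['dangling_child', 'lone']: having a parent does not make a commit reachable
print(sorted(unreachable(commits, {"refs/heads/master": "c3",
                                   "refs/heads/hello": "lone"})))
# ['dangling_child']
```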
To make our commit reachable, we need to create a reference to it in a file in .git/refs/heads:
=> echo a5a86835ba72e3ca7d5267c68c06c212392f9b7d > .git/refs/heads/hello
Actually, we should instead use the safer way of updating references in git:
=> git update-ref refs/heads/hello a5a86835ba72e3ca7d5267c68c06c212392f9b7d
If we had used the name ‘master’ here instead of ‘hello’, ‘git log’ would already work. But for now it still fails with ‘fatal:’, because HEAD refers to ‘refs/heads/master’, which does not exist.
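The advantage of `git update-ref` over a plain `echo` is that git takes a `<ref>.lock` file, writes the new value into it and renames it over the ref. It can also refuse the update if the ref does not hold an expected old value (`git update-ref <ref> <new> <old>`). This Python sketch imitates that idea with a hypothetical `update_ref` helper; it skips the reflog and packed refs that the real command also handles.

```python
import os
import tempfile


def update_ref(git_dir, ref, new_oid, old_oid=None):
    """Point `ref` at `new_oid` through a lock file (sketch, no reflog)."""
    path = os.path.join(git_dir, *ref.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    lock = path + ".lock"
    # O_EXCL makes this fail if another process is already updating the ref.
    fd = os.open(lock, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        with os.fdopen(fd, "w") as f:
            if old_oid is not None:
                current = None
                if os.path.exists(path):
                    with open(path) as cur:
                        current = cur.read().strip()
                if current != old_oid:
                    raise ValueError(f"{ref} is {current}, expected {old_oid}")
            f.write(new_oid + "\n")
        os.replace(lock, path)  # atomic rename: readers see old or new, never half
    except BaseException:
        if os.path.exists(lock):
            os.remove(lock)  # release the lock on failure
        raise


git_dir = tempfile.mkdtemp()
commit = "a5a86835ba72e3ca7d5267c68c06c212392f9b7d"
update_ref(git_dir, "refs/heads/hello", commit)
```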

Now try the command ‘git branch’.
=> git branch
=> hello
Here we see what a branch actually is to git: a reference to a ‘commit’ object.
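Because a branch is only a file holding a commit hash, listing branches amounts to listing files under `refs/heads`, with slashes in branch names becoming subdirectories. This Python sketch of a hypothetical `list_branches` helper assumes loose refs only; branches that `git pack-refs` has moved into `.git/packed-refs` would be missed.

```python
import os
import tempfile


def list_branches(git_dir):
    """Branch names from the loose ref files under refs/heads."""
    heads = os.path.join(git_dir, "refs", "heads")
    names = []
    for root, _, files in os.walk(heads):
        for name in files:
            if name.endswith(".lock"):
                continue  # an update in progress, not a branch
            rel = os.path.relpath(os.path.join(root, name), heads)
            names.append(rel.replace(os.sep, "/"))  # "feature/x" style names
    return sorted(names)


git_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(git_dir, "refs", "heads"))
print(list_branches(git_dir))  # []: like `git branch` before our first commit
with open(os.path.join(git_dir, "refs", "heads", "hello"), "w") as f:
    f.write("a5a86835ba72e3ca7d5267c68c06c212392f9b7d\n")
print(list_branches(git_dir))  # ['hello']
```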
Wait, there is more to it. Now that we have created the branch ‘hello’, we need to make HEAD refer to it.
=> git symbolic-ref HEAD refs/heads/hello
This command pointed HEAD at our newly created branch, associating our working tree with it. This is part of what happens on a normal checkout (‘git checkout’ also updates the index and the working tree to match the branch’s commit).
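`git symbolic-ref` itself boils down to rewriting one small file. The Python sketch below writes `ref: refs/heads/hello` into HEAD and reads the branch name back. Its helpers `set_symbolic_head` and `current_branch` are hypothetical and run against a temporary directory. It deliberately leaves the index and working tree alone, which is exactly the part of a checkout that `symbolic-ref` does not do.

```python
import os
import tempfile


def set_symbolic_head(git_dir, ref):
    """Point HEAD at a branch by name, like `git symbolic-ref HEAD <ref>`."""
    if not ref.startswith("refs/"):
        # git itself refuses to point HEAD outside of refs/
        raise ValueError("HEAD must point inside refs/")
    lock = os.path.join(git_dir, "HEAD.lock")
    with open(lock, "w") as f:
        f.write(f"ref: {ref}\n")
    os.replace(lock, os.path.join(git_dir, "HEAD"))  # swap the file in atomically


def current_branch(git_dir):
    """Name of the branch HEAD refers to, or None when HEAD is detached."""
    with open(os.path.join(git_dir, "HEAD")) as f:
        head = f.read().strip()
    prefix = "ref: refs/heads/"
    return head[len(prefix):] if head.startswith(prefix) else None


git_dir = tempfile.mkdtemp()
set_symbolic_head(git_dir, "refs/heads/hello")
print(current_branch(git_dir))  # hello
# Only HEAD changed; no index or working-tree files were touched.
```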
Now we can use ‘git log’. If you are using ‘zsh’ with a suitable theme, at this point the git branch indicator will change from the uncommitted ‘master’ to the committed ‘hello’.

Now we are done. Officially. Inside, git is this frighteningly simple. I hope this was as helpful for you as it was for me. I really enjoyed writing this post.
