Version Control
Whenever you are working on something that changes over time, it’s useful to be able to track those changes. This can be for a number of reasons: it gives you a record of what changed, how to undo it, who changed it, and possibly even why. Version control systems (VCS) give you that ability. They let you commit changes to a set of files, along with a message describing the change, as well as look at and undo changes you’ve made in the past.
Most VCS support sharing the commit history between multiple users. This allows for convenient collaboration: you can see the changes I’ve made, and I can see the changes you’ve made. And since the VCS tracks changes, it can often (though not always) figure out how to combine our changes as long as they touch relatively disjoint things.
There are a lot of VCSes out there that differ a lot in what they support, how they function, and how you interact with them. Here, we’ll focus on git, one of the more commonly used ones, but I recommend you also take a look at Mercurial.
With that all said – to the cliffnotes!
Is git dark magic?
not quite.. you need to understand the data model. we’re going to skip over some of the details, but roughly speaking, the core “thing” in git is a commit.
- every commit has a unique name, “revision hash”
  - a long hash like 998622294a6c520db718867354bf98348ae3c7e2
  - often shortened to a short (unique-ish) prefix: 9986222
- commit has author + commit message
- also has the hash of any ancestor commits
  - usually just the hash of the previous commit
- commit also represents a diff, a representation of how you get from the commit’s ancestors to the commit (e.g., remove this line in this file, add these lines to this file, rename that file, etc.)
  - in reality, git stores the full before and after state
  - probably don’t want to store big files that change!
initially, the repository (roughly: the folder that git manages) has no content, and no commits. let’s set that up:
$ git init hackers
$ cd hackers
$ git status
the output here actually gives us a good starting point. let’s dig in and make sure we understand it all.
first, “On branch master”.
- don’t want to use hashes all the time.
- branches are names that point to hashes.
- master is traditionally the name for the “latest” commit. every time a new commit is made, the master name will be made to point to the new commit’s hash.
- special name HEAD refers to “current” name
- you can also make your own names with git branch (or git tag); we’ll get back to that
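to make hashes and names concrete, here is a minimal sketch in a scratch repository. the temporary directory, the identity, and the file name are made up for illustration, and it uses git add and git commit, which are explained further down:

```shell
# scratch repository; the path, identity, and file name are made up
repo="$(mktemp -d)" && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
git config user.name "Example User"
git config user.email "user@example.com"

echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "add greeting"
echo "world" >> greeting.txt
git commit -q -a -m "extend greeting"

git rev-parse HEAD           # full revision hash of the current commit
git rev-parse --short HEAD   # abbreviated prefix of the same hash
git rev-parse master         # same hash again: master is just a name for it
git symbolic-ref HEAD        # prints refs/heads/master: HEAD points at the current name
git cat-file -p HEAD         # raw commit: tree, parent, author, committer, message
```

the parent line in the last output is the hash of the previous commit, which is how commits chain together.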
let’s skip over “No commits yet” because that’s all there is to it.
then, “nothing to commit”.
- every commit contains a diff with all the changes you made. but how is that diff constructed in the first place?
- could just always commit all changes you’ve made since the last commit
  - sometimes you want to only commit some of them (e.g., not TODOs)
  - sometimes you want to break up a change into multiple commits to give a separate commit message for each one
- git lets you stage changes to construct a commit
  - add changes to a file or files to the staged changes with git add
    - add only some changes in a file with git add -p
    - git add needs a path; to stage changes to all files git already knows about, use git add -u
  - remove a file and stage its removal with git rm
  - empty the set of staged changes with git reset
    - note that this does not change any of your files! it only means that no changes will be included in a commit
    - to remove only some staged changes: git reset FILE or git reset -p
  - check staged changes with git diff --staged
  - see remaining changes with git diff
  - when you’re happy with the stage, make a commit with git commit
    - if you just want to commit all changes: git commit -a
    - git help add has a bunch more helpful info
while you’re playing with the above, try to run git status to see what git thinks you’re doing – it’s surprisingly helpful!
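a small sketch of the staging workflow, assuming a scratch repository; app.txt and notes.txt are hypothetical files, one holding a change we want to commit and one we do not:

```shell
# scratch repository; the path, identity, and file names are made up
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "v1" > app.txt
git add app.txt
git commit -q -m "initial commit"

echo "v2" > app.txt                  # a change we want to commit
echo "TODO: clean up" > notes.txt    # something we do not want to commit yet
git add app.txt notes.txt            # stage both
git reset -q notes.txt               # unstage notes.txt; the file itself is untouched
git diff --staged --name-only        # lists only app.txt
git commit -q -m "update app to v2"
git status --short                   # notes.txt is still there, just untracked
```

the commit contains only what was staged at the time, regardless of what else is lying around in the folder.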
A commit you say…
okay, we have a commit, now what?
- we can look at recent changes: git log (or git log --oneline)
- we can look at the full changes: git log -p
- we can show a particular commit: git show master
  - or with -p for full diff/patch
- we can go back to the state at a commit using git checkout NAME
  - if NAME is a commit hash, git says we’re “detached”. this just means there’s no NAME that refers to this commit, so if we make commits, no-one will know about them.
- we can revert a change with git revert NAME
  - applies the diff in the commit at NAME in reverse.
- we can compare an older version to this one using git diff NAME..
  - a..b is a commit range. if either is left out, it means HEAD.
- we can show all the commits between using git log NAME..
  - -p works here too
- we can change master to point to a particular commit (effectively undoing everything since) with git reset NAME:
  - huh, why? wasn’t reset to change staged changes? reset has a “second” form (see git help reset) which sets HEAD to the commit pointed to by the given name.
  - notice that this didn’t change any files – git diff now effectively shows git diff NAME..
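the commands above can be tried out on a throwaway history. this is only a sketch: the repository, the identity, and log.txt are made up, and the three commits exist just to have something to look at:

```shell
# scratch repository with three small commits; names are made up
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
for n in 1 2 3; do
  echo "line $n" >> log.txt
  git add log.txt
  git commit -q -m "add line $n"
done

git log --oneline            # three commits, newest first
git diff HEAD~2..            # what changed between two commits ago and HEAD
git revert --no-edit HEAD    # a new commit that applies "add line 3" in reverse
cat log.txt                  # line 3 is gone again
git reset -q HEAD~1          # move the branch back one commit; files are not touched
git diff                     # the files now differ from the commit the branch points to
```

note that revert adds a commit while reset moves the name, so after the last two steps the history is back to three commits but log.txt still lacks its third line.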
What’s in a name?
clearly, names are important in git. and they’re the key to understanding a lot of what goes on in git. so far, we’ve talked about commit hashes, master, and HEAD. but there’s more!
- you can make your own branches (like master) with git branch b
  - creates a new name, b, which points to the commit at HEAD
  - you’re still “on” master though, so if you make a new commit, master will point to that new commit, b will not.
- switch to a branch with git checkout b
  - any commits you make will now update the b name
  - switch back to master with git checkout master
    - all your changes in b are hidden away
  - a very handy way to be able to easily test out changes
- tags are other names that never change, and that have their own message. often used to mark releases + changelogs.
- NAME^ means “the commit before NAME”
  - can apply recursively: NAME^^^
  - you most likely mean ~ when you use ^
    - ~ is “temporal”, whereas ^ goes by ancestors
    - ~~ is the same as ^^
    - with ~ you can also write X~3 for “3 commits older than X”
    - you don’t want ^3 (that picks the third parent of a merge commit, not the commit three steps back)
  - git diff HEAD^
- a lone “-” means “the previous name” (e.g., git checkout -)
- most commands operate on HEAD unless you give another argument
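a quick sketch of these names in action, assuming a scratch repository; the branch b, the tag v0.1, and the file names are only examples:

```shell
# scratch repository with three commits on master; names are made up
repo="$(mktemp -d)" && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
for n in 1 2 3; do
  echo "$n" > "file$n.txt"
  git add "file$n.txt"
  git commit -q -m "commit $n"
done

git branch b                          # new name b at HEAD; we are still on master
git checkout -q b
echo "experiment" > exp.txt
git add exp.txt
git commit -q -m "experiment on b"    # moves b, not master
git checkout -q master                # exp.txt disappears from the folder
git tag -a v0.1 -m "first release"    # a name with its own message that stays put
git rev-parse HEAD~2 HEAD^^           # prints the same hash twice
git checkout -q -                     # "-" is the previous name, so back to b
```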
Clean up your mess
your commit history will very often end up as:
- add feature x – maybe even with a commit message about x!
- forgot to add file
- fix bug
- typo
- typo2
- actually fix
- actually actually fix
- tests pass
- fix example code
- typo
- x
- x
- x
- x
that’s fine as far as git is concerned, but is not very helpful to your future self, or to other people who are curious about what has changed. git lets you clean up these things:
- git commit --amend: fold staged changes into previous commit
  - note that this changes the previous commit, giving it a new hash!
- git rebase -i HEAD~13 is magical. for each commit from past 13, choose what to do:
  - default is pick; do nothing
  - r: change commit message
  - e: change commit (add or remove files)
  - s: combine commit with previous and edit commit message
  - f: “fixup” – combine commit with previous; discard commit msg
  - at the end, HEAD is made to point to what is now the last commit
  - often referred to as squashing commits
  - what it really does: rewind HEAD to rebase start point, then re-apply the commits in order as directed.
- git reset --hard NAME: reset the state of all files to that of NAME (or HEAD if no name is given). handy for undoing changes.
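a minimal sketch of --amend and reset --hard, assuming a scratch repository with made-up file names (interactive rebase is left out because it needs an editor):

```shell
# scratch repository; the path, identity, and file names are made up
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "feature" > feature.txt
git add feature.txt
git commit -q -m "add feature x"
before="$(git rev-parse HEAD)"

echo "forgotten" > forgotten.txt      # oops, this belonged in the same commit
git add forgotten.txt
git commit -q --amend --no-edit       # fold it into the previous commit
after="$(git rev-parse HEAD)"
echo "$before -> $after"              # the amended commit has a new hash

echo "scratch work" >> feature.txt    # an edit we decide to throw away
git reset -q --hard HEAD              # files are back to the state at HEAD
```

there is still only one commit afterwards, instead of “add feature x” followed by “forgot to add file”.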
Playing with others
a common use-case for version control is to allow multiple people to make changes to a set of files without stepping on each other’s toes. or rather, to make sure that if they step on each other’s toes, they won’t just silently overwrite each other’s changes.
git is a distributed VCS: everyone has a local copy of the entire repository (well, of everything others have chosen to publish). some VCSes are centralized (e.g., subversion): a server has all the commits, clients only have the files they have “checked out”. basically, they only have the current files, and need to ask the server if they want anything else.
every copy of a git repository can be listed as a “remote”. you can copy an existing git repository using git clone ADDRESS (instead of git init). this creates a remote called origin that points to ADDRESS.
you can fetch names and the commits they point to from a remote with git fetch REMOTE. all names at a remote are available to you as REMOTE/NAME, and you can use them just like local names.
if you have write access to a remote, you can change names at the remote to point to commits you’ve made using git push. for example, let’s make the master name (branch) at the remote origin point to the commit that our master branch currently points to:
$ git push origin master:master
- for convenience, you can set origin/master as the default target for when you git push from the current branch with -u
- consider: what does this do? git push origin master:HEAD^
often you’ll use GitHub, GitLab, BitBucket, or something else as your remote. there’s nothing “special” about that as far as git is concerned. it’s all just names and commits. if someone makes a change to master and updates github/master to point to their commit (we’ll get back to that in a second), then when you git fetch github, you’ll be able to see their changes with git log github/master.
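you don’t need a hosting service to try this out. in the sketch below a bare repository in a temporary directory stands in for the remote; server.git, alice, and the identity are all made up for illustration:

```shell
# a local bare repository plays the role of the remote; all names are made up
work="$(mktemp -d)" && cd "$work"
git init -q --bare server.git
git --git-dir=server.git symbolic-ref HEAD refs/heads/master

git clone -q server.git alice 2>/dev/null   # cloning sets up the remote "origin"
cd alice
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice"
git config user.email "alice@example.com"

echo "hi" > README
git add README
git commit -q -m "first commit"
git push -q -u origin master:master   # point origin's master at our master
git fetch -q origin                   # refresh our view of the remote's names
git log --oneline origin/master       # remote names work like local ones
git rev-parse master origin/master    # both print the same hash
```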
Working with others
so far, branches seem pretty useless: you can create them, do work on them, but then what? eventually, you’ll just make master point to them anyway, right?
- what if you had to fix something while working on a big feature?
- what if someone else made a change to master in the meantime?
inevitably, you will have to merge changes in one branch with changes in another, whether those changes are made by you or someone else. git lets you do this with, unsurprisingly, git merge NAME. merge will:
- look for the latest point where HEAD and NAME shared a commit ancestor (i.e., where they diverged)
- (try to) apply all those changes to the current HEAD
- produce a commit that contains all those changes, and lists both HEAD and NAME as its ancestors
- set HEAD to that commit’s hash
once your big feature has been finished, you can merge its branch into master, and git will ensure that you don’t lose any changes from either branch!
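a sketch of a merge where the two branches touch different files, assuming a scratch repository; the branch name feature and the files are hypothetical:

```shell
# scratch repository; the path, identity, branch, and file names are made up
repo="$(mktemp -d)" && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo "base" > base.txt && git add base.txt && git commit -q -m "base"

git checkout -q -b feature                 # start the big feature
echo "feature" > feature.txt && git add feature.txt && git commit -q -m "add feature"

git checkout -q master                     # meanwhile, a fix lands on master
echo "fix" > fix.txt && git add fix.txt && git commit -q -m "urgent fix"

git merge -q --no-edit feature             # combine both lines of work
git log --oneline --graph                  # the history forks and joins again
git cat-file -p HEAD | grep '^parent'      # the merge commit lists two ancestors
```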
if you’ve used git in the past, you may recognize merge by a different name: pull. when you do git pull REMOTE BRANCH, that is:
$ git fetch REMOTE
$ git merge REMOTE/BRANCH
- where, like push, REMOTE and BRANCH are often omitted and use the “tracking” remote branch (remember -u?)
this usually works great, as long as the changes to the branches being merged are disjoint. if they are not, you get a merge conflict. sounds scary…
- a merge conflict is just git telling you that it doesn’t know what the final diff should look like
- git pauses and asks you to finish staging the “merge commit”
- open the conflicted file in your editor and look for lots of angle brackets (<<<<<<<). the stuff above ======= is the change made in HEAD since the shared ancestor commit. the stuff below is the change made in NAME since the shared commit.
- git mergetool is pretty handy – opens a diff editor
- once you’ve resolved the conflict by figuring out what the file should now look like, stage those changes with git add.
- when all the conflicts are resolved, finish with git commit
  - you can give up with git merge --abort
you’ve just resolved your first git merge conflict! \o/
now you can publish your finished changes with git push
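to see a conflict without waiting for one to happen by accident, here is a sketch that provokes one on purpose. the repository, the branch feature, and config.txt are made up, and the “resolution” simply overwrites the file where you would normally edit it by hand:

```shell
# scratch repository; two branches change the same line of the same file
repo="$(mktemp -d)" && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo "color = red" > config.txt && git add config.txt && git commit -q -m "base"

git checkout -q -b feature
echo "color = blue" > config.txt && git commit -q -a -m "feature: blue"
git checkout -q master
echo "color = green" > config.txt && git commit -q -a -m "master: green"

git merge feature || echo "merge stopped on a conflict"
cat config.txt                       # shows the <<<<<<< / ======= / >>>>>>> markers
echo "color = teal" > config.txt     # decide what the file should look like
git add config.txt                   # mark the conflict as resolved
git commit -q --no-edit              # finish the merge commit
```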
When worlds collide
when you push, git checks that no-one else’s work is lost if you update the remote name you’re pushing to. it does this by checking that the current commit of the remote name is an ancestor of the commit you are pushing. if it is, git can safely just update the name; this is called fast-forwarding. if it is not, git will refuse to update the remote name, and tell you there have been changes.
if your push is rejected, what do you do?
- merge remote changes with git pull (i.e., fetch + merge)
- force the push with --force: this will lose other people’s changes!
  - there’s also --force-with-lease, which will only force the change if the remote name hasn’t changed since the last time you fetched from that remote. much safer!
  - if you’ve rebased local commits that you’ve previously pushed (“history rewriting”; probably don’t do this), you’ll have to force push. think about why!
- try to re-apply your changes “on top of” the changes made remotely
  - this is a rebase!
    - rewind all local commits since shared ancestor
    - fast-forward HEAD to commit at remote name
    - apply local commits in-order
  - may have conflicts you have to manually resolve
    - git rebase --continue or --abort
    - lots more here
  - git pull --rebase will start this process for you
  - whether you should merge or rebase is a hot topic! some good reads:
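a sketch of a rejected push followed by a rebase, with two clones of a local bare repository standing in for two people. server.git, alice, bob, and the file names are all made up for illustration:

```shell
# a local bare repository plays the role of the remote; all names are made up
work="$(mktemp -d)" && cd "$work"
git init -q --bare server.git
git --git-dir=server.git symbolic-ref HEAD refs/heads/master

# alice publishes the first commit
git clone -q server.git alice 2>/dev/null
cd alice
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice" && git config user.email "alice@example.com"
echo "a" > a.txt && git add a.txt && git commit -q -m "alice: add a"
git push -q -u origin master

# bob clones, then commits locally without pushing
cd "$work" && git clone -q server.git bob && cd bob
git config user.name "Bob" && git config user.email "bob@example.com"
echo "b" > b.txt && git add b.txt && git commit -q -m "bob: add b"

# meanwhile alice pushes another commit
cd "$work/alice"
echo "a2" >> a.txt && git commit -q -a -m "alice: extend a"
git push -q origin master

# bob's push is refused: the remote master is not an ancestor of his commit
cd "$work/bob"
git push -q origin master 2>/dev/null || echo "push rejected"

git pull -q --rebase origin master   # re-apply bob's commit on top of alice's
git push -q origin master            # now a fast-forward, so it is accepted
git log --oneline                    # linear history with bob's commit on top
```

because bob’s commit was re-applied rather than merged, the resulting history has no merge commit.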
Further reading
- Learn git branching
- How to explain git in simple words
- Git from the bottom up
- Git for computer scientists
- Oh shit, git!
- The Pro Git book
Exercises
- On a repo try modifying an existing file. What happens when you do git stash? What do you see when running git log --all --oneline? Run git stash pop to undo what you did with git stash. In what scenario might this be useful?
- One common mistake when learning git is to commit large files that should not be managed by git, or to add sensitive information. Try adding a file to a repository, making some commits and then deleting that file from history (you may want to look at this). Also, if you do want git to manage large files for you, look into Git-LFS.
- Git is really convenient for undoing changes, but one has to be familiar even with the most unlikely changes.
  - If a file is mistakenly modified in some commit it can be reverted with git revert. However, if a commit involves several changes, revert might not be the best option. How can we use git checkout to recover a file version from a specific commit?
  - Create a branch, make a commit in said branch and then delete it. Can you still recover said commit? Try looking into git reflog. (Note: Recover dangling things quickly, git will periodically automatically clean up commits that nothing points to.)
  - If one is too trigger happy with git reset --hard instead of git reset, changes can be easily lost. However, since the changes were staged, we can recover them. (Look into git fsck --lost-found and .git/lost-found.)
- In any git repo look under the folder .git/hooks. You will find a bunch of scripts that end with .sample. If you rename them without the .sample they will run based on their name. For instance, pre-commit will execute before doing a commit. Experiment with them.
- Like many command line tools, git provides a configuration file (or dotfile) called ~/.gitconfig. Create an alias using ~/.gitconfig so that when you run git graph you get the output of git log --oneline --decorate --all --graph (this is a good command to quickly visualize the commit graph).
- Git also lets you define global ignore patterns in a file such as ~/.gitignore_global (git only reads it once you point the core.excludesFile setting at it, e.g. with git config --global core.excludesFile ~/.gitignore_global). This is useful to prevent common errors like adding RSA keys. Create a ~/.gitignore_global file and add the pattern *rsa, then test that it works in a repo.
- Once you start to get more familiar with git, you will find yourself running into common tasks, such as editing your .gitignore. git extras provides a bunch of little utilities that integrate with git. For example, git ignore PATTERN will add the specified pattern to the .gitignore file in your repo and git ignore-io LANGUAGE will fetch the common ignore patterns for that language from gitignore.io. Install git extras and try using some tools like git alias or git ignore.
- Git GUI programs can be a great resource sometimes. Try running gitk in a git repo and explore the different parts of the interface. Then run gitk --all. What are the differences?
- Once you get used to command line applications, GUI tools can feel cumbersome/bloated. A nice compromise between the two are ncurses based tools, which can be navigated from the command line and still provide an interactive interface. Git has tig; try installing it and running it in a repo. You can find some usage examples here.
Licensed under CC BY-NC-SA.