Introduction to Git
As web development, and for that matter any software development, becomes more complicated, teams need to track changes, compare new code against old, and let multiple people work on the same project. A version control system manages all the files that make up a web site, software application or back-end server component. Many of these pieces work hand in hand to deliver a web presence as the web becomes more interactive and reaches more and more types of devices.
While there are many version control systems out there, Git has been gaining popularity with developers from all facets of the development world. For the uninitiated, Git is a free and open source distributed version control system designed to be fast and efficient and to handle anything from small to very large projects. Every developer holds a full copy of the repository, and most teams also share a central repository that everyone pushes to. Git is driven mainly by command line tools.
The best-known hosting service for Git repositories is GitHub, which provides a web interface to display and manage the content under source control. GitHub is a separate product built around Git, not part of Git itself. Most of the operations relevant to GitHub are still performed using the Git command line tool.
Getting started with Git usually involves cloning an existing repository, but you can just as easily create a new repository on your local machine and then push it to GitHub. At any time, new files can be added and existing files edited, then added to the staging area awaiting a commit, and finally pushed back to GitHub. Changes can be reversed, branches from the main code base can be created, and the work of multiple people can be merged and managed.
We won't cover the installation of the Git software in this introduction, so this article assumes it's installed and working.
The “git init” command creates a new Git repository on your local machine in the current working directory. It can be used to convert an existing project to a Git repository or to initialise a new empty repository. Most other Git commands will fail if they are not executed inside a repository, that is, a directory tree that contains a .git directory. So “git init” is usually the first command you will run for a new project.
Executing “git init <directory>” creates the directory if it does not already exist and places a .git subdirectory in it. If you look inside the .git directory you will find it contains all of the necessary metadata for the repository. Aside from the .git directory, an existing project remains unaltered (unlike older versions of SVN, Git doesn't place a metadata folder in every subdirectory).
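As a minimal sketch of the step described above, the following creates a repository in a throwaway directory and lists its contents. The directory name "myproject" is only a placeholder chosen for illustration.

```shell
# Work in a throwaway directory so nothing else on the machine is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Create a new directory called "myproject" containing an empty repository.
git init -q myproject

# All repository metadata lives in the single .git directory at the top level.
ls -a myproject
```

Running plain "git init" with no directory name inside an existing project folder has the same effect on that folder.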
Cloning an Existing Repo
The git clone command copies an existing Git repository. It is similar to an SVN checkout, except the “working copy” is a full-fledged Git repository with its own history that manages its own files, completely isolated from the original repository.
As a convenience, cloning automatically creates a remote connection called “origin” pointing back to the original repository. This makes it very easy to interact with a central repository. You will see the name “origin” in many push commands.
Its usage is pretty simple: git clone <repo>, where <repo> is the location of the repository, usually written as a URL. The command copies everything onto the local machine. The original repository can be located on the local file system or on a remote machine accessible via HTTP or SSH.
Even when you clone someone else's repo, you can still make changes and push them to YOUR OWN copy on a central repository server (like Github.com). Then, if your changes fix an important issue, you can open a pull request asking the original author to pull them from your repo into theirs.
While most of Git is automated, you will need to add some contact details, which get recorded against every commit you make. Git stores configuration options in three separate files, which map to individual repositories, users, or the entire system:
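To illustrate, here is a small sketch that clones from a local path, so it can be run without network access. The repository names "original" and "copy" and the user details are placeholders. An HTTPS or SSH URL would be used in the same position as the local path.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Build a small source repository to clone from (a stand-in for a real project).
git init -q original
cd original
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "hello" > README.txt
git add README.txt
git commit -q -m "Initial commit"
cd ..

# Clone it. Here <repo> is a local path.
git clone -q original copy

# The clone carries the full history and has a remote named "origin".
cd copy
git log --oneline
git remote -v
```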
<repo>/.git/config – Repository-specific settings.
~/.gitconfig – User-specific settings. This is where options set with the --global flag are stored.
$(prefix)/etc/gitconfig – System-wide settings.
At a minimum you should configure your name and email using:
git config --global user.name <name>
git config --global user.email <email>
If you wish to see all the configuration in an editor you can enter:
git config --global --edit
A sample file might have the following:
[user]
    name = John Smith
    email =
[alias]
    st = status
    co = checkout
    br = branch
    up = rebase
    ci = commit
[core]
    editor = vim
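The sketch below sets a few of these options at the repository level, so it does not change your personal ~/.gitconfig. The email address is a made-up placeholder. Swapping --local for --global would write the same keys to the user-level file instead.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q demo
cd demo

# Repository-specific settings are written to .git/config.
git config --local user.name "John Smith"
git config --local user.email "john@example.com"   # placeholder address
git config --local alias.st status

# Read a single value back, list everything stored locally, then use the alias.
git config user.name
git config --local --list
git st
```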
The basic process for saving changed files back to a repository is:
Add → Commit → Push
If you change an existing file (or create a new one) you use git add <file> to register it in the staging area. Once a file has been added Git keeps tracking it, but any later changes must be staged with git add again before they can be committed.
You can also “add” the contents of the current directory using git add . (note the trailing dot), or a specific directory using git add <directoryname>.
At any time you can use git status to see what has been modified and what has been staged for the next commit. When you are ready to commit the file changes you use:
git commit -m "<message>"
If you don't include the -m and message then an editor will pop up and you can manually add it.
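The add and commit steps can be sketched as follows. The file name and user details are placeholders, and the comments show the short status codes Git prints at each stage.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q demo
cd demo
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "<h1>Home</h1>" > index.html
git status --short        # "?? index.html": untracked

git add index.html
git status --short        # "A  index.html": staged

git commit -q -m "Add home page"

# A later edit has to be staged again before it can be committed.
echo "<p>Welcome</p>" >> index.html
git status --short        # " M index.html": modified, not yet staged
git add index.html
git commit -q -m "Add welcome text"

git log --oneline
```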
Saving your files
When you are ready to save your files back to the central repository you use the “push” command. Before you can push, Git needs to know which remote repository to push to. A cloned repository already has one (“origin”), but for a repository created with “git init” you must define it first using the “git remote add” command. With remote add you specify an alias and the URL of the remote repository. The command format is:
git remote add origin https://github.com/yourname/REPO.git
You can change the name “origin” to anything you like, and a complete list of configured remotes can be found using git remote -v.
If you inadvertently mistyped the URL you can run “git remote remove <alias>” to clear the entry and then run remote add with the alias and URL again. You can also define multiple remotes and push to each as needed. Please note that the repo MUST already exist on Github. You cannot use the “git” tool to create a repo on Github.
Once the repo has been configured we can then “push” our file using:
git push origin master
Here “origin” is the alias of the remote repository and “master” is the branch being pushed. You may be prompted for your username and password.
If the “push” is successful you will see output similar to the following:
Counting objects: 37, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (36/36), done.
Writing objects: 100% (37/37), 10.77 KiB | 0 bytes/s, done.
Total 37 (delta 12), reused 0 (delta 0)
* [new branch] master -> master
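The remote and push steps can be tried without a Github account by using an empty bare repository on disk as a stand-in for the hosted one. This is only a sketch: the names "central.git" and "project" are placeholders, and the branch name is read from Git because it may be "master" or "main" depending on your defaults.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the hosted repository: an empty bare repository on disk.
git init -q --bare central.git

git init -q project
cd project
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "hello" > README.txt
git add README.txt
git commit -q -m "Initial commit"

# "origin" is just an alias for the remote's location.
git remote add origin "$workdir/central.git"
git remote -v

# Push the current branch ("master" or "main", depending on Git defaults).
branch=$(git symbolic-ref --short HEAD)
git push -q origin "$branch"

# The remote now lists the same commit as the local branch.
git ls-remote origin
```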
At any time you can get the status of added files using git status, and you can review the history of commits using “git log”. The commit history stays with your repo, so if you upload your repo to a new server the history goes with it. The log output can show additional information: git log -p displays the full change introduced by each commit, while the --stat option displays an abbreviated summary of the files changed instead. There are also formatting options to tailor the output even further for reporting systems.
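A short sketch of the log variations mentioned above, run against a throwaway repository with two commits. The file name, messages and the particular format string are illustrative choices only.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q demo
cd demo
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "one" > notes.txt
git add notes.txt
git commit -q -m "First note"
echo "two" >> notes.txt
git commit -q -am "Second note"

git log --oneline                          # one line per commit
git log --stat                             # files changed and line counts per commit
git log -p -1                              # full diff of the most recent commit
git log --pretty=format:"%h | %an | %s"    # custom layout: short hash, author, subject
```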
Now we can look at some more advanced management tools.
Git has the ability to add “tags” to your repo, which typically mark release points in your development cycle. A tag might look like “v1.0”, representing version 1.0 of your software. You can list the existing tags by executing “git tag” and you can add tags as development progresses. Adding a tag is typically done using the -a option to the tag command together with a message similar to a commit message. The format of the command would look like:
git tag -a v1.1 -m 'v1.1 minor bug fix release'
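Git has the concept of annotated tags and lightweight tags. An annotated tag records a message, the tagger and the date, while a lightweight tag is just a name pointing at a commit with no message associated with it.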
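The difference between the two kinds of tag can be seen in a quick sketch. The repository and version names are placeholders. A lightweight tag resolves straight to a commit, whereas an annotated tag is stored as its own object.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q demo
cd demo
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "release" > app.txt
git add app.txt
git commit -q -m "Initial release"

git tag v1.0                                       # lightweight tag
git tag -a v1.1 -m "v1.1 minor bug fix release"    # annotated tag

git tag                  # lists v1.0 and v1.1
git cat-file -t v1.0     # prints "commit"
git cat-file -t v1.1     # prints "tag"
git show -s v1.1         # shows the tag message and the tagged commit
```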
Branching & Forks
Branching is where you diverge from the main line of development (the master) and continue to do work on new features or specific issues without breaking the master files. Once you're finished with the branch, you can merge the changes back into the master.
A “fork” is where you diverge completely from the master, using it as the initial base for a new product development.
To create a new branch in git you use the “git branch” command, for example:
git branch testing
Although you now have a testing branch, you are still on the master; you switch to the new branch using “git checkout testing”. As you commit changes on testing, the master falls behind the testing branch. You can swap back to the master using:
git checkout master
After you swap back, the files are as they were before the changes made on testing. If you now modify and commit the master files, you have two separate lines of changes in play. The “git log” command will display the changes committed on the current branch.
At some point you need to merge all these changes. If you checkout the master again you can use the “git merge testing” command and all the changes from testing will be merged in. Once merged, you can delete the branch using the -d option: “git branch -d testing”.
Sometimes a conflict will arise in an attempted merge, usually because the same bit of code has been changed on both branches. For example:
$ git merge testing
CONFLICT (content): Merge conflict in index.php
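Put together, the branch, checkout, merge and delete steps might look like the sketch below. The file names are placeholders, and the name of the main branch is read from Git because it may be "master" or "main" depending on your defaults.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q demo
cd demo
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"
main_branch=$(git symbolic-ref --short HEAD)   # "master" or "main"

# Create the branch, switch to it and commit some work there.
git branch testing
git checkout -q testing
echo "new feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# Back on the main branch the new file is not present yet.
git checkout -q "$main_branch"
ls

# Merge the work in, then remove the branch that is no longer needed.
git merge -q testing
ls
git branch -d testing
```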
Automatic merge failed; fix conflicts and then commit the result.
At this point Git cannot complete the merge automatically, so a human needs to edit the conflicting files, add them and commit the result. The “git status” command will shed some light on the issue by listing the files that are still in conflict.
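The following sketch deliberately provokes a conflict and then resolves it. The file contents are invented for the example, and the "resolution" simply overwrites the file, where in practice you would edit the conflict markers by hand.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q demo
cd demo
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "<?php echo 'hello'; ?>" > index.php
git add index.php
git commit -q -m "Initial commit"
main_branch=$(git symbolic-ref --short HEAD)   # "master" or "main"

# Change the same line on two branches.
git checkout -q -b testing
echo "<?php echo 'from testing'; ?>" > index.php
git commit -q -am "Change greeting on testing"
git checkout -q "$main_branch"
echo "<?php echo 'from main line'; ?>" > index.php
git commit -q -am "Change greeting on main line"

# The merge stops with a conflict ("|| true" lets the script carry on).
git merge testing || true
git status --short        # "UU index.php": both sides modified

# Resolve the conflict, stage the file and commit to conclude the merge.
echo "<?php echo 'resolved'; ?>" > index.php
git add index.php
git commit -q -m "Merge branch 'testing'"
git log --oneline --graph
```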
A successful merge might return something like the following:
$ git merge testing
Fast-forward
Config.h | 5 +++++
1 files changed, 5 insertions(+)
You will notice the line “Fast-forward” in the output from the merge. When the commit you are merging in can be reached by simply following on from the current commit's history, Git just moves the branch pointer forward, because there is no divergent work (in master) to merge together. This is called a “fast-forward”.
Merging also works in the other direction. If a fix was applied to the master branch after the testing branch was created, the master can be merged into testing so the branch picks up the fix. Later, the newer changes on testing can be merged back into the master.
For most bug fixes, you branch out, fix the issue, test it and merge it back into the master, tagging new versions over time. Sometimes, however, the testing process is not as good as it should be and doesn't catch newly introduced bugs, which find their way back into the code, often due to tight deadlines, inadequate testing processes, overly complicated coding practices and human factors.
When this happens most organisations weigh up the new issue against the old, and often the change is reverted.
For long-running branches, which are typical of a new product development cycle, the master is often left as the last known good stable copy of the system and all tested development work is merged at a later date. Remember that “master” is just the default branch name created when we ran our first “git init”, so a branch could become the new master, leaving behind the older, initial code base.
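The article does not name a command for this, but one common way to back a change out is “git revert”, which adds a new commit that reverses an earlier one. The sketch below assumes that approach; the file and commit messages are invented.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q demo
cd demo
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "stable" > app.txt
git add app.txt
git commit -q -m "Stable release"
echo "buggy change" >> app.txt
git commit -q -am "Risky fix"

# Undo the last commit by adding a new commit that reverses it.
git revert --no-edit HEAD

cat app.txt          # back to just "stable"
git log --oneline    # three commits: history is kept, not rewritten
```

Because the revert is itself a commit, it can be pushed and shared like any other change.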
Pull and Fetching
With our fixes merged back and pushed up to the Github repository, others can retrieve them using “git fetch” and “git pull”. The “git fetch” command downloads all the changes from the repository server that you don't have yet. Fetch does not modify your working directory at all, allowing you to review the changes and merge them when you are ready.
The “git pull” command is basically a fetch and merge in one operation.
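To see the difference, the sketch below uses a bare repository on disk as the shared server and two clones standing in for two developers. The names "alice" and "bob" are hypothetical. Cloning the still-empty central repository prints a harmless warning.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q --bare central.git             # stand-in for the central server

# First developer publishes an initial commit.
git clone -q central.git alice
cd alice
git config user.name "Alice Example"
git config user.email "alice@example.com"
branch=$(git symbolic-ref --short HEAD)    # "master" or "main", depending on Git defaults
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "First version"
git push -q origin "$branch"
cd ..

# Second developer clones, then the first pushes more work.
git clone -q central.git bob
cd alice
echo "v2" >> notes.txt
git commit -q -am "Second version"
git push -q origin "$branch"
cd ../bob

# fetch downloads the new commit but leaves the working files alone.
git fetch -q origin
cat notes.txt                              # still only "v1"
git log --oneline "HEAD..origin/$branch"   # the commit waiting to be merged
git merge -q "origin/$branch"

# pull does the fetch and the merge in one step.
cd ../alice
echo "v3" >> notes.txt
git commit -q -am "Third version"
git push -q origin "$branch"
cd ../bob
git pull -q origin "$branch"
cat notes.txt
```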
Your Own Git Server
You are not limited to saving data on Github. Even though the private repository mode is very secure (and a paid service), corporate policy may dictate that all intellectual property be maintained in house. With this in mind, you can set up your own repository in your office and make it available to your local network only. While we won't delve into the installation process here, the basic components are a Linux server running at least an Apache web server, the git and git daemon packages, and additionally the git smart HTTP package.
There are also a number of web interfaces available for Git servers. The basic one is called GitWeb, then there is GitLab, followed by a range of other third-party packages.
It will take time and effort to set up and maintain your own server, so there is a cost justification for using a hosted service. Because Git is driven by command line tools, a fully automated backup process can be quickly scripted to ensure you have local backups of your repos if the worst should happen to your repository.
If you're already a seasoned user of an existing version control system that's getting long in the tooth, has no ongoing support and is being left behind as everyone moves to Git, then maybe you should consider migrating away from it.
Migrating away from an obsolete VCS into Git takes some work but can be done without too many issues. Git and its surrounding tools provide some limited support for migrating from SVN, TFS and Mercurial systems. The rest will really just require extracting all your projects (think “git clone”) and then some work to initialise, add and commit your files to the new repo. The main issue in migrating will be preserving user information and the previous commit history.
If you're using a supported VCS then your best tactic is to clone the existing VCS projects and then import them into Git.
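One possible shape for such a backup script, offered only as a sketch, is a mirror clone that is refreshed on each run. The paths are placeholders, and a local repository stands in for the hosted one. Scheduling (for example with cron) is left out.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the repository you want to protect.
git init -q project
cd project
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "important" > data.txt
git add data.txt
git commit -q -m "Initial commit"
cd ..

# First run: take a complete mirror (all branches, tags and history).
mkdir -p backups
git clone -q --mirror "$workdir/project" backups/project.git

# More work happens in the original repository...
cd project
echo "more" >> data.txt
git commit -q -am "More work"
cd ..

# Later runs: refresh the existing mirror instead of cloning again.
git -C backups/project.git remote update --prune
```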
Where to from here?
This article has only just touched the surface. There are stacks of resources on Git and Github. Setting up a Github account is easy and painless, so this would be the very first thing to do; then install the tools and begin.