Frequent git use cases

From PKP Wiki
Revision as of 21:27, 20 April 2010 by Jerico.dev

Understand where you are

The state of your repository

To see all branches that are currently in your repository:

git branch -a

To see which branch you are on and which files have been modified and/or staged for commit:

git status

To get a graphical overview of your branches:

gitk --all

In gitk you can add additional customized gitk views that show you a selection of commit nodes.

Mac OS X users may try using "GitX". There are a variety of other software solutions on other operating systems that will help you visualize your Git branches.

Recent changes to your repository

To see the changes that will be introduced by the next pull from the master repository:

git fetch official
git log ..official/master
git diff ...official/master

To see the changes introduced by a recent pull from the master repository enter:

git log HEAD^..official/master
git diff HEAD^...official/master

The branch 'official/master' can be replaced by any branch you like of course.

NB: '..' and '...' mean different things for 'git log' and 'git diff'. See 'man git-rev-parse' and 'man git-diff' for a full explanation.
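A small throwaway repository makes the difference visible. The following is only a sketch: the branch names 'mine' and 'theirs', the file names and the temporary directory are made up for the illustration. In your real repository you would use your own branch and official/master instead.

```shell
#!/bin/sh
# Illustration only: two diverging branches in a throwaway repository.
set -e
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.org"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.org"

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q .
git symbolic-ref HEAD refs/heads/mine     # hypothetical branch name

echo base > base.txt
git add base.txt
git commit -q -m 'common base'
git branch theirs                          # "theirs" forks off here

echo a > mine.txt
git add mine.txt
git commit -q -m 'only on mine'

git checkout -q theirs
echo b > theirs.txt
git add theirs.txt
git commit -q -m 'only on theirs'
git checkout -q mine

# log A..B  : commits reachable from B but not from A
log_two=$(git log --pretty=format:%s mine..theirs)
# log A...B : commits reachable from either side but not from both
log_three=$(git log --pretty=format:%s mine...theirs | sort)
# diff A..B : compares the two branch tips with each other
diff_two=$(git diff --name-only mine..theirs | sort)
# diff A...B: compares the common ancestor with B, i.e. only "their" changes
diff_three=$(git diff --name-only mine...theirs)

echo "log  ..  : $log_two"
echo "log  ... : $(echo $log_three)"
echo "diff ..  : $(echo $diff_two)"
echo "diff ... : $diff_three"
```

This is why the commands above pair 'git log ..' with 'git diff ...': both then describe only what the other side added since the two branches diverged.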

Find commits for a given Bugzilla entry

The following command finds all commits in the official master branch that concern Bugzilla entry number 4374:

git log -E --grep='^(\*|#)4374(\*|#)' official/master
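To get a feeling for what the pattern matches, you can try it on a few invented commit messages. This is a sketch under assumptions: the messages below and the local 'master' branch are made up, only the grep expression is taken from the command above.

```shell
#!/bin/sh
# Illustration only: which commit messages does the Bugzilla pattern match?
set -e
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.org"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.org"

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q .
git symbolic-ref HEAD refs/heads/master

# Empty commits are enough here, only the (hypothetical) messages matter.
git commit -q --allow-empty -m '*4374* fix the citation parser'
git commit -q --allow-empty -m '#4374# follow-up fix'
git commit -q --allow-empty -m '*43740* a different bug'
git commit -q --allow-empty -m 'cleanup, see also *4374*'

# Same expression as above, run against the local master branch.
matches=$(git log -E --grep='^(\*|#)4374(\*|#)' --pretty=format:%s master)
count=$(printf '%s\n' "$matches" | wc -l | tr -d ' ')

printf '%s\n' "$matches"
echo "matching commits: $count"
```

Only messages that start with the bug number wrapped in '*' or '#' are found. A longer number such as 43740 or a mention in the middle of a line is not matched.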

Working on BugZilla Bugs

Configure your development branch

Open your .git/config file with your favourite editor. Then look for the entry that configures your development branch. Change it to look like this:

[branch "dev"]
       remote = official
       merge = refs/heads/master
       rebase = true

This does two things:

  • It configures your development branch to track official/master.
  • It switches the pull command from "merge" to "rebase" mode so that we'll get a clean sequential commit history in the master branch.
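If you prefer not to edit .git/config by hand, the same three entries can be written with 'git config'. The sketch below runs in a throwaway repository so that it does not touch any real configuration; in practice you would run only the three 'git config branch.dev...' lines inside your own working copy.

```shell
#!/bin/sh
# Illustration only: write the [branch "dev"] section with git config.
set -e

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q .        # throwaway repository, no real config is touched

git config branch.dev.remote official
git config branch.dev.merge refs/heads/master
git config branch.dev.rebase true

# Read the values back to check what was written.
dev_remote=$(git config --get branch.dev.remote)
dev_merge=$(git config --get branch.dev.merge)
dev_rebase=$(git config --get branch.dev.rebase)
git config --get-regexp '^branch\.dev\.'
```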

If you want to let git add the rebase parameter automatically for all your remote tracking branches then execute:

git config --global branch.autosetuprebase remote

You should also execute

git config --global push.default tracking

This will make sure that you automatically push to the branch that is configured as a tracking branch (in our case official/master). It will save you some typing when you push. In newer git versions this value is called 'upstream'; 'tracking' is kept as a deprecated synonym.

(For those who followed the previous "entry level" approach: this is exactly what we had documented as "advanced approach" in our Wiki so far.)

Note: the "push" configuration is missing in git version 1.5.5.6 (and maybe others). In this case when pushing you can use:

git push official dev:master

Start working on a BugZilla bug

If you have not yet checked out your development branch then do:

git checkout dev

both in the application master project and in the lib/pkp sub-module.

I assume that your development branch is clean (i.e. it has no local changes or commits). In this case issuing

git pull

should fast-forward you to the latest state of the official master branch without any warnings.

Start making your changes

...hack, hack, hack...

Work on a second BugZilla bug in parallel

The following approach is not based on topic branches. The reason is that I assume that, although belonging to different BugZilla entries, the changes are interrelated and depend on each other.

Commit what you've done on the other bug so far

git commit -m '*1234* my first bug - wip'

Start working on the other bug

...hack, hack, hack...

Once you've finished working on the second bug or want to return to work on the first bug you say:

git commit -m '*1235* my second bug'

You can go on working like this on as many bugs as you like. Just make sure that you locally commit your changes (no push) and that you always only have changes belonging to one bug report in a single commit. You can have many separate stacked commits per task, though.

Distributing uncommitted changes onto distinct commits

If you have worked on several bugs at the same time without committing the changes for each bug separately, or if you want to split a big change into several commits, then interactive add comes in handy:

git add -i

This allows you to mark changes hunk by hunk or file by file for commit. You select the changes you wish to group together, add them to the staging area (index) and commit them. You repeat this until you have committed all changes.

Please have a look at the section about interactive add in 'man git-add' for detailed explanations of the interface.

Cleaning up your local commit history

Once you've got one or more BugZilla tasks completed you'll have to re-combine all commits that belong to that change and clean up your commit comments. You do that by issuing

git rebase --interactive

You'll get a list of all of your local commits since you last pulled from official/master:

pick 49038ec696.. *1234* my first bug - wip
pick 463df9349.. *1235* my second bug
pick 4639349fa.. *1234* my first bug - finished

Reorder the changes so that changes belonging to the same BugZilla entry are grouped together. Then leave only one 'pick' per BugZilla bug report and put the letter 's' (squash) in front of the following commits, which means that they'll be joined together into one single commit:

pick 49038ec696.. *1234* my first bug - wip
s 4639349fa.. *1234* my first bug - finished
pick 463df9349.. *1235* my second bug

Be careful that you don't change the order within a task. The following is wrong and will most probably give you conflicts on rebase:

pick 4639349fa.. *1234* my first bug - finished
s 49038ec696.. *1234* my first bug - wip
pick 463df9349.. *1235* my second bug

If you continue by saving your changes, git will re-execute your changes in the correct order and join commits that you marked with 's'. This can give merge conflicts if you touched the same files for different BugZilla entries. Say both *1234* and *1235* changed the same lines in index.php or something like that. This should occur very rarely and is very easy to avoid if you're aware of what you're doing. It is much more frequent to get merge conflicts on rebase when you get the order of changes wrong.

If something unexpected happens simply issue

git rebase --abort

and you'll be back where you were before. Make sure that your commits don't change the same files and think about the correct order of your commits. Then try again.

If everything went well then git will ask you to give a single commit message for the commits that you joined together. Then you'll have a clean local commit history like this:

git log -n2
4639349fa.. *1234* my first bug
463df9349.. *1235* my second bug
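The reorder-and-squash procedure can be tried out safely in a throwaway repository. The following sketch is not part of the normal workflow: it replaces the manual editing of the todo list with a small helper script passed through GIT_SEQUENCE_EDITOR, and it accepts the combined commit message unchanged through GIT_EDITOR=true. The tag 'base', the file names and the helper script are assumptions made for the illustration.

```shell
#!/bin/sh
# Illustration only: replay the reorder-and-squash steps without an editor.
set -e
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.org"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.org"

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/repo"
cd "$work/repo"
git init -q .

echo base > base.txt
git add base.txt
git commit -q -m 'base'
git tag base                       # stands in for the last pulled state

echo one > first.txt
git add first.txt
git commit -q -m '*1234* my first bug - wip'
echo two > second.txt
git add second.txt
git commit -q -m '*1235* my second bug'
echo finished >> first.txt
git add first.txt
git commit -q -m '*1234* my first bug - finished'

# Stand-in for editing the todo list by hand; git passes the file as $1.
cat > "$work/todo-editor.sh" <<'EOF'
todo=$1
grep '^pick ' "$todo" > "$todo.picks"
{
  sed -n '1p' "$todo.picks"                     # *1234* wip stays first
  sed -n '3p' "$todo.picks" | sed 's/^pick/s/'  # *1234* finished is squashed into it
  sed -n '2p' "$todo.picks"                     # *1235* stays a commit of its own
} > "$todo"
rm -f "$todo.picks"
EOF

GIT_SEQUENCE_EDITOR="sh $work/todo-editor.sh" GIT_EDITOR=true \
    git rebase -q -i base

history=$(git log --reverse --pretty=format:%s base..HEAD)
count=$(git rev-list base..HEAD | wc -l | tr -d ' ')
printf '%s\n' "$history"
echo "commits after rebase: $count"
```

Because the two bugs touch different files here, moving the 'finished' commit in front of the second bug does not cause a conflict. In real work you would of course edit the joined commit message instead of accepting it as is.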

Update your working copy with changes from master

Even if you don't intend to commit you should frequently pull changes from the master repository. This makes sure that you catch conflicts with what others on the team are doing early and don't have to resolve large conflicts once you want to commit. You can go so far as to pull every hour or so.

You'll only be able to pull in changes from the outside if your own working branch is clean. There are two ways to do this. You can either make a commit of your work in progress as described above or you can temporarily stash your local uncommitted changes away:

git stash

Once you've got a clean working copy (git status does not give you anything) then you issue:

git pull

As you have configured your branch to rebase on pull this will do the following:

  • It will roll back all your local commits and store them separately
  • It will fast forward your local branch to the current master branch
  • It will re-apply your local commits as patches on top of the latest master branch

The last step will of course lead to merge conflicts when conflicting changes have been made to the upstream master branch. You resolve these conflicts in the exact same way as you'd clean up a patch that doesn't apply any more due to upstream changes.

Git supports you by letting you configure a merge tool. You can say:

git mergetool

to do graphical merging.

If you pull very frequently then you should only get minor merge conflicts that are easy to resolve.
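The three steps of a rebasing pull can be watched in a sandbox. This is a minimal sketch with assumed names: the bare repository 'official.git', the 'seed' repository that plays the role of another developer and all file names are invented, and everything lives in a temporary directory.

```shell
#!/bin/sh
# Illustration only: a pull that rebases local work onto upstream changes.
set -e
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.org"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.org"

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical stand-in for the official repository.
git init -q --bare "$work/official.git"

# Another developer publishes the initial state.
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
git remote add official "$work/official.git"
echo v1 > shared.txt
git add shared.txt
git commit -q -m 'initial state'
git push -q official master

# Your working copy: a dev branch that tracks official/master and rebases.
git init -q "$work/dev"
cd "$work/dev"
git remote add official "$work/official.git"
git fetch -q official
git checkout -q -b dev official/master
git config branch.dev.rebase true

echo mine > local.txt
git add local.txt
git commit -q -m '*1234* local work'

# Meanwhile the official master moves on.
cd "$work/seed"
echo v2 >> shared.txt
git commit -q -a -m 'upstream change'
git push -q official master

# Roll back local commits, fast forward, re-apply local commits on top.
cd "$work/dev"
git pull -q

merges=$(git rev-list --merges HEAD | wc -l | tr -d ' ')
git log --pretty=format:%s -n 3
echo
echo "merge commits in history: $merges"
```

The local commit ends up on top of the upstream change and no merge commit is created, which is the clean sequential history mentioned earlier.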

If you end up in a really messy state and no longer know how to resolve it, then simply say

git rebase --abort

That will take you back to the state before your pull action. You can then see what happened. You might for example branch off the official master branch and cherry pick your changes in the dev branch one by one to resolve conflicts in a more controlled environment. Please don't hesitate to call for help if you get merge conflicts.

Make quick fixes to the master copy without interrupting your workflow

git stash
git fetch official master
git checkout master
... make a change ...
git add -A
git commit -m '...'
git checkout dev
git pull
git stash pop

Amending the last commit with additional changes

If you want to change the commit message

git commit --amend

If you want to maintain the commit message

git commit -C HEAD --amend

Closing a BugZilla bug

Publishing your changes

Please follow the instructions above to update and rebase your local dev branch to the latest version of official/master. I assume that your local commits are clean, nicely squashed and all rebased onto official/master.

Now you can simply execute:

git push

This is the point of no return. If you discover that you made an error in one of the changes you just published then you'll have to correct them with an additional commit that you may not amend or squash to commits that have already been published. What is published cannot be changed. You're guaranteed to get an error on push if you amend a commit that has been published before. Recovering from that is possible but needs help that you should ask from somebody who's more experienced with git when you cannot do it yourself.

You must NEVER issue git push -f to the official repository to resolve an error on push. This will get everybody else into trouble.

(You can of course use push -f for your personal development repository if you're sure that you know what you're doing...)

If you do not want to publish all your local commits because some of them are still work in progress you can do something like:

git push official dev^^:master

This will leave out the last two commits. Please check out 'man git-rev-parse' for different ways to specify a commit point back in your history.

Attach a patch to the BugZilla bug

If you've committed your changes and pushed to the official repository, then there isn't a need to publish a patch on the bugzilla entry. You can just post a link to the commit on github such as http://github.com/pkp/pkp-lib/commit/8ea781a7a364ea81f73ba2fbc9d5228e562c410e. For good record keeping, ONLY links to the pkp github account should replace patches, as only commits that make it into the official repository have a guarantee of being persistent. For other cases, you may still want to know how to create a patch.

Use 'git diff' or 'git format-patch' within the prepared master branch to create patches then submit the patches to BugZilla.

Create a patch based on the start of a single commit comment:

git diff-tree -u --pretty :/#4867 >patch-lib-pkp.diff

Create a patch based on the last commit:

git diff-tree -u --pretty HEAD >patch-lib-pkp.diff

Create a patch based on what's staged:

git diff -u --pretty --cached >patch-lib-pkp.diff

etc.

How to deal with the /lib/pkp submodule entry when committing changes

How submodules work

When you are about to commit a change to one of our base projects (ocs, omp, ojs, harvester) then you'll realize that very frequently the entry for /lib/pkp is marked as "dirty" (changed). This is the entry that represents the submodule state within the base project. This submodule entry can be compared to a symbolic link in the unix file system. Rather than saving a whole snapshot of the submodule, the main repository only points to the commit id (sha1) that was checked out in the submodule at the time of the last commit of the base repository.

As soon as you commit something in lib/pkp or you check out a branch in lib/pkp that differs from the one checked out in the base repository, the commit id of the submodule HEAD will change in lib/pkp/.git/HEAD. When the submodule HEAD is not the same as the one currently recorded in the submodule "symlink", then git will consider it as changed or "dirty" and will show it as such when you type 'git status' in the base repository.

The big advantage of this system is that you can record for every commit you make in the base system the exact state of the submodule that goes with it. If you do this correctly you'll never have the problem again that the pkp library is out of sync with your base repository. You can reconstruct for every single commit in the base repository the exact state the code of the library was in at the time of the commit.

Whenever you commit changes to a base repository that depend on a certain version of the PKP library you should make sure that you commit the correct lib/pkp entry.

Understand what's going on

To check the commit id that the currently saved submodule "symbolic link" in the base repository points to, execute

git submodule status

in the base repository.

To check the current commit id of your submodule, enter

cd lib/pkp
git log

and look at the commit id of the latest recorded commit or look at the .git/HEAD file which contains the reference with the latest commit id. You'll find the references in the .git/refs folder. They contain the corresponding commit id.
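The relation between the recorded "symlink" and the submodule HEAD can be reproduced in a sandbox. This is a sketch under assumptions: the 'lib' and 'base' repositories are empty stand-ins created in a temporary directory, and 'git ls-tree' is used here as one possible way to read the commit id stored in the base repository.

```shell
#!/bin/sh
# Illustration only: watch the lib/pkp entry go "dirty" and clean again.
set -e
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.org"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.org"

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical stand-in for the PKP library repository.
git init -q "$work/lib"
cd "$work/lib"
echo v1 > lib.txt
git add lib.txt
git commit -q -m 'library v1'

# Hypothetical stand-in for a base project.
git init -q "$work/base"
cd "$work/base"
# Newer git versions refuse local-path submodules unless file transport is allowed.
git -c protocol.file.allow=always submodule --quiet add "$work/lib" lib/pkp
git commit -q -m 'add lib/pkp'

recorded() { git ls-tree HEAD lib/pkp | awk '{print $3}'; }   # sha1 stored in the base repo
checked_out() { (cd lib/pkp && git rev-parse HEAD); }          # sha1 of the submodule HEAD

in_sync_before=no
if [ "$(recorded)" = "$(checked_out)" ]; then in_sync_before=yes; fi

# Commit inside the submodule: its HEAD moves, the recorded pointer does not.
(cd lib/pkp && echo v2 >> lib.txt && git commit -q -a -m 'library v2')
flag_dirty=$(git submodule status | cut -c1)    # "+" means HEAD differs from the pointer

# Record the new submodule state in the base repository.
git add lib/pkp
git commit -q -m 'update lib/pkp pointer'
flag_clean=$(git submodule status | cut -c1)    # a blank means in sync

echo "in sync after adding: $in_sync_before"
echo "flag after submodule commit: [$flag_dirty]"
echo "flag after recording it:     [$flag_clean]"
```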

Gotchas and best practices for submodule "symlink" maintenance

While highly flexible and precise, the submodule system has several implications that might appear confusing in the beginning or lead to errors:

  1. If you commit changes in your base repository (including the /lib/pkp entry) and then commit changes in the submodule afterwards you'll see that the /lib/pkp entry in the base repository will have "dirtied" again. This is because the currently checked out commit id of the submodule changed when you committed the submodule state. An undesired side effect of this is that you recorded an outdated state of the submodule when you checked in your base repository code. To avoid this always commit submodule changes first then commit changes in the base repository.
  2. When you commit /lib/pkp in the base repository while you have a private branch checked out in /lib/pkp and then push the change in the base directory up to the official repository, then you have recorded a "symlink" to a submodule state that does not exist in the public submodule repository. This will lead to errors when other developers try to reproduce your submodule state as they'll not have the necessary files available in their repositories. It is like trying to unpack a tar file that contains a symlink that points to a non-existent place on your file system. If you try to access this symlink it will cause an error. Unfortunately git does not check whether your symlinked submodule state is publicly available when you push your changes. So you'll have to check this yourself. To avoid this problem always make sure that you have the correct version and branch of the submodule checked out when you commit lib/pkp in the base repository. Most importantly: If you commit to the latest official master branch in your base repository you must have checked out an up-to-date official master branch in your submodule as well.
  3. When you pull from another repository or branch then the lib/pkp entry will often cause merge conflicts if you already committed a differing lib/pkp in your local branch. You usually can ignore the merge conflict. Just make sure that you do your next add + commit of lib/pkp with the correct version of lib/pkp checked out in the sub-module.
  4. Do not put a trailing forward slash (/) after the submodule name when you add the submodule entry to the index, for example when you update the container repository to use the latest submodule changes that you have pulled from the remote source. If you type:
    git add lib/pkp/
    git will think you want to delete the submodule and want to add all the files in the submodule directory. You must type it like this:
    git add lib/pkp

A few more hints on how to work with submodules

As with every other changed repository entry you can decide whether you want to commit the changed submodule symlink (or not) by adding it to the staging area (= index) with 'git add' (or not). Use 'git add your/changed/paths' or 'git add --interactive' rather than 'git add -A' if you want to avoid adding lib/pkp. Use 'git reset HEAD -- lib/pkp' to remove lib/pkp from the index if you inadvertently added it.

The lib/pkp marker in 'git status' will disappear when you check out the currently "symlinked" snapshot in the lib/pkp directory. To do so enter:

git submodule update

in the base directory.

NB: It's not safe to run git submodule update if you've made and committed changes within a submodule without checking out a branch first. They will be silently overwritten.

If this doesn't work then make sure that you don't have any commits or uncommitted changes in your submodule that have not been pushed to the official repository (otherwise you'll lose these changes). Then enter:

git submodule status

in your base repository. Copy the commit id from there, then enter:

cd lib/pkp
git reset --hard ...the sha1 you just copied...

If you want to record a specific commit id as a submodule "symlink" then I propose you do:

cd lib/pkp
git checkout -b temp some-commit-id  # this can also be something like HEAD^ if you just want to leave out recent commits or so
cd ../..
git add lib/pkp
git commit -m '...'
cd lib/pkp
git checkout dev
git branch -d temp

Use aliases to become more efficient

Once you've found your routine you'll realize that you'll type the same commands over and over again. Git has a mechanism that helps you to avoid unnecessary typing. Please have a look at

man git-config

and search for the 'alias' section there. You can configure aliases in ~/.gitconfig like this:

[alias]
        st = status
        ci = commit -m
        co = checkout
        ai = add -i

etc.
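The same aliases can also be set from the command line. The sketch below stores them in the configuration of a throwaway repository so that your real ~/.gitconfig stays untouched; adding --global to the 'git config' calls would write them to your personal configuration instead. The file and branch names are made up.

```shell
#!/bin/sh
# Illustration only: define the aliases with git config and use them.
set -e
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.org"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.org"

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q .

# Repository-local here; use "git config --global ..." for ~/.gitconfig.
git config alias.st status
git config alias.ci 'commit -m'
git config alias.co checkout
git config alias.ai 'add -i'

echo hello > file.txt
git add file.txt
git ci 'first commit via alias' -q    # expands to: git commit -m '...' -q
git co -q -b topic                    # expands to: git checkout -q -b topic

subject=$(git log -1 --pretty=format:%s)
branch=$(git rev-parse --abbrev-ref HEAD)
echo "last commit: $subject"
echo "current branch: $branch"
```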

Revert a change

One confusing thing when coming from the CVS world is that you can undo changes in git and change the commit history! In git, there are basically 2 commands you'll want to use when you undo things: git reset and git checkout. As you know, git checkout can be used to switch between branches. However, it can also be used to revert the changes to a particular file. Say you've modified config.inc.php, but now you want to revert it back to the latest version in the master branch:

git checkout master config.inc.php

The other scenario is undoing all of your changes. For this, the git reset command will wipe out everything. Warning: there is no warning before your changes are wiped out:

git reset --hard 

Used alone (without --hard), git reset will unstage all your changes but leave the changed files in place.

If you have inadvertently lost changes by hard resetting your repository then you might be lucky and get them back using:

git reflog 

This is a log of the last few actions. You might find old commits that have been lost by doing a complex merge or a reset.
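A recovery with the reflog might look like the following sketch. It runs in a throwaway repository with invented file names. Note that only commits can be recovered this way; changes that were never committed are gone after a hard reset.

```shell
#!/bin/sh
# Illustration only: get a commit back after an accidental hard reset.
set -e
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.org"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.org"

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q .

echo keep > keep.txt
git add keep.txt
git commit -q -m 'first commit'
echo precious > precious.txt
git add precious.txt
git commit -q -m 'precious work'
lost=$(git rev-parse HEAD)

# The accident: throw away the last commit.
git reset -q --hard HEAD^
gone=no
if [ ! -e precious.txt ]; then gone=yes; fi

# The reflog still lists where HEAD has been.
git reflog -n 2

# HEAD@{1} is the position of HEAD before the reset.
git reset -q --hard 'HEAD@{1}'
recovered=$(git rev-parse HEAD)
echo "file was gone after the reset: $gone"
echo "recovered commit: $recovered"
```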

Co-operate with other developers

Add other developers' personal repositories as remote repositories

To find other developers' personal repositories you can visit "PKP-lib watchers". Similar pages exist for all PKP project repositories. These users can be added as remotes to your repository where you can pull their latest changes.

Once you have found the right repository you do something like:

cd lib/pkp
git remote add juan \
               git://github.com/jalperin/pkp-lib.git
cd ../..
git remote add juan \
               git://github.com/jalperin/omp.git

This will add a link to your fellow developer's personal repository.

Pull changes from other developers' personal repositories

You can integrate changes from other developers' repositories into your own code line.

To do so both developers should first rebase their repositories onto a common base branch as described above in this document. Usually this is official/master. Otherwise you'll get lots of unintelligible merge conflicts.

To integrate another developer's code into your own code line you then execute:

git pull juan dev

Rather than pulling all changes from the remote repository you can also cherry pick a few commits:

git cherry-pick 3464af     # The cherry-pick command takes a commit pointer which in this case is a part of the commit's sha1

Pulling in or cherry picking changes from others may cause merge conflicts if you are not starting off the same commit point. See 'man git-merge' and 'man git-mergetool' for more info on how to solve merge conflicts.

Play with code other developers provide

If you want to play around with other developers' code without merging it into your branch (e.g. because you just want to have a look at it or test it without developing off it), you'll want to pull their branch into a local temporary branch. Use 'git stash' if you have uncommitted changes in your development branch you want to save. I.e. execute:

git stash save "Temporarily stashing wip code"
git fetch juan          # This is only necessary if you didn't pull or fetch the latest changes before
git checkout -b juan juan/dev
......Play with Juan's code.......
git checkout dev
git branch -d juan      # This deletes the temporary branch you created.
git stash pop

If you don't have any local changes you leave out the stash commands.

Create a patch stack to be sent to non-git developers

The following command assumes that you have tagged the commit preceding the first commit to be included in the patch stack with "patchbase":

git format-patch -N -o /path/to/outdir patchbase
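To see what such a patch stack looks like, you can generate one in a sandbox. This is a minimal sketch: the repository, the commit messages and the output directory are invented, only the 'patchbase' tag and the format-patch call follow the command above.

```shell
#!/bin/sh
# Illustration only: one patch file per commit after the "patchbase" tag.
set -e
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.org"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.org"

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/repo"
cd "$work/repo"
git init -q .

echo base > file.txt
git add file.txt
git commit -q -m 'base'
git tag patchbase                  # the commit preceding the patch stack

echo one >> file.txt
git commit -q -a -m 'first change'
echo two >> file.txt
git commit -q -a -m 'second change'

# format-patch prints the name of every file it writes.
patches=$(git format-patch -N -o "$work/out" patchbase)
count=$(printf '%s\n' "$patches" | wc -l | tr -d ' ')
first_subject=$(grep '^Subject:' "$work"/out/0001-*.patch)

printf '%s\n' "$patches"
echo "patch files: $count"
echo "$first_subject"
```

With -N the subject lines read '[PATCH]' rather than '[PATCH 1/2]'; the file names still carry a running number so that the recipient can apply them in order.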

Porting Changes

Use the following commands to port all changes from a maintenance branch to the development branch:

git checkout master
git pull
for cherry in `git cherry master official/ojs_2_3_2_rc | sed -n '/^+/ s/^+ //p'`; do git cherry-pick -x $cherry; done
gitk  # (optional) just to check if everything went alright
git push
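What 'git cherry' reports, and why the loop only picks lines that start with '+', can be checked in a sandbox. The sketch below makes assumptions throughout: a local branch 'maint' stands in for official/ojs_2_3_2_rc, the commits are invented, and one of them is ported up front to show how already ported changes are skipped.

```shell
#!/bin/sh
# Illustration only: port the commits that are still missing on master.
set -e
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.org"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.org"

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q .
git symbolic-ref HEAD refs/heads/master

echo base > base.txt
git add base.txt
git commit -q -m 'base'

# Hypothetical maintenance branch with two fixes.
git checkout -q -b maint
echo a > a.txt
git add a.txt
git commit -q -m '*1* fix a'
echo b > b.txt
git add b.txt
git commit -q -m '*2* fix b'

# Fix a has already been ported to master at some earlier point.
git checkout -q master
git cherry-pick -x 'maint^' >/dev/null

# "-" marks commits with an equivalent on master, "+" marks missing ones.
before=$(git cherry master maint)
printf '%s\n' "$before"

for cherry in $(git cherry master maint | sed -n '/^+/ s/^+ //p'); do
    git cherry-pick -x "$cherry" >/dev/null
done

missing_after=$(git cherry master maint | grep -c '^+' || true)
echo "commits still missing on master: $missing_after"
```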