Setting Up Git to Use Your Diff Viewer or Editor of Choice

Git offers two ways of viewing differences between commits, or between commits and your working tree: diff and difftool.
The first of these, by default, dumps the results to the standard output.
This mode of presentation is great for quick summaries of small sets of changes, but it becomes cumbersome when there are a large number of changes between the two commits being compared, or when you want to closely examine the changes: browsing back and forth between different files/lines, searching for specific text, folding away or hiding unchanged lines, and so on.
In these cases, you would like to use an external or third-party diff program/viewer to review the differences, and Git offers two ways to allow for this.

The Less-Than-Ideal Approach

You can set a configuration variable to send the results to a third-party diff program by adding the following line to your “~/.gitconfig“:

[diff]
    external = <path-to-wrapper-script>

where “<path-to-wrapper-script>” is a script that invokes your diff program/viewer of choice.
The reason you need to use a wrapper script rather than the external program directly is that Git calls the specified program with seven arguments, in the following order: “path“, “old-file“, “old-hex“, “old-mode“, “new-file“, “new-hex“, “new-mode“.
So, depending on your diff program, you would need to filter out unneeded/unused arguments, or add switches/flags as appropriate.
For example, if you want to use Vim, the wrapper script may look something like:

#! /bin/sh
vimdiff "$2" "$5"

This approach is less than ideal, however, at least for me, because once you have configured Git this way, all invocations of “git diff” will launch the external application.
And the fact is that there are many times (the majority, in my case) where this is simply overkill and the short summary on standard output does just fine.
You can, of course, still get the native Git standard-output diff dump even with the external diff program configured as above by passing in an appropriate flag (“--no-ext-diff”), but, trust me, this is a bit of a pain.

The Ideal Approach

Git, fortunately, offers a second approach: difftool.
This is essentially a wrapper around diff: it takes the same arguments and options, but by default hands the files off to an external diff program/viewer.
This approach thus allows you to retain “git diff” for standard-output reviews of changes, while unleashing the power of a more sophisticated diff program/viewer for more extended/flexible/complex reviews by invoking “git difftool“ (for example, “git difftool HEAD~1” opens the configured viewer on the changes since the commit before the current one).

Git offers a range of difftools pre-configured “out-of-the-box” (kdiff3, kompare, tkdiff, meld, xxdiff, emerge, vimdiff, gvimdiff, ecmerge, diffuse, opendiff, p4merge and araxis), and also allows you to specify your own.
To use one of the pre-configured difftools (for example, “vimdiff”), you add the following lines to your “~/.gitconfig“:

[diff]
    tool = vimdiff

Specifying your own difftool, on the other hand, takes a little bit more work, but is still pretty simple … IF you know how to do it.

I did not. And, unfortunately, the documentation did not help me very much.
It took quite a bit of Googling and experimentation before I figured it out.

You basically need to add the following lines to your “~/.gitconfig“:

[diff]
    tool = default-difftool

[difftool "default-difftool"]
    cmd = <path-to-wrapper-script> $LOCAL $REMOTE

You can, of course, replace “default-difftool” with anything you care to name your preferred difftool, and “<path-to-wrapper-script>” with whatever you end up calling your wrapper script.

My difftool of choice is Vim, and, while “vimdiff” is a pre-configured option, I did not want to use it because I wanted the flexibility to invoke MacVim when I am using my laptop but console Vim when I am working remotely on a Linux box (my Git configuration, and for that matter, my entire work environment from the shell to Vim to what-have-you, all 37MB of it, is shared across multiple machines … and all managed/synced using Git, of course).
So my wrapper script looks like the following:

#! /bin/bash

if [[ -f /Applications/ ]]; then
    # bypass mvim for speed
    VIMPATH='/Applications/ -g -dO -f'
elif [[ -f /usr/local/bin/mvim ]]; then
    # fall back to mvim
    VIMPATH='mvim -d -f'
else
    # fall back to original vim
    VIMPATH='vimdiff'
fi

# open the two files that "git difftool" passes in ($LOCAL and $REMOTE)
$VIMPATH "$1" "$2"


And that’s all there is to it!

One Last Tweak

I find it very annoying to have to confirm the prompt before moving on to the next file. The following lines added to your “~/.gitconfig” put a stop to that:

[difftool]
    prompt = false

Using DendroPy Interoperability Modules to Download, Align, and Estimate a Tree from GenBank Sequences

The following example shows how easy it can be to use the three interoperability modules provided by the DendroPy Phylogenetic Computing Library to download nucleotide sequences from GenBank, align them using MUSCLE, and estimate a maximum-likelihood tree using RAxML. The automatic label composition option of the DendroPy genbank module creates practical taxon labels out of the original data. We also pass in additional arguments to RAxML to request that the tree search be carried out 250 times (['-N', '250']).
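
A rough sketch of what such a pipeline looks like is below. The class and function names in the genbank, muscle, and raxml interop modules, and their arguments, are recalled from memory here and may differ between DendroPy versions, so treat this as a guide rather than copy-and-paste code.

#! /usr/bin/env python

# Sketch only: the interop names and signatures below are assumptions and
# may not exactly match the installed DendroPy version.
from dendropy.interop import genbank
from dendropy.interop import muscle
from dendropy.interop import raxml

# fetch the nucleotide sequences from GenBank, composing taxon labels
# from the accession number and organism name
gb = genbank.GenBankDna(ids=["AF0983%02d" % i for i in range(32, 60)])
data = gb.generate_char_matrix(
        label_components=["accession", "organism"],
        label_component_separator="_")

# align the sequences using MUSCLE
aligned = muscle.muscle_align(data)

# estimate a maximum-likelihood tree with RAxML, running the search 250 times
rr = raxml.RaxmlRunner()
tree = rr.estimate_tree(aligned, raxml_args=["-N", "250"])
print(tree.as_string("newick"))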

(The tree below is shown just as an example of the output; some errant taxa placement is evident).

                                           /--- AF098348_Trictena_atripalpis
|                                          \--- AF098349_Trictena_sp.
|                                          /--- AF098335_Aoraia_lenis
|                                      /---+
|      /-------------------------------+   \--- AF098336_Aoraia_rufivena
|      |                               |
|      |                               \------- AF098334_Aoraia_enysii
|      |
|  /---+  /------------------------------------ AF098347_Oxycanus_sphragidias
|  |   |  |
|  |   |  |                            /------- AF098346_Oxycanus_antipoda
|  |   |  |                         /--+
|  |   |  |                         |  |   /--- AF098345_Oxycanus_dirempta
|  |   \--+   /---------------------+  \---+
|  |      |   |                     |      \--- AF098344_Oxycanus_australis
|  |      |   |                     |
|  |      |   |                     \---------- AF098343_Jeana_robiginosa
|  |      |   |
|  |      |   |                     /---------- AF098337_Cladoxycanus_minos
|  |      \---+                     |
|  |          |   /-----------------+      /--- AF098338_Dioxycanus_fuscus
|  |          |   |                 |  /---+
|  |          |   |                 \--+   \--- AF098339_Dioxycanus_oreas
|  |          |   |                    |
|  |          |   |                    \------- AF098342_Heloxycanus_patricki
+  |          \---+
|  |              |  /------------------------- AF098340_Dumbletonius_characterifer
|  |              |  |
|  |              |  |   /--------------------- AF098351_Wiseana_cervinata
|--+              |  |   |
|  |              \--+   |             /------- AF098355_Wiseana_jocosa
|  |                 |   |          /--+
|  |                 |   |          |  |   /--- AF098356_Wiseana_mimica
|  |                 |   |          |  \---+
|  |                 \---+      /---+      \--- AF098354_Wiseana_fuliginea
|  |                     |      |   |
|  |                     |      |   |  /------- AF098359_Wiseana_umbraculata
|  |                     |      |   \--+
|  |                     |  /---+      |   /--- AF098357_Wiseana_signata
|  |                     |  |   |      \---+
|  |                     |  |   |          \--- AF098358_Wiseana_signata
|  |                     \--+   |
|  |                        |   \-------------- AF098350_Wiseana_cervinata
|  |                        |
|  |                        |          /------- AF098341_Dumbletonius_unimaculatus
|  |                        \----------+
|  |                                   |   /--- AF098353_Wiseana_copularis
|  |                                   \---+
|  |                                       \--- AF098352_Wiseana_copularis
|  |
|  \------------------------------------------- AF098333_Aenetus_virescens
\---------------------------------------------- AF098332_Fraus_simulans

Vim Regular Expression Special Characters: To Escape or Not To Escape

Vim’s regular expression dialect is distinct from many of the other more popular ones out there today (and actually predates them).
One of the dialect differences that always leaves me fumbling has to do with which special characters need to be escaped.
Vim does have a special “very magic” mode (activated by “\v” in the regular expression) that makes things very clean and simple in this regard: only letters, numbers, and underscores are treated as literals without escaping, so you can write, for example, “\v(foo|bar)+” instead of “\(foo\|bar\)\+”.
But I have never gotten into the habit of preceding my expressions with “\v“, though maybe I should.

In the meantime, however, I thought I would put up a quick reference that lists all the special regular expression characters in the default “magic” mode Vim dialect, divided into those that do not need to be escaped vs. those that do.

Regular Expression Special Characters Not Requiring Escaping

The following special characters are interpreted as regular expression operators without escaping (escaping will result in them being interpreted as literals):

\ Escape next character (use “\\” for literal backslash).
^ Start-of-line (at start of pattern).
$ End-of-line.
. Matches any character.
* Matches 0 or more occurrences of the previous atom.
~ Matches last given substitute string.
[] Matches any of the characters given within the brackets.
[^] Matches any character not given within the brackets.
& In replacement pattern: insert the whole matched search pattern.

Regular Expression Special Characters Requiring Escaping

The following special characters are interpreted as regular expression operators only when escaped (otherwise they will be interpreted as literals):

\< Matches beginning of a word (left word break/boundary).
\> Matches end of a word (right word break/boundary).
\(...\) Grouping into an atom.
\| Separating alternatives.
\_. Matches any single character or end-of-line.
\+ 1 or more of the previous atom (greedy).
\= 0 or one of the previous atom (greedy).
\? 0 or one of the previous atom (greedy).
\{ Multi-item count match specification (greedy).

\{n,m} n to m occurrences of the preceding atom (as many as possible).
\{n} Exactly n occurrences of the preceding atom.
\{n,} At least n occurrences of the preceding atom (as many as possible).
\{,m} 0 to m occurrences of the preceding atom (as many as possible).
\{} 0 or more occurrences of the preceding atom (as many as possible).
\{- Multi-item count match specification (non-greedy).

\{-n,m} n to m occurrences of the preceding atom (as few as possible).
\{-n} Exactly n occurrences of the preceding atom.
\{-n,} At least n occurrences of the preceding atom (as few as possible).
\{-,m} 0 to m occurrences of the preceding atom (as few as possible).
\{-} 0 or more occurrences of the preceding atom (as few as possible).

Unconditionally Accepting All Merging-In Changes During a Git Merge

Merge conflicts suck. It is not uncommon, however, to know in advance that you simply want to accept all the changes from the branch that you are merging in, which makes things a lot simpler conceptually. The Git documentation suggests that this can be procedurally simple as well, as it mentions the “-s theirs” merge strategy, which does just that, i.e., unconditionally accepts everything from the branch being merged in:

$ git merge -s theirs <branch-to-merge>

Unfortunately, however, running the above command results in an error message along the lines of “‘theirs’ is not a known strategy”. This is because, as discussed here (original reference here), this option has been removed from Git. I am sure the reasons for this are sound. But it is too bad that the documentation (as of my current installation, 1.7.7) has not been updated to reflect these changes. Really too bad. Really, really, really, really too bad. Because it takes a potentially dangerous, always stressful, and sometimes frustrating experience and makes it all the worse due to outright false documentation. Still, as an open source developer myself, I recognize that the time, energy, and effort demands of maintaining open source software have to be fitted in between the demands of the other aspects of life, and that there consequently is sometimes considerable lag in updating what usually receives the lowest priority: the documentation.

In any case, luckily the page referred to previously provides some nice solutions for achieving the same effect as “-s theirs”, which I am summarizing here for my own reference.

Method #1

git merge -s ours ref-to-be-merged
git diff --binary ref-to-be-merged | git apply -R --index
git commit -F .git/COMMIT_EDITMSG --amend

Method #2

git checkout MINE
git merge --no-commit -s ours HERS
git rm -rf .
git checkout HERS -- .
git checkout MINE -- debian # or whatever, as appropriate

Method #3

# get the contents of another branch
git read-tree -u --reset <other-branch>
# selectively merge subdirectories
# e.g. supersede upstream source with that from another branch
git merge -s ours --no-commit other_upstream
git read-tree --reset -u other_upstream     # or use --prefix=foo/
git checkout HEAD -- debian/
git checkout HEAD -- .gitignore
git commit -m 'supersede upstream source' -a

Of these, I tried the first method, and it worked like a charm. YMMV.

The Power and Precision of Vim’s Text Objects: Efficient, Elegant, Awesome.

Vim’s text objects are not only a powerful, flexible and precise way to specify a region of text, but also intuitive and efficient.
They can be used with any command that can be combined with a motion (e.g., “d“, “y“, “v“), but in this post I will be using the “c” command (“change”) to illustrate them.

Imagine you were on a line that looked like this, with the cursor on the letter “r” of the word “dry”:

print “Enter run mode (‘test’, ‘dry’, or ‘full’)”

Then, after typing “c” to start the “change” command, you can type “i” or “a” followed by another character to define a text object to which to apply the “change”.

Using “i” gives you the “inner” text object (i.e., a less-inclusive or non-greedy version that excludes the wrapping characters or delimiters) while “a” gives the full object (i.e., a more-inclusive or greedy version, that includes the wrapping characters or delimiters).

The third and final character of the command sequence gives the criteria by which the text object is defined.

This is typically the initial letter of the name of the object (e.g., “w” for “word”, “s” for “sentence”) or a character that delimits/wraps a region of text (e.g. a parenthesis for text in parentheses, a quote for quoted text, a curly brace for text in curly-braces).

For example:

  • Typing “ci'” will delete the word [dry], not including the quotes, and enter insert mode:


    print “Enter run mode (‘test’, ‘dry’, or ‘full’)”


    print “Enter run mode (‘test’, ‘ ‘, or ‘full’)”
  • Typing “ca'” will delete the word [‘dry’], including the quotes, and enter insert mode:


    print “Enter run mode (‘test’, ‘dry’, or ‘full’)”


    print “Enter run mode (‘test’,  , or ‘full’)”
  • Typing “ci(” or “ci)” will delete everything inside the parentheses, but not the parentheses themselves, and enter insert mode:


    print “Enter run mode (‘test’, ‘dry’, or ‘full’)”


    print “Enter run mode ( )”
  • Typing “ca(” or “ca)” will delete everything inside the parentheses as well as the parentheses themselves, and then enter insert mode:


    print “Enter run mode (‘test’, ‘dry’, or ‘full’)”


    print “Enter run mode ”
  • Typing “ci"” will delete everything inside the double-quotes, but not the double-quotes themselves, and enter insert mode:


    print “Enter run mode (‘test’, ‘dry’, or ‘full’)”


    print “”
  • Typing “ca"” will delete everything inside the double-quotes, as well as the double-quotes themselves, and enter insert mode:


    print “Enter run mode (‘test’, ‘dry’, or ‘full’)”


    print
The idiom is naturally extended to many types of different delimiting characters in the most intuitive way possible (i.e., simply by specifying the delimiting character or first character of the name of the text object):

  • “i{” or “i}” selects curly-brace surrounded text, with the greedier versions being “a{” and “a}”, respectively.
  • “i[” or “i]” selects square-bracket surrounded text, with the greedier versions being “a[” and “a]”, respectively.
  • “i'” selects single-quote surrounded text, with the greedier version being “a'”.
  • “it” selects the text within the surrounding HTML/XML tag or container, with the greedier version being “at”.
  • “iw” selects the surrounding word, with the greedier version being “aw”.
  • “is” selects the surrounding sentence, with the greedier version being “as”.
  • “ip” selects the surrounding paragraph, with the greedier version being “ap”.
  • etc.

Once you start using Vim’s text objects, there really is no going back.
The power and precision they provide to specify regions of text to pass on to other Vim commands makes for an extremely efficient and elegant modus operandi, for which there simply is no equivalent in any other text editor.

For more information, please refer to the Vim documentation, either online, or by typing “:help text-objects” inside Vim.

Supplementary Command-History Logging in Bash: Tracking Working Directory, Dates, Times, etc.


Here is a way to create a secondary shell history log (i.e., one that supplements the primary “~/.bash_history“) that tracks a range of other information, such as the working directory, hostname, time and date, etc. Using the “HISTTIMEFORMAT” variable (e.g., export HISTTIMEFORMAT="[%F %T] "), it is in fact possible to store the time and date with the primary history, but storing the other information is not as readily doable. Here, I present an approach based on this excellent post on StackOverflow.

The main differences between this approach and the original are:

  • I remove the option to log the extra information to the primary history file: I prefer to keep this history clean.
  • I add history number, host name, time/date stamp etc. to the supplementary history log by default.
  • I add field separators, making it easy to apply ‘awk‘ commands.

The (Supplementary) History Logger Function

First, add or source the following to your “~/.bashrc“:


Activating the Logger

Then you need this function to execute every time the prompt is displayed, which is done by adding it to your “$PROMPT_COMMAND” variable; that is, you need the following entry in your “~/.bashrc“:

    export PROMPT_COMMAND='_loghistory'

There are a number of options that the logging function takes, including adding terminal information, adding arbitrary text, or executing a function (or functions) that generates appropriate text. See the function documentation for more info.

Add Some Useful Aliases

Add the following to your “~/.bashrc“:

# dump regular history log
alias h='history'
# dump enhanced history log
alias hh="cat $HOME/.bash_log"
# dump history of directories visited
alias histdirs="cat $HOME/.bash_log | awk -F ' ~~~ ' '{print \$2}' | uniq"

Check Out the Results!

The ‘histdirs‘ command is very useful for quickly listing, selecting (via copy and paste), and jumping back to a directory.

$ h
14095  [2011-11-23 15:36:20] ~~~ jd nuim
14096  [2011-11-23 15:36:21] ~~~ ll
14097  [2011-11-23 15:36:23] ~~~ git status
14098  [2011-11-23 15:36:33] ~~~ jd pytb
14099  [2011-11-23 15:36:36] ~~~ git status
14100  [2011-11-23 15:36:53] ~~~ git rm --cached config/*
14101  [2011-11-23 15:37:00] ~~~ git pull
14102  [2011-11-23 15:37:11] ~~~ e .gitignore
14103  [2011-11-23 15:37:28] ~~~ git status
14104  [2011-11-23 15:37:35] ~~~ e .gitignore
14105  [2011-11-23 15:37:44] ~~~ git status
14106  [2011-11-23 15:38:10] ~~~ git commit -a -m "stuff"
14107  [2011-11-23 15:38:12] ~~~ git pushall
14108  [2011-11-23 15:50:38] ~~~ ll build_c/
14109  [2011-11-23 15:53:16] ~~~ cd
14110  [2011-11-23 15:53:18] ~~~ ls -l
14111  [2011-11-23 16:00:12] ~~~ cd Documents/Projects/Phyloinformatics/DendroPy/dendropy
14112  [2011-11-23 16:00:15] ~~~ ls -l
14113  [2011-11-23 16:00:22] ~~~ cd dendropy/
14114  [2011-11-23 16:00:24] ~~~ vim *.py

$ hh
[2011-11-23 15:36:20] ~~~ /Users/jeet ~~~ jd nuim
[2011-11-23 15:36:21] ~~~ /Users/jeet/Documents/Projects/Phyloinformatics/nuim ~~~ ll
[2011-11-23 15:36:23] ~~~ /Users/jeet/Documents/Projects/Phyloinformatics/nuim ~~~ git status
[2011-11-23 15:36:33] ~~~ /Users/jeet/Documents/Projects/Phyloinformatics/nuim ~~~ jd pytb
[2011-11-23 15:36:36] ~~~ /Users/jeet/Documents/Projects/Phyloinformatics/pytbeaglehon ~~~ git status
[2011-11-23 15:36:53] ~~~ /Users/jeet/Documents/Projects/Phyloinformatics/pytbeaglehon ~~~ git rm --cached config/*
[2011-11-23 15:37:00] ~~~ /Users/jeet/Documents/Projects/Phyloinformatics/pytbeaglehon ~~~ git pull
[2011-11-23 15:37:11] ~~~ /Users/jeet/Documents/Projects/Phyloinformatics/pytbeaglehon ~~~ e .gitignore
[2011-11-23 15:37:28] ~~~ /Users/jeet/Documents/Projects/Phyloinformatics/pytbeaglehon ~~~ git status
[2011-11-23 15:37:35] ~~~ /Users/jeet/Documents/Projects/Phyloinformatics/pytbeaglehon ~~~ e .gitignore
[2011-11-23 15:37:44] ~~~ /Users/jeet/Documents/Projects/Phyloinformatics/pytbeaglehon ~~~ git status
[2011-11-23 15:38:10] ~~~ /Users/jeet/Documents/Projects/Phyloinformatics/pytbeaglehon ~~~ git commit -a -m "stuff"
[2011-11-23 15:38:12] ~~~ /Users/jeet/Documents/Projects/Phyloinformatics/pytbeaglehon ~~~ git pushall
[2011-11-23 15:50:38] ~~~ /Users/jeet/Documents/Projects/Phyloinformatics/pytbeaglehon ~~~ ll build_c/
[2011-11-23 15:53:16] ~~~ /Users/jeet/Documents/Projects/Phyloinformatics/pytbeaglehon ~~~ cd
[2011-11-23 15:53:18] ~~~ /Users/jeet ~~~ ls -l
[2011-11-23 16:00:12] ~~~ /Users/jeet ~~~ cd Documents/Projects/Phyloinformatics/DendroPy/dendropy
[2011-11-23 16:00:15] ~~~ /Users/jeet/Documents/Projects/Phyloinformatics/DendroPy/dendropy ~~~ ls -l
[2011-11-23 16:00:22] ~~~ /Users/jeet/Documents/Projects/Phyloinformatics/DendroPy/dendropy ~~~ cd dendropy/
[2011-11-23 16:00:24] ~~~ /Users/jeet/Documents/Projects/Phyloinformatics/DendroPy/dendropy/dendropy ~~~ vim *.py

$ histdirs


Stripping Paths from Files in TAR Archives

There is no way to get tar to ignore directory paths of files that it is archiving. So, for example, if you have a large number of files scattered about in subdirectories, there is no way to tell tar to archive all the files while ignoring their subdirectories, such that when unpacking the archive you extract all the files to the same location. You can, however, tell tar to strip a fixed number of elements from the full (relative) path to the file when extracting using the “--strip-components” option. For example:

tar --strip-components=2 -xvf archive.tar.gz

This will strip the first two elements from the paths of all the archived files. To get an idea of what this will look like before extracting, you can use the “-t” (list) option in conjunction with the “--show-transformed” option:

tar --strip-components=2 -t --show-transformed -f archive.tar.gz

The “--strip-components” approach only works if all the files that you are extracting are at the same relative depth. Files that are “shallower” will not be extracted, while files that are deeper will still be extracted into sub-directories. The only clean solution to this that I can think of would be to extract all the files to a temporary location and then move them all to a single directory:

mkdir /tmp/work
cd /tmp/work
tar -xvzf /path/to/archive.tar.gz
mkdir collected
find . -type f -exec mv {} collected/ \;

Pure-Python Implementation of Fisher’s Exact Test for a 2×2 Contingency Table

While Python comes with many “batteries included”, many others are missing. Luckily, thanks to the generosity and hard work of various members of the Python community, there are a number of third-party implementations to fill in this gap. For example, Fisher’s exact test is not part of the standard library.

But a nice, clean third-party solution can be found here. And, of course, SciPy has one as well. However, these are “mostly” Python implementations of this test, and I could not find a pure-Python one. I thought this would be useful and interesting in some contexts, and so I decided to code it up myself.

The final product was incorporated into the DendroPy phylogenetic computing library, but I have abstracted the code into a self-contained module and present it here (under the BSD license, as is DendroPy), in case others might find it useful.
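
As a rough illustration of the underlying idea (this is a compact sketch written for this summary, not the DendroPy module itself; it assumes Python 2.7+ for math.lgamma), a two-tailed test can be computed directly from the hypergeometric probabilities of the 2×2 table, summing the probabilities of all tables that are as extreme as, or more extreme than, the observed one:

import math

def _log_choose(n, k):
    # log of the binomial coefficient C(n, k)
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def fisher_exact_two_tailed(a, b, c, d):
    """
    Two-tailed Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    log_denom = _log_choose(n, col1)
    def log_prob(x):
        # probability of the table with x in the top-left cell, margins fixed
        return _log_choose(row1, x) + _log_choose(row2, col1 - x) - log_denom
    observed = log_prob(a)
    p = 0.0
    for x in range(max(0, col1 - row2), min(col1, row1) + 1):
        lp = log_prob(x)
        if lp <= observed + 1e-12:   # as extreme as, or more extreme than, observed
            p += math.exp(lp)
    return min(p, 1.0)

# example: the table [[8, 2], [1, 5]]
print(fisher_exact_two_tailed(8, 2, 1, 5))   # approximately 0.035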


Piping Output Over a Secure Shell (SSH) Connection

We all know about using scp to transfer files over a secure shell connection.
It works fine, but there are many cases where alternate modes of usage are required: for example, when you want to transfer the output of one program directly to a file on a remote machine.
Here are some ways of going about doing this.

Let "$PROG" be a program that writes data to the standard output stream.

  • Transfering without compression:
    $PROG | ssh destination.ip.address 'cat > ~/file.txt'
  • Using gzip for compression:
    $PROG | gzip -f | ssh destination.ip.address 'gunzip > ~/file.txt'
  • Better compression can usually be achieved with bzip2:
    $PROG | bzip2  | ssh destination.ip.address 'bunzip2 > ~/Scratch/file.txt'

I find this useful enough to source the following function into all my shells:

## xof #######################################################################
# Pipe from standard input to remote file.
xof() {
    if [[ -z $1 || -z $2 ]]; then
        echo usage: '<prog> | xof <remote-host> <remote-filepath>'
        echo Pipe standard input to remote file.
    else
        bzip2 | ssh $1 'bunzip2 > '$2
    fi
}
And when I want to get fancy, I pipe the output directly to my favorite text editor, BBEdit:

## xobb #######################################################################
# Pipe from standard input to bbedit.
xobb() {
    if [[ -z $1 ]]; then
        echo usage: '<prog> | xobb <remote-host>'
        echo Pipe standard input to BBEdit.
    else
        bzip2 | ssh $1 'bunzip2 | bbedit'
    fi
}
On a tangential note, if you have a large number of files that you want to transfer, the following is more efficient than separately tar-ing and scp-ing:

tar cf - *.t | ssh destination.ip.address "tar xf - -C /home/jeet/projects/bbi2"

Parse Python Stack Trace and Open Selected Source References for Editing in OS X

UPDATE Nov 7, 2009: Better parsing of traceback.

UPDATE Nov 4, 2009: Now passing a "-b" flag to the script opens the parsed stack frame references in a BBEdit results browser, inspired by an AppleScript script by Marc Liyanage.

When things go wrong in a Python script, the interpreter dumps a stack trace, which looks something like this:

$ python
Calling f1 ...
Traceback (most recent call last):
  File "", line 6, in 
  File "/Users/jeet/Scratch/snippets/", line 15, in f3
  File "/Users/jeet/Scratch/snippets/", line 11, in f2
  File "/Users/jeet/Scratch/snippets/", line 7, in f1
    print "Hello, %s" % value
NameError: global name 'value' is not defined

I got tired of hunting through the stack trace for the line number, and then returning to my text editor to find the file and then navigating down to the line number (yes, I can be that lazy). So I wrote the following script.

The following Python script searches for text that resembles a Python stack trace in the current history of the OS X Terminal application and parses it into its components (source file, line number, statement). By default, it opens the source file associated with the most recent stack frame for editing at the appropriate line. By passing the flag "-a", it opens all the source files referenced throughout the trace at the appropriate lines. Alternatively, specific frames/files can be selected by specifying the index at the command line. You can also simply list the parsed stack trace ("-l") or enter an interactive mode, where typing an index opens the associated file for editing at the correct location. The option "-c" displays the stack frame list in color, while the "-r" option restricts the results to only the most recent traceback. BBEdit users will appreciate the "-b" option, which opens a BBEdit results browser on all the parsed stack frames.

#! /usr/bin/env python

import subprocess
import StringIO
import re
import os
import sys

from optparse import OptionGroup
from optparse import OptionParser

class StackDesc(object):
    def __init__(self, items):
        self.filepath = os.path.abspath(items[0])
        self.line_num = int(items[1])
        if len(items) >= 4 and items[3] is not None:
            self.object_name = items[3]
        else:
            self.object_name = ""
        self.statement = None

def scrape_terminal_history():
    script = """\
/usr/bin/osascript -s o -e   '
tell application "Terminal"
    tell front window
        return (the history of selected tab)
    end tell
end tell
'
"""
    p = subprocess.Popen(script, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = p.communicate()
    return stdout, stderr

def scrape_traceback_lines(most_recent_only=False):
    stdout, stderr = scrape_terminal_history()
    stdout = stdout.split("\n")[::-1]
    if most_recent_only:
        for idx, f in enumerate(stdout):
            if f.startswith("Traceback (most recent call last)"):
                tb = (stdout[0:idx])[::-1]
                return tb
        return []
    return stdout[::-1]

def parse_traceback(traceback_lines):
    pattern = re.compile("""^\s+File \"(.*)\", line (\d+)(, in (.*))*\s*""")
    if traceback_lines is None or len(traceback_lines) == 0:
        return [], ""
    stack_descs = []
    error_str = None
    prev_line_is_stack_frame = False
    for tb in traceback_lines:
        m = pattern.match(tb)
        if m is not None:
            items = m.groups()
            stack_descs.append(StackDesc(items))
            prev_line_is_stack_frame = True
        elif prev_line_is_stack_frame:
            stack_descs[-1].statement = tb.strip()
            prev_line_is_stack_frame = False
        elif error_str is None and len(stack_descs) > 0:
            tbs = tb.strip()
            if tbs and tbs != "^":
                error_str = tbs
            elif tbs == "^":
                error_str = tb
            prev_line_is_stack_frame = False
        elif (error_str is not None and error_str.strip() == "^"):
            error_str = error_str + "\n" + tb.strip()
            prev_line_is_stack_frame = False
    return stack_descs, error_str

def get_traceback_stack(most_recent_only=False):
    traceback_lines = scrape_traceback_lines(most_recent_only)
    stack_descs, error_str = parse_traceback(traceback_lines)
    return stack_descs, error_str

def compose_ansi_color(code):
    return chr(27) + '[' + code + 'm'

def display_stack_descs(stack_descs, error_str, color):
    if color:
        title_color = compose_ansi_color("1;91")
        error_color = compose_ansi_color("1;91")
        index_color = compose_ansi_color("0;37;40")
        filepath_color = compose_ansi_color("1;94")
        line_num_color = compose_ansi_color("1;94")
        obj_color = compose_ansi_color("1;34")
        statement_color = compose_ansi_color("0;31")
        clear_color = compose_ansi_color("0")
    else:
        title_color = ""
        error_color = ""
        index_color = ""
        filepath_color = ""
        line_num_color = ""
        obj_color = ""
        statement_color = ""
        clear_color = ""
    sys.stdout.write("%sPython stack trace (most recent call last):%s\n" % (title_color, clear_color))
    for sdi, sd in enumerate(stack_descs):
        if sd.object_name:
            obj_name = ": %s%s%s" % (obj_color, sd.object_name, clear_color)
        else:
            obj_name = ""
        sys.stdout.write("%s% 3d %s: %s%s%s [%s%d%s]%s\n" % (
                index_color, sdi, clear_color,
                filepath_color, sd.filepath, clear_color,
                line_num_color, sd.line_num, clear_color,
                obj_name))
        if sd.statement:
            sys.stdout.write("      >>> %s%s%s\n" % (statement_color, sd.statement, clear_color))
    if error_str:
        sys.stdout.write("%s%s%s\n" % (error_color, error_str, clear_color))

def bbedit_browse(stack_descs, error_str):
    # NOTE: the AppleScript body of this heredoc, and the option-parsing
    # driver code that originally followed this function, are not shown
    # here; names such as 'opts', 'args', 'idx', and 'editor_invocation'
    # in the surviving fragment below are defined in that driver code.
    script_template = """
/usr/bin/osascript 2>/dev/null <<EOF
EOF
"""

            if idx < 0 or idx >= len(stack_descs):
                sys.stderr.write("Index %d is out of bounds [0, %d]" % (idx, len(stack_descs)-1))
                idx = None
            if idx is not None:
                sd = stack_descs[idx]
                command = "%s +%d %s" % (editor_invocation, sd.line_num, sd.filepath)
                edp = subprocess.Popen(command, shell=True)
                idx = None
        indexes = []
        if opts.open_all_files:
            indexes = range(0, len(stack_descs)-1)
        elif len(args) == 0:
            indexes = [-1]
        else:
            for i in args:
                if i.startswith("^"):
                    i = -1 * int(i[1:])
                else:
                    i = int(i)
                if i > len(stack_descs):
                    sys.exit("Index %d is out of bounds [0, %d]" % (i, len(stack_descs)-1))
                indexes.append(i)
        editor_args = []
        for i in indexes:
            editor_args.append("+%d %s" % (stack_descs[i].line_num, stack_descs[i].filepath))
        editor_args = " ".join(editor_args)
        command = "%s %s" % (editor_invocation, editor_args)
        edp = subprocess.Popen(command, shell=True)
if __name__ == '__main__':

Neat Bash Trick: Open Last Command for Editing in the Default Editor and then Execute on Saving/Exiting

This is pretty slick: enter “fc” in the shell and your last command opens up for editing in your default editor (as given by “$EDITOR“). Works perfectly with vi. The “$EDITOR” variable approach does not seem to work with BBEdit, though, and you have to use:

$ fc -e '/usr/bin/bbedit --wait'

With vi, “:cq” aborts execution of the command.

Most Pythonique, Efficient, Compact, and Elegant Way to Do This

Given a list of strings, how would you interpolate a multi-character string in front of each element?

For example, given:

    >>> k = ['the quick', 'brown fox', 'jumps over', 'the lazy', 'dog']

The objective is to get:

    ['-c', 'the quick', '-c', 'brown fox', '-c', 'jumps over', '-c', 'the lazy', '-c', 'dog']

Of course, the naive solution would be to compose a new list by iterating over the original list:

    >>> result = []
    >>> for i in k:
    ...     result.append('-c')
    ...     result.append(i)

Alternatively, you could take advantage of the itertools module:

    >>> import itertools
    >>> result = list(itertools.chain.from_iterable(itertools.product(['-c'], k)))

This second form is certainly more compact, and probably more efficient … but perhaps lacks some of the clarity and elegance one might expect from Python.

Is there another solution?
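
One candidate, offered here as a suggestion, is a nested list comprehension, which keeps everything in a single, fairly readable expression:

    >>> result = [x for item in k for x in ('-c', item)]
    >>> result
    ['-c', 'the quick', '-c', 'brown fox', '-c', 'jumps over', '-c', 'the lazy', '-c', 'dog']

It avoids both the explicit loop and the itertools import, though whether it is clearer than either of the above is, of course, a matter of taste.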

Molecular Sequence Generation with DendroPy

The DendroPy Phylogenetic Computing Library includes native infrastructure for phylogenetic sequence simulation on DendroPy trees under the HKY model. Being pure-Python, however, it is a little slow. If Seq-Gen is installed on your system, though, you can take advantage of a lightweight Seq-Gen wrapper added to the latest revision under the interop subpackage: dendropy.interop.seqgen. Documentation is lagging, but the following examples should be enough to get started, and the class is simple and straightforward enough so that all options should be pretty much self-documented. With either the native or Seq-Gen wrapper methods, you can easily generate alignments from within DendroPy, facilitating simulation and analytical pipelines.

#! /usr/bin/env python

import dendropy
from dendropy.interop import seqgen

trees = dendropy.TreeList.get_from_path("trees.nex", "nexus")
s = seqgen.SeqGen()

# generate one alignment per tree
# as substitution model is not specified, defaults to a JC model
# will result in a DataSet object with one DnaCharacterMatrix per input tree
d0 = s.generate(trees)
print len(d0.char_matrices)

# instruct Seq-Gen to scale branch lengths by factor of 0.1
# note that this does not modify the input trees
s.scale_branch_lens = 0.1

# more complex model
s.char_model = seqgen.SeqGen.GTR
s.state_freqs = [0.4, 0.4, 0.1, 0.1]
s.general_rates = [0.8, 0.4, 0.4, 0.2, 0.2, 0.1]
d1 = s.generate(trees)
print len(d1.char_matrices)

Managing and Munging Line Endings in Vim

If you have opened a file and see a bunch of “^M” or “^J” characters in it, chances are that for some reason Vim is confused as to the line-ending type.
You can force it to interpret the file with a specific line-ending by using the “++ff” argument and asking Vim to re-read the file using the “:e” command:

:e ++ff=unix
:e ++ff=mac
:e ++ff=dos

This will not actually change any characters in the file, just the way the file is interpreted.
If you want to resave the file with the new line-ending format, you can give the “++ff” argument to the “:w” command:

:w ++ff=unix
:w ++ff=mac
:w ++ff=dos

Alternatively, you can just set the line-ending format, and the file will be written out with the new line ending format the next time it is saved:

:set ff=unix
:set ff=mac
:set ff=dos

OS X Terminal Taking a Very Long Time to Start

For a week now, opening a new tab or window in OS X’s Terminal application has been a major palaver, sometimes taking up to a minute. CPU usage would shoot up (mostly/usually by WindowServer, but sometimes by kernel_task). It was driving me nuts. I practically live in the Terminal (or, to be more accurate, Terminal + Vim), and usually spawn a new Terminal window several times an hour for everything from using R as a calculator to opening files for viewing to actual development work. With this slowdown, I found myself sometimes literally screaming in frustration: I would flick the hot key that I had bound to spawn a new Terminal window and start typing, only to realize after a half dozen characters that I might as well be watching cat videos on the internet because the window had not finished opening.

Of course, the natural suspect was my ‘~/.bashrc’. It is about 1500 lines long and, moreover, sources other files. But this has been the case for years, and while I am constantly tweaking it, I could not recall making any changes in the days preceding the slow-down. More to the point, temporarily disabling it had absolutely no effect at all. Terminal still opened new windows at a tectonic pace.

I was musing about reinstalling the operating system (or moving to Lion) when I stumbled upon this post:

Basically, log files build up in ‘/var/log/asl’, and this baggage causes Terminal to go catatonic during launch. Cleaning out the directory:

  $ sudo rm -r /var/log/asl/*

solved everything!

Terminal now opens new windows and tabs instantly, 1500+ line ‘.bashrc’ and all, and I am back to spawning shells as wantonly and profligately as before!

Locally Mounting a Remote Directory Through a Firewall Gateway on OS X

  1. Download and install MacFUSE.
  2. Download the sshfs binary, renaming/moving to, for example, “/usr/local/bin/sshfs“.
  3. Create a wrapper tunneling script and save it to somewhere on your system path (e.g., “/usr/local/bin/“), making sure to set the executable bit (“chmod a+x“):
    #! /bin/bash
    ssh -t GATEWAY.HOST.IP.ADDRESS ssh $@
  4. Create the following script, and save it to somewhere on your system path (e.g., “/usr/local/bin/“), making sure to set the executable bit (“chmod a+x“):
    #! /bin/bash
    # NOTE: the assignments of SSHFS_PATH, SSH_TUNNEL_WRAPPER, LOCAL_MOUNT,
    # and MOUNT_NAME are assumed here; set them to match your own setup.
    if [[ -n $1 ]]; then
        REMOTE_HOST=$1
    fi
    if [[ -n $2 ]]; then
        REMOTE_DIR=$2
    fi
    if test -d "$LOCAL_MOUNT"
    then
        echo "Mount point $LOCAL_MOUNT already exists."
    else
        echo "Creating mount point: $LOCAL_MOUNT"
        mkdir "$LOCAL_MOUNT"
    fi
    $SSHFS_PATH -o ssh_command=$SSH_TUNNEL_WRAPPER $REMOTE_HOST:$REMOTE_DIR $LOCAL_MOUNT -oreconnect,volname=$MOUNT_NAME -o local -o follow_symlinks
    if [[ $? == 0 ]]; then
        echo "Mounted $REMOTE_HOST:$REMOTE_DIR at $LOCAL_MOUNT"
        echo "Use \"umount $LOCAL_MOUNT\" to unmount."
    fi
  5. And the fat lady sings …

List All Modules Provided By A Python Package

The following is an example of how to use the "pkg_resources" module (provided by the setuptools project) to compose a list of all available modules in a Python package.

#! /usr/bin/env python

import sys

try:
    import pkg_resources
except ImportError:
    sys.stderr.write("'pkg_resources' could not be imported: setuptools installation required\n")
    sys.exit(1)

def list_package_modules(package_name):
    """
    Returns list of module names for package `package_name`.
    """
    try:
        contents = pkg_resources.resource_listdir(package_name, "")
    except ImportError:
        return []
    module_names = []
    for entry in contents:
        if pkg_resources.resource_isdir(package_name, entry):
            module_names.extend(list_package_modules(package_name + "." + entry))
        elif not entry.endswith('.pyc'):
            if entry.endswith(".py"):
                entry = entry[:-3]
            module_names.append(package_name + "." + entry)
    return module_names

if __name__ == "__main__":
    m = list_package_modules("dendropy")
    print "\n".join(m)

List All Changes from a Git Pull, Merge, or Fast-Forward

When you pull and update your local branch, it would be nice to easily see all the commits that came in with the pull. Sure, you can figure it out by scanning through the git log carefully, but adding the following to your ‘~/.gitconfig’ gives you an easy way to see it at a glance:

[alias]
    whatsnewlog = !"sh -c \"git log  --graph --pretty=format:'%Creset%C(red bold)[%ad] %C(blue bold)%h%C(magenta bold)%d %Creset%s %C(green bold)(%an)%Creset'  --abbrev-commit --date=short $(git symbolic-ref HEAD 2> /dev/null | cut -b 12-)@{1}..$(git symbolic-ref HEAD 2> /dev/null | cut -b 12-)\""
    whatsnew = !"sh -c \"git diff  $(git symbolic-ref HEAD 2> /dev/null | cut -b 12-)@{1}..$(git symbolic-ref HEAD 2> /dev/null | cut -b 12-)\""

When you pull, the HEAD of the current branch fast-forwards to the end of all the new commits. HEAD@{1} refers to the previous position of HEAD, so ‘git diff HEAD@{1}..HEAD’ shows you all the stuff in the current HEAD that was not in the previous HEAD position. The same applies to branch references, and the messy ‘$(…)’ stuff is simply there to get the current branch name (using the branch name is better than HEAD, because it will behave correctly even if you do a pull and then check out another branch). With the aliases above, running ‘git whatsnewlog’ after a pull lists the new commits, and ‘git whatsnew’ shows the corresponding diff.

The fancy colors (‘%C(…)’) will only work with Git 1.6.6 or later, so you can remove them if you do not want them.

Lazy-Loading Cached Properties Using Descriptors and Decorators

Python descriptors allow for rather powerful and flexible attribute management with new-style classes. Combined with decorators, they make for some elegant programming.

One useful application of these mechanisms is lazy-loading properties, i.e., properties with values that are computed only when first accessed, with cached values returned on subsequent calls.

An implementation of this concept (based on this post) is:

class lazy_property(object):
    """
    Lazy-loading read-only property descriptor.
    Value is computed and stored in owner class object's dictionary on first
    access. Subsequent calls use value in owner class object's dictionary.
    """

    def __init__(self, func):
        self._func = func
        self.__name__ = func.__name__
        self.__doc__ = func.__doc__

    def __get__(self, obj, obj_class):
        if obj is None:
            return obj
        obj.__dict__[self.__name__] = self._func(obj)
        return obj.__dict__[self.__name__]

Example usage:

class A(object):

    def __init__(self, s):
        self.s = s

    @lazy_property
    def hello(self):
        print("[computing hello]")
        return "Hello, %s" % self.s
>>> a = A('world')
>>> print(a.hello)
[computing hello]
Hello, world
>>> print(a.hello)
Hello, world
>>> print(a.hello)
Hello, world

Here, when the “hello” property of an object of class “A” is accessed for the first time, “lazy_property.__get__” calls the function “A.hello()” and stores the return value directly in the dictionary of the calling object. Subsequent accesses of the “hello” property of the object find this value in the object’s dictionary, and so both “lazy_property.__get__” and “A.hello()” are bypassed altogether.

This is a neat way to handle values that are expensive to compute and are not always needed. However, one constraint of this approach is that it is not simple to force a re-computation or updating of the value. In addition, sometimes it would be nice to directly populate the property if required. What is needed for the latter case is a lazy-loading property that allows for setting as well as accessing. Furthermore, to support the former case, setting the property to “None” (or calling “del” on it) should force recomputation of its value on the next access.

An implementation of this concept is [Updated on 2010-03-21 2013 CST, thanks to a comment by an anonymous reader]:

class cached_property(object):
    """
    Lazy-loading read/write property descriptor.
    Value is stored locally in descriptor object. If value is not set when
    accessed, value is computed using given function. Value can be cleared
    by calling 'del'.
    """

    def __init__(self, func):
        self._func = func
        self._values = {}
        self.__name__ = func.__name__
        self.__doc__ = func.__doc__

    def __get__(self, obj, obj_class):
        if obj is None:
            return obj
        if obj not in self._values \
                or self._values[obj] is None:
            self._values[obj] = self._func(obj)
        return self._values[obj]

    def __set__(self, obj, value):
        self._values[obj] = value

    def __delete__(self, obj):
        if self.__name__ in obj.__dict__:
            del obj.__dict__[self.__name__]
        self._values[obj] = None

In action:

class A(object):

    def __init__(self, s):
        self.s = s

    @cached_property
    def hello(self):
        print("[computing hello]")
        return "Hello, %s" % self.s
>>> a = A('world')
>>> print(a.hello)
[computing hello]
Hello, world
>>> print(a.hello)
Hello, world
>>> del a.hello
>>> print(a.hello)
[computing hello]
Hello, world
>>> print(a.hello)
Hello, world
>>> a.hello = "Blah"
>>> print(a.hello)
Blah
>>> print(a.hello)
Blah
>>> del a.hello
>>> print(a.hello)
[computing hello]
Hello, world
>>> print(a.hello)
Hello, world
>>> a.hello = None
>>> print(a.hello)
[computing hello]
Hello, world
>>> print(a.hello)
Hello, world