Pure-Python Implementation of Fisher’s Exact Test for a 2×2 Contingency Table

While Python comes with many "batteries included", many others are left out. Luckily, thanks to the generosity and hard work of various members of the Python community, there are a number of third-party implementations to fill this gap. For example, Fisher's exact test is not part of the standard library. Read more [...]
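As a rough illustration of what a pure-Python version can look like, here is a minimal sketch of a two-sided Fisher's exact test for a 2×2 table. It is not the implementation from the full post. The function name `fisher_exact_2x2` is made up for this example, it assumes Python 3.8+ for `math.comb`, and it uses one common two-sided convention: sum the probabilities of every table that is no more probable than the observed one.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided p-value for the 2x2 table [[a, b], [c, d]] (illustrative sketch)."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)   # smallest feasible top-left count
    highest = min(row1, col1)      # largest feasible top-left count
    p_observed = prob(a)
    tolerance = 1 + 1e-7           # guards against floating-point ties
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * tolerance
    )
    return min(1.0, p_value)


# Classic "lady tasting tea" layout: 3 of 4 cups identified correctly.
print(fisher_exact_2x2(3, 1, 1, 3))
```

Because the probabilities are built from exact integer binomial coefficients, the only floating-point step is the final division, which keeps the sketch accurate for moderately sized tables.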

Piping Output Over a Secure Shell (SSH) Connection

We all know about using scp to transfer files over a secure shell connection. It works fine, but there are many cases where other modes of use are required, for example, when you want to send the output of one program directly to a remote machine for storage. Here are some ways of going about doing this. Let "$PROG" be a program that writes data to the standard output stream. Then: Transferring without compression: $PROG | ssh destination.ip.address Read more [...]

Parse Python Stack Trace and Open Selected Source References for Editing in OS X

UPDATE Nov 7, 2009: Better parsing of traceback. UPDATE Nov 4, 2009: Now passing a "-b" flag to the script opens the parsed stack frame references in a BBEdit results browser, inspired by an AppleScript script by Marc Liyanage. When things go wrong in a Python script, the interpreter dumps a stack trace, which looks something like this: $ python y.py Calling f1 ... Traceback (most recent call last): File "y.py", line 6, in x.f3() File "/Users/jeet/Scratch/snippets/x.py", line Read more [...]
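The parsing step can be sketched in a few lines of Python. This is only an illustration of the idea, not the script from the full post: the regular expression, the `parse_traceback` name, and the sample traceback (with made-up paths) are all assumptions.

```python
import re

# Matches the 'File "<path>", line <n>' part of each stack frame line.
FRAME_RE = re.compile(
    r'^[ \t]*File "(?P<path>[^"]+)", line (?P<lineno>\d+)',
    re.MULTILINE,
)


def parse_traceback(text):
    """Return (path, line_number) pairs in the order the frames appear."""
    return [
        (match.group("path"), int(match.group("lineno")))
        for match in FRAME_RE.finditer(text)
    ]


# Hypothetical traceback text, used only to exercise the parser.
SAMPLE = '''Traceback (most recent call last):
  File "main.py", line 6, in <module>
    helper.run()
  File "/tmp/example/helper.py", line 12, in run
    raise ValueError("boom")
ValueError: boom
'''

frames = parse_traceback(SAMPLE)
for path, lineno in frames:
    print(f"{path}:{lineno}")
```

Each (path, line) pair could then be handed to whatever editor command opens a file at a given line.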

Neat Bash Trick: Open Last Command for Editing in the Default Editor and then Execute on Saving/Exiting

This is pretty slick: enter "fc" in the shell and your last command opens up for editing in your default editor (as given by "$EDITOR"). Works perfectly with vi. The "$EDITOR" variable approach does not seem to work with BBEdit, though, and you have to use:
$ fc -e '/usr/bin/bbedit --wait'
With vi, ":cq" aborts execution of the command.

Most Pythonique, Efficient, Compact, and Elegant Way to Do This

Given a list of strings, how would you interpolate a multi-character string in front of each element? For example, given: >>> k = ['the quick', 'brown fox', 'jumps over', 'the lazy', 'dog'] The objective is to get: ['-c', 'the quick', '-c', 'brown fox', '-c', 'jumps over', '-c', 'the lazy', '-c', 'dog'] Of course, the naive solution would be to compose a new list by iterating over the original list: >>> result = [] >>> for i in k: ... result.append('-c') Read more [...]
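One compact alternative to the explicit loop, offered as a sketch and not necessarily the answer the full post settles on, is to flatten a stream of (prefix, element) pairs with `itertools.chain.from_iterable`. The helper name `interleave_prefix` is invented for this example.

```python
from itertools import chain


def interleave_prefix(prefix, items):
    """Return a new list with `prefix` inserted before every element of `items`."""
    return list(chain.from_iterable((prefix, item) for item in items))


k = ['the quick', 'brown fox', 'jumps over', 'the lazy', 'dog']
result = interleave_prefix('-c', k)
print(result)
```

The original list is left untouched, and the prefix is treated as a single element, so a multi-character string such as '-c' is not split into characters.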

Molecular Sequence Generation with DendroPy

The DendroPy Phylogenetic Computing Library includes native infrastructure for phylogenetic sequence simulation on DendroPy trees under the HKY model. Being pure-Python, however, it is a little slow. If Seq-Gen is installed on your system, though, you can take advantage of a lightweight Seq-Gen wrapper added to the latest revision under the interop subpackage: dendropy.interop.seqgen. Documentation is lagging, but the following examples should be enough to get started, and the class is simple and Read more [...]

Managing and Munging Line Endings in Vim

If you have opened a file and see a bunch of "^M" or "^J" characters in it, chances are that for some reason Vim is confused as to the line-ending type. You can force it to interpret the file with a specific line-ending by using the "++ff" argument and asking Vim to re-read the file using the ":e" command: :e ++ff=unix :e ++ff=mac :e ++ff=dos This will not actually change any characters in the file, just the way the file is interpreted. If you want to resave the file with the new line-ending Read more [...]

OS X Terminal Taking a Very Long Time to Start

For a week now, opening a new tab or window in OS X's Terminal application has been a major palaver, sometimes taking up to a minute. CPU usage would shoot up (mostly/usually by WindowServer, but sometimes by kernel_task). It was driving me nuts. I practically live in the Terminal (or, to be more accurate, Terminal + Vim), and usually spawn a new Terminal window several times in an hour for everything from using R as a calculator to opening files for viewing to actual development work. With this slow Read more [...]

Locally Mounting a Remote Directory Through a Firewall Gateway on OS X

Download and install MacFUSE. Download the sshfs binary, renaming/moving to, for example, "/usr/local/bin/sshfs". Create a wrapper tunneling script and save it to somewhere on your system path (e.g., "/usr/local/bin/ssh-tunnel-gateway.sh"), making sure to set the executable bit ("chmod a+x"): #! /bin/bash ssh -t GATEWAY.HOST.IP.ADDRESS ssh $@ Create the following script, and save it to somewhere on your system path (e.g., "/usr/local/bin/mount-remote.sh"), making sure Read more [...]

List All Modules Provided By A Python Package

The following is an example of how to use the "pkg_resources" module (provided by the setuptools project) to compose a list of all available modules in a Python package. #! /usr/bin/env python import sys try: import pkg_resources except ImportError: sys.stderr.write("'pkg_resources' could not be imported: setuptools installation required\n") sys.exit(1) def list_package_modules(package_name): """ Returns list of module names for package `package_name`. """ Read more [...]
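For comparison, here is a sketch of the same idea using only the standard library's `pkgutil` module instead of "pkg_resources". This is a swapped-in technique, not the approach of the full post, and the function name `list_modules_with_pkgutil` is invented for the example. Note that `pkgutil.walk_packages` imports subpackages in order to descend into them.

```python
import importlib
import pkgutil


def list_modules_with_pkgutil(package_name):
    """Return sorted, fully qualified names of modules found under `package_name`."""
    package = importlib.import_module(package_name)
    search_paths = getattr(package, "__path__", None)
    if search_paths is None:
        # A plain module (not a package) has no submodules to list.
        return []
    prefix = package.__name__ + "."
    return sorted(
        info.name for info in pkgutil.walk_packages(search_paths, prefix=prefix)
    )


# The standard-library 'json' package serves as a small, always-available example.
for name in list_modules_with_pkgutil("json"):
    print(name)
```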

List All Changes from a Git Pull, Merge, or Fast-Forward

When you pull and update your local repository, it would be nice to easily see all the commits that you have applied in the pull. Sure, you can figure it out by scanning through the git log carefully, but adding the following to your '~/.gitconfig' gives you an easy way to see it at a glance: whatsnewlog = !"sh -c \"git log --graph --pretty=format:'%Creset%C(red bold)[%ad] %C(blue bold)%h%C(magenta bold)%d %Creset%s %C(green bold)(%an)%Creset' --abbrev-commit --date=short $(git symbolic-ref HEAD 2> /dev/null Read more [...]

Lazy-Loading Cached Properties Using Descriptors and Decorators

Python descriptors allow for rather powerful and flexible attribute management with new-style classes. Combined with decorators, they make for some elegant programming. One useful application of these mechanisms is lazy-loading properties, i.e., properties with values that are computed only when first called, returning cached values on subsequent calls. An implementation of this concept (based on this post) is: class lazy_property(object): """ Lazy-loading read-only property descriptor. Read more [...]
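To make the mechanism concrete, here is a minimal sketch of a caching descriptor used as a decorator. It is not the "lazy_property" class from the full post: the names `lazy_property_sketch` and `Dataset` are invented, and unlike a read-only descriptor this version is a non-data descriptor, so the cached value lives in the instance's `__dict__` and can be overwritten or deleted.

```python
class lazy_property_sketch:
    """Compute a value on first access, then cache it on the instance."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__
        self.__doc__ = func.__doc__

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class itself: return the descriptor.
            return self
        value = self.func(obj)
        # Storing under the same name shadows this non-data descriptor,
        # so later lookups read the cached value without calling __get__.
        obj.__dict__[self.name] = value
        return value


class Dataset:
    def __init__(self):
        self.compute_count = 0

    @lazy_property_sketch
    def total(self):
        """Stand-in for an expensive computation."""
        self.compute_count += 1
        return sum(range(10))


d = Dataset()
print(d.total, d.total, d.compute_count)
```

Deleting the cached entry with `del d.total` would force the value to be recomputed on the next access.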

Grepping in Git: How to Search Git Repository Revisions, Working Trees, Commit Messages, etc.

To search content of all tracked files in the current working tree for a pattern: git grep To search content of all commit messages for a pattern ('-E' for extended grep): git log [-E] --grep To search content of all commit diffs for lines that add or remove a pattern ('-w' for pattern only at word boundary): git [-w] log -G To search content of entire working trees of previous revisions for a pattern: git grep $(git rev-list --all) Note that Git supports POSIX Basic Regular Expression. Read more [...]

`gcd` – A Git-aware `cd` Relative to the Repository Root with Auto-Completion

The following will enable you to have a Git-aware "cd" command with directory path expansion/auto-completion relative to the repository root. You will have to source it into your "~/.bashrc" file, after which invoking "gcd" from the shell will allow you to specify directory paths relative to the root of your Git repository no matter where you are within the working tree. gcd() { if [[ $(which git 2> /dev/null) ]] then STATUS=$(git status 2>/dev/null) Read more [...]

Filter for Unique Lines Adjacent or Otherwise While Preserving Original Order

There are two standard command-line utilities that help you filter input for unique lines: 'uniq' and 'sort'. One gotcha with 'uniq' is that it only filters out duplicate adjacent lines. So if your input looks like: apple apple apple chicory chicory chicory banana banana Then running 'uniq' on it will yield: apple chicory banana But if the input has non-adjacent duplicate lines: apple banana banana chicory apple banana chicory banana banana apple apple apple banana chicory Then the results are: apple banana chicory apple banana chicory banana apple banana chicory The Read more [...]
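The goal in the title, dropping every repeated line whether adjacent or not while keeping first-seen order, can be sketched in Python as follows. This is an illustration of the idea only, not the shell solution from the full post, and the function name `unique_in_order` is invented.

```python
def unique_in_order(lines):
    """Return the lines with all duplicates removed, keeping first occurrences in order."""
    seen = set()
    result = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            result.append(line)
    return result


# The non-adjacent example input from the text above.
data = ("apple banana banana chicory apple banana chicory "
        "banana banana apple apple apple banana chicory").split()
print(unique_in_order(data))
```

Unlike 'uniq', this does not depend on duplicates being adjacent, and unlike sorting first, it leaves the surviving lines in their original order.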

Filesystem Management with the Full Power of Vim

Just discovered "vidir", a way to manipulate filenames inside your favorite text editor (better known as Vim). Previously, I would use complex and cumbersome BASH constructs using "for;do;done", "sed", "awk", etc., coupled with the operation itself: $ for f in *.txt; do mv $f $(echo $f | sed -e 's/foo\(\d\+\)_\(.*\)\.txt/bar_\1_blah_\2.txt/'); done Which almost always involved a "pre-flight" dummy run to make sure the reg-ex's were correct: $ for f in *.txt; do echo mv $f $(echo Read more [...]

Execute Selected Lines of (Optionally) Marked-Up Python Code in a Vim Buffer

There are a number of solutions for executing Python code in your active buffer in Vim. All of these expect the buffer lines to be well-formatted Python code, with correct indentation. Many times, however, I am working on program or other documentation (in, for example, reStructuredText or Markdown format), and the code fragments that I want to execute have extra indentation or line leaders. For example, a reStructuredText buffer might look like: How to Wuzzle the Wookie ------------------------- Read more [...]
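The core difficulty, code that is valid Python except for a uniform extra indent, can be illustrated outside Vim with a small sketch. This is not the Vim script from the full post. The function name `run_indented_snippet` is invented, and the sketch handles only common leading indentation, not other line leaders.

```python
import textwrap


def run_indented_snippet(text, namespace=None):
    """Strip common leading indentation from `text`, execute it, and return the namespace."""
    if namespace is None:
        namespace = {}
    source = textwrap.dedent(text)
    exec(compile(source, "<snippet>", "exec"), namespace)
    return namespace


# A fragment as it might appear inside an indented documentation block.
snippet = """
    x = 2
    y = x * 3
"""
ns = run_indented_snippet(snippet)
print(ns["y"])
```

Executing the fragment as written, without the dedent step, would raise an IndentationError.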

Ensuring That SGE Does Not Oversubscribe Processors or Memory

SGE, by default, will happily oversubscribe processors when multiple queues target the same nodes (nice one, SGE). Furthermore, even if jobs specify a memory limit, if each individual job uses less than the total memory limit, but the sum memory usage of jobs assigned to the machine exceeds the machine's memory, the memory of the entire node can be exhausted, sending it off into limbo (cunning, SGE, very cunning). The solution to both of these issues is to specify processors and memory as consumable Read more [...]

Enhanced Git Log View Showing Symbolic References Associated With Each Commit

With multiple upstream repositories and branches, and different branches on different upstreams, an enhanced "log" view will help greatly in taking stock of everything. Adding the following line to your "~/.gitconfig" will give you a new command, "git slog" (for "short log") that does just that: # colorful 1-line log summary slog = log --pretty=format:'%Creset%C(red bold)[%ad] %C(blue bold)%h %Creset%C(magenta bold)%d %Creset%s Read more [...]