‘xargs’ – Handling Filenames With Spaces or Other Special Characters

xargs is a great little utility to perform batch operations on a large set of files. Typically, the results of a find operation are piped to the xargs command: find . -iname "*.pdf" | xargs -I{} mv {} ~/collections/pdf/ The -I{} tells xargs to substitute '{}' in the statement to be executed with the entries being piped through. If these entries have spaces or other special characters, though, things will go awry. For example, filenames with spaces in them passed to xargs will result in xargs Read more [...]

YonderGit: Simplified Git Remote Repository Management

One of the great strengths of Git is the multiple and flexible ways of handling remote repositories. Just like Subversion, they can be "served" out of a location, but more generally, if you can reach it from your computer through any number of ways (ssh, etc.), you can git it. YonderGit wraps up a number of common operations with remote repositories: creating, initializing, adding to (associating with) the local repository, removing, etc. You can clone your own copy of the YonderGit Read more [...]

Some Vim Movement Tips

Within-line character-based movement: `h` and `l` move you left and right one character, respectively. `fc` or `Fc` will take you forward to the next or back to the previous, respectively, occurrence of character "c" on the current line (e.g., `fp` will jump you forward to the next occurrence of "p" on the line, while `Fp` will jump you back to the previous occurrence of "p" on the line). `tc` or `Tc` will take you forward to just before the next or back to just Read more [...]

Safe and const-correct std::map Access in C++ STL

The Standard Template Library std::map operator[] will create and return a new entry if passed a key that does not already exist in the map. This means that you cannot use this operator when you do not want to create a new entry (i.e., you expect the key-value pair to already exist in the map), or in a const context (i.e., in a const method or when using a const object). Instead, in these situations, you need to first pull a (const) iterator using std::map::find(), and then check to see if its value Read more [...]

Useful diff Aliases

Add the following aliases to your '~/.bashrc' for some diff goodness:
alias diff-side-by-side='diff --side-by-side -W"`tput cols`"'
alias diff-side-by-side-changes='diff --side-by-side --suppress-common-lines -W"`tput cols`"'
You can, of course, use shorter alias names in good old UNIX tradition, e.g. 'ssdiff' and 'sscdiff'. You might be wondering (a) why I did not do so, and (b) what is the point, conversely, of having aliases that are almost as long as the commands that they are Read more [...]

Setting Up Git to Use Your Diff Viewer or Editor of Choice

Git offers two ways of viewing differences between commits, or between commits and your working tree: diff and difftool. The first of these, by default, dumps the results to the standard output. This mode of presentation is great for quick summaries of small sets of changes, but is a little cumbersome if there are a large number of changes between the two commits being compared and/or you want to closely examine the changes, browsing back-and-forth between different files/lines, search for specific Read more [...]

Using DendroPy Interoperability Modules to Download, Align, and Estimate a Tree from GenBank Sequences

The following example shows how easy it can be to use the three interoperability modules provided by the DendroPy Phylogenetic Computing Library to download nucleotide sequences from GenBank, align them using MUSCLE, and estimate a maximum-likelihood tree using RAxML. The automatic label composition option of the DendroPy genbank module creates practical taxon labels out of the original data. We also pass in additional arguments to RAxML to request that the tree search be carried out 250 times (['-N', Read more [...]

Vim Regular Expression Special Characters: To Escape or Not To Escape

Vim's regular expression dialect is distinct from many of the other more popular ones out there today (and actually predates them). One of the dialect differences that always leaves me fumbling has to do with which special characters need to be escaped. Vim does have a special "very magic" mode (activated by "\v" in the regular expression) that makes things very clean and simple in this regard: only letters, numbers and underscores are treated as literals without escaping. But I have never Read more [...]

Unconditionally Accepting All Merging-In Changes During a Git Merge

Merge conflicts suck. It is not uncommon, however, to know that you simply want to accept all the changes from the branch that you are merging in, which makes things a lot simpler conceptually. The Git documentation suggests that this can be procedurally simple as well, as it mentions the "-s theirs" merge strategy which does just that, i.e., unconditionally accepts everything from the branch that you are merging in: $ git merge -s theirs Unfortunately, however, running Read more [...]

The Power and Precision of Vim’s Text Objects: Efficient, Elegant, Awesome.

Vim's text objects are not only a powerful, flexible and precise way to specify a region of text, but also intuitive and efficient. They can be used with any command that can be combined with a motion (e.g., "d", "y", "v", "r"), but in this post I will be using the "c" command ("change") to illustrate them. Imagine you were on a line that looked like this, with the cursor on the letter "r" of the word "dry": print "Enter run mode ('test', 'dry', or 'full')" Then, after typing "c" to start Read more [...]

Supplementary Command-History Logging in Bash: Tracking Working Directory, Dates, Times, etc.

Introduction Here is a way to create a secondary shell history log (i.e., one that supplements the primary "~/.bash_history") that tracks a range of other information, such as the working directory, hostname, time and date, etc. Using the "HISTTIMEFORMAT" variable, it is in fact possible to store the time and date with the primary history, but storing the other information is not as readily doable. Here, I present an approach based on this excellent post on StackOverflow. The main differences Read more [...]

Stripping Paths from Files in TAR Archives

There is no way to get tar to ignore directory paths of files that it is archiving. So, for example, if you have a large number of files scattered about in subdirectories, there is no way to tell tar to archive all the files while ignoring their subdirectories, such that when unpacking the archive you extract all the files to the same location. You can, however, tell tar to strip a fixed number of elements from the full (relative) path to the file when extracting using the "--strip-components" option. Read more [...]

Pure-Python Implementation of Fisher’s Exact Test for a 2×2 Contingency Table

While Python comes with many "batteries included", many others are left out. Luckily, thanks to the generosity and hard work of various members of the Python community, there are a number of third-party implementations to fill in these gaps. For example, Fisher's exact test is not part of the standard library. Read more [...]
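As a rough illustration of what a pure-Python Fisher's exact test for a 2×2 table can look like, here is a minimal sketch. It is not the implementation from the full post: the function name `fisher_exact_2x2` is made up for this example, it assumes Python 3.8 or later (for `math.comb`), and it uses the common two-sided convention of summing the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Hypothetical helper for illustration; margins are held fixed and the
    top-left cell follows a hypergeometric distribution.
    """
    row1 = a + b
    row2 = c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Probability of seeing x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is as likely as, or less likely than, the observed
    # one; the small tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


p_value = fisher_exact_2x2(3, 1, 1, 3)
```

For the table [[3, 1], [1, 3]], the admissible top-left values 0 through 4 have probabilities 1/70, 16/70, 36/70, 16/70 and 1/70, so the two-sided p-value is 34/70.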

Piping Output Over a Secure Shell (SSH) Connection

We all know about using scp to transfer files over a secure shell connection. It works fine, but there are many cases where alternate modalities of usage are required, for example, when you want to transfer the output of one program directly to a remote machine for storage. Here are some ways of going about doing this. Let "$PROG" be a program that writes data to the standard output stream. Then: Transferring without compression: $PROG | ssh destination.ip.address Read more [...]

Parse Python Stack Trace and Open Selected Source References for Editing in OS X

UPDATE Nov 7, 2009: Better parsing of traceback. UPDATE Nov 4, 2009: Now passing a "-b" flag to the script opens the parsed stack frame references in a BBEdit results browser, inspired by an AppleScript script by Marc Liyanage. When things go wrong in a Python script, the interpreter dumps a stack trace, which looks something like this: $ python y.py Calling f1 ... Traceback (most recent call last): File "y.py", line 6, in x.f3() File "/Users/jeet/Scratch/snippets/x.py", line Read more [...]
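The parsing step can be sketched roughly as follows. This is not the script from the full post, only a minimal illustration of pulling file and line references out of a traceback with a regular expression; the names `FRAME_RE` and `parse_traceback`, and the sample traceback text and its paths, are assumptions made for this example.

```python
import re

# Matches the 'File "<path>", line <n>' part of each stack frame line.
FRAME_RE = re.compile(
    r'^\s*File "(?P<path>[^"]+)", line (?P<lineno>\d+)',
    re.MULTILINE,
)


def parse_traceback(text):
    """Return (path, line number) pairs for each frame, outermost first."""
    return [
        (match.group("path"), int(match.group("lineno")))
        for match in FRAME_RE.finditer(text)
    ]


# Hypothetical traceback used only to exercise the parser.
SAMPLE = '''Traceback (most recent call last):
  File "y.py", line 6, in <module>
    x.f3()
  File "/tmp/example/x.py", line 12, in f3
    f2()
ZeroDivisionError: division by zero
'''

frames = parse_traceback(SAMPLE)
```

Each resulting pair could then be handed to an editor's command-line launcher, provided that launcher accepts a file and a line number.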

Neat Bash Trick: Open Last Command for Editing in the Default Editor and then Execute on Saving/Exiting

This is pretty slick: enter "fc" in the shell and your last command opens up for editing in your default editor (as given by "$EDITOR"). Works perfectly with vi. The "$EDITOR" variable approach does not seem to work with BBEdit though, and you have to use:
$ fc -e '/usr/bin/bbedit --wait'
With vi, ":cq" aborts execution of the command.

Most Pythonique, Efficient, Compact, and Elegant Way to Do This

Given a list of strings, how would you interpolate a multi-character string in front of each element? For example, given: >>> k = ['the quick', 'brown fox', 'jumps over', 'the lazy', 'dog'] The objective is to get: ['-c', 'the quick', '-c', 'brown fox', '-c', 'jumps over', '-c', 'the lazy', '-c', 'dog'] Of course, the naive solution would be to compose a new list by iterating over the original list: >>> result = [] >>> for i in k: ... result.append('-c') Read more [...]
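For comparison with the naive loop, here are two compact alternatives, offered as a sketch rather than as the answer the full post settles on. The function names `interleave_flag` and `interleave_flag_chain` are invented for this example.

```python
from itertools import chain


def interleave_flag(items, flag="-c"):
    """Return a new list with `flag` inserted in front of every element."""
    # The inner loop emits the flag and then the item for each original entry.
    return [token for item in items for token in (flag, item)]


def interleave_flag_chain(items, flag="-c"):
    """Same result, built by flattening (flag, item) pairs."""
    return list(chain.from_iterable((flag, item) for item in items))


k = ['the quick', 'brown fox', 'jumps over', 'the lazy', 'dog']
result = interleave_flag(k)
```

Both versions leave the original list untouched and return an empty list for empty input.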

Molecular Sequence Generation with DendroPy

The DendroPy Phylogenetic Computing Library includes native infrastructure for phylogenetic sequence simulation on DendroPy trees under the HKY model. Being pure-Python, however, it is a little slow. If Seq-Gen is installed on your system, though, you can take advantage of a lightweight Seq-Gen wrapper added to the latest revision under the interop subpackage: dendropy.interop.seqgen. Documentation is lagging, but the following examples should be enough to get started, and the class is simple and Read more [...]