Posted by Jeet Sukumaran

There is no way to get tar to ignore directory paths of files that it is archiving. So, for example, if you have a large number of files scattered about in subdirectories, there is no way to tell tar to archive all the files while ignoring their subdirectories, such that when unpacking the archive you extract all the files to the same location. You can, however, tell tar to strip a fixed number of elements from the full (relative) path to the file when extracting using the "--strip-components" option. For example:
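
The same path-stripping idea can also be sketched in Python with the standard "tarfile" module. This is an illustration, not the example from the original post: the function name "extract_stripped" and its parameters are made up for this sketch, it only handles regular files, and it only roughly mirrors what "--strip-components" does.

```python
import os
import tarfile


def extract_stripped(archive_path, dest_dir=".", strip=1):
    """Extract regular files, dropping the first `strip` path components.

    Hypothetical helper: a rough analogue of tar's --strip-components.
    """
    extracted = []
    with tarfile.open(archive_path) as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # directories are recreated on demand below
            parts = [p for p in member.name.split("/") if p not in ("", ".")]
            if ".." in parts or len(parts) <= strip:
                continue  # unsafe path, or nothing left after stripping
            target = os.path.join(dest_dir, *parts[strip:])
            os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
            # Copy the member's bytes out by hand instead of tar.extract()
            with tar.extractfile(member) as src, open(target, "wb") as dst:
                dst.write(src.read())
            extracted.append(target)
    return extracted
```

With "strip" set to the depth of the deepest common directory, files that share that depth all land directly in "dest_dir"; files nested at other depths keep whatever path remains.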

Posted by Jeet Sukumaran

In a previous post, I discuss a couple of approaches to dealing with the "Argument list too long" error when transferring large numbers of files. The solution to this problem is to archive the files, using the "-T" option of the "tar" command to pass in a list of files generated by a "find" command:

  1. Create a list of the files to be archived using the "find" command:
    $ find . -name "*.tre" > filelist.txt
  2. Use the "-T" option of the "tar" command to pass in this list of filenames:
    $ tar cvjf archive.tbz -T filelist.txt

If you want to delete a long list of files, however, this approach will not work, as "rm" does not support the very convenient "-T"/"--files-from" flag or the equivalent (so convenient, in fact, that I have started adding this to virtually every file-processing script or program that I write).
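
A "--files-from" style deletion is short enough to sketch in Python. The function name "delete_listed_files" is hypothetical, and the sketch assumes a list file such as "filelist.txt" with one path per line, which is what "find" writes by default (so filenames containing newlines are not handled).

```python
import os


def delete_listed_files(list_path):
    """Delete every file named in list_path (one path per line).

    Hypothetical helper: stands in for the -T/--files-from flag rm lacks.
    """
    deleted = []
    with open(list_path) as listing:
        for line in listing:
            path = line.rstrip("\n")
            if not path:
                continue  # skip blank lines
            # Only remove files and symlinks; directories are left alone
            if os.path.isfile(path) or os.path.islink(path):
                os.remove(path)
                deleted.append(path)
    return deleted
```

Calling delete_listed_files("filelist.txt") from the directory where the list was generated resolves the relative paths the same way "tar -T" would.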

Luckily, however, "find" does support a "-delete" flag, so to recursively delete all files and directories:

 
Posted by Jeet Sukumaran

There are several functions that calculate principal component statistics in R. Two of these are "prcomp()" and "princomp()". The "prcomp()" function has fewer features, but is numerically more stable than "princomp()".

Both of these functions can be invoked by simply passing in a suitable data frame, in which case all columns will be used:

    pca1 = prcomp(d)
    pca2 = princomp(d)

Alternatively, the columns to be used can be specified using a formula notation:

Posted by Jeet Sukumaran

If you have opened a file, and see a bunch of "^M" or "^J" characters in it, chances are that for some reason Vim is confused as to the line-ending type. You can force it to interpret the file with a specific line-ending by using the "++ff" argument and asking Vim to re-read the file using the ":e" command:

Posted by Jeet Sukumaran
Note: As pointed out by Dr. Thornton, the build methods described and given below result in the program binary being statically-linked to the libsequence library.

SGE Queue Tantrums

16 Jun 2010
Posted by Jeet Sukumaran

You can get a "bird's eye" view of your cluster load by running:

Posted by Jeet Sukumaran

Every day I discover at least one new thing about Vim. Sometimes useful, sometimes not. Sometimes rather prosaic, sometimes sublime.

This one falls in the useful but prosaic category: to get a count of the number of characters, lines, words etc. in the current selection, type "g CTRL-G".

Posted by Jeet Sukumaran

Given a list of strings, how would you interpolate a multi-character string in front of each element?

For example, given:

    >>> k = ['the quick', 'brown fox', 'jumps over', 'the lazy', 'dog']

The objective is to get:

    ['-c', 'the quick', '-c', 'brown fox', '-c', 'jumps over', '-c', 'the lazy', '-c', 'dog']

Of course, the naive solution would be to compose a new list by iterating over the original list:
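
A minimal sketch of that loop-based approach follows. The function name "interpolate" is made up for illustration and is not necessarily how the original code was written.

```python
def interpolate(prefix, items):
    """Return a new list with `prefix` placed in front of each element."""
    result = []
    for item in items:
        result.append(prefix)
        result.append(item)
    return result


k = ['the quick', 'brown fox', 'jumps over', 'the lazy', 'dog']
print(interpolate('-c', k))
# ['-c', 'the quick', '-c', 'brown fox', '-c', 'jumps over', '-c', 'the lazy', '-c', 'dog']
```

The same result can be had in one line with a nested list comprehension, "[x for item in k for x in ('-c', item)]", at some cost in readability.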

Posted by Jeet Sukumaran

Ideally, you could refer the whole world, or at least the significant portion thereof that wants your code, to your (public mirror) Git repository. But unfortunately, the whole world does not (yet) use Git ("I know it was you Fredo, I know it was you, and it breaks my heart."). Sad. Sooooo sad. But true. So the only recourse is for you to send these tortured souls an archived snapshot of your code via e-mail. I'll pause now to let you finish retching/sobbing/lamenting/venting. ... Back?

Anyway, Git has a neat "archive" command that helps you create the required archive, but perhaps it does not have the friendliest interface in the world. Drop the following script anywhere on your path, name it "git-targz", and set its executable bit. Then invoking "git targz TARBALL-FILEPATH" will create a tar'd and gzip'd bundle of your (repository's) current HEAD. Similar scripts for bzip'ing and plain old zipping can easily be created by varying the final command, and are shown after the tar + gzip script.

Posted by Jeet Sukumaran

For the past few months, I've been 'defensive coding' with respect to Python 3.x; basically, if there is a construct that:

  • will be broken under 3.x

and

  • the alternate (which is not broken) is supported under 2.6+

I've been trying to use that instead.

Here is a "3K-ism" that got me, and that was completely unanticipated. I encountered it when running my Python environment description script under Python 3.x.