Setting up a Python Scientific Environment (NumPy, SciPy, pandas, StatsModels, etc.) in OS X 10.9 Mavericks

It is better than a nightmare from which you cannot wake up …

Install Homebrew:

    $ ruby -e "$(curl -fsSL https://raw.github.com/Homebrew/homebrew/go/install)"

Find problems and fix them (typically resulting from Homebrew becoming very stroppy if it does not have exclusive access to “/usr/local”):

    $ brew doctor

VERY IMPORTANT NOTE: Make ABSOLUTELY sure that the DYLD_LIBRARY_PATH environment variable is NOT set. Having this set will cause all sorts of problems as conflicts arise between libraries that Homebrew installs that some of your other packages need (e.

Using Python's 'timeit' Module to Benchmark Functions Directly (Instead of Passing in a String to be Executed)

All the basic examples for Python’s timeit module show strings being executed. This leads to, in my opinion, somewhat convoluted code such as:

    #! /usr/bin/env python

    import timeit

    def f():
        pass

    if __name__ == "__main__":
        timer = timeit.Timer("__main__.f()", "import __main__")
        result = timer.repeat(repeat=100, number=100000)
        print("{}".format(min(result)))

For some reason, the fact that you can call a function directly is only (again, in my opinion) obscurely documented. But this makes things so much cleaner:
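As a minimal sketch of the callable form (not necessarily the exact code from the full post), timeit.Timer accepts a zero-argument callable in place of a statement string, so neither the "__main__.f()" string nor the "import __main__" setup is needed. The function f is the same placeholder as in the string-based version; the repeat and number values here are arbitrary and chosen only to keep the run short.

```python
import timeit


def f():
    """Placeholder function to benchmark."""
    pass


# Pass the callable itself instead of a statement string.
timer = timeit.Timer(f)

# Each entry in `result` is the total time (in seconds) for `number` calls.
result = timer.repeat(repeat=5, number=10000)
print("{:.6f}".format(min(result)))
```

If the function takes arguments, one option is to wrap the call in a lambda or functools.partial so that the timer still receives a zero-argument callable.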

Using DendroPy Interoperability Modules to Download, Align, and Estimate a Tree from GenBank Sequences

The following example shows how easy it can be to use the three interoperability modules provided by the DendroPy Phylogenetic Computing Library to download nucleotide sequences from GenBank, align them using MUSCLE, and estimate a maximum-likelihood tree using RAxML. The automatic label composition option of the DendroPy genbank module creates practical taxon labels out of the original data. We also pass in additional arguments to RAxML to request that the tree search be carried out 250 times (['-N', '250']).

Pure-Python Implementation of Fisher's Exact Test for a 2x2 Contingency Table

While Python comes with many “batteries included”, many others are missing. Luckily, thanks to the generosity and hard work of various members of the Python community, there are a number of third-party implementations to fill these gaps. For example, Fisher’s exact test is not part of the standard library.
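To make the idea concrete, here is a minimal pure-Python sketch of a two-sided Fisher's exact test for a 2x2 table. It is an illustration only, not the implementation from the full post: the function name fisher_exact_2x2 and the tolerance constant are assumptions, and it relies on math.comb, which requires Python 3.8 or later. The p-value is computed as the sum of hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """
    Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].
    (Hypothetical helper written for illustration.)
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance so that ties are not lost to float rounding.
    cutoff = p_observed * (1 + 1e-7)
    p_value = sum(p for p in map(prob, range(lowest, highest + 1)) if p <= cutoff)
    return min(1.0, p_value)


# Classic "lady tasting tea" table: 3 of 4 cups identified correctly.
p = fisher_exact_2x2(3, 1, 1, 3)
print(round(p, 4))
```

For large counts, the binomial coefficients become very large integers; a production implementation would more likely work with log-probabilities.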

Parse Python Stack Trace and Open Selected Source References for Editing in OS X

UPDATE Nov 7, 2009: Better parsing of traceback.

UPDATE Nov 4, 2009: Now passing a “-b” flag to the script opens the parsed stack frame references in a BBEdit results browser, inspired by an AppleScript script by Marc Liyanage.

When things go wrong in a Python script, the interpreter dumps a stack trace, which looks something like this:

    $ python y.py
    Calling f1 ...
    Traceback (most recent call last):
      File "y.
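The parsing step can be sketched roughly as follows. This is not the script from the full post: the regular expression, the parse_traceback helper, and the sample traceback text (including the file name example.py) are all assumptions made for illustration. It only extracts the file path, line number, and function name from each standard 'File "...", line N, in name' frame line; opening the references in an editor is left out.

```python
import re

# Matches the frame lines of a standard CPython traceback.
FRAME_RE = re.compile(
    r'^\s*File "(?P<path>[^"]+)", line (?P<line>\d+)(?:, in (?P<func>.+?))?\s*$'
)


def parse_traceback(text):
    """Return a list of (path, line_number, function_name) tuples."""
    frames = []
    for row in text.splitlines():
        match = FRAME_RE.match(row)
        if match:
            frames.append(
                (match.group("path"), int(match.group("line")), match.group("func"))
            )
    return frames


# Hypothetical traceback text, used only to exercise the parser.
sample = '''Traceback (most recent call last):
  File "example.py", line 12, in <module>
    f1()
  File "example.py", line 4, in f1
    raise ValueError("boom")
ValueError: boom
'''

frames = parse_traceback(sample)
for path, line_number, func in frames:
    print("{}:{} ({})".format(path, line_number, func))
```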

Most Pythonique, Efficient, Compact, and Elegant Way to Do This

Given a list of strings, how would you interpolate a multi-character string in front of each element? For example, given:

    >>> k = ['the quick', 'brown fox', 'jumps over', 'the lazy', 'dog']

The objective is to get:

    ['-c', 'the quick', '-c', 'brown fox', '-c', 'jumps over', '-c', 'the lazy', '-c', 'dog']

Of course, the naive solution would be to compose a new list by iterating over the original list:

    >>> result = []
    >>> for i in k:
    .
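One compact candidate, offered as a sketch rather than as the answer the full post settles on, is a nested list comprehension that emits the flag and then the element for every item. The variable name flag is an assumption introduced here for readability.

```python
k = ['the quick', 'brown fox', 'jumps over', 'the lazy', 'dog']
flag = '-c'

# For each item, yield the flag first and then the item itself.
result = [x for item in k for x in (flag, item)]
print(result)
```

The same flattening could also be written with itertools.chain.from_iterable; which of the two reads as more elegant is a matter of taste.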

Molecular Sequence Generation with DendroPy

The DendroPy Phylogenetic Computing Library includes native infrastructure for phylogenetic sequence simulation on DendroPy trees under the HKY model. Being pure-Python, however, it is a little slow. If Seq-Gen is installed on your system, though, you can take advantage of a lightweight Seq-Gen wrapper added to the latest revision under the interop subpackage: dendropy.interop.seqgen. Documentation is lagging, but the following examples should be enough to get started, and the class is simple enough that all options should be pretty much self-documenting.

List All Modules Provided By A Python Package

The following is an example of how to use the “pkg_resources” module (provided by the setuptools project) to compose a list of all available modules in a Python package.

    #! /usr/bin/env python

    import sys
    try:
        import pkg_resources
    except ImportError:
        sys.stderr.write("'pkg_resources' could not be imported: setuptools installation required\n")
        sys.exit(1)

    def list_package_modules(package_name):
        """
        Returns list of module names for package `package_name`.
        """
        try:
            contents = pkg_resources.resource_listdir(package_name, "")
        except ImportError:
            return []
        module_names = []
        for entry in contents:
            if pkg_resources.
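For comparison, here is a sketch of a similar listing that uses only the standard library. It swaps in pkgutil.iter_modules in place of the pkg_resources approach described above, so it is a different technique and not a completion of that code. The function name list_package_modules_stdlib is an assumption, and unlike the original it imports the package in order to find its search path.

```python
import importlib
import pkgutil


def list_package_modules_stdlib(package_name):
    """
    Returns a sorted list of the immediate submodule names of `package_name`,
    or an empty list if it cannot be imported or is not a package.
    """
    try:
        package = importlib.import_module(package_name)
    except ImportError:
        return []
    # Only packages have a __path__; plain modules have no submodules.
    search_path = getattr(package, "__path__", None)
    if search_path is None:
        return []
    return sorted(info.name for info in pkgutil.iter_modules(search_path))


json_modules = list_package_modules_stdlib("json")
print(json_modules)
```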

Lazy-Loading Cached Properties Using Descriptors and Decorators

Python descriptors allow for rather powerful and flexible attribute management with new-style classes. Combined with decorators, they make for some elegant programming. One useful application of these mechanisms is lazy-loading properties, i.e., properties with values that are computed only when first called, returning cached values on subsequent calls. An implementation of this concept (based on this post) is:

    class lazy_property(object):
        """
        Lazy-loading read-only property descriptor.
        Value is computed and stored in owner class object's dictionary on first access.
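The general mechanism can be sketched as below. This is an illustrative variant and not the implementation from the full post: the names lazy_property_sketch and Dataset are assumptions, and this version caches the value in the instance's own dictionary and does not enforce read-only access. Because the descriptor defines only __get__, the cached instance attribute shadows it on every later lookup, so the wrapped function runs at most once per instance.

```python
class lazy_property_sketch:
    """Non-data descriptor that computes a value once and caches it."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__
        self.__doc__ = func.__doc__

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class rather than an instance.
            return self
        value = self.func(obj)
        # Store under the same name so later lookups find the cached
        # value in the instance dictionary and skip this descriptor.
        obj.__dict__[self.name] = value
        return value


class Dataset:
    """Hypothetical class used only to demonstrate the descriptor."""

    def __init__(self):
        self.compute_count = 0

    @lazy_property_sketch
    def total(self):
        self.compute_count += 1
        return sum(range(1000))


d = Dataset()
first = d.total   # computed on first access
second = d.total  # served from the instance dictionary
print(first, second, d.compute_count)
```

On Python 3.8 and later, functools.cached_property in the standard library provides comparable behaviour.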
