The Traveler’s Restaurant Process — A Better Description of the Dirichlet Process for Partitioning Sets

I. "Have Any of These People Ever Been to a Chinese Restaurant?" The Dirichlet process is a stochastic process that can be used to partition a set of elements into a set of subsets. In biological modeling, it is commonly used to assign elements into groups, such as molecular sequence sites into distinct rate categories. Very often, an intuitive explanation as to how it works invokes the "Chinese Restaurant Process" analogy. I have always found this analogy very jarring and confusing,
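To make the partitioning idea concrete, here is a minimal sketch of the usual sequential sampling scheme for a Dirichlet process partition. It is an illustration only, not code from the post: the function name `sample_partition` and the concentration parameter `alpha` are assumptions introduced here. Each element joins an existing subset with probability proportional to that subset's current size, or starts a new subset with probability proportional to `alpha`.

```python
import random


def sample_partition(elements, alpha=1.0, rng=None):
    """Randomly partition `elements` into subsets (hypothetical helper).

    alpha is the assumed concentration parameter (must be > 0):
    larger values tend to produce more, smaller subsets.
    """
    rng = rng or random.Random()
    subsets = []
    for element in elements:
        # One weight per existing subset (its size), plus alpha for a new subset.
        weights = [len(subset) for subset in subsets] + [alpha]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(subsets):
            subsets.append([element])       # open a new subset
        else:
            subsets[choice].append(element)  # join an existing subset
    return subsets


if __name__ == "__main__":
    # e.g., assigning 20 sequence sites to rate categories
    sites = list(range(20))
    for category in sample_partition(sites, alpha=1.0, rng=random.Random(1)):
        print(category)
```

Every element ends up in exactly one subset, and the number of subsets is not fixed in advance, which is what makes the process useful for problems such as grouping sites into an unknown number of rate categories.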

“Joy Plots” — Great Plot Style for Visualizing Distributions on Discrete/Categorical or Multiple Continuous Variables

R doing what R does really, really, really, really, really, really, *R*eally well: visualization. Folks, this might be THE plot to use to visualize distributions of discrete/categorical variables or simultaneous distributions of multiple continuous variables, replacing or at least taking up a seat alongside the violin plots as the current best approach IMHO. Source code repository: ggjoy Example of use (EDIT: This plot style is named after the "Joy Division", due to a similar

Solving the “Could not find all biber source files” Error

Biblatex is a fantastic bibliography/citation manager for LaTeX. It trumps the older bibtex for its much easier customizability and configuration. It does, however, have one bug that can be very perplexing to figure out due to the misleading error message that results: "Could not find all biber source files". At first glance this message seemed straightforward enough to send me poking about the project file structure and build system, checking paths and names. When all that seemed intact, I started

Vim: Insert Mode is Like the Passing Lane

Insert mode is not the mode for editing text. It is a mode for editing text, because both normal and insert modes are modes for editing text. Insert mode, however, is the mode for inserting new/raw text directly from the keyboard (as opposed to, e.g., from a register or a file). Thus, you will only be in insert mode when you are actually typing in (raw) text directly. For almost every other editing operation, normal mode is where you will be. Once you grok this you will realize that,

From Acolyte to Adept: The Next Step After NOP-ing Arrow Keys in Vim

We all know about no-op'ing arrow keys in Vim to get us to break the habit of relying on them for inefficient movement. But, as this post points out, it is not the location of the arrow keys that makes them inefficient, but the modality of the movement: single-stepping in insert mode is a horrible way to move around when normal mode provides so much better functionality. But here is the thing: while normal mode provides for much better and more efficient ways to move around than insert mode,

Setting up a Python Scientific Environment (NumPy, SciPy, pandas, StatsModels, etc.) in OS X 10.9 Mavericks

It is better than a nightmare from which you cannot wake up ... Install Homebrew: $ ruby -e "$(curl -fsSL" Find problems and fix them (typically resulting from Homebrew becoming very stroppy if it does not have exclusive access to "/usr/local"): $ brew doctor VERY IMPORTANT NOTE: Make ABSOLUTELY sure that the DYLD_LIBRARY_PATH environment variable is NOT set. Having this set will cause all sorts of problems

Taking it to a 11: Dramatically Speeding Up Keyboard/Typing Responsiveness in OSX

If you use a Mac/OSX, then enter the following commands in your shell and reboot: $ defaults write -g KeyRepeat -int 0 $ defaults write -g InitialKeyRepeat -int 15 If you live in a text editor or the shell, or otherwise spend most of your time hammering away at the keyboard like I do, then this makes an absolutely wonderful difference in the responsiveness of any typing activity. It will make your previous typing feel like you were pecking away in slow motion at the

Dynamic On-Demand LaTeX Compilation

Most of the existing approaches to integrating LaTeX compilation into a LaTeX writing workflow centered around a text editor (as opposed to a fancy-schmancy IDE) are horrendously bloated creatures, aggressively and voraciously hijacking so many key-mappings and normal functionality that it makes your Vim feel like it is diseased and is experiencing a pathological personality disorder of some kind. Yes, LaTeX-Suite, I am looking mainly at you. I did not want a platoon of obnoxiously cheery elves

Setting up the Text Editor in My Computing Ecosystem

Basic Setup of Shell to Support My Text Editor Preferences. By "text editor", I mean Vim, of course. There are pseudo-operating systems that include rudimentary text-editing capabilities (e.g. Emacs), and integrated development environments that allow for editing of text, but there really is only one text editor that deserves the title of "text editor": Vim, that magical mind-reading mustang that carries out textual mogrifications with surgical precision

Smart (`infercase`) Dictionary Completions in Vim While Preserving Your Preferred `ignorecase` Settings

Dictionary completions in Vim can use an 'infer case' mode, where, e.g., "Probab" will correctly autocomplete to, e.g., "Probability", even though the entry in the dictionary might be in a different case. The problem is that this mode only works if `ignorecase` is on. And sometimes, we want one (`infercase`) but not the other (`ignorecase`). The following function, if added to your "`~/.vimrc`", sets it up so that `ignorecase` is forced on when dictionary completions are invoked via

Building MacVim Natively on OS X 10.7 and Higher

You might want to do this if you want to install the latest snapshot and no pre-built release is available. Or you might want MacVim to use a custom Python installation instead of the default one on the system path. The latter was my motivation. Once you have downloaded and unpacked the code base that you want to build, step into the `src/` subdirectory: $ cd src Before proceeding, make sure that your Python installations have been built with the `--enable-shared` option! If this is not the

Using Python’s “timeit” Module to Benchmark Functions Directly (Instead of Passing in a String to be Executed)

All the basic examples for Python's timeit module show strings being executed. This leads to, in my opinion, somewhat convoluted code such as:

    #! /usr/bin/env python
    import timeit

    def f():
        pass

    if __name__ == "__main__":
        timer = timeit.Timer("__main__.f()", "import __main__")
        result = timer.repeat(repeat=100, number=100000)
        print("{:8.6f}".format(min(result)))

For some reason, the fact that you can call a function directly is only (again, in my opinion) obscurely
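For comparison, here is a minimal sketch of the direct-call form that the title refers to. `timeit.Timer` accepts a zero-argument callable in place of a statement string, so no `import __main__` setup string is needed. The wrapper `best_time` and its default `repeat`/`number` values are hypothetical, chosen here only for illustration.

```python
import timeit


def f():
    """Placeholder workload standing in for the function being benchmarked."""
    pass


def best_time(func, repeat=5, number=10000):
    """Return the fastest total time (seconds) over `repeat` runs of `number` calls.

    Hypothetical convenience wrapper; timeit.Timer is given the callable itself.
    """
    timer = timeit.Timer(func)
    return min(timer.repeat(repeat=repeat, number=number))


if __name__ == "__main__":
    print("{:8.6f}".format(best_time(f)))
```

To benchmark a function that takes arguments, one option is to wrap the call in a `lambda` or `functools.partial`, keeping in mind that the wrapper's own call overhead is included in the measurement.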

‘xargs’ – Handling Filenames With Spaces or Other Special Characters

xargs is a great little utility to perform batch operations on a large set of files. Typically, the results of a find operation are piped to the xargs command: find . -iname "*.pdf" | xargs -I{} mv {} ~/collections/pdf/ The -I{} tells xargs to substitute '{}' in the statement to be executed with the entries being piped through. If these entries have spaces or other special characters, though, things will go awry. For example, filenames with spaces in them passed to xargs will result in xargs
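The underlying issue can be illustrated with a small sketch. This is a simplified model of delimiter handling, not xargs's actual parser, and the helper names and sample filenames are made up for the example. Splitting a stream of names on whitespace breaks a name containing a space into pieces, whereas splitting on NUL bytes keeps each name intact. NUL-delimited streams are what `find -print0` emits and `xargs -0` consumes in GNU and BSD implementations; whether that is the remedy this post goes on to describe is an assumption.

```python
def split_on_whitespace(stream):
    """Simplified model: treat any run of whitespace as a separator."""
    return stream.split()


def split_on_nul(stream):
    """Treat only the NUL byte as a separator, dropping the trailing empty item."""
    return [name for name in stream.split("\0") if name]


# Hypothetical filenames, one of which contains a space.
filenames = ["annual report.pdf", "notes.pdf"]

newline_stream = "\n".join(filenames) + "\n"  # what a plain `find` would print
nul_stream = "\0".join(filenames) + "\0"      # what `find -print0` would print

print(split_on_whitespace(newline_stream))  # ['annual', 'report.pdf', 'notes.pdf']
print(split_on_nul(nul_stream))             # ['annual report.pdf', 'notes.pdf']
```

NUL is a safe delimiter because it is one of the few bytes that cannot appear in a Unix filename.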

YonderGit: Simplified Git Remote Repository Management

One of the great strengths of Git is the multiple and flexible ways of handling remote repositories. Just like Subversion, they can be "served" out of a location, but more generally, if you can reach it from your computer through any number of ways (ssh, etc.), you can git it. YonderGit wraps up a number of common operations with remote repositories: creating, initializing, adding to (associating with) the local repository, removing, etc. You can clone your own copy of the YonderGit

Some Vim Movement Tips

Within-line character-based movement: `h` and `l` move you left and right one character, respectively. `fc` or `Fc` will take you forward to the next or back to the previous, respectively, occurrence of character "c" on the current line (e.g., `fp` will jump you forward to the next occurrence of "p" on the line, while `Fp` will jump you back to the previous occurrence of "p" on the line). `tc` or `Tc` will take you forward to just before the next or back to just

Safe and const-correct std::map Access in C++ STL

The Standard Template Library's std::map operator[] will create and return a new entry if passed a key that does not already exist in the map. This means that you cannot use this operator when you do not want to create a new entry (i.e., you expect the key-value pair to already exist in the map), or in a const context (i.e., in a const method or when using a const object). Instead, in these situations, you need to first pull a (const) iterator using std::map::find(), and then check to see if its value

Useful diff Aliases

Add the following aliases to your '~/.bashrc' for some diff goodness: alias diff-side-by-side='diff --side-by-side -W"`tput cols`"' alias diff-side-by-side-changes='diff --side-by-side --suppress-common-lines -W"`tput cols`"' You can, of course, use shorter alias names in good old UNIX tradition, e.g. 'ssdiff' and 'sscdiff'. You might be wondering (a) why I did not do so, and (b) what the point is, conversely, of having aliases that are almost as long as the commands that they are

Setting Up Git to Use Your Diff Viewer or Editor of Choice

Git offers two ways of viewing differences between commits, or between commits and your working tree: diff and difftool. The first of these, by default, dumps the results to the standard output. This mode of presentation is great for quick summaries of small sets of changes, but is a little cumbersome if there are a large number of changes between the two commits being compared and/or you want to closely examine the changes, browsing back-and-forth between different files/lines, searching for specific