‘xargs’ – Handling Filenames With Spaces or Other Special Characters

xargs is a great little utility to perform batch operations on a large set of files.
Typically, the results of a find operation are piped to the xargs command:

   find . -iname "*.pdf" | xargs -I{} mv {} ~/collections/pdf/

The -I{} tells xargs to substitute ‘{}’ in the statement to be executed with the entries being piped through.
If these entries have spaces or other special characters, though, things will go awry.
For example, filenames with spaces in them passed to xargs will result in xargs barfing with a “xargs: unterminated quote” error on OS X.

The solution is to use null-terminated strings in both the find and xargs invocations:

   find . -iname "*.pdf" -print0 | xargs -0 -I{} mv {} ~/collections/pdf/

Note the -print0 argument to find, and the corresponding -0 argument to xargs: the former tells find to produce null-terminated entries while the latter tells xargs to expect and consume null-terminated entries.
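
As a self-contained illustration, here is a minimal sketch that wraps the null-terminated pipeline in a function. The name `collect_pdfs` and its two arguments are my own assumptions for the example, and it assumes the destination directory lies outside the directory being searched.

```shell
# collect_pdfs SRC_DIR DEST_DIR  (hypothetical helper, for illustration only)
# Move every PDF under SRC_DIR into DEST_DIR, coping with spaces, quotes
# and other special characters in the file names.
collect_pdfs() {
    local src="$1" dest="$2"
    mkdir -p "$dest" || return 1
    # -print0 / -0: entries are separated by NUL bytes, the one character
    # that cannot occur in a path, so no name is split or unquoted.
    find "$src" -type f -iname '*.pdf' -print0 |
        xargs -0 -I{} mv {} "$dest"/
}
```

Invoked as `collect_pdfs . ~/collections/pdf`, this should behave like the one-liner above.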

Useful diff Aliases

Add the following aliases to your ‘~/.bashrc’ for some diff goodness:

alias diff-side-by-side='diff --side-by-side -W"`tput cols`"'
alias diff-side-by-side-changes='diff --side-by-side --suppress-common-lines -W"`tput cols`"'


You can, of course, use shorter alias names in good old UNIX tradition, e.g. ‘ssdiff’ and ‘sscdiff’. You might be wondering (a) why I did not do so, and (b) what the point is, conversely, of having aliases that are almost as long as the commands that they are aliasing. The answer to the first is ‘memory’, and to the second ‘autocomplete’.



Shorter aliases resulted in me constantly forgetting what they were mapped to (I rarely work outside a Git repository, and thus rarely use external diff, relying on Git’s diff 99% of the time), and it was easier for me to Google the options than to open up my huge ‘~/.bashrc’ to look up my personal alias. And being forced not only to look up the options but then to type out all those awkward characters again and again meant that I rarely ended up using these neat diff options. However, now, with these aliases, I just type ‘diff’ and then hit ‘TAB’, and let autocompletion show me the commands and finish off the rest for me.

Supplementary Command-History Logging in Bash: Tracking Working Directory, Dates, Times, etc.


Here is a way to create a secondary shell history log (i.e., one that supplements the primary “~/.bash_history”) that tracks a range of other information, such as the working directory, hostname, time and date, etc. Using the “HISTTIMEFORMAT” variable, it is in fact possible to store the time and date with the primary history, but storing the other information is not as readily doable. Here, I present an approach based on this excellent post on StackOverflow.

The main differences between this approach and the original are:

  • I remove the option to log the extra information to the primary history file: I prefer to keep this history clean.
  • I add history number, host name, time/date stamp etc. to the supplementary history log by default.
  • I add field separators, making it easy to apply ‘awk‘ commands.
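
For the primary history, the timestamping mentioned above needs only one variable. The setting below is a sketch: the bracketed date and the " ~~~ " separator are inferred from the sample `h` output shown later in this post, not taken from the original configuration.

```shell
# Timestamp format for entries listed by `history`; the value is handed
# to strftime, so the usual % conversions apply. The trailing " ~~~ "
# is an assumed separator chosen to match the supplementary log.
export HISTTIMEFORMAT='[%Y-%m-%d %H:%M:%S] ~~~ '
```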

The (Supplementary) History Logger Function

First, add or source the following to your “~/.bashrc“:
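
A minimal sketch of such a logger follows. It is a stand-in written for illustration, not the original function: it takes no options, it assumes the log lives in “~/.bash_log” (the file the aliases in this post read), and `LOGHISTORY_FILE` and `_LOGHISTORY_DIR` are hypothetical names introduced here.

```shell
# Minimal supplementary history logger (illustrative sketch only).
_loghistory() {
    local HISTTIMEFORMAT=''     # keep primary-history timestamps out of the entry
    local entry dir
    entry=$(history 1)          # most recent entry, prefixed by its history number
    # Drop the leading history number (and the "*" that marks edited entries).
    entry=$(printf '%s\n' "$entry" | sed -e 's/^ *[0-9][0-9]*[* ] //')
    [ -n "$entry" ] || return 0
    # Log the directory recorded at the previous prompt, i.e. where the
    # command was issued, then remember the current one for next time.
    dir="${_LOGHISTORY_DIR:-$PWD}"
    _LOGHISTORY_DIR=$PWD
    # One record per line: [timestamp] ~~~ working directory ~~~ command
    printf '[%s] ~~~ %s ~~~ %s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$dir" "$entry" \
        >> "${LOGHISTORY_FILE:-$HOME/.bash_log}"
}
```

Note that this sketch writes a record at every prompt, so pressing ENTER on an empty line logs the previous command again; a fuller version would compare history numbers to avoid that.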


Activating the Logger

Then you need to set this function to execute on every command by adding it to your “$PROMPT_COMMAND” variable, so you need the following entry in your “~/.bashrc“:

    export PROMPT_COMMAND='_loghistory'

The logging function takes a number of options, including adding terminal information, adding arbitrary text, or executing one or more functions that generate appropriate text. See the function documentation for more info.

Add Some Useful Aliases

Add the following to your “~/.bashrc“:

# dump regular history log
alias h='history'
# dump enhanced history log
alias hh="cat $HOME/.bash_log"
# dump history of directories visited
# (the \$2 is escaped so that awk, not the shell, interprets it)
alias histdirs="cat $HOME/.bash_log | awk -F ' ~~~ ' '{print \$2}' | uniq"

Check Out the Results!

The ‘histdirs’ command is very useful for quickly listing, selecting (via copy and paste) and jumping back to a directory.

$ h
14095  [2011-11-23 15:36:20] ~~~ jd nuim
14096  [2011-11-23 15:36:21] ~~~ ll
14097  [2011-11-23 15:36:23] ~~~ git status
14098  [2011-11-23 15:36:33] ~~~ jd pytb
14099  [2011-11-23 15:36:36] ~~~ git status
14100  [2011-11-23 15:36:53] ~~~ git rm --cached config/*
14101  [2011-11-23 15:37:00] ~~~ git pull
14102  [2011-11-23 15:37:11] ~~~ e .gitignore
14103  [2011-11-23 15:37:28] ~~~ git status
14104  [2011-11-23 15:37:35] ~~~ e .gitignore
14105  [2011-11-23 15:37:44] ~~~ git status
14106  [2011-11-23 15:38:10] ~~~ git commit -a -m "stuff"
14107  [2011-11-23 15:38:12] ~~~ git pushall
14108  [2011-11-23 15:50:38] ~~~ ll build_c/
14109  [2011-11-23 15:53:16] ~~~ cd
14110  [2011-11-23 15:53:18] ~~~ ls -l
14111  [2011-11-23 16:00:12] ~~~ cd Documents/Projects/Phyloinformatics/DendroPy/dendropy
14112  [2011-11-23 16:00:15] ~~~ ls -l
14113  [2011-11-23 16:00:22] ~~~ cd dendropy/
14114  [2011-11-23 16:00:24] ~~~ vim *.py

$ hh
[2011-11-23 15:36:20] ~~~ /Users/jeet ~~~ jd nuim
[2011-11-23 15:36:21] ~~~ /Users/jeet/Documents/Projects/Phyloinformatics/nuim ~~~ ll
[2011-11-23 15:36:23] ~~~ /Users/jeet/Documents/Projects/Phyloinformatics/nuim ~~~ git status
[2011-11-23 15:36:33] ~~~ /Users/jeet/Documents/Projects/Phyloinformatics/nuim ~~~ jd pytb
[2011-11-23 15:36:36] ~~~ /Users/jeet/Documents/Projects/Phyloinformatics/pytbeaglehon ~~~ git status
[2011-11-23 15:36:53] ~~~ /Users/jeet/Documents/Projects/Phyloinformatics/pytbeaglehon ~~~ git rm --cached config/*
[2011-11-23 15:37:00] ~~~ /Users/jeet/Documents/Projects/Phyloinformatics/pytbeaglehon ~~~ git pull
[2011-11-23 15:37:11] ~~~ /Users/jeet/Documents/Projects/Phyloinformatics/pytbeaglehon ~~~ e .gitignore
[2011-11-23 15:37:28] ~~~ /Users/jeet/Documents/Projects/Phyloinformatics/pytbeaglehon ~~~ git status
[2011-11-23 15:37:35] ~~~ /Users/jeet/Documents/Projects/Phyloinformatics/pytbeaglehon ~~~ e .gitignore
[2011-11-23 15:37:44] ~~~ /Users/jeet/Documents/Projects/Phyloinformatics/pytbeaglehon ~~~ git status
[2011-11-23 15:38:10] ~~~ /Users/jeet/Documents/Projects/Phyloinformatics/pytbeaglehon ~~~ git commit -a -m "stuff"
[2011-11-23 15:38:12] ~~~ /Users/jeet/Documents/Projects/Phyloinformatics/pytbeaglehon ~~~ git pushall
[2011-11-23 15:50:38] ~~~ /Users/jeet/Documents/Projects/Phyloinformatics/pytbeaglehon ~~~ ll build_c/
[2011-11-23 15:53:16] ~~~ /Users/jeet/Documents/Projects/Phyloinformatics/pytbeaglehon ~~~ cd
[2011-11-23 15:53:18] ~~~ /Users/jeet ~~~ ls -l
[2011-11-23 16:00:12] ~~~ /Users/jeet ~~~ cd Documents/Projects/Phyloinformatics/DendroPy/dendropy
[2011-11-23 16:00:15] ~~~ /Users/jeet/Documents/Projects/Phyloinformatics/DendroPy/dendropy ~~~ ls -l
[2011-11-23 16:00:22] ~~~ /Users/jeet/Documents/Projects/Phyloinformatics/DendroPy/dendropy ~~~ cd dendropy/
[2011-11-23 16:00:24] ~~~ /Users/jeet/Documents/Projects/Phyloinformatics/DendroPy/dendropy/dendropy ~~~ vim *.py

$ histdirs


Stripping Paths from Files in TAR Archives

There is no simple, portable way to get tar to ignore the directory paths of the files that it is archiving. So, for example, if you have a large number of files scattered about in subdirectories, you cannot simply tell tar to archive all the files while ignoring their subdirectories, such that unpacking the archive extracts all the files to the same location. You can, however, tell tar to strip a fixed number of elements from the full (relative) path to the file when extracting, using the “--strip-components” option. For example:

tar --strip-components=2 -xvf archive.tar.gz

This will strip the first two elements of the paths of all the archived files. To get an idea of what this will look like before extracting, you can use the “-t” (“table of contents”, i.e., list) option in conjunction with the “--show-transformed” option:

tar --strip-components=2 -t --show-transformed -f archive.tar.gz

The “--strip-components” approach only works if all the files that you are extracting are at the same relative depth. Files that are “shallower” will not be extracted, while files that are deeper will still be extracted to sub-directories. The only clean solution to this that I can think of would be to extract all the files to a temporary location and then move all the files to a single directory:

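mkdir /tmp/work
cd /tmp/work
tar -xvzf /path/to/archive.tar.gz
mkdir collected
# prune collected/ so files already moved there are not visited again;
# files that share a name will overwrite one another
find . -path ./collected -prune -o -type f -exec mv {} collected/ \;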
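
The same steps can be wrapped in a function that uses a throwaway scratch directory. This is a sketch under assumptions: the name `untar_flat` and its arguments are hypothetical, and when two archived files share a name only the first one moved is kept (the others are discarded along with the scratch directory).

```shell
# untar_flat ARCHIVE DEST_DIR  (hypothetical helper, for illustration only)
# Extract ARCHIVE and collect all of its regular files directly in DEST_DIR.
untar_flat() {
    local archive="$1" dest="$2" tmp
    if [ -z "$archive" ] || [ -z "$dest" ]; then
        echo 'usage: untar_flat ARCHIVE DEST_DIR' >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    tmp=$(mktemp -d) || return 1
    # Unpack with the full directory structure into the scratch directory.
    tar -xf "$archive" -C "$tmp" || { rm -rf "$tmp"; return 1; }
    # Move every regular file into DEST_DIR; -n refuses to overwrite a
    # file that is already there.
    find "$tmp" -type f -exec mv -n {} "$dest"/ \;
    rm -rf "$tmp"
}
```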

Piping Output Over a Secure Shell (SSH) Connection

We all know about using scp to transfer files over a secure shell connection.
It works fine, but there are many cases where alternate modes of usage are required, for example, when you want to transfer the output of one program directly to a remote machine for storage.
Here are some ways of going about doing this.

Let "$PROG" be a program that writes data to the standard output stream.

  • Transferring without compression:
    $PROG | ssh destination.ip.address 'cat > ~/file.txt'
  • Using gzip for compression:
    $PROG | gzip -f | ssh destination.ip.address 'gunzip > ~/file.txt'
  • Better compression can usually be achieved by bzip2:
    $PROG | bzip2  | ssh destination.ip.address 'bunzip2 > ~/Scratch/file.txt'

I find this useful enough to source the following function into all my shells:

## xof #######################################################################
# Pipe from standard input to remote file.
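xof() {
    if [[ -z $1 || -z $2 ]]
    then
        echo 'usage: <command> | xof <remote-host> <remote-file>'
        echo 'Pipe standard input to remote file.'
        return 1
    fi
    # compress locally, decompress on the remote side
    bzip2 | ssh "$1" "bunzip2 > $2"
}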

And when I want to get fancy, I pipe the output directly to my favorite text editor, BBEdit:

## xobb #######################################################################
# Pipe from standard input to bbedit.
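xobb() {
    if [[ -z $1 ]]
    then
        echo 'usage: <command> | xobb <remote-host>'
        echo 'Pipe standard input to BBEdit.'
        return 1
    fi
    # compress locally, decompress remotely and hand the text to BBEdit
    bzip2 | ssh "$1" 'bunzip2 | bbedit'
}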

On a tangential note, if you have a large number of files that you want to transfer, the following is more efficient than separately tar-ing and scp-ing:

tar cf - *.t | ssh destination.ip.address "tar xf - -C /home/jeet/projects/bbi2"

Neat Bash Trick: Open Last Command for Editing in the Default Editor and then Execute on Saving/Exiting

This is pretty slick: enter “fc” in the shell and your last command opens up for editing in your default editor (as given by “$EDITOR”). Works perfectly with vi. The “$EDITOR” variable approach does not seem to work with BBEdit though, and you have to specify the editor explicitly:

$ fc -e '/usr/bin/bbedit --wait'

With vi, “:cq” aborts execution of the command.

`gcd` – A Git-aware `cd` Relative to the Repository Root with Auto-Completion

The following will enable you to have a Git-aware "cd" command with directory path expansion/auto-completion relative to the repository root. You will have to source it into your "~/.bashrc" file, after which invoking "gcd" from the shell will allow you to specify directory paths relative to the root of your Git repository no matter where you are within the working tree.

gcd() {
    if [[ $(which git 2> /dev/null) ]]; then
        STATUS=$(git status 2>/dev/null)
        if [[ -z $STATUS ]]; then
            return    # not inside a Git working tree
        fi
        TARGET="./$(git rev-parse --show-cdup)$1"
        #echo $TARGET
        cd "$TARGET"
    fi
}

# completion function for gcd
_gcd() {
    if [[ $(which git 2> /dev/null) ]]; then
        STATUS=$(git status 2>/dev/null)
        if [[ -z $STATUS ]]; then
            return
        fi
        TARGET="./$(git rev-parse --show-cdup)"
        if [[ -d $TARGET ]]; then
            cur=$2    # the word being completed
            dirnames=$(cd "$TARGET"; compgen -o dirnames "$cur")
            opts=$(for i in $dirnames; do  if [[ $i != ".git" ]]; then echo $i/; fi; done)
            if [[ ${cur} == * ]] ; then
                COMPREPLY=( $(compgen -W "${opts}" -- "${cur}") )
                return 0
            fi
        fi
    fi
}
complete -o nospace -F _gcd gcd

Boosting Interactive Bash Efficiency Through History Search Completion Editing

Most of us know about using the bang operator (`!`) to recall an entry from our bash history:

$ !! # repeat last command
$ !22 # repeat command 22

You can use “!:” followed by a number to substitute in arguments from previous commands. So, for example, to run the command “dosomething” on the first argument of the previous command:

$ dosomething !:1

The fc command is also very useful, opening up the default editor to let you edit previous commands. Saving and exiting will execute the command, while canceling the save (e.g., “:cq” in Vim) will abort.

$ fc # "fix" previous command
$ fc -10 0 # "fix" previous 10 commands

Things get really slick when using the `CTRL-R` and `CTRL-S` operations. These will incrementally match a string against entries in your shell history.

For example, in my shell, typing `CTRL-R` followed by “mak” yields:

(reverse-i-search)`mak': make PREFIX=~/Documents/System/Environment/local/ install

While typing `CTRL-R` followed by “Pro” yields:

(reverse-i-search)`Pro': cd Documents/Projects/Miscellaneous/msbayes-ext/

From here, I can use `CTRL-R` or `CTRL-S` to scroll backward or forward through all possible matches. I can accept and execute the match by hitting `ENTER`, or accept the match but not execute it by hitting `ESC` (thus allowing me to edit the line). Hitting `CTRL-C` or `CTRL-G` aborts and returns me to the regular interactive shell.

This incremental history search is fantastic! No more “history | grep something” followed by cutting-and-pasting and then editing. But often I have already started typing something, and then decide that I want to complete this from some entry in my history. In these cases, `CTRL-R` and `CTRL-S` will not be as efficient, as invoking them will result in me having to re-type the partially entered command. However, a few lines added to my “~/.bashrc” set Bash’s readline options to allow me to call up completion based on incremental matching of my partial command against my shell’s history:

# Ctrl-p: search in previous history
bind 'Control-p: history-search-backward'
bind -m vi-insert 'Control-p: history-search-backward'
bind -m vi-command 'Control-p: history-search-backward'

# Ctrl-n: search in next history
bind 'Control-n: history-search-forward'
bind -m vi-insert 'Control-n: history-search-forward'
bind -m vi-command 'Control-n: history-search-forward'

The above settings provide this behavior in both Vi-mode readline (my default) and Emacs-mode readline (readline’s default).

Incidentally, while not history-editing specific, if you do not already know about the `v` or `CTRL-X CTRL-E` commands, you really should check them out. The former works in Vi-mode, while the latter works in Emacs-mode. Both of them open up the current command for editing in your default editor. In combination with the history search/completion options described above, it makes for a really potent and efficient Bash session: you can call up a previous command using `CTRL-R`/`CTRL-N`, hit `ESC` and then `v` or `CTRL-X CTRL-E` to open up the default text editor and tweak the line using the full power and sophistication afforded by the editor. Then, simply save and exit (e.g., “:wq”) to execute, or cancel (e.g., “:cq”) to abort. I use Vi-mode readline, and I have mapped `CTRL-V` to call up Vim on my command when I am in insert mode by adding the following lines to my “~/.bashrc”:

# Ctrl-v: (insert mode) switch to command mode and edit in vi
bind '"\C-v": "\ev"'

Bash Function to Return the Absolute Path of a Directory Name Passed as an Argument

For some reason, a portable solution (i.e., something that works on most common flavors of POSIX systems, from the Linux variety to the Unix ones) to this is a little tricky.

Here is one that seems to do the job:

get_abs_path() {
    local PARENT_DIR=$(dirname "$1")
    cd "$PARENT_DIR"
    # quote "$1" so that names containing spaces survive basename
    local ABS_PATH="$(pwd)"/"$(basename "$1")"
    cd - >/dev/null
    echo "$ABS_PATH"
}
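
A variant of the same idea is sketched below. It runs in a subshell, so the caller's working directory and "$OLDPWD" are never touched. The name `abs_path` is hypothetical; like the function above, it does not resolve symbolic links, and it prints a doubled slash for entries that sit directly under "/".

```shell
# abs_path PATH  (hypothetical name; a sketch, not a realpath replacement)
abs_path() {
    # The parentheses run the body in a subshell, so the cd below cannot
    # leak into the caller's shell.
    (
        cd -- "$(dirname -- "$1")" >/dev/null 2>&1 || exit 1
        printf '%s/%s\n' "$(pwd)" "$(basename -- "$1")"
    )
}
```

The function returns a non-zero status when the parent directory does not exist, which makes it usable in conditionals.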

Dealing with ‘Argument list too long’ Problems

The solution to the “Argument list too long” error when trying to archive a large number of files is to use the “-T” option of the “tar” command to pass in a list of files generated by a “find” command:

  1. Create a list of the files to be archived using the "find" command:
    $ find . -name "*.tre" > filelist.txt
  2. Use the “-T” option of the “tar” command to pass in this list of filenames:
    $ tar cvjf archive.tbz -T filelist.txt
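
The two steps can also be joined into a single pipeline that skips the intermediate file and is safe for file names containing spaces or newlines. The sketch below assumes a tar that accepts “--null” together with “-T -” (GNU tar and bsdtar both do, as far as I know); the function name `archive_matching` is hypothetical, and the archive is left uncompressed to keep the example minimal (add “-j” for bzip2, as in the command above).

```shell
# archive_matching PATTERN ARCHIVE  (hypothetical helper, for illustration)
# Archive every file under the current directory whose name matches PATTERN.
archive_matching() {
    local pattern="$1" archive="$2"
    # find emits NUL-terminated names; "--null -T -" makes tar read that
    # list from standard input, so no long argument list is ever built.
    find . -type f -name "$pattern" -print0 |
        tar -cf "$archive" --null -T -
}
```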

If you want to delete a long list of files, however, this approach will not work, as “rm” does not support the very convenient “-T“/”--files-from” flag or the equivalent (so convenient, in fact, that I have started adding this to virtually every file-processing script or program that I write).

Luckily, however, “find” does support a “-delete” flag, so to recursively delete all files and directories:

find path/to/dir -delete

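You can use a “-type f” argument to limit the operation only to files, and the “-depth 1” argument to limit the operation only to the current directory (this is the BSD/OS X form; GNU find does not accept a number after “-depth”, so use “-maxdepth 1” there), so that: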

find path/to/dir -type f -depth 1 -delete

will delete files in the specified directory, without touching subdirectories or the files within them.
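
For a form that should work with both GNU and BSD find, “-maxdepth 1” can be used in place of “-depth 1”. A minimal sketch, with a hypothetical function name:

```shell
# delete_files_here DIR  (hypothetical helper, for illustration only)
# Delete the regular files directly inside DIR, leaving subdirectories
# and their contents alone.
delete_files_here() {
    # -maxdepth 1 keeps find from descending into subdirectories;
    # -type f restricts the match to regular files.
    find "$1" -maxdepth 1 -type f -delete
}
```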

Note that using “find” in conjunction with the “-delete” flag is probably faster than any other approach, including using the “-exec rm {} \;” argument to “find” or looping over the files in a shell script. However, if you want to get rid of an entire directory and all sub-directories, then simply issuing an “rm -r” is, of course, a better performer.

Add the Following Lines to Your `~/.bashrc` and You Will Be Very Happy

I added the following to my `~/.bashrc` and I am loving it!

## Up Arrow: search and complete from previous history
bind '"\eOA": history-search-backward'
## alternate, if the above does not work for you:
#bind '"\e[A":history-search-backward'

## Down Arrow: search and complete from next history
bind '"\eOB": history-search-forward'
## alternate, if the above does not work for you:
#bind '"\e[B":history-search-forward'

(which pair works depends on the escape sequence your terminal sends for the arrow keys: “\eOA”/“\eOB” in application cursor-key mode, “\e[A”/“\e[B” otherwise)

The first command rebinds the up arrow from “previous-history”, which unconditionally selects the immediately preceding command from your command history, to “history-search-backward”, which selects the previous command in your history that begins with the characters you have already typed in.

The second command does the same, but with the down arrow key, in the opposite direction.

For example, if you have nothing typed at the prompt, then the up arrow and down arrow keys will work just as before, moving through your history one step up or down respectively.

However, if you type “fi” and then press the up arrow key, the previous command entered that begins with “fi” (e.g., “find …”) will be filled out at the prompt. Press the up arrow key again to select the command beginning with “fi” preceding that, and so on. The down arrow iterates downward over commands beginning with “fi”.