Syrupy -- System Resource Usage Profiler

Introduction

Syrupy is a Python script that regularly takes snapshots of the memory and CPU load of one or more running processes, so as to dynamically build up a profile of their usage of system resources.

Syrupy works in one of two modes. In the first (default) mode, it monitors the resource usage of a process resulting from the execution of a user-specified command (which can be any arbitrarily complex combination of programs and arguments that can be invoked from a shell terminal). In the second mode, Syrupy monitors the resource usage of external running processes that meet user-specified criteria: a PID (Process IDentifier) number or a command string matching a regular expression pattern.

In either case, the monitoring of the system resource usage is based on repeated calls to the system command "ps".
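
As a rough illustration of this approach (a sketch only, not Syrupy's actual code), the following Python fragment asks "ps" for a few fields of one process and parses the reply. The function names and the choice of "-o" fields are assumptions made for this example; "rss" and "pmem" are widely supported but are not guaranteed by POSIX.

```python
import subprocess

# Fields requested from "ps" (an assumption for this sketch).
PS_FIELDS = ["pid", "pcpu", "pmem", "rss", "vsz"]

def build_ps_command(pid):
    """Build a "ps" command line; "field=" asks for an empty column header."""
    cmd = ["ps", "-p", str(pid)]
    for field in PS_FIELDS:
        cmd += ["-o", field + "="]
    return cmd

def parse_ps_line(line):
    """Turn one whitespace-separated line of "ps" output into a dictionary."""
    pid, cpu, mem, rss, vsize = line.split()
    return {"pid": int(pid), "cpu": float(cpu), "mem": float(mem),
            "rss": int(rss), "vsize": int(vsize)}

def snapshot(pid):
    """Call "ps" once for the given PID; return None if nothing is reported."""
    result = subprocess.run(build_ps_command(pid),
                            capture_output=True, text=True)
    lines = result.stdout.strip().splitlines()
    return parse_ps_line(lines[0]) if lines else None
```

Calling snapshot() repeatedly at a fixed interval would then give one row of the profile per call.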

Prerequisites

Given that Syrupy is ultimately nothing more than a glorified wrapper and parser of "ps" output, naturally the second-most important precondition for Syrupy to work on your system is to have the "ps" command available (the first-most important is to have Python available). This is almost certainly true of all POSIX or mostly POSIX-compliant systems, including the various flavors of UNIX, Linux, Apple Mac OS X, etc.

Unless they happen to have a POSIX-compliant subsystem (such as Cygwin) installed, however, Microsoft Windows users are out of luck. Though that is hardly news.

Apart from requiring "ps" or something similar on your system (and, of course, Python), Syrupy has no other dependencies whatsoever: it is a single self-contained pure Python script and uses nothing but the standard Python libraries.

Downloading

The entire Syrupy package can be downloaded as a single archive from the SourceForge site:

https://sourceforge.net/project/platformdownload.php?group_id=242033

If you have Git installed on your system, you can clone your own copy of the repository by entering:

git clone http://jeetworks.org/files/repositories/syrupy.git

Installing

As mentioned above, Syrupy is a self-contained single-module (single-file) executable script. So you do not even really need to "install" it as such: you can simply unpack the archive and use the script from wherever you end up saving it. Or you can copy it onto the system path so that it can be invoked globally. In fact, all the installation procedure does is automate the latter for you, and this can be carried out by entering the unpacked archive directory and typing:

$ sudo python setup.py install

Usage

The discussion that follows highlights some of the main features of Syrupy, but full help and information on all options is readily available by entering:

$ syrupy.py --help

Basic Invocation (Default Mode)

In its default mode, Syrupy is invoked via the following syntax:

$ syrupy.py [SYRUPY OPTIONS] COMMAND [COMMAND OPTIONS/ARGS]

In this context, "COMMAND [COMMAND OPTIONS/ARGS]" can be anything that you would normally invoke from a terminal, such as a program, executable, or interpretable script, with or without options and/or arguments:

$ syrupy.py /usr/local/bin/program
$ syrupy.py program.sh -o1 --opt2=foo -opt3="foo foo" /path/to/foo foo
$ syrupy.py find . -name "*.csv"
$ syrupy.py python script.py -o1 -o2 foo "something pithy"

Basic Invocation (External Process Mode)

In this mode, Syrupy will monitor one or more processes that are already running on the system based on user-specified criteria. This mode is invoked by supplying values for one or more of the following options:

  • "-p" or "--poll-pid"
  • "-c" or "--poll-command"

The first option takes an integer value representing the PID (Process IDentifier) number of the process to be monitored. The second option takes a string value representing a regular expression or grep pattern matching the command string that invoked the process or processes to be monitored. For example:

$ syrupy.py -p 20912
$ syrupy.py --poll-pid=30011
$ syrupy.py -c '[Pp]ython'
$ syrupy.py --poll-command='.*java'

Note that you should protect the regular expression with quotes so as to avoid confusing the shell.

If both options are given, then only processes that meet both the PID condition and the command regular expression will be monitored (obviously, since the PID condition will match at most one process, specifying both options is at best redundant, and at worst will get you nothing).

As long as processes are found that meet the specified conditions, Syrupy will continue to run, monitoring the selected processes. When no processes are found meeting the conditions specified, Syrupy will terminate. Note that if Syrupy starts up and does not find any processes matching the specified criteria, it will exit immediately. Thus it is important that the processes to be monitored (or, to be more precise, matching the specified criteria) be already running before invoking Syrupy. Each Syrupy instance automatically excludes itself when searching for processes matching a particular ID or command pattern: you cannot use an instance of Syrupy to monitor itself! However, you can use an instance of Syrupy to monitor another instance of the same program.
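
The selection rules described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than Syrupy's own implementation: the function name and its arguments are hypothetical, the process list is supplied by the caller as (pid, command) pairs, and the pattern is applied with a search anywhere in the command string.

```python
import os
import re

def select_processes(processes, poll_pid=None, poll_command=None, own_pid=None):
    """Filter (pid, command) pairs by PID and/or command pattern."""
    if own_pid is None:
        own_pid = os.getpid()
    pattern = re.compile(poll_command) if poll_command is not None else None
    selected = []
    for pid, command in processes:
        if pid == own_pid:
            continue  # the monitoring instance never selects itself
        if poll_pid is not None and pid != poll_pid:
            continue  # PID condition given but not met
        if pattern is not None and not pattern.search(command):
            continue  # command condition given but not met
        selected.append((pid, command))
    return selected
```

An empty result corresponds to the case in which the monitor would have nothing left to watch and so would terminate.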

Basic Output

Given commands like the above, Syrupy will (if all is well) respond by writing something like the following to the standard output:

  PID DATE        TIME     ELAPSED  CPU   MEM    RSS  VSIZE
14634 2008-10-10  20:45:25   00:00  0.0   0.0   2996   6680
14634 2008-10-10  20:45:26   00:01  105   0.2   7804  12592
14634 2008-10-10  20:45:27   00:02  103   0.2   8996  13776
14634 2008-10-10  20:45:28   00:03  103   0.2  10468  15348
14634 2008-10-10  20:45:29   00:04  103   0.3  11412  16396
14634 2008-10-10  20:45:30   00:05  104   0.3  12492  17444
...
etc.

Each row represents an instantaneous snapshot taken at regular intervals of the CPU and memory usage of the process or processes being monitored by Syrupy. In the case of the default mode, this is the process resulting from the user-specified COMMAND invoked by Syrupy, while in the second mode, this is any number of external processes that match the specified criteria.

Thus, over time Syrupy builds up a system resource usage profile of a particular program or programs (hence the name: SYRUPY = SYstem Resource Usage Profile ...um, Yeah).

The meanings of the various fields are given by entering the following:

$ syrupy.py --explain

Which will tell you that:

PID       Process IDentifier -- a number used by the operating system
          kernel to uniquely identify a running program or process.
DATE      The calendar date, given as YEAR-MONTH-DAY, that the process
          was polled.
TIME      The actual time, given as HOUR:MINUTE:SECOND
          that the process was polled.
ELAPSED   The total time that the process had been running up to the
          time it was polled.
CPU       The CPU utilization of the process: CPU time used divided by
          the time the process has been running
          (cputime/realtime ratio), expressed as a
          percentage.
MEM       The memory utilization of the process: ratio of the
          process's resident set size to the physical memory
          on the machine, expressed as a percentage.
RSS       Resident Set Size -- the non-swapped physical memory (RAM)
          that a process is occupying (in kiloBytes). The
          rest of the process memory usage is in swap. If
          the computer has not used swap, this number will
          be equal to VSIZE.
VSIZE     Virtual memory Size -- the total amount of memory the
          process is currently using (in kiloBytes). This
          includes the amount in RAM (the resident set size)
          as well as the amount in swap.
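
The two percentage fields are simple ratios, and the arithmetic can be sketched as below. The function names are hypothetical, and in practice the values come straight from "ps" rather than being computed by the caller; the sketch only spells out the definitions given above.

```python
def cpu_percent(cpu_seconds, elapsed_seconds):
    """CPU time used divided by running time, as a percentage."""
    if elapsed_seconds <= 0:
        return 0.0  # nothing meaningful to report for a brand-new process
    return 100.0 * cpu_seconds / elapsed_seconds

def mem_percent(rss_kb, physical_kb):
    """Resident set size divided by physical memory, as a percentage."""
    return 100.0 * rss_kb / physical_kb

# 3 seconds of CPU time over 4 seconds of running time
print(cpu_percent(3.0, 4.0))
# 8192 kB resident on a machine with 4194304 kB (4 GB) of RAM
print(mem_percent(8192, 4194304))
```

Note that a process running on more than one core can accumulate CPU time faster than wall-clock time, which is how CPU values above 100 (as in the sample output earlier) can arise.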

If you specify the "--show-command" flag, then a final column will appear that presents the entire command string corresponding to the particular process.

Syrupy will continue taking and logging snapshots of the resource usage of the process or processes that it is monitoring until they terminate.

If you specify the "--m2" flag (the "write miscellaneous information to secondary [error] stream" flag), and Syrupy is running in its default mode (i.e., a COMMAND was invoked and tracked), then after termination of the process Syrupy will usually produce a final report like:

---
 Command: sumtrees.py ansonia_combo.aligned.fasta.trees
Began at: 2008-10-10 20:45:25.453861.
Ended at: 2008-10-10 21:33:52.629728.
Run time: 0 hour(s), 48 minute(s), 27.175867 second(s).
---
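
The "Run time" line is just the difference between the two timestamps. A minimal Python sketch of that calculation, using the values from the report above, might look like this (the function name is hypothetical, and the end time is assumed not to precede the start time):

```python
from datetime import datetime

def format_run_time(began, ended):
    """Format the difference of two datetimes in the style of the report."""
    delta = ended - began
    total_seconds = delta.days * 86400 + delta.seconds
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return "%d hour(s), %d minute(s), %d.%06d second(s)" % (
        hours, minutes, seconds, delta.microseconds)

began = datetime(2008, 10, 10, 20, 45, 25, 453861)
ended = datetime(2008, 10, 10, 21, 33, 52, 629728)
print(format_run_time(began, ended))
# prints: 0 hour(s), 48 minute(s), 27.175867 second(s)
```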

Specifying Options to Syrupy: Position Counts!

Various options to Syrupy control, customize or change its default behavior. It is important to note that all options for Syrupy must be specified before the COMMAND and its options/arguments. Any and all arguments and options following the COMMAND will be passed directly to COMMAND and ignored by Syrupy.

That is:

$ syrupy.py --syrupy-opt1 --syrupy-opt2 /usr/local/bin/program

is correct, while:

$ syrupy.py --syrupy-opt1 /usr/local/bin/program --syrupy-opt2

is wrong. In the second case, "--syrupy-opt2" will be passed to "program", which will result in unintended and probably undesirable behavior.
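
This "options first, then COMMAND" behavior is what you get from a parser that stops looking for options at the first positional argument. The sketch below shows the idea with Python's standard "optparse" module; whether Syrupy itself is implemented this way is an assumption, and the two flags are hypothetical stand-ins named after the placeholders above.

```python
from optparse import OptionParser

def split_command_line(argv):
    """Separate the monitor's own options from COMMAND and its arguments."""
    parser = OptionParser()
    # Hypothetical flags, standing in for real options.
    parser.add_option("--syrupy-opt1", action="store_true", default=False)
    parser.add_option("--syrupy-opt2", action="store_true", default=False)
    # Stop option parsing at the first positional argument (the COMMAND).
    parser.disable_interspersed_args()
    options, command = parser.parse_args(argv)
    return options, command

options, command = split_command_line(
    ["--syrupy-opt1", "/usr/local/bin/program", "--syrupy-opt2"])
print(options.syrupy_opt2)  # False: the flag was never seen by the parser
print(command)              # ['/usr/local/bin/program', '--syrupy-opt2']
```

Everything from the first positional argument onward is left untouched, so it can be handed to COMMAND as is.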

Controlling the Polling Regime

Since the polling regime is pretty simple, there is only one option to control: the polling interval. By default this is one second, but it can be set to anything you want using the "-i" or "--polling-interval" option:

$ syrupy.py -i 0.001 /bin/program
$ syrupy.py --polling-interval=0.001 /bin/program
$ syrupy.py -i 60 /bin/program
$ syrupy.py --polling-interval=60 /bin/program
etc.

Units are always in seconds, and thus the first two examples will sample the resource usage of "/bin/program" every thousandth of a second, while the last two examples will sample the resource usage of "/bin/program" every minute.
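
A polling regime of this kind amounts to a loop that takes a snapshot, sleeps for the interval, and repeats until the process is gone. The following Python sketch illustrates the idea; the function and parameter names are hypothetical, and the snapshot function is assumed to return None once the process has terminated.

```python
import time

def poll(take_snapshot, interval=1.0, sleep=time.sleep):
    """Collect snapshots until take_snapshot() returns None."""
    snapshots = []
    while True:
        snap = take_snapshot()
        if snap is None:   # the monitored process has terminated
            break
        snapshots.append(snap)
        sleep(interval)    # interval is in seconds; fractions are allowed
    return snapshots
```

Passing the sleep function in as a parameter is a design choice for the sketch: it lets the loop be exercised without actually waiting.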

Controlling and Redirecting Output

By default, Syrupy will redirect both the output and error streams of COMMAND to the system null device (typically, "/dev/null"), while writing its own results to the standard output stream (with miscellaneous information to the standard error stream). This is simply the way I tend to want it to work when I am using it: I am usually running a program under it to assess the resource usage of the program, rather than being interested in the output of the program per se. Of course, the standard error of the program or command may actually be useful to see, especially if the program is not bug-free. Also, sometimes the COMMAND may actually be a chained pipeline of scripts or programs, where the output of one is fed as the input of the other. In cases like these, it might be useful to actually have the output stream of COMMAND go to the standard output, and/or the error stream of COMMAND go to the standard error. This can be achieved by the following options:

$ syrupy.py --stdout=^1 --stderr=^2 /bin/program

"^1" and "^2" are special symbols that are interpreted by Syrupy to mean the standard output and standard error respectively.

If you do send the output stream of COMMAND to the standard output, you will probably find that this channel gets cluttered very quickly, as that is where, by default, Syrupy writes its output. So you probably want to instruct Syrupy to write its own output elsewhere, using the "-o", "--output" or "-1" option (all these are synonyms for the channel to which Syrupy will write its standard output):

$ syrupy.py --output="program.run" --stdout=^1 /bin/program
$ syrupy.py -o="program.run" --stdout=^1 /bin/program
$ syrupy.py -1="program.run" --stdout=^1 /bin/program

Similarly, you can redirect the standard error stream of Syrupy using:

$ syrupy.py -2="syrupy.log" --stderr=^2 /bin/program

Of course, you can request Syrupy to redirect its streams to files without redirecting the streams of COMMAND anywhere in particular as well:

$ syrupy.py --output="program.run" -2="syrupy.log" /bin/program
$ syrupy.py -o="program.run" -2="syrupy.log" /bin/program
$ syrupy.py -1="program.run" -2="syrupy.log" /bin/program

You may also want to save the output and error stream of COMMAND, but not actually want to see them on the standard output. Then, instead of using the special symbols "^1" or "^2", you would simply supply proper file paths:

$ syrupy.py --stdout=cmd.out --stderr=cmd.err /bin/program

Another scenario is if you want to save the primary output of Syrupy to a file, but also have it displayed on the terminal. Using the "--o2" flag instructs Syrupy to write its primary output not only to the standard output stream (or file specified by the "-o", "--output" or "-1" options), but also to the standard error (or file specified by the "-2" option):

$ syrupy.py --o2 /bin/program

Finally, as a matter of convenience, you can use the "--debug-command" flag to have the error stream of COMMAND sent to the standard error:

$ syrupy.py --debug-command /bin/program

This is exactly the same as:

$ syrupy.py --stderr=^2 /bin/program

To summarize: the "--stdout" and "--stderr" options set the destinations for the standard output and standard error streams of COMMAND (and by default are set to "/dev/null"), while the "-1" and "-2" options set the destinations for the standard output and standard error streams of Syrupy (and by default are set to the shell standard output and standard error).

Of course, when COMMAND is not specified or is ignored, as in the external process monitoring mode, then the "--stdout" and "--stderr" options are ignored.
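
The way the "^1" and "^2" symbols, file paths, and the null-device default fit together can be sketched as a small lookup. This is an illustration only: the function name is hypothetical, and nothing here is taken from Syrupy's source.

```python
import os
import sys

def resolve_stream(spec):
    """Map a --stdout/--stderr style value to a writable file object."""
    if spec is None:
        return open(os.devnull, "w")  # default: discard the stream
    if spec == "^1":
        return sys.stdout             # special symbol: standard output
    if spec == "^2":
        return sys.stderr             # special symbol: standard error
    return open(spec, "w")            # anything else is treated as a file path
```

The resulting file objects could then be passed as the "stdout" and "stderr" arguments when launching COMMAND with "subprocess.Popen".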

Formatting Output

Syrupy's default output makes for easy visual inspection on a terminal or in a text editor. However, you might want to bring the results into a program like R for analysis. Some of these analysis programs are very picky about how fields are separated, requiring specific characters or strings to delimit columns. You can use the "--separator" flag to specify some other string or character to separate the fields, such as tabs or commas. Furthermore, by default Syrupy pads out each column with extra spaces so that they are all the same width, thus getting them to line up on the screen or when viewed in a (monospace-font rendering) text-editor. These extra spaces may confuse some other programs, and, if so, you can turn off the flushing or alignment of fields using the "--no-align" flag. Thus, for example, to produce plain-vanilla/no-frills comma-separated value (CSV) output you would enter:

$ syrupy.py --separator=, --no-align /bin/program

which would result in something like:

DATE,TIME,ELAPSED,CPU,MEM,RSS,VSIZE
2008-10-11,00:39:04,00:00,0.0,0.1,1688,601580
2008-10-11,00:39:05,00:01,98.1,0.2,7544,82752
2008-10-11,00:39:06,00:02,98.1,0.3,9872,85056
2008-10-11,00:39:07,00:03,100.0,0.4,12324,87392
2008-10-11,00:39:08,00:04,100.0,0.4,13472,87904
2008-10-11,00:39:09,00:05,98.4,0.5,15480,89952
2008-10-11,00:39:10,00:06,99.0,0.6,17612,92176
2008-10-11,00:39:11,00:07,97.5,0.6,20192,94560
2008-10-11,00:39:12,00:08,99.7,0.6,19632,94048
2008-10-11,00:39:13,00:09,99.4,0.6,19788,94088

You can also suppress the first row, i.e. the column headers, using the "--no-headers" option.
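
Output in this form is also easy to read back into Python with the standard "csv" module. The sketch below, which uses a few rows from the example above and a hypothetical helper name, finds the peak resident set size of a run; it assumes the header row has not been suppressed.

```python
import csv
import io

def peak_rss(csv_text):
    """Return the largest RSS value (in kilobytes) in comma-separated output."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return max(int(row["RSS"]) for row in reader)

sample = """DATE,TIME,ELAPSED,CPU,MEM,RSS,VSIZE
2008-10-11,00:39:04,00:00,0.0,0.1,1688,601580
2008-10-11,00:39:05,00:01,98.1,0.2,7544,82752
2008-10-11,00:39:06,00:02,98.1,0.3,9872,85056
"""
print(peak_rss(sample))  # prints 9872
```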

Bugs, Suggestions, Comments, etc.

If you have questions, bug reports, criticisms, suggestions, comments or any other message to send me, you can contact me here.