Building a Simple Application Using 'libsequence'
libsequence library. This means that if you update to a newer version of libsequence, you will need to rebuild the program. I personally prefer a statically-linked standalone binary, as it greatly simplifies deployment and execution in a variety of contexts, from the standard to the more complex/quirky (e.g. heterogenous clusters with user directories mounted on non-standard paths), as well as through a variety of schedulers (e.g., SGE, Condor, PBS, Torque), without any need to mess about with environmental variables. As always, YMMV.
Kevin Thornton's "libsequence" C++ library provides a fairly rich set of tools to compute a number of population genetic summary statistics.
Below is a sample application demonstrating its use to calculate Fst statistics from a FASTA file. You will need to independently install libsequence, as well as the Boost C++ Libraries on which it depends, for this code to build and run.
To build it, you would:
$ g++ -I$BOOST_ROOT /usr/local/lib/libsequence.a -o fst fst.cpp
Here, "$BOOST_ROOT" is the path to the Boost C++ Libraries installation. If you installed "libsequence" using a path prefix other than "/usr/local", you would want to correct the build command so that it points to the correct file.
Of course, managing your build process this way is a major pain and prone to lots of tedium, hassle and problems, especially when deploying across multiple different systems. As such, I have also created a whole skeleton "autoconf" and "automake" framework to wrap up the business of building programs using "libsequence". This can be found in the tar'd and gzipped file attached to this page.
Another thing to note with the sample code below is that a good chunk of it is devoted to parsing and processing the command line invocations, and it does not do a very good job of that in terms of flexibility and robustness. I have written a much better command-line options parser, one that behaves a little bit like Python's "optparse" module, as part of the "Ginkgo" code base (source code available from here; the files containing the command-line parser system specifically can be found here and here).
#include <iostream> #include <vector> #include <string> #include <sstream> #include <stdexcept> #include <Sequence/FST.hpp> #include <Sequence/Fasta.hpp> #include <Sequence/Alignment.hpp> #include <Sequence/PolySites.hpp> #include <Sequence/SeqExceptions.hpp> /** * Converts from one simple streamable type to another. */ template <typename T, typename U> T to_scalar(U from) { std::ostringstream o; o << from; std::istringstream i(o.str()); T target; i >> target; if (i.fail() || !i.eof()) { throw std::runtime_error(o.str()); } return target; } /** * Returns <code>true</code> if the second string is equal to the first n * characters of the first string, where n = length of the second string. * * @param s1 first string * @param s2 second string * @return <code>true</code> if the first string begins with the * second */ bool startswith(const std::string& s1, const std::string& s2) { return s1.compare(0, s2.size(), s2) == 0; } /** * Parses command line invocation into a pair of vectors of strings, * representing vector of flag arguments (i.e., preceded by '-' or '--') and * positional arguments. */ std::pair<std::vector<std::string>, std::vector<std::string> > parse_cmd( int argc, char * argv[]) { std::vector<std::string> opts; std::vector<std::string> args; for (unsigned int i = 1; i < argc; ++i) { if (argv[i][0] == '-') { opts.push_back(argv[i]); } else { args.push_back(argv[i]); } } return std::make_pair(opts, args); } /** * Usage message. */ void show_usage(std::ostream& out=std::cerr) { out << "Usage: fst <FASTA-FILE> <POP-SIZE1> <POP-SIZE2> [POP-SIZE3 [POP-SIZE4 [...]]]\n"; } /** * Usage message. */ void show_help(std::ostream& out=std::cout) { show_usage(out); out << std::endl; out << "Options:" << std::endl; out << " --hsm1992 Use Hudson, Slatkin and Maddison (1992) formulation [default]" << std::endl; out << " --slatkin1993 Use Slatkin (1993) formulation" << std::endl; out << " --hbm1992 Use Hudson, Boos, and Kaplan (1992) formulation" << std::endl; out << " --all Use all of the above formulations (tab-delimited)" << std::endl; } /** * Main CLI and processing loop. */ int main(int argc, char * argv[]) { // Parsing and processing of invocation options and argument std::pair<std::vector<std::string>, std::vector<std::string> > cmd_opts = parse_cmd(argc, argv); std::vector<std::string> opts = cmd_opts.first; std::vector<std::string> args = cmd_opts.second; int method = 0; for (std::vector<std::string>::const_iterator oi = opts.begin(); oi != opts.end(); ++oi) { const std::string& opt = *oi; if (startswith(opt, "-h") || startswith(opt, "--he")) { show_help(std::cout); exit(0); } else if (startswith(opt, "--hsm1992")) { method = 0; } else if (startswith(opt, "--slatkin1993")) { method = 1; } else if (startswith(opt, "--hbm1992")) { method = 2; } else if (startswith(opt, "--all")) { method = -1; } else { std::cerr << "Unrecognized option '" << opt << "'. See '--help' for available options." << std::endl; exit(1); } } // Make sure mandatory arguments are specified if (args.size() < 3 ) { show_usage(std::cerr); exit(1); } // Input data filename std::string input_filepath = args[0]; // Parsing and storing of subpopulation sizes std::vector<unsigned int> pop_sizes; pop_sizes.reserve(args.size() - 1); for (unsigned long i = 1; i < args.size(); ++i) { pop_sizes.push_back(to_scalar<unsigned int>(args[i])); } // Using libsequence to read data and convert to table of polymorphisms std::vector<Sequence::Fasta> alignment; Sequence::Alignment::GetData(alignment, input_filepath.c_str()); Sequence::PolySites data(alignment); // Converting subpopulation sizes from vector to array unsigned int * config = &pop_sizes[0]; Sequence::FST fst(&data, pop_sizes.size(), config); // Actual calls to libsequence's calculators and output if (method < 0) { std::cout << fst.HSM() << "\t" << fst.Slatkin() << "\t" << fst.HBK() << std::endl; } else if (method == 0) { std::cout << fst.HSM() << std::endl; } else if (method == 1) { std::cout << fst.Slatkin() << std::endl; } else if (method == 2) { std::cout << fst.HBK() << std::endl; } }
| Attachment | Size |
|---|---|
| sample-libsequence-prog-build.tgz | 140.96 KB |
feed
Comments
0 comments postedPost new comment