PANDASEQ is a program to align Illumina reads, optionally with PCR primers embedded in the sequence, and reconstruct an overlapping sequence.
Binary packages are available for recent versions of MacOS and Linux. Installing from source is not too difficult.
Development packages for zlib and libbz2 are needed, as is a standard compiler environment. On Ubuntu, these can be installed via
sudo apt-get install build-essential libtool automake zlib1g-dev libbz2-dev
On MacOS, the Apple Developer tools and Fink must be installed, then
sudo fink install bzip2-dev
After the support packages are installed, one should be able to do:
./autogen.sh && ./configure && make && sudo make install
If you receive an error that libpandaseq.so.0 is not found on Linux, try running:
sudo ldconfig
Please consult the manual page by invoking
man pandaseq
or visiting http://neufeldserver.uwaterloo.ca/~a...aseq_man1.html
The short version is
pandaseq -f forward.fastq -r reverse.fastq
PANDAseq may be used in other programs via a programmatic interface. Consult the header file pandaseq.h for more details. The C interface is pseudo-object oriented and documented in the header. The library provides pkg-config information, so compiling against it can be done using something like:
cc mycode.c `pkg-config --cflags --libs pandaseq-2`
or using, in configure.ac:
PKG_CHECK_MODULES(PANDASEQ, [ pandaseq-2 >= 2.2 ])
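Before compiling against the library, it can help to confirm that pkg-config can actually find it. The following is a minimal sketch, assuming the module name `pandaseq-2` and the minimum version 2.2 used above; the function name `check_pandaseq` is made up for illustration and is not part of PANDAseq.

```shell
#!/bin/sh
# Hypothetical helper: report whether pkg-config can see pandaseq-2 >= 2.2.
check_pandaseq() {
    # pkg-config itself may be missing on a fresh system.
    if ! command -v pkg-config >/dev/null 2>&1; then
        echo "pkg-config not installed"
        return 2
    fi
    # --atleast-version exits 0 only if the module exists and is new enough.
    if pkg-config --atleast-version=2.2 pandaseq-2 2>/dev/null; then
        echo "pandaseq-2 found, version $(pkg-config --modversion pandaseq-2)"
    else
        echo "pandaseq-2 >= 2.2 not found; check PKG_CONFIG_PATH"
        return 1
    fi
}

check_pandaseq || true
```

If the library was installed under a prefix that pkg-config does not search by default, pointing `PKG_CONFIG_PATH` at the directory containing the `.pc` file is the usual remedy.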
A Vala binding is also included. Documentation is available at http://neufeldserver.uwaterloo.ca/~a...pandaseq-vapi/
Other language bindings are welcome.
Q: Can I insist that PANDAseq only assemble perfect sequences?
A: Yes, but you shouldn't want to do it. The whole point is to fix sequences which are probably good. There is no quality setting that will achieve this effect. You can use the plugin completely_miss_the_point, but this really does miss the point. Moreover, assuming that the sequencer is right in the overlap region and in the non-overlapping regions requires an unsound leap in statistics.
Q: Can PANDAseq use multiple cores/threads?
A: Yes, but you shouldn't turn it on until you've checked you need it. In most cases, PANDAseq is IO-bound, not CPU-bound; therefore, adding more CPU capacity would have no effect. Try monitoring a running copy of PANDAseq with `top`; watch the CPU% for the PANDAseq process and the overall system CPU waiting time (`%wa` in the banner at the top). If waiting time is low and CPU% is very high, then multi-threading may increase speed. If the CPU waiting time is high, threading will simply not help.
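The decision described above can be sketched as a small shell helper that takes the two numbers read off `top` and prints a suggestion. The function name `suggest_threads` and the thresholds (90 for CPU%, 10 for `%wa`) are illustrative assumptions, not values defined by PANDAseq; adjust them to your own judgement of "very high" and "low".

```shell
#!/bin/sh
# Hypothetical helper: interpret two readings taken by hand from `top`.
#   $1 = CPU% of the pandaseq process
#   $2 = system I/O wait (%wa)
# Thresholds are assumptions chosen for illustration only.
suggest_threads() {
    cpu=${1%.*}   # drop the fractional part so integer comparison works
    wa=${2%.*}
    if [ "$wa" -ge 10 ]; then
        echo "IO-bound: threading will not help"
    elif [ "$cpu" -ge 90 ]; then
        echo "CPU-bound: multi-threading may increase speed"
    else
        echo "inconclusive: keep monitoring"
    fi
}

# To watch only a running pandaseq process (procps top on Linux):
#   top -p "$(pgrep -x pandaseq | head -n 1)"
suggest_threads 98.5 0.3
```

High waiting time is checked first because, as noted above, extra CPU capacity cannot help a process that is waiting on IO.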
Q: Can I use SAM/BAM files as input without converting them to FASTQ?
A: Yes. PANDAseq-sam <https://github.com/neufeld/pandaseq-sam> extends PANDAseq to do this. SAM/BAM files do not guarantee that sequences will be in the right order, so processing these files may be slower and PANDAseq will use more memory.