Phylogeny Programs (continued)

PHYLIP version 3.6 is my own package. It is available free, from its Web site, in C source code, or as executables for Windows, Mac OS X, and Mac OS 8 or 9. The C source code can easily be compiled on Unix or Linux systems. It includes programs to carry out parsimony, distance matrix methods, maximum likelihood, and other methods on a variety of types of data, including DNA and RNA sequences, protein sequences, restriction sites, 0/1 discrete characters data, gene frequencies, continuous characters and distance matrices. It may be the most widely-distributed phylogeny package, with about 29,000 registered users, some of them satisfied. It is third after PAUP* and MrBayes in the competition to be the program responsible for the most published trees. It has been distributed since October, 1980 and has celebrated its 30th anniversary, as the oldest distributed phylogeny package. PHYLIP is distributed at the PHYLIP web site at http://evolution.gs.washington.edu/phylip.html. A number of sites offer web-servers that will perform data analyses using PHYLIP.

[PAUP* icon here] David Swofford of the School of Computational Science and Information Technology, Florida State University, Tallahassee, Florida has written PAUP* (which originally meant Phylogenetic Analysis Using Parsimony). PAUP* version 4.0beta10 has been released as a provisional version by Sinauer Associates, of Sunderland, Massachusetts. It has Macintosh, PowerMac, Windows, and Unix/OpenVMS versions. PAUP* has many options and close compatibility with MacClade. It includes parsimony, distance matrix, invariants, and maximum likelihood methods and many indices and statistical tests. It is described in a web page at http://paup.csit.fsu.edu/, which also contains links to its web pages at Sinauer Associates. It is available for the following types of systems:

For PowerMac and 68k Macintosh Mac OS 9 in a version with full mouse-windows user interface, which can also be run under the Classic environment on Mac OS X,
For PowerPC Mac OS X systems (or Intel Mac OS X systems when running under emulation) in a version with a command-line interface,
For Windows in a version with a character-based command-line interface (which appears in a Windows window),
For DOS or a Windows DOS box in a version which has a command-line interface, and
In a Unix/Linux version, with command-line interface, for Alpha Compaq/Digital Unix, Alpha Linux, PowerPC Linux, Intel-compatible Linux, Sun SPARC/UltraSPARC Solaris, and Alpha VMS.

The price is $100 US for the Macintosh and PowerMac executable versions, $85 for the Windows executable version, and $150 for the Unix source code version, plus $20 for shipment. The Beta version comes with a Command Reference Document. Their ISBN numbers are 0-87893-806-0, -807-9, and -804-4. Contact and ordering information will be found at the Sinauer Associates web site. The international distributor for many countries is Palgrave Macmillan, Brunel Road, Houndsmills, Basingstoke, Hampshire RG21 6XS, U.K. Tel: +44-1256-329242 Fax: +44-1256-330688. Their e-mail address is lecturerservices (at) palgrave.com. For New Zealand, Korea, Japan, Brazil and Australia see the addresses at this web page.

Derek Sikes of the University of Alaska Museum, Fairbanks, Alaska (ffdss (at) uaf.edu) and Paul Lewis of the Department of Ecology and Evolutionary Biology of the University of Connecticut have produced PAUPRat, a program that generates a text file which can be used as commands by PAUP* to have it implement Kevin Nixon's highly effective tree search method, the Parsimony Ratchet. The input files for PAUPRat can also be modified to implement Rutger Vos's comparable Likelihood Ratchet. It is available as Mac OS, Mac OS X and Linux executables, as a DOS executable that can be run under Windows, and in source code, from its web site at http://users.iab.uaf.edu/~derek_sikes/software2.htm.

[MacClade icon here] MacClade is a pioneering program for interactive analysis of evolution of a variety of character types, including discrete characters and molecular sequences. It works on Macintoshes with Mac OS X, up to and including Snow Leopard, Mac OS X version 10.6 (and also on Mac OS). MacClade enables you to use the mouse-window interface to specify and rearrange phylogenies by hand, and watch the number of character steps and the distribution of states of a given character on the tree change as you do so. It has many other features beyond this, including ability to edit data, print out phylogenies, and even simulate the evolution of data on a tree. MacClade was written by Wayne Maddison (now of the Department of Zoology, University of British Columbia) and David Maddison of the Department of Entomology, University of Arizona. Until 2011 it was distributed commercially by Sinauer Associates of Sunderland, Massachusetts, USA. As MacClade will not function with the forthcoming Mac OS X 10.7 (Lion), the Maddisons have made it available as a free download. It is available at the MacClade web site starting with version 4.08a. It includes a manual. A much earlier and less capable version, 2.1 (which for example cannot read nucleic acid sequences and has many fewer features for discrete characters) is also available as a Mac OS 9 executable from the Indiana and EMBL molecular biology software servers at (respectively) iubio.bio.indiana.edu and ftp.ebi.ac.uk, in directories molbio/mac and pub/software/mac, as a BinHexed and squeezed archive (macclade-old.hqx and macclade21.hqx, respectively). A demo version of MacClade 3 that will not save or print files is also available there.

[Hennig86 icon here] J. S. Farris has produced Hennig86, a fast parsimony program including branch-and-bound search for most parsimonious trees and interactive tree rearrangement. Although complete benchmarks have not been published, it is said to be faster than Swofford's PAUP*; both are a great many times faster than the parsimony programs in PHYLIP. The program is distributed in executable object code only and costs $50, plus $5 mailing costs ($10 outside of the U.S.). The user's name should be stated, as copies are personalized as a copy-protection measure. It is distributed by Arnold Kluge, Amphibians and Reptiles, Museum of Zoology, University of Michigan, Ann Arbor, Michigan 48109-1079, U.S.A. (akluge (at) umich.edu) and by Diana Lipscomb at George Washington University (biodl (at) gwuvm.gwu.edu). It runs on PC-compatible microcomputers with at least 512K of RAM and needs no math coprocessor or graphics monitor. It can handle up to 180 taxa and 999 characters. It was described in the paper: Farris, J.S. 1989. Hennig86: a PC-DOS program for phylogenetic analysis. Cladistics 5: 163.

[Random Cladistics icon here] Mark Siddall, Assistant Curator of Annelida at the American Museum of Natural History, New York (siddall (at) amnh.org) has released Random Cladistics, version 4.0.3, a set of programs that can carry out bootstrapping, jackknifing, a variety of kinds of permutation tests, and search for "islands" of trees, using Hennig86 or NONA to analyze the data. It can also mark ranges of sites for inclusion or exclusion, compare trees from the analyses, compute an index of incongruence between data sets, and do many other operations. To use it you must have a copy of Hennig86 (for whose distribution see above). Random Cladistics will carry out the appropriate transformations of your data and will call Hennig86 and have it analyze them, and then it will summarize the results. Random Cladistics is described by its author as no longer being supported software -- he says that "Winclada is far superior and provide's a nice interface." Random Cladistics and associated programs are still distributed by their author from its web site at http://research.amnh.org/~siddall/rc.html as MSDOS executables.

[AutoDecay icon here] Torsten Eriksson of the Bergius Botanical Garden, Stockholm, Sweden (torsten (at) bergianska.se) has written a program, AutoDecay, which generates Decay Indices from an existing PAUP* 4.0 treefile. It is intended to simplify the task of creating reverse constraint trees in PAUP* 4.0 and subsequent generation of Bremer support values (Bremer, K. 1994. Cladistics 10: 295-304). AutoDecay version 5.06 is written in the scripting language Perl, and runs on most systems that have Perl installed. AutoDecay can be obtained from Eriksson's software web page at http://www.bergianska.se/index_forskning_soft.html.

[DNA Stacks icon here] Doug Eernisse of the California State University, Fullerton (DEernisse (at) fullerton.edu) has constructed DNA Stacks version 1.3.5, a Macintosh HyperCard stack that can carry out a variety of analyses on DNA sequences. It does not do phylogenies itself. It has an alignment editor, and can carry out various kinds of translation, and codon bias analysis. It can write out data sets in PAUP*, Hennig86, and PHYLIP formats. It is included here because in its "Support Index Blocks..." menu item it is able to prepare jobs for PAUP* to enable Decay Index (Support Index) analysis. It is available by World Wide Web from http://biology.fullerton.edu/deernisse/dnastacks.html.

[TreeRot icon] Michael Sorenson of the Department of Biology, Boston University (msoren (at) bu.edu) has released TreeRot, version 3, a program that helps make Bremer Support Indices ("decay indices") for parsimony analyses. It generates a PAUP* command file with a constraint statement for each node in a given shortest or strict consensus tree and with commands to search for trees inconsistent with each of these constraint statements in turn. For nodes with decay indices of more than a few steps, the constraint statement approach is much more effective than simply finding all trees 1, 2, 3, 4, etc. steps longer than the shortest tree and then examining their strict consensus for which nodes are lost. This version also supports the determination of partitioned Bremer support indices introduced in the paper: Baker, R.H., and R. DeSalle. 1997. Multiple sources of character information and the phylogeny of Hawaiian Drosophilids. Systematic Biology 46: 654-673, and it will also parse the PAUP* log file, automatically calculating the decay index for each node. It is written in the Perl scripting language, and a Mac OS Macintosh executable is also available. Both are distributed at its web site at http://people.bu.edu/msoren/TreeRot.html.
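The constraint-statement approach described above can be sketched in a few lines of Python. This is not TreeRot's own code (TreeRot is written in Perl): the function names and the way clades are represented are hypothetical, and the exact spelling of the PAUP* commands is an assumption that should be checked against the PAUP* Command Reference before use.

```python
def converse_constraint_commands(taxa, clades):
    """Build PAUP*-style commands, one converse-constraint search per clade.

    taxa   -- list of all taxon names in the data set
    clades -- list of sets of taxon names, one per internal node to test
    The command spellings below are assumed for illustration only.
    """
    lines = []
    for i, clade in enumerate(clades, start=1):
        inside = ",".join(t for t in taxa if t in clade)
        outside = ",".join(t for t in taxa if t not in clade)
        name = "node%d" % i
        # Constraint tree: the clade is resolved, all other taxa form a polytomy.
        lines.append("constraints %s = ((%s),%s);" % (name, inside, outside))
        # Search only among trees that do NOT contain this clade.
        lines.append("hsearch enforce=yes converse=yes constraints=%s;" % name)
    return lines


def decay_index(shortest_length, shortest_without_clade):
    """Bremer support: extra steps needed before the clade is lost."""
    return shortest_without_clade - shortest_length
```

For a node whose best converse-constrained tree has 103 steps when the unconstrained shortest tree has 100, decay_index(100, 103) gives a decay index of 3 steps.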

J. S. Farris has written RA (Rapid nucleotide Analysis). It features rapid bootstrapping. It is available from Arnold Kluge, Amphibians and Reptiles, Museum of Zoology, University of Michigan, Ann Arbor, Michigan 48109-1079, U.S.A. (akluge (at) umich.edu) and Diana Lipscomb at George Washington University (BIODL (at) gwuvm.gwu.edu) who may be contacted for details. The cost is said to be about $30 US.

Kevin Nixon of the L. H. Bailey Hortorium at Cornell University in Ithaca, New York (kcn2 (at) cornell.edu) has written WINCLADA version 0.9.99m24, an interactive program that can read and edit trees and data files, display character state changes inferred by parsimony on diagrams of the trees, and launch runs of the programs NONA, PIWE, and Hennig86. WINCLADA is available as a Windows95/98/NT executable from its web site at http://www.cladistics.com/about_winc.htm. It is available on a shareware basis: the user who downloads it must pay $50 to Kevin Nixon at Winclada/Kevin C. Nixon, 2210 Ellis Hollow Road, Ithaca, New York 14850. There is also a $200-per-class fee for its use in courses. WINCLADA supersedes and combines features of Nixon's earlier programs ClaDOS and DADA, which are no longer distributed.

Pablo Goloboff, of INSUE - Fundación e Instituto Miguel Lillo 205, 4000 S. M. de Tucumán, Argentina (instlillo (at) infovia.com.ar with Subject line "para Pablo Goloboff") has written NONA (Noname), version 2.0, PiWe (Parsimony with Implied WEights), and SPA to carry out parsimony including weighted parsimony analyses. NONA searches for most parsimonious trees according to character weights defined by the user a priori. Pee-Wee calculates weights of the characters by a method introduced by Goloboff, a noniterative version of J. S. Farris's "successive weighting". It was described in Goloboff's paper in Cladistics 9: 83-91, 1993. SPA is a generalized parsimony program that allows differential weighting of changes between different states. NONA is said to be faster than other parsimony programs. A Windows version of NONA which includes Piwe and SPA is available as freeware from its web page at http://www.cladistics.com/aboutNona.htm.
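The idea of weighting characters by how well they fit the tree can be illustrated with a minimal Python sketch. It assumes the commonly cited concave fitting function k / (k + extra steps) from the implied-weights literature; the function names, the default value of k, and the input layout are illustrative assumptions and are not taken from the programs themselves.

```python
def implied_weight_fit(extra_steps, k=3.0):
    """Fit of one character that shows `extra_steps` steps of homoplasy.

    Uses the concave function k / (k + extra_steps). k is the concavity
    constant; the default here is an arbitrary illustrative choice.
    """
    return k / (k + extra_steps)


def total_fit(observed_steps, minimum_steps, k=3.0):
    """Sum of character fits for one tree; a higher total is a better tree.

    observed_steps -- steps each character requires on the tree
    minimum_steps  -- fewest steps each character could require on any tree
    """
    return sum(
        implied_weight_fit(obs - mn, k)
        for obs, mn in zip(observed_steps, minimum_steps)
    )
```

A character with no homoplasy contributes a fit of 1, and each additional extra step lowers its contribution by a smaller amount than the one before, so highly homoplastic characters are downweighted without any iteration over trees.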

Pablo Goloboff, of INSUE - Fundación e Instituto Miguel Lillo 205, 4000 S. M. de Tucumán, Argentina (pablogolo (at) csnat.unt.edu.ar), together with J. S. Farris of the Laboratory of Molecular Systematics of the Naturhistoriska Riksmuseet, Stockholm, Sweden and Kevin Nixon of the L. H. Bailey Hortorium, Cornell University, Ithaca, New York, have produced TNT (Tree analysis using New Technology), version of August 2008. This is a parsimony program intended for use on very large data sets. It makes use of the methods for speeding up parsimony searches introduced by Goloboff in the paper: Goloboff, P.A. 1999. Analyzing large data sets in reasonable times: solutions for composite optima. Cladistics 15: 415-428, and the highly effective "parsimony ratchet" search strategy introduced by Nixon in the paper: Nixon, K.C. 1999. The parsimony ratchet, a new method for rapid parsimony analysis. Cladistics 15: 407-414. It can handle characters with discrete states as well as continuous characters. The program is distributed as Windows, Linux, and both PowerMac and Intel Mac OS X executables. The program and some support files, including documentation, are available from its web page at http://www.zmuc.dk/public/phylogeny/TNT. It is free, provided you agree to a license with some reasonable limitations.

Frédéric Calendini and Jean-Francois Martin of the Departement Protection des plantes et environnement of the Ecole Nationale Supérieur, Montpellier, France (martinjf (at) ensam.inra.fr) have produced PaupUp version 1.0.3.1, a graphical frontend for the DOS version of PAUP*. The PaupUp program provides a user-friendly interface to the phylogenetic program PAUP* on the Windows operating systems. The DOS version of PAUP* is entirely command-line driven and does not provide any graphical interface. PaupUp partly resolves this issue, providing around 80% of the available commands (the most commonly used, in the authors' opinion) in a graphical environment comparable to the Mac OS version, while the remaining 20% of commands are still available through direct command-line input in a single integrated design. The programs TreeView and Modeltest can be called from PaupUp. PaupUp is not compatible with the Windows version of PAUP* but is compatible with the DOS version that is distributed with that Windows version. It is available as a Windows executable. It also requires the Microsoft .NET executable framework to be installed. PaupUp can be downloaded from its web site at http://www.agro-montpellier.fr/sppe/Recherche/JFM/PaupUp/

Kai Müller of the Nees-Institut für Biodiversität der Pflanzen of the University of Bonn, Germany (kaimueller (at) uni-bonn.de) has written PRAP (Parsimony Ratchet Analyses using PAUP* and likelihood) version 2.0, a Java program to drive PAUP* in computing Bremer support of groups, and in doing ratchet searches for parsimony or likelihood trees. It allows the user to make PAUP* carry out searches using the "parsimony ratchet" strategy of Kevin Nixon. In version 2.0 this can be done using either the parsimony criterion or the likelihood criterion (in spite of the name of the search method). It can also do variations on the parsimony ratchet including multiple random addition sequences. It is described in the paper: Müller, K. F. 2004. PRAP - computation of Bremer support for large data sets. Molecular Phylogenetics and Evolution 31: 780-782, and the search strategies it implements are described in the paper: Müller, K. 2005. The efficiency of different search strategies in estimating parsimony jackknife, bootstrap, and Bremer support. BMC Evolutionary Biology 5: 58. It is available as Java executables, as downloads for Windows, Mac OS X, and for Unix. It can be downloaded from its web site at http://systevol.nees.uni-bonn.de/software. The earlier versions 1.0 and 0.99 are also available there.

MEGA (Molecular Evolutionary Genetics Analysis) is produced by Sudhir Kumar of the Center for Evolutionary Functional Genomics of The Biodesign Institute at Arizona State University, Tempe, Arizona (s.kumar (at) asu.edu) together with Joel Dudley of the Stanford Center for Biomedical Informatics Research at Stanford University, Koichiro Tamura of Tokyo Metropolitan University and Masatoshi Nei, of Pennsylvania State University. It carries out parsimony, distance matrix and likelihood methods for molecular data (nucleic acid sequences and protein sequences). It can do bootstrapping, consensus trees, and a variety of distance measures, with Neighbor-Joining, Minimum Evolution, UPGMA, and parsimony tree methods, as well as a large variety of data editing tasks, sequence alignment using an implementation of ClustalW, tests of the molecular clock, and single-branch tests of significance of groups. MEGA4 is the current version. MEGA4 is described in the papers:

Kumar, S., J. Dudley, M. Nei and K. Tamura. 2008. MEGA: A biologist-centric software for evolutionary analysis of DNA and protein sequences. Briefings in Bioinformatics 9: 299-306.
Tamura, K., J. Dudley, M. Nei, and S. Kumar. 2007. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Molecular Biology and Evolution 24: 1596-1599.

It is available for free at its web site at http://www.megasoftware.net as Windows executables, with a downloadable manual. Manual web pages are also accessible there. It can be run under Mac OS X and under Linux using Windows emulators, if you have those. In addition, MEGA 4.1 is available as a downloadable beta release. An earlier version, MEGA 1.02, is also available there as a DOS executable. It is downloadable at the MEGA site and that version's manual is also available on line at http://evolgen.biol.metro-u.ac.jp/MEGA/manual/default.html.

[DAMBE icon] Xuhua Xia of the Department of Biology and the Center for Advanced Research in Environmental Genomics (CAREG) of the University of Ottawa, Ontario, Canada (xxia (at) uottawa.ca) has released DAMBE (Data Analysis in Molecular Biology and Evolution), version 5.0.25, a general-purpose package for DNA and protein sequence phylogenies, and also gene frequencies. It can read and convert a number of file formats, and has many features for descriptive statistics. It can compute a number of commonly-used distance matrix measures and infer phylogenies by parsimony, distance, or likelihood methods, including bootstrapping (by sites or by codons) and jackknifing. There are a number of kinds of statistical tests of trees available, and many other features. It can also display phylogenies. DAMBE includes a copy of ClustalW; there is also code from PHYLIP. An interesting feature is a simple web browser that allows sequences to be fetched over the web while running DAMBE. DAMBE is described in two publications, a paper and a book:

Xia, X., and Z. Xie. 2001. DAMBE: Data analysis in molecular biology and evolution. Journal of Heredity 92: 371-373.
Xia, X. 2000. Data Analysis in Molecular Biology and Evolution. Kluwer Academic Publishers, Boston.

DAMBE consists of Windows executables. It is available for free from its web site at http://dambe.bio.uottawa.ca/dambe.asp.

[PAL icon] Matthew Goode, Alexei Drummond, Ed Buckler, and Korbinian Strimmer, together with seven other contributors, have released PAL (Phylogenetic Analysis Library) version 1.5, a free collection of Java classes for use in molecular phylogenetics. The addresses of the four principal contributors are respectively:

Matthew Goode (m.goode (at) auckland.ac.nz), Bioinformatics Institute, School of Biological Sciences, University of Auckland, New Zealand.
Alexei Drummond (alexei (at) cs.auckland.ac.nz), Department of Computer Science, University of Auckland, New Zealand.
Ed Buckler (esb33 (at) cornell.edu), Department of Plant Breeding and Genetics, Cornell University, Ithaca, New York.
Korbinian Strimmer (strimmer (at) uni-leipzig.de), Institute for Medical Informatics, Statistics and Epidemiology (IMISE) of the University of Leipzig, Germany.

PAL is intended to facilitate the rapid construction of both general applications as well as special-purpose tools for phylogenetic analysis. It focuses on probabilistic data modelling and provides, e.g., routines for

maximum likelihood, neighbor-joining and least squares analysis
probability models for nucleotide/amino acid substitution, including constraints for a molecular clock
bootstrapping, and the Kishino-Hasegawa-Templeton and Shimodaira-Hasegawa tests
simulation of trees and data sets, including coalescent trees with growing populations and serial samples
reading and writing trees and alignments
adjusting for rate variation among sites
obtaining splits from trees and calculating a distance between trees

among many other functions. It currently consists of over 200 components in 16 packages. PAL is described in a paper:

Drummond, A., and K. Strimmer. 2001. PAL: An object-oriented programming library for molecular evolution and phylogenetics. Bioinformatics 17: 662-663.

It is available at its web site at http://www.cebl.auckland.ac.nz/pal-project/. Two user interfaces are available which contain application programs written using PAL. They have separate entries in these pages:

Vanilla (by Strimmer): A simple text front end
Pebble (vCEBL) (by Drummond): A GUI interface to PAL plus a functional command language.

PAL can be run on any machine that has Java, and can also be compiled into native code by the Gnu Compiler for Java (gcj).
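Two of the routines listed above, obtaining splits from trees and calculating a distance between trees, can be illustrated with a short Python sketch. This is not PAL's Java API: the nested-tuple tree representation and the function names are assumptions made for illustration, and the distance shown is the Robinson-Foulds (symmetric difference) count, which is one of several possible tree distances.

```python
def leaf_set(tree):
    """Return the frozenset of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Return the nontrivial splits (bipartitions) of a tree, read as unrooted.

    Each split is stored as the side that does not contain a fixed
    reference leaf, so the same bipartition always gets the same key.
    """
    all_leaves = leaf_set(tree)
    reference = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        side = leaf_set(node)
        if reference in side:
            side = all_leaves - side
        # Skip trivial splits: a single leaf against the rest, or all against none.
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        for child in node:
            visit(child)

    visit(tree)
    return found


def robinson_foulds(tree_a, tree_b):
    """Count the splits present in exactly one of the two trees."""
    return len(splits(tree_a) ^ splits(tree_b))
```

For example, the trees (("A", "B"), ("C", "D"), "E") and (("A", "C"), ("B", "D"), "E") share no nontrivial split, so each contributes two unmatched splits to the count.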

Korbinian Strimmer, of the Institute for Medical Informatics, Statistics and Epidemiology (IMISE) of the University of Leipzig, Germany (strimmer (at) uni-leipzig.de), has written Vanilla, version 1.2, a character-based interface to the PAL Java classes, which includes a number of programs carrying out different kinds of phylogenetic analysis, including:

MLDIST which computes maximum likelihood distances between DNA sequences, protein sequences, and two-state data, with correction for unequal rates at different sites. It has many different substitution models available. It also computes observed distances and can obtain approximate estimates of unknown model parameters such as the Ts/Tv ratio.
MLTREE which computes the likelihood of a given tree under the same models as MLDIST, allowing branch lengths to be provided or to be estimated by the program, with the possibility of constraining them to be clocklike. If two or more trees are provided it can also compare them using the Kishino-Hasegawa test, the Shimodaira-Hasegawa test, and expected Akaike weights.
EVOLVE simulates data along a tree using the above models.
DISTTREE computes least squares branch lengths from distance matrices on a given tree, and can also construct Neighbor-Joining and UPGMA trees.
REWRITE converts data sets between different formats.

Taken together, these programs work with nucleotide and amino acid data and can be used to estimate maximum-likelihood branch lengths on trees (including clock trees and dated tips), for statistical (e.g., Shimodaira-Hasegawa) and topological (Robinson-Foulds) comparison of trees, to infer demographic parameters from trees (based on the coalescent), and as utility programs to reformat and modify alignments.

There are also 6 other programs with a command-line interface which can estimate demographic parameters from coalescent trees, compute distance matrices from trees, reroot trees, and carry out some manipulations of data sets. Vanilla has a menu-based interface. It is written in Java, and is available from its web site at http://strimmerlab.org/software/vanilla/index.html. It can run on Java systems on many machines. Strimmer notes that Vanilla does not provide all the functionality in PAL, and is perhaps most useful as a source of examples on how to use PAL.
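As an illustration of the kind of distance-based clustering that DISTTREE offers, here is a minimal UPGMA sketch in Python. It is not Vanilla's or PAL's code: the function name and the input layout (a list of names plus a square distance matrix) are assumptions, and the sketch returns only the tree topology as nested tuples, without branch lengths.

```python
def upgma(names, matrix):
    """Cluster taxa by UPGMA and return a nested-tuple tree.

    names  -- list of taxon names
    matrix -- square list of lists of pairwise distances
    """
    # Active clusters: id -> (subtree, number of leaves it contains).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Distances between active clusters, keyed by (smaller id, larger id).
    dist = {(i, j): matrix[i][j]
            for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        # Join the closest pair of active clusters.
        i, j = min(dist, key=dist.get)
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        del dist[(i, j)]
        for k in clusters:
            d_ik = dist.pop((min(i, k), max(i, k)))
            d_jk = dist.pop((min(j, k), max(j, k)))
            # New distance is the average weighted by cluster sizes.
            dist[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    (tree, _), = clusters.values()
    return tree
```

Weighting the averages by cluster size is what distinguishes UPGMA from the simpler WPGMA variant, in which the two merged clusters count equally regardless of how many taxa they hold.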

Wayne Maddison of the Departments of Zoology and Botany, University of British Columbia, Vancouver, Canada, and David Maddison of the Department of Entomology, University of Arizona, Tucson, together with Peter Midford, Danny Mandel, and Jeff Oliver have released Mesquite, version 2.5. The project email address is info (at) mesquiteproject.org. Mesquite is a large and varied set of modules in Java to carry out a wide variety of analyses in comparative biology. It is also intended as a framework for other developers to use to add additional functions. Some of the over 500 functions available in the project currently are:

Reconstruction of ancestral states by parsimony or likelihood and display of the reconstructed states
Tests of process of character evolution, including comparative methods.
Simulation of character evolution (for categorical, DNA, or continuous characters)
Simulation or testing of tree shapes including the effect of a character on the shape of a tree
Inferences of the fit of gene trees to species trees
Parametric bootstrapping (with integration with programs such as PAUP* and NONA)
Morphometrics (PCA, CVA, geometric morphometrics)
Coalescence (simulations, other calculations)
Tree comparisons and simulations (tree similarity, Markov speciation models)
Search among trees using different tree rearrangement methods as well as exhaustive enumeration
Cluster analysis including single linkage and UPGMA methods
Display and manipulation of trees

Other Java modules that use Mesquite include Tree Set Viz and a Java version of PDAP. Some Mesquite modules make use of PAL.

Mesquite is available in Java source code and Java executables from its web page at http://mesquiteproject.org. It can run on Mac OS X, Windows, and Linux/Unix systems using recent versions of Java.

Julien Y. Dutheil, Bastien Boussau, and co-workers of the Institut des Sciences de l'Evolution de Montpellier (ISE-M) of the Université Montpellier 2, France (julien.dutheil (at) univ-montp2.fr) have released Bio++ version 1.8, a set of C++ libraries and programs dedicated to sequence analysis, phylogenetics, molecular evolution and population genetics. The Bio++ project is a collaborative effort to provide reusable implementations of standard phylogenetics and population genetics methods published in the literature, in order to analyze and manipulate sequence data, and with the goal to facilitate the development of new methods. Bio++ is fully object-oriented and documented. Two discussion forums are also available. A non-exhaustive list of available methods includes:

sequence and tree manipulations
a large set of substitution models (nucleotides, protein, codons)
distance estimation and tree reconstruction (by Neighbor Joining, BIONJ and UPGMA)
maximum likelihood methods
nucleotide diversity estimators
tools for drawing phylogenies

Two recent additions also allow you to query sequences from databases and to build GUIs using the Qt libraries. A set of example programs (The Bio++ Program Suite) is also available with examples and a manual. Bio++ contains one of the largest sets of models for phylogenetics, including non-homogeneous models. It also features a very general way to set up your own non-homogeneous model and fit it, for instance assuming a different equilibrium GC content for distinct clades in the phylogeny. Bio++ is distributed as source code on a CVS/SVN server, and stable snapshots are made every six months. In addition to the source code, these stable releases can also be installed as pre-compiled packages for various Linux distributions. It is described in the papers:

Dutheil, J., S. Gaillard, E. Bazin, S. Glémin, V. Ranwez, N. Galtier, and K. Belkhir. 2006. Bio++: a set of C++ libraries for sequence analysis, phylogenetics, molecular evolution and population genetics. BMC Bioinformatics 4 (7): 188.
Dutheil, J., and B. Boussau. 2008. Non-homogeneous models of sequence evolution in the Bio++ suite of libraries and programs. BMC Evolutionary Biology 22 (8): 255.

It is available as C++ source code, Windows executables, Linux executables, PowerMac Mac OS X executables and Intel Mac OS X executables, and packaged as .deb (Debian, Ubuntu, etc.) and .rpm (Fedora, Mandriva, etc.) packages and a Gentoo overlay. It can be downloaded from its web site at http://kimura.univ-montp2.fr/BioPP/

[ETE icon] Jaime Huerta-Cepas, Joaquin Dopazo and Toni Gabaldón of the Comparative Genomics group at the Centre for Genomic Regulation (CRG), Barcelona, Spain (jhuerta (at) crg.es) have released ETE (a python Environment for Tree Exploration), version 2.0. ETE is a Python programming toolkit that assists in the automated manipulation, analysis and visualization of hierarchical trees. It provides a broad range of tree handling options, specific methods to work on phylogenetics and clustering analyses, bindings to phylogenomic databases such as phylomeDB, advanced node annotation, interactive visualization, and a customizable tree drawing engine to create PDF tree images. It also implements methods for orthology and paralogy prediction and topological dating. It is described in the paper: Huerta-Cepas, J., J. Dopazo and T. Gabaldón. 2010. ETE: a python Environment for Tree Exploration. BMC Bioinformatics 11: 24. It is available as C source code, Windows executables, Mac OS X universal executables, and a Python module. It can be downloaded from its web site at http://ete.cgenomics.org

Rutger Vos of the School of Biological Sciences of the University of Reading, United Kingdom (rutgeraldo (at) gmail.com) has released Bio::Phylo (Phyloinformatic analysis using perl), version 0.35, a phylogeny package with tree simulation, topology, visualization, and data conversion functionality. It has modules for simulating tree shapes under various models, computing various tree topology indices, managing and converting data in various formats, and visualizing tree shapes. It is described in the paper: Vos, R. A., J. Caravas, K. Hartmann, M. A. Jensen and C. Miller. 2011. Bio::Phylo - phyloinformatic analysis using Perl. BMC Bioinformatics 12: 63. http://dx.doi.org/10.1186/1471-2105-12-63. It is available as a Perl script. It can be downloaded from its web site at http://search.cpan.org/dist/Bio-Phylo/

Gavin Huttley, Rob Knight, and the PyCogent Development Team of the John Curtin School of Medical Research of the Australian National University, Canberra, Australia (gavin.huttley (at) anu.edu.au) have released PyCogent (COmparative GENomics Toolkit, written in Python), version 1.4.1. PyCogent is a software library for genomic biology. It is an integrated framework for controlling third-party applications; devising workflows; querying databases; conducting novel probabilistic analyses of biological sequence evolution; and generating publication quality graphics. It is intended that it be able to carry out a variety of phylogeny methods itself, but for now these have not been implemented. It can, however, be used to submit runs of some existing programs to infer phylogenies, including RAxML, FASTML, and Muscle. It is described in the paper: Knight, R., P. Maxwell, A. Birmingham, J. Carnes, J. G. Caporaso, B. C. Easton et al. 2007. Pycogent: A toolkit for making sense from sequence. Genome Biology 8(8): R171. It is available as C source code, Python script, Linux executables, Intel Mac OS X executables and Mac OS X universal executables. It can be downloaded from its web site at http://pycogent.sourceforge.net/

[DendroPy icon] Jeet Sukumaran and Mark Holder of the Department of Ecology and Evolutionary Biology of the University of Kansas, Lawrence, Kansas (jeet (at) ku.edu) have produced DendroPy version 3.6.1, a Python library for phylogenetic computing. It provides classes and functions for the simulation, processing, and manipulation of phylogenetic trees and character matrices, and supports the reading and writing of phylogenetic data in a range of formats, such as NEXUS, Newick, NeXML, Phylip, FASTA, etc. Application scripts for performing some useful phylogenetic operations, such as data conversion and tree posterior distribution summarization, are also distributed and installed as part of the library. DendroPy can thus function as a stand-alone library for phylogenetics, a component of more complex multi-library phyloinformatic pipelines, or as a scripting “glue” that assembles and drives such pipelines. DendroPy's component SumTrees supersedes Sukumaran's previous program bootscore. DendroPy is described in the paper: Sukumaran, J., and M. T. Holder. 2010. DendroPy: A Python library for phylogenetic computing. Bioinformatics 26: 1569-1571. It is available as a Python script. It can be downloaded from its web site at http://packages.python.org/DendroPy/

Jason Evans, of Canonware.com (jasone (at) canonware.com) has released Crux version 1.2.0, a set of Python modules together with code in C, that carries out many methods in phylogeny reconstruction. It can be used to compute distances, likelihoods, and do Bayesian MCMC on phylogenies. It can also find neighbor-joining trees, manipulate trees, and compute Robinson-Foulds distances between trees. Crux is written in Cython, an extension of Python which includes some features of the C language. Evans describes Crux as particularly useful for developing scripts to automate phylogeny tasks. Installing it requires Python and a C compiler. It is available at its web site at http://www.canonware.com/Crux/
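
For readers unfamiliar with the Robinson-Foulds distance mentioned above, here is a minimal sketch in plain Python. It does not use Crux or its API. It assumes trees are written as nested tuples of leaf names, a representation chosen only for this illustration, and the function names are hypothetical. The distance is the number of nontrivial splits (bipartitions of the leaves induced by internal branches) found in one tree but not the other.

```python
def collect_clades(tree, clades):
    """Return the leaf set below `tree`, appending each internal node's leaf set to `clades`."""
    if isinstance(tree, str):          # a leaf is represented by its name
        return frozenset([tree])
    leaves = frozenset().union(*(collect_clades(child, clades) for child in tree))
    clades.append(leaves)
    return leaves


def nontrivial_splits(tree):
    """Return (all leaves, set of splits), each split stored as the side lacking a reference leaf."""
    clades = []
    all_leaves = collect_clades(tree, clades)
    reference = min(all_leaves)        # arbitrary but fixed leaf, so rooting does not matter
    splits = set()
    for clade in clades:
        side = all_leaves - clade if reference in clade else clade
        # Skip splits that separate nothing, or only a single leaf, from the rest.
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
    return all_leaves, splits


def robinson_foulds(tree_a, tree_b):
    """Count the splits present in exactly one of the two trees."""
    leaves_a, splits_a = nontrivial_splits(tree_a)
    leaves_b, splits_b = nontrivial_splits(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must have the same leaf names")
    return len(splits_a ^ splits_b)


t1 = (("A", "B"), ("C", ("D", "E")))
t2 = (("A", "C"), ("B", ("D", "E")))
print(robinson_foulds(t1, t2))
```

Because each split is stored as the side that does not contain the reference leaf, two differently rooted drawings of the same unrooted tree give a distance of zero.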

[Bionumerics icon] Applied Maths NV of Keistraat 120, 9830 Sint-Martens-Latem, Belgium (info @ applied-maths.com) has released Bionumerics, a program to manage a wide variety of biological data "from 1D patterns, 2D gels, phenotype arrays, and DNA/protein sequences". In addition to database and image processing capabilities, it can do clustering and phylogenetic inference. A variety of clustering methods including UPGMA and neighbor-joining distance matrix methods are available, and for inferring phylogenies generalized parsimony and maximum likelihood are described as available. Bootstrap support for groups can also be computed. There are also facilities for plotting the trees. Bionumerics is distributed as Windows executables. Bionumerics is commercial software. Information about it, including how to request a free demo version, is available at its web site at http://www.applied-maths.com/bn/bn.htm. For price and ordering information contact them through the web site or by email, by phone at +32 9 2222 100, or by fax at +32 9 2222 102. Their U.S. Sales Office is at Applied Maths Inc., 13809 Research Blvd, Suite 645, Austin, Texas 78750, phone +1 512-482-9700, fax +1 512-482-9708 (email is info-us @ applied-maths.com).

John Czelusniak, then of the Department of Anatomy and Cell Biology, Wayne State University, Detroit, Michigan wrote sog, a C program demonstrating an algorithm to find the most parsimonious phylogeny along with the parsimony strength of grouping (or Bremer decay index) for nucleotide sequences in one pass of a branch and bound algorithm. This differs from the implementation in PAUP* which uses a separate branch and bound search to find the strength of grouping for each group in the tree, using the tree group exclusion option. John said (some time ago) that "sog is a rather ugly hack which will be optimized and streamlined. It IS ALPHA SOFTWARE, which means it has not been tested extensively on datasets other than our primate datasets." It is available at the IUBIO archive at http://iubio.bio.indiana.edu/soft/molbio/evolve/. It is distributed as generic C source code which should be able to compile and run on any system that has a C compiler.

[CAFCA icon here] Rino Zandee (rino.zandee (at) gmail.com) formerly of the Institute of Evolutionary and Ecological Science, Van der Klaauw Laboratory, Leiden University, has written CAFCA version 1.5.12, the Collection of APL Functions for Comparative Analysis. It carries out a search for the most parsimonious tree with discrete-character data (either two-state or multistate), using a search for cliques of component compatibility (monothetic subsets) to propose the candidates for most parsimonious trees. The program is written as functions in the APL language, but PowerPC Mac OS (or maybe it's Mac OS X) executables are distributed. The program is free and is available from the CAFCA Web Site at http://www.mzandee.net/~zandee/cafca/.

Valery Zaporozhchenko of the Research Centre for Medical Genetics, Moscow, Russia (valery (at) regmed.ru) has released Murka version 1.2, a phylogeny package for parsimony methods. It constructs median networks and from them finds Steiner trees (estimates of the most parsimonious tree) from biological alignments. The package includes subprograms for building full median networks and their subsets (such as Median Joining and Reduced Median networks), extracting Steiner trees and analyzing results. Murka is a cross-platform command line application with source code distributed under the LGPL license. Documentation can be viewed at the documentation page at its web site. It is available as C++ source code, Windows executables and Linux executables. It can be downloaded from its web site at http://phylomurka.sourceforge.net. For visualization of trees and networks Murka requires that the graph visualization program Graphviz also be installed.

Kai Müller of the Nees-Institut für Biodiversität der Pflanzen of the University of Bonn, Germany (kaimueller (at) uni-bonn.de) has produced SeqState version 1.40. It carries out a variety of primer design functions and also calculates various statistics on aligned DNA sequences. For the purposes of this listing, the relevant feature is that it can be used to implement a number of different kinds of coding of indels (insertions and deletions). It is described in the paper: Müller K. F. 2005. SeqState - primer design and sequence statistics for phylogenetic DNA data sets. Applied Bioinformatics 4: 65-69 and the different indel coding methods are discussed in two other papers:

Müller, K. F. 2006. Incorporating information from length-mutational events into phylogenetic analysis. Molecular Phylogenetics and Evolution 38: 667-676.
Simmons, M. P., K. F. Müller, A. P. Norton. 2007. The relative performance of indel-coding methods in simulations. Molecular Phylogenetics and Evolution 44: 724-740.

It is available as Java executables, for Windows, for Mac OS X, and for Linux. It can be downloaded from its web site at http://systevol.nees.uni-bonn.de/software/SeqState

Naoko Takezaki, now of the Division of Genome Analysis and Genetic Research, Department of Medicine, Kagawa University, Kagawa, Japan, (takezaki (at) med.kagawa-u.ac.jp) has written gmaes, a program that estimates a gamma distribution parameter for rate variation among sites by counting the minimum number of substitutions at each site for a given tree topology. The program is distributed as generic C source code which can be compiled on any system that has a C compiler from the IUBIO archive at http://iubio.bio.indiana.edu/soft/molbio/evolve/.
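
To show the kind of calculation involved, here is a minimal sketch of a method-of-moments estimate of the gamma shape parameter from per-site substitution counts. This is not the gmaes code and is not claimed to be its exact estimator. It assumes that rates across sites are gamma distributed with shape alpha and that, given its rate, the number of changes at a site is Poisson, so that the counts follow a negative binomial distribution with variance m + m^2/alpha, where m is the mean count. The function name is hypothetical, and the counts are taken as already computed on a given tree.

```python
def gamma_shape_from_counts(counts):
    """Method-of-moments estimate of the gamma shape parameter alpha.

    Under the assumed negative binomial model, variance = mean + mean**2 / alpha,
    so alpha = mean**2 / (variance - mean).
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need counts from at least two sites")
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)   # sample variance
    if variance <= mean:
        # No more variation than a single Poisson rate would give:
        # the data show no evidence of rate variation among sites.
        return float("inf")
    return mean * mean / (variance - mean)


# Hypothetical minimum numbers of substitutions at eight sites.
site_counts = [0, 0, 0, 0, 1, 1, 2, 4]
print(gamma_shape_from_counts(site_counts))
```

Minimum counts from parsimony tend to undercount changes at fast sites, so a raw estimate of this kind should be read as approximate.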

Chris Creevey and James McInerney of the Bioinformatics and Pharmacogenomics Laboratory of the National University of Ireland, Maynooth (chris.creevey (at) may.ie) have released CRANN (an Irish word for "tree"), version 1.04, a program to detect natural selection using rates of synonymous and nonsynonymous substitutions. Crann takes FASTA format aligned nucleotide sequence files and either infers a tree using neighbor-joining based on nonsynonymous differences, or allows the user to read in a tree. It reconstructs the placements of the synonymous and nonsynonymous substitutions on the tree, and carries out a statistical test for an excess of nonsynonymous changes. It can also calculate synonymous and nonsynonymous differences between all pairs of sequences, and can also do that in a sliding window along the sequences. It is described in the papers:

Creevey, C. and J. O. McInerney. 2003. CRANN: Detecting adaptive evolution in protein-coding DNA sequences. Bioinformatics 19: 1726.
Creevey, C. and J. O. McInerney. 2002. An algorithm for detecting directional and non-directional positive selection, neutrality and negative selection in protein coding DNA sequences. Gene 300: 43-51.

It is available as Windows executables, Linux executables, Powermac Mac OS X executables and Mac OS 9 executables. It can be downloaded from its web site at http://bioinf.may.ie/crann/

Mathieu Blanchette, of the School of Computer Science, McGill University, Montréal, Québec (blanchem (at) mcb.mcgill.edu), Fei Feng (fei (at) cb.mcgill.ca), of the same school, and Martin Tompa of the Department of Computer Science and Engineering at the University of Washington, Seattle (tompa (at) cs.washington.edu) have released FootPrinter 2.0, a program that uses parsimony scores to carry out "phylogenetic footprinting" to search for regulatory sequences in the vicinity of genes that have been sequenced in multiple species. The program looks for locations upstream of each gene which, when taken together on a known phylogeny, show the largest amount of conservation by having the smallest number of changes of state along the tree. The method is described in these papers:

Blanchette, M. and M. Tompa. 2003. FootPrinter: a program designed for phylogenetic footprinting. Nucleic Acids Research 31: 3840-3842.
Blanchette, M. and M. Tompa. 2002. Discovery of regulatory elements by a computational method for phylogenetic footprinting. Genome Research 12: 739-748.
Blanchette, M., B. Schwikowski, and M. Tompa. Algorithms for phylogenetic footprinting. Journal of Computational Biology 9: 211-223.

The program is available as C source code (including some programs from PHYLIP) from a web site at http://bio.cs.washington.edu/software/motif_discovery#Motif%20Discovery. Two web servers are available, one running FootPrinter 3.01, a more recent version, and one, MicroFootPrinter, that searches for prokaryotic sequences that are similar to your sequence and runs FootPrinter 2.0 on that data set.
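
The idea of scoring conservation by the number of changes of state on a known tree can be illustrated with a small sketch. This is not FootPrinter's algorithm, which searches unaligned sequences for motifs. As a loudly stated simplification, the sketch assumes the sequences are already aligned, uses Fitch's parsimony count on a rooted binary tree written as nested tuples, and reports the window with the fewest changes. All function names and the example data are hypothetical.

```python
def fitch(tree, state_of):
    """Return (possible states, minimum changes) for one site on a rooted binary tree."""
    if isinstance(tree, str):                       # leaf: its observed state, no changes
        return {state_of[tree]}, 0
    left, right = tree
    left_states, left_changes = fitch(left, state_of)
    right_states, right_changes = fitch(right, state_of)
    shared = left_states & right_states
    if shared:                                      # children agree: no extra change
        return shared, left_changes + right_changes
    return left_states | right_states, left_changes + right_changes + 1


def window_score(tree, alignment, start, width):
    """Total parsimony changes over the sites start .. start + width - 1."""
    total = 0
    for pos in range(start, start + width):
        site = {name: seq[pos] for name, seq in alignment.items()}
        total += fitch(tree, site)[1]
    return total


def most_conserved_window(tree, alignment, width):
    """Start position of the window needing the fewest changes (first one on ties)."""
    length = len(next(iter(alignment.values())))
    return min(range(length - width + 1),
               key=lambda start: window_score(tree, alignment, start, width))


tree = (("human", "chimp"), ("mouse", "rat"))
alignment = {"human": "ACGTAC", "chimp": "ACGTAT",
             "mouse": "TCGTGC", "rat": "GCGTGC"}
print(most_conserved_window(tree, alignment, 3))
```

In this example the window starting at position 1 (counting from 0) covers three sites that are identical in all four species, so it needs no changes on the tree.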

Daniel Barker (db60 (at) st-andrews.ac.uk) of the University of St. Andrews, Scotland, U.K., has written LVB version 3.1, a program for inferring phylogenies using parsimony and simulated annealing. Simulated annealing is intended to allow searches for most parsimonious trees with large numbers of species. It is described as often giving good results with large matrices. Up to 16383 objects and 32766 characters may be used. Aligned nucleotide sequences with ambiguous nucleotides and/or discrete morphological characters can be used. Bootstrapping of the data is also supported. The program is currently available in ANSI C source code as a Unix tar file, and as executables for Windows, Mac OS X, and Linux. The text of a manual can also be read or downloaded from the web site. LVB is available from its Web site at http://eggg.st-andrews.ac.uk/lvb. It is also available as a Web server from the Institut Pasteur.

Dick Hwang of the Department of Genome Sciences, University of Washington (dhwang (at) u.washington.edu) has written GAPars, a program using a genetic algorithm to search for most parsimonious phylogenies. The program is written in C++ and should compile on Unix C++ compilers and on most other C++ compilers. He describes it as working "rather inefficiently" and "not ready for prime-time use". It can be obtained by emailing Hwang at the address above.

Quinn Snell, Mark Clement, and Hyrum Carroll of the Computational Science Laboratory of the Department of Computer Science at Brigham Young University, Provo, Utah (snell (at) cs.byu.edu) and (clement (at) cs.byu.edu) have written PSODA, a parsimony program for nucleotide sequences. The program reads the NEXUS file format, and carries out heuristic rearrangement of trees using the parsimony criterion. It is available as C++ source code, Windows executables, Linux executables and Powermac Mac OS X executables. It can be downloaded from its web site at http://dna.cs.byu.edu/psoda/

[GeneTree icon] Rod Page (r.page (at) bio.gla.ac.uk), of the Division of Environmental and Evolutionary Biology of the University of Glasgow has released GeneTree, version 1.3.0, a program that produces "reconciled trees" that fit a tree of gene copies to a species tree. It uses a parsimony criterion where the penalty is the number of deletions and duplications required to reconcile the gene tree with the species tree. The program is described as "preliminary". The program is described in the paper: Page, R. D. M. 1998. GeneTree: comparing gene and species phylogenies using reconciled trees. Bioinformatics 14: 819-820, and its algorithm is described in the paper: Page, R. D. M. and M. A. Charleston. 1997. From gene to organismal phylogeny: Reconciled trees and the gene tree/species tree problem. Molecular Phylogenetics and Evolution 7: 231-240. It is available as a Macintosh executable and as an executable for Windows. They are available from the GeneTree web site at http://taxonomy.zoology.gla.ac.uk/rod/genetree/genetree.html. A manual is also available online there.
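
As a rough illustration of the duplication part of such a reconciliation cost, the following sketch counts the gene duplications implied when a gene tree is mapped onto a species tree. It is not GeneTree's code and it ignores deletions (losses) entirely. It assumes both trees are written as nested tuples of names, that a dictionary gives the species of each gene copy, and that each gene tree node is mapped to the smallest species-tree group containing all species below it. A node is counted as a duplication when it maps to the same species-tree node as one of its children. All names here are hypothetical.

```python
def collect_clusters(tree, clusters):
    """Return the leaf set of `tree`, appending the leaf set of every node to `clusters`."""
    if isinstance(tree, str):
        cluster = frozenset([tree])
    else:
        cluster = frozenset().union(*(collect_clusters(child, clusters) for child in tree))
    clusters.append(cluster)
    return cluster


def count_duplications(gene_tree, species_tree, species_of):
    """Count gene tree nodes that map to the same species-tree node as one of their children."""
    species_clusters = []
    collect_clusters(species_tree, species_clusters)

    def mapped_node(species):
        # The smallest species-tree cluster containing `species` is their common ancestor.
        return min((c for c in species_clusters if species <= c), key=len)

    duplications = 0

    def visit(node):
        nonlocal duplications
        if isinstance(node, str):                   # a gene copy: look up its species
            return frozenset([species_of[node]])
        child_species = [visit(child) for child in node]
        here = frozenset().union(*child_species)
        target = mapped_node(here)
        if any(mapped_node(s) == target for s in child_species):
            duplications += 1
        return here

    visit(gene_tree)
    return duplications


species_tree = (("A", "B"), "C")
species_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
gene_tree = ((("a1", "b1"), ("a2", "b2")), "c1")
print(count_duplications(gene_tree, species_tree, species_of))
```

In the example the gene tree contains two copies of the (A, B) group, which requires one duplication in the ancestor of species A and B.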

[CodonBootstrap icon] John Huelsenbeck (johnh (at) berkeley.edu) of the Department of Integrative Biology, University of California, Berkeley released CodonBootstrap version 3, now distributed by Jonathan Bollback. This is a utility that will generate non-parametric bootstrap data sets from a DNA sequence file. The program re-samples codons to (1) avoid problems when analysing data under models that assume coding structure (e.g., rates partitioned by sites), or (2) when the user wishes to re-sample sites and maintain the original autocorrelation among positions within the codon. CodonBootstrap is available as C source code that can be compiled for Unix from Jonathan Bollback's software web page at http://www.simmap.com/bollback/software.html. A Macintosh version that was formerly distributed seems not to be available any more.
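
Resampling whole codons rather than single sites can be sketched in a few lines. This is an illustration of the idea only, not the CodonBootstrap program. It assumes an in-frame alignment held as a dictionary of equal-length strings whose length is a multiple of three, and the function name is hypothetical.

```python
import random


def codon_bootstrap(alignment, rng=None):
    """Return one bootstrap replicate in which whole codon columns are resampled with replacement."""
    rng = rng or random.Random()
    length = len(next(iter(alignment.values())))
    if length % 3 != 0 or any(len(seq) != length for seq in alignment.values()):
        raise ValueError("sequences must be aligned, in frame, and a multiple of 3 long")
    n_codons = length // 3
    # Draw the codon columns once, so every sequence gets the same columns
    # and the three positions of each codon stay together.
    picks = [rng.randrange(n_codons) for _ in range(n_codons)]
    return {name: "".join(seq[3 * i:3 * i + 3] for i in picks)
            for name, seq in alignment.items()}


alignment = {"seq1": "ATGAAACCC", "seq2": "ATGAAGCCA"}
replicate = codon_bootstrap(alignment, random.Random(1))
print(replicate)
```

Passing a seeded random.Random object makes a replicate reproducible; omitting it gives a different replicate on each call.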

[TCS icon] Mark Clement, David Posada, and Keith Crandall of the Universidad Vigo, Spain (Posada) and the Department of Zoology, Brigham Young University, Provo, Utah (dposada (at) uvigo.es) have released TCS version 1.21, a program for estimating gene genealogies within a population. It does so by using the method introduced in the paper: Templeton, A. R., K. A. Crandall and C. F. Sing. 1992. A cladistic analysis of phenotypic associations with haplotypes inferred from restriction endonuclease mapping and DNA sequence data. III. Cladogram estimation. Genetics 132: 619-633. This is a method that connects existing haplotypes in a minimum spanning tree which is essentially a parsimony method. It can also infer networks with loops in them. TCS is written in Java and has a graphic user interface for the display of the resulting networks. It may be run on any system that has the Java runtime environment. The program is described in the paper: Clement M., D. Posada, and K. Crandall. 2000. TCS: a computer program to estimate gene genealogies. Molecular Ecology 9: 1657-1660. It implements the estimation of the 95% parsimony connection limit, and the estimation of outgroup weights (which are used to designate the root of the tree). It takes as input sequence files in NEXUS or PHYLIP format, and accepts absolute distances between sequences as input. The output is a Postscript picture of the tree, which can be saved as a Postscript file. TCS is available as Java executables, with documentation, at its web site at http://darwin.uvigo.es/software/tcs.html.

[GEODIS icon] David Posada (dposada (at) uvigo.es), of the Universidad Vigo, Spain, Keith Crandall, of the Department of Zoology, Brigham Young University, Provo, Utah (Keith_Crandall (at) byu.edu) and Alan Templeton, of the Department of Biology of Washington University, Saint Louis, Missouri (temple_a (at) biology.wustl.edu) have made available GEODIS (version 2.6). It implements Templeton's method of Nested Clade Analysis, which is intended to distinguish between historical divergence of populations and geographical separation, using the geographical distribution of haplotypes in a genealogy. GEODIS is a Java program which can run on any platform. It is described in a paper: Posada D., K. A. Crandall and A. R. Templeton. 2000. GeoDis: A program for the cladistic nested analysis of the geographical distribution of genetic haplotypes. Molecular Ecology 9: 487-488. It is available at its web site at http://darwin.uvigo.es/software/geodis.html

Jon Jeffery (jon (at) donnasaxby.com), then of the Institute of Biology, Leiden University, The Netherlands has written Parsimov, a series of Perl scripts to implement "event cracking", a parsimony-based method of finding the minimum number of changes in developmental sequences of events that are necessary to explain the evolution of pairs of characters on a tree. Among the uses of this method is to reconstruct ancestral developmental sequences. The programs include:

Parsimv7g.pl which implements event-pair "parsimony cracking".
ReplacerParsimv.pl which takes a Parsimv7g.pl output file and replaces the PAUP* character numbers with more readable character names according to a user-specified text list.
Describe.pl, which creates a PAUP* command file to describe each tree in memory under ACCTRAN and DELTRAN optimizations (saving each as separate log files) plus a Parsimv7g.pl batch file (e.g., ParsBatch.txt) to crack each of the PAUP* log files produced.

The programs can be executed on any system that has Perl installed. They are described in a paper: Jeffery, J.E., O.R.P. Bininda-Emonds, M.I. Coates, and M.K. Richardson. 2005. A new technique for identifying sequence heterochrony. Systematic Biology 54: 230-240. The Parsimov programs are available as (separate) downloads at Olaf Bininda-Emonds's software web page at http://www.uni-oldenburg.de/molekularesystematik/en/34011.html#EvoDevo

David Swofford, of the Center for Evolutionary Genomics, Duke University, Durham, North Carolina, together with Stewart Berlocher of the Department of Entomology of the University of Illinois, Urbana, Illinois wrote Freqpars. It implements parsimony analysis based on gene frequencies. The method was described by D. L. Swofford and S. H. Berlocher in a paper in Systematic Zoology 36: 293-325, 1987. The program is available in FORTRAN 77 source code. The search for most parsimonious trees under Swofford and Berlocher's criterion is not very extensive, Swofford notes, because the individual tree evaluations are computationally difficult. The source code in FORTRAN, with documentation, has been made available (after a period of unavailability) at Swofford's PAUP web site as one of a number of "companion applications".
