PHYLIP
version 3.6 is my own package.
It is available free, from its Web site, in C source code, or
as executables for Windows, Mac OS X, and Mac OS 8 or 9.
The C source code can easily be compiled on Unix or Linux systems.
It includes programs to carry out parsimony,
distance matrix methods, maximum likelihood, and other methods on a variety
of types of data, including DNA and RNA sequences, protein sequences,
restriction sites, 0/1 discrete characters data, gene frequencies,
continuous characters and distance matrices. It may be the
most widely-distributed phylogeny package, with about 29,000 registered users,
some of them satisfied.
It is third after PAUP*
and MrBayes
in the competition to be the
program responsible for the most published trees. It has been
distributed since October, 1980 and has celebrated its 30th anniversary,
as the oldest distributed phylogeny package.
PHYLIP is distributed at the PHYLIP web site
at http://evolution.gs.washington.edu/phylip.html.
A number of sites offer
web-servers that will perform data analyses using PHYLIP.
David Swofford
of the School of Computational Science and
Information Technology, Florida State University, Tallahassee, Florida
has written PAUP* (which originally meant Phylogenetic Analysis Using Parsimony).
PAUP* version 4.0beta10 has been released as a provisional
version by Sinauer Associates, of Sunderland, Massachusetts.
It has Macintosh, PowerMac, Windows, and Unix/OpenVMS versions.
PAUP* has many options and close compatibility with
MacClade.
It includes parsimony, distance matrix, invariants, and maximum
likelihood methods and many indices and statistical tests.
It is described in a web page
at http://paup.csit.fsu.edu/, which also contains links to
its web pages at Sinauer Associates.
It is available for the following types of systems:
- For PowerMac and 68k Macintosh Mac OS 9 in a version with full mouse-windows
user interface, which can also be run under the Classic environment on Mac OS X,
- For PowerPC Mac OS X systems (or Intel Mac OS X systems when running
under emulation) in a version with a command-line interface,
- For Windows in a version with a
character-based command-line interface (which appears in a Windows window),
- For DOS or a Windows DOS box in a version which has a command-line interface, and
- In a Unix/Linux version, with command-line interface, for
Alpha Compaq/Digital Unix, Alpha Linux, PowerPC Linux, Intel-compatible Linux,
Sun SPARC/UltraSPARC Solaris, and Alpha VMS.
The price
is $100 US for the Macintosh and PowerMac executable versions,
$85 for the Windows executable version, and
$150 for the Unix source code version, plus $20 for shipment. The Beta version comes with a Command Reference Document.
Their ISBN numbers are
0-87893-806-0, -807-9, and -804-4. Contact and ordering
information will be found at the Sinauer
Associates web site.
The international distributor for many countries is Palgrave Macmillan,
Brunel Road, Houndsmills, Basingstoke, Hampshire RG21 6XS, U.K. Tel:
+44-1256-329242
Fax: +44-1256-330688. Their e-mail address is
lecturerservices (at) palgrave.com.
For New Zealand, Korea, Japan, Brazil and Australia see the addresses at
this web page.
Derek Sikes of the University of Alaska Museum, Fairbanks, Alaska
(ffdss (at) uaf.edu) and Paul
Lewis of the
Department of Ecology and Evolutionary Biology of the University of Connecticut
have produced PAUPRat, a program that generates a text file
which can be used as commands by PAUP*
to have it implement Kevin Nixon's highly effective
tree search method, the Parsimony
Ratchet. The input files for PAUPRat can also
be modified to implement Rutger Vos's
comparable Likelihood Ratchet. It is available as Mac OS,
Mac OS X and Linux executables, as a DOS executable that can be run under
Windows,
and in source code, from
its web site
at http://users.iab.uaf.edu/~derek_sikes/software2.htm.
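The perturbation step of the Parsimony Ratchet mentioned above can be illustrated with a minimal sketch. In each ratchet iteration a random subset of characters is temporarily upweighted, a search is run under those weights, and the weights are then reset before searching again from the tree that was found. The function name, the 15% fraction, and the weight of 2 are assumptions chosen for illustration; they are not taken from PAUPRat, and the sketch produces only the weight vector, not PAUP* commands.

```python
import random

def ratchet_weights(n_chars, fraction=0.15, upweight=2, rng=None):
    """Return one perturbed weight vector for a ratchet iteration.

    A random subset of characters (hypothetical default: 15%) gets
    weight `upweight`; all other characters keep weight 1.
    """
    rng = rng or random.Random()
    n_pick = max(1, round(n_chars * fraction))
    picked = set(rng.sample(range(n_chars), n_pick))
    return [upweight if i in picked else 1 for i in range(n_chars)]

# One perturbation for a data set of 20 characters.
weights = ratchet_weights(20, fraction=0.15, rng=random.Random(1))
print(weights)
```

A driver program would write one such weight set per iteration into a command file for the search program, alternating with a reset to equal weights.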
MacClade is a pioneering program for
interactive analysis of evolution of a variety of character types,
including discrete characters and molecular sequences. It works on
Macintoshes with Mac OS X, up to and including
Snow Leopard, Mac OS X version 10.6 (and also on Mac OS).
MacClade enables you to use the mouse-window interface to specify and
rearrange phylogenies by hand, and watch the number of character steps and the
distribution of states of a given character on the tree change as you do so.
It has many other features beyond this, including ability to edit data,
print out phylogenies, and even simulate the evolution of data on a tree.
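The "number of character steps" that changes as a tree is rearranged can be counted, for one unordered character, with Fitch's algorithm. The following is a minimal sketch, not MacClade's own code: trees are represented as nested 2-tuples of tip names, and the function name and the example data are hypothetical.

```python
def fitch_steps(tree, states):
    """Minimum number of state changes for one unordered character.

    tree   -- rooted binary tree as nested 2-tuples of tip names
    states -- dict mapping each tip name to its observed state
    """
    steps = 0

    def state_set(node):
        nonlocal steps
        if isinstance(node, str):          # a tip: its observed state
            return {states[node]}
        left, right = (state_set(child) for child in node)
        common = left & right
        if common:                         # children agree: no change needed
            return common
        steps += 1                         # children disagree: one change
        return left | right

    state_set(tree)
    return steps

states = {"A": 0, "B": 0, "C": 1, "D": 1}
print(fitch_steps((("A", "B"), ("C", "D")), states))  # 1
print(fitch_steps((("A", "C"), ("B", "D")), states))  # 2
```

Moving a branch by hand amounts to changing the nested tuple and recounting, which is why the step count can be updated interactively.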
MacClade was written by Wayne Maddison (now of the Department of Zoology, University of British Columbia)
and David Maddison of the Department of Entomology,
University of Arizona. Until 2011 it was distributed commercially by
Sinauer Associates of Sunderland, Massachusetts, USA.
As MacClade will not function with the forthcoming Mac OS X 10.7 (Lion),
the Maddisons have made it available as a free download. It is
available at the MacClade web site
starting with version 4.08a. It includes a manual.
A much earlier and less capable version, 2.1 (which for example
cannot read nucleic acid sequences and has many fewer features for discrete
characters) is also available as a Mac OS 9 executable from the Indiana and EMBL
molecular biology software servers at
iubio.bio.indiana.edu and ftp.ebi.ac.uk, in directories
molbio/mac and pub/software/mac
respectively, as BinHexed and squeezed archives (respectively
macclade-old.hqx and macclade21.hqx). A demo
version of MacClade 3 that will not save or print files
is also available there.
J. S. Farris
has produced Hennig86, a fast parsimony program including
branch-and-bound search for most parsimonious trees and interactive tree
rearrangement. Although complete benchmarks have not been published it is said
to be faster than Swofford's PAUP*; both are a great many times faster than the
parsimony programs in PHYLIP. The program is distributed in executable object
code only and costs $50, plus $5 mailing costs ($10 outside of the U.S.).
The user's name should be stated, as copies are personalized as a copy-
protection measure. It is distributed by Arnold Kluge, Amphibians and
Reptiles, Museum of Zoology, University of Michigan, Ann Arbor, Michigan
48109-1079, U.S.A. (akluge (at) umich.edu) and by Diana Lipscomb at
George Washington University (biodl (at) gwuvm.gwu.edu). It runs on PC-compatible
microcomputers with at least 512K of RAM and needs no math coprocessor or
graphics monitor. It can handle up to 180 taxa and 999 characters.
It was described in the paper:
Farris, J.S. 1989, Hennig86: a PC-DOS program for phylogenetic analysis.
Cladistics 5: 163.
Mark Siddall, Assistant Curator of Annelida
at the American Museum of Natural History, New York
(siddall (at) amnh.org) has released Random Cladistics, version 4.0.3, a set of programs that can carry
out bootstrapping, jackknifing, a variety of kinds of permutation tests, and
search for "islands" of trees,
using Hennig86 or
NONA to analyze the data. It can also
mark ranges of sites for
inclusion or exclusion, compare trees from the analyses, compute an index
of incongruence between data sets, and do many other
operations. To use it you must have a copy of Hennig86
(for whose distribution see above). Random Cladistics will carry out the
appropriate transformations of your data and will call Hennig86 and have it
analyze them, and then it will summarize the results.
Random Cladistics is described by its author as no longer being supported
software -- he says that "Winclada
is far superior and provide's a nice interface."
Random Cladistics and associated programs are still distributed by their author
from its web site at
http://research.amnh.org/~siddall/rc.html as MSDOS executables.
Torsten Eriksson
of the Bergius Botanical
Garden, Stockholm, Sweden (torsten (at) bergianska.se)
has written a program, AutoDecay, which
generates Decay Indices from an existing PAUP* 4.0 treefile. It is intended
to simplify the task of
creating reverse constraint trees in PAUP* 4.0 and subsequent generation of
Bremer support values. (Bremer, K. 1994. Cladistics 10: 295-304).
AutoDecay version 5.06 is written in the scripting language Perl, and runs on
most systems that have Perl installed. AutoDecay can
be obtained from
Eriksson's software web page at
http://www.bergianska.se/index_forskning_soft.html.
Doug Eernisse
of the
California State University, Fullerton (DEernisse (at) fullerton.edu)
has constructed DNA Stacks version 1.3.5, a Macintosh HyperCard stack
that can carry out a variety of analyses on DNA sequences. It
does not do phylogenies itself. It has an alignment editor, and can
carry out various kinds of translation,
and codon bias analysis. It can write out data sets in PAUP*, Hennig86, and
PHYLIP formats. It is included here because in its
"Support Index Blocks..." menu item it is able to prepare jobs for
PAUP* to enable Decay Index (Support Index) analysis.
It is available
by World Wide Web from
http://biology.fullerton.edu/deernisse/dnastacks.html.
Michael Sorenson
of the Department of Biology, Boston University (msoren (at) bu.edu)
has released
TreeRot, version 3, a program that helps make Bremer Support
Indices ("decay indices") for parsimony analyses. It generates a
PAUP* command file with a constraint
statement for each node in a given shortest or strict consensus tree and
with commands to search for trees inconsistent with each of these constraint
statements in turn. For nodes with decay indices of more than a few steps, the
constraint statement approach is much more effective than simply finding all
trees 1, 2, 3, 4, etc. steps longer than the shortest tree and then examining
their strict consensus for which nodes are lost.
This version also supports the determination of partitioned Bremer support
indices introduced in the paper:
Baker, R.H., and R. DeSalle. 1997. Multiple sources of character information
and the phylogeny of Hawaiian Drosophilids. Systematic Biology
46: 654-673, and it will also parse the
PAUP* log file, automatically calculating the decay index for each node.
It is written in the Perl scripting language, and
a Mac OS executable is also available. Both are distributed at
its web site
at http://people.bu.edu/msoren/TreeRot.html.
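The arithmetic behind a decay index is simple once the constrained searches have been run: for each node, it is the length of the shortest tree that lacks that node minus the length of the shortest unconstrained tree. A minimal sketch follows, assuming the tree lengths have already been extracted from the search output; the function name, node labels, and lengths are hypothetical and this is not TreeRot's code.

```python
def decay_indices(shortest_length, constrained_lengths):
    """Bremer support (decay index) for each node.

    shortest_length     -- length of the most parsimonious tree
    constrained_lengths -- dict: node label -> length of the shortest
                           tree found that does NOT contain that node
    """
    return {node: length - shortest_length
            for node, length in constrained_lengths.items()}

# Hypothetical lengths from three reverse-constraint searches.
print(decay_indices(100, {"node1": 103, "node2": 101, "node3": 112}))
```

A node with an index of 3 is lost only in trees at least three steps longer than the shortest tree.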
J. S. Farris
has written RA (Rapid nucleotide Analysis).
It features rapid bootstrapping. It is available from Arnold Kluge,
Amphibians and Reptiles, Museum of Zoology, University of Michigan, Ann
Arbor, Michigan 48109-1079, U.S.A.
(akluge (at) umich.edu)
and Diana
Lipscomb at George Washington University (BIODL
(at) gwuvm.gwu.edu) who may
be contacted for details. The cost is said to be about $30 US.
Kevin Nixon
of the L. H. Bailey Hortorium at
Cornell University in Ithaca, New York (kcn2 (at) cornell.edu) has written
WINCLADA version 0.9.99m24, an interactive program that can read and edit
trees and data files, display character state changes inferred by parsimony
on diagrams of the trees, and launch runs of the programs
NONA, PIWE, and
Hennig86. WINCLADA is available
as a Windows95/98/NT executable from
its web site at
http://www.cladistics.com/about_winc.htm. It is available on
a shareware basis: the user who downloads it must pay $50 to Kevin Nixon at
Winclada/Kevin C. Nixon, 2210 Ellis Hollow Road, Ithaca, New York 14850.
There is also a $200-per-class fee for its use in courses.
WINCLADA supersedes and combines features of Nixon's earlier programs
ClaDOS and DADA, which are no longer distributed.
Pablo Goloboff,
of INSUE - Fundación e Instituto
Miguel Lillo 205, 4000 S. M. de
Tucumán, Argentina (instlillo (at) infovia.com.ar
with Subject line "para Pablo Goloboff") has written
NONA (Noname), version 2.0, PiWe
(Parsimony with Implied WEights),
and SPA to carry out parsimony including weighted
parsimony analyses. NONA searches for most parsimonious trees according to
character weights defined by the user a priori. PiWe (Pee-Wee) calculates weights of
the characters by a method introduced by Goloboff, a
noniterative version of J. S. Farris's "successive weighting". It was described
in Goloboff's paper in Cladistics 9: 83-91, 1993.
SPA is a generalized parsimony program that allows differential weighting of
changes between different states.
NONA is said to be faster than other parsimony programs.
A Windows version of NONA which includes PiWe
and SPA is available as freeware from
its web page at
http://www.cladistics.com/aboutNona.htm.
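The idea of implied weighting can be sketched numerically. In the form usually cited from Goloboff's 1993 paper, each character contributes a fit of k / (k + e), where e is its number of extra steps (homoplasy) on the tree and k is a concavity constant, and the tree with the highest total fit is preferred. The sketch below assumes that form; the function name, the value k = 3, and the example step counts are illustrative assumptions and are not taken from the program.

```python
def implied_weight_fit(extra_steps, k=3.0):
    """Total fit of a tree: sum over characters of k / (k + extra steps)."""
    return sum(k / (k + e) for e in extra_steps)

# Two hypothetical trees, each with 6 extra steps in total over 3 characters.
tree_x = [0, 0, 6]   # homoplasy concentrated in one character
tree_y = [2, 2, 2]   # homoplasy spread over all characters
print(implied_weight_fit(tree_x))
print(implied_weight_fit(tree_y))
```

Under equal weights the two trees tie, but the fit function prefers the first, because additional steps in an already homoplastic character cost progressively less.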
Pablo Goloboff,
of INSUE - Fundación e Instituto
Miguel Lillo 205, 4000 S. M. de Tucumán, Argentina,
(pablogolo (at) csnat.unt.edu.ar)
together with J. S. Farris of the Laboratory of Molecular Systematics of
the Naturhistoriska Riksmuseet, Stockholm,
Sweden and Kevin Nixon of the L. H. Bailey Hortorium, Cornell University,
Ithaca, New York, have produced TNT (Tree analysis using
New Technology), version of August 2008. This is a parsimony
program intended for use on very large data sets. It makes use of the
methods for speeding up parsimony searches introduced by Goloboff in
the paper: Goloboff, P.A. 1999. Analyzing large data sets in reasonable times:
solutions for composite optima. Cladistics 15: 415-428, and
the highly effective "parsimony ratchet" search strategy introduced by
Nixon in the paper: Nixon, K.C. 1999. The parsimony ratchet, a new method
for rapid parsimony analysis. Cladistics 15: 407-414.
It can handle characters with discrete states as well as continuous characters.
The program is distributed as Windows, Linux, and both PowerMac and Intel Mac
OS X executables.
The program and some support
files including documentation are available from
its web page
at http://www.zmuc.dk/public/phylogeny/TNT
It is free, provided you agree to a license with some reasonable limitations.
Frédéric Calendini and Jean-Francois Martin
of the Departement Protection des plantes et environnment
of the Ecole Nationale Supérieur, Montpellier, France
(martinjf (at) ensam.inra.fr)
have produced PaupUp
version 1.0.3.1, a graphical frontend for PAUP* DOS software. The PaupUp
program provides a user-friendly interface to the phylogenetic program PAUP* on the Windows operating systems. The
DOS version of PAUP* is entirely command-line driven and does not provide any
graphical interface. PaupUp partly resolves this issue, providing around 80%
of the available commands (the most commonly used in our opinion) in a
graphical environment comparable to the Mac OS version while the remaining 20% of
commands are still available through direct command-line input in a single
integrated design. The programs
TreeView and Modeltest
can be called from PaupUp. PaupUp is not compatible with the Windows version
of PAUP* but is compatible with the DOS version that is distributed with that
Windows version. It is available as a Windows executable. It also requires
the Microsoft .NET executable framework to be installed. PaupUp can be
downloaded from
its web site
at http://www.agro-montpellier.fr/sppe/Recherche/JFM/PaupUp/
Kai Müller
of the Nees-Institut für Biodiversität der Pflanzen
of the University of Bonn, Germany
(kaimueller (at) uni-bonn.de)
has written PRAP
(Parsimony Ratchet Analyses using PAUP* and likelihood)
version 2.0, a Java program to drive PAUP* in computing Bremer support
of groups, and in doing ratchet searches for
parsimony or likelihood trees. It allows the user to make
PAUP* carry out searches using the
"parsimony ratchet" strategy of Kevin Nixon. In version 2.0 this can be done
using either the parsimony criterion or the likelihood criterion (in spite of
the name of the search method). It can also do variations on the parsimony
ratchet including multiple random addition sequences.
It is described in the paper:
Müller, K. F. 2004. PRAP - computation of Bremer support for large data
sets. Molecular Phylogenetics and Evolution 31: 780-782, and
the search strategies it implements are described in the paper: Müller,
K. 2005. The efficiency of different search strategies in estimating parsimony
jackknife, bootstrap, and Bremer support. BMC Evolutionary Biology
5: 58.
It is available as Java executables, as downloads for Windows, Mac OS X,
and for Unix. It can be downloaded from
its web site
at http://systevol.nees.uni-bonn.de/software.
The earlier versions 1.0 and 0.99 are also available there.
MEGA
(Molecular
Evolutionary Genetics Analysis) is produced by Sudhir Kumar of
the Center for Evolutionary Functional Genomics of
The Biodesign Institute at
Arizona State University, Tempe, Arizona (s.kumar (at) asu.edu)
together with Joel Dudley of the
Stanford Center for Biomedical Informatics Research at Stanford University,
Koichiro Tamura of Tokyo Metropolitan University and Masatoshi Nei,
of Pennsylvania State University.
It carries out parsimony, distance matrix and likelihood methods for
molecular data (nucleic acid sequences and protein sequences). It
can do bootstrapping, consensus trees, and a variety of distance measures,
with Neighbor-Joining, Minimum Evolution, UPGMA, and parsimony tree
methods, as well as a large variety of data editing tasks, sequence
alignment using an implementation of
ClustalW, tests of the
molecular clock, and single-branch tests of significance of groups.
MEGA4 is the current version. MEGA4 is described in the papers:
- Kumar, S., J. Dudley, M. Nei and K. Tamura. 2008. MEGA:
A biologist-centric
software for evolutionary analysis of DNA and protein sequences.
Briefings in Bioinformatics 9: 299-306.
- K. Tamura, J. Dudley, M. Nei, and S. Kumar. 2007. MEGA4:
Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0.
Molecular Biology and Evolution
24: 1596-1599.
It is available for free at
its web site
at http://www.megasoftware.net
as Windows executables, with a downloadable manual. Manual web
pages are also accessible there.
It can be run under Mac OS X and under Linux using Windows emulators, if
you have those.
In addition, MEGA 4.1 is available as a downloadable beta release.
An earlier version, MEGA 1.02, is also available there as a DOS executable.
It is downloadable at the MEGA site and that version's manual
is also
available on line at
http://evolgen.biol.metro-u.ac.jp/MEGA/manual/default.html.
Xuhua Xia
of the Department of Biology and the
Center for Advanced Research in Environmental Genomics (CAREG)
of the University
of Ottawa, Ontario, Canada
(xxia (at) uottawa.ca) has released DAMBE
(Data Analysis in Molecular Biology and Evolution), version 5.0.25,
a general-purpose package for DNA and protein sequence phylogenies,
and also gene frequencies. It can read and
convert a number of file formats, and has many features for
descriptive statistics. It can compute a number of commonly-used
distance matrix measures and infer phylogenies by parsimony, distance,
or likelihood methods, including bootstrapping (by sites or by codons)
and jackknifing. There are
a number of kinds of statistical tests of trees available, and many other
features. It
can also display phylogenies. DAMBE includes a copy of ClustalW; there is also code from
PHYLIP.
An interesting feature is a simple web browser that allows sequences to
be fetched over the web while running DAMBE.
DAMBE is described in two publications, a paper and a book:
- Xia, X., and Z. Xie. 2001. DAMBE: Data analysis in molecular biology and
evolution. Journal of Heredity 92: 371-373, and a book:
- Xia, X. 2000. Data Analysis in Molecular Biology and Evolution.
Kluwer Academic Publishers, Boston.
DAMBE consists of Windows executables. It is available for free from
its web site at
http://dambe.bio.uottawa.ca/dambe.asp.
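Bootstrapping by sites, listed above among DAMBE's features, amounts to resampling the columns of the alignment with replacement and analyzing each resampled data set in the same way as the original. The following is a minimal sketch of the resampling step only, not DAMBE's code; the function name and the tiny alignment are hypothetical, and all sequences are assumed to have equal length.

```python
import random

def bootstrap_sites(alignment, rng=None):
    """Return one bootstrap replicate of an alignment.

    alignment -- dict mapping taxon name to an aligned sequence string
    Columns are drawn with replacement; every taxon gets the same columns.
    """
    rng = rng or random.Random()
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}

alignment = {"A": "ACGTAC", "B": "AGGTAC", "C": "TCGAAC"}
replicate = bootstrap_sites(alignment, rng=random.Random(0))
for name, seq in replicate.items():
    print(name, seq)
```

Bootstrapping by codons would draw triplets of adjacent columns instead of single columns.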
Matthew Goode, Alexei Drummond, Ed Buckler, and Korbinian Strimmer,
together with seven other contributors,
have released PAL (Phylogenetic Analysis Library)
version 1.5, a free collection of Java classes for use in molecular
phylogenetics.
The addresses of the four principal contributors are respectively:
- Matthew Goode (m.goode (at) auckland.ac.nz),
Bioinformatics Institute, School of Biological Sciences, University of Auckland, New Zealand.
- Alexei Drummond (alexei (at) cs.auckland.ac.nz), Department of Computer Science,
University of Auckland, New Zealand.
- Ed Buckler (esb33 (at) cornell.edu),
Department of Plant Breeding and Genetics, Cornell University, Ithaca, New York.
- Korbinian Strimmer (strimmer (at) uni-leipzig.de), Institute for Medical Informatics,
Statistics and Epidemiology (IMISE) of the University of Leipzig, Germany.
PAL is intended to facilitate the rapid construction of both
general applications as well as special-purpose tools for phylogenetic
analysis. It focuses on probabilistic data modelling and provides,
e.g., routines for
- maximum likelihood, neighbor-joining and least squares
analysis
- probability models for nucleotide/amino acid substitution, including constraints for a
molecular clock
- bootstrapping, and the Kishino-Hasegawa-Templeton and Shimodaira-Hasegawa tests
- simulation of trees and data sets, including coalescent trees
with growing populations and serial samples
- reading and writing trees and alignments
- adjusting for rate variation among sites
- obtaining splits from trees and calculating a distance between trees
among many other functions. It currently consists of over 200 components in 16
packages. PAL is described in a paper:
- Drummond, A., and K. Strimmer. 2001. PAL: An object-oriented programming library for molecular evolution and phylogenetics. Bioinformatics 17:
662-663.
It is
available at its web site
at http://www.cebl.auckland.ac.nz/pal-project/. Two user interfaces
are available which contain application programs written using PAL.
They have separate entries in these pages:
- Vanilla (by Strimmer): A simple text front end
- Pebble (vCEBL) (by Drummond): A GUI interface to PAL plus a functional command language.
PAL can be run on any machine that has Java, and can also be compiled into
native code by the Gnu Compiler for Java (gcj).
Korbinian Strimmer, of the Institute for Medical Informatics,
Statistics and Epidemiology (IMISE) of the University of Leipzig, Germany
(strimmer (at) uni-leipzig.de),
has written Vanilla, version 1.2, a character-based
interface to the PAL Java classes, which includes a
number of programs carrying out different kinds of phylogenetic analysis,
including:
- MLDIST which computes maximum likelihood distances
between DNA sequences, protein sequences, and two-state data, with
correction for unequal rates at different sites. It has many different
substitution models available. It also computes observed distances and can
obtain approximate estimates of unknown model parameters such as the
Ts/Tv ratio.
- MLTREE which computes the likelihood of a given tree under
the same models as MLDIST, allowing branch lengths to be provided or to
be estimated by the program, with the possibility of constraining them to
be clocklike. If two or more trees are provided it can also compare
them using the Kishino-Hasegawa test, the Shimodaira-Hasegawa test,
and expected Akaike weights.
- EVOLVE simulates data along a tree using the above models.
- DISTTREE computes least squares branch lengths from distance
matrices on a given tree, and can also construct Neighbor-Joining and
UPGMA trees.
- REWRITE converts data sets between different formats.
Taken together, these programs work with nucleotide and amino acid data,
estimate maximum-likelihood branch
lengths on trees (including clock trees and dated tips), compare trees statistically (e.g.,
Shimodaira-Hasegawa) and topologically (Robinson-Foulds),
infer demographic parameters from trees (based on the coalescent), and include
utility programs to reformat and modify alignments.
There are also 6 other programs with a command-line interface which
can estimate demographic parameters from coalescent trees,
compute distance matrices from trees,
reroot trees, and carry out some manipulations of data sets.
Vanilla has a menu-based interface. It is written in Java, and is
available from
its web site
at http://strimmerlab.org/software/vanilla/index.html
It can run on Java systems on many machines. Strimmer notes that
Vanilla does not provide all the functionality in PAL, and is perhaps
most useful as a source of examples on how to use PAL.
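To make concrete what a maximum likelihood distance of the kind MLDIST computes looks like, here is a minimal sketch for the simplest case, the Jukes-Cantor model, where the distance has the closed form d = -(3/4) ln(1 - 4p/3) with p the observed proportion of differing sites. This is not Vanilla or PAL code: the function name and example sequences are hypothetical, and the richer substitution models and rate-variation corrections described above generally require numerical optimization instead.

```python
import math

def jukes_cantor_distance(seq1, seq2):
    """Jukes-Cantor distance between two aligned DNA sequences.

    Sites where either sequence has a character other than A, C, G, T
    (for example a gap) are skipped.
    """
    pairs = [(a, b) for a, b in zip(seq1, seq2)
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)   # observed distance
    if p >= 0.75:
        raise ValueError("sequences too divergent for this model")
    return -0.75 * math.log(1 - 4 * p / 3)

print(jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAA"))
```

The observed distance p is always smaller than the corrected distance, because the correction accounts for multiple changes at the same site.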
Wayne Maddison
of the Departments of Zoology and
Botany, University of British Columbia, Vancouver, Canada, and David Maddison
of the Department of Entomology, University of Arizona, Tucson,
together with Peter Midford, Danny Mandel, and Jeff Oliver have
released Mesquite, version 2.5. The project email
address is info
(at) mesquiteproject.org. Mesquite
is a large and varied
set of modules in Java to carry out a wide variety of analyses in
comparative biology. It is also intended as a framework for other
developers to use to add additional functions. Some of the over 500 functions
available in the project currently are:
- Reconstruction of ancestral states by parsimony or likelihood and display
of the reconstructed states
- Tests of process of character evolution, including comparative methods.
- Simulation of character evolution (for categorical, DNA, or continuous
characters)
- Simulation or testing of tree shapes including the effect of a
character on the shape of a tree
- Inferences of the fit of gene trees to species trees
- Parametric bootstrapping (with integration with programs such as
PAUP* and NONA)
- Morphometrics (PCA, CVA, geometric morphometrics)
- Coalescence (simulations, other calculations)
- Tree comparisons and simulations (tree similarity, Markov speciation models)
- Search among trees using different tree rearrangement methods as well as
exhaustive enumeration
- Cluster analysis including single linkage and UPGMA methods
- Trees can be displayed and manipulated
Other Java modules that use Mesquite include Tree Set Viz
and a Java version of PDAP. Some Mesquite modules make use of PAL.
Mesquite is available in Java source code and Java executables from
its web page at
http://mesquiteproject.org. It can run on Mac OS X, Windows,
and Linux/Unix systems using recent versions of Java.
Julien Y. Dutheil, Bastien Boussau, and co-workers
of the Institut des Sciences de l'Evolution de Montpellier (ISE-M)
of the Université Montpellier 2, France
(julien.dutheil (at) univ-montp2.fr)
have released Bio++
version 1.8, a set of C++ libraries and programs dedicated to sequence
analysis, phylogenetics, molecular evolution and population genetics. The
Bio++ project is a collaborative effort to provide reusable implementations of
standard phylogenetics and population genetics methods published in the
literature, in order to analyze and manipulate sequence data, and with the
goal to facilitate the development of new methods. Bio++ is fully
object-oriented and documented. Two discussion forums are also available.
A non-exhaustive list of available methods includes:
- sequence and tree manipulations
- a large set of substitution models (nucleotides, protein, codons)
- distance estimation and tree reconstruction (by Neighbor Joining, BIONJ and UPGMA)
- maximum likelihood methods
- nucleotide diversity estimators
- tools for drawing phylogenies
Two recent additions also allow you to query sequences from databases and to
build GUIs using the Qt libraries. A set of example programs (The Bio++
Program Suite) is also available with examples and a manual.
Bio++ contains one of the largest sets of models for phylogenetics, including
non-homogeneous models. It also features a very general way to set up your own
non-homogeneous model and fit it, for instance assuming a different
equilibrium GC content for distinct clades in the phylogeny.
Bio++ is distributed as source code on a CVS/SVN server, and stable snapshots
are made every six months. In addition to the source code, these stable
releases can also be installed as pre-compiled packages for various Linux
distributions.
It is described in the papers:
- Dutheil, J., S. Gaillard, E. Bazin, S. Glémin, V. Ranwez, N. Galtier, and K.
Belkhir. 2006. Bio++: a set of C++ libraries for sequence analysis, phylogenetics, molecular evolution and population genetics. BMC Bioinformatics
4 (7):188
- Dutheil, J., B. Boussau. 2008. Non-homogeneous models of sequence evolution
in the Bio++ suite of libraries and programs. BMC Evolutionary Biology
22 (8): 255.
It is available as C++ source code, Windows executables, Linux executables,
Powermac Mac OS X executables and Intel Mac OS X executables, and packaged
as .deb (Debian, Ubuntu, etc), .rpm (Fedora, Mandriva, etc) packages and a
Gentoo overlay. It can be downloaded from
its web site
at http://kimura.univ-montp2.fr/BioPP/
Jaime Huerta-Cepas, Joaquin Dopazo and Toni Gabaldón
of the Comparative Genomics group
at the Centre for Genomic Regulation (CRG), Barcelona, Spain
(jhuerta (at) crg.es)
have released ETE
(a python Environment for Tree Exploration),
version 2.0.
ETE is a Python programming toolkit that assists in the automated
manipulation, analysis and visualization of hierarchical trees. It provides a
broad range of tree handling options, specific methods to work on
phylogenetics and clustering analyses, bindings to the phylogenomic databases
such as phylomeDB, advanced node annotation, interactive visualization, and a
customizable tree drawing engine to create PDF tree images. It also implements
methods for orthology and paralogy prediction and topological dating.
It is described in the paper:
Huerta-Cepas, J., J. Dopazo and T. Gabaldón. 2010. ETE: a python
Environment for Tree Exploration. BMC Bioinformatics 11: 24.
It is available as C source code, Windows executables, Mac OS X universal executables, and a Python module. It can be downloaded from
its web site
at http://ete.cgenomics.org
Rutger Vos
of the School of Biological Sciences
of the University of Reading, United Kingdom
(rutgeraldo (at) gmail.com)
has released Bio::Phylo
(Phyloinformatic analysis using perl),
version 0.35, a phylogeny package with tree simulation, topology,
visualization, and data conversion functionality. It has modules for simulating
tree shapes under various models, computing various tree topology indices,
managing and converting data in various formats, and visualizing tree shapes.
It is described in the paper:
Vos, R. A., J. Caravas, K. Hartmann, M. A. Jensen and C. Miller. 2011. Bio::Phylo - phyloinformatic analysis using Perl. BMC Bioinformatics 12:
63. http://dx.doi.org/10.1186/1471-2105-12-63.
It is available as Perl script. It can be downloaded from
its web site
at http://search.cpan.org/dist/Bio-Phylo/
Gavin Huttley, Rob Knight, and the PyCogent Development Team
of the John Curtin School of Medical Research
of the Australian National University, Canberra, Australia
(gavin.huttley (at) anu.edu.au)
have released PyCogent
(COmparative GENomics Toolkit, written in Python),
version 1.4.1. PyCogent
is a software library for genomic biology. It is an integrated
framework for controlling third-party applications;
devising workflows; querying databases; conducting novel probabilistic
analyses of biological sequence evolution; and generating publication quality
graphics. It is intended that it be able to carry out a variety of
phylogeny methods itself, but for now these have not been implemented. It
can, however, be used to submit runs of some existing programs to infer
phylogenies, including RAxML,
FASTML, and
Muscle.
It is described in the paper:
Knight, R., P. Maxwell, A. Birmingham, J. Carnes, J. G. Caporaso, B. C. Easton
et al. 2007. Pycogent: A toolkit for making sense from sequence. Genome
Biology 8(8): R171.
It is available as C source code, Python script, Linux executables, Intel Mac
OS X executables and Mac OS X universal executables. It can be downloaded from
its web site
at http://pycogent.sourceforge.net/
Jeet Sukumaran and Mark Holder
of the Department of Ecology and Evolutionary Biology
of the University of Kansas, Lawrence, Kansas
(jeet (at) ku.edu)
have produced DendroPy
version 3.6.1, a Python library
for phylogenetic computing. It provides classes and functions for the
simulation, processing, and manipulation of phylogenetic trees and character
matrices, and supports the reading and writing of phylogenetic data in a range
of formats, such as NEXUS, Newick, NeXML, Phylip, FASTA, etc. Application
scripts for performing some useful phylogenetic operations, such as data
conversion and tree posterior distribution summarization, are also distributed
and installed as part of the library. DendroPy can thus function as a
stand-alone library for phylogenetics, a component of more complex
multi-library phyloinformatic pipelines, or as a scripting “glue”
that assembles and drives such pipelines.
DendroPy's component SumTrees supersedes Sukumaran's previous program
bootscore.
DendroPy is described in the paper:
Sukumaran, J. and M. T. Holder. 2010. DendroPy: A Python library for phylogenetic computing. Bioinformatics 26: 1569-1571.
It is available as a Python script. It can be downloaded from
its web site
at http://packages.python.org/DendroPy/
Jason Evans, of Canonware.com
(jasone (at) canonware.com)
has released Crux version 1.2.0, a set of Python modules
together with code in C, that carries out many methods in phylogeny
reconstruction. It can be used to compute distances, likelihoods, and
do Bayesian MCMC on phylogenies. It can also find neighbor-joining trees,
manipulate trees, and compute Robinson-Foulds distances between trees.
Crux is written in Cython, an extension of Python which includes some
features of the C language. Evans describes Crux as particularly useful
for developing scripts to automate phylogeny tasks.
Installing it requires Python and a C compiler.
It is available at its web site at
http://www.canonware.com/Crux/
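As a rough illustration of the Robinson-Foulds distance mentioned in this entry, the sketch below collects the nontrivial leaf bipartitions induced by each tree and counts those found in only one of the two trees. It is not Crux's code or API: the function names and the nested-tuple tree representation (leaf names as strings) are assumptions made for this example, and both trees are assumed to carry the same set of leaves.

```python
def clades(tree):
    """Return (leaf set, list of leaf sets below each internal node)."""
    if isinstance(tree, str):          # a leaf is just its name
        return frozenset([tree]), []
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_found = clades(child)
        leaves |= child_leaves
        found.extend(child_found)
    found.append(leaves)
    return leaves, found


def bipartitions(tree):
    """Nontrivial splits of the tree, viewed as unrooted.

    Each split is stored as the side that does NOT contain a fixed
    reference leaf, so the same split always has the same representation.
    """
    all_leaves, found = clades(tree)
    ref = min(all_leaves)
    splits = set()
    for clade in found:
        side = clade if ref not in clade else all_leaves - clade
        # Skip trivial splits (a single leaf against all the others).
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
    return splits


def robinson_foulds(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))
```

For example, robinson_foulds((("A", "B"), ("C", ("D", "E"))), (("A", "C"), ("B", ("D", "E")))) counts the two splits that separate A,B and A,C from the rest, since the split D,E is shared.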
Applied Maths NV
of Keistraat 120, 9830 Sint-Martens-Latem, Belgium
(info @ applied-maths.com)
has released Bionumerics, a
program to manage a wide variety of biological data "from
1D patterns, 2D gels, phenotype arrays, and DNA/protein sequences".
In addition to database and image processing
capabilities, it can do clustering and phylogenetic inference. A
variety of clustering methods including UPGMA and neighbor-joining
distance matrix methods are available, and for inferring
phylogenies generalized parsimony and maximum likelihood are described
as available. Bootstrap support for groups can also be computed.
There are also facilities for plotting the trees.
Bionumerics is distributed as Windows executables. Bionumerics
is commercial software. Information about it is available at
its web site
at http://www.applied-maths.com/bn/bn.htm,
including requesting a free demo version.
For price and ordering information contact them through the
web site or by email, or by phone at
+32 9 2222 100, fax them at +32 9 2222 102.
Their U.S. Sales Office is at Applied Maths Inc.,
13809 Research Blvd, Suite 645, Austin, Texas 78750.
phone +1 512-482-9700, fax +1 512-482-9708 (email is info-us @
applied-maths.com).
John Czelusniak,
then of the Department of Anatomy and Cell Biology,
Wayne State University, Detroit, Michigan
wrote sog, a
C program demonstrating an algorithm to find the most parsimonious phylogeny
along with the parsimony strength of grouping (or Bremer decay index) for
nucleotide sequences in one pass of a branch and bound algorithm. This differs
from the implementation in PAUP* which uses a separate branch and bound search
to find the strength of grouping for each group in the tree, using
the tree group exclusion option. John said (some time ago) that
"sog is a rather ugly hack
which will be optimized and streamlined. It IS ALPHA SOFTWARE, which means it
has not been tested extensively on datasets other than our primate datasets."
It is available at the IUBIO archive
at http://iubio.bio.indiana.edu/soft/molbio/evolve/.
It is distributed as generic C source code which should be able to compile
and run on any system that has a C compiler.
Rino Zandee
(rino.zandee (at) gmail.com)
formerly of the Institute of Evolutionary and Ecological Science, Van der Klaauw
Laboratory, Leiden University, has written CAFCA version 1.5.12,
the Collection of APL Functions for Comparative
Analysis. It carries out a
search for the most parsimonious tree with discrete-character data (either
two-state or multistate), using a search for cliques of component
compatibility (monothetic subsets) to propose the candidates for most
parsimonious trees. The program is written as functions in the APL language,
but PowerPC Mac OS (or maybe it's Mac OS X) executables are distributed. The program is
free and is available from the
CAFCA Web Site
at http://www.mzandee.net/~zandee/cafca/.
Valery Zaporozhchenko
of the Research Centre for Medical Genetics, Moscow, Russia
(valery (at) regmed.ru)
has released Murka
version 1.2, a phylogeny package for parsimony methods. It constructs median
networks and from them finds Steiner trees (estimates of the most parsimonious
tree) from biological alignments. The package includes subprograms for
building full median networks and their subsets (such as Median Joining and
Reduced Median networks), extracting Steiner trees and analyzing results. Murka
is a cross-platform command-line application with source code distributed
under the LGPL license. Documentation can be viewed at the
documentation page at its web site.
It is available as C++ source code, Windows executables and Linux executables. It can be downloaded from
its web site
at http://phylomurka.sourceforge.net
For visualization of trees and networks Murka requires that the graph
visualization program Graphviz also be installed.
Kai Müller
of the Nees-Institut für Biodiversität der Pflanzen
of the University of Bonn, Germany
(kaimueller (at) uni-bonn.de)
has produced SeqState
version 1.40. It carries out a variety of primer design functions and also
calculates various statistics on aligned DNA sequences. For the purposes of
this listing, the relevant feature is that it can be used to implement a
number of different kinds of coding of indels (insertions and deletions).
It is described in the paper:
Müller K. F. 2005. SeqState - primer design and sequence statistics for
phylogenetic DNA data sets. Applied Bioinformatics 4: 65-69
and the different indel coding methods are discussed in two other papers:
- Müller, K. F. 2006. Incorporating information from length-mutational events into phylogenetic analysis. Molecular Phylogenetics and Evolution 38: 667-676.
- Simmons, M. P., K. F. Müller, A. P. Norton. 2007. The relative performance of
indel-coding methods in simulations. Molecular Phylogenetics and Evolution 44: 724-740.
It is available as Java executables, for Windows, for Mac OS X, and for
Linux. It can be downloaded from
its web site
at http://systevol.nees.uni-bonn.de/software/SeqState
Naoko Takezaki,
now of the Division of Genome Analysis and Genetic Research, Department
of Medicine, Kagawa University, Kagawa, Japan, (takezaki (at) med.kagawa-u.ac.jp)
has written gmaes, a program that estimates a gamma
distribution parameter for rate variation among sites by counting the minimum
number of substitutions at each site for a given tree topology.
The program is distributed as generic C source code which can be
compiled on any system that has a C compiler
from the IUBIO archive
at http://iubio.bio.indiana.edu/soft/molbio/evolve/.
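The counting step described in this entry, the minimum number of substitutions at each site on a fixed tree, can be sketched with Fitch's parsimony algorithm. This is an illustration only, not code from gmaes: the function names, the nested-tuple representation of a rooted binary tree, and the dictionary form of the alignment are all assumptions, and the subsequent fitting of the gamma parameter is not shown.

```python
def fitch_changes(tree, states):
    """Minimum number of state changes at one site (Fitch's algorithm).

    tree:   nested 2-tuples with leaf names (strings) at the tips.
    states: dict mapping each leaf name to its state at this site.
    """
    def visit(node):
        if isinstance(node, str):
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        shared = left_set & right_set
        if shared:
            # The children agree on at least one state: no extra change.
            return shared, left_cost + right_cost
        # Otherwise one change is needed on a branch below this node.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def changes_per_site(tree, alignment):
    """Apply fitch_changes to every column of an aligned set of sequences."""
    n_sites = len(next(iter(alignment.values())))
    return [
        fitch_changes(tree, {name: seq[i] for name, seq in alignment.items()})
        for i in range(n_sites)
    ]
```

The resulting list of per-site counts is the kind of input from which a rate-variation parameter could then be estimated.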
Chris Creevey and James McInerney
of the Bioinformatics and Pharmacogenomics Laboratory
of the National University of Ireland, Maynooth
(chris.creevey (at) may.ie)
have released CRANN
(an Irish word for "tree"),
version 1.04, a program to detect natural selection using rates of synonymous
and nonsynonymous substitutions. Crann takes FASTA format aligned
nucleotide sequence files and either infers a tree using neighbor-joining
based on nonsynonymous differences, or allows the user to read in a tree.
It reconstructs the placements of the synonymous and nonsynonymous
substitutions on the tree, and carries out a statistical test for an excess of
nonsynonymous changes. It can also calculate synonymous and nonsynonymous
differences between all pairs of sequences, and can also do that in a sliding
window along the sequences.
It is described in the papers:
- Creevey, C. and J. O. McInerney. 2003. CRANN: Detecting adaptive evolution
in protein-coding DNA sequences. Bioinformatics 19: 1726.
- Creevey, C. and J. O. McInerney. 2002. An algorithm for detecting
directional and non-directional positive selection, neutrality and negative
selection in protein coding DNA sequences. Gene 300: 43-51.
It is available as Windows executables, Linux executables, PowerMac Mac OS X
executables and Mac OS 9 executables. It can be downloaded from
its web site
at http://bioinf.may.ie/crann/
Mathieu Blanchette, of the School of Computer Science, McGill
University, Montréal, Québec
(blanchem (at) mcb.mcgill.edu), Fei Feng
(fei (at) cb.mcgill.ca),
of the same school, and Martin Tompa of the
Department of Computer Science and Engineering at the University of
Washington, Seattle (tompa (at) cs.washington.edu) have released
FootPrinter 2.0, a program
that uses parsimony scores to carry out "phylogenetic footprinting" to
search for regulatory sequences in the vicinity of genes that have been
sequenced in multiple species. The program looks for locations upstream
of each gene which, when taken together on a known phylogeny, show the
largest amount of conservation by having the smallest number of changes of
state along the tree. The method is described in these papers:
- Blanchette, M. and M. Tompa. 2003. FootPrinter: a program designed for
phylogenetic footprinting. Nucleic Acids Research 31: 3840-3842.
- Blanchette, M. and M. Tompa. 2002. Discovery of regulatory elements by a computational method for phylogenetic footprinting. Genome Research
12: 739-748.
- Blanchette, M., B. Schwikowski, and M. Tompa. 2002. Algorithms for phylogenetic
footprinting. Journal of Computational Biology 9: 211-223.
The program is available as C source code (including some programs from
PHYLIP) from
a web site
at http://bio.cs.washington.edu/software/motif_discovery#Motif%20Discovery.
Two web servers are available,
one running FootPrinter 3.01, a more recent version,
and one, MicroFootPrinter, that searches for prokaryotic sequences that are similar
to your sequence and runs FootPrinter 2.0 on that data set.
Daniel Barker
(db60 (at) st-andrews.ac.uk)
of the University of St. Andrews, Scotland, U.K.,
has written LVB version 3.1,
a program for inferring phylogenies using parsimony and simulated annealing.
Simulated annealing is intended to allow searches for most parsimonious trees
with large numbers of species.
It is described as often giving good results with large matrices. Up to
16383 objects and 32766 characters may be used. Aligned nucleotide sequences
with ambiguous nucleotides and/or discrete morphological characters can be used.
Bootstrapping of the data is also supported.
The program is currently available in ANSI C source code as a Unix tar file,
and as executables for Windows, Mac OS X, and Linux.
The text of a manual
can also be read or downloaded from the web site.
LVB is available from its Web site at
http://eggg.st-andrews.ac.uk/lvb. It is also available as a
Web server
from the Institut Pasteur.
Dick Hwang
of the Department of Genome Sciences,
University of Washington (dhwang (at) u.washington.edu)
has written GAPars, a program using a genetic algorithm to search for
most parsimonious phylogenies. The program is written in C++ and should
compile on Unix C++ compilers and on most other C++ compilers. He describes
it as working "rather inefficiently" and "not ready for prime-time use".
It can be obtained by emailing Hwang at the address above.
Quinn Snell, Mark Clement, and Hyrum Carroll
of the Computational Science Laboratory of the Department of Computer Science
at Brigham Young University, Provo, Utah
(snell (at) cs.byu.edu)
and (clement (at) cs.byu.edu)
have written PSODA, a parsimony program for nucleotide
sequences. The program reads the NEXUS file format, and carries out heuristic
rearrangement of trees using the parsimony criterion.
It is available as C++ source code, Windows executables, Linux executables and
PowerMac Mac OS X executables. It can be downloaded from
its web site
at http://dna.cs.byu.edu/psoda/
Rod Page
(r.page (at) bio.gla.ac.uk), of
the Division of Environmental and
Evolutionary Biology of the University of Glasgow has released
GeneTree, version 1.3.0,
a program that produces "reconciled trees" that fit a tree of gene copies to
a species tree. It uses a parsimony criterion where the penalty is the
number of deletions and duplications required to reconcile the gene tree with
the species tree. The program is described as "preliminary". The program
is described in the paper: Page, R. D. M. 1998. GeneTree: comparing gene and
species phylogenies using reconciled trees. Bioinformatics 14:
819-820, and its algorithm is described in the paper:
Page, R. D. M. and M. A. Charleston. 1997. From gene to organismal phylogeny:
Reconciled trees and the gene tree/species tree problem. Molecular Phylogenetics and Evolution 7: 231-240.
It is available as a Macintosh executable and as an executable for
Windows. They are
available from
the GeneTree web site at
http://taxonomy.zoology.gla.ac.uk/rod/genetree/genetree.html.
A manual is also available online there.
John Huelsenbeck
(johnh (at) berkeley.edu) of the
Department of Integrative Biology, University of California, Berkeley
released
CodonBootstrap version 3, now distributed by
Jonathan Bollback. This is a utility that
will generate non-parametric bootstrap data sets from a DNA sequence file. The
program re-samples codons to (1) avoid problems when analysing data under
models that assume coding structure (e.g., rates partitioned by sites), or
(2) when the user wishes to re-sample sites and maintain the original
autocorrelation among positions within the codon.
CodonBootstrap is available as C source code that can be compiled for
Unix from
Jonathan Bollback's software web page
at http://www.simmap.com/bollback/software.html.
A Macintosh version that was formerly distributed seems not to be available
any more.
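The idea of resampling whole codons rather than single sites can be sketched as follows. This is not the CodonBootstrap source: the function name and the dictionary representation of the alignment are assumptions for illustration, and the sketch assumes an in-frame alignment whose length is a multiple of three.

```python
import random


def codon_bootstrap(alignment, rng=None):
    """Return one bootstrap replicate that resamples codons with replacement.

    alignment: dict mapping sequence name to an aligned, in-frame DNA string.
    rng:       optional random.Random instance, for reproducible replicates.
    """
    rng = rng or random.Random()
    length = len(next(iter(alignment.values())))
    if length % 3 != 0:
        raise ValueError("alignment length must be a multiple of 3")
    n_codons = length // 3
    # Draw codon positions once, then apply the same draw to every sequence
    # so that the three sites of each codon stay together in all sequences.
    picks = [rng.randrange(n_codons) for _ in range(n_codons)]
    return {
        name: "".join(seq[3 * k:3 * k + 3] for k in picks)
        for name, seq in alignment.items()
    }
```

Because the three positions of each sampled codon are copied as a unit, the replicate keeps the first, second, and third codon positions in their original frame.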
Mark Clement, David Posada, and Keith Crandall of the
Universidad Vigo, Spain (Posada) and the Department of
Zoology, Brigham Young University, Provo, Utah (dposada (at) uvigo.es)
have released TCS version 1.21, a program for
estimating gene genealogies within a population. It does so by using the
method introduced in the paper: Templeton, A. R., K. A. Crandall and
C. F. Sing. 1992. A cladistic analysis of phenotypic associations with haplotypes inferred from restriction
endonuclease mapping and DNA sequence data. III. Cladogram estimation.
Genetics 132: 619-633.
This method connects existing haplotypes in a minimum spanning
tree, which is essentially a parsimony method. It can also infer
networks with loops in them.
TCS is written in Java and has a graphic user interface for the
display of the resulting networks. It may be run on any system that has the Java
runtime environment. The program is described in the paper:
Clement M., D. Posada, and K. Crandall. 2000. TCS: a computer program to
estimate gene genealogies. Molecular Ecology 9: 1657-1660.
It implements the estimation of the 95% parsimony
connection limit, and the estimation of outgroup weights (which
are used to designate the root of the tree). It takes as input
sequence files in NEXUS or PHYLIP format, and accepts absolute distances
between sequences as input.
The output is a Postscript picture of the tree, which can be saved as a
Postscript file.
TCS is available as Java executables, with documentation, at
its web site
at http://darwin.uvigo.es/software/tcs.html.
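To give a feel for the minimum spanning tree idea mentioned in this entry, here is a minimal sketch that joins haplotypes by their number of differing sites using Prim's algorithm. It is only an illustration under stated assumptions and not the TCS procedure itself, which also applies the 95% parsimony connection limit and can produce loops; the function names and input format are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(haplotypes):
    """Connect haplotypes with Prim's algorithm.

    haplotypes: dict mapping haplotype name to an aligned sequence.
    Returns a list of (name_in_tree, name_added, number_of_differences).
    """
    names = sorted(haplotypes)
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Pick the cheapest edge from the current tree to a new haplotype.
        cost, u, v = min(
            (hamming(haplotypes[u], haplotypes[v]), u, v)
            for u in sorted(in_tree)
            for v in names
            if v not in in_tree
        )
        edges.append((u, v, cost))
        in_tree.add(v)
    return edges
```

The sum of the edge costs is the total number of differences along the tree, which is the quantity a parsimony criterion seeks to keep small.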
David Posada (dposada (at) uvigo.es), of the
Universidad Vigo, Spain, Keith Crandall, of the Department of Zoology,
Brigham Young University, Provo, Utah (Keith_Crandall (at) byu.edu)
and Alan Templeton, of the Department of Biology of Washington University, Saint
Louis, Missouri (temple_a (at) biology.wustl.edu) have made available
GEODIS (version 2.6).
It implements Templeton's method of Nested Clade Analysis, which
is intended to distinguish between historical divergence of populations
and geographical separation, using the geographical distribution of
haplotypes in a genealogy. GEODIS is a Java program which can run on
any platform. It is described in a paper: Posada D., K. A. Crandall and
A. R. Templeton. 2000. GeoDis: A program for the cladistic nested analysis of
the geographical distribution of genetic haplotypes. Molecular Ecology
9: 487-488. It is available at
its web site
at http://darwin.uvigo.es/software/geodis.html
Jon Jeffery
(jon (at) donnasaxby.com), then of the Institute of Biology, Leiden University, The Netherlands
has written Parsimov, a series of Perl scripts to implement
"event cracking", a parsimony-based method of finding the minimum number of
changes in developmental sequences of events that are necessary to explain
the evolution of pairs of characters on a tree. Among the uses of this
method is to reconstruct ancestral developmental sequences.
The programs include:
- Parsimv7g.pl which implements event-pair "parsimony cracking".
- ReplacerParsimv.pl which takes a Parsimv7g.pl output file and replaces the PAUP* character numbers with more readable character names according to a user-specified text list.
- Describe.pl, which creates a PAUP*
command file to describe each tree in memory under ACCTRAN and DELTRAN optimizations (saving each as separate log files) plus a Parsimv7g.pl batch file (e.g., ParsBatch.txt) to crack each of the PAUP* log files produced.
The programs can be
executed on any system that has Perl installed. They are described in a
paper: Jeffery, J.E., O.R.P. Bininda-Emonds, M.I. Coates, and M.K. Richardson.
2005. A new technique for identifying sequence heterochrony. Systematic Biology 54: 230-240.
The Parsimov programs are available as (separate) downloads at
Olaf Bininda-Emonds's software web page
at
http://www.uni-oldenburg.de/molekularesystematik/en/34011.html#EvoDevo
David Swofford, of the Center for Evolutionary Genomics,
Duke University, Durham, North Carolina, together with Stewart Berlocher
of the Department of Entomology of the University of Illinois, Urbana,
Illinois wrote
Freqpars. It implements parsimony analysis based on gene
frequencies. The method was described by D. L. Swofford and S. H. Berlocher
in a paper in Systematic Zoology 36: 293-325, 1987. The program
is available in FORTRAN 77 source code. The search for most parsimonious
trees under Swofford and Berlocher's criterion is not very extensive,
Swofford notes,
because the individual tree evaluations are computationally difficult.
The source code in FORTRAN, with documentation, has been made
available (after a period of unavailability) at Swofford's PAUP web
site as one of a number of "companion applications".