Jun Adachi and Masami Hasegawa
have written a package MOLPHY, version 2.3b3, carrying out maximum likelihood inference of phylogenies for either nucleotide sequences or protein sequences. Their protein sequence maximum likelihood program, ProtML, is a successor to the one they made available to me and which I formerly distributed on a nonsupported basis in PHYLIP. The package is distributed free in C source code, with documentation. MOLPHY is available from its web site at http://www.ism.ac.jp/ismlib/softother.e.html.
A monograph describing MOLPHY (number 28 in the Computer Science
Monographs of the Institute of Statistical Mathematics) is available
from the same source (see folder csm96 on the distribution web page),
as TeX source and as a .dvi
file. The monograph can also be ordered from the Institute.
An executable version of MOLPHY 2.2 for Windows95 or Windows NT on Intel
processors, and also one that works on Windows NT on DEC Alpha processors, is
available from Russell Malmberg at the Botany Department of the
University of Georgia (russell (at) plantbio.uga.edu)
at his software web site at http://www.plantbio.uga.edu/~russell/index.php?s=1&n=5&r=0.
Gary Olsen, of the Department of Microbiology, University of Illinois, Urbana, Illinois (gary (at) life.uiuc.edu) has developed a speeded-up replacement for my program DNAML coded in C, called fastDNAml version 1.2.2. It achieves a number of economies and also is organized so that it can be run on parallel processors; he and his co-workers have constructed trees of very large size on a high-speed parallel processor. The program can be compiled using the "p4" portable parallel processing toolkit. It can also be run in ordinary serial mode on workstations, where it is faster than DNAML. fastDNAml is described in a paper: Olsen, G. J., H. Matsuda, R. Hagstrom, and R. Overbeek. 1994. fastDNAml: A tool for construction of phylogenetic trees of DNA sequences using maximum likelihood. Computer Applications in the Biosciences (CABIOS) 10: 41-48. It is available in the following places:
ftp.bio.indiana.edu in directory molbio/evolve
http://packages.debian.org/unstable/science/fastdnaml
Bette Korber
of the Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico (btk (at) t10.lanl.gov) and her colleagues have released a version of fastDNAml which uses the REV (general reversible) model of DNA evolution. They used it for the results in the paper: B. Korber, M. Muldoon, J. Theiler, F. Gao, R. Gupta, A. Lapedes, B. H. Hahn, S. Wolinsky and T. Bhattacharya. 2000. Timing the ancestor of the HIV-1 pandemic strains. Science 288: 1789-1796. The program is available either in a version using the MPI Message-Passing Interface for parallel computers or in a non-parallel version. It is available as C source code for Unix from the web site for the programs from that paper at http://www.santafe.edu/~btk/science-paper/bette.html.
Alexandros Stamatakis
(alexandros.stamatakis (at) h-its.org) of the Heidelberger Institut für Theoretische Studien, Heidelberg, Germany and his colleagues have released RAxML, version 7.2.8, a program for faster reconstruction of phylogenies by maximum likelihood. It provides faster heuristic search, use of parallel processing, and a simulated annealing algorithm. RAxML can also carry out parsimony, bootstrapping, and consensus tree methods. There are a number of papers describing RAxML: http://sco.h-its.org/exelixis/software.html
Daniele Silvestro and Ingo Michalak
https://sites.google.com/site/raxmlgui/
Thomas Keane
(thomas.m.keane (at) nuim.ie) and Thomas Naughton (tom.naughton (at) nuim.ie), both of the Department of Computer Science of the National University of Ireland, Maynooth have released DPRML, a distributed cross-platform tree-building program that can use the idle clock cycles of machines, allowing idle time on hundreds of machines to be harnessed for tree-building. It uses the PAL Java framework. It is described in a paper: Keane, T.M., T. J. Naughton, S. A. A. Travers, J. O. McInerney, and G. P. McCormack. 2005. DPRml: Distributed Phylogeny Reconstruction by Maximum Likelihood. Bioinformatics 21: 969-974. DPRML can be downloaded from its web page at http://distributed.cs.nuim.ie/dprml.php.
Its authors note
that it is slower than their more recent distributed phylogeny platform
MULTIPHYL, and they urge use of that instead of DPRML.
T. M. Keane, T.J. Naughton, S.A.A. Travers, J.O. McInerney, and G.P. McCormack, of the Department of Computer Science at the National University of Ireland, Maynooth, Ireland (tkeane (at) cs.nuim.ie) have produced MultiPhyl,
version 1.06, a distributed phylogeny platform enabling maximum likelihood
runs across a large number of heterogeneous machines.
MultiPhyl is a high-throughput implementation of a distributed phylogenetics
platform that is capable of using the idle computational resources of many
heterogeneous non-dedicated machines to form a phylogenetics supercomputer. It
allows a user to upload hundreds or thousands of amino acid or nucleotide
alignments simultaneously and perform computationally intensive tasks such as
model selection, tree searching, and bootstrapping of each of the alignments.
The program implements a set of 80 amino acid models and 56 nucleotide ML
models and a variety of statistical methods for choosing between alternative models.
It is described in the paper:
Keane, T.M., T.J. Naughton, S.A.A. Travers, J.O. McInerney, and G.P. McCormack. 2005. DPRml: Distributed Phylogeny Reconstruction by Maximum Likelihood. Bioinformatics 21(7): 969-974.
It is available as Java code.
It can be downloaded from the downloads web site
at http://distributed.cs.nuim.ie/downloads.php
for the distributed Java-based platform produced by this group. The platform
itself can also be downloaded from the same site.
MultiPhyl can also be tested by using their web server version.
Ziheng Yang
of the Department of Genetics and Biometry, University College London (z.yang (at) ucl.ac.uk)
has released
PAML, version 4.4, a package of programs for the maximum
likelihood analysis of nucleotide or protein sequences, including codon-based
methods that take into account both amino acids and nucleotides.
The programs can estimate branch lengths in a phylogenetic tree and parameters
in the evolutionary model such as the transition/transversion rate ratio, the
gamma parameter for variable substitution rates among sites, rate
parameters for different genes, and synonymous and nonsynonymous substitution
rates. They can also test evolutionary models, calculate
substitution rates at particular sites, reconstruct ancestral nucleotide or
amino acid sequences, simulate DNA and protein sequence evolution,
compute distances based on the synonymous and nonsynonymous changes,
and of course do phylogenetic tree reconstruction by
maximum likelihood and Bayesian Markov Chain Monte Carlo methods. The
strength of the package lies in its rich implementation of
evolutionary models, though Yang comments that
tree-making is not a strong point of the current version.
Another notable point is the availability of codon models, which Yang
pioneered. The package is
available as Windows executables and as
C source code for Unix and MacOS X systems. An Old Versions folder
in the ftp site that distributes these also contains Mac OS executables for
the earlier versions 3.0a and 3.0c.
See the PAML web page, where it is available, at http://abacus.gene.ucl.ac.uk/software/paml.html
Amy Egan and Joana Silva
http://ideanalyses.sourceforge.net/main.html
Tim Massingham and Nick Goldman
http://www.ebi.ac.uk/goldman/SLR/
Gangolf Jobb
(gangolf (at) treefinder.de), formerly of the Institut für Statistik of the University of München, Germany, has produced Treefinder, a maximum likelihood program for nucleotide sequence data. It makes available a variety of models of base change, including codon-position-specific models. It carries out search for best trees by its own method of tree rearrangement, and can assess statistical support for groups by either bootstrap or a local paired-sites method. All parameters of the models can be optimized by searching for the values that maximize the likelihood. The program is fast, and has both a graphical user interface and a general language in which its operation can be programmed. Trees can be interactively manipulated and constrained in various ways. Treefinder is described in a paper: Jobb, G., A. von Haeseler, and K. Strimmer. 2004. TREEFINDER: A powerful graphical analysis environment for molecular phylogenetics. BMC Evolutionary Biology 4: 18. It has been available for download from its web site at http://www.treefinder.de
as executables for Windows, Mac OS X, and
Linux. It requires the Java runtime environment to be present.
However, Jobb has currently declared himself "on strike" at this web site
and asks that people first email him to discuss whether he should be
compensated for his work. I do not know whether that means that the program
is available for free currently, or whether he will soon start charging for it.
He certainly deserves compensation for this good program.
Stéphane Guindon (currently at the University of Auckland, New Zealand,
s.guindon (at) auckland.ac.nz) and Olivier Gascuel (gascuel (at) lirmm.fr) at the
LIRMM, of the CNRS and the University of Montpellier II, France, have
released PHYML version 3.0, a fast maximum likelihood
program for
nucleotide or protein sequence data. It has 6 possible DNA substitution
models and 5 amino acid substitution models, allows estimation of many of
the model parameters, and can allow for a gamma
distribution of rates among sites and a proportion of invariable sites.
It can also do bootstrapping of the trees.
PHYML is described in a paper: Guindon, S., and O. Gascuel. 2003. A simple,
fast, and accurate algorithm to estimate large phylogenies by maximum
likelihood. Systematic Biology 52: 696-704. It is
available as Linux, SunOS, Windows, and Mac OS X executables from
its web site in Montpellier at
http://www.atgc-montpellier.fr/phyml/binaries.php, where it is also available as a
web server.
Johan Nylander (Johan.Nylander (at) abc.se) has written BootPHYML version 3.4. This is a Perl script that performs bootstrapping using programs from PHYLIP, substituting PHYML for the PHYLIP program DNAML. It works with Mac OS X and Linux or Unix. It is available on a web page at Nylander's web site in Sweden.
Bastien Boussau
http://pbil.univ-lyon1.fr/software/nhphyml/
Bastien Boussau
http://pbil.univ-lyon1.fr/software/phyml_multi/
Pierre Rioux and Tim Littlejohn
http://microbe.bio.indiana.edu:7131/soft/iubionew/molbio/evolution/phylo/ParBoot/
It is no longer available by ftp from Montréal.
It is described on a web page at the Université de Montréal at
http://megasun.bch.umontreal.ca/aboutpb.html. It requires a
networked system of computers with PHYLIP, a Perl interpreter, and
appropriate accounts and permissions.
Laura Salter Kubatko
http://www.stat.ohio-state.edu/~lkubatko/software/ssa/ssa.html
Nir Friedman, Matan Ninio, Tal Pupko, Eval Privman, and Itshak Pe'er
http://compbio.cs.huji.ac.il/semphy/
Simon Whelan
http://www.bioinf.manchester.ac.uk/leaphy/Leaphy.htm
Daniele Catanzaro
http://homepages.ulb.ac.be/~dacatanz/Site/PhyloCoco.html
Vivek Gowri-Shankar
and Howsun Jow (vivek.gowri-shankar (at) s.man.ac.uk) of the Department of Computer Sciences of the University of Manchester, Manchester, U.K. have written PHASE, version 1.1, a software package for PHylogenetics And Sequence Evolution. It infers phylogenies with models for RNA evolution that include models for both paired sites and unpaired sites. The models for the unpaired sites have the usual 4 states, while the models for the paired sites have 6, 7, or 16 states, depending on the model chosen. The programs carry out a Bayesian Markov chain Monte Carlo (MCMC) analysis that samples trees from the posterior distribution given the data. PHASE is described in two papers: http://www.bioinf.man.ac.uk/resources/phase/
Le Sy Vinh
(vinh (at) cs.uni-duesseldorf.de) and Heiko Schmidt (heiko (at) cs.uni-duesseldorf.de) of the Institut für Bioinformatik of the University of Düsseldorf, Germany and Arndt von Haeseler (arndt.von.haeseler (at) univie.ac.at) of the Center for Integrative Bioinformatics Vienna (CIBIV), Austria, have written Phylogenetic Navigator (PhyNav) version 1.0. This program finds subsets of species in a dataset that are "minimal k-distance subsets" and analyses these each by maximum likelihood. Then it stitches these groups together using likelihood. This makes it possible to analyze larger datasets. The program is described in a paper: Vinh, L. S., H. A. Schmidt, and A. von Haeseler. 2005. PhyNav: A novel approach to reconstruct large phylogenies. pp. 386-393 in Classification, the Ubiquitous Challenge (Proceedings of the 28th Annual Conference of the GfKl 2004), ed. C. Weihs and W. Gaul. Series Studies in Classification, Data Analysis, and Knowledge Organization. Springer-Verlag, Heidelberg/New York. It is available as Linux executables from its web site at http://www.cibiv.at/software/phynav/
Shu-chuan (Grace) Chen
http://www.mixturetree.net/download.html
Morgan Price
http://www.microbesonline.org/fasttree/
Paul Michael Agapow
http://www.agapow.net/software/mac5
David Posada
(dposada (at) uvigo.es) of the Department of Biochemistry, Genetics and Immunology of the University of Vigo, Spain and Keith Crandall of the Department of Biology, Brigham Young University released Modeltest version 3.7, a program to test a hierarchy of statistical models of DNA evolution using the Likelihood Ratio Test criterion and the AIC (Akaike Information Criterion). The likelihood values are obtained by running PAUP*. Modeltest accepts likelihood scores corresponding to 56 models of DNA substitution, which differ in whether transition and transversion rates are equal, whether rates at different sites are equal, and whether there are invariant sites. Modeltest is described in the paper: Posada, D. and K. A. Crandall. 1998. MODELTEST: testing the model of DNA substitution. Bioinformatics 14: 817-818. It is available as executables for Macintosh and for Windows, and as source code in C that can be compiled on many other systems. It is distributed from its web site at http://darwin.uvigo.es/software/modeltest.html.
Modeltest was the basis for two further developments: the MrModeltest
program which uses MrBayes
and the FindModel server at Los Alamos National Laboratory which
is a revised version of Modeltest that uses the weighbor program to infer the trees.
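The two criteria named in the Modeltest entry can be illustrated with a few lines of Python. This is only a sketch of the standard formulas, not Modeltest's own code; the function names and the log-likelihood values are hypothetical, and the comparison assumes two nested models that differ by one free parameter (such as a model with equal transition and transversion rates versus one that lets them differ), scored on the same tree.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion; the model with the smaller value is preferred."""
    return 2.0 * n_params - 2.0 * log_likelihood


def lrt_statistic(lnl_null, lnl_alt):
    """Twice the log-likelihood gain of the richer (alternative) model."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_pvalue_df1(statistic):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(statistic, 0.0) / 2.0))


# Hypothetical log-likelihoods for a simpler and a richer nested model.
lnl_simple, lnl_rich = -1234.5, -1230.0
stat = lrt_statistic(lnl_simple, lnl_rich)
print("LRT statistic:", stat, "p-value:", chi2_pvalue_df1(stat))
# Parameter counts here exclude branch lengths, which both models share.
print("AIC simple:", aic(lnl_simple, 0), "AIC rich:", aic(lnl_rich, 1))
```

The chi-square approximation with one degree of freedom applies only when the models are nested and differ by a single unconstrained parameter; other comparisons need a different reference distribution.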
David Posada
(dposada (at) uvigo.es) of the Department of Biochemistry, Genetics and Immunology of the University of Vigo, Spain has released jMODELTEST version 0.1.1, a Java version of Modeltest. Like Modeltest, it carries out statistical selection of best-fit models of nucleotide substitution. It implements five different model selection strategies: hierarchical and dynamical likelihood ratio tests (hLRT and dLRT), Akaike and Bayesian information criteria (AIC and BIC), and a decision theory method (DT). It also provides estimates of model selection uncertainty, parameter importances and model-averaged parameter estimates, including model-averaged phylogenies. It is described in the paper: Posada D. 2008. jModelTest: Phylogenetic Model Averaging. Molecular Biology and Evolution 25: 1253-1256. It is distributed as Java executables that will run on Java-equipped Windows systems, on Mac OS X, and on Linux systems that have Java installed. It also uses PHYML to compute maximum likelihood trees under the various models. I do not know whether it comes with PHYML installed or requires the user to install it. jMODELTEST will be found at its web site at http://darwin.uvigo.es/software/jmodeltest.html
Paulo Nuin (nuinp (at) mcmaster.ca) of the Department of Biology,
McMaster University, Hamilton, Ontario, Canada has released
MrMTgui version 1.01. This is a graphic user interface for
running Modeltest and MrModeltest. It is available for Windows
as executables from the MrMTgui web site at http://genedrift.org/mtgui.php. Source code of a
Linux version is also available which can be compiled using the WxWindows
windowing software. The Linux sources are available by accessing an
svn (subversion) version-control code base, using instructions available
at the above site. MrMTgui was formerly known as MTgui in the earlier
version which could not access MrModeltest.
Johan Nylander (Johan.Nylander
(at) abc.se)
has released MrModeltest version 2.2, a simplified version of
Modeltest 3.7. It performs hierarchical
likelihood ratio tests and calculates approximate AIC, AICc, and Akaike weights
of the nucleotide substitution models currently implemented in both
PAUP* and MrBayes.
Version 2 has added use of four different hierarchies for the likelihood ratio
tests and the selected model being printed in a MrBayes block.
MrModeltest is available as an executable and source code for Windows,
for Mac OS, and for Mac OS X, and as source code for Linux and Unix.
It is available from Nylander's software download site
at http://www.abc.se/~nylander/ in Sweden.
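The small-sample correction and the Akaike weights mentioned in the MrModeltest entry follow directly from AIC scores. The sketch below uses the textbook definitions and is not taken from MrModeltest; the function names, the model labels, and the scores are hypothetical, and taking the number of alignment sites as the sample size is an assumption that some authors question.

```python
import math


def aicc(log_likelihood, n_params, sample_size):
    """AIC with the second-order correction for small samples."""
    if sample_size - n_params - 1 <= 0:
        raise ValueError("sample size too small for the AICc correction")
    aic = 2.0 * n_params - 2.0 * log_likelihood
    return aic + (2.0 * n_params * (n_params + 1)) / (sample_size - n_params - 1)


def akaike_weights(scores):
    """Convert a dict of model -> AIC (or AICc) into normalized Akaike weights."""
    best = min(scores.values())
    relative = {m: math.exp(-0.5 * (s - best)) for m, s in scores.items()}
    total = sum(relative.values())
    return {m: r / total for m, r in relative.items()}


# Hypothetical AICc scores for three candidate models.
scores = {"model_A": 2469.0, "model_B": 2462.0, "model_C": 2463.5}
for model, weight in sorted(akaike_weights(scores).items()):
    print(model, round(weight, 3))
```

A weight can be read as the relative support for a model within the candidate set, which is why the weights always sum to one.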
Charles Bell
of the Department of Biology of Xavier University of Louisiana, New Orleans (cbell3 (at) xula.edu) has written Porn* (Phylogenetics On Rick's Network, as it was originally hosted on Rick Ree's site) version 2.0, a Linux clone of Modeltest using the Python language. It enables command-line computations equivalent to Modeltest under the Linux operating system. It creates command blocks for PAUP* which can be used when running PAUP*. Porn* is written as a shell script invoking Python modules. It is available at its web site at http://www.phylodiversity.net/cbell/pornstar/
David Posada
(dposada (at) uvigo.es) of the Department of Biochemistry, Genetics and Immunology of the University of Vigo, Spain has released ProtTest, version 2.4, a Java program allowing testing of 64 different models of protein evolution, using the AIC, AICc, and BIC criteria for choosing among models that include different substitution models, invariant sites, rate heterogeneity, and empirical amino acid frequency variants of the models. ProtTest uses the PAL library of phylogenetic Java routines and also uses the PHYML program to compute likelihoods. It is described in the paper: Abascal, F., R. Zardoya and D. Posada. 2005. ProtTest: Selection of best-fit models of protein evolution. Bioinformatics 21: 2104-2105. It is available from its web site at http://darwin.uvigo.es/software/prottest.html
Thomas Keane, of the Bioinformatics and
Pharmacogenomics Lab of the Department of Biology,
National University of Ireland, Maynooth
(thomas.m.keane (at) nuim.ie)
has written ModelGenerator, version 0.85.
It is a Java program for model selection that selects
amino acid and nucleotide substitution models using
Fasta or PHYLIP alignments.
It supports 56 nucleotide and 80 amino acid substitution models.
It is described in the paper: Keane, T. M., C. J. Creevey, M. M.
Pentony, T. J. Naughton and J. O. McInerney. 2006. Assessment of methods
for amino acid matrix selection and their use on empirical data shows that ad
hoc assumptions for choice of matrix are not justified. BMC Evolutionary
Biology 6: 29.
It is available from its web site at http://bioinf.may.ie/modelgenerator/.
Vladimir Minin, Zaid Abdo, Paul Joyce, and Jack Sullivan
of the Department of Biological Sciences at the University of Idaho, Moscow, Idaho (jacks (at) uidaho.edu) or (vminin (at) u.washington.edu) (Minin is now at the University of Washington) have released DT-ModSel (Decision Theory MODel SELection), a performance-based method for selecting a likelihood model for phylogenetic estimation. It implements a model selection method which is based on the Bayesian Information Criterion, but incorporates relative branch-length error as a performance measure in a decision theory (DT) framework. This DT method includes a penalty for overfitting, is applicable prior to running extensive analyses, and simultaneously compares all models being considered and thus does not rely on a series of pairwise comparisons of models to traverse model space. It can compare 56 different models of molecular sequence evolution on a given tree. It is described in the paper: Minin, V., Z. Abdo, P. Joyce, and J. Sullivan. 2003. Performance-based selection of likelihood models for phylogeny estimation. Systematic Biology 52: 674-683. It is available as a Perl script. It can be downloaded from its web site at http://www.webpages.uidaho.edu/~jacks/DTModSel.html
Sergei Kosakovsky Pond
and Simon Frost of the Antiviral Research Center, University of California, San Diego and Spencer Muse of the Department of Statistics, North Carolina State University, Raleigh, North Carolina (muse (at) stat.ncsu.edu) have released HY-PHY (HYpothesis testing using PHYlogenies), version 0.99Beta. HY-PHY has general ways of enabling the user to perform a wide variety of statistical tests of different models of molecular sequence change. It is actually a higher-level programming language which enables the user to set up many different kinds of tests. The user can define their own alphabet of symbols and test any reversible substitution model. Examples of tests that can be performed include molecular clock tests, relative rate tests, relative ratio tests, and tests of positive selection. It is described in a paper: Kosakovsky Pond, S. L., S. D. Frost, and S. V. Muse. 2005. HyPhy: hypothesis testing using phylogenies. Bioinformatics 21(5): 676-679. Although not primarily intended as a phylogeny estimation package, it also can infer trees by Neighbor-Joining and UPGMA methods, and a number of search strategies are also available for likelihood inference. HY-PHY is freely available as executables for Mac OS, for Mac OS X, for Windows, and as source code for Unix and Linux. It is available at the HY-PHY web page at http://www.hyphy.org.
Akifumi S. Tanabe
of the Division of Ecology and Evolutionary Biology, Department of
Environmental Life Sciences, Graduate School of Life Sciences
of Tohoku University, Japan
(astanabe (at) mail.tains.tohoku.ac.jp)
has released Kakusan4,
a parallelized nucleotide substitution model selection
script written in the Perl language for data sets with multiple partitions.
Kakusan4 supports nucleotide substitution model selection on each partition and/or each
codon position by AIC, AICc or BIC. Because the optimization of likelihoods
is executed using BASEML,
PAUP* or
Treefinder and these can be run in
parallel, Kakusan can take advantage of multi-core systems or multiple
processor systems. The Kakusan Perl script can be run on Windows,
MacOS X, Linux, FreeBSD and on other UNIX operating systems. It accepts
several different input file formats.
It outputs configuration files for Treefinder, MrBayes and PAUP*.
It is described in the paper:
Tanabe, A. S. 2007. Kakusan: a computer program to automate the selection of
a nucleotide substitution model and the configuration of a mixed model on
multilocus data. Molecular Ecology Notes 7: 962-964.
It is available as a Perl script, Windows executables and Mac OS X universal executables. It can be downloaded from
its web site at http://www.fifthdimension.jp/products/kakusan/. Earlier
versions, Kakusan, Kakusan2, and Kakusan3, can also be downloaded there.
Jonathan Bollback
of the University of Edinburgh, Edinburgh, U.K., and of the Institute of Science and Technology, Austria (j.p.bollback (at) ed.ac.uk) has written MAPPS (Model Adequacy in Phylogenetics by Predictive Simulation) version 1.1.6, a program to evaluate the fit of a group of phylogenetic models to DNA sequence data. The rationale behind this approach is that an adequate model should be able to predict future data (nucleotide site patterns). In the absence of future data the model's predictive ability is compared to the original data set. The model's predictive ability is evaluated through simulation under the model. Comparison of simulated (or predictive) data sets is evaluated using the multinomial test statistic. The program uses data and trees in a format compatible with the output from MrBayes. It is described in the paper: Bollback, J. P. 2002. Bayesian model adequacy and choice in phylogenetics. Molecular Biology and Evolution 19(7): 1171-1180. It is available as Mac OS X universal executables. It can be downloaded from its web site at http://www.simmap.com/bollback/software.html
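The multinomial test statistic used for such comparisons can be computed from the counts of distinct site patterns in an alignment. The sketch below assumes the usual form of the statistic (the sum over patterns of count times log count, minus N log N for N sites); it is an illustration and not code from MAPPS, and the function names and toy alignments are hypothetical.

```python
import math
from collections import Counter


def multinomial_statistic(alignment):
    """Multinomial test statistic of an alignment given as equal-length strings."""
    n_sites = len(alignment[0])
    if n_sites == 0 or any(len(seq) != n_sites for seq in alignment):
        raise ValueError("sequences must be non-empty and of equal length")
    # A site pattern is the column of characters observed at one site.
    patterns = Counter(tuple(seq[i] for seq in alignment) for i in range(n_sites))
    return sum(c * math.log(c) for c in patterns.values()) - n_sites * math.log(n_sites)


def predictive_p_value(observed, simulated):
    """Fraction of simulated statistics at least as large as the observed one."""
    return sum(1 for s in simulated if s >= observed) / len(simulated)


observed = multinomial_statistic(["ACGTAC", "ACGTAC", "ACGAAC"])
# In a real analysis the simulated alignments would be generated under the model.
simulated = [multinomial_statistic(a) for a in (["AAAAAA", "AAAAAA", "AAAAAA"],
                                                ["ACGTTG", "ACGTTG", "ACCTTG"])]
print(observed, predictive_p_value(observed, simulated))
```

An alignment in which every site shows the same pattern gives the maximum value of zero, while an alignment in which every site pattern is unique gives the minimum.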
Hidetoshi Shimodaira ("Shimo")
of the Department of Mathematical and Computing Sciences, Tokyo Institute of Technology, Japan (shimo (at) is.titech.ac.jp) has released CONSEL version 0.1k, a package of small programs to calculate P values for tests of phylogenies. It uses output from other phylogeny programs (in particular it can use output from PAUP, PAML, PHYML, and MOLPHY) which makes available to it the sitewise log-likelihoods for some trees and the trees themselves. It uses these to carry out the Kishino-Hasegawa test, the Shimodaira-Hasegawa test, a weighted version of the SH test, and a new "approximately unbiased" test of Shimodaira's. CONSEL is available as C source code that will compile on Linux and Unix systems that have the gcc compiler, and it is also available as a DOS executable that will run on DOS or Windows systems. It can be downloaded from its web site at http://www.ism.ac.jp/~shimo/prog/consel/index.html.
It is described in a paper: Shimodaira, H. and M. Hasegawa. 2001.
CONSEL: for assessing the confidence of phylogenetic tree selection.
Bioinformatics 17: 1246-1247 which cites the statistical
papers describing the methods.
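To show how sitewise log-likelihoods enter such tests, here is a sketch of the normal-approximation form of the Kishino-Hasegawa test for two trees. It is not CONSEL's implementation, which relies on resampling; the function name and the numbers are hypothetical, and the test is only valid when the two trees were chosen before looking at the data.

```python
import math


def kh_test(site_lnl_a, site_lnl_b):
    """Return (z, two-sided p) for the difference in total log-likelihood."""
    if len(site_lnl_a) != len(site_lnl_b):
        raise ValueError("both trees need log-likelihoods for the same sites")
    n = len(site_lnl_a)
    if n < 2:
        raise ValueError("at least two sites are required")
    diffs = [a - b for a, b in zip(site_lnl_a, site_lnl_b)]
    total = sum(diffs)
    mean = total / n
    # Estimated variance of the summed difference across sites.
    var_total = n / (n - 1) * sum((d - mean) ** 2 for d in diffs)
    if var_total == 0.0:
        raise ValueError("sitewise differences show no variation")
    z = total / math.sqrt(var_total)
    return z, math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical sitewise log-likelihoods for two candidate trees.
tree_a = [-1.0, -2.0, -1.5, -2.5]
tree_b = [-1.2, -1.9, -1.9, -2.4]
print(kh_test(tree_a, tree_b))
```

A small p-value would suggest that one tree fits significantly better than the other; with only four sites, as in this toy input, no such conclusion could be drawn.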
Hidetoshi Shimodaira
http://www.is.titech.ac.jp/~shimo/prog/scaleboot/index.html
Maria Anisimova, Olivier Gascuel, and Jean-François Dufayard
http://atgc.lirmm.fr/phyml/alrt/
This program was of temporary usefulness; the
method was made available in PHYML 3.0 and should probably be used from that
program, although these executables are still available for download.
Nick Grassly
http://microbe.bio.indiana.edu:7131/soft/iubionew/molbio/dna/analysis/Plato/
Iain Milne, Dominik Lindner, and Frank Wright
http://www.topali.org
Version 1 of TOPALi has been superseded
by version 2 but is also available, at
the version 1 web page at http://www.topali.org/topali-v1/
Kim Fisker, then of the Computer Science Department at Aarhus University, Denmark released RecPars, which does a parsimony analysis of DNA sequences. It was more recently maintained by Thomas Christensen of that department. It tries to find the best phylogenies for different regions of the sequences, thereby postulating a recombination event between these segments. The method is described in a paper: Hein, J. 1993. A heuristic method to reconstruct the history of sequences subject to recombination. Journal of Molecular Evolution 36: 396-406. RecPars is available as C source code for Unix. It is distributed from its web site at http://www.daimi.au.dk/~compbio/recpars/recpars.html.
A web server is available there as well.
Dan Gusfield (gusfield (at) cs.ucdavis.edu) and Ren-Hua Chung
(rchung (at) ucdavis.edu), both of the Department of Computer Science
at the University of California, Davis, have released PPH
(Perfect Phylogeny Haplotyper). PPH takes a set of diploid genotypes for SNP
(single nucleotide polymorphism) markers, and infers haplotypes for them. It
does this by seeing whether it can find a set of haplotypes that resolve all
diploid genotypes and that fit onto a tree without requiring any extra changes
of nucleotides (in other words, they are all compatible with the same tree).
The result is not only the haplotype resolution but the resulting tree, if any.
The method is described in a paper: Gusfield, D. 2002. Haplotyping as perfect
phylogeny: conceptual framework and efficient solutions, pp. 165-175 in
Proceedings of RECOMB 2002, edited by G. Myers, S. Hannenhalli,
D. Sankoff, S. Istrail, P. Pevzner et al. ACM Press, New York. The program is
available as C++ and Perl source code, and as executables for Windows, for
SUN SPARC Solaris, for Intel/AMD-compatible Linux, and for Mac OS X from
its web site at http://wwwcsif.cs.ucdavis.edu/~gusfield/pph.html.
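The compatibility condition described in the PPH entry can be checked for a set of already-resolved haplotypes with the classical four-gamete test: binary haplotypes fit a tree without extra changes exactly when no pair of sites shows all four combinations 00, 01, 10, and 11. The sketch below covers only that check and assumes haplotypes coded as strings of 0 and 1; it is not the PPH algorithm, which solves the harder problem of choosing haplotypes for unphased genotypes, and the function name is hypothetical.

```python
from itertools import combinations


def admits_perfect_phylogeny(haplotypes):
    """Four-gamete test on equal-length binary haplotype strings."""
    n_sites = len(haplotypes[0])
    if any(len(h) != n_sites for h in haplotypes):
        raise ValueError("haplotypes must have equal length")
    if any(ch not in "01" for h in haplotypes for ch in h):
        raise ValueError("haplotypes must be coded with 0 and 1 only")
    for i, j in combinations(range(n_sites), 2):
        # Collect the distinct two-site combinations seen at sites i and j.
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            return False
    return True


print(admits_perfect_phylogeny(["000", "010", "110", "001"]))
print(admits_perfect_phylogeny(["00", "01", "10", "11"]))
```

The first call reports compatible sites and the second does not, because the two sites in the second example show all four combinations.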
Marc Suchard and Vladimir Minin
http://www.biomath.ucla.edu/msuchard/DualBrothers/
Karin Dorman
http://rumi.gdcb.iastate.edu/software/index.xml
Simone Linz, Achim Radtke, and Arndt von Haeseler
http://www.cibiv.at/software/hgt/
Darren P. Martin and Ed Rybicki
http://darwin.uvigo.es/rdp/rdp.html
An older
version, RDP2, is also available there, as is an "unstable" early release of
RDP4.
Robert Beiko and Nicholas Hamilton
http://bioinformatics.org.au/eeep
Gary Olsen
of the Department of Microbiology, University of Illinois, Urbana, Illinois (gary (at) phylo.life.uiuc.edu) has written dnarates version 1.1.0. It reads a set of DNA sequences and a tree, and for that tree makes a maximum likelihood estimate of the rate of evolution at each site. This is done by taking the rate at each site as a separate parameter and maximizing the likelihood with respect to all those parameters. The program is available as generic C source code. It is based in part (with my permission) on code from my PHYLIP program DNAML. dnarates is available from the IUBIO phylogeny software page at http://iubio.bio.indiana.edu/soft/molbio/evolve/
Bette Korber
of the Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico (btk (at) t10.lanl.gov) and her colleagues have released RevDNArates, which is a version of Gary Olsen's program dnarates which uses the REV (general reversible) model of DNA evolution and calculates the maximum likelihood estimate of rate of change at each site (one parameter per site). They used it for the results in the paper: B. Korber, M. Muldoon, J. Theiler, F. Gao, R. Gupta, A. Lapedes, B. H. Hahn, S. Wolinsky and T. Bhattacharya. 2000. Timing the ancestor of the HIV-1 pandemic strains. Science 288: 1789-1796. The program is available as C source code for Unix from the web site for the programs from that paper at http://www.santafe.edu/~btk/science-paper/bette.html.
Sonja Meyer and Arndt von Haeseler, then of the Institut
für Bioinformatik, Heinrich Heine Universität, Düsseldorf,
Germany (von Haeseler is now at the Center for Integrative Bioinformatics
Vienna, and his email address is arndt.von.haeseler (at)
univie.ac.at) have released PARAT,
version 0.9.1. This program infers a phylogeny and also site-specific
evolutionary rates (one for each site). It can do so for up to 100 sequences
directly. Above 100 sequences, it samples sets of sequences and estimates
the rates from each such set, and then averages the resulting rates.
It is distributed as open source C source code, which can readily be compiled
and installed. PARAT is described in a paper: Meyer, S. and A. von Haeseler.
2003. Identifying site specific substitution rates. Molecular Biology
and Evolution 20: 182-189. It is available at
its web site at http://www.cibiv.at/software/parat/
Itay Mayrose
http://www.tau.ac.il/~itaymay/cp/rate4site.html
Itay Mayrose and Tal Pupko
http://www.tau.ac.il/~talp/MCMC/McRate.html
Jianzhi George Zhang, now of the Laboratory of Genomic and Molecular Evolution
in the Department of Ecology and Evolutionary Biology of the University of
Michigan, Ann Arbor, Michigan
(jianzhi (at) umich.edu)
and Xun Gu, now at the Department of Genetics, Development, and Cell Biology at
Iowa State University, Ames, Iowa
(xgu (at) iastate.edu)
wrote GZ-Gamma, a program
for estimation of the expected number of substitutions at each amino acid
or nucleotide site and the shape parameter of a Gamma distribution of rates of
evolution at different sites. The program
takes a phylogeny and infers the sequences at interior nodes of the tree using
a Bayesian method, and then uses these to infer changes and make a histogram
of changes among sites, then using that to infer the shape parameter of a
Gamma distribution that fits that histogram. The method and
program were described in a paper:
Gu, X. and J. Zhang. 1997. A simple method for estimating the parameter of
substitution rate variation among sites. Molecular Biology and Evolution
14: 1106-1113. It is available as C source code and as MSDOS
executables from the software web site of Masatoshi Nei's lab in
which the work was done. A zip archive of the files can be downloaded from
the link there for “Gamma”. A documentation file is also available
there from the “readme” link.
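The last step described above, going from per-site counts of changes to a Gamma shape parameter, can be illustrated with a method-of-moments sketch. It assumes that the number of changes at a site is Poisson given the site's rate, so that with Gamma-distributed rates the counts have mean m and variance m + m²/alpha. This is not necessarily the estimator that GZ-Gamma uses; the function name and the counts are invented for illustration.

```python
from statistics import mean, pvariance

def gamma_shape_moments(changes_per_site):
    """Method-of-moments estimate of the Gamma shape parameter alpha.

    Assumes counts with mean m and variance m + m*m/alpha, which gives
    alpha = m*m / (variance - m). Hypothetical helper, not GZ-Gamma code.
    """
    m = mean(changes_per_site)
    v = pvariance(changes_per_site)
    if v <= m:
        # No excess variance over Poisson: no evidence of rate variation.
        return float("inf")
    return m * m / (v - m)

# Invented counts of inferred changes at ten sites.
counts = [0, 0, 1, 0, 3, 0, 5, 1, 0, 2]
print(gamma_shape_moments(counts))
```

A small alpha indicates strong rate variation among sites; an infinite value is returned when the counts are no more variable than a single Poisson rate would predict.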
Jessica Leigh, Ed Susko, Manuela Bumgartner, and Andrew Roger
http://rogerlab.biochemistryandmolecularbiology.dal.ca/Software/Software.htm#Concaterpillar
Haichun Wang, Matthew Spencer, Ed Susko, and Andrew Roger
http://www.mathstat.dal.ca/~hcwang/procov.html
Nick Goldman
(goldman (at) ebi.ac.uk) of the European Bioinformatics Institute, Hinxton, UK and his group have produced EDIBLE, a program for Experimental Design and Information By Likelihood Exploration, version 1.00. It allows the user to read in a phylogeny, explore the effect on the likelihood and on the information matrix (the second derivatives of the likelihood with respect to the parameters) and measures of overall information of changing branch lengths in the tree and moving branch lengths around. It also can carry out simulations, producing multiple data sets on the tree in question. The program is described in two papers:http://www.ebi.ac.uk/goldman/info/edible.html
at the EBI site.
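To make the idea of information as a second derivative of the likelihood concrete, here is a minimal one-parameter sketch. It assumes two sequences under the Jukes-Cantor model, with a single branch length t as the only parameter, and approximates the second derivative numerically. EDIBLE itself works with whole trees and a full matrix of parameters; all function names here are hypothetical.

```python
import math

def p_diff(t):
    """Jukes-Cantor probability that a site differs after branch length t."""
    return 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))

def log_lik(t, n_sites, n_diff):
    """Log-likelihood of t given n_diff differing sites out of n_sites."""
    p = p_diff(t)
    return n_diff * math.log(p) + (n_sites - n_diff) * math.log(1.0 - p)

def observed_information(t, n_sites, n_diff, h=1e-4):
    """Minus the second derivative of the log-likelihood (central difference)."""
    f = lambda x: log_lik(x, n_sites, n_diff)
    return -(f(t + h) - 2.0 * f(t) + f(t - h)) / (h * h)

def jc_mle(n_sites, n_diff):
    """Maximum likelihood branch length under Jukes-Cantor."""
    return -0.75 * math.log(1.0 - 4.0 * n_diff / (3.0 * n_sites))

t_hat = jc_mle(100, 20)
print(t_hat, observed_information(t_hat, 100, 20))
```

Larger information means a more sharply peaked likelihood, and so a more precise estimate of the branch length.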
Bret Larget, of the Departments of Statistics and Botany at the University of Wisconsin, Madison (larget (at) stat.wisc.edu) and Donald Simon
of the Department of Mathematics and Computer Science, Duquesne University, Pittsburgh, Pennsylvania (simon (at) mathcs.duq.edu) have written BAMBE (Bayesian Analysis in Molecular Biology and Evolution) version 4.01a, a program for Bayesian analysis of phylogenies with DNA sequence data. It uses a prior distribution of trees and a rearrangement mechanism introduced in the paper: Mau, B., M. A. Newton, and B. Larget. 1997. Bayesian phylogenetic inference via Markov chain Monte Carlo methods. Molecular Biology and Evolution 14: 717-724. The trees and parameter values are sampled by Markov chain Monte Carlo using a Metropolis algorithm. The resulting posterior distribution can be used to characterize the uncertainty about not only the tree, but the parameters of the substitution model as well. The program is in C++ source code for Unix, and is distributed from his web site at http://www.stat.wisc.edu/~larget/
. A Windows executable
of an earlier version is also available there. The 2.03 and earlier
versions are also available at a web page at Duquesne
University. BAMBE is also
available as a web server
at the Institut Pasteur in Paris.
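For readers unfamiliar with the Metropolis algorithm, the following is a generic sketch of it for a single continuous parameter. It is not BAMBE's code: BAMBE proposes changes to trees and to substitution model parameters, which is not shown. The function names, the step size, and the toy target density are all assumptions made for illustration.

```python
import math
import random

def metropolis(log_post, start, n_steps, step=1.0, seed=1):
    """Sample from a density known only up to a constant.

    log_post: function returning the log of the unnormalized posterior.
    A symmetric uniform proposal is used, so the acceptance probability
    is min(1, posterior(new) / posterior(old)).
    """
    rng = random.Random(seed)
    x, lp = start, log_post(start)
    samples = []
    for _ in range(n_steps):
        y = x + rng.uniform(-step, step)
        lq = log_post(y)
        if rng.random() < math.exp(min(0.0, lq - lp)):
            x, lp = y, lq          # accept the proposal
        samples.append(x)          # otherwise keep the current value
    return samples

# Toy target: a standard normal density (log density up to a constant).
draws = metropolis(lambda v: -0.5 * v * v, 0.0, 20000)
print(sum(draws) / len(draws))
```

The sampled values approximate the posterior distribution, so summaries such as means or intervals can be read off the sample directly.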
Mark Pagel and Andrew Meade
http://www.evolution.rdg.ac.uk/BayesPhy.html
Nicolas Lartillot
http://www.atgc-montpellier.fr/phylobayes/binaries.php
A server is here
John Huelsenbeck
http://mrbayes.net
.
Torsten Eriksson
of the Bergius Botanical Garden, Stockholm, Sweden (torsten (at) bergianska.se
)
makes available MrBayes tree scanners. These are
two Perl scripts that scan the output parameter files produced by MrBayes.
One saves the tree corresponding to the best sample. The other saves all
trees that contain a specific node (a specific grouping). They are
distributed together, and available from
his software distribution site
at http://www.bergianska.se/index_forskning_soft.html
.
Marc Suchard
http://www.biomath.ucla.edu/msuchard/software/software.htm
Alexei Drummond, of the Department of
Computer Science of the University of Auckland, New Zealand (alexei (at) cs.auckland.ac.nz)
and Andrew Rambaut
(a.rambaut (at) ed.ac.uk), of
the Institute for Evolutionary Biology, University of Edinburgh, Scotland, and
formerly of the Department of Zoology, University of Oxford,
Oxford, U.K., have developed BEAST (Bayesian Evolutionary
Analysis Sampling Trees), version 1.4.1. This is a general Bayesian
inference program for parameters of evolutionary models when the trees
are coalescent trees. A variety of nucleotide substitution models
including relaxed molecular clocks are allowed, and population models that
include exponential population growth and divergence time between populations
are included. Most of the analyses use Bayesian sampling to infer
parameters by averaging over the posterior on the trees. For the purposes
of this listing, the two relevant features are the ability to output a
sample of the trees, so that the program can be used for Bayesian tree
inference in clocklike models, and the ability to infer the divergence time
between populations. The general approach used by BEAST is described
in the paper: Drummond, A. J., G. K. Nicholls, A. G. Rodrigo, and W. Solomon.
2002. Estimating mutation parameters, population history and genealogy
simultaneously from temporally spaced sequence data. Genetics
161: 1307-1320. BEAST is available as a Java executable which will
run on any system with Java 1.4 or later. There are specific packages
available for Mac OS X and for Windows as well as the general distribution.
These are all distributed from
its web site at http://beast.bio.ed.ac.uk/Main_Page
Alexei Drummond, of the Department of
Computer Science of the University of Auckland, New Zealand
(alexei (at) cs.auckland.ac.nz)
and Andrew Rambaut
(a.rambaut (at) ed.ac.uk), of the
Institute of Evolutionary Biology at the University of Edinburgh, Scotland, U.K.
have released Tracer, version 1.2. This is
a program for analyzing the results of Bayesian sampling runs using either
BEAST or MrBayes. It allows
analysis of the progress of sampling the parameters. For the purposes of
this listing, the relevant feature is an ability to use the trees sampled
by these programs to do a Bayesian skyline plot analysis of birth and death
rates of lineages. Tracer is available as a Java executable from its web site
at http://tree.bio.ed.ac.uk/software/tracer/
with specific packages for Mac OS X and Windows as well.
Johan Nylander
http://www.abc.se/~nylander
Dan Rabosky
http://www.eeb.cornell.edu/Rabosky/dan/software.html
and is
distributed at
its web page in the CRAN-R archive of R packages at http://cran.r-project.org/web/packages/laser/index.html
Pavel Morozov and Andrey Rzhetsky
http://amdec-bioinfo.cu-genome.org/html/misc/Pavel/phyllab.html
Peter Foster (p.foster (at) nhm.ac.uk) of the Natural History Museum,
London, England has released p4 version 0.81, a Python
package for maximum likelihood and Bayesian phylogenetic analyses of molecular
sequences. This is not a program with menus and buttons; it is invoked
using the Python language, which the user should know before attempting to
use it. It can do Bayesian inference of phylogenies, as well as computation
of likelihoods of trees. It also has facilities for viewing large trees and for
manipulation of trees. It needs Python 2.3 or better and the GNU Scientific
Library (GSL) installed on the machine. It is distributed as Python source code
at its web site at
http://www.bmnh.org/web_users/pf/p4.html
Mike Charleston
(mcharles (at) it.usyd.edu.au
)
of the Sydney University Biological Informatics and Technology Centre,
Sydney, Australia
has developed Spectrum, a program for finding bipartition spectra
from phylogenetic
molecular and distance data, according to the method of Hendy et al.
(1994) (Hadamard transforms)
for moderately sized data sets (up to 18 taxa). The program also
implements a
branch-and-bound search for the "closest tree" - that is, the tree whose
expected spectrum is closest to the spectrum derived from the observed
data. Mac OS PowerMac, 68k Mac OS, and Windows executables are
available from his software
web site
at http://www.it.usyd.edu.au/~mcharles/
.
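The Hadamard transforms mentioned above can be computed quickly with the standard fast Walsh-Hadamard transform, sketched here. This shows only the transform itself; the full method also involves taking logarithms and exponentials of the transformed vectors and indexing the entries by bipartitions, which is not shown. The function name is hypothetical and this is not code from Spectrum.

```python
def fwht(values):
    """Fast Walsh-Hadamard transform of a list whose length is a power of 2.

    Returns H * values, where H is the (unnormalized) Hadamard matrix.
    Applying the transform twice returns the input multiplied by its length.
    """
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y
        h *= 2
    return a

print(fwht([3, 1, 4, 1]))
```

Because the vectors involved grow exponentially with the number of taxa, the method is practical only for moderately sized data sets, which is consistent with the limit on taxa noted above.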
Ingrid Jakobsen
, Susan Wilson, and Simon Easteal, of Australian National University, Canberra, released partimatrix. (Ingrid Jakobsen is currently at the Department of Mathematics of the University of Queensland, Australia, i.jakobsen (at) uq.edu.au). This program computes a "partition matrix" from aligned DNA sequence data. The method finds partitions of the sequences into two groups and presents a matrix which describes the conflict and agreement among these partitions. The objective is to discover parts of the DNA sequence which imply different trees. It is described in the paper by I. B. Jakobsen, S. R. Wilson and S. Easteal. 1997. The Partition Matrix: Exploring variable phylogenetic signals along nucleotide sequence alignments. Molecular Biology and Evolution 14: 474-484. The program is distributed as C source code for Unix systems with X Windows. It seems not to be available from Dr. Jakobsen, but is available from a site at the Centro Nacional de Cálculo Científico de la Universidad de Los Andes, Venezuela at http://www.cecalc.ula.ve/BIOINFO/servicios/herr1/PARTIMATRIX/manual.htm
Carla Cummins and James McInerney
http://bioinf.nuim.ie/tiger
Yasuo Ina
of the National Institute of Agrobiological Resources, Tsukuba, Japan developed ODEN version, a package of programs for doing distance matrix analyses on nucleotide or protein sequences. It is described in a paper: Ina, Y. 1994. ODEN: a program package for molecular evolutionary analysis and database search of DNA and amino acid sequences. Computer Applications in the Biosciences (CABIOS) 10: 11-12. It is available free by anonymous ftp from directory pub/unix/oden
on ftp.dna.affrc.go.jp
as C source code for Unix systems.
Angela Lüttke and Rainer Fuchs
(then of the European Molecular Biology Laboratory; Fuchs is currently at Biogen, Inc., Cambridge, Massachusetts) wrote MacT, a package of programs for Mac OS Macintoshes that compute distances and compute Neighbor-Joining phylogenies for them. The programs work on 4 through 26 sequences, and source code in Microsoft QuickBasic is provided as well as compiled executables. The package is free and is available on the molecular biology software servers. For example, it is available at the Indiana University IUBIO server at http://iubio.bio.indiana.edu/soft/molbio/mac/
.
It is described in a paper: Luttke, A. and R. Fuchs. 1992. MacT: Apple Macintosh
programs for constructing phylogenetic trees. Computer Applications in
the Biosciences 8: 591-594.
Nicholas Galtier
of the University of Lyon (galtier (at) biomserv.univ-lyon1.fr
)
has written Phylo_win, a "graphic interface" for molecular
phylogenetic inference. It performs neighbor-joining, parsimony and
maximum likelihood methods and can bootstrap with any of them. Many distances
can be used including Jukes and Cantor, Kimura, Tajima and Nei, Galtier and Gouy
(1995), LogDet for nucleotide sequences, Poisson correction for protein
sequences, Ka and Ks for codon sequences. Species and sites to include in the
analysis are selected by mouse. Reconstructed trees can be drawn, edited,
printed, stored, evaluated according to numerous criteria.
Taxonomic species groups and sets of conserved regions can be defined by
mouse in both tools and stored into sequence files, thus avoiding multiple
data files. It is entirely mouse-driven. Most usual sequence file formats are
read: CLUSTAL, FASTA, PHYLIP, MASE. It runs under X windows on many Unix
workstations.
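As an illustration of the simplest of the distances listed above, here is a minimal sketch of the Jukes-Cantor distance between two aligned nucleotide sequences. It is not taken from Phylo_win; the function name and the choice to skip positions containing gaps are assumptions.

```python
import math

def jukes_cantor_distance(seq1, seq2):
    """Jukes-Cantor distance d = -(3/4) ln(1 - 4p/3).

    p is the proportion of differing sites among positions where
    neither sequence has a gap character ('-').
    """
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        return float("inf")   # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

print(jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAA"))
```

The corrected distance is always at least as large as the raw proportion of differences, since it allows for multiple changes at the same site.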
It is described in the paper:
Galtier, N., M. Gouy, and C. Gautier. 1996. SeaView and Phylo_win, two graphic
tools for sequence alignment and molecular phylogeny. Computer Applications
in the Biosciences 12: 543-548.
Phylo_win is now considered by Galtier to have been superseded by his
program SeaView. Phylo_win is distributed as C source code (to compile it one
needs the NCBI Vibrant tool kit). It is also available
as executables for SunOS, Solaris, SGI Unix,
IBM RISC Unix, Linux, HP/UX, and DEC Alpha (Digital Unix). It can be
fetched from
its web page at http://pbil.univ-lyon1.fr/software/phylowin_legacy.html
.
It can also be obtained by anonymous ftp from
biom3.univ-lyon1.fr
in directory pub/mol_phylogeny
.
A Digital OpenVMS executable is
also available
at http://www.tmk.com/ftp/vms-freeware/mathog/
.
Heiko Schmidt, of the Center for Integrative Bioinformatics of the University of Vienna (heiko.schmidt (at) univie.ac.at), Korbinian Strimmer
, now at the Department of Statistics of the University of Munich, Germany (korbinian.strimmer (at) lmu.de), and Arndt von Haeseler, now at the Center for Integrative Bioinformatics Vienna (arndt.von.haeseler (at) univie.ac.at) have developed TREE-PUZZLE version 5.2 (formerly called PUZZLE), a program for maximum likelihood analysis for nucleotide and amino acid alignments. TREE-PUZZLE infers phylogenies by "quartet puzzling", a method that applies maximum likelihood tree reconstruction to all possible quartets of taxa and subsequently tries to combine most of the four-taxa maximum likelihood trees to construct an overall maximum likelihood tree. Usually there are several possible solutions. A consensus tree generated from the quartet puzzling trees shows nodes that are well supported. More details about the algorithm and on the phylogenetic accuracy can be found in the papers:
Mike Holder
, formerly of the High Performance Computing Center of the University of Houston and Andrew Roger (aroger (at) is.dal.ca
) of the Department of
Biochemistry and Molecular Biology of Dalhousie University, Halifax, Canada
have produced a shell script program for
Unix systems, puzzleboot, version 1.03, that allows the
analysis of multiple bootstrapped data sets with
TREE-PUZZLE. It is
designed for use with the distance matrix option of TREE-PUZZLE, to make use of
the distance calculation methods.
It is available from
the Roger lab software page at
http://rogerlab.biochemistryandmolecularbiology.dal.ca/Software/Software.htm#puzzleboot
Daniel Huson (huson (at) informatik.uni-tuebingen.de) of the ZBIT Center for Bioinformatics at the University of Tübingen, Germany and David Bryant (Bryant (at) math.auckland.ac.nz) of the University of Auckland, New Zealand, distribute a program SplitsTree for analysis of conflicts among splits implied by different quartets or different characters. It provides a number of methods for computing split networks from sequences (e.g. median networks), distances (e.g. split decomposition or neighbor-net) and trees (consensus networks and super-networks). Additionally, it contains simple combinatorial methods for computing hybridization networks and recombination networks. It can process sequence or restriction site data, and can do bootstrapping. It is discussed in the papers:
http://www.splitstree.org
.
These include
Igor Kuznetsov and Pavel Morozov
, then of the Institute of Cytology and Genetics, Novosibirsk, Russia produced GEOMETRY, a package for nucleotide sequence analysis using the method of statistical geometry in sequence space. Kuznetsov (ikuznetsov (at) albany.edu) is currently at the Department of Epidemiology and Biostatistics at the State University of New York in Albany, Morozov (pm259 (at) columbia.edu) is currently at the Irving Cancer Research Center at Columbia University. The method is described in this paper: Eigen, M., R. Winkler-Oswatitsch, and A. Dress. 1988. Statistical geometry in sequence space: A method of quantitative comparative sequence analysis, Proc. Natl. Acad. Sci. USA 85: 5913-5917. The program is described in the article: Kuznetsov, I. and P. Morozov. 1996. GEOMETRY: a software package for nucleotide sequence analysis using statistical geometry in sequence space. Computer Applications in the Biosciences (CABIOS) 12: 297-301. The package uses the same data formats for sequence and tree input as the ones used in the VOSTORG package. GEOMETRY is available as a DOS executable. It is available for downloading by ftp from the EMBL file server ftp.ebi.ac.uk
in directory
pub/software/dos
as file geom.zip.
Vincent Berry
of the LIRMM, Université de Montpellier, France (vberry (at) lirmm.fr) has released PhyloQuart version 1.4, a package of programs inferring phylogenies from quartets. It is able to use either nucleotide sequences or distances. It implements the Q* method of tree reconstruction, which is inspired by the work of Bandelt and Dress, and is described in the paper: Berry, V. and O. Gascuel. 2000. Inferring evolutionary trees with strong combinatorial evidence. Theoretical Computer Science 240: 271-298. PhyloQuart is available as C source code which can be compiled on Unix systems, from its web site at http://www.lirmm.fr/~vberry/PHYLOQUART/phyloquart.html
.
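Quartet methods start by choosing, for each set of four taxa, one of the three possible unrooted topologies. One simple way to do this from distances is the four-point condition, sketched below. This is an assumption made for illustration: it is not the Q* method itself, which is concerned with how the resolved quartets are combined, and the function name and the distances are invented.

```python
from itertools import combinations

def resolve_quartet(dist, a, b, c, d):
    """Pick the split of {a, b, c, d} with the smallest sum of
    within-pair distances (the four-point condition)."""
    options = [
        (dist[a][b] + dist[c][d], ((a, b), (c, d))),
        (dist[a][c] + dist[b][d], ((a, c), (b, d))),
        (dist[a][d] + dist[b][c], ((a, d), (b, c))),
    ]
    return min(options)[1]

# Invented additive distances for the tree ((A,B),C,(D,E)),
# with every branch of length 1.
pairs = {("A", "B"): 2, ("A", "C"): 3, ("A", "D"): 4, ("A", "E"): 4,
         ("B", "C"): 3, ("B", "D"): 4, ("B", "E"): 4,
         ("C", "D"): 3, ("C", "E"): 3, ("D", "E"): 2}
dist = {t: {} for t in "ABCDE"}
for (x, y), v in pairs.items():
    dist[x][y] = dist[y][x] = v

# Resolve every set of four taxa.
quartets = {q: resolve_quartet(dist, *q) for q in combinations("ABCDE", 4)}
for q, split in quartets.items():
    print(q, split)
```

With n taxa there are n(n-1)(n-2)(n-3)/24 quartets, so the number of quartets to resolve grows rapidly with the size of the data set.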
http://www.cibiv.at/software/iqpnni/
Stephen J. Willson
(swillson (at) iastate.edu) of the Department of Mathematics, Iowa State University, has produced a package of programs to infer phylogenies from quartets of species. They infer phylogenies of individual quartets by parsimony, and in combining them use information on how strongly the phylogeny for that quartet is preferred over its alternatives, or by measures of how well the group fits into a given placement on a tree, as judged by quartets. The methods are described in two papers:
James Lake
of the Department of Molecular, Cell and Developmental Biology of the University of California, Los Angeles (lake (at) mbi.ucla.edu) has released Gambit, which implements a method called Bootstrapper's Gambit. The method involves bootstrap sampling sequences, computing trees for quartets of species, and assembling larger trees out of quartets that have significant bootstrap support. One of the methods available to estimate trees from the quartets is paralinear (LogDet) distances. Other distance methods and parsimony are also available. The Bootstrapper's Gambit method is described in a paper: Lake, J. A. 1995. Calculating the probability of multitaxon evolutionary trees: Bootstrappers gambit. Proceedings of the National Academy of Sciences, USA 92: 9662-9666. The program is available as a DOS executable, free as a beta release to noncommercial users on a trial basis until January 1, 2003. (It is unclear from the web site whether a free version is to be available to noncommercial users after that point -- a previous deadline was extended). Commercial users are asked to pay $50 on a shareware basis. The program is available at its web site at http://genomics.ucla.edu/gambit/
.
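The bootstrap sampling step can be sketched as follows, on the assumption that it means the usual resampling of alignment sites (columns) with replacement. The function name and the toy alignment are hypothetical, and this is not code from Gambit.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    alignment: dict mapping sequence name to an aligned sequence string.
    Sites are drawn with replacement, and the same sites are used for
    every sequence so that columns stay intact.
    """
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in cols)
            for name, seq in alignment.items()}

aln = {"a": "ACGT", "b": "AGGT", "c": "TCGA"}   # invented data
replicate = bootstrap_alignment(aln, random.Random(0))
print(replicate)
```

Each replicate has the same number of sites as the original data; a tree (or a set of quartet trees) is then estimated from each replicate, and the support for a grouping is the fraction of replicates in which it appears.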
Arne Röhl, Peter Forster, and Hans-Jürgen Bandelt
(Forster has more recently been Senior Lecturer in Forensic Science in the Faculty of Science and Technology of Anglia Ruskin University, Cambridge, U.K., e-mail address pf223 (at) cam.ac.uk, and Bandelt is at the Fachbereich Mathematik, University of Hamburg, Bundesstrasse 55, 20146 Hamburg, Germany, e-mail address bandelt (at) math.uni-hamburg.de) have written Network version 4.516, a program to infer networks (which have more connections than trees) from non-recombining DNA, STR, amino acid, and RFLP data. The networks are either reduced median networks or median-joining networks, methods which are described in the papers: http://www.fluxus-engineering.com/sharenet.htm
.
Mike Hendy
, Katharina T. Huber, Michael Langton, Vincent Moulton, and David Penny have written Spectronet version 1.27, a program that computes a collection of weighted splits or partitions and allows the user to interactively analyze the results with a series of tools. Hendy and Penny are at Massey University, New Zealand (m.hendy (at) massey.ac.nz and d.penny (at) massey.ac.nz), Huber and Moulton are at the School of Computational Science of the University of East Anglia, U.K. (Katharina.Huber (at) cmp.uea.ac.uk and Vincent.Moulton (at) cmp.uea.ac.uk). Spectronet can read molecular sequence or discrete character data, compute splits by Hadamard conjugation or directly, compute and display compatibility matrices of characters, make reduced median networks, and plot networks by making a Lentoplot. Spectronet is described in a paper: Huber, K. T., M. Langton, D. Penny, V. Moulton and M. Hendy. 2002. Spectronet: A package for computing spectra and median networks. Applied Bioinformatics 1: 159-161. It is available as a Windows executable from its web site at http://awcmee.massey.ac.nz/spectronet/index.html
.
Steven Kelk, Leo van Iersel, Judith Keijsper, and Leen Stougie
http://sourceforge.net/projects/level2/
and a general
web page about it is at
its web site
at http://homepages.cwi.nl/~kelk/level2triplets.html
Luay Nakhleh, Derek Ruths, and Cuong Than
http://bioinfo.cs.rice.edu/phylonet/index.html
Guohua Jin and Luay Nakhleh
http://bioinfo.cs.rice.edu/nepal/index.html
Rasmus Nielsen, of the Centre for Bioinformatics of the University of
Copenhagen, Denmark and of the Department of Integrative Biology of the
University of California, Berkeley (rasmus
(at) binf.ku.dk),
Jody Hey, of the Department of Genetics, Rutgers University, Piscataway, New
Jersey
(hey (at) biology.rutgers.edu)
and Sang Chul Choi
have released IMa2, a program that estimates divergence
times between several populations along with the population sizes before and
after divergence, as well as the migration rate between the populations
after divergence. The program uses Markov chain Monte Carlo (MCMC)
coalescent methods. It is described in two papers:
It allows Bayesian inference from a number of loci, each assumed to
be without intra-locus recombination. It can use a DNA mutation model,
a stepwise microsatellite mutation model, or an infinite-sites model.
The program estimates the population sizes and the times of divergence,
each relative to the mutation rate. It can
also estimate growth rates of population sizes after speciation.
IMa2 is distributed as a Windows executable with generic
C source code that can easily be compiled on Unix systems including Mac OS X.
It is available from
its web page
at the Hey lab web site,
http://genfaculty.rutgers.edu/hey/software#IMa2
. Earlier
versions, IMa and IM, are also available there.
Liang Liu
http://www.stat.osu.edu/~dkp/BEST/
Ruchi Chaudhary, Mukul S. Bansal, André Wehe, David
Fernández-Baca, and Oliver Eulenstein
http://genome.cs.iastate.edu/CBL/iGTP/
Andrew Roger
, of the Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada (aroger (at) is.dal.ca) has written ELW (Expected Likelihood Weights), two PERL scripts -- elw.pl and calcwts.pl -- that, together with PAUP* and the PHYLIP program Seqboot can be used to implement the "expected likelihood weights" method of Strimmer and Rambaut, described in the paper by Strimmer, K. and A. Rambaut. 2002. Inferring confidence sets of possibly misspecified gene trees. Proceedings of the Royal Society of London Series B 269: 137-142. It calculates a confidence interval for the maximum likelihood tree using the variation of the likelihoods among bootstrap estimates of the tree. ELW can be downloaded from its entry on Roger's software web page at http://rogerlab.biochemistryandmolecularbiology.dal.ca/Software/Software.htm#elw
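A rough sketch of the expected likelihood weights calculation follows. It assumes that the log-likelihood of every candidate tree has already been computed on every bootstrap replicate (the step for which PAUP* and Seqboot are used above). The function names, the 0.95 level, and the log-likelihood values are invented for illustration; this is not a translation of elw.pl or calcwts.pl.

```python
import math

def expected_likelihood_weights(log_liks):
    """Average, over bootstrap replicates, of each tree's share of the
    total likelihood.

    log_liks[r][i] is the log-likelihood of tree i on replicate r.
    """
    n_trees = len(log_liks[0])
    totals = [0.0] * n_trees
    for row in log_liks:
        best = max(row)                       # subtract to avoid underflow
        rel = [math.exp(x - best) for x in row]
        s = sum(rel)
        for i, v in enumerate(rel):
            totals[i] += v / s
    return [t / len(log_liks) for t in totals]

def confidence_set(weights, level=0.95):
    """Indices of trees, taken in order of decreasing weight, until the
    cumulative weight reaches the requested level."""
    order = sorted(range(len(weights)), key=lambda i: -weights[i])
    chosen, cum = [], 0.0
    for i in order:
        chosen.append(i)
        cum += weights[i]
        if cum >= level:
            break
    return chosen

# Invented log-likelihoods: 2 replicates (rows) by 3 trees (columns).
lnl = [[-100.0, -103.0, -120.0],
       [-101.0, -100.5, -118.0]]
w = expected_likelihood_weights(lnl)
print(w, confidence_set(w))
```

In practice many more bootstrap replicates would be used, and trees left out of the set are the ones that can be rejected at the chosen level.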
Naoko Takezaki
of the Life Science Research Center of Kagawa University, Japan (takezaki (at) med.kagawa-u.ac.jp
)
wrote Lintre (Phylogenetic tests of
the molecular clock and linearized tree), a package of programs for
Sun workstations. The programs include:
http://www.kms.ac.jp/~genomelb/takezaki.eng.html#software
and also at
the Nei lab
software web site at
http://www.bio.psu.edu/people/faculty/nei/software.htm
. They
are also available
by ftp from the IUBio
archive at http://iubio.bio.indiana.edu/soft/molbio/evolve/lintr/
.
Andrew Rambaut
(a.rambaut (at) ed.ac.uk), of
the Institute for Evolutionary Biology, University of Edinburgh, Scotland, and
formerly of the Department of Zoology, University of Oxford,
has written TipDate version 1.2.
TipDate is an application for estimating the rate of molecular evolution
(and hence a time-scale) for a
phylogeny consisting of dated tips. These will most frequently be from viruses
or other
fast-evolving pathogens that have been isolated over a range of dates. The
program can also return
the likelihood for the simple molecular clock model (i.e., assuming that all
sequences are
contemporary), for a model in which rates of change at different times are
drawn from a distribution, or the non-clock model. These are useful for
likelihood ratio tests of the fit of the model to the data.
TipDate is described in a paper: Rambaut, A. 2000. Estimating the rate of
molecular evolution: incorporating non-contemporaneous sequences into maximum
likelihood phylogenies. Bioinformatics 16: 395-399.
TipDate is available as Mac OS executables and as source code for
Linux or Unix from
the IUBIO software site
at
http://microbe.bio.indiana.edu:7131/soft/iubionew/molbio/evolution/evolve/TipDate/
.
It is also available in a web-based server version from the
Pasteur Institute server.
Thomas Wilcox
http://www.zo.utexas.edu/faculty/antisense/DownloadComputerPrograms.html
Jotun Hein
ftp.ebi.ac.uk
in directories pub/software/unix
and pub/software/vms
.
A widely-used multisequence alignment program
that estimates trees as it aligns multiple sequences is ClustalW. Currently it is in version 2.0.12. It is the latest incarnation of the Clustal family of tree-based alignment programs. Clustal was originally written by Des Higgins (now at the Conway Institute, University College Dublin, Ireland) (des.higgins (at) ucd.ie
), and later versions were developed by
Julie Thompson (now at the Institut de Génétique et de Biologie
Moléculaire et Cellulaire at the Université de Strasbourg, France,
julie (at) igbmc.u-strasbg.fr
), Toby Gibson,
(Toby.Gibson (at) embl.de
), and
François Jeanmougin (jeanmougin
(at) igbmc.u-strasbg.fr
) and many
others.
Recent features include the
ability to detect and read different input formats (NBRF/PIR, Fasta,
EMBL/Swissprot), align old alignments, produce phylogenetic trees after
alignment (Neighbor Joining trees with a bootstrap option), write different
alignment formats (Clustal, NBRF/PIR, GCG, PHYLIP) and the presence of
a full command line interface. Clustal exists in two major variants:
http://www.clustal.org
. The downloads of the current version
there are also
available by ftp from the
European Bioinformatics Institute ftp server.
For the older ClustalV, there exists a Macintosh Hypercard stack, ClustToTree, that can
convert its tree files to Newick Standard format (used by many other programs).
ClustToTree is
made available by Kai-Uwe
Fröhlich at the University of Graz, Austria at
http://aaa-proteins.uni-graz.at/Archiv/ClustToTreecomp.html
.
ClustalW is made available on web servers by the Genebee web server at the Belozersky Institute in Moscow, and at the European Bioinformatics Institute.
Cédric Notredame
of the Comparative Bioinformatics Group of the Center for Genomic Regulation (CRG), Barcelona, Spain (cedric.notredame (at) europe.com), Olivier Poirot, Fabrice Armougom, and Sebastien Moretti of the Centre National de la Recherche Scientifique Marseille-Nice Génopole, France have produced T-Coffee (Tree-based Consistency Objective Function For alignmEnt Evaluation), version 8.93. This is a multiple sequence alignment program that aims to improve on ClustalW. It is of the same general approach as ClustalW, a "progressive alignment" method, but it avoids some of the problems with the "greedy" nature of the ClustalW algorithm by taking into account more information about how the sequences all align with each other. T-Coffee is described in the paper: Notredame, C., D. Higgins, and J. Heringa. 2000. T-Coffee: A novel method for multiple sequence alignments. Journal of Molecular Biology 302: 205-217. From the point of view of this listing, the relevant features of T-Coffee are that it makes a "guide tree" and can write that tree out. It also can read in a guide tree supplied by the user. Versions from 2.00 on can align both sequences and structures. T-Coffee is available as Unix source code which can easily be compiled, and as Linux, Mac OS X and Windows binaries. It is available from its web site at http://www.tcoffee.org/Projects_home_page/t_coffee_home_page.html
Ward Wheeler
of the Division of Invertebrate Zoology, American Museum of Natural History, New York (wheeler (at) amnh.org
) and
David Gladstein (gladstein (at) gladstein.org)
have written MALIGN, version 2.7, a parsimony-based alignment program for molecular sequences. It implements the original
suggestion by Sankoff, Morel, and Cedergren (1973) that alignment and
phylogenies could be done at the same time by finding that tree that minimizes
the total alignment score along the tree. Jotun Hein's program TreeAlign
(mentioned above) is another, more approximate but possibly faster, attempt to
implement the Sankoff-Morel-Cedergren suggestion. MALIGN is one of only two
programs to calculate this optimality criterion exactly (Wheeler and
Gladstein's other program POY is the other).
MALIGN is described in a paper: Wheeler, W. C. and D. S. Gladstein. 1994.
MALIGN - A multiple sequence alignment program. Journal of Heredity
85: 417-418. MALIGN is available
from its download web site
at the Program in Scientific Computation of the American Museum of Natural
History at http://research.amnh.org/scicomp/projects/malign.php
.
It is available as C source code and as binaries for
Linux, Windows, Sun Solaris, SGI, and HPUX. The
C source code is distributed in two forms, the ordinary one and a special
version for parallel computation.
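The alignment score that is summed along the branches of the tree is, for any one branch, the cost of a pairwise alignment. A minimal dynamic-programming sketch of that building block is given here. The costs (1 for a mismatch, 2 for a gapped position) and the function name are assumptions for illustration; this is not MALIGN's code, and it does not address the much harder problem of choosing the sequences at interior nodes of the tree.

```python
def alignment_cost(s, t, mismatch=1, gap=2):
    """Minimum cost of aligning strings s and t, where each mismatched
    position costs `mismatch` and each gapped position costs `gap`."""
    prev = [j * gap for j in range(len(t) + 1)]   # aligning "" with t[:j]
    for i in range(1, len(s) + 1):
        cur = [i * gap]                            # aligning s[:i] with ""
        for j in range(1, len(t) + 1):
            sub = prev[j - 1] + (0 if s[i - 1] == t[j - 1] else mismatch)
            cur.append(min(sub, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]

print(alignment_cost("ACGT", "AGT"))
```

For a given tree and given sequences at its nodes, the total score is the sum of such costs over all branches; the tree search looks for the tree on which that total is smallest.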
MiraiBio
, a Hitachi Software company, distributes DNASIS, a general-purpose DNA and protein sequence analysis system, produced by Molecular Biology Insights, Inc. of Cascade, Colorado (but sold through Hitachi). It has many functions including primer design, plasmid maps, contig assembly, alignment, database searching, and many kinds of protein plots. For our purposes what is relevant is the ability to do multiple sequence alignment by the Higgins-Sharp method of progressive sequence alignment (the one used in ClustalV), with one of the results being a UPGMA tree based on pairwise sequence alignment scores. DNASIS is available from MiraiBio as version 3.0 (called DNASIS MAX) Windows executables, including a demo version at its web site at http://www.miraibio.com/dnasis-max/dnasis-max-overview.html
.
Prices are not stated there -- there is an order form that can be sent to
them by email. It was formerly also available from MBI, and at that time
a Windows version cost $1,895 and a Mac OS X version cost $2,995 for a 1-10
user network license.
Karl Nicholas
(karlnicholas (at) hotmail.com
)
with help from Hugh Nicholas (nicholas (at) psc.edu
)
of the National Resource for Biomedical Supercomputing (NRBSC; www.nrbsc.org)
at the Pittsburgh Supercomputing Center has produced GeneDoc,
version 2.6.0.2, a program for the shading and
editing of multiple sequence alignments. It reads .MSF files and Fasta files.
The alignment can be edited by changing the position of residues in the
sequences. GeneDoc includes scoring functions to assist in determining
whether your alignment changes are improving the score. Support for obtaining
a score via sum-of-pairs or by a phylogenetic tree is included. Phylogenetic
trees can be built with either the GUI interface or imported NEXUS or PHYLIP
format tree descriptions. The program runs on Windows
and both 16-bit and 32-bit executables are distributed.
The source code is also available there.
It can be downloaded from its Web site at http://www.nrbsc.org/gfx/genedoc/gddl.htm
A Windows NT version for Digital Alpha processors was formerly available from
Russell Malmberg at the Botany Department of the
University of Georgia but is not currently in distribution.
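A sum-of-pairs score of the kind mentioned above can be sketched as follows. The scoring values (+1 for a match, -1 for a mismatch, -2 for a residue opposite a gap) and the function name are hypothetical; GeneDoc's own scoring functions are not described here and may well differ.

```python
from itertools import combinations

def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Sum, over all columns and all pairs of sequences, of a simple
    pairwise score. Two gaps facing each other score zero."""
    score = 0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue
            if x == "-" or y == "-":
                score += gap
            elif x == y:
                score += match
            else:
                score += mismatch
    return score

print(sum_of_pairs(["AC-T", "ACGT", "AGGT"]))
```

Rescoring after each manual edit shows whether moving a residue or a gap has improved the alignment under the chosen scoring scheme.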
Ward Wheeler
of the Division of Invertebrate Zoology, American Museum of Natural History, New York (wheeler (at) amnh.org
),
David Gladstein (gladstein (at) gladstein.org)
and Jan De Laet of the Royal Belgian Institute of Natural Sciences,
Brussels (jdelaet (at) natuurwetenschappen.be) have written
POY version 4.1.2, a program that approximately implements
David Sankoff's method of
searching for the tree that minimizes a parsimony criterion that includes
penalties for gaps, accomplishing both searching for phylogenies and
alignments. POY has algorithmic improvements by Wheeler and Gladstein that
speed up the algorithm. (Their program MALIGN is the
only program carrying out the full Sankoff proposal).
POY implements two approximate methods, Fixed States Optimization and
Direct Optimization. The methods used are described in three papers:
http://research.amnh.org/scicomp/projects/poy.php
.
Russell Doolittle (rdoolittle (at) ucsd.edu) and Dafei Feng
, of the Section of Molecular Biology of the Division of Biological Sciences of the University of California at San Diego, released ALIGN in 1990. A version for Macintoshes was coded by Peter Markeiwicz. ALIGN implements the "progressive alignment" strategy described in their paper: Feng, D.-F. and R. F. Doolittle. 1987. Progressive sequence alignment as a prerequisite to correct phylogenetic trees. Journal of Molecular Evolution 25: 351-360. This is also the basis for the Clustal family of programs as well as the (formerly distributed) Pileup program in the GCG package. The ALIGN program can align as well as print out a tree (which does not have branch lengths). It uses Doolittle's own formats, and so three other programs are included with ALIGN to convert formats. The programs are distributed by ftp from the EBI ftp software server at ftp.ebi.ac.uk in directory pub/software/mac as file align.hqx. A set of C source programs presumably equivalent to these is also made available by Milton Saier at UCSD on a web page at http://www-biology.ucsd.edu/~msaier/transport/software.html
.
Roland Fleißner
http://www.cibiv.at/software/alifritz/
Ben
Redelings, currently of the National Evolutionary Synthesis Center
(benjamin.redelings (at) nescent.org) and Marc Suchard
http://www.biomath.ucla.edu/~msuchard/bali-phy/
Kazutaka Katoh, Hiroyuki Toh, K. Kuma, T. Miyata, and K. Misawa
http://align.bmr.kyushu-u.ac.jp/mafft/software/
A
web server is also available there.
Robert C. Edgar and Kimmen Sjolander
http://www.drive5.com/lobster/
There is also a
web server available to run
the program.
Robert Edgar
http://www.drive5.com/muscle/
Manolo Gouy
http://pbil.univ-lyon1.fr/software/seaview.html
Pietro Liò, of the Computer Laboratory at the
University of Cambridge (Pietro.Lio (at) cl.cam.ac.uk
),
has written PASSML and PASSML_TM,
which use likelihood methods with Hidden Markov models to infer
phylogeny and also secondary structure from protein data. PASSML is for
general proteins and PASSML_TM is for membrane proteins.
The methods used are described in the papers: Goldman, N., J. L. Thorne,
and D. T. Jones. 1998. Assessing the impact of secondary structure and
solvent accessibility on protein evolution. Genetics 149:
445-458,
PASSML is described in the paper: Liò, P., N. Goldman, J. L. Thorne
and D. T. Jones. 1998. PASSML: combining evolutionary inference and protein
secondary structure prediction. Bioinformatics 14: 726-733,
and PASSML_TM is described in the paper:
Liò, P. and N. Goldman. 1999. Using protein structural information in
evolutionary inference: transmembrane proteins. Molecular Biology and
Evolution 16: 1696-1710.
The programs are available as ANSI C source code.
The source code is available via
its web page at
http://www.ebi.ac.uk/goldman/hmm/passml.html
.
Rod Page (r.page (at) bio.gla.ac.uk
), of
the Division of Environmental and Evolutionary Biology of the University
of Glasgow has written COMPONENT version 2.0, a program for
Windows systems for
comparing cladograms for use in phylogeny and biogeography studies. It has
many tree comparison and consensus methods, and far more features for
biogeographic studies (such as comparing species and area cladograms) than any
other package. It also can generate random trees. It runs under Windows 3.0
or higher. There is a review of the
program in: Slowinski, J. 1993. Review of Component, Version 2.0, by Roderick D. M. Page. Cladistics 9: 351-353. COMPONENT is
available free from its
web site at
http://taxonomy.zoology.gla.ac.uk/rod/cpw.html
.
Source code in Pascal and documentation (as PDFs) are also available there.
A very early development Macintosh version ("COMPONENT Lite") is available
from the
COMPONENT Lite web site
at http://taxonomy.zoology.gla.ac.uk/rod/cplite/guide.html
.
Rod Page
(r.page (at) bio.gla.ac.uk
), of the Division of Environmental and
Evolutionary Biology of the University of Glasgow
and Michael Charleston (mcharles (at) it.usyd.edu.au
)
of the Biological Informatics and Technology Centre of the School of
Information Technologies of the University of Sydney,
Sydney, Australia have written TREEMAP, version 3,
a free program for comparing host and parasite
phylogenies. It allows you to interactively compare host and parasite
trees, construct reconstructions of the history of the association, and
perform some simple randomisation tests of hypotheses of cospeciation.
It also can use Charleston's "Jungles" method to fit parasite trees to host
trees by
parsimony. That method is described in his paper: Charleston, M. A. 1998.
Jungles: A new solution to the host/parasite phylogeny reconciliation problem.
Mathematical Biosciences 149: 191-223.
For a description of the method used by TreeMap, see Page, R.D.M. 1994.
Parallel phylogenies: Reconstructing the history of host-parasite
assemblages. Cladistics 10: 155-173.
It can also estimate the number of randomized parasite trees that map as well
to the host tree as does the original parasite tree.
The program is available as a Java executable, which can be downloaded from
its web site
at http://www.it.usyd.edu.au/~mcharles/software/treemap/treemap3.html
.
A beta release executable for Mac OS of version 2.0, called version 2.0β,
is available at
the
Treemap 2.0β web site
at http://www.it.usyd.edu.au/~mcharles/software/treemap/treemap.html
.
An earlier version, 1.0,
is available as an executable for Mac OS or as an executable for Windows PCs.
They can be downloaded from
its WWW site: http://taxonomy.zoology.gla.ac.uk/rod/treemap.html
.
Fredrik Ronquist (Fredrik.Ronquist
(at) nrm.se)
of the Naturhistoriska riksmuseet, Stockholm, Sweden
has released DIVA version 1.2, a program for
DIspersal Vicariance Analysis. It is for analyses in historical
biogeography, where one is reconstructing the distribution history of a group
of organisms from the distribution areas of extant species and their phylogeny.
It is a parsimony-style analysis based on optimization of the numbers of
dispersal and extinction events, where one assumes that speciations divide
species ranges allopatrically. It does not make any assumption about
the hierarchical nature of vicariance events.
It was formerly available as either a Windows executable or a Mac OS executable from
its web page
at http://www.ebc.uu.se/systzoo/research/diva/diva.html
.
Currently there is some download, not well described, including perhaps source
code, available from the Sourceforge site at
http://diva.sourceforge.net/
.
Yu Yan
http://mnh.scu.edu.cn/s-diva/
Fredrik Ronquist (Fredrik.Ronquist
(at) nrm.se)
of the Naturhistoriska riksmuseet, Stockholm, Sweden
has written TreeFitter version 1.0. It fits parasite
trees to a host tree, and can also use them to infer the best host tree.
The program, which has many options, uses an event-based parsimony
method, which penalizes events using penalties chosen to reflect their
improbability. The NEXUS file format is used for the tree files.
It is available from its web site at
http://www.ebc.uu.se/systzoo/research/treefitter/treefitter.html
as either a Windows executable or a Mac OS executable. An on-line manual
is available at the web site.
Steffen Junick, Daniel Merkle, and Martin Middendorf
http://pacosy.informatik.uni-leipzig.de/pv/Software/Tarzan/PV-Tarzan.engl.html
Pierre Legendre
http://www.bio.umontreal.ca/casgrain/en/labo/parafit.html
Alexandros Stamatakis, A. Auch, J. Meier-Kolthoff, and M. Göker
http://icwww.epfl.ch/~stamatak/AxParafit.html
Daniel Merkle, Martin Middendorf, and Nicolas Wieseke
http://pacosy.informatik.uni-leipzig.de/58-1-Downloads.html
Ran Libeskind-Hadas
http://www.cs.hmc.edu/~hadas/jane/
Athanasia C.
Tzika, Raphaël Helaers, and Michel Milinkovitch
It allows the user to identify gains and losses on specific branches of the
tree, see the genome content of ancestral species, statistically over- or
under-represented molecular functions, biological processes and anatomical
systems (expression data), and reconstruct tissue specificity of gained,
duplicated, and lost genes.
It is described in the paper:
Tzika, A. C., R. Helaers, Y. Van de Peer and M. C. Milinkovitch. 2008. MANTiS: a phylogenetic framework for multi-species genome comparisons. Bioinformatics 24(2): 151-157.
It is available as Java executables with a Windows executable installer, a
Linux executable installer, and a Mac OS X universal executable installer. It
can be downloaded from
its web site
at http://www.mantisdb.org
Jonathan Bollback, of the
Institute of Science and Technology Austria,
Klosterneuburg, Austria (bollback
(at) ist.ac.at)
has written SIMMAP (SIMulation MAPping) version 1.5.2.
It stochastically maps characters onto a tree, given the tree and a
probability model of character change among discrete states. It can handle
general models of nucleotide substitution as well as the Mk model of change
of discrete morphological characters. It is also able
to estimate covariation between molecular or morphological characters,
estimate dN and dS, while accounting for model and tree uncertainty,
and estimate a wide variety of descriptive statistics for patterns in
molecular or morphological evolution.
The method it uses was introduced in the papers:
http://www.simmap.com
at the University of Copenhagen.
Liran Carmel, Yuri I. Wolf, Igor B. Rogozin, and Eugene V. Koonin
http://carmelab.huji.ac.il/software/EREM/erem.html
Antonio Marco and Ignacio Marín
http://www.uv.es/~genomica/treetracker/
Jianzhi George Zhang of the Department of Ecology and Evolutionary
Biology of the University of Michigan, Ann Arbor, Michigan
(jianzhi (at) umich.edu)
produced Ancestor, a program for inferring the ancestral
protein sequence of a set of species from their protein sequences.
The tree of the sequences is inferred by the minimum evolution
distance matrix method of Rzhetsky and Nei. It can estimate the ancestral
sequences at all nodes of the tree. The methods are described in a
paper: Zhang, J., and M. Nei. 1997. Accuracies of ancestral amino acid
sequences inferred by the parsimony, likelihood, and distance methods.
Journal of Molecular Evolution 44: S139-S146.
The program is distributed as a DOS executable with C source code. It will
run in a Windows Command Prompt window. It is
available from Masatoshi Nei's lab software site
at https://homes.bio.psu.edu/people/faculty/nei/software.htm
.
Jianzhi George Zhang of the Department of Ecology and Evolutionary
Biology of the University of Michigan, Ann Arbor, Michigan
(jianzhi (at) umich.edu)
has produced ANC-GENE, a program to infer ancestral
protein and DNA sequences from DNA sequences of a coding gene when the
phylogeny of the species is known. It first infers the amino acids by a
distance-based Bayesian method, and then infers the underlying nucleotide
sequences by fixing the inferred amino acids. It estimates branch lengths
on the phylogeny by a distance method before inferring the ancestral sequences.
It uses one of two possible models of amino acid changes (the Poisson-f or
JTT-f models), as well as the Jukes-Cantor model of nucleotide substitution.
It outputs both inferred pathways of change at each amino acid position and
inferred sequences at each node of the tree. The methods are discussed in
this paper: Zhang, J., and M. Nei. 1997. Accuracies of ancestral amino acid
sequences inferred by the parsimony, likelihood, and distance methods.
Journal of Molecular Evolution 44 (Suppl 1): S139-S146.
ANC-GENE is available as a DOS executable and C source code. These
can be executed in
Windows in a Command Prompt window. It can be downloaded from
the Nei laboratory software web site
at https://homes.bio.psu.edu/people/faculty/nei/software.htm
.
Xun Gu, of the Department of Genetics, Development and Cell Biology
and the Center for Bioinformatics and Biological Statistics at
Iowa State University, Ames, Iowa (xgu (at) iastate.edu) has
released Mgenome version 1.0. It finds trees for multiple
genome rearrangement by signed reversals. For a collection of genomes
represented by signed permutations of genes, it finds a tree that connects
all given genomes by reversal paths such that the number of all signed
reversals is as small as possible. The methods seem to be described in a paper:
Wu, S., and X. Gu. 2003. Algorithms for multiple genome rearrangement by
signed reversals. Pacific Symposium on Biocomputing 8: 363-74,
although the paper does not refer to the program.
The paper is available as a PDF at the Gu lab web site.
The program is available as a Windows executable
at the Gu lab software web site at
http://xungulab.com/software.html
.
Benjamin Vernot, Aiton Goldman and Dannie Durand
http://www.cs.cmu.edu/~durand/Notung/
Olivier Elemento,
then of the IMGT, the International imMunoGeneTics database and the LIRMM (Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier)
of the Université de Montpellier II, Montpellier, France
(He is now at the Institute for Computational Biomedicine
at Weill Cornell Medical College in New York City and his email
address is
ole2001 (at) med.cornell.edu)
has written DTscore
(Duplication, Tandem - score),
a distance-based tandem duplication tree reconstruction program. It takes as input a distance matrix between copies in a family of tandem repeats. The rows and columns need to be ordered in the same way as the copies are in the locus. DTscore can be applied to relatively large datasets (more than a hundred copies).
It is described in the paper:
Elemento O. and O. Gascuel. 2002. A fast and accurate distance algorithm to reconstruct tandem duplication trees. Bioinformatics 18: S92-S99.
It is available as C source code, Windows executables and Linux executables. It can be downloaded from
its web page at
http://www.lirmm.fr/%7Eelemento/DTscore/
and also at
its web site at
ATGC
at http://www.atgc-montpellier.fr/dtscore/binaries.php
Michael Sanderson
http://loco.biosci.arizona.edu/gtp/gtp.html
Mathieu Blanchette, of the School of Computer Science, McGill
University, Montréal, Québec
(blanchem (at) mcb.mcgill.edu) has written BPAnalysis,
a program that infers phylogenies from a set of gene orders by minimizing
the number of breakpoints required in genome rearrangement (this is not the
same as minimizing the number of rearrangement events). It is a C++
program which is also distributed in source code and in an executable
for DOS and Windows. The method employed is described in the paper:
Sankoff, D. and M. Blanchette. 1998. Multiple genome rearrangement and
breakpoint phylogeny. Journal of Computational Biology 5:
555-570. It is available from
Blanchette's software page
at http://www.mcb.mcgill.ca/~blanchem/software.html
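The breakpoint criterion that BPAnalysis minimizes can be illustrated with a short Python sketch that counts breakpoints between two signed gene orders. This is not code from BPAnalysis. The function name is hypothetical, and the sketch assumes linear genomes without end caps that contain the same genes.

```python
def breakpoints(a, b):
    """Count adjacencies of signed gene order a that do not occur in b."""
    adjacent_in_b = set()
    for x, y in zip(b, b[1:]):
        adjacent_in_b.add((x, y))
        adjacent_in_b.add((-y, -x))   # same adjacency read on the other strand
    return sum((x, y) not in adjacent_in_b for x, y in zip(a, a[1:]))

# Genes 2 and 3 are inverted as a block in the second genome.
print(breakpoints([1, 2, 3, 4, 5], [1, -3, -2, 4, 5]))  # 2
```

In this example a single inversion produces two breakpoints, which shows why the breakpoint count differs from the number of rearrangement events.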