Another bioinformatic tool: PyANI
Overview
Objectives
To calculate average nucleotide identity of a genome with related genomes.
In the regular lessons, we implemented three bioinformatic tools:
blast for homology search,
maaft for sequence alignment, and
raxml for phylogenetic analysis.
In this section, we will discuss two more tools.
PyANI
PyANI is an open-source python-based tool for calculating
Average Nucleotide Identity (ANI) between two or more sequences.
When comparing two genomes, first syntenic regions are identified
using tools such as mummer or blast.
Then the nucleotide identity is calculated in the syntenic regions.
The source code for PyANI is available at widdowquinn/pyani. The documentation for basic usage is available here.
The first stage of PyANI is
PyANI v2 is available in Hipergator, but has to be loaded.
The dependencies, mummer and blast+ will be loaded together with pyani.
$ ml pyani
Lmod is automatically replacing "python/3.8" with "pyani/0.2.10".
We will be using the genomes present in files/ani for computing ANI.
The file UXhortspp.fasta contains genome of a unknown X. hortorum species.
The other sequences are genome of some X. hortorum pathovars
downloaded from NCBI.
Getting genome sequences from NCBI
PyANI has a script called
genbank_get_genomes_by_taxon.pyto download all genomes for a taxon from NCBI. For usage, check the documentation linked above.
The objective now is to perform pairwise comparisons of all reference genomes and calculate ANI. This can be performed with following command.
average_nucleotide_identity.py -i files/ani -o ani -m ANIm -g --gformat png,pdf
average_nucleotide_identity.pyis the name of the script-iis used to specify directory containing input genomes/sequences.-ois used to specify output directory. Note that the program will exit if this directory preexists.-mis used to specify mode for alignment of syntenic region.ANImspecifiesmummerandANIbspecifiesblast+.-gis used to generate graphic output, i.e., heatmap.--gformatspecifies the graphic output formats.
$ ls ani
ANIm_alignment_coverage.pdf ANIm_hadamard.pdf ANIm_similarity_errors.pdf
ANIm_alignment_coverage.png ANIm_hadamard.png ANIm_similarity_errors.png
ANIm_alignment_coverage.tab ANIm_hadamard.tab ANIm_similarity_errors.tab
ANIm_alignment_lengths.pdf ANIm_percentage_identity.pdf nucmer_output.tar.gz
ANIm_alignment_lengths.png ANIm_percentage_identity.png
ANIm_alignment_lengths.tab ANIm_percentage_identity.tab
You can now transfer ANIm_percentage_identity.png
to your computer to view the heatmap.

You can also get numeric ANI values from the ANIm_percentage_identity.tab file.
$ awk 'NR==1{print "-"$0; next}{for (i=2; i<=NF; i++) {$i=substr($i,1,5)}; print $0}' ani8/ANIm_percentage_identity.tab | column -t
- Xpopuli Xhgardneri Xhcynarae Xhhederae Xhvitians Xhunknown Xhcarotae Xhtaraxaci Xhpelargonii
Xpopuli 1.0 0.913 0.913 0.912 0.912 0.913 0.913 0.912 0.913
Xhgardneri 0.913 1.0 0.993 0.960 0.982 0.999 0.961 0.974 0.958
Xhcynarae 0.913 0.993 1.0 0.960 0.984 0.993 0.961 0.975 0.958
Xhhederae 0.912 0.960 0.960 1.0 0.960 0.960 0.965 0.955 0.961
Xhvitians 0.912 0.982 0.984 0.960 1.0 0.982 0.961 0.976 0.958
Xhunknown 0.913 0.999 0.993 0.960 0.982 1.0 0.961 0.974 0.958
Xhcarotae 0.913 0.961 0.961 0.965 0.961 0.961 1.0 0.956 0.961
Xhtaraxaci 0.912 0.974 0.975 0.955 0.976 0.974 0.956 1.0 0.954
Xhpelargonii 0.913 0.958 0.958 0.961 0.958 0.958 0.961 0.954 1.0
Based on ANI, the unknown strain seems closest to X. hortorum pv. gardneri.