EuGene is an open integrative gene finder for eukaryotic
and prokaryotic genomes. Compared to most existing gene finders, EuGene is characterized by its ability
to simply integrate arbitrary sources of information in its prediction process, including RNA-Seq,
protein similarities, homologies and various statistical sources of information.
EuGene-PP (Prokaryote Pipeline) facilitates the application of EuGene to prokaryotic genomes, integrating
any type of oriented gene expression information (RNA-Seq or tiling arrays, supporting all usual file formats), protein
similarities, the output of existing CDS and ncRNA predictors, and statistical information. Beyond the usual CDSs, the resulting
annotation contains RNA-Seq-based TSS predictions and (possibly antisense) ncRNA genes. It can run using
just FASTA genomic sequences and expression data, and by default has no parameters to tune.
For more information:
EuGene-PP: a next-generation automated annotation pipeline for prokaryotic genomes. E. Sallet, J. Gouzy, T. Schiex. Bioinformatics, 2014.
Next-generation Annotation of Prokaryotic Genomes with EuGene-P: Application to Sinorhizobium meliloti 2011. E. Sallet et al. DNA Res., 2013.
Download EuGene-PP: egnpp-Linux-x86_64.1.0.tar.gz.
Download a simple test dataset for EuGene-PP: EuGenePP_dataset.tar.gz.
- N E W S -
EuGene-PP v1.0 is available. [Download]
EuGene v4.1c is able to predict spliced starts.
EuGene v4.1 allows non-canonical splice site prediction.
EuGene v4.0 is able to annotate prokaryotic genomes.
- L A S T   U P D A T E -
EuGene-PP v1.0 : Jan. 2014
EuGene v4.1c: Nov. 2013
Like most existing gene finders, EuGene can exploit
probabilistic models such as Markov models to
discriminate coding from non-coding sequences, or
to discriminate effective splice sites from false
splice sites (using various mathematical models).
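As an illustration of the idea of using Markov models to separate coding from non-coding sequence, here is a minimal Python sketch. It is not EuGene's implementation: the function names, the order-1 model, the add-one smoothing and the toy training sequences are all assumptions made for this example. It trains one model per class and scores a sequence by the log-ratio of the two likelihoods, so a positive score means the sequence looks more like the coding training set.

```python
import math
from collections import Counter, defaultdict

ALPHABET = "ACGT"

def train_markov(sequences, order=1, pseudocount=1.0):
    """Estimate P(base | previous `order` bases) with additive smoothing.

    Hypothetical helper for illustration; not part of EuGene.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1

    def prob(context, base):
        ctx = counts.get(context, Counter())
        total = sum(ctx.values()) + pseudocount * len(ALPHABET)
        return (ctx[base] + pseudocount) / total

    return prob

def log_odds(seq, coding, noncoding, order=1):
    """Sum of log P_coding / P_noncoding over all positions of `seq`."""
    score = 0.0
    for i in range(order, len(seq)):
        context, base = seq[i - order:i], seq[i]
        score += math.log(coding(context, base) / noncoding(context, base))
    return score

# Toy training data (assumed, far too small for real use).
coding_model = train_markov(["ATGGCCGCGGCCGCGGCC", "ATGGCGGCCGCCGCGGCG"])
noncoding_model = train_markov(["TATAATTTAAATATTTAT", "AATTTATATAAATTATTA"])

gc_rich = log_odds("GCCGCGGCC", coding_model, noncoding_model)
at_rich = log_odds("TATTTAAT", coding_model, noncoding_model)
print(gc_rich > 0, at_rich < 0)
```

Real gene finders typically use higher-order, frame-aware models trained on large annotated sets; the sketch only shows the scoring principle.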
Beyond this, EuGene is able to integrate information
from several signal prediction programs (splice sites, translation
starts...), similarity with
existing sequences (ESTs, mRNAs, 5'/3' ESTs from full-length
mRNAs, proteins, homologous genomic
sequences) and the output of existing gene finders. Based on all the available information,
EuGene outputs a prediction of maximal score, i.e.,
one that is maximally consistent with the information provided.
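The notion of a "prediction of maximal score" can be sketched with a small dynamic program. The Python below is a deliberately simplified illustration, not EuGene's actual model: it assumes only two states ("coding" and "intergenic"), per-position scores contributed by each information source, and a fixed penalty for switching state. All names, scores and the penalty value are assumptions made for the example.

```python
def combine(*tracks):
    """Sum the per-position, per-state scores of several evidence tracks."""
    first = tracks[0]
    return [
        {state: sum(track[i][state] for track in tracks) for state in first[i]}
        for i in range(len(first))
    ]

def best_labeling(scores, switch_penalty=2.0):
    """Return (total score, labels) of the highest-scoring state sequence."""
    states = list(scores[0])
    best = dict(scores[0])      # best score of a path ending in each state
    back = []                   # back-pointers, one dict per later position
    for pos_scores in scores[1:]:
        new_best, pointers = {}, {}
        for s in states:
            def arriving(p, s=s):
                # Score of coming from state p into state s.
                return best[p] - (switch_penalty if p != s else 0.0)
            prev = max(states, key=arriving)
            new_best[s] = arriving(prev) + pos_scores[s]
            pointers[s] = prev
        best = new_best
        back.append(pointers)
    last = max(states, key=best.get)
    labels = [last]
    for pointers in reversed(back):
        labels.append(pointers[labels[-1]])
    labels.reverse()
    return best[last], labels

# Two hypothetical sources of information over five positions.
markov = [{"intergenic": 1, "coding": 0}, {"intergenic": 0, "coding": 0.5},
          {"intergenic": 0, "coding": 1}, {"intergenic": 0, "coding": 1},
          {"intergenic": 1, "coding": 0}]
similarity = [{"intergenic": 2, "coding": 0}, {"intergenic": 0, "coding": 0},
              {"intergenic": 0, "coding": 2}, {"intergenic": 0, "coding": 2},
              {"intergenic": 2, "coding": 0}]

total, labels = best_labeling(combine(markov, similarity))
print(total, labels)
```

In this toy case the second position has only weak coding evidence of its own, but the best overall labeling still places it inside the coding region because that is most consistent with the combined evidence around it.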
EuGene graphical output
Each source of information is integrated in EuGene
by a small independent software component, called
a "plugin". The plugin is responsible for the
integration of the information but also for
plotting the information on the graphical output
of EuGene (if needed) and can also analyze the
inconsistencies between the final prediction and
the information provided.
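The three responsibilities of a plugin described above (integrating information, plotting it, and analyzing inconsistencies with the final prediction) can be pictured with the following Python sketch. EuGene plugins are written in C++ and their real interface is not shown here; the class names, method names and scoring scheme below are hypothetical and serve only to illustrate the separation of concerns.

```python
from abc import ABC, abstractmethod

class EvidencePlugin(ABC):
    """Hypothetical plugin interface (not EuGene's actual C++ API)."""

    @abstractmethod
    def score(self, position):
        """Return this source's score contribution at a position."""

    @abstractmethod
    def plot(self, start, end):
        """Return a text track for positions start..end (end excluded)."""

    @abstractmethod
    def inconsistencies(self, prediction):
        """Return positions where the prediction contradicts this source."""

class SimilarityPlugin(EvidencePlugin):
    """Toy source: half-open intervals [start, end) with similarity hits."""

    def __init__(self, hits, weight=2.0):
        self.hits = hits
        self.weight = weight

    def _covered(self, position):
        return any(start <= position < end for start, end in self.hits)

    def score(self, position):
        return {"coding": self.weight if self._covered(position) else 0.0}

    def plot(self, start, end):
        return "".join("#" if self._covered(p) else "." for p in range(start, end))

    def inconsistencies(self, prediction):
        return [p for p, label in enumerate(prediction)
                if self._covered(p) and label != "coding"]

plugin = SimilarityPlugin([(2, 5)])
prediction = ["intergenic", "intergenic", "coding", "coding",
              "intergenic", "intergenic", "intergenic"]
print(plugin.plot(0, 7))
print(plugin.inconsistencies(prediction))
```

Keeping all three behaviours in one component means that a new source of information brings its own display and its own consistency check along with it.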
EuGene currently offers a large variety of plugins,
but if needed EuGene's users have the
ability to extend EuGene. This can be done using
two different approaches. One simple approach is
to use the "Annotastruct"
plugin. This plugin allows information to be injected
into EuGene using a GFF file. For the
more serious user, it is possible to write a new
plugin directly (in C++) and to load it
dynamically into EuGene (without recompilation of
EuGene).
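For readers unfamiliar with GFF, the following Python sketch writes records in the generic GFF3 layout: nine tab-separated columns, 1-based inclusive coordinates, "." for undefined values. It is only an illustration of the file format. Which feature types and attributes the Annotastruct plugin actually interprets is not described here, so the values used below ("manual", "CDS", "ID") are assumptions, and the EuGene documentation should be consulted for the exact conventions.

```python
def gff3_line(seqid, source, ftype, start, end,
              score=None, strand=".", phase=None, attributes=None):
    """Format one GFF3 record (hypothetical helper, generic GFF3 only)."""
    if start > end:
        raise ValueError("GFF3 requires start <= end")
    attrs = ";".join(f"{key}={value}" for key, value in (attributes or {}).items())
    columns = [
        seqid,
        source,
        ftype,
        str(start),
        str(end),
        "." if score is None else str(score),
        strand,
        "." if phase is None else str(phase),
        attrs or ".",
    ]
    return "\t".join(columns)

def gff3_document(records):
    """Join records under the standard GFF3 version header."""
    return "\n".join(["##gff-version 3"] + [gff3_line(**r) for r in records]) + "\n"

# Example record with assumed values.
records = [
    {"seqid": "chr1", "source": "manual", "ftype": "CDS",
     "start": 100, "end": 250, "strand": "+", "phase": 0,
     "attributes": {"ID": "cds1"}},
]
text = gff3_document(records)
print(text, end="")
```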
EuGene has been used extensively on the Arabidopsis thaliana
genome, where it has shown excellent prediction
quality. Recent updates of TAIR include hundreds of new genes predicted by EuGene which have been validated by RACE by TIGR [Reference]. It has been adapted to
other plant and related
organisms. EuGene has been developed with funding from
The software is now OSI Certified Open Source Software under
the terms of the Artistic License.