EuGene is an open integrative gene finder for eukaryotic and prokaryotic genomes. Unlike most existing gene finders, EuGene can easily integrate arbitrary sources of information into its prediction process, including RNA-Seq, protein similarities, homologies and various statistical sources of information.



EuGene-PP (Prokaryote Pipeline) facilitates the application of EuGene to prokaryotic genomes, integrating any type of oriented gene expression information (RNA-Seq or tiling arrays, supporting all usual file formats), protein similarities, output of existing CDS and ncRNA predictors, and statistical information. Beyond the usual CDSs, the resulting annotation contains RNA-Seq-based TSS predictions and (possibly antisense) ncRNA genes. It can run using just FASTA genomic sequences and expression data, and by default has no parameters to tune.
For more information:
EuGene-PP: a next-generation automated annotation pipeline for prokaryotic genomes. E. Sallet, J. Gouzy, T. Schiex. Bioinformatics, 2014.
Next-generation Annotation of Prokaryotic Genomes with EuGene-P: Application to Sinorhizobium meliloti 2011. E. Sallet et al. DNA Res., 2013.

Download EuGene-PP: egnpp-Linux-x86_64.1.0.tar.gz.



Download a simple test dataset for EuGene-PP: EuGenePP_dataset.tar.gz.

- N E W S -
EuGene-PP v1.0 is available.
EuGene v4.1c is able to predict spliced starts.
EuGene v4.1 allows non-canonical splice site prediction.
EuGene v4.0 is able to annotate prokaryotic genomes.


- L A S T U P D A T E -
EuGene-PP v1.0: Jan. 2014
EuGene v4.1c: Nov. 2013



EuGene presentation

Like most existing gene finders, EuGene can exploit probabilistic models such as Markov models to discriminate coding from non-coding sequences, or to discriminate true splice sites from false ones (using various mathematical models). Beyond this, EuGene can integrate information from several signal prediction programs (splice sites, translation starts...), similarity with existing sequences (EST, mRNA, 5'/3' EST from full-length mRNA, proteins, homologous genomic sequences) and the output of existing gene finders. Based on all the available information, EuGene outputs a prediction of maximal score, i.e., one that is maximally consistent with the information provided.
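
The idea of a "prediction of maximal score" can be illustrated with a toy example. The sketch below is not EuGene's actual model or algorithm: it assumes a single per-position evidence value (positive supports coding, negative supports non-coding) and a fixed penalty for switching between the two states, then finds the best labeling by dynamic programming. The function name `best_labeling` and the `switch_penalty` parameter are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

STATES = ("C", "N")  # toy states: coding / non-coding (an assumption, not EuGene's state set)


def _emit(state: str, evidence: float) -> float:
    """Score contributed by one position: evidence counts for 'C' and against 'N'."""
    return evidence if state == "C" else -evidence


def best_labeling(evidence: Sequence[float],
                  switch_penalty: float) -> Tuple[float, List[str]]:
    """Return (best score, labeling) maximizing summed evidence minus switch penalties."""
    if not evidence:
        return 0.0, []

    # Best score of any labeling of the prefix seen so far, ending in each state.
    score: Dict[str, float] = {s: _emit(s, evidence[0]) for s in STATES}
    back: List[Dict[str, str]] = []  # back-pointers, one dict per later position

    for e in evidence[1:]:
        new_score: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for s in STATES:
            # Score of arriving in state s from each possible previous state.
            arrive = {p: score[p] - (switch_penalty if p != s else 0.0)
                      for p in STATES}
            best_prev = max(STATES, key=lambda p: arrive[p])
            new_score[s] = arrive[best_prev] + _emit(s, e)
            pointers[s] = best_prev
        back.append(pointers)
        score = new_score

    # Trace the best path backwards from the best final state.
    last = max(STATES, key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path


if __name__ == "__main__":
    total, labels = best_labeling([2, 2, -1, 2, -3, -3], switch_penalty=2)
    print(total, "".join(labels))
```

In this toy run the single weakly negative position is absorbed into the coding region because switching twice would cost more than it gains, which is the sense in which a maximal-score prediction reconciles conflicting evidence.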

EuGene graphical output
Each source of information is integrated into EuGene by a small independent software component called a "plugin". The plugin is responsible for integrating the information, for plotting it on EuGene's graphical output (if needed), and can also analyze inconsistencies between the final prediction and the information provided.

A large variety of plugins already exists, but users can extend EuGene if needed. This can be done in two ways. The simple approach is to use the "Annotastruct" plugin, which allows information to be injected into EuGene through a GFF file. For more advanced users, it is possible to write a new plugin directly (in C++) and load it dynamically into EuGene (without recompiling EuGene).
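
As a rough illustration of the GFF route, the sketch below writes features as generic GFF3 lines (nine tab-separated columns). It only shows the standard GFF3 layout: the feature types, source name and attributes that the plugin actually expects are not described here, so every value in the example (`chr1`, `my_predictor`, `ID=cds1`) is a hypothetical placeholder, as are the helper names `Feature`, `to_gff3_line` and `to_gff3`.

```python
from typing import Iterable, NamedTuple


class Feature(NamedTuple):
    """One GFF3 record; coordinates are 1-based and inclusive."""
    seqid: str
    source: str
    type: str
    start: int
    end: int
    score: str = "."       # "." means "no score"
    strand: str = "."      # one of "+", "-", ".", "?"
    phase: str = "."       # "0", "1", "2" for CDS features, "." otherwise
    attributes: str = "."  # e.g. "ID=cds1;Name=example"


def to_gff3_line(feature: Feature) -> str:
    """Format a single feature as one tab-separated GFF3 line."""
    if feature.start < 1 or feature.end < feature.start:
        raise ValueError(f"invalid coordinates: {feature.start}-{feature.end}")
    if feature.strand not in {"+", "-", ".", "?"}:
        raise ValueError(f"invalid strand: {feature.strand!r}")
    return "\t".join(str(field) for field in feature)


def to_gff3(features: Iterable[Feature]) -> str:
    """Format features as a GFF3 document, starting with the version pragma."""
    lines = ["##gff-version 3"]
    lines.extend(to_gff3_line(f) for f in features)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical evidence: a CDS on the forward strand of a sequence named chr1.
    example = [Feature("chr1", "my_predictor", "CDS", 1200, 1500,
                       strand="+", phase="0", attributes="ID=cds1")]
    print(to_gff3(example), end="")
```

The resulting text can be saved to a file and checked against the plugin's own documentation before use.
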
EuGene has been used extensively on the Arabidopsis genome, where it has shown excellent prediction quality. Recent updates of TAIR include hundreds of new genes predicted by EuGene that have been validated by RACE at TIGR [Reference]. It has been adapted to other plants and related organisms. EuGene has been developed with funding from INRA and Génoplante.
The software is now OSI Certified Open Source Software under the terms of the Artistic License.