MGEScan Command Line Interface (In Progress)

MGEScan provides a command-line interface (CLI) alongside the Galaxy web interface. You can run the MGEScan-LTR and MGEScan-nonLTR programs from your shell terminal.


If you have installed MGEScan on Galaxy, the MGEScan CLI tools are already available on your system.


If you still need to install MGEScan, see the Installation instructions. Follow all of the steps except the Galaxy installation, which you can skip if you only need the MGEScan CLI tools.


Try mgescan -h on your terminal:

(mgescan)$ mgescan -h
MGEScan: identifying ltr and non-ltr in genome sequences

        mgescan both <genome_dir> [--output=<data_dir>] [--mpi=<num>]
        mgescan ltr <genome_dir> [--output=<data_dir>] [--mpi=<num>]
        mgescan nonltr <genome_dir> [--output=<data_dir>] [--mpi=<num>]
        mgescan (-h | --help)
        mgescan --version

        -h --help   Show this screen.
        --version   Show version.
        --output=<data_dir> Directory results will be saved

MGEScan Programs

The mgescan CLI tool provides sub-commands to run the ltr program, the nonltr program, or both.

How to Run

To identify both LTR and non-LTR retrotransposons in a set of genome sequences, use the both sub-command and specify the path to the directory that contains your input genome files (FASTA format).

For example, if you have DNA sequences (FASTA) for the fruit fly (Drosophila melanogaster) under the $HOME/dmelanogaster directory and want to save the results in $HOME/mgescan_result_dmelanogaster, you can run the mgescan command like so:

(mgescan)$ mgescan both $HOME/dmelanogaster --output=$HOME/mgescan_result_dmelanogaster

The expected output looks like this:

ltr: starting
nonltr: starting
nonltr: finishing (elapsed time: 306.881129026)
ltr: finishing (elapsed time: 1306.881129026)

MPI Option

If your system supports MPI, you can pass the --mpi option with the number of processes to use. As a rule of thumb, use half the number of cores on your machine.
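As a rough sketch of the half-the-cores guideline, the following Python snippet derives a process count from the number of CPUs and assembles the matching mgescan command line. The helper name build_mgescan_command is hypothetical and not part of MGEScan. Note also that os.cpu_count() reports logical CPUs, which can be higher than the number of physical cores on machines with hyper-threading, so treat the result as a starting point.

```python
import os
import shlex


def build_mgescan_command(genome_dir, output_dir, program="both", cores=None):
    """Assemble an mgescan command line that uses half of the available cores.

    Hypothetical helper for illustration only; it is not part of MGEScan.
    """
    if program not in ("both", "ltr", "nonltr"):
        raise ValueError("program must be 'both', 'ltr' or 'nonltr'")
    if cores is None:
        # os.cpu_count() may return None when the count cannot be determined.
        cores = os.cpu_count() or 1
    # Half of the cores, but never fewer than one process.
    processes = max(1, cores // 2)
    return [
        "mgescan",
        program,
        genome_dir,
        "--output={}".format(output_dir),
        "--mpi={}".format(processes),
    ]


if __name__ == "__main__":
    home = os.path.expanduser("~")
    command = build_mgescan_command(
        os.path.join(home, "dmelanogaster"),
        os.path.join(home, "mgescan_result_dmelanogaster"),
    )
    # Print the command instead of running it, so it can be reviewed first.
    print(" ".join(shlex.quote(part) for part in command))
```

The script only prints the command; paste it into the (mgescan) shell or hand the list to subprocess.run once you are happy with the process count.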

Input Files (FASTA)

An input file can contain a single sequence or multiple sequences. Store your input DNA sequence files in the same folder and specify the path to that folder when you run the MGEScan program. For example, for D. melanogaster you might have sequence files like these:

$ ls -al dmelanogaster
total 167564
drwx------  2 mgescan mgescan     4096 Jan 28 23:23 .
drwx------ 13 mgescan mgescan     4096 Apr  7 18:45 ..
-rw-------  1 mgescan mgescan 23395126 Dec 18  2014 2L.fa
-rw-------  1 mgescan mgescan 21499210 Dec 18  2014 2R.fa
-rw-------  1 mgescan mgescan 24952673 Dec 18  2014 3L.fa
-rw-------  1 mgescan mgescan 28370194 Dec 18  2014 3R.fa
-rw-------  1 mgescan mgescan  1374441 Dec 18  2014 4.fa
-rw-------  1 mgescan mgescan 22796595 Dec 18  2014 X.fa
-rw-------  1 mgescan mgescan  2796595 Dec 18  2014 Y.fa
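Before starting a long run, it can help to confirm that the input folder contains what you expect. The sketch below lists the FASTA files in a directory and counts the sequences in each by counting header lines that start with ">". The function names and the extension filter (.fa, .fasta) are assumptions made for this example; MGEScan itself may accept other file names.

```python
import os
import sys


def count_fasta_sequences(path):
    """Count header lines (those starting with '>') in one FASTA file."""
    count = 0
    with open(path, "r") as handle:
        for line in handle:
            if line.startswith(">"):
                count += 1
    return count


def summarize_genome_dir(genome_dir, extensions=(".fa", ".fasta")):
    """Return {file name: sequence count} for FASTA files in genome_dir.

    The extension filter is an assumption of this sketch, not an MGEScan rule.
    """
    summary = {}
    for name in sorted(os.listdir(genome_dir)):
        path = os.path.join(genome_dir, name)
        if os.path.isfile(path) and name.lower().endswith(extensions):
            summary[name] = count_fasta_sequences(path)
    return summary


if __name__ == "__main__":
    # Usage: python check_input.py <genome_dir>
    for name, n_sequences in summarize_genome_dir(sys.argv[1]).items():
        print("{}\t{} sequence(s)".format(name, n_sequences))
```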


Upon successful completion of the MGEScan program, several output files are stored in the destination directory that you specified with the --output parameter. They include plain text and GFF3 files.


MGEScan-LTR generates ltr.out, which describes the clusters and coordinates of the LTR retrotransposons identified. Each cluster of LTR retrotransposons starts with a header line of the form [cluster_number]---------, followed by the entries for the LTR retrotransposons in that cluster. The columns of each entry are as follows.

  1. LTR_id: unique id of the LTR retrotransposon identified. It consists of two components, the sequence file name and the id within that file. For example, chr1_2 is the second LTR retrotransposon in the chr1 file.
  2. start position of the 5' LTR.
  3. end position of the 5' LTR.
  4. start position of the 3' LTR.
  5. end position of the 3' LTR.
  6. strand: + or -.
  7. length of the 5' LTR.
  8. length of the 3' LTR.
  9. length of the LTR retrotransposon.
  10. TSD on the left side of the LTR retrotransposon.
  11. TSD on the right side of the LTR retrotransposon.
  12. di(tri)nucleotide on the left side of the 5' LTR.
  13. di(tri)nucleotide on the right side of the 5' LTR.
  14. di(tri)nucleotide on the left side of the 3' LTR.
  15. di(tri)nucleotide on the right side of the 3' LTR.
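The column layout above can be turned into a small reader for ltr.out. This is a minimal sketch under several assumptions that the text does not confirm: fields are separated by whitespace, every entry has exactly 15 non-empty fields, and a cluster header is a number (optionally in square brackets) followed by a run of dashes. The names LtrRecord and parse_ltr_out are hypothetical. If your file uses a different delimiter or leaves TSD fields empty, the splitting logic will need adjusting.

```python
import re
import sys
from collections import Counter, namedtuple

# One entry of ltr.out: the cluster it belongs to plus the 15 documented columns.
LtrRecord = namedtuple("LtrRecord", [
    "cluster", "ltr_id",
    "start_5ltr", "end_5ltr", "start_3ltr", "end_3ltr",
    "strand", "len_5ltr", "len_3ltr", "length",
    "tsd_left", "tsd_right",
    "dinuc_5ltr_left", "dinuc_5ltr_right",
    "dinuc_3ltr_left", "dinuc_3ltr_right",
])

# Assumed header form: a cluster number, optionally bracketed, then dashes.
CLUSTER_HEADER = re.compile(r"^\[?(\d+)\]?\s*-{2,}\s*$")


def parse_ltr_out(lines):
    """Parse an iterable of ltr.out lines into a list of LtrRecord tuples."""
    records = []
    cluster = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        header = CLUSTER_HEADER.match(line)
        if header:
            cluster = int(header.group(1))
            continue
        fields = line.split()
        if len(fields) != 15:
            continue  # not an entry in the assumed 15-column layout
        try:
            positions = [int(value) for value in fields[1:5]]
            lengths = [int(value) for value in fields[6:9]]
        except ValueError:
            continue  # numeric columns did not parse; skip the line
        records.append(LtrRecord(
            cluster, fields[0], *(positions + [fields[5]] + lengths + fields[9:15])
        ))
    return records


if __name__ == "__main__":
    # Usage: python read_ltr_out.py <path to ltr.out>
    with open(sys.argv[1]) as handle:
        parsed = parse_ltr_out(handle)
    for cluster_id, n_entries in sorted(Counter(r.cluster for r in parsed).items()):
        print("cluster {}: {} LTR retrotransposon(s)".format(cluster_id, n_entries))
```

Lines that do not fit the assumed layout are skipped silently, so compare the number of parsed records against the file if the counts look low.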

Sample output of ltr.out for D. melanogaster