MGEScan, which identifies LTR and non-LTR retrotransposons in genome sequences, is available on Galaxy, a web-based scientific workflow system that supports data analysis with a variety of tools.
This tutorial is a quick start for using MGEScan on Galaxy with a sample dataset, the D. melanogaster genome. A public server at Indiana University (http://silo.cs.indiana.edu:38080) provides sample datasets and the MGEScan tools, so you can try MGEScan on Galaxy without installing anything.
Estimated time: approximately 3 hours and 30 minutes (including 3 hours of computation time)
Run MGEScan-LTR and MGEScan-nonLTR for D. melanogaster¶
In this tutorial, we will run both MGEScan-LTR and MGEScan-nonLTR on the
D. melanogaster genome dataset. You can find the dataset at the
menu on top and the MGEScan tools in the left frame.
Login or Register (Optional)¶
You can save your work if you have an account on this Galaxy server. The per-user history in Galaxy/MGEScan stores your data and the tasks you have launched. A guest user can run the MGEScan tools without logging in, but results and history data are lost when the web browser session is closed.
Example: Drosophila melanogaster¶
In the Data Library, enable the checkbox for
d.melanogaster and click
“Select datasets for import into selected histories” from the down arrow at
You will find that 8 FASTA files are available. We need to import all of them, so check every file and click “Import library datasets” in the middle of the page.
Once you have imported the D. melanogaster datasets into your history, you are ready to run the MGEScan tools on Galaxy. Go to the main page and check that the imported datasets (8 files) appear in the right frame of the page.
You can select the history that the datasets are imported into.
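If you also keep local copies of the FASTA files (for example, downloaded from your history), a quick sanity check is to count the sequence records in each file. The sketch below is a minimal illustration using only the Python standard library; the function name `count_fasta_records` and the sample text are hypothetical and are not part of MGEScan or Galaxy.

```python
import io


def count_fasta_records(handle):
    """Count sequence records in a FASTA stream by counting '>' header lines."""
    return sum(1 for line in handle if line.startswith(">"))


# Hypothetical two-record FASTA text, used only for illustration.
sample = io.StringIO(">chrA\nACGTACGT\nACGT\n>chrB\nGGCC\n")
print(count_fasta_records(sample))
```

To check a real file, pass an open file object instead of the `io.StringIO` sample, for example `open("your_file.fasta")` with your own file name.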
Run MGEScan for LTR and nonLTR¶
In the new version of MGEScan, the two programs, MGEScan-LTR and MGEScan-nonLTR, can be run at the same time and produce a merged result. Open “MGEScan > MGEScan”, where a simple tool runs LTR and nonLTR with an MPI option for parallel processing.
Use the separate LTR or nonLTR page if you would like to set other options and run the MGEScan tools in more detail.
MGEScan runs both LTR and nonLTR on the selected input genome sequence. Find the “MGEScan > MGEScan” tool in the left frame and confirm that the symlink dataset we created in the previous step is loaded in the “From” select form.
To reduce processing time, select “Yes” in the “Enable MPI” select form and specify the “Number of MPI Processes”. On a multi-core system, use at most as many processes as there are cores. silo.cs.indiana.edu has 24 cores, but we will use 4 in this tutorial to avoid being a noisy neighbor.
Our options are:
- From: Create a symlink to multiple datasets on data 2 and data 8, and others
- MGEScan: Both
- Enable MPI: Yes
- Number of MPI Processes: 4
Then click “Execute”.
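The rule of thumb for the process count (no more processes than cores, and fewer on a shared server) can be sketched as a small helper. This is an illustration only: `choose_mpi_processes` and its `polite_cap` parameter are hypothetical names, not part of MGEScan or Galaxy, and the resulting value is still entered by hand in the “Number of MPI Processes” field.

```python
import os


def choose_mpi_processes(available_cores, polite_cap=None):
    """Return a process count no larger than the core count.

    polite_cap is an optional upper bound for shared machines
    (a hypothetical parameter chosen for this sketch).
    """
    count = max(1, available_cores)
    if polite_cap is not None:
        count = min(count, max(1, polite_cap))
    return count


# On your own machine: use every core reported by the operating system.
local_cores = os.cpu_count() or 1  # os.cpu_count() may return None
print(choose_mpi_processes(local_cores))

# On the shared tutorial server (24 cores), cap the count at 4.
print(choose_mpi_processes(24, polite_cap=4))
```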
Our test case took 3 hours for analyzing LTR and nonLTR of
- nonLTR: 19 minutes
- LTR: 3 hours
- Total: 3 hours
When the MGEScan tools complete, the output files are accessible via Galaxy as gff3, plain text, or archive (e.g. tar.gz) files. You will notice that the color of the completed items has changed to green.
You can download the output files to your local storage, or open them in a genome browser through the provided links.
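Once a gff3 file has been downloaded, it can be inspected with a few lines of Python. GFF3 is a tab-separated format with nine columns (seqid, source, type, start, end, score, strand, phase, attributes). The sketch below counts features per type; the function name `count_features_by_type` and the sample records are made up for illustration and do not represent actual MGEScan output.

```python
import io
from collections import Counter


def count_features_by_type(handle):
    """Count GFF3 features by the value of column 3 (feature type)."""
    counts = Counter()
    for line in handle:
        line = line.rstrip("\n")
        # Skip blank lines, directives ("##...") and comments ("#...").
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GFF3 feature line
        counts[fields[2]] += 1
    return counts


# Made-up records for illustration only; real MGEScan output will differ.
sample = io.StringIO(
    "##gff-version 3\n"
    "seq1\texample\tfeature_a\t100\t900\t.\t+\t.\tID=a1\n"
    "seq1\texample\tfeature_b\t2000\t2600\t.\t-\t.\tID=b1\n"
    "seq2\texample\tfeature_a\t50\t450\t.\t+\t.\tID=a2\n"
)
print(count_features_by_type(sample))
```

To summarize a real result, pass an open file object for the downloaded gff3 file in place of the `io.StringIO` sample.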
Visualization: UCSC or Ensembl Genome Browser¶
Your genomic data in Generic Feature Format Version 3 (gff3) can be displayed from Galaxy with a well-known visualization tool such as the UCSC or Ensembl Genome Browser, with the MGEScan results for LTR and nonLTR shown as custom annotations. Use the link provided for the gff3 file to view an interactive graphical display of the genome sequence data.