QuickStart

MGEScan, which identifies LTR and non-LTR retrotransposons in genome sequences, is available on Galaxy, a web-based scientific workflow system that supports data analysis with a variety of tools.

Overview

This tutorial gives a quick start to using MGEScan on Galaxy with a sample dataset, the D. melanogaster genome.

Tip

Approximate time: 3 hours and 30 minutes (including 3 hours of computation time)

Run MGEScan-LTR and MGEScan-nonLTR for D. melanogaster

In this tutorial, we will run both MGEScan-LTR and MGEScan-nonLTR on the D. melanogaster genome dataset. You can find the dataset under the Shared Data menu at the top of the page and the MGEScan tools in the left frame.

Access to Galaxy/MGEScan

Run Galaxy/MGEScan on your machine:

_images/mgescan-main.png

Login or Register (Optional)

You can save your work if you have an account on Galaxy. The per-user history in Galaxy/MGEScan stores your data and the tasks you have launched. A guest user can run the MGEScan tools without logging in, but results and history data will not be saved once the web browser session is closed.

Register

An email address is required to sign up.

_images/galaxy-register.png

Login

If you already have an account, log in with your user ID and password on the User > Login page.

_images/galaxy-login.png

Get Dataset from Shared Data

You can find sample datasets (e.g. D. melanogaster) under the Shared Data menu at the top of the page. Click “Shared Data” > “Data Libraries” and find “Sample datasets for MGEScan”.

Example: Drosophila melanogaster

In the Data Library, select the checkbox for d.melanogaster and choose “Select datasets for import into selected histories” from the down arrow at the end.

_images/galaxy-importing-from-dataset.png

You will find 8 FASTA files available. We need to import all of them, so check every file and click “Import library datasets” in the middle of the page.

_images/galaxy-importing-from-dataset2.png

Once you have imported the D. melanogaster datasets into your history, you are ready to run the MGEScan tools on Galaxy. Go to the main page and check that the imported datasets (8 files) appear in the right frame of the page.

Note

You can select the history into which the datasets are imported.

Run MGEScan for LTR and nonLTR

In the new version of MGEScan, the two programs, MGEScan-LTR and MGEScan-nonLTR, can be run at the same time and produce a merged result. Open the page at “MGEScan > MGEScan”, where a simple tool runs LTR and nonLTR with an MPI option for parallel processing.

Note

Use the separate LTR or nonLTR page if you would like to set other options for the MGEScan tools in detail.

MGEScan Tool

MGEScan runs both LTR and nonLTR on a selected input genome sequence. Find the “MGEScan > MGEScan” tool in the left frame and confirm that the symlink dataset we created in the previous step is loaded in the “From” select form, like so:

_images/mgescan-tool.png

Enable MPI

To reduce processing time, select “Yes” in the “Enable MPI” select form and specify the “Number of MPI Processes”. If you have a multi-core system, use up to the number of cores.

Our options are:

  • From: Create a symlink to multiple datasets on data 2 and data 8, and others
  • MGEScan: Both
  • Enable MPI: Yes
  • Number of MPI Processes: 4

Then click “Execute”.
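As a rough aid for choosing the “Number of MPI Processes” value, the following Python sketch caps a requested process count at the number of CPUs reported by the operating system. The function name `suggest_mpi_processes` is hypothetical and not part of MGEScan or Galaxy; note also that `os.cpu_count()` reports logical CPUs, which may exceed the number of physical cores.

```python
import os


def suggest_mpi_processes(requested=None):
    """Return a process count between 1 and the number of logical CPUs.

    Hypothetical helper for illustration only; MGEScan does not provide it.
    """
    # os.cpu_count() can return None when the count cannot be determined.
    cores = os.cpu_count() or 1
    if requested is None:
        return cores
    # Never go below 1 process or above the available CPU count.
    return max(1, min(requested, cores))


if __name__ == "__main__":
    # The tutorial uses 4 processes; on a smaller machine this is capped.
    print(suggest_mpi_processes(4))
```

Run it on the machine that hosts Galaxy/MGEScan, since that is where the MPI processes are started.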

Computation Time

Our test case took 3 hours to analyze LTR and nonLTR elements in D. melanogaster:

  • nonLTR: 19 minutes
  • LTR: 3 hours
  • Total: 3 hours

Results

When the MGEScan tools finish, the output files are accessible via Galaxy in gff3 format, as plain text, or as an archive file (e.g. tar.gz). You will notice that the color of your tool entries has changed to green, like so:

_images/mgescan-result.png

You can download the output files to your local storage, or open them in a genome browser through the provided links.
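Once a gff3 file has been downloaded, it can be inspected locally. The sketch below is a minimal reader for the nine tab-separated columns defined by the GFF3 format; it counts features by type and computes their lengths. The sample lines, including the source and type values, are invented for illustration and are not actual MGEScan output.

```python
from collections import Counter

# The nine standard GFF3 columns, in order.
GFF3_COLUMNS = (
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
)


def parse_gff3(lines):
    """Yield one dict per feature line, skipping comments and blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != len(GFF3_COLUMNS):
            continue  # ignore malformed lines in this sketch
        record = dict(zip(GFF3_COLUMNS, fields))
        record["start"] = int(record["start"])
        record["end"] = int(record["end"])
        yield record


# Invented sample data; replace with open("your_result.gff3") for a real file.
SAMPLE = [
    "##gff-version 3",
    "chr2L\texample_source\tmobile_genetic_element\t1000\t6000\t.\t+\t.\tID=example_1",
    "chr2L\texample_source\tmobile_genetic_element\t9000\t9500\t.\t-\t.\tID=example_2",
]

records = list(parse_gff3(SAMPLE))
counts = Counter(r["type"] for r in records)
# GFF3 coordinates are 1-based and inclusive, hence the +1.
lengths = [r["end"] - r["start"] + 1 for r in records]

print(counts)
print(lengths)
```

To read a downloaded result, pass an open file handle to `parse_gff3` instead of `SAMPLE`.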

Visualization: UCSC or Ensembl Genome Browser

Your genomic data in Generic Feature Format Version 3 (gff3) can be displayed in a well-known visualization tool such as the UCSC or Ensembl Genome Browser from Galaxy, with the MGEScan LTR and nonLTR results shown as custom annotations. Use the link provided for the gff3 file to open an interactive graphical display of the genome sequence data.

_images/mgescan-genome-browser.png

UCSC Genome Browser (Example View)

_images/mgescan-ltr-gff3-ucsc-browser.png

Ensembl (Example View)

_images/mgescan-ltr-gff3-ensembl.png

Additional Options

There are other options for viewing results, either in the web interface or locally.

  • View data: Content of the result file
_images/galaxy-view-data.png
  • Download: Download the file
_images/galaxy-download.png

Description of tools

Each tool in Galaxy includes a description that explains how to use it.

_images/mgescan-description.png