VirusTAP manual

Contents

  1. Overview
  2. Logging in
  3. Submitting data
  4. Retrieving results
  5. Registration
  6. Flow of analysis

Overview

VirusTAP (Virus targeted assembling pipeline) is a web service designed to assemble virus genomes from next-generation sequencing (NGS) reads.
VirusTAP accepts paired-end NGS reads (optimised for MiSeq), trims low-quality data, and removes unnecessary reads such as those from rRNA, bacterial genomes, and host genomes (human, mouse, or monkey).
VirusTAP then performs a simple assembly using the IDBA program and runs megablast and RAPSearch against virus nucleotide and protein databases. Contigs that do not hit any virus sequence are regarded as non-virus contigs, and the reads that map to them are removed. After this non-virus read removal has been repeated up to 5 times, broken pairs are removed and the remaining reads proceed to the assembly process. Assembly starts with a simple assembly using IDBA or Platanus to create seed contigs, which are then extended using the PriceTI program. The reads are mapped back to the assembled contigs, and a homology search of the contigs against the NCBI NT database is also performed. All of these data can be downloaded with the download button.

Logging in

A user ID and password are required to log in to VirusTAP. Both are issued when you register through the "register account" link. Registration is explained in more detail later in this manual.

VirusTAP does not work in Internet Explorer. Firefox is recommended.

Submitting data

After you log in, a screen like the one shown on the right is displayed.

Basic usage

The simplest way to run an assembly is to drag and drop the first and second read files onto the Read 1 and Read 2 file selection fields, respectively, and click the Exec button at the bottom.
VirusTAP will send an e-mail when the analysis has finished.
Please note that VirusTAP accepts only fastq.gz files, up to 10 GBytes in total.
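
The input restrictions above can be checked locally before uploading. The following is a minimal sketch, not part of VirusTAP itself: the function name check_upload is hypothetical, and "10 GBytes" is assumed here to mean 10 x 1024^3 bytes.

```python
import os

# Assumption: the manual's "10 GBytes" is read as 10 GiB.
MAX_TOTAL_BYTES = 10 * 1024**3


def check_upload(read1_path, read2_path, max_total_bytes=MAX_TOTAL_BYTES):
    """Return a list of problems; an empty list means the pair looks acceptable."""
    problems = []
    total = 0
    for path in (read1_path, read2_path):
        if not path.endswith(".fastq.gz"):
            problems.append(f"{path}: not a .fastq.gz file")
        if not os.path.isfile(path):
            problems.append(f"{path}: file not found")
            continue
        # Every gzip file starts with the two magic bytes 0x1f 0x8b.
        with open(path, "rb") as handle:
            if handle.read(2) != b"\x1f\x8b":
                problems.append(f"{path}: not gzip-compressed")
        total += os.path.getsize(path)
    if total > max_total_bytes:
        problems.append(f"total size {total} bytes exceeds {max_total_bytes}")
    return problems
```

Calling check_upload("sample_R1.fastq.gz", "sample_R2.fastq.gz") with your own file names returns an empty list when both files pass these checks.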

Quality trimming

VirusTAP first deletes the first several bases of each read, which sometimes show low quality. The number of bases can be specified in the "5' trim length" menu.
Next, one of two quality trimming programs (skewer / fastq-mcf.pl) performs quality trimming. The strength of the trimming can be adjusted with the "Minimum average quality limit" menu.
VirusTAP then performs an additional in-house trimming step that cuts the read at a low-quality base; the quality threshold is specified in the "Trim lower than this q-value" menu.
Minimum and maximum read lengths can also be specified.
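
The 5' trim and the low-quality cut can be illustrated with a short sketch. This is not VirusTAP's in-house code: the function name trim_read and its default values are hypothetical, qualities are assumed to be Phred+33 encoded, the low-quality base itself is assumed to be dropped together with everything after it, and the skewer / fastq-mcf.pl step and the maximum length are left out.

```python
def trim_read(seq, qual, five_prime_trim=0, q_threshold=20,
              min_length=50, phred_offset=33):
    """Trim one read; return (seq, qual), or None if the result is too short."""
    # Step 1: drop the first few bases ("5' trim length").
    seq, qual = seq[five_prime_trim:], qual[five_prime_trim:]
    # Step 2: cut at the first base whose quality is below the threshold
    # ("Trim lower than this q-value"); that base is dropped as well.
    for i, ch in enumerate(qual):
        if ord(ch) - phred_offset < q_threshold:
            seq, qual = seq[:i], qual[:i]
            break
    # Step 3: discard reads shorter than the minimum length.
    if len(seq) < min_length:
        return None
    return seq, qual
```

For a paired-end run, a pair in which either mate returns None would have to be dropped or treated as a broken pair.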

Read subtraction

rRNA and bacterial genomes can be removed by checking the "Remove rRNAs" and "Remove Bacteria genomes" options, respectively.
Host genome subtraction is mandatory. Users can select the host species from the pull-down menu.
"Precise non-virus filter" is a somewhat more complicated filter. If this filter is checked, VirusTAP performs a simple assembly using the IDBA program, then runs megablast and RAPSearch homology searches of the contigs against virus nucleotide and protein sequences, respectively. If a contig shows no homology to any virus sequence, it is regarded as a non-virus contig, and the reads that map to it are discarded. This assembly-and-homology-search process is repeated up to 5 times to remove as many non-virus reads as possible.
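
The loop behind the "Precise non-virus filter" can be sketched as follows. This only illustrates the control flow described above. The three callables (assemble, has_virus_hit, reads_mapped_to) are hypothetical stand-ins for IDBA, the megablast / RAPSearch searches, and read mapping, and stopping early when nothing more can be removed is an assumption.

```python
def remove_non_virus_reads(reads, assemble, has_virus_hit, reads_mapped_to,
                           max_rounds=5):
    """Iteratively drop reads that map to contigs without any virus hit."""
    reads = set(reads)
    for _ in range(max_rounds):
        contigs = assemble(reads)
        # Contigs with no homology to any virus sequence are "non-virus".
        non_virus = [c for c in contigs if not has_virus_hit(c)]
        if not non_virus:
            break  # assumption: stop early once every contig has a virus hit
        to_drop = set()
        for contig in non_virus:
            to_drop |= set(reads_mapped_to(contig, reads))
        if not to_drop:
            break  # nothing was removed, so another round would change nothing
        reads -= to_drop
    return reads
```

Passing the tools in as functions keeps the sketch independent of how each program is actually invoked.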

De novo assembly

Two methods are currently available: idba-price-method and PlPr-method.
Both methods first perform a simple assembly to generate seed contigs, using the IDBA or Platanus assembler, respectively. The seed contigs are then assembled further with the PriceTI program, which attempts to extend the contig edges and join the contigs.

Homology search / read mapping

A megablast homology search of the assembled contigs is performed against the NCBI NT database. The megablast parameters can be adjusted. The trimmed reads are also mapped back to the assembled contigs.

Retrieving results

Accessing the result

VirusTAP will send an e-mail when the analysis has finished. Users can access the result by clicking the URL in the e-mail or by using the history menu shown below. Users can also access the result by entering its ID in the "Run ID" box.
VirusTAP keeps each analysis for one week.

Downloading the result

All results, including the read mapping result, can be downloaded by clicking the icon shown below.

Registration

To register a new account, click the "register account" link.

Click the "Check" button after filling in the user ID and e-mail address.

When the registration succeeds, messages like the ones below are shown.

A confirmation e-mail containing the password and a link to activate the account is then sent to the registered e-mail address. Follow the link to activate the account.

Please note that the password is sent only in this e-mail and will NOT be sent again.


If the requested user ID is already in use, messages like the ones below are shown. In this case, go back and try another ID.

Flow of analysis

  1. Data uploading.
  2. Quality trimming and adapter removal.
  3. Subtraction of rRNA, Bacteria genomes, and host genome.
  4. Removal of duplicated reads using the FastUniq program.
  5. Removal of non-virus reads.

    De novo assembly: IDBA-UD
    Homology searches: megablast: 1e-10, RAPSearch2: 1e-20, against an in-house virus sequence database*.
    * Sequences originating from viruses are extracted from the nt, nr, and refseq_genome databases. The origin of each sequence is determined from the taxonomy database (ftp://ftp.ncbi.nih.gov/pub/taxonomy).
  6. De novo assembly.
  7. Homology search / read mapping.
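
Step 4 of the flow above removes duplicated reads. The idea for paired-end data can be sketched as below. This is a simplified illustration and not FastUniq's implementation: the function name remove_duplicate_pairs is hypothetical, and only pairs whose two sequences match exactly are treated as duplicates.

```python
def remove_duplicate_pairs(pairs):
    """Keep the first occurrence of each (read 1, read 2) sequence pair.

    `pairs` is an iterable of (name, seq1, seq2) tuples.
    """
    seen = set()
    unique = []
    for name, seq1, seq2 in pairs:
        key = (seq1, seq2)
        if key in seen:
            continue  # both mates match an earlier pair exactly
        seen.add(key)
        unique.append((name, seq1, seq2))
    return unique
```

A pair counts as a duplicate only when both mates match, so two pairs that share read 1 but differ in read 2 are both kept.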