Ray (assembler) #

Structured data

About: Sequence assembly
Code Repository
Programming Language: C++

Ray assembles genomes in parallel using the Message Passing Interface (MPI). It is a sequence assembly program; its metagenomics assembler, Ray Meta, is well known.

Ray targets several applications:

  • de novo genome assembly (with Ray vanilla)
  • de novo meta-genome assembly (with Ray Méta)
  • de novo transcriptome assembly (works, but not tested a lot)
  • quantification of contig abundances
  • quantification of microbiome consortia members (with Ray Communities)
  • quantification of transcript expression
  • taxonomy profiling of samples (with Ray Communities)
  • gene ontology profiling of samples (with Ray Ontologies)
  • compare DNA samples using words (Ray -run-surveyor ...; see Ray Surveyor options)

Installation #

Ray requires an MPI implementation such as Open MPI.

wget http://sourceforge.net/projects/denovoassembler/files/Ray-2.3.1.tar.bz2
bzip2 -d Ray-2.3.1.tar.bz2
tar xvf Ray-2.3.1.tar
cd Ray-2.3.1
make PREFIX=build
make install
cd build
ls Ray

Usage #

mpiexec -n 80 Ray -k 31 -p l1_1.fastq l1_2.fastq -p l2_1.fastq l2_2.fastq -o test
mpiexec -n 80 Ray Ray.conf # with commands in a file
mpiexec -n 80 Ray -k 31 -detect-sequence-files SampleDirectory # auto-detection
mpiexec -n 10 Ray -mini-ranks-per-rank 7 Ray.conf # with mini-ranks
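
The configuration-file form can be sketched as follows. This is a minimal sketch that assumes Ray.conf simply holds the same options that would otherwise be typed on the command line, one option per line; the file contents mirror the first usage line and are illustrative only. The launch command is printed rather than executed.

```shell
# Write the options of the first usage line into Ray.conf
# (assumed layout: one option, with its arguments, per line).
cat > Ray.conf <<'EOF'
-k 31
-p l1_1.fastq l1_2.fastq
-p l2_1.fastq l2_2.fastq
-o test
EOF

# Show the command that would start the assembly with this file.
launch_cmd="mpiexec -n 80 Ray Ray.conf"
echo "$launch_cmd"
```

Keeping the options in a file makes a run easier to repeat and to record alongside its output directory.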

Example #

Prepare data using SRA Toolkit

$ fastq-dump SRR041654 --split-files
$ fastq-dump SRR041655 --split-files
$ fastq-dump SRR041656 --split-files
$ fastq-dump SRR041657 --split-files

Running Ray

$ time mpiexec -n 40 Ray -p ~/temp/SRR041654_1.fastq ~/temp/SRR041654_2.fastq -p ~/temp/SRR041655_1.fastq ~/temp/SRR041655_2.fastq -p ~/temp/SRR041656_1.fastq ~/temp/SRR041656_2.fastq -p ~/temp/SRR041657_1.fastq ~/temp/SRR041657_2.fastq -o test2
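
The command above repeats one -p pair per SRA run, so it can be built with a loop. The following is a minimal sketch, assuming the files written by fastq-dump --split-files sit in ~/temp and are named <accession>_1.fastq and <accession>_2.fastq, as in the command above. The variable names data_dir, pairs, and cmd are hypothetical, and the command is printed rather than executed.

```shell
# Directory holding the FASTQ files (taken from the command above).
data_dir="$HOME/temp"

# Append one "-p left right" pair per accession.
pairs=""
for acc in SRR041654 SRR041655 SRR041656 SRR041657; do
  pairs="$pairs -p $data_dir/${acc}_1.fastq $data_dir/${acc}_2.fastq"
done

# Assemble the full command line and show it.
cmd="mpiexec -n 40 Ray$pairs -o test2"
echo "$cmd"
```

Since no -k option is given, this run uses the default k-mer length.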

Standard output

Contigs >= 100 nt
 Number: 171106
 Total length: 113220842
 Average: 661
 N50: 6245
 Median: 159
 Largest: 134667
Contigs >= 500 nt
 Number: 21007
 Total length: 86625827
 Average: 4123
 N50: 12342
 Median: 1395
 Largest: 134667
Scaffolds >= 100 nt
 Number: 167918
 Total length: 113575942
 Average: 676
 N50: 7922
 Median: 156
 Largest: 158033
Scaffolds >= 500 nt
 Number: 18330
 Total length: 87180713
 Average: 4756
 N50: 15429
 Median: 1463
 Largest: 158033

Rank 0 wrote test2/Contigs.fasta
Rank 0 wrote test2/Scaffolds.fasta
Check for test2/*
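
The N50 values in this report can be cross-checked from the FASTA output. The sketch below uses the common definition of N50: the length L such that sequences of length L or more hold at least half of the total bases. Ray's own computation may differ in detail, and the function names fasta_lengths and n50 are hypothetical.

```shell
# fasta_lengths [FILE]: print the length of each FASTA record, one per line.
# Reads standard input when no file is given.
fasta_lengths() {
  awk '/^>/ { if (n) print n; n = 0; next }
       { n += length($0) }
       END { if (n) print n }' "$@"
}

# n50: read one length per line on standard input and print the N50.
n50() {
  sort -rn | awk '
    { len[NR] = $1; total += $1 }
    END {
      running = 0
      for (i = 1; i <= NR; i++) {
        running += len[i]
        # Stop at the first length where half of the total is covered.
        if (running * 2 >= total) { print len[i]; exit }
      }
    }'
}

# Toy check: lengths 6 5 4 3 2 sum to 20, and 6 + 5 = 11 covers half.
printf '%s\n' 2 3 4 5 6 | n50
```

To mirror the "Contigs >= 500 nt" block, the lengths could be filtered first, for example: fasta_lengths test2/Contigs.fasta | awk '$1 >= 500' | n50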

Time consumption

real    2342m12.977s
user    50699m22.696s
sys     5481m18.176s
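
These figures are easier to read as hours and as an average number of busy cores. The sketch below converts the time output to minutes and divides CPU time (user + sys) by wall-clock time. It assumes that all 40 ranks ran on the machine where time was invoked; the helper name to_minutes is hypothetical.

```shell
# to_minutes VALUE: convert a "time" value such as 2342m12.977s to minutes.
to_minutes() {
  echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 + $2 / 60 }'
}

real=$(to_minutes 2342m12.977s)
user=$(to_minutes 50699m22.696s)
sys=$(to_minutes 5481m18.176s)

# Wall-clock hours, and CPU minutes consumed per wall-clock minute.
awk -v r="$real" -v u="$user" -v s="$sys" 'BEGIN {
  printf "wall-clock hours: %.1f\n", r / 60
  printf "average busy cores: %.1f\n", (u + s) / r
}'
```

A ratio well below the 40 ranks requested would suggest that the ranks spent part of the run waiting rather than computing.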

Important options #

  • -k kmerLength: The length of k-mers (default 21)
  • -run-surveyor: Runs Ray Surveyor to compare samples
  • -disable-recycling: Disables read recycling during the assembly
  • -minimum-seed-length minimumSeedLength: Minimum seed length (default 100)
  • -color-space: Runs in color-space
  • -with-taxonomy Genome-to-Taxon.tsv TreeOfLife-Edge.tsv Taxon-Names.tsv: Computes and writes detailed taxonomic profiles
  • -gene-ontology OntologyTerms.txt Annotations.txt: Provides an ontology and annotations
  • -show-memory-usage: Shows memory usage during the run
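
Several of these options can be combined in one run. The sketch below builds such a command and first checks that the k-mer length is odd, which is an assumption based on the author's reading of Ray's help text rather than something stated on this page. The helper is_odd, the value 200, the file names reads_1.fastq and reads_2.fastq, and the output directory out are all hypothetical; the command is printed rather than executed.

```shell
# is_odd K: succeed when K is an odd positive integer without leading zeros.
is_odd() {
  case "$1" in
    ''|0*|*[!0-9]*) return 1 ;;
  esac
  [ $(( $1 % 2 )) -eq 1 ]
}

k=31
if is_odd "$k"; then
  # Only options listed above are used here.
  cmd="mpiexec -n 40 Ray -k $k -minimum-seed-length 200 -show-memory-usage -p reads_1.fastq reads_2.fastq -o out"
  echo "$cmd"
else
  echo "k-mer length $k is not odd" >&2
fi
```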

Assembly statistics #
