Ray (assembler)
- (rev. 1)
- Hyungyong Kim
Structured data
- About: Sequence assembly
- Code Repository: https://github.com/sebhtml/ray
- Programming Language: C++
- URL: http://denovoassembler.sourceforge.net/index.html
Ray assembles genomes in parallel using the Message Passing Interface (MPI). It is a sequence assembly program; its metagenomics assembler, Ray Meta, is well known.
Ray targets several applications:
- de novo genome assembly (with Ray vanilla)
- de novo meta-genome assembly (with Ray Méta)
- de novo transcriptome assembly (works, but not tested a lot)
- quantification of contig abundances
- quantification of microbiome consortia members (with Ray Communities)
- quantification of transcript expression
- taxonomy profiling of samples (with Ray Communities)
- gene ontology profiling of samples (with Ray Ontologies)
- compare DNA samples using words (Ray -run-surveyor ...; see Ray Surveyor options)
Installation #
Ray requires Open MPI (or another MPI implementation) to build and run.
wget http://sourceforge.net/projects/denovoassembler/files/Ray-2.3.1.tar.bz2
bzip2 -d Ray-2.3.1.tar.bz2
tar xvf Ray-2.3.1.tar
cd Ray-2.3.1
make PREFIX=build
make install
cd build
ls Ray
Usage #
mpiexec -n 80 Ray -k 31 -p l1_1.fastq l1_2.fastq -p l2_1.fastq l2_2.fastq -o test
mpiexec -n 80 Ray Ray.conf # with commands in a file
mpiexec -n 80 Ray -k 31 -detect-sequence-files SampleDirectory # auto-detection
mpiexec -n 10 Ray -mini-ranks-per-rank 7 Ray.conf # with mini-ranks
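The second usage line passes a file instead of individual options. A minimal sketch of such a file follows, assuming Ray reads it as the same options it accepts on the command line, one per line. The input and output names are simply those from the first usage line.

```shell
# Write the options from the first usage line into Ray.conf.
# Assumption: the file holds the same arguments as the command line.
cat > Ray.conf <<'EOF'
-k 31
-p l1_1.fastq l1_2.fastq
-p l2_1.fastq l2_2.fastq
-o test
EOF

# Ray would then be launched with the file instead of the options:
# mpiexec -n 80 Ray Ray.conf
```

Keeping the options in a file makes a run easier to repeat and to keep under version control.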
Example #
Prepare data using SRA Toolkit
$ fastq-dump SRR041654 --split-files
$ fastq-dump SRR041655 --split-files
$ fastq-dump SRR041656 --split-files
$ fastq-dump SRR041657 --split-files
Running Ray
$ time mpiexec -n 40 Ray -p ~/temp/SRR041654_1.fastq ~/temp/SRR041654_2.fastq -p ~/temp/SRR041655_1.fastq ~/temp/SRR041655_2.fastq -p ~/temp/SRR041656_1.fastq ~/temp/SRR041656_2.fastq -p ~/temp/SRR041657_1.fastq ~/temp/SRR041657_2.fastq -o test2
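The command above repeats the same -p pattern for each run. As a sketch, the argument list can be generated from the accession names. The helper name ray_pair_args is hypothetical, and the file naming (ACCESSION_1.fastq, ACCESSION_2.fastq) is the one shown in the command above. Paths containing spaces are not handled.

```shell
# Hypothetical helper: print "-p DIR/ACC_1.fastq DIR/ACC_2.fastq " for each accession.
ray_pair_args() {
  dir=$1
  shift
  for acc in "$@"; do
    printf '%s %s/%s_1.fastq %s/%s_2.fastq ' -p "$dir" "$acc" "$dir" "$acc"
  done
}

# Build the argument list for the four runs used in this example.
args=$(ray_pair_args "$HOME/temp" SRR041654 SRR041655 SRR041656 SRR041657)

# Print the resulting command for inspection rather than running it.
echo "mpiexec -n 40 Ray ${args}-o test2"
```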
Standard output
Contigs >= 100 nt
Number: 171106
Total length: 113220842
Average: 661
N50: 6245
Median: 159
Largest: 134667
Contigs >= 500 nt
Number: 21007
Total length: 86625827
Average: 4123
N50: 12342
Median: 1395
Largest: 134667
Scaffolds >= 100 nt
Number: 167918
Total length: 113575942
Average: 676
N50: 7922
Median: 156
Largest: 158033
Scaffolds >= 500 nt
Number: 18330
Total length: 87180713
Average: 4756
N50: 15429
Median: 1463
Largest: 158033
Rank 0 wrote test2/Contigs.fasta
Rank 0 wrote test2/Scaffolds.fasta
Check the output files in test2/.
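The N50 values in the standard output can be cross-checked from the FASTA files. The following is a sketch using the common definition of N50: the length L such that sequences of length L or more contain at least half of the total bases. It is an assumption that Ray computes its figure the same way. The function names fasta_lengths and n50 are hypothetical.

```shell
# Print the length of every sequence in a FASTA file, one per line.
fasta_lengths() {
  awk '/^>/ { if (n) print n; n = 0; next }
       { n += length($0) }
       END { if (n) print n }' "$1"
}

# Read lengths on stdin, keep those >= the given minimum (default 0),
# and print the N50 of what remains.
n50() {
  min=${1:-0}
  awk -v min="$min" '$1 >= min' | sort -rn | awk '
    { len[NR] = $1; total += $1 }
    END {
      for (i = 1; i <= NR; i++) {
        run += len[i]
        if (run >= total / 2) { print len[i]; exit }
      }
    }'
}

# Usage, with the path from the run above:
# fasta_lengths test2/Contigs.fasta | n50 500
```

The optional minimum mirrors the ">= 100 nt" and ">= 500 nt" groupings in the report.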
Time consumption
real 2342m12.977s
user 50699m22.696s
sys 5481m18.176s
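The figures above are easier to read in hours. The sketch below converts them and divides total CPU time (user + sys) by wall-clock time. The function name to_seconds is hypothetical, and reading the ratio as an average number of busy processes assumes that time accounted for all 40 processes started by mpiexec.

```shell
# Convert a `time` value such as 2342m12.977s into seconds.
to_seconds() {
  echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

real=$(to_seconds 2342m12.977s)
user=$(to_seconds 50699m22.696s)
sys=$(to_seconds 5481m18.176s)

# Report wall-clock hours and the ratio of CPU time to wall-clock time.
awk -v r="$real" -v u="$user" -v s="$sys" 'BEGIN {
  printf "wall-clock hours: %.1f\n", r / 3600
  printf "CPU time / wall-clock time: %.1f\n", (u + s) / r
}'
```

For this run the result is about 39 hours of wall-clock time and a ratio of about 24, against the 40 processes requested with -n 40.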
Important options #
- -k kmerLength: The length of k-mers (default 21)
- -run-surveyor: Runs Ray Surveyor to compare samples
- -disable-recycling: Disables read recycling during the assembly
- -minimum-seed-length minimumSeedLength: The minimum seed length (default 100)
- -color-space: Runs in color-space
- -with-taxonomy Genome-to-Taxon.tsv TreeOfLife-Edge.tsv Taxon-Names.tsv: Computes and writes detailed taxonomic profiles
- -gene-ontology OntologyTerms.txt Annotations.txt: Provides an ontology and annotations
- -show-memory-usage: Shows memory usage
Assembly statistics #