Meta-IDBA
- (rev. 10)
- Hyungyong Kim
Meta-IDBA is an iterative de Bruijn graph de novo sequence assembler designed specifically for metagenomics. One of the most difficult problems in metagenomic assembly is that similar subspecies of the same species mix together, making the de Bruijn graph very complicated and intractable.
Meta-IDBA handles this problem by grouping similar regions of similar subspecies: it partitions the graph into components based on the graph's topological structure. Each component represents a similar region shared by subspecies of the same species, or even by different species. After the components are separated, all contigs in each component are aligned to produce a consensus and a multiple alignment.
Installation #
$ ./configure
$ make
Usage #
Note that Meta-IDBA is no longer maintained; we recommend using IDBA-UD instead, which generally performs better.
IDBA-UD, IDBA-Hybrid and IDBA-Tran require paired-end reads stored in a single FASTA file, with the two reads of each pair placed one after the other. If your reads are in FASTQ format, use fq2fa to convert them into a single FASTA file (--paired for one FASTQ file that already holds both reads of each pair; fq2fa also has a --merge mode for two separate FASTQ files).
$ fq2fa --paired --filter a.fastq a.fasta ## a.fastq holds merged paired-end reads
$ cat a.fasta b.fasta > all.fasta
$ idba_ud -r all.fasta -o idba_out
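Because each pair occupies two consecutive records, a merged FASTA file should contain an even number of records. The following is a minimal sanity-check sketch using only standard tools; the helper names count_fasta_records and is_paired_fasta are hypothetical and not part of IDBA, and an even record count is only a necessary condition, not proof that the pairing is correct.

```shell
# Hypothetical helpers (not part of IDBA).

# Count FASTA records by counting header lines that start with ">".
count_fasta_records() {
    # grep -c exits non-zero when there are no matches, so tolerate that.
    grep -c '^>' "$1" || true
}

# Succeed only if the file has a non-zero, even number of records.
is_paired_fasta() {
    n=$(count_fasta_records "$1")
    n=${n:-0}
    [ "$n" -gt 0 ] && [ $((n % 2)) -eq 0 ]
}
```

For example, `is_paired_fasta all.fasta && idba_ud -r all.fasta -o idba_out` would start the assembly only when the check passes.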
Example #
Get data using SRA Tools
$ fastq-dump SRR041654 --split-spot
$ fastq-dump SRR041655 --split-spot
$ fastq-dump SRR041656 --split-spot
$ fastq-dump SRR041657 --split-spot
Convert FASTQ to FASTA and merge all to one file
$ fq2fa --paired --filter SRR041654.fastq SRR041654.fasta
$ fq2fa --paired --filter SRR041655.fastq SRR041655.fasta
$ fq2fa --paired --filter SRR041656.fastq SRR041656.fasta
$ fq2fa --paired --filter SRR041657.fastq SRR041657.fasta
$ cat SRR041654.fasta SRR041655.fasta SRR041656.fasta SRR041657.fasta > all.fasta
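The download, conversion and merge steps repeat the same commands for each accession, so they can be generated from a list. The sketch below only prints the commands so they can be reviewed first; print_prep_commands is a hypothetical helper name, and the commands it emits are the ones used in this example, one output FASTA file per accession.

```shell
# Hypothetical helper (not part of IDBA or SRA Tools): print the
# download, conversion and merge commands for a list of SRA accessions.
# Review the output, then pipe it to sh to actually run it.
print_prep_commands() {
    merged=""
    for acc in "$@"; do
        echo "fastq-dump $acc --split-spot"
        echo "fq2fa --paired --filter $acc.fastq $acc.fasta"
        merged="$merged $acc.fasta"
    done
    # Concatenate every per-accession FASTA file into one input file.
    echo "cat$merged > all.fasta"
}

print_prep_commands SRR041654 SRR041655 SRR041656 SRR041657
```

Printing instead of executing keeps the sketch safe to try on a machine where fastq-dump and fq2fa are not installed.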
Execute IDBA-UD
$ idba_ud -r all.fasta -o idba_out
Running time of the above on a machine with 24 cores and 296 GB RAM (using 1 CPU):
real 172m34.052s
user 2776m21.814s
sys 69m39.381s
Results
$ ls -lah idba_out
-rw-rw---- 1 isg isg 5.4G 12 1 17:11 align-100-0
-rw-rw---- 1 isg isg 2.9G 12 1 15:21 align-20
-rw-rw---- 1 isg isg 5.0G 12 1 15:56 align-40
-rw-rw---- 1 isg isg 5.3G 12 1 16:21 align-60
-rw-rw---- 1 isg isg 5.4G 12 1 16:43 align-80
-rw-rw---- 1 isg isg 0 12 1 14:22 begin
-rw-rw---- 1 isg isg 119M 12 1 17:00 contig-100.fa
-rw-rw---- 1 isg isg 224M 12 1 15:23 contig-20.fa
-rw-rw---- 1 isg isg 130M 12 1 15:58 contig-40.fa
-rw-rw---- 1 isg isg 123M 12 1 16:23 contig-60.fa
-rw-rw---- 1 isg isg 121M 12 1 16:46 contig-80.fa
-rw-rw---- 1 isg isg 119M 12 1 17:13 contig.fa
-rw-rw---- 1 isg isg 0 12 1 17:13 end
-rw-rw---- 1 isg isg 117M 12 1 16:56 graph-100.fa
-rw-rw---- 1 isg isg 410M 12 1 15:04 graph-20.fa
-rw-rw---- 1 isg isg 263M 12 1 15:46 graph-40.fa
-rw-rw---- 1 isg isg 140M 12 1 16:13 graph-60.fa
-rw-rw---- 1 isg isg 121M 12 1 16:34 graph-80.fa
-rw-rw---- 1 isg isg 8.1G 12 1 14:40 kmer
-rw-rw---- 1 isg isg 199M 12 1 15:28 local-contig-20.fa
-rw-rw---- 1 isg isg 59M 12 1 16:01 local-contig-40.fa
-rw-rw---- 1 isg isg 41M 12 1 16:25 local-contig-60.fa
-rw-rw---- 1 isg isg 33M 12 1 16:47 local-contig-80.fa
-rw-rw---- 1 isg isg 1.6K 12 1 17:13 log
-rw-rw---- 1 isg isg 117M 12 1 17:13 scaffold.fa
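To get a quick summary of an output file such as contig.fa or scaffold.fa, the sequence count, total length and N50 can be computed with standard tools. This is a minimal sketch: fasta_stats is a hypothetical helper name, not something IDBA provides, and N50 is taken here as the length of the sequence at which the running total of lengths, sorted longest first, reaches half of the total.

```shell
# Hypothetical helper (not part of IDBA): print the number of sequences,
# their total length and the N50 of a FASTA file such as contig.fa.
fasta_stats() {
    awk '
        /^>/ { if (len > 0) print len; len = 0; next }  # header: flush previous record
        { len += length($0) }                           # sequence line: accumulate length
        END { if (len > 0) print len }                  # flush the last record
    ' "$1" | sort -rn | awk '
        { lens[NR] = $1; total += $1 }
        END {
            run = 0; n50 = 0
            # Walk from the longest sequence down until half the total is covered.
            for (i = 1; i <= NR; i++) {
                run += lens[i]
                if (run * 2 >= total) { n50 = lens[i]; break }
            }
            printf "sequences=%d total_bp=%d n50=%d\n", NR, total, n50
        }'
}
```

Usage: `fasta_stats idba_out/contig.fa`. Running it on each contig-*.fa file gives a rough view of how the assembly changes from one k value to the next.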
Important options #
Allowed Options:
  -o, --out arg (=out)          output directory
  -r, --read arg                fasta read file (<=128)
      --read_level_2 arg        paired-end reads fasta for second level scaffolds
      --read_level_3 arg        paired-end reads fasta for third level scaffolds
      --read_level_4 arg        paired-end reads fasta for fourth level scaffolds
      --read_level_5 arg        paired-end reads fasta for fifth level scaffolds
  -l, --long_read arg           fasta long read file (>128)
      --mink arg (=20)          minimum k value (<=124)
      --maxk arg (=100)         maximum k value (<=124)
      --step arg (=20)          increment of k-mer of each iteration
      --inner_mink arg (=10)    inner minimum k value
      --inner_step arg (=5)     inner increment of k-mer
      --prefix arg (=3)         prefix length used to build sub k-mer table
      --min_count arg (=2)      minimum multiplicity for filtering k-mer when building the graph
      --min_support arg (=1)    minimum support in each iteration
      --num_threads arg (=0)    number of threads
      --seed_kmer arg (=30)     seed kmer size for alignment
      --min_contig arg (=200)   minimum size of contig
      --similar arg (=0.95)     similarity for alignment
      --max_mismatch arg (=3)   max mismatch of error correction
      --min_pairs arg (=3)      minimum number of pairs
      --no_bubble               do not merge bubble
      --no_local                do not use local assembly
      --no_coverage             do not iterate on coverage
      --no_correct              do not do correction
      --pre_correction          perform pre-correction before assembly
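The defaults --mink 20, --maxk 100 and --step 20 line up with the contig-20.fa through contig-100.fa files in the results listing. The sketch below lists the k values implied by a given setting, assuming k starts at mink and grows by step while it stays at or below maxk. k_values is a hypothetical helper, and how idba_ud behaves when maxk minus mink is not a multiple of step is not covered here.

```shell
# Hypothetical helper (not part of IDBA): list the k values implied by
# --mink, --maxk and --step, assuming k starts at mink and increases by
# step while it is <= maxk. Defaults mirror the idba_ud defaults.
k_values() {
    mink=${1:-20}; maxk=${2:-100}; step=${3:-20}
    # Refuse a non-positive step, which would loop forever.
    [ "$step" -gt 0 ] || return 1
    k=$mink
    while [ "$k" -le "$maxk" ]; do
        echo "$k"
        k=$((k + step))
    done
}

k_values 20 100 20
```

The same three values can be passed explicitly on the command line, for example `idba_ud -r all.fasta -o idba_out --mink 20 --maxk 100 --step 20`.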
Related papers #