Meta-IDBA #

Meta-IDBA is an iterative de Bruijn graph de novo sequence assembler designed specifically for metagenomics. One of the most difficult problems in metagenomic assembly is that similar subspecies of the same species mix together, which makes the de Bruijn graph very complicated and intractable.

Meta-IDBA handles this problem by grouping similar regions of similar subspecies: it partitions the graph into components based on the graph's topological structure. Each component represents a similar region between subspecies of the same species, or even of different species. After the components are separated, all contigs in each component are aligned to produce a consensus and a multiple alignment.

Installation #

$ ./configure
$ make

Usage #

Note that Meta-IDBA is no longer maintained; we recommend using IDBA-UD instead, which generally performs better.

IDBA-UD, IDBA-Hybrid and IDBA-Tran require paired-end reads stored in a single FASTA file, with the two reads of each pair stored consecutively. If your reads are not in this form, use fq2fa to convert the FASTQ read files into a single FASTA file.

$ fq2fa --paired --filter a.fastq a.fasta  ## a.fastq holds interleaved paired-end reads
$ cat a.fasta b.fasta > all.fasta
$ idba_ud -r all.fasta -o idba_out
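
Before starting a long assembly, it can help to confirm that the merged FASTA file at least contains a whole number of read pairs. The following is a minimal sketch under that assumption; check_paired_fasta is a hypothetical helper and not part of the IDBA package, and it only counts records rather than verifying that neighbouring reads really are mates.

```shell
# Sketch: sanity-check that a FASTA file could hold interleaved read pairs.
# check_paired_fasta is a hypothetical helper, not an IDBA tool.
check_paired_fasta() {
    fasta="$1"
    # Count header lines (one per record). grep -c prints 0 and exits
    # non-zero when nothing matches, hence the "|| true".
    n=$(grep -c '^>' "$fasta" || true)
    if [ "${n:-0}" -gt 0 ] && [ $((n % 2)) -eq 0 ]; then
        echo "ok: $n records ($((n / 2)) pairs)"
    else
        echo "warning: ${n:-0} records, not a whole number of pairs"
        return 1
    fi
}
```

For example, `check_paired_fasta all.fasta` could be run right before the idba_ud call.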

Example #

Get data using SRA Tools

$ fastq-dump SRR041654 --split-spot 
$ fastq-dump SRR041655 --split-spot
$ fastq-dump SRR041656 --split-spot
$ fastq-dump SRR041657 --split-spot

Convert FASTQ to FASTA and merge all to one file

$ fq2fa --paired --filter SRR041654.fastq SRR041654.fasta
$ fq2fa --paired --filter SRR041655.fastq SRR041655.fasta
$ fq2fa --paired --filter SRR041656.fastq SRR041656.fasta
$ fq2fa --paired --filter SRR041657.fastq SRR041657.fasta
$ cat SRR041654.fasta SRR041655.fasta SRR041656.fasta SRR041657.fasta > all.fasta
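
The conversion and merge steps can also be written as a loop, which ties each output file name to its accession. This is a minimal sketch: convert_and_merge and the FQ2FA variable are hypothetical names introduced here, and the fq2fa flags are the ones used in the commands above.

```shell
# Sketch: convert each run to its own FASTA file, then append it to one
# merged file. FQ2FA is an assumption of this sketch; it lets the
# converter be swapped out (for example for a dry run).
FQ2FA="${FQ2FA:-fq2fa}"

convert_and_merge() {
    out="$1"
    shift
    : > "$out"                      # start with an empty merged file
    for acc in "$@"; do
        # One FASTA file per accession, named after the accession.
        "$FQ2FA" --paired --filter "$acc.fastq" "$acc.fasta" || return 1
        cat "$acc.fasta" >> "$out"
    done
}
```

A call such as `convert_and_merge all.fasta SRR041654 SRR041655 SRR041656 SRR041657` would then replace the five commands above.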

Execute IDBA-UD

$ idba_ud -r all.fasta -o idba_out

Running time of the above command on a machine with 24 cores and 296 GB RAM (using 1 CPU):

real    172m34.052s
user    2776m21.814s
sys     69m39.381s

Results

$ ls -lah idba_out
-rw-rw---- 1 isg isg 5.4G 12  1 17:11 align-100-0
-rw-rw---- 1 isg isg 2.9G 12  1 15:21 align-20
-rw-rw---- 1 isg isg 5.0G 12  1 15:56 align-40
-rw-rw---- 1 isg isg 5.3G 12  1 16:21 align-60
-rw-rw---- 1 isg isg 5.4G 12  1 16:43 align-80
-rw-rw---- 1 isg isg    0 12  1 14:22 begin
-rw-rw---- 1 isg isg 119M 12  1 17:00 contig-100.fa
-rw-rw---- 1 isg isg 224M 12  1 15:23 contig-20.fa
-rw-rw---- 1 isg isg 130M 12  1 15:58 contig-40.fa
-rw-rw---- 1 isg isg 123M 12  1 16:23 contig-60.fa
-rw-rw---- 1 isg isg 121M 12  1 16:46 contig-80.fa
-rw-rw---- 1 isg isg 119M 12  1 17:13 contig.fa
-rw-rw---- 1 isg isg    0 12  1 17:13 end
-rw-rw---- 1 isg isg 117M 12  1 16:56 graph-100.fa
-rw-rw---- 1 isg isg 410M 12  1 15:04 graph-20.fa
-rw-rw---- 1 isg isg 263M 12  1 15:46 graph-40.fa
-rw-rw---- 1 isg isg 140M 12  1 16:13 graph-60.fa
-rw-rw---- 1 isg isg 121M 12  1 16:34 graph-80.fa
-rw-rw---- 1 isg isg 8.1G 12  1 14:40 kmer
-rw-rw---- 1 isg isg 199M 12  1 15:28 local-contig-20.fa
-rw-rw---- 1 isg isg  59M 12  1 16:01 local-contig-40.fa
-rw-rw---- 1 isg isg  41M 12  1 16:25 local-contig-60.fa
-rw-rw---- 1 isg isg  33M 12  1 16:47 local-contig-80.fa
-rw-rw---- 1 isg isg 1.6K 12  1 17:13 log
-rw-rw---- 1 isg isg 117M 12  1 17:13 scaffold.fa
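
File sizes alone say little about the assembly, so a quick count of contigs and bases is often the next step. The sketch below assumes a plain FASTA file such as contig.fa; fasta_stats is a hypothetical helper name, not an IDBA tool.

```shell
# Sketch: count records and total sequence length in a FASTA file.
# fasta_stats is a hypothetical helper introduced for illustration.
fasta_stats() {
    awk '
        /^>/ { n++; next }            # header line: one more contig
             { total += length($0) }  # sequence line: add its length
        END  { printf "%d contigs, %d bases\n", n, total }
    ' "$1"
}
```

Running `fasta_stats idba_out/contig.fa` and `fasta_stats idba_out/scaffold.fa` side by side gives a rough view of how much scaffolding changed the assembly.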

Important options #

Allowed Options:

-o, --out arg (=out)                   output directory
-r, --read arg                         fasta read file (<=128)
    --read_level_2 arg                 paired-end reads fasta for second level scaffolds
    --read_level_3 arg                 paired-end reads fasta for third level scaffolds
    --read_level_4 arg                 paired-end reads fasta for fourth level scaffolds
    --read_level_5 arg                 paired-end reads fasta for fifth level scaffolds
-l, --long_read arg                    fasta long read file (>128)
    --mink arg (=20)                   minimum k value (<=124)
    --maxk arg (=100)                  maximum k value (<=124)
    --step arg (=20)                   increment of k-mer of each iteration
    --inner_mink arg (=10)             inner minimum k value
    --inner_step arg (=5)              inner increment of k-mer
    --prefix arg (=3)                  prefix length used to build sub k-mer table
    --min_count arg (=2)               minimum multiplicity for filtering k-mer when building the graph
    --min_support arg (=1)             minimum support in each iteration
    --num_threads arg (=0)             number of threads
    --seed_kmer arg (=30)              seed kmer size for alignment
    --min_contig arg (=200)            minimum size of contig
    --similar arg (=0.95)              similarity for alignment
    --max_mismatch arg (=3)            max mismatch of error correction
    --min_pairs arg (=3)               minimum number of pairs
    --no_bubble                        do not merge bubble
    --no_local                         do not use local assembly
    --no_coverage                      do not iterate on coverage
    --no_correct                       do not do correction
    --pre_correction                   perform pre-correction before assembly
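
The --mink, --maxk and --step options together define the k values of the iterations. With the defaults (20, 100, 20) this gives 20, 40, 60, 80 and 100, which matches the contig-20.fa to contig-100.fa files in the Results listing. The sketch below only enumerates such a series; k_values is a hypothetical helper, and how idba_ud treats a --maxk that is not reachable in whole steps is not covered here.

```shell
# Sketch: list the k values implied by mink, maxk and step.
# k_values is a hypothetical helper; the defaults mirror the option list.
k_values() {
    mink="${1:-20}"
    maxk="${2:-100}"
    step="${3:-20}"
    k="$mink"
    while [ "$k" -le "$maxk" ]; do
        printf '%s\n' "$k"
        k=$((k + step))
    done
}
```

The same values can be passed explicitly, for example `idba_ud -r all.fasta -o idba_out --mink 20 --maxk 100 --step 20`.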

Related papers #
