0049 - [Metagenomics] - 16S analysis: a minimal QIIME 1 tutorial, part 2


1. Software installation
  • QIIME 1.9.1: conda install qiime
  • FLASH: conda install flash

2. Preparing the data and files
    # paired-end FASTQ files
    $lt rawdata/
    total 3.8M
    -rw-r--r-- 1 toucan toucan   65 Sep 23  2017 A.mf
    -rw-r--r-- 1 toucan toucan 698K Sep 23  2017 A_1.fastq
    -rw-r--r-- 1 toucan toucan 698K Sep 23  2017 A_2.fastq
    -rw-r--r-- 1 toucan toucan 606K Sep 23  2017 B_1.fastq
    -rw-r--r-- 1 toucan toucan   65 Sep 23  2017 B.mf
    -rw-r--r-- 1 toucan toucan 606K Sep 23  2017 B_2.fastq
    -rw-r--r-- 1 toucan toucan   65 Sep 23  2017 C.mf
    -rw-r--r-- 1 toucan toucan 606K Sep 23  2017 C_1.fastq
    -rw-r--r-- 1 toucan toucan 606K Sep 23  2017 C_2.fastq
    
    # mapping file
    $cat A.mf
    #SampleID   BarcodeSequence LinkerPrimerSequence    Description
    A           A
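The mapping file is tab-separated, which the listing above cannot show. The sketch below writes such a file. `write_mapping` is a hypothetical helper introduced here, not a QIIME command, and it assumes the barcode and primer columns are left empty because the reads are not barcoded. A file written this way is 65 bytes long, the same size as the .mf files in the directory listing.

```shell
# Hypothetical helper (not part of QIIME): write a minimal one-sample
# mapping file with four tab-separated columns. Barcode and primer are
# left empty, which is an assumption that only suits non-barcoded data.
write_mapping() {
    sample="$1"
    out="$2"
    printf '#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tDescription\n' > "$out"
    printf '%s\t\t\t%s\n' "$sample" "$sample" >> "$out"
}

# Example: recreate the three mapping files in a scratch directory.
tmpdir=$(mktemp -d)
for s in A B C; do
    write_mapping "$s" "$tmpdir/$s.mf"
done

# Every line should report exactly 4 tab-separated fields.
awk -F'\t' '{ print FILENAME ": " NF " fields" }' "$tmpdir"/*.mf
```

Checking the field count this way is a quick guard against editors that silently turn tabs into spaces.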

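    Analysis workflow
    Strictly speaking, low-quality sequences should be removed first, and only then should the reads be merged.
    At every step, keep an eye on the input files, the output files and the log files; they are what you will use later to tune the parameters.
    1. Merging the paired-end reads
    Run the commands: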
    mkdir joined
    flash rawdata/A_1.fastq rawdata/A_2.fastq -o A -d joined/ >joined/A.log 2>&1
    flash rawdata/B_1.fastq rawdata/B_2.fastq -o B -d joined/ >joined/B.log 2>&1
    flash rawdata/C_1.fastq rawdata/C_2.fastq -o C -d joined/ >joined/C.log 2>&1
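Since the three FLASH calls differ only in the sample name, they can be generated in a loop. This is a sketch: `run_flash` and the `DRY_RUN` switch are names made up for this example, and by default the function only prints the command so it can be inspected before anything runs.

```shell
# Hypothetical wrapper: build the FLASH command for one sample.
# With DRY_RUN=1 (the default here) the command is printed, not executed.
run_flash() {
    s="$1"
    # Same options as in the commands above; only the sample name varies.
    set -- flash "rawdata/${s}_1.fastq" "rawdata/${s}_2.fastq" -o "$s" -d joined/
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        mkdir -p joined
        "$@" > "joined/$s.log" 2>&1
    fi
}

for sample in A B C; do
    run_flash "$sample"
done
```

Setting DRY_RUN=0 in the environment makes the same loop run the merges and write one log per sample.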
    

    Default parameters:
    -m, --min-overlap=NUM            Default: 10bp.
    -M, --max-overlap=NUM            Default: 65bp.
    -x, --max-mismatch-density=NUM   Default: 0.25.
    -o, --output-prefix=PREFIX       Default: "out".
    -d, --output-directory=DIR       Default: current working directory.
    -t, --threads=NTHREADS           (--threads=1)

    Output files explained:
       - out.extendedFrags.fastq      The merged reads.
       - out.notCombined_1.fastq      Read 1 of mate pairs that were not merged.
       - out.notCombined_2.fastq      Read 2 of mate pairs that were not merged.
       - out.hist                     Numeric histogram of merged read lengths.
       - out.histogram                Visual histogram of merged read lengths.

    The corresponding output files:
    total 904K
    -rw-r--r-- 1 toucan toucan 835K Jun 22 10:09 A.extendedFrags.fastq
    -rw-rw-r-- 1 toucan toucan  343 Jun 22 10:09 A.hist
    -rw-rw-r-- 1 toucan toucan 1.6K Jun 22 10:09 A.histogram
    -rw-rw-r-- 1 toucan toucan 1.5K Jun 22 10:09 A.log
    -rw-r--r-- 1 toucan toucan  26K Jun 22 10:09 A.notCombined_1.fastq
    -rw-r--r-- 1 toucan toucan  26K Jun 22 10:09 A.notCombined_2.fastq

    2. Keep the merged reads together with the mapping file
    mkdir A
    mkdir B
    mkdir C
    mv joined/A.extendedFrags.fastq rawdata/A.mf A/
    mv joined/B.extendedFrags.fastq rawdata/B.mf B/
    mv joined/C.extendedFrags.fastq rawdata/C.mf C/

    3. Quality control of the merged sequences
    Run the commands:
    split_libraries_fastq.py -i A/A.extendedFrags.fastq -m A/A.mf -q 19 --barcode_type not-barcoded --sample_ids A -o A --store_demultiplexed_fastq
    split_libraries_fastq.py -i B/B.extendedFrags.fastq -m B/B.mf -q 19 --barcode_type not-barcoded --sample_ids B -o B --store_demultiplexed_fastq
    split_libraries_fastq.py -i C/C.extendedFrags.fastq -m C/C.mf -q 19 --barcode_type not-barcoded --sample_ids C -o C --store_demultiplexed_fastq
    

    Default parameters:
    -i SEQUENCE_READ_FPS, --sequence_read_fps=SEQUENCE_READ_FPS
    -o OUTPUT_DIR, --output_dir=OUTPUT_DIR
    
    -m MAPPING_FPS, --mapping_fps=MAPPING_FPS   [default: none]
    --sample_ids=SAMPLE_IDS   [default: none]
    -r MAX_BAD_RUN_LENGTH, --max_bad_run_length=MAX_BAD_RUN_LENGTH    [default: 3]
    -p MIN_PER_READ_LENGTH_FRACTION, --min_per_read_length_fraction=MIN_PER_READ_LENGTH_FRACTION   [default: 0.75]
    -n SEQUENCE_MAX_N, --sequence_max_n=SEQUENCE_MAX_N     [default: 0]
    -q PHRED_QUALITY_THRESHOLD, --phred_quality_threshold=PHRED_QUALITY_THRESHOLD    Maximum unacceptable Phred quality score (e.g., for Q20 and better, specify -q 19) [default: 3]
    --barcode_type=BARCODE_TYPE    If data is not barcoded, pass "not-barcoded". [default: golay_12]
    --store_demultiplexed_fastq    Write demultiplexed fastq files [default: False]    # seqs.fna is always written; with this flag seqs.fastq is written as well

    Output files:
    histograms.txt  seqs.fastq  seqs.fna  split_library_log.txt
    
    $cat A/split_library_log.txt
    Input file paths
    Sequence read filepath: A/A.extendedFrags.fastq (md5: 1e302287181676b7a74f16c8b0ddaee8)
    Quality filter results
    Total number of input sequences: 1925
    Barcode not in mapping file: 0
    Read too short after quality truncation: 37
    Count of N characters exceeds limit: 3
    Illumina quality digit = 0: 0
    Barcode errors exceed max: 0
    
    Result summary (after quality filtering)
    Median sequence length: 199.00
    A   1885
    
    Total number seqs written   1885
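The log is easiest to compare across samples as a single number: the fraction of reads that survived filtering. A minimal sketch follows. `qc_yield` is a hypothetical helper, and it assumes the two field labels appear exactly as in the log above.

```shell
# Hypothetical helper: report how many reads passed quality filtering,
# based on the "Total number of input sequences" and
# "Total number seqs written" lines of a split_library_log.txt.
qc_yield() {
    awk '
        /^[ \t]*Total number of input sequences:/ { total = $NF }
        /^[ \t]*Total number seqs written/        { kept = $NF }
        END {
            if (total > 0)
                printf "%d/%d reads kept (%.2f%%)\n", kept, total, 100 * kept / total
        }
    ' "$@"
}

# Usage (reads a file argument or standard input):
#   qc_yield A/split_library_log.txt
```

For sample A the two counts in the log are 1925 and 1885, and the 40 discarded reads match the 37 + 3 reads reported as too short or containing too many N characters.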

    3. Merge all the fna files
    cat */*.fna > seq.fna
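After merging, it is worth confirming that each sample contributed the number of reads its log reported. The sketch below counts FASTA headers per sample. `count_per_sample` is a hypothetical helper, and it assumes headers of the form ">A_0 ...", that is sample ID, underscore, running number, which matches read names such as A_1435 in the OTU map.

```shell
# Hypothetical helper: count reads per sample in a merged FASTA file.
# Assumes each header starts with ">SAMPLE_NUMBER".
count_per_sample() {
    awk '
        /^>/ {
            id = substr($1, 2)        # drop the leading ">"
            sub(/_[0-9]+$/, "", id)   # strip the running number
            n[id]++
        }
        END { for (s in n) print s "\t" n[s] }
    ' "$@" | sort
}

# Usage:
#   count_per_sample seq.fna
```

Stripping only the trailing number keeps the helper usable for sample IDs that themselves contain underscores.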

    4. OTU picking (clustering)
    Run the command:
    pick_otus.py -m uclust -i seq.fna -o otus/uclust_pick_otus

    This clusters the reads de novo with uclust at a similarity threshold of 0.97.
    Default parameters:
    -i, --input_seqs_filepath
    -o, --output_dir
    -m, --otu_picking_method    [default: uclust]
    -r, --refseqs_fp
    Path to reference sequences to search against when using -m blast, -m sortmerna, -m uclust_ref, -m usearch_ref, or -m usearch61_ref [default: /Users/caporaso/.virtualenvs/qiime/lib/python2.7/site-packages/qiime_default_reference/gg_13_8_otus/rep_set/97_otus.fasta]
    
    --sortmerna_e_value
    Maximum E-value when clustering [default = 1]
    --sortmerna_coverage
    Minimum percent query coverage (of an alignment) to consider a hit, expressed as a fraction between 0 and 1 [default: 0.97]
    
    -s, --similarity
    Sequence similarity threshold (for blast, cdhit, uclust, uclust_ref, usearch, usearch_ref, usearch61, usearch61_ref, sumaclust, and sortmerna) [default: 0.97]
    --denovo_otu_id_prefix
    OTU identifier prefix (string) for the de novo OTU pickers (sumaclust, swarm and uclust) [default: denovo, OTU ids are ascending integers]
    -n, --prefix_prefilter_length
    Prefilter data so seqs with identical first prefix_prefilter_length are automatically grouped into a single OTU. This is useful for large sequence collections where OTU picking doesn’t scale well [default: None; 100 is a good value]
    -A, --optimal_uclust
    Pass the --optimal flag to uclust for uclust otu picking. [default: False]
    --word_length
    Word length value for uclust, uclust_ref, and usearch, usearch_ref, usearch61, and usearch61_ref. With default setting, will use the setting recommended by the method (uclust: 8, usearch: 64, usearch61: 8). int value can be supplied to override this setting. [default: default]

    Output files:
    # log file: the parameters used and the number of OTUs found
    $cat otus/uclust_pick_otus/seq_otus.log
    UclustOtuPicker parameters:
    Application:uclust
    Similarity:0.97
    enable_rev_strand_matching:False
    exact:False
    max_accepts:1
    max_rejects:8
    new_cluster_identifier:denovo
    optimal:False
    output_dir:otus/uclust_pick_otus
    prefilter_identical_sequences:True
    presort_by_abundance:True
    save_uc_files:True
    stable_sort:True
    stepwords:8
    suppress_sort:True
    word_length:8
    Num OTUs:1174
    Result path: otus/uclust_pick_otus/seq_otus.txt
    
    # cluster file in uclust .uc format
    $head -n 20 otus/uclust_pick_otus/seq_clusters.uc
    # uclust --input /tmp/UclustExactMatchFilterH_tiMj.fasta --id 0.97 --tmpdir /tmp --w 8 --stepwords 8 --usersort --maxaccepts 1 --stable_sort --maxrejects 8 --uc otus/uclust_pick_otus/seq_clusters.uc
    # version=1.2.22
    # Tab-separated fields:
    # 1=Type, 2=ClusterNr, 3=SeqLength or ClusterSize, 4=PctId, 5=Strand, 6=QueryStart, 7=SeedStart, 8=Alignment, 9=QueryLabel, 10=TargetLabel
    # Record types (field 1): L=LibSeed, S=NewSeed, H=Hit, R=Reject, D=LibCluster, C=NewCluster,N=NoHit
    # For C and D types, PctId is average id with seed.
    # QueryStart and SeedStart are zero-based relative to start of sequence.
    # If minus strand, SeedStart is relative to reverse-complemented seed.
    S   0   429 *   *   *   *   *   QiimeExactMatch.B_51    *
    S   1   199 *   *   *   *   *   QiimeExactMatch.A_135   *
    S   2   205 *   *   *   *   *   QiimeExactMatch.A_39    *
    S   3   205 *   *   *   *   *   QiimeExactMatch.A_18    *
    S   4   429 *   *   *   *   *   QiimeExactMatch.B_39    *
    H   0   429 99.5    +   0   0   429M    QiimeExactMatch.B_0 QiimeExactMatch.B_51
    S   5   205 *   *   *   *   *   QiimeExactMatch.A_389   *
    S   6   199 *   *   *   *   *   QiimeExactMatch.A_90    *
    S   7   430 *   *   *   *   *   QiimeExactMatch.B_46    *
    H   0   429 97.4    +   0   0   429M    QiimeExactMatch.B_22    QiimeExactMatch.B_51
    H   0   429 99.8    +   0   0   429M    QiimeExactMatch.B_56    QiimeExactMatch.B_51
    S   8   205 *   *   *   *   *   QiimeExactMatch.A_491   *
    
    
    # OTU map: each line is an OTU ID followed by the reads assigned to it
    $head otus/uclust_pick_otus/seq_otus.txt
    denovo0 C_211
    denovo1 A_1435
    denovo2 A_181
    denovo3 A_1437
    denovo4 A_1439
    denovo5 A_1301
    denovo6 B_237   B_522
    denovo7 B_230   B_138   B_390
    denovo8 B_547   B_422
    denovo9 B_238   B_524   B_37
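Because each line of the OTU map is an OTU ID followed by its member reads, cluster sizes can be read straight off the field count. A minimal sketch: `otu_map_summary` is a hypothetical helper, and it assumes the file is tab-separated.

```shell
# Hypothetical helper: summarise an OTU map (one OTU per line, first
# field is the OTU ID, remaining tab-separated fields are its reads).
otu_map_summary() {
    awk -F'\t' '
        NF >= 2 {
            otus++
            seqs += NF - 1
            if (NF == 2) singletons++
        }
        END {
            printf "OTUs=%d sequences=%d singletons=%d\n", otus, seqs, singletons
        }
    ' "$@"
}

# Usage:
#   otu_map_summary otus/uclust_pick_otus/seq_otus.txt
```

On the ten lines shown above this gives 10 OTUs holding 16 reads, 6 of them singletons; a high share of singletons is a common reason to revisit the quality filtering step.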

    5. Picking representative sequences
    Run the command:
    pick_rep_set.py -i otus/uclust_pick_otus/seq_otus.txt -f seq.fna -l otus/rep_set/seq_rep_set.log -o otus/rep_set/seq_rep_set.fasta

    Default parameters:
    -s, --sort_by
    Sort by otu or seq_id [default: otu]

    Output file:
    $head otus/rep_set/seq_rep_set.fasta
    >denovo0 C_211
    TAGGGAATTTTCCACAATGGGCGAAAGCCTGATGGAGCAACGCCGCGTGCAGGATGAATGCCTTCGGGTTGTAAACTGCTTTTATTAGTGACGATTATGACGGTAACTAATGAATAAGGACCTGCTAACTACGTGCCAGCAGCCGCGGTCATACGTAGGGTCCAAGCGTTATCCGGAATTACTGGGCGTAAAGAGTTGCGTAGGTGGCTAGGTAAGTAGATAGTGAAAGCGTGTGGCTCAACCATACATCCATTATCTAAACTGTCTGGCTGGAGGATGAGAGAGGTAGATGGAATTTCTGATGTAGGGGTAATATCCGTAGATATCAGAAGGAACACCGATGGCGTAAGCAGTCTACTGGCTCATTCCTGACACTAAGGCACGAGAGCGTGGGGAGCAAACAGG
    >denovo1 A_1435
    TAGGTGGTTCCCTACGGGAGGCAGCAGTAGGGAATCTTCGGCAATGGACGGAAGTCTGACCGAGCAACGCCGCGTGAGGGATGACGGCCTTCGGGTTGTAAACCTCTGTTAGCATCGAAGAAGCGAAAGTGACGGTAGGTGCAGAGAAAGCGCCGGCTAACTACGTGCCAGCAGCCGCGGTAAT
    

    6. Taxonomy assignment
    Run the command:
    assign_taxonomy.py -o otus/rdp_assigned_taxonomy -i otus/rep_set/seq_rep_set.fasta

    Output files:
    # log file
    $head -n  50 otus/rdp_assigned_taxonomy/seq_rep_set_tax_assignments.log
    UclustConsensusTaxonAssigner parameters:
    id_to_taxonomy_filepath:/home/toucan/miniconda3/lib/python2.7/site-packages/qiime_default_reference/gg_13_8_otus/taxonomy/97_otu_taxonomy.txt
    id_to_taxonomy_fp:/home/toucan/miniconda3/lib/python2.7/site-packages/qiime_default_reference/gg_13_8_otus/taxonomy/97_otu_taxonomy.txt
    max_accepts:3
    min_consensus_fraction:0.51
    reference_sequences_fp:/home/toucan/miniconda3/lib/python2.7/site-packages/qiime_default_reference/gg_13_8_otus/rep_set/97_otus.fasta
    similarity:0.9
    unassignable_label:Unassigned
    Result path: /tmp/assign-taxxoM5EE
    
    .uc file contents:
    
    # uclust --input otus/rep_set/seq_rep_set.fasta --id 0.9 --rev --maxaccepts 3 --allhits --libonly --lib /home/toucan/miniconda3/lib/python2.7/site-packages/qiime_default_reference/gg_13_8_otus/rep_set/97_otus.fasta --uc /tmp/UclustConsensusTaxonAssigner__qJYfJ.uc
    # version=1.2.22
    # Tab-separated fields:
    # 1=Type, 2=ClusterNr, 3=SeqLength or ClusterSize, 4=PctId, 5=Strand, 6=QueryStart, 7=SeedStart, 8=Alignment, 9=QueryLabel, 10=TargetLabel
    # Record types (field 1): L=LibSeed, S=NewSeed, H=Hit, R=Reject, D=LibCluster, C=NewCluster, N=NoHit
    # For C and D types, PctId is average id with seed.
    # QueryStart and SeedStart are zero-based relative to start of sequence.
    # If minus strand, SeedStart is relative to reverse-complemented seed.
    L   35928   1437    *   *   *   *   *   238281  *
    H   35928   405 97.0    +   0   0   329I174MI231M702I   denovo0 C_211   238281
    L   86192   1314    *   *   *   *   *   4372149 *
    H   86192   405 95.8    +   0   0   306I405M603I    denovo0 C_211   4372149
    L   46911   1334    *   *   *   *   *   151933  *
    H   46911   405 96.3    +   0   0   247I330MD45MI13M3I16M679I   denovo0 C_211   151933
    N   *   184 *   *   *   *   *   denovo1 A_1435  *
    L   76219   1464    *   *   *   *   *   3991164 *
    H   76219   429 99.1    +   0   0   331I429M704I    denovo10 B_239  3991164
    
    # taxonomy assignments; the hits recorded for an OTU in the .uc section above determine that OTU's taxonomy
    $head -n  50 otus/rdp_assigned_taxonomy/seq_rep_set_tax_assignments.txt
    denovo984   k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Lachnospiraceae; g__; s__   1.00    2
    denovo58    k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Micrococcaceae; g__Rothia; s__dentocariosa    0.67    3

    6. Building the OTU table
    make_otu_table.py -i otus/uclust_pick_otus/seq_otus.txt -t otus/rdp_assigned_taxonomy/seq_rep_set_tax_assignments.txt -o otus/otu_table.biom

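    Output file (make_otu_table.py writes a BIOM file; the text view below assumes it was converted first, for example with
    biom convert -i otus/otu_table.biom -o otus/otu_table_with_taxonomy.txt --to-tsv --header-key taxonomy):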
    $head otus/otu_table_with_taxonomy.txt
    # Constructed from biom file
    #OTU ID C   A   B   taxonomy
    denovo0 1.0 0.0 0.0 k__Bacteria; p__TM7; c__TM7-1; o__; f__; g__; s__
    denovo1 0.0 1.0 0.0 Unassigned
    denovo2 0.0 1.0 0.0 k__Bacteria; p__Proteobacteria; c__Betaproteobacteria; o__Burkholderiales; f__Burkholderiaceae; g__Lautropia; s__
    denovo3 0.0 1.0 0.0 k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Actinomycetaceae; g__Actinomyces; s__
    denovo4 0.0 1.0 0.0 k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__Prevotellaceae; g__Prevotella; s__
    denovo5 0.0 1.0 0.0 k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__[Mogibacteriaceae]; g__; s__
    denovo6 0.0 0.0 2.0 k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacteriales; f__Enterobacteriaceae; g__Morganella
    denovo7 0.0 0.0 3.0 k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Pseudomonadales; f__Pseudomonadaceae; g__Pseudomonas; s__
    

    7. Building the phylogenetic tree
    Sequence alignment
    Alignment can be done with either the muscle method or the pynast method.
  • align_seqs.py – Align sequences using a variety of alignment methods

  • Run the command:
    align_seqs.py -i otus/rep_set/seq_rep_set.fasta -o otus/pynast_aligned_seqs

    Output files:
    # contents of the output directory
    $ls otus/pynast_aligned_seqs/
    seq_rep_set_aligned.fasta  seq_rep_set_failures.fasta  seq_rep_set_log.txt
    
    # aligned sequences; each "-" is a gap
    $head -n 2 otus/pynast_aligned_seqs/seq_rep_set_aligned.fasta
    >denovo0 C_211 1..405
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------TA---GG-G-A-ATT-TTCCA-C-AA-T-GG--GC-GA-A----A-G-CC-T-GA-TG-GA-GCAA-CGCC-G-CG-T---G-C-A--G--GA-T-G--A--A-T-G-CC-----TT-CG---------G-G-T-T-G-T--A---AA-C-TGC--------TT-TT-A-T--T-AGT----GA-C--G---A-----------------------T--TA------------------------------T-GA-CG-GT-A-A-CT-A-AT-G---------AA-----------TAAGG-ACC-TG-C-TAA---C--T-ACGT--GCCA--G-C---A--GCCG---C-GG--TC-AT--AC---GT-AG-GGT-CCA-A-G-CG-TTAT-C-CGG-AA-TT-A--C-T--GGGC-GTA----AA-GAGTTGC--G-TA-G-G-T-G------------G--C-TA-G-G-T-AA----G-T-A-G---A-TAG-TG-A-AA-GC--GTGT-G-G--------------------------------------------------------------------CT-C-AA-------------------------------------------------------------------------CC-A-TACA-TC-C------A-T-T-A-T--------C--TA-A-A-C-T-G-TCT--G-G-C--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    
    # sequences that failed to align
    $head -n 2 otus/pynast_aligned_seqs/seq_rep_set_failures.fasta
    >denovo1147 A_996
    ATTACCGCGGCTGCTGGCACCAGACTTTCCCTCCAATGGATCCCCATTAAAAGATTTAAAGTGGACTCATTCCAATTACAGGGCCTTGAAAGAGTCCTGTATTGTTATTTTTTGTCACTACCTCCCGTGTCAGGAGTGGGTAATTTGAGTGTCTGCTGCCTCCCGTAGGG
    
    # log file
    $head  otus/pynast_aligned_seqs/seq_rep_set_log.txt
    candidate sequence ID   candidate nucleotide count  errors  template ID BLAST percent identity to template    candidate nucleotide count post-NAST
    denovo0 C_211   405     186763  87.20   405
    denovo1 A_1435  184     202319  84.80   184
    ...
    denovo999 C_335 404     926588  92.60   404
    PyNastAligner parameters:
    Algorithm:NAST
    Application:PyNAST
    blast_db:None
    min_len:153 # minimum sequence length [default: 75% of the median input sequence length]
    min_pct:75.0 # minimum percent identity to the closest template hit
    pairwise_alignment_method:uclust
    template_filepath:/home/toucan/miniconda3/lib/python2.7/site-packages/qiime_default_reference/gg_13_8_otus/rep_set_aligned/85_otus.pynast.fasta # template alignment

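    Reference sequences:
    What do the "-" characters in the reference sequence mean, and why are there so many of them? Each "-" is a gap: the template file is a multiple sequence alignment, so every sequence is padded with gaps at the columns where it has no base.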
    $head -n 2 /home/toucan/miniconda3/lib/python2.7/site-packages/qiime_default_reference/gg_13_8_otus/rep_set_aligned/85_otus.pynast.fasta
    >1111561
    ------------------------------------------------------------------------------------------------------------AGAGTTT-GA--T-CC-T-G-GCTC-AG-AT-TGAA-C-GC--TGG-C--G-GC-A-CG--C----C-T--AACACA-T-GC-A-AGT-CGA-A-CG----------G-CAG-CG-G-----------------------------GGGAA-AG----------------------------------------------------CTT-G---------------------------------------------------------------------------------CTTT-CCT-----------------G-CC--G--GC----GAG-T-GG-C-GG-A--C-------------GGG-TGAGT-A--AT-GC-G-T-A-GG---A-A--T-TT-G--C-C-ATT--AA-G------------------------------------------------------------------A-GG----GGG-AC-AA-CTC-------------------------G-G-G-----------------------GAA-A---CTC-GAG-CTAA-TA---CC-A--C-AT-A----------A--------------------T-------------------------------------CT-C-----------------------------------------------------------------------------------------------------------------------T-TC-G--------------------------------------------------------------------------------------------------------------------------------------G-A-G---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------CAAA--G-A-A-GG-----G--GATT--C--------------------------------------------------------------------------------------------------------------------TTC-G----------------------------------------------------------------------------------------------------------------------GAA--CC-TT--T---C-G--------------C----T-T---A-AT-G---AG---A-A-----G-CCT-ACG--T-TGG--A------TT--A--G-CT-T----G---TTGG-T-G-GG-G-T----AAA-GG-C-T-C-ACCA--A-GG-C-G--A-TG-A------------TCT-A-T------AG-CT-G-G-TCT-G-AG----A--GG-AT--G-AT-C-AG-CCAC-A-CTGGA--A-C-TG-A-GA-C-AC-G-G-TCCAGA-CTCC-TAC-G--G-G-A-G-GC-A-GC-A-G-TG---AG-G-A-ATT-TTGGA-C-AA-T-GG--GG-GA-A----A-C-CC-T-GA-TC-CA-GCGA-TGCC-G-CG-T---G-T-G--T--GA-
A-G--A--A-G-G-CC-----TA-AG---------G-G-T-T-G-T--A---AA-G-CAC--------TT-TT-A-G--T-GAG----GA-A--G---AG-AGTA---AGTC-GG----T--T--AA-T---A----C-----CC-G-GCT-TGC-AA-GA-CG-TT-A-C-TC-A-CA-G---------AA-----------AAAGC-GCC-GG-C-TAA---C--T-CTGT--GCCA--G-C---A--GCCG---C-GG--TA-AT--AC---AG-AG-GGT-GCA-A-G-CG-TTAA-T-CGG-AT-TG-A--C-T--GGGC-GTA----AA-GGGC-GC--G-TA-G-G-C-G------------G--T-AA-G-A-T-AA----G-T-C-A---G-ATG-TT-A-AA-AA--CC-CGA-G--------------------------------------------------------------------CT-C-AA-------------------------------------------------------------------------CT-T-G-GG-GA-C----T-G-C-A-T-T--------T--GA-A-A-C-T-A-TCT--C-A-C---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------T-A-G-A-G-T-A-----C-AG--TA-G-A------------G-GA-G-AG-C----GG--AATT-TCC-G-GT--GT-A-GCG-GTGAAA-TG-CGT-AGAT-A-TC-G-GAA--GG-A-AC-A-CC-AG--T--G--GC-GAA-G--G-C---G----G--C-T-CTCTG------G-AC-TG--------------------------------------------------------------AC-A-C-T--GA--CG-----CT-GA-G
G--C-G-CGA--AA-G-C--------------G-TGGG-GAG-C-A-AACA--GG-ATTA-G-ATA-C-----CC-T-G-GTA-G-T----C-CA--C-G-CTG-T-AAA--C-GATG-AG--AA-CT---------A-GC--T--G-T-TG-G-TA-C--G---------------------------------------------------------------------------------------TT-TA----------------------------------------------------------------------------------------------------------------------------------------------------G-T-AT--C-A-G-T-AG-C------GC--A----GC-TAA--CG-C-G-T--T--AA-GT--T----C-TCC-GCC-T-G-GG-GAT-TA---CGG-----T-C--G-C-A-A-GAC-T--AAA-ACTC-AAA---------GGAA-TTG-ACGGG-G-G-CCCG----C-A--C-A-A-GCG-GT-G--G--AG-CA-T--GC-GGT-TT-AATT-C-G-ATG-CAAC-C-CG-A-AA-A-A-CC-TT-A-CC-TACCC-TT-G-AC-A-T-C--------------CCG-C-G-------------A-AG-C-C-T--GT--A-GA-G-A-T--A-C-G--G-G-C-G--T-G---CTCG-------------------------------------A--AA-G------------------------------------------AG----A----A---CG-CGG---T--GA---------------------------------------------------C-A-G-G-T-GCTG-CA-TGG-CT--GTC-GTC-A-GC-TC---G-TG-TC-G--TGA-GA-TGT-T-GG-G-TT-AA-GT-CCCGT-AA--------C-GAG-CGC-A-ACC-C-T-TG--TC--C-TTAG--T-T-G-C-C---AT-C-T--A---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------CATTAG---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------T----A-G------------G----G---A-A--CT---------------C-T-A-A-G-GA-G--AC-T-G-CCG--G-C------------------------------------G-A---TAA----------------------------------G-T-C-G--G-A-GG-A--AGG-T--GGGG-A-CGAT-GTC--AAGT-C---ATC-A-T-G-G-C-C-TTT----AT-G--GG-T-A-GG-GC-TA-CAC-GCGTG-C--TA--CAATG---G-GCAG-T-A--C-AAA-GG-GA--------------------------------------------------------
------------------------------------------A-G-C-G-A--A-GCTG-T--G---------------------------------------A-AG-T-G-----------G--A-G-CA---A----------A--CCT-C------A-G-AAAGC-TG-C-T-C-G-TAA-TCC--------GGA-T-TGAAG-TC--T-GCAA-CT-C-------------------------------------------------------------------------------------------------G-ACTTC-A-T-G-AG-G-TT-GGAAT-CG-C-TA--G-TA-AT-C-G-C----AGA-TC-A-G-C-------AT--GCT-GC-G-GT-G-AAT-ACGT-T-CCCGGGCCT-TGTA----CACACCG-CCC-GTC-----A---CA--CCA-TG-GA-A--G---TGG-G-TT-GT-ACC--A-GAA------G--T-AGG-AG-A-G-C-T-AA-C-C-------------------------------------------------------------T-TC-G------------------------------------------------------------------------------------------------------GG-A--GG-C--A---TC-TTA--CC--ACG-G----T-ATG-AT-TCA------------------------TG--ACT-GGGG-TG-AAG-TCGTAACAA-GGTA--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    

    Filtering highly variable positions after alignment
  • filter_alignment.py – Filter sequence alignment by removing highly variable regions

  • Run the command:
    filter_alignment.py -i otus/pynast_aligned_seqs/seq_rep_set_aligned.fasta -o otus/filtered_alignment

    Default parameters:
    -g, --allowed_gap_frac
    Gap filter threshold, filters positions which are gaps in > allowed_gap_frac of the sequences [default: 0.999999]

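    Output file: in effect, the gap columns (the "-" positions) are stripped from the aligned sequences and the remaining bases are joined together; highly variable positions are removed as well.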
    $head otus/filtered_alignment/seq_rep_set_aligned_pfiltered.fasta
    >denovo0
    ----------------------------------------------TAGGGAATTTTCCACAATGGGCGAAAGCCTGATGGAGCAACGCCGCGTGCAGGATGAATGCCTTCGGGTTGTAAACTGCTTTTATTAGTGACGCGGTAACTAATGAATAAGGACCTGCTAACTACGTGCCAGCAGCCGCGGTCATACGTAGGGTCCAAGCGTTATCCGGAATTACTGGGCGTAAAGAGTGCGTAGGTGGCTAGGTAAGTAGATAGTGAAAGCGTT-GGCTCAACCATCATCC-ATTATCTAAACTGTCTGGCTGGAGGATGAGAGAGGTAGATGGAATTTCTGATGTAGGGGTAATATCCGTAGATATCAGAAGGAACACCGATGGCGTAAGCAGTCTACTGGCTC-ATCCTGACACTAAGGCACGAGAGCGTGGGGAGCAAACAGG-----
    >denovo1
    -------------------TAGGTGGTTCCCTACGGGAGGCAGCAGTAGGGAATCTTCGGCAATGGACGGAAGTCTGACCGAGCAACGCCGCGTGAGGGATGACGGCCTTCGGGTTGTAAACCTCTGTTAGCATCGAAGCGGTAGGTGCAGAGAAAGCGCCGGCTAACTACGTGCCAGCAGCCGCGGTAAT-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    >denovo10
    ----------------------------------------------TAGGGAATCTTCCGCAATGGACGAAAGTCTGACGGAGCAACGCCGCGTGAGCGAAGAAGGCCTTCGGGTCGTAAAGCTCTGTTGTTAGGGAAGCGGTACCTAACGAGAAAGCCACGG-TAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGAATTATTGGGCGTAAAGCGCGCGCAGGCGGTTCCTTAAGTCTGATGTGAAAGCCCACGGCTCAACCGTGGAG-GGCATTGGAAACTGGGGGACTTGAGTGCAGAAGAGGAGAGCGGAATTCCACGTGTAGCGGTGAAATGCGTAGAGATGTGGAGGAACGCCAGTGGTGAAGGCGGCTCTCTGGTCTGTAACTGACGCTGAGGCGCGAAAGCGTGGGGAGCAAACAGG-----
    

    Building the tree file
    Run the command:
    make_phylogeny.py -i otus/filtered_alignment/seq_rep_set_aligned_pfiltered.fasta -o otus/rep_set.tre

    Default parameters:
    -t, --tree_method
    Method for tree building. Valid choices are: clustalw, raxml_v730, muscle, fasttree, clearcut [default: fasttree]

    Output file:
    ((denovo711:0.00882,denovo816:0.03528)1.000:0.22701,((denovo168:0.23214,((denovo91:0.10067,denovo...

    8. Generating the web summary report
    cat */*.mf >all.mf
    summarize_taxa_through_plots.py -i otus/otu_table.biom -o taxa/ -m all.mf
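Concatenating with cat copies the header line of every .mf file into all.mf, so the "#SampleID" line appears three times. A variant that keeps a single header is sketched below. `merge_mappings` is a hypothetical helper, and it assumes all mapping files share the same columns.

```shell
# Hypothetical helper: merge mapping files but keep only the first
# header line. Assumes every input file starts with the same header.
merge_mappings() {
    # FNR == 1 is the first line of each file; NR != 1 excludes the
    # very first line overall, so only later headers are skipped.
    awk 'FNR == 1 && NR != 1 { next } { print }' "$@"
}

# Usage:
#   merge_mappings */*.mf > all.mf
```

The result has one header followed by one line per sample.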

    9. OTU table count statistics
    Run the commands:
    biom summarize-table -i otus/otu_table.biom -o summary_sample_count.txt
    biom summarize-table -i otus/otu_table.biom -o summary_out_count.txt --qualitative

    Parameters:
    # count OTUs per sample instead of sequences per sample
      --qualitative         Present counts as number of unique observation ids per
                            sample, rather than counts of observations per sample.
    

    Output:
    # sequences per sample
    $cat summary_sample_count.txt
    Num samples: 3
    Num observations: 1,174
    Total count: 3,003
    Table density (fraction of non-zero values): 0.350
    
    Counts/sample summary:
     Min: 507.000
     Max: 1,885.000
     Median: 611.000
     Mean: 1,001.000
     Std. dev.: 626.523
     Sample Metadata Categories: None provided
     Observation Metadata Categories: taxonomy
    
    Counts/sample detail:
    C: 507.000
    B: 611.000
    A: 1,885.000
    
    # OTUs per sample
    $cat summary_out_count.txt
    Num samples: 3
    Num observations: 1,174
    
    Observations/sample summary:
     Min: 203.000
     Max: 688.000
     Median: 341.000
     Mean: 410.667
     Std. dev.: 204.036
     Sample Metadata Categories: None provided
     Observation Metadata Categories: taxonomy
    
    Observations/sample detail:
    B: 203.000
    C: 341.000
    A: 688.000

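    9. Rarefaction (evening out the sampling depth)
    Rarefy all samples to the same depth: insufficient or uneven sequencing depth affects both alpha and beta diversity.
    In the summary above, the smallest per-sample count is Min: 507.000, so the depth is set just below it, at 500. Run the command: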
    single_rarefaction.py -i otus/otu_table.biom -o otus/otu_table_even.biom -d 500

    Default parameters:
      -k, --keep_empty_otus
                            Retain OTUs of all zeros, which are usually omitted
                            from the output OTU tables. [default: False]
      --subsample_multinomial
                            subsample using subsampling with replacement [default:
                            False]
      -d DEPTH, --depth=DEPTH
                            Number of sequences to subsample per sample.
                            [REQUIRED]
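The depth of 500 was chosen by reading the minimum off the summary by eye. The same number can be pulled out of the text summary, as sketched here. `min_sample_count` is a hypothetical helper, and it assumes the "Counts/sample detail:" layout shown in summary_sample_count.txt.

```shell
# Hypothetical helper: print the smallest per-sample count listed under
# "Counts/sample detail:" in a `biom summarize-table` text summary.
min_sample_count() {
    awk '
        /^[ \t]*Counts\/sample detail:/ { in_detail = 1; next }
        in_detail && NF == 2 {
            v = $2
            gsub(/,/, "", v)      # "1,885.000" -> "1885.000"
            v += 0                # force numeric
            if (min == "" || v < min) min = v
        }
        END { if (min != "") printf "%d\n", min }
    ' "$@"
}

# Usage:
#   min_sample_count summary_sample_count.txt
```

Any depth at or below this value keeps all three samples; a larger depth would drop the samples that fall short of it.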

    10. Alpha diversity analysis
    Run the commands:
    mkdir alpha_diversity
    alpha_diversity.py -i otus/otu_table_even.biom -m observed_species,shannon,simpson,ace,chao1 -o alpha_diversity/alpha_div.txt -t otus/rep_set.tre
    

    Default parameters:
    -m, --metrics
    Alpha-diversity metric(s) to use. A comma-separated list should be provided when multiple metrics are specified. [default: PD_whole_tree,chao1,observed_otus]
    -t, --tree_path
    Input newick tree filepath. [default: None; REQUIRED for phylogenetic metrics]

    Output file:
    $cat alpha_diversity/alpha_div.txt
        observed_species    shannon          simpson     ace              chao1
    C   336.0               8.03497684336    0.994192    1264.42582826    1079.2826087
    A   269.0               7.52951805111    0.99112     799.221180773    687.523809524
    B   175.0               5.54361960505    0.916264    654.523124747    613.9
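To make the shannon column less of a black box, the index can be computed by hand from a list of OTU counts. The sketch below is an independent illustration, not QIIME's code. `shannon` is a hypothetical helper, and it assumes a base-2 logarithm, which is consistent with the table: sample C has 336 observed OTUs, and log2(336) is about 8.39, an upper bound that its value of 8.03 respects.

```shell
# Hypothetical helper: Shannon index H = -sum(p_i * log2(p_i)) for the
# counts given on standard input, one count per line. Zero counts are
# skipped because they contribute nothing to the sum.
shannon() {
    awk '
        $1 > 0 { c[++n] = $1; total += $1 }
        END {
            if (total == 0) exit 1
            for (i = 1; i <= n; i++) {
                p = c[i] / total
                h -= p * log(p) / log(2)   # awk log() is natural log
            }
            printf "%.4f\n", h
        }
    '
}

# Four equally abundant OTUs give the maximum for 4 OTUs, log2(4) = 2.
printf '1\n1\n1\n1\n' | shannon
```

An evenly spread community reaches log2 of its OTU count, so the gap between that bound and the reported value reflects how uneven the sample is.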
    

    11. Beta diversity analysis
    Run the command:
    beta_diversity_through_plots.py -i otus/otu_table_even.biom -m all.mf -o PCoA -t otus/rep_set.tre

    Default parameters:
    -t TREE_FP, --tree_fp=TREE_FP
                            path to the tree file [default: none; REQUIRED for
                            phylogenetic measures]
    

    Metagenomics analysis resources and tools
    omictools
    https://omictools.com/metagenomics-category
    metAMOS
    http://marbl.github.io/metAMOS/