Pythonは複数行のテキストの問題を処理します--簡単な方法で複数行のfastaファイルを読みます
6301 ワード
fastaシーケンスを処理する場合、次のように1つのシーケンスが複数行に配列されることがよくあります.
私の簡単な処理方法は、「全体読み込み-->セパレータをリストに分割-->文字列マージリスト」です.コードは次のとおりです.
印刷結果:
転載先:https://www.cnblogs.com/xlij1205/p/10504418.html
$cat test.fasta
>test_1
TGGGGAATCTTGGACAATGGGGGCAACCCTGATCCAGCCATGCCGCGTGAGCGATGAAGGCCTTAGGGTTGTAAAGCTCT
TTCAGCTGGGAAGATAATGACGGTACCAGCAGAAGAAGCCCCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGG
GGCTAGCGTTGTTCGGAATTACTGGGCGTAAAGCGCGCGTAGGCGGATTGTTAAGTCGGGGGTGAAATCCCGGGGCTCAA
CCCCGGAACTGCCTCCGATACTGGCAATCTTGAGATCGAGAGAGGTGAGTGGAATTCCGAGTGTAGAGGTGAAATTCGTA
GATATTCGGAGGAACACCAGTGGCGAAGGCGGCTCACTGGCTCGATACTGACGCTGAGGTGCGAAAGCGTGGGGAGCAAA
CAGG
>test_3
TGGGGAATATTGGACAATGGGGGCAACCCTGATCCAGCAATGCCGCGTGTGTGAAGAAGGCCTGCGGGTTGTAAAGCACT
TTCAGTAGAGAAGAAATGCCCATGGTTAATACCCGTGGGTCTTGACGTAACCTACAGAAGAAGCACCGGCTAACTCCGTG
CCAGCAGCCGCGGTAATACGGAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGCGCGTAGGCGGTTTGGTCAG
TCGGATGTGAAAGCCCTAGGCTCAACCTGGGAATGGCATTCGATACTGCCTGACTAGAGTATGGTAGAGGGAAGTGGAAT
TTCCGGTGTAGCGGTGAAATGCGTAGATATCGGAAGGAACACCAGTGGCGAAGGCGACTTCCTGGGCCAATACTGACGCT
GAGGTGCGAAAGCGTGGGGAGCAAACAGG
>test_4
TGGGGAATTTTGGGCAATGGGCGAAAGCCTGACCCAGCAACGCCGCGTGGAGGATGAAGGCCCTCGGGTCGTAAACTCCT
GTCCTAGGGGAAGAAAAAAATGACGGTACCCTTGGAGGAAGCCCCGGCTAACTCCGTGCCAGCAGCCGCGGTAAGACGGG
GGGGGGGGAGCGGTGTTCGGAATTACTGGGCGTAAAGGGCGCGCAGGCGGCCTGGGAAGTCTTGGGTGAAAGCCCCCAGC
TCAACTGGGGAATGGCCTGAGAAACCACTAGGCTGGAGTGCTGGAGAGGGAAGCGGAATTCCCGGTGGAGCGGTGAAATG
CGTAGATATCGGGAGGAACACCAGAGGCGAAGGCGGCTTCCTGGACAGACACTGACGCTGAGGCGCGAAAGCTAGGGGAG
CAAACGGG
>test_5
TGGGGAATATTGGACAATGGGCGCAAGCCTGATCCAGCCATGCCGCGTGAGTGATGAAGGCCCTAGGGTTGTAAAGCTCT
TTCACCGGTGAAGATAATGACGGTAACCGGAGAAGAAGCCCCGGCTAACTTCGTGCCAGCAGCCGCGGTAATACGAAGGG
GGCTAGCGTTGTTCGGATTTACTGGGCGTAAAGCGCACGTAGGCGGACTATTAAGTCAGGGGTGAAATCCCGGGGCTCAA
CCCCGGAACTGCCTTTGATACTGGTAGTCTTGAGTTCGAGAGAGGTGAGTGGAATTCCGAGTGTAGAGGTGAAATTCGTA
GATATTCGGAGGAACACCAGTGGCGAAGGCGGCTCACTGGCTCGATACTGACGCTGAGGTGCGAAAGCGTGGGGAGCAAA
CAGG
>test_6
GGAATATTGCACAATGGGCGAAAGCCTGATGCAGCGACACCGCGTGCGGGATGAAGGCCCTCGGGTTGTAAACCGCTTTC
AGGAGGGACGAAAATGACGGTACCTCCAGAAGAAGGCCCGGCCAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGCC
AAACGTTGTCCGGATTTATTGGGCGTAAAGGGCTCGTAGGCGGTTCAACAAGTCGATCGTGAAAGCCCGGGGCTCAACCC
CGGGACGCCGGTCGAAACTGTTGTGACTAGGGTCCGGTAGAGGTGAGTGGAATTCTCGGTGTAGCGGTGGAATGCGCAGA
TATCGAGAGGAACACCAGTTGCGAAGGCGGCTCACTGGGCCGGTACCGACGCTAAGGAGCGAAAGCGTGGGGAGCAAACA
GG
私の簡単な処理方法は、「全体読み込み-->セパレータをリストに分割-->文字列マージリスト」です.コードは次のとおりです.
seq_file=open("test.fasta")
seq_list=seq_file.read().split(">")
for seq in seq_list :
if seq :
seq_name=seq.split("
")[0]
seq_fa="".join(seq.split("
")[1:])
print ">" + seq_name + "
" + seq_fa
印刷結果:
>test_1
TGGGGAATCTTGGACAATGGGGGCAACCCTGATCCAGCCATGCCGCGTGAGCGATGAAGGCCTTAGGGTTGTAAAGCTCTTTCAGCTGGGAAGATAATGACGGTACCAGCAGAAGAAGCCCCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGGGCTAGCGTTGTTCGGAATTACTGGGCGTAAAGCGCGCGTAGGCGGATTGTTAAGTCGGGGGTGAAATCCCGGGGCTCAACCCCGGAACTGCCTCCGATACTGGCAATCTTGAGATCGAGAGAGGTGAGTGGAATTCCGAGTGTAGAGGTGAAATTCGTAGATATTCGGAGGAACACCAGTGGCGAAGGCGGCTCACTGGCTCGATACTGACGCTGAGGTGCGAAAGCGTGGGGAGCAAACAGG
>test_3
TGGGGAATATTGGACAATGGGGGCAACCCTGATCCAGCAATGCCGCGTGTGTGAAGAAGGCCTGCGGGTTGTAAAGCACTTTCAGTAGAGAAGAAATGCCCATGGTTAATACCCGTGGGTCTTGACGTAACCTACAGAAGAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGCGCGTAGGCGGTTTGGTCAGTCGGATGTGAAAGCCCTAGGCTCAACCTGGGAATGGCATTCGATACTGCCTGACTAGAGTATGGTAGAGGGAAGTGGAATTTCCGGTGTAGCGGTGAAATGCGTAGATATCGGAAGGAACACCAGTGGCGAAGGCGACTTCCTGGGCCAATACTGACGCTGAGGTGCGAAAGCGTGGGGAGCAAACAGG
>test_4
TGGGGAATTTTGGGCAATGGGCGAAAGCCTGACCCAGCAACGCCGCGTGGAGGATGAAGGCCCTCGGGTCGTAAACTCCTGTCCTAGGGGAAGAAAAAAATGACGGTACCCTTGGAGGAAGCCCCGGCTAACTCCGTGCCAGCAGCCGCGGTAAGACGGGGGGGGGGGAGCGGTGTTCGGAATTACTGGGCGTAAAGGGCGCGCAGGCGGCCTGGGAAGTCTTGGGTGAAAGCCCCCAGCTCAACTGGGGAATGGCCTGAGAAACCACTAGGCTGGAGTGCTGGAGAGGGAAGCGGAATTCCCGGTGGAGCGGTGAAATGCGTAGATATCGGGAGGAACACCAGAGGCGAAGGCGGCTTCCTGGACAGACACTGACGCTGAGGCGCGAAAGCTAGGGGAGCAAACGGG
>test_5
TGGGGAATATTGGACAATGGGCGCAAGCCTGATCCAGCCATGCCGCGTGAGTGATGAAGGCCCTAGGGTTGTAAAGCTCTTTCACCGGTGAAGATAATGACGGTAACCGGAGAAGAAGCCCCGGCTAACTTCGTGCCAGCAGCCGCGGTAATACGAAGGGGGCTAGCGTTGTTCGGATTTACTGGGCGTAAAGCGCACGTAGGCGGACTATTAAGTCAGGGGTGAAATCCCGGGGCTCAACCCCGGAACTGCCTTTGATACTGGTAGTCTTGAGTTCGAGAGAGGTGAGTGGAATTCCGAGTGTAGAGGTGAAATTCGTAGATATTCGGAGGAACACCAGTGGCGAAGGCGGCTCACTGGCTCGATACTGACGCTGAGGTGCGAAAGCGTGGGGAGCAAACAGG
>test_6
GGAATATTGCACAATGGGCGAAAGCCTGATGCAGCGACACCGCGTGCGGGATGAAGGCCCTCGGGTTGTAAACCGCTTTCAGGAGGGACGAAAATGACGGTACCTCCAGAAGAAGGCCCGGCCAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGCCAAACGTTGTCCGGATTTATTGGGCGTAAAGGGCTCGTAGGCGGTTCAACAAGTCGATCGTGAAAGCCCGGGGCTCAACCCCGGGACGCCGGTCGAAACTGTTGTGACTAGGGTCCGGTAGAGGTGAGTGGAATTCTCGGTGTAGCGGTGGAATGCGCAGATATCGAGAGGAACACCAGTTGCGAAGGCGGCTCACTGGGCCGGTACCGACGCTAAGGAGCGAAAGCGTGGGGAGCAAACAGG
転載先:https://www.cnblogs.com/xlij1205/p/10504418.html