Python Word Segmentation and Frequency Statistics


Method 1: segment with jieba, count with NLTK's FreqDist

import jieba
import nltk

# Sample text to segment
strs='1、      ，   18-28   ；2、       、   、  、  、  、  、           ；' \
     '3、        、          ， IT              IT      ，      （   、     ）；4、        ，            ' \
     '5、           、         、  、        ；、     ，     ，     ，    、  、  ，    ；'
text1 = jieba.cut(strs)          # lazy generator of tokens
fd = nltk.FreqDist(text1)        # token -> count
keys = fd.keys()
print(' '.join(keys))            # every distinct token
# (token, count) pairs, most frequent first
sort_dict = sorted(fd.items(), key=lambda d: d[1], reverse=True)
print(sort_dict)
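The counting and sorting steps above do not strictly need NLTK: in NLTK 3, nltk.FreqDist is a subclass of the standard-library collections.Counter. The minimal sketch below assumes a hand-written token list in place of the output of jieba.cut (jieba itself is not called) and shows that sorting (token, count) pairs by count in descending order gives the same result as Counter.most_common().

```python
from collections import Counter

# Hypothetical tokens standing in for list(jieba.cut(text)); jieba is not called here.
tokens = ["数据", "分析", "，", "数据", "挖掘", "，", "数据", "分析"]

counts = Counter(tokens)  # token -> number of occurrences

# Same idea as Method 1: sort (token, count) pairs by count, highest first.
by_freq = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

# Counter.most_common() produces the same ordering in one call.
top = counts.most_common()

for word, n in by_freq:
    print(f"{word}\t{n}")
```

Both orderings are stable, so tokens with equal counts keep the order in which they were first seen.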

Method 2: segment a file with jieba, count with collections.Counter

from collections import Counter
import jieba

bill_path = r'bill.txt'
bill_result_path = r'bill_result.txt'
car_path = 'car.txt'  # not used in this snippet

# Read the whole file (assumed UTF-8) and segment it into tokens.
with open(bill_path, 'r', encoding='utf-8') as fr:
    data = jieba.cut(fr.read())
data = dict(Counter(data))  # token -> frequency

# Write one "token,count" line per token.
with open(bill_result_path, 'w', encoding='utf-8') as fw:
    for k, v in data.items():
        fw.write("%s,%d\n" % (k, v))
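jieba typically keeps punctuation as separate tokens, so an ASCII comma token would make a line written with "%s,%d" ambiguous when the file is read back. One option, sketched here under that assumption, is to let the standard csv module quote such fields. The sketch writes to an in-memory io.StringIO instead of bill_result.txt so it runs on its own, and the counts are made up for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical counts; in Method 2 these come from Counter(jieba.cut(...)).
data = Counter({"数据": 3, ",": 2, "分析": 1})

# In-memory stand-in for bill_result.txt; with a real file, use
# open(path, "w", encoding="utf-8", newline="") as the csv docs recommend.
buf = io.StringIO()
writer = csv.writer(buf)
for word, n in data.items():
    writer.writerow([word, n])  # a field containing "," is quoted automatically

text = buf.getvalue()
print(text)

# Reading it back recovers the original pairs, including the comma token.
rows = list(csv.reader(io.StringIO(text)))
parsed = {word: int(n) for word, n in rows}
```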
