Processing Every Word in a File with Python 3

2024-03-31 11:33:05

'''
Created on Dec 21, 2012
Process every word in a file
@author: liury_lab
'''
import re

# Split each line on whitespace and print every word.
# The built-in open() decodes the file as UTF-8 text.
with open('d:/text.txt', encoding='utf-8') as the_file:
    for line in the_file:
        for word in line.split():
            print(word, end="|")

# If the definition of a word changes, use a regular expression instead.
# Here a word is a sequence of alphanumeric characters, hyphens, or apostrophes.
print()
print('************************************************************************')
re_word = re.compile(r"[\w'-]+")
with open('d:/text.txt', encoding='utf-8') as the_file:
    for line in the_file:
        for word in re_word.finditer(line):
            print(word.group(0), end="|")

# Wrap the logic in a generator.
def words_of_file(file_path, line_to_words=str.split):
    with open(file_path, encoding='utf-8') as the_file:
        for line in the_file:
            for word in line_to_words(line):
                yield word

print()
print('************************************************************************')
for word in words_of_file('d:/text.txt'):
    print(word, end='|')

def words_by_re(file_path, repattern=r"[\w'-]+"):
    re_word = re.compile(repattern)
    def line_to_words(line):
        for mo in re_word.finditer(line):
            yield mo.group(0)  # the original book used return, which gave wrong results; changed to yield
    return words_of_file(file_path, line_to_words)

print()
print('************************************************************************')
for word in words_by_re('d:/text.txt'):
    print(word, end='|')
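
To try the generator approach without depending on d:/text.txt, the following is a minimal sketch that writes a small temporary file and reads it back word by word. The sample text, the temporary file, and the helper name demo_words are assumptions made for illustration; they are not part of the original script.

```python
import os
import re
import tempfile


def words_of_file(file_path, line_to_words=str.split):
    """Yield every word in the file, reading one line at a time."""
    with open(file_path, encoding='utf-8') as the_file:
        for line in the_file:
            for word in line_to_words(line):
                yield word


def words_by_re(file_path, repattern=r"[\w'-]+"):
    """Yield words as defined by the regular expression repattern."""
    re_word = re.compile(repattern)

    def line_to_words(line):
        for mo in re_word.finditer(line):
            yield mo.group(0)

    return words_of_file(file_path, line_to_words)


def demo_words(text):
    """Hypothetical helper: write text to a temp file and return both word lists."""
    with tempfile.NamedTemporaryFile('w', encoding='utf-8',
                                     suffix='.txt', delete=False) as tmp:
        tmp.write(text)
        path = tmp.name
    try:
        return list(words_of_file(path)), list(words_by_re(path))
    finally:
        os.remove(path)  # clean up the temporary file


by_split, by_re = demo_words("Hello, world!\nIt's a well-known fact.\n")
print('|'.join(by_split))  # Hello,|world!|It's|a|well-known|fact.
print('|'.join(by_re))     # Hello|world|It's|a|well-known|fact
```

The two outputs show the practical difference between the two word definitions: whitespace splitting keeps punctuation attached to each word, while the regular expression keeps only word characters, hyphens, and apostrophes.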
