Basic Efficiency Tips Every Python 3 Beginner Should Know

2024-04-02 19:28:04

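I sometimes ask myself how I could have not known a simpler way to do "this" in Python 3. Whenever I go looking for an answer, I inevitably find code that is more concise, more effective, and less bug-prone. Overall (and not just in this article), there are far more of "those" things than I imagined, but here is a first batch of non-obvious features that eventually led me to more efficient, simpler, and more maintainable code.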
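
Dictionaries

keys() and items() of a dictionary

You can do a lot of interesting things with a dictionary's keys and items, because the views they return behave like sets: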


aa = {'mike': 'male', 'kathy': 'female', 'steve': 'male', 'hillary': 'female'}

bb = {'mike': 'male', 'ben': 'male', 'hillary': 'female'}

aa.keys() & bb.keys()  # {'mike', 'hillary'}  # these are set-like
aa.keys() - bb.keys()  # {'kathy', 'steve'}
# If you want to get the common key-value pairs in the two dictionaries
aa.items() & bb.items()  # {('mike', 'male'), ('hillary', 'female')}

Very concise!
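As a minimal sketch of how these set-like views can be put to work, the snippet below keeps only the entries of one dictionary whose keys also appear in the other. The names `shared_keys`, `shared`, and `only_one_side` are illustrative and do not come from the original.

```python
aa = {'mike': 'male', 'kathy': 'female', 'steve': 'male', 'hillary': 'female'}
bb = {'mike': 'male', 'ben': 'male', 'hillary': 'female'}

# keys() views support set operators directly, no set() conversion needed
shared_keys = aa.keys() & bb.keys()

# Build a sub-dictionary of aa restricted to the shared keys
shared = {k: aa[k] for k in shared_keys}

# Keys present in exactly one of the two dictionaries (symmetric difference)
only_one_side = aa.keys() ^ bb.keys()

print(sorted(shared))         # ['hillary', 'mike']
print(sorted(only_one_side))  # ['ben', 'kathy', 'steve']
```

The results of these operators are ordinary sets, so their iteration order is not guaranteed. That is why the example sorts before printing.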

Checking whether a key exists in a dictionary

How many times have you written the following code?


dictionary = {}
for k, v in ls:
    if k not in dictionary:
        dictionary[k] = []
    dictionary[k].append(v)

This code is not that bad, but why should you need the if statement every time?


from collections import defaultdict
dictionary = defaultdict(list)  # missing keys default to an empty list
for k, v in ls:
    dictionary[k].append(v)

This is much clearer, with no redundant, distracting if statement.
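Here is a runnable sketch of the same grouping idea. The input list `ls` is made-up sample data, and the last two lines illustrate one caveat worth knowing about: simply reading a missing key on a defaultdict creates it.

```python
from collections import defaultdict

# Hypothetical input: (key, value) pairs with repeated keys
ls = [('fruit', 'apple'), ('veg', 'carrot'), ('fruit', 'pear')]

grouped = defaultdict(list)  # missing keys start out as an empty list
for k, v in ls:
    grouped[k].append(v)

print(dict(grouped))  # {'fruit': ['apple', 'pear'], 'veg': ['carrot']}

# Caveat: merely reading a missing key inserts it
_ = grouped['unknown']
print('unknown' in grouped)  # True
```

If that side effect is unwanted, using `in` or `.get()` for lookups leaves the dictionary untouched.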

Updating a dictionary with another dictionary


from itertools import chain
a = {'x': 1, 'y': 2, 'z': 3}
b = {'y': 5, 's': 10, 'x': 3, 'z': 6}

# Update a with b
c = dict(chain(a.items(), b.items()))
c  # {'x': 3, 'y': 5, 'z': 6, 's': 10}

This looks fine, but it is not very succinct. Let's see whether we can do better:


c = a.copy()
c.update(b)

Clearer and more readable!
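Another option, not mentioned in the original, is dictionary unpacking, which builds the merged dictionary in a single expression. This is a small sketch using the same sample data:

```python
a = {'x': 1, 'y': 2, 'z': 3}
b = {'y': 5, 's': 10, 'x': 3, 'z': 6}

# Unpack both dictionaries into a new one; later entries win on duplicate keys
c = {**a, **b}

print(c)  # {'x': 3, 'y': 5, 'z': 6, 's': 10}
print(a)  # {'x': 1, 'y': 2, 'z': 3}  (a is left unchanged)
```

On Python 3.9 and later, `a | b` gives the same result. The unpacking form is used here because it also runs on older Python 3 versions.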

Getting the maximum value from a dictionary

If you want the maximum value in a dictionary, the direct approach looks like this:


aa = {k: sum(range(k)) for k in range(10)}
aa  # {0: 0, 1: 0, 2: 1, 3: 3, 4: 6, 5: 10, 6: 15, 7: 21, 8: 28, 9: 36}
max(aa.values())  # 36

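This works, but if you also need the key, you then have to look it up from the value. Instead, we can use zip to flatten the dictionary into (value, key) pairs and get both back at once: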


max(zip(aa.values(), aa.keys()))
# (36, 9) => value, key pair

Similarly, if you want to traverse a dictionary from the largest value to the smallest, you can do this:


sorted(zip(aa.values(), aa.keys()), reverse=True)
# [(36, 9), (28, 8), (21, 7), (15, 6), (10, 5), (6, 4), (3, 3), (1, 2), (0, 1), (0, 0)]
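The zip approach compares keys whenever two values tie, which can fail if the keys are not comparable with each other. An alternative sketch, not from the original, passes a `key` function so that only the values are compared:

```python
aa = {k: sum(range(k)) for k in range(10)}

# Key whose value is largest: max iterates over the keys and compares aa[key]
best_key = max(aa, key=aa.get)
print(best_key, aa[best_key])  # 9 36

# Items sorted by value, largest first, kept as (key, value) pairs
by_value = sorted(aa.items(), key=lambda kv: kv[1], reverse=True)
print(by_value[:3])  # [(9, 36), (8, 28), (7, 21)]
```

Here the pairs stay in (key, value) order, which is usually the more natural shape for later processing.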

Unpacking any number of items from a list

We can use the magic of * to collect any number of items into a list:


def compute_average_salary(person_salary):
    # The first item is the name; every remaining item is a salary
    person, *salary = person_salary
    return person, (sum(salary) / float(len(salary)))

person, average_salary = compute_average_salary(["mike", 40000, 50000, 60000])
person  # 'mike'
average_salary  # 50000.0

That is not so interesting on its own, but what if I told you that you can also do this:


def compute_average_salary(person_salary_age):
    # First item is the name, last item is the age, everything between is a salary
    person, *salary, age = person_salary_age
    return person, (sum(salary) / float(len(salary))), age

person, average_salary, age = compute_average_salary(["mike", 40000, 50000, 60000, 42])
age  # 42

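Pretty neat!

When you have a dictionary with string keys and list values, instead of iterating over the dictionary and processing the values one by one, you can use a flatter representation (a list of lists), like this: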


# Instead of doing this
for k, v in dictionary.items():
    process(v)

# we separate the head from the rest and process the values
# as a list, similar to the above; head plays the role of the key
for head, *rest in ls:
    process(rest)

# if that is not very clear, consider the following example
aa = {k: list(range(k)) for k in range(5)}  # range is lazy, so list() materializes it
aa  # {0: [], 1: [0], 2: [0, 1], 3: [0, 1, 2], 4: [0, 1, 2, 3]}
for k, v in aa.items():
    print(sum(v))

# 0
# 0
# 1
# 3
# 6

# Instead
aa = [[ii] + list(range(jj)) for ii, jj in enumerate(range(5))]
for head, *rest in aa:
    print(sum(rest))

# 0
# 0
# 1
# 3
# 6

You can unpack a list into head, *rest, tail, and so on.
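As a small illustration of that head, *rest, tail pattern (the sample `record` is made up), the starred name takes whatever the plain names leave over, and it may end up empty:

```python
record = ['mike', 40000, 50000, 60000, 42]

# The starred name collects everything not claimed by the other names
head, *rest, tail = record
print(head)  # mike
print(rest)  # [40000, 50000, 60000]
print(tail)  # 42

# The starred part may be empty, but each plain name must still get a value
first, *middle, last = ['a', 'b']
print(middle)  # []
```

Unpacking a sequence that is too short for the plain names (for example a one-item list into `first, *middle, last`) raises a ValueError.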

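Counters from collections

collections is one of my favorite libraries in Python. If you need data structures beyond the built-in ones, this is where you should look.

Part of my day-to-day work is counting large numbers of not particularly important words. Some might say you can use the words as dictionary keys and their counts as the values, and before I came across Counter in collections I might have agreed (yes, this whole introduction is leading up to Counter).

Suppose you read the Wikipedia article on the Python language, convert it to a string, and split it into a list of words (preserving order):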


import re
# python_string is assumed to hold the article text as a single string
word_list = list(map(lambda k: k.lower().strip(), re.split(r'[;,:(.\s)]\s*', python_string)))
word_list[:10]  # ['python', 'is', 'a', 'widely', 'used', 'general-purpose', 'high-level', 'programming', 'language', '[17][18][19]']

So far so good, but what if you want to count the words in this list:


from collections import defaultdict  # again, collections!
dictionary = defaultdict(int)
for word in word_list:
    dictionary[word] += 1

This is not that bad, but with Counter you will save time for more meaningful things.


from collections import Counter
counter = Counter(word_list)
# Getting the most common 10 words
counter.most_common(10)
# [('the', 164), ('and', 161), ('a', 138), ('python', 138),
#  ('of', 131), ('is', 102), ('to', 91), ('in', 88), ('', 56)]
list(counter.keys())[:10]  # just like a dictionary; keys() is a view in Python 3, so wrap it in list() before slicing
# ['', 'limited', 'all', 'code', 'managed', 'multi-paradigm',
#  'exponentiation', 'fromosing', 'dynamic']

Pretty neat, but let's look at the methods available on Counter:


dir(counter)
# Note: the listing below appears to have been captured on Python 2. On Python 3,
# __cmp__, has_key, iteritems, iterkeys, itervalues, viewitems, viewkeys and
# viewvalues are absent, while __add__ and __sub__ are still there.
# ['__add__', '__and__', '__class__', '__cmp__', '__contains__', '__delattr__', '__delitem__', '__dict__',
#  '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__',
#  '__init__', '__iter__', '__le__', '__len__', '__lt__', '__missing__', '__module__', '__ne__', '__new__',
#  '__or__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__',
#  '__str__', '__sub__', '__subclasshook__', '__weakref__', 'clear', 'copy', 'elements', 'fromkeys', 'get',
#  'has_key', 'items', 'iteritems', 'iterkeys', 'itervalues', 'keys', 'most_common', 'pop', 'popitem', 'setdefault',
#  'subtract', 'update', 'values', 'viewitems', 'viewkeys', 'viewvalues']

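Did you notice the __add__ and __sub__ methods? Yes, Counter supports addition and subtraction. So if you have a lot of text whose words you want to count, you do not necessarily need Hadoop: you can apply Counter to each piece (the map step) and then add the results together (the reduce step). That gives you a MapReduce built on top of Counter, and you may thank me later.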
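The map-then-reduce idea can be sketched as follows. The three short `documents` are made-up stand-ins for a real corpus, and splitting on whitespace is a simplifying assumption:

```python
from collections import Counter
from functools import reduce
from operator import add

# Hypothetical documents standing in for a larger corpus
documents = [
    'python is a language',
    'python is widely used',
    'a language is a tool',
]

# "Map": count the words of each document independently
per_document = [Counter(doc.split()) for doc in documents]

# "Reduce": add the per-document counters together
total = reduce(add, per_document, Counter())

print(total['is'])      # 3
print(total['python'])  # 2
print(total['a'])       # 3
```

Each `+` builds a new Counter, so for a very large number of pieces it may be cheaper to call `total.update(piece)` in a loop instead.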

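Flattening nested lists

The itertools module has a chain function that can be used to flatten nested lists. (Some versions of collections import it internally under the private name _chain, but that is an implementation detail and should not be relied on.)


from itertools import chain
ls = [[kk] + list(range(kk)) for kk in range(5)]
flattened_list = list(chain(*ls))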
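A minimal sketch of flattening one level of nesting with `itertools.chain` from the standard library, using the same `ls` as above. `chain.from_iterable` is a variant that takes the outer list directly instead of unpacking it with `*`:

```python
from itertools import chain

ls = [[kk] + list(range(kk)) for kk in range(5)]
# ls == [[0], [1, 0], [2, 0, 1], [3, 0, 1, 2], [4, 0, 1, 2, 3]]

# from_iterable takes the outer list itself, so no * unpacking is needed
flattened_list = list(chain.from_iterable(ls))
print(flattened_list)
# [0, 1, 0, 2, 0, 1, 3, 0, 1, 2, 4, 0, 1, 2, 3]
```

This removes only one level of nesting. Lists nested more deeply would stay as elements of the result.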

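Opening two files at the same time

If you are processing a file (say, line by line) and writing the processed lines to another file, you may be tempted to write something like this: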


with open(input_file_path) as inputfile:
    with open(output_file_path, 'w') as outputfile:
        for line in inputfile:
            outputfile.write(process(line))

Alternatively, you can open multiple files in the same statement, like this:


with open(input_file_path) as inputfile, open(output_file_path, 'w') as outputfile:
    for line in inputfile:
        outputfile.write(process(line))

Much more concise!
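To try the single-statement form end to end, here is a self-contained sketch. The `process` function and the file names are hypothetical placeholders, and a temporary directory is used so nothing is left behind on disk:

```python
import os
import tempfile

def process(line):
    # Hypothetical per-line transformation, used only for illustration
    return line.upper()

with tempfile.TemporaryDirectory() as tmp:
    input_file_path = os.path.join(tmp, 'input.txt')
    output_file_path = os.path.join(tmp, 'output.txt')

    # Create a small input file to work with
    with open(input_file_path, 'w') as f:
        f.write('first line\nsecond line\n')

    # Both files are opened by one with statement and both are closed on exit
    with open(input_file_path) as inputfile, open(output_file_path, 'w') as outputfile:
        for line in inputfile:
            outputfile.write(process(line))

    with open(output_file_path) as f:
        result = f.read()

print(result)
```

The files are closed in reverse order of opening when the block ends, even if `process` raises an exception.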
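
Finding Monday from a date

If you have dates that you want to normalize (for example, to the Monday before or after), you might do something like this: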


import datetime
# some_date is assumed to be an existing datetime.date or datetime.datetime
previous_monday = some_date - datetime.timedelta(days=some_date.weekday())
# Similarly, you could map to next monday as well
next_monday = some_date + datetime.timedelta(days=-some_date.weekday(), weeks=1)

That's how it's done.
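A worked example with a concrete date may help. The date below is chosen purely for illustration:

```python
import datetime

some_date = datetime.date(2024, 4, 3)  # a Wednesday, chosen for illustration

# weekday() is 0 for Monday through 6 for Sunday
previous_monday = some_date - datetime.timedelta(days=some_date.weekday())
next_monday = previous_monday + datetime.timedelta(weeks=1)

print(previous_monday)  # 2024-04-01
print(next_monday)      # 2024-04-08
```

If `some_date` is already a Monday, `previous_monday` is that same day, since `weekday()` returns 0.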

Working with HTML

If you scrape a site, whether for fun or for profit, you will keep running into HTML tags. To parse all sorts of HTML tags, you can use html.parser:

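from html.parser import HTMLParser

class HTMLStrip(HTMLParser):

    def __init__(self):
        # Let HTMLParser set up its own state (it calls reset() itself)
        super().__init__()
        self.ls = []

    def handle_data(self, d):
        # Called with the text found between tags
        self.ls.append(d)

    def get_data(self):
        return ''.join(self.ls)

    @staticmethod
    def strip(snippet):
        html_strip = HTMLStrip()
        html_strip.feed(snippet)
        clean_text = html_strip.get_data()
        return clean_text

# html_snippet is assumed to be a string of HTML
snippet = HTMLStrip.strip(html_snippet)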

If you only want to escape HTML:

import html

escaped_snippet = html.escape(html_snippet)

# Back to html snippets (this is new in Python 3.4)
html_snippet = html.unescape(escaped_snippet)
# and so forth...
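Both ideas can be tried on a tiny input. In this sketch the `TextOnly` class and the sample `html_snippet` are illustrative names, not part of the original:

```python
import html
from html.parser import HTMLParser

class TextOnly(HTMLParser):
    """Hypothetical helper that collects only the text between tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

html_snippet = '<p>Tom &amp; <b>Jerry</b></p>'

# Strip the tags, keeping the text
parser = TextOnly()
parser.feed(html_snippet)
parser.close()
text = ''.join(parser.parts)
print(text)  # Tom & Jerry

# Escape the markup so it can be shown literally, then reverse it
escaped = html.escape(html_snippet)
print(escaped)  # &lt;p&gt;Tom &amp;amp; &lt;b&gt;Jerry&lt;/b&gt;&lt;/p&gt;
print(html.unescape(escaped) == html_snippet)  # True
```

Stripping discards the markup, while escaping keeps it but makes it safe to display as text. Which one is appropriate depends on what the cleaned string is used for.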
