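Exploring how variables are shared between processes in Python multiprocessing
1. The problem:
Someone in a chat group posted the following code and asked why the list printed at the end is empty.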
    # Relies on the fork start method (the only one on Unix under Python 2),
    # so the child processes inherit the module-level objects below.
    from multiprocessing import Process, Manager
    import os

    manager = Manager()
    vip_list = []
    # vip_list = manager.list()


    def testFunc(cc):
        vip_list.append(cc)
        print('process id:', os.getpid())


    if __name__ == '__main__':
        processes = []

        for ll in range(10):
            t = Process(target=testFunc, args=(ll,))
            t.daemon = True
            processes.append(t)

        for t in processes:
            t.start()

        for t in processes:
            t.join()

        print("------------------------")
        print('process id:', os.getpid())
        print(vip_list)
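If you understand Python's threading model, the GIL, and how threads and processes work, this question is not hard to answer. If you don't, that's fine too: run the code above and you will see what the problem is.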
    python aa.py
    process id: 632
    process id: 635
    process id: 637
    process id: 633
    process id: 636
    process id: 634
    process id: 639
    process id: 638
    process id: 641
    process id: 640
    ------------------------
    process id: 619
    []
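Every call ran in a different process, and each child process appended to its own copy of vip_list, so the list in the parent process is still empty. Uncomment the vip_list = manager.list() line and you will see the following result: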
    process id: 32074
    process id: 32073
    process id: 32072
    process id: 32078
    process id: 32076
    process id: 32071
    process id: 32077
    process id: 32079
    process id: 32075
    process id: 32080
    ------------------------
    process id: 32066
    [3, 2, 1, 7, 5, 0, 6, 8, 4, 9]
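The two runs above can be reproduced side by side in one script. What follows is a minimal sketch rather than the original poster's code: the helper names append_item, run_workers and compare_lists are made up for illustration, and it assumes a Unix-like system where the fork start method is available.

```python
import multiprocessing as mp

# Assumption: a Unix-like system. "fork" matches the behaviour shown above.
ctx = mp.get_context("fork")


def append_item(target, item):
    """Runs in a child process and appends to whatever it was handed."""
    target.append(item)


def run_workers(target, count=10):
    """Start `count` children that each append one number, then wait."""
    workers = [ctx.Process(target=append_item, args=(target, i))
               for i in range(count)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()


def compare_lists():
    plain = []                    # each child appends to its own copy
    run_workers(plain)

    with ctx.Manager() as manager:
        shared = manager.list()   # proxy to a list held by the manager process
        run_workers(shared)
        shared_items = shared[:]  # copy out before the manager shuts down

    return plain, shared_items


if __name__ == "__main__":
    plain, shared_items = compare_lists()
    print(plain)                  # the parent's plain list is untouched
    print(sorted(shared_items))   # every child's append reached the manager
```

The plain list never leaves the process that modifies it, while every append on the proxy is forwarded to the single list living in the manager process.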
2. Ways to share variables between processes in Python:
(1) Shared memory:
Data can be stored in a shared memory map using Value or Array. For example, the following code:
http://docs.python.org/2/library/multiprocessing.html#sharing-state-between-processes
    from multiprocessing import Process, Value, Array


    def f(n, a):
        n.value = 3.1415927
        for i in range(len(a)):
            a[i] = -a[i]


    if __name__ == '__main__':
        num = Value('d', 0.0)        # 'd': double-precision float
        arr = Array('i', range(10))  # 'i': signed int

        p = Process(target=f, args=(num, arr))
        p.start()
        p.join()

        print(num.value)
        print(arr[:])
Result:
    3.1415927
    [0, -1, -2, -3, -4, -5, -6, -7, -8, -9]
(2) Server process:
A manager object returned by Manager() controls a server process which holds Python objects and allows other processes to manipulate them using proxies.
A manager returned by Manager() will support types list, dict, Namespace, Lock, RLock, Semaphore, BoundedSemaphore, Condition, Event, Queue, Value and Array.
For code, see the example at the beginning of this article.
http://docs.python.org/2/library/multiprocessing.html#managers
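The opening example only exercises list. The same proxy pattern applies to the other supported types; here is a small sketch using dict and Namespace. The names record_square, collect_squares and last_seen are hypothetical, and the sketch again assumes the fork start method on a Unix-like system.

```python
import multiprocessing as mp

# Assumption: a Unix-like system where the fork start method exists.
ctx = mp.get_context("fork")


def record_square(squares, stats, n):
    squares[n] = n * n      # forwarded to the dict inside the manager process
    stats.last_seen = n     # attribute writes are forwarded the same way


def collect_squares(count=5):
    with ctx.Manager() as manager:
        squares = manager.dict()
        stats = manager.Namespace()
        stats.last_seen = None

        workers = [ctx.Process(target=record_square, args=(squares, stats, n))
                   for n in range(count)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()

        # copy() returns an ordinary dict that outlives the manager.
        return squares.copy(), stats.last_seen


if __name__ == "__main__":
    squares, last_seen = collect_squares()
    print(squares)
    print(last_seen)  # whichever child wrote last; the order is not fixed
```

Each key is written by exactly one child, so the dict contents are deterministic. The Namespace attribute is overwritten by every child, so its final value depends on scheduling.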
3. The problems with multiprocessing go well beyond this: data synchronization
Consider a short piece of code, a simple counter:
    from multiprocessing import Process, Manager
    import os

    manager = Manager()
    total = manager.Value('i', 0)  # 'i': signed int


    def testFunc(cc):
        # Not atomic: reads total.value, adds cc, then writes it back,
        # as two separate requests to the manager process.
        total.value += cc


    if __name__ == '__main__':
        processes = []

        for ll in range(100):
            t = Process(target=testFunc, args=(1,))
            t.daemon = True
            processes.append(t)

        for t in processes:
            t.start()

        for t in processes:
            t.join()

        print("------------------------")
        print('process id:', os.getpid())
        print(total.value)
Result:
    ------------------------
    process id: 17378
    97
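You may be asking: WTF? This problem has been around since the multithreading era; the same tragedy simply replays in the multiprocessing era. The += on the shared value is a read followed by a write, so two processes can read the same old value and one of the updates is lost. The fix: a Lock!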
    from multiprocessing import Process, Manager, Lock
    import os

    lock = Lock()
    manager = Manager()
    total = manager.Value('i', 0)  # 'i': signed int


    def testFunc(cc, lock):
        # Hold the lock across the whole read-modify-write.
        with lock:
            total.value += cc


    if __name__ == '__main__':
        processes = []

        for ll in range(100):
            t = Process(target=testFunc, args=(1, lock))
            t.daemon = True
            processes.append(t)

        for t in processes:
            t.start()

        for t in processes:
            t.join()

        print("------------------------")
        print('process id:', os.getpid())
        print(total.value)
How does this code perform? Run it and see, or try increasing the loop count...
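One way to explore that question is to time the locked manager counter against the shared-memory Value from section 2, which carries its own lock (returned by get_lock()). The sketch below is a rough illustration under assumptions, not a benchmark: add_many, timed_count and compare_counters are hypothetical helpers, each worker loops instead of adding once, and the fork start method is assumed. Timings depend on the machine, so none are quoted here.

```python
import multiprocessing as mp
import time

# Assumption: a Unix-like system where the fork start method exists.
ctx = mp.get_context("fork")


def add_many(counter, lock, repeats):
    for _ in range(repeats):
        with lock:              # protect the read-modify-write
            counter.value += 1


def timed_count(counter, lock, workers, repeats):
    """Run the workers and return (final count, elapsed seconds)."""
    procs = [ctx.Process(target=add_many, args=(counter, lock, repeats))
             for _ in range(workers)]
    started = time.perf_counter()
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value, time.perf_counter() - started


def compare_counters(workers=4, repeats=1000):
    # Manager-backed counter: every access is a request to the server process.
    with ctx.Manager() as manager:
        proxy_result = timed_count(manager.Value('i', 0), ctx.Lock(),
                                   workers, repeats)

    # Shared-memory counter: lives in memory mapped into every process.
    shm = ctx.Value('i', 0)
    shm_result = timed_count(shm, shm.get_lock(), workers, repeats)
    return proxy_result, shm_result


if __name__ == "__main__":
    (proxy_total, proxy_secs), (shm_total, shm_secs) = compare_counters()
    print("manager Value:", proxy_total, "in", round(proxy_secs, 3), "s")
    print("shared Value: ", shm_total, "in", round(shm_secs, 3), "s")
```

Both counters should reach workers * repeats; the elapsed times give a feel for what each increment through the manager proxy costs compared with shared memory.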
4. Final advice:
Note that usually sharing data between processes may not be the best choice, because of all the synchronization issues; an approach involving actors exchanging messages is usually seen as a better choice. See also Python documentation: As mentioned above, when doing concurrent programming it is usually best to avoid using shared state as far as possible. This is particularly true when using multiple processes. However, if you really do need to use some shared data then multiprocessing provides a couple of ways of doing so.
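As an illustration of the message-passing approach recommended above, the counter from section 3 can be rebuilt so that workers send their increments through a Queue and only the parent keeps the running total. This is a minimal sketch under assumptions: send_increment and count_by_messages are hypothetical names, and the fork start method is assumed.

```python
import multiprocessing as mp

# Assumption: a Unix-like system where the fork start method exists.
ctx = mp.get_context("fork")


def send_increment(outbox, amount):
    """Worker: report the increment instead of touching shared state."""
    outbox.put(amount)


def count_by_messages(workers=100):
    inbox = ctx.Queue()
    procs = [ctx.Process(target=send_increment, args=(inbox, 1))
             for _ in range(workers)]
    for p in procs:
        p.start()

    # Receive exactly one message per worker before joining.
    total = sum(inbox.get() for _ in procs)

    for p in procs:
        p.join()
    return total


if __name__ == "__main__":
    print(count_by_messages())
```

Only the parent process ever updates the total, so no lock is needed and no update can be lost.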
5. References:
http://stackoverflow.com/questions/14124588/python-multiprocessing-shared-memory
http://eli.thegreenplace.net/2012/01/04/shared-counter-with-pythons-multiprocessing/
http://docs.python.org/2/library/multiprocessing.html#multiprocessing.sharedctypes.synchronized