Removing duplicate images in a directory with Python and OpenCV
Versions:
Platform: Ubuntu 14 / i5 / 4 GB RAM
Python version: Python 2.7
OpenCV version: 2.13.4
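Dependencies:
If Python is not already installed on the system, install it together with the packages below:
sudo apt-get install python
sudo apt-get install python-dev
sudo apt-get install python-pip
sudo pip install numpy matplotlib
sudo apt-get install libcv-dev
sudo apt-get install python-opencv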
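Once the packages are installed, it can help to confirm from Python that the modules actually import. The following is only a minimal sketch and not part of the original script; check_modules is a hypothetical helper name, and the version is read from the usual __version__ attribute, which a module is not guaranteed to define.

```python
import importlib


def check_modules(names):
    """Map each module name to its version string, 'unknown', or None if missing."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The module could not be imported at all.
            found[name] = None
            continue
        found[name] = getattr(module, "__version__", "unknown")
    return found


if __name__ == "__main__":
    report = check_modules(["cv2", "numpy", "matplotlib"])
    for name, version in sorted(report.items()):
        print("%s: %s" % (name, version if version is not None else "NOT INSTALLED"))
```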
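De-duplicating images with a perceptual hash
Approach: every file is compared against all the other files, so the run time grows quickly as the number of images increases (roughly with its square), but it saves doing the work by hand.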
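To make the cost concrete: comparing every image with every other image needs n(n-1)/2 comparisons for n images. A small illustration follows; the file names are made up and count_pairs is a hypothetical helper.

```python
from itertools import combinations


def count_pairs(names):
    """Number of unordered pairs that a compare-everything pass has to examine."""
    return sum(1 for _ in combinations(names, 2))


names = ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]  # hypothetical file names
print(count_pairs(names))         # 4 * 3 / 2 = 6
print(count_pairs(range(1000)))   # 1000 * 999 / 2 = 499500
```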
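How the perceptual hash works (the mean-based variant used here is commonly called an average hash):
1. Scale every image to be compared down to 8*8 pixels and convert it to grayscale
2. Compare each pixel of the image with the image's mean value; the 64 results (1 if the pixel is at least the mean, 0 otherwise) form the fingerprint
3. Compute the Hamming distance between two fingerprints, that is, the number of positions at which they differ
4. If no more than 5 elements differ, the two images are treated as the same image (more precisely, as near-duplicates)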
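The steps above can be sketched in plain Python on an 8*8 grid of grayscale values. This is an illustration only: the scaling and grayscale conversion are assumed to have been done already (the article's script uses OpenCV for that), and the two sample "images" are made up.

```python
def fingerprint(gray):
    """gray: 8 rows of 8 grayscale values (0-255). Returns a list of 64 bits."""
    pixels = [value for row in gray for value in row]
    mean = sum(pixels) / float(len(pixels))
    # 1 where the pixel is at least the mean, 0 otherwise
    return [1 if value >= mean else 0 for value in pixels]


def hamming(bits_a, bits_b):
    """Count the positions at which two fingerprints differ."""
    return sum(1 for a, b in zip(bits_a, bits_b) if a != b)


# Made-up 8x8 image: left half dark, right half bright ...
image_a = [[10] * 4 + [200] * 4 for _ in range(8)]
# ... and a copy with three dark pixels turned bright.
image_b = [row[:] for row in image_a]
image_b[0][0] = image_b[3][1] = image_b[7][2] = 250

distance = hamming(fingerprint(image_a), fingerprint(image_b))
print(distance)        # 3
print(distance <= 5)   # True: treated as the same (similar) image
```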
#!/usr/bin/python
# -*- coding: UTF-8 -*-
from __future__ import print_function

import os
import sys

import cv2
import numpy as np


def fingerprint(imgpath):
    # Return the 64-element 0/1 fingerprint of an image,
    # or None if OpenCV cannot read the file.
    img = cv2.imread(imgpath)
    if img is None:
        return None
    resized = cv2.resize(img, (8, 8))
    gray = cv2.cvtColor(resized, cv2.COLOR_BGR2GRAY)
    mean = cv2.mean(gray)[0]
    # 1 where the pixel is at least the mean, 0 otherwise
    return (gray.flatten() >= mean).astype(np.uint8)


def cmpandremove2(path):
    # Compute every fingerprint once, then compare each file
    # with the files that come after it.
    names = sorted(os.listdir(path))
    if len(names) <= 0:
        return
    prints = {}
    for name in names:
        prepath = os.path.join(path, name)
        arr = fingerprint(prepath)
        if arr is None:
            continue
        print("get", prepath)
        prints[name] = arr
    keys = sorted(prints.keys())
    index = 0
    while index < len(keys):
        curkey = keys[index]
        print(curkey)
        dellist = []
        for j in keys[index + 1:]:
            diff = int(np.count_nonzero(prints[curkey] != prints[j]))
            if diff <= 5:
                dellist.append(j)
        if len(dellist) > 0:
            for j in dellist:
                filepath = os.path.join(path, j)
                print("remove", filepath)
                os.remove(filepath)
                prints.pop(j)
            keys = sorted(prints.keys())
        index = index + 1


def cmpandremove(path):
    # Same comparison without the cache: the fingerprint of the other
    # file is recomputed for every pair.
    # Returns 1 if any file was removed, 0 otherwise.
    index = 0
    flag = 0
    dirs = sorted(os.listdir(path))
    if len(dirs) <= 0:
        return 0
    while index < len(dirs):
        prepath = os.path.join(path, dirs[index])
        print(prepath)
        prearr = fingerprint(prepath)
        if prearr is None:
            index = index + 1
            continue
        removepath = []
        for index2 in range(len(dirs)):
            if index2 == index:
                continue
            curpath = os.path.join(path, dirs[index2])
            curarr = fingerprint(curpath)
            if curarr is None:
                continue
            diff = int(np.count_nonzero(curarr != prearr))
            if diff <= 5:
                print("the same")
                removepath.append(curpath)
                flag = 1
        index = index + 1
        if len(removepath) > 0:
            for filepath in removepath:
                print("remove", filepath)
                os.remove(filepath)
            dirs = sorted(os.listdir(path))
            if len(dirs) <= 0:
                return 0
    return flag


def main(argv):
    if len(argv) <= 1:
        print("command error")
        return -1
    if not os.path.exists(argv[1]):
        return -1
    path = argv[1]
    # Disabled alternative: repeat the pass until nothing is removed.
    #     while True:
    #         if cmpandremove(path) == 0:
    #             break
    cmpandremove(path)
    return 0


if __name__ == '__main__':
    main(sys.argv)
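A possible variation, not used in the script above: the 64 fingerprint bits fit into a single integer, and the Hamming distance then becomes the number of 1 bits in the XOR of two such integers. The sketch below only illustrates that alternative; pack_bits and hamming_int are hypothetical names and the sample fingerprints are made up.

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 values into one integer, first bit most significant."""
    value = 0
    for bit in bits:
        value = (value << 1) | int(bit)
    return value


def hamming_int(a, b):
    """Hamming distance of two packed fingerprints: count the 1 bits of a XOR b."""
    return bin(a ^ b).count("1")


# Two made-up 64-bit fingerprints that differ in the first bit of seven rows.
fp_a = pack_bits([0, 0, 0, 0, 1, 1, 1, 1] * 8)
fp_b = pack_bits([1, 0, 0, 0, 1, 1, 1, 1] * 7 + [0, 0, 0, 0, 1, 1, 1, 1])
print(hamming_int(fp_a, fp_b))   # 7, above the threshold of 5, so not duplicates
```

Storing one integer per file also keeps the cached fingerprints small when a directory holds many images.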
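To save effort, the shell script below walks the directory tree and runs the de-duplication on every subdirectory it finds (the Python script above is assumed to be saved as ~/similar.py and to be executable):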
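#!/bin/bash
# Run similar.py on every subdirectory of the directory given as $1, recursively.
indir=$1
addcount=0    # not used below

function intest() {
    for file in "$1"/*
    do
        echo "$file"
        if test -d "$file"
        then
            ~/similar.py "$file/"
            intest "$file"
        fi
    done
}

intest "$indir"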
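The same traversal could also be done from Python with os.walk. The sketch below is an assumption-laden equivalent, not the author's method: subdirectories is a hypothetical helper, and unlike the shell glob it also returns hidden directories and does not descend into symbolic links.

```python
import os
import sys


def subdirectories(root):
    """Return every directory below root (root itself excluded), sorted."""
    found = []
    for current, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            found.append(os.path.join(current, name))
    return sorted(found)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        for directory in subdirectories(sys.argv[1]):
            # The de-duplication function could be called here for each directory.
            print(directory)
```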