Text Recognition (OCR) on Any Screen Region with Python

2023-07-04 13:10:03

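The OCR in this article is of course not built from scratch. It relies on the API provided by Baidu AI Cloud (which I consider one of Baidu's most commendable contributions to artificial intelligence in China). The API is more than adequate for personal use, is fairly simple, and has high accuracy.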

Installing the OCR Python SDK

OCR Python SDK directory structure

├── README.md
├── aip                   // SDK directory
│   ├── __init__.py       // exported classes
│   ├── base.py           // aip base class
│   ├── http.py           // HTTP requests
│   └── ocr.py            // OCR
└── setup.py              // setuptools installation

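Supported Python versions: 2.7.+, 3.+

The Python SDK can be installed in either of the following ways:

If pip is installed, simply run pip install baidu-aip.

If setuptools is installed, download the SDK and run python setup.py install.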

Code implementation

Let's look at the code.

The main modules used are:

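import os                      # operating-system utilities
import sys                     # system utilities
import time                    # timestamps
import signal                  # system signals
import winsound                # notification sound (Windows only)
from aip import AipOcr         # Baidu OCR API
from PIL import ImageGrab      # grab the image held on the clipboard
import win32clipboard as wc    # Windows clipboard operations
import win32con                # standard Windows clipboard data formats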

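Step one: implement a basic OCR program that extracts text from an image. The APP_ID, API_KEY and SECRET_KEY used here are the credentials you apply for yourself in the OCR section after logging in to Baidu AI Cloud.

# Your APP ID, API key (AK) and secret key (SK)
APP_ID = 'xxx'
API_KEY = 'xxx'
SECRET_KEY = 'xxx'

client = AipOcr(APP_ID, API_KEY, SECRET_KEY)


def get_file_content(filePath):
    """Read an image file and return its raw bytes."""
    with open(filePath, 'rb') as fp:
        return fp.read()


def getOcrText(txt_dict):
    """Extract the text from the dict returned by the API."""
    txt = ""
    if isinstance(txt_dict, dict):
        # .get() yields an empty list when the reply has no 'words_result'
        for i in txt_dict.get('words_result', []):
            txt = txt + i["words"]
            # A short line is treated as the end of a paragraph. Adjust the
            # threshold to control where line breaks appear in the output.
            if len(i["words"]) < 25:
                txt = txt + "\n\n"
    return txt


def BaiduOcr(imageName, Accurate=True):
    """Run general or high-accuracy OCR on a local image file."""
    image = get_file_content(imageName)
    if Accurate:
        return getOcrText(client.basicAccurate(image))
    else:
        return getOcrText(client.basicGeneral(image))


def BaiduOcrUrl(url):
    """Run general OCR on a remote image given by its URL."""
    return getOcrText(client.basicGeneralUrl(url))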
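To make the text-joining step concrete, here is a minimal sketch that runs the same logic on a hand-written sample. The sample dict is made up for illustration and only uses the words_result and words keys that the code above reads; join_ocr_lines and its short_line parameter are hypothetical names, not part of the SDK.

```python
def join_ocr_lines(response, short_line=25):
    """Concatenate recognized lines; a short line is taken as a paragraph end."""
    if not isinstance(response, dict):
        return ""
    parts = []
    # .get() avoids a KeyError when the dict has no 'words_result' key
    for item in response.get("words_result", []):
        parts.append(item["words"])
        if len(item["words"]) < short_line:
            parts.append("\n\n")
    return "".join(parts)


# Hand-written sample (an assumption for illustration, not real API output)
sample = {
    "words_result": [
        {"words": "This first line is long enough to continue "},
        {"words": "and ends here."},
    ],
}

print(repr(join_ocr_lines(sample)))
```

The first line is longer than the threshold, so the second line is appended directly to it; the second line is short, so a blank line follows it. Raising or lowering short_line changes where paragraphs break.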

Step two: grab the text with a keyboard shortcut, place the recognized text on the clipboard, play a notification sound, and exit the program with a keyboard shortcut.

# Clipboard helper functions
def get_clipboard():
    wc.OpenClipboard()
    txt = wc.GetClipboardData(win32con.CF_UNICODETEXT)
    wc.CloseClipboard()
    return txt


def empty_clipboard():
    wc.OpenClipboard()
    wc.EmptyClipboard()
    wc.CloseClipboard()


def set_clipboard(txt):
    wc.OpenClipboard()
    wc.EmptyClipboard()
    wc.SetClipboardData(win32con.CF_UNICODETEXT, txt)
    wc.CloseClipboard()


def BaiduOcrScreenshots(Accurate=True, path="./", ifauto=False):
    """Run general or high-accuracy OCR on the screenshot held on the clipboard."""
    if not os.path.exists(path):
        os.makedirs(path)
    image = ImageGrab.grabclipboard()
    # grabclipboard() returns None when the clipboard holds no image and a
    # list of file names when files were copied; only an image has save().
    if image is not None and hasattr(image, "save"):
        print("\rThe image has been obtained. Please wait a moment!", end="")
        filename = str(time.time_ns())
        png_path = os.path.join(path, filename + ".png")
        image.save(png_path)
        if Accurate:
            txt = getOcrText(client.basicAccurate(get_file_content(png_path)))
        else:
            txt = getOcrText(client.basicGeneral(get_file_content(png_path)))
        os.remove(png_path)
        # f = open(os.path.abspath(path) + "\\" + filename + ".txt", 'w')
        # f.write(txt)
        set_clipboard(txt)  # this also replaces the image, so it is not processed twice
        winsound.PlaySound('SystemAsterisk', winsound.SND_ASYNC)
        # os.startfile(os.path.abspath(path) + "\\" + filename + ".txt")
        # empty_clipboard()
        return txt
    else:
        if not ifauto:
            print("Please get the screenshots by Shift+Win+S!", end="")
        else:
            print("\rPlease get the screenshots by Shift+Win+S!", end="")
        return ""

def sig_handler(signum, frame):
    sys.exit(0)

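def removeTempFile(file=(".txt", ".png"), path="./"):
    """Delete every file in path whose name ends with one of the given suffixes."""
    if not os.path.exists(path):
        os.makedirs(path)
    for name in os.listdir(path):
        # endswith() takes a tuple and matches real suffixes only, so each
        # file is removed at most once
        if name.endswith(tuple(file)):
            os.remove(os.path.join(path, name))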

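def AutoOcrFile(path="./", filetype=(".png", ".jpg", ".bmp")):
    """Run OCR on every image in path and write each result to a new .txt file."""
    if not os.path.exists(path):
        os.makedirs(path)
    for name in os.listdir(path):
        if name.lower().endswith(tuple(filetype)):
            out_path = os.path.join(os.path.abspath(path), str(time.time_ns()) + ".txt")
            # with closes the file; utf-8 keeps Chinese text intact
            with open(out_path, 'w', encoding='utf-8') as f:
                f.write(BaiduOcr(os.path.join(path, name)))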

def AutoOcrScreenshots():
    """Poll the clipboard for screenshots until Ctrl+C, then clean up."""
    signal.signal(signal.SIGINT, sig_handler)
    signal.signal(signal.SIGTERM, sig_handler)
    print("Press Ctrl+C to exit after removing all picture files and txt files!")
    print("Please get the screenshots by Shift+Win+S!", end="")
    while True:
        try:
            BaiduOcrScreenshots(ifauto=True)
            time.sleep(0.1)
        except SystemExit:
            # Note: this deletes every .txt and .png file in the current directory
            removeTempFile()
            break
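The clipboard helpers above open and close the clipboard by hand, so an exception raised between the two calls would leave the clipboard open. One possible way to guard against that is a small context manager. This is only a sketch: opened_clipboard and read_text are hypothetical names, and the clipboard module is passed in as a parameter (assumed to offer the same OpenClipboard, CloseClipboard and GetClipboardData calls used above) so the helper can be tried without Windows.

```python
from contextlib import contextmanager


@contextmanager
def opened_clipboard(wc):
    """Open the clipboard through wc and always close it again.

    wc is assumed to provide OpenClipboard() and CloseClipboard(),
    like the win32clipboard module used in the listing above.
    """
    wc.OpenClipboard()
    try:
        yield wc
    finally:
        wc.CloseClipboard()  # runs even if the body raises


def read_text(wc, text_format):
    """Return the clipboard content in text_format (e.g. win32con.CF_UNICODETEXT)."""
    with opened_clipboard(wc) as cb:
        return cb.GetClipboardData(text_format)
```

On Windows this would be called as read_text(win32clipboard, win32con.CF_UNICODETEXT). The set and empty helpers could be wrapped the same way.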

Finally, running the AutoOcrScreenshots function is all it takes:

if __name__ == '__main__':
    AutoOcrScreenshots()
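On exit, the program above removes the .txt and .png files it finds in the working directory, which may include files that have nothing to do with OCR. A gentler alternative, sketched here under my own assumptions, is to keep intermediate files in a dedicated temporary directory that cleans itself up. with_scratch_file, its parameters and the file name capture are hypothetical, chosen only for this illustration.

```python
import tempfile
from pathlib import Path


def with_scratch_file(data, suffix, process):
    """Write data to a file in a fresh temporary directory, call
    process(path) and return its result. The directory and everything
    in it are deleted when the with block exits."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / ("capture" + suffix)
        path.write_bytes(data)
        return process(path)


# Usage: the callback sees a real file, which is gone afterwards
size = with_scratch_file(b"not a real image", ".png", lambda p: p.stat().st_size)
print(size)
```

In the OCR program, process could be a function that reads the file and passes its bytes to the client, so no cleanup pass over the working directory would be needed.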

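Usage

On Windows 10, put the code above into a single .py file and run it. You can then press Shift+Win+S to capture any region of the screen. The captured image is held temporarily on the clipboard; the program picks it up automatically through the Windows API, sends it to Baidu's OCR API to obtain the text, places the text on the clipboard, and then plays a notification sound.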
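Since the image travels from the clipboard to the OCR API as raw bytes, the detour through a file on disk is not strictly necessary. The following sketch encodes the image in memory instead. image_to_png_bytes is a hypothetical helper, and it assumes the object passed in behaves like a PIL image, that is, it has a save method accepting a file-like object and a format name.

```python
import io


def image_to_png_bytes(image):
    """Encode image as PNG and return the raw bytes without touching the disk.

    image is assumed to be a PIL image, such as the object that
    ImageGrab.grabclipboard() returns when the clipboard holds a picture.
    """
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")  # write into the in-memory buffer
    return buffer.getvalue()
```

The returned bytes could then be handed to the client in place of the file content, for example client.basicGeneral(image_to_png_bytes(image)).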

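Once the program is running, take a screenshot with the shortcut, wait for the notification sound, and press Ctrl+V to paste the text wherever you need it.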

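Supplement: Chinese OCR in Python

I needed to recognize Chinese text in an image using Python. This kind of sophisticated technology is not something most of us can build ourselves, so I looked on GitHub and found an open-source library that seemed to fit the need: tesseract-ocr.

The Tesseract OCR engine has been released as an open-source project; its project home page is https://github.com/tesseract-ocr.

It supports Chinese OCR and provides a command-line tool. The corresponding Python package is pytesseract, which lets us recognize the text in an image.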

My development environment is as follows:

macOS

Python 3.6

brew

Install tesseract:

brew install tesseract

Install the corresponding Python package, pytesseract:

pip install pytesseract

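How do you use it?

To recognize Chinese you need to download the corresponding trained data from https://github.com/tesseract-ocr/tessdata. Download "chi_sim.traineddata" and copy it into the directory where Tesseract stores its trained data.

The code itself is only a few lines: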

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import pytesseract
from PIL import Image

# open image
image = Image.open('test.png')
code = pytesseract.image_to_string(image, lang='chi_sim')
print(code)

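OCR is fairly slow. Try it out on an image that contains Chinese text.

The above is my personal experience, and I hope it serves as a useful reference. If there are mistakes or anything I have not fully considered, corrections are welcome.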
