
Fixing UnicodeDecodeError in Python 3

2023-08-04 05:48:04

Decoding errors in the crawler

response.content.decode()  # defaults to utf-8 and raises UnicodeDecodeError when the page uses another encoding
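The failure can be reproduced without any network access. The following is a minimal sketch; the sample string and variable names are made up for illustration. Bytes written in GBK are not valid UTF-8, so calling decode() with no arguments raises UnicodeDecodeError.

```python
# Hypothetical sample: the bytes a GBK-encoded page would deliver.
raw = '中文网页'.encode('gbk')

try:
    raw.decode()  # no argument means utf-8
    failed = False
except UnicodeDecodeError:
    failed = True

# Decoding with the encoding the bytes were written in works.
text = raw.decode('gbk')
```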

Below is a general-purpose decoding routine.

Getting the encoding from the page text

# Get the encoding from the page text
import requests
from lxml import etree


def public_decode():
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'
    }
    response = requests.get('https://blog.csdn.net/a13951206104', headers=headers)
    # response.text guesses the encoding on its own and often comes out garbled,
    # but the ASCII charset attribute is usually still readable
    html = etree.HTML(response.text)
    _charset = html.xpath('//@charset') or []
    if _charset:
        # errors='replace' substitutes U+FFFD (�) for undecodable bytes instead of raising
        encode_content = response.content.decode(_charset[0].strip().lower(), errors='replace')
        return {'response_text': encode_content, 'response_obj': response}
    for _charset_ in ['utf-8', 'gbk', 'gb2312']:  # the three main encodings on sites in China
        decoded = response.content.decode(_charset_, errors='replace')
        if '�' not in decoded:
            return {'response_text': decoded, 'response_obj': response}
    # nothing decoded cleanly, so default to utf-8
    return {'response_text': response.content.decode('utf-8', errors='replace'),
            'response_obj': response}
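The XPath //@charset only finds the attribute form, <meta charset="...">. Older pages often declare the encoding inside content="text/html; charset=gbk" instead. As a rough sketch, and not part of the original routine, the declared charset can also be read straight from the raw bytes with a regular expression. The names declared_charset and _META_CHARSET, and the 4096-byte window, are assumptions chosen for this example.

```python
import re

# Matches charset=... inside a <meta> tag, covering both
# <meta charset="gbk"> and <meta ... content="text/html; charset=gbk">.
_META_CHARSET = re.compile(rb'<meta[^>]*?charset\s*=\s*["\']?\s*([\w-]+)', re.IGNORECASE)


def declared_charset(raw):
    """Return the charset named in the first matching <meta> tag, or None."""
    match = _META_CHARSET.search(raw[:4096])  # the declaration normally sits near the top
    if match is None:
        return None
    return match.group(1).decode('ascii').lower()
```

With requests, the input would be response.content. A declared charset can still be wrong or unknown to Python, so the fallback loop over candidate encodings remains useful.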

Choosing the encoding from the data (recommended)

def public_decode(response):
    # response is a requests.Response already fetched by the caller
    # We do not want illegal characters in the data we actually scrape
    item = dict()
    result = None
    for _charset_ in ['utf-8', 'gbk', 'gb2312']:
        if response:
            decoded = response.content.decode(_charset_, errors='replace')
            html = etree.HTML(decoded)
            nodes = html.xpath('//*[@id="content"]')
            # check the text of the target element; if it is missing, check the whole page
            item['content'] = ''.join(nodes[0].itertext()) if nodes else decoded
            if '�' not in item['content'].strip():
                result = decoded
                break
    if not result:
        # default to utf-8
        result = response.content.decode('utf-8', errors='replace')
    return result

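errors='replace' keeps decoding from raising an exception, so the common encodings can be tried one by one. What matters in the end is the data that gets stored, so it is best to use the extracted data itself to decide which encoding fits.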
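The try-each-encoding idea can be isolated into a small helper that works on raw bytes and needs no network access. This is a minimal sketch under stated assumptions: pick_encoding, REPLACEMENT and the sample bytes are names invented for this example, not part of the original code.

```python
REPLACEMENT = '\ufffd'  # the character errors='replace' inserts for undecodable bytes


def pick_encoding(raw, candidates=('utf-8', 'gbk', 'gb2312')):
    """Return (encoding, text) for the first candidate that decodes cleanly.

    Falls back to utf-8 with replacement characters if none does.
    """
    for encoding in candidates:
        text = raw.decode(encoding, errors='replace')
        if REPLACEMENT not in text:
            return encoding, text
    return 'utf-8', raw.decode('utf-8', errors='replace')


sample = '编码测试'.encode('gbk')  # made-up sample bytes
encoding, text = pick_encoding(sample)
```

The order of the candidates matters. utf-8 goes first because it is strict, while gbk can decode many UTF-8 byte sequences without error and silently produce garbled text.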

