Python 3.4 Crawler Demo

2023-08-31 11:45:05

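A crawler written in Python 3.4

This is only a demo, using the images on the Baidu Images home page as an example. It downloads the images that appear on that page.

Written with Eclipse PyDev: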

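    from SpiderSimple.HtmlHelper import *
    import sys

    # Python 3 strings are Unicode, so the Python 2 trick of
    # reload(sys) + sys.setdefaultencoding('utf-8') is not needed here.

    # Fetch the Baidu Images home page as raw bytes
    html = getHtml('http://image.baidu.com/')
    try:
        # Download every image found in the page, then quit
        getImage(html)
        sys.exit()
    except Exception as e:
        print(e)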

The HtmlHelper.py file

SpiderSimple, used above, is the name of a custom package.

    from urllib.request import urlopen, urlretrieve
    # Regular-expression library
    import re

    # Open the web page and return its content as bytes
    def getHtml(url):
        page = urlopen(url)
        html = page.read()
        return html

    # Use a regular expression to pull the image URLs out of the page
    def getImage(Html):
        try:
            #reg = r'src="(.+?\.jpg)" class'
            #image = re.compile(reg)
            image = re.compile(r'<img[^>]*src[=\"\']+([^\"\']*)[\"\'][^>]*>', re.I)
            Html = Html.decode('utf-8')
            imaglist = re.findall(image, Html)
            x = 0
            for imagurl in imaglist:
                # Download the images one by one into the project folder
                urlretrieve(imagurl, '%s.jpg' % x)
                x += 1
        except Exception as e:
            print(e)
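
One thing getImage does not handle is that src values are often relative (/b.png) or protocol-relative (//host/a.jpg), and urlretrieve cannot fetch those directly. The following is a minimal sketch of one way to normalize them with urljoin before downloading. The helper name extract_image_urls, the slightly stricter pattern, and the example.com sample markup are assumptions made for illustration, not part of the original demo.

```python
import re
from urllib.parse import urljoin

# Same idea as the pattern in getImage: capture the src value of each <img> tag.
IMG_SRC = re.compile(r'<img[^>]*\bsrc\s*=\s*["\']([^"\']+)["\']', re.I)

def extract_image_urls(html_text, base_url):
    """Return absolute http(s) image URLs found in html_text (hypothetical helper)."""
    urls = []
    for src in IMG_SRC.findall(html_text):
        # urljoin resolves relative and protocol-relative forms against base_url
        absolute = urljoin(base_url, src)
        # Skip things urlretrieve should not be asked to fetch, e.g. data: URIs
        if absolute.startswith(('http://', 'https://')):
            urls.append(absolute)
    return urls

# Made-up sample markup, only to show the call
sample = '<img class="a" src="//img.example.com/a.jpg"><IMG SRC=\'/b.png\'>'
print(extract_image_urls(sample, 'http://image.baidu.com/'))
```

The returned list could then be fed to the urlretrieve loop in getImage in place of imaglist.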

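One major issue to watch out for is Python's default encoding.

You may see the error UnicodeDecodeError: 'ascii' codec can't decode byte 0x?? in position 1: ordinal not in range(128). To deal with it, set Python's default encoding to UTF-8.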
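
If the error comes from decoding the downloaded page itself, another option is to decode the bytes explicitly instead of relying on any default. The following is a minimal sketch under that assumption. The helper name decode_page and the choice of GBK as a second candidate encoding are illustrative guesses, not something the original demo specifies.

```python
def decode_page(raw, encodings=('utf-8', 'gbk')):
    """Try each candidate encoding in turn (hypothetical helper)."""
    for name in encodings:
        try:
            return raw.decode(name)
        except UnicodeDecodeError:
            continue
    # Last resort: keep going, replacing undecodable bytes with U+FFFD
    return raw.decode(encodings[0], errors='replace')

text = '\u4e2d\u6587'  # two CJK characters, written as escapes
# Both byte forms should decode back to the same text
print(decode_page(text.encode('utf-8')) == text)
print(decode_page(text.encode('gbk')) == text)
```

In getImage, the call Html.decode('utf-8') could be swapped for decode_page(Html) if a page turns out not to be UTF-8.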

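A convenient way to set this is a .bat file that sets the PYTHONIOENCODING environment variable (which controls the encoding of standard input and output) before launching the script: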

    @echo off
    set PYTHONIOENCODING=utf8
    python -u %1

Then restart the computer.
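
To check what the variable actually changes, the sketch below starts a child Python process with PYTHONIOENCODING set and reports the encoding that the child's sys.stdout ends up using. The helper name child_stdout_encoding is made up for this example. It only inspects standard output, which is the part PYTHONIOENCODING is documented to affect.

```python
import codecs
import os
import subprocess
import sys

def child_stdout_encoding(io_encoding):
    """Run a child Python with PYTHONIOENCODING set and return the
    canonical name of its stdout encoding (hypothetical helper)."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    out = subprocess.check_output(
        [sys.executable, '-c', 'import sys; print(sys.stdout.encoding)'],
        env=env,
    )
    # Normalize aliases such as "utf8" or "UTF-8" to the canonical codec name
    return codecs.lookup(out.decode('ascii').strip()).name

print(child_stdout_encoding('utf8'))
```

Because the variable is passed to the child through its environment, this check works without a restart.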

Summary

That is all for this article. I hope it offers some reference value for your study or work. Thank you for supporting 毛票票.
