Python crawler: a code example for batch-downloading the Zabbix documentation
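This article presents a Python crawler that batch-downloads the Zabbix documentation, saving each page of the manual as a PDF file. The complete example code is given below and may serve as a reference for study or work.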
# -*- coding: UTF-8 -*-
import re
import time

import requests

# Index page of the Chinese Zabbix 3.4 manual; its menu links to every page.
url = 'https://www.zabbix.com/documentation/3.4/zh/manual'
base_url = 'https://www.zabbix.com/documentation/3.4/'
seconds = 1   # pause between two downloads, in seconds
err_url = []  # download URLs that failed


def get_urls():
    """Return the URL of every manual page listed in the index menu."""
    res = requests.get(url)
    content = res.text
    # The menu is built by JavaScript calls such as indexmenu_<id>.add('zh/manual/...');
    # the <id> below is copied from the page source.
    pattern = re.compile(r"indexmenu_4848130395ca30b274d8bd\.add[(]'(zh/manual.*?)[']", re.S)
    routes = pattern.findall(content)
    urls = [base_url + item for item in routes]
    return urls


def download(url):
    """Export one manual page as a PDF file named after the page title."""
    download_url = url + "?do=export_pdf"
    print("Current download URL:")
    print(download_url)
    res = requests.get(url)
    # Assumption: the page title comes from the HTML <title> element.
    pattern = re.compile(r"<title>(.*?)</title>", re.S)
    titles = pattern.findall(res.text) if res.status_code == 200 else []
    if titles:
        # Replace characters that are not allowed in file names.
        filename = re.sub(r'[\\/"*?:<>|]', '-', titles[0].strip())
        file = filename + '.pdf'
        res = requests.get(download_url)
        if res.status_code == 200:
            with open(file, "wb") as f:
                f.write(res.content)
            print('Download succeeded')
        else:
            print('Download failed')
            err_url.append(download_url)
    else:
        print('Could not get the file name; skipping this download')
        err_url.append(download_url)


def downloads(urls):
    """Download every page in turn, pausing between requests."""
    for url in urls:
        download(url)
        time.sleep(seconds)
    if len(err_url):
        print("URLs that failed to download:")
        print(err_url)


def main():
    print("Download started")
    urls = get_urls()
    downloads(urls)
    print("Download finished")


if __name__ == '__main__':
    main()
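The two text-processing steps of the script, extracting the page routes and cleaning the file name, can be tried without touching the network. The following is only a minimal sketch and not part of the original script: `extract_routes` and `safe_filename` are hypothetical helper names, `sample_html` is made-up markup that imitates the menu calls, and matching any menu identifier with `\w+` instead of the fixed one is an assumption on my part.

```python
import re

# Hypothetical sample imitating the index-menu JavaScript of the manual page.
sample_html = """
indexmenu_4848130395ca30b274d8bd.add('zh/manual/introduction');
indexmenu_4848130395ca30b274d8bd.add('zh/manual/installation');
"""

BASE_URL = 'https://www.zabbix.com/documentation/3.4/'

# Same idea as get_urls(): capture the first quoted argument of each add() call.
# Assumption: \w+ stands in for the fixed menu identifier used in the script.
ROUTE_PATTERN = re.compile(r"indexmenu_\w+\.add\('(zh/manual.*?)'", re.S)

# The characters that download() replaces: \ / " * ? : < > |
UNSAFE_CHARS = re.compile(r'[\\/"*?:<>|]')


def extract_routes(html):
    """Return the full URL for every manual route found in html."""
    return [BASE_URL + route for route in ROUTE_PATTERN.findall(html)]


def safe_filename(title):
    """Replace characters that are invalid in file names with '-'."""
    return UNSAFE_CHARS.sub('-', title.strip())


if __name__ == '__main__':
    for page_url in extract_routes(sample_html):
        print(page_url + '?do=export_pdf')
    # Hypothetical page title, used only to show the replacement.
    print(safe_filename('Zabbix: what is it?') + '.pdf')
```

Keeping these steps in small pure functions makes it possible to check the regular expressions against a saved copy of the page before starting a long run of downloads.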
That is all for this article. I hope it helps with your learning, and thank you for supporting 毛票票.