
Scraping Web Page Content with Python and Converting It to a PDF File

2023-09-11 19:45:05

This article shares complete Python code for scraping web page content and converting it to PDF, for your reference. The details follow.

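The script converts Liao Xuefeng's tutorials into a PDF file. The code only works for that site; to convert tutorials from other sites, you will probably need to adapt it slightly (for example, the selectors used to locate the article body and title).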
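The script's imports (pdfkit, PdfFileMerger, os) point to an overall flow of: save each chapter as an HTML file, convert each file to PDF, merge the chapter PDFs in order, then delete the intermediate files. The driver below is a hypothetical sketch of that flow, not the author's own main routine; build_book and its parameter names are illustrative. The three steps are passed in as functions so the ordering logic can be read and tested without network access or PDF libraries.

```python
import os
from typing import Callable, Iterable, List


def build_book(
    urls: Iterable[str],
    save_html: Callable[[str, str], None],
    html_to_pdf: Callable[[str, str], None],
    merge_pdfs: Callable[[List[str], str], None],
    output_path: str,
    workdir: str = ".",
) -> List[str]:
    """Hypothetical driver: one HTML and one PDF per chapter, then one merged book.

    Returns the chapter PDF paths in the order they were merged.
    """
    html_files: List[str] = []
    pdf_files: List[str] = []
    for index, url in enumerate(urls):
        # Zero-padded indices keep the files sortable and preserve chapter order.
        html_path = os.path.join(workdir, f"{index:04d}.html")
        pdf_path = os.path.join(workdir, f"{index:04d}.pdf")
        save_html(url, html_path)
        html_to_pdf(html_path, pdf_path)
        html_files.append(html_path)
        pdf_files.append(pdf_path)
    merge_pdfs(pdf_files, output_path)
    # Delete intermediate files only after the merge step has run.
    for path in html_files + pdf_files:
        if os.path.exists(path):
            os.remove(path)
    return pdf_files
```

With the real libraries, save_html could call parse_url_to_html, html_to_pdf could be lambda src, dst: pdfkit.from_file(src, dst), and merge_pdfs could append each path to a PyPDF2 merger (PdfMerger in newer releases, PdfFileMerger in older ones) and then call its write method with output_path.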

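# coding=utf-8
import os
import re
import time
import pdfkit
import requests
from bs4 import BeautifulSoup
# PdfFileMerger is the older PyPDF2 class name; newer releases call it PdfMerger
from PyPDF2 import PdfFileMerger
import sys
# Python 2 only: force UTF-8 as the default encoding to avoid implicit ASCII
# encode/decode errors on Chinese text. Drop these two lines on Python 3,
# where reload is not a builtin and sys.setdefaultencoding does not exist.
reload(sys)
sys.setdefaultencoding('utf8')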

html_template = """
<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
</head>
<body>
{content}
</body>
</html>

"""
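Filling the template takes a single str.format call, since {content} is its only placeholder. The sketch below is an assumption, not the author's code: the page skeleton and the helper name wrap_fragment are illustrative. Declaring a UTF-8 charset is worth doing because the tutorial text is Chinese, and wkhtmltopdf (the renderer that pdfkit drives) may otherwise misdetect the encoding.

```python
# Hypothetical minimal page skeleton; only {content} is substituted.
HTML_TEMPLATE = """<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
</head>
<body>
{content}
</body>
</html>
"""


def wrap_fragment(fragment: str) -> str:
    """Embed an HTML fragment (e.g. the extracted article body) in a full page."""
    # str.format only interprets braces in the template itself, so braces
    # inside `fragment` (common in code samples) are inserted verbatim.
    return HTML_TEMPLATE.format(content=fragment)
```

If you add CSS rules to such a template, double their literal braces ({{ and }}); otherwise str.format treats them as placeholders and fails.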

#----------------------------------------------------------------------
def parse_url_to_html(url, name):
    """
    Parse the URL and return its HTML content.
    :param url: the URL to parse
    :param name: file name for the saved HTML
    :return: html
    """
    try:
        response = requests.get(url)
        soup = BeautifulSoup(response.content, 'html.parser')
        # Article body
        body = soup.find_all(class_="x-wiki-content")[0]
        # Title
        title = soup.find('h4').get_text()

        # Insert the title at the top of the body, centered
        center_tag = soup.new_tag("center")
        title_tag = soup.new_tag('h1')
        title_tag.string = title
        center_tag.insert(1, title_tag)
        body.insert(1, center_tag)
        html = str(body)
        # Rewrite the relative src paths of img tags in body as absolute paths
        pattern = "(
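The last comment in the listing describes turning relative img src paths into absolute ones. Relative paths resolve against the site only while the page is served from it; once the fragment is saved to a local HTML file for pdfkit, they no longer point at the images. A minimal sketch of that rewrite, assuming a regular expression plus urllib.parse.urljoin; the pattern and the name absolutize_img_src are illustrative, not the author's original regex.

```python
import re
from urllib.parse import urljoin

# Group 1: "<img ... src=", group 2: the opening quote, group 3: the path.
# Requiring whitespace before src= keeps attributes such as data-src from matching.
IMG_SRC = re.compile(r"""(<img\b[^>]*?\ssrc=)(["'])(.*?)\2""", re.IGNORECASE)


def absolutize_img_src(html: str, base_url: str) -> str:
    """Return html with every quoted <img src> resolved against base_url."""

    def repl(match: re.Match) -> str:
        prefix, quote, src = match.groups()
        # urljoin resolves relative paths and leaves absolute URLs unchanged.
        return f"{prefix}{quote}{urljoin(base_url, src)}{quote}"

    return IMG_SRC.sub(repl, html)
```

Inside parse_url_to_html this would run on html before it is wrapped in html_template, with url as the base. A regex is brittle on arbitrary HTML; since the script already uses BeautifulSoup, looping over body.find_all('img') and assigning each tag's src from urljoin is a sturdier alternative.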
That's all for this article. I hope it helps with your studies.
