Reading a local file and parsing web page elements in Python
The approach is shown below:
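from bs4 import BeautifulSoup

path = './web/new_index.html'
# Read the local HTML file and parse it with the lxml parser
with open(path, 'r') as f:
    Soup = BeautifulSoup(f.read(), 'lxml')
    # Select every article title link by its full CSS path
    titles = Soup.select('ul > li > div.article-info > h3 > a')
for title in titles:
    print(title.text)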
Output:
Sardinia's top 10 beaches
How to get tanned
How to be an Aussie beach bum
Summer's cheat sheet
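# Here,
titles = Soup.select('ul > li > div.article-info > h3 > a')
# selects the same elements on this page as the shorter descendant selector
titles = Soup.select('h3 a')
print(title.text)  # equivalent to print(title.get_text()); print(title.string) also matches when the tag holds only text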
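The equivalences noted above depend on the page's markup, so it may help to see where they stop holding. The sketch below uses a small, made-up HTML string (not the original page) and the built-in 'html.parser' instead of 'lxml', purely so that it runs without extra dependencies. It compares the full child-combinator selector with the shorter 'h3 a', and .get_text() with .string.

```python
from bs4 import BeautifulSoup

# Hypothetical markup invented for this illustration; it is not the original page.
html = """
<ul>
  <li>
    <div class="article-info">
      <h3><a href="#">How to get tanned</a></h3>
    </div>
  </li>
</ul>
<h3><a href="#">Nested <b>tag</b></a></h3>
"""

soup = BeautifulSoup(html, 'html.parser')

# '>' matches direct children only, so the link must sit on the exact path.
strict = soup.select('ul > li > div.article-info > h3 > a')
# A space matches any descendant, so every <a> inside any <h3> is returned.
loose = soup.select('h3 a')

strict_texts = [a.get_text() for a in strict]
loose_texts = [a.get_text() for a in loose]

# .string is None when a tag has more than one child node,
# while .text / .get_text() concatenate all nested text.
nested_string = loose[1].string

print(strict_texts)
print(loose_texts)
print(nested_string)
```

On a page where every h3 link sits on the same path and contains only text, the two selectors and the three text accessors give the same result, which is the situation the original example relies on.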
The following code can also be used:
import bs4

path = './web/new_index.html'
# Same steps as above, importing the bs4 package itself
with open(path, 'r') as f:
    Soup = bs4.BeautifulSoup(f.read(), 'lxml')
    titles = Soup.select('h3 a')
for title in titles:
    print(title.string)
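A possible variation on the file-reading step: BeautifulSoup also accepts an open file object, and passing an explicit encoding to open() avoids depending on the platform default. The sketch below is only an illustration. It writes a made-up one-line HTML file to a temporary directory (the file content and the temporary path are assumptions, not the original ./web/new_index.html) and again uses 'html.parser' so that lxml is not required.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Hypothetical file content, standing in for the real page.
html = '<h3><a href="#">Summer\'s cheat sheet</a></h3>'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'new_index.html')

    # Create the sample file so the example is self-contained.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(html)

    # Pass the open file object directly; BeautifulSoup reads it itself.
    with open(path, 'r', encoding='utf-8') as f:
        soup = BeautifulSoup(f, 'html.parser')

# The parsed tree stays usable after the file is closed.
link_texts = [a.get_text() for a in soup.select('h3 a')]
print(link_texts)
```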
Original HTML:
Home Site Other