How to read a local file and parse web page elements in Python
The approach is shown below:
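from bs4 import BeautifulSoup

path = './web/new_index.html'

# Read the local HTML file and parse it with the lxml parser
with open(path, 'r') as f:
    Soup = BeautifulSoup(f.read(), 'lxml')

# Select every article title link with a CSS selector
titles = Soup.select('ul > li > div.article-info > h3 > a')
for title in titles:
    print(title.text)

Output:
Sardinia's top 10 beaches
How to get tanned
How to be an Aussie beach bum
Summer's cheat sheet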
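# Here,
titles = Soup.select('ul > li > div.article-info > h3 > a')
# is equivalent, for this page, to
titles = Soup.select('h3 a')

print(title.text)
# is equivalent to
print(title.get_text())
print(title.string)
# (.string matches only because each link contains a single text node)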
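The three accessors agree on this page because each a tag holds exactly one text node. A minimal sketch of where they diverge is given below. The inline HTML is invented for illustration and is not taken from new_index.html, and the standard-library html.parser backend is used instead of lxml so the snippet needs no extra parser; both choices are assumptions.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, not taken from new_index.html.
html = (
    '<h3><a href="#">Plain title</a></h3>'
    '<h3><a href="#">Mixed <b>bold</b> title</a></h3>'
)
soup = BeautifulSoup(html, 'html.parser')
plain, mixed = soup.select('h3 a')

# Single text child: all three accessors return the same text.
plain_results = (plain.text, plain.get_text(), plain.string)

# Several children: .text and .get_text() join all nested text,
# while .string is None because there is no single string child.
mixed_results = (mixed.text, mixed.get_text(), mixed.string)

print(plain_results)
print(mixed_results)
```

If the links may contain nested tags, .text or .get_text() is the safer choice, since .string can silently become None.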
The same result can also be obtained with the following code, which imports the bs4 module as a whole:
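import bs4

path = './web/new_index.html'

# Read the local HTML file and parse it with the lxml parser
with open(path, 'r') as f:
    Soup = bs4.BeautifulSoup(f.read(), 'lxml')

# The shorter descendant selector matches the same title links
titles = Soup.select('h3 a')
for title in titles:
    print(title.string)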
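The short selector 'h3 a' and the long child-combinator selector return the same links only when every h3 link on the page sits inside the ul > li > div.article-info structure. The sketch below illustrates the difference on invented markup (the sidebar block is hypothetical, and html.parser is again an assumption made to avoid depending on lxml).

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one h3 link inside the list structure, one outside it.
html = '''
<ul>
  <li>
    <div class="article-info"><h3><a href="#">Inside the list</a></h3></div>
  </li>
</ul>
<div class="sidebar"><h3><a href="#">Outside the list</a></h3></div>
'''
soup = BeautifulSoup(html, 'html.parser')

# '>' requires a direct parent/child relationship at every step.
strict = [a.text for a in soup.select('ul > li > div.article-info > h3 > a')]

# A space matches any a tag nested anywhere below any h3.
loose = [a.text for a in soup.select('h3 a')]

print(strict)
print(loose)
```

The longer selector is more verbose but avoids picking up unrelated headings if the page layout grows.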
Original HTML:
Home Site Other