
Extracting the list of URLs from a web page with Python

2024-03-31 09:45:03

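This article walks through an example of extracting the list of URLs found on a web page with Python, shared here for reference. The implementation is as follows: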

from bs4 import BeautifulSoup
import time
import urllib.request

websiteurls = {}  # URLs recorded elsewhere; links already in here are skipped

def scanpage(url):
    websiteurl = url
    t = time.time()
    n = 0
    html = urllib.request.urlopen(websiteurl).read()
    soup = BeautifulSoup(html, "html.parser")
    Upageurls = {}  # unique link URL -> HTTP status code
    # Collect every <a> tag that carries an href attribute
    pageurls = soup.find_all("a", href=True)
    for links in pageurls:
        href = links.get("href")
        # Keep only links that contain the site URL and have not been seen yet
        if websiteurl in href and href not in Upageurls and href not in websiteurls:
            Upageurls[href] = 0
    for links in list(Upageurls.keys()):
        t2 = time.time()
        try:
            # Request each link once and keep its status code
            code = urllib.request.urlopen(links).getcode()
        except Exception:
            print("connect failed")
        else:
            Upageurls[links] = code
            print(n, links, Upageurls[links])
            print(time.time() - t2)  # seconds spent on this request
        n += 1
    print("total is " + repr(n) + " links")
    print(time.time() - t)  # total elapsed seconds

scanpage("http://news.163.com/")
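The listing above keeps only links whose href contains the site URL as a substring, so relative links such as /sports/ are skipped. The following is a minimal sketch of one way to handle them, assuming Python 3 and only the standard library (html.parser instead of BeautifulSoup, so it runs without extra packages). The names LinkCollector and extract_links and the sample HTML are hypothetical and exist only for illustration. Each href is resolved against the page URL with urljoin, and only links on the same host are kept, without duplicates.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return unique absolute same-host links, in order of first appearance."""
    parser = LinkCollector()
    parser.feed(html)
    host = urlparse(base_url).netloc
    seen = {}
    for href in parser.hrefs:
        absolute = urljoin(base_url, href)  # resolves relative paths
        if urlparse(absolute).netloc == host:
            seen.setdefault(absolute, 0)  # 0 = status not checked yet
    return list(seen)


# Sample page content, made up for this example
sample = """
<a href="/sports/">Sports</a>
<a href="http://news.163.com/tech/">Tech</a>
<a href="/sports/">Sports again</a>
<a href="http://example.com/">Elsewhere</a>
<a name="top">No href</a>
"""
links = extract_links(sample, "http://news.163.com/")
print(links)
```

The returned list could then be passed to the status-checking loop in place of the keys of Upageurls. Comparing hosts rather than testing for a substring also avoids keeping off-site links that merely mention the site URL in a query string.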

I hope this article is helpful for your Python programming.
