Python crawler example: incremental deduplication and scheduled crawling

2023-07-31 12:02:04

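# mysql_config.py
import pymysql

# Connection settings shared by both helpers
DB_CONFIG = {
    'host': '127.0.0.1',
    'user': 'root',
    'password': 'root',
    'port': 3306,
    'database': 'lottery',
}


def insert_db(db_table, issue, time_str, num_code):
    """Insert one draw (issue number, draw time, numbers) into db_table."""
    data_base = pymysql.connect(**DB_CONFIG)
    cursor = data_base.cursor()
    try:
        # A table name cannot be a query parameter, so it is formatted in;
        # the values are passed separately so pymysql escapes them.
        sql = "INSERT INTO {} VALUES (%s, %s, %s)".format(db_table)
        cursor.execute(sql, (issue, time_str, num_code))
        data_base.commit()
    except pymysql.MySQLError as e:
        print(e)
        data_base.rollback()
        raise  # let the caller report the failure (e.g. a duplicate key)
    finally:
        cursor.close()
        data_base.close()


def select_db(issue, db_table):
    """Return issue if it is not stored in db_table yet, otherwise None.

    Assumes the issue-number column is named `issue`; the original
    does not show the table schema.
    """
    data_base = pymysql.connect(**DB_CONFIG)
    cursor = data_base.cursor()
    try:
        sql = "SELECT 1 FROM {} WHERE issue = %s LIMIT 1".format(db_table)
        cursor.execute(sql, (issue,))
        return None if cursor.fetchone() else issue
    finally:
        cursor.close()
        data_base.close()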
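The check-then-insert approach needs two round trips per row and can still race if two crawls overlap. An alternative is to let the database itself reject duplicates. The sketch below swaps in Python's built-in `sqlite3` with an in-memory database so it runs without a MySQL server; the table layout `lottery_table(issue PRIMARY KEY, time_str, num_code)`, the helper name `save_new_records`, and the sample rows are all hypothetical.

```python
import sqlite3


def save_new_records(conn: sqlite3.Connection, rows) -> int:
    """Insert (issue, time_str, num_code) rows, skipping issues already stored.

    Returns how many rows were actually new.
    """
    before = conn.total_changes
    # INSERT OR IGNORE silently skips rows that would violate the PRIMARY KEY on `issue`
    conn.executemany(
        "INSERT OR IGNORE INTO lottery_table (issue, time_str, num_code) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    # total_changes only counts rows that were really inserted
    return conn.total_changes - before


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lottery_table (issue TEXT PRIMARY KEY, time_str TEXT, num_code TEXT)"
)
first = [("001", "09:00", "1,5,12"), ("002", "09:10", "3,7,8")]   # made-up sample rows
second = [("002", "09:10", "3,7,8"), ("003", "09:20", "2,4,6")]   # "002" is a repeat
print(save_new_records(conn, first))   # → 2
print(save_new_records(conn, second))  # → 1
```

With MySQL the same idea would be `INSERT IGNORE INTO ...` or catching `pymysql.err.IntegrityError`; either way the `issue` column must be declared PRIMARY KEY or UNIQUE for the database to detect repeats.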

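# crawler.py
# Parses the page with bs4 (BeautifulSoup)
# Incremental deduplication: issues already in the database are skipped
# Scheduled crawling: the page is fetched again at a fixed interval
import datetime
import time

from bs4 import BeautifulSoup
import requests

from mysql_config import insert_db
from mysql_config import select_db


def my_test():
    db_table = 'lottery_table'
    url = 'http://kj.13322.com/kl10_dkl10_history_dtoday.html'
    res = requests.get(url)
    content = res.content
    soup = BeautifulSoup(content, 'html.parser', from_encoding='utf8')
    c_t = soup.select('#trend_table')[0]
    trs = c_t.contents[4:]
    for tr in trs:
        if tr == '\n':
            continue
        tds = tr.select('td')
        issue = tds[1].text
        time_str = tds[0].text
        # Join the drawn numbers with commas; replacing '\n0' also drops a leading zero
        num_code = tr.table.text.replace('\n0', ',').replace('\n', ',').strip(',')
        print('Issue: %s\tTime: %s\tNumbers: %s' % (issue, time_str, num_code))
        # select_db returns the issue number only if it is not stored yet
        issue_db = select_db(issue, db_table)
        if issue_db != issue:
            print('%s already exists!' % issue)
            continue
        try:
            insert_db(db_table, issue, time_str, num_code)
            print('Added %s to %s' % (issue, db_table))
        except Exception as e:
            print('Failed to insert %s: %s' % (issue, e))


if __name__ == '__main__':
    # First run 3 seconds from now (to whole-second precision)
    now = datetime.datetime.now()
    sched_time = now.replace(microsecond=0) + datetime.timedelta(seconds=3)
    # Assumption: the original loop is cut off after "if sched_time"; this
    # completion re-runs my_test every `interval` (a hypothetical value).
    interval = datetime.timedelta(minutes=2)
    while True:
        now = datetime.datetime.now()
        if sched_time < now:
            my_test()
            sched_time = sched_time + interval
        time.sleep(1)  # avoid a busy loop while waiting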
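The `while True` loop above wakes up repeatedly and compares the clock with the next scheduled time. A minimal standard-library alternative is to sleep exactly until the next run. In this sketch, `run_periodically` is a hypothetical helper name, and `max_runs` exists only so the example terminates.

```python
import time
from typing import Callable


def run_periodically(job: Callable[[], None], interval: float, max_runs: int) -> int:
    """Call `job` every `interval` seconds, `max_runs` times; return the run count.

    The next run time advances by a fixed step from the previous *scheduled*
    time rather than from when the job finished, so the job's own duration
    does not gradually shift the schedule (fixed-rate scheduling).
    """
    next_run = time.monotonic()  # monotonic clock: unaffected by system clock changes
    runs = 0
    while runs < max_runs:
        delay = next_run - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        job()
        runs += 1
        next_run += interval
    return runs
```

With the crawler above, `job` would be `my_test` and `interval` the desired gap in seconds (for example 120, an arbitrary choice). To run indefinitely, replace the `max_runs` check with `while True`.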

