Saving scraped data to MongoDB with a custom Scrapy pipeline class
This article shows how to save scraped data to MongoDB by writing a custom Scrapy pipeline class. It is shared here for your reference; the details are as follows:
# Standard Python library imports

# 3rd party modules
import pymongo

from scrapy import log
from scrapy.conf import settings
from scrapy.exceptions import DropItem

class MongoDBPipeline(object):
    def __init__(self):
        # Read connection details from the project settings
        self.server = settings['MONGODB_SERVER']
        self.port = settings['MONGODB_PORT']
        self.db = settings['MONGODB_DB']
        self.col = settings['MONGODB_COLLECTION']
        connection = pymongo.Connection(self.server, self.port)
        db = connection[self.db]
        self.collection = db[self.col]

    def process_item(self, item, spider):
        # Drop items that have any empty field; otherwise write them to MongoDB
        err_msg = ''
        for field, data in item.items():
            if not data:
                err_msg += 'Missing %s of poem from %s\n' % (field, item['url'])
        if err_msg:
            raise DropItem(err_msg)
        self.collection.insert(dict(item))
        log.msg('Item written to MongoDB database %s/%s' % (self.db, self.col),
                level=log.DEBUG, spider=spider)
        return item
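For the pipeline to run, it must be enabled in the project's settings.py, which is also where the MONGODB_* values read in __init__ are defined. The original article does not show this file, so the sketch below is an assumption: the module path myproject.pipelines and the concrete host, database, and collection values are placeholders to adapt to your own project.

# settings.py -- a minimal sketch, not from the original article.
# In current Scrapy versions ITEM_PIPELINES is a dict mapping the
# pipeline's import path to an order number (older versions used a list).
ITEM_PIPELINES = {
    'myproject.pipelines.MongoDBPipeline': 300,  # hypothetical module path
}
MONGODB_SERVER = 'localhost'    # MongoDB host
MONGODB_PORT = 27017            # default MongoDB port
MONGODB_DB = 'scrapy_db'        # database name (placeholder)
MONGODB_COLLECTION = 'items'    # collection name (placeholder)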
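Note that the class above targets old library versions: pymongo.Connection was removed in PyMongo 3 in favor of MongoClient (and collection.insert in favor of insert_one), while the scrapy.conf and scrapy.log modules no longer exist in current Scrapy releases. A rough equivalent for current versions of both libraries might look like the sketch below; it keeps the same MONGODB_* settings and the same drop-on-empty-field logic, but it is an assumption, not code from the original article.

import pymongo
from scrapy.exceptions import DropItem

class MongoDBPipeline(object):
    @classmethod
    def from_crawler(cls, crawler):
        # Modern Scrapy passes the running crawler here; its settings
        # object replaces the removed scrapy.conf module.
        s = crawler.settings
        return cls(server=s.get('MONGODB_SERVER', 'localhost'),
                   port=s.getint('MONGODB_PORT', 27017),
                   db=s.get('MONGODB_DB'),
                   col=s.get('MONGODB_COLLECTION'))

    def __init__(self, server, port, db, col):
        self.db_name = db
        self.col_name = col
        # MongoClient replaces the pymongo.Connection class removed in PyMongo 3
        client = pymongo.MongoClient(server, port)
        self.collection = client[db][col]

    def process_item(self, item, spider):
        # Same validation as the original: drop items with any empty field
        err_msg = ''
        for field, data in item.items():
            if not data:
                err_msg += 'Missing %s of poem from %s\n' % (field, item['url'])
        if err_msg:
            raise DropItem(err_msg)
        self.collection.insert_one(dict(item))
        # spider.logger replaces the removed scrapy.log module
        spider.logger.debug('Item written to MongoDB database %s/%s',
                            self.db_name, self.col_name)
        return item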
Hopefully this article has been helpful for your Python programming.