Logging Search-Engine Crawler Visits in PHP: Really Handy
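This article shows how to record crawler visits. First create a crawler table in the database. Then robot.php detects each visiting crawler and inserts its details into that table, so every crawler record can later be read back from the database. The code is as follows: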
Database design
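CREATE TABLE crawler (
    crawler_ID       BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    crawler_category VARCHAR(50)   NOT NULL,   -- crawler name, e.g. Google, Baidu
    crawler_date     DATETIME      NOT NULL,   -- time of the visit
    crawler_url      VARCHAR(2048) NOT NULL,   -- crawled page, long enough for query strings
    crawler_IP       VARCHAR(45)   NOT NULL    -- visitor IP; 45 characters also fits IPv6
) DEFAULT CHARSET=utf8;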
The following file, robot.php, detects visiting crawlers and writes their details to the database:
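<?php
// robot.php: detect a visiting crawler and record the visit in the crawler table

// Rebuild the full URL of the requested page
$ServerName  = $_SERVER["SERVER_NAME"];
$ServerPort  = $_SERVER["SERVER_PORT"] ?? "";
$ScriptName  = $_SERVER["SCRIPT_NAME"];
$QueryString = $_SERVER["QUERY_STRING"] ?? "";
$serverip    = $_SERVER["REMOTE_ADDR"];   // IP address of the visitor
$Url = "http://" . $ServerName;
if ($ServerPort != "") {
    $Url = $Url . ":" . $ServerPort;
}
$Url = $Url . $ScriptName;
if ($QueryString != "") {
    $Url = $Url . "?" . $QueryString;
}
$GetLocationURL = $Url;

// Classify the visitor by its User-Agent. Each later match overrides the
// earlier ones, so the generic "bot" check has to come first.
$agent = strtolower($_SERVER["HTTP_USER_AGENT"] ?? "");
$Bot = "";
if (strpos($agent, "bot") !== false)                  { $Bot = "OtherCrawler"; }
if (strpos($agent, "googlebot") !== false)            { $Bot = "Google"; }
if (strpos($agent, "mediapartners-google") !== false) { $Bot = "GoogleAdsense"; }
if (strpos($agent, "baiduspider") !== false)          { $Bot = "Baidu"; }
if (strpos($agent, "sogouspider") !== false)          { $Bot = "Sogou"; }
if (strpos($agent, "yahoo") !== false)                { $Bot = "Yahoo!"; }
if (strpos($agent, "msn") !== false)                  { $Bot = "MSN"; }
if (strpos($agent, "ia_archiver") !== false)          { $Bot = "Alexa"; }
if (strpos($agent, "iaarchiver") !== false)           { $Bot = "Alexa"; }
if (strpos($agent, "sohu") !== false)                 { $Bot = "Sohu"; }
if (strpos($agent, "sqworm") !== false)               { $Bot = "AOL"; }
if (strpos($agent, "yodaobot") !== false)             { $Bot = "Yodao"; }  // lowercase, since $agent is lowercased
if (strpos($agent, "iaskspider") !== false)           { $Bot = "Iask"; }

// Ordinary (non-crawler) visitors are not recorded
if ($Bot == "") {
    return;
}

require "./dbinfo.php";   // defines $host, $username, $password and $database
date_default_timezone_set('PRC');
$shijian = date("Y-m-d H:i:s");   // visit time; "H" is the 24-hour clock

try {
    // Connect to the MySQL server and select the database. PDO replaces the
    // mysql_* functions, which were removed in PHP 7.
    $connection = new PDO("mysql:host=$host;dbname=$database;charset=utf8",
                          $username, $password,
                          [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
    // Insert the record with a prepared statement so the URL and other
    // request data cannot inject SQL
    $query = $connection->prepare(
        "INSERT INTO crawler (crawler_category, crawler_date, crawler_url, crawler_IP)
         VALUES (?, ?, ?, ?)");
    $query->execute([$Bot, $shijian, $GetLocationURL, $serverip]);
} catch (PDOException $e) {
    die('Database error: ' . $e->getMessage());
}
?>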
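The chain of `if` checks in robot.php can also be written as a single lookup table. The sketch below is a hypothetical helper (`detect_crawler()` is an illustrative name, not part of the original code) that uses the same substrings and labels. Because robot.php lets the last matching check win, the table lists the checks in reverse order and returns the first match, which yields the same label.

```php
<?php
// Hypothetical helper: classify a User-Agent string the same way robot.php does.
function detect_crawler(string $userAgent): string
{
    $agent = strtolower($userAgent);
    // Same substrings and labels as robot.php, in reverse order.
    // robot.php keeps the LAST match, so scanning the reversed list and
    // stopping at the FIRST match gives an identical result.
    $rules = [
        'iaskspider'           => 'Iask',
        'yodaobot'             => 'Yodao',
        'sqworm'               => 'AOL',
        'sohu'                 => 'Sohu',
        'iaarchiver'           => 'Alexa',
        'ia_archiver'          => 'Alexa',
        'msn'                  => 'MSN',
        'yahoo'                => 'Yahoo!',
        'sogouspider'          => 'Sogou',
        'baiduspider'          => 'Baidu',
        'mediapartners-google' => 'GoogleAdsense',
        'googlebot'            => 'Google',
        'bot'                  => 'OtherCrawler',   // generic fallback, must stay last
    ];
    foreach ($rules as $needle => $label) {
        if (strpos($agent, $needle) !== false) {
            return $label;
        }
    }
    return '';   // not a known crawler
}

// Usage: prints "Google"
echo detect_crawler('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'), "\n";
```

robot.php could then set `$Bot = detect_crawler($_SERVER["HTTP_USER_AGENT"] ?? "");` in place of the `if` chain. Adding a crawler becomes a one-line change to `$rules`, as long as the new entry sits above the generic `bot` entry.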
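That's it. By querying the crawler table you can now see when each search-engine spider visited, which page it crawled, and which IP it came from. The following page lists the records, newest first, with pagination: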
<?php
include './robot.php';                 // log this visit if the visitor is a crawler
include '../library/page.Class.php';   // the site's own pagination class (PageClass)
include '../library/conn_new.php';     // provides the $mysql database object
$page = $_GET['page'] ?? 1;
$pageSize = 20;                        // rows per page (example value)
$count = $mysql->num_rows($mysql->query("select * from crawler"));
$pages = new PageClass($count, $pageSize, $page, $_SERVER['PHP_SELF'] . '?page={page}');
$sql  = "select * from crawler order by ";
$sql .= "crawler_date desc limit " . $pages->page_limit . "," . $pages->myde_size;
$result = $mysql->query($sql);
?>
<table>
<thead>
<tr>
<td bgcolor="#CCFFFF"></td>
<td bgcolor="#CCFFFF" align="center">Visit time</td>
<td bgcolor="#CCFFFF" align="center">Crawler category</td>
<td bgcolor="#CCFFFF" align="center">Crawler IP</td>
<td bgcolor="#CCFFFF" align="center">URL visited</td>
</tr>
</thead>
<?php while ($myrow = $mysql->fetch_array($result)) { ?>
<tr>
<td><img src="../images/topicnew.gif" alt="" /></td>
<td style="font-family:Georgia"><?php echo htmlspecialchars($myrow["crawler_date"]); ?></td>
<td><?php echo htmlspecialchars($myrow["crawler_category"]); ?></td>
<td><?php echo htmlspecialchars($myrow["crawler_IP"]); ?></td>
<td><?php echo htmlspecialchars($myrow["crawler_url"]); ?></td>
</tr>
<?php } ?>
</table>
<?php echo $pages->myde_write(); ?>
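The `limit` clause on this page relies on PageClass's `page_limit` and `myde_size` properties, whose internals are not shown here. Assuming they follow the usual scheme (a 1-based page number and a fixed page size), the arithmetic looks like the hypothetical sketch below; `page_limit_clause()` is an illustrative name, not part of page.Class.php.

```php
<?php
// Hypothetical helper: turn a requested page number into a MySQL
// "LIMIT offset, size" clause. Page N (1-based) skips (N - 1) * size rows.
// $pageSize must be a positive integer.
function page_limit_clause(int $totalRows, int $pageSize, $requestedPage): array
{
    $totalPages = max(1, (int) ceil($totalRows / $pageSize));
    // Clamp the page taken from $_GET so bad input cannot produce a
    // negative offset or point past the last page.
    $page = min(max(1, (int) $requestedPage), $totalPages);
    $offset = ($page - 1) * $pageSize;
    return [$page, $totalPages, "LIMIT $offset, $pageSize"];
}

// Usage: 45 rows, 20 per page, third page requested
[$page, $totalPages, $limit] = page_limit_clause(45, 20, '3');
echo "page $page of $totalPages: SELECT * FROM crawler ORDER BY crawler_date DESC $limit\n";
```

Clamping matters because `page` comes straight from the query string: without it, `?page=0` would produce a negative offset, which MySQL rejects.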
That is all the code for logging crawler visits with PHP. I hope it helps.