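Logging crawler visits with PHP: a very handy trick
This article shows how to log crawler visits with PHP: create a table named crawler, let robot.php detect visiting crawlers and insert their details into that table, and then read every crawler record back from the database. The code is as follows: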
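Database design
-- The column lengths were lost from the original listing; the lengths below are assumed.
-- Strict SQL modes reject the zero date; in that case drop the default clause.
create table crawler (
    crawler_ID bigint unsigned not null auto_increment primary key,
    crawler_category varchar(50) not null,
    crawler_date datetime not null default '0000-00-00 00:00:00',
    crawler_url varchar(255) not null,
    crawler_IP varchar(45) not null
) default charset=utf8;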
The following file, robot.php, records visiting crawlers and writes their details to the database:
<?php
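// Collect the parts of the requested URL and the visitor's address
$ServerName = $_SERVER["SERVER_NAME"];
$ServerPort = $_SERVER["SERVER_PORT"];
$ScriptName = $_SERVER["SCRIPT_NAME"];
$QueryString = $_SERVER["QUERY_STRING"];
$serverip = $_SERVER["REMOTE_ADDR"]; // despite the name, this is the visitor's IP
$Url = "http://" . $ServerName;
// Append the port only when it is not the default HTTP port
// (the compared value was lost from the original listing; "80" is assumed)
if ($ServerPort != "80")
{
    $Url = $Url . ":" . $ServerPort;
}
$Url = $Url . $ScriptName;
if ($QueryString != "")
{
    $Url = $Url . "?" . $QueryString;
}
$GetLocationURL = $Url;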
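// Lower-case the User-Agent so the substring checks are case-insensitive
$agent = isset($_SERVER["HTTP_USER_AGENT"]) ? strtolower($_SERVER["HTTP_USER_AGENT"]) : "";
$Bot = "";
// strpos() returns false when the needle is absent and 0 when it matches at
// the very start, so every check compares strictly against false.
// The generic "bot" rule comes first; the more specific rules below override it.
if (strpos($agent, "bot") !== false)
{
    $Bot = "OtherCrawler";
}
if (strpos($agent, "googlebot") !== false)
{
    $Bot = "Google";
}
if (strpos($agent, "mediapartners-google") !== false)
{
    $Bot = "GoogleAdsense";
}
if (strpos($agent, "baiduspider") !== false)
{
    $Bot = "Baidu";
}
if (strpos($agent, "sogouspider") !== false)
{
    $Bot = "Sogou";
}
if (strpos($agent, "yahoo") !== false)
{
    $Bot = "Yahoo!";
}
if (strpos($agent, "msn") !== false)
{
    $Bot = "MSN";
}
if (strpos($agent, "ia_archiver") !== false)
{
    $Bot = "Alexa";
}
if (strpos($agent, "iaarchiver") !== false)
{
    $Bot = "Alexa";
}
if (strpos($agent, "sohu") !== false)
{
    $Bot = "Sohu";
}
if (strpos($agent, "sqworm") !== false)
{
    $Bot = "AOL";
}
// The needle must be lower case because $agent has been lower-cased
if (strpos($agent, "yodaobot") !== false)
{
    $Bot = "Yodao";
}
if (strpos($agent, "iaskspider") !== false)
{
    $Bot = "Iask";
}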
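require("./dbinfo.php"); // expected to define $host, $username, $password and $database
date_default_timezone_set('PRC');
// "H" is the 24-hour clock; "h" would make 3 pm indistinguishable from 3 am
$shijian = date("Y-m-d H:i:s");
// Connect to the MySQL server and select the database.
// The mysql_* functions were removed in PHP 7, so mysqli is used here instead.
$connection = mysqli_connect($host, $username, $password, $database);
if (!$connection)
{
    die('Not connected: ' . mysqli_connect_error());
}
// Insert the record. A prepared statement keeps the request-derived values
// (URL, crawler label, IP) from being interpreted as SQL.
$stmt = mysqli_prepare($connection, "insert into crawler (crawler_category, crawler_date, crawler_url, crawler_IP) values (?, ?, ?, ?)");
if (!$stmt)
{
    die('Invalid query: ' . mysqli_error($connection));
}
mysqli_stmt_bind_param($stmt, "ssss", $Bot, $shijian, $GetLocationURL, $serverip);
$result = mysqli_stmt_execute($stmt);
if (!$result)
{
    die('Invalid query: ' . mysqli_stmt_error($stmt));
}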
?>
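The detection step in robot.php can also be written as a lookup table inside a function, which makes it possible to try the rules against sample User-Agent strings without a web server. This is only a minimal sketch: the function name detect_crawler is hypothetical and not part of the original script, the needles are copied from robot.php, and the "last matching rule wins" behavior is assumed to be what the original order of checks intends.

```php
<?php
// Hypothetical helper, not part of the original robot.php.
// Returns a crawler label for a User-Agent string, or "" when no rule matches.
function detect_crawler(string $userAgent): string
{
    // Generic rule first, specific rules after it. The last matching rule
    // wins, mirroring the order of the checks in robot.php.
    $rules = [
        "bot"                  => "OtherCrawler",
        "googlebot"            => "Google",
        "mediapartners-google" => "GoogleAdsense",
        "baiduspider"          => "Baidu",
        "sogouspider"          => "Sogou",
        "yahoo"                => "Yahoo!",
        "msn"                  => "MSN",
        "ia_archiver"          => "Alexa",
        "iaarchiver"           => "Alexa",
        "sohu"                 => "Sohu",
        "sqworm"               => "AOL",
        "yodaobot"             => "Yodao",
        "iaskspider"           => "Iask",
    ];

    $agent = strtolower($userAgent);
    $label = "";
    foreach ($rules as $needle => $name) {
        // strpos() returns false when absent and 0 for a match at the start,
        // so the comparison has to be strict.
        if (strpos($agent, $needle) !== false) {
            $label = $name;
        }
    }
    return $label;
}

// Usage: the Googlebot User-Agent matches both "bot" and "googlebot";
// the later, more specific rule decides the label.
echo detect_crawler("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"), "\n";
```

Keeping the rules in one array means a new crawler is added with a single line, and the order of the entries documents which rule takes precedence.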
That's it. You can now query the database to see when a spider visited, where it came from, and which of your pages it crawled.
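Once rows have been read from the crawler table, a per-crawler tally is one simple way to see which spiders visit most often. The sketch below assumes the rows arrive as associative arrays keyed by the column names of the table; the function name count_visits_by_crawler and the sample rows are hypothetical and exist only for illustration.

```php
<?php
// Hypothetical helper: counts logged visits per crawler category.
// $rows is assumed to be a list of associative arrays shaped like rows
// of the crawler table.
function count_visits_by_crawler(array $rows): array
{
    $counts = [];
    foreach ($rows as $row) {
        $category = $row["crawler_category"];
        if (!isset($counts[$category])) {
            $counts[$category] = 0;
        }
        $counts[$category]++;
    }
    // Sort by count, highest first, keeping the category names as keys
    arsort($counts);
    return $counts;
}

// Sample rows for illustration only, not real log data.
$rows = [
    ["crawler_category" => "Baidu",  "crawler_date" => "2024-01-01 08:00:00", "crawler_url" => "http://example.com/",  "crawler_IP" => "192.0.2.1"],
    ["crawler_category" => "Google", "crawler_date" => "2024-01-01 08:05:00", "crawler_url" => "http://example.com/a", "crawler_IP" => "192.0.2.2"],
    ["crawler_category" => "Baidu",  "crawler_date" => "2024-01-01 09:00:00", "crawler_url" => "http://example.com/b", "crawler_IP" => "192.0.2.1"],
];
print_r(count_visits_by_crawler($rows));
```

On a large table the same tally is usually cheaper to compute in SQL with a group by on crawler_category; the PHP version is shown because it can be run without a database.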
The following page lists the logged visits, newest first, with pagination:
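<?php
include './robot.php';
include '../library/page.Class.php';
// Default to the first page when no page number is given
$page = isset($_GET['page']) ? (int) $_GET['page'] : 1;
include '../library/conn_new.php';
$count = $mysql->num_rows($mysql->query("select * from crawler"));
// The page size was lost from the original listing; 20 rows per page is assumed
$pages = new PageClass($count, 20, $page, $_SERVER['PHP_SELF'] . '?page={page}');
$sql = "select * from crawler order by ";
$sql .= "crawler_date desc limit " . $pages->page_limit . "," . $pages->myde_size;
$result = $mysql->query($sql);
?>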
<table>
<thead>
<tr>
<td bgcolor="#CCFFFF"></td>
<td bgcolor="#CCFFFF" align="center">Visit time</td>
<td bgcolor="#CCFFFF" align="center">Crawler</td>
<td bgcolor="#CCFFFF" align="center">Crawler IP</td>
<td bgcolor="#CCFFFF" align="center">Visited URL</td>
</tr>
</thead>
<?php
// Escape every value on output: the URL and the crawler label come from the request
while ($myrow = $mysql->fetch_array($result)) {
?>
<tr>
<td><img src="../images/topicnew.gif" /></td>
<td style="font-family:Georgia"><?php echo htmlspecialchars($myrow["crawler_date"]) ?></td>
<td><?php echo htmlspecialchars($myrow["crawler_category"]) ?></td>
<td><?php echo htmlspecialchars($myrow["crawler_IP"]) ?></td>
<td><?php echo htmlspecialchars($myrow["crawler_url"]) ?></td>
</tr>
<?php
}
?>
</table>
<?php
echo $pages->myde_write();
?>
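PageClass comes from page.Class.php, which the article does not show. A pager of this kind typically turns a page number and a page size into the offset and row count of a limit clause. The sketch below illustrates that arithmetic only; the function name page_limit_clause is hypothetical, it is not the PageClass implementation, and clamping out-of-range page numbers is an assumed design choice.

```php
<?php
// Hypothetical helper, not the PageClass from page.Class.php.
// Converts a 1-based page number into the "offset,count" part of a
// MySQL limit clause.
function page_limit_clause(int $page, int $pageSize, int $totalRows): string
{
    $pageSize = max(1, $pageSize);
    // Number of the last page; an empty table still has one (empty) page
    $lastPage = max(1, (int) ceil($totalRows / $pageSize));
    // Clamp out-of-range page numbers instead of producing an empty result
    $page = min(max(1, $page), $lastPage);
    $offset = ($page - 1) * $pageSize;
    return "limit " . $offset . "," . $pageSize;
}

// Usage: page 3 of 95 rows at 20 rows per page starts at row offset 40.
echo page_limit_clause(3, 20, 95), "\n"; // limit 40,20
```

Because both numbers are computed from integers, the resulting clause can be appended to the query string without carrying any request text into the SQL.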
That is all the code needed to log crawler visits with PHP. I hope it helps.