Hands-on: IP Lookup with Redis in a PHP Environment

Author: 魏庆滨  Editor: 王玉圆  2012-08-27 00:05  IT168 original article

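[IT168 Technology] Redis offers a rich set of data types, including strings, lists, hashes, sets, and sorted sets, which make many interesting applications possible. A sorted set, for example, can be used for IP lookup. This article measures how efficient Redis-based IP lookup is from PHP and compares it with a binary-search lookup.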

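1. Preparation

Set up the test environment: install Redis 2.4.6, PHP 5.2.17, and the phpredis extension.

We usually need to find a user's location from their IP address. Entries in an IP library typically look like "1.12.0.0-1.15.255.255 北京方正宽带", meaning that addresses in the range 1.12.0.0-1.15.255.255 belong to Beijing and connect through Founder Broadband (方正宽带).

For this we build two arrays, in the following formats: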

$ipadd[0]->start="17563648";
$ipadd[0]->end="17825791";

and

$ipinfo["17563648-17825791"]="北京方正宽带";

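The $ipadd array stores the start and end address of each IP range as the numeric values produced by ip2long. Given an IP, we find the matching range in $ipadd (17563648-17825791), and that range then uniquely identifies the location entry in $ipinfo, "北京方正宽带". What this test measures is the efficiency of finding the range for a given IP.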
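
As an illustration of how one raw library line maps onto the two arrays, here is a minimal sketch. The helper name parse_ip_line is hypothetical and not part of the article's code, and the line format is assumed to be exactly "start-end description" separated by a single space.

```php
<?php
// Hypothetical helper (not from the article): turn one raw IP-library line
// such as "1.12.0.0-1.15.255.255 北京方正宽带" into an $ipadd entry plus its info.
function parse_ip_line($line) {
    // Split "start-end" from the description at the first space.
    list($range, $info) = explode(' ', trim($line), 2);
    list($startIp, $endIp) = explode('-', $range, 2);

    $entry = new stdClass();
    // sprintf('%u') keeps the value unsigned even on 32-bit PHP builds.
    $entry->start = sprintf('%u', ip2long($startIp));
    $entry->end   = sprintf('%u', ip2long($endIp));
    return array($entry, $info);
}

$ipadd  = array();
$ipinfo = array();
list($entry, $info) = parse_ip_line('1.12.0.0-1.15.255.255 北京方正宽带');
$ipadd[] = $entry;
$ipinfo[$entry->start . '-' . $entry->end] = $info;

echo $ipadd[0]->start, '-', $ipadd[0]->end, "\n"; // 17563648-17825791
```

The printed key is the same "start-end" string that indexes $ipinfo, so a range found in $ipadd leads directly to its location text.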

Store the $ipadd array in Redis as a sorted set. The file ipaddress.dat holds the complete IP library in the $ipadd format, a little over 110,000 records.

[root@localhost test]# more addredis.php
<?php
require 'ipaddress.dat';
$redis=new Redis();
$redis->connect('127.0.0.1',6379);
foreach ($ipadd as $ip) {
// Two members per range: the start boundary is scored by the start address,
// the end boundary by the end address.
$redis->zadd('ip',$ip->start,'start:'.$ip->start.'-'.$ip->end);
$redis->zadd('ip',$ip->end,'end:'.$ip->start.'-'.$ip->end);
}
[root@localhost test]# php addredis.php

Connect to Redis and check the set:

[root@localhost test]# telnet localhost 6379
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
zcard ip
:215706

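The ip set contains a little over 210,000 members, covering more than 100,000 IP ranges (the library actually has a little over 110,000 records, but some start addresses are duplicated; for simplicity, start addresses are treated as unique here).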

For test data we use a real access log from a website, test.access_log, with client IPs spread across the whole country. The format is:

[root@localhost test]# tail test.access_log
302 "-" 119.189.159.152 test.com [07/Feb/2012:15:58:01 +0800] "GET /test.php HTTP/1.0 0" 0 "84658813-1322969571-03601900"
302 "-" 221.239.111.102 test.com [07/Feb/2012:15:58:01 +0800] "GET /test.php HTTP/1.0 0" 0 "48818100-1328600341-28058000"
302 "-" 60.30.33.171 test.com [07/Feb/2012:15:58:01 +0800] "GET /test.php HTTP/1.0 0" 0 "69217158-1328601481-34381000"
302 "-" 60.190.0.6 test.com [07/Feb/2012:15:58:01 +0800] "GET /test.php HTTP/1.0 0" 0 "58494331-1328593662-19393300"
302 "-" 60.13.46.192 test.com [07/Feb/2012:15:58:01 +0800] "GET /test.php HTTP/1.0 0" 0 "74718190-1328442091-23296100"
302 "-" 60.190.210.178 test.com [07/Feb/2012:15:58:01 +0800] "GET /test.php HTTP/1.0 0" 0 "93053412-1328583110-29350700"
302 "-" 60.190.40.66 test.com [07/Feb/2012:15:58:01 +0800] "GET /test.php HTTP/1.0 0" 0 "90432067-1328596666-46225600"
302 "-" 1.86.220.76 test.com [07/Feb/2012:15:58:01 +0800] "GET /test.php HTTP/1.0 0" 0 "74104724-1325918365-59565400"
302 "-" 61.189.53.66 test.com [07/Feb/2012:15:58:01 +0800] "GET /test.php HTTP/1.0 0" 0 "33954701-1328587981-80413400"
302 "-" 121.15.174.113 test.com [07/Feb/2012:15:58:01 +0800] "GET /test.php HTTP/1.0 0" 0 "52170030-1328594341-01346100"

Number of test records: 200000

[root@localhost test]# cat test.access_log |wc -l
200000

2. Program Code

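In PHP we read test.access_log, extract the IP from each line with a regular expression, convert it to an integer with ip2long, and then ask Redis for the start and end of the range containing that IP. We also implement get_ip, a function that finds the range by binary search, and compare its running time with that of the Redis lookup.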
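
The extraction step can be sketched on a single log line as follows. The pattern here is an assumed, tighter variant written for this example (it matches only the leading fields and captures a dotted-quad address); it is not the pattern used by the article's scripts.

```php
<?php
// One line in the test.access_log format shown earlier.
$log = '302 "-" 119.189.159.152 test.com [07/Feb/2012:15:58:01 +0800] '
     . '"GET /test.php HTTP/1.0 0" 0 "84658813-1322969571-03601900"';

// Assumed pattern: status code, one quoted field, then the IP as the
// only capture group, followed by the host name.
$pattern = '/^\d+ "[^"]*" (\d{1,3}(?:\.\d{1,3}){3}) test\.com /';

if (preg_match($pattern, $log, $m)) {
    $ip  = $m[1];                        // the single capture group
    $num = sprintf('%u', ip2long($ip));  // unsigned decimal string
    echo "$ip => $num\n";
}
```

With one capture group the address is always in index 1 of the match array; index 0 holds the whole matched text.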

The get_ip function and the timing function:

[root@localhost test]# more func.php
<?php
function get_ip($ipadd,$ip) {
$start = 0;
$end = count($ipadd) - 1;

while($start <= $end) {
$index = intval(($start + $end) / 2);

if ($ip < $ipadd [$index]->start) {
$end = $index - 1;
} elseif ($ip > $ipadd [$index]->end) {
$start = $index + 1;
} else {//$ipadd [$index]->end >= $ip && $ipadd [$index]->start <= $ip
return ("i" . $ipadd [$index]->start . "p" . $ipadd [$index]->end);
}
}
return "Unknown";
}
function microtime_float()
{
list($usec, $sec) = explode(" ", microtime());
return ((float)$usec + (float)$sec);
}
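
To get a feel for how little work the binary search does per lookup, the sketch below uses an instrumented copy of get_ip that also counts loop iterations. The name get_ip_steps and the synthetic table are assumptions made for this example; the table has 107853 back-to-back ranges (half of the 215706 members reported by zcard), not the real IP library.

```php
<?php
// Instrumented variant of get_ip (hypothetical name): same search,
// but it also reports how many loop iterations were needed.
function get_ip_steps($ipadd, $ip) {
    $start = 0;
    $end = count($ipadd) - 1;
    $steps = 0;
    while ($start <= $end) {
        $steps++;
        $index = intval(($start + $end) / 2);
        if ($ip < $ipadd[$index]->start) {
            $end = $index - 1;
        } elseif ($ip > $ipadd[$index]->end) {
            $start = $index + 1;
        } else {
            return array("i" . $ipadd[$index]->start . "p" . $ipadd[$index]->end, $steps);
        }
    }
    return array("Unknown", $steps);
}

// Synthetic table (an assumption, not the real library):
// 107853 consecutive ranges of 256 addresses each.
$ipadd = array();
for ($i = 0; $i < 107853; $i++) {
    $seg = new stdClass();
    $seg->start = $i * 256;
    $seg->end   = $i * 256 + 255;
    $ipadd[] = $seg;
}

list($range, $steps) = get_ip_steps($ipadd, 17563648 + 5);
echo "$range found in $steps steps\n";
```

For a table of this size the loop can run at most 17 times, which is why the pure search cost stays small next to any per-call overhead.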

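Read the log file line by line, extract the IP from each line with the regular expression, and look up the matching IP range, first with get_ip and then in Redis, recording the time taken by each step.

The testredis.php script: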

[root@localhost test]# more testredis.php
<?php
require 'ipaddress.dat';// IP address array
require_once('func.php');
$redis=new Redis();
$redis->connect('127.0.0.1',6379);
$filename='test.access_log';
echo "Binary search start\n";
$time1 = 0;
$t1 = 0;
$handle = @fopen($filename, 'r');
while (!feof($handle)){
        $log = fgets($handle,4096);
        $t1_start = microtime_float();
        // The pattern has a single capture group, so the IP is in $arr_log[1]
        if (preg_match('/^\d+ \".*\" (.*) test\.com \[.*\] \"GET \/test\.php .* \".*\"/i',$log,$arr_log)){
                $time_start = microtime_float();
                $pieces = get_ip($ipadd,ip2long($arr_log[1]));
                $time_mid = microtime_float();
                $time1 += $time_mid - $time_start;
        }
        $t1_stop = microtime_float();
        $t1 += $t1_stop - $t1_start;
}
fclose($handle);
echo "get_ip time: $time1 seconds\n";
echo "Binary search total time: $t1 seconds\n";
echo "Binary search end\n";
echo "Redis start\n";
$time2 = 0;
$t2 = 0;
$i = 0;
$handle = @fopen($filename, 'r');
while (!feof($handle)){
        $log = rawurldecode(fgets($handle,4096));
        $t2_start = microtime_float();
        if (preg_match('/^\d+ \".*\" (.*) test\.com \[.*\] \"GET \/test\.php .* \".*\"/i',$log,$arr_log)){
                $i++;
                $time_start = microtime_float();
                // In real use, also check whether the returned member is an end address:
                // only then does the IP belong to that range, otherwise it lies outside it.
                // That step is omitted here.
                $pieces = $redis->zrangebyscore('ip',ip2long($arr_log[1]),4294967295,array('withscores' =>true,'limit'=>array(0, 1)));
                $time_mid = microtime_float();
                $time2 += $time_mid - $time_start;
        }
        $t2_stop = microtime_float();
        $t2 += $t2_stop - $t2_start;
}
fclose($handle);
echo "Redis time: $time2 seconds\n";
echo "Redis total time: $t2 seconds\n";
echo "Redis end\n";

The results are as follows:

[root@localhost test]# php testredis.php
Binary search start
get_ip time: 3.89317107201 seconds
Binary search total time: 26.6202020645 seconds
Binary search end
Redis start
Redis time: 12.8361856937 seconds
Redis total time: 36.9929895401 seconds
Redis end
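
The script skips the check on the member that Redis returns. One possible way to do that check is sketched below, assuming each "start:" member is scored by the range's start address and each "end:" member by its end address. The helper name range_from_reply is hypothetical, and the replies are hand-written arrays in the member => score shape that phpredis returns with withscores, so no Redis server is needed to run it.

```php
<?php
// Hypothetical helper: decide which range, if any, contains $ip, given the
// reply of zrangebyscore('ip', $ip, 4294967295, withscores + limit 0,1).
function range_from_reply($reply, $ip) {
    foreach ($reply as $member => $score) {
        // Member format written by addredis.php: "start:<s>-<e>" or "end:<s>-<e>".
        $range = substr($member, strpos($member, ':') + 1);
        list($start, $end) = explode('-', $range, 2);
        // Accept the range only if $ip really lies inside it. A "start:"
        // member whose range begins after $ip means $ip falls in a gap.
        if ((float)$start <= $ip && $ip <= (float)$end) {
            return $range;
        }
    }
    return 'Unknown';
}

// Hand-written replies standing in for a live Redis server.
$inside = array('end:17563648-17825791' => 17825791.0);
$gap    = array('start:17563648-17825791' => 17563648.0);

echo range_from_reply($inside, 17600000), "\n"; // 17563648-17825791
echo range_from_reply($gap, 17500000), "\n";    // Unknown
```

Comparing against the parsed boundaries, rather than only testing the "end:" prefix, also covers the case where the queried IP equals a range's first address.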

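According to the Redis documentation, ZRANGEBYSCORE has time complexity O(log(N)+M), where N is the number of elements in the sorted set and M is the number of elements returned. Here M is the constant 1, so the cost can be treated as O(log(N)), the same as binary search. Yet the running times differ considerably: the Redis lookup takes roughly three times as long as the binary search. What causes this?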

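Monitoring the CPU shows that while the binary search runs, user CPU usage is around 50% and system usage around 2%. While zrangebyscore runs against Redis, user CPU usage drops and system usage rises above 5%, and the context-switch rate (the cs column) jumps from a few hundred to more than 12,000 per second, so context switching is the suspected cause.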

[root@localhost ~]# vmstat -n 3
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
0  0     96  55952 166932 985812    0    0     9    65    8    6  1  1 98  0  0
0  0     96  55952 166932 985860    0    0     0     0 1105  274  0  0 100  0  0
0  0     96  56076 166932 985860    0    0     0    15 1113  313  0  0 100  0  0
0  0     96  56076 166932 985860    0    0     0   372 1159  283  0  0 100  0  0
0  0     96  56464 166952 985864    0    0     0   244 1117  346  1  1 98  0  0
0  0     96  56464 166952 985864    0    0     0     0 1103  273  0  0 100  0  0
0  1     96  48532 166968 991548    0    0  1833     1 1126  319  0  0 96  4  0
1  0     96  42216 166572 888116    0    0  4824   223 1159  448 46  4 48  2  0
1  0     96  43544 166572 886184    0    0  3460     0 1132  324 48  2 50  0  0
1  0     96  43008 166580 886912    0    0  3459    11 1132  322 48  2 50  0  0
1  0     96  60728 166584 869088    0    0  3432   244 1151  411 48  3 48  0  0
1  0     96  50228 166592 879532    0    0  3501    12 1133  330 49  1 50  0  0
1  0     96  42848 166584 886676    0    0  3460     4 1151  325 48  2 50  0  0
1  0     96  43508 166520 886160    0    0  3459     0 1140  330 48  2 50  0  0
1  0     96  56392 166432 870924    0    0  3419   404 1169  542 49  3 48  0  0
1  0     96  46132 166460 881272    0    0  3427   275 1171  352 48  2 48  3  0
1  0     96  43972 166448 883324    0    0  3460    17 1132  325 49  2 50  0  0
1  0     96  46196 166416 882168    0    0  2225   231 1135 5666 44  7 48  0  0
1  0     96  45948 166416 882260    0    0     0  1381 1110 13843 41 10 49  0  0
1  0     96  45948 166416 882260    0    0     0    19 1103 13747 41 10 49  0  0
1  0     96  46640 166444 882232    0    0     0   215 1113 15996 36 17 47  0  0
1  0     96  46640 166444 882260    0    0     0    16 1158 12688 43  8 50  0  0
1  0     96  46640 166444 882260    0    0     0    19 1105 12558 43  7 50  0  0
1  0     96  46640 166444 882260    0    0     0     0 1103 12689 43  7 50  0  0
1  0     96  46780 166476 882264    0    0     0   239 1115 12654 44  8 49  0  0
1  0     96  46780 166480 882260    0    0     0    11 1103 12644 43  7 50  0  0
1  0     96  46780 166480 882264    0    0     0  1517 1133 12539 43  7 50  0  0
1  0     96  47168 166488 882256    0    0     0   221 1114 12634 43  9 49  0  0
1  0     96  47168 166488 882268    0    0     0     0 1104 12641 43  7 50  0  0
0  0     96 216728 166492 826272    0    0     0    20 1105 3096  8  3 90  0  0
0  0     96 217612 166504 826260    0    0     0   213 1132  356  1  1 98  0  0

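Regular-expression matching is CPU-intensive, and so is looking up in Redis, so let us separate the two and see whether efficiency improves. First extract all the IPs with the regular expression, then run the two lookups over the extracted IPs. The modified script: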

[root@localhost test]# more test.php
<?php
require 'ipaddress.dat';// IP address array
require_once('func.php');
$redis=new Redis();
$redis->connect('127.0.0.1',6379);
$filename='test.access_log';
$arr_tmp = array();
$handle = @fopen($filename, 'r');
// Pass 1: extract every IP with the regular expression
while (!feof($handle)){
$log = rawurldecode(fgets($handle,4096));
// Single capture group, so the IP is in $arr_log[1]
if (preg_match('/^\d+ \".*\" (.*) test\.com \[.*\] \"GET \/test\.php .* \".*\"/i',$log,$arr_log)){
$arr_tmp[]=ip2long($arr_log[1]);
}
}
fclose($handle);
// Pass 2: time the binary search on its own
$time_start = microtime_float();
foreach ($arr_tmp as $v) {
$pieces = get_ip($ipadd,$v);
}
$time_mid = microtime_float();
$time = $time_mid - $time_start;
echo "Binary search: $time seconds\n";

// Pass 3: time the Redis lookup on its own
$time_start = microtime_float();
foreach ($arr_tmp as $v) {
$pieces = $redis->zrangebyscore('ip',$v,4294967295,array('withscores' =>true,'limit'=>array(0, 1)));
}
$time_mid = microtime_float();
$time = $time_mid - $time_start;
echo "Redis: $time seconds\n";

Run it and look at the effect:

[root@localhost test]# php test.php
Binary search: 2.70506095886 seconds
Redis: 6.70395517349 seconds

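Both lookups are now much faster, but Redis is still slower than the binary search. The suspected cause is the time spent interacting with Redis.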

Next, reduce the amount of data in the ip set in Redis:

[root@localhost test]# telnet localhost 6379
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
zcard ip
:95650

Result:

[root@localhost test]# php test.php
Binary search: 2.72314596176 seconds
Redis: 6.59329199791 seconds

Reduce further:

[root@localhost test]# telnet localhost 6379
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
zcard ip
:19662

Result:

[root@localhost test]# php test.php
Binary search: 2.72199296951 seconds
Redis: 6.13260388374 seconds

Reduce further:

[root@localhost test]# telnet localhost 6379
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
zcard ip
:1970

Result:

[root@localhost test]# php test.php
Binary search: 2.75167584419 seconds
Redis: 5.84946107864 seconds

Finally, test the extreme case of an empty IP library. Clear the ip set in Redis:

[root@localhost test]# telnet localhost 6379
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
zcard ip
:215706
del ip
:1
zcard ip
:0

Then run the program again. The result:

[root@localhost test]# php test.php
Binary search: 0.29176902771 seconds
Redis: 5.43927407265 seconds

3. Test Results

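The tests above show that when IP lookup is done through Redis from PHP, the lookup time does fall as the set shrinks, but only slightly: even with an empty set the lookups still take about 5.4 seconds, against about 6.7 seconds for the full set. The time therefore seems to go mainly into issuing each zrangebyscore call to Redis and having it handled, not into the search itself. Overall, Redis is slower than the binary search, so when Redis is accessed through the phpredis extension, using it for IP lookup is not recommended. In addition, PHP code doing CPU-intensive work (such as running preg_match) should avoid interacting with Redis at the same time.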
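
The reported Redis timings can be turned into a rough per-call cost. The sketch below assumes that every one of the 200000 log lines produced exactly one lookup, which the article does not state explicitly, so the per-call figures are estimates only.

```php
<?php
// Redis loop timings reported above, keyed by the size of the ip set.
$lookups = 200000;  // assumption: one lookup per log line
$redisTimes = array(
    215706 => 6.70395517349,
    95650  => 6.59329199791,
    19662  => 6.13260388374,
    1970   => 5.84946107864,
    0      => 5.43927407265,
);

foreach ($redisTimes as $size => $seconds) {
    // Convert the total to microseconds per call.
    printf("%6d members: %.1f us per call\n", $size, $seconds / $lookups * 1e6);
}

// Share of the full-size run that remains even with an empty set.
$fixedShare = $redisTimes[0] / $redisTimes[215706];
printf("fixed overhead: about %.0f%%\n", $fixedShare * 100);
```

Under that assumption, roughly four fifths of the Redis time is present even when the set is empty, which is consistent with the cost being dominated by the call itself.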

Source document: http://www.redis.cn/commands/zrangebyscore.html
