5. Testing MapReduce
First, we prepare two name-list files on the NameNode master. The goal is to count how many times each name is mentioned across the two files. Their contents are as follows:
[root@localhost hadoop-0.20.203.0]# cat 70_input/namelist_1
Harry Tony Bill Alex Kevin
[root@localhost hadoop-0.20.203.0]# cat 70_input/namelist_2
Kevin Joe Harry Tony
Next, copy these files into the Hadoop file system:
[root@localhost hadoop-0.20.203.0]# bin/hadoop fs -put 70_input input
[root@localhost hadoop-0.20.203.0]# bin/hadoop fs -ls
Found 1 items
drwxr-xr-x   - root supergroup          0 2011-08-16 03:35 /user/root/input
[root@localhost hadoop-0.20.203.0]# bin/hadoop fs -ls input
Found 2 items
-rw-r--r--   2 root supergroup         49 2011-08-16 03:35 /user/root/input/namelist_1
-rw-r--r--   2 root supergroup         31 2011-08-16 03:35 /user/root/input/namelist_2
Then, we run the wordcount program bundled in hadoop-examples-0.20.203.0.jar to count the occurrences of each name:
[root@localhost hadoop-0.20.203.0]# bin/hadoop jar hadoop-examples-0.20.203.0.jar wordcount input output
11/08/16 05:26:31 INFO input.FileInputFormat: Total input paths to process : 2
11/08/16 05:26:32 INFO mapred.JobClient: Running job: job_201108160517_0002
11/08/16 05:26:33 INFO mapred.JobClient:  map 0% reduce 0%
11/08/16 05:26:46 INFO mapred.JobClient:  map 33% reduce 0%
11/08/16 05:26:47 INFO mapred.JobClient:  map 66% reduce 0%
11/08/16 05:26:49 INFO mapred.JobClient:  map 100% reduce 0%
11/08/16 05:26:58 INFO mapred.JobClient:  map 100% reduce 100%
11/08/16 05:27:03 INFO mapred.JobClient: Job complete: job_201108160517_0002
……
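To make clear what the wordcount job is doing on the cluster, here is a minimal local sketch of its map, shuffle, and reduce phases in plain Python (no Hadoop involved; the variable names are ours, and the input strings mirror the two name lists above):

```python
from collections import defaultdict

# The two name lists, as plain strings (mirroring namelist_1 and namelist_2).
namelist_1 = "Harry Tony Bill Alex Kevin"
namelist_2 = "Kevin Joe Harry Tony"

# Map phase: each input split emits a (name, 1) pair per token.
pairs = [(name, 1) for doc in (namelist_1, namelist_2) for name in doc.split()]

# Shuffle phase: group the emitted values by key.
grouped = defaultdict(list)
for name, count in pairs:
    grouped[name].append(count)

# Reduce phase: sum the grouped counts for each name.
result = {name: sum(counts) for name, counts in grouped.items()}
print(result)  # Harry and Kevin appear twice, Bill and Joe once, etc.
```

This is exactly the shape of computation the cluster distributes: maps run in parallel over the input splits, and the reduce step sums per-key counts after the shuffle.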
Finally, we check whether the results match our expectations:
[root@localhost hadoop-0.20.203.0]# bin/hadoop fs -ls output
Found 3 items
-rw-r--r--   1 root supergroup          0 2011-08-16 05:30 /user/root/output/_SUCCESS
drwxr-xr-x   - root supergroup          0 2011-08-16 05:30 /user/root/output/_logs
-rw-r--r--   1 root supergroup         81 2011-08-16 05:30 /user/root/output/part-r-00000
[root@localhost hadoop-0.20.203.0]# bin/hadoop fs -cat output/part-r-00000
Harry   2
Bill    1
Tony    2
Alex    1
Kevin   2
Joe     1
The results match our expectations. With this article we have completed a first end-to-end experiment on the Hadoop architecture. From here, you can continue learning Hadoop in two steps: first, HDFS operations and administration; second, writing MapReduce programs.
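As a warm-up for writing MapReduce programs, the word-count job can be expressed as two small functions in the style of Hadoop Streaming, which lets mappers and reducers be ordinary scripts reading and writing text records. This is only an illustrative local sketch (the names map_phase and reduce_phase are our own, not a Hadoop API); in a real streaming job the two phases would run as separate processes wired together by the streaming jar shipped with Hadoop:

```python
from itertools import groupby

def map_phase(lines):
    """Mapper: emit one 'name<TAB>1' record per token."""
    for line in lines:
        for name in line.split():
            yield f"{name}\t1"

def reduce_phase(records):
    """Reducer: sum counts per name; records must arrive sorted by key."""
    keyed = [record.split("\t") for record in records]
    for name, group in groupby(keyed, key=lambda kv: kv[0]):
        yield name, sum(int(value) for _, value in group)

# Local simulation: sorting the mapper output stands in for the shuffle.
lines = ["Harry Tony Bill Alex Kevin", "Kevin Joe Harry Tony"]
for name, count in reduce_phase(sorted(map_phase(lines))):
    print(name, count)
```

The sort between the two phases mimics what Hadoop's shuffle guarantees: the reducer sees all records for a given key contiguously, which is why groupby suffices here.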