
The Sharp Edge of Cloud Computing: Rapidly Deploying a Hadoop Cluster

  5. Testing MapReduce

  First, on the NameNode master we prepare two name-list files. The goal is to count how many times each name appears across the two files. Their contents are as follows:

[root@localhost hadoop-0.20.203.0]# cat 70_input/namelist_1
Harry
Tony
Bill
Alex
Kevin
[root@localhost hadoop-0.20.203.0]# cat 70_input/namelist_2
Kevin
Joe
Harry
Tony
[root@localhost hadoop-0.20.203.0]#

  Next, copy these files into the Hadoop file system:

[root@localhost hadoop-0.20.203.0]# bin/hadoop fs -put 70_input input
[root@localhost hadoop-0.20.203.0]# bin/hadoop fs -ls
Found 1 items
drwxr-xr-x   - root supergroup          0 2011-08-16 03:35 /user/root/input
[root@localhost hadoop-0.20.203.0]# bin/hadoop fs -ls input
Found 2 items
-rw-r--r--   2 root supergroup         49 2011-08-16 03:35 /user/root/input/namelist_1
-rw-r--r--   2 root supergroup         31 2011-08-16 03:35 /user/root/input/namelist_2
[root@localhost hadoop-0.20.203.0]#

  Then we invoke the wordcount program bundled in the distribution's hadoop-examples-0.20.203.0.jar to count the occurrences of each name:

[root@localhost hadoop-0.20.203.0]# bin/hadoop jar hadoop-examples-0.20.203.0.jar wordcount input output
11/08/16 05:26:31 INFO input.FileInputFormat: Total input paths to process : 2
11/08/16 05:26:32 INFO mapred.JobClient: Running job: job_201108160517_0002
11/08/16 05:26:33 INFO mapred.JobClient:  map 0% reduce 0%
11/08/16 05:26:46 INFO mapred.JobClient:  map 33% reduce 0%
11/08/16 05:26:47 INFO mapred.JobClient:  map 66% reduce 0%
11/08/16 05:26:49 INFO mapred.JobClient:  map 100% reduce 0%
11/08/16 05:26:58 INFO mapred.JobClient:  map 100% reduce 100%
11/08/16 05:27:03 INFO mapred.JobClient: Job complete: job_201108160517_0002
...
[root@localhost hadoop-0.20.203.0]#

  Finally, we check whether the result matches our expectation:

[root@localhost hadoop-0.20.203.0]# bin/hadoop fs -ls output
Found 3 items
-rw-r--r--   1 root supergroup          0 2011-08-16 05:30 /user/root/output/_SUCCESS
drwxr-xr-x   - root supergroup          0 2011-08-16 05:30 /user/root/output/_logs
-rw-r--r--   1 root supergroup         81 2011-08-16 05:30 /user/root/output/part-r-00000
[root@localhost hadoop-0.20.203.0]# bin/hadoop fs -cat output/part-r-00000
Harry   2
Bill    1
Tony    2
Alex    1
Kevin   2
Joe     1
[root@localhost hadoop-0.20.203.0]#
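  Conceptually, the wordcount job above runs in three stages: a map phase emits a (name, 1) pair for every name, a shuffle phase groups those pairs by name, and a reduce phase sums the counts per name. The following is a minimal plain-Python sketch of that same logic (not Hadoop API code), using the file contents shown earlier:

```python
from collections import defaultdict

# Contents of the two name lists, as shown in the listings above
namelist_1 = ["Harry", "Tony", "Bill", "Alex", "Kevin"]
namelist_2 = ["Kevin", "Joe", "Harry", "Tony"]

# Map phase: emit a (name, 1) pair for every name in every input file
pairs = [(name, 1) for f in (namelist_1, namelist_2) for name in f]

# Shuffle phase: group the emitted values by key (name)
grouped = defaultdict(list)
for name, one in pairs:
    grouped[name].append(one)

# Reduce phase: sum the grouped counts for each name
result = {name: sum(ones) for name, ones in grouped.items()}

for name, total in result.items():
    print(name, total)
```

  This reproduces the counts in part-r-00000 (Harry 2, Bill 1, Tony 2, Alex 1, Kevin 2, Joe 1); in a real job, the map and reduce functions run distributed across the cluster's nodes and Hadoop performs the shuffle between them.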

  The result matches our expectation. With this article we have completed a first end-to-end experiment on the Hadoop architecture; from here, you can continue learning Hadoop along two tracks: first, HDFS operations and administration, and second, writing your own MapReduce programs.
