3. Configuring the Cluster
We use the variable $HADOOP_HOME to stand for the Hadoop installation directory; in this setup its value is $HADOOP_HOME=/opt/hadoop-0.20.203.0.
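Paths such as bin/hadoop in the steps below are given relative to this directory. If you also want the variable available in your own shell, a minimal sketch (assuming a Bourne-compatible shell; purely a convenience, not required by the steps that follow) is:
# Illustrative only: export the paths used in this walkthrough for the current shell
export JAVA_HOME=/opt/jdk1.7.0
export HADOOP_HOME=/opt/hadoop-0.20.203.0
export PATH=$HADOOP_HOME/bin:$JAVA_HOME/bin:$PATH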
1) Add Java to the Hadoop runtime environment
First, on the NameNode master, edit the $HADOOP_HOME/conf/hadoop-env.sh file and add Java to Hadoop's runtime environment, as follows:
# The java implementation to use. Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
export JAVA_HOME=/opt/jdk1.7.0
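A quick sanity check, not part of the original procedure: running the java binary from the configured JAVA_HOME should print its version, confirming the path is correct.
# Optional check: the JDK configured above should report its version
/opt/jdk1.7.0/bin/java -version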
2) Configure the NameNode master information
Second, on the NameNode master, edit the $HADOOP_HOME/conf/core-site.xml file and add the NameNode master's IP address and listening port:
[root@localhost hadoop-0.20.203.0]# cat conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.3.230:9000</value>
  </property>
</configuration>
[root@localhost hadoop-0.20.203.0]#
3) Configure the data replication factor
Next, on the NameNode master, edit the $HADOOP_HOME/conf/hdfs-site.xml file and set how many replicas of each data block HDFS should keep:
[root@localhost hadoop-0.20.203.0]# cat conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
[root@localhost hadoop-0.20.203.0]#
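Once files have been written into HDFS, the effective replication can be cross-checked with fsck. This is an optional verification step, not part of the original walkthrough; the path "/" is simply an example target:
# Optional: report files, blocks, and replication for everything under /
bin/hadoop fsck / -files -blocks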
4) Configure the JobTracker information
Then, on the NameNode master, edit the $HADOOP_HOME/conf/mapred-site.xml file and configure the address and port of the JobTracker service running on the master:
[root@localhost hadoop-0.20.203.0]# cat conf/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.3.230:9001</value>
  </property>
</configuration>
[root@localhost hadoop-0.20.203.0]#
5) Configure the masters file
Next, on the NameNode master, edit the $HADOOP_HOME/conf/masters file and add the master node's IP address:
[root@localhost hadoop-0.20.203.0]# cat conf/masters
192.168.3.230
[root@localhost hadoop-0.20.203.0]#
6) Configure the slaves file
Finally, on the NameNode master, edit the $HADOOP_HOME/conf/slaves file and list the IP addresses of the slave nodes. There can be one slave or many; in this example there are four:
[root@localhost hadoop-0.20.203.0]# cat conf/slaves
192.168.3.231
192.168.3.232
192.168.3.233
192.168.3.234
[root@localhost hadoop-0.20.203.0]#
7) Distribute the Hadoop configuration
With the steps above complete, copy the JDK and the Hadoop software from the NameNode master to every DataNode slave, keeping the installation path identical to the master's, as the following commands show:
scp -r jdk1.7.0 hadoop-0.20.203.0 192.168.3.231:/opt/
scp -r jdk1.7.0 hadoop-0.20.203.0 192.168.3.232:/opt/
scp -r jdk1.7.0 hadoop-0.20.203.0 192.168.3.233:/opt/
scp -r jdk1.7.0 hadoop-0.20.203.0 192.168.3.234:/opt/
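With only four slaves the commands are easy to type by hand; for a larger cluster the same copy can be driven from the slaves file. A minimal sketch, assuming it is run on the master and the slaves file already lists the target hosts:
# Illustrative alternative: copy the JDK and Hadoop to every host listed in conf/slaves
for host in $(cat /opt/hadoop-0.20.203.0/conf/slaves); do
    scp -r /opt/jdk1.7.0 /opt/hadoop-0.20.203.0 ${host}:/opt/
done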
8) Format the distributed file system
As with file systems on Windows and Linux, HDFS must be formatted before it can be used; otherwise the file system is unusable. The command is shown below:
[root@localhost hadoop-0.20.203.0]# bin/hadoop namenode -format
11/08/16 02:38:40 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = localhost.localdomain/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.203.0
……
11/08/16 02:38:40 INFO util.GSet: VM type = 64-bit
11/08/16 02:38:40 INFO namenode.NameNode: Caching file names occuring more than 10 times
11/08/16 02:38:41 INFO common.Storage: Image file of size 110 saved in 0 seconds.
11/08/16 02:38:41 INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.
11/08/16 02:38:41 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost.localdomain/127.0.0.1
************************************************************/
[root@localhost hadoop-0.20.203.0]#
9) Start the Hadoop cluster
Only the start-all.sh command below needs to be run on the NameNode master; the master then logs in to each slave node over SSH and starts the remaining daemons there.
[root@localhost hadoop-0.20.203.0]# bin/start-all.sh
starting namenode, logging to /opt/hadoop-0.20.203.0/bin/../logs/hadoop-root-namenode-localhost.localdomain.out
192.168.3.232: starting datanode, logging to /opt/hadoop-0.20.203.0/bin/../logs/hadoop-root-datanode-localhost.localdomain.out
192.168.3.233: starting datanode, logging to ……
/opt/hadoop-0.20.203.0/bin/../logs/hadoop-root-jobtracker-localhost.localdomain.out
192.168.3.233: starting tasktracker, logging to ……
[root@localhost hadoop-0.20.203.0]#
10) Check the master and slave process status
On the NameNode master, check the Java processes:
[root@localhost hadoop-0.20.203.0]# jps
867 SecondaryNameNode
735 NameNode
1054 Jps
946 JobTracker
[root@localhost hadoop-0.20.203.0]#
On each of the four DataNode slaves, check the Java processes:
[root@localhost opt]# jps
30012 TaskTracker
29923 DataNode
30068 Jps
[root@localhost opt]#
If the expected processes are running on every node, the cluster has been deployed successfully.
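Beyond jps, the HDFS side can also be verified from the master with dfsadmin, which should report four live DataNodes, followed by a simple write test. These commands are standard in Hadoop 0.20 but are an additional check, not part of the original walkthrough; the directory name is an arbitrary example:
# Report HDFS capacity and the number of live DataNodes (four are expected here)
bin/hadoop dfsadmin -report
# Simple smoke test: create an example directory and list the HDFS root
bin/hadoop fs -mkdir /cluster-check
bin/hadoop fs -ls /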