
A Sharp Blade for Cloud Computing: Rapidly Deploying a Hadoop Cluster

  3. Configuring the Cluster

  We will use the variable $HADOOP_HOME to stand for the Hadoop installation directory; in this setup its value is /opt/hadoop-0.20.203.0.
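  If you also want the variable set in your own shell session, a minimal sketch (assuming a Bourne-style shell such as bash) is:

export HADOOP_HOME=/opt/hadoop-0.20.203.0
cd $HADOOP_HOME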

  1) Add Java to the Hadoop runtime environment

  First, on the NameNode (the master), edit the $HADOOP_HOME/conf/hadoop-env.sh file and add Java to Hadoop's runtime environment, as follows:

# The java implementation to use.  Required.

# export JAVA_HOME=/usr/lib/j2sdk1.5-sun

export JAVA_HOME=/opt/jdk1.7.0
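  Before going further it is worth confirming that this path really points at a working JDK; assuming the JDK is installed at /opt/jdk1.7.0 as above:

/opt/jdk1.7.0/bin/java -version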

  2) Configure the NameNode master information

  Next, on the NameNode, edit the $HADOOP_HOME/conf/core-site.xml file and add the NameNode's IP address and listening port:

[root@localhost hadoop-0.20.203.0]# cat conf/core-site.xml

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

 

<!-- Put site-specific property overrides in this file. -->

 

<configuration>

     <property>

         <name>fs.default.name</name>

         <value>hdfs://192.168.3.230:9000</value>

     </property>

</configuration>

[root@localhost hadoop-0.20.203.0]#
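  The fs.default.name property tells every Hadoop daemon and client where the NameNode listens. Once the cluster is up (step 9), a quick sanity check, run from $HADOOP_HOME, is to point a filesystem command at that exact URI:

bin/hadoop fs -ls hdfs://192.168.3.230:9000/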

  3) Configure the replication factor

  Then, still on the NameNode, edit the $HADOOP_HOME/conf/hdfs-site.xml file to set how many replicas of each data block are kept:

[root@localhost hadoop-0.20.203.0]# cat conf/hdfs-site.xml

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

 

<!-- Put site-specific property overrides in this file. -->

 

<configuration>

     <property>

         <name>dfs.replication</name>

         <value>2</value>

     </property>

</configuration>

[root@localhost hadoop-0.20.203.0]#
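  With dfs.replication set to 2, each HDFS block should end up stored on two of the DataNodes. Once the cluster is running and holds some data, this can be verified with the stock fsck tool, run from $HADOOP_HOME:

bin/hadoop fsck / -files -blocks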

  4) Configure the JobTracker information

  After that, on the NameNode, edit the $HADOOP_HOME/conf/mapred-site.xml file to set the address and port of the JobTracker service running on the master:

[root@localhost hadoop-0.20.203.0]# cat conf/mapred-site.xml

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

 

<!-- Put site-specific property overrides in this file. -->

 

<configuration>

     <property>

         <name>mapred.job.tracker</name>

         <value>192.168.3.230:9001</value>

     </property>

</configuration>

[root@localhost hadoop-0.20.203.0]#
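  After the cluster is started (step 9), you can confirm the JobTracker is actually listening on port 9001; the flags below assume a Linux system with net-tools installed:

netstat -tlnp | grep 9001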

  5) Configure the masters file

  Next, on the NameNode, edit the $HADOOP_HOME/conf/masters file with the master's IP address. (Strictly speaking, in this Hadoop version the masters file lists the hosts where start-dfs.sh launches the SecondaryNameNode; here that is the same machine as the NameNode.)

[root@localhost hadoop-0.20.203.0]# cat conf/masters

192.168.3.230

[root@localhost hadoop-0.20.203.0]#

  6) Configure the slaves file

  Finally, on the NameNode, edit the $HADOOP_HOME/conf/slaves file with the IP addresses of the slave nodes, one per line. There may be a single slave or many; this example uses four.

[root@localhost hadoop-0.20.203.0]# cat conf/slaves

192.168.3.231

192.168.3.232

192.168.3.233

192.168.3.234

[root@localhost hadoop-0.20.203.0]#

  7) Distribute the JDK and Hadoop installation

  With all of the above in place, we copy the JDK and Hadoop from the NameNode to every DataNode, keeping the installation paths identical to those on the master, as the commands below show:

scp -r jdk1.7.0 hadoop-0.20.203.0 192.168.3.231:/opt/

scp -r jdk1.7.0 hadoop-0.20.203.0 192.168.3.232:/opt/

scp -r jdk1.7.0 hadoop-0.20.203.0 192.168.3.233:/opt/

scp -r jdk1.7.0 hadoop-0.20.203.0 192.168.3.234:/opt/
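  If the list of slaves grows, the same copy is easier to drive with a small loop; this sketch assumes password-less SSH from the master to every slave is already in place (start-all.sh in step 9 requires it anyway):

for ip in 192.168.3.231 192.168.3.232 192.168.3.233 192.168.3.234; do
    scp -r jdk1.7.0 hadoop-0.20.203.0 $ip:/opt/
done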

  8) Format the distributed file system

  Just as a disk must be formatted before Windows or Linux can use it, HDFS must be formatted before first use; otherwise the file system is unusable. The command is shown below:

[root@localhost hadoop-0.20.203.0]# bin/hadoop namenode -format

11/08/16 02:38:40 INFO namenode.NameNode: STARTUP_MSG:

/************************************************************

STARTUP_MSG: Starting NameNode

STARTUP_MSG:   host = localhost.localdomain/127.0.0.1

STARTUP_MSG:   args = [-format]

STARTUP_MSG:   version = 0.20.203.0

……

11/08/16 02:38:40 INFO util.GSet: VM type       = 64-bit

11/08/16 02:38:40 INFO namenode.NameNode: Caching file names occuring more than 10 times

11/08/16 02:38:41 INFO common.Storage: Image file of size 110 saved in 0 seconds.

11/08/16 02:38:41 INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.

11/08/16 02:38:41 INFO namenode.NameNode: SHUTDOWN_MSG:

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at localhost.localdomain/127.0.0.1

************************************************************/

[root@localhost hadoop-0.20.203.0]#
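  Note that the log shows the file system image being written under /tmp/hadoop-root/dfs/name, the default location derived from hadoop.tmp.dir. Since /tmp is often cleared on reboot, a production deployment would normally point the metadata and block directories at persistent storage before formatting; a minimal hdfs-site.xml sketch (the /opt/hadoop-data paths are illustrative, not from the original setup) is:

<property>
    <name>dfs.name.dir</name>
    <value>/opt/hadoop-data/name</value>
</property>
<property>
    <name>dfs.data.dir</name>
    <value>/opt/hadoop-data/data</value>
</property>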

  9) Start the Hadoop cluster

  All it takes is running the start-all.sh command below on the NameNode; the master then logs in to each slave over ssh and starts the remaining daemons there.

[root@localhost hadoop-0.20.203.0]# bin/start-all.sh

starting namenode, logging to /opt/hadoop-0.20.203.0/bin/../logs/hadoop-root-namenode-localhost.localdomain.out

192.168.3.232: starting datanode, logging to /opt/hadoop-0.20.203.0/bin/../logs/hadoop-root-datanode-localhost.localdomain.out

192.168.3.233: starting datanode, logging to

……

/opt/hadoop-0.20.203.0/bin/../logs/hadoop-root-jobtracker-localhost.localdomain.out

192.168.3.233: starting tasktracker, logging to

……

[root@localhost hadoop-0.20.203.0]#
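  The shutdown path is symmetric: when you later need to stop the whole cluster, run the matching script from the same directory on the master:

bin/stop-all.sh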

  10) Check the process status on the master and slaves

  On the NameNode master, check the Java processes:

[root@localhost hadoop-0.20.203.0]# jps

867 SecondaryNameNode

735 NameNode

1054 Jps

946 JobTracker

[root@localhost hadoop-0.20.203.0]#

  On each of the four DataNode slaves, check the Java processes:

[root@localhost opt]# jps

30012 TaskTracker

29923 DataNode

30068 Jps

[root@localhost opt]#

  If all of these processes are present on their respective nodes, the cluster has been deployed successfully.
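  Beyond jps, this Hadoop release ships two further checks out of the box (the ports below are the 0.20 defaults): the NameNode web UI at http://192.168.3.230:50070 and the JobTracker web UI at http://192.168.3.230:50030, plus a command-line summary of live DataNodes, run from $HADOOP_HOME on the master:

bin/hadoop dfsadmin -report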
