A Sharp Blade for Cloud Computing: Rapidly Deploying a Hadoop Cluster

  4. Handling Common Exceptions

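  This section covers the mistakes most commonly made when configuring a Hadoop cluster, together with their fixes, so that you can resolve them quickly.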

  1) Unrecognized option: -jvm

  Symptom:

[root@localhost hadoop-0.20.203.0]# bin/start-all.sh

…….

192.168.3.232: Unrecognized option: -jvm

192.168.3.232: Error: Could not create the Java Virtual Machine.

192.168.3.232: Error: A fatal exception has occurred. Program will exit.

…….

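  Solution:

  When the script runs as root ($EUID is 0), it passes "-jvm server" to the JVM, which does not recognize the option. Edit $HADOOP_HOME/bin/hadoop and comment out these 2 lines: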

  if [[ $EUID -eq 0 ]]; then

#    HADOOP_OPTS="$HADOOP_OPTS -jvm server $HADOOP_DATANODE_OPTS"

#  else

    HADOOP_OPTS="$HADOOP_OPTS -server $HADOOP_DATANODE_OPTS"

  fi
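
  The same edit can be scripted when several nodes need it. The following is only a sketch, assuming a sed that accepts -i with a backup suffix (GNU and BSD sed both do); comment_out_jvm_branch is a hypothetical helper name, not something shipped with Hadoop. After the edit, the remaining "-server" line sits inside the "if", so it only runs when the script is started as root, as in the session above.

```shell
#!/bin/sh
# Sketch: comment out the "-jvm server" assignment and the "else" right after it.
# comment_out_jvm_branch is a hypothetical helper name, not part of Hadoop.
comment_out_jvm_branch() {
  # $1: path to the launcher script, e.g. "$HADOOP_HOME/bin/hadoop"
  # -i.bak edits the file in place and keeps the original as <file>.bak
  # Lines that are already commented out do not match, so a second run is a no-op.
  sed -i.bak -e '/^[[:space:]]*HADOOP_OPTS=.*-jvm server/{
s/^/#/
n
s/^[[:space:]]*else[[:space:]]*$/#&/
}' "$1"
}

# Usage (left commented out so that nothing is modified by accident):
# comment_out_jvm_branch "$HADOOP_HOME/bin/hadoop"
```

  It is worth comparing the result against the .bak copy before restarting the cluster with bin/start-all.sh.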

  2) Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.

  Symptom:

11/08/16 03:37:41 INFO mapred.JobClient:  map 100% reduce 0%

11/08/16 03:37:58 INFO mapred.JobClient: Task Id : attempt_201108160249_0001_r_000000_0, Status : FAILED

Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.

11/08/16 03:37:58 WARN mapred.JobClient: Error reading task outputConnection refused

11/08/16 03:37:58 WARN mapred.JobClient: Error reading task outputConnection refused

11/08/16 03:38:08 INFO mapred.JobClient:  map 100% reduce 16%

  Solution:

  Two files need to be modified:

vi /etc/security/limits.conf

Add:

* soft nofile 102400

* hard nofile 409600

 

vi /etc/pam.d/login

Add:

session    required     /lib/security/pam_limits.so
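
  pam_limits applies these limits when a session starts, so they should be checked from a fresh login. The sketch below compares the current soft and hard open-file limits with the values configured above; nofile_ok is a hypothetical helper name introduced only for this illustration.

```shell
#!/bin/sh
# Sketch: check that the open-file limits of the current session
# are at least the values written to /etc/security/limits.conf.
# nofile_ok is a hypothetical helper name used for illustration.
nofile_ok() {
  # $1: limit as reported by ulimit, $2: required minimum
  [ "$1" = "unlimited" ] || [ "$1" -ge "$2" ]
}

soft=$(ulimit -Sn)   # current soft limit on open files
hard=$(ulimit -Hn)   # current hard limit on open files

if nofile_ok "$soft" 102400 && nofile_ok "$hard" 409600; then
  echo "nofile limits are in effect (soft=$soft, hard=$hard)"
else
  echo "nofile limits are still too low (soft=$soft, hard=$hard)"
fi
```

  Run it as the user that starts the Hadoop daemons, since the limits are per session.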

  3) Too many fetch-failures

  Symptom:

11/08/16 03:38:28 INFO mapred.JobClient: Task Id : attempt_201108160249_0001_m_000001_0, Status : FAILED

Too many fetch-failures

11/08/16 03:38:28 WARN mapred.JobClient: Error reading task outputConnection refused

11/08/16 03:38:28 WARN mapred.JobClient: Error reading task outputConnection refused

  Solution:

  Add the following to /etc/hosts:

192.168.3.230 test1

192.168.3.231 test2

192.168.3.232 test3

192.168.3.233 test4

192.168.3.234 test5

  Before doing this, however, you need to change the hostname of all 5 nodes in the cluster, i.e. edit /etc/sysconfig/network and /etc/hosts on each node.
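
  Because every node needs the same five entries, a quick check on each machine can catch a missed line. The following is a minimal sketch that uses the addresses and names listed above; hosts_has_entry and check_cluster_hosts are hypothetical helper names, not standard tools.

```shell
#!/bin/sh
# Sketch: verify that a hosts file maps each cluster IP to its hostname.
# hosts_has_entry and check_cluster_hosts are hypothetical helper names.
hosts_has_entry() {
  # $1: hosts file, $2: IP address, $3: hostname
  awk -v ip="$2" -v name="$3" '
    { sub(/#.*/, "") }                      # ignore comments
    $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
    END { exit !found }                     # exit 0 only if the pair was seen
  ' "$1"
}

check_cluster_hosts() {
  # $1: hosts file to check; returns non-zero if any entry is missing
  status=0
  while read -r ip name; do
    if hosts_has_entry "$1" "$ip" "$name"; then
      echo "ok:      $ip $name"
    else
      echo "missing: $ip $name"
      status=1
    fi
  done <<'EOF'
192.168.3.230 test1
192.168.3.231 test2
192.168.3.232 test3
192.168.3.233 test4
192.168.3.234 test5
EOF
  return "$status"
}

check_cluster_hosts /etc/hosts || echo "Some entries are missing from /etc/hosts"
```

  The check only reads the file, so it is safe to run on every node.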
