4. Handling Common Exceptions
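This part covers the mistakes most commonly made when configuring a Hadoop cluster and how to fix them, so that you can resolve these problems quickly.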
1) Unrecognized option: -jvm
Symptom:
[root@localhost hadoop-0.20.203.0]# bin/start-all.sh
…….
192.168.3.232: Unrecognized option: -jvm
192.168.3.232: Error: Could not create the Java Virtual Machine.
192.168.3.232: Error: A fatal exception has occurred. Program will exit.
…….
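Solution:
When the DataNode is started as root ($EUID is 0), $HADOOP_HOME/bin/hadoop adds "-jvm server" to HADOOP_OPTS, an option the java launcher does not recognize. Edit $HADOOP_HOME/bin/hadoop and comment out these two lines (the "-jvm server" line and the "else" line), so that the root branch uses "-server" instead: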
if [[ $EUID -eq 0 ]]; then
#  HADOOP_OPTS="$HADOOP_OPTS -jvm server $HADOOP_DATANODE_OPTS"
# else
  HADOOP_OPTS="$HADOOP_OPTS -server $HADOOP_DATANODE_OPTS"
fi
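To confirm the edit, here is a minimal sketch that assumes a POSIX shell with grep; active_jvm_lines is a hypothetical helper name, not something shipped with Hadoop. It prints every line of the given file that still contains -jvm without being commented out, so empty output means the option is no longer passed.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): report lines that still pass "-jvm".

# active_jvm_lines FILE: print "LINENO:text" for every line of FILE that contains
# -jvm and whose first non-blank character is not '#'.
# The first grep numbers the matching lines (-e lets the pattern start with a dash);
# the second grep drops those that are commented out.
active_jvm_lines() {
  grep -n -e '-jvm' "$1" | grep -v '^[0-9]*:[[:space:]]*#'
}
```

For example, active_jvm_lines "$HADOOP_HOME/bin/hadoop" should print nothing after the change. The "192.168.3.232:" prefix in the log shows the failure came from that node's own copy of bin/hadoop, so the edit (and this check) applies on every node that starts a DataNode.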
2) Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
Symptom:
11/08/16 03:37:41 INFO mapred.JobClient:  map 100% reduce 0%
11/08/16 03:37:58 INFO mapred.JobClient: Task Id : attempt_201108160249_0001_r_000000_0, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
11/08/16 03:37:58 WARN mapred.JobClient: Error reading task outputConnection refused
11/08/16 03:37:58 WARN mapred.JobClient: Error reading task outputConnection refused
11/08/16 03:38:08 INFO mapred.JobClient:  map 100% reduce 16%
Solution:
Raise the limit on the number of open files (nofile) by editing two files:
vi /etc/security/limits.conf
Add:
* soft nofile 102400
* hard nofile 409600

vi /etc/pam.d/login
Add:
session required /lib/security/pam_limits.so
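If the same change is applied on several nodes, or the setup is re-run, a small idempotent helper avoids duplicate entries. This is a minimal sketch assuming a POSIX shell and grep; append_once and apply_nofile_fix are hypothetical names, and the three lines it writes are exactly the ones given above.

```shell
#!/bin/sh
# Hypothetical helpers: apply the nofile change without duplicating lines.

# append_once FILE LINE: append LINE to FILE unless FILE already contains it
# as a complete, identical line (-x whole line, -F fixed string, -q quiet).
append_once() {
  grep -qxF -e "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# apply_nofile_fix LIMITS_FILE PAM_FILE: write the soft/hard nofile limits to
# LIMITS_FILE and the pam_limits session line to PAM_FILE.
apply_nofile_fix() {
  append_once "$1" '* soft nofile 102400'
  append_once "$1" '* hard nofile 409600'
  append_once "$2" 'session required /lib/security/pam_limits.so'
}
```

As root, apply_nofile_fix /etc/security/limits.conf /etc/pam.d/login applies the change. The limits are set by pam_limits when a session starts, so they only reach processes launched from a new login: check them with ulimit -Sn and ulimit -Hn in a fresh session and restart the Hadoop daemons from there. Note also that limits.conf(5) on newer PAM releases states that wildcard (*) entries are not applied to root, so if the daemons run as root, as in the first example, explicit root lines may be needed.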
3) Too many fetch-failures
Symptom:
11/08/16 03:38:28 INFO mapred.JobClient: Task Id : attempt_201108160249_0001_m_000001_0, Status : FAILED
Too many fetch-failures
11/08/16 03:38:28 WARN mapred.JobClient: Error reading task outputConnection refused
11/08/16 03:38:28 WARN mapred.JobClient: Error reading task outputConnection refused
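Solution:
Reduce tasks fetch map output from the other nodes by hostname, so every node must be able to resolve each node's hostname to its real IP address. Add the following entries to /etc/hosts: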
192.168.3.230 test1
192.168.3.231 test2
192.168.3.232 test3
192.168.3.233 test4
192.168.3.234 test5
Before doing this, however, change the hostname of each of the five nodes in the cluster, i.e. edit /etc/sysconfig/network and /etc/hosts on every node.
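Since one wrong or missing entry on any node can be enough to break the fetches, it helps to check the hosts file on each node. The sketch below assumes a POSIX shell with awk; ip_for_host and check_hosts are hypothetical helper names. It reads a hosts-format file directly and reports every expected IP/name pair that the file does not map correctly.

```shell
#!/bin/sh
# Hypothetical helpers: verify name-to-IP entries in a hosts-format file.

# ip_for_host FILE NAME: print the IP of the first line in FILE that lists NAME
# (trailing "#" comments are stripped; prints nothing if NAME is absent).
ip_for_host() {
  awk -v name="$2" '
    {
      sub(/#.*/, "")                      # drop comments; fields are re-split
      for (i = 2; i <= NF; i++)
        if ($i == name) { print $1; exit }
    }' "$1"
}

# check_hosts FILE IP NAME [IP NAME ...]: print a MISMATCH line for every pair
# that FILE does not map as expected; return non-zero if any pair failed.
check_hosts() {
  file=$1; shift
  status=0
  while [ $# -ge 2 ]; do
    actual=$(ip_for_host "$file" "$2")
    if [ "$actual" != "$1" ]; then
      echo "MISMATCH: $2 -> '${actual:-<missing>}' (expected $1)"
      status=1
    fi
    shift 2
  done
  return $status
}
```

For the example cluster, check_hosts /etc/hosts 192.168.3.230 test1 192.168.3.231 test2 192.168.3.232 test3 192.168.3.233 test4 192.168.3.234 test5 prints nothing and returns 0 when all five entries are correct; a name that an earlier line maps to 127.0.0.1 is reported as a mismatch. Because the helper only reads the file, getent hosts test1 is a useful cross-check of what the system resolver actually returns.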