Pseudo-Distributed Hadoop Setup

Building Hadoop from scratch: from standalone mode, to pseudo-distributed, to fully distributed.

Version Selection

OS: Ubuntu 18.04

Hadoop: 2.9.1

Java: 1.8.0_181

Update APT and Install SSH

Update apt:

sudo apt-get update

Install SSH:

sudo apt-get install openssh-server

Configure SSH
First SSH to localhost once so that the .ssh directory gets created, then change into it:

cd ~/.ssh/

Generate a public/private key pair with the RSA algorithm:
ssh-keygen -t rsa

Append the public key to the authorized keys:

cat ./id_rsa.pub >> ./authorized_keys

After this, SSH logins to localhost no longer require a password.
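To verify, connect once more; no password prompt should appear:

ssh localhost
exit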

JDK Installation and Configuration

Download the JDK and extract it under /usr/local/java, the location that JAVA_HOME points to below. Note: the archive downloaded here is 8u191, while the rest of this guide uses paths for jdk1.8.0_181 (the version listed at the top); adjust the version in the paths to match the release you actually install.

wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u191-b12/2787e4a523244c269598db4e85c51e0c/jdk-8u191-linux-x64.tar.gz

sudo mkdir -p /usr/local/java
sudo tar xvzf jdk-8u191-linux-x64.tar.gz -C /usr/local/java

Edit /etc/profile and append the following at the end:

sudo vim /etc/profile

JAVA_HOME=/usr/local/java/jdk1.8.0_181
PATH=$PATH:$HOME/bin:$JAVA_HOME/bin
export JAVA_HOME
export PATH

Register this JDK with Ubuntu's alternatives system so that java, javac, and javaws resolve to it:

sudo update-alternatives --install "/usr/bin/java" "java" "/usr/local/java/jdk1.8.0_181/bin/java" 1
sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/local/java/jdk1.8.0_181/bin/javac" 1
sudo update-alternatives --install "/usr/bin/javaws" "javaws" "/usr/local/java/jdk1.8.0_181/bin/javaws" 1

sudo update-alternatives --set java /usr/local/java/jdk1.8.0_181/bin/java
sudo update-alternatives --set javac /usr/local/java/jdk1.8.0_181/bin/javac
sudo update-alternatives --set javaws /usr/local/java/jdk1.8.0_181/bin/javaws

Reload the environment configuration file and check that it took effect:

source /etc/profile

java -version
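If everything is configured correctly, java -version should report the installed release, with output similar to the following (assuming 1.8.0_181):

java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)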

Install Hadoop

Mirror download address: https://mirrors.cnnic.cn/apache/hadoop/common/
Extract the archive, rename the directory, and change its ownership:

sudo tar -zxf ~/Downloads/hadoop-2.9.1.tar.gz -C /usr/local
cd /usr/local
sudo mv ./hadoop-2.9.1/ ./hadoop
sudo chown -R ykt ./hadoop    # replace ykt with your own username

Check whether the installation succeeded:

/usr/local/hadoop/bin/hadoop version
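A working installation prints the release and build details, beginning with a line like:

Hadoop 2.9.1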

Configure Hadoop

(1) Standalone mode
A fresh installation runs in standalone (local) mode by default. Running the examples jar without arguments prints the list of available example programs:

cd /usr/local/hadoop

./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar

Example: use the XML configuration files as input.

cd /usr/local/hadoop
mkdir ./input
cp ./etc/hadoop/*.xml ./input
ls ./input

Run the job and view the output:

./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar grep ./input/ ./out 'dfs[a-z.]+'

cat ./out/*

Note: Hadoop does not overwrite existing output files. Re-running the example above will fail with an error unless ./out is deleted first.
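To run it again, remove the previous output directory first:

rm -r ./out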

(2) Pseudo-distributed setup
Two configuration files need to be edited: core-site.xml and hdfs-site.xml.

Both files are located in /usr/local/hadoop/etc/hadoop.

Configure core-site.xml as follows (fs.defaultFS is the HDFS NameNode address; hadoop.tmp.dir is the base directory Hadoop uses for its data):

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

Configure hdfs-site.xml as follows (dfs.replication is 1 because pseudo-distributed mode runs only a single DataNode):

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>

Switch back to the Hadoop home directory (/usr/local/hadoop), format the NameNode, and start HDFS:

./bin/hdfs namenode -format

./sbin/start-dfs.sh

If startup reports an error, the JAVA_HOME path may be missing from hadoop-env.sh:

vim ./etc/hadoop/hadoop-env.sh

export JAVA_HOME=/usr/local/java/jdk1.8.0_181

Then start it again.
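Once start-dfs.sh succeeds, jps should list the HDFS daemons, and the NameNode web UI comes up on port 50070 (the Hadoop 2.x default):

jps
# Typical output (PIDs will differ):
# 12601 NameNode
# 12734 DataNode
# 12899 SecondaryNameNode
# 13016 Jps

Open http://localhost:50070 in a browser to inspect HDFS. To shut everything down later, run ./sbin/stop-dfs.sh.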

Example:

First create a user directory and an input directory in HDFS, then copy all the XML configuration files into it:

./bin/hdfs dfs -mkdir -p /user/hadoop
./bin/hdfs dfs -mkdir /user/hadoop/input
./bin/hdfs dfs -put ./etc/hadoop/*.xml /user/hadoop/input
./bin/hdfs dfs -ls /user/hadoop/input

Run grep and view the result (note that the output path is relative, so it resolves under /user/<your-username> in HDFS):

./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar grep /user/hadoop/input output 'dfs[a-z]+'

./bin/hdfs dfs -cat output/*
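As in standalone mode, re-running the job fails if the output directory already exists, so it must be removed from HDFS first; the result can also be copied back to the local filesystem. Both are standard hdfs dfs operations (output-local below is an arbitrary local directory name):

./bin/hdfs dfs -get output ./output-local    # copy results out of HDFS
./bin/hdfs dfs -rm -r output                 # delete HDFS output before re-running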

Configure Hadoop environment variables

sudo vim /etc/bash.bashrc

export JAVA_HOME=/usr/local/java/jdk1.8.0_181
export JRE_HOME=${JAVA_HOME}/jre
export HADOOP_HOME=/usr/local/hadoop
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH

source /etc/bash.bashrc
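With ${HADOOP_HOME}/bin and ${HADOOP_HOME}/sbin on PATH, Hadoop commands now work from any directory without the ./bin/ prefix:

hadoop version
hdfs dfs -ls /user/hadoop
start-dfs.sh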

Reference: https://blog.csdn.net/weixin_42001089/article/details/81865101