Version selection
OS: Ubuntu 18.04
Hadoop version: 2.9.1
Java version: 1.8.0_181
Update APT and install SSH
Update apt:
sudo apt-get update
Install the SSH server:
sudo apt-get install openssh-server
Configure SSH
First SSH into localhost once (this creates the ~/.ssh directory), then enter it:
cd ~/.ssh/
Generate a private/public key pair with the RSA algorithm:
ssh-keygen -t rsa
Append the public key to the authorized keys:
cat ./id_rsa.pub >> ./authorized_keys
After this, SSH to localhost no longer asks for a password.
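A quick way to verify passwordless login (a minimal check; exit returns to the original shell):
ssh localhost
exit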
Install and configure the JDK
Download the JDK and unpack it into /usr/local/java. Note a version mismatch in the commands as originally written: the link below fetches 8u191, while the rest of this guide assumes jdk1.8.0_181 (the version listed above). Make the archive name and the JAVA_HOME path match whichever version you actually download:
wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u191-b12/2787e4a523244c269598db4e85c51e0c/jdk-8u191-linux-x64.tar.gz
sudo mkdir -p /usr/local/java
sudo tar xvzf jdk-8u191-linux-x64.tar.gz -C /usr/local/java
Edit the environment variables file and append the following at the end:
sudo vim /etc/profile
JAVA_HOME=/usr/local/java/jdk1.8.0_181
PATH=$PATH:$HOME/bin:$JAVA_HOME/bin
export JAVA_HOME
export PATH
Point Ubuntu's java/javac/javaws commands at the new JDK via update-alternatives:
sudo update-alternatives --install "/usr/bin/java" "java" "/usr/local/java/jdk1.8.0_181/bin/java" 1
sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/local/java/jdk1.8.0_181/bin/javac" 1
sudo update-alternatives --install "/usr/bin/javaws" "javaws" "/usr/local/java/jdk1.8.0_181/bin/javaws" 1
sudo update-alternatives --set java /usr/local/java/jdk1.8.0_181/bin/java
sudo update-alternatives --set javac /usr/local/java/jdk1.8.0_181/bin/javac
sudo update-alternatives --set javaws /usr/local/java/jdk1.8.0_181/bin/javaws
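Optionally, confirm that the alternatives now point at the new JDK:
update-alternatives --display java
which java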
Reload the environment file and check that it took effect:
source /etc/profile
java -version
Install Hadoop
Mirror address: https://mirrors.cnnic.cn/apache/hadoop/common/
Unpack the archive, rename the directory, and fix its ownership (replace ykt with your own username):
sudo tar -zxf ~/Downloads/hadoop-2.9.1.tar.gz -C /usr/local
cd /usr/local
sudo mv ./hadoop-2.9.1/ ./hadoop
sudo chown -R ykt ./hadoop
Check that the installation works:
/usr/local/hadoop/bin/hadoop version
Configure Hadoop
(1) Standalone mode
A fresh installation runs in standalone (local) mode by default. Running the examples jar with no arguments lists the available example programs:
cd /usr/local/hadoop
./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar
Example: use the Hadoop config xml files as input.
cd /usr/local/hadoop
mkdir ./input
cp ./etc/hadoop/*.xml ./input
Run the grep example and view the output:
./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar grep ./input/ ./out 'dfs[a-z.]+'
cat ./out/*
Note: Hadoop does not overwrite an existing output directory by default. Re-running the example above fails with an error until ./out is removed.
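Delete it before rerunning, for example:
rm -r ./out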
(2) Pseudo-distributed setup
Two configuration files need editing: core-site.xml and hdfs-site.xml,
both located in /usr/local/hadoop/etc/hadoop.
Configure core-site.xml as follows:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Configure hdfs-site.xml as follows (dfs.replication is set to 1 because a pseudo-distributed cluster has only a single DataNode):
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/hadoop/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop/tmp/dfs/data</value>
</property>
</configuration>
Switch back to the Hadoop home directory /usr/local/hadoop, format the NameNode, and start HDFS:
./bin/hdfs namenode -format
./sbin/start-dfs.sh
If startup fails, JAVA_HOME may be missing from Hadoop's environment file. Edit it and add the path:
vim ./etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/local/java/jdk1.8.0_181
Then start HDFS again.
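Once HDFS is up, the JDK's jps tool should list the running daemons; typical output for this setup is shown as comments (PIDs omitted):
jps
# NameNode
# DataNode
# SecondaryNameNode
# Jps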
Example:
Create the user directory and an input directory on HDFS, then copy all the xml config files into it:
./bin/hdfs dfs -mkdir -p /user/hadoop
./bin/hdfs dfs -mkdir /user/hadoop/input
./bin/hdfs dfs -put ./etc/hadoop/*.xml /user/hadoop/input
./bin/hdfs dfs -ls /user/hadoop/input
Run grep and view the result:
./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar grep /user/hadoop/input output 'dfs[a-z]+'
./bin/hdfs dfs -cat output/*
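The same no-overwrite rule applies on HDFS: remove the output directory before rerunning. A sketch for copying the result back to the local filesystem first (the local name ./out-hdfs is arbitrary):
./bin/hdfs dfs -get output ./out-hdfs
cat ./out-hdfs/*
./bin/hdfs dfs -rm -r output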
Configure Hadoop environment variables
vim /etc/bash.bashrc
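A minimal sketch of what to append, assuming Hadoop was installed to /usr/local/hadoop as above (reload with source /etc/bash.bashrc afterwards):
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin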
Reference blog: https://blog.csdn.net/weixin_42001089/article/details/81865101