# Hadoop
- Prerequisite
- Install Java
- Build SSH Connection
- Setup Hadoop
- Setup Kaspacore & GeoLite Components
- Admin Web UI
- Useful Commands
Item | Value |
---|---|
Hadoop IP address (network interface) | 172.16.2.50 |
Hadoop IP address (docker0 interface) | 172.17.0.1 |
Hadoop user | ubuntu |
## Prerequisite
✅ Ubuntu 20.04 LTS installed and updated with the following command.
sudo apt update && sudo apt -y upgrade
✅ Time Zone and NTP already set.
✅ Docker 20.10 or later installed with the following command.
sudo apt -y install docker.io
## Install Java
sudo apt -y install openjdk-11-jdk
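💡 To confirm the installation and find the JDK path used for JAVA_HOME later, you can run the following (the exact path is an assumption for a default Ubuntu OpenJDK 11 install):
java -version
readlink -f $(which java)
### Typically resolves to /usr/lib/jvm/java-11-openjdk-amd64/bin/java on Ubuntu.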
## Build SSH Connection
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
ssh localhost
Result
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:4KS9....CeIY.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 20.04.5 LTS (GNU/Linux 5.15.0-1026-aws x86_64)
❗ Don't forget to log out after successful login.
exit
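💡 As a quick sanity check that key-based login works without a prompt (a minimal sketch; BatchMode makes ssh fail instead of asking for a password):
ssh -o BatchMode=yes localhost exit && echo 'Passwordless SSH OK'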
## Setup Hadoop
wget -P ~/ https://archive.apache.org/dist/hadoop/common/hadoop-3.3.3/hadoop-3.3.3.tar.gz
tar -xzf ~/hadoop-3.3.3.tar.gz -C ~/
sudo mv ~/hadoop-3.3.3 /usr/local/hadoop
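💡 Optionally, you can verify the downloaded tarball. This sketch assumes Apache's usual archive layout, where a .sha512 checksum file sits next to the tarball; compare the two hash values manually.
wget -P ~/ https://archive.apache.org/dist/hadoop/common/hadoop-3.3.3/hadoop-3.3.3.tar.gz.sha512
cat ~/hadoop-3.3.3.tar.gz.sha512
sha512sum ~/hadoop-3.3.3.tar.gz
### The two hash values should match.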
Edit `.bashrc` to set user environment variables.
nano ~/.bashrc
Configuration
### Append to the end of the file.
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
source ~/.bashrc
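💡 To confirm the variables are active in the current shell:
echo $JAVA_HOME
echo $HADOOP_HOME
hadoop version
### hadoop resolving on the PATH confirms the new entries took effect.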
Edit `hadoop-env.sh` to set system environment variables.
nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Configuration
### Line 55: Change JAVA_HOME.
# export JAVA_HOME=
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
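💡 A quick way to confirm the edit took effect (line numbers can drift between releases, so grep rather than count lines):
grep -n '^export JAVA_HOME' /usr/local/hadoop/etc/hadoop/hadoop-env.sh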
Edit `core-site.xml` to specify the defaultFS URI.
nano /usr/local/hadoop/etc/hadoop/core-site.xml
Configuration
🔑 It is generally recommended to use the docker0 interface address 172.17.0.1.
🔑 Alternatively, you can also use the actual IP address of your Hadoop server.
🚫 The loopback address 127.0.0.1 doesn't work.
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://172.17.0.1:9000</value>
</property>
</configuration>
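💡 After saving, Hadoop can echo the effective setting back as a sanity check:
hdfs getconf -confKey fs.defaultFS
### Expected output: hdfs://172.17.0.1:9000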
Edit `hdfs-site.xml` to match your environment.
nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml
Configuration
🔑 Change the "ubuntu" part of "/home/ubuntu" in dfs.namenode.name.dir and dfs.datanode.data.dir according to your user account if necessary (e.g. /home/hadoop).
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/ubuntu/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/ubuntu/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.rpc-bind-host</name>
<value>0.0.0.0</value>
</property>
</configuration>
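💡 The same sanity check works for these keys, e.g.:
hdfs getconf -confKey dfs.replication
hdfs getconf -confKey dfs.namenode.name.dir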
Format the NameNode, then start HDFS.
hdfs namenode -format
start-dfs.sh
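💡 Once the daemons are up, you can confirm the DataNode registered with the NameNode (capacity figures will differ per system):
hdfs dfsadmin -report
### Look for 'Live datanodes (1)' in the output.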
## Setup Kaspacore & GeoLite Components
🔑 Change the "ubuntu" part of "/user/ubuntu" to your user account if necessary (e.g. /user/hadoop).
hdfs dfs -mkdir -p hdfs://localhost:9000/user/ubuntu/kafka-checkpoint
hdfs dfs -mkdir -p hdfs://localhost:9000/user/ubuntu/kaspacore/files
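💡 To confirm the directories were created (again substituting your user account for ubuntu if necessary):
hdfs dfs -ls -R hdfs://localhost:9000/user/ubuntu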
Download the Kaspacore jar.
wget -P ~/ https://github.com/mata-elang-stable/kaspacore-java/releases/download/20230213/kaspacore.jar
How to get the original GeoLite2 database:
- URL: https://www.maxmind.com/en/accounts/current/geoip/downloads
- The edition ID of the database is GeoLite2-City. The target link is named 'Download GZIP'.
- After downloading, send the file from your PC to your home directory on the server via SCP.
🔑 Change <YYYYMMDD> to match the downloaded file.
tar -xzf ~/GeoLite2-City_<YYYYMMDD>.tar.gz -C ~/
🔑 Change the "ubuntu" part of "/user/ubuntu" according to your user account if necessary.
🔑 Change <YYYYMMDD> to match the downloaded file.
hdfs dfs -put ~/kaspacore.jar hdfs://localhost:9000/user/ubuntu/kaspacore/files
hdfs dfs -put ~/GeoLite2-City_<YYYYMMDD>/GeoLite2-City.mmdb hdfs://localhost:9000/user/ubuntu/kaspacore/files
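💡 To verify both files landed in HDFS:
hdfs dfs -ls hdfs://localhost:9000/user/ubuntu/kaspacore/files
### Expect kaspacore.jar and GeoLite2-City.mmdb in the listing.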
## Admin Web UI
- URL: http://<HADOOP_SERVER_IP_OR_NAME (e.g. 172.16.2.50)>:9870/
💡 You can also access the DataNode Web UI and SecondaryNameNode Web UI at the following URLs. The address part must be entered manually.
- URL: http://<HADOOP_SERVER_IP_OR_NAME (e.g. 172.16.2.50)>:9864/ for the DataNode Web UI.
- URL: http://<HADOOP_SERVER_IP_OR_NAME (e.g. 172.16.2.50)>:9868/ for the SecondaryNameNode Web UI.
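💡 If a UI does not load, a quick check from the server that the ports are listening (a local sketch; the URLs above assume default Hadoop 3.x ports):
sudo ss -tlnp | grep -E ':(9870|9864|9868)'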
## Useful Commands
✅ Show service status
jps
Result
1130359 SecondaryNameNode
1129894 NameNode
1130534 Jps
1130075 DataNode
✅ Start service
start-dfs.sh
✅ Stop service
stop-dfs.sh
🔑 Change the "ubuntu" part of "/user/ubuntu" to your user account if necessary (e.g. /user/hadoop).
✅ Delete the kaspacore file from Hadoop.
hdfs dfs -rm hdfs://localhost:9000/user/ubuntu/kaspacore/files/kaspacore.jar
✅ Put the kaspacore file on Hadoop.
hdfs dfs -put ~/kaspacore.jar hdfs://localhost:9000/user/ubuntu/kaspacore/files/
✅ Replace GeoLite2 database
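A sketch following the same remove-and-put pattern as the kaspacore.jar commands above, assuming the new archive has been downloaded and extracted as in the setup section (<YYYYMMDD> matches your file):
hdfs dfs -rm hdfs://localhost:9000/user/ubuntu/kaspacore/files/GeoLite2-City.mmdb
hdfs dfs -put ~/GeoLite2-City_<YYYYMMDD>/GeoLite2-City.mmdb hdfs://localhost:9000/user/ubuntu/kaspacore/files/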
✅ Show Hadoop version
hadoop version
✅ Show Java version
java --version
✅ Show Docker version
sudo docker version
✅ Show OS version
cat /etc/os-release
- Startup/Shutdown Procedures
- User Management
- Sensor Management
- Backup
- Update Software Version
- Troubleshooting