Back to HOME

hadoop

Example Environment

Item                                    Value
Hadoop IP address (network interface)   172.16.2.50
Hadoop IP address (docker0 interface)   172.17.0.1
Hadoop user                             ubuntu

Prerequisite

✅ Ubuntu 20.04 LTS installed and updated with the following command.

sudo apt update && sudo apt -y upgrade

✅ Time Zone and NTP already set.
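
💡 If the time zone or NTP is not set yet, a minimal sketch using timedatectl (Asia/Tokyo is just an example; replace it with your own zone):

sudo timedatectl set-timezone Asia/Tokyo
sudo timedatectl set-ntp true
timedatectl status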

✅ Docker 20.10 or later installed with the following command.

sudo apt -y install docker.io

Install Java

▶️ Install OpenJDK 11.

sudo apt -y install openjdk-11-jdk

Set Up Passwordless SSH

▶️ Generate a key pair.

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

▶️ Append the public key to the authorized keys list and restrict its permissions.

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys

▶️ Test the passwordless SSH connection.

ssh localhost
Result

The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:4KS9....CeIY.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 20.04.5 LTS (GNU/Linux 5.15.0-1026-aws x86_64)

❗ Don't forget to log out after successful login.

exit
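
💡 Optionally, you can confirm non-interactively that key-based login works (BatchMode disables password prompts, so the command fails if key authentication is broken):

ssh -o BatchMode=yes localhost true && echo OK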

Setup Hadoop

1. Install Hadoop

▶️ Download Hadoop 3.3.3.

wget -P ~/ https://archive.apache.org/dist/hadoop/common/hadoop-3.3.3/hadoop-3.3.3.tar.gz

▶️ Extract the archive and move it to /usr/local/hadoop.

tar -xzf ~/hadoop-3.3.3.tar.gz -C ~/
sudo mv ~/hadoop-3.3.3 /usr/local/hadoop

2. Configure Hadoop

▶️ Edit .bashrc to set user environment variables.

nano ~/.bashrc
Configuration

### Append to the end of the file.
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

▶️ Load the new environment.

source ~/.bashrc
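
💡 A quick sanity check that the new variables are in effect:

echo $HADOOP_HOME
which hdfs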

▶️ Edit hadoop-env.sh to set system environment variables.

nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Configuration

### Line 55: Change JAVA_HOME.
# export JAVA_HOME=
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
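
💡 The path above is the default for the Ubuntu openjdk-11-jdk package. If you are unsure of the JDK path on your system, you can resolve it (JAVA_HOME is the result minus the trailing /bin/java):

readlink -f $(which java)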

▶️ Edit core-site.xml to specify the fs.defaultFS URI.

nano /usr/local/hadoop/etc/hadoop/core-site.xml
Configuration

🔑 It is generally recommended to use the docker0 interface address (172.17.0.1).

🔑 Alternatively, you can use the actual IP address of your Hadoop server.

🚫 The loopback address 127.0.0.1 does not work.

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://172.17.0.1:9000</value>
  </property>
</configuration>
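
💡 If you are unsure of the docker0 address on your host (172.17.0.1 is Docker's default), you can look it up first:

ip addr show docker0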

▶️ Edit hdfs-site.xml to match your environment.

nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml
Configuration

🔑 Change the ubuntu part of /home/ubuntu in dfs.namenode.name.dir and dfs.datanode.data.dir to match your user account if necessary (e.g. /home/hadoop).

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/ubuntu/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/ubuntu/hadoop/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-bind-host</name>
    <value>0.0.0.0</value>
  </property>
</configuration>

▶️ Format the NameNode.

hdfs namenode -format
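
💡 If the format succeeds, the NameNode metadata directory configured in hdfs-site.xml is created. A quick check, assuming the /home/ubuntu paths above:

ls ~/hadoop/dfs/name/current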

3. Start Hadoop

▶️ Start the Hadoop services.

start-dfs.sh
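
💡 A quick check that the daemons came up and the DataNode registered with the NameNode:

jps
hdfs dfsadmin -report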

▶️ Create Hadoop-HDFS directories.

🔑 Change the ubuntu part of /user/ubuntu to match your user account if necessary (e.g. /user/hadoop).

hdfs dfs -mkdir -p hdfs://localhost:9000/user/ubuntu/kafka-checkpoint
hdfs dfs -mkdir -p hdfs://localhost:9000/user/ubuntu/kaspacore/files
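
💡 To verify that the directories were created, assuming the /user/ubuntu paths above:

hdfs dfs -ls -R hdfs://localhost:9000/user/ubuntu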

Setup Kaspacore & GeoLite Components

1. Download Kaspacore

▶️ Download the Kaspacore module.

wget -P ~/ https://github.com/mata-elang-stable/kaspacore-java/releases/download/20230213/kaspacore.jar

2. Download GeoLite2

▶️ Download the GeoLite2 database on your PC and upload it to your server.

How to get the original GeoLite2 database:

▶️ If you don't have an account, open the URL below and sign up to create one.

GeoLite2 Sign Up

▶️ Open the following URL, log in, and download the database.

Download GeoIP Databases

▶️ Extract it to your home directory.

🔑 Change <YYYYMMDD> to match the downloaded file.

tar -xzf ~/GeoLite2-City_<YYYYMMDD>.tar.gz -C ~/

3. Upload Components

▶️ Put the kaspacore file and GeoLite2 database in the Hadoop-HDFS directory.

🔑 Change the ubuntu part of /user/ubuntu to match your user account if necessary.

🔑 Change <YYYYMMDD> to match the downloaded file.

hdfs dfs -put ~/kaspacore.jar hdfs://localhost:9000/user/ubuntu/kaspacore/files
hdfs dfs -put ~/GeoLite2-City_<YYYYMMDD>/GeoLite2-City.mmdb hdfs://localhost:9000/user/ubuntu/kaspacore/files
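
💡 To confirm that both files landed in HDFS, assuming the /user/ubuntu path above:

hdfs dfs -ls hdfs://localhost:9000/user/ubuntu/kaspacore/files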

Admin Web UI

▶️ Open the following URL to see the Hadoop Admin UI.

  • URL: http://<HADOOP_SERVER_IP_OR_NAME (e.g. 172.16.2.50)>:9870/

💡 You can access the DataNode Web UI and the SecondaryNameNode Web UI at the following URLs. Enter the address part manually.

  • URL: http://<HADOOP_SERVER_IP_OR_NAME (e.g. 172.16.2.50)>:9864/ for DataNode Web UI.
  • URL: http://<HADOOP_SERVER_IP_OR_NAME (e.g. 172.16.2.50)>:9868/ for SecondaryNameNode Web UI.
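
💡 If a UI is unreachable, first check from the Hadoop server that the daemons are listening on those ports (a quick sketch):

ss -tlnp | grep -E '9870|9864|9868'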

Useful Commands

Service Commands

✅ Show service status

jps
Result

1130359 SecondaryNameNode
1129894 NameNode
1130534 Jps
1130075 DataNode

✅ Start service

start-dfs.sh

✅ Stop service

stop-dfs.sh

Maintenance Commands

🔑 Change the ubuntu part of /user/ubuntu to match your user account if necessary (e.g. /user/hadoop).

✅ Delete the kaspacore file from Hadoop.

hdfs dfs -rm hdfs://localhost:9000/user/ubuntu/kaspacore/files/kaspacore.jar

✅ Put the kaspacore file on Hadoop.

hdfs dfs -put ~/kaspacore.jar hdfs://localhost:9000/user/ubuntu/kaspacore/files/
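
💡 Alternatively, the delete-and-put sequence can be done in one step with put's -f (overwrite) flag:

hdfs dfs -put -f ~/kaspacore.jar hdfs://localhost:9000/user/ubuntu/kaspacore/files/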

✅ Replace GeoLite2 database

Refer to https://github.com/mata-elang-stable/MataElang-Platform/wiki/Update-Software-Version#geolite2

Version Commands

✅ Show Hadoop version

hadoop version

✅ Show Java version

java --version

✅ Show Docker version

sudo docker version

✅ Show OS version

cat /etc/os-release

Next Step >>
