For advanced users: a guide for developers who want to add their own service to PAI.
This tutorial walks you through adding a service to PAI, using Apache HBase as the example. Follow it step by step to learn how to add your own service.
By the end of this tutorial, you will be able to set up HBase on PAI.
This chapter teaches you how to write the configuration model of your service and how to get the configuration of other services.
This module is coming soon. Once it is available, you will be able to add your own configuration and pass it to the service.
This chapter teaches you how to add your customized image to PAI. Once everything is in place, the paictl image command will build your image and push it to the target Docker registry.
If your service image can be pulled from a public Docker registry, you can skip this step.
This chapter does not explain how to write a Dockerfile. If you are new to Docker, please learn how to write a Dockerfile from a Docker tutorial first.
For this tutorial, the Docker image has already been prepared. Note that the file image.yaml is not part of the Docker image; it is PAI's configuration file.
Every time you want to add a customized Docker image to PAI, you must first prepare an image configuration. This configuration must be named image.yaml and placed in the image's directory.
Here is an example of the configuration.
HBase Docker image's configuration
The HBase image's configuration is used as the example below.
### src: the file to copy, given as a path relative to pai-management/.
### dst: where the copy will be placed, also relative to pai-management/.
### paictl runs "cp -r src dst" from the path pai-management/.
#copy-list:
#  - src: ../xxxxxx
#    dst: src/xxxxxx/copied_file
The configuration consists only of the copy-list part. If you do not need it, omit this field and provide an empty image.yaml.
copy-list part:
- The project keeps only one copy of each piece of source code or tool instead of duplicating it in every image's directory, so this part tells paictl which files to copy and where to put them.
- Command: cp -r pai/pai-management/$src pai/pai-management/$dst, where src and dst are the values given in this part.
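The mapping from copy-list entries to copy commands can be sketched in a few lines of Python. This is only an illustration of the rule described above, not paictl's actual code: the names BASE_DIR and build_copy_commands are made up for this sketch, and the entry in the example uses placeholder file names.

```python
import posixpath

# Assumption: src and dst are relative to pai/pai-management/, as in the cp
# command quoted above. BASE_DIR and build_copy_commands are illustrative
# names, not part of paictl.
BASE_DIR = "pai/pai-management"


def build_copy_commands(copy_list, base_dir=BASE_DIR):
    """Return one 'cp -r' argument list per copy-list entry."""
    commands = []
    # An empty or missing copy-list simply yields no commands.
    for entry in copy_list or []:
        src = posixpath.join(base_dir, entry["src"])
        dst = posixpath.join(base_dir, entry["dst"])
        commands.append(["cp", "-r", src, dst])
    return commands


# Hypothetical entry; the file names are placeholders.
example = [{"src": "../some-tool", "dst": "src/hbase/copied_file"}]
for cmd in build_copy_commands(example):
    print(" ".join(cmd))
```

Returning argument lists rather than shell strings keeps the sketch safe for paths that contain spaces if the commands are later passed to subprocess.run.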
Note that the name of the image's directory must be the same as the image name.
For example, to add a Docker image "XYZ" to PAI, first create the directory pai/pai-management/src/XYZ. This is the image's directory, named after the image.
If you want paictl to build the hbase image, move the directory example/src/hbase to pai/pai-management/src.
./paictl.py image build -p /path/to/your/cluster-configuration/dir -n hbase
./paictl.py image push -p /path/to/your/cluster-configuration/dir -n hbase
After the hbase image is built, you need to bootstrap it in PAI. The service management system is currently Kubernetes.
This is the configuration of your service bootstrap module. paictl calls a different script for each operation. The file must be placed in your service bootstrap module and must be named service.yaml.
HBase bootstrap module's configuration
Here is the service configuration of HBase.
# Tell paictl which services must be ready before hbase starts.
prerequisite:
  - cluster-configuration
  - hadoop-service
# paictl will use jinja2 to generate each file listed here from the template named "filename".template.
template-list:
  - node-label.sh
  - master-hbase.yaml
# The script that starts the service
start-script: start.sh
# The script that stops the service
stop-script: stop.sh
# The script that stops the service and deletes its data on the cluster
delete-script: delete.sh
# The script that refreshes the status of the service.
# Usually it updates the configmap and re-labels the node.
refresh-script: refresh.sh
# The script for a rolling upgrade.
# No example yet.
upgraded-script: upgraded.sh
This configuration consists of 7 parts.
- prerequisite part:
  - Consider this scenario: there are three services named A, B and C, and service C depends on services A and B. To start C, you must start A and B first. This field tells paictl which services must be ready before a service is started.
  - cluster-configuration is a special service in PAI. It holds important cluster configuration and the registry's secret, so it should be the first service started in PAI.
- template-list part:
  - Lists the files that paictl generates with jinja2 from the templates named "filename".template.
  - After cluster-object-model is developed, a more detailed guide will be provided.
- start-script part:
  - A shell script that starts your service.
- stop-script part:
  - A shell script that stops your service.
- delete-script part:
  - A shell script that stops your service and deletes all the data persisted on the cluster.
- refresh-script part:
  - A script that refreshes the status of the service. Usually it updates the configmap and re-labels the node.
- upgraded-script part:
  - Not supported yet.
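The effect of the prerequisite field can be illustrated with a small dependency walk. The sketch below is not how paictl is implemented. It only shows, under stated assumptions, how a start order could be derived from prerequisite lists. The hbase entry mirrors the service.yaml above; the other two entries and the function name start_order are assumptions made for this example.

```python
# Hypothetical table: service name -> its prerequisite list.
# Only the hbase entry is taken from the service.yaml shown earlier;
# the other entries are assumed for illustration.
PREREQUISITES = {
    "cluster-configuration": [],
    "hadoop-service": ["cluster-configuration"],
    "hbase": ["cluster-configuration", "hadoop-service"],
}


def start_order(target, prerequisites):
    """Return the services to start, prerequisites first, ending with target."""
    order = []
    visiting = set()

    def visit(name):
        if name in order:
            return  # already scheduled
        if name in visiting:
            raise ValueError("circular prerequisite involving " + name)
        visiting.add(name)
        for dep in prerequisites.get(name, []):
            visit(dep)
        visiting.discard(name)
        order.append(name)

    visit(target)
    return order


print(start_order("hbase", PREREQUISITES))
# prints ['cluster-configuration', 'hadoop-service', 'hbase']
```

A depth-first walk is enough here because the only requirement is that every prerequisite appears before the service that names it.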
Node label
- Benefits
  - With node labels and node selectors, a service pod can be assigned to a specific node. For example, hadoop-name-node should be assigned to the node with the label master, and hadoop-data-node should be assigned to the nodes with the label worker.
  - With node labels, we can manage a service on a specific node without affecting the same service on other nodes.
DaemonSet
- Benefits
  - A DaemonSet ensures that there is one and only one service pod on each target node. Hadoop and similar services benefit a lot from this object.
  - By combining node labels with a DaemonSet, we can deploy Hadoop easily.
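To make the combination of node labels and a DaemonSet concrete, here is a minimal sketch that builds a DaemonSet manifest as a Python dictionary and prints it as JSON, which Kubernetes accepts alongside YAML. It is not the manifest shipped with PAI: the function name daemonset_manifest, the app label and the image reference are assumptions, and only the hbase-master label key comes from this tutorial.

```python
import json


def daemonset_manifest(name, image, node_label):
    """Build a minimal apps/v1 DaemonSet limited to nodes carrying node_label."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name},
        "spec": {
            # The selector must match the labels of the pod template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Only nodes labelled node_label="true" receive a pod.
                    "nodeSelector": {node_label: "true"},
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }


# The image reference is a placeholder, not a real registry path.
manifest = daemonset_manifest(
    "hbase-master", "registry.example/hbase:latest", "hbase-master"
)
print(json.dumps(manifest, indent=2))
```

The nodeSelector restricts scheduling to the labelled nodes, and the DaemonSet controller then keeps exactly one pod on each of them.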
Job
- Benefits
  - Batch jobs that do not need to run on specific nodes can be created with this object. When the job succeeds, the pod's status becomes Completed, which can serve as a signal that the job has finished.
ConfigMap
- Benefits
  - In PAI's bootstrap module, configuration files are mounted through a configmap. This separates cluster-specific configuration, such as the Hadoop configuration, from the Docker image.
  - Because scripts can also be mounted from a configmap, one image can be used in many different ways. For example, the hadoop-run image can start different services with different scripts obtained from the configmap.
Readiness probe
- Benefits
  - With a readiness probe, we can block the deployment process until a service is ready.
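The idea of blocking deployment until a service is ready can be sketched as a simple polling loop. This is an illustration only and does not reproduce PAI's start scripts. The function wait_until_ready and its parameters are assumptions, and the is_ready callback stands in for whatever check reports the pod's readiness, for example a query to the Kubernetes API.

```python
import time


def wait_until_ready(is_ready, timeout=60.0, interval=1.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll is_ready() until it returns True or timeout seconds have passed."""
    deadline = clock() + timeout
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Stand-in check that reports "ready" on the third poll.
attempts = iter([False, False, True])
print(wait_until_ready(lambda: next(attempts), timeout=5.0, interval=0.01))
# prints True
```

The sleep and clock parameters are injectable so the loop can be exercised in tests without real delays.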
Note that the name of the service's directory must be the same as the service name.
For example, to add a service module "XYZ" to PAI, first create the directory pai/pai-management/bootstrap/XYZ. This is the service's directory, named after the service.
If you want paictl to start the hbase service, move the directory example/bootstrap/hbase to pai/pai-management/bootstrap.
In this example, HBase is deployed with a single master node.
Before starting hbase, label each node with the corresponding label.
#For master, add this property
hbase-master: "true"
#For regionserver, add this property
hbase-regionserver: "true"
Start the service.
./paictl.py service start -p /path/to/configuration/dir -n hbase
Stop the service.
./paictl.py service stop -p /path/to/configuration/dir -n hbase