This project contains a Python script to rotate CAST AI managed nodes in a Kubernetes cluster. The script is responsible for cordoning, draining, and deleting nodes. It includes safeguards to ensure services are not disrupted, particularly when all replicas of a controller are on a single node.
(Current)
- Cordon nodes to prevent new pods from being scheduled.
- Check if all replicas of any controller reside on a single node (a minimal sketch of this cordon-and-check logic follows this list).
- Evict one replica to another node if necessary, ensuring it becomes healthy and serves traffic.
- Drain nodes by safely evicting all remaining pods.
- Configurable drain timeout.
- Configurable node age threshold (in days) for selecting old nodes.
- Log movements as Kubernetes events.
- Package the script using Docker and deploy as a Kubernetes CronJob.
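The cordon step and the all-replicas-on-one-node check above might look roughly like the following. This is only a minimal sketch using the official kubernetes Python client; the function names, the example node name, and the group-by-owner approach are illustrative assumptions, not the script's exact code.
from kubernetes import client, config


def cordon_node(v1: client.CoreV1Api, node_name: str) -> None:
    # Mark the node unschedulable so no new pods are placed on it.
    v1.patch_node(node_name, {"spec": {"unschedulable": True}})


def controllers_fully_on_node(v1: client.CoreV1Api, node_name: str) -> dict:
    # Return {(namespace, owner_name): [pod names]} for controllers whose
    # replicas all sit on node_name (these would lose every replica on drain).
    all_pods = v1.list_pod_for_all_namespaces().items

    # Group pods by their owning controller (ReplicaSet, StatefulSet, ...).
    by_owner = {}
    for pod in all_pods:
        for owner in pod.metadata.owner_references or []:
            by_owner.setdefault((pod.metadata.namespace, owner.name), []).append(pod)

    at_risk = {}
    for (namespace, owner_name), pods in by_owner.items():
        if all(p.spec.node_name == node_name for p in pods):
            at_risk[(namespace, owner_name)] = [p.metadata.name for p in pods]
    return at_risk


if __name__ == "__main__":
    config.load_kube_config()  # use load_incluster_config() when running in-cluster
    api = client.CoreV1Api()
    node = "example-cast-ai-node"  # illustrative node name
    cordon_node(api, node)
    print(controllers_fully_on_node(api, node))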
(Next)
- Add unit test coverage.
- Add end-to-end tests to ensure the script works as expected.
- Python 3.11 or higher
- Docker
- Kubernetes cluster
- kubectl configured to interact with your Kubernetes cluster
- Clone the Repository:
git clone https://github.com/your-organization/your-repo.git
cd your-repo
- Set Up a Virtual Environment:
python3 -m venv venv
source venv/bin/activate
- Install Dependencies:
pip install -r requirements.txt
To deploy with Helm, add the chart repository and install the chart:
helm repo add castai-repo https://raw.githubusercontent.com/ronakforcast/castai_node_rotator/main/helmChart/castai-node-drainer
helm install castai-node-drainer-v2 castai-repo/castai-node-drainer
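Chart values can be overridden at install time with --set; the keys shown here (cronjob.schedule and container.image) are the same ones used in the Terraform example below and are assumed to be exposed by the chart's values.yaml:
helm install castai-node-drainer-v2 castai-repo/castai-node-drainer \
  --set cronjob.schedule="0 0 * * 0" \
  --set container.image=castai/castai-node-rotate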
To deploy the castai-node-drainer Helm chart using Terraform, you can use the helm_release resource provided by the Terraform Helm provider. Here are the steps:
- Configure the Terraform Helm Provider:
First, declare the Terraform Helm provider by adding the following to your Terraform configuration file; terraform init will download and install it:
terraform {
required_providers {
helm = {
source = "hashicorp/helm"
version = "2.9.0" # or the latest version
}
}
}
provider "helm" {
kubernetes {
config_path = "~/.kube/config" # path to your Kubernetes cluster config file
}
}
- Define the Helm Release Resource:
Next, create a new Terraform configuration file (e.g., main.tf) and define the helm_release resource for the castai-node-drainer Helm chart:
resource "helm_release" "castai-node-drainer" {
name = "castai-node-drainer"
repository = "https://raw.githubusercontent.com/ronakforcast/castai_node_rotator/main/helmChart/castai-node-drainer"
chart = "castai-node-drainer"
namespace = "castai-agent"
# Optionally, you can set values for the Helm chart
set {
name = "cronjob.schedule"
value = "0 0 * * 0"
}
set {
name = "container.image"
value = "castai/castai-node-rotate"
}
# Add any other values as needed
}
In this example:
- name: Specifies the release name for the Helm chart deployment.
- repository: Specifies the URL of the Helm repository where the chart is located.
- chart: Specifies the name of the Helm chart to be deployed.
- namespace: Specifies the namespace where the chart should be deployed.
- set: Sets values for the Helm chart. You can add multiple set blocks to configure different values.
- Initialize Terraform:
Before applying the Terraform configuration, you need to initialize the Terraform working directory:
terraform init
- Plan and Apply the Terraform Configuration:
Review the execution plan to see the changes Terraform will make:
terraform plan
If the plan looks good, apply the configuration to deploy the Helm chart:
terraform apply
Terraform will deploy the castai-node-drainer Helm chart to your Kubernetes cluster using the specified values.
- Verify the Deployment:
You can verify the deployment by checking the Kubernetes resources created by the Helm chart:
kubectl get all -n castai-agent
You should see the CronJob created by the Helm chart; the ServiceAccount and the cluster-scoped ClusterRole and ClusterRoleBinding can be checked with kubectl get serviceaccount -n castai-agent and kubectl get clusterrole,clusterrolebinding.
- Update or Destroy the Deployment:
If you need to update the Helm chart deployment, make the necessary changes to the Terraform configuration file and run terraform apply again.
To destroy the Helm chart deployment, run:
terraform destroy
This will remove all the resources created by the Helm chart from your Kubernetes cluster.
+-------------------------+
| Start |
+-------------------------+
|
| load_config()
| # Loads Kubernetes configuration
|
+-------------------------+
| get_cast_ai_nodes() |
| # Retrieves CAST AI managed nodes
+-------------------------+
|
| Separate Critical and Non-Critical Nodes
| # Separates nodes into critical and non-critical lists
|
+-------------------------+
| remove_cron_job_node() |
| # Removes cron job node from critical/non-critical lists
+-------------------------+
|
+-------------------------+
| Process Non-Critical Nodes |
+-------------------------+
|
+-------------------------+
| cordon_node() |
| # Cordons the node to prevent new pod scheduling
+-------------------------+
|
+-------------------------+
| check_controller_replicas()|
| # Checks for controllers with all replicas on the node
+-------------------------+
|
+-------------------------+
| evict_pod() |
| # Evicts a pod if a controller with all replicas is found
+-------------------------+
|
+-------------------------+
| wait_for_none_pending() |
| # Waits for new replicas to become ready (if any pods evicted)
+-------------------------+
|
+-------------------------+
| drain_node() |
| # Drains the node to evict remaining pods
+-------------------------+
|
| Wait for Pending Pods (if any)
| # Waits for pending pods (if any)
|
+-------------------------+
| wait_for_new_nodes() |
| # Waits for new nodes to become ready
+-------------------------+
|
+-------------------------+
| Process Critical Nodes |
+-------------------------+
|
+-------------------------+
| cordon_node() |
| # Cordons the node to prevent new pod scheduling
+-------------------------+
|
+-------------------------+
| check_controller_replicas()|
| # Checks for controllers with all replicas on the node
+-------------------------+
|
+-------------------------+
| evict_pod() |
| # Evicts a pod if a controller with all replicas is found
+-------------------------+
|
+-------------------------+
| wait_for_none_pending() |
| # Waits for new replicas to become ready (if any pods evicted)
+-------------------------+
|
+-------------------------+
| drain_node() |
| # Drains the node to evict remaining pods
+-------------------------+
|
+-------------------------+
| End |
+-------------------------+
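As a companion to the diagram, here is a minimal, self-contained sketch of what the drain_node() step might do with the kubernetes Python client Eviction API, including the configurable drain timeout mentioned in the feature list. All names and defaults are illustrative assumptions, not the script's exact code.
import time

from kubernetes import client, config


def drain_node(v1: client.CoreV1Api, node_name: str, timeout_seconds: int = 300) -> None:
    # Evict every remaining pod on the node, skipping DaemonSet-managed pods,
    # then wait (up to the timeout) for the node to be empty.
    pods = v1.list_pod_for_all_namespaces(
        field_selector=f"spec.nodeName={node_name}"
    ).items

    for pod in pods:
        owners = pod.metadata.owner_references or []
        if any(o.kind == "DaemonSet" for o in owners):
            continue  # DaemonSet pods would be recreated on the node anyway
        # V1Eviction is the policy/v1 Eviction model in recent kubernetes client versions.
        eviction = client.V1Eviction(
            metadata=client.V1ObjectMeta(
                name=pod.metadata.name, namespace=pod.metadata.namespace
            )
        )
        v1.create_namespaced_pod_eviction(
            name=pod.metadata.name, namespace=pod.metadata.namespace, body=eviction
        )

    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        remaining = [
            p
            for p in v1.list_pod_for_all_namespaces(
                field_selector=f"spec.nodeName={node_name}"
            ).items
            if not any(o.kind == "DaemonSet" for o in (p.metadata.owner_references or []))
        ]
        if not remaining:
            return
        time.sleep(10)
    raise TimeoutError(f"Node {node_name} still has pods after {timeout_seconds}s")


if __name__ == "__main__":
    config.load_kube_config()  # use load_incluster_config() when running as the CronJob
    drain_node(client.CoreV1Api(), "example-cast-ai-node")  # illustrative node name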