R-queue - A Distributed Job Queue System with Redis

A scalable, fault-tolerant distributed job queue system using Redis to manage tasks across worker nodes with job tracking, retries, prioritization, and a dashboard for monitoring and health checks.

Objective

The objective is to design and implement a distributed job queue system using Redis that can:

  • Distribute computational tasks dynamically across multiple worker nodes.
  • Track job statuses (pending, processing, completed, failed) and handle failures through retries or alternative mechanisms.
  • Provide a user-friendly dashboard for real-time monitoring of job statuses and worker health.
  • Support horizontal scaling of worker nodes and job prioritization.
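To make these objectives concrete, the sketch below shows one way the job model could be expressed in TypeScript. The field names are illustrative assumptions, not the repository's actual types.

// Hypothetical job model for illustration only; the repository's actual
// types may differ.
type JobStatus = "pending" | "processing" | "completed" | "failed";

interface Job {
  id: string;                    // unique job ID (e.g. a UUID)
  type: string;                  // e.g. "email"
  data: Record<string, unknown>; // arbitrary payload passed with the job
  priority: number;              // numeric priority used for scheduling
  status: JobStatus;
  attempts: number;              // how many times the job has been tried
  dependencies?: string[];       // IDs of jobs that must complete first
  progress?: number;             // 0-100, for real-time progress tracking
}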

Core Challenges

  • Ensuring fault-tolerance and graceful recovery from worker or network failures.
  • Efficiently managing a distributed queue to handle job priorities and dependencies.
  • Implementing a robust retry mechanism for failed jobs and a dead-letter queue for irrecoverable tasks (see the sketch after this list).
  • Storing and retrieving job results in a scalable manner.
  • Handling dynamic workload variations and enabling worker auto-scaling based on queue length.
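For the retry and dead-letter-queue challenge above, here is a minimal sketch of the general idea using the ioredis client. The key names and the ioredis dependency are assumptions for illustration, not the repository's actual implementation.

// Minimal sketch of retry-with-exponential-backoff plus a dead-letter queue.
// Key names ("queue:delayed", "queue:dead") and ioredis are assumptions,
// not the repository's actual code.
import Redis from "ioredis";

const redis = new Redis({ host: process.env.REDIS_HOST ?? "localhost" });

const MAX_RETRIES = 3;

interface FailedJob {
  id: string;
  attempts: number;
  [key: string]: unknown;
}

async function handleFailure(job: FailedJob): Promise<void> {
  if (job.attempts < MAX_RETRIES) {
    // Exponential backoff: 1s, 2s, 4s, ... before the job is retried.
    const delayMs = 1000 * 2 ** job.attempts;
    const retryAt = Date.now() + delayMs;
    const updated = { ...job, attempts: job.attempts + 1 };
    await redis.zadd("queue:delayed", retryAt, JSON.stringify(updated));
  } else {
    // Irrecoverable: park the job in the dead-letter queue for inspection.
    await redis.rpush("queue:dead", JSON.stringify(job));
  }
}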

Additional Features (Bonus Challenges)

  • Implementing job dependencies where certain jobs can only start after others are completed (sketched below).
  • Tracking real-time job progress for better monitoring and debugging.
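A rough sketch of how the dependency check could work: a job becomes runnable only once every job it depends on has completed. The job:<id> hash layout is an assumption for illustration, not the repository's actual schema.

// Hypothetical dependency gate; the job:<id> hash layout is assumed,
// not taken from the repository.
import Redis from "ioredis";

const redis = new Redis({ host: process.env.REDIS_HOST ?? "localhost" });

// Returns true only when every dependency has reached "completed".
async function dependenciesMet(dependencyIds: string[]): Promise<boolean> {
  const statuses = await Promise.all(
    dependencyIds.map((id) => redis.hget(`job:${id}`, "status"))
  );
  return statuses.every((status) => status === "completed");
}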

Architecture Overview

image

1. Frontend:
A React.js application with an intuitive interface for monitoring and managing the system, providing:

  • Worker health and active worker status.
  • Queue length and benchmarking of jobs.
  • Total jobs (processing, completed, failed).
  • Detailed view of all jobs (pending, processing, canceled, failed, completed) with type, status, progress, and priority, including dynamic pagination and filtering.
  • Input modal for simulating jobs.

2. Backend:
A Node.js/TypeScript server that exposes the REST API, schedules jobs on the Redis cluster, dispatches them to worker nodes, and tracks job state, retries, and metrics.

3. Cloud Infrastructure:

  • Networking:

    • AWS VPC for managing network configurations.
    • AWS EC2 for hosting the application instances.
    • AWS Security Groups for managing access control to EC2 instances.
    • AWS NAT Gateway for enabling internet access from private subnets.
  • DevOps:

    • Pulumi as IaC to manage AWS resources and automate deployments.
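As a rough illustration of what the Pulumi program provisions, the sketch below creates a VPC, a subnet, a security group, and an EC2 instance with @pulumi/aws. Resource names, CIDR ranges, and the AMI are placeholders, not the repository's actual index.ts.

// Illustrative Pulumi (TypeScript) sketch; names, CIDRs, and the AMI are
// placeholders, not the repository's real configuration.
import * as aws from "@pulumi/aws";

const vpc = new aws.ec2.Vpc("r-queue-vpc", { cidrBlock: "10.0.0.0/16" });

const subnet = new aws.ec2.Subnet("r-queue-subnet", {
  vpcId: vpc.id,
  cidrBlock: "10.0.1.0/24",
  mapPublicIpOnLaunch: true,
});

const sg = new aws.ec2.SecurityGroup("r-queue-sg", {
  vpcId: vpc.id,
  ingress: [{ protocol: "tcp", fromPort: 22, toPort: 22, cidrBlocks: ["0.0.0.0/0"] }],
  egress: [{ protocol: "-1", fromPort: 0, toPort: 0, cidrBlocks: ["0.0.0.0/0"] }],
});

const node = new aws.ec2.Instance("r-queue-node", {
  ami: "ami-placeholder",        // replace with a real AMI ID
  instanceType: "t2.micro",
  subnetId: subnet.id,
  vpcSecurityGroupIds: [sg.id],
  keyName: "MyKeyPair",          // the key pair created in the deployment steps
});

export const publicIp = node.publicIp;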

Features

  • Priority-based job scheduling (see the sketch after this list)
  • Automatic worker scaling (1-10 workers)
  • Job retry with exponential backoff
  • Dead letter queue for failed jobs
  • Real-time job progress tracking
  • Worker health monitoring
  • Comprehensive metrics collection
  • Circuit breaker pattern implementation
  • Job dependency management
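For the priority-based scheduling listed above, a common pattern is a Redis sorted set whose score is the job's priority, so workers always pop the most urgent job first. The sketch below uses assumed key names and ioredis; it is not the repository's actual scheduler.

// Sketch of priority scheduling with a Redis sorted set; the key name
// "queue:ready" is an assumption for illustration.
import Redis from "ioredis";

const redis = new Redis({ host: process.env.REDIS_HOST ?? "localhost" });

// Enqueue: the job's priority becomes its sorted-set score.
async function enqueue(jobId: string, priority: number): Promise<void> {
  await redis.zadd("queue:ready", priority, jobId);
}

// Dequeue: pop the highest-priority job, or return null if the queue is empty.
async function dequeue(): Promise<string | null> {
  const [jobId] = await redis.zpopmax("queue:ready");
  return jobId ?? null;
}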

Getting Started

Follow these steps to run the application locally.

To run the frontend:

1. Clone the Repository

  git clone https://github.com/BayajidAlam/r-queue
  cd r-queue

2. Install Dependencies

  cd client
  yarn install

3. Set Up Environment Variables

Create a .env file in the /client directory and add this:

VITE_PUBLIC_API_URL=<your backend URL>

4. Run the client dev server

  yarn dev

To run the backend, follow these steps:

1. Install Dependencies

  cd server
  yarn install

2. Create a .env file in the /server directory and add this:

REDIS_HOST=localhost 
PORT=5000

3. Navigate to the docker-compose folder (inside /server) and start all the containers:

  cd docker-compose
  docker-compose up -d

You should see all the Redis containers up and running.

4. Run the following command to create the cluster:

redis-cli --cluster create \
  <node-1 IP>:6379 <node-2 IP>:6379 <node-3 IP>:6379 \
  <node-4 IP>:6379 <node-5 IP>:6379 <node-6 IP>:6379 \
  --cluster-replicas 1

You should see confirmation that the cluster was created and all slots were covered.

5. Verify the Cluster

redis-cli -c cluster nodes

6. Now run the server and test your application:

yarn dev

The server should start and connect to the Redis cluster.

Folder Structure

  • /client : Frontend

    • /public: Static files and assets.
    • /src: Application code.
    • .env: Frontend environment variables
    • package.json
  • /server: Backend

    • /src: Backend source code.
      • bulkJobSimulation.ts: Script for creating jobs in bulk
    • docker-compose: For creating the Redis cluster locally in a Docker environment
    • .env: Backend environment variables
    • package.json
  • /IaC: Infrastructure

    • /pulumi:
      • index.ts: Pulumi IaC code for managing AWS resources, including the networking and compute needed to create the distributed Redis cluster.
    • /ansible: Ansible playbooks for creating and configuring the frontend, backend, Redis setup, and Redis cluster.

API Endpoints

The application exposes the following APIs.

Root URL (local environment)

  http://localhost:5000/api

Check health (GET)

API Endpoint:

    http://localhost:5000/api/health

The response will look like this:

{
    "status": "unhealthy",
    "details": {
        "redisConnected": true,
        "activeWorkers": 0,
        "queueLength": 0,
        "processingJobs": 0,
        "metrics": {
            "avgProcessingTime": 0,
            "errorRate": 0,
            "throughput": 0
        }
    },
    "timestamp": "2025-01-10T12:20:37.856Z",
    "version": "1.0"
}
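For example, the endpoint can be polled from a small Node 18+ script (which provides a global fetch); the fields accessed below follow the sample response above.

// Poll the health endpoint and print a short summary.
async function checkHealth(): Promise<void> {
  const res = await fetch("http://localhost:5000/api/health");
  const health = await res.json();
  console.log(health.status, "active workers:", health.details.activeWorkers);
}

checkHealth();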

Add new job (POST)

API Endpoint:

  http://localhost:5000/api/jobs

Examples

To add a new job, the request body should look like the following.

Request body

{
    "type": "email",
    "data": {
        "Hello": "Hello",
        "world": "world"
    },
    "priority": 3,

    "dependencies": [
        "a3342ec2-fcae-4e8d-8df8-8f59a2c7d58c"
    ]
}

The response will look like this:

{
    "acknowledged": true,
    "insertedId": "675002aea8b348ab91f524d0"
}
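The same request can be sent from a Node 18+ script with fetch; the payload mirrors the example above, and the dependency ID is only illustrative.

// Submit a new job; the body matches the example request above.
async function addJob(): Promise<void> {
  const res = await fetch("http://localhost:5000/api/jobs", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      type: "email",
      data: { Hello: "Hello", world: "world" },
      priority: 3,
      dependencies: ["a3342ec2-fcae-4e8d-8df8-8f59a2c7d58c"],
    }),
  });
  console.log(await res.json());
}

addJob();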

Prerequisites

Before deploying the application, ensure you have the following:

  • An AWS account with EC2 setup permissions.
  • Docker installed on your local machine for building containers.
  • AWS CLI installed and configured with your credentials.
  • Node.js (version 18 or above) with npm and yarn installed for both the frontend and backend applications.
  • Pulumi installed for managing AWS infrastructure as code.
  • TypeScript installed on your machine.

Deployments

1. Clone the Repository

  git clone https://github.com/BayajidAlam/r-queue
  cd r-queue/IaC/pulumi

2. Configure AWS CLI

Run aws configure and provide your Access Key and Secret Key when prompted.

3. Create Key Pair

Create a new key pair for our instances using the following command:

aws ec2 create-key-pair --key-name MyKeyPair --query 'KeyMaterial' --output text > MyKeyPair.pem

4. Deploy the infrastructure

pulumi up

Pulumi will preview the changes and then provision the resources. Once it finishes, you can verify the resource map in the AWS VPC console and the new instances in the EC2 dashboard.

5. Run the Ansible Playbooks. Navigate to the ansible directory and run the following commands:

ansible-playbook -e @vars.yml playbooks/redis-setup.yml
ansible-playbook -e @vars.yml playbooks/redis-cluster.yml
ansible-playbook -e @vars.yml playbooks/backend-setup.yml
ansible-playbook -e @vars.yml playbooks/frontend-setup.yml

Now access the frontend at <frontend public IP>:5173 and you should see the dashboard.

Test The Application

Create a job: Click Add New Job and fill in the required inputs:

  • Job Type: the type of job you want to simulate.
  • Processing Time: how long the job will take to complete.
  • Priority: the priority of the job.
  • Job Data (JSON): the data passed along with the job.
  • Dependencies (comma-separated job IDs): if the job depends on other jobs, add their IDs here from the Recent Activity dashboard.
  • Simulate Failure: check this if you want the job to fail.

Now click the Add New Job button.

Summary:

  • One active worker
  • Processing = 1
  • One item showing in Recent Jobs
  • After the job is processed, Completed = 1. Using this job's ID you can create a new job with dependencies, and by checking Simulate Failure you can create a job that will fail at the end.

Test with bulk input: First SSH into your backend EC2 instance, navigate to /opt/r-queue/server/src, and run the command:

  # Usage: simulate <totalJobs> <duration> <batchSize>
  simulate 20 2 10

The terminal will show the jobs being submitted, and the dashboard will update as the workers process them until every job is done.
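For reference, a bulk run like this could be approximated with a small script that posts jobs in batches through the public API. This is only a sketch of the idea, not the repository's bulkJobSimulation.ts, and the parameter meanings are assumed from the usage line above.

// Hypothetical bulk-job driver: totalJobs requests, sent in batches of
// batchSize, spread over roughly durationSec seconds. Not the repo's script.
async function simulate(totalJobs: number, durationSec: number, batchSize: number): Promise<void> {
  const batches = Math.ceil(totalJobs / batchSize);
  const pauseMs = (durationSec * 1000) / batches;

  for (let b = 0; b < batches; b++) {
    const jobsInBatch = Math.min(batchSize, totalJobs - b * batchSize);
    await Promise.all(
      Array.from({ length: jobsInBatch }, (_, i) =>
        fetch("http://localhost:5000/api/jobs", {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({ type: "email", data: { batch: b, index: i }, priority: 1 }),
        })
      )
    );
    await new Promise((resolve) => setTimeout(resolve, pauseMs));
  }
}

simulate(20, 2, 10);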
