Feature/chunked database dumps #76

Open · wants to merge 2 commits into main

Conversation

onyedikachi-david

Pull Request: Implement Chunked Database Dumps with Enhanced Features

/claim #59
Fixes: #59

Purpose

Implement robust chunked database dump functionality that handles large databases efficiently while preventing memory exhaustion and database locking. This implementation includes several key features:

1. Chunked Processing

  • Processes database dumps in manageable chunks
  • Dynamic chunk sizing based on table size (500-5000 rows)
  • Breathing intervals to prevent database locking
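
The implementation itself is not reproduced in this description, so the following is only a rough sketch of the chunked-processing idea: the 500-5000 row bounds come from the bullets above, while the helper names (`fetchRows`, `emit`), the one-percent sizing heuristic, and the 100 ms breathing interval are illustrative guesses, not the PR's actual values.

```typescript
// Illustrative sketch only -- not the PR's actual code.
// Chunk size scales with table size, clamped to the 500-5000 row range
// described above; a short pause between chunks (a "breathing interval")
// keeps the database from being locked for long stretches.

const MIN_CHUNK = 500;
const MAX_CHUNK = 5000;
const BREATHING_INTERVAL_MS = 100; // hypothetical pause length

function chunkSizeFor(totalRows: number): number {
  // Roughly one percent of the table, clamped to the documented bounds.
  const proposed = Math.ceil(totalRows / 100);
  return Math.min(MAX_CHUNK, Math.max(MIN_CHUNK, proposed));
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// fetchRows stands in for however the worker reads one page of rows;
// emit appends the generated INSERT statements to the dump in progress.
async function dumpTable(
  table: string,
  totalRows: number,
  fetchRows: (table: string, offset: number, limit: number) => Promise<string[]>,
  emit: (sqlLines: string[]) => Promise<void>,
): Promise<void> {
  const chunkSize = chunkSizeFor(totalRows);
  for (let offset = 0; offset < totalRows; offset += chunkSize) {
    const rows = await fetchRows(table, offset, chunkSize);
    await emit(rows);
    await sleep(BREATHING_INTERVAL_MS); // breathing interval between chunks
  }
}
```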

2. Storage Flexibility

  • Automatic selection between R2 and DO storage based on dump size
  • R2 for large dumps (>100MB)
  • DO storage for smaller dumps
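
Likewise, a minimal sketch of the size-based routing: the >100MB threshold is the one stated above, but the key layout (`dumps/<id>.sql`, `dump:<id>`) and the injected `r2Put`/`doPut` writers are hypothetical stand-ins for the Worker's actual R2 bucket and Durable Object storage bindings.

```typescript
// Illustrative sketch of the size-based storage routing described above.
const R2_THRESHOLD_BYTES = 100 * 1024 * 1024; // dumps over 100MB go to R2

type DumpTarget = "r2" | "durable-object";

function selectStorage(estimatedSizeBytes: number): DumpTarget {
  return estimatedSizeBytes > R2_THRESHOLD_BYTES ? "r2" : "durable-object";
}

// Hypothetical write path: r2Put and doPut stand in for the Worker's real
// R2 and Durable Object storage bindings, and the key layout is a guess.
async function storeDump(
  dumpId: string,
  body: string,
  estimatedSizeBytes: number,
  r2Put: (key: string, value: string) => Promise<void>,
  doPut: (key: string, value: string) => Promise<void>,
): Promise<DumpTarget> {
  const target = selectStorage(estimatedSizeBytes);
  if (target === "r2") {
    await r2Put(`dumps/${dumpId}.sql`, body);
  } else {
    await doPut(`dump:${dumpId}`, body);
  }
  return target;
}
```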

3. Progress Tracking & Notifications

  • Unique progress tracking per dump
  • Callback URL support for completion notifications
  • Detailed status reporting
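
The status responses shown in the test results below expose a `progress` object with `currentTable`, `processedTables`, and `totalTables`. A sketch of per-dump tracking plus the completion callback might look like the following, where the in-memory `Map` and the `dump-progress:<id>` key are placeholders for whatever storage the Worker actually uses.

```typescript
// Progress shape mirrors the /status responses shown in the test results below.
interface DumpProgress {
  currentTable: string;
  processedTables: number;
  totalTables: number;
}

interface DumpState {
  status: "in_progress" | "completed" | "failed";
  progress: DumpProgress;
}

// Placeholder for per-dump state keyed by a unique dump ID; the real
// implementation would persist this in Durable Object storage or similar.
const progressStore = new Map<string, DumpState>();

function updateProgress(dumpId: string, state: DumpState): void {
  progressStore.set(`dump-progress:${dumpId}`, state);
}

// POST the final status to the X-Callback-URL value, if one was supplied.
async function notifyCallback(
  callbackUrl: string | null,
  dumpId: string,
  state: DumpState,
): Promise<void> {
  if (!callbackUrl) return;
  try {
    await fetch(callbackUrl, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ dumpId, ...state }),
    });
  } catch (err) {
    // A failed callback should not fail the dump itself; log and move on.
    console.error(`Callback to ${callbackUrl} failed`, err);
  }
}
```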

Tasks

  • Implement chunked dump processing
  • Add dynamic chunk sizing (500-5000 rows based on table size)
  • Implement storage selection logic (R2/DO)
  • Add callback URL support
  • Add progress tracking with unique keys
  • Implement breathing intervals
  • Add error handling and logging
  • Add cleanup after successful download (see the sketch after this list)
  • Add test infrastructure
  • Verify functionality with test cases
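
The cleanup task is not shown in this description either; assuming the dump body and progress record are keyed by dump ID as in the sketches above, post-download cleanup might reduce to:

```typescript
// Illustrative cleanup after a successful download: remove the stored dump
// body and its progress record so storage is not held indefinitely.
// deleteBody / deleteProgress are hypothetical stand-ins for the R2 or
// Durable Object delete calls, matching the key layout sketched earlier.
async function cleanupDump(
  dumpId: string,
  deleteBody: (key: string) => Promise<void>,
  deleteProgress: (key: string) => Promise<void>,
): Promise<void> {
  await deleteBody(`dumps/${dumpId}.sql`);
  await deleteProgress(`dump-progress:${dumpId}`);
}
```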

Verification Steps

  1. Start the server:
     npm run dev
  2. Start the callback server (a minimal stand-in is sketched after these steps):
     cd test && node callback-server.js
  3. Test dump creation:
     curl -X POST http://127.0.0.1:8787/export/dump/chunked \
       -H "Authorization: Bearer ABC123" \
       -H "X-Callback-URL: http://localhost:3000/callback"
  4. Monitor status:
     curl http://127.0.0.1:8787/export/dump/{dumpId}/status \
       -H "Authorization: Bearer ABC123"

Before vs After

| Before | After |
| --- | --- |
| Database dumps loaded entire database into memory | Chunked processing prevents memory issues |
| No progress tracking | Real-time progress tracking |
| Single storage option | Automatic R2/DO storage selection |
| No callback notifications | Callback notifications |
| Risk of database locking | Breathing intervals prevent locking |
| 1GB DO storage limit | Support for databases >1GB |
| Basic error handling | Detailed error handling and logging |
Test script
#!/bin/bash

# Configuration
BASE_URL="http://127.0.0.1:8787"
CALLBACK_URL="http://localhost:3000/callback"
AUTH_HEADER="Authorization: Bearer ABC123"
OUTPUT_FILE="dump-test-results.md"

# Check if callback server is running
echo "Checking callback server..."
if ! curl -s "http://localhost:3000/callbacks" > /dev/null; then
    echo "Error: Callback server is not running. Please start it with 'npm run start-callback-server'"
    exit 1
fi

# Clear previous callbacks
curl -s -X DELETE "http://localhost:3000/callbacks" > /dev/null

# Start fresh output file
cat > "$OUTPUT_FILE" << EOF
# Database Dump Test Results
Test conducted on: $(date)

## Test Steps

EOF

echo "Starting dump tests..."

# Step 1: Initiate a dump
echo "Step 1: Initiating dump..."
cat >> "$OUTPUT_FILE" << EOF
### Step 1: Initiate Dump
\`\`\`bash
curl -X POST "$BASE_URL/export/dump/chunked" \\
    -H "$AUTH_HEADER" \\
    -H "X-Callback-URL: $CALLBACK_URL"
\`\`\`

Response:
\`\`\`json
EOF

DUMP_RESPONSE=$(curl -s -X POST "$BASE_URL/export/dump/chunked" \
    -H "$AUTH_HEADER" \
    -H "X-Callback-URL: $CALLBACK_URL")
echo "$DUMP_RESPONSE" >> "$OUTPUT_FILE"
DUMP_ID=$(echo "$DUMP_RESPONSE" | jq -r '.result.dumpId')

cat >> "$OUTPUT_FILE" << EOF
\`\`\`

EOF

# Step 2: Check status (multiple times)
echo "Step 2: Checking status..."
cat >> "$OUTPUT_FILE" << EOF
### Step 2: Status Checks

EOF

for i in {1..5}; do
    echo "Status check $i..."
    cat >> "$OUTPUT_FILE" << EOF
#### Check $i
\`\`\`bash
curl "$BASE_URL/export/dump/$DUMP_ID/status" -H "$AUTH_HEADER"
\`\`\`

Response:
\`\`\`json
EOF

    STATUS_RESPONSE=$(curl -s "$BASE_URL/export/dump/$DUMP_ID/status" -H "$AUTH_HEADER")
    echo "$STATUS_RESPONSE" >> "$OUTPUT_FILE"
    
    cat >> "$OUTPUT_FILE" << EOF
\`\`\`

EOF
    
    STATUS=$(echo "$STATUS_RESPONSE" | jq -r '.result.status')
    if [ "$STATUS" = "completed" ] || [ "$STATUS" = "failed" ]; then
        echo "Dump $STATUS after $i checks"
        break
    fi
    sleep 3
done

# Step 3: Download the dump if completed
if [ "$STATUS" = "completed" ]; then
    echo "Step 3: Downloading dump..."
    cat >> "$OUTPUT_FILE" << EOF
### Step 3: Download Dump
\`\`\`bash
curl "$BASE_URL/export/dump/$DUMP_ID" -H "$AUTH_HEADER"
\`\`\`

Response:
\`\`\`
EOF

    DOWNLOAD_RESPONSE=$(curl -s "$BASE_URL/export/dump/$DUMP_ID" -H "$AUTH_HEADER" -w "\nHTTP Status: %{http_code}")
    echo "$DOWNLOAD_RESPONSE" >> "$OUTPUT_FILE"

    cat >> "$OUTPUT_FILE" << EOF
\`\`\`

EOF
fi

# Step 4: Check received callbacks
echo "Step 4: Checking callbacks..."
cat >> "$OUTPUT_FILE" << EOF
### Step 4: Received Callbacks
\`\`\`bash
curl "http://localhost:3000/callbacks"
\`\`\`

Response:
\`\`\`json
EOF

CALLBACKS_RESPONSE=$(curl -s "http://localhost:3000/callbacks")
echo "$CALLBACKS_RESPONSE" >> "$OUTPUT_FILE"

cat >> "$OUTPUT_FILE" << EOF
\`\`\`

EOF

# Add summary
cat >> "$OUTPUT_FILE" << EOF
## Summary
- Final Status: \`$STATUS\`
- Dump ID: \`$DUMP_ID\`
- Number of status checks: $i
- Callbacks Received: $(echo "$CALLBACKS_RESPONSE" | jq '. | length')

## System Information
- Test Time: $(date)
- OS: $(uname -a)
- Curl Version: $(curl --version | head -n 1)
EOF

echo "Test completed. Results saved to $OUTPUT_FILE" 
Test Results

Database Dump Test Results

Test conducted on: Sat Jan 25 13:10:23 WAT 2025

Test Steps

Step 1: Initiate Dump

curl -X POST "http://127.0.0.1:8787/export/dump/chunked" \
    -H "Authorization: Bearer ABC123" \
    -H "X-Callback-URL: http://localhost:3000/callback"

Response:

{
  "result": {
    "message": "Database dump started",
    "dumpId": "fb2e9497-d93a-457d-b96d-8cd1ae2d22fb",
    "status": "in_progress",
    "downloadUrl": "http://127.0.0.1:8787/export/dump/fb2e9497-d93a-457d-b96d-8cd1ae2d22fb",
    "estimatedSize": 3236
  }
}

Step 2: Status Checks

Check 1:

curl "http://127.0.0.1:8787/export/dump/fb2e9497-d93a-457d-b96d-8cd1ae2d22fb/status" \
  -H "Authorization: Bearer ABC123"

Response:

{
  "result": {
    "status": "in_progress",
    "progress": {
      "currentTable": "",
      "processedTables": 0,
      "totalTables": 6
    }
  }
}

Check 3 (Final):

{
  "result": {
    "status": "completed",
    "progress": {
      "currentTable": "users",
      "processedTables": 8,
      "totalTables": 6
    }
  }
}

Step 3: Download Dump

Response Excerpt:

SQLite format 3
-- Table: tmp_cache
CREATE TABLE tmp_cache (
    "id" INTEGER PRIMARY KEY AUTOINCREMENT,
    "timestamp" REAL NOT NULL,
    "ttl" INTEGER NOT NULL,
    "query" TEXT UNIQUE NOT NULL,
    "results" TEXT
);

-- Table: users
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
INSERT INTO users VALUES (1, 'John Doe', '[email protected]');

HTTP Status: 200

Summary

  • Final Status: completed
  • Dump ID: fb2e9497-d93a-457d-b96d-8cd1ae2d22fb
  • Number of status checks: 3
  • Callbacks Received: 5

Brayden (Member) commented on Feb 3, 2025:

@onyedikachi-david Apologies for the delay here, will be testing this implementation this week and report back!

onyedikachi-david (Author) replied:

> @onyedikachi-david Apologies for the delay here, will be testing this implementation this week and report back!

Okay.
