Add Docker Compose for Analytics Stack #1139

Merged
merged 3 commits into from
Feb 5, 2025
Conversation

emplam27
Contributor

@emplam27 emplam27 commented Feb 5, 2025

What this PR does / why we need it:

Add yorkie analytics docker compose file

  • kafka: message broker
  • starrocks-fe / starrocks-be: warehouse
  • init-kafka-topic: create user-events kafka topic
  • init-starrocks-database: create yorkie database, user_events table and events routine load

Which issue(s) this PR fixes:

Address #1130

Special notes for your reviewer:

Does this PR introduce a user-facing change?:


Additional documentation:


Checklist:

  • Added relevant tests or not required
  • Addressed and resolved all CodeRabbit review comments
  • Didn't break anything

Summary by CodeRabbit

  • New Features
    • Added a Docker Compose setup that deploys the analytics services together: the StarRocks frontend and backend, Kafka for messaging, and the Kafka UI management interface.
    • Introduced an automated process for ingesting JSON event data from a message broker.
    • Established a dedicated database with a table designed to efficiently capture and log user events.
    • Added documentation for setting up and integrating Kafka and StarRocks for analytics, including initialization processes and troubleshooting steps.
    • Included instructions for checking routine load status and managing routine load jobs.

@emplam27 emplam27 requested a review from hackerwins February 5, 2025 04:01
@emplam27 emplam27 self-assigned this Feb 5, 2025

coderabbitai bot commented Feb 5, 2025

Walkthrough

This changeset introduces a new Docker Compose configuration to deploy a multi-container stack for an analytics application. The configuration defines services for StarRocks (frontend and backend), Kafka (with UI and topic initialization), and a StarRocks database initialization service. Additionally, two SQL scripts are added: one to create a routine load for the user_events table sourcing from Kafka, and another to set up the yorkie database with the user_events table.

Changes

| File(s) | Change Summary |
| --- | --- |
| build/docker/analytics/docker-compose.yml | Adds a new Docker Compose file which defines services for StarRocks FE/BE, Kafka, Kafka UI, topic initialization, and database initialization. Also configures a custom network and volume. |
| build/docker/analytics/init-routine-load.sql, build/docker/analytics/init-user-events-db.sql | Introduces SQL scripts: one creates a routine load (yorkie.events) on the user_events table sourcing JSON data from Kafka, and the other initializes the yorkie database with a user_events table and specified schema. |
| build/docker/analytics/README.md | Adds a new section "StarRocks Analytics Stack" detailing the setup and integration of Kafka and StarRocks, including initialization processes and troubleshooting for routine loads. |

Suggested labels

enhancement 🌟


codecov bot commented Feb 5, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 38.49%. Comparing base (3bd66ce) to head (491018a).
Report is 7 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1139      +/-   ##
==========================================
- Coverage   46.82%   38.49%   -8.34%     
==========================================
  Files          84      165      +81     
  Lines       12282    25169   +12887     
==========================================
+ Hits         5751     9688    +3937     
- Misses       5954    14665    +8711     
- Partials      577      816     +239     


@hackerwins hackerwins marked this pull request as ready for review February 5, 2025 04:55

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
build/docker/analytics/init-routine-load.sql (1)

1-13: SQL Routine Load Configuration is Correctly Specified

The script creates a routine load for the user_events table with JSON formatting and a concurrency of 1. The connection to the Kafka broker (kafka:9092) and the topic (user-events) are properly defined. Consider adding a space around the equal sign in line 5 for consistency, e.g.,

-"desired_concurrent_number"="1"
+"desired_concurrent_number" = "1"
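
For reference, a routine load matching the description above might be created roughly as follows. This is a minimal sketch, not the contents of init-routine-load.sql: the job name (yorkie.events), target table, broker, topic, and FE connection settings come from this review, while the helper function names and the `MYSQL` override are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: helper names and the MYSQL override are hypothetical.
MYSQL="${MYSQL:-mysql}"

# Print a CREATE ROUTINE LOAD statement for the given broker and topic.
routine_load_sql() {
  broker="$1"
  topic="$2"
  cat <<EOF
CREATE ROUTINE LOAD yorkie.events ON user_events
PROPERTIES (
    "format" = "json",
    "desired_concurrent_number" = "1"
)
FROM KAFKA (
    "kafka_broker_list" = "${broker}",
    "kafka_topic" = "${topic}"
);
EOF
}

# Submit the statement to the StarRocks FE over the MySQL protocol.
create_routine_load() {
  routine_load_sql "$1" "$2" | "$MYSQL" -h starrocks-fe -P 9030 -u root
}
```

Calling `create_routine_load kafka:9092 user-events` from a container on the same network would submit the job; keeping the SQL in a separate function makes it easy to inspect before running it.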
build/docker/analytics/docker-compose.yml (1)

35-40: Remove Trailing Whitespace in Command Block

Line 39 has trailing spaces that should be removed to maintain clean formatting.

Apply this diff:

-        sleep 15s; mysql --connect-timeout 2 -h starrocks-fe -P 9030 -u root -e "alter system add backend \"starrocks-be:9050\";"  
+        sleep 15s; mysql --connect-timeout 2 -h starrocks-fe -P 9030 -u root -e "alter system add backend \"starrocks-be:9050\";"
🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 39-39: trailing spaces

(trailing-spaces)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2ba679b and 8182428.

📒 Files selected for processing (3)
  • build/docker/analytics/docker-compose.yml (1 hunks)
  • build/docker/analytics/init-routine-load.sql (1 hunks)
  • build/docker/analytics/init-user-events-db.sql (1 hunks)

🔇 Additional comments (12)
build/docker/analytics/init-user-events-db.sql (1)

1-18: SQL Database and Table Initialization is Solid

The script correctly creates the yorkie database (if it does not already exist), then switches context to it and creates the user_events table with the desired columns and table properties. The distribution and replication properties are well defined for an OLAP workload.

build/docker/analytics/docker-compose.yml (11)

1-2: Compose File Version and Services Declaration Set Correctly

The file starts with a correct version ("3") and the services are grouped logically.


3-25: StarRocks FE Service is Configured Well

The starrocks-fe service specifies the correct image, ports, healthcheck (using MySQL to verify readiness), and volume mounts. The static IP assignment under the custom network is also appropriate.


26-34: StarRocks BE Service Dependencies are Correct

The starrocks-be service properly depends on the starrocks-fe service. The dependency ensures that the FE is healthy before BE initialization starts.


41-45: StarRocks BE Healthcheck is Appropriate

The healthcheck command for BE, which verifies backend readiness using MySQL and grep, is a sound mechanism for ensuring service availability.
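
A readiness check of the kind described above could be sketched as follows. The exact healthcheck command in docker-compose.yml is not shown in this thread, so the `retry`, `fe_ready`, and `be_ready` helpers are hypothetical, and the "Alive: true" pattern assumes the vertical output of `SHOW BACKENDS`.

```shell
#!/bin/sh
# Sketch only: retry, fe_ready, and be_ready are hypothetical helper names.

# Run a command until it succeeds, giving up after max_tries attempts.
retry() {
  max_tries="$1"
  delay="$2"
  shift 2
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_tries" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# FE is treated as ready once it answers a trivial query.
fe_ready() {
  mysql --connect-timeout 2 -h starrocks-fe -P 9030 -u root \
    -e "SELECT 1;" >/dev/null 2>&1
}

# BE is treated as ready once the FE reports a live backend (assumed output format).
be_ready() {
  mysql --connect-timeout 2 -h starrocks-fe -P 9030 -u root \
    -e "SHOW BACKENDS\G" 2>/dev/null | grep -q "Alive: true"
}
```

For example, `retry 30 5 fe_ready && retry 30 5 be_ready` would wait up to roughly 150 seconds for each component before failing.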


46-53: Volumes and Network Settings for StarRocks BE are Proper

The volume mounts for storage and log directories and the network configuration (static IP assignment) are clear and correctly implemented.


54-81: Kafka Service Configuration is Robust

The Kafka service is configured with the Bitnami image along with essential environment variables for KRaft mode. The healthcheck using kafka-topics.sh ensures the broker is responsive.


82-102: Initialization of Kafka Topics is Well Implemented

The init-kafka-topics service uses a multi-line shell script to first wait for Kafka readiness and then create the user-events topic if it does not exist. This sequential approach ensures that topic initialization occurs reliably.
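
The wait-then-create sequence can be sketched like this. It is not the script from docker-compose.yml: the `ensure_topic` function, the `KAFKA_TOPICS` override, and the partition and replication counts are assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: KAFKA_TOPICS and ensure_topic are hypothetical names.
KAFKA_TOPICS="${KAFKA_TOPICS:-kafka-topics.sh}"

ensure_topic() {
  bootstrap="$1"
  topic="$2"
  # Block until the broker answers a metadata request.
  until "$KAFKA_TOPICS" --bootstrap-server "$bootstrap" --list >/dev/null 2>&1; do
    sleep 2
  done
  # --if-not-exists keeps the call safe to repeat when the stack restarts.
  # Partition and replication counts below are assumed single-broker values.
  "$KAFKA_TOPICS" --bootstrap-server "$bootstrap" --create --if-not-exists \
    --topic "$topic" --partitions 1 --replication-factor 1
}
```

Running `ensure_topic kafka:9092 user-events` inside the Kafka network would create the topic once and do nothing on later runs.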


103-116: Kafka UI Service is Configured Correctly

The kafka-ui service is straightforward, exposing the correct port and setting the required environment variables. Its dependency on Kafka ensures it starts only once Kafka is available.


117-164: Init-Starrocks-Database Service Ensures Proper Initialization

The service sequentially:

  • Checks the health of the StarRocks FE and BE,
  • Creates the yorkie database and user_events table using the mounted SQL scripts,
  • And finally creates and verifies the routine load.

The use of appropriate healthchecks and dependency conditions makes the initialization process robust.
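
The ordering of these steps can be sketched as a small script. It assumes the FE and BE are already healthy, and the mount directory (`SQL_DIR`), the `MYSQL` override, and the function names are hypothetical; only the two SQL file names and the FE connection settings come from this PR.

```shell
#!/bin/sh
# Sketch only: SQL_DIR, MYSQL, and the function names are hypothetical.
MYSQL="${MYSQL:-mysql}"
SQL_DIR="${SQL_DIR:-/sql}"

# Feed one SQL script to the StarRocks FE.
run_sql_file() {
  "$MYSQL" -h starrocks-fe -P 9030 -u root < "$1"
}

init_analytics_db() {
  # 1. Database and table first: the routine load needs its target table.
  run_sql_file "$SQL_DIR/init-user-events-db.sql" || return 1
  # 2. Then the routine load that consumes from Kafka.
  run_sql_file "$SQL_DIR/init-routine-load.sql" || return 1
  # 3. Finally list the jobs to confirm the routine load is registered.
  "$MYSQL" -h starrocks-fe -P 9030 -u root -e "SHOW ROUTINE LOAD FROM yorkie\G"
}
```

Stopping at the first failing step avoids creating a routine load against a table that does not exist yet.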


165-172: Custom Network Configuration is Clear

The defined custom network with a specified subnet, gateway, and IPAM configurations meets the deployment needs and facilitates predictable container communication.


173-176: Volumes Configuration for Kafka Data is Adequate

The volume declaration for Kafka data using the local driver is properly set up to persist data.

@hackerwins hackerwins marked this pull request as draft February 5, 2025 05:12
@hackerwins hackerwins removed their request for review February 5, 2025 05:12
@hackerwins
Member

Thank you for your contribution.

This PR still seems to be in draft status. When it is ready for review, please mark it as Ready for review and assign me again.

@emplam27 emplam27 marked this pull request as ready for review February 5, 2025 08:15
@emplam27 emplam27 requested a review from hackerwins February 5, 2025 08:15

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
build/docker/analytics/README.md (2)

36-36: Add Language Identifier to Fenced Code Block

The fenced code block that displays the example output (starting at line 36) does not have a language identifier. Specifying a language such as text (or plaintext) will improve readability and comply with markdownlint recommendations.

-   ```
+   ```text
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

36-36: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


65-65: Add Language Identifier to Second Fenced Code Block

Similar to the earlier fenced block, the second code block that presents example output (starting at line 65) is missing a language identifier. Please update it to include a language identifier (e.g., text) for improved clarity and to satisfy markdown linting rules.

-   ```
+   ```text
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

65-65: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8182428 and e54dad4.

📒 Files selected for processing (1)
  • build/docker/analytics/README.md (1 hunks)
🔇 Additional comments (6)
build/docker/analytics/README.md (6)

1-9: Excellent Overview and Structure

The new "Yorkie Analytics" section is well-organized and provides clear instructions on how to set up and use Kafka with StarRocks for analytics. The breakdown into subsections (Kafka and StarRocks Setup, Kafka Cluster Mode, Routine Load instructions, and troubleshooting for paused routine loads) makes the documentation very accessible.


24-26: Clear Shell Command Usage

The shell command provided for connecting to StarRocks Frontend is clear and properly formatted (using sh as the language specifier).


30-32: Well-Specified SQL Block for Routine Load Check

The SQL code block for checking the routine load (SHOW ROUTINE LOAD FROM yorkie\G) is correctly specified with the sql language identifier; this maintains clarity for readers.


53-55: Correct SQL Fenced Block Identifier

The fenced SQL code block for resuming the routine load is correctly labeled; no changes are needed here.


59-61: Proper Labeling for Verification SQL Block

The SQL code block used to verify the routine load is properly annotated with the sql specifier, providing clear context for the command outputs.


68-78: Clear and Informative Example Output

The example outputs provided for the routine load status are detailed and help readers understand the expected results when troubleshooting. The presentation is clear; just ensure that all fenced blocks consistently include language identifiers as noted.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
build/docker/analytics/README.md (3)

49-52: Helpful Guidance for Kafka Cluster Mode Setup
The reference to Bitnami’s Kafka documentation and cluster compose file is beneficial. It might be useful to add a brief note on any configuration differences users should expect when setting up Kafka in cluster mode.


75-88: Specify a Language for the Example Output Code Block
This fenced code block displaying an example output lacks a language specifier, which may trigger markdown lint warnings (MD040). Adding a language hint (e.g., text) will improve consistency and clarity.

Proposed Diff:

-   ```
+   ```text
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

75-75: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


104-117: Include a Language Specifier for the Second Output Example
Similar to the previous output block, this fenced code block omits a language identifier. To adhere to markdown lint recommendations (MD040) and maintain consistency, include a language specifier such as text.

Proposed Diff:

-   ```
+   ```text
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

104-104: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e54dad4 and 491018a.

📒 Files selected for processing (1)
  • build/docker/analytics/README.md (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: build
🔇 Additional comments (12)
build/docker/analytics/README.md (12)

1-4: Meaningful Introduction Section
The introductory lines clearly define the “StarRocks Analytics Stack” and set the context. Consider explicitly noting that the stack is tailored for development and testing, as mentioned later in the file.


5-12: Clear Component Breakdown
The bulleted list effectively details each component (StarRocks FE/BE, Kafka, Kafka UI, and initialization services). Ensure these components remain consistent with the actual service definitions in the accompanying Docker Compose file.


13-27: Usage Instructions are Well-Outlined
The "How To Use" section provides step-by-step commands that are easy to follow. It is helpful that commands for starting, monitoring, and shutting down the stack are included. Verify that these commands align with the file paths and container names defined in your Docker Compose configuration.
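
The README's exact commands are not quoted in this thread, so the following is only a sketch of what starting, monitoring, and shutting down the stack might look like. It assumes the Docker Compose v2 CLI (`docker compose`); the wrapper function names and the `DOCKER` and `ANALYTICS_COMPOSE_FILE` variables are hypothetical, while the compose file path is the one added by this PR.

```shell
#!/bin/sh
# Sketch only: wrapper names and override variables are hypothetical.
ANALYTICS_COMPOSE_FILE="${ANALYTICS_COMPOSE_FILE:-build/docker/analytics/docker-compose.yml}"
DOCKER="${DOCKER:-docker}"

# Run a compose subcommand against the analytics stack.
stack() {
  "$DOCKER" compose -f "$ANALYTICS_COMPOSE_FILE" "$@"
}

stack_up()     { stack up -d; }   # start all services in the background
stack_status() { stack ps; }      # list containers and their health
stack_down()   { stack down; }    # stop and remove the containers
```

Run from the repository root, `stack_up` followed by `stack_status` shows whether the initialization services have finished.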


29-34: Informative Overview of Files
Listing the files used (e.g., docker-compose.yml, init-user-events-db.sql, and init-routine-load.sql) makes it easy for users to locate critical pieces of configuration.


35-41: Accurate Key Services Information
The “Key services” section with port information is very useful for quick reference. As a precaution, verify that these port numbers are accurate and match the Docker Compose configuration.


42-48: Well-Described Initialization Services
The explanation of what the initialization services will do (starting nodes, creating topics, initializing the database/tables, and configuring routine load) is clear and concise.


53-56: Routine Load Integration Explanation
The section "About StarRocks with Kafka Routine Load" includes a valuable external link to the StarRocks Routine Load Quick Start Guide. This aids users who are less familiar with routine load configurations.


57-62: Overview for Checking Routine Load Status
The introductory lines of the "How To Check Routine Load Status" section effectively prepare the user for the ensuing step-by-step instructions. Consider mentioning any prerequisites (such as ensuring the StarRocks FE is running) if not covered elsewhere.


63-65: Good Example of Docker Exec Usage
The docker command for connecting to the StarRocks Frontend is well formatted and uses the proper shell syntax.


69-71: Clear SQL Command for Checking Routine Load Status
The SQL command encapsulated in a sql code block to show routine load details is clear and correctly formatted.


90-94: SQL Command for Resuming Routine Load
The SQL command provided to resume the routine load is clear and properly formatted within a sql code block.


96-100: SQL Command for Verifying Routine Load Status
The instruction to run a SQL command for verifying the routine load status is effectively communicated and well formatted.
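
The check, resume, and verify steps reviewed above can be combined into one small script. This is a sketch under assumptions: the function names and the `MYSQL` override are hypothetical, the job name yorkie.events is taken from the walkthrough, and the parser assumes the vertical (`\G`) output contains a `State:` line.

```shell
#!/bin/sh
# Sketch only: function names and the MYSQL override are hypothetical.
MYSQL="${MYSQL:-mysql}"

# Extract the State field of the first job from \G-style output on stdin.
routine_load_state() {
  awk '$1 == "State:" { print $2; exit }'
}

# Print the current state and resume the job if it is paused.
resume_if_paused() {
  state=$("$MYSQL" -h starrocks-fe -P 9030 -u root \
    -e "SHOW ROUTINE LOAD FROM yorkie\G" | routine_load_state)
  if [ "$state" = "PAUSED" ]; then
    "$MYSQL" -h starrocks-fe -P 9030 -u root \
      -e "RESUME ROUTINE LOAD FOR yorkie.events;"
  fi
  echo "$state"
}
```

The function prints the state observed before any resume, so a second call is needed to confirm that a paused job has returned to running.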

@hackerwins hackerwins changed the title Add yorkie analytics docker compose file Add Docker Compose for Analytics Stack Feb 5, 2025
@hackerwins hackerwins merged commit ca4ba5a into main Feb 5, 2025
5 checks passed
@hackerwins hackerwins deleted the yorkie-analytics branch February 5, 2025 08:38
hackerwins added a commit that referenced this pull request Feb 5, 2025
Sets up the complete analytics stack with Kafka for message brokering and
StarRocks for data warehousing. Includes initialization scripts for Kafka
topics, database creation, event tables, and routine load configurations.

---------

Co-authored-by: Youngteac Hong <[email protected]>