Add Docker Compose for Analytics Stack #1139
Conversation
Walkthrough
This changeset introduces a new Docker Compose configuration to deploy a multi-container stack for an analytics application. The configuration defines services for StarRocks (frontend and backend), Kafka (with UI and topic initialization), and a StarRocks database initialization service. Additionally, two SQL scripts are added: one to create a routine load for the user_events table from the user-events Kafka topic, and one to initialize the yorkie database and the user_events table.
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #1139       +/-   ##
==========================================
- Coverage   46.82%   38.49%    -8.34%
==========================================
  Files          84      165       +81
  Lines       12282    25169    +12887
==========================================
+ Hits         5751     9688     +3937
- Misses       5954    14665     +8711
- Partials      577      816      +239

☔ View full report in Codecov by Sentry.
Actionable comments posted: 0
🧹 Nitpick comments (2)
build/docker/analytics/init-routine-load.sql (1)

1-13: SQL Routine Load Configuration is Correctly Specified
The script creates a routine load for the user_events table with JSON formatting and a concurrency of 1. The connection to the Kafka broker (kafka:9092) and the topic (user-events) are properly defined. Consider adding a space around the equal sign in line 5 for consistency, e.g.:

-"desired_concurrent_number"="1"
+"desired_concurrent_number" = "1"
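For readers unfamiliar with the statement being reviewed, a StarRocks routine load of this shape generally looks like the minimal sketch below. Only the table (user_events), broker (kafka:9092), topic (user-events), JSON format, and concurrency of 1 come from the review above; the job name (events, taken from the PR description) and everything else are assumptions, not the contents of init-routine-load.sql.

```sql
-- Hedged sketch of a StarRocks routine load; the job name is an assumption.
CREATE ROUTINE LOAD yorkie.events ON user_events
PROPERTIES (
    "desired_concurrent_number" = "1",  -- concurrency of 1, as reviewed
    "format" = "json"                   -- JSON-formatted Kafka messages
)
FROM KAFKA (
    "kafka_broker_list" = "kafka:9092", -- broker address on the compose network
    "kafka_topic" = "user-events"       -- topic created by the init service
);
```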
build/docker/analytics/docker-compose.yml (1)

35-40: Remove Trailing Whitespace in Command Block
Line 39 has trailing spaces that should be removed to maintain clean formatting. Apply this diff:

- sleep 15s; mysql --connect-timeout 2 -h starrocks-fe -P 9030 -u root -e "alter system add backend \"starrocks-be:9050\";" 
+ sleep 15s; mysql --connect-timeout 2 -h starrocks-fe -P 9030 -u root -e "alter system add backend \"starrocks-be:9050\";"

🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 39-39: trailing spaces (trailing-spaces)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- build/docker/analytics/docker-compose.yml (1 hunks)
- build/docker/analytics/init-routine-load.sql (1 hunks)
- build/docker/analytics/init-user-events-db.sql (1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.35.1)
build/docker/analytics/docker-compose.yml
[error] 39-39: trailing spaces (trailing-spaces)
🔇 Additional comments (12)
build/docker/analytics/init-user-events-db.sql (1)

1-18: SQL Database and Table Initialization is Solid
The script correctly creates the yorkie database (if it does not already exist), then switches context to it and creates the user_events table with the desired columns and table properties. The distribution and replication properties are well defined for an OLAP workload.
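As a rough illustration of what such an initialization script contains, the sketch below creates the database and an OLAP table. Only the yorkie database and user_events table names come from the review; the columns, key, bucket count, and replication setting are illustrative assumptions.

```sql
-- Hedged sketch; columns, keys, and properties below are assumptions.
CREATE DATABASE IF NOT EXISTS yorkie;

USE yorkie;

CREATE TABLE IF NOT EXISTS user_events (
    user_id    VARCHAR(64) COMMENT "illustrative column",
    event_type VARCHAR(32) COMMENT "illustrative column",
    event_time DATETIME    COMMENT "illustrative column"
) ENGINE = OLAP
DUPLICATE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 1
PROPERTIES (
    "replication_num" = "1"  -- a single replica suits a one-BE development stack
);
```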
build/docker/analytics/docker-compose.yml (11)

1-2: Compose File Version and Services Declaration Set Correctly
The file starts with a correct version ("3") and the services are grouped logically.
3-25: StarRocks FE Service is Configured Well
The starrocks-fe service specifies the correct image, ports, healthcheck (using MySQL to verify readiness), and volume mounts. The static IP assignment under the custom network is also appropriate.
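A minimal sketch of such an FE service is shown below; the image tag, ports, mount path, network name, and IP address are assumptions and may differ from the actual file.

```yaml
# Hedged sketch of a StarRocks FE service; tags, ports, paths, and addresses are assumptions.
services:
  starrocks-fe:
    image: starrocks/fe-ubuntu:latest
    ports:
      - "8030:8030"   # FE HTTP port
      - "9030:9030"   # FE MySQL-protocol (query) port
    healthcheck:
      # Readiness is probed with a trivial query over the MySQL protocol.
      test: ["CMD-SHELL", "mysql -h 127.0.0.1 -P 9030 -u root -e 'SELECT 1'"]
      interval: 10s
      timeout: 5s
      retries: 10
    volumes:
      - ./starrocks/fe/meta:/opt/starrocks/fe/meta   # illustrative mount path
    networks:
      analytics:
        ipv4_address: 10.5.0.2   # static IP on the custom network (assumed subnet)
```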
26-34: StarRocks BE Service Dependencies are Correct
The starrocks-be service properly depends on the starrocks-fe service. The dependency ensures that the FE is healthy before BE initialization starts.
41-45: StarRocks BE Healthcheck is Appropriate
The healthcheck command for BE, which verifies backend readiness using MySQL and grep, is a sound mechanism for ensuring service availability.
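The dependency and healthcheck pattern described in the two comments above can be sketched as follows; the image tag, probe query, and timing values are assumptions.

```yaml
# Hedged sketch of the BE service's dependency and healthcheck; probe details are assumed.
services:
  starrocks-be:
    image: starrocks/be-ubuntu:latest
    depends_on:
      starrocks-fe:
        condition: service_healthy   # BE waits until the FE healthcheck passes
    healthcheck:
      # Ask the FE whether a backend reports Alive: true; grep gates the exit code.
      test: ["CMD-SHELL", "mysql -h starrocks-fe -P 9030 -u root -e 'SHOW BACKENDS\\G' | grep -q 'Alive: true'"]
      interval: 10s
      timeout: 5s
      retries: 12
```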
46-53: Volumes and Network Settings for StarRocks BE are Proper
The volume mounts for storage and log directories and the network configuration (static IP assignment) are clear and correctly implemented.
54-81: Kafka Service Configuration is Robust
The Kafka service is configured with the Bitnami image along with essential environment variables for KRaft mode. The healthcheck using kafka-topics.sh ensures the broker is responsive.
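For reference, a single-node Bitnami Kafka broker in KRaft mode is typically configured along these lines; the listener layout, healthcheck timing, and volume name below are assumptions rather than the reviewed values.

```yaml
# Hedged sketch of a single-node KRaft broker on the Bitnami image; values are assumptions.
services:
  kafka:
    image: bitnami/kafka:latest
    environment:
      - KAFKA_CFG_NODE_ID=0
      - KAFKA_CFG_PROCESS_ROLES=controller,broker
      - KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=0@kafka:9093
      - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092,CONTROLLER://:9093
      - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092
      - KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
      - KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER
    healthcheck:
      # The broker counts as healthy once it can list topics.
      test: ["CMD-SHELL", "kafka-topics.sh --bootstrap-server localhost:9092 --list"]
      interval: 10s
      timeout: 10s
      retries: 10
    volumes:
      - kafka-data:/bitnami/kafka   # persisted via a named volume

volumes:
  kafka-data:
    driver: local
```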
82-102: Initialization of Kafka Topics is Well Implemented
The init-kafka-topics service uses a multi-line shell script to first wait for Kafka readiness and then create the user-events topic if it does not exist. This sequential approach ensures that topic initialization occurs reliably.
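The wait-then-create pattern described above is commonly expressed like the sketch below; the partition and replication counts are assumptions, and the actual script in the compose file may differ.

```yaml
# Hedged sketch of an idempotent topic-initialization job; topic settings are assumed.
services:
  init-kafka-topics:
    image: bitnami/kafka:latest
    depends_on:
      kafka:
        condition: service_healthy
    entrypoint: ["/bin/bash", "-c"]
    command: |
      # Block until the broker answers, then create the topic only if it is missing.
      kafka-topics.sh --bootstrap-server kafka:9092 --list
      kafka-topics.sh --bootstrap-server kafka:9092 --create --if-not-exists \
        --topic user-events --partitions 1 --replication-factor 1
```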
103-116: Kafka UI Service is Configured Correctly
The kafka-ui service is straightforward, exposing the correct port and setting the required environment variables. Its dependency on Kafka ensures it starts only once Kafka is available.
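A sketch of such a UI service follows; the image choice (provectuslabs/kafka-ui) and its environment variable names are assumptions and may not match the service reviewed here.

```yaml
# Hedged sketch of a Kafka web UI; image choice and variable names are assumptions.
services:
  kafka-ui:
    image: provectuslabs/kafka-ui:latest
    ports:
      - "8080:8080"                                    # UI reachable on localhost:8080
    environment:
      - KAFKA_CLUSTERS_0_NAME=local
      - KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS=kafka:9092   # point the UI at the broker
    depends_on:
      kafka:
        condition: service_healthy
```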
117-164: Init-Starrocks-Database Service Ensures Proper Initialization
The service sequentially:
- checks the health of the StarRocks FE and BE,
- creates the yorkie database and user_events table using the mounted SQL scripts,
- and finally creates and verifies the routine load.

The use of appropriate healthchecks and dependency conditions makes the initialization process robust.
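Put together, a one-shot initialization job of this kind often looks like the sketch below; the image used for the client, the mount paths, and the exact mysql invocations are assumptions.

```yaml
# Hedged sketch of a one-shot database-initialization job; paths and commands are assumed.
services:
  init-starrocks-database:
    image: starrocks/fe-ubuntu:latest    # reused here only for its bundled mysql client
    depends_on:
      starrocks-fe:
        condition: service_healthy
      starrocks-be:
        condition: service_healthy
    volumes:
      - ./init-user-events-db.sql:/sql/init-user-events-db.sql
      - ./init-routine-load.sql:/sql/init-routine-load.sql
    entrypoint: ["/bin/sh", "-c"]
    command: |
      # Create the database and table, then the routine load, then verify it.
      mysql -h starrocks-fe -P 9030 -u root < /sql/init-user-events-db.sql
      mysql -h starrocks-fe -P 9030 -u root < /sql/init-routine-load.sql
      mysql -h starrocks-fe -P 9030 -u root -e "SHOW ROUTINE LOAD FROM yorkie\G"
```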
165-172: Custom Network Configuration is Clear
The defined custom network with a specified subnet, gateway, and IPAM configuration meets the deployment needs and facilitates predictable container communication.
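The static-IP assignments above imply an IPAM block roughly like this sketch; the network name and address range are assumptions, not values from the reviewed file.

```yaml
# Hedged sketch of the custom network definition; name and subnet are assumptions.
networks:
  analytics:
    driver: bridge
    ipam:
      config:
        - subnet: 10.5.0.0/24
          gateway: 10.5.0.1
```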
173-176: Volumes Configuration for Kafka Data is Adequate
The volume declaration for Kafka data using the local driver is properly set up to persist data.
Thank you for your contribution. It seems to be still in draft status. When you are able to review, please change this to ready for review.
Actionable comments posted: 0
🧹 Nitpick comments (2)
build/docker/analytics/README.md (2)

36-36: Add Language Identifier to Fenced Code Block
The fenced code block that displays the example output (starting at line 36) does not have a language identifier. Specifying a language such as text (or plaintext) will improve readability and comply with markdownlint recommendations.

- ```
+ ```text

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
36-36: Fenced code blocks should have a language specified (MD040, fenced-code-language)

65-65: Add Language Identifier to Second Fenced Code Block
Similar to the earlier fenced block, the second code block that presents example output (starting at line 65) is missing a language identifier. Please update it to include a language identifier (e.g., text) for improved clarity and to satisfy markdown linting rules.

- ```
+ ```text

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
65-65: Fenced code blocks should have a language specified (MD040, fenced-code-language)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- build/docker/analytics/README.md (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
build/docker/analytics/README.md
36-36: Fenced code blocks should have a language specified (MD040, fenced-code-language)
65-65: Fenced code blocks should have a language specified (MD040, fenced-code-language)
🔇 Additional comments (6)
build/docker/analytics/README.md (6)
1-9: Excellent Overview and Structure
The new "Yorkie Analytics" section is well-organized and provides clear instructions on how to set up and use Kafka with StarRocks for analytics. The breakdown into subsections (Kafka and StarRocks Setup, Kafka Cluster Mode, Routine Load instructions, and troubleshooting for paused routine loads) makes the documentation very accessible.

24-26: Clear Shell Command Usage
The shell command provided for connecting to StarRocks Frontend is clear and properly formatted (using sh as the language specifier).

30-32: Well-Specified SQL Block for Routine Load Check
The SQL code block for checking the routine load (SHOW ROUTINE LOAD FROM yorkie\G) is correctly specified with the sql language identifier; this maintains clarity for readers.

53-55: Correct SQL Fenced Block Identifier
The fenced SQL code block for resuming the routine load is correctly labeled; no changes are needed here.

59-61: Proper Labeling for Verification SQL Block
The SQL code block used to verify the routine load is properly annotated with the sql specifier, providing clear context for the command outputs.

68-78: Clear and Informative Example Output
The example outputs provided for the routine load status are detailed and help readers understand the expected results when troubleshooting. The presentation is clear; just ensure that all fenced blocks consistently include language identifiers as noted.
Actionable comments posted: 0
🧹 Nitpick comments (3)
build/docker/analytics/README.md (3)
49-52: Helpful Guidance for Kafka Cluster Mode Setup
The reference to Bitnami’s Kafka documentation and cluster compose file is beneficial. It might be useful to add a brief note on any configuration differences users should expect when setting up Kafka in cluster mode.

75-88: Specify a Language for the Example Output Code Block
This fenced code block displaying an example output lacks a language specifier, which may trigger markdown lint warnings (MD040). Adding a language hint (e.g., text) will improve consistency and clarity. Proposed diff:

- ```
+ ```text

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
75-75: Fenced code blocks should have a language specified (MD040, fenced-code-language)

104-117: Include a Language Specifier for the Second Output Example
Similar to the previous output block, this fenced code block omits a language identifier. To adhere to markdown lint recommendations (MD040) and maintain consistency, include a language specifier such as text. Proposed diff:

- ```
+ ```text

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
104-104: Fenced code blocks should have a language specified (MD040, fenced-code-language)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- build/docker/analytics/README.md (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
build/docker/analytics/README.md
75-75: Fenced code blocks should have a language specified (MD040, fenced-code-language)
104-104: Fenced code blocks should have a language specified (MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: build
🔇 Additional comments (12)
build/docker/analytics/README.md (12)
1-4: Meaningful Introduction Section
The introductory lines clearly define the “StarRocks Analytics Stack” and set the context. Consider explicitly noting that the stack is tailored for development and testing, as mentioned later in the file.

5-12: Clear Component Breakdown
The bulleted list effectively details each component (StarRocks FE/BE, Kafka, Kafka UI, and initialization services). Ensure these components remain consistent with the actual service definitions in the accompanying Docker Compose file.

13-27: Usage Instructions are Well-Outlined
The "How To Use" section provides step-by-step commands that are easy to follow. It is helpful that commands for starting, monitoring, and shutting down the stack are included. Verify that these commands align with the file paths and container names defined in your Docker Compose configuration.
29-34: Informative Overview of Files
Listing the files used (e.g., docker-compose.yml, init-user-events-db.sql, and init-routine-load.sql) makes it easy for users to locate critical pieces of configuration.

35-41: Accurate Key Services Information
The “Key services” section with port information is very useful for quick reference. As a precaution, verify that these port numbers are accurate and match the Docker Compose configuration.
42-48: Well-Described Initialization Services
The explanation of what the initialization services will do (starting nodes, creating topics, initializing the database/tables, and configuring routine load) is clear and concise.

53-56: Routine Load Integration Explanation
The section "About StarRocks with Kafka Routine Load" includes a valuable external link to the StarRocks Routine Load Quick Start Guide. This aids users who are less familiar with routine load configurations.

57-62: Overview for Checking Routine Load Status
The introductory lines of the "How To Check Routine Load Status" section effectively prepare the user for the ensuing step-by-step instructions. Consider mentioning any prerequisites (such as ensuring the StarRocks FE is running) if not covered elsewhere.

63-65: Good Example of Docker Exec Usage
The docker command for connecting to the StarRocks Frontend is well formatted and uses the proper shell syntax.
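For context, connecting to the FE with a MySQL client usually looks like this sketch; the container name follows the compose service name, and the prompt string is a cosmetic assumption rather than the README's exact command.

```sh
# Hedged sketch of opening a SQL session against the StarRocks FE; prompt is cosmetic.
docker exec -it starrocks-fe \
  mysql -h 127.0.0.1 -P 9030 -u root --prompt="StarRocks > "
```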
69-71: Clear SQL Command for Checking Routine Load Status
The SQL command encapsulated in a sql code block to show routine load details is clear and correctly formatted.
90-94: SQL Command for Resuming Routine Load
The SQL command provided to resume the routine load is clear and properly formatted within a sql code block.
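The resume-and-verify statements referenced here generally take the following form; the job name events is taken from the PR description and may differ from what the README actually shows.

```sql
-- Hedged sketch: resume a paused routine load job, then recheck its state.
RESUME ROUTINE LOAD FOR yorkie.events;
SHOW ROUTINE LOAD FOR yorkie.events\G
```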
96-100: SQL Command for Verifying Routine Load Status
The instruction to run a SQL command for verifying the routine load status is effectively communicated and well formatted.
Sets up the complete analytics stack with Kafka for message brokering and StarRocks for data warehousing. Includes initialization scripts for Kafka topics, database creation, event tables, and routine load configurations.

---------

Co-authored-by: Youngteac Hong <[email protected]>
What this PR does / why we need it:
Add yorkie analytics docker compose file:
- user-events kafka topic
- yorkie database, user_events table and events routine load

Which issue(s) this PR fixes:
Address #1130