Mark all alternative stores but TSDB as deprecated #9105

wardbekker · 2023-04-11T19:47:31Z

Describe the bug
Object storage with TSDB (also called single store) is the recommended default going forward, from 2.8 onward. The docs still contain many references to Cassandra/Bigtable/DynamoDB/BoltDB that might set a new Grafana Loki user on the wrong foot. I recommend explicitly marking all references to those as legacy/deprecated in the docs to remove any confusion.
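To make the proposal concrete: in a Loki config, the index store is chosen per period in `schema_config.configs` through the `store` field. The following is a minimal Python sketch, assuming the config has already been loaded into plain dicts (for example with a YAML parser) and that `tsdb` is the only store treated as recommended. The helper name `flag_non_tsdb_periods` is hypothetical and not part of Loki.

```python
from typing import Any


def flag_non_tsdb_periods(schema_configs: list[dict[str, Any]]) -> list[str]:
    """Return one warning per schema period whose index store is not TSDB.

    Hypothetical helper: `schema_configs` mirrors the `schema_config.configs`
    list of a Loki config file, already parsed into dicts.
    """
    warnings = []
    for period in schema_configs:
        store = period.get("store", "<unset>")
        if store != "tsdb":
            warnings.append(
                f"period starting {period.get('from', '<unknown>')} uses "
                f"'{store}', which this proposal would mark as legacy/deprecated"
            )
    return warnings


if __name__ == "__main__":
    # Illustrative periods only; real configs have more fields per period.
    example = [
        {"from": "2022-01-01", "store": "cassandra", "schema": "v11"},
        {"from": "2023-04-15", "store": "tsdb", "schema": "v12"},
    ]
    for line in flag_non_tsdb_periods(example):
        print(line)
```

A check like this only reads the `store` value; it says nothing about whether an older period can actually be migrated.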

periklis · 2023-04-11T19:51:21Z

My 2 cents: mark anything but boltdb-shipper ❤️

wardbekker · 2023-04-11T19:54:30Z

@periklis just to clarify, you prefer both the TSDB index and the boltdb-shipper to be marked supported, and the rest deprecated?

wardbekker · 2023-04-11T19:55:04Z

BTW, @JStickler, I'm planning to create a PR for this.

periklis · 2023-04-11T19:57:04Z

> @periklis just to clarify, you prefer both the TSDB index and the boltdb-shipper to be marked supported, and the rest deprecated?

Yes, that is my intention, as both will need to run in parallel for some installations out there. At least until 3.0, it wouldn't hurt to keep both stores supported.
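Running both stores in parallel is typically expressed in a Loki config as consecutive `schema_config` periods: older data stays in a `boltdb-shipper` period, and new data goes to a `tsdb` period with a later `from` date. The following is a minimal Python sketch of how a given day maps to a period, assuming the periods are plain dicts with ISO-formatted `from` dates. `store_for_day` is a hypothetical helper, not Loki's implementation.

```python
from datetime import date
from typing import Any


def store_for_day(schema_configs: list[dict[str, Any]], day: date) -> str:
    """Return the index store of the latest period whose `from` date is <= day."""
    # Sort by start date so the scan below sees periods in chronological order.
    periods = sorted(schema_configs, key=lambda p: date.fromisoformat(p["from"]))
    active = None
    for period in periods:
        if date.fromisoformat(period["from"]) <= day:
            active = period
        else:
            break  # every later period starts after `day`
    if active is None:
        raise ValueError(f"no schema period covers {day.isoformat()}")
    return active["store"]


# Illustrative periods only: boltdb-shipper first, then a cutover to TSDB.
configs = [
    {"from": "2022-06-01", "store": "boltdb-shipper", "schema": "v11"},
    {"from": "2023-05-01", "store": "tsdb", "schema": "v12"},
]
print(store_for_day(configs, date(2023, 1, 15)))  # boltdb-shipper
print(store_for_day(configs, date(2023, 6, 1)))   # tsdb
```

Under this model, a query spanning the cutover date would read index data from both stores, which is why both need to stay supported while older data ages out.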

timansky · 2023-04-22T06:46:26Z

If TSDB is now the primary supported store, I didn't find any TSDB configs in the Tanka or Helm installation (neither in the docs nor in the config/values files).

brophyja · 2023-04-28T23:38:33Z

As a new Loki user, I can confirm that the list of documented back-end options is confusing, even if someone has decided to use S3/object storage.

For example, I have seen several comments "around the internet" that table-manager is going to be deprecated. My search to find specific details on the Loki roadmap or component life-cycle led me to this issue.

While I would like to be able to use DynamoDB for the Loki index (thus requiring table-manager) to limit the need for Persistent Volumes in a Kubernetes-based deployment, I would like to know now if this combination of features is not going to be supported in the near future (we are still evaluating Loki).

If TSDB is the future, and components like table-manager are going to be deprecated, then these decisions should be clearly documented somewhere!

liguozhong · 2023-05-03T16:47:29Z

Hi all, this is a very important suggestion.
In the upstream project Cortex, I found that a single tenant in blocks mode has an upper limit of about 20 million metrics, because the compactor is a bottleneck. Does our TSDB implementation have similar problems (that is, the data size of a single tenant cannot exceed a threshold)? Although Loki's design discourages having too many labels, once there are too many labels, will the compactor become the bottleneck of the Loki system?

The latest Cortex release removed the Cassandra code before the compactor bottleneck was resolved, which prevents us from adopting Cortex in our infrastructure. I'm very worried that something similar will happen in the Loki project. We should slow down the pace of removing the Cassandra code. cc @owen-d

Currently, I am running a Loki tenant with a very high log ingest rate of about 3Gb/s, and I can't tell how many labels there are. I am running the index in Cassandra. Cassandra scales well and has no single-point bottleneck, which gives me peace of mind when operating large log volumes.

Therefore, I suggest that we not remove the Cassandra index code so quickly. Various systems, such as Cortex, Thanos, and Mimir, have made various attempts to solve the compactor's single-point bottleneck. Can we wait until this compactor problem is completely resolved before deleting the Cassandra code?

https://aws.amazon.com/cn/blogs/opensource/scaling-cortex-with-parallel-compaction/
cortexproject/cortex#4843

bmarinov · 2023-05-05T19:45:48Z

The situation with the outdated documentation and the feeling for a 'moving target' with regards to recommendations is unfortunate. I almost choked on my coffee reading that boltdb is now apparently deprecated.

I am running 2.7.x versions in several environments, and just today I deployed 2.7.4 yet again (the version running in prod) on staging for some extensive troubleshooting. This is a version from February this year, and at that point in time there were only a few references to TSDB. Boltdb was the way to go.

What I'm trying to communicate is a legitimate problem with the developer UX regarding documentation, configuration, and getting Loki up and running in a setup suitable for production.

Personally, I am happy to see boltdb go. Since we moved to 2.7.x and configured compaction, we've had a ton of issues with queries. Turning on alerts with regular checks for these conditions exacerbated the problem greatly, making the alerts unusable:

- Attempting to query broken chunks ends up in `database not open` and similar errors; going further back in time is OK.
- The aforementioned broken chunks get lost (a gap in historical data that was queryable just a few moments ago).
- Race conditions with the compactor? `index_12345/1683...400: no such file or directory`
- (Loki) error rate spikes whenever recent data is being queried (while being compacted?).
- `Error = failed to execute query A: rpc error: code = Unknown desc = database not open`

And this is in the simplest possible, monolithic setup. Basically, a setup which should be easy to deploy and operate.

This is not meant as criticism, but well-organized documentation for several common scenarios, deployment topologies, and the recommended storage backend (at the time) should be a very high priority. Having to slap together the configuration from several different pages and sources is time-intensive and does not inspire much confidence.

TL;DR: sensible defaults should be preconfigured, and complete configuration examples for several common scenarios should be available, discoverable, and kept up to date. Roadmap transparency would also be a big plus.

jerryjvl · 2023-06-19T06:45:33Z

I could not agree more with the sentiment above.

The hardest challenge I've found in setting up Loki in production (and based on flipping through the docs, I expect the same to hold true for Mimir and Tempo once I get around to those) is that the documentation and examples are not very cohesive. When I try to correlate example configurations or official templates with the recommendations in the docs, to understand how to extrapolate partial setups into a full, cohesive setup for my own environment, I find that I constantly have to backtrack and iterate as I discover that parts of the advice I had incorporated have been made obsolete by changes elsewhere that weren't clearly signposted.

I love the power of Loki, but it is extremely difficult to synthesize a functioning configuration file from the docs and examples.

And I'm certainly not averse to putting my effort where my mouth is, but based on the drift in the existing content, I'm not confident any external party could keep the docs aligned with the moving target of internal developmental changes to how key parts of the configuration function together.

liguozhong · 2023-06-19T08:17:57Z

> The situation with the outdated documentation and the feeling for a 'moving target' with regards to recommendations is unfortunate. [...]

👍 Thanks for sharing such a detailed TSDB migration experience.
I am worried about encountering the issues you described in my production Loki cluster, so I am still sticking with Cassandra.
We are still studying the TSDB source code to understand it in more detail, so that we can better deal with any problems that appear.

#9246: **What this PR does / why we need it**: making clearer that TSDB is the new recommended index, and make explicit that non-TSDB and BoltDB shipper are deprecated and not recommended. Fixes #9105

**Special notes for your reviewer**: This needs to be changed in more places, but this is a start as it concerns the main storage docs.

**Checklist**
- [X] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (**required**)
- [X] Documentation added
- [ ] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md`

Co-authored-by: Travis Patterson
Co-authored-by: J Stickler

wardbekker added the type/docs and docs-p2 labels Apr 11, 2023

Eve832 assigned JStickler Apr 11, 2023

JStickler mentioned this issue Jun 7, 2023 in #9246, "making clearer that TSDB is the new recommended index, and make expli…" (Merged)

MasslessParticle closed this as completed in #9246 Aug 16, 2023

saule1508 mentioned this issue Sep 9, 2023 in #10524, "Docs feedback: /docs/sources/get-started/architecture.md" (Closed)
