
Thanos store: "No block found" but thanos-validator shows blocks #1220

Closed
chrisghill opened this issue Jun 4, 2019 · 11 comments

Running Thanos image 0.4.0

What happened
I'm using prometheus-operator with Thanos. I just modified our Thanos-compactor to have the following retention policy:

--retention.resolution-raw=7d
--retention.resolution-5m=30d
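
For context, a full compactor invocation with this policy would look roughly like the following (the data dir and bucket config path are illustrative; --retention.resolution-1h is left at its default of 0d, which means the 1h blocks are kept forever):

thanos compact \
  --data-dir=/var/thanos/compact \
  --objstore.config-file=/etc/thanos/bucket.yml \
  --retention.resolution-raw=7d \
  --retention.resolution-5m=30d \
  --retention.resolution-1h=0d \
  --wait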

Basically we'll use Prometheus for up to 30 days of data, then Thanos for anything beyond 30 days (at 1h resolution). The Thanos compactor came up and cleared out many of the 5m and raw blocks. But now when I try to query Thanos, it only appears to see the raw data, and if I query beyond that the thanos-store says "No block found". I ran the thanos-validator to make sure the 1h blocks were there, and they are:

Logs

level=info ts=2019-06-04T18:24:36.768967018Z caller=factory.go:39 msg="loading bucket configuration"
|            ULID            |        FROM         |        UNTIL        |  RANGE   | UNTIL-COMP |  #SERIES   |   #SAMPLES    |  #CHUNKS   | COMP-LEVEL | COMP-FAILED |                            LABELS                             | RESOLUTION |  SOURCE   |
|----------------------------|---------------------|---------------------|----------|------------|------------|---------------|------------|------------|-------------|---------------------------------------------------------------|------------|-----------|
| 01D6F71Z9KHGSXBGQVXDAN3S6C | 07-03-2019 00:00:00 | 21-03-2019 00:00:00 | 336h0m0s | -          | 12,352,444 | 45,151,679    | 12,625,365 | 4          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-0 | 1h0m0s     | compactor |
| 01D6F7FEXMP2PGQB2MGRPWYADX | 07-03-2019 00:00:00 | 21-03-2019 00:00:00 | 336h0m0s | -          | 12,358,666 | 45,154,320    | 12,631,585 | 4          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-1 | 1h0m0s     | compactor |
| 01D7KA7JZBRWTDSRT7PHMP9P2D | 21-03-2019 00:00:00 | 04-04-2019 00:00:00 | 336h0m0s | -          | 11,659,131 | 40,936,555    | 11,864,357 | 4          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-0 | 1h0m0s     | compactor |
| 01D7KAP8K1G24DV1VJ1R4411W3 | 21-03-2019 00:00:00 | 04-04-2019 00:00:00 | 336h0m0s | -          | 13,472,499 | 49,057,353    | 13,686,043 | 4          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-1 | 1h0m0s     | compactor |
| 01D9R3B7QH58WTQXSDVPWKACR7 | 04-04-2019 00:00:00 | 18-04-2019 00:00:00 | 336h0m0s | -          | 15,879,159 | 70,256,545    | 16,081,021 | 4          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-0 | 1h0m0s     | compactor |
| 01D9R3S9B2VJ15N3BJXS5HCSA7 | 04-04-2019 00:00:00 | 18-04-2019 00:00:00 | 336h0m0s | -          | 15,616,205 | 68,695,812    | 15,814,566 | 4          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-1 | 1h0m0s     | compactor |
| 01D9VBMH0RE1626A8M5MSZ0DMF | 18-04-2019 00:00:00 | 02-05-2019 00:00:00 | 336h0m0s | -          | 16,416,109 | 55,484,690    | 16,644,748 | 4          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-0 | 1h0m0s     | compactor |
| 01D9VC1ZKRGP529CMWZBCDZ46G | 18-04-2019 00:00:00 | 02-05-2019 00:00:00 | 336h0m0s | -          | 17,035,104 | 55,701,121    | 17,263,617 | 4          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-1 | 1h0m0s     | compactor |
| 01DAZCNTPTWT9ETPJ3MR4JT6N9 | 02-05-2019 00:00:00 | 16-05-2019 00:00:00 | 336h0m0s | -96h0m0s   | 17,390,625 | 565,068,132   | 20,898,380 | 4          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-0 | 5m0s       | compactor |
| 01DAZD9V91DWV56J6FFPTVV9BG | 02-05-2019 00:00:00 | 16-05-2019 00:00:00 | 336h0m0s | -96h0m0s   | 17,520,560 | 564,506,288   | 21,023,397 | 4          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-1 | 5m0s       | compactor |
| 01DAZDXAQ9K7355DP51WE3ESD8 | 02-05-2019 00:00:00 | 16-05-2019 00:00:00 | 336h0m0s | -          | 17,390,625 | 63,296,074    | 17,634,179 | 4          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-0 | 1h0m0s     | compactor |
| 01DAZFA84ZNSMCG1ABAE7NEPEZ | 02-05-2019 00:00:00 | 16-05-2019 00:00:00 | 336h0m0s | -          | 17,520,560 | 63,300,233    | 17,764,148 | 4          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-1 | 1h0m0s     | compactor |
| 01DC3D3Z5Y0ET1QQ7D4967CZ88 | 16-05-2019 00:00:00 | 30-05-2019 00:00:00 | 336h0m0s | -296h0m0s  | 23,895,368 | 4,959,591,248 | 63,308,004 | 4          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-0 | 0s         | compactor |
| 01DC3DZZPWEBWD2BJP3JJ1ZCTY | 16-05-2019 00:00:00 | 30-05-2019 00:00:00 | 336h0m0s | -296h0m0s  | 23,167,023 | 4,935,822,157 | 62,379,764 | 4          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-1 | 0s         | compactor |
| 01DC3FVJPBYR64STZK2MYM5WQ4 | 16-05-2019 00:00:00 | 30-05-2019 00:00:00 | 336h0m0s | -96h0m0s   | 23,895,367 | 510,905,988   | 26,752,673 | 4          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-0 | 5m0s       | compactor |
| 01DC3H7NMWKFQWB291NDPGF16P | 16-05-2019 00:00:00 | 30-05-2019 00:00:00 | 336h0m0s | -          | 23,895,367 | 64,585,052    | 24,071,774 | 4          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-0 | 1h0m0s     | compactor |
| 01DC7GDB05K0VXNG7XQH0FK5E4 | 16-05-2019 00:00:00 | 30-05-2019 00:00:00 | 336h0m0s | -96h0m0s   | 23,167,022 | 507,770,143   | 26,013,148 | 4          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-1 | 5m0s       | compactor |
| 01DC7H3HX32N0RCYA4BQZAHRKV | 16-05-2019 00:00:00 | 30-05-2019 00:00:00 | 336h0m0s | -          | 23,167,022 | 63,691,597    | 23,344,252 | 4          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-1 | 1h0m0s     | compactor |
| 01DC8HG4HBM2K16S4H8HZ56YJ4 | 30-05-2019 00:00:00 | 01-06-2019 00:00:00 | 48h0m0s  | -8h0m0s    | 3,737,666  | 733,848,407   | 9,508,705  | 3          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-0 | 0s         | compactor |
| 01DC8HMRCM7H5E2PK9E5GYKEC7 | 30-05-2019 00:00:00 | 01-06-2019 00:00:00 | 48h0m0s  | -8h0m0s    | 3,675,729  | 735,231,529   | 9,445,805  | 3          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-1 | 0s         | compactor |
| 01DC8HPZZW8GR4G0ZKA1YN94MK | 30-05-2019 00:00:00 | 01-06-2019 00:00:00 | 48h0m0s  | 192h0m0s   | 3,737,242  | 77,693,178    | 4,168,421  | 3          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-0 | 5m0s       | compactor |
| 01DC8HTGNZ0VM1KHSWCPWG3313 | 30-05-2019 00:00:00 | 01-06-2019 00:00:00 | 48h0m0s  | 192h0m0s   | 3,675,614  | 77,747,085    | 4,106,792  | 3          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-1 | 5m0s       | compactor |
| 01DCDPF7B2ZQ7KA7ATBBQBH9JF | 01-06-2019 00:00:00 | 03-06-2019 00:00:00 | 48h0m0s  | -8h0m0s    | 3,605,573  | 876,969,038   | 10,578,324 | 3          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-0 | 0s         | compactor |
| 01DCDPHJC3Y261D0THXAVJHVN9 | 01-06-2019 00:00:00 | 03-06-2019 00:00:00 | 48h0m0s  | -8h0m0s    | 3,605,064  | 876,967,046   | 10,577,773 | 3          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-1 | 0s         | compactor |
| 01DCDPKVFXKH3H8VYMYKRSC826 | 01-06-2019 00:00:00 | 03-06-2019 00:00:00 | 48h0m0s  | 192h0m0s   | 3,605,556  | 91,162,512    | 4,170,061  | 3          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-0 | 5m0s       | compactor |
| 01DCDPQBH3KS1DPA3TP3JS7E6A | 01-06-2019 00:00:00 | 03-06-2019 00:00:00 | 48h0m0s  | 192h0m0s   | 3,605,047  | 91,161,994    | 4,169,552  | 3          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-1 | 5m0s       | compactor |
| 01DCEHWJZ557HXGDS2MHQF50N2 | 03-06-2019 00:00:00 | 03-06-2019 08:00:00 | 8h0m0s   | 32h0m0s    | 790,826    | 137,595,154   | 1,732,064  | 2          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-0 | 0s         | compactor |
| 01DCEHX6JDD4D4MSM654GH0GKR | 03-06-2019 00:00:00 | 03-06-2019 08:00:00 | 8h0m0s   | 32h0m0s    | 788,331    | 136,761,518   | 1,729,566  | 2          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-1 | 0s         | compactor |
| 01DCFDBGMAQV23J721WKDAKB4R | 03-06-2019 08:00:00 | 03-06-2019 16:00:00 | 8h0m0s   | 32h0m0s    | 725,724    | 144,143,680   | 1,746,539  | 2          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-0 | 0s         | compactor |
| 01DCFDBZXBXG0HR8YJ2QZ87QR0 | 03-06-2019 08:00:00 | 03-06-2019 16:00:00 | 8h0m0s   | 32h0m0s    | 725,586    | 144,143,254   | 1,746,402  | 2          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-1 | 0s         | compactor |
| 01DCG8TFC9M1KHHNPHJKDAKQKB | 03-06-2019 16:00:00 | 04-06-2019 00:00:00 | 8h0m0s   | 32h0m0s    | 787,086    | 144,775,635   | 1,812,501  | 2          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-0 | 0s         | compactor |
| 01DCG8V095EYH1D9Q7CC6SAMXV | 03-06-2019 16:00:00 | 04-06-2019 00:00:00 | 8h0m0s   | 32h0m0s    | 787,161    | 144,775,786   | 1,812,579  | 2          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-1 | 0s         | compactor |
| 01DCH49AFPPE5KXNWPMTQZ2433 | 04-06-2019 00:00:00 | 04-06-2019 08:00:00 | 8h0m0s   | 32h0m0s    | 760,332    | 145,679,534   | 1,797,417  | 2          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-0 | 0s         | compactor |
| 01DCH49VZJW66XR5SFTVHKY6JW | 04-06-2019 00:00:00 | 04-06-2019 08:00:00 | 8h0m0s   | 32h0m0s    | 760,586    | 145,680,267   | 1,797,671  | 2          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-1 | 0s         | compactor |
| 01DCH2AMY9FHB8X2ZXGSKWN6NV | 04-06-2019 08:00:00 | 04-06-2019 10:00:00 | 2h0m0s   | 38h0m0s    | 305,978    | 36,954,542    | 458,574    | 1          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-0 | 0s         | sidecar   |
| 01DCH2AMY9W6XF5VX9VGKA60WB | 04-06-2019 08:00:00 | 04-06-2019 10:00:00 | 2h0m0s   | 38h0m0s    | 306,010    | 36,954,625    | 458,583    | 1          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-1 | 0s         | sidecar   |
| 01DCH96C6A5ZGCVKRG5KHE1K8K | 04-06-2019 10:00:00 | 04-06-2019 12:00:00 | 2h0m0s   | 38h0m0s    | 310,222    | 37,821,226    | 465,667    | 1          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-0 | 0s         | sidecar   |
| 01DCH96C6BCTWWHWSE8Y058EYV | 04-06-2019 10:00:00 | 04-06-2019 12:00:00 | 2h0m0s   | 38h0m0s    | 310,285    | 37,821,465    | 465,730    | 1          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-1 | 0s         | sidecar   |
| 01DCHG23E89RQBKH3Y4M0R9BE6 | 04-06-2019 12:00:00 | 04-06-2019 14:00:00 | 2h0m0s   | 38h0m0s    | 344,556    | 39,185,563    | 505,107    | 1          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-0 | 0s         | sidecar   |
| 01DCHG23E9DBZX1V9NJSM3V40T | 04-06-2019 12:00:00 | 04-06-2019 14:00:00 | 2h0m0s   | 38h0m0s    | 344,526    | 39,185,516    | 505,077    | 1          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-1 | 0s         | sidecar   |
| 01DCHPXTP8VKWJREHBMQGP9VPG | 04-06-2019 14:00:00 | 04-06-2019 16:00:00 | 2h0m0s   | 38h0m0s    | 355,490    | 39,688,215    | 490,698    | 1          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-0 | 0s         | sidecar   |
| 01DCHPXTP9T0XK21BTB080P4VF | 04-06-2019 14:00:00 | 04-06-2019 16:00:00 | 2h0m0s   | 38h0m0s    | 355,579    | 39,688,483    | 490,787    | 1          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-1 | 0s         | sidecar   |
level=info ts=2019-06-04T18:24:42.358015552Z caller=main.go:185 msg=exiting

You can see we have data going back to March 21. Now if I try to query (through Grafana) for something ~30 days ago, the thanos-store logs this:

level=debug ts=2019-06-04T18:42:49.695627375Z caller=bucket.go:680 msg="No block found" mint=1557081300000 maxt=1557168240000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-0\"}"
level=debug ts=2019-06-04T18:42:49.695698683Z caller=bucket.go:680 msg="No block found" mint=1557081300000 maxt=1557168240000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-1\"}"
level=debug ts=2019-06-04T18:42:49.696227324Z caller=bucket.go:803 msg="stats query processed" stats="&{blocksQueried:0 postingsTouched:0 postingsTouchedSizeSum:0 postingsToFetch:0 postingsFetched:0 postingsFetchedSizeSum:0 postingsFetchCount:0 postingsFetchDurationSum:0 seriesTouched:0 seriesTouchedSizeSum:0 seriesFetched:0 seriesFetchedSizeSum:0 seriesFetchCount:0 seriesFetchDurationSum:0 chunksTouched:0 chunksTouchedSizeSum:0 chunksFetched:0 chunksFetchedSizeSum:0 chunksFetchCount:0 chunksFetchDurationSum:0 getAllDuration:891 mergedSeriesCount:0 mergedChunksCount:0 mergeDuration:1464}" err=null
level=debug ts=2019-06-04T18:42:49.698963513Z caller=bucket.go:680 msg="No block found" mint=1557081300000 maxt=1557168240000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-0\"}"
level=debug ts=2019-06-04T18:42:49.699003527Z caller=bucket.go:680 msg="No block found" mint=1557081300000 maxt=1557168240000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-1\"}"
level=debug ts=2019-06-04T18:42:49.699047663Z caller=bucket.go:803 msg="stats query processed" stats="&{blocksQueried:0 postingsTouched:0 postingsTouchedSizeSum:0 postingsToFetch:0 postingsFetched:0 postingsFetchedSizeSum:0 postingsFetchCount:0 postingsFetchDurationSum:0 seriesTouched:0 seriesTouchedSizeSum:0 seriesFetched:0 seriesFetchedSizeSum:0 seriesFetchCount:0 seriesFetchDurationSum:0 chunksTouched:0 chunksTouchedSizeSum:0 chunksFetched:0 chunksFetchedSizeSum:0 chunksFetchCount:0 chunksFetchDurationSum:0 getAllDuration:586 mergedSeriesCount:0 mergedChunksCount:0 mergeDuration:647}" err=null

Those epochs (mint and maxt) convert to May 5, 2019 6:35:00 PM through May 6, 2019 6:44:00 PM (UTC). Thanos should clearly have 1h-resolution data for that time period. From the thanos-validator:

thanos-inspector thanos-inspector | 01DAZDXAQ9K7355DP51WE3ESD8 | 02-05-2019 00:00:00 | 16-05-2019 00:00:00 | 336h0m0s | -          | 17,390,625 | 63,296,074    | 17,634,179 | 4          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-0 | 1h0m0s     | compactor |
thanos-inspector thanos-inspector | 01DAZFA84ZNSMCG1ABAE7NEPEZ | 02-05-2019 00:00:00 | 16-05-2019 00:00:00 | 336h0m0s | -          | 17,520,560 | 63,300,233    | 17,764,148 | 4          | false       | prometheus=monitoring/k8s,prometheus_replica=prometheus-k8s-1 | 1h0m0s     | compactor |
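
For reference, those mint/maxt millisecond timestamps can be sanity-checked from a shell with GNU date (expected output in the comments):

date -u -d @$((1557081300000 / 1000))   # Sun May  5 18:35:00 UTC 2019
date -u -d @$((1557168240000 / 1000))   # Mon May  6 18:44:00 UTC 2019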

I'm wondering if there is maybe something strange with my configuration? If I just query the last 7 days via Thanos I get something like this:
[screenshot]
The graph appears to be valid only for the most recent ~1.5 days, and the Thanos store spits out some errors:

level=debug ts=2019-06-04T19:04:28.693462744Z caller=bucket.go:706 msg="Blocks source resolutions" blocks=2 mint=1559318100000 maxt=1559375760000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-0\"}" spans="Range: 1559174400000-1559520000000 Resolution: 0"
level=debug ts=2019-06-04T19:04:28.693537619Z caller=bucket.go:706 msg="Blocks source resolutions" blocks=2 mint=1559318100000 maxt=1559375760000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-1\"}" spans="Range: 1559174400000-1559520000000 Resolution: 0"
level=debug ts=2019-06-04T19:04:28.71361419Z caller=bucket.go:706 msg="Blocks source resolutions" blocks=2 mint=1559318100000 maxt=1559375760000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-0\"}" spans="Range: 1559174400000-1559520000000 Resolution: 0"
level=debug ts=2019-06-04T19:04:28.713720578Z caller=bucket.go:706 msg="Blocks source resolutions" blocks=2 mint=1559318100000 maxt=1559375760000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-1\"}" spans="Range: 1559174400000-1559520000000 Resolution: 0"
level=error ts=2019-06-04T19:04:29.476652039Z caller=cache.go:226 msg="LRU has nothing more to evict, but we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=255923667 itemSize=14420276 cacheType=Postings
level=error ts=2019-06-04T19:04:29.522143972Z caller=cache.go:226 msg="LRU has nothing more to evict, but we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=255923667 itemSize=14950684 cacheType=Postings
level=error ts=2019-06-04T19:04:29.664753976Z caller=cache.go:226 msg="LRU has nothing more to evict, but we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=255923667 itemSize=14422312 cacheType=Postings
level=error ts=2019-06-04T19:04:29.745722162Z caller=cache.go:226 msg="LRU has nothing more to evict, but we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=255923667 itemSize=14420276 cacheType=Postings
level=error ts=2019-06-04T19:04:29.773629201Z caller=cache.go:226 msg="LRU has nothing more to evict, but we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=255923667 itemSize=14702936 cacheType=Postings
level=error ts=2019-06-04T19:04:29.789013145Z caller=cache.go:226 msg="LRU has nothing more to evict, but we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=255923667 itemSize=14422312 cacheType=Postings
level=error ts=2019-06-04T19:04:29.846656789Z caller=cache.go:226 msg="LRU has nothing more to evict, but we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=255923667 itemSize=14950684 cacheType=Postings
level=error ts=2019-06-04T19:04:29.869744414Z caller=cache.go:226 msg="LRU has nothing more to evict, but we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=255923667 itemSize=14702936 cacheType=Postings
level=debug ts=2019-06-04T19:04:29.97812717Z caller=bucket.go:803 msg="stats query processed" stats="&{blocksQueried:4 postingsTouched:8 postingsTouchedSizeSum:58496392 postingsToFetch:0 postingsFetched:8 postingsFetchedSizeSum:58496392 postingsFetchCount:8 postingsFetchDurationSum:4755997580 seriesTouched:58 seriesTouchedSizeSum:29809 seriesFetched:58 seriesFetchedSizeSum:290320 seriesFetchCount:4 seriesFetchDurationSum:295705245 chunksTouched:666 chunksTouchedSizeSum:165210 chunksFetched:666 chunksFetchedSizeSum:921646 chunksFetchCount:4 chunksFetchDurationSum:286589259 getAllDuration:1263680119 mergedSeriesCount:28 mergedChunksCount:666 mergeDuration:640406}" err=null
level=debug ts=2019-06-04T19:04:30.003435499Z caller=bucket.go:803 msg="stats query processed" stats="&{blocksQueried:4 postingsTouched:8 postingsTouchedSizeSum:58496392 postingsToFetch:0 postingsFetched:8 postingsFetchedSizeSum:58496392 postingsFetchCount:8 postingsFetchDurationSum:4740538082 seriesTouched:58 seriesTouchedSizeSum:29809 seriesFetched:58 seriesFetchedSizeSum:290320 seriesFetchCount:4 seriesFetchDurationSum:431509621 chunksTouched:666 chunksTouchedSizeSum:165210 chunksFetched:666 chunksFetchedSizeSum:921646 chunksFetchCount:4 chunksFetchDurationSum:347813172 getAllDuration:1309098696 mergedSeriesCount:28 mergedChunksCount:666 mergeDuration:735164}" err=null

But if I just zoom in on one of the days that looks invalid, then the data looks valid:
[screenshot]

If it's relevant, I did upgrade our Thanos version to 0.4.0 for all images a couple of days ago. Are there perhaps incompatibilities in the blocks between those versions?

What you expected to happen
Thanos to seamlessly query and return data from blocks

How to reproduce it (as minimally and precisely as possible):
Query Thanos for any 1h resolution data

Anything else we need to know


vaibhavkhurana2018 commented Jun 17, 2019

Hey, I am also facing the same issue with compact. I am also using version 0.4.0.

Thanos Store Logs:

level=debug ts=2019-06-17T06:41:29.511287237Z caller=bucket.go:680 msg="No block found" mint=1560666960000 maxt=1560710520000 lset="{monitor=\"prod-api\",replica=\"stage-blue-ap-south-1a\"}"
level=debug ts=2019-06-17T06:41:29.511439Z caller=bucket.go:680 msg="No block found" mint=1560666960000 maxt=1560710520000 lset="{monitor=\"prod-api\",replica=\"stage-blue-ap-south-1b\"}"
level=debug ts=2019-06-17T06:41:29.511474371Z caller=bucket.go:680 msg="No block found" mint=1560666960000 maxt=1560710520000 lset="{monitor=\"prod-api\",replica=\"ap-south-1b\"}"

Thanos Bucket Inspect:

/ # thanos bucket inspect --objstore.config-file=/etc/thanos/s3.yml
level=info ts=2019-06-17T06:44:24.743641854Z caller=factory.go:39 msg="loading bucket configuration"
|            ULID            |        FROM         |        UNTIL        |  RANGE   | UNTIL-COMP | #SERIES |  #SAMPLES  | #CHUNKS | COMP-LEVEL | COMP-FAILED |                     LABELS                      | RESOLUTION |  SOURCE   |
|----------------------------|---------------------|---------------------|----------|------------|---------|------------|---------|------------|-------------|-------------------------------------------------|------------|-----------|
| 01DCNYVK5NVNMMCF7EM0CE3PZ9 | 02-05-2019 08:00:00 | 16-05-2019 00:00:00 | 328h0m0s | -          | 953,925 | 8,951,219  | 955,580 | 4          | false       | monitor=prod-api,replica=stage-blue-ap-south-1a | 1h0m0s     | compactor |
| 01DCNYTCCCVCE5Q6RQHZQ2WJ00 | 05-05-2019 22:00:00 | 16-05-2019 00:00:00 | 242h0m0s | -          | 906,986 | 8,428,337  | 910,160 | 4          | false       | monitor=prod-api,replica=ap-south-1b            | 1h0m0s     | compactor |
| 01DCNYZTS89AY6QXHGKMBQHMA2 | 16-05-2019 00:00:00 | 30-05-2019 00:00:00 | 336h0m0s | -          | 825,591 | 32,687,479 | 961,148 | 4          | false       | monitor=prod-api,replica=ap-south-1b            | 1h0m0s     | compactor |
| 01DCNZ2T495ETEACB7RBF3ZGCX | 16-05-2019 00:00:00 | 30-05-2019 00:00:00 | 336h0m0s | -          | 831,417 | 32,659,675 | 967,567 | 4          | false       | monitor=prod-api,replica=stage-blue-ap-south-1a | 1h0m0s     | compactor |
| 01DDHSJGPRE739GAQ9NBR6ZDP9 | 17-06-2019 00:00:00 | 17-06-2019 02:00:00 | 2h0m0s   | 38h0m0s    | 76,100  | 9,301,446  | 153,210 | 2          | false       | monitor=prod-api,replica=stage-blue-ap-south-1a | 0s         | compactor |
| 01DDHSJPHPDYJ41405WVCMX7NX | 17-06-2019 00:00:00 | 17-06-2019 02:00:00 | 2h0m0s   | 38h0m0s    | 76,100  | 9,301,446  | 153,210 | 2          | false       | monitor=prod-api,replica=stage-blue-ap-south-1b | 0s         | compactor |
| 01DDJ0E7V3SMRNZ9V5PNP1KW2S | 17-06-2019 02:00:00 | 17-06-2019 04:00:00 | 2h0m0s   | 38h0m0s    | 76,100  | 9,300,982  | 153,210 | 2          | false       | monitor=prod-api,replica=stage-blue-ap-south-1a | 0s         | compactor |
| 01DDJ0ED9HCY0472AW6BCKMBEW | 17-06-2019 02:00:00 | 17-06-2019 04:00:00 | 2h0m0s   | 38h0m0s    | 76,100  | 9,300,750  | 153,152 | 2          | false       | monitor=prod-api,replica=stage-blue-ap-south-1b | 0s         | compactor |
| 01DDHYKR52WT823NZVD4ATYFW7 | 17-06-2019 04:00:00 | 17-06-2019 05:00:00 | 1h0m0s   | 39h0m0s    | 76,615  | 4,624,592  | 77,074  | 1          | false       | monitor=prod-api,replica=stage-blue-ap-south-1a | 0s         | sidecar   |
| 01DDHYKR5979679B3TW78CKXXH | 17-06-2019 04:00:00 | 17-06-2019 05:00:00 | 1h0m0s   | 39h0m0s    | 76,615  | 4,581,614  | 77,074  | 1          | false       | monitor=prod-api,replica=stage-blue-ap-south-1b | 0s         | sidecar   |
| 01DDJ21KS2Y4EY9S2RM2BXPS0Q | 17-06-2019 05:00:00 | 17-06-2019 06:00:00 | 1h0m0s   | 39h0m0s    | 76,687  | 4,670,076  | 77,169  | 1          | false       | monitor=prod-api,replica=stage-blue-ap-south-1a | 0s         | sidecar   |
| 01DDJ21KSAJJSP1681YAEEN00X | 17-06-2019 05:00:00 | 17-06-2019 06:00:00 | 1h0m0s   | 39h0m0s    | 76,687  | 4,670,076  | 77,169  | 1          | false       | monitor=prod-api,replica=stage-blue-ap-south-1b | 0s         | sidecar   |
level=info ts=2019-06-17T06:44:25.109209641Z caller=main.go:185 msg=exiting

Thanos Bucket Verify:

/ # thanos bucket verify --objstore.config-file=/etc/thanos/s3.yml
level=info ts=2019-06-17T06:45:20.11790173Z caller=factory.go:39 msg="loading bucket configuration"
level=warn ts=2019-06-17T06:45:20.118322786Z caller=verify.go:49 msg="GLOBAL COMPACTOR SHOULD __NOT__ BE RUNNING ON THE SAME BUCKET" issues=2 repair=false
level=info ts=2019-06-17T06:45:20.118353945Z caller=index_issue.go:29 msg="started verifying issue" with-repair=false issue=index_issue
level=info ts=2019-06-17T06:45:27.408800803Z caller=index_issue.go:130 msg="verified issue" with-repair=false issue=index_issue
level=info ts=2019-06-17T06:45:27.40884527Z caller=overlapped_blocks.go:26 msg="started verifying issue" with-repair=false issue=overlapped_blocks
level=info ts=2019-06-17T06:45:27.609243916Z caller=verify.go:68 msg="verify completed" issues=2 repair=false
level=info ts=2019-06-17T06:45:27.609473803Z caller=main.go:185 msg=exiting

Did you find any solution for this?


chrisghill commented Jun 17, 2019

@vaibhavkhurana2018 No, I haven't found a solution. Honestly I haven't really tried querying for a while, but I played around with it again today, and this is what I found:

I'm able to query Thanos back to about 13 days now, which, if you look at the timestamp on this issue, is right about when I updated our Thanos version. This makes me think there may be some backwards-incompatibility issue with Thanos. Everything that was written within the last 13 days is queryable, but when I go beyond that I get the dreaded "No block found" error:

level=debug ts=2019-06-17T22:27:09.896968242Z caller=bucket.go:680 msg="No block found" mint=1559600460000 maxt=1559687280000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-0\"}"
level=debug ts=2019-06-17T22:27:09.897002904Z caller=bucket.go:680 msg="No block found" mint=1559600460000 maxt=1559687280000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-1\"}"

If I query within the last 13 days it's slow, but it works. Additionally, there are numerous log messages that look concerning, even though the query at least succeeds:

Query Logs

level=debug ts=2019-06-17T22:29:17.842223669Z caller=bucket.go:706 msg="Blocks source resolutions" blocks=1 mint=1559687220000 maxt=1559773800000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-1\"}" spans="Range: 1559692800000-1560384000000 Resolution: 0"
level=debug ts=2019-06-17T22:29:17.842291568Z caller=bucket.go:706 msg="Blocks source resolutions" blocks=1 mint=1559687220000 maxt=1559773800000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-0\"}" spans="Range: 1559692800000-1560384000000 Resolution: 0"
level=debug ts=2019-06-17T22:29:17.842223674Z caller=bucket.go:706 msg="Blocks source resolutions" blocks=1 mint=1559686980000 maxt=1559773800000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-0\"}" spans="Range: 1559692800000-1560384000000 Resolution: 0"
level=debug ts=2019-06-17T22:29:17.842469182Z caller=bucket.go:706 msg="Blocks source resolutions" blocks=1 mint=1559686980000 maxt=1559773800000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-1\"}" spans="Range: 1559692800000-1560384000000 Resolution: 0"
level=debug ts=2019-06-17T22:29:17.861960672Z caller=bucket.go:706 msg="Blocks source resolutions" blocks=1 mint=1559686980000 maxt=1559773800000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-0\"}" spans="Range: 1559692800000-1560384000000 Resolution: 0"
level=debug ts=2019-06-17T22:29:17.862025205Z caller=bucket.go:706 msg="Blocks source resolutions" blocks=1 mint=1559686980000 maxt=1559773800000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-1\"}" spans="Range: 1559692800000-1560384000000 Resolution: 0"
level=error ts=2019-06-17T22:29:19.509153715Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261978605 itemSize=11250944 cacheType=Postings iterations=500
level=error ts=2019-06-17T22:29:19.761732178Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261923781 itemSize=28637540 cacheType=Postings iterations=500
level=error ts=2019-06-17T22:29:19.87261349Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261868985 itemSize=11254164 cacheType=Postings iterations=500
level=error ts=2019-06-17T22:29:20.149641951Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261814438 itemSize=1236028 cacheType=Postings iterations=500
level=error ts=2019-06-17T22:29:20.150448766Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261759904 itemSize=508736 cacheType=Postings iterations=500
level=error ts=2019-06-17T22:29:20.1514167Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261967005 itemSize=392848 cacheType=Postings iterations=500
level=error ts=2019-06-17T22:29:20.152151889Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261912517 itemSize=3832376 cacheType=Postings iterations=500
level=error ts=2019-06-17T22:29:20.153942646Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=262089275 itemSize=99192 cacheType=Postings iterations=500
level=error ts=2019-06-17T22:29:20.154682404Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=262034475 itemSize=11125116 cacheType=Postings iterations=500
level=error ts=2019-06-17T22:29:20.155477053Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=262054631 itemSize=779576 cacheType=Postings iterations=500
level=error ts=2019-06-17T22:29:20.156198787Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261999880 itemSize=599124 cacheType=Postings iterations=500
level=error ts=2019-06-17T22:29:20.15694545Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261945103 itemSize=439368 cacheType=Postings iterations=500
level=error ts=2019-06-17T22:29:20.157666447Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261890291 itemSize=1180672 cacheType=Postings iterations=500
level=error ts=2019-06-17T22:29:20.394099719Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261835359 itemSize=34686260 cacheType=Postings iterations=500
level=error ts=2019-06-17T22:29:20.428497469Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261780359 itemSize=75489084 cacheType=Postings iterations=500
level=error ts=2019-06-17T22:29:20.450136753Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261725373 itemSize=28642000 cacheType=Postings iterations=500
level=error ts=2019-06-17T22:29:20.459864495Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261670726 itemSize=75489084 cacheType=Postings iterations=500
level=debug ts=2019-06-17T22:29:20.520053678Z caller=bucket.go:803 msg="stats query processed" stats="&{blocksQueried:2 postingsTouched:20 postingsTouchedSizeSum:243053676 postingsToFetch:0 postingsFetched:12 postingsFetchedSizeSum:145634064 postingsFetchCount:8 postingsFetchDurationSum:21128514393 seriesTouched:76 seriesTouchedSizeSum:12254 seriesFetched:76 seriesFetchedSizeSum:809120 seriesFetchCount:2 seriesFetchDurationSum:136998325 chunksTouched:100 chunksTouchedSizeSum:5382 chunksFetched:100 chunksFetchedSizeSum:157046 chunksFetchCount:4 chunksFetchDurationSum:109515106 getAllDuration:9405842076 mergedSeriesCount:16 mergedChunksCount:100 mergeDuration:152332}" err=null
level=error ts=2019-06-17T22:29:20.540449717Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261632981 itemSize=1236880 cacheType=Postings iterations=500
level=error ts=2019-06-17T22:29:20.545210997Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=262086764 itemSize=261528 cacheType=Postings iterations=500
level=error ts=2019-06-17T22:29:20.546450916Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=262032086 itemSize=392676 cacheType=Postings iterations=500
level=error ts=2019-06-17T22:29:20.547458991Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261977623 itemSize=3830828 cacheType=Postings iterations=500
level=error ts=2019-06-17T22:29:20.548366925Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261923196 itemSize=278944 cacheType=Postings iterations=500
level=error ts=2019-06-17T22:29:20.549417121Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261967962 itemSize=11121960 cacheType=Postings iterations=500
level=debug ts=2019-06-17T22:29:20.550437253Z caller=bucket.go:803 msg="stats query processed" stats="&{blocksQueried:2 postingsTouched:20 postingsTouchedSizeSum:243067472 postingsToFetch:0 postingsFetched:12 postingsFetchedSizeSum:145640960 postingsFetchCount:8 postingsFetchDurationSum:20334814589 seriesTouched:114 seriesTouchedSizeSum:18381 seriesFetched:114 seriesFetchedSizeSum:822912 seriesFetchCount:2 seriesFetchDurationSum:189503236 chunksTouched:150 chunksTouchedSizeSum:8073 chunksFetched:150 chunksFetchedSizeSum:224547 chunksFetchCount:4 chunksFetchDurationSum:91131918 getAllDuration:9435350972 mergedSeriesCount:24 mergedChunksCount:150 mergeDuration:204829}" err=null
level=error ts=2019-06-17T22:29:20.551229894Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261988259 itemSize=779232 cacheType=Postings iterations=500
level=error ts=2019-06-17T22:29:20.551989619Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261933784 itemSize=598672 cacheType=Postings iterations=500
level=error ts=2019-06-17T22:29:20.552743774Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261878984 itemSize=440140 cacheType=Postings iterations=500
level=error ts=2019-06-17T22:29:20.553473604Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261824196 itemSize=1181616 cacheType=Postings iterations=500
level=error ts=2019-06-17T22:29:20.569654515Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261769378 itemSize=34682916 cacheType=Postings iterations=500
level=debug ts=2019-06-17T22:29:20.582903148Z caller=bucket.go:706 msg="Blocks source resolutions" blocks=1 mint=1559687220000 maxt=1559773800000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-0\"}" spans="Range: 1559692800000-1560384000000 Resolution: 0"
level=debug ts=2019-06-17T22:29:20.583035771Z caller=bucket.go:706 msg="Blocks source resolutions" blocks=1 mint=1559687220000 maxt=1559773800000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-1\"}" spans="Range: 1559692800000-1560384000000 Resolution: 0"
level=debug ts=2019-06-17T22:29:20.622190285Z caller=bucket.go:706 msg="Blocks source resolutions" blocks=1 mint=1559687220000 maxt=1559773800000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-0\"}" spans="Range: 1559692800000-1560384000000 Resolution: 0"
level=debug ts=2019-06-17T22:29:20.622270896Z caller=bucket.go:706 msg="Blocks source resolutions" blocks=1 mint=1559687220000 maxt=1559773800000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-1\"}" spans="Range: 1559692800000-1560384000000 Resolution: 0"
level=error ts=2019-06-17T22:29:20.724942771Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261874962 itemSize=34682916 cacheType=Postings iterations=500
level=error ts=2019-06-17T22:29:21.409821833Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261972759 itemSize=28642000 cacheType=Postings iterations=500
level=error ts=2019-06-17T22:29:23.266174612Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261917867 itemSize=34686260 cacheType=Postings iterations=500
level=error ts=2019-06-17T22:29:23.625356023Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261862969 itemSize=28637540 cacheType=Postings iterations=500
level=error ts=2019-06-17T22:29:23.652423906Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261809464 itemSize=75489084 cacheType=Postings iterations=500
level=debug ts=2019-06-17T22:29:23.691559366Z caller=bucket.go:803 msg="stats query processed" stats="&{blocksQueried:2 postingsTouched:20 postingsTouchedSizeSum:243063596 postingsToFetch:0 postingsFetched:4 postingsFetchedSizeSum:145345696 postingsFetchCount:4 postingsFetchDurationSum:14232776967 seriesTouched:76 seriesTouchedSizeSum:12254 seriesFetched:76 seriesFetchedSizeSum:809472 seriesFetchCount:2 seriesFetchDurationSum:161318818 chunksTouched:100 chunksTouchedSizeSum:5382 chunksFetched:100 chunksFetchedSizeSum:159872 chunksFetchCount:4 chunksFetchDurationSum:99366213 getAllDuration:5848877321 mergedSeriesCount:16 mergedChunksCount:100 mergeDuration:123212}" err=null
level=debug ts=2019-06-17T22:29:23.754265352Z caller=bucket.go:706 msg="Blocks source resolutions" blocks=1 mint=1559686980000 maxt=1559773800000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-0\"}" spans="Range: 1559692800000-1560384000000 Resolution: 0"
level=debug ts=2019-06-17T22:29:23.754328468Z caller=bucket.go:706 msg="Blocks source resolutions" blocks=1 mint=1559686980000 maxt=1559773800000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-1\"}" spans="Range: 1559692800000-1560384000000 Resolution: 0"
level=error ts=2019-06-17T22:29:24.056010033Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261761999 itemSize=75489084 cacheType=Postings iterations=500
level=debug ts=2019-06-17T22:29:24.175106673Z caller=bucket.go:803 msg="stats query processed" stats="&{blocksQueried:2 postingsTouched:10 postingsTouchedSizeSum:229256584 postingsToFetch:0 postingsFetched:4 postingsFetchedSizeSum:132914432 postingsFetchCount:4 postingsFetchDurationSum:7465475730 seriesTouched:4 seriesTouchedSizeSum:3818 seriesFetched:4 seriesFetchedSizeSum:262144 seriesFetchCount:4 seriesFetchDurationSum:122824239 chunksTouched:46 chunksTouchedSizeSum:23794 chunksFetched:46 chunksFetchedSizeSum:55060 chunksFetchCount:2 chunksFetchDurationSum:164528278 getAllDuration:3591907055 mergedSeriesCount:2 mergedChunksCount:46 mergeDuration:57135}" err=null
level=error ts=2019-06-17T22:29:24.212227098Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261709448 itemSize=28637540 cacheType=Postings iterations=500
level=error ts=2019-06-17T22:29:24.266647863Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261656312 itemSize=28642000 cacheType=Postings iterations=500
level=error ts=2019-06-17T22:29:24.571925854Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261602262 itemSize=75489084 cacheType=Postings iterations=500
level=error ts=2019-06-17T22:29:24.572598528Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261547763 itemSize=75489084 cacheType=Postings iterations=500
level=debug ts=2019-06-17T22:29:24.761043329Z caller=bucket.go:803 msg="stats query processed" stats="&{blocksQueried:2 postingsTouched:631 postingsTouchedSizeSum:447359028 postingsToFetch:0 postingsFetched:32 postingsFetchedSizeSum:198666508 postingsFetchCount:7 postingsFetchDurationSum:19940448677 seriesTouched:16 seriesTouchedSizeSum:15272 seriesFetched:16 seriesFetchedSizeSum:1413664 seriesFetchCount:2 seriesFetchDurationSum:223805153 chunksTouched:184 chunksTouchedSizeSum:113187 chunksFetched:184 chunksFetchedSizeSum:987682 chunksFetchCount:6 chunksFetchDurationSum:511363949 getAllDuration:6918257432 mergedSeriesCount:8 mergedChunksCount:184 mergeDuration:415523}" err=null
level=error ts=2019-06-17T22:29:25.033327016Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261501070 itemSize=75489084 cacheType=Postings iterations=500
level=debug ts=2019-06-17T22:29:25.117401192Z caller=bucket.go:803 msg="stats query processed" stats="&{blocksQueried:2 postingsTouched:20 postingsTouchedSizeSum:243049800 postingsToFetch:0 postingsFetched:4 postingsFetchedSizeSum:145345696 postingsFetchCount:4 postingsFetchDurationSum:12589982063 seriesTouched:76 seriesTouchedSizeSum:12254 seriesFetched:76 seriesFetchedSizeSum:809472 seriesFetchCount:2 seriesFetchDurationSum:63737463 chunksTouched:100 chunksTouchedSizeSum:5382 chunksFetched:100 chunksFetchedSizeSum:159872 chunksFetchCount:4 chunksFetchDurationSum:295860399 getAllDuration:7255182773 mergedSeriesCount:16 mergedChunksCount:100 mergeDuration:121579}" err=null
level=error ts=2019-06-17T22:29:25.629826988Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261453394 itemSize=75489084 cacheType=Postings iterations=500
level=error ts=2019-06-17T22:29:25.657492713Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261400962 itemSize=34686260 cacheType=Postings iterations=500
level=error ts=2019-06-17T22:29:25.67908952Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261346625 itemSize=34682916 cacheType=Postings iterations=500
level=debug ts=2019-06-17T22:29:25.711947674Z caller=bucket.go:803 msg="stats query processed" stats="&{blocksQueried:2 postingsTouched:10 postingsTouchedSizeSum:229256584 postingsToFetch:0 postingsFetched:4 postingsFetchedSizeSum:132914432 postingsFetchCount:4 postingsFetchDurationSum:12364152490 seriesTouched:4 seriesTouchedSizeSum:3818 seriesFetched:4 seriesFetchedSizeSum:262144 seriesFetchCount:4 seriesFetchDurationSum:156865918 chunksTouched:46 chunksTouchedSizeSum:20692 chunksFetched:46 chunksFetchedSizeSum:52116 chunksFetchCount:2 chunksFetchDurationSum:140349614 getAllDuration:5089480010 mergedSeriesCount:2 mergedChunksCount:46 mergeDuration:124084}" err=null
level=error ts=2019-06-17T22:29:25.961405244Z caller=cache.go:212 msg="After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." maxItemSizeBytes=131072000 maxSizeBytes=262144000 curSize=261291951 itemSize=75489084 cacheType=Postings iterations=500
level=debug ts=2019-06-17T22:29:25.98243361Z caller=bucket.go:803 msg="stats query processed" stats="&{blocksQueried:2 postingsTouched:18 postingsTouchedSizeSum:242451608 postingsToFetch:0 postingsFetched:3 postingsFetchedSizeSum:144858212 postingsFetchCount:3 postingsFetchDurationSum:6032019883 seriesTouched:144 seriesTouchedSizeSum:21260 seriesFetched:0 seriesFetchedSizeSum:0 seriesFetchCount:0 seriesFetchDurationSum:0 chunksTouched:200 chunksTouchedSizeSum:10764 chunksFetched:200 chunksFetchedSizeSum:1420206 chunksFetchCount:2 chunksFetchDurationSum:46210480 getAllDuration:2227790790 mergedSeriesCount:32 mergedChunksCount:200 mergeDuration:243727}" err=null

That is just an excerpt, and I don't know whether that's normal or not. Something that has plagued us and is worth being aware of: Thanos is VERY memory intensive. Make sure you are giving your instance enough memory:
[screenshot]

You can see that our instance idles at just under 14GiB, but as soon as I performed a query it quickly jumped up to over 28GiB. We've had stability issues with thanos-store in the past when we didn't give it enough memory.
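
For what it's worth, the "LRU has nothing more to evict" errors above come from the store's in-memory index cache hitting its limit: maxSizeBytes=262144000 is 250MiB, which looks like the default --index-cache-size. A sketch of the relevant store flags, with the sizes and the bucket config path purely illustrative:

thanos store \
  --objstore.config-file=/etc/thanos/bucket.yml \
  --index-cache-size=1GB \
  --chunk-pool-size=4GB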

Memory wouldn't cause the "No block found" issue though. Did you recently update Thanos versions? I'm wondering if you're having backwards compatibility issues as well.


ivan-kiselev commented Sep 18, 2019

Running into exactly the same issue with Thanos 0.7.0.

Basically no downsampled data is available, only raw data.

For testing, we're using a simple up query in the query interface and chose a moment in time where only downsampled blocks exist (no raw data). The downsampled blocks (at least those we're requesting data from) were compacted with Thanos 0.6.0, though.

level=debug ts=2019-09-18T15:45:13.9626336Z caller=bucket.go:680 msg="No block found" mint=1566401808898 maxt=1566402108898 lset="$SOME_LABELS"

@ivan-kiselev

Just thinking out loud: @chrisghill @vaibhavkhurana2018 did you guys change the compactor's downsampling/retention settings during the lifetime of the data? I.e., you already had some data the compactor had worked on, and then changed the amount of time you store the raw/5m/1h resolutions?

@vaibhavkhurana2018

@homelessnessbo No, I haven't made any changes to the config since enabling compact. But this shouldn't matter in an ideal case, because it's something that can change over time based on the use case.

@chrisghill

@homelessnessbo It's been a long time, but I do remember that we changed the retention of the raw and 5m data (to be shorter - a week for raw and 1 month for 5m, I think). And it very likely coincided with these issues.


bwplotka commented Sep 20, 2019

  1. So msg="No block found" is just a debug log saying that there are some blocks matching the external labels, but none in the time period that was requested. This is most likely totally fine, as you might have created some sources (Prometheus) with certain external labels and then changed them on the fly at some point. Agreed, it could be a more explicit log message (:
  2. Have you guys seen compaction: Be clear about storage retention and downsampling. #813? It might be useful. TL;DR: downsampling is really for performing large range queries. If you zoom into downsampled data you see, as you can imagine, low-resolution data, so queries need to be crafted that way (e.g. the rate interval needs to be large enough - see the sketch after this list).
  3. There are zero chances of incompatibilities between Thanos versions - we never changed the format; it's still a native TSDB block. In fact, we will probably never change it. It's even rare to have any incompatibility between Thanos and Prometheus. We keep it tight, so I would rule this out.
  4. @chrischdi changing retention might well have coincided with these issues, because you literally removed raw data. Since downsampled data is, again, downsampled (:, some queries need to be more carefully crafted when touching it. Especially for the graph you mentioned in the beginning (Thanos store: "No block found" but thanos-validator shows blocks #1220 (comment)): what query do you actually run?
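
To make point 2 concrete, here is a minimal sketch of the idea (the querier address, metric name and timestamp are illustrative, not taken from this issue): over blocks downsampled to 1h resolution, a 5m rate window covers at most one sample and comes back empty, while a window of a few hours works.

curl -sG 'http://localhost:10902/api/v1/query' \
  --data-urlencode 'query=rate(some_requests_total[5m])' \
  --data-urlencode 'time=1557081300'    # likely empty over 1h-resolution blocks
curl -sG 'http://localhost:10902/api/v1/query' \
  --data-urlencode 'query=rate(some_requests_total[3h])' \
  --data-urlencode 'time=1557081300'    # wide enough window returns data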

> That is just an excerpt, and I don't know whether that's normal or not. Something that has plagued us and is worth being aware of: Thanos is VERY memory intensive. Make sure you are giving your instance enough memory:

Yup, we are aware (: Strong efforts are being made on this front:

Help is welcome (:

Anyway, the summary is that we need better docs on how to use downsampling. TBH I was opposed to adding any retention to Thanos because of issues exactly like this. Thanos is designed to store "unlimited" data; to do that, we created downsampling so that you can query very long ranges (years) with reasonable performance. Raw data is still very useful for zooming in on old historical data. In all the deployments I do, I recommend keeping the same retention for all resolutions: raw, 5m, and 1h. Maybe we should disable custom retentions like this? 🤔 We would love to let users fit their own cases, but it looks like we failed on the communication side!
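
A minimal sketch of that recommendation (the bucket config path is illustrative; 0d means "keep forever"):

thanos compact \
  --objstore.config-file=/etc/thanos/bucket.yml \
  --retention.resolution-raw=0d \
  --retention.resolution-5m=0d \
  --retention.resolution-1h=0d \
  --wait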


ivan-kiselev commented Sep 20, 2019

TL;DR from me: I was confused the whole time I was working with Thanos; I had a fundamental misunderstanding of how the downsampling settings should be set, and that's how I created a problem for myself.

@chrischdi

Hi there,

I'd like to share some data from our configuration to show how we got compaction + retention working for us.

We have set the retention to the following values:

  • --retention.resolution-raw=46h
  • --retention.resolution-5m=14d

The result is the following for us:

|            ULID            |        FROM         |        UNTIL        |  RANGE   | UNTIL-COMP |  #SERIES   |   #SAMPLES    |  #CHUNKS   | COMP-LEVEL | COMP-FAILED |             LABELS              | RESOLUTION |  SOURCE   |
|----------------------------|---------------------|---------------------|----------|------------|------------|---------------|------------|------------|-------------|---------------------------------|------------|-----------|
| 01D3SGX9VCCPAGZ87KDHRHZ4SC | 28-01-2019 00:00:00 | 07-02-2019 00:00:00 | 240h0m0s | -          | 8,727,454  | 218,478,951   | 9,515,410  | 4          | false       | monitor=prometheus,replica=0    | 1h0m0s     | compactor |
| 01D4HJN7GR602A48BV5TEA78YE | 11-02-2019 00:00:00 | 21-02-2019 00:00:00 | 240h0m0s | -          | 5,687,200  | 153,850,970   | 6,628,290  | 4          | false       | monitor=prometheus,replica=0    | 1h0m0s     | compactor |
| 01D5NH5DQJY9XQ2J4N2VW2C2FE | 25-02-2019 00:00:00 | 07-03-2019 00:00:00 | 240h0m0s | -          | 2,792,814  | 115,767,571   | 3,189,006  | 4          | false       | monitor=prometheus,replica=0    | 1h0m0s     | compactor |
| 01D6SJJ000GFQKAVT4P9QQJY73 | 11-03-2019 00:00:00 | 21-03-2019 00:00:00 | 240h0m0s | -          | 2,649,311  | 109,970,775   | 3,352,141  | 4          | false       | monitor=prometheus,replica=0    | 1h0m0s     | compactor |
| 01D7XSWNCT39ZMPM0MDJ33FC35 | 25-03-2019 00:00:00 | 04-04-2019 00:00:00 | 240h0m0s | -          | 2,499,032  | 85,842,752    | 2,840,307  | 4          | false       | monitor=prometheus,replica=0    | 1h0m0s     | compactor |
| 01D91V1D6P40VKF73VDQTE2WQC | 08-04-2019 00:00:00 | 18-04-2019 00:00:00 | 240h0m0s | -          | 3,124,019  | 93,921,358    | 3,543,442  | 4          | false       | monitor=prometheus,replica=0    | 1h0m0s     | compactor |
| 01DA5YQHJK3XBQR4F02T8J8HFM | 22-04-2019 00:00:00 | 02-05-2019 00:00:00 | 240h0m0s | -          | 2,951,961  | 97,259,114    | 3,456,094  | 4          | false       | monitor=prometheus,replica=0    | 1h0m0s     | compactor |
| 01DBA1SGZVYYJN8NXH2BR6X7W0 | 06-05-2019 00:00:00 | 16-05-2019 00:00:00 | 240h0m0s | -          | 2,815,698  | 99,389,940    | 3,340,075  | 4          | false       | monitor=prometheus,replica=0    | 1h0m0s     | compactor |
| 01DCE32FM6Z2SYGRQGNTAFQQD6 | 20-05-2019 00:00:00 | 30-05-2019 00:00:00 | 240h0m0s | -          | 3,365,957  | 102,356,123   | 3,852,387  | 4          | false       | monitor=prometheus,replica=0    | 1h0m0s     | compactor |
| 01DDJ47X7HY7238KA814KPKPYT | 03-06-2019 00:00:00 | 13-06-2019 00:00:00 | 240h0m0s | -          | 3,395,748  | 99,172,038    | 3,922,885  | 4          | false       | monitor=prometheus,replica=0    | 1h0m0s     | compactor |
| 01DEP588N2YVWY2TT94500QH63 | 17-06-2019 00:00:00 | 27-06-2019 00:00:00 | 240h0m0s | -          | 3,554,464  | 109,446,814   | 4,280,032  | 4          | false       | monitor=prometheus,replica=0    | 1h0m0s     | compactor |
| 01DFT66C5RS7N3J1APWYVB30W3 | 01-07-2019 00:00:00 | 11-07-2019 00:00:00 | 240h0m0s | -          | 4,853,744  | 95,072,315    | 5,111,432  | 4          | false       | monitor=prometheus,replica=0    | 1h0m0s     | compactor |
| 01DGY8QNVKZ213N7BJ4RYH9NQK | 15-07-2019 00:00:00 | 25-07-2019 00:00:00 | 240h0m0s | -          | 3,139,250  | 93,439,490    | 3,624,361  | 4          | false       | monitor=prometheus,replica=0    | 1h0m0s     | compactor |
| 01DJ28RDQ2ARWT9DNYANGBF1C3 | 29-07-2019 00:00:00 | 08-08-2019 00:00:00 | 240h0m0s | -          | 942,400    | 100,833,253   | 1,594,913  | 4          | false       | monitor=prometheus,replica=0    | 1h0m0s     | compactor |
| 01DK6AC4HXJ45ZVD1ZVJ8SYGTB | 12-08-2019 00:00:00 | 22-08-2019 00:00:00 | 240h0m0s | -          | 1,058,652  | 108,234,942   | 1,844,729  | 4          | false       | monitor=prometheus,replica=0    | 1h0m0s     | compactor |
| 01DMAFPAA14Z567GM1G8Z5FVME | 26-08-2019 00:00:00 | 05-09-2019 00:00:00 | 240h0m0s | -          | 1,642,073  | 104,166,401   | 2,382,216  | 4          | false       | monitor=prometheus,replica=0    | 1h0m0s     | compactor |
| 01DNEC9PQF1TK49N7GXGKR8VKR | 09-09-2019 00:00:00 | 19-09-2019 00:00:00 | 240h0m0s | 0s         | 2,000,241  | 1,662,211,799 | 14,677,928 | 4          | false       | monitor=prometheus,replica=0    | 5m0s       | compactor |
| 01DNEN7QZMQQJ3H864R1MW66NZ | 09-09-2019 00:00:00 | 19-09-2019 00:00:00 | 240h0m0s | -          | 2,000,241  | 140,198,619   | 2,698,296  | 4          | false       | monitor=prometheus,replica=0    | 1h0m0s     | compactor |
| 01DN93DZMVZA8WQR1RGG04TCA1 | 19-09-2019 00:00:00 | 21-09-2019 00:00:00 | 48h0m0s  | 192h0m0s   | 742,031    | 345,955,789   | 2,975,106  | 3          | false       | monitor=prometheus,replica=0    | 5m0s       | compactor |
| 01DNE46KXKYM9AGJ62N2Q5WM8B | 21-09-2019 00:00:00 | 23-09-2019 00:00:00 | 48h0m0s  | -8h0m0s    | 644,839    | 1,643,500,497 | 14,365,934 | 3          | false       | monitor=prometheus,replica=0    | 0s         | compactor |
| 01DNE7TP6DP90DFY21STGBMQFB | 21-09-2019 00:00:00 | 23-09-2019 00:00:00 | 48h0m0s  | 192h0m0s   | 644,839    | 343,689,121   | 2,879,455  | 3          | false       | monitor=prometheus,replica=0    | 5m0s       | compactor |
| 01DNEZ372XDNE646Z4R37SA6ZG | 23-09-2019 00:00:00 | 23-09-2019 08:00:00 | 8h0m0s   | 32h0m0s    | 608,272    | 274,769,377   | 2,397,226  | 2          | false       | monitor=prometheus,replica=0    | 0s         | compactor |
| 01DNEWEMVEZFN5WWNVCC2YZPEC | 23-09-2019 08:00:00 | 23-09-2019 10:00:00 | 2h0m0s   | 38h0m0s    | 616,245    | 69,017,883    | 616,245    | 1          | false       | monitor=prometheus,replica=0    | 0s         | sidecar   |
| 01DNF3AC3F7293PT5CPFDEVCX0 | 23-09-2019 10:00:00 | 23-09-2019 12:00:00 | 2h0m0s   | 38h0m0s    | 609,597    | 69,714,629    | 609,597    | 1          | false       | monitor=prometheus,replica=0    | 0s         | sidecar   |
| 01DNFA63BEHCSGJQJK62F08JP6 | 23-09-2019 12:00:00 | 23-09-2019 14:00:00 | 2h0m0s   | 38h0m0s    | 612,731    | 69,730,258    | 612,731    | 1          | false       | monitor=prometheus,replica=0    | 0s         | sidecar   |
| 01DNFH1TKDJ7REVEGKZ5ZR6RAJ | 23-09-2019 14:00:00 | 23-09-2019 16:00:00 | 2h0m0s   | 38h0m0s    | 609,186    | 69,742,328    | 609,186    | 1          | false       | monitor=prometheus,replica=0    | 0s         | sidecar   |
| 01DNFQXHVDH91H3H95NSYVQQC7 | 23-09-2019 16:00:00 | 23-09-2019 18:00:00 | 2h0m0s   | 38h0m0s    | 607,612    | 69,749,165    | 607,612    | 1          | false       | monitor=prometheus,replica=0    | 0s         | sidecar   |

As you can see, we have raw blocks with a maximum range of 48h and 5m blocks with a maximum range of 240h/10d.

How compaction & retention works for us

  • raw blocks of 2h range get uploaded
  • raw blocks get compacted up to a maximum range of 48h
  • 48h raw blocks get downsampled to 5m blocks covering the same 48h range
  • retention ensures that the 48h raw blocks don't get compacted into larger ranges
  • up to 5*48h/240h of 5m blocks get downsampled to 240h 1h blocks
  • retention ensures that the 240h 5m blocks don't get compacted into larger ranges

Also important for us, to make querying downsampled data work, was setting --query.auto-downsampling on thanos query.
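
A rough sketch of what that looks like in a querier invocation (the store addresses are placeholders, not from our setup):

thanos query \
  --http-address=0.0.0.0:10902 \
  --store=thanos-store.monitoring.svc:10901 \
  --store=prometheus-sidecar.monitoring.svc:10901 \
  --query.auto-downsampling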

@bwplotka

Hope that this helps: https://thanos.io/components/compact.md/#downsampling-resolution-and-retention

AC:

  • Remove confusing "no block found" log message.


stale bot commented Jan 11, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label on Jan 11, 2020
stale bot closed this as completed on Jan 18, 2020