storage: unexpected RangeNotFound keeps query spinning indefinitely #18249

Closed
asubiotto opened this issue Sep 6, 2017 · 6 comments

@asubiotto
Contributor

Running `SELECT * FROM tpch.lineitem ORDER BY l_extendedprice` (scale factor 10) on the 6-node navy cluster using a9ef834 gets stuck retrying because a range was not found. The trace repeats indefinitely:

14:58:10.644415     .     5    ... [client=127.0.0.1:44460,user=root,TableReader=58,n1] evicting range descriptor on send error and backoff for re-lookup
14:58:10.644432     .    17    ... [client=127.0.0.1:44460,user=root,TableReader=58,n1] evicting cached range descriptor
14:58:11.501091     .856660    ... [client=127.0.0.1:44460,user=root,TableReader=58,n1] looked up range descriptor
14:58:11.501137     .    45    ... [client=127.0.0.1:44460,user=root,TableReader=58,n1] r322: sending batch 1 Scan to (n1,s1):10
14:58:11.501156     .    19    ... [client=127.0.0.1:44460,user=root,TableReader=58,n1] sending request to local server
14:58:11.501491     .   335    ... [client=127.0.0.1:44460,user=root,TableReader=58,n1] reply error [txn: 4f0e104f], Scan [/Table/58/1/233865057230618625/0,/Table/58/1/233865058872590337), Scan [/Table/58/1/233866733048594433,/Table/58/1/233866852547166209), Scan [/Table/58/1/233867091829391361,/Table/58/1/233867213002047489), Scan [/Table/58/1/233868905105752065,/Table/58/1/233869025231437825), Scan [/Table/58/1/233869264913727489,/Table/58/1/233869384511586305), Scan [/Table/58/1/233870224099835905,/Table/58/1/233870343914782721), Scan [/Table/58/1/233871182574813185,/Table/58/1/233871302725304321), Scan [/Table/58/1/233871900814639105,/Table/58/1/233872020037173249), Scan [/Table/58/1/233872500288815105,/Table/58/1/233872620173230081), Scan [/Table/58/1/233873219753541633,/Table/58/1/233873339812577281), Scan [/Table/58/1/233873579537498113,/Table/58/1/233873698344501249), Scan [/Table/58/1/233874779602518017,/Table/58/1/233874900426653697), Scan [/Table/58/1/233875738869923841,/Table/58/1/233875858019778561), Scan [/Table/58/1/233876098224390145,/Table/58/1/233876217745309697), Scan [/Table/58/1/233876820088356865,/Table/58/1/233876940421955585), Scan [/Table/58/1/233877662931091457,/Table/58/1/233877784951848961), Scan [/Table/58/1/233878026081107969,/Table/58/1/233878147594321921), Scan [/Table/58/1/233879113140600833,/Table/58/1/233879234814836737), Scan [/Table/58/1/233880076476809217,/Table/58/1/233880195646390273), Scan [/Table/58/1/233880316310290433,/Table/58/1/233880436420739073), ... 3 skipped ..., Scan [/Table/58/1/233882733897023489,/Table/58/1/233882855701872641), Scan [/Table/58/1/233884066017542145,/Table/58/1/233884186508394497), Scan [/Table/58/1/233885516440207361,/Table/58/1/233885757016866817), Scan [/Table/58/1/233886361891733505,/Table/58/1/233886481956732929), Scan [/Table/58/1/233886601671671809,/Table/58/1/233886722436595713): r322 was not found
14:58:11.501496     .     5    ... [client=127.0.0.1:44460,user=root,TableReader=58,n1] evicting range descriptor on send error and backoff for re-lookup
14:58:11.501515     .    19    ... [client=127.0.0.1:44460,user=root,TableReader=58,n1] evicting cached range descriptor
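For illustration only, here is a minimal Go sketch of the retry loop the trace shows (this is not CockroachDB's DistSender code; the names `lookupRangeDescriptor` and `sendBatch` are hypothetical stand-ins): the batch is sent, the reply is a RangeNotFound error, the cached range descriptor is evicted, and after a backoff the descriptor is looked up and the batch re-sent. If the lookup keeps returning the same stale descriptor and nothing bounds the retries, the loop spins forever.

```go
// Simplified sketch of the evict/re-lookup/retry loop seen in the trace.
// Not the actual DistSender implementation.
package main

import (
	"errors"
	"fmt"
	"time"
)

// rangeDescriptor is a stand-in for a cached range descriptor.
type rangeDescriptor struct {
	rangeID int
	replica string
}

var errRangeNotFound = errors.New("range was not found")

// lookupRangeDescriptor stands in for the descriptor re-lookup that follows a
// cache eviction. In this sketch it always returns the same descriptor,
// mimicking a lookup that never learns anything new about r322.
func lookupRangeDescriptor() rangeDescriptor {
	return rangeDescriptor{rangeID: 322, replica: "(n1,s1):10"}
}

// sendBatch stands in for sending the Scan batch to the replica named in the
// descriptor. Here it always fails with RangeNotFound, as in the trace.
func sendBatch(desc rangeDescriptor) error {
	return fmt.Errorf("r%d %w", desc.rangeID, errRangeNotFound)
}

func main() {
	backoff := 100 * time.Millisecond
	for attempt := 1; ; attempt++ {
		desc := lookupRangeDescriptor()
		fmt.Printf("attempt %d: sending batch to r%d %s\n", attempt, desc.rangeID, desc.replica)
		err := sendBatch(desc)
		if err == nil {
			return
		}
		if errors.Is(err, errRangeNotFound) {
			// Evict the cached descriptor and back off before re-looking it
			// up. With nothing bounding the retries, this spins indefinitely.
			fmt.Printf("reply error: %v; evicting cached range descriptor\n", err)
			time.Sleep(backoff)
			if attempt == 3 { // cut the demo short; the real loop has no such cap
				return
			}
		}
	}
}
```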
@asubiotto asubiotto added this to the 1.1 milestone Sep 6, 2017
@petermattis
Collaborator

I'll take a look.

@petermattis petermattis self-assigned this Sep 6, 2017
@petermattis
Collaborator

I poked around the cluster and didn't see anything wrong. Coming back to this 30 minutes later, it looks like the query finished.

@asubiotto
Contributor Author

Thanks for taking a look. It had been running since ~5pm yesterday. Is it possible that something you did while debugging unwedged it? Will try reproducing.

@asubiotto
Contributor Author

Query seems to be finishing properly now. Closing.

@petermattis
Collaborator

How long do you expect this query to take? How much data is it scanning/sorting?

@asubiotto
Contributor Author

It's sorting 25GB of data. This is the query I'm investigating in #15332, so I haven't seen it run properly to completion yet (the "finishing properly" in the previous comment should just be "finishing"), but the usual running time for this query is ~17min.
