upload-bin may upload the same index file twice #3762

Closed
AnnaShaleva opened this issue Dec 17, 2024 · 2 comments · Fixed by #3763

AnnaShaleva commented Dec 17, 2024

Current Behavior

upload-bin completed successfully (it uploaded the full N3 mainnet chain) and was then automatically restarted. After the restart, the log reports a duplicated index file:

2024-12-17 08:04:41.337	Successfully uploaded index file  48	
2024-12-17 08:04:41.338	Processing batch from 6272000 to 6399999	
2024-12-17 08:17:16.691	Successfully processed batch of blocks: from 6272000 to 6399999	
2024-12-17 08:17:17.628	Successfully uploaded index file  49	
2024-12-17 08:17:17.628	Processing batch from 6400000 to 6512280	
2024-12-17 08:27:59.983	Successfully processed batch of blocks: from 6400000 to 6512280	
2024-12-17 08:29:01.366	Chain block height: 6516072	
2024-12-17 08:29:05.335	failed to find objects: duplicated index file 22 found	
2024-12-17 08:30:06.750	Chain block height: 6516076	
2024-12-17 08:30:10.802	failed to find objects: duplicated index file 22 found	
2024-12-17 08:31:12.089	Chain block height: 6516080	
2024-12-17 08:31:15.770	failed to find objects: duplicated index file 22 found
...

However, no error was logged when the 22nd index file was uploaded:

2024-12-16 22:37:40.798	Successfully uploaded index file  21	
2024-12-16 22:37:40.799	Processing batch from 2816000 to 2943999	
2024-12-16 22:58:47.451	Successfully processed batch of blocks: from 2816000 to 2943999	
2024-12-16 22:58:51.182	Successfully uploaded index file  22	
2024-12-16 22:58:51.182	Processing batch from 2944000 to 3071999	
2024-12-16 23:20:26.534	Successfully processed batch of blocks: from 2944000 to 3071999	
2024-12-16 23:20:27.464	Successfully uploaded index file  23	
2024-12-16 23:20:27.465	Processing batch from 3072000 to 3199999	
2024-12-16 23:42:55.080	Successfully processed batch of blocks: from 3072000 to 3199999	
2024-12-16 23:42:56.545	Successfully uploaded index file  24
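
The logs above suggest that every index file covers one fixed-size batch of blocks: the batch from 2816000 to 2943999 spans 128000 blocks and is followed by index file 22. A minimal sketch of that arithmetic, assuming the batch size is constant for the whole run (the `indexRange` helper is illustrative and not taken from neo-go):

```go
package main

import "fmt"

// indexRange returns the first and last block heights covered by the index
// file with the given number, assuming fixed-size batches of indexSize blocks.
// The name and signature are illustrative and not taken from neo-go.
func indexRange(index, indexSize uint32) (first, last uint32) {
	first = index * indexSize
	last = first + indexSize - 1
	return first, last
}

func main() {
	const indexSize = 128000 // batch size inferred from the logs above

	for _, idx := range []uint32{22, 49} {
		first, last := indexRange(idx, indexSize)
		fmt.Printf("index file %d: blocks %d to %d\n", idx, first, last)
	}
}
```

Under this assumption, index files 22 and 49 map to exactly the batches logged right before their "Successfully uploaded" lines.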

Expected Behavior

No duplicated index files should be uploaded during normal upload-bin operation, since the script was not interrupted.

Possible Solution

  1. Manually inspect and compare the contents of both copies of the 22nd index file: check how they differ and when each object was created, and attach this information to the issue.
  2. Review the code to find out how two duplicate index files can be uploaded, and fix the problem at the code level.
  3. Allow upload-bin to continue even if multiple index files with the same index are found (but log a warning).
  4. Manually remove the duplicate index file once the investigation is done and all relevant PRs are merged.
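
Item 3 could look roughly like the sketch below: keep the first object found for every index and turn each repeat into a warning instead of an error. The `IndexObject` type, the `firstPerIndex` helper and the object IDs are hypothetical placeholders, not the actual upload-bin code.

```go
package main

import "fmt"

// IndexObject is a hypothetical, simplified search result: the value of the
// Index attribute and the ID of the object that carries it.
type IndexObject struct {
	Index uint32
	OID   string
}

// firstPerIndex keeps the first object seen for every index and returns a
// warning for each later object that repeats an already seen index.
func firstPerIndex(found []IndexObject) (map[uint32]string, []string) {
	chosen := make(map[uint32]string, len(found))
	var warnings []string
	for _, obj := range found {
		if kept, ok := chosen[obj.Index]; ok {
			warnings = append(warnings, fmt.Sprintf(
				"duplicated index file %d found: keeping %s, ignoring %s",
				obj.Index, kept, obj.OID))
			continue
		}
		chosen[obj.Index] = obj.OID
	}
	return chosen, warnings
}

func main() {
	// Placeholder object IDs, used only for illustration.
	found := []IndexObject{
		{Index: 21, OID: "oid-21"},
		{Index: 22, OID: "oid-22-a"},
		{Index: 22, OID: "oid-22-b"},
	}

	chosen, warnings := firstPerIndex(found)
	fmt.Println("chosen:", chosen)
	for _, w := range warnings {
		fmt.Println("warning:", w)
	}
}
```

Taking the first result keeps the choice deterministic only if the search order is stable, which this sketch simply assumes.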

Steps to Reproduce

Check the logs of the upload-bin instance running for N3 mainnet ({unit="neogo-block-uploader-mainnet.service"}); the container ID is 3RCdP3ZubyKyo8qFeo7EJPryidTZaGCMdUjqFJaaEKBV.

AnnaShaleva added the bug (Something isn't working), U2 (Seriously planned), cli (Command line interface), S3 (Minimally significant), and I4 (No visible changes) labels Dec 17, 2024
roman-khimov changed the title from "upload-bin may upload the same index file twise" to "upload-bin may upload the same index file twice" Dec 17, 2024
roman-khimov added this to the v0.108.0 milestone Dec 17, 2024
@AliceInHunterland
Contributor

Indeed we do have a duplicated index file:

(base) ekaterinapavlova@MacBook-Air-4 neofs-node % ./bin/neofs-cli object head -r st4.storage.fs.neo.org:8080 -w /Users/ekaterinapavlova/Workplace/neo-go/panelwallet1.json --cid 3RCdP3ZubyKyo8qFeo7EJPryidTZaGCMdUjqFJaaEKBV --oid 73AsZJzibuRoyAvTgH1KN3NgrzbUAvuCiyEqjnYVLkYm
ID: 73AsZJzibuRoyAvTgH1KN3NgrzbUAvuCiyEqjnYVLkYm
CID: 3RCdP3ZubyKyo8qFeo7EJPryidTZaGCMdUjqFJaaEKBV
Owner: NVvY1FF67XJ2GTVhy9FqiZGC4jEQtvjmHt
CreatedAt: 28209
Size: 4096000
HomoHash: 050f1c5a364c0d1116d96c919c0a4ff563327a18067d9e91cd1f9ce804baae966879aedbd5a988b60bd56c17df9b564d60258aeb8b0a43e1baff9cd55cab2ae7
Checksum: 332cdf611958fb79fa7864390399baec39333327e2f63524e373c5f248884525
Type: REGULAR
Attributes:
  Index=22
  IndexSize=128000
  Timestamp=1734389927 (2024-12-17 01:58:47 +0300 MSK)
ID signature:
  public key: 02a4c3ab944c2a79c831c78d892ec71a5ea3e83f50327057fa4c4cbcd2f9c1b688
  signature: 04e1c1ec2550213bf6ee86359cc2ec7f46e975b6503d4339b602560968beb93b5cfb5c8d1ce07a83396dfcc124da0c903358b3ddf46dec53c60d5c50502fd7075c
(base) ekaterinapavlova@MacBook-Air-4 neofs-node % ./bin/neofs-cli object head -r st4.storage.fs.neo.org:8080 -w /Users/ekaterinapavlova/Workplace/neo-go/panelwallet1.json --cid 3RCdP3ZubyKyo8qFeo7EJPryidTZaGCMdUjqFJaaEKBV --oid 9zfEqjhqsm9xaKtCL6uYVB21mrVBNnYY7rnrAFWPYdAg
ID: 9zfEqjhqsm9xaKtCL6uYVB21mrVBNnYY7rnrAFWPYdAg
CID: 3RCdP3ZubyKyo8qFeo7EJPryidTZaGCMdUjqFJaaEKBV
Owner: NVvY1FF67XJ2GTVhy9FqiZGC4jEQtvjmHt
CreatedAt: 28209
Size: 4096000
HomoHash: 050f1c5a364c0d1116d96c919c0a4ff563327a18067d9e91cd1f9ce804baae966879aedbd5a988b60bd56c17df9b564d60258aeb8b0a43e1baff9cd55cab2ae7
Checksum: 332cdf611958fb79fa7864390399baec39333327e2f63524e373c5f248884525
Type: REGULAR
Attributes:
  Index=22
  IndexSize=128000
  Timestamp=1734389927 (2024-12-17 01:58:47 +0300 MSK)
ID signature:
  public key: 02ce1578db743f01a0968108c51b2fa690e140ffa644036b71c2f999914c751bbc
  signature: 046840124b89a434e7692700d24bd98baa073a4d7dc550ec406600444f5580e4b16fb6f94640729621f054a80702d507ea2b8452bce3f78731fed13bb8bb5440f4
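
The comparison of the two headers above can be sketched as a small check: the objects have different IDs but the same size, checksum and attributes. The `Header` type and `isDuplicate` helper are hypothetical simplifications of the `object head` output, not NeoFS SDK types. The sketch also divides the object size by IndexSize, which gives 32 bytes per entry; reading that as one 32-byte entry per block is an inference from the numbers, not something stated in this issue.

```go
package main

import "fmt"

// Header is a hypothetical, simplified view of the fields printed by
// `neofs-cli object head`; it is not a NeoFS SDK type.
type Header struct {
	ID         string
	Size       uint64
	Checksum   string
	Attributes map[string]string
}

// sameAttributes reports whether both maps hold the same key/value pairs.
func sameAttributes(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if w, ok := b[k]; !ok || w != v {
			return false
		}
	}
	return true
}

// isDuplicate reports whether two headers describe distinct objects that
// carry the same payload (size and checksum) and the same attributes.
func isDuplicate(a, b Header) bool {
	return a.ID != b.ID &&
		a.Size == b.Size &&
		a.Checksum == b.Checksum &&
		sameAttributes(a.Attributes, b.Attributes)
}

func main() {
	// Values copied from the two `object head` outputs above.
	const checksum = "332cdf611958fb79fa7864390399baec39333327e2f63524e373c5f248884525"
	attrs := map[string]string{
		"Index":     "22",
		"IndexSize": "128000",
		"Timestamp": "1734389927",
	}
	first := Header{ID: "73AsZJzibuRoyAvTgH1KN3NgrzbUAvuCiyEqjnYVLkYm", Size: 4096000, Checksum: checksum, Attributes: attrs}
	second := Header{ID: "9zfEqjhqsm9xaKtCL6uYVB21mrVBNnYY7rnrAFWPYdAg", Size: 4096000, Checksum: checksum, Attributes: attrs}

	fmt.Println("duplicate:", isDuplicate(first, second))
	fmt.Println("bytes per index entry:", first.Size/128000)
}
```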

Since the Timestamp attribute is identical in both objects, the likely cause is that NeoFS either didn't respond or returned an error even though the initial object had been stored successfully. The uploader therefore retried the same put until it received a success response.
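
To illustrate the hypothesis, here is a minimal simulation, not the real uploader or NeoFS client: a hypothetical in-memory `flakyStore` stores the payload but reports an error for the first call, and a plain retry loop then stores it a second time.

```go
package main

import (
	"errors"
	"fmt"
)

// errLostResponse is a made-up error standing in for the unknown NeoFS error.
var errLostResponse = errors.New("response lost")

// flakyStore is a hypothetical in-memory stand-in for a storage node.
type flakyStore struct {
	objects  []string
	failures int // upcoming calls that store the object yet report an error
}

// Put always stores the payload; while failures remain it still reports an
// error, which models a put that succeeded without the caller learning it.
func (s *flakyStore) Put(payload string) error {
	s.objects = append(s.objects, payload)
	if s.failures > 0 {
		s.failures--
		return errLostResponse
	}
	return nil
}

// putWithRetry repeats the put until it reports success or attempts run out.
func putWithRetry(s *flakyStore, payload string, maxAttempts int) error {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = s.Put(payload); err == nil {
			return nil
		}
	}
	return fmt.Errorf("put failed after %d attempts: %w", maxAttempts, err)
}

func main() {
	store := &flakyStore{failures: 1}
	err := putWithRetry(store, "index file 22", 3)
	fmt.Println("error:", err)
	fmt.Println("stored copies:", len(store.objects))
}
```

From the uploader's point of view the upload succeeded without any error, yet the store ends up holding two copies of the same payload.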

The NeoFS logs show that the first object was put at 2024-12-16 22:58:48.078 and the second one at 2024-12-16 22:58:50.854. The 2776-millisecond gap is roughly, though not exactly, in line with the retry behaviour, which supports the retry hypothesis.
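
The timing can be double-checked with a few lines of Go. This sketch assumes the log timestamps are in UTC and use millisecond precision; the `gap` helper and the layout constant are illustrative.

```go
package main

import (
	"fmt"
	"time"
)

// logLayout matches timestamps such as "2024-12-16 22:58:48.078".
const logLayout = "2006-01-02 15:04:05.000"

// gap parses two log timestamps and returns the time elapsed between them.
func gap(first, second string) (time.Duration, error) {
	a, err := time.Parse(logLayout, first)
	if err != nil {
		return 0, fmt.Errorf("parse first timestamp: %w", err)
	}
	b, err := time.Parse(logLayout, second)
	if err != nil {
		return 0, fmt.Errorf("parse second timestamp: %w", err)
	}
	return b.Sub(a), nil
}

func main() {
	d, err := gap("2024-12-16 22:58:48.078", "2024-12-16 22:58:50.854")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("gap between puts, ms:", d.Milliseconds())

	// The Timestamp attribute shared by both objects, converted to UTC.
	created := time.Unix(1734389927, 0).UTC()
	fmt.Println("Timestamp attribute:", created.Format(logLayout))
}
```

The shared Timestamp attribute converts to 22:58:47 UTC, just before both puts, which fits a single index file being built once and then sent twice.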

The error returned by NeoFS is unknown; we should run our uploader in debug mode next time to catch such cases.

AliceInHunterland added a commit that referenced this issue Dec 17, 2024
As we are not afraid of duplicates this is not a critical error anymore.
BlockFetcher will take the first returned by search.

Close #3762

Signed-off-by: Ekaterina Pavlova <[email protected]>
@AnnaShaleva
Member Author

NeoFS didn't respond or return an error

OK, in the future we need to filter this error out so that we skip the retry when it happens. But since we don't know what the error is, let's just ignore duplicate index files for now.
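
Once the error is known, the filtering could take roughly the following shape. Everything here is hypothetical: `errAlreadyStored` stands in for the still unknown NeoFS error, and `retry` is an illustrative helper, not the uploader's actual retry code.

```go
package main

import (
	"errors"
	"fmt"
)

// errAlreadyStored is a hypothetical sentinel for "the object was stored, but
// the request reported a failure"; the real NeoFS error is unknown.
var errAlreadyStored = errors.New("object stored, but the request reported a failure")

// retry calls op up to maxAttempts times and stops early either on success or
// when skip reports that the error must not trigger another attempt.
// It returns the number of calls made and the last error.
func retry(maxAttempts int, skip func(error) bool, op func() error) (int, error) {
	var err error
	for calls := 1; calls <= maxAttempts; calls++ {
		err = op()
		if err == nil || skip(err) {
			return calls, err
		}
	}
	return maxAttempts, err
}

func main() {
	skip := func(err error) bool { return errors.Is(err, errAlreadyStored) }

	calls, err := retry(3, skip, func() error {
		// Simulated put that stored the object but reported a failure.
		return fmt.Errorf("put index file: %w", errAlreadyStored)
	})
	fmt.Println("calls made:", calls)
	fmt.Println("last error:", err)
}
```

The caller would still need to decide whether a skipped error counts as success, which depends on what the real error turns out to mean.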

we should run our uploader in debug mode

Let's do this.
