Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add dependencies for index/add invocations #363

Merged
merged 6 commits into from
May 7, 2024

Conversation

alanshaw
Copy link
Member

@alanshaw alanshaw commented May 1, 2024

Upgrades to @web3-storage/upload-api v13 and adds dependencies for index/add invocations.

Adds multihashes to the Elastic IPFS SQS multihashes queue and adds block index data to Dynamo blocks-cars-position table.

Note: I did not also generate dudewhere/satnav indexes. Hoverboard reads directly from dynamo and Freeway uses location claims (generated from dynamo). I don't think we need them anymore.

Copy link

seed-deploy bot commented May 2, 2024

View stack outputs

@@ -66,6 +66,7 @@ export function UploadApiStack({ stack, app }) {
customDomain,
defaults: {
function: {
timeout: '30 seconds',
Copy link
Member Author

@alanshaw alanshaw May 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Existing indexer lambda (the thing that reads CAR files and writes to dynamodb) seems to max out at ~14s, setting to 30s to be safe.

Screenshot 2024-05-02 at 15 13 10

@alanshaw alanshaw marked this pull request as ready for review May 2, 2024 14:15
Copy link
Contributor

@Gozala Gozala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Provided some feedback inline mostly around naming suggestions and alike. I think it looks great regardless though! Thanks for making it happen.

/** @param {import('@web3-storage/upload-api').ShardedDAGIndex} index */
async publish (index) {
/** @type {import('multiformats').MultihashDigest[]} */
const items = [...index.shards.keys()]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add comment explaining this, I'm very confused why would keys be digests and then why do we add more digests later. I'm sure it will become more clear as I reed more code, yet comment here would make it easier for a new reader to follow the logic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found my answer here, yet I'm not sure we should advertise shard multihashes unless they were explicitly listed under slices. Rational is here is if user didn't index it they don't want it to be advertised, also we would not find those in content claims and serve over bitswap, wolud we ?

items.push(digest)
records.push({
digest,
location: new URL(`https://w3s.link/blob/${base58btc.encode(digest.bytes)}`),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: could we use raw CIDs instead ? Reduced API surface would be better IMO

}
}

export class BlocksCarsPositionStore {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would call this BlockIndexStore or BlockRangeStore. Either way seems like dropping Cars from name would be a good idea.

for (const res of results) {
if (res.error) return res
}
return ok({})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should put records first and only push items to queue if that was a success. Otherwise we may end up with having blocks advertised on IPNI but have no ability to serve those because we don't have index for them.

}
})

export class MultihashesQueue {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest calling this something like BlockAdventismentPublisher or something along those lines, I think it is more important to describe the role/job component has in the system than it is to describe mechanics of how it does it. In this instance it is component for announcing blocks on the network fact that it uses queues and multihashes are implementation details.

try {
const items = [...digests]
while (true) {
const batch = items.splice(0, 10)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 10 at a time ? Should this be a part of the config with 10 as an implicit default ? Still worth adding a comment why the default of 10 was chosen.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the max number of messages you can send to an SQS queue in a batch.

const batch = items.splice(0, 10)
if (!batch.length) break

const dedupedBatch = new Set(batch.map(s => base58btc.encode(s.bytes)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like deduping on line 81 would make more sense as that would eliminate dupes if they are 10 items apart also.

useIPNIService(multihashesQueue, carsBlocksPositionStore),
{
/** @param {import('multiformats').MultihashDigest} digest */
async query (digest) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think calling this has or contains would be far less confusing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I did consider this. However I don't believe there's an explicit IPNI API for that, and I thought at some point we might want query to actually return the result of querying IPNI for a multihash.

Should have changed in w3up types before now...

}

return messages
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
}
}

@seed-deploy seed-deploy bot temporarily deployed to pr363 May 7, 2024 09:49 Inactive
@alanshaw alanshaw merged commit c5712d9 into main May 7, 2024
3 checks passed
@alanshaw alanshaw deleted the feat/add-index-add-deps branch May 7, 2024 10:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants