Merge pull request #29 from trishankatdatadog/trishankatdatadog/doubling-targets

Talk about when to double number of bins
trishankatdatadog authored Oct 17, 2019
2 parents 0c20efe + 7826d7a commit 24dea57
1 changed file (pep-0458.txt) with 16 additions and 3 deletions.

Based on our findings as of the time of updating this PEP for implementation
(Oct 7 2019), PyPI SHOULD split all targets in the *bins* role by delegating
them to 16,384 *bin-n* roles, each of which would sign for PyPI targets whose
hashes fall into that bin (see Figure 2). It was found__
that this number of bins would result in a metadata overhead of 12-17% for
returning users (relative to the average size of downloaded packages, and
assuming 256-byte target filenames for all packages), and an overhead of 148%
for new users who are installing pip for the first time.

__ https://docs.google.com/spreadsheets/d/11_XkeHrf4GdhMYVqpYWsug6JNz5ZK6HvvmDZX0__K2I/edit?usp=sharing
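
As a rough illustration of hashed bin delegation, the sketch below maps a target path to one of 16,384 bins. It assumes, as in TUF's hashed bin delegations, that the bin is selected by the leading bits of the SHA-256 digest of the target path. The exact hashing and prefix scheme PyPI uses is not specified in this section, and `bin_for_target` is a hypothetical helper name.

```python
import hashlib

NUM_BINS = 16_384  # 2**14, the bin count recommended above


def bin_for_target(target_path: str, num_bins: int = NUM_BINS) -> int:
    """Return the index n of the (hypothetical) bin-n role for a target.

    Assumption: num_bins is a power of two and the bin is chosen from the
    leading log2(num_bins) bits of SHA-256(target_path).
    """
    if num_bins <= 0 or num_bins & (num_bins - 1):
        raise ValueError("num_bins must be a power of two")
    bits = num_bins.bit_length() - 1  # e.g. 14 for 16,384 bins
    digest = hashlib.sha256(target_path.encode("utf-8")).digest()
    # Read the first 8 bytes as a big-endian integer and keep its top `bits` bits.
    prefix = int.from_bytes(digest[:8], "big")
    return prefix >> (64 - bits)


print(bin_for_target("example-1.0.tar.gz"))  # some index in [0, 16384)
```

Because SHA-256 output is close to uniform, targets should spread roughly evenly under this assumption, with about 1/16,384 of all targets in each bin.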

This number of bins SHOULD double when the metadata overhead for returning
users exceeds 50%. Under current estimates, this SHOULD happen once the
number of targets increases at least fourfold, from over 2M to nearly 9M. At
that point, if the number of bins stays fixed, the metadata overhead would be
around 49-54% for returning users (assuming 256-byte target filenames for all
packages) and 185% for new users. If the number of bins is then increased,
every user would effectively pay the cost of a new user, because each user's
cost would be dominated by the (once-in-a-while) cost of downloading the large
number of delegations in the *bins* metadata. If the cost for new users
should prove to be too much, then this subject SHOULD be revisited before the
number of bins is doubled.
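
A minimal sketch of the doubling rule above. It assumes a bin is chosen from the top log2(num_bins) bits of the SHA-256 digest of the target path, an assumption modeled on TUF hashed bin delegations rather than a detail stated in this section. `next_bin_count` and `bin_for_target` are hypothetical helpers, and the overhead figures passed in are the projections quoted above, not new measurements.

```python
import hashlib

# Threshold from the text: double the bins once returning-user overhead exceeds 50%.
OVERHEAD_THRESHOLD = 0.50


def next_bin_count(num_bins: int, returning_user_overhead: float) -> int:
    """Return the bin count to use, doubling it once the threshold is exceeded."""
    if returning_user_overhead > OVERHEAD_THRESHOLD:
        return num_bins * 2
    return num_bins


def bin_for_target(target_path: str, num_bins: int) -> int:
    # Hypothetical scheme: top log2(num_bins) bits of SHA-256(target_path).
    bits = num_bins.bit_length() - 1
    digest = hashlib.sha256(target_path.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") >> (64 - bits)


# Returning-user overheads quoted above, expressed as fractions.
print(next_bin_count(16_384, 0.17))  # 16384: still under 50%
print(next_bin_count(16_384, 0.54))  # 32768: doubled

# Under this scheme, doubling splits every bin-n into bins 2n and 2n + 1.
path = "example-1.0.tar.gz"
old, new = bin_for_target(path, 16_384), bin_for_target(path, 32_768)
assert new // 2 == old
```

One consequence of doubling under this assumed scheme is that bins split cleanly: every target in bin-n moves to bin-2n or bin-(2n+1), so no target jumps to an unrelated bin. Doubling also doubles the number of delegations listed in the *bins* role, which is the cost the text above says every user would pay after a resize.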

It is possible to make TUF metadata more compact by representing it in a binary
format as opposed to the JSON text format. Nevertheless, a sufficiently large
number of projects and distributions will introduce scalability challenges at