Fix embargo timeout in dandelion++ #9295
base: master
Conversation
I should also mention that in unlucky cases where a blackhole occurs after just one hop, this could result in longer delays than with a Poisson distribution (where the overwhelming majority of values are around 39s).
This does bring up an interesting point: using the exponential distribution could make it easier to estimate how many hops the transaction took before it reached the black hole. If the attacker keeps track of the time it receives a tx, and the time it takes for the tx to be broadcast, then it could calculate the probability of that happening for different numbers of hops. For example, if the tx gets blackholed after one hop then the average time for that tx to get diffused is 75s, whereas a tx that makes it 9 hops will have an average time of 8.3s; so if the tx takes 300s to get diffused, we can say that is much more likely to happen with 1 hop than with 9. The paper seemingly doesn't mention this.

Fallout

The problem with using the Poisson distribution is that it is not memoryless, so nodes earlier in the stem phase are slightly more likely to fluff first under a black hole attack. How much more likely? I don't know exactly, but off the top of my head I can't imagine it being significant.

Fluff Timers

I feel 1 second is too low; although the previous value was 5 seconds, it was 2.5 for outgoing connections:
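Going back to the hop-count estimate above, a rough R sketch of where those figures come from (assuming, per the numbers quoted above, that each stem node draws an independent exponential embargo with mean 75s): with k timers running, the first one fires after mean/k on average, which matches the 75s (1 hop) vs 8.3s (9 hops) figures.

mean.embargo <- 75
for (k in c(1, 9)) {
  # time until the first of k independent exponential embargo timers fires
  sims <- replicate(1e5, min(rexp(k, rate = 1/mean.embargo)))
  cat("hops:", k, "simulated mean:", round(mean(sims), 1),
    "analytic mean:", round(mean.embargo/k, 1), "seconds\n")
}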
I'm wondering whether my parameters are too high - we previously lowered the parameters so that diffusion came quicker. Should I do the same here? The worst-case scenario is both more likely and longer than with the existing Poisson method.
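For a concrete sense of that worst case, a rough comparison of upper tails with matched means (using the old ~39s average purely for illustration, not this PR's parameters): the exponential has a much fatter tail than a Poisson with the same mean, so long embargo delays are both more likely and larger.

qexp(0.99, rate = 1/39)    # 99th percentile of an exponential with mean 39s: ~180s
qpois(0.99, lambda = 39)   # 99th percentile of a Poisson with mean 39: ~54s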
This doesn't reveal the origin IP address though. So I think it's still better to go with the paper here.
Poisson distribution is also considered memoryless - but it may have different properties making it less suitable.
Revert back to 5 seconds? I didn't want to overlap with the blackhole timeout.
In the past we had a lot of sybil nodes that were intentionally blackholing transactions; a significantly longer average time to diffusion would be bad for user experience. I don't know if these sybil nodes are still there.
Force-pushed from cbff1b8 to 8d86d61
I think so, especially since we have had problems with black holes in the past. If we were to choose a time under which we want a chosen percentage of txs to be fluffed, assuming they were immediately black holed, we could find the highest … For example, if we were to say we want 90% of txs to be fluffed under 60s with … I think we could get away with … With …
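A rough R sketch of that calculation (the deadline, target, and candidate values below are illustrative, not the ones actually proposed): for an exponential embargo with a given mean, the fraction of immediately-black-holed txs fluffed within a deadline is pexp(deadline, rate = 1/mean), and we would pick the largest mean that still meets the target.

deadline <- 60     # seconds
target   <- 0.90   # want 90% of immediately black-holed txs fluffed by the deadline
candidate.means <- c(5, 10, 20, 26, 30, 40)
fluffed <- pexp(deadline, rate = 1/candidate.means)
round(fluffed, 3)                         # fraction fluffed within the deadline
max(candidate.means[fluffed >= target])   # largest mean still meeting the target (26)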
True, just wanted to mention.
The time between events in a Poisson process is memoryless and can be modeled with the exponential distribution, but I don't think the Poisson distribution itself is memoryless.
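A quick numerical illustration of that difference (the mean of 5 is arbitrary): conditioning an exponential timer on having already waited leaves the expected remaining wait unchanged, while a Poisson-distributed timer's remaining wait shrinks.

x <- rexp(1e6, rate = 1/5)
mean(x[x > 3] - 3)            # expected remaining wait after 3s: still ~5 (memoryless)
p <- rpois(1e6, lambda = 5)
mean(p[p > 3] - 3)            # expected remaining wait after 3: ~3, not 5 (not memoryless)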
I think so; I don't think overlapping is too big a concern given how variable the output of the exponential distribution is.
Force-pushed from 8d86d61 to b6039f9
The new force push has the parameters recommended by @Boog900. I'm a little worried the new timeout may not be aggressive enough - but I'm leaning towards it being acceptable.
We could go lower, but 8 should be fine. More numbers - txs fluffed under 180s when immediately black holed:
This means if an attacker managed to black hole every transaction immediately with …
Force-pushed from b6039f9 to 2b16a5b
I added a 180s embargo max to the logic (as per @Boog900's suggestion).
In
Why is there this restriction to broadcast in only integer second intervals? When you take the floor of an exponential distribution, you get a geometric distribution (see here). The geometric distribution is memoryless like the exponential distribution, but the substitution might affect the privacy properties of Dandelion++. (A quick numerical check of this floor/geometric equivalence appears after the simulation code below.)

I have been looking at whether the fluff-phase timer should also be changed from Poisson to exponential. The Dandelion++ paper doesn't explicitly say that the fluff timers should be exponential, but it strongly hints that way IMHO. Algorithm 5 "Dandelion++ Spreading at node v" in Fanti et al. (2018) ends with …

@Boog900 brought up the possibility that the total RAM load on nodes would increase if the fluff timer was switched from Poisson to exponential. The Poisson and exponential have the same mean (when you specify the mean to be the same), but the exponential distribution has much higher variance with our parameters. That means that there may be a higher probability of occasionally having a much higher number of transactions loaded in the node's per-connection fluff queues. I wrote a simulation to test this hypothesis. In the end, the total RAM load is not much different between the Poisson and exponential timers.
When the timer is Poisson, the maximum simultaneous number of aggregate txs in all of the peer queues is an average of 843 across the 100 simulations. When the timer is exponential, it is 851. This is not a big difference IMHO. When I set the number of transactions to be lower, e.g. 10,000, the averages for the Poisson and exponential timers are farther apart. This may mean that the two numbers may be even closer to each other when the simulated time period is extended further. The R simulation code is below.

# install.packages(c("data.table", "zoo", "parallelly", "future", "future.apply"))
# Install these packages if not already installed
library(data.table)
library(zoo)
# timer.method <- "set_when_previous_timer_expired"
timer.method <- "set_when_new_tx"
n.tx <- 100000
n.peers <- 100
n.monte.carlo.sims <- 100
do.multithreaded <- FALSE
# Multithread will use more RAM
if (do.multithreaded) {
n.workers <- floor(parallelly::availableCores()/2)
future::plan(future::multicore, workers = n.workers)
} else {
future::plan(future::sequential)
}
random.txs <- function(n) { rexp(n, 1/3) }
# Distribution of arrival times between transactions is exponential with
# rate parameter 1/3. This is 60^2*24/3 = 28800 transactions per day
stopifnot(timer.method %in% c("set_when_previous_timer_expired", "set_when_new_tx"))
if (timer.method == "set_when_previous_timer_expired") {
set.timers <- function(tx.arrival, random.flush) {
y <- random.flush(length(tx.arrival) * 2)
while ( sum(y) <= max(tx.arrival) ) {
y <- c(y, random.flush(length(tx.arrival)))
}
# In case the time period of the flush timers
# do not completely cover the time of the tx arrivals,
# add more flush timers.
y <- cumsum(y)
y <- y[ y <= max(tx.arrival) ]
y
}
}
if (timer.method == "set_when_new_tx") {
set.timers <- function(tx.arrival, random.flush) {
y <- vector("numeric", length(tx.arrival) + 1)
j <- 1
while (j <= length(tx.arrival)) {
y[j] <- tx.arrival[j] + random.flush(1)
# Add a random flush timer to the tx arrival time. The flush timer may
# expire before any new txs arrive or may expire after a few more txs.
# We need to figure out which tx arrives after the timer expires so we
# can set the next timer.
shortcut.length <- 100
# The shortcut.length is the number of elements of tx.arrival to evaluate
# to find how many transactions will be broadcast in the queue before
# the flush timer expires. It is shorter than the total length of tx.arrival
# to speed up computation.
while (TRUE) {
increment <- which(tx.arrival[ j:min(c(j + shortcut.length, length(tx.arrival))) ] > y[j])[1]
if (! is.na(increment)) { break }
if (j + shortcut.length < length(tx.arrival)) {
shortcut.length <- shortcut.length + 1000
# When which() does not have a TRUE element, it will return NA.
# If the shortcut did not search to the end of the tx.arrival
# vector, then add to shortcut.length and try again
} else {
break
}
}
j <- j - 1 + increment
if (is.na(j)) { break }
}
y <- y[y != 0]
y
}
}
set.seed(314)
final.results <- list()
for (timer.distribution in c("exp", "pois")) {
stopifnot(timer.distribution %in% c("exp", "pois"))
if (timer.distribution == "exp") {
# Exponential flush timer with a mean of 5 seconds
random.flush <- function(n) { rexp(n, 1/5) }
}
if (timer.distribution == "pois") {
# Poisson flush timer scaled to the same 5-second mean (20/4 = 5)
random.flush <- function(n) { rpois(n, 20)/4 }
}
max.results <- vector("numeric", n.monte.carlo.sims)
for (k in 1:n.monte.carlo.sims) {
tx.arrival <- cumsum(random.txs(n.tx))
peer.timers <- future.apply::future_replicate(n.peers, {
peer.queues <- set.timers(tx.arrival, random.flush)
peer.queues <- setdiff(peer.queues, tx.arrival)
# Cannot have tx arrive and flush at same time. This is rare because
# the tx arrival is exp-distributed (i.e. continuous). This
# could occur if the flush timer is zero, which would occur rarely
# with the Poisson distribution. setdiff() will also remove any
# duplicates in peer.queues
# This error will occur on c() below if any elements are at
# the same time:
# Error in rbind.zoo(...) : indexes overlap
tx.added <- zoo(rep(1, length(tx.arrival)), tx.arrival)
tx.flushed <- zoo(rep(0, length(peer.queues)), peer.queues)
# Create zoo time objects
all.events <- sort(c(tx.added, tx.flushed))
peer.queues.filled <- data.table(all.events = all.events)[,
all.events := cumsum(all.events), .(cumsum(coredata(all.events) == 0))]$all.events
# https://stackoverflow.com/questions/65335978/how-to-perform-cumsum-with-reset-at-0-in-r
# When we encounter a "1" from tx.added, add it to the running total.
# When we encounter a "0" from tx.flushed, reset the counter to zero.
peer.queues.filled <- data.table(master = index(peer.queues.filled),
x = coredata(peer.queues.filled))
data.table::setkey(peer.queues.filled, master)
list(peer.queues = peer.queues, peer.queues.filled = peer.queues.filled)
},
simplify = FALSE,
future.globals = c("set.timers", "tx.arrival", "random.flush"),
future.packages = c("data.table", "zoo"))
peer.queues <- lapply(peer.timers, FUN = function(x) {x$peer.queues})
peer.queues.filled <- lapply(peer.timers, FUN = function(x) {x$peer.queues.filled})
rm(peer.timers)
peer.queues.all <- data.table(master = sort(unique(c(tx.arrival, unlist(peer.queues)))))
# Create a master table of the time of all events.
# This table will be merged with each connection's running totals.
rm(tx.arrival, peer.queues)
data.table::setkey(peer.queues.all, master)
peer.queues.all <- future.apply::future_lapply(peer.queues.filled,
FUN = function(y) {
y <- merge(peer.queues.all, y, by = "master", all = TRUE)
y[, master := NULL]
y[, x := data.table::nafill(x, "locf")]
# "locf" means "last observation carried forward".
y[, x := data.table::nafill(x, fill = 0)]
# The observations in the beginning will still be NA. Fill with 0.
y
},
future.globals = c("peer.queues.all"),
future.packages = c("data.table")
)
peer.queues.all <- do.call(cbind, peer.queues.all)
results <- rowSums(peer.queues.all)
# The sum of each row is the aggregate number of txs in all
# queues at the time of each event
max.results[k] <- max(results)
rm(peer.queues.all, peer.queues.filled, results)
gc()
cat(base::date(), ", Flush timer distribution: ", timer.distribution, ", Iteration: ", k, "\n", sep = "")
}
final.results[[timer.distribution]] <- max.results
}
summary(final.results$exp)
summary(final.results$pois)
t.test(final.results$exp, final.results$pois)
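Back to the integer-seconds question at the top of this comment, a quick check (the 1/5 rate is arbitrary) that flooring exponential draws gives the same distribution as a geometric with success probability 1 - exp(-rate):

set.seed(42)
x <- floor(rexp(1e5, rate = 1/5))
y <- rgeom(1e5, prob = 1 - exp(-1/5))
c(mean(x), mean(y))   # means agree (~4.5)
c(var(x), var(y))     # variances agree (~25)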
References

Fanti, G., Venkatakrishnan, S. B., Bakshi, S., Denby, B., Bhargava, S., Miller, A., & Viswanath, P. (2018). "Dandelion++: Lightweight Cryptocurrency Networking with Formal Anonymity Guarantees."

Fanti, G., & Viswanath, P. (2017). "Anonymity Properties of the Bitcoin P2P Network."
This refers to the blackhole timeout only; the fluff timers are different:
Using the current pseudo-geometric distribution increases the percentage of txs we would expect not to make it all the way through the stem phase before an embargo timer fires. If we assume a tx takes 0.175s to pass through a node (the number we already used when calculating the embargo rate), then with the current code 2% of txs will be fluffed before reaching the first hop, whereas the expected value should be 0.37%. If my maths is correct, I expect the number of txs not making it all the way to the 8th node to be 17.5%, whereas we should be targeting …
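A rough sketch of the kind of calculation behind those percentages (the 47s mean is a hypothetical value chosen so the numbers roughly line up; I don't know the exact figure the code uses): because the embargo is drawn as a whole number of seconds, a 0-second timer is possible, which makes fluffing before the tx even reaches the first hop far more likely than under a continuous exponential.

m <- 47                    # hypothetical per-node embargo mean, in seconds
1 - exp(-1/m)              # P(integer-second timer is 0): roughly 2%
pexp(0.175, rate = 1/m)    # P(continuous exponential timer < 0.175s): roughly 0.37%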
Maybe this PR can make the changes suggested here: https://libera.monerologs.net/monero-research-lab/20240904#c422232-c422253 and prepare to merge.
Summary
@Boog900 pointed out that the embargo duration in Dandelion++ was incorrect - it was using a Poisson distribution instead of an exponential distribution. I don't recall why I used the Poisson distribution, other than that it takes an "average" parameter, which I took to mean the average embargo timeout. This is not the same distribution as meant in the Dandelion++ paper.
The primary difference is that the average embargo timeout will drop from ~39s to ~7s. There shouldn't be any loss in privacy as a result of this, because the propagation time to 10 nodes is roughly 1.75s.
Additionally @Boog900 discovered that the paper stated log but almost certainly meant ln (which helps bring down the average fluff time too).

Fluff probability
Is once again 10%, which should result in longer stem phases. Since the embargo timeout distribution is now much shorter, this shouldn't result in longer flood times.
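As a rough check of the stem-length expectation (assuming each relay independently fluffs with probability 0.10 and ~0.175s per hop, per the numbers above): the stem length is then geometric with a mean of about 10 hops, matching the ~1.75s propagation figure in the summary.

fluff.prob <- 0.10
hops <- rgeom(1e6, prob = fluff.prob) + 1   # hops until some node chooses to fluff
mean(hops)            # ~10 hops on average
mean(hops) * 0.175    # ~1.75 seconds of stem propagation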
Fallout
I'm not aware of any fingerprinting that can be done on the existing implementation. The randomized duration should still make it difficult to determine which node in the stem-set fluffed first. Perhaps @Boog900 can share some thoughts on this topic.
Fluff Timers
I reduced the average of the Poisson distribution for the fluff delay from 5s to 1s. This is an arbitrary change, but was made due to the new reality of much shorter embargo timeouts. @Boog900, thoughts on this portion of the code? Dandelion++ doesn't really specify a randomized flush interval for fluff mode; this comes from inspecting the Bitcoin code.
Poisson Distribution
Poisson is still being used in a few places, but I am not aware of any issues right now. I will dig deeper to see if these need changing:
I'm not aware of these timers violating the Dandelion++ paper (again read above about fluff timers).
Future
I expect some feedback from @Boog900 and possibly others as to the additional changes that need to be made.