Comparing named vectors, ignoring order #473
Comments
In many cases, sort_by_name <- function(x) x[order(names2(x))] with a suitable implementation of |
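A minimal sketch of the sort-by-name idea in base R. The `names2()` defined here is only an assumed stand-in (it returns `""` where names are missing); the thread does not show which implementation the commenter had in mind.

```r
# Assumed stand-in for names2(): like names(), but returns "" instead of
# NULL or NA when names are missing. Not taken from the thread.
names2 <- function(x) {
  nms <- names(x)
  if (is.null(nms)) {
    return(rep("", length(x)))
  }
  nms[is.na(nms)] <- ""
  nms
}

# Reorder a vector by its names so that two vectors can be compared
# without regard to the original order of their elements.
sort_by_name <- function(x) x[order(names2(x))]

x <- c(b = 2, a = 1)
y <- c(a = 1, b = 2)

identical(x, y)                              # FALSE: order differs
identical(sort_by_name(x), sort_by_name(y))  # TRUE
```

Because `order()` leaves ties in their original relative order, elements that share a name (or have no name) are still compared position by position, which is presumably why this only works "in many cases".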
I had a similar problem for lists containing unnamed items and solved it using sets:
Note that an item can be contained multiple times in each list without affecting the equality. Hence, this solution is not the same as sorting the lists first, and then comparing them. |
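The commenter's code is not shown, so the following is only a sketch of what a set-style comparison of lists could look like. The helper names `contains()` and `set_equal()` are hypothetical, and membership is decided with `identical()`, which is an assumption about the intended notion of equality.

```r
# Hypothetical helper: is `item` identical to at least one element of `items`?
contains <- function(item, items) {
  any(vapply(items, identical, logical(1), item))
}

# Hypothetical helper: TRUE when every element of `a` occurs somewhere in `b`
# and every element of `b` occurs somewhere in `a`. Duplicates and order are
# ignored, as in a mathematical set.
set_equal <- function(a, b) {
  all(vapply(a, contains, logical(1), b)) &&
    all(vapply(b, contains, logical(1), a))
}

set_equal(list(1, "a", 2), list("a", 2, 1))  # TRUE: same items, other order
set_equal(list(1, 1, 2), list(2, 1))         # TRUE: duplicates do not matter
set_equal(list(1, 2), list(1, 3))            # FALSE: 2 and 3 are unmatched
```

The second call illustrates the point made above: because repeated items are ignored, this is not equivalent to sorting both lists and comparing them.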
Here's an implementation that compares contents of vectors (atomic vectors and lists). If the vector is named, it groups the vectors by name and then checks that each name has the same items across the two vectors. So for example,

contents_identical <- function(a, b) {
  # Convert to named vectors - needed for sorting later.
  if (is.null(names(a))) {
    names(a) <- rep("", length(a))
  }
  if (is.null(names(b))) {
    names(b) <- rep("", length(b))
  }

  # Fast path for atomic vectors
  if (is.atomic(a) && is.atomic(b)) {
    # Sort first by names, then contents. This is so that the comparison can
    # handle duplicated names.
    a <- a[order(names(a), a)]
    b <- b[order(names(b), b)]
    return(identical(a, b))
  }

  # If we get here, we're on the slower path for lists

  # Check if names are the same. If there are duplicated names, make sure
  # there's the same number of duplicates of each.
  if (!identical(sort(names(a)), sort(names(b)))) {
    return(FALSE)
  }

  # Group each vector by names
  by_names_a <- tapply(a, names(a), function(x) x)
  by_names_b <- tapply(b, names(b), function(x) x)

  # Compare each group
  for (i in seq_along(by_names_a)) {
    subset_a <- by_names_a[[i]]
    subset_b <- by_names_b[[i]]
    unique_subset_a <- unique(subset_a)
    idx_a <- sort(match(subset_a, unique_subset_a))
    idx_b <- sort(match(subset_b, unique_subset_a))
    if (!identical(idx_a, idx_b)) {
      return(FALSE)
    }
  }
  TRUE
}

There's probably room for improvement in this code. Things to note:
Some tests:

# ============== Atomic vectors ==============
# Basic named vectors
expect_true(contents_identical(c(a=1, b=2), c(a=1, b=2)))
expect_true(contents_identical(c(a=1, b=2), c(b=2, a=1)))
expect_false(contents_identical(c(a=1, b=2), c(a=1, c=2)))
expect_false(contents_identical(c(a=1, b=2), c(a=1, b=99)))
expect_false(contents_identical(c(a=1, b=2), c(b=2, a=99)))
# Repeated names
expect_true(contents_identical(c(a=1, b=2, b=2, b=3), c(a=1, b=2, b=2, b=3)))
expect_true(contents_identical(c(a=1, b=3, b=2, b=2), c(a=1, b=2, b=3, b=2)))
# Same names, but different repetitions
expect_false(contents_identical(c(a=1, b=3, b=2), c(a=1, b=2, b=3, b=2)))
expect_false(contents_identical(c(a=1, b=3, b=2, b=2), c(a=1, b=2, b=3)))
expect_false(contents_identical(c(a=1, b=3, b=2, b=2), c(a=1, b=2, b=3, b=3)))
# Some different names
expect_false(contents_identical(c(a=1, b=3, b=2), c(a=1, c=2, c=3)))
expect_false(contents_identical(c(a=1, b=3, b=2), c(a=1, b=2, c=3)))
# No names
expect_true(contents_identical(c(1,2,3), c(1,2,3)))
expect_true(contents_identical(c(1,3,2), c(1,2,3)))
# Some names
expect_true(contents_identical(c(a=1,2,3), c(a=1,2,3)))
expect_true(contents_identical(c(a=1,2,3), c(2,3,a=1)))
expect_false(contents_identical(c(a=1,3,2), c(b=1,2,3)))
# ============== Same tests as above, but with lists ==============
# Basic named vectors
expect_true(contents_identical(list(a=1, b=2), list(a=1, b=2)))
expect_true(contents_identical(list(a=1, b=2), list(b=2, a=1)))
expect_false(contents_identical(list(a=1, b=2), list(a=1, c=2)))
expect_false(contents_identical(list(a=1, b=2), list(a=1, b=99)))
expect_false(contents_identical(list(a=1, b=2), list(b=2, a=99)))
# Repeated names
expect_true(contents_identical(list(a=1, b=2, b=2, b=3), list(a=1, b=2, b=2, b=3)))
expect_true(contents_identical(list(a=1, b=3, b=2, b=2), list(a=1, b=2, b=3, b=2)))
# Same names, but different repetitions
expect_false(contents_identical(list(a=1, b=3, b=2), list(a=1, b=2, b=3, b=2)))
expect_false(contents_identical(list(a=1, b=3, b=2, b=2), list(a=1, b=2, b=3)))
expect_false(contents_identical(list(a=1, b=3, b=2, b=2), list(a=1, b=2, b=3, b=3)))
# No names
expect_true(contents_identical(list(1,2,3), list(1,2,3)))
expect_true(contents_identical(list(1,3,2), list(1,2,3)))
# Some names
expect_true(contents_identical(list(a=1,2,3), list(a=1,2,3)))
expect_true(contents_identical(list(a=1,2,3), list(2,3,a=1)))
expect_false(contents_identical(list(a=1,3,2), list(b=1,2,3)))
# ============== Nested lists ==============
expect_true(contents_identical(list(a=list(1,2),3,4), list(a=list(1,2),3,4)))
expect_true(contents_identical(list(a=list(1,2),3,4), list(3,4,a=list(1,2))))
# Order-insensitivity does not apply to inner items
expect_false(contents_identical(list(a=list(1,2),3,4), list(a=list(2,1),3,4)))
# ============== Comparing across types ==============
# This compares a numeric vector to a list. Maybe this shouldn't be TRUE?
expect_true(contents_identical(c(a=1, b=2), list(a=1, b=2)))
This feels out of scope for testthat because doing it well is going to require quite a bit of code. Maybe it could go in a helper package? Maybe with #528? |
It would be useful to have an expectation that compares two named vectors (or environments) and returns TRUE if all the named items are the same, regardless of order. Currently, to do this you need to sort the items by name before comparing. For example:
It would be nice to be able to do something like this:
(The function name could be better though.)
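For the environment case mentioned in the request, a sketch in base R might look like the following. The helper name `env_contents_identical()` is hypothetical and is not an existing testthat function; it relies on `ls()` returning names in sorted order, so the creation order of the bindings does not matter.

```r
# Hypothetical helper: compare two environments by their bindings,
# ignoring the order in which the bindings were created.
env_contents_identical <- function(e1, e2) {
  # ls() returns names sorted alphabetically, so both lists come out
  # ordered by name before they are compared.
  l1 <- mget(ls(e1, all.names = TRUE), envir = e1)
  l2 <- mget(ls(e2, all.names = TRUE), envir = e2)
  identical(l1, l2)
}

e1 <- new.env()
assign("a", 1, envir = e1)
assign("b", 2, envir = e1)

e2 <- new.env()
assign("b", 2, envir = e2)
assign("a", 1, envir = e2)

env_contents_identical(e1, e2)  # TRUE: same bindings, created in another order

assign("a", 99, envir = e2)
env_contents_identical(e1, e2)  # FALSE: the value of `a` now differs
```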