Comparing named vectors, ignoring order #473

Closed
wch opened this issue May 13, 2016 · 4 comments

@wch
Member

wch commented May 13, 2016

It would be useful to have an expectation that compares two named vectors (or environments) and passes if all the named items are the same, regardless of order. Currently, to do this you need to sort the items by name before comparing. For example:

library(testthat)

# Return TRUE if any items in vector x are unnamed; FALSE otherwise.
any_unnamed <- function(x) {
  # Zero-length vector
  if (length(x) == 0) return(FALSE)

  # Vector with no names attribute
  if (is.null(names(x))) return(TRUE)

  # Vector with a names attribute; check for any "" or NA names
  any(is.na(names(x)) | !nzchar(names(x)))
}

sort_by_name <- function(x) {
  if (any_unnamed(x))
    stop("Can't sort by name because there are unnamed items")

  if (any(duplicated(names(x))))
    stop("Can't sort by name because there are duplicate names")

  x[sort(names(x))]
}

a <- list(x=1, y=2)
b <- list(y=2, x=1)

expect_identical(sort_by_name(a), sort_by_name(b))

It would be nice to be able to do something like this:

expect_identical_contents(a, b)

(The function name could be better though.)
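The request also mentions environments, which sort_by_name() does not accept. A minimal sketch for that case, assuming base R's as.list() method for environments, whose `sorted` argument returns the bindings ordered by name. env_contents_identical() is a hypothetical helper, not a testthat function. An environment's bindings always have unique, non-empty names, so no name checks are needed.

```r
# Hypothetical helper: compare two environments' bindings, ignoring the
# order in which the bindings were created.
env_contents_identical <- function(a, b) {
  # all.names = TRUE includes bindings whose names start with "."
  # sorted = TRUE orders the resulting list by binding name
  identical(
    as.list(a, all.names = TRUE, sorted = TRUE),
    as.list(b, all.names = TRUE, sorted = TRUE)
  )
}

e1 <- new.env()
assign("x", 1, envir = e1)
assign("y", 2, envir = e1)

e2 <- new.env()
assign("y", 2, envir = e2)
assign("x", 1, envir = e2)

env_contents_identical(e1, e2)
```

Bindings whose values are themselves environments are compared by identity rather than by contents, because identical() treats two environments as equal only when they are the same object.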

@krlmlr
Member

krlmlr commented Jun 25, 2016

In many cases, expect_identical(a[names(b)], b) will do the job. This won't detect items in a but not in b. A simpler sorting routine that also works for unnamed vectors would be

sort_by_name <- function(x) x[order(names2(x))]

with a suitable implementation of names2().
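names2() is not defined in the comment. One plausible implementation, assumed here, returns "" for every element when the vector has no names attribute and replaces NA names with "". The rlang package provides a names2() that behaves this way. The helper below is a sketch, not a testthat API.

```r
# Sketch: like names(), but never NULL and never NA
names2 <- function(x) {
  nms <- names(x)
  if (is.null(nms)) {
    return(rep("", length(x)))
  }
  nms[is.na(nms)] <- ""
  nms
}

# The one-line sorter from the comment above
sort_by_name <- function(x) x[order(names2(x))]

sort_by_name(c(b = 2, 1, a = 3))
```

Because order() leaves ties in their original order, unnamed elements are not reordered relative to each other. For example, sort_by_name(c(1, 3, 2)) returns its input unchanged, so two unnamed vectors holding the same values in a different order still compare as different. For atomic vectors, breaking ties by value with order(names2(x), x) removes that limitation.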

@fangly

fangly commented Jul 19, 2016

I had a similar problem for lists containing unnamed items and solved it using sets:

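library(testthat)

expect_setequal <- function(actual, expected) {
  # Test that the sets from two objects are the same. Check both directions,
  # so that items found in only one of the two objects are reported.
  differences <- c(setdiff(actual, expected), setdiff(expected, actual))
  sets_equal  <- length(differences) == 0
  message     <- if (sets_equal) "" else
    sprintf("Sets not equal. First difference was: %s",
            paste(deparse(differences[[1]]), collapse = ""))
  expect(sets_equal, message)
  invisible(actual)
}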

Note that an item can appear multiple times in each list without affecting the equality. Hence, this solution is not the same as sorting the lists first and then comparing them. It also ignores names, because setdiff() compares values only.
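Base R alone shows the difference between the set comparison and the sort-then-compare approach. setequal() ignores how many times each value occurs, whereas comparing sorted copies does not.

```r
x <- c(1, 1, 2)
y <- c(1, 2, 2)

# Set comparison: both contain exactly the values {1, 2}
setequal(x, y)

# Multiset comparison: the counts of 1 and 2 differ
identical(sort(x), sort(y))
```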

@wch
Member Author

wch commented Aug 2, 2016

Here's an implementation that compares the contents of vectors (atomic vectors and lists). If the vector is named, it groups the elements by name and then checks that each name has the same items across the two vectors. For example, contents_identical(c(a=1, a=2, b=3), c(a=2, a=1, b=3)) returns TRUE, but contents_identical(c(a=1, a=2, b=3), c(a=1, b=2, a=3)) returns FALSE.

contents_identical <- function(a, b) {
  # Convert to named vectors - needed for sorting later.
  if (is.null(names(a))) {
    names(a) <- rep("", length(a))
  }
  if (is.null(names(b))) {
    names(b) <- rep("", length(b))
  }

  # Fast path for atomic vectors
  if (is.atomic(a) && is.atomic(b)) {
    # Sort first by names, then contents. This is so that the comparison can
    # handle duplicated names.
    a <- a[order(names(a), a)]
    b <- b[order(names(b), b)]

    return(identical(a, b))
  }

  # If we get here, we're on the slower path for lists

  # Check if names are the same. If there are duplicated names, make sure
  # there's the same number of duplicates of each.
  if (!identical(sort(names(a)), sort(names(b)))) {
    return(FALSE)
  }

  # Group each vector by names
  by_names_a <- tapply(a, names(a), function(x) x)
  by_names_b <- tapply(b, names(b), function(x) x)

  # Compare each group
  for (i in seq_along(by_names_a)) {
    subset_a <- by_names_a[[i]]
    subset_b <- by_names_b[[i]]

    unique_subset_a <- unique(subset_a)
    idx_a <- sort(match(subset_a, unique_subset_a))
    idx_b <- sort(match(subset_b, unique_subset_a))
    if (!identical(idx_a, idx_b)) {
      return(FALSE)
    }
  }

  TRUE
}

There's probably room for improvement in this code.
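One possible refinement, offered only as a sketch and not part of the code above, concerns the grouping step. That step uses tapply(x, names(x), function(x) x). With its default simplify = TRUE, tapply() returns an atomic array instead of a list when every group yields a single value. split() always returns a list with one element per name, ordered by the sorted names, so each group can be handled the same way.

```r
x <- list(a = 1, b = 2, b = 3)

# split() groups the elements by name; the groups come back as a list
# ordered by the factor levels of names(x), i.e. the sorted names
groups <- split(x, names(x))

names(groups)
lengths(groups)

# An atomic vector is grouped the same way
split(c(a = 1, b = 2, b = 3), names(x))
```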

Things to note:

  • There's a fast path for atomic vectors, and a slow path for lists.
  • Lists need the slow path because they are not sortable in general, so I used unique() and match() to check that each group contains the same items the same number of times.
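The unique()/match() step described above can be shown on its own. Each item in a group is replaced by its position in unique(subset_a), and sorting those codes compares the two groups as multisets. The values below are illustrative.

```r
group_a <- list("x", c(1, 2), "x")
group_b <- list(c(1, 2), "x", "x")

u <- unique(group_a)             # list("x", c(1, 2))
idx_a <- sort(match(group_a, u)) # codes 1, 1, 2
idx_b <- sort(match(group_b, u)) # codes 1, 1, 2
identical(idx_a, idx_b)
```

An item of group_b that does not occur in group_a gets an NA code, which sort() drops by default. The shorter index vector then fails the identical() check. One caveat: match() converts lists to character vectors before comparing, so list(1L) and list(1) produce the same codes. For list elements, this step is therefore somewhat looser than identical().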

Some tests:

library(testthat)

# ============== Atomic vectors ==============
# Basic named vectors
expect_true(contents_identical(c(a=1, b=2), c(a=1, b=2)))
expect_true(contents_identical(c(a=1, b=2), c(b=2, a=1)))
expect_false(contents_identical(c(a=1, b=2), c(a=1, c=2)))
expect_false(contents_identical(c(a=1, b=2), c(a=1, b=99)))
expect_false(contents_identical(c(a=1, b=2), c(b=2, a=99)))

# Repeated names
expect_true(contents_identical(c(a=1, b=2, b=2, b=3), c(a=1, b=2, b=2, b=3)))
expect_true(contents_identical(c(a=1, b=3, b=2, b=2), c(a=1, b=2, b=3, b=2)))

# Same names, but different repetitions
expect_false(contents_identical(c(a=1, b=3, b=2), c(a=1, b=2, b=3, b=2)))
expect_false(contents_identical(c(a=1, b=3, b=2, b=2), c(a=1, b=2, b=3)))
expect_false(contents_identical(c(a=1, b=3, b=2, b=2), c(a=1, b=2, b=3, b=3)))

# Some different names
expect_false(contents_identical(c(a=1, b=3, b=2), c(a=1, c=2, c=3)))
expect_false(contents_identical(c(a=1, b=3, b=2), c(a=1, b=2, c=3)))

# No names
expect_true(contents_identical(c(1,2,3), c(1,2,3)))
expect_true(contents_identical(c(1,3,2), c(1,2,3)))

# Some names
expect_true(contents_identical(c(a=1,2,3), c(a=1,2,3)))
expect_true(contents_identical(c(a=1,2,3), c(2,3,a=1)))
expect_false(contents_identical(c(a=1,3,2), c(b=1,2,3)))


# ============== Same tests as above, but with lists ==============
# Basic named vectors
expect_true(contents_identical(list(a=1, b=2), list(a=1, b=2)))
expect_true(contents_identical(list(a=1, b=2), list(b=2, a=1)))
expect_false(contents_identical(list(a=1, b=2), list(a=1, c=2)))
expect_false(contents_identical(list(a=1, b=2), list(a=1, b=99)))
expect_false(contents_identical(list(a=1, b=2), list(b=2, a=99)))
# Repeated names
expect_true(contents_identical(list(a=1, b=2, b=2, b=3), list(a=1, b=2, b=2, b=3)))
expect_true(contents_identical(list(a=1, b=3, b=2, b=2), list(a=1, b=2, b=3, b=2)))
# Same names, but different repetitions
expect_false(contents_identical(list(a=1, b=3, b=2), list(a=1, b=2, b=3, b=2)))
expect_false(contents_identical(list(a=1, b=3, b=2, b=2), list(a=1, b=2, b=3)))
expect_false(contents_identical(list(a=1, b=3, b=2, b=2), list(a=1, b=2, b=3, b=3)))
# No names
expect_true(contents_identical(list(1,2,3), list(1,2,3)))
expect_true(contents_identical(list(1,3,2), list(1,2,3)))
# Some names
expect_true(contents_identical(list(a=1,2,3), list(a=1,2,3)))
expect_true(contents_identical(list(a=1,2,3), list(2,3,a=1)))
expect_false(contents_identical(list(a=1,3,2), list(b=1,2,3)))

# ============== Nested lists ==============
expect_true(contents_identical(list(a=list(1,2),3,4), list(a=list(1,2),3,4)))
expect_true(contents_identical(list(a=list(1,2),3,4), list(3,4,a=list(1,2))))
# Order-insensitivity does not apply to inner items
expect_false(contents_identical(list(a=list(1,2),3,4), list(a=list(2,1),3,4)))

# ============== Comparing across types ==============
# This compares a numeric vector to a list. Maybe this shouldn't be TRUE?
expect_true(contents_identical(c(a=1, b=2), list(a=1, b=2)))

@hadley
Member

hadley commented Dec 15, 2016

This feels out of scope for testthat because doing it well is going to require quite a bit of code. Maybe it could go in a helper package? Maybe with #528?

4 participants