Comparing named vectors, ignoring order #473

wch · 2016-05-13T16:44:39Z

It would be useful to have an expectation that compares two named vectors (or environments) and returns TRUE if all the named items are the same, regardless of order. Currently, to do this you need to sort the items by name before comparing. For example:

# Return TRUE if any items in vector x are unnamed; FALSE otherwise.
any_unnamed <- function(x) {
  # Zero-length vector
  if (length(x) == 0) return(FALSE)

  # List with no name attribute
  if (is.null(names(x))) return(TRUE)

  # List with name attribute; check for any ""
  any(!nzchar(names(x)))
}

sort_by_name <- function(x) {
  if (any_unnamed(x)) 
    stop("Can't sort by name because there are unnamed items")

  if (any(duplicated(names(x))))
    stop("Can't sort by name because there are duplicate names")

  x[sort(names(x))]
}


a <- list(x=1, y=2)
b <- list(y=2, x=1)

expect_identical(sort_by_name(a), sort_by_name(b))

It would be nice to be able to do something like this:

expect_identical_contents(a, b)

(The function name could be better though.)

krlmlr · 2016-06-25T01:18:26Z

In many cases, expect_identical(a[names(b)], b) will do the job. However, this won't detect items that are in a but not in b. A simpler sorting routine that also works for unnamed vectors would be

sort_by_name <- function(x) x[order(names2(x))]

with a suitable implementation of names2().
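
A minimal sketch of what such a names2() might look like, assuming it should return an empty string for every missing name. The body below is an illustration only, not an existing testthat function:

```r
# Hypothetical helper: like names(), but never returns NULL or NA.
names2 <- function(x) {
  nms <- names(x)
  if (is.null(nms)) {
    # No names attribute at all: treat every item as unnamed.
    return(rep("", length(x)))
  }
  nms[is.na(nms)] <- ""
  nms
}

sort_by_name <- function(x) x[order(names2(x))]

sort_by_name(c(b = 2, a = 1))  # reordered so that a comes before b
sort_by_name(c(3, 1, 2))       # unnamed: all names tie, so order is unchanged
```

Because order() breaks ties by original position, unnamed items keep their relative order and only the named items move.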

fangly · 2016-07-19T06:51:04Z

I had a similar problem for lists containing unnamed items and solved it using sets:

expect_setequal <- function(actual, expected) {
   # Test that the sets from two objects are the same. Check both directions,
   # because setdiff(actual, expected) alone misses items found only in expected.
   differences <- c(setdiff(actual, expected), setdiff(expected, actual))
   sets_equal  <- length(differences) == 0
   message     <- if (sets_equal) "" else sprintf("Sets not equal. First difference was: %s", differences[[1]])
   expect(sets_equal, message)
   invisible(actual)
}

Note that an item can appear multiple times in either list without affecting the result. Hence, this solution is not the same as sorting the lists first and then comparing them.
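
To make the difference concrete, here is a small base-R sketch (the variable names are only for illustration) in which the set comparison and the sort-then-compare approach disagree:

```r
a <- c(1, 1, 2)
b <- c(2, 1)

# Set comparison: duplicates are ignored, so a and b count as equal.
sets_equal <- length(setdiff(a, b)) == 0 && length(setdiff(b, a)) == 0

# Sort-then-compare: the extra 1 in a makes the vectors differ.
sorted_equal <- identical(sort(a), sort(b))

sets_equal
sorted_equal
```

Which behavior is right depends on whether repeated items should matter for the test at hand.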

wch · 2016-08-02T18:41:39Z

Here's an implementation that compares contents of vectors (atomic vectors and lists). If the vector is named, it groups the vectors by name and then checks that each name has the same items across the two vectors. So for example, contents_identical(c(a=1, a=2, b=3), c(a=2, a=1, b=3)) returns TRUE, but contents_identical(c(a=1, a=2, b=3), c(a=1, b=2, a=3)) returns FALSE.

contents_identical <- function(a, b) {
  # Convert to named vectors - needed for sorting later.
  if (is.null(names(a))) {
    names(a) <- rep("", length(a))
  }
  if (is.null(names(b))) {
    names(b) <- rep("", length(b))
  }

  # Fast path for atomic vectors
  if (is.atomic(a) && is.atomic(b)) {
    # Sort first by names, then contents. This is so that the comparison can
    # handle duplicated names.
    a <- a[order(names(a), a)]
    b <- b[order(names(b), b)]

    return(identical(a, b))
  }

  # If we get here, we're on the slower path for lists

  # Check if names are the same. If there are duplicated names, make sure
  # there's the same number of duplicates of each.
  if (!identical(sort(names(a)), sort(names(b)))) {
    return(FALSE)
  }

  # Group each vector by names. split() always returns a list of groups;
  # tapply() can simplify the result when every group holds a single item.
  by_names_a <- split(a, names(a))
  by_names_b <- split(b, names(b))

  # Compare each group
  for (i in seq_along(by_names_a)) {
    subset_a <- by_names_a[[i]]
    subset_b <- by_names_b[[i]]

    unique_subset_a <- unique(subset_a)
    idx_a <- sort(match(subset_a, unique_subset_a))
    idx_b <- sort(match(subset_b, unique_subset_a))
    if (!identical(idx_a, idx_b)) {
      return(FALSE)
    }
  }

  TRUE
}

There's probably room for improvement in this code.

Things to note:

There's a fast path for atomic vectors, and a slow path for lists.
Lists need the slow path because they are not sortable in general, so I used unique() and match() to check that each group is the same.
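
The unique() and match() idea from the slow path can be sketched in isolation as follows. same_multiset() is a hypothetical helper name used only for this illustration; it is not part of the code above or of testthat.

```r
# Hypothetical helper: TRUE if x and y hold the same items with the same
# multiplicities, regardless of order. Works for lists, which sort() rejects.
same_multiset <- function(x, y) {
  if (length(x) != length(y)) return(FALSE)
  u <- unique(x)
  # Replace each item by its position in u; positions are integers and sortable.
  # Items of y that are missing from u become NA, which sort() drops, so the
  # two index vectors then differ in length.
  identical(sort(match(x, u)), sort(match(y, u)))
}

# sort() on a list signals an error, so sorting is not an option here.
sort_failed <- inherits(try(sort(list(1, "a")), silent = TRUE), "try-error")

same_multiset(list(1, "a", 1), list("a", 1, 1))    # same items, different order
same_multiset(list(1, "a", 1), list("a", "a", 1))  # same items, different counts
```

This mirrors what the loop in contents_identical() does for each group of items that share a name.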

Some tests:

# ============== Atomic vectors ==============
# Basic named vectors
expect_true(contents_identical(c(a=1, b=2), c(a=1, b=2)))
expect_true(contents_identical(c(a=1, b=2), c(b=2, a=1)))
expect_false(contents_identical(c(a=1, b=2), c(a=1, c=2)))
expect_false(contents_identical(c(a=1, b=2), c(a=1, b=99)))
expect_false(contents_identical(c(a=1, b=2), c(b=2, a=99)))

# Repeated names
expect_true(contents_identical(c(a=1, b=2, b=2, b=3), c(a=1, b=2, b=2, b=3)))
expect_true(contents_identical(c(a=1, b=3, b=2, b=2), c(a=1, b=2, b=3, b=2)))

# Same names, but different repetitions
expect_false(contents_identical(c(a=1, b=3, b=2), c(a=1, b=2, b=3, b=2)))
expect_false(contents_identical(c(a=1, b=3, b=2, b=2), c(a=1, b=2, b=3)))
expect_false(contents_identical(c(a=1, b=3, b=2, b=2), c(a=1, b=2, b=3, b=3)))

# Some different names
expect_false(contents_identical(c(a=1, b=3, b=2), c(a=1, c=2, c=3)))
expect_false(contents_identical(c(a=1, b=3, b=2), c(a=1, b=2, c=3)))

# No names
expect_true(contents_identical(c(1,2,3), c(1,2,3)))
expect_true(contents_identical(c(1,3,2), c(1,2,3)))

# Some names
expect_true(contents_identical(c(a=1,2,3), c(a=1,2,3)))
expect_true(contents_identical(c(a=1,2,3), c(2,3,a=1)))
expect_false(contents_identical(c(a=1,3,2), c(b=1,2,3)))


# ============== Same tests as above, but with lists ==============
# Basic named vectors
expect_true(contents_identical(list(a=1, b=2), list(a=1, b=2)))
expect_true(contents_identical(list(a=1, b=2), list(b=2, a=1)))
expect_false(contents_identical(list(a=1, b=2), list(a=1, c=2)))
expect_false(contents_identical(list(a=1, b=2), list(a=1, b=99)))
expect_false(contents_identical(list(a=1, b=2), list(b=2, a=99)))
# Repeated names
expect_true(contents_identical(list(a=1, b=2, b=2, b=3), list(a=1, b=2, b=2, b=3)))
expect_true(contents_identical(list(a=1, b=3, b=2, b=2), list(a=1, b=2, b=3, b=2)))
# Same names, but different repetitions
expect_false(contents_identical(list(a=1, b=3, b=2), list(a=1, b=2, b=3, b=2)))
expect_false(contents_identical(list(a=1, b=3, b=2, b=2), list(a=1, b=2, b=3)))
expect_false(contents_identical(list(a=1, b=3, b=2, b=2), list(a=1, b=2, b=3, b=3)))
# No names
expect_true(contents_identical(list(1,2,3), list(1,2,3)))
expect_true(contents_identical(list(1,3,2), list(1,2,3)))
# Some names
expect_true(contents_identical(list(a=1,2,3), list(a=1,2,3)))
expect_true(contents_identical(list(a=1,2,3), list(2,3,a=1)))
expect_false(contents_identical(list(a=1,3,2), list(b=1,2,3)))

# ============== Nested lists ==============
expect_true(contents_identical(list(a=list(1,2),3,4), list(a=list(1,2),3,4)))
expect_true(contents_identical(list(a=list(1,2),3,4), list(3,4,a=list(1,2))))
# Order-insensitivity does not apply to inner items
expect_false(contents_identical(list(a=list(1,2),3,4), list(a=list(2,1),3,4)))

# ============== Comparing across types ==============
# This compares a numeric vector to a list. Maybe this shouldn't be TRUE?
expect_true(contents_identical(c(a=1, b=2), list(a=1, b=2)))

hadley · 2016-12-15T04:16:46Z

This feels out of scope for testthat because doing it well is going to require quite a bit of code. Maybe it could go in a helper package? Maybe with #528?

hadley closed this as completed Dec 15, 2016

wch mentioned this issue Apr 11, 2019

Comparing named vectors, ignoring order (again) #863

Closed
