Add fwf_cols function #616

jrnold · 2017-02-18T23:12:46Z

This adds a helper function fwf_cols that is a more intuitive way of specifying fixed width column start and end points. While fwf_positions requires three vectors for start, end, and names, fwf_cols accepts a named list of length-2 vectors of the column start and end positions.

For example,

# 3. Paired vectors of start and end positions
read_fwf(fwf_sample, fwf_positions(c(1, 30), c(10, 42), c("name", "ssn")))
# 4. Named list of start and end positions
read_fwf(fwf_sample, fwf_cols(list(name = c(1, 10), ssn = c(30, 42))))

hadley · 2017-02-19T14:19:57Z

I think this is an improvement, but I wonder if a wrapper around tribble would be even nicer.

jrnold · 2017-02-19T20:33:00Z

I was thinking about whether a wrapper around a data frame would be useful and almost included a version of it, but decided against it. My thinking was that there are two main ways you could get the column specifications: (1) they are already in a data frame with two columns (variable, width) or three columns (variable, start, end), or (2) you are entering them by hand.

If the column specifications are already in a data frame (I'll call it cols), it's clear enough to call the existing functions by referencing the columns of the data frame:

fwf_positions(cols$start, cols$end, cols$varname)

To me, that's still pretty clear, and not too much typing.
Possibly fwf_positions and fwf_widths could be made into generic functions, with methods for the first argument being a vector, matrix, or data frame.

The second case is entering it by hand (when it's not too many columns). In that case, having the columns as argument names and the widths or (start, end) as values seems most natural. fwf_cols could be generalized to allow the values to be either (start, end) tuples or widths.

# with widths
fwf_cols(foo = 1, bar = 5)
# with (start, end) tuples
fwf_cols(foo = c(1, 4), bar = c(5, 10))

This came up when I was helping a student read a fixed-width file. I foolishly didn't RTFM before writing code, and assumed that the format was something like what I just wrote. When we got an error and actually read the documentation, I was too lazy to change the code and used map to convert it into what fwf_positions was looking for.

It needs the .Rd file.

hadley · 2017-02-22T14:33:38Z

What if we allowed col_positions to take a data frame? If one row, the values are widths; if two rows, the values are start and end. If >= 3 rows, an error.

Then you'd have:

read_fwf(fwf_sample, tibble(name = c(1, 10), ssn = c(30, 42)))
read_fwf(fwf_sample, tibble(name = 10, skip = 20, ssn = 12))

jrnold · 2017-02-23T21:26:39Z

I don't know. It seems more natural and easy to document that col_positions accepts a "long" tibble like fwf_positions produces. And the helper functions, fwf_positions, fwf_widths, fwf_cols all return such a tibble. This would minimize the amount of code that's rewritten, and keep these functions backwards compatible while improving the ease of use.

If a user is able to write the following, it's about as concise as the code above, and I'd say as readable.

read_fwf(fwf_sample, fwf_cols(name = c(1, 10), ssn = c(30, 42)))
read_fwf(fwf_sample, fwf_cols(name = 10, skip = 20, ssn = 12))

And the following would still work:

x <- tribble(
  ~col_name, ~start, ~end,
  "name", 1, 10,
  "ssn", 30, 42
)
read_fwf(fwf_sample, x)

- read_fwf arg col_positions will check for column names and whether the data frame is widths or a start/end data frame.
- rewrite fwf_cols to accept named args of length 1 or 2. This makes it more concise. Also accept a data frame as the first argument
- More checks for argument validity
- Use tibbles instead of lists where appropriate

Some tests failing. Still need to debug.

jrnold · 2017-02-24T01:52:29Z

Now I have it so that:

- The col_positions argument of read_fwf accepts a list with either begin and end columns or widths and treats them appropriately.
- fwf_cols(...) calls tibble(...) or uses the first argument if it is a list. It expects either 1 row, in which case it calls fwf_widths, or 2 rows, in which case it calls fwf_positions.
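
To make the one-value versus two-value dispatch concrete, here is a minimal base R sketch. It is not the code in this PR: `cols_to_positions` is a hypothetical name, it returns a plain data frame rather than calling fwf_widths or fwf_positions, and it assumes 1-based inclusive positions with contiguous columns in the widths form.

```r
# Hypothetical helper (not readr's implementation): resolve named column
# specs into 1-based, inclusive start/end positions.
cols_to_positions <- function(...) {
  x <- list(...)
  if (length(x) == 0L || is.null(names(x)) || any(names(x) == "")) {
    stop("All column specifications must be named.")
  }
  len <- unique(lengths(x))
  if (length(len) != 1L || !(len %in% c(1L, 2L))) {
    stop("Each column must be a single width or a (start, end) pair.")
  }
  if (len == 1L) {
    # Widths form: columns are assumed to be contiguous from position 1.
    widths <- vapply(x, as.integer, integer(1))
    end <- cumsum(widths)
    start <- end - widths + 1L
  } else {
    # Positions form: the first value is the start, the second the end.
    start <- vapply(x, function(v) as.integer(v[[1]]), integer(1))
    end <- vapply(x, function(v) as.integer(v[[2]]), integer(1))
  }
  data.frame(
    col_name = names(x),
    start = unname(start),
    end = unname(end),
    stringsAsFactors = FALSE
  )
}

# Widths form, then (start, end) form
cols_to_positions(name = 10, skip = 20, ssn = 12)
cols_to_positions(name = c(1, 10), ssn = c(30, 42))
```

Mixing the two forms in one call, or passing vectors longer than 2, stops with an error instead of guessing what the caller meant.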

hadley · 2017-02-24T14:46:43Z

R/read_fwf.R

 #' # 1. Guess based on position of empty columns
 #' read_fwf(fwf_sample, fwf_empty(fwf_sample, col_names = c("first", "last", "state", "ssn")))
 #' # 2. A vector of field widths
 #' read_fwf(fwf_sample, fwf_widths(c(20, 10, 12), c("name", "state", "ssn")))
 #' # 3. Paired vectors of start and end positions
 #' read_fwf(fwf_sample, fwf_positions(c(1, 30), c(10, 42), c("name", "ssn")))
+#' # 4. Named arguments with start and end positions
+#' read_fwf(fwf_sample, fwf_cols(name = c(1, 10), ssn = c(30, 42)))


Can you include the width form here too?

hadley · 2017-02-24T14:48:13Z

R/read_fwf.R

-    return(tibble::data_frame())
+    return(tibble::tibble())
+  }
+  if (!is.list(col_positions)) {


This feels too complicated to me. If we have fwf_cols() I don't think we need to worry about list/data.frame inputs.

hadley · 2017-02-24T14:48:36Z

R/read_fwf.R

  }

-  tokenizer <- tokenizer_fwf(col_positions$begin, col_positions$end, na = na, comment = comment)
+  tokenizer <- tokenizer_fwf(col_positions$begin, col_positions$end, na = na,


Can you change this back please?

hadley · 2017-02-24T14:49:25Z

R/read_fwf.R

@@ -1,3 +1,10 @@
+fwf_col_names <- function(nm, n) {


This should be much lower in the file

hadley · 2017-02-24T14:50:15Z

R/read_fwf.R

+
+#' @rdname read_fwf
+#' @export
+#' @param ... If the first element is a data frame,


This feels too flexible to me. But if you really think it's a good idea to keep it, the function signature should be x, ...

hadley · 2017-02-24T14:50:39Z

R/read_fwf.R

+  names(x) <- fwf_col_names(names(x), length(x))
+  x <- tibble::as_tibble(x)
+  if (nrow(x) == 2) {
+    fwf_positions(as.integer(x[1, ]),


Indenting style

hadley · 2017-02-24T14:51:03Z

R/read_fwf.R

+  if (is.list(x[[1]])) {
+    x <- x[[1]]
+  }
+  x <- try(lapply(x, as.integer), silent = TRUE)


I don't think I like this approach. I'd say just let the error bubble up to the user.

hadley · 2017-02-24T14:52:15Z

tests/testthat/test-read-fwf.R

@@ -127,6 +127,43 @@ test_that("error on empty spec (#511, #519)", {
  expect_error(read_fwf(txt, pos), "Zero-length.*specifications not supported")
 })

+# fwf_cols
+test_that("fwf_cols produces correct fwf_positions object with elements of length 2", {
+  expected <- fwf_positions(c(1,  9, 4),


Can you please fix the indenting here too?

If the arguments don't fit on one line it should look like:

function_name(
  arg1,
  argument_name = arg2,
  ...
)

So this is a different style than the one in http://adv-r.had.co.nz/Style.html, which would be

function_name(arg1,
              argument_name = arg2,
              ...)

Move fwf_col_names function lower in file. See tidyverse#616

Add widths form of fwf_cols to documentation See tidyverse#616

This is too complicated; since we have fwf_cols, don't worry about list inputs. See tidyverse#616

Revert added newline See tidyverse#616

Remove a try() call, since the preference is for errors to bubble up to users. See tidyverse#616

This seems too flexible, so I'll change it to just use ... See tidyverse#616

See comments in tidyverse#616

- removed tests that failed after removing features in fwf_cols in previous commits
- remove test that failed because fwf_positions changes columns to numeric

Revert this section in read_fwf since it is unnecessary to handle data list objects with the availability of fwf_cols See tidyverse#616

Convert some numeric constants to integer constants so that addition/subtraction does not coerce columns to numeric if they were integer. This is not a big deal, but since the positions represent integers anyways, it might as well keep them as such if they are already specified as such.
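
As a small illustration of the coercion that commit message describes (the `start` vector is made up for this example and is not taken from the diff), subtracting a numeric constant from an integer vector yields a double, while an integer constant keeps the result an integer:

```r
# Illustrative values only; not taken from the PR diff.
start <- c(1L, 30L)

typeof(start - 1)   # "double": the numeric constant 1 promotes the result
typeof(start - 1L)  # "integer": the integer constant 1L preserves the type
```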

hadley

Looks good. Now just needs a bullet point in NEWS.md in the appropriate place

hadley · 2017-02-25T19:36:36Z

NEWS.md

@@ -1,5 +1,31 @@
 # readr 1.1.0

+* `fwf_cols()` allows for specifying the `col_positions` argument of


I think something went wrong with your merge 😞

Oops. wtf did that merge do? Bad git :-( Sorry about that, and fixed now.

weird things happened to NEWS.md. They are fixed now.

jimhester · 2017-02-27T16:08:41Z

Thanks!

jrnold added 2 commits February 18, 2017 15:04

fix example for fwf_cols

80586ec

Fix failed Travis build

e5cc862

jrnold added 5 commits February 23, 2017 15:28

fix example for fwf_cols

f954bc7

Fix failed Travis build

f9f2f4d

jrnold force-pushed the fwf_cols branch from 1f39aed to 7a35bac on February 23, 2017 23:31

fix failing tests

0d48bb2

hadley reviewed Feb 24, 2017

jrnold added 12 commits February 24, 2017 19:23

misc

980371b

merge

08495c2

respond to hadley's comments

8b942d7

respond to hadley's comments

24def9d

respond to hadley's comments

8a7ddff

respond the hadley's comments

6370e0c

respond to hadley's comments

0b29615

respond to hadley's comments

c2cce8d

Fix indenting issues

8610b4b

fix tests

6640c3d

respond to hadley's comments

0651081

hadley approved these changes Feb 25, 2017

add bullet point to NEWS.md

c22745c

Merge 'tidyverse/master' into fwf_cols

bfe64d7

hadley reviewed Feb 25, 2017

fix merge error

e7a5b62

jimhester merged commit e7a5b62 into tidyverse:master Feb 27, 2017

jimhester mentioned this pull request May 13, 2019

Make fwf_positions() guess end argument from start #996

Closed
