This repository has been archived by the owner on Jan 3, 2018. It is now read-only.

Short introduction to the data.table library in R #204

Merged 10 commits on Jan 22, 2014

@naupaka
Member

naupaka commented Dec 10, 2013

Scott and I put together an Rmd file with code and descriptions of several useful features of the data.table library to be used as a short ~3 minute lesson.

@ahmadia
Contributor

ahmadia commented Dec 10, 2013

Thanks @naupaka and @sckott, I took a quick look and the overall format looks good.

@jdblischak, @karthik, do either of you want to mentor the review on this?

Note: There is still an open discussion on whether the bc repository is going to accept generated content (everything except the .Rmd file in the pull request, if I understand how knit works). I will leave it up to you if you would prefer to just commit the .Rmd file or the generated content as well until we come to a consensus, since the generated content is relatively light-weight.

@karthik
Contributor

karthik commented Dec 10, 2013

I've dropped a few comments into the code above. Some general thoughts:

Overall the material is great for a 2-3 minute discussion, but I see a few issues.

  1. There is no organic place for this material at any of the three levels. It sticks out as an odd sidebar for a novice bootcamp, but is too basic for an advanced one.
  2. In January 2014 the dplyr package will be production ready and can handle data.frames, data.tables and sqlite and provide the same common syntax that we teach folks in our regular plyr section. So people who understand plyr operations will then benefit from the power and speed of data.table without learning the arcane syntax. So by then, the only folks who will need to learn the ins and outs of data.table will be advanced users with special use cases that can't be accomplished by the simpler plyr format.

My recommendation: This is ok to merge as a standalone lesson but I don't see much use for it in the immediate term. It could be brought up in case some students bring up speed or "big" data issues.
But in the long term it would be great to see this fleshed out as a complete data.table tutorial that could perhaps be a topic for an advanced R bootcamp. The data munging/manipulation section we have in the works for the intermediate section could be replaced by this for the advanced users. It would also be great to have more than simulated data as examples.

So @ahmadia I'll leave it up to you to merge after you hear back from @jdblischak

@jdblischak
Contributor

Thanks for your contribution! Here are my thoughts:

  • I think this belongs in the intermediate lessons. I'd suggest putting it in a folder such as r/intermediate/misc/data.table/.
  • Use head more often. This will make it easier to follow from the html version of the lesson.
  • Since this would need to be incorporated as an add-on, you may want to include instructions such as prerequisite knowledge required, duration time (like you have in the PR message, though 3 minutes seems short to me), and learning objectives. Perhaps you could put this information in a README.md that resides in the same directory as your lesson.
  • An exercise or two would be useful.
  • I don't think anything beyond the RMarkdown file is necessary in the main repo. Generated content can be committed to bootcamp-specific repos. Doesn't matter to me though.
  • I personally do not like having text lines that cannot be read on GitHub. I wish RStudio had an option like fill-paragraph (alt-q) in Emacs to automatically adjust paragraphs, but I haven't been able to find a convenient method. However I don't know if anyone else feels the same, so I leave that to your judgement.
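
As a rough illustration of the `head` suggestion above, here is a minimal sketch. It assumes the data.table package is installed; the simulated table and its column names (`id`, `value`) are hypothetical and not taken from the lesson.

```r
library(data.table)

# Hypothetical simulated data; column names are illustrative only.
set.seed(1)
dt <- data.table(id = 1:1000, value = rnorm(1000))

# head() returns only the first rows (6 by default),
# which keeps the knitted HTML output short and readable.
first_rows <- head(dt)
first_three <- head(dt, 3)

print(first_rows)
print(first_three)
```

Printing only the first few rows is usually enough for a reader of the HTML version to see the shape of the result.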


## Combine data.frames

`data.table` can do more than just read in files though. Another often-completed task is combining two data.frames. Let compare the base R approach to the `data.table` version.
Contributor

suggestion: "often-completed" -> "common"
typo: "Let" -> "Let's"
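
For context, the comparison the reviewed paragraph describes might look roughly like the sketch below. It is not the lesson's code: the two small data frames and their columns (`id`, `x`, `y`) are hypothetical, and it assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical example data (not taken from the lesson under review).
df1 <- data.frame(id = 1:5, x = letters[1:5])
df2 <- data.frame(id = c(2L, 4L, 6L), y = c(10, 20, 30))

# Base R: merge() performs an inner join on the shared column by default.
merged_base <- merge(df1, df2, by = "id")

# data.table: convert each data.frame and set a key on the join column,
# then merge; merge() dispatches to the data.table method.
dt1 <- data.table(df1, key = "id")
dt2 <- data.table(df2, key = "id")
merged_dt <- merge(dt1, dt2, by = "id")

print(merged_base)
print(merged_dt)
```

Both calls keep only the `id` values present in both inputs; any speed difference only becomes visible on much larger tables than these.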

@ethanwhite
Contributor

> I personally do not like having text lines that cannot be read on GitHub. I wish RStudio had an option like fill-paragraph (alt-q) in Emacs to automatically adjust paragraphs, but I haven't been able to find a convenient method. However I don't know if anyone else feels the same, so I leave that to your judgement.

GitHub is actually supposed to be sof

@wking
Contributor

wking commented Dec 10, 2013

On Tue, Dec 10, 2013 at 03:29:15PM -0800, Ethan White wrote:

> GitHub is actually supposed to be sof

…t wrapping? I thought so too, but maybe .Rmd doesn't count as “prose” [1]?

@ethanwhite
Contributor

Oops, yes, thanks. It works on the .Rmd on my phone, but not in either Firefox or Chrome on my laptop.

@naupaka
Member Author

naupaka commented Dec 11, 2013

Thanks all for the helpful comments! I will get to the other suggested changes in the next couple days.

@@ -0,0 +1,91 @@
# Introduction to data.table

What is the `data.table` library for and why would you want to use it? Doesn't base R come with data frames built in already? Turns out that there are some things that can be done MUCH faster and more easily with data.table.
Contributor

It is a package, not a library. The latter is a collection of the former.

Member Author

My bad, thanks for catching that. Will be fixed in the next commit.
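
The package/library distinction raised in this review comment can be seen directly in an R session. A small sketch, assuming data.table is installed in one of the library directories:

```r
# A library is a directory that holds installed packages.
lib_dirs <- .libPaths()
print(lib_dirs)

# library() loads a package from one of those directories.
library(data.table)

# Once loaded, the package appears on the search path.
is_attached <- "package:data.table" %in% search()
print(is_attached)
```

The function name `library()` is a common source of the mix-up: it attaches a package found in a library rather than loading a library itself.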

@ghost ghost assigned ahmadia Dec 12, 2013
@ethanwhite
Contributor

> GitHub is supposed to be soft-wrapping prose

I have a PR in to Linguist that would fix this for Rmd files (github-linguist/linguist#831). However, there is a test currently failing on Ruby 1.8.7 that I don't know enough to debug. If someone with more Ruby experience (@emhart?) has a chance to take a look, fixing this should increase the chances that this gets implemented quickly.

@ahmadia
Contributor

ahmadia commented Dec 16, 2013

@naupaka - Is this ready for review again?

@naupaka
Member Author

naupaka commented Dec 16, 2013

Not yet, Scott and I still have stuff to fix. Been traveling.

@ahmadia
Contributor

ahmadia commented Dec 16, 2013

No problem, just making sure you weren't silently waiting on me :)

@emhart
Contributor

emhart commented Dec 17, 2013

@ethanwhite I checked this out, but I see that they merged the request and that your fork was building fine. Did you resolve the problem?

@karthik
Contributor

karthik commented Dec 17, 2013

@emhart yes, resolved. That discussion was just referenced here and should continue on the original thread if there is anything further.

@gvwilson
Contributor

Is this one ready to merge?

@ahmadia
Contributor

ahmadia commented Jan 20, 2014

It's waiting on @naupaka :). @naupaka - can we help with anything?

@sckott
Contributor

sckott commented Jan 20, 2014

I'll take a look and get things fixed, hopefully today. I'm partnering with @naupaka by the way :)

@sckott
Contributor

sckott commented Jan 22, 2014

Okay all, I've fixed up the data.table lesson. @naupaka Anything else on your end?

@naupaka
Member Author

naupaka commented Jan 22, 2014

Let me take a quick look in the next few hours and I'll let you know if I have anything else to add. Thanks!


@naupaka
Member Author

naupaka commented Jan 22, 2014

ok @sckott and @ahmadia looks good to me.

gvwilson pushed a commit that referenced this pull request Jan 22, 2014
Short introduction to the data.table library in R
@gvwilson gvwilson merged commit 817d645 into swcarpentry:master Jan 22, 2014