- The Ranker blog features articles from Ravi Iyer.
- Simply Statistics is the blog of Jeff Leek, Roger Peng, and Rafa Irizarry.
- Hilary Mason's blog always has good things.
- Paco Nathan has a very nice web site and email newsletter.
- DataTau is like Hacker News for data.
- Data for Good is a "public-good-only fork of DataTau"
- r/MachineLearning on reddit
- yhat's blog has good original content, often on data science with Python.
- The Endeavour is the blog of John D. Cook, with a focus on applied statistics.
- Statistical Modeling, Causal Inference, and Social Science is the blog of Andrew Gelman, mostly on Bayesian statistics.
- DataScholars posts articles on data science, computer science, machine learning, etc.
- Data Community DC
- The kaggle blog often has interesting pieces, usually related to their competitions.
- Charles Martin's Machine Learning blog covers "notes, thoughts, and practice of applied machine learning".
- Jeroen Janssens "dutch data scientist in old new amsterdam"
- KDnuggets has some useful news and articles.
- The Columbia Data Science blog was active in fall of 2013 and could be active again. (See also Doing Data Science in books, below.)
- The dataists blog hasn't been active for a while, but has some old gems.
- Jeff Knupp is quite the guy when it comes to Python.
- Benedikt Koehler has a neat blog called Beautiful Data.
- FastML: "Machine learning made easy"
- Edwin Chen has a fun blog.
- FiveThirtyEight is now a kind of data journalism syndicate.
- Win-Vector Blog: "The applied theorist's point of view"
- Machine Learning Mastery: "Making programmers awesome at machine learning"
- Quora's Machine Learning section has interesting things sometimes.
- The Aggregate Knowledge tech blog has some good stuff on large scale and streaming technology.
- Yet Another Data Blog, this one from a Zipfian Academy student.
- These two are related, quite good, and available as free PDFs. The first is more elementary than the second.
- Think Stats is an introduction to statistics through programming, available as a free PDF.
- Mining of Massive Datasets is also available as a free PDF from the author.
- Exploratory Data Analysis (EDA) by John Tukey deserves more attention than it gets.
- Data Analysis with Open Source Tools is quite a good intro to a range of tools.
- Practical Data Science with R: A fair book that crosses R and consulting ideas, by the main folks at Win-Vector.
- The Hitchhiker’s Guide to Python
- Statistics in a Nutshell
- Statistics Done Wrong
- Probabilistic Programming & Bayesian Methods for Hackers
- Machine Learning for Hackers [amazon] [code] by Drew Conway and John Myles White
- Will it Python? is a blog series in which the R code from Machine Learning for Hackers is translated to Python.
- Doing Data Science is a book that came out of the data science course at Columbia and Cathy O'Neil's blogging of said course.
- Data Mining with R: Learning with Case Studies [book site] [amazon]
- Data Science for Business: What you need to know about data mining and data-analytic thinking (used as a textbook at NYU)
- Data Smart: Using Data Science to Transform Information into Insight
- Big Data, Data Mining, and Machine Learning: Value Creation for Business Leaders and Practitioners
- Microsoft's Deep Learning: Methods and Applications (free PDF)
- Mining the Social Web, 2nd Edition is available for sale and also has a fairly comprehensive GitHub repository with all the code from the book, in friendly IPython Notebook format, even with a whole virtual machine setup.
- Twitter Data Analytics is available free online as a pre-print; it uses Java and MongoDB.
- Little Book of R for Time Series
- Introduction to Data Technologies by Paul Murrell "is a book that provides a basic introduction to a number of computer technologies for working with data (HTML, XML, Databases, SQL, regular expressions, and R)".
- Introduction to Data Science, developed by Jeffrey Stanton for the Certificate of Data Science program at Syracuse University’s School of Information Studies.
- Full Stack Python by Matt Makai describes using Python for web projects.
- Introductory Graph Theory is an inexpensive text to pick up.
- The book Networks, Crowds, and Markets: Reasoning About a Highly Connected World is available online.
- Statistics: Methods and Applications is a book from StatSoft which has been available online and now is also available as a printed book. The book has some decent explanations, usually avoids mathematics, and frequently references the company's commercial software.
- Sams Teach Yourself SQL in 10 Minutes
- Beginning SQL Queries: From Novice to Professional
- Applied Mathematics for Database Professionals
- The Future of Data Analysis by John Tukey in 1961 is still interesting and relevant.
- Documenting with KnitR (using LaTeX for reports)
- Facebook article on deep learning for facial recognition: DeepFace: Closing the Gap to Human-Level Performance in Face Verification
- The UCLA IDRE Statistical Computing site is a rich resource for several computing environments (R, Stata, SPSS, SAS, etc.).
- Probabilistic topic models: largely on Latent Dirichlet Allocation (LDA)
- Data Science Toolkit has some handy things packaged for the web, and as a VM.
- mloss.org: "machine learning open source software"
- Knowledge Discovery and Data Mining
- Neural Information Processing Systems
- O'Reilly Strata Conference (has also crossed with Hadoop World)
- DataGotham
- PyData
- PyCon (2014 video)
- BigConf: The Mid-Atlantic Data Conference
- csv,conf: A conference for data makers everywhere. (a fringe event of the Open Knowledge Festival)
- There seems to be some sort of Data Science Association.
People publish in a lot of places; other journal suggestions welcomed!
- Big Data seems to be fairly broad.
- In and around DC:
- In New York City:
- Harvard's "CS109 Data Science" class has a collection of slides online.
- A Udacity course on Exploratory Data Analysis: "Investigate, Visualize, and Summarize Data Using R"; materials are available for free.
- Roger Peng's YouTube playlists include his Computing for Data Analysis videos, which are quite good. R plotting content is in week three.
- CS 194-16 Introduction to Data Science, a course by Jeff Hammerbacher and Mike Franklin
- UC Berkeley School of Information has course videos for "Analyzing Big Data with Twitter".
- Software Carpentry
- School of Data
- videolectures.net
- Coursera / University of Washington Machine Learning
- Coursera / Stanford Machine Learning
- Udacity Artificial Intelligence with Peter Norvig and Sebastian Thrun
- statistics.com
- Pluralsight
- Enginehere
- Cloudera has some mix of online and in-person courses and certifications.
- Hinton's Coursera course on Neural Networks for Machine Learning
- Google's Making Sense of Data course materials (seems very introductory; focuses on Google Fusion Tables)
- Quant Education YouTube channel of videos on R
- Deep Learning tutorials from deeplearning.net
- Deep Learning Reading List from deeplearning.net
- Unsupervised Feature Learning and Deep Learning tutorials from Stanford
- The well-known machine learning course of Andrew Ng; uses MATLAB/Octave and focuses on gradient descent from the beginning
- Coursera's Introduction to Recommender Systems with Konstan and Ekstrand
- MIT OpenCourseWare: Networks, Complexity and Its Applications
- Data Origami is a subscription site with screencasts on doing things with data.
- Software Carpentry is pretty great.
- District Data Labs (DC)
- Zipfian Academy (San Francisco): Their Data Science Immersive is 12 weeks and costs $16,000. Here are two blogs documenting the Zipfian experience in some depth. Zipfian also has two other programs: Data Fellows and Data Engineering.
- Persontyle school of data science
- Data Science Retreat in Berlin, from the makers of HackerRetreat.
- Comparison of various bootcamp programs.
- Kaplan runs "Metis" 12-week bootcamps in a couple of subjects, including data science.
- Data Science Dojo: "unleash the data scientist in you" (some workshops and so on in Seattle and Silicon Valley)
- Insight Data Science Fellows Program: "An intensive six week post-doctoral training fellowship bridging the gap between academia and data science"
- The Open Source Data Science Masters curriculum
- listudy: lists for learning data science
- Quora post: "What are some good resources for learning about machine learning?"
- MetaOptimize post: "Good Freely Available Textbooks on Machine Learning"