Code for Chapter 8 not working #50

Open
RaymondBalise opened this issue Apr 22, 2021 · 1 comment

Comments

@RaymondBalise

I attempted to use the read_mnist() function from dslabs and it returned this error:

Error in readBin(conn, "integer", n = prod(dim), size = 1, signed = FALSE) : 
  cannot read from connection
In addition: Warning message:
In readBin(conn, "integer", n = prod(dim), size = 1, signed = FALSE) :
  URL 'http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz': Timeout of 60 seconds was reached

It looks like http://yann.lecun.com/exdb/mnist/ is no longer live, but with a little help from the Brave browser I found an archived copy of the site on the Wayback Machine and downloaded the files from there.

I modified the function to read the data out of local copies:

read_mnist_local <- function () {
    mnist <- list(train = list(images = c(), labels = c()), test = list(images = c(), 
        labels = c()))
    for (ttt in c("train", "t10k")) {
        fn <- paste0(ttt, "-images-idx3-ubyte.gz")
        # url <- url(paste0("", fn), "rb")
        # conn <- gzcon(url)
        conn <- gzcon(file(fn, "rb"))
        magic <- readBin(conn, "integer", n = 1, size = 4, endian = "big")
        typ <- bitwAnd(bitwShiftR(magic, 8), 255)
        ndm <- bitwAnd(magic, 255)
        dim <- readBin(conn, "integer", n = ndm, size = 4, endian = "big")
        data <- readBin(conn, "integer", n = prod(dim), size = 1, 
            signed = FALSE)
        tt <- ttt
        if (tt == "t10k") 
            tt <- "test"
        mmm <- matrix(data, nrow = dim[1], byrow = TRUE)
        mnist[[tt]][["images"]] <- mmm
        close(conn)
        fn <- paste0(ttt, "-labels-idx1-ubyte.gz")
        # url <- url(paste0("", fn), "rb")
        # conn <- gzcon(url)
        conn <- gzcon(file(fn, "rb"))
        magic <- readBin(conn, "integer", n = 1, size = 4, endian = "big")
        nlb <- readBin(conn, "integer", n = 1, size = 4, endian = "big")
        data <- readBin(conn, "integer", n = nlb, size = 1, signed = FALSE)
        mnist[[tt]][["labels"]] <- data
        close(conn)
    }
    mnist
}
 
# import MNIST training data
#mnist <- dslabs::read_mnist()
mnist <- read_mnist_local()

… and all is good. I don’t know the proper solution (other than hosting the files) but I figured I should share this in the hope it helps others.
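
For anyone who wants to sanity-check the header parsing in read_mnist_local() without downloading anything, here is a minimal sketch. It writes a tiny synthetic IDX file (the file name, the 2 x 2 x 3 shape, and the pixel values are made up for illustration, not real MNIST data) and reads it back with the same readBin() calls. It assumes the standard IDX layout, where the third byte of the magic number is the data type (0x08 = unsigned byte) and the fourth is the number of dimensions.

```r
# Hypothetical toy file: 2 "images" of 2 x 3 pixels, written in IDX format.
path <- file.path(tempdir(), "toy-images-idx3-ubyte.gz")
pixels <- 0:11

out <- gzfile(path, "wb")
writeBin(2051L, out, size = 4, endian = "big")          # magic 0x00000803: type 0x08, 3 dims
writeBin(c(2L, 2L, 3L), out, size = 4, endian = "big")  # dimension sizes
writeBin(as.raw(pixels), out)                           # one unsigned byte per pixel
close(out)

# Read it back the same way read_mnist_local() does.
conn <- gzcon(file(path, "rb"))
magic <- readBin(conn, "integer", n = 1, size = 4, endian = "big")
typ <- bitwAnd(bitwShiftR(magic, 8), 255)  # data type code
ndm <- bitwAnd(magic, 255)                 # number of dimensions
dims <- readBin(conn, "integer", n = ndm, size = 4, endian = "big")
data <- readBin(conn, "integer", n = prod(dims), size = 1, signed = FALSE)
close(conn)

# One row per image, pixels in the order they are stored in the file.
toy <- matrix(data, nrow = dims[1], byrow = TRUE)
```

If the round trip works, toy should be a 2 x 6 matrix whose rows hold the pixels of each toy image in file order. The same check can be pointed at the real downloaded files by swapping in their paths.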

@bradleyboehmke
Member

@RaymondBalise, here is another, slightly simpler approach. It uses the MNIST data set provided by Keras. We were going to use this data set initially but decided not to, since it comes as a 3D array rather than the 2D matrix returned by dslabs::read_mnist().

# Import MNIST data from Keras. This will import the data
# as a 3D array
mnist <- keras::dataset_mnist()

# Get our feature dimensions
mnist_train_dim <- dim(mnist$train$x)
train_nobs <- mnist_train_dim[1]
train_nfeat <- mnist_train_dim[2]*mnist_train_dim[3]

# Identify our sampled index
set.seed(123)
index <- sample(train_nobs, size = 10000)

# Convert features to 2D array, then to a dataframe
mnist_x_2d <- array(mnist$train$x, dim = c(train_nobs, train_nfeat))
mnist_x <- data.frame(mnist_x_2d)[index, ]

# extract response and convert to factor
mnist_y <- factor(mnist$train$y)[index]
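
One thing worth knowing about the array() reshape above: R fills arrays in column-major order, so each flattened row lists an image's pixels column by column, whereas the dslabs reader lays them out row by row. That makes no difference for most models, but it matters if you want to turn a row back into an image. The sketch below illustrates this on a toy array standing in for mnist$train$x (the 2 x 2 x 3 shape and values are made up); the aperm() variant is one possible way to get row-by-row order, not something taken from the book.

```r
# Toy stand-in for mnist$train$x: 2 "images" of 2 x 3 pixels.
x <- array(0L, dim = c(2, 2, 3))
x[1, , ] <- matrix(1:6, nrow = 2, byrow = TRUE)   # image 1, pixels numbered row by row
x[2, , ] <- matrix(7:12, nrow = 2, byrow = TRUE)  # image 2

n <- dim(x)[1]
p <- dim(x)[2] * dim(x)[3]

# Same reshape as in the Keras snippet: pixels come out column by column.
flat_col <- array(x, dim = c(n, p))

# Alternative: swap the row and column axes first so pixels come out row by row.
flat_row <- array(aperm(x, c(1, 3, 2)), dim = c(n, p))

print(flat_col[1, ])
print(flat_row[1, ])
```

Here flat_col[1, ] comes out as 1 4 2 5 3 6, while flat_row[1, ] comes out as 1 2 3 4 5 6, matching the order in which the toy image was defined.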
