Name		Name	Last commit message	Last commit date
parent directory ..
src/main/scala/acme		src/main/scala/acme
README.md		README.md
pom.xml		pom.xml

README.md

Overview

This module represents the work of data scientist, who explore data and then tune and create a predictive model. The task here is to create a classifier model that will predict room occupancy given the data set made available in acme-dataeng.

The data scientists will work in Cloudera's Data Science Workbench. Although this work will also use Scala and Spark, executing code in a shell or notebook-like tool is somewhat different than writing code to be compiled

Setup

In the workbench, create a new Git project from this repository on github.com: https://github.com/srowen/cdsw-simple-serving. Once the project is open, navigate to this directory, acme-datasci. Start a Spark Scala engine; 2 vCPU and 4GB will work fine.

Running

Although the source code can be compiled by Maven and opened in an IDE, it's written with a notebook like the workbench in mind. The source code executed by the workbench need not fall into a directory structure or even class declaration. To try to accommodate both this usage and the standards of a Maven build, the data scientist works in the file src/main/scala/acme/ACMEModel.scala. This file has a simplistic wrapper declaration but most of the code inside is intended to be executed as a block in the workbench.

Once this file is open in the workbench, select the code between // Start workbench session and // End workbench session and press command-Enter. It will execute. A data scientist could change the code, create a different model, and then check in the result using git

The result is, however, compilable Scala code that can be reused by a deployment module, acme-serving. It can be tested and evaluated by a continuous integration system like with any software because it also follows a standard Maven build structure.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

acme-datasci

acme-datasci

README.md

Overview

Setup

Running

Files

acme-datasci

Directory actions

More options

Directory actions

More options

Latest commit

History

acme-datasci

Folders and files

parent directory

README.md

Overview

Setup

Running