SBT Support #21

Merged 41 commits on Feb 16, 2017
8d142ec  clean up and formatting (karthikvadla, Feb 8, 2017)
b79ccc9  formatting testcases (karthikvadla, Feb 8, 2017)
cb4656e  upgraded to scala 2.11.8 (karthikvadla, Feb 9, 2017)
651ae54  Merge branch 'master' into tensorflow (karthikvadla, Feb 9, 2017)
b57acb9  merge (karthikvadla, Feb 9, 2017)
6919a17  formatting merge (karthikvadla, Feb 9, 2017)
9591da4  update readme (karthikvadla, Feb 9, 2017)
7c6a0db  Merge branch 'master' of github.com:karthikvadla16/spark-tensorflow-c… (karthikvadla, Feb 9, 2017)
c30085b  Clean up .gitignore file, remove tf folder inside test folder (karthikvadla, Feb 10, 2017)
4128deb  Merge remote-tracking branch 'origin/master' into tensorflow (karthikvadla, Feb 10, 2017)
09508d3  Merge remote-tracking branch 'origin/tensorflow' into newtens (karthikvadla, Feb 10, 2017)
3a3fbef  Rename spark-tf-core to core, and update all references (karthikvadla, Feb 10, 2017)
9dcb951  Remove core module, add License file and make pom changes (karthikvadla, Feb 10, 2017)
37b31c2  Renaming namespace, update all files with new namespace (karthikvadla, Feb 10, 2017)
71b427f  Fix custom schema, correct pom (karthikvadla, Feb 10, 2017)
b3691b3  update readme (karthikvadla, Feb 10, 2017)
ebbd0d9  Merge branch 'master' into tensorflow (karthikvadla, Feb 10, 2017)
6b5ffda  update readme (karthikvadla, Feb 10, 2017)
bab9e3a  merge from master (karthikvadla, Feb 14, 2017)
1822369  Merge branch 'tensorflow' of github.com:karthikvadla16/spark-tensorfl… (karthikvadla, Feb 14, 2017)
f6fee5f  add sbt build files (karthikvadla, Feb 14, 2017)
9bf2e70  Add conversion from mvn to sbt (#15) (karthikvadla, Feb 15, 2017)
ee16f57  Add classifier to bring in correct shaded jar and class (joyeshmishra, Feb 15, 2017)
efaf1ed  Add classifier to bring in correct shaded jar and class (#16) (joyeshmishra, Feb 15, 2017)
c2b23f1  Merge branch 'sbt' of github.com:tapanalyticstoolkit/spark-tensorflow… (joyeshmishra, Feb 15, 2017)
2cc06c3  Add travis.yml file (joyeshmishra, Feb 15, 2017)
e2f5add  Refactor travis file (joyeshmishra, Feb 15, 2017)
5645965  Refactor travis file (joyeshmishra, Feb 15, 2017)
d7da13c  Update README.md (joyeshmishra, Feb 15, 2017)
5a9f563  Add Travis support to sbt branch (#17) (joyeshmishra, Feb 15, 2017)
b349f6a  Cleanup (joyeshmishra, Feb 15, 2017)
4b3c8c4  Merge branch 'sbt' of github.com:joyeshmishra/spark-tensorflow-connec… (joyeshmishra, Feb 15, 2017)
c479efd  merge (joyeshmishra, Feb 15, 2017)
b8503a1  Remove central1 dependency in sbt and sudo requirement from travis.ym… (joyeshmishra, Feb 15, 2017)
b369d84  Merge branch 'sbt' of github.com:tapanalyticstoolkit/spark-tensorflow… (joyeshmishra, Feb 15, 2017)
562beab  SBT working, Cleaned up (#19) (karthikvadla, Feb 15, 2017)
f710331  Merge branch 'sbt' of github.com:tapanalyticstoolkit/spark-tensorflow… (joyeshmishra, Feb 15, 2017)
2df8580  use filterNot (joyeshmishra, Feb 15, 2017)
ac9afcc  Refactor to use filterNot (#20) (joyeshmishra, Feb 15, 2017)
9fb9fff  Merge branch 'sbt' of github.com:tapanalyticstoolkit/spark-tensorflow… (joyeshmishra, Feb 16, 2017)
6171354  Add sbt-spark-package plugin support (joyeshmishra, Feb 16, 2017)
2 changes: 2 additions & 0 deletions .gitignore
@@ -6,3 +6,5 @@ target
tf-sandbox
spark-warehouse/
metastore_db/
project/project/
test-output.tfr
21 changes: 21 additions & 0 deletions .travis.yml
@@ -0,0 +1,21 @@
language: scala

# Cache settings here are based on latest SBT documentation.
cache:
  directories:
    - $HOME/.ivy2/cache
    - $HOME/.sbt/boot/

before_cache:
# Tricks to avoid unnecessary cache updates
- find $HOME/.ivy2 -name "ivydata-*.properties" -delete
- find $HOME/.sbt -name "*.lock" -delete

scala:
- 2.11.8

jdk:
- oraclejdk8

script:
- sbt ++$TRAVIS_SCALA_VERSION clean publish-local
21 changes: 19 additions & 2 deletions README.md
@@ -1,3 +1,5 @@
[![Build Status](https://travis-ci.org/tapanalyticstoolkit/spark-tensorflow-connector.svg?branch=sbt)](https://travis-ci.org/tapanalyticstoolkit/spark-tensorflow-connector)

# spark-tensorflow-connector

This repo contains a library for loading and storing TensorFlow records with [Apache Spark](http://spark.apache.org/).
@@ -19,19 +21,34 @@ None.
2. [Apache Maven](https://maven.apache.org/)

## Building the library
Build the library using Maven as shown below.
You can build the library with either Maven or SBT.

#### Maven
Build the library using Maven (3.3) as shown below.

```sh
mvn clean install
```

#### SBT
Build the library using SBT (0.13.13) as shown below.
```sh
sbt clean assembly
```

## Using Spark Shell
Run this library in Spark using the `--jars` command line option in `spark-shell` or `spark-submit`. For example:

Maven Jars
```sh
$SPARK_HOME/bin/spark-shell --jars target/spark-tensorflow-connector-1.0-SNAPSHOT.jar,target/lib/tensorflow-hadoop-1.0-01232017-SNAPSHOT-shaded-protobuf.jar
```

SBT Jars
```sh
$SPARK_HOME/bin/spark-shell --jars target/scala-2.11/spark-tensorflow-connector-assembly-1.0-SNAPSHOT.jar
```

The following code snippet demonstrates usage.

```scala
@@ -40,7 +57,7 @@ import org.apache.spark.sql.{ DataFrame, Row }
import org.apache.spark.sql.catalyst.expressions.GenericRow
import org.apache.spark.sql.types._

val path = s"$TF_SANDBOX_DIR/test-output.tfr"
val path = "test-output.tfr"
val testRows: Array[Row] = Array(
new GenericRow(Array[Any](11, 1, 23L, 10.0F, 14.0, List(1.0, 2.0), "r1")),
new GenericRow(Array[Any](21, 2, 24L, 12.0F, 15.0, List(2.0, 2.0), "r2")))
52 changes: 52 additions & 0 deletions build.sbt
@@ -0,0 +1,52 @@
scalaVersion in Global := "2.11.8"

def ProjectName(name: String,path:String): Project = Project(name, file(path))

resolvers in Global ++= Seq("https://tap.jfrog.io/tap/public" at "https://tap.jfrog.io/tap/public" ,
"https://tap.jfrog.io/tap/public-snapshots" at "https://tap.jfrog.io/tap/public-snapshots" ,
"https://repo.maven.apache.org/maven2" at "https://repo.maven.apache.org/maven2" )

val `junit_junit` = "junit" % "junit" % "4.12"

val `org.apache.hadoop_hadoop-yarn-api` = "org.apache.hadoop" % "hadoop-yarn-api" % "2.7.3"

val `org.apache.spark_spark-core_2.11` = "org.apache.spark" % "spark-core_2.11" % "2.1.0"

val `org.apache.spark_spark-sql_2.11` = "org.apache.spark" % "spark-sql_2.11" % "2.1.0"

val `org.apache.spark_spark-mllib_2.11` = "org.apache.spark" % "spark-mllib_2.11" % "2.1.0"

val `org.scalatest_scalatest_2.11` = "org.scalatest" % "scalatest_2.11" % "2.2.6"

val `org.tensorflow_tensorflow-hadoop` = "org.tensorflow" % "tensorflow-hadoop" % "1.0-01232017-SNAPSHOT"


spName := "spark-tensorflow-connector"

sparkVersion := "2.1.0"

sparkComponents ++= Seq("sql", "mllib")

spIgnoreProvided := true

version := "1.0-SNAPSHOT"

name := "spark-tensorflow-connector"

organization := "org.trustedanalytics"

libraryDependencies in Global ++= Seq(`org.tensorflow_tensorflow-hadoop` classifier "shaded-protobuf",
`org.scalatest_scalatest_2.11` % "test" ,
`org.apache.spark_spark-sql_2.11` % "provided" ,
`org.apache.spark_spark-mllib_2.11` % "test" classifier "tests",
`org.apache.spark_spark-core_2.11` % "provided" ,
`org.apache.hadoop_hadoop-yarn-api` % "provided" ,
`junit_junit` % "test" )

assemblyExcludedJars in assembly := {
  val cp = (fullClasspath in assembly).value
  cp filterNot { x =>
    List("spark-tensorflow-connector-1.0-SNAPSHOT.jar",
      "tensorflow-hadoop-1.0-01232017-SNAPSHOT-shaded-protobuf.jar").contains(x.data.getName)
  }
}

licenses := Seq("Apache License 2.0" -> url("http://www.apache.org/licenses/LICENSE-2.0.html"))
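
Review note on the `assemblyExcludedJars` setting in build.sbt: the setting names the jars to leave out of the assembly, so applying `filterNot` with the two listed file names marks every other classpath entry for exclusion. The plain-Scala sketch below illustrates only that filtering logic. It is not the build itself: `java.io.File` stands in for sbt's classpath entries, and the classpath contents (including `hypothetical-transitive-dep-1.0.jar`) are made up for illustration.

```scala
import java.io.File

object ExcludedJarsSketch {
  // File names copied from the list in build.sbt: these are NOT excluded.
  val keep: List[String] = List(
    "spark-tensorflow-connector-1.0-SNAPSHOT.jar",
    "tensorflow-hadoop-1.0-01232017-SNAPSHOT-shaded-protobuf.jar")

  // Hypothetical classpath. In the real build this comes from
  // (fullClasspath in assembly).value; the paths here are assumptions.
  val classpath: Seq[File] = Seq(
    new File("lib/spark-tensorflow-connector-1.0-SNAPSHOT.jar"),
    new File("lib/tensorflow-hadoop-1.0-01232017-SNAPSHOT-shaded-protobuf.jar"),
    new File("lib/scala-library-2.11.8.jar"),
    new File("lib/hypothetical-transitive-dep-1.0.jar"))

  // Same shape as the build setting: drop the entries named in `keep`,
  // so whatever remains is the set of excluded jars.
  val excluded: Seq[File] = classpath.filterNot(f => keep.contains(f.getName))

  def main(args: Array[String]): Unit =
    excluded.foreach(f => println(s"excluded from assembly: ${f.getName}"))
}
```

Matching on `getName` compares only the file name, not the full path, so a renamed or re-versioned jar (for example after bumping `version`) would no longer match the hard-coded names and would be excluded as well.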
1 change: 1 addition & 0 deletions project/build.properties
@@ -0,0 +1 @@
sbt.version=0.13.13
5 changes: 5 additions & 0 deletions project/plugins.sbt
@@ -0,0 +1,5 @@
resolvers += "bintray-spark-packages" at "https://dl.bintray.com/spark-packages/maven/"

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")

addSbtPlugin("org.spark-packages" % "sbt-spark-package" % "0.2.5")