forked from apache/spark
Commit
[SPARK-33398] Fix loading tree models prior to Spark 3.0
### What changes were proposed in this pull request?
In https://github.com/apache/spark/pull/21632/files#diff-0fdae8a6782091746ed20ea43f77b639f9c6a5f072dd2f600fcf9a7b37db4f47, a new field `rawCount` was added to `NodeData`. As a result, a tree model trained in 2.4 cannot be loaded in 3.0/3.1/master. The `rawCount` field is used only in training, not in `transform`/`predict`/`featureImportance`, so this change sets it to -1L for those models.

### Why are the changes needed?
To support loading old tree models in 3.0/3.1/master.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added test suites.

Closes apache#30889 from zhengruifeng/fix_tree_load.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
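The idea in the commit message can be sketched outside Spark. The snippet below is a minimal, standard-library illustration, not the patch itself: it models saved tree nodes as plain dictionaries and fills in `rawCount` with -1 wherever the field is absent. The function name `backfill_raw_count` and the sample node fields other than `rawCount` are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, List

# Sentinel taken from the commit message ("set it to -1L").
MISSING_RAW_COUNT = -1


def backfill_raw_count(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return node rows in which every row has a rawCount field.

    Rows that already carry rawCount are returned unchanged; rows saved
    without it get a copy with the sentinel value added. The input list
    and its dictionaries are never mutated.
    """
    return [
        row if "rawCount" in row else {**row, "rawCount": MISSING_RAW_COUNT}
        for row in rows
    ]


# Hypothetical node rows: the first lacks rawCount, the second has it.
old_rows = [{"id": 0, "prediction": 1.0}, {"id": 1, "prediction": 0.0, "rawCount": 7}]
new_rows = backfill_raw_count(old_rows)
```

Because the commit message states that `rawCount` is only used in training, a sentinel is enough for prediction on a loaded model; the sketch makes no claim about how the real loader detects old models.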
1 parent 963c60f commit 6b7527e
Showing 74 changed files with 122 additions and 20 deletions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Binary file not shown.
Binary file added
BIN
+36 Bytes
...s/dtc-2.4.7/data/.part-00000-bd7ae42f-c890-406c-894c-ca4eac67c690-c000.snappy.parquet.crc
Binary file not shown.
Empty file.
Binary file added
BIN
+3.17 KB
...models/dtc-2.4.7/data/part-00000-bd7ae42f-c890-406c-894c-ca4eac67c690-c000.snappy.parquet
Binary file not shown.
Binary file not shown.
Binary file added
BIN
+16 Bytes
mllib/src/test/resources/ml-models/dtc-2.4.7/metadata/.part-00000.crc
Binary file not shown.
Empty file.
1 change: 1 addition & 0 deletions
mllib/src/test/resources/ml-models/dtc-2.4.7/metadata/part-00000
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
{"class":"org.apache.spark.ml.classification.DecisionTreeClassificationModel","timestamp":1608687929358,"sparkVersion":"2.4.7","uid":"dtc_bc7ad285bb73","paramMap":{},"defaultParamMap":{"impurity":"gini","maxDepth":5,"labelCol":"label","maxMemoryInMB":256,"featuresCol":"features","predictionCol":"prediction","minInfoGain":0.0,"seed":159147643,"rawPredictionCol":"rawPrediction","minInstancesPerNode":1,"cacheNodeIds":false,"probabilityCol":"probability","maxBins":32,"checkpointInterval":10},"numFeatures":692,"numClasses":2} |
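The metadata line above records the `sparkVersion` that wrote the model, which is what distinguishes a 2.4 fixture from a 3.x one. As a rough sketch, assuming a loader decides on the major version alone (an assumption, not something this page shows), the check could look like this. The helper names are hypothetical, and the sample string is an abbreviated copy of the metadata above.

```python
import json


def spark_major_version(metadata_line: str) -> int:
    """Parse one metadata JSON line and return the major Spark version."""
    meta = json.loads(metadata_line)
    return int(meta["sparkVersion"].split(".")[0])


def written_before_spark_3(metadata_line: str) -> bool:
    """True when the model was saved by a Spark release older than 3.0."""
    return spark_major_version(metadata_line) < 3


# Abbreviated form of the dtc-2.4.7 metadata (most keys omitted).
sample = (
    '{"class":"org.apache.spark.ml.classification.DecisionTreeClassificationModel",'
    '"sparkVersion":"2.4.7","numFeatures":692,"numClasses":2}'
)
is_legacy = written_before_spark_3(sample)
```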
Binary file not shown.
Binary file added
BIN
+36 Bytes
...s/dtr-2.4.7/data/.part-00000-39b027f0-a437-4b3d-84af-d861adcb9ca8-c000.snappy.parquet.crc
Binary file not shown.
Empty file.
Binary file added
BIN
+3.19 KB
...models/dtr-2.4.7/data/part-00000-39b027f0-a437-4b3d-84af-d861adcb9ca8-c000.snappy.parquet
Binary file not shown.
Binary file not shown.
Binary file added
BIN
+12 Bytes
mllib/src/test/resources/ml-models/dtr-2.4.7/metadata/.part-00000.crc
Binary file not shown.
Empty file.
1 change: 1 addition & 0 deletions
mllib/src/test/resources/ml-models/dtr-2.4.7/metadata/part-00000
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
{"class":"org.apache.spark.ml.regression.DecisionTreeRegressionModel","timestamp":1608687932847,"sparkVersion":"2.4.7","uid":"dtr_c16a90fcdaf8","paramMap":{},"defaultParamMap":{"labelCol":"label","checkpointInterval":10,"minInfoGain":0.0,"maxMemoryInMB":256,"minInstancesPerNode":1,"maxBins":32,"seed":926680331,"cacheNodeIds":false,"maxDepth":5,"predictionCol":"prediction","featuresCol":"features","impurity":"variance"},"numFeatures":692} |
Binary file not shown.
Binary file added
BIN
+44 Bytes
.../gbtc-2.4.7/data/.part-00000-dacbde64-c861-41c7-91c0-6da8cc01fb43-c000.snappy.parquet.crc
Binary file not shown.
Empty file.
Binary file added
BIN
+4.44 KB
...odels/gbtc-2.4.7/data/part-00000-dacbde64-c861-41c7-91c0-6da8cc01fb43-c000.snappy.parquet
Binary file not shown.
Binary file not shown.
Binary file added
BIN
+16 Bytes
mllib/src/test/resources/ml-models/gbtc-2.4.7/metadata/.part-00000.crc
Binary file not shown.
Empty file.
1 change: 1 addition & 0 deletions
mllib/src/test/resources/ml-models/gbtc-2.4.7/metadata/part-00000
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
{"class":"org.apache.spark.ml.classification.GBTClassificationModel","timestamp":1608687932103,"sparkVersion":"2.4.7","uid":"gbtc_81db008b4f25","paramMap":{"maxIter":2},"defaultParamMap":{"seed":-1287390502,"maxMemoryInMB":256,"stepSize":0.1,"validationTol":0.01,"maxBins":32,"checkpointInterval":10,"predictionCol":"prediction","lossType":"logistic","rawPredictionCol":"rawPrediction","featuresCol":"features","cacheNodeIds":false,"maxIter":20,"featureSubsetStrategy":"all","impurity":"gini","minInstancesPerNode":1,"minInfoGain":0.0,"maxDepth":5,"subsamplingRate":1.0,"labelCol":"label","probabilityCol":"probability"},"numFeatures":692,"numTrees":2} |
Binary file added
BIN
+8 Bytes
mllib/src/test/resources/ml-models/gbtc-2.4.7/treesMetadata/._SUCCESS.crc
Binary file not shown.
Binary file added
BIN
+36 Bytes
....7/treesMetadata/.part-00000-81137d9f-31e3-4a90-813c-ddc394101e21-c000.snappy.parquet.crc
Binary file not shown.
Empty file.
Binary file added
BIN
+3 KB
...c-2.4.7/treesMetadata/part-00000-81137d9f-31e3-4a90-813c-ddc394101e21-c000.snappy.parquet
Binary file not shown.
Binary file not shown.
Binary file added
BIN
+40 Bytes
.../gbtr-2.4.7/data/.part-00000-3b5433ff-d346-4511-9aab-639288bfae6d-c000.snappy.parquet.crc
Binary file not shown.
Empty file.
Binary file added
BIN
+3.65 KB
...odels/gbtr-2.4.7/data/part-00000-3b5433ff-d346-4511-9aab-639288bfae6d-c000.snappy.parquet
Binary file not shown.
Binary file not shown.
Binary file added
BIN
+16 Bytes
mllib/src/test/resources/ml-models/gbtr-2.4.7/metadata/.part-00000.crc
Binary file not shown.
Empty file.
1 change: 1 addition & 0 deletions
mllib/src/test/resources/ml-models/gbtr-2.4.7/metadata/part-00000
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
{"class":"org.apache.spark.ml.regression.GBTRegressionModel","timestamp":1608687942434,"sparkVersion":"2.4.7","uid":"gbtr_0a74cb2536ff","paramMap":{"maxIter":2},"defaultParamMap":{"impurity":"variance","maxMemoryInMB":256,"maxDepth":5,"subsamplingRate":1.0,"validationTol":0.01,"labelCol":"label","maxIter":20,"checkpointInterval":10,"minInfoGain":0.0,"predictionCol":"prediction","stepSize":0.1,"cacheNodeIds":false,"lossType":"squared","seed":-131597770,"featureSubsetStrategy":"all","featuresCol":"features","minInstancesPerNode":1,"maxBins":32},"numFeatures":692,"numTrees":2} |
Binary file added
BIN
+8 Bytes
mllib/src/test/resources/ml-models/gbtr-2.4.7/treesMetadata/._SUCCESS.crc
Binary file not shown.
Binary file added
BIN
+32 Bytes
....7/treesMetadata/.part-00000-6b9124f5-87fe-4fd8-ad9c-4be239c2215a-c000.snappy.parquet.crc
Binary file not shown.
Empty file.
Binary file added
BIN
+2.97 KB
...r-2.4.7/treesMetadata/part-00000-6b9124f5-87fe-4fd8-ad9c-4be239c2215a-c000.snappy.parquet
Binary file not shown.
Binary file not shown.
Binary file added
BIN
+40 Bytes
...s/rfc-2.4.7/data/.part-00000-e41a7b98-91f8-4485-b112-25b4b11c9009-c000.snappy.parquet.crc
Binary file not shown.
Empty file.
Binary file added
BIN
+3.75 KB
...models/rfc-2.4.7/data/part-00000-e41a7b98-91f8-4485-b112-25b4b11c9009-c000.snappy.parquet
Binary file not shown.
Binary file not shown.
Binary file added
BIN
+16 Bytes
mllib/src/test/resources/ml-models/rfc-2.4.7/metadata/.part-00000.crc
Binary file not shown.
Empty file.
1 change: 1 addition & 0 deletions
mllib/src/test/resources/ml-models/rfc-2.4.7/metadata/part-00000
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
{"class":"org.apache.spark.ml.classification.RandomForestClassificationModel","timestamp":1608687930713,"sparkVersion":"2.4.7","uid":"rfc_db1adb353f1e","paramMap":{"numTrees":2},"defaultParamMap":{"impurity":"gini","predictionCol":"prediction","numTrees":20,"maxDepth":5,"featureSubsetStrategy":"auto","subsamplingRate":1.0,"featuresCol":"features","checkpointInterval":10,"rawPredictionCol":"rawPrediction","cacheNodeIds":false,"labelCol":"label","seed":207336481,"probabilityCol":"probability","maxBins":32,"minInstancesPerNode":1,"minInfoGain":0.0,"maxMemoryInMB":256},"numFeatures":692,"numClasses":2,"numTrees":2} |
Binary file added
BIN
+8 Bytes
mllib/src/test/resources/ml-models/rfc-2.4.7/treesMetadata/._SUCCESS.crc
Binary file not shown.
Binary file added
BIN
+36 Bytes
....7/treesMetadata/.part-00000-21082d24-b666-4c4e-a823-70c7afdcbdc5-c000.snappy.parquet.crc
Binary file not shown.
Empty file.
Binary file added
BIN
+3.31 KB
...c-2.4.7/treesMetadata/part-00000-21082d24-b666-4c4e-a823-70c7afdcbdc5-c000.snappy.parquet
Binary file not shown.
Binary file not shown.
Binary file added
BIN
+40 Bytes
...s/rfr-2.4.7/data/.part-00000-4a69607d-6edb-40fc-b681-981caaeca996-c000.snappy.parquet.crc
Binary file not shown.
Empty file.
Binary file added
BIN
+3.71 KB
...models/rfr-2.4.7/data/part-00000-4a69607d-6edb-40fc-b681-981caaeca996-c000.snappy.parquet
Binary file not shown.
Binary file not shown.
Binary file added
BIN
+16 Bytes
mllib/src/test/resources/ml-models/rfr-2.4.7/metadata/.part-00000.crc
Binary file not shown.
Empty file.
1 change: 1 addition & 0 deletions
mllib/src/test/resources/ml-models/rfr-2.4.7/metadata/part-00000
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
{"class":"org.apache.spark.ml.regression.RandomForestRegressionModel","timestamp":1608687933536,"sparkVersion":"2.4.7","uid":"rfr_d946d96b7ff0","paramMap":{"numTrees":2},"defaultParamMap":{"numTrees":20,"featureSubsetStrategy":"auto","maxDepth":5,"minInstancesPerNode":1,"labelCol":"label","cacheNodeIds":false,"checkpointInterval":10,"featuresCol":"features","maxMemoryInMB":256,"predictionCol":"prediction","minInfoGain":0.0,"subsamplingRate":1.0,"impurity":"variance","seed":235498149,"maxBins":32},"numFeatures":692,"numTrees":2} |
Binary file added
BIN
+8 Bytes
mllib/src/test/resources/ml-models/rfr-2.4.7/treesMetadata/._SUCCESS.crc
Binary file not shown.
Binary file added
BIN
+32 Bytes
....7/treesMetadata/.part-00000-dfe4db51-d349-447a-9b86-d95edaabcde8-c000.snappy.parquet.crc
Binary file not shown.
Empty file.
Binary file added
BIN
+2.98 KB
...r-2.4.7/treesMetadata/part-00000-dfe4db51-d349-447a-9b86-d95edaabcde8-c000.snappy.parquet
Binary file not shown.