[REVIEW] Create include folder for C++ API distribution #1129

dantegd · 2019-09-23T02:13:38Z

This PR closes #964 and proposes a folder structure for the C++ API include folder used for distribution.

The proposed folder structure mirrors cuML’s Python API structure (which itself mirrors Scikit-learn). cuDF has a flat structure, so this PR can be used to discuss which path we would prefer to take. Important observations:

This folder structure applies only to the include folder (which forms the consumable client C++ API). It should improve clarity and consistency with the consumable Python API.
On the other hand, I don’t think moving the src folder to the same structure is a good idea. Many algorithms have multiple files (which could even have naming clashes). Good examples are dbscan and kmeans, each of which has quite a few files of its own; a hypothetical cluster folder would become very big unless we introduced even more nesting into our folder structure. So I propose we stick to one folder per algo for the src folder. That should help devs keep folders somewhat lean, and with the changes to CMakeLists in this PR it is transparent to users and almost entirely transparent to devs.
The above is done via a new CUML_INCLUDE_DIR CMake variable that lets our codebase include the headers in include independently of its structure. This will also minimize code changes if we decide to move any API around in the future, or if the outcome of this PR's discussion is that we prefer a flat include/cuml folder.
Only a few files from src_prims (i.e. cuml prims) are needed in the API, so those are installed at the INSTALL step of CMake/make. This way the prims stay header-only and independent of the algorithms, and the consumable C++ API has the contents it needs.
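
As a rough illustration of what a single include root buys, the sketch below scans C++ source text for `#include` directives and flags quoted includes that climb out of their own directory with `..`, which is the kind of path that breaks when a header moves. It is only a sketch under stated assumptions: the helper name `find_relative_includes` and all sample header paths are hypothetical, and this is not the logic of cpp/scripts/include_checker.py.

```python
import re

# Matches `#include "path"` or `#include <path>`, capturing the opening
# delimiter and the path.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*(["<])([^">]+)[">]')


def find_relative_includes(source_text):
    """Return (line_number, path) for quoted includes that contain '..'.

    Hypothetical helper, for illustration only.
    """
    flagged = []
    for lineno, line in enumerate(source_text.splitlines(), start=1):
        match = INCLUDE_RE.match(line)
        if match is None:
            continue
        delimiter, path = match.groups()
        # Only quoted includes are resolved relative to the including file,
        # so only those depend on where the source file lives.
        if delimiter == '"' and ".." in path.split("/"):
            flagged.append((lineno, path))
    return flagged


# Hypothetical source file content; the header paths are made up.
sample = '''#include <vector>
#include "../../src/kmeans/kmeans.hpp"
#include <cuml/cluster/kmeans.hpp>
#include "local_helper.h"
'''

print(find_relative_includes(sample))
```

Includes written against the include root (the angle-bracket form in the sample) are untouched by a source-tree reshuffle, which is the property the CUML_INCLUDE_DIR variable is meant to provide.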

Tasks:

Create include folder and sub folders
Move header files to include folders for initial discussion (Most have been moved, enough to start discussion in PR)
Update CMakeLists and header includes in C++ code to test that things work
Test that install does install all the required cuML and prims headers needed
Discuss proposed folder structure
Update PR with folder structure discussion results
Maybe: Update C++ APIs to be templated like trustworthiness (could be a future issue/PR)
Update Python imports and code for changes
Test that everything works


teju85 · 2019-10-01T13:28:30Z

@dantegd never mind my previous request. I think I have a working version after merging the latest of branch-0.11. Will update this PR soon.

JohnZed · 2019-10-07T18:09:33Z

cpp/examples/kmeans/kmeans_example.cpp

@@ -68,41 +68,6 @@ bool get_arg(char **begin, char **end, const std::string &arg) {
  return false;
 }

-class cachingDeviceAllocator : public ML::deviceAllocator {


I think most changes are just moving around headers... out of curiosity, why do these cachingDeviceAllocator blocks get removed?

dantegd · 2019-10-07

If I'm not mistaken, it's because they were part of the examples instead of inside the cuML C++ library itself.

teju85 · 2019-10-09

That's true. cachingDeviceAllocator is a wrapper over the corresponding allocator inside cub, and cub is already one of our dependencies, so it made little sense to keep the wrapper inside the examples. It has been moved into our include folders.

It also helped resolve a cuml -> ml-prims -> cuml circular dependency on header files.
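
A cycle like the cuml -> ml-prims -> cuml one mentioned above is the kind of problem a small dependency-graph check can surface. The following is a minimal sketch, assuming the include relationships have already been collected into a dictionary. The `find_cycle` helper and both example graphs are hypothetical and are not part of cuML; the "after" graph only illustrates an acyclic layout, not the exact change made in this PR.

```python
def find_cycle(graph):
    """Return one dependency cycle as a list of nodes, or None if acyclic.

    `graph` maps each component to the list of components it includes.
    Hypothetical helper, for illustration only.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for dep in graph.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path, so the path from dep
                # back to dep is a cycle.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle is not None:
                    return cycle
        color[node] = BLACK
        path.pop()
        return None

    for node in graph:
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle is not None:
                return cycle
    return None


# Hypothetical header-level dependency graphs.
before = {"cuml": ["ml-prims"], "ml-prims": ["cuml"], "examples": ["cuml"]}
after = {"cuml": ["ml-prims"], "ml-prims": [], "examples": ["cuml"]}

print(find_cycle(before))
print(find_cycle(after))
```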

JohnZed · 2019-10-07T18:29:50Z

In general, I like the approach. I can't say I read every single header. My only concern is that the Python and C++ code directories sometimes differ arbitrarily, and C++ users now have to understand the Python layout. Maybe we should (in a follow-on change?) update the locations of the C++ source files as well? I'm thinking of randomforest -> ensemble as one clear example, but there are many others.

dantegd · 2019-10-07T18:49:33Z

@JohnZed the problem with that, and the reason I don't like the idea for the source files, is that the source folders would then contain far more files. For example, cluster would include all the kmeans sources alongside all the dbscan sources, which would require either added folder sublevels (not ideal) or living with mixed sources of different algorithms (even less ideal). One folder per algorithm source seems like the best way to avoid this mixing (with perhaps some exceptions). On the other hand, deciding where to put a header can be as simple as asking one of the Python devs, or creating a first PR with the proposed API in its location, which will then get feedback from us. That is a nice practice in itself, tbh.

dantegd · 2019-10-07T19:06:20Z

Note: PR will require new libcumlprims 0.11 nightly that is not yet available

mike-wendt · 2019-10-10T16:38:05Z

@dantegd @teju85 This is no longer blocked; packages are published: https://anaconda.org/rapidsai-nightly/libcumlprims/files

dantegd · 2019-10-10T21:03:16Z

rerun tests

teju85 · 2019-10-11T03:45:47Z

Thanks @mike-wendt !

JohnZed

I read over all the utilities and cmake code but only spot-checked the headers + source files.
My comments are minor and address the python utilities. I don't think they are really functional problems, so I'm ok pushing then updating the source (as long as that really happens soon! ;))

cpp/scripts/include_checker.py


dantegd added 25 commits September 22, 2019 15:06

FEA First changes to cmakelists for include folder

a83ade2

FIX Proposed C API file name change

f5d3b64

FEA Move make_blobs hpp file to begin testing

55c71c4

FIX Undo name change

e520c09

FIX relative imports

bd7133f

FEA Move cuML and cuML_api headers to inclide

0ab7b62

FIX cuML to cuml in headers for RAPIDS consistency

314486f

FIX cuML to cuml in headers for RAPIDS consistency

b40328d

FEA Move headers to include following Scikit-learn folder organization

95afaae

FEA CMakeLists install include folder

8c90ce5

FIX CMake FILES typo

3244d08

FIX CMake FILES typo

ac918f6

FEA Add cuml inlcude variable to cmake

3a4c3a1

FIX Move additional needed headers for decision tree

e419722

FIX Move cuml.hpp and cuml_api to be inside cuml folder

114e4a7

FIX include paths

291057c

FIX include paths

83c47cb

FIX more include paths

536be8e

FIX more include paths

1907adb

FIX more include paths

6a7ef92

FIX Move umapparams to include folder

511f1ae

FIX more include paths

5c72bad

FEA Updated examples for include folder

e036591

FEA Updated googletest for include folder

56c9abe

FEA Updated googletest for include folder

15f5cf9

dantegd requested review from teju85, JohnZed, cjnolet and oyilmaz-nvidia September 23, 2019 02:13

dantegd changed the title ~~[WIP][DISCUSS] Creat include folder for C++ API distribution~~ [WIP][DISCUSS] Create include folder for C++ API distribution Sep 23, 2019

teju85 added 2 commits October 1, 2019 04:45

final set of updates to svm and ml unit-tests for the header updates

c070257

updated a couple more files from svm with the missing header include updates

ec7ea03

teju85 added 5 commits October 6, 2019 22:57

updated include directories of cmake for comms

60a7f3b

fixed #include issue with the cuml comms header file

16fd8ba

updated pyx files for the latest include path changes

8f07139

updated few more headers and .pyx files with proper include paths

3930f3e

updated include path for adjusted rand-index pyx file

a742452

teju85 changed the title ~~[WIP][DISCUSS] Create include folder for C++ API distribution~~ [REVIEW] Create include folder for C++ API distribution Oct 7, 2019

teju85 added 2 commits October 6, 2019 23:48

making style checker happy

738df57

making style checker happy

4238351

JohnZed reviewed Oct 7, 2019


Merge branch 'branch-0.11' into fea-cpp-include

3e41b03

dantegd added the 0 - Blocked Cannot progress due to external reasons label Oct 7, 2019

dantegd removed the 0 - Blocked Cannot progress due to external reasons label Oct 10, 2019

dantegd marked this pull request as ready for review October 10, 2019 21:07

Merge branch 'branch-0.11' into 'fea-cpp-include'

548fec5

JohnZed approved these changes Oct 11, 2019


cpp/scripts/include_checker.py (4 review comments, outdated and resolved)

teju85 and others added 4 commits October 10, 2019 22:48

fixes addressing review comments

800afa2

added mpi comms directory to the include checker script

df9500f

Merge branch 'fea-cpp-include' of https://github.com/dantegd/cuml into fea-cpp-include

c80f6fa

Merge branch 'branch-0.11' into fea-cpp-include

3a86b9e

dantegd merged commit 494cf0b into rapidsai:branch-0.11 Oct 12, 2019

teju85 mentioned this pull request Oct 14, 2019

[REVIEW] Updated the treelite version #1239

Merged
