Tfrecords to parquet #1085
First draft of the API overhaul changes. Adds most of the core functionality, including defining workflow graphs with a ColumnGroup class, the workflow and dataset changes, most operators converted to use the new API, etc.
Also partially fix some tests inside test_workflow
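To illustrate what "defining workflow graphs with a ColumnGroup class" can look like, here is a minimal toy sketch in plain Python. It is not the NVTabular implementation: the `Operator`, `FillMissing`, `Normalize`, and `ColumnGroup` classes below, and the `ops()` helper, are hypothetical stand-ins that only show how `>>` chaining can build a small graph of operators over named columns.

```python
class Operator:
    """Hypothetical base class for an operator applied to a group of columns."""

    def __rrshift__(self, other):
        # Supports `["x", "y"] >> SomeOp()`, where the left side is a plain list.
        return ColumnGroup(other) >> self


class FillMissing(Operator):
    """Toy placeholder operator (no real behaviour)."""


class Normalize(Operator):
    """Toy placeholder operator (no real behaviour)."""


class ColumnGroup:
    """Toy graph node: a set of column names plus the operator that produced it."""

    def __init__(self, columns, op=None, parents=None):
        self.columns = list(columns)
        self.op = op
        self.parents = parents or []

    def __rshift__(self, op):
        # Chaining creates a new node whose parent is the current node.
        return ColumnGroup(self.columns, op=op, parents=[self])

    def ops(self):
        """Return operator class names in execution order (parents first)."""
        names = []
        for parent in self.parents:
            names.extend(parent.ops())
        if self.op is not None:
            names.append(type(self.op).__name__)
        return names


# Build a two-step graph over columns "x" and "y".
graph = ["x", "y"] >> FillMissing() >> Normalize()
print(graph.columns, graph.ops())
```

The design point is that the graph is only described here; nothing is computed until a workflow walks the nodes and applies each operator.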
Co-authored-by: Richard (Rick) Zamora <[email protected]>
We should now apply online transforms with ```KerasSequenceLoader(workflow.transform(dataset), ...)``` instead of ```KerasSequenceLoader(dataset, workflows=[workflow], ...)```.
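The pattern behind this change can be sketched with toy classes: the workflow returns a lazily transformed dataset, and the loader only needs to know how to iterate a dataset. Everything below (`ToyDataset`, `LazyWorkflow`, `ToyLoader`) is a hypothetical illustration in plain Python, not the NVTabular or `KerasSequenceLoader` API.

```python
class ToyDataset:
    """Hypothetical dataset: wraps a zero-argument callable that yields batches."""

    def __init__(self, make_batches):
        self._make_batches = make_batches

    def __iter__(self):
        return iter(self._make_batches())


class LazyWorkflow:
    """Hypothetical workflow holding a single per-batch function."""

    def __init__(self, fn):
        self.fn = fn

    def transform(self, dataset):
        # Return a new dataset; batches are transformed only when iterated.
        return ToyDataset(lambda: (self.fn(batch) for batch in dataset))


class ToyLoader:
    """Hypothetical loader: it takes a dataset and knows nothing about workflows."""

    def __init__(self, dataset):
        self.dataset = dataset

    def __iter__(self):
        return iter(self.dataset)


raw = ToyDataset(lambda: [[1, 2], [3, 4]])
workflow = LazyWorkflow(lambda batch: [value * 10 for value in batch])

# The transform is applied online, as the loader pulls each batch.
loader = ToyLoader(workflow.transform(raw))
batches = list(loader)
print(batches)
```

Keeping the workflow out of the loader's signature means the loader has a single input type, and the same transformed dataset could be handed to any other consumer.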
* test_minmix
* updates test
* unittest ops
Re-add get_embedding_sizes. Note that this changes how we support multi-hot columns here (sizes are returned the same as for single-hot columns, and we don't use this method to distinguish between multi-hot and single-hot columns).
Add save_stats/load_stats/clear_stats methods to the workflow, with each StatOperator getting called as appropriate.
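As a rough illustration of that delegation, the sketch below shows a workflow forwarding save, load, and clear calls to each stat operator. The class names, the `save`/`load`/`clear` operator methods, and the JSON file format are all assumptions made for this example; the real NVTabular methods may differ in signature and storage format.

```python
import json
import os
import tempfile


class ToyStatOperator:
    """Hypothetical stat operator that keeps its fitted statistics in a dict."""

    def __init__(self):
        self.stats = {}

    def save(self):
        return dict(self.stats)

    def load(self, data):
        self.stats = dict(data)

    def clear(self):
        self.stats = {}


class StatsWorkflow:
    """Hypothetical workflow that delegates stats handling to its operators."""

    def __init__(self, stat_ops):
        self.stat_ops = stat_ops  # mapping: name -> ToyStatOperator

    def save_stats(self, path):
        # Ask every operator for its stats and write them to one JSON file.
        payload = {name: op.save() for name, op in self.stat_ops.items()}
        with open(path, "w") as f:
            json.dump(payload, f)

    def load_stats(self, path):
        with open(path) as f:
            payload = json.load(f)
        for name, op in self.stat_ops.items():
            op.load(payload.get(name, {}))

    def clear_stats(self):
        for op in self.stat_ops.values():
            op.clear()


normalize = ToyStatOperator()
normalize.stats = {"mean": 2.5, "std": 1.0}
workflow = StatsWorkflow({"normalize": normalize})

with tempfile.TemporaryDirectory() as tmp:
    stats_path = os.path.join(tmp, "stats.json")
    workflow.save_stats(stats_path)
    workflow.clear_stats()
    cleared = dict(normalize.stats)
    workflow.load_stats(stats_path)
    restored = dict(normalize.stats)

print(cleared, restored)
```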
git merge upstream/main Merge branch 'main' of https://github.com/NVIDIA/NVTabular
CI Results: GitHub pull request #1085 of commit f7669bafdb6cd53b05316baa242a2da87cc8ac62, no merge conflicts. Running as SYSTEM Setting status of f7669bafdb6cd53b05316baa242a2da87cc8ac62 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/3286/ and message: 'Pending' Using context: Jenkins Unit Test Run Building in workspace /var/jenkins_home/workspace/nvtabular_tests using credential nvidia-merlin-bot Cloning the remote Git repository Cloning repository https://github.com/NVIDIA/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/1085/*:refs/remotes/origin/pr/1085/* # timeout=10 > git rev-parse f7669bafdb6cd53b05316baa242a2da87cc8ac62^{commit} # timeout=10 Checking out Revision f7669bafdb6cd53b05316baa242a2da87cc8ac62 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f f7669bafdb6cd53b05316baa242a2da87cc8ac62 # timeout=10 Commit message: "leverage pandas-tfrecords" > git rev-list --no-walk d81d683d4445c8b25c2df8a0efcbaba246ce67ce # timeout=10 First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins6327538182942156347.sh Installing NVTabular Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (21.2.4) Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (57.4.0) Requirement already satisfied: wheel in /var/jenkins_home/.local/lib/python3.8/site-packages (0.37.0) Requirement already satisfied: pybind11 in /var/jenkins_home/.local/lib/python3.8/site-packages (2.7.1) running develop running egg_info creating nvtabular.egg-info writing nvtabular.egg-info/PKG-INFO writing dependency_links to nvtabular.egg-info/dependency_links.txt writing requirements to nvtabular.egg-info/requires.txt writing top-level names to nvtabular.egg-info/top_level.txt writing manifest file 'nvtabular.egg-info/SOURCES.txt' reading manifest template 'MANIFEST.in' warning: no files found matching '*.h' under directory 'cpp' warning: no files found matching '*.cu' under directory 'cpp' warning: no files found matching '*.cuh' under directory 'cpp' adding license file 'LICENSE' writing manifest file 'nvtabular.egg-info/SOURCES.txt' running build_ext x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.8 -c flagcheck.cpp -o flagcheck.o -std=c++17 building 'nvtabular_cpp' extension creating build creating build/temp.linux-x86_64-3.8 creating build/temp.linux-x86_64-3.8/cpp creating build/temp.linux-x86_64-3.8/cpp/nvtabular creating build/temp.linux-x86_64-3.8/cpp/nvtabular/inference x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g 
-fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+57.gf7669ba -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o -std=c++17 -fvisibility=hidden -g0 x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+57.gf7669ba -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o -std=c++17 -fvisibility=hidden -g0 x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+57.gf7669ba -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/categorify.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o -std=c++17 -fvisibility=hidden -g0 x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+57.gf7669ba -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/fill.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -std=c++17 -fvisibility=hidden -g0 
creating build/lib.linux-x86_64-3.8 x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -o build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so copying build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so -> Generating nvtabular/inference/triton/model_config_pb2.py from nvtabular/inference/triton/model_config.proto Creating /var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular.egg-link (link to .) nvtabular 0.6.0+57.gf7669ba is already the active version in easy-install.pth |
@benfred updated to write each tfrecord to one parquet file using cuDF.
CI Results: GitHub pull request #1085 of commit 390dacba0a7b8549956b081e2d69aac0c3de8016, no merge conflicts.
Setting status of 390dacba0a7b8549956b081e2d69aac0c3de8016 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/3378/ and message: 'Pending'
Checking out Revision 390dacba0a7b8549956b081e2d69aac0c3de8016 (detached)
Commit message: "write to one parquet"
> git rev-list --no-walk fae18bd9e616ce8d4ecb7da2f643f354be3d094c # timeout=10
First time build. Skipping changelog.
nvtabular 0.6.0+58.g390dacb is already the active version in easy-install.pth
The remaining clone, install, and compile output repeats the first CI run above, apart from these values, the temporary Jenkins script name, and setuptools now being 58.0.4.
This looks good! Left a couple minor comments below
CI Results: GitHub pull request #1085 of commit 797e575c85ffac7d5041615714c03125cf85225f, no merge conflicts.
Setting status of 797e575c85ffac7d5041615714c03125cf85225f to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/3434/ and message: 'Pending'
Checking out Revision 797e575c85ffac7d5041615714c03125cf85225f (detached)
Commit message: "updates"
> git rev-list --no-walk c27ee12a57a1e7cb52ee0a8c02572ab4cc10b304 # timeout=10
First time build. Skipping changelog.
nvtabular 0.6.0+59.g797e575 is already the active version in easy-install.pth
The remaining clone, install, and compile output repeats the first CI run above, apart from these values, the temporary Jenkins script name, and setuptools now being 58.0.4.
CI Results: GitHub pull request #1085 of commit 81833705f65b1dfc3afea9c0e5b559437b294083, no merge conflicts.
Setting status of 81833705f65b1dfc3afea9c0e5b559437b294083 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/3472/ and message: 'Pending'
Checking out Revision 81833705f65b1dfc3afea9c0e5b559437b294083 (detached)
Commit message: "Merge branch 'main' into tfrecords_to_parquet"
> git rev-list --no-walk 6c1c28fa935f98047654c3e873c4de9f0eae63f1 # timeout=10
First time build. Skipping changelog.
nvtabular 0.6.0+92.g8183370 is already the active version in easy-install.pth
The remaining clone, install, and compile output repeats the first CI run above, apart from these values, the temporary Jenkins script name, and setuptools now being 58.0.4.
CI Results: GitHub pull request #1085 of commit f8f630f9f2cb14247781624258ead2fe7c617ac3, no merge conflicts.
Setting status of f8f630f9f2cb14247781624258ead2fe7c617ac3 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/3474/ and message: 'Pending'
Checking out Revision f8f630f9f2cb14247781624258ead2fe7c617ac3 (detached)
Commit message: "Merge branch 'main' into tfrecords_to_parquet"
> git rev-list --no-walk 295d4e2c1059fe268a6e76560efac27ecaf6f887 # timeout=10
nvtabular 0.6.0+94.gf8f630f is already the active version in easy-install.pth
The remaining clone, install, and compile output repeats the first CI run above, apart from these values, the temporary Jenkins script name, and setuptools now being 58.0.4.
Adds an example and a function to convert TFRecords to Parquet.
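For readers unfamiliar with the input side of such a conversion, here is a minimal standard-library sketch of how the records in a TFRecord file are framed and read back. It is not the code in this PR (the commit history mentions leveraging pandas-tfrecords); the names `read_tfrecords`, `write_tfrecord`, `crc32c`, and `masked_crc` are hypothetical helpers introduced only for illustration.

```python
import struct

_MASK_DELTA = 0xA282EAD8  # constant TFRecord uses when masking checksums


def _build_crc32c_table():
    """Build the lookup table for CRC-32C (Castagnoli), reflected form."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC_TABLE = _build_crc32c_table()


def crc32c(data: bytes) -> int:
    """Compute the CRC-32C checksum of `data`."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    """Rotate the CRC right by 15 bits and add the mask delta (mod 2**32)."""
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + _MASK_DELTA) & 0xFFFFFFFF


def write_tfrecord(fileobj, payload: bytes) -> None:
    """Append one record: length, CRC of length, payload, CRC of payload."""
    length = struct.pack("<Q", len(payload))
    fileobj.write(length)
    fileobj.write(struct.pack("<I", masked_crc(length)))
    fileobj.write(payload)
    fileobj.write(struct.pack("<I", masked_crc(payload)))


def read_tfrecords(path, verify=True):
    """Yield the raw payload bytes of each record in a TFRecord file."""
    with open(path, "rb") as f:
        while True:
            header = f.read(12)  # 8-byte length + 4-byte masked CRC
            if not header:
                return  # clean end of file
            if len(header) < 12:
                raise ValueError("truncated record header")
            length, length_crc = struct.unpack("<QI", header)
            if verify and masked_crc(header[:8]) != length_crc:
                raise ValueError("corrupt record length")
            payload = f.read(length)
            footer = f.read(4)
            if len(payload) < length or len(footer) < 4:
                raise ValueError("truncated record body")
            if verify and masked_crc(payload) != struct.unpack("<I", footer)[0]:
                raise ValueError("corrupt record payload")
            yield payload
```

Each yielded payload is typically a serialized `tf.train.Example`. Decoding those messages into columns and writing them with a Parquet writer such as `pandas.DataFrame.to_parquet` would complete the conversion; that half is left out here because it needs third-party packages.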