-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add db benchmark script #1928
Add db benchmark script #1928
Conversation
meh, i think im close but im having some issues due to docker memory swapping which isnt allowed by some db-benchmark helpers. will come back to this. |
Lol. I unknowingly created a shell script that had a name conflict with a db-benchmark script which caused issue. |
RUN apt-get update && \ | ||
apt-get install -y git build-essential | ||
|
||
# Install R, curl, and python deps | ||
RUN apt-get update && apt-get -y install --no-install-recommends --no-install-suggests \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the apt-get update seems to be redundant? looks like we can merge these two apt runs to reduce build time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Thx.
|
||
# Clone datafusion-python and build python library | ||
# Not sure if the wheel will be the same on all computers | ||
RUN git clone https://github.com/datafusion-contrib/datafusion-python \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be good to clone a particular tag/commit to make this more reproducible.
I have this working now. Would at least one other person be able to give it a try. I do the following from root directory of
So far im not seeing as good of results in docker as i was natively but i think thats expected, even using Right now, this only works with official datafusion release (version 7). However, I added option to pull in local datafusion so that its easier to benchmark local changes. Even if the docker performance isnt optimal it should at least give a baseline for making local changes and seeing if there are improvements - which i think is the intent of this script. let me know if any questions or if you see any areas for improvement. |
@matthewmturner you are on a mac machine right? |
Yes, M1 Mac |
that's expected then. the overhead should go away with when it's executed on a linux box. |
Is this specific to M1 Mac? I am running Ubuntu 20.04 on AMD64 and I get this error on docker build:
Will try |
@bobtins thanks for trying it out! You'll just need to update the wheel file name in the docker file. Looks like the below is what you'll need. /datafusion-python/target/wheels/datafusion-0.4.0-cp36-abi3-linux_x86_64.whl |
Yeah, that got it to work. |
Nice find - I can look into integrating that |
@houqp If no other feedback I think this is good to merge for now. @bobtins im a bit constrained on time right now to work on getting the architectures / wheels integrated properly into the script. given youve shown interest in the benchmarking area would you be willing to handle that as a follow on PR? |
Possible to adjust the docker file to make it more generic? Meet the same problem here. After some search, Buildx might do the trick? |
Almost got it...will post a diff as soon as it works. |
@bobtins thanks so much! |
benchmarks/db-benchmark/README.md
Outdated
Run the following from root `arrow-datafusion` directory | ||
|
||
```bash | ||
$ docker build -t db-benchmark -f benchmarks/db-benchmark/db-benchmark.dockerfile . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$ docker build -t db-benchmark -f benchmarks/db-benchmark/db-benchmark.dockerfile . | |
$ docker buildx build -t db-benchmark -f benchmarks/db-benchmark/db-benchmark.dockerfile . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it working with changes suggested.
Looking forward to playing around with this and maybe adding some more features.
# 1. datafusion-python that builds from datafusion version referenced datafusion-python | ||
RUN cd datafusion-python \ | ||
&& maturin build --release \ | ||
&& python3 -m pip install target/wheels/datafusion-0.4.0-cp36-abi3-linux_aarch64.whl \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
&& python3 -m pip install target/wheels/datafusion-0.4.0-cp36-abi3-linux_aarch64.whl \ | |
&& case "${TARGETPLATFORM}" in \ | |
*/amd64) CPUARCH=x86_64 ;; \ | |
*/arm64) CPUARCH=aarch64 ;; \ | |
*) exit 1 ;; \ | |
esac \ | |
&& python3 -m pip install target/wheels/datafusion-0.4.0-cp36-abi3-linux_${CPUARCH}.whl \ | |
&& case "${TARGETPLATFORM}" in \ | |
*/amd64) CPUARCH=x86_64 ;; \ | |
*/arm64) CPUARCH=aarch64 ;; \ | |
*) exit 1 ;; \ | |
esac \ | |
&& python3 -m pip install target/wheels/datafusion-0.4.0-cp36-abi3-linux_${CPUARCH}.whl \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's weird, didn't mean to put it in twice; haven't used this github feature before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, ignore those suggestions...I just did a diff and attached it.
@bobtins thank you for the help! I am going to review and integrate tomorrow morning. |
benchmarks/db-benchmark/README.md
Outdated
Run the following from root `arrow-datafusion` directory | ||
|
||
```bash | ||
$ docker buildx -t db-benchmark -f benchmarks/db-benchmark/db-benchmark.dockerfile . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$ docker buildx -t db-benchmark -f benchmarks/db-benchmark/db-benchmark.dockerfile . | |
$ docker buildx build -t db-benchmark -f benchmarks/db-benchmark/db-benchmark.dockerfile . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sry, did that local but forgot to add there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @matthewmturner !
Which issue does this PR close?
Closes #1870
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?