Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build tiledbvcf-py on Windows #507

Merged
merged 3 commits into from
Mar 17, 2023
Merged

Build tiledbvcf-py on Windows #507

merged 3 commits into from
Mar 17, 2023

Conversation

jdblischak
Copy link
Collaborator

@jdblischak jdblischak commented Mar 2, 2023

This is a follow-up to PR #484 from @dhoke4tdb. That merged PR enabled building libtiledbvcf on Windows by linking against a customized build of htslib on msys2. The goal of this PR is to enable building the Python package on Windows as well.

Current status: Python client builds on Windows, tests pass (link to passing build on my fork)

Local installation

Steps to build locally

  • Checkout my branch

    git clone https://github.com/TileDB-Inc/TileDB-VCF.git
    cd TileDB-VCF
    git remote add john https://github.com/jdblischak/TileDB-VCF.git
    git fetch john
    git checkout john/build-win-py
    
  • Install the conda env:

    mamba env create --file ci/gha-win-env.yml
    mamba activate win-env
    
  • Build libtiledbvcf

    cmd /C CALL ci\build-libtiledbvcf.bat
    tiledbvcf version
  • Build tiledbvcf-py

    cmd /C CALL ci\build-tiledbvcf-py.bat
    python -c "import tiledbvcf; print(tiledbvcf.version)"
  • Run tests

    pytest apis\python\tests

Warning messages from tests

The warning messages from running the tests on Windows are different from the Azure build. Copy-pasting them below in case they are of concern

pytest warnings on Windows
============================= test session starts =============================
platform win32 -- Python 3.11.0, pytest-7.2.2, pluggy-1.0.0
rootdir: D:\a\TileDB-VCF\TileDB-VCF\apis\python
collected 43 items

apis\python\tests\test_dask.py ....                                      [  9%]
apis\python\tests\test_tiledbvcf.py .........s............s............. [ 93%]
s..                                                                      [100%]

============================== warnings summary ===============================
C:\Users\runneradmin\micromamba-root\envs\win-env\Lib\site-packages\tornado\ioloop.py:265
  C:\Users\runneradmin\micromamba-root\envs\win-env\Lib\site-packages\tornado\ioloop.py:265: DeprecationWarning: There is no current event loop
    loop = asyncio.get_event_loop()

C:\Users\runneradmin\micromamba-root\envs\win-env\Lib\site-packages\tornado\ioloop.py:350
  C:\Users\runneradmin\micromamba-root\envs\win-env\Lib\site-packages\tornado\ioloop.py:350: DeprecationWarning: make_current is deprecated; start the event loop first
    self.make_current()

C:\Users\runneradmin\micromamba-root\envs\win-env\Lib\site-packages\tornado\platform\asyncio.py:360
  C:\Users\runneradmin\micromamba-root\envs\win-env\Lib\site-packages\tornado\platform\asyncio.py:360: DeprecationWarning: There is no current event loop
    self.old_asyncio = asyncio.get_event_loop()

C:\Users\runneradmin\micromamba-root\envs\win-env\Lib\site-packages\bokeh\core\property\primitive.py:37
  C:\Users\runneradmin\micromamba-root\envs\win-env\Lib\site-packages\bokeh\core\property\primitive.py:37: DeprecationWarning: `np.bool8` is a deprecated alias for `np.bool_`.  (Deprecated NumPy 1.24)
    bokeh_bool_types += (np.bool8,)

tests/test_tiledbvcf.py::test_incomplete_read_generator
tests/test_tiledbvcf.py::test_incomplete_read_generator
  D:\a\TileDB-VCF\TileDB-VCF\apis\python\tests\test_tiledbvcf.py:375: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
    overall_df = overall_df.append(df, ignore_index=True)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================= 40 passed, 3 skipped, 6 warnings in 51.22s ==================

@dhoke4tdb
Copy link
Contributor

The following patch will allow that final setup.py to complete without errors, but is not a correct solution (won't be appropriate for any msys build if there is, or ever will be, one)

diff --git a/apis/python/setup.py b/apis/python/setup.py
index bf67312..de3b246 100644
--- a/apis/python/setup.py
+++ b/apis/python/setup.py
@@ -35,6 +35,7 @@ from wheel.bdist_wheel import bdist_wheel
 
 import multiprocessing
 import os
+import platform
 import shutil
 import subprocess
 import sys
@@ -223,6 +224,7 @@ class BuildExtCmd(build_ext):
     """Builds the Pybind11 extension module."""
 
     def build_extensions(self):
+        # Note: These options are simply reported as invalid by msvc cl 19.29.30145
         opts = ["-std=c++11", "-g"]
         if TILEDBVCF_DEBUG_BUILD:
             opts.extend(["-O0"])
@@ -243,7 +245,7 @@ class BuildExtCmd(build_ext):
             ext.include_dirs.append(pyarrow.get_include())
 
             # don't overlink the arrow core library
-            if "arrow" in ext.libraries:
+            if (platform.system() != "Windows") and ("arrow" in ext.libraries):
                 ext.libraries.remove("arrow")
             ext.library_dirs.extend(pyarrow.get_library_dirs())
 

@jdblischak
Copy link
Collaborator Author

The following patch will allow that final setup.py to complete without errors

@dhoke4tdb Thanks! I'm testing it locally now

but is not a correct solution (won't be appropriate for any msys build if there is, or ever will be, one)

It's my understanding that we only intend to support an MSVC build of libtiledbvcf and tiledbvcf-py, at least for now

@jdblischak
Copy link
Collaborator Author

Alright, I have it working locally!

python -c "import tiledbvcf; print(tiledbvcf.version)"
rem 0.22.1.dev11+dirty

@dhoke4tdb
Copy link
Contributor

@jdblischak here is a patch that has a few additional changes for msvc, this is to be applied to the commit -before- you applied my earlier patch...
These changes should probably also get into the main repository, if you want to replace the earlier change with these, that'd be great, if not I'll try to remember if I take note of this PR being merged and make the additional adjustments later.

diff --git a/apis/python/setup.py b/apis/python/setup.py
index bf67312..cb0c416 100644
--- a/apis/python/setup.py
+++ b/apis/python/setup.py
@@ -35,6 +35,7 @@ from wheel.bdist_wheel import bdist_wheel
 
 import multiprocessing
 import os
+import platform
 import shutil
 import subprocess
 import sys
@@ -223,11 +224,19 @@ class BuildExtCmd(build_ext):
     """Builds the Pybind11 extension module."""
 
     def build_extensions(self):
-        opts = ["-std=c++11", "-g"]
-        if TILEDBVCF_DEBUG_BUILD:
-            opts.extend(["-O0"])
-        else:
-            opts.extend(["-O2"])
+        if (platform.system() != "Windows"):
+            opts = ["-std=c++11", "-g"]
+            if TILEDBVCF_DEBUG_BUILD:
+                opts.extend(["-O0"])
+            else:
+                opts.extend(["-O2"])
+        else: # windows
+            # Note: newer versions of msvc cl may not recognize -std:c++11
+            opts = ["-std:c++11", "-Zi"]
+            if TILEDBVCF_DEBUG_BUILD:
+                opts.extend(["-Od"])
+            else:
+                opts.extend(["-O2"])
 
         link_opts = []
         for ext in self.extensions:
@@ -243,7 +252,7 @@ class BuildExtCmd(build_ext):
             ext.include_dirs.append(pyarrow.get_include())
 
             # don't overlink the arrow core library
-            if "arrow" in ext.libraries:
+            if (platform.system() != "Windows") and ("arrow" in ext.libraries):
                 ext.libraries.remove("arrow")
             ext.library_dirs.extend(pyarrow.get_library_dirs())
 

@jdblischak
Copy link
Collaborator Author

replace the earlier change with these, that'd be great

It's no problem to do an interactive rebase to update your patch. I'm working on other things at the moment, but I'll return to this first thing Monday

@jdblischak
Copy link
Collaborator Author

@dhoke4tdb I rebased and applied your updated patch

@jdblischak
Copy link
Collaborator Author

I also created a GitHub Actions workflow to test the build on Windows. The good news is that we can now build and install the Python client on Windows. The bad news is that both locally and on the GitHub Actions runner, the Python client is non-functional.

When I run pytest apis\python, it fails with the error below. According to this pytest Issue pytest-dev/pytest#7634 (comment), this means that there is a problem with a DLL

Windows fatal exception: access violation

And when I try to simply read an array, Python immediately crashes without any error message. The exit code is -1073741819 (note that the pytest run also exits with the same exit code).

Here is the quick python test code:

import tiledbvcf

ds = tiledbvcf.Dataset(
  uri = "libtiledbvcf/test/inputs/arrays/v3/ingested_2samples",
  mode = "r"
)

And here is what happens when I run it:

python ci\test-tiledbvcf-py.py
echo %ERRORLEVEL%
rem -1073741819

@jdblischak
Copy link
Collaborator Author

Also note that libtiledbvcf itself appears to be functional, so the problem is with the Python client

tiledbvcf stat --uri libtiledbvcf\test\inputs\arrays\v3\ingested_2samples
rem Statistics for dataset 'libtiledbvcf\test\inputs\arrays\v3\ingested_2samples':
rem - Version: 3
rem - Row tile extent: 10
rem - Tile capacity: 10,000
rem - Anchor gap: 1,000
rem - Number of samples: 2
rem - Extracted attributes: none

tiledbvcf list --uri libtiledbvcf\test\inputs\arrays\v3\ingested_2samples
rem HG00280
rem HG01762

@dhoke4tdb
Copy link
Contributor

The 'top' problem is that the tiledbvcf.exe winds up being linked/used as a .dll by the libtiledbvcf extension module instead of tiledbvcf.dll.
Below is a patch to address this.
After the patch, appears I am able to execute your mini test example ds = tiledbvcf.Dataset(...), when using your build-win-py environment. (If I try to use the api/tests/conda-env.yml 'tiledbvcf-py' environment, some other problem seems to occur involving loading of the tiledb.dll/tiledb extension module... )
However, I have not been successful running the api/python/tests, and I appear to be in some sort of 'python module hell'...
If I try to run pytest tests/test_tiledbvcf.py -s, I'm winding up with an error that looks like its failing on
import tiledb.cc as lt
but if I execute python directly and do an 'import tiledb.cc as lt' that seems to succeed...
So I'll pass this patch along, but will not be surprised if it does not solve all your problems - hoping you have greater python knowledge/fu to maybe recognize what's going on with the further module issues...

diff --git a/libtiledbvcf/src/CMakeLists.txt b/libtiledbvcf/src/CMakeLists.txt
index 471e8de..a78f2d9 100644
--- a/libtiledbvcf/src/CMakeLists.txt
+++ b/libtiledbvcf/src/CMakeLists.txt
@@ -207,7 +207,11 @@ add_executable(tiledbvcf-bin
   ${CMAKE_CURRENT_SOURCE_DIR}/cli/tiledbvcf.cc
 )
 
-set_target_properties(tiledbvcf-bin PROPERTIES OUTPUT_NAME tiledbvcf)
+if (WIN32)
+  set_target_properties(tiledbvcf-bin PROPERTIES OUTPUT_NAME tiledbvcfcli)
+else()
+  set_target_properties(tiledbvcf-bin PROPERTIES OUTPUT_NAME tiledbvcf)
+endif()
 
 if(!WIN32)
   target_link_libraries(tiledbvcf-bin
@@ -309,8 +313,28 @@ if (WIN32)
 endif()
 # TBD: how to handle the tiledbvcf4test, don't really want it part of ultimate distribution...
 # ... does release packaging simply need to ignore it?
+
+if(WIN32)
+  # Having the .dll and the .exe with the same name was resulting in a .exp
+  # file for the .exe being 'last out' (replacing one of same name for the .dll)
+  # and was causing the python api extension
+  # to link to the .exe, which was not functional.
+  # So, tiledbvcf-bin now sets an output (above somewhere) of tiledbvcfcli.exe
+  # and we install it here RENAMEing it to match the known name in pre-existing
+  # *nix world.
+  install(PROGRAMS $<TARGET_FILE:tiledbvcf-bin>
+    RENAME tiledbvcf.exe
+    DESTINATION ${CMAKE_INSTALL_BINDIR}
+    )
+else()
+  install(
+    TARGETS tiledbvcf-bin
+    RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR}
+    )
+endif()
+
 install(
-  TARGETS tiledbvcf tiledbvcf-bin ${TILEDBVCF_EXTRA_INSTALL_TARGETS}
+  TARGETS tiledbvcf ${TILEDBVCF_EXTRA_INSTALL_TARGETS}
   RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR}
   LIBRARY DESTINATION lib
   ARCHIVE DESTINATION lib

@jdblischak
Copy link
Collaborator Author

jdblischak commented Mar 15, 2023

Thanks David! After applying your patch, I have things working well locally.

My simple examples runs fine:

import tiledbvcf

ds = tiledbvcf.Dataset(
  uri = "libtiledbvcf/test/inputs/arrays/v3/ingested_2samples",
  mode = "r"
)
[[ed for my future reference: 
x = ds.read(attrs=["sample_name"])]
]]
print(x["sample_name"].unique())
## ['HG00280' 'HG01762']

And the tests are passing. The only failure was one test tries to call the cli tool bcftools, which isn't available as a conda package for Windows (if someone requests it on Windows, then we'd have to go through the same process as we did for htslib; bcftools install instructions for msys2). Below are the truncated test results

pytest apis\python\tests\test_tiledbvcf.py
rem ================================================= test session starts =================================================
rem platform win32 -- Python 3.11.0, pytest-7.2.2, pluggy-1.0.0
rem rootdir: C:\Users\john\repos\TileDB-VCF\apis\python
rem collected 39 items
rem 
rem apis\python\tests\test_tiledbvcf.py .........s............s.............F..                                      [100%]
rem 
rem ------------------------------------------------ Captured stderr call -------------------------------------------------
rem 'bcftools' is not recognized as an internal or external command,
rem operable program or batch file.
rem 
rem =============================================== short test summary info ===============================================
rem FAILED apis\python\tests\test_tiledbvcf.py::test_ingest_with_stats - subprocess.CalledProcessError: Command 'bcftools view --no-version -Oz -o C:\Users\john\AppData\Local\Temp\pytest-o...
rem =========================== 1 failed, 36 passed, 2 skipped, 2 warnings in 110.96s (0:01:50) ===========================


pytest apis\python\tests\test_dask.py
rem ================================================= test session starts =================================================
rem platform win32 -- Python 3.11.0, pytest-7.2.2, pluggy-1.0.0
rem rootdir: C:\Users\john\repos\TileDB-VCF\apis\python
rem collected 4 items
rem 
rem apis\python\tests\test_dask.py ....                                                                              [100%]
rem
rem ============================================ 4 passed, 4 warnings in 5.33s ============================================

@jdblischak jdblischak marked this pull request as ready for review March 16, 2023 17:23
@jdblischak
Copy link
Collaborator Author

@ihnorton This PR is ready for review. The Python client now builds on Windows. Some notes:

  • Please approve my new GitHub Actions workflow so that it runs
  • Please don't squash the commits. I purposefully re-formatted them to give @dhoke4tdb credit for his contributions. I'd prefer a fast-forward merge, but rebasing would also be ok

@ihnorton ihnorton requested a review from awenocur March 16, 2023 18:00
Copy link
Member

@ihnorton ihnorton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM other than the comment. Amazing work! cc @awenocur for final review/merge as maintainer.

@jdblischak
Copy link
Collaborator Author

Also, since I'm a first-time contributor to this repo, my GitHub Action workflow has to be re-approved

@jdblischak
Copy link
Collaborator Author

My latest commit passed in the build from my fork

@jdblischak
Copy link
Collaborator Author

And as far as I can tell, the test that requires bcftools is still run on linux and macOS. Only 2 tests are skipped, just like the jobs that most recently ran on the master branch

@jdblischak
Copy link
Collaborator Author

@awenocur Could you please approve the GitHub Actions workflow to run and review the PR for any remaining items you would like me to address?

@awenocur awenocur merged commit 54ce6fc into TileDB-Inc:master Mar 17, 2023
@jdblischak jdblischak deleted the build-win-py branch March 17, 2023 19:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants