Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-36090: [C++] Add testing libraries for Acero & Datasets #36206

Merged

Conversation

westonpace
Copy link
Member

@westonpace westonpace commented Jun 21, 2023

Rationale for this change

The acero & datasets tests rely on some common sources. The current build includes those in every test executable as sources. This was a bit of a change introduced as we extracted acero into its own library. This causes those files to be built again and again and this caused an increase in build times.

This change creates test libraries for each of those modules. These test libraries include the common sources and are built once. These libraries can then be included in the tests. This avoids the duplicate builds.

What changes are included in this PR?

Changes to how common test sources are built and linked.

Are these changes tested?

Yes. These tests still run as part of the unit tests.

I'm not entirely sure if the cmake.in and pc.in files are tested or not as I don't know entirely how they factor into things. However, I believe these are only needed if users want to rely on these test libraries and I'm not sure how critical that is.

Are there any user-facing changes?

No.

@github-actions
Copy link

Thanks for opening a pull request!

If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

In the case of PARQUET issues on JIRA the title also supports:

PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

See also:

@lidavidm
Copy link
Member

If you don't want a 'real' library, you could use a CMake object library, so that the object files for common sources are built once and then linked into each binary.

@westonpace
Copy link
Member Author

Kou had suggested this as well. I started by just copy/pasting flight (but must've missed something as CI tests are trying to build it even when testing is off). I'll look into making it an object library to see if that's simpler.

@westonpace
Copy link
Member Author

I assume I can just ignore ARROW_ACERO_TEST_LINKAGE for object libraries

@westonpace
Copy link
Member Author

westonpace commented Jun 21, 2023

Actually, it looks like we are already using object libraries (this is from the definition of add_arrow_lib):

  if(WIN32
     OR (CMAKE_GENERATOR STREQUAL Xcode)
     OR CMAKE_VERSION VERSION_LESS 3.12
     OR NOT ARROW_POSITION_INDEPENDENT_CODE)
    # We need to compile C++ separately for each library kind (shared and static)
    # because of dllexport declarations on Windows.
    # The Xcode generator doesn't reliably work with Xcode as target names are not
    # guessed correctly.
    # We can't use target for object library with CMake 3.11 or earlier.
    # See also: Object Libraries:
    # https://cmake.org/cmake/help/latest/command/add_library.html#object-libraries
    set(USE_OBJLIB OFF)
  else()
    set(USE_OBJLIB ON)
  endif()

  if(USE_OBJLIB)
    # Generate a single "objlib" from all C++ modules and link
    # that "objlib" into each library kind, to avoid compiling twice
    add_library(${LIB_NAME}_objlib OBJECT ${ARG_SOURCES})

I don't know if these requirements (e.g. XCode, WIN32) apply for these testing libraries or not but it seems we can't always use object libraries in all situations.

@lidavidm
Copy link
Member

I think that sets up an object library that gets linked into a shared and a static library.

But ugh, I forgot about the dllimport/dllexport issues. That said...it shouldn't matter for test code only? There would be no import/export issue since it's all the same binary.

@westonpace
Copy link
Member Author

I think that sets up an object library that gets linked into a shared and a static library.

Ah, ok.

But ugh, I forgot about the dllimport/dllexport issues. That said...it shouldn't matter for test code only? There would be no import/export issue since it's all the same binary.

That makes sense to me.

@westonpace westonpace changed the title [C++] Add testing libraries for Acero & Datasets GH-36090: [C++] Add testing libraries for Acero & Datasets Jun 21, 2023
@westonpace westonpace requested review from kou and lidavidm June 21, 2023 22:34
@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting committer review Awaiting committer review labels Jun 22, 2023
Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

We need CMake 3.11 or later to use an OBJECT library for target_link_libraries() but it'll be done by GH-35921 soon.

@github-actions
Copy link

⚠️ GitHub issue #36090 has been automatically assigned in GitHub to PR creator.

@kou
Copy link
Member

kou commented Jun 22, 2023

Can we close GH-36090 by only this?

Or do we need more pull requests to close GH-36090? If so, we may want to create an issue for each sub-task.

@westonpace
Copy link
Member Author

I think this solves #36090 (which is focused on the regression). I've created #36246 as a follow-up task. I also want to investigate precompiled headers vs. unity builds but I don't think we need a GH issue for that just yet.

@westonpace
Copy link
Member Author

Should I wait for #35921 before merging?

@kou
Copy link
Member

kou commented Jun 23, 2023

We don't need to wait for #35921.

@kou kou merged commit e3c5dbb into apache:main Jun 23, 2023
@conbench-apache-arrow
Copy link

Conbench analyzed the 6 benchmark runs on commit e3c5dbb6.

There were 4 benchmark results indicating a performance regression:

The full Conbench report has more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[C++] Excessive increase in debug build times
3 participants