8344232: [PPC64] secondary_super_cache does not scale well: C1 and interpreter #22881


@TheRealMDoerr TheRealMDoerr commented Dec 25, 2024

PPC64 implementation of ead0116. I have implemented a couple of rotate instructions.
The first commit only implements lookup_secondary_supers_table_var and uses it in C2. The second commit makes the changes to use it in the interpreter, runtime and C1.
The C1 part is refactored so that, when UseSecondarySupersTable is disabled, the generated code is the same as before this patch. Some stubs are modified to provide one more temp register.

A performance difference can be observed when C2 is disabled (measured on Power10):

-XX:TieredStopAtLevel=1 -XX:-UseSecondarySupersTable:
Benchmark                                   Mode  Cnt    Score    Error   Units
SecondarySuperCacheHits.test                avgt   15   13.028 ±  0.005   ns/op
SecondarySuperCacheInterContention.test     avgt   15  417.746 ± 19.046   ns/op
SecondarySuperCacheInterContention.test:t1  avgt   15  417.852 ± 17.814   ns/op
SecondarySuperCacheInterContention.test:t2  avgt   15  417.641 ± 23.431   ns/op
SecondarySuperCacheIntraContention.test     avgt   15  340.995 ±  5.620   ns/op

-XX:TieredStopAtLevel=1 -XX:+UseSecondarySupersTable:
Benchmark                                   Mode  Cnt    Score    Error   Units
SecondarySuperCacheHits.test                avgt   15   14.539 ±  0.002   ns/op
SecondarySuperCacheInterContention.test     avgt   15   25.667 ±  0.576   ns/op
SecondarySuperCacheInterContention.test:t1  avgt   15   25.709 ±  0.655   ns/op
SecondarySuperCacheInterContention.test:t2  avgt   15   25.626 ±  0.820   ns/op
SecondarySuperCacheIntraContention.test     avgt   15   22.466 ±  1.554   ns/op

SecondarySuperCacheHits seems to be slightly slower (14.539 vs. 13.028 ns/op, about 12%), but SecondarySuperCacheInterContention and SecondarySuperCacheIntraContention are much faster, by roughly 16x and 15x respectively (when C2 is disabled).
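
The gap between the Hits benchmark and the two Contention benchmarks without the table is what one would expect from a single-entry cache that gets overwritten whenever type checks alternate between different secondary supertypes. The following is a minimal, single-threaded Java model of that effect, not HotSpot code: `OneEntryCacheModel`, its fields, and the write counter are hypothetical names chosen for illustration, and real contention also involves cache-line traffic between threads, which this sketch does not model.

```java
// Simplified model (an assumption, not HotSpot code): a class-like object with a
// list of secondary supertypes and a single-entry cache of the last successful check.
final class OneEntryCacheModel {
    private final Class<?>[] secondarySupers;
    private Class<?> cache;   // models the one-element secondary super cache
    int cacheWrites;          // counts how often the cache entry is overwritten

    OneEntryCacheModel(Class<?>... secondarySupers) {
        this.secondarySupers = secondarySupers.clone();
    }

    boolean isSubtypeOf(Class<?> superType) {
        if (cache == superType) {
            return true;                  // cache hit: no scan, no store
        }
        for (Class<?> s : secondarySupers) {
            if (s == superType) {
                cache = superType;        // store the result on a hit
                cacheWrites++;
                return true;
            }
        }
        return false;                     // miss: cache stays unchanged
    }

    public static void main(String[] args) {
        OneEntryCacheModel sameType = new OneEntryCacheModel(Runnable.class, Comparable.class);
        OneEntryCacheModel alternating = new OneEntryCacheModel(Runnable.class, Comparable.class);
        for (int i = 0; i < 1000; i++) {
            sameType.isSubtypeOf(Runnable.class);        // always the same supertype
            alternating.isSubtypeOf(Runnable.class);     // two supertypes in turn
            alternating.isSubtypeOf(Comparable.class);
        }
        System.out.println("same type:   " + sameType.cacheWrites + " cache writes");
        System.out.println("alternating: " + alternating.cacheWrites + " cache writes");
    }
}
```

In this model, repeated checks against one supertype write the cache once, while alternating checks write it on every call. That matches the shape of the numbers above only qualitatively; it says nothing about the absolute timings.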


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8344232: [PPC64] secondary_super_cache does not scale well: C1 and interpreter (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/22881/head:pull/22881
$ git checkout pull/22881

Update a local copy of the PR:
$ git checkout pull/22881
$ git pull https://git.openjdk.org/jdk.git pull/22881/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 22881

View PR using the GUI difftool:
$ git pr show -t 22881

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/22881.diff

Using Webrev

Link to Webrev Comment


bridgekeeper bot commented Dec 25, 2024

👋 Welcome back mdoerr! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.


openjdk bot commented Dec 25, 2024

@TheRealMDoerr This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8344232: [PPC64] secondary_super_cache does not scale well: C1 and interpreter

Reviewed-by: rrich, amitkumar

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 303 new commits pushed to the master branch:

  • 5cc690d: 8347994: Add additional diagnostics to macOS failure handler to assist with diagnosing MCast test failures
  • c00557f: 8345049: Remove the jmx.tabular.data.hash.map compatibility property
  • 8b46db0: 8345045: Remove the jmx.remote.x.buffer.size JMX notification property
  • 119899b: 8345048: Remove the jmx.extend.open.types compatibility property
  • 89bfcb8: 8348308: Make fields of ListSelectionEvent final
  • 17df515: 8348303: Remove repeated 'a' from ListSelectionEvent
  • 337118d: 8348388: Incorrect copyright header in TestFluidAndNonFluid.java
  • 3069e91: 8344969: Remove the jmx.mxbean.multiname compatibility property
  • c882160: 8344966: Remove the allowNonPublic MBean compatibility property
  • 6032f6e: 8341696: C2: Non-fluid StringBuilder pattern bails out in OptoStringConcat
  • ... and 293 more: https://git.openjdk.org/jdk/compare/62a4544bb76aa339a8129f81d2527405a1b1e7e3...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the rfr label (Pull request is ready for review) Dec 25, 2024

openjdk bot commented Dec 25, 2024

@TheRealMDoerr The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.


mlbridge bot commented Dec 25, 2024

Webrevs

Member

@offamitkumar offamitkumar left a comment

Looks good. Maybe update copyright header year.

@TheRealMDoerr
Contributor Author

Thanks for the review!
I haven't found precise rules on how to handle copyright headers. I usually use the year of the PR publication date. Does anybody know of other requirements?

@offamitkumar
Member

I have seen the header being updated in some PRs, like this one: #22246

So I think we are expected to update it, although I haven’t seen any such rule.

@TheRealMDoerr
Contributor Author

> I have seen the header being updated in some PRs, like this one: #22246
>
> So I think we are expected to update it, although I haven’t seen any such rule.

Some people use a script, but it's unclear if it does the right thing: #22890 (comment)

@TheRealMDoerr
Contributor Author

sh make/scripts/update_copyright_year.sh says "No files were changed". All changes in this PR were done in 2024, so Copyright year changes are only needed for files which are changed in 2025.

__ check_klass_subtype_slow_path(sub_klass, super_klass, temp1_reg, temp2_reg); // returns with CR0.eq if successful
__ crandc(CCR0, Assembler::equal, CCR0, Assembler::equal); // failed: CR0.ne
temp1_reg = R6;
__ check_klass_subtype_slow_path(sub_klass, super_klass, temp1_reg, noreg); // may return with CR0.eq if successful
Member

The comment is unclear to me. Where is the result of the subtype check? Can it also return with CR0.ne if successful?
I noticed you added the crandc to check_klass_subtype_slow_path_linear() but if we reach there calling from this location then the crandc is not emitted because L_success == nullptr. Is this ok?
I'd appreciate comments on the masm methods explaining how the result of the subtype check is conveyed.

Contributor Author

@TheRealMDoerr TheRealMDoerr Jan 21, 2025

With this PR, the correct result is always in CR0 (unless a label or a result GP register is provided).
"return" here means "blr". It can optionally be used in the success case, and CR0 is then always "eq".
I've moved the crandc instruction into check_klass_subtype_slow_path_linear, which contains such a "blr" for the success case. This way, the linear version works exactly as before.
The new code in check_klass_subtype_slow_path_table doesn't use "blr". That's why I added "may" to the comment.

Member

This is extremely hard to see.
L2154 with the "blr" in check_klass_subtype_slow_path_linear looks redundant to me. It should be removed if you agree.
The comment here should be adapted then too.
Also, the comment at macroAssembler_ppc.cpp:2258 needs to be adapted because fallthrough from check_klass_subtype_slow_path does not mean "not successful". L_failure could be renamed to L_fast_path_failure.

Contributor Author

I think L_failure is correct. And it's used the same way on all platforms.

Member

Ah, I see. I missed that L_failure is passed by reference:

void MacroAssembler::check_klass_subtype(Register sub_klass,
                                         Register super_klass,
                                         Register temp1_reg,
                                         Register temp2_reg,
                                         Label& L_success) {
  Label L_failure;
  check_klass_subtype_fast_path(sub_klass, super_klass, temp1_reg, temp2_reg, &L_success, &L_failure);
  check_klass_subtype_slow_path(sub_klass, super_klass, temp1_reg, temp2_reg, &L_success);
  bind(L_failure); // Fallthru if not successful.
}

Therefore it's never null, and L_failure is reached if, and only if, the result of the type check is negative.
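
The control flow of check_klass_subtype quoted above (fast path first, slow path only if the fast path cannot decide, failure as the common fallthrough) can be sketched in plain Java. This is only a model of the flow, under stated assumptions: the class `SubtypeCheckFlow`, the `FastPath` enum, and the rule "interfaces go to the slow path" are simplifications invented for illustration, the sketch only handles the case where the subtype is an ordinary class (not an interface or an array), and the real fast path works on data stored in the klass rather than on java.lang.Class.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the fast-path / slow-path split; not the HotSpot algorithm.
final class SubtypeCheckFlow {
    enum FastPath { SUCCESS, FAILURE, NEEDS_SLOW_PATH }

    // Decides what it can cheaply; defers interface checks to the slow path.
    static FastPath fastPath(Class<?> sub, Class<?> sup) {
        if (sub == sup) return FastPath.SUCCESS;
        if (sup.isInterface()) return FastPath.NEEDS_SLOW_PATH;
        for (Class<?> c = sub.getSuperclass(); c != null; c = c.getSuperclass()) {
            if (c == sup) return FastPath.SUCCESS;
        }
        return FastPath.FAILURE;
    }

    // Scans all interfaces reachable from sub (directly or transitively).
    static boolean slowPath(Class<?> sub, Class<?> sup) {
        Set<Class<?>> seen = new HashSet<>();
        Deque<Class<?>> work = new ArrayDeque<>();
        for (Class<?> c = sub; c != null; c = c.getSuperclass()) work.push(c);
        while (!work.isEmpty()) {
            for (Class<?> i : work.pop().getInterfaces()) {
                if (i == sup) return true;
                if (seen.add(i)) work.push(i);
            }
        }
        return false;
    }

    // Mirrors the quoted shape: success from either path, failure as the fallthrough.
    static boolean checkKlassSubtype(Class<?> sub, Class<?> sup) {
        switch (fastPath(sub, sup)) {
            case SUCCESS: return true;
            case FAILURE: return false;
            default:      return slowPath(sub, sup);
        }
    }

    public static void main(String[] args) {
        System.out.println(checkKlassSubtype(String.class, CharSequence.class));
        System.out.println(checkKlassSubtype(Integer.class, Number.class));
        System.out.println(checkKlassSubtype(Object.class, Runnable.class));
    }
}
```

The point of the model is that a negative answer can come from either path, so a single failure exit shared by both is consistent with the observation above.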

@@ -2154,6 +2154,96 @@ void MacroAssembler::check_klass_subtype_slow_path(Register sub_klass,
else if (result_reg == noreg) { blr(); } // return with CR0.eq if neither label nor result reg provided

bind(fallthru);
if (L_success != nullptr && result_reg == noreg) {
Member

Is there a problem if L_success == nullptr && result_reg == noreg and there aren't any secondary supers?
In that case we would reach here with CR0.eq from L2134 and we would fallthrough with CR0.eq. Due to the change in C1StubId::slow_subtype_check_id we would return there with CR0.eq.

Member

This is a reproducer:

public class InstanceOfTest {

    public static interface TestInterfaceI {
    }

    public static class TestClassNegative {
    }

    public static void main(String[] args) {
        Object obj = new TestClassNegative();
        for (int i = 100_000; i > 0; i--) {
            dontinline_testMethod(obj);
        }
        boolean result = dontinline_testMethod(obj);
        System.out.println("result: " + result);
    }

    static boolean dontinline_testMethod(Object obj) {
        return obj instanceof TestInterfaceI;
    }
}
./jdk/bin/java -XX:TieredStopAtLevel=1 -XX:-UseSecondarySupersTable InstanceOfTest
result: true
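
A self-checking variant of the reproducer could also cover the positive case, so that a wrong answer fails loudly instead of only being printed. The following is a sketch under assumptions: `InstanceOfSelfCheck` and `TestClassPositive` are hypothetical names, the method is not pinned against inlining, and it has not been verified that this exact variant triggers the problem. It would be run with the same flags as above (-XX:TieredStopAtLevel=1 -XX:-UseSecondarySupersTable).

```java
// Hypothetical self-checking variant of the reproducer above.
class InstanceOfSelfCheck {

    interface TestInterfaceI {
    }

    // Implements no interfaces, so the check below must be false for it.
    static class TestClassNegative {
    }

    // Implements the interface, so the check below must be true for it.
    static class TestClassPositive implements TestInterfaceI {
    }

    static boolean testMethod(Object obj) {
        return obj instanceof TestInterfaceI;
    }

    // Calls the check often enough that it is likely to get compiled,
    // and fails on the first wrong answer.
    static void run() {
        Object negative = new TestClassNegative();
        Object positive = new TestClassPositive();
        for (int i = 0; i < 100_000; i++) {
            if (testMethod(negative)) {
                throw new AssertionError("negative case reported as instance, iteration " + i);
            }
            if (!testMethod(positive)) {
                throw new AssertionError("positive case not reported as instance, iteration " + i);
            }
        }
    }

    public static void main(String[] args) {
        run();
        System.out.println("OK");
    }
}
```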

@TheRealMDoerr
Contributor Author

Thanks for looking at this! The condition was wrong. I have improved the design of check_klass_subtype_slow_path_linear and removed the early return by "blr". Please take a look at 37789b3.
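
The failure mode discussed in this thread (falling through with CR0.eq still set when there are no secondary supers) can be illustrated with a small Java model in which a boolean stands in for the condition bit. This is a sketch, not the PPC64 code: `SlowPathFlagModel` and both method names are hypothetical, and the idea that the stale "eq" comes from comparing the array length with zero is an assumption, since the review only says that CR0.eq is set at L2134.

```java
// Hypothetical model: a boolean stands in for the CR0.eq condition bit.
final class SlowPathFlagModel {

    // Problematic shape: the flag is set by an earlier compare (assumed here to be
    // "length == 0") and is not cleared before falling through with nothing to scan.
    static boolean staleFlagCheck(Class<?>[] secondarySupers, Class<?> superType) {
        boolean eq = (secondarySupers.length == 0);
        if (eq) {
            return eq;                    // stale "eq" is read as a hit
        }
        for (Class<?> s : secondarySupers) {
            eq = (s == superType);
            if (eq) break;
        }
        return eq;
    }

    // Intended behavior: a miss is always reported as "not equal".
    static boolean explicitMissCheck(Class<?>[] secondarySupers, Class<?> superType) {
        for (Class<?> s : secondarySupers) {
            if (s == superType) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Class<?>[] none = new Class<?>[0];
        System.out.println("stale flag, no supers:    " + staleFlagCheck(none, Runnable.class));
        System.out.println("explicit miss, no supers: " + explicitMissCheck(none, Runnable.class));
    }
}
```

With an empty list the first variant reports a hit and the second a miss; for non-empty lists both agree. That is the same pattern as the wrong "result: true" in the reproducer, where the checked class has no secondary supers.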

    li(result_reg, 1); // load non-zero result (indicates a miss)
  } else if (L_success == nullptr) {
    crandc(CCR0, Assembler::equal, CCR0, Assembler::equal); // miss indicated by CR0.ne
  }
  b(fallthru);

  bind(hit);
  std(super_klass, target_offset, sub_klass); // save result to cache
  if (result_reg != noreg) { li(result_reg, 0); } // load zero result (indicates a hit)
  if (L_success != nullptr) { b(*L_success); }
Member

Handling L_success != nullptr should be put on the else-branch of the previous if-statement.

Contributor Author

Doesn't make a real difference, but I've cleaned it up.

Member

Thanks. It matches the assertion you've added. I like consistency. It helps understanding stuff.

@TheRealMDoerr
Contributor Author

I've run most of the tier 1 tests with JTREG="VM_OPTIONS=-XX:-UseSecondarySupersTable" and didn't see new failures. I'll rerun tests. Note that Oracle Copyright years are already updated in head, but I don't want to merge because the PPC64le build is currently broken.

Member

@reinrich reinrich left a comment

Thanks for doing the port 👍
Cheers, Richard.

@openjdk openjdk bot added the ready label (Pull request is ready to be integrated) Jan 23, 2025
@TheRealMDoerr
Contributor Author

Thanks for the review and for finding the bug!

@TheRealMDoerr
Contributor Author

Tiers 1-4 have passed with and without UseSecondarySupersTable on both Linux ppc64le and AIX.
/integrate


openjdk bot commented Jan 24, 2025

Going to push as commit 4a375e5.
Since your change was applied there have been 320 commits pushed to the master branch:

  • 0df9dcb: 8346572: Check is_reserved() before using ReservedSpace instances
  • a09f06d: 8348265: RMIConnectionImpl: Remove Subject.callAs on MarshalledObject
  • 0395593: 8346751: Internal java compiler error with type annotations in constants expression in constant fields
  • 2daafe4: 8348283: java.lang.classfile.components.snippets.PackageSnippets shipped in java.base.jmod
  • 50ca450: 8340784: Remove PassFailJFrame constructor with screenshots
  • 416d469: 8347008: beancontext package spec does not clearly explain why the API is deprecated
  • 471d63c: 8343609: Broken links in java.xml
  • 7f16a08: 8348240: Remove SystemDictionaryShared::lookup_super_for_unregistered_class()
  • 48ece07: 8282862: AwtWindow::SetIconData leaks old icon handles if an exception is detected
  • 356e2a8: 8348406: Remove tests GrantAllPermToExtWhenNoPolicy and PrincipalExpansionError from problem list
  • ... and 310 more: https://git.openjdk.org/jdk/compare/62a4544bb76aa339a8129f81d2527405a1b1e7e3...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated label (Pull request has been integrated) Jan 24, 2025
@openjdk openjdk bot closed this Jan 24, 2025
@openjdk openjdk bot removed the ready (Pull request is ready to be integrated) and rfr (Pull request is ready for review) labels Jan 24, 2025

openjdk bot commented Jan 24, 2025

@TheRealMDoerr Pushed as commit 4a375e5.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@TheRealMDoerr TheRealMDoerr deleted the 8344232_PPC64_secondary_super_cache branch January 24, 2025 09:51