Classic scoring growth is too steep after scorev2 changes #23763
This scaling proposal seems to bring the classicised score derived from scorev2 closer to previous versions of lazer. Considering that the current scoring's growth curve differs from the legacy lazer scoring system, accepting this scaling algorithm might narrow the gap in displayed score between older versions and the current one.
Yep, what @WitherFlower is suggesting makes sense.

```csharp
return (long)Math.Round(Math.Pow(scaledRawScore * Math.Max(1, maxBasicJudgements), 2) * getStandardisedToClassicMultiplier(rulesetId));
```

becomes (by simply moving `scaledRawScore`):

```csharp
return (long)Math.Round(scaledRawScore * Math.Pow(Math.Max(1, maxBasicJudgements), 2) * getStandardisedToClassicMultiplier(rulesetId));
```

With current lazer using scorev2, this matches v1's progression quite well. With square-root scoring, however, it might need to be changed depending on what we want to match:

```csharp
return (long)Math.Round(Math.Pow(scaledRawScore, 2 / 1.5) * Math.Pow(Math.Max(1, maxBasicJudgements), 2) * getStandardisedToClassicMultiplier(rulesetId));
```

or alternatively (trading issues for other issues...):

```csharp
return (long)Math.Round((AccScore + ComboScore * Math.Pow(map_progression, 0.5)) * Math.Pow(Math.Max(1, maxBasicJudgements), 2) * getStandardisedToClassicMultiplier(rulesetId));
```

so that ComboScore grows quadratically when FCing. However, I believe it's not worth going down the road of matching the progression when FCing, because it's detrimental in other scenarios; it makes more sense to stay with the original proposal from @WitherFlower.
I updated my above answer to fix a small mistake and add my viewpoint (TL;DR: Wither's original proposal is good, and the "non-quadratic scaling when FCing" issue cannot really be fixed without bigger downsides). One more thing: this concerns not only score farmers, but also everyone who uses "total score" to compare how much experience they have in the game.
I think we should go with @WitherFlower's change for now - any objections? As for score farming, not sure. It's a very niche community and I'm not sure how to reach them; is there any good place to gather such feedback? We should look to get this done before ppy/osu-queue-score-statistics#134, because if we want to use classic mode for total score, we should probably get it in a good place first.
I'm part of the scorefarming community myself, and I'm also present on most Discords related to it, namely osu!alternative's Discord server (discord.gg/osualt, currently mostly maintained by @respektive), so I can handle the community feedback part. Also, regarding what Zyf said about classic scoring balancing after #24166: I'm planning to (very soon) run the score conversion algorithm on the aforementioned osu!alternative score database (which contains around 78 million scores) to check whether classic score needs any further adjustments. I guess ppy/osu-queue-score-statistics#134 can be looked at once those final adjustments, if any, are made.
After applying the score conversion from #24166 and the change I initially proposed in this PR (that is, moving `scaledRawScore` outside of the `Math.Pow` operation), here are the technical details: the SQL query I ran on the database (which took 26 minutes to run on my home computer 😅) is at https://gist.github.com/WitherFlower/ce1b20c61a16902d0f4a6f1d4771fca6

A few remarks regarding sources of error in the calculation:
If I had to give a conclusion, I'd say that this is totally acceptable as:
I will also be gathering feedback from the scorefarming community to ensure that there is no significant pushback regarding the results I got.
@WitherFlower are we good to go with your proposed change, or do you still need some time for gathering feedback?
I've received feedback from a few members of the community, including the current first and second place, and the change was mostly a welcome one, so I think we're good to go for the osu! ruleset for now. There are two other issues I have in mind for classic scoring, though.
Well, I don't know that, and it kind of falls under the scope of this issue, I'd say? I'm not sure why it would be a separate one. It would also imply that we'd have even more divergent implementations of the standardised -> classic conversion, which I'm not sure is something we're willing to abide. I'd say we should not have multiple implementations if it can be avoided, but that'll probably need the players' blessing. That said, this already exists, so it may be fine:

osu/osu.Game/Scoring/Legacy/ScoreInfoExtensions.cs, lines 39 to 70 in 25d0f0f
@ppy/team-client thoughts?
@ppy/team-client bumping this again - tl;dr: are we okay with even further divergence in how standardised -> classic scoring is done than what we already have? Any foreseeable problems? I don't immediately see any, but you may. @WitherFlower, if we decide that we'd be okay with not applying quadratic scaling for taiko/mania, could we ask for your assistance in establishing a better formula for those rulesets?
Sure, I'd be glad to help with the formulas. Also, I'd like to mention that I ran a survey asking what the direction should be, and judging by the results, mania scorefarmers definitely prefer to use 1-million-based scoring even for classic. Responses for taiko are more split, and I didn't get any input from taiko scorefarmers as there are barely any to begin with, so I guess we need dev input on that one. I'll try asking around the taiko community to get more feedback in the meantime.
Since mania (and taiko) are not combo games (accuracy is much more important in the score), it makes no sense to have score grow "quadratically" like it does in standard (and even less "exponentially" as asked). It might be interesting to know their opinion on linear growth instead.
@bdach I asked some of the top 10 ranked-score taiko players, and the responses I got ranged from "keep it the same as stable" to "don't care about the change", so I assume it would be a better idea to ditch quadratic scaling for taiko and mania. @Zyfarok I disagree with your comment. Linear growth is basically the same as what taiko currently uses, and seeing the rejection from osu!mania players, I think they'd rather see no change at all.
The rejection of quadratic scaling? I don't see how that would imply they don't want linear either. Taiko and mania are very similar games and use very similar scoring, so it would make sense for mania to also offer the same linear scaling as taiko for score farming. I guess players don't like when things change, though.
Since I can't get any response to my pings, I'm just going to make a judgement call here and say that we are probably fine with having taiko and mania work differently with respect to classic scoring, since they kind of already do. @WitherFlower, if you are able to help out with an estimation for taiko, it'd be very helpful. As a reminder, the criteria are that the classic score must be:
It would be appreciated if you could provide something along those lines for taiko too. For mania, I presume we'd just be using the identity function (i.e. using standardised score directly as classic too).
After reusing the spreadsheet from when we did the estimations for osu and catch, I arrived at the following estimation for taiko:
For mania, using the identity function is most likely the way to go indeed. I think we should also fix the "score not changing on spinners" issue I mentioned earlier in the thread by adding a

So in the end we'd have:
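A rough sketch of what the resulting per-ruleset conversion could look like; this is my illustrative reconstruction of the direction discussed in the thread, not code from the repository. The taiko branch and the multiplier stub are placeholders pending the estimation above, and it assumes `scaledRawScore` is the standardised score divided by 1,000,000:

```csharp
using System;

public static class ClassicScoringSketch
{
    // Placeholder: stands in for the real getStandardisedToClassicMultiplier().
    private static double getMultiplier(int rulesetId) => 18.0;

    // rulesetId: 0 = osu!, 1 = taiko, 2 = catch, 3 = mania.
    // scaledRawScore is assumed to be standardised score / 1,000,000.
    public static long GetClassicScore(double scaledRawScore, int maxBasicJudgements, int rulesetId)
    {
        double objectCount = Math.Max(1, maxBasicJudgements);

        return rulesetId switch
        {
            // osu!/catch: linear in standardised score, quadratic in object count
            // (the fix proposed in this issue).
            0 or 2 => (long)Math.Round(scaledRawScore * objectCount * objectCount * getMultiplier(rulesetId)),

            // taiko: growth shallower than quadratic; linear object scaling here is
            // a stand-in for whatever the spreadsheet estimation settles on.
            1 => (long)Math.Round(scaledRawScore * objectCount * getMultiplier(rulesetId)),

            // mania: identity - the standardised score is displayed directly.
            3 => (long)Math.Round(scaledRawScore * 1_000_000),

            _ => throw new ArgumentOutOfRangeException(nameof(rulesetId)),
        };
    }
}
```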
One last thing: the
Is this at all negotiable? Where is this particular requirement coming from? It's something we can probably make happen, but it's rather annoying to do and yet another complication. Instinctively, it also doesn't make all that much sense to me.
If a mod (like strict tracking) changes the number of hitobjects / 300s, then the "non-reordering" property isn't preserved anymore, as some scores get an unfair advantage. See #19232. Another solution is to disallow mods from doing this, but I don't know how reasonable that requirement is...
I mean, we were midway through fixing that one so that issue can't happen, so "fixing the mod" is alright with me for sure. At least in the short term.
@WitherFlower Just for my reference (and to avoid unnecessary back-and-forth): can you provide some reference - spreadsheet or otherwise - that describes how those formulae were derived / arrived at? I've implemented the formulae you provided on top of the test scenes added above, and the results seem pretty good when it comes to the general feel and trend, but it looks like the formulae above may result in classic score being slightly inflated. This of course partially depends on parameters like map length and the mystery "score multiplier", and I don't feel like I have a good enough feel for those parameters yet to draw conclusions, but it's something I want to investigate for sure, and it may be better / easier if I have your source materials to cross-check against.
Here is the spreadsheet I used to get the formulas I sent above: https://docs.google.com/spreadsheets/d/1hYsT3U3b0tg9SIMDhmBmqGBVTXASufQ8W0sQ-toifTg/edit#gid=0

Now for a brief rundown of how I used it. I first took maps at random across all star ratings and lengths to make sure the approximation would cover most cases. I then searched for the multiplier that minimised the sum of squared relative errors between the estimation and the expected SV1 value. In mathematical terms, we're looking for the following values:
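A plausible formalisation of that search (my notation, not WitherFlower's), assuming the estimation has the form multiplier times standardised score fraction times squared object count, with $s_i$ the standardised score fraction, $n_i$ the object count, and $\mathrm{SV1}_i$ the expected scoreV1 value of map $i$:

```math
m^{*} = \arg\min_m \sum_i \left( \frac{m \cdot s_i \cdot n_i^2 - \mathrm{SV1}_i}{\mathrm{SV1}_i} \right)^{2}
```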
A few clarifications:
One last thing: if you want to compare values between scoreV1 and the estimation, I suggest you use map/leaderboard data directly instead of the scoring test scene, because as far as I know it simulates an edge-case scenario of a map with only circles, which you will very rarely encounter in actual gameplay.
That is correct, but I'm not sure I'm following why that's an "edge case". While yes, an average map is rarely like that, the test scene still simulates the score algorithm in terms of the general magnitude of score, rate of growth, etc.? So I'm not seeing why it would be that much of an edge case as to discard any findings based on it.
If I remember correctly, the test scene lets you adjust the max combo of the simulated score. However, maps with the same max combo will generally contain fewer hitobjects (because of sliders), resulting in scoreV1 values noticeably smaller than what the test scene would suggest. The estimation is based on that general case, so the difference in maximum score between classic and SV1 can appear much larger in the test scene than it actually is on most maps.

tl;dr: big differences in max score in the test scene don't mean much in practice.

In any case, it's just something to keep in mind when comparing maximum score values. Rate of growth shouldn't be affected.
If we want something closer to scoreV1 for SSes, it would have to account for the object counts of each kind separately, and possibly slider ticks too. This might be doable without too much hassle, though.
@Zyfarok I'm pretty sure the only goal of classic scoring is to give a feeling similar to that of scoreV1, i.e. big numbers of the same magnitude. The proposal I gave above accomplishes that with good accuracy, so I'm not sure it's worth going the extra mile by considering more map characteristics for a 5% improvement, which would also add more potential ways in which classic scoring can break in the future (see the current issue with strict tracking). Not to mention this requires time that could be spent on all the other work needed to get permanent lazer ranking out. On a personal note, I'd be glad to see scoreV1's difficulty multiplier nonsense get deleted from reality, but maybe that's just me...
@WitherFlower I've reviewed the spreadsheets you provided. I've got a few follow-up matters. First of all, after looking at the least-squares estimation method used to determine the multipliers, I'm not sure the results are valid, since the estimation does not appear to include the -- although, on that last part, 23 is the number that fell out of least squares, and since I can't reproduce this part:
I'm not sure what correction, if any, would be appropriate. Can you take a look at the above and corroborate whether I'm correct about the potentially mistaken estimation, and elaborate on what "closer ranked score values" would entail? Finally, either way, I'd probably look to use the same methodology on a wider gamut of maps, rather than checking only 19 instances per ruleset. But before I go and try that, I want to make sure I understand the method correctly and dispel the doubts mentioned above.
@bdach I ran a query on the catch top 1000 data dump from data.ppy.sh to check the balancing of catch, and with the migration to scoreV2 the values are pretty close when using a
I can also confirm that the multipliers you found are correct after taking into consideration the
When doing this, I think it's best to include a wide variety of maps rather than just taking maps at random, which would likely result in low difficulties / "low score maps" being overrepresented in the final estimation, as 40% of all maps in osu! are below 3 stars and half of all maps give less than 5 million score.
Cool, thanks!
I was thinking of running that through the full 150k maps from data.ppy.sh. Maybe I'll add some weighting to avoid overfitting on low difficulties. Not sure.
You could try using the squares of absolute differences instead of relative ones in order to counter the abundance of low difficulties, but it could also have the opposite effect and "buff" marathons too much. The results of that could be interesting, though.
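For what it's worth, for a single multiplier both objectives have closed-form minimisers, so comparing them is cheap. A minimal sketch (illustrative names, not osu! code), assuming the model is classicEstimate = m * baseEstimate, where baseEstimate is the unscaled per-map estimate (e.g. standardised fraction times squared object count) and sv1 is the observed scoreV1 value:

```csharp
public static class MultiplierFit
{
    // Minimise the sum of squared *relative* errors: sum over i of ((m*b_i - s_i) / s_i)^2.
    // Setting the derivative to zero gives m = sum(r_i) / sum(r_i^2), with r_i = b_i / s_i.
    public static double FitRelative(double[] baseEstimates, double[] sv1)
    {
        double num = 0, den = 0;
        for (int i = 0; i < sv1.Length; i++)
        {
            double r = baseEstimates[i] / sv1[i];
            num += r;
            den += r * r;
        }
        return num / den;
    }

    // Minimise the sum of squared *absolute* errors: sum over i of (m*b_i - s_i)^2.
    // Closed form: m = sum(b_i * s_i) / sum(b_i^2). High-score maps (marathons)
    // dominate this objective, which is the "buff marathons" effect mentioned above.
    public static double FitAbsolute(double[] baseEstimates, double[] sv1)
    {
        double num = 0, den = 0;
        for (int i = 0; i < sv1.Length; i++)
        {
            num += baseEstimates[i] * sv1[i];
            den += baseEstimates[i] * baseEstimates[i];
        }
        return num / den;
    }
}
```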
@WitherFlower for your spreadsheets, where did you source the "max score v1" and "object count" data from? Asking because when recalcing over the set of all maps, I'm seeing discrepancies, and knowing your sources may help explain them.
For max scoreV1 I just used the #1 nomod score on each map, those being high-acc FCs in all cases, so pretty close to the actual max value. The "object count" should be correct for osu, but might be off by a bit for taiko and catch. For taiko I used the maximum combo, as it is equal to the number of don/kat hits. It's possible that I made a few mistakes since I did all of this by hand, but the "object count" I aimed for is the number of full-judgement-awarding objects in all gamemodes.
Thanks for the info. I suspected that was the case for high scores; my program, which uses lazer's scoreV1 simulators, was returning higher values for most maps. Some of that is the fact that the simulators purposefully overestimate bonus; some was actual cases of nomod top scores not yet matching autoplay/max score. I also had mismatching object counts at the start, but that was my bad and I've sorted it out. All in all, I should have something concrete to get back with very soon. I am now able to export estimations of top scores on all beatmaps from data.ppy.sh (converts included), so I should be about ready to do some data crunching on those and see how our iterative attempts at the conversion formula are doing (and maybe suggest better ones yet, we'll see). Sources for everything will be provided and fully transparent for review when I'm done.
Type
Game behaviour
Bug description
Because the previous scoring was linear, a power of 2 was applied to the standardised score to make its growth feel like scorev1.
Since scorev2 follows a growth curve very similar to scorev1's (quadratic), the power of two gives classic scoring quartic growth, which feels very wrong when playing.
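In symbols (a sketch of the argument, not part of the original report): let $n$ be the number of objects hit so far, $N$ the total object count, and $s(n) \propto n^2$ the standardised score on a full combo. The current conversion squares the score term along with the object count, while the fix described next squares only the object count:

```math
(s(n) \cdot N)^2 \propto n^4 \ \text{(current, quartic)} \qquad s(n) \cdot N^2 \propto n^2 \ \text{(proposed, quadratic)}
```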
This should (hopefully) be as easy to fix as moving `scaledRawScore` outside of the `Math.Pow` operation here:

osu/osu.Game/Scoring/Legacy/ScoreInfoExtensions.cs, line 36 in 25d0f0f
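For reference, the conversion at that line as quoted in the discussion above, together with the proposed change:

```csharp
// current: the standardised score is squared along with the object count (quartic growth)
return (long)Math.Round(Math.Pow(scaledRawScore * Math.Max(1, maxBasicJudgements), 2) * getStandardisedToClassicMultiplier(rulesetId));

// proposed: only the object count is squared (quadratic growth, matching scorev1's feel)
return (long)Math.Round(scaledRawScore * Math.Pow(Math.Max(1, maxBasicJudgements), 2) * getStandardisedToClassicMultiplier(rulesetId));
```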
I should also mention that making this change resolves the issue of classic scoring having squared mod multipliers, as well as addressing part of #17824.
Screenshots or videos
No response
Version
2023.605.0
Logs
runtime.log