List of all parameters and constants, and chosen numbers #989
Some references:
Thank you. These show the current state of the most high-level documentation available (maybe also some blogs), I think. I am confident that NNUE should come with its own tradition of parameter handling, as is standard in machine learning with neural nets: clear distinctions between the types of optimization (training, validation, hyper-parameter tuning), each with its own requirements on the definitions of inputs and outputs (position and score, for example), and on how those definitions contribute to the objective function being globally optimized (many parameters, non-linear). Yet I think I did see a hardwired value used as a scaling factor in some transformation of the raw NNUE position score; the value sat in a small function in the source code of the SF release executable. That number may or may not be the result of tuning, and I claim there is no way to tell from the links above. The second link does not touch it, probably because the problem is just not seen as one, and it is not really an NNUE question (I agree with that last part). These are good links otherwise; my point is specifically about parameters and how their values are chosen, kept, or optimized as SF versions come about. The first link is still missing the whole parameter-determination question, which I think we would all agree is half the SF development story, or the fishtest story if the object of this issue turns out to be moot. (I.e., any other pointers are welcome, even lower-level ones than the previous comment: repository issues or discussions, or last-release source code lines or filenames, in the fishtest or Stockfish repos.)
The second link is also missing some aspects: not implementation innovations and their descriptions, but the basic flow of machine learning, its global-optimization setup, and what NNUE is being trained to approximate. Because neural nets are just that: flexible families of functions spanned by all the values their many parameters can take. Their layered architecture (inspired by the animal visual cortex) and the formulation of the output as a function of the input vector (chess position information) can fit input-output relationships of any complexity, given enough layers and an appropriate training algorithm. But even without neural nets, a machine learning problem is a global optimization problem, and it requires a description of the data that serves as the input vector, the data that serves as the output (target vector), and, just as important, the loss function being optimized, integrating all the data available in the training set. (Not the same gap as this issue, in terms of what is missing.) I have created an issue (link here) where I suggest my current understanding of what that target is. I neglected the validation and generalization story on purpose, because in tuning there is no such thing. First things first, and small steps.
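(To condense the setup I describe above into one formula, as my own summary and nothing official: supervised training globally optimizes the parameters of a flexible function family against a dataset of input/target pairs.)

```math
\min_{\theta}\;\frac{1}{N}\sum_{i=1}^{N}\ell\bigl(f_{\theta}(x_i),\,y_i\bigr)
```

Here each x_i is an encoded position, y_i its target value (score and/or game result), f_θ the network, and ℓ the loss function whose definition I am asking about.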
The input-variable definition and the database composition projected onto that space (chess positions) seem fine, and I assume they are the same as the input definition of the smaller networks. All I could find, often enough for high-level understanding, concerns the input side of the training datasets for the small NNUE networks that are actually shipped for release. Not much about the big master network and its target definitions. I did see one paragraph that seems to give the answer, but I had also read, a year ago, in some description or review, that reinforcement learning was involved upstream of the NNUE process.
Note: I have omitted putting everything I write into interrogative form, but it should be implied that I know I am fallible and able to lift only a tiny corner of the veil. All I write is my best current effort at making sense of what SF and its development are actually doing as a chess analysis tool, given a position as input (or a dataset of positions, during development, of course). I am not alone in asking these questions, and I take the risk of making mistakes in order to progress. Help is appreciated, in the form of increasingly helpful clues.
Regarding the overall set of parameters (not counting the network): I don't think there is a list, nor an easy way to obtain one. This set of parameters changes quite frequently, and most have no particular meaning that can be assigned to them. It would probably require some manual labour to gather all the relevant ones. These parameters are either handpicked or tuned using SPSA (supported in the fishtest framework); #535 is a good start. Regarding the network: the architecture should be well described by this diagram https://user-images.githubusercontent.com/8037982/118656988-553a1700-b7eb-11eb-82ef-56a11cbebbf2.png. To understand the "HalfKAv2" inputs one has to read https://github.com/glinscott/nnue-pytorch/blob/master/docs/nnue.md#a-simple-input-feature-set, https://github.com/glinscott/nnue-pytorch/blob/master/docs/nnue.md#halfkp, and https://github.com/glinscott/nnue-pytorch/blob/master/docs/nnue.md#halfkav2-feature-set. For training we do not employ RL; there is no self-play being done with the generated nets, currently. Some old nets used RL, but it is too costly in general, we don't have a good framework to conduct it, and we don't even have good signs of it working well (in other words, we don't know how to make it work well). The currently used best datasets are
In both cases we have the evaluation and the game result available; during training we can vary the contribution from each of these, usually putting a higher weight on the evaluation. We convert the evaluation/result into an approximate win probability before computing the loss. The training is conducted using this implementation in PyTorch: https://github.com/glinscott/nnue-pytorch. The fishtest framework is not involved currently.
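(A minimal sketch of what such a blended loss can look like; the sigmoid scaling constant, the default weights, and the function shape here are illustrative assumptions, not the exact nnue-pytorch code.)

```python
import torch

def blended_loss(model_eval_cp, teacher_eval_cp, game_result, lam=0.75, scale=410.0):
    # Map centipawn evaluations to an approximate win probability via a sigmoid;
    # `scale` is an illustrative constant, not the value used in nnue-pytorch.
    p_model = torch.sigmoid(model_eval_cp / scale)
    p_teacher = torch.sigmoid(teacher_eval_cp / scale)
    # game_result holds 1.0 / 0.5 / 0.0 for a win / draw / loss of the side to move.
    # lam weighs the search evaluation against the actual game outcome.
    target = lam * p_teacher + (1.0 - lam) * game_result
    return torch.mean((p_model - target) ** 2)
```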
Thank you for your links and answers; they fill some gaps. I did not read all those links carefully, but I browsed them trying to find the target setup of the biggest net. I noted the architecture and encoding-optimization discussions, and I promise myself to someday enjoy reading how these capture what they capture while still allowing fast implementations of the smaller networks, given the primary training setup being fitted. However, so far I could only spot one paragraph mentioning that the target, in the oracle type of supervised learning of the bigger nets, is classical non-NNUE SF itself (this is not about the position datasets, although I am thankful for your precision and your notes on their evolution). I think knowing the loss function definitions would complete that picture. In those links, could you point me to where the loss function over the described datasets is defined? What did I gloss over in my browsing? I would have expected that defining the position datasets of a machine learning process would systematically be accompanied by the definition of the target vector being fitted for generalization by the big trainee, as that may be half of the story right there. But here I wanted to separate the NNUE parameter question.

With regard to the non-NNUE parameter story: thank you for letting me know that manual inspection might be required, and historical excavation too. I understand that with an increasing number of parameters, assigning a meaning (or null-hypothesis statistical margins) to any one parameter or value becomes difficult. I also want a clear view of which parameter set is fishtest-handled and which is not. Perhaps I should read the fishtest code to figure out the extent of the parameters there, to narrow the manual inspection of the others? Would you suggest any code file or region? Maybe I should just look at the repo and it will be easy to find anyway; in case it is not, any further pointers would help.

Although I said "the non-NNUE part", there is a reason I mentioned it above: if the target of the bigger network has been classical SF evaluations at "moderate depth", used as the oracle target vector during NNUE training and testing over the datasets mentioned, in all experiments so far including SF14, then all classical parameters find their values directly affecting the NNUE weights, whatever encoding or architectural optimization is used, because that is the higher-level machine learning task of fitting an input-output function over the datasets. I hope I made clear that a big piece of information is missing, or not salient enough for my first-pass research to have found it. Being explicit about what NNUE is optimizing and fitting, and about the experiments already done in that regard, would make the development process more reproducible (and evolvable). My current guess, from non-source-code documentation, is that this has not been explored, which would make it a constant of all the experiments, hence never needing to be explained in the introductions accompanying new source code, or in the wikis.

Most of the art is focused on dataset definitions, NN architectures, and scaling network size down by a further oracle-type regression (that is what NNs do: they approximate target functions, however defined by input data and the target values to get close to). That is totally fine with me, and otherwise interesting for how it reduces the cost of NN training and, mostly, evaluation. Sorry if I repeated some of the above; I tried to make it more articulate in light of your posts. Those links were also needed. Thanks.
I'm not sure I understand the question. Fishtest doesn't know about anything; it does what we tell it to, including which parameter sets to optimize.
Sure, but it requires non-gradient methods, which are infeasible for this many (tens of millions of) parameters.
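(For reference, a minimal sketch of one SPSA step, the non-gradient method fishtest supports for the handful of engine parameters. This is illustrative only, not fishtest's actual implementation: there, the objective is evaluated by playing engine game pairs, and the gains decay over iterations.)

```python
import random

def spsa_step(theta, objective, a=0.1, c=0.05):
    """One SPSA step: estimate the gradient of `objective` along a single
    random +/-1 perturbation direction, then move all parameters at once."""
    delta = [random.choice((-1, 1)) for _ in theta]
    theta_plus = [t + c * d for t, d in zip(theta, delta)]
    theta_minus = [t - c * d for t, d in zip(theta, delta)]
    # Two objective evaluations suffice regardless of the number of parameters.
    g = (objective(theta_plus) - objective(theta_minus)) / (2.0 * c)
    # Maximize the objective (e.g. an Elo estimate): step along the estimate.
    return [t + a * g * d for t, d in zip(theta, delta)]
```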
@Dboingue off-topic: to speed up the recursive Fibonacci, use memoization; see https://www.python-course.eu/python3_memoization.php. You don't have to write the memoization yourself:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)
```
I am trying to understand code more than write it, under the assumption that development and code improvements need an ecosystem of understanding levels, math not being too far out. That is most of my aim: to have a functioning mathematical model, practical enough to explain the input-output story that engines tell. Hence my serious questions, to either understand or shed light on that hard-to-reach high level.
Thank you, working on that. (However, why is the first mention of the loss function there commented as being for debug purposes?) I wonder whether I have the same misconception in assuming that the code would contain the definition of the parameters for fishtest tuning as I may have had in assuming that the loss-function definition in source code would answer my fundamental machine-learning-setup question: the output data generation, i.e. the target definition, the major component of any NN training objective. Should I keep going in that file to get to the output data definition, or is that generated elsewhere and assumed as a data structure by the code behind that link? If the target vector is itself produced by source code, I somehow thought the loss function would take the input positions as a data structure and call the function that computes the target output; but making all of this modular is a theme of good development, right? Part of my misconception (or my lack of code-development experience, enough to ask the right question) was that I did not think of looking for the generation of the big master network's output targets. I have seen, in the readme, data generation using classical-eval SF to guide position generation. I was then thinking: hmm, could the big master network be handled that way? So now I ask, as a consequence of the replies: is that code also generating the target output data that the source code above might be assuming as defined in some data structure? I now have the same question for the main issue, about the fishtest optimization target parameters, and another (misplaced in some other repos I lost track of) about the upstream training setup of the big NNUE network. In both cases, since I am told I will not find the parameters in fishtest, and since for NNUE the targets may have been generated together with the input data: could you help me find the repository and the probable source code where that is made transparent and reproducible? I am surprised it is so difficult to get the target-definition concept across; if it is not a variable of development, I understand how it may not be at the forefront of documentation or of source-code changes on GitHub. So please, now that it appears that data structures and/or database-generating code are involved (unless I guessed wrong above and should keep digging in the loss-function source code), could I have some more pointers? Thank you for persevering with me. I feel I am making progress. Or not?
There may have been a misunderstanding when I used "global optimization". I want to reiterate that both fishtest's modern tuning and NN training are instances of the mathematical problem of global optimization with many parameters. I never meant doing fishtest-style global optimization including the NN weights. On the contrary, a major distinction between the two types of optimization (tuning versus NN training) is the purpose of the approximation being sought: in tuning there is only test data; in NN training there is a mutually exclusive training/validation partition (redundancy intended). If one method should influence the other, I would go the other way around. What I am after is the loss function definition, not at the source-code level or as a source-code syntax object, but as a higher-level mathematical representation of the source code (any algorithm defines an input-output function on an appropriate domain and co-domain; that function need not have source code as its unique mathematical representation). This is not just mathematical pickiness (it is not at all). It is so that even people using whichever fixed method of output-target generation can think about the effect of that choice on the whole algorithm they implemented, or at least on what a smaller NNUE is doing on the balanced-material nodes it is restricted to in use. If classical eval at moderate depth is used to define the output data upstream of the whole process, then that is also what you get downstream when inputting one position and calling some search algorithm to score it. Whichever tip-of-PV or hash-table entry score is obtained by NNUE, the score there is an approximation of the classical, non-NNUE evaluation that was used as oracle (i.e., for output data generation). Simple as that, and worth mentioning, I think; unless I am mistaken, in which case I would like to be corrected about the output-target data definition. This is data fitting with functions: the result is a function that best approximates the target function over the input data, as captured by the sample from the domain of the function being approximated (hoping the sample covers it well, usually). So if the target is SF11, the approximation approaches the same values SF11 would produce on the same input (SF11 being my shorthand for non-NNUE SF).
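(A hypothetical sketch of the oracle reading I describe above; `teacher_search` and `encode` are invented, caller-supplied stand-ins, not real Stockfish or nnue-pytorch APIs.)

```python
def make_training_pair(position, teacher_search, encode, depth=9):
    # Oracle-style target generation: the teacher's moderate-depth search
    # score becomes the label the student network is fitted to, so the
    # trained net approximates teacher-search-at-depth, not "truth".
    target_cp = teacher_search(position, depth)   # e.g. classical SF search
    return encode(position), target_cp            # (input vector, target score)
```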
I'm not sure why you think classical eval is somehow special? It would be easier for all of us if you started asking precise questions, preferably not buried in a wall of text. |
Sorry, I have a lot to say, and I don't know where to start or stop; that is the multidisciplinary curse, when it is valued, which I assume it is, and your continued replies keep me hoping. If you are not in the mood for my prose, jump to the second header below and read there first. The question in the title is the bottom-line question, but it stems from many accumulated questions and a long-lasting hope that the wikis, read-mes, or blogs would someday address it. Not knowing what this community knows and does not know, and what I don't know, is part of the curse; iterating these walls of text might converge as I learn what you find obvious. I have actually waited more than a year for an overview like this. I could re-edit my long paragraphs into chunks. I am not the one who enlarged my question into NNUE, but it is tied to it, as we found out. While thinking about how to chop up the above and prune repetition, I wanted to add a consequence of my current understanding of the machine-learning NNUE information flow; I will put it in the next comment so the full consequence does not get buried.

The accepted tangent question, in interro-negative form, about the actual NNUE training specification (complete): Are you telling me that the master network is not using classical SF as the target output data to approximate when training over the many position databases that have been used? If it is not, then where can I find information that completely (for reproducibility) describes the training data generation, including the target definition and the accompanying error function, in mathematical terms if possible, otherwise as source code or documentation about source code? The target definition can be output-vector-generating code or data; if data, I would like to know how the data was produced. So I don't think the classical functions should be so "special"; I am asking whether classical SF is the target of the NNUE predictor of score-given-position, for any position it could expect to encounter in the universe of engine-vs-engine competition that defines SF's performance measure in Elo units. If so, I am not the one making it special: NNUE would be using classical SF as the function to mimic (not the static eval, but the score attributed by "moderate-depth" search by the SF engine without NNUE).

Next question, back to the consequences of the title after the replies: Where can I find the complete set of parameters (or the closest available threads) that are not about NNUE training at all? I understand: not in fishtest; probably configuration files, or data files (in which case, perhaps the source code that generates the data files). That is my most compact question. Hopefully (partial) answers can be seen better now; or reduce the completeness level of the question if nothing comes to mind. So why not start with a focus on the classical static eval? I would be even happier if search parameters were included, but my focus is position-information-to-output-score. Although I have the growing impression that the static evaluations are deeply intertwined with all the other node tests toward optimal search and branching decisions in a given root tree search under a time constraint, with recursion, so that the score at the root is not just the static-evaluation story at the tip of the PV.
NNUE fitting the root score function given by the non-NNUE target output from classical SF: I think some people call that using non-NNUE SF as an oracle for the training of NNUE (sorry for the repetition; I see lots of connections and want to share them). So, NNUE as a black box. I trust that the learning process is done with all the rigor needed in proper machine-learning method sharing: one normally gives and specifies, on an equal footing, everything necessary for one's own re-implementation in any tool: the input database; the output target vector; the world that the database is meant to represent (hoping for unbiased coverage by the sample of that universe, i.e. including positions not in the database); and the objective function to optimize in a training/validation partition scheme (normally also specified).
Pure speculation from afar, stemming from all of the above; no questions expecting answers, but thinking, argumentative replies, or questions are welcome. I would not mind if this were transferred to a discussion after closing, keeping the title question restricted to the non-NNUE weights. If the answer to the first question (tangential but finally central) is yes, i.e. all experiments and SF releases are based on SF-generated target output vectors in the training database (moderate-depth search), so that NNUE is actually capturing moderate-depth PV-tip scoring, or SF search scores, for any NNUE input position, then, like the feature transformers for the master network, NNUE is basically SF-at-a-moderate-further-depth in disguise. No irony here; I am trying to explain without jargon. Conclusion, if that holds: consider the possible co-evolutionary problem of alternately optimizing the two parts of the partition of the whole SF parameter set (yes, what you thought I wanted to do all at once). Besides tuning not having a generalization performance measure (maybe hard to define), there could be oscillations between the two processes. What if you fed the newly tuned SF-without-NNUE back into the development version of SF and into new NNUE training (maybe you have been doing this already)? You would be applying the composition of one global optimization algorithm (tuning) with the next (learning by training against an oracle). The hope is that this alternating, partitioned optimization still converges and keeps improving. Perhaps discussing this will not change any source code in the next version, but I am interested in chess engines in general, and I assume the SF community is too. Walls of text may hurt, though; I am sorry for that. Ideas: develop statistical measures; look for funny failed experiments; negative results are often overlooked.
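(A hypothetical sketch of the alternating scheme I am speculating about; all the helper names are invented for illustration.)

```python
def alternate_optimize(classical_params, net, tune_step, train_step, rounds=3):
    # Coordinate-descent-style alternation: tune the classical parameters
    # with the net fixed, then retrain the net against targets produced by
    # the newly tuned classical engine. Convergence is the open question.
    for _ in range(rounds):
        classical_params = tune_step(classical_params, net)  # e.g. SPSA tuning
        net = train_step(net, classical_params)              # e.g. oracle retraining
    return classical_params, net
```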
This was already answered. It needs to be dug out from the code.
Which repo, which file? From your previous reply (unless I missed something), it was not in fishtest. Is it in the Stockfish repo? Are there many files? Since fishtest is being told what to do, and does not include the parameters, I inferred they must come through data input or configuration files. So no, I am sorry to say, the last answer I was given was not "in fishtest"; it is somewhere else. Any more precise pointer? I did say I was ready to dig, so I only need a bit more help on that question. Is there some module taking care of the parameters meant for fishtest? We are converging to an answer. I am mostly looking for the scope of the global optimization in tuning; I understood that otherwise it is mostly a few parameters at a time being tested for a null-hypothesis margin of improvement in Elo-based ratings (some appropriate statistical model around Elo) over batteries of engine pairs. So here, let me try to formulate some promising narrowing questions.
Also: https://github.com/glinscott/fishtest/wiki/Creating-my-first-test. I can't close this; you might, since it looks like you consider it answered. Thanks for the earlier replies.
Regarding the tangent, but tangled, question of the complete NNUE training specification: while waiting for a simple confirmation or denial of the SF-as-oracle training specification, here is some further reading, for which I would appreciate at least some vetting of relevance to the basic training question I am asking. Can I assume that the following links cover the training-method specification for NNUE, with the needed information (not only code)? Or, if it is only code, which code is responsible for the current master net's training, for both the X (input) and Y (target) data generation? Here are some of my past gleanings, which left me with more questions (not criticism; just questions).
https://github.com/nodchip/Stockfish/blob/master/docs/gensfen.md
Would the source code behind the gensfen command allow me to figure out how the master network is still being trained? I am interested in what happens under the hood there. Because of the architecture optimizations, I am curious about the self-play evolution (perhaps comparing with other self-play architectures like lc0, in some future). Also, if gensfen designs only the X part of the data generation, have the RL experiments yielded criteria for a non-RL position generation that approaches RL coverage of possibilities? I guess this is my fog; I did not dream up either hypothesis. RL self-play or SF-as-oracle: are both still used? For which SF versions? If only SF is used as the Y data generator, please confirm and I will get out of your hair on that. https://github.com/nodchip/Stockfish/blob/master/docs/learn.md (I have not looked there yet.) Which of the links above are most likely to contain what you understand I would need?
https://github.com/glinscott/nnue-pytorch/blob/master/docs/nnue.md That might read as criticism, but it should be viewed as constructive, justifying my perseverance and insistence here. No?
https://github.com/official-stockfish/Stockfish Do you have any understanding at all of how Stockfish works?
https://github.com/glinscott/nnue-pytorch
Currently none; we use old data, see official-stockfish/Stockfish@d61d385.
@Dboingue I closed this issue; it was going nowhere.
Also join our Discord server, where you will find thematic channels and the data repository; feel free to ask any questions there (please keep them short :) ).
@ppigazzini @Sopel97 To both: I did not get answers to the refined questions that stayed within the original scope of this issue's title. It was clear from the start that I was talking about the non-NNUE part (right?). In any case, I quoted many of the general pointers provided in the replies to make progress on the issue's original objective, and what I see is avoidance. This may be fast-development culture: a big gap between documentation and source code, and a big gap between open source (code) and open data, with minimal understanding of what data might mean. I consider the magic numbers peppering the source code to be data, and making their choice reproducible should be part of open data; any configuration-file design, or data involved between source-code transactions or mutations, should be part of open data. The chain of openness may be as weak as its weakest link. I understand that what I am asking is difficult, but denial is not the way to answer it. I disagree, while understanding the behavior as normal. So I am just disappointed; hope gone.
Yes, I am now working on this too-wide focus, and on a high-level model of the SF search source code, with somebody else. I have put this parameter-optimization question on the back burner, very long term, until I have a (faithful) mathematical model of SF and other, more tractable projects bring it back into the picture for comparison. I assign no individual blame; man-hours are limited and all sorts of priorities can get in the way of my questions. I wanted to make sure first that there was no other way than doing the job myself of extracting a higher-level algorithm. Thanks for the while you managed to keep replying, and for the emoji.
The closest I could find was the following, though I could not read all of it; the point was about methodology for optimization.
#774 (comment)
Also, that thread seems concerned with NNUE parameters too, and with possible tuning of NNUE training hyper-parameters, or with how NNUE is integrated into the non-NNUE part. Here I would like to focus on that part.
This should perhaps be an issue about the wiki here, and/or Stockfish's. My focus is to understand the engine's evaluation at a high level, as close to the user end as possible while still being exact and representing the engine's scoring mechanism as an input-output function from the set of legal chess positions to some scoring scale. The search parameters (branching decisions) are one part; there is another, interwoven part, not the bulk of the engine, and that is the static evaluation function.
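(In notation, this is my own formalization, nothing official:)

```math
F_{\theta}\colon \mathcal{P}\to\mathbb{R},\qquad p \mapsto \mathrm{score}(p;\theta)
```

where P is the set of legal chess positions and θ collects every parameter and constant, labeled or magic, that influences the output; it is the full contents of θ that I am asking for.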
Is there a way to construct (or find) a list of all the parameters or constants that come into play in that output function of an input position? And I mean all of them: not only those that are reused and hence assigned labels, but also the "magic" numbers found in the heuristic static evaluation function. I would like enough information to build a table of all of them, even the orphaned ones, with supporting information attached to each.
I apologize if such information is already somewhere for the picking, and would appreciate being redirected there.
Otherwise, can anyone help me figure out how to progress and survive the ordeal?
This issue seems justified: on one hand the wiki page mentions global optimization in tuning parameters, while on the other hand inspection of the static evaluation function, and to a lesser extent of the lines where the NNUE evaluation docks with the rest of the engine, shows many examples of factors that have no "genealogy", as if past optimizations of the formulas involved had been hardwired into the engine's executable code. These two opposite perspectives, one-by-one parameter improvement of SF performance versus global-optimization tuning, raise the natural question: which parameters are being tuned, exactly?
If nothing of the sort exists, I would be happy to get a sensible list of the most likely repositories or code modules where most of those parameters are actually defined. For example, where is the tuning code; is there a file from which I could figure out all those parameters? From there, manual inspection of the SF release source code for the parameters not found in that code could trigger a history search through previous versions around the same lines. That sounds like a big job; anything that could speed it up is welcome, as is any question that would help clarify what I am asking.
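(As a starting point, a small, hedged Python sketch that lists numeric literals in a C++ source file with their line numbers; the file path is illustrative, and the regex will over-collect, so the output is only a candidate table for manual triage.)

```python
import re
import sys
import pathlib

# Matches integer or decimal literals not embedded inside identifiers.
NUM = re.compile(r'(?<![\w.])-?\d+(?:\.\d+)?(?![\w.])')

def magic_numbers(path):
    """Yield (line number, literal, stripped source line) for each numeric
    literal found in the file, ignoring trailing // comments."""
    text = pathlib.Path(path).read_text(errors="replace")
    for lineno, line in enumerate(text.splitlines(), 1):
        code = line.split('//')[0]
        for match in NUM.finditer(code):
            yield lineno, match.group(), code.strip()

if __name__ == "__main__":
    # Usage (illustrative path): python magic_numbers.py src/search.cpp
    for lineno, value, context in magic_numbers(sys.argv[1]):
        print(f"{lineno:5d}  {value:>10}  {context}")
```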
And if I wrote something that seems to imply a misconception, please let me know; that would also save time and energy.