-
-
Notifications
You must be signed in to change notification settings - Fork 550
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
binary-search-tree: implement canonical-data.json #940
binary-search-tree: implement canonical-data.json #940
Conversation
Since I don't know who to ask for review @kytrinyx I'll leave that to you. |
"null should be replaced by the respective language equivalent or -1", | ||
"if the language does not have any.", | ||
|
||
"The purpose of ths exercise is to create a data structure which", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unclear on how this sentence affects someone implementing this exercise for a given track, and I don't think there was any danger of anyone using a sorting function for this exercise anyway? Will you explain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well as i understand this exercise from description.md the purpose is to create a data structure which represent sorted data and specifically binary search tree. Usage of any sort() function or algorithm will defeat its purpose don't you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usage of any sort() function or algorithm will defeat its purpose don't you think?
Agree.
I don't think it is likely that anyone will think to use a sorting function, because they should already be at the point where they're inserting nodes into a tree.
Therefore, I don't think it is useful to have this text here, and I will find it confusing becuase I will read it and think "why are you telling me this?"
It could be that I am biased and this text does help those who see it from a different point of view.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- It's possible that in future exercism adds a track in which the obvious choice would be to use the sort() function or a sorting algorithm. In this case this exercise might become invalid for that track.
- It is also possible that the person contributing the exercise is new and binary search tree is a new concept to them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Therefore current implementations of the testsuite do test for usually internal/private functions of such a datastructure.
They ask for the left or right subtree of the root node and assert if its value is correct given the insertion order.
The do roughly the following:
root = empty |> insert(1) |> insert(3) |> insert(2)
three = left_of(root)
two = root |> left_of |> right_of
assert value_of(root) = 1
assert value_of(three) = 3
assert value_of(two= 2
So not only testing if the tree contains the values, but also if it is built correctly is not possible by just using a sort
function, since original order of insertion dictates the structure of the tree.
So I do not think that bullet 1 is a concern if you are testing for internals in this canonical data. If you do not, please add those tests.
A person that is new to the BST, will see the README.md which explains the exercise to solve. The comments in the canonical data are meant as a guidance/help for the implementor of a testsuite/a generator, not to describe the actual exercise.
Because of this two reasons, I am with @petertseng and think this part of the comment should be removed,
"cases": [{ | ||
"description": "insert invalid/null data into the tree", | ||
"property": "insert", | ||
"value": null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar reasoning as #925: I don't think it is good to have this case.
In most languages that have static type checking, even if they have nullable references, integers are not reference types, so if the function under test is defined to take an integer,
null
is not a possible input.For languages without static type checking,
null
would be a possible input, but I also generally do not see functions that expect to take integers explicitly check fornull
(or language equivalent); let me know if I'm wrong. So I don't think the tests need to make students write it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I stated in the comments : if the specific track does not have any equivalent to null
they should use -1
. The purpose of this case is not limited to null
but actually invalid data. I knew about #925 when I was writing this case and that's why I added the above comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the specific track does not have any equivalent to null they should use -1. The purpose of this case is not limited to
null
but actually invalid data
I don't understand why -1 is invalid for any reason other than "the comments of this file say so".
I think it is a strange restriction to only allow positive numbers. It seems like the restriction is there for the sole purpose of classifying some numbers to be invalid, allowing there to be a test for inserting invalid values. If that is the sole purpose of the restriction, I would think it just makes more sense to not have the restriction and not have this test.
Further, if negative values are to be deemed invalid values for the tree being tested, that seems like information that is very important to be added in description.md, where students will see it, not canonical-data.json (where students will not see it!).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well the exercise is supposed to teach how to implement a data structure. And a data structure should know the difference between valid and invalid data before inserting it. Now some languages do not have static type checking whereas some don't have equivalent of null. But we still need to draw a line between the two. And I agree the description.md might need an update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this case should exist, because if we draw a line between valid and invalid data and specify in the description we should also have a test case to check against it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And a data structure should know the difference between valid and invalid data before inserting it.
I expect that for a BST, by contract, the criterion for validity is a standard "are two elements comparable against each other?". I therefore also expect that this is the criterion that the tests will use, rather than a different standard ("is the number positive?", etc.)
So then, in a language with static type checking, this would get enforced by the type system. In a language without one, now it gets interesting. We could consider a test of adding in "hello" to a tree that already contains a 6, for example. This would fail. However, is it the student's responsibility to write the code that checks for that? The failure might happen without the student having to write any additional code. If it would, then there is no value in having this test.
the exercise is supposed to teach how to implement a data structure.
This makes it useful to keep the standard criterion for validity. If the exercise adds in an additional criterion, students may wonder why it exists and/or whether they have to add such an additional restriction in any future BSTs they write.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I guess the case for invalid data should exist but anything other than an integer should be considered as invalid. Right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
anything other than an integer should be considered as invalid
Right.
the case for invalid data should exist
Only if it is useful. I haven't been convinced it is useful because in most cases the student doesn't need to write any code to pass the test. Okay, I can name at least one language where 6 < "hello"
is not a type error (whether that be at run time or compile time), so in that language it could be useful.
I can't tell whether #902 is telling us "yes, put such cases in the canonical-data.json and let languages that don't need the test ignore it" or "no, don't put such cases in and let languages that need the tests add them". I am thinking the number of languages that benefit from the case is lower than the number of languages that don't, if that helps decide.
I still say no. Remember that it's not necessary to convince me, it's only necessary to convince those who approve the pull request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A BST is about comperability, and I do not see why string and integer should not be comparable to each other...
In the erlang VM (and therefore in at least 3 languages on exercism) there is a total order defined over all values possible. 6 =< "foo"
is true in erlang, lfe and elixir (in their corresponding syntaxes).
Also after having read the introductionary comment, where you state that we only use positiv numbers as valid input, I'd use an unsigned int in statically typed languages.
But you have specified -1
as an error indicator, but the only thing that can cause an error, is invalid input (which can't happen in accordingly typed languages) or someone else mangles with the tree from the outside, which we suppose not to happen.
So independently how I look at it, I do not feel that this test is necessary in any way.
{ | ||
"description": "search invalid/null data", | ||
"property": "search", | ||
"value": null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar reasoning as #925: I don't think it is good to have this case.
In most languages that have static type checking, even if they have nullable references, integers are not reference types, so if the function under test is defined to take an integer,
null
is not a possible input.For languages without static type checking,
null
would be a possible input, but I also generally do not see functions that expect to take integers explicitly check fornull
(or language equivalent); let me know if I'm wrong. So I don't think the tests need to make students write it.
{ | ||
"description": "insert a number that already exist", | ||
"property": "insert", | ||
"value": 6, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see how we can understand that the number already exists, using the information provided in this case. If we suppose that we are inserting into a tree that already had 6
inserted into it at some point, that needs to be specified in some way here. Otherwise, compare this case with the case immediately above it: the only difference is the description
and expected
values, so there doesn't seem to be any apparent reason why one would expect true and the other would expect false.
"description": "search a number that exists", | ||
"property": "search", | ||
"value": 6, | ||
"expected": true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we know that 6 exists in the tree? What tree is this search being performed on?
"description": "search a number that does not exist", | ||
"property": "search", | ||
"value": 11, | ||
"expected": false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we know that 11 does not exist in the tree? What tree is this search being performed on?
"property": "ascendingOrder", | ||
"comments": [ | ||
"Assuming the data is entered in this order :", | ||
"6 -> 1 -> 3 -> 2 -> 5" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like these values should be an array of numbers, rather than forcing people to parse this string, splitting by ->
to find the numbers, etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it but i wasn't sure if i should add a new key called "state" which represents the data entered in the tree at that case in that specific order. This new key "state" will also prove useful in previous cases where its not clear what data the tree currently has.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there has to be such a key. You could call it previously_inserted_values
instead of state
if needing a more descriptive name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Um. maybe current_tree_data
because then i can also have this key in all cases and make all cases consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not even sure how to read this...
6, then 1, then 3, then...
or is it
6 after, 1 after 3, after...
or
6 has child 1, has child 3, has child (but which child then?)
Some implementations ask for implementing a function that creates a BST from a list, growing the tree from the first item to the last.
Roughly like this:
foldl Leaf (\ x t -> insert t x) list
In the introductionary comment we could explain that a property "value"
could be used to create a singleton tree, or inserting a single value to a given tree, while "values"
or "tree"
implies to create a tree from a list using this function.
"property": "descendingOrder", | ||
"comments": [ | ||
"Assuming the data is entered in this order :", | ||
"6 -> 1 -> 3 -> 2 -> 5" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like these values should be an array of numbers, rather than forcing people to parse this string, splitting by ->
to find the numbers, etc
So in conclusion I should
I'd like your opinions on the order of tests too and if I should add another test. @petertseng @NobbZ It's been a busy week for me, hence the late reply. |
I agree with that
I defer to all tracks that have implemented this exercise.
[
{
"slug": "binary-search-tree",
"blurb": "Insert and search for numbers in a binary tree.",
"readme_url": "https://github.com/exercism/problem-specifications/blob/master/exercises/binary-search-tree/description.md",
"data": null,
"implementations": [
{
"track_id": "csharp",
"url": "https://github.com/exercism/csharp/tree/master/exercises/binary-search-tree"
},
{
"track_id": "clojure",
"url": "https://github.com/exercism/clojure/tree/master/exercises/binary-search-tree"
},
{
"track_id": "coffeescript",
"url": "https://github.com/exercism/coffeescript/tree/master/exercises/binary-search-tree"
},
{
"track_id": "crystal",
"url": "https://github.com/exercism/crystal/tree/master/exercises/binary-search-tree"
},
{
"track_id": "ecmascript",
"url": "https://github.com/exercism/ecmascript/tree/master/exercises/binary-search-tree"
},
{
"track_id": "elixir",
"url": "https://github.com/exercism/elixir/tree/master/exercises/binary-search-tree"
},
{
"track_id": "fsharp",
"url": "https://github.com/exercism/fsharp/tree/master/exercises/binary-search-tree"
},
{
"track_id": "go",
"url": "https://github.com/exercism/go/tree/master/exercises/binary-search-tree"
},
{
"track_id": "haskell",
"url": "https://github.com/exercism/haskell/tree/master/exercises/binary-search-tree"
},
{
"track_id": "java",
"url": "https://github.com/exercism/java/tree/master/exercises/binary-search-tree"
},
{
"track_id": "javascript",
"url": "https://github.com/exercism/javascript/tree/master/exercises/binary-search-tree"
},
{
"track_id": "lua",
"url": "https://github.com/exercism/lua/tree/master/exercises/binary-search-tree"
},
{
"track_id": "perl5",
"url": "https://github.com/exercism/perl5/tree/master/exercises/binary-search-tree"
},
{
"track_id": "ruby",
"url": "https://github.com/exercism/ruby/tree/master/exercises/binary-search-tree"
},
{
"track_id": "scala",
"url": "https://github.com/exercism/scala/tree/master/exercises/binary-search-tree"
},
{
"track_id": "swift",
"url": "https://github.com/exercism/swift/tree/master/exercises/binary-search-tree"
},
{
"track_id": "typescript",
"url": "https://github.com/exercism/typescript/tree/master/exercises/binary-search-tree"
}
]
}
] The suggestions in https://github.com/exercism/docs/blob/master/you-can-help/improve-exercise-metadata.md#extracting-canonical-test-data say to consult the implementations to see what tests to add. I am presently unable to do this on your behalf, so I trust you to do it. |
@kabiir are you still working on this? |
@Insti Yes!, I just didn't get enough time. I've been going through tracks that have implemented tests, they all seem to follow a trend it's like there is a hidden canonical-data.json. Anyway i'm working on it and i'll try to follow the trend too. |
Which is probably entirely due to the fact that people often copy an existing set of tests and modify them to fit their track :) Great to hear that you are working on this by the way! |
So for test case "insert smaller number on left node" and data [4, 2], to check {
"description": "smaller number at left node",
"property": "data",
"node": "root.left",
"tree_data": [4, 2],
"expected": 2
} Or maybe I could add a {
"description": "smaller number at left node",
"property": "data",
"comments": "Left child of node '4' should be node '2'",
"tree_data": [4, 2],
"expected": 2
} I'm inclined to use comments. Thoughts? @petertseng @NobbZ @Insti |
My generator were not able to parse those comments. Therefore I think we need to find a way to do it in a formalized way, but I'M undercaffeinated, so I do not have a clue how right now… |
You may use a series of operations as in https://github.com/exercism/problem-specifications/blob/master/exercises/zipper/canonical-data.json#L86-L96 if you desire. This statement must not be taken as an endorsement or rejection of using a series of operations. |
Since most tests are expecting the tree to be in a certain form. Adding an "comments": [
"The key expectedTree represents a tree as a node object starting from root node",
"a node object consists of 3 keys 'data', 'left' and 'right'",
"The key 'data' contains data of the said node, 'left' contains",
"a node object of left-child node and 'right' contains a node object",
"of right-child node."
]
{
"description": "greater number at right node",
"property": "insert",
"tree_data": [4, 5],
"expectedTree": {
"data": 4,
"left": null,
"right": {
"data": 5,
"left": null,
"right": null
}
}
} |
I agree that our goal should be to check that the tree resulting from inserting a given sequence of numbers has a shape matching the expected shape. I advise that if the trees are implemented as abstract data types that an outside caller such as a test suite may not be able to directly construct a tree that can be compared against the observed tree (otherwise it would force the implementation), so they may be restricted to using the interface to test the shape of the tree. This advice doesn't prevent the proposed representation, therefore it is not a rejection of the proposal. The schema will require that |
One advantage of the A disadvantage is that it doesn't represent the entire tree, which increases the number of tests required to determine that the whole tree is in the state expected. I'm moderately in favor of the traversal path suggestion even so, because I suspect that for most tests, we don't need to test the entire tree state to determine whether or not the student has done the right thing; we just need to ensure that the result of the tested operation is as expected. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks very good!
Per #996, we're modifying policy such that inputs should all appear under the input
key. For new canonical data, I think it best to make the change now. For this exercise, that should be a simple matter of replacing all instances of tree_data
with input
.
This helps test generators keep other data, such as the node
property, out of the test inputs.
You'll need to wrap it rather than just replace it.
Would still not match the specification.
|
@Insti I agree, but I still don't understand how this particular canonical-data can be parsed by test generators and produce a usable output. |
@kabiir I've got exercise generators on my mind because I recently wrote the one the Rust track uses. I can't speak for other tracks' exercise generators, but what the Rust track's generator will do is group the tests by the tested fn process_data_case<I, O>(input: I, expected: O) {
unimplemented!()
}
fn process_sorteddata_case<I, O>(input: I, expected: O) {
unimplemented!()
} It then generates a bunch of actual tests written in terms of those functions. Crucially, if the case in the canonical data includes So, in cases such as Back to // assuming BST is the name of the student's struct
fn process_data_case(input: HashMap<&'static str, Vec<&'static str>>, expected: HashMap<&'static str, Option<&'static str>>) {
let mut student_bst = BST::new();
for item in input.get("tree_data").unwrap() {
student_bst.insert(item.parse());
}
unimplemented!() // TODO: build the expected tree here, which might be kind of tedious
assert_eq!(student_bst, expected_bst);
} The exercise generator does the busywork of generating a whole bunch of tests which look like this2: #[test]
#[ignore]
fn test_data_is_retained() {
process_data_case(hashmap!("tree_data" => vec!["4"]), hashmap!("data" => "4", "left" => None, "right" => None));
} However, it's not the responsibility of the exercise generator to get a complete exercise ready to merge without human input. It's just there to streamline and simplify the work of generating an exercise.
|
It is a precondition to the test thus it counts as input to the test generator even if it's not going to be an argument to the function being tested. |
@coriolinus I saw that some of the tracks were using strings to represent numbers so I made the choice of using strings because the person who will implement the test whether using test generators or not will know that data is not restricted to integers only. Since depending upon the test the student will model there program and its not described in the Readme which data type input to BST is going to be. So the tests should not restrict it to numbers or strings but I had to make a choice so strings. So I get it that the canonical-data doesn't necessarily have to be 100% test generators friendly right? I mean I can try to follow all of the standard but at some point some test cases just won't be parsed by test generators @Insti Should I mention that on the comment? |
Why do you think this would be the case?
Probably. |
I thought test generators were general purpose for all exercises |
{ | ||
"description": "same number at left node", | ||
"property": "data", | ||
"node": "root->left", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in this file you have taken your suggestion of #940 (comment) to use expected
to represent the entire tree, and not your suggestion of #940 (comment) to list one specific node being traversed to. If you confirm this is true, I think this node
line should be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes!, good catch.
"description": "data is retained", | ||
"property": "data", | ||
"input": { | ||
"tree_data": ["4"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@coriolinus @Insti
I just saw issue #987
Should the input key tree_data
be changed to lowerCamelCase?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a mild preference for snake_case
over lowerCamelCase
, but TBH I didn't even think about the applicability of #987 here.
As I understand it, the driving intent behind that issue was to avoid using keys which require special handling because they contain spaces or otherwise are not permissible as variable names. As that's handled by either naming convention, I don't have a preference one way or the other about whether you should update your cases here.
@rpottsoh @petertseng Is this PR ready to be merged? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#987 should be followed, otherwise PR looks fine.
"description": "smaller number at left node", | ||
"property": "data", | ||
"input": { | ||
"tree_data": ["4", "2"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be lowerCamelCase in accordance with #987.
"description": "same number at left node", | ||
"property": "data", | ||
"input": { | ||
"tree_data": ["4", "4"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be lowerCamelCase in accordance with #987.
"description": "greater number at right node", | ||
"property": "data", | ||
"input": { | ||
"tree_data": ["4", "5"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be lowerCamelCase in accordance with #987.
"description": "can create complex tree", | ||
"property": "data", | ||
"input": { | ||
"tree_data": ["4", "2", "6", "1", "3", "5", "7"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be lowerCamelCase in accordance with #987.
"description": "can sort single number", | ||
"property": "sortedData", | ||
"input": { | ||
"tree_data": ["2"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be lowerCamelCase in accordance with #987.
"description": "can sort if second number is smaller than first", | ||
"property": "sortedData", | ||
"input": { | ||
"tree_data": ["2", "1"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be lowerCamelCase in accordance with #987.
"description": "can sort if second number is same as first", | ||
"property": "sortedData", | ||
"input": { | ||
"tree_data": ["2", "2"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be lowerCamelCase in accordance with #987.
"description": "can sort if second number is greater than first", | ||
"property": "sortedData", | ||
"input": { | ||
"tree_data": ["2", "3"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be lowerCamelCase in accordance with #987.
"description": "can sort complex tree", | ||
"property": "sortedData", | ||
"input": { | ||
"tree_data": ["2", "1", "3", "6", "7", "5"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be lowerCamelCase in accordance with #987.
convert to lowerCamelCase
I think this is now ready to be merged. Not sure if it should be squashed. |
Way to go @kabiir! That was a lot of work over a period of time. |
Canonical version 1.0.0 from exercism/problem-specifications#940 This commit does not contain a test generator for the exercise.
Closes #555
Edit to add: Description