Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

binary-search-tree: implement canonical-data.json #940

Merged
merged 5 commits into from
Jan 5, 2018
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
68 changes: 68 additions & 0 deletions exercises/binary-search-tree/canonical-data.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
{
"exercise": "binary-search-tree",
"version": "1.0.0",
"comments": [
"For simplicity only positive integer is considered as valid data,",
"null should be replaced by the respective language equivalent or -1",
"if the language does not have any.",

"The purpose of ths exercise is to create a data structure which",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unclear on how this sentence affects someone implementing this exercise for a given track, and I don't think there was any danger of anyone using a sorting function for this exercise anyway? Will you explain?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well as i understand this exercise from description.md the purpose is to create a data structure which represent sorted data and specifically binary search tree. Usage of any sort() function or algorithm will defeat its purpose don't you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usage of any sort() function or algorithm will defeat its purpose don't you think?

Agree.

I don't think it is likely that anyone will think to use a sorting function, because they should already be at the point where they're inserting nodes into a tree.

Therefore, I don't think it is useful to have this text here, and I will find it confusing becuase I will read it and think "why are you telling me this?"

It could be that I am biased and this text does help those who see it from a different point of view.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • It's possible that in future exercism adds a track in which the obvious choice would be to use the sort() function or a sorting algorithm. In this case this exercise might become invalid for that track.
  • It is also possible that the person contributing the exercise is new and binary search tree is a new concept to them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Therefore current implementations of the testsuite do test for usually internal/private functions of such a datastructure.

They ask for the left or right subtree of the root node and assert if its value is correct given the insertion order.

The do roughly the following:

root = empty |> insert(1) |> insert(3) |> insert(2)
three = left_of(root)
two = root |> left_of |> right_of
assert value_of(root) = 1
assert value_of(three) = 3
assert value_of(two= 2

So not only testing if the tree contains the values, but also if it is built correctly is not possible by just using a sort function, since original order of insertion dictates the structure of the tree.

So I do not think that bullet 1 is a concern if you are testing for internals in this canonical data. If you do not, please add those tests.


A person that is new to the BST, will see the README.md which explains the exercise to solve. The comments in the canonical data are meant as a guidance/help for the implementor of a testsuite/a generator, not to describe the actual exercise.


Because of this two reasons, I am with @petertseng and think this part of the comment should be removed,

"makes sure the data is always sorted, hence the exercise should",
"not make use of any sort() functions or sorting algorithms."
],
"cases": [{
"description": "insert invalid/null data into the tree",
"property": "insert",
"value": null,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar reasoning as #925: I don't think it is good to have this case.

In most languages that have static type checking, even if they have nullable references, integers are not reference types, so if the function under test is defined to take an integer, null is not a possible input.

For languages without static type checking, null would be a possible input, but I also generally do not see functions that expect to take integers explicitly check for null (or language equivalent); let me know if I'm wrong. So I don't think the tests need to make students write it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I stated in the comments : if the specific track does not have any equivalent to null they should use -1. The purpose of this case is not limited to null but actually invalid data. I knew about #925 when I was writing this case and that's why I added the above comment.

Copy link
Member

@petertseng petertseng Oct 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the specific track does not have any equivalent to null they should use -1. The purpose of this case is not limited to null but actually invalid data

I don't understand why -1 is invalid for any reason other than "the comments of this file say so".

I think it is a strange restriction to only allow positive numbers. It seems like the restriction is there for the sole purpose of classifying some numbers to be invalid, allowing there to be a test for inserting invalid values. If that is the sole purpose of the restriction, I would think it just makes more sense to not have the restriction and not have this test.

Further, if negative values are to be deemed invalid values for the tree being tested, that seems like information that is very important to be added in description.md, where students will see it, not canonical-data.json (where students will not see it!).

Copy link
Author

@devkabiir devkabiir Oct 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well the exercise is supposed to teach how to implement a data structure. And a data structure should know the difference between valid and invalid data before inserting it. Now some languages do not have static type checking whereas some don't have equivalent of null. But we still need to draw a line between the two. And I agree the description.md might need an update.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this case should exist, because if we draw a line between valid and invalid data and specify in the description we should also have a test case to check against it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And a data structure should know the difference between valid and invalid data before inserting it.

I expect that for a BST, by contract, the criterion for validity is a standard "are two elements comparable against each other?". I therefore also expect that this is the criterion that the tests will use, rather than a different standard ("is the number positive?", etc.)

So then, in a language with static type checking, this would get enforced by the type system. In a language without one, now it gets interesting. We could consider a test of adding in "hello" to a tree that already contains a 6, for example. This would fail. However, is it the student's responsibility to write the code that checks for that? The failure might happen without the student having to write any additional code. If it would, then there is no value in having this test.

the exercise is supposed to teach how to implement a data structure.

This makes it useful to keep the standard criterion for validity. If the exercise adds in an additional criterion, students may wonder why it exists and/or whether they have to add such an additional restriction in any future BSTs they write.

Copy link
Author

@devkabiir devkabiir Oct 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I guess the case for invalid data should exist but anything other than an integer should be considered as invalid. Right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

anything other than an integer should be considered as invalid

Right.

the case for invalid data should exist

Only if it is useful. I haven't been convinced it is useful because in most cases the student doesn't need to write any code to pass the test. Okay, I can name at least one language where 6 < "hello" is not a type error (whether that be at run time or compile time), so in that language it could be useful.

I can't tell whether #902 is telling us "yes, put such cases in the canonical-data.json and let languages that don't need the test ignore it" or "no, don't put such cases in and let languages that need the tests add them". I am thinking the number of languages that benefit from the case is lower than the number of languages that don't, if that helps decide.

I still say no. Remember that it's not necessary to convince me, it's only necessary to convince those who approve the pull request.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A BST is about comperability, and I do not see why string and integer should not be comparable to each other...

In the erlang VM (and therefore in at least 3 languages on exercism) there is a total order defined over all values possible. 6 =< "foo" is true in erlang, lfe and elixir (in their corresponding syntaxes).

Also after having read the introductionary comment, where you state that we only use positiv numbers as valid input, I'd use an unsigned int in statically typed languages.

But you have specified -1 as an error indicator, but the only thing that can cause an error, is invalid input (which can't happen in accordingly typed languages) or someone else mangles with the tree from the outside, which we suppose not to happen.

So independently how I look at it, I do not feel that this test is necessary in any way.

"expected": false
},
{
"description": "insert a number into the tree",
"property": "insert",
"value": 6,
"expected": true
},
{
"description": "insert a number that already exist",
"property": "insert",
"value": 6,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see how we can understand that the number already exists, using the information provided in this case. If we suppose that we are inserting into a tree that already had 6 inserted into it at some point, that needs to be specified in some way here. Otherwise, compare this case with the case immediately above it: the only difference is the description and expected values, so there doesn't seem to be any apparent reason why one would expect true and the other would expect false.

"expected": false
},
{
"description": "search invalid/null data",
"property": "search",
"value": null,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar reasoning as #925: I don't think it is good to have this case.

In most languages that have static type checking, even if they have nullable references, integers are not reference types, so if the function under test is defined to take an integer, null is not a possible input.

For languages without static type checking, null would be a possible input, but I also generally do not see functions that expect to take integers explicitly check for null (or language equivalent); let me know if I'm wrong. So I don't think the tests need to make students write it.

"expected": false
},
{
"description": "search a number that exists",
"property": "search",
"value": 6,
"expected": true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we know that 6 exists in the tree? What tree is this search being performed on?

},
{
"description": "search a number that does not exist",
"property": "search",
"value": 11,
"expected": false
Copy link
Member

@petertseng petertseng Oct 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we know that 11 does not exist in the tree? What tree is this search being performed on?

},
{
"description": "traverse the data in ascending order",
"property": "ascendingOrder",
"comments": [
"Assuming the data is entered in this order :",
"6 -> 1 -> 3 -> 2 -> 5"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like these values should be an array of numbers, rather than forcing people to parse this string, splitting by -> to find the numbers, etc

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it but i wasn't sure if i should add a new key called "state" which represents the data entered in the tree at that case in that specific order. This new key "state" will also prove useful in previous cases where its not clear what data the tree currently has.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there has to be such a key. You could call it previously_inserted_values instead of state if needing a more descriptive name.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Um. maybe current_tree_data because then i can also have this key in all cases and make all cases consistent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not even sure how to read this...

6, then 1, then 3, then...

or is it

6 after, 1 after 3, after...

or

6 has child 1, has child 3, has child (but which child then?)

Some implementations ask for implementing a function that creates a BST from a list, growing the tree from the first item to the last.

Roughly like this:

foldl Leaf (\ x t -> insert t x) list

In the introductionary comment we could explain that a property "value" could be used to create a singleton tree, or inserting a single value to a given tree, while "values" or "tree" implies to create a tree from a list using this function.

],
"expected": [1, 2, 3, 5, 6]
},
{
"description": "traverse the data in descending order",
"property": "descendingOrder",
"comments": [
"Assuming the data is entered in this order :",
"6 -> 1 -> 3 -> 2 -> 5"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like these values should be an array of numbers, rather than forcing people to parse this string, splitting by -> to find the numbers, etc

],
"expected": [6, 5, 3, 2, 1]
}
]
}