Conformance Rust tests added #20
Here are some of the questions we have so far:
@eefriedman @mbrubeck - Can you please review?
@jdm, during the execution of the test case automation suite, these strange files get created at some step. We were not able to determine where or why they are created. However, I have deleted them and added an entry to the ".gitignore" file to make sure that they never get pushed again. I had added this entry to .gitignore in the commit before the one that had the issue, but I made that commit while inside the /src/ directory, so it did not pick up the .gitignore file. I hope everything looks good now.
@mbrubeck , @eefriedman, @jdm while we have pushed out code in this commit, we are yet to integrate the code for our steps L1 and N0 into lib.rs. Here are some of the points that I need help on, to proceed with this integration:
For the record, @mbrubeck is unfortunately on vacation today and this weekend :/
str has a few related methods for extracting characters; see https://doc.rust-lang.org/nightly/std/primitive.str.html#method.chars and https://doc.rust-lang.org/nightly/std/primitive.str.html#method.char_indices . For performance reasons, it's usually best to stick with the
Separate the code into different files however you feel is best. Putting the code for a complicated rule with a single entry point into a separate file seems like a good idea.
Separating them seems fine.
At a high level, step N0 is similar to
See IsolatingRunSequence.
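As a small illustration of the `chars` and `char_indices` methods linked above, here is a minimal sketch. The sample string and the helper name `byte_offsets` are made up for this example and are not part of the crate.

```rust
/// Hypothetical helper: the byte offset at which each character starts.
fn byte_offsets(text: &str) -> Vec<usize> {
    // `char_indices` yields (byte offset, char) pairs.
    text.char_indices().map(|(offset, _)| offset).collect()
}

fn main() {
    // Two Hebrew letters (2 bytes each in UTF-8) followed by ASCII "(a".
    let text = "\u{05D0}\u{05D1}(a";

    // `chars` yields the characters themselves, without positions.
    let count = text.chars().count();
    assert_eq!(count, 4);

    // The byte length is larger than the character count.
    assert_eq!(text.len(), 6);

    // Offsets advance by 2 for the Hebrew letters and by 1 for ASCII.
    assert_eq!(byte_offsets(text), vec![0, 2, 4, 5]);
}
```

The point of the sketch is only that character positions and byte offsets diverge as soon as the text contains non-ASCII characters.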
…s. Also removed the .pyc files and added them to .gitignore
Thanks, @eefriedman. We'll get the changes in as soon as possible!
@eefriedman I have some doubts too. Could you please answer them?
We have added test cases for the method process_text(). The test cases are of the same format as the existing test cases: assert_eq!(process_text("abc123", Some(0)), BidiInfo {
a) Is the following test case correct?
assert_eq!(process_text("\u{FA5E}\u{1F102}\u{FF0B}\u{20B0}\u{FE55}\u{E017E}\u{001C}\u{001F}\u{2000}\u{1015A}\u{2066}\u{2067}\u{2068}\u{2069}", Some(0)), BidiInfo { levels: vec![ 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3], classes: vec![L, EN, ES, ET, CS, NSM, B, S, WS, ON, LRI, RLI, FSI, PDI], paragraphs: vec![ParagraphInfo { range: 0..14, level: 0 } ], });
b) As per my understanding, the value in the Some field in the given test should be the value of the bitset (para levels). Is that correct?
c) While inserting the test cases in lib.rs, can we remove Bidi classes that correspond to levels with the value x? For example, can we ignore this test case?
@levels: x
Or, in the following example, can we remove LRE, LRO, RLE from these test cases?
@levels: 0 x
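On question (b), a rough sketch of how the bitset might be decoded, based on my reading of the BidiTest.txt header: the bitset is hexadecimal, with 1 = automatic paragraph level, 2 = forced LTR, and 4 = forced RTL, so one data line can stand for several runs. The function names are hypothetical, and mapping "automatic" to `None` is an assumption about how `process_text` treats its `Option` argument.

```rust
/// Hypothetical helper: expand the BidiTest.txt bitset into the paragraph
/// level arguments to test with. Assumption: None means "automatic".
fn para_levels_from_bitset(bitset: u8) -> Vec<Option<u8>> {
    let mut levels = Vec::new();
    if bitset & 1 != 0 {
        levels.push(None); // automatic (rules P2 and P3)
    }
    if bitset & 2 != 0 {
        levels.push(Some(0)); // forced left-to-right
    }
    if bitset & 4 != 0 {
        levels.push(Some(1)); // forced right-to-left
    }
    levels
}

/// Hypothetical helper: split a data line such as "LRE L; 7" into the
/// Bidi class names and the (hexadecimal) bitset.
fn parse_bidi_test_line(line: &str) -> Option<(Vec<&str>, u8)> {
    let (classes, bits) = line.split_once(';')?;
    let bitset = u8::from_str_radix(bits.trim(), 16).ok()?;
    Some((classes.split_whitespace().collect(), bitset))
}

fn main() {
    // Illustrative line, not copied from the data file.
    let (classes, bitset) = parse_bidi_test_line("LRE L; 7").unwrap();
    assert_eq!(classes, vec!["LRE", "L"]);
    // 7 = 1 | 2 | 4, so all three paragraph settings apply.
    assert_eq!(para_levels_from_bitset(bitset), vec![None, Some(0), Some(1)]);
}
```

Under this reading, the bitset is not passed to `process_text` directly; each set bit selects one paragraph level to run the same input with.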
@eefriedman here is the test case from BidiCharacterTest.txt:
05D0 0028 05D1 0061 2680 0028 005B 0029 005D;0;0;1 1 1 0 0 0 0 0 0;2 1 0 3 4 5 6 7 8
Are you sure you're converting the testcase correctly? You can end up with either your left or your right result depending on the default paragraph embedding level (an LTR context versus an RTL context).
I am taking the index values given in the test case to form the result. Here is the description of a test case: I am using Field 0 as input, and indexing into Field 0 with the indices from Field 4 to produce the output. Do you think there is something wrong with this approach? The LTR/RTL context should be taken care of by the output, right?
The "0" in Field 1 represents left-to-right, which means that you should apply HL1 and explicitly set the paragraph embedding level to zero.
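To make the field layout discussed here concrete, the following is a minimal sketch of parsing one BidiCharacterTest.txt line and building the expected visual string from Field 0 and Field 4. The struct and function names are hypothetical, not taken from this PR. The field meanings follow my reading of the data file header: Field 1 is the paragraph direction, Field 2 the resolved paragraph level, Field 3 the resolved levels with `x` for removed characters, and Field 4 the visual order.

```rust
/// One parsed line of BidiCharacterTest.txt (names are illustrative).
#[derive(Debug, PartialEq)]
struct CharTest {
    text: String,             // Field 0: code points
    para_direction: u8,       // Field 1: 0 = LTR, 1 = RTL, 2 = auto-LTR
    resolved_para_level: u8,  // Field 2
    levels: Vec<Option<u8>>,  // Field 3: None stands for 'x'
    visual_order: Vec<usize>, // Field 4: indices into Field 0
}

fn parse_char_test(line: &str) -> Option<CharTest> {
    let fields: Vec<&str> = line.split(';').collect();
    if fields.len() != 5 {
        return None;
    }
    // Hex code points -> String; fails on any invalid scalar value.
    let text = fields[0]
        .split_whitespace()
        .map(|hex| u32::from_str_radix(hex, 16).ok().and_then(char::from_u32))
        .collect::<Option<String>>()?;
    let levels = fields[3]
        .split_whitespace()
        .map(|t| if t == "x" { Some(None) } else { t.parse::<u8>().ok().map(Some) })
        .collect::<Option<Vec<_>>>()?;
    let visual_order = fields[4]
        .split_whitespace()
        .map(|t| t.parse::<usize>().ok())
        .collect::<Option<Vec<_>>>()?;
    Some(CharTest {
        text,
        para_direction: fields[1].trim().parse().ok()?,
        resolved_para_level: fields[2].trim().parse().ok()?,
        levels,
        visual_order,
    })
}

/// Expected visual string: pick characters (not bytes) in Field 4 order.
fn expected_visual(test: &CharTest) -> String {
    let chars: Vec<char> = test.text.chars().collect();
    test.visual_order
        .iter()
        .filter_map(|&i| chars.get(i).copied())
        .collect()
}

fn main() {
    let line = "05D0 0028 05D1 0061 2680 0028 005B 0029 005D;0;0;1 1 1 0 0 0 0 0 0;2 1 0 3 4 5 6 7 8";
    let test = parse_char_test(line).expect("well-formed line");
    assert_eq!(test.para_direction, 0); // run with an explicit LTR paragraph level
    assert_eq!(test.resolved_para_level, 0);
    assert_eq!(test.levels[0], Some(1));
    println!("{:?}", expected_visual(&test));
}
```

Because Field 4 already lists the visual order from left to right, this sketch does no extra reversal; the paragraph direction from Field 1 only decides which level is passed to the function under test.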
Does this mean that I take the output produced by using the indexes of Field 0, and reverse the string?
@eefriedman, I noticed that as well. Thanks for the help. I'll try with this and get back to you!
I am now getting this error:
thread 'test::test_reorder_line' panicked at 'index 2 and/or 3 in
This is the test case:
assert_eq!(reorder_with_para_level("\u{05D0}\u{05D1}\u{0028}\u{05D2}\u{05D3}\u{005B}\u{0026}\u{0065}\u{0066}\u{005D}\u{002E}\u{0029}\u{0067}\u{0068}", Some(0)),"\u{05D1}\u{05D0}\u{0028}\u{05D3}\u{05D2}\u{005B}\u{0026}\u{0065}\u{0066}\u{005D}\u{002E}\u{0029}\u{0067}\u{0068}");//BidiCharacterTest.txt Line Number:40
Here is the input:
05D0 05D1 0028 05D2 05D3 005B 0026 0065 0066 005D 002E 0029 0067 0068;0;0;1 1 0 1 1 0 0 0 0 0 0 0 0 0;1 0 2 4 3 5 6 7 8 9 10 11 12 13
Do you know why this fails?
A "do not lie on character boundary" panic generally means you're using string indexing incorrectly; e.g.
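A minimal sketch of how this kind of panic can arise, using the first characters of the failing input. This is not a diagnosis of the crate's code. The helper `char_range_to_bytes` is hypothetical and shows one way to convert character positions into byte offsets that are safe to slice with.

```rust
use std::ops::Range;

/// Hypothetical helper: convert a range of character positions into the
/// matching byte range, or None if the positions are out of bounds.
fn char_range_to_bytes(s: &str, start: usize, end: usize) -> Option<Range<usize>> {
    if start > end {
        return None;
    }
    // Byte offset where each character starts, plus the end of the string.
    let mut offsets: Vec<usize> = s.char_indices().map(|(i, _)| i).collect();
    offsets.push(s.len());
    let from = *offsets.get(start)?;
    let to = *offsets.get(end)?;
    Some(from..to)
}

fn main() {
    // U+05D0 and U+05D1 take 2 bytes each in UTF-8; U+0028 takes 1.
    let s = "\u{05D0}\u{05D1}\u{0028}";

    // Byte 2 starts the second letter, but byte 3 is in the middle of it.
    assert!(s.is_char_boundary(2));
    assert!(!s.is_char_boundary(3));

    // `&s[2..3]` would panic here; `get` reports the problem as None instead.
    assert_eq!(s.get(2..3), None);

    // Slicing by character position 1..2 maps to bytes 2..4.
    let bytes = char_range_to_bytes(s, 1, 2).unwrap();
    assert_eq!(bytes, 2..4);
    assert_eq!(&s[bytes], "\u{05D1}");
}
```

Indices taken from the test data (Field 4) count characters, while `str` slicing counts bytes, so mixing the two on Hebrew text fails in exactly this way.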
I am not able to make sense of this backtrace. Is there any indicator of the point of failure here?
thread 'test::test_reorder_line' panicked at 'index 2 and/or 3 in
stack backtrace:
failures:
test result: FAILED. 11 passed; 1 failed; 0 ignored; 0 measured
thread ' ' panicked at 'Some tests failed', ../src/libtest/lib.rs:252
stack backtrace:
1: 0x564cc1e817fe - sys::backtrace::write::haf6e4e635ac76143Ivs
2: 0x564cc1e851b6 - panicking::on_panic::ha085a58a08f78856lzx
3: 0x564cc1e7885e - rt::unwind::begin_unwind_inner::hc90ee27246f12475C0w
4: 0x564cc1e35461 - rt::unwind::begin_unwind::h218749302018745608
5: 0x564cc1e36e1c - test_main::h6a499577710d847aI1a
6: 0x564cc1e3d6d2 - test_main_static::hb27df092379fd7dbd4a
7: 0x564cc1e3404c - __test::main::hfd578a788dc6470cDee
8: 0x564cc1e84add - __rust_try
9: 0x564cc1e86e7a - rt::lang_start::hefba4015e797c325hux
10: 0x564cc1e341cb - main
11: 0x7fe3178016ff - __libc_start_main
12: 0x564cc1dfb5f8 - _start
13: 0x0 -
@eefriedman, I have commented out the code that we had inserted. It seems that this error arises even with the existing code. This tells me that the test case itself is going wrong somewhere. Are you sure this is the correct test case?
BidiCharacterTest line 40:
05D0 05D1 0028 05D2 05D3 005B 0026 0065 0066 005D 002E 0029 0067 0068;0;0;1 1 0 1 1 0 0 0 0 0 0 0 0 0;1 0 2 4 3 5 6 7 8 9 10 11 12 13
assert_eq!(reorder_with_para_level("\u{05D0}\u{05D1}\u{0028}\u{05D2}\u{05D3}\u{005B}\u{0026}\u{0065}\u{0066}\u{005D}\u{002E}\u{0029}\u{0067}\u{0068}", Some(0)),"\u{05D1}\u{05D0}\u{0028}\u{05D3}\u{05D2}\u{005B}\u{0026}\u{0065}\u{0066}\u{005D}\u{002E}\u{0029}\u{0067}\u{0068}");//BidiCharacterTest.txt Line Number:40
I have made the following function to test reorder line while specifying the paragraph level input:
fn reorder_with_para_level(s: &str, level: Option) -> Cow {
It looks like it's crashing at Line 169 in 2a99bca
Line 204 in 2a99bca
Anyway, I don't think your test should be triggering this; I'm guessing it's a bug in the code.
Thanks for looking into it. How should we go about this now? We've implemented all the steps and added code to pull test cases from BidiTest.txt and BidiCharacterTest.txt.
@vicky-katara For now, you can temporarily comment out or delete the failing test(s), or add code to your Python script to do this (maybe using a list of line numbers that are known to fail). File a new issue in GitHub about the failure so that we will come back and fix it later.
@mbrubeck, there are more than 100,000 test cases in total, from BidiCharacterTest.txt and BidiTest.txt. I don't think finding a list of failing test cases is viable. Will it be fine if we include a list of passing test cases, with comments describing how to add all test cases back to lib.rs (for example by uncommenting code which adds test cases)?
Yes, that sounds good.
We're getting failures for test cases like this from BidiTest.txt:
@ Levels: x 4 #Count: 4
We feel we built the test case correctly. Could you help us convert this test case into a Rust assert test case of this form: This is what we have: assert_eq!(process_text("\u{0669}\u{11F2}\u{1D7D6}\u{10E74}", Some(1)),
Here is what we have accomplished:
Additionally: identified a defect with process_text [https://github.com//issues/22].
Great work, thanks! I haven't had time to review all of this code yet, but I look forward to getting it merged.
Sorry I haven't reviewed this PR yet; I'm just back from vacation and will be working on it this week. Since there's a lot of code here split across a few dozen commits, I think I'll try to split it into a few separate PRs that can be reviewed and landed separately. If I find code that needs to be changed, I'm happy to apply the changes myself (and have another Servo developer review my changes as needed). Or if any of you have the time and inclination to keep working on this code, that would be great too; just let me know.
☔ The latest upstream changes (presumably #25) made this pull request unmergeable. Please resolve the merge conflicts.
I think we can close this PR now that we have another approach for having conformance tests (as integration tests) in #30. There's still more work needed (including adding the char-dependent test rules), but that should be done in a separate PR, IMHO.
Files added:
src/BidiCharacterTest.txt
src/bracket_pairs.rs
src/BidiTest.txt
src/UnicodeData.txt
tools/bidiCharacterTestParser.py
tools/bidiTest.py
tools/bidiCharacterTestParser.pyc
tools/bidiTest.pyc
Preliminary pull request. Need help with integration of new code with existing code. Verification needed for Rust conformance tests added.