Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conformance Rust tests added #20

Closed
wants to merge 34 commits into from
Closed

Conversation

samruds1
Copy link

@samruds1 samruds1 commented Dec 3, 2015

Files added:
src/BidiCharacterTest.txt
src/bracket_pairs.rs
src/BidiTest.txt
src/UnicodeData.txt

tools/bidiCharacterTestParser.py
tools/bidiTest.py
tools/bidiCharacterTestParser.pyc
tools/bidiTest.pyc

Preliminary pull request. Need help with integration of new code with existing code. Verification needed for Rust conformance tests added.

Review on Reviewable

@samruds1
Copy link
Author

samruds1 commented Dec 3, 2015

Here are some of the questions we have so far:

  1. Rust code for step N0 has been added to bracket_pairs.rs, need help with integration into lib.rs.
  2. Code for W1-W7 has been added to lib.rs. Please verify if correct.
  3. Need clarifications for the Rust test cases inserted in src/lib.rs:
    a) We are not able to execute cargo test possibly due to large number of test cases.
    b) For test cases added from BidiTest.txt we have removed characters where level equals 'x'. Is this correct.
  4. L1 step has not been implemented yet.

@samruds1
Copy link
Author

samruds1 commented Dec 4, 2015

@eefriedman @mbrubeck - Can you please review.

@vicky-katara
Copy link
Contributor

@jdm, during the execution of the test case automation suite, these strange files get created at some step. We were not able to decipher where or why they get created. However, I have deleted them and have added the entry to the ".gitignore" file to make sure that they never get pushed again. I had added this entry to .gitignore in the commit before the one which had the issue, however I made the commit while inside the /src/ directory, hence ignoring the .gitignore file.

I hope everything looks good now.

@vicky-katara
Copy link
Contributor

@mbrubeck , @eefriedman, @jdm while we have pushed out code in this commit, we are yet to integrate the code for our steps L1 and N0 into lib.rs. Here are some of the points that I need help on, to proceed with this integration:

  • I need to know how to give input to the resolve_bracket_pairs() function in bracket_pairs.rs.

    (a) Where will I find the indexes[] array containing character codes for the characters in the input string?

    (b) Where will I find the 'sos' and 'level' variables holding the start of sequence and the embedding direction level for the given run?

    (c) How do I deliver output codes back for the next step to proceed. For this I need to pass the 'codes' array which holds the output codes for the brackets.

  • Apart from these, we need information on integration of our code.

    (a) Should the code for the new steps be part of lib.rs or can they appear in a different file and be imported into lib.rs?

    (b) Should we keep the rust modules for the two steps L1 and N0 in different rust source code files or the in the same one,(in the case we that we are not integrating them into lib.rs)?

@jdm
Copy link
Member

jdm commented Dec 4, 2015

For the record, @mbrubeck is unfortunately on vacation today and this weekend :/

@eefriedman
Copy link
Contributor

(a) Where will I find the indexes[] array containing character codes for the characters in the input string?

str has a few related methods for extracting characters; see https://doc.rust-lang.org/nightly/std/primitive.str.html#method.chars and https://doc.rust-lang.org/nightly/std/primitive.str.html#method.char_indices . For performance reasons, it's usually best to stick with the &str representation of a string rather than converting it to a &[char].

(a) Should the code for the new steps be part of lib.rs or can they appear in a different file and be imported into lib.rs?

Separate the code into different files however you feel is best. Putting the code for a complicated rule with a single entry point into a separate file seems like a good idea.

(b) Should we keep the rust modules for the two steps L1 and N0 in different rust source code files or the in the same one,(in the case we that we are not integrating them into lib.rs)?

Separating them seems fine.

(c) How do I deliver output codes back for the next step to proceed. For this I need to pass the 'codes' array which holds the output codes for the brackets.

At a high level, step N0 is similar to resolve_weak: you're taking the array of BIDI classes, and resolving some of them, so the entry point probably looks similar.

(b) Where will I find the 'sos' and 'level' variables holding the start of sequence and the embedding direction level for the given run?

See IsolatingRunSequence.

@vicky-katara
Copy link
Contributor

Thanks, @eefriedman. We'll get the changes in as soon as possible!

@samruds1
Copy link
Author

@eefriedman I have some doubts too. Could you please answer them.

  1. For the test cases related to BidiTest.txt:

We have added test cases for the method process_text(). The test cases are of the same format as the existing test cases:

assert_eq!(process_text("abc123", Some(0)), BidiInfo {
levels: vec![0, 0, 0, 0, 0, 0],
classes: vec![L, L, L, EN, EN, EN],
paragraphs: vec![ParagraphInfo { range: 0..6, level: 0 }],
});

a) Is the following test case correct?

assert_eq!(process_text("\u{FA5E}\u{1F102}\u{FF0B}\u{20B0}\u{FE55}\u{E017E}\u{001C}\u{001F}\u{2000}\u{1015A}\u{2066}\u{2067}\u{2068}\u{2069}", Some(0)), BidiInfo { levels: vec![ 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3], classes: vec![L, EN, ES, ET, CS, NSM, B, S, WS, ON, LRI, RLI, FSI, PDI], paragraphs: vec![ParagraphInfo { range: 0..14, level: 0 } ], });

b) As per my understanding the value in the Some field in the given test should be the values of the bitset (para levels). Is that correct?

c) While inserting the test cases in lib.rs can we remove Bidi Classes which correspond to levels with the value x.

for example, can we ignore this test case?

@levels: x
@reorder:
LRE; 7
LRO; 7
RLE; 7

or in the following example, can we remove LRE, LRO, RLE from these test cases?

@levels: 0 x
@reorder: 0
L LRE; 3
L LRO; 3
L RLE; 3

  1. For the implementation of step L1 can a separate function be created (resolve_white_space) instead of inserting the code in the visual_runs method?

@vicky-katara
Copy link
Contributor

@eefriedman here is the test case from BidiCharacterTest.txt:

05D0 0028 05D1 0061 2680 0028 005B 0029 005D;0;0;1 1 1 0 0 0 0 0 0;2 1 0 3 4 5 6 7 8

@eefriedman
Copy link
Contributor

Are you sure you're converting the testcase correctly? You can end up with either your left or your right result depending on the default paragraph embedding level (an LTR context versus an RTL context).

@vicky-katara
Copy link
Contributor

I am taking the index values given in the test case to form the result.

Here is the description of a test case:
-# Field 0: A sequence of hexadecimal code point values separated by space
-# Field 1: A value representing the paragraph direction, as follows:
-# 0 represents left-to-right
-# 1 represents right-to-left
-# 2 represents auto-LTR according to rules P2 and P3 of the algorithm
-# Field 2: The resolved paragraph embedding level
-# Field 3: A list of resolved levels; characters removed in rule X9 are
-# indicated with an 'x'
-# Field 4: A list of indices showing the resulting visual ordering from
-# left to right; characters with a resolved level of 'x' are skipped

I am using Field 0 as input, and taking indexing into Field 0 using Field 4 as output indices. Do you think there is something wrong with this approach? The LTR/RTL context should be taken care of by the output, right?

@eefriedman
Copy link
Contributor

The "0" in Field 1 represents left-to-right, which means that you should apply HL1 and explicitly set the paragraph embedding level to zero.

@vicky-katara
Copy link
Contributor

Does this mean that I take the output produced by using the indexes of Field 0, and reverse the string?

@eefriedman
Copy link
Contributor

process_text has two inputs; the first is the string, the second is the paragraph embedding level.

@vicky-katara
Copy link
Contributor

@eefriedman, I noticed that as well. Thanks for the help. I'll try with this and get back to you!

@vicky-katara
Copy link
Contributor

I am now getting this error:

thread 'test::test_reorder_line' panicked at 'index 2 and/or 3 in אב(גד[&ef].)gh do not lie on character boundary', ../src/libcore/str/mod.rs:1284

This is the test case: assert_eq!(reorder_with_para_level("\u{05D0}\u{05D1}\u{0028}\u{05D2}\u{05D3}\u{005B}\u{0026}\u{0065}\u{0066}\u{005D}\u{002E}\u{0029}\u{0067}\u{0068}", Some(0)),"\u{05D1}\u{05D0}\u{0028}\u{05D3}\u{05D2}\u{005B}\u{0026}\u{0065}\u{0066}\u{005D}\u{002E}\u{0029}\u{0067}\u{0068}");//BidiCharacterTest.txt Line Number:40

Here is the input:

05D0 05D1 0028 05D2 05D3 005B 0026 0065 0066 005D 002E 0029 0067 0068;0;0;1 1 0 1 1 0 0 0 0 0 0 0 0 0;1 0 2 4 3 5 6 7 8 9 10 11 12 13

Do you know why this fails?

@eefriedman
Copy link
Contributor

A "do not lie on character boundary" panic generally means you're using string indexing incorrectly; e.g. let x = &"⚀"[1..];. (See also https://doc.rust-lang.org/nightly/book/strings.html#slicing .) Try getting a backtrace to figure out where it's happening.

@vicky-katara
Copy link
Contributor

I am not able to make sense of this backtrace. Is there any indicator of the point of failure here?

thread 'test::test_reorder_line' panicked at 'index 2 and/or 3 in אב(גד[&ef].)gh do not lie on character boundary', ../src/libcore/str/mod.rs:1284

stack backtrace:
1: 0x564cc1e817fe - sys::backtrace::write::haf6e4e635ac76143Ivs
2: 0x564cc1e84f25 - panicking::on_panic::ha085a58a08f78856lzx
3: 0x564cc1e7885e - rt::unwind::begin_unwind_inner::hc90ee27246f12475C0w
4: 0x564cc1e791f6 - rt::unwind::begin_unwind_fmt::ha4be06289e0df3dbIZw
5: 0x564cc1e84b86 - rust_begin_unwind
6: 0x564cc1eb3aa4 - panicking::panic_fmt::he7875691f9cbe589SgC
7: 0x564cc1eb624f - str::slice_error_fail::hbe442ed6cb13a72dsqK
8: 0x564cc1e0688a - str::traits::str.ops..Index<ops..Range>::index::h909793bab573faecccK
at ../src/libcore/str/mod.rs:1104
9: 0x564cc1e0e9c0 - reorder_line::h180dad6937165a51e6b
at /home/servo/unicode-bidi/src/lib.rs:176
10: 0x564cc1e32aae - test::test_reorder_line::reorder_with_para_level::ha84ac2c49163f9500Id
at /home/servo/unicode-bidi/src/lib.rs:1237
11: 0x564cc1e31e66 - test::test_reorder_line::h00b48c9c08476d87iId
at /home/servo/unicode-bidi/tools/:1262
12: 0x564cc1e5611b - boxed::F.FnBox::call_box::h13096251856404813500
13: 0x564cc1e58de1 - boxed::F.FnBox::call_box::h17352075482666389094
14: 0x564cc1e568c9 - rt::unwind::try::try_fn::h2399095994474881594
15: 0x564cc1e84add - __rust_try
16: 0x564cc1e7fe37 - rt::unwind::try::inner_try::h59523aa853a0e10avWw
17: 0x564cc1e56ae8 - boxed::F.FnBox::call_box::h389096937566750418
18: 0x564cc1e83c81 - sys::thread::Thread::new::thread_start::h890d1188505773c835v
19: 0x7fe317fc7554 - start_thread
20: 0x7fe3178e3b9c - __clone
21: 0x0 -

failures:
test::test_reorder_line

test result: FAILED. 11 passed; 1 failed; 0 ignored; 0 measured

thread '

' panicked at 'Some tests failed', ../src/libtest/lib.rs:252
stack backtrace:
1: 0x564cc1e817fe - sys::backtrace::write::haf6e4e635ac76143Ivs
2: 0x564cc1e851b6 - panicking::on_panic::ha085a58a08f78856lzx
3: 0x564cc1e7885e - rt::unwind::begin_unwind_inner::hc90ee27246f12475C0w
4: 0x564cc1e35461 - rt::unwind::begin_unwind::h218749302018745608
5: 0x564cc1e36e1c - test_main::h6a499577710d847aI1a
6: 0x564cc1e3d6d2 - test_main_static::hb27df092379fd7dbd4a
7: 0x564cc1e3404c - __test::main::hfd578a788dc6470cDee
8: 0x564cc1e84add - __rust_try
9: 0x564cc1e86e7a - rt::lang_start::hefba4015e797c325hux
10: 0x564cc1e341cb - main
11: 0x7fe3178016ff - __libc_start_main
12: 0x564cc1dfb5f8 - _start
13: 0x0 -

@vicky-katara
Copy link
Contributor

@eefriedman, I have commented out the code that we had inserted. It seems that this error arises even with existing code. This tells me that the test case itself is going wrong somewhere. Are you sure this is the correct test case?

BidiCharacterTest line 40

05D0 05D1 0028 05D2 05D3 005B 0026 0065 0066 005D 002E 0029 0067 0068;0;0;1 1 0 1 1 0 0 0 0 0 0 0 0 0;1 0 2 4 3 5 6 7 8 9 10 11 12 13

assert_eq!(reorder_with_para_level("\u{05D0}\u{05D1}\u{0028}\u{05D2}\u{05D3}\u{005B}\u{0026}\u{0065}\u{0066}\u{005D}\u{002E}\u{0029}\u{0067}\u{0068}", Some(0)),"\u{05D1}\u{05D0}\u{0028}\u{05D3}\u{05D2}\u{005B}\u{0026}\u{0065}\u{0066}\u{005D}\u{002E}\u{0029}\u{0067}\u{0068}");//BidiCharacterTest.txt Line Number:40

I have made the following function to test reorder line while specifying the paragraph level input:

fn reorder_with_para_level(s: &str, level: Option) -> Cow {
let info = process_text(s, level);
let para = &info.paragraphs[0];
reorder_line(s, para.range.clone(), &info.levels)
}

@eefriedman
Copy link
Contributor

It looks like it's crashing at

result.push_str(&text[run]);
... and it looks like that range comes from
runs.push(start..i);
. I'm guessing we're somehow violating the constraint that every byte of a character has the same level?

Anyway, I don't think your test should be triggering this; I'm guessing it's a bug in the code.

@vicky-katara
Copy link
Contributor

Thanks for looking into it.

How should we go about this now?

We've implemented all the steps and added code to pull test cases from BidiTest.txt and BidiCharacterTest.txt

@mbrubeck
Copy link
Contributor

@vicky-katara For now, you can temporarily comment out or delete the failing test(s), or add code to your Python script to do this (maybe using a list of line numbers that are known to fail). File a new issue in GitHub about the failure so that we will come back and fix it later.

@vicky-katara
Copy link
Contributor

@mbrubeck, there are more than a 100,000 test cases in total, from BidiCharacterTest.txt and BidiTest.txt. I don't think finding a list of failing test cases is viable. Will it be fine if we include a list of passing test cases, with comments describing how to add all test cases back to lib.rs (for example by uncommenting code which adds test cases)?

@mbrubeck
Copy link
Contributor

Will it be fine if we include a list of passing test cases, with comments describing how to add all test cases back to lib.rs (for example by uncommenting code which adds test cases)?

Yes, that sounds good.

@vicky-katara
Copy link
Contributor

@eefriedman @mbrubeck

We're getting failed test cases for test cases like this from BidiTest.txt:

@ Levels: x 4
@ Reorder: 1
LRE AN; 7
RLE L; 4
RLE EN; 4
RLE AN; 4

#Count: 4

We feel we built the test case correctly. Could you help us convert this test case into a rust assert test case of this form:

This is what we have:

assert_eq!(process_text("\u{0669}\u{11F2}\u{1D7D6}\u{10E74}", Some(1)),
BidiInfo { levels: vec![ 7, 4, 4, 4], classes: vec![AN, L, EN, AN],
paragraphs: vec![ParagraphInfo { range: 0..4, level: 4 } ], });

@vicky-katara
Copy link
Contributor

@mbrubeck @eefriedman @jdm

Here is what we have accomplished:

  • add code to tools/generate.py that automatically converts the tests in the conformance suite into Rust tests that can be run automatically
    • Added code to [Code present in bidiCharacterTestParser.py] delete previously added automated test cases (to prevent creating multiple copies of test cases)
    • Added code to pull test cases from BidiCharacterTest.txt [Code present in bidiCharacterTestParser.py]. The function call is commented because of the issue with process text.
    • Added code to pull test cases from BidiTest.txt [Code present in bidiTest.py]. These test cases do not match with the output of process_text. We were not able to get them to match. The function call for this has been commented as well.
  • implement the missing step L1 from the UBA
    • Added algorithm step L1 [Code added to lib.rs]. Partial testing complete.
  • implement the missing step N0 from the UBA
    • Added algorithm step N0 [Code added to lib.rs in 'mod bracket_pair_resolver']. Partial testing complete.
  • solve the conformance problems related to the implementation of steps W1 to W7
    • modified algorithm steps for W1~W7 [Existing code modified]. Partial testing complete.

Additionally:

*Identified defect with process_text [https://github.com//issues/22].

@mbrubeck
Copy link
Contributor

Great work, thanks! I haven't had time to review all of this code yet, but I look forward to getting it merged.

@mbrubeck
Copy link
Contributor

mbrubeck commented Jan 5, 2016

Sorry I haven't reviewed this PR yet; I'm just back from vacation and will be working on it this week. Since there's a lot of code here split across a few dozen commits, I think I'll try to split it into a few separate PRs that can be reviewed landed separately. If I find code that needs to be changed, I'm happy to apply the changes myself (and have another Servo developer review my changes as needed). Or if any of you have the time and inclination to keep working on this code, that would be great too—just let me know.

@bors-servo
Copy link
Contributor

☔ The latest upstream changes (presumably #25) made this pull request unmergeable. Please resolve the merge conflicts.

@behnam
Copy link
Contributor

behnam commented May 15, 2017

I think we can close this PR now that we have another approach for having conformance tests (as integration tests) in #30 . There's still more work needed (including adding the char-dependent tests rules), but they should be done in a separate PR, IMHO.

@jdm jdm closed this May 15, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants