Conformance Rust tests added #20

samruds1 · 2015-12-03T11:13:02Z

Files added:
src/BidiCharacterTest.txt
src/bracket_pairs.rs
src/BidiTest.txt
src/UnicodeData.txt

tools/bidiCharacterTestParser.py
tools/bidiTest.py
tools/bidiCharacterTestParser.pyc
tools/bidiTest.pyc

Preliminary pull request. Need help with integration of new code with existing code. Verification needed for Rust conformance tests added.

This reverts commit 6823cea.

…m in lib.rs

…argo test passing

All conflicts merged

…deletion

samruds1 · 2015-12-03T11:25:52Z

Here are some of the questions we have so far:

Rust code for step N0 has been added to bracket_pairs.rs, need help with integration into lib.rs.
Code for W1-W7 has been added to lib.rs. Please verify if correct.
Need clarifications for the Rust test cases inserted in src/lib.rs:
a) We are not able to execute cargo test possibly due to large number of test cases.
b) For test cases added from BidiTest.txt we have removed characters where level equals 'x'. Is this correct.
L1 step has not been implemented yet.

samruds1 · 2015-12-04T21:19:26Z

@eefriedman @mbrubeck - Can you please review.

vicky-katara · 2015-12-04T23:02:40Z

@jdm, during the execution of the test case automation suite, these strange files get created at some step. We were not able to decipher where or why they get created. However, I have deleted them and have added the entry to the ".gitignore" file to make sure that they never get pushed again. I had added this entry to .gitignore in the commit before the one which had the issue, however I made the commit while inside the /src/ directory, hence ignoring the .gitignore file.

I hope everything looks good now.

vicky-katara · 2015-12-04T23:18:08Z

@mbrubeck , @eefriedman, @jdm while we have pushed out code in this commit, we are yet to integrate the code for our steps L1 and N0 into lib.rs. Here are some of the points that I need help on, to proceed with this integration:

I need to know how to give input to the resolve_bracket_pairs() function in bracket_pairs.rs.

(a) Where will I find the indexes[] array containing character codes for the characters in the input string?

(b) Where will I find the 'sos' and 'level' variables holding the start of sequence and the embedding direction level for the given run?

(c) How do I deliver output codes back for the next step to proceed. For this I need to pass the 'codes' array which holds the output codes for the brackets.
Apart from these, we need information on integration of our code.

(a) Should the code for the new steps be part of lib.rs or can they appear in a different file and be imported into lib.rs?

(b) Should we keep the rust modules for the two steps L1 and N0 in different rust source code files or the in the same one,(in the case we that we are not integrating them into lib.rs)?

jdm · 2015-12-04T23:24:10Z

For the record, @mbrubeck is unfortunately on vacation today and this weekend :/

eefriedman · 2015-12-05T00:24:33Z

(a) Where will I find the indexes[] array containing character codes for the characters in the input string?

str has a few related methods for extracting characters; see https://doc.rust-lang.org/nightly/std/primitive.str.html#method.chars and https://doc.rust-lang.org/nightly/std/primitive.str.html#method.char_indices . For performance reasons, it's usually best to stick with the &str representation of a string rather than converting it to a &[char].

(a) Should the code for the new steps be part of lib.rs or can they appear in a different file and be imported into lib.rs?

Separate the code into different files however you feel is best. Putting the code for a complicated rule with a single entry point into a separate file seems like a good idea.

(b) Should we keep the rust modules for the two steps L1 and N0 in different rust source code files or the in the same one,(in the case we that we are not integrating them into lib.rs)?

Separating them seems fine.

(c) How do I deliver output codes back for the next step to proceed. For this I need to pass the 'codes' array which holds the output codes for the brackets.

At a high level, step N0 is similar to resolve_weak: you're taking the array of BIDI classes, and resolving some of them, so the entry point probably looks similar.

(b) Where will I find the 'sos' and 'level' variables holding the start of sequence and the embedding direction level for the given run?

See IsolatingRunSequence.

…s. Also removed the .pyc files and added them to .gitignore

vicky-katara · 2015-12-07T10:02:33Z

Thanks, @eefriedman. We'll get the changes in as soon as possible!

samruds1 · 2015-12-12T05:32:39Z

@eefriedman I have some doubts too. Could you please answer them.

For the test cases related to BidiTest.txt:

We have added test cases for the method process_text(). The test cases are of the same format as the existing test cases:

assert_eq!(process_text("abc123", Some(0)), BidiInfo {
levels: vec![0, 0, 0, 0, 0, 0],
classes: vec![L, L, L, EN, EN, EN],
paragraphs: vec![ParagraphInfo { range: 0..6, level: 0 }],
});

a) Is the following test case correct?

assert_eq!(process_text("\u{FA5E}\u{1F102}\u{FF0B}\u{20B0}\u{FE55}\u{E017E}\u{001C}\u{001F}\u{2000}\u{1015A}\u{2066}\u{2067}\u{2068}\u{2069}", Some(0)), BidiInfo { levels: vec![ 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3], classes: vec![L, EN, ES, ET, CS, NSM, B, S, WS, ON, LRI, RLI, FSI, PDI], paragraphs: vec![ParagraphInfo { range: 0..14, level: 0 } ], });

b) As per my understanding the value in the Some field in the given test should be the values of the bitset (para levels). Is that correct?

c) While inserting the test cases in lib.rs can we remove Bidi Classes which correspond to levels with the value x.

for example, can we ignore this test case?

@levels: x
@reorder:
LRE; 7
LRO; 7
RLE; 7

or in the following example, can we remove LRE, LRO, RLE from these test cases?

@levels: 0 x
@reorder: 0
L LRE; 3
L LRO; 3
L RLE; 3

For the implementation of step L1 can a separate function be created (resolve_white_space) instead of inserting the code in the visual_runs method?

vicky-katara · 2015-12-14T03:34:39Z

@eefriedman here is the test case from BidiCharacterTest.txt:

05D0 0028 05D1 0061 2680 0028 005B 0029 005D;0;0;1 1 1 0 0 0 0 0 0;2 1 0 3 4 5 6 7 8

eefriedman · 2015-12-14T04:39:27Z

Are you sure you're converting the testcase correctly? You can end up with either your left or your right result depending on the default paragraph embedding level (an LTR context versus an RTL context).

vicky-katara · 2015-12-14T04:44:06Z

I am taking the index values given in the test case to form the result.

Here is the description of a test case:
-# Field 0: A sequence of hexadecimal code point values separated by space
-# Field 1: A value representing the paragraph direction, as follows:
-# 0 represents left-to-right
-# 1 represents right-to-left
-# 2 represents auto-LTR according to rules P2 and P3 of the algorithm
-# Field 2: The resolved paragraph embedding level
-# Field 3: A list of resolved levels; characters removed in rule X9 are
-# indicated with an 'x'
-# Field 4: A list of indices showing the resulting visual ordering from
-# left to right; characters with a resolved level of 'x' are skipped

I am using Field 0 as input, and taking indexing into Field 0 using Field 4 as output indices. Do you think there is something wrong with this approach? The LTR/RTL context should be taken care of by the output, right?

eefriedman · 2015-12-14T04:53:29Z

The "0" in Field 1 represents left-to-right, which means that you should apply HL1 and explicitly set the paragraph embedding level to zero.

vicky-katara · 2015-12-14T04:55:27Z

Does this mean that I take the output produced by using the indexes of Field 0, and reverse the string?

eefriedman · 2015-12-14T05:17:21Z

process_text has two inputs; the first is the string, the second is the paragraph embedding level.

vicky-katara · 2015-12-14T05:28:19Z

@eefriedman, I noticed that as well. Thanks for the help. I'll try with this and get back to you!

vicky-katara · 2015-12-14T05:32:41Z

I am now getting this error:

thread 'test::test_reorder_line' panicked at 'index 2 and/or 3 in אב(גד[&ef].)gh do not lie on character boundary', ../src/libcore/str/mod.rs:1284

This is the test case: assert_eq!(reorder_with_para_level("\u{05D0}\u{05D1}\u{0028}\u{05D2}\u{05D3}\u{005B}\u{0026}\u{0065}\u{0066}\u{005D}\u{002E}\u{0029}\u{0067}\u{0068}", Some(0)),"\u{05D1}\u{05D0}\u{0028}\u{05D3}\u{05D2}\u{005B}\u{0026}\u{0065}\u{0066}\u{005D}\u{002E}\u{0029}\u{0067}\u{0068}");//BidiCharacterTest.txt Line Number:40

Here is the input:

05D0 05D1 0028 05D2 05D3 005B 0026 0065 0066 005D 002E 0029 0067 0068;0;0;1 1 0 1 1 0 0 0 0 0 0 0 0 0;1 0 2 4 3 5 6 7 8 9 10 11 12 13

Do you know why this fails?

eefriedman · 2015-12-14T05:39:27Z

A "do not lie on character boundary" panic generally means you're using string indexing incorrectly; e.g. let x = &"⚀"[1..];. (See also https://doc.rust-lang.org/nightly/book/strings.html#slicing .) Try getting a backtrace to figure out where it's happening.

vicky-katara · 2015-12-14T06:01:23Z

I am not able to make sense of this backtrace. Is there any indicator of the point of failure here?

thread 'test::test_reorder_line' panicked at 'index 2 and/or 3 in אב(גד[&ef].)gh do not lie on character boundary', ../src/libcore/str/mod.rs:1284

stack backtrace:
1: 0x564cc1e817fe - sys::backtrace::write::haf6e4e635ac76143Ivs
2: 0x564cc1e84f25 - panicking::on_panic::ha085a58a08f78856lzx
3: 0x564cc1e7885e - rt::unwind::begin_unwind_inner::hc90ee27246f12475C0w
4: 0x564cc1e791f6 - rt::unwind::begin_unwind_fmt::ha4be06289e0df3dbIZw
5: 0x564cc1e84b86 - rust_begin_unwind
6: 0x564cc1eb3aa4 - panicking::panic_fmt::he7875691f9cbe589SgC
7: 0x564cc1eb624f - str::slice_error_fail::hbe442ed6cb13a72dsqK
8: 0x564cc1e0688a - str::traits::str.ops..Index<ops..Range>::index::h909793bab573faecccK
at ../src/libcore/str/mod.rs:1104
9: 0x564cc1e0e9c0 - reorder_line::h180dad6937165a51e6b
at /home/servo/unicode-bidi/src/lib.rs:176
10: 0x564cc1e32aae - test::test_reorder_line::reorder_with_para_level::ha84ac2c49163f9500Id
at /home/servo/unicode-bidi/src/lib.rs:1237
11: 0x564cc1e31e66 - test::test_reorder_line::h00b48c9c08476d87iId
at /home/servo/unicode-bidi/tools/:1262
12: 0x564cc1e5611b - boxed::F.FnBox::call_box::h13096251856404813500
13: 0x564cc1e58de1 - boxed::F.FnBox::call_box::h17352075482666389094
14: 0x564cc1e568c9 - rt::unwind::try::try_fn::h2399095994474881594
15: 0x564cc1e84add - __rust_try
16: 0x564cc1e7fe37 - rt::unwind::try::inner_try::h59523aa853a0e10avWw
17: 0x564cc1e56ae8 - boxed::F.FnBox::call_box::h389096937566750418
18: 0x564cc1e83c81 - sys::thread::Thread::new::thread_start::h890d1188505773c835v
19: 0x7fe317fc7554 - start_thread
20: 0x7fe3178e3b9c - __clone
21: 0x0 -

failures:
test::test_reorder_line

test result: FAILED. 11 passed; 1 failed; 0 ignored; 0 measured

thread '

' panicked at 'Some tests failed', ../src/libtest/lib.rs:252
stack backtrace:
1: 0x564cc1e817fe - sys::backtrace::write::haf6e4e635ac76143Ivs
2: 0x564cc1e851b6 - panicking::on_panic::ha085a58a08f78856lzx
3: 0x564cc1e7885e - rt::unwind::begin_unwind_inner::hc90ee27246f12475C0w
4: 0x564cc1e35461 - rt::unwind::begin_unwind::h218749302018745608
5: 0x564cc1e36e1c - test_main::h6a499577710d847aI1a
6: 0x564cc1e3d6d2 - test_main_static::hb27df092379fd7dbd4a
7: 0x564cc1e3404c - __test::main::hfd578a788dc6470cDee
8: 0x564cc1e84add - __rust_try
9: 0x564cc1e86e7a - rt::lang_start::hefba4015e797c325hux
10: 0x564cc1e341cb - main
11: 0x7fe3178016ff - __libc_start_main
12: 0x564cc1dfb5f8 - _start
13: 0x0 -

vicky-katara · 2015-12-14T06:44:35Z

@eefriedman, I have commented out the code that we had inserted. It seems that this error arises even with existing code. This tells me that the test case itself is going wrong somewhere. Are you sure this is the correct test case?

BidiCharacterTest line 40

05D0 05D1 0028 05D2 05D3 005B 0026 0065 0066 005D 002E 0029 0067 0068;0;0;1 1 0 1 1 0 0 0 0 0 0 0 0 0;1 0 2 4 3 5 6 7 8 9 10 11 12 13

assert_eq!(reorder_with_para_level("\u{05D0}\u{05D1}\u{0028}\u{05D2}\u{05D3}\u{005B}\u{0026}\u{0065}\u{0066}\u{005D}\u{002E}\u{0029}\u{0067}\u{0068}", Some(0)),"\u{05D1}\u{05D0}\u{0028}\u{05D3}\u{05D2}\u{005B}\u{0026}\u{0065}\u{0066}\u{005D}\u{002E}\u{0029}\u{0067}\u{0068}");//BidiCharacterTest.txt Line Number:40

I have made the following function to test reorder line while specifying the paragraph level input:

fn reorder_with_para_level(s: &str, level: Option) -> Cow {
let info = process_text(s, level);
let para = &info.paragraphs[0];
reorder_line(s, para.range.clone(), &info.levels)
}

eefriedman · 2015-12-14T08:25:09Z

It looks like it's crashing at

unicode-bidi/src/lib.rs

Line 169 in 2a99bca

result.push_str(&text[run]);

... and it looks like that range comes from

unicode-bidi/src/lib.rs

Line 204 in 2a99bca

runs.push(start..i);

. I'm guessing we're somehow violating the constraint that every byte of a character has the same level?

Anyway, I don't think your test should be triggering this; I'm guessing it's a bug in the code.

vicky-katara · 2015-12-14T08:29:22Z

Thanks for looking into it.

How should we go about this now?

We've implemented all the steps and added code to pull test cases from BidiTest.txt and BidiCharacterTest.txt

mbrubeck · 2015-12-14T16:08:51Z

@vicky-katara For now, you can temporarily comment out or delete the failing test(s), or add code to your Python script to do this (maybe using a list of line numbers that are known to fail). File a new issue in GitHub about the failure so that we will come back and fix it later.

vicky-katara · 2015-12-14T16:12:00Z

@mbrubeck, there are more than a 100,000 test cases in total, from BidiCharacterTest.txt and BidiTest.txt. I don't think finding a list of failing test cases is viable. Will it be fine if we include a list of passing test cases, with comments describing how to add all test cases back to lib.rs (for example by uncommenting code which adds test cases)?

mbrubeck · 2015-12-14T17:57:35Z

Will it be fine if we include a list of passing test cases, with comments describing how to add all test cases back to lib.rs (for example by uncommenting code which adds test cases)?

Yes, that sounds good.

vicky-katara · 2015-12-14T21:36:16Z

@eefriedman @mbrubeck

We're getting failed test cases for test cases like this from BidiTest.txt:

@ Levels: x 4
@ Reorder: 1
LRE AN; 7
RLE L; 4
RLE EN; 4
RLE AN; 4

#Count: 4

We feel we built the test case correctly. Could you help us convert this test case into a rust assert test case of this form:

This is what we have:

assert_eq!(process_text("\u{0669}\u{11F2}\u{1D7D6}\u{10E74}", Some(1)),
BidiInfo { levels: vec![ 7, 4, 4, 4], classes: vec![AN, L, EN, AN],
paragraphs: vec![ParagraphInfo { range: 0..4, level: 4 } ], });

vicky-katara · 2015-12-15T11:49:28Z

@mbrubeck @eefriedman @jdm

Here is what we have accomplished:

add code to tools/generate.py that automatically converts the tests in the conformance suite into Rust tests that can be run automatically
- Added code to [Code present in bidiCharacterTestParser.py] delete previously added automated test cases (to prevent creating multiple copies of test cases)
- Added code to pull test cases from BidiCharacterTest.txt [Code present in bidiCharacterTestParser.py]. The function call is commented because of the issue with process text.
- Added code to pull test cases from BidiTest.txt [Code present in bidiTest.py]. These test cases do not match with the output of process_text. We were not able to get them to match. The function call for this has been commented as well.
implement the missing step L1 from the UBA
- Added algorithm step L1 [Code added to lib.rs]. Partial testing complete.
implement the missing step N0 from the UBA
- Added algorithm step N0 [Code added to lib.rs in 'mod bracket_pair_resolver']. Partial testing complete.
solve the conformance problems related to the implementation of steps W1 to W7
- modified algorithm steps for W1~W7 [Existing code modified]. Partial testing complete.

Additionally:

*Identified defect with process_text [https://github.com//issues/22].

mbrubeck · 2015-12-15T19:07:29Z

Great work, thanks! I haven't had time to review all of this code yet, but I look forward to getting it merged.

mbrubeck · 2016-01-05T01:35:28Z

Sorry I haven't reviewed this PR yet; I'm just back from vacation and will be working on it this week. Since there's a lot of code here split across a few dozen commits, I think I'll try to split it into a few separate PRs that can be reviewed landed separately. If I find code that needs to be changed, I'm happy to apply the changes myself (and have another Servo developer review my changes as needed). Or if any of you have the time and inclination to keep working on this code, that would be great too—just let me know.

bors-servo · 2017-02-08T01:23:41Z

☔ The latest upstream changes (presumably #25) made this pull request unmergeable. Please resolve the merge conflicts.

behnam · 2017-05-15T20:17:23Z

I think we can close this PR now that we have another approach for having conformance tests (as integration tests) in #30 . There's still more work needed (including adding the char-dependent tests rules), but they should be done in a separate PR, IMHO.

tarunchhabra26 and others added 15 commits November 27, 2015 19:41

Made changes to lib.rs. Added separate passes for W1-W7

6823cea

Revert "Made changes to lib.rs. Added separate passes for W1-W7"

49b55ff

This reverts commit 6823cea.

Added comment markers which will mark insertion of test cases in lib.rs

345f104

Added code to parse test cases from BidiCharacterTest.txt and put the…

a838c81

…m in lib.rs

Made changes for W1-W7, by preserving previous state of prev_state. C…

3c04697

…argo test passing

Merge branch 'master' of https://github.com/samrudhisharma/unicode-bidi

b56ad6d

All conflicts merged

Added previously inserted test case removal logic

e920787

Added comment to lib.rs to not remove marker comments

f55e721

Changes added to generate.py to integrate the test case addition and …

72cc2e6

…deletion

Clean version of lib.rs

6d7d434

Clean version of lib.rs

52c2540

BidiTestParser.py added

2e2c403

Automated Test Cases inserted

d971d8b

Removed Temp files

2da8999

Removed BidiTestReduced.txt

d2eb304

root and others added 3 commits December 3, 2015 20:06

Formatted bracket_pairs.rs and Integrated brackets.rs

6b7ef82

Made changes to .gitignore

2e6d569

L1 step added. Some tests fail

711b3a8

Modified .gitignore to remove the tmp* files

6948afd

vicky-katara added 2 commits December 4, 2015 19:43

Converted bracket_pairs.rs into a mod ready to be imported into lib.r…

6314638

…s. Also removed the .pyc files and added them to .gitignore

Comments and changes

9ee4bbf

Adjusted Indentation of resolve weak

ff1c8c5

Added paragraph level input to test cases from BidiCharacterTest.txt

da18f09

vicky-katara added 3 commits December 14, 2015 17:25

Made formatting changes and added comments

ab5f48b

Removed Failing Test Case

34e5a71

Attempted duplicate elimination!

a78e0d5

vicky-katara added 2 commits December 15, 2015 10:17

Corrected test case file

77611d0

Removed BidiTestRed.txt

a1b5d84

add isolate formatting chars to white_space

7bed4e6

jdm closed this May 15, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Conformance Rust tests added #20

Conformance Rust tests added #20

samruds1 commented Dec 3, 2015

samruds1 commented Dec 3, 2015

samruds1 commented Dec 4, 2015

vicky-katara commented Dec 4, 2015

vicky-katara commented Dec 4, 2015

jdm commented Dec 4, 2015

eefriedman commented Dec 5, 2015

vicky-katara commented Dec 7, 2015

samruds1 commented Dec 12, 2015

vicky-katara commented Dec 14, 2015

eefriedman commented Dec 14, 2015

vicky-katara commented Dec 14, 2015

eefriedman commented Dec 14, 2015

vicky-katara commented Dec 14, 2015

eefriedman commented Dec 14, 2015

vicky-katara commented Dec 14, 2015

vicky-katara commented Dec 14, 2015

eefriedman commented Dec 14, 2015

vicky-katara commented Dec 14, 2015

vicky-katara commented Dec 14, 2015

eefriedman commented Dec 14, 2015

vicky-katara commented Dec 14, 2015

mbrubeck commented Dec 14, 2015

vicky-katara commented Dec 14, 2015

mbrubeck commented Dec 14, 2015

vicky-katara commented Dec 14, 2015

vicky-katara commented Dec 15, 2015

mbrubeck commented Dec 15, 2015

mbrubeck commented Jan 5, 2016

bors-servo commented Feb 8, 2017

behnam commented May 15, 2017

Conformance Rust tests added #20

Conformance Rust tests added #20

Conversation

samruds1 commented Dec 3, 2015

samruds1 commented Dec 3, 2015

samruds1 commented Dec 4, 2015

vicky-katara commented Dec 4, 2015

vicky-katara commented Dec 4, 2015

jdm commented Dec 4, 2015

eefriedman commented Dec 5, 2015

vicky-katara commented Dec 7, 2015

samruds1 commented Dec 12, 2015

vicky-katara commented Dec 14, 2015

eefriedman commented Dec 14, 2015

vicky-katara commented Dec 14, 2015

eefriedman commented Dec 14, 2015

vicky-katara commented Dec 14, 2015

eefriedman commented Dec 14, 2015

vicky-katara commented Dec 14, 2015

vicky-katara commented Dec 14, 2015

eefriedman commented Dec 14, 2015

vicky-katara commented Dec 14, 2015

vicky-katara commented Dec 14, 2015

BidiCharacterTest line 40

eefriedman commented Dec 14, 2015

vicky-katara commented Dec 14, 2015

mbrubeck commented Dec 14, 2015

vicky-katara commented Dec 14, 2015

mbrubeck commented Dec 14, 2015

vicky-katara commented Dec 14, 2015

vicky-katara commented Dec 15, 2015

mbrubeck commented Dec 15, 2015

mbrubeck commented Jan 5, 2016

bors-servo commented Feb 8, 2017

behnam commented May 15, 2017