Add utf-8 validation for input source #2374

Merged
merged 1 commit into from
Jul 6, 2023


tamaroning
Contributor

@tamaroning tamaroning commented Jul 4, 2023

Addresses #2287

gcc/rust/ChangeLog:

	* lex/rust-lex.cc (Lexer::input_source_is_valid_utf8): New method of `Lexer`.
	* lex/rust-lex.h: Likewise.
	* rust-session-manager.cc (Session::compile_crate): Add error.

gcc/testsuite/ChangeLog:

	* rust/compile/broken_utf8.rs: New test.

Comment on lines 221 to 225
 {
-  uint8_t input = next_byte ();
+  uint32_t input = next_byte ();
 
-  if ((int8_t) input == EOF)
+  if ((int32_t) input == EOF)
     return Codepoint::eof ();
Contributor Author

Changed `input` from `uint8_t` to `uint32_t` so that a 0xff byte can be distinguished from EOF.

Contributor Author

just a bugfix
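
A minimal sketch of why the narrower type could not tell 0xff from EOF. The helper names `is_eof_8bit` and `is_eof_32bit` are hypothetical and not part of the patch, and the sketch assumes `EOF` is -1 on a two's-complement target, as it is with common C libraries:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Assumption: EOF is -1 (the C standard only requires a negative int).
static_assert (EOF == -1, "this sketch assumes EOF == -1");

// Hypothetical helper: the old check, with the value squeezed into 8 bits.
bool
is_eof_8bit (int byte_or_eof)
{
  uint8_t input = static_cast<uint8_t> (byte_or_eof);
  // 0xff reinterpreted as int8_t is -1, which compares equal to EOF.
  return static_cast<int8_t> (input) == EOF;
}

// Hypothetical helper: the new check, with the value kept in 32 bits.
bool
is_eof_32bit (int byte_or_eof)
{
  uint32_t input = static_cast<uint32_t> (byte_or_eof);
  // 0xff stays 255 here; only 0xffffffff (EOF) becomes -1.
  return static_cast<int32_t> (input) == EOF;
}

// Small self-check of the behaviour described in the comment above.
void
check_eof_detection ()
{
  assert (is_eof_8bit (EOF));
  assert (is_eof_8bit (0xff)); // false positive: a data byte looks like EOF
  assert (is_eof_32bit (EOF));
  assert (!is_eof_32bit (0xff));
}
```

With the 8-bit variable, a source file containing a 0xff byte would appear to end at that byte; the 32-bit variable keeps the two values apart.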

Comment on lines 364 to 368
 {
   if (offs >= buffer.size ())
     return EOF;
 
-  return buffer.at (offs++);
+  return (uint8_t) buffer.at (offs++);
 }
Contributor Author

Added a cast to prevent bytes whose MSB is 1 from being sign-extended.
Without the cast, 0xfe, for example, becomes 0xfffffffe.

Contributor Author

bugfix too
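
The sign-extension can be reproduced in isolation. This is only a sketch: `Buffer`, `read_plain` and `read_cast` are hypothetical names, and the buffer is assumed to hold `signed char` so that the behaviour does not depend on whether plain `char` is signed on the target:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: elements are signed bytes (plain `char` is signed on many targets).
using Buffer = std::vector<signed char>;

// Returns the element as-is; a byte with its MSB set is widened to a negative int.
int
read_plain (const Buffer &buffer, std::size_t offs)
{
  return buffer.at (offs);
}

// Casts to uint8_t first, so the widened value stays in the range 0..255.
int
read_cast (const Buffer &buffer, std::size_t offs)
{
  return static_cast<uint8_t> (buffer.at (offs));
}

// Small self-check using the 0xfe example from the comment above.
void
check_sign_extension ()
{
  Buffer buffer = {static_cast<signed char> (0xfe)};
  assert (static_cast<uint32_t> (read_plain (buffer, 0)) == 0xfffffffeu);
  assert (read_cast (buffer, 0) == 0xfe);
}
```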

Comment on lines +1 to +2
// { dg-excess-errors "stream did not contain valid UTF-8" }
Contributor Author

@tamaroning tamaroning Jul 4, 2023

The test file contains a raw 0xff byte on line 2.

Contributor Author

It is not ÿ (U+00FF), despite how it is displayed here.
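
To make the distinction concrete: in UTF-8, ÿ (U+00FF) is the two-byte sequence 0xC3 0xBF, whereas a lone 0xff byte never appears in valid UTF-8. A minimal encoder sketch shows this; `encode_utf8` is a hypothetical helper, not something from the patch, and it does not reject surrogates or out-of-range values:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: encodes one code point as UTF-8 (no input validation).
std::string
encode_utf8 (uint32_t cp)
{
  std::string out;
  if (cp < 0x80)
    {
      out += static_cast<char> (cp);
    }
  else if (cp < 0x800)
    {
      out += static_cast<char> (0xC0 | (cp >> 6));
      out += static_cast<char> (0x80 | (cp & 0x3F));
    }
  else if (cp < 0x10000)
    {
      out += static_cast<char> (0xE0 | (cp >> 12));
      out += static_cast<char> (0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char> (0x80 | (cp & 0x3F));
    }
  else
    {
      out += static_cast<char> (0xF0 | (cp >> 18));
      out += static_cast<char> (0x80 | ((cp >> 12) & 0x3F));
      out += static_cast<char> (0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char> (0x80 | (cp & 0x3F));
    }
  return out;
}

// Small self-check: U+00FF is two bytes, not the single byte 0xff.
void
check_u00ff_encoding ()
{
  assert (encode_utf8 (0xFF) == "\xC3\xBF");
  assert (encode_utf8 (0xFF) != "\xFF");
}
```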

Signed-off-by: Raiki Tamura <[email protected]>
@tamaroning tamaroning mentioned this pull request Jul 4, 2023
15 tasks
@philberty philberty requested review from CohenArthur, P-E-P and philberty and removed request for CohenArthur and philberty July 6, 2023 10:55
Member

@philberty philberty left a comment

LGTM

@philberty
Member

The only thing I might say is that it would be nice to have these constants named in some way, but UTF-8 stuff is so specific that I don't think it would really help.
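
As a rough illustration of what naming such constants could look like, here is a standalone UTF-8 validity check. It is a sketch only and not the code merged in this PR: `is_valid_utf8` and every constant name are hypothetical, and it takes a `std::string` rather than the lexer's input source:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical named constants for the UTF-8 bit patterns.
constexpr uint8_t ASCII_MAX = 0x7F;
constexpr uint8_t LEAD2_MASK = 0xE0, LEAD2_TAG = 0xC0, LEAD2_PAYLOAD = 0x1F;
constexpr uint8_t LEAD3_MASK = 0xF0, LEAD3_TAG = 0xE0, LEAD3_PAYLOAD = 0x0F;
constexpr uint8_t LEAD4_MASK = 0xF8, LEAD4_TAG = 0xF0, LEAD4_PAYLOAD = 0x07;
constexpr uint8_t CONT_MASK = 0xC0, CONT_TAG = 0x80, CONT_PAYLOAD = 0x3F;
constexpr uint32_t MIN_2BYTE = 0x80, MIN_3BYTE = 0x800, MIN_4BYTE = 0x10000;
constexpr uint32_t SURROGATE_FIRST = 0xD800, SURROGATE_LAST = 0xDFFF;
constexpr uint32_t MAX_CODEPOINT = 0x10FFFF;

// Returns true if `s` is well-formed UTF-8: no stray or truncated sequences,
// no overlong encodings, no surrogates, nothing above U+10FFFF.
bool
is_valid_utf8 (const std::string &s)
{
  std::size_t i = 0;
  const std::size_t n = s.size ();
  while (i < n)
    {
      const uint8_t lead = static_cast<uint8_t> (s[i]);
      if (lead <= ASCII_MAX)
        {
          i++;
          continue;
        }

      std::size_t len;
      uint32_t cp, min;
      if ((lead & LEAD2_MASK) == LEAD2_TAG)
        len = 2, cp = lead & LEAD2_PAYLOAD, min = MIN_2BYTE;
      else if ((lead & LEAD3_MASK) == LEAD3_TAG)
        len = 3, cp = lead & LEAD3_PAYLOAD, min = MIN_3BYTE;
      else if ((lead & LEAD4_MASK) == LEAD4_TAG)
        len = 4, cp = lead & LEAD4_PAYLOAD, min = MIN_4BYTE;
      else
        return false; // stray continuation byte, or 0xf8..0xff

      if (n - i < len)
        return false; // sequence is cut off by the end of input

      for (std::size_t k = 1; k < len; k++)
        {
          const uint8_t cont = static_cast<uint8_t> (s[i + k]);
          if ((cont & CONT_MASK) != CONT_TAG)
            return false;
          cp = (cp << 6) | (cont & CONT_PAYLOAD);
        }

      if (cp < min || cp > MAX_CODEPOINT
          || (cp >= SURROGATE_FIRST && cp <= SURROGATE_LAST))
        return false;

      i += len;
    }
  return true;
}

// Small self-check: ÿ encoded properly is accepted, a raw 0xff byte is not.
void
check_utf8_validation ()
{
  assert (is_valid_utf8 ("\xC3\xBF"));
  assert (!is_valid_utf8 ("\xFF"));
}
```

Whether the names read better than the bare hex values is a judgment call, which is the point made in the comment above.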

@philberty philberty added this pull request to the merge queue Jul 6, 2023
Merged via the queue into Rust-GCC:master with commit 46a61f0 Jul 6, 2023