diff --git a/spec/Appendix A -- Notation Conventions.md b/spec/Appendix A -- Notation Conventions.md
index cbb8e8a3a..d21f62fba 100644
--- a/spec/Appendix A -- Notation Conventions.md
+++ b/spec/Appendix A -- Notation Conventions.md
@@ -48,10 +48,12 @@ ListOfLetterA :
 
 The GraphQL language is defined in a syntactic grammar where terminal symbols
 are tokens. Tokens are defined in a lexical grammar which matches patterns of
-source characters. The result of parsing a sequence of source Unicode characters
-produces a GraphQL AST.
+source characters. The result of parsing a source text sequence of Unicode
+characters first produces a sequence of lexical tokens according to the lexical
+grammar which then produces an abstract syntax tree (AST) according to the
+syntactical grammar.
 
-A Lexical grammar production describes non-terminal "tokens" by
+A lexical grammar production describes non-terminal "tokens" by
 patterns of terminal Unicode characters. No "whitespace" or other ignored
 characters may appear between any terminal Unicode characters in the lexical
 grammar production. A lexical grammar production is distinguished by a two colon
diff --git a/spec/Appendix B -- Grammar Summary.md b/spec/Appendix B -- Grammar Summary.md
index efdcae8f8..cd1f629be 100644
--- a/spec/Appendix B -- Grammar Summary.md
+++ b/spec/Appendix B -- Grammar Summary.md
@@ -1,5 +1,10 @@
 # B. Appendix: Grammar Summary
 
+The source text of a GraphQL document must be a sequence of {SourceCharacter}.
+The character sequence must be described by a sequence of {Token} and {Ignored}
+lexical grammars. The lexical token sequence, omitting {Ignored}, must be
+described by a single {Document} syntactical grammar.
+
 SourceCharacter :: /[\u0009\u000A\u000D\u0020-\uFFFF]/
 
 
diff --git a/spec/Section 2 -- Language.md b/spec/Section 2 -- Language.md
index bf31b2db7..2ae62893a 100644
--- a/spec/Section 2 -- Language.md
+++ b/spec/Section 2 -- Language.md
@@ -7,11 +7,13 @@ common unit of composition allowing for query reuse.
 
 A GraphQL document is defined as a syntactic grammar where terminal symbols are
 tokens (indivisible lexical units). These tokens are defined in a lexical
-grammar which matches patterns of source characters (defined by a
-double-colon `::`).
+grammar which matches patterns of source characters. In this document, syntactic
+grammar productions are distinguished with a colon `:` while lexical grammar
+productions are distinguished with a double-colon `::`.
 
-Note: See [Appendix A](#sec-Appendix-Notation-Conventions) for more details about the definition of lexical and syntactic grammar and other notational conventions
-used in this document.
+Note: See [Appendix A](#sec-Appendix-Notation-Conventions) for more information
+about the lexical and syntactic grammar and other notational conventions used
+throughout this document.
 
 ## Source Text
 
@@ -25,6 +27,19 @@ ASCII range so as to be as widely compatible with as many existing tools,
 languages, and serialization formats as possible and avoid display issues in
 text editors and source control.
 
+**Greedy Lexical Parsing**
+
+The source text of a GraphQL document is first converted into a sequence of
+lexical tokens, {Token}, and ignored tokens, {Ignored}. The source text is
+scanned from left to right, repeatedly taking the longest possible sequence of
+Unicode characters as the next token.
+
+For example, the sequence `123` is always interpreted as a single {IntValue},
+and `""""""` is always interpreted as a single block {StringValue}.
+
+This sequence of lexical tokens is then scanned from left to right to produce
+an abstract syntax tree (AST) according to the {Document} syntactical grammar.
+
 
 ### Unicode
 
@@ -118,8 +133,7 @@ Token ::
 A GraphQL document is comprised of several kinds of indivisible lexical tokens
 defined here in a lexical grammar by patterns of source Unicode characters.
 
-Tokens are later used as terminal symbols in a GraphQL Document
-syntactic grammars.
+Tokens are later used as terminal symbols in GraphQL syntactic grammar rules.
 
 ### Ignored Tokens
 
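
The "Greedy Lexical Parsing" paragraph added to Section 2 describes a longest-match (maximal munch) scan. The sketch below illustrates that rule only; it is not the reference lexer. The `TOKEN_PATTERNS` table and the `tokenize` helper are hypothetical names, and the regular expressions cover a deliberately simplified subset of the lexical grammar (no floats, comments, escapes in block strings, or lookahead restrictions).

```python
import re

# Hypothetical, simplified subset of the lexical grammar (an assumption for
# illustration; the real grammar has more productions and restrictions).
TOKEN_PATTERNS = [
    ("Ignored", re.compile(r"[ \t\r\n,]+")),
    ("Name", re.compile(r"[_A-Za-z][_0-9A-Za-z]*")),
    ("IntValue", re.compile(r"-?(?:0|[1-9][0-9]*)")),
    ("BlockString", re.compile(r'"""[\s\S]*?"""')),
    ("String", re.compile(r'"(?:[^"\\\n]|\\.)*"')),
    ("Punctuator", re.compile(r"\.\.\.|[!$&(){}:=@\[\]|]")),
]


def tokenize(source):
    """Scan left to right, taking the longest match at each position."""
    tokens = []
    pos = 0
    while pos < len(source):
        best_kind, best_text = None, ""
        # Try every production at the current offset and keep the longest.
        for kind, pattern in TOKEN_PATTERNS:
            match = pattern.match(source, pos)
            if match and len(match.group()) > len(best_text):
                best_kind, best_text = kind, match.group()
        if best_kind is None:
            raise SyntaxError(
                f"Unexpected character at offset {pos}: {source[pos]!r}"
            )
        # Ignored tokens advance the scan but are omitted from the output.
        if best_kind != "Ignored":
            tokens.append((best_kind, best_text))
        pos += len(best_text)
    return tokens


print(tokenize("123"))
print(tokenize('""""""'))
```

Under these assumptions, `123` yields one `IntValue` token rather than three, and `""""""` yields one `BlockString` token rather than three empty `String` tokens, because the six-character block-string match is longer than the two-character `""` match at the same offset.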