Add section about merge[] and separate[] to the docs.
Partially addresses #55.
tzlaine committed Jan 14, 2024
1 parent 6fc7eef commit d17268d
Showing 3 changed files with 84 additions and 2 deletions.
1 change: 1 addition & 0 deletions doc/parser.qbk
@@ -192,6 +192,7 @@

[def _p_api_ [link boost_parser__proposed_.tutorial.the__parse____api the `parse()` API]]
[def _rule_parsers_ [link boost_parser__proposed_.tutorial.rule_parsers Rule Parsers]]
[def _parsing_structs_ [link boost_parser__proposed_.tutorial.parsing__struct_s Parsing `struct`s]]
[def _expect_pts_ [link boost_parser__proposed_.tutorial.backtracking.html#boost_parser__proposed_.tutorial.backtracking.expectation_points Expectation points]]
[def _attr_gen_ [link boost_parser__proposed_.tutorial.attribute_generation Attribute Generation]]
[def _directives_ [link boost_parser__proposed_.tutorial.directives Directives]]
84 changes: 83 additions & 1 deletion doc/tutorial.qbk
@@ -1912,7 +1912,89 @@ same attribute generation rules.
[[`p1 | p2[a] | p3`] [`std::optional<std::variant<_ATTR_np_(p1), _ATTR_np_(p3)>>`]]
]

[heading Directives that affect attribute generation]
[heading Controlling attribute generation with _merge_ and _sep_]

As we saw in the previous _parsing_structs_ section, if you parse two strings
in a row, you get two separate strings in the resulting attribute. The parser
from that example was this:

    namespace bp = boost::parser;
    auto employee_parser = bp::lit("employee")
        >> '{'
        >> bp::int_ >> ','
        >> quoted_string >> ','
        >> quoted_string >> ','
        >> bp::double_
        >> '}';

`employee_parser`'s attribute is `_bp_tup_<int, std::string, std::string,
double>`. The two `quoted_string` parsers produce `std::string` attributes,
and those attributes are not combined. That is the default behavior, and it
is just what we want for this case; we don't want the first and last name
fields to be jammed together such that we can't tell where one name ends and
the other begins. What if we were parsing some string that consisted of a
prefix and a suffix, and the prefix and suffix were defined separately for
reuse elsewhere?

    namespace bp = boost::parser;
    auto prefix = /* ... */;
    auto suffix = /* ... */;
    auto special_string = prefix >> suffix;
    // Continue to use prefix and suffix to make other parsers....

In this case, we might want to use these separate parsers, but want
`special_string` to produce a single `std::string` for its attribute. _merge_
exists for this purpose.

    namespace bp = boost::parser;
    auto prefix = /* ... */;
    auto suffix = /* ... */;
    auto special_string = bp::merge[prefix >> suffix];

_merge_ only applies to sequence parsers (like `p1 >> p2`), and forces all
subparsers in the sequence parser to use the same variable for their
attribute.
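
To make the "same variable" idea concrete, here is a small library-independent
sketch. It does not use Boost.Parser at all; `match_literal`,
`parse_separately`, and `parse_merged` are hypothetical names, and the literals
`"un"` and `"do"` are stand-ins for whatever `prefix` and `suffix` match. It
only models where each subparser writes its result, not how the library
implements _merge_.

```cpp
#include <string>
#include <string_view>
#include <utility>

// Hypothetical stand-in for a subparser (not part of Boost.Parser): consumes
// `expected` from the front of `in` and appends the matched text to `out`.
inline bool match_literal(
    std::string_view & in, std::string_view expected, std::string & out)
{
    if (in.substr(0, expected.size()) != expected)
        return false;
    out.append(expected);
    in.remove_prefix(expected.size());
    return true;
}

// Models the default `prefix >> suffix`: each subparser writes to its own
// variable, so the result is two separate strings.
inline bool parse_separately(
    std::string_view in, std::pair<std::string, std::string> & attr)
{
    return match_literal(in, "un", attr.first) &&
           match_literal(in, "do", attr.second) && in.empty();
}

// Models `merge[prefix >> suffix]`: both subparsers write to the same
// variable, so the result is one string.
inline bool parse_merged(std::string_view in, std::string & attr)
{
    return match_literal(in, "un", attr) && match_literal(in, "do", attr) &&
           in.empty();
}
```

In this model, parsing `"undo"` with `parse_separately` fills the pair with
`"un"` and `"do"`, while `parse_merged` fills a single string with `"undo"`.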

Another directive, _sep_, also applies only to sequence parsers, but does the
opposite of _merge_. It forces all the attributes produced by the subparsers
of the sequence parser to stay separate, even if they would have combined.
For instance, consider this parser.

    namespace bp = boost::parser;
    auto string_and_char = +bp::char_('a') >> ' ' >> bp::cp;

`string_and_char` matches one or more `'a'`s, followed by a space and then
some other character. As written above, `string_and_char` produces a
`std::string`, and the final character is appended to the string, after all
the `'a'`s. However, if you wanted to store the final character as a separate
value, you would use _sep_.

    namespace bp = boost::parser;
    auto string_and_char = bp::separate[+bp::char_('a') >> ' ' >> bp::cp];

With this change, `string_and_char` produces the attribute
`_bp_tup_<std::string, char32_t>`.
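
The difference between the two versions of `string_and_char` can be modeled
without the library. The sketch below is an illustration only: the function
names are hypothetical, it handles plain `char` input rather than the
`char32_t` code points that `bp::cp` produces, and it says nothing about how
Boost.Parser implements _sep_.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical helper (not part of Boost.Parser): matches one or more 'a's,
// then a space, then exactly one final character. On success, stores the run
// of 'a's in `run` and the final character in `last`.
inline bool match_run_and_char(
    std::string_view in, std::string & run, char & last)
{
    std::size_t n = 0;
    while (n < in.size() && in[n] == 'a')
        ++n;
    if (n == 0 || in.size() != n + 2 || in[n] != ' ')
        return false;
    run.assign(in.substr(0, n));
    last = in[n + 1];
    return true;
}

// Models the default sequence: the final character is appended to the same
// string that holds the 'a's.
inline bool parse_combined(std::string_view in, std::string & attr)
{
    char last{};
    if (!match_run_and_char(in, attr, last))
        return false;
    attr.push_back(last);
    return true;
}

// Models the `separate[]` version: the string and the final character are
// kept as two distinct values.
inline bool parse_kept_separate(
    std::string_view in, std::pair<std::string, char> & attr)
{
    return match_run_and_char(in, attr.first, attr.second);
}
```

For the input `"aaa b"`, `parse_combined` yields the single string `"aaab"`,
while `parse_kept_separate` yields `"aaa"` and `'b'` as separate values.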

[heading _merge_ and _sep_ in more detail]

As mentioned previously, _merge_ applies only to sequence parsers. All
subparsers must have the same attribute, or produce no attribute at all. At
least one subparser must produce an attribute. When you use _merge_, you
create a /combining group/. Every parser in a combining group uses the same
variable for its attribute. No parser in a combining group interacts with the
attributes of any parsers outside of its combining group. Combining groups
are disjoint; `merge[/*...*/] >> merge[/*...*/]` will produce a tuple of two
attributes, not one.

_sep_ also applies only to sequence parsers. When you use _sep_, you disable
interaction of all the subparsers' attributes with adjacent attributes,
whether they are inside or outside the _sep_ directive; you force each
subparser to have a separate attribute.

The rules for _merge_ and _sep_ overrule the steps of the algorithm described
above for combining the attributes of a sequence parser.

[heading Other directives that affect attribute generation]

`_omit_np_[p]` disables attribute generation for the parser `p`.
`_raw_np_[p]` changes the attribute from `_ATTR_np_(p)` to a view that
1 change: 0 additions & 1 deletion test/merge_separate.cpp
@@ -42,7 +42,6 @@ TEST(merge_separate, merge_)
EXPECT_EQ(*result, detail::hl::make_tuple('a', 'c', 'd'));
}
}
// TODO: Don't forget to document that merge[eps >> eps] is ill-formed.
{
constexpr auto parser = char_ >> merge[eps >> char_ >> char_] >> char_;
