Gauche has a built-in regular expression engine whose syntax is mostly a superset of POSIX extended regular expressions, plus some extensions from Perl 5 regexps.
A special syntax is provided for literal regular expressions. Regular expression objects are also applicable; that is, a regexp works like a procedure that matches the given string against itself. Combining these two features lets you write some string matching idioms compactly.
(find #/pattern/ list-of-strings) ⇒ the first string that matches, or #f
• Regular expression syntax
• Using regular expressions
• Inspecting and assembling regular expressions
#/regexp-spec/
#/regexp-spec/i
Denotes a literal regular expression object. When read, it becomes an instance of <regexp>.
If the letter ’i’ is given at the end, the created regexp becomes a case-folding regexp, i.e. it matches case-insensitively.
The advantage of using this syntax over string->regexp is that the regexp is compiled only once. You can use a literal regexp inside a loop without worrying about regexp compilation overhead. If you want to construct a regexp on the fly, however, use string->regexp.
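As a rough sketch of the trade-off described above, the following contrasts a literal regexp with one built at run time. The helper names numeric-strings and make-prefix-matcher are hypothetical, introduced only for this illustration; regexp-quote is the procedure described later in this section.

```scheme
;; A literal regexp is compiled once, when the source is read, so it is
;; cheap to use inside a loop or a higher-order call such as filter.
(define (numeric-strings strs)
  (filter #/^\d+$/ strs))

;; A pattern known only at run time has to go through string->regexp.
;; make-prefix-matcher is a hypothetical helper, not part of Gauche.
(define (make-prefix-matcher prefix)
  (string->regexp (string-append "^" (regexp-quote prefix))))

(numeric-strings '("12" "ab" "345"))      ; ⇒ ("12" "345")
((make-prefix-matcher "a.b") "a.b.c")     ; ⇒ a match object
((make-prefix-matcher "a.b") "aXb.c")     ; ⇒ #f, since "." was quoted
```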
Gauche’s built-in regexp syntax follows POSIX extended regular expressions, with a few extensions taken from Perl. (Scheme Regular Expressions (SRE) are also supported as an alternative syntax; see scheme.regex - R7RS regular expressions, for the details of SRE.)
re*
Matches zero or more repetitions of re.
re+
Matches one or more repetitions of re.
re?
Matches zero or one occurrence of re.
re{n}
re{n,m}
Bounded repetition. re{n} matches exactly n occurrences of re. re{n,m} matches at least n and at most m occurrences of re, where n <= m. In the latter form, either n or m can be omitted; an omitted n is assumed to be 0, and an omitted m is assumed to be infinity.
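The bounded forms can be tried out as follows. This is only a small sketch; first-match is a hypothetical helper defined here, not a Gauche procedure.

```scheme
;; first-match is a hypothetical helper: the first matched substring, or #f.
;; (rxmatch-substring accepts #f and returns #f in that case.)
(define (first-match rx str)
  (rxmatch-substring (rx str)))

(first-match #/a{2}/ "aaaaa")      ; ⇒ "aa"
(first-match #/a{2,3}/ "aaaaa")    ; ⇒ "aaa"
(first-match #/a{2,}/ "aaaaa")     ; ⇒ "aaaaa"
(first-match #/^a{2,3}$/ "aaaaa")  ; ⇒ #f (five a's exceed the upper bound)
```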
re*?
re+?
re??
re{n,m}?
Same as the repetition constructs above, but these use a "non-greedy" or "lazy" match strategy. That is, they try to match the minimum number of occurrences of re first, and retry with more occurrences only if the rest of the match fails. In the last form either n or m can be omitted. Compare the following examples:
(rxmatch-substring (#/<.*>/ "<tag1><tag2><tag3>") 0)
  ⇒ "<tag1><tag2><tag3>"
(rxmatch-substring (#/<.*?>/ "<tag1><tag2><tag3>") 0)
  ⇒ "<tag1>"
(re…)
Clustering with capturing. The regular expression enclosed in parentheses works as a single re. In addition, the string that matches re … is saved as a submatch.
(?:re…)
Clustering without capturing. re … works as a single re, but the matched string isn’t saved.
(?<name>re…)
Named capture and clustering. Like (re…), but attaches the name name to the matched substring. You can refer to the matched substring by both its index number and its name.
When the same name appears more than once in a regular expression, it is undefined which matched substring is returned as the submatch of the named capture.
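A brief sketch of the three clustering forms follows. The variable name date-match and the sample strings are made up for this illustration; the match object is applied directly to an index or a group name, as described later in this section.

```scheme
;; Two named groups; each is also reachable by its index.
(define date-match
  (#/(?<year>\d{4})-(?<month>\d\d)/ "released 2024-05"))

(date-match 1)        ; ⇒ "2024"
(date-match 'year)    ; ⇒ "2024"
(date-match 'month)   ; ⇒ "05"

;; (?:...) groups without capturing, so only (c) is saved as a submatch.
(rxmatch-substrings (#/(?:ab)+(c)/ "ababc"))  ; ⇒ ("ababc" "c")
```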
(?i:re…)
(?-i:re…)
Lexical case sensitivity control. (?i:re…) makes re… match case-insensitively, while (?-i:re…) makes re… match case-sensitively.
Perl’s regexp allows several more flags to appear between ’?’ and ’:’. Gauche supports only the two above, for now.
pattern1|pattern2|…
Alternation. Matches any one of the patterns, where each pattern is re ….
\n
Backreference. n is an integer. Matches the substring captured by the n-th capturing group (counting from 1). When capturing groups are nested, groups are counted by their beginnings. If the n-th capturing group is in a repetition and has matched more than once, the last matched substring is used.
\k<name>
Named backreference. Matches the substring captured by the capturing group with the name name. If the named capturing group is in a repetition and has matched more than once, the last matched substring is used. If there is more than one capturing group named name, matching succeeds if the input matches any one of the substrings captured by those groups.
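As a small illustration of both backreference forms, the sketch below finds the first doubled word-constituent character. The procedure names doubled-char and doubled-char/named are hypothetical.

```scheme
;; \1 refers back to whatever the first capturing group matched.
(define (doubled-char str)
  (rxmatch-substring (#/(\w)\1/ str)))

;; The same pattern, using a named group and a named backreference.
(define (doubled-char/named str)
  (rxmatch-substring (#/(?<c>\w)\k<c>/ str)))

(doubled-char "hello")             ; ⇒ "ll"
(doubled-char/named "bookkeeper")  ; ⇒ "oo"
(doubled-char "abc")               ; ⇒ #f
```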
.
Matches any character (including newline).
[char-set-spec]
Matches any character in the character set specified by char-set-spec. See Character Sets, for the details of char-set-spec.
\s, \d, \w
Matches a whitespace character (char-set:ascii-whitespace, #[\u0009-\u000d ]), a digit character (char-set:ascii-digit, #[0-9]), or a word-constituent character (char-set:ascii-word, #[A-Za-z0-9_]), respectively.
Note that they don’t include characters outside the ASCII range.
They can be used both inside and outside of a character set.
\S, \D, \W
Matches the complement character set of \s, \d and \w, respectively.
^, $
Beginning and end of string assertions when they appear at the beginning or the end of the pattern, respectively; in multi-line mode, they also assert the beginning and end of a line.
These characters lose their special meaning and match the characters themselves if they appear in a position other than the beginning of the pattern (for ^) or the end (for $). For the sake of recognizing those characters, lookahead/lookbehind assertions ((?=...), (?!...), (?<=...), (?<!...)) and atomic clustering ((?>...)) are treated as if they were a whole pattern. That is, ^ at the beginning of those groupings is a beginning-of-string assertion no matter where the group appears in the containing regexp. The same applies to $ at the end of these groupings.
\b, \B
Word boundary and non-word boundary assertion, respectively. That is, \b matches an empty string between a word-constituent character and a non-word-constituent character, and \B matches an empty string elsewhere.
\;
\"
\#
These are the same as ;, ", and #, respectively, and can be used to avoid confusing Emacs or other syntax-aware editors that are not familiar with Gauche’s extension.
(?=pattern)
(?!pattern)
Positive/negative lookahead assertion. Match succeeds if pattern matches (or does not match) the input string from the current position, but this doesn’t move the current position itself, so that the following regular expression is applied again from the current position.
For example, the following expression matches strings that might be a phone number, except the numbers in Japan (i.e. ones that begin with "81").
\+(?!81)\d{9,}
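The pattern above can be exercised as in the following sketch. The procedure name non-japan-number and the sample numbers are made up for illustration.

```scheme
;; Returns the matched number, or #f when the digits after "+" start with 81.
(define (non-japan-number str)
  (rxmatch-substring (#/\+(?!81)\d{9,}/ str)))

(non-japan-number "call +14155550123")   ; ⇒ "+14155550123"
(non-japan-number "call +819012345678")  ; ⇒ #f
```

The lookahead only inspects the input; the digits it looked at are still consumed by the following \d{9,}.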
(?<=pattern)
(?<!pattern)
Positive/negative lookbehind assertion. If the input string immediately before the current input position matches pattern, this pattern succeeds or fails, respectively. Like lookahead assertion, the input position isn’t changed.
Internally, this match is tried by reversing pattern and applying it backward over the input character sequence. So you can write any regexp in pattern, but if the submatches depend on the matching order, you may get different submatches from those you would get by matching pattern from left to right.
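A minimal sketch of positive and negative lookbehind follows. The procedure names dollar-amount and plain-number are hypothetical. Note that $ is written as \$ here, because an unescaped $ at the end of a lookbehind group would be an end-of-string assertion.

```scheme
;; Digits immediately preceded by a dollar sign; the sign is not consumed.
(define (dollar-amount str)
  (rxmatch-substring (#/(?<=\$)\d+/ str)))

;; A whole number that is NOT immediately preceded by a dollar sign.
(define (plain-number str)
  (rxmatch-substring (#/(?<!\$)\b\d+/ str)))

(dollar-amount "cost: $42")   ; ⇒ "42"
(dollar-amount "cost: 42")    ; ⇒ #f
(plain-number "$42 and 17")   ; ⇒ "17"
```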
(?>pattern)
Atomic clustering. Once pattern matches, the match is fixed; even if the following pattern fails, the engine won’t backtrack to try the alternative match in pattern.
re*+
re++
re?+
They are the same as (?>re*), (?>re+), (?>re?), respectively.
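The effect of giving up backtracking can be seen in the following sketch, which compares a greedy pattern with its atomic and possessive counterparts. The three predicate names are hypothetical.

```scheme
;; Greedy: a* first takes every "a", then backtracks one step so that
;; the final "a" in the pattern can match.
(define (greedy-match? str)     (if (#/^a*a$/ str) #t #f))

;; Atomic and possessive: a* keeps every "a" it took, so nothing is left
;; for the final "a" and the whole match fails.
(define (atomic-match? str)     (if (#/^(?>a*)a$/ str) #t #f))
(define (possessive-match? str) (if (#/^a*+a$/ str) #t #f))

(greedy-match? "aaa")      ; ⇒ #t
(atomic-match? "aaa")      ; ⇒ #f
(possessive-match? "aaa")  ; ⇒ #f
```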
(?test-pattern then-pattern)
(?test-pattern then-pattern|else-pattern)
Conditional matching. If test-pattern counts true, then-pattern is tried; otherwise else-pattern is tried when provided.
test-pattern can be either one of the following:
(integer)
Backreference. If integer-th capturing group has a match, this test counts true.
(?=pattern)
(?!pattern)
Positive/negative lookahead assertion. It tries pattern from the current input position without consuming input, and if the match succeeds or fails, respectively, this test counts true.
(?<=pattern)
(?<!pattern)
Positive/negative lookbehind assertion. It tries pattern backward from the left side of the current input position, and if the match succeeds or fails, respectively, this test counts true.
Regular expression object. You can construct a regexp object from a string by string->regexp or sre->regexp at run time. Gauche also has a special syntax to denote regexp literals, which constructs the regexp object at load time.
Gauche’s regexp engine is fully aware of multibyte characters.
Regexp match object. A regexp matcher rxmatch returns this object if the match succeeds. This object contains all the information about the match, including submatches.
The advantage of using a match object, rather than substrings or a list of indices, is efficiency. The regmatch object keeps the internal state of the match, and computes indices and/or substrings only when requested. This is particularly effective for multibyte strings, since index access is slow on them.
Takes string as a regexp specification, and constructs an instance of <regexp>.
If a true value is given to the keyword argument case-fold, the created regexp object becomes a case-folding regexp. (See the explanation of case-folding regexps above.)
If a true value is given to the keyword argument multi-line, ^ and $ assert the beginning and end of a line in addition to the beginning and end of the string. Common line terminators (LF only, CRLF, and CR only) are recognized.
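Assuming the entry above describes string->regexp (the constructor named earlier in this section), the two keyword arguments might be used as in this sketch. The variable names are made up for illustration.

```scheme
;; :case-fold makes the compiled regexp match case-insensitively.
(define ci-rx (string->regexp "hello" :case-fold #t))

;; With :multi-line, ^ and $ also match at line boundaries.
(define ml-rx (string->regexp "^b$" :multi-line #t))
(define sl-rx (string->regexp "^b$"))

(rxmatch-substring (ci-rx "Say HELLO"))  ; ⇒ "HELLO"
(rxmatch-substring (ml-rx "a\nb\nc"))    ; ⇒ "b"
(sl-rx "a\nb\nc")                        ; ⇒ #f
```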
Takes a Scheme regexp sre and returns a <regexp> object. The zero-th group is always captured.
If a false value is given to the keyword argument multi-line, which is the default, bol and eol behave like bos and eos (i.e. they match only at the beginning or end of the string).
Returns true iff obj is a regexp object.
Returns a source string describing the regexp regexp. The returned string is immutable.
Returns a Scheme regexp (SRE) describing the regexp regexp. See scheme.regex - R7RS regular expressions, for the details of SRE.
Queries the number of capturing groups, and an alist of named capturing groups, in the given regexp, respectively.
The number of capturing groups corresponds to the number of matches returned by rxmatch-num-matches. Note that the entire regexp forms a group, so the number is always positive.
The alist returned from regexp-named-groups has the group name (a symbol) in its car, and the subgroup number in its cdr. Note that the order of groups in the alist isn’t fixed.
(regexp-num-groups #/abc(?<foo>def)(ghi(?<bar>jkl)(mno))/)
  ⇒ 5
(regexp-named-groups #/abc(?<foo>def)(ghi(?<bar>jkl)(mno))/)
  ⇒ ((bar . 3) (foo . 1))
Regexp is a regular expression object. The string string is matched against regexp. If it matches, the function returns a <regmatch> object. Otherwise it returns #f.
If start and/or end are given, only the substring between start (inclusive) and end (exclusive) is searched.
This is called match, regexp-search or string-match in some other Scheme implementations.
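The optional range can be sketched as follows, assuming start and end are passed positionally after the string. The helper first-number is hypothetical.

```scheme
;; first-number is a hypothetical helper that forwards an optional
;; start/end range to rxmatch and returns the matched substring or #f.
(define (first-number str . range)
  (rxmatch-substring (apply rxmatch #/\d+/ str range)))

(first-number "a1b22c333")      ; ⇒ "1"
(first-number "a1b22c333" 2)    ; ⇒ "22"  (search starts at index 2)
(first-number "a1b22c333" 2 3)  ; ⇒ #f    (only "b" is searched)
```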
Internally, Gauche uses backtracking for regexp matching. When regexp has multiple match possibilities, Gauche saves an intermediate result on a stack and tries one choice, and if it fails, tries another. Depending on regexp, the saved results may grow linearly with the input. Gauche allocates a fixed amount of memory for them, and if there are too many saved results, you’ll get the following error:
ERROR: Ran out of stack during matching regexp #/.../. Too many retries?
If you get this error, consider using a hybrid parsing approach. Our regexp engine isn’t made to do everything-in-one-shot parsing; in most cases, the effect of a complex regexp can be achieved better with a grammar more powerful than a regular grammar.
To apply the match repeatedly on the input string, or to match against an input stream (such as data from a port), you may want to check grxmatch in gauche.generator (see Generator operations).
A regular expression object can be applied directly to a string. This works the same as (rxmatch regexp string), but allows shorter notation. See Applicable objects, for the generic mechanism used to implement this.
Match is a match object returned by rxmatch. If i equals zero, the functions return the start, the end, or the substring of the entire match, respectively. With a positive integer i, they return those of the i-th submatch. It is an error to pass other values to i.
It is allowed to pass #f to match for convenience. The functions return #f in that case.
These functions correspond to scsh’s match:start, match:end and match:substring.
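Assuming the three functions above are rxmatch-start, rxmatch-end and rxmatch-substring (the last one is used throughout this section), a short sketch of the index argument looks like this. The variable name phone-match and the sample string are made up.

```scheme
(define phone-match (rxmatch #/(\d+)-(\d+)/ "tel: 555-1234"))

(rxmatch-start phone-match)        ; ⇒ 5   start of the entire match
(rxmatch-end phone-match)          ; ⇒ 13  end of the entire match
(rxmatch-substring phone-match 1)  ; ⇒ "555"
(rxmatch-start phone-match 2)      ; ⇒ 9   start of the second submatch
(rxmatch-substring #f)             ; ⇒ #f  a failed match is accepted
```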
Returns the substring of the input string after or before match. If the optional argument is given, the i-th submatch is used (the 0-th submatch is the entire match).
(define match (rxmatch #/(\d+)\.(\d+)/ "pi=3.14..."))
(rxmatch-after match) ⇒ "..."
(rxmatch-after match 1) ⇒ ".14..."
(rxmatch-before match) ⇒ "pi="
(rxmatch-before match 2) ⇒ "pi=3."
Retrieves multiple submatches (again, the 0-th match is the entire match), as substrings and as conses of start and end positions, respectively.
(rxmatch-substrings (#/(\d+):(\d+):(\d+)/ "12:34:56"))
  ⇒ ("12:34:56" "12" "34" "56")
(rxmatch-positions (#/(\d+):(\d+):(\d+)/ "12:34:56"))
  ⇒ ((0 . 8) (0 . 2) (3 . 5) (6 . 8))
For convenience, you can pass #f to match; these procedures return () in that case.
The optional start and end arguments specify the range of submatch indices. If omitted, start defaults to 0 and end defaults to (rxmatch-num-matches match). For example, if you don’t need the whole match, you can give 1 to start as follows:
(rxmatch-substrings (#/(\d+):(\d+):(\d+)/ "12:34:56") 1)
  ⇒ ("12" "34" "56")
A convenience procedure that matches a string against the given regexp, then returns the matched substring, or #f if it doesn’t match.
If no selector is given, it is the same as this:
(rxmatch-substring (rxmatch regexp string))
If an integer is given as a selector, it returns the substring of the numbered submatch.
If the symbol after or before is given, it returns the substring after or before the match. You can give these symbols together with an integer to extract the substring before or after the numbered submatch.
gosh> (rxmatch->string #/\d+/ "foo314bar")
"314"
gosh> (rxmatch->string #/(\w+)@([\w.]+)/ "foo@example.com" 2)
"example.com"
gosh> (rxmatch->string #/(\w+)@([\w.]+)/ "foo@example.com" 'before 2)
"foo@"
'before :optional index
'after :optional index
A regmatch object can be applied directly to an integer index, or to the symbol before or after. These work the same as (rxmatch-substring regmatch index), (rxmatch-before regmatch), and (rxmatch-after regmatch), respectively. This allows shorter notation. See Applicable objects, for the generic mechanism used to implement this.
(define match (#/(\d+)\.(\d+)/ "pi=3.14..."))
(match) ⇒ "3.14"
(match 1) ⇒ "3"
(match 2) ⇒ "14"
(match 'after) ⇒ "..."
(match 'after 1) ⇒ ".14..."
(match 'before) ⇒ "pi="
(match 'before 2) ⇒ "pi=3."
(define match (#/(?<integer>\d+)\.(?<fraction>\d+)/ "pi=3.14..."))
(match 1) ⇒ "3"
(match 2) ⇒ "14"
(match 'integer) ⇒ "3"
(match 'fraction) ⇒ "14"
(match 'after 'integer) ⇒ ".14..."
(match 'before 'fraction) ⇒ "pi=3."
Returns the number of matches, and an alist of named groups and their indices, in match, respectively. These correspond to regexp-num-groups and regexp-named-groups on the regular expression that was used to generate match. These procedures are useful for inspecting a match object without having the original regexp object.
The number of matches includes the "whole match", so it is always a positive integer for a <regmatch> object. The number also includes the submatches that don’t have a value (see the examples below). The result of rxmatch-named-groups also includes all the named groups in the original regexp, not only the matched ones.
For convenience, rxmatch-num-matches returns 0 and rxmatch-named-groups returns () if match is #f.
(rxmatch-num-matches (rxmatch #/abc/ "abc")) ⇒ 1
(rxmatch-num-matches (rxmatch #/(a(.))|(b(.))/ "ba")) ⇒ 5
(rxmatch-num-matches #f) ⇒ 0
(rxmatch-named-groups (rxmatch #/(?<h>\d\d):(?<m>\d\d)(:(?<s>\d\d))?/ "12:34"))
  ⇒ ((s . 4) (m . 2) (h . 1))
Replaces the part of string that matches regexp with substitution. regexp-replace replaces only the first match of regexp, while regexp-replace-all repeats the replacement throughout the entire string.
substitution may be a string or a procedure. If it is a string, it can contain references to the submatches by digits preceded by a backslash (e.g. \2) or by named submatch references (e.g. \k<name>). \0 refers to the entire match. Note that you need two backslashes to include a backslash character in a literal string; if you want to include a backslash character itself in the substitution, you need four backslashes.
(regexp-replace #/def|DEF/ "abcdefghi" "...") ⇒ "abc...ghi"
(regexp-replace #/def|DEF/ "abcdefghi" "|\\0|") ⇒ "abc|def|ghi"
(regexp-replace #/def|DEF/ "abcdefghi" "|\\\\0|") ⇒ "abc|\\0|ghi"
(regexp-replace #/c(.*)g/ "abcdefghi" "|\\1|") ⇒ "ab|def|hi"
(regexp-replace #/c(?<match>.*)g/ "abcdefghi" "|\\k<match>|") ⇒ "ab|def|hi"
If substitution is a procedure, for every match in string it is called with one argument, a regexp match object. The value returned from the procedure is inserted into the output string using display.
(regexp-replace #/c(.*)g/ "abcdefghi"
                (lambda (m)
                  (list->string
                   (reverse
                    (string->list (rxmatch-substring m 1))))))
  ⇒ "abfedhi"
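The examples above all use regexp-replace. As a complementary sketch, regexp-replace-all can be used with both kinds of substitution; the procedure names squeeze-spaces and double-numbers are hypothetical.

```scheme
;; String substitution: collapse every run of whitespace to one space.
(define (squeeze-spaces str)
  (regexp-replace-all #/\s+/ str " "))

;; Procedure substitution: the returned number is written with display.
(define (double-numbers str)
  (regexp-replace-all #/\d+/ str
                      (lambda (m)
                        (* 2 (string->number (rxmatch-substring m))))))

(squeeze-spaces "a  b \t c")           ; ⇒ "a b c"
(double-numbers "3 apples, 10 pears")  ; ⇒ "6 apples, 20 pears"
```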
Note: regexp-replace-all applies itself recursively to the remainder of the string after a match. So the beginning-of-string assertion in regexp doesn’t only mean the beginning of the input string.
Note: If you want to operate on multiple matches in the string instead of replacing them, you can use lrxmatch in the gauche.lazy module or grxmatch in the gauche.generator module. Both match a regexp repeatedly and lazily against the given string; lrxmatch returns a lazy sequence of regmatches, while grxmatch returns a generator that yields regmatches.
(map rxmatch-substring (lrxmatch #/\w+/ "a quick brown fox!?"))
  ⇒ ("a" "quick" "brown" "fox")
First applies regexp-replace or regexp-replace-all to string with the regular expression rx1, substituting sub1; then applies the function to the resulting string with the regular expression rx2, substituting sub2; and so on. These functions are handy when you want to apply multiple substitutions sequentially to a string.
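The following sketch assumes the two functions described above are regexp-replace* and regexp-replace-all*, called as (f string rx1 sub1 rx2 sub2 …). The wrapper names leetify and leetify-once are hypothetical.

```scheme
;; Each rx/sub pair is applied to the result of the previous pair.
(define (leetify str)
  (regexp-replace-all* str #/l/ "L" #/o/ "0"))

;; The non-"all" variant replaces only the first match of each pair.
(define (leetify-once str)
  (regexp-replace* str #/l/ "L" #/o/ "0"))

(leetify "hello world")       ; ⇒ "heLL0 w0rLd"
(leetify-once "hello world")  ; ⇒ "heLl0 world"
```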
Returns a string with the characters that are special to regexp escaped.
(regexp-quote "[2002/10/12] touched foo.h and *.c")
  ⇒ "\\[2002/10/12\\] touched foo\\.h and \\*\\.c"
In the following macros, match-expr is an expression that produces a match object or #f. Typically it is a call to rxmatch, but it can be any expression.
Evaluates match-expr, and if it matched, binds var … to the matched strings, then evaluates forms. The first var receives the entire match, and subsequent variables receive submatches. If the number of submatches is smaller than the number of variables to receive them, the remaining variables get #f.
It is possible to put #f in a variable position, which says you don’t care about that match.
(rxmatch-let (rxmatch #/(\d+):(\d+):(\d+)/ "Jan 1 23:59:58, 2001")
    (time hh mm ss)
  (list time hh mm ss))
  ⇒ ("23:59:58" "23" "59" "58")
(rxmatch-let (rxmatch #/(\d+):(\d+):(\d+)/ "Jan 1 23:59:58, 2001")
    (#f hh mm)
  (list hh mm))
  ⇒ ("23" "59")
This macro corresponds to scsh’s let-match.
Evaluates match-expr, and if it matched, binds var … to the matched strings and evaluates then-form. Otherwise evaluates else-form. The rule for binding vars is the same as in rxmatch-let.
(rxmatch-if (rxmatch #/(\d+:\d+)/ "Jan 1 11:22:33")
    (time)
  (format #f "time is ~a" time)
  "unknown time")
  ⇒ "time is 11:22"
(rxmatch-if (rxmatch #/(\d+:\d+)/ "Jan 1 11-22-33")
    (time)
  (format #f "time is ~a" time)
  "unknown time")
  ⇒ "unknown time"
This macro corresponds to scsh’s if-match.
Evaluates the condition in each clause one by one. If the condition of a clause is satisfied, the rest of the clause is evaluated and becomes the result of rxmatch-cond. A clause may be one of the following patterns.
(match-expr (var …) form …)
Evaluates match-expr, which may return a regexp match object or #f. If it returns a match object, the matches are bound to vars, as in rxmatch-let, and forms are evaluated.
(test expr form …)
Evaluates expr. If it yields true, evaluates forms.
(test expr => proc)
Evaluates expr and, if it is true, calls proc with the result of expr as the only argument.
(else form …)
If this clause exists, it must be the last clause. If the other clauses fail, forms are evaluated.
If no else clause exists, and all the other clauses fail, an undefined value is returned.
;; parses several possible date formats
(define (parse-date str)
  (rxmatch-cond
    ((rxmatch #/^(\d\d?)\/(\d\d?)\/(\d\d\d\d)$/ str)
        (#f mm dd yyyy)
      (map string->number (list yyyy mm dd)))
    ((rxmatch #/^(\d\d\d\d)\/(\d\d?)\/(\d\d?)$/ str)
        (#f yyyy mm dd)
      (map string->number (list yyyy mm dd)))
    ((rxmatch #/^\d+\/\d+\/\d+$/ str)
        (#f)
      (errorf "ambiguous: ~s" str))
    (else (errorf "bogus: ~s" str))))
(parse-date "2001/2/3") ⇒ (2001 2 3)
(parse-date "12/25/1999") ⇒ (1999 12 25)
This macro corresponds to scsh’s match-cond.
String-expr is evaluated, and clauses are interpreted one by one. A clause may be one of the following patterns.
(re (var …) form …)
Re must be a literal regexp object (see Regular expressions). If the result of string-expr matches re, the match result is bound to vars and forms are evaluated, and rxmatch-case returns the result of the last form.
If re doesn’t match the result of string-expr, or string-expr yields a non-string value, the interpretation proceeds to the next clause.
(test proc form …)
The procedure proc is applied to the result of string-expr. If it yields a true value, forms are evaluated, and rxmatch-case returns the result of the last form. If proc yields #f, the interpretation proceeds to the next clause.
(test proc => proc2)
The procedure proc is applied to the result of string-expr. If it yields a true value, proc2 is applied to the result, and its result is returned as the result of rxmatch-case. If proc yields #f, the interpretation proceeds to the next clause.
(else form …)
This form must appear at the end of clauses, if any. If the other clauses fail, forms are evaluated, and the result of the last form becomes the result of rxmatch-case.
(else => proc)
This form must appear at the end of clauses, if any. If the other clauses fail, proc is evaluated, which should yield a procedure taking one argument. The value of string-expr is passed to proc, and its return values become the return values of rxmatch-case.
If no else clause exists, and all the other clauses fail, an undefined value is returned.
The parse-date example above becomes simpler if you use rxmatch-case:
(define (parse-date2 str)
  (rxmatch-case str
    (test (lambda (s) (not (string? s))) #f)
    (#/^(\d\d?)\/(\d\d?)\/(\d\d\d\d)$/ (#f mm dd yyyy)
     (map string->number (list yyyy mm dd)))
    (#/^(\d\d\d\d)\/(\d\d?)\/(\d\d?)$/ (#f yyyy mm dd)
     (map string->number (list yyyy mm dd)))
    (#/^\d+\/\d+\/\d+$/ (#f)
     (errorf "ambiguous: ~s" str))
    (else (errorf "bogus: ~s" str))))
When Gauche reads a string representation of a regexp, it first parses the string and constructs an abstract syntax tree (AST), performs some optimizations on it, then compiles it into an instruction sequence to be executed by the regexp engine.
The following procedures expose this process to user programs. It may be easier for programs to manipulate an AST than a string representation.
Parses a string specification of regexp in string and returns its AST, represented in S-expression. See below for the spec of AST.
When a true value is given to the keyword argument case-fold, the returned AST will match case-insensitively. (Case-insensitive matching is handled at the parser level, not by the engine.)
Parses sre as a Scheme Regular Expression (SRE), as described in SRFI-115, and returns its AST. See scheme.regex - R7RS regular expressions.
Performs some rudimentary optimization on the regexp AST, and returns the resulting regexp AST.
Currently it only optimizes some trivial cases. The plan is to make it cleverer in the future.
Takes a regexp ast and returns a regexp object. Currently the outermost form of ast must be the zero-th capturing group. (That is, ast should have the form (0 #f x …).) The outer grouping is always added by regexp-parse to capture the entire regexp.
Note: The function does some basic checks to see whether the given AST is valid, but it may not reject every invalid AST. In such a case, the returned regexp object doesn’t work properly. It is the caller’s responsibility to provide a properly constructed AST. (Even when it rejects an AST, the error messages are often incomprehensible. So, don’t use this procedure as an AST validity checker.)
Returns the AST used for the regexp object regexp.
From the regexp’s ast, reconstructs the string representation of the regexp. The keyword argument on-error can be the keyword :error (default) or #f. If it’s the former, an error is signaled when ast isn’t a valid regexp AST. If it’s the latter, regexp-unparse just returns #f.
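The parse, compile and unparse steps described above can be chained as in the sketch below. It assumes the compiling procedure is named regexp-compile (the name does not appear in the text above); regexp-parse and regexp-unparse are the names used in the text. The variable names are made up for illustration.

```scheme
;; Parse a regexp string into an AST; regexp-parse adds the outer
;; zero-th capturing group that the compiling step requires.
(define sample-ast (regexp-parse "ab+c"))

;; Compile the AST into a regexp object (regexp-compile is assumed here).
(define sample-rx (regexp-compile sample-ast))

(rxmatch-substring (sample-rx "xxabbbcyy"))  ; ⇒ "abbbc"

;; Turn the AST back into a string representation.
(define sample-source (regexp-unparse sample-ast))
```

Going through the AST only pays off when a program needs to inspect or transform the pattern; for plain matching, string->regexp or a literal is simpler.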
This is the structure of the AST. Note that it was originally developed only for internal use, and is not very convenient to manipulate from code (e.g. if you insert or delete a subtree, you have to renumber capturing groups to keep them consistent).
<ast> : <clause>   ; special clause
      | <item>     ; matches <item>

<item> : <char>       ; matches char
       | <char-set>   ; matches char set
       | (comp . <char-set>) ; matches complement of char set
       | any          ; matches any char
       | bos | eos    ; beginning/end of string assertion
       | bol | eol    ; beginning/end of line assertion
       | bow | eow | wb | nwb ; word-boundary/negative word boundary assertion
       | bog | eog    ; beginning/end of grapheme assertion

<clause> : (seq <ast> ...)         ; sequence
       | (seq-uncase <ast> ...)    ; sequence (case insensitive match)
       | (seq-case <ast> ...)      ; sequence (case sensitive match)
       | (alt <ast> ...)           ; alternative
       | (rep <m> <n> <ast> ...)   ; repetition at least <m> up to <n> (greedy)
                                   ; <n> may be `#f'
       | (rep-min <m> <n> <ast> ...)
                                   ; repetition at least <m> up to <n> (lazy)
                                   ; <n> may be `#f'
       | (rep-while <m> <n> <ast> ...)
                                   ; like rep, but no backtrack
       | (<integer> <symbol> <ast> ...)
                                   ; capturing group. <symbol> may be #f.
       | (cpat <condition> (<ast> ...) (<ast> ...))
                                   ; conditional expression
       | (backref . <integer>)     ; backreference by group number
       | (backref . <symbol>)      ; backreference by name
       | (once <ast> ...)          ; standalone pattern. no backtrack
       | (assert . <asst>)         ; positive lookahead assertion
       | (nassert . <asst>)        ; negative lookahead assertion

<condition> : <integer>     ; (?(1)yes|no) style conditional expression
       | (assert . <asst>)  ; (?(?=condition)...) or (?(?<=condition)...)
       | (nassert . <asst>) ; (?(?!condition)...) or (?(?<!condition)...)

<asst> : <ast> ...
       | ((lookbehind <ast> ...))