The simplifier produces regexps with some simple invariants:

- DSMs occur only at the top level; they never appear in the body of a
  DSM, repeat, sequence, choice, or submatch.

- A repeat's body is not a repeat, a trivial match, or an empty match.

- A choice's body contains more than one element, and no element is
  - a choice,
  - a DSM, or
  - an empty match.

- A choice contains at most one char-set element, at most one bos
  element, and at most one eos element.

- A sequence's body contains more than one element, and no element is
  - a sequence,
  - a DSM,
  - a trivial match, or
  - an empty match.

- There are no empty matches in the regexp unless the entire regexp
  is either an empty match or a DSM whose body is an empty match.
  (This is good, because there is no way to write an empty match in
  POSIX notation in a char-set-independent way; for ASCII you have to
  use the six-char "[^\000-\177]".)
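The invariants can be made concrete with a small checker. This is a sketch in a Python toy model, not the simplifier's actual data structures: the node classes (`Empty`, `Trivial`, `Bos`, `Eos`, `CharSet`, `Seq`, `Choice`, `Repeat`, `Submatch`, `DSM`) and the functions `check` and `_ok` are hypothetical names chosen for illustration, and a DSM is reduced to a single count field.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical toy AST for illustration; not the simplifier's real records.
@dataclass(frozen=True)
class Empty:            # empty match: matches no string at all
    pass

@dataclass(frozen=True)
class Trivial:          # trivial match: matches only the empty string
    pass

@dataclass(frozen=True)
class Bos:
    pass

@dataclass(frozen=True)
class Eos:
    pass

@dataclass(frozen=True)
class CharSet:
    chars: frozenset

@dataclass(frozen=True)
class Seq:
    elts: Tuple[object, ...]

@dataclass(frozen=True)
class Choice:
    elts: Tuple[object, ...]

@dataclass(frozen=True)
class Repeat:
    lo: int             # FROM
    hi: Optional[int]   # TO; None means unbounded
    body: object

@dataclass(frozen=True)
class Submatch:
    body: object

@dataclass(frozen=True)
class DSM:
    count: int          # assumption: one count of deleted submatches
    body: object

def check(rx) -> bool:
    """Return True if rx satisfies the invariants listed above."""
    top = rx.body if isinstance(rx, DSM) else rx
    if isinstance(top, Empty):
        return True     # whole regexp is empty, possibly under one DSM
    return _ok(top)

def _ok(rx) -> bool:
    # Below the top level, DSMs and empty matches are never allowed.
    if isinstance(rx, (DSM, Empty)):
        return False
    if isinstance(rx, Repeat):
        return not isinstance(rx.body, (Repeat, Trivial)) and _ok(rx.body)
    if isinstance(rx, Submatch):
        return _ok(rx.body)
    if isinstance(rx, Choice):
        kinds = [type(e) for e in rx.elts]
        return (len(rx.elts) > 1
                and Choice not in kinds
                and all(kinds.count(k) <= 1 for k in (CharSet, Bos, Eos))
                and all(_ok(e) for e in rx.elts))
    if isinstance(rx, Seq):
        return (len(rx.elts) > 1
                and not any(isinstance(e, (Seq, Trivial)) for e in rx.elts)
                and all(_ok(e) for e in rx.elts))
    return True         # leaves: Trivial, Bos, Eos, CharSet
```

For example, a choice of two separate char-sets fails the check, since the invariant expects them to have been merged into one.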

To see that these invariants can always be established:

- We can always bubble up empty matches:
  - If a sequence contains one, the whole sequence reduces to an empty match.
  - They can be deleted from a choice; if the choice is left with 0 elements,
    it reduces to an empty match.
  - A repeat of an empty match is either an empty match or a trivial match,
    depending on whether FROM is >0 or 0, respectively.
  - DSM of an empty match: the DSM itself can be bubbled upward (see below).
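The empty-match rules can be sketched as one bottom-up pass over a hypothetical Python toy AST (the classes and `bubble_empty` are illustrative names, not the simplifier's real code). The sketch assumes DSMs have already been bubbled out, so it has no DSM case. Collapsing a one-element choice to its sole element is an extra step added to respect the "more than one element" invariant; it is not one of the rules listed above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical toy AST for illustration; not the simplifier's real records.
@dataclass(frozen=True)
class Empty:            # empty match: matches no string at all
    pass

@dataclass(frozen=True)
class Trivial:          # trivial match: matches only the empty string
    pass

@dataclass(frozen=True)
class CharSet:
    chars: frozenset

@dataclass(frozen=True)
class Seq:
    elts: Tuple[object, ...]

@dataclass(frozen=True)
class Choice:
    elts: Tuple[object, ...]

@dataclass(frozen=True)
class Repeat:
    lo: int             # FROM
    hi: Optional[int]   # TO; None means unbounded
    body: object

def bubble_empty(rx):
    """Push empty matches upward, bottom-up (DSMs assumed already removed)."""
    if isinstance(rx, Seq):
        elts = tuple(bubble_empty(e) for e in rx.elts)
        # One empty element makes the whole sequence unmatchable.
        if any(isinstance(e, Empty) for e in elts):
            return Empty()
        return Seq(elts)
    if isinstance(rx, Choice):
        # Empty alternatives can never match, so drop them.
        elts = tuple(e for e in map(bubble_empty, rx.elts)
                     if not isinstance(e, Empty))
        if not elts:
            return Empty()
        return elts[0] if len(elts) == 1 else Choice(elts)
    if isinstance(rx, Repeat):
        body = bubble_empty(rx.body)
        if isinstance(body, Empty):
            # Zero copies still match "", but any required copy fails.
            return Empty() if rx.lo > 0 else Trivial()
        return Repeat(rx.lo, rx.hi, body)
    return rx           # leaves are unchanged
```

Because children are simplified before their parent is examined, a single pass carries an empty match as far up as the rules allow.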

- We can always bubble up DSM regexps:
  - If an element of a choice or sequence is a DSM, it can be "absorbed"
    into that element's relocation offset.
  - Repeat commutes with DSM.
  - A submatch whose body is a DSM can "absorb" that DSM by increasing
    the submatch's DSM0 count.
  - Nested DSMs can be collapsed together.
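The DSM rules can be sketched the same way. In this hypothetical Python model, `pull_dsm` returns the number of deleted-submatch slots it lifted out together with a DSM-free regexp, and `simplify_dsm` re-wraps that count as a single top-level DSM. `Submatch.dsm0` stands in for the DSM0 count and `Seq.offsets` for the per-element relocation offsets; these fields, the single `count` on `DSM`, and the assumption that absorbing a DSM simply adds its count to the offset are all simplifications of the sketch. A choice would be handled the same way as a sequence.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical toy AST for illustration; not the simplifier's real records.
@dataclass(frozen=True)
class CharSet:
    chars: frozenset

@dataclass(frozen=True)
class DSM:
    count: int                  # assumption: one count of deleted submatches
    body: object

@dataclass(frozen=True)
class Repeat:
    lo: int
    hi: Optional[int]
    body: object

@dataclass(frozen=True)
class Submatch:
    body: object
    dsm0: int = 0               # stands in for the DSM0 count

@dataclass(frozen=True)
class Seq:                      # a Choice would be handled identically
    elts: Tuple[object, ...]
    offsets: Tuple[int, ...]    # assumed per-element relocation offsets

def pull_dsm(rx):
    """Return (n, rx2): rx2 is DSM-free and n slots were lifted out of it."""
    if isinstance(rx, DSM):
        n, body = pull_dsm(rx.body)
        return rx.count + n, body               # nested DSMs collapse
    if isinstance(rx, Repeat):
        n, body = pull_dsm(rx.body)
        return n, Repeat(rx.lo, rx.hi, body)    # repeat commutes with DSM
    if isinstance(rx, Submatch):
        n, body = pull_dsm(rx.body)
        return 0, Submatch(body, rx.dsm0 + n)   # absorbed into DSM0 count
    if isinstance(rx, Seq):
        pulled = [pull_dsm(e) for e in rx.elts]
        offsets = tuple(off + n for off, (n, _) in zip(rx.offsets, pulled))
        # Each element's DSM is absorbed into that element's offset.
        return 0, Seq(tuple(e for _, e in pulled), offsets)
    return 0, rx

def simplify_dsm(rx):
    """Leave at most one DSM, and only at the top level."""
    n, body = pull_dsm(rx)
    return DSM(n, body) if n else body
```

Only the DSM and repeat cases pass a nonzero count upward; submatches and sequences absorb it, which is why any surviving DSM ends up at the top.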