scsh-0.5/scsh/rx/todo


- scsh integration
Affected: fr nawk filemtch glob rdelim re scsh-interfaces scsh-package
- Naming conventions: "re" vs. "regexp". Should I have "smart" versions
of make-re-string, etc.?
- Remove all "reduce" forms from scsh, replace with foldl, foldr forms.
- Check FPS, network code
- The match function should let you state that the beginning of the string
is not a real bos, and likewise for eos. Similarly for bol & eol.
Cf. the Posix regexec execution flags:
-- REG_NOTBOL -- beginning of string doesn't count as ^ match.
-- REG_NOTEOL -- end of string doesn't count as $ match.
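
For comparison only (this is Python's re module, not scsh, and the patterns and strings are made up for illustration): Python already draws this distinction on the bos side. Starting a search at an offset does not make that offset a real beginning of string, which is the REG_NOTBOL behaviour, while its endpos argument acts like truncation and so offers no REG_NOTEOL counterpart. A small sketch:

```python
import re

bol_a = re.compile(r"^a")
a_eol = re.compile(r"a$")

# Slicing makes index 1 a real beginning of string, so ^ matches there.
sliced = bol_a.search("ba"[1:])

# Searching from pos=1 leaves index 1 "not a real bos" (REG_NOTBOL-like),
# so ^ does not match there.
offset = bol_a.search("ba", 1)

# endpos behaves as if the string were truncated, so $ still matches at
# endpos; there is no REG_NOTEOL-like behaviour on this side.
truncated = a_eol.search("ab", 0, 1)

print(sliced is not None, offset is None, truncated is not None)
```

The asymmetry is the point: a match function that takes start/end indices has to decide separately for each end whether the index counts as a real boundary.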
- Hack awk, expect, chat, dir-match for new regexp system
Current:
  (awk (test body ...)
       (:range test1 test2 body ...)
       (else body ...)
       (test => proc)
       (test ==> vars body ...))
  test ::= integer | expression | string
New:
  (else body ...)
  (:range test1 test2 body ...)
  (after body ...)
  (test => proc)
  (test ==> vars body ...)
  (test body ...)
  test ::= integer | sre | (WHEN exp) | exp
-------------------------------------------------------------------------------
Must disallow, due to Posix' RE_CONTEXT_INVALID_OPS
...^*...
*... ...(*... ...|*...
|... ...| ...|$... ...||... ...(|...
That is:
1. Do simplification below to remove repeats of zero-length matches.
2. An empty elt of a choice renders as ().
3. ...|$... Hack it: If first char of a rendered choice elt is $, prefix
with ().
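
Rules 2 and 3 can be sketched as a small post-pass over the already-rendered choice elements. This is an illustration in Python, not the scsh renderer; render_choice is a hypothetical name, and the elements are assumed to be Posix strings that have already been through the simplification in rule 1.

```python
def render_choice(elts):
    """Join rendered choice elements with '|', applying rules 2 and 3."""
    fixed = []
    for elt in elts:
        if elt == "":
            elt = "()"            # rule 2: an empty element renders as ()
        elif elt.startswith("$"):
            elt = "()" + elt      # rule 3: hide a leading $ behind ()
        fixed.append(elt)
    return "|".join(fixed)

print(render_choice(["a", "", "b"]))   # empty middle element
print(render_choice(["a", "$b"]))      # element starting with $
```

With both rules applied, the output never contains "||", a leading or trailing "|", or "|$".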
Simplify ^{0,n} -> ""
         ^{m,n} -> ^     (0<m<=n)
         ^{m,n} -> (in)  (m>n)
Similarly for bos/eos bol/eol bow/eow ""
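
The three cases, as a sketch in Python. The function name and the string representation of anchors are hypothetical; "(in)" is kept verbatim from the table above as the result for an impossible bound, which I read as the empty set.

```python
EMPTY = ""       # matches the empty string
NEVER = "(in)"   # the table's notation for the m>n case, kept verbatim

# Zero-length elements the rule applies to, per the "Similarly for" line.
ZERO_LENGTH = {"^", "$", "bos", "eos", "bol", "eol", "bow", "eow", ""}

def simplify_repeat(elt, m, n):
    """Simplify elt{m,n} where elt only ever matches the empty string."""
    if elt not in ZERO_LENGTH:
        raise ValueError("not a zero-length element: %r" % (elt,))
    if m > n:
        return NEVER     # impossible bound
    if m == 0:
        return EMPTY     # zero repetitions allowed, so "" always suffices
    return elt           # 0 < m <= n: repeating a zero-length match is one match

print(simplify_repeat("^", 0, 3), simplify_repeat("^", 2, 5),
      simplify_repeat("^", 4, 2))
```

After this pass no repetition operator is applied to ^, which is what the ...^*... restriction above needs.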
Spencer says:
A repetition operator (?, *, +, or bounds) cannot follow
another repetition operator. A repetition operator cannot
begin an expression or subexpression or follow `^' or `|'.
`|' cannot appear first or last in a (sub)expression or
after another `|', i.e. an operand of `|' cannot be an
empty subexpression. An empty parenthesized subexpression,
`()', is legal and matches an empty (sub)string. An
empty string is not a legal RE.
Fix the printer and reader so control chars are printed as
\ddd; do syntax for control-char input
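
A sketch of the printer/reader pair in Python, assuming \ddd means three octal digits (the note does not say) and assuming backslash itself is also written as \134 so the reader can treat every backslash as the start of an escape. Both function names are hypothetical.

```python
import re

def escape_controls(s):
    """Printer side: write control chars, DEL and backslash as \\ddd (octal)."""
    out = []
    for ch in s:
        code = ord(ch)
        if code < 0x20 or code == 0x7F or ch == "\\":
            out.append("\\%03o" % code)
        else:
            out.append(ch)
    return "".join(out)

def unescape_controls(s):
    """Reader side: turn each \\ddd (three octal digits) back into its char."""
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), s)

printed = escape_controls("a\tb\n")
print(printed)
```

Escaping backslash too is a design choice of this sketch: without it, a literal backslash followed by three digits would not survive a print/read round trip.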
-------------------------------------------------------------------------------
Less important:
- Support for searching vs. matching
- Case-scope hacking (needs s48 0.51 CODE-QUOTE)
- simp caching
- Better char-set->sre renderer
First, bound the cset with tightest possible superset,
then look for negations.
Possible interesting extensions:
- An ADT->DFA compiler
- A DFA->Scheme-code compiler
- An ADT interpreter
- A pattern notation for matching against s-expressions.
This would be handy for specifying the grammar of Scheme macros,
for example.
- Only allocate svec and evec if we match?