- scsh integration
  Affected: fr nawk filemtch glob rdelim re scsh-interfaces scsh-package

- Naming conventions. "re" vs. "regexp"; should I have "smart" versions
  of make-re-string, etc.?

- Remove all "reduce" forms from scsh, replace with foldl, foldr forms.

- Check FPS, network code

- The match fun should allow you to state that the beginning of the string
  is not a real bos, & likewise for eos. Similarly for bol & eol.
  execution flag:
  -- REG_NOTBOL -- beginning of string doesn't count as ^ match.
  -- REG_NOTEOL -- end of string doesn't count as $ match.

- Hack awk, expect, chat, dir-match for new regexp system
  Current:
    (awk (test body ...)
         (:range test1 test2 body ...)
         (else body ...)
         (test => proc)
         (test ==> vars body ...))

    test ::= integer expression string

  New:
    (else body ...)
    (:range test1 test2 body ...)
    (after body ...)
    (test => proc)
    (test ==> vars body ...)
    (test body ...)

    test ::= integer | sre | (WHEN exp) | exp

-------------------------------------------------------------------------------
Must disallow, due to Posix' RE_CONTEXT_INVALID_OPS
    ...^*...
    *...
    ...(*...
    ...|*...

    |...
    ...|
    ...|$...
    ...||...
    ...(|...

That is:
1. Do simplification below to remove repeats of zero-length matches.
2. An empty elt of a choice renders as ().
3. ...|$...
   Hack it: If first char of a rendered choice elt is $, prefix with ().

Simplify
    ^{0,n} -> ""
    ^{m,n} -> ^      (0<m<=n)
    ^{m,n} -> (in)   (m>n)
    Similarly for bos/eos bol/eol bow/eow ""

Spencer says:
    A repetition operator (?, *, +, or bounds) cannot follow another
    repetition operator. A repetition operator cannot begin an expression
    or subexpression or follow `^' or `|'.

    `|' cannot appear first or last in a (sub)expression or after another
    `|', i.e. an operand of `|' cannot be an empty subexpression. An empty
    parenthesized subexpression, `()', is legal and matches an empty
    (sub)string. An empty string is not a legal RE.

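The simplification and the two rendering rules above can be sketched roughly as follows. This is only an illustration in Python, not code from the package: the function names, the NEVER sentinel standing in for the empty char-set (in), and the representation of choice elts as already-rendered strings are all assumptions made for the example.

```python
# Hypothetical sentinel for the never-matching empty char-set, written (in)
# in the notes. Not part of any real API.
NEVER = None


def simplify_anchor_repeat(anchor, m, n):
    """Simplify anchor{m,n} where anchor is a zero-length match such as "^".

    Follows the three cases listed under "Simplify".
    """
    if m > n:
        return NEVER   # no repetition count satisfies m > n
    if m == 0:
        return ""      # zero repetitions allowed: matches the empty string
    return anchor      # 0 < m <= n: repeating a zero-length match adds nothing


def render_choice(elts):
    """Join already-rendered choice elts with "|", applying rules 2 and 3."""
    fixed = []
    for elt in elts:
        if elt == "":
            elt = "()"         # rule 2: an empty elt renders as ()
        elif elt.startswith("$"):
            elt = "()" + elt   # rule 3: prefix a leading $ with ()
        fixed.append(elt)
    return "|".join(fixed)
```

With these definitions, render_choice(["a", "", "$b"]) gives a|()|()$b, which avoids the ...||... and ...|$... contexts listed above.
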
Fix the printer and reader so control chars are printed as \ddd;
do syntax for control-char input

-------------------------------------------------------------------------------
Less important:
- Support for searching vs. matching
- Case-scope hacking (needs s48 0.51 CODE-QUOTE)
- simp caching
- Better char-set->sre renderer
  First, bound the cset with tightest possible superset,
  then look for negations.

Possible interesting extensions:

- An ADT->DFA compiler
- A DFA->Scheme-code compiler
- An ADT interpreter
- A pattern notation for matching against s-expressions.
  This would be handy for specifying the grammar of Scheme macros,
  for example.
- Only allocate svec and evec if we match?
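
For the printer half of the control-char item above, a minimal sketch in Python follows. It assumes that \ddd means a three-digit octal character code and that "control char" means codes 0-31 plus 127; the notes specify neither, and the function name is hypothetical. The reader side (syntax for control-char input) is left out because the notes do not say what that syntax should be.

```python
def escape_control_chars(s):
    """Return s with each control character replaced by a \\ddd escape.

    Assumption: ddd is the character code as three octal digits.
    """
    out = []
    for ch in s:
        code = ord(ch)
        if code < 32 or code == 127:
            out.append("\\%03o" % code)   # e.g. tab (code 9) becomes \011
        else:
            out.append(ch)                # printable chars pass through
    return "".join(out)
```

Note that this sketch does not escape a literal backslash, so a matching reader would also need a rule for that case before the two could round-trip.
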