From 34b543229dadc204e5a38eced0a0220c3743ab43 Mon Sep 17 00:00:00 2001
From: olin-shivers
Date: Wed, 21 Mar 2001 22:27:41 +0000
Subject: [PATCH] Added a small fix to string-lib.scm. Removed obsolete strings.txt.

---
 scsh/lib/string-lib.scm | 4 +-
 scsh/lib/strings.txt | 578 ----------------------------------------
 2 files changed, 2 insertions(+), 580 deletions(-)
 delete mode 100644 scsh/lib/strings.txt

diff --git a/scsh/lib/string-lib.scm b/scsh/lib/string-lib.scm
index 242f6be..beedda1 100644
--- a/scsh/lib/string-lib.scm
+++ b/scsh/lib/string-lib.scm
@@ -1130,7 +1130,7 @@
 ;;; string-index-right string char/char-set/pred [start end]
 ;;; string-skip string char/char-set/pred [start end]
 ;;; string-skip-right string char/char-set/pred [start end]
-;;; string-count char/char-set/pred string [start end]
+;;; string-count string char/char-set/pred [start end]
 ;;; There's a lot of replicated code here for efficiency.
 ;;; For example, the char/char-set/pred discrimination has
 ;;; been lifted above the inner loop of each proc.
@@ -1220,7 +1220,7 @@
                  string-skip-right criterion)))))


-(define (string-count criterion s . maybe-start+end)
+(define (string-count s criterion . maybe-start+end)
   (let-string-start+end (start end) string-count s maybe-start+end
     (cond ((char? criterion)
            (do ((i start (+ i 1))
diff --git a/scsh/lib/strings.txt b/scsh/lib/strings.txt
deleted file mode 100644
index fa2d32d..0000000
--- a/scsh/lib/strings.txt
+++ /dev/null
@@ -1,578 +0,0 @@
-Todo:
-    parse-start+end parse-final-start+end need "string" in the name
-    Also, export macro binder.
-    What's up w/quotient? (quotient -1 3) = 0.
-    regexp-foldl
-    type regexp interface
-    land*
-    Let-optional:
-      A let-optional that parses a prefix of the args.
-      Arg checking forms that get used if it parses, but are not
-      applied to the default.
-
-The Scheme Underground string library includes a rich set of operations
-for manipulating strings.
-These are frequently useful for scripting and
-other text-manipulation applications.
-
-The library's design was influenced by the string libraries found in MIT
-Scheme, Gambit, RScheme, MzScheme, slib, Common Lisp, Bigloo, guile, APL and
-the SML standard basis. Some of the code bears a distant family relation to
-the MIT Scheme implementation, and being derived from that code, is covered by
-the MIT Scheme copyright (which is a fairly generic "free" copyright -- see
-the source file for details). The fast KMP string-search code used in
-SUBSTRING? was loosely adapted from old slib code by Stephen Bevan.
-
-The library has the following design principles:
-- *All* procedures involving character comparison are available in
-  both case-sensitive and case-insensitive forms.
-
-- *All* functionality is available in substring and full-string forms.
-
-- The procedures are spec'd so as to permit efficient implementation in a
-  Scheme that provided shared-text substrings (e.g., guile). This means that
-  you should not rely on many of the substring-selecting procedures to return
-  freshly-allocated strings. Careful attention is paid to the issue of which
-  procedures allocate fresh storage, and which are permitted to return results
-  that share storage with the arguments.
-
-- Common Lisp theft:
-  + inequality functions return mismatch index.
-    I generalised this so that this "protocol" is extended even to
-    the equality functions. This means that clients can be handed any generic
-    string-comparison function and rely on the meaning of the true value.
-
-  + Common Lisp capitalisation definition
-
-The library addresses some problems with the R5RS string procedures:
-  - Question marks after string-comparison functions (string=?, etc.)
-    This is inconsistent with numeric comparison functions, and ugly, too.
-  - String-comparison functions do not provide useful true value.
-  - STRING-COPY should have optional start/end args;
-    SUBSTRING shouldn't specify if it copies or returns shared bits.
-  - STRING-FILL! and STRING->LIST should take optional start/end args.
-  - No <> function provided.
-
-In the following procedure specifications:
-    Any S parameter is a string;
-
-  - START and END parameters are half-open string indices specifying
-    a substring within a string parameter; when optional, they default
-    to 0 and the length of the string, respectively. When specified, it
-    must be the case that 0 <= START <= END <= (string-length S), for
-    the corresponding parameter S. They typically restrict a procedure's
-    action to the indicated substring.
-
-  - A CHAR/CHAR-SET/PRED parameter is a value used to select/search
-    for a character in a string. If it is a character, it is used in
-    an equality test; if it is a character set, it is used as a
-    membership test; if it is a procedure, it is applied to the
-    characters as a test predicate.
-
-This library contains a large number of procedures, but they follow
-a consistent naming scheme. The names are composed of smaller lexemes
-in a regular way that exposes the structure and relationships between the
-procedures. This should help the programmer to recall or reconstitute the name
-of the particular procedure that he needs when writing his own code. In
-particular
-  - Procedures whose names end in "-ci" are case-insensitive variants.
-  - Procedures whose names end in "!" are side-effecting variants.
-    These procedures generally return an unspecified value.
-  - The order of common parameters is fairly consistent across the
-    different procedures.
-
-For more text-manipulation functionality, see also the regular expression,
-file-name, character set, and character->character partial map packages.
-
--------------------------------------------------------------------------------
-* R4RS/R5RS procedures
-
-The R4RS and R5RS reports define 22 string procedures.
-The string-lib package includes 8 of these exactly as defined, 4 in an
-extended, backwards-compatible way, and drops the remaining 10 (whose
-functionality is available via other bindings).
-
-The 8 procedures provided exactly as documented in the reports are
-    string?
-    make-string
-    string
-    string-length
-    string-ref
-    string-set!
-    string-append
-    list->string
-
-The ten functions not included are the R4RS string-comparison functions:
-    string=?    string-ci=?
-    string<?    string-ci<?
-    string>?    string-ci>?
-    string<=?   string-ci<=?
-    string>=?   string-ci>=?
-The string-lib package provides alternate bindings.
-
-Additionally, the four extended procedures are
-
-    string-fill! s char [start end] -> unspecific
-    string->list s [start end] -> char-list
-    substring s start [end] -> string
-    string-copy s [start end] -> string
-
-These procedures are documented in the following section. In brief, they are
-extended to take optional start/end parameters specifying substring ranges;
-additionally, SUBSTRING is allowed to return a value that shares storage with
-its argument.
-
-
-* Procedures
-
-These procedures are contained in the Scheme 48 package "string-lib",
-which is open in the default user package. They are not found in the
-"scsh" package; script writers and other programmers that use the Scheme
-48 module system must open string-lib explicitly.
-
-string-map proc s [start end] -> string
-string-map! proc s [start end] -> unspecified
-    PROC is a char->char procedure; it is mapped over S.
-    Note: no sequence order is specified.
-
-string-fold kons knil s [start end] -> value
-string-fold-right kons knil s [start end] -> value
-    These are the fundamental iterators for strings.
-    The left-fold operator maps the KONS procedure across the
-    string from left to right
-        (...
-         (kons s[2] (kons s[1] (kons s[0] knil))))
-    In other words, string-fold obeys the recursion
-        (string-fold kons knil s start end) =
-            (string-fold kons (kons s[start] knil) s start+1 end)
-
-    The right-fold operator maps the KONS procedure across the
-    string from right to left
-        (kons s[0] (... (kons s[end-3] (kons s[end-2] (kons s[end-1] knil)))))
-    obeying the recursion
-        (string-fold-right kons knil s start end) =
-            (string-fold-right kons (kons s[end-1] knil) s start end-1)
-
-    Examples:
-      To convert a string to a list of chars:
-        (string-fold-right cons '() s)
-
-      To count the number of lower-case characters in a string:
-        (string-fold (lambda (c count)
-                       (if (char-set-contains? char-set:lower c)
-                           (+ count 1)
-                           count))
-                     0
-                     s)
-
-string-unfold p f g seed -> string
-    This is the fundamental constructor for strings.
-    - G is used to generate a series of "seed" values from the initial seed:
-        SEED, (G SEED), (G^2 SEED), (G^3 SEED), ...
-    - P tells us when to stop -- when it returns true when applied to one
-      of these seed values.
-    - F maps each seed value to the corresponding character
-      in the result string.
-
-    More precisely, the following (simple, inefficient) definition holds:
-        (define (string-unfold p f g seed)
-          (if (p seed) ""
-              (string-append (string (f seed))
-                             (string-unfold p f g (g seed)))))
-
-    STRING-UNFOLD is a fairly powerful constructor -- you can use it to
-    reverse a string, copy a string, convert a list to a string, read
-    a port into a string, and so forth. Examples:
-        (port->string p) = (string-unfold eof-object? values
-                                          (lambda (x) (read-char p))
-                                          (read-char p))
-
-        (list->string lis) = (string-unfold null? car cdr lis)
-
-        (string-tabulate f size) = (string-unfold (lambda (i) (= i size)) f add1 0)
-
-    To map F over a list LIS, producing a string:
-        (string-unfold null? (compose f car) cdr lis)
-
-string-tabulate proc len -> string
-    PROC is an integer->char procedure.
-    Construct a string of size LEN
-    by applying PROC to each index to produce the corresponding string
-    element. The order in which PROC is applied to the indices is not
-    specified.
-
-string-for-each proc s [start end] -> unspecified
-string-iter proc s [start end] -> unspecified
-    Apply PROC to each character in S.
-    STRING-FOR-EACH has no specified iteration order.
-    STRING-ITER is required to iterate from START to END
-    in increasing order.
-
-string-every? pred s [start end] -> boolean
-string-any? pred s [start end] -> value
-    Note: no sequence order specified.
-    Checks to see if predicate PRED is true of every / any character in S.
-    STRING-ANY? is witness-generating -- it applies PRED to the elements
-    of S, returning the first true value it finds, otherwise false.
-
-string-compare s1 s2 lt-proc eq-proc gt-proc -> values
-string-compare-ci s1 s2 lt-proc eq-proc gt-proc -> values
-    Apply LT-PROC, EQ-PROC, GT-PROC to the mismatch index, depending
-    upon whether S1 is less than, equal to, or greater than S2.
-    The "mismatch index" is the largest index i such that for
-    every 0 <= j < i, s1[j] = s2[j] -- that is, I is the first
-    position that doesn't match. If S1 = S2, the mismatch index
-    is simply the length of the strings; we observe the protocol
-    in this redundant case for uniformity.
-
-substring-compare s1 start1 end1 s2 start2 end2 lt-proc eq-proc gt-proc -> values
-substring-compare-ci s1 start1 end1 s2 start2 end2 lt-proc eq-proc gt-proc -> values
-    The continuation procedures are applied to S1's mismatch index (as defined
-    above). In the case of EQ-PROC, this is always END1.
-
-string= s1 s2 -> #f or integer
-string<> s1 s2 -> #f or integer
-string< s1 s2 -> #f or integer
-string> s1 s2 -> #f or integer
-string<= s1 s2 -> #f or integer
-string>= s1 s2 -> #f or integer
-    If the comparison operation is true, the function returns the
-    mismatch index (as defined for the previous comparator functions).
-
-string-ci= s1 s2 -> #f or integer
-string-ci<> s1 s2 -> #f or integer
-string-ci< s1 s2 -> #f or integer
-string-ci> s1 s2 -> #f or integer
-string-ci<= s1 s2 -> #f or integer
-string-ci>= s1 s2 -> #f or integer
-    Case-insensitive variants.
-
-substring= s1 start1 end1 s2 start2 end2 -> #f or integer
-substring<> s1 start1 end1 s2 start2 end2 -> #f or integer
-substring< s1 start1 end1 s2 start2 end2 -> #f or integer
-substring> s1 start1 end1 s2 start2 end2 -> #f or integer
-substring<= s1 start1 end1 s2 start2 end2 -> #f or integer
-substring>= s1 start1 end1 s2 start2 end2 -> #f or integer
-
-substring-ci= s1 start1 end1 s2 start2 end2 -> #f or integer
-substring-ci<> s1 start1 end1 s2 start2 end2 -> #f or integer
-substring-ci< s1 start1 end1 s2 start2 end2 -> #f or integer
-substring-ci> s1 start1 end1 s2 start2 end2 -> #f or integer
-substring-ci<= s1 start1 end1 s2 start2 end2 -> #f or integer
-substring-ci>= s1 start1 end1 s2 start2 end2 -> #f or integer
-    These variants restrict the comparison to the indicated
-    substrings of S1 and S2.
-
-string-upper-case? s [start end] -> boolean
-string-lower-case? s [start end] -> boolean
-    STRING-UPPER-CASE? returns true iff the string contains
-    no lower-case characters. STRING-LOWER-CASE? returns true
-    iff the string contains no upper-case characters.
-        (string-upper-case? "") => #t
-        (string-lower-case? "") => #t
-        (string-upper-case? "FOOb") => #f
-        (string-upper-case? "U.S.A.") => #t
-
-capitalize-string s [start end] -> string
-capitalize-string! s [start end] -> unspecified
-    Capitalize the string: upcase the first alphanumeric character,
-    and downcase the rest of the string. CAPITALIZE-STRING returns
-    a freshly allocated string.
-
-        (capitalize-string "--capitalize tHIS sentence.") =>
-            "--Capitalize this sentence."
-
-        (capitalize-string "see Spot run. see Nix run.") =>
-            "See spot run. see nix run."
-
-        (capitalize-string "3com makes routers.") =>
-            "3com makes routers."
-
-capitalize-words s [start end] -> string
-capitalize-words! s [start end] -> unspecified
-    A "word" is a maximal contiguous sequence of alphanumeric characters.
-    Upcase the first character of every word; downcase the rest of the word.
-    CAPITALIZE-WORDS returns a freshly allocated string.
-
-        (capitalize-words "HELLO, 3THErE, my nAME IS olin") =>
-            "Hello, 3there, My Name Is Olin"
-
-    More sophisticated capitalisation procedures can be synthesized
-    using CAPITALIZE-STRING and pattern matchers. In this context,
-    the REGEXP-SUBSTITUTE/GLOBAL procedure may be useful for picking
-    out the units to be capitalised and applying CAPITALIZE-STRING to
-    their components.
-
-string-upcase s [start end] -> string
-string-upcase! s [start end] -> unspecified
-string-downcase s [start end] -> string
-string-downcase! s [start end] -> unspecified
-    Raise or lower the case of the alphabetic characters in the string.
-    STRING-UPCASE and STRING-DOWNCASE return freshly allocated strings.
-
-string-take s nchars -> string
-string-drop s nchars -> string
-string-take-right s nchars -> string
-string-drop-right s nchars -> string
-    STRING-TAKE returns the first NCHARS of STRING;
-    STRING-DROP returns all but the first NCHARS of STRING.
-    STRING-TAKE-RIGHT returns the last NCHARS of STRING;
-    STRING-DROP-RIGHT returns all but the last NCHARS of STRING.
-    These generalise MIT Scheme's HEAD & TAIL functions.
-    If these procedures produce the entire string, they may return either
-    S or a copy of S; in some implementations, proper substrings may share
-    memory with S.
-
-string-pad s k [char start end] -> string
-string-pad-right s k [char start end] -> string
-    Build a string of length K comprised of S padded on the left (right)
-    by as many occurrences of the character CHAR as needed. If S has more
-    than K chars, it is truncated on the left (right) to length K. CHAR
-    defaults to #\space.
-
-    If K is exactly the length of S, these functions may return
-    either S or a copy of S.
-
-string-trim s [char/char-set/pred start end] -> string
-string-trim-right s [char/char-set/pred start end] -> string
-string-trim-both s [char/char-set/pred start end] -> string
-    Trim S by skipping over all characters on the left / on the right /
-    on both sides that satisfy the second parameter CHAR/CHAR-SET/PRED:
-    - If it is a character CHAR, characters equal to CHAR are trimmed.
-    - If it is a char set CHAR-SET, characters contained in CHAR-SET
-      are trimmed.
-    - If it is a predicate PRED, it is a test predicate that is applied
-      to the characters in S; a character causing it to return true
-      is skipped.
-    CHAR/CHAR-SET/PRED defaults to CHAR-SET:WHITESPACE.
-
-    If no trimming occurs, these functions may return either S or a copy of S;
-    in some implementations, proper substrings may share memory with S.
-
-        (string-trim-both " The outlook wasn't brilliant, \n\r")
-            => "The outlook wasn't brilliant,"
-
-string-filter s char/char-set/pred [start end] -> string
-string-delete s char/char-set/pred [start end] -> string
-    Filter the string S, retaining only those characters that
-    satisfy / do not satisfy the CHAR/CHAR-SET/PRED argument. If
-    this argument is a procedure, it is applied to the character
-    as a predicate; if it is a char-set, the character is tested
-    for membership; if it is a character, it is used in an equality test.
-
-    If the string is unaltered by the filtering operation, these
-    functions may return either S or a copy of S.
-
-string-index s char/char-set/pred [start end] -> integer or #f
-string-index-right s char/char-set/pred [end start] -> integer or #f
-string-skip s char/char-set/pred [start end] -> integer or #f
-string-skip-right s char/char-set/pred [end start] -> integer or #f
-    Note the inverted start/end ordering of index-right and skip-right's
-    parameters.
-
-    Index (index-right) searches through the string from the left (right),
-    returning the index of the first occurrence of a character which
-    - equals CHAR/CHAR-SET/PRED (if it is a character);
-    - is in CHAR/CHAR-SET/PRED (if it is a char-set);
-    - satisfies the predicate CHAR/CHAR-SET/PRED (if it is a procedure).
-    If no match is found, the functions return false.
-
-    The skip functions are similar, but use the complement of the criteria:
-    they search for the first char that *doesn't* satisfy the test. E.g.,
-    to skip over initial whitespace, say
-        (cond ((string-skip s char-set:whitespace) =>
-               (lambda (i)
-                 ;; (string-ref s i) is not whitespace.
-                 ...)))
-
-string-prefix-count s1 s2 -> integer
-string-suffix-count s1 s2 -> integer
-string-prefix-count-ci s1 s2 -> integer
-string-suffix-count-ci s1 s2 -> integer
-    Return the length of the longest common prefix/suffix of the two strings.
-    This is equivalent to the "mismatch index" for the strings.
-
-substring-prefix-count s1 start1 end1 s2 start2 end2 -> integer
-substring-suffix-count s1 start1 end1 s2 start2 end2 -> integer
-substring-prefix-count-ci s1 start1 end1 s2 start2 end2 -> integer
-substring-suffix-count-ci s1 start1 end1 s2 start2 end2 -> integer
-    Substring variants.
-
-string-prefix? s1 s2 -> boolean
-string-suffix? s1 s2 -> boolean
-string-prefix-ci? s1 s2 -> boolean
-string-suffix-ci? s1 s2 -> boolean
-    Is S1 a prefix/suffix of S2?
-
-substring-prefix? s1 start1 end1 s2 start2 end2 -> boolean
-substring-suffix? s1 start1 end1 s2 start2 end2 -> boolean
-substring-prefix-ci? s1 start1 end1 s2 start2 end2 -> boolean
-substring-suffix-ci? s1 start1 end1 s2 start2 end2 -> boolean
-    Substring variants.
-
-substring? s1 s2 [start end] -> integer or false
-substring-ci? s1 s2 [start end] -> integer or false
-    Return the index in S2 where S1 occurs as a substring, or false.
-    The returned index is in the range [start,end).
-    The current implementation uses the Knuth-Morris-Pratt algorithm.
-
-string-fill! s char [start end] -> unspecified
-    Store CHAR into the elements of S.
-    This is the R4RS procedure extended to have optional START/END parameters.
-
-string-copy! target tstart s [start end] -> unspecified
-    Copy the sequence of characters from index range [START,END) in
-    string S to string TARGET, beginning at index TSTART. The characters
-    are copied left-to-right or right-to-left as needed -- the copy is
-    guaranteed to work, even if TARGET and S are the same string.
-
-substring s start [end] -> string
-string-copy s [start end] -> string
-    These R4RS procedures are extended to have optional START/END parameters.
-    Use STRING-COPY when you want to indicate explicitly in your code that you
-    wish to allocate new storage; use SUBSTRING when you don't care if you
-    get a fresh copy or share storage with the original string.
-    E.g.:
-        (string-copy "Beta substitution") => "Beta substitution"
-        (string-copy "Beta substitution" 1 10)
-            => "eta subst"
-        (string-copy "Beta substitution" 5) => "substitution"
-
-    SUBSTRING may return a value that shares memory with S.
-
-string-reverse s [start end] -> string
-string-reverse! s [start end] -> unspecific
-    Reverse the string.
-
-reverse-list->string char-list -> string
-    An efficient implementation of (compose list->string reverse):
-        (reverse-list->string '(#\a #\B #\c)) -> "cBa"
-    This is a common idiom in the epilog of string-processing loops
-    that accumulate an answer in a reverse-order list.
-
-string-concat string-list -> string
-    Append the elements of STRING-LIST together into a single string.
-    Guaranteed to return a freshly allocated string. Appears sufficiently
-    often as to warrant being named.
-
-string-concat/shared string-list -> string
-string-append/shared s ... -> string
-    These two procedures are variants of STRING-CONCAT and STRING-APPEND
-    that are permitted to return results that share storage with their
-    parameters.
-    In particular, if STRING-APPEND/SHARED is applied to just
-    one argument, it may return exactly that argument, whereas STRING-APPEND
-    is required to allocate a fresh string.
-
-string->list s [start end] -> char-list
-    The R5RS STRING->LIST procedure is extended to take optional START/END
-    arguments.
-
-string-null? s -> bool
-    Is S the empty string?
-
-xsubstring s from [to start end] -> string
-    This is the "extended substring" procedure that implements replicated
-    copying of a substring of some string.
-
-    S is a string; START and END are optional arguments that demarcate
-    a substring of S, defaulting to 0 and the length of S (i.e., the whole
-    string). Replicate this substring up and down index space, in both the
-    positive and negative directions. For example, if S = "abcdefg", START=3,
-    and END=6, then we have the conceptual bidirectionally-infinite string
-        ...  d  e  f  d  e  f  d  e  f  d  e  f  d  e  f  d  e  f  d  e  f ...
-        ... -9 -8 -7 -6 -5 -4 -3 -2 -1  0  1  2  3  4  5  6  7  8  9 ...
-    XSUBSTRING returns the substring of this string beginning at index FROM,
-    and ending at TO (which defaults to FROM+(END-START)).
-
-    You can use XSUBSTRING to perform a variety of tasks:
-    - To rotate a string left: (xsubstring "abcdef" 2) => "cdefab"
-    - To rotate a string right: (xsubstring "abcdef" -2) => "efabcd"
-    - To replicate a string: (xsubstring "abc" 0 7) => "abcabca"
-
-    Note that
-    - The FROM/TO indices give a half-open range -- the characters from
-      index FROM up to, but not including, index TO.
-    - The FROM/TO indices are not in terms of the index space for string S.
-      They are in terms of the replicated index space of the substring
-      defined by S, START, and END.
-
-    It is an error if START=END -- although this is allowed by special
-    dispensation when FROM=TO.
-
-string-xcopy! target tstart s sfrom [sto start end] -> unspecific
-    Exactly the same as XSUBSTRING, but the extracted text is written
-    into the string TARGET starting at index TSTART.
-    This operation is not defined if (EQ? TARGET S) -- you cannot copy
-    a string on top of itself.
-
-
-* Lower-level procedures
-
-The following procedures are useful for writing other string-processing
-functions, and are contained in the string-lib-internals package.
-
-parse-start+end proc s args -> [start end rest]
-parse-final-start+end proc s args -> [start end]
-    PARSE-START+END may be used to parse a pair of optional START/END arguments
-    from an argument list, defaulting them to 0 and the length of some string
-    S, respectively. Let the length of string S be SLEN.
-    - If ARGS = (), the function returns (values 0 slen '())
-    - If ARGS = (i), I is checked to ensure it is an integer, and
-      that 0 <= i <= slen. Returns (values i slen (cdr args)).
-    - If ARGS = (i j ...), I and J are checked to ensure they are
-      integers, and that 0 <= i <= j <= slen. Returns (values i j (cddr args)).
-    If any of the checks fail, an error condition is raised, and PROC is used
-    as part of the error condition -- it should be the name of the client
-    procedure whose argument list PARSE-START+END is parsing.
-
-    PARSE-FINAL-START+END is exactly the same, except that the args list
-    passed to it is required to be of length two or less; if it is longer,
-    an error condition is raised. It may be used when the optional START/END
-    parameters are final arguments to the procedure.
-
-check-substring-spec proc s start end -> unspecific
-    Check values START and END to ensure they specify a valid substring
-    in S. This means that START and END are exact integers, and
-        0 <= START <= END <= (STRING-LENGTH S)
-    If this is not the case, an error condition is raised. PROC is used
-    as part of the error condition, and should be the procedure whose START/END
-    parameters we are checking.
-
-make-kmp-restart-vector s c= -> vector
-    Build the Knuth-Morris-Pratt "restart vector," which is useful
-    for quickly searching character sequences for the occurrence of
-    string S.
-    C= is a character-equality function used to construct
-    the restart vector; it is typically CHAR=? or CHAR-CI=?.
-
-    The definition of the restart vector RV for string S is:
-    If we have matched chars 0..i-1 of S against some search string SS, and
-    S[i] doesn't match SS[k], then reset i := RV[i], and try again to
-    match SS[k]. If RV[i] = -1, then punt SS[k] completely, and move on to
-    SS[k+1] and S[0].
-
-    In other words, if you have matched the first i chars of S, but
-    the i+1'th char doesn't match, RV[i] tells you what the next-longest
-    prefix of S is that you have matched.
-
-    The following string-search function shows how a restart vector
-    is used to search. It can be easily adapted to search other character
-    sequences (such as ports).
-
-    (define (find-substring pattern source start end)
-      (let ((plen (string-length pattern))
-            (rv (make-kmp-restart-vector pattern char=?)))
-
-        ;; The search loop. SJ & PJ are redundant state.
-        (let lp ((si start) (pi 0)
-                 (sj (- end start))   ; (- end si) -- how many chars left.
-                 (pj plen))           ; (- plen pi) -- how many chars left.
-
-          (if (= pi plen) (- si plen)                       ; Win.
-
-              (and (<= pj sj)                               ; Lose.
-
-                   (if (char=? (string-ref source si)       ; Search.
-                               (string-ref pattern pi))
-                       (lp (+ 1 si) (+ 1 pi) (- sj 1) (- pj 1))   ; Advance.
-
-                       (let ((pi (vector-ref rv pi)))       ; Retreat.
-                         (if (= pi -1)
-                             (lp (+ si 1) 0 (- sj 1) plen)  ; Punt.
-                             (lp si pi sj (- plen pi))))))))))
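
The restart-vector definition in the removed strings.txt can be made
concrete with a small worked sketch. Everything below is an assumption for
illustration: MY-KMP-RESTART-VECTOR and KMP-SEARCH are hypothetical names,
not string-lib bindings, and the table built here is the basic KMP failure
table with -1 in slot 0, which may differ entry for entry from what the
library's make-kmp-restart-vector returns. Only R5RS procedures are used.

```scheme
;; Hypothetical stand-in for make-kmp-restart-vector (same argument order:
;; the pattern string, then a character-equality procedure).
;; RV[0] = -1; for i > 0, RV[i] is the length of the longest proper prefix
;; of PATTERN that is also a suffix of PATTERN[0..i-1].
(define (my-kmp-restart-vector pattern c=)
  (let* ((plen (string-length pattern))
         (rv (make-vector plen -1)))
    (let lp ((i 0) (k -1))
      (cond ((>= i (- plen 1)) rv)                  ; table is complete
            ((or (= k -1)
                 (c= (string-ref pattern i) (string-ref pattern k)))
             (vector-set! rv (+ i 1) (+ k 1))       ; extend current border
             (lp (+ i 1) (+ k 1)))
            (else (lp i (vector-ref rv k)))))))     ; fall back to shorter one

;; Hypothetical whole-string search driven by the table above.
;; Returns the index of the first match of PATTERN in SOURCE, or #f.
(define (kmp-search pattern source)
  (let ((plen (string-length pattern))
        (slen (string-length source))
        (rv (my-kmp-restart-vector pattern char=?)))
    (let lp ((si 0) (pi 0))
      (cond ((= pi plen) (- si plen))               ; matched all of PATTERN
            ((= si slen) #f)                        ; ran out of SOURCE
            ((char=? (string-ref source si) (string-ref pattern pi))
             (lp (+ si 1) (+ pi 1)))                ; advance both
            (else
             (let ((k (vector-ref rv pi)))
               (if (= k -1)
                   (lp (+ si 1) 0)                  ; punt this source char
                   (lp si k))))))))                 ; retry with shorter prefix
```

Under these assumptions, (my-kmp-restart-vector "abab" char=?) evaluates to
#(-1 0 0 1), (kmp-search "abab" "abaabab") returns 3, and
(kmp-search "xyz" "abc") returns #f. The search never moves SI backwards,
which is the property that lets the same loop be adapted to ports.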