Added a small fix to string-lib.scm. Removed the obsolete strings.txt.
parent
1b31a9f8f1
commit
e18289f61c
|
@ -1,578 +0,0 @@
|
|||
Todo:
|
||||
parse-start+end parse-final-start+end need "string" in the name
|
||||
Also, export macro binder.
|
||||
What's up w/quotient? (quotient -1 3) = 0.
|
||||
regexp-foldl
|
||||
type regexp interface
|
||||
land*
|
||||
Let-optional:
|
||||
A let-optional that parses a prefix of the args.
|
||||
Arg checking forms that get used if it parses, but are not
|
||||
applied to the default.
|
||||
|
||||
The Scheme Underground string library includes a rich set of operations
|
||||
for manipulating strings. These are frequently useful for scripting and
|
||||
other text-manipulation applications.
|
||||
|
||||
The library's design was influenced by the string libraries found in MIT
|
||||
Scheme, Gambit, RScheme, MzScheme, slib, Common Lisp, Bigloo, guile, APL and
|
||||
the SML standard basis. Some of the code bears a distant family relation to
|
||||
the MIT Scheme implementation, and being derived from that code, is covered by
|
||||
the MIT Scheme copyright (which is a fairly generic "free" copyright -- see
|
||||
the source file for details). The fast KMP string-search code used in
|
||||
SUBSTRING? was loosely adapted from old slib code by Stephen Bevan.
|
||||
|
||||
The library has the following design principles:
|
||||
- *All* procedures involving character comparison are available in
|
||||
both case-sensitive and case-insensitive forms.
|
||||
|
||||
- *All* functionality is available in substring and full-string forms.
|
||||
|
||||
- The procedures are spec'd so as to permit efficient implementation in a
|
||||
Scheme that provides shared-text substrings (e.g., guile). This means that
|
||||
you should not rely on many of the substring-selecting procedures to return
|
||||
freshly-allocated strings. Careful attention is paid to the issue of which
|
||||
procedures allocate fresh storage, and which are permitted to return results
|
||||
that share storage with the arguments.
|
||||
|
||||
- Common Lisp theft:
|
||||
+ inequality functions return mismatch index.
|
||||
I generalised this so that this "protocol" is extended even to
|
||||
the equality functions. This means that clients can be handed any generic
|
||||
string-comparison function and rely on the meaning of the true value.
|
||||
|
||||
+ Common Lisp capitalisation definition
|
||||
|
||||
The library addresses some problems with the R5RS string procedures:
|
||||
- Question marks after string-comparison functions (string=?, etc.)
|
||||
This is inconsistent with numeric comparison functions, and ugly, too.
|
||||
- String-comparison functions do not provide useful true value.
|
||||
- STRING-COPY should have optional start/end args;
|
||||
SUBSTRING shouldn't specify if it copies or returns shared bits.
|
||||
- STRING-FILL! and STRING->LIST should take optional start/end args.
|
||||
- No <> function provided.
|
||||
|
||||
In the following procedure specifications:
|
||||
- Any S parameter is a string;
|
||||
|
||||
- START and END parameters are half-open string indices specifying
|
||||
a substring within a string parameter; when optional, they default
|
||||
to 0 and the length of the string, respectively. When specified, it
|
||||
must be the case that 0 <= START <= END <= (string-length S), for
|
||||
the corresponding parameter S. They typically restrict a procedure's
|
||||
action to the indicated substring.
|
||||
|
||||
- A CHAR/CHAR-SET/PRED parameter is a value used to select/search
|
||||
for a character in a string. If it is a character, it is used in
|
||||
an equality test; if it is a character set, it is used as a
|
||||
membership test; if it is a procedure, it is applied to the
|
||||
characters as a test predicate.
|
||||
|
||||
This library contains a large number of procedures, but they follow
|
||||
a consistent naming scheme. The names are composed of smaller lexemes
|
||||
in a regular way that exposes the structure and relationships between the
|
||||
procedures. This should help the programmer to recall or reconstitute the name
|
||||
of the particular procedure that he needs when writing his own code. In
|
||||
particular
|
||||
- Procedures whose names end in "-ci" are case-insensitive variants.
|
||||
- Procedures whose names end in "!" are side-effecting variants.
|
||||
These procedures generally return an unspecified value.
|
||||
- The order of common parameters is fairly consistent across the
|
||||
different procedures.
|
||||
|
||||
For more text-manipulation functionality, see also the regular expression,
|
||||
file-name, character set, and character->character partial map packages.
|
||||
|
||||
-------------------------------------------------------------------------------
|
||||
* R4RS/R5RS procedures
|
||||
|
||||
The R4RS and R5RS reports define 22 string procedures. The string-lib
|
||||
package includes 8 of these exactly as defined, 4 in an extended,
|
||||
backwards-compatible way, and drops the remaining 10 (whose functionality
|
||||
is available via other bindings).
|
||||
|
||||
The 8 procedures provided exactly as documented in the reports are
|
||||
string?
|
||||
make-string
|
||||
string
|
||||
string-length
|
||||
string-ref
|
||||
string-set!
|
||||
string-append
|
||||
list->string
|
||||
|
||||
The ten functions not included are the R4RS string-comparison functions:
|
||||
string=? string-ci=?
|
||||
string<? string-ci<?
|
||||
string>? string-ci>?
|
||||
string<=? string-ci<=?
|
||||
string>=? string-ci>=?
|
||||
The string-lib package provides alternate bindings.
|
||||
|
||||
Additionally, the four extended procedures are
|
||||
|
||||
string-fill! s char [start end] -> unspecific
|
||||
string->list s [start end] -> char-list
|
||||
substring s start [end] -> string
|
||||
string-copy s [start end] -> string
|
||||
|
||||
These procedures are documented in the following section. In brief, they are
|
||||
extended to take optional start/end parameters specifying substring ranges.
|
||||
Additionally, SUBSTRING is allowed to return a value that shares storage with
|
||||
its argument.
|
||||
|
||||
|
||||
* Procedures
|
||||
|
||||
These procedures are contained in the Scheme 48 package "string-lib",
|
||||
which is open in the default user package. They are not found in the
|
||||
"scsh" package; script writers and other programmers that use the Scheme
|
||||
48 module system must open string-lib explicitly.
|
||||
|
||||
string-map proc s [start end] -> string
|
||||
string-map! proc s [start end] -> unspecified
|
||||
PROC is a char->char procedure; it is mapped over S.
|
||||
Note: no sequence order is specified.
|
||||
|
||||
string-fold kons knil s [start end] -> value
|
||||
string-fold-right kons knil s [start end] -> value
|
||||
These are the fundamental iterators for strings.
|
||||
The left-fold operator maps the KONS procedure across the
|
||||
string from left to right
|
||||
(... (kons s[2] (kons s[1] (kons s[0] knil))))
|
||||
In other words, string-fold obeys the recursion
|
||||
(string-fold kons knil s start end) =
|
||||
(string-fold kons (kons s[start] knil) s start+1 end)
|
||||
|
||||
The right-fold operator maps the KONS procedure across the
|
||||
string from right to left
|
||||
(kons s[0] (... (kons s[end-3] (kons s[end-2] (kons s[end-1] knil)))))
|
||||
obeying the recursion
|
||||
(string-fold-right kons knil s start end) =
|
||||
(string-fold-right kons (kons s[end-1] knil) s start end-1)
|
||||
|
||||
Examples:
|
||||
To convert a string to a list of chars:
|
||||
(string-fold-right cons '() s)
|
||||
|
||||
To count the number of lower-case characters in a string:
|
||||
(string-fold (lambda (c count)
               (if (char-set-contains? char-set:lower c)
                   (+ count 1)
                   count))
             0
             s)
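
The two recursions above can be written out directly. What follows is a
minimal sketch in portable Scheme, not the library's code: the names
SKETCH-STRING-FOLD and SKETCH-STRING-FOLD-RIGHT are hypothetical, and
START and END are required here rather than optional.

```scheme
;; Hypothetical reference versions of the two folds (not the library code).

;; Left fold: walks the index range [start,end) in increasing order.
(define (sketch-string-fold kons knil s start end)
  (let lp ((i start) (acc knil))
    (if (< i end)
        (lp (+ i 1) (kons (string-ref s i) acc))
        acc)))

;; Right fold: walks the index range [start,end) in decreasing order.
(define (sketch-string-fold-right kons knil s start end)
  (let lp ((i (- end 1)) (acc knil))
    (if (>= i start)
        (lp (- i 1) (kons (string-ref s i) acc))
        acc)))

(sketch-string-fold cons '() "abc" 0 3)        ; => (#\c #\b #\a)
(sketch-string-fold-right cons '() "abc" 0 3)  ; => (#\a #\b #\c)
```

Folding CONS from the left reverses the characters, while folding from
the right preserves their order, which is why the string->list example
uses the right fold.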
|
||||
|
||||
string-unfold p f g seed -> string
|
||||
This is the fundamental constructor for strings.
|
||||
- G is used to generate a series of "seed" values from the initial seed:
|
||||
SEED, (G SEED), (G^2 SEED), (G^3 SEED), ...
|
||||
- P tells us when to stop -- when it returns true when applied to one
|
||||
of these seed values.
|
||||
- F maps each seed value to the corresponding character
|
||||
in the result string.
|
||||
|
||||
More precisely, the following (simple, inefficient) definition holds:
|
||||
(define (string-unfold p f g seed)
  (if (p seed) ""
      (string-append (string (f seed))
                     (string-unfold p f g (g seed)))))
|
||||
|
||||
STRING-UNFOLD is a fairly powerful constructor -- you can use it to
|
||||
reverse a string, copy a string, convert a list to a string, read
|
||||
a port into a string, and so forth. Examples:
|
||||
(port->string p) = (string-unfold eof-object? values
                                  (lambda (x) (read-char p))
                                  (read-char p))
|
||||
|
||||
(list->string lis) = (string-unfold null? car cdr lis)
|
||||
|
||||
(string-tabulate f size) = (string-unfold (lambda (i) (= i size)) f (lambda (i) (+ i 1)) 0)
|
||||
|
||||
To map F over a list LIS, producing a string:
|
||||
(string-unfold null? (compose f car) cdr lis)
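
As one more illustration, reversing a string can be phrased as an unfold
over a decreasing index. This is only a sketch: SKETCH-STRING-UNFOLD
repeats the simple, inefficient definition given above under a
hypothetical name so that the example is self-contained, and
SKETCH-STRING-REVERSE is likewise a made-up name.

```scheme
;; The simple definition from the text, under a hypothetical name.
(define (sketch-string-unfold p f g seed)
  (if (p seed)
      ""
      (string-append (string (f seed))
                     (sketch-string-unfold p f g (g seed)))))

;; The seed is an index into S, starting at the last character and
;; counting down; we stop once the index goes negative.
(define (sketch-string-reverse s)
  (sketch-string-unfold negative?
                        (lambda (i) (string-ref s i))
                        (lambda (i) (- i 1))
                        (- (string-length s) 1)))

(sketch-string-reverse "abc")  ; => "cba"
(sketch-string-reverse "")     ; => ""
```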
|
||||
|
||||
string-tabulate proc len -> string
|
||||
PROC is an integer->char procedure. Construct a string of size LEN
|
||||
by applying PROC to each index to produce the corresponding string
|
||||
element. The order in which PROC is applied to the indices is not
|
||||
specified.
|
||||
|
||||
string-for-each proc s [start end] -> unspecified
|
||||
string-iter proc s [start end] -> unspecified
|
||||
Apply PROC to each character in S.
|
||||
STRING-FOR-EACH has no specified iteration order.
|
||||
STRING-ITER is required to iterate from START to END
|
||||
in increasing order.
|
||||
|
||||
string-every? pred s [start end] -> boolean
|
||||
string-any? pred s [start end] -> value
|
||||
Note: no sequence order specified.
|
||||
Checks to see if predicate PRED is true of every / any character in S.
|
||||
STRING-ANY? is witness-generating -- it applies PRED to the elements
|
||||
of S, returning the first true value it finds, otherwise false.
|
||||
|
||||
string-compare s1 s2 lt-proc eq-proc gt-proc -> values
|
||||
string-compare-ci s1 s2 lt-proc eq-proc gt-proc -> values
|
||||
Apply LT-PROC, EQ-PROC, GT-PROC to the mismatch index, depending
|
||||
upon whether S1 is less than, equal to, or greater than S2.
|
||||
The "mismatch index" is the largest index i such that for
|
||||
every 0 <= j < i, s1[j] = s2[j] -- that is, I is the first
|
||||
position that doesn't match. If S1 = S2, the mismatch index
|
||||
is simply the length of the strings; we observe the protocol
|
||||
in this redundant case for uniformity.
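
The mismatch-index protocol can be sketched as follows. This is an
illustration under assumptions, not the library's code: the names
SKETCH-MISMATCH-INDEX and SKETCH-STRING-COMPARE are hypothetical, only
the case-sensitive full-string form is shown, and a string that is a
proper prefix of another is assumed to compare as less than it.

```scheme
;; Index of the first position where S1 and S2 differ, or the length
;; of the shorter string if one is a prefix of the other.
(define (sketch-mismatch-index s1 s2)
  (let ((limit (min (string-length s1) (string-length s2))))
    (let lp ((i 0))
      (if (and (< i limit)
               (char=? (string-ref s1 i) (string-ref s2 i)))
          (lp (+ i 1))
          i))))

;; Dispatch to one of the three continuations with the mismatch index.
(define (sketch-string-compare s1 s2 lt-proc eq-proc gt-proc)
  (let ((i (sketch-mismatch-index s1 s2))
        (len1 (string-length s1))
        (len2 (string-length s2)))
    (cond ((= i len1) (if (= i len2) (eq-proc i) (lt-proc i)))
          ((= i len2) (gt-proc i))
          ((char<? (string-ref s1 i) (string-ref s2 i)) (lt-proc i))
          (else (gt-proc i)))))

(sketch-mismatch-index "cat" "car")      ; => 2
(sketch-mismatch-index "cat" "cat")      ; => 3
(sketch-mismatch-index "cat" "catalog")  ; => 3
```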
|
||||
|
||||
substring-compare s1 start1 end1 s2 start2 end2 lt-proc eq-proc gt-proc -> values
|
||||
substring-compare-ci s1 start1 end1 s2 start2 end2 lt-proc eq-proc gt-proc -> values
|
||||
The continuation procedures are applied to S1's mismatch index (as defined
|
||||
above). In the case of EQ-PROC, this is always END1.
|
||||
|
||||
string= s1 s2 -> #f or integer
|
||||
string<> s1 s2 -> #f or integer
|
||||
string< s1 s2 -> #f or integer
|
||||
string> s1 s2 -> #f or integer
|
||||
string<= s1 s2 -> #f or integer
|
||||
string>= s1 s2 -> #f or integer
|
||||
If the comparison operation is true, the function returns the
mismatch index (as defined for the previous comparator functions);
otherwise it returns #f.
|
||||
|
||||
string-ci= s1 s2 -> #f or integer
|
||||
string-ci<> s1 s2 -> #f or integer
|
||||
string-ci< s1 s2 -> #f or integer
|
||||
string-ci> s1 s2 -> #f or integer
|
||||
string-ci<= s1 s2 -> #f or integer
|
||||
string-ci>= s1 s2 -> #f or integer
|
||||
Case-insensitive variants.
|
||||
|
||||
substring= s1 start1 end1 s2 start2 end2 -> #f or integer
|
||||
substring<> s1 start1 end1 s2 start2 end2 -> #f or integer
|
||||
substring< s1 start1 end1 s2 start2 end2 -> #f or integer
|
||||
substring> s1 start1 end1 s2 start2 end2 -> #f or integer
|
||||
substring<= s1 start1 end1 s2 start2 end2 -> #f or integer
|
||||
substring>= s1 start1 end1 s2 start2 end2 -> #f or integer
|
||||
|
||||
substring-ci= s1 start1 end1 s2 start2 end2 -> #f or integer
|
||||
substring-ci<> s1 start1 end1 s2 start2 end2 -> #f or integer
|
||||
substring-ci< s1 start1 end1 s2 start2 end2 -> #f or integer
|
||||
substring-ci> s1 start1 end1 s2 start2 end2 -> #f or integer
|
||||
substring-ci<= s1 start1 end1 s2 start2 end2 -> #f or integer
|
||||
substring-ci>= s1 start1 end1 s2 start2 end2 -> #f or integer
|
||||
These variants restrict the comparison to the indicated
|
||||
substrings of S1 and S2.
|
||||
|
||||
string-upper-case? s [start end] -> boolean
|
||||
string-lower-case? s [start end] -> boolean
|
||||
STRING-UPPER-CASE? returns true iff the string contains
|
||||
no lower-case characters. STRING-LOWER-CASE? returns true
|
||||
iff the string contains no upper-case characters.
|
||||
(string-upper-case? "") => #t
|
||||
(string-lower-case? "") => #t
|
||||
(string-upper-case? "FOOb") => #f
|
||||
(string-upper-case? "U.S.A.") => #t
|
||||
|
||||
capitalize-string s [start end] -> string
|
||||
capitalize-string! s [start end] -> unspecified
|
||||
Capitalize the string: upcase the first alphanumeric character,
|
||||
and downcase the rest of the string. CAPITALIZE-STRING returns
|
||||
a freshly allocated string.
|
||||
|
||||
(capitalize-string "--capitalize tHIS sentence.") =>
|
||||
"--Capitalize this sentence."
|
||||
|
||||
(capitalize-string "see Spot run. see Nix run.") =>
|
||||
"See spot run. see nix run."
|
||||
|
||||
(capitalize-string "3com makes routers.") =>
|
||||
"3com makes routers."
|
||||
|
||||
capitalize-words s [start end] -> string
|
||||
capitalize-words! s [start end] -> unspecified
|
||||
A "word" is a maximal contiguous sequence of alphanumeric characters.
|
||||
Upcase the first character of every word; downcase the rest of the word.
|
||||
CAPITALIZE-WORDS returns a freshly allocated string.
|
||||
|
||||
(capitalize-words "HELLO, 3THErE, my nAME IS olin") =>
|
||||
"Hello, 3there, My Name Is Olin"
|
||||
|
||||
More sophisticated capitalisation procedures can be synthesized
|
||||
using CAPITALIZE-STRING and pattern matchers. In this context,
|
||||
the REGEXP-SUBSTITUTE/GLOBAL procedure may be useful for picking
|
||||
out the units to be capitalised and applying CAPITALIZE-STRING to
|
||||
their components.
|
||||
|
||||
string-upcase s [start end] -> string
|
||||
string-upcase! s [start end] -> unspecified
|
||||
string-downcase s [start end] -> string
|
||||
string-downcase! s [start end] -> unspecified
|
||||
Raise or lower the case of the alphabetic characters in the string.
|
||||
STRING-UPCASE and STRING-DOWNCASE return freshly allocated strings.
|
||||
|
||||
string-take s nchars -> string
|
||||
string-drop s nchars -> string
|
||||
string-take-right s nchars -> string
|
||||
string-drop-right s nchars -> string
|
||||
STRING-TAKE returns the first NCHARS of S;
STRING-DROP returns all but the first NCHARS of S.
STRING-TAKE-RIGHT returns the last NCHARS of S;
STRING-DROP-RIGHT returns all but the last NCHARS of S.
|
||||
These generalise MIT Scheme's HEAD & TAIL functions.
|
||||
If these procedures produce the entire string, they may return either
|
||||
S or a copy of S; in some implementations, proper substrings may share
|
||||
memory with S.
|
||||
|
||||
string-pad s k [char start end] -> string
|
||||
string-pad-right s k [char start end] -> string
|
||||
Build a string of length K comprised of S padded on the left (right)
|
||||
by as many occurrences of the character CHAR as needed. If S has more
|
||||
than K chars, it is truncated on the left (right) to length k. CHAR
|
||||
defaults to #\space.
|
||||
|
||||
If K is exactly the length of S, these functions may return
|
||||
either S or a copy of S.
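
The padding and truncation rule can be sketched like this. It is a
minimal illustration, not the library's code: the SKETCH- names are
hypothetical, CHAR is required rather than optional, the START/END
arguments are left out, and the results are always fresh strings.

```scheme
;; Pad on the left; if S is too long, keep its rightmost K characters.
(define (sketch-string-pad s k char)
  (let ((len (string-length s)))
    (if (>= len k)
        (substring s (- len k) len)
        (string-append (make-string (- k len) char) s))))

;; Pad on the right; if S is too long, keep its leftmost K characters.
(define (sketch-string-pad-right s k char)
  (let ((len (string-length s)))
    (if (>= len k)
        (substring s 0 k)
        (string-append s (make-string (- k len) char)))))

(sketch-string-pad "325" 5 #\space)        ; => "  325"
(sketch-string-pad "8871325" 5 #\space)    ; => "71325"
(sketch-string-pad-right "325" 5 #\0)      ; => "32500"
(sketch-string-pad-right "8871325" 5 #\0)  ; => "88713"
```

Left padding is the natural choice for right-aligning numbers in a
column, which is why truncation also happens on the left.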
|
||||
|
||||
string-trim s [char/char-set/pred start end] -> string
|
||||
string-trim-right s [char/char-set/pred start end] -> string
|
||||
string-trim-both s [char/char-set/pred start end] -> string
|
||||
Trim S by skipping over all characters on the left / on the right /
|
||||
on both sides that satisfy the second parameter CHAR/CHAR-SET/PRED:
|
||||
- If it is a character CHAR, characters equal to CHAR are trimmed.
|
||||
- If it is a char set CHAR-SET, characters contained in CHAR-SET
|
||||
are trimmed.
|
||||
- If it is a predicate PRED, it is a test predicate that is applied
|
||||
to the characters in S; a character causing it to return true
|
||||
is skipped.
|
||||
CHAR/CHAR-SET/PRED defaults to CHAR-SET:WHITESPACE.
|
||||
|
||||
If no trimming occurs, these functions may return either S or a copy of S;
|
||||
in some implementations, proper substrings may share memory with S.
|
||||
|
||||
(string-trim-both " The outlook wasn't brilliant, \n\r")
|
||||
=> "The outlook wasn't brilliant,"
|
||||
|
||||
string-filter s char/char-set/pred [start end] -> string
|
||||
string-delete s char/char-set/pred [start end] -> string
|
||||
Filter the string S, retaining only those characters that
|
||||
satisfy / do not satisfy the CHAR/CHAR-SET/PRED argument. If
|
||||
this argument is a procedure, it is applied to the character
|
||||
as a predicate; if it is a char-set, the character is tested
|
||||
for membership; if it is a character, it is used in an equality test.
|
||||
|
||||
If the string is unaltered by the filtering operation, these
|
||||
functions may return either S or a copy of S.
|
||||
|
||||
string-index s char/char-set/pred [start end] -> integer or #f
|
||||
string-index-right s char/char-set/pred [end start] -> integer or #f
|
||||
string-skip s char/char-set/pred [start end] -> integer or #f
|
||||
string-skip-right s char/char-set/pred [end start] -> integer or #f
|
||||
Note the inverted start/end ordering of index-right and skip-right's
|
||||
parameters.
|
||||
|
||||
Index (index-right) searches through the string from the left (right),
|
||||
returning the index of the first occurrence of a character that
|
||||
- equals CHAR/CHAR-SET/PRED (if it is a character);
|
||||
- is in CHAR/CHAR-SET/PRED (if it is a char-set);
|
||||
- satisfies the predicate CHAR/CHAR-SET/PRED (if it is a procedure).
|
||||
If no match is found, the functions return false.
|
||||
|
||||
The skip functions are similar, but use the complement of the criteria:
|
||||
they search for the first char that *doesn't* satisfy the test. E.g.,
|
||||
to skip over initial whitespace, say
|
||||
(cond ((string-skip s char-set:whitespace) =>
       (lambda (i)
         ;; (string-ref s i) is not whitespace.
         ...)))
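
The relationship between the index and skip searches can be sketched as
below. This is an illustration only: the SKETCH- names are hypothetical,
just the left-to-right full-string forms are shown, and character sets
are left out because they live in a separate package, so the criterion
must be a character or a predicate. The ERROR procedure is assumed to be
available, as it is in most implementations.

```scheme
;; Turn a character or predicate criterion into a predicate.
(define (sketch-criterion->pred criterion)
  (cond ((char? criterion) (lambda (c) (char=? c criterion)))
        ((procedure? criterion) criterion)
        (else (error "unsupported criterion" criterion))))

;; Index of the first character satisfying the criterion, or #f.
(define (sketch-string-index s criterion)
  (let ((pred (sketch-criterion->pred criterion))
        (len (string-length s)))
    (let lp ((i 0))
      (cond ((>= i len) #f)
            ((pred (string-ref s i)) i)
            (else (lp (+ i 1)))))))

;; Skip is index with the criterion complemented.
(define (sketch-string-skip s criterion)
  (let ((pred (sketch-criterion->pred criterion)))
    (sketch-string-index s (lambda (c) (not (pred c))))))

(sketch-string-index "hello world" #\o)       ; => 4
(sketch-string-index "abc" #\z)               ; => #f
(sketch-string-skip "   abc" char-whitespace?) ; => 3
```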
|
||||
|
||||
string-prefix-count s1 s2 -> integer
|
||||
string-suffix-count s1 s2 -> integer
|
||||
string-prefix-count-ci s1 s2 -> integer
|
||||
string-suffix-count-ci s1 s2 -> integer
|
||||
Return the length of the longest common prefix/suffix of the two strings.
|
||||
This is equivalent to the "mismatch index" for the strings.
|
||||
|
||||
substring-prefix-count s1 start1 end1 s2 start2 end2 -> integer
|
||||
substring-suffix-count s1 start1 end1 s2 start2 end2 -> integer
|
||||
substring-prefix-count-ci s1 start1 end1 s2 start2 end2 -> integer
|
||||
substring-suffix-count-ci s1 start1 end1 s2 start2 end2 -> integer
|
||||
Substring variants.
|
||||
|
||||
string-prefix? s1 s2 -> boolean
|
||||
string-suffix? s1 s2 -> boolean
|
||||
string-prefix-ci? s1 s2 -> boolean
|
||||
string-suffix-ci? s1 s2 -> boolean
|
||||
Is S1 a prefix/suffix of S2?
|
||||
|
||||
substring-prefix? s1 start1 end1 s2 start2 end2 -> boolean
|
||||
substring-suffix? s1 start1 end1 s2 start2 end2 -> boolean
|
||||
substring-prefix-ci? s1 start1 end1 s2 start2 end2 -> boolean
|
||||
substring-suffix-ci? s1 start1 end1 s2 start2 end2 -> boolean
|
||||
Substring variants.
|
||||
|
||||
substring? s1 s2 [start end] -> integer or false
|
||||
substring-ci? s1 s2 [start end] -> integer or false
|
||||
Return the index in S2 where S1 occurs as a substring, or false.
|
||||
The returned index is in the range [start,end).
|
||||
The current implementation uses the Knuth-Morris-Pratt algorithm.
|
||||
|
||||
string-fill! s char [start end] -> unspecified
|
||||
Store CHAR into the elements of S.
|
||||
This is the R4RS procedure extended to have optional START/END parameters.
|
||||
|
||||
string-copy! target tstart s [start end] -> unspecified
|
||||
Copy the sequence of characters from index range [START,END) in
|
||||
string S to string TARGET, beginning at index TSTART. The characters
|
||||
are copied left-to-right or right-to-left as needed -- the copy is
|
||||
guaranteed to work, even if TARGET and S are the same string.
|
||||
|
||||
substring s start [end] -> string
|
||||
string-copy s [start end] -> string
|
||||
These R4RS procedures are extended to have optional START/END parameters.
|
||||
Use STRING-COPY when you want to indicate explicitly in your code that you
|
||||
wish to allocate new storage; use SUBSTRING when you don't care if you
|
||||
get a fresh copy or share storage with the original string.
|
||||
E.g.:
|
||||
(string-copy "Beta substitution") => "Beta substitution"
|
||||
(string-copy "Beta substitution" 1 10)
|
||||
=> "eta subst"
|
||||
(string-copy "Beta substitution" 5) => "substitution"
|
||||
|
||||
SUBSTRING may return a value that shares memory with S.
|
||||
|
||||
string-reverse s [start end] -> string
|
||||
string-reverse! s [start end] -> unspecific
|
||||
Reverse the string.
|
||||
|
||||
reverse-list->string char-list -> string
|
||||
An efficient implementation of (compose list->string reverse):
(reverse-list->string '(#\a #\B #\c)) => "cBa"
|
||||
This is a common idiom in the epilog of string-processing loops
|
||||
that accumulate an answer in a reverse-order list.
|
||||
|
||||
string-concat string-list -> string
|
||||
Append the elements of STRING-LIST together into a single string.
Guaranteed to return a freshly allocated string. Appears sufficiently
often as to warrant being named.
|
||||
|
||||
string-concat/shared string-list -> string
|
||||
string-append/shared s ... -> string
|
||||
These two procedures are variants of STRING-CONCAT and STRING-APPEND
|
||||
that are permitted to return results that share storage with their
|
||||
parameters. In particular, if STRING-APPEND/SHARED is applied to just
|
||||
one argument, it may return exactly that argument, whereas STRING-APPEND
|
||||
is required to allocate a fresh string.
|
||||
|
||||
string->list s [start end] -> char-list
|
||||
The R5RS STRING->LIST procedure is extended to take optional START/END
|
||||
arguments.
|
||||
|
||||
string-null? s -> bool
|
||||
Is S the empty string?
|
||||
|
||||
xsubstring s from [to start end] -> string
|
||||
This is the "extended substring" procedure that implements replicated
|
||||
copying of a substring of some string.
|
||||
|
||||
S is a string; START and END are optional arguments that demarcate
|
||||
a substring of S, defaulting to 0 and the length of S (i.e., the whole
|
||||
string). Replicate this substring up and down index space, in both the
|
||||
positive and negative directions. For example, if S = "abcdefg", START=3,
|
||||
and END=6, then we have the conceptual bidirectionally-infinite string
|
||||
...  d  e  f  d  e  f  d  e  f  d  e  f  d  e  f  d  e  f  d ...
... -9 -8 -7 -6 -5 -4 -3 -2 -1  0  1  2  3  4  5  6  7  8  9 ...
|
||||
XSUBSTRING returns the substring of this string beginning at index FROM,
|
||||
and ending at TO (which defaults to FROM+(END-START)).
|
||||
|
||||
You can use XSUBSTRING to perform a variety of tasks:
|
||||
- To rotate a string left: (xsubstring "abcdef" 2) => "cdefab"
|
||||
- To rotate a string right: (xsubstring "abcdef" -2) => "efabcd"
|
||||
- To replicate a string: (xsubstring "abc" 0 7) => "abcabca"
|
||||
|
||||
Note that
|
||||
- The FROM/TO indices give a half-open range -- the characters from
|
||||
index FROM up to, but not including, index TO.
|
||||
- The FROM/TO indices are not in terms of the index space for string S.
|
||||
They are in terms of the replicated index space of the substring
|
||||
defined by S, START, and END.
|
||||
|
||||
It is an error if START=END -- although this is allowed by special
|
||||
dispensation when FROM=TO.
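
The replicated index space amounts to reducing each index modulo the
substring length. The following is a minimal sketch of that idea, not
the library's code: SKETCH-XSUBSTRING is a hypothetical name, and all of
FROM, TO, START and END are required here rather than optional.

```scheme
;; Character at replicated index I is S[START + (I mod (END - START))].
;; MODULO takes the sign of its divisor, so negative indices wrap
;; around correctly.
(define (sketch-xsubstring s from to start end)
  (let ((slen (- end start))
        (result (make-string (- to from))))
    (let lp ((i from) (j 0))
      (if (< i to)
          (begin
            (string-set! result j
                         (string-ref s (+ start (modulo i slen))))
            (lp (+ i 1) (+ j 1)))
          result))))

(sketch-xsubstring "abcdef" 2 8 0 6)    ; => "cdefab"
(sketch-xsubstring "abcdef" -2 4 0 6)   ; => "efabcd"
(sketch-xsubstring "abc" 0 7 0 3)       ; => "abcabca"
(sketch-xsubstring "abcdefg" -1 2 3 6)  ; => "fde"
```

When FROM=TO the loop body never runs, so the sketch returns the empty
string without ever dividing by a zero substring length, which matches
the special dispensation described above.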
|
||||
|
||||
string-xcopy! target tstart s sfrom [sto start end] -> unspecific
|
||||
Exactly the same as XSUBSTRING, but the extracted text is written
|
||||
into the string TARGET starting at index TSTART.
|
||||
This operation is not defined if (EQ? TARGET S) -- you cannot copy
|
||||
a string on top of itself.
|
||||
|
||||
|
||||
* Lower-level procedures
|
||||
|
||||
The following procedures are useful for writing other string-processing
|
||||
functions, and are contained in the string-lib-internals package.
|
||||
|
||||
parse-start+end proc s args -> [start end rest]
|
||||
parse-final-start+end proc s args -> [start end]
|
||||
PARSE-START+END may be used to parse a pair of optional START/END arguments
|
||||
from an argument list, defaulting them to 0 and the length of some string
|
||||
S, respectively. Let the length of string S be SLEN.
|
||||
- If ARGS = (), the function returns (values 0 slen '())
|
||||
- If ARGS = (i), I is checked to ensure it is an integer, and
|
||||
that 0 <= i <= slen. Returns (values i slen (cdr args)).
|
||||
- If ARGS = (i j ...), I and J are checked to ensure they are
|
||||
integers, and that 0 <= i <= j <= slen. Returns (values i j (cddr args)).
|
||||
If any of the checks fail, an error condition is raised, and PROC is used
|
||||
as part of the error condition -- it should be the name of the client
|
||||
procedure whose argument list PARSE-START+END is parsing.
|
||||
|
||||
parse-final-start+end is exactly the same, except that the args list
|
||||
passed to it is required to be of length two or less; if it is longer,
|
||||
an error condition is raised. It may be used when the optional START/END
|
||||
parameters are final arguments to the procedure.
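
The three cases above can be sketched as follows. This is an
illustration under assumptions, not the library's code: the name
SKETCH-PARSE-START+END is hypothetical, and ERROR is assumed to be
available (as in most implementations) as a stand-in for however the
library actually raises its error condition.

```scheme
(define (sketch-parse-start+end proc s args)
  (let ((slen (string-length s)))
    ;; Signal an error, naming the client procedure, unless I is an
    ;; integer in the closed range [LO,HI].
    (define (check i lo hi)
      (if (not (and (integer? i) (<= lo i hi)))
          (error "Illegal substring index" proc i)))
    (cond ((null? args)
           (values 0 slen '()))
          ((null? (cdr args))
           (check (car args) 0 slen)
           (values (car args) slen '()))
          (else
           (check (car args) 0 slen)
           (check (cadr args) (car args) slen)
           (values (car args) (cadr args) (cddr args))))))

;; A client collects the three results with CALL-WITH-VALUES:
(call-with-values
  (lambda () (sketch-parse-start+end 'my-proc "hello" '(1 3 extra)))
  list)  ; => (1 3 (extra))
```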
|
||||
|
||||
check-substring-spec proc s start end -> unspecific
|
||||
Check values START and END to ensure they specify a valid substring
|
||||
in S. This means that START and END are exact integers, and
|
||||
0 <= START <= END <= (STRING-LENGTH S)
|
||||
If this is not the case, an error condition is raised. PROC is used
|
||||
as part of error condition, and should be the procedure whose START/END
|
||||
parameters we are checking.
|
||||
|
||||
make-kmp-restart-vector s c= -> vector
|
||||
Build the Knuth-Morris-Pratt "restart vector," which is useful
|
||||
for quickly searching character sequences for the occurrence of
|
||||
string S. C= is a character-equality function used to construct
|
||||
the restart vector; it is usefully CHAR=? or CHAR-CI=?.
|
||||
|
||||
The definition of the restart vector RV for string S is:
|
||||
If we have matched chars 0..i-1 of S against some search string SS, and
|
||||
S[i] doesn't match SS[k], then reset i := RV[i], and try again to
|
||||
match SS[k]. If RV[i] = -1, then punt SS[k] completely, and move on to
|
||||
SS[k+1] and S[0].
|
||||
|
||||
In other words, if you have matched the first i chars of S, but
|
||||
the i+1'th char doesn't match, RV[i] tells you what the next-longest
|
||||
prefix of S is that you have matched.
|
||||
|
||||
The following string-search function shows how a restart vector
|
||||
is used to search. It can be easily adapted to search other character
|
||||
sequences (such as ports).
|
||||
|
||||
(define (find-substring pattern source start end)
  (let ((plen (string-length pattern))
        (rv (make-kmp-restart-vector pattern char=?)))

    ;; The search loop. SJ & PJ are redundant state.
    (let lp ((si start) (pi 0)
             (sj (- end start))   ; (- end si)  -- how many chars left.
             (pj plen))           ; (- plen pi) -- how many chars left.

      (if (= pi plen) (- si plen)                         ; Win.

          (and (<= pj sj)                                 ; Lose.

               (if (char=? (string-ref source si)         ; Search.
                           (string-ref pattern pi))
                   (lp (+ 1 si) (+ 1 pi) (- sj 1) (- pj 1)) ; Advance.

                   (let ((pi (vector-ref rv pi)))         ; Retreat.
                     (if (= pi -1)
                         (lp (+ si 1) 0 (- sj 1) plen)    ; Punt.
                         (lp si pi sj (- plen pi))))))))))
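
For readers who want to see how a vector with the properties described
above might be built, here is a minimal sketch. It is not the library's
MAKE-KMP-RESTART-VECTOR: the name SKETCH-KMP-RESTART-VECTOR is
hypothetical, and the sketch builds the plain, unoptimised table in
which RV[0] is -1 and RV[i] is the length of the longest proper prefix
of S[0..i) that is also a suffix of it.

```scheme
(define (sketch-kmp-restart-vector pattern c=)
  (let* ((plen (string-length pattern))
         (rv (make-vector plen -1)))
    ;; Invariant: K is the restart value already stored for index I,
    ;; i.e. pattern[0..k) is the longest border of pattern[0..i).
    (let lp ((i 0) (k -1))
      (cond ((>= i (- plen 1)) rv)
            ((or (= k -1)
                 (c= (string-ref pattern i) (string-ref pattern k)))
             ;; The border extends by one character (or restarts at 0).
             (vector-set! rv (+ i 1) (+ k 1))
             (lp (+ i 1) (+ k 1)))
            ;; Fall back to the next-shorter border and retry.
            (else (lp i (vector-ref rv k)))))))

(sketch-kmp-restart-vector "abab" char=?)  ; => #(-1 0 0 1)
(sketch-kmp-restart-vector "aaaa" char=?)  ; => #(-1 0 1 2)
(sketch-kmp-restart-vector "abc" char=?)   ; => #(-1 0 0)
```

Only index 0 holds -1 in this sketch, so the "punt" branch of the search
loop fires exactly when the first pattern character fails to match.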
|