_XML_ Library
=============

Files: xml.ss xmlr.ss xmls.ss
Signature: xml^

Basic XML Data Types
====================

Document:
  This structure represents an XML document.  The only useful part is
  the document-element, which contains all the content.  The rest of
  the structure contains DTD information, which isn't supported,
  and processing-instructions.

Element:
  Each pair of start/end tags and everything in between is an element.
  It has the following pieces:
    a name
    attributes
    contents including sub-elements

Xexpr:
  S-expression representations of XML data.

The end of this document has more details.

Functions
=========

> read-xml : [Input-port] -> Document
  reads in an XML document from the given or current input port.
  XML documents contain exactly one element.  It throws an xml-read:error
  if there isn't any element or if there is more than one element.

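A minimal sketch of read-xml, assuming a Racket installation in which this
library loads with (require xml); the input strings are made up for
illustration.  It reads from a string port, then checks that a second
top-level element is rejected.  The sketch catches any exn:fail instead of
naming the specific exception, since the exception type may differ between
versions.

```racket
#lang racket/base
(require xml)   ; assumption: the library is available under this name

;; Exactly one root element: the text parses into a document structure.
(define doc (read-xml (open-input-string "<note><to>Ann</to></note>")))
(define root (document-element doc))
(define root-name (element-name root))   ; the root element's name, a symbol

;; Two root elements: read-xml is expected to raise an error.
(define second-root-rejected?
  (with-handlers ([exn:fail? (lambda (e) #t)])
    (read-xml (open-input-string "<a></a><b></b>"))
    #f))
```
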
> write-xml : Document [Output-port] -> Void
  writes a document to the given or current output port, currently
  ignoring everything except the document's root element.

> write-xml/content : Content [Output-port] -> Void
  writes a document's contents to the given or current output port

> display-xml : Document [Output-port] -> Void
  just like write-xml, but newlines and indentation make the output more
  readable, though less technically correct when white space is
  significant.

> display-xml/content : Content [Output-port] -> Void
  just like write-xml/content, but with indentation and newlines

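The difference between the write- and display- functions can be seen by
capturing their output in string ports.  This is a sketch only: it assumes
the library loads with (require xml), and capture is a hypothetical helper
defined here, not part of the library.

```racket
#lang racket/base
(require xml)   ; assumption: the library is available under this name

(define doc
  (read-xml (open-input-string
             "<list><item>one</item><item>two</item></list>")))

;; capture (hypothetical helper): run a writer against a string port
;; and return whatever it printed.
(define (capture writer v)
  (define out (open-output-string))
  (writer v out)
  (get-output-string out))

(define compact (capture write-xml doc))                             ; takes a Document
(define content (capture write-xml/content (document-element doc)))  ; takes Content
(define pretty  (capture display-xml doc))                           ; adds newlines/indentation
```

Printing compact and pretty side by side shows the added line breaks; the
compact form is the one to use when white space is significant.
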
> xml->xexpr : Content -> Xexpr
  converts the interesting part of an XML document into an Xexpression

> xexpr->xml : Xexpr -> Content
  converts an Xexpression into the interesting part of an XML document

> xexpr->string : Xexpr -> String
  converts an Xexpression into a string representation

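The three conversions fit together as a round trip.  The sketch below
assumes the library loads with (require xml) and uses a made-up input
string.

```racket
#lang racket/base
(require xml)   ; assumption: the library is available under this name

(define doc
  (read-xml (open-input-string
             "<p class=\"intro\">Hello <b>world</b></p>")))

;; XML structures -> Xexpr
(define x (xml->xexpr (document-element doc)))

;; Xexpr -> XML structures -> text
(define out (open-output-string))
(write-xml/content (xexpr->xml x) out)
(define round-trip (get-output-string out))

;; Xexpr -> text in one step
(define direct
  (xexpr->string '(p ((class "intro")) "Hello " (b "world"))))
```
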
> eliminate-whitespace : (listof Symbol) (Bool -> Bool) -> Element -> Element
  Some elements should not contain any text, only other tags, except they
  often contain whitespace for formatting purposes.  Given a list of tag names
  and the identity function, eliminate-whitespace produces a function that
  filters out pcdata consisting solely of whitespace from those elements and
  raises an error if any non-whitespace text appears.  Passing in the function
  called "not" instead of the identity function filters all elements which are
  not named in the list.  Using void filters all elements regardless of the
  list.

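As an illustration of the identity-function case, the sketch below strips
the formatting whitespace from a <ul> element while leaving the text inside
<li> alone.  It assumes the library loads with (require xml); strip-ul is a
name chosen for this example.

```racket
#lang racket/base
(require xml)   ; assumption: the library is available under this name

(define doc
  (read-xml (open-input-string
             "<ul>\n  <li>one</li>\n  <li>two</li>\n</ul>")))

;; Only elements named ul are filtered; the identity function means
;; "filter exactly the elements in the list".
(define strip-ul (eliminate-whitespace '(ul) (lambda (x) x)))

(define before (xml->xexpr (document-element doc)))            ; whitespace strings present
(define after  (xml->xexpr (strip-ul (document-element doc)))) ; whitespace-only pcdata gone
```
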
Parameters
==========

> empty-tag-shorthand : 'always | 'never | (listof Symbol)
  Default: 'always
  This determines if the output functions should use the <empty/> tag
  notation instead of writing <empty></empty>.  The first form is the
  preferred XML notation.  However, most browsers designed for HTML
  will only properly render XHTML if the document uses a mixture of the
  two formats.  _html-empty-tags_ contains the W3 consortium's
  recommended list of XHTML tags that should use the shorthand.

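A sketch of the three settings, assuming the library loads with
(require xml).  render is a hypothetical helper; the exact spelling of the
shorthand (with or without a space before the slash) may vary between
versions, so the comments describe it only loosely.

```racket
#lang racket/base
(require xml)   ; assumption: the library is available under this name

;; render (hypothetical helper): print the same Xexpr under a given setting.
(define (render mode)
  (parameterize ([empty-tag-shorthand mode])
    (xexpr->string '(p (br) (span)))))

(define never  (render 'never))          ; both empty elements written as open/close pairs
(define always (render 'always))         ; both empty elements use the shorthand
(define mixed  (render html-empty-tags)) ; br uses the shorthand, span does not
```
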
> collapse-whitespace : Bool
  Default: #f
  All consecutive whitespace is replaced by a single space.
  CDATA sections are not affected.

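A sketch of the effect, assuming the library loads with (require xml) and
that the parameter is consulted while reading, which is why it is set
around read-xml here.  read-p is a hypothetical helper.

```racket
#lang racket/base
(require xml)   ; assumption: the library is available under this name

;; read-p (hypothetical helper): read the same text with the parameter
;; set to the given value and return the root element as an Xexpr.
(define (read-p collapse?)
  (parameterize ([collapse-whitespace collapse?])
    (xml->xexpr
     (document-element
      (read-xml (open-input-string "<p>a   b\n\n  c</p>"))))))

(define kept      (read-p #f))   ; runs of whitespace preserved
(define collapsed (read-p #t))   ; each run replaced by one space
```
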
> trim-whitespace : Bool
  This parameter no longer exists.  Consider using collapse-whitespace
  and eliminate-whitespace instead.

> read-comments : Bool
  Default: #f
  Comments, by definition, should be ignored by programs.  However,
  interoperating with ad hoc extensions to other languages sometimes
  requires processing comments anyway.

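A sketch of reading the same text with and without comments, assuming the
library loads with (require xml).  read-content is a hypothetical helper
that returns the root element's content list.

```racket
#lang racket/base
(require xml)   ; assumption: the library is available under this name

;; read-content (hypothetical helper): read with read-comments set to
;; keep? and return the content list of the root element.
(define (read-content keep?)
  (parameterize ([read-comments keep?])
    (element-content
     (document-element
      (read-xml (open-input-string "<a><!-- note -->text</a>"))))))

(define without-comments (read-content #f))   ; only the pcdata remains
(define with-comments    (read-content #t))   ; a comment structure comes first
```
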
> xexpr-drop-empty-attributes : Bool
  Default: #f
  It's easier to write functions processing Xexpressions if they always
  have a list of attributes.  On the other hand, it's less cumbersome to
  write Xexpressions by hand without empty lists of attributes
  everywhere.  Normally xml->xexpr leaves in empty attribute lists.
  Setting this parameter to #t drops them, so further editing the
  Xexpression by hand is less annoying.

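The two output shapes look like this in a small sketch, assuming the
library loads with (require xml):

```racket
#lang racket/base
(require xml)   ; assumption: the library is available under this name

(define root
  (document-element (read-xml (open-input-string "<a><b>x</b></a>"))))

;; Default: every element carries an attribute list, even an empty one.
(define full (xml->xexpr root))

;; With the parameter set, empty attribute lists are dropped.
(define short
  (parameterize ([xexpr-drop-empty-attributes #t])
    (xml->xexpr root)))
```
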
Examples
========

Reading an Xexpression:
  (xml->xexpr (document-element (read-xml input-port)))

Writing an Xexpression:
  (empty-tag-shorthand html-empty-tags)
  (write-xml/content (xexpr->xml `(html (head (title ,banner))
                                        (body ((bgcolor "white"))
                                              ,text)))
                     output-port)

What this Library Doesn't Provide
=================================

  Document Type Declaration (DTD) processing
  Validation
  Expanding user-defined entities
  Reading user-defined entities in attributes
  Unicode support

XML Datatype Details
====================

Note: Users of the XML collection don't need to know most of these definitions.

Note: Xexpr is the only important one to understand.  Even then,
      Processing-instructions may be ignored.

> Xexpr ::= String
         |  (list* Symbol (listof (list Symbol String)) (listof Xexpr))
         |  (cons Symbol (listof Xexpr)) ;; an element with no attributes
         |  Symbol ;; symbolic entities such as &nbsp;
         |  Number ;; numeric entities like &#20;
         |  Misc

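One Xexpr can exercise most of the alternatives in this grammar.  The
sketch below assumes the library loads with (require xml) and that it
exports the predicate xexpr?, as current Racket versions do; the element
and attribute names are made up.

```racket
#lang racket/base
(require xml)   ; assumption: the library is available under this name

(define page
  '(p ((class "note"))   ; element with an attribute list
      "caf"              ; String
      eacute             ; Symbol: a symbolic entity
      160                ; Number: a numeric entity
      (b "bold")))       ; element with no attributes

(define ok?  (xexpr? page))
(define text (xexpr->string page))
```
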
> Document ::= (make-document Prolog Element (listof Processing-instruction))
  (define-struct document (prolog element misc))

> Prolog ::= (make-prolog (listof Misc) #f)
  (define-struct prolog (misc dtd))

> Element ::= (make-element Location Location
                            Symbol
                            (listof Attribute)
                            (listof Content))
  (define-struct (element struct:source) (name attributes content))

> Attribute ::= (make-attribute Location Location Symbol String)
  (define-struct (attribute struct:source) (name value))

> Content ::= Pcdata
           |  Element
           |  Entity
           |  Misc

  Misc ::= Comment
        |  Processing-instruction

> Pcdata ::= (make-pcdata Location Location String)
  (define-struct (pcdata struct:source) (string))

> Entity ::= (make-entity (U Nat Symbol))
  (define-struct entity (text))

> Processing-instruction ::= (make-pi Location Location String (list String))
  (define-struct (pi struct:source) (target-name instruction))

> Comment ::= (make-comment String)
  (define-struct comment (text))

  Source ::= (make-source Location Location)
  (define-struct source (start stop))

  Location ::= Nat
            |  Symbol
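
Most programs work with Xexprs, but the structures can also be built by
hand.  The sketch below uses only make-element, make-attribute, and
make-pcdata, with a symbol as the Location in each; it assumes the library
loads with (require xml) and exports these constructors under the names
listed above.  The location symbol and element names are made up.

```racket
#lang racket/base
(require xml)   ; assumption: the library is available under this name

(define here 'sketch)   ; a Location may be a symbol

(define el
  (make-element here here
                'greeting
                (list (make-attribute here here 'lang "en"))
                (list (make-pcdata here here "hi"))))

;; Accessors come from the struct definitions; start is inherited from source.
(define start    (source-start el))
(define as-xexpr (xml->xexpr el))
```
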