sunet/scheme/xml/doc.txt

2001-10-29 03:48:42 -05:00
_XML_ Library
=============
Files: xml.ss xmlr.ss xmls.ss
Signature: xml^
Basic XML Data Types
====================
Document:
This structure represents an XML document. The only useful part is
the document-element, which contains all the content. The rest of
the structure contains DTD information, which isn't supported,
and processing-instructions.
Element:
Each pair of start/end tags and everything in between is an element.
It has the following pieces:
a name
attributes
contents including sub-elements
Xexpr:
S-expression representations of XML data.
The end of this document has more details.
Functions
=========
> read-xml : [Input-port] -> Document
reads an XML document from the given or current input port.
An XML document contains exactly one top-level element; read-xml throws
an xml-read:error if there is no element or more than one.
2001-12-14 09:09:35 -05:00
Malformed XML is reported with source locations of
the form `l.c/o', where l is the line number, c is
the column number, and o is the number of characters
from the beginning of the file.
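As a minimal sketch of read-xml in use, the following reads a document from a string port and picks out its root element. The require form is an assumption (modelled on the plist.ss load line later in this file), and the sample markup is made up for illustration.

```scheme
;; Assumption: the library loads as (lib "xml.ss" "xml").
(require (lib "xml.ss" "xml"))

;; A string port stands in for a file; the markup is illustrative only.
(define sample-port
  (open-input-string "<note><to>Ann</to><body>Hello</body></note>"))

;; read-xml returns a Document; document-element selects its root Element.
(define root (document-element (read-xml sample-port)))

;; element-name is the selector generated by the element structure.
(define root-name (element-name root))
```

Here root-name should be the symbol note, since that is the name of the single top-level element.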
> write-xml : Document [Output-port] -> Void
writes a document to the given or current output port, currently
ignoring everything except the document's root element.
> write-xml/content : Content [Output-port] -> Void
writes a document's contents to the given or current output port
> display-xml : Document [Output-port] -> Void
just like write-xml, but newlines and indentation make the output more
readable, though less technically correct when white space is
significant.
> display-xml/content : Content [Output-port] -> Void
just like write-xml/content, but with indentation and newlines
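The output functions can be compared by pointing them at a string port. This is only a sketch: the require form is assumed, capture is a hypothetical helper, and the sample Xexpr is invented.

```scheme
;; Assumption: the library loads as (lib "xml.ss" "xml").
(require (lib "xml.ss" "xml"))

(define content (xexpr->xml '(items (item "one") (item "two"))))

;; capture (hypothetical helper) collects whatever a writer sends to its port.
(define (capture writer)
  (let ((out (open-output-string)))
    (writer content out)
    (get-output-string out)))

;; Same content, written without and with added newlines and indentation.
(define compact (capture write-xml/content))
(define pretty (capture display-xml/content))
```

Prefer the write- functions when whitespace is significant, because the display- functions insert whitespace of their own.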
> xml->xexpr : Content -> Xexpr
converts the interesting part of an XML document into an Xexpression
> xexpr->xml : Xexpr -> Content
converts an Xexpression into the interesting part of an XML document
> xexpr->string : Xexpr -> String
converts an Xexpression into a string representation
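The three conversions fit together roughly as follows. This is a sketch under the assumption that the library loads as shown; the Xexpr itself is made up.

```scheme
;; Assumption: the library loads as (lib "xml.ss" "xml").
(require (lib "xml.ss" "xml"))

;; A hand-written Xexpr: element name, attribute list, then contents.
(define page
  '(p ((class "intro")) "Hello, " (em "world") "!"))

;; Xexpr -> Content, then Content -> Xexpr again.
(define content (xexpr->xml page))
(define back (xml->xexpr content))

;; Xexpr -> String; the exact text depends on empty-tag-shorthand.
(define text (xexpr->string page))
```

The round-tripped value need not be equal? to the original: xml->xexpr normally fills in empty attribute lists, so (em "world") is expected to come back with an explicit () after the tag name.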
> eliminate-whitespace : (listof Symbol) (Bool -> Bool) -> Element -> Element
Some elements should not contain any text, only other tags, yet they
often contain whitespace for formatting purposes. Given a list of tag names
and the identity function, eliminate-whitespace produces a function that
filters out pcdata consisting solely of whitespace from those elements and
raises an error if any non-whitespace text appears. Passing in the function
called "not" instead of the identity function filters all elements that are not
named in the list. Using void filters all elements regardless of the list.
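A small sketch of the identity-function case described above, assuming the library loads as shown. The markup and the name strip-ul are illustrative.

```scheme
;; Assumption: the library loads as (lib "xml.ss" "xml").
(require (lib "xml.ss" "xml"))

;; The spaces between the tags are read as whitespace-only pcdata.
(define root
  (document-element
   (read-xml (open-input-string "<ul> <li>one</li> <li>two</li> </ul>"))))

;; Filter whitespace-only pcdata out of ul elements only.
(define strip-ul (eliminate-whitespace '(ul) (lambda (x) x)))
(define cleaned (strip-ul root))

;; After filtering, only the two li elements should remain as content.
(define remaining (length (element-content cleaned)))
```

Replacing (lambda (x) x) with not would instead filter every element except ul.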
Parameters
==========
> empty-tag-shorthand : 'always | 'never | (listof Symbol)
Default: 'always
This determines if the output functions should use the <empty/> tag
notation instead of writing <empty></empty>. The first form is the
preferred XML notation. However, most browsers designed for HTML
will only properly render XHTML if the document uses a mixture of the
two formats. _html-empty-tags_ contains the W3 consortium's
recommended list of XHTML tags that should use the shorthand.
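The effect of this parameter can be seen by writing the same content under two settings. A sketch only: the require form is assumed, render is a hypothetical helper, and the snippet is invented.

```scheme
;; Assumption: the library loads as (lib "xml.ss" "xml").
(require (lib "xml.ss" "xml"))

(define snippet '(p "line one" (br) "line two" (span)))

;; render (hypothetical helper) writes an Xexpr to a string.
(define (render xexpr)
  (let ((out (open-output-string)))
    (write-xml/content (xexpr->xml xexpr) out)
    (get-output-string out)))

;; 'always abbreviates every empty element.
(define all-short
  (parameterize ((empty-tag-shorthand 'always))
    (render snippet)))

;; html-empty-tags limits the shorthand to the listed tags, so the empty
;; span should be written with separate start and end tags.
(define html-short
  (parameterize ((empty-tag-shorthand html-empty-tags))
    (render snippet)))
```

Using parameterize rather than setting the parameter globally keeps the choice local to one piece of output.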
> collapse-whitespace : Bool
Default: #f
All consecutive whitespace is replaced by a single space.
CDATA sections are not affected.
> trim-whitespace : Bool
This parameter no longer exists. Consider using collapse-whitespace
and eliminate-whitespace instead.
> read-comments : Bool
Default: #f
Comments, by definition, should be ignored by programs. However,
interoperating with ad hoc extensions to other languages sometimes
requires processing comments anyway.
> xexpr-drop-empty-attributes : Bool
Default: #f
It's easier to write functions that process Xexpressions if they always
have a list of attributes. On the other hand, it's less cumbersome to
write Xexpressions by hand without empty lists of attributes
everywhere. Normally xml->xexpr leaves in empty attribute lists.
Setting this parameter to #t drops them, so further editing the
Xexpression by hand is less annoying.
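To illustrate the difference, the same content can be converted under both settings. This is a sketch assuming the library loads as shown; the sample content is made up.

```scheme
;; Assumption: the library loads as (lib "xml.ss" "xml").
(require (lib "xml.ss" "xml"))

(define content (xexpr->xml '(p "hi" (br))))

;; Default setting: every element carries an attribute list, even if empty.
(define with-empty (xml->xexpr content))

;; With the parameter set to #t, empty attribute lists are dropped.
(define without-empty
  (parameterize ((xexpr-drop-empty-attributes #t))
    (xml->xexpr content)))
```

Going by the description above, with-empty should look like (p () "hi" (br ())) and without-empty like (p "hi" (br)).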
Examples
========
Reading an Xexpression:
(xml->xexpr (document-element (read-xml input-port)))
Writing an Xexpression:
(empty-tag-shorthand html-empty-tags)
(write-xml/content (xexpr->xml `(html (head (title ,banner))
                                      (body ((bgcolor "white"))
                                            ,text)))
                   output-port)
What this Library Doesn't Provide
=================================
Document Type Declaration (DTD) processing
Validation
Expanding user-defined entities
Reading user-defined entities in attributes
Unicode support
XML Datatype Details
====================
Note: Users of the XML collection don't need to know most of these definitions.
Note: Xexpr is the only important one to understand. Even then,
Processing-instructions may be ignored.
> Xexpr = String
        | (list* Symbol (listof (list Symbol String)) (listof Xexpr))
        | (cons Symbol (listof Xexpr)) ;; an element with no attributes
        | Symbol ;; symbolic entities such as &nbsp;
        | Number ;; numeric entities like &#20;
        | Misc
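Code that walks an Xexpr has to handle both element forms, with and without an attribute list. The following sketch collects the text of an Xexpr; the helper names attribute-list?, xexpr-children and xexpr-text are hypothetical and not part of the library.

```scheme
;; An attribute list is a list whose members are all (Symbol String) lists.
(define (attribute-list? x)
  (and (list? x)
       (andmap (lambda (a)
                 (and (list? a)
                      (= (length a) 2)
                      (symbol? (car a))
                      (string? (cadr a))))
               x)))

;; Contents of an element Xexpr, skipping the attribute list if present.
(define (xexpr-children x)
  (let ((rest (cdr x)))
    (if (and (pair? rest) (attribute-list? (car rest)))
        (cdr rest)
        rest)))

;; Concatenate all strings; entities and Misc items contribute nothing.
(define (xexpr-text x)
  (cond ((string? x) x)
        ((pair? x)
         (apply string-append (map xexpr-text (xexpr-children x))))
        (else "")))

(define greeting
  (xexpr-text '(p ((class "intro")) "Hello, " (em "world") nbsp "!")))
;; greeting is "Hello, world!"
```

A child element such as (em "world") is never mistaken for an attribute list, because its first member is a symbol rather than a list.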
> Document = (make-document Prolog Element (listof Processing-instruction))
(define-struct document (prolog element misc))
> Prolog = (make-prolog (listof Misc) Document-type [Misc ...])
(define-struct prolog (misc dtd misc2))
The last field is a (listof Misc), but the maker accepts optional
arguments instead for backwards compatibility.
> Document-type = #f | (make-document-type Symbol External-dtd #f)
(define-struct document-type (name external inlined))
> External-dtd = (make-external-dtd/public str str)
               | (make-external-dtd/system str)
               | #f
(define-struct external-dtd (system))
(define-struct (external-dtd/public external-dtd) (public))
(define-struct (external-dtd/system external-dtd) ())
> Element = (make-element Location Location
                          Symbol
                          (listof Attribute)
                          (listof Content))
(define-struct (element struct:source) (name attributes content))
> Attribute = (make-attribute Location Location Symbol String)
(define-struct (attribute struct:source) (name value))
> Content = Pcdata
          | Element
          | Entity
          | Misc
Misc = Comment
     | Processing-instruction
> Pcdata = (make-pcdata Location Location String)
(define-struct (pcdata struct:source) (string))
> Entity = (make-entity (U Nat Symbol))
(define-struct entity (text))
> Processing-instruction = (make-pi Location Location String (list String))
(define-struct (pi struct:source) (target-name instruction))
> Comment = (make-comment String)
(define-struct comment (text))
Source = (make-source Location Location)
(define-struct source (start stop))
Location = Nat
         | Symbol
The PList Library
=================
Files: plist.ss
The PList library provides the ability to read and write XML documents that
conform to the "plist" DTD, used to store 'dictionaries' of string-value
associations.
To Load
=======
(require (lib "plist.ss" "xml"))
Functions
=========
> read-plist : Port -> PLDict
reads a plist from a port, and produces a 'dict' x-expression
> write-plist : PLDict Port -> Void
writes a plist to the given port. May raise the exn:application:type
exception if the plist is badly formed.
Datatypes
=========
NB: all of these are subtypes of x-expression:
> PLDict = (list 'dict PLAssoc-pair ...)
> PLAssoc-pair = (list 'assoc-pair String PLValue)
> PLValue = String
          | (list 'true)
          | (list 'false)
          | (list 'integer Integer)
          | (list 'real Real)
          | PLDict
          | PLArray
> PLArray = (list 'array PLValue ...)
In fact, the PList DTD also defines Data and Date types, but we're ignoring
these for the moment.
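Because these datatypes are ordinary lists, a PLDict can be searched with plain list operations. The lookup below is a sketch; plist-ref is a hypothetical helper, not something plist.ss provides, and the settings dictionary is invented.

```scheme
;; plist-ref (hypothetical): return the PLValue stored under key, or #f.
(define (plist-ref dict key)
  (let loop ((pairs (cdr dict)))          ; skip the leading 'dict tag
    (cond ((null? pairs) #f)
          ((string=? (cadr (car pairs)) key)
           (caddr (car pairs)))           ; third item of the assoc-pair
          (else (loop (cdr pairs))))))

(define settings
  '(dict (assoc-pair "name" "demo")
         (assoc-pair "size" (integer 14))
         (assoc-pair "flags" (array (true) (false)))))

(define size-value (plist-ref settings "size"))       ; (integer 14)
(define missing-value (plist-ref settings "missing")) ; #f
```

Returning #f for a missing key is unambiguous here, since no PLValue is ever #f; booleans are written as (true) and (false).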
Examples
========
Here's a sample PLDict:
(define my-dict
  `(dict (assoc-pair "first-key"
                     "just a string
                      with some whitespace in it")
         (assoc-pair "second-key"
                     (false))
         (assoc-pair "third-key"
                     (dict))
         (assoc-pair "fourth-key"
                     (dict (assoc-pair "inner-key"
                                       (real 3.432))))
         (assoc-pair "fifth-key"
                     (array (integer 14)
                            "another string"
                            (true)))
         (assoc-pair "sixth-key"
                     (array))))
Let's write it to disk:
(call-with-output-file "/Users/clements/tmp.plist"
  (lambda (port)
    (write-plist my-dict port))
  'truncate)
Let's read it back from the disk:
(define new-dict
  (call-with-input-file "/Users/clements/tmp.plist"
    (lambda (port)
      (read-plist port))))