Redesign the API for version 0.2

This commit is contained in:
Lassi Kortela 2021-09-05 12:50:52 +03:00
parent f89ae02193
commit 1be0d5f82f
9 changed files with 295 additions and 183 deletions

View File

@ -14,47 +14,104 @@ Pandoc can convert documents between several markup languages
uniform syntax tree. This egg supplies JSON and SXML versions of the
syntax tree.
=== Documentation
Pandoc can be called the following ways:
<parameter>(pandoc-command-line [string-list])</parameter>
* The official {{pandoc}} command.
* The unofficial {{pandoc-tar}} command.
* The unofficial {{pandoc-server}} via HTTP.
This parameter lets the user customize the command line that is given
to the operating system to run Pandoc. All Pandoc invocations start
with this command line. The default is {{'("pandoc")}}.
=== The (pandoc) library
Treat this parameter like RnRS {{command-line}}: shell syntax cannot
be used, and command line arguments should not be shell-quoted.
<procedure>(pandoc-bytevectors->json pandoc input-format bytevectors)</procedure>
<procedure>(pandoc-bytevectors->sxml pandoc input-format bytevectors)</procedure>
<procedure>(pandoc-port->json input-format input-port)</procedure>
<procedure>(pandoc-file->json input-format input-filename)</procedure>
<procedure>(pandoc-bytevector->json pandoc input-format bytevector)</procedure>
<procedure>(pandoc-bytevector->sxml pandoc input-format bytevector)</procedure>
These procedures return Pandoc's JSON parse tree. The JSON is decoded
into the canonical Scheme JSON representation used by SRFI 180, the
{{cjson}} and {{medea}} eggs, etc.: JSON arrays become Scheme vectors,
JSON objects become Scheme association lists with symbol keys, and
JSON null becomes the symbol {{'null}}.
<procedure>(pandoc-strings->json pandoc input-format strings)</procedure>
<procedure>(pandoc-strings->sxml pandoc input-format strings)</procedure>
The {{input-format}} argument is a symbol, and is supplied as Pandoc's
{{--from}} argument.
<procedure>(pandoc-string->json pandoc input-format string)</procedure>
<procedure>(pandoc-string->sxml pandoc input-format string)</procedure>
<procedure>(pandoc-files->json pandoc input-format filenames)</procedure>
<procedure>(pandoc-files->sxml pandoc input-format filenames)</procedure>
<procedure>(pandoc-file->json pandoc input-format filename)</procedure>
<procedure>(pandoc-file->sxml pandoc input-format filename)</procedure>
<procedure>(pandoc-port->json pandoc input-format port)</procedure>
<procedure>(pandoc-port->sxml pandoc input-format port)</procedure>
These procedures parse markup from various sources.
The {{pandoc}} argument is the Pandoc endpoint to use: cli, tar, or
server. The next sections explain how to create endpoints.
The {{->json}} procedures return Pandoc's JSON parse tree. The JSON is
decoded into the canonical Scheme JSON representation used by SRFI
180, the {{cjson}} and {{medea}} eggs, etc.: JSON arrays become Scheme
vectors, JSON objects become Scheme association lists with symbol
keys, and JSON null becomes the symbol {{'null}}.
The {{->sxml}} procedures are like their {{->json}} counterparts, but
instead of JSON they return an SXML conversion of Pandoc's parse tree
using HTML tags. The parse tree is easy to turn into HTML using one of
several Scheme libraries, e.g. Chicken's {{sxml-transforms}} egg.
The {{input-format}} argument is a symbol, e.g. {{markdown}} or
{{html}}, and is supplied as Pandoc's {{--from}} argument or its
equivalent for the given endpoint.
An exception is raised if the conversion is not successful.
<procedure>(pandoc-port->sxml input-format input-port)</procedure>
<procedure>(pandoc-file->sxml input-format input-filename)</procedure>
<procedure>(pandoc-json->sxml json)</procedure>
These procedures are like their {{->json}} counterparts, but instead
of JSON they return an SXML conversion of Pandoc's parse tree using
HTML tags. The parse tree is easy to turn into HTML using one of
several Scheme libraries, e.g. Chicken's {{sxml-transforms}} egg.
This is a utility procedure that parses JSON into SXML. You probably
don't need this, but you might, so it's exported anyway.
=== Caveats
=== The (pandoc cli) library
Pandoc can be quite slow, but its work could be easily parallelized by
running one instance of Pandoc per document.
<procedure>(pandoc-cli [command-name])</procedure>
Create a pandoc endpoint using the official {{pandoc}} command line
interface. The default {{command-name}} is {{"pandoc"}}.
Note that documents are converted serially, and a separate Pandoc
instance is launched for each document, making batch conversions slow.
We tried launching several Pandoc instances in parallel, but it
doesn't materially decrease the conversion time. The tar and server
endpoints can avoid this problem by sending an entire batch of
documents to the same Pandoc instance all at once, which is something
that the official CLI does not support.
Try [[https://repology.org/project/pandoc/versions|pandoc at
Repology]] to find a Pandoc package for your operating system.
=== The (pandoc tar) library
<procedure>(pandoc-tar [command-name])</procedure>
Create a pandoc endpoint using the unofficial {{pandoc-tar}} command
line interface. The default {{command-name}} is {{"pandoc-tar"}}.
See the [[https://github.com/lassik/pandoc-tar|pandoc-tar homepage]]
for installation instructions.
=== The (pandoc server) library
<procedure>(pandoc-server base-url)</procedure>
Create a pandoc endpoint using the unofficial {{pandoc-server}} HTTP
REST API. Give a {{base-url}} like {{"http://localhost:8080/"}}.
See the [[https://github.com/jgm/pandoc-server|pandoc-server
homepage]] for installation instructions.
=== Version History
* 0.1: First release
* 0.2: Redo the API. Add tar and server endpoints.
* 0.1: First release.
=== Author

View File

@ -1,38 +1,48 @@
(module pandoc
(pandoc-command-line
(#;export
pandoc-json->sxml
pandoc-port->json
pandoc-port->sxml
pandoc-file->json
pandoc-file->sxml
pandoc-json->sxml)
pandoc-files->json
pandoc-files->sxml
pandoc-bytevector->json
pandoc-bytevector->sxml
pandoc-bytevectors->json
pandoc-bytevectors->sxml
pandoc-string->json
pandoc-string->sxml
pandoc-strings->json
pandoc-strings->sxml)
(import (scheme)
(chicken base)
(only (scheme base)
bytevector
bytevector-append
read-bytevector
string->utf8)
(only (chicken io) read-byte write-byte)
(only (chicken port) copy-port)
(only (chicken port) copy-port with-input-from-string)
(only (chicken process) process process-wait)
(only (scsh-process) run/port)
(only (medea) read-json))
(define (run-read-write/old args input-port read-output)
(receive (from-sub to-sub sub) (process (car args) (cdr args))
(copy-port input-port to-sub read-byte write-byte)
(close-output-port to-sub)
(let ((output (read-output from-sub)))
(receive (sub clean-exit? exit-status) (process-wait sub)
;; Call `process-wait` before closing the last port to avoid
;; triggering the automatic `process-wait` done by `process`
;; when all ports are closed. If we relied on the implicit
;; `process-wait`, we couldn't find out the exit status.
(close-input-port from-sub)
(if (and clean-exit? (eqv? 0 exit-status)) output
(error "Error running" args))))))
(define (pandoc-port->json input-format input-port)
(let ((pandoc (string->symbol (car (pandoc-command-line)))))
(read-json (run/port (,pandoc --from ,input-format --to json)
(= 0 input-port)))))
(define (read-bytevector-all port)
(let loop ((whole (bytevector)))
(let ((part (read-bytevector 1000 port)))
(if (eof-object? part) whole
(loop (bytevector-append whole part))))))
(define (call-with-binary-input-file filename proc)
(let ((port (open-input-file filename #:binary)))
@ -40,6 +50,4 @@
(lambda () (proc port))
(lambda () (close-input-port port)))))
(define pandoc-command-line (make-parameter (list "pandoc")))
(include "pandoc.r5rs.scm"))

19
pandoc.cli.chicken.scm Normal file
View File

@ -0,0 +1,19 @@
(module (pandoc cli)
(#;export
pandoc-cli)
(import (scheme)
(chicken base)
(only (medea) read-json)
(only (scheme base) utf8->string)
(only (scsh-process) run/port))
(define (pandoc-cli #!optional command-name)
(let ((command-name (string->symbol (or command-name "pandoc"))))
(lambda (input-format bytevectors)
(map (lambda (bytevector)
(read-json
(run/port (,command-name --from ,input-format --to json)
(<< ,(utf8->string bytevector)))))
bytevectors)))))

View File

@ -20,6 +20,9 @@
(components
(extension pandoc
(source "pandoc.chicken.scm"))
(extension pandoc.cli
(source "pandoc.cli.chicken.scm")
(component-dependencies pandoc))
(extension pandoc.server
(source "pandoc.server.chicken.scm")
(component-dependencies pandoc))

View File

@ -116,13 +116,63 @@
(assert-supported-version)
(convert-many (vector->list (cdr (assq 'blocks json)))))
(define (pandoc-port->sxml input-format input-port)
(pandoc-json->sxml (pandoc-port->json input-format input-port)))
;;
(define (pandoc-file->json input-format input-filename)
(call-with-binary-input-file
input-filename
(lambda (input-port) (pandoc-port->json input-format input-port))))
(define (pandoc-bytevectors->json pandoc input-format bytevectors)
(pandoc input-format bytevectors))
(define (pandoc-file->sxml input-format input-filename)
(pandoc-json->sxml (pandoc-file->json input-format input-filename)))
(define (pandoc-strings->json pandoc input-format strings)
(pandoc-bytevectors->json
pandoc input-format
(map string->utf8 strings)))
(define (pandoc-files->json pandoc input-format filenames)
(pandoc-bytevectors->json
pandoc input-format
(map (lambda (filename)
(call-with-binary-input-file filename read-bytevector-all))
filenames)))
;;
(define (pandoc-bytevector->json pandoc input-format bytevector)
(car (pandoc-bytevectors->json pandoc input-format (list bytevector))))
(define (pandoc-string->json pandoc input-format string)
(car (pandoc-strings->json pandoc input-format (list string))))
(define (pandoc-file->json pandoc input-format filename)
(car (pandoc-files->json pandoc input-format (list filename))))
;;
(define (pandoc-bytevectors->sxml pandoc input-format bytevectors)
(map pandoc-json->sxml
(pandoc-bytevectors->json pandoc input-format bytevectors)))
(define (pandoc-strings->sxml pandoc input-format strings)
(map pandoc-json->sxml
(pandoc-strings->json pandoc input-format strings)))
(define (pandoc-files->sxml pandoc input-format filenames)
(map pandoc-json->sxml
(pandoc-files->json pandoc input-format filenames)))
;;
(define (pandoc-bytevector->sxml pandoc input-format bytevector)
(car (pandoc-bytevectors->sxml pandoc input-format (list bytevector))))
(define (pandoc-string->sxml pandoc input-format string)
(car (pandoc-strings->sxml pandoc input-format (list string))))
(define (pandoc-file->sxml pandoc input-format filename)
(car (pandoc-files->sxml pandoc input-format (list filename))))
;;
(define (pandoc-port->json pandoc input-format port)
(pandoc-bytevector->json pandoc input-format (read-bytevector-all port)))
(define (pandoc-port->sxml pandoc input-format port)
(pandoc-bytevector->sxml pandoc input-format (read-bytevector-all port)))

View File

@ -6,3 +6,4 @@
(repo git "git://github.com/lassik/scheme-{egg-name}.git")
(uri targz "https://github.com/lassik/scheme-{egg-name}/tarball/{egg-release}")
(release "0.1")
(release "0.2")

View File

@ -1,65 +1,44 @@
(module (pandoc server)
(pandoc-server-base-url
pandoc-server-strings->json
pandoc-server-strings->sxml
pandoc-server-files->json
pandoc-server-files->sxml)
(#;export
pandoc-server)
(import (scheme)
(chicken base)
(cjson)
(only (chicken port) with-input-from-string)
(only (scheme base) utf8->string)
(only (chicken io) read-string)
(only (http-client) with-input-from-request)
(only (intarweb) headers make-request)
(only (medea) read-json write-json)
(only (uri-common) uri-reference)
(only (pandoc) pandoc-json->sxml))
(only (medea) write-json)
(only (uri-common) uri-reference))
(define pandoc-server-base-url
(make-parameter "http://localhost:8080/"))
(define (pandoc-server-strings->json input-format input-strings)
(with-input-from-request
(make-request
method: 'POST
uri: (uri-reference (string-append (pandoc-server-base-url)
"convert-batch"))
headers: (headers '((content-type "application/json")
(accept "application/json"))))
(lambda ()
(let ((input-format (symbol->string input-format)))
(write-json
(list->vector
(map (lambda (input-string)
(list (cons 'from input-format)
(cons 'to "json")
(cons 'text input-string)))
input-strings)))))
(lambda ()
(let ((array (string->cjson (read-string))))
(unless (eq? cjson/array (cjson-type array))
(error "Got unexpected JSON from pandoc-server"))
(let loop ((i (- (cjson-array-size array) 1)) (results '()))
(if (< i 0) results
(loop (- i 1)
(cons (cjson-schemify
(string->cjson
(cjson-schemify
(cjson-array-ref array i))))
results))))))))
(define (pandoc-server-files->json input-format input-filenames)
(pandoc-server-strings->json
input-format
(map (lambda (filename) (with-input-from-file filename read-string))
input-filenames)))
(define (pandoc-server-strings->sxml input-format input-strings)
(map pandoc-json->sxml
(pandoc-server-strings->json input-format input-strings)))
(define (pandoc-server-files->sxml input-format input-filenames)
(map pandoc-json->sxml
(pandoc-server-files->json input-format input-filenames))))
(define (pandoc-server base-url)
(lambda (input-format bytevectors)
(with-input-from-request
(make-request
method: 'POST
uri: (uri-reference (string-append base-url "convert-batch"))
headers: (headers '((content-type "application/json")
(accept "application/json"))))
(lambda ()
(let ((input-format (symbol->string input-format)))
(write-json
(list->vector
(map (lambda (bytevector)
(list (cons 'from input-format)
(cons 'to "json")
(cons 'text (utf8->string bytevector))))
bytevectors)))))
(lambda ()
(let ((array (string->cjson (read-string))))
(unless (eq? cjson/array (cjson-type array))
(error "Got unexpected JSON from pandoc-server"))
(let loop ((i (- (cjson-array-size array) 1)) (results '()))
(if (< i 0) results
(loop (- i 1)
(cons (cjson-schemify
(string->cjson
(cjson-schemify
(cjson-array-ref array i))))
results))))))))))

View File

@ -1,21 +1,45 @@
(define-library (pandoc)
(export pandoc-command-line
pandoc-port->json
pandoc-port->sxml
pandoc-file->json
pandoc-file->sxml
pandoc-json->sxml)
(export
pandoc-json->sxml
pandoc-port->json
pandoc-port->sxml
pandoc-file->json
pandoc-file->sxml
pandoc-files->json
pandoc-files->sxml
pandoc-bytevector->json
pandoc-bytevector->sxml
pandoc-bytevectors->json
pandoc-bytevectors->sxml
pandoc-string->json
pandoc-string->sxml
pandoc-strings->json
pandoc-strings->sxml)
(import (scheme base)
(scheme file)
(scheme write))
(cond-expand
(gauche (import (only (srfi 180) json-read)
(only (gauche base) copy-port)
(only (gauche process) call-with-process-io))))
(begin
(define pandoc-command-line (make-parameter (list "pandoc")))
(define (call-with-binary-input-file filename proc)
(call-with-port (open-binary-input-file filename) proc)))
(gauche (import (only (srfi 180) json-read)
(only (gauche base) copy-port)
(only (gauche process) call-with-process-io))))
(cond-expand
(gauche (include "pandoc.gauche.scm")))
(gauche (include "pandoc.gauche.scm")))
(begin (define inexact->exact exact)
(define (call-with-binary-input-file filename proc)
(call-with-port (open-binary-input-file filename) proc))
(define (read-bytevector-all binary-input-port)
(let loop ((whole (bytevector)))
(let ((part (read-bytevector 1000)))
(if (eof-object? part) whole
(loop (bytevector-append whole part)))))))
(include "pandoc.r5rs.scm"))

View File

@ -1,44 +1,35 @@
(module (pandoc tar)
(pandoc-tar-command
pandoc-tar-strings->json
pandoc-tar-strings->sxml
pandoc-tar-files->json
pandoc-tar-files->sxml)
(#;export
pandoc-tar)
(import (scheme)
(chicken base)
(cjson)
(srfi 4)
(scheme base)
;;(srfi 4)
(only (scheme base)
bytevector
bytevector-append
bytevector-length
bytevector-u8-ref
eof-object
make-bytevector
read-bytevector
string->utf8
truncate-remainder
utf8->string
write-bytevector)
(only (chicken io) read-byte read-string write-byte write-string)
(only (chicken port)
copy-port with-input-from-string with-output-to-string)
(only (chicken process) process process-wait)
(only (scsh-process) run/port run/string)
(only (pandoc) pandoc-json->sxml))
;; (define (eof-object) #!eof)
;; (define (string->utf8 str) str)
;; (define (utf8->string str) str)
;; (define bytevector string)
;; (define (bytevector-append bvs)
;; (let ((target (make-u8vector
;; (define bytevector-length u8vector-length)
;; (define (bytevector-u8-ref s i) (char->integer (string-ref s i)))
;; (define make-bytevector make-u8vector)
;; (define read-bytevector read-string)
;; (define truncate-remainder remainder)
;; (define write-bytevector write-string)
(define (generator->list generator) ; SRFI 158
(let loop ((list '()))
(let ((elem (generator)))
(if (eof-object? elem) (reverse list) (loop (cons elem list))))))
(define pandoc-tar-command
(make-parameter "pandoc-tar"))
(define (bytevector-every? predicate bytes)
(let loop ((i 0))
(or (= i (bytevector-length bytes))
@ -138,36 +129,16 @@
(tar-write-file filename (car inputs))
(loop (+ i 1) (cdr inputs))))))
(define (pandoc-tar-bytevectors->json input-format input-bytevectors)
(let* ((pandoc-tar
(string->symbol (pandoc-tar-command)))
(stdin
(with-output-to-string
(lambda ()
(write-all-to-tar input-bytevectors))))
(stdout
(run/string (,pandoc-tar --from ,input-format --to json)
(<< ,stdin))))
(map (lambda (bytes)
(cjson-schemify (string->cjson (utf8->string bytes))))
(with-input-from-string stdout
(lambda () (generator->list tar-read-file))))))
(define (pandoc-tar-strings->json input-format input-strings)
(pandoc-tar-bytevectors->json
input-format
(map string->utf8 input-strings)))
(define (pandoc-tar-files->json input-format input-filenames)
(pandoc-tar-strings->json
input-format
(map (lambda (filename) (with-input-from-file filename read-string))
input-filenames)))
(define (pandoc-tar-strings->sxml input-format input-strings)
(map pandoc-json->sxml
(pandoc-tar-strings->json input-format input-strings)))
(define (pandoc-tar-files->sxml input-format input-filenames)
(map pandoc-json->sxml
(pandoc-tar-files->json input-format input-filenames))))
(define (pandoc-tar #!optional command-name)
(let ((command-name (string->symbol (or command-name "pandoc-tar"))))
(lambda (input-format bytevectors)
(let* ((stdin
(with-output-to-string
(lambda () (write-all-to-tar bytevectors))))
(stdout
(run/string (,command-name --from ,input-format --to json)
(<< ,stdin))))
(map (lambda (bytes)
(cjson-schemify (string->cjson (utf8->string bytes))))
(with-input-from-string stdout
(lambda () (generator->list tar-read-file)))))))))