Redesign the API for version 0.2

This commit is contained in:
Lassi Kortela 2021-09-05 12:50:52 +03:00
parent f89ae02193
commit 1be0d5f82f
9 changed files with 295 additions and 183 deletions

View File

@ -14,47 +14,104 @@ Pandoc can convert documents between several markup languages
uniform syntax tree. This egg supplies JSON and SXML versions of the uniform syntax tree. This egg supplies JSON and SXML versions of the
syntax tree. syntax tree.
=== Documentation Pandoc can be called the following ways:
<parameter>(pandoc-command-line [string-list])</parameter> * The official {{pandoc}} command.
* The unofficial {{pandoc-tar}} command.
* The unofficial {{pandoc-server}} via HTTP.
This parameter lets the user customize the command line that is given === The (pandoc) library
to the operating system to run Pandoc. All Pandoc invocations start
with this command line. The default is {{'("pandoc")}}.
Treat this parameter like RnRS {{command-line}}: shell syntax cannot <procedure>(pandoc-bytevectors->json pandoc input-format bytevectors)</procedure>
be used, and command line arguments should not be shell-quoted. <procedure>(pandoc-bytevectors->sxml pandoc input-format bytevectors)</procedure>
<procedure>(pandoc-port->json input-format input-port)</procedure> <procedure>(pandoc-bytevector->json pandoc input-format bytevector)</procedure>
<procedure>(pandoc-file->json input-format input-filename)</procedure> <procedure>(pandoc-bytevector->sxml pandoc input-format bytevector)</procedure>
These procedures return Pandoc's JSON parse tree. The JSON is decoded <procedure>(pandoc-strings->json pandoc input-format strings)</procedure>
into the canonical Scheme JSON representation used by SRFI 180, the <procedure>(pandoc-strings->sxml pandoc input-format strings)</procedure>
{{cjson}} and {{medea}} eggs, etc.: JSON arrays become Scheme vectors,
JSON objects become Scheme association lists with symbol keys, and
JSON null becomes the symbol {{'null}}.
The {{input-format}} argument is a symbol, and is supplied as Pandoc's <procedure>(pandoc-string->json pandoc input-format string)</procedure>
{{--from}} argument. <procedure>(pandoc-string->sxml pandoc input-format string)</procedure>
<procedure>(pandoc-files->json pandoc input-format filenames)</procedure>
<procedure>(pandoc-files->sxml pandoc input-format filenames)</procedure>
<procedure>(pandoc-file->json pandoc input-format filename)</procedure>
<procedure>(pandoc-file->sxml pandoc input-format filename)</procedure>
<procedure>(pandoc-port->json pandoc input-format port)</procedure>
<procedure>(pandoc-port->sxml pandoc input-format port)</procedure>
These procedures parse markup from various sources.
The {{pandoc}} argument is the Pandoc endpoint to use: cli, tar, or
server. The next sections explain how to create endpoints.
The {{->json}} procedures return Pandoc's JSON parse tree. The JSON is
decoded into the canonical Scheme JSON representation used by SRFI
180, the {{cjson}} and {{medea}} eggs, etc.: JSON arrays become Scheme
vectors, JSON objects become Scheme association lists with symbol
keys, and JSON null becomes the symbol {{'null}}.
The {{->sxml}} procedures are like their {{->json}} counterparts, but
instead of JSON they return an SXML conversion of Pandoc's parse tree
using HTML tags. The parse tree is easy to turn into HTML using one of
several Scheme libraries, e.g. Chicken's {{sxml-transforms}} egg.
The {{input-format}} argument is a symbol, e.g. {{markdown}} or
{{html}}, and is supplied as Pandoc's {{--from}} argument or its
equivalent for the given endpoint.
An exception is raised if the conversion is not successful. An exception is raised if the conversion is not successful.
<procedure>(pandoc-port->sxml input-format input-port)</procedure> <procedure>(pandoc-json->sxml json)</procedure>
<procedure>(pandoc-file->sxml input-format input-filename)</procedure>
These procedures are like their {{->json}} counterparts, but instead This is a utility procedure that parses JSON into SXML. You probably
of JSON they return an SXML conversion of Pandoc's parse tree using don't need this, but you might, so it's exported anyway.
HTML tags. The parse tree is easy to turn into HTML using one of
several Scheme libraries, e.g. Chicken's {{sxml-transforms}} egg.
=== Caveats === The (pandoc cli) library
Pandoc can be quite slow, but its work could be easily parallelized by <procedure>(pandoc-cli [command-name])</procedure>
running one instance of Pandoc per document.
Create a pandoc endpoint using the official {{pandoc}} command line
interface. The default {{command-name}} is {{"pandoc"}}.
Note that documents are converted serially, and a separate Pandoc
instance is launched for each document, making batch conversions slow.
We tried launching several Pandoc instances in parallel, but it
doesn't materially decrease the conversion time. The tar and server
endpoints can avoid this problem by sending an entire batch of
documents to the same Pandoc instance all at once, which is something
that the official CLI does not support.
Try [[https://repology.org/project/pandoc/versions|pandoc at
Repology]] to find a Pandoc package for your operating system.
=== The (pandoc tar) library
<procedure>(pandoc-tar [command-name])</procedure>
Create a pandoc endpoint using the unofficial {{pandoc-tar}} command
line interface. The default {{command-name}} is {{"pandoc-tar"}}.
See the [[https://github.com/lassik/pandoc-tar|pandoc-tar homepage]]
for installation instructions.
=== The (pandoc server) library
<procedure>(pandoc-server base-url)</procedure>
Create a pandoc endpoint using the unofficial {{pandoc-server}} HTTP
REST API. Give a {{base-url}} like {{"http://localhost:8080/"}}.
See the [[https://github.com/jgm/pandoc-server|pandoc-server
homepage]] for installation instructions.
=== Version History === Version History
* 0.1: First release * 0.2: Redo the API. Add tar and server endpoints.
* 0.1: First release.
=== Author === Author

View File

@ -1,38 +1,48 @@
(module pandoc (module pandoc
(pandoc-command-line (#;export
pandoc-json->sxml
pandoc-port->json pandoc-port->json
pandoc-port->sxml pandoc-port->sxml
pandoc-file->json pandoc-file->json
pandoc-file->sxml pandoc-file->sxml
pandoc-json->sxml)
pandoc-files->json
pandoc-files->sxml
pandoc-bytevector->json
pandoc-bytevector->sxml
pandoc-bytevectors->json
pandoc-bytevectors->sxml
pandoc-string->json
pandoc-string->sxml
pandoc-strings->json
pandoc-strings->sxml)
(import (scheme) (import (scheme)
(chicken base) (chicken base)
(only (scheme base)
bytevector
bytevector-append
read-bytevector
string->utf8)
(only (chicken io) read-byte write-byte) (only (chicken io) read-byte write-byte)
(only (chicken port) copy-port) (only (chicken port) copy-port with-input-from-string)
(only (chicken process) process process-wait) (only (chicken process) process process-wait)
(only (scsh-process) run/port) (only (scsh-process) run/port)
(only (medea) read-json)) (only (medea) read-json))
(define (run-read-write/old args input-port read-output) (define (read-bytevector-all port)
(receive (from-sub to-sub sub) (process (car args) (cdr args)) (let loop ((whole (bytevector)))
(copy-port input-port to-sub read-byte write-byte) (let ((part (read-bytevector 1000 port)))
(close-output-port to-sub) (if (eof-object? part) whole
(let ((output (read-output from-sub))) (loop (bytevector-append whole part))))))
(receive (sub clean-exit? exit-status) (process-wait sub)
;; Call `process-wait` before closing the last port to avoid
;; triggering the automatic `process-wait` done by `process`
;; when all ports are closed. If we relied on the implicit
;; `process-wait`, we couldn't find out the exit status.
(close-input-port from-sub)
(if (and clean-exit? (eqv? 0 exit-status)) output
(error "Error running" args))))))
(define (pandoc-port->json input-format input-port)
(let ((pandoc (string->symbol (car (pandoc-command-line)))))
(read-json (run/port (,pandoc --from ,input-format --to json)
(= 0 input-port)))))
(define (call-with-binary-input-file filename proc) (define (call-with-binary-input-file filename proc)
(let ((port (open-input-file filename #:binary))) (let ((port (open-input-file filename #:binary)))
@ -40,6 +50,4 @@
(lambda () (proc port)) (lambda () (proc port))
(lambda () (close-input-port port))))) (lambda () (close-input-port port)))))
(define pandoc-command-line (make-parameter (list "pandoc")))
(include "pandoc.r5rs.scm")) (include "pandoc.r5rs.scm"))

19
pandoc.cli.chicken.scm Normal file
View File

@ -0,0 +1,19 @@
(module (pandoc cli)
(#;export
pandoc-cli)
(import (scheme)
(chicken base)
(only (medea) read-json)
(only (scheme base) utf8->string)
(only (scsh-process) run/port))
(define (pandoc-cli #!optional command-name)
(let ((command-name (string->symbol (or command-name "pandoc"))))
(lambda (input-format bytevectors)
(map (lambda (bytevector)
(read-json
(run/port (,command-name --from ,input-format --to json)
(<< ,(utf8->string bytevector)))))
bytevectors)))))

View File

@ -20,6 +20,9 @@
(components (components
(extension pandoc (extension pandoc
(source "pandoc.chicken.scm")) (source "pandoc.chicken.scm"))
(extension pandoc.cli
(source "pandoc.cli.chicken.scm")
(component-dependencies pandoc))
(extension pandoc.server (extension pandoc.server
(source "pandoc.server.chicken.scm") (source "pandoc.server.chicken.scm")
(component-dependencies pandoc)) (component-dependencies pandoc))

View File

@ -116,13 +116,63 @@
(assert-supported-version) (assert-supported-version)
(convert-many (vector->list (cdr (assq 'blocks json))))) (convert-many (vector->list (cdr (assq 'blocks json)))))
(define (pandoc-port->sxml input-format input-port) ;;
(pandoc-json->sxml (pandoc-port->json input-format input-port)))
(define (pandoc-file->json input-format input-filename) (define (pandoc-bytevectors->json pandoc input-format bytevectors)
(call-with-binary-input-file (pandoc input-format bytevectors))
input-filename
(lambda (input-port) (pandoc-port->json input-format input-port))))
(define (pandoc-file->sxml input-format input-filename) (define (pandoc-strings->json pandoc input-format strings)
(pandoc-json->sxml (pandoc-file->json input-format input-filename))) (pandoc-bytevectors->json
pandoc input-format
(map string->utf8 strings)))
(define (pandoc-files->json pandoc input-format filenames)
(pandoc-bytevectors->json
pandoc input-format
(map (lambda (filename)
(call-with-binary-input-file filename read-bytevector-all))
filenames)))
;;
(define (pandoc-bytevector->json pandoc input-format bytevector)
(car (pandoc-bytevectors->json pandoc input-format (list bytevector))))
(define (pandoc-string->json pandoc input-format string)
(car (pandoc-strings->json pandoc input-format (list string))))
(define (pandoc-file->json pandoc input-format filename)
(car (pandoc-files->json pandoc input-format (list filename))))
;;
(define (pandoc-bytevectors->sxml pandoc input-format bytevectors)
(map pandoc-json->sxml
(pandoc-bytevectors->json pandoc input-format bytevectors)))
(define (pandoc-strings->sxml pandoc input-format strings)
(map pandoc-json->sxml
(pandoc-strings->json pandoc input-format strings)))
(define (pandoc-files->sxml pandoc input-format filenames)
(map pandoc-json->sxml
(pandoc-files->json pandoc input-format filenames)))
;;
(define (pandoc-bytevector->sxml pandoc input-format bytevector)
(car (pandoc-bytevectors->sxml pandoc input-format (list bytevector))))
(define (pandoc-string->sxml pandoc input-format string)
(car (pandoc-strings->sxml pandoc input-format (list string))))
(define (pandoc-file->sxml pandoc input-format filename)
(car (pandoc-files->sxml pandoc input-format (list filename))))
;;
(define (pandoc-port->json pandoc input-format port)
(pandoc-bytevector->json pandoc input-format (read-bytevector-all port)))
(define (pandoc-port->sxml pandoc input-format port)
(pandoc-bytevector->sxml pandoc input-format (read-bytevector-all port)))

View File

@ -6,3 +6,4 @@
(repo git "git://github.com/lassik/scheme-{egg-name}.git") (repo git "git://github.com/lassik/scheme-{egg-name}.git")
(uri targz "https://github.com/lassik/scheme-{egg-name}/tarball/{egg-release}") (uri targz "https://github.com/lassik/scheme-{egg-name}/tarball/{egg-release}")
(release "0.1") (release "0.1")
(release "0.2")

View File

@ -1,65 +1,44 @@
(module (pandoc server) (module (pandoc server)
(pandoc-server-base-url (#;export
pandoc-server-strings->json pandoc-server)
pandoc-server-strings->sxml
pandoc-server-files->json
pandoc-server-files->sxml)
(import (scheme) (import (scheme)
(chicken base) (chicken base)
(cjson) (cjson)
(only (chicken port) with-input-from-string) (only (scheme base) utf8->string)
(only (chicken io) read-string) (only (chicken io) read-string)
(only (http-client) with-input-from-request) (only (http-client) with-input-from-request)
(only (intarweb) headers make-request) (only (intarweb) headers make-request)
(only (medea) read-json write-json) (only (medea) write-json)
(only (uri-common) uri-reference) (only (uri-common) uri-reference))
(only (pandoc) pandoc-json->sxml))
(define pandoc-server-base-url (define (pandoc-server base-url)
(make-parameter "http://localhost:8080/")) (lambda (input-format bytevectors)
(with-input-from-request
(define (pandoc-server-strings->json input-format input-strings) (make-request
(with-input-from-request method: 'POST
(make-request uri: (uri-reference (string-append base-url "convert-batch"))
method: 'POST headers: (headers '((content-type "application/json")
uri: (uri-reference (string-append (pandoc-server-base-url) (accept "application/json"))))
"convert-batch")) (lambda ()
headers: (headers '((content-type "application/json") (let ((input-format (symbol->string input-format)))
(accept "application/json")))) (write-json
(lambda () (list->vector
(let ((input-format (symbol->string input-format))) (map (lambda (bytevector)
(write-json (list (cons 'from input-format)
(list->vector (cons 'to "json")
(map (lambda (input-string) (cons 'text (utf8->string bytevector))))
(list (cons 'from input-format) bytevectors)))))
(cons 'to "json") (lambda ()
(cons 'text input-string))) (let ((array (string->cjson (read-string))))
input-strings))))) (unless (eq? cjson/array (cjson-type array))
(lambda () (error "Got unexpected JSON from pandoc-server"))
(let ((array (string->cjson (read-string)))) (let loop ((i (- (cjson-array-size array) 1)) (results '()))
(unless (eq? cjson/array (cjson-type array)) (if (< i 0) results
(error "Got unexpected JSON from pandoc-server")) (loop (- i 1)
(let loop ((i (- (cjson-array-size array) 1)) (results '())) (cons (cjson-schemify
(if (< i 0) results (string->cjson
(loop (- i 1) (cjson-schemify
(cons (cjson-schemify (cjson-array-ref array i))))
(string->cjson results))))))))))
(cjson-schemify
(cjson-array-ref array i))))
results))))))))
(define (pandoc-server-files->json input-format input-filenames)
(pandoc-server-strings->json
input-format
(map (lambda (filename) (with-input-from-file filename read-string))
input-filenames)))
(define (pandoc-server-strings->sxml input-format input-strings)
(map pandoc-json->sxml
(pandoc-server-strings->json input-format input-strings)))
(define (pandoc-server-files->sxml input-format input-filenames)
(map pandoc-json->sxml
(pandoc-server-files->json input-format input-filenames))))

View File

@ -1,21 +1,45 @@
(define-library (pandoc) (define-library (pandoc)
(export pandoc-command-line (export
pandoc-port->json
pandoc-port->sxml pandoc-json->sxml
pandoc-file->json
pandoc-file->sxml pandoc-port->json
pandoc-json->sxml) pandoc-port->sxml
pandoc-file->json
pandoc-file->sxml
pandoc-files->json
pandoc-files->sxml
pandoc-bytevector->json
pandoc-bytevector->sxml
pandoc-bytevectors->json
pandoc-bytevectors->sxml
pandoc-string->json
pandoc-string->sxml
pandoc-strings->json
pandoc-strings->sxml)
(import (scheme base) (import (scheme base)
(scheme file) (scheme file)
(scheme write)) (scheme write))
(cond-expand (cond-expand
(gauche (import (only (srfi 180) json-read) (gauche (import (only (srfi 180) json-read)
(only (gauche base) copy-port) (only (gauche base) copy-port)
(only (gauche process) call-with-process-io)))) (only (gauche process) call-with-process-io))))
(begin
(define pandoc-command-line (make-parameter (list "pandoc")))
(define (call-with-binary-input-file filename proc)
(call-with-port (open-binary-input-file filename) proc)))
(cond-expand (cond-expand
(gauche (include "pandoc.gauche.scm"))) (gauche (include "pandoc.gauche.scm")))
(begin (define inexact->exact exact)
(define (call-with-binary-input-file filename proc)
(call-with-port (open-binary-input-file filename) proc))
(define (read-bytevector-all binary-input-port)
(let loop ((whole (bytevector)))
(let ((part (read-bytevector 1000)))
(if (eof-object? part) whole
(loop (bytevector-append whole part)))))))
(include "pandoc.r5rs.scm")) (include "pandoc.r5rs.scm"))

View File

@ -1,44 +1,35 @@
(module (pandoc tar) (module (pandoc tar)
(pandoc-tar-command (#;export
pandoc-tar-strings->json pandoc-tar)
pandoc-tar-strings->sxml
pandoc-tar-files->json
pandoc-tar-files->sxml)
(import (scheme) (import (scheme)
(chicken base) (chicken base)
(cjson) (cjson)
(srfi 4) ;;(srfi 4)
(scheme base) (only (scheme base)
bytevector
bytevector-append
bytevector-length
bytevector-u8-ref
eof-object
make-bytevector
read-bytevector
string->utf8
truncate-remainder
utf8->string
write-bytevector)
(only (chicken io) read-byte read-string write-byte write-string) (only (chicken io) read-byte read-string write-byte write-string)
(only (chicken port) (only (chicken port)
copy-port with-input-from-string with-output-to-string) copy-port with-input-from-string with-output-to-string)
(only (chicken process) process process-wait)
(only (scsh-process) run/port run/string) (only (scsh-process) run/port run/string)
(only (pandoc) pandoc-json->sxml)) (only (pandoc) pandoc-json->sxml))
;; (define (eof-object) #!eof)
;; (define (string->utf8 str) str)
;; (define (utf8->string str) str)
;; (define bytevector string)
;; (define (bytevector-append bvs)
;; (let ((target (make-u8vector
;; (define bytevector-length u8vector-length)
;; (define (bytevector-u8-ref s i) (char->integer (string-ref s i)))
;; (define make-bytevector make-u8vector)
;; (define read-bytevector read-string)
;; (define truncate-remainder remainder)
;; (define write-bytevector write-string)
(define (generator->list generator) ; SRFI 158 (define (generator->list generator) ; SRFI 158
(let loop ((list '())) (let loop ((list '()))
(let ((elem (generator))) (let ((elem (generator)))
(if (eof-object? elem) (reverse list) (loop (cons elem list)))))) (if (eof-object? elem) (reverse list) (loop (cons elem list))))))
(define pandoc-tar-command
(make-parameter "pandoc-tar"))
(define (bytevector-every? predicate bytes) (define (bytevector-every? predicate bytes)
(let loop ((i 0)) (let loop ((i 0))
(or (= i (bytevector-length bytes)) (or (= i (bytevector-length bytes))
@ -138,36 +129,16 @@
(tar-write-file filename (car inputs)) (tar-write-file filename (car inputs))
(loop (+ i 1) (cdr inputs)))))) (loop (+ i 1) (cdr inputs))))))
(define (pandoc-tar-bytevectors->json input-format input-bytevectors) (define (pandoc-tar #!optional command-name)
(let* ((pandoc-tar (let ((command-name (string->symbol (or command-name "pandoc-tar"))))
(string->symbol (pandoc-tar-command))) (lambda (input-format bytevectors)
(stdin (let* ((stdin
(with-output-to-string (with-output-to-string
(lambda () (lambda () (write-all-to-tar bytevectors))))
(write-all-to-tar input-bytevectors)))) (stdout
(stdout (run/string (,command-name --from ,input-format --to json)
(run/string (,pandoc-tar --from ,input-format --to json) (<< ,stdin))))
(<< ,stdin)))) (map (lambda (bytes)
(map (lambda (bytes) (cjson-schemify (string->cjson (utf8->string bytes))))
(cjson-schemify (string->cjson (utf8->string bytes)))) (with-input-from-string stdout
(with-input-from-string stdout (lambda () (generator->list tar-read-file)))))))))
(lambda () (generator->list tar-read-file))))))
(define (pandoc-tar-strings->json input-format input-strings)
(pandoc-tar-bytevectors->json
input-format
(map string->utf8 input-strings)))
(define (pandoc-tar-files->json input-format input-filenames)
(pandoc-tar-strings->json
input-format
(map (lambda (filename) (with-input-from-file filename read-string))
input-filenames)))
(define (pandoc-tar-strings->sxml input-format input-strings)
(map pandoc-json->sxml
(pandoc-tar-strings->json input-format input-strings)))
(define (pandoc-tar-files->sxml input-format input-filenames)
(map pandoc-json->sxml
(pandoc-tar-files->json input-format input-filenames))))