rfc.mime - MIME message handling
This module provides utility procedures to handle Multipurpose Internet Mail Extensions (MIME) messages, defined in RFC2045 through RFC2049. The provided APIs include procedures to parse or compose MIME-specific header fields, and to parse or compose MIME-encoded message bodies.
This module mainly focuses on providing low-level building-block procedures,
on top of which application-specific modules are to be built.
For example, rfc.http uses this module to compose multipart/form-data messages for the body of POST requests (see rfc.http - HTTP client).
This module is supposed to be used with the rfc.822 module (see rfc.822 - RFC822 message parsing).
A few utility procedures to parse and generate MIME-specific header fields.
Function: mime-parse-version field {rfc.mime}
If field is a valid header field for MIME-Version, returns its major and minor versions in a list. Otherwise, returns #f.
It is allowed to pass #f to field, so that you can directly pass the result of rfc822-header-ref to it. Given a header list parsed by rfc822-read-headers, you can get the MIME version (currently, it should be (1 0)) with the following code.
(mime-parse-version (rfc822-header-ref headers "mime-version"))
Note: a simple regexp such as #/\d+\.\d+/ doesn’t do this job, because field may contain comments between tokens.
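The following is a minimal sketch of the behavior described above. The commented field value is taken from the style of examples in RFC2045; the variable names are only for illustration.

```scheme
(use rfc.mime)

;; A plain version string parses to a list of major and minor numbers.
(define v-plain (mime-parse-version "1.0"))

;; RFC2045 permits comments in the field; they should be skipped,
;; which is why a simple regexp is not sufficient.
(define v-comment (mime-parse-version "1.0 (produced by MetaSend Vx.x)"))

;; #f is accepted, so a missing header needs no special casing.
(define v-missing (mime-parse-version #f))

(print (list v-plain v-comment v-missing))
```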
Function: mime-parse-content-type field {rfc.mime}
Parses the "content-type" header field, and returns a list such as:
(type subtype (attribute . value) ...)
where type and subtype are the MIME media type and subtype as strings, respectively.
(mime-parse-content-type "text/html; charset=iso-2022-jp") ⇒ ("text" "html" ("charset" . "iso-2022-jp"))
If field is not a valid content-type field, #f is returned.
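Since the parameters come back as an association list, a lookup can be layered on top with standard list procedures. The sketch below assumes a hypothetical helper named content-type-charset; it is not part of rfc.mime.

```scheme
(use rfc.mime)

;; Hypothetical helper: return the charset parameter of a content-type
;; field value, or DEFAULT when the field is invalid or has no charset.
(define (content-type-charset field default)
  (let ((ct (mime-parse-content-type field)))
    (cond ((and ct (assoc "charset" (cddr ct))) => cdr)
          (else default))))

(print (content-type-charset "text/html; charset=iso-2022-jp" "us-ascii"))
(print (content-type-charset "text/plain" "us-ascii"))
```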
Function: mime-parse-content-disposition field {rfc.mime}
Parses the Content-Disposition header field as specified in RFC2183.
(mime-parse-content-disposition "attachment; filename=genome.jpeg;\
  modification-date=\"Wed, 12 Feb 1997 16:29:51 -0500\";")
 ⇒ ("attachment"
    ("filename" . "genome.jpeg")
    ("modification-date" . "Wed, 12 Feb 1997 16:29:51 -0500"))
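As an illustration of consuming this result, the sketch below picks the file name out of an attachment disposition. The helper attachment-filename is a hypothetical name introduced here, not something the module provides.

```scheme
(use rfc.mime)

;; Hypothetical helper: return the filename parameter when the field
;; declares an attachment, and #f in every other case.
(define (attachment-filename field)
  (let ((cd (mime-parse-content-disposition field)))
    (and cd
         (equal? (car cd) "attachment")
         (cond ((assoc "filename" (cdr cd)) => cdr)
               (else #f)))))

(print (attachment-filename "attachment; filename=genome.jpeg"))
(print (attachment-filename "inline"))
```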
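Function: mime-parse-parameters :optional iport
Function: mime-compose-parameters params :optional oport :key start-column {rfc.mime}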
These are low-level utility procedures to parse and compose the parameter part of header fields (as described in RFC2045 Section 5.1, etc.).
mime-parse-parameters reads the parameter part of the header body from an input port iport, and returns an assoc list of the parameter names and values.
Conversely, mime-compose-parameters takes an assoc list of names and values, composes the parameter part, and emits it to oport. When omitted, the current input port and the current output port are used for iport and oport, respectively. You can pass #f to oport, in which case mime-compose-parameters returns the result as a string instead of emitting it to a port.
(call-with-input-string "; name=foo; filename=\"foo/bar/baz\""
  mime-parse-parameters)
 ⇒ (("name" . "foo") ("filename" . "foo/bar/baz"))

(mime-compose-parameters
 '(("name" . "foo") ("filename" . "foo/bar/baz"))
 #f)
 ⇒ "; name=foo; filename=\"foo/bar/baz\""
mime-compose-parameters tries to insert folding line breaks between parameters to keep the header line from becoming too long. You can pass the beginning column position of the parameter part via the start-column argument.
We plan to make these procedures handle RFC2231’s parameter value extension transparently in the future.
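The two procedures are inverses for short, simple parameter lists. A minimal round-trip sketch follows; the variable names are illustrative only.

```scheme
(use rfc.mime)

(define params '(("name" . "foo") ("filename" . "foo/bar/baz")))

;; Passing #f as oport makes mime-compose-parameters return a string.
(define composed (mime-compose-parameters params #f))

;; Feed the composed text back through the parser via a string port.
(define reparsed (call-with-input-string composed mime-parse-parameters))

(print composed)
(print (equal? params reparsed))
```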
Function: mime-decode-word word {rfc.mime}
Decodes an RFC2047-encoded word. If word isn’t an encoded word, it is returned as is.
Note that this procedure decodes only if the entire word is an “encoded word” as defined in RFC2047. If you are dealing with a field that may contain multiple encoded words and/or unencoded parts, use mime-decode-text below.
(mime-decode-word "=?iso-8859-1?q?this=20is=20some=20text?=") ⇒ "this is some text"
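The whole-word rule can be seen by comparing three inputs. This is a small sketch; the Base64 input is the encoded form of the same text shown elsewhere in this section.

```scheme
(use rfc.mime)

;; An entire encoded word (Base64 form) is decoded.
(define from-b (mime-decode-word "=?utf-8?B?dGhpcyBpcyBzb21lIHRleHQ=?="))

;; A word that is not encoded at all is returned as is.
(define plain (mime-decode-word "plain"))

;; A string that merely contains an encoded word is not an encoded word
;; itself, so it is also returned unchanged.
(define mixed (mime-decode-word "This is =?US-ASCII?q?some=20text?="))

(print (list from-b plain mixed))
```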
Function: mime-decode-text text {rfc.mime}
Returns a string in which all encoded words contained within text are decoded. This procedure can deal with a header field body that may contain a mixture of non-encoded and encoded parts, and/or multiple encoded parts. One such header field is the Subject field of email.
(mime-decode-text "This is =?US-ASCII?q?some=20text?=") ⇒ "This is some text"
Care should be taken if you apply this procedure to a “structured” header field body (see RFC2822 section 2.2.2). The proper way of parsing a structured header field body is to tokenize it first, then to decode each word using mime-decode-word, since the decoded text may contain characters that affect the tokenization. (However, if you just want to show the header field in a human-readable way for informational purposes, you may simply use mime-decode-text on the entire header field for convenience.)
Function: mime-encode-word word :key charset transfer-encoding {rfc.mime}
Encodes word in the RFC2047 format. The keyword argument charset specifies the character encoding scheme as a string or a symbol; its default is utf-8. If charset is other than utf-8 and word is a complete string, the procedure converts the character encoding to charset, then performs transfer encoding.
(mime-encode-word "this is some text") ⇒ "=?utf-8?B?dGhpcyBpcyBzb21lIHRleHQ=?="
The keyword argument transfer-encoding specifies how the octets are encoded to transfer-safe characters. You can give a symbol b, B or base64 for Base64, and Q, q or quoted-printable for Quoted-printable transfer encoding. An error is raised if you pass any other value. The default is Base64 encoding.
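The sketch below exercises the accepted transfer-encoding values and the error case. The word and the rejected symbol uuencode are arbitrary choices for illustration; the exact text of the Quoted-printable form is not assumed, only that it decodes back.

```scheme
(use rfc.mime)

(define word "Hello")

;; b, B and base64 all select Base64, which is also the default.
(define default-form (mime-encode-word word))
(define b-form       (mime-encode-word word :transfer-encoding 'b))
(define base64-form  (mime-encode-word word :transfer-encoding 'base64))

;; q, Q and quoted-printable select Quoted-printable.
(define q-form (mime-encode-word word :transfer-encoding 'q))

;; Any other value is documented to raise an error; catch it here.
(define rejected?
  (guard (e (else #t))
    (mime-encode-word word :transfer-encoding 'uuencode)
    #f))

(print (list default-form q-form rejected?))
(print (mime-decode-word q-form))
```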
This procedure does not consider the length of the resulting encoded word, which RFC2047 limits to at most 75 characters. Use mime-encode-text below to conform to the line length limit.
(Note: In most Gauche procedures, a keyword argument encoding is used to specify character encodings. In this context we have two encodings, however, so to avoid confusion we chose to use the terms “charset” and “transfer-encoding” that appear in RFC documents.)
Function: mime-encode-text text :key charset transfer-encoding line-width start-column force {rfc.mime}
Encodes text in RFC2047 format if necessary, taking line folding into account if the result gets too long.
The keyword arguments charset and transfer-encoding are the same as in mime-encode-word.
If the text only consists of printable ASCII characters, no encoding is done, and only line folding is considered. However, if a true value is given to the force argument, even ASCII-only text is encoded.
The line-width argument specifies the maximum line width of the result. Its default is 76. If the encoded word gets too long, it is split into multiple encoded words, and a CR LF SPC sequence (“folding white space” defined in RFC2822) is inserted in between. You can suppress this behavior by passing #f or 0 to line-width. Since an encoded word needs some overhead characters, it doesn’t make much sense to specify a small value for line-width. The current implementation rejects a line-width smaller than 30.
The start-column keyword argument can be used to shorten the first of the folded lines to make room for the header field name. For example, if you want to encode the body of a Subject header field, you can pass the value of (string-length "Subject: ") so that the encoded result can be directly concatenated after the header field name. The default value is 0.
This procedure is not designed to encode parts of structured header fields, which have further restrictions such as which parts can be encoded and where folding white space can be inserted. The robust way is to encode the relevant parts first, then construct the structured header field, taking line folding into account.
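A minimal sketch of the Subject use case described for start-column follows. The helper make-subject-line and the variable field-name are hypothetical names introduced for this example.

```scheme
(use rfc.mime)

(define field-name "Subject: ")

;; Hypothetical helper: build a complete Subject header line, reserving
;; room for the field name on the first (possibly folded) line.
(define (make-subject-line text)
  (string-append field-name
                 (mime-encode-text text
                                   :start-column (string-length field-name))))

;; Short printable ASCII text needs neither encoding nor folding.
(print (make-subject-line "Hello"))

;; With :force, even ASCII-only text is turned into an encoded word.
(define forced (mime-encode-text "Hello" :force #t))
(print forced)
```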
The streaming parser is designed so that you can decide what to do with the message body before the entire message is read.
Function: mime-parse-message port headers handler {rfc.mime}
The fundamental streaming parser. Port is an input port from which the message is read. Headers is a list of headers parsed by rfc822-read-headers; that is, this procedure is supposed to be called after the header part of the message is parsed from port:
(let* ((headers (rfc822-read-headers port)))
  (if (mime-parse-version (rfc822-header-ref headers "mime-version"))
     ;; parse MIME message
     (mime-parse-message port headers handler)
     ;; retrieve a non-MIME body
     ...))
mime-parse-message analyzes headers, and calls handler on each message body with two arguments:
(handler part-info xport)
Part-info is a <mime-part> structure described below that encapsulates the information of this part of the message.
Xport is an input port, initially pointing to the beginning of the body of the message. The handler can read from the port as if it is reading from the original port. However, xport recognizes MIME boundaries internally, and returns EOF when it reaches the end of the part. (Do not read from the original port directly, or it will mess up the internal state of xport.)
Handler can read the part into memory, save it to disk, or even discard the part. Whatever it does, it has to read from xport until it returns EOF.
The return value of handler will be set in the content slot of part-info.
If the message has nested multipart messages, handler is called for each "leaf" part, in depth-first order. Handler can know its nesting level by examining the part-info structure.
The message doesn’t need to be a multipart type; if it is a MIME message type, handler is called on the body of the enclosed message. If it is another media type such as text or application, handler is called on the (only) message body.
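The following sketch shows a handler that keeps text parts and discards the rest, draining xport to EOF in both cases. The sample message, the handler name text-only-handler, and the symbol skipped are all made up for this illustration; slot access with ref assumes <mime-part> behaves as an ordinary Gauche object.

```scheme
(use rfc.822)
(use rfc.mime)

;; A small two-part message, written with CRLF line endings.
(define sample-message
  (string-join
   '("MIME-Version: 1.0"
     "Content-Type: multipart/mixed; boundary=\"sep\""
     ""
     "--sep"
     "Content-Type: text/plain"
     ""
     "Hello"
     "--sep"
     "Content-Type: application/octet-stream"
     "Content-Transfer-Encoding: base64"
     ""
     "AAEC"
     "--sep--"
     "")
   "\r\n"))

;; Handler: keep text parts as strings; drain and discard everything else.
(define (text-only-handler part-info xport)
  (if (equal? (ref part-info 'type) "text")
      (mime-body->string part-info xport)
      (let loop ((b (read-byte xport)))   ; must read until EOF
        (if (eof-object? b)
            'skipped
            (loop (read-byte xport))))))

(define top
  (call-with-input-string sample-message
    (lambda (port)
      (let ((headers (rfc822-read-headers port)))
        (mime-parse-message port headers text-only-handler)))))

;; Each leaf's content slot holds whatever the handler returned.
(print (map (lambda (p) (list (ref p 'type) (ref p 'subtype) (ref p 'content)))
            (ref top 'content)))
```

Because the handler decides per part, a large attachment never has to be held in memory just to be thrown away.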
Class: <mime-part> {rfc.mime}
A structure that encloses metainformation about a MIME part.
It is constructed when the header of the part is read, and
passed to the handler that reads the body of the part.
It has the following slots:
Instance Variable of <mime-part>: type
MIME media type string. If the content-type header is omitted from the part, an appropriate default value is set.
Instance Variable of <mime-part>: subtype
MIME media subtype string. If the content-type header is omitted from the part, an appropriate default value is set.
Instance Variable of <mime-part>: parameters
Associative list of parameters given to the content-type header field.
Instance Variable of <mime-part>: transfer-encoding
The value of the content-transfer-encoding header field. If the header field is omitted, an appropriate default value is set.
Instance Variable of <mime-part>: headers
The list of header fields, as parsed by rfc822-read-headers.
Instance Variable of <mime-part>: parent
If this is a part of a multipart message or an encapsulated message, points to the enclosing part’s <mime-part> structure. Otherwise #f.
Instance Variable of <mime-part>: index
Sequence number of this part within the same parent.
Instance Variable of <mime-part>: content
If this part is of a multipart/* or message/* media type, this slot contains a list of parts within it. Otherwise, the return value of handler is stored.
Instance Variable of <mime-part>: source
This slot is only used when composing a MIME message. The caller can set this slot to the name of a file to be inserted into this part, instead of storing the entire content of the file in the content slot. See mime-compose-message below for more details.
Function: mime-retrieve-body part-info xport outp {rfc.mime}
A procedure to retrieve a message body. It is intended to be a building block of the handler to be passed to mime-parse-message.
Part-info is a <mime-part> object. Xport is an input port passed to the handler, from which the MIME part can be read.
This procedure reads from xport until it returns EOF. It also looks at the transfer-encoding of part-info, and decodes the body accordingly; that is, base64 encoding and quoted-printable encoding are handled. The result is written out to an output port outp.
This procedure does not handle charset conversion. The caller must use a CES conversion port as outp (see gauche.charconv - Character Code Conversion) if desired.
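To see only the transfer decoding step in isolation, the sketch below calls mime-retrieve-body outside of mime-parse-message. It assumes that a <mime-part> can be built directly with make and a :transfer-encoding init keyword, and it uses a plain string port as a stand-in for xport; in real use both would be supplied by the parser. No charset conversion is involved.

```scheme
(use rfc.mime)

;; Assumed construction of a stand-in part; normally mime-parse-message
;; creates this object and passes it to the handler.
(define part (make <mime-part> :transfer-encoding "base64"))

;; Read the Base64 text from a string port and collect the decoded
;; octets in an output string port.
(define decoded
  (call-with-input-string "SGVsbG8="
    (lambda (xport)
      (call-with-output-string
        (lambda (outp) (mime-retrieve-body part xport outp))))))

(print decoded)
```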
A couple of convenience procedures are defined for typical cases on top of mime-retrieve-body.
Function: mime-body->string part-info xport
Function: mime-body->file part-info xport filename {rfc.mime}
Reads in the body of a MIME message, decoding the transfer encoding, and returns it as a string or writes it to a file, respectively.
The simplest form of a MIME message parser would look like this:
(let ((headers (rfc822-read-headers port)))
  (mime-parse-message port headers
                      (cut mime-body->string <> <>)))
This reads the entire message into memory (i.e. the content slot of each "leaf" <mime-part> object holds the part’s body as a string), and returns the top <mime-part> object. Content transfer encoding is recognized and handled, but character set conversion isn’t done.
You may want to feed the message body to a file directly, or even skip some bodies according to MIME media types and/or other header information. In that case you can put the logic in the handler closure. That’s the reason this module provides building blocks instead of an all-in-one procedure.
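Function: mime-compose-message parts :optional port :key boundary
Function: mime-compose-message-string parts :key boundary {rfc.mime}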
Composes a MIME multipart message. mime-compose-message emits the result to an output port port, whose default is the current output port. mime-compose-message-string makes the result into a string. You can give a boundary string via the boundary argument; when omitted, a fresh boundary string is automatically generated by mime-make-boundary below.
mime-compose-message returns the boundary string. mime-compose-message-string returns two values, the result string and the boundary string.
The content of the message is provided by the parts argument, which can be a list of instances of <mime-part> (see above) or lists that describe parts. The list form is supported for the caller’s convenience, and internally it is converted to a list of <mime-part>s.
The syntax of each part element in parts is defined as follows.
<part> : <mime-part> | <mime-part-desc>

<mime-part> : an instance of the class <mime-part>

<mime-part-desc> : (<content-type> (<header> ...) <body>)

<content-type> : (<type> <subtype> <header-param> ...)
<header-param> : (<key> . <value>) ...
<header>       : (<header-name> <encoded-header-value>)
               | (<header-name> (<header-value> <header-param> ...))
<body>         : a string
               | (file <filename>)
               | (subparts <part> ...)
Note: In the first form of <header>, <encoded-header-value> must already be encoded using RFC2047 or RFC2231 if the original value contains non-ASCII characters. In the second form, we plan to do RFC2231 encoding on behalf of the caller, but the current version does not implement it. The caller should not pass encoded words in this form, since it may result in double encoding when we implement the auto-encoding feature; for the time being, the second form is restricted to ASCII-only values.
If <body> is a string, it is used as the part’s content. If <body> is (file filename), the content is read from the named file. If <body> is (subparts part …), the part becomes a nested MIME part.
It is the caller’s responsibility to match the content type and the content. For example, if <body> is in the third form, the part must have a multipart content type.
The caller needs to provide a proper content-transfer-encoding header, depending on the application. If none is given, the content is inserted into the message as is, which may be appropriate for some applications; but if you want to use the result in an email message, you will certainly want to encode binary parts with base64, for example.
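The list form of parts can be sketched as follows. The part contents, the file name data.bin, and the surrounding Content-Type line are made-up illustrations; the enclosing header is the caller's job, which is why the boundary is returned.

```scheme
(use rfc.mime)

;; Two parts in the <mime-part-desc> form:
;;   (<content-type> (<header> ...) <body>)
(define parts
  '((("text" "plain" ("charset" . "utf-8"))
     (("content-transfer-encoding" "7bit"))
     "Hello")
    (("application" "octet-stream")
     (("content-disposition" ("attachment" ("filename" . "data.bin")))
      ("content-transfer-encoding" "base64"))
     "raw bytes")))

;; mime-compose-message-string returns the body text and the boundary.
(define-values (body boundary) (mime-compose-message-string parts))

;; The caller supplies the enclosing header that names the boundary.
(print "Content-Type: multipart/mixed; boundary=\"" boundary "\"")
(print)
(print body)
```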
Function: mime-make-boundary {rfc.mime}
Returns a unique string that can be used as a boundary of a MIME multipart
message.
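When the boundary has to be known before composing (for instance to write the enclosing header first), it can be generated up front and passed in. A minimal sketch, with illustrative part contents:

```scheme
(use rfc.mime)

;; Generate the boundary first, then hand it to the composer.
(define boundary (mime-make-boundary))

(define-values (body used-boundary)
  (mime-compose-message-string
   '((("text" "plain") () "first part")
     (("text" "plain") () "second part"))
   :boundary boundary))

;; The composer reports back the boundary it used.
(print (equal? boundary used-boundary))
```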