rfc.822 - RFC822 message parsing
Defines a set of functions that parse and construct the “Internet Message Format”, a text format used to exchange e-mails. The most recent specification can be found in RFC2822. The format was originally defined in RFC 822, and people still call it “RFC822 format”, hence the name of this module. In the following document, I also refer to the format as “RFC822 format”.
rfc822-read-headers reads an RFC822 format message from the input port iport until it reaches the end of the message header. The header fields are returned as a list of the following format:
((name body) …)
Each name is a field name, and each body is the corresponding field body, both as strings. Field names are converted to lowercase. Field bodies are not modified, except that folded lines are unfolded. The order of the fields is preserved.
By default, the parser works permissively. If EOF is encountered while parsing the header, it is taken as the end of the message. A line that is neither a continuation (folded) line nor the start of a new header field is simply ignored. You can change this behavior by passing a true value to the keyword argument strict?; then the parser raises an error for such a malformed header.
The keyword argument reader takes a procedure that reads a line from iport. The default should be enough for most cases.
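As a minimal sketch of the behavior described above, the following reads the header of a small message held in a string. The sample message and the variable names are made up for illustration, and the module is assumed to be loaded with (use rfc.822).

```scheme
(use rfc.822)

;; A hypothetical message: two header fields (one of them folded),
;; an empty line that ends the header, and a one-line body.
(define sample-message
  (string-append "From: email@example.com\r\n"
                 "Subject: Hello\r\n"
                 " world\r\n"
                 "\r\n"
                 "This is the body.\r\n"))

(define in (open-input-string sample-message))

;; Reads up to and including the empty line that ends the header.
;; Field names come back lowercased, and the folded Subject line is unfolded:
;;   (("from" "email@example.com") ("subject" "Hello world"))
(define headers (rfc822-read-headers in))

;; Whatever is left in the port is the message body.
(define body (port->string in))
```

Because parsing stops at the end of the header, the same port can be handed to whatever code processes the body.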
This is an old name of rfc822-read-headers. It is kept for backward compatibility. New code should use rfc822-read-headers.
rfc822-header-ref is a utility procedure to get a specific field from the parsed header list, which is returned by rfc822-read-headers. Field-name specifies the field name in a lowercase string. If a field with the given name is in header-list, the procedure returns its value as a string. Otherwise, default is returned if it is given, and #f is returned if not.
This procedure can actually be used not only for the result of rfc822-read-headers, but also for retrieving a value keyed by strings in any list-of-lists structure of the form ((name value option ...) ...). For example, a parsed cookie list can be passed to this procedure (see HTTP cookie handling).
(rfc822-header-ref
 '(("from" "email@example.com") ("to" "firstname.lastname@example.org"))
 "from")
  ⇒ "email@example.com"

;; If no entry matches, #f is returned by default
(rfc822-header-ref
 '(("from" "firstname.lastname@example.org") ("to" "email@example.com"))
 "reply-to")
  ⇒ #f

;; You can give the default value for no-match case
(rfc822-header-ref
 '(("from" "firstname.lastname@example.org") ("to" "email@example.com"))
 "reply-to" 'none)
  ⇒ none

;; By giving the default value, you can distinguish
;; the no-match case and there's actually an entry with value #f.
(rfc822-header-ref
 '(("from" "firstname.lastname@example.org") ("reply-to" #f))
 "reply-to" 'none)
  ⇒ #f
Several procedures are provided to parse "structured" header fields of RFC2822 messages. These procedures deal with the body of a header field, i.e. if the header field is "To: Wandering Schemer <email@example.com>", they parse "Wandering Schemer <email@example.com>".
Most of these procedures take an input port. Usually you first parse the entire header with rfc822-read-headers, obtain the body of a header field with rfc822-header-ref, then open an input string port for the body and use these procedures to parse it.
The reason for this complexity is that you need different tokenization schemes depending on the type of the field. RFC2822 also allows comments to appear between tokens in most cases, so a simple-minded regexp won’t do the job: RFC2822 comments can be nested and can’t be represented by a regular grammar. So this layer of procedures is designed to be flexible enough to handle various syntaxes. For the standard header types, high-level parsers are also provided; see "specific field parsers" below.
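The three-step workflow described above can be sketched as follows. This is only an illustration under the assumption that the module is loaded with (use rfc.822); the one-field header and the variable names are invented for the example.

```scheme
(use rfc.822)

;; Step 1: parse the whole header (a made-up, one-field header here).
(define headers
  (call-with-input-string
      "To: Wandering Schemer <email@example.com>\r\n\r\n"
    rfc822-read-headers))

;; Step 2: pick up the body of the field of interest.
(define to-body (rfc822-header-ref headers "to"))

;; Step 3: open an input string port on the body and tokenize it.
(define in (open-input-string to-body))
(define first-token  (rfc822-next-token in))   ; "Wandering"
(define second-token (rfc822-next-token in))   ; "Schemer"
(define third-token  (rfc822-next-token in))   ; #\< (not part of any token class)
```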
rfc822-next-token is a basic tokenizer. First it skips whitespace and/or comments (CFWS) from iport, if any. Then it reads one token according to tokenizer-specs. If iport reaches EOF before any token is read, EOF is returned.
Tokenizer-specs is a list of tokenizer specs, each of which is either a char-set or a cons of a char-set and a procedure.
After skipping CFWS, the procedure peeks a character at the head of iport, and checks it against the char-sets in tokenizer-specs one by one. If a char-set that contains the character is found, a token is retrieved as follows:
If the tokenizer spec is just a char-set, a sequence of characters that belong to the char-set constitutes a token.
If it is a cons, the procedure is called with iport to read a token.
If the head character doesn’t match any of the char-sets, the character is taken from iport and returned.
The default tokenizer-specs is as follows:
(list (cons #["] rfc822-quoted-string)
      (cons *rfc822-atext-chars* rfc822-dot-atom))
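Here rfc822-quoted-string and rfc822-dot-atom are tokenizer procedures described below, and *rfc822-atext-chars* is bound to a char-set of atext specified in RFC2822. This means that, by default, rfc822-next-token retrieves a token that is either a quoted-string or a dot-atom specified in RFC2822.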
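To make the default behavior concrete, here is a small sketch that reads tokens one at a time with the default tokenizer-specs. The input string and variable names are invented for illustration.

```scheme
(use rfc.822)

;; A made-up field body: a quoted string, a comment, a dot-atom, and a semicolon.
(define in
  (open-input-string "\"Wandering Schemer\" (a comment) mail.example.com ;"))

;; Starts with a double-quote, so the quoted-string tokenizer is used
;; and the surrounding quotes are removed.
(define t1 (rfc822-next-token in))   ; "Wandering Schemer"

;; The comment and the whitespace are skipped, then a dot-atom is read.
(define t2 (rfc822-next-token in))   ; "mail.example.com"

;; #\; matches none of the char-sets, so the character itself is returned.
(define t3 (rfc822-next-token in))   ; #\;

;; Nothing is left, so the EOF object is returned.
(define t4 (rfc822-next-token in))
```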
Using tokenizer-specs, you can customize how the header field is parsed. For example, if you want to retrieve a token that is either (1) a word consisting of alphabetic characters, or (2) a quoted string, then you can call rfc822-next-token as follows:
(rfc822-next-token iport `(#[[:alpha:]] (#["] . ,rfc822-quoted-string)))
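The following sketch applies the custom tokenizer-specs from the call above to a concrete input, to show how characters outside the given char-sets are handled. The input string and the variable names are hypothetical.

```scheme
(use rfc.822)

;; The custom specs from the example above:
;; alphabetic words and quoted strings only.
(define specs `(#[[:alpha:]] (#["] . ,rfc822-quoted-string)))

(define in (open-input-string "abc42 \"x y\""))

(define t1 (rfc822-next-token in specs))   ; "abc"  (run of alphabetic characters)
(define t2 (rfc822-next-token in specs))   ; #\4    (digits match no char-set)
(define t3 (rfc822-next-token in specs))   ; #\2
(define t4 (rfc822-next-token in specs))   ; "x y"  (quoted string, quotes removed)
```

Note that digits are returned one character at a time here, because they belong to neither of the two char-sets.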
A convenience procedure. Creates an input string port for a field body field, and calls rfc822-next-token repeatedly on it until it consumes all input, then returns a list of tokens. Tokenizer-specs is passed to rfc822-next-token.
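A brief sketch of the convenience procedure follows. The procedure name rfc822-field->tokens does not appear in the text above; it is assumed here from the Gauche rfc.822 module, and the field body is made up.

```scheme
(use rfc.822)

;; rfc822-field->tokens is an assumed name for the convenience
;; procedure described above; it takes the field body as a string.
(define tokens
  (rfc822-field->tokens "Wandering Schemer <email@example.com>"))

;; With the default tokenizer-specs, atoms and dot-atoms become strings
;; and the remaining punctuation comes back as characters:
;;   ("Wandering" "Schemer" #\< "email" #\@ "example.com" #\>)
```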
A utility procedure that consumes any comments and/or whitespace characters from iport, and returns the head character that is neither whitespace nor part of a comment. The returned character remains in iport.
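A minimal sketch of this behavior, including a nested comment, is shown below. The procedure name rfc822-skip-cfws is not given in the text above and is assumed from the Gauche rfc.822 module; the input is invented.

```scheme
(use rfc.822)

;; Leading whitespace followed by a comment that contains a nested comment.
(define in (open-input-string "  (outer (nested) comment) token"))

;; rfc822-skip-cfws (assumed name) skips the whitespace and the whole
;; comment, and returns the first character after them.
(define head (rfc822-skip-cfws in))   ; #\t

;; The returned character has only been peeked, so reading the port
;; yields the same character again.
(define next (read-char in))          ; #\t
```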
Bound to a char-set of the characters that are valid constituents of an atom (atext in RFC2822).
Bound to the default tokenizer-specs.
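The tokenizer procedures, such as rfc822-dot-atom and rfc822-quoted-string, read a dot-atom and a quoted-string from an input port, respectively. The double-quotes and escaping backslashes within the quoted-string are removed by rfc822-quoted-string.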
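The two tokenizer procedures named in the default tokenizer-specs can also be called directly on a port. The sketch below assumes each takes a single input port argument, as their use in tokenizer-specs suggests; the inputs are made up.

```scheme
(use rfc.822)

;; rfc822-dot-atom reads a run of atext characters and dots,
;; stopping at the first character that is neither.
(define host (rfc822-dot-atom (open-input-string "mail.example.com>")))
;; host is "mail.example.com"

;; rfc822-quoted-string expects the port to be positioned at the opening
;; double-quote. The input text here is:  "say \"hi\"" rest
(define phrase
  (rfc822-quoted-string (open-input-string "\"say \\\"hi\\\"\" rest")))
;; phrase is the 8-character string:  say "hi"
```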
Takes an RFC822-type date string, and returns eight values:
year, month, day-of-month, hour, minutes, seconds, timezone, day-of-week.
Timezone is an offset from UT in minutes. Day-of-week is the day number counted from Sunday, and may be #f if that information is not available. Month is an integer between 1 and 12, inclusive. If the string is not parsable, all the values are #f.
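As an illustration of the eight return values, the sketch below collects them into a list. The procedure name rfc822-parse-date is not given in the text above and is assumed from the Gauche rfc.822 module. A zero UT offset is used so that the timezone value is simply 0.

```scheme
(use rfc.822)

;; Collect the eight values returned by rfc822-parse-date (assumed name)
;; into a list: year month day-of-month hour minutes seconds timezone day-of-week.
(define parsed
  (call-with-values
      (lambda () (rfc822-parse-date "Fri, 04 Dec 2009 13:45:30 +0000"))
    list))
;; parsed is (2009 12 4 13 45 30 0 5); Friday is day 5 counted from Sunday.

;; An unparsable string gives eight #f values.
(define failed
  (call-with-values (lambda () (rfc822-parse-date "not a date")) list))
```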
Parses an RFC822-type date string and returns a SRFI-19 <date> object (see SRFI-19 Date). If string can’t be parsed, #f is returned.
To construct an RFC822 date string from a SRFI-19 date, use the date formatting procedure described below.
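The following sketch parses a date string into a SRFI-19 date and inspects it with the SRFI-19 accessors. The procedure name rfc822-date->date is not given in the text above and is assumed from the Gauche rfc.822 module; the date string is made up.

```scheme
(use rfc.822)
(use srfi-19)   ; for the <date> accessors

;; rfc822-date->date is an assumed name for the procedure described above.
(define d (rfc822-date->date "Fri, 04 Dec 2009 13:45:30 +0000"))

;; Inspect the result with the standard SRFI-19 accessors.
(define ymd (list (date-year d) (date-month d) (date-day d)))        ; (2009 12 4)
(define hms (list (date-hour d) (date-minute d) (date-second d)))    ; (13 45 30)
(define offset (date-zone-offset d))                                 ; 0
```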
rfc822-write-headers is a sort of inverse function of rfc822-read-headers. It receives a list of header data, in which each header data consists of (<name> <body>), and writes them out in RFC822 header field format to the output port specified by the output keyword argument. The default output is the current output port.
By default, the procedure assumes headers contains all the header fields, and adds an empty line at the end of the output to indicate the end of the header. You can pass a true value to the continue keyword argument to prevent this, so that more headers can be added later.
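A minimal sketch of writing headers into a string, with and without the terminating empty line, follows. The header data and variable names are invented; the keyword arguments are the ones described above.

```scheme
(use rfc.822)

(define headers '(("From" "email@example.com") ("Subject" "Hello")))

;; Write a complete header, terminated by an empty line, into a string.
(define complete
  (call-with-output-string
    (lambda (out) (rfc822-write-headers headers :output out))))

;; With :continue #t the terminating empty line is omitted,
;; so more header fields can be written to the same port afterwards.
(define partial
  (call-with-output-string
    (lambda (out) (rfc822-write-headers headers :output out :continue #t))))

;; Reading the complete text back gives the same fields,
;; with the field names lowercased by the reader.
(define round-trip (call-with-input-string complete rfc822-read-headers))
```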
I said “a sort of” above. That’s because this function doesn’t (and can’t) do the exact inverse. Specifically, the caller is responsible for line folding and for making sure each header line doesn’t exceed the “hard limit” defined by RFC2822 (998 octets). This procedure cannot do the line folding on behalf of the caller, because the places where line folding is possible depend on the semantics of each header field.
It is also the caller’s responsibility to make sure header field bodies contain only non-NUL US-ASCII characters. If you want to include characters outside of that range, you should convert them in the way allowed by the protocol, e.g. MIME. The MIME module (see MIME message handling) provides a convenience procedure, mime-encode-text, for this purpose. Again, this procedure cannot do the encoding automatically, since the way a field should be encoded depends on the header field.
What this procedure can do is to check and report such violations. By default, it runs several checks and signals an error if it finds any violations of RFC2822. You can control this checking behavior by the check keyword argument. It can take one of the following values:
:error - Default. Signals an error if a violation is found.
Doesn’t perform any check. Trust the caller.
A procedure. When rfc822-write-headers finds a violation, the procedure is called with three arguments: the header field name, the header field body, and the type of violation explained below. The procedure may correct the problem and return two values, the corrected header field name and body. The returned values are checked again. If the procedure returns the header field name and body unchanged, an error is signaled in the same way as when :error is specified.
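As a sketch of the procedure form of the check argument, the hypothetical checker below replaces every NUL or non-ASCII character in the body with a question mark, regardless of the violation type it is given. The name ascii-only and the sample header are made up for illustration. A real application would more likely encode such text (e.g. with mime-encode-text) than discard it.

```scheme
(use rfc.822)

;; A hypothetical checker. It receives the field name, the field body and
;; the violation type, and returns the (possibly corrected) name and body.
(define (ascii-only name body reason)
  (values name
          (string-map (lambda (c)
                        (if (< 0 (char->integer c) 128) c #\?))
                      body)))

;; A body containing one non-ASCII character (U+00E9).
(define headers
  (list (list "Subject" (string #\c #\a #\f (integer->char #xe9)))))

;; With the checker, the offending character is replaced and the header is written.
(define fixed
  (call-with-output-string
    (lambda (out)
      (rfc822-write-headers headers :output out :check ascii-only))))

;; With the default checking, the same header data signals an error.
(define strict-result
  (guard (e (else 'rejected))
    (call-with-output-string
      (lambda (out) (rfc822-write-headers headers :output out)))))
```

If the checker could not fix a violation, for example an overlong line, it would return the name and body unchanged and an error would be signaled as described above.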
The third argument passed to the procedure given to the check argument is one of the following symbols. New symbols may be added in future versions for more checks.
Incomplete string is passed.
Header field contains NUL or characters outside of US-ASCII.
Line length exceeds the 998-octet limit.
The string contains a CR and/or LF character that is not part of proper line folding.
Takes a SRFI-19 <date> object (see SRFI-19 Date) and returns a string of its RFC822 date representation. This is the reverse operation of parsing an RFC822 date string into a SRFI-19 date.
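A short sketch of formatting a SRFI-19 date follows. The procedure name date->rfc822-date is not given in the text above and is assumed from the Gauche rfc.822 module. The exact spacing of the resulting string is not shown here, since it is not specified above.

```scheme
(use rfc.822)
(use srfi-19)

;; 4 Dec 2009 13:45:30 at UT. The make-date arguments are:
;; nanosecond second minute hour day month year zone-offset.
(define d (make-date 0 30 45 13 4 12 2009 0))

;; date->rfc822-date is an assumed name for the procedure described above.
;; The result is a string containing the day name, date, time and zone.
(define s (date->rfc822-date d))
```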