rfc.822 - RFC822 message parsing
This module defines a set of procedures that parse and construct the “Internet Message Format”, a text format used to exchange e-mails. The most recent specification can be found in RFC5322. The format was originally defined in RFC 822, and people still call it “RFC822 format”, hence the name of this module. The following document also refers to the format as “RFC822 format”.
rfc822-read-headers reads an RFC822 format message from an input port iport,
until it reaches the end of the message header.
The header fields are returned as a list of the following
format:
((name body) ...)
Each name is a field name and each body is the corresponding field body, both as strings. Field names are converted to lower-case characters. Field bodies are not modified, except that folded lines are unfolded. The order of the fields is preserved.
By default, the parser works permissively. If EOF is encountered while parsing the header, it is taken as the end of the message. A line that is neither a continuation (folded) line nor the start of a new header field is simply ignored. You can change this behavior by giving a true value to the keyword argument strict?; then the parser raises an error for such a malformed header.
The keyword argument reader takes a procedure that reads
a line from iport. Its default is read-line, which
should be enough for most cases.
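As a minimal sketch of the behavior described above, the following reads the header of a small in-memory message. The message text and the variable names are made up for illustration; a real program would more likely pass a file or socket port.

```scheme
(use rfc.822)

;; A small in-memory message (illustrative data only).
(define sample-message
  (string-append "From: foo@example.com\r\n"
                 "To: bar@example.com\r\n"
                 "Subject: Hello\r\n"
                 "\r\n"
                 "This is the body.\r\n"))

;; Parse only the header part; reading stops at the empty line
;; that separates the header from the body.
(define headers
  (call-with-input-string sample-message rfc822-read-headers))

;; Field names come back in lower case, in their original order.
(print (map car headers))
```

Passing :strict? #t to the same call should turn malformed header lines into errors instead of silently skipping them.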
Deprecated.
This is an old name of rfc822-read-headers. It is kept
for backward compatibility. New code should use
rfc822-read-headers instead.
rfc822-header-ref is a utility procedure that gets a specific field from the parsed
header list returned by rfc822-read-headers.
Field-name specifies the field name as a lowercase string.
If a field with the given name is in header-list, the procedure
returns its value as a string. Otherwise, if default is given,
it is returned, and if not, #f is returned.
If there is more than one header with the same field name, the value of
the first one is returned. To get all values of multiple header fields,
use rfc822-header-ref* below.
This procedure can actually be used not only on the result of
rfc822-read-headers, but also to retrieve a value keyed
by a string in any list-of-lists structure of the form ((name value option ...) ...).
For example, the return value of parse-cookie-string
can be passed to rfc822-header-ref
(see rfc.cookie - HTTP cookie handling, for parse-cookie-string).
(rfc822-header-ref '(("from" "foo@example.com") ("to" "bar@example.com")) "from")
  ⇒ "foo@example.com"

;; If no entry matches, #f is returned by default
(rfc822-header-ref '(("from" "foo@example.com") ("to" "bar@example.com")) "reply-to")
  ⇒ #f

;; You can give the default value for no-match case
(rfc822-header-ref '(("from" "foo@example.com") ("to" "bar@example.com")) "reply-to" 'none)
  ⇒ none

;; By giving the default value, you can distinguish
;; the no-match case and there's actually an entry with value #f.
(rfc822-header-ref '(("from" "foo@example.com") ("reply-to" #f)) "reply-to" 'none)
  ⇒ #f
Like rfc822-header-ref, rfc822-header-ref* looks up header entries in
header-list with the name field-name; however,
this procedure returns all values of matching headers
in a list. If there are no matching headers, an empty list
is returned.
Returns an rfc822 header list which is the same as header-list except that a header with field-name and field-value is added. Field-name is converted to lowercase letters. If header-list already contains headers with field-name, such headers are excluded from the output. The header-list won’t be modified.
Several procedures are provided to parse "structured" header fields
of RFC2822 messages. These procedures deal with the body of
a header field, i.e. if the header field is
"To: Wandering Schemer <schemer@example.com>",
they parse "Wandering Schemer <schemer@example.com>".
Most of these procedures take an input port. Usually you first parse
all the header fields with rfc822-read-headers,
obtain the body of the header with rfc822-header-ref,
then open an input string port for the body and use these
procedures to parse it.
This complexity is necessary because different tokenization schemes are needed depending on the type of the field. RFC2822 also allows comments to appear between tokens in most cases, and a simple-minded regexp won’t do the job, since RFC2822 comments can be nested and so can’t be described by a regular grammar. This layer of procedures is therefore designed to be flexible enough to handle various syntaxes. For the standard header types, high-level parsers are also provided; see "specific field parsers" below.
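The usual workflow described above can be sketched as follows. The helper tokenize-body is a hypothetical name introduced only for this illustration; it simply calls rfc822-next-token with the default tokenizer-specs until EOF.

```scheme
(use rfc.822)

;; Step 1: parse the header fields.
(define headers
  (call-with-input-string
      "To: Wandering Schemer <schemer@example.com>\r\n\r\n"
    rfc822-read-headers))

;; Step 2: obtain the body of the field of interest.
(define to-body (rfc822-header-ref headers "to"))

;; Step 3: open an input string port on the body and pull tokens
;; from it until EOF.
(define (tokenize-body body)   ; hypothetical helper, not part of rfc.822
  (let ((port (open-input-string body)))
    (let loop ((tokens '()))
      (let ((tok (rfc822-next-token port)))
        (if (eof-object? tok)
            (reverse tokens)
            (loop (cons tok tokens)))))))

(define to-tokens (tokenize-body to-body))
(write to-tokens)
(newline)
```

Characters that match none of the tokenizer specs, such as the angle brackets here, are expected to appear in the result as individual character objects rather than strings.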
rfc822-next-token is a basic tokenizer. First it skips whitespace and/or
comments (CFWS) from iport, if any. Then it
reads one token according to tokenizer-specs. If iport
reaches EOF before any token is read, EOF is returned.
Tokenizer-specs is a list of tokenizer specs, each of which is either a char-set or a cons of a char-set and a procedure.
After skipping CFWS, the procedure peeks a character
at the head of iport, and checks it
against the char-sets in tokenizer-specs one by one.
If a char-set that contains the character is found,
then a token is retrieved as follows:
If the tokenizer spec is just a char-set, a sequence of characters
that belong to the char-set constitutes a token.
If it is a cons, the procedure is called with iport to
read a token.
If the head character doesn’t match any char-sets, the character is taken from iport and returned.
The default tokenizer-specs is as follows:
(list (cons #["] rfc822-quoted-string)
      (cons *rfc822-atext-chars* rfc822-dot-atom))
Here rfc822-quoted-string and rfc822-dot-atom
are tokenizer procedures described below, and *rfc822-atext-chars*
is bound to a char-set of atext as specified in RFC2822.
This means that, by default, rfc822-next-token retrieves a token
that is either a quoted-string or a dot-atom as specified in RFC2822.
Using tokenizer-specs, you can customize how the header
field is parsed. For example, if you want to retrieve a token
that is either (1) a word consisting of alphabetic characters, or
(2) a quoted string, then you can call rfc822-next-token
as follows:
(rfc822-next-token iport `(#[[:alpha:]] (#["] . ,rfc822-quoted-string)))
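To make the call above concrete, here is a small sketch that applies the same tokenizer-specs to an input string port. The input text and the variable names are assumptions made for the example.

```scheme
(use rfc.822)

;; The same specs as in the call above: alphabetic words and quoted strings.
(define word-or-quoted-specs
  `(#[[:alpha:]] (#["] . ,rfc822-quoted-string)))

(define token-port (open-input-string "hello \"quoted text\" !"))

;; An alphabetic word, read by the plain char-set spec.
(define first-token  (rfc822-next-token token-port word-or-quoted-specs))
;; A quoted string, read by rfc822-quoted-string.
(define second-token (rfc822-next-token token-port word-or-quoted-specs))
;; "!" matches neither char-set, so it is taken as a single character.
(define third-token  (rfc822-next-token token-port word-or-quoted-specs))

(write (list first-token second-token third-token))
(newline)
```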
A convenience procedure. Creates an input string port for
a field body field, and calls rfc822-next-token
repeatedly on it until it consumes all input, then returns
a list of tokens. Tokenizer-specs is passed to
rfc822-next-token.
A utility procedure that consumes any comments and/or whitespace
characters from iport, and returns the head character
that is neither a whitespace nor a comment. The returned character
remains in iport.
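A short sketch of this behavior follows. It assumes the procedure described here is the one exported as rfc822-skip-cfws, which is the name I believe Gauche uses; the text above does not spell it out, so treat that name as an assumption.

```scheme
(use rfc.822)

(define cfws-port (open-input-string "  (a comment) token"))

;; Skips the leading spaces and the parenthesized comment, and
;; returns the first character that belongs to neither.
(define head-char (rfc822-skip-cfws cfws-port))

;; The head character is left in the port, so it is still
;; available to whatever reads from the port next.
(define remaining (read-line cfws-port))

(write (list head-char remaining))
(newline)
```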
*rfc822-atext-chars* is bound to a char-set of the characters that are valid constituents of atom.
Bound to the default tokenizer-specs.
Tokenizers for atom, dot-atom and quoted-string,
respectively. The double-quotes and escaping backslashes within
quoted-string are removed by rfc822-quoted-string.
Takes an RFC822 type date string, and returns eight values:
year, month, day-of-month, hour, minutes, seconds, timezone, day-of-week.
Timezone is an offset from UT in minutes. Day-of-week is the day counted from Sunday, and may be #f if that information is not available. Month is an integer between 1 and 12, inclusive. If the string is not parsable, all the values are #f.
Parses an RFC822 type date string and returns a SRFI-19 <date> object
(see Date). If string can’t be parsed,
returns #f instead.
To construct an RFC822 date string from a SRFI-19 date, you can use
date->rfc822-date below.
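The two date procedures can be tried as in the sketch below. It assumes they are exported as rfc822-parse-date (eight values) and rfc822-date->date (SRFI-19 date); these are the names as I understand Gauche provides them, since the text above does not spell them out. The date string is made up for the example.

```scheme
(use rfc.822)
(use srfi-19)   ; for date-year and other <date> accessors

(define date-string "Sun, 29 Nov 2009 01:49:30 +0000")

;; Eight values, in the order listed above.
(receive (year month day hour minutes seconds timezone day-of-week)
    (rfc822-parse-date date-string)
  (print (list year month day hour minutes seconds timezone day-of-week)))

;; The same string as a SRFI-19 date object.
(define parsed-date (rfc822-date->date date-string))
(print (date-year parsed-date))

;; An unparsable string yields #f instead of a date object.
(define unparsable (rfc822-date->date "not a date"))
```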
rfc822-write-headers is a sort of inverse function of rfc822-read-headers.
It receives a list of header data, in which each header datum
has the form (<name> <body>), and writes them out in RFC822 header
field format to the output port specified by the output keyword
argument. The default output is the current output port.
By default, the procedure assumes headers contains all the header fields, and adds an empty line at the end of the output to indicate the end of the header. You can pass a true value to the continue keyword argument to prevent this, so that more headers can be added later.
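As a brief sketch of the output and continue keyword arguments, the following writes the same header once in a single call and once in two stages. The field values and variable names are illustrative only.

```scheme
(use rfc.822)

;; Write a complete header, terminated by an empty line, into a string.
(define complete-header
  (call-with-output-string
    (lambda (out)
      (rfc822-write-headers '(("From" "foo@example.com")
                              ("Subject" "Hello"))
                            :output out))))

;; With :continue #t the terminating empty line is omitted, so
;; further fields can be written to the same port afterwards.
(define two-stage-header
  (call-with-output-string
    (lambda (out)
      (rfc822-write-headers '(("From" "foo@example.com"))
                            :output out :continue #t)
      (rfc822-write-headers '(("Subject" "Hello"))
                            :output out))))

(display complete-header)
```

Both strings are expected to be identical, and feeding either one back to rfc822-read-headers should return the same fields with lower-cased names.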
I said “a sort of” above. That’s because this function doesn’t
(and can’t) do the exact inverse.
Specifically, the caller is responsible for line folding and
for making sure that no header line exceeds the “hard limit” defined
by RFC2822 (998 octets). If the line length of the header data exceeds
that, the caller should insert a newline (\r\n) followed by one or
more whitespace characters as needed.
This procedure cannot do the line
folding on behalf of the caller, because the places where
line folding is possible depend on the semantics of each
header field.
It is also the caller’s responsibility to make sure header
field bodies contain only non-NUL US-ASCII
characters. If you want to include characters outside of that
range, you should convert them in a way allowed by the
protocol, e.g. MIME. The rfc.mime module
(see rfc.mime - MIME message handling) provides a convenience procedure
mime-encode-text for this purpose.
Again, this procedure cannot do the encoding automatically,
since the way a field
should be encoded depends on the header field.
What this procedure can do is to check and report such violations. By default, it runs several checks and signals an error if it finds any violations of RFC2822. You can control this checking behavior by the check keyword argument. It can take one of the following values:
:error
Default. Signals an error if a violation is found.
#f, :ignore
Doesn’t perform any check. Trust the caller.
procedure
When rfc822-write-headers finds a violation, the procedure
is called with three arguments: the header field name,
the header field body, and the type of violation explained below.
The procedure may correct the problem and return two values,
the corrected header field name and body. The returned values
are checked again. If the procedure returns the
header field name and body unchanged, an error is signaled
in the same way as when :error is specified.
The third argument passed to the procedure given to the check argument is one of the following symbols. New symbols may be added in future versions for more checks.
incomplete-string
An incomplete string is passed.
bad-character
The header field contains characters outside of US-ASCII, or NUL.
line-too-long
The line length exceeds the 998-octet limit.
stray-crlf
The string contains a CR and/or LF character that is not part of proper line folding.
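The procedure form of the check argument might be used as in the following sketch. The handler replace-bad-characters is a hypothetical example, not part of rfc.822, and replacing characters with "?" is only a stand-in for a proper encoding such as the one mime-encode-text provides.

```scheme
(use rfc.822)

;; Hypothetical handler: replaces NUL and non-ASCII characters with "?"
;; when the violation is bad-character. For any other violation it
;; returns the field unchanged, so an error is still signaled.
(define (replace-bad-characters name body violation)
  (if (eq? violation 'bad-character)
      (values name
              (list->string
               (map (lambda (c)
                      (if (< 0 (char->integer c) 128) c #\?))
                    (string->list body))))
      (values name body)))

(define sanitized-header
  (call-with-output-string
    (lambda (out)
      (rfc822-write-headers '(("Subject" "caf\u00e9 menu"))
                            :output out
                            :check replace-bad-characters))))

(display sanitized-header)
```

Because the returned values are checked again, a handler like this only succeeds if its replacement really removes the violation.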