Network Working Group                                          N. Freed
Request for Comments: 2045                                     Innosoft
Obsoletes: 1521, 1522, 1590                               N. Borenstein
Category: Standards Track                                 First Virtual
                                                          November 1996


                 Multipurpose Internet Mail Extensions
                            (MIME) Part One:
                   Format of Internet Message Bodies

Status of this Memo

This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.

Abstract

STD 11, RFC 822, defines a message representation protocol specifying
considerable detail about US-ASCII message headers, and leaves the
message content, or message body, as flat US-ASCII text. This set of
documents, collectively called the Multipurpose Internet Mail
Extensions, or MIME, redefines the format of messages to allow for

    (1) textual message bodies in character sets other than
        US-ASCII,

    (2) an extensible set of different formats for non-textual
        message bodies,

    (3) multi-part message bodies, and

    (4) textual header information in character sets other than
        US-ASCII.

These documents are based on earlier work documented in RFC 934, STD
11, and RFC 1049, but extend and revise them. Because RFC 822 said
so little about message bodies, these documents are largely
orthogonal to (rather than a revision of) RFC 822.

This initial document specifies the various headers used to describe
the structure of MIME messages. The second document, RFC 2046,
defines the general structure of the MIME media typing system and
defines an initial set of media types. The third document, RFC 2047,
describes extensions to RFC 822 to allow non-US-ASCII text data in

Freed & Borenstein Standards Track [Page 1]

RFC 2045 Internet Message Bodies November 1996

Internet mail header fields. The fourth document, RFC 2048, specifies
various IANA registration procedures for MIME-related facilities. The
fifth and final document, RFC 2049, describes MIME conformance
criteria as well as providing some illustrative examples of MIME
message formats, acknowledgements, and the bibliography.

These documents are revisions of RFCs 1521, 1522, and 1590, which
themselves were revisions of RFCs 1341 and 1342. An appendix in RFC
2049 describes differences and changes from previous versions.

Table of Contents

1. Introduction ......................................... 3
2. Definitions, Conventions, and Generic BNF Grammar .... 5
2.1 CRLF ................................................ 5
2.2 Character Set ....................................... 6
2.3 Message ............................................. 6
2.4 Entity .............................................. 6
2.5 Body Part ........................................... 7
2.6 Body ................................................ 7
2.7 7bit Data ........................................... 7
2.8 8bit Data ........................................... 7
2.9 Binary Data ......................................... 7
2.10 Lines .............................................. 7
3. MIME Header Fields ................................... 8
4. MIME-Version Header Field ............................ 8
5. Content-Type Header Field ............................ 10
5.1 Syntax of the Content-Type Header Field ............. 12
5.2 Content-Type Defaults ............................... 14
6. Content-Transfer-Encoding Header Field ............... 14
6.1 Content-Transfer-Encoding Syntax .................... 14
6.2 Content-Transfer-Encodings Semantics ................ 15
6.3 New Content-Transfer-Encodings ...................... 16
6.4 Interpretation and Use .............................. 16
6.5 Translating Encodings ............................... 18
6.6 Canonical Encoding Model ............................ 19
6.7 Quoted-Printable Content-Transfer-Encoding .......... 19
6.8 Base64 Content-Transfer-Encoding .................... 24
7. Content-ID Header Field .............................. 26
8. Content-Description Header Field ..................... 27
9. Additional MIME Header Fields ........................ 27
10. Summary ............................................. 27
11. Security Considerations ............................. 27
12. Authors' Addresses .................................. 28
A. Collected Grammar .................................... 29

1. Introduction

Since its publication in 1982, RFC 822 has defined the standard
format of textual mail messages on the Internet. Its success has
been such that the RFC 822 format has been adopted, wholly or
partially, well beyond the confines of the Internet and the Internet
SMTP transport defined by RFC 821. As the format has seen wider use,
a number of limitations have proven increasingly restrictive for the
user community.

RFC 822 was intended to specify a format for text messages. As such,
non-text messages, such as multimedia messages that might include
audio or images, are simply not mentioned. Even in the case of text,
however, RFC 822 is inadequate for the needs of mail users whose
languages require the use of character sets richer than US-ASCII.
Since RFC 822 does not specify mechanisms for mail containing audio,
video, Asian language text, or even text in most European languages,
additional specifications are needed.

One of the notable limitations of RFC 821/822 based mail systems is
the fact that they limit the contents of electronic mail messages to
relatively short lines (e.g. 1000 characters or less [RFC-821]) of
7bit US-ASCII. This forces users to convert any non-textual data
that they may wish to send into seven-bit bytes representable as
printable US-ASCII characters before invoking a local mail UA (User
Agent, a program with which human users send and receive mail).
Examples of such encodings currently used in the Internet include
pure hexadecimal, uuencode, the 3-in-4 base 64 scheme specified in
RFC 1421, the Andrew Toolkit Representation [ATK], and many others.

The limitations of RFC 822 mail become even more apparent as gateways
are designed to allow for the exchange of mail messages between RFC
822 hosts and X.400 hosts. X.400 [X400] specifies mechanisms for the
inclusion of non-textual material within electronic mail messages.
The current standards for the mapping of X.400 messages to RFC 822
messages specify either that X.400 non-textual material must be
converted to (not encoded in) IA5Text format, or that they must be
discarded, notifying the RFC 822 user that discarding has occurred.
This is clearly undesirable, as information that a user may wish to
receive is lost. Even though a user agent may not have the
capability of dealing with the non-textual material, the user might
have some mechanism external to the UA that can extract useful
information from the material. Moreover, it does not allow for the
fact that the message may eventually be gatewayed back into an X.400
message handling system (i.e., the X.400 message is "tunneled"
through Internet mail), where the non-textual information would
definitely become useful again.

This document describes several mechanisms that combine to solve most
of these problems without introducing any serious incompatibilities
with the existing world of RFC 822 mail. In particular, it
describes:

    (1) A MIME-Version header field, which uses a version
        number to declare a message to be conformant with MIME
        and allows mail processing agents to distinguish
        between such messages and those generated by older or
        non-conformant software, which are presumed to lack
        such a field.

    (2) A Content-Type header field, generalized from RFC 1049,
        which can be used to specify the media type and subtype
        of data in the body of a message and to fully specify
        the native representation (canonical form) of such
        data.

    (3) A Content-Transfer-Encoding header field, which can be
        used to specify both the encoding transformation that
        was applied to the body and the domain of the result.
        Encoding transformations other than the identity
        transformation are usually applied to data in order to
        allow it to pass through mail transport mechanisms
        which may have data or character set limitations.

    (4) Two additional header fields that can be used to
        further describe the data in a body, the Content-ID and
        Content-Description header fields.

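The following sketch makes the four mechanisms concrete. It uses Python's standard `email` package to build a small text message. This is an illustration only, since this document does not depend on any particular library. The Subject, Content-ID, and Content-Description values are invented for the example. Details such as the charset parameter the library picks are library behavior, not requirements of this document.

```python
from email.message import EmailMessage

# Build a single-part text message. set_content() fills in Content-Type and
# Content-Transfer-Encoding, and EmailMessage also adds "MIME-Version: 1.0".
msg = EmailMessage()
msg["Subject"] = "MIME header demo"
msg.set_content("Hello, world.\n", cid="<part1@example.invalid>")

# Content-Description is free text, so it is added explicitly.
msg["Content-Description"] = "A short greeting"

print(msg.as_string())
```

Printing the result shows the MIME-specific fields alongside the ordinary RFC 822 Subject field.
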
All of the header fields defined in this document are subject to the
general syntactic rules for header fields specified in RFC 822. In
particular, all of these header fields except for Content-Disposition
can include RFC 822 comments, which have no semantic content and
should be ignored during MIME processing.

Finally, to specify and promote interoperability, RFC 2049 provides a
basic applicability statement for a subset of the above mechanisms
that defines a minimal level of "conformance" with this document.

HISTORICAL NOTE: Several of the mechanisms described in this set of
documents may seem somewhat strange or even baroque at first reading.
It is important to note that compatibility with existing standards
AND robustness across existing practice were two of the highest
priorities of the working group that developed this set of documents.
In particular, compatibility was always favored over elegance.

Please refer to the current edition of the "Internet Official
Protocol Standards" for the standardization state and status of this
protocol. RFC 822 and STD 3, RFC 1123 also provide essential
background for MIME since no conforming implementation of MIME can
violate them. In addition, several other informational RFC documents
will be of interest to the MIME implementor, in particular RFC 1344,
RFC 1345, and RFC 1524.

2. Definitions, Conventions, and Generic BNF Grammar

Although the mechanisms specified in this set of documents are all
described in prose, most are also described formally in the augmented
BNF notation of RFC 822. Implementors will need to be familiar with
this notation in order to understand this set of documents, and are
referred to RFC 822 for a complete explanation of the augmented BNF
notation.

Some of the augmented BNF in this set of documents makes named
references to syntax rules defined in RFC 822. A complete formal
grammar, then, is obtained by combining the collected grammar
appendices in each document in this set with the BNF of RFC 822 plus
the modifications to RFC 822 defined in RFC 1123 (which specifically
changes the syntax for `return', `date' and `mailbox').

All numeric and octet values are given in decimal notation in this
set of documents. All media type values, subtype values, and
parameter names as defined are case-insensitive. However, parameter
values are case-sensitive unless otherwise specified for the specific
parameter.

FORMATTING NOTE: Notes, such as this one, provide additional
nonessential information which may be skipped by the reader without
missing anything essential. The primary purpose of these
nonessential notes is to convey information about the rationale of
this set of documents, or to place these documents in the proper
historical or evolutionary context. Such information may in
particular be skipped by those who are focused entirely on building a
conformant implementation, but may be of use to those who wish to
understand why certain design choices were made.

2.1. CRLF

The term CRLF, in this set of documents, refers to the sequence of
octets corresponding to the two US-ASCII characters CR (decimal value
13) and LF (decimal value 10) which, taken together, in this order,
denote a line break in RFC 822 mail.

2.2. Character Set

The term "character set" is used in MIME to refer to a method of
converting a sequence of octets into a sequence of characters. Note
that unconditional and unambiguous conversion in the other direction
is not required, in that not all characters may be representable by a
given character set and a character set may provide more than one
sequence of octets to represent a particular sequence of characters.

This definition is intended to allow various kinds of character
encodings, from simple single-table mappings such as US-ASCII to
complex table switching methods such as those that use ISO 2022's
techniques, to be used as character sets. However, the definition
associated with a MIME character set name must fully specify the
mapping to be performed. In particular, use of external profiling
information to determine the exact mapping is not permitted.

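Python's built-in codecs can stand in for MIME character sets to show the asymmetry described above. This is an illustration only. The codec names used here are Python identifiers, and Python's codec behavior is not defined by this document. The sketch shows three things:

- decoding octets is a fixed mapping;
- encoding can fail for a character the set cannot represent;
- an ISO 2022 based set such as ISO-2022-JP accepts more than one octet sequence for the same text.

```python
# Octets -> characters: the direction a MIME character set must define.
latin1_octets = bytes([0x63, 0x61, 0x66, 0xE9])  # "café" in ISO-8859-1
text = latin1_octets.decode("iso-8859-1")
print(text)

# Characters -> octets can fail: ISO-8859-1 has no euro sign (U+20AC).
try:
    "\u20ac".encode("iso-8859-1")
    euro_representable = True
except UnicodeEncodeError:
    euro_representable = False
print("euro sign representable in ISO-8859-1:", euro_representable)

# ISO 2022 switching: ESC ( B explicitly selects ASCII, which is already the
# initial state in ISO-2022-JP, so both octet sequences give the same text.
bare = b"abc"
with_switch = b"\x1b(Babc"
same_text = bare.decode("iso2022_jp") == with_switch.decode("iso2022_jp")
print("two octet sequences, same characters:", same_text)
```
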
NOTE: The term "character set" was originally used to describe such
straightforward schemes as US-ASCII and ISO-8859-1 which have a
simple one-to-one mapping from single octets to single characters.
Multi-octet coded character sets and switching techniques make the
situation more complex. For example, some communities use the term
"character encoding" for what MIME calls a "character set", while
using the phrase "coded character set" to denote an abstract mapping
from integers (not octets) to characters.

2.3. Message

The term "message", when not further qualified, means either a
(complete or "top-level") RFC 822 message being transferred on a
network, or a message encapsulated in a body of type "message/rfc822"
or "message/partial".

2.4. Entity

The term "entity" refers specifically to the MIME-defined header
fields and contents of either a message or one of the parts in the
body of a multipart entity. The specification of such entities is
the essence of MIME. Since the contents of an entity are often
called the "body", it makes sense to speak about the body of an
entity. Any sort of field may be present in the header of an entity,
but only those fields whose names begin with "content-" actually have
any MIME-related meaning. Note that this does NOT imply that they
have no meaning at all -- an entity that is also a message has
non-MIME header fields whose meanings are defined by RFC 822.

2.5. Body Part

The term "body part" refers to an entity inside of a multipart
entity.

2.6. Body

The term "body", when not further qualified, means the body of an
entity, that is, the body of either a message or of a body part.

NOTE: The previous four definitions are clearly circular. This is
unavoidable, since the overall structure of a MIME message is indeed
recursive.

2.7. 7bit Data

"7bit data" refers to data that is all represented as relatively
short lines with 998 octets or less between CRLF line separation
sequences [RFC-821]. No octets with decimal values greater than 127
are allowed and neither are NULs (octets with decimal value 0). CR
(decimal value 13) and LF (decimal value 10) octets only occur as
part of CRLF line separation sequences.

2.8. 8bit Data

"8bit data" refers to data that is all represented as relatively
short lines with 998 octets or less between CRLF line separation
sequences [RFC-821], but octets with decimal values greater than 127
may be used. As with "7bit data", CR and LF octets only occur as part
of CRLF line separation sequences and no NULs are allowed.

2.9. Binary Data

"Binary data" refers to data where any sequence of octets whatsoever
is allowed.

2.10. Lines

"Lines" are defined as sequences of octets separated by CRLF
sequences. This is consistent with both RFC 821 and RFC 822.
"Lines" only refers to a unit of data in a message, which may or may
not correspond to something that is actually displayed by a user
agent.

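The definitions in Sections 2.7 through 2.10 can be checked mechanically. The sketch below is one possible reading of them. It assumes the whole body is available in memory as a bytes object. `classify_octets` is a hypothetical helper name, not an interface defined by MIME.

```python
def classify_octets(data: bytes) -> str:
    """Return "7bit", "8bit", or "binary" following Sections 2.7-2.9."""
    # NULs are never allowed in 7bit or 8bit data.
    if b"\x00" in data:
        return "binary"
    # Section 2.10: lines are the octet sequences separated by CRLF.
    for line in data.split(b"\r\n"):
        # A CR or LF left over after splitting was not part of a CRLF pair.
        if b"\r" in line or b"\n" in line:
            return "binary"
        # At most 998 octets may appear between CRLF separators.
        if len(line) > 998:
            return "binary"
    # Octets above 127 are what separates 8bit data from 7bit data.
    if any(octet > 127 for octet in data):
        return "8bit"
    return "7bit"


print(classify_octets(b"Hello\r\nWorld\r\n"))              # 7bit
print(classify_octets("caf\u00e9\r\n".encode("latin-1")))  # 8bit
print(classify_octets(b"bare LF\n"))                       # binary
```

The order of the checks is a design choice. Any single violation of the line rules forces "binary", and only data that passes all of them is split into "7bit" and "8bit" by its highest octet value.
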
3. MIME Header Fields

MIME defines a number of new RFC 822 header fields that are used to
describe the content of a MIME entity. These header fields occur in
at least two contexts:

    (1) As part of a regular RFC 822 message header.

    (2) In a MIME body part header within a multipart
        construct.

The formal definition of these header fields is as follows:

     entity-headers := [ content CRLF ]
                       [ encoding CRLF ]
                       [ id CRLF ]
                       [ description CRLF ]
                       *( MIME-extension-field CRLF )

     MIME-message-headers := entity-headers
                             fields
                             version CRLF
                             ; The ordering of the header
                             ; fields implied by this BNF
                             ; definition should be ignored.

     MIME-part-headers := entity-headers
                          [ fields ]
                          ; Any field not beginning with
                          ; "content-" can have no defined
                          ; meaning and may be ignored.
                          ; The ordering of the header
                          ; fields implied by this BNF
                          ; definition should be ignored.

The syntax of the various specific MIME header fields will be
described in the following sections.

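The BNF above says that header order is insignificant. It also says that, in a body part, only fields whose names begin with "content-" carry a defined meaning. A receiver can therefore collect the MIME fields of a part into a mapping keyed by lower-cased field name. The following is a sketch under these assumptions:

- the header has already been unfolded into (name, value) pairs;
- `mime_part_fields` is a hypothetical helper name;
- keeping the last copy of a repeated field is an arbitrary choice made here.

```python
from __future__ import annotations


def mime_part_fields(headers: list[tuple[str, str]]) -> dict[str, str]:
    """Collect the "content-" fields of a body part header.

    Field names are compared case-insensitively, as RFC 822 requires, and
    the order in which fields appear is ignored. Other fields are dropped
    because they have no defined meaning in a body part.
    """
    fields: dict[str, str] = {}
    for name, value in headers:
        key = name.strip().lower()
        if key.startswith("content-"):
            fields[key] = value.strip()  # a repeated field overwrites the earlier one
    return fields


part = [
    ("Content-Description", "Company logo"),
    ("X-Comment", "ignored in a body part"),
    ("CONTENT-TYPE", "image/gif"),
]
print(mime_part_fields(part))
```
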
4. MIME-Version Header Field

Since RFC 822 was published in 1982, there has really been only one
format standard for Internet messages, and there has been little
perceived need to declare the format standard in use. This document
is an independent specification that complements RFC 822. Although
the extensions in this document have been defined in such a way as to
be compatible with RFC 822, there are still circumstances in which it
might be desirable for a mail-processing agent to know whether a
message was composed with the new standard in mind.

Therefore, this document defines a new header field, "MIME-Version",
which is to be used to declare the version of the Internet message
body format standard in use.

Messages composed in accordance with this document MUST include such
a header field, with the following verbatim text:

     MIME-Version: 1.0

The presence of this header field is an assertion that the message
has been composed in compliance with this document.

Since it is possible that a future document might extend the message
format standard again, a formal BNF is given for the content of the
MIME-Version field:

     version := "MIME-Version" ":" 1*DIGIT "." 1*DIGIT

Thus, future format specifiers, which might replace or extend "1.0",
are constrained to be two integer fields, separated by a period. If
a message is received with a MIME-version value other than "1.0", it
cannot be assumed to conform with this document.

Note that the MIME-Version header field is required at the top level
of a message. It is not required for each body part of a multipart
entity. It is required for the embedded headers of a body of type
"message/rfc822" or "message/partial" if and only if the embedded
message is itself claimed to be MIME-conformant.

It is not possible to fully specify how a mail reader that conforms
with MIME as defined in this document should treat a message that
might arrive in the future with some value of MIME-Version other than
"1.0".

It is also worth noting that version control for specific media types
is not accomplished using the MIME-Version mechanism. In particular,
some formats (such as application/postscript) have version numbering
conventions that are internal to the media format. Where such
conventions exist, MIME does nothing to supersede them. Where no
such conventions exist, a MIME media type might use a "version"
parameter in the content-type field if necessary.

NOTE TO IMPLEMENTORS: When checking MIME-Version values any RFC 822
comment strings that are present must be ignored. In particular, the
following four MIME-Version fields are equivalent:

     MIME-Version: 1.0

     MIME-Version: 1.0 (produced by MetaSend Vx.x)

     MIME-Version: (produced by MetaSend Vx.x) 1.0

     MIME-Version: 1.(produced by MetaSend Vx.x)0

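The rule above can be implemented by discarding comments before matching the `version` BNF. What follows is a minimal sketch. `strip_comments` and `mime_version` are hypothetical helper names. The function returns the version as a string, so the caller decides how to treat values other than "1.0".

```python
import re

# version := 1*DIGIT "." 1*DIGIT; [0-9] restricts matching to US-ASCII digits.
_VERSION = re.compile(r"[0-9]+\.[0-9]+")


def strip_comments(value: str) -> str:
    """Remove RFC 822 comments, including nested ones, from a field value.

    Simplification: quoted strings are not recognized, which is harmless for
    MIME-Version because its grammar contains none.
    """
    out = []
    depth = 0
    i = 0
    while i < len(value):
        ch = value[i]
        if depth > 0 and ch == "\\":
            i += 2  # quoted-pair inside a comment: skip both characters
            continue
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
        i += 1
    if depth > 0:
        raise ValueError("unterminated comment")
    return "".join(out)


def mime_version(field_value: str):
    """Return the version string, such as "1.0", or None if it is malformed."""
    try:
        stripped = strip_comments(field_value)
    except ValueError:
        return None
    # RFC 822 allows white space between the tokens of a structured field.
    cleaned = "".join(stripped.split())
    return cleaned if _VERSION.fullmatch(cleaned) else None


for raw in ["1.0", "1.0 (produced by MetaSend Vx.x)",
            "(produced by MetaSend Vx.x) 1.0", "1.(produced by MetaSend Vx.x)0"]:
    print(repr(raw), "->", mime_version(raw))
```

Comparing the returned string with "1.0" then tells a reader whether the message claims conformance with this document. As the text above notes, how to handle any other value is left open.
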
In the absence of a MIME-Version field, a receiving mail user agent
(whether conforming to MIME requirements or not) may optionally
choose to interpret the body of the message according to local
conventions. Many such conventions are currently in use and it
should be noted that in practice non-MIME messages can contain just
about anything.

It is impossible to be certain that a non-MIME mail message is
actually plain text in the US-ASCII character set since it might well
be a message that, using some set of nonstandard local conventions
that predate MIME, includes text in another character set or
non-textual data presented in a manner that cannot be automatically
recognized (e.g., a uuencoded compressed UNIX tar file).

5. Content-Type Header Field

The purpose of the Content-Type field is to describe the data
contained in the body fully enough that the receiving user agent can
pick an appropriate agent or mechanism to present the data to the
user, or otherwise deal with the data in an appropriate manner. The
value in this field is called a media type.

HISTORICAL NOTE: The Content-Type header field was first defined in
RFC 1049. RFC 1049 used a simpler and less powerful syntax, but one
that is largely compatible with the mechanism given here.

The Content-Type header field specifies the nature of the data in the
body of an entity by giving media type and subtype identifiers, and
by providing auxiliary information that may be required for certain
media types. After the media type and subtype names, the remainder
of the header field is simply a set of parameters, specified in an
attribute=value notation. The ordering of parameters is not
significant.

In general, the top-level media type is used to declare the general
type of data, while the subtype specifies a specific format for that
type of data. Thus, a media type of "image/xyz" is enough to tell a
user agent that the data is an image, even if the user agent has no
knowledge of the specific image format "xyz". Such information can
be used, for example, to decide whether or not to show a user the raw
data from an unrecognized subtype -- such an action might be
reasonable for unrecognized subtypes of text, but not for
unrecognized subtypes of image or audio. For this reason, registered
subtypes of text, image, audio, and video should not contain embedded
information that is really of a different type. Such compound
formats should be represented using the "multipart" or "application"
types.

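Once the subtype is unrecognized, the decision described above depends only on the top-level type. Here is a sketch of that logic. The set of "known" subtypes and the three strategy names are hypothetical, chosen only to illustrate the difference between text and image data. Type and subtype are lower-cased because their matching is case-insensitive.

```python
# Hypothetical set of subtypes this particular reader knows how to render.
KNOWN = {"text/plain", "image/gif", "image/jpeg", "audio/basic"}


def presentation_for(media_type: str) -> str:
    """Pick a (hypothetical) presentation strategy for a type/subtype pair."""
    mtype, _, subtype = media_type.strip().lower().partition("/")
    if f"{mtype}/{subtype}" in KNOWN:
        return "render"
    if mtype == "text":
        # Unknown text subtype: the raw data is usually still readable.
        return "show-raw"
    # Unknown image, audio, video, or application data: raw octets are of
    # no use to a person, so offer to save the body instead.
    return "offer-save"


print(presentation_for("image/xyz"))      # offer-save
print(presentation_for("text/enriched"))  # show-raw
```
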
Parameters are modifiers of the media subtype, and as such do not
fundamentally affect the nature of the content. The set of
meaningful parameters depends on the media type and subtype. Most
parameters are associated with a single specific subtype. However, a
given top-level media type may define parameters which are applicable
to any subtype of that type. Parameters may be required by their
defining content type or subtype or they may be optional. MIME
implementations must ignore any parameters whose names they do not
recognize.

For example, the "charset" parameter is applicable to any subtype of
"text", while the "boundary" parameter is required for any subtype of
the "multipart" media type.

There are NO globally-meaningful parameters that apply to all media
types. Truly global mechanisms are best addressed, in the MIME
model, by the definition of additional Content-* header fields.

An initial set of seven top-level media types is defined in RFC 2046.
Five of these are discrete types whose content is essentially opaque
as far as MIME processing is concerned. The remaining two are
composite types whose contents require additional handling by MIME
processors.

This set of top-level media types is intended to be substantially
complete. It is expected that additions to the larger set of
supported types can generally be accomplished by the creation of new
subtypes of these initial types. In the future, more top-level types
may be defined only by a standards-track extension to this standard.
If another top-level type is to be used for any reason, it must be
given a name starting with "X-" to indicate its non-standard status
and to avoid a potential conflict with a future official name.

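The naming rule above suggests a simple check on the top-level type. This sketch hard-codes the seven types named in the grammar of Section 5.1. Reporting any other non-"X-" name as "unrecognized" is an assumption here. It stands in for knowledge of later standards-track extensions. `top_level_status` is a hypothetical helper name.

```python
# The seven initial top-level types named in the Section 5.1 grammar.
DISCRETE_TYPES = {"text", "image", "audio", "video", "application"}
COMPOSITE_TYPES = {"message", "multipart"}


def top_level_status(media_type: str) -> str:
    """Classify a top-level media type name, case-insensitively."""
    name = media_type.strip().lower()
    if name in DISCRETE_TYPES:
        return "discrete"
    if name in COMPOSITE_TYPES:
        return "composite"
    if name.startswith("x-") and len(name) > 2:
        # An x-token: non-standard, usable only by agreement between parties.
        # (This sketch does not check that the rest is a valid token.)
        return "private"
    # Anything else would need a later standards-track extension that this
    # sketch knows nothing about.
    return "unrecognized"


for candidate in ["TEXT", "multipart", "X-Widget", "widget"]:
    print(candidate, "->", top_level_status(candidate))
```
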
5.1. Syntax of the Content-Type Header Field

In the Augmented BNF notation of RFC 822, a Content-Type header field
value is defined as follows:

     content := "Content-Type" ":" type "/" subtype
                *(";" parameter)
                ; Matching of media type and subtype
                ; is ALWAYS case-insensitive.

     type := discrete-type / composite-type

     discrete-type := "text" / "image" / "audio" / "video" /
                      "application" / extension-token

     composite-type := "message" / "multipart" / extension-token

     extension-token := ietf-token / x-token

     ietf-token := <An extension token defined by a
                    standards-track RFC and registered
                    with IANA.>

     x-token := <The two characters "X-" or "x-" followed, with
                 no intervening white space, by any token>

     subtype := extension-token / iana-token

     iana-token := <A publicly-defined extension token. Tokens
                    of this form must be registered with IANA
                    as specified in RFC 2048.>

     parameter := attribute "=" value

     attribute := token
                  ; Matching of attributes
                  ; is ALWAYS case-insensitive.

     value := token / quoted-string

     token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
                 or tspecials>

     tspecials :=  "(" / ")" / "<" / ">" / "@" /
                   "," / ";" / ":" / "\" / <"> /
                   "/" / "[" / "]" / "?" / "="
                   ; Must be in quoted-string,
                   ; to use within parameter values

Note that the definition of "tspecials" is the same as the RFC 822
definition of "specials" with the addition of the three characters
"/", "?", and "=", and the removal of ".".

Note also that a subtype specification is MANDATORY -- it may not be
omitted from a Content-Type header field. As such, there are no
default subtypes.

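When generating a Content-Type field, the token and tspecials rules of this section decide whether a parameter value can be written bare or must be quoted. What follows is a minimal sketch. `is_token` and `format_param` are hypothetical helper names. The value is assumed to be US-ASCII text without CR or LF.

```python
# tspecials from the grammar above: RFC 822 "specials" plus "/", "?", and
# "=", minus ".".
TSPECIALS = set('()<>@,;:\\"/[]?=')


def is_token(s: str) -> bool:
    """True if s is a non-empty token: US-ASCII, no SPACE, CTLs, or tspecials."""
    return bool(s) and all(33 <= ord(c) <= 126 and c not in TSPECIALS for c in s)


def format_param(attribute: str, value: str) -> str:
    """Render attribute=value, quoting the value only when it is not a token.

    Assumes the value is US-ASCII text without CR or LF.
    """
    if is_token(value):
        return f"{attribute}={value}"
    # Inside a quoted-string, backslash and double quote must be escaped.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{attribute}="{escaped}"'


print(format_param("charset", "us-ascii"))          # charset=us-ascii
print(format_param("name", "report.v2"))            # name=report.v2
print(format_param("boundary", "simple boundary"))  # boundary="simple boundary"
```

The second call shows the effect of removing "." from the RFC 822 specials: a dotted value stays a bare token.
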
The type, subtype, and parameter names are not case sensitive. For
|
|||
|
example, TEXT, Text, and TeXt are all equivalent top-level media
|
|||
|
types. Parameter values are normally case sensitive, but sometimes
|
|||
|
are interpreted in a case-insensitive fashion, depending on the
|
|||
|
intended use. (For example, multipart boundaries are case-sensitive,
|
|||
|
but the "access-type" parameter for message/External-body is not
|
|||
|
case-sensitive.)
|
|||
|
|
|||
|
Note that the value of a quoted string parameter does not include the
|
|||
|
quotes. That is, the quotation marks in a quoted-string are not a
|
|||
|
part of the value of the parameter, but are merely used to delimit
|
|||
|
that parameter value. In addition, comments are allowed in
|
|||
|
accordance with RFC 822 rules for structured header fields. Thus the
|
|||
|
following two forms
|
|||
|
|
|||
|
Content-type: text/plain; charset=us-ascii (Plain text)
|
|||
|
|
|||
|
Content-type: text/plain; charset="us-ascii"
|
|||
|
|
|||
|
are completely equivalent.
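A rough Python sketch of these rules follows, assuming the field value has already been unfolded into a single line. The helper names are hypothetical. It drops RFC 822 comments (including nested ones), strips the quotation marks from quoted-strings, lowercases the type, subtype, and attribute names, and leaves parameter values untouched, since their case sensitivity depends on the parameter. It is not a complete RFC 822 structured-field parser.

```python
from typing import Dict, Tuple

# tspecials as defined above; a token may not contain any of them.
TSPECIALS = set('()<>@,;:\\"/[]?=')


def _skip_cfws(s: str, i: int) -> int:
    """Skip white space and (possibly nested) RFC 822 comments from offset i."""
    while i < len(s):
        if s[i] in " \t":
            i += 1
        elif s[i] == "(":
            depth = 0
            while i < len(s):
                if s[i] == "\\":          # quoted-pair inside a comment
                    i += 2
                    continue
                if s[i] == "(":
                    depth += 1
                elif s[i] == ")":
                    depth -= 1
                    if depth == 0:
                        i += 1
                        break
                i += 1
            else:
                raise ValueError("unterminated comment")
        else:
            break
    return i


def _read_token(s: str, i: int) -> Tuple[str, int]:
    start = i
    # Printable US-ASCII (33-126) that is not a tspecial.
    while i < len(s) and s[i] not in TSPECIALS and " " < s[i] < "\x7f":
        i += 1
    if i == start:
        raise ValueError(f"expected a token at offset {start}")
    return s[start:i], i


def _read_quoted(s: str, i: int) -> Tuple[str, int]:
    # s[i] is the opening quote; the quotes are not part of the value.
    i += 1
    out = []
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s):
            out.append(s[i + 1])          # quoted-pair: keep the escaped char
            i += 2
        elif s[i] == '"':
            return "".join(out), i + 1
        else:
            out.append(s[i])
            i += 1
    raise ValueError("unterminated quoted-string")


def parse_content_type(value: str) -> Tuple[str, str, Dict[str, str]]:
    """Return (type, subtype, params) with names lowercased, values as sent."""
    i = _skip_cfws(value, 0)
    mtype, i = _read_token(value, i)
    i = _skip_cfws(value, i)
    if i >= len(value) or value[i] != "/":
        raise ValueError("a subtype is mandatory")
    i = _skip_cfws(value, i + 1)
    subtype, i = _read_token(value, i)
    params: Dict[str, str] = {}
    while True:
        i = _skip_cfws(value, i)
        if i >= len(value):
            break
        if value[i] != ";":
            raise ValueError(f"unexpected {value[i]!r} at offset {i}")
        i = _skip_cfws(value, i + 1)
        if i >= len(value):
            break                         # tolerate a trailing ";"
        attr, i = _read_token(value, i)
        i = _skip_cfws(value, i)
        if i >= len(value) or value[i] != "=":
            raise ValueError("expected '=' after attribute")
        i = _skip_cfws(value, i + 1)
        if i < len(value) and value[i] == '"':
            val, i = _read_quoted(value, i)
        else:
            val, i = _read_token(value, i)
        params[attr.lower()] = val        # attribute matching is case-insensitive
    return mtype.lower(), subtype.lower(), params
```

Both example forms above parse to `("text", "plain", {"charset": "us-ascii"})`.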

Beyond this syntax, the only syntactic constraint on the definition
of subtype names is the desire that their uses must not conflict.
That is, it would be undesirable to have two different communities
using "Content-Type: application/foobar" to mean two different
things. The process of defining new media subtypes, then, is not
intended to be a mechanism for imposing restrictions, but simply a
mechanism for publicizing their definition and usage. There are,
therefore, two acceptable mechanisms for defining new media subtypes:

(1) Private values (starting with "X-") may be defined
bilaterally between two cooperating agents without
outside registration or standardization. Such values
cannot be registered or standardized.

(2) New standard values should be registered with IANA as
described in RFC 2048.

The second document in this set, RFC 2046, defines the initial set of
media types for MIME.

5.2. Content-Type Defaults

Default RFC 822 messages without a MIME Content-Type header are taken
by this protocol to be plain text in the US-ASCII character set,
which can be explicitly specified as:

     Content-type: text/plain; charset=us-ascii

This default is assumed if no Content-Type header field is specified.
It is also recommended that this default be assumed when a
syntactically invalid Content-Type header field is encountered. In
the presence of a MIME-Version header field and the absence of any
Content-Type header field, a receiving User Agent can also assume
that plain US-ASCII text was the sender's intent. Plain US-ASCII
text may still be assumed in the absence of a MIME-Version or the
presence of a syntactically invalid Content-Type header field, but
the sender's intent might have been otherwise.
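One way a receiving agent might apply these defaults is sketched below. The `parse` argument stands for any Content-Type parser that raises `ValueError` on invalid syntax; both it and `toy_parse` (a deliberately simplistic stand-in) are hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

MediaType = Tuple[str, str, Dict[str, str]]


def _default() -> MediaType:
    # text/plain; charset=us-ascii, built fresh so callers may mutate it.
    return ("text", "plain", {"charset": "us-ascii"})


def effective_media_type(header: Optional[str],
                         parse: Callable[[str], MediaType]) -> MediaType:
    """Apply the Content-Type defaults of this section.

    header: the raw Content-Type value, or None if the field is absent.
    """
    if header is None:
        return _default()           # no Content-Type field at all
    try:
        return parse(header)
    except ValueError:
        return _default()           # recommended fallback for invalid syntax


def toy_parse(value: str) -> MediaType:
    """Hypothetical stand-in parser that accepts only 'type/subtype'."""
    mtype, slash, subtype = value.strip().partition("/")
    if not slash or not mtype or not subtype or " " in value.strip():
        raise ValueError("invalid Content-Type")
    return mtype.lower(), subtype.lower(), {}
```

Whether a MIME-Version field is present does not change the result here; an agent that wants to flag possibly unintended defaults could record that separately.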

6. Content-Transfer-Encoding Header Field

Many media types which could be usefully transported via email are
represented, in their "natural" format, as 8bit character or binary
data. Such data cannot be transmitted over some transfer protocols.
For example, RFC 821 (SMTP) restricts mail messages to 7bit US-ASCII
data with lines no longer than 1000 characters including any trailing
CRLF line separator.

It is necessary, therefore, to define a standard mechanism for
encoding such data into a 7bit short line format. Proper labelling
of unencoded material in less restrictive formats for direct use over
less restrictive transports is also desirable. This document
specifies that such encodings will be indicated by a new
"Content-Transfer-Encoding" header field. This field has not been
defined by any previous standard.

6.1. Content-Transfer-Encoding Syntax

The Content-Transfer-Encoding field's value is a single token
specifying the type of encoding, as enumerated below. Formally:

encoding := "Content-Transfer-Encoding" ":" mechanism

mechanism := "7bit" / "8bit" / "binary" /
             "quoted-printable" / "base64" /
             ietf-token / x-token

These values are not case sensitive -- Base64 and BASE64 and bAsE64
are all equivalent. An encoding type of 7BIT requires that the body
is already in a 7bit mail-ready representation. This is the default
value -- that is, "Content-Transfer-Encoding: 7BIT" is assumed if the
Content-Transfer-Encoding header field is not present.
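The mechanism values can be normalized and classified as follows. This hypothetical sketch assumes any RFC 822 comments have already been removed from the field value; treating an unknown token without the "x-" prefix as "unrecognized" reflects that ietf-tokens may be defined in the future.

```python
from typing import Optional

STANDARD_MECHANISMS = {"7bit", "8bit", "binary", "quoted-printable", "base64"}


def normalize_mechanism(value: Optional[str]) -> str:
    """Lowercase the mechanism; an absent field means the default, 7bit."""
    if value is None:
        return "7bit"
    return value.strip().lower()


def mechanism_kind(mechanism: str) -> str:
    """Classify a normalized mechanism as standard, private, or unrecognized."""
    if mechanism in STANDARD_MECHANISMS:
        return "standard"
    if mechanism.startswith("x-"):
        return "private"        # x-token: agreed bilaterally, never standardized
    return "unrecognized"       # e.g. an ietf-token this code does not know
```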

6.2. Content-Transfer-Encodings Semantics

This single Content-Transfer-Encoding token actually provides two
pieces of information. It specifies what sort of encoding
transformation the body was subjected to and hence what decoding
operation must be used to restore it to its original form, and it
specifies what the domain of the result is.

The transformation part of any Content-Transfer-Encoding specifies,
either explicitly or implicitly, a single, well-defined decoding
algorithm, which for any sequence of encoded octets either transforms
it to the original sequence of octets which was encoded, or shows
that it is illegal as an encoded sequence. Content-Transfer-Encoding
transformations never depend on any additional external profile
information for proper operation. Note that while decoders must
produce a single, well-defined output for a valid encoding, no such
restrictions exist for encoders: encoding a given sequence of octets
to different, equivalent encoded sequences is perfectly legal.

Three transformations are currently defined: identity, the
"quoted-printable" encoding, and the "base64" encoding. The domains
are "binary", "8bit" and "7bit".

The Content-Transfer-Encoding values "7bit", "8bit", and "binary" all
mean that the identity (i.e. NO) encoding transformation has been
performed. As such, they serve simply as indicators of the domain of
the body data, and provide useful information about the sort of
encoding that might be needed for transmission in a given transport
system. The terms "7bit data", "8bit data", and "binary data" are
all defined in Section 2.
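The two pieces of information carried by the token can be modelled as a table mapping each mechanism to a decoding function and to the domain of the body as transmitted. The sketch uses Python's standard `binascii.a2b_qp` and `base64.b64decode`; the table and function names are hypothetical, and an unknown mechanism simply raises `KeyError`.

```python
import base64
import binascii
from typing import Callable, Dict, Tuple


def _identity(data: bytes) -> bytes:
    # "7bit", "8bit" and "binary" perform NO transformation.
    return data


# mechanism -> (decoding transformation, domain of the body as transmitted)
MECHANISMS: Dict[str, Tuple[Callable[[bytes], bytes], str]] = {
    "7bit": (_identity, "7bit"),
    "8bit": (_identity, "8bit"),
    "binary": (_identity, "binary"),
    "quoted-printable": (binascii.a2b_qp, "7bit"),
    "base64": (base64.b64decode, "7bit"),
}


def decode_body(mechanism: str, body: bytes) -> bytes:
    """Undo the transformation named by the (case-insensitive) mechanism."""
    decoder, _domain = MECHANISMS[mechanism.lower()]
    return decoder(body)


def transport_domain(mechanism: str) -> str:
    """Return the domain the encoded body occupies on the wire."""
    return MECHANISMS[mechanism.lower()][1]
```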

The quoted-printable and base64 encodings transform their input from
an arbitrary domain into material in the "7bit" range, thus making it
safe to carry over restricted transports. The specific definitions of
the transformations are given below.

The proper Content-Transfer-Encoding label must always be used.
Labelling unencoded data containing 8bit characters as "7bit" is not
allowed, nor is labelling unencoded non-line-oriented data as
anything other than "binary" allowed.
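Choosing the correct identity label requires classifying the octets against the Section 2 definitions. The hypothetical function below assumes those definitions: 7bit data has lines of at most 998 octets between CRLFs, no octets above 127, no NULs, and CR and LF only as CRLF pairs; 8bit data is the same except that octets above 127 are allowed; anything else is binary.

```python
def domain_label(data: bytes) -> str:
    """Return "7bit", "8bit", or "binary" for unencoded data."""
    has_8bit = False
    line_len = 0
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        if b == 0x0D:                       # CR is only legal as part of CRLF
            if i + 1 < n and data[i + 1] == 0x0A:
                line_len = 0
                i += 2
                continue
            return "binary"
        if b == 0x0A or b == 0x00:          # bare LF or NUL
            return "binary"
        if b > 127:
            has_8bit = True
        line_len += 1
        if line_len > 998:                  # line too long for 7bit/8bit
            return "binary"
        i += 1
    return "8bit" if has_8bit else "7bit"
```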

Unlike media subtypes, a proliferation of Content-Transfer-Encoding
values is both undesirable and unnecessary. However, establishing
only a single transformation into the "7bit" domain does not seem
possible. There is a tradeoff between the desire for a compact and
efficient encoding of largely-binary data and the desire for a
somewhat readable encoding of data that is mostly, but not entirely,
7bit. For this reason, at least two encoding mechanisms are
necessary: a more or less readable encoding (quoted-printable) and a
"dense" or "uniform" encoding (base64).

Mail transport for unencoded 8bit data is defined in RFC 1652. As of
the initial publication of this document, there are no standardized
Internet mail transports for which it is legitimate to include
unencoded binary data in mail bodies. Thus there are no
circumstances in which the "binary" Content-Transfer-Encoding is
actually valid in Internet mail. However, in the event that binary
mail transport becomes a reality in Internet mail, or when MIME is
used in conjunction with any other binary-capable mail transport
mechanism, binary bodies must be labelled as such using this
mechanism.

NOTE: The five values defined for the Content-Transfer-Encoding field
imply nothing about the media type other than the algorithm by which
it was encoded or the transport system requirements if unencoded.

6.3. New Content-Transfer-Encodings

Implementors may, if necessary, define private
Content-Transfer-Encoding values, but must use an x-token, which is a
name prefixed by "X-", to indicate its non-standard status, e.g.,
"Content-Transfer-Encoding: x-my-new-encoding". Additional
standardized Content-Transfer-Encoding values must be specified by a
standards-track RFC.
The requirements such specifications must meet are given in RFC 2048.
As such, all of the content-transfer-encoding namespace except that
beginning with "X-" is explicitly reserved to the IETF for future
use.

Unlike media types and subtypes, the creation of new
Content-Transfer-Encoding values is STRONGLY discouraged, as it seems
likely to hinder interoperability with little potential benefit.

6.4. Interpretation and Use

If a Content-Transfer-Encoding header field appears as part of a
message header, it applies to the entire body of that message. If a
Content-Transfer-Encoding header field appears as part of an entity's
headers, it applies only to the body of that entity. If an entity is
of type "multipart", the Content-Transfer-Encoding is not permitted to
have any value other than "7bit", "8bit" or "binary". Even more
severe restrictions apply to some subtypes of the "message" type.

It should be noted that most media types are defined in terms of
octets rather than bits, so that the mechanisms described here are
mechanisms for encoding arbitrary octet streams, not bit streams. If
a bit stream is to be encoded via one of these mechanisms, it must
first be converted to an 8bit byte stream using the network standard
bit order ("big-endian"), in which the earlier bits in a stream
become the higher-order bits in an 8bit byte. A bit stream not ending
at an 8bit boundary must be padded with zeroes. RFC 2046 provides a
mechanism for noting the addition of such padding in the case of the
application/octet-stream media type, which has a "padding" parameter.
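Converting a bit stream to octets in network bit order, with zero padding, might look like this. The function name is hypothetical; the returned padding count is the kind of value the RFC 2046 "padding" parameter records.

```python
from typing import List, Tuple


def pack_bits(bits: List[int]) -> Tuple[bytes, int]:
    """Pack bits into octets, earliest bit first (big-endian bit order).

    Returns the octets and the number of zero bits appended as padding.
    """
    padding = (-len(bits)) % 8              # zero bits needed to reach 8bit boundary
    padded = list(bits) + [0] * padding
    out = bytearray()
    for i in range(0, len(padded), 8):
        byte = 0
        for bit in padded[i:i + 8]:
            byte = (byte << 1) | (bit & 1)  # earlier bit becomes higher-order bit
        out.append(byte)
    return bytes(out), padding
```

For instance, the three-bit stream 1, 0, 1 becomes the single octet 0xA0 with five bits of padding.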

The encoding mechanisms defined here explicitly encode all data in
US-ASCII. Thus, for example, suppose an entity has header fields
such as:

     Content-Type: text/plain; charset=ISO-8859-1
     Content-transfer-encoding: base64

This must be interpreted to mean that the body is a base64 US-ASCII
encoding of data that was originally in ISO-8859-1, and will be in
that character set again after decoding.
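The two layers in this example can be made concrete in Python: the body is decoded from base64 first, and only then are the resulting octets interpreted according to the charset parameter. The function names are hypothetical; `base64` is the Python standard library module.

```python
import base64


def encode_text_body(text: str, charset: str) -> bytes:
    """Encode text in `charset`, then apply base64; the result is US-ASCII."""
    return base64.b64encode(text.encode(charset))


def decode_text_body(body: bytes, charset: str) -> str:
    """Undo base64 first, then interpret the octets using `charset`."""
    octets = base64.b64decode(body)
    return octets.decode(charset)


wire = encode_text_body("Caf\u00e9 au lait", "iso-8859-1")
text = decode_text_body(wire, "iso-8859-1")
```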

Certain Content-Transfer-Encoding values may only be used on certain
media types. In particular, it is EXPRESSLY FORBIDDEN to use any
encodings other than "7bit", "8bit", or "binary" with any composite
media type, i.e. one that recursively includes other Content-Type
fields. Currently the only composite media types are "multipart" and
"message". All encodings that are desired for bodies of type
multipart or message must be done at the innermost level, by encoding
the actual body that needs to be encoded.
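A composer or gateway could enforce this restriction with a check like the one below. This sketch is hypothetical and looks only at the top-level type; it does not model the stricter rules for particular subtypes of "message".

```python
IDENTITY_ENCODINGS = {"7bit", "8bit", "binary"}
COMPOSITE_TYPES = {"multipart", "message"}


def check_encoding(media_type: str, cte: str) -> None:
    """Raise ValueError if a composite type carries a transforming encoding."""
    top_level = media_type.split("/", 1)[0].strip().lower()
    if top_level in COMPOSITE_TYPES and cte.strip().lower() not in IDENTITY_ENCODINGS:
        raise ValueError(
            f"{cte!r} is forbidden on composite type {media_type!r}; "
            "encode the innermost bodies instead"
        )
```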

It should also be noted that, by definition, if a composite entity
has a transfer-encoding value such as "7bit", but one of the enclosed
entities has a less restrictive value such as "8bit", then either the
outer "7bit" labelling is in error, because 8bit data are included,
or the inner "8bit" labelling placed an unnecessarily high demand on
the transport system because the included data were actually
7bit-safe.

NOTE ON ENCODING RESTRICTIONS: Though the prohibition against using
content-transfer-encodings on composite body data may seem overly
restrictive, it is necessary to prevent nested encodings, in which
data are passed through an encoding algorithm multiple times, and
must be decoded multiple times in order to be properly viewed.
Nested encodings add considerable complexity to user agents: aside
from the obvious efficiency problems with such multiple encodings,
they can obscure the basic structure of a message. In particular,
they can imply that several decoding operations are necessary simply
to find out what types of bodies a message contains. Banning nested
encodings may complicate the job of certain mail gateways, but this
seems less of a problem than the effect of nested encodings on user
agents.

Any entity with an unrecognized Content-Transfer-Encoding must be
treated as if it has a Content-Type of "application/octet-stream",
regardless of what the Content-Type header field actually says.
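In a user agent this rule reduces to a small decision, sketched here under the assumption that `KNOWN_ENCODINGS` lists exactly the mechanisms the implementation can decode (an agent that supports some x-token encodings would add them). The names are hypothetical.

```python
KNOWN_ENCODINGS = {"7bit", "8bit", "binary", "quoted-printable", "base64"}


def media_type_for_display(content_type: str, cte: str) -> str:
    """Return the media type a user agent should act on."""
    if cte.strip().lower() not in KNOWN_ENCODINGS:
        # The body cannot be decoded, so its declared type cannot be trusted.
        return "application/octet-stream"
    return content_type
```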

NOTE ON THE RELATIONSHIP BETWEEN CONTENT-TYPE AND
CONTENT-TRANSFER-ENCODING: It may seem that the
Content-Transfer-Encoding could be inferred from the characteristics
of the media that is to be encoded, or, at the very least, that
certain Content-Transfer-Encodings could be mandated for use with
specific media types. There are several reasons why this is not the
case. First, given the varying types of transports used for mail,
some encodings may be appropriate for some combinations of media
types and transports but not for others. (For example, in an 8bit
transport, no encoding would be required for text in certain
character sets, while such encodings are clearly required for 7bit
SMTP.)

Second, certain media types may require different types of transfer
encoding under different circumstances. For example, many PostScript
bodies might consist entirely of short lines of 7bit data and hence
require no encoding at all. Other PostScript bodies (especially
those using Level 2 PostScript's binary encoding mechanism) may only
be reasonably represented using a binary transport encoding.
Finally, since the Content-Type field is intended to be an open-ended
specification mechanism, strict specification of an association
between media types and encodings effectively couples the
specification of an application protocol with a specific lower-level
transport. This is not desirable since the developers of a media
type should not have to be aware of all the transports in use and
what their limitations are.

6.5. Translating Encodings

The quoted-printable and base64 encodings are designed so that
conversion between them is possible. The only issue that arises in
such a conversion is the handling of hard line breaks in
quoted-printable encoding output. When converting from
quoted-printable to base64, a hard line break in the quoted-printable
form represents a CRLF sequence in the canonical form of the data. It
must therefore be converted to a corresponding encoded CRLF in the
base64 form of the data. Similarly, a CRLF sequence in the canonical
form of the data obtained after base64 decoding must be converted to
a quoted-printable hard line break, but ONLY when converting text
data.
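The quoted-printable to base64 direction can be sketched as below. It assumes well-formed input with CRLF line breaks and uppercase hex digits, removes trailing transport padding, joins soft line breaks, and turns each hard line break into an encoded CRLF. The function names are hypothetical, and the reverse direction (restoring hard line breaks from CRLFs) would only be appropriate for text.

```python
import base64
import re

_HEX_OCTET = re.compile(rb"=([0-9A-F]{2})")


def qp_to_canonical(qp: bytes) -> bytes:
    """Decode well-formed quoted-printable; hard line breaks become CRLF."""
    out = bytearray()
    lines = qp.split(b"\r\n")
    for idx, line in enumerate(lines):
        line = line.rstrip(b" \t")              # drop transport padding
        soft = line.endswith(b"=")
        if soft:
            line = line[:-1]                    # soft break: not part of the data
        out += _HEX_OCTET.sub(lambda m: bytes([int(m.group(1), 16)]), line)
        if not soft and idx < len(lines) - 1:
            out += b"\r\n"                      # hard break -> CRLF in canonical form
    return bytes(out)


def qp_to_base64(qp: bytes) -> bytes:
    """Translate quoted-printable to base64 via the canonical form."""
    return base64.b64encode(qp_to_canonical(qp))
```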

6.6. Canonical Encoding Model

There was some confusion, in the previous versions of this RFC,
regarding the model for when email data was to be converted to
canonical form and encoded, and in particular how this process would
affect the treatment of CRLFs, given that the representation of
newlines varies greatly from system to system, and the relationship
between content-transfer-encodings and character sets. A canonical
model for encoding is presented in RFC 2049 for this reason.

6.7. Quoted-Printable Content-Transfer-Encoding

The Quoted-Printable encoding is intended to represent data that
largely consists of octets that correspond to printable characters in
the US-ASCII character set. It encodes the data in such a way that
the resulting octets are unlikely to be modified by mail transport.
If the data being encoded are mostly US-ASCII text, the encoded form
of the data remains largely recognizable by humans. A body which is
entirely US-ASCII may also be encoded in Quoted-Printable to ensure
the integrity of the data should the message pass through a
character-translating and/or line-wrapping gateway.

In this encoding, octets are to be represented as determined by the
following rules:

(1) (General 8bit representation) Any octet, except a CR or
LF that is part of a CRLF line break of the canonical
(standard) form of the data being encoded, may be
represented by an "=" followed by a two-digit
hexadecimal representation of the octet's value. The
digits of the hexadecimal alphabet, for this purpose,
are "0123456789ABCDEF". Uppercase letters must be
used; lowercase letters are not allowed. Thus, for
example, the decimal value 12 (US-ASCII form feed) can
be represented by "=0C", and the decimal value 61
(US-ASCII EQUAL SIGN) can be represented by "=3D". This
rule must be followed except when the following rules
allow an alternative encoding.

(2) (Literal representation) Octets with decimal values of
33 through 60 inclusive, and 62 through 126, inclusive,
MAY be represented as the US-ASCII characters which
correspond to those octets (EXCLAMATION POINT through
LESS THAN, and GREATER THAN through TILDE,
respectively).

(3) (White Space) Octets with values of 9 and 32 MAY be
represented as US-ASCII TAB (HT) and SPACE characters,
respectively, but MUST NOT be so represented at the end
of an encoded line. Any TAB (HT) or SPACE characters
on an encoded line MUST thus be followed on that line
by a printable character. In particular, an "=" at the
end of an encoded line, indicating a soft line break
(see rule #5), may follow one or more TAB (HT) or SPACE
characters. It follows that an octet with decimal
value 9 or 32 appearing at the end of an encoded line
must be represented according to rule #1. This rule is
necessary because some MTAs (Message Transport Agents,
programs which transport messages from one user to
another, or perform a portion of such transfers) are
known to pad lines of text with SPACEs, and others are
known to remove "white space" characters from the end
of a line. Therefore, when decoding a Quoted-Printable
body, any trailing white space on a line must be
deleted, as it will necessarily have been added by
intermediate transport agents.

(4) (Line Breaks) A line break in a text body, represented
as a CRLF sequence in the text canonical form, must be
represented by a (RFC 822) line break, which is also a
CRLF sequence, in the Quoted-Printable encoding. Since
the canonical representation of media types other than
text does not generally include the representation of
line breaks as CRLF sequences, no hard line breaks
(i.e. line breaks that are intended to be meaningful
and to be displayed to the user) can occur in the
quoted-printable encoding of such types. Sequences
like "=0D", "=0A", "=0A=0D" and "=0D=0A" will routinely
appear in non-text data represented in quoted-printable,
of course.

Note that many implementations may elect to encode the
local representation of various content types directly
rather than converting to canonical form first,
encoding, and then converting back to local
representation. In particular, this may apply to plain
text material on systems that use newline conventions
other than a CRLF terminator sequence. Such an
implementation optimization is permissible, but only
when the combined canonicalization-encoding step is
equivalent to performing the three steps separately.

(5) (Soft Line Breaks) The Quoted-Printable encoding
REQUIRES that encoded lines be no more than 76
characters long. If longer lines are to be encoded
with the Quoted-Printable encoding, "soft" line breaks
must be used. An equal sign as the last character on an
encoded line indicates such a non-significant ("soft")
line break in the encoded text.

Thus if the "raw" form of the line is a single unencoded line that
says:

     Now's the time for all folk to come to the aid of their country.

This can be represented, in the Quoted-Printable encoding, as:

     Now's the time =
     for all folk to come=
      to the aid of their country.

This provides a mechanism with which long lines are encoded in such a
way as to be restored by the user agent. The 76-character limit does
not count the trailing CRLF, but counts all other characters,
including any equal signs.
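A compact encoder for rules (1) through (5) is sketched below, assuming canonical input with CRLF line breaks. The names are hypothetical, and it makes conservative choices: it never splits an "=XX" triplet across lines, always encodes trailing white space, and breaks lines at 75 characters plus the soft-break "=". The `binary` flag follows the WARNING TO IMPLEMENTORS later in this section by encoding every CR and LF. It does not apply the optional EBCDIC-safe quoting.

```python
def qp_encode(data: bytes, binary: bool = False) -> str:
    """Quoted-printable encode canonical data (CRLF line breaks)."""
    # Text: each CRLF is a hard line break (rule 4). Binary: no hard breaks.
    hard_lines = [data] if binary else data.split(b"\r\n")
    out_lines = []
    for hard in hard_lines:
        units = []
        for i, b in enumerate(hard):
            last = i == len(hard) - 1
            if 33 <= b <= 60 or 62 <= b <= 126:
                units.append(chr(b))             # rule 2: literal
            elif b in (9, 32) and not last:
                units.append(chr(b))             # rule 3: white space not at line end
            else:
                units.append("=%02X" % b)        # rule 1: uppercase hex
        # Rule 5: soft line breaks keep every encoded line within 76 characters.
        line = ""
        for unit in units:
            if len(line) + len(unit) > 75:
                out_lines.append(line + "=")
                line = ""
            line += unit
        out_lines.append(line)
    return "\r\n".join(out_lines)
```

For example, a line ending in a SPACE is encoded with a final "=20", so trailing-white-space stripping by an MTA cannot alter it.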

Since the hyphen character ("-") may be represented as itself in the
Quoted-Printable encoding, care must be taken, when encapsulating a
quoted-printable encoded body inside one or more multipart entities,
to ensure that the boundary delimiter does not appear anywhere in the
encoded body. (A good strategy is to choose a boundary that includes
a character sequence such as "=_" which can never appear in a
quoted-printable body. See the definition of multipart messages in
RFC 2046.)

NOTE: The quoted-printable encoding represents something of a
compromise between readability and reliability in transport. Bodies
encoded with the quoted-printable encoding will work reliably over
most mail gateways, but may not work perfectly over a few gateways,
notably those involving translation into EBCDIC. A higher level of
confidence is offered by the base64 Content-Transfer-Encoding. A way
to get reasonably reliable transport through EBCDIC gateways is to
also quote the US-ASCII characters

     !"#$@[\]^`{|}~

according to rule #1.

Because quoted-printable data is generally assumed to be
line-oriented, it is to be expected that the representation of the
breaks between the lines of quoted-printable data may be altered in
transport, in the same manner that plain text mail has always been
altered in Internet mail when passing between systems with differing
newline conventions. If such alterations are likely to constitute a
corruption of the data, it is probably more sensible to use the
base64 encoding rather than the quoted-printable encoding.

NOTE: Several kinds of substrings cannot be generated according to
the encoding rules for the quoted-printable
content-transfer-encoding, and hence are formally illegal if they
appear in the output of a quoted-printable encoder. This note
enumerates these cases and suggests ways to handle such illegal
substrings if any are encountered in quoted-printable data that is to
be decoded.

(1) An "=" followed by two hexadecimal digits, one or both
of which are lowercase letters in "abcdef", is formally
illegal. A robust implementation might choose to
recognize them as the corresponding uppercase letters.

(2) An "=" followed by a character that is neither a
hexadecimal digit (including "abcdef") nor the CR
character of a CRLF pair is illegal. This case can be
the result of US-ASCII text having been included in a
quoted-printable part of a message without itself
having been subjected to quoted-printable encoding. A
reasonable approach by a robust implementation might be
to include the "=" character and the following
character in the decoded data without any
transformation and, if possible, indicate to the user
that proper decoding was not possible at this point in
the data.

(3) An "=" cannot be the ultimate or penultimate character
in an encoded object. This could be handled as in case
(2) above.

(4) Control characters other than TAB, or CR and LF as
parts of CRLF pairs, must not appear. The same is true
for octets with decimal values greater than 126. If
found in incoming quoted-printable data by a decoder, a
robust implementation might exclude them from the
decoded data and warn the user that illegal characters
were discovered.

(5) Encoded lines must not be longer than 76 characters,
not counting the trailing CRLF. If longer lines are
found in incoming, encoded data, a robust
implementation might nevertheless decode the lines, and
might report the erroneous encoding to the user.
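A tolerant decoder along the lines suggested in cases (1) through (5) might look like this. It is a sketch with hypothetical names, assumes CRLF line breaks, and returns warnings for the caller to report rather than raising errors.

```python
from typing import List, Tuple

HEX_DIGITS = "0123456789ABCDEF"


def qp_decode_robust(encoded: bytes) -> Tuple[bytes, List[str]]:
    """Decode quoted-printable data, tolerating the illegal cases (1)-(5)."""
    out = bytearray()
    warnings: List[str] = []
    lines = encoded.split(b"\r\n")
    last_line = len(lines) - 1
    for n, raw in enumerate(lines):
        line = raw.rstrip(b" \t")                 # trailing white space is padding
        if len(line) > 76:
            warnings.append(f"line {n + 1}: longer than 76 characters")  # case 5
        soft = False
        i = 0
        while i < len(line):
            b = line[i]
            if b == ord("="):
                if i == len(line) - 1 and n < last_line:
                    soft = True                   # soft line break
                    break
                # Case 1: accept lowercase hex by uppercasing it.
                pair = line[i + 1:i + 3].decode("ascii", "replace").upper()
                if len(pair) == 2 and all(c in HEX_DIGITS for c in pair):
                    out.append(int(pair, 16))
                    i += 3
                    continue
                # Cases 2 and 3: keep "=" and what follows untransformed.
                warnings.append(f"line {n + 1}: invalid '=' sequence")
                out.append(b)
                i += 1
                continue
            if (b < 32 and b != 9) or b > 126:
                warnings.append(f"line {n + 1}: illegal octet {b}")  # case 4
                i += 1
                continue
            out.append(b)
            i += 1
        if not soft and n < last_line:
            out += b"\r\n"                        # hard line break
    return bytes(out), warnings
```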

WARNING TO IMPLEMENTORS: If binary data is encoded in
quoted-printable, care must be taken to encode CR and LF characters
as "=0D" and "=0A", respectively. In particular, a CRLF sequence in
binary data should be encoded as "=0D=0A". Otherwise, if CRLF were
represented as a hard line break, it might be incorrectly decoded on
platforms with different line break conventions.

For formalists, the syntax of quoted-printable data is described by
the following grammar:

quoted-printable := qp-line *(CRLF qp-line)

qp-line := *(qp-segment transport-padding CRLF)
           qp-part transport-padding

qp-part := qp-section
           ; Maximum length of 76 characters

qp-segment := qp-section *(SPACE / TAB) "="
              ; Maximum length of 76 characters

qp-section := [*(ptext / SPACE / TAB) ptext]

ptext := hex-octet / safe-char

safe-char := <any octet with decimal value of 33 through
             60 inclusive, and 62 through 126>
             ; Characters not listed as "mail-safe" in
             ; RFC 2049 are also not recommended.

hex-octet := "=" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F")
             ; Octet must be used for characters > 127, =,
             ; SPACEs or TABs at the ends of lines, and is
             ; recommended for any character not listed in
             ; RFC 2049 as "mail-safe".

transport-padding := *LWSP-char
                     ; Composers MUST NOT generate
                     ; non-zero length transport
                     ; padding, but receivers MUST
                     ; be able to handle padding
                     ; added by message transports.

IMPORTANT: The addition of LWSP between the elements shown in this
BNF is NOT allowed since this BNF does not specify a structured
header field.

6.8. Base64 Content-Transfer-Encoding

The Base64 Content-Transfer-Encoding is designed to represent
arbitrary sequences of octets in a form that need not be humanly
readable. The encoding and decoding algorithms are simple, but the
encoded data are consistently only about 33 percent larger than the
unencoded data. This encoding is virtually identical to the one used
in Privacy Enhanced Mail (PEM) applications, as defined in RFC 1421.

A 65-character subset of US-ASCII is used, enabling 6 bits to be
represented per printable character. (The extra 65th character, "=",
is used to signify a special processing function.)

NOTE: This subset has the important property that it is represented
identically in all versions of ISO 646, including US-ASCII, and all
characters in the subset are also represented identically in all
versions of EBCDIC. Other popular encodings, such as the encoding
used by the uuencode utility, Macintosh binhex 4.0 [RFC-1741], and
the base85 encoding specified as part of Level 2 PostScript, do not
share these properties, and thus do not fulfill the portability
requirements a binary transport encoding for mail must meet.

The encoding process represents 24-bit groups of input bits as output
strings of 4 encoded characters. Proceeding from left to right, a
24-bit input group is formed by concatenating 3 8bit input groups.
These 24 bits are then treated as 4 concatenated 6-bit groups, each
of which is translated into a single digit in the base64 alphabet.
When encoding a bit stream via the base64 encoding, the bit stream
must be presumed to be ordered with the most-significant-bit first.
That is, the first bit in the stream will be the high-order bit in
the first 8bit byte, and the eighth bit will be the low-order bit in
the first 8bit byte, and so on.
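The bit manipulation for a single 24-bit group can be illustrated with a short Python function. The name is hypothetical, and the alphabet string is assumed to follow Table 1 below.

```python
# Values 0-63 in order, as listed in Table 1.
ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")


def encode_group(three: bytes) -> str:
    """Encode exactly three octets (one 24-bit group) as four characters."""
    if len(three) != 3:
        raise ValueError("a full 24-bit group needs exactly 3 octets")
    # The first octet supplies the high-order bits of the group.
    group = (three[0] << 16) | (three[1] << 8) | three[2]
    # Split into four 6-bit indices, most significant first.
    indices = [(group >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indices)
```

For the octets of "Man" (0x4D, 0x61, 0x6E) the 6-bit indices are 19, 22, 5, and 46, giving "TWFu".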
|
|||
|
|
|||
|
Each 6-bit group is used as an index into an array of 64 printable
|
|||
|
characters. The character referenced by the index is placed in the
|
|||
|
output string. These characters, identified in Table 1, below, are
|
|||
|
selected so as to be universally representable, and the set excludes
|
|||
|
characters with particular significance to SMTP (e.g., ".", CR, LF)
|
|||
|
and to the multipart boundary delimiters defined in RFC 2046 (e.g.,
|
|||
|
"-").

                    Table 1: The Base64 Alphabet

     Value Encoding  Value Encoding  Value Encoding  Value Encoding
         0 A            17 R            34 i            51 z
         1 B            18 S            35 j            52 0
         2 C            19 T            36 k            53 1
         3 D            20 U            37 l            54 2
         4 E            21 V            38 m            55 3
         5 F            22 W            39 n            56 4
         6 G            23 X            40 o            57 5
         7 H            24 Y            41 p            58 6
         8 I            25 Z            42 q            59 7
         9 J            26 a            43 r            60 8
        10 K            27 b            44 s            61 9
        11 L            28 c            45 t            62 +
        12 M            29 d            46 u            63 /
        13 N            30 e            47 v
        14 O            31 f            48 w         (pad) =
        15 P            32 g            49 x
        16 Q            33 h            50 y
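   For illustration, the following Python sketch builds the Table 1
   alphabet from the standard library's string constants, together with
   the reverse lookup table a decoder needs. The names ALPHABET, PAD and
   DECODE are hypothetical.

```python
import string

# Values 0-25 are "A"-"Z", 26-51 are "a"-"z", 52-61 are "0"-"9",
# 62 is "+" and 63 is "/". "=" is the pad character, not a digit.
ALPHABET = (string.ascii_uppercase + string.ascii_lowercase
            + string.digits + "+/")
PAD = "="

# Reverse lookup for decoding: encoded character -> 6-bit value.
DECODE = {ch: value for value, ch in enumerate(ALPHABET)}

# 64 distinct characters, none of which is significant to SMTP
# (".", CR, LF) or to multipart boundary delimiters ("-").
assert len(set(ALPHABET)) == 64 and PAD not in ALPHABET
assert not set(".\r\n-") & set(ALPHABET)

print(ALPHABET[26], ALPHABET[52], ALPHABET[63])  # a 0 /
print(DECODE["z"])  # 51
```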

   The encoded output stream must be represented in lines of no more
   than 76 characters each. All line breaks or other characters not
   found in Table 1 must be ignored by decoding software. In base64
   data, any character that is neither in Table 1 nor a line break or
   other white space probably indicates a transmission error, about
   which a warning message or even a message rejection might be
   appropriate under some circumstances.
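   A minimal Python sketch of the line-length rule, assuming CRLF is
   used to separate encoded lines. Encoding uses the standard library's
   base64.b64encode; decoding relies on base64.b64decode, which by
   default (validate=False) discards characters outside the base64
   alphabet, including the line breaks. The helper encode_lines is
   hypothetical.

```python
import base64

MAX_LINE = 76  # maximum number of encoded characters per line


def encode_lines(data: bytes) -> bytes:
    """Base64-encode data and split the output into lines of at most
    76 characters, separated by CRLF."""
    encoded = base64.b64encode(data)
    lines = [encoded[i:i + MAX_LINE]
             for i in range(0, len(encoded), MAX_LINE)]
    return b"\r\n".join(lines)


wrapped = encode_lines(bytes(range(256)))
assert all(len(line) <= MAX_LINE for line in wrapped.split(b"\r\n"))

# The decoder ignores the CRLF line breaks it encounters.
assert base64.b64decode(wrapped) == bytes(range(256))
```

   Each 57 input octets produce exactly one full 76-character line.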

   Special processing is performed if fewer than 24 bits are available
   at the end of the data being encoded. A full encoding quantum is
   always completed at the end of a body. When fewer than 24 input bits
   are available in an input group, zero bits are added (on the right)
   to form an integral number of 6-bit groups. Padding at the end of
   the data is performed using the "=" character. Since all base64
   input is an integral number of octets, only the following cases can
   arise: (1) the final quantum of encoding input is an integral
   multiple of 24 bits; here, the final unit of encoded output will be
   an integral multiple of 4 characters with no "=" padding, (2) the
   final quantum of encoding input is exactly 8 bits; here, the final
   unit of encoded output will be two characters followed by two "="
   padding characters, or (3) the final quantum of encoding input is
   exactly 16 bits; here, the final unit of encoded output will be three
   characters followed by one "=" padding character.
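   The three cases can be seen in the following illustrative Python
   encoder (not a reference implementation; the name encode_base64 is
   hypothetical). It pads a short final group with zero octets on the
   right, keeps only the characters that carry input bits, and fills
   the rest of the 4-character unit with "=".

```python
ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")


def encode_base64(data: bytes) -> str:
    out = []
    for start in range(0, len(data), 3):
        chunk = data[start:start + 3]
        n = len(chunk)  # 3 for a full quantum; 1 or 2 only at the end
        # Add zero bits on the right so the group is always 24 bits wide.
        group = int.from_bytes(chunk + b"\x00" * (3 - n), "big")
        chars = [ALPHABET[(group >> shift) & 0x3F]
                 for shift in (18, 12, 6, 0)]
        # 8 input bits need 2 characters, 16 need 3, 24 need 4; the rest
        # of the 4-character output unit is "=" padding.
        keep = n + 1
        out.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(out)


print(encode_base64(b"M"), encode_base64(b"Ma"), encode_base64(b"Man"))
# TQ== TWE= TWFu
```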

   Because it is used only for padding at the end of the data, the
   occurrence of any "=" characters may be taken as evidence that the
   end of the data has been reached (without truncation in transit). No
   such assurance is possible, however, when the number of octets
   transmitted was a multiple of three and no "=" characters are
   present.

   Any characters outside of the base64 alphabet are to be ignored in
   base64-encoded data.
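   The following Python sketch shows one way a decoder might apply the
   two preceding rules: characters outside the alphabet are skipped, and
   the first "=" is treated as the end of the data. It is illustrative
   only; decode_base64 is a hypothetical name, and a production decoder
   might also report the stray characters as possible transmission
   errors.

```python
ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")
DECODE = {ch: value for value, ch in enumerate(ALPHABET)}


def decode_base64(text: str) -> bytes:
    bits = 0    # bits accumulated but not yet emitted as an octet
    nbits = 0   # number of valid bits held in `bits`
    out = bytearray()
    for ch in text:
        if ch == "=":
            break  # padding: the end of the data has been reached
        value = DECODE.get(ch)
        if value is None:
            continue  # line breaks, white space and other stray characters
        bits = (bits << 6) | value
        nbits += 6
        if nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
            bits &= (1 << nbits) - 1  # keep only the unconsumed low bits
    # Any 2 or 4 leftover bits are the zero bits the encoder appended.
    return bytes(out)


print(decode_base64("Zm9v\r\nYmFy"))  # b'foobar'
```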

   Care must be taken to use the proper octets for line breaks if base64
   encoding is applied directly to text material that has not been
   converted to canonical form. In particular, text line breaks must be
   converted into CRLF sequences prior to base64 encoding. Note that in
   some implementations this conversion may be done directly by the
   encoder rather than in a separate, prior canonicalization step.
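   As a sketch of converting text to canonical form inside the encoder,
   the following Python function rewrites every local line break (CRLF,
   bare CR or bare LF) as CRLF before base64 encoding. The function name
   encode_text and the UTF-8 default charset are assumptions made for
   the example.

```python
import base64
import re


def encode_text(text: str, charset: str = "utf-8") -> bytes:
    """Base64-encode text after converting its line breaks to CRLF."""
    # Try CRLF first so an existing CRLF is not doubled.
    canonical = re.sub(r"\r\n|\r|\n", "\r\n", text)
    return base64.b64encode(canonical.encode(charset))


print(encode_text("a\nb"))  # b'YQ0KYg=='
```

   Without the conversion, "a\nb" would encode as "YQpi", which decodes
   to a bare LF rather than the CRLF that canonical text requires.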

      NOTE: There is no need to worry about quoting potential boundary
      delimiters within base64-encoded bodies within multipart entities
      because no hyphen characters are used in the base64 encoding.

7. Content-ID Header Field

   In constructing a high-level user agent, it may be desirable to allow
   one body to make reference to another. Accordingly, bodies may be
   labelled using the "Content-ID" header field, which is syntactically
   identical to the "Message-ID" header field:

     id := "Content-ID" ":" msg-id

   Like the Message-ID values, Content-ID values must be generated to be
   world-unique.
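   One practical way to obtain a world-unique value is Python's
   email.utils.make_msgid, which returns a string in msg-id form
   ("<...@domain>"). In CPython the part before "@" combines the current
   time, the process ID and a random number; the domain example.org
   used here is a placeholder.

```python
from email.utils import make_msgid

# The domain should be one the generating host is entitled to use;
# "example.org" is only a placeholder.
content_id = make_msgid(domain="example.org")
print("Content-ID: " + content_id)
```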

   The Content-ID value may be used for uniquely identifying MIME
   entities in several contexts, particularly for caching data
   referenced by the message/external-body mechanism. Although the
   Content-ID header is generally optional, its use is MANDATORY in
   implementations which generate data of the optional MIME media type
   "message/external-body". That is, each message/external-body entity
   must have a Content-ID field to permit caching of such data.
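   The following Python sketch, using email.message.Message, shows a
   generator attaching the mandatory Content-ID to a
   message/external-body entity. The "local-file" access type is one of
   those defined in RFC 2046, but the file name and the domain are
   hypothetical, and the encapsulated header that RFC 2046 places in the
   body of such an entity is omitted for brevity.

```python
from email.message import Message
from email.utils import make_msgid

part = Message()
part["Content-Type"] = "message/external-body"
part.set_param("access-type", "local-file")   # access type from RFC 2046
part.set_param("name", "/archive/report.ps")  # hypothetical file name
# Mandatory for message/external-body: a world-unique Content-ID that
# lets a receiving agent cache the externally referenced data.
part["Content-ID"] = make_msgid(domain="example.org")

print(part["Content-Type"])
print(part["Content-ID"])
```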

   It is also worth noting that the Content-ID value has special
   semantics in the case of the multipart/alternative media type. This
   is explained in the section of RFC 2046 dealing with
   multipart/alternative.

8. Content-Description Header Field

   The ability to associate some descriptive information with a given
   body is often desirable. For example, it may be useful to mark an
   "image" body as "a picture of the Space Shuttle Endeavour." Such text
   may be placed in the Content-Description header field. This header
   field is always optional.

     description := "Content-Description" ":" *text

   The description is presumed to be given in the US-ASCII character
   set, although the mechanism specified in RFC 2047 may be used for
   non-US-ASCII Content-Description values.
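   As an example of the RFC 2047 mechanism, Python's email.header.Header
   class can turn a non-US-ASCII description into an encoded-word. The
   German description below is made up for the example, and the library
   chooses between the "B" and "Q" encodings on its own, so the exact
   encoded form is not shown.

```python
from email.header import Header, decode_header

# Hypothetical description containing the non-US-ASCII letter "ä".
description = "Ein Bild der Raumfähre Endeavour"
encoded = Header(description, "utf-8").encode()
print("Content-Description: " + encoded)

# Round trip: decode_header() yields the raw octets and their charset.
[(raw, charset)] = decode_header(encoded)
assert raw.decode(charset) == description
```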

9. Additional MIME Header Fields

   Future documents may elect to define additional MIME header fields
   for various purposes. Any new header field that further describes
   the content of a message should begin with the string "Content-" to
   allow such fields which appear in a message header to be
   distinguished from ordinary RFC 822 message header fields.

     MIME-extension-field := <Any RFC 822 header field which
                              begins with the string
                              "Content-">
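   A receiving agent might recognize MIME-extension-fields, such as the
   Content-Disposition field later defined in RFC 2183, with a check
   like the following Python sketch (the function name is hypothetical).
   The comparison ignores case because RFC 822 field names are
   case-insensitive.

```python
def is_mime_extension_field(header_line: str) -> bool:
    """Return True if an RFC 822 header line is a MIME-extension-field,
    i.e. its field name begins with "Content-"."""
    name, colon, _body = header_line.partition(":")
    if not colon:
        return False  # no colon: not a header field at all
    # RFC 822 field names are case-insensitive.
    return name.strip().lower().startswith("content-")


print(is_mime_extension_field("Content-Disposition: inline"))  # True
print(is_mime_extension_field("Subject: Holiday pictures"))     # False
```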

10. Summary

   Using the MIME-Version, Content-Type, and Content-Transfer-Encoding
   header fields, it is possible to include, in a standardized way,
   arbitrary types of data with RFC 822 conformant mail messages. No
   restrictions imposed by either RFC 821 or RFC 822 are violated, and
   care has been taken to avoid problems caused by additional
   restrictions imposed by the characteristics of some Internet mail
   transport mechanisms (see RFC 2049).
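   To illustrate how the three fields work together, the following
   Python sketch uses email.message.EmailMessage to wrap arbitrary
   binary data in an otherwise ordinary message. For bytes content,
   set_content requires an explicit media type and defaults to the
   base64 transfer encoding, and EmailMessage adds "MIME-Version: 1.0"
   if the field is absent. The subject line is a placeholder.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Arbitrary binary data"  # an ordinary RFC 822 field
msg.set_content(bytes(range(256)), maintype="application",
                subtype="octet-stream")

print(msg["MIME-Version"])               # 1.0
print(msg.get_content_type())            # application/octet-stream
print(msg["Content-Transfer-Encoding"])  # base64
```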

   The next document in this set, RFC 2046, specifies the initial set of
   media types that can be labelled and transported using these headers.

11. Security Considerations

   Security issues are discussed in the second document in this set, RFC
   2046.

12. Authors' Addresses

   For more information, the authors of this document are best contacted
   via Internet mail:

   Ned Freed
   Innosoft International, Inc.
   1050 East Garvey Avenue South
   West Covina, CA 91790
   USA

   Phone: +1 818 919 3600
   Fax:   +1 818 919 3614
   EMail: ned@innosoft.com

   Nathaniel S. Borenstein
   First Virtual Holdings
   25 Washington Avenue
   Morristown, NJ 07960
   USA

   Phone: +1 201 540 8967
   Fax:   +1 201 993 3032
   EMail: nsb@nsb.fv.com

   MIME is a result of the work of the Internet Engineering Task Force
   Working Group on RFC 822 Extensions. The chairman of that group,
   Greg Vaudreuil, may be reached at:

   Gregory M. Vaudreuil
   Octel Network Services
   17080 Dallas Parkway
   Dallas, TX 75248-1905
   USA

   EMail: Greg.Vaudreuil@Octel.Com

Appendix A -- Collected Grammar

   This appendix contains the complete BNF grammar for all the syntax
   specified by this document.

   By itself, however, this grammar is incomplete. It refers by name to
   several syntax rules that are defined by RFC 822. Rather than
   reproduce those definitions here, and risk unintentional differences
   between the two, this document simply refers the reader to RFC 822
   for the remaining definitions. Wherever a term is undefined, it
   refers to the RFC 822 definition.

     attribute := token
                  ; Matching of attributes
                  ; is ALWAYS case-insensitive.

     composite-type := "message" / "multipart" / extension-token

     content := "Content-Type" ":" type "/" subtype
                *(";" parameter)
                ; Matching of media type and subtype
                ; is ALWAYS case-insensitive.

     description := "Content-Description" ":" *text

     discrete-type := "text" / "image" / "audio" / "video" /
                      "application" / extension-token

     encoding := "Content-Transfer-Encoding" ":" mechanism

     entity-headers := [ content CRLF ]
                       [ encoding CRLF ]
                       [ id CRLF ]
                       [ description CRLF ]
                       *( MIME-extension-field CRLF )

     extension-token := ietf-token / x-token

     hex-octet := "=" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F")
                  ; Octet must be used for characters > 127, =,
                  ; SPACEs or TABs at the ends of lines, and is
                  ; recommended for any character not listed in
                  ; RFC 2049 as "mail-safe".

     iana-token := <A publicly-defined extension token. Tokens
                    of this form must be registered with IANA
                    as specified in RFC 2048.>

     ietf-token := <An extension token defined by a
                    standards-track RFC and registered
                    with IANA.>

     id := "Content-ID" ":" msg-id

     mechanism := "7bit" / "8bit" / "binary" /
                  "quoted-printable" / "base64" /
                  ietf-token / x-token

     MIME-extension-field := <Any RFC 822 header field which
                              begins with the string
                              "Content-">

     MIME-message-headers := entity-headers
                             fields
                             version CRLF
                             ; The ordering of the header
                             ; fields implied by this BNF
                             ; definition should be ignored.

     MIME-part-headers := entity-headers
                          [fields]
                          ; Any field not beginning with
                          ; "content-" can have no defined
                          ; meaning and may be ignored.
                          ; The ordering of the header
                          ; fields implied by this BNF
                          ; definition should be ignored.

     parameter := attribute "=" value

     ptext := hex-octet / safe-char

     qp-line := *(qp-segment transport-padding CRLF)
                qp-part transport-padding

     qp-part := qp-section
                ; Maximum length of 76 characters

     qp-section := [*(ptext / SPACE / TAB) ptext]

     qp-segment := qp-section *(SPACE / TAB) "="
                   ; Maximum length of 76 characters

     quoted-printable := qp-line *(CRLF qp-line)

     safe-char := <any octet with decimal value of 33 through
                  60 inclusive, and 62 through 126>
                  ; Characters not listed as "mail-safe" in
                  ; RFC 2049 are also not recommended.

     subtype := extension-token / iana-token

     token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
                 or tspecials>

     transport-padding := *LWSP-char
                          ; Composers MUST NOT generate
                          ; non-zero length transport
                          ; padding, but receivers MUST
                          ; be able to handle padding
                          ; added by message transports.

     tspecials :=  "(" / ")" / "<" / ">" / "@" /
                   "," / ";" / ":" / "\" / <"> /
                   "/" / "[" / "]" / "?" / "="
                   ; Must be in quoted-string,
                   ; to use within parameter values

     type := discrete-type / composite-type

     value := token / quoted-string

     version := "MIME-Version" ":" 1*DIGIT "." 1*DIGIT

     x-token := <The two characters "X-" or "x-" followed, with
                 no intervening white space, by any token>
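   The token, x-token and value rules above translate fairly directly
   into code. The following Python sketch is illustrative only: the
   function names are hypothetical, and format_value escapes just the
   backslash and double-quote characters when it falls back to a
   quoted-string, which is not a complete treatment of RFC 822
   quoted-string syntax.

```python
# tspecials from the grammar above; none may appear in a token.
TSPECIALS = frozenset('()<>@,;:\\"/[]?=')


def is_token(s: str) -> bool:
    """token := 1*<any (US-ASCII) CHAR except SPACE, CTLs, or tspecials>"""
    # CHAR is 0-127, CTLs are 0-31 and 127, SPACE is 32.
    return len(s) > 0 and all(
        33 <= ord(c) <= 126 and c not in TSPECIALS for c in s)


def is_x_token(s: str) -> bool:
    """x-token: "X-" or "x-" followed, with no white space, by a token."""
    return s[:2] in ("X-", "x-") and is_token(s[2:])


def format_value(value: str) -> str:
    """value := token / quoted-string; quote values that are not tokens."""
    if is_token(value):
        return value
    # Escape backslash and double quote as RFC 822 quoted-pairs.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


print("charset=" + format_value("us-ascii"))       # charset=us-ascii
print("name=" + format_value("annual report.ps"))  # name="annual report.ps"
```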