Network Working Group                                       G. Vaudreuil
Request for Comments: 2421                           Lucent Technologies
Obsoletes: 1911                                               G. Parsons
Category: Standards Track                               Northern Telecom
                                                          September 1998

Voice Profile for Internet Mail - version 2

Status of this Memo

This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (1998). All Rights Reserved.

Overview

This document profiles Internet mail for voice messaging. It obsoletes RFC 1911, which describes version 1 of the profile. The changes from that document are listed in Appendix F, and Appendix A summarizes the protocol profiles of this version of VPIM.

Please send comments on this document to the EMA VPIM Work Group mailing list: <vpim-l@ema.org>

Working Group Summary

This profile is not the product of an IETF working group, though several have reviewed the document. It is instead the product of the VPIM Work Group of the Electronic Messaging Association (EMA). This work group, which has representatives from most major voice mail vendors and several email vendors, has held several interoperability demonstrations between voice messaging vendors and is currently promoting VPIM trials and deployment.
Table of Contents

1. ABSTRACT
2. SCOPE
  2.1 Voice Messaging System Limitations
  2.2 Design Goals
3. PROTOCOL RESTRICTIONS
4. VOICE MESSAGE INTERCHANGE FORMAT
  4.1 Message Addressing Formats
  4.2 Message Header Fields
  4.3 Voice Message Content Types
  4.4 Other Message Content Types
  4.5 Forwarded Messages
  4.6 Reply Messages
  4.7 Notification Messages
5. MESSAGE TRANSPORT PROTOCOL
  5.1 ESMTP Commands
  5.2 ESMTP Keywords
  5.3 ESMTP Parameters - MAIL FROM
  5.4 ESMTP Parameters - RCPT TO
  5.5 ESMTP - SMTP Downgrading
6. DIRECTORY ADDRESS RESOLUTION
7. IMAP
8. MANAGEMENT PROTOCOLS
  8.1 Network Management
9. CONFORMANCE REQUIREMENTS
10. SECURITY CONSIDERATIONS
  10.1 General Directive
  10.2 Threats and Problems
  10.3 Security Techniques
11. REFERENCES
12. ACKNOWLEDGMENTS
13. AUTHORS' ADDRESSES
14. APPENDIX A - VPIM REQUIREMENTS SUMMARY
15. APPENDIX B - EXAMPLE VOICE MESSAGES
16. APPENDIX C - EXAMPLE ERROR VOICE PROCESSING ERROR CODES
17. APPENDIX D - EXAMPLE VOICE PROCESSING DISPOSITION TYPES
18. APPENDIX E - IANA REGISTRATIONS
  18.1 vCard EMAIL Type Definition for VPIM
  18.2 Voice Content-Disposition Parameter Definition
19. APPENDIX F - CHANGE HISTORY: RFC 1911 TO THIS DOCUMENT
20. FULL COPYRIGHT NOTICE
1. Abstract

A class of special-purpose computers has evolved to provide voice messaging services. These machines generally interface to a telephone switch and provide call answering and voice messaging services. Traditionally, messages sent to a non-local machine are transported using analog networking protocols based on DTMF signaling and analog voice playback. As the demand for networking increases, there is a need for a standard high-quality digital protocol to connect these machines.

The following document is a profile of the Internet standard MIME and ESMTP protocols for use as a digital voice messaging networking protocol. The profile is referred to as VPIM (Voice Profile for Internet Mail) in this document.

This profile is based on earlier work in the Audio Message Interchange Specification (AMIS) group that defined a voice messaging protocol based on X.400 technology. This profile is intended to satisfy the user requirements statement from that earlier work with the industry standard ESMTP/MIME mail protocol infrastructures already used within corporate intranets.

This second version of VPIM is based on implementation experience and obsoletes RFC 1911 which describes version 1 of the profile.

2. Scope

MIME is the Internet multipurpose, multimedia messaging standard. This document explicitly recognizes its capabilities and provides a mechanism for the exchange of various messaging technologies, primarily voice and facsimile.

This document specifies a restricted profile of the Internet multimedia messaging protocols for use between voice processing server platforms. These platforms have historically been special-purpose computers and often do not have the same facilities normally associated with a traditional Internet Email-capable computer. As a result, VPIM also specifies additional functionality as it is needed. This profile is intended to specify the minimum common set of features to allow interworking between compliant systems.
2.1 Voice Messaging System Limitations

The following are typical limitations of voice messaging platforms which were considered in creating this baseline profile.

   1) Text messages are not normally received and often cannot be easily displayed or viewed. They can often be processed only via text-to-speech or text-to-fax features not currently present in many of these machines.

   2) Voice mail machines usually act as an integrated Message Transfer Agent, Message Store and User Agent. There is no relaying of messages, and RFC 822 header fields may have limited use in the context of the limited messaging features currently deployed.

   3) Voice mail message stores are generally not capable of preserving the full semantics of an Internet message. As such, use of a voice mail machine for gatewaying is not supported. In particular, storage of recipient lists, "Received" lines, and "Message-ID" may be limited.

   4) Internet-style distribution/exploder mailing lists are not typically supported. Voice mail machines often implement only local alias lists, with error-to-sender and reply-to-sender behavior. Reply-all capabilities using a CC list are not generally available.

   5) Error reports must be machine-parsable so that helpful responses can be voiced to users whose only access mechanism is a telephone.

   6) The voice mail systems generally limit address entry to 16 or fewer numeric characters, and normally do not support alphanumeric mailbox names. Alpha characters are not generally used for mailbox identification as they cannot be easily entered from a telephone terminal.

2.2 Design Goals

It is a goal of this profile to make as few restrictions and additions to the existing Internet mail protocols as possible while satisfying the requirements for interoperability with current generation voice messaging systems. This goal is motivated by the desire to increase the accessibility to digital messaging by enabling the use of proven existing networking software for rapid development.

This specification is intended for use on a TCP/IP network; however, it is possible to use the SMTP protocol suite over other transport protocols. The necessary protocol parameters for such use are outside the scope of this document.
This profile is intended to be robust enough to be used in an environment such as the global Internet, with installed-base gateways that do not understand MIME, though typical use is expected to be within corporate intranets. Full functionality, such as reliable error messages and binary transport, will require careful selection of gateways (e.g., via MX records) to be used as VPIM forwarding agents.

Nothing in this document precludes use of general purpose MIME email packages to read and compose VPIM messages. While no
special configuration is required to receive VPIM compliant messages, some may be required to originate compliant structures. It is expected that a VPIM messaging system will be managed by a system administrator who can perform TCP/IP network configuration. When using facsimile or multiple voice encodings, it is suggested that the system administrator maintain a list of the capabilities of the networked mail machines to reduce the sending of undeliverable messages due to lack of feature support. Configuration, implementation and management of these directory listing capabilities are local matters. 3. Protocol Restrictions This protocol does not limit the number of recipients per message. Where possible, server implementations should not restrict the number of recipients in a single message. It is recognized that no implementation supports unlimited recipients, and that the number of supported recipients may be quite low. This protocol does not limit the maximum message length. Implementers should understand that some machines will be unable to accept excessively long messages. A mechanism is defined in the RFC 1425 SMTP service extensions to declare the maximum message size supported. The message size indicated in the ESMTP SIZE parameter is in bytes, not minutes or seconds. The number of bytes varies by voice encoding format and includes the MIME wrapper overhead. If the length must be known before sending, an approximate translation into minutes or seconds can be performed if the voice encoding is known. The following sections describe the restrictions and additions to Internet mail protocols that are required to be compliant with this VPIM v2 profile. Though various SMTP, ESMTP and MIME features are described here, the implementer is referred to the relevant RFCs for complete details. 
It is also advisable to check for IETF drafts of various Internet Mail specifications that are later than the most recent RFCs since, for example, MIME has yet to be published as a full IETF Standard. The table in Appendix A summarizes the protocol details of this profile. The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [REQ].
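The byte-to-duration translation mentioned above for the ESMTP SIZE parameter can be sketched as follows. This is a minimal illustration, not part of the profile: the function names, the fixed 1024-byte allowance for the MIME wrapper, and the single-entry rate table are assumptions made for the example. The only hard figure is arithmetic: 32 kbit/s ADPCM is 4000 bytes per second of audio, and Base64 emits 4 characters for every 3 octets (line breaks are ignored here).

```python
import math

# Assumed payload rates, in bytes per second of audio.
# 32 kbit/s ADPCM -> 32000 / 8 = 4000 bytes/s. Illustrative, not exhaustive.
BYTES_PER_SECOND = {"32KADPCM": 4000}


def estimate_size(seconds, encoding="32KADPCM", base64=True, overhead_bytes=1024):
    """Approximate message size in bytes for `seconds` of audio.

    overhead_bytes is a hypothetical allowance for headers and MIME
    wrapper; a real system would measure it.
    """
    raw = seconds * BYTES_PER_SECOND[encoding]
    if base64:
        # Base64 expands every 3 octets to 4 characters (padding rounds up).
        raw = math.ceil(raw / 3) * 4
    return raw + overhead_bytes


def estimate_seconds(size_bytes, encoding="32KADPCM", base64=True, overhead_bytes=1024):
    """Approximate audio duration in seconds for a message of size_bytes."""
    payload = max(size_bytes - overhead_bytes, 0)
    if base64:
        payload = payload * 3 / 4
    return payload / BYTES_PER_SECOND[encoding]
```

Under these assumptions a one-minute Base64-encoded message is roughly 320 KB, so a sender could compare `estimate_size(duration)` against the limit a receiving server advertises before attempting delivery.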
4. Voice Message Interchange Format

The voice message interchange format is a profile of the Internet Mail Protocol Suite. Any Internet Mail message containing the format defined in this section is referred to as a VPIM Message in this document. As a result, this document assumes an understanding of the Internet Mail specifications. Specifically, VPIM references components from the message format standard for Internet messages [RFC822], the Multipurpose Internet Message Extensions [MIME], the X.400 gateway specification [X.400], delivery status and message disposition notifications [REPORT][DSN][DRPT][STATUS][MDN], and the electronic business card [MIMEDIR][VCARD].

4.1 Message Addressing Formats

RFC 822 addresses are based on the domain name system. This naming system has two components: the local part, used for username or mailbox identification; and the host part, used for global machine identification.

4.1.1 VPIM Addresses

The local part of the address shall be a US-ASCII string uniquely identifying a mailbox on a destination system. For voice messaging, the local part is a printable string containing the mailbox ID of the originator or recipient. While alpha characters and long mailbox identifiers are permitted, most voice mail networks rely on numeric mailbox identifiers to retain compatibility with the limited 10-digit telephone keypad. As a result, some voice messaging systems may only be able to handle a numeric local part. The reception of alphanumeric local parts on these systems may result in the address being mapped to some locally unique (but confusing to the recipient) number or, in the worst case, the address could be deleted, making the message un-replyable. Additionally, it may be difficult to create messages on these systems with an alphanumeric local part without complex key sequences or some form of directory lookup (see 6). The use of the domain naming system should be transparent to the user. It is the responsibility of the voice mail machine to look up the fully-qualified domain name (FQDN) based on the address entered by the user (see 6).

In the absence of a global directory, specification of the local part is expected to conform to international or private telephone numbering plans. It is likely that private numbering plans will prevail and these are left for local definition. However, it is RECOMMENDED that public telephone numbers be noted according to the international numbering plan described in [E.164]. The indication
that the local part is a public telephone number is given by a preceding `+' (the `+' would not be entered from a telephone keypad, it is added by the system as a flag). Since the primary information in the numeric scheme is contained by the digits, other character separators (e.g. `-') may be ignored (i.e. to allow parsing of the numeric local mailbox) or may be used to recognize distinct portions of the telephone number (e.g. country code). The specification of the local part of a VPIM address can be split into the four groups described below:

   1) mailbox number
        - for use as a private numbering plan (any number of digits)
        - e.g. 2722@lucent.com

   2) mailbox number+extension
        - for use as a private numbering plan with extensions; any number of digits, use of `+' as separator
        - e.g. 2722+111@Lucent.com

   3) +international number
        - for international telephone numbers conforming to E.164; maximum of 15 digits
        - e.g. +16137637582@vm.nortel.ca

   4) +international number+extension
        - for international telephone numbers conforming to E.164; maximum of 15 digits, with an extension (e.g. behind a PBX) that has a maximum of 15 digits
        - e.g. +17035245550+230@ema.org

Note that this address format is designed to be compatible with current usage within the voice messaging industry. It is not compatible with the addressing formats of RFCs 2303-2304. It is expected that as telephony services become more widespread on the Internet, these addressing formats will converge.

4.1.2 Special Addresses

Special addresses are provided for compatibility with the conventions of Internet mail. These addresses do not use numeric local addresses, both to conform to current Internet practice and to avoid conflict with existing numeric addressing plans. Two special addresses are RESERVED for use as follows:

postmaster@domain

By convention, a special mailbox named "postmaster" MUST exist on all systems. This address is used for diagnostics and should be checked regularly by the system manager. This mailbox is particularly likely to receive text messages, which is not normal on a voice processing platform. The specific handling of these messages is an individual implementation choice.

non-mail-user@domain

If a reply to a message is not possible, such as a telephone answering message, then the special address "non-mail-user" must be used as the originator's address. Any text name such as "Telephone Answering", or the telephone number if it is available, is permitted. This special address is used as a token to indicate an unreachable originator. For compatibility with the installed base of mail user agents, implementations that generate this special address MUST send a negative delivery status notification (DSN) for reply messages sent to the undeliverable address. The status code for such NDNs is 5.1.1 "Mailbox does not exist".

Example:

    From: Telephone Answering <non-mail-user@mycompany.com>

4.1.3 Distribution Lists

There are many ways to handle distribution list (DL) expansions and none are 'standard'. Simple alias is a behavior closest to what most voice mail systems do today and what is to be used with VPIM messages. That is:

   Reply to the originator - (Address in the RFC822 Reply-to or From field)

   Errors to the submitter - (Address in the MAIL FROM: field of the ESMTP exchange and the Return-Path: RFC 822 field)

Some proprietary voice messaging protocols include only the recipient of the particular copy in the envelope and include no "header fields" except date and per-message features. Most voice messaging systems do not provide for "Header Information" in their messaging queues and only include delivery information. As a result, recipient information MAY be in either the To or CC header fields. If all recipients cannot be presented (e.g. unknown DL expansion) then the recipient header fields MUST be omitted to indicate that an accurate list of recipients (e.g. for use with a reply-all capability) is not known.
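The four local-part groups of 4.1.1 and the two reserved names of 4.1.2 can be recognized mechanically. The following is a minimal sketch under stated assumptions: the function name and the returned tuple layout are hypothetical, `-' separators are simply discarded (one of the two treatments the text permits), and the reserved names are compared case-insensitively, which this profile does not itself state for "non-mail-user".

```python
import re

# Reserved local parts from 4.1.2.
SPECIAL = {"postmaster", "non-mail-user"}

# Ordered from most to least specific. The 15-digit caps apply only to the
# E.164 forms; private plans allow any number of digits.
_PATTERNS = [
    ("international+extension", re.compile(r"^\+(\d{1,15})\+(\d{1,15})$")),
    ("international", re.compile(r"^\+(\d{1,15})$")),
    ("mailbox+extension", re.compile(r"^(\d+)\+(\d+)$")),
    ("mailbox", re.compile(r"^(\d+)$")),
]


def classify_vpim_address(address):
    """Return a tuple (kind, component, ...) for the local part of address."""
    local, _, domain = address.rpartition("@")
    if not local or not domain:
        raise ValueError("expected local-part@domain")
    if local.lower() in SPECIAL:
        return ("special", local.lower())
    digits = local.replace("-", "")  # assumption: separators are ignored
    for kind, pattern in _PATTERNS:
        match = pattern.match(digits)
        if match:
            return (kind,) + match.groups()
    # Alphanumeric local parts are permitted but may not be usable
    # on a numeric-only voice mail system.
    return ("other", local)
```

A receiving system limited to numeric mailboxes could use the "other" result to decide when an address has to be mapped to a local number or dropped.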
4.2 Message Header Fields

Internet messages contain a header information block. This header block contains information required to identify the sender, the list of recipients, the message send time, and other information intended for user presentation. Except for specialized gateway and mailing list cases, header fields do not indicate delivery options for the transport of messages.

Distribution list processors are noted for modifying or adding to the header fields of messages that pass through them. VPIM systems MUST be able to accept and ignore header fields that are not defined here.

The following header lines are permitted for use with VPIM voice messages:

4.2.1 From

The originator's fully-qualified domain address (a mailbox address followed by the fully-qualified domain name). The user listed in this field should be presented in the voice message envelope as the originator of the message.

Systems compliant with this profile SHOULD provide the text personal name of the voice message originator in a quoted phrase, if the name is available. Text names of corporate or positional mailboxes MAY be provided as a simple string. From [RFC822]

Example:

    From: "Joe S. User" <12145551212@mycompany.com>

    From: Technical Support <611@serviceprovider.com>

The From address SHOULD be used for replies (see 4.6). However, if the From address contains <non-mail-user@domain>, the user SHOULD NOT be offered the option to reply, nor should notifications be sent to this address.

Because voice mail machines may not be able to support separate attributes for the FROM, REPLY-TO, and SENDER header fields and the SMTP MAIL FROM command, VPIM conforming systems SHOULD set these values to the same address. Use of addresses different than those present in the From header field address may result in unanticipated behavior.
4.2.2 To

The To header contains the recipient's fully-qualified domain address. There may be one or more To: fields in any message.

Example:

    To: +12145551213@mycompany.com

Systems compliant to this profile SHOULD provide a list of recipients only if all recipients are provided. The To header MUST NOT be included in the message if the sending message transport agent (MTA) cannot resolve all the addresses in it, e.g. if an address is a DL alias for which the expansion is unknown (see 4.1.3). If present, the addresses in the To header MAY be used for a reply message to all recipients.

Systems compliant to this profile MAY also discard the To addresses of incoming messages because of the inability to store the information. This would, of course, make a reply-to-all capability impossible.

4.2.3 Cc

The Cc header contains additional recipients' fully-qualified domain addresses. Many voice mail systems maintain only sufficient envelope information for message delivery and are not capable of storing or providing a complete list of recipients.

Systems compliant to this profile SHOULD provide a list of recipients only if all disclosed recipients can be provided. The list of disclosed recipients does not include those sent via a blind copy. If not, systems SHOULD omit the To and Cc header fields to indicate that the full list of recipients is unknown.

Example:

    Cc: +12145551213@mycompany.com

Systems compliant to this profile MAY discard the Cc addresses of incoming messages as necessary. If a list of Cc or To addresses is present, these addresses MAY be used for a reply message to all recipients.
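The all-or-nothing rule for To and Cc can be sketched as below. This is an illustration only: the function name and the `all_resolved` flag are hypothetical, and how a system decides that every disclosed recipient is known (for example, that no DL alias is left unexpanded) is a local matter not shown here.

```python
from email.message import Message


def add_recipient_headers(msg, to_addrs, cc_addrs, all_resolved):
    """Add To/Cc only when the complete disclosed recipient list is known.

    all_resolved is supplied by the caller (hypothetical flag). When it is
    False, both headers are omitted so that a receiver does not offer
    reply-all against a partial list.
    """
    if not all_resolved:
        return msg
    if to_addrs:
        msg["To"] = ", ".join(to_addrs)
    if cc_addrs:
        msg["Cc"] = ", ".join(cc_addrs)
    return msg
```

Blind-copy recipients are never passed to this function; they travel only in the SMTP envelope.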
4.2.4 Date

The Date header contains the date, time, and time zone in which the message was sent by the originator. The time zone SHOULD be represented in a four-digit time zone offset, such as -0500 for North American Eastern Standard Time. This may be supplemented by a time zone name in parentheses, e.g., "-0700 (PDT)". Compliant implementations SHOULD be able to convert RFC 822 date and time stamps into local time.

Example:

    Date: Wed, 28 Jul 96 10:08:49 -0800 (PST)

The sending system MUST report the time the message was sent. If the VPIM sender is relaying a message from a system which does not provide a time stamp, the time of arrival at the VPIM system SHOULD be used as the date. From [RFC822]

4.2.5 Sender

The Sender header field contains the actual address of the originator if the message is sent by an agent on behalf of the author indicated in the From: field. This header field MAY be sent by a VPIM conforming system. If it is present in a VPIM message, the receiving VPIM implementation may ignore the field and only present the From header field.

4.2.6 Return Path

The Return-path header is added by the final delivering SMTP server. If present, it contains the address from the MAIL FROM parameter of the ESMTP exchange (see 5.1.2). Any error messages resulting from the delivery failure MUST be sent to this address (see [DRPT] for additional details). Note that if the Return-path is null ("<>"), e.g. no path, loop prevention or confidential, a notification MUST NOT be sent. If the Return path address is not available (either from this header or the MAIL FROM parameter) the From address may be used to deliver notifications.

4.2.7 Message-id

The Message-id header contains a unique per-message identifier. A unique message-id MUST be generated for each message sent from a compliant implementation.

The message-id is not required to be stored on the receiving system. This identifier MAY be used for tracking, auditing, and returning receipt notification reports. From [RFC822]

Example:

    Message-id: <12345678@mycompany.com>

4.2.8 Reply-To

If present, the reply-to header provides a preferred address to which reply messages should be sent (see 4.6). Typically, voice mail systems can only support one originator of a message so it is unlikely that this field can be supported. A compliant system SHOULD NOT send a Reply-To header. However, if a reply-to header is present, a reply-to sender message MAY be sent to the address specified (that is, overwriting From). From [RFC822]

This preferred address of the originator must also be provided in the originator's vCard EMAIL attribute, if present (see 4.3.3).

4.2.9 Received

The Received header contains trace information added to the beginning of a RFC 822 message by MTAs. This is the only header permitted to be added by an MTA. Information in this header is useful for debugging when using a US-ASCII message reader or a header parsing tool.

A compliant system MUST add Received header fields when acting as a gateway and MUST NOT remove any Received fields when relaying messages to other MTAs or gateways. These header fields MAY be ignored or deleted when the message is received at the final destination. From [RFC822]

4.2.10 MIME Version

The MIME-Version header indicates that the message conforms to the MIME message format specification. Systems compliant with this specification SHOULD include a comment with the words "(Voice 2.0)". RFC 1911 defines an earlier version of this profile and uses the token (Voice 1.0). From [MIME1][VPIM1]

Example:

    MIME-Version: 1.0 (Voice 2.0)

This identifier is intended for information only and SHOULD NOT be used to semantically identify the message as being a VPIM message. Instead, the presence of the content defined in [V-MSG] SHOULD be used if identification is necessary.
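The rules in 4.2.1 and 4.2.6 for choosing where an error notification goes can be combined into one decision. The following is a minimal sketch, assuming a hypothetical function name and an optional `envelope_from` argument standing in for the MAIL FROM value when the receiving code still has it; `parseaddr` is the Python standard library address parser.

```python
from email.message import Message
from email.utils import parseaddr


def notification_target(msg: Message, envelope_from=None):
    """Return the address for an error notification, or None if none may be sent.

    envelope_from (hypothetical parameter) is the MAIL FROM value, if known.
    """
    return_path = envelope_from if envelope_from is not None else msg.get("Return-Path")
    if return_path is not None:
        if return_path.strip() == "<>":
            return None  # null return path: a notification MUST NOT be sent
        return parseaddr(return_path)[1] or None
    # No return path available: the From address may be used instead,
    # except for the reserved unreachable-originator token.
    addr = parseaddr(msg.get("From", ""))[1]
    if not addr or addr.lower().startswith("non-mail-user@"):
        return None
    return addr
```

Returning None in both the null-path and non-mail-user cases lets the caller treat "do not notify" uniformly.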
4.2.11 Content-Type

The content-type header declares the type of content enclosed in the message. The typical top level content in a VPIM Message SHOULD be multipart/voice-message, a mechanism for bundling several components into a single identifiable voice message. The allowable contents are detailed in section 4.3 of this document. From [MIME2]

4.2.12 Content-Transfer-Encoding

Because Internet mail was initially specified to carry only 7-bit US-ASCII text, it may be necessary to encode voice and fax data into a representation suitable for that environment. The content-transfer-encoding header describes this transformation if it is needed. Compliant implementations MUST recognize and decode the standard encodings, "Binary", "7bit", "8bit", "Base64" and "Quoted-Printable". The allowable content-transfer-encodings are specified in section 4.3. From [MIME1]

4.2.13 Sensitivity

The sensitivity header, if present, indicates the requested privacy level. The case-insensitive values "Personal" and "Private" are specified. If no privacy is requested, this field is omitted.

If a sensitivity header is present in the message, a compliant system MUST prohibit the recipient from forwarding this message to any other user. A compliant system, however, SHOULD allow the responder to reply to a sensitive message, but SHOULD NOT include the original message content. The sensitivity of the reply message MAY be set by the responder.

If the receiving system does not support privacy and the sensitivity is one of "Personal" or "Private", a negative delivery status notification must be sent to the originator with the appropriate status code indicating that privacy could not be assured. The message contents SHOULD be returned to the sender to allow for a voice context with the notification. A non-delivery notification to a private message SHOULD NOT be tagged private since it will be sent to the originator.
From: [X.400]

4.2.14 Importance

Indicates the requested importance to be given by the receiving system. The case-insensitive values "low", "normal" and "high" are specified. If no special importance is requested, this header may be omitted and the value assumed to be "normal".

Compliant implementations MAY use this header to indicate the importance of a message and may order messages in a recipient's mailbox. From: [X.400]

4.2.15 Subject

The subject field is often provided by email systems but is not widely supported on Voice Mail platforms. For compatibility with text based mailbox interfaces, a text subject field SHOULD be generated by a compliant implementation but MAY be discarded if present by a receiving system. From [RFC822]

It is recommended that voice messaging systems that do not support any text user interfaces (e.g. access only by a telephone) insert a generic subject header of "VPIM Message" for the benefit of text enabled recipients.

4.2.16 Disposition-Notification-To

This header MAY be present to indicate that the sender is requesting a receipt notification from the receiving user agent. This message disposition notification (MDN) is typically sent by the user agent after the user has listened to the message and consented to an MDN being sent.

Example:

    Disposition-notification-to: +12145551213@mycompany.com

The presence of a Disposition-notification-to header in a message is merely a request for an MDN described in 4.4.5. The recipients' user agents are always free to silently ignore such a request so this header does not burden any system that does not support it. From [MDN].

4.2.17 Disposition-Notification-Options

This header MAY be present to define future extension parameters for an MDN requested by the presence of the header in the previous section. Currently no parameters are defined by this document or by [MDN]. However, this header MUST be parsed if present, if MDNs are supported. If it contains an extension parameter that is required for proper MDN generation (noted with "=required"), then an MDN MUST NOT be sent if the parameter is not understood. See [MDN] for complete details.
Example:

    Disposition-notification-options: whizzbang=required,foo

4.3 Voice Message Content Types

MIME, introduced in [MIME1], is a general-purpose message body format that is extensible to carry a wide range of body parts. It provides for encoding binary data so that it can be transported over the 7-bit text-oriented SMTP protocol. This transport encoding (denoted by the Content-Transfer-Encoding header field) is in addition to the audio encoding required to generate a binary object.

MIME defines two transport encoding mechanisms to transform binary data into a 7 bit representation, one designed for text-like data ("Quoted-Printable"), and one for arbitrary binary data ("Base64"). While Base64 is dramatically more efficient for audio data, either will work. Where binary transport is available, no transport encoding is needed, and the data can be labeled as "Binary".

An implementation in compliance with this profile SHOULD send audio and/or facsimile data in binary form when binary message transport is available. When binary transport is not available, implementations MUST encode the audio and/or facsimile data as Base64. The detection and decoding of "Quoted-Printable", "7bit", and "8bit" MUST be supported in order to meet MIME requirements and to preserve interoperability with the fullest range of possible devices. However, if a content is received in a transfer encoding that cannot be rendered to the user, an appropriate negative delivery status notification MUST be sent.

The content types described in this section are identified for use within the multipart/voice-message content. This content, which is the fundamental part of a VPIM message, is referred to as a VPIM voice message in this document.

Only the contents profiled subsequently can be sent within a VPIM voice message construct (i.e., the multipart/voice-message content type) to form a simple or a more complex structure (several examples are given in Appendix B).
The presence of other contents within a VPIM voice message is an error condition and SHOULD result in a negative delivery status notification. When multiple contents are present within the multipart/voice-message, they SHOULD be presented to the user in the order that they appear in the message.
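As an illustration of the Base64 fallback described above, the sketch below wraps raw audio octets in a multipart/voice-message container using the Python standard `email` package. It is not a complete VPIM message: the function name, the `domain` argument, and the "32KADPCM" audio subtype label are assumptions for the example, and any content-type parameters or additional header fields required elsewhere in this profile are not shown.

```python
from email.mime.audio import MIMEAudio
from email.mime.multipart import MIMEMultipart
from email.utils import formatdate, make_msgid


def build_voice_message(sender, recipient, audio_bytes, domain="mycompany.com"):
    """Sketch of a single-part voice message for a 7-bit transport."""
    msg = MIMEMultipart("voice-message")
    msg["From"] = sender
    msg["To"] = recipient
    msg["Date"] = formatdate(localtime=True)       # four-digit zone offset
    msg["Message-ID"] = make_msgid(domain=domain)  # unique per message
    msg["Subject"] = "VPIM Message"                # generic subject (4.2.15)
    # The MIME base class sets "MIME-Version: 1.0"; add the informational comment.
    msg.replace_header("MIME-Version", "1.0 (Voice 2.0)")
    # MIMEAudio applies Base64 by default, which is what the profile requires
    # when binary transport is unavailable. The subtype is an assumed label.
    part = MIMEAudio(audio_bytes, _subtype="32KADPCM")
    msg.attach(part)
    return msg
```

Calling `build_voice_message(...).as_string()` yields text that can be handed to an SMTP client; a system with binary transport available would instead send the octets unencoded.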
4.3.1 Multipart/Voice-Message This MIME multipart structure provides a mechanism for packaging a voice message into one container that is tagged as VPIM v2 compliant. The semantic of multipart/Voice-Message (defined in [V-MSG]) is identical to multipart/mixed and may be interpreted as that by systems that do not recognize this content-type. The Multipart/Voice-Message content-type MUST only contain the profiled media and content types specified in this section (i.e. audio/*, image/*, message/rfc822 and text/directory). The most common will be: spoken name, spoken subject, the message itself, attached fax and directory info. Forwarded messages are created by simply using the message/rfc822 construct. Conformant implementations MUST send the multipart/voice-message in a VPIM message. In most cases, this Multipart/Voice-Message content will be the top level (i.e. in the Content-Type header). Conformant implementations MUST recognize the Multipart/Voice-Message content (whether it is a top level content or below a multipart/mixed) and be able to separate the contents (e.g. spoken name or spoken subject). 4.3.2 Message/RFC822 MIME requires support of the Message/RFC822 message encapsulation body part. This body part is used within a multipart/voice-message to forward complete messages (see 4.5) or to reply with original content (see 4.6). From [MIME2] 4.3.3 Text/Directory This content allows for the inclusion of a Versit vCard [VCARD] electronic business card within a VPIM message. The format is suitable as an interchange format between applications or systems, and is defined independent of the method used to transport it. It provides a useful mechanism to transport information about the originator that can be used by the receiving VPIM system (see 6) or other local applications Each vCard MUST be contained within a Text/Directory content type [MIMEDIR] within a VPIM message. 
[MIMEDIR] requires that the character set MUST be defined as a parameter value (typically us-ascii for VPIM) and that the profile SHOULD be defined (the value MUST be vCard within VPIM messages).

Each VPIM message SHOULD be created with a Text/Directory (vCard profile) content type that MUST contain the preferred email address, telephone number, and text name of the message originator as well as the vCard version. The vCard SHOULD contain the spoken name and role of the originator, as well as the revision date. Any other vCard attribute MAY also be present. The intent is that the vCard be used as the source of information to contact the originator (e.g., reply, call).

If the text/directory content-type is included in a VPIM message, the vCard profile [VCARD] MUST be used and MUST specify at least the following attributes:

   TEL - Public switched telephone number in international (E.164) format (various types, typically VOICE)

   EMAIL - email address (various types, typically INTERNET; the type VPIM is optionally used to denote an address that supports VPIM messages (see 18.1))

   VERSION - Indicates the version of the vCard profile. Version 3.0 [VCARD] MUST be used.

The following attributes SHOULD be specified:

   N - Family Name, Given Name, Additional Names, Honorific Prefixes, and Suffixes. Because it is expected that recipients using a telephone user interface will use the information in the vCard to identify the originator, and the GUI will see the information presented in the FROM line, all present components in the text name of the FROM header field MUST match the values provided by the vCard.

   ROLE - The role of the person identified in `N' or `FN', but may also be used to distinguish when the sender is a corporate or positional mailbox

   SOUND - spoken name sound data (various types, typically 32KADPCM)

   REV - Revision of vCard in ISO 8601 date format

The vCard MAY use other attributes as defined in [VCARD] or extension attributes not yet defined (e.g. capabilities).

If present, the spoken name attribute MUST be denoted by a content ID pointing to an audio/* content elsewhere in the VPIM message.

A typical VPIM message (i.e. no forwarded parts) MUST only contain one vCard -- more than one is an error condition. A VPIM message that contains forwarded messages, though, may contain multiple vCards.
However, these vCards MUST be associated with the originator(s) of the forwarded message(s) and the originator of the forwarding message. As a result, all forwarded vCards will be
contained in message/rfc822 contents -- only the vCard of the forwarding originator will be at the top level.

Example:

   Content-Type: text/directory; charset=us-ascii; profile=vCard
   Content-Transfer-Encoding: 7bit

   BEGIN:VCARD
   N:Parsons;Glenn
   ORG:Northern Telecom
   TEL;TYPE=VOICE;MSG;WORK:+1-613-763-7582
   EMAIL;TYPE=INTERNET:glenn.parsons@nortel.ca
   EMAIL;TYPE=INTERNET;VPIM:6137637582@vm.nortel.ca
   SOUND;TYPE=32KADPCM;ENCODING=URI: CID:<part1@VM2-4321>
   REV:19960831T103310Z
   VERSION: 3.0
   END:VCARD

4.3.4 Audio/32KADPCM

An implementation compliant to this profile MUST send Audio/32KADPCM by default for voice [ADPCM]. Receivers MUST be able to accept and decode Audio/32KADPCM. Typically this body contains several minutes of message content; however, if used for spoken name or subject the content should be considerably shorter (i.e. about 10 and 20 seconds respectively).

If an implementation can only handle one voice body, then multiple voice bodies (if present) SHOULD be concatenated, and SHOULD NOT be discarded. It is RECOMMENDED that this be done in the same order as they were sent. Note that if an Originator Spoken Name audio body and a vCard are both present in a VPIM message, the vCard SOUND attribute MUST point to this audio body (see 4.3.3).

While any valid MIME body header MAY be used, several header fields have the following semantics when included with this body part:

4.3.4.1 Content-Description:

This field MAY be present to facilitate the text identification of these body parts in simple email readers. Any values may be used, though it may be useful to use values similar to those for Content-Disposition.

Example:

   Content-Description: Big Telco Voice Message
4.3.4.2 Content-Disposition:

This field MUST be present to allow the parsable identification of these body parts. This is especially useful if, as is typical, more than one Audio/32KADPCM body occurs within a single level (e.g. multipart/voice-message). Since a VPIM voice message is intended to be automatically played upon display of the message, in the order in which the audio contents occur, the audio contents must always be of type inline. However, it is still useful to include a filename value, so this should be present if this information is available. From [DISP]

In order to distinguish between the various types of audio contents in a VPIM voice message a new disposition parameter "voice" is defined with the parameter values below to be used as appropriate (see 18.2):

   Voice-Message - the primary voice message,

   Voice-Message-Notification - a spoken delivery notification or spoken disposition notification,

   Originator-Spoken-Name - the spoken name of the originator,

   Recipient-Spoken-Name - the spoken name of the recipient if available to the originator and present if there is ONLY one recipient,

   Spoken-Subject - the spoken subject of the message, typically spoken by the originator

Note that there SHOULD only be one instance of each of these types of audio contents per message level. Additional instances of a given type (i.e., parameter value) may occur within an attached forwarded voice message.

Implementations that do not understand the "voice" parameter (or the Content-Disposition header) can safely ignore it, and will present the audio bodyparts in order (but will not be able to distinguish between them).

Example:

   Content-Disposition: inline; voice=spoken-subject; filename="msg001.726"

4.3.4.3 Content-Duration:

This field MAY be present to allow the specification of the length of the audio bodypart in seconds. The use of this field on reception is a local implementation issue. From [DUR]
Example:

   Content-Duration: 33

4.3.4.4 Content-Language:

This field MAY be present to allow the specification of the spoken language of the audio bodypart. The encoding is defined in [LANG]. The use of this field on reception is a local implementation issue.

Example for UK English:

   Content-Language: en-GB

4.3.5 Image/Tiff

A common image encoding for facsimile, known as TIFF-F, is a derivative of the Tag Image File Format (TIFF) and is described in several documents. For the purposes of VPIM, the F Profile of TIFF for Facsimile (TIFF-F) is defined in [TIFF-F] and the image/tiff MIME content type is defined in [TIFFREG]. While there are several formats of TIFF, only TIFF-F is profiled for use in a VPIM voice message. Further, since the TIFF-F file format is used in a store-and-forward mode with VPIM, the image MUST be encoded so that there is only one image strip per facsimile page.

All VPIM implementations that support facsimile SHOULD generate TIFF-F compatible facsimile contents in the image/tiff; application=faxbw sub-type encoding by default. An implementation MAY send this fax content in VPIM voice messages and MUST be able to recognize and display it in received messages. If a fax message is received that cannot be rendered to the user (e.g. the receiving VPIM system does not support fax), then the system MUST return the message with a negative delivery status notification with a media not supported status code.

While any valid MIME body header MAY be used (e.g., Content-Disposition to indicate the filename), none are specified to have special semantics for VPIM and MAY be ignored. Note that the content type parameter application=faxbw MUST be included in outbound messages. However, inbound messages with or without this parameter MUST be rendered to the user (if the rendering software encounters an error in the file format, some form of negative delivery status notification MUST be sent to the originator).
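As a rough illustration of the audio body header fields described in 4.3.4.1 through 4.3.4.4, the sketch below labels a single Audio/32KADPCM body part using Python's standard email package. The helper name build_audio_part, the placeholder audio bytes, and the sample values are assumptions made for illustration only; they are not defined by this profile.

```python
from email.message import EmailMessage

# Values of the "voice" disposition parameter listed in 4.3.4.2.
VOICE_TYPES = {
    "Voice-Message",
    "Voice-Message-Notification",
    "Originator-Spoken-Name",
    "Recipient-Spoken-Name",
    "Spoken-Subject",
}


def build_audio_part(audio, voice, seconds=None, language=None,
                     filename=None, description=None):
    """Return one inline Audio/32KADPCM body part (hypothetical helper)."""
    if voice not in VOICE_TYPES:
        raise ValueError("unknown voice parameter value: %r" % voice)
    part = EmailMessage()
    # Base64 keeps the binary audio safe on a 7-bit transport.
    part.set_content(audio, maintype="audio", subtype="32KADPCM",
                     cte="base64")
    # Content-Disposition MUST be present; audio contents are always inline.
    params = {"voice": voice}
    if filename:
        params["filename"] = filename
    part.add_header("Content-Disposition", "inline", **params)
    # The remaining fields are optional (MAY) in this profile.
    if description:
        part["Content-Description"] = description
    if seconds is not None:
        part["Content-Duration"] = str(int(seconds))
    if language:
        part["Content-Language"] = language
    return part


# Placeholder bytes stand in for real ADPCM audio data.
part = build_audio_part(b"\x00" * 16, "Voice-Message", seconds=33,
                        language="en", filename="msg001.726",
                        description="Big Telco Voice Message")
print(part["Content-Disposition"])
```

Rejecting unknown "voice" values on the sending side is a design choice of this sketch; a receiver that does not understand the parameter can simply ignore it, as noted in 4.3.4.2.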
4.3.6 Proprietary Voice or Fax Formats

Proprietary voice or fax encoding formats or other standard formats MAY be supported under this profile provided a unique identifier is registered with the IANA prior to use (see [MIME4]). The voice encodings should be registered as sub-types of Audio and the fax encodings should be registered as sub-types of Image.

Use of any encoding other than audio/32kadpcm or image/tiff; application=faxbw reduces interoperability in the absence of explicit manual system configuration. A compliant implementation MAY use any other encoding with explicit per-destination configuration.

4.4 Other Message Content Types

An implementation compliant with this profile MAY send additional contents in a VPIM message, but ONLY outside of the multipart/voice-message. The content types described in this section are identified for use with this profile. Additional contents not defined in this profile MUST NOT be used without prior explicit per-destination configuration. If an implementation receives a VPIM message that contains content types not specified in this profile, their handling is a local implementation issue (e.g. the unknown contents MAY be discarded if they cannot be presented to the recipient). Conversely, if an implementation receives a non-VPIM message (i.e., without a multipart/voice-message content type) with any of the contents defined in 4.3 and 4.4, it SHOULD deliver those contents, but the full message handling is a local issue (e.g. the unknown contents _or_ the entire message MAY be discarded). Implementations MUST issue negative delivery status notifications to the originator when any form of non-delivery to the recipient occurs.

The multipart contents defined below MAY be sent as the top level of a VPIM message (with other noted contents below them as required). As well, the multipart/mixed content SHOULD be used as the top level of a VPIM message to form a more complex structure (e.g., with additional content types).
When multiple contents are present, they SHOULD be presented to the user in the order that they appear in the message. Several examples are given in Appendix B. 4.4.1 Multipart/Mixed MIME provides the facilities for enclosing several body parts in a single message. Multipart/Mixed SHOULD only be used for sending complex voice or multimedia messages. That is, as the top level Content-Type when sending one of the following contents (in addition to the VPIM voice message) in a VPIM message. Compliant systems MUST accept multipart/mixed body parts. From [MIME2]
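To make the content rules concrete, the following sketch locates the multipart/voice-message container, whether it is the top-level content or sits below a multipart/mixed, and lists any body parts whose types fall outside the profiled set (audio/*, image/*, message/rfc822 and text/directory). The function names and the sample message are illustrative assumptions. How a real system reacts to an offending part (for example, by issuing a negative delivery status notification) is left out of the sketch.

```python
from email import message_from_string

# Hypothetical message: a voice message below multipart/mixed, with a
# stray text/plain part placed inside the voice-message container.
SAMPLE = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="outer"

--outer
Content-Type: multipart/voice-message; boundary="inner"

--inner
Content-Type: audio/32KADPCM
Content-Transfer-Encoding: base64
Content-Disposition: inline; voice=Voice-Message

AAAA
--inner
Content-Type: text/plain

stray text inside the voice message
--inner--

--outer
Content-Type: text/plain

Text outside the voice message is handled under 4.4.
--outer--
"""


def find_voice_message(msg):
    """Return the multipart/voice-message container, or None."""
    if msg.get_content_type() == "multipart/voice-message":
        return msg
    if msg.get_content_type() == "multipart/mixed":
        for part in msg.get_payload():
            if part.get_content_type() == "multipart/voice-message":
                return part
    return None


def unprofiled_parts(container):
    """List content types not permitted directly inside the container."""
    bad = []
    for part in container.get_payload():
        ctype = part.get_content_type()
        allowed = (part.get_content_maintype() in ("audio", "image")
                   or ctype in ("message/rfc822", "text/directory"))
        if not allowed:
            bad.append(ctype)
    return bad


vm = find_voice_message(message_from_string(SAMPLE))
print(unprofiled_parts(vm))
```

The check looks only at the direct children of the container; a forwarded message inside message/rfc822 would need the same check applied to its own voice-message container.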
4.4.2 Text/Plain MIME requires support of the basic Text/Plain content type. This content type has limited applicability within the voice messaging environment. However, because VPIM is a MIME profile, MIME requirements should be met. Compliant VPIM implementations SHOULD NOT send the Text/Plain content-type. Compliant implementations MUST accept Text/Plain messages, however, specific handling is left as an implementation decision. From [MIME2] There are several mechanisms that can be used to support text (once accepted) on voice messaging systems including text-to-speech and text-to-fax conversions. If no rendering of the text is possible (i.e., it is not possible for the recipient to determine if the text is a critical part of the message), the entire message MUST be returned to the sender with a negative delivery status notification and a media-unsupported status code. 4.4.3 Multipart/Report The Multipart/Report is used for enclosing human-readable and machine parsable notification (e.g. Message/delivery-status) body parts and any returned message content. The multipart/report content-type is used to deliver both delivery status reports indicating transport success or failure and message disposition notifications to indicate post-delivery events such as receipt notification. Compliant implementations MUST use the Multipart/Report construct. Compliant implementations MUST recognize and decode the Multipart/Report content type and its components in order to present the report to the user. From [REPORT] Multipart/Report messages from VPIM implementations SHOULD include the human-readable description of the error as a spoken audio/* content (this speech SHOULD also be made available to the notification recipient). As well, VPIM implementations MUST be able to handle (and MAY generate) Multipart/Report messages that encode the human-readable description of the error as text. Note that per [DSN] the human-readable part MUST always be present. 
4.4.4 Message/Delivery-status This MIME body part is used for sending machine-parsable delivery status notifications. Compliant implementations MUST use the Message/delivery-status construct when returning messages or sending warnings. Compliant implementations MUST recognize and decode the Message/delivery-status content type and present the reason for failure to the sender of the message. From [DSN]
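As an illustration of recognizing and decoding a delivery status notification, the sketch below parses a Multipart/Report message with Python's standard email package and extracts the failed recipients from its Message/delivery-status part. The sample report, its addresses, and the function name failed_recipients are hypothetical. The human-readable part is plain text here only for brevity; the status value 5.6.1 is the enhanced status code for "media not supported".

```python
from email import message_from_string, policy

# Hypothetical report; example.com addresses are placeholders.
SAMPLE_DSN = """\
From: postmaster@vm.example.com
To: 2145551212@vm1.example.com
Subject: Undeliverable voice message
MIME-Version: 1.0
Content-Type: multipart/report; report-type=delivery-status; boundary="rpt"

--rpt
Content-Type: text/plain

The voice message could not be delivered.

--rpt
Content-Type: message/delivery-status

Reporting-MTA: dns; vm.example.com

Final-Recipient: rfc822; 6137637582@vm.example.com
Action: failed
Status: 5.6.1

--rpt--
"""


def failed_recipients(raw):
    """Return (address, status) pairs for recipients whose Action is failed."""
    msg = message_from_string(raw, policy=policy.default)
    failures = []
    for part in msg.walk():
        if part.get_content_type() != "message/delivery-status":
            continue
        # The parser exposes each block of fields as its own message
        # object: the per-message fields first, then one per recipient.
        for block in part.get_payload():
            recipient = block.get("Final-Recipient")
            action = str(block.get("Action", "")).strip().lower()
            if recipient is None or action != "failed":
                continue
            # Drop the address-type prefix ("rfc822;").
            address = str(recipient).split(";", 1)[-1].strip()
            status = str(block.get("Status", "")).strip()
            failures.append((address, status))
    return failures


print(failed_recipients(SAMPLE_DSN))
```

A voice messaging system would then map each status code to a spoken or displayed reason for the sender; that mapping is a local matter and is not shown.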
4.4.5 Message/Disposition-notification

This MIME body part is used for sending machine-parsable receipt notifications (message disposition notifications). Conforming implementations SHOULD use the Message/Disposition-notification construct when sending post-delivery message status notifications. These MDNs, however, MUST only be sent in response to the presence of the Disposition-notification-to header (see 4.2.16). Conforming implementations should recognize and decode the Message/Disposition-notification content type and present the notification to the user. From [MDN]

4.5 Forwarded Messages

VPIM version 2 explicitly supports the forwarding of voice and fax content with voice or fax annotation. However, only the two constructs described below are acceptable in a VPIM message. Since only the first (i.e. message/rfc822) can be recognized as a forwarded message (or even multiple forwarded messages), it is RECOMMENDED that this construct be used whenever possible.

Forwarded VPIM messages SHOULD be sent as a multipart/voice-message with the entire original message enclosed in a message/rfc822 content type and the annotation as a separate Audio/* or image/* body part. If the RFC822 header fields are not available for the forwarded content, simulated header fields with available information SHOULD be constructed to indicate the original sending timestamp, and the original sender as indicated in the "From" line. However, note that at least one of "From", "Subject", or "Date" MUST be present. As well, the message/rfc822 content MUST include at least the "MIME-Version" and "Content-Type" header fields. From [MIME2]

In the event that forwarding information is lost through concatenation of the original message and the forwarding annotation, such as must be done in a gateway between VPIM and the AMIS voice messaging protocol, the entire audio content MAY be sent as a single Audio/* segment without including any forwarding semantics.
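The recommended forwarding construct can be sketched with Python's standard email package as follows. The function name forward_voice_message, the placeholder subject text, and the choice of voice=Voice-Message for the annotation are assumptions of this sketch rather than requirements stated above; the sample addresses and audio bytes are placeholders.

```python
import copy
from email.message import Message
from email.mime.audio import MIMEAudio
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart


def forward_voice_message(original, annotation_audio):
    """Wrap a copy of `original` plus a spoken annotation for forwarding."""
    wrapped = copy.deepcopy(original)   # leave the caller's message intact
    # At least one of From, Subject or Date must be present, so simulate
    # a header field if none is available (placeholder text, assumed).
    if not any(name in wrapped for name in ("From", "Subject", "Date")):
        wrapped["Subject"] = "Forwarded voice message"
    if "MIME-Version" not in wrapped:
        wrapped["MIME-Version"] = "1.0"
    if "Content-Type" not in wrapped:
        # This sketch does not guess at the media type of the content.
        raise ValueError("forwarded content needs a Content-Type")

    outer = MIMEMultipart("voice-message")
    note = MIMEAudio(annotation_audio, "32KADPCM")
    # Labeling the annotation as Voice-Message is an assumption.
    note.add_header("Content-Disposition", "inline", voice="Voice-Message")
    outer.attach(note)                   # annotation first
    outer.attach(MIMEMessage(wrapped))   # entire original as message/rfc822
    return outer


original = Message()
original["From"] = "2175552345@vm1.example.com"
original["Date"] = "Mon, 26 Aug 1996 10:20:20 -0700"
original["MIME-Version"] = "1.0"
original["Content-Type"] = "audio/32KADPCM"
original["Content-Transfer-Encoding"] = "base64"
original.set_payload("AAAA")

fwd = forward_voice_message(original, b"\x00" * 16)
print([p.get_content_type() for p in fwd.get_payload()])
```

Because the original stays intact inside message/rfc822, a receiver can recognize the forwarded message and even nested forwards, which is the reason this construct is preferred over concatenated audio.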
4.6 Reply Messages

Replies to VPIM messages (and Internet mail messages) are addressed to the address noted in the reply-to header (see 4.2.8) if it is present, else the From address (see 4.2.1) is used. The vCard EMAIL attribute, if present, SHOULD be the same as the reply-to address and may be the same as the From address. While the vCard holds the sender's preferred address, it SHOULD NOT be used to generate a reply. Also, the Return-path address should not be used for replies.
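The address selection rule above can be sketched as a small function. The name reply_address and the sample addresses are hypothetical; the sketch deliberately never consults the Return-Path field or a vCard.

```python
from email.message import Message
from email.utils import parseaddr


def reply_address(msg):
    """Return the address a reply should use, or None if none is usable."""
    # Reply-To takes precedence; From is the fallback.
    for field in ("Reply-To", "From"):
        value = msg.get(field)
        if value:
            address = parseaddr(value)[1]
            if address:
                return address
    return None


msg = Message()
msg["From"] = "Glenn Parsons <12145551212@vm2.example.com>"
msg["Return-Path"] = "<bounces@vm2.example.com>"
print(reply_address(msg))

msg["Reply-To"] = "12145551213@vm2.example.com"
print(reply_address(msg))
```

If a Reply-To field lists several addresses, parseaddr picks up only the first one; a voice messaging system that supports a single originator address may accept that limitation.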
Support of multiple originator header fields is often not possible on voice messaging systems, so it may be necessary to choose only one when gatewaying a VPIM message to another voice message system. However, implementers should note that this may make it impossible to send error messages and replies to their proper destinations.

In some cases, a reply message is not possible, such as with a message created by telephone answering (i.e. classic voice mail). In this case, the From field MUST contain the special address non-mail-user@domain (see 4.1.2). A null ESMTP MAIL FROM address SHOULD also be used in this case (see 5.1.2). A receiving VPIM system SHOULD NOT offer the user the option to reply to this kind of message.

4.7 Notification Messages

VPIM delivery status notification messages (4.4.4) MUST be sent to the originator of the message when any form of non-delivery of the subject message or its components occurs. These error messages must be sent to the return path (4.2.6) if present; otherwise, the From address (4.2.1) may be used.

VPIM Receipt Notification messages (4.4.5) should be sent to the sender specified in the Disposition-Notification-To header field (4.2.16), only after the message has been presented to the recipient or if the message has somehow been disposed of without being presented to the recipient (e.g. if it were deleted before playing it).

VPIM Notification messages may be positive or negative, and can indicate delivery at the server or receipt by the client. However, the notification MUST be contained in a multipart/report container (4.4.3) and SHOULD contain a spoken error message.

If a VPIM system receives a message with contents that are not understood (see 4.3 and 4.4), its handling is a local matter. A delivery status notification SHOULD be generated if the message could not be delivered because of unknown contents (e.g., on traditional voice processing systems).
In some cases, the message may be delivered (with a positive DSN sent) to a mailbox before the determination of rendering can be made. 5. Message Transport Protocol Messages are transported between voice mail machines using the Internet Extended Simple Mail Transfer Protocol (ESMTP). All information required for proper delivery of the message is included in the ESMTP dialog. This information, including the sender and recipient addresses, is commonly referred to as the message
"envelope". This information is equivalent to the message control block in many analog voice messaging protocols. ESMTP is a general-purpose messaging protocol, designed both to send mail and to allow terminal console messaging. Simple Mail Transport Protocol (SMTP) was originally created for the exchange of US-ASCII 7-bit text messages. Binary and 8-bit text messages have traditionally been transported by encoding the messages into a 7-bit text-like form. [ESMTP] formalized an extension mechanism for SMTP, and subsequent RFCs have defined 8-bit text networking, command streaming, binary networking, and extensions to permit the declaration of message size for the efficient transmission of large messages such as multi-minute voice mail. The following sections list ESMTP commands, keywords, and parameters that are required and those that are optional for conformance to this profile. 5.1 ESMTP Commands 5.1.1 HELO Base SMTP greeting and identification of sender. This command is not to be sent by compliant systems unless the more-capable EHLO command is not accepted. It is included for compatibility with general SMTP implementations. Compliant servers MUST implement the HELO command for backward compatibility but clients SHOULD NOT send it unless EHLO is not supported. From [SMTP] 5.1.2 MAIL FROM (REQUIRED) Originating mailbox. This address contains the mailbox to which errors should be sent. VPIM implementations SHOULD use the same address in the MAIL FROM command as is used in the From header field. This address is not necessarily the same as the message Sender listed in the message header fields if the message was received from a gateway or sent to an Internet-style mailing list. From [SMTP, ESMTP] The MAIL FROM address SHOULD be stored in the local message store for the purposes of generating a delivery status notification to the originator. 
The address indicated in the MAIL FROM command SHOULD be passed as a local system parameter or placed in a Return-Path: line inserted at the beginning of a VPIM message. From [HOSTREQ]

Since delivery status notifications MUST be sent to the MAIL FROM address, the null address ("<>") is often used to prevent looping of messages. This null address MAY be used to note that a particular message has no return path (e.g. a telephone answer
message). From [SMTP] 5.1.3 RCPT TO Recipient's mailbox. The parameter to this command contains only the address to which the message should be delivered for this transaction. It is the set of addresses in one or more RCPT TO commands that are used for mail routing. From [SMTP, ESMTP] Note: In the event that multiple transport connections to multiple destination machines are required for the same message, the set of addresses in a given transport connection may not match the list of recipients in the message header fields. 5.1.4 DATA Initiates the transfer of message data. Support for this command is required. Compliant implementations MUST implement the SMTP DATA command for backwards compatibility. From [SMTP] 5.1.5 TURN Requests a change-of-roles, that is, the client that opened the connection offers to assume the role of server for any mail the remote machine may wish to send. Because SMTP is not an authenticated protocol, the TURN command presents an opportunity to improperly fetch mail queued for another destination. Compliant implementations SHOULD NOT implement the TURN command. From [SMTP] 5.1.6 QUIT Requests that the connection be closed. If accepted, the remote machine will reset and close the connection. Compliant implementations MUST implement the QUIT command. From [SMTP] 5.1.7 RSET Resets the connection to its initial state. Compliant implementations MUST implement the RSET command. From [SMTP] 5.1.8 VRFY Requests verification that this node can reach the listed recipient. While this functionality is also included in the RCPT TO command, VRFY allows the query without beginning a mail transfer transaction. This command is useful for debugging and tracing problems. Compliant implementations MAY implement the VRFY command. From [SMTP] (Note that the implementation of VRFY may simplify the guessing of a
recipient's mailbox or automated sweeps for valid mailbox addresses, resulting in a possible reduction in privacy. Various implementation techniques may be used to reduce the threat, such as limiting the number of queries per session.) From [SMTP]

5.1.9 EHLO

The enhanced mail greeting that enables a server to announce support for extended messaging options. The extended messaging modes are discussed in subsequent sections of this document. Compliant implementations MUST implement the EHLO command and return the capabilities indicated later in this memo. From [ESMTP]

5.1.10 BDAT

The BDAT command provides a higher efficiency alternative to the earlier DATA command, especially for voice. The BDAT command provides for native binary transport of messages. Compliant implementations SHOULD support binary transport using the BDAT command [BINARY].

5.2 ESMTP Keywords

The following ESMTP keywords indicate extended features useful for voice messaging.

5.2.1 PIPELINING

The "PIPELINING" keyword indicates ability of the receiving server to accept new commands before issuing a response to the previous command. Pipelining commands dramatically improves performance by reducing the number of round-trip packet exchanges and makes it possible to validate all recipient addresses in one operation. Compliant implementations SHOULD support the command pipelining indicated by this keyword. From [PIPE]

5.2.2 SIZE

The "SIZE" keyword provides a mechanism by which the SMTP server can indicate the maximum size message supported. Compliant servers MUST provide the size extension to indicate the maximum size message that can be accepted. Clients SHOULD NOT send messages larger than the size indicated by the server. Clients SHOULD declare the message size with the SIZE= parameter of the MAIL FROM command when sending messages to servers that indicate support for the SIZE extension. From [SIZE]
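A minimal sketch of the client side of the SIZE extension follows: it parses the keywords from an EHLO reply, refuses to send a message larger than the advertised limit, and declares the size on MAIL FROM. The function names, the sample EHLO reply, and the addresses are hypothetical. Treating a limit of 0 (or a missing value) as "no fixed maximum advertised" follows the SIZE extension [SIZE] rather than anything stated in this profile.

```python
# Hypothetical EHLO reply from a VPIM server.
EHLO_REPLY = "\r\n".join([
    "250-vm2.example.com",
    "250-PIPELINING",
    "250-SIZE 10485760",
    "250-DSN",
    "250 ENHANCEDSTATUSCODES",
]) + "\r\n"


def parse_ehlo(reply):
    """Map each ESMTP keyword in an EHLO reply to its parameter string."""
    features = {}
    # The first line is the greeting; keywords start on the second line.
    for line in reply.splitlines()[1:]:
        text = line[4:].strip()   # drop the "250-" or "250 " prefix
        if text:
            keyword, _, params = text.partition(" ")
            features[keyword.upper()] = params
    return features


def mail_from_command(sender, message_size, features):
    """Build MAIL FROM, declaring SIZE= only when the server offers SIZE."""
    if "SIZE" not in features:
        return "MAIL FROM:<%s>" % sender
    limit = features["SIZE"]
    # A limit of 0 (or no value) means no fixed maximum is advertised.
    if limit.isdigit() and 0 < int(limit) < message_size:
        raise ValueError("message exceeds the server's advertised limit")
    return "MAIL FROM:<%s> SIZE=%d" % (sender, message_size)


features = parse_ehlo(EHLO_REPLY)
print(mail_from_command("2145551212@vm1.example.com", 2000000, features))
```

Checking the limit before opening the data transfer matters for multi-minute voice messages, where a late rejection wastes a large upload.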
5.2.3 CHUNKING

The "CHUNKING" keyword indicates that the receiver will support the high-performance binary transport mode. Note that CHUNKING can be used with any message format and does not imply support for binary encoded messages. Compliant implementations MAY support binary transport indicated by this capability. From [BINARY]

5.2.4 BINARYMIME

The "BINARYMIME" keyword indicates that the SMTP server can accept binary encoded MIME messages. Compliant implementations MAY support binary transport indicated by this capability. Note that support for this feature requires support of CHUNKING. From [BINARY]

5.2.5 DSN

The "DSN" keyword indicates that the SMTP server will accept explicit delivery status notification requests. Compliant implementations MUST support the delivery notification extensions in [DRPT].

5.2.6 ENHANCEDSTATUSCODES

The "ENHANCEDSTATUSCODES" keyword indicates that an SMTP server augments its responses with the enhanced mail system status codes [CODES]. These codes can then be used to provide more informative explanations of error conditions, especially in the context of the delivery status notifications format defined in [DSN]. Compliant implementations SHOULD support this capability. From [STATUS]

5.3 ESMTP Parameters - MAIL FROM

5.3.1 BINARYMIME

The current message is a binary encoded MIME message. Compliant implementations SHOULD support binary transport indicated by this parameter. From [BINARY]

5.3.2 RET

The RET parameter indicates whether the content of the message should be returned. Compliant systems SHOULD honor a request for returned content. From [DRPT]
5.3.3 ENVID The ENVID keyword of the SMTP MAIL command is used to specify an "envelope identifier" to be transmitted along with the message and included in any DSNs issued for any of the recipients named in this SMTP transaction. The purpose of the envelope identifier is to allow the sender of a message to identify the transaction for which the DSN was issued. Compliant implementations MAY use this parameter. From [DRPT] 5.4 ESMTP Parameters - RCPT TO 5.4.1 NOTIFY The NOTIFY parameter indicates the conditions under which a delivery report should be sent. Compliant implementations MUST honor this request. From [DRPT] 5.4.2 ORCPT The ORCPT keyword of the RCPT command is used to specify an "original" recipient address that corresponds to the actual recipient to which the message is to be delivered. If the ORCPT esmtp-keyword is used, it MUST have an associated esmtp-value, which consists of the original recipient address, encoded according to the rules below. Compliant implementations MAY use this parameter. From [DRPT] 5.5 ESMTP - SMTP Downgrading The ESMTP extensions suggested or required for conformance to VPIM fall into two categories. The first category includes features which increase the efficiency of the transport system such as SIZE, BINARYMIME, and PIPELINING. In the event of a downgrade to a less functional transport system, these features can be dropped with no functional change to the sender or recipient. The second category of features are transport extensions in support of new functions. DSN and EnhancedStatusCodes provide essential improvements in the handling of delivery status notifications to bring email to the level of reliability expected of Voice Mail. To ensure a consistent level of service across an intranet or the global Internet, it is essential that VPIM compliant ESMTP support the ESMTP DSN extension at all hops between a VPIM originating system and the recipient system. 
In the situation where a `downgrade' is unavoidable a relay hop may be forced (by the next hop) to forward a VPIM message without the ESMTP request for positive delivery status notification. It is RECOMMENDED that the downgrading system should continue to attempt to deliver the message, but MUST send an appropriate delivery
notification to the originator, e.g. the message left an ESMTP host and was sent (unreliably) via SMTP.
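The two categories of extensions described in 5.5 suggest a simple decision procedure for a relay facing a less capable next hop. The sketch below is one possible reading, not a prescribed algorithm: the function name plan_relay and the result keys are hypothetical, and reporting the downgrade with the "relayed" action value from the DSN format [DSN] is an assumption about what an appropriate delivery notification would contain.

```python
# First category in 5.5: efficiency features that can be dropped silently.
EFFICIENCY = {"SIZE", "BINARYMIME", "PIPELINING"}
# Second category in 5.5: extensions that support new functions.
FUNCTIONAL = {"DSN", "ENHANCEDSTATUSCODES"}


def plan_relay(next_hop_keywords):
    """Decide how to forward a VPIM message given the next hop's keywords."""
    offered = {keyword.upper() for keyword in next_hop_keywords}
    must_notify = "DSN" not in offered
    return {
        # Dropping these changes nothing for the sender or the recipient.
        "drop_silently": sorted(EFFICIENCY - offered),
        # Losing these reduces the reliability of notifications.
        "lost_functions": sorted(FUNCTIONAL - offered),
        # Delivery is still attempted, but the originator is told.
        "notify_originator": must_notify,
        # Assumed DSN action value for the downgrade report.
        "dsn_action": "relayed" if must_notify else None,
    }


# A plain SMTP hop (no keywords) versus a fully capable ESMTP hop.
print(plan_relay([]))
print(plan_relay(["SIZE", "PIPELINING", "BINARYMIME", "DSN",
                  "ENHANCEDSTATUSCODES"]))
```

Keeping the two sets separate mirrors the distinction drawn in the text: only the absence of DSN support triggers a notification to the originator, while missing efficiency features never do.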