Network Working Group                                        N. Charlton
Request for Comments: 3351                                      Millpark
Category: Informational                                        M. Gasson
                                                          Koru Solutions
                                                               G. Gybels
                                                              M. Spanner
                                                                    RNID
                                                             A. van Wijk
                                                                Ericsson
                                                             August 2002

      User Requirements for the Session Initiation Protocol (SIP)
   in Support of Deaf, Hard of Hearing and Speech-impaired Individuals

Status of this Memo

This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (2002). All Rights Reserved.

Abstract
This document presents a set of Session Initiation Protocol (SIP) user requirements that support communications for deaf, hard of hearing and speech-impaired individuals. These user requirements address the current difficulties of deaf, hard of hearing and speech-impaired individuals in using communications facilities, while acknowledging the multi-functional potential of SIP-based communications. A number of issues related to these user requirements are further raised in this document. Also included are some real world scenarios and some technical requirements to show the robustness of these requirements on a concept-level.
Table of Contents
1. Terminology and Conventions Used in this Document
2. Introduction
3. Purpose and Scope
4. Background
5. Deaf, Hard of Hearing and Speech-impaired Requirements for SIP
   5.1 Connection without Difficulty
   5.2 User Profile
   5.3 Intelligent Gateways
   5.4 Inclusive Design
   5.5 Resource Management
   5.6 Confidentiality and Security
6. Some Real World Scenarios
   6.1 Transcoding Service
   6.2 Media Service Provider
   6.3 Sign Language Interface
   6.4 Synthetic Lip-speaking Support for Voice Calls
   6.5 Voice-Activated Menu Systems
   6.6 Conference Call
7. Some Suggestions for Service Providers and User Agent Manufacturers
8. Acknowledgements
Security Considerations
Normative References
Informational References
Authors' Addresses
Full Copyright Statement

1. Terminology and Conventions Used in this Document
In this document, the key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as described in BCP 14, RFC 2119 [1] and indicate requirement levels for compliant SIP implementations.

For the purposes of this document, the following terms are considered to have these meanings:

Abilities: A person's capacity for communicating which could include a hearing or speech impairment or not.

Preferences: A person's choice of communication mode. This could include any combination of media streams, e.g., text, audio, video.

The terms Abilities and Preferences apply to both caller and call-recipient.

Relay Service: A third-party or intermediary that enables communications between deaf, hard of hearing and speech-impaired people, and people without hearing or speech-impairment. Relay Services form a subset of the activities of Transcoding Services (see definition).

Transcoding Services: A human or automated third party acting as an intermediary in any session between two other User Agents (being a User Agent itself), and transcoding one stream into another (e.g., voice to text or vice versa).

Textphone: Sometimes called a TTY (teletypewriter), TDD (telecommunications device for the deaf) or a minicom, a textphone enables a deaf, hard of hearing or speech-impaired person to place a call to a telephone or another textphone. Some textphones use the V.18 [3] protocol as a standard for communication with other textphone communication protocols world-wide.

User: A deaf, hard of hearing or speech-impaired individual. A user is otherwise referred to as a person or individual, and users are referred to as people.

Note: For the purposes of this document, a deaf, hard of hearing, or speech-impaired person is an individual who chooses to use SIP because it can minimize or eliminate constraints in using common communication devices. As SIP promises a total communication solution for any kind of person, regardless of ability and preference, there is no attempt to specifically define deaf, hard of hearing or speech-impaired in this document.

2. Introduction
The background for this document is the recent development of SIP[2] and SIP-based communications, and a growing awareness of deaf, hard of hearing and speech-impaired issues in the technical community. The SIP capacity to simplify setting up, managing and tearing down communication sessions between all kinds of User Agents has specific implications for deaf, hard of hearing and speech-impaired individuals.
As SIP enables multiple sessions with translation between multiple types of media, these requirements aim to provide the standard for recognizing and enabling these interactions, and for a communications model that includes any and all types of SIP-networking abilities and preferences.

3. Purpose and Scope
The scope of this document is firstly to present a current set of user requirements for deaf, hard of hearing and speech-impaired individuals through SIP-enabled communications. These are then followed by some real world scenarios in SIP-communications that could be used in a test environment, and some concepts of how these requirements can be developed by service providers and User Agent manufacturers.

These recommendations make explicit the needs of a currently often disadvantaged user-group and attempt to match them with the capacity of SIP. It is not the intention here to prioritize the needs of deaf, hard of hearing and speech-impaired people in a way that would penalize other individuals.

These requirements aim to encourage developers and manufacturers world-wide to consider the specific needs of deaf, hard of hearing and speech-impaired individuals. This document presents a world-vision where being deaf, hard of hearing or speech-impaired is no longer a barrier to communication.

4. Background
Deaf, hard of hearing and speech-impaired people are currently often unable to use commonly available communication devices. Although this is documented [4], developers and manufacturers are not always aware of it. Communication devices for deaf, hard of hearing and speech-impaired people are currently often primitive in design, expensive, and incompatible with progressively designed, cheaper and more adaptable communication devices for other individuals. For example, many models of textphone are unable to communicate with other models. Additionally, non-technical human communications, for example sign languages or lip-reading, are non-standard around the world.
There are intermediary or third-party relay services (e.g. transcoding services) that facilitate communications, uni- or bi-directional, for deaf, hard of hearing and speech-impaired people. Currently relay services are mostly operator-assisted (manual), although methods of partial automation are being implemented in some areas. These services enable full access to modern facilities and conveniences for deaf, hard of hearing and speech-impaired people. Although these services are somewhat limited, their value is undeniable as compared to their previous complete unavailability.

Yet communication methods in recent decades have proliferated: email, mobile phones, video streaming, etc. These methods are an advance in the development of data transfer technologies between devices.

Developers and advocates of SIP agree that it is a protocol that not only anticipates the growth in real-time communications between convergent networks, but also fulfills the potential of the Internet as a communications and information forum. Further, they agree that these developments allow a standard of communication that can be applied throughout all networking communities, regardless of abilities and preferences.

5. Deaf, Hard of Hearing and Speech-impaired Requirements for SIP
Introduction

The user requirements in this section are provided for the benefit of service providers, User Agent manufacturers and any other interested parties in the development of products and services for deaf, hard of hearing and speech-impaired people.

The user requirements are as follows:

5.1 Connection without Difficulty
This requirement states: Whatever the preferences and abilities of the user and User Agent, there SHOULD be no difficulty in setting up SIP sessions. These sessions could include multiple proxies, call routing decisions, transcoding services, e.g., the relay service Typetalk[5] or other media processing, and could include multiple simultaneous or alternative media streams.
This means that any User Agent in the conversation (including transcoding services) MUST be able to add or remove a media stream from the call without having to tear it down and re-establish it.

5.2 User Profile
This requirement states:

Deaf, hard of hearing and speech-impaired user abilities and preferences (i.e., user profile) MUST be communicable by SIP, and these abilities and preferences MUST determine the handling of the session.

The User Profile for a deaf, hard of hearing or speech-impaired person might include details about:

- How media streams are received and transmitted (text, voice, video, or any combination, uni- or bi-directional).

- Redirecting specific media streams through a transcoding service (e.g., the relay service Typetalk).

- Roaming (e.g., a deaf person accessing their User Profile from a web-interface at an Internet cafe).

- Anonymity: i.e., not revealing that a deaf person is calling, even through a transcoding service (e.g., some relay services inform the call-recipient that there is an incoming text call without saying that a deaf person is calling).

Part of this requirement is to ensure that deaf, hard of hearing and speech-impaired people can keep their preferences and abilities confidential from others, to avoid possible discrimination or prejudice, while still being able to establish a SIP session.

5.3 Intelligent Gateways
This requirement states: SIP SHOULD support a class of User Agents to perform as gateways for legacy systems designed for deaf, hard of hearing and speech-impaired people. For example, an individual could have a SIP User Agent acting as a gateway to a PSTN legacy textphone.
5.4 Inclusive Design
This requirement states:

Where applicable, design concepts for communications (devices, applications, etc.) MUST include the abilities and preferences of deaf, hard of hearing and speech-impaired people.

Transcoding services and User Agents MUST be able to connect with each other regardless of the provider or manufacturer. This means that new User Agents MUST be able to support legacy protocols through appropriate gateways.

5.5 Resource Management
This requirement states:

User Agents SHOULD be able to identify the content of a media stream in order to obtain such information as the cost of the media stream, whether a transcoding service can support it, etc.

User Agents SHOULD be able to choose among transcoding services and similar services based on their capabilities (e.g., whether a transcoding service carries a particular media stream), and any policy constraints they impose (e.g., charging for use). It SHOULD be possible for User Agents to discover the availability of alternative media streams and to choose from them.

5.6 Confidentiality and Security
This requirement states:

All third parties or intermediaries (transcoding services) employed in a session for deaf, hard of hearing and speech-impaired people MUST offer a confidentiality policy. All information exchanged in this type of session SHOULD be secure, that is, erased before confidentiality is breached, unless otherwise required.

This means that transcoding services (e.g., interpretation, translation) MUST publish their confidentiality and security policies.
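Taken together with the selection criteria of Section 5.5, the requirement above can be pictured as a filter that a User Agent applies before engaging a transcoding service. The following is a minimal sketch under stated assumptions, not a SIP mechanism: the TranscodingService record, its fields, the example services and their prices are all hypothetical and are not defined by this document or by SIP.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TranscodingService:
    """Hypothetical description of a transcoding service (illustration only)."""
    name: str
    conversions: frozenset              # supported (input, output) media pairs
    cost_per_minute: float              # assumed charging policy
    confidentiality_policy: Optional[str]  # published policy text, or None


def choose_service(services, source, target, max_cost_per_minute):
    """Pick the cheapest service that carries the stream, fits the budget,
    and publishes a confidentiality policy; return None if none qualifies."""
    candidates = [
        s for s in services
        if (source, target) in s.conversions
        and s.confidentiality_policy is not None
        and s.cost_per_minute <= max_cost_per_minute
    ]
    return min(candidates, key=lambda s: s.cost_per_minute, default=None)


# Made-up example data; none of these services exist.
EXAMPLE_SERVICES = [
    TranscodingService("relay-a", frozenset({("voice", "text")}), 0.30,
                       "policy published"),
    TranscodingService("relay-b", frozenset({("voice", "text")}), 0.10,
                       None),  # cheapest, but publishes no policy
    TranscodingService("relay-c",
                       frozenset({("voice", "text"), ("text", "voice")}),
                       0.20, "policy published"),
]
```

With this example data, choose_service(EXAMPLE_SERVICES, "voice", "text", 0.50) selects "relay-c": "relay-b" is cheaper but is excluded because it publishes no confidentiality policy.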
6. Some Real World Scenarios
These scenarios are intended to show some of the various types of media streams that would be initiated, managed, directed, and terminated in a SIP-enabled network, and show how some resources might be managed between SIP-enabled networks, transcoding services and service providers.

To illustrate the communications dynamic of these kinds of scenarios, each one specifically mentions the kind of media streams transmitted, and whether User Agents and Transcoding Services are involved.

6.1 Transcoding Service
In this scenario, a hearing person calls the household of a deaf person and a hearing person.

1. A voice conversation is initiated between the hearing participants:

   ( Person A) <-----Voice ---> ( Person B)

2. During the conversation, the hearing person asks to talk with the deaf person, while keeping the voice connection open so that voice to voice communications can continue if required.

3. A Relay Service is invited into the conversation.

4. The Relay Service transcodes the hearing person's words into text.

5. Text from the hearing person's voice appears on the display of the deaf person's User Agent.

6. The deaf person types a response.

7. The Relay Service receives the text and reads it to the hearing person:

   (         ) <------------------Voice----------------> (         )
   (Person A ) -----Voice---> ( Voice To Text  ) -Text-> (Person B )
   (         ) <----Voice---- (Service Provider) <-Text- (         )

8. The hearing person asks to talk with the hearing person in the deaf person's household.

9. The Relay Service withdraws from the call.
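The steps above rely on the Section 5.1 requirement that streams can be added to and removed from a call without tearing it down. The following is a minimal sketch of that behaviour as a plain data model; it does not generate SIP messages, and the Stream and Session classes, the participant labels "A", "B" and "relay", and the scenario_6_1 function are all hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Stream:
    """One directed media stream between two parties (illustrative only)."""
    sender: str
    receiver: str
    medium: str  # "voice" or "text"


@dataclass
class Session:
    """A session that stays established while its set of streams changes."""
    participants: set = field(default_factory=set)
    streams: set = field(default_factory=set)
    established: bool = True

    def invite(self, party, new_streams):
        # Adding a party and its streams leaves existing streams untouched.
        self.participants.add(party)
        self.streams |= set(new_streams)

    def withdraw(self, party):
        # Remove only the streams that start or end at the withdrawing party.
        self.participants.discard(party)
        self.streams = {s for s in self.streams
                        if party not in (s.sender, s.receiver)}


def scenario_6_1():
    """Replay steps 1, 3 and 9 of the scenario on the model above."""
    voice = {Stream("A", "B", "voice"), Stream("B", "A", "voice")}
    session = Session(participants={"A", "B"}, streams=set(voice))  # step 1
    relay_streams = [
        Stream("A", "relay", "voice"),
        Stream("relay", "B", "text"),
        Stream("B", "relay", "text"),
        Stream("relay", "A", "voice"),
    ]
    session.invite("relay", relay_streams)   # step 3
    during = set(session.streams)            # streams while the relay is present
    session.withdraw("relay")                # step 9
    return voice, during, session
```

In this model the original voice streams are present before, during and after the Relay Service's participation, which is the property the scenario depends on.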
6.2 Media Service Provider
In this scenario, a deaf person wishes to receive the content of a radio program through a text stream transcoded from the program's audio stream.

1. The deaf person attempts to establish a connection to the radio broadcast, with User Agent preferences set to receiving audio stream as text.

2. The User Agent of the deaf person queries the radio station User Agent on whether a text stream is available in addition to the audio stream.

3. However, the radio station has no text stream available for a deaf listener, and responds in the negative.

4. As no text stream is available, the deaf person's User Agent requests a voice-to-text transcoding service (e.g., a real-time captioning service) to come into the conversation space.

5. The transcoding service User Agent identifies the audio stream as a radio broadcast. However, the policy of the transcoding service is that it does not accept radio broadcasts because they would overload its resources far too quickly.

6. In this case, the connection fails.

Alternatively, continuing from 2 above:

3. The radio station does provide text with their audio streams.

4. The deaf person receives a text stream of the radio program.

Note: To support deaf, hard of hearing and speech-impaired people, service providers are encouraged to provide text with audio streams.

6.3 Sign Language Interface
In this scenario, a deaf person enables a signing avatar (e.g., ViSiCAST[6]) by setting up a User Agent to receive audio streams as XML data that will operate an avatar for sign-language. For outgoing communications, the deaf person types text that is transcoded into an audio stream for the other conversation participant.
For example:

(         )-Voice->(Voice To Avatar Commands)--XMLData-->(         )
( hearing )                                              (  deaf   )
( Person A)<-Voice-(     Text To Voice      )<---Text----(Person B )
(         )        (    Service Provider    )            (         )

6.4 Synthetic Lip-speaking Support for Voice Calls
In order to receive voice calls, a hard of hearing person uses lip-speaking avatar software (e.g., Synface [7]) on a PC. The lip-speaking software processes voice (audio) stream data and displays a synthetic animated face that a hard of hearing person may be able to lip-read. During a conversation, the hard of hearing person uses the lip-speaking software as support for understanding the audio stream.

For example:

(         ) <------------------Voice-------------->(          )
( hearing )                    (  PC with    )     ( hard of  )
( Person A) -------Voice-----> ( lip-speaking)---->( hearing  )
(         )                    (  software   )     ( Person B )

6.5 Voice-Activated Menu Systems
In this scenario, a deaf person wishing to book cinema tickets with a credit card uses a textphone to place the call. The cinema employs a voice-activated menu system for film titles and showing times.

1. The deaf person places a call to the cinema with a textphone:

   (Textphone) <-----Text ---> (Voice-activated System)

2. The cinema's voice-activated menu requests an auditory response to continue.

3. A Relay Service is invited into the conversation.

4. The Relay Service transcodes the prompts of the voice-activated menu into text.

5. Text from the voice-activated menu appears on the display of the deaf person's textphone.

6. The deaf person types a response.

7. The Relay Service receives the text and reads it to the voice-activated system:

   (          )         (Relay Service )          (               )
   ( deaf     ) -Text-> (Provider      ) -Voice-> (Voice-Activated)
   ( Person A ) <-Text- (Text To Voice ) <-Voice- (System         )

8. The transaction is finalized with a confirmed booking time.

9. The Relay Service withdraws from the call.

6.6 Conference Call
A conference call is scheduled between five people:

- Person A listens and types text (hearing, no speech)
- Person B recognizes sign language and signs back (deaf, no speech)
- Person C reads text and speaks (deaf or hearing impaired)
- Person D listens and speaks
- Person E recognizes sign language and reads text and signs

A conference call server calls the five people and based on their preferences sets up the different transcoding services required. Assuming English is the base language for the call, the following intermediate transcoding services are invoked:

- A transcoding service (English speech to English text)
- An English text to sign language service
- A sign language to English text service
- An English text to English speech service

Note: In order to translate from English speech to sign language, a chain of intermediate transcoding services was used (transcoding and English text to sign language) because there was no speech-to-sign language available for direct translation. Accordingly, the same applied for the translation from sign language to English speech.
(Person A) ----- Text ---->  ( Text-to-SL )  --- Video ----> (Person B)
           ---------------------- Text --------------------> (Person C)
           ----- Text ----> (Text-to-Speech) --- Voice ----> (Person D)
           ---------------------- Text --------------------> (Person E)
           ----- Text ---->  ( Text-to-SL )  --- Video ----> (Person E)

(Person B) -Video-> (SL-to-Text) -Text-> (Text-to-Speech) -> (Person A)
           ---- Video ---->  ( SL-to-Text )  ---- Text ----> (Person C)
           -Video-> (SL-to-Text) -Text-> (Text-to-Speech) -> (Person D)
           --------------------- Video --------------------> (Person E)
           ---- Video ---->  ( SL-to-Text )  ---- Text ----> (Person E)

(Person C) --------------------- Voice --------------------> (Person A)
           Voice->(Speech-to-Text)-Text->(Text-to-SL)-Video->(Person B)
           --------------------- Voice --------------------> (Person D)
           ---- Voice ----> (Speech-to-Text) ---- Text ----> (Person E)
           Voice->(Speech-to-Text)-Text->(Text-to-SL)-Video->(Person E)

(Person D) --------------------- Voice --------------------> (Person A)
           Voice->(Speech-to-Text)-Text->(Text-to-SL)-Video->(Person B)
           ---- Voice ----> (Speech-to-Text) ---- Text ----> (Person C)
           ---- Voice ----> (Speech-to-Text) ---- Text ----> (Person E)
           Voice->(Speech-to-Text)-Text->(Text-to-SL)-Video->(Person E)

(Person E) -Video-> (SL-to-Text) -Text-> (Text-to-Speech) -> (Person A)
           --------------------- Video --------------------> (Person B)
           ---- Video ---->  ( SL-to-Text )  ---- Text ----> (Person C)
           -Video-> (SL-to-Text) -Text-> (Text-to-Speech) -> (Person D)

Remarks:

- Some services might be shared by users and/or other services.

- Person E uses two parallel streams (SL and English Text). The User Agent might perform time synchronisation when displaying the streams. However, this would require synchronisation information to be present on the streams.

- The session protocols might support optional buffering of media streams, so that users and/or intermediate services could go back to previous content or invoke a transcoding service for content they just missed.

- Hearing impaired users might still receive audio as well, which they will use to drive some visual indicators so that they can better see where, for instance, the pauses are in the conversation.
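The chaining of transcoding services described in the note of this scenario can be sketched as a shortest-path search over the available conversions. This is only an illustration of how a conference server might work out which services to invoke; the service names are taken from the diagram above, while the medium labels ("speech", "text", "sign"), the SERVICES table and the find_chain function are assumptions made for this sketch and are not part of SIP.

```python
from collections import deque

# The four transcoding services of the scenario as (input, output) pairs.
# The medium labels are illustrative; "sign" stands for sign-language video.
SERVICES = {
    "Speech-to-Text": ("speech", "text"),
    "Text-to-SL": ("text", "sign"),
    "SL-to-Text": ("sign", "text"),
    "Text-to-Speech": ("text", "speech"),
}


def find_chain(source, target, services=SERVICES):
    """Return the shortest list of service names converting source to target.

    Returns [] when no conversion is needed and None when no chain exists.
    """
    if source == target:
        return []
    queue = deque([(source, [])])   # breadth-first search over media types
    seen = {source}
    while queue:
        medium, chain = queue.popleft()
        for name, (src, dst) in services.items():
            if src != medium or dst in seen:
                continue
            if dst == target:
                return chain + [name]
            seen.add(dst)
            queue.append((dst, chain + [name]))
    return None
```

For example, find_chain("speech", "sign") yields the two-step chain Speech-to-Text followed by Text-to-SL, matching the path from Person D to Person B in the diagram, because no direct speech-to-sign service is listed.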
7. Some Suggestions for Service Providers and User Agent Manufacturers
This section is included to encourage service providers and user agent manufacturers in developing products and services that can be used by as wide a range of individuals as possible, including deaf, hard of hearing and speech-impaired people.

- Service providers and User Agent manufacturers can offer to a deaf, hard of hearing and speech-impaired person the possibility of being able to prevent their specific abilities and preferences from being made public in any transaction.

- If a User Agent performs auditory signalling, for example a pager, it could also provide another signalling method; visual (e.g., a flashing light) or tactile (e.g., vibration).

- Service providers who allow the user to store specific abilities and preferences or settings (i.e., a user profile) might consider storing these settings in a central repository, accessible no matter what the location of the user and regardless of the User Agent used at that time or location.

- If there are several transcoding services available, the User Agent can be set to select the most economical/highest quality service.

- The service provider can show the cost per minute and any minimum charge of a transcoding service call before a session starts, allowing the user a choice of engaging in the service or not.

- Service providers are encouraged to offer an alternative stream to an audio stream, for example, text or data streams that operate avatars, etc.

- Service providers are encouraged to provide a text alternative to voice-activated menus, e.g., answering and voice mail systems.

- Manufacturers of voice-activated software are encouraged to provide an alternative visual format for software prompts, menus, messages, and status information.

- Manufacturers of mobile phones are encouraged to design equipment that avoids electro-magnetic interference with hearing aids.

- All services for interpreting, transliterating, or facilitating communications for deaf, hard of hearing and speech-impaired people are required to:

  - Keep information exchanged during the transaction strictly confidential
  - Enable information exchange literally and simply, without deviating and compromising the content
  - Facilitate communication without bias, prejudice or opinion
  - Match skill-sets to the requirements of the users of the service
  - Behave in a professional and appropriate manner
  - Be fair in pricing of services
  - Strive to improve the skill-sets used for their services.

- Conference call services might consider ways to allow users who employ transcoding services (which usually introduce a delay) to have real-time information sufficient to be able to identify gaps in the conversation so they could inject comments, as well as ways to raise their hand, vote and carry out other activities where timing of their response relative to the real-time conversation is important.

8. Acknowledgements
The authors would like to thank the following individuals for their contributions to this document:

David R. Oran, Cisco
Mark Watson, Nortel Networks
Brian Grover, RNID
Anthony Rabin, RNID
Michael Hammer, Cisco
Henry Sinnreich, Worldcom
Rohan Mahy, Cisco
Julian Branston, Cedalion Hosting Services
Judy Harkins, Gallaudet University, Washington, D.C.
Cary Barbin, Gallaudet University, Washington, D.C.
Gregg Vanderheiden, Trace R&D Center, University of Wisconsin-Madison
Gottfried Zimmerman, Trace R&D Center, University of Wisconsin-Madison

Security Considerations
This document presents some privacy and security considerations. They are treated in Section 5.6 Confidentiality and Security.
Normative References
[1] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[2] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M. and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002.

Informational References
[3] International Telecommunication Union (ITU), "Operational and interworking requirements for DCEs operating in the text telephone mode", ITU-T Recommendation V.18, November 2000.

[4] Moore, Matthew, et al., "For Hearing People Only: Answers to Some of the Most Commonly Asked Questions About the Deaf Community, Its Culture, and the Deaf Reality", MSM Productions Ltd., 2nd Edition, September 1993.

[5] http://www.typetalk.org.

[6] http://www.visicast.co.uk.

[7] http://www.speech.kth.se/teleface.

Authors' Addresses
Nathan Charlton
Millpark Limited
52 Coborn Road
London E3 2DG

Tel: +44-7050 803628
Fax: +44-7050 803628
EMail: nathan@millpark.com

Mick Gasson
Koru Solutions
30 Howland Way
London SE16 6HN

Tel: +44-20 7237 3488
Fax: +44-20 7237 3488
EMail: michael.gasson@korusolutions.com

Guido Gybels
RNID
19-23 Featherstone Street
London EC1Y 8SL

Tel: +44-20 7296 8000
Textphone: +44-20 7296 8001
Fax: +44-20 7296 8199
EMail: Guido.Gybels@rnid.org.uk

Mike Spanner
RNID
19-23 Featherstone Street
London EC1Y 8SL

Tel: +44-20 7296 8000
Textphone: +44-20 7296 8001
Fax: +44-20 7296 8199
EMail: mike.spanner@rnid.org.uk

Arnoud van Wijk
Ericsson EuroLab Netherlands BV
P.O. Box 8
5120 AA Rijen
The Netherlands

Fax: +31-161-247569
EMail: Arnoud.van.Wijk@eln.ericsson.se

Comments can be sent to the SIPPING mailing list.
Full Copyright Statement

Copyright (C) The Internet Society (2002). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

Funding for the RFC Editor function is currently provided by the Internet Society.