4. Types of Security Procedures 4.1 System Security Audits Most businesses undergo some sort of annual financial auditing as a regular part of their business life. Security audits are an important part of running any computing environment. Part of the security audit should be a review of any policies that concern system security, as well as the mechanisms that are put in place to enforce them. 4.1.1 Organize Scheduled Drills Although not something that would be done each day or week, scheduled drills may be conducted to determine if the procedures defined are adequate for the threat to be countered. If your major threat is one of natural disaster, then a drill would be conducted to verify your backup and recovery mechanisms. On the other hand, if your greatest threat is from external intruders attempting to penetrate your system, a drill might be conducted to actually try a penetration to observe the effect of the policies. Drills are a valuable way to test that your policies and procedures are effective. On the other hand, drills can be time- consuming and disruptive to normal operations. It is important to weigh the benefits of the drills against the possible time loss which may be associated with them. 4.1.2 Test Procedures If the choice is made to not to use scheduled drills to examine your entire security procedure at one time, it is important to test individual procedures frequently. Examine your backup procedure to make sure you can recover data from the tapes. Check log files to be sure that information which is supposed to be logged to them is being logged to them, etc.. When a security audit is mandated, great care should be used in devising tests of the security policy. It is important to clearly identify what is being tested, how the test will be conducted, and results expected from the test. This should all be documented and included in or as an adjunct to the security policy document itself. It is important to test all aspects of the security policy, both procedural and automated, with a particular emphasis on the automated mechanisms used to enforce the policy. Tests should be defined to ensure a comprehensive examination of policy features,
that is, if a test is defined to examine the user logon process, it should be explicitly stated that both valid and invalid user names and passwords will be used to demonstrate proper operation of the logon program. Keep in mind that there is a limit to the reasonableness of tests. The purpose of testing is to ensure confidence that the security policy is being correctly enforced, and not to "prove" the absoluteness of the system or policy. The goal should be to obtain some assurance that the reasonable and credible controls imposed by your security policy are adequate. 4.2 Account Management Procedures Procedures to manage accounts are important in preventing unauthorized access to your system. It is necessary to decide several things: Who may have an account on the system? How long may someone have an account without renewing his or her request? How do old accounts get removed from the system? The answers to all these questions should be explicitly set out in the policy. In addition to deciding who may use a system, it may be important to determine what each user may use the system for (is personal use allowed, for example). If you are connected to an outside network, your site or the network management may have rules about what the network may be used for. Therefore, it is important for any security policy to define an adequate account management procedure for both administrators and users. Typically, the system administrator would be responsible for creating and deleting user accounts and generally maintaining overall control of system use. To some degree, account management is also the responsibility of each system user in the sense that the user should observe any system messages and events that may be indicative of a policy violation. For example, a message at logon that indicates the date and time of the last logon should be reported by the user if it indicates an unreasonable time of last logon. 4.3 Password Management Procedures A policy on password management may be important if your site wishes to enforce secure passwords. These procedures may range from asking or forcing users to change their passwords occasionally to actively attempting to break users' passwords and then informing the user of how easy it was to do. Another part of password management policy covers who may distribute passwords - can users give their passwords to other users? Section 2.3 discusses some of the policy issues that need to be
decided for proper password management. Regardless of the policies, password management procedures need to be carefully setup to avoid disclosing passwords. The choice of initial passwords for accounts is critical. In some cases, users may never login to activate an account; thus, the choice of the initial password should not be easily guessed. Default passwords should never be assigned to accounts: always create new passwords for each user. If there are any printed lists of passwords, these should be kept off-line in secure locations; better yet, don't list passwords. 4.3.1 Password Selection Perhaps the most vulnerable part of any computer system is the account password. Any computer system, no matter how secure it is from network or dial-up attack, Trojan horse programs, and so on, can be fully exploited by an intruder if he or she can gain access via a poorly chosen password. It is important to define a good set of rules for password selection, and distribute these rules to all users. If possible, the software which sets user passwords should be modified to enforce as many of the rules as possible. A sample set of guidelines for password selection is shown below: - DON'T use your login name in any form (as-is, reversed, capitalized, doubled, etc.). - DON'T use your first, middle, or last name in any form. - DON'T use your spouse's or child's name. - DON'T use other information easily obtained about you. This includes license plate numbers, telephone numbers, social security numbers, the make of your automobile, the name of the street you live on, etc.. - DON'T use a password of all digits, or all the same letter. - DON'T use a word contained in English or foreign language dictionaries, spelling lists, or other lists of words. - DON'T use a password shorter than six characters. - DO use a password with mixed-case alphabetics. - DO use a password with non-alphabetic characters (digits or punctuation).
- DO use a password that is easy to remember, so you don't have to write it down. - DO use a password that you can type quickly, without having to look at the keyboard. Methods of selecting a password which adheres to these guidelines include: - Choose a line or two from a song or poem, and use the first letter of each word. - Alternate between one consonant and one or two vowels, up to seven or eight characters. This provides nonsense words which are usually pronounceable, and thus easily remembered. - Choose two short words and concatenate them together with a punctuation character between them. Users should also be told to change their password periodically, usually every three to six months. This makes sure that an intruder who has guessed a password will eventually lose access, as well as invalidating any list of passwords he/she may have obtained. Many systems enable the system administrator to force users to change their passwords after an expiration period; this software should be enabled if your system supports it [5, CURRY]. Some systems provide software which forces users to change their passwords on a regular basis. Many of these systems also include password generators which provide the user with a set of passwords to choose from. The user is not permitted to make up his or her own password. There are arguments both for and against systems such as these. On the one hand, by using generated passwords, users are prevented from selecting insecure passwords. On the other hand, unless the generator is good at making up easy to remember passwords, users will begin writing them down in order to remember them. 4.3.2 Procedures for Changing Passwords How password changes are handled is important to keeping passwords secure. Ideally, users should be able to change their own passwords on-line. (Note that password changing programs are a favorite target of intruders. See section 4.4 on configuration management for further information.) However, there are exception cases which must be handled
carefully. Users may forget passwords and not be able to get onto the system. The standard procedure is to assign the user a new password. Care should be taken to make sure that the real person is requesting the change and gets the new password. One common trick used by intruders is to call or message to a system administrator and request a new password. Some external form of verification should be used before the password is assigned. At some sites, users are required to show up in person with ID. There may also be times when many passwords need to be changed. If a system is compromised by an intruder, the intruder may be able to steal a password file and take it off the system. Under these circumstances, one course of action is to change all passwords on the system. Your site should have procedures for how this can be done quickly and efficiently. What course you choose may depend on the urgency of the problem. In the case of a known attack with damage, you may choose to forcibly disable all accounts and assign users new passwords before they come back onto the system. In some places, users are sent a message telling them that they should change their passwords, perhaps within a certain time period. If the password isn't changed before the time period expires, the account is locked. Users should be aware of what the standard procedure is for passwords when a security event has occurred. One well-known spoof reported by the Computer Emergency Response Team (CERT) involved messages sent to users, supposedly from local system administrators, requesting them to immediately change their password to a new value provided in the message [24]. These messages were not from the administrators, but from intruders trying to steal accounts. Users should be warned to immediately report any suspicious requests such as this to site administrators. 4.4 Configuration Management Procedures Configuration management is generally applied to the software development process. However, it is certainly applicable in a operational sense as well. Consider that the since many of the system level programs are intended to enforce the security policy, it is important that these be "known" as correct. That is, one should not allow system level programs (such as the operating system, etc.) to be changed arbitrarily. At very least, the procedures should state who is authorized to make changes to systems, under what circumstances, and how the changes should be documented. In some environments, configuration management is also desirable as applied to physical configuration of equipment. Maintaining valid
and authorized hardware configuration should be given due consideration in your security policy. 4.4.1 Non-Standard Configurations Occasionally, it may be beneficial to have a slightly non-standard configuration in order to thwart the "standard" attacks used by some intruders. The non-standard parts of the configuration might include different password encryption algorithms, different configuration file locations, and rewritten or functionally limited system commands. Non-standard configurations, however, also have their drawbacks. By changing the "standard" system, these modifications make software maintenance more difficult by requiring extra documentation to be written, software modification after operating system upgrades, and, usually, someone with special knowledge of the changes. Because of the drawbacks of non-standard configurations, they are often only used in environments with a "firewall" machine (see section 3.9.1). The firewall machine is modified in non-standard ways since it is susceptible to attack, while internal systems behind the firewall are left in their standard configurations. 5. Incident Handling 5.1 Overview This section of the document will supply some guidance to be applied when a computer security event is in progress on a machine, network, site, or multi-site environment. The operative philosophy in the event of a breach of computer security, whether it be an external intruder attack or a disgruntled employee, is to plan for adverse events in advance. There is no substitute for creating contingency plans for the types of events described above. Traditional computer security, while quite important in the overall site security plan, usually falls heavily on protecting systems from attack, and perhaps monitoring systems to detect attacks. Little attention is usually paid for how to actually handle the attack when it occurs. The result is that when an attack is in progress, many decisions are made in haste and can be damaging to tracking down the source of the incident, collecting evidence to be used in prosecution efforts, preparing for the recovery of the system, and protecting the valuable data contained on the system.
5.1.1 Have a Plan to Follow in Case of an Incident Part of handling an incident is being prepared to respond before the incident occurs. This includes establishing a suitable level of protections, so that if the incident becomes severe, the damage which can occur is limited. Protection includes preparing incident handling guidelines or a contingency response plan for your organization or site. Having written plans eliminates much of the ambiguity which occurs during an incident, and will lead to a more appropriate and thorough set of responses. Second, part of protection is preparing a method of notification, so you will know who to call and the relevant phone numbers. It is important, for example, to conduct "dry runs," in which your computer security personnel, system administrators, and managers simulate handling an incident. Learning to respond efficiently to an incident is important for numerous reasons. The most important benefit is directly to human beings--preventing loss of human life. Some computing systems are life critical systems, systems on which human life depends (e.g., by controlling some aspect of life-support in a hospital or assisting air traffic controllers). An important but often overlooked benefit is an economic one. Having both technical and managerial personnel respond to an incident requires considerable resources, resources which could be utilized more profitably if an incident did not require their services. If these personnel are trained to handle an incident efficiently, less of their time is required to deal with that incident. A third benefit is protecting classified, sensitive, or proprietary information. One of the major dangers of a computer security incident is that information may be irrecoverable. Efficient incident handling minimizes this danger. When classified information is involved, other government regulations may apply and must be integrated into any plan for incident handling. A fourth benefit is related to public relations. News about computer security incidents tends to be damaging to an organization's stature among current or potential clients. Efficient incident handling minimizes the potential for negative exposure. A final benefit of efficient incident handling is related to legal issues. It is possible that in the near future organizations may be sued because one of their nodes was used to launch a network
attack. In a similar vein, people who develop patches or workarounds may be sued if the patches or workarounds are ineffective, resulting in damage to systems, or if the patches or workarounds themselves damage systems. Knowing about operating system vulnerabilities and patterns of attacks and then taking appropriate measures is critical to circumventing possible legal problems. 5.1.2 Order of Discussion in this Session Suggests an Order for a Plan This chapter is arranged such that a list may be generated from the Table of Contents to provide a starting point for creating a policy for handling ongoing incidents. The main points to be included in a policy for handling incidents are: o Overview (what are the goals and objectives in handling the incident). o Evaluation (how serious is the incident). o Notification (who should be notified about the incident). o Response (what should the response to the incident be). o Legal/Investigative (what are the legal and prosecutorial implications of the incident). o Documentation Logs (what records should be kept from before, during, and after the incident). Each of these points is important in an overall plan for handling incidents. The remainder of this chapter will detail the issues involved in each of these topics, and provide some guidance as to what should be included in a site policy for handling incidents. 5.1.3 Possible Goals and Incentives for Efficient Incident Handling As in any set of pre-planned procedures, attention must be placed on a set of goals to be obtained in handling an incident. These goals will be placed in order of importance depending on the site, but one such set of goals might be: Assure integrity of (life) critical systems. Maintain and restore data. Maintain and restore service. Figure out how it happened. Avoid escalation and further incidents. Avoid negative publicity. Find out who did it. Punish the attackers.
It is important to prioritize actions to be taken during an incident well in advance of the time an incident occurs. Sometimes an incident may be so complex that it is impossible to do everything at once to respond to it; priorities are essential. Although priorities will vary from institution-to-institution, the following suggested priorities serve as a starting point for defining an organization's response: o Priority one -- protect human life and people's safety; human life always has precedence over all other considerations. o Priority two -- protect classified and/or sensitive data (as regulated by your site or by government regulations). o Priority three -- protect other data, including proprietary, scientific, managerial and other data, because loss of data is costly in terms of resources. o Priority four -- prevent damage to systems (e.g., loss or alteration of system files, damage to disk drives, etc.); damage to systems can result in costly down time and recovery. o Priority five -- minimize disruption of computing resources; it is better in many cases to shut a system down or disconnect from a network than to risk damage to data or systems. An important implication for defining priorities is that once human life and national security considerations have been addressed, it is generally more important to save data than system software and hardware. Although it is undesirable to have any damage or loss during an incident, systems can be replaced; the loss or compromise of data (especially classified data), however, is usually not an acceptable outcome under any circumstances. Part of handling an incident is being prepared to respond before the incident occurs. This includes establishing a suitable level of protections so that if the incident becomes severe, the damage which can occur is limited. Protection includes preparing incident handling guidelines or a contingency response plan for your organization or site. Written plans eliminate much of the ambiguity which occurs during an incident, and will lead to a more appropriate and thorough set of responses. Second, part of protection is preparing a method of notification so you will know who to call and how to contact them. For example, every member of
the Department of Energy's CIAC Team carries a card with every other team member's work and home phone numbers, as well as pager numbers. Third, your organization or site should establish backup procedures for every machine and system. Having backups eliminates much of the threat of even a severe incident, since backups preclude serious data loss. Fourth, you should set up secure systems. This involves eliminating vulnerabilities, establishing an effective password policy, and other procedures, all of which will be explained later in this document. Finally, conducting training activities is part of protection. It is important, for example, to conduct "dry runs," in which your computer security personnel, system administrators, and managers simulate handling an incident. 5.1.4 Local Policies and Regulations Providing Guidance Any plan for responding to security incidents should be guided by local policies and regulations. Government and private sites that deal with classified material have specific rules that they must follow. The policies your site makes about how it responds to incidents (as discussed in sections 2.4 and 2.5) will shape your response. For example, it may make little sense to create mechanisms to monitor and trace intruders if your site does not plan to take action against the intruders if they are caught. Other organizations may have policies that affect your plans. Telephone companies often release information about telephone traces only to law enforcement agencies. Section 5.5 also notes that if any legal action is planned, there are specific guidelines that must be followed to make sure that any information collected can be used as evidence. 5.2 Evaluation 5.2.1 Is It Real? This stage involves determining the exact problem. Of course many, if not most, signs often associated with virus infections, system intrusions, etc., are simply anomalies such as hardware failures. To assist in identifying whether there really is an incident, it is usually helpful to obtain and use any detection software which may be available. For example, widely available software packages can greatly assist someone who thinks there may be a virus in a Macintosh computer. Audit information is also extremely useful, especially in determining whether there is a network attack. It is extremely important to obtain a system
snapshot as soon as one suspects that something is wrong. Many incidents cause a dynamic chain of events to occur, and an initial system snapshot may do more good in identifying the problem and any source of attack than most other actions which can be taken at this stage. Finally, it is important to start a log book. Recording system events, telephone conversations, time stamps, etc., can lead to a more rapid and systematic identification of the problem, and is the basis for subsequent stages of incident handling. There are certain indications or "symptoms" of an incident which deserve special attention: o System crashes. o New user accounts (e.g., the account RUMPLESTILTSKIN has unexplainedly been created), or high activity on an account that has had virtually no activity for months. o New files (usually with novel or strange file names, such as data.xx or k). o Accounting discrepancies (e.g., in a UNIX system you might notice that the accounting file called /usr/admin/lastlog has shrunk, something that should make you very suspicious that there may be an intruder). o Changes in file lengths or dates (e.g., a user should be suspicious if he/she observes that the .EXE files in an MS DOS computer have unexplainedly grown by over 1800 bytes). o Attempts to write to system (e.g., a system manager notices that a privileged user in a VMS system is attempting to alter RIGHTSLIST.DAT). o Data modification or deletion (e.g., files start to disappear). o Denial of service (e.g., a system manager and all other users become locked out of a UNIX system, which has been changed to single user mode). o Unexplained, poor system performance (e.g., system response time becomes unusually slow). o Anomalies (e.g., "GOTCHA" is displayed on a display terminal or there are frequent unexplained "beeps"). o Suspicious probes (e.g., there are numerous unsuccessful login attempts from another node). o Suspicious browsing (e.g., someone becomes a root user on a UNIX system and accesses file after file in one user's account, then another's). None of these indications is absolute "proof" that an incident is
occurring, nor are all of these indications normally observed when an incident occurs. If you observe any of these indications, however, it is important to suspect that an incident might be occurring, and act accordingly. There is no formula for determining with 100 percent accuracy that an incident is occurring (possible exception: when a virus detection package indicates that your machine has the nVIR virus and you confirm this by examining contents of the nVIR resource in your Macintosh computer, you can be very certain that your machine is infected). It is best at this point to collaborate with other technical and computer security personnel to make a decision as a group about whether an incident is occurring. 5.2.2 Scope Along with the identification of the incident is the evaluation of the scope and impact of the problem. It is important to correctly identify the boundaries of the incident in order to effectively deal with it. In addition, the impact of an incident will determine its priority in allocating resources to deal with the event. Without an indication of the scope and impact of the event, it is difficult to determine a correct response. In order to identify the scope and impact, a set of criteria should be defined which is appropriate to the site and to the type of connections available. Some of the issues are: o Is this a multi-site incident? o Are many computers at your site effected by this incident? o Is sensitive information involved? o What is the entry point of the incident (network, phone line, local terminal, etc.)? o Is the press involved? o What is the potential damage of the incident? o What is the estimated time to close out the incident? o What resources could be required to handle the incident? 5.3 Possible Types of Notification When you have confirmed that an incident is occurring, the appropriate personnel must be notified. Who and how this notification is achieved is very important in keeping the event under control both from a technical and emotional standpoint.
5.3.1 Explicit First of all, any notification to either local or off-site personnel must be explicit. This requires that any statement (be it an electronic mail message, phone call, or fax) provides information about the incident that is clear, concise, and fully qualified. When you are notifying others that will help you to handle an event, a "smoke screen" will only divide the effort and create confusion. If a division of labor is suggested, it is helpful to provide information to each section about what is being accomplished in other efforts. This will not only reduce duplication of effort, but allow people working on parts of the problem to know where to obtain other information that would help them resolve a part of the incident. 5.3.2 Factual Another important consideration when communicating about the incident is to be factual. Attempting to hide aspects of the incident by providing false or incomplete information may not only prevent a successful resolution to the incident, but may even worsen the situation. This is especially true when the press is involved. When an incident severe enough to gain press attention is ongoing, it is likely that any false information you provide will not be substantiated by other sources. This will reflect badly on the site and may create enough ill-will between the site and the press to damage the site's public relations. 5.3.3 Choice of Language The choice of language used when notifying people about the incident can have a profound effect on the way that information is received. When you use emotional or inflammatory terms, you raise the expectations of damage and negative outcomes of the incident. It is important to remain calm both in written and spoken notifications. Another issue associated with the choice of language is the notification to non-technical or off-site personnel. It is important to accurately describe the incident without undue alarm or confusing messages. While it is more difficult to describe the incident to a non-technical audience, it is often more important. A non-technical description may be required for upper-level management, the press, or law enforcement liaisons. The importance of these notifications cannot be underestimated and may make the difference between handling the incident properly and escalating to some higher level of damage.
5.3.4 Notification of Individuals o Point of Contact (POC) people (Technical, Administrative, Response Teams, Investigative, Legal, Vendors, Service providers), and which POCs are visible to whom. o Wider community (users). o Other sites that might be affected. Finally, there is the question of who should be notified during and after the incident. There are several classes of individuals that need to be considered for notification. These are the technical personnel, administration, appropriate response teams (such as CERT or CIAC), law enforcement, vendors, and other service providers. These issues are important for the central point of contact, since that is the person responsible for the actual notification of others (see section 5.3.6 for further information). A list of people in each of these categories is an important time saver for the POC during an incident. It is much more difficult to find an appropriate person during an incident when many urgent events are ongoing. In addition to the people responsible for handling part of the incident, there may be other sites affected by the incident (or perhaps simply at risk from the incident). A wider community of users may also benefit from knowledge of the incident. Often, a report of the incident once it is closed out is appropriate for publication to the wider user community. 5.3.5 Public Relations - Press Releases One of the most important issues to consider is when, who, and how much to release to the general public through the press. There are many issues to consider when deciding this particular issue. First and foremost, if a public relations office exists for the site, it is important to use this office as liaison to the press. The public relations office is trained in the type and wording of information released, and will help to assure that the image of the site is protected during and after the incident (if possible). A public relations office has the advantage that you can communicate candidly with them, and provide a buffer between the constant press attention and the need of the POC to maintain control over the incident. If a public relations office is not available, the information released to the press must be carefully considered. If the information is sensitive, it may be advantageous to provide only minimal or overview information to the press. It is quite possible that any information provided to the press will be
quickly reviewed by the perpetrator of the incident. As a contrast to this consideration, it was discussed above that misleading the press can often backfire and cause more damage than releasing sensitive information. While it is difficult to determine in advance what level of detail to provide to the press, some guidelines to keep in mind are: o Keep the technical level of detail low. Detailed information about the incident may provide enough information for copy-cat events or even damage the site's ability to prosecute once the event is over. o Keep the speculation out of press statements. Speculation of who is causing the incident or the motives are very likely to be in error and may cause an inflamed view of the incident. o Work with law enforcement professionals to assure that evidence is protected. If prosecution is involved, assure that the evidence collected is not divulged to the press. o Try not to be forced into a press interview before you are prepared. The popular press is famous for the "2am" interview, where the hope is to catch the interviewee off guard and obtain information otherwise not available. o Do not allow the press attention to detract from the handling of the event. Always remember that the successful closure of an incident is of primary importance. 5.3.6 Who Needs to Get Involved? There now exists a number of incident response teams (IRTs) such as the CERT and the CIAC. (See sections 3.9.7.3.1 and 3.9.7.3.4.) Teams exists for many major government agencies and large corporations. If such a team is available for your site, the notification of this team should be of primary importance during the early stages of an incident. These teams are responsible for coordinating computer security incidents over a range of sites and larger entities. Even if the incident is believed to be contained to a single site, it is possible that the information available through a response team could help in closing out the incident. In setting up a site policy for incident handling, it may be desirable to create an incident handling team (IHT), much like those teams that already exist, that will be responsible for handling computer security incidents for the site (or organization). If such a team is created, it is essential that communication lines be opened between this team and other IHTs. Once an incident is under way, it is difficult to open a trusted
dialogue between other IHTs if none has existed before. 5.4 Response A major topic still untouched here is how to actually respond to an event. The response to an event will fall into the general categories of containment, eradication, recovery, and follow-up. Containment The purpose of containment is to limit the extent of an attack. For example, it is important to limit the spread of a worm attack on a network as quickly as possible. An essential part of containment is decision making (i.e., determining whether to shut a system down, to disconnect from a network, to monitor system or network activity, to set traps, to disable functions such as remote file transfer on a UNIX system, etc.). Sometimes this decision is trivial; shut the system down if the system is classified or sensitive, or if proprietary information is at risk! In other cases, it is worthwhile to risk having some damage to the system if keeping the system up might enable you to identify an intruder. The third stage, containment, should involve carrying out predetermined procedures. Your organization or site should, for example, define acceptable risks in dealing with an incident, and should prescribe specific actions and strategies accordingly. Finally, notification of cognizant authorities should occur during this stage. Eradication Once an incident has been detected, it is important to first think about containing the incident. Once the incident has been contained, it is now time to eradicate the cause. Software may be available to help you in this effort. For example, eradication software is available to eliminate most viruses which infect small systems. If any bogus files have been created, it is time to delete them at this point. In the case of virus infections, it is important to clean and reformat any disks containing infected files. Finally, ensure that all backups are clean. Many systems infected with viruses become periodically reinfected simply because people do not systematically eradicate the virus from backups. Recovery Once the cause of an incident has been eradicated, the recovery
phase defines the next stage of action. The goal of recovery is to return the system to normal. In the case of a network-based attack, it is important to install patches for any operating system vulnerability which was exploited. Follow-up One of the most important stages of responding to incidents is also the most often omitted---the follow-up stage. This stage is important because it helps those involved in handling the incident develop a set of "lessons learned" (see section 6.3) to improve future performance in such situations. This stage also provides information which justifies an organization's computer security effort to management, and yields information which may be essential in legal proceedings. The most important element of the follow-up stage is performing a postmortem analysis. Exactly what happened, and at what times? How well did the staff involved with the incident perform? What kind of information did the staff need quickly, and how could they have gotten that information as soon as possible? What would the staff do differently next time? A follow-up report is valuable because it provides a reference to be used in case of other similar incidents. Creating a formal chronology of events (including time stamps) is also important for legal reasons. Similarly, it is also important to as quickly obtain a monetary estimate of the amount of damage the incident caused in terms of any loss of software and files, hardware damage, and manpower costs to restore altered files, reconfigure affected systems, and so forth. This estimate may become the basis for subsequent prosecution activity by the FBI, the U.S. Attorney General's Office, etc.. 5.4.1 What Will You Do? o Restore control. o Relation to policy. o Which level of service is needed? o Monitor activity. o Constrain or shut down system. 5.4.2 Consider Designating a "Single Point of Contact" When an incident is under way, a major issue is deciding who is in charge of coordinating the activity of the multitude of players. A major mistake that can be made is to have a number of "points of contact" (POC) that are not pulling their efforts together. This will only add to the confusion of the event, and will probably
lead to additional confusion and wasted or ineffective effort. The single point of contact may or may not be the person "in charge" of the incident. There are two distinct rolls to fill when deciding who shall be the point of contact and the person in charge of the incident. The person in charge will make decisions as to the interpretation of policy applied to the event. The responsibility for the handling of the event falls onto this person. In contrast, the point of contact must coordinate the effort of all the parties involved with handling the event. The point of contact must be a person with the technical expertise to successfully coordinate the effort of the system managers and users involved in monitoring and reacting to the attack. Often the management structure of a site is such that the administrator of a set of resources is not a technically competent person with regard to handling the details of the operations of the computers, but is ultimately responsible for the use of these resources. Another important function of the POC is to maintain contact with law enforcement and other external agencies (such as the CIA, DoD, U.S. Army, or others) to assure that multi-agency involvement occurs. Finally, if legal action in the form of prosecution is involved, the POC may be able to speak for the site in court. The alternative is to have multiple witnesses that will be hard to coordinate in a legal sense, and will weaken any case against the attackers. A single POC may also be the single person in charge of evidence collected, which will keep the number of people accounting for evidence to a minimum. As a rule of thumb, the more people that touch a potential piece of evidence, the greater the possibility that it will be inadmissible in court. The section below (Legal/Investigative) will provide more details for consideration on this topic. 5.5 Legal/Investigative 5.5.1 Establishing Contacts with Investigative Agencies It is important to establish contacts with personnel from investigative agencies such as the FBI and Secret Service as soon as possible, for several reasons. Local law enforcement and local security offices or campus police organizations should also be informed when appropriate. A primary reason is that once a major attack is in progress, there is little time to call various personnel in these agencies to determine exactly who the correct point of contact is. Another reason is that it is important to
cooperate with these agencies in a manner that will foster a good working relationship, and that will be in accordance with the working procedures of these agencies. Knowing the working procedures in advance and the expectations of your point of contact is a big step in this direction. For example, it is important to gather evidence that will be admissible in a court of law. If you don't know in advance how to gather admissible evidence, your efforts to collect evidence during an incident are likely to be of no value to the investigative agency with which you deal. A final reason for establishing contacts as soon as possible is that it is impossible to know the particular agency that will assume jurisdiction in any given incident. Making contacts and finding the proper channels early will make responding to an incident go considerably more smoothly. If your organization or site has a legal counsel, you need to notify this office soon after you learn that an incident is in progress. At a minimum, your legal counsel needs to be involved to protect the legal and financial interests of your site or organization. There are many legal and practical issues, a few of which are: 1. Whether your site or organization is willing to risk negative publicity or exposure to cooperate with legal prosecution efforts. 2. Downstream liability--if you leave a compromised system as is so it can be monitored and another computer is damaged because the attack originated from your system, your site or organization may be liable for damages incurred. 3. Distribution of information--if your site or organization distributes information about an attack in which another site or organization may be involved or the vulnerability in a product that may affect ability to market that product, your site or organization may again be liable for any damages (including damage of reputation). 4. Liabilities due to monitoring--your site or organization may be sued if users at your site or elsewhere discover that your site is monitoring account activity without informing users. Unfortunately, there are no clear precedents yet on the liabilities or responsibilities of organizations involved in a security incident or who might be involved in supporting an investigative effort. Investigators will often encourage organizations to help trace and monitor intruders -- indeed, most
investigators cannot pursue computer intrusions without extensive support from the organizations involved. However, investigators cannot provide protection from liability claims, and these kinds of efforts may drag out for months and may take lots of effort. On the other side, an organization's legal council may advise extreme caution and suggest that tracing activities be halted and an intruder shut out of the system. This in itself may not provide protection from liability, and may prevent investigators from identifying anyone. The balance between supporting investigative activity and limiting liability is tricky; you'll need to consider the advice of your council and the damage the intruder is causing (if any) in making your decision about what to do during any particular incident. Your legal counsel should also be involved in any decision to contact investigative agencies when an incident occurs at your site. The decision to coordinate efforts with investigative agencies is most properly that of your site or organization. Involving your legal counsel will also foster the multi-level coordination between your site and the particular investigative agency involved which in turn results in an efficient division of labor. Another result is that you are likely to obtain guidance that will help you avoid future legal mistakes. Finally, your legal counsel should evaluate your site's written procedures for responding to incidents. It is essential to obtain a "clean bill of health" from a legal perspective before you actually carry out these procedures. 5.5.2 Formal and Informal Legal Procedures One of the most important considerations in dealing with investigative agencies is verifying that the person who calls asking for information is a legitimate representative from the agency in question. Unfortunately, many well intentioned people have unknowingly leaked sensitive information about incidents, allowed unauthorized people into their systems, etc., because a caller has masqueraded as an FBI or Secret Service agent. A similar consideration is using a secure means of communication. Because many network attackers can easily reroute electronic mail, avoid using electronic mail to communicate with other agencies (as well as others dealing with the incident at hand). Non-secured phone lines (e.g., the phones normally used in the business world) are also frequent targets for tapping by network intruders, so be careful!
There is no established set of rules for responding to an incident when the U.S. Federal Government becomes involved. Except by court order, no agency can force you to monitor, to disconnect from the network, to avoid telephone contact with the suspected attackers, etc.. As discussed in section 5.5.1, you should consult the matter with your legal counsel, especially before taking an action that your organization has never taken. The particular agency involved may ask you to leave an attacked machine on and to monitor activity on this machine, for example. Your complying with this request will ensure continued cooperation of the agency--usually the best route towards finding the source of the network attacks and, ultimately, terminating these attacks. Additionally, you may need some information or a favor from the agency involved in the incident. You are likely to get what you need only if you have been cooperative. Of particular importance is avoiding unnecessary or unauthorized disclosure of information about the incident, including any information furnished by the agency involved. The trust between your site and the agency hinges upon your ability to avoid compromising the case the agency will build; keeping "tight lipped" is imperative. Sometimes your needs and the needs of an investigative agency will differ. Your site may want to get back to normal business by closing an attack route, but the investigative agency may want you to keep this route open. Similarly, your site may want to close a compromised system down to avoid the possibility of negative publicity, but again the investigative agency may want you to continue monitoring. When there is such a conflict, there may be a complex set of tradeoffs (e.g., interests of your site's management, amount of resources you can devote to the problem, jurisdictional boundaries, etc.). An important guiding principle is related to what might be called "Internet citizenship" [22, IAB89, 23] and its responsibilities. Your site can shut a system down, and this will relieve you of the stress, resource demands, and danger of negative exposure. The attacker, however, is likely to simply move on to another system, temporarily leaving others blind to the attacker's intention and actions until another path of attack can be detected. Providing that there is no damage to your systems and others, the most responsible course of action is to cooperate with the participating agency by leaving your compromised system on. This will allow monitoring (and, ultimately, the possibility of terminating the source of the threat to systems just like yours). On the other hand, if there is damage to computers illegally accessed through your system, the choice is more complicated: shutting down the intruder may prevent further damage to systems, but might make it impossible to track down the intruder. If there has been damage, the decision about whether it is important to leave systems up to catch the intruder
should involve all the organizations effected. Further complicating the issue of network responsibility is the consideration that if you do not cooperate with the agency involved, you will be less likely to receive help from that agency in the future. 5.6 Documentation Logs When you respond to an incident, document all details related to the incident. This will provide valuable information to yourself and others as you try to unravel the course of events. Documenting all details will ultimately save you time. If you don't document every relevant phone call, for example, you are likely to forget a good portion of information you obtain, requiring you to contact the source of information once again. This wastes yours and others' time, something you can ill afford. At the same time, recording details will provide evidence for prosecution efforts, providing the case moves in this direction. Documenting an incident also will help you perform a final assessment of damage (something your management as well as law enforcement officers will want to know), and will provide the basis for a follow-up analysis in which you can engage in a valuable "lessons learned" exercise. During the initial stages of an incident, it is often infeasible to determine whether prosecution is viable, so you should document as if you are gathering evidence for a court case. At a minimum, you should record: o All system events (audit records). o All actions you take (time tagged). o All phone conversations (including the person with whom you talked, the date and time, and the content of the conversation). The most straightforward way to maintain documentation is keeping a log book. This allows you to go to a centralized, chronological source of information when you need it, instead of requiring you to page through individual sheets of paper. Much of this information is potential evidence in a court of law. Thus, when you initially suspect that an incident will result in prosecution or when an investigative agency becomes involved, you need to regularly (e.g., every day) turn in photocopied, signed copies of your logbook (as well as media you use to record system events) to a document custodian who can store these copied pages in a secure place (e.g., a safe). When you submit information for storage, you should in return receive a signed, dated receipt from the document custodian. Failure to observe these procedures can result in invalidation of any evidence you obtain in a court of law.
6. Establishing Post-Incident Procedures 6.1 Overview In the wake of an incident, several actions should take place. These actions can be summarized as follows: 1. An inventory should be taken of the systems' assets, i.e., a careful examination should determine how the system was affected by the incident, 2. The lessons learned as a result of the incident should be included in revised security plan to prevent the incident from re-occurring, 3. A new risk analysis should be developed in light of the incident, 4. An investigation and prosecution of the individuals who caused the incident should commence, if it is deemed desirable. All four steps should provide feedback to the site security policy committee, leading to prompt re-evaluation and amendment of the current policy. 6.2 Removing Vulnerabilities Removing all vulnerabilities once an incident has occurred is difficult. The key to removing vulnerabilities is knowledge and understanding of the breach. In some cases, it is prudent to remove all access or functionality as soon as possible, and then restore normal operation in limited stages. Bear in mind that removing all access while an incident is in progress will obviously notify all users, including the alleged problem users, that the administrators are aware of a problem; this may have a deleterious effect on an investigation. However, allowing an incident to continue may also open the likelihood of greater damage, loss, aggravation, or liability (civil or criminal). If it is determined that the breach occurred due to a flaw in the systems' hardware or software, the vendor (or supplier) and the CERT should be notified as soon as possible. Including relevant telephone numbers (also electronic mail addresses and fax numbers) in the site security policy is strongly recommended. To aid prompt acknowledgment and understanding of the problem, the flaw should be described in as much detail as possible, including details about how to exploit the flaw.
As soon as the breach has occurred, the entire system and all its components should be considered suspect. System software is the most probable target. Preparation is key to recovering from a possibly tainted system. This includes checksumming all tapes from the vendor using a checksum algorithm which (hopefully) is resistant to tampering [10]. (See sections 3.9.4.1, 3.9.4.2.) Assuming original vendor distribution tapes are available, an analysis of all system files should commence, and any irregularities should be noted and referred to all parties involved in handling the incident. It can be very difficult, in some cases, to decide which backup tapes to recover from; consider that the incident may have continued for months or years before discovery, and that the suspect may be an employee of the site, or otherwise have intimate knowledge or access to the systems. In all cases, the pre-incident preparation will determine what recovery is possible. At worst-case, restoration from the original manufactures' media and a re-installation of the systems will be the most prudent solution. Review the lessons learned from the incident and always update the policy and procedures to reflect changes necessitated by the incident. 6.2.1 Assessing Damage Before cleanup can begin, the actual system damage must be discerned. This can be quite time consuming, but should lead into some of the insight as to the nature of the incident, and aid investigation and prosecution. It is best to compare previous backups or original tapes when possible; advance preparation is the key. If the system supports centralized logging (most do), go back over the logs and look for abnormalities. If process accounting and connect time accounting is enabled, look for patterns of system usage. To a lesser extent, disk usage may shed light on the incident. Accounting can provide much helpful information in an analysis of an incident and subsequent prosecution. 6.2.2 Cleanup Once the damage has been assessed, it is necessary to develop a plan for system cleanup. In general, bringing up services in the order of demand to allow a minimum of user inconvenience is the best practice. Understand that the proper recovery procedures for the system are extremely important and should be specific to the site. It may be necessary to go back to the original distributed tapes and recustomize the system. To facilitate this worst case
scenario, a record of the original systems setup and each customization change should be kept current with each change to the system. 6.2.3 Follow up Once you believe that a system has been restored to a "safe" state, it is still possible that holes and even traps could be lurking in the system. In the follow-up stage, the system should be monitored for items that may have been missed during the cleanup stage. It would be prudent to utilize some of the tools mentioned in section 3.9.8.2 (e.g., COPS) as a start. Remember, these tools don't replace continual system monitoring and good systems administration procedures. 6.2.4 Keep a Security Log As discussed in section 5.6, a security log can be most valuable during this phase of removing vulnerabilities. There are two considerations here; the first is to keep logs of the procedures that have been used to make the system secure again. This should include command procedures (e.g., shell scripts) that can be run on a periodic basis to recheck the security. Second, keep logs of important system events. These can be referenced when trying to determine the extent of the damage of a given incident. 6.3 Capturing Lessons Learned 6.3.1 Understand the Lesson After an incident, it is prudent to write a report describing the incident, method of discovery, correction procedure, monitoring procedure, and a summary of lesson learned. This will aid in the clear understanding of the problem. Remember, it is difficult to learn from an incident if you don't understand the source. 6.3.2 Resources 6.3.2.1 Other Security Devices, Methods Security is a dynamic, not static process. Sites are dependent on the nature of security available at each site, and the array of devices and methods that will help promote security. Keeping up with the security area of the computer industry and their methods will assure a security manager of taking advantage of the latest technology.
6.3.2.2 Repository of Books, Lists, Information Sources Keep an on site collection of books, lists, information sources, etc., as guides and references for securing the system. Keep this collection up to date. Remember, as systems change, so do security methods and problems. 6.3.2.3 Form a Subgroup Form a subgroup of system administration personnel that will be the core security staff. This will allow discussions of security problems and multiple views of the site's security issues. This subgroup can also act to develop the site security policy and make suggested changes as necessary to ensure site security. 6.4 Upgrading Policies and Procedures 6.4.1 Establish Mechanisms for Updating Policies, Procedures, and Tools If an incident is based on poor policy, and unless the policy is changed, then one is doomed to repeat the past. Once a site has recovered from and incident, site policy and procedures should be reviewed to encompass changes to prevent similar incidents. Even without an incident, it would be prudent to review policies and procedures on a regular basis. Reviews are imperative due to today's changing computing environments. 6.4.2 Problem Reporting Procedures A problem reporting procedure should be implemented to describe, in detail, the incident and the solutions to the incident. Each incident should be reviewed by the site security subgroup to allow understanding of the incident with possible suggestions to the site policy and procedures.