STAMP-Related Publications





BOOKS


Engineering a Safer World: Applying Systems Thinking to Safety by Nancy Leveson. Published by MIT Press (January 2012). (MIT Press Webpage).

Engineering has experienced a technological revolution, but the basic engineering techniques applied in safety and reliability engineering, created in a simpler, analog world, have changed very little over the years. This book describes a new approach to safety and risk management that is better suited to today's complex, sociotechnical, software-intensive world. The new approach is based on modern systems thinking and systems theory. It revisits and updates ideas pioneered by 1950s aerospace engineers in their System Safety concept. The new approach has now been used extensively on real-world systems, and it is proving to be more effective, less expensive, and easier to use.

The book describes a new model of causation (STAMP, or Systems-Theoretic Accident Model and Processes) and the accident analysis, hazard analysis, and system engineering techniques built on it, all of which can be used to improve the design, operation, and management of potentially dangerous systems or products.

An STPA Primer, Version 0. The beginning of a primer for using STPA. Some advanced topics are missing, but instruction and examples are provided for basic analyses. The Primer is an as-yet unpublished supplement to Engineering a Safer World. MIT Press will publish it when it is finished, but like the original book, it may take a few years to finish.

Safeware: System Safety and Computers by Nancy Leveson. Published by Addison Wesley (1995). (HTML Table of Contents)

This 1995 book examines past accidents and what is currently known about building safe electromechanical systems to see what lessons can be applied to new computer-controlled systems. Most accidents are not the result of unknown scientific principles but rather of a failure to apply well-known, standard engineering practices. In addition, accidents will not be prevented by technological fixes alone, but will require control of all aspects of the development and operation of the system. A methodology for building safety-critical systems is outlined. While this book predates STAMP, it does lay the foundation for it.


Back to the top


BOOK CHAPTERS


Technical and Managerial Factors in the NASA Challenger and Columbia Losses: Looking Forward to the Future by Nancy Leveson, in Handelsman and Kleinman (editors), Controversies in Science and Technology, University of Wisconsin Press, 2007.

This essay examines the technical and organizational factors leading to the Challenger and Columbia accidents and what we can learn from them. While accidents are often described in terms of a chain of directly related events leading to a loss, examining this event chain does not explain why the events themselves occurred. In fact, accidents are better conceived as complex processes involving indirect and non-linear interactions among people, societal and organizational structures, engineering activities, and physical system components. They are rarely the result of a chance occurrence of random events, but usually result from the migration of a system (organization) toward a state of high risk where almost any deviation will result in a loss. Understanding enough about the Challenger and Columbia accidents to prevent future ones therefore requires determining not only what was wrong at the time of the losses, but also why the high standards of the Apollo program deteriorated over time and allowed the conditions cited by the Rogers Commission as the root causes of the Challenger loss, and why the fixes instituted after Challenger became ineffective over time; that is, why the manned space program has a tendency to migrate toward states of such high risk and poor decision-making that an accident becomes almost inevitable.

Software and the Challenge of Flight Control by Nancy Leveson. To appear as a chapter in Space Shuttle Legacy: How We Did It/What We Learned, edited by Roger Launius, James Craig, and John Krige, to be published by AIAA in 2013.
Not related to STAMP, but it may be of interest to those interested in the risks of software. This is a chapter I wrote for a forthcoming book on the legacy of the Space Shuttle. It describes the challenges NASA faced in creating the Space Shuttle software (and the Gemini and Apollo software before that). Despite those incredible challenges, the Shuttle software is remarkably good. This chapter explains why I think that was so and what we can learn about developing software today. In many ways, software engineering is moving in the opposite direction from the practices that made this software so successful.
Back to the top


JOURNAL PAPERS


A New Accident Model for Engineering Safer Systems by Nancy Leveson. Safety Science, Vol. 42, No. 4, April 2004.
A new model of accidents is proposed based on systems theory. Systems are viewed as interrelated components that are kept in a state of dynamic equilibrium by feedback loops of information and control. Accidents result from inadequate control or enforcement of safety-related constraints on the system. Instead of defining safety management in terms of preventing component failure events, it is defined as a continuous control task to impose the constraints necessary to limit system behavior to safe changes and adaptations. Accidents can be understood, using this model, in terms of why the controls that were in place did not prevent or detect maladaptive changes, that is, by identifying the safety constraints that were violated and determining why the controls were inadequate in enforcing them. This model provides a theoretical foundation for the introduction of unique new types of accident analysis, hazard analysis, design for safety, risk assessment techniques, and approaches to designing performance monitoring and safety metrics.
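
The control-loop framing above can be illustrated with a small sketch (not taken from the paper; the classes, numbers, and safety limit below are invented for illustration): a controller chooses actions from its process model of the controlled process, the process returns possibly delayed feedback, and safety is treated as enforcing a constraint on system behavior rather than preventing individual component failures.

```python
# Illustrative sketch of STAMP's control-loop view of safety (all names and numbers
# here are invented): the controller acts on its process model, the process returns
# delayed feedback, and safety means enforcing a constraint on system behavior.
from dataclasses import dataclass, field

@dataclass
class ControlledProcess:
    actual_level: float = 0.0
    history: list = field(default_factory=lambda: [0.0])

    def apply(self, command: float) -> None:
        self.actual_level += command
        self.history.append(self.actual_level)

    def feedback(self, delay: int = 3) -> float:
        # Delayed or missing feedback is one way a controller's process model
        # drifts away from the real process state.
        return self.history[max(0, len(self.history) - 1 - delay)]

@dataclass
class Controller:
    setpoint: float = 5.0
    believed_level: float = 0.0      # the controller's process model

    def decide(self) -> float:
        # The control action is chosen from the process model, not from reality.
        return 1.0 if self.believed_level < self.setpoint else 0.0

    def update_model(self, observed: float) -> None:
        self.believed_level = observed

SAFETY_LIMIT = 8.0   # safety constraint: the level must stay below this bound

controller, process = Controller(), ControlledProcess()
for step in range(12):
    process.apply(controller.decide())
    controller.update_model(process.feedback())
    if process.actual_level >= SAFETY_LIMIT:
        print(f"step {step}: safety constraint violated despite no component 'failing'")
        break
else:
    print("safety constraint enforced throughout the run")
```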

Applying Systems Thinking to Analyze and Learn from Events by Nancy Leveson, presented at NeTWorK 2008: Event Analysis and Learning from Events, Berlin, August 2008, and later published in Safety Science, Vol. 49, No. 1, January 2010, pp. 55-64.

Why don't the approaches we use to learn from events, most of which go back decades and have been incrementally improved over time, work well in today's world? Maybe the answer can be found by reexamining the underlying assumptions and paradigms in safety and identifying any potential disconnects with the world as it exists today. While abstractions and simplifications are useful in dealing with complex systems and problems, those that are counter to reality can hinder us from making forward progress. Most of the new research in this field never questions these assumptions and paradigms. It is important to devote some effort to examining our foundations, which is what I try to do in this paper. There are too many beliefs in accident analysis---starting with the assumption that analyzing events and learning from them is adequate---that are accepted without question.


Moving Beyond Normal Accidents and High Reliability Organizations: An Alternative Approach to Safety in Complex Systems by Nancy Leveson, Karen Marais, Nicolas Dulac, and John Carroll, Organization Studies, Vol. 30, Feb/Mar 2009, Sage Publishers, pp. 227-249.

Organizational factors play a role in all accidents and are a critical part of understanding and preventing them. Two prominent sociological schools of thought have addressed the organizational aspects of safety: Normal Accident Theory and High Reliability Organizations (HRO). In this paper, we argue that the conclusions of HRO researchers are limited in their applicability and usefulness to complex, high-risk systems and that following some of the recommendations could actually contribute to accidents. Normal Accident Theory, on the other hand, does recognize the difficulties involved but is unnecessarily pessimistic about the possibility of effectively dealing with them. An alternative systems approach to safety is described.

Safety Assurance in NextGen and Complex Transportation Systems by Cody Harrison Fleming, Melissa Spencer, John Thomas, Nancy Leveson, and Chris Wilkinson, Safety Science, in press.

The methods currently used to assure the safety of planned changes in our air transportation systems were developed 50 years ago for systems composed primarily of hardware components and of much less complexity than the systems we are building today. These methods are not powerful enough to handle the complex, human and software intensive systems being planned and introduced today. This paper describes an alternative and demonstrates it on a new NextGen procedure to allow more flight level changes over oceanic and other regions with limited radar coverage. The new approach and results are compared with the results obtained by the more traditional methods being used for NextGen.

Hazard Analysis of Complex Spacecraft using Systems Theoretic Process Analysis by Takuto Ishimatsu, Nancy G. Leveson, John Thomas, Cody Fleming, Masafumi Katahira, Yuko Miyamoto, Ryo Ujiie, Haruka Nakao, and Nobuyuki Hoshino, AIAA Journal of Spacecraft and Rockets, in press, 2013.

A new hazard analysis technique, called System-Theoretic Process Analysis (STPA), is capable of identifying potentially hazardous design flaws, including software and system design errors and unsafe interactions among multiple system components. Detailed procedures for performing the hazard analysis were developed, and the feasibility and utility of using it on complex systems were demonstrated by applying it to the Japanese Aerospace Exploration Agency H-II Transfer Vehicle. In a comparison of the results of this new hazard analysis technique to those of the standard fault tree analysis used in the design and certification of the H-II Transfer Vehicle, STPA found all the hazardous scenarios identified in the fault tree analysis as well as additional causal factors that had not been identified by fault tree analysis.

Drawbacks in Using the Term "Systems of Systems," by Nancy Leveson, Journal of Biomedical Instrumentation and Technology, March/April 2013.

This essay was written after attending an AAMI/FDA meeting on interoperability in medical devices. In it I express my puzzlement over the term system-of-systems and why it is misleading and may lead to dead ends in solving system safety problems.

The Use of Safety Cases in Certification and Regulation by Nancy Leveson. An earlier version of this paper appeared in the Journal of System Safety, Nov/Dec 2011. The version here is updated and includes more material.

Beginning with my involvement with the Presidential Oil Spill Commission (on Deepwater Horizon), I started studying the engineering and law literature and have become concerned by the push to use safety cases in the certification of many industries in the U.S. This paper describes what I have learned and my conclusions about the dangers of this approach.

Applying System Engineering to Pharmaceutical Safety by Nancy Leveson, Matthieu Couturier, John Thomas, Meghan Dierks, David Wierz, Bruce Psaty, and Stan Finkelstein. Journal of Healthcare Engineering, Sept. 2012.

While engineering techniques are used in the development of medical devices and have been applied to individual healthcare processes, such as the use of checklists in surgery and ICUs, the application of system engineering techniques to larger healthcare systems is less common. System safety is the part of system engineering that uses modeling and analysis to identify hazards and to design the system to eliminate or control them. In this paper, we demonstrate how to apply a new safety engineering approach, based on static and dynamic modeling and analysis, to healthcare systems. Pharmaceutical safety is used as the example in the paper, but the same approach is potentially applicable to other complex healthcare systems.

One use for such modeling and analysis is to provide a rigorous way to evaluate the efficacy of potential policy changes as a whole. Less than effective changes may be made when they are created piecemeal to fix a current set of adverse events. Existing pressures and influences, not changed by the new procedures, can defeat the intent of the changes by leading to unintended and counterbalancing actions by system stakeholders. System engineering techniques can be used in re-engineering the system as a whole to achieve the system goals, including both enhancing the safety of current drugs and encouraging the development of new drugs.

Software Challenges in Achieving Space Safety by Nancy Leveson. Journal of the British Interplanetary Society, Vol. 62, 2009.

Techniques developed for hardware reliability and safety do not work on software-intensive systems; software does not satisfy the assumptions underlying these techniques. The new problems and why the current approaches are not effective for complex, software-intensive systems are first described. Then a new approach to hazard analysis and safety-driven design is presented. Rather than being based on reliability theory, as most current safety engineering techniques are, the new approach builds on system and control theory.

A Systems-Theoretic Approach to Safety in Software-Intensive Systems by Nancy Leveson. IEEE Trans. on Dependable and Secure Computing, January 2005.

Traditional accident models were devised to explain losses caused by failures of physical devices in relatively simple systems. They are less useful for explaining accidents in software-intensive systems and for non-technical aspects of safety such as organizational culture and human decision-making. This paper describes how systems theory can be used to form new accident models that better explain system accidents (accidents arising from the interactions among components rather than individual component failure), software-related accidents, and the role of human decision-making. Such models consider the social and technical aspects of systems as one integrated process and may be useful for other emergent system properties such as security. The loss of a Milstar satellite being launched by a Titan/Centaur launch vehicle is used as an illustration of the approach.

Back to the top


MASTERS THESES


A STAMP Analysis of the LEX Comair 5191 Accident, by Paul S. Nelson, Master's Thesis, Lund University, Sweden, June 2008, supervised by Prof. Sidney Dekker.

Abstract: A new view, a holistic systems view that sees individuals in systems, is growing. It is a view which sees that "human error is an effect of trouble deeper inside the system... [where] we must turn to the system in which people work: the design of equipment, the usefulness of procedures, the existence of goal conflicts and production pressure" (Dekker, 2007). A new accident model based on this holistic systems perspective is used to analyze the Comair 5191 accident in Lexington, KY on August 27, 2006. The new model is called Systems-Theoretic Accident Modeling and Processes (STAMP). It incorporates three basic components: constraints, hierarchical levels of control, and process loops. Accidents are understood "in terms of why the controls that were in place did not prevent or detect maladaptive changes, that is, by identifying the safety constraints that were violated and determining why the controls were inadequate in enforcing them." This STAMP analysis of the 5191 accident illustrates the usefulness of the STAMP model to foster evaluation of the whole system and uncover useful levers for eliminating future loss potential, thereby making progress on safety.

    Comparison of SOAM and STAMP for ATM Incident Investigation by Richard Arnold, Master's Thesis, Lund University, Sweden, 2009, supervised by Prof. Sidney Dekker.

    Abstract: Systemic Occurrence Analysis Methodology (SOAM) is promoted by Eurocontrol for the analysis of Air Traffic Management (ATM) occurrences. Systems Theoretic Accident Model and Process (STAMP), based on systems theory, has been defined by Professor Nancy Leveson (MIT) to explain system accidents (accidents arising from the interactions among components rather than individual component failure). This research analyzes an ATM occurrence using SOAM and STAMP and compares their usefulness in identifying systemic countermeasures. The results show that SOAM is a useful heuristic and a powerful communication device but that it is weak with respect to emergent phenomena and non-linear interactions. SOAM directs the investigator to consider the context in which the events occurred, the barriers that failed, and organizational factors (the "holes in the Swiss cheese"), but not the processes which created them or how the whole system can migrate towards the boundaries of safe operations. STAMP directs the investigator more deeply into the mechanism of the interactions between system components, and how systems adapt over time. STAMP helps identify the controls and constraints necessary to prevent undesirable interactions between system components. STAMP also directs the investigation through a structured analysis of the upper levels of the system's control structure, which helps to identify high-level systemic countermeasures. The global ATM system is undergoing a period of rapid technological and political change. In Europe the Single European Sky ATM Research (SESAR) program and in the US the NextGen program mean that ATM is moving from centralized, human-controlled systems to semi-automated, distributed decision-making. Continuous Descent Arrivals flown on datalinked 4D flight paths that are tailored to local constraints and timed for merging traffic require digital information sharing and Collaborative Decision Making on a grand scale, as well as Functional Airspace Blocks designed for optimal airspace efficiency and safety. Detailed new systemic models like STAMP are now necessary to prevent undesirable interactions between normally functioning system components and to understand changes over time in increasingly complex ATM systems.

    A CAST Analysis of a U.S. Coast Guard Aviation Mishap, by Jon Hickey, MIT Master's Thesis, May 2012, supervised by Dr. Qi van Eikema Hommes.

    Abstract: During a 22-month period, between 2008 and 2010, the U.S. Coast Guard experienced seven Class-A aviation mishaps resulting in the loss of 14 Coast Guard aviators and seven Coast Guard aircraft. This represents the highest Class-A aviation mishap rate the Coast Guard has experienced in 30 years. Following each Class-A mishap, the Coast Guard conducted a Mishap Analysis Board (MAB) in accordance with Coast Guard aviation policy. A MAB involves a detailed investigation and report on the causal and contributing factors of a specific mishap and is conducted in accordance with the Department of Defense Human Factors Analysis and Classification System (DOD HFACS), which is based on the "Swiss Cheese" accident causal analysis model. Individual MAB results did not identify common causal or contributing factors that may be causing systemic failures within the aviation safety system. Subsequently, the Coast Guard completed a more system-focused safety analysis known as the Aviation Safety Assessment Action Plan (ASAAP), comprised of five components: 1) Operational Hazard Analysis; 2) Aviation Safety Survey; 3) Aviation Leadership Improvement Study; 4) Independent Data Analysis Study; and 5) Industry Benchmarking Study. ASAAP recently identified "complacency in the cockpit and chain of command" as the leading environmental factor in the rash of serious aviation mishaps. Although the ASAAP study examined Coast Guard aviation more holistically than individual MABs, it did not apply systems theory and systems engineering approaches.

    This thesis applies Dr. Leveson's Systems Theoretic Accident Model and Processes (STAMP) model to identify, evaluate, eliminate, and control system hazards through analysis, design, and management procedures, in order to more fully examine the Coast Guard's aviation system for potential systemic sources of safety hazards. The case study used in this thesis is the September 2008 mishap, involving a Coast Guard helicopter (CG-6505) conducting hoist training with a Coast Guard small boat, which resulted in the loss of the helicopter and its four-person crew. The analysis identified enhancements to Coast Guard aviation system controls that were not expressly identified as part of the MAB and ASAAP study. These findings will complement the Coast Guard's MAB and ASAAP results to better understand and eliminate systemic Coast Guard aviation safety hazards with the aim of preventing future mishaps. Finally, by comparing the results of the STAMP analysis and the MAB, this thesis attempts to answer the question, "is the STAMP model better than the 'Swiss Cheese' model in identifying causes of the accidents?"

    Application of CAST and STPA to Railroad Safety, by Airong Dong, MIT Master's Thesis, May 2012.

    Abstract: The accident analysis method called STAMP (System-Theoretic Accident Model and Processes), developed by Prof. Nancy Leveson of MIT, was used here to re-analyze a high-speed train accident in China. On July 23rd, 2011, 40 people were killed and 120 injured on the Yong-Wen High Speed Line. The purpose of this new analysis was to apply the broader view suggested by STAMP, considering the whole sociotechnical system and not only equipment failures and operator mistakes, in order to come up with new findings, conclusions, and recommendations for the high-speed train system in China.

    The STAMP analysis revealed that the existing safety culture in the whole train organization, the Ministry of Railway and all its sub-organizations in both the Train Development and Train Operation channels, does not meet the safety challenges involved in a high-risk system like this---running frequent trains on the same line at 250 km/h with hundreds of passengers on board. The safety hazards were not systematically analyzed (neither at the top level nor at the design level), safety constraints and safety requirements were very vaguely phrased, and no real enforcement was applied to safe design and implementation or to safe operation. It appears that no clear policy on the performance/safety dilemma existed, nor the necessary safety education and training.

    Following from the STAMP analysis, one of the major recommendations in this thesis is to create a professional Train Safety Authority at the highest level, to be in charge of creating and supervising the rules for both Engineering and Operations, those two being highly interrelated with respect to safety. Specific Control Structures are recommended too, along with some detailed technical recommendations regarding the fail-safe design of the equipment involved in the accident.

    Another major recommendation is to design safety-critical systems, like the signaling control system, using STPA (System-Theoretic Process Analysis), a hazard analysis technique. In the second part of this thesis, STPA is applied to another signaling system---a Communication Based Train Control (CBTC) system---which is similar to the one presented in the first part. The primary goal of STPA is to include the new causal factors identified in STAMP that are not handled by the older techniques. It aims to identify accident scenarios that encompass the entire accident process, including design errors and the social, organizational, and management factors contributing to accidents. These are demonstrated in the STPA analysis section.

    Engineering Financial Safety: A System-Theoretic Case Study from the Financial Crisis, by Melissa Spencer, MIT TPP (Technology and Policy Program) Master's Thesis, May 2012.

    Abstract: There is currently much systems-based thinking going into understanding safety in complex socio-technical systems and into developing useful accident analysis methods. However, when it comes to complex systems without clear physical components, the techniques for understanding accidents are antiquated and ineffective. This thesis uses a promising new engineering-based accident analysis methodology, CAST (Causal Analysis using STAMP, or Systems-Theoretic Accident Model and Processes), to understand an aspect of the financial crisis of 2007-2008.

    This thesis demonstrates how CAST can be used to understand the context and control problems that led to the collapse and rapid acquisition of the investment bank Bear Stearns in March 2008. It seeks to illustrate the technological and regulatory changes that provided the context for the Bear Stearns accident and then demonstrates how a top-down, systematic method of analysis can produce more insight into the accident than traditional financial accident investigations such as congressionally mandated inquiries.

    A Systems Theoretic Application to Design for the Safety of Medical Diagnostic Devices, by Vincent Balgos, MIT SDM Master's Thesis, February 2012, supervised by Dr. Qi van Eikema Hommes.

    Abstract: In today's environment, medical technology is rapidly advancing to deliver tremendous value to physicians, nurses, and medical staff in order to support them to ultimately serve a common goal: provide safe and effective medical care for patients. However, these complex medical systems are contributing to the increasing number of healthcare accidents each year. These accidents present unnecessary risk and injury to the very population these systems are designed to help. Thus the current safety engineering techniques that are widely practiced by the healthcare industry during medical system development are inadequate in preventing these tragic accidents. Therefore, there is a need for a new approach to design safety into medical systems.

    This thesis demonstrated that a holistic approach to safety design using the Systems Theoretic Accident Model and Process (STAMP) and Causal Analysis based on STAMP (CAST) was more effective than the traditional, linear chain-of-events model of Failure Mode Effects and Criticality Analysis (FMECA). The CAST technique was applied to a medical accident case involving a complex diagnostic analyzer system. The results of the CAST analysis were then compared to the original FMECA hazards. By treating safety as a control problem, the CAST analysis was capable of identifying an array of hazards beyond what was detected by the current regulatory-approved technique. From these hazards, new safety design requirements and recommendations were generated for the case system that could have prevented the case accident. These safety design requirements can also be utilized in new medical diagnostic system development efforts to prevent future medical accidents and protect the patient from unnecessary harm.

    A Systems Approach to Food Accident Analysis, by John Helferich, MIT SDM Thesis, May 2012. This thesis won the "Best SDM Master's Thesis" award at MIT last year for the System Design and Management Program.

    Abstract: Foodborne illnesses lead to 3000 deaths per year in the United States. Some industries, such as aviation, have made great strides in increasing safety through careful accident analysis leading to changes in industry practices. In the food industry, the current methods of accident analysis are grounded in regulations developed when the food industry was far simpler than it is today. The food industry has become more complex, with international supply chains and a consumer desire for fresher food. This thesis demonstrates that application of a system-theoretic accident analysis method, CAST, results in more learning than the current method of accident analysis. This increased learning will lead to improved safety performance in the food production system.

    Application of a System Safety Framework in Hybrid Socio-Technical Environment of Eurasia, by Azamat, MIT SDM Thesis, 20. This thesis won the "Best SDM Master's Thesis" award at MIT.

    Abstract: The political transformation and transition of post-Soviet societies have led to hybrid structures in political, economic and technological domains. In such hybrid structures the roles of government, state enterprise, private business and civil society are not clearly defined. These roles shift depending on formal and informal interests, availability and competition for limited resources, direct and indirect financial benefits, internal and external agendas. In an abstract sense, a hybrid is "anything derived from heterogeneous sources, or composed of elements of different or incongruous kinds." If transition is a process from one state to another, hybrid is a state unto itself. In the context of this thesis Hybrid Socio-Technical Environment means the co-existence of different institutions and policies, state and private business entities, old and new technologies, managerial models and practices of planning and market economies, collectivist and individualist value systems.

    Rapid technological progress, coupled with shifts in political and economic structures, may produce long-lasting disturbances in a society. Such disturbances are a result of the hybrid society's contradictory nature. Some of these disturbances appear in the form of large-scale systemic accidents, such as the Sayano-Shushenskaya Hydroelectric Power Station accident. The rigid and outdated Soviet socio-technical system was broken down into multiple independent systems and subsystems to increase operational flexibility, with very limited capital investment. A twenty-year transition period (1990-2010) proved the survivability of the Soviet system, which was able to perform its primary functions even at partial capacity. However, recent large-scale accidents are clear signs that the system is stretching beyond its limits. Changes in the socio-technical landscape (multiple stakeholders and a variety of interests) suggest that the traditional approaches of reliability theory, with their inward focus, may not be effective tools for identifying emerging challenges. The outward-focused systems theory approach takes into consideration key characteristics of the changing hybrid socio-technical landscape, as well as the motivations of multiple stakeholders. The research concludes that insufficient capital investment and a backlog in maintenance are key systemic factors that allow the migration of organizational behavior from a safe to an unsafe state. Additional analysis has to be conducted to confirm this conclusion.

    Developing System-Based Leading Indicators for Proactive Risk Management in the Chemical Processing Industry by Ibrahim Khawaji, MIT ESD Master's Thesis, May 2012.

    Abstract: The chemical processing industry has faced challenges in achieving improvements in safety performance, and accidents continue to occur. When accidents occur, they usually involve a confluence of multiple factors, suggesting that there are underlying complex systemic problems. Moreover, accident investigations often reveal that accidents were preventable and that many of the problems were known prior to those accidents, suggesting that there may have been early warning signs.

    System-based analysis addresses systemic aspects and leading indicators enable the detection of ineffective controls and degradation of the system. Together, they could enable taking needed actions before an incident or a loss event. To develop process safety indicators, the chemical processing industry currently uses guidelines that are mainly based on the concepts of the "Swiss Cheese Model" and the "Accident Pyramid." The current guidelines lack a systemic approach for developing process safety indicators; the guidelines view indicators as independent measures of the safety of a system (e.g. a failure of a barrier), which can be misleading because it would not identify ineffective controls, such as those associated with the migration of the system towards an unsafe state, or associated with interdependencies between barriers. Moreover, process safety indicators that are currently used in the chemical industry are more focused on lagging as opposed to leading indicators.

    The main objective of this thesis is to develop a structured system-based method that can assist a hydrocarbon/chemical processing organization in developing system-based process safety leading indicators. Building on developed safety control structures and the associated safety constraints, the proposed method can be used to develop both technical and organizational leading indicators based on the controls, feedbacks, and process models, which, ultimately, can ensure that there is an effective control structure.
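
    As a rough illustration of the idea of deriving leading indicators from a safety control structure (the control loops, feedback channels, and thresholds below are invented, not taken from the thesis), one simple class of indicator checks whether each controller is still receiving the feedback it needs within an expected interval:

```python
# Illustrative sketch (hypothetical data, not from the thesis): deriving leading
# indicators from a safety control structure by checking that each control loop's
# feedback channel is still exercised within its expected interval.
from dataclasses import dataclass

@dataclass
class ControlLoop:
    controller: str
    feedback_channel: str
    max_feedback_age_hrs: float   # assumption: each loop declares how fresh its feedback must be

# A toy control structure for a processing plant (all names are invented).
control_structure = [
    ControlLoop("board operator", "level transmitter reading", 1.0),
    ControlLoop("shift supervisor", "pre-startup safety review", 24.0),
    ControlLoop("plant management", "maintenance backlog report", 168.0),
]

# Hours since each feedback channel last produced data (would come from operational logs).
feedback_age = {
    "level transmitter reading": 0.2,
    "pre-startup safety review": 60.0,     # overdue: this leading indicator fires
    "maintenance backlog report": 150.0,
}

def leading_indicators(structure, ages):
    """Flag loops whose feedback is stale, i.e. whose controller is operating blind."""
    flags = []
    for loop in structure:
        age = ages.get(loop.feedback_channel)
        if age is None or age > loop.max_feedback_age_hrs:
            flags.append(f"{loop.controller}: '{loop.feedback_channel}' stale ({age} h)")
    return flags

for flag in leading_indicators(control_structure, feedback_age):
    print("LEADING INDICATOR:", flag)
```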

    Integrating Safety into an Engineering Contractor's System Engineering Process using the Guidelines of STAMP, by Lorena Pelegrin, Master's Thesis, Heriot-Watt University, August 2012.

    Abstract "Engineering Contractor"(EC)is a group of engineering and consulting companies providing services worldwide in the fields of oil and gas, water and environment, energy and climate protection, and transport and structures. Because currently there is no consolidated system engineering process that includes designing for safety systematically and the top management of EC has understood the responsibility of EC in the safety of the systems they engineer, the present thesis was proposed.

    An initial review of how safety is addressed in EC's system engineering process was performed. The fundamentals of using STAMP in system engineering were used as guidelines to check against. The hypotheses included that EC's approach to safety varies widely depending on different client requirements and the involvement of individuals, and that the results of safety-related activities have a weak impact on the system design and are often used as instruments to legitimize a design rather than to improve the safety of the system. The survey confirmed the hypotheses to a great extent.

    After the initial review, the results were analyzed in terms of identification of current practice and feasibility of STAMP implementation in EC. A case study on implementation of the new techniques to a project example was also developed for illustration purposes. Finally, high-level guidelines and a strategy for implementation of STAMP in EC were derived.

    This work concludes that the use of STAMP principles and the guidelines given in Leveson's "Engineering a Safer World" provide a comprehensive, detailed and useful framework for evaluating how an organization designs for safety and for defining measures specifically tailored to an organization. This work also demonstrates that while a fundamental departure from traditional safety engineering and hazard analysis techniques might seem a difficult campaign to undertake, it is possible to incorporate many elements of STAMP and STPA in the short term with significant impact on how safety is designed into the system and, moreover, with a by-product improvement in the efficiency of engineering management activities and the quality of the engineering work delivered.

    A System Theoretic Safety Analysis of Friendly Fire Prevention in Ground Based Missile Systems, by Scott McCarthy, MIT SDM Master's Thesis, January 2013.

    Abstract: This thesis uses STAMP to analyze a friendly fire accident that occurred on 22 March 2003 between a British Tornado aircraft and a US Patriot missile battery. This causation model analyzes system constraints, control loops, and process models to identify inadequate control structures leading to hazards and preventative measures that may be taken to reduce the effects of these hazards. By using a system-based causation model like STAMP, rather than a traditional chain-of-events model, this thesis aimed to identify systemic factors and component interactions that may have contributed to the accident, rather than simply analyzing component failures. Additionally, care was taken to understand the rationale for decisions that were made, rather than assigning blame. The analysis identified a number of areas in which control flaws or inadequacies led to the friendly fire incident. A set of recommendations was developed that may help to prevent similar accidents from occurring in the future.
    Back to the top


    Ph.D. DISSERTATIONS


    Extending and Automating a Systems-Theoretic Hazard Analysis for Requirements Generation and Analysis by John Thomas, MIT Ph.D. Dissertation, June 2013.

    Abstract:

    Systems Theoretic Process Analysis (STPA) is a powerful new hazard analysis method designed to go beyond traditional safety techniques---such as Fault Tree Analysis (FTA)---that overlook important causes of accidents like flawed requirements, dysfunctional component interactions, and software errors. Although traditional techniques have been effective at analyzing and reducing accidents caused by component failures, modern complex systems have introduced new problems that can be much more difficult to anticipate, analyze, and prevent. In addition, a new class of accidents, component interaction accidents, has become increasingly prevalent in today's complex systems and can occur even when systems operate exactly as designed and without any component failures.

    While STPA has proven to be effective at addressing these problems, its application thus far has been ad-hoc with no rigorous procedures or model-based design tools to guide the analysis. In addition, although no formal structure has yet been defined for STPA, the process is based on a control-theoretic framework that could be formalized and adapted to facilitate development of automated methods that assist in analyzing complex systems. This dissertation defines a formal mathematical structure underlying STPA and introduces a procedure for systematically performing an STPA analysis based on that structure. A method for using the results of the hazard analysis to generate formal safety-critical, model-based system and software requirements is also presented. Techniques to automate both the STPA analysis and the requirements generation are introduced, as well as a method to detect conflicts between safety requirements and other functional model-based requirements during early development of the system.
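
    A simplified sketch in the spirit of this context-based formalization (the control action, context variables, and hazard rule below are invented for illustration and are not from the dissertation): enumerate the combinations of process-model values under which a control action would be hazardous, and turn each hazardous combination into a candidate safety constraint.

```python
# Simplified sketch of context-based unsafe-control-action enumeration (the control
# action, context variables, and hazard rule are invented, not taken from the
# dissertation): each hazardous context row yields a candidate safety requirement.
from itertools import product

control_action = "open train doors"
context_variables = {
    "train_motion": ["stopped", "moving"],
    "position": ["aligned with platform", "not aligned"],
    "emergency": ["no emergency", "onboard emergency"],
}

def hazardous(ctx) -> bool:
    # Assumed hazard rule: opening doors is hazardous while moving, or when the train
    # is stopped away from a platform and there is no emergency requiring evacuation.
    if ctx["train_motion"] == "moving":
        return True
    return ctx["position"] == "not aligned" and ctx["emergency"] == "no emergency"

names, values = zip(*context_variables.items())
for combo in product(*values):
    ctx = dict(zip(names, combo))
    if hazardous(ctx):
        clause = ", ".join(f"{k} = {v}" for k, v in ctx.items())
        print(f"'{control_action}' must not be provided when: {clause}")
```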

    Accident Analysis and Hazard Analysis for Human and Organizational Factors by Margaret Stringfellow, October 2010.

    Abstract: [Abridged]

    Current hazard analysis methods, adapted from traditional accident models, are not able to evaluate the potential for risk migration, or comprehensively identify accident scenarios involving humans and organizations. Thus, system engineers are not able to design systems that prevent loss events related to human error or organizational factors. State of the art methods for human and organization hazard analysis are, at best, elaborate event-based classification schemes for potential errors. Current human and organization hazard analysis methods are not suitable for use as part of the system engineering process.

    Systems must be analyzed with methods that identify all human- and organization-related hazards during the design process, so that this information can be used to change the design and prevent human and organizational errors from occurring. Errors must be more than classified and categorized; they must be prevented in design. A new type of hazard analysis method that identifies hazardous scenarios involving humans and organizations is needed, both for systems in conception and for those already in the field.

    This thesis contains novel approaches to accident analysis and hazard analysis. Both methods are based on principles found in the human factors, organizational safety, and system safety literature. It is hoped that the accident analysis method will aid engineers in understanding how human actions and decisions are connected to the accident and aid in the development of blame-free reports that encourage learning from accidents. The goal for the hazard analysis method is that it will be useful in: 1) designing systems to be safe; 2) diagnosing policies or pressures and identifying design flaws that contribute to high-risk operations; 3) identifying designs that are resistant to pressures that increase risk; and 4) allowing system decision-makers to predict how proposed or current policies will affect safety. To assess the accident analysis method, a comparison with state-of-the-art methods is conducted. To demonstrate the feasibility of the hazard analysis method, it is applied to several systems in various domains.

    A Framework for Dynamic Safety and Risk Management Modeling in Complex Systems by Nicolas Dulac, February 2007.

    Almost all traditional hazard analysis or risk assessment techniques, such as failure modes and effects analysis (FMEA), fault tree analysis (FTA), and probabilistic risk analysis (PRA), rely on a chain-of-events paradigm of accident causation. Event-based techniques have some limitations for the study of modern engineering systems. Specifically, they are not suited to handle complex software-intensive systems, complex human-machine interactions, and systems-of-systems with distributed decision-making that cuts across both physical and organizational boundaries. [...]

    The main contribution of this thesis is the augmentation of STAMP with a dynamic executable modeling framework in order to further improve safety in the development and operation of complex engineering systems. This executable modeling framework: 1) enables the dynamic analysis of safety-related decision-making in complex systems, 2) assists with the design and testing of non-intuitive policies and processes to better mitigate risks and prevent time-dependent risk increase, and 3) enables the identification of technical and organizational factors to detect and monitor states of increasing risk before an accident occurs.

    The modeling framework is created by combining STAMP safety control structures with system dynamics modeling principles. A component-based model-building methodology is proposed to facilitate the building of customized STAMP-based dynamic risk management models and make them accessible to managers and engineers with limited simulation experience. A library of generic executable components is provided as a basis for model creation, refinement, and validation. A toolset is assembled to identify risk increase patterns, analyze time-dependent risks, assist engineers and managers in safety-related decision-making, create and test risk mitigation actions and policies, and monitor the system for states of increasing risk.
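
    A toy sketch of what an executable, system-dynamics-style risk model can look like (the stocks, flows, and parameters below are invented and far simpler than the thesis models): a "safety margin" stock erodes under performance pressure, and simulating the loop shows risk migrating upward over time.

```python
# Toy system-dynamics sketch (invented stocks, flows, and parameters; not the thesis
# models): Euler integration of a "safety margin" stock eroded by performance pressure,
# illustrating how an executable model can show risk rising gradually over time.
def simulate(months=60, dt=1.0):
    safety_margin = 1.0          # stock: fraction of original safety effort remaining
    performance_pressure = 0.2   # stock: grows when quiet periods breed complacency
    history = []
    for t in range(int(months / dt)):
        incident_rate = max(0.0, 0.5 - safety_margin)          # more incidents as margin erodes
        erosion = 0.02 * performance_pressure * safety_margin  # pressure cuts safety activities
        recovery = 0.3 * incident_rate                         # incidents trigger renewed attention
        pressure_growth = 0.01 * (1.0 - incident_rate)         # quiet periods raise the pressure
        safety_margin += dt * (recovery - erosion)
        performance_pressure += dt * pressure_growth
        history.append((t, safety_margin, incident_rate))
    return history

for t, margin, incidents in simulate()[::12]:   # print one sample per simulated year
    print(f"month {t:2d}: safety margin {margin:.2f}, incident rate {incidents:.2f}")
```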

    Development of a Systematic Risk Management Approach for CO2 Capture, Transport, and Storage Projects by Jaleh Samadi, L'Ecole Nationale Superieure des Mines de Paris Ph.D. dissertation, December 2012.

    Abstract: A systematic risk management framework for CO2 capture, transport, and storage projects is proposed. The approach is founded on the concepts of systems thinking, STAMP, STPA, and system dynamics. The objective is to provide a means of decision making for these types of projects in the current context, where the future of the technology is uncertain.

    Systems Theoretic Hazard Analysis (STPA) Applied to the Risk Review of Complex Systems: An Example from the Medical Device Industry by Blandine Antoine, MIT Ph.D. dissertation, December 2012.

    Abstract: Methods developed by system engineers could beneficially be applied to the challenge of ensuring patient safety in health care delivery. Achieving safe operations in this and other settings requires that system behavior be bound by safety constraints. These must be defined and enforced at every stage of system design, system operations, and, when applicable, system retirement.

    Traditional methods to identify and document hazards, and the corresponding safety constraints, are lacking in their ability to account for human, software, and sub-system interactions in highly technical systems. STAMP, a systems-theoretic accident causality model, was created to overcome these limitations. STAMP offers consideration of context and design features that can lead to unsafe behavior, including behavior resulting from unsafe interactions among correctly operating system elements. The application of the STAMP-based hazard analysis method STPA to five subsystems of the experimental PROSCAN proton therapy system operated by the Paul Scherrer Institute in Switzerland demonstrates how STPA can augment design and risk review activities for complex systems. The STPA methodology is also advanced by creating notations and a process to document, query, and visualize the possibly large number of hazardous scenarios identified by STPA analyses, with the goal of facilitating their review and use by their intended audience.

    Back to the top


    CONFERENCE PAPERS


    A System-Theoretic Hazard Analysis Methodology for a Non-advocate Safety Assessment of the Ballistic Missile Defense System by Steve Pereira, Grady Lee, and Jeffrey Howard. Proceedings of the 2006 AIAA Missile Sciences Conference, Monterey, CA, November 2006.

    The Missile Defense Agency (MDA) is developing the Ballistic Missile Defense System (BMDS) as a layered defense to defeat all ranges of threats in all phases of flight (boost, midcourse, and terminal). The BMDS integrates into a single system a number of Elements that had been developed independently, such as SBIRS/DSP, Aegis BMD, and Ground-based Midcourse Defense (GMD). The Elements of the BMDS have active safety programs, but complexity, coupling, and safety risk are introduced by their integration into a single system. Assessing the safety of the integrated BMDS required analysts to come up to speed using existing Element project documentation, assess the safety risk of the system, and make recommendations regarding hazard mitigation and risk acceptance. This effort often required conducting hazard analyses to supplement existing Element analysis work; working with existing engineering artifacts; and making recommendations for hazard mitigations late in the system life cycle, when there is less flexibility for design changes. This paper presents a safety assessment methodology based on STPA (a systems-theoretic hazard analysis); the assessment methodology provides an organized, methodical, and effective means to assess safety risk and develop appropriate hazard mitigations regardless of where in the life cycle the assessment is started.

    Modeling and Hazard Analysis using STPA by Takuto Ishimatsu, Nancy Leveson, John Thomas, Masa Katahira, Yuko Miyamoto, Haruka Nakao. Presented at the Conference of the International Association for the Advancement of Space Safety, Huntsville, Alabama, May 2010.

    A joint research project between MIT and JAXA/JAMSS is investigating the application of a new hazard analysis technique, called STPA, to the system and software in the HTV. STPA is based on systems theory rather than reliability theory. It treats safety as a control problem rather than a failure problem. Traditional hazard analysis focuses on component failures but software does not fail in this way. Software most often contributes to accidents by commanding the spacecraft into an unsafe state (e.g., turning off the descent engines prematurely) or by not issuing required commands. That makes the standard hazard analysis techniques of limited usefulness on software-intensive systems, which describes most spacecraft built today.

    This paper describes the experimental application of STPA to the JAXA HTV (unmanned cargo transfer vehicle to the International Space Station). Because the HTV was originally developed using fault tree analysis and following the NASA standards for safety-critical systems, the results of our experimental application of STPA can be compared with these more traditional safety engineering approaches in terms of the problems identified and the resources required to use it.

    Multiple Controller Contributions to Hazards by Takuto Ishimatsu, Nancy Leveson, Cody Fleming, Masa Katahira, Yuko Miyamoto, and Haruka Nakao. This paper was presented at the Conference of the International Association for the Advancement of Space Safety, Versailles, France, October 2011.

    One contributor to hazards in complex systems arises from unsafe interactions among multiple controllers. The basic problem is that in complex systems, hazards can be created by interactions among components that are each operating "correctly." STPA is a new hazard analysis technique that identifies both system hazards caused by component failures (as the traditional analysis techniques do) and those caused by unsafe interactions among components that may not have individually failed. The first descriptions of STPA, however, did not include examples of how to handle potential problems that occur between multiple controllers. We have created an approach to identify possible unsafe interactions among multiple controllers so that the system can be designed to eliminate any ambiguity or potential for unsafe controller interactions. In this paper, we describe the analysis technique and demonstrate its use for the HTV during the critical approach phase. Once these hazardous interactions are identified, they can be eliminated or controlled through system design or operational procedures.
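
    A minimal sketch of the underlying idea (the controllers, commands, and coordination rule below are invented, not taken from the paper): enumerate the combinations of commands that two controllers can issue over the same process and flag the combinations that violate an assumed coordination constraint, even though each command is acceptable on its own.

```python
# Illustrative sketch (names and the coordination rule are invented): enumerating
# command combinations from two controllers over one controlled process to find
# pairs that are individually acceptable but hazardous in combination.
from itertools import product

vehicle_controller_actions = ["hold", "approach", "abort"]
crew_controller_actions = ["monitor", "command hold", "command abort"]

def combination_hazardous(vehicle_cmd: str, crew_cmd: str) -> bool:
    # Assumed coordination constraint: the vehicle must not continue an approach
    # while the crew is commanding a hold or an abort, and holding while the crew
    # commands an abort is also treated as a hazardous conflict.
    if vehicle_cmd == "approach" and crew_cmd in ("command hold", "command abort"):
        return True
    if vehicle_cmd == "hold" and crew_cmd == "command abort":
        return True
    return False

for vehicle_cmd, crew_cmd in product(vehicle_controller_actions, crew_controller_actions):
    if combination_hazardous(vehicle_cmd, crew_cmd):
        print(f"hazardous interaction: vehicle '{vehicle_cmd}' while crew issues '{crew_cmd}'")
```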

    Safety-Guided Design of Crew Return Vehicle in the Concept Design Phase using STAMP/STPA by Haruka Nakao, Masa Katahira, Yuko Miyamoto, and Nancy Leveson. This paper was presented at the Conference of the International Association for the Advancement of Space Safety, Versailles, France, October 2011.

    In the concept development and design phase of a new space system, such as a crew vehicle, designers tend to focus on how to implement new technology. Designers also consider the difficulty of using the new technology and trade off several candidate system designs. Then they choose an optimal design from the candidates. Safety should be a key aspect driving optimal concept design. However, in past concept design activities, safety analyses such as FTA have not been used to drive the design, because such analysis techniques focus on component failures, and component failures cannot be considered in the concept design phase.

    The solution to these problems is to apply a new hazard analysis technique, called STAMP/STPA. STAMP/STPA defines safety as a control problem rather than a failure problem and identifies hazardous scenarios and their causes. Defining control flow is essential in the concept design phase, so STAMP/STPA could be a useful tool to assess the safety of candidate systems and to be part of the rationale for choosing a design as the baseline of the system. In this paper, we explain our case study of safety-guided concept design using STPA, the new hazard analysis technique, and a model-based specification technique on the Crew Return Vehicle design, and we evaluate the benefits of using STAMP/STPA in the concept development phase.

    A Hazard Analysis Based Approach to Improve the Landing Safety of a Blended-wing-body Remotely Piloted Vehicle. This paper was written by Lu Yi, Zhang Shuguang, and Li Xue-qing from the School of Transportation Science, Beihang University, Beijing.

    This paper describes the use of STPA to investigate the cause of an unexpected landing safety problem in flight experiments for a UAV. The authors were able to identify an uncontrolled system behavior, a "path sagging" phenomenon, and validate it with wind tunnel experiment data. Subsequent flight experiments showed that the hazard had been correctly identified and landing safety improved. An additional paper on the details of the STPA analysis is currently in journal review.

    A System Theoretic Analysis of the "7.23" Yong-Tai-Wen Railway Accident. This paper, by Dajiang Suo from the Computer Science and Technology Dept., Tsinghua University, Beijing, China, was presented at the 1st STAMP/STPA Workshop held at MIT on April 26-28, 2012.

    This paper analyzes the "7.23" Yongwen Railway accident in China from a system-theoretic perspective. In particular, the STAMP safety control structure for this accident is constructed and divided into two processes, system development and system operation, which are then analyzed at each level. Furthermore, to understand why and how the system evolved over time, system dynamics models are constructed to describe the changes that indirectly led to the accident. This analysis raises some questions that are not addressed in the investigation report but are critical to a comprehensive understanding of the accident. Based on the analysis results, recommendations are generated aimed at preventing the same kind of accident in the future.

    Application of a Safety-Driven Design Methodology to An Outer Planet Exploration Mission by Brandon D. Owens, Margaret Stringfellow Herring, Nicolas Dulac, Nancy Leveson, Michel Ingham, and Kathryn Ann Weiss. IEEE Aerospace Conference, Big Sky, Montana, March 2008.

    A conference paper on one of our early applications of STPA and intent specifications on a JPL exploratory spacecraft. Technical reports with more details can be found below. We have evolved the techniques somewhat since this time.

    Applying Systems Thinking to Aviation Psychology by Nancy Leveson. International Symposium on Aviation Psychology, 2013.
    A short paper suggesting ways to integrate human factors into engineering hazard analysis.
    Back to the top


    TECHNICAL REPORTS


    STPA Analysis of NextGen Interval Management Components: Ground Interval Management (GIM) and Flight Deck Interval Management (FIM) Plants by Cody Fleming, M. Seth Placke, and Nancy G. Leveson.

    The next generation of air traffic management systems will involve significant changes in the way air traffic control is done today. Reliance on software is increasing and allowing greater system complexity. Humans are assuming supervisory roles over automation, requiring more cognitively complex human decision-making. Control is shifting from the ground to the aircraft, with responsibilities increasingly shared. In addition, coupling and interconnections between land, airborne, and space systems introduce more potential for accidents stemming from unsafe and unintended component interactions.

    This report documents the results of a research project for the FAA applying STPA to two important components of interval management, itself an important component of Trajectory-Based Operations (TBO). The causal factors identified by STPA but not included in the IM-S ConOps include potential lack of coordination between controllers both within and across sectors, timing of IM-S clearances relative to other required clearances, potential lack of synchronization between surveillance sources provided to controllers and their tools, and conflicts between IM automation and other tools and ATC tasks.

    Evaluating the Safety of Digital Instrumentation and Control Systems in Nuclear Power Plants by John Thomas, Francisco Luiz de Lemos, and Nancy Leveson, MIT Technical Report.

    This final report for a U.S. Nuclear Regulatory Commission grant contains a brief STPA tutorial, a case study of STPA applied to an example Pressurized Water Reactor (PWR), an analysis of the results of the STPA analysis, and a discussion of the potential use of STPA in the licensing of nuclear power plants.

    Safety Assurance in NextGen by Cody Fleming, Melissa Spencer, Nancy Leveson, and Chris Wilkinson (Honeywell). NASA Technical Report NASA/CR-2012-217553.

    The methods currently used in assuring the safety of changes to our transportation systems were developed 50 years ago for systems composed primarily of hardware components and of much less complexity than the transportation systems we are building today. These methods are not powerful enough to handle accidents that involve interactions among components (system design errors), software, and the cognitively complex tasks being assigned to operators today, and not just failures of individual components. More powerful methods are needed.

    This paper describes a possible alternative and demonstrates it on a new NextGen procedure to allow more flight level changes over oceanic and other regions with limited radar coverage. A complete intent specification for ITP is created using the results of the STPA hazard analysis. The new approach and results are compared with the results obtained by the more traditional methods being used for NextGen. More important, the paper discusses the important philosophical differences that underlie the two approaches and the implications of these differences for selecting appropriate tools for safely integrating changes into complex transportation infrastructures.

    Demonstration of a New Dynamic Approach to Risk Analysis for NASA's Constellation Program by Nicolas Dulac, Brandon Owens, Nancy Leveson, Betty Barrett, John Carroll, Joel Cutcher-Gershenfeld, Stephen Friedenthal, Joseph Laracy, and Joseph Sussman. Final Report to the NASA Exploration Systems Mission Directorate Associate Administrator, March 2007.

    Effective risk management in the development of complex aerospace systems requires balancing multiple risk components, including safety, cost, performance, and schedule. Safety considerations are especially critical during system development because it is very difficult to design or "inspect" safety into a system during operation. This report describes the results of a study conducted at the request of the NASA Exploration Systems Mission Directorate (ESMD) to evaluate the usefulness of a new model of accident causation (STAMP) and STAMP-based system dynamics models in the development of new spacecraft systems. In addition to fulfilling the specific needs of ESMD, the study is part of our on-going effort to develop and refine techniques for modeling and treating organizational safety culture as a dynamic control problem.

    Risk Analysis of the NASA Independent Technical Authority by Nancy Leveson and Nicolas Dulac, with contributions by Joel Cutcher-Gershenfeld, John Carroll, Betty Barrett, and Stephen Friedenthal.

    The application of STAMP and STPA to an organizational risk analysis. The study was conducted on NASA Space Shuttle operations after the Columbia loss and examined the risks involved in a new management structure, the Independent Technical Authority, created in an attempt to avoid such losses in the future. We also defined leading indicators to detect when risk is increasing in the future.

    STAMP-Based Analysis of a Refinery Overflow Accident by Nancy Leveson, Margaret Stringfellow, and John Thomas.

    The application of STAMP (CAST) to a real refinery overflow accident. The official accident report is included and the STAMP analysis is compared with the official report and conclusions.

    A Safety-Driven, Model-Based System Engineering Methodology, Part I by Margaret Stringfellow Herring, Brandon D. Owens, Nancy Leveson, Michel Ingham, and Kathryn Ann Weiss. MIT Technical Report, December 2007.

    The final report for a JPL grant to demonstrate a safety-driven, model-based system engineering methodology on a JPL spacecraft. In this methodology, safety is folded into and drives the design process rather than being conducted as a separate activity. The methodology integrates MIT's STAMP accident model and the hazard analysis method based on it (called STPA), intent specifications (a structured system engineering specification framework and model-based specification language), and JPL's State Analysis (a system modeling approach).

    A Safety-Driven, Model-Based System Engineering Methodology, Part II: Application of the Methodology to an Outer Planet Exploration Mission by Brandon D. Owens, Margaret Stringfellow Herring, Nancy Leveson, Michel Ingham, and Kathryn Ann Weiss. MIT Technical Report, December 2007.

    A sample intent specification created for an Outer Planets Explorer spacecraft as part of a safety-driven, model-based system engineering demonstration for JPL.

    Back to the top


    SECURITY PAPERS


    An Integrated Approach to Safety and Security Based on Systems Theory by William Young and Nancy Leveson. Communications of the ACM, Vol. 57, No. 2, pp. 31-35, February 2014.

    This paper describes how both safety and security can be modeled and handled by STAMP.

    Systems Thinking for Safety and Security by William Young and Nancy Leveson. ACSAC 2013.

    The fundamental challenge facing security professionals is preventing losses, be they operational, financial or mission losses. As a result, one could argue that security professionals share this challenge with safety professionals. Despite their shared challenge, there is little evidence that recent advances that enable one community to better prevent losses have been shared with the other for possible implementation. Limitations in current safety approaches have led researchers and practitioners to develop new models and techniques. These techniques could potentially benefit the field of security. This paper describes a new systems thinking approach to safety that may be suitable for meeting the challenge of securing complex systems against cyber disruptions. Systems-Theoretic Process Analysis for Security (STPA-Sec) augments traditional security approaches by introducing a top-down analysis process designed to help a multidisciplinary team consisting of security, operations, and domain experts identify and constrain the system from entering vulnerable states that lead to losses. This new framework shifts the focus of the security analysis away from threats as the proximate cause of losses and focuses instead on the broader system structure that allowed the system to enter a vulnerable system state that the threat exploits to produce the disruption leading to the loss.
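
    A minimal sketch of the top-down framing described above (the losses, vulnerable states, and constraints below are invented examples, not from the paper): the analysis starts from the losses stakeholders care about, identifies system states that make those losses possible, and derives constraints on those states, rather than starting from an inventory of threats.

```python
# Minimal sketch of a top-down, loss-oriented analysis in the spirit of STPA-Sec
# (the example losses, vulnerable states, and constraints are invented): work from
# losses down to vulnerable system states and the constraints that must hold,
# instead of starting from a list of specific threats.
losses = {
    "L1": "patients receive incorrect treatment",
    "L2": "confidential patient records are disclosed",
}

vulnerable_states = [
    # (state id, description, losses it can lead to)
    ("V1", "treatment database accepts unauthenticated write commands", ["L1"]),
    ("V2", "records are transmitted externally without access control", ["L2"]),
]

constraints = {
    "V1": "the system must reject treatment-database writes that are not authenticated and authorized",
    "V2": "records must only leave the system through access-controlled, logged interfaces",
}

for state_id, description, linked_losses in vulnerable_states:
    loss_text = "; ".join(losses[l] for l in linked_losses)
    print(f"{state_id}: {description}")
    print(f"    can lead to: {loss_text}")
    print(f"    constraint:  {constraints[state_id]}")
```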

    Back to the top