Engineering has experienced a technological revolution, but the basic engineering techniques applied in safety and reliability engineering, created in a simpler, analog world, have changed very little over the years. This book describes a new approach to safety and risk management that is better suited to today's complex, sociotechnical, software-intensive world. The new approach is based on modern systems thinking and systems theory. It revisits and updates ideas pioneered by 1950s aerospace engineers in their System Safety concept. The new approach has now been used extensively on real-world systems and it is proving to be more effective, less expensive, and easier to use.
The book describes a new model of causation (STAMP or Systems-Theoretic
Accident Model and Processes) that can be used to improve the design,
This 1995 book examines past accidents and what is currently known about building safe electromechanical systems to see what lessons can be applied to new computer-controlled systems. Most accidents are not the result of unknown scientific principles but rather of a failure to apply well-known, standard engineering practices. In addition, accidents will not be prevented by technological fixes alone, but will require control of all aspects of the development and operation of the system. A methodology for building safety-critical systems is outlined. While this book predates STAMP, it does lay the foundation for it.
Technical and Managerial Factors in the NASA Challenger and Columbia Losses: Looking Forward to the Future by Nancy Leveson, in Handelsman and Kleinman (editors), Controversies in Science and Technology, University of Wisconsin Press, 2007.
This essay examines the technical and organizational factors leading to the Challenger and Columbia accidents and what we can learn from them. While accidents are often described in terms of a chain of directly related events leading to a loss, examining this event chain does not explain why the events themselves occurred. In fact, accidents are better conceived as complex processes involving indirect and non-linear interactions among people, societal and organizational structures, engineering activities, and physical system components. They are rarely the result of a chance occurrence of random events, but usually result from the migration of a system (organization) toward a state of high risk where almost any deviation will result in a loss. Understanding enough about the Challenger and Columbia accidents to prevent future ones, therefore, requires not only determining what was wrong at the time of the losses, but also why the high standards of the Apollo program deteriorated over time and allowed the conditions cited by the Rogers Commission as the root causes of the Challenger loss and why the fixes instituted after Challenger became ineffective over time, i.e., why the manned space program has a tendency to migrate to states of such high risk and poor decision-making processes that an accident becomes almost inevitable.
Software and the Challenge of Flight Control by Nancy Leveson. To appear as a chapter in Space Shuttle Legacy: How We Did It/What We Learned edited by Roger Launius, James Craig, and John Krige and to be published by AIAA in 2013.
Not related to STAMP, but may be of interest to those interested in the risks of software. This is a chapter I wrote for a forthcoming book on the legacy of the Space Shuttle. This chapter describes the challenges NASA faced in creating the Space Shuttle software (and for Gemini and Apollo before that). Despite those incredible challenges, the Shuttle software is remarkably good. This chapter explains why I think that was so and what we can learn about developing software today. In many ways, software engineering is moving in the opposite direction from the practices that made this software so successful.
A Systems Approach to Risk Management Through Leading Safety Indicators by Nancy Leveson, Journal of Reliability Engineering and System Safety, in press.
The goal of leading indicators for safety is to identify the potential for an accident before it occurs. Past efforts have focused on identifying general leading indicators, such as maintenance backlog, that apply widely in an industry or even across industries. Other recommendations produce more system-specific leading indicators, but start from system hazard analysis and thus are limited by the causes considered by the traditional hazard analysis techniques. Most rely on quantitative metrics, often based on probabilistic risk assessments. This paper describes a new and different approach to identifying system-specific leading indicators and provides guidance in designing a risk management structure to generate, monitor and use the results. The approach is based on the STAMP (System-Theoretic Accident Model and Processes) model of accident causation and tools that have been designed to build on that model. STAMP extends current accident causality to include more complex causes than simply component failures and chains of failure events or deviations from operational expectations. It incorporates basic principles of systems thinking and is based on systems theory rather than traditional reliability theory.
Applying Systems Thinking to Analyze and Learn from Events by Nancy Leveson, presented at NeTWorK 2008: Event Analysis and Learning from Events, Berlin, August 2008 and later published in Safety Science, Vol. 49, No. 1, January 2010, pp. 55-64.
Why don't the approaches we use to learn from events, most of which go back for decades and have been incrementally improved over time, work well in today's world? Maybe the answer can be found by reexamining the underlying assumptions and paradigms in safety and identifying any potential disconnects with the world as it exists today. While abstractions and simplifications are useful in dealing with complex systems and problems, those that are counter to reality can hinder us from making forward progress. Most of the new research in this field never questions these assumptions and paradigms. It is important to devote some effort to examining our foundations, which is what I try to do in this paper. There are too many beliefs in accident analysis---starting with the assumption that analyzing events and learning from them is adequate---that are accepted without question.
A New Accident Model for Engineering Safer Systems by Nancy Leveson. Safety Science, Vol. 42, No. 4, April 2004.
A new model of accidents is proposed based on systems theory. Systems are viewed as interrelated components that are kept in a state of dynamic equilibrium by feedback loops of information and control. Accidents result from inadequate control or enforcement of safety-related constraints on the system. Instead of defining safety management in terms of preventing component failure events, it is defined as a continuous control task to impose the constraints necessary to limit system behavior to safe changes and adaptations. Accidents can be understood, using this model, in terms of why the controls that were in place did not prevent or detect maladaptive changes, that is, by identifying the safety constraints that were violated and determining why the controls were inadequate in enforcing them. This model provides a theoretical foundation for the introduction of unique new types of accident analysis, hazard analysis, design for safety, risk assessment techniques, and approaches to designing performance monitoring and safety metrics.
Moving Beyond Normal Accidents and High Reliability Organizations: An Alternative Approach to Safety in Complex Systems by Nancy Leveson, Karen Marais, Nicolas Dulac, and John Carroll, Organization Studies, Vol. 30, Feb/Mar 2009, Sage Publishers, pp. 227-249.
Organizational factors play a role in all accidents and are a critical part of understanding and preventing them. Two prominent sociological schools of thought have addressed the organizational aspects of safety: Normal Accident Theory and High Reliability Organizations (HRO). In this paper, we argue that the conclusions of HRO researchers are limited in their applicability and usefulness to complex, high-risk systems and that following some of the recommendations could actually contribute to accidents. Normal Accident Theory, on the other hand, does recognize the difficulties involved but is unnecessarily pessimistic about the possibility of effectively dealing with them. An alternative systems approach to safety is described.
Safety Assurance in NextGen and Complex Transportation Systems by Cody Harrison Fleming, Melissa Spencer, John Thomas, Nancy Leveson, and Chris Wilkinson, Safety Science, in press.
The methods currently used to assure the safety of planned changes in our air transportation systems were developed 50 years ago for systems composed primarily of hardware components and of much less complexity than the systems we are building today. These methods are not powerful enough to handle the complex, human- and software-intensive systems being planned and introduced today. This paper describes an alternative and demonstrates it on a new NextGen procedure to allow more flight level changes over oceanic and other regions with limited radar coverage. The new approach and results are compared with the results obtained by the more traditional methods being used for NextGen.
Hazard Analysis of Complex Spacecraft using Systems Theoretic Process Analysis by Takuto Ishimatsu, Nancy G. Leveson, John Thomas, Cody Fleming, Masafumi Katahira, Yuko Miyamoto, Ryo Ujiie, Haruka Nakao, and Nobuyuki Hoshino, AIAA Journal of Spacecraft and Rockets, in press, 2013.
A new hazard analysis technique, called System-Theoretic Process Analysis (STPA), is capable of identifying potentially hazardous design flaws, including software and system design errors and unsafe interactions among multiple system components. Detailed procedures for performing the hazard analysis were developed and the feasibility and utility of using it on complex systems was demonstrated by applying it to the Japanese Aerospace Exploration Agency H-II Transfer Vehicle. In a comparison of the results of this new hazard analysis technique to those of the standard fault tree analysis used in the design and certification of the H-II Transfer Vehicle, System-Theoretic Hazard Analysis found all the hazardous scenarios identified in the fault tree analysis as well as additional causal factors that had not been identified by fault tree analysis.
Drawbacks in Using the Term "Systems of Systems," by Nancy Leveson, Journal of Biomedical Instrumentation and Technology, March/April 2013.
This essay was written after attending an AAMI/FDA meeting on interoperability in medical devices. In it I express my puzzlement over the term system-of-systems and why it is misleading and may lead to dead ends in solving system safety problems.
The Use of Safety Cases in Certification and Regulation by Nancy Leveson. An earlier version of this paper appeared in the Journal of System Safety, Nov/Dec 2011. The version here is updated from that version and includes more material.
Starting with my involvement with the Presidential Oil Spill Commission (on Deepwater Horizon), I started studying the engineering and law literature and have become concerned by the push to use safety cases in the certification of many industries in the U.S. This paper describes what I have learned and my conclusions about the dangers of this approach.
Applying System Engineering to Pharmaceutical Safety by Nancy Leveson, Matthieu Couturier, John Thomas, Meghan Dierks, David Wierz, Bruce Psaty, Stan Finkelstein. Journal of Healthcare Engineering, Sept. 2012.
While engineering techniques are used in the development of medical devices and have been applied to individual healthcare processes, such as the use of checklists in surgery and ICUs, the application of system engineering techniques to larger healthcare systems is less common. System safety is the part of system engineering that uses modeling and analysis to identify hazards and to design the system to eliminate or control them. In this paper, we demonstrate how to apply a new safety engineering static and dynamic modeling and analysis approach to healthcare systems. Pharmaceutical safety is used as the example in the paper, but the same approach is potentially applicable to other complex healthcare systems.
Software Challenges in Achieving Space Safety by Nancy Leveson. Journal of the British Interplanetary Society, Vol. 62, 2009.
Techniques developed for hardware reliability and safety do not work on software-intensive systems; software does not satisfy the assumptions underlying these techniques. The new problems and why the current approaches are not effective for complex, software-intensive systems are first described. Then a new approach to hazard analysis and safety-driven design is presented. Rather than being based on reliability theory, as most current safety engineering techniques are, the new approach builds on system and control theory.
A Systems-Theoretic Approach to Safety in Software-Intensive Systems by Nancy Leveson. IEEE Trans. on Dependable and Secure Computing, January 2005.
Traditional accident models were devised to explain losses caused by failures of physical devices in relatively simple systems. They are less useful for explaining accidents in software-intensive systems and for non-technical aspects of safety such as organizational culture and human decision-making. This paper describes how systems theory can be used to form new accident models that better explain system accidents (accidents arising from the interactions among components rather than individual component failure), software-related accidents, and the role of human decision-making. Such models consider the social and technical aspects of systems as one integrated process and may be useful for other emergent system properties such as security. The loss of a Milstar satellite being launched by a Titan/Centaur launch vehicle is used as an illustration of the approach.
Application of STPA to the Integration of Multiple Control Systems: A Case Study and New Approach, by Matthew Seth Placke, Master's Thesis, Engineering Systems Division, MIT, June 2014
Abstract: A new approach for analyzing multiple control systems within the STPA framework is developed and demonstrated. The new approach meets the growing need of system engineers to analyze integrated control systems that may or may not have been developed in a coordinated manner, and to assess them for safety and performance. This need comes from the increasing proliferation of embedded control systems across domains including defense, energy, healthcare, automotive, aerospace, and consumer products. When multiple embedded control systems are integrated together, they have the potential to operate in uncoordinated and conflicting ways which might hinder their performance and lead to unsafe behavior.
Extending the Human-Controller Methodology in Systems-Theoretic Process Analysis (STPA), by Cameron L. Thornberry, Master's Thesis, Aeronautics and Astronautics, MIT, June 2014
Abstract: Traditional hazard analysis techniques are grounded in reliability theory and analyze the human controller---if at all---in terms of estimated or calculated probabilities of failure. Characterizing sub-optimal human performance as "human error" offers limited explanation for accidents and is inadequate in improving the safety of human control in complex, automated systems such as today's aerospace systems. In an alternate approach founded on systems and control theory, Systems-Theoretic Process Analysis (STPA) is a hazard analysis technique that can be applied in order to derive causal factors related to human controllers within the context of the system and its design. The goal of this thesis was to extend the current human-controller analysis in STPA to benefit the investigation of more structured and detailed causal factors related to the human operator.
A STAMP Analysis of the LEX Comair 5191 Accident, by Paul S. Nelson, Master's Thesis, Lund University, Sweden, June 2008, supervised by Prof. Sidney Dekker.
Abstract: A new view, a holistic systems view, that sees individuals in systems, is growing. It is a view which sees "human error is an effect of trouble deeper inside the system... [where] we must turn to the system in which people work: the design of equipment, the usefulness of procedures, the existence of goal conflicts and production pressure." (Dekker, 2007) A new accident model with a holistic systems perspective is used for analysis of the Comair 5191 accident in Lexington, KY on August 27, 2006. The new model is called Systems-Theoretic Accident Modeling and Processes (STAMP). It incorporates three basic components: constraints, hierarchical levels of control, and process loops. Accidents are understood "in terms of why the controls that were in place did not prevent or detect maladaptive changes, that is, by identifying the safety constraints that were violated and determining why the controls were inadequate in enforcing them." This STAMP analysis of the 5191 accident illustrates the usefulness of the STAMP model to foster evaluation of the whole system and uncover useful levers for elimination of future loss potential, thereby making progress on safety.
Comparison of SOAM and STAMP for ATM Incident Investigation, by Richard Arnold, Master's Thesis, Lund University, Sweden, 2009, supervised by Prof. Sidney Dekker.
Abstract: Systemic Occurrence Analysis Methodology (SOAM) is promoted by Eurocontrol for the analysis of Air Traffic Management (ATM) occurrences. Systems Theoretic Accident Model and Process (STAMP), based on systems theory, has been defined by Professor Nancy Leveson (MIT) to explain systems accidents (accidents arising from the interactions among components rather than individual component failure). This research analyzes an ATM occurrence using SOAM and STAMP and compares their usefulness in identifying systemic countermeasures. The results show that SOAM is a useful heuristic and a powerful communication device but that it is weak with respect to emergent phenomena and non-linear interactions. SOAM directs the investigator to consider the context in which the events occurred, the barriers that failed, and organizational factors (the "holes in the Swiss cheese"), but not the processes which created them, or how the whole system can migrate towards the boundaries of safe operations. STAMP directs the investigator more deeply into the mechanism of the interactions between system components, and how systems adapt over time. STAMP helps identify the controls and constraints necessary to prevent undesirable interactions between system components. STAMP also directs the investigation through a structured analysis of the upper levels of the system's control structure, which helps to identify high-level systemic countermeasures. The global ATM system is undergoing a period of rapid technological and political change. In Europe the Single European Sky ATM Research (SESAR) and in the US the NextGen programs mean that ATM is moving from centralized human-controlled systems to semi-automated distributed decision making. Continuous Descent Arrivals flown on datalinked 4D flight paths that are tailored to local constraints and timed for merging traffic require digital information sharing and Collaborative Decision Making on a grand scale, as well as Functional Airspace Blocks designed for optimal airspace efficiency and safety. Detailed new systemic models like STAMP are now necessary to prevent undesirable interactions between normally functioning system components and to understand changes over time in increasingly complex ATM systems.
A CAST Analysis of a U.S. Coast Guard Aviation Mishap, by Jon Hickey, MIT Master's Thesis, May 2012, supervised by Dr. Qi van Eikema Hommes.
Abstract: During a 22-month period, between 2008 and 2010, the U.S. Coast Guard experienced seven Class-A aviation mishaps resulting in the loss of 14 Coast Guard aviators and seven Coast Guard aircraft. This represents the highest Class-A aviation mishap rate the Coast Guard has experienced in 30 years. Following each Class-A mishap, the Coast Guard conducted Mishap Analysis Boards (MAB) in accordance with Coast Guard aviation policy. A MAB involves a detailed investigation and report on the causal and contributing factors of a specific mishap and is conducted in accordance with the Department of Defense Human Factors Analysis and Classification System (DOD HFACS) which is based on the "Swiss Cheese" accident causal analysis model. Individual MAB results did not identify common causal or contributing factors that may be causing systemic failures within the aviation safety system. Subsequently, the Coast Guard completed a more system-focused safety analysis known as the Aviation Safety Assessment Action Plan (ASAAP) comprised of five components: 1) Operational Hazard Analysis; 2) Aviation Safety Survey; 3) Aviation Leadership Improvement Study; 4) Independent Data Analysis Study; and 5) Industry Benchmarking Study. ASAAP recently concluded "complacency in the cockpit and chain of command as the leading environmental factor in the rash of serious aviation mishaps." Although the ASAAP study examined Coast Guard aviation more holistically than individual MABs, it did not apply systems theory and systems engineering approaches.
Application of CAST and STPA to Railroad Safety, by Airong Dong, MIT Master's Thesis, May 2012.
Abstract: The accident analysis method called STAMP (System-Theoretic Accident Model), developed by Prof. Nancy Leveson from MIT, was used here to re-analyze a High Speed Train accident in China. On July 23rd, 2011, 40 people were killed and 120 injured on the Yong-Wen High Speed Line. The purpose of this new analysis was to apply the broader view suggested by STAMP, considering the whole sociotechnological system and not only equipment failures and operators' mistakes, in order to come up with new findings, conclusions and recommendations for the High Speed Train System in China.
Engineering Financial Safety: A System-Theoretic Case Study from the Financial Crisis, by Melissa Spencer, MIT TPP (Technology and Policy Program) Master's Thesis, May 2012.
Abstract: There is currently much systems-based thinking going into understanding safety in complex socio-technical systems and in developing useful accident analysis methods. However, when it comes to complex systems without clear physical components, the techniques for understanding accidents are antiquated and ineffective. This thesis uses a promising new engineering-based accident analysis methodology, CAST (Causal Analysis using STAMP, or Systems Theoretic Accident Models and Processes), to understand an aspect of the financial crisis of 2007-2008.
A Systems Theoretic Application to Design for the Safety of Medical Diagnostic Devices, by Vincent Balgos, MIT SDM Master's Thesis, February 2012, supervised by Dr. Qi van Eikema Hommes.
Abstract: In today's environment, medical technology is rapidly advancing to deliver tremendous value to physicians, nurses, and medical staff in order to support them to ultimately serve a common goal: provide safe and effective medical care for patients. However, these complex medical systems are contributing to the increasing number of healthcare accidents each year. These accidents present unnecessary risk and injury to the very population these systems are designed to help. Thus the current safety engineering techniques that are widely practiced by the healthcare industry during medical system development are inadequate in preventing these tragic accidents. Therefore, there is a need for a new approach to design safety into medical systems.
A Systems Approach to Food Accident Analysis, by John Helferich, MIT SDM Thesis, May 2012. This thesis won the "Best SDM Master's Thesis" award at MIT last year for the System Design and Management Program.
Abstract: Foodborne illnesses lead to 3000 deaths per year in the United States. Some industries, such as aviation, have made great strides in increasing safety through careful accident analysis leading to changes in industry practices. In the food industry, the current methods of accident analysis are grounded in regulations developed when the food industry was far simpler than today. The food industry has become more complex with international supply chains and a consumer desire for fresher food. This thesis demonstrates that application of a system theoretic accident analysis method, CAST, results in more learning than the current method of accident analysis. This increased learning will lead to improved safety performance in the food production system.
of a System Safety Framework in Hybrid Socio-Technical Environment
Abstract: The political transformation and transition of post-Soviet societies have led to hybrid structures in political, economic and technological domains. In such hybrid structures the roles of government, state enterprise, private business and civil society are not clearly defined. These roles shift depending on formal and informal interests, availability and competition for limited resources, direct and indirect financial benefits, internal and external agendas. In an abstract sense, a hybrid is "anything derived from heterogeneous sources, or composed of elements of different or incongruous kinds." If transition is a process from one state to another, hybrid is a state unto itself. In the context of this thesis Hybrid Socio-Technical Environment means the co-existence of different institutions and policies, state and private business entities, old and new technologies, managerial models and practices of planning and market economies, collectivist and individualist value systems.
Developing System-Based Leading Indicators for Proactive Risk Management in the Chemical Processing Industry by Ibrahim Khawaji, MIT ESD Master's Thesis, May 2012.
Abstract: The chemical processing industry has faced challenges with achieving improvements in safety performance, and accidents continue to occur. When accidents occur, they usually have a confluence of multiple factors, suggesting that there are underlying complex systemic problems. Moreover, accident investigations often reveal that accidents were preventable and that many of the problems were known prior to those accidents, suggesting that there may have been early warning signs.
Integrating Safety into an Engineering Contractor's System Engineering Process using the Guidelines of STAMP, by Lorena Pelegrin, Master's Thesis, Heriot-Watt University, August 2012.
Abstract: "Engineering Contractor" (EC) is a group of engineering and consulting companies providing services worldwide in the fields of oil and gas, water and environment, energy and climate protection, and transport and structures. The present thesis was proposed because there is currently no consolidated system engineering process that systematically includes designing for safety, and because the top management of EC has understood the responsibility of EC in the safety of the systems they engineer.
A System Theoretic Safety Analysis of Friendly Fire Prevention in Ground Based Missile Systems, by Scott McCarthy, MIT SDM Master's Thesis, January 2013.
Abstract: This thesis uses STAMP to analyze a friendly fire accident that occurred on 22 March 2003 between a British Tornado aircraft and a US Patriot Missile battery. This causation model analyzes system constraints, control loops, and process models to identify inadequate control structures leading to hazards and preventative measures that may be taken to reduce the effects of these hazards. By using a system-based causation model like STAMP, rather than a traditional chain of events model, this thesis aimed to identify systemic factors and component interactions that may have contributed to the accident, rather than simply analyzing component failures. Additionally, care was taken to understand the rationale for decisions that were made, rather than assigning blame. The analysis identified a number of areas in which control flaws or inadequacies led to the friendly fire incident. A set of recommendations was developed that may help to prevent similar accidents from occurring in the future.
Safety-Driven Early Concept Analysis and Development by Cody Harrison Fleming, MIT Ph.D. Dissertation, January 2015.
Abstract: As aerospace systems become increasingly complex and the roles of human operators and autonomous software continue to evolve, traditional safety-related analytical methods are becoming inadequate. Traditional hazard analysis tools are based on an accident causality model that does not capture many of the complex behaviors found in modern engineered systems. Additionally, these traditional approaches are most effective during late stages of system development, when detailed design information is available. However, system safety cannot cost-effectively be assured by discovering problems at these late stages and adding expensive updates to the design. Rather, safety should be designed into the system from its very conception. The primary barrier to achieving this objective is the lack of effectiveness of the existing analytical tools during early concept development.
Extending and Automating a Systems-Theoretic Hazard Analysis for Requirements Generation and Analysis by John Thomas, MIT Ph.D. Dissertation, June 2013.
Accident Analysis and Hazard Analysis for Human and Organizational Factors by Margaret Stringfellow, October 2010.
A Framework for Dynamic Safety and Risk Management Modeling in Complex Systems by Nicolas Dulac, February 2007.
Almost all traditional hazard analysis or risk assessment techniques, such as failure modes and effect analysis (FMEA), fault tree analysis (FTA), and probabilistic risk analysis (PRA) rely on a chain-of-event paradigm of accident causation. Event-based techniques have some limitations for the study of modern engineering systems. Specifically, they are not suited to handle complex software-intensive systems, complex human-machine interactions, and systems-of-systems with distributed decision-making that cut across both physical and organizational boundaries. [...]
Development of a Systematic Risk Management Approach for CO2 Capture, Transport, and Storage Projects by Jaleh Samadi, L'Ecole Nationale Superieure des Mines de Paris Ph.D. dissertation, December, 2012
Abstract: A systematic risk management framework for CO2 capture, transport, and storage projects is proposed. The approach is founded on the concepts of system thinking, STAMP, STPA, and system dynamics. The objective is to provide a means of decision making for these types of projects in the actual context where the future of the technology is uncertain.
Systems Theoretic Hazard Analysis (STPA) Applied to the Risk Review of Complex Systems: An Example from the Medical Device Industry by Blandine Antoine, MIT Ph.D. dissertation, December, 2012
Abstract: Methods developed by system engineers could beneficially be applied to the challenge of ensuring patient safety in health care delivery. Achieving safe operations in this and other settings requires that system behavior be bound by safety constraints. These must be defined and enforced at every stage of system design, system operations, and, when applicable, system retirement.
A System-Theoretic Hazard Analysis Methodology for a Non-advocate Safety Assessment of the Ballistic Missile Defense System by Steve Pereira, Grady Lee, and Jeffrey Howard. Proceedings of the 2006 AIAA Missile Sciences Conference, Monterey, CA, November 2006.
The Missile Defense Agency (MDA) is developing the Ballistic Missile Defense System (BMDS) as a layered defense to defeat all ranges of threats in all phases of flight (boost, midcourse, and terminal). The BMDS integrates into a single system a number of Elements that had been developed independently, such as SBIRS/DSP, Aegis BMD, and Ground-based Midcourse Defense (GMD). The Elements of the BMDS have active safety programs, but complexity, coupling, and safety risk are introduced by their integration into a single system. Assessing the safety of the integrated BMDS required analysts to come up to speed using existing Element project documentation, assess the safety risk of the system, and make recommendations regarding hazard mitigation and risk acceptance. This effort often required conducting hazard analyses to supplement existing Element analysis work; working with existing engineering artifacts; and making recommendations for hazard mitigations late in the system life cycle, when there is less flexibility for design changes. This paper presents a safety assessment methodology based on STPA (a systems-theoretic hazard analysis); the assessment methodology provides an organized, methodical, and effective means to assess safety risk and develop appropriate hazard mitigations regardless of where in the life cycle the assessment is started.
Assuring Safety of NextGen Procedures by Cody H. Fleming, Nancy G. Leveson, M. Seth Placke. Presented at the Tenth USA/Europe Air Traffic Management Research and Development Seminar (ATM2013).
This paper introduces an innovative approach to analyzing safety in the next generation of air traffic management systems. The proposed method is based on systems and control theory and is able to capture system design and component interaction causes that are increasingly frequent in accidents. The new methodology is applicable during the entire design lifecycle from early concept selection through final certification. Hazard analysis of a completed NextGen concept, In-Trail Procedure, is demonstrated as well as use in the early concept development of Trajectory Based Operations.
Modeling and Hazard Analysis using STPA by Takuto Ishimatsu, Nancy Leveson, John Thomas, Masa Katahira, Yuko Miyamoto, Haruka Nakao. Presented at the Conference of the International Association for the Advancement of Space Safety, Huntsville, Alabama, May 2010.
A joint research project between MIT and JAXA/JAMSS is investigating the application of a new hazard analysis technique, called STPA, to the system and software in the HTV. STPA is based on systems theory rather than reliability theory. It treats safety as a control problem rather than a failure problem. Traditional hazard analysis focuses on component failures but software does not fail in this way. Software most often contributes to accidents by commanding the spacecraft into an unsafe state (e.g., turning off the descent engines prematurely) or by not issuing required commands. That makes the standard hazard analysis techniques of limited usefulness on software-intensive systems, which describes most spacecraft built today.
Multiple Controller Contributions to Hazards by Takuto Ishimatsu, Nancy Leveson, Cody Fleming, Masa Katahira, Yuko Miyamoto, and Haruka Nakao. This paper was presented at the Conference of the International Association for the Advancement of Space Safety, Versailles, France, October 2011.
One contributor to hazards in complex systems arises out of unsafe interactions among multiple controllers. The basic problem is that in complex systems, hazards can be created by interactions among components that are each operating "correctly." STPA is a new hazard analysis that includes both system hazards caused by component failures (as do the traditional analysis techniques) and also those caused by unsafe interactions among components that may not have individually failed. The first descriptions of STPA, however, did not include examples of how to handle potential problems that occur between multiple controllers. We have created an approach to identify possible unsafe interactions among multiple controllers so that the system can be designed to eliminate any ambiguity or potential for unsafe controller interactions. In this paper, we describe the analysis technique and demonstrate its use for the HTV during the critical approach phase. Once these hazardous interactions are identified, they can then be eliminated or controlled through system design or operational procedures.
Safety-Guided Design of Crew Return Vehicle in the Concept Design Phase using STAMP/STPA by Haruka Nakao, Masa Katahira, Yuko Miyamoto, and Nancy Leveson. This paper was presented at the Conference of the International Association for the Advancement of Space Safety, Versailles, France, October 2011.
In the concept development and design phase of a new space system, such as a Crew Vehicle, designers tend to focus on how to implement new technology. Designers also consider the difficulty of using the new technology and trade off several system design candidates. Then they choose an optimal design from the candidates. Safety should be a key aspect driving optimal concept design. However, in past concept design activities, safety analysis such as FTA has not been used to drive the design because such analysis techniques focus on component failure, and component failure cannot be considered in the concept design phase.
A Hazard Analysis Based Approach to Improve the Landing Safety of a Blended-wing-body Remotely Piloted Vehicle. This paper was written by Lu Yi, Zhang Shuguang, and Li Xue-qing from the School of Transportation Science, Beihang University, Beijing.
This paper describes the use of STPA to investigate the cause of an unexpected landing safety problem in
A System Theoretic Analysis of the "7.23" Yong-Tai-Wen Railway Accident. This paper, by Dajiang Suo from the Computer Science and Technology Dept., Tsinghua University, Beijing, China, was presented at the 1st STAMP/STPA Workshop held at MIT on April 26-28, 2012.
This paper analyzes the "7.23" Yongwen Railway accident in China from a system theoretic perspective. In particular, the STAMP safety control structure for this accident has been constructed and divided into two processes, system development and operation, each of which is then analyzed at every level. Furthermore, to understand why and how the system evolved over time, system dynamics models are constructed to describe the changes indirectly leading to the accident. This analysis raises some questions that are not included in the investigation report but are critical to a comprehensive understanding of the accident. Based on the analysis results, recommendations are generated aimed at preventing the same kind of accident in the future.
Application of a Safety-Driven Design Methodology to An Outer Planet Exploration Mission by Brandon D. Owens, Margaret Stringfellow Herring, Nicholas Dulac, Nancy Leveson, Michel Ingham, and Kathryn Ann Weiss. IEEE Aerospace Conference, Big Sky, Montana, March 2008.
A conference paper on one of our early applications of STPA and intent specifications on a JPL exploratory spacecraft. Technical reports with more details can be found below. We have evolved the techniques somewhat since this time.
Applying Systems Thinking to Aviation Psychology
A short paper suggesting ways to integrate human factors into engineering hazard analysis.
A Comparison of STPA and the ARP 4761 Safety Assessment Process by Nancy Leveson, Chris Wilkinson, Cody Fleming, John Thomas, and Ian Tracy
The goal of this report is to compare the approach widely used to assess and certify aircraft with a new, systems-theoretic hazard analysis technique called STPA and to determine whether there are important factors missing from the commonly used approach.
STPA Analysis of NextGen Interval Management Components: Ground Interval Management (GIM) and Flight Deck Interval Management (FIM) Plants by Cody Fleming, M. Seth Placke, and Nancy G. Leveson
The next generation of air traffic management systems will involve significant changes in the way air traffic control is done today. Reliance on software is increasing and allowing greater system complexity. Humans are assuming supervisory roles over automation, requiring more cognitively complex human decision-making. Control is shifting from the ground to the aircraft and shared responsibilities. In addition, coupling and interconnections between land, airborne, and space systems introduce more potential for accidents stemming from unsafe and unintended component interactions.
Evaluating the Safety of Digital Instrumentation and Control Systems in Nuclear Power Plants by John Thomas, Francisco Luiz de Lemos, and Nancy Leveson, MIT Technical Report
This final report for a U.S. Nuclear Regulatory Commission grant contains a brief STPA tutorial, a case study of STPA applied to an example Pressurized Water Reactor (PWR), an analysis of the results of the STPA analysis, and potential use of STPA in the licensing of nuclear power plants.
Safety Assurance in NextGen by Cody Fleming, Melissa Spencer, Nancy Leveson, and Chris Wilkinson (Honeywell). NASA Technical Report NASA/CR-2012-217553.
The methods currently used in assuring the safety of changes to our transportation systems were developed 50 years ago for systems composed primarily of hardware components and of much less complexity than the transportation systems we are building today. These methods are not powerful enough to handle accidents involving interactions of components (system design errors) and not just failure of individual components, software, and the cognitively complex tasks being assigned to operators today. More powerful methods are needed.
Demonstration of a New Dynamic Approach to Risk Analysis for NASA's Constellation Program by Nicolas Dulac, Brandon Owens, Nancy Leveson, Betty Barrett, John Carroll, Joel Cutcher-Gershenfeld, Stephen Friedenthal, Joseph Laracy, and Joseph Sussman. Final Report of the NASA Exploration Systems Mission Directorate Associate Administrator, March 2007.
Effective risk management in the development of complex aerospace systems requires the balancing of multiple risk components including safety, cost, performance, and schedule. Safety considerations are especially critical during system development because it is very difficult to design or "inspect" safety into a system during operation. This report describes the results of a study conducted at the request of the NASA Exploration Systems Mission Directorate (ESMD) to evaluate the usefulness of a new model of accident causation (STAMP) and STAMP-based system dynamics models in the development of new spacecraft systems. In addition to fulfilling the specific needs of ESMD, the study is part of our on-going effort to develop and refine techniques for modeling and treating organizational safety culture as a dynamic control problem.
Risk Analysis of the NASA Independent Technical Authority by Nancy Leveson and Nicholas Dulac with contributions by Joel Cutcher-Gershenfeld, John Carroll, Betty Barrett and Stephen Friedenthal.
The application of STAMP and STPA to an organizational risk analysis. The study was conducted using NASA Space Shuttle Operations after the Columbia loss and the risks involved in a new management structure, the Independent Technical Authority, created to attempt to avoid such losses in the future. We also defined leading indicators to detect when risk is increasing in the future.
A Safety-Driven, Model-Based System Engineering Methodology, Part I by Margaret Stringfellow Herring, Brandon D. Owens, Nancy Leveson, Michel Ingham, and Kathryn Ann Weiss. MIT Technical Report, December 2007.
The final report for a JPL grant to demonstrate a safety-driven, model-based system engineering methodology on a JPL spacecraft. In this methodology, safety is folded into and drives the design process rather than being conducted as a separate activity. The methodology integrates MIT's STAMP accident model and the hazard analysis method based on it (called STPA), intent specifications (a structured system engineering specification framework and model-based specification language), and JPL's State Analysis (a system modeling approach).
A Safety-Driven, Model-Based System Engineering Methodology, Part II: Application of the Methodology to an Outer Planet Exploration Mission by Brandon D. Owens, Margaret Stringfellow Herring, Nancy Leveson, Michel Ingham, and Kathryn Ann Weiss. MIT Technical Report, December 2007.
A sample intent specification created for an Outer Planets Explorer spacecraft as part of a safety-driven, model-based system engineering demonstration for JPL.
An Integrated Approach to Safety and Security Based on Systems Theory by William Young and Nancy Leveson. Communications of the ACM, Vol. 57, No. 2, pp. 31-35, February 2014.
Systems Thinking for Safety and Security by William Young and Nancy Leveson. ACSAC 2013