Systems Theoretic Process Analysis Applied to Air Force Acquisition Technical Requirements Development by Sarah E. Summers (Major, USAF)
The Air Force experienced 12 Class A aviation mishaps in 2016, which resulted in 16 fatalities and 9 destroyed aircraft. So far in 2017, the Air Force has again experienced 12 Class A mishaps with 5 fatalities and 7 destroyed aircraft. In addition to these mishaps, development of new aircraft or modifications to aircraft often takes well over the planned duration. Developmental test identifies design deficiencies that must be addressed before the aircraft is fielded, which requires expensive and lengthy redesign cycles. A systems approach to design with humans included as part of the system can improve both the development process and aviation safety.
Safety Analysis in Early Concept Development and Requirements Generation by Nancy G. Leveson
This paper was submitted to INCOSE 2018, but has not yet been accepted. It describes how to use STPA from the very first steps in system engineering concept analysis using an aircraft braking system example.
Drawbacks in using the term "System of Systems." by Nancy G. Leveson
A short white paper on the common confusion about the term "system of systems." In systems theory (and systems thinking), a "system of systems" is simply a system. Thinking of it as something different is causing confusion and wasted energy and time. The white paper was written in frustration after attending an AAMI meeting where the confusion was rampant and leading to poor FDA oversight of medical devices, but the term is common (and confused) in almost every industry. This white paper was published as a perspective in the Journal of Biomedical Instrumentation and Technology, 2013.
CAST Analysis of the Shell Moerdijk Accident by Nancy G. Leveson
This example CAST analysis of the explosion and fire at the Shell Moerdijk chemical plant in the Netherlands in 2014 was created for an E.U. MAHB (Major Accidents Hazard Board) benchmarking exercise to compare different accident analysis techniques.
White paper on compliance of STPA with MIL-STD-882E and AMCOM 385-17 by Nancy G. Leveson
This informal paper provides a detailed analysis of the compliance of STPA with the tasks in MIL-STD-882E and Army AMCOM Regulation 385-17.
Application of Systems and Control Theory-Based Hazard Analysis to Radiation Oncology, by Todd Pawlicki, Aubrey Samost, Derek Brown, Ryan Manger, Gwe-Ya Kim, and Nancy Leveson. Journal of Medical Physics, in press, 2016.
The purpose of this paper is to investigate and demonstrate the application of STPA to radiation oncology. STPA is demonstrated on a new on-line adaptive cranial radiosurgery procedure. The results are compared with a standard FMEA that was applied to the same procedure by an independent group.
Systems-Theoretic Safety Analyses Extended for Coordination by Kip Edward Johnson, MIT Dissertation, Aeronautics and Astronautics Dept., February 2017.
When interdependent conditions exist among decision units, safety results in part from coordination. Safety analysis methods should correspondingly address coordination. However, state-of-the-art safety analysis methods have limited guidance for analytical inquiry into coordination between interdependent decision systems. This thesis presents theoretical and applied research to address the knowledge gap by extending STAMP (Systems-Theoretic Accident Model and Processes)-based analysis methods STPA (System-Theoretic Process Analysis) and CAST (Causal Analysis based on STAMP).
Systems Thinking Applied to Automation and Workplace Safety. by Nathaniel Arthur Peper, MIT Masters Thesis, June 2017.
Abstract: This thesis presents the results of a study to compare Systems-Theoretic Process Analysis (STPA), a hazard analysis methodology based on a new model of accident causation called Systems-Theoretic Accident Model and Processes (STAMP), with the traditional assessments recommended by industry standards for analyzing safety risks in modern manufacturing workplaces that are increasingly incorporating automated systems. These increasingly complex, modern socio-technical systems are introducing new problems in the manufacturing environment that traditional methods of analysis were not designed to analyze. While these traditional methods have previously proven effective at analyzing hazards, the increasing levels of complexity and technological advancement in the factories are surpassing the limits of traditional assessment capabilities. Today’s continuous search for opportunities to automate manufacturing processes makes this a critical time to ensure that the hazard analysis methodologies in use are capable of providing an effective and efficient analysis.
Monitoring Safety During Airline Operations: A Systems Approach. by Andrea Scarinci, MIT Masters Thesis, June 2017.
Abstract: Flight Operation Quality Assurance (FOQA) programs are today customary among major airlines. Technological progress has made it possible to monitor more than 1000 parameters per flight. Given the limited amount of resources an airline can allocate to analyze this amount of data, a need has emerged for more effective approaches to extract useful information out of FOQA programs.
Systems-Theoretic Process Analysis of Small Unmanned Aerial System Use at Edwards Air Force Base. by Sarah A. Folse, MIT Aeronautics and Astronautics Masters Thesis, June 2017.
Abstract: As the military moves forward with unmanned aerial vehicles, Edwards AFB must adjust its operations in order to accommodate testing these UASs in all stages of development. With a focus on Small UAS, this thesis applied Nancy Leveson’s Systems-Theoretic Process Analysis to the problem to derive system requirements and recommendations.
The Underestimated Value of Safety in Achieving Organization Goals: CAST Analysis of the Macondo Accident. by Maria Fernanda Tafur Munoz, MIT Engineering and Management Masters Thesis, June 2017.
Abstract: On April 20, 2010, an explosion on the rig Deepwater Horizon, which was performing drilling operations on the Macondo Prospect Well in the Gulf of Mexico, led to the largest oil spill in the history of the petroleum industry. Eleven crewmembers lost their lives and around 4.9 million barrels of oil were discharged into the ocean until the continuous subsea blowout of the well was contained on September 19, 2010.
Engineering for Humans: A New Extension to STPA by Megan Elizabeth France, MIT Aeronautics and Astronautics Masters Thesis, June 2017.
Abstract: From space shuttles to airplanes to everyday automobiles, today’s systems are increasingly complex, and increasingly connected. In order to ensure that increased complexity does not simply bring an increased number of accidents, this new complexity demands new safety analysis tools.
Systems-Theoretic Accident Model and Processes (STAMP) Applied to a U.S. Coast Guard Buoy Tender Integrated Control System. by Paul D. Stukus, MIT SDM Masters Thesis, June 2017.
Abstract: The Systems-Theoretic Accident Model and Processes (STAMP) developed by MIT’s Dr. Nancy Leveson was applied in this thesis to a ship navigation control system used on U.S. Coast Guard buoy tenders.
Learning from Accidents that are a consequence of complex systems. by John Thomas and Shem Malmquist, ISASI Conference
Abstract: As the technical and non-technical systems we are building become increasingly complex, the causes of accidents are also becoming more complex. It is becoming more and more difficult to isolate a single or even a few obvious root causes among the abundance of direct and indirect factors that contribute to modern accidents. There is also a growing recognition of the need to better understand human behaviors that contribute to accidents: why might it have made sense at the time for these people to do what they did? Unfortunately, there are few methods to systematically pose and answer these questions and it can be easy to simply treat human error as a conclusion rather than a potential indication of deeper trouble. In addition, the importance of systemic factors, organizational issues, and other high-level factors is widely accepted but there are still few systematic and rigorous methods that can be applied broadly across the entire sociotechnical system including interconnected technical, human, organizational, regulatory, and other issues.
Systems-Theoretic Process Analysis and Safety-Guided Design of Military Systems by David Craig Horney, MIT Aeronautics and Astronautics Masters Thesis, June 2017.
Abstract: Increasingly complex software-enabled systems demand a new hazard analysis and safety-guided design technique in order to meet stringent safety standards and expectations. System Theoretic Process Analysis (STPA) proves to be a powerful tool to identify, describe, and help mitigate hazards from the earliest conceptual development through the operations of a system. A future military aircraft example demonstrates STPA’s applicability for preliminary hazard analysis, analysis of alternatives, organizational design, developmental test, and into operations. STPA is a hazard analysis framework that helps manage risks and safety responsibilities throughout the entire lifecycle of a system.
Engineering a Safer World: Applying Systems Thinking to Safety by Nancy Leveson. Published by MIT Press (January 2012).
Engineering has experienced a technological revolution, but the basic engineering techniques applied in safety and reliability engineering, created in a simpler, analog world, have changed very little over the years. This book describes a new approach to safety and risk management that is better suited to today's complex, sociotechnical, software-intensive world. The new approach is based on modern systems thinking and systems theory. It revisits and updates ideas pioneered by 1950s aerospace engineers in their System Safety concept. The new approach has now been used extensively on real-world systems and it is proving to be more effective, less expensive, and easier to use.
The beginning of a primer for using STPA. Some advanced topics are missing but instruction and examples are provided for basic analyses. The Primer is a so-far unpublished supplement to Engineering A Safer World. MIT Press will publish it when it is finished, but like the original book, it may take a few years to finish.
This 1995 book examines past accidents and what is currently known about building safe electromechanical systems to see what lessons can be applied to new computer-controlled systems. Most accidents are not the result of unknown scientific principles but rather of a failure to apply well-known, standard engineering practices. In addition, accidents will not be prevented by technological fixes alone, but will require control of all aspects of the development and operation of the system. A methodology for building safety-critical systems is outlined. While this book predates STAMP, it does lay the foundation for it.
Applying Systems Thinking to Aviation Psychology by Nancy Leveson, in M.A. Vidulich, P.S. Tsang, and J.M. Flach, Advances in Aviation Psychology: Volume 1, Ashgate Publishing, 2014.
Hazard analysis is at the heart of system safety. But most of the current widely used hazard analysis techniques either exclude humans or treat them superficially. STAMP and an associated new hazard analysis method called System-Theoretic Process Analysis (STPA) are described along with the resulting implications for more sophisticated handling of humans in engineering analysis and design. Proposed changes to ATC (NextGen) are used as an example. Finally, open questions are described in which the aviation psychology community could provide important contributions.
Technical and Managerial Factors in the NASA Challenger and Columbia Losses: Looking Forward to the Future by Nancy Leveson, in Handelsman and Kleinman (editors), Controversies in Science and Technology, University of Wisconsin Press, 2007.
This essay examines the technical and organizational factors leading to the Challenger and Columbia accidents and what we can learn from them. While accidents are often described in terms of a chain of directly related events leading to a loss, examining this event chain does not explain why the events themselves occurred. In fact, accidents are better conceived as complex processes involving indirect and non-linear interactions among people, societal and organizational structures, engineering activities, and physical system components. They are rarely the result of a chance occurrence of random events, but usually result from the migration of a system (organization) toward a state of high risk where almost any deviation will result in a loss. Understanding enough about the Challenger and Columbia accidents to prevent future ones, therefore, requires not only determining what was wrong at the time of the losses, but also why the high standards of the Apollo program deteriorated over time and allowed the conditions cited by the Rogers Commission as the root causes of the Challenger loss and why the fixes instituted after Challenger became ineffective over time, i.e., why the manned space program has a tendency to migrate to states of such high risk and poor decision-making processes that an accident becomes almost inevitable.
Software and the Challenge of Flight Control by Nancy Leveson. To appear as a chapter in Space Shuttle Legacy: How We Did It/What We Learned, edited by Roger Launius, James Craig, and John Krige, and to be published by AIAA in 2013.
Not related to STAMP, but may be of interest to those interested in the risks of software. This is a chapter I wrote for a forthcoming book on the legacy of the Space Shuttle. This chapter describes the challenges NASA faced in creating the Space Shuttle software (and for Gemini and Apollo before that). Despite these incredible challenges, the Shuttle software is remarkably good. This chapter explains why I think that was so and what we can learn about developing software today. In many ways, software engineering is moving in the opposite direction from the practices that made this software so successful.
A Systems Approach to Risk Management Through Leading Safety Indicators by Nancy Leveson, Journal of Reliability Engineering and System Safety, in press.
The goal of leading indicators for safety is to identify the potential for an accident before it occurs. Past efforts have focused on identifying general leading indicators, such as maintenance backlog, that apply widely in an industry or even across industries. Other recommendations produce more system-specific leading indicators, but start from system hazard analysis and thus are limited by the causes considered by the traditional hazard analysis techniques. Most rely on quantitative metrics, often based on probabilistic risk assessments. This paper describes a new and different approach to identifying system-specific leading indicators and provides guidance in designing a risk management structure to generate, monitor and use the results. The approach is based on the STAMP (System-Theoretic Accident Model and Processes) model of accident causation and tools that have been designed to build on that model. STAMP extends current accident causality to include more complex causes than simply component failures and chains of failure events or deviations from operational expectations. It incorporates basic principles of systems thinking and is based on systems theory rather than traditional reliability theory.
Applying Systems Thinking to Analyze and Learn from Events by Nancy Leveson, presented at NeTWorK 2008: Event Analysis and Learning from Events, Berlin, August 2008 and later published in Safety Science, Vol. 49, No. 1, January 2010, pp. 55-64.
Why don't the approaches we use to learn from events, most of which go back for decades and have been incrementally improved over time, work well in today's world? Maybe the answer can be found by reexamining the underlying assumptions and paradigms in safety and identifying any potential disconnects with the world as it exists today. While abstractions and simplifications are useful in dealing with complex systems and problems, those that are counter to reality can hinder us from making forward progress. Most of the new research in this field never questions these assumptions and paradigms. It is important to devote some effort to examining our foundations, which is what I try to do in this paper. There are too many beliefs in accident analysis---starting with the assumption that analyzing
The Systems Approach to Medicine: Controversy and Misconceptions by Sidney W.A. Dekker and Nancy G. Leveson. BMJ Quality and Safety, Vol. 24, No. 1, August 2014 (online version)
The 'systems approach' to patient safety in healthcare has recently led to questions about its ethics and practical utility. In this viewpoint we clarify the systems approach by examining two popular misunderstandings of it: (1) the systematization and standardization of practice, which reduces actor autonomy; and (2) an approach that seeks explanations for success and failure outside of individual people. We argue that giving people a procedure to follow and blaming the system when things go wrong both misconstrue the systems approach.
A New Accident Model for Engineering Safer Systems by Nancy Leveson. Safety Science, Vol. 42, No. 4, April 2004.
A new model of accidents is proposed based on systems theory. Systems are viewed as interrelated components that are kept in a state of dynamic equilibrium by feedback loops of information and control. Accidents result from inadequate control or enforcement of safety-related constraints on the system. Instead of defining safety management in terms of preventing component failure events, it is defined as a continuous control task to impose the constraints necessary to limit system behavior to safe changes and adaptations. Accidents can be understood, using this model, in terms of why the controls that were in place did not prevent or detect maladaptive changes, that is, by identifying the safety constraints that were violated and determining why the controls were inadequate in enforcing them. This model provides a theoretical foundation for the introduction of unique new types of accident analysis, hazard analysis, design for safety, risk assessment techniques, and approaches to designing performance monitoring and safety metrics.
An Integrated Approach to Safety and Security Based on Systems Theory by William Young and Nancy Leveson, Communications of the ACM, Vol. 57, No. 2, February 2014, pp. 31-35.
Using STAMP and STPA, an integrated and more powerful approach to safety and security is possible. This paper shows how these two emergent system properties can be integrated.
Moving Beyond Normal Accidents and High Reliability Organizations: An Alternative Approach to Safety in Complex Systems by Nancy Leveson, Karen Marais, Nicolas Dulac, and John Carroll, Organizational Studies, Vol. 30, Feb/Mar 2009, Sage Publishers, pp. 227-249.
Organizational factors play a role in all accidents and are a critical part of understanding and preventing them. Two prominent sociological schools of thought have addressed the organizational aspects of safety: Normal Accident Theory and High Reliability Organizations (HRO). In this paper, we argue that the conclusions of HRO researchers are limited in their applicability and usefulness to complex, high-risk systems and that following some of the recommendations could actually contribute to accidents. Normal Accident Theory, on the other hand, does recognize the difficulties involved but is unnecessarily pessimistic about the possibility of effectively dealing with them. An alternative systems approach to safety is described.
Application of Systems and Control Theory-Based Hazard Analysis to Radiation Oncology, by Todd Pawlicki, Aubrey Samost, Derek Brown, Ryan Manger, Gwe-Ya Kim, and Nancy Leveson. Journal of Medical Physics, in press, 2016.
The purpose of this paper is to investigate and demonstrate the application of STPA to radiation oncology. STPA is demonstrated on a new on-line adaptive cranial radiosurgery procedure. The results are compared with a standard FMEA that was applied to the same procedure by an independent group.
Rasmussen's Legacy: A Paradigm Change in Engineering for Safety. by Nancy Leveson, Applied Ergonomics, Special Issue on Reflecting on the Legacy of Jens Rasmussen, in press 2016.
This paper reflects on three applications of Rasmussen's ideas to system engineering practice: intent specifications, STAMP, and extensions of STPA to include more sophisticated human behavior in hazard analysis.
Intent Specifications: An Approach to Building Human-Centered Specifications, by Nancy Leveson, IEEE Transactions on Software Engineering, Vol. 26, No. 1, January 2000.
This paper examines and proposes an approach to writing software specifications, based on research in systems theory, cognitive psychology, and human-machine interaction. The goal is to provide specifications that support human problem solving and the tasks that humans must perform in software development and evolution. A type of specification, called intent specifications, is constructed upon this underlying foundation.
A Systems Approach to Analyzing and Preventing Hospital Adverse Events by Nancy Leveson, Aubrey Samost, Sidney Dekker, Stan Finkelstein, and Jai Raman. Journal of Patient Safety, in press, 2016
CAST is illustrated on a set of adverse cardiovascular surgery events at a large medical center. The reasons behind individual behavior were related to the design of the system involved, not negligence or incompetence on the part of individuals. The CAST results suggest ways to change the context in which decisions are made and thus improve decision making and reduce the risk of an accident.
When a Checklist is Not Enough: How to Improve Them and What Else is Needed, by Jai Raman, Aubrey Samost, Nancy Leveson, Nikola Dobrilovic, Maggie Oldham, Sidney Dekker, and Stan Finkelstein. Journal of Thoracic and Cardiovascular Surgery, in press, 2016
Checklists are being introduced to enhance patient safety in hospitals, but the results have been mixed. The goal of this research was to understand why time-outs and checklists are sometimes not effective in preventing surgical adverse events and to identify additional measures needed to reduce these events.
Improving Hazard Analysis and Certification of Integrated Modular Avionics by Cody Harrison Fleming and Nancy G. Leveson, Journal of Aerospace Information Systems, Vol. 11, No. 6, June 2014.
Integrated modular avionics systems present new opportunities and benefits for developing advanced aircraft avionics, as well as a series of challenges related to hazard analysis and certification. This paper addresses some of those challenges and proposes a new procedure for improving hazard analysis of integrated modular avionics systems. A significant objective of integrated modular avionics architectures is the ability to develop individual software applications independently and then integrate those applications onto one platform. It has been very difficult for both designers and certifiers to understand and predict how the system will behave when the applications are integrated into one system. Traditional fault-based hazard analysis techniques are limited with respect to this problem. This paper uses a different technique, called Systems-Theoretic Process Analysis (STPA), to identify hazardous behavior that emerges when individual applications are integrated. STPA is a systems-theoretic hazard analysis technique that accounts for hazardous behavior due to component interaction, including cases when the components have not failed. STPA is extended in this paper to account for behavior that emerges when software applications share data. The paper illustrates the new approach with an example that includes real-world avionics functions.
Safety Assurance in NextGen and Complex Transportation Systems by Cody Harrison Fleming, Melissa Spencer, John Thomas, Nancy Leveson, and Chris Wilkinson, Safety Science, in press.
The methods currently used to assure the safety of planned changes in our air transportation systems were developed 50 years ago for systems composed primarily of hardware components and of much less complexity than the systems we are building today. These methods are not powerful enough to handle the complex, human and software intensive systems being planned and introduced today. This paper describes an alternative and demonstrates it on a new NextGen procedure to allow more flight level changes over oceanic and other regions with limited radar coverage. The new approach and results are compared with the results obtained by the more traditional methods being used for NextGen.
Hazard Analysis of Complex Spacecraft using Systems Theoretic Process Analysis by Takuto Ishimatsu, Nancy G. Leveson, John Thomas, Cody Fleming, Masafumi Katahira, Yuko Miyamoto, Ryo Ujiie, Haruka Nakao, and Nobuyuki Hoshino, AIAA Journal of Spacecraft and Rockets, in press, 2013.
A new hazard analysis technique, called System-Theoretic Process Analysis (STPA), is capable of identifying potential hazardous design flaws, including software and system design errors and unsafe interactions among multiple system components. Detailed procedures for performing the hazard analysis were developed and the feasibility and utility of using it on complex systems was demonstrated by applying it to the Japan Aerospace Exploration Agency H-II Transfer Vehicle. In a comparison of the results of this new hazard analysis technique to those of the standard fault tree analysis used in the design and certification of the H-II Transfer Vehicle, System-Theoretic Hazard Analysis found all the hazardous scenarios identified in the fault tree analysis as well as additional causal factors that had not been identified by fault tree analysis.
Drawbacks in Using the Term "Systems of Systems," by Nancy Leveson, Journal of Biomedical Instrumentation and Technology, March/April 2013.
This essay was written after attending an AAMI/FDA meeting on interoperability in medical devices. In it I express my puzzlement over the term system-of-systems and why it is misleading and may lead to dead ends in solving system safety problems.
The Use of Safety Cases in Certification and Regulation by Nancy Leveson. An earlier version of this paper appeared in the Journal of System Safety, Nov/Dec 2011. The version here is updated from that version and includes more material.
Starting with my involvement with the Presidential Oil Spill Commission (on Deepwater Horizon), I started studying the engineering and law literature and have become concerned by the push to use safety cases in the certification of many industries in the U.S. This paper describes what I have learned and my conclusions about the dangers of this approach.
Applying System Engineering to Pharmaceutical Safety by Nancy Leveson, Matthieu Couturier, John Thomas, Meghan Dierks, David Wierz, Bruce Psaty, Stan Finkelstein. Journal of Healthcare Engineering, Sept. 2012.
While engineering techniques are used in the development of medical devices and have been applied to individual healthcare processes, such as the use of checklists in surgery and ICUs, the application of system engineering techniques to larger healthcare systems is less common. System safety is the part of system engineering that uses modeling and analysis to identify hazards and to design the system to eliminate or control them. In this paper, we demonstrate how to apply a new, safety engineering static and dynamic modeling and analysis approach to healthcare systems. Pharmaceutical safety is used as the example in the paper, but the same approach is potentially applicable to other complex healthcare systems.
Software Challenges in Achieving Space Safety by Nancy Leveson. Journal of the British Interplanetary Society, Vol. 62, 2009.
Techniques developed for hardware reliability and safety do not work on software-intensive systems; software does not satisfy the assumptions underlying these techniques. The new problems and why the current approaches are not effective for complex, software-intensive systems are first described. Then a new approach to hazard analysis and safety-driven design is presented. Rather than being based on reliability theory, as most current safety engineering techniques are, the new approach builds on system and control theory.
A Systems-Theoretic Approach to Safety in Software-Intensive Systems by Nancy Leveson. IEEE Trans. on Dependable and Secure Computing, January 2005.
Traditional accident models were devised to explain losses caused by failures of physical devices in relatively simple systems. They are less useful for explaining accidents in software-intensive systems and for non-technical aspects of safety such as organizational culture and human decision-making. This paper describes how systems theory can be used to form new accident models that better explain system accidents (accidents arising from the interactions among components rather than individual component failure), software-related accidents, and the role of human decision-making. Such models consider the social and technical aspects of systems as one integrated process and may be useful for other emergent system properties such as security. The loss of a Milstar satellite being launched by a Titan/Centaur launch vehicle is used as an illustration of the approach.
Identification of Leading Indicators for Producibility Risk in Early-Stage Aerospace Product Development by Allen J. Ball, MIT Master's Thesis, June 2015.
Abstract: Producibility is an emergent property of product development and manufacturing systems that encapsulates quality, product compliance, cost, and schedule. Detailed product definition and process variation have traditionally been a focus area for understanding risk for producibility losses. It is proposed for this investigation that while assumptions inherent to product configuration and process selection can significantly impact producibility, producibility risk and realized producibility losses are primarily indicated by organizational design assumptions and associated phased implementation of programmatic governance.
Safety-Guided Design Analysis in Multi-Purposed Japanese Unmanned Transfer Vehicle. by Ryo Ujiie, System Design and Management Master's Thesis, September 2016.
Abstract: As with other critical systems, space systems are also getting larger and more complex. Although Japan Aerospace Exploration Agency (JAXA) has designed various spacecraft and had not experienced any serious accident for more than 10 years, loss of an astronomical satellite finally happened in 2016 even though the development process was not drastically different from the past. The accident implies that the complexity of space systems can no longer be managed by the traditional safety analysis. Furthermore, in huge system developments, the fluidity of design is rapidly lost as the development proceeds. Thus, creating a safer system design in the early development phase that is capable of handling various undesirable scenarios will significantly contribute to the success of huge and complex system development.
Systems Theoretic Process Analysis Applied to an Offshore Supply Vessel Dynamic Positioning System. by Blake Ryan Abrecht, MIT M.S. in Engineering Systems Thesis, June 2016.
Abstract: This research demonstrates the effectiveness of Systems Theoretic Process Analysis (STPA) and the advantages that result from using this new safety analysis method compared to traditional techniques. To do this, STPA was used to analyze a case study involving Naval Offshore Supply Vessels (OSV) that incorporate software-intensive dynamic positioning in support of target vessel escort operations. The analysis begins by analyzing the OSVs in the context of the Navy’s organizational structure and then delves into assessing the functional relationship between OSV system components that can lead to unsafe control and the violation of existing safety constraints. The results of this analysis show that STPA found all of the component failures identified through independently conducted traditional safety analyses of the OSV system. Furthermore, the analysis shows that STPA finds many additional safety issues that were either not identified or inadequately mitigated through the use of Fault Tree Analysis and Failure Modes and Effects Analysis on this system.
Systems Theoretic Accident Analysis of an Offshore Supply Vessel Collision, by John Michael Mackovjak, Master of Science in Technology and Policy, MIT, June 2016.
Abstract: This thesis uses Dr. Leveson’s Systems-Theoretic Accident Model and Process (STAMP) model of accident causation to analyze a collision in late July 2014 between two Offshore Supply Vessels equipped with software-intensive Dynamic Positioning Systems. The Causal Analysis based on STAMP (CAST) is compared with the Root Cause Analysis, a traditional chain of events based model, used by the original investigation team after the collision. Linear chain of event models like the Root Cause Analysis often look for a broken component or incorrect action within the proximal sequence of events leading to the accident. CAST examines a system’s entire safety control structure to assess why the system constraints, control loops, and process models were either inadequate or flawed. This thesis aims at identifying how the safety control structure of the Offshore Supply Vessel operations could be improved by identifying the systemic factors and component interactions that contributed to the collision.
STAMP applied to Fukushima Daiichi nuclear disaster and the safety of nuclear power plants in Japan, by Daisuke Uesako, MIT Master's Thesis, System Design and Management Program, June 2016.
Abstract: On March 11, 2011, a huge tsunami generated after the Great East Japan Earthquake triggered an extremely severe nuclear accident at the Fukushima Daiichi Nuclear Power Plant. This thesis analyzes why the stakeholders could not prevent the Fukushima Daiichi nuclear disaster, and, with regard to the future nuclear safety in Japan, what the potentially hazardous control actions could be. Because of the complex sociotechnical nature of nuclear power plants, System-Theoretic Accident Model and Processes (STAMP)—specifically, Causal Analysis based on STAMP (CAST) and System-Theoretic Process Analysis (STPA)—is used for these analyses.
Application of STPA to the Integration of Multiple Control Systems: A Case Study and New Approach, by Matthew Seth Placke, Master's Thesis, Engineering Systems Division, MIT, June 2014
Abstract: A new approach for analyzing multiple control systems within the STPA framework is developed and demonstrated. The new approach meets the growing need of system engineers to analyze integrated control systems, that may or may not have been developed in a coordinated manner, and assess them for safety and performance. This need comes from the increasing proliferation of embedded control systems across domains including defense, energy, healthcare, automotive, aerospace, and consumer products. When multiple embedded control systems are integrated together, they have the potential to operate in uncoordinated and conflicting ways which might hinder their performance and lead to unsafe behavior.
Extending the Human-Controller Methodology in Systems-Theoretic Process Analysis (STPA), by Cameron L. Thornberry, Master's Thesis, Aeronautics and Astronautics, MIT, June 2014
Abstract: Traditional hazard analysis techniques are grounded in reliability theory and analyze the human controller, if at all, in terms of estimated or calculated probabilities of failure. Characterizing sub-optimal human performance as "human error" offers limited explanation for accidents and is inadequate in improving the safety of human control in complex, automated systems such as today's aerospace systems. In an alternate approach founded on systems and control theory, Systems-Theoretic Process Analysis (STPA) is a hazard analysis technique that can be applied in order to derive causal factors related to human controllers within the context of the system and its design. The goal of this thesis was to extend the current human-controller analysis in STPA to benefit the investigation of more structured and detailed causal factors related to the human operator.
A STAMP Analysis of the LEX Comair 5191 Accident, by Paul S. Nelson, Master's Thesis, Lund University, Sweden, June 2008, supervised by Prof. Sidney Dekker.
Abstract: A new view, a holistic systems view, that sees individuals in systems, is growing. It is a view which sees "human error is an effect of trouble deeper inside the system.. [where] we must turn to the system in which people work: the design of equipment, the usefulness of procedures, the existence of goal conflicts and production pressure." (Dekker, 2007) A new, holistic systems perspective, accident model is used for analysis of the Comair 5191 accident in Lexington, KY on August 27, 2006. The new model is called Systems-Theoretic Accident Modeling and Processes (STAMP). It incorporates three basic components: constraints, hierarchical levels of control, and process loops. Accidents are understood "in terms of why the controls that were in place did not prevent or detect maladaptive changes, that is, by identifying the safety constraints that were violated and determining why the controls were inadequate in enforcing them." This STAMP analysis of the 5191 accident illustrates the usefulness of the STAMP model to foster evaluation of the whole system and uncover useful levers for elimination of future loss potential, thereby making progress on safety.
System Theoretic Safety Analysis of the Sewol-Ho Ferry Accident in South Korea, by Yisug Kwon, MIT Master's Thesis, December 2015.
Abstract: The disaster of the Sewol-Ho, which took place on April 16, 2014, was one of the worst maritime disasters in South Korea in decades, and the rescue of only 172 of the total 476 people triggered thorough accident investigations. As a result of the investigations performed by the Korea Maritime Safety Tribunal and the Board of Audit and Inspection of Korea, 399 people blamed for the accident were arrested, 154 of them were jailed, many safety policies and manuals were found inadequate, new safeguards against this kind of accident were implemented, and the higher- and lower-level Korean government structures related to the accident were reorganized, including disbanding the 61-year-old Republic of Korea Coast Guard and establishing a new Ministry responsible for Korean public safety. The accident investigation reports, however, are limited in revealing the most important systemic causal mechanisms needed for a more complete understanding of why the accident occurred, and therefore appear inadequate for designing and achieving sociotechnical system-level safety, because they did not apply system engineering tools in the investigations.
A Systems Approach to Patient Safety: Preventing and Predicting Medical Accidents Using Systems Theory, by Aubrey Samost, MIT Master's Thesis, June 2015.
Abstract: Patient safety has become a critical concept in healthcare as clinicians seek to provide quality healthcare to every patient in a healthcare system that has grown far more complex than the days of the independent doctor and his black bag making house calls. Accidents in present-day healthcare systems are complicated, with environmental factors, interactions between clinicians, and the pressures exerted by managerial decisions all contributing to these medical mishaps. Despite this complexity, accidents are analyzed using simplistic and outdated techniques modeling systems as mere linear chains of events, when the reality lies far from those neat cause and effect relationships. Further compounding efforts to promote patient safety is the reliance on reactive approaches to safety, waiting for accidents to occur before enacting changes, like a dangerous game of whack-a-mole. What little work is done in prospective hazard analysis tends to be concentrated in niche areas and relies heavily on older analytic techniques.
Comparison of SOAM and STAMP for ATM Incident Investigation by Richard Arnold, Master's Thesis, Lund University, Sweden, 2009, supervised by Prof. Sidney Dekker.
Abstract: Systemic Occurrence Analysis Methodology (SOAM) is promoted by Eurocontrol for the analysis of Air Traffic Management (ATM) occurrences. Systems Theoretic Accident Model and Process (STAMP) based on systems theory has been defined by professor Nancy Leveson (MIT) to explain systems accidents (accidents arising from the interactions among components rather than individual component failure). This research analyzes an ATM occurrence using SOAM and STAMP and compares their usefulness in identifying systemic countermeasures. The results show that SOAM is a useful heuristic and a powerful communication device but that it is weak with respect to emergent phenomena and non-linear interactions. SOAM directs the investigator to consider the context in which the events occurred; barriers that failed and organizational factors; the "holes in the Swiss cheese," but not into the processes which created them, or how the whole system can migrate towards the boundaries of safe operations. STAMP directs the investigator more deeply into the mechanism of the interactions between system components, and how systems adapt over time. STAMP helps identify the controls and constraints necessary to prevent undesirable interactions between system components. STAMP also directs the investigation through a structured analysis of the upper levels of the system's control structure which helps to identify high level systemic countermeasures. The global ATM system is undergoing a period of rapid technological and political change. In Europe the Single European Sky ATM Research (SESAR) and in the US the NextGen programs mean that the ATM is moving from centralized human controlled systems to semi automated distributed decision making. 
Continuous Descent Arrivals flown on datalinked 4D flight paths that are tailored to local constraints and timed for merging traffic require digital information sharing and Collaborative Decision Making on a grand scale, as well as Functional Airspace Blocks designed for optimal airspace efficiency and safety. Detailed new systemic models like STAMP are now necessary to prevent undesirable interactions between normally functioning system components and to understand changes over time in increasingly complex ATM systems.
A CAST Analysis of a U.S. Coast Guard Aviation Mishap, by Jon Hickey, MIT Master's Thesis, May 2012, supervised by Dr. Qi van Eikema Hommes.
Abstract: During a 22-month period, between 2008 and 2010, the U.S. Coast Guard experienced seven Class-A aviation mishaps resulting in the loss of 14 Coast Guard aviators and seven Coast Guard aircraft. This represents the highest Class-A aviation mishap rate the Coast Guard has experienced in 30 years. Following each Class-A mishap, the Coast Guard conducted Mishap Analysis Boards (MAB) in accordance with Coast Guard aviation policy. A MAB involves a detailed investigation and report on the causal and contributing factors of a specific mishap and is conducted in accordance with the Department of Defense Human Factors Analysis and Classification System (DOD HFACS) which is based on the "Swiss Cheese" accident causal analysis model. Individual MAB results did not identify common causal or contributing factors that may be causing systemic failures within the aviation safety system. Subsequently, the Coast Guard completed a more system-focused safety analysis known as the Aviation Safety Assessment Action Plan (ASAAP) comprised of five components: 1) Operational Hazard Analysis; 2) Aviation Safety Survey; 3) Aviation Leadership Improvement Study; 4) Independent Data Analysis Study; and 5) Industry Benchmarking Study. ASAAP recently concluded "complacency in the cockpit and chain of command as the leading environmental factor in the rash of serious aviation mishaps." Although the ASAAP study examined Coast Guard aviation more holistically than individual MABs, it did not apply systems theory and systems engineering approaches.
System Theoretic Process Analysis of Electric Power Steering for Automotive Applications, by Rodrigo Sotomayor Martinez, MIT Master's Thesis, June 2015.
The automotive industry is constantly challenged with meeting and exceeding customer expectations while reducing time to market of new products in order to remain competitive. Providing new features and functionality into vehicles for customer satisfaction is becoming more challenging and driving design complexity to a higher level. Although traditional methods of Product Development Failure Mode identification such as FMEA (Failure Mode and Effect Analysis) or FTA (Fault Tree Analysis) have been used to analyze failures in automotive systems, there are limitations when it comes to design errors, flawed requirements, human factors implications, and component interaction accidents in which all components operated as required but the system behavior was not as expected. In order to determine if there is room for improvement in current automotive product development process, this thesis applies Dr. Nancy Leveson's Systems-Theoretic Process Analysis (STPA) technique to compare and contrast with a Failure Modes and Effects Analysis (FMEA) approach as used in the automotive industry through a case study. A formal method of comparing results is proposed. This study found limitations with FMEA in terms of identifying unsafe interactions between systems, anticipating human error and other behaviors dependent on human interaction, identifying engineering design flaws, and producing requirements. STPA was able to find causes that had a direct relationship with those found in FMEA while also finding a portion of causes related to a higher level of abstraction than those in FMEA. STPA also found a subset of causes that FMEA was not able to find, which relate mainly to engineering design flaws and system interaction.
Managing Design Changes using Safety-Guided Design for a Safety Critical Automotive System, by John Sgueglia, MIT Master's Thesis, June 2015.
The use of software to control automotive safety critical functions, such as throttle, braking and steering has been increasing. The automotive industry has a need for safety analysis methods and design processes to ensure these systems function safely. Many current recommendations still focus on traditional methods, which worked well for electromechanical designs but are not adequate for software intensive complex systems. System Theoretic Accident Model and Process (STAMP) and the associated System Theoretic Process Analysis (STPA) method have been found to identify hazards for complex systems and can be effective earlier in the design process than current automotive techniques. The design of a complex safety-critical system will require many decisions that can potentially impact the system's safety. A safety analysis should be performed on the new design to understand any potential safety issues. Methods that can help identify where and how the change impacts the analysis would be a useful tool for designers and managers. This could reduce the amount of time needed to evaluate changes and to ensure the safety goals of the system are met. This thesis demonstrates managing design changes for the safety-guided design of an automotive safety-critical shift-by-wire system. The current safety related analysis methods and standards common to the automotive industry and the system engineering methods and research in the use of requirements traceability for impact analysis in engineering change management was reviewed. A procedure was proposed to identify the impact of design changes to the safety analysis performed with STPA. Suggested guidelines were proposed to identify the impact of the change on the safety analysis performed with STPA. 
It was shown how the impact of the design changes was incorporated into the STPA results to ensure safety constraints are managed with respect to these changes to maintain the safety controls of the system throughout the design process. Finally, the feasibility of the procedure was demonstrated through the integration of the procedure with requirements traceability based on system engineering practices.
System-Theoretic Process Analysis of the Air Force Test Center Safety Management System, by Nicholas Chung, MIT Master's Thesis, February 2014.
The Air Force Test Center (AFTC) faces new challenges as it continues into the 21st century as the world's leader in developmental flight test. New technologies are becoming ever more sophisticated and less transparent, driving an increase in complexity for tests designed to evaluate them. This shift will place more demands on the AFTC Safety Management System to effectively analyze hazards and preempt the conditions that lead to accidents. In order to determine whether the AFTC Safety Management System is prepared to handle new safety challenges, this thesis applied Dr. Nancy Leveson's Systems-Theoretic Process Analysis (STPA) technique. The safety management system was analyzed and potential safety constraint violations due to systemic factors, unsafe component interactions, as well as component failures were investigated. The analysis identified the key features that make the system effective; gaps in the sub-processes, roles, responsibilities, and tools; and opportunities to improve the system. These findings will provide insights on how the AFTC Safety Management System can be improved with the aim of preventing accidents from occurring during flight test operations. Finally, this thesis demonstrated the effectiveness of the STPA technique at hazard analysis on an organizational process.
Application of CAST to Hospital Adverse Events, by Meaghan O'Neil, MIT Master's Thesis, May 2014.
Despite the passage of 15 years since the Institute of Medicine sought to galvanize the nation with its report To Err is Human, the authors' goal to dramatically improve the quality of healthcare delivery in the United States has yet to be accomplished. While the report and subsequent efforts make frequent reference to the challenges of designing and obtaining system safety, few system tools have been applied in the healthcare industry. Instead, methods such as root cause analysis (RCA) are the current accepted industry standards. The Systems Theoretic Accident Model and Processes (STAMP) is a model created by Dr. Nancy Leveson that has been successfully applied in a number of industries worldwide to improve system safety. STAMP has the capability to aid the healthcare industry professionals in reaching their goal of improving the quality of patient care. This thesis applies the Causal Accident Systems Theoretic (CAST) accident analysis tool, created by Dr. Leveson based on STAMP, to a hospital accident. The accident reviewed is a realistic, fictionalized accident described by a case study created by the VA to train healthcare personnel in the VA RCA methodology. This thesis provides an example of the application of CAST and provides a comparison of the method to the outcomes of an RCA performed by the VA independently on the same case. The CAST analysis demonstrated that a broader set of causes was identified by the systems approach compared to that of the RCA. This enhanced ability to identify causality led to the identification of additional system improvements. Continued future efforts should be taken to aid in the adoption of a systems approach such as CAST throughout the healthcare industry to ensure the realization of the quality improvements outlined by the IOM in 1999.
Application of Systems-Theoretic Approach to Risk Analysis of High-Speed Rail Project Management in the U.S., by Soshi Kawakami, MIT Master's Thesis, June 2014.
High-speed rail (HSR) is drawing attention as an environmentally-friendly transportation mode, and is expected to be a solution for sociotechnical transportation issues in many societies. Currently, its market has been rapidly expanding all over the world. In the US, the Federal Railroad Administration (FRA) released a strategic vision to develop new HSRs in 2008, specifically focusing on 10 corridors, including the Northeast Corridor (NEC) from Boston to Washington D.C. With such rapid growth, safety is a growing concern in HSR projects; in fact, there have been two HSR accidents over the past three years. In developing a new HSR system, it is crucial to conduct risk analysis based on lessons learned from these past accidents. Furthermore, for risk analysis of complex sociotechnical systems such as HSR systems, a holistic system-safety approach focusing not only on physical domains but also on institutional levels is essential. With these perspectives, this research proposes a new system-based safety risk analysis methodology for complex sociotechnical systems. This methodology is based on the system safety approach, called STAMP (System-Theoretic Accident Model and Processes). As a case study, the proposed HSR project in the NEC is analyzed by this methodology. This methodology includes steps of conducting STAMP-based accident analysis, developing a safety model of the HSR system in the NEC, and analyzing safety risks of it based on lessons learned from the analyzed accidents, with a specific focus on the institutional structure. As a result of this analysis, 58 NEC-specific risks are identified, and with them, weaknesses of safety-related regulations applied to the project are discussed. Additionally, this research introduces System Dynamics to analyze further detailed causal relations of the identified risks and discusses its potential usage for risk analysis. 
This thesis research concludes with specific recommendations about safety management in the project in the NEC, making a point that the proposed methodology can be valuable for the actual project processes as a "safety-guided institutional design" tool.
Application of CAST and STPA to Railroad Safety, by Airong Dong, MIT Master's Thesis, May 2012.
Abstract: The accident analysis method called STAMP (System-Theoretic Accident Model), developed by Prof. Nancy Leveson of MIT, was used here to re-analyze a High Speed Train accident in China. On July 23rd, 2011, 40 people were killed and 120 injured on the Yong-Wen High Speed Line. The purpose of this new analysis was to apply the broader view suggested by STAMP, considering the whole sociotechnological system and not only equipment failures and operators' mistakes, in order to come up with new findings, conclusions and recommendations for the High Speed Train System in China.
Engineering Financial Safety: A System-Theoretic Case Study from the Financial Crisis, by Melissa Spencer, MIT TPP (Technology and Policy Program) Master's Thesis, May 2012.
Abstract: There is currently much systems-based thinking going into understanding safety in complex socio-technical systems and in developing useful accident analysis methods. However, when it comes to complex systems without clear physical components, the techniques for understanding accidents are antiquated and ineffective. This thesis uses a promising new engineering-based accident analysis methodology, CAST (Causal Analysis using STAMP, or Systems Theoretic Accident Models and Processes), to understand an aspect of the financial crisis of 2007-2008.
A Systems Theoretic Application to Design for the Safety of Medical Diagnostic Devices, by Vincent Balgos, MIT SDM Master's Thesis, February 2012, supervised by Dr. Qi van Eikema Hommes.
Abstract: In today's environment, medical technology is rapidly advancing to deliver tremendous value to physicians, nurses, and medical staff in order to support them to ultimately serve a common goal: provide safe and effective medical care for patients. However, these complex medical systems are contributing to the increasing number of healthcare accidents each year. These accidents present unnecessary risk and injury to the very population these systems are designed to help. Thus the current safety engineering techniques that are widely practiced by the healthcare industry during medical system development are inadequate in preventing these tragic accidents. Therefore, there is a need for a new approach to design safety into medical systems.
of a System Safety Framework in Hybrid Socio-Technical Environment
Abstract: The political transformation and transition of post-Soviet societies have led to hybrid structures in political, economic and technological domains. In such hybrid structures the roles of government, state enterprise, private business and civil society are not clearly defined. These roles shift depending on formal and informal interests, availability and competition for limited resources, direct and indirect financial benefits, internal and external agendas. In an abstract sense, a hybrid is "anything derived from heterogeneous sources, or composed of elements of different or incongruous kinds." If transition is a process from one state to another, hybrid is a state unto itself. In the context of this thesis Hybrid Socio-Technical Environment means the co-existence of different institutions and policies, state and private business entities, old and new technologies, managerial models and practices of planning and market economies, collectivist and individualist value systems.
Developing System-Based Leading Indicators for Proactive Risk Management in the Chemical Processing Industry by Ibrahim Khawaji, MIT ESD Master's Thesis, May 2012.
Abstract: The chemical processing industry has faced challenges with achieving improvements in safety performance, and accidents continue to occur. When accidents occur, they usually have a confluence of multiple factors, suggesting that there are underlying complex systemic problems. Moreover, accident investigations often reveal that accidents were preventable and that many of the problems were known prior to those accidents, suggesting that there may have been early warning signs.
Integrating Safety into an Engineering Contractor's System Engineering Process using the Guidelines of STAMP, by Lorena Pelegrin, Master's Thesis, Heriot-Watt University, August 2012.
Abstract: "Engineering Contractor" (EC) is a group of engineering and consulting companies providing services worldwide in the fields of oil and gas, water and environment, energy and climate protection, and transport and structures. Because there is currently no consolidated system engineering process that includes designing for safety systematically, and the top management of EC has understood the responsibility of EC in the safety of the systems they engineer, the present thesis was proposed.
A System Theoretic Safety Analysis of Friendly Fire Prevention in Ground Based Missile Systems, by Scott McCarthy, MIT SDM Master's Thesis, January 2013.
Abstract: This thesis uses STAMP to analyze a friendly fire accident that occurred on 22 March 03 between a British Tornado aircraft and a US Patriot Missile battery. This causation model analyzes system constraints, control loops, and process models to identify inadequate control structures leading to hazards and preventative measures that may be taken to reduce the effects of these hazards. By using a system-based causation model like STAMP, rather than a traditional chain of events model, this thesis aimed to identify systemic factors and component interactions that may have contributed to the accident, rather than simply analyzing component failures. Additionally, care was taken to understand the rationale for decisions that were made, rather than assigning blame. The analysis identified a number of areas in which control flaws or inadequacies led to the friendly fire incident. A set of recommendations was developed that may help to prevent similar accidents from occurring in the future.
Safety benefit assessment, vehicle trial safety and crash analysis of automated driving: a Systems Theoretic approach, by Stephanie Alvarez, Ecole Mines Paris Tech, Ph.D. Dissertation, June 2017.
Abstract: The research conducted in this thesis aimed to examine the safety benefit, trial safety and the accident analysis of automated driving, by applying the STAMP model and associated methods STPA and CAST. The STAMP-based approach was selected to independently address the three issues by modeling and analyzing the multiple levels of the entire road transport system and their interactions.
Safety-Driven Early Concept Analysis and Development by Cody Harrison Fleming, MIT Ph.D. Dissertation, January 2015.
Abstract: As aerospace systems become increasingly complex and the roles of human operators and autonomous software continue to evolve, traditional safety-related analytical methods are becoming inadequate. Traditional hazard analysis tools are based on an accident causality model that does not capture many of the complex behaviors found in modern engineered systems. Additionally, these traditional approaches are most effective during late stages of system development, when detailed design information is available. However, system safety cannot cost-effectively be assured by discovering problems at these late stages and adding expensive updates to the design. Rather, safety should be designed into the system from its very conception. The primary barrier to achieving this objective is the lack of effectiveness of the existing analytical tools during early concept development.
Extending and Automating a Systems-Theoretic Hazard Analysis for Requirements Generation and Analysis by John Thomas, MIT Ph.D. Dissertation, June 2013.
Using STPA to Inform Developmental Product Testing by Major Daniel R. Montes, U.S. Air Force, MIT Ph.D. Dissertation, February 2016.
Abstract: Developmental product testing currently evaluates system safety the same way it evaluates system performance: it attempts to isolate individual components’ behaviors to evaluate their reliability. However, today’s systems are often irreducible because of their complexity, leaving current practices ineffective at identifying safety deficiencies. Evolving to a modern systems-based hazard analysis is important for product development. Products stand to benefit during the testing stage, before initial fielding. In test, designs meet operation for the first time, and use practices and organizational influences both contribute to the safety of the system. By evaluating safety as an emergent property, hazards that emerge because of the testing process itself can be mitigated, and hazards that exist because of the inherent system design and use philosophy can be identified and traced throughout development and fielding.
Accident Analysis and Hazard Analysis for Human and Organizational Factors by Margaret Stringfellow, October 2010.
Abstract: Current hazard analysis methods, adapted from traditional accident models, are not able to evaluate the potential for risk migration, or comprehensively identify accident scenarios involving humans and organizations. Thus, system engineers are not able to design systems that prevent loss events related to human error or organizational factors. State of the art methods for human and organization hazard analysis are, at best, elaborate event-based classification schemes for potential errors. Current human and organization hazard analysis methods are not suitable for use as part of the system engineering process.
A Framework for Dynamic Safety and Risk Management Modeling in Complex Systems by Nicolas Dulac, February 2007.
Almost all traditional hazard analysis or risk assessment techniques, such as failure modes and effect analysis (FMEA), fault tree analysis (FTA), and probabilistic risk analysis (PRA) rely on a chain-of-event paradigm of accident causation. Event-based techniques have some limitations for the study of modern engineering systems. Specifically, they are not suited to handle complex software-intensive systems, complex human-machine interactions, and systems-of-systems with distributed decision-making that cut across both physical and organizational boundaries. [...]
Development of a Systematic Risk Management Approach for CO2 Capture, Transport, and Storage Projects by Jaleh Samadi, L'Ecole Nationale Superieure des Mines de Paris Ph.D. dissertation, December, 2012
Abstract: A systematic risk management framework for CO2 capture, transport, and storage projects is proposed. The approach is founded on the concepts of system thinking, STAMP, STPA, and system dynamics. The objective is to provide a means of decision making for these types of projects in the actual context where the future of the technology is uncertain.
Systems Theoretic Hazard Analysis (STPA) Applied to the Risk Review of Complex Systems: An Example from the Medical Device Industry by Blandine Antoine, MIT Ph.D. dissertation, December, 2012
Abstract: Methods developed by system engineers could beneficially be applied to the challenge of ensuring patient safety in health care delivery. Achieving safe operations in this and other settings requires that system behavior be bound by safety constraints. These must be defined and enforced at every stage of system design, system operations, and, when applicable, system retirement.
A New Approach to Risk Analysis with a Focus on Organizational Risk Factors by Karen Marais, MIT Ph.D. dissertation, June, 2005
Abstract: Preventing accidents in complex socio-technical systems requires an approach to risk management that continuously monitors risk and identifies potential areas of concern before they lead to hazards, and constrains hazards before they lead to accidents. This research introduces the concept of continuous participative risk management, in which risks are continuously monitored throughout the lifetime of a system, and members from all levels of the organization are involved both in risk analysis and in risk mitigation.
A System-Theoretic Hazard Analysis Methodology for a Non-advocate Safety Assessment of the Ballistic Missile Defense System by Steve Pereira, Grady Lee, and Jeffrey Howard. Proceedings of the 2006 AIAA Missile Sciences Conference, Monterey, CA, November 2006.
The Missile Defense Agency (MDA) is developing the Ballistic Missile Defense System (BMDS) as a layered defense to defeat all ranges of threats in all phases of flight (boost, midcourse, and terminal). The BMDS integrates into a single system a number of Elements that had been developed independently, such as SBIRS/DSP, Aegis BMD, and Ground-based Midcourse Defense (GMD). The Elements of the BMDS have active safety programs, but complexity, coupling, and safety risk are introduced by their integration into a single system. Assessing the safety of the integrated BMDS required analysts to come up to speed using existing Element project documentation, assess the safety risk of the system, and make recommendations regarding hazard mitigation and risk acceptance. This effort often required conducting hazard analyses to supplement existing Element analysis work; working with existing engineering artifacts; and making recommendations for hazard mitigations late in the system life cycle, when there is less flexibility for design changes. This paper presents a safety assessment methodology based on STPA (a systems-theoretic hazard analysis); the assessment methodology provides an organized, methodical, and effective means to assess safety risk and develop appropriate hazard mitigations regardless of where in the life cycle the assessment is started.
A New Approach to Hazard Analysis for Rotorcraft by Blake Abrecht, Dave Arterburn, David Horney, Brandon Abel, Jon Schneider, and Nancy Leveson. Proceedings of the 2016 American Helicopter Society Technical Meeting, Huntsville, AL, February 2016.
Abstract: STPA is a new hazard analysis technique that can identify more hazard causes than traditional techniques. It is based on the assumption that accidents result from unsafe control rather than component failures. To demonstrate and evaluate STPA for its application to rotorcraft, it was used to analyze the UH-60MU Warning, Caution, and Advisory (WCA) system associated with the electrical and fly-by-wire flight control system (FCS). STPA results were compared with an independently conducted hazard analysis of the UH-60MU using traditional safety processes described in SAE ARP 4761 and MIL-STD-882E. STPA found the same hazard causes as the traditional techniques and also identified things not found using traditional methods, including design flaws, human behavior, and component integration and interactions. The analysis includes organizational and physical components of systems and can be used to design safety into the system from the beginning of development while being compliant with MIL-STD-882.
Integrating Systems Safety into Systems Engineering during Concept Development by Cody Harrison Fleming and Nancy Leveson, Proceedings of the 2015 International Symposium on System Engineering (INCOSE), Seattle, July 2015 [Best Paper Award]
Abstract: Safety should be designed into systems from their very conception, which can be achieved by integrating powerful hazard analysis techniques into the general systems engineering process. The primary barrier to achieving this objective is the lack of effectiveness of the existing analytical tools during early concept development.
Integration of Multiple Active Safety Systems Using STPA by Seth Placke, John Thomas, and Dajiang Suo, SAE Technical Paper 2015-01-0277, April 2015, doi:10.4271/2015-01-0277.
Abstract: Automobiles are becoming ever more complex as advanced safety features are integrated into the vehicle platform. As the pace of integration and complexity of new features rises, it is becoming increasingly difficult for system engineers to assess the impact of new additions on vehicle safety and performance. In response to this challenge, a new approach for analyzing multiple control systems as an extension to the Systems Theoretic Process Analysis (STPA) framework has been developed. The new approach meets the growing need of system engineers to analyze integrated control systems that may or may not have been developed in a coordinated manner, and to assess them for safety and performance.
An Integrated Approach to Requirements Development and Hazard Analysis by John Thomas, John Sgueglia, Dajiang Suo, and Nancy Leveson. SAE Technical Paper 2015-01-0274, April 2015, doi:10.4271/2015-01-0274.
Abstract: The introduction of new safety critical features using software-intensive systems presents a growing challenge to hazard analysis and requirements development. These systems are rich in feature content and can interact with other vehicle systems in complex ways, making the early development of proper requirements critical. Catching potential problems as early as possible is essential because the cost increases exponentially the longer problems remain undetected. However, in practice these problems are often subtle and can remain undetected until integration, testing, production, or even later, when the cost of fixing them is the highest.
Including Safety during Early Development Phases of Future Air Traffic Management Concepts by Cody H. Fleming and Nancy Leveson. Eleventh USA/Europe Air Traffic Management Research and Development Seminar (ATM2015), June 2015.
Abstract: Safety should be designed into future air traffic management systems from their very conception, which can be achieved by integrating powerful hazard analysis techniques into the general systems engineering process. The primary barrier to achieving this objective is the lack of effectiveness of the existing analytical tools during early concept development. This paper introduces a new technique, which is based on a more powerful model of accident causality—called systems-theoretic accident model and process (STAMP)—that can capture behaviors that are prevalent in these complex, software-intensive systems. The goals are to (1) develop rigorous, systematic tools for the analysis of future ATM concepts in order to identify potentially hazardous scenarios and undocumented assumptions, and (2) extend these tools to assist stakeholders in the development of concepts using a safety-driven approach.
Incorporating New Methods of Classifying Domain Information for Use in Safety Hazard Analysis by Nancy Leveson, Daniel Montes, and Leia Stirling. Proceedings of the International Symposium on Aviation Psychology, Dayton, Ohio, May 2015.
Abstract: The increase of interacting humans and autonomous components in complex systems necessitates rigorous methods to classify domain information pertaining to controllers in the system. Systems-Theoretic Process Analysis (STPA) was developed at MIT as a method for identifying hazardous scenarios from a system design in order to generate functional system requirements to eliminate or control those scenarios. An STPA analysis, while systems-based and including human operators (e.g., pilots and air-traffic controllers) in the scenarios, is currently limited in the types of human contribution to accidents that it can identify (which are primarily related to situation awareness). This paper extends STPA in three ways: first, the analysis of the controller mental model was updated to include more system features; second, fundamental human-engineering considerations were added; and third, types and sources of decision-making influences that transfer from the planning cycle to the operations cycle were identified.
Assuring Safety of NextGen Procedures by Cody H. Fleming, Nancy G. Leveson, M. Seth Placke. Presented at the Tenth USA/Europe Air Traffic Management Research and Development Seminar (ATM2013).
This paper introduces an innovative approach to analyzing safety in the next generation of air traffic management systems. The proposed method is based on systems and control theory and is able to capture system design and component interaction causes that are increasingly frequent in accidents. The new methodology is applicable during the entire design lifecycle from early concept selection through final certification. Hazard analysis of a completed NextGen concept, In-Trail Procedure, is demonstrated as well as use in the early concept development of Trajectory Based Operations.
Modeling and Hazard Analysis using STPA by Takuto Ishimatsu, Nancy Leveson, John Thomas, Masa Katahira, Yuko Miyamoto, Haruka Nakao. Presented at the Conference of the International Association for the Advancement of Space Safety, Huntsville, Alabama, May 2010.
A joint research project between MIT and JAXA/JAMSS is investigating the application of a new hazard analysis technique, called STPA, to the system and software in the HTV. STPA is based on systems theory rather than reliability theory. It treats safety as a control problem rather than a failure problem. Traditional hazard analysis focuses on component failures but software does not fail in this way. Software most often contributes to accidents by commanding the spacecraft into an unsafe state (e.g., turning off the descent engines prematurely) or by not issuing required commands. That makes the standard hazard analysis techniques of limited usefulness on software-intensive systems, which describes most spacecraft built today.
Multiple Controller Contributions to Hazards by Takuto Ishimatsu, Nancy Leveson, Cody Fleming, Masa Katahira, Yuko Miyamoto, and Haruka Nakao. This paper was presented at the Conference of the International Association for the Advancement of Space Safety, Versailles, France, October 2011.
One contributor to hazards in complex systems arises out of unsafe interactions among multiple controllers. The basic problem is that in complex systems, hazards can be created by interactions among components that are each operating "correctly." STPA is a new hazard analysis that includes both system hazards caused by component failures (as do the traditional analysis techniques) and also those caused by unsafe interactions among components that may not have individually failed. The first descriptions of STPA, however, did not include examples of how to handle potential problems that occur between multiple controllers. We have created an approach to identify possible unsafe interactions among multiple controllers so that the system can be designed to eliminate any ambiguity or potential for unsafe controller interactions. In this paper, we describe the analysis technique and demonstrate its use for the HTV during the critical approach phase. Once these hazardous interactions are identified, they can then be eliminated or controlled through system design or operational procedures.
Safety-Guided Design of Crew Return Vehicle in the Concept Design Phase using STAMP/STPA by Haruka Nakao, Masa Katahira, Yuko Miyamoto, and Nancy Leveson. This paper was presented at the Conference of the International Association for the Advancement of Space Safety, Versailles, France, October 2011.
In the concept development and design phase of a new space system, such as a Crew Vehicle, designers tend to focus on how to implement new technology. Designers also consider the difficulty of using the new technology and trade off several system design candidates. Then they choose an optimal design from the candidates. Safety should be a key aspect driving optimal concept design. However, in past concept design activities, safety analysis such as FTA has not been used to drive the design because such analysis techniques focus on component failure and component failure cannot be considered in the concept design phase.
A System Theoretic Analysis of the "7.23" Yong-Tai-Wen Railway Accident. This paper, by Dajiang Suo from the Computer Science and Technology Dept., Tsinghua University, Beijing, China, was presented at the 1st STAMP/STPA Workshop held at MIT on April 26-28, 2012.
This paper analyzes the "7.23" Yongwen Railway accident in China from a system theoretic perspective. In particular, the STAMP safety control structure for this accident has been constructed and divided into two respective processes including system development and operation, which are then analyzed at each level. Furthermore, to understand why and how the system evolved over time, system dynamics models are constructed to describe the changes indirectly leading to the accident. As can be seen, this analysis raises some questions which are not included in the investigation report but critical to the comprehensive understanding of the accident. Based on the analysis results, recommendations are generated aiming at preventing the same kind of accidents in the future.
Application of a Safety-Driven Design Methodology to An Outer Planet Exploration Mission by Brandon D. Owens, Margaret Stringfellow Herring, Nicolas Dulac, Nancy Leveson, Michel Ingham, and Kathryn Ann Weiss. IEEE Aerospace Conference, Big Sky, Montana, March 2008.
A conference paper on one of our early applications of STPA and intent specifications on a JPL exploratory spacecraft. Technical reports with more details can be found below. We have evolved the
The goal of this report is to compare the approach widely used to assess and certify aircraft with a new, systems-theoretic hazard analysis technique called STPA and to determine whether there are important factors missing from the commonly used approach. The wheel brake example in ARP 4761 is used in the comparison.
Systems Theoretic Process Analysis (STPA) of an Offshore Supply Vessel Dynamic Positioning System, by Blake Abrecht and Nancy Leveson, MIT Lincoln Laboratory Research Report, Feb. 17, 2016
To demonstrate the effectiveness of STPA and the advantages that result from using this new safety analysis method compared to traditional techniques, STPA was used to analyze Naval Offshore Supply Vessels (OSV) that utilize software-intensive dynamic positioning in support of target vessel escort operations. The analysis begins by analyzing the OSVs in the context of the Navy’s organizational structure and then delves into assessing the functional relationship between OSV system components that can lead to unsafe control and the violation of existing safety constraints.
Evaluating the Safety of Digital Instrumentation and Control Systems in Nuclear Power Plants by John Thomas, Francisco Luiz de Lemos, and Nancy Leveson, Research Report: NRC-HQ-11-6-04-0060, November 2012
A demonstration of the applicability, feasibility, and relative efficacy of using STPA in the licensing of digital nuclear power plants. STPA has the potential to augment existing review and certification or licensing regimes, providing not only a means to assess hazards associated with the introduction of digital technology in nuclear power plants, but also tools to evaluate the extent to which these hazards are adequately mitigated by the encompassing system architecture and to generate recommendations for safety-driven improvements when they are needed. STPA can assist in classifying components as safety-related vs. non-safety-related; in identifying potential operator errors and their causes and safety culture flaws; in broadening the standard analysis and oversight to social, organizational, and managerial factors; in understanding applicant functional designs; and in enhancing the review of candidate designs.
STPA Analysis of NextGen Interval Management Components: Ground Interval Management (GIM) and Flight Deck Interval Management (FIM) by Cody H. Fleming, M. Seth Placke, and Nancy Leveson, FAA and Lincoln Lab, September 2013.
Safety Assurance in NextGen by Cody Harrison Fleming, Melissa Spencer, Nancy Leveson, and Chris Wilkinson, NASA Research Report NASA/CR-2012-217553
This technical report is one of the deliverables for a NASA-sponsored research project where STPA was demonstrated, evaluated, and compared both to the more traditional hazard analysis approaches from decades past and to newer certification approaches used by the FAA and Eurocontrol. For this case study, a new ATC procedure, called ATSA-ITP (Airborne Traffic Situational Awareness In-Trail Procedure), was used because the safety analysis had already been performed and safety requirements generated.
Risk Analysis of NASA Independent Technical Authority by Nancy Leveson and Nicolas Dulac (co-investigators include John Carroll, Joel Cutcher-Gershenfeld, Betty Barrett, David Zipkin) February 2005
To assist with the planning of a NASA assessment of the health of Independent Technical Authority (ITA), we performed a risk analysis, based on STAMP, to identify and understand the risks and vulnerabilities of this new organizational structure and to identify the metrics and measures of effectiveness that would be most effective in the planned assessment. This report describes the results of our risk analysis and presents recommendations for both metrics and measures of effectiveness and for potential improvements in the ITA process and organizational design to minimize the risks we identified.
Demonstration of a New Dynamic Approach to Risk Analysis for NASA's Constellation Program, by Nicolas Dulac, Brandon Owens, and Nancy Leveson (co-investigators include John Carroll, Joel Cutcher-Gershenfeld, Betty Barrett, Stephen Friedenthal, Joseph Laracy, and Joseph Sussman) March 2007
This report summarizes the results of a study conducted at the request of the NASA Exploration Systems Mission Directorate (ESMD) to evaluate the usefulness of systems theoretic analysis and modeling of safety risk in the development of exploration systems. In addition to fulfilling the specific needs of ESMD, this study is part of an ongoing effort to develop and refine techniques for modeling and treating organizational safety culture as a dynamic control problem.
Safety-Driven Model-Based System Engineering Methodology Part I: Methodology Description and Safety-Driven Model-Based System Engineering Methodology Part II: Application of the Methodology to an Outer Planet Exploration Mission by Brandon Owens, Margaret Stringfellow Herring, Nancy Leveson (MIT) and Mitch Ingham, Kathryn Weiss (JPL). December 2007.
This report presents one of our first attempts to create a Safety-Driven, Model-Based System Engineering Methodology that folds hazard analysis into the design process rather than conducting it as a separate activity. The work was done with NASA JPL. The methodology integrates STAMP, STPA, intent specifications (a structured, constraint-based system engineering specification framework), and JPL's State Analysis (a model-based systems engineering approach). Part I describes the methodology while Part II shows an application of the methodology to an outer planet exploration mission.