Philosophy and the study of technology failure

image: Adolf von Menzel, The Iron Rolling Mill (Modern Cyclopes)
 

Readers may have noticed that my current research interests have to do with organizational dysfunction and large-scale technology failures. I am interested in probing the ways in which organizational failures and dysfunctions have contributed to large accidents like Bhopal, Fukushima, and the Deepwater Horizon disaster. I’ve had to confront an important question in taking on this research interest: what can philosophy bring to the topic that would not be better handled by engineers, organizational specialists, or public policy experts?

One answer is the diversity of viewpoint that a philosopher can bring to the discussion. It is evident that technology failures invite analysis from all of these specialized experts, and more. But there is room for productive contribution from reflective observers who are not committed to any of these disciplines. Philosophers have a long history of taking on big topics outside the defined canon of “philosophical problems”, and often those engagements have proven fruitful. In this particular instance, philosophy can look at organizations and technology in a way that is more likely to be interdisciplinary, and perhaps can help bring into view dimensions of the problem that are less apparent from a purely disciplinary perspective.

There is also a rationale based on the terrain of the philosophy of science. Philosophers of biology have usually attempted to learn as much about the science of biology as they can manage, but they lack the level of expertise of a research biologist, and it is rare for a philosopher to make an original contribution to the scientific biological literature. Nonetheless it is clear that philosophers have a great deal to add to scientific research in biology. They can contribute to better reasoning about the implications of various theories, they can probe the assumptions about confirmation and explanation that are in use, and they can help clarify important conceptual disagreements. Biology is in a better state because of the work of philosophers like David Hull and Elliott Sober.

Philosophers have also made valuable contributions to science and technology studies, bringing a viewpoint that incorporates insights from the philosophy of science and a sensitivity to the social groundedness of technology. The STS field has proven to be a fruitful place for interaction among historians, sociologists, and philosophers. Here again, the concrete study of the causes and context of large technology failures may be assisted by a philosophical perspective.

There is also a normative dimension to these questions about technology failure for which philosophy is well prepared. Accidents hurt people, and sometimes the causes of accidents involve culpable behavior by individuals and corporations. Philosophers have a long history of contribution to these kinds of problems of fault, law, and just management of risks and harms.

Finally, it is realistic to think that philosophy can contribute to social theory. Philosophers can offer imagination and critical attention to the problem of creating new conceptual schemes for understanding the social world. This capacity seems relevant to the problem of describing, analyzing, and explaining large-scale failures and disasters.

The study of organizations and accidents is in some ways more hospitable to contributions by a philosopher than other “wicked problems” in the world around us. An accident is complicated and complex but not particularly obscure. The field is unlike quantum mechanics or climate dynamics, which are inherently difficult for non-specialists to understand. The challenge with accidents is to construct a multi-layered analysis of the causes of the accident that permits observers to arrive at a balanced and operative understanding of the event. And this is where the philosopher’s perspective is most useful. We can offer higher-level descriptions of the relative importance of different kinds of causal factors. Perhaps the role here is analogous to messenger RNA, providing a cross-disciplinary kind of communications flow. Or it is analogous to the role of philosophers of history who have offered gentle critique of the cliometrics school for its over-dependence on a purely statistical approach to economic history.

So it seems reasonable enough for a philosopher to attempt to contribute to this set of topics, even if the disciplinary expertise a philosopher brings is weighted more towards conceptual and theoretical discussion than towards undertaking original empirical research in the domain.

What I expect to be the central finding of this research is the idea that a pervasive and often unrecognized cause of accidents is a systemic organizational defect of some sort, and that it is enormously important to have a better understanding of common forms of these deficiencies. This is a bit analogous to a paradigm shift in the study of accidents. And this view has important policy implications. We can make disasters less frequent by improving the organizations through which technology processes are designed and managed.

System safety

An ongoing thread of posts here is concerned with organizational causes of large technology failures. The driving idea is that failures, accidents, and disasters usually have a dimension of organizational causation behind them. The corporation, research office, shop floor, supervisory system, intra-organizational information flow, and other social elements often play a key role in the occurrence of a gas plant fire, a nuclear power plant malfunction, or a military disaster. There is a tendency to look first and foremost for one or more individuals who made a mistake in order to explain the occurrence of an accident or technology failure; but researchers such as Perrow, Vaughan, Tierney, and Hopkins have demonstrated in detail the importance of broadening the lens to seek out the social and organizational background of an accident.

It seems important to distinguish between system flaws and organizational dysfunction in considering all of the kinds of accidents mentioned here. We might specify system safety along these lines. Any complex process has the potential for malfunction. Good system design means creating a flow of events and processes that makes accidents inherently less likely. Part of the task of the designer and engineer is to identify the chief sources of harm inherent in the process — release of energy, contamination of food or drugs, unplanned fission in a nuclear plant — and to design fail-safe processes so that these events are as unlikely as possible. Further, given the complexity of contemporary technology systems, it is critical to attempt to anticipate unintended interactions among subsystems — each of which may be functioning correctly on its own, but which together lead to disaster in unusual but possible interaction scenarios.
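
To make the point about unintended interactions a bit more concrete, here is a minimal sketch in Python (with entirely hypothetical subsystems and an invented hazard rule, not drawn from any real plant) of the kind of exhaustive interaction check a designer might run. Every subsystem behaves "correctly" in each of its listed states, yet one particular combination of states is hazardous.

```python
from itertools import product

# Hypothetical subsystems and the states each can legitimately occupy.
# Each state is "correct" behavior for that subsystem taken in isolation.
subsystems = {
    "relief_valve": ["open", "closed"],
    "cooling_pump": ["running", "standby"],
    "level_sensor": ["reporting", "in_calibration"],
}

def hazardous(state):
    # Invented hazard rule: the unsafe condition lives in the combination of
    # states, not in any single component.
    return (
        state["relief_valve"] == "closed"
        and state["cooling_pump"] == "standby"
        and state["level_sensor"] == "in_calibration"
    )

# Enumerate every combination of subsystem states and flag the unsafe ones.
names = list(subsystems)
for combo in product(*subsystems.values()):
    state = dict(zip(names, combo))
    if hazardous(state):
        print("Hazardous interaction:", state)
```

Real hazard analyses (HAZOP, FMEA, or the systems-theoretic methods discussed below) are of course far richer than this toy enumeration, but the combinatorial point is the one made above: the danger can reside entirely in the interaction of components that are each working as designed.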

In a nuclear processing plant, for example, there is the hazard of radioactive materials being brought into proximity with each other in a way that creates unintended critical mass. Jim Mahaffey’s Atomic Accidents: A History of Nuclear Meltdowns and Disasters: From the Ozark Mountains to Fukushima offers numerous examples of such unintended events, from the careless handling of plutonium scrap in a machining process to the transfer of a fissionable liquid from a vessel of one shape to another. We might try to handle these risks as an organizational problem: more and better training for operatives about the importance of handling nuclear materials according to established protocols, and effective supervision and oversight to ensure that the protocols are observed on a regular basis. But it is also possible to design the material processes within a nuclear plant in a way that makes unintended criticality virtually impossible — for example, by storing radioactive solutions in containers that simply cannot be brought into close proximity with each other.

Nancy Leveson is a national expert on defining and applying principles of system safety. Her book Engineering a Safer World: Systems Thinking Applied to Safety is a thorough treatment of her thinking about this subject. She offers a series of compelling reasons for believing that safety is a system-level characteristic that requires a systems approach: the fast pace of technological change, reduced ability to learn from experience, the changing nature of accidents, new types of hazards, increasing complexity and coupling, decreasing tolerance for single accidents, difficulty in selecting priorities and making tradeoffs, more complex relationships between humans and automation, and changing regulatory and public view of safety (kl 130 ff.). Particularly important in this list is the comment about complexity and coupling: “The operation of some systems is so complex that it defies the understanding of all but a few experts, and sometimes even they have incomplete information about the system’s potential behavior” (kl 137).

Given the fact that safety and accidents are products of whole systems, she is critical of the accident methodology generally applied to serious industrial, aerospace, and chemical accidents. This methodology involves tracing the series of events that led to the outcome, and identifying one or more events as the critical cause of the accident. However, she writes:

In general, event-based models are poor at representing systemic accident factors such as structural deficiencies in the organization, management decision making, and flaws in the safety culture of the company or industry. An accident model should encourage a broad view of accident mechanisms that expands the investigation beyond the proximate events. A narrow focus on technological components and pure engineering activities or a similar narrow focus on operator errors may lead to ignoring some of the most important factors in terms of preventing future accidents. (kl 452)

Here is a definition of system safety offered later in ESW in her discussion of the emergence of the concept within the defense and aerospace fields in the 1960s:

System Safety … is a subdiscipline of system engineering. It was created at the same time and for the same reasons. The defense community tried using the standard safety engineering techniques on their complex new systems, but the limitations became clear when interface and component interaction problems went unnoticed until it was too late, resulting in many losses and near misses. When these early aerospace accidents were investigated, the causes of a large percentage of them were traced to deficiencies in design, operations, and management. Clearly, big changes were needed. System engineering along with its subdiscipline, System Safety, were developed to tackle these problems. (kl 1007)

Here Leveson mixes system design and organizational dysfunctions as system-level causes of accidents. But much of her work in this book and her earlier Safeware: System Safety and Computers gives extensive attention to the design faults and component interactions that lead to accidents — what we might call system safety in the narrow or technical sense.

A systems engineering approach to safety starts with the basic assumption that some properties of systems, in this case safety, can only be treated adequately in the context of the social and technical system as a whole. A basic assumption of systems engineering is that optimization of individual components or subsystems will not in general lead to a system optimum; in fact, improvement of a particular subsystem may actually worsen the overall system performance because of complex, nonlinear interactions among the components. (kl 1007) 

Overall, then, it seems clear that Leveson believes that both organizational features and technical system characteristics are part of the systems that created the possibility for accidents like Bhopal, Fukushima, and Three Mile Island. Her own accident model, STAMP (Systems-Theoretic Accident Model and Processes), which is designed to help identify the causes of accidents, emphasizes both kinds of system properties.

Using this new causality model … changes the emphasis in system safety from preventing failures to enforcing behavioral safety constraints. Component failure accidents are still included, but our conception of causality is extended to include component interaction accidents. Safety is reformulated as a control problem rather than a reliability problem. (kl 1062)

In this framework, understanding why an accident occurred requires determining why the control was ineffective. Preventing future accidents requires shifting from a focus on preventing failures to the broader goal of designing and implementing controls that will enforce the necessary constraints. (kl 1084)
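
To illustrate what treating safety as a control problem can mean in practice, here is a small Python sketch (my own gloss on the idea, not Leveson's STAMP notation; the constraint, sensor names, and threshold are all invented) contrasting the reliability question, whether any component has failed, with the control question, whether a system-level safety constraint is being enforced and what action follows if it is not.

```python
# Toy contrast between a reliability view and a control view of safety.
# All names and values here are invented for illustration.

SAFE_MAX_PRESSURE = 100.0  # hypothetical system-level safety constraint

def any_component_failed(sensors):
    """Reliability question: has some component failed?"""
    return any(reading["failed"] for reading in sensors.values())

def constraint_violated(sensors):
    """Control question: is the safety constraint being violated?"""
    return sensors["pressure"]["value"] >= SAFE_MAX_PRESSURE

def control_action(sensors):
    """Enforce the constraint rather than merely detect failures."""
    if constraint_violated(sensors):
        return "open relief valve and halt the process"
    return "continue normal operation"

# Example: every component is 'working', yet the constraint is violated,
# the kind of component-interaction accident described above.
readings = {
    "pressure": {"value": 105.0, "failed": False},
    "temperature": {"value": 60.0, "failed": False},
}
print(any_component_failed(readings))  # False
print(constraint_violated(readings))   # True
print(control_action(readings))        # open relief valve and halt the process
```

The point of the contrast is the one made in the quoted passages: an accident can occur without any component failure, so the analysis must ask why the controls that should have enforced the constraint were ineffective.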

Leveson’s brief analysis of the Bhopal disaster in 1984 (kl 384 ff.) emphasizes the organizational dysfunctions that led to the accident — and that were completely ignored by the Indian state’s investigation of the accident: out-of-service gauges, alarm deficiencies, inadequate response to prior safety audits, shortage of oxygen masks, failure to inform the police or surrounding community of the accident, and an environment of cost cutting that impaired maintenance and staffing. “When all the factors, including indirect and systemic ones, are considered, it becomes clear that the maintenance worker was, in fact, only a minor and somewhat irrelevant player in the loss. Instead, degradation in the safety margin occurred over time and without any particular single decision to do so but simply as a series of decisions that moved the plant slowly toward a situation where any slight error would lead to a major accident” (kl 447).

Patient safety

An issue which is of concern to anyone who receives treatment in a hospital is the topic of patient safety. How likely is it that there will be a serious mistake in treatment — wrong-site surgery, incorrect medication or radiation dose, exposure to a hospital-acquired infection? The current evidence is alarming. (Martin Makary et al. estimate that over 250,000 deaths per year result from medical mistakes — making medical error now the third leading cause of mortality in the United States (link).) And when these events occur, where should we look for assigning responsibility — at the individual providers, at the systems that have been implemented for patient care, at the regulatory agencies responsible for overseeing patient safety?

Medical accidents commonly demonstrate a complex interaction of factors, from the individual provider to the technologies in use to failures of regulation and oversight. We can look at a hospital as a place where caring professionals do their best to improve the health of their patients while scrupulously avoiding errors. Or we can look at it as an intricate system for recording and disseminating information about patients and for administering procedures (surgery, medication, radiation therapy) to them; in this sense a hospital is similar to a factory with multiple intersecting locations of activity. Finally, we can look at it as an organization — a system of division of labor, cooperation, and supervision by large numbers of staff whose joint efforts lead to health and accidents alike. Obviously each of these perspectives is partially correct. Doctors, nurses, and technicians are carefully and extensively trained to diagnose and treat their patients. The technology of the hospital — the digital patient record system, the devices that administer drugs, the surgical robots — can be designed better or worse from a safety point of view. And the social organization of the hospital can be effective and safe, or it can be dysfunctional and unsafe. So all three aspects are relevant both to safe operations and to the possibility of chronic lack of safety.

So how should we analyze the phenomenon of patient safety? What factors can be identified that distinguish high-safety hospitals from low-safety hospitals? What lessons can be learned from the study of accidents and mistakes that cumulatively constitute a hospital’s patient safety record?

The view that primarily emphasizes expertise and training of individual practitioners is very common in the healthcare industry, and yet this approach is not particularly useful as a basis for improving the safety of healthcare systems. Skill and expertise are necessary conditions for effective medical treatment; but the other two zones of accident space are probably more important for reducing accidents — the design of treatment systems and the organizational features that coordinate the activities of the various individuals within the system.

Dr. James Bagian is a strong advocate for treating healthcare institutions as systems. Bagian considers both the technical characteristics of care processes and the organizational forms through which these processes are carried out and monitored. And he is very skilled at teasing out some of the ways in which features of both system and organization lead to avoidable accidents and failures. I recall his description of a safety walkthrough he had done in a major hospital. During the tour he noticed a number of nurses’ stations that were covered with yellow sticky notes, and he observed that this was both a symptom and a cause of an accident-prone organization: it meant that individual caregivers were obliged to remind themselves of tasks and exceptions that needed to be observed. Far better to have a set of systems and protocols that make sticky notes unnecessary. Here is the abstract from a short summary article by Bagian on the current state of patient safety:

Abstract

The traditional approach to patient safety in health care has ranged from reticence to outward denial of serious flaws. This undermines the otherwise remarkable advances in technology and information that have characterized the specialty of medical practice. In addition, lessons learned in industries outside health care, such as in aviation, provide opportunities for improvements that successfully reduce mishaps and errors while maintaining a standard of excellence. This is precisely the call in medicine prompted by the 1999 Institute of Medicine report “To Err Is Human: Building a Safer Health System.” However, to effect these changes, key components of a successful safety system must include: (1) communication, (2) a shift from a posture of reliance on human infallibility (hence “shame and blame”) to checklists that recognize the contribution of the system and account for human limitations, and (3) a cultivation of non-punitive open and/or de-identified/anonymous reporting of safety concerns, including close calls, in addition to adverse events.

(Here is the Institute of Medicine study to which Bagian refers; link.)

Nancy Leveson is an aeronautical and software engineer who has spent most of her career devoted to designing safe systems. Her book Engineering a Safer World: Systems Thinking Applied to Safety is a recent presentation of her theories of systems safety. She applies these approaches to problems of patient safety with several co-authors in “A Systems Approach to Analyzing and Preventing Hospital Adverse Events” (link). Here is the abstract and summary of findings for that article:

Objective:

This study aimed to demonstrate the use of a systems theory-based accident analysis technique in health care applications as a more powerful alternative to the chain-of-event accident models currently underpinning root cause analysis methods.

Method:

A new accident analysis technique, CAST [Causal Analysis based on Systems Theory], is described and illustrated on a set of adverse cardiovascular surgery events at a large medical center. The lessons that can be learned from the analysis are compared with those that can be derived from the typical root cause analysis techniques used today.

Results:

The analysis of the 30 cardiovascular surgery adverse events using CAST revealed the reasons behind unsafe individual behavior, which were related to the design of the system involved and not negligence or incompetence on the part of individuals. With the use of the system-theoretic analysis results, recommendations can be generated to change the context in which decisions are made and thus improve decision making and reduce the risk of an accident.

Conclusions:

The use of a systems-theoretic accident analysis technique can assist in identifying causal factors at all levels of the system without simply assigning blame to either the frontline clinicians or technicians involved. Identification of these causal factors in accidents will help health care systems learn from mistakes and design system-level changes to prevent them in the future.

Crucial in this article is this research group’s effort to identify causes “at all levels of the system without simply assigning blame to either the frontline clinicians or technicians involved”. The key result is this: “The analysis of the 30 cardiovascular surgery adverse events using CAST revealed the reasons behind unsafe individual behavior, which were related to the design of the system involved and not negligence or incompetence on the part of individuals.”

Bagian, Leveson, and others make a crucial point: in order to substantially increase the performance of hospitals and the healthcare system more generally when it comes to patient safety, it will be necessary to extend the focus of safety analysis from individual incidents and agents to the systems and organizations through which these accidents were possible. In other words, attention to systems and organizations is crucial if we are to significantly reduce the frequency of medical and hospital mistakes.

(The Makary et al estimate of 250,000 deaths caused by medical error has been questioned on methodological grounds. See Aaron Carroll’s thoughtful rebuttal (NYT 8/15/16; link).)

Safety culture or safety behavior?

Andrew Hopkins is a much-published expert on industrial safety who has an important set of insights into the causes of industrial accidents. Much of his career has focused on the oil and gas industry, but he has written on other sectors as well. Particularly interesting are several of his books: Failure to Learn: The BP Texas City Refinery Disaster; Disastrous Decisions: The Human and Organisational Causes of the Gulf of Mexico Blowout; and Lessons from Longford: The Esso Gas Plant Explosion. He also provides a number of interesting working papers here.

One of his interesting working papers is on the topic of safety culture in the drilling industry, “Why safety cultures don’t work” (link).

Companies that set out to create a “safety culture” often expend huge amounts of resource trying to change the way operatives, foremen and supervisory staff think and feel about safety. The results are often disappointing. (1)

Changing the way people think is nigh impossible, but setting up organizational structures that monitor compliance with procedure, even if that procedure is seen as redundant or unnecessary, is doable. (3)

Hopkins’ central point is that safety requires change of routine behavior, not in the first instance change of culture or thought. This means that management and regulatory agencies need to establish safe practices and then enforce compliance through internal and external measures. He uses the example of seat belt usage: campaigns to encourage the use of seat belts had little effect, but behavior changed when fines were imposed on drivers who continued to refrain from seat belt usage.

His central focus here, as in most of his books, is on the processes involved in the drilling industry. He makes the point that the incentives that are established in oil and gas drilling are almost entirely oriented towards maximizing speed and production. Exhortations towards “safe practices” are ineffectual in this context.

Much of his argument here comes down to the contrast between high-likelihood, low-harm accidents and low-likelihood, high-harm accidents. The steps required to prevent low-likelihood, high-harm accidents are generally not visible in the workplace, precisely because the sequences that lead to them are highly uncommon. Routine safety procedures will not reduce the likelihood of occurrence of the high-harm accident.

Hopkins offers the example of the air traffic control industry. The ultimate disaster in air traffic control is a mid-air collision, and very few such incidents have occurred; the one Hopkins refers to is the mid-air collision over Überlingen, Germany in 2002. But procedures in air traffic control give absolute priority to preventing such disasters, and the solution is to identify a key precursor event to a mid-air collision and to ensure that these precursor events are recorded, investigated, and reacted to when they occur. The relevant precursor event in air traffic control is a proximity of two aircraft at a distance of 1.5 miles or less; the required separation is 2 miles. Air traffic control regulations and processes require a full investigation and response whenever separation falls to 1.5 miles or less. Air traffic control is a high-reliability industry precisely because it gives priority and resources to the prevention, not only of the disastrous incidents themselves, but of the precursors that may lead to them. “This is a clear example of the way a high-reliability organization operates. It works out what the most catastrophic event is likely to be, regardless of how rare such events are in recent experience, and devises good indicators of how well the prevention of that catastrophe is being managed. It is a way of thinking that is highly unusual in the oil and gas industry” (2).
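
Hopkins’ precursor logic is easy to express as a monitoring rule. The sketch below (in Python, with invented flight identifiers, and using the illustrative 2-mile and 1.5-mile figures from the paragraph above) simply filters a log of closest-approach events for those that cross the investigation threshold, the kind of indicator a high-reliability organization tracks even though the catastrophe itself almost never occurs.

```python
# Hypothetical precursor-event monitor in the spirit of Hopkins' ATC example.
# The catastrophe (a mid-air collision) is vanishingly rare, so the organization
# records and investigates a defined precursor: loss of separation.

REQUIRED_SEPARATION_MILES = 2.0      # nominal standard (figure from the text above)
INVESTIGATION_THRESHOLD_MILES = 1.5  # at or below this, a full investigation is required

def events_requiring_investigation(separation_log):
    """Return every recorded event that crosses the investigation threshold."""
    return [event for event in separation_log
            if event["min_separation_miles"] <= INVESTIGATION_THRESHOLD_MILES]

# Made-up log of closest approaches between pairs of aircraft.
separation_log = [
    {"pair": ("XY101", "XY202"), "min_separation_miles": 3.2},
    {"pair": ("XY303", "XY404"), "min_separation_miles": 1.4},  # precursor: investigate
    {"pair": ("XY505", "XY606"), "min_separation_miles": 1.9},  # below standard, above threshold
]

for event in events_requiring_investigation(separation_log):
    print("Investigate loss-of-separation event:", event["pair"],
          event["min_separation_miles"], "miles")
```

As the next paragraph suggests, the same structure carries over to drilling, with well kicks and cementing failures playing the role of the loss-of-separation event.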

The drilling industry does not commonly follow similar high-level safety management. A drilling blowout is the incident of greatest concern in the drilling industry. There are, according to Hopkins, several obvious precursor events to a well blowout: well kicks and cementing failures. It is Hopkins’ contention that safety in the drilling industry would be greatly enhanced (with respect to the catastrophic events that are both low-probability and high-harm) if procedures were reoriented so that priority attention and tracking were given to these kinds of precursor events. By reducing or eliminating the occurrence of the precursor events, major accidents would be prevented.

Another organizational factor that Hopkins highlights is the role that safety officers play within the organization. In high-reliability organizations, safety officers have an organizationally privileged role; in low-reliability organizations their voices seem to disappear in the competition among many managerial voices with other interests (speed, production, public relations). (This point is explored in an earlier post; link.)

Prior to Macondo [the Deepwater Horizon oil spill], BP’s process safety structure was decentralized. The safety experts had very little power. They lacked strong reporting lines to the centre and answered to commercial managers who tended to put production ahead of engineering excellence. After Macondo, BP reversed this. Now, what I call the “voices of safety” are powerful and heard loud and clear in the boardroom. (3)

Ominously, Hopkins makes a prescient point about the crucial role played by regulatory agencies in enhancing safety in high-risk industries.

Many regulatory regimes, however, particularly that of the US, are not functioning as they ought to. Regulators need to be highly skilled and resourced and must be able to match the best minds in industry in order to have competent discussions about the risk-management strategies of the corporations. In the US they’re not doing that yet. The best practice recognized worldwide is the safety case regime, in use in UK and Norway. (4)

Given the militantly anti-regulatory stance of the current US federal administration and the aggressive lack of attention its administrators pay to scientific and technical expertise, this is a very sobering source of worry about the future of industrial, chemical, and nuclear safety in the US.

Empowering the safety officer?

How can industries involving processes that create large risks of harm for individuals or populations be modified so they are more capable of detecting and eliminating the precursors of harmful accidents? How can nuclear accidents, aviation crashes, chemical plant explosions, and medical errors be reduced, given that each of these activities involves large bureaucratic organizations conducting complex operations and with substantial inter-system linkages? How can organizations be reformed to enhance safety and to minimize the likelihood of harmful accidents?

One of the lessons learned from the Challenger space shuttle disaster is the importance of a strongly empowered safety officer in organizations that deal in high-risk activities. This means the creation of a position dedicated to ensuring safe operations that falls outside the normal chain of command. The idea is that the normal decision-making hierarchy of a large organization has a built-in tendency to maintain production schedules and avoid costly delays. In other words, there is a built-in incentive to treat safety issues with lower priority than most people would expect.

If there had been an empowered safety officer in the launch hierarchy for the Challenger launch in 1986, there is a good chance this officer would have listened more carefully to the Morton-Thiokol engineering team’s concerns about low-temperature damage to the O-rings and would have ordered a halt to the launch sequence until temperatures in Florida rose to the critical value. The Rogers Commission faulted the decision-making process leading to the launch decision in its final report on the accident (The Report of the Presidential Commission on the Space Shuttle Challenger Accident – The Tragedy of Mission 51-L in 1986 – Volume One, Volume Two, Volume Three).

This approach is productive because empowering a safety officer creates a different set of interests in the management of a risky process. The safety officer’s interest is in safety, whereas other decision makers are concerned about revenues and costs, public relations, reputation, and other instrumental goods. So a dedicated safety officer is empowered to raise safety concerns that other officers might be hesitant to raise. Ordinary bureaucratic incentives may lead to underestimating risks or concealing faults; so lowering the accident rate requires giving some individuals the incentive and power to act effectively to reduce risks.

Similar findings have emerged in the study of medical and hospital errors. It has been recognized that high-risk activities are made less risky by empowering all members of the team to call a halt to an activity when they perceive a safety issue. When all members of the surgical team are empowered to halt a procedure when they note an apparent error, serious operating-room errors are reduced. (Here is a report from the American College of Obstetricians and Gynecologists on surgical patient safety; link. And here is a 1999 National Academy report on medical error; link.)

The effectiveness of a team-based approach to safety depends on one central fact: there is a high level of expertise embodied in the staff operating a surgical suite, an engineering laboratory, or a drug manufacturing facility. Empowering these individuals to stop a procedure when they judge that an unrecognized error is in play greatly extends the amount of embodied knowledge brought to bear on the process. The surgeon, the commanding officer, or the lab director is no longer the sole expert whose judgments count.

But it also seems clear that these innovations don’t work equally well in all circumstances. Take nuclear power plant operations. In Atomic Accidents: A History of Nuclear Meltdowns and Disasters: From the Ozark Mountains to Fukushima James Mahaffey documents multiple examples of nuclear accidents that resulted from the efforts of mid-level workers to address an emerging problem in an improvised way. In the case of nuclear power plant safety, it appears that the best prescription for safety is to insist on rigid adherence to pre-established protocols. In this case the function of a safety officer is to monitor operations to ensure protocol conformance — not to exercise independent judgment about the best way to respond to an unfavorable reactor event.

It is in fact an interesting exercise to try to identify the kinds of operations in which these innovations are likely to be effective.

Here is a fascinating interview in Slate with Jim Bagian, a former astronaut, one-time director of the Veterans Administration’s National Center for Patient Safety, and distinguished safety expert; link. Bagian emphasizes the importance of taking a system-based approach to safety. Rather than focusing on finding blame for specific individuals whose actions led to an accident, he stresses the importance of tracing back to the institutional, organizational, or logistic background of the accident. What can be changed in the process — of delivering medications to patients, of fueling a rocket, or of moving nuclear solutions around in a laboratory — that makes the likelihood of an accident substantially lower?

The safety principles involved here seem fairly simple: cultivate a culture in which errors and near-misses are reported and investigated without blame; empower individuals within risky processes to halt the process if their expertise and experience indicate the possibility of a significant error; create positions within organizations whose occupants’ interests are defined in terms of identifying and resolving unsafe practices and conditions; and share information about safety within the industry and with the public.

Nuclear accidents

 
diagrams: Chernobyl reactor before and after
 

Nuclear fission is one of the world-changing discoveries of the mid-twentieth century. The atomic bomb projects of the United States led to the atomic bombing of Japan in August 1945, and the hope for limitless electricity brought about the proliferation of a variety of nuclear reactors around the world in the decades following World War II. And, of course, nuclear weapons proliferated to other countries beyond the original circle of atomic powers.

Given the enormous energies associated with fission and the dangerous and toxic properties of radioactive components of fission processes, the possibility of a nuclear accident is a particularly frightening one for the modern public. The world has seen the results of several massive nuclear accidents — Chernobyl and Fukushima in particular — and the devastating results they have had on human populations and the social and economic wellbeing of the regions in which they occurred.

Safety is therefore a paramount priority in the nuclear industry, in research labs and in military and civilian applications alike. So what is the situation of safety in the nuclear sector? Jim Mahaffey’s Atomic Accidents: A History of Nuclear Meltdowns and Disasters: From the Ozark Mountains to Fukushima is a detailed and carefully researched attempt to answer this question. And the information he provides is not reassuring. Beyond the celebrated and well-known disasters at nuclear power plants (Three Mile Island, Chernobyl, Fukushima), Mahaffey refers to hundreds of accidents involving reactors, research laboratories, weapons plants, and deployed nuclear weapons that have had less public awareness. These accidents resulted in a very low number of lives lost, but their frequency is alarming. They are indeed “normal accidents” (Perrow, Normal Accidents: Living with High-Risk Technologies). For example:

  • a Japanese fishing boat is contaminated by fallout from Castle Bravo test of hydrogen bomb; lots of radioactive fish at the markets in Japan (March 1, 1954) (kl 1706)
  • one MK-6 atomic bomb is dropped on Mars Bluff, South Carolina, after a crew member accidentally pulled the emergency bomb release handle (February 5, 1958) (kl 5774)
  • Fermi 1 liquid sodium plutonium breeder reactor experiences fuel meltdown during startup trials near Detroit (October 4, 1966) (kl 4127)

Mahaffey also provides detailed accounts of the most serious nuclear accidents and meltdowns during the past forty years: Three Mile Island, Chernobyl, and Fukushima.

The safety and control of nuclear weapons is of particular interest. Here is Mahaffey’s summary of “Broken Arrow” events — the loss of atomic and fusion weapons:

Did the Air Force ever lose an A-bomb, or did they just misplace a few of them for a short time? Did they ever drop anything that could be picked up by someone else and used against us? Is humanity going to perish because of poisonous plutonium spread that was snapped up by the wrong people after being somehow misplaced? Several examples will follow. You be the judge. 

Chuck Hansen [U.S. Nuclear Weapons – The Secret History] was wrong about one thing. He counted thirty-two “Broken Arrow” accidents. There are now sixty-five documented incidents in which nuclear weapons owned by the United States were lost, destroyed, or damaged between 1945 and 1989. These bombs and warheads, which contain hundreds of pounds of high explosive, have been abused in a wide range of unfortunate events. They have been accidentally dropped from high altitude, dropped from low altitude, crashed through the bomb bay doors while standing on the runway, tumbled off a fork lift, escaped from a chain hoist, and rolled off an aircraft carrier into the ocean. Bombs have been abandoned at the bottom of a test shaft, left buried in a crater, and lost in the mud off the coast of Georgia. Nuclear devices have been pounded with artillery of a foreign nature, struck by lightning, smashed to pieces, scorched, toasted, and burned beyond recognition. Incredibly, in all this mayhem, not a single nuclear weapon has gone off accidentally, anywhere in the world. If it had, the public would know about it. That type of accident would be almost impossible to conceal. (kl 5527)

There are a few common threads in the stories of accident and malfunction that Mahaffey provides. First, there are failures of training and knowledge on the part of front-line workers. The physics of nuclear fission are often counter-intuitive, and the idea of critical mass does not fully capture the danger of a quantity of fissionable material. The geometry of the storage of the material makes a critical difference in going critical. Fissionable material is often transported and manipulated in liquid solution; and the shape and configuration of the vessel in which the solution is held makes a difference to the probability of exponential growth of neutron emission — leading to runaway fission of the material. Mahaffey documents accidents that occurred in nuclear materials processing plants that resulted from plant workers applying what they knew from industrial plumbing to their efforts to solve basic shop-floor problems. All too often the result was a flash of blue light and the release of a great deal of heat and radioactive material.
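
The point about geometry can be stated a little more precisely in textbook terms (this is the standard reactor-physics formulation, not Mahaffey's own notation). A fissile assembly runs away when the effective neutron multiplication factor reaches one, and that factor is the product of the material's intrinsic multiplication and the probability that a neutron does not leak out of the assembly, a probability governed by the shape and surface-to-volume ratio of the vessel:

$$ k_{\mathrm{eff}} = k_{\infty} \, P_{\mathrm{NL}}, \qquad \text{criticality when } k_{\mathrm{eff}} \ge 1 . $$

Transferring the same solution from a tall, narrow vessel (high leakage, low $P_{\mathrm{NL}}$) into a squat, compact one (low leakage, high $P_{\mathrm{NL}}$) can push $k_{\mathrm{eff}}$ past one without adding any material at all, which is exactly the hazard behind the vessel-transfer accidents mentioned earlier.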

Second, there is a fault at the opposite end of the knowledge spectrum — the tendency of expert engineers and scientists to believe that they can solve complicated reactor problems on the fly. This turned out to be a critical problem at Chernobyl (kl 6859).

The most difficult problem to handle is that the reactor operator, highly trained and educated with an active and disciplined mind, is liable to think beyond the rote procedures and carefully scheduled tasks. The operator is not a computer, and he or she cannot think like a machine. When the operator at NRX saw some untidy valve handles in the basement, he stepped outside the procedures and straightened them out, so that they were all facing the same way. (kl 2057)

There are also clear examples of inappropriate supervision in the accounts shared by Mahaffey. Here is an example from Chernobyl.

[Deputy chief engineer] Dyatlov was enraged. He paced up and down the control panel, berating the operators, cursing, spitting, threatening, and waving his arms. He demanded that the power be brought back up to 1,500 megawatts, where it was supposed to be for the test. The operators, Toptunov and Akimov, refused on grounds that it was against the rules to do so, even if they were not sure why. 

Dyatlov turned on Toptunov. “You lying idiot! If you don’t increase power, Tregub will!”  

Tregub, the Shift Foreman from the previous shift, was officially off the clock, but he had stayed around just to see the test. He tried to stay out of it. 

Toptunov, in fear of losing his job, started pulling rods. By the time he had wrestled it back to 200 megawatts, 205 of the 211 control rods were all the way out. In this unusual condition, there was danger of an emergency shutdown causing prompt supercriticality and a resulting steam explosion. At 1:22:30 a.m., a read-out from the operations computer advised that the reserve reactivity was too low for controlling the reactor, and it should be shut down immediately. Dyatlov was not worried. “Another two or three minutes, and it will be all over. Get moving, boys!” (kl 6887)

This was the turning point in the disaster.

A related fault is the intrusion of political and business interests into the design and conduct of high-risk nuclear actions. Leaders want a given outcome without understanding the technical details of the processes they are demanding; subordinates like Toptunov are eventually cajoled or coerced into taking the problematic actions. The persistence of advocates for liquid sodium breeder reactors represents a higher-level example of the same fault. Associated with this role of political and business interests is an impulse towards secrecy and concealment when accidents occur and deliberate understatement of the public dangers created by an accident — a fault amply demonstrated in the Fukushima disaster.

Atomic Accidents provides a fascinating history of events of which most of us are unaware. The book is not primarily intended to offer an account of the causes of these accidents, but rather of the ways in which they unfolded and the consequences they had for human welfare. (Generally speaking, his view is that nuclear accidents in North America and Western Europe have had remarkably few human casualties.) And many of the accidents he describes are exactly the sorts of failures that are common in all large-scale industrial and military processes.

(Large-scale technology failure has come up frequently here. See these posts for analysis of some of the organizational causes of technology failure (link, link, link).)

Thinking about disaster

 

Charles Perrow is a very talented sociologist who has put his finger on some of the central weaknesses of the American social-economic-political system.  He has written about corporations (Organizing America: Wealth, Power, and the Origins of Corporate Capitalism), technology failure (Normal Accidents: Living with High-Risk Technologies), and organizations (Complex Organizations: A Critical Essay).  (Here is an earlier post on his historical account of the corporation in America; link.) These sound like very different topics — but they’re not, really.  Organizations, power, the conflict between private interests and the public good, and the social and technical causes of great public harms have been the organizing themes of his research for a very long time.

His current book is truly scary.  In The Next Catastrophe: Reducing Our Vulnerabilities to Natural, Industrial, and Terrorist Disasters he carefully surveys the conjunction of factors that make 21st-century America almost uniquely vulnerable to major disasters — actual and possible.  Hurricane Katrina is one place to start — a concentration of habitation, dangerous infrastructure, vulnerable toxic storage, and wholly inadequate policies of water and land use led to a horrific loss of life and a permanent crippling of a great American city.  The disaster was foreseeable and foreseen, and yet few effective steps were taken to protect the city and river system from catastrophic flooding.  And even more alarming — government and the private sector have taken almost none of the prudent steps after the disaster that would mitigate future flooding.

Perrow’s analysis includes natural disasters (floods, hurricanes, earthquakes), nuclear power plants, chemical plants, the electric power transmission infrastructure, and the Internet — as well as the threat of deliberate attacks by terrorists against high-risk targets.   In each case he documents the extreme risks that our society faces from a combination of factors: concentration of industry and population, lax regulation, ineffective organizations of management and oversight, and an inability on the part of Congress to enact legislation that seriously interferes with the business interests of major corporations even for the purpose of protecting the public.

His point is a simple one: we can’t change the weather, the physics of nuclear power, or the destructive energy contained in an LNG farm; but we can take precautions today that significantly reduce the possible effects of accidents caused by these factors in the future. His general conclusion is a very worrisome one: our society is essentially unprotected from major natural disasters and industrial accidents, and we have only very slightly increased our safety when it comes to preventing deliberate terrorist attacks.

This book has been about the inevitable inadequacy of our efforts to protect us from major disasters. It locates the inevitable inadequacy in the limitations of formal organizations. We cannot expect them to do an adequate job in protecting us from mounting natural, industrial, and terrorist disasters.  It locates the avoidable inadequacy of our efforts in our failure to reduce the size of the targets, and thus minimize the extent of harm these disasters can do. (chapter 9)

A specific failure in our current political system is the failure to construct an adequate and safety-enhancing system of regulation:

Stepping outside of the organization itself, we come to a third source of organizational failure, that of regulation. Every chapter on disasters in this book has ended with a call for better regulation and re-regulation, since we need both new regulations in the face of new technologies and threats and the restoration of past regulations that had disappeared or been weakened since the 1960s and 1970s. (chapter 9)

The central vulnerabilities that Perrow points to are systemic and virtually ubiquitous across the United States — concentration and centralization.  He is very concerned about the concentration of people in high-risk areas (flood and earthquake zones, for example); he is concerned about the centralized power wielded by mega-organizations and corporations in our society; and he is concerned about the concentration of highly dangerous infrastructure in places where it puts large populations at risk.  He refers repeatedly to the risk posed by the transport by rail of huge quantities of chlorine gas through densely populated areas — 90 tons at a time; the risk presented by LNG and propane storage farms in areas vulnerable to flooding and consequent release or explosion; the lethal consequences that would ensue from a winter-time massive failure of the electric power grid.

Perrow is an organizational expert; and he recognizes the deep implications that follow from the inherent obstacles that confront large organizations, both public and private.  Co-optation by powerful private interests, failure of coordination among agencies, lack of effective communication in the preparation of policies and emergency responses — these organizational tendencies can reduce organizations like FEMA or the NRC to almost complete inability to perform their public functions.

Organizations, as I have often noted, are tools that can be used by those within and without them for purposes that have little to do with their announced goals. (kl 1686)

Throughout the book Perrow offers careful, detailed reviews of the effectiveness and consistency of the government agencies and the regulatory legislation that have been deployed to contain these risks.  Why was FEMA such an organizational failure?  What’s wrong with the Department of Homeland Security?  Why are chronic issues of system safety in nuclear power plants and chemical plants not adequately addressed by the corresponding regulatory agencies?  Perrow goes through these examples in great detail and demonstrates the very ordinary social mechanisms through which organizations lose effectiveness.  The book serves as a case-study review of organizational failures.

Perrow’s central point is stark: the American political system lacks the strength to take the long-term steps it needs to in order to mitigate the worst effects of natural (or intentional) disasters that are inevitable in our future.  We need consistent investment for long-term benefits; we need effective regulation of powerful actors; and we need long-term policies that mitigate future disasters.  But so far we have failed in each of these areas.  Private interests are too strong, an ideology of free choice and virtually unrestrained use of property leads to dangerous residential and business development, and Federal and state agencies lack the political will to enact the effective regulations that would be necessary to raise the safety threshold in dangerous industries and developments. And, of course, the determined attack on “government regulations” that has been underway from the right since the Reagan years just further worsens the ability of agencies to regulate these powerful businesses — the nuclear power industry, the chemical industry, the oil and gas industry, …

One might think that the risks that Perrow describes are fairly universal across modern societies.  But Perrow notes that these problems seem more difficult and fundamental in the United States than in Europe.  The Netherlands has centuries of experience in investing in and regulating developments having to do with the control of water; European countries have managed to cooperate on the management of rivers and flood plains; and most have much stronger regulatory regimes for the high risk technologies and infrastructure sectors.

The book is scary, and we need to pay attention to the social and natural risks that Perrow documents so vividly.  And we need collectively to take steps to realistically address these risks.  We need to improve the organizations we create, both public and private, aimed at mitigating large risks.  And we need to substantially improve upon the reach and effectiveness of the regulatory systems that govern these activities.  But Perrow insists that improving organizations and leadership, and creating better regulations, can only take us so far.  So we also need to reduce the scope of damage that will occur when disaster strikes.  We need to design our social system for “soft landings” when disasters occur.  Fundamentally, his advice is to decentralize dangerous infrastructure and to be much more cautious about development in high-risk zones.

Given the limited success we can expect from organizational, executive, and regulatory reform, we should attend to reducing the damage that organizations can do by reducing their size.  Smaller organizations have a smaller potential for harm, just as smaller concentrations of populations in areas vulnerable to natural, industrial, and terrorist disasters present smaller targets. (chapter 9)

If owners assume more responsibility for decisions about design and location — for example, by being required to purchase realistically priced flood or earthquake insurance — then there would be less new construction in hurricane alleyways or high-risk earthquake areas.  Rather than integrated mega-organizations and corporations providing goods and services, Perrow argues for the effectiveness of networks of small firms.  And he argues that regulations and law can be designed that give the right incentives to developers and home buyers about where to locate their businesses and homes, reflecting the true costs associated with risky locations. Realistically priced mandatory flood insurance would significantly alter the population density in hurricane alleys.  And our policies and regulations should make a systematic effort to disperse dangerous concentrations of industrial and nuclear materials wherever possible.
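
The arithmetic behind “realistically priced” insurance is simple expected-loss pricing; the numbers below are purely illustrative and not Perrow's:

$$ \text{fair annual premium} \approx p_{\text{disaster}} \times \mathbb{E}[\text{loss}] . $$

A home with a 1-in-50 annual flood probability and $400,000 of exposure would then carry a premium on the order of $8,000 per year. A heavily subsidized premium of a few hundred dollars hides that risk, and it is precisely this distorted price signal that the proposal for realistically priced mandatory insurance is meant to correct.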

 

System safety engineering and the Deepwater Horizon

The Deepwater Horizon oil rig explosion, fire, and uncontrolled release of oil into the Gulf is a disaster of unprecedented magnitude.  The disaster in the Gulf of Mexico appears to be more serious in objective terms than the Challenger space shuttle disaster in 1986, both in immediate loss of life and in overall harm created. And sadly, it appears likely that the accident will reveal equally severe failures of management of enormously hazardous processes, defects in the associated safety engineering analysis, and inadequacies of the regulatory environment within which the activity took place.  The Challenger disaster fundamentally changed the ways that we thought about safety in the aerospace field.  It is likely that this disaster too will force radical new thinking and new procedures concerning how to deal with the inherently dangerous processes associated with deep-ocean drilling.

Nancy Leveson is an important expert in the area of systems safety engineering, and her book, Safeware: System Safety and Computers, is a genuinely important contribution.  Leveson led the investigation of the role that software design might have played in the Challenger disaster (link).  Here is a short, readable white paper of hers on system safety engineering (link) that is highly relevant to the discussions that will need to occur about deep-ocean drilling.  The paper does a great job of laying out how safety has been analyzed in several high-hazard industries, and presents a set of basic principles for systems safety design.  She discusses aviation, the nuclear industry, military aerospace, and the chemical industry; and she points out some important differences across industries when it comes to safety engineering.  Here is an instructive description of the safety situation in military aerospace in the 1950s and 1960s:

Within 18 months after the fleet of 71 Atlas F missiles became operational, four blew up in their silos during operational testing. The missiles also had an extremely low launch success rate.  An Air Force manual describes several of these accidents: 

     An ICBM silo was destroyed because the counterweights, used to balance the silo elevator on the way up and down in the silo, were designed with consideration only to raising a fueled missile to the surface for firing. There was no consideration that, when you were not firing in anger, you had to bring the fueled missile back down to defuel. 

     The first operation with a fueled missile was nearly successful. The drive mechanism held it for all but the last five feet when gravity took over and the missile dropped back. Very suddenly, the 40-foot diameter silo was altered to about 100-foot diameter. 

     During operational tests on another silo, the decision was made to continue a test against the safety engineer’s advice when all indications were that, because of high oxygen concentrations in the silo, a catastrophe was imminent. The resulting fire destroyed a missile and caused extensive silo damage. In another accident, five people were killed when a single-point failure in a hydraulic system caused a 120-ton door to fall. 

     Launch failures were caused by reversed gyros, reversed electrical plugs, bypass of procedural steps, and by management decisions to continue, in spite of contrary indications, because of schedule pressures. (from the Air Force System Safety Handbook for Acquisition Managers, Air Force Space Division, January 1984)

Leveson’s illustrations from the history of these industries are fascinating.  But even more valuable are the principles of safety engineering that she recapitulates.  These principles seem to have many implications for deep-ocean drilling and associated technologies and systems.  Here is her definition of systems safety:

System safety uses systems theory and systems engineering approaches to prevent foreseeable accidents and to minimize the result of unforeseen ones.  Losses in general, not just human death or injury, are considered. Such losses may include destruction of property, loss of mission, and environmental harm. The primary concern of system safety is the management of hazards: their identification, evaluation, elimination, and control through analysis, design and management procedures.

Here are several fundamental principles of designing safe systems that she discusses:
  • System safety emphasizes building in safety, not adding it on to a completed design.
  • System safety deals with systems as a whole rather than with subsystems or components.
  • System safety takes a larger view of hazards than just failures.
  • System safety emphasizes analysis rather than past experience and standards.
  • System safety emphasizes qualitative rather than quantitative approaches.
  • System safety recognizes tradeoffs and conflicts.
  • System safety is more than just system engineering.

And here is an important summary observation about the complexity of safe systems:

Safety is an emergent property that arises at the system level when components are operating together. The events leading to an accident may be a complex combination of equipment failure, faulty maintenance, instrumentation and control problems, human actions, and design errors. Reliability analysis considers only the possibility of accidents related to failures; it does not investigate potential damage that could result from successful operation of the individual components.

How do these principles apply to the engineering problem of deep-ocean drilling?  Perhaps the most important implications are these: a safe system needs to be based on careful and comprehensive analysis of the hazards that are inherently involved in the process; it needs to be designed with an eye to handling those hazards safely; and it can’t be done in a piecemeal, “fly-fix-fly” fashion.

It would appear that deep-ocean drilling is characterized by too little analysis and too much confidence in the ability of engineers to “correct” inadvertent outcomes (“fly-fix-fly”).  The accident that occurred in the Gulf last month can be analyzed into two parts. The first is the explosion and fire that destroyed the drilling rig and led to the tragic loss of life of 11 rig workers. The second is the still-uncalculated harm caused by the uncontrolled venting of perhaps a hundred thousand barrels of crude oil to date into the Gulf of Mexico, now threatening the coasts and ecologies of several states.  Shockingly, there is currently no high-reliability method for capping the well at a depth of over 5,000 feet; so the harm can continue to worsen for a very extended period of time.

The safety systems on the platform itself will need to be examined in detail. But the bottom line will probably look something like this: the platform is a complex system vulnerable to explosion and fire, and there was always a calculable (though presumably small) probability of catastrophic fire and loss of the ship. This is pretty analogous to the problem of safety in aircraft and other complex electro-mechanical systems. The loss of life in the incident is terrible but confined.  Planes crash and ships sink.

What elevates this accident to a globally important catastrophe is what happened next: destruction of the pipeline leading from the wellhead 5,000 feet below sea level to containers on the surface, and the failure of the shutoff valve system on the ocean floor. These two failures have resulted in the unconstrained release of a massive and uncontrollable flow of crude oil into the Gulf, with environmental harms that are likely to be greater than those of the Exxon Valdez spill.

Oil wells fail on the surface, and they are difficult to control. But there is a well-developed technology that teams of oil fire specialists like Red Adair employ to cap the flow and end the damage. We have nothing like this for wells drilled under water at the depth of this incident; the failed wellhead is less accessible to corrective intervention than objects in space. So surface well failures conform to a sort of epsilon-delta relationship: an epsilon accident leads to a limited delta harm. This deep-ocean well failure in the Gulf is catastrophically different: the relatively small incident on the surface is resulting in an unbounded and spiraling harm.
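
The analogy can be put slightly more formally. What follows is only a loose formalization of the contrast being drawn here, not a result from the safety-engineering literature:

```latex
% Surface well: the harm caused by a small initiating failure is bounded,
% and shrinks as the initiating failure shrinks.
\text{surface well:}\qquad \mathrm{harm}(\epsilon) \le \delta(\epsilon),
\quad \text{with } \delta(\epsilon) \to 0 \text{ as } \epsilon \to 0

% Deep-ocean well with no capping technology: even a small initiating
% failure \epsilon_0 produces a harm that grows with time and is bounded
% by no fixed \delta.
\text{deep-ocean well:}\qquad \mathrm{harm}(\epsilon_0, t) \text{ increases with } t,
\quad \sup_t \mathrm{harm}(\epsilon_0, t) = \infty
```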

So was this a foreseeable hazard? Of course it was. There was always a finite probability of total loss of the platform, leading to destruction of the pipeline. There was also a finite probability of failure of the massive sea-floor emergency shutoff valve. And, critically, it was certainly known that there is no high-reliability fix in the event of failure of the shutoff valve. The effort to use the dome currently being tried by BP is untested and unproven at this great depth. The alternative of drilling a second well to relieve pressure may work; but it will take weeks or months. So essentially, when we reach the end of this failure pathway, we arrive at this conclusion: catastrophic, unbounded failure. If you reach this point in the fault tree, there is almost nothing to be done. And this is a totally irrational outcome to tolerate; how could any engineer or regulatory agency have accepted the circumstances of this activity, given that one possible failure pathway would lead predictably to unbounded harms?

There is one line of thought that might have led to the conclusion that deep-ocean drilling is acceptably safe: engineers and policy makers might have optimistically overestimated the reliability of the critical components. If we estimate that the probability of failure of the platform is 1/1000, failure of the pipeline is 1/100, and failure of the emergency shutoff valve is 1/10,000, then one might say that the probability of the nightmare scenario is vanishingly small: one in a billion. Perhaps one might reason that we can disregard scenarios with this level of likelihood. Reasoning very much like this was involved in the original safety designs of the shuttle (Safeware: System Safety and Computers). But two things are now clear. First, this disaster was not virtually impossible; it actually occurred. And second, it seems likely enough that the estimates of component failure rates were badly understated. (The multiplication also assumes that the three failures are statistically independent, an assumption that a single catastrophic blowout plainly undermines.)
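
To make the arithmetic of that line of thought explicit, here is a minimal sketch of the fault-tree calculation, using the hypothetical component probabilities above; the function name and the factor-of-ten sensitivity check are my own illustration, not a reconstruction of any actual risk analysis.

```python
# A minimal sketch of the fault-tree arithmetic described above. The three
# component probabilities are the hypothetical figures from the text, not
# actual estimates for the Deepwater Horizon system.

def series_failure_probability(component_probs):
    """Probability that every component in the failure pathway fails,
    assuming (optimistically) that the failures are independent."""
    p = 1.0
    for prob in component_probs:
        p *= prob
    return p

optimistic = [1 / 1000, 1 / 100, 1 / 10000]    # platform, pipeline, shutoff valve
print(series_failure_probability(optimistic))   # ~1e-09 -- "one in a billion"

# If each component estimate is understated by a factor of ten, the joint
# probability rises by a factor of a thousand.
understated = [p * 10 for p in optimistic]
print(series_failure_probability(understated))  # ~1e-06

# And if the failures share a common cause (a single blowout that destroys
# the platform, the riser, and the blowout preventer together), multiplying
# the probabilities understates the risk even further.
```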

What does this imply about deep-ocean drilling? It seems inescapable that the current state of technology does not permit us to take the risk of this kind of total systems failure. Until there is a reliable and reasonably quick technology for capping a deep-ocean well, even a small probability of this kind of failure makes the use of the technology unjustifiable. It makes no sense at all to play Russian roulette when the cost of failure is massive and unconstrained ecological damage.

There is another aspect of this disaster that needs to be called out, and that is the issue of regulation. Just as the nuclear industry requires close, rigorous regulation and inspection, so deep-ocean drilling must be rigorously regulated. The stakes are too high to allow the oil industry to regulate itself. And unfortunately there are clear indications of weak regulation in this industry (link).

(Here are links to a couple of earlier posts on safety and technology failure (link, link).)

Patient safety — Canada and France


Patient safety is a key issue in managing and assessing a regional or national health system. There are very sizable variations in patient safety statistics across hospitals, with significantly higher rates of infection and mortality in some institutions than others. Why is this? And what can be done in order to improve the safety performance of low-safety institutions, and to improve the overall safety performance of the hospital environment nationally?

Previous posts have made the point that safety is the net effect of a complex system within a hospital or chemical plant, including institutions, rules, practices, training, supervision, and day-to-day behavior by staff and supervisors (post, post). And experts on hospital safety agree that improvements in safety require careful analysis of patient processes in order to redesign processes so as to make infections, falls, improper medications, and unnecessary mortality less likely. Institutional design and workplace culture have to change if safety performance is to improve consistently and sustainably. (Here is a posting providing a bit more discussion of the institutions of a hospital; post.)

But here is an important question: what are the features of the social and legal environment that will make it most likely that hospital administrators will commit themselves to a thorough-going culture and management of safety? What incentives or constraints need to exist to offset the impulses of cost-cutting and status quo management that threaten to undermine patient safety? What will drive the institutional change in a health system that improving patient safety requires?

Several measures seem clear. One is state regulation of hospitals. This exists in every state; but the effectiveness of regulatory regimes varies widely across contexts. So understanding the dynamics of regulation and enforcement is a crucial step towards improving hospital quality and patient safety. The oversight of rigorous hospital accreditation agencies is another important factor for improvement. For example, the Joint Commission accredits thousands of hospitals in the United States (web page) through dozens of accreditation and certification programs. Patient safety is the highest priority underlying Joint Commission standards of accreditation. So regulation and the formulation of standards are part of the answer. But a particularly important policy tool for improving safety performance is the mandatory collection and publication of safety statistics, so that potential patients can choose between hospitals on the basis of their safety performance. Publicity and transparency are crucial parts of good management behavior; and secrecy is a refuge of poor performance in areas of public concern such as safety, corruption, or rule-setting. (See an earlier post on the relationship between publicity and corruption.)

But here we have a bit of a conundrum: achieving mandatory publication of safety statistics is politically difficult, because hospitals have a business interest in keeping these data private. So there has been a lot of resistance to mandatory reporting of basic patient safety data in the US over the past twenty years. Fortunately, the public interest in having these data readily available has largely prevailed, and hospitals are now required to publish a broader and broader range of data on patient safety, including hospital-acquired infection rates, ventilator-induced pneumonias, patient falls, and mortality rates. Here is a useful tool from USA Today that lets patients gather information about their hospital options and how these compare with other hospitals regionally and nationally. This is an effective accountability mechanism that inevitably drives hospitals towards better performance.

Canada has been very active in this area. Here is a website published by the Ontario Ministry of Health and Long-Term Care. The province requires hospitals to report a number of factors that are good indicators of patient safety: several kinds of hospital-acquired infections; central-line primary bloodstream infection and ventilator-associated pneumonia; surgical-site infection prevention activity; and the hospital-standardized mortality ratio. The user can explore the site and find that there are in fact wide variations across hospitals in the province. This is likely to inform patient choice; but it also serves as an instant guide for regulatory agencies and local hospital administrators as they attempt to focus attention on poor management practices and institutional arrangements. (It would be helpful for the purpose of comparison if the data could be easily downloaded into a spreadsheet.)
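
In the spirit of that parenthetical wish, here is a minimal sketch of the kind of comparison a downloadable export would allow. The file name and column names are hypothetical placeholders, not the Ministry's actual data format.

```python
# A minimal sketch, assuming the Ontario indicators could be exported to a
# CSV file. "ontario_patient_safety.csv" and its column names are
# hypothetical placeholders, not the Ministry's actual schema.
import csv

with open("ontario_patient_safety.csv", newline="") as f:
    hospitals = list(csv.DictReader(f))

# Rank hospitals by C. difficile infection rate (cases per 1,000 patient days).
hospitals.sort(key=lambda row: float(row["cdiff_rate_per_1000_patient_days"]))

for row in hospitals[:10]:
    print(f'{row["hospital"]:<40} {row["cdiff_rate_per_1000_patient_days"]}')

# The same ranking could be produced for ventilator-associated pneumonia,
# central-line infections, or the hospital-standardized mortality ratio,
# making the cross-hospital variation immediately visible.
```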

On first principles, it seems likely that any country that has a hospital system in which the safety performance of each hospital is kept secret will also show a wide distribution of patient safety outcomes across institutions, and will have an overall safety record that is much lower than it could be. This is because secrecy gives hospital administrators the ability to conceal the risks their institutions impose on patients through bad practices. So publicity and regular publication of patient safety information seems to be a necessary precondition to maintaining a high-safety hospital system.

But here is the crucial point: many countries continue to permit secrecy when it comes to hospital safety. In particular, this seems to be true in France, where the medical and hospital system continues to display a very high degree of secrecy and opacity when it comes to patient safety. In fact, anecdotal information about French hospitals suggests a wide range of levels of hospital-acquired infections in different hospitals. Hospital-acquired infections (infections nosocomiales) are an important and rising cause of patient morbidity and mortality. And there are well-known practices and technologies that substantially reduce the incidence of these infections. But the implementation of these practices requires strong commitment and dedication at the unit level; and this degree of commitment is unlikely to occur in an environment of secrecy.

In fact, I have not been able to discover French counterparts to the tools that are now available for measuring patient safety in North American hospitals. But without this regular reporting, there is no mechanism through which institutions with bad safety performance can be “ratcheted” up into better practices and better safety outcomes. The impression given by the French medical system is that doctors and medical authorities are sacrosanct; patients are not expected to question their judgment, and the state appears not to require institutions to report and publish fundamental safety information. Patients have very little power, and the media so far seem to have paid little attention to the issues of patient safety in French hospitals. This 2007 article in Le Point appears to be a first for France in that it provides quantitative rankings of a large number of hospitals according to their treatment of various diseases. But it does not provide the kinds of safety information — infections, falls, pneumonias — that are core measures of patient safety.

There is a French state agency, the Office National d’Indemnisation des Accidents Médicaux (ONIAM), that provides compensation to patients who can demonstrate that their injuries are the result of hospital-induced causes, especially hospital-associated infections. But it appears that this agency is restricted to after-the-fact recognition of hospital errors rather than proactive programs designed to reduce them. And here is a French government website devoted to the issue of hospital infections. It announces a multi-pronged strategy for controlling the problem of infections nosocomiales, including the establishment of a national program of surveillance of the rates of these infections. So far, however, I have not been able to locate web resources that would provide hospital-level data about infection rates.

So I am offering a hypothesis that I would be very happy to see refuted: that the French medical establishment continues to be bureaucratically administered with very little public exposure of its actual performance when it comes to patient safety. And without this system of publicity, it seems very likely that there are wide and tragic variations across French hospitals with regard to patient safety.

Are there French medical sociologists and public health researchers who are working on the issue of patient safety in French hospitals? Can good contemporary French sociologists like Céline Béraud, Baptiste Coulmont, and Philippe Masson offer some guidance on this topic (post)? If readers are aware of databases and patient safety research programs in France that are relevant to these topics, I would be very happy to hear about them.

Update: Baptiste Coulmont (blog) passes on this link to the Réseau d’alerte, d’investigation et de surveillance des infections nosocomiales (RAISIN) within the Institut de veille sanitaire. The site provides research reports and regional assessments of the incidence of nosocomial infections. It does not appear to provide data at the level of specific hospitals and medical centers. Baptiste also refers to work by Jean Peneff, a French medical sociologist and author of La France malade de ses médecins. Here is a link to a subsequent research report by Peneff. Thanks, Baptiste.

Institutions, procedures, norms


One of the noteworthy aspects of Victor Nee and Mary Brinton’s framing of the assumptions of the new institutionalism is the very close connection they postulate between institutions and norms. (See the prior posting on this subject.) So what is the connection between institutions and norms?

The idea that an institution is nothing more than a collection of norms, formal and informal, seems incomplete on its face. Institutions also depend on rules, procedures, protocols, sanctions, and habits and practices. These other social-behavioral factors may intersect in various ways with the workings of social norms, but they are not reducible to a set of norms. Which is to say that institutions are not reducible to a collection of norms.

Consider for example the institutions that embody the patient safety regime in a hospital. What are the constituents of the institutions through which hospitals provide for patient safety? Certainly there are norms, both formal and informal, that are deliberately inculcated and reinforced and that influence the behavior of nurses, pharmacists, technicians, and doctors. But there are also procedures — checklists in operating rooms; training programs — rehearsals of complex crisis activities; routinized behaviors — “always confirm the patient’s birthday before initiating a procedure”; and rules — “physicians must disclose financial relationships with suppliers”. So the institutions defining the management of patient safety are a heterogeneous mix of social factors and processes.

A key feature of an institution, then, is the set of procedures and protocols that it embodies. In fact, we might consider a short-hand way of specifying an institution in terms of the set of procedures it specifies for behavior in stereotyped circumstances of crisis, conflict, cooperation, and mundane interactions with stakeholders. Organizations have usually created specific ways of handling typical situations: handling an intoxicated customer in a restaurant, making sure that no “wrong site” surgeries occur in an operating room, handling the flow of emergency supplies into a region when a large disaster occurs. The idea here is that the performance of the organization, and the individuals within it, will be more effective at achieving the desired goals of the organization if plans and procedures have been developed to coordinate actions in the most effective way possible. This is the purpose of an airline pilot’s checklist before takeoff; it forces the pilot to go through a complete procedure that has been developed for the purpose of avoiding mistakes. Spontaneous, improvised action is sometimes unavoidable; but organizations have learned that they are more effective when they thoughtfully develop procedures for handling their high-risk activities.
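
To make the idea of an encoded procedure concrete, here is a minimal sketch of a checklist object that refuses to accept skipped or out-of-order steps. The steps shown are illustrative placeholders, not an actual pre-takeoff checklist.

```python
# A minimal sketch of a procedure encoded as an enforced checklist.
# The steps shown are illustrative, not an actual pre-takeoff checklist.

class Checklist:
    def __init__(self, steps):
        self.steps = list(steps)
        self.completed = []

    def confirm(self, step):
        expected = self.steps[len(self.completed)]
        if step != expected:
            # Skipped or out-of-order steps are rejected rather than silently
            # tolerated -- which is the point of a written procedure.
            raise RuntimeError(f"expected '{expected}', got '{step}'")
        self.completed.append(step)

    def is_complete(self):
        return self.completed == self.steps

pre_takeoff = Checklist([
    "flight controls free and correct",
    "flaps set for takeoff",
    "trim set",
    "transponder on",
])
pre_takeoff.confirm("flight controls free and correct")
pre_takeoff.confirm("flaps set for takeoff")
# pre_takeoff.confirm("transponder on")  # would raise: a step was skipped
```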

This is the point at which the categories of management oversight and staff training come into play. It is one thing to have designed an effective set of procedures for handling a given complex task; but this achievement is only genuinely effective if agents within the organization in fact follow the procedures and protocols. Training is the umbrella activity that describes the processes through which the organization attempts to achieve a high level of shared knowledge about the organization’s procedures. And management oversight is the umbrella activity that describes the processes of supervision and motivation through which the organization attempts to ensure that its agents follow the procedures and protocols.

In fact, one of the central findings in the area of safety research is that the specific content of the procedures of an organization that engages in high-risk activities is crucially important to the overall safety performance of the organization. Apparently small differences in procedure can have an important effect on safety. To take a fairly simple example, the construction of a stylized vocabulary and syntax for air traffic controllers and pilots increases safety by reducing the possibility of ambiguous communications; so two air traffic systems that were identical except for whether they used standardized communications protocols would be expected to have different safety records. Another key finding falls more on the “norms and culture” side of the equation: it is frequently observed that high-risk organizations need to embody a culture of safety that permeates the whole organization.
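
As a toy illustration of the point about stylized vocabulary, the sketch below accepts only messages drawn from an approved phrase list; the phrases are simplified examples, not actual air traffic control phraseology.

```python
# A toy sketch of a restricted communications vocabulary. The phrase list is
# a simplified illustration, not actual air traffic control phraseology.

APPROVED_PHRASES = {
    "cleared for takeoff",
    "hold short",
    "line up and wait",
    "go around",
    "readback correct",
}

def validate(message: str) -> bool:
    """Accept only messages drawn from the approved phrase list, so that
    near-miss wordings are rejected instead of interpreted charitably."""
    return message.strip().lower() in APPROVED_PHRASES

print(validate("cleared for takeoff"))  # True
print(validate("okay, go ahead"))       # False -- ambiguous, not permitted
```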

We might postulate that norms come into the story when we ask what motivates a person to conform to the prescribed procedure or rule — though there are several other social-behavioral mechanisms that work at this level as well (trained habits and well-enforced sanctions, for example). But more fundamentally, the explanatory value of the micro-institutional analysis may lie at the level of the details of the procedures and rules, in contrast to other possible embodiments, rather than at the level of the question of what makes these procedures effective in most participants’ conduct.

We might say, then, that an institution can be fully specified when we provide information about:

  • the procedures, policies, and protocols it imposes on its participants
  • the training and educational processes the institution relies on for instilling appropriate knowledge about its procedures and rules in its participants
  • the management, supervision, enforcement, and incentive mechanisms it embodies to assure a sufficient level of compliance among its participants
  • the norms of behavior that typical participants have internalized with respect to action within the institution

And the distinctive performance characteristics of the institution may derive from the specific nature of the arrangements that are described at each of these levels.

System safety is a good example to consider from the point of view of the new institutionalism. Two airlines may have significantly different safety records. And the explanation may be at any of these levels: they may have differences in formalized procedures, they may have differences in training regimes, they may have differences in management oversight effectiveness, or they may have different normative cultures at the rank-and-file level. It is a central insight of the new institutionalism that the first level may be the most important for explaining the overall safety records of the two companies, even though mechanisms may fail at any of the other levels as well. Procedural differences generally lead to significant and measurable differences in the quality of organizational results. (Nancy Leveson’s Safeware: System Safety and Computers provides a great discussion of many of these issues.)
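
One way to see what this four-level specification amounts to is to write it down as a data structure and compare two organizations level by level. This is only a schematic sketch; the field names and example entries are my own illustration, not drawn from the new institutionalism literature.

```python
# A schematic sketch of the four-level specification of an institution
# described above. Field names and example values are illustrative only.
from dataclasses import dataclass, field

@dataclass
class InstitutionSpec:
    procedures: list = field(default_factory=list)  # protocols and policies imposed on participants
    training: list = field(default_factory=list)    # how knowledge of procedures is instilled
    oversight: list = field(default_factory=list)   # supervision, enforcement, incentives
    norms: list = field(default_factory=list)       # norms participants have internalized

airline_a = InstitutionSpec(
    procedures=["pre-takeoff checklist", "standardized radio phraseology"],
    training=["annual simulator recurrency"],
    oversight=["line checks", "anonymous incident reporting"],
    norms=["any crew member may call a go-around"],
)

airline_b = InstitutionSpec(
    procedures=["pre-takeoff checklist"],            # no standardized phraseology
    training=["annual simulator recurrency"],
    oversight=["line checks"],
    norms=["captain's judgment is rarely questioned"],
)

# Comparing the two specifications level by level locates where the
# institutions differ -- here, chiefly at the level of procedures and norms.
for level in ("procedures", "training", "oversight", "norms"):
    print(level, getattr(airline_a, level), getattr(airline_b, level))
```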
