Empowering the safety officer?

How can industries involving processes that create large risks of harm for individuals or populations be modified so they are more capable of detecting and eliminating the precursors of harmful accidents? How can nuclear accidents, aviation crashes, chemical plant explosions, and medical errors be reduced, given that each of these activities involves large bureaucratic organizations conducting complex operations and with substantial inter-system linkages? How can organizations be reformed to enhance safety and to minimize the likelihood of harmful accidents?

One of the lessons learned from the Challenger space shuttle disaster is the importance of a strongly empowered safety officer in organizations that deal in high-risk activities. This means the creation of a position dedicated to ensuring safe operations that falls outside the normal chain of command. The idea is that the normal decision-making hierarchy of a large organization has a built-in tendency to maintain production schedules and avoid costly delays. In other words, there is a built-in incentive to treat safety issues with lower priority than most people would expect.

If there had been an empowered safety officer in the launch hierarchy for the Challenger launch in 1986, there is a good chance this officer would have listened more carefully to the Morton-Thiokol engineering team’s concerns about low-temperature damage to O-rings and would have ordered a halt to the launch sequence until temperatures in Florida rose to the critical value. The Rogers Commission faulted the decision-making process leading to the launch in its final report on the accident (The Report of the Presidential Commission on the Space Shuttle Challenger Accident – The Tragedy of Mission 51-L in 1986 – Volume One, Volume Two, Volume Three).

This approach is productive because empowering a safety officer creates a different set of interests in the management of a risky process. The safety officer’s interest is in safety, whereas other decision makers are concerned about revenues and costs, public relations, reputation, and other instrumental goods. So a dedicated safety officer is empowered to raise safety concerns that other officers might be hesitant to raise. Ordinary bureaucratic incentives may lead to underestimating risks or concealing faults; so lowering the accident rate requires giving some individuals the incentive and power to act effectively to reduce risks.

Similar findings have emerged in the study of medical and hospital errors. It has been recognized that high-risk activities are made less risky by empowering all members of the team to call a halt to an activity when they perceive a safety issue. When all members of the surgical team are empowered to halt a procedure when they note an apparent error, serious operating-room errors are reduced. (Here is a report from the American College of Obstetricians and Gynecologists on surgical patient safety; link. And here is a 1999 National Academy report on medical error; link.)

The effectiveness of a team-based approach to safety depends on one central fact: there is a high level of expertise embodied in the staff operating a surgical suite, an engineering laboratory, or a drug manufacturing facility. Empowering these individuals to stop a procedure when they judge there is an unrecognized error in play greatly extends the amount of embodied knowledge brought to bear on the process. The surgeon, the commanding officer, or the lab director is no longer the sole expert whose judgments count.

But it also seems clear that these innovations don’t work equally well in all circumstances. Take nuclear power plant operations. In Atomic Accidents: A History of Nuclear Meltdowns and Disasters: From the Ozark Mountains to Fukushima, James Mahaffey documents multiple examples of nuclear accidents that resulted from the efforts of mid-level workers to address an emerging problem in an improvised way. In the case of nuclear power plants, it appears that the best prescription for safety is to insist on rigid adherence to pre-established protocols. In this case the function of a safety officer is to monitor operations to ensure protocol conformance — not to exercise independent judgment about the best way to respond to an unfavorable reactor event.

It is in fact an interesting exercise to try to identify the kinds of operations in which these innovations are likely to be effective.

Here is a fascinating interview in Slate with Jim Bagian, a former astronaut, one-time director of the Veterans Administration’s National Center for Patient Safety, and distinguished safety expert; link. Bagian emphasizes the importance of taking a system-based approach to safety. Rather than focusing on assigning blame to specific individuals whose actions led to an accident, he urges investigators to trace back to the institutional, organizational, or logistic background of the accident. What can be changed in the process — of delivering medications to patients, of fueling a rocket, or of moving nuclear solutions around in a laboratory — that makes the likelihood of an accident substantially lower?

The safety principles involved here seem fairly simple: cultivate a culture in which errors and near-misses are reported and investigated without blame; empower individuals within risky processes to halt the process if their expertise and experience indicates the possibility of a significant risky error; create individuals within organizations whose interests are defined in terms of the identification and resolution of unsafe practices or conditions; and share information about safety within the industry and with the public.

Technology lock-in accidents

image: diagram of molten salt reactor

Organizational and regulatory features are sometimes part of the causal background of important technology failures. This is particularly true in the history of nuclear power generation. The promise of peaceful uses of atomic energy was enormously attractive at the end of World War II. In abstract terms the idea of generating usable power from atomic reactions was quite simple. What was needed was a controllable fission reaction in which the heat produced by fission could be captured to run a steam-powered electrical generator.

The technical challenges presented by harnessing nuclear fission in a power plant were large. Fissionable material needed to be produced in the form of usable fuel. A control system needed to be designed to maintain the level of fission at a desired level. And, most critically, a system for removing heat from the fissioning fuel needed to be designed so that the reactor core would not overheat and melt down, releasing energy and radioactive materials into the environment.

Early reactor designs took different approaches to the heat-removal problem. Liquid metal reactors used a metal like sodium as the fluid that would run through the core removing heat to a heat sink for dispersal; and water reactors used pressurized water to serve that function. The sodium breeder reactor design appeared to be a viable approach, but incidents like the Fermi 1 disaster near Detroit cast doubt on the wisdom of using this approach. The reactor design that emerged as the dominant choice in civilian power production was the light water reactor. But light water reactors presented their own technological challenges, including most especially the risk of a massive steam explosion in the event of a power interruption to the cooling plant. In order to obviate this risk reactor designs involved multiple levels of redundancy to ensure that no such power interruption would occur. And much of the cost of construction of a modern light water power plant is dedicated to these systems — containment vessels, redundant power supplies, etc. In spite of these design efforts, however, light water reactors at Three Mile Island and Fukushima did in fact melt down under unusual circumstances — with particularly devastating results in Fukushima. The nuclear power industry in the United States essentially died as a result of public fears of the possibility of meltdown of nuclear reactors near populated areas — fears that were validated by several large nuclear disasters.
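
To make the cooling-power risk concrete, here is a minimal sketch using the textbook Way-Wigner approximation for decay heat; it is not drawn from the sources discussed in this post, and the operating history and core size are illustrative assumptions.

```python
# A rough sketch of decay heat after shutdown (Way-Wigner textbook approximation).
# Fission stops when a reactor scrams, but radioactive decay in the fuel keeps
# generating heat; if cooling power is lost, that residual heat can melt the core.
def decay_heat_fraction(t_after_shutdown_s: float, t_operating_s: float) -> float:
    """Decay heat as a fraction of full thermal power, t seconds after shutdown."""
    return 0.0622 * (t_after_shutdown_s ** -0.2
                     - (t_after_shutdown_s + t_operating_s) ** -0.2)

ONE_YEAR = 3.15e7  # assumed seconds of prior operation
for t in (10.0, 3600.0, 86400.0):  # 10 seconds, 1 hour, 1 day after shutdown
    frac = decay_heat_fraction(t, ONE_YEAR)
    print(f"{t:>8.0f} s: {frac:.2%} of full power "
          f"(~{frac * 3000:.0f} MW for a 3000 MW-thermal core)")
```

Even a day after shutdown the core in this sketch is still producing on the order of ten megawatts of heat, which is why so much of the cost of a light water plant goes into redundant cooling systems and emergency power.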

What is interesting about this story is that there was an alternative reactor design, developed by US nuclear scientists and engineers at the Oak Ridge National Laboratory in the 1950s, that involved a significantly different solution to the problem of harnessing the heat of a nuclear reaction and that posed a dramatically lower risk of meltdown and radioactive release. This is the molten salt reactor, which grew out of the loopy idea of creating an atomic-powered aircraft that could remain aloft for months. This reactor design operates at atmospheric pressure, and the technological challenges of maintaining a molten salt cooling system are readily solved. Because no water is involved in the cooling system, the greatest danger in a nuclear power plant, a violent steam explosion, is eliminated entirely. Chinese nuclear energy researchers are currently developing a next generation of molten salt reactors, and there is a good likelihood that they will succeed in designing a reactor system that is both more efficient in terms of cost and dramatically safer in terms of low-probability, high-cost accidents (link). This technology also makes much more efficient use of the nuclear fuel, leaving a dramatically smaller amount of radioactive waste to dispose of.

So why did the US nuclear industry abandon the molten-salt reactor design? This seems to be a situation of lock-in by an industry and a regulatory system. Once the industry settled on the light water reactor design, that choice was entrenched in the Nuclear Regulatory Commission’s regulations and licensing requirements for new nuclear reactors. It subsequently became extremely difficult for a utility company or a private energy corporation to invest in the research, development, and construction costs that would be associated with a radical change of design. There is currently an effort by an American company to develop a new-generation molten salt reactor, and the process is inhibited by the knowledge that it will take a minimum of ten years to gain certification and licensing for a commercial plant based on the new design (link).

This story illustrates the possibility that a process of technology development may get locked into a particular approach that embodies substantial public risk, and it may be all but impossible to subsequently adopt a different approach. In another context Thomas Hughes refers to this as technological momentum, and it is clear that there are commercial, institutional, and regulatory reasons for this “stickiness” of a major technology once it is designed and adopted. In the case of nuclear power the inertia associated with light water reactors is particularly unfortunate, given that it blocked other solutions that were both safer and more economical.

(Here is a valuable review of safety issues in the nuclear power industry; link.)

Nuclear accidents

diagrams: Chernobyl reactor before and after

Nuclear fission is one of the world-changing discoveries of the mid-twentieth century. The atomic bomb projects of the United States led to the atomic bombing of Japan in August 1945, and the hope for limitless electricity brought about the proliferation of a variety of nuclear reactors around the world in the decades following World War II. And, of course, nuclear weapons proliferated to other countries beyond the original circle of atomic powers.

Given the enormous energies associated with fission and the dangerous and toxic properties of radioactive components of fission processes, the possibility of a nuclear accident is a particularly frightening one for the modern public. The world has seen the results of several massive nuclear accidents — Chernobyl and Fukushima in particular — and the devastating results they have had on human populations and the social and economic wellbeing of the regions in which they occurred.

Safety is therefore a paramount priority in the nuclear industry, in research labs and in military and civilian applications alike. So what is the situation of safety in the nuclear sector? Jim Mahaffey’s Atomic Accidents: A History of Nuclear Meltdowns and Disasters: From the Ozark Mountains to Fukushima is a detailed and carefully researched attempt to answer this question. And the information he provides is not reassuring. Beyond the well-known disasters at nuclear power plants (Three Mile Island, Chernobyl, Fukushima), Mahaffey refers to hundreds of accidents involving reactors, research laboratories, weapons plants, and deployed nuclear weapons that have received far less public attention. These accidents resulted in a very low number of lives lost, but their frequency is alarming. They are indeed “normal accidents” (Perrow, Normal Accidents: Living with High-Risk Technologies). For example:

  • a Japanese fishing boat is contaminated by fallout from Castle Bravo test of hydrogen bomb; lots of radioactive fish at the markets in Japan (March 1, 1954) (kl 1706)
  • one MK-6 atomic bomb is dropped on Mars Bluff, South Carolina, after a crew member accidentally pulled the emergency bomb release handle (February 5, 1958) (kl 5774)
  • Fermi 1 liquid sodium plutonium breeder reactor experiences fuel meltdown during startup trials near Detroit (October 4, 1966) (kl 4127)

Mahaffey also provides detailed accounts of the most serious nuclear accidents and meltdowns of the past forty years: Three Mile Island, Chernobyl, and Fukushima.

The safety and control of nuclear weapons is of particular interest. Here is Mahaffey’s summary of “Broken Arrow” events — the loss of atomic and fusion weapons:

Did the Air Force ever lose an A-bomb, or did they just misplace a few of them for a short time? Did they ever drop anything that could be picked up by someone else and used against us? Is humanity going to perish because of poisonous plutonium spread that was snapped up by the wrong people after being somehow misplaced? Several examples will follow. You be the judge. 

Chuck Hansen [U.S. Nuclear Weapons – The Secret History] was wrong about one thing. He counted thirty-two “Broken Arrow” accidents. There are now sixty-five documented incidents in which nuclear weapons owned by the United States were lost, destroyed, or damaged between 1945 and 1989. These bombs and warheads, which contain hundreds of pounds of high explosive, have been abused in a wide range of unfortunate events. They have been accidentally dropped from high altitude, dropped from low altitude, crashed through the bomb bay doors while standing on the runway, tumbled off a fork lift, escaped from a chain hoist, and rolled off an aircraft carrier into the ocean. Bombs have been abandoned at the bottom of a test shaft, left buried in a crater, and lost in the mud off the coast of Georgia. Nuclear devices have been pounded with artillery of a foreign nature, struck by lightning, smashed to pieces, scorched, toasted, and burned beyond recognition. Incredibly, in all this mayhem, not a single nuclear weapon has gone off accidentally, anywhere in the world. If it had, the public would know about it. That type of accident would be almost impossible to conceal. (kl 5527)

There are a few common threads in the stories of accident and malfunction that Mahaffey provides. First, there are failures of training and knowledge on the part of front-line workers. The physics of nuclear fission is often counter-intuitive, and the idea of critical mass does not fully capture the danger of a quantity of fissionable material. The geometry in which the material is stored makes a crucial difference to whether it goes critical. Fissionable material is often transported and manipulated in liquid solution; and the shape and configuration of the vessel in which the solution is held makes a difference to the probability of exponential growth of neutron emission — leading to runaway fission of the material. Mahaffey documents accidents that occurred in nuclear materials processing plants that resulted from plant workers applying what they knew from industrial plumbing to their efforts to solve basic shop-floor problems. All too often the result was a flash of blue light and the release of a great deal of heat and radioactive material.
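
Here is a minimal sketch, not drawn from Mahaffey, of why geometry matters so much: the neutron population multiplies by the effective multiplication factor k_eff in each fission generation, and k_eff rises as the vessel shape allows fewer neutrons to leak away. The generation time and k_eff values below are illustrative assumptions.

```python
# Illustrative sketch: neutron population grows or dies away geometrically,
# multiplying by the effective multiplication factor k_eff each generation.
# Geometry matters because a compact tank leaks fewer neutrons than a wide,
# shallow tray, so the same solution can be subcritical in one vessel
# (k_eff < 1) and supercritical in another (k_eff > 1).
def neutron_population(n0: float, k_eff: float, generations: int) -> float:
    """Neutron population after a number of fission generations: n0 * k_eff**g."""
    return n0 * k_eff ** generations

# With a generation time of roughly 1e-4 s, one second is about 10,000 generations.
print(neutron_population(1.0, 0.98, 10_000))  # subcritical: effectively zero
print(neutron_population(1.0, 1.01, 10_000))  # supercritical: astronomically large
```

This is the sense in which a solution that is perfectly safe in one container can flash to criticality, with the blue light Mahaffey describes, when it is transferred to a differently shaped vessel.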

Second, there is a fault at the opposite end of the knowledge spectrum — the tendency of expert engineers and scientists to believe that they can solve complicated reactor problems on the fly. This turned out to be a critical problem at Chernobyl (kl 6859).

The most difficult problem to handle is that the reactor operator, highly trained and educated with an active and disciplined mind, is liable to think beyond the rote procedures and carefully scheduled tasks. The operator is not a computer, and he or she cannot think like a machine. When the operator at NRX saw some untidy valve handles in the basement, he stepped outside the procedures and straightened them out, so that they were all facing the same way. (kl 2057)

There are also clear examples of inappropriate supervision in the accounts shared by Mahaffey. Here is an example from Chernobyl.

[Deputy chief engineer] Dyatlov was enraged. He paced up and down the control panel, berating the operators, cursing, spitting, threatening, and waving his arms. He demanded that the power be brought back up to 1,500 megawatts, where it was supposed to be for the test. The operators, Toptunov and Akimov, refused on grounds that it was against the rules to do so, even if they were not sure why. 

Dyatlov turned on Toptunov. “You lying idiot! If you don’t increase power, Tregub will!”  

Tregub, the Shift Foreman from the previous shift, was officially off the clock, but he had stayed around just to see the test. He tried to stay out of it. 

Toptunov, in fear of losing his job, started pulling rods. By the time he had wrestled it back to 200 megawatts, 205 of the 211 control rods were all the way out. In this unusual condition, there was danger of an emergency shutdown causing prompt supercriticality and a resulting steam explosion. At 1:22:30 a.m., a read-out from the operations computer advised that the reserve reactivity was too low for controlling the reactor, and it should be shut down immediately. Dyatlov was not worried. “Another two or three minutes, and it will be all over. Get moving, boys!” (kl 6887)

This was the turning point in the disaster.

A related fault is the intrusion of political and business interests into the design and conduct of high-risk nuclear actions. Leaders want a given outcome without understanding the technical details of the processes they are demanding; subordinates like Toptunov are eventually cajoled or coerced into taking the problematic actions. The persistence of advocates for liquid sodium breeder reactors represents a higher-level example of the same fault. Associated with this role of political and business interests is an impulse towards secrecy and concealment when accidents occur and deliberate understatement of the public dangers created by an accident — a fault amply demonstrated in the Fukushima disaster.

Atomic Accidents provides a fascinating history of events of which most of us are unaware. The book is not primarily intended to offer an account of the causes of these accidents, but rather of the ways in which they unfolded and the consequences they had for human welfare. (Generally speaking, his view is that nuclear accidents in North America and Western Europe have had remarkably few human casualties.) And many of the accidents he describes are exactly the sorts of failures that are common in all largescale industrial and military processes.

(Largescale technology failure has come up frequently here. See these posts for analysis of some of the organizational causes of technology failure (link, link, link).)

Declining industries

Why is it so difficult for leaders in various industries and sectors to seriously address the existential threats that sometimes arise? Planning for marginal changes in the business environment is fairly simple; problems can be solved, costs can be cut, and the firm can stay in the black. But how about more radical but distant threats? What about the grocery sector when confronted by Amazon’s radical steps in food selling? What about Polaroid or Kodak when confronted by the rise of digital photography in the 1990s? What about the US steel industry in the 1960s when confronted with rising Asian competition and declining manufacturing facilities?

From the outside these companies and sectors seem like dodos incapable of confronting the threats that imperil them. They seem to be ignoring oncoming train wrecks simply because these catastrophes are still in the distant future. And yet the leaders in these companies were generally speaking talented, motivated men and women. So what are the organizational or cognitive barriers that arise to make it difficult for leaders to successfully confront the biggest threats they face?

Part of the answer seems to be the fact that distant hazards seem smaller than the more immediate and near-term challenges that an organization must face; so there is a systematic bias towards myopic decision-making. This sounds like a Kahneman-Tversky kind of cognitive shortcoming.
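
The myopia point can be made concrete with ordinary discounting arithmetic; the loss, discount rate, and horizons below are hypothetical.

```python
# Hypothetical illustration of why distant hazards "seem smaller": under standard
# exponential discounting, a large loss far in the future has a modest present
# value, so it loses out to near-term costs in ordinary planning.
def present_value(future_loss: float, annual_rate: float, years: int) -> float:
    """Present value of a loss incurred a given number of years from now."""
    return future_loss / (1 + annual_rate) ** years

loss = 1_000_000_000                    # a $1B existential loss
print(present_value(loss, 0.10, 20))    # ~ $149M: easy to defer
print(present_value(loss, 0.10, 2))     # ~ $826M: commands immediate attention
```

Behavioral research on present bias suggests that real decision makers discount distant outcomes even more steeply than this formula implies, which only sharpens the problem.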

A second possible explanation is that it is easy enough to persuade oneself that distant threats will either resolve themselves organically or that the organization will discover novel solutions in the future. This seems to be part of the reason that climate-change foot-draggers take the position they do: that “things will sort out”, “new technologies will help solve the problems in the future.” This sounds like a classic example of weakness of the will — an unwillingness to rationally confront hard truths about the future that ought to influence choices today but often don’t.

Then there is the timeframe of accountability that is in place in government, business, and non-profit organizations alike. Leaders are rewarded and punished for short-term successes and failures, not prudent longterm planning and preparation. This is clearly true for term-limited elected officials, but it is equally true for executives whose stakeholders evaluate performance based on quarterly profits rather than longterm objectives and threats.

We judge harshly those leaders who allow their firms or organizations to perish because of a chronic failure to plan for substantial change in the environments in which they will need to operate in the future. Nero is not remembered kindly for his dedication to his fiddle. And yet at any given time, many industries are in precisely that situation. What kind of discipline and commitment can protect organizations against this risk?

This is an interesting question in the abstract. But it is also a challenging question for people who care about the longterm viability of colleges and universities. Are there forces at work today that will bring about existential crisis for universities in twenty years (enrollments, tuition pressure, technology change)? Are there technological or organizational choices that should be made today that would help to avert those crises in the future? And are university leaders taking the right steps to prepare their institutions for the futures they will face in several decades?

Gaining compliance

Organizations always involve numerous staff members whose behavior has the potential for creating significant risk for individuals and the organization but who are only loosely supervised. This situation unavoidably raises principal-agent problems. Let’s assume that the great majority of staff members are motivated by good intentions and ethical standards. That means that there is a small number of individuals whose behavior is neither ethical nor well intentioned. What arrangements can an organization put in place to prevent bad behavior and protect individuals and the integrity of the organization?

For certain kinds of bad behavior there are well understood institutional arrangements that work well to detect and deter the wrong actions. This is especially true for business transactions, purchasing, control of cash, expense reporting and reimbursement, and other financial processes within the organization. The audit and accounting functions within almost every sophisticated organization permit a reasonably high level of confidence in the likelihood of detection of fraud, theft, and misreporting. This doesn’t mean that corrupt financial behavior does not occur; but audits make it much more difficult to succeed in persistent dishonest behavior. So an organization with an effective audit function is likely to have a reasonably high level of compliance in the areas where standard audits can be effectively conducted.
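
Here is a simple sketch of why routine audits deter persistent fraud; the per-audit detection probability is a hypothetical figure, not data from any real audit program.

```python
# Hypothetical illustration: even a modest per-audit detection probability makes
# *persistent* fraud very likely to be caught, because the scheme has to survive
# every audit cycle, and (assuming independent cycles) the odds compound.
def survival_probability(p_detect_per_audit: float, n_audits: int) -> float:
    """Probability that an ongoing scheme escapes detection across n audit cycles."""
    return (1 - p_detect_per_audit) ** n_audits

for years in (1, 3, 5, 10):
    print(years, round(survival_probability(0.30, years), 3))
# 1 -> 0.7, 3 -> 0.343, 5 -> 0.168, 10 -> 0.028
```

A one-off theft may well escape notice, but a scheme that must survive year after year of audit cycles almost certainly will not, which is the sense in which audits make persistent dishonesty hard to sustain.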

A second kind of compliance effort has to do with the culture and practice of observer reporting of misbehavior. Compliance hotlines allow individuals who have observed (or suspected) bad behavior to report that behavior to responsible agents who are obligated to investigate these allegations. Policies that require reporting of certain kinds of bad behavior to responsible officers of the organization — sexual harassment, racial discrimination, or fraudulent actions, for example — should have the effect of revealing some kinds of misbehavior, and deterring others from engaging in bad behavior. So a culture and expectation of reporting is helpful in controlling bad behavior.

A third approach that some organizations take to compliance is to place a great deal of emphasis on the moral culture of the organization — shared values, professional duty, and role responsibilities. Leaders can support and facilitate a culture of voluntary adherence to the values and policies of the organization, so that virtually all members of the organization fall in the “well-intentioned” category. The thrust of this approach is to make large efforts at eliciting voluntary good behavior. Business professor David Hess has done a substantial amount of research on these final two topics (link, link).

Each of these organizational mechanisms has some efficacy. But unfortunately they do not suffice to create an environment where we can be highly confident that serious forms of misconduct do not occur. In particular, reporting and culture are only partially efficacious when it comes to private and covert behavior like sexual assault, bullying, and discriminatory speech and behavior in the workplace. This leads to an important question: are there more intrusive mechanisms of supervision and observation that would permit organizations to discover patterns of misconduct even if they remain unreported by observers and victims? Are there better ways for an organization to ensure that no one is subject to the harmful actions of a predator or harasser?

A more active strategy for an organization committed to eliminating sexual assault is to attempt to predict the environments where inappropriate interpersonal behavior is possible and to redesign the setting so the behavior is substantially less likely. For example, a hospital may require that any physical examinations of minors must be conducted in the presence of a chaperone or other health professional. A school of music or art may require that after-hours private lessons are conducted in semi-public locations. These rules would deprive a potential predator of the seclusion needed for the bad behavior. And the practitioner who is observed violating the rule would then be suspect and subject to further investigation and disciplinary action.

Here is perhaps a farfetched idea: a “behavior audit” that is periodically performed in settings where inappropriate covert behavior is possible. Here we might imagine a process in which a random set of people are periodically selected for interview who might have been in a position to have been subject to inappropriate behavior. These individuals would then be interviewed with an eye to helping to surface possible negative or harmful experiences that they have had. This process might be carried out for groups of patients, students, athletes, performers, or auditioners in the entertainment industry. And the goal would be to uncover traces of the kinds of behavior involving sexual harassment and assault that are at the heart of recent revelations in a myriad of industries and organizations. The results of such an audit would occasionally reveal a pattern of previously unknown behavior requiring additional investigation, while the more frequent results would be negative. This process would lead to a higher level of confidence that the organization has reasonably good knowledge of the frequency and scope of bad behavior and a better system for putting in place a plan of remediation.
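
As a rough sketch of how such a behavior audit might be sized, here is the sampling arithmetic; the prevalence and sample sizes are purely illustrative assumptions.

```python
# Illustrative sketch of the "behavior audit" idea: if some fraction q of the
# people who passed through a setting experienced misconduct, how likely is a
# random sample of n interviews to include at least one of them?
def prob_at_least_one(prevalence: float, sample_size: int) -> float:
    """Chance that a random sample contains at least one affected individual
    (binomial approximation, reasonable when the population is much larger
    than the sample)."""
    return 1 - (1 - prevalence) ** sample_size

print(prob_at_least_one(0.02, 50))    # ~0.64
print(prob_at_least_one(0.02, 100))   # ~0.87
print(prob_at_least_one(0.02, 200))   # ~0.98
```

Including an affected individual in the sample is of course not the same as eliciting a disclosure in an interview, so real detection rates would be lower; but the arithmetic suggests that even modest periodic samples give an organization a reasonable chance of surfacing a pattern that formal reporting channels have missed.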

All of these organizational strategies serve fundamentally as attempts to solve principal-agent problems within the organization. The principals of the organization have expectations about the norms that ought to govern behavior within the organization. These mechanisms are intended to increase the likelihood that there is conformance between the principal’s expectations and the agent’s behavior. And, when they fail, several of these mechanisms are intended to make it more likely that bad behavior is identified and corrected.

(Here is an earlier post treating scientific misconduct as a principal-agent problem; link.)

Trust and organizational effectiveness

It is fairly well agreed that organizations require a degree of trust among the participants in order for the organization to function at all. But what does this mean? How much trust is needed? How is trust cultivated among participants? And what are the mechanisms through which trust enhances organizational effectiveness?

The minimal requirements of cooperation presuppose a certain level of trust. As A plans and undertakes a sequence of actions designed to bring about Y, his or her efforts must rely upon the coordination promised by other actors. If A does not have a sufficiently high level of confidence in B’s assurances and compliance, then he will be rationally compelled to choose another series of actions. If Larry Bird didn’t have trust in his teammate Dennis Johnson, the famous steal would not have happened.


First, what do we mean by trust in the current context? Each actor in an organization or group has intentions, engages in behavior, and communicates with other actors. Part of communication is often in the form of sharing information and agreeing upon a plan of coordinated action. Agreeing upon a plan in turn often requires statements and commitments from various actors about the future actions they will take. Trust is the circumstance that permits others to rely upon those statements and commitments. We might say, then, that A trusts B just in case —

  • A believes that when B asserts P, this is an honest expression of B’s beliefs.
  • A believes that when B says he/she will do X, this is an honest commitment on B’s part and B will carry it out (absent extraordinary reasons to the contrary).
  • A believes that when B asserts that his/her actions will be guided by his/her best understanding of the purposes and goals of the organization, this is a truthful expression.
  • A believes that B’s future actions, observed and unobserved, will be consistent with his/her avowals of intentions, values, and commitments.

So what are some reasons why mistrust might rear its ugly head between actors in an organization? Why might A fail to trust B?

  • A may believe that B’s private interests are driving B’s actions (rather than adherence to prior commitments and values).
  • A may believe that B suffers from weakness of the will, an inability to carry out his honest intentions.
  • A may believe that B manipulates his statements of fact to suit his private interests.
  • Or less dramatically: A may not have high confidence in these features of B’s behavior.
  • B may have no real interest or intention in behaving in a truthful way.

And what features of organizational life and practice might be expected to either enhance inter-personal trust or to undermine it?

Trust is enhanced by individuals having the opportunity to get acquainted with their collaborators in a more personal way — to see from non-organizational contexts that they are generally well intentioned; that they make serious efforts to live up to their stated intentions and commitments; and that they are generally honest. So perhaps there is a rationale for the bonding exercises that many companies undertake for their workers.

Likewise, trust is enhanced by the presence of a shared and practiced commitment to the value of trustworthiness. An organization itself can enhance trust in its participants by performing the actions that its participants expect the organization to perform. For example, an organization that abruptly and without consultation ends an important employee benefit undermines employees’ trust that the organization has their best interests at heart. This abrogation of prior obligations may in turn lead individuals to behave in a less trustworthy way, and lead others to have lower levels of trust in each other.

How does enhancing trust have the promise of bringing about higher levels of organizational effectiveness? Fundamentally this comes down to the question of the value of teamwork and the burden of unnecessary transaction costs. If every expense report requires investigation, the amount of resources spent on accountants will be much greater than in a situation where only the outlying reports are questioned. If each vice president needs to defend him or herself against the possibility that another vice president is conspiring against him or her, then less time and energy are available to do the work of the organization. If the CEO doesn’t have high confidence that her executive team will work wholeheartedly to bring about a successful implementation of a risky investment, then the CEO will choose less risky investments.
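
The transaction-cost point can be put in simple arithmetic; the volumes, costs, and outlier rate below are hypothetical.

```python
# Hypothetical illustration of the transaction-cost argument: investigating every
# expense report versus screening all reports and investigating only the outliers.
reports_per_year = 10_000
cost_detailed_review = 50.0   # assumed cost of a detailed review of one report
cost_screening = 2.0          # assumed cost of a quick screen of one report
outlier_rate = 0.05           # assumed fraction of reports flagged as outliers

review_everything = reports_per_year * cost_detailed_review
review_outliers_only = reports_per_year * (cost_screening
                                           + outlier_rate * cost_detailed_review)

print(review_everything)      # 500000.0
print(review_outliers_only)   # 45000.0: an order of magnitude cheaper, but only
                              # viable where unflagged reports can be trusted
```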

In other words, trust is crucial for collaboration and teamwork. And an organization that manages to cultivate a high level of trust among its participants is likely to perform better than one that depends primarily on supervision and enforcement.

Varieties of organizational dysfunction

Several earlier posts have made the point that important technology failures often include organizational faults in their causal background.

It is certainly true that most important accidents have multiple causes, and it is crucial to have as good an understanding as possible of the range of causal pathways that have led to air crashes, chemical plant explosions, or drug contamination incidents. But in the background we almost always find organizations and practices through which complex technical activities are designed, implemented, and regulated. Human actors, organized into patterns of cooperation, collaboration, competition, and command, are as crucial to technical processes as are power lines, cooling towers, and control systems in computers. So it is imperative that we follow the lead of researchers like Charles Perrow (The Next Catastrophe: Reducing Our Vulnerabilities to Natural, Industrial, and Terrorist Disasters), Kathleen Tierney (The Social Roots of Risk: Producing Disasters, Promoting Resilience), or Diane Vaughan (The Challenger Launch Decision: Risky Technology, Culture, and Deviance at NASA) and give close attention to the social- and organization-level failures that sometimes lead to massive technological failures.

It is useful to have a few examples in mind as we undertake to probe this question more deeply. Here are a number of important accidents and failures that have been carefully studied.

  • Three Mile Island, Chernobyl nuclear disasters
  • Challenger and Columbia space shuttle disasters
  • Failure of United States anti-submarine warfare in 1942-43
  • Flawed policy and decision-making in US leading to escalation of Vietnam War
  • Flawed policy and decision-making in France leading to Dien Bien Phu defeat
  • Failure of Nuclear Regulatory Commission to ensure reactor safety
  • DC-10 design process
  • Osprey design process
  • Failure of Federal flood insurance to appropriately guide rational land use
  • FEMA failure in Katrina aftermath
  • Design and manufacture of the Edsel sedan
  • High rates of hospital-acquired infections in some hospitals

Examples like these allow us to begin to create an inventory of organizational flaws that sometimes lead to failures and accidents:

  • siloed decision-making (design division, marketing division, manufacturing division all have different priorities and interests)
  • lax implementation of formal processes
  • strategic bureaucratic manipulation of outcomes 
    • information withholding, lying
    • corrupt practices, conflicts of interest and commitment
  • short-term calculation of costs and benefits
  • indifference to public goods
  • poor evaluation of data; misinterpretation of data
  • lack of high-level officials responsible for compliance and safety

These deficiencies may be analyzed in terms of a more abstract list of organizational failures:

  • Poor decisions given existing priorities and facts
    • poor priority-setting processes
    • poor information-gathering and analysis
  • failure to learn and adapt from changing circumstances
  • internal capture of decision-making; corruption, conflict of interest
  • vulnerability of decision-making to external pressures (external capture)
  • faulty or ineffective implementation of policies, procedures, and regulations
******

Nancy Leveson is a leading authority on the systems-level causes of accidents and failures. A recent white paper can be found here. Here is the abstract for that paper:

New technology is making fundamental changes in the etiology of accidents and is creating a need for changes in the explanatory mechanisms used. We need better and less subjective understanding of why accidents occur and how to prevent future ones. The most effective models will go beyond assigning blame and instead help engineers to learn as much as possible about all the factors involved, including those related to social and organizational structures. This paper presents a new accident model founded on basic systems theory concepts. The use of such a model provides a theoretical foundation for the introduction of unique new types of accident analysis, hazard analysis, accident prevention strategies including new approaches to designing for safety, risk assessment techniques, and approaches to designing performance monitoring and safety metrics. (1; italics added)

Here is what Leveson has to say about the social and organizational causes of accidents:

2.1 Social and Organizational Factors

Event-based models are poor at representing systemic accident factors such as structural deficiencies in the organization, management deficiencies, and flaws in the safety culture of the company or industry. An accident model should encourage a broad view of accident mechanisms that expands the investigation beyond the proximate events.

Ralph Miles Jr., in describing the basic concepts of systems theory, noted that:

Underlying every technology is at least one basic science, although the technology may be well developed long before the science emerges. Overlying every technical or civil system is a social system that provides purpose, goals, and decision criteria (Miles, 1973, p. 1).

Effectively preventing accidents in complex systems requires using accident models that include that social system as well as the technology and its underlying science. Without understanding the purpose, goals, and decision criteria used to construct and operate systems, it is not possible to completely understand and most effectively prevent accidents. (6)

Collapse of Eastern European communisms

An earlier post commented on Tony Judt’s magnificent book Postwar: A History of Europe Since 1945. There I focused on the story he tells of the brutality of the creation of Communist Party dictatorships across Eastern Europe (link). Equally fascinating is his narrative of the abrupt collapse of those states in 1989. In short order the world witnessed the collapse of communism in Poland (June 1989), East Germany (November 1989), Czechoslovakia (November 1989), Bulgaria (November 1989), Romania (December 1989), Hungary (March 1990), and the USSR (December 1991). Most of this narrative occurs in chapter 19.

The sudden collapse of multiple Communist states in a period of roughly a year requires explanation. These were not sham states; they had formidable forces of repression and control; and there were few avenues of public protest available to opponents of the regimes. So their collapse is worthy of careful assessment.

There seem to be several crucial ingredients in the sudden collapse of these dictatorships. One is the persistence of an intellectual and practical opposition to Communism and single-party rule in almost all these countries. The brutality of violent repression in Poland, Hungary, Czechoslovakia, and other countries did not succeed in permanently suppressing opposition based on demands for greater freedom and greater self-determination through political participation. And this was true in the fields of the arts and literature as much as it was in the disciplines of law and politics. Individuals and organizations reemerged at various important junctures to advocate again for political and legal reforms, in Poland, Czechoslovakia, Hungary, and even the USSR.

Second was the chronic inability of these states to achieve economic success and rising standards of living for their populations. Price riots in Poland in the 1970s and elsewhere signaled a fundamental discontent by consumers and workers who were aware of the living conditions of people living in other parts of non-Communist Europe. Material discontent was a powerful factor in the repeated periods of organized protest that occurred in several of these states prior to 1989. (Remember the joke from Poland in the 1970s — “If they pretend to pay us, we pretend to work.”)

And third was the position taken by Mikhail Gorbachev on the use of force to maintain Communist regimes in satellite countries. The use of violence and armed force had sufficed to quell popular movements in Hungary, Czechoslovakia, and Poland in years past. But when Gorbachev made it credible and irreversible that the USSR would no longer use tanks to reinforce the satellite regimes — for example, in his speech to the United Nations in December 1988 — local parties were suddenly exposed to new realities. Domestic repression was still possible, but it was no longer obvious that it would succeed.

And the results were dramatic. In a period of months the world witnessed the sudden collapse of Communist rule in country after country; and in most instances the transitions were relatively free of large-scale violence. (The summary executions of Romania’s Nicolae and Elena Ceaușescu on Christmas Day, 1989 were a highly visible exception.)

There seem to be many historical lessons to learn from this short period of history. Particularly sharp are the implications for other single-party dictatorships. So let’s reflect on the behavior of the single-party state in China since the mid-1980s. The Chinese party-state has had several consistent action plans since the 1980s. First, it has focused great effort on economic reform, rising incomes, and improving standards of living for the bulk of its population. In these efforts it has been largely successful — in strong contrast to the USSR and its satellite states. Second, the Chinese government has intensified its ability to control ideology and debate, culminating in the current consolidation of power under President Xi. And third, it used brutal force against the one movement that emerged in 1989 with substantial and broad public involvement, the Democracy Movement. The use of force against demonstrations in Tiananmen Square and other cities in China demonstrated the party’s determination to prevent largescale public mobilization with force if needed.

It is difficult to avoid the conclusion that China’s leaders have reflected very carefully on the collapse of single-party states in 1989, culminating in the collapse of the Soviet Union itself. They appear to have settled on a longterm coordinated strategy aimed at preventing the emergence of the particular factors that led to those political catastrophes. They are committed to fulfilling the expectations of the public that the economy will continue to grow and support rising standards of living for the mass of the population. So economic growth has remained a very high priority. Second, they are vigilant in monitoring ideological correctness, suppressing individuals and groups who continue to advocate for universal human rights, democracy, and individual freedoms. And they are unstinting in providing the resources needed by the state organizations through which censorship, political repression, and ideological correctness are maintained. And finally, they appear to be willing to use overwhelming force if necessary to prevent largescale public protests. The regime seems very confident that a pathway of future development that continues to support material improvement for the population while tightly controlling ideas and public discussions of political issues will be successful. And it is hard to see that this calculation is fundamentally incorrect.

Corruption and institutional design

Robert Klitgaard is an insightful expert on the institutional causes of corruption in various social arrangements. His 1988 book, Controlling Corruption, laid out several case studies in detail, demonstrating specific features of institutional design that either encouraged or discouraged corrupt behavior by social and political actors.

More recently Klitgaard prepared a major report for the OECD on the topic of corruption and development assistance (2015; link). This working paper is worth reading in detail for anyone interested in understanding the dysfunctional origins of corruption as an institutional fact. Here is an early statement of the kinds of institutional facts that lead to higher levels of corruption:

Corruption is a crime of calculation. Information and incentives alter patterns of corruption. Processes with strong monopoly power, wide discretion for officials and weak accountability are prone to corruption. (7)

Corruption can go beyond bribery to include nepotism, neglect of duty and favouritism. Corrupt acts can involve third parties outside the organisation (in transactions with clients and citizens, such as extortion and bribery) or be internal to an organisation (theft, embezzlement, some types of fraud). Corruption can occur in government, business, civil society organisations and international agencies. Each of these varieties has the dimension of scale, from episodic to systemic. (18)

Here is an early definition of corruption that Klitgaard offers:

Corruption is a term of many meanings, but at the broadest level, corruption is the misuse of office for unofficial ends. Office is a position of duty, or should be; the office-holder is supposed to put the interests of the institution and the people first. In its most pernicious forms, systemic corruption creates the shells of modern institutions, full of official ranks and rules but “institutions” in inverted commas only. V.S. Naipaul, the Trinidad-born Nobel Prize winner, once noted that underdevelopment is characterised by a duplicitous emphasis on honorific titles and simultaneously the abuse of those titles: judges who love to be called “your honour” even as they accept bribes, civil servants who are uncivil and serve themselves. (18)

The bulk of Klitgaard’s report is devoted to outlining mechanisms through which governments, international agencies, and donor agencies can attempt to initiate effective reform processes leading to lower levels of corruption. There are two theoretical foundations underlying the recommendations, one having to do with the internal factors that enhance or reduce corruption and the other having to do with a theory of effective institutional change. The internal theory is couched as a piece of algebra: corruption is the result of monopoly power plus official discretion minus accountability (37). So effective interventions should be designed around reducing monopoly power and official discretion while increasing accountability.
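
Klitgaard’s stylized formula is usually written as a simple identity, a heuristic for thinking about interventions rather than a quantity to be measured:

$$ C = M + D - A $$

where C stands for corruption, M for monopoly power, D for official discretion, and A for accountability.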

The reform process that Klitgaard favors involves what he refers to as “convening” — assembling working groups of influential and knowledgeable stakeholders in a given country and setting them the task of addressing corruption in the country. Examples and case studies include the Philippines, Colombia, Georgia, and Indonesia. Here is a high-level description of what he has in mind:

The recommended process – referred to in this paper as convening – invites development assistance providers to share international data, case studies and theory, and invites national leaders from recipient countries to provide local knowledge and creative problem-solving skills. (5)

Klitgaard spends a fair amount of time on the problem of measuring corruption at the national level. He refers to several international indices that are relevant: Transparency International’s Corruption Perceptions Index, the World Economic Forum’s Global Competitiveness Index, the Global Integrity index, and the International Finance Corporation’s ranking of nations in terms of “ease of doing business” (11).

What this report does not attempt to do is to address specific institutional arrangements in order to discover the propensities for corrupt behavior that they create. This is the strength of Klitgaard’s earlier book, where he looks at alternative forms of social or political arrangements for policing or collecting taxes. In this report there is none of that micro detail. What specific institutional arrangements can be designed that have the effect of reducing official monopoly power and discretion, or the effect of increasing official accountability? Implicitly Klitgaard suggests that these are questions best posed to the experts who participate in the national convening on corruption, because they have the best local knowledge of government and business practices. But here are a few mechanisms that Klitgaard specifically highlights: punish major offenders, pick visible, low-hanging fruit, bring in new leaders and reformers, coordinate government institutions, involve officials, and mobilize citizens and the business community (chapter 5).

A more micro perspective on international corruption is provided by a recent study by David Hess, “Combating Corruption in International Business: The Big Questions” (link). Hess focuses on the Foreign Corrupt Practices Act in the United States, and he asks the question, why do large corporations pay bribes when this is clearly illegal under the FCPA? Moreover, given that FCPA has the power to assess very large fines against corporations that violate its strictures, how can violation be a rational strategy? Hess considers the case of Siemens, which was fined over $1.5 billion in 2008 for repeated acts of bribery in the pursuit of contracts (3). He considers two theories of corporate bribing: a cost-benefit analysis showing that the practice of bribing leads to higher returns, and the “rogue employee” view, according to which the corporation is unable to control the actions of its occasionally unscrupulous employees. On the latter view, bribery is essentially a principal-agent problem.
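
To see how the first theory works, here is a minimal expected-value sketch; the probabilities, profits, bribe, and fine are hypothetical figures, not Hess’s or Siemens’s numbers.

```python
# Hypothetical sketch of the cost-benefit theory of corporate bribery that Hess
# examines: the bribe "pays" whenever the boost in the chance of winning business
# outweighs the bribe itself plus the expected penalty.
def expected_payoff(p_win: float, profit: float, bribe: float = 0.0,
                    p_caught: float = 0.0, fine: float = 0.0) -> float:
    """Expected payoff of pursuing a contract, with or without paying a bribe."""
    return p_win * profit - bribe - p_caught * fine

honest = expected_payoff(p_win=0.2, profit=100e6)
bribing = expected_payoff(p_win=0.6, profit=100e6, bribe=5e6,
                          p_caught=0.1, fine=200e6)
print(honest, bribing)  # 20 million vs. 35 million: on these assumptions the
                        # bribe is "rational" unless detection or fines rise sharply
```

On this logic, deterrence works by raising the detection probability and the fine until the inequality flips, which is one reason the scale of FCPA penalties matters.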

Hess takes the position that bribery often has to do with organizational culture and individual behavior, and that effective steps to reduce the incidence of bribery must proceed on the basis of an adequate analysis of both culture and behavior. And he links this issue to fundamental problems in the area of corporate social responsibility.

Corporations must combat corruption. By allowing their employees to pay bribes they are contributing to a system that prevents the realization of basic human rights in many countries. Ensuring that employees do not pay bribes is not accomplished by simply adopting a compliance and ethics program, however. This essay provided a brief overview of why otherwise good employees pay bribes in the wrong organizational environment, and what corporations must focus on to prevent those situations from arising. In short, preventing bribe payments must be treated as an ethical issue, not just a legal compliance issue, and the corporation must actively manage its corporate culture to ensure it supports the ethical behavior of employees.

As this passage emphasizes, Hess believes that controlling corrupt practices requires changing incentives within the corporation while equally changing the ethical culture of the corporation; he believes that the ethical culture of a company can have effects on the degree to which employees engage in bribery and other corrupt practices.

The study of corruption is an ideal case for the general topic of institutional dysfunction. And, as many countries have demonstrated, it is remarkably difficult to alter the pattern of corrupt behavior in a large, complex society.

A new model of organization?

In Team of Teams: New Rules of Engagement for a Complex World General Stanley McChrystal (with Tantum Collins, David Silverman, and Chris Fussell) describes a new, 21st-century conception of organization for large, complex activities involving thousands of individuals and hundreds of major sub-tasks. His concept is grounded in his experience in counter-insurgency warfare in Iraq. Rather than constructing such operations as centrally organized, bureaucratic, hierarchical processes with commanders and scripted agents, McChrystal argues that modern counter-terrorism requires a more decentralized and flexible system of action, which he refers to as “teams of teams”. Information is shared freely, local commanders have ready access to resources and knowledge from other experts, and they make decisions in a more flexible way. The model hopes to capture the benefits of improvisation, flexibility, and a much higher level of trust and communication than is characteristic of typical military and corporate organizations.


One place where the “team of teams” structure is plausible is in the context of a focused technology startup company, where the whole group of participants needs to be in regular and frequent collaboration with each other. Indeed, Paul Rabinow’s 1996 ethnography of the Cetus Corporation in its pursuit of PCR (polymerase chain reaction), Making PCR: A Story of Biotechnology, reflects a very similar topology of information flows and collaboration links across and within working subgroups (link). But the vision does not fit very well the organizational and operational needs of a large hospital, a railroad company, or a research university. It seems plausible that the challenges the US military faced in fighting Al-Qaeda and ISIL are not really analogous to those faced by less dramatic organizations like hospitals, universities, and corporations. The decentralized and improvisational circumstances of urban warfare against loosely organized terrorists may be sui generis.

McChrystal proposes an organizational structure that is more decentralized, more open to local decision-making, and more flexible and resilient. These are unmistakeable virtues in some circumstances; but not in all circumstances and all organizations. And arguably such a structure would have been impossible in the planning and execution of the French defense of Dien Bien Phu or the US decision to wage war against the Vietnamese insurgency ten years later. These were situations where central decisions needed to be made, and the decisions needed to be implemented through well organized bureaucracies. The problem in both instances is that the wrong decisions were made, based on the wrong information and assessments. What was needed, it would appear, was better executive leadership and decision-making — not a fundamentally decentralized pattern of response and counter-response.

One thing that deserves comment in the context of McChrystal’s book is the history of bad organization, bad intelligence, and bad decision-making the world has witnessed in the military experiences of the past century. The radical miscalculations and failures of planning involved in the first months of the Korean War, the painful and tragic misjudgments made by the French military in preparing for Dien Bien Phu, the equally bad thinking and planning done by Robert McNamara and the whiz kids leading to the Vietnam War — these examples stand out as sentinel illustrations of the failures of large organizations that have been tasked to carry out large, complex activities involving numerous operational units. The military and the national security establishments were good at some tasks, and disastrously bad at others. And the things they were bad at were both systemic and devastating. Bernard Fall illustrates these failures in Hell In A Very Small Place: The Siege Of Dien Bien Phu, and David Halberstam does so for the decision-making that led to the war in Vietnam in The Best and the Brightest.

So devising new ideas about command, planning, intelligence gathering and analysis, and priority-setting that are more effective would be a big contribution to humanity. But the deficiencies in Dien Bien Phu, Korea, or Vietnam seem different from those McChrystal identifies in Iraq. What was needed in these portentous moments of policy choice was clear-eyed establishment of appropriate priorities and goals, honest collection of intelligence and sources of information, and disinterested implementation of policies and plans that served the highest interests of the country. The “team of teams” approach doesn’t seem to be a general solution to the wide range of military and political challenges nations face.

What one would have wanted to see in the French military or the US national security apparatus is something different from the kind of teamwork described by McChrystal: greater honesty on all parts, a commitment to taking seriously the assessments of experts and participants in the field, an openness to questioning strongly held assumptions, and a greater capacity for institutional wisdom in arriving at decisions of this magnitude. We would have wanted to see a process that was not dominated by large egos, self-interest, and fixed ideas. We would have wanted French generals and their civilian masters to soberly assess the military function that a fortress camp at Dien Bien Phu could satisfy; the realistic military requirements that would need to be satisfied in order to defend the location; and an honest effort to solicit the very best information and judgment from experienced commanders and officials about what a Viet-Minh siege might look like. Instead, the French military was guided by complacent assumptions about French military superiority, which led to a genuine catastrophe for the soldiers assigned to the task and to French society more broadly.

There are valid insights contained in McChrystal’s book about the urgency of breaking down obstacles to communication and action within sprawling organizations as they confront a changing environment. But it doesn’t add up to a model that is well designed for most contexts in which large organizations actually function.
