The US Chemical Safety Board

The Federal agency responsible for investigating chemical and petrochemical accidents in the United States is the Chemical Safety Board (link). The mission of the Board is described in these terms:

The CSB is an independent federal agency charged with investigating industrial chemical accidents. Headquartered in Washington, DC, the agency’s board members are appointed by the President and confirmed by the Senate.
The CSB’s mission is to “drive chemical safety change through independent investigation to protect people and the environment.”
The CSB’s vision is “a nation safe from chemical disasters.”
The CSB conducts root cause investigations of chemical accidents at fixed industrial facilities. Root causes are usually deficiencies in safety management systems, but can be any factor that would have prevented the accident if that factor had not occurred. Other accident causes often involve equipment failures, human errors, unforeseen chemical reactions or other hazards. The agency does not issue fines or citations, but does make recommendations to plants, regulatory agencies such as the Occupational Safety and Health Administration (OSHA) and the Environmental Protection Agency (EPA), industry organizations, and labor groups. Congress designed the CSB to be non-regulatory and independent of other agencies so that its investigations might, where appropriate, review the effectiveness of regulations and regulatory enforcement.

The CSB was legislatively modeled on the National Transportation Safety Board, and its sole responsibility is to investigate major chemical accidents in the United States and report its findings to the public. It is not subordinate to OSHA or EPA, but it collaborates with those (and other) Federal agencies as appropriate (link). It has no enforcement powers; its function is to investigate, report, and recommend when serious chemical or petrochemical accidents have occurred.

One of its most important investigations concerned the March 23, 2005, explosion at BP’s Texas City refinery, which killed 15 workers, injured more than 170 others, and destroyed a substantial part of the refinery’s infrastructure. CSB conducted an extensive investigation into the “root causes” of the accident, and assigned substantial responsibility to BP’s corporate management of the facility. Here is the final report of that investigation (link), and here is a video prepared by CSB summarizing its main findings (link).

The key findings of the CSB report focus on the responsibility of BP management for the accident. Here is a summary of the CSB assessment of root causes:

The BP Texas City tragedy is an accident with organizational causes embedded in the refinery’s culture. The CSB investigation found that organizational causes linked the numerous safety system failures that extended beyond the ISOM unit. The organizational causes of the March 23, 2005, ISOM explosion are

  • BP Texas City lacked a reporting and learning culture. Reporting bad news was not encouraged, and often Texas City managers did not effectively investigate incidents or take appropriate corrective action.
  • BP Group lacked focus on controlling major hazard risk. BP management paid attention to, measured, and rewarded personal safety rather than process safety.
  • BP Group and Texas City managers provided ineffective leadership and oversight. BP management did not implement adequate safety oversight, provide needed human and economic resources, or consistently model adherence to safety rules and procedures.
  • BP Group and Texas City did not effectively evaluate the safety implications of major organizational, personnel, and policy changes.

Underlying almost all of these failures to manage this complex process with a priority on “process safety” rather than simply on personal safety is a corporate mandate for cost reduction:

In late 2004, BP Group refining leadership ordered a 25 percent budget reduction “challenge” for 2005. The Texas City Business Unit Leader asked for more funds based on the conditions of the Texas City plant, but the Group refining managers did not, at first, agree to his request. Initial budget documents for 2005 reflect a proposed 25 percent cutback in capital expenditures, including on compliance, HSE, and capital expenditures needed to maintain safe plant operations.[208] The Texas City Business Unit Leader told the Group refining executives that the 25 percent cut was too deep, and argued for restoration of the HSE and maintenance-related capital to sustain existing assets in the 2005 budget. The Business Unit Leader was able to negotiate a restoration of less than half the 25 percent cut; however, he indicated that the news of the budget cut negatively affected workforce morale and the belief that the BP Group and Texas City managers were sincere about culture change. (176)

And what about corporate accountability? What did BP have to pay in recompense for its faulty management of the Texas City refinery and the subsequent damages to workers and local residents? The answer is, remarkably little. OSHA assessed a fine of $50.6 million for BP’s violations of safety regulations (link, link), and BP committed to spend at least $500 million on corrective steps within the plant to protect the safety of workers. This was a record fine at the time; and yet it might very well be seen by BP corporate executives as a modest cost of doing business in this industry. It does not seem to be of the magnitude that would lead to fundamental change of culture, action, and management within the company.

 
Following an urgent recommendation from the CSB, BP commissioned a major independent review of safety at all five of its US-based refineries. This study became the Baker Panel Report of the BP U.S. Refineries Independent Safety Review Panel (January 2007) (link). The Baker Panel consisted of highly qualified experts on industrial and technological safety who were in a very good position to assess the safety management and culture of BP’s operations at its five US-based refineries. The Panel was specifically directed to refrain from analyzing responsibility for the Texas City disaster and to focus its efforts on assessing the safety culture and management practices then current in BP’s five refineries. Here are some central findings:

  • Based on its review, the Panel believes that BP has not provided effective process safety leadership and has not adequately established process safety as a core value across all its five U.S. refineries.
  • BP has not always ensured that it identified and provided the resources required for strong process safety performance at its U.S. refineries. Despite having numerous staff at different levels of the organization that support process safety, BP does not have a designated, high-ranking leader for process safety dedicated to its refining business.
  • The Panel also found that BP did not effectively incorporate process safety into management decision-making. BP tended to have a short-term focus, and its decentralized management system and entrepreneurial culture have delegated substantial discretion to U.S. refinery plant managers without clearly defining process safety expectations, responsibilities, or accountabilities.
  • BP has not instilled a common, unifying process safety culture among its U.S. refineries.
  • While all of BP’s U.S. refineries have active programs to analyze process hazards, the system as a whole does not ensure adequate identification and rigorous analysis of those hazards.
  • The Panel’s technical consultants and the Panel observed that BP does have internal standards and programs for managing process risks. However, the Panel’s examination found that BP’s corporate safety management system does not ensure timely compliance with internal process safety standards and programs at BP’s five U.S. refineries.
  • The Panel also found that BP’s corporate safety management system does not ensure timely implementation of external good engineering practices that support and could improve process safety performance at BP’s five U.S. refineries. (Summary of findings, xii-xiii)
These findings largely validate and support the critical assessment of BP’s safety management practices in the CSB report.

It seems clear that an important part of the substantial improvement that has occurred in aviation safety in the past fifty years is the effective investigation and reporting provided by the NTSB. NTSB is an authoritative and respected bureau of experts whom the public trusts when it comes to discovering the causes of aviation disasters. The CSB has a much shorter institutional history (it was authorized in 1990 and began operations in 1998), but we need to ask a parallel question here as well: Does the CSB provide a strong lever for improving safety practices in the chemical and petrochemical industries through its accident investigations; or are industry actors largely free to continue their poor management practices indefinitely, safe in the realization that large chemical accidents are rare and the costs of occasional liability judgments are manageable?

Testing the NRC

Serious nuclear accidents are rare but potentially devastating to people, land, and agriculture. (It appears that minor to moderate nuclear accidents are not nearly so rare, as James Mahaffey shows in Atomic Accidents: A History of Nuclear Meltdowns and Disasters: From the Ozark Mountains to Fukushima.) Three Mile Island, Chernobyl, and Fukushima are disasters that have given the public a better idea of how nuclear power reactors can go wrong, with serious and long-lasting effects. Reactors are also among the most complex industrial systems around, and accidents are common in complex, tightly coupled industrial systems. So how can we have reasonable confidence in the safety of nuclear reactors?

One possible answer is that we cannot have reasonable confidence at all. However, there are hundreds of large nuclear reactors in the world, and 98 active nuclear reactors in the United States alone. So it is critical to have highly effective safety regulation and oversight of the nuclear power industry. In the United States that regulatory authority rests with the Nuclear Regulatory Commission. So we need to ask the question: how good is the NRC at regulating, inspecting, and overseeing the safety of nuclear reactors in our country?

One would suppose that there would be excellent and detailed studies within the public administration literature that attempt to answer this question, and we might expect that researchers within the field of science and technology studies might have addressed it as well. However, this seems not to be the case. I have yet to find a full-length study of the NRC as a regulatory agency, and the NRC is mentioned only twice in the 600-plus page Oxford Handbook of Regulation. However, we can get an oblique view of the workings of the NRC through other sources. One set of observers who are in a position to evaluate the strengths and weaknesses of the NRC are nuclear experts who are independent of the nuclear industry. For example, publications from the Bulletin of the Atomic Scientists include many detailed reports on the operations and malfunctions of nuclear power plants that permit a degree of assessment of the quality of oversight provided by the NRC (link). And a detailed (and scathing) report by the General Accounting Office on the near-disaster at the Davis-Besse nuclear power plant is another expert assessment of NRC functioning (link).

David Lochbaum, Edwin Lyman, and Susan Stranahan fit the description of highly qualified independent scientists and observers, and their detailed case history of the Fukushima disaster provides a degree of insight into the workings of the NRC as well as the Japanese nuclear safety agency. Their book, Fukushima: The Story of a Nuclear Disaster, is jointly written by the authors under the auspices of the Union of Concerned Scientists, one of the best informed networks of nuclear experts we have in the United States. Lochbaum is director of the UCS Nuclear Safety Project and author of Nuclear Waste Disposal Crisis. The book provides a careful and scientific treatment of the unfolding of the Fukushima disaster hour by hour, and highlights the background errors that were made by regulators and owners in the design and operation of the Fukushima plant as well. The book makes numerous comparisons to the current workings of the NRC which permit a degree of assessment of the US regulatory agency.

In brief, Lochbaum and his co-authors appear to have a reasonably high opinion of the technical staff, scientists, and advisors who prepare recommendations for NRC consideration, but a low opinion of the willingness of the five commissioners to adopt costly recommendations that are strongly opposed by the nuclear industry. The authors express frustration that the nuclear safety agencies in both countries appear to have failed to learn important lessons from the Fukushima disaster:

The [Japanese] government simply seems in denial about the very real potential for another catastrophic accident…. In the United States, the NRC has also continued operating in denial mode. It turned down a petition requesting that it expand emergency evacuation planning to twenty-five miles from nuclear reactors despite the evidence at Fukushima that dangerous levels of radiation can extend at least that far if a meltdown occurs. It decided to do nothing about the risk of fire at over-stuffed spent fuel pools. And it rejected the main recommendation of its own Near-Term Task Force to revise its regulatory framework. The NRC and the industry instead are relying on the flawed FLEX program as a panacea for any and all safety vulnerabilities that go beyond the “design basis.” (kl 117)

They believe that the NRC is excessively vulnerable to influence by the nuclear power industry and by elected officials who favor economic growth over hypothetical safety concerns, with the result that it tends to err in favor of the economic interests of the industry.

Like many regulatory agencies, the NRC occupies uneasy ground between the need to guard public safety and the pressure from the industry it regulates to get off its back. When push comes to shove in that balancing act, the nuclear industry knows it can count on a sympathetic hearing in Congress; with millions of customers, the nation’s nuclear utilities are an influential lobbying group. (36)

They note that the NRC has consistently declined to undertake more substantial reform of its approach to safety, as recommended by its own panel of experts. The key recommendation of the Near-Term Task Force (NTTF) was that the regulatory framework should be anchored in a more stringent standard of accident prevention, requiring plant owners to address “beyond-design-basis accidents”. The Fukushima earthquake and tsunami were “beyond-design-basis” events; nonetheless, they occurred, and the NTTF recommended that safety planning should incorporate consideration of such unlikely but possible events.

The task force members believed that once the first proposal was implemented, establishing a well-defined framework for decision making, their other recommendations would fall neatly into place. Absent that implementation, each recommendation would become bogged down as equipment quality specifications, maintenance requirements, and training protocols got hashed out on a case-by-case basis. But when the majority of the commissioners directed the staff in 2011 to postpone addressing the first recommendation and focus on the remaining recommendations, the game was lost even before the opening kickoff. The NTTF’s Recommendation 1 was akin to the severe accident rulemaking effort scuttled nearly three decades earlier, when the NRC considered expanding the scope of its regulations to address beyond-design accidents. Then, as now, the perceived need for regulatory “discipline,” as well as industry opposition to an expansion of the NRC’s enforcement powers, limited the scope of reform. The commission seemed to be ignoring a major lesson of Fukushima Daiichi: namely, that the “fighting the last war” approach taken after Three Mile Island was simply not good enough. (kl 253)

As a result, “regulatory discipline” (essentially the pro-business ideology that holds that regulation should be kept to a minimum) prevailed, and the primary recommendation was tabled. The issue was of great importance, in that it involved setting the standard of risk and accident severity for which the owner needed to plan. By staying with the lower standard, the NRC left the door open to the most severe kinds of accidents.

The NTTF also addressed the issue of “delegated regulation,” in which the agency defers to the industry on many issues of certification and risk assessment. (Here is the FAA’s definition of delegated regulation; link.)

The task force also wanted the NRC to reduce its reliance on industry voluntary initiatives, which were largely outside of regulatory control, and instead develop its own “strong program for dealing with the unexpected, including severe accidents.” (252)

Other, more detail-oriented recommendations were rejected as well, for example a requirement to install reliable hardened containment vents in boiling water reactors, with filters to remove radioactive gases before venting.

But what might seem a simple, logical decision—install a $15 million filter to reduce the chance of tens of billions of dollars’ worth of land contamination as well as harm to the public—got complicated. The nuclear industry launched a campaign to persuade the NRC commissioners that filters weren’t necessary. A key part of the industry’s argument was that plant owners could reduce radioactive releases more effectively by using FLEX equipment…. In March 2013, they voted 3–2 to delay a requirement that filters be installed, and recommended that the staff consider other alternatives to prevent the release of radiation during an accident. (254)

The NRC voted against requiring filters on containment vents, a decision based on industry arguments that the filters were unnecessary and their cost excessive.

The authors argue that the NRC needs to significantly rethink its standards of safety and foreseeable risk.

What is needed is a new, commonsense approach to safety, one that realistically weighs risks and counterbalances them with proven, not theoretical, safety requirements. The NRC must protect against severe accidents, not merely pretend they cannot occur. (257)

Their recommendation is to make use of an existing and rigorous plan for reactor safety incorporating the results of “severe accident mitigation alternatives” (SAMA) analysis already performed — but largely disregarded.

However, they are not optimistic that the NRC will be willing to undertake these substantial changes that would significantly enhance safety and make a Fukushima-scale disaster less likely. Reporting on a post-Fukushima conference sponsored by the NRC, they write:

But by now it was apparent that little sentiment existed within the NRC for major changes, including those urged by the commission’s own Near-Term Task Force to expand the realm of “adequate protection.”

Lochbaum and his co-authors also make an intriguing series of points about the use of modeling and simulation in the effort to evaluate safety in nuclear plants. They agree that simulation methods are an essential part of the toolkit for nuclear engineers seeking to evaluate accident scenarios; but they argue that the simulation tools currently available (or perhaps ever available) fall far short of the precision sometimes attributed to them. So simulation tools sometimes give a false sense of confidence in the existing safety arrangements in a particular setting.

Even so, the computer simulations could not reproduce numerous important aspects of the accidents. And in many cases, different computer codes gave different results. Sometimes the same code gave different results depending on who was using it. The inability of these state-of-the-art modeling codes to explain even some of the basic elements of the accident revealed their inherent weaknesses—and the hazards of putting too much faith in them. (263)

In addition to specific observations about the functioning of the NRC, the authors identify chronic failures in the nuclear power system in Japan that should be of concern in the United States as well. Conflict of interest, falsification of records, and punishment of whistleblowers were part of the culture of nuclear power and nuclear regulation in Japan, and these problems can arise in the United States too. Here are examples of the problems they identify in the Japanese nuclear power system; it is a valuable exercise to ask whether the same issues arise in the US regulatory environment.

Non-compliance and falsification of records in Japan

Headlines scattered over the decades built a disturbing picture. Reactor owners falsified reports. Regulators failed to scrutinize safety claims. Nuclear boosters dominated safety panels. Rules were buried for years in endless committee reviews. “Independent” experts were financially beholden to the nuclear industry for jobs or research funding. “Public” meetings were padded with industry shills posing as ordinary citizens. Between 2005 and 2009, as local officials sponsored a series of meetings to gauge constituents’ views on nuclear power development in their communities, NISA encouraged the operators of five nuclear plants to send employees to the sessions, posing as members of the public, to sing the praises of nuclear technology. (46)

The authors do not provide evidence about similar practices in the United States, though the history of the Davis-Besse nuclear plant in Ohio suggests that similar things happen in the US industry. Charles Perrow treats the Davis-Besse near-disaster in a fair amount of detail; link. Descriptions of the Davis-Besse incident can be found here, here, here, and here.

Conflict of interest

Shortly after the Fukushima accident, Japan’s Yomiuri Shimbun reported that thirteen former officials of government agencies that regulate energy companies were currently working for TEPCO or other power firms. Another practice, known as amaagari, “ascent to heaven,” spins the revolving door in the opposite direction. Here, the nuclear industry sends retired nuclear utility officials to government agencies overseeing the nuclear industry. Again, ferreting out safety problems is not a high priority.

Punishment of whistle-blowers

In 2000, Kei Sugaoka, a nuclear inspector working for GE at Fukushima Daiichi, noticed a crack in a reactor’s steam dryer, which extracts excess moisture to prevent harm to the turbine. TEPCO directed Sugaoka to cover up the evidence. Eventually, Sugaoka notified government regulators of the problem. They ordered TEPCO to handle the matter on its own. Sugaoka was fired. (47)

There is a similar story in the Davis-Besse plant history.

Factors that interfere with effective regulation

In summary: there appear to be several structural factors that make nuclear regulation less effective than it needs to be.

First is the political power and influence of the nuclear industry itself. This was a major factor in the background of the Chernobyl disaster as well, where generals and party officials pushed incessantly for rapid completion of reactors; Serhii Plokhy, Chernobyl: The History of a Nuclear Catastrophe. Lochbaum and his collaborators demonstrate the power that TEPCO had in shaping the regulations under which it built the Fukushima complex, including the assumptions that were incorporated about earthquake risk and tsunami risk. Charles Perrow documents a comparable ability of the nuclear industry in the United States to shape the rules and procedures that govern its use of nuclear power (link). This influence permits the owners of nuclear power plants to shape the content of regulation as well as the systems of inspection and oversight that the agency adopts.

A related factor is the set of influences and lobbying pressures that come from the needs of the economy and the production pressures of the energy industry. (Interestingly enough, this was also a major influence on Soviet decision-making in choosing the graphite-moderated, light-water-cooled reactor design for use at Chernobyl and numerous other plants in the 1960s; Serhii Plokhy, Chernobyl: The History of a Nuclear Catastrophe.)

Third is the fact emphasized by Charles Perrow that the NRC is primarily governed by Congress, and legislators are themselves vulnerable to the pressures and blandishments of the industry and demands for a low-regulation business environment. This makes it difficult for the NRC to carry out its role as independent guarantor of the health and safety of the public. Here is Perrow’s description of the problem in The Next Catastrophe: Reducing Our Vulnerabilities to Natural, Industrial, and Terrorist Disasters (quoting Lochbaum from a 2004 Union of Concerned Scientists report):

With utilities profits falling when the NRC got tough after the Time story, the industry not only argued that excessive regulation was the problem, it did something about what it perceived as harassment. The industry used the Senate subcommittee that controls the agency’s budget, headed by a pro-nuclear Republican senator from New Mexico, Pete Domenici. Using the committee’s funds, he commissioned a special study by a consulting group that was used by the nuclear industry. It recommended cutting back on the agency’s budget and size. Using the consultant’s report, Domenici “declared that the NRC could get by just fine with a $90 million budget cut, 700 fewer employees, and a greatly reduced inspection effort.” (italics supplied) The beefed-up inspections ended soon after the threat of budget cuts for the agency. (Mangels 2003) And the possibility for public comment was also curtailed, just for good measure. Public participation in safety issues once was responsible for several important changes in NRC regulations, says David Lochbaum, a nuclear safety engineer with the Union of Concerned Scientists, but in 2004, the NRC bowed to industry pressure and virtually eliminated public participation. (Lochbaum 2004) As Lochbaum told reporter Mangels, “The NRC is as good a regulator as Congress permits it to be. Right now, Congress doesn’t want a good regulator.” (The Next Catastrophe, kl 2799)

A fourth important factor is a pervasive complacency within the professional nuclear community about the inherent safety of nuclear power. This is a factor mentioned by Lochbaum:

Although the accident involved a failure of technology, even more worrisome was the role of the worldwide nuclear establishment: the close-knit culture that has championed nuclear energy—politically, economically, socially—while refusing to acknowledge and reduce the risks that accompany its operation. Time and again, warning signs were ignored and near misses with calamity written off. (kl 87)

This is what we might call an ideological or cultural factor, in that it describes a mental framework for thinking about the technology and the public. It is a very real factor in decision-making, both within the industry and in the regulatory world. Senior nuclear engineering experts at major research universities seem to share the view that the public “fear” of nuclear power is entirely misplaced, given the safety record of the industry. They believe the technical problems of nuclear power generation have been solved, and that a rational society would embrace nuclear power without anxiety. For a rebuttal to this complacency, see Rose and Sweeting’s report in the Bulletin of the Atomic Scientists, “How safe is nuclear power? A statistical study suggests less than expected” (link). Here is the abstract to their paper:

After the Fukushima disaster, the authors analyzed all past core-melt accidents and estimated a failure rate of 1 per 3704 reactor years. This rate indicates that more than one such accident could occur somewhere in the world within the next decade. The authors also analyzed the role that learning from past accidents can play over time. This analysis showed few or no learning effects occurring, depending on the database used. Because the International Atomic Energy Agency (IAEA) has no publicly available list of nuclear accidents, the authors used data compiled by the Guardian newspaper and the energy researcher Benjamin Sovacool. The results suggest that there are likely to be more severe nuclear accidents than have been expected and support Charles Perrow’s “normal accidents” theory that nuclear power reactors cannot be operated without major accidents. However, a more detailed analysis of nuclear accident probabilities needs more transparency from the IAEA. Public support for nuclear power cannot currently be based on full knowledge simply because important information is not available.
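
To see what that failure rate implies, here is a back-of-the-envelope calculation in the spirit of the abstract. The reactor count is my own assumption (the post only says "hundreds"); roughly 440 power reactors were operating worldwide around the time of the Rose and Sweeting study.

```python
import math

# Rough expected-value check of the Rose and Sweeting claim.
# Assumptions: ~440 power reactors operating worldwide (my figure, not theirs)
# and their estimated core-melt frequency of 1 per 3704 reactor-years.
rate_per_reactor_year = 1 / 3704
reactors = 440
years = 10

expected_melts = rate_per_reactor_year * reactors * years
p_at_least_one = 1 - math.exp(-expected_melts)   # Poisson approximation

print(f"Expected core-melt events per decade: {expected_melts:.2f}")   # ~1.19
print(f"Probability of at least one: {p_at_least_one:.0%}")            # ~70%
```

On these assumptions the expected number of core-melt events in a decade is a bit over one, which is the sense in which "more than one such accident could occur somewhere in the world within the next decade."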

Lee Clarke’s book on planning for disaster on the basis of unrealistic models and simulations is relevant here. In Mission Improbable: Using Fantasy Documents to Tame Disaster, Clarke argues that much of the planning currently in place for large-scale disasters depends upon models, simulations, and scenario-building tools in which we should have very little confidence.

The complacency about nuclear safety mentioned here makes safety regulation more difficult and, paradoxically, makes the safe use of nuclear power less likely. Only when the risks are confronted with complete transparency and honesty will it be possible to design regulatory systems that do an acceptable job of ensuring the safety and health of the public.

In short, Lochbaum and his co-authors seem to provide evidence for the conclusion that the NRC is not in a position to perform its primary function: to establish a rational and scientifically well grounded set of standards for safe reactor design and operation. Further, its ability to enforce through inspection seems impaired as well by the power and influence the nuclear industry can deploy through Congress to resist its regulatory efforts. Good expert knowledge is canvassed through the NRC’s processes; but the policy recommendations that flow from this scientific analysis are all too often short-circuited by the ability of the industry to fend off new regulatory requirements. Lochbaum’s comment quoted by Perrow above seems all too true: “The NRC is as good a regulator as Congress permits it to be. Right now, Congress doesn’t want a good regulator.” 

It is very interesting to read the transcript of a 2014 hearing of the Senate Committee on Environment and Public Works titled “NRC’S IMPLEMENTATION OF THE FUKUSHIMA NEAR-TERM TASK FORCE RECOMMENDATIONS AND OTHER ACTIONS TO ENHANCE AND MAINTAIN NUCLEAR SAFETY” (link). Senator Barbara Boxer, California Democrat and chair of the committee, opened the meeting with these words:

Although Chairman Macfarlane said, when she announced her resignation, she had assured that “the agency implemented lessons learned from the tragic accident at Fukushima.” She said, “the American people can be confident that such an accident will never take place here.”

I say the reality is not a single one of the 12 key safety recommendations made by the Fukushima Near-Term Task Force has been implemented. Some reactor operators are still not in compliance with the safety requirements that were in place before the Fukushima disaster. The NRC has only completed its own action on 4 of the 12 task force recommendations.

This is an alarming assessment, and one that is entirely in accord with the observations made by Lochbaum above.

Pervasive organizational and regulatory failures

It is intriguing to observe how pervasive organizational and regulatory failures are in our collective lives. Once you are sensitized to these factors, you see them everywhere. A good example is in the business section of today’s print version of the New York Times, August 1, 2019. There are at least five stories in this section that reflect the consequences of organizational and regulatory failure.

The first and most obvious story is one that has received frequent mention in Understanding Society, the Boeing 737 Max disaster. In a story titled “FAA oversight of Boeing scrutinized”, the reporters describe a Senate hearing on FAA oversight held earlier this week. Members of the Senate Appropriations Committee questioned the FAA’s current process for certifying new aircraft.

Citing the Times story, Ms. Collins raised concerns over “instances in which FAA managers appeared to be more concerned with Boeing’s production timeline, rather than the safety recommendations of its own engineers.”

Senator Jack Reed referred to the need for a culture change to rebalance the relationship between regulator and industry. Agency officials continued to defend the certification process, which delegates 96% of the work of certification to the manufacturer.

This story highlights two common sources of organizational and regulatory failure. First is the fact of “production pressure” coming from the owner of a risky process, involving timing, supply of product, and profitability. This pressure leads the owner to push the organization hard in an effort to achieve its goals, often leading to safety and design failures. The second factor identified here is the structural imbalance between powerful companies running complex and costly processes and the safety agencies tasked with overseeing and regulating their behavior. The regulatory agency, in this case the FAA, is under-resourced and lacks the expert staff needed to carry out a serious process of technical oversight in depth. The article does not identify a third factor, which has been noted in prior posts on the Boeing disaster: the influence that Boeing has on legislators, government officials, and the executive branch.

A second relevant story (on the same page as the Boeing story) concerns charges filed in Germany against the former CEO of Audi for his role in the vehicle emissions scandal. This is part of the long-standing, deliberate effort by Volkswagen to deceive regulators about the emissions characteristics of its diesel engine and exhaust systems. The charges against the Audi executive involve ordering the development of software designed to cheat diesel emissions testing for the company’s vehicles. This ongoing story is primarily a story about corporate dysfunction, in which corporate leaders were involved in unethical and dishonest activities on behalf of the company. Regulatory failure is not a prominent part of this story, because the efforts at deception were so carefully calculated that it is difficult to see how normal standards of regulatory testing could have defeated them. Here the pressing problem is to understand how professional, experienced executives could have been led to undertake such actions, and how the corporation was vulnerable to this kind of improper behavior at multiple levels of the organization. Presumably there were staff at multiple levels within these automobile companies who were aware of improper behavior. The story quotes a mid-level staff person who writes in an email that “we won’t make it without a few dirty tricks.” So the difficult question for these corporations is how their internal systems failed to take note of dangerously improper behavior. The costs to Volkswagen and Audi in liability judgments and government penalties are truly vast, and surely outweigh the possible gains of the deception. These costs in the United States alone exceed $22 billion.

A similar story, this time from the tech industry, concerns Cisco Systems’ settlement of civil claims “that it sold video surveillance technology that it knew had a significant security flaw to federal, state and local government agencies.” Here again we find a case of corporate dishonesty concerning some of its central products, leading to a public finding of malfeasance. The hard question is: what systems are in place at companies like Cisco to ensure ethical and honest presentation of the characteristics and potential defects of the products they sell? The imperatives of always working to maximize profits and reduce costs lead to many kinds of dysfunctions within organizations, but this is a well-understood hazard. So profit-based companies need to have active and effective programs in place that encourage and enforce honest and safe practices by managers, executives, and frontline workers. Plainly those programs broke down at Cisco, Volkswagen, and Audi. (One of the very useful features of Tom Beauchamp’s book Case Studies in Business, Society, and Ethics is the light Beauchamp sheds through case studies on the genesis of unethical and dishonest behavior within a corporate setting.)

Now we go on to Christopher Flavelle’s story about home-building in flood zones. From a social point of view, it makes no sense to continue to build homes, hotels, and resorts in flood zones. The increasing destructiveness of violent storms and extreme weather events has been evident at least since the devastation of Hurricane Katrina. Flavelle writes:

There is overwhelming scientific consensus that rising temperatures will increase the frequency and severity of coastal flooding caused by hurricanes, storm surges, heavy rain and tidal floods. At the same time there is the long-term threat of rising seas pushing the high-tide line inexorably inland.

However, Flavelle reports research by Climate Central showing that in eight states the rate of home-building in flood zones since 2010 has exceeded the rate of home-building outside flood zones. So what are the institutional and behavioral factors that produce this amazingly perverse outcome? The article points to the incentives of local municipalities, which want the property-tax revenues, and of potential homeowners drawn by urban sprawl and the desire for second homes on the water. Here is a tragically short-sighted development official in Galveston who finds that “the city has been able to deal with the encroaching water, through the installation of pumps and other infrastructure upgrades”: “You can build around it, at least for the circumstances today. It’s really not affected the vitality of things here on the island at all.” The factor that is not emphasized in this article is the role played by the National Flood Insurance Program in the problem of coastal (and riverine) development. If flood insurance premiums were calculated in terms of the true riskiness of the proposed residence, hotel, or resort, much of this development would no longer be economically attractive (a toy premium calculation follows the excerpt below). But, as the article makes clear, local officials do not like that answer because it interferes with “development” and property-tax growth. ProPublica has an excellent 2013 story on the perverse incentives created by the National Flood Insurance Program and its inequitable impact on wealthier home-owners and developers (link). Here is an article by Christine Klein and Sandra Zellmer in the SMU Law Review on the dysfunctions of Federal flood policy (link):

Taken together, the stories reveal important lessons, including the inadequacy of engineered flood control structures such as levees and dams, the perverse incentives created by the national flood insurance program, and the need to reform federal leadership over flood hazard control, particularly as delegated to the Army Corps of Engineers.
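
To make the premium point above concrete, here is a toy expected-loss calculation. Every number in it is hypothetical, chosen only to illustrate why below-risk premiums keep flood-zone construction attractive; none of it is drawn from actual NFIP rate tables.

```python
# Toy illustration of risk-based vs. subsidized flood insurance premiums.
# All numbers are hypothetical assumptions for the sake of the example.
annual_flood_probability = 0.02      # assume a property with a 1-in-50 annual flood chance
expected_loss_per_flood = 250_000    # assumed average damage per flood event, in dollars

actuarially_fair_premium = annual_flood_probability * expected_loss_per_flood
subsidized_premium = 1_500           # assumed below-risk premium

print(f"Risk-based premium:      ${actuarially_fair_premium:,.0f} per year")   # $5,000
print(f"Subsidized premium:      ${subsidized_premium:,.0f} per year")
print(f"Implicit annual subsidy: ${actuarially_fair_premium - subsidized_premium:,.0f}")
```

When a gap of this kind is capitalized over the life of a mortgage, it is easy to see why developers, buyers, and local officials continue to find flood-zone construction economically attractive.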

Here is a final story from the business section of the New York Times illustrating organizational and regulatory dysfunctions — this time from the interface between the health industry and big tech. The story here is an effort being made by DeepMind researchers to use artificial intelligence techniques to provide early diagnosis of otherwise mysterious medical conditions like “acute kidney injury” (AKI). The approach proceeds by analyzing large numbers of patient medical records and attempting to identify precursor conditions that would predict the occurrence of AKI. The primary analytical tool mentioned in the article is the set of algorithms associated with neural networks. In this instance the organizational/regulatory dysfunction is latent rather than explicit and has to do with patient privacy. DeepMind is a business unit within Alphabet, the corporate parent of Google. DeepMind researchers gained access to large volumes of patient data from the UK National Health Service. There is now regulatory concern in the UK and the US about the privacy of patients whose data may wind up in the DeepMind analysis and ultimately in Google’s direct control. “Some critics question whether corporate labs like DeepMind are the right organization to handle the development of technology with such broad implications for the public.” Here the issue is a complicated one. It is of course a good thing to be able to diagnose disorders like AKI in time to correct them. But the misuse and careless custody of user data by numerous big tech companies, including especially Facebook, suggest that sensitive personal data like medical files need to be carefully secured by effective legislation and regulation. And so far the regulatory system appears inadequate for the protection of individual privacy in a world of massive databases and large-scale computing capabilities. The recent FTC $5 billion settlement imposed on Facebook, large as it is, may not suffice to change the business practices of Facebook (link).

(I didn’t find anything in the sports section today that illustrates organizational and regulatory dysfunction, but of course these kinds of failures occur in professional and college sports as well. Think of doping scandals in baseball, cycling, and track and field, sexual abuse scandals in gymnastics and swimming, and efforts by top college football programs to evade NCAA regulations on practice time and academic performance.)

Soviet nuclear disasters: Kyshtym

The 1986 meltdown of reactor number 4 at the Chernobyl Nuclear Power Plant was the greatest nuclear disaster the world has yet seen. Less well known is the Kyshtym disaster of 1957, a catastrophic explosion in an underground nuclear waste storage facility at the Mayak plutonium production complex in the Eastern Ural region of the USSR, which resulted in a massive release of radioactive material. Information about the disaster was tightly restricted by Soviet authorities, with predictably bad consequences.

Zhores Medvedev was one of the first qualified scientists to provide information and hypotheses about the Kyshtym disaster. His book Nuclear Disaster in the Urals was written while he was in exile in Great Britain and appeared in 1980. It is fascinating to learn that his reasoning is based on his study of ecological, biological, and environmental research done by Soviet scientists between 1957 and 1980. Medvedev was able to piece together the extent of contamination and the general nature of the cause of the event from basic information about radioactive contamination in lakes and streams in the region included incidentally in scientific reports from the period.

It is very interesting to find that scientists in the United States were surprisingly skeptical about Medvedev’s assertions. W. Stratton et al. published a review analysis in Science in 1979 (link) that found Medvedev’s reasoning unpersuasive.

A steam explosion of one tank is not inconceivable but is most improbable, because the heat generation rate from a given amount of fission products is known precisely and is predictable. Means to dissipate this heat would be a part of the design and could be made highly reliable. (423)

They offer an alternative hypothesis about any possible radioactive contamination in the Kyshtym region — the handful of multimegaton nuclear weapons tests conducted by the USSR in the Novaya Zemlya area.

We suggest that the observed data can be satisfied by postulating localized fallout (perhaps with precipitation) from explosion of a large nuclear weapon, or even from more than one explosion, because we have no limits on the length of time that fallout continued. (425)

And they consider weather patterns during the relevant time period to argue that these tests could have been the source of radiation contamination identified by Medvedev. Novaya Zemlya is over 1000 miles north of Kyshtym (20 degrees of latitude). So the fallout from the nuclear tests may be a possible alternative hypothesis, but it is farfetched. They conclude:

We can only conclude that, though a radiation release incident may well be supported by the available evidence, the magnitude of the incident may have been grossly exaggerated, the source chosen uncritically, and the dispersal mechanism ignored. Even so we find it hard to believe that an area of this magnitude could become contaminated and the event not discussed in detail or by more than one individual for more than 20 years. (425)

The heart of their skepticism rests on an indefensible assumption: that Soviet science, engineering, and management were entirely capable of designing and implementing a safe system for nuclear waste storage. They were perhaps right about the scientific and engineering capabilities of the Soviet system; but the management systems in place were woefully inadequate. Their account rested on an assumption of straightforward application of engineering knowledge to the problem, and they failed to take into account the defects of organization and oversight that were rampant within Soviet industrial systems. And in the end the core of Medvedev’s claims has been validated.

Another assessment, compiled by Los Alamos scientists and released in 1982, concluded unambiguously that Medvedev was mistaken and that the widespread ecological devastation in the region resulted from small and gradual processes of contamination rather than a massive explosion of waste materials (link). Here is the conclusion put forward by the study’s authors:

What then did happen at Kyshtym? A disastrous nuclear accident that killed hundreds, injured thousands, and contaminated thousands of square miles of land? Or, a series of relatively minor incidents, embellished by rumor, and severely compounded by a history of sloppy practices associated with the complex? The latter seems more highly probable.

So Medvedev is dismissed.

After the collapse of the USSR, voluminous records about the Kyshtym disaster became available from secret Soviet files, and those records make it plain that the US scientists had badly misjudged the nature of the event. Medvedev was much closer to the truth than were Stratton and his colleagues or the authors of the Los Alamos report.

A scientific report based on Soviet-era documents released after the fall of the Soviet Union appeared in the Journal of Radiological Protection in 2017 (A. V. Akleyev et al. 2017; link). Here is their brief description of the accident:

Starting in the earliest period of Mayak PA activities, large amounts of liquid high-level radioactive waste from the radiochemical facility were placed into long-term controlled storage in metal tanks installed in concrete vaults. Each full tank contained 70–80 tons of radioactive wastes, mainly in the form of nitrate compounds. The tanks were water-cooled and equipped with temperature and liquid-level measurement devices. In September 1957, as a result of a failure of the temperature-control system of tank #14, cooling-water delivery became insufficient and radioactive decay caused an increase in temperature followed by complete evaporation of the water, and the nitrate salt deposits were heated to 330 °C–350 °C. The thermal explosion of tank #14 occurred on 29 September 1957 at 4:20 pm local time. At the time of the explosion the activity of the wastes contained in the tank was about 740 PBq [5, 6]. About 90% of the total activity settled in the immediate vicinity of the explosion site (within distances less than 5 km), primarily in the form of coarse particles. The explosion gave rise to a radioactive plume which dispersed into the atmosphere. About 2 × 10⁶ Ci (74 PBq) was dispersed by the wind (north-northeast direction with wind velocity of 5–10 m s⁻¹) and caused the radioactive trace along the path of the plume [5]. Table 1 presents the latest estimates of radionuclide composition of the release used for reconstruction of doses in the EURT area. The mixture corresponded to uranium fission products formed in a nuclear reactor after a decay time of about 1 year, with depletion in ¹³⁷Cs due to a special treatment of the radioactive waste involving the extraction of ¹³⁷Cs [6]. (R20-21)
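
The mixed units in this passage (curies and petabecquerels) are easy to reconcile; the following sketch simply checks that the quoted figures are internally consistent, using the standard conversion 1 Ci = 3.7 × 10¹⁰ Bq.

```python
# Consistency check on the activity figures quoted from Akleyev et al.
CI_TO_BQ = 3.7e10                    # 1 curie = 3.7e10 becquerel (exact)

dispersed_ci = 2e6                   # "About 2 × 10⁶ Ci ... dispersed by the wind"
dispersed_pbq = dispersed_ci * CI_TO_BQ / 1e15
print(f"{dispersed_ci:,.0f} Ci = {dispersed_pbq:.0f} PBq")          # 74 PBq, as stated

tank_activity_pbq = 740              # total activity of tank #14 at the explosion
tank_activity_ci = tank_activity_pbq * 1e15 / CI_TO_BQ
print(f"{tank_activity_pbq} PBq = {tank_activity_ci/1e6:.0f} million Ci")   # ~20 million Ci
```

The 740 PBq total works out to about 20 million curies, which is also the figure Mahaffey gives below for the radioactive dust thrown out of the tank.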

Here is the region of radioactive contamination (the East Urals Radioactive Trace, EURT) that Akleyev et al. identify:

This region represents a large area encompassing 23,000 square kilometers (8,880 square miles). Plainly Akleyev et al. describe a massive disaster, including a very large explosion in an underground nuclear waste storage facility, large-scale dispersal of nuclear materials, and the evacuation of populations throughout a large region. This is very close to the description provided by Medvedev.

A somewhat surprising finding of the Akleyev study is that the exposed population did not show dramatically worse health outcomes and mortality relative to unexposed populations. For example, “Leukemia mortality rates over a 30-year period after the accident did not differ from those in the group of unexposed people” (R30). Their epidemiological study for cancers overall likewise indicates only a small effect of accidental radiation exposure on cancer incidence:

The attributable risk (AR) of solid cancer incidence in the EURTC, which gives the proportion of excess cancer cases out of the sum of excess and baseline cases, calculated according to the linear model, made up 1.9% over the whole follow-up period. Therefore, only 27 cancer cases out of 1426 could be associated with accidental radiation exposure of the EURT population. AR is highest in the highest dose groups (250–500 mGy and >500 mGy) and exceeds 17%.
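
The attributable-risk figure in this passage can be checked directly from the numbers quoted; the definition given ("the proportion of excess cancer cases out of the sum of excess and baseline cases") is just a ratio.

```python
# Checking the quoted attributable risk (AR) from the Akleyev et al. excerpt.
total_cases = 1426    # excess + baseline solid-cancer cases in the exposed cohort
excess_cases = 27     # cases attributed to accidental radiation exposure

attributable_risk = excess_cases / total_cases
print(f"AR = {attributable_risk:.1%}")   # 1.9%, matching the quoted value
```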

So why did the explosion occur? James Mahaffey examines the case in detail in Atomic Accidents: A History of Nuclear Meltdowns and Disasters: From the Ozark Mountains to Fukushima. Here is his account:

In the crash program to produce fissile bomb material, a great deal of plutonium was wasted in the crude separation process. Production officials decided that instead of being dumped irretrievably into the river, the plutonium that had failed to precipitate out, remaining in the extraction solution, should be saved for future processing. A big underground tank farm was built in 1953 to hold processed fission waste. Round steel tanks were installed in banks of 20, sitting on one large concrete slab poured at the bottom of an excavation, 27 feet deep. Each bank was equipped with a heat exchanger, removing the heat buildup from fission-product decay using water pipes wrapped around the tanks. The tanks were then buried under a backfill of dirt. The tanks began immediately to fill with various waste solutions from the extraction plant, with no particular distinction among the vessels. The tanks contained all the undesirable fission products, including cobalt-60, strontium-90, and cesium-137, along with unseparated plutonium and uranium, with both acetate and nitrate solutions pumped into the same volume. One tank could hold probably 100 tons of waste product. 

In 1956, a cooling-water pipe broke leading to one of the tanks. It would be a lot of work to dig up the tank, find the leak, and replace the pipe, so instead of going to all that trouble, the engineers in charge just turned off the water and forgot about it. 

A year passed. Not having any coolant flow and being insulated from the harsh Siberian winter by the fill dirt, the tank retained heat from the fission-product decay. Temperature inside reached 660° Fahrenheit, hot enough to melt lead and cast bullets. Under this condition, the nitrate solutions degraded into ammonium nitrate, or fertilizer, mixed with acetates. The water all boiled away, and what was left was enough solidified ANFO explosive to blow up Sterling Hall several times, being heated to the detonation point and laced with dangerous nuclides. [189]

Sometime before 11:00 P.M. on Sunday, September 29, 1957, the bomb went off, throwing a column of black smoke and debris reaching a kilometer into the sky, accented with larger fragments burning orange-red. The 160-ton concrete lid on the tank tumbled upward into the night like a badly thrown discus, and the ground thump was felt many miles away. Residents of Chelyabinsk rushed outside and looked at the lighted display to the northwest, as 20 million curies of radioactive dust spread out over everything sticking above ground. The high-level wind that night was blowing northeast, and a radioactive plume dusted the Earth in a tight line, about 300 kilometers long. This accident had not been a runaway explosion in an overworked Soviet production reactor. It was the world’s first “dirty bomb,” a powerful chemical explosive spreading radioactive nuclides having unusually high body burdens and guaranteed to cause havoc in the biosphere. The accidentally derived explosive in the tank was the equivalent of up to 100 tons of TNT, and there were probably 70 to 80 tons of radioactive waste thrown skyward. (kl 5295)

So what were the primary organizational and social causes of this disaster? One is the haste in nuclear design and construction created by Stalin’s insistence on moving the Soviet nuclear weapons program forward as rapidly as possible. As is evident in the Chernobyl case as well, the political pressures on engineers and managers that followed from these priorities often led to disastrous decisions and actions. A second is the institutionalized system of secrecy that surrounded industry generally, the military specifically, and the nuclear industry most especially. A third is the casual attitude taken by Soviet officials towards the health and wellbeing of the population. And a final cause highlighted by Mahaffey’s account is the low level of attention given at the plant level to safety and maintenance of highly risky facilities. Stratton et al. based their analysis on the fact that the heat-generating characteristics of nuclear waste were well understood and that effective means existed for controlling those risks. That may be, but what they failed to anticipate is that these risks would be fundamentally disregarded on the ground and in the supervisory system above the Kyshtym storage complex.

(It is interesting to note that Mahaffey himself underestimates the amount of information that is now available about the effects of the disaster. He writes that “studies of the effects of this disaster are extremely difficult, as records do not exist, and previous residents are hard to track down” (kl 5330). But the Akleyev study mentioned above provides extensive health data about the affected population, drawing on records that were collected during the Soviet period and concealed until after the collapse of the USSR.)

 

Safety and accident analysis: Longford

Andrew Hopkins has written a number of fascinating case studies of industrial accidents, usually in the field of petrochemicals. These books are crucial reading for anyone interested in arriving at a better understanding of technological safety in the context of complex systems involving high-energy and tightly coupled processes. Especially interesting is his Lessons from Longford: The Esso Gas Plant Explosion. The Longford gas processing plant suffered an explosion and fire in 1998 that killed two workers, badly injured others, and interrupted the supply of natural gas to the state of Victoria for two weeks. Hopkins is a sociologist, but he has developed substantial expertise in the technical details of petrochemical processing plants. He served as an expert witness in the Royal Commission hearings that investigated the accident. The accounts he offers of these disasters are genuinely fascinating to read.

Hopkins makes the now-familiar point that companies often seek to lay responsibility for a major industrial accident on operator error or malfeasance. This was Esso’s defense concerning its corporate liability in the Longford disaster. But, as Hopkins points out, the larger causes of failure go far beyond the individual operators whose decisions and actions were proximate to the event. Training, operating plans, hazard analysis, availability of appropriate onsite technical expertise — these are all the responsibility of the owners and managers of the enterprise. And regulation and oversight of safety practices are the responsibility of state agencies. So it is critical to examine the operations of a complex and dangerous technological system at all these levels.

A crucial part of management’s responsibility is to engage in formal “hazard and operability” (HAZOP) analysis. “A HAZOP involves systematically imagining everything that might go wrong in a processing plant and developing procedures or engineering solutions to avoid these potential problems” (26). This kind of analysis is especially critical in high-risk industries including chemical plants, petrochemical refineries, and nuclear reactors. It emerged during the Longford accident investigation that HAZOP analyses had been conducted for some aspects of risk but not for all — even in areas where the parent company Exxon was itself already fully engaged in analysis of those risky scenarios. The risk of embrittlement of processing equipment when exposed to super-chilled conditions was one that Exxon had already drawn attention to at the corporate level because of prior incidents.
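To make the method concrete, here is a minimal, hypothetical sketch of the core HAZOP bookkeeping: standard guidewords are crossed with process parameters for each node of the plant, and the study team records causes, consequences, and safeguards for every meaningful deviation. The node label and the entries below are illustrative inventions, not taken from Esso’s or Exxon’s actual studies.

```python
from dataclasses import dataclass, field
from itertools import product

# Standard HAZOP guidewords crossed with process parameters for one node.
GUIDEWORDS = ["No", "More", "Less", "Reverse", "Other than"]
PARAMETERS = ["flow", "temperature", "pressure", "level"]

@dataclass
class Deviation:
    node: str
    guideword: str
    parameter: str
    causes: list = field(default_factory=list)
    consequences: list = field(default_factory=list)
    safeguards: list = field(default_factory=list)

def enumerate_deviations(node):
    """List every guideword/parameter combination for a node; the study team
    then fills in causes, consequences, and safeguards for each combination
    that is physically meaningful."""
    return [Deviation(node, g, p) for g, p in product(GUIDEWORDS, PARAMETERS)]

# Illustrative entry of the kind a Longford-style study would have produced
# (invented content, not from the actual plant documentation).
dev = Deviation(
    node="lean oil heat exchanger",
    guideword="Less",
    parameter="temperature",
    causes=["loss of lean oil circulation"],
    consequences=["embrittlement of vessel steel; fracture if warm oil is reintroduced"],
    safeguards=["low-temperature alarm and trip", "written cold-restart procedure"],
)
print(len(enumerate_deviations(dev.node)), "deviations to review for this node")
```

The value of the exercise lies less in the table itself than in forcing the team to imagine the low-temperature scenario before it happens, which is exactly the scenario that went unexamined at Longford.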

A factor that Hopkins judges to be crucial to the occurrence of the Longford Esso disaster is the decision made by management to relocate engineering staff from the plant to a central location where they could serve a larger number of facilities “more efficiently”.

A second relevant change was the relocation to Melbourne in 1992 of all the engineering staff who had previously worked at Longford, leaving the Longford operators without the engineering backup to which they were accustomed. Following their removal from Longford, engineers were expected to monitor the plant from a distance and operators were expected to telephone the engineers when they felt a need to. Perhaps predictably, these arrangements did not work effectively, and I shall argue in the next chapter that the absence of engineering expertise had certain long-term consequences which contributed to the accident. (34)

One result of this decision is the fact that when the Longford incident began there were no engineering experts on site who could correctly identify the risks created by the incident. Technicians therefore restarted the process by reintroducing warm oil into the super-chilled heat exchanger. The metal had become brittle as a result of the extremely low temperatures and cracked, leading to the release of fuel and subsequent explosion and fire. As Hopkins points out, Exxon experts had long been aware of the hazards of embrittlement. However, it appears that the operating procedures developed by Esso at Longford ignored this risk, and operators and supervisors lacked the technical/scientific knowledge to recognize the hazard when it arose.

The topic of “tight coupling” (the tight interconnection across different parts of a complex technological system) comes up frequently in discussions of technology accidents. Hopkins shows that the Longford case gives a new spin to this idea. In the case of the explosion and fire at Longford it turned out to be very important that plant 1 was interconnected by numerous plumbing connections to plants 2 and 3. This meant that fuel from plants 2 and 3 continued to flow into plant 1 and greatly extended the length of time it took to extinguish the fire. Plant 1 had to be fully isolated from plants 2 and 3 before the fire could be extinguished (or plants 2 and 3 could be restarted), and there were so many plumbing connections among them, poorly understood at the time of the fire, that isolating them took a great deal of time (32).
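A toy sketch may help show why poorly understood interconnections matter so much here. If the site is represented as a set of connections between vessels and lines, then isolating one plant means finding every connection with exactly one end inside that plant; any connection missing from the drawings has to be discovered under emergency conditions. The names and connections below are invented for illustration, not taken from the Longford plans.

```python
# Invented connections for illustration; "GP1" stands in for plant 1, etc.
CONNECTIONS = [
    ("GP1-absorber", "GP2-condensate-header"),
    ("GP1-lean-oil-line", "GP3-recycle-line"),
    ("GP1-flare-header", "GP2-flare-header"),
    ("GP2-compressor", "GP3-compressor"),  # internal to plants 2 and 3
]

def isolation_points(plant_prefix, connections):
    """Every connection with exactly one end inside the plant must be blinded
    or valved off before the plant is fully isolated."""
    points = []
    for a, b in connections:
        inside = [x.startswith(plant_prefix) for x in (a, b)]
        if any(inside) and not all(inside):
            points.append((a, b))
    return points

# Each connection returned here that was absent from the plant drawings is a
# line that had to be discovered and closed off while the fire still burned.
print(isolation_points("GP1", CONNECTIONS))
```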

Hopkins addresses the issue of government regulation of high-risk industries in connection with the Longford disaster. Writing in 1999 or so, he recognizes the trend towards “self-regulation” in place of government rules stipulating how various industries must operate. He contrasts this approach with deregulation — the effort to allow safe operation to be governed by the market rather than by law.

Whereas the old-style legislation required employers to comply with precise, often quite technical rules, the new style imposes an overarching requirement on employers that they provide a safe and healthy workplace for their employees, as far as practicable. (92)

He notes that this approach does not necessarily reduce the need for government inspections; but the goal of regulatory inspection will be different. Inspectors will seek to satisfy themselves that the industry has done a responsible job of identifying hazards and planning accordingly, rather than looking for violations of specific rules. (This parallels to some extent his discussion of two different philosophies of audit, one of which is much more conducive to increasing the systems-safety of high-risk industries; chapter 7.) But his preferred regulatory approach is what he describes as “safety case regulation”. (Hopkins provides more detail about the workings of a safety case regime in Disastrous Decisions: The Human and Organisational Causes of the Gulf of Mexico Blowout, chapter 10.)

The essence of the new approach is that the operator of a major hazard installation is required to make a case or demonstrate to the relevant authority that safety is being or will be effectively managed at the installation. Whereas under the self-regulatory approach, the facility operator is normally left to its own devices in deciding how to manage safety, under the safety case approach it must lay out its procedures for examination by the regulatory authority. (96)

The preparation of a safety case would presumably include a comprehensive HAZOP analysis, along with procedures for preventing or responding to the occurrence of possible hazards. Hopkins reports that the safety case approach to regulation is being adopted by the EU, Australia, and the UK with respect to a number of high-risk industries. This discussion is highly relevant to the current debate over aircraft manufacturing safety and the role of the FAA in overseeing manufacturers.

It is interesting to realize that Hopkins is implicitly critical of another of my favorite authors on the topic of accidents and technology safety, Charles Perrow. Perrow’s central idea of “normal accidents” brings with it a certain pessimism about the ability to increase safety in complex industrial and technological systems; accidents are inevitable and normal (Normal Accidents: Living with High-Risk Technologies). Hopkins takes a more pragmatic approach and argues that there are engineering and management methodologies that can significantly reduce the likelihood and harm of accidents like the Esso gas plant explosion. His central point is that we do not need to anticipate the long chains of unlikely events themselves in order to identify the hazards in which such chains may culminate — for example, loss of coolant in a nuclear reactor or loss of warm oil in a refinery process. These end-states of many different possible accident scenarios all require procedures in place that will guide the responses of engineers and technicians when “normal accidents” occur (33).

Hopkins highlights the challenge to safety created by the ongoing modification of a power plant or chemical plant; later modifications may create hazards not anticipated by the rigorous accident analysis performed on the original design.

Processing plants evolve and grow over time. A study of petroleum refineries in the US has shown that “the largest and most complex refineries in the sample are also the oldest … Their complexity emerged as a result of historical accretion. Processes were modified, added, linked, enhanced and replaced over a history that greatly exceeded the memories of those who worked in the refinery.” (33)

This is one of the chief reasons why Perrow believes technological accidents are inevitable. However, Hopkins draws a different conclusion:

However, those who are committed to accident prevention draw a different conclusion, namely, that it is important that every time physical changes are made to plant these changes be subjected to a systematic hazard identification process. …  Esso’s own management of change philosophy recognises this. It notes that “changes potentially invalidate prior risk assessments and can create new risks, if not managed diligently.” (33)

(I believe this recommendation conforms to Nancy Leveson’s theories of system safety engineering as well; link.)

Here is the causal diagram that Hopkins offers for the occurrence of the explosion at Longford (122).

The lowest level of the diagram represents the sequence of physical events and operator actions leading to the explosion, fatalities, and loss of gas supply. The next level represents the organizational factors identified in the analysis of the Longford event and its background. Central among these factors are the decision to withdraw engineers from the plant; a safety philosophy that focused on lost-time injuries rather than system hazards and processes; failures in the incident reporting system; the failure to perform a HAZOP for plant 1; poor maintenance practices; inadequate audit practices; inadequate training for operators and supervisors; and a failure to identify the hazard created by interconnections with plants 2 and 3. The next level identifies the causes of these management failures — Esso’s overriding focus on cost-cutting and a failure by Exxon, as the parent company, to adequately oversee safety planning and share information from accidents at other plants. The final two levels of causation concern governmental and societal factors that contributed to the corporate behavior leading to the accident.
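Since the diagram itself is not reproduced here, a rough encoding of the levels as they are summarized above might look like the following; the level names and the wording of the entries are my paraphrase, not Hopkins’s own labels.

```python
# Layered causal structure of the Longford explosion, paraphrased from the
# summary above (labels are illustrative, not Hopkins's own).
LONGFORD_CAUSAL_LEVELS = {
    "societal and governmental": [
        "reliance on self-regulation and limited oversight of major hazard facilities",
    ],
    "corporate": [
        "Esso's overriding focus on cost-cutting",
        "inadequate safety oversight and information-sharing by Exxon",
    ],
    "organizational": [
        "withdrawal of engineers from the plant",
        "safety philosophy focused on lost-time injuries rather than process hazards",
        "failures in the incident reporting system",
        "no HAZOP performed for plant 1",
        "poor maintenance and audit practices",
        "inadequate training for operators and supervisors",
        "unidentified hazard from interconnections with plants 2 and 3",
    ],
    "physical events and operator actions": [
        "super-chilling of the heat exchanger",
        "reintroduction of warm oil into the embrittled vessel",
        "fracture, release, explosion, fire, loss of gas supply",
    ],
}

for level, factors in LONGFORD_CAUSAL_LEVELS.items():
    print(f"{level}: {len(factors)} contributing factors")
```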

(Here is a list of major industrial disasters; link.)

Herbert Simon’s theories of organizations

Image: detail from Family Portrait 2, 1965 (Creative Commons license, Richard Rappaport)

Herbert Simon made paradigm-changing contributions to the theory of rational behavior, including particularly his treatment of “satisficing” as an alternative to “maximizing” economic rationality (link). It is therefore worthwhile examining his views of organizations and organizational decision-making and action — especially given how relevant those theories are to my current research interest in organizational dysfunction. His highly successful book Administrative Behavior went through four editions between 1947 and 1997 — more than fifty years of thinking about organizations and organizational behavior. The later editions consist of the original text along with “commentary” chapters that Simon wrote to incorporate more recent thinking about the content of each chapter.

Here I will pull out some of the highlights of Simon’s approach to organizations. There are many features of his analysis of organizational behavior that are worth noting. But my summary assessment is that the book is surprisingly positive about the rationality of organizations and the processes through which they collect information and reach decisions. In the contemporary environment where we have all too many examples of organizational failure in decision-making — from Boeing to Purdue Pharma to the Federal Emergency Management Agency — this confidence seems to be fundamentally misplaced. The theorist who invented the idea of imperfect rationality and satisficing at the individual level perhaps should have offered a somewhat more critical analysis of organizational thinking.

The first thing that the reader will observe is that Simon thinks about organizations as systems of decision-making and execution. His working definition of organization highlights this view:

In this book, the term organization refers to the pattern of communications and relations among a group of human beings, including the processes for making and implementing decisions. This pattern provides to organization members much of the information and many of the assumptions, goals, and attitudes that enter into their decisions, and provides also a set of stable and comprehensible expectations as to what the other members of the group are doing and how they will react to what one says and does. (18-19).

What is a scientifically relevant description of an organization? It is a description that, so far as possible, designates for each person in the organization what decisions that person makes, and the influences to which he is subject in making each of these decisions. (43)

The central theme around which the analysis has been developed is that organization behavior is a complex network of decisional processes, all pointed toward their influence upon the behaviors of the operatives — those who do the actual ‘physical’ work of the organization. (305)

The task of decision-making breaks down into the assimilation of relevant facts and values — a distinction that Simon attributes to logical positivism in the original text but makes more general in the commentary. Answering the question, “what should we do?”, requires a clear answer to two kinds of questions: what values are we attempting to achieve? And how does the world work such that interventions will bring about those values?

It is refreshing to see Simon’s skepticism about the “rules of administration” that various generations of organizational theorists have advanced — “specialization,” “unity of command,” “span of control,” and so forth. Simon describes these as proverbs rather than as useful empirical discoveries about effective administration. And he finds the idea of “schools of management theory” to be entirely unhelpful (26). Likewise, he is entirely skeptical about the value of the economic theory of the firm, which, in Simon’s view, abstracts from all of the arrangements among participants that are crucial to the internal processes of the organization. He recommends an approach to the study of organizations (and the design of organizations) that focuses on the specific arrangements needed to bring factual and value claims into a process of deliberation leading to decision — incorporating the kinds of specialization and control that make sense for a particular set of business and organizational tasks.

An organization has only two fundamental tasks: decision-making and “making things happen”. The decision-making process involves intelligently gathering facts and values and designing a plan. Simon generally approaches this process as a reasonably rational one. He identifies three kinds of limits on rational decision-making:

  • The individual is limited by those skills, habits, and reflexes which are no longer in the realm of the conscious…
  • The individual is limited by his values and those conceptions of purpose which influence him in making his decision…
  • The individual is limited by the extent of his knowledge of things relevant to his job. (46)

And he explicitly regards these points as being part of a theory of administrative rationality:

Perhaps this triangle of limits does not completely bound the area of rationality, and other sides need to be added to the figure. In any case, the enumeration will serve to indicate the kinds of considerations that must go into the construction of valid and noncontradictory principles of administration. (47)

The “making it happen” part is more complicated. This has to do with the problem the executive faces of bringing about the efficient, effective, and loyal performance of assigned tasks by operatives. Simon’s theory essentially comes down to training, loyalty, and authority.

If this is a correct description of the administrative process, then the construction of an efficient administrative organization is a problem in social psychology. It is a task of setting up an operative staff and superimposing on that staff a supervisory staff capable of influencing the operative group toward a pattern of coordinated and effective behavior. (2)

To understand how the behavior of the individual becomes a part of the system of behavior of the organization, it is necessary to study the relation between the personal motivation of the individual and the objectives toward which the activity of the organization is oriented. (13-14) 

Simon refers to three kinds of influence that executives and supervisors can have over “operatives”: formal authority (enforced by the power to hire and fire), organizational loyalty (cultivated through specific means within the organization), and training. Simon holds that a crucial role of administrative leadership is the task of motivating the employees of the organization to carry out the plan efficiently and effectively.

Later he refers to five “mechanisms of organization influence” (112): specialization and division of task; the creation of standard practices; transmission of decisions downwards through authority and influence; channels of communication in all directions; and training and indoctrination. Through these mechanisms the executive seeks to ensure a high level of conformance and efficient performance of tasks.

What about the actors within an organization? How do they behave as individual actors? Simon treats them as “boundedly rational”:

To anyone who has observed organizations, it seems obvious enough that human behavior in them is, if not wholly rational, at least in good part intendedly so. Much behavior in organizations is, or seems to be, task-oriented–and often efficacious in attaining its goals. (88)

But this description leaves out altogether the possibility and likelihood of mixed motives, conflicts of interest, and intra-organizational disagreement. When Simon considers the fact of multiple agents within an organization, he acknowledges that this poses a challenge for rationalistic organizational theory:

Complications are introduced into the picture if more than one individual is involved, for in this case the decisions of the other individuals will be included among the conditions which each individual must consider in reaching his decisions. (80)

This acknowledges the essential feature of organizations — the multiplicity of actors — but fails to treat it with the seriousness it demands. He attempts to resolve the issue by invoking cooperation and the language of strategic rationality: “administrative organizations are systems of cooperative behavior. The members of the organization are expected to orient their behavior with respect to certain goals that are taken as ‘organization objectives’” (81). But this simply presupposes the result we might want to occur, without providing a basis for expecting it to take place.

With the hindsight of half a century, I am inclined to think that Simon attributes too much rationality and hierarchical purpose to organizations.

The rational administrator is concerned with the selection of these effective means. For the construction of an administrative theory it is necessary to examine further the notion of rationality and, in particular, to achieve perfect clarity as to what is meant by “the selection of effective means.” (72)  

These sentences, and many others like them, present the task as one of defining the conditions of rationality of an organization or firm; this takes for granted the notion that the relations of communication, planning, and authority can result in a coherent implementation of a plan of action. His model of an organization involves high-level executives who pull together factual information (making use of specialized experts in this task) and integrate the purposes and goals of the organization (profits, maintaining the health and safety of the public, reducing poverty) into an actionable set of plans to be implemented by subordinates. He refers to a “hierarchy of decisions,” in which higher-level goals are broken down into intermediate-level goals and tasks, with a coherent relationship between intermediate and higher-level goals. “Behavior is purposive in so far as it is guided by general goals or objectives; it is rational in so far as it selects alternatives which are conducive to the achievement of the previously selected goals” (4). And the suggestion is that a well-designed organization succeeds in establishing this kind of coherence of decision and action.

 

It is true that he also asserts that decisions are “composite” —

It should be perfectly apparent that almost no decision made in an organization is the task of a single individual. Even though the final responsibility for taking a particular action rests with some definite person, we shall always find, in studying the manner in which this decision was reached, that its various components can be traced through the formal and informal channels of communication to many individuals … (305)

But even here he fails to consider the possibility that this compositional process may involve systematic dysfunctions that require study. Rather, he seems to presuppose that this composite process itself proceeds logically and coherently. In commenting on a case study by Oswyn Murray (1923) on the design of a post-WWI battleship, he writes: “The point which is so clearly illustrated here is that the planning procedure permits expertise of every kind to be drawn into the decision without any difficulties being imposed by the lines of authority in the organization” (314). This conclusion is strikingly at odds with most accounts of science-military relations during World War II in Britain — for example, the pernicious interference of Frederick Alexander Lindemann with Patrick Blackett over Blackett’s struggles to create an operations-research basis for anti-submarine warfare (Blackett’s War: The Men Who Defeated the Nazi U-Boats and Brought Science to the Art of Warfare). His comments about the processes of review that can be implemented within organizations (314 ff.) are similarly excessively optimistic — contrary to the literature on principal-agent problems in many areas of complex collaboration.

This is surprising, given Simon’s contributions to the theory of imperfect rationality in the case of individual decision-making. Against this confidence, the sources of organizational dysfunction that are now apparent in several literatures on organizations make it difficult to imagine that organizations can have a high success rate in rational decision-making. If we were seeking a Simon-like phrase for organizational thinking to parallel the idea of satisficing, we might come up with the notion of “bounded, localistic organizational rationality”: “locally rational, frequently influenced by extraneous forces, incomplete information, incomplete communication across divisions, rarely coherent over the whole organization”.

Simon makes the point emphatically in the opening chapters of the book that administrative science is an incremental and evolving field. And in fact, it seems apparent that his own thinking continued to evolve. There are occasional threads of argument in Simon’s work that seem to point towards a more contingent view of organizational behavior and rationality, along the lines of Fligstein and McAdam’s theories of strategic action fields. For example, when discussing organizational loyalty Simon raises the kind of issue that is central to the strategic action field model of organizations: the conflicts of interest that can arise across units (11). And in the commentary on Chapter I he points forward to the theories of strategic action fields and complex adaptive systems:

The concepts of systems, multiple constituencies, power and politics, and organization culture all flow quite naturally from the concept of organizations as complex interactive structures held together by a balance of the inducements provided to various groups of participants and the contributions received from them. (27)

The book has been a foundational contribution to organizational studies. At the same time, if Herbert Simon were at the beginning of his career and were beginning his study of organizational decision-making today, I suspect he might have taken a different tack. He was plainly committed to empirical study of existing organizations and the mechanisms through which they worked. And he was receptive to the ideas surrounding the notion of imperfect rationality. The current literature on the sources of contention and dysfunction within organizations (Perrow, Fligstein, McAdam, Crozier, …) might well have led him to write a different book altogether, one that gave more attention to the sources of failures of rational decision-making and implementation alongside the occasional examples of organizations that seem to work at a very high level of rationality and effectiveness.

The 737 MAX disaster as an organizational failure

The topic of the organizational causes of technology failure comes up frequently in Understanding Society. The tragic crashes of two Boeing 737 MAX aircraft in the past year present an important case to study. Is this an instance of pilot error (as has occasionally been suggested)? Is it a case of engineering and design failures? Or are there important corporate and regulatory failures that created the environment in which the accidents occurred, as the public record seems to suggest?

The formal accident investigations are not yet complete, and the FAA and other air safety agencies around the world have not yet approved the aircraft to return to flight after its certification was suspended in the wake of the second crash. There will certainly be a detailed and expert case study of this disaster at some point in the future, and I will be eager to read the resulting book. In the meantime, though, it is useful to bring the perspectives of Charles Perrow, Diane Vaughan, and Andrew Hopkins to bear on what we can learn about this case from the public media sources that are available. The preliminary sketch of a case study offered below is a first effort, intended simply to help us learn more about the social and organizational processes that govern the complex technologies upon which we depend. Many of the dysfunctions identified in the safety literature appear to have had a role in this disaster.

I have made every effort to offer an accurate summary based on publicly available sources, but readers should bear in mind that it is a preliminary effort.

The key conclusions I’ve been led to include these:

The updated flight control system of the aircraft (MCAS) created the conditions for crashes under rare combinations of flight conditions and instrument failure.

  • Faults in the AOA sensor and the MCAS flight control system persisted through the design process
  • Pilot training and information about changes in the flight control system were likely inadequate to permit pilots to override the control system when necessary

There were fairly clear signs of organizational dysfunction in the development and design process for the aircraft:

  • Disempowered mid-level experts (engineers, designers, software experts)
  • Inadequate organizational embodiment of safety oversight
  • Business priorities placing cost savings, timeliness, profits over safety
  • Executives with divided incentives
  • Breakdown of internal management controls leading to faulty manufacturing processes 

Cost-containment and speed trumped safety. It is hard to avoid the conclusion that the corporation put cost-cutting and speed ahead of the professional advice and judgment of the engineers. Management pushed the design and certification process aggressively, leading to implementation of a control system that could fail in foreseeable flight conditions.

The regulatory system seems to have been at fault as well, with the FAA taking a deferential attitude towards the company’s assertions of expertise throughout the certification process. The regulatory process was “outsourced” to a company that already has inordinate political clout in Congress and the agencies.

  • Inadequate government regulation
  • FAA lacked direct expertise and oversight sufficient to detect design failures
  • Too much influence by the company over regulators and legislators

Here is a video presentation of the case as I currently understand it (link). 

 
See also this earlier discussion of regulatory failure in the 737 MAX case (link). Here are several experts on the topic of organizational failure whose work is especially relevant to the current case:

Organizations and dysfunction

A recurring theme in recent months in Understanding Society is organizational dysfunction and the organizational causes of technology failure. Helmut Anheier’s volume When Things Go Wrong: Organizational Failures and Breakdowns is highly relevant to this topic, and it makes for very interesting reading. The volume includes contributions by a number of leading scholars in the sociology of organizations.

And yet the volume seems to miss the mark in some important ways. For one thing, it is unduly focused on the question of “mortality” of firms and other organizations. Bankruptcy and organizational death are frequent synonyms for “failure” here. This frame is evident in the summary the introduction offers of existing approaches in the field: organizational aspects, political aspects, cognitive aspects, and structural aspects. All bring us back to the causes of extinction and bankruptcy in a business organization. Further, the approach highlights the importance of internal conflict within an organization as a source of eventual failure. But it gives no insight into the internal structure and workings of the organization itself, the ways in which behavior and internal structure function to systematically produce certain kinds of outcomes that we can identify as dysfunctional.

Significantly, however, dysfunction does not routinely lead to the death of a firm. (Seibel’s contribution to the volume raises this possibility, which he refers to as “successful failures”.) This is a familiar observation from political science: what looks dysfunctional from the outside may be perfectly well tuned to a different set of interests (for example, in Robert Bates’s account of pricing boards in Africa in Markets and States in Tropical Africa: The Political Basis of Agricultural Policies). In their introduction to this volume Anheier and Moulton refer to this possibility as a direction for future research: “successful for whom, a failure for whom?” (14).

The volume tends to look at success and failure in terms of profitability and the satisfaction of stakeholders. But we can define dysfunction in a more granular way by linking characteristics of performance to the perceived “purposes and goals” of the organization. A regulatory agency exists in order to effectively protect the health and safety of the public. In this kind of case, failure is any outcome in which the agency flagrantly and avoidably fails to prevent a serious harm — release of radioactive material, contamination of food, a building fire resulting from defects that should have been detected by inspection. If the agency does not perform this function as well as it might, then it is dysfunctional.

Why do dysfunctions persist in organizations? It is possible to identify several possible causes. The first is that a dysfunction from one point of view may well be a desirable feature from another point of view. The lack of an authoritative safety officer in a chemical plant may be thought to be dysfunctional if we are thinking about the safety of workers and the public as a primary goal of the plant (link). But if profitability and cost-savings are the primary goals from the point of view of the stakeholders, then the cost-benefit analysis may favor the lack of the safety officer.

Second, there may be internal failures within an organization that are beyond the reach of any executive or manager who might want to correct them. The complexity and loose-coupling of large organizations militate against house cleaning on a large scale.

Third, there may be powerful factions within an organization for whom the “dysfunctional” feature is an important component of their own set of purposes and goals. Fligstein and McAdam argue for this kind of disaggregation with their theory of strategic action fields (link). By disaggregating purposes and goals to the various actors who figure in the life cycle of the organization – founders, stakeholders, executives, managers, experts, frontline workers, labor organizers – it is possible to see the organization as a whole as simply the aggregation of the multiple actions and purposes of the actors within and adjacent to the organization. This aggregation does not imply that the organization is carefully adjusted to serve the public good or to maximize efficiency or to protect the health and safety of the public. Rather, it suggests that the resultant organizational structure serves the interests of the various actors to the fullest extent each actor is able to manage.

Consider the account offered by Thomas Misa of the decline of the steel industry in the United States in the first part of the twentieth century in A Nation of Steel: The Making of Modern America, 1865-1925. Misa’s account seems to point to a massive dysfunction in the steel corporations of the inter-war period: a deliberate and sustained failure to invest in research on new steel technologies in metallurgy and production. Misa argues that the great steel corporations — US Steel in particular — failed to remain competitive in their industry in the early years of the twentieth century because management persistently pursued short-term profits and financial advantage through domination of the market, relying on that dominance rather than on research and development as the source of revenue and profits.

In short, U.S. Steel was big but not illegal. Its price leadership resulted from its complete dominance in the core markets for steel…. Indeed, many steelmakers had grown comfortable with U.S. Steel’s overriding policy of price and technical stability, which permitted them to create or develop markets where the combine chose not to compete, and they testified to the court in favor of the combine. The real price of stability … was the stifling of technological innovation. (255)

The result was that the modernized steel industries in Europe leap-frogged the previous US advantage, eventually leaving the United States with unviable production technology.

At the periphery of the newest and most promising alloy steels, dismissive of continuous-sheet rolling, actively hostile to new structural shapes, a price leader but not a technical leader: this was U.S. Steel. What was the company doing with technological innovation? (257)

Misa is interested in arriving at a better way of understanding the imperatives leading to technical change — better than neoclassical economics and labor history. His solution highlights the changing relationships that developed between industrial consumers and producers in the steel industry.

We now possess a series of powerful insights into the dynamics of technology and social change. Together, these insights offer the realistic promise of being better able, if we choose, to modulate the complex process of technical change. We can now locate the range of sites for technical decision making, including private companies, trade organizations, engineering societies, and government agencies. We can suggest a typology of user-producer interactions, including centralized, multicentered, decentralized, and direct-consumer interactions, that will enable certain kinds of actions while constraining others. We can even suggest a range of activities that are likely to effect technical change, including standards setting, building and zoning codes, and government procurement. Furthermore, we can also suggest a range of strategies by which citizens supposedly on the “outside” may be able to influence decisions supposedly made on the “inside” about technical change, including credibility pressure, forced technology choice, and regulatory issues. (277-278)

In fact Misa places the dynamic of relationship between producer and large consumer at the center of the imperatives towards technological innovation:

In retrospect, what was wrong with U.S. Steel was not its size or even its market power but its policy of isolating itself from the new demands from users that might have spurred technical change. The resulting technological torpidity that doomed the industry was not primarily a matter of industrial concentration, outrageous behavior on the part of white- and blue-collar employees, or even dysfunctional relations among management, labor, and government. What went wrong was the industry’s relations with its consumers. (278)

This relative “callous treatment of consumers” was profoundly harmful when international competition gave large industrial users of steel a choice. When US Steel had market dominance, large industrial users had little choice; but this situation changed after WWII. “This favorable balance of trade eroded during the 1950s as German and Japanese steelmakers rebuilt their bombed-out plants with a new production technology, the basic oxygen furnace (BOF), which American steelmakers had dismissed as unproven and unworkable” (279). Misa quotes a president of a small steel producer: “The Big Steel companies tend to resist new technologies as long as they can … They only accept a new technology when they need it to survive” (280).

*****

Here is an interesting table from Misa’s book that sheds light on some of the economic and political history in the United States since the post-war period, leading right up to the populist politics of 2016 in the Midwest. This chart provides mute testimony to the decline of the rustbelt industrial cities. Michigan, Illinois, Ohio, Pennsylvania, and western New York account for 83% of the steel production on this table. When American producers lost the competitive battle for steel production in the 1980s, the Rustbelt suffered disproportionately, and eventually blue collar workers lost their places in the affluent economy.

Is corruption a social thing?

When we discuss the ontology of various aspects of the social world, we are often thinking of such things as institutions, organizations, social networks, value systems, and the like. These examples pick out features of the world that are relatively stable and functional. Where does an imperfection or dysfunction of social life like corruption fit into our social ontology?

We might say that “corruption” is a descriptive category that is aimed at capturing a particular range of behavior, like stealing, gossiping, or asceticism. This makes corruption a kind of individual behavior, or even a characteristic of some individuals. “Mayor X is corrupt.”

This initial effort does not seem satisfactory, however. The idea of corruption is tied to institutions, roles, and rules in a very direct way, and therefore we cannot really present the concept accurately without articulating these institutional features. Corruption might be paraphrased in these terms:

  • Individual X plays a role Y in institution Z; role Y prescribes honest and impersonal performance of duties; individual X accepts private benefits to take actions that are contrary to the prescriptions of Y. In virtue of these facts X behaves corruptly.

Corruption, then, involves actions taken by officials that deviate from the rules governing their role, in order to receive private benefits from the subjects of those actions. Absent the rules and role, corruption cannot exist. So corruption is a feature that presupposes certain social facts about institutions. (Perhaps there is a link to Searle’s social ontology here; link.)

We might consider that corruption is analogous to friction in physical systems. Friction affects the performance of virtually all mechanical systems, but it is a second-order factor within classical mechanics. And it is possible to give mechanical explanations of the ubiquity of friction, in terms of the geometry of adjoining physical surfaces, the strength of inter-molecular attractions, and the like. Analogously, we can offer theories of the frequency with which corruption occurs in organizations, public and private, in terms of the interests and decision-making frameworks of variously situated actors (e.g. real estate developers, land value assessors, tax assessors, zoning authorities …). Developers have a business interest in favorable rulings from assessors and zoning authorities; some officials have an interest in accepting gifts and favors to increase personal income and wealth; each makes an estimate of the likelihood of detection and punishment; and a certain rate of corrupt exchanges is the result.
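As a minimal sketch of that calculation, with invented numbers and abstracting from everything sociologically interesting about the actors, the “certain rate of corrupt exchanges” falls out of a simple comparison of expected benefit and expected cost:

```python
# A minimal sketch of the expected-utility reasoning described above, with
# invented parameter values. An official accepts a bribe when the expected
# gain exceeds the expected cost of detection and punishment.
def accepts_bribe(bribe, p_detect, penalty):
    """Return True if the expected benefit of the corrupt exchange exceeds
    the expected cost for this (highly stylized) official."""
    return bribe > p_detect * penalty

# The aggregate rate of corrupt exchanges then depends on system-level
# features (audit intensity, penalties), not only on individual character.
officials = [
    {"bribe": 5_000, "p_detect": 0.02, "penalty": 50_000},   # weak oversight
    {"bribe": 5_000, "p_detect": 0.30, "penalty": 50_000},   # strong oversight
    {"bribe": 1_000, "p_detect": 0.02, "penalty": 200_000},  # heavy penalties
]
rate = sum(accepts_bribe(**o) for o in officials) / len(officials)
print(f"corrupt-exchange rate in this toy population: {rate:.2f}")
```

The toy model already makes the next point in the argument visible: raising the probability of detection or the penalty changes the rate of corrupt exchanges without changing anything about the individuals involved.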

This line of thought once again makes corruption a feature of the actors and their calculations. But it is important to note that organizations themselves have features that make corrupt exchanges either more likely or less likely (link, link). Some organizations are corruption-resistant in ways in which others are corruption-neutral or corruption-enhancing. These features include internal accounting and auditing procedures; whistle-blowing practices; executive and supervisor vigilance; and other organizational features. Further, governments and systems of law can make arrangements that discourage corruption; the incidence of corruption is influenced by public policy. For example, legal requirements on transparency in financial practices by firms, investment in investigatory resources in oversight agencies, and weighty penalties for companies found guilty of corrupt practices can all affect the incidence of corruption. (Robert Klitgaard’s treatment of corruption is relevant here; he provides careful analysis of some of the institutional and governmental measures that can be taken to discourage corrupt practices; link, link. And there are cross-country indices of corruption (e.g. Transparency International) that suggest the effectiveness of anti-corruption measures at the state level. Finland, Norway, and Switzerland rank well on the Transparency International index.)

So — is corruption a thing? Does corruption need to be included in a social ontology? Does a realist ontology of government and business organization have a place for corruption? Yes, yes, and yes. Corruption is a real property of individual actors’ behavior, observable in social life. It is a consequence of strategic rationality by various actors. Corruption is a social practice with its own supporting or inhibiting culture. Some organizations effectively espouse a core set of values of honesty and correct performance that make corruption less frequent. And corruption is a feature of the design of an organization or bureau, analogous to “mean-time-between-failure” as a feature of a mechanical design. Organizations can adopt institutional protections and cultural commitments that minimize corrupt behavior, while other organizations fail to do so and thereby encourage corrupt behavior. So “corruption-vulnerability” is a real feature of organizations and corruption has a social reality.

System effects

Quite a few posts here have focused on the question of emergence in social ontology, the idea that there are causal processes and powers at work at the level of social entities that do not correspond to similar properties at the individual level. Here I want to raise a related question, the notion that an important aspect of the workings of the social world derives from “system effects” of the organizations and institutions through which social life transpires. A system accident or effect is one that derives importantly from the organization and configuration of the system itself, rather than the specific properties of the units.

What are some examples of system effects? Consider these phenomena:

  • Flash crashes in stock markets as a result of automated trading
  • Under-reporting of land values in agrarian fiscal regimes 
  • Grade inflation in elite universities 
  • Increase in product defect frequency following a reduction in inspections 
  • Rising frequency of industrial errors at the end of work shifts 

Here is how Nancy Leveson describes systems causation in Engineering a Safer World: Systems Thinking Applied to Safety:

Safety approaches based on systems theory consider accidents as arising from the interactions among system components and usually do not specify single causal variables or factors. Whereas industrial (occupational) safety models and event chain models focus on unsafe acts or conditions, classic system safety models instead look at what went wrong with the system’s operation or organization to allow the accident to take place. (KL 977)

Charles Perrow offers a taxonomy of systems as a hierarchy of composition in Normal Accidents: Living with High-Risk Technologies:

Consider a nuclear plant as the system. A part will be the first level — say a valve. This is the smallest component of the system that is likely to be identified in analyzing an accident. A functionally related collection of parts, as, for example, those that make up the steam generator, will be called a unit, the second level. An array of units, such as the steam generator and the water return system that includes the condensate polishers and associated motors, pumps, and piping, will make up a subsystem, in this case the secondary cooling system. This is the third level. A nuclear plant has around two dozen subsystems under this rough scheme. They all come together in the fourth level, the nuclear plant or system. Beyond this is the environment. (65)

Large socioeconomic systems like capitalism and collectivized socialism have system effects — chronic patterns of low productivity and corruption in the latter case, a tendency to inequality and immiseration in the former case. In each case the observed effect is the result of embedded features of property and labor in the two systems that result in specific kinds of outcomes. And an important dimension of social analysis is to uncover the ways in which ordinary actors, pursuing ordinary goals within the context of the two systems, produce quite different outcomes at the level of the “mode of production”. And these effects do not depend on there being a distinctive kind of actor in each system; in fact, one could interchange the actors and still find the same macro-level outcomes.

Here is a preliminary effort at a definition for this concept in application to social organizations:

A system effect is an outcome that derives from the embedded characteristics of incentive and opportunity within a social arrangement, which lead normal actors to engage in the activity that produces the aggregate effect in question.

Once we see what the incentive and opportunity structures are, we can readily see why some fraction of actors modify their behavior in ways that lead to the outcome. In this respect the system is the salient causal factor rather than the specific properties of the actors — change the system properties and you will change the social outcome.
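Here is a toy illustration of that claim, under invented assumptions: the same population of actors, facing two different inspection regimes, produces very different aggregate defect rates (compare the “increase in product defect frequency following a reduction in inspections” example above).

```python
import random

# A toy illustration of the definition above: identical actors, different
# system parameters, different aggregate outcomes. All numbers are invented.
def defect_rate(inspection_prob, n_workers=1000, seed=0):
    """Each worker cuts a corner when the expected sanction is low; the
    aggregate defect rate is driven by the system's inspection probability."""
    rng = random.Random(seed)
    penalty, gain = 10.0, 1.0
    defects = 0
    for _ in range(n_workers):
        gain_i = gain * rng.uniform(0.5, 1.5)   # same distribution of actors
        if gain_i > inspection_prob * penalty:  # corner-cutting pays off
            defects += 1
    return defects / n_workers

# Same population of actors (same seed), two different systems:
print("frequent inspections:", defect_rate(inspection_prob=0.12))
print("rare inspections:    ", defect_rate(inspection_prob=0.06))
```

Because the seed fixes the population, the difference in outcomes is attributable entirely to the system parameter, which is the point of calling these effects “system effects” rather than properties of the actors.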

 

When we refer to system effects we often have unintended consequences in mind — unintended both by the individual actors and the architects of the organization or practice. But this is not essential; we can also think of examples of organizational arrangements that were deliberately chosen or designed to bring about the given outcome. In particular, a given system effect may be intended by the designer and unintended by the individual actors. But when the outcomes in question are clearly dysfunctional or “catastrophic”, it is natural to assume that they are unintended. (This, however, is one of the specific areas of insight that comes out of the new institutionalism: the dysfunctional outcome may be favorable for some sets of actors even as it is unfavorable for the workings of the system as a whole.)

 
Another common assumption about system effects is that they are remarkably stable through changes of actors and efforts to reverse the given outcome. In this sense they are thought to be somewhat beyond the control of the individuals who make up the system. The only promising way of undoing the effect is to change the incentives and opportunities that bring it about. But to the extent that a given configuration has emerged along with supporting mechanisms protecting it from deformation, changing the configuration may be frustratingly difficult.

Safety and its converse are often described as system effects. By this, two things are usually meant. First, there is the important insight that traditional accident analysis favors “unit failure” explanations at the expense of more systemic factors. And second, there is the idea that accidents and failures often result from “tightly linked” features of systems, both social and technical, in which variation in one component of a system can have unexpected consequences for the operation of other components of the system. Charles Perrow discusses loose and tight coupling in social systems in Normal Accidents (89 ff.).
