Organizations do things — process tax returns, deploy armies, send spacecraft to Mars. And in order to do these various things, organizations have people with job descriptions; organization charts; internal rules and procedures; information flows and pathways; leaders, supervisors, and frontline staff; training and professional development programs; and other particular features that together make up the organization's machinery of decision-making and action. These individuals and sub-units take on tasks, communicate with each other, and give rise to action steps.
And often enough organizations make mistakes — sometimes small mistakes (a tax return is sent to the wrong person, a hospital patient is administered two aspirins rather than one) and sometimes large mistakes (the space shuttle Challenger is cleared for launch on January 28, 1986, a Union Carbide plant accidentally releases toxic gases over a large population in Bhopal, FEMA bungles its response to Hurricane Katrina). What can we say about the causes of organizational mistakes? And how can organizations and their processes be improved so mistakes are less common and less harmful?
Charles Perrow has devoted much of his career to studying these questions. Two books in particular have shed a great deal of light on the organizational causes of industrial and technological accidents, Normal Accidents: Living with High-Risk Technologies and The Next Catastrophe: Reducing Our Vulnerabilities to Natural, Industrial, and Terrorist Disasters. (Perrow’s work has been discussed in several earlier posts; link, link, link.) The first book emphasizes that errors and accidents are unavoidable; they are the random noise of the workings of a complex organization. So the key challenge is to have processes that detect errors and that are resilient to the ones that make it through. One of Perrow’s central findings in The Next Catastrophe is the importance of achieving a higher level of system resilience by decentralizing risk and potential damage. Don’t route tanker cars of chlorine through dense urban populations; don’t place nuclear power plants adjacent to cities; don’t create an Internet or a power grid with a very small number of critical nodes. Kathleen Tierney’s The Social Roots of Risk: Producing Disasters, Promoting Resilience (High Reliability and Crisis Management) emphasizes the need for system resilience as well (link).
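Perrow's point about decentralizing risk can be given a toy quantitative illustration (my own sketch, not drawn from his books; the numbers are assumed for the example): splitting a hazard across many small, independent units leaves the expected loss the same, but it makes a catastrophic, system-wide loss far less probable than concentrating everything in one critical node.

```python
from math import comb

def catastrophic_loss_prob(k: int, p: float, fraction: float = 0.5) -> float:
    """Probability of losing more than `fraction` of total capacity when
    it is split across k independent units, each failing with probability p."""
    threshold = int(fraction * k)  # failures needed to exceed the fraction
    return sum(comb(k, j) * p**j * (1 - p)**(k - j)
               for j in range(threshold + 1, k + 1))

p = 0.05  # assumed per-unit failure probability (illustrative only)
centralized = catastrophic_loss_prob(1, p)     # one critical node
decentralized = catastrophic_loss_prob(10, p)  # same capacity in ten units
```

On these assumptions the centralized design loses more than half its capacity with probability 0.05, while the decentralized design does so only if six or more of its ten units fail at once — a probability on the order of one in a hundred thousand. The point is Perrow's: independence and dispersion buy resilience against catastrophe, not against routine error.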
Is it possible to arrive at a more granular understanding of organizational errors and their sources? A good place to begin is with the theory of organizations as “strategic action fields” in the sense advocated by Fligstein and McAdam in A Theory of Fields. This approach imposes an important discipline on us — it discourages the mental mistake of reification when we think about organizations. Organizations are not unitary decision and action bodies; instead, they are networks of people linked in a variety of forms of dependency and cooperation. Various sub-entities consider tasks, gather information, and arrive at decisions for action, and each of these steps is vulnerable to errors and shortfalls. The activities of individuals and sub-groups are stimulated and conveyed through these networks of association; and, like any network of control or communication, there is always the possibility of a broken link or a faulty action step somewhere within this extended set of relationships.
Errors can derive from individual mistakes; they can derive from miscommunication across individuals and sub-units within the organization; they can derive from more intentional sources, including self-interested or corrupt behavior on the part of internal participants. And they can derive from conflicts of interest between units within an organization (the manufacturing unit has an interest in maximizing throughput, the quality control unit has an interest in minimizing faulty products).
Errors are likely in every part of an organization’s life. Errors occur in the data-gathering and analysis functions of an organization. A sloppy market study is incorporated into a planning process, leading to a substantial over-estimate of demand for a product; a survey of suppliers makes use of ambiguous questions that lead to misinterpretation of the results; a vice president under-estimates the risk posed by a competitor’s advertising campaign. For an organization to pursue its mission effectively, it needs to have accurate information about the external circumstances that are most relevant to its goals. But “relevance” is a judgment issue; and it is possible for an organization to devote its intelligence-gathering resources to the collection of data that are only tangentially helpful for the task of designing actions to carry out the mission of the institution.
Errors occur in implementation as well. The action initiatives that emerge from an organization’s processes — from committees, from CEOs, from intermediate-level leaders, from informal groups of staff — are also vulnerable to errors of implementation. The facilities team formulates a plan for resurfacing a group of parking lots; this plan depends upon closing these lots several days in advance; but the safety department delays in implementing the closure, and the lots have hundreds of cars in them when the resurfacing equipment arrives. An error of implementation.
One way of describing these kinds of errors is to recognize that organizations are “loosely connected” when it comes to internal processes of information gathering, decision making, and action. The CFO stipulates that the internal audit function should be based on best practices nationally; the chief of internal audit interprets this as an expectation that processes should be designed based on the example of top-tier companies in the same industry; and the subordinate operationalizes this expectation by doing a survey of business-school case studies of internal audit functions at 10 companies. But the data collection that occurs now has only a loose relationship to the higher-level expectation formulated by the CFO. Similar disconnects — or loose connections — occur on the side of implementation of action steps as well. Presumably top FEMA officials did not intend that FEMA’s actions in response to Hurricane Katrina would be as ineffective and sporadic as they turned out to be.
Organizations also have a tendency towards acting on the basis of collective habits and traditions of behavior. It is easier for a university’s admissions department to continue the same programs of recruitment and enrollment year after year than it is to rethink the approach to recruitment in a fundamental way. And yet it may be that the circumstances of the external environment have changed so dramatically that the habitual practices will no longer achieve similar results. A good example is the emergence of social media marketing in admissions; in a very short period of time the 17- and 18-year-old young people whom admissions departments want to influence went from willing recipients of glossy admissions publications in the mail to “Facebook-only” readers. Yesterday’s correct solution to an organizational problem may become tomorrow’s serious error, because the environment has changed.
In a way the problem of organizational errors is analogous to the problem of software bugs in large, complex computer systems. It is recognized by software experts that bugs are inevitable; and some of these coding errors or design errors may have catastrophic consequences in unusual settings. (Nancy Leveson’s Safeware: System Safety and Computers provides an excellent review of these possibilities.) So the task for software engineers and organizational designers and leaders is similar: designing fallible systems that do a pretty good job almost all of the time, and are likely to fail gracefully when errors inevitably occur.
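What “failing gracefully” looks like in software terms might be sketched as follows (a hypothetical example; the function names, the simulated outage, and the cached-fallback strategy are my own illustration, not drawn from Leveson): a component that degrades to a stale-but-safe fallback and records the failure, rather than halting the whole process.

```python
import logging

def fetch_exchange_rate(currency: str) -> float:
    """Hypothetical primary data source; a real one might call an
    external service, which can fail."""
    raise TimeoutError("rate service unavailable")  # simulate an outage

CACHED_RATES = {"EUR": 0.92}  # assumed stale-but-safe fallback data

def get_rate(currency: str) -> float:
    """Fail gracefully: prefer live data, but degrade to a cached value
    and log the problem instead of crashing the surrounding workflow."""
    try:
        return fetch_exchange_rate(currency)
    except (TimeoutError, ConnectionError) as err:
        logging.warning("live rate lookup failed (%s); using cached rate", err)
        return CACHED_RATES[currency]
```

The analogy to organizations is direct: the fallback path is the analogue of the redundant procedures, cross-checks, and decentralized capacities that let an organization absorb the errors it cannot prevent.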