Herbert Simon’s theories of organizations

Image: detail from Family Portrait 2, 1965 (Creative Commons license, Richard Rappaport)
 

Herbert Simon made paradigm-changing contributions to the theory of rational behavior, including particularly his treatment of “satisficing” as an alternative to “maximizing” economic rationality (link). It is therefore worthwhile examining his views of organizations and organizational decision-making and action — especially given how relevant those theories are to my current research interest in organizational dysfunction. His highly successful book Administrative Behavior went through four editions between 1947 and 1997 — more than fifty years of thinking about organizations and organizational behavior. The more recent editions consist of the original text and “commentary” chapters that Simon wrote to incorporate more recent thinking about the content of each of the chapters.

Here I will pull out some of the highlights of Simon’s approach to organizations. There are many features of his analysis of organizational behavior that are worth noting. But my summary assessment is that the book is surprisingly positive about the rationality of organizations and the processes through which they collect information and reach decisions. In the contemporary environment where we have all too many examples of organizational failure in decision-making — from Boeing to Purdue Pharma to the Federal Emergency Management Agency — this confidence seems to be fundamentally misplaced. The theorist who invented the idea of imperfect rationality and satisficing at the individual level perhaps should have offered a somewhat more critical analysis of organizational thinking.

The first thing that the reader will observe is that Simon thinks about organizations as systems of decision-making and execution. His working definition of organization highlights this view:

In this book, the term organization refers to the pattern of communications and relations among a group of human beings, including the processes for making and implementing decisions. This pattern provides to organization members much of the information and many of the assumptions, goals, and attitudes that enter into their decisions, and provides also a set of stable and comprehensible expectations as to what the other members of the group are doing and how they will react to what one says and does. (18-19).

What is a scientifically relevant description of an organization? It is a description that, so far as possible, designates for each person in the organization what decisions that person makes, and the influences to which he is subject in making each of these decisions. (43)

The central theme around which the analysis has been developed is that organization behavior is a complex network of decisional processes, all pointed toward their influence upon the behaviors of the operatives — those who do the actual ‘physical’ work of the organization. (305)

The task of decision-making breaks down into the assimilation of relevant facts and values — a distinction that Simon attributes to logical positivism in the original text but makes more general in the commentary. Answering the question, “what should we do?”, requires a clear answer to two kinds of questions: what values are we attempting to achieve? And how does the world work such that interventions will bring about those values?

It is refreshing to see Simon’s skepticism about the “rules of administration” that various generations of organizational theorists have advanced — “specialization,” “unity of command,” “span of control,” and so forth. Simon describes these as proverbs rather than as useful empirical discoveries about effective administration. And he finds the idea of “schools of management theory” to be entirely unhelpful (26). Likewise, he is entirely skeptical about the value of the economic theory of the firm, which abstracts from all of the arrangements among participants that are crucial to the internal processes of the organization in Simon’s view. He recommends an approach to the study of organizations (and the design of organizations) that focuses on the specific arrangements needed to bring factual and value claims into a process of deliberation leading to decision — incorporating the kinds of specialization and control that make sense for a particular set of business and organizational tasks.

An organization has only two fundamental tasks: decision-making and “making things happen”. The decision-making process involves intelligently gathering facts and values and designing a plan. Simon generally approaches this process as a reasonably rational one. He identifies three kinds of limits on rational decision-making:

  • The individual is limited by those skills, habits, and reflexes which are no longer in the realm of the conscious…
  • The individual is limited by his values and those conceptions of purpose which influence him in making his decision…
  • The individual is limited by the extent of his knowledge of things relevant to his job. (46)

And he explicitly regards these points as being part of a theory of administrative rationality:

Perhaps this triangle of limits does not completely bound the area of rationality, and other sides need to be added to the figure. In any case, the enumeration will serve to indicate the kinds of considerations that must go into the construction of valid and noncontradictory principles of administration. (47)

The “making it happen” part is more complicated. This has to do with the problem the executive faces of bringing about the efficient, effective, and loyal performance of assigned tasks by operatives. Simon’s theory essentially comes down to training, loyalty, and authority.

If this is a correct description of the administrative process, then the construction of an efficient administrative organization is a problem in social psychology. It is a task of setting up an operative staff and superimposing on that staff a supervisory staff capable of influencing the operative group toward a pattern of coordinated and effective behavior. (2)

To understand how the behavior of the individual becomes a part of the system of behavior of the organization, it is necessary to study the relation between the personal motivation of the individual and the objectives toward which the activity of the organization is oriented. (13-14) 

Simon refers to three kinds of influence that executives and supervisors can have over “operatives”: formal authority (enforced by the power to hire and fire), organizational loyalty (cultivated through specific means within the organization), and training. Simon holds that a crucial role of administrative leadership is the task of motivating the employees of the organization to carry out the plan efficiently and effectively.

Later he refers to five “mechanisms of organization influence” (112): specialization and division of task; the creation of standard practices; transmission of decisions downwards through authority and influence; channels of communication in all directions; and training and indoctrination. Through these mechanisms the executive seeks to ensure a high level of conformance and efficient performance of tasks.

What about the actors within an organization? How do they behave as individual actors? Simon treats them as “boundedly rational”:

To anyone who has observed organizations, it seems obvious enough that human behavior in them is, if not wholly rational, at least in good part intendedly so. Much behavior in organizations is, or seems to be, task-oriented–and often efficacious in attaining its goals. (88)

But this description leaves out altogether the possibility and likelihood of mixed motives, conflicts of interest, and intra-organizational disagreement. When Simon considers the fact of multiple agents within an organization, he acknowledges that this poses a challenge for rationalistic organizational theory:

Complications are introduced into the picture if more than one individual is involved, for in this case the decisions of the other individuals will be included among the conditions which each individual must consider in reaching his decisions. (80)

This acknowledges the essential feature of organizations — the multiplicity of actors — but fails to treat it with the seriousness it demands. He attempts to resolve the issue by invoking cooperation and the language of strategic rationality: “administrative organizations are systems of cooperative behavior. The members of the organization are expected to orient their behavior with respect to certain goals that are taken as ‘organization objectives’” (81). But this simply presupposes the result we might want to occur, without providing a basis for expecting it to take place.

With the hindsight of half a century, I am inclined to think that Simon attributes too much rationality and hierarchical purpose to organizations.

The rational administrator is concerned with the selection of these effective means. For the construction of an administrative theory it is necessary to examine further the notion of rationality and, in particular, to achieve perfect clarity as to what is meant by “the selection of effective means.” (72)  

These sentences, and many others like them, present the task as one of defining the conditions of rationality of an organization or firm; this takes for granted the notion that the relations of communication, planning, and authority can result in a coherent implementation of a plan of action. His model of an organization involves high-level executives who pull together factual information (making use of specialized experts in this task) and integrate the purposes and goals of the organization (profits, maintaining the health and safety of the public, reducing poverty) into an actionable set of plans to be implemented by subordinates. He refers to a “hierarchy of decisions,” in which higher-level goals are broken down into intermediate-level goals and tasks, with a coherent relationship between intermediate and higher-level goals. “Behavior is purposive in so far as it is guided by general goals or objectives; it is rational in so far as it selects alternatives which are conducive to the achievement of the previously selected goals” (4). And the suggestion is that a well-designed organization succeeds in establishing this kind of coherence of decision and action.

 

It is true that he also asserts that decisions are “composite” —

It should be perfectly apparent that almost no decision made in an organization is the task of a single individual. Even though the final responsibility for taking a particular action rests with some definite person, we shall always find, in studying the manner in which this decision was reached, that its various components can be traced through the formal and informal channels of communication to many individuals … (305)

But even here he fails to consider the possibility that this compositional process may involve systematic dysfunctions that require study. Rather, he seems to presuppose that this composite process itself proceeds logically and coherently. In commenting on a case study by Oswyn Murray (1923) on the design of a post-WWI battleship, he writes: “The point which is so clearly illustrated here is that the planning procedure permits expertise of every kind to be drawn into the decision without any difficulties being imposed by the lines of authority in the organization” (314). This conclusion is strikingly at odds with most accounts of science-military relations in Britain during World War II — for example, the pernicious interference of Frederick Alexander Lindemann in Patrick Blackett’s struggles to create an operations-research basis for anti-submarine warfare (Blackett’s War: The Men Who Defeated the Nazi U-Boats and Brought Science to the Art of Warfare). His comments about the processes of review that can be implemented within organizations (314 ff.) are similarly excessively optimistic — contrary to the literature on principal-agent problems in many areas of complex collaboration.

This is surprising, given Simon’s contributions to the theory of imperfect rationality in the case of individual decision-making. Against this confidence, the sources of organizational dysfunction that are now apparent in several literatures on organizations make it difficult to imagine that organizations can have a high success rate in rational decision-making. If we were looking for a Simon-like phrase for organizational thinking to parallel the idea of satisficing, we might come up with the notion of “bounded localistic organizational rationality”: “locally rational, frequently influenced by extraneous forces, incomplete information, incomplete communication across divisions, rarely coherent over the whole organization”.

Simon makes the point emphatically in the opening chapters of the book that administrative science is an incremental and evolving field. And in fact, it seems apparent that his own thinking continued to evolve. There are occasional threads of argument in Simon’s work that seem to point towards a more contingent view of organizational behavior and rationality, along the lines of Fligstein and McAdam’s theories of strategic action fields. For example, when discussing organizational loyalty Simon raises the kind of issue that is central to the strategic action field model of organizations: the conflicts of interest that can arise across units (11). And in the commentary on Chapter I he points forward to the theories of strategic action fields and complex adaptive systems:

The concepts of systems, multiple constituencies, power and politics, and organization culture all flow quite naturally from the concept of organizations as complex interactive structures held together by a balance of the inducements provided to various groups of participants and the contributions received from them. (27)

The book has been a foundational contribution to organizational studies. At the same time, if Herbert Simon were at the beginning of his career and were beginning his study of organizational decision-making today, I suspect he might have taken a different tack. He was plainly committed to empirical study of existing organizations and the mechanisms through which they worked. And he was receptive to the ideas surrounding the notion of imperfect rationality. The current literature on the sources of contention and dysfunction within organizations (Perrow, Fligstein, McAdam, Crozier, …) might well have led him to write a different book altogether, one that gave more attention to the sources of failures of rational decision-making and implementation alongside the occasional examples of organizations that seem to work at a very high level of rationality and effectiveness.

The 737 MAX disaster as an organizational failure

The topic of the organizational causes of technology failure comes up frequently in Understanding Society. The tragic crashes of two Boeing 737 MAX aircraft in the past year present an important case to study. Is this an instance of pilot error (as has occasionally been suggested)? Is it a case of engineering and design failures? Or are there important corporate and regulatory failures that created the environment in which the accidents occurred, as the public record seems to suggest?

The formal accident investigations are not yet complete, and the FAA and other air safety agencies around the world have not yet approved the aircraft for a return to flight after the suspension of certification that followed the second crash. There will certainly be a detailed and expert case study of these events at some point in the future, and I will be eager to read the resulting book. In the meantime, though, it is useful to bring the perspectives of Charles Perrow, Diane Vaughan, and Andrew Hopkins to bear on what we can learn about this case from the public media sources that are available. The preliminary sketch of a case study offered below is a first effort and is intended simply to help us learn more about the social and organizational processes that govern the complex technologies upon which we depend. Many of the dysfunctions identified in the safety literature appear to have had a role in this disaster.

I have made every effort to offer an accurate summary based on publicly available sources, but readers should bear in mind that it is a preliminary effort.

The key conclusions I’ve been led to include these:

The aircraft’s updated flight control system (MCAS) created the conditions for crashes under rare combinations of flight conditions and instrument failures.

  • Faults in the AOA sensor and the MCAS flight control system persisted through the design process
  • Pilot training and information about changes in the flight control system were likely inadequate to permit pilots to override the control system when necessary

There were fairly clear signs of organizational dysfunction in the development and design process for the aircraft:

  • Disempowered mid-level experts (engineers, designers, software experts)
  • Inadequate organizational embodiment of safety oversight
  • Business priorities placing cost savings, timeliness, profits over safety
  • Executives with divided incentives
  • Breakdown of internal management controls leading to faulty manufacturing processes 

Cost-containment and speed trumped safety. It is hard to avoid the conclusion that the corporation put cost-cutting and speed ahead of the professional advice and judgment of the engineers. Management pushed the design and certification process aggressively, leading to implementation of a control system that could fail in foreseeable flight conditions.

The regulatory system seems to have been at fault as well, with the FAA taking a deferential attitude towards the company’s assertions of expertise throughout the certification process. The regulatory process was “outsourced” to a company that already has inordinate political clout in Congress and the agencies.

  • Inadequate government regulation
  • FAA lacking the direct expertise and oversight needed to detect design failures
  • Too much influence by the company over regulators and legislators

Here is a video presentation of the case as I currently understand it (link). 

 
See also this earlier discussion of regulatory failure in the 737 MAX case (link). The work of experts on organizational failure such as Charles Perrow, Diane Vaughan, and Andrew Hopkins is especially relevant to the current case.

Organizations and dysfunction

A recurring theme in recent months in Understanding Society is organizational dysfunction and the organizational causes of technology failure. Helmut Anheier’s volume When Things Go Wrong: Organizational Failures and Breakdowns is highly relevant to this topic, and it makes for very interesting reading. The volume includes contributions by a number of leading scholars in the sociology of organizations.

And yet the volume seems to miss the mark in some important ways. For one thing, it is unduly focused on the question of “mortality” of firms and other organizations. Bankruptcy and organizational death are frequent synonyms for “failure” here. This frame is evident in the summary the introduction offers of existing approaches in the field: organizational aspects, political aspects, cognitive aspects, and structural aspects. All bring us back to the causes of extinction and bankruptcy in a business organization. Further, the approach highlights the importance of internal conflict within an organization as a source of eventual failure. But it gives no insight into the internal structure and workings of the organization itself, the ways in which behavior and internal structure function to systematically produce certain kinds of outcomes that we can identify as dysfunctional.

Significantly, however, dysfunction does not routinely lead to the death of a firm. (Seibel’s contribution to the volume raises this possibility, which he refers to as “successful failures”.) This is a familiar observation from political science: what looks dysfunctional from the outside may be perfectly well tuned to a different set of interests (for example, in Robert Bates’s account of pricing boards in Africa in Markets and States in Tropical Africa: The Political Basis of Agricultural Policies). In their introduction to this volume Anheier and Moulton refer to this possibility as a direction for future research: “successful for whom, a failure for whom?” (14).

The volume tends to look at success and failure in terms of profitability and the satisfaction of stakeholders. But we can define dysfunction in a more granular way by linking characteristics of performance to the perceived “purposes and goals” of the organization. A regulatory agency exists in order to protect the health and safety of the public effectively. In this kind of case, failure is any outcome in which the agency flagrantly and avoidably fails to prevent a serious harm — release of radioactive material, contamination of food, a building fire resulting from defects that should have been detected by inspection. If the agency prevents such harms less well than it might, then it is dysfunctional.

Why do dysfunctions persist in organizations? It is possible to identify several possible causes. The first is that a dysfunction from one point of view may well be a desirable feature from another point of view. The lack of an authoritative safety officer in a chemical plant may be thought to be dysfunctional if we are thinking about the safety of workers and the public as a primary goal of the plant (link). But if profitability and cost-savings are the primary goals from the point of view of the stakeholders, then the cost-benefit analysis may favor the lack of the safety officer.

Second, there may be internal failures within an organization that are beyond the reach of any executive or manager who might want to correct them. The complexity and loose-coupling of large organizations militate against house cleaning on a large scale.

Third, there may be powerful factions within an organization for whom the “dysfunctional” feature is an important component of their own set of purposes and goals. Fligstein and McAdam argue for this kind of disaggregation with their theory of strategic action fields (link). By disaggregating purposes and goals to the various actors who figure in the life cycle of the organization – founders, stakeholders, executives, managers, experts, frontline workers, labor organizers – it is possible to see the organization as a whole as simply the aggregation of the multiple actions and purposes of the actors within and adjacent to the organization. This aggregation does not imply that the organization is carefully adjusted to serve the public good or to maximize efficiency or to protect the health and safety of the public. Rather, it suggests that the resultant organizational structure serves the interests of the various actors to the fullest extent each actor is able to manage.

Consider the account offered by Thomas Misa of the decline of the steel industry in the United States in the first part of the twentieth century in A Nation of Steel: The Making of Modern America, 1865-1925. Misa’s account seems to point to a massive dysfunction in the steel corporations of the inter-war period: a deliberate and sustained failure to invest in research on new steel technologies in metallurgy and production. Misa argues that the great steel corporations — US Steel in particular — failed to remain competitive in the early years of the twentieth century because management persistently pursued short-term profits and financial advantage through domination of the market, relying on that dominance rather than research and development as the source of revenue and profits.

In short, U.S. Steel was big but not illegal. Its price leadership resulted from its complete dominance in the core markets for steel…. Indeed, many steelmakers had grown comfortable with U.S. Steel’s overriding policy of price and technical stability, which permitted them to create or develop markets where the combine chose not to compete, and they testified to the court in favor of the combine. The real price of stability … was the stifling of technological innovation. (255)

The result was that the modernized steel industries of Europe leap-frogged the previous US advantage, eventually leaving the United States with unviable production technology.

At the periphery of the newest and most promising alloy steels, dismissive of continuous-sheet rolling, actively hostile to new structural shapes, a price leader but not a technical leader: this was U.S. Steel. What was the company doing with technological innovation? (257)

Misa is interested in arriving at a better way of understanding the imperatives leading to technical change — better than neoclassical economics and labor history. His solution highlights the changing relationships that developed between industrial consumers and producers in the steel industry.

We now possess a series of powerful insights into the dynamics of technology and social change. Together, these insights offer the realistic promise of being better able, if we choose, to modulate the complex process of technical change. We can now locate the range of sites for technical decision making, including private companies, trade organizations, engineering societies, and government agencies. We can suggest a typology of user-producer interactions, including centralized, multicentered, decentralized, and direct-consumer interactions, that will enable certain kinds of actions while constraining others. We can even suggest a range of activities that are likely to effect technical change, including standards setting, building and zoning codes, and government procurement. Furthermore, we can also suggest a range of strategies by which citizens supposedly on the “outside” may be able to influence decisions supposedly made on the “inside” about technical change, including credibility pressure, forced technology choice, and regulatory issues. (277-278)

In fact Misa places the dynamics of the relationship between producers and large consumers at the center of the imperatives toward technological innovation:

In retrospect, what was wrong with U.S. Steel was not its size or even its market power but its policy of isolating itself from the new demands from users that might have spurred technical change. The resulting technological torpidity that doomed the industry was not primarily a matter of industrial concentration, outrageous behavior on the part of white- and blue-collar employees, or even dysfunctional relations among management, labor, and government. What went wrong was the industry’s relations with its consumers. (278)

This relative “callous treatment of consumers” was profoundly harmful when international competition gave large industrial users of steel a choice. When US Steel had market dominance, large industrial users had little choice; but this situation changed after WWII. “This favorable balance of trade eroded during the 1950s as German and Japanese steelmakers rebuilt their bombed-out plants with a new production technology, the basic oxygen furnace (BOF), which American steelmakers had dismissed as unproven and unworkable” (279). Misa quotes a president of a small steel producer: “The Big Steel companies tend to resist new technologies as long as they can … They only accept a new technology when they need it to survive” (280).

*****

Here is an interesting table from Misa’s book that sheds light on some of the economic and political history in the United States since the post-war period, leading right up to the populist politics of 2016 in the Midwest. This chart provides mute testimony to the decline of the rustbelt industrial cities. Michigan, Illinois, Ohio, Pennsylvania, and western New York account for 83% of the steel production on this table. When American producers lost the competitive battle for steel production in the 1980s, the Rustbelt suffered disproportionately, and eventually blue collar workers lost their places in the affluent economy.

Is corruption a social thing?

When we discuss the ontology of various aspects of the social world, we are often thinking of such things as institutions, organizations, social networks, value systems, and the like. These examples pick out features of the world that are relatively stable and functional. Where does an imperfection or dysfunction of social life like corruption fit into our social ontology?

We might say that “corruption” is a descriptive category that is aimed at capturing a particular range of behavior, like stealing, gossiping, or asceticism. This makes corruption a kind of individual behavior, or even a characteristic of some individuals. “Mayor X is corrupt.”

This initial effort does not seem satisfactory, however. The idea of corruption is tied to institutions, roles, and rules in a very direct way, and therefore we cannot really present the concept accurately without articulating these institutional features of the concept of corruption. Corruption might be paraphrased in these terms:

  • Individual X plays a role Y in institution Z; role Y prescribes honest and impersonal performance of duties; individual X accepts private benefits to take actions that are contrary to the prescriptions of Y. In virtue of these facts X behaves corruptly.

Corruption, then, involves actions taken by officials that deviate from the rules governing their role, in order to receive private benefits from the subjects of those actions. Absent the rules and role, corruption cannot exist. So corruption is a feature that presupposes certain social facts about institutions. (Perhaps there is a link to Searle’s social ontology here; link.)

We might consider that corruption is analogous to friction in physical systems. Friction is a factor that affects the performance of virtually all mechanical systems, but that is a second-order factor within classical mechanics. And it is possible to give mechanical explanations of the ubiquity of friction, in terms of the geometry of adjoining physical surfaces, the strength of inter-molecular attractions, and the like. Analogously, we can offer theories of the frequency with which corruption occurs in organizations, public and private, in terms of the interests and decision-making frameworks of variously situated actors (e.g. real estate developers, land value assessors, tax assessors, zoning authorities …). Developers have a business interest in favorable rulings from assessors and zoning authorities; some officials have an interest in accepting gifts and favors to increase personal income and wealth; each makes an estimate of the likelihood of detection and punishment; and a certain rate of corrupt exchanges is the result.
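
To make this line of thought concrete, here is a minimal sketch of the kind of expected-value reasoning just described. It is purely illustrative: the function names and the numbers are hypothetical assumptions, not estimates drawn from any study of actual organizations.

```python
# A toy model of the "friction" analogy: an official weighs the private benefit
# of a corrupt exchange against the expected penalty, and the aggregate rate of
# corrupt exchanges falls out of those incentive parameters.
# All names and values here are hypothetical.

def accepts_bribe(benefit: float, p_detect: float, penalty: float) -> bool:
    """True if the corrupt exchange has positive expected value for the official."""
    return benefit - p_detect * penalty > 0

def corruption_rate(benefits, p_detect: float, penalty: float) -> float:
    """Fraction of potential exchanges (each with its own benefit) that are accepted."""
    accepted = [b for b in benefits if accepts_bribe(b, p_detect, penalty)]
    return len(accepted) / len(benefits)

benefits = [1_000 * k for k in range(1, 21)]  # hypothetical payoffs, small to large

print(corruption_rate(benefits, p_detect=0.10, penalty=50_000))   # weak oversight -> 0.75
print(corruption_rate(benefits, p_detect=0.30, penalty=100_000))  # strong oversight -> 0.0
```

The point of the sketch is only that the rate of corrupt exchanges is a function of detection probabilities and penalties, which is why the organizational and policy-level measures discussed below can change the incidence of corruption.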

This line of thought once again makes corruption a feature of the actors and their calculations. But it is important to note that organizations themselves have features that make corrupt exchanges either more likely or less likely (link, link). Some organizations are corruption-resistant while others are corruption-neutral or corruption-enhancing. These features include internal accounting and auditing procedures; whistle-blowing practices; executive and supervisor vigilance; and other organizational features. Further, governments and systems of law can make arrangements that discourage corruption; the incidence of corruption is influenced by public policy. For example, legal requirements on transparency in financial practices by firms, investment in investigatory resources in oversight agencies, and weighty penalties to companies found guilty of corrupt practices can all affect the incidence of corruption. (Robert Klitgaard’s treatment of corruption is relevant here; he provides careful analysis of some of the institutional and governmental measures that can be taken to discourage corrupt practices; link, link. And there are cross-country indices of corruption (e.g. Transparency International) that demonstrate the causal effectiveness of anti-corruption measures at the state level. Finland, Norway, and Switzerland rank well on the Transparency International index.)

So — is corruption a thing? Does corruption need to be included in a social ontology? Does a realist ontology of government and business organization have a place for corruption? Yes, yes, and yes. Corruption is a real property of individual actors’ behavior, observable in social life. It is a consequence of strategic rationality by various actors. Corruption is a social practice with its own supporting or inhibiting culture. Some organizations effectively espouse a core set of values of honesty and correct performance that make corruption less frequent. And corruption is a feature of the design of an organization or bureau, analogous to “mean-time-between-failure” as a feature of a mechanical design. Organizations can adopt institutional protections and cultural commitments that minimize corrupt behavior, while other organizations fail to do so and thereby encourage corrupt behavior. So “corruption-vulnerability” is a real feature of organizations and corruption has a social reality.

System effects

Quite a few posts here have focused on the question of emergence in social ontology, the idea that there are causal processes and powers at work at the level of social entities that do not correspond to similar properties at the individual level. Here I want to raise a related question, the notion that an important aspect of the workings of the social world derives from “system effects” of the organizations and institutions through which social life transpires. A system accident or effect is one that derives importantly from the organization and configuration of the system itself, rather than the specific properties of the units.

What are some examples of system effects? Consider these phenomena:

  • Flash crashes in stock markets as a result of automated trading
  • Under-reporting of land values in agrarian fiscal regimes 
  • Grade inflation in elite universities 
  • Increase in product defect frequency following a reduction in inspections 
  • Rising frequency of industrial errors at the end of work shifts 

Here is how Nancy Leveson describes systems causation in Engineering a Safer World: Systems Thinking Applied to Safety:

Safety approaches based on systems theory consider accidents as arising from the interactions among system components and usually do not specify single causal variables or factors. Whereas industrial (occupational) safety models and event chain models focus on unsafe acts or conditions, classic system safety models instead look at what went wrong with the system’s operation or organization to allow the accident to take place. (KL 977)

Charles Perrow offers a taxonomy of systems as a hierarchy of composition in Normal Accidents: Living with High-Risk Technologies:

Consider a nuclear plant as the system. A part will be the first level — say a valve. This is the smallest component of the system that is likely to be identified in analyzing an accident. A functionally related collection of parts, as, for example, those that make up the steam generator, will be called a unit, the second level. An array of units, such as the steam generator and the water return system that includes the condensate polishers and associated motors, pumps, and piping, will make up a subsystem, in this case the secondary cooling system. This is the third level. A nuclear plant has around two dozen subsystems under this rough scheme. They all come together in the fourth level, the nuclear plant or system. Beyond this is the environment. (65)

Large socioeconomic systems like capitalism and collectivized socialism have system effects — chronic patterns of low productivity and corruption in the latter case, a tendency to inequality and immiseration in the former case. In each case the observed effect derives from embedded features of property and labor in the two systems that produce specific kinds of outcomes. And an important dimension of social analysis is to uncover the ways in which ordinary actors pursuing ordinary goals within the context of the two systems lead to quite different outcomes at the level of the “mode of production”. And these effects do not depend on there being a distinctive kind of actor in each system; in fact, one could interchange the actors and still find the same macro-level outcomes.

Here is a preliminary effort at a definition for this concept in application to social organizations:

A system effect is an outcome that derives from the embedded characteristics of incentive and opportunity within a social arrangement, characteristics that lead normal actors to behave in ways that produce the aggregate effect in question.

Once we see what the incentive and opportunity structures are, we can readily see why some fraction of actors modify their behavior in ways that lead to the outcome. In this respect the system is the salient causal factor rather than the specific properties of the actors — change the system properties and you will change the social outcome.
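
As a purely hypothetical illustration of this definition (all actor propensities, payoffs, and probabilities below are invented), consider a simple simulation in which the same kinds of ordinary actors produce different aggregate outcomes under different incentive arrangements, and in which interchanging the actors leaves each arrangement's outcome essentially unchanged.

```python
import random

# Toy model of a system effect: each actor has a fixed propensity to cut corners,
# but whether corner-cutting pays off depends on the arrangement's incentives
# (reward for speed, penalty if caught, probability of being caught).
# All parameter names and values are hypothetical.

def fraction_cutting_corners(actors, reward, penalty, p_caught):
    """Share of actors for whom cutting corners has positive expected value."""
    return sum(1 for p in actors if p * reward - p_caught * penalty > 0) / len(actors)

random.seed(0)
group_a = [random.uniform(0.2, 1.0) for _ in range(10_000)]  # one set of ordinary actors
group_b = [random.uniform(0.2, 1.0) for _ in range(10_000)]  # another, statistically similar

lax    = dict(reward=10, penalty=30, p_caught=0.1)   # weak monitoring, light sanctions
strict = dict(reward=10, penalty=20, p_caught=0.4)   # strong monitoring

# Same kinds of actors, different arrangements -> different aggregate outcomes.
print(fraction_cutting_corners(group_a, **lax))      # roughly 0.87
print(fraction_cutting_corners(group_b, **strict))   # roughly 0.25

# Interchange the actors between arrangements: the outcomes track the arrangement,
# not the particular individuals placed in it.
print(fraction_cutting_corners(group_b, **lax))      # roughly 0.87 again
print(fraction_cutting_corners(group_a, **strict))   # roughly 0.25 again
```

The design choice is deliberate: the actors' propensities are held fixed while the incentive and opportunity parameters vary, so any change in the aggregate outcome is attributable to the arrangement itself rather than to the individuals.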

 

When we refer to system effects we often have unintended consequences in mind — unintended both by the individual actors and by the architects of the organization or practice. But this is not essential; we can also think of examples of organizational arrangements that were deliberately chosen or designed to bring about the given outcome. In particular, a given system effect may be intended by the designer and unintended by the individual actors. But when the outcomes in question are clearly dysfunctional or “catastrophic”, it is natural to assume that they are unintended. (This, however, is one of the specific areas of insight that comes out of the new institutionalism: a dysfunctional outcome may be favorable for some sets of actors even as it is unfavorable for the workings of the system as a whole.)

 
Another common assumption about system effects is that they are remarkably stable through changes of actors and efforts to reverse the given outcome. In this sense they are thought to be somewhat beyond the control of the individuals who make up the system. The only promising way of undoing the effect is to change the incentives and opportunities that bring it about. But to the extent that a given configuration has emerged along with supporting mechanisms protecting it from deformation, changing the configuration may be frustratingly difficult.

Safety and its converse are often described as system effects. Two things are usually meant by this. First, there is the important insight that traditional accident analysis favors “unit failure” at the expense of more systemic factors. And second, there is the idea that accidents and failures often result from “tightly linked” features of systems, both social and technical, in which variation in one component of a system can have unexpected consequences for the operation of other components. Charles Perrow describes loose and tight coupling in social systems in Normal Accidents (89 ff.).

Philosophy and the study of technology failure

Image: Adolf von Menzel, The Iron Rolling Mill (Modern Cyclopes)
 

Readers may have noticed that my current research interests have to do with organizational dysfunction and large-scale technology failures. I am interested in probing the ways in which organizational failures and dysfunctions have contributed to large accidents like Bhopal, Fukushima, and the Deepwater Horizon disaster. I’ve had to confront an important question in taking on this research interest: what can philosophy bring to the topic that would not be better handled by engineers, organizational specialists, or public policy experts?

One answer is the diversity of viewpoint that a philosopher can bring to the discussion. It is evident that technology failures invite analysis from all of these specialized experts, and more. But there is room for productive contribution from reflective observers who are not committed to any of these disciplines. Philosophers have a long history of taking on big topics outside the defined canon of “philosophical problems”, and often those engagements have proven fruitful. In this particular instance, philosophy can look at organizations and technology in a way that is more likely to be interdisciplinary, and perhaps can help to see dimensions of the problem that are less apparent from a purely disciplinary perspective.

There is also a rationale based on the terrain of the philosophy of science. Philosophers of biology have usually attempted to learn as much about the science of biology as they can manage, but they lack the level of expertise of a research biologist, and it is rare for a philosopher to make an original contribution to the scientific biological literature. Nonetheless it is clear that philosophers have a great deal to add to scientific research in biology. They can contribute to better reasoning about the implications of various theories, they can probe the assumptions about confirmation and explanation that are in use, and they can contribute to important conceptual disagreements. Biology is in a better state because of the work of philosophers like David Hull and Elliott Sober.

Philosophers have also made valuable contributions to science and technology studies, bringing a viewpoint that incorporates insights from the philosophy of science and a sensitivity to the social groundedness of technology. STS studies have proven to be a fruitful place for interaction between historians, sociologists, and philosophers. Here again, the concrete study of the causes and context of large technology failure may be assisted by a philosophical perspective.

There is also a normative dimension to these questions about technology failure for which philosophy is well prepared. Accidents hurt people, and sometimes the causes of accidents involve culpable behavior by individuals and corporations. Philosophers have a long history of contribution to these kinds of problems of fault, law, and just management of risks and harms.

Finally, it is realistic to say that philosophy has an ability to contribute to social theory. Philosophers can offer imagination and critical attention to the problem of creating new conceptual schemes for understanding the social world. This capacity seems relevant to the problem of describing, analyzing, and explaining large-scale failures and disasters.

The situation of organizational studies and accidents is in some ways more hospitable for contributions by a philosopher than other “wicked problems” in the world around us. An accident is complicated and complex but not particularly obscure. The field is unlike quantum mechanics or climate dynamics, which are inherently difficult for non-specialists to understand. The challenge with accidents is to identify a multi-layered analysis of the causes of the accident that permits observers to have a balanced and operative understanding of the event. And this is a situation where the philosopher’s perspective is most useful. We can offer higher-level descriptions of the relative importance of different kinds of causal factors. Perhaps the role here is analogous to messenger RNA, providing a cross-disciplinary kind of communications flow. Or it is analogous to the role of philosophers of history who have offered gentle critique of the cliometrics school for its over-dependence on a purely statistical approach to economic history.

So it seems reasonable enough for a philosopher to attempt to contribute to this set of topics, even if the disciplinary expertise a philosopher brings is more weighted towards conceptual and theoretical discussions than undertaking original empirical research in the domain.

What I expect to be the central finding of this research is the idea that a pervasive and often unrecognized cause of accidents is a systemic organizational defect of some sort, and that it is enormously important to have a better understanding of common forms of these deficiencies. This is a bit analogous to a paradigm shift in the study of accidents. And this view has important policy implications. We can make disasters less frequent by improving the organizations through which technology processes are designed and managed.

System safety

An ongoing thread of posts here is concerned with organizational causes of large technology failures. The driving idea is that failures, accidents, and disasters usually have a dimension of organizational causation behind them. The corporation, research office, shop floor, supervisory system, intra-organizational information flow, and other social elements often play a key role in the occurrence of a gas plant fire, a nuclear power plant malfunction, or a military disaster. There is a tendency to look first and foremost for one or more individuals who made a mistake in order to explain the occurrence of an accident or technology failure; but researchers such as Perrow, Vaughan, Tierney, and Hopkins have demonstrated in detail the importance of broadening the lens to seek out the social and organizational background of an accident.

It seems important to distinguish between system flaws and organizational dysfunction in considering all of the kinds of accidents mentioned here. We might specify system safety along these lines. Any complex process has the potential for malfunction. Good system design means creating a flow of events and processes that make accidents inherently less likely. Part of the task of the designer and engineer is to identify chief sources of harm inherent in the process — release of energy, contamination of food or drugs, unplanned fission in a nuclear plant — and design fail-safe processes so that these events are as unlikely as possible. Further, given the complexity of contemporary technology systems it is critical to attempt to anticipate unintended interactions among subsystems — each of which is functioning correctly but that lead to disaster in unusual but possible interaction scenarios.

In a nuclear processing plant, for example, there is the hazard of radioactive materials being brought into proximity with each other in a way that creates unintended critical mass. Jim Mahaffey’s Atomic Accidents: A History of Nuclear Meltdowns and Disasters: From the Ozark Mountains to Fukushima offers numerous examples of such unintended events, from the careless handling of plutonium scrap in a machining process to the transfer of a fissionable liquid from a vessel of one shape to another. We might try to handle these risks as an organizational problem: more and better training for operatives about the importance of handling nuclear materials according to established protocols, and effective supervision and oversight to ensure that the protocols are observed on a regular basis. But it is also possible to design the material processes within a nuclear plant in a way that makes unintended criticality virtually impossible — for example, by storing radioactive solutions in containers that simply cannot be brought into close proximity with each other.

Nancy Leveson is a national expert on defining and applying principles of system safety. Her book Engineering a Safer World: Systems Thinking Applied to Safety is a thorough treatment of her thinking about this subject. She offers a handful of compelling reasons for believing that safety is a system-level characteristic that requires a systems approach: the fast pace of technological change, reduced ability to learn from experience, the changing nature of accidents, new types of hazards, increasing complexity and coupling, decreasing tolerance for single accidents, difficulty in selecting priorities and making tradeoffs, more complex relationships between humans and automation, and changing regulatory and public view of safety (kl 130 ff.). Particularly important in this list is the comment about complexity and coupling: “The operation of some systems is so complex that it defies the understanding of all but a few experts, and sometimes even they have incomplete information about the system’s potential behavior” (kl 137).

Given the fact that safety and accidents are products of whole systems, she is critical of the accident methodology generally applied to serious industrial, aerospace, and chemical accidents. This methodology involves tracing the series of events that led to the outcome, and identifying one or more events as the critical cause of the accident. However, she writes:

In general, event-based models are poor at representing systemic accident factors such as structural deficiencies in the organization, management decision making, and flaws in the safety culture of the company or industry. An accident model should encourage a broad view of accident mechanisms that expands the investigation beyond the proximate events. A narrow focus on technological components and pure engineering activities or a similar narrow focus on operator errors may lead to ignoring some of the most important factors in terms of preventing future accidents. (kl 452)

Here is a definition of system safety offered later in ESW in her discussion of the emergence of the concept within the defense and aerospace fields in the 1960s:

System Safety … is a subdiscipline of system engineering. It was created at the same time and for the same reasons. The defense community tried using the standard safety engineering techniques on their complex new systems, but the limitations became clear when interface and component interaction problems went unnoticed until it was too late, resulting in many losses and near misses. When these early aerospace accidents were investigated, the causes of a large percentage of them were traced to deficiencies in design, operations, and management. Clearly, big changes were needed. System engineering along with its subdiscipline, System Safety, were developed to tackle these problems. (kl 1007)

Here Leveson mixes system design and organizational dysfunctions as system-level causes of accidents. But much of her work in this book and her earlier Safeware: System Safety and Computers gives extensive attention to the design faults and component interactions that lead to accidents — what we might call system safety in the narrow or technical sense.

A systems engineering approach to safety starts with the basic assumption that some properties of systems, in this case safety, can only be treated adequately in the context of the social and technical system as a whole. A basic assumption of systems engineering is that optimization of individual components or subsystems will not in general lead to a system optimum; in fact, improvement of a particular subsystem may actually worsen the overall system performance because of complex, nonlinear interactions among the components. (kl 1007) 

Overall, then, it seems clear that Leveson believes that both organizational features and technical system characteristics are part of the systems that created the possibility for accidents like Bhopal, Fukushima, and Three Mile Island. Her own model for identifying the causes of accidents, STAMP (Systems-Theoretic Accident Model and Processes), emphasizes both kinds of system properties.

Using this new causality model … changes the emphasis in system safety from preventing failures to enforcing behavioral safety constraints. Component failure accidents are still included, but our conception of causality is extended to include component interaction accidents. Safety is reformulated as a control problem rather than a reliability problem. (kl 1062)

In this framework, understanding why an accident occurred requires determining why the control was ineffective. Preventing future accidents requires shifting from a focus on preventing failures to the broader goal of designing and implementing controls that will enforce the necessary constraints. (kl 1084)
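
The contrast between preventing component failures and enforcing a behavioral safety constraint can be made concrete with a toy sketch. This is not Leveson's STAMP formalism, only a schematic example under invented assumptions: a hypothetical tank process, a controller that enforces the constraint “level must stay below a maximum”, and arbitrary numbers.

```python
# Toy illustration (not Leveson's STAMP formalism) of safety as a control problem:
# the controller enforces a constraint on system behavior, whatever component
# failure or disturbance threatens it. All names and numbers are hypothetical.

MAX_SAFE_LEVEL = 100.0  # the safety constraint: tank level must stay below this

class TankProcess:
    def __init__(self) -> None:
        self.level = 0.0
        self.inflow_open = True

    def step(self, inflow_rate: float, outflow_rate: float) -> None:
        if self.inflow_open:
            self.level += inflow_rate
        self.level = max(0.0, self.level - outflow_rate)

class SafetyController:
    """Closes the inflow whenever the constraint is threatened; it does not care
    whether the cause was a blocked drain, a stuck valve, or an operator error."""
    def __init__(self, margin: float = 10.0) -> None:
        self.margin = margin

    def control(self, tank: TankProcess) -> None:
        if tank.level >= MAX_SAFE_LEVEL - self.margin:
            tank.inflow_open = False      # corrective action enforcing the constraint
        elif tank.level < MAX_SAFE_LEVEL / 2:
            tank.inflow_open = True

tank, controller = TankProcess(), SafetyController()
for t in range(200):
    outflow = 5.0 if t < 100 else 0.5     # a degraded drain partway through the run
    tank.step(inflow_rate=4.0, outflow_rate=outflow)
    controller.control(tank)
    assert tank.level < MAX_SAFE_LEVEL, "safety constraint violated"
print("constraint held; final level:", round(tank.level, 1))
```

In this framing, an accident investigation asks why the control that was supposed to enforce the constraint was missing or ineffective, rather than which component failed first.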

Leveson’s brief analysis of the Bhopal disaster in 1984 (kl 384 ff.) emphasizes the organizational dysfunctions that led to the accident — and that were completely ignored by the Indian state’s investigation of the accident: out-of-service gauges, alarm deficiencies, inadequate response to prior safety audits, shortage of oxygen masks, failure to inform the police or surrounding community of the accident, and an environment of cost cutting that impaired maintenance and staffing. “When all the factors, including indirect and systemic ones, are considered, it becomes clear that the maintenance worker was, in fact, only a minor and somewhat irrelevant player in the loss. Instead, degradation in the safety margin occurred over time and without any particular single decision to do so but simply as a series of decisions that moved the plant slowly toward a situation where any slight error would lead to a major accident” (kl 447).

What the boss wants to hear …

According to David Halberstam in his outstanding history of the war in Vietnam, The Best and the Brightest, a prime cause of disastrous decision-making by Presidents Kennedy and Johnson was an institutional imperative in the Defense Department to come up with a set of facts that conformed to what the President wanted to hear. Robert McNamara and McGeorge Bundy were among the highest-level miscreants in Halberstam’s account; they were determined to craft an assessment of the situation on the ground in Vietnam that conformed best with their strategic advice to the President.

Ironically, a very similar dynamic led to one of modern China’s greatest disasters, the Great Leap Forward famine in 1959. The Great Helmsman was certain that collective agriculture would be vastly more productive than private agriculture; and following the collectivization of agriculture, party officials in many provinces obliged this assumption by reporting inflated grain statistics throughout 1958 and 1959. The result was a famine that led to at least twenty million excess deaths during a two-year period as the central state shifted resources away from agriculture (Frank Dikötter, Mao’s Great Famine: The History of China’s Most Devastating Catastrophe, 1958-62).

More mundane examples are available as well. When information about possible sexual harassment in a given department is suppressed because “it won’t look good for the organization” and “the boss will be unhappy”, the organization is on a collision course with serious problems. When concerns about product safety or reliability are suppressed within the organization for similar reasons, the results can be equally damaging, to consumers and to the corporation itself. General Motors, Volkswagen, and Michigan State University all seem to have suffered from these deficiencies of organizational behavior. This is a serious cause of organizational mistakes and failures. It is impossible to make wise decisions — individual or collective — without accurate and truthful information from the field. And yet the knowledge of higher-level executives depends upon the truthful and full reporting of subordinates, who sometimes have career incentives that work against honesty.

So how can this unhappy situation be avoided? Part of the answer has to do with the behavior of the leaders themselves. It is important for leaders to explicitly and implicitly invite the truth — whether it is good news or bad news. Subordinates must be encouraged to be forthcoming and truthful; and bearers of bad news must not be subject to retaliation. Boards of directors, both private and public, need to make clear their own expectations on this score as well: that they expect leading executives to invite and welcome truthful reporting, and that they expect individuals throughout the organization to provide truthful reporting. A culture of honesty and transparency is a powerful antidote to the disease of fabrications to please the boss.

Anonymous hotlines and formal protection of whistle-blowers are other institutional arrangements that lead to greater honesty and transparency within an organization. These avenues have the advantage of being largely outside the control of the upper executives, and therefore can serve as a somewhat independent check on dishonest reporting.

A reliable practice of accountability is also a deterrent to dishonest or partial reporting within an organization. The truth eventually comes out — whether about sexual harassment, about hidden defects in a product, or about workplace safety failures. When boards of directors and organizational policies make it clear that there will be negative consequences for dishonest behavior, this gives an ongoing incentive of prudence for individuals to honor their duties of honesty within the organization.

This topic falls within the broader question of how individual behavior throughout an organization can give rise to important failures that harm the public and the organization itself.


Empowering the safety officer?

How can industries involving processes that create large risks of harm for individuals or populations be modified so they are more capable of detecting and eliminating the precursors of harmful accidents? How can nuclear accidents, aviation crashes, chemical plant explosions, and medical errors be reduced, given that each of these activities involves large bureaucratic organizations conducting complex operations with substantial inter-system linkages? How can organizations be reformed to enhance safety and to minimize the likelihood of harmful accidents?

One of the lessons learned from the Challenger space shuttle disaster is the importance of a strongly empowered safety officer in organizations that deal in high-risk activities. This means the creation of a position dedicated to ensuring safe operations that falls outside the normal chain of command. The idea is that the normal decision-making hierarchy of a large organization has a built-in tendency to maintain production schedules and avoid costly delays. In other words, there is a built-in incentive to treat safety issues with lower priority than most people would expect.

If there had been an empowered safety officer in the launch hierarchy for the Challenger launch in 1986, there is a good chance this officer would have listened more carefully to the Morton-Thiokol engineering team’s concerns about low-temperature damage to the O-rings and would have ordered a halt to the launch sequence until temperatures in Florida rose above the critical value. The Rogers Commission faulted the decision-making process leading to the launch decision in its final report on the accident (The Report of the Presidential Commission on the Space Shuttle Challenger Accident – The Tragedy of Mission 51-L in 1986 – Volumes One, Two, and Three).

This approach is productive because empowering a safety officer creates a different set of interests in the management of a risky process. The safety officer’s interest is in safety, whereas other decision makers are concerned about revenues and costs, public relations, reputation, and other instrumental goods. So a dedicated safety officer is empowered to raise safety concerns that other officers might be hesitant to raise. Ordinary bureaucratic incentives may lead to underestimating risks or concealing faults; so lowering the accident rate requires giving some individuals the incentive and power to act effectively to reduce risks.

Similar findings have emerged in the study of medical and hospital errors. It has been recognized that high-risk activities are made less risky by empowering all members of the team to call a halt to an activity when they perceive a safety issue. When all members of the surgical team are empowered to halt a procedure when they note an apparent error, serious operating-room errors are reduced. (Here is a report from the American College of Obstetricians and Gynecologists on surgical patient safety; link. And here is a 1999 National Academy report on medical error; link.)

The effectiveness of a team-based approach to safety depends on one central fact: there is a high level of expertise embodied in the staff operating a surgical suite, an engineering laboratory, or a drug manufacturing facility. Empowering these individuals to stop a procedure when they judge there is an unrecognized error in play greatly extends the amount of embodied knowledge brought to bear on the process. The surgeon, the commanding officer, or the lab director is no longer the sole expert whose judgments count.

But it also seems clear that these innovations don’t work equally well in all circumstances. Take nuclear power plant operations. In Atomic Accidents: A History of Nuclear Meltdowns and Disasters: From the Ozark Mountains to Fukushima, James Mahaffey documents multiple examples of nuclear accidents that resulted from the efforts of mid-level workers to address an emerging problem in an improvised way. In the case of nuclear power plant safety, it appears that the best prescription for safety is to insist on rigid adherence to pre-established protocols. In this case the function of a safety officer is to monitor operations to ensure protocol conformance — not to exercise independent judgment about the best way to respond to an unfavorable reactor event.

It is in fact an interesting exercise to try to identify the kinds of operations in which these innovations are likely to be effective.

Here is a fascinating interview in Slate with Jim Bagian, a former astronaut, one-time director of the Veterans Administration’s National Center for Patient Safety, and distinguished safety expert; link. Bagian emphasizes the importance of taking a system-based approach to safety. Rather than focusing on finding blame for specific individuals whose actions led to an accident, he emphasizes the importance of tracing back to the institutional, organizational, or logistic background of the accident. What can be changed in the process — of delivering medications to patients, of fueling a rocket, or of moving nuclear solutions around in a laboratory — that makes the likelihood of an accident substantially lower?

The safety principles involved here seem fairly simple: cultivate a culture in which errors and near-misses are reported and investigated without blame; empower individuals within risky processes to halt the process if their expertise and experience indicates the possibility of a significant risky error; designate individuals within the organization whose interests are defined in terms of the identification and resolution of unsafe practices or conditions; and share information about safety within the industry and with the public.

Mechanisms, singular and general

Let’s think again about the semantics of causal ascriptions. Suppose that we want to know what caused a building crane to collapse during a windstorm. We might arrive at an account something like this:

  • An unusually heavy gust of wind at 3:20 pm, in the presence of this crane’s specific material and structural properties, with the occurrence of the operator’s effort to adjust the crane’s extension at 3:21 pm, brought about cascading failures of structural elements of the crane, leading to collapse at 3:25 pm.

The process described here proceeds from the “gust of wind striking the crane” through an account of the material and structural properties of the device, incorporating the untimely effort by the operator to readjust the device’s extension, leading to a cascade from small failures to a large failure. And we can identify the features of causal necessity that were operative at the several links of the chain.

Notice that there are few causal regularities or necessary and constant conjunctions in this account. Wind does not usually bring about the collapse of cranes; if the operator’s intervention had occurred a few minutes earlier or later, perhaps the failure would not have occurred; and small failures do not always lead to large failures. Nonetheless, in the circumstances described here there is causal necessity extending from the antecedent situation at 3:15 pm to the full catastrophic collapse at 3:25 pm.

Does this narrative identify a causal mechanism? Are we better off describing this as a series of cause-effect sequences, none of which represents a causal mechanism per se? Or, on the contrary, can we look at the whole sequence as a single causal mechanism — though one that is never to be repeated? Does a causal mechanism need to be a recurring and robust chain of events, or can it be a highly unique and contingent chain?

Most mechanisms theorists insist on a degree of repeatability in the sequences that they describe as “mechanisms”. A causal mechanism is the triggering pathway through which one event leads to the production of another event in a range of circumstances in an environment. Fundamentally a causal mechanism is a “molecule” of causal process which can recur in a range of different social settings.

For example:

  • X typically brings about O.

Whenever an event of type X occurs, with the appropriate timing, the outcome O is produced. This ensemble of event types {X, O} constitutes a single mechanism.

And here is the crucial point: to call this a mechanism requires that this sequence recurs in multiple instances across a range of background conditions.

This suggests an answer to the question about the collapsing crane: the sequence from gust to operator error to crane collapse is not a mechanism, but is rather a unique causal sequence. Each part of the sequence has a causal explanation available; each conveys a form of causal necessity in the circumstances. But the aggregation of these cause-effect connections falls short of constituting a causal mechanism because the circumstances in which it works are all but unique. A satisfactory causal explanation of the internal cause-effect pairs will refer to real repeatable mechanisms — for example, “twisting a steel frame leads to a loss of support strength”. But the concatenation does not add up to another, more complex, mechanism.
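To make the contrast explicit, here is a rough bit of notation (my own gloss, not drawn from Simon or Glennan; the symbols are entirely illustrative). The crane story is a token-level chain of particular events, in which each link carries causal necessity only relative to the specific circumstances $c_0$ that obtained that afternoon:

\[
e_{\text{gust}} \rightarrow e_{\text{adjustment}} \rightarrow e_{\text{cascading failures}} \rightarrow e_{\text{collapse}} \qquad (\text{relative to } c_0)
\]

A mechanism claim, by contrast, is type-level:

\[
\forall c \in C : X(c) \Rightarrow O(c),
\]

where $C$ is a non-trivial range of background conditions. The crane sequence satisfies the first schema but not the second; the stuck-valve sequence described below satisfies both.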

Contrast this with “stuck valve” accidents in nuclear power reactors. Valves control the flow of cooling fluids around the critical fuel. If the fuel is deprived of coolant it rapidly overheats and melts. A “stuck valve-loss of fluid-critical overheating” sequence is a recognized mechanism of nuclear meltdown, and has been observed in a range of nuclear-plant crises. It is therefore appropriate to describe this sequence as a genuine causal mechanism in the creation of a nuclear plant failure.

(Stuart Glennan takes up a similar question in “Singular and General Causal Relations: A Mechanist Perspective”; link.)
