Are randomized controlled trials the “gold standard” for establishing causation?

The method of randomized controlled trials (RCTs) is often thought to be the best possible way of establishing causation, whether in biology, medicine, or social science. An experiment based on a randomized controlled trial can be described simply. It is hypothesized that

  • (H) X causes Y in a population of units P.

An experiment testing H is designed by randomly selecting a number of individuals from P and randomly assigning each of them either to Gtest (the test group) or to Gcontrol (the control group). Gtest is exposed to X (the treatment) while Gcontrol is not, under carefully controlled conditions designed to ensure that the ambient conditions surrounding both groups are approximately the same. The status of each group is then measured with regard to Y, and the difference in the mean value of Y between the two groups is the estimate of the “average treatment effect” (ATE).
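The logic of this design can be sketched in a short simulation. Everything here is illustrative (the population size, the baseline outcome, and the true effect of 2.0 are invented for the example): units are sampled from P, split at random into test and control groups, only the test group receives X, and the ATE is estimated as the difference in mean outcomes.

```python
import random
import statistics

# Hypothetical setup: every unit has a baseline value of Y, and the
# treatment X adds a true effect of 2.0. These numbers are illustrative.
random.seed(42)
TRUE_EFFECT = 2.0

population = [random.gauss(10.0, 3.0) for _ in range(10_000)]  # baseline Y per unit

# Draw a trial sample from P and randomly assign it to the two groups.
sample = random.sample(range(len(population)), 200)
random.shuffle(sample)
g_test, g_control = sample[:100], sample[100:]

# Only the test group is exposed to the treatment X.
y_test = [population[i] + TRUE_EFFECT for i in g_test]
y_control = [population[i] for i in g_control]

# The estimated ATE is the difference in mean outcomes between the groups.
ate_hat = statistics.mean(y_test) - statistics.mean(y_control)
print(f"estimated ATE: {ate_hat:.2f} (true effect: {TRUE_EFFECT})")
```

Because assignment is random, the estimate will hover around the true effect, but any single run carries sampling noise, a point that matters for the critiques discussed below.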

This research methodology is often thought to capture the logical core of experimentation, and is thought to constitute the strongest evidence possible for establishing or refuting a causal relationship between X and Y. It is thought to represent a purely empirical, theory-independent way of establishing causal relations among factors. This is so because of the random assignment of individuals to the two groups (so potentially causally relevant individual differences are averaged out across the groups) and because of the strong efforts to isolate the administration of the test so that each group is exposed to the same unknown factors that may themselves influence the outcome to be measured. As Handley et al. put the point in their review article “Selecting and Improving Quasi-Experimental Designs in Effectiveness and Implementation Research” (2018): “Random allocation minimizes selection bias and maximizes the likelihood that measured and unmeasured confounding variables are distributed equally, enabling any differences in outcomes between the intervention and control arms to be attributed to the intervention under study” (Handley et al 2018: 6). Sociology is interested in discovering and measuring the causal effects of large social conditions and interventions – “treatments”, as they are often called in medicine and policy studies. It might seem plausible, then, that empirical social science should make use of randomized controlled trials whenever possible, in efforts to discover or validate causal connections.

The supposed “gold standard” status of randomized controlled trials has been especially controversial in the last several years. Serious methodological and inferential criticisms have been raised against common uses of RCT experiments, and philosopher of science Nancy Cartwright has played a key role in advancing these criticisms. Cartwright and Hardie’s Evidence-Based Policy: A Practical Guide to Doing It Better (link) provided a strong critique of the use of RCT methodology in areas of public policy, and Cartwright and others have offered strong arguments to show that inferences about causation based on RCT experiments are substantially more limited and conditional than generally believed.

A pivotal debate among experts in a handful of fields about RCT methodology took place in a special issue of Social Science and Medicine in 2018. This volume is essential reading for anyone interested in causal reasoning. Especially important is Deaton and Cartwright’s article “Understanding and misunderstanding randomized controlled trials” (link). Here is the abstract to the Deaton and Cartwright article:

ABSTRACT Randomized Controlled Trials (RCTs) are increasingly popular in the social sciences, not only in medicine. We argue that the lay public, and sometimes researchers, put too much trust in RCTs over other methods of investigation. Contrary to frequent claims in the applied literature, randomization does not equalize everything other than the treatment in the treatment and control groups, it does not automatically deliver a precise estimate of the average treatment effect (ATE), and it does not relieve us of the need to think about (observed or unobserved) covariates. Finding out whether an estimate was generated by chance is more difficult than commonly believed. At best, an RCT yields an unbiased estimate, but this property is of limited practical value. Even then, estimates apply only to the sample selected for the trial, often no more than a convenience sample, and justification is required to extend the results to other groups, including any population to which the trial sample belongs, or to any individual, including an individual in the trial. Demanding ‘external validity’ is unhelpful because it expects too much of an RCT while undervaluing its potential contribution. RCTs do indeed require minimal assumptions and can operate with little prior knowledge. This is an advantage when persuading distrustful audiences, but it is a disadvantage for cumulative scientific progress, where prior knowledge should be built upon, not discarded. RCTs can play a role in building scientific knowledge and useful predictions but they can only do so as part of a cumulative program, combining with other methods, including conceptual and theoretical development, to discover not ‘what works’, but ‘why things work’.

Deaton and Cartwright put their central critique of RCT methodology in these terms:

We argue that the lay public, and sometimes researchers, put too much trust in RCTs over other methods of investigation. Contrary to frequent claims in the applied literature, randomization does not equalize everything other than the treatment in the treatment and control groups, it does not automatically deliver a precise estimate of the average treatment effect (ATE), and it does not relieve us of the need to think about (observed or unobserved) covariates…. We argue that any special status for RCTs is unwarranted. (Deaton and Cartwright 2018: 2).
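Their point that randomization “does not equalize everything other than the treatment” in any single trial can be illustrated with a small simulation (my own sketch, not from the article; the covariate “age” and the trial size of 60 are invented for illustration). In one particular random split, the two arms can differ noticeably on a prognostic covariate; only across many hypothetical re-randomizations does the imbalance average out, which is the sense in which the ATE estimate is unbiased rather than automatically balanced.

```python
import random
import statistics

# Illustrative sketch: a small trial of 60 units with one prognostic
# covariate (age). A single randomization can leave the arms imbalanced.
random.seed(7)

ages = [random.gauss(50, 12) for _ in range(60)]
idx = list(range(60))
random.shuffle(idx)
test, control = idx[:30], idx[30:]

gap = statistics.mean(ages[i] for i in test) - statistics.mean(ages[i] for i in control)
print(f"age gap between arms in this one randomization: {gap:.1f} years")

# Across many re-randomizations the gap averages out to roughly zero:
# randomization balances covariates in expectation, not in each draw.
gaps = []
for _ in range(2000):
    random.shuffle(idx)
    t, c = idx[:30], idx[30:]
    gaps.append(statistics.mean(ages[i] for i in t) - statistics.mean(ages[i] for i in c))
print(f"mean gap over 2000 randomizations: {statistics.mean(gaps):.2f}")
```

The design choice here mirrors the conceptual distinction Deaton and Cartwright press: unbiasedness is a property of the procedure over repeated draws, not a guarantee about the one trial actually run.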

They provide an interpretation of RCT methodology that places it within a range of strategies of empirical and theoretical investigation, and they argue that researchers need to choose methods that are suitable to the problems that they study.

One of the key concerns they express has to do with extrapolating and generalizing from RCT studies (3). A given RCT study is carried out in a specific and limited set of circumstances, and the question arises whether the effects documented for the intervention in this study can be extrapolated to a broader population. Do the results of a drug study, a policy study, or a behavioral study give a basis for believing that these results will obtain in the larger population? Their general answer is that extrapolation must be done very carefully. “The ‘gold standard or truth’ view does harm when it undermines the obligation of science to reconcile RCTs results with other evidence in a process of cumulative understanding” (5). And even more emphatically, “we strongly contest the often-expressed idea that the ATE calculated from an RCT is automatically reliable, that randomization automatically controls for unobservables, or worst of all, that the calculated ATE is true [of the whole population]” (10).

In his contribution to the SSM volume, Robert Sampson (link) shares this last concern about the limits of extending RCT results to new contexts and settings:

For example, will a program that was evaluated in New York work in Chicago? To translate an RCT into future actions, we must ask hard questions about the potential mechanisms through which a treatment influences an outcome, heterogeneous treatment effects, contextual variations, unintended consequences or policies that change incentive and opportunity structures, and the scale at which implementing policies changes their anticipated effects. (Sampson 2018: 67)

The general perspective from which Deaton and Cartwright proceed is that empirical research about causal relationships, including experimentation, requires a broad swath of knowledge about the processes, mechanisms, and causal powers at work in the given domain. This background knowledge is needed in order to interpret the results of empirical research and to assess the degree to which the findings of a specific study can plausibly be extrapolated to other populations.

These methodological and logical concerns about the design and interpretation of experiments based on randomized controlled trials make it clear that it is crucial for social scientists to treat RCT methodology carefully and critically. Is RCT experimentation a valuable component of the toolkit of sociological investigation? Yes, of course. But as Cartwright demonstrates, it is important to keep several philosophical points in mind. First, there is no “gold-standard” method for research in any field; rather, it is necessary to adapt methods to the nature of the data and causal patterns in a given field. Second, she (like most philosophers of science) is insistent that empirical research, whether experimental, observational, statistical, or Millian, always requires theoretical inquiry into the underlying mechanisms that can be hypothesized to be at work in the field. Only in the context of a range of theoretical knowledge is it possible to arrive at reasonable interpretations of (and generalizations from) a set of empirical findings.

So, what about it? Should we imagine that randomized controlled trials constitute the aspirational gold standard for research in sociology, medicine, or public policy? The answer seems to be clear: RCT methodology is a legitimate and important tool for sociological research, but it is not fundamentally superior to the many other methods of empirical investigation and inference in use in the social sciences.
