Consolidated quantitative history

It is fascinating to browse through the sessions on the program at the Social Science History Association this month (link). SSHA is distinguished by its deep embrace of disciplinary and methodological diversity, and there are panels deriving from qualitative, comparative, and theoretical perspectives. But particularly interesting for me this year are the more quantitative subjects — reflecting the cliometric impulse that led to the formation of the SSHA several decades ago. (Here are a few comments by Julia Adams, Elisabeth Clemens, and Ann Shola Orloff, past and current presidents of SSHA, on this history.) There are panels on historical measures of the standard of living in different parts of Eurasia; on fertility, mobility, and population size in small and large regions; on long-term climate and atmospheric fluctuation over time (the “year without a summer” in the early nineteenth century); levels of agricultural productivity over several centuries in several regions; the degree of inequality in landholding in Scania and North China; and many other fascinating studies of measurable social properties. And, of course, the papers offer time graphs of the variables that are the subject of the study.

So what if we had a goal of providing a unified and public measurement of factors like these over a large expanse of time and space? What if we set out to synthesize the many studies currently underway and arrive at a common set of measures over time for these regions?

To an extent this is the goal of the Eurasian Population and Family History Project: to assemble a large set of research groups across Eurasia, measuring demographic data using comparable methods in the several locations (link). Though the project hasn’t yet produced a synthetic volume summarizing all the results, we can hope that this kind of product will eventually be forthcoming. The researchers describe the project in these terms: “New data and new methods … have begun to illuminate the complexities of demographic responses to exogenous stress, economic and otherwise.… Combined time-series and event-history analyses of longitudinal, nominative, microlevel data now allow for the finely grained differentiation of mortality, fertility, and other demographic responses by social class, household context, and other dimensions at the individual level” (Tommy Bengtsson, Cameron Campbell, James Z. Lee, et al., Life under Pressure: Mortality and Living Standards in Europe and Asia, 1700-1900 (Eurasian Population and Family History), 2004, pp. viii-ix). Their goal is ambitious: to provide detailed, analytically sophisticated multi-generational studies of a number of populations across Eurasia. The studies are intended to permit the researchers to probe issues of causation as well as to identify important dimensions of similarity and difference across regions and communities. The most recent volume in the series appeared earlier this fall (Noriko Tsuya, Wang Feng, George Alter, James Z. Lee, et al., Prudence and Pressure: Reproduction and Human Agency in Europe and Asia, 1700-1900 (Eurasian Population and Family History)).

Suppose we wanted to go further and create an interactive Wiki site that permitted researchers to upload their findings for a specified set of variables; and suppose the underlying software created a dynamic set of time graphs and maps representing these data over time. And suppose that the data displays can be broken out at different levels of scale — North China, China, Eurasia. Finally, of course, we would want to specify that the data summaries are tagged with meta-data indicating the studies and methodologies leading to the graph. Could we say that this hypothetical site would then represent the meta-knowledge of the community of economic historians, climate scientists, and historical demographers? And could we speculate that this product would be an enormous benefit for historical researchers in a broad range of disciplines?

We can immediately predict some limitations to such a collective project. Most important is the unavoidable incompleteness of the data. We may have studies on farm productivity that document output for portions of North China and portions of the Yangzi Delta. But, of course, this doesn’t tell us much about western China. So we can’t realistically aspire to a full and complete representation of these variables for full regions and periods.

Second, there is the problem of methodological inconsistencies across studies. Robert Allen is a leader in attempting to document the standard of living across Europe and Asia (Robert Allen, ed., Living Standards in the Past: New Perspectives on Well-Being in Asia and Europe). And a central problem he faces is that multiple studies estimate consumption and well-being in different ways. So forming a composite representation requires an additional set of assumptions and models by the meta-study researcher.

Third, there is the question of defining the role for verbal analysis and reasoning in such a knowledge system. Are we to imagine this collective data set as a universal data appendix to a huge range of verbal historical narratives and analyses? Or might we come to think that the graphs speak for themselves, with no need for verbal analysis and inference?

All that said, I think the hypothetical Wiki site would be enormously valuable. It would provide us with a bird’s-eye view of the large structural and material features that defined and constrained Eurasian history. And it has the potential to suggest new avenues of research and new causal hypotheses about documented processes of change. For example, we may compare the time series of life expectancy and average temperature, and we may hypothesize that mortality and fertility were affected by abnormal climate conditions (through the medium of agricultural performance). But we may also be able to observe suggestive correlations between material variables and behavior — for example, between ecological crises and the frequency of peasant uprisings. Or, conceivably, our eye might be drawn from a graph of sex ratios in a region to another of the incidence of banditry, leading us to a “bare sticks” hypothesis about social unrest: when there is an excess of unmarried young men, we can expect an upsurge of banditry and crime.

There are increasingly powerful tools available that permit scholars and the interested public to explore large public datasets such as the US Census or Bureau of Labor Statistics (link).  It is perhaps not wholly unrealistic to imagine a platform that permits multiple researchers to contribute to a meta-dataset for economic and social history of the world.

New modes of historical presentation

Victor Lieberman’s Strange Parallels: Volume 1, Integration on the Mainland: Southeast Asia in Global Context, c.800-1830 and Strange Parallels: Volume 2, Mainland Mirrors: Europe, Japan, China, South Asia, and the Islands: Southeast Asia in Global Context, c.800-1830 represent about 1000 pages of careful, dense historical prose extending over two volumes. As previously discussed (link, link), the books review a thousand years of history of the polities of France, Kiev, Burma, Japan, and China; document a significant correlation of timing across the extremes of Eurasia; and offer some historical hypotheses about the causes of this synchronicity. It is a long and involved story.

My question here is perhaps a startling one: Is it possible that some alternative modes of presentation would permit the author to represent the heart of the historical findings much more efficiently in the form of a complex animated visual display? Could the empirical heart of the two volumes be summarized in the form of a rich data display over time? Is the verbal narrative simply a clumsy way of representing what really ought to be graphed? What would be gained, and what would be lost by replacing the long complex text with a compact series of graphs and maps?

This thought experiment is possible because Lieberman’s argument lends itself to a quantitative interpretation. Essentially he is focusing on factors that can be estimated over time: degree of scope of a regime, degree of integration of institutions, economic productivity, agricultural intensity, rainfall, temperature, population level and density, mortality by disease, and transport capacity, for example. And he is looking for one or more factors whose temporal variations can be interpreted as a causal factor explaining correlations across the graphs. So we could imagine a master graph representing the factual core of the research, with six groups of graphs representing the chief variables for each region over time. Here is Lieberman’s initial effort along these lines, graphing his estimate of “scope and consolidation” of the states of SE Asia and France. And we can imagine presenting different data series representing his findings about agricultural productivity, mortality, population, climate change, etc., arranged around a single timeline.

We might imagine supplementing these superimposed data series with a series of dated maps representing the territorial scope of the states of the various regions, arranged along a timeline:

France at T1, France at T2, France at T3

(A similar series would be constructed for the states of SE Asia.)

What this coordinated series of graphics represents, then, is the core set of facts that Lieberman has synthesized and presented in the book. By absorbing the social, political, and economic changes represented by this graphical timeline, the reader gains access to the full set of empirical claims offered by Lieberman. And, one might say, the presentation is more direct and comprehensible than the verbal description of these changes contained in the text. Moreover, we might expect that patterns will emerge more or less directly from these graphic presentations — for example, the synchrony between state crisis and accumulating climate change.

As for what is lost in this version of the story — several things seem clear. First, much of the narrative that Lieberman provides is synthetic. He attempts to pull together a wide variety of sources in order to arrive at a summary statement such as this: “The Capetian state increased dramatically in scope and administrative competence between 1000 and 1250.” So the narrative serves to justify and document a particular inflection point in the long graph of “French polity”. It provides the evidentiary basis for the estimate at this period in time.

Second, of course, the packet of graphs I’ve just described lacks the eloquence and vividness of the prose that Lieberman or other talented historians are able to achieve in telling their stories. It represents only the abstract summary of conclusions, not the nuance of the reasoning or the drama of the story. The prose text is inherently enjoyable to read, and it engages the reader to share the historical puzzle. But, one might argue, the epistemic core of the book is precisely the abstract factual findings, not the prose style. And the reasoning can be captured as a hypertext lying behind the graph — a sort of annotated hyper-document.

Finally, this notion of arriving at an abstract, schematic representation of a history of something doesn’t work at all for many kinds of historical writing. Michael Kammen’s Mystic Chords of Memory: The Transformation of Tradition in American Culture is an inherently semiotic argument, working out the ways that public ceremonies and monuments work in the consciousness of a population. Robert Darnton’s The Great Cat Massacre: And Other Episodes in French Cultural History is a deft interpretive inquiry, arriving at a complex interpretation of a puzzling set of actions. These examples of great historical writing are evidence-based; but they are not designed to allow estimation of a set of variables over time. And I don’t see that there is the possibility of a more abstract and symbolic representation of the historical knowledge they represent.

One might say that what we have encountered here is an important fissure within contemporary historical writing, between “cliometric” research and knowledge (Reflections on the Cliometrics Revolution: Conversations with Economic Historians) and hermeneutic historical knowledge (Paul Ricoeur, Memory, History, Forgetting). The former is primarily interested in the processes of change of measurable human and social variables over time, whereas the latter is concerned with interpreting human actions and meanings. The former is amenable to quantitative representation — graphs — while the latter is inherently linguistic and interpretive. The former has to do with estimation and causal analysis, while the latter has to do with interpretation and narrative.

Often, of course, historians are involved in both kinds of interpretation and analysis — both measurement and interpretation.  So when Charles Tilly describes four centuries of French contention in The Contentious French, he is interested in charting the rising frequency of contentious actions (cliometric); but he is also interested in interpreting the intentions and meanings associated with those actions (hermeneutic).


Image: Artillery, 1911. Roger de La Fresnaye. Metropolitan Museum, New York

In general I’m skeptical about the ability of the social sciences to offer predictions about future social developments. (In this respect I follow some of the instincts of Oskar Morgenstern in On the Accuracy of Economic Observations.) We have a hard time answering questions like these:

  • How much will the first installment of TARP improve the availability of credit within three months?
  • Will the introduction of UN peacekeeping units reduce ethnic killings in the Congo?
  • Will the introduction of small high schools improve student performance in Chicago?
  • Will China develop towards more democratic political institutions in the next twenty years?
  • Will American cities witness another round of race riots in the next twenty years?

However, the situation isn’t entirely negative, and there certainly are some social situations for which we can offer predictions in at least a probabilistic form. Here are some examples:

  • The unemployment rate in Michigan will exceed 10% sometime in the next six months.
  • Coalition casualties in the Afghanistan war will be greater in 2009 than in 2008.
  • Illinois Governor Blagojevich will leave office within six months.
  • Germany will be the world leader in solar energy research by 2020 (link).
  • The Chinese government will act strategically to prevent emergence of regional independent labor organizations.

It is worth exploring the logic and function of prediction for a few lines. Fundamentally, it seems that prediction is related to the effort to forecast the effects of interventions, the trajectory of existing trends, and the likely strategies of powerful social actors. We often want to know what will be the net effect of introducing X into the social environment. (For example, what effect on economic development would result from a region’s succeeding in increasing the high school graduation rate from 50% to 75%?) We may find it useful to project into the future some social trends that can be observed in the present. (Demographers’ prediction that the United States will be a “majority-minority” population by 2042 falls in this category (link).) And we can often do quite a bit of rigorous reasoning about the likely actions of leaders, policy makers, and other powerful actors given what we know about their objectives and their beliefs. (We can try to forecast the outcome of the current impasse between Russia and Ukraine over natural gas by analyzing the strategic interests of both sets of decision-makers and the constraints to which they must respond.)

So the question is, what kinds of predictions can we make in the social realm? And what circumstances limit our ability to predict?

Predictions about social phenomena are based on a couple of basic modes of reasoning:

  • extrapolation of current trends
  • modeling of causal hypotheses about social mechanisms and structures
  • reasoning about strategic actions likely to be taken by actors
  • derivation of future states of a system from a set of laws

And predictions can be presented in a range of levels of precision, specificity, and confidence:

  • prediction of a single event or outcome: the selected social system will be in state X at time T.
  • prediction of the range within which a variable will fall: the selected social variable will fall within a range Q ±20%.
  • prediction of the range of outcome scenarios that are most likely: “Given current level of unrest, rebellion 60%, everyday resistance 30%, resolution 10%”
  • prediction of the direction of change: the variable of interest will increase/decrease over the specified time period
  • prediction of the distribution of properties over a group of events/outcomes. X percent of interventions will show improvement of variable Y.

Here are some particular obstacles to reliable predictions in the social realm:

  • unquantifiable causal hypotheses — “small schools improve student performance”. How large is the effect? How does it weigh in relation to other possible causal factors?
  • indeterminate interaction effects — how will school policy changes interact with rising unemployment to jointly influence school attendance and performance?
  • open causal fields. What other currently unrecognized causal factors are in play?
  • the occurrence of unpredictable exogenous events or processes (outbreak of disease)
  • ceteris paribus conditions. These are frequently unsatisfied.

So where does all this leave us with respect to social predictions? A few points seem relatively clear.

Specific prediction of singular events and outcomes seems particularly difficult: the collapse of the Soviet Union, China’s decision to cross the Yalu River in the Korean War, or the onset of the Great Depression were all surprises to the experts.

Projection of stable trends into the near future seems most defensible — though of course we can give many examples of discontinuities in previously stable trends. Projection of trends over medium- and long-term is more uncertain — given the likelihood of intervening changes of structure, behavior, and environment that will alter the trends over the extended time.

Predictions of limited social outcomes, couched in terms of a range of possibilities attached to estimates of probabilities and based on analysis of known causal and strategic processes, also appear defensible. The degree of confidence we can have in such predictions is limited by the possibility of unrecognized intervening causes and processes.

The idea of forecasting the total state of a social system, given information about its current state and a set of laws of change, is entirely indefensible; societies are not systems of variables linked by precise laws of transition.

Causing public opinion

It is interesting to consider what sorts of things cause shifts in public opinion about specific issues. This week’s national election is one important example. But what about more focused issues — for example, the many ballot initiatives that were considered in many states? To what extent can we discover whether there is a measurable effect on public opinion by the organized efforts of advocacy groups through advertising and other strategies for reaching the minds of voters?

In these cases we might imagine that voters have a prior set of attitudes towards the issue — perhaps including a large number of “don’t know/don’t care” people. Then a set of advocates form to lobby the public pro and con. They mount campaigns to influence voters’ opinions towards the option they prefer. And on the day of the election voters will indicate their approval — often in ratios quite different from those that were measured in pre-campaign surveys. So something happened to change the composition of public opinion on the issue. The question here is whether it is possible to estimate the effects of various possible influencers.

This seems like potentially a very simple area of causal reasoning about social processes. The outcome variable is fairly observable through polling and the final election, and the interventions are also usually observable as well, both in timing and magnitude. So the world may present us with a series of interventions and outcomes that support fairly strong causal conclusions — for example, “each time ad campaign X hits the airwaves in a given market, there is an observed uptick in support for the proposition.” It is unlikely that the correlation occurred as a result of random variations in both terms; we have a theory of how advertising influences voters; and we conclude that “ad campaign X was a causal factor in shaping voter opinion in this time period.” (It is even possible that X played a role in both segments of opinion, resulting in an up-tick in both yes and no responses. Then we might also judge that X was effective at polarizing voters — not the effect the strategist would have aimed at.)

This is an example of singular causal reasoning, in that it has to do with one population, one issue, and a specific series of interventions. What would be needed in order to arrive at a conclusion with generic scope — for example, “advertising along the lines of X is generally effective in increasing support for its issue”? The most straightforward argument to the generic conclusion would be a study of an extended set of cases with a variety of strategies in play. If we discover something like this — “In 80% of cases where X is included in the mix it is observed to have a positive effect on opinion” — then we would have inductive reason for accepting the generic causal claim as well. This is basic experimental reasoning.

Take a hypothetical issue — a referendum on a proposal for changing the system the state uses for assessing business taxes. Suppose that a polling firm has done weekly polling on the question and has recorded “yes/no/no opinion” since October 2007. Suppose that two organizations emerged in December to advocate for and against the proposal; that each raised about $5 million; and that each included an advertising campaign in its strategy. Suppose further that the “no” campaign also included a well-organized effort at the parish level to persuade church members to vote against the measure on religious grounds and the “yes” campaign included a grassroots effort to get university students and staff to be supportive of the measure on pro-science and pro-economy grounds. And suppose each organization mounted a “new media” campaign using email lists and web communication to make its case. Finally, suppose we have good timeline data about the occurrence and volume of media spots throughout the period of June through November.

This scenario involves three types of causes, a timeline representing the application of the interventions, and a timeline representing the effects. From this body of data can we arrive at estimates of the relative efficacy of the three treatments? And does this set of conclusions provide credible guidance for other campaigns over other issues in other places?

There is also the question of the efficacy of the implementation of the strategies. Take the ad campaigns. Whether a specific campaign succeeds in changing viewers’ opinions depends on the content, message, and production quality. Does the message resonate with a target segment of voters? Does the production design stimulate emotions that will lead to the desired vote? So evaluating efficacy needs to be done across instances of media as well as across varieties of media. (This is the function of focus groups and snap polls — to evaluate the effects of specific messages and production choices on real voters.)

(Here is a link to some information about the process leading up to a positive vote on the Michigan Stem Cell initiative this month. A good general introduction to the social psychological theories about the formation of attitudes and opinions is Stuart Oskamp and P. Wesley Schultz, Attitudes and Opinions.)

Polling and social knowledge

Here’s a pretty interesting graphic from

As you can see, the graph summarizes a large number of individual polls measuring support for the two major party candidates from January 1 to October 26. The site indicates that it includes all publicly available polls during the time period. Each poll result is represented with two markers — blue for Obama and red for McCain. The red and blue trend lines are “trend estimates” based on local regressions for the values of the corresponding measurements for a relatively short interval of time (the site doesn’t explicitly say what the time interval is). So, for example, the trend estimate for August 1 appears to be approximately 47%:42% for the two candidates. As the site explains, 47% is not the average of poll results for Obama on August 1; instead, it is a regression result based on the trend of all of Obama’s polling results for the previous several days.

There are a couple of things to observe about this graph and the underlying methodology. First, it’s a version of the “wisdom of the crowd” idea, in that it arrives at an estimate based on a large number of less-reliable individual observations (the dozen or so polling results for the previous several days). Each of the individual poll results has an estimate-of-error which may be in the range of 3-5 percentage points; the hope is that the aggregate result has a higher degree of precision (a narrower error bar).
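The hoped-for narrowing of the error bar can be made concrete with a back-of-the-envelope calculation. Under the idealized assumption that the component polls have independent sampling errors (real polls share “house effects,” so the actual gain is smaller), averaging k polls shrinks the margin of error by roughly a factor of the square root of k:

```python
import math

def pooled_margin(single_poll_margin, k):
    """Approximate margin of error of a simple average of k polls,
    assuming each poll carries the same margin and the polls'
    sampling errors are independent (an idealization)."""
    return single_poll_margin / math.sqrt(k)

# A dozen polls, each with a +/-4-point margin, pool to roughly +/-1.15:
print(round(pooled_margin(4.0, 12), 2))
```

This is only the sampling-error part of the story; systematic biases common to all the polls do not average away.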

Second, the methodology attempts to incorporate an estimate of the direction and rate of movement of public opinion, by incorporating trend information based on the prior several days’ polling results.

Third, it is evident that there is likely to be a range of degrees of credibility assigned to the various component polls; but the methodology doesn’t assign greater weight to “more credible” polls. Ordinary readers might be inclined to assign greater weight to a Gallup poll or a CBS poll than a Research2000 or a DailyKos poll; but the methodology treats all results equally. Likewise, the critical reader might assign more credibility to a live phone-based poll than an internet-based or automated phone poll; but this version of the graph includes all polls. (On the website it is possible to filter out internet-generated or automated phone polling results; this doesn’t seem to change the shape of the results noticeably.)

There is also a fundamental question of validity and reliability that the critical reader needs to ask: how valid and reliable are these estimates for a particular point in time? That is, how likely is it that the trend estimate of support for either candidate on a particular day is within a small range of error of the actual value? I assume there is some statistical method for estimating probable error for this methodology, though it doesn’t appear to be explained on the website. But fundamentally, the question is whether we have a rational basis for drawing any of the inferences that the graph suggests — for example, that Obama’s lead over McCain is narrowing in the final 14 days of the race.

Finally, there is the narrative that we can extract from the graph, and it tells an interesting story. From January through March candidate Obama has a lead over candidate McCain; but of course both candidates are deeply engaged in their own primary campaigns. At the beginning of April the candidates are roughly tied at 45%. From April through September Obama rises slowly and maintains support at about 48%, while McCain falls in support until he reaches a low point of 43% in the beginning of August. Then the conventions take place in August and early September — and McCain’s numbers bump up to the point where Obama and McCain cross in the first week of September. McCain takes a brief lead in the trend estimates. His ticket seems to derive more benefit from his “convention bump” than Obama does. But in the early part of September the national financial crisis leaps to center stage and the two candidates fare very differently. Obama’s support rises steeply and McCain’s support falls at about the same rate, opening up a 7 percentage point gap in the trend estimates by the middle of October. From the middle of October the race begins to tighten; McCain’s support picks up and Obama’s begins to dip slightly at the end of October. But the election looms — the trend estimates tell a story that’s hard to read in any way but “too late, too little” for the McCain campaign.

And, of course, it will be fascinating to see where things stand a week from today.

Here is the explanation that the website offers of its methodology:

[quoting from]
“Where do the numbers come from?

When you hold the mouse pointer over a state, you see a display of the latest “trend estimate” numbers from our charts of all available public polls for that race. The numbers for each candidate correspond to the most recent trend estimate — that is the end point of the trend line that we draw for each candidate. If you click the state on the map, you will be taken to the page on that displays the chart and table of polls results for that race.

In most cases, the numbers are not an “average” but rather regression based trendlines. The specific methodology depends on the number of polls available.

  • If we have at least 8 public polls, we fit a trend line to the dots represented by each poll using a “Loess” iterative locally weighted least squares regression.
  • If we have between 4 and 7 polls, we fit a linear regression trend line (a straight line) to best fit the points.
  • If we have 3 polls or fewer, we calculate a simple average of the available surveys.

How do regression trend lines differ from simple averages?

Charles Franklin, who created the statistical routines that plot our trend lines, provided the following explanation last year:

Our trend estimate is just that, an estimate of the trends and where the race stands as of the latest data available. It is NOT a simple average of recent polling but a “local regression” estimate of support as of the most recent poll. So if you are trying to [calculate] our trend estimates from just averaging the recent polls, you won’t succeed.

Here is a way to think about this: suppose the last 5 polls in a race are 25, 27, 29, 31 and 33. Which is a better estimate of where the race stands today? 29 (the mean) or 33 (the local trend)? Since support has risen by 2 points in each successive poll, our estimator will say the trend is currently 33%, not the 29% the polls averaged over the past 2 or 3 weeks during which the last 5 polls were taken. Of course real data are more noisy than my example, so we have to fit the trend in a more complicated way than the example, but the logic is the same. Our trend estimates are local regression predictions, not simple averaging. If the data have been flat for a while, the trend and the mean will be quite close to each other. But if the polls are moving consistently either up or down, the trend estimate will be a better estimate of opinion as of today while the simple average will be an estimate of where the race was some 3 polls ago (for a 5 poll average– longer ago as more polls are included in the average.) And that’s why we estimate the trends the way we do.”
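Franklin’s five-poll example can be checked in a few lines. This is only an illustration of the logic, not the site’s actual procedure: with eight or more polls the site fits a Loess local regression, while the toy data below are perfectly linear, so an ordinary straight-line fit suffices to show the mean/trend contrast:

```python
import numpy as np

# Franklin's hypothetical polls, in time order
# (indices stand in for polling dates).
times = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
polls = np.array([25.0, 27.0, 29.0, 31.0, 33.0])

# Simple average: where the race stood, on average, over the whole window.
mean_estimate = polls.mean()

# Straight-line trend evaluated at the most recent poll: where the
# race stands "today" according to the trend logic.
slope, intercept = np.polyfit(times, polls, 1)
trend_estimate = slope * times[-1] + intercept

# mean is 29.0; trend is 33.0 (up to floating-point rounding)
print(mean_estimate, trend_estimate)
```

The two-point-per-poll rise makes the trend estimate (33) a better statement of current support than the window average (29), which describes the race as it was two or three polls ago.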

What do polls tell us?

We’re all interested in the opinions of vast numbers of strangers — potential voters, investors, consumers, college students, or home owners. Our interest is often a practical one — we would like to know how the election is likely to go, whether the stock market will rebound, whether an influenza season will develop into a pandemic, or whether the shops in our cities and malls will have higher or lower demand in the holiday season. And so we turn to polls and surveys of attitudes and preferences — consumer confidence surveys, voter preference polls, surveys of public health behaviors, surveys of investor confidence. And tools such as aggregate and disaggregate the data to allow us to make more refined judgments about what the public’s mood really is. But how valid is the knowledge that is provided by surveys and polls? To what extent do they accurately reflect an underlying social reality of public opinion? And to what extent does this knowledge provide a basis for projecting future collective behavior (including voting)?

There are several important factors to consider.

First is the heterogeneity of social characteristics across a population at virtually every level of scale — including especially attitudes and beliefs. No matter what slice of the population we select — a specific combination of age, race, religion, and income, for example — there will be a range of opinions across the resulting group. Groups don’t suddenly become homogeneous when we find the right way of partitioning the population.

Second is an analogous point about plasticity over time. The attitudes and preferences of individuals and groups change over time — often rapidly. In polling I suppose this is referred to as “voter volatility” — the susceptibility of a group to changing its preferences in response to changing information and other stimulation. And the fact appears to be that opinions and beliefs change rapidly during periods of decision-making. So knowing that 65% of Hispanic voters preferred X over Y on October 10 doesn’t imply much about the preferences of this group two weeks later. This is precisely what the campaigns are trying to accomplish — a new message or commercial that shifts preferences for an extended group.

Third are questions having to do with the honesty of the responses that a survey or poll elicits. Do subjects honestly record their answers; do they conceal responses they may be ashamed of (the Bradley effect); do they exaggerate their income or personal happiness or expected grade in a course? There are survey techniques intended to address these possibilities (obscuring the point of a question, coming back to a question in a different way); but the possibility of untruthful responses raises an important problem for us when we try to assess the realism of a poll or survey.

Fourth are the standard technical issues having to do with sampling and population estimation: how large a set of observations is required to arrive at an estimate of a population value with 95% confidence? And what measures need to be taken to assure a random sample and avoid sample bias? For example, if polling is done based solely on landline phone numbers, does this introduce either an age bias or an income bias if it is true that affluent young people are more likely to have only cell phones?
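The sample-size question has a familiar textbook answer: the margin of error for a simple random sample under the normal approximation. The sketch below is that textbook formula, not any particular pollster’s methodology:

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of the confidence interval for a sample proportion.

    p: observed proportion, n: sample size,
    z: normal quantile (1.96 corresponds to 95% confidence).
    """
    return z * math.sqrt(p * (1 - p) / n)

# Worst case (p = 0.5) for a 1,000-respondent poll:
moe = margin_of_error(0.5, 1000)
print(round(100 * moe, 1))  # 3.1  -> "plus or minus about 3 points"
```

This is why a poll of roughly 1,000 respondents is so often reported with a margin of error of about 3 percentage points — and why that figure says nothing about the non-sampling problems (dishonesty, volatility, coverage bias) raised above.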

So, then, what is the status of opinion surveys and polls as a source of knowledge about social reality? Do public opinion surveys tell us something about objective underlying facts about a population? What does a finding like “65% of Iowans favor subsidies for corn ethanol” based on a telephone poll of 1000 citizens tell us about the opinions of the full population of the state of Iowa?

Points made above lead to several important cautions. First, the reality of public opinion and preference itself is fluid and heterogeneous. The properties we’re trying to measure vary substantially across any subgroup we might define — pro/con assessments about a candidate’s judgment, for example. So the measurement of a particular question is simply an average value for the group as a whole, with the possibility of a substantial variance within the group. And second, the opinions themselves may change rapidly over time at the individual level — with the result that an observation today may be very different from a measurement next week. Third, it is a credible hypothesis that demographic factors such as race, income, or gender may affect attitudes and opinions; so there is a basis for thinking that data disaggregated by these variables may demonstrate more uniformity and consistency. Finally, the usual cautions about sample size and sample bias are always relevant; a poorly designed study tells us almost nothing about the underlying reality.

But what about the acid test: to what extent can a series of polls and surveys, performed over many subgroups over an extended period of time, help to forecast the collective behavior of the group as a whole? Can we use this kind of information to arrive at credible estimates of the outcome of an election in two weeks, or the likely demand for automobiles in 2009, or the willingness of a whole population to accept public health guidelines in a time of pandemic flu?

Discipline, method, hegemony in sociology

An earlier post referred to the “Perestroika” debate within political science. There are similar foundational debates within other social science disciplines, including especially sociology. What is particularly striking is not that there are deep disagreements about the methodology and epistemology of sociology — this has often been true within sociology, going back to the Methodenstreit that divided the German-speaking social sciences around the turn of the twentieth century — but rather the degree to which these disagreements have been so divisive and polarizing within the discipline in the U.S. in the past forty years.

Several interesting books that focus on some of these debates within sociology include George Steinmetz, The Politics of Method in the Human Sciences: Positivism and Its Epistemological Others; Immanuel Wallerstein, The End of the World As We Know It: Social Science for the Twenty-First Century; Craig Calhoun, Sociology in America: A History; and Alvin Gouldner, The Coming Crisis of Western Sociology. Andrew Abbott’s Chaos of Disciplines is also a very interesting treatment of the sociology of the social science disciplines and the mechanisms through which a discipline defines its boundaries and maintains “discipline”.

In sociology it is possible to map the main fault lines within the discipline in several ways. First, we can distinguish quantitative-statistical research from both qualitative-ethnographic approaches and comparative-historical approaches. (It’s worth observing that this results in a tripartite division of methods rather than the simpler bipolar “quantitative-qualitative” divide. Historical and comparative studies are distinct from statistical studies, but they are also distinguishable from ethnographic interpretations.) Or we can characterize this space in terms of “large-N, small-N, single-N” studies. And we can distinguish broadly among positivist and anti-positivist perspectives; causal and interpretive perspectives; realist and anti-realist perspectives; critical and orthodox perspectives; and there are probably other important dimensions of disagreement as well. In addition to these large divisions among methodological approaches, there are also a large number of frameworks of thought that involve a combination of method and theory — for example, feminist sociology, post-structuralist sociology, critical theory, and post-colonial sociology.

The disputes between these methodological frameworks seem to continue to create large, fractious divides within graduate sociology departments, with advocates for one method or the other claiming virtually exclusive legitimacy. And this struggle for methodological primacy appears to extend to the editorial policies of major sociology journals, association programs, and tenure deliberations. Until fairly recently — the 1990s, let us say — the quantitative-statistical faction held sway as the hegemonic methodological doctrine. Inspired by positivism and the example of the natural sciences, and perhaps guided by governmental and foundation funding priorities, quantitative studies were considered most scientific, most rigorous, most objective, and most explanatory. Historical and interpretive studies were treated as “idiographic” or anecdotal — not well suited to discovering important social regularities. And yet it seems apparent that many problems of sociological interest are not amenable to quantitative or statistical research.

Let’s consider for a moment how these issues ought to work — how method, theory, and the world ought to be related. In any area of science there is a range of phenomena that we want to understand. So we need to have tools for investigating the real, empirical characteristics of this stuff, and we aim to arrive at theories that explain the more interesting features of this domain of real phenomena. Finally, we need some intellectual resources on the basis of which to arrive at the desired knowledge — we need some methods of inquiry, some models of theory, and some ideas about the underlying ontology of the phenomena we are studying. So the world exists; we want to gain knowledge and understanding of this world; and we need some tools for investigating and theorizing this world.

But here is the key point: the central focus here is knowledge, not method. Method is a tool for helping us to arrive at knowledge. For any given empirical question there will be a variety of methods on the basis of which to investigate this problem. And ideally, we should select a set of tools that are well suited to the particular characteristics of the problem at hand.

In other words, analysis of the situation of knowledge producers would suggest methodological pluralism. We should be open to a variety of tools and methods, and should design research in a way that is closely tailored to the nature of the empirical problem. And therefore young sociologists — graduate students — should be encouraged to be eclectic in their reading and thinking; they should be exposed to many of the approaches, perspectives, and methods through which imaginative sociologists have addressed their problems of research and explanation.

This general recommendation in favor of pluralism in sociology is strengthened when we consider the fact of the inherent heterogeneity of the social world. (See an earlier posting on this subject.) There is not one single kind of social process, for which there might conceivably be a uniquely best kind of method of inquiry. Rather, the social world consists of a deeply heterogeneous mix of processes, some of which are better suited to an ethnographic or comparative approach, just as other processes may be best studied quantitatively. A researcher interested in the topic of corruption, for example, will need to be informed about institutions, culture, principal-agent problems, social psychology, and many other potentially relevant sociological factors. And this research may well require a combination of statistical analysis, comparison across a select group of cases, and ethnographic investigation of a small number of specific cases and individuals.

In other words, there are very deep arguments supporting the value and epistemic suitability of methodological pluralism. And this in turn suggests that sociology departments are well advised to incorporate a variety of methods and frameworks into their doctoral programs.

Fortunately, it appears that this rethinking is now taking place in a number of top sociology departments in the U.S., and the formerly hegemonic position of quantitative methods is now being challenged by a more pluralistic treatment of methods and frameworks. And this is all to the good: the result will be a better sociology and a better understanding of the heterogeneous, novel, and rapidly changing world in which we find ourselves.

Is sociology analogous to epidemiology?

Quantitative sociology attempts, among other things, to establish causal connections between large social factors (race, socio-economic status, residential status) and social outcomes of interest (rates of delinquency). Is this type of inquiry analogous in any way to the use of large disease databases to attempt to identify risk factors? In other words, is there a useful analogy between sociology and epidemiology?

Suppose that the divorce rate for all American men is 30%. Suppose the rate for New York City males with income greater than $200K is 60%. We might want to draw the inference that something about being a high-income male resident of New York causes a higher risk of divorce for these persons. And we might want to justify this inference by noticing that it is similar to a parallel statistical finding relating smoking to lung cancer. So sociology is similar to epidemiology. Certain factors can be demonstrated to cause an elevated risk of a certain kind of outcome. There are “risk factors” for social outcomes such as divorce, delinquency, or drug use.
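Stated in epidemiological terms, the hypothetical figures amount to a relative-risk claim. A one-line sketch (my framing, not the post’s) makes the quantity explicit:

```python
def relative_risk(exposed_rate, baseline_rate):
    """Ratio of outcome rates: the quantity epidemiologists report as RR."""
    return exposed_rate / baseline_rate

# The hypothetical figures above: a 30% divorce rate overall,
# 60% among high-income New York City men.
rr = relative_risk(0.60, 0.30)
print(rr)  # 2.0: the subgroup's risk is double the national rate
```

A relative risk of 2.0 is exactly the kind of finding that, in epidemiology, would trigger a search for a causal mechanism — which is where the analogy is tested in the paragraphs that follow.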

Is this a valid analogy? I think it is not. Epidemiological reasoning depends upon one additional step — a background set of assumptions about the ontology and etiology of disease. A given disease is a specific physiological condition within a complex system of cells and biochemical processes. We may assume that each of these physiological abnormalities is caused by some specific combination of external and internal factors through specific causal mechanisms. So the causal pathways of normal functioning are discrete and well-defined, and so are the mechanisms that cause disruption of these normal causal pathways. Within the framework of these guiding assumptions, the task of the statistics of epidemiology is to help sort out which factors are causally associated with the disease. The key, though, is that we can be confident that there is a small number of discrete causal mechanisms that link the factor to the disease.

The case is quite different in the social world. Social processes are not similar to physiological processes, and social outcomes are not similar to diseases. In each case the failure of the parallel derives from the fact that there are no unique and physiologically specific causal systems at work. Cellular reproduction has a specific biochemistry. Cancerous reproduction is a specific deviation from these cellular processes. And specific physical circumstances cause these deviations.

Now think about the social world. A process like “urbanization” is not a homogeneous social process. Rather, it is a heterogeneous mix of social developments and events; and these components are different in different times and places. And an outcome that might be considered the social equivalent of disease — a rising murder rate, for example — is also a composite of many distinct social happenings and processes. So social systems and outcomes lack the simple, discrete causal uniformity that is a crucial part of epidemiological reasoning.

This is not to say that there are not underlying causal mechanisms whose workings bring about a sharp increase in, say, the population’s murder rate. Rather, it is to say that there are numerous, heterogeneous and cross-cutting such mechanisms. So the resultant social outcome is simply the contingent residue of the multiple middle-level processes that were in play in the relevant time period. And the discovery that “X, Y, Z factors are correlated with a rise in the incidence of O” isn’t causally irrelevant. But the effects of these factors must be understood as working through their influence on the many mid-level causal mechanisms.

How can race be a cause of something like asthma?

Though I’ve framed this posting around the question of “race and asthma,” the question here isn’t really about public health. It is rather concerned with the general question: how can a group characteristic be a causal factor in enhancing some other group characteristic?

Suppose the facts are these: that African-Americans have a higher probability of developing asthma, even controlling for income levels, education levels, age, and urban-suburban residence. (I don’t know if the facts support this statement, but it is the logic that I am concerned with here.) And suppose that the researcher summarizes his/her findings by saying that “being African-American causes the individual to have a higher risk of developing asthma.” How are we supposed to interpret this claim?

My preferred interpretation of statements like these is to hypothesize a causal mechanism, presently unknown, that influences African-American people differentially and produces a higher incidence of asthma. Here are a few possibilities:

  • (a) African-Americans as a population have a lower level of access to quality healthcare and are more likely to be uninsured. Asthma is a disease that is best treated on the basis of early diagnosis. Therefore African-Americans are more likely to suffer from undiagnosed and worsening asthma. This hypothesis is inconsistent with the assumed facts, however, in that the assertion is that the pattern persists even when we control for income.
  • (b) Asthma is an inner-city disease. It is stimulated by air pollution. African-Americans are more likely to live in inner-city environments because of the workings of residential segregation. So race causes exposure which in turn causes a higher incidence of the disease. (Again, this hypothesis is inconsistent with the stated facts that stipulate having controlled for residence.)
  • (c) There might be an unidentified gene that is more frequent in people with African ancestry than non-African ancestry and that makes one more susceptible to asthma. If this were correct, then we would expect the discrepancy to disappear if we control for frequency of this gene. Groups of white and black people randomly selected but balanced so that the frequency of the gene is the same in both groups should show the same incidence of asthma.
  • (d) It could be that there is a nutritional component to the onset of asthma, and it could be that cultural differences between the two communities lead the African-American population to have higher levels of exposure to the nutritional cause of the disease.

And of course we could proliferate possible mechanisms.
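Mechanism (c) can be made concrete with a toy calculation (every number below is invented for illustration): if a single gene were the only risk factor, then groups balanced to the same gene frequency would show identical expected incidence, while unequal gene frequencies would reproduce the observed group-level gap.

```python
def expected_incidence(gene_freq, p_with_gene, p_without_gene):
    """Population asthma incidence when risk depends only on a single gene."""
    return gene_freq * p_with_gene + (1 - gene_freq) * p_without_gene

# Hypothetical risk levels under mechanism (c): the gene alone drives risk.
P_WITH, P_WITHOUT = 0.20, 0.05

# Two groups balanced to the same gene frequency show the same expected
# incidence, whatever their racial composition.
group_a = expected_incidence(0.10, P_WITH, P_WITHOUT)
group_b = expected_incidence(0.10, P_WITH, P_WITHOUT)
print(group_a == group_b)  # True

# Unequal gene frequencies reproduce the observed group-level gap.
print(expected_incidence(0.30, P_WITH, P_WITHOUT) >
      expected_incidence(0.10, P_WITH, P_WITHOUT))  # True
```

This is just the arithmetic behind the prediction in (c): control for gene frequency and the racial discrepancy should disappear.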

In each case the logic of the account is similar. We proceed by hypothesizing a factor or combination of factors that increases the likelihood of developing asthma; and then we try to determine whether this collateral factor is more common in the African-American community. Some of these stories would amount to spurious correlations, while others would constitute stories in which the fact of race (as opposed to a factor with which race is accidentally correlated) plays an essential role in the causal story. (Reduced access to healthcare and inner-city air pollution fall in this category, since it is institutional race segregation that causes the higher-than-normal frequency of urban residence for African-Americans.)

So this is a potential interpretation of the causal meaning of a statement like “race causes an increased risk of X.” But is this now a fact about individuals or groups? Do the causal interpretations here disaggregate from group to individual? Does “higher incidence in the population” disaggregate onto statements about the factors that influence the individual’s separate risk? It appears that this causal mechanism interpretation does in fact disaggregate to the individual level, since each describes a factor that pertains to the individual and that directly influences his/her likelihood of developing the disease.

What would be most perplexing is if there were multiple sets of causal mechanisms, each independent of the others and each creating a race-specific difference in incidence of the disease. For example, it might be that both exposure to air pollution and lack of health insurance lead to a higher incidence of the disease; and further, it might be that inner-city residents do in fact have adequate healthcare but exposure to inner-city pollution; while suburban African-Americans might have less healthcare and limited exposure to air pollution. In this set of facts, both African-American populations would display higher-than-normal incidence, but for different and unrelated reasons.
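A minimal numerical sketch (all rates invented) shows how the two unrelated mechanisms in this scenario would each produce an elevated incidence:

```python
# Toy numbers (invented) for the two-mechanism scenario sketched above.
BASELINE = 0.05            # asthma incidence with neither risk factor
POLLUTION_EXTRA = 0.04     # added risk from inner-city air pollution
NO_INSURANCE_EXTRA = 0.03  # added risk from lacking health insurance

inner_city = BASELINE + POLLUTION_EXTRA   # insured, but polluted air
suburban = BASELINE + NO_INSURANCE_EXTRA  # clean air, but uninsured

# Both subpopulations exceed the baseline, yet for unrelated reasons:
print(inner_city > BASELINE, suburban > BASELINE)  # True True
print(round(inner_city, 2), round(suburban, 2))    # 0.09 0.08
```

The aggregate statistic “African-Americans have elevated asthma risk” would be true of both subpopulations here, even though no single mechanism explains both — which is exactly what makes the case perplexing for causal inference.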

Aggregating social trends

Would we say that discerning and aggregating social trends is an important kind of social knowledge? What about explaining social trends? What is a social trend, anyway?

Suppose people notice that crimes are getting less frequent but more violent; or that Thai restaurants are replacing Chinese restaurants at the bottom end in Chicago; or that young people are using instant messages more than telephone or email. Are these social trends? I suppose that, most simply, these are statements about changing frequencies of certain kinds of social occurrences over time. And “noticing” means counting and tracking the results of past observations. There are important conceptual and measurement issues here — how do we identify the kinds of events whose frequency is “important” or “of interest”? How do we assure that we are counting in a way that is accurate — not over- or under-reporting? But the basic logic is fairly clear. To say that there is a trend for X is to say that the frequency of X relative to the population is changing in a sustained way.

Is the discovery of social trends an important effort for sociology? Probably so, for several reasons. We are interested in knowing “how society is changing” — and the frequencies of various kinds of social actions are themselves an important component of social change. So discovering and documenting changing patterns of social behavior — “trend-spotting” — is an important piece of sociological discovery.

But there are two other reasons to think trends are important for sociology. A trend (more Thai restaurants) may be an indicator of some other more important social change — aging of the Chinese population in Chicago, or an influx of Southeast Asian immigrants in a certain time period. But, second, some social trends may function as causes of other future changes in social behavior. A rising trend in male-female sex ratios in China or India may be a potential cause of future social disorder; a rising frequency of soccer violence may be both an indication of rising youth alienation and a cause of future state action (more repression, more social welfare intervention). A trend towards longer sentences for non-violent crime may produce a rise in violent crime in the future (as non-violent criminals “graduate” to violent crime through longer exposure to violent criminals). Finally, discovery of trends can produce strategies of adaptive behavior — trends in consumer taste may permit some businesses to create a whole new market for a product, trends in violent crime may produce new policing strategies, and trends in young voters using the web for communication may suggest new campaign tactics.

Finally, it should be noted that some “trends” may not be true features of social behavior at all, but rather the result of heightened awareness of certain kinds of social behavior. (Teachers often say “our students are getting worse every year” … even though the statistics on performance say the opposite.)

We asked above what is involved in explaining social trends. Surely there must be a range of different social mechanisms that would produce change in the frequencies of various social behaviors: incentives, filters, external shocks, imitation, changes in the composition of the underlying population. For example, suppose we observe a trend towards fewer incidents of purse-snatching in Miami, and we observe as well that the median age of the population has increased from 50 to 60. The change in the age structure may explain the trend. In this case, there may be no change at all in the behaviors of the various age groups — no trend there — but a change in frequency relative to the total population nonetheless.
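The composition effect in the purse-snatching example can be sketched numerically (per-group rates and population shares are invented for illustration):

```python
# Invented per-capita purse-snatching rates, constant within each age group.
RATES = {"under_60": 0.010, "over_60": 0.002}

def aggregate_rate(shares):
    """Population-wide incident rate as a share-weighted average."""
    return sum(shares[g] * RATES[g] for g in RATES)

before = aggregate_rate({"under_60": 0.6, "over_60": 0.4})  # younger city
after = aggregate_rate({"under_60": 0.4, "over_60": 0.6})   # median age rises

# The aggregate "trend" falls even though no age group changed its behavior.
print(before > after)  # True
```

The aggregate rate falls from 0.0068 to 0.0052 incidents per capita purely because the population shifted toward the lower-rate group — a “trend” in the aggregate with no behavioral trend underneath it.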

(This discussion of “social trends” places the concept in the domain of “changing distributions of individual social behavior”. Is there a structural counterpart about collective entities? Can we make true and justified statements about the trends of change among — labor unions, states, churches, or universities?)
