Criteria for assessing economic models

How can we assess the epistemic warrant of an economic model that purports to represent some aspects of economic reality?  The general problem of assessing the credibility of an economic model can be broken down into more specific questions concerning the validity, comprehensiveness, robustness, reliability, and autonomy of the model. Here are initial definitions of these concepts.

  • Validity is a measure of the degree to which the assumptions employed in the construction of the model are thought to correspond to the real processes underlying the phenomena represented by the model. 
  • Comprehensiveness is the degree to which the model is thought to succeed in capturing the major causal factors that influence the features of the behavior of the system in which we are interested. 
  • Robustness is a measure of the degree to which the results of the model persist under small perturbations in the settings of parameters, formulation of equations, etc. 
  • Autonomy refers to the stability of the model’s results in the face of variation of contextual factors. 
  • Reliability is a measure of the degree of confidence we can have in the data employed in setting the values of the parameters. 

These are features of models that can be investigated more or less independently and prior to examination of the empirical success or failure of the predictions of the model.

Let us look more closely at these standards of adequacy. The discussion of realism elsewhere suggests that we may attempt to validate the model deductively, by examining each of the assumptions underlying construction of the model for its plausibility or realism (link). (This resembles Mill’s “deductive method” of theory evaluation.) Economists are highly confident in the underlying general equilibrium theory. The theory is incomplete (or, in Daniel Hausman’s language, inexact; link), in that economic outcomes are not wholly determined by purely economic forces. But within its scope economists are confident that the theory identifies the main causal processes: an equilibration of supply and demand through market-determined prices.

Validity can be assessed through direct inspection of the substantive economic assumptions of the model: the formulation of consumer and firm behavior, the representation of production and consumption functions, the closure rules, and the like. To the extent that the particular formulation embodied in the model is supported by accepted economic theory, the validity of the model is enhanced. On the other hand, if particular formulations appear to be ad hoc (introduced, perhaps, to make the problem more tractable), the validity of the model is reduced. If, for example, the model assumes linear demand functions and we judge that this is a highly unrealistic assumption about the real underlying demand functions, then we will have less confidence in the predictive results of the model.

Unfortunately, there can be no fixed standard of evaluation concerning the validity of a model. All models make simplifying and idealizing assumptions; so to that extent they deviate from literal realism. And the question of whether a given idealization is felicitous or not cannot always be resolved on antecedent theoretical grounds; instead, it is necessary to look at the overall empirical adequacy of the model. The adequacy of the assumption of fixed coefficients of production, for example, cannot be assessed a priori; in some contexts and for some purposes it is a reasonable approximation of the economic reality, while in other cases (notably when input substitution is extensive) it introduces unacceptable distortion of the actual economic processes. What can be said concerning the validity of a model’s assumptions is rather minimal but not entirely vacuous. The assumptions should be consistent with existing economic theory; they should be reasonable and motivated formulations of background economic principles; and they should be implemented in a mathematically acceptable fashion.

Comprehensiveness too is a weak constraint on economic models. It is plain that all economic theories and models disregard some causal factors in order to isolate the workings of specific economic mechanisms; moreover, there will always be economic forces that have not been represented within the model. So judgment of the comprehensiveness of a model depends on a qualitative assessment of the relative importance of various economic and non-economic factors in the particular system under analysis. If a given factor seems to be economically important (e.g. input substitution) but unrepresented within the model, then the model loses points on comprehensiveness.

Robustness can be directly assessed through sensitivity analysis, a technique widely used by economists. The model is run a large number of times, varying the values assigned to parameters (reflecting the range of uncertainty in estimates or observations). If the model continues to yield qualitatively similar findings, it is said to be robust. If solutions vary wildly under small perturbations of the parameter settings, the model is rightly thought to be a poor indicator of the underlying economic mechanisms.
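Here is a minimal sketch of what such a sensitivity analysis might look like in practice. Everything in it is an illustrative assumption rather than part of any model discussed here: the toy linear market (demand a - b*p, supply c + d*p), the baseline parameter values, the 5% perturbation range, and the function names are all hypothetical.

```python
import random


def equilibrium_price(a, b, c, d):
    """Price that clears a toy linear market: demand a - b*p, supply c + d*p."""
    return (a - c) / (b + d)


def sensitivity_runs(model, baseline, rel_range=0.05, runs=1000, seed=0):
    """Rerun `model`, drawing each parameter uniformly within +/- rel_range of its baseline."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    results = []
    for _ in range(runs):
        draw = {
            name: value * (1 + rng.uniform(-rel_range, rel_range))
            for name, value in baseline.items()
        }
        results.append(model(**draw))
    return results


def max_relative_deviation(results, reference):
    """Largest relative gap between any perturbed result and the baseline result."""
    return max(abs(r - reference) / abs(reference) for r in results)


# Hypothetical baseline parameter values, chosen only for illustration.
BASELINE = {"a": 100.0, "b": 2.0, "c": 10.0, "d": 1.0}

reference = equilibrium_price(**BASELINE)
results = sensitivity_runs(equilibrium_price, BASELINE)
spread = max_relative_deviation(results, reference)

print(f"baseline price: {reference:.2f}")
print(f"largest relative deviation over {len(results)} runs: {spread:.1%}")
```

If the reported spread stays modest relative to the size of the perturbations, the toy model counts as robust in the sense described above; a spread many times larger than the perturbations would be the warning sign. A real exercise would also check whether qualitative conclusions (the sign of an effect, the ranking of policies) survive, not just the magnitude of one output.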

Autonomy is the theoretical equivalent of robustness. It is a measure of the stability of the model under changes of assumptions about the causal background of the system. If the model’s results are highly sensitive to changes in the environment within which the modeled processes take place, then we should be suspicious of the results of the model.

Assessment of reliability is also somewhat more straightforward than comprehensiveness and validity. The empirical data used to set parameters and exogenous variables have been gathered through specific well-understood procedures, and it is mandatory that we give some account of the precision of the resulting data.

Note that reliability and robustness interact; if we find that the model is highly robust with respect to a particular set of parameters, then the unreliability of estimates of those parameters will not have much effect on the reliability of the model itself. In this case it is enough to have “stylized facts” governing the parameters that are used: roughly 60% of workers’ income is spent on food, 0% is saved, etc.
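The interaction can be made concrete with a small sketch. It assumes, purely for illustration, a toy linear market model and hypothetical parameter values; the rule of thumb it applies (output error is roughly the elasticity of the output with respect to a parameter times the relative error in that parameter) is a first-order approximation, not a general guarantee.

```python
def equilibrium_price(a, b, c, d):
    """Price that clears a toy linear market: demand a - b*p, supply c + d*p."""
    return (a - c) / (b + d)


def elasticity(model, params, name, step=1e-4):
    """Central-difference estimate of the % change in output per % change in one parameter."""
    up = dict(params, **{name: params[name] * (1 + step)})
    down = dict(params, **{name: params[name] * (1 - step)})
    base = model(**params)
    return (model(**up) - model(**down)) / (2 * step * base)


def implied_output_error(param_elasticity, rel_param_error):
    """First-order estimate of the relative output error caused by a parameter error."""
    return abs(param_elasticity) * rel_param_error


# Hypothetical baseline parameter values, chosen only for illustration.
BASELINE = {"a": 100.0, "b": 2.0, "c": 10.0, "d": 1.0}

# Suppose every parameter is measured with the same (assumed) 20% relative error.
for name in BASELINE:
    e = elasticity(equilibrium_price, BASELINE, name)
    err = implied_output_error(e, 0.20)
    print(f"{name}: elasticity {e:+.3f}, implied output error {err:.1%}")
```

In this toy setup the price is far less sensitive to the supply intercept c than to the demand intercept a, so the same poorly measured value does much less damage in the first case. That is the sense in which a rough "stylized fact" can suffice for parameters to which the model is robust.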

Failures along each of these lines can be illustrated easily.

  1. The model assumes that prices are determined on the basis of markup pricing (costs plus a fixed exogenous markup rate and wage). In fact, however, we might believe (along neoclassical lines) that prices, wages, and the profit rate are all endogenous, so that markup pricing misrepresents the underlying price mechanism. This would be a failure of validity; the model is premised on assumptions that may not hold. 
  2. The model is premised on a two-sector analysis of the economy. However, energy production and consumption turn out to be economically crucial factors in the performance of the economy, and these effects are overlooked unless we represent the energy sector separately. This would be a failure of comprehensiveness; there is an economically significant factor that is not represented in the model. 
  3. We rerun the model assuming a slightly altered set of production coefficients, and we find that the predictions are substantially different: the increase in income is only 33% of what it was, and deficits are only half what they were. This is a failure of robustness; once we know that the model is extremely sensitive to variations in the parameters, we have strong reason to doubt its predictions. The accuracy of measurement of parameters is limited, so we can be confident that remeasurement would produce different values. So we can in turn expect that the simulation will arrive at different values for the endogenous variables. 
  4. Suppose that our model of income distribution in a developing economy is premised on the international trading arrangements embodied in GATT. The model is designed to represent the domestic causal relations between food subsidies and the pattern of income distribution across classes. If the results of the model change substantially upon dropping the GATT assumption, then the model is not autonomous with respect to international trading arrangements. 
  5. Finally, we examine the data underlying the consumption functions and we find that these derive from one household study in one Mexican state, involving 300 households. Moreover, we determine that the model is sensitive to the parameters defining consumption functions. On this scenario we have little reason to expect that the estimates derived from the household study are reliable estimates of consumption in all social classes all across Mexico; and therefore we have little reason to depend on the predictions of the model. This is a failure of reliability. 

These factors–validity, comprehensiveness, robustness, autonomy, and reliability–figure into our assessment of the antecedent credibility of a given model. If the model is judged to be reasonably valid and comprehensive; if it appears to be fairly robust and autonomous; and if the empirical data on which it rests appear to be reliable; then we have reason to believe that the model is a reasonable representation of the underlying economic reality. But this deductive validation of the model does not take us far enough. These are reasons to have a priori confidence in the model. But we need as well to have a basis for a posteriori confidence in the particular results of this specific model. And since there are many well-known ways in which a generally well-constructed model can nonetheless miss the mark–incompleteness of the causal field, failure of ceteris paribus clauses, poor data or poor estimates of the exogenous variables and parameters, proliferation of error to the point where the solution has no value, and path-dependence of the equilibrium solution–we need to have some way of empirically evaluating the results of the model.

(Here is an application of these ideas to computable general equilibrium (CGE) models in an article published in On the Reliability of Economic Models: Essays in the Philosophy of Economics; link.  See also Lance Taylor’s reply and discussion in the same volume.)


3 responses

  1. At a minimum, I check for stock/flow consistency in economic models. Another thing I check for is whether the model jibes with real operational realities. A lot of models assume bank lending is reserve constrained. It's not. Banks lend first, then find the required reserves; the Federal Reserve must accommodate the demand for reserves if it wants to hit its interest rate target. A lot of people think our federal government borrows and taxes first, then spends. It doesn't. There is no way to tax or borrow something that you are the monopoly issuer of unless you first spent it into existence. That's why I like MMT:

  2. If an economic model tries to make predictions, it must reflect the decision-makers and their decision-making rules, whoever the decision-makers are, which could be individuals, organizations, or countries. The various powers of decision-makers and their decision-making rules decide the dynamics of an economic system.

  3. "Unfortunately, there can be no fixed standard of evaluation concerning the validity of a model." I think that you missed consistency. That is basic. As Tschäff said, at minimum an economic model has to obey the accounting rules and be stock-flow consistent. It also has to accord with operational reality. This is the minimum standard, and these are a priori tests. If a model doesn't pass this test, it is illogical and fails out of the gate.
