RESEARCH ISSUES

SHAWNEE VICKERY, Feature Editor, Eli Broad Graduate School of Management, Michigan State University

Assessments of Validity

by Cornelia Dröge, Eli Broad Graduate School of Management, Michigan State University

This article is the second of two that address the following question: how valid are our measurements? The first article appeared in the September/October 1996 issue of Decision Line. Its content can be summarized as follows. The terms measurement, reliability, and validity were defined. The impact of measurement problems on the ability of researchers to draw substantive theoretical conclusions was discussed. An example focusing on a hypothesized positive relationship between manufacturing flexibility and firm performance was used to illustrate:

  1. a variety of measurement decisions that researchers must make;
  2. questions about reliability or validity related to these decisions; and
  3. the possible impact of these decisions on the conclusions reached about the relationship between flexibility and performance.

After key terms were defined and the importance of measurement in the research process was discussed, the article turned to the topic of reliability. Several ways of assessing aspects of reliability were discussed, including Cronbach's alpha for internal consistency, theta from factor analysis, KR20 for a set of dichotomously scaled items, and key informant reliability. The article concluded with a very brief discussion of the meaning of reliability when measurement focuses on classifying rather than quantifying.

It is very important to note that measurement can be reliable but not valid. The simple classic example is a thermometer that always registers a temperature T that is 5 degrees more than the actual temperature. T and the actual temperature are perfectly correlated, and a thousand judges could agree on what T is; i.e., T could be deemed perfectly reliable. It is only because we have many other indicators of temperature, and thus sufficiently clear norms for measurement, that we know T is not valid. Unfortunately, for the topics of interest to business researchers, measurement is not at a comparable level of development. Even for ``hard'' data such as the Consumer Price Index, questions about validity can be raised: Does the CPI overestimate the inflation rate? The Boskin Senate Finance Committee has said that it does, by 1.1%. Overall, it is essential for researchers to assess both reliability and validity.
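The thermometer example can be sketched numerically. The snippet below is only an illustration: the temperature values are hypothetical, and the variable names (`actual`, `T`) are chosen here for convenience. It shows that a reading can correlate perfectly with the true value while still being systematically wrong.

```python
import numpy as np

# Hypothetical true temperatures (illustrative values, not from the article).
actual = np.array([10.0, 15.0, 20.0, 25.0, 30.0])

# A thermometer that always registers 5 degrees more than the actual temperature.
T = actual + 5.0

# Consistency check: the readings track the true values perfectly.
correlation = np.corrcoef(T, actual)[0, 1]

# Validity check: the readings are systematically off by a constant amount.
bias = float(np.mean(T - actual))

print(f"correlation between T and actual: {correlation:.3f}")
print(f"average error of T: {bias:.1f} degrees")
```

A high correlation alone therefore says nothing about whether the readings are correct; the bias is visible only because the actual temperature is available for comparison.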

The purpose of this second article is to list some of the types of validity assessment that are most relevant in business research. The reader is invited to consult the seminal reference books listed below for complete discussions about reliability, validity, and the research process. Validity assessment involves demonstrating that the theoretical construct supposedly being measured is actually being measured by the empirical indicator(s). ``Validity'' is usually preceded by an adjective (such as ``construct'' or ``discriminant'') that indicates what type of validity is being assessed.

Construct Validity

Probably the most important type of validity is construct validity. Construct validity can be thought of as the degree of correspondence between a construct and its operationalizations, where that correspondence is evaluated within a nomological net (i.e., within a theoretical context). It is important to note that an evaluation of the construct validity of concept X with indicators x1, x2, x3, and x4 requires that at least one other construct Y be specified theoretically in relation to X and empirically with its own indicators. The researcher can then proceed to evaluate construct validity by assessing:

  1. unidimensionality and reliability, which are necessary prerequisites;
  2. convergent validity;
  3. discriminant validity; and possibly
  4. nomological validity.

Authors differ on which additional criteria are needed to claim construct validity, but those listed certainly form the core of every author's recommendations.

Assessing unidimensionality means determining whether a set of indicators reflects one underlying factor (as opposed to more than one). In the example discussed in the first part of this series, the unidimensionality of firm performance as measured by ROI, ROA, ROI growth, and ROA growth could be determined by conducting an exploratory factor analysis and finding that only the first factor has an eigenvalue of over one. Lately unidimensionality has been identified as a crucial implicit assumption of valid measurements (Hattie, 1985). If the set of indicators supposedly measuring the same theoretical construct actually reflects more than one factor, then construct validity cannot be claimed and reliability measures such as Cronbach's alpha may be meaningless (Gerbing and Anderson, 1988).
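The eigenvalue-over-one check can be sketched as follows. This is a minimal illustration, not a full exploratory factor analysis: it simply counts the eigenvalues of a correlation matrix that exceed one. The function name `count_factors_kaiser` and both correlation matrices are hypothetical, chosen so that the first set of four indicators reflects one factor and the second reflects two.

```python
import numpy as np

def count_factors_kaiser(corr):
    """Count eigenvalues of a correlation matrix that exceed one.

    Returns the count and the eigenvalues sorted from largest to smallest.
    """
    eigenvalues = np.linalg.eigvalsh(np.asarray(corr, dtype=float))
    return int(np.sum(eigenvalues > 1.0)), eigenvalues[::-1]

# Hypothetical correlations among ROI, ROA, ROI growth, and ROA growth
# in which every pair of indicators correlates 0.6.
one_factor = np.full((4, 4), 0.6)
np.fill_diagonal(one_factor, 1.0)

# Hypothetical alternative: the level indicators and the growth indicators
# form two unrelated pairs.
two_factor = np.array([
    [1.0, 0.8, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.8],
    [0.0, 0.0, 0.8, 1.0],
])

n_one, eig_one = count_factors_kaiser(one_factor)
n_two, eig_two = count_factors_kaiser(two_factor)

print("first matrix: ", n_one, "eigenvalue(s) over one")
print("second matrix:", n_two, "eigenvalue(s) over one")
```

In the second case a single performance scale built from all four indicators would mix two dimensions, which is the situation in which a reliability coefficient for the combined scale becomes hard to interpret.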

Convergent and discriminant validities are often evaluated together. Convergent validity is the degree of convergence seen when two attempts are made to measure the same construct through maximally different methods. Different methods are necessary so that common method variance is minimized. Discriminant validity assesses the degree to which a concept and its indicators differ from another concept and its indicators.

One of the early and still classic ways of investigating convergent and discriminant validities is Campbell and Fiske's multitrait-multimethod approach. This approach is based on an analysis of the correlations among indicators of different constructs along four criteria. The four criteria may lead to different conclusions about overall validity. The basic idea is that the correlations among indicators of the same construct should be (1) ``sufficiently'' different from zero and (2) greater than the correlations with indicators of different constructs. A more sophisticated approach to evaluating convergent and discriminant validities is confirmatory factor analysis, using LISREL, for example. This approach has the advantage of allowing an overall inference about validity, as well as simultaneously giving estimates of indicator and composite reliabilities.
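The basic idea behind the multitrait-multimethod comparison can be sketched in a few lines. The example below is an assumption-laden illustration and does not implement all four of Campbell and Fiske's criteria: it checks only that same-construct, different-method correlations are reasonably large and exceed the correlations between different constructs. The two methods ("survey" and "archival"), the correlation values, the function name `mtmm_summary`, and the 0.5 cutoff for ``sufficiently'' different from zero are all hypothetical.

```python
from itertools import combinations

# Each measure is a (construct, method) pair; values are hypothetical.
measures = [
    ("flexibility", "survey"),
    ("flexibility", "archival"),
    ("performance", "survey"),
    ("performance", "archival"),
]
corr = [
    [1.00, 0.70, 0.30, 0.20],
    [0.70, 1.00, 0.25, 0.35],
    [0.30, 0.25, 1.00, 0.65],
    [0.20, 0.35, 0.65, 1.00],
]

def mtmm_summary(measures, corr, min_convergent=0.5):
    """Split correlations into same-construct and different-construct sets."""
    convergent, heterotrait = [], []
    for i, j in combinations(range(len(measures)), 2):
        trait_i, method_i = measures[i]
        trait_j, method_j = measures[j]
        if trait_i == trait_j and method_i != method_j:
            convergent.append(corr[i][j])   # same construct, different method
        elif trait_i != trait_j:
            heterotrait.append(corr[i][j])  # different constructs
    return {
        "convergent": convergent,
        "heterotrait": heterotrait,
        # The cutoff is an arbitrary assumption for this sketch.
        "convergent_ok": all(r >= min_convergent for r in convergent),
        "discriminant_ok": min(convergent) > max(heterotrait),
    }

summary = mtmm_summary(measures, corr)
print(summary)
```

A check of this kind yields separate verdicts rather than a single overall inference, which is one reason confirmatory factor analysis is often preferred.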

The final common criterion for construct validity is nomological validity, or the degree to which the construct as measured by a set of indicators predicts other constructs that past theoretical and empirical work says it should predict. Suppose a researcher proposes an entirely new way to measure manufacturing flexibility and has demonstrated reliability, unidimensionality, and convergent and discriminant validities. However, this new construct is not related to other constructs in established ways that past research strongly supports. ``Lack of nomological validity'' in this case is an expression of the intuitively appealing idea that the new construct is suspect: although past measurement could be faulty or so-called established relationships could be wrong, the burden of proof is on the researcher proposing the new way to measure the construct.

Two final comments about construct validity are noteworthy. First, from the discussion about the various aspects of construct validity it should be clear that a theoretical framework is necessary in order to do an assessment. A single construct cannot be assessed in isolation. The theory that specifies which indicators measure which constructs and the theory that specifies which constructs are related to which constructs are intimately fused when it comes to assessing construct validity. Second, there is a completely different approach to construct validation. Cook and Campbell (1979) propose that known threats to validity should be evaluated lest they lead to confounding. Their seminal book, which lists and explains these threats, is a particularly valuable reference for researchers contemplating experimental research. Due to lack of space, this approach cannot be discussed here.

Criterion-Related Validity

Another type of validity is criterion-related validity. One type is concurrent validity, which is assessed when the indicator(s) and the targeted criterion occur at the same point in time. For example, the Vanguard Index Trust 500 Portfolio is supposed to mimic the Standard & Poor's 500-stock index (which itself is an indicator of stock market ``performance''). Another type is predictive validity. Here, an indicator or a set of indicators is used to predict something. The indicators are often called ``the test'' and the key point is how well the test predicts the outcome. For example, a commodity price index may predict the inflation rate, a selection test may predict job performance, GMAT results may predict success in MBA studies, and so on.

The correlation between the test result and the criterion is the most important indicator of criterion-related validity, and thus this form of validity is far less dependent than construct validity on the theoretical net of constructs. Of course, the test items are not chosen in an atheoretical manner in practice. But as far as assessing criterion-related validity is concerned, the theoretical relationship between the test and the criterion need not be shown. The test items also need not be unidimensional, and indeed are often chosen to cover a multitude of dimensions. Finally, a potentially serious problem in assessing criterion-related validity is the selection and measurement of an appropriate criterion. Referring to the examples given earlier, how is ``inflation rate,'' ``job performance,'' or ``success in MBA studies'' measured? If the criterion is measured unreliably, the simple correlation between the test and the criterion will be attenuated and the researcher may wrongly conclude that the test is not valid. Even the selection of a criterion is problematic when the research topic of interest is leadership, self-esteem, or consumer personality, to name just a few.
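The attenuation effect can be illustrated with the standard relationship from classical test theory, under which the observed correlation equals the true correlation multiplied by the square root of the product of the two reliabilities. The sketch below assumes that relationship holds; the function names and the numerical values (a true correlation of 0.6, a test reliability of 0.9, and a criterion reliability of 0.5) are hypothetical.

```python
import math

def attenuated_correlation(true_r, test_reliability, criterion_reliability):
    """Observed correlation implied by a true correlation and two reliabilities."""
    return true_r * math.sqrt(test_reliability * criterion_reliability)

def disattenuated_correlation(observed_r, test_reliability, criterion_reliability):
    """Estimate of the true correlation, corrected for unreliability."""
    return observed_r / math.sqrt(test_reliability * criterion_reliability)

# Hypothetical case: a reliable test paired with an unreliably measured criterion.
observed = attenuated_correlation(0.6, 0.9, 0.5)
recovered = disattenuated_correlation(observed, 0.9, 0.5)

print(f"observed correlation:  {observed:.2f}")
print(f"corrected correlation: {recovered:.2f}")
```

Here a true correlation of 0.6 shows up as roughly 0.40, so a test with a substantial relationship to the criterion could look weak simply because the criterion is poorly measured.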

Content Validity

Content validity refers to the extent to which measurement reflects the domain of the concept. Ideally, the full domain of the concept is first specified through a complete review of the available literature. The domain may be divided into dimensions (and possibly subdimensions) because general concepts such as manufacturing flexibility are usually analyzed in disaggregate terms such as volume flexibility, mix flexibility, changeover flexibility, and so on. Frequently, the researcher will immediately encounter problems with semantic validity because the concept does not have uniform semantic usage in the literature. Even widely used terms such as core competencies and JIT suffer from this problem.

Once the domain has been specified, the researcher must construct items or select items from the literature's pool so that the items capture the meaning either of the entire concept or of each of the concept's dimensions. As a practical matter, it is always better to have too many rather than too few items, since items can be dropped but rarely added during later research stages.

Unfortunately, there is no rigorous way (not even through correlations) to assess the content validity of the set of items that are chosen and then operationalized. The researcher must argue that the meanings as stated by the literature's experts are captured in the current research, or alternatively, that the current research's focus is on particular dimensions of the concept's domain. The researcher might also ask certain experts directly for their opinion about the set of items as it is being developed. These experts may be other researchers, industry experts, managers, and/or others with expertise, and the purpose of soliciting their opinion is to assess whether the instrument is appropriate ``on the face of it'' (i.e., face validity).

Concluding Comments

The purpose of this two-part series is to introduce the reader to the core concepts of reliability and validity of measurement. Because of the limited space available, complete discussions about reliability and validity were not possible. Some types of reliability and validity (such as statistical conclusion validity) were not discussed, and important advances in measurement research centering on confirmatory factor analysis (CFA) were merely mentioned. Yet, hopefully, the reader will be left with (1) an awareness of the importance of the theory of measurement in the research process, and (2) an understanding of some of the key issues when that often implicit theory of measurement is put to the test.

References

Cook, Thomas D. and Donald T. Campbell (1979), Quasi-Experimentation: Design and Analysis Issues for Field Settings, Houghton Mifflin Co., Boston, MA.

Gerbing, David W. and James C. Anderson (1988), ``An Updated Paradigm for Scale Development Incorporating Unidimensionality and Its Assessment'', Journal of Marketing Research, 25 (May), pp. 186-192.

Hattie, John R. (1985), ``Methodological Review: Assessing Unidimensionality of Tests and Items'', Applied Psychological Measurement, 9 (June), pp. 139-164.

Kerlinger, Fred N. (1986), Foundations of Behavioral Research (Third Edition), Harcourt Brace Jovanovich College Publishers, Orlando, FL.

Nunnally, Jum C. (1978), Psychometric Theory (Second Edition), McGraw-Hill Inc., New York.