Psychological Testing: What It Is, How It Works

Psychological testing is a measurement procedure used to describe or predict behavioral, cognitive, emotional, or symptomatic characteristics of the person taking the test, or of the person to whom the test refers (some tests are given to people who know the person of interest, but most are taken by the person of interest).

To stay within the area of clinical psychology (mental health psychology), there are several kinds of tests often used. According to my simple classification scheme developed for educating the public, there are personality, cognitive, behavioral, diagnostic, and achievement tests. Sub-specialty tests, such as forensic psychology or neuropsychology tests, would actually fall under one or more of these broad categories but with a more specialized focus. Also, some tests incorporate elements of more than one classification.

Please note that my scheme is simply convenient; there are research-based classifications of tests based upon what they do and how they do it. My discussion of other test aspects is also based upon research but does not always use the 'official' terms or terms typical in my field, as I wanted to write a simpler overview. Note also that in counseling psychology, industrial-organizational psychology, and other fields there are other kinds of tests, such as 'interest' tests designed to detect interests in different professions, or even in-vivo (live) behavioral tests such as sessions designed to replicate a 'rough day at the office' for important, stressful, and expensive executive positions.

I am not going to give away any test secrets. What I will present is a brief overview of each category of test, some examples by title only, and some basic information on how tests are generally interpreted. This is intended only to inform the public of the value of psychological testing.

'Types' of Tests

Personality tests can sometimes overlap with diagnostic or symptom-related tests. Broadly, a personality test is designed to describe or predict usual attitudes, behaviors, or traits related to the examinee's interpersonal perception (how they see others) and intra-personal perception (how they see themselves). Famous examples include the MMPI-2 (which is structured and taken with paper and pencil) and the Rorschach (which is less structured and involves interviewing the examinee about their perceptions of inkblots).

Cognitive tests are used to describe or predict a person's mental abilities. For instance, two persons may each have reasonable ability to creatively solve problems, but which one can do so more quickly or more flexibly? How strong is a person's concentration and memory? Is the person better at solving problems verbally, structurally, nonverbally, holistically? The list of cognitive ('thinking') abilities that can be tested is very long and detailed. Cognitive tests include IQ tests, neuropsychological tests, and specialized instruments used in research, to name only three types. Famous examples include the WAIS-IV, WMS-IV, Stanford-Binet V, Bender-Gestalt-II, and many, many individual neuropsychological tests and test batteries.

Behavioral inventories are based upon the report of people who know the person in question, or upon direct examiner observation of the person in question. These measures are often used in ADHD diagnosis or in determining a person's ability to function in daily life (for example, used together with an IQ test to determine the possibility of mental retardation or developmental disability). Examples include behavior checklists or the Vineland-II Adaptive Behavior Scales. Behavioral assessment is also common among practitioners of applied behavior analysis, which is used for treating very serious behavior problems in people with developmental disabilities or very severe mental illness.

Diagnostic tests frequently use an interview format, though some of them are given with paper and pencil like a personality test. Some interviews are highly structured (and are thus more reliable), but they tend to be less flexible, may alienate or bore the examinee, and may not be as adaptable to a given case. Some interviews are not very structured (and are thus less reliable), but are more flexible and interactive. A good assessment will usually include parts of both interview styles. Examples include the Structured Clinical Interview for DSM-IV (SCID, a comprehensive structured interview) and the Beck Depression Inventory (paper and pencil, but focused only on depression).

Achievement tests measure how well the examinee does on academic measures of reading, writing, and mathematics (to name three broad categories). Other measures that test mainly knowledge could probably be categorized as achievement tests as well. It is important to understand that the results of these tests will be partly associated with the person's cognitive abilities, for example because knowledge tests usually involve some reasoning ability and some consideration of speed, exactitude, or both. Examples include the Wide Range Achievement Test - 4 or the high school SAT.

Interpretation of Tests

Tests are often interpreted according to whether they use or do not use some 'standard' or 'reference point,' and according to what that reference point (if any) is.

Rater-based reference point--in this interpretation, the test being used usually only refers to categories--diagnosis, for example. Structured interviews often fall in this category, and the only purpose of the test is to tell whether the person has a diagnosis or not. Comparison along a continuous line of percentiles or scores is not a part of this referencing. Here, the main concern is the reliability of the agreement between two or more examiners and the validity of the categories between which they choose.
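As a rough illustration of what 'reliability of the agreement between examiners' can mean in numbers, one commonly used statistic is Cohen's kappa, which corrects the raw percentage of agreement for the agreement two raters would reach by chance. The sketch below is not tied to any particular test; the function name and the ratings are made up for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters' category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    # Proportion of examinees on whom the two raters chose the same category
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from how often each rater used each category
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings for ten examinees ("dx" = diagnosis given, "none" = not)
examiner_1 = ["dx", "dx", "none", "none", "dx", "none", "none", "dx", "none", "none"]
examiner_2 = ["dx", "none", "none", "none", "dx", "none", "none", "dx", "dx", "none"]

kappa = cohens_kappa(examiner_1, examiner_2)
print(f"kappa = {kappa:.2f}")
```

In this made-up case the examiners agree on 8 of 10 examinees, but part of that agreement would be expected by chance alone, so kappa comes out lower than the raw 80 percent.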

No reference point (other than the examinee)--this can tell us a lot about the qualities of an examinee, but there is no way to measure those qualities against the same qualities of other persons. However, some tests that can measure against other people also include elements of this 'qualitative' description. This type of interpretation simply interprets 'type' of content and 'amount of X relative to Y for this examinee,' but not 'amount of X or Y relative to others.'

For example, one could note that the examinee did better on measures of concentration than on measures of reasoning, but could not compare these statements to the performance of other persons. Of course, here we are assuming that the number of questions or items for reasoning and for concentration are equal and that each reasoning item is of the same difficulty ('hardness') as each corresponding concentration item. Evaluating difficulty is hard to do without some outside reference point, and this brings us to norms...

Norm-based reference point--in this interpretation used by many psychological tests, the score of the examinee is compared to the scores of other test-takers (usually hundreds to thousands of other examinees). This allows the scores to be interpreted in terms of their distance from the average (usually the 'mean') and in terms of percentiles. For example, if scores are roughly normally distributed, a person whose score on a measure of extraversion (outgoingness) is 'one standard deviation above the mean' is at approximately the 84th percentile relative to his or her peers in regard to that one characteristic.
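The arithmetic behind the '84th percentile' figure can be sketched in a few lines. This assumes the norm group's scores follow a normal (bell-shaped) distribution, which real tests only approximate; the function name and the example scale (mean 50, standard deviation 10) are chosen for illustration only.

```python
from statistics import NormalDist


def percentile_from_score(score, norm_mean, norm_sd):
    """Convert a raw score to a percentile, assuming normally distributed norms."""
    # Distance from the norm group's mean, in standard deviation units
    z = (score - norm_mean) / norm_sd
    # Share of the norm group expected to score at or below this point
    return 100 * NormalDist().cdf(z)


# A score one standard deviation above the mean on a hypothetical scale
print(round(percentile_from_score(60, norm_mean=50, norm_sd=10)))  # → 84
```

A score exactly at the mean lands at the 50th percentile under the same assumption.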

Criterion-based reference--in this interpretation, also used in many tests and often used in conjunction with norm-referencing, certain score levels on the test are known to be highly associated with certain behaviors or outcomes (criteria) with some degree of probability. Usually this knowledge is acquired through research done in developing or confirming the results of the test. For example, a test could help one decide which person to hire for a job; a particular score on a test designed to measure organizational ability (the ability to prioritize and sort) might be highly correlated with success in a particular executive position. Other test results might be highly predictive of suicide or another more clinical concern.

Often norm-referencing is used to give some idea of how an examinee compares to peers, while at the same time criterion-referencing research is used to tell the interpreter of the test what the score means in terms of important associated outcomes. For example, a high IQ score is not just 'higher cognitive ability than most of her peers,' it is also usually predictive of high academic achievement and high-level professional employment. Of course, these predictions are not perfect, and neither are norm-based interpretations (or any interpretation for that matter).

For this reason, all good tests have information about their 'reliability.' Reliability gives knowledge about:
The usual error rates of a test
The amount of expected error in any score
The degree to which portions of the test agree with or are sensibly related to other portions
The degree to which separate raters agree, and/or
The degree to which one examinee's scores on a test at one time agree with their scores at another time.
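To make 'the amount of expected error in any score' concrete, classical test theory offers the standard error of measurement: the score scale's standard deviation multiplied by the square root of one minus the reliability coefficient. The sketch below uses hypothetical numbers (a scale with standard deviation 15 and a reliability of 0.91) and is not taken from any specific test manual.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), per classical test theory."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1 - reliability)


def score_band(observed, sd, reliability, z=1.96):
    """Approximate 95% band around an observed score (z=1.96 assumes normal error)."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem


# Hypothetical scale: SD of 15, reliability coefficient of 0.91
sem = standard_error_of_measurement(15, 0.91)
low, high = score_band(100, 15, 0.91)
print(f"SEM = {sem:.1f}, band = {low:.1f} to {high:.1f}")
```

With these assumed values the expected error is about 4.5 points, so an observed score of 100 is better read as 'probably somewhere between about 91 and 109' than as an exact figure.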

Good tests also should have information available about their 'validity':
The extent to which the test actually measures what it is supposed to
The degree to which the test adequately measures a specific type of content
The degree to which the test is sensibly associated or non-associated with other similar and dissimilar tests, and/or
The degree to which the test actually has a reliable association with important outcomes.
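The last point, association with important outcomes, is often summarized as a correlation between test scores and some later outcome. The sketch below computes a Pearson correlation by hand; the scores and job-performance ratings are invented purely for illustration and do not come from any real test or study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list has no variation")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: test scores and later performance ratings for six people
test_scores = [42, 55, 61, 48, 70, 66]
performance = [3.1, 3.4, 4.0, 2.9, 4.5, 4.6]

r = pearson_r(test_scores, performance)
print(f"r = {r:.2f}")
```

A value near 1 or -1 indicates a strong association and a value near 0 a weak one. Real validity coefficients are usually far more modest than the one produced by this tidy made-up sample.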

Hopefully this overview will be helpful for anyone curious about psychological tests!

