Spearman’s proposal that intelligence was made up of both a general and a set of different specific components was known as *two-factor theory*. His conclusions had considerable impact on ideas about the nature of intelligence, but they were also very important from a statistical viewpoint, because they established the concept of modelling test scores as being composed of a number of *independent* contributions, determined by statistically unrelated or *orthogonal* factors (orthogonal literally means ‘at right angles’).

We can look at the whole idea in more depth using graphs. Consider the case of a scatterplot of individuals’ scores on two positively correlated variables, X and Y (e.g. vocabulary level and analogical reasoning). The graph would look something like the one shown in Figure 2.1.

## Scatterplot of Individuals' Scores on Two Variables

From the material we covered on regression in Module 4, you should be familiar with the idea of trying to find the straight line equation in X (i.e. Y = a + bX) that best describes a set of observed data, using the *least squares technique* (in other words, finding the line that fits the observations with the smallest squared deviations between the actual observed values of Y and the line). As we saw then, this line has, as a corollary, the characteristic of accounting for or *explaining the greatest amount of variance* in the observed values of Y of any possible line that could be drawn through those observed values, since by definition the deviations are the smallest that they can possibly be.

It is possible to describe what factor analysis does in analogous terms i.e. as an attempt to fit factors which explain the greatest amount of *covariance* (i.e. shared variance) between X and Y (there is no dependent-predictor relationship between variables in factor analysis), and thus the overall variance in each. This can be done by *rotating the axes* of the graph to the point of best fit, as shown in Figure 2.2.

## Scatterplot of individuals' scores on two variables, with axes rotated to point of best fit
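The rotation idea can be sketched numerically. The snippet below is a minimal illustration, not data from the module: it assumes two standardised variables with an invented correlation of .6, finds the angle of the principal axis, and shows how much variance lies along each of the two rotated, orthogonal axes.

```python
import math

# Two standardised variables X and Y with an illustrative correlation of .6.
# Their covariance matrix is then [[1, r], [r, 1]].
r = 0.6
cxx, cyy, cxy = 1.0, 1.0, r

# Angle of the principal axis (direction of maximum joint variance):
theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)

# Variance of the data projected onto each rotated (orthogonal) axis:
var_I = (cxx * math.cos(theta) ** 2 + cyy * math.sin(theta) ** 2
         + 2 * cxy * math.sin(theta) * math.cos(theta))
var_II = (cxx * math.sin(theta) ** 2 + cyy * math.cos(theta) ** 2
          - 2 * cxy * math.sin(theta) * math.cos(theta))

print(round(math.degrees(theta)))          # 45 (equal variances -> diagonal axis)
print(round(var_I, 2), round(var_II, 2))   # 1.6 0.4: Factor I carries most variance
```

Note that the total variance (1.6 + 0.4 = 2) is unchanged by the rotation; it is merely redistributed so that the first axis carries as much of it as possible.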

There are a number of points to note about this.

Firstly, the *principal axis* (labelled I for Factor I) is fitted along the direction of maximum joint variance, and ‘explains’ a high percentage of the covariance between X and Y. In other words, in the case of our example, individuals’ scores on vocabulary level and their related scores on analogical reasoning are largely accounted for in terms of their position on this common, underlying factor, which might perhaps be interpreted as representing general verbal knowledge.

Secondly, the secondary axis (labelled II for Factor II) accounts for the *residual* covariance (that left over) after Factor I has been considered (which is necessarily a smaller amount). This might be interpreted as representing some more specific characteristic that is common to performance on vocabulary tests and tests of analogical reasoning, such as ability to mentally retrieve words.

Note that Factor II is independent of (uncorrelated with) Factor I since it is orthogonal (at right angles) to it: whatever position or value an individual has on the secondary axis makes *no difference* to their position or value on the principal axis. It therefore becomes possible to consider individual X,Y coordinates (i.e., in our example, a pair of scores on a vocabulary test and a test of analogical reasoning) as determined by the *combined* influence of the position of the person generating those scores on Factor I (general verbal knowledge) and Factor II (ability to retrieve words), as shown in Figure 2.3. Note also that this corresponds exactly with the description provided by Spearman’s two factor theory under the conditions where X and Y share both g and a specific component.

## Diagram showing how an individual's scores on X and Y are determined by that individual's position on Factors I and II

Finally, in terms of our initial data reduction question about describing scores using a *smaller* number of dimensions, this solution is trivial, because there are still *two* factors: all that’s happened is that the axes have been rotated to find a better fit to the data (though this might have some value in terms of pinpointing the underlying influences which determine the more specific scores).

However, with three or more variables or dimensions (i.e. k ≥ 3, where k represents the number of variables), it may be possible to find a factor structure which explains a high amount of variance (i.e. which therefore fits, in this sense) with *fewer* than k dimensions; in other words, one meeting our initial criterion for a solution.

Spearman made one further adjustment to his factor model to account for his observed pattern of correlations between test scores still more accurately. This adjustment proposed that, rather than different tests all measuring g to the same extent, the influence of g was *weighted* differently for different tests, i.e. g did not contribute the same amount to the observed variance for each test:

$$\text{measurement}_{a,\,1\ldots n} = (\text{weight}_a \times g_{1\ldots n}) + \text{specific component}_{a,\,1\ldots n} + \text{error}$$

$$\text{measurement}_{b,\,1\ldots n} = (\text{weight}_b \times g_{1\ldots n}) + \text{specific component}_{b,\,1\ldots n} + \text{error}$$

This model carries a number of important implications for the correlation between test scores. Firstly, the weighting of g (in this example) for each test can be defined as the correlation of that test with g, that is:

$$\text{weight}_a = r_{ag}$$

It follows from this that if g is the only common factor, the correlation between two tests is simply the product of these weightings:

$$r_{ab} = r_{ag} \times r_{bg} = \text{weight}_a \times \text{weight}_b$$
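As a small numerical sketch of this product rule (the weights below are invented for illustration, not values from the module):

```python
# If g is the only common factor, the correlation between two tests is
# simply the product of each test's correlation with g (its weighting).
# These weights are illustrative values, not data from the module.
weight_a = 0.9   # r_ag: correlation of test a with g
weight_b = 0.7   # r_bg: correlation of test b with g

r_ab = weight_a * weight_b
print(round(r_ab, 2))  # 0.63
```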

The *practical* upshot of this was that it allowed variation in the correlations between tests to be modelled more *precisely*, in terms of differences in the extent to which they tapped into or were influenced by the underlying factor, g. The *statistical* upshot was the concept of *factor loadings*: consistent, indexed variations in the extent to which a factor influenced each particular score in a set of variables.

One further step brings us up to the basics of modern techniques of factor analysis. This step is simply that of extending the concept of weightings or factor loadings to the specific component in Spearman’s model – and at the same time allowing that there *may* be more than 2 factors in a particular solution (up to k, the number of variables, potentially, although as we’ve seen, a solution with fewer than k factors is required to meet the objective of increasing parsimony in the description of a set of scores).

In essence then, what the basic version of factor analysis does is:

- *extract factors* by rotating the axes of the k-dimensional space describing the relationships among the k variables, to find the fit that explains the maximum amount of variance (this will always be 100% if all k dimensions are used);
- *determine the optimal solution*, i.e. attempt to reduce the number of factors to the minimum needed to *adequately* explain the data (we will come back to the question of what ‘adequate’ might mean a little later);
- *calculate the factor loadings* which, given this factor solution, are necessary to account for the observed correlations between the k variables.

You can see that the implied mathematics is complex, and the procedure is in fact iterative, i.e. it progresses through a sequence involving making an initial estimate of the parameters, recalculation to improve these, re-checking of the new estimate, and so on till acceptable criteria for convergence between the solution and the data have been met.
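The iterative flavour of the procedure can be illustrated with a toy sketch. The code below is not the algorithm a real statistical package uses; it simply fits a single-factor model $r_{ij} \approx l_i \times l_j$ to an invented correlation matrix by repeatedly re-estimating the loadings until the estimates stop changing.

```python
# Toy sketch of iterative factor fitting: estimate, recalculate, re-check,
# until a convergence criterion is met. The correlation matrix is invented
# so that its exact one-factor solution is loadings (.9, .7, .5).
R = [
    [1.00, 0.63, 0.45],
    [0.63, 1.00, 0.35],
    [0.45, 0.35, 1.00],
]
k = len(R)
l = [0.5] * k  # initial estimate of the loadings

for step in range(200):
    worst = 0.0
    for i in range(k):
        # Re-estimate l_i from the off-diagonal correlations it must reproduce
        num = sum(R[i][j] * l[j] for j in range(k) if j != i)
        den = sum(l[j] ** 2 for j in range(k) if j != i)
        new = num / den
        worst = max(worst, abs(new - l[i]))
        l[i] = new
    if worst < 1e-12:  # acceptable convergence criterion met
        break

print([round(v, 3) for v in l])  # [0.9, 0.7, 0.5]
```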

However, we can look at the solution from a relatively simple example, based on data from four tests, to see a little of how things work out. Table 2.2 presents a two-factor solution which perfectly explains the observed correlations between these four tests.

## Example of a factor analysis solution for scores on four tests

In the table, the columns headed 1–4 give the observed correlations (r) between the four tests, the columns headed I and II give the loadings of each test on Factors I and II, and h² gives the communality.

| tests | 1 | 2 | 3 | 4 | I | II | h² |
|-------|-----|-----|-----|-----|----|----|-----|
| 1 | ( ) | .63 | .45 | .27 | .9 | .0 | .81 |
| 2 | .63 | ( ) | .45 | .37 | .7 | .2 | .53 |
| 3 | .45 | .45 | ( ) | .55 | .5 | .5 | .50 |
| 4 | .27 | .37 | .55 | ( ) | .3 | .8 | .73 |

Note that the relationship between the factor loadings and the correlations between a particular pair of tests is simply as follows:

$$r_{ij}=\sum_{F=1}^{k}(LF_i \times LF_j)$$

where $r_{ij}$ is the correlation between tests i and j, $LF_i$ is the loading of test i on factor F, $LF_j$ is the loading of test j on factor F, and the summation sign indicates that this calculation is repeated and totalled across the factors F, from 1 to k, the number of dimensions (those which have been included in the solution, of course).

What this means in practice is that the correlation between a pair of tests is the sum of the products of the loadings of each test on each factor in turn (cf. the correlation between two tests with only g in common, as mentioned earlier).
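This rule can be checked mechanically against the whole of Table 2.2; the short script below reproduces each observed correlation from the loadings:

```python
# Factor loadings (I, II) and observed correlations from Table 2.2.
loadings = {
    1: (0.9, 0.0),
    2: (0.7, 0.2),
    3: (0.5, 0.5),
    4: (0.3, 0.8),
}
observed = {(1, 2): 0.63, (1, 3): 0.45, (1, 4): 0.27,
            (2, 3): 0.45, (2, 4): 0.37, (3, 4): 0.55}

for (i, j), r in observed.items():
    # Sum, across factors, of the products of the two tests' loadings
    reproduced = sum(li * lj for li, lj in zip(loadings[i], loadings[j]))
    assert abs(reproduced - r) < 1e-9  # matches the observed correlation
```

Every pair passes the check, confirming that this two-factor solution perfectly explains the observed correlations.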

For instance, the correlation in Table 2.2 between test 1 and test 2, $r_{12} = .63$, is:

(loading of test 1 on factor I × loading of test 2 on factor I) + (loading of test 1 on factor II × loading of test 2 on factor II)

$$r_{12} = (LI_1 \times LI_2) + (LII_1 \times LII_2) = (.9 \times .7) + (.0 \times .2) = .63 + 0 = .63$$

Note also that h2 or the *communality* (the total variance in the scores on a given test explained by the common factors) is similarly the sum of the square of the loadings of that test on each factor in turn. For example, for test 1:

$$h^2_1 = 0.81 = (LI_1)^2+(LII_1)^2 = 0.9^2 + .0^2 = 0.81 + 0$$
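The same calculation can be run for all four tests at once; this short sketch recovers the h² column of Table 2.2 from the loadings:

```python
# Factor loadings (I, II) for tests 1-4, from Table 2.2.
loadings = {1: (0.9, 0.0), 2: (0.7, 0.2), 3: (0.5, 0.5), 4: (0.3, 0.8)}

# Communality: sum of the squared loadings of each test across the factors.
h2 = {test: sum(l ** 2 for l in ls) for test, ls in loadings.items()}
print({t: round(v, 2) for t, v in h2.items()})
# {1: 0.81, 2: 0.53, 3: 0.5, 4: 0.73}
```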

Conceptually speaking, the communality values are those we would want to put on the diagonal of the correlation matrix: i.e. the value for the *explained covariance* of the test score or other variable *with itself* – hence the brackets in Table 2.2. The remainder is that which is attributable to error, or which is in some other respect *unique*, and so not further explicable.

Write out the steps in the derivation of the correlation between tests 3 and 4 in Table 2.2 from the loadings of these tests on factors I and II. Similarly, write out the steps in the computation of the communality value for test 3.

Note finally that the implied decomposition of the variance on a test score into that which is explained by common factors and that which is unique allows a *linear equation* for that variance to be constructed, e.g.:

$$var(test1)=(LI_1)^2+(LII_1)^2 + error$$

$$var(test2)=(LI_2)^2+(LII_2)^2 + error$$

Conversely, a linear equation for the factor based on the loadings for each test can also be constructed in the form of a standard regression equation, e.g.:

$$F_I = (LI_1 \times X_1) + (LI_2 \times X_2) + \ldots$$

where $X_1, X_2, \ldots$ are the scores obtained on each test.

As we’ll see a little later, equations of this form can be used to generate *factor scores*. A factor score is a composite score across a set of variables which summarises the values obtained by an individual on those variables in such a way as to reflect the relative influence of a given factor on each variable; in other words, it is a form of *weighted total*. You should be able to see that factor scores arguably represent the most appropriate way of collapsing or combining scores across a series of variables or items.
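As a minimal sketch of the weighted-total idea (the standardised scores below are invented for one hypothetical individual, while the Factor I loadings are taken from Table 2.2; note that statistical packages usually estimate factor-score weights by regression rather than using the loadings directly):

```python
# Factor score for Factor I as a loading-weighted sum of one individual's
# standardised test scores. Loadings are the Factor I column of Table 2.2;
# the z-scores are invented illustrative values.
loadings_I = [0.9, 0.7, 0.5, 0.3]    # Factor I loadings for tests 1-4
z_scores = [1.2, 0.5, -0.3, 0.8]     # hypothetical individual's standardised scores

factor_score_I = sum(w * z for w, z in zip(loadings_I, z_scores))
print(round(factor_score_I, 2))  # 1.52
```

Tests that load heavily on the factor (here, test 1) dominate the composite, which is exactly what makes the factor score reflect the relative influence of the factor on each variable.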