

From the document Data Mining Using (Pages 96-101)

Unsupervised Learning Methods

4.4 Exploratory Factor Analysis

4.4.2 Exploratory Factor Analysis Terminology

4.4.2.1 Communalities and Uniqueness

The proportion of variance in an observed variable that is attributed to the factors is called communality, and the proportion of variance that is not accounted for by the factors is called uniqueness. Thus, the communality and uniqueness of a variable sum to one. Many different methods are available to estimate communalities. One method is to assume that the communalities are equal to 1.0 (i.e., all the variables are completely predicted by the factors), in which case the factor analysis is equivalent to PCA; however, the objective of PCA is dimension reduction rather than explaining observed correlations with underlying factors.

A second method for estimating prior communalities is to use the squared multiple correlation (SMC) in a regression model. In estimating the SMC, each variable is treated in turn as the response in a regression model in which all the other variables are considered predictor or input variables. The estimated R² from this multiple regression is used as the prior communality estimate for that variable in factor extraction. The SMCs for all observed variables are estimated in the same way and used as prior communality estimates. After the factor analysis is completed, the actual communality values are re-estimated and reported as the final communality estimates.
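The SMC priors described above do not require fitting a separate regression for each variable: the R² of variable i regressed on all the others equals 1 minus the reciprocal of the i-th diagonal element of the inverse correlation matrix. The following is a minimal NumPy sketch of that identity, not the SAS implementation; the function name and the 3-variable correlation matrix are hypothetical and used only for illustration.

```python
import numpy as np

def smc_prior_communalities(corr):
    """Squared multiple correlation of each variable with all the others."""
    corr = np.asarray(corr, dtype=float)
    inv_diag = np.diag(np.linalg.inv(corr))
    return 1.0 - 1.0 / inv_diag

# Hypothetical correlation matrix (illustration only, not from the source).
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])

smc = smc_prior_communalities(R)   # prior communality estimates
uniqueness = 1.0 - smc             # communality + uniqueness = 1
print(smc, uniqueness)
```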

4.4.2.2 Factor Analysis Methods

A variety of different factor extraction methods are available in the SAS PROC FACTOR procedure, including principal component, principal factor, iterative principal factor, unweighted least-squares factor, maximum-likelihood factor, alpha factor, image analysis, and Harris component analysis. The two most commonly employed factor analysis techniques are principal factor and maximum-likelihood factor. The various factor analysis techniques employ different criteria for extracting factors. Discussions on choosing different methods of factor extraction can be found in Sharma [8].

4.4.2.3 Sampling Adequacy Check in Factor Analysis

Kaiser–Meyer–Olkin (KMO) statistics predict whether data are likely to factor well, based on the correlations and partial correlations among the variables. A KMO statistic is reported for each variable, and an overall KMO statistic pools the same quantities across all variables. The KMO varies from 0 to 1.0, and the overall KMO should be 0.60 or higher to proceed with successful factor analysis.

3456_Book.book Page 86 Thursday, November 21, 2002 12:40 PM

If the overall KMO statistic is less than 0.60, then drop the variables with the lowest individual KMO statistic values, until the overall KMO rises above 0.60. To compute the overall KMO, find the numerator, which is the sum of squared correlations of all variables in the analysis (except for the 1.0 self-correlations of variables with themselves), then calculate the denominator, which is the sum of squared correlations plus the sum of squared partial correlations of each ith variable with each jth variable, controlling for others in the analysis. The partial correlation values should not be very large for successful factor extraction.
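The numerator and denominator described above can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the SAS implementation: the function name and example matrix are hypothetical, and the partial correlations are obtained from the inverse correlation matrix, which is a standard identity but is not spelled out in the text.

```python
import numpy as np

def kmo(corr):
    """Return (per-variable KMO, overall KMO) for a correlation matrix."""
    corr = np.asarray(corr, dtype=float)
    inv = np.linalg.inv(corr)
    # Partial correlation of i and j, controlling for all other variables.
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    off_diag = ~np.eye(corr.shape[0], dtype=bool)   # drop self-correlations
    r2 = np.where(off_diag, corr ** 2, 0.0)
    p2 = np.where(off_diag, partial ** 2, 0.0)
    per_variable = r2.sum(axis=0) / (r2.sum(axis=0) + p2.sum(axis=0))
    overall = r2.sum() / (r2.sum() + p2.sum())
    return per_variable, overall

# Hypothetical correlation matrix (illustration only).
R = np.array([[1.0, 0.7, 0.1, 0.1],
              [0.7, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.6],
              [0.1, 0.1, 0.6, 1.0]])

per_variable, overall = kmo(R)
print(per_variable, overall)
```

Small partial correlations push the ratio toward 1.0, which is why large partial correlations signal that the data may not factor well.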

4.4.2.4 Estimating the Number of Factors

Methods described in Section 4.3.1 for extracting the optimum number of PCs could also be used in factor analysis. Some of the most commonly used guidelines in estimating the number of factors are the modified Kaiser–Guttman rule, percentage of variance, scree test, size of the residuals, and interpretability [8].

4.4.2.5 Modified Kaiser–Guttman Rule

The modified Kaiser–Guttman rule states that the number of factors to be extracted should be equal to the number of factors having an eigenvalue greater than 1.0. This rule should be adjusted downward when the common factor model is chosen. It has been suggested that the eigenvalue criterion should be lower and around the average of the initial communality estimates.
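One possible reading of this rule is sketched below. With no prior communalities, the function counts eigenvalues of the correlation matrix that exceed 1.0. When priors are supplied, it assumes the common factor model: the diagonal is replaced by the priors and the threshold drops to their average. The function name, the example matrix, and the prior values are all hypothetical.

```python
import numpy as np

def count_factors(corr, prior_communalities=None):
    """Number of eigenvalues above the (modified) Kaiser-Guttman threshold."""
    corr = np.array(corr, dtype=float)      # copy, so the caller's matrix is untouched
    threshold = 1.0
    if prior_communalities is not None:
        prior = np.asarray(prior_communalities, dtype=float)
        np.fill_diagonal(corr, prior)       # reduced correlation matrix
        threshold = prior.mean()            # lowered criterion for the common factor model
    eigenvalues = np.linalg.eigvalsh(corr)
    return int(np.sum(eigenvalues > threshold))

# Hypothetical correlation matrix and prior communalities (illustration only).
R = np.array([[1.0, 0.7, 0.1, 0.1],
              [0.7, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.6],
              [0.1, 0.1, 0.6, 1.0]])

n_pca = count_factors(R)                          # eigenvalue > 1.0
n_efa = count_factors(R, [0.5, 0.5, 0.4, 0.4])    # eigenvalue > mean prior
print(n_pca, n_efa)
```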

4.4.2.6 Percentage of Variance

Another criterion related to the eigenvalue is the percentage of the common variance (defined by the sum of communality estimates) explained by successive factors. For example, if the cutoff value is set at 75% of the common variance, then factors will be extracted until the sum of eigenvalues for the retained factors exceeds 75% of the common variance, defined as the sum of initial communality estimates.
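The cutoff rule above can be sketched as a cumulative sum over the eigenvalues of the reduced correlation matrix (the correlation matrix with the prior communalities on its diagonal). The function name, the example matrix, and the prior values below are hypothetical. Because a reduced matrix can have negative eigenvalues, the cumulative proportion may pass 100% before settling back to it.

```python
import numpy as np

def n_factors_by_common_variance(corr, prior_communalities, cutoff=0.75):
    """Smallest number of factors whose eigenvalues exceed `cutoff` of the common variance."""
    reduced = np.array(corr, dtype=float)
    prior = np.asarray(prior_communalities, dtype=float)
    np.fill_diagonal(reduced, prior)
    eigenvalues = np.sort(np.linalg.eigvalsh(reduced))[::-1]   # largest first
    cumulative = np.cumsum(eigenvalues) / prior.sum()          # share of common variance
    return int(np.argmax(cumulative > cutoff) + 1)

# Hypothetical correlation matrix and prior communalities (illustration only).
R = np.array([[1.0, 0.7, 0.1, 0.1],
              [0.7, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.6],
              [0.1, 0.1, 0.6, 1.0]])
priors = [0.5, 0.5, 0.4, 0.4]

n_75 = n_factors_by_common_variance(R, priors, cutoff=0.75)
n_70 = n_factors_by_common_variance(R, priors, cutoff=0.70)
print(n_75, n_70)
```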

4.4.2.7 Scree/Parallel Analysis Plot

Similar to PC analysis, scree plot and parallel analysis could be used to detect the optimum number of factors for standardized data (see Section 4.3.1); however, the parallel analysis suggested under PC is not valid for the maximum-likelihood-based EFA method.


4.4.2.8 Chi-Square Test in Maximum-Likelihood Factor Analysis Method

This test comprises two separate hypothesis tests. The first test (test of H0: no common factors) tests the null hypothesis that no common factors can sufficiently explain the intercorrelations among the variables included in the analysis. This test should be statistically significant (p < .05); a non-significant value for this test statistic suggests that the intercorrelations may not be strong enough to warrant performing a factor analysis, as the results from such an analysis could probably not be replicated.

The second chi-square test statistic (test of H0: N factors are sufficient) is the test of the null hypothesis that N common factors are sufficient to explain the intercorrelations among the variables, where N is the number of factors specified. This test is useful for testing the hypothesis that a given number of factors is sufficient to account for the data. In this instance, the goal is a small chi-square value relative to its degrees of freedom, which results in a large p value (p > .05). One downside of this test is that the chi-square statistic is very sensitive to sample size: with a large sample, the test will normally reject the null hypothesis of the residual matrix being a null matrix, even when the factor analysis solution is very good. Therefore, be careful in interpreting the significance value of this test. Some datasets do not lend themselves to good factor solutions, regardless of the number of factors extracted.
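The sketch below gives a rough sense of the quantities behind these two tests. For the first hypothesis it uses Bartlett's sphericity statistic, a common chi-square approximation for "the correlation matrix is an identity matrix". This is an assumption made for illustration and may not match the exact statistic SAS reports. For the second hypothesis it computes only the degrees of freedom of the likelihood-ratio test with k factors and p variables. The function names are hypothetical, and turning a statistic into a p value would additionally need a chi-square survival function from a statistics library.

```python
import numpy as np

def bartlett_no_common_factors(corr, n_obs):
    """Bartlett's chi-square statistic and df for H0: no common factors."""
    corr = np.asarray(corr, dtype=float)
    p = corr.shape[0]
    _, logdet = np.linalg.slogdet(corr)               # log of the determinant
    statistic = -(n_obs - 1 - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) // 2
    return statistic, df

def df_k_factors(p, k):
    """Degrees of freedom for H0: k common factors are sufficient."""
    return ((p - k) ** 2 - (p + k)) // 2

# Hypothetical correlation matrix and sample size (illustration only).
R = np.array([[1.0, 0.7, 0.1, 0.1],
              [0.7, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.6],
              [0.1, 0.1, 0.6, 1.0]])

stat, df0 = bartlett_no_common_factors(R, n_obs=100)
print(stat, df0, df_k_factors(6, 2))
```

For a fixed correlation matrix the statistic grows roughly in proportion to the sample size, which is one way to see why large samples tend to produce significant results.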

4.4.2.9 A Priori Hypothesis

The a priori hypothesis can provide a criterion for deciding the number of factors to be extracted. If a theory or previous research suggests a certain number of factors and the analyst wants to confirm the hypothesis or replicate the previous study, then a factor analysis with the prespecified number of factors can be run. Ultimately, the criterion for determining the number of factors should be replicability of the solution. It is important to extract only factors that can be expected to replicate themselves when a new sample of subjects is employed.

4.4.2.10 Interpretability

Another very important criterion for determining the number of factors is the interpretability of the factors extracted. Factor solutions should be evaluated not only according to empirical criteria, but also according to the criterion of theoretical meaningfulness.


4.4.2.11 Eigenvalues

Eigenvalues measure the amount of variation in the total sample accounted for by each factor and reveal the explanatory importance of the factors with respect to the variables. If a factor has a low eigenvalue, then it is contributing little to the explanation of variances in the variables and may be excluded as redundant. Note that the eigenvalue is not the percent of variance explained but rather a measure of “amount” used for comparison with other eigenvalues. The eigenvalue of a factor may be computed as the sum of its squared factor loadings for all the variables. Note that the eigenvalues associated with the unrotated and rotated solution will differ, although the sum of all eigenvalues will be the same.

4.4.2.12 Factor Loadings

Factor loadings are the basis for assigning labels to the various factors, and they represent the correlation or linear association between a variable and the latent factors. Factor loadings are represented by a p × k matrix of correlations between the original variables and their factors, where p is the number of variables and k is the number of factors retained. Factor loadings greater than 0.40 in absolute value are frequently used to make decisions regarding significant loading. As the sample size and the number of variables increase, the criterion may have to be adjusted slightly downward; as the number of factors increases, the criterion may have to be adjusted upward. The procedure described next outlines the steps of interpreting a factor matrix.

Once all significant loadings are identified, we can assign some meaning to the factors based on the factor loading patterns. First, examine the significant loading for each factor. In general, the larger the absolute size of the factor loading for a variable, the more important the variable is in interpreting the factor. The sign of the loading also must be considered in labeling the factors. By considering the loading of all variables on a factor, including the size and sign of the loading, we can determine what the underlying factor may represent.

The squared factor loading is the proportion of variance in that variable explained by the factor. To get the proportion of variance in all the variables accounted for by each factor, sum the squared factor loadings for that factor and divide by the number of variables. Note that the number of variables equals the sum of their variances, as the variance of a standardized variable is 1. This is the same as dividing the factor’s eigenvalue by the number of variables. The ratio of the squared factor loadings for a given variable shows the relative importance of the different factors in explaining the variance of the given variable.
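The arithmetic in this section can be traced on a small loading matrix. The 5 × 2 matrix below is hypothetical and assumes orthogonal factors, so each loading is a variable-factor correlation. Row sums of squared loadings give communalities, and column sums give eigenvalues.

```python
import numpy as np

# Hypothetical p x k loading matrix (5 variables, 2 factors), illustration only.
loadings = np.array([[0.8, 0.1],
                     [0.7, 0.2],
                     [0.6, 0.3],
                     [0.1, 0.8],
                     [0.2, 0.7]])

squared = loadings ** 2
communalities = squared.sum(axis=1)              # variance of each variable explained by the factors
uniqueness = 1.0 - communalities                 # variance left unexplained
eigenvalues = squared.sum(axis=0)                # sum of squared loadings per factor
pct_variance = eigenvalues / loadings.shape[0]   # eigenvalue / number of variables
relative_importance = squared / communalities[:, None]   # per-variable share of each factor
significant = np.abs(loadings) > 0.40            # the 0.40 rule of thumb

print(eigenvalues, pct_variance)
```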


4.4.2.13 Factor Rotation

The idea of simple structure and ease of interpretation form the basis for rotation. The goal of factor rotation is to rotate the factors simultaneously in order to have as many zero loadings on each factor as possible. The sum of eigenvalues is not affected by rotation, but rotation will alter the eigenvalues of particular factors. The rotated factor pattern matrix is calculated by post-multiplying the original factor pattern matrix by the orthogonal transformation matrix.
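The effect of post-multiplying by an orthogonal transformation matrix can be checked numerically. The sketch below uses a hypothetical loading matrix and an arbitrary 30-degree rotation, chosen only for illustration and not produced by any rotation criterion. The communalities and the sum of eigenvalues are unchanged, while the eigenvalue of each individual factor shifts.

```python
import numpy as np

# Hypothetical unrotated loading matrix (illustration only).
loadings = np.array([[0.8, 0.1],
                     [0.7, 0.2],
                     [0.6, 0.3],
                     [0.1, 0.8],
                     [0.2, 0.7]])

theta = np.deg2rad(30.0)                       # arbitrary rotation angle
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # orthogonal transformation matrix

rotated = loadings @ T                         # post-multiplication

eig_before = (loadings ** 2).sum(axis=0)
eig_after = (rotated ** 2).sum(axis=0)
print(eig_before, eig_after, eig_before.sum(), eig_after.sum())
```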

The simplest case of rotation is an orthogonal rotation in which the angle between the reference axes of factors is maintained at 90 degrees. More complicated forms of rotation allow the angle between the reference axes to be other than a right angle (i.e., factors can be correlated with each other). These types of rotational procedures are referred to as oblique rotations. Orthogonal rotation procedures are more commonly used than oblique rotation procedures. In some situations, theory may mandate that underlying latent factors be uncorrelated with each other, and oblique rotation procedures would not be appropriate. In other situations, when the correlations between the underlying factors are not assumed to be zero, oblique rotation procedures may yield simpler and more interpretable factor patterns. In all cases, interpretation is easiest if we achieve what is called simple structure. In simple structure, each variable is highly associated with one and only one factor. If that is the case, we can name factors for the observed variables highly associated with them.

VARIMAX is the most widely used orthogonal rotation method, and PROMAX is the most popular oblique rotation method. VARIMAX rotation produces factors that have high correlations with one smaller set of variables and little or no correlation with another set of variables. Each factor will tend to have either large or small loadings of particular variables on it. A VARIMAX solution yields results that make it as easy as possible to identify each variable with a single factor.
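A compact way to see what VARIMAX does is the widely used SVD-based iteration sketched below. It is a minimal sketch of raw varimax without Kaiser row normalization, so its output may differ from what PROC FACTOR reports. The function names, iteration limits, and test loadings are assumptions made for illustration. The criterion being increased is the variance of the squared loadings within each factor, summed over factors.

```python
import numpy as np

def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    return float(np.sum(np.var(loadings ** 2, axis=0)))

def varimax(loadings, max_iter=100, tol=1e-8):
    """Raw varimax rotation; returns (rotated loadings, orthogonal matrix T)."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    T = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        d_old = d
        Lr = L @ T
        # Gradient of the varimax criterion with respect to the rotation.
        grad = L.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        T = u @ vt                      # nearest orthogonal matrix to the gradient
        d = s.sum()
        if d_old != 0 and d / d_old < 1 + tol:
            break
    return L @ T, T

# Hypothetical example: a simple structure hidden by a 30-degree rotation.
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                   [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
a = np.deg2rad(30.0)
unrotated = simple @ np.array([[np.cos(a), -np.sin(a)],
                               [np.sin(a),  np.cos(a)]])

rotated, T = varimax(unrotated)
print(np.round(rotated, 3))
print(varimax_criterion(unrotated), varimax_criterion(rotated))
```

Columns of the rotated matrix may come back in a different order or with flipped signs, which does not affect interpretation.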

PROMAX rotation is a nonorthogonal rotation method that is computationally faster and therefore is recommended for very large datasets. PROMAX rotation begins with a VARIMAX rotation and makes the larger loadings closer to 1.0 and the smaller loadings closer to 0, resulting in an easy-to-interpret simple factor structure. When an oblique rotation method is performed, the output also includes a factor pattern matrix, which is a matrix of standardized regression coefficients for each of the original variables on the rotated factors. The meaning of the rotated factors is inferred from the variables significantly loaded on their factors. One downside of an oblique rotation method is that if the correlations among the factors are substantial, then it is sometimes difficult to distinguish among factors by examining the factor loadings. In such situations, investigate the factor pattern matrix, which displays the variance explained by each factor and the final communality estimates.

4.4.2.14 Standardized Factor Scores

Standardized factor scores are the scores of all the cases on all the factors, where cases are the rows and factors are the columns. Factor scores can quantify individual cases on a latent factor using a z-score scale, which ranges from approximately –3.0 to +3.0. The SAS FACTOR procedure can provide the estimated scoring coefficients, which are then used in PROC SCORE to produce a matrix of estimated factor scores. These scores can then be input into a SAS dataset for further analysis [10].
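The two-step flow described above (scoring coefficients first, then scores) can be imitated in NumPy. The sketch below is not the SAS procedure. It assumes principal component extraction with two retained factors and regression-type scoring coefficients computed as the inverse correlation matrix times the loadings, and it runs on randomly generated data used only as a stand-in for real observations. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(50, 4))          # hypothetical raw data: 50 cases, 4 variables

# Standardize the variables to z-scores.
z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
corr = np.corrcoef(z, rowvar=False)

# Loadings from the two largest principal components (assumed extraction method).
eigvals, eigvecs = np.linalg.eigh(corr)
top = np.argsort(eigvals)[::-1][:2]
loadings = eigvecs[:, top] * np.sqrt(eigvals[top])

# Scoring coefficients (p x k), then scores (cases x factors).
scoring_coef = np.linalg.solve(corr, loadings)
scores = z @ scoring_coef

print(scores[:5])
print(scores.mean(axis=0), scores.std(axis=0, ddof=1))
```

Under these assumptions each column of scores has mean 0 and standard deviation 1, matching the z-score scale mentioned above. With other extraction methods the estimated scores generally have a standard deviation somewhat below 1.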
