

From the document Data Mining Using (pages 106-110)

Unsupervised Learning Methods

4.7 PCA and EFA Using SAS Macro FACTOR

4.7.2 Help File for SAS Macro FACTOR

1. Macro-call parameter: Input SAS dataset name (required parameter).

Descriptions and explanation: Input the name of the temporary (member_name) or permanent (libname.member_name) SAS dataset for the raw data on which PCA or EFA will be performed.

If only the correlation matrix is available and PCA or EFA is to be performed on it, create a special SAS correlation matrix dataset (TYPE=CORR).
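A SAS special correlation matrix dataset is, to my understanding, a TYPE=CORR dataset such as the one PROC CORR writes with its OUTP= option. In that layout, a `_TYPE_` column marks rows of means (MEAN), standard deviations (STD), sample size (N) and correlations (CORR), and `_NAME_` labels each CORR row. The Python sketch below only mimics that layout to show what such a dataset carries. The helper `corr_type_rows` and the sample values are hypothetical, and the real dataset should be created in SAS.

```python
import numpy as np


def corr_type_rows(data, names):
    """Mimic the row layout of a SAS TYPE=CORR dataset (hypothetical helper).

    Returns a list of (_TYPE_, _NAME_, values) tuples: MEAN, STD and N rows,
    followed by one CORR row per variable.
    """
    x = np.asarray(data, dtype=float)
    n, p = x.shape
    rows = [
        ("MEAN", "", x.mean(axis=0)),
        ("STD", "", x.std(axis=0, ddof=1)),   # sample standard deviation
        ("N", "", np.full(p, float(n))),
    ]
    corr = np.corrcoef(x, rowvar=False)       # p x p Pearson correlations
    for name, row in zip(names, corr):
        rows.append(("CORR", name, row))
    return rows


# Illustrative values only, not data from the book.
data = [[1.0, 2.0, 0.5], [2.0, 3.5, 0.1], [3.0, 6.5, 0.9], [4.0, 7.0, 0.2]]
for type_, name, values in corr_type_rows(data, ["Y2", "X4", "X8"]):
    print(f"{type_:<5}{name:<4}", np.round(values, 3))
```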

Options/examples:

Permanent SAS dataset — gf.cars93 (LIBNAME: gf; SAS dataset name: cars93)

Temporary SAS dataset — cars93 (SAS dataset name)

2. Macro-call parameter: Exploratory graphs (optional parameter).

Descriptions and explanation: This macro-call parameter is used to select the type of analysis (exploratory graphics and descriptive statistics or PCA/EFA).

Options/examples:

Yes — Only simple descriptive statistics, correlation matrix, and scatterplot matrix of all variables are produced. PCA/EFA is not performed.

Blank — If the macro input field is left blank, no descriptive statistics, correlation matrix, or scatterplot matrix are produced. Only PCA or EFA is performed, depending on the macro input in option #6.
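The Yes option reports simple descriptive statistics and a correlation matrix. Those can be approximated with per-variable N, mean, standard deviation, minimum and maximum, plus the Pearson correlation matrix. This is a Python sketch with a hypothetical `describe` helper, not the macro's code. The exact set of statistics the macro prints may differ, and the scatterplot matrix is omitted.

```python
import numpy as np


def describe(data, names):
    """Simple descriptive statistics and the correlation matrix (hypothetical helper)."""
    x = np.asarray(data, dtype=float)
    stats = {}
    for j, name in enumerate(names):
        col = x[:, j]
        stats[name] = {
            "n": int(col.size),
            "mean": float(col.mean()),
            "std": float(col.std(ddof=1)),  # sample standard deviation
            "min": float(col.min()),
            "max": float(col.max()),
        }
    return stats, np.corrcoef(x, rowvar=False)


stats, corr = describe([[1, 2], [2, 4], [3, 6]], ["X4", "X8"])
print(stats["X4"])
print(corr)
```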

3. Macro-call parameter: Input continuous multi-attribute variable names (required parameter for PCA or EFA on raw data).

If this field is left blank, the macro is expected to perform PCA or EFA on a special correlation matrix. The dataset specified in macro input option #1 should be in the form of a correlation matrix.

3456_Book.book Page 96 Thursday, November 21, 2002 12:40 PM

Descriptions and explanation: Input the names of the continuous multi-attribute variables in the SAS dataset that should be included in the PCA/EFA. If no raw data are available but the correlation matrix is, and PCA or EFA is to be performed on this correlation matrix, leave this field blank. In that case, make sure that the SAS dataset specified in macro input option #1 is a special correlation matrix.

Options/examples:

Y2 Y4 X4 X8 X11 X15 (names of continuous multi-attributes)

4. Macro-call parameter: Check for assumptions (optional statement).

Descriptions and explanation: To check for multivariate normality and for the presence of any extreme multivariate outliers or influential observations, input YES. If this field is left blank, this step is omitted.

Options/examples:

Yes — Statistical estimates of multivariate skewness and multivariate kurtosis, together with their statistical significance, are produced. Q–Q plots for checking multivariate normality and plots for detecting multivariate outliers are also produced.

Blank — If the macro input field is left blank, no multivariate normality checks or outlier detection are performed.
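Mardia's measures are a standard way to obtain the multivariate skewness and kurtosis estimates and significance tests described for the Yes option. The sketch below assumes the macro uses these textbook formulas:

- skewness b1 with the chi-square statistic n·b1/6 on p(p+1)(p+2)/6 degrees of freedom;
- kurtosis b2 with a normal z test against p(p+2).

It is written in Python for illustration, and the helper name `mardia` is hypothetical. The squared Mahalanobis distances it also returns are the usual input for chi-square Q–Q plots and multivariate outlier plots.

```python
import math

import numpy as np


def mardia(data):
    """Mardia's multivariate skewness and kurtosis with their usual large-sample tests."""
    x = np.asarray(data, dtype=float)
    n, p = x.shape
    centered = x - x.mean(axis=0)
    s = centered.T @ centered / n                    # covariance with divisor n
    d = centered @ np.linalg.inv(s) @ centered.T     # d[i, j] = (x_i - m)' S^-1 (x_j - m)
    b1 = float((d ** 3).sum()) / n ** 2              # multivariate skewness
    b2 = float((np.diag(d) ** 2).sum()) / n          # multivariate kurtosis
    skew_chi2 = n * b1 / 6.0                         # approx. chi-square under normality
    skew_df = p * (p + 1) * (p + 2) // 6
    kurt_z = (b2 - p * (p + 2)) / math.sqrt(8.0 * p * (p + 2) / n)
    kurt_p = math.erfc(abs(kurt_z) / math.sqrt(2.0))  # two-sided normal p-value
    return {
        "b1": b1, "skew_chi2": skew_chi2, "skew_df": skew_df,
        "b2": b2, "kurt_z": kurt_z, "kurt_p": kurt_p,
        "mahalanobis_sq": np.diag(d).copy(),         # squared distances for outlier plots
    }


rng = np.random.default_rng(0)
result = mardia(rng.standard_normal((100, 3)))
print(round(result["b2"], 2), result["skew_df"], round(result["kurt_p"], 3))
```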

5. Macro-call parameter: Input number of PC or factors (required option).

Descriptions and explanation: Input the number of principal components or factors to be extracted. The number should be from 1 to the total number of multi-attributes.

Options/examples:

2 3 3 4

6. Macro-call parameter: Input the factor analysis method (required option).

Descriptions and explanation: Various factor analysis methods are available in SAS PROC FACTOR, but to perform PCA, input the factor analysis method P; the macro then uses the default prior communality estimate of 1. To perform factor analysis, select any factor analysis method other than P.
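With method P, the 1s on the diagonal of the correlation matrix are kept as prior communalities. The extracted components are therefore the eigenvectors of that matrix, and scaling each by the square root of its eigenvalue gives the loadings. The following is a minimal Python sketch, assuming a correlation matrix as input. `pca_from_corr` is a hypothetical name, and the signs of the loadings may differ from PROC FACTOR output because eigenvectors are defined only up to sign.

```python
import numpy as np


def pca_from_corr(corr, n_components):
    """PCA with prior communality 1: eigendecomposition of the correlation matrix."""
    r = np.asarray(corr, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(r)             # ascending order for symmetric input
    order = np.argsort(eigvals)[::-1]                # sort components by variance explained
    eigvals = np.clip(eigvals[order], 0.0, None)     # guard against tiny negative round-off
    eigvecs = eigvecs[:, order]
    loadings = eigvecs[:, :n_components] * np.sqrt(eigvals[:n_components])
    communalities = (loadings ** 2).sum(axis=1)      # variance of each variable reproduced
    return eigvals, loadings, communalities


corr = [[1.0, 0.6, 0.3], [0.6, 1.0, 0.2], [0.3, 0.2, 1.0]]
eigvals, loadings, communalities = pca_from_corr(corr, 2)
print(np.round(eigvals, 3), np.round(communalities, 3))
```

The eigenvalues sum to the number of variables, and with all components retained every communality equals the prior value of 1.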

Options/examples:

P — PCA with the default prior communality estimate of 1. The scree plot analysis also includes a parallel analysis plot.


PRINIT — Iterative PCA with the default prior communality estimate SMC (squared multiple correlation). The scree plot analysis also includes a parallel analysis plot.

ML — Maximum-likelihood factor analysis with the default prior communality estimate SMC. The scree plot analysis does not include a parallel analysis plot.

For other EFA methods, refer to the PROC FACTOR section in the SAS online manual (reference 16).
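The parallel analysis plot that accompanies the scree plot for P and PRINIT compares the observed eigenvalues with eigenvalues from random, uncorrelated data of the same size. The following is a sketch of Horn's procedure in Python. It assumes the mean random eigenvalues serve as the reference and uses 100 replications; the macro's actual reference values and replication count are not stated here, and `parallel_analysis` is a hypothetical name.

```python
import numpy as np


def parallel_analysis(data, n_iter=100, seed=0):
    """Horn's parallel analysis on the correlation matrix of `data` (rows = observations)."""
    x = np.asarray(data, dtype=float)
    n, p = x.shape
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(x, rowvar=False)))[::-1]
    rng = np.random.default_rng(seed)
    random_eigs = np.empty((n_iter, p))
    for i in range(n_iter):
        noise = rng.standard_normal((n, p))           # uncorrelated normal data, same n and p
        random_eigs[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False)))[::-1]
    reference = random_eigs.mean(axis=0)
    retained = 0
    for obs, ref in zip(observed, reference):         # keep leading components above chance
        if obs <= ref:
            break
        retained += 1
    return observed, reference, retained


rng = np.random.default_rng(1)
factor = rng.standard_normal((200, 1))
data = factor + 0.3 * rng.standard_normal((200, 6))   # six simulated indicators of one factor
observed, reference, retained = parallel_analysis(data)
print(np.round(observed, 2), np.round(reference, 2), retained)
```

On a scree plot, components whose observed eigenvalue lies above the random reference curve are the ones worth retaining.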

7. Macro-call parameter: Input the factor rotation method (required option).

Descriptions and explanation: Different factor rotation methods are available in SAS PROC FACTOR, but to perform PCA, input the factor rotation method None. To perform rotated factor analysis, select one of the following factor rotation methods.

Options/examples:

None — default

V — Varimax, orthogonal rotation

P — Promax, oblique rotation

For other rotation methods, refer to the PROC FACTOR section in the SAS online manual (reference 16).
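Varimax (V) looks for an orthogonal rotation of the loading matrix that maximizes the variance of the squared loadings within each factor. This pushes each variable toward a single factor. The Python sketch below uses the common SVD-based iteration with Kaiser row normalization. It is an illustration rather than PROC FACTOR's implementation, so the rotated loadings may differ in sign or column order. Promax (P), which typically starts from a varimax solution and then relaxes orthogonality, is not shown. The name `varimax` and the sample loadings are hypothetical.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation with Kaiser row normalization."""
    a = np.asarray(loadings, dtype=float)
    p, k = a.shape
    h = np.sqrt((a ** 2).sum(axis=1))                # square roots of communalities
    a_norm = a / h[:, None]                          # Kaiser normalization (rows of unit length)
    rot = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        lam = a_norm @ rot
        target = lam ** 3 - lam @ np.diag((lam ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(a_norm.T @ target)
        rot = u @ vt                                 # nearest orthogonal matrix
        if s.sum() < objective * (1 + tol):          # stop when the criterion stops improving
            break
        objective = s.sum()
    rotated = (a_norm @ rot) * h[:, None]            # undo the normalization
    return rotated, rot


unrotated = np.array([[0.7, 0.4], [0.8, 0.3], [0.6, 0.5], [0.3, -0.7], [0.2, -0.8]])
rotated, rot = varimax(unrotated)
print(np.round(rotated, 3))
```

Because the rotation is orthogonal, each variable's communality is unchanged; only the distribution of loadings across factors changes.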

8. Macro-call parameter: Input ID variable (optional statement).

Descriptions and explanation: Input the name of the variable you would like to treat as the ID. If this field is left blank, a character variable will be created from the observation number and used as the ID variable.

Option/example:

Car ID model
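When the ID field is left blank, the macro creates a character ID from the observation number. The Python sketch below mirrors that rule on a list of records. The `ensure_id` helper and the `_ID_` column name are hypothetical, because the text does not give the name of the generated variable. The sample records are placeholders.

```python
def ensure_id(rows, id_var=None):
    """Return (rows, id_name), building a character ID from the observation number if needed.

    `rows` is a list of dicts, one per observation. "_ID_" is a hypothetical column name.
    """
    if id_var is not None:
        missing = [i for i, row in enumerate(rows, start=1) if id_var not in row]
        if missing:
            raise KeyError(f"ID variable {id_var!r} missing in observations {missing}")
        return rows, id_var
    # Copy each record and add a 1-based observation number stored as text.
    with_id = [dict(row, _ID_=str(obs)) for obs, row in enumerate(rows, start=1)]
    return with_id, "_ID_"


cars = [{"model": "car A", "Y2": 1.0}, {"model": "car B", "Y2": 2.0}]
rows, name = ensure_id(cars)
print(name, [r["_ID_"] for r in rows])   # _ID_ ['1', '2']
```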

9. Macro-call parameter: Folder to save SAS output (optional statement).

Descriptions and explanation: To save the SAS output files in a specific folder, input the full path of the folder. The SAS dataset name will be assigned to the output file. If this field is left blank, the output file will be saved in the default folder.

Options/examples:

Possible values

c:\output\ — folder named “OUTPUT”

s:\george\ — folder named “George” on network drive S

Be sure to include the backslash at the end of the folder name.

10. Macro-call parameter: Folder to save SAS graphics (optional statement).

Descriptions and explanation: To save the SAS graphics files in EMF format, which is suitable for inclusion in PowerPoint presentations, specify TXT as the output format (macro input option #12) in version 8.0 or later. In pre-8.0 versions, all graphics files will be saved in the user-specified folder. If the graphics folder field is left blank, the graphics files will be saved in the default folder.

Options/examples:

Possible values

c:\output\ — folder named “OUTPUT”

11. Macro-call parameter: zth number of run (required statement).

Descriptions and explanation: SAS output files will be saved under a file name formed from the original SAS dataset name and the counter value provided in this field. For example, if the original SAS dataset name is “gf.cars93” and the counter value is 1, the SAS output files will be saved as “gf.cars931.*” in the user-specified folder. By changing the counter value, users can avoid overwriting previous SAS output files with new ones.
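The naming rule joins the dataset name and the run counter, so run 1 of gf.cars93 yields gf.cars931 followed by the extension of the chosen output format. This Python sketch assumes the output folder from option #9 is simply prepended to that name, which would explain why the folder must end with a backslash. `output_path` is a hypothetical helper, and the exact extension the macro writes for each format is an assumption.

```python
def output_path(folder, dataset, run, extension):
    """Form an output file name from folder + dataset name + run counter (hypothetical helper).

    The folder is assumed to be concatenated directly with the file name; an empty
    folder stands for the default folder.
    """
    if folder and not folder.endswith("\\"):
        folder += "\\"                      # add the trailing backslash the text asks for
    return f"{folder}{dataset}{run}.{extension}"


print(output_path("c:\\output\\", "gf.cars93", 1, "rtf"))   # c:\output\gf.cars931.rtf
print(output_path("s:\\george", "gf.cars93", 2, "txt"))     # s:\george\gf.cars932.txt
```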

12. Macro-call parameter: Display or save SAS output (required statement).

Descriptions and explanation: Choose whether to display the output in the OUTPUT window or to save it in a specific file format in the folder specified in option #9.

Options/examples:

Possible values

DISPLAY: Output will be displayed in the OUTPUT window, all SAS graphics will be displayed in the GRAPHICS window, and system messages will be displayed in the LOG window.

WORD: Output and all SAS graphics will be saved together in the user-specified folder and will be displayed in the VIEWER window as a single RTF format file if MS WORD is installed on the system (version 8.0 and later) or saved only as a text file in pre-8.0 versions. All graphics files (CGM) will be saved separately in a user-specified folder (macro input option #10).

WEB: Output and graphics are saved in the user-specified folder and are viewed in the results VIEWER window as a single HTML file (version 8.0 and later) or saved only as a text file in pre-8.0 versions. All graphics files (GIF) will be saved separately in a user-specified folder (macro input option #10).

PDF: Output and graphics are saved in the user-specified folder and are viewed in the results VIEWER window as a single PDF file (version 8.2 and later) or saved only as a text file in pre-8.2 versions. All graphics files (PNG) will be saved separately in a user-specified folder (macro input option #10).


TXT: Output will be saved as a TXT file in all SAS versions. No output will be displayed in the OUTPUT window. All graphics files will be saved in EMF format (version 8.0 and later) or CGM format (pre-8.0 versions) in a user-specified folder (macro input option #10).

Note: System messages are deleted from the LOG window if DISPLAY is not selected as the input.
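The format rules for the display/save option can be summarized as a lookup. This Python sketch encodes only what the option descriptions state. `output_formats` is a hypothetical helper, and it assumes that the graphics format for WORD, WEB and PDF does not depend on the SAS version, which the descriptions do not say explicitly.

```python
def output_formats(option, sas_version):
    """Map an output option and SAS version to (report format, graphics format).

    `sas_version` is a (major, minor) tuple such as (8, 2). DISPLAY writes no files,
    so it returns (None, None).
    """
    option = option.upper()
    if option == "DISPLAY":
        return None, None
    if option == "WORD":
        return ("RTF" if sas_version >= (8, 0) else "TXT"), "CGM"
    if option == "WEB":
        return ("HTML" if sas_version >= (8, 0) else "TXT"), "GIF"
    if option == "PDF":
        return ("PDF" if sas_version >= (8, 2) else "TXT"), "PNG"
    if option == "TXT":
        return "TXT", ("EMF" if sas_version >= (8, 0) else "CGM")
    raise ValueError(f"unknown output option: {option!r}")


print(output_formats("pdf", (8, 1)))   # ('TXT', 'PNG')
print(output_formats("WEB", (8, 2)))   # ('HTML', 'GIF')
```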

4.7.3 Case Study 1: Principal Component Analysis of 1993
