
Help File for SAS Macro DISJCLUS

From the document Data Mining Using (pages 136-140)

Unsupervised Learning Methods

4.8 Disjoint Cluster Analysis Using SAS Macro DISJCLUS

4.8.2 Help File for SAS Macro DISJCLUS

1. Macro-call parameter: Input SAS dataset name (required parameter).

Descriptions and explanation: Include the name of the temporary (member_name) or permanent (libname.member_name) SAS dataset on which the disjoint cluster analysis will be performed. The data should be in coordinate form: rows (cases) × columns (variables).

Options/examples:

Permanent SAS dataset: gf.cars93 (LIBNAME: gf; SAS dataset name: cars93)

Temporary SAS dataset: cars93 (SAS dataset name)

2. Macro-call parameter: Exploratory cluster analysis (optional parameter).

Descriptions and explanation: Displays the cluster groupings in simple two-variable scatterplots; verifies the optimum number of clusters using the cubic clustering criterion (CCC), the pseudo F statistic (PSF), and the pseudo T² statistic (PST2); and selects variables by backward elimination in stepwise discriminant analysis.

Figure 4.10 Screen copy of DISJCLUS call window showing the macro-call parameters required for performing complete disjoint cluster analysis.

3456_Book.book Page 126 Thursday, November 21, 2002 12:40 PM

Options/examples:

Yes: (1) Results of the disjoint cluster analysis are displayed in a simple scatterplot matrix if the number of multi-attributes is less than eight. (2) Plots of CCC, PSF, and PST2 against cluster numbers ranging from 1 to 20 are produced for verifying the optimum cluster number. (3) Results of backward elimination variable selection in stepwise discriminant analysis are produced. Note: Verification of cluster groupings by canonical discriminant analysis and checking for multivariate normality are not performed if YES is selected in this field.

Blank: Only disjoint cluster analysis and canonical discriminant analysis are performed. Exploratory cluster analysis is not performed.

3. Macro-call parameter: Input the number of disjoint clusters (required parameter).

Descriptions and explanation: Input the number of disjoint clusters you would like to extract.

Options/examples:

3 10 25
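The cluster count in option #3 is fixed before the analysis runs, and every case ends up in exactly one cluster. The help file does not name the algorithm the macro uses, so the following is only a minimal sketch of one common disjoint method (Lloyd-style k-means), written in Python rather than SAS. The function name `kmeans`, the choice of the first k cases as starting centers, and the toy data are all assumptions made for illustration.

```python
import math

def kmeans(points, k, iterations=20):
    """Lloyd-style k-means; the first k points serve as initial centers
    (an illustrative choice, not necessarily what DISJCLUS does)."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each case goes to exactly one (nearest) center,
        # which is what makes the resulting clusters disjoint.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical two-variable data with two visibly separate groups.
data = [(1.0, 1.1), (0.9, 1.0), (1.3, 0.9), (8.0, 8.2), (7.9, 8.1), (8.3, 7.7)]
labels, centers = kmeans(data, k=2)
print(labels)
```

Asking for a different number of clusters (for example 3 instead of 2) changes only the `k` argument; the same cases are simply partitioned more finely.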

4. Macro-call parameter: Check for assumptions (optional statement).

Descriptions and explanation: To check the multivariate normality assumption and to detect extreme outliers or influential observations, input YES. Multivariate normality is a requirement for canonical discriminant analysis but not for disjoint cluster analysis (DCA). If this field is left blank, this step is omitted.

Options/examples:

Yes: Statistical estimates of multivariate skewness and kurtosis and their statistical significance, Q–Q plots for checking multivariate normality, and multivariate outlier detection plots are produced.

Blank: If the macro input field is left blank, no statistical estimates for checking multivariate normality or detecting outliers are computed.

5. Macro-call parameter: Input continuous multi-attribute variable names (required parameter for performing disjoint cluster analysis on coordinate data).

Descriptions and explanation: Input continuous multi-attribute names from the SAS dataset that are to be included in the DISJCLUS analysis.

Options/examples:

X4 X8 X11 X15 (names of continuous multi-attributes)

6. Macro-call parameter: Input ID variable (optional statement).


Descriptions and explanation: Input the name of the variable to be treated as the ID. If this field is left blank, a character variable is created from the observation number and used as the ID variable.

Option/example:

Car ID model

7. Macro-call parameter: Use PRINCOMP of MVAR (optional statement).

Descriptions and explanation: Input YES if severe multicollinearity exists among the multi-attributes. The disjoint cluster analysis is then performed on all principal components of the variables specified in macro input option #5.

Options/examples:

Yes: Cluster analysis based on PCA.

Blank: Cluster analysis based on standardized data.
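When option #7 is left blank, the clustering works on standardized data, which keeps variables measured on large scales from dominating the distances between cases. The sketch below illustrates the idea in Python, not SAS. The function name `standardize`, the use of the sample standard deviation (n - 1 denominator), and the numbers themselves are assumptions for illustration; the help file does not say how the macro standardizes.

```python
import statistics

def standardize(column):
    """Rescale one variable to mean 0 and standard deviation 1."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)  # sample std (n - 1); an assumption
    return [(x - mean) / sd for x in column]

# Hypothetical values for two variables on very different scales.
price = [15.9, 33.9, 29.1, 37.7, 30.0]
weight = [2705, 3560, 3375, 3405, 3640]

z_price = standardize(price)
z_weight = standardize(weight)

# After standardization both variables are on a comparable scale.
print([round(z, 2) for z in z_price])
print([round(z, 2) for z in z_weight])
```

Without this step, a difference of a few hundred units in `weight` would swamp any difference in `price` when distances between cases are computed.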

8. Macro-call parameter: Folder to save SAS output (optional statement).

Descriptions and explanation: To save the SAS output files in a specific folder, input the full path of the folder. The SAS dataset name will be assigned to the output file. If this field is left blank, the output file will be saved in the default folder.

Options/examples:

Possible values

c:\output\ (folder named “OUTPUT”)

s:\george\ (folder named “George” in network drive S)

Be sure to include the backslash at the end of the folder name.

9. Macro-call parameter: Folder to save SAS graphics (optional statement).

Descriptions and explanation: Input the full path of the folder in which the SAS graphics files are to be saved. To save the graphics in the EMF format suitable for inclusion in PowerPoint presentations, specify TXT as the output format (macro input option #11) in version 8.0 or later. In pre-8.0 versions, all graphics files are saved in the user-specified folder. If the graphics folder field is left blank, the graphics files are saved in the default folder.

Options/examples:

Possible values

c:\output\ (folder named “OUTPUT”)

10. Macro-call parameter: zth number of run (required statement).

Descriptions and explanation: SAS output files will be saved by forming a file name from the original SAS dataset name and the counter value provided in this field. For example, if the original SAS dataset name is “gf.cars93” and the counter number included is 1, the SAS output files will be saved as “gf.cars931.*” in the user-specified folder. By changing the counter value, users can avoid replacing the previous SAS output files with new outputs.
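The naming rule in option #10 can be sketched as simple string concatenation. This is an illustration in Python, not the macro's own code: the function name `output_stem` is hypothetical, and the idea that the folder from option #8 is joined by plain concatenation is an assumption, although it would explain why that option asks for a trailing backslash.

```python
def output_stem(folder, dataset, run):
    """Join folder, dataset name, and run counter with no separators
    (an assumed reading of the help text, not the macro's actual code)."""
    return f"{folder}{dataset}{run}"

# Dataset name and counter taken from the help file's own example.
first = output_stem("c:\\output\\", "gf.cars93", 1)
second = output_stem("c:\\output\\", "gf.cars93", 2)

# Under the concatenation assumption, a folder without the trailing
# backslash would run straight into the file name.
missing_slash = output_stem("c:\\output", "gf.cars93", 1)

print(first)
print(second)
print(missing_slash)
```

Because the counter is part of the name, a second run with counter 2 produces a different file stem and leaves the files from run 1 untouched.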

11. Macro-call parameter: Display or save SAS output (required statement).

Descriptions and explanation: Option for displaying all output files in the OUTPUT window or saving as a specific format in a folder specified in macro input option #8.

Options/examples:

Possible values

DISPLAY: Output will be displayed in the OUTPUT window, all SAS graphics will be displayed in the GRAPHICS window, and system messages will be displayed in the LOG window.

WORD: Output and all SAS graphics will be saved together in the user-specified folder and displayed in the VIEWER window as a single RTF format file (version 8.0 and later if MS Word is installed on the system). In pre-8.0 versions, the SAS output is saved only as a text file and all graphics files (CGM) are saved separately in a user-specified folder (macro input option #9).

WEB: Output and graphics are saved in the user-specified folder and are viewed in the results VIEWER window as a single HTML file (version 8.0 and later). In pre-8.0 versions, the SAS output is saved only as a text file and all graphics files (GIF) are saved separately in a user-specified folder (macro input option #9).

PDF: Output and graphics are saved in the user-specified folder and are viewed in the results VIEWER window as a single PDF file (version 8.2 and later if Adobe Acrobat Reader is installed on the system). In pre-8.2 versions, the SAS output is saved only as a text file and all graphics files (PNG) are saved separately in a user-specified folder (macro input option #9).

TXT: Output will be saved as a TXT file in all SAS versions; no output will be displayed in the OUTPUT window. All graphic files will be saved in the EMF format (version 8.0) or CGM format (pre-8.0 versions) in a user-specified folder (macro input option #9).

Note: System messages are deleted from the LOG window if DISPLAY is not selected in macro input option #11.
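The version-dependent behavior of the values for option #11 can be summarized as a small lookup. The sketch below is written in Python purely as a reading aid; the names `FORMATS` and `saved_files` are hypothetical, and treating "version 8.0" for EMF graphics as "8.0 and later" is an assumption based on the wording of option #9.

```python
# option -> (minimum SAS version for single-file output,
#            single-file format, graphics format used in older versions)
FORMATS = {
    "WORD": (8.0, "RTF", "CGM"),
    "WEB": (8.0, "HTML", "GIF"),
    "PDF": (8.2, "PDF", "PNG"),
}

def saved_files(option, sas_version):
    """Return (output format, separately saved graphics format or None)."""
    option = option.upper()
    if option == "DISPLAY":
        return ("OUTPUT window", None)  # nothing is written to a folder
    if option == "TXT":
        # Assumption: EMF applies to version 8.0 and later.
        return ("TXT", "EMF" if sas_version >= 8.0 else "CGM")
    min_version, single_file, old_graphics = FORMATS[option]
    if sas_version >= min_version:
        return (single_file, None)  # output and graphics in one file
    return ("TXT", old_graphics)    # text output plus separate graphics

print(saved_files("WORD", 8.2))
print(saved_files("PDF", 8.0))
print(saved_files("TXT", 6.12))
```

The lookup also makes one detail of the help text easy to see: PDF needs version 8.2, so under version 8.0 it falls back to text output with separate PNG graphics, while WORD and WEB already produce a single file.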


4.8.3 Case Study 3: Disjoint Cluster Analysis of 1993
