
Help File for SAS Macro REGDIAG

From the document Data Mining Using (pages 181-186)

Supervised Learning Methods: Prediction


5.5 Multiple Linear Regression Using SAS Macro REGDIAG

5.5.2 Help File for SAS Macro REGDIAG

1. Macro-call parameter: Input SAS dataset name (required parameter).

Descriptions and explanation: Include the name of the temporary (member_name) or permanent (libname.member_name) SAS dataset on which the regression analysis will be performed.

3456_Book.book Page 172 Wednesday, November 20, 2002 11:34 AM

Options/examples:

Permanent SAS dataset: gf.sales (LIBNAME: gf; SAS dataset name: sales)

Temporary SAS dataset: sales (SAS dataset name)
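The distinction between one-level (temporary) and two-level (permanent) names can be sketched as a small Python function. This is a hypothetical helper, not part of REGDIAG. It assumes SAS's default behavior of placing one-level names in the WORK library; an assigned USER libref would change that default.

```python
def resolve_dataset(name):
    """Split a SAS dataset reference into (libref, member name).

    Hypothetical helper. Two-level names such as 'gf.sales' are permanent;
    one-level names such as 'sales' are temporary and, by default, live in
    the WORK library. SAS names are case-insensitive, so both parts are
    upper-cased for comparison.
    """
    parts = name.strip().split(".")
    if len(parts) == 1 and parts[0]:
        return "WORK", parts[0].upper()
    if len(parts) == 2 and all(parts):
        return parts[0].upper(), parts[1].upper()
    raise ValueError(f"not a valid SAS dataset reference: {name!r}")


print(resolve_dataset("gf.sales"))  # ('GF', 'SALES')
print(resolve_dataset("sales"))     # ('WORK', 'SALES')
```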

2. Macro-call parameter: Input continuous response variable name (required parameter).

Descriptions and explanation: Input the continuous response variable name from the SAS dataset to model as the target variable.

Option/example:

Y (name of a continuous response)

3. Macro-call parameter: Input group variables (optional parameter).

Descriptions and explanation: To include categorical variables from the SAS dataset as predictors in regression modeling, input the names of these variables. The REGDIAG macro will use PROC GLM for regression modeling and use the categorical variable names in the GLM CLASS statement. If this field is left blank, the REGDIAG macro will use PROC REG for regression modeling and fit the regression model using the continuous variables specified in macro input field #5.

Options/examples:

month manager (regression modeling using PROC GLM)

Blank (if the macro input field is left blank, PROC REG is used)
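The dispatch rule above can be sketched as a Python function that emits the corresponding SAS step: group variables present selects PROC GLM with a CLASS statement, and a blank field selects PROC REG. The function name and the exact statements it produces are illustrative assumptions, not the macro's internal code. The default MODEL options it adds (VIF STB for REG, SOLUTION for GLM) are the defaults listed under parameter #7.

```python
def build_model_step(data, response, model_terms, group_vars=""):
    """Return SAS source for the PROC step chosen by the dispatch rule.

    Hypothetical helper: a non-blank group_vars selects PROC GLM with a
    CLASS statement; otherwise PROC REG fits continuous predictors only.
    """
    if group_vars.strip():
        lines = [
            f"proc glm data={data};",
            f"  class {group_vars};",
            f"  model {response} = {model_terms} / solution;",
        ]
    else:
        lines = [
            f"proc reg data={data};",
            f"  model {response} = {model_terms} / vif stb;",
        ]
    # PROC REG and PROC GLM are interactive procedures, so end with QUIT.
    lines += ["run;", "quit;"]
    return "\n".join(lines)


print(build_model_step("gf.sales", "Y", "X1 month manager", "month manager"))
```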

4. Macro-call parameter: Input the alpha level (required parameter).

Descriptions and explanation: Input the alpha level used to compute the confidence intervals for the parameter estimates.

Options/examples:

0.05 0.01 0.10
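The alpha level enters through the usual t-based interval for each coefficient. This is shown assuming the standard least-squares setting, where n − p is the error degrees of freedom (n observations, p estimated parameters including the intercept):

```latex
\hat{\beta}_j \;\pm\; t_{1-\alpha/2,\;n-p}\;\widehat{\mathrm{SE}}\!\left(\hat{\beta}_j\right)
```

With α = 0.05 the limits are 95% intervals. α = 0.01 and α = 0.10 give 99% and 90% intervals. In PROC REG, such limits are requested with the CLB MODEL option. How the macro passes α internally is not stated in this help file.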

5. Macro-call parameter: Input the predictor variables (optional parameter).

Descriptions and explanation: Input the continuous predictor variable names. If macro input field #3 is left blank, PROC REG is used to fit the MLR model with these variables as predictors.

PROC RSREG is used to check all predictor variables for significant quadratic and cross-product effects.

Model selection based on all possible combinations of predictor variables is performed using the continuous variables listed in this macro input. If YES is entered in the regression diagnostic plot option (macro input field #14), regression diagnostic plots are generated for each variable. If macro input field #3 is not blank and categorical variable names are entered, PROC GLM is used to fit the MLR model with these variables as predictors and the categorical variables (listed in macro input field #3) as indicator variables. In this case, if YES is entered in macro input field #14, partial plots are not generated. Instead, simple scatterplots of each predictor variable by each categorical variable and box plots of the response by each categorical variable are generated.

Option/example:

X1 X2 X3 mpg murder (names of continuous predictor variables)
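The model-selection step considers every combination of the listed continuous predictors, so the number of candidate models grows as 2^k − 1 for k predictors. The Python sketch below only illustrates that search space. In SAS, PROC REG's SELECTION=RSQUARE, ADJRSQ, and CP methods all evaluate every subset. Which criterion REGDIAG uses is not stated in this help file.

```python
from itertools import combinations


def all_subsets(predictors):
    """Yield every non-empty subset of the predictors, smallest first."""
    for k in range(1, len(predictors) + 1):
        yield from combinations(predictors, k)


subsets = list(all_subsets(["X1", "X2", "X3"]))
print(len(subsets))  # 7, i.e. 2**3 - 1
print(subsets[3])    # ('X1', 'X2')
```

With the five example names above (X1 X2 X3 mpg murder), the same enumeration yields 31 candidate models.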

6. Macro-call parameter: Input model terms (required parameter).

Descriptions and explanation: This macro input field is equivalent to the right side of the equal sign in the PROC REG and PROC GLM MODEL statement. When fitting an MLR with continuous variables using PROC REG, input the names of the continuous variables and any quadratic and cross-product terms. Note that PROC REG does not accept product terms such as X1*X2 in its MODEL statement, so the quadratic and cross-product terms must be created in the SAS dataset before they are specified in this macro field. To fit an MLR with indicator variables using PROC GLM, input the continuous predictor variable names, the categorical variable names specified in macro input field #3, and any quadratic and interaction terms.

Options/examples:

X1 X2 X1X2 X1SQ (MLR using PROC REG, where X1 and X2 are the linear predictors; X1X2, the interaction term; X1SQ, the quadratic term for X1)

X1 SOURCE X1*SOURCE (MLR with indicator variable SOURCE using PROC GLM, where X1 is the linear predictor; SOURCE, the indicator variable; X1*SOURCE, the interaction term between X1 and SOURCE)

(For details about specifying model statements, refer to the SAS online manuals on PROC REG [34] and PROC GLM [35].)
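The cross-product and quadratic terms must exist in the dataset before PROC REG can use them. The Python sketch below illustrates that requirement on records stored as dictionaries. The column names X1X2 and X1SQ follow the example above, and the function itself is hypothetical. In SAS, the equivalent step is a DATA step with assignments such as `X1X2 = X1*X2;` and `X1SQ = X1**2;`.

```python
def add_derived_terms(rows, a="X1", b="X2"):
    """Return copies of the records with columns <a><b> = a*b and <a>SQ = a**2.

    PROC REG's MODEL statement accepts only variable names, not products
    such as X1*X2, so these columns must exist before the model is fit.
    """
    out = []
    for row in rows:
        new = dict(row)  # leave the input records unchanged
        new[f"{a}{b}"] = row[a] * row[b]
        new[f"{a}SQ"] = row[a] ** 2
        out.append(new)
    return out


print(add_derived_terms([{"X1": 2, "X2": 3}]))
# [{'X1': 2, 'X2': 3, 'X1X2': 6, 'X1SQ': 4}]
```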

7. Macro-call parameter: Input model options (optional parameter).

Descriptions and explanation: Input any optional SAS PROC GLM or REG model options. Depending on the type of model being fit (GLM or REG), add these model options for additional statistics:

REG options (default options included in the macro: VIF, STB):

INFLUENCE: Additional influence statistics

COLLINOINT: Multicollinearity diagnostic statistics (with the intercept adjusted out)

SS2: Type II SS

NOINT: No-intercept model

GLM options (default option included in the macro: SOLUTION):

NOINT: No-intercept model


(For details about specifying model options, refer to the SAS online manuals on PROC REG [34] and PROC GLM [35].)
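VIF is one of the default PROC REG options. It equals 1/(1 − R_j²), where R_j² comes from regressing predictor j on the other predictors. With exactly two predictors, R_j² reduces to the squared Pearson correlation between them, which keeps this stdlib-only Python sketch short. The two-predictor restriction is a simplification for illustration, not a general VIF implementation.

```python
from math import sqrt


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2). With only two predictors, R^2 = r(x1, x2)^2,
    so both predictors share the same VIF."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


print(vif_two_predictors([-1, 1, -1, 1], [-1, -1, 1, 1]))        # 1.0 (uncorrelated)
print(round(vif_two_predictors([1, 2, 3, 4], [1, 2, 3, 5]), 2))  # 29.17
```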

8. Macro-call parameter: Input ID variable (optional parameter).

Descriptions and explanation: If a unique ID variable identifies each record in the dataset, input that variable name here. It is used as the ID variable so that influential and outlying observations can be identified. If no ID variable is available in the dataset, leave this field blank. The macro then creates an ID variable from the observation numbers.

Option/example:

ID NUM
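When the ID field is blank, numbering the observations from 1 mirrors SAS's automatic _N_ counter in a simple DATA step. This is a minimal Python sketch, assuming records are dictionaries. The default column name OBS is hypothetical, since the help file does not say what the macro calls its generated ID.

```python
def with_obs_id(rows, id_var=None, new_name="OBS"):
    """Return (id_name, rows), numbering records 1, 2, ... when no ID is given.

    'OBS' is a hypothetical column name chosen for this sketch.
    """
    if id_var is not None:
        return id_var, rows
    numbered = [{**row, new_name: i} for i, row in enumerate(rows, start=1)]
    return new_name, numbered


name, rows = with_obs_id([{"Y": 10.0}, {"Y": 12.5}])
print(name, [r["OBS"] for r in rows])  # OBS [1, 2]
```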

9. Macro-call parameter: Adjust for extreme influential observations (optional parameter).

Descriptions and explanation: If YES is input for this option, the macro fits the regression model after excluding extreme observations. Any observation whose standardized residual falls outside ±3.5 is treated as an outlier and excluded from the analysis. A printout of all excluded observations is also produced.

Options/examples:

Yes: Extreme outliers will be excluded from the analysis.

Blank: All observations in the dataset will be used.
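The ±3.5 exclusion rule can be sketched as a simple partition in Python. The sketch assumes the residuals have already been computed. In PROC REG, for example, the OUTPUT statement's STUDENT= keyword produces what SAS calls studentized residuals; whether REGDIAG uses exactly that statistic is an assumption. After exclusion, the model would be refit on the kept observations.

```python
THRESHOLD = 3.5  # cutoff stated in the help text for extreme observations


def split_extreme(ids, std_resid, threshold=THRESHOLD):
    """Partition observation IDs by whether |standardized residual| > threshold.

    Returns (kept, excluded). Values exactly at +/-threshold are kept,
    since only values falling outside the interval are excluded.
    """
    kept, excluded = [], []
    for obs_id, r in zip(ids, std_resid):
        (excluded if abs(r) > threshold else kept).append(obs_id)
    return kept, excluded


kept, excluded = split_extreme([1, 2, 3, 4], [0.5, -3.6, 3.5, 4.0])
print("excluded:", excluded)  # excluded: [2, 4]
```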

10. Macro-call parameter: Input validation dataset name (optional parameter).

Descriptions and explanation: To validate the regression model obtained from a training dataset against an independent validation dataset, input the name of the SAS validation dataset. The macro fits a separate regression model to the validation data. The two models, and the residuals from both, can then be compared visually to validate the regression model. Include the name of the temporary (member_name) or permanent (libname.member_name) SAS dataset to be used as the validation data.

Options/examples:

Permanent SAS dataset: gf.valid (LIBNAME: gf; SAS dataset name: valid)

Temporary SAS dataset: valid (SAS dataset name)
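The training-versus-validation comparison can be sketched with a one-predictor least-squares line in Python. The single-predictor setting and the data values are illustrative simplifications. The macro itself fits the full model and compares the results graphically.

```python
def fit_line(x, y):
    """Ordinary least-squares (intercept, slope) for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Illustrative values only: the training data follow y = 1 + 2x exactly.
x = [1, 2, 3, 4]
train = fit_line(x, [3, 5, 7, 9])
valid_y = [3.2, 4.8, 7.1, 8.9]
valid = fit_line(x, valid_y)

# Residuals of the validation data under the training model.
resid = [yv - (train[0] + train[1] * xv) for xv, yv in zip(x, valid_y)]
print(train)                              # (1.0, 2.0)
print(tuple(round(c, 2) for c in valid))  # (1.15, 1.94)
print([round(r, 2) for r in resid])       # [0.2, -0.2, 0.1, -0.1]
```

Similar coefficients and small, patternless validation residuals are the kind of agreement the visual comparison is meant to reveal.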

11. Macro-call parameter: Display or save SAS output and graphs (required parameter).

Descriptions and explanation: Option for displaying all output in the SAS windows or saving it in a specific format in the folder specified in macro input option #12.


Options/examples:

Possible values

DISPLAY: Output will be displayed in the OUTPUT window, all SAS graphics will be displayed in the GRAPHICS window, and system messages will be displayed in the LOG window.

WORD: Output and all SAS graphics will be saved together in the user-specified folder as a single RTF file (version 8.0 and later) and displayed in the VIEWER window if MS Word is installed on the computer. In pre-8.0 versions, SAS output files will be saved as text files, and all graphics files (CGM) will be saved separately in the user-specified folder (macro input option #12).

WEB: Output and graphics are saved in the user-specified folder and are viewed in the results VIEWER window as a single HTML file (version 8.0 and later) or as a text file in pre-8.0 versions. All graphics files (GIF) will be saved separately in the user-specified folder (macro input option #12).

PDF: If Adobe Acrobat Reader is installed on the computer, output and graphics are saved in the user-specified folder and are viewed in the results VIEWER window as a single PDF file (version 8.2 and later only). In pre-8.2 versions, output is saved as a text file. All graphics files (PNG) will be saved separately in the user-specified folder (macro input option #12).

TXT: Output will be saved as a TXT file in all SAS versions, and no output will be displayed in the OUTPUT window. All graphics files will be saved in EMF format (version 8.0 and later) or CGM format (pre-8.0 versions) in the user-specified folder (macro input option #12).

Note: System messages are deleted from the LOG window if DISPLAY is not selected as the input.
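The version-dependent rules for these output options can be summarized as a lookup function. This is a hypothetical Python sketch of the behavior listed for macro input #11, not code from the macro. It reads the PDF entry as producing a text file for output before version 8.2, and it treats the SAS release as a number such as 8.2.

```python
def output_files(option, version):
    """Return (output format, graphics format) for an output option.

    Hypothetical summary of the documented rules; 'version' is the SAS
    release as a number, e.g. 6.12, 8.0, or 8.2.
    """
    option = option.upper()
    if option == "DISPLAY":
        return ("OUTPUT window", "GRAPHICS window")
    if option == "WORD":
        return ("RTF", "embedded in RTF") if version >= 8.0 else ("text", "CGM")
    if option == "WEB":
        return ("HTML" if version >= 8.0 else "text", "GIF")
    if option == "PDF":
        return ("PDF" if version >= 8.2 else "text", "PNG")
    if option == "TXT":
        return ("text", "EMF" if version >= 8.0 else "CGM")
    raise ValueError(f"unknown output option: {option!r}")


print(output_files("WORD", 8.2))  # ('RTF', 'embedded in RTF')
print(output_files("PDF", 8.1))   # ('text', 'PNG')
```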

12. Macro-call parameter: Folder to save SAS graphics and output files (optional parameter).

Descriptions and explanation: To save the SAS graphics files in EMF format, which is suitable for inclusion in PowerPoint presentations, specify TXT as the output format in version 8.0 or later. In pre-8.0 versions, all graphics files will be saved in the user-specified folder. Similarly, output files in WORD (RTF), HTML, PDF, and TXT formats will be saved in the user-specified folder. If this macro field is left blank, the graphics and output files will be saved in the default folder.

Option/example:

c:\output\ (folder named “OUTPUT”)


13. Macro-call parameter: zth number of analysis (required parameter).

Descriptions and explanation: SAS output files will be saved by forming a file name from the original SAS dataset name and the counter value provided in this field. For example, if the original SAS dataset name is “sales” and the counter number included is 1, then the SAS output files will be saved as “sales1” in the user-specified folder. By changing the counter value, users can avoid replacing previous SAS output files with new outputs.
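The naming scheme (folder, then dataset name followed by the counter) can be sketched with Python's pathlib. The file extension parameter is hypothetical, because the help text fixes only the file-name stem. Using the member name alone for a two-level name such as gf.sales is also an assumption.

```python
from pathlib import PureWindowsPath


def output_path(folder, dataset, counter, extension):
    """Build <folder>\\<member name><counter>.<extension>.

    'extension' is a hypothetical parameter: the help text fixes only the
    file-name stem (dataset name followed by the counter value).
    """
    member = dataset.split(".")[-1]  # assumed: 'gf.sales' -> 'sales'
    return PureWindowsPath(folder) / f"{member}{counter}.{extension}"


print(output_path("c:\\output\\", "sales", 1, "rtf"))  # c:\output\sales1.rtf
print(output_path("c:\\output\\", "sales", 2, "rtf"))  # c:\output\sales2.rtf
```

Incrementing the counter changes the stem, so a second run writes sales2 instead of overwriting sales1.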

14. Macro-call parameter: Regression diagnostic plots (optional parameter).

Descriptions and explanation: If YES is input and no categorical variables are included in macro call parameter #3, the macro produces regression diagnostic plots (augmented partial residual plots, partial leverage plots, and VIF diagnostic plots) and detects significant interactions for each predictor variable.

If YES is input and categorical variables are included in macro call parameter #3, regression diagnostic plots for regression models with indicator variables are produced (X–Y scatterplots by categorical variable, box plots of the response by categorical variable, and regression plots for detecting significant interactions). If this macro field is left blank, no diagnostic plots are produced.

Options/examples:

Yes: Diagnostic plots are produced for each predictor variable.

Blank: No diagnostic plots are produced.
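The partial residual plots listed above plot, for each predictor x_j, the values e_i + b_j·x_ij against x_ij. The augmented version adds a quadratic component. This is a minimal Python sketch that assumes the residuals and coefficients have already been obtained from a fitted model. How REGDIAG computes and plots them internally is not described in this help file.

```python
def partial_residuals(residuals, xj, bj, bjj=0.0):
    """Return e_i + b_j * x_ij + b_jj * x_ij**2 for each observation.

    With bjj = 0 these are ordinary partial (component-plus-residual)
    values. For the augmented version, residuals, bj, and bjj must come
    from a model that also includes x_j**2 as a predictor.
    """
    return [e + bj * x + bjj * x * x for e, x in zip(residuals, xj)]


print(partial_residuals([0.5, -0.5], [1.0, 2.0], bj=2.0))           # [2.5, 3.5]
print(partial_residuals([0.5, -0.5], [1.0, 2.0], bj=2.0, bjj=1.0))  # [3.5, 7.5]
```

Curvature in a plot of the augmented values against x_j suggests that a quadratic term for that predictor may be needed.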
