
Logistic Regression Using SAS Macro LOGISTIC

From the document Data Mining Using (pages 195-200)


5.8 Logistic Regression Using SAS Macro LOGISTIC

The LOGISTIC macro is a SAS application for performing complete, user-friendly logistic regression with or without categorical predictor variables. Options are available for producing various logistic regression diagnostic graphs and tests. The SAS procedure LOGISTIC is the main tool used in the macro; PROC GPLOT is also used to obtain the diagnostic graphs. The advantages of using the LOGISTIC macro over PROC LOGISTIC alone include:

Production of logistic regression diagnostic plots such as overlaid partial delta logit and simple logit plots (PROC GPLOT) for each continuous predictor variable

Production of outlier detection plots, ROC curves, and false positive and negative plots for assessing the overall model fit

Option for excluding extreme outliers (delta deviance >4) and then performing logistic regression using the remaining data points

Option for validating the fitted logistic model obtained from a training dataset using an independent validation dataset by comparing Brier scores

Options for saving the output tables and graphics in WORD, HTML, PDF, and TXT formats
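The Brier score comparison mentioned in the list above can be sketched in a few lines. This is only an illustration of the standard Brier score (the mean squared difference between the 0/1 outcome and the predicted event probability), not the macro's own code; the function name and the small data vectors are made up for the example.

```python
def brier_score(y_true, p_hat):
    """Mean squared difference between 0/1 outcomes and predicted event probabilities."""
    if not y_true or len(y_true) != len(p_hat):
        raise ValueError("y_true and p_hat must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)


# Hypothetical outcomes and predicted probabilities (illustration only).
train_y = [1, 0, 1, 0]
train_p = [0.9, 0.2, 0.7, 0.1]
valid_y = [1, 0, 0, 1]
valid_p = [0.8, 0.3, 0.4, 0.6]

train_brier = brier_score(train_y, train_p)
valid_brier = brier_score(valid_y, valid_p)

print(f"training Brier score:   {train_brier:.4f}")
print(f"validation Brier score: {valid_brier:.4f}")
```

A lower score means predictions closer to the observed outcomes. A validation score far above the training score suggests that the model does not carry over well to new data.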

Software requirements for using the LOGISTIC macro include:

SAS/CORE, SAS/BASE, SAS/STAT, and SAS/GRAPH must be licensed and installed at the site.

SAS version 8.0 and above is recommended for full utilization.

An active Internet connection is required for downloading the LOGISTIC macro from the book website if the companion CD-ROM is not available.

5.8.1 Steps Involved in Running the LOGISTIC Macro

1. Create an SAS dataset (permanent or temporary) containing at least one binary response (target) variable and many continuous and/or categorical predictor (input) variables. Code 0 as non-event and 1 as an event in the data file. The LOGISTIC macro will model the probability of the event. (Disabling the SAS-enhanced editor in the latest SAS versions is highly recommended; open only one PROGRAM EDITOR window during the execution of this macro.)

3456_Book.book Page 186 Wednesday, November 20, 2002 11:34 AM

2. If the companion CD-ROM is not available, first verify that the Internet connection is active, then download the macro-call and sample data files from the book website by following the instructions in the Appendix. If the companion CD-ROM is available, the LOGISTIC.sas macro-call file can be found in the mac-call folder on the CD-ROM. Open the LOGISTIC.sas macro-call file in the SAS PROGRAM EDITOR window and click the RUN icon to submit it; this opens the macro-call window called LOGISTIC (Figure 5.20).

3. Input the appropriate parameters in the macro-call window by following the instructions provided in the LOGISTIC macro help file in Section 5.8.2. Users can choose to exclude large extreme observations from analysis. After inputting all the required macro parameters, be sure the cursor is in the last input field and the RESULTS VIEWER window is closed, then hit the ENTER key (not the RUN icon) to submit the macro.

4. Examine the LOG window (only in DISPLAY mode) for any macro execution errors. If any errors are reported in the LOG window, activate the PROGRAM EDITOR window, resubmit the LOGISTIC.sas macro-call file, check the macro input values, and correct any input errors.

5. If no errors are found in the LOG window, activate the PROGRAM EDITOR window, resubmit the LOGISTIC.sas macro-call file, and change the macro input value from DISPLAY to any other desired format (see Section 5.8.2). The SAS output files from the complete logistic modeling and the diagnostic graphs can then be saved in a user-specified format in the user-specified folder.
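Step 1 requires the response to be coded 0 for a non-event and 1 for an event before the macro is run. As a rough sketch of that recoding, assuming the raw response is stored as two labels (the function name and the labels "yes"/"no" are hypothetical):

```python
def code_binary_response(values, event_label):
    """Recode a two-level response as 1 (event) and 0 (non-event)."""
    labels = set(values)
    if len(labels) != 2:
        raise ValueError(f"expected exactly two response levels, found {sorted(labels)}")
    if event_label not in labels:
        raise ValueError(f"event label {event_label!r} does not occur in the data")
    return [1 if v == event_label else 0 for v in values]


# Hypothetical raw response column (illustration only).
raw_response = ["yes", "no", "yes", "no", "no"]
coded = code_binary_response(raw_response, event_label="yes")
print(coded)
```

Rejecting anything other than exactly two levels catches stray codes or misspelled labels before they reach the model.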

5.8.2 Help File for SAS Macro LOGISTIC

1. Macro-call parameter: Input SAS dataset name (required parameter).

Descriptions and explanation: Include the name of the temporary (member_name) or permanent (libname.member_name) SAS dataset on which the logistic regression analysis will be performed.

Options/examples:

Permanent SAS dataset: gf.bank (LIBNAME: gf; SAS dataset name: bank)

Temporary SAS dataset: bank (SAS dataset name)

2. Macro-call parameter: Input binary response variable name (required parameter).


Descriptions and explanation: Input the binary response variable name from the SAS dataset to be modeled as the target variable.

Option/example:

Y (name of the binary response)

3. Macro-call parameter: Input group variables (optional statement).

Descriptions and explanation: To include categorical variables from the SAS dataset as predictors in logistic regression modeling, input the names of these variables.

Options/examples:

month manager

Blank (categorical predictors are not used)

4. Macro-call parameter: Input continuous predictor variable names (optional statement).

Descriptions and explanation: For each continuous predictor variable specified, an overlaid partial delta logit/simple logit plot will be produced automatically. This diagnostic plot is useful in checking for nonlinearity, the significance of the parameter estimate, and the presence of multicollinearity among the predictor variables. If categorical variables are also included in macro input option #3, the logistic parameter estimates are adjusted for these categorical variables. No interaction terms between the categorical and the continuous predictors are included in the diagnostic plots when computing the delta logit values. Leave this field blank when only categorical variables are in the model statement.

Options/examples:

X1 X2 X3

5. Macro-call parameter: Input model terms (required option).

Descriptions and explanation: This macro input field is equivalent to the right side of the equal sign in the PROC LOGISTIC model statement. You can include main effects of categorical variables, linear effects of continuous variables, and any interactions among these effects.

Options/examples:

X1 X2 X1X2 X1SQ (continuous predictor variables, including linear predictors X1 and X2, an interaction term X1X2, and the quadratic term for X1, X1SQ)

X1 SOURCE X1*SOURCE (logistic regression with categorical variables SOURCE, linear predictor X1, and an interaction term between X1 and SOURCE, X1*SOURCE)

(For details about specifying model statements, refer to the SAS online manuals on PROC LOGISTIC.36)
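In the first example above, X1X2 and X1SQ are written as plain variable names rather than with the `*` operator. This suggests (an assumption, not something the text states) that they are columns already present in the dataset. A minimal sketch of deriving such columns from X1 and X2, with made-up values:

```python
def add_derived_terms(row):
    """Return a copy of the row with the interaction and quadratic terms added."""
    out = dict(row)
    out["X1X2"] = row["X1"] * row["X2"]  # interaction between X1 and X2
    out["X1SQ"] = row["X1"] ** 2         # quadratic term for X1
    return out


# Hypothetical predictor values (illustration only).
rows = [
    {"X1": 2.0, "X2": 3.0},
    {"X1": -1.0, "X2": 4.0},
]
expanded = [add_derived_terms(r) for r in rows]
for r in expanded:
    print(r)
```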

6. Macro-call parameter: Exploratory analysis (optional statement).


Descriptions and explanation: To perform exploratory analysis, input YES in this field, which will result in partial delta logit plots and variable selection by forward selection method for continuous predictors, or predicted probability plots for the categorical variable model. If this field is left blank, this macro skips the exploratory analysis step and goes directly to the logistic regression step.

Options/examples:

YES: Perform exploratory analysis

Blank: Skip exploratory analysis

7. Macro-call parameter: Overdispersion correction (required statement).

Descriptions and explanation: Input NONE if no adjustment for overdispersion is wanted. If the test for overdispersion indicates a high degree of overdispersion, adjust for it by inputting either DEVIANCE or PEARSON.
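To illustrate what a DEVIANCE or PEARSON adjustment generally involves, the sketch below computes the two goodness-of-fit statistics for grouped binomial data, divides by the residual degrees of freedom to obtain a dispersion factor, and inflates a standard error by the square root of that factor. This is the textbook calculation and is assumed, not confirmed, to match what the macro does internally. The groups, fitted probabilities, parameter count, and standard error are all invented for the example.

```python
import math


def _term(observed, expected):
    """observed * ln(observed / expected), taking 0 * ln(0) as 0."""
    return observed * math.log(observed / expected) if observed > 0 else 0.0


def pearson_chi_square(groups):
    """groups: iterable of (events, trials, fitted_probability)."""
    return sum((y - n * p) ** 2 / (n * p * (1 - p)) for y, n, p in groups)


def deviance(groups):
    total = 0.0
    for y, n, p in groups:
        total += _term(y, n * p) + _term(n - y, n * (1 - p))
    return 2 * total


def dispersion(statistic, n_groups, n_params):
    """Goodness-of-fit statistic divided by its degrees of freedom."""
    return statistic / (n_groups - n_params)


# Hypothetical grouped data: (events, trials, fitted probability).
groups = [(3, 10, 0.3), (9, 10, 0.5), (2, 10, 0.6), (5, 10, 0.5)]
n_params = 2  # assumed: intercept plus one slope

pearson = pearson_chi_square(groups)
dev = deviance(groups)
phi_pearson = dispersion(pearson, len(groups), n_params)
phi_deviance = dispersion(dev, len(groups), n_params)

unadjusted_se = 0.25  # hypothetical standard error of one estimate
adjusted_se = unadjusted_se * math.sqrt(phi_pearson)

print(f"Pearson chi-square = {pearson:.3f}, dispersion = {phi_pearson:.3f}")
print(f"Deviance           = {dev:.3f}, dispersion = {phi_deviance:.3f}")
print(f"SE before/after Pearson adjustment: {unadjusted_se:.3f} / {adjusted_se:.3f}")
```

A dispersion factor close to 1 indicates little overdispersion. Values well above 1 widen the standard errors and make the tests more conservative.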

Options/examples:

NONE: No overdispersion adjustment

DEVIANCE: Overdispersion adjustment by DEVIANCE factor

PEARSON: Overdispersion adjustment by PEARSON factor

8. Macro-call parameter: Customized odds ratios/parameter test (optional statement).

Descriptions and explanation: Input appropriate statements to obtain customized odds ratio estimates (UNITS option) or test the parameter estimates for specific values (TEST option).

Options/examples:

Units x1=0.5 -0.5 (to obtain customized odds ratio estimates for the X1 predictor when X1 is increased or decreased by 0.5 unit)

Test x1=0.05 (to test the hypothesis that the X1 parameter estimate is equal to 0.05)

Units x1=0.5 -0.5 ; Test x1=0.05 (to obtain both customized odds ratios and a parameter test)

Note the “;” after the first statement. When more than one statement is specified, include a “;” at the end of each statement (except for the last statement).
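The quantities behind these two statements can be worked out by hand. A customized odds ratio for a change of c units in a predictor is exp(c × estimate), and a Wald test of the hypothesis that the parameter equals a given value compares the squared standardized difference with a chi-square distribution on 1 degree of freedom. The sketch below uses a made-up estimate and standard error and is not output from the macro.

```python
import math


def odds_ratio_for_change(beta, units):
    """Odds ratio associated with changing the predictor by `units`."""
    return math.exp(beta * units)


def wald_test(beta_hat, std_err, null_value):
    """Wald chi-square (1 df) and p-value for H0: beta = null_value."""
    z = (beta_hat - null_value) / std_err
    chi_sq = z * z
    # For 1 df, P(chi-square > z^2) equals the two-sided normal tail probability.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return chi_sq, p_value


# Hypothetical estimate and standard error for X1 (illustration only).
beta_x1 = 0.8
se_x1 = 0.25

or_up = odds_ratio_for_change(beta_x1, 0.5)     # X1 increased by 0.5 unit
or_down = odds_ratio_for_change(beta_x1, -0.5)  # X1 decreased by 0.5 unit
chi_sq, p_value = wald_test(beta_x1, se_x1, null_value=0.05)

print(f"odds ratio for +0.5 unit: {or_up:.4f}")
print(f"odds ratio for -0.5 unit: {or_down:.4f}")
print(f"Wald chi-square = {chi_sq:.3f}, p = {p_value:.4f}")
```

The two odds ratios are reciprocals of each other, which is a quick sanity check when reading the customized odds ratio table.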

9. Macro-call parameter: Input validation dataset name (optional parameter).

Descriptions and explanation: To validate the logistic regression model obtained from a training dataset by using an independent validation dataset, input the name of the SAS validation dataset. This macro estimates predicted probabilities for the validation dataset using the model estimates derived from the training data. The success of this prediction can be verified by checking the deviance residual and the Brier scores for the validation dataset. Input the name of the temporary (member_name) or permanent (libname.member_name) SAS dataset to be treated as the validation data.

Options/examples:

Permanent SAS dataset: gf.valid (LIBNAME: gf; SAS dataset name: valid)

Temporary SAS dataset: valid (SAS dataset name)

10. Macro-call parameter: Input ID variable (optional statement).

Descriptions and explanation: If a unique ID variable can be used to identify each record in the database, input that variable name here. This will be used as the ID variable so that any outlier/influential observations can be detected. If no ID variable is available in the dataset, this field can be left blank. This macro can create an ID variable based on the observation number from the database.

Option/example:

ID NUM

11. Macro-call parameter: zth number of analysis (required statement).

Descriptions and explanation: SAS output files will be saved by forming a file name from the original SAS dataset name and the counter value provided in this field. For example, if the original SAS dataset name is “sales” and the counter number included is 1, the SAS output files will be saved as “sales1” in the user-specified folder. By changing the counter value, users can avoid replacing previous SAS output files with new outputs.
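The naming rule described above amounts to joining the dataset name and the counter. In the sketch below, dropping the libname prefix of a permanent dataset is an assumption made for illustration, since the text only gives the "sales" example; the function name is hypothetical.

```python
def output_file_stem(dataset_name, counter):
    """Build the output file stem from the dataset name and the analysis counter."""
    # Assumption: for libname.member_name, only the member name is used.
    member = dataset_name.split(".")[-1]
    return f"{member}{counter}"


print(output_file_stem("sales", 1))  # the example given in the text
print(output_file_stem("sales", 2))  # a new counter keeps the first output intact
```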

12. Macro-call parameter: Display or save SAS output and graphs (required statement).

Descriptions and explanation: Option for displaying all output files in the OUTPUT window or saving them in a specific format in the folder specified in macro input option #13.

Options/examples: See Section 5.5.2 for explanation of these formats:

DISPLAY WORD WEB PDF TXT

Note: System messages are deleted from the LOG window if DISPLAY is not selected as the input.

13. Macro-call parameter: Folder to save SAS graphics and output files (optional statement).


Descriptions and explanation: To save the SAS graphics files in an EMF format suitable for inclusion in PowerPoint presentations, specify the output format as TXT in version 8.0 or later. In pre-8.0 versions, all graphic format files will be saved in a user-specified folder. Similarly, output files in WORD, HTML, PDF, and TXT formats will be saved in the user-specified folder. If this macro field is left blank, the graphics and output files will be saved in the default folder.

Option/example:

c:\output\ — folder named “OUTPUT”

14. Macro-call parameter: Adjust for extreme influential observations (optional parameter).

Descriptions and explanation: If YES is input for this option, the macro will fit the logistic regression model after excluding extreme (delta deviance, >4.0) observations. An output of all excluded observations is also produced.

Options/examples:

YES: Extreme outliers will be excluded from the analysis

Blank: All observations in the dataset will be used

15. Macro-call parameter: Input cutoff p value (required option).

Descriptions and explanation: Input the cutoff p value for classifying the predicted probability as event or non-event.

Options/examples:

0.45 0.5 0.55 0.60
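The effect of the cutoff can be seen by classifying the same predicted probabilities at two of the values listed above and counting false positives and false negatives. This is a minimal sketch with invented data. Treating a probability exactly equal to the cutoff as an event is an assumption, since the text does not say how the macro handles ties.

```python
def classify(probabilities, cutoff=0.5):
    """Label a case as an event (1) when its predicted probability reaches the cutoff."""
    # Assumption: p == cutoff counts as an event.
    return [1 if p >= cutoff else 0 for p in probabilities]


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 outcomes."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1:
            counts["tp" if actual == 1 else "fp"] += 1
        else:
            counts["tn" if actual == 0 else "fn"] += 1
    return counts


# Hypothetical outcomes and predicted probabilities (illustration only).
y_true = [1, 0, 1, 0, 1]
p_hat = [0.62, 0.48, 0.52, 0.58, 0.30]

low_cutoff = confusion_counts(y_true, classify(p_hat, cutoff=0.45))
high_cutoff = confusion_counts(y_true, classify(p_hat, cutoff=0.60))

print("cutoff 0.45:", low_cutoff)
print("cutoff 0.60:", high_cutoff)
```

Raising the cutoff trades false positives for false negatives, so the choice depends on which kind of misclassification is more costly in the application.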

