• Aucun résultat trouvé

Scoring New Regression Data Using the SAS Macro RSCORE

Dans le document Data Mining Using (Page 191-195)

Supervised Learning Methods: Prediction

Step 1. Fit a simple BLR model:

5.7 Scoring New Regression Data Using the SAS Macro RSCORE

The RSCORE macro is a powerful SAS application for scoring new datasets using the established regression model estimates. Options are available for checking the residuals graphically if observed response variables are available in the new dataset; otherwise, only predicted scores are produced and saved in a SAS dataset. In addition to the SAS REG procedure, GPLOT is also utilized in the RSCORE macro. No procedures or options are currently available in the SAS systems to compare the residuals from the new data automatically.

Software requirements for using the RSCORE macro include:

3456_Book.book Page 182 Wednesday, November 20, 2002 11:34 AM

SAS/CORE, SAS/BASE, SAS/STAT, and SAS/GRAPH must be licensed and installed at the site.

SAS version 8.0 and above is recommended for full utilization.

An active Internet connection is required for downloading the RSCORE macro from the book website if the companion CD-ROM is not available.

5.7.1 Steps Involved in Running the RSCORE Macro

1. Create a new scoring SAS dataset (permanent or temporary) con-taining one response (optional) variable and continuous predictor (input) variables. This new dataset should contain all the predictor variables that were used to develop the original regression model.

2. Verify that the regression parameter estimates are available in an SAS dataset. If the REGDIAG macro was used to fit the original regression model, it should have created an SAS dataset called

“regest”. Check the “regest” data and create a temporary dataset to use for scoring the new dataset.

3. If the companion CD-ROM is not available, first verify that the Internet connection is active. Open the RSCORE.sas macro-call file in the SAS PROGRAM EDITOR window. Instructions are given in the Appendix regarding downloading the macro-call and sample data files from the book website. If the companion CD-ROM is available, the RSCORE.sas macro-call file can be found in the mac-call folder on the CD-ROM. Open the RSCORE.sas macro-mac-call file in the SAS PROGRAM EDITOR window. Click the RUN icon to submit the macro-call file RSCORE.sas and open the macro-call window called RSCORE (Figure 5.18).

4. Input the appropriate parameters in the macro-call window by following the instructions provided in the RSCORE macro help file in Section 5.7.2. After inputting all the required macro parameters, be sure the cursor is in the last input field and the RESULTS VIEWER window is closed, then hit the ENTER key (not the RUN icon) to submit the macro.

5. Examine the LOG window (only in DISPLAY mode) for any macro execution errors. If any errors are reported in the LOG window, activate the PROGRAM EDITOR window, resubmit the RSCORE.sas macro-call file, check the macro input values, and correct any input errors.

6. If no errors are found in the LOG window, activate the PROGRAM EDITOR window, resubmit the RSCORE.sas macro-call file, and change the macro input value from DISPLAY to any other desirable format (see Section 5.7.2). The SAS output files from RSCORE analysis and RSCORE charts can be saved as user-specified-format files in the user-specified folder.

3456_Book.book Page 183 Wednesday, November 20, 2002 11:34 AM

5.7.2 Help File for Using SAS Macro RSCORE

1. Macro-call parameter: Input name of the new scoring SAS dataset name (required parameter).

Descriptions and explanation: Input the name of the temporary (member name) or permanent (libname.member_name) SAS dataset to be scored using the established regression model.

Options/examples:

Permanent SAS dataset: gf.new (LIBNAME: gf; SAS dataset name: new)

Temporary SAS dataset: new (SAS dataset name)

2. Macro-call parameter: Input the regression parameter estimate data name (required parameter).

Descriptions and explanation: Input the name of the temporary (member name) or permanent (libname.member_name) SAS dataset to be used in scoring the new dataset specified in macro input option #1.

Options/examples:

Permanent SAS dataset: gf.regest (LIBNAME: gf; SAS dataset name: regest)

Temporary SAS dataset: regest (SAS dataset name) 3. Macro-call parameter: Input response variable name (optional

parameter).

Descriptions and explanation: If a response variable is avail-able in the “new” dataset, input the name of the response variavail-able.

The RSCORE macro can also estimate the residual and compute the R2 for prediction. If a response value is not available, leave this field blank.

Options/examples:

Y (name of a continuous response)

4. Macro-call parameter: Input model terms (required statement).

Descriptions and explanation: Input the regression model used to develop the original regression estimates. This must be identical to the PROC REG model statements specified when estimating the regression model using the REGDIAG macro.

Options/examples:

X1 X2 X3 X2X3 X2SQ (X1, X2, and X3 are linear predictors;

X2X3 is the interaction term; and X2SQ is the quadratic term for X2)

5. Macro-call parameter: Display or save SAS output and graphs (required statement).

3456_Book.book Page 184 Wednesday, November 20, 2002 11:34 AM

Descriptions and explanation: Option for displaying all out-put/graphics files in the OUTPUT/GRAPHICS window or saving as a specific format in a folder specified in macro input option #6.

Options/examples: See Section 5.5.2 (macro input option #11) for detailed explanations for these options:

DISPLAY WORD WEB PDF TXT

Note: System messages are deleted from the LOG window if DISPLAY is not selected as the input.

6. Macro-call parameter: Folder to save SAS graphics and output files (optional statement).

Descriptions and explanation: To save the SAS graphics files in an EMF format suitable for inclusion in PowerPoint presentations, specify the output format as TXT in version 8.0 or later. In pre-8.0 versions, all graphic format files will be saved in a user-specified folder. Similarly, output files in WORD, HTML, PDF, and TXT formats will be saved in the user-specified folder. If this macro field is left blank, the graphics and output files will be saved in the default folder.

Option/example:

c:\output\ — folder named “OUTPUT”

7. Macro-call parameter: ith number of analysis (required statement).

Descriptions and explanation: SAS output files will be saved by forming a file name from the original SAS dataset name and using the counter value provided in this field. For example, if the original SAS dataset name is “sales” and the counter number included is 1, the SAS output files will be saved as “sales1” in the user-specified folder. By changing the counter value, users can avoid replacing the previous SAS output files with new outputs.

8. Macro-call parameter: Optional input ID variable (optional statement).

Descriptions and explanation: If a unique ID variable can be used to identify each record in the database, input that variable name here. This will be used as the ID variable so that any outlier/influential observations can be detected. If no ID variable is available in the dataset, leave this field blank. This macro can create an ID variable based on the observation number from the database.

Option/example:

ID NUM

3456_Book.book Page 185 Wednesday, November 20, 2002 11:34 AM

Dans le document Data Mining Using (Page 191-195)