Multiple linear regression
« Try to explain logamount »
PROC IMPORT OUT= WORK.ESTIM DATAFILE=
"D:\SAS\estim20171220.csv"
DBMS=TAB REPLACE;
GETNAMES=YES;
DATAROW=2;
RUN;
*************************
MULTIPLE REGRESSION
*************************;
* print mean;
PROC MEANS;
var gender2 matri3_4 age logamount fpublic;
run;
* centering (subtract mean);
data new2;
set estim;
logamount_c = logamount - 10.8243634;
age_c = age - 24.5697499;
run;
* check coding;
PROC MEANS;
var logamount_c age_c;
run;
* multiple regression model with fpublic;
PROC GLM; model fpublic=gender2 age_c logamount_c /solution;
run;
* regression model with matri3_4;
PROC GLM; model matri3_4=gender2 age_c /solution;
run;
* adding logamount;
PROC GLM; model matri3_4=gender2 age_c logamount_c/solution;
run;
* model for logamount with more explanatory variables;
PROC GLM; model
logamount_c=gender2 age_c matri3_4 fpublic/solution;
run;
* confidence intervals for parameter estimates;
PROC GLM; model
logamount_c=gender2 age_c matri3_4 fpublic/solution clparm;
run;
We successively try to explain public sector employment (fpublic), matrimony (matri3_4) and logamount, each time using the other variables as explanatory variables.
The MEANS Procedure
Variable         N        Mean     Std Dev     Minimum      Maximum
gender2      23233   0.3049111   0.4603797           0    1.0000000
matri3_4     23233   0.0412775   0.1989356           0    1.0000000
age          23233  24.5697499   8.7935487  12.0000000  111.0000000
logamount    23233  10.8243634   2.8145327           0   21.4875626
fpublic      23233   0.0201868   0.1406419           0    1.0000000
The MEANS Procedure: after centering
Variable          N          Mean    Std Dev      Minimum     Maximum
logamount_c   23233  3.9205181E-8  2.8145327  -10.8243634  10.6631992
age_c         23233  2.4675572E-8  8.7935487  -12.5697499  86.4302501
The GLM Procedure: fpublic
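Parameter        Estimate  Standard Error  t Value  Pr > |t|
Intercept    0.0379242784      0.00156396    24.25    <.0001
gender2      0.0109973519      0.00285779     3.85    0.0001
age_c        0.0020154072      0.00014977    13.46    <.0001
logamount_c  -.0021708963      0.00046286    -4.69    <.0001
Note: these estimates coincide digit for digit with the matrimony model including logamount_c reported further down. The intercept is also consistent with the mean of matri3_4 (0.0379243 + 0.0109974 × 0.3049111 = 0.0412775) rather than with the mean of fpublic (0.0201868), so this table appears to repeat that model's output instead of showing the fpublic fit.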
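As a reading aid, the t value in each GLM table is the estimate divided by its standard error. A short check on the gender2 row of the table above (figures taken from the output; with about 23,000 residual degrees of freedom the t distribution is practically normal):

```latex
t = \frac{\hat\beta_{\text{gender2}}}{\mathrm{SE}(\hat\beta_{\text{gender2}})}
  = \frac{0.0109973519}{0.00285779} \approx 3.85
```

The same division reproduces the other rows, for example 0.0020154072 / 0.00014977 ≈ 13.46 for age_c.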
The GLM Procedure: matrimony
Parameter      Estimate  Standard Error  t Value  Pr > |t|
Intercept  0.0377216083      0.00156407    24.12    <.0001
gender2    0.0116620375      0.00285557     4.08    <.0001
age_c      0.0019681690      0.00014950    13.16    <.0001
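The effect of centering can be checked by hand on this table. Since age_c has mean zero (up to rounding), averaging the fitted equation over the sample should give back the mean of matri3_4 from the MEANS output. A sketch of that check, writing \bar{y} for the sample mean of matri3_4 (notation introduced here, not in the original output):

```latex
\bar{y} = \hat\beta_0 + \hat\beta_{\text{gender2}}\,\overline{\text{gender2}}
          + \hat\beta_{\text{age\_c}}\,\overline{\text{age\_c}}
        \approx 0.0377216 + 0.0116620 \times 0.3049111 + 0
        \approx 0.0412775
```

This matches the reported mean of matri3_4. The intercept can therefore be read as the fitted value for gender2 = 0 at the average age.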
The GLM Procedure: matrimony (adding logamount)
Parameter        Estimate  Standard Error  t Value  Pr > |t|
Intercept    0.0379242784      0.00156396    24.25    <.0001
gender2      0.0109973519      0.00285779     3.85    0.0001
age_c        0.0020154072      0.00014977    13.46    <.0001
logamount_c  -.0021708963      0.00046286    -4.69    <.0001
The GLM Procedure: logamount
Parameter      Estimate  Standard Error  t Value  Pr > |t|   95% Confidence Limits
Intercept  0.1087099942      0.02256656     4.82    <.0001  0.0644780517  0.1529419368
gender2    -.3015951016      0.04047309    -7.45    <.0001  -.3809250291  -.2222651741
age_c      0.0221909202      0.00234160     9.48    <.0001  0.0176012266  0.0267806138
matri3_4   -.4365589733      0.09293786    -4.70    <.0001  -.6187233239  -.2543946226
fpublic    0.0629026810      0.14497076     0.43    0.6644  -.2212495896  0.3470549517
In conclusion, the GLM output shows that:
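women and older people are more likely to work in the public sector than men and younger people
women, older people and people with a lower logamount (poorer) are more likely to be married than men, younger people and people with a higher logamount (richer)
women, married people and younger people have a lower logamount (are poorer) than the others; the fpublic coefficient (0.063, p = 0.66) is not significant, so no difference between private and public sector employees is established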
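The 95% confidence limits requested with clparm in the logamount model can be reproduced by hand. A sketch for the gender2 row, assuming 23233 - 5 = 23228 residual degrees of freedom (five estimated parameters), for which the 0.975 quantile of the t distribution is about 1.960:

```latex
\hat\beta \pm t_{0.975,\,23228}\,\mathrm{SE}(\hat\beta)
  = -0.3015951 \pm 1.960 \times 0.0404731
  \approx (-0.3809,\ -0.2223)
```

If logamount is the natural logarithm of an amount (an assumption, since the output does not state the base), the gender2 coefficient corresponds to a multiplicative factor:

```latex
e^{-0.3016} \approx 0.74
```

Under that assumption, the amount for gender2 = 1 is roughly 26% lower than for gender2 = 0, other variables held fixed. The interval for fpublic, (-0.2212, 0.3471), contains zero, in line with its p-value of 0.6644.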