>RE::VISION CRM

R Data Analysis

some basic checkpoints on multiple regression using small sample

YONG_X 2013. 9. 29. 21:52

Regression Memo .... 



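A p-value of .05 means that, if there were truly no relationship between the variables, a result at least as strong as the one observed would turn up about 5% of the time by chance alone. It does not mean that there is a 95% chance that the relationship is real.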
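As a rough check on how a .05 cutoff behaves, the sketch below simulates many small samples in which x and y are unrelated by construction and counts how often the correlation still looks "significant". The sample size of 20, the number of simulations, and the helper names are assumptions made for illustration; the critical t value 2.101 (two-sided, 18 degrees of freedom) is taken from a standard t table.

```python
import math
import random

N_OBS = 20       # assumed sample size; the critical value below is tied to it
T_CRIT = 2.101   # two-sided 5% critical t value for N_OBS - 2 = 18 df (table value)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def false_positive_rate(n_sims=4000, seed=0):
    """Share of null datasets whose correlation passes the .05 cutoff."""
    rng = random.Random(seed)
    df = N_OBS - 2
    # |r| is significant at the 5% level exactly when |t| exceeds T_CRIT
    r_crit = T_CRIT / math.sqrt(T_CRIT ** 2 + df)
    hits = 0
    for _ in range(n_sims):
        x = [rng.gauss(0, 1) for _ in range(N_OBS)]
        y = [rng.gauss(0, 1) for _ in range(N_OBS)]  # independent of x
        if abs(pearson_r(x, y)) > r_crit:
            hits += 1
    return hits / n_sims


rate = false_positive_rate()
print(f"share of 'significant' results with no real relationship: {rate:.3f}")
```

The printed share should land close to 0.05, which is all the cutoff promises: it controls how often pure noise passes the test, not how likely a given finding is to be real.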

It is common practice to treat variables with a p-value below .05 as significant (a looser cutoff of .1 is sometimes used, particularly with small samples), though the only basis for any such cutoff is convention.


If the independent variables are very closely related to one another (multicollinearity), and/or if you have only a small number of observations, it can be difficult to separate their individual effects. Your regression still gives you the coefficients that best describe your set of data, but the individual coefficients may have large standard errors and therefore poor (high) p-values. Removing a variable that is strongly related to the others is sometimes appropriate, but not always.


Multicollinearity does not necessarily hurt the model as a whole, but it may mean that the model should not be used to draw conclusions about the relationship between any individual independent variable and the dependent variable.
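One common way to put a number on how closely two predictors are related is the variance inflation factor (VIF), which is not mentioned in the notes above and is brought in here only as an illustration. With exactly two predictors it reduces to 1 / (1 - r^2), where r is their correlation. The data and function names below are made up for the sketch.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical small sample (6 observations)
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2_close = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]       # nearly a copy of x1
x2_unrelated = [1.0, -1.0, 0.0, 0.0, -1.0, 1.0]  # uncorrelated with x1

vif_close = vif_two_predictors(x1, x2_close)
vif_unrelated = vif_two_predictors(x1, x2_unrelated)
print(f"VIF with a near-duplicate predictor: {vif_close:.1f}")
print(f"VIF with an unrelated predictor:     {vif_unrelated:.1f}")
```

A VIF near 1 means the predictor carries its own information; a widely quoted rule of thumb treats values above roughly 10 as a warning sign, though that threshold is itself only a convention.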


--------------


http://stats.stackexchange.com/questions/29612/minimum-number-of-observations-for-multiple-linear-regression


The general rule of thumb (based on stuff in Frank Harrell's book, Regression Modeling Strategies) is that if you expect to be able to detect reasonable-size effects with reasonable power, you need 10-20 observations per parameter (covariate) estimated.
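The arithmetic behind this rule of thumb is simple enough to sketch. The function name and the example sample size of 60 are assumptions for illustration; the 10 and 20 come from the quoted rule.

```python
def max_parameters(n_obs, low=10, high=20):
    """Return (conservative, lenient) parameter counts for n_obs observations.

    conservative assumes 20 observations per parameter, lenient assumes 10.
    """
    return n_obs // high, n_obs // low


conservative, lenient = max_parameters(60)
print(f"60 observations support roughly {conservative} to {lenient} parameters")
```

With 60 observations this gives 3 to 6 estimated parameters, so a small sample runs out of room for covariates very quickly.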


----------------


http://www.quality-control-plan.com/StatGuide/linreg_ass_viol.htm


Does your data violate linear regression assumptions?


Potential assumption violations include:


Implicit independent variables: X variables missing from the model

Lack of independence in Y: observations of the Y variable are not independent of one another

Outliers: apparent nonnormality caused by a few data points

Nonnormality: nonnormality of the Y variable

Variance of Y not constant

Correct model is nonlinear

X variable is random, not fixed

Patterns in plot of data: detecting assumption violations graphically

Special problems with few data points

Special problems with regression through the origin
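As a small illustration of one item on this list (the correct model is nonlinear), the sketch below fits a straight line to data that actually follow a curve and prints the residuals. The data set and function names are invented for the example, and only a single predictor is used to keep it short.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b


def residuals(x, y):
    """Observed minus fitted values from the straight-line fit."""
    a, b = fit_line(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Hypothetical data: y is exactly x squared, so a straight line is the wrong model
x = [0, 1, 2, 3, 4, 5, 6]
y = [xi ** 2 for xi in x]

res = residuals(x, y)
print(res)  # [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]
```

The residuals are positive at both ends and negative in the middle. A U-shaped pattern like this in a residual plot is the typical graphical sign that a linear model is missing curvature, whereas residuals from an adequate model scatter around zero without any visible shape.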



------------


For an unstable model, a small change in the data (by adding or removing a data point, for example) may lead to large changes of the parameter values.
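A quick way to probe this kind of instability is to refit the model with each observation left out in turn and compare the estimates. The sketch below does that for a single-predictor regression, which is a simplification of the multiple-regression case; the five data points and function names are made up for illustration.

```python
def slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def leave_one_out_slopes(x, y):
    """Slope of the fit obtained after dropping each observation in turn."""
    return [
        slope(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        for i in range(len(x))
    ]


# Hypothetical small sample; the last point lies far from the rest
x = [1, 2, 3, 4, 20]
y = [2, 1, 2, 1, 10]

full_slope = slope(x, y)
loo = leave_one_out_slopes(x, y)

print(f"slope with all 5 points: {full_slope:.3f}")
for i, s in enumerate(loo):
    print(f"slope without point {i}: {s:.3f}")
```

Here the slope is about 0.47 with all five points and becomes -0.2 once the last point is dropped, so a single observation decides even the sign of the estimate. With only a handful of observations, this sort of check is cheap and worth running before reading anything into the coefficients.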


------------