Regression Memo
Some basic checkpoints on multiple regression with a small sample
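A p-value of .05 means that, if there were truly no relationship, a relationship at least as strong as the one observed would emerge by chance about 5% of the time. It does not mean that there is a 95% chance that the relationship is real.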
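One way to see what a p-value of .05 measures is to simulate data in which no relationship exists and count how often a test still reports p < .05. The sketch below is an illustration only: the sample size, the number of simulations, and the use of the Fisher z approximation for the correlation test are all assumptions made for this example, not part of the memo's sources.

```python
import math
import random

def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def approx_p_value(r, n):
    """Two-sided p-value for r using the Fisher z approximation (assumed here)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def false_positive_rate(n_obs=30, n_sims=2000, alpha=0.05, seed=0):
    """Share of simulations with p < alpha when x and y are truly unrelated."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        x = [rng.gauss(0, 1) for _ in range(n_obs)]
        y = [rng.gauss(0, 1) for _ in range(n_obs)]  # generated independently of x
        if approx_p_value(pearson_r(x, y), n_obs) < alpha:
            hits += 1
    return hits / n_sims

rate = false_positive_rate()
print(f"Share of 'significant' results with no real relationship: {rate:.3f}")
```

The printed share should land near the chosen alpha, which is the sense in which .05 describes how often chance alone produces such a result.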
It is common practice to consider variables with a p-value of less than .05 (or, more leniently, .1) as significant, though the only basis for these cutoffs is convention.
If the independent variables are very closely related to one another (multicollinearity), and/or if you have only a small number of observations, it can be difficult to separate their individual effects. Your regression still gives you the coefficients that best describe your set of data, but the individual independent variables may have large standard errors, and therefore high p-values, when multicollinearity is present. Removing a variable that is closely related to others is sometimes appropriate, but not always.
This does not necessarily mean that the model as a whole is hurt, but it may mean that the model should not be used to draw conclusions about the relationship of individual independent variables with the dependent variable.
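A common way to gauge how closely related two predictors are is the variance inflation factor (VIF). The memo's sources do not name this diagnostic, so it is brought in here as an assumption. With exactly two predictors, the VIF reduces to 1 / (1 - r^2), where r is their correlation. The data below are made up for illustration.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical predictors: x2 is almost a copy of x1, x3 is only loosely related.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]
x3 = [3, 1, 4, 1, 5, 9]

vif_high = vif_two_predictors(x1, x2)
vif_low = vif_two_predictors(x1, x3)
print(f"VIF for x1 with x2: {vif_high:.1f}")
print(f"VIF for x1 with x3: {vif_low:.1f}")
```

A frequently quoted guideline treats a VIF above roughly 5 to 10 as a sign that the coefficient's standard error is badly inflated, though that threshold is also only a convention.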
--------------
http://stats.stackexchange.com/questions/29612/minimum-number-of-observations-for-multiple-linear-regression
The general rule of thumb (based on stuff in Frank Harrell's book, Regression Modeling Strategies) is that if you expect to be able to detect reasonable-size effects with reasonable power, you need 10-20 observations per parameter (covariate) estimated.
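The rule of thumb above turns into simple arithmetic. The helper below is a hypothetical sketch (its name and defaults are not from the cited answer) that converts a sample size into the range of parameters the 10-20 guideline would support.

```python
def max_parameters(n_obs, obs_per_param_low=10, obs_per_param_high=20):
    """Return (conservative, lenient) parameter counts under the 10-20 rule."""
    conservative = n_obs // obs_per_param_high  # 20 observations per parameter
    lenient = n_obs // obs_per_param_low        # 10 observations per parameter
    return conservative, lenient

conservative, lenient = max_parameters(60)
print(f"With 60 observations: about {conservative} to {lenient} parameters")
```

With 60 observations, for example, the guideline suggests estimating somewhere between 3 and 6 parameters.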
----------------
http://www.quality-control-plan.com/StatGuide/linreg_ass_viol.htm
Does your data violate linear regression assumptions?
Potential assumption violations include:
Implicit independent variables: X variables missing from the model
Lack of independence in Y: observations of the Y variable are not independent of one another
Outliers: apparent nonnormality caused by a few data points
Nonnormality: nonnormality of the Y variable
Variance of Y not constant
Correct model is nonlinear
X variable is random, not fixed
Patterns in plot of data: detecting assumption violations graphically
Special problems with few data points
Special problems with regression through the origin
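Several of the violations listed above show up as a pattern in the residuals. As a minimal sketch, assuming made-up data in which the true relationship is quadratic, the code below fits a straight line and prints the residuals. A plot would show the same U shape.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical data: y is exactly x squared, so a straight line is the wrong model.
x = [0, 1, 2, 3, 4, 5, 6]
y = [v ** 2 for v in x]

intercept, slope = fit_line(x, y)
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]

for a, r in zip(x, residuals):
    print(f"x={a}  residual={r:+.1f}")
```

The residuals are positive at both ends and negative in the middle. A systematic pattern like this, rather than random scatter around zero, suggests that the correct model is nonlinear.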
------------
For an unstable model, a small change in the data (by adding or removing a data point, for example) may lead to large changes of the parameter values.
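A simple way to probe this kind of instability is to refit the model with each observation left out in turn and compare the coefficients. The sketch below does this for a one-predictor regression on made-up data in which the last point is unusual. The data and function names are assumptions for illustration.

```python
def slope_of(x, y):
    """Least-squares slope for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx

# Hypothetical small sample; the last observation does not follow the trend.
x = [1, 2, 3, 4, 5, 10]
y = [1.2, 1.9, 3.1, 4.0, 5.2, 2.0]

full_slope = slope_of(x, y)

# Refit with each observation removed and record the resulting slope.
slopes_loo = [
    slope_of(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    for i in range(len(x))
]
changes = [abs(s - full_slope) for s in slopes_loo]
most_influential = changes.index(max(changes))

print(f"Slope with all points: {full_slope:.3f}")
for i, s in enumerate(slopes_loo):
    print(f"Slope without point {i}: {s:.3f}")
print(f"Most influential point: index {most_influential}")
```

If removing a single point changes a coefficient by a large amount, as the last point does here, conclusions drawn from that coefficient are fragile, which is a particular risk with few observations.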