Some basic checkpoints on multiple regression using a small sample

YONG_X 2013. 9. 29. 21:52

Regression Memo ....

A p-value of .05 means that, if there were truly no relationship, a result at least as strong as the one observed would turn up by chance about 5% of the time. It is not a 95% chance that the relationship is real.
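One way to make the meaning of a p-value concrete is a permutation test: shuffle y many times to break any real link with x, and count how often the shuffled data show a correlation at least as strong as the observed one. A minimal sketch in Python, assuming a single predictor; `pearson_r` and `permutation_p_value` are hypothetical helper names for illustration, not something from the sources quoted in this memo.

```python
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Share of shuffles whose |r| is at least as large as the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any real x-y relationship
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # +1 in numerator and denominator counts the observed data as one arrangement
    return (hits + 1) / (n_perm + 1)

# Made-up data: y depends strongly on x, so the p-value should be small
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.0, 20.2]
print(permutation_p_value(x, y))
```

A small value here says only that chance alone rarely produces a correlation this strong; it does not give the probability that the relationship is real.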

By convention, variables with a p-value below .05 are usually treated as significant; with small samples a looser cutoff of .1 is sometimes used. Either way, the only basis for the cutoff is convention.

If the independent variables are closely related to one another (multicollinearity), and/or you have only a small number of observations, it can be difficult to separate their individual effects. The regression still gives you the coefficients that best describe your set of data, but the standard errors of the correlated variables are inflated, so individual variables may not reach a good p-value. Removing a variable that is related to others is sometimes appropriate, but not always.

Multicollinearity does not necessarily hurt the model as a whole, but it may mean that the model should not be used to draw conclusions about the relationship between each individual independent variable and the dependent variable.
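A common way to put a number on multicollinearity is the variance inflation factor (VIF), which is not mentioned in the notes above and is added here only as an illustration. With exactly two predictors it reduces to 1 / (1 - r^2), where r is the correlation between them. A rough sketch under that two-predictor assumption; `vif_two_predictors` is a hypothetical helper name and the data are made up.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2).

    With more than two predictors, each VIF needs the R^2 from regressing
    that predictor on all the others; that case is not handled here.
    """
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Made-up predictors that move almost in lockstep
x1 = [1, 2, 3, 4, 5, 6]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
print(vif_two_predictors(x1, x2))  # a large value signals strong collinearity
```

A VIF near 1 means the two predictors carry nearly independent information. Thresholds such as 5 or 10 are often quoted as warning levels, but like the p-value cutoff they are conventions.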

--------------

http://stats.stackexchange.com/questions/29612/minimum-number-of-observations-for-multiple-linear-regression

The general rule of thumb (based on stuff in Frank Harrell's book, Regression Modeling Strategies) is that if you expect to be able to detect reasonable-size effects with reasonable power, you need 10-20 observations per parameter (covariate) estimated.
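The rule of thumb is simple arithmetic, sketched below in Python. The function names are hypothetical, and the 10 and 20 multipliers come straight from the quoted rule rather than from any formal power calculation.

```python
def required_observations(n_params, low=10, high=20):
    """Range of sample sizes suggested by the 10-20 observations per parameter rule."""
    return n_params * low, n_params * high

def max_parameters(n_obs, obs_per_param=10):
    """Largest number of parameters the rule supports for a given sample size."""
    return n_obs // obs_per_param

print(required_observations(3))  # (30, 60)
print(max_parameters(25))        # 2
print(max_parameters(25, 20))    # 1
```

So with 25 observations, the rule supports at most two estimated parameters under its lenient end and only one under its strict end.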

----------------

http://www.quality-control-plan.com/StatGuide/linreg_ass_viol.htm

Does your data violate linear regression assumptions?

Potential assumption violations include:

Implicit independent variables: X variables missing from the model

Lack of independence in Y: lack of independence in the Y variable

Outliers: apparent nonnormality by a few data points

Nonnormality: nonnormality of the Y variable

Variance of Y not constant

Correct model is nonlinear

X variable is random, not fixed

Patterns in plot of data: detecting assumption violations graphically

Special problems with few data points

Special problems with regression through the origin
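Several of the checks in this list come down to looking at residuals. A minimal sketch, assuming a single predictor and made-up data in which the correct model is nonlinear (y = x^2) but a straight line is fitted; `fit_line` and `residuals` are hypothetical helper names.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def residuals(x, y):
    """Observed y minus the fitted straight-line value at each x."""
    intercept, slope = fit_line(x, y)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

# Made-up data: the true relationship is curved, the fitted model is a line
x = [0, 1, 2, 3, 4, 5, 6]
y = [a ** 2 for a in x]
for a, r in zip(x, residuals(x, y)):
    print(f"x={a}  residual={r:+.2f}")
```

The residuals are positive at both ends and negative in the middle. A systematic pattern like this, rather than random scatter around zero, is the kind of signal a residual plot is meant to reveal.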

------------

For an unstable model, a small change in the data (by adding or removing a data point, for example) may lead to large changes of the parameter values.
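Instability of this kind can be checked by refitting the model with each data point left out in turn and comparing the coefficients. A rough sketch for a single predictor, with made-up data in which the last point sits far from the rest; `ols_slope` and `leave_one_out_slopes` are hypothetical helper names.

```python
def ols_slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx

def leave_one_out_slopes(x, y):
    """Slope refitted n times, each time with one observation removed."""
    return [
        ols_slope(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        for i in range(len(x))
    ]

# Made-up small sample; the point at x=10 has a lot of leverage
x = [1, 2, 3, 4, 10]
y = [1.0, 2.1, 2.9, 4.2, 20.0]
print("all points:", round(ols_slope(x, y), 3))
for i, s in enumerate(leave_one_out_slopes(x, y)):
    print(f"without point {i}: slope = {s:.3f}")
```

In this example the slope roughly halves when the last point is dropped. With so few observations, a single point can decide the estimate.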

------------

License: Attribution, NonCommercial, NoDerivatives