RE::VISION CRM

R Data Analysis

predictive modeling :: algorithms and interpretation

YONG_X 2015. 7. 8. 08:38



CART decision trees: introduction and worked-example lecture slides


http://statweb.stanford.edu/~lpekelis/talks/13_datafest_cart_talk.pdf


Overview of decision tree concepts (in Korean)

https://ko.wikipedia.org/wiki/%EA%B2%B0%EC%A0%95_%ED%8A%B8%EB%A6%AC_%ED%95%99%EC%8A%B5%EB%B2%95
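A minimal sketch of fitting and pruning a CART-style tree in R, assuming the rpart package (an implementation of CART that ships with R as a recommended package) and the built-in iris data. The seed and the "lowest cross-validated error" pruning rule are illustrative choices, not something taken from the slides above.

```r
# CART-style classification tree on the built-in iris data.
# For a factor response, rpart's default splitting criterion is the Gini index.
library(rpart)

set.seed(1)  # rpart's cross-validation (xerror) uses random folds
fit <- rpart(Species ~ ., data = iris, method = "class")

# Text view of the tree: one line per node with the split rule,
# node size, misclassified count, predicted class and class probabilities
print(fit)

# Complexity-parameter table used for cost-complexity pruning
printcp(fit)

# Prune back to the cp value with the lowest cross-validated error
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned <- prune(fit, cp = best_cp)

# Training-set confusion matrix for the pruned tree
pred <- predict(pruned, iris, type = "class")
table(predicted = pred, actual = iris$Species)
```

`plot(pruned); text(pruned)` draws the same tree graphically.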




What a box plot means ==> groundwork for interpreting CART output


http://flowingdata.com/2008/02/15/how-to-read-and-use-a-box-and-whisker-plot/
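The note does not spell out how box plots tie into CART; one reading (an assumption here) is that each leaf of a regression tree holds a group of response values, and a box plot per leaf shows how well the leaf's mean summarizes them. The sketch below first pulls out the five numbers a box plot draws, then plots mpg by terminal node of an rpart regression tree on the built-in mtcars data.

```r
# Five-number summary behind a box plot, using mpg from mtcars
library(rpart)

s <- boxplot.stats(mtcars$mpg)
# s$stats: lower whisker, lower hinge (~Q1), median, upper hinge (~Q3), upper whisker
s$stats
# s$out: values more than 1.5 * (hinge spread) beyond the hinges, drawn as points
s$out

# Regression tree on the same data; fit$where gives, for each row,
# the row of fit$frame (i.e. the terminal node) it falls into
fit <- rpart(mpg ~ ., data = mtcars)
leaf <- factor(fit$where)

# One box per leaf: the tree predicts the leaf mean, and the box shows
# how spread out the actual mpg values in that leaf are
boxplot(mtcars$mpg ~ leaf, xlab = "terminal node (row of fit$frame)", ylab = "mpg")
```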


Overview of the main types of tree algorithms


http://www.r-bloggers.com/a-brief-tour-of-the-trees-and-forests/


#--------------------


[ Interpreting regression results in R ]


t value is the t-statistic for testing whether the corresponding regression coefficient is different from 0; it equals the coefficient estimate divided by its standard error.

Pr(>|t|) is the p-value of the hypothesis test whose test statistic is the t value.

Pr(>|t|) is low ==> the predictor is likely useful (its coefficient is unlikely to be 0).

Residual standard error (or s) ==> the standard deviation of the residuals (computed with the residual degrees of freedom), i.e. a measure of how close the fit is to the points.

The Multiple R-squared, also called the coefficient of determination, is the proportion of the variance in the response that is explained by the model.

The Adjusted R-squared lowers that value to account for the number of predictors in the model, so adding a useless variable no longer raises it automatically.

The F-statistic on the last line tells you whether the regression as a whole performs 'better than random', i.e. better than an intercept-only model (H0: all slope coefficients are 0).
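To connect each line of `summary(lm(...))` to its definition, here is a minimal sketch that recomputes the numbers by hand. The mtcars data and the predictors `wt` and `hp` are illustrative choices, not from the original note.

```r
# Linear regression on the built-in mtcars data (illustrative predictors)
fit <- lm(mpg ~ wt + hp, data = mtcars)
s <- summary(fit)
print(s)

coefs <- s$coefficients  # columns: Estimate, Std. Error, t value, Pr(>|t|)

# t value = Estimate / Std. Error (test of H0: coefficient = 0)
t_manual <- coefs[, "Estimate"] / coefs[, "Std. Error"]

# Pr(>|t|): two-sided p-value from a t distribution with the residual df
p_manual <- 2 * pt(abs(t_manual), df = fit$df.residual, lower.tail = FALSE)

# Residual standard error: s = sqrt(RSS / residual df)
rss <- sum(residuals(fit)^2)
rse_manual <- sqrt(rss / fit$df.residual)

# Multiple R-squared = 1 - RSS / TSS
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
r2_manual <- 1 - rss / tss

# Adjusted R-squared with n rows and p predictors: 1 - (1 - R2) * (n - 1) / (n - p - 1)
n <- nrow(mtcars)
p <- length(coef(fit)) - 1
adj_r2_manual <- 1 - (1 - r2_manual) * (n - 1) / (n - p - 1)

# Overall F-test against the intercept-only model
f <- s$fstatistic  # named vector: value, numdf, dendf
f_p <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)
```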



==============

F-tests are most often used to compare statistical models fitted to the same data set, in order to identify the model that best fits the population from which the data were sampled.

https://en.wikipedia.org/wiki/F-test
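A minimal sketch of that model-comparison use in R: `anova()` on two nested `lm` fits runs a partial F-test, and the same F can be computed from the residual sums of squares. The two models on mtcars are illustrative assumptions.

```r
# Two nested linear models on the built-in mtcars data (illustrative choice)
small <- lm(mpg ~ wt, data = mtcars)
big   <- lm(mpg ~ wt + hp, data = mtcars)

# Partial F-test. H0: the extra term(s) in `big` add nothing beyond `small`
cmp <- anova(small, big)
print(cmp)

# The same F by hand:
# F = ((RSS_small - RSS_big) / (df_small - df_big)) / (RSS_big / df_big)
rss_small <- sum(residuals(small)^2)
rss_big   <- sum(residuals(big)^2)
df_small  <- small$df.residual
df_big    <- big$df.residual
f_manual  <- ((rss_small - rss_big) / (df_small - df_big)) / (rss_big / df_big)
p_manual  <- pf(f_manual, df_small - df_big, df_big, lower.tail = FALSE)
```

A small p-value favors the larger model. When the models differ by a single predictor, this F equals the square of that predictor's t value in the larger model.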


===================

In the literal meaning of the terms, a parametric statistical test is one that makes assumptions about the parameters (defining properties) of the population distribution(s) from which one's data are drawn, while a non-parametric test is one that makes no such assumptions. In this strict sense, "non-parametric" is essentially a null category, since virtually all statistical tests assume one thing or another about the properties of the source population(s).

For practical purposes, you can think of "parametric" as referring to tests, such as t-tests and the analysis of variance, that assume the underlying source population(s) to be normally distributed; they generally also assume that one's measures derive from an equal-interval scale. And you can think of "non-parametric" as referring to tests that do not make these particular assumptions.

Non-parametric tests are sometimes spoken of as "distribution-free" tests, although this too is something of a misnomer.

[ http://vassarstats.net/textbook/parametric.html  ]
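A minimal sketch of the distinction in R, assuming a comparison of mpg between automatic (`am = 0`) and manual (`am = 1`) cars in mtcars: the t-test is the parametric choice, and the Wilcoxon rank-sum (Mann-Whitney) test is a common non-parametric counterpart that works on ranks rather than assuming normality.

```r
# Parametric: Welch two-sample t-test (R's default; unequal variances allowed),
# which assumes roughly normal data within each group
tt <- t.test(mpg ~ am, data = mtcars)

# Non-parametric: Wilcoxon rank-sum test on the ranks of mpg.
# exact = FALSE uses the normal approximation, because mpg contains ties
# and an exact p-value cannot be computed
wx <- wilcox.test(mpg ~ am, data = mtcars, exact = FALSE)

c(t_test = tt$p.value, wilcoxon = wx$p.value)
```

The rank test is not assumption-free either: under H0 it treats the two groups as coming from the same distribution, which is part of why "distribution-free" is something of a misnomer.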


Assumptions and misuse of regression analysis >>



[ Chi-Square Test ]


[ Boosting ]


#--------------[to partition data ]----------------

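# 100 sorted draws from N(0, 1): the plot traces an S-shaped curve (the normal quantile function)
plot(sort(rnorm(100, 0, 1)))
# histogram of 100 draws from N(0, 1); sorting first is unnecessary for a histogram
hist(rnorm(100, 0, 1))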


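# pick 10 of 100 draws from N(mean = 1, sd = 2), without replacement
sample(rnorm(100, 1, 2), size = 10, replace = FALSE)
# 20 draws from 1:5 with replacement; 1 is weighted 5 times as heavily as each other value
sample(1:5, size = 20, prob = c(5, 1, 1, 1, 1), replace = TRUE)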

example :: random 20-row subset of mtcars (row indices sampled without replacement)
cars1 <- mtcars[sample(1:nrow(mtcars), 20, replace = FALSE), ]

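# 12 random integers from 1 to 1e10, with replacement (sample.int accepts n beyond the integer range)
sample.int(1e10, 12, replace = TRUE)
# 5 draws from Uniform(0, 1)
runif(5, 0, 1)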

# 10,000 sorted Uniform(0, 1) draws lie close to a straight line (compare the sorted-normal plot)
plot(sort(runif(10000, 0, 1)))
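The snippets above draw random samples but stop short of actually partitioning a data set. A minimal sketch of a reproducible train/test split with `sample()`, assuming a 70/30 ratio, an arbitrary seed, and mtcars as the data (all illustrative choices):

```r
# Reproducible random split of mtcars into training (70%) and test (30%) rows
set.seed(42)
n <- nrow(mtcars)
train_idx <- sample(seq_len(n), size = round(0.7 * n), replace = FALSE)

train <- mtcars[train_idx, ]   # rows picked by sample()
test  <- mtcars[-train_idx, ]  # every remaining row

c(train = nrow(train), test = nrow(test))

# Fit on the training rows only, then measure error on the held-out rows
fit  <- lm(mpg ~ wt, data = train)
rmse <- sqrt(mean((test$mpg - predict(fit, newdata = test))^2))
rmse
```

Sampling row indices without replacement guarantees that no row lands in both sets; `set.seed()` makes the split repeatable across runs.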



#---------------------



