
[Python] An Exploratory Analysis Example Using Heatmaps

YONG_X 2019. 8. 21. 13:20


Practice data file used in the example:

dff01.csv

Additional data (for correlation analysis):

NumPy Practice in EDA :: Retail Customer Analysis -- BuyIt.com

In [1]:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from numpy.polynomial.polynomial import polyfit
import matplotlib.style as style 
from IPython.display import Image
import warnings
warnings.filterwarnings('ignore')

Heatmapping

[전용준. 리비젼컨설팅]

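- In practical EDA (exploratory data analysis), it is common to review the same pattern repeatedly across many combinations of variables.
- Heatmaps lend themselves well to this kind of repeated use, which makes them very useful in EDA.
- Which should you choose: the matplotlib.pyplot.scatter() approach or seaborn.heatmap()?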

In [2]:

# dff01 = dfc01[dfc01['sex']=='F'][['age','height','weight','amt_strbk','amt_book']]
# dff01.to_csv(dataPath + 'dff01.csv', index=False)
# dff01 = pd.read_csv(dataPath + 'dff01.csv')
# Load the CSV data directly from the blog
dff01 = pd.read_csv('https://t1.daumcdn.net/cfile/blog/992CFF3B5D5CC70C2C?download')
dff01.head()

Out[2]:

	age	height	weight	amt_strbk
0	28	157	52	22300
1	28	154	47	35100
2	28	155	52	21300
3	27	155	44	0
4	28	155	51	17500

In [3]:

# Using matplotlib scatter
plt.scatter(dff01.weight, dff01.height)
plt.xlabel('WEIGHT')
plt.ylabel('HEIGHT')
plt.show()
plt.scatter(dff01.amt_strbk, dff01.amt_book)
plt.xlabel('STARBUCKS')
plt.ylabel('BOOK')
plt.show()

In [4]:

dff01['rto_strbk'] = dff01.amt_strbk / (dff01.amt_strbk + dff01.amt_book + 0.001)
plt.hist(dff01.rto_strbk, bins=50)
plt.title('STARBUCKS / (STARBUCKS + BOOK) = RATIO')
plt.show()
colors1 = ['red' if x>=0.99 else 'blue' for x in dff01.rto_strbk]
plt.scatter(dff01.weight, dff01.height,
        alpha=0.1, color=colors1)
plt.xlabel('WEIGHT')
plt.ylabel('HEIGHT')
plt.title('RED: STARBUCKS')
plt.show()
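
The `+ 0.001` in the denominator above guards against division by zero for customers who spent nothing in either category. A minimal sketch of that effect, using a small made-up frame (the rows below are illustrative and are not taken from dff01.csv):

```python
import pandas as pd

# Made-up spending rows for illustration; not taken from dff01.csv
demo = pd.DataFrame({
    "amt_strbk": [22300, 0, 0, 5000],
    "amt_book":  [0, 15000, 0, 5000],
})

# Same formula as the cell above: Starbucks share of (Starbucks + book) spending
demo["rto_strbk"] = demo.amt_strbk / (demo.amt_strbk + demo.amt_book + 0.001)

# Without the small constant, a customer with zero spending in both
# categories produces 0 / 0, which pandas reports as NaN
demo["rto_plain"] = demo.amt_strbk / (demo.amt_strbk + demo.amt_book)

print(demo)
```

With the constant, a customer with no spending in either category gets a ratio of 0 and is therefore colored blue, together with book-only customers. Starbucks-only customers land just below 1.0, which is consistent with the threshold being `>= 0.99` rather than `== 1`.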

In [5]:

dfc02z = dff01[['rto_strbk', 'height', 'weight']].groupby(['height', 'weight']).mean().reset_index()
dfc02z1 = dff01[['rto_strbk', 'height', 'weight']].groupby(['height', 'weight']).count().reset_index()
dfc02z1.columns = ['height', 'weight', 'cnt_cust']
dfc02z2 = dfc02z.merge(dfc02z1, how='left', on=['height', 'weight'])
print(dfc02z2.head(3))
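# ncust = mnmx_scl2(dfc02z2.cnt_cust)
# plt.hist(dfc02z2.cnt_cust, bins=50)
# plt.show()
# Keep only (height, weight) cells with at least 5 customers
dfc02z2 = dfc02z2[dfc02z2.cnt_cust>=5]
# ncust = mnmx_scl2(dfc02z2.cnt_cust)
# plt.hist(dfc02z2.cnt_cust, bins=50)
# plt.show()
# Build the colors after filtering so there is exactly one color per plotted cell
# (red = mostly Starbucks, blue = mostly books)
colors1 = [(x, 0, 1-x) for x in dfc02z2.rto_strbk]
plt.scatter(dfc02z2.weight, dfc02z2.height, 
            color=colors1, marker='s')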
plt.legend(['cust'])
plt.xlabel('weight')
plt.ylabel('height')
plt.suptitle('LADY, STARBUCKS or BOOK?')
plt.title('(red: STARBUCKS)', size=10, color='r')
plt.show()

   height  weight  rto_strbk  cnt_cust
0     153      44        1.0         1
1     153      53        0.0         1
2     154      43        1.0         1
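
The mean ratio and the customer count per (height, weight) cell can also be computed in one pass with named aggregation, instead of two `groupby` calls followed by a merge. A sketch on made-up rows (named aggregation needs pandas 0.25 or later; the data and the variable name `cells` are illustrative assumptions):

```python
import pandas as pd

# Made-up customer rows for illustration
demo = pd.DataFrame({
    "height":    [155, 155, 155, 156, 156],
    "weight":    [46, 46, 51, 44, 44],
    "rto_strbk": [1.0, 0.5, 1.0, 0.75, 0.25],
})

# One groupby producing both the mean ratio and the number of customers per cell
cells = (
    demo.groupby(["height", "weight"])
        .agg(rto_strbk=("rto_strbk", "mean"),
             cnt_cust=("rto_strbk", "count"))
        .reset_index()
)
print(cells)
```

The resulting columns (`height`, `weight`, `rto_strbk`, `cnt_cust`) match the layout of the merged table printed above.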

In [6]:

dfc02z21 = dfc02z2.drop(['cnt_cust'], axis=1)  # drop() returns a new DataFrame, so assign the result
dfc02z21.head()

Out[6]:

	height	weight	rto_strbk
16	155	46	0.742790
21	155	51	1.000000
30	156	43	0.666667
31	156	44	0.812500
32	156	45	0.750000
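
A note on the step above: `DataFrame.drop` returns a new frame and leaves the original untouched unless the result is assigned, and plain assignment (`b = a`) creates a second name for the same object rather than a copy. A small sketch with made-up values:

```python
import pandas as pd

# Made-up values for illustration
a = pd.DataFrame({"height": [155, 156],
                  "rto_strbk": [0.74, 0.81],
                  "cnt_cust": [6, 16]})

b = a                             # same object, not a copy
a.drop(columns=["cnt_cust"])      # result discarded: `a` is unchanged
still_has_count = "cnt_cust" in a.columns

c = a.drop(columns=["cnt_cust"])  # assigned: `c` lacks the column, `a` keeps it
print(still_has_count, list(c.columns), b is a)
```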

What if we use seaborn's heatmap function?

In [7]:

import seaborn as sns
# Pivot first!
dfc02z21 = dfc02z21.pivot(index='height', columns='weight',
                         values='rto_strbk')
sns.heatmap(dfc02z21, 
            cmap='RdBu',
            square=True)
plt.show()

In [8]:

print('Pivoted table\n--------------')
dfc02z21[[45,46,47,48,49,50]].head()

Pivoted table
--------------

Out[8]:

weight	45	46	47	48	49	50
height
155	NaN	0.742790	NaN	NaN	NaN	NaN
156	0.750000	0.928571	0.926997	0.827061	0.769231	0.941176
157	0.860032	0.789474	0.861700	0.739236	0.887682	0.834955
158	0.782934	0.816333	0.840793	0.876252	0.767102	0.841637
159	1.000000	0.813832	0.787359	0.851208	0.937500	0.736842
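
The `NaN` entries in the pivoted table are (height, weight) combinations that have no row in the long table, for example cells removed by the `cnt_cust >= 5` filter. `seaborn.heatmap` masks such cells, so they appear blank. A sketch of the long-to-wide step on made-up values:

```python
import pandas as pd

# Made-up long-format table: one row per (height, weight) cell
long_tbl = pd.DataFrame({
    "height":    [155, 156, 156, 157],
    "weight":    [46, 45, 46, 45],
    "rto_strbk": [0.74, 0.75, 0.93, 0.86],
})

# Rows become heights, columns become weights; missing combinations become NaN
wide = long_tbl.pivot(index="height", columns="weight", values="rto_strbk")
print(wide)
```

`pivot` requires each (index, columns) pair to be unique, which holds here because the table was already grouped by height and weight.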

In [9]:

import seaborn as sns
dfc02z21 = dfc02z2.pivot(index='height', columns='weight', 
                         values='rto_strbk')
# Reorder the y-axis values (largest to smallest)
dfc02z21 = dfc02z21.sort_index(ascending=False)
sns.heatmap(dfc02z21, 
            cmap='RdBu_r',  # reversed colormap
            square=True, linewidths=0.1)
plt.show()

In [10]:

# What if we change the colors?
sns.heatmap(dfc02z21, 
            cmap="YlGnBu",
            square=True)
plt.show()

What if the cell size in the scatter plot shows the number of people?

In [11]:

plt.scatter(dfc02z2.weight, dfc02z2.height, 
            s=dfc02z2.cnt_cust, color=colors1, marker='s')
plt.legend(['cust'])
plt.xlabel('weight')
plt.ylabel('height')
plt.suptitle('LADY, STARBUCKS or BOOK?')
plt.title('(red: STARBUCKS)', size=10, color='r')
plt.show()
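
Because `s` is the marker area in points squared, passing raw counts can make cells with few customers hard to see. One option is to rescale the counts into a fixed size range first. A sketch, assuming a hypothetical helper `scale_sizes` and made-up counts (this is not the `mnmx_scl2` helper referenced in the commented-out lines, whose definition is not shown):

```python
import numpy as np

def scale_sizes(counts, min_size=20.0, max_size=200.0):
    """Min-max scale counts into [min_size, max_size] for use as marker areas."""
    counts = np.asarray(counts, dtype=float)
    span = counts.max() - counts.min()
    if span == 0:
        # All counts equal: use the midpoint size for every marker
        return np.full(counts.shape, (min_size + max_size) / 2)
    return min_size + (counts - counts.min()) / span * (max_size - min_size)

# Made-up customer counts per cell
sizes = scale_sizes([6, 6, 16, 12, 26])
print(sizes)
```

The result could then be passed as `s=scale_sizes(dfc02z2.cnt_cust)` in the `plt.scatter` call above; the size range of 20 to 200 is an arbitrary choice for this sketch.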

Summary

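- Heatmaps are very useful in EDA.
- Using scatter instead of a dedicated heatmap function has more advantages than you might expect.
- With scatter you can draw whatever you want: it is FLEXIBLE!
- Because scatter is a basic feature, there is less worry about version or option changes.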

- End -


License: Attribution, NonCommercial, NoDerivatives
