[R Data Analysis] Scraping Data

YONG_X 2015. 6. 9. 12:04

#-------------------------

#---------------

# Scraping data ... a headache. I would rather not touch it, but...

# Uses only the rvest package

require(rvest)

#---------------

# Start with the simplest exercise

#-------------------

# Scrape article titles from a pre-selected page on revisioncon

library(rvest)

# Store the parsed page (read_html() replaces the older html() function)

bigdata_news <- read_html("http://revisioncon.co.kr/bbs/board.php?bo_table=tb08_06")

# Scrape

newsttla <- bigdata_news %>%
  html_nodes("td a") %>%
  html_text()

newsttla

newsttlb <- bigdata_news %>%
  html_nodes(".list-date3") %>%
  html_text()

newsttlb

df_aa <- data.frame(newsttla,newsttlb)
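
A minimal offline sketch of the same two selector calls, run against an inline HTML fragment rather than the live board. The fragment, its contents, and the name `df_list_demo` are made up for illustration; only the selectors `td a` and `.list-date3` come from the script above. It also checks that both vectors have the same length before calling `data.frame()`, since the real page may well return more links than dates.

```r
library(rvest)

# Hypothetical inline fragment standing in for the board listing page
page <- read_html('
<table>
  <tr><td><a href="#1">First post</a></td><td class="list-date3">06-01</td></tr>
  <tr><td><a href="#2">Second post</a></td><td class="list-date3">06-02</td></tr>
</table>')

# Same selectors as in the script above
titles <- page %>% html_nodes("td a") %>% html_text()
dates  <- page %>% html_nodes(".list-date3") %>% html_text()

# data.frame() needs columns of matching length, so check before combining
stopifnot(length(titles) == length(dates))
df_list_demo <- data.frame(title = titles, date = dates, stringsAsFactors = FALSE)
print(df_list_demo)
```

If the length check fails on a real page, the selectors probably need narrowing (SelectorGadget helps with that).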

#-------------------------------

# This time ...

#----------------

# complete scraper :: Revision big data news feed

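# Start from empty character vectors so no blank first row ends up in the data frame

newsttla <- character(0)

newsttlb <- character(0)

newsttlc <- character(0)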

# Reference:: http://stat4701.github.io/edav/2015/04/02/rvest_tutorial/

# ... install SelectorGadget in Chrome, then use it to work out the structure (CSS path) of the site to be scraped

# Scrape ids 4000 through 4020 (finishes quickly)

for (i in 4000:4020) {

# paste0() joins without a separator, so no gsub() cleanup of spaces is needed

newsurl <- read_html(paste0("http://revisioncon.co.kr/bbs/board.php?bo_table=tb08_06&wr_id=", i))

# Append each field of the post to its vector

newsttlai <- newsurl %>%
  html_nodes("#bo_v_title") %>%
  html_text()

# Remove unnecessary characters (newlines, carriage returns, tabs)

newsttlai <- gsub("[\r\n\t]", "", newsttlai)

newsttla <- c(newsttla , newsttlai )

# An intermediate check at this point can also help

# print(newsttla)

newsttlbi <- newsurl %>%
  html_nodes(".view-content") %>%
  html_text()

newsttlbi <- gsub("[\r\n\t]", "", newsttlbi)

newsttlb <- c(newsttlb , newsttlbi )

newsttlci <- newsurl %>%
  html_nodes(".last") %>%
  html_text()

# Remove the special characters by replacing them with "" via gsub()

newsttlci <- gsub("[\r\n\t]", "", newsttlci)

newsttlc <- c(newsttlc , newsttlci )

}
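
The loop above assumes every post yields exactly one match per selector; a missing or repeated node would leave the three vectors with different lengths, and `data.frame()` would then fail. One possible way to guard against that is sketched below, assuming a hypothetical helper `extract_one()` and two inline HTML fragments standing in for real posts (the second one deliberately has no `.last` node). This is an illustration, not the approach used in the original script.

```r
library(rvest)

# Hypothetical helper: return exactly one cleaned string per selector,
# or NA when the selector matches nothing on the page
extract_one <- function(page, css) {
  txt <- page %>% html_nodes(css) %>% html_text()
  if (length(txt) == 0) return(NA_character_)
  gsub("[\r\n\t]", "", txt[1])
}

# Inline stand-ins for two posts; the second lacks a date node
pages <- list(
  read_html('<h1 id="bo_v_title">\n\tTitle A</h1><div class="view-content">Body A</div><span class="last">15-06-01</span>'),
  read_html('<h1 id="bo_v_title">Title B</h1><div class="view-content">Body B\r\n</div>')
)

# Build one single-row data frame per page, then stack the rows
rows <- lapply(pages, function(p) {
  data.frame(
    news_title   = extract_one(p, "#bo_v_title"),
    news_preview = extract_one(p, ".view-content"),
    news_date    = extract_one(p, ".last"),
    stringsAsFactors = FALSE
  )
})
df_posts_demo <- do.call(rbind, rows)
print(df_posts_demo)
```

Because each page contributes exactly one row, the columns stay aligned even when a field is missing.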

# Combine the three separate vectors into a single data frame

df_news <- data.frame(newsttla,newsttlb, newsttlc)

# Rename columns

names(df_news) <- c("news_title", "news_preview", "news_date")

# Export as a CSV file

write.csv(df_news, "C:/Users/revision/Desktop/rvc_kb/02_EnGageMent/000_031_bigfi_ds/temp_news.csv")
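
A small sketch of the export step that can be run on any machine: it writes to a temporary directory instead of the hard-coded Desktop path and reads the file back to check it. `row.names = FALSE` and `fileEncoding = "UTF-8"` are choices assumed here, not part of the original call, and the data frame contents are placeholders.

```r
# Placeholder data standing in for df_news
df_export_demo <- data.frame(
  news_title = c("Title A", "Title B"),
  news_date  = c("15-06-01", NA),
  stringsAsFactors = FALSE
)

# Write to a temporary location rather than a machine-specific path
out_path <- file.path(tempdir(), "temp_news.csv")
write.csv(df_export_demo, out_path, row.names = FALSE, fileEncoding = "UTF-8")

# Read the file back to confirm the columns and rows survived the round trip
check <- read.csv(out_path, stringsAsFactors = FALSE, fileEncoding = "UTF-8")
print(check)
```

Dropping the row names keeps the CSV free of an extra unnamed index column when it is opened elsewhere.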

# Done

#-------------------
