DataScience

 

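From the given data, drop every column (variable) in which 80% or more of the values are missing. For columns with missing values below 80%, replace each missing value with the median for its 'city'. Then compute the mean of the 'f1' column.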

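library(dplyr)
df <- read.csv('../input/bigdatacertificationkr/basic1.csv')
# Count missing values per column: f1 has 31
apply(is.na(df), 2, sum)
# Fill missing f1 values with the median of f1 within each city.
# The pipe must end a line, not start one, or R stops parsing early.
df1 <- df %>%
  group_by(city) %>%
  mutate(pre_f1 = ifelse(is.na(f1), median(f1, na.rm = TRUE), f1)) %>%
  ungroup()
mean(df1$pre_f1)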

# Answer: 65.52
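
The snippet above only counts missing values by eye and never actually drops the columns at or above the 80% threshold. A minimal sketch of the full procedure follows, run on a small made-up data frame rather than basic1.csv. The names `toy`, `na_ratio`, `kept`, and `filled` are hypothetical and chosen only for illustration.

```r
library(dplyr)

# Toy data (hypothetical, not basic1.csv):
# f3 is 80% missing, f1 is 40% missing
toy <- data.frame(
  city = c("A", "A", "A", "B", "B"),
  f1   = c(10, NA, 30, NA, 50),
  f3   = c(NA, NA, NA, NA, 1)
)

# Share of missing values in each column
na_ratio <- colMeans(is.na(toy))

# Keep only the columns with less than 80% missing values
kept <- toy[, na_ratio < 0.8, drop = FALSE]

# Fill the remaining gaps in f1 with the median of f1 within each city
filled <- kept %>%
  group_by(city) %>%
  mutate(f1 = ifelse(is.na(f1), median(f1, na.rm = TRUE), f1)) %>%
  ungroup()

mean(filled$f1)
```

Computing the ratio with `colMeans(is.na(...))` keeps the threshold check independent of the number of rows, so the same lines should work unchanged on the real file, assuming it loads into a data frame with a `city` column.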

 

 

 

 

@Ninestar
