TIL. R - day03 review

hyuko · November 12, 2022

R Study


Review

Preparing the data and packages

-   ex) mpg <- as.data.frame(ggplot2::mpg) # load the data
-   library(dplyr) # load dplyr
-   library(ggplot2) # load ggplot2

Understanding the data

head() : first few rows
View() : inspect the raw data in a viewer window
dim() : dimensions (rows, columns)
str() : structure and column types
summary() : summary statistics
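A minimal sketch of the preparation and exploration steps above, run on the mpg data that ships with ggplot2 (assuming dplyr and ggplot2 are installed). View() is left out because it opens an interactive window.

```r
library(dplyr)
library(ggplot2)

# Copy ggplot2's mpg tibble into a plain data frame
mpg <- as.data.frame(ggplot2::mpg)

head(mpg, 3)      # first 3 rows
dim(mpg)          # number of rows and columns
str(mpg)          # column names and types
summary(mpg$hwy)  # min, quartiles, mean and max of highway mileage
```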

Renaming variables

-   ex) mpg <- rename(mpg, company = manufacturer) # rename(data, new = old)
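A quick check that the rename took effect. The column in mpg is spelled manufacturer; rename() takes new_name = old_name and returns a modified copy, so the result has to be assigned back.

```r
library(dplyr)

mpg <- as.data.frame(ggplot2::mpg)

# rename(data, new_name = old_name)
mpg <- rename(mpg, company = manufacturer)
names(mpg)  # "company" now appears where "manufacturer" was
```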

Creating derived variables

-   ex) mpg$total <- (mpg$cty + mpg$hwy)/2
-   ex) mpg$t <- ifelse(mpg$total >= 20, 'pass', 'fail')
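Putting the two derived-variable lines together as a runnable sketch. ifelse() is vectorized, so a single call labels every row.

```r
mpg <- as.data.frame(ggplot2::mpg)

# Average of city and highway mileage
mpg$total <- (mpg$cty + mpg$hwy) / 2

# Label each car by whether the average reaches 20
mpg$t <- ifelse(mpg$total >= 20, "pass", "fail")

head(mpg[, c("cty", "hwy", "total", "t")])
```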

Checking frequencies

table(mpg$t) # frequency table
qplot(mpg$t) # bar chart of the counts
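A sketch of the frequency check on the pass/fail label. The bar chart here uses ggplot() + geom_bar() instead of qplot(), since newer ggplot2 releases mark qplot() as deprecated; both draw one bar per category with its row count.

```r
library(ggplot2)

mpg <- as.data.frame(ggplot2::mpg)
mpg$total <- (mpg$cty + mpg$hwy) / 2
mpg$t <- ifelse(mpg$total >= 20, "pass", "fail")

# Frequency table: how many cars fall into each label
freq <- table(mpg$t)
freq

# Bar chart of the same counts
p <- ggplot(mpg, aes(x = t)) + geom_bar()
print(p)
```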

Summary of dplyr functions

Extracting only the rows that meet a condition

exam %>% filter(english >= 80)

Meeting several conditions at once (AND)

exam %>% filter(class == 1 & math >= 50)

Meeting at least one of several conditions (OR)

exam %>% filter(math >= 90 | english >= 90)
exam %>% filter(class %in% c(1,3,5)) # %in%: value is one of the listed values
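The filter() variants above, run on a small made-up exam data frame. The real exam data is not included in this post, so every score below is hypothetical.

```r
library(dplyr)

# Hypothetical exam scores for six students in three classes
exam <- data.frame(
  id      = 1:6,
  class   = c(1, 1, 2, 2, 3, 3),
  math    = c(50, 60, 95, 45, 80, 30),
  english = c(98, 75, 86, 92, 70, 85),
  science = c(50, 65, 78, 58, 90, 40)
)

high_eng  <- exam %>% filter(english >= 80)               # one condition
class1    <- exam %>% filter(class == 1 & math >= 50)     # AND: both must hold
top_any   <- exam %>% filter(math >= 90 | english >= 90)  # OR: either may hold
odd_class <- exam %>% filter(class %in% c(1, 3, 5))       # membership test

high_eng$id  # → 1 3 4 6
```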

Sorting

exam %>% arrange(math) # ascending
exam %>% arrange(desc(math)) # descending
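A sketch of both sort orders on hypothetical scores. The last line adds a second sort key: arrange() sorts by the first column and breaks ties with the next one.

```r
library(dplyr)

# Hypothetical math scores for six students in three classes
exam <- data.frame(
  id    = 1:6,
  class = c(1, 1, 2, 2, 3, 3),
  math  = c(50, 60, 95, 45, 80, 30)
)

asc       <- exam %>% arrange(math)              # lowest score first
desc_math <- exam %>% arrange(desc(math))        # highest score first
by_class  <- exam %>% arrange(class, desc(math)) # by class, then highest math within each class

asc$id  # → 6 4 1 2 5 3
```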

Extracting only the variables you need

exam %>% select(math)
exam %>% select(math, science)

Chaining functions; showing only part of the result

exam %>%
    select(id, math) %>%
    head(10)
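The pipe hands each step's result to the next, so the chain reads left to right: keep two columns, then keep the first ten rows. The data below is hypothetical, and select(-english) is an extra form of select() that drops a column instead of keeping it.

```r
library(dplyr)

# Hypothetical data: 20 students with made-up scores
exam <- data.frame(
  id      = 1:20,
  math    = seq(50, 88, by = 2),
  english = seq(60, 98, by = 2)
)

first10 <- exam %>%
  select(id, math) %>%  # keep two columns
  head(10)              # then keep the first 10 rows

no_eng <- exam %>% select(-english)  # minus sign drops a column

first10
```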

Adding derived variables (one or several at once)

exam %>% mutate(total = math + english + science)
exam %>% mutate(total = math + english + science,
                mean = (math + english + science)/3)
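A sketch on hypothetical scores. mutate() returns a new data frame and leaves exam unchanged unless the result is assigned; also, a column created earlier in the same mutate() call can be used by the ones after it, which avoids repeating the sum.

```r
library(dplyr)

# Hypothetical scores for three students
exam <- data.frame(
  id      = 1:3,
  math    = c(50, 60, 95),
  english = c(98, 75, 86),
  science = c(50, 65, 78)
)

scores <- exam %>%
  mutate(total = math + english + science,
         mean  = total / 3)  # reuses the total column created just above

scores
```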

Applying ifelse() inside mutate()

exam %>% mutate(test = ifelse(science > 60, "science class", "regular class"))
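A sketch with hypothetical science scores, followed by dplyr's count() to see how many students land in each group. Because the test is a strict > 60, a score of exactly 60 goes to the regular class.

```r
library(dplyr)

# Hypothetical science scores
exam <- data.frame(id = 1:6, science = c(50, 65, 78, 58, 90, 40))

exam <- exam %>%
  mutate(test = ifelse(science > 60, "science class", "regular class"))

count(exam, test)  # number of students in each group
```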

Summarizing by group

exam %>%
    group_by(class) %>%
    summarise(mean_math = mean(math))
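A sketch on hypothetical scores: group_by() marks the grouping, and summarise() then collapses each group to one row. Several statistics can be computed in the same call; n() counts the rows in each group.

```r
library(dplyr)

# Hypothetical math scores, two students per class
exam <- data.frame(
  class = c(1, 1, 2, 2, 3, 3),
  math  = c(50, 60, 95, 45, 80, 30)
)

by_class <- exam %>%
  group_by(class) %>%
  summarise(mean_math   = mean(math),
            median_math = median(math),
            n           = n())  # rows per class

by_class
```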

Splitting groups into subgroups

mpg %>%
    group_by(manufacturer, drv) %>%
    summarise(mean_cty = mean(cty))
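With two grouping variables, summarise() returns one row per manufacturer and drive-type combination. The .groups = "drop" argument (dplyr 1.0.0 and later) removes the leftover grouping and silences the message summarise() otherwise prints; the final arrange() is an extra step to put the most efficient combinations first.

```r
library(dplyr)

mpg <- as.data.frame(ggplot2::mpg)

cty_by_group <- mpg %>%
  group_by(manufacturer, drv) %>%
  summarise(mean_cty = mean(cty), .groups = "drop") %>%  # ungroup the result
  arrange(desc(mean_cty))

head(cty_by_group)
```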

Joining data

- Horizontally (adding columns)
total <- left_join(test1, test2, by = "id") # join the two tables on the column named in by

- Vertically (stacking rows)
all <- bind_rows(group_a, group_b)
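A sketch of both joins on hypothetical tables. Student 5 has no final score, which shows what left_join() does with an unmatched key: every row of the left table is kept and the missing column is filled with NA. bind_rows() stacks tables that share the same columns.

```r
library(dplyr)

# Hypothetical midterm and final scores; student 5 has no final score
test1 <- data.frame(id = 1:5, midterm = c(60, 80, 70, 90, 85))
test2 <- data.frame(id = 1:4, final   = c(70, 83, 65, 95))

# Horizontal: keep every row of test1, add the matching final score
total <- left_join(test1, test2, by = "id")

# Vertical: stack two hypothetical groups with identical columns
group_a <- data.frame(id = 1:3, test = c(60, 80, 70))
group_b <- data.frame(id = 4:5, test = c(90, 85))
all_groups <- bind_rows(group_a, group_b)

total
all_groups
```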

Data cleaning

Checking for missing values

table(is.na(df$score))

Removing missing values

df_nomiss <- df %>% filter(!is.na(score))

Removing rows with missing values in several variables at once

df_nomiss <- df %>% filter(!is.na(score) & !is.na(gender))
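A sketch on a hypothetical df with one missing gender and one missing score. Base R's na.omit() is shown as an alternative; it drops a row when any column is NA, so it can remove more rows than intended when other columns also have gaps.

```r
library(dplyr)

# Hypothetical data with one missing gender and one missing score
df <- data.frame(
  gender = c("M", "F", NA, "M", "F"),
  score  = c(5, 4, 3, 4, NA)
)

table(is.na(df$score))  # FALSE = present, TRUE = missing

score_ok  <- df %>% filter(!is.na(score))                   # drops row 5
df_nomiss <- df %>% filter(!is.na(score) & !is.na(gender))  # drops rows 3 and 5

# Base R alternative: drop rows with NA in any column
df_complete <- na.omit(df)
```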

Excluding missing values inside functions

mean(df$score, na.rm = TRUE)
exam %>% summarise(mean_math = mean(math, na.rm = TRUE))
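A sketch showing why na.rm matters, on the same kind of hypothetical df: a single NA makes mean() return NA, while na.rm = TRUE averages only the recorded values. TRUE is spelled out because T is an ordinary variable that can be reassigned.

```r
library(dplyr)

# Hypothetical data with one missing score
df <- data.frame(
  gender = c("M", "F", NA, "M", "F"),
  score  = c(5, 4, 3, 4, NA)
)

mean(df$score)                # NA: one missing value makes the result missing
mean(df$score, na.rm = TRUE)  # average of the four recorded scores

df %>% summarise(mean_score = mean(score, na.rm = TRUE),
                 n_recorded = sum(!is.na(score)))
```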

Checking for outliers (impossible values)

table(outlier$gender)

Recoding them as missing (NA)

outlier$gender <- ifelse(outlier$gender == 3, NA, outlier$gender)
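A sketch on a hypothetical outlier data frame, assuming gender should be coded 1 or 2 and score should run from 1 to 5. Once impossible values are recoded to NA, they can be filtered out before summarizing.

```r
library(dplyr)

# Hypothetical survey: gender should be 1 or 2, score should be 1 to 5
outlier <- data.frame(
  gender = c(1, 2, 1, 3, 2, 1),
  score  = c(5, 4, 3, 4, 2, 6)
)

table(outlier$gender)  # the single 3 is an impossible code

outlier$gender <- ifelse(outlier$gender == 3, NA, outlier$gender)
outlier$score  <- ifelse(outlier$score > 5, NA, outlier$score)

# Summarize only the rows where both values are valid
res <- outlier %>%
  filter(!is.na(gender) & !is.na(score)) %>%
  group_by(gender) %>%
  summarise(mean_score = mean(score))

res
```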

Finding the cutoffs for extreme values with a boxplot

boxplot(mpg$hwy)$stats

Recoding extreme values as missing

mpg$hwy <- ifelse(mpg$hwy < 12 | mpg$hwy > 37, NA, mpg$hwy)
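A sketch that reads the whisker ends from the data instead of typing the cutoffs by hand. It uses boxplot.stats(), which returns the same five numbers as boxplot()$stats without drawing a plot, and the element-wise | operator: || only evaluates a single TRUE/FALSE value (R 4.3.0 and later raise an error when given a longer vector), so it cannot drive ifelse() over a whole column.

```r
mpg <- as.data.frame(ggplot2::mpg)
hwy_original <- mpg$hwy

# Five numbers: lower whisker, lower hinge, median, upper hinge, upper whisker
stats <- boxplot.stats(mpg$hwy)$stats
lower <- stats[1]
upper <- stats[5]

# Element-wise OR (|) checks every value of the column
mpg$hwy <- ifelse(mpg$hwy < lower | mpg$hwy > upper, NA, mpg$hwy)

table(is.na(mpg$hwy))
```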
