[Hadoop] Hadoop WordCount (Example)

임재규 · June 19, 2023

Data_Engineering_Track_22

After connecting with PuTTY, switch to the hadoop user:

su hadoop

# Run the startup commands defined in .bashrc
$ start_dfs
$ start_yarn
$ start_mr
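
These three commands are not part of a stock Hadoop install, so they are presumably shortcuts defined in ~/.bashrc. The post does not show the definitions; the following is only a sketch of what they might look like, assuming Hadoop lives under ~/hadoop (the path used elsewhere in this post) and that start_mr starts the MapReduce JobHistory Server.

```shell
# Hypothetical ~/.bashrc entries. The alias names follow the commands used in this post,
# but the definitions are an assumption, not the author's actual configuration.
HADOOP_HOME="$HOME/hadoop"   # assumed install location

# HDFS daemons: NameNode, DataNodes, SecondaryNameNode
alias start_dfs="$HADOOP_HOME/sbin/start-dfs.sh"
# YARN daemons: ResourceManager, NodeManagers
alias start_yarn="$HADOOP_HOME/sbin/start-yarn.sh"
# MapReduce JobHistory Server (Hadoop 3.x syntax)
alias start_mr="$HADOOP_HOME/bin/mapred --daemon start historyserver"
```

After the daemons are up, running jps should list the corresponding Java processes.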

$ hdfs dfs -mkdir /mydata

$ hdfs dfs -put ~/hadoop/etc/hadoop/*.xml /mydata

$ hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar grep /mydata /output2 'dfs[a-z.]+'

This runs grep, one of the example jobs bundled with Hadoop MapReduce.
It searches text files stored on the Hadoop cluster.

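The job reads the text files under /mydata,
extracts every string that matches the regular expression dfs[a-z.]+ (that is, "dfs" followed by one or more lowercase letters or dots), counts how often each match occurs, and writes the result to /output2. The /output2 directory must not exist before the job runs, or the job fails.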
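
To get a feel for what the job computes without a cluster, the same match-and-count logic can be approximated locally with standard Unix tools. This is only a rough single-machine equivalent, not how the MapReduce job is implemented; the sample XML file and the variable names workdir and result are made up for illustration.

```shell
# Local approximation of the Hadoop grep example using standard Unix tools.
# The sample file below is invented for illustration.
workdir=$(mktemp -d)
cat > "$workdir/sample.xml" <<'EOF'
<configuration>
  <property><name>dfs.replication</name><value>1</value></property>
  <property><name>dfs.namenode.name.dir</name><value>/data/nn</value></property>
  <!-- dfs.replication controls the block copy count -->
</configuration>
EOF

# grep: -o prints only the matched text, -h omits file names, -E enables extended regex.
# sort | uniq -c counts each distinct match; sort -rn lists the most frequent first.
# awk rewrites each line as "count<TAB>match".
result=$(grep -ohE 'dfs[a-z.]+' "$workdir"/*.xml \
  | sort | uniq -c | sort -rn \
  | awk '{print $1 "\t" $2}')

printf '%s\n' "$result"
rm -rf "$workdir"
```

For this sample, dfs.replication appears twice and dfs.namenode.name.dir once, so dfs.replication is listed first with a count of 2.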

$ hdfs dfs -cat /output2/*
