[Python] Running a Notebook in Docker and Connecting to Hive

JunMyung Lee·May 2, 2022

This example walks through running a Python notebook in Docker and connecting it to Hive, up to the point where query results come back.

Starting Docker

Pull image

Pull the anaconda3 image from Docker Hub.

docker pull continuumio/anaconda3

Run container

First, start the container once to check that it works.

docker run -i -t -p 8888:8888 continuumio/anaconda3 /bin/bash -c "\
    conda install jupyter -y --quiet && \
    mkdir -p /opt/notebooks && \
    jupyter notebook \
    --notebook-dir=/opt/notebooks --ip='*' --port=8888 \
    --no-browser --allow-root"
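
Before opening the browser, it can help to confirm that the published port actually answers. The following is a minimal sketch and not part of the original setup: `port_is_open` is a hypothetical helper name, and the host `localhost` and port 8888 are assumed from the `-p 8888:8888` mapping above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `port_is_open("localhost", 8888)` on the Docker host should return True once the notebook server is listening. A False result right after `docker run` may only mean that `conda install jupyter` has not finished yet.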

Check jupyter notebook

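Once the connection works, the notebook asks for a token, which is quite a nuisance (when the token expires, you have to authenticate again).
So I switch to password authentication instead.

Reference: https://financedata.github.io/posts/jupyter-notebook-authentication.html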

  1. Generate the config file
jupyter notebook --generate-config
  2. Generate the password
# Input
ipython
# Input
from IPython.lib import passwd
# Input
passwd()
# Output
Out[2]: 'sha1:f0bf7a023f60:25920410f68d70c03175e3fec4619c497b84193f'
  3. Edit jupyter_notebook_config.py
    Add the following to /root/.jupyter/jupyter_notebook_config.py, using the hash generated in step 2 as the password value. The image does not include vi, so the lines are appended with echo.
echo "c = get_config()" >> /root/.jupyter/jupyter_notebook_config.py
echo "c.NotebookApp.ip = '0.0.0.0'" >> /root/.jupyter/jupyter_notebook_config.py
echo "c.NotebookApp.open_browser = False" >> /root/.jupyter/jupyter_notebook_config.py
echo "c.NotebookApp.port = 8888" >> /root/.jupyter/jupyter_notebook_config.py
echo "c.NotebookApp.password = 'sha1:06234b148e8d:f698b724e1cbfdd2713f00c9e84ccfaffb1cㅁㅁㅁㅁ'" >> /root/.jupyter/jupyter_notebook_config.py
  4. Restart the docker container
    You can either commit the modified container as a new image or restart the container that was already running.
    If you commit a new image, the settings are already in the config file, so starting the container only needs
jupyter notebook --allow-root

The IP, port, and password now come from the config file. --allow-root is still required when the container runs as root, because the config above does not set it.
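
The password and config file steps above can also be done from Python instead of ipython plus echo. The sketch below rests on an assumption: as far as I understand it, the legacy `passwd()` output has the form `algorithm:salt:hexdigest`, where the digest is taken over the passphrase followed by the salt. The helper names `make_password_hash`, `check_password`, and `append_notebook_config` are hypothetical and not part of Jupyter. For a real server, prefer the hash produced by `passwd()` itself.

```python
import hashlib
import secrets
from pathlib import Path


def make_password_hash(passphrase: str, algorithm: str = "sha1", salt_len: int = 12) -> str:
    """Build 'algorithm:salt:hexdigest' (assumed legacy notebook format)."""
    salt = secrets.token_hex(salt_len // 2)  # salt_len hex characters
    digest = hashlib.new(algorithm, passphrase.encode("utf-8") + salt.encode("ascii"))
    return f"{algorithm}:{salt}:{digest.hexdigest()}"


def check_password(hashed: str, passphrase: str) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    algorithm, salt, expected = hashed.split(":", 2)
    digest = hashlib.new(algorithm, passphrase.encode("utf-8") + salt.encode("ascii"))
    return secrets.compare_digest(digest.hexdigest(), expected)


def append_notebook_config(config_path, password_hash: str, port: int = 8888) -> None:
    """Append the same settings as the echo commands above to the config file."""
    lines = [
        "c = get_config()",
        "c.NotebookApp.ip = '0.0.0.0'",
        "c.NotebookApp.open_browser = False",
        f"c.NotebookApp.port = {port}",
        f"c.NotebookApp.password = '{password_hash}'",
    ]
    path = Path(config_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
```

Inside the container this would be called as `append_notebook_config("/root/.jupyter/jupyter_notebook_config.py", hashed)`, where `hashed` is the value from step 2. Appending rather than overwriting mirrors the `>>` redirection used with echo.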

Hive connection test

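The reason for this test: with the jupyter hub docker image provided by EMR, the Hive-related packages could not be installed because of a kernel version problem (several other problems came up as well). So I test the connection here first.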

Installing pyhive with anaconda

conda install pyhive
# Output
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /opt/conda

  added / updated specs:
    - pyhive


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    cyrus-sasl-2.1.27          |       h758a394_8         275 KB
    libdb-6.2.32               |       hf484d3e_0        18.5 MB
    pyhive-0.6.1               |   py39h06a4308_0         368 KB
    sasl-0.2.1                 |   py39h48830cd_1          58 KB
    thrift-0.13.0              |   py39h2531618_0         119 KB
    thrift_sasl-0.4.2          |   py39h06a4308_1          11 KB
    ------------------------------------------------------------
                                           Total:        19.3 MB

The following NEW packages will be INSTALLED:

  cyrus-sasl         pkgs/main/linux-64::cyrus-sasl-2.1.27-h758a394_8
  libdb              pkgs/main/linux-64::libdb-6.2.32-hf484d3e_0
  pyhive             pkgs/main/linux-64::pyhive-0.6.1-py39h06a4308_0
  sasl               pkgs/main/linux-64::sasl-0.2.1-py39h48830cd_1
  thrift             pkgs/main/linux-64::thrift-0.13.0-py39h2531618_0
  thrift_sasl        pkgs/main/linux-64::thrift_sasl-0.4.2-py39h06a4308_1


Proceed ([y]/n)? y


Downloading and Extracting Packages
thrift-0.13.0        | 119 KB    | ########## | 100%
pyhive-0.6.1         | 368 KB    | ########## | 100%
sasl-0.2.1           | 58 KB     | ########## | 100%
libdb-6.2.32         | 18.5 MB   | ########## | 100%
cyrus-sasl-2.1.27    | 275 KB    | ########## | 100%
thrift_sasl-0.4.2    | 11 KB     | ########## | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

With pip install, only the package I named was installed, whereas anaconda also installed the packages it depends on.
Conclusion -> use anaconda.
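
To see whether the dependencies really are in place, one option is to ask Python which modules it can find. This is a minimal sketch: `missing_modules` is a hypothetical helper, and it assumes the import names match the package names shown in the install log (`pyhive`, `sasl`, `thrift`, `thrift_sasl`).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Running `missing_modules(["pyhive", "sasl", "thrift", "thrift_sasl"])` in the notebook should give an empty list after the conda install above; any name it returns is a package that still needs installing.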

Writing the code in jupyter notebook

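from pyhive import hive
# Replace [IP] with the HiveServer2 host; this example connects on port 10000.
conn = hive.Connection(host='[IP]', port=10000, database='partner')
cursor = conn.cursor()
cursor.execute('select * from partner.db_business')
# fetchall() returns every row of the result as a tuple.
for row in cursor.fetchall():
    print(row)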
# Output
(10001, '', 2, '2021-05-26 13:32:20.784929', '2021-05-26 13:32:20.784929', False, 'null')
(10004, '', 2, '2021-05-26 18:18:11.791122', '2021-05-26 18:18:11.791122', False, 'null')
(10007, '', 3, '2021-05-27 14:46:11.417005', '2021-05-27 14:46:11.417005', False, 'null')
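
The rows come back as bare tuples, so it is hard to tell which value belongs to which column. A DB-API cursor exposes the column names through `cursor.description`, and pairing them with each row makes the output easier to read. The sketch below is an illustration under assumptions: `rows_as_dicts` is a hypothetical helper, sqlite3 stands in for Hive so the code runs without a server, and the column names `id` and `status` are invented because the real schema of `db_business` is not shown here.

```python
import sqlite3  # stand-in only; with Hive the cursor would come from pyhive


def rows_as_dicts(cursor):
    """Pair each fetched row with the column names from cursor.description."""
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Demo against an in-memory table with hypothetical columns.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE db_business (id INTEGER, status INTEGER)")
cursor.execute("INSERT INTO db_business VALUES (10001, 2)")
cursor.execute("SELECT * FROM db_business")
records = rows_as_dicts(cursor)
print(records)
conn.close()
```

The same function should work with the pyhive cursor after `cursor.execute(...)`, although the column names Hive reports may carry a table prefix depending on the server settings.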
