docker airflow selenium

이상민 · May 11, 2023

selenium image pull

Official documentation

$ docker pull selenium/standalone-chrome

$ docker images
REPOSITORY                      TAG       IMAGE ID       CREATED        SIZE
extend_airflow                  latest    6dd9f35e1da5   12 hours ago   1.3GB
seleniarm/standalone-chromium   latest    330645947519   2 days ago     1.59GB
selenium/standalone-chrome      latest    b4da11a7c583   2 days ago     1.29GB

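Do not `docker run` this image by itself. It is started later together with Airflow through docker compose, and a separately running container would collide on the same port.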
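The port conflict mentioned above can be checked for before the stack is brought up. The following is a minimal sketch, assuming Selenium is published on host port 4444 as in the compose file later in this post; `port_in_use` is a hypothetical helper name, not part of Docker or Selenium.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 when the connection succeeds, an errno otherwise
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 4444 is the port published by the selenium service in docker-compose.yaml
    if port_in_use(4444):
        print("Port 4444 is already taken; stop the other container first.")
    else:
        print("Port 4444 is free.")
```

If this reports the port as taken, a standalone selenium container is probably still running and should be stopped before `docker compose up`.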

docker extend

$ vi requirements.txt

selenium
webdriver_manager

List the Python modules needed to use selenium.

$ vi Dockerfile

FROM apache/airflow:2.6.0
COPY requirements.txt /
RUN pip install --no-cache-dir -r /requirements.txt

$ vi docker-compose.yaml

services:
...
  selenium:
    container_name: remote_chromedriver
    image: selenium/standalone-chrome:latest
    ports:
      - 4444:4444
    restart: always
volumes:
  postgres-db-volume:

The image pulled earlier is run as one of the services in the compose file.

docker run

$ docker build . --tag extend_airflow:latest
$ docker compose up -d
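After `docker compose up -d`, the selenium container needs a few seconds before it accepts sessions. A rough way to wait for it is to poll the WebDriver status endpoint. This is only a sketch: `grid_is_ready` and `wait_for_grid` are hypothetical helper names, and the base URL assumes the port mapping 4444:4444 from the compose file above.

```python
import json
import time
import urllib.error
import urllib.request


def grid_is_ready(status_payload: dict) -> bool:
    """Interpret the JSON body returned by the WebDriver /status endpoint."""
    return bool(status_payload.get("value", {}).get("ready", False))


def wait_for_grid(base_url: str, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll base_url + '/status' until the server reports ready or time runs out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/status", timeout=5) as resp:
                if grid_is_ready(json.load(resp)):
                    return True
        except (urllib.error.URLError, OSError, ValueError):
            pass  # container still starting, or the body was not valid JSON
        time.sleep(interval)
    return False


if __name__ == "__main__":
    # From the host use localhost; inside the Airflow containers the
    # container name (remote_chromedriver) would be used instead.
    print(wait_for_grid("http://localhost:4444"))
```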

webdriver

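# `options` is a selenium Options object (see the example below)
remote_webdriver = 'remote_chromedriver'  # container_name from docker-compose.yaml
with webdriver.Remote(f'http://{remote_webdriver}:4444/wd/hub', options=options) as driver:
    # Scraping part
    pass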
# Example
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument('--headless')
options.add_argument('--window-size=1200,600')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
remote_webdriver = 'remote_chromedriver'
with webdriver.Remote(f'http://{remote_webdriver}:4444/wd/hub', options=options) as driver:
    # Scraping part
    driver.get("https://n.news.naver.com/mnews/article/005/0001606450")
    a = main(driver)  # main() is the scraping function, defined elsewhere

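You have to use a remote webdriver, because Chrome runs in the separate selenium container and not inside the Airflow container.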
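As an illustration of how this might be packaged for an Airflow task, here is a small sketch. The names `remote_url` and `fetch_title` are made up for this example, and the host name assumes the `container_name` from the compose file in this post. The selenium imports sit inside the function so the file can still be parsed in places where selenium is not installed.

```python
def remote_url(host: str = "remote_chromedriver", port: int = 4444) -> str:
    """Build the command-executor URL for the selenium service."""
    return f"http://{host}:{port}/wd/hub"


def fetch_title(page_url: str) -> str:
    """Open page_url in the remote Chrome and return the page title."""
    # Imported lazily; only needed when the task actually runs
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-dev-shm-usage")
    # The context manager quits the remote session when the block exits
    with webdriver.Remote(remote_url(), options=options) as driver:
        driver.get(page_url)
        return driver.title
```

In a DAG, a function like `fetch_title` could be passed as `python_callable` to a `PythonOperator`, with the page address supplied through `op_kwargs`.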

References

stackoverflow
