$ docker pull selenium/standalone-chrome
$ docker images
REPOSITORY                      TAG      IMAGE ID       CREATED        SIZE
extend_airflow                  latest   6dd9f35e1da5   12 hours ago   1.3GB
seleniarm/standalone-chromium   latest   330645947519   2 days ago     1.59GB
selenium/standalone-chrome      latest   b4da11a7c583   2 days ago     1.29GB
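Do not start the image with docker run at this point. It is started later together with Airflow, and running it separately as well would cause a port conflict.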
$ vi requirements.txt
selenium
webdriver_manager
List the modules that Selenium needs.
$ vi Dockerfile
FROM apache/airflow:2.6.0
COPY requirements.txt /
RUN pip install --no-cache-dir -r /requirements.txt
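Once the image has been built, a quick check like the one below could be run inside the Airflow container to confirm that the packages from requirements.txt are importable. This is only a sketch: missing_modules is a hypothetical helper name, and the import names are assumed to match the package names listed above.

```python
import importlib.util

# Import names of the packages listed in requirements.txt (assumed to match).
REQUIRED_MODULES = ["selenium", "webdriver_manager"]


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling missing_modules(REQUIRED_MODULES) inside the container and getting an empty list back would indicate that both packages were installed by the pip step.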
$ vi docker-compose.yaml
services:
  ...
  selenium:
    container_name: remote_chromedriver
    image: selenium/standalone-chrome:latest
    ports:
      - 4444:4444
    restart: always
volumes:
  postgres-db-volume:
The image pulled earlier is run as one of the services in the compose file.
$ docker build . --tag extend_airflow:latest
$ docker compose up -d
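After the containers are up, it can help to confirm that the Selenium server is reachable before any scraping task runs. The sketch below uses only the standard library and polls the server's /status endpoint. It assumes the container name from the compose file resolves as a hostname from other containers on the same compose network (from the host machine, localhost would be used instead). The function names status_url, is_ready, and wait_until_ready are hypothetical.

```python
import json
import time
import urllib.request

# Assumption: the container name from docker-compose.yaml is used as the
# hostname, and the Selenium server listens on the published port 4444.
SELENIUM_HOST = "remote_chromedriver"
SELENIUM_PORT = 4444


def status_url(host: str = SELENIUM_HOST, port: int = SELENIUM_PORT) -> str:
    """Build the URL of the Selenium server's status endpoint."""
    return f"http://{host}:{port}/status"


def is_ready(payload: dict) -> bool:
    """Return True if a /status response body reports the server as ready."""
    return bool(payload.get("value", {}).get("ready", False))


def wait_until_ready(url: str, attempts: int = 10, delay: float = 3.0) -> bool:
    """Poll the status endpoint until it reports ready or attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if is_ready(json.load(response)):
                    return True
        except (OSError, ValueError):
            # Connection failed (server not up yet) or the body was not JSON.
            pass
        time.sleep(delay)
    return False
```

A call such as wait_until_ready(status_url()) could be placed at the start of a scraping task so that it fails early when the Selenium container is not available.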
remote_webdriver = 'remote_chromedriver'
with webdriver.Remote(f'http://{remote_webdriver}:4444/wd/hub', options=options) as driver:
    # Scraping part
    pass
Example:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument('--headless')
options.add_argument('--window-size=1200,600')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
remote_webdriver = 'remote_chromedriver'  # container_name from docker-compose.yaml
with webdriver.Remote(f'http://{remote_webdriver}:4444/wd/hub', options=options) as driver:
    # Scraping part
    driver.get("https://n.news.naver.com/mnews/article/005/0001606450")
    a = main(driver)  # main() is the scraping function, defined elsewhere
A remote webdriver must be used, because Chrome runs in the separate Selenium container rather than in the Airflow container.
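To call the scraping code from an Airflow task, the remote session can be wrapped in a plain function. The following is a minimal sketch under stated assumptions: fetch_page_title is a hypothetical helper that only returns the page title, and REMOTE_URL and CHROME_ARGS are illustrative constants based on the setup above, not part of the original notes.

```python
# Assumed values: hostname and port follow the compose file shown earlier.
REMOTE_URL = "http://remote_chromedriver:4444/wd/hub"
CHROME_ARGS = ["--headless", "--no-sandbox", "--disable-dev-shm-usage"]


def fetch_page_title(url: str, remote_url: str = REMOTE_URL) -> str:
    """Open url in the remote Chrome and return the page title.

    Hypothetical helper for illustration only.
    """
    # Imported inside the function so selenium is only needed when the task runs.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in CHROME_ARGS:
        options.add_argument(arg)

    # The context manager quits the remote session even if the scrape raises.
    with webdriver.Remote(command_executor=remote_url, options=options) as driver:
        driver.get(url)
        return driver.title
```

A function like this could be passed as python_callable to Airflow's PythonOperator, with the target URL supplied through op_kwargs.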