TIL: Python | selenium crawling - 221023

Lumpen · October 23, 2022


Selenium

A library that makes dynamic crawling possible

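Crawling: extracting information from web documents (by traversing their HTML tags)
Dynamic crawling: crawling that can act the way a user does, e.g. clicking or typing into inputs
Static crawling: crawling that can only collect what is on the page first loaded from a given URL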
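Static crawling only sees the HTML the server returns for a URL. A minimal sketch with the standard library's html.parser, assuming a hypothetical HTML string stands in for the response body (a real crawler would download it first, e.g. with urllib.request): the link already present in the markup is found, but the placeholder that JavaScript would fill in later yields nothing, which is the gap dynamic crawling with Selenium fills. LinkTextCollector and the markup are illustrative only.

```python
from html.parser import HTMLParser


class LinkTextCollector(HTMLParser):
    """Collect the text of every <a> whose class list contains `wanted_class`."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self._in_target = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None for bare attributes
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and self.wanted_class in classes:
            self._in_target = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_target = False

    def handle_data(self, data):
        if self._in_target:
            self.texts[-1] += data


# Hypothetical first response: the second item is an empty placeholder
# that client-side JavaScript would populate after the page loads.
html = """
<ul id="result">
  <li><a class="link_name" href="/1">Cafe A</a></li>
  <li><div class="loading" data-src="/api/places?page=2"></div></li>
</ul>
"""

collector = LinkTextCollector("link_name")
collector.feed(html)
print(collector.texts)  # ['Cafe A']
```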

Selenium can be used from other languages too, but pandas + pymysql are so convenient that I stick with Python.

The only annoying part is how much Selenium's usage has changed..

I'm running it on Colab..

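from selenium import webdriver as wd
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service

# Colab has no display, so run Chrome headless with the usual container workarounds
options = wd.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
# Selenium 4 takes the chromedriver path via a Service object; the old positional path argument fails in recent releases
driver = wd.Chrome(service=Service('chromedriver'), options=options)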


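# (open the target page with driver.get(...) first)
driver.find_element(By.XPATH, '//*[@id="info.search.place.more"]')  # first match, or NoSuchElementException
driver.find_elements(By.CSS_SELECTOR, "div.head_item.clickArea > strong > a.link_name")  # list of all matches, possibly empty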

The find_element_by_css_selector('css_selector') syntax (deprecated in Selenium 4 and since removed) has changed to find_element(By.CSS_SELECTOR, 'css_selector'), which requires importing By.
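Since the old helpers no longer exist, older scripts need every call rewritten. A rough sketch of a regex-based rewriter for that migration; migrate_find_calls is a hypothetical helper, and it assumes each call sits on one line and that you add the By import yourself.

```python
import re

# Old Selenium 3 method suffix -> name of the matching By constant in Selenium 4
_STRATEGIES = {
    "id": "ID",
    "name": "NAME",
    "xpath": "XPATH",
    "link_text": "LINK_TEXT",
    "partial_link_text": "PARTIAL_LINK_TEXT",
    "tag_name": "TAG_NAME",
    "class_name": "CLASS_NAME",
    "css_selector": "CSS_SELECTOR",
}

# Matches ".find_element_by_xxx(" and ".find_elements_by_xxx("
_OLD_CALL = re.compile(
    r"\.find_(element|elements)_by_(" + "|".join(_STRATEGIES) + r")\("
)


def migrate_find_calls(source):
    """Rewrite .find_element(s)_by_<strategy>(x) as .find_element(s)(By.<STRATEGY>, x)."""
    def repl(match):
        method, strategy = match.groups()
        return f".find_{method}(By.{_STRATEGIES[strategy]}, "
    return _OLD_CALL.sub(repl, source)


old = "driver.find_element_by_css_selector('div.head_item.clickArea > strong > a.link_name')"
print(migrate_find_calls(old))
# driver.find_element(By.CSS_SELECTOR, 'div.head_item.clickArea > strong > a.link_name')
```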

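I'm wary of locating elements by CLASS_NAME, ID and similar strategies (for example, By.CLASS_NAME expects a single class name, so a value with spaces such as "head_item clickArea" won't match the way you might expect).
When CSS_SELECTOR can't find an element, XPATH seems to be the better fallback.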
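One likely reason CSS selectors miss an element like the one above: an id containing dots, such as info.search.place.more, written as #info.search.place.more is parsed as the id info plus the classes search, place and more, so nothing matches, whereas XPath's @id comparison uses the whole string. A small sketch of helpers that build exact-id locators either way (xpath_literal, xpath_by_id and css_by_id are hypothetical names); the returned strings would be passed to find_element(By.XPATH, ...) or find_element(By.CSS_SELECTOR, ...).

```python
def xpath_literal(value):
    """Quote a string as an XPath 1.0 literal (XPath 1.0 has no escape sequences)."""
    if '"' not in value:
        return f'"{value}"'
    if "'" not in value:
        return f"'{value}'"
    # Both quote types present: split on " and glue the pieces back with concat()
    pieces = [f'"{part}"' for part in value.split('"')]
    return "concat(" + ", '\"', ".join(pieces) + ")"


def xpath_by_id(element_id):
    """XPath that compares the whole id string, dots included."""
    return f"//*[@id={xpath_literal(element_id)}]"


def css_by_id(element_id):
    """CSS attribute selector for an exact id, avoiding '#a.b' (id 'a' plus class 'b')."""
    escaped = element_id.replace("\\", "\\\\").replace('"', '\\"')
    return f'[id="{escaped}"]'


print(xpath_by_id("info.search.place.more"))  # //*[@id="info.search.place.more"]
print(css_by_id("info.search.place.more"))    # [id="info.search.place.more"]
```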

element not interactable: javascript:void(0); has no size and location

When this error occurs, switching to an XPATH locator fixes it.

Now that front ends are mostly built as SPAs with client-side rendering (CSR),
clicking an a link sometimes has no effect.

In such cases,
sending Enter to the element, as below, can open it.

from selenium.webdriver.common.keys import Keys

# Press Enter on the link instead of clicking it
driver.find_element(By.XPATH, '//*[@id="info.search.place.more"]').send_keys(Keys.ENTER)
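The click-then-Enter fallback can be wrapped in a helper. A minimal sketch, assuming a hypothetical click_or_press_enter function that takes any element with click() and send_keys() plus the exception types that should trigger the fallback; it only covers clicks that raise, so a click that silently does nothing still needs the direct send_keys call above. FakeLink is a hypothetical stand-in so the sketch runs without a browser.

```python
ENTER = "\ue007"  # the WebDriver key code that selenium's Keys.ENTER holds


def click_or_press_enter(element, fallback_on, enter_key=ENTER):
    """Click `element`; if the click raises one of `fallback_on`, press Enter on it instead.

    `element` is anything with .click() and .send_keys(), such as a Selenium WebElement.
    Returns which action was used, which is handy for logging.
    """
    try:
        element.click()
        return "click"
    except fallback_on:
        element.send_keys(enter_key)
        return "enter"


class FakeLink:
    """Hypothetical element whose click() fails when it is not clickable."""

    def __init__(self, clickable):
        self.clickable = clickable
        self.keys = []

    def click(self):
        if not self.clickable:
            raise RuntimeError("element not interactable")

    def send_keys(self, value):
        self.keys.append(value)


link = FakeLink(clickable=False)
print(click_or_press_enter(link, fallback_on=(RuntimeError,)))  # enter
```

With Selenium, you would pass Keys.ENTER as enter_key and exception classes such as ElementNotInteractableException and ElementClickInterceptedException from selenium.common.exceptions as fallback_on.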
