TIL Python Basics Day 48 - Selenium

Dayeon Lee · January 31, 2021

Udemy Python Course


Selenium

CSS selector tips: https://saucelabs.com/resources/articles/selenium-tips-css-selectors

Installing driver

  • The Selenium package can drive different browsers (Chrome, Firefox, Safari, etc.). A browser-specific driver provides the bridge between Selenium and the browser (here we use the driver for Chrome).
from selenium import webdriver

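# Requires the selenium package: pip install selenium
# Written against Selenium 3; Selenium 4 replaced executable_path with a Service object
# and find_element_by_* with find_element(By.<strategy>, value).
chrome_driver_path = r"C:\dayeon2020\chromedriver.exe"  # raw string keeps the backslashes literal; on Mac there is no .exe
driver = webdriver.Chrome(executable_path=chrome_driver_path)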

driver.get("https://www.amazon.com/Cuckoo-CRP-P0609S-Cooker-10-10-11-60/dp/B01JRTZVVM/ref=sr_1_4?qid=1611738512&sr=8-4")

driver.close()  # closes only the current window (tab)
# driver.quit()  # shuts down the whole browser, regardless of how many tabs are open
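
Backslashes in a Windows path can collide with Python escape sequences. The driver path above is safe only because \d and \c are not escapes. A minimal sketch, using made-up folder names chosen for illustration, of what a raw string (r"...") protects against:

```python
# Hypothetical folder names, chosen because \t and \n ARE escape sequences.
plain = "C:\tools\new"   # \t becomes a tab, \n becomes a newline
raw = r"C:\tools\new"    # raw string: backslashes are kept literally

print(len(plain))                     # 10
print(len(raw))                       # 12
print("\t" in plain, "\n" in plain)   # True True
print(raw == "C:\\tools\\new")        # True
```

Forward slashes are another option that may be worth trying on Windows, but the raw string keeps the path exactly as it appears in the file explorer.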



Locating

Find and locate HTML elements

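  • Beautiful Soup has its limits when a website is rendered with JavaScript (Angular, etc.), because it only parses the HTML it is given. Selenium drives a real browser, so it sees the rendered page.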

driver.get("https://www.amazon.com/Cuckoo-CRP-P0609S-Cooker-10-10-11-60/dp/B01JRTZVVM/ref=sr_1_4?qid=1611738512&sr=8-4")
price = driver.find_element_by_id("priceblock_ourprice")
print(price.text)

driver.quit()  # shuts down the whole browser, regardless of how many tabs are open



Locating with name and class name

Targets on python.org: the search bar (by name) and the logo (by class name)

from selenium import webdriver

# Requires the selenium package: pip install selenium
chrome_driver_path = r"C:\dayeon2020\chromedriver.exe"
driver = webdriver.Chrome(executable_path=chrome_driver_path)

driver.get("https://www.python.org/")

#search_bar
search_bar = driver.find_element_by_name("q")
print(search_bar.tag_name) #print: input
print(search_bar.get_attribute("placeholder")) #print: search

#logo
logo = driver.find_element_by_class_name("python-logo")
print(logo.size) #print: {'height': 72, 'width': 255}

driver.quit() #entire program. regardless how many tabs

Locating with selector

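  • Note the dot at the start of ".documentation-widget a": it marks a class selector. The element's full class attribute is "small-widget documentation-widget", which is two separate classes, so selecting on one of them is enough. The " a" part then picks the anchor tag inside that element.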
#selector
driver.get("https://www.python.org/")
documentation_link = driver.find_element_by_css_selector(".documentation-widget a")
print(documentation_link.text) #print: docs.python.org
driver.close() 

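XPath

  • XPath can be used to navigate through elements and attributes in an XML (or HTML) document.
  • In the browser's developer tools, right-click the element and copy its XPath. The copied XPath contains double quotes, so change them to single quotes (or wrap the Python string in single quotes). Otherwise the inner double quotes end the string early and the code fails with a syntax error.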
driver.get("https://www.python.org/")
bug_link = driver.find_element_by_xpath("//*[@id='site-map']/div[2]/div/ul/li[3]/a")
print(bug_link.text) #print: Submit Website Bug
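
The quoting rule above can be checked without a browser. A small sketch showing three ways to write the same XPath as a Python string; the variable names are made up for illustration:

```python
# Option 1: keep the copied double quotes and wrap the string in single quotes.
copied = '//*[@id="site-map"]/div[2]/div/ul/li[3]/a'

# Option 2: keep double quotes outside and switch the inner ones to single quotes.
converted = "//*[@id='site-map']/div[2]/div/ul/li[3]/a"

# Option 3: escape the inner double quotes with backslashes.
escaped = "//*[@id=\"site-map\"]/div[2]/div/ul/li[3]/a"

print(copied == escaped)                      # True
print(copied.replace('"', "'") == converted)  # True
```

XPath accepts either quote style around attribute values, so all three strings select the same element.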

Difference between find_elements & find_element

find_elements: returns every match as a list
find_element: returns only the first match, as a single WebElement
Article: Locating strategy

  • find_elements
events = driver.find_elements_by_xpath('//*[@id="content"]/div/section/div[2]/div[2]/div/ul')
print(events)
# -> a list holding the matching WebElement object:
# [<selenium.webdriver.remote.webelement.WebElement
# (session="0621dbfca2d047fd85c6dba14c554b6c", element="fac0caf1-8b8a-4ee0-9f2d-5d347cb3df72")>]
print(type(events))
# -> <class 'list'>


for i in events:
    print(i.text)
  • find_element

    Use .text.splitlines() to put each item in a list when there are multiple li items under the ul element.

event = driver.find_element_by_xpath('//*[@id="content"]/div/section/div[2]/div[2]/div/ul')
print(event)
# -> a single WebElement object:
# <selenium.webdriver.remote.webelement.WebElement
# (session="5107d054b8b39ed4b87b64ddb660c292", element="4c2f8ec2-f025-4fab-a2e3-2b6f99ef154e")>

print(type(event))
#-> <class 'selenium.webdriver.remote.webelement.WebElement'>


Task: scraping the events section at python.org (module #414)

In my code, range(1, 6) is not scalable: it breaks as soon as the list has a different number of events.
Angela used a CSS selector after inspecting the structure of the website.
Selecting the right method only comes with experience and knowledge of HTML & CSS.

<MY CODE>
from selenium import webdriver

chrome_driver_path = r"C:\dayeon2020\chromedriver.exe"
driver = webdriver.Chrome(executable_path=chrome_driver_path)

driver.get("https://www.python.org/")

events = {}
for i in range(1, 6):
    time = driver.find_element_by_xpath(f"//*[@id='content']/div/section/div[2]/div[2]/div/ul/li[{i}]/time")
    name = driver.find_element_by_xpath(f"//*[@id='content']/div/section/div[2]/div[2]/div/ul/li[{i}]/a")
    
    events[i - 1] = {
        'time': f"2021-{time.text}",
        'name': name.text
    }

print(events)

# print: {0: {'time': '2021-01-30', 'name': 'BelPy 2021'},
#         1: {'time': '2021-01-30', 'name': 'PyCamp Leipzig'},
#         2: {'time': '2021-02-19', 'name': 'PyCascades 2021'},
#         3: {'time': '2021-03-18', 'name': 'PyCon Cameroon 2021'},
#         4: {'time': '2021-04-22', 'name': 'GeoPython 2021'}}

driver.quit()

Angela's

<Angela's code>
event_times = driver.find_elements_by_css_selector(".event-widget time")
event_names = driver.find_elements_by_css_selector(".event-widget li a")
print(event_times)  # -> list of Selenium WebElement objects

events = {}
for n in range(len(event_times)):
    events[n] = {
        "time": event_times[n].text,
        "name": event_names[n].text
    }

print(events)

# print: {0: {'time': '01-30', 'name': 'BelPy 2021'},
#         1: {'time': '01-30', 'name': 'PyCamp Leipzig'},
#         2: {'time': '02-19', 'name': 'PyCascades 2021'},
#         3: {'time': '03-18', 'name': 'PyCon Cameroon 2021'},
#         4: {'time': '04-22', 'name': 'GeoPython 2021'}}
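
Indexing with range(len(...)) works, but the two lists can also be walked in parallel. A sketch with zip and enumerate, using plain strings copied from the output above as stand-ins for the WebElement .text values, so it runs without a browser:

```python
# Stand-ins for [el.text for el in event_times] and [el.text for el in event_names].
times = ["01-30", "01-30", "02-19", "03-18", "04-22"]
names = ["BelPy 2021", "PyCamp Leipzig", "PyCascades 2021",
         "PyCon Cameroon 2021", "GeoPython 2021"]

# enumerate supplies the key, zip pairs each time with its name.
events = {
    n: {"time": time_text, "name": name_text}
    for n, (time_text, name_text) in enumerate(zip(times, names))
}

print(events[0])    # {'time': '01-30', 'name': 'BelPy 2021'}
print(len(events))  # 5
```

zip stops at the shorter list, so a mismatch between the two selectors would shorten the result instead of raising an IndexError.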

From the Q&A section: using a dictionary comprehension

This is the approach I initially tried and failed with.
.splitlines(): splits the text into a list, one item per line
range(0, len(events), 2): the scalable way! From zero to the end of the list, in steps of 2

events = driver.find_element_by_xpath(
'//*[@id="content"]/div/section/div[2]/div[2]/div/ul').text.splitlines()
print(events)
# ['01-30', 'BelPy 2021', '01-30', 'PyCamp Leipzig',
#  '02-19', 'PyCascades 2021', '03-18', 'PyCon Cameroon 2021', '04-22', 'GeoPython 2021']

dictionary = {i: {'time': events[i], 'name': events[i + 1]} for i in range(0, len(events), 2)}

print(dictionary)
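
Because i advances in steps of two, the comprehension above produces the keys 0, 2, 4, and so on. If consecutive keys like in the earlier versions are wanted, dividing the index by two is one option. A sketch on sample text shaped like the ul element's .text; the string is a stand-in, not fetched from the site:

```python
# Stand-in for driver.find_element_by_xpath(...).text
ul_text = "01-30\nBelPy 2021\n01-30\nPyCamp Leipzig\n02-19\nPyCascades 2021"

events = ul_text.splitlines()  # alternating time, name, time, name, ...

# i takes the values 0, 2, 4; i // 2 turns them into 0, 1, 2.
by_index = {
    i // 2: {"time": events[i], "name": events[i + 1]}
    for i in range(0, len(events), 2)
}

print(sorted(by_index))  # [0, 1, 2]
print(by_index[2])       # {'time': '02-19', 'name': 'PyCascades 2021'}
```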

Hidden year problem

  • The year is hidden by CSS if the window is too small.
    .text returns only the visible text inside the "time" tag (i.e. text whose CSS property "visibility" is "visible").
    On python.org, the "visibility" of the year part of each event date is set to "hidden" once the browser window is narrower than a certain width, so at those widths the year is missing from the printed text.
Example 1 (small window):

driver.set_window_size(width=100, height=200)
driver.get("https://www.python.org/")
event_times = driver.find_elements_by_css_selector(".event-widget time")
for time in event_times:
    print(time.text)


Example 2 (maximized window):

driver.maximize_window()
driver.get("https://www.python.org/")
event_times = driver.find_elements_by_css_selector(".event-widget time")
for time in event_times:
    print(time.text)
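
If the year is needed regardless of window size, one option is to read the element's textContent property instead of .text, since get_attribute("textContent") returns the text whether or not it is visible. A minimal sketch, assuming an element object with Selenium's get_attribute method; FakeTimeElement and its sample values are hypothetical stand-ins so the example runs without a browser:

```python
def full_text(element):
    """Return the element's text, including parts hidden by CSS."""
    return element.get_attribute("textContent").strip()


class FakeTimeElement:
    """Hypothetical stand-in for a Selenium WebElement (illustration only)."""

    text = "01-30"  # sample of what .text gives when the year is hidden

    def get_attribute(self, name):
        # A real WebElement asks the browser; this fake returns a sample value.
        return {"textContent": " 2021-01-30 "}.get(name)


element = FakeTimeElement()
print(element.text)        # 01-30
print(full_text(element))  # 2021-01-30
```

With a real driver, the same helper could be called on each item of event_times inside the loops above.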


Task: Wikipedia article count

from selenium import webdriver
chrome_driver_path = r"C:\dayeon2020\chromedriver.exe"
driver = webdriver.Chrome(executable_path=chrome_driver_path)

driver.get("https://en.wikipedia.org/wiki/Main_Page")

num = driver.find_element_by_id("articlecount")
print(num.text) #print 6,237,906 articles in English

num2 = driver.find_element_by_css_selector("#articlecount a")
print(num2.text) #print 6,237,906

driver.quit()
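
The scraped count is a string with thousands separators, so it needs converting before any arithmetic or comparison. A short sketch using the value printed above as sample input (the live number changes over time):

```python
count_text = "6,237,906"  # sample value, as printed by num2.text above

# Remove the thousands separators, then convert to an integer.
article_count = int(count_text.replace(",", ""))

print(article_count)      # 6237906
print(article_count + 1)  # 6237907
```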

How to Automate Filling Out Forms and Clicking Buttons with Selenium

A link's visible text sits between the anchor tags (<a>...</a>), and find_element_by_link_text locates a link by that text.

from selenium import webdriver
chrome_driver_path = r"C:\dayeon2020\chromedriver.exe"
driver = webdriver.Chrome(executable_path=chrome_driver_path)

driver.get("https://en.wikipedia.org/wiki/Main_Page")


count = driver.find_element_by_css_selector("#articlecount a")
# count.click()

all_portals = driver.find_element_by_link_text("All portals")
all_portals.click()


# driver.quit()

click() with send_keys()

We need to import Keys, which defines constants for special keys such as ENTER, SHIFT, and ALT.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

chrome_driver_path = r"C:\dayeon2020\chromedriver.exe"
driver = webdriver.Chrome(executable_path=chrome_driver_path)

driver.get("https://en.wikipedia.org/wiki/Main_Page")


search = driver.find_element_by_name("search")
search.send_keys("python")
search.send_keys(Keys.ENTER)

Task: filling in the sign-up form

top = driver.find_element_by_class_name("top")
top.send_keys("python")
middle = driver.find_element_by_class_name("middle")
middle.send_keys("lee")
bottom = driver.find_element_by_class_name("bottom")
bottom.send_keys("pythonlee@mail.com")

# click = driver.find_element_by_class_name("btn-block")
# click.send_keys(Keys.ENTER)

time.time()
time.time() returns the current time in seconds since January 1, 1970, as a floating-point number, so it also works with sub-second precision. Below, t_end is first calculated as "now" + 15 minutes. The loop then runs until the current time exceeds this preset end time.

Try this:

import time

t_end = time.time() + 60 * 15
while time.time() < t_end:
    # do whatever you do
    pass  # placeholder so the loop body is valid Python
This will run for 15 min x 60 s = 900 seconds.
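
The same pattern in a reusable form, shortened to a fraction of a second so it can be tried quickly. run_for and its parameters are names made up for this sketch, and time.monotonic() is swapped in for time.time() because it is not affected by system clock changes:

```python
import time


def run_for(seconds, action, pause=0.01):
    """Call action() repeatedly until `seconds` have elapsed; return the call count."""
    deadline = time.monotonic() + seconds
    calls = 0
    while time.monotonic() < deadline:
        action()
        calls += 1
        time.sleep(pause)  # short pause so the loop does not spin the CPU
    return calls


clicks = []
calls = run_for(0.1, lambda: clicks.append("click"))
print(calls == len(clicks))  # True
```

For the 15-minute version, the call would be run_for(60 * 15, some_action), where some_action is whatever should be repeated.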

word
in a similar vein
