Tabular Troubles Solution


import time as t
import pprint
from selenium import webdriver
from selenium.webdriver.common.by import By

# Instantiate WebDriver and open page
driver = webdriver.Chrome()
driver.get("https://seleniumplayground.practiceprobs.com/dogs/breeds/")
driver.implicitly_wait(5)

# XPath to the second tab
table_xpath = "/html/body/div[3]/main/div/div[3]/article/div/div[1]/label[2]"

# Click the second tab
table_header = driver.find_element(By.XPATH, table_xpath)
table_header.click()

# XPath to the country of origin column header
c_xpath = "/html/body/div[3]/main/div/div[3]/article/div/div[2]/div[2]/div/div/table/thead/tr/th[4]"

# Find column header
country_column = driver.find_element(By.XPATH, c_xpath)

# Click header until column is sorted as ascending
attr = country_column.get_attribute("aria-sort")
while attr != "ascending":
    country_column.click()
    attr = country_column.get_attribute("aria-sort")

table_data = []

# XPath to the body of the table
table_xpath = "/html/body/div[3]/main/div/div[3]/article/div/div[2]/div[2]/div/div/table/tbody/"

# Collect all cells from the table
for i in range(3):
    row = []
    for j in range(4):
        cell_element = driver.find_element(
            By.XPATH,
            table_xpath + f"tr[{i+1}]/td[{j+1}]"
        )
        row.append(cell_element.get_attribute("innerHTML"))
    table_data.append(row)
driver.close()

# Format the output prettily
pp = pprint.PrettyPrinter(indent=4, width=150)
pp.pprint(table_data)

Explanation

The key trick for this challenge is XPath. An XPath is a string that points to an HTML element on the page. This is useful when other identifiers, such as IDs or class names, are either too generic or too specific, which often happens on larger pages like this one. Instead of relying on the properties of an individual element, we identify it by its position in the document tree. At the most basic level, an XPath lists the nested HTML tags leading from the root to the element, with an index in square brackets wherever several sibling tags share a name, like so:

XPath Example

HTML structure

<html>
    <body>
        <div>
        </div>
        <div>
            <p>Hello World!</p>
        </div>
    </body>
</html>

XPath to <p> element

/html/body/div[2]/p
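
To see the indexing in action without a browser, here is a minimal sketch that evaluates the same path with Python's standard-library xml.etree.ElementTree, which supports a limited subset of XPath. It treats the example as XML, which only works because this snippet happens to be well-formed; real pages generally need a browser or a proper HTML parser. Since the parsed root is already the <html> element, the path is written relative to it.

```python
import xml.etree.ElementTree as ET

# The example document from above, written as well-formed XML.
doc = """
<html>
    <body>
        <div>
        </div>
        <div>
            <p>Hello World!</p>
        </div>
    </body>
</html>
""".strip()

root = ET.fromstring(doc)  # root is the <html> element

# Equivalent of /html/body/div[2]/p, relative to the root element.
# XPath positions are 1-based, so div[2] is the second <div>.
paragraph = root.find("./body/div[2]/p")
print(paragraph.text)
```

Changing div[2] to div[1] makes find return None, because the first <div> contains no <p>.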

We begin by finding the tab under which the table can be found. It has the XPath:

# XPath to the second tab
table_xpath = "/html/body/div[3]/main/div/div[3]/article/div/div[1]/label[2]"

How would I know this?

Open Chrome's Inspect tool, right-click the element in the Elements panel, and under Copy, select Copy full XPath.


Next, we find the corresponding element and click on the tab, making the table visible.

# Click the second tab
table_header = driver.find_element(By.XPATH, table_xpath)
table_header.click()

After that, we retrieve the XPath to the header of the "Country of Origin" column and select the element it corresponds to.

# XPath to the country of origin column header
c_xpath = "/html/body/div[3]/main/div/div[3]/article/div/div[2]/div[2]/div/div/table/thead/tr/th[4]"

# Find column header
country_column = driver.find_element(By.XPATH, c_xpath)

Next, we click the header until the column is sorted in ascending order. Conveniently, the header element has an aria-sort attribute that we can read through Selenium for this purpose.

# Click header until column is sorted as ascending
attr = country_column.get_attribute("aria-sort")
while attr != "ascending":
    country_column.click()
    attr = country_column.get_attribute("aria-sort")
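
The loop above keeps clicking forever if aria-sort never reaches "ascending". As a hedged alternative, the sketch below caps the number of clicks. The helper name click_until_sorted, the max_clicks limit, and the FakeHeader stand-in (including its assumed cycle of sort states) are all illustrative and not part of the original solution; the helper only relies on the element having get_attribute and click methods.

```python
def click_until_sorted(header, target="ascending", max_clicks=5):
    """Click `header` until its aria-sort attribute equals `target`.

    Returns the number of clicks made. Raises RuntimeError if the
    target state is not reached within `max_clicks` clicks.
    """
    for clicks in range(max_clicks + 1):
        if header.get_attribute("aria-sort") == target:
            return clicks
        if clicks < max_clicks:
            header.click()
    raise RuntimeError(f"aria-sort never became {target!r}")


class FakeHeader:
    """Stand-in for a Selenium element so the sketch runs without a browser.

    The cycle of states below is an assumption made for the demo only.
    """

    def __init__(self):
        self._states = [None, "ascending", "descending"]
        self._index = 0

    def get_attribute(self, name):
        return self._states[self._index]

    def click(self):
        self._index = (self._index + 1) % len(self._states)


print(click_until_sorted(FakeHeader()))
```

With the real page, the call would be click_until_sorted(country_column).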

Now that the table is sorted correctly, we can read out the table data. Again, we make use of XPath. Sibling elements with the same tag name are distinguished by an index, starting from 1. We first grab the XPath for the table body and initialize an empty list:

table_data = []

# XPath to the body of the table
table_xpath = "/html/body/div[3]/main/div/div[3]/article/div/div[2]/div[2]/div/div/table/tbody/"

Then, we iterate over the rows and columns, find each cell, and read out its innerHTML. Because Python's range starts at 0 while XPath indices start at 1, we add 1 to each loop index.

# Collect all cells from the table
for i in range(3):
    row = []
    for j in range(4):
        cell_element = driver.find_element(
            By.XPATH,
            table_xpath + f"tr[{i+1}]/td[{j+1}]"
        )
        row.append(cell_element.get_attribute("innerHTML"))
    table_data.append(row)
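
To make the index arithmetic easier to check, the following sketch builds the per-cell XPaths on their own, without touching the browser. The helper name cell_xpaths and the shortened base path in the demo are hypothetical; with the real page you would pass the table body XPath from above.

```python
def cell_xpaths(tbody_xpath, n_rows, n_cols):
    """Return a row-major grid of XPaths, one per table cell."""
    base = tbody_xpath.rstrip("/")  # tolerate a trailing slash
    return [
        # XPath indices are 1-based, so both ranges start at 1.
        [f"{base}/tr[{i}]/td[{j}]" for j in range(1, n_cols + 1)]
        for i in range(1, n_rows + 1)
    ]


# Shortened, hypothetical base path used only for this demo.
paths = cell_xpaths("/html/body/table/tbody/", n_rows=3, n_cols=4)
print(paths[0][0])  # first row, first column
print(paths[2][3])  # last row, last column
```

Each string in the grid can then be passed to driver.find_element(By.XPATH, ...) exactly as in the loop above.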

Finally, we close the web driver and print out the results with pretty printing:

driver.close()

# Format the output prettily
pp = pprint.PrettyPrinter(indent=4, width=150)
pp.pprint(table_data)

Output
>>> pp.pprint(table_data)
[   [   '<a href="akita/">Akita Inu</a>',
        'Large breed of dog from northern Japan',
        '<a href="https://en.wikipedia.org/wiki/Akita_(dog)">Wiki: Akita Inu</a>',
        '<img alt="🇯🇵" class="twemoji" src="https://twemoji.maxcdn.com/v/latest/svg/1f1ef-1f1f5.svg" title=":flag_jp:">'],
    [   '<a href="labrador_retriever/">Labrador Retriever</a>',
        'British breed of retriever gun dog',
        '<a href="https://en.wikipedia.org/wiki/Labrador_Retriever">Wiki: Labrador Retriever</a>',
        '<img alt="🇬🇧" class="twemoji" src="https://twemoji.maxcdn.com/v/latest/svg/1f1ec-1f1e7.svg" title=":flag_gb:">'],
    [   '<a href="german_shepherd/">German Shepherd</a>',
        'German breed of working dog of medium to large size',
        '<a href="https://en.wikipedia.org/wiki/German_Shepherd">Wiki: German Shepherd</a>',
        '<img alt="🇩🇪" class="twemoji" src="https://twemoji.maxcdn.com/v/latest/svg/1f1e9-1f1ea.svg" title=":flag_de:">']]
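
Since innerHTML returns raw markup, the cells still contain tags. If plain text is preferable, one possible post-processing step is sketched below using the standard-library html.parser. The names _CellText and cell_text are hypothetical, and falling back to the image's title attribute for the flag cells is just one choice among several.

```python
from html.parser import HTMLParser


class _CellText(HTMLParser):
    """Collect visible text; remember an <img> title as a fallback."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.img_title = None

    def handle_data(self, data):
        self.parts.append(data)

    def handle_starttag(self, tag, attrs):
        if tag == "img" and self.img_title is None:
            self.img_title = dict(attrs).get("title")


def cell_text(inner_html):
    """Return the text of a cell, or the image title if there is no text."""
    parser = _CellText()
    parser.feed(inner_html)
    parser.close()
    text = "".join(parser.parts).strip()
    return text or (parser.img_title or "")


print(cell_text('<a href="akita/">Akita Inu</a>'))
print(cell_text('<img class="twemoji" title=":flag_jp:">'))
```

Applied to the whole table, this would look like [[cell_text(cell) for cell in row] for row in table_data].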

Don't forget to close the web driver!

driver.close()

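You should close the web driver as soon as you're done using it. Doing so frees up memory and compute resources on your machine. Note that driver.close() closes only the current browser window, while driver.quit() ends the whole session and shuts down the driver process.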
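
If the script raises an error partway through, a cleanup call at the end is never reached. One way to guard against that is a try/finally wrapper, sketched below. The names scrape_with_cleanup, make_driver, scrape, and FakeDriver are hypothetical; FakeDriver exists only so the sketch runs without a browser.

```python
def scrape_with_cleanup(make_driver, scrape):
    """Create a driver, run `scrape(driver)`, and always shut the driver down."""
    driver = make_driver()
    try:
        return scrape(driver)
    finally:
        # Runs whether scrape() returned normally or raised.
        driver.quit()


class FakeDriver:
    """Stand-in for a Selenium WebDriver, used only for this demo."""

    def __init__(self):
        self.quit_called = False

    def quit(self):
        self.quit_called = True


fake = FakeDriver()
result = scrape_with_cleanup(lambda: fake, lambda d: "table data")
print(result, fake.quit_called)
```

With Selenium, make_driver would be webdriver.Chrome and scrape would be a function containing the steps from the solution.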

