A Guide To Web Scraping With Selenium

December 10, 2020

The Internet is a large hub of data. Whether you need text or image data for your company or for personal research, you can scrape it with Selenium. There are plenty of tools and frameworks for web scraping; today we are going to discuss Selenium, which basically automates browsers. That’s it!

This means you can use your choice of browser to do automated scraping tasks for you.

Selenium was originally developed by Jason Huggins in 2004 as an internal tool at ThoughtWorks. It was mostly used for testing at that time, but now it’s widely used for browser automation platforms and, of course, web scraping!

It is available as Selenium WebDriver, Selenium IDE, and Selenium Grid.

Selenium WebDriver is a collection of language-specific bindings that drive a browser. It is used to build automated tests and to scale and distribute scripts across many environments.

Browsers supported by Selenium (Chrome, Opera, Firefox, Safari, Internet Explorer)

Operating systems supported (Linux, Mac, Windows)

Programming languages supported (Python, PHP, Ruby, Java, JavaScript)

Selenium IDE (Integrated Development Environment) is a test tool used by testers, and it can also be used by people who are not familiar with developing test cases for their websites. It is very easy to use: just add the Selenium IDE extension to your browser, and you are good to go with a pre-built GUI for recording your sessions.

Selenium Grid is used to run parallel test sessions across different web browsers. It is based on a hub-node architecture, where one server acts as the hub and other devices act as nodes, each with its own operating system and remote WebDrivers. Because the nodes run tests in parallel, Grid also reduces the time a test suite takes to complete.

The source code for all Selenium suites is available under the Apache 2.0 license on GitHub, where contributions are welcome.

Installation procedure

We are going to use Python, together with ChromeDriver (which lets your script control the Chrome browser) and the Selenium package for Python.

Chrome Driver
Selenium package (install using pip)

pip install selenium

To check that ChromeDriver is set up correctly, run the command:

chromedriver

Note:

If the command does not run, add the folder that contains the downloaded ChromeDriver to your PATH environment variable.
Never name your Python file “selenium.py”. The import would then pick up your own file instead of the Selenium package, and the script throws an error.
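
The same check can be made from Python before a script starts the browser. This is a minimal sketch that uses only the standard library; the helper name find_chromedriver is an illustrative assumption and not part of Selenium.

```python
import shutil


def find_chromedriver(name="chromedriver"):
    """Return the full path of the driver executable found on PATH, or None."""
    return shutil.which(name)


path = find_chromedriver()
if path is None:
    print("chromedriver was not found on PATH")
else:
    print("chromedriver found at", path)
```

If the helper returns None, the terminal command above will most likely fail as well, so fix the PATH first.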

Quickstart

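This code opens the Analytics India Magazine homepage in Chrome.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

DRIVER_PATH = '/path/to/chromedriver'
# Selenium 4 takes the driver location through a Service object;
# the older executable_path argument was removed in recent releases.
driver = webdriver.Chrome(service=Service(DRIVER_PATH))
driver.get('https://analyticsindiamag.com/')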

If you don’t want to give your ChromeDriver location every time you run a programme, just add the driver’s location to your PATH environment variable.

This programme then achieves the same result:

from selenium import webdriver
driver = webdriver.Chrome()
driver.get('https://analyticsindiamag.com/')
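
The quickstart scripts leave the browser open when they finish. One way to make sure it is always closed, even when scraping fails halfway, is a small context manager around driver.quit(). This is a sketch under assumptions: the names browser and open_homepage are made up for illustration, and Selenium is imported inside open_homepage only so that the helper itself has no import-time dependency.

```python
from contextlib import contextmanager


@contextmanager
def browser(make_driver):
    """Create a driver with make_driver() and always quit it afterwards."""
    driver = make_driver()
    try:
        yield driver
    finally:
        driver.quit()  # closes every window and ends the driver session


def open_homepage():
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver

    with browser(webdriver.Chrome) as driver:
        driver.get('https://analyticsindiamag.com/')
        print(driver.title)
```

Calling open_homepage() opens the page, prints its title, and closes Chrome again.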

Other driver attributes and methods you can use:

print(driver.title)
print(driver.window_handles)
print(driver.page_source)
print(driver.current_url)
driver.refresh()
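
These attributes can also be gathered into a single record, which is handy when you log what a scraper visited. A minimal sketch follows; the function name page_summary and the dictionary keys are illustrative choices, while the four driver attributes are the ones listed above.

```python
def page_summary(driver):
    """Collect a few read-only facts about the page currently loaded in driver."""
    return {
        "title": driver.title,
        "url": driver.current_url,
        "open_windows": len(driver.window_handles),
        "source_length": len(driver.page_source),
    }
```

After driver.get(...), calling print(page_summary(driver)) shows the title, the URL, the number of open windows, and the size of the page source in characters.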

To scrape specific pieces of data, Selenium offers several handy ways to locate elements:

Tag name
Class name
IDs
CSS selectors
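
Each of these locator types maps to a constant on Selenium's By class, which is passed to driver.find_elements(). The sketch below is hedged: element_texts and main are illustrative names, and the class name, ID, and CSS selector values are hypothetical placeholders that you would replace after inspecting the real page.

```python
def element_texts(driver, by, value):
    """Return the stripped, non-empty text of every element matching the locator."""
    texts = []
    for element in driver.find_elements(by, value):
        text = element.text.strip()
        if text:
            texts.append(text)
    return texts


def main():
    # Imported here so element_texts stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get('https://analyticsindiamag.com/')
        print(element_texts(driver, By.TAG_NAME, 'h2'))
        # The three values below are hypothetical; check the page for real ones.
        print(element_texts(driver, By.CLASS_NAME, 'entry-title'))
        print(element_texts(driver, By.ID, 'main'))
        print(element_texts(driver, By.CSS_SELECTOR, 'article h2 a'))
    finally:
        driver.quit()
```

Call main() to run all four lookups against the live page.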

Usually, to scrape a specific type of data, we first need to find the element that holds it. To locate all the headings (titles), for example, you can do one of the following:

Use the browser’s Inspect tool by right-clicking on the page.
Or print the page’s source code to your terminal and then decide which element to extract.

from selenium import webdriver
driver = webdriver.Chrome()
driver.get('https://analyticsindiamag.com/')
print(driver.page_source)
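
The printed source of a real page is usually very long, so it can help to summarize it before deciding what to extract. The sketch below counts tags and class names with Python's standard html.parser module. TagCounter and summarize_source are illustrative names, and the sample HTML string is made up; in practice you would pass driver.page_source.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count how often each tag and each class name appears."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()
        self.classes = Counter()

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def summarize_source(html):
    """Return (tag counts, class counts) for an HTML string."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.tags, parser.classes


# Made-up sample; replace it with driver.page_source for a real page.
sample = '<div class="post"><h2 class="title big">A</h2><h2 class="title">B</h2></div>'
tags, classes = summarize_source(sample)
print(tags.most_common(3))
print(classes.most_common(3))
```

The most frequent tags and classes are often good candidates for the tag name, class name, or CSS selector you scrape with.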

Let’s see an example of what Selenium can do:

We can search for images of bikes, and download them if we want to, by making a Google search query like this:

search_url = "https://www.google.com/search?q={q}&tbm=isch&tbs=sur%3Afc&hl=en&ved=0CAIQpwVqFwoTCKCa1c6s4-oCFQAAAAAdAAAAABAC&biw=1251&bih=568"

driver.get(search_url.format(q='Bikes'))
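
A query with spaces or special characters needs to be escaped before it goes into the URL, and the image addresses still have to be read from the result page. The following is a rough sketch, not a complete downloader: build_image_search_url and image_sources are illustrative names, and filtering out inline data: URIs is an assumption about which images are worth downloading.

```python
from urllib.parse import quote_plus

# Same template as above, split across two lines for readability.
SEARCH_URL = (
    "https://www.google.com/search?q={q}&tbm=isch&tbs=sur%3Afc&hl=en"
    "&ved=0CAIQpwVqFwoTCKCa1c6s4-oCFQAAAAAdAAAAABAC&biw=1251&bih=568"
)


def build_image_search_url(query):
    """Fill the search template, escaping spaces and special characters."""
    return SEARCH_URL.format(q=quote_plus(query))


def image_sources(elements):
    """Return the http(s) src values of image elements, skipping data: URIs."""
    sources = []
    for element in elements:
        src = element.get_attribute("src")
        if src and src.startswith(("http://", "https://")):
            sources.append(src)
    return sources


print(build_image_search_url("mountain bikes"))
```

With a running driver, you could call driver.get(build_image_search_url('mountain bikes')), pass the result of driver.find_elements(By.TAG_NAME, 'img') to image_sources (with By imported from selenium.webdriver.common.by), and then save each address with urllib.request.urlretrieve.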

Conclusion

So this is one of the many ways we can use Selenium, from scraping data and extracting images to automating web browsing tasks and report generation.

Another thing we can achieve is to automate the whole task of downloading reports from a website by filling in all the details of different users.

You can find more information about this in the Selenium documentation.

This article has been published from the source link without modifications to the text. Only the headline has been changed.
