April 26, 2020

Web Scraping with Beautiful Soup

  1. BeautifulSoup comes in handy for scraping HTML. It allows us to quickly pull information out of a web page, which can then be loaded into a DataFrame.

    1. # Create a BeautifulSoup object from a webpage (a requests response)

    2. soup = BeautifulSoup(webpage.content, "html.parser")

    3. # Within the 'soup' object, tags can be accessed by name (returns the first match):

    4. first_div = soup.div

    5. # or by CSS selector (returns a list of matches)

    6. all_elements_of_header_class = soup.select(".header")

    7. # or by calling '.find_all' (returns every matching tag)

    8. all_elements = soup.find_all("p")

  2. Python's requests library makes it easy to fetch a page's content over HTTP.

    1. import requests

    2. webpage = requests.get("URL")

    3. print(webpage.text)

  3. BeautifulSoup is a Python library that makes it easy for us to traverse an HTML page and pull out the parts we’re interested in. We can import it by using the line:

    1. from bs4 import BeautifulSoup
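
Putting the pieces above together, here is a minimal sketch of the parsing steps. To keep it runnable without a network call, it assumes a small made-up HTML string in place of `webpage.content`; the markup, the `rows` list, and its `position`/`text` keys are illustrative and not from any real site.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for webpage.content from requests.get(...)
html = """
<html><body>
  <div class="header">Site title</div>
  <div id="main">
    <p>First paragraph</p>
    <p>Second paragraph</p>
  </div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Access by tag name: returns the first <div> in the document
first_div = soup.div

# Access by CSS selector: returns a list of elements with class "header"
headers = soup.select(".header")

# find_all: returns every <p> tag in the document
paragraphs = soup.find_all("p")

# Collect the paragraph text into a list of dicts, one per row
rows = [
    {"position": i, "text": p.get_text(strip=True)}
    for i, p in enumerate(paragraphs)
]
```

If pandas is installed, a list of dicts like `rows` can be passed to `pandas.DataFrame(rows)` to get the scraped text into a DataFrame.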
