April 26, 2020
Web Scraping with Beautiful Soup
BeautifulSoup comes in handy for scraping HTML. It lets us quickly and easily pull information out of a web page, which we can then load into a DataFrame.
# Create a BeautifulSoup object from a webpage
# (`webpage` is the response returned by requests.get, shown further down)
soup = BeautifulSoup(webpage.content, "html.parser")
# Within the `soup` object, tags can be called by name:
first_div = soup.div
# or selected with a CSS selector:
all_elements_of_header_class = soup.select(".header")
# or found with `.find_all`:
all_elements = soup.find_all("p")
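To try these three lookups without hitting a live site, here is a minimal sketch that parses a small inline HTML string. The HTML snippet and its contents are made up for illustration; only the BeautifulSoup calls come from the lines above.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page
html = """
<html><body>
  <div class="header">Site title</div>
  <div class="content">
    <p>First paragraph</p>
    <p class="header">Second paragraph</p>
  </div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

first_div = soup.div                      # first <div> in the document
header_elements = soup.select(".header")  # every element with class "header"
paragraphs = soup.find_all("p")           # every <p> tag in the document

print(first_div.get_text())
print(len(header_elements))
print([p.get_text() for p in paragraphs])
```

Note that `soup.div` returns only the first match, while `select` and `find_all` return every match.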
Python's requests library (a third-party package, installed separately) makes getting a page's content easy.
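import requests
# 'URL' is a placeholder for the address of the page to scrape
webpage = requests.get('URL')
# .text holds the response body decoded as a string
print(webpage.text)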
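In practice a request can fail or hang, so it may be worth wrapping the call. The sketch below is one way to do that, not the only one: `fetch_html` is a hypothetical helper name, and the 10-second timeout is an arbitrary choice.

```python
import requests

def fetch_html(url, timeout=10):
    """Download a page and return its HTML as text (hypothetical helper)."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    return response.text

# Example usage (placeholder URL, swap in the page you want to scrape):
# html = fetch_html("https://example.com")
```

Without `raise_for_status()`, an error page such as a 404 would be returned as if it were the content we asked for.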
BeautifulSoup is a Python library that makes it easy for us to traverse an HTML page and pull out the parts we’re interested in. We can import it by using the line:
from bs4 import BeautifulSoup
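With that import in place, the DataFrame step mentioned at the start can be sketched roughly as follows. This assumes the DataFrame in question is a pandas one; the table, its column names, and the values are invented for illustration.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical HTML table standing in for scraped page content
html = """
<table>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Tea</td><td>3.50</td></tr>
  <tr><td>Coffee</td><td>4.25</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
for tr in soup.find_all("tr")[1:]:  # skip the header row
    name, price = [td.get_text(strip=True) for td in tr.find_all("td")]
    rows.append({"name": name, "price": float(price)})

# Each dict becomes one row; the dict keys become the column names
df = pd.DataFrame(rows)
print(df)
```

Building a list of dicts first keeps the parsing separate from pandas, so the rows can be checked before they go into the DataFrame.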