April 26, 2020

Web Scraping with Beautiful Soup

BeautifulSoup comes in handy for scraping HTML. It lets us quickly and easily pull data out of a web page so that it can be loaded into a structure such as a pandas DataFrame.
    # Create a BeautifulSoup object from a webpage
    # (assumes `webpage` is a requests response and BeautifulSoup has been imported)
    soup = BeautifulSoup(webpage.content, "html.parser")
    # Within the soup object, tags can be accessed by name:
    first_div = soup.div
    # or by CSS selector:
    all_elements_of_header_class = soup.select(".header")
    # or with .find_all:
    all_elements = soup.find_all("p")
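To see what each of these three lookups returns, here is a minimal sketch that parses a small inline HTML string instead of a live page. The HTML snippet and the variable names are made up for illustration only.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration; a real page would come from requests.
html = """
<html>
  <body>
    <div class="header">Site title</div>
    <div class="content">
      <p>First paragraph.</p>
      <p>Second paragraph.</p>
    </div>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

first_div = soup.div                      # the first <div> in document order
header_elements = soup.select(".header")  # list of elements with class "header"
paragraphs = soup.find_all("p")           # list of every <p> tag

# Pull the visible text out of each result
first_div_text = first_div.get_text()
header_texts = [el.get_text() for el in header_elements]
paragraph_texts = [p.get_text() for p in paragraphs]

print(first_div_text)
print(header_texts)
print(paragraph_texts)
```

Note that soup.div gives back a single tag, while select and find_all both give back lists, even when only one element matches.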
The third-party requests library makes it easy to fetch a page's content in Python.
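    import requests
    # 'URL' is a placeholder for the address of the page to fetch
    webpage = requests.get('URL')
    # .text holds the response body decoded as a string
    print(webpage.text)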
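In practice a request can hang or come back with an error status, so it may be worth wrapping the call. The sketch below is one possible approach, not part of the original snippet: the helper name fetch_html and the 10-second default timeout are assumptions chosen for illustration.

```python
import requests


def fetch_html(url, timeout=10.0):
    """Return the body of `url` as text (hypothetical helper for illustration).

    Raises a requests.exceptions.RequestException subclass if the URL is
    malformed, the request times out, or the server answers with a 4xx/5xx status.
    """
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # turn HTTP error statuses into exceptions
    return response.text


# Example usage (needs network access, so it is left commented out):
# html = fetch_html("https://example.com")
# print(html[:200])
```

The returned string can be passed straight to BeautifulSoup in place of webpage.content.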
BeautifulSoup is a Python library that makes it easy for us to traverse an HTML page and pull out the parts we’re interested in. We can import it by using the line:
    from bs4 import BeautifulSoup
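To tie this back to the DataFrame mentioned at the start, here is a rough sketch of going from parsed HTML to a pandas DataFrame. It assumes pandas is installed, and the table markup, column names, and values are invented purely for illustration.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical table markup; a real page would be fetched with requests first.
html = """
<table>
  <tr><th>name</th><th>price</th></tr>
  <tr><td>Widget</td><td>4.50</td></tr>
  <tr><td>Gadget</td><td>12.00</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = soup.find_all("tr")

# The first row holds the column headers, the remaining rows hold the data
columns = [th.get_text(strip=True) for th in rows[0].find_all("th")]
records = [
    [td.get_text(strip=True) for td in row.find_all("td")]
    for row in rows[1:]
]

df = pd.DataFrame(records, columns=columns)
df["price"] = df["price"].astype(float)  # scraped values arrive as strings
print(df)
```

Everything scraped from HTML starts out as text, which is why the price column is converted to a numeric type before any further analysis.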