Scrape the HTML content of a website

January 14, 2023

Here’s a basic Python script that uses the requests and BeautifulSoup libraries to fetch a web page and extract its title and links:

import requests
from bs4 import BeautifulSoup

# Set the URL to scrape
url = 'https://www.example.com'

# Make an HTTP GET request to the website; stop on timeouts and HTTP errors
response = requests.get(url, timeout=10)
response.raise_for_status()

# Parse the HTML content of the website
soup = BeautifulSoup(response.content, 'html.parser')

# Extract the page title; find() returns None if there is no <title> tag
title_tag = soup.find('title')
print(title_tag.get_text() if title_tag else 'No title found')

# Extract all the links from the website
links = soup.find_all('a')
for link in links:
    print(link.get('href'))

This script makes an HTTP GET request to the specified URL, retrieves the HTML content of the page, and then uses the BeautifulSoup library to parse it. The find() method locates the first <title> tag so that its text can be extracted, and find_all() returns every <a> tag, from which the script reads the href attribute. Note that link.get('href') returns None for an <a> tag that has no href attribute.
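The href values a page contains are often relative (for example /about) or missing altogether. As a rough sketch of one way to deal with that, the hypothetical helper extract_links() below (not part of the original script) skips <a> tags without an href and resolves the rest against a base URL using urljoin from the standard library. The sample HTML and URLs are made up for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html, base_url):
    """Return absolute URLs for every <a> tag that has an href attribute."""
    soup = BeautifulSoup(html, 'html.parser')
    urls = []
    # href=True skips <a> tags that have no href attribute
    for link in soup.find_all('a', href=True):
        # urljoin leaves absolute URLs alone and resolves relative ones
        urls.append(urljoin(base_url, link['href']))
    return urls


# Illustrative input: one relative link, one anchor without href, one absolute link
sample_html = (
    '<a href="/about">About</a>'
    '<a name="top">Top</a>'
    '<a href="https://example.org/">External</a>'
)
print(extract_links(sample_html, 'https://www.example.com'))
```

In the original script you could pass response.content and url to a helper like this instead of printing each href directly.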

You can modify the script to extract other information from the page by passing different tag names and attribute filters to find() and find_all() so that they locate different elements in the HTML.
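To make that concrete, here is a small sketch of a few common variations. It parses an inline HTML string rather than a live site, and the tag names, id, and class used here are invented for the example. It shows find() with an id filter, find_all() with a class filter, and select(), which takes a CSS selector as an alternative.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a downloaded page
html = """
<html><body>
  <h1 id="main-heading">Latest posts</h1>
  <div class="post"><h2>First post</h2><p>Hello</p></div>
  <div class="post"><h2>Second post</h2><p>World</p></div>
</body></html>
"""
soup = BeautifulSoup(html, 'html.parser')

# Find a single element by its id attribute
heading = soup.find('h1', id='main-heading').get_text()

# Find all elements with a given CSS class
# (class_ has a trailing underscore because class is a Python keyword)
post_titles = [div.find('h2').get_text()
               for div in soup.find_all('div', class_='post')]

# select() accepts a CSS selector instead of tag/attribute arguments
paragraphs = [p.get_text() for p in soup.select('div.post p')]

print(heading)
print(post_titles)
print(paragraphs)
```

On a real page, find() can return None when nothing matches, so it is worth checking the result before calling get_text() on it.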
