How to Create an Image Crawler in Python

09/13/2021

In this article, you will learn how to create an image crawler in Python: a script that fetches a web page, finds its img tags, and downloads the images they reference.

How to create an image crawler

To create an image crawler in Python, you can use the following steps:

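Import the necessary libraries

We need the requests, BeautifulSoup, and os libraries to create an image crawler in Python. requests and BeautifulSoup are third-party packages (installable with pip install requests beautifulsoup4); os is part of the standard library.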

import requests
from bs4 import BeautifulSoup
import os

Define the URL

We need to define the URL of the web page from which we want to extract the images.

url = "https://example.com"

Send an HTTP request to the URL

We use the requests library to send an HTTP request to the URL and fetch the HTML content of the webpage.

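response = requests.get(url, timeout=10)  # the timeout keeps a slow server from hanging the script
response.raise_for_status()  # stop early on a 4xx/5xx response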
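
A request can fail or stall for reasons outside the script's control (a slow server, a 404 page, no network). A minimal sketch of a more defensive fetch, assuming a 10-second timeout is acceptable; the helper name fetch_html is illustrative and not part of the original script:

```python
import requests


def fetch_html(page_url, timeout=10):
    """Return the page body as bytes, or None if the request fails."""
    try:
        response = requests.get(page_url, timeout=timeout)
        response.raise_for_status()  # turn 4xx/5xx status codes into exceptions
    except requests.RequestException as exc:
        # Covers connection errors, timeouts, invalid URLs and bad status codes
        print(f"Could not fetch {page_url}: {exc}")
        return None
    return response.content
```

Calling fetch_html(url) would then give either the bytes to hand to BeautifulSoup or None, which the caller can check before parsing.
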

Parse the HTML content

We use the BeautifulSoup library to parse the HTML content and extract the image tags.

soup = BeautifulSoup(response.content, 'html.parser')
img_tags = soup.find_all('img')
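
To see what find_all returns, it can help to run it on a small inline document first. The HTML below is a made-up stand-in for response.content; one tag deliberately has no src attribute, a case where img['src'] would raise KeyError while img.get('src') returns None:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for response.content
html = """
<html><body>
  <img src="/static/logo.png" alt="Logo">
  <p>No image here</p>
  <img alt="Placeholder without src">
  <img src="photos/cat.jpg">
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
img_tags = soup.find_all("img")  # list of every <img> element in the document

print(len(img_tags))
# .get returns None instead of raising when the attribute is missing
print([img.get("src") for img in img_tags])
```
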

Extract the image URLs

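We iterate over the image tags and extract the source URL of each image. Tags without a src attribute are skipped, and relative paths are resolved against the page URL so that they can be requested directly.

from urllib.parse import urljoin
urls = [urljoin(url, img['src']) for img in img_tags if img.get('src')]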
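
Many pages write src values as relative paths (for example /static/logo.png), which cannot be downloaded until they are combined with the page address. A minimal sketch of that step using urljoin from the standard library; the helper name absolute_image_urls and the sample paths are illustrative assumptions:

```python
from urllib.parse import urljoin


def absolute_image_urls(page_url, srcs):
    """Resolve src values against the page URL, dropping blanks, data: URIs and duplicates."""
    seen = set()
    result = []
    for src in srcs:
        src = (src or "").strip()
        if not src or src.startswith("data:"):
            continue  # nothing to download: missing src or inline image data
        absolute = urljoin(page_url, src)
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


print(absolute_image_urls(
    "https://example.com/blog/post.html",
    ["/static/logo.png", "pics/cat.jpg", "//cdn.example.com/b.png", "", "/static/logo.png"],
))
```

Paths starting with / are resolved against the site root, other relative paths against the directory of the page, and already absolute URLs are left unchanged.
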

Download the images

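We use the requests library again to download each image and save it to a local folder, which is created first if it does not exist.

save_folder = "images"
os.makedirs(save_folder, exist_ok=True)  # create the folder if it is missing

for img_url in urls:
    try:
        img_response = requests.get(img_url, timeout=10)
        img_response.raise_for_status()
        with open(os.path.join(save_folder, os.path.basename(img_url)), 'wb') as f:
            f.write(img_response.content)
    except (requests.RequestException, OSError) as exc:
        print("Failed to download image from URL:", img_url, exc)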
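
Taking os.path.basename of the whole URL works for simple cases, but a query string ends up in the file name and a URL ending in / yields an empty name. A possible refinement, sketched with the standard library; the helper filename_from_url and its numbered fallback scheme are assumptions for illustration:

```python
import os
from urllib.parse import unquote, urlparse


def filename_from_url(image_url, index, default_ext=".jpg"):
    """Build a local file name from the URL path, ignoring query string and fragment."""
    path = urlparse(image_url).path
    name = os.path.basename(unquote(path))  # decode %20 and similar escapes
    if not name:
        # The URL ends with "/" or has no path: fall back to a numbered name
        name = f"image_{index}{default_ext}"
    return name


for i, u in enumerate(["https://example.com/img/a.png?v=2",
                       "https://example.com/gallery/"]):
    print(filename_from_url(u, i))
```

The result could replace os.path.basename(...) in the os.path.join call. Two different URLs can still map to the same name, so a real crawler may also want to handle collisions.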

Here’s the complete code for the image crawler in Python:

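import os
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

url = "https://example.com"
save_folder = "images"

# Fetch the page; the timeout keeps a slow server from hanging the script
response = requests.get(url, timeout=10)
response.raise_for_status()

# Parse the HTML and collect every <img> tag
soup = BeautifulSoup(response.content, 'html.parser')
img_tags = soup.find_all('img')

# Skip tags without src and resolve relative paths against the page URL
urls = [urljoin(url, img['src']) for img in img_tags if img.get('src')]

# Create the output folder if it does not exist yet
os.makedirs(save_folder, exist_ok=True)

for img_url in urls:
    # Name the file after the URL path, ignoring any query string
    filename = os.path.basename(urlparse(img_url).path)
    if not filename:
        continue  # nothing usable to name the file after
    try:
        img_response = requests.get(img_url, timeout=10)
        img_response.raise_for_status()
        with open(os.path.join(save_folder, filename), 'wb') as f:
            f.write(img_response.content)
    except (requests.RequestException, OSError) as exc:
        print("Failed to download image from URL:", img_url, exc)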

Make sure to set the url variable to the URL of the web page you want to crawl and the save_folder variable to the name of the folder where you want to save the images.
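
Instead of editing the two variables in the file, the values could also be passed on the command line. A minimal sketch using the standard argparse module; the option names and the parse_args helper are illustrative choices, not part of the original script:

```python
import argparse


def parse_args(argv=None):
    """Read the page URL and the output folder from a list of arguments."""
    parser = argparse.ArgumentParser(
        description="Download the images referenced by a web page.")
    parser.add_argument("url", help="URL of the page to crawl")
    parser.add_argument("-o", "--save-folder", default="images",
                        help="folder where images are saved (default: images)")
    return parser.parse_args(argv)


# An explicit list is passed here so the example runs anywhere
args = parse_args(["https://example.com", "--save-folder", "downloads"])
print(args.url, args.save_folder)
```

When parse_args() is called with no argument, argparse reads sys.argv instead, so a script saved under a hypothetical name such as crawler.py could be run as python crawler.py https://example.com -o downloads.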