How to Use the Python urllib Module

09/14/2021

In this article, you will learn how to use the Python urllib module.

Python urllib Module

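urllib is a package in the Python standard library that groups several modules for working with URLs: urllib.request opens URLs and makes HTTP requests, urllib.parse builds and splits URLs, urllib.error defines the exceptions raised by urllib.request, and urllib.robotparser reads robots.txt files. Here’s a simple guide on how to use urllib in Python: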

Importing the module

First, import the submodule you need in your Python script. A bare import urllib is not guaranteed to make urllib.request available, so import it explicitly.

import urllib.request

Making a request

The main entry point of the urllib.request module is urlopen(). For HTTP and HTTPS URLs it returns an http.client.HTTPResponse object, a file-like object that you can use to read the contents of the URL and that can also be used as a context manager.

response = urllib.request.urlopen('https://www.example.com')
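
As a small sketch of working with the returned object: the helper below (the name summarize_response is made up for illustration) opens a URL in a with block, so the connection is closed when the block ends, and collects a few pieces of metadata next to the body.

```python
import urllib.request


def summarize_response(url):
    """Open url and return some metadata about the response (illustrative helper)."""
    # The response object is a context manager, so it is closed on exit.
    with urllib.request.urlopen(url) as response:
        body = response.read()
        return {
            'status': response.status,        # e.g. 200 for a successful HTTP request
            'final_url': response.geturl(),   # differs from url if a redirect was followed
            'content_type': response.headers.get_content_type(),
            'length': len(body),
        }


# Usage (needs network access):
# print(summarize_response('https://www.example.com'))
```
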
Reading the response

Once you have a response object, use its read() method to get the contents of the response as bytes. To get a string, call decode() on those bytes with the encoding the server used (UTF-8 in this example).

html = response.read()
decoded_html = html.decode('utf-8')
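
Hard-coding 'utf-8' works for many sites, but not all. The following sketch, with a hypothetical helper named fetch_text, takes the charset from the Content-Type header when the server declares one and only falls back to UTF-8 otherwise. The fallback and the errors='replace' policy are assumptions you may want to change.

```python
import urllib.request


def fetch_text(url, fallback_encoding='utf-8'):
    """Return the body of url as str, decoded with the charset the server declared."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()
        # get_content_charset() reads the charset parameter of Content-Type
        # and returns None when the server did not declare one.
        charset = response.headers.get_content_charset() or fallback_encoding
    # errors='replace' avoids an exception on bytes that do not fit the charset.
    return raw.decode(charset, errors='replace')


# Usage (needs network access):
# print(fetch_text('https://www.example.com')[:80])
```
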
Sending parameters

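You can also send parameters in the request using the urllib.parse module. For example, to send query parameters with a GET request, build the query string with the urlencode() function, which percent-encodes the keys and values, and append it to the URL after a question mark.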

import urllib.parse

params = urllib.parse.urlencode({'param1': 'value1', 'param2': 'value2'})
url = 'https://www.example.com/?' + params
response = urllib.request.urlopen(url)
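
To illustrate what urlencode() does with special characters and repeated keys, here is a minimal sketch. The helper build_url and the /search path are invented for this example.

```python
import urllib.parse


def build_url(base_url, params):
    """Append params to base_url as a query string (illustrative helper)."""
    # doseq=True turns list values into repeated keys: {'tag': ['a', 'b']} -> tag=a&tag=b
    query = urllib.parse.urlencode(params, doseq=True)
    # Use '&' if base_url already carries a query string, '?' otherwise.
    separator = '&' if urllib.parse.urlsplit(base_url).query else '?'
    return base_url + separator + query


url = build_url('https://www.example.com/search',
                {'q': 'python & urllib', 'tag': ['a', 'b']})
print(url)
# https://www.example.com/search?q=python+%26+urllib&tag=a&tag=b
```

Spaces become + and the ampersand inside the value becomes %26, so it cannot be mistaken for a parameter separator.
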
Handling errors

When making HTTP requests, there may be errors due to network issues, invalid URLs, or other reasons. To handle these errors, you can use a try-except block.

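import urllib.error
import urllib.request

try:
    response = urllib.request.urlopen('https://www.example.com')
except urllib.error.URLError as e:
    # reason describes what went wrong, e.g. a DNS or connection failure
    print(e.reason)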
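
It often helps to tell apart "the server answered with an error status" from "no answer at all". The sketch below does that with a hypothetical helper try_fetch; the outcome labels and the 10 second timeout are arbitrary choices for illustration.

```python
import urllib.error
import urllib.request


def try_fetch(url, timeout=10):
    """Return (outcome, detail) for url instead of raising (illustrative helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 'ok', response.read()
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status such as 404 or 500.
        # HTTPError subclasses URLError, so it has to be caught first.
        return 'http_error', e.code
    except urllib.error.URLError as e:
        # No usable answer: DNS failure, refused connection, unsupported scheme, ...
        return 'url_error', str(e.reason)
    except ValueError as e:
        # Raised for strings that are not URLs at all, e.g. a missing scheme.
        return 'invalid_url', str(e)


# Usage (needs network access):
# print(try_fetch('https://www.example.com/does-not-exist'))
```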

This is just a basic overview of using the urllib module in Python. There are many other functions and options available in the urllib module that allow you to customize your requests and handle more complex situations.

Here are some additional features of the urllib module that you may find useful:

Setting headers

When making HTTP requests, you may need to set headers to send additional information, such as user agent information or authentication credentials. You can do this by creating a Request object and adding headers to it.

import urllib.request

req = urllib.request.Request('https://www.example.com')
req.add_header('User-Agent', 'Mozilla/5.0')
response = urllib.request.urlopen(req)
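
Headers can also be supplied as a dictionary when the Request is created, and inspected before anything is sent. In this sketch the client name my-app/1.0 is a placeholder, not something urllib requires.

```python
import urllib.request

# Headers can also be passed as a dict when the Request is created.
req = urllib.request.Request(
    'https://www.example.com',
    headers={
        'User-Agent': 'my-app/1.0',   # hypothetical client name
        'Accept': 'text/html',
    },
)

# Request stores header names capitalized as 'User-agent', so look them up
# in that form when inspecting the object.
print(req.get_header('User-agent'))   # my-app/1.0
print(req.header_items())

# Sending it works as before (needs network access):
# response = urllib.request.urlopen(req)
```
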
Handling redirects

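When you make an HTTP request, the server may redirect you to a different URL. By default, urlopen() follows redirects automatically, and it has no parameter to turn this off (allow_redirects belongs to the third-party requests library, not to urllib). To stop following redirects, you need an opener with a custom HTTPRedirectHandler whose redirect_request() method returns None. With the default behavior, you can check whether you were redirected by comparing the final URL with the one you requested.

response = urllib.request.urlopen('https://www.example.com')
if response.geturl() != 'https://www.example.com':
    print('Redirected to:', response.geturl())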
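
One way to keep urllib from following redirects is sketched below. The class name NoRedirect and both helper functions are made up for this example. Because the redirect is not followed, urllib reports the 3xx response as an HTTPError, and the target can be read from its Location header.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow any redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None tells urllib not to build a follow-up request.
        return None


def build_no_redirect_opener():
    # Passing a subclass replaces the default HTTPRedirectHandler.
    return urllib.request.build_opener(NoRedirect)


def fetch_without_redirects(url):
    """Return (status, location); location is None unless the server redirected."""
    opener = build_no_redirect_opener()
    try:
        with opener.open(url) as response:
            return response.status, None
    except urllib.error.HTTPError as e:
        if e.code in (301, 302, 303, 307, 308):
            return e.code, e.headers.get('Location')
        raise  # a real error status such as 404 is passed on to the caller


# Usage (needs network access):
# print(fetch_without_redirects('https://www.example.com'))
```
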
Handling authentication

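If you need to authenticate with a server, you can use the HTTPBasicAuthHandler or HTTPDigestAuthHandler classes from the urllib.request module. You register a username and password for a realm and URI with add_password(). When the server answers with a 401 challenge for a matching realm, the handler retries the request with the appropriate authentication header. The realm must match the one the server announces.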

import urllib.error
import urllib.request

username = 'myusername'
password = 'mypassword'

auth_handler = urllib.request.HTTPBasicAuthHandler()
auth_handler.add_password(realm='Secure Area',
                          uri='https://www.example.com/secure/',
                          user=username,
                          passwd=password)
opener = urllib.request.build_opener(auth_handler)
try:
    response = opener.open('https://www.example.com/secure/')
    print(response.read())
except urllib.error.HTTPError as e:
    print(e.code, e.reason)
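
If you do not know the realm name the server will announce, a password manager with a default realm can help. This is a sketch under that assumption; build_basic_auth_opener is a hypothetical helper, and the credentials are the placeholders used above.

```python
import urllib.request


def build_basic_auth_opener(base_url, username, password):
    """Return an opener that offers the credentials for any realm under base_url."""
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # A realm of None acts as the default: it is used when no specific realm matches.
    password_mgr.add_password(None, base_url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    return urllib.request.build_opener(handler)


opener = build_basic_auth_opener('https://www.example.com/secure/',
                                 'myusername', 'mypassword')

# Usage (needs network access and a server that asks for Basic authentication):
# response = opener.open('https://www.example.com/secure/')
```

The credentials are only offered for URLs at or below the registered base URL, so they are not sent to unrelated hosts.
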
Sending POST requests

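To send data using the POST method, pass a bytes object as the data parameter of urllib.request.urlopen() or of a Request. A str is not accepted and raises a TypeError, so encode the output of urlencode() first. When data is present, the request method defaults to POST. You can also set headers, as shown in the example below.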

import urllib.parse
import urllib.request

data = urllib.parse.urlencode({'key1': 'value1', 'key2': 'value2'}).encode('utf-8')
headers = {'Content-type': 'application/x-www-form-urlencoded', 'Accept': 'text/plain'}
req = urllib.request.Request('https://www.example.com/post', data, headers)
response = urllib.request.urlopen(req)
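
The same pattern works for a JSON body. The sketch below only builds the request so that it can be inspected; the helper name build_json_post and the payload are illustrative, and whether the endpoint accepts JSON depends on the server.

```python
import json
import urllib.request


def build_json_post(url, payload):
    """Build (but do not send) a POST request with a JSON body."""
    body = json.dumps(payload).encode('utf-8')   # data must be bytes
    return urllib.request.Request(
        url,
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',   # explicit, although data alone already implies POST
    )


req = build_json_post('https://www.example.com/post', {'key1': 'value1'})
print(req.get_method(), req.data)
# POST b'{"key1": "value1"}'

# Sending it (needs network access):
# response = urllib.request.urlopen(req)
```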

These are just a few of the many features of the urllib module. With urllib, you can make various types of HTTP requests, set headers, handle cookies, and more. The module is included in the standard Python library, so it is readily available for use in your projects.