The best Python HTTP clients

09 June 2022 (updated) | 15 min read

There is a huge number of HTTP clients available for Python - a quick search for Python HTTP Clients on Github returns over 1,700(!) results. But how do you make sense of all of them and find one which is right for your particular use case?

Do you have a single machine at your disposal or a collection of them? Do you want to keep things simple or is raw performance more of a concern? A web application that makes the occasional request to a micro-service API has quite different requirements than a script constantly scraping data. Additionally, there's the question of whether the library you choose will still be around six months down the line.

In this article we're going to cover six of the best HTTP clients currently available for Python and detail why each of them might be one for you to consider.


Introduction

For all the examples here, I'll be making GET requests to the Star Wars API (swapi.dev), which returns data about the people, planets and starships of the Star Wars universe. You can see an example of a JSON response from it below:

{
  "name": "Death Star",
  "model": "DS-1 Orbital Battle Station",
  "manufacturer": "Imperial Department of Military Research, Sienar Fleet Systems",
  "cost_in_credits": "1000000000000",
  ...
}
// Now I know which manufacturer I won't be asking to make my own Death Star.

The POST request examples here are sent to httpbin.org, a developer testing tool which responds with the content of the request; you could also use requestbin.com if you prefer. We'll be sending the following JSON POST data about Obi-Wan:

{
	"name": "Obi-Wan Kenobi",
	"height": "182",
	"mass": "77",
	"hair_color": "auburn, white",
	"skin_color": "fair",
	"eye_color": "blue-gray",
	"birth_year": "57BBY",
	"gender": "male"
}

The Basics

If you're familiar with Python's standard library, you're probably already aware of the confusing history of the urllib and urllib2 modules within it. urllib2 (the successor to the original urllib) was split into separate modules in Python 3, urllib.request and urllib.error.

For comparison purposes with the packages in the rest of this article, let's first take a look at how we'd make a request using nothing but the standard library.

All our examples that follow use Python 3

import json
import urllib.request

response = urllib.request.urlopen('https://swapi.dev/api/starships/9/')
text = response.read()
print(json.loads(text.decode('utf-8')))

Note how we've had to use the json module to parse the response ourselves, as read() returns bytes rather than a Python dictionary.

Our POST would look like this:

import json
from urllib import request, parse

data = {"name": "Obi-Wan Kenobi", ...}

encoded_data = json.dumps(data).encode()

req = request.Request('https://httpbin.org/post', data=encoded_data)
req.add_header('Content-Type', 'application/json')
response = request.urlopen(req)

text = response.read()

print(json.loads(text.decode('utf-8')))

We've also had to encode the data we want to send ourselves and set the Content-Type header, which we'd need to change if we were submitting form data, for example.
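To illustrate that last point, here's a minimal sketch of what the equivalent form submission might look like, with the body URL-encoded via urllib.parse.urlencode (the two fields shown are just a subset of our Obi-Wan data):

import json
from urllib import request, parse

# URL-encode the form fields instead of serialising them as JSON
form_data = parse.urlencode({'name': 'Obi-Wan Kenobi', 'height': '182'}).encode()

req = request.Request('https://httpbin.org/post', data=form_data)
req.add_header('Content-Type', 'application/x-www-form-urlencoded')

response = request.urlopen(req)
print(json.loads(response.read().decode('utf-8')))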

You might be feeling this is clunky - "All I wanted was to get some data!". Well, this is seemingly how many other developers felt too, given the number of HTTP clients available as third-party packages. In the rest of the article we'll take a look at some good alternatives.

The Libraries

1. urllib3

urllib3 is a powerful, user-friendly HTTP client for Python. Much of the Python ecosystem already uses urllib3 and you should too. urllib3 brings many critical features that are missing from the Python standard library.

The urllib3 package is, rather confusingly, not part of the standard library, but a separate HTTP client package which builds upon urllib. It provides missing features such as connection pooling, TLS verification, and thread safety. This ultimately results in better performance for applications making many calls like web scraping, as they will reuse connections to hosts rather than creating new ones.

urllib3 is actually a dependency of several of the HTTP clients mentioned later in this article (Requests among them) and gets over 150 million downloads a month. In order to make a request using urllib3, we'd make a call like the following:

import urllib3
import json

http = urllib3.PoolManager()
r = http.request('GET', 'https://swapi.dev/api/starships/9/')

print(json.loads(r.data.decode('utf-8')))

As with the standard library, we've had to parse the JSON response ourselves, as urllib3 leaves us to do this manually.

For POST requests, we also need to manually encode the body and set the appropriate Content-Type header:

import json
import urllib3

data = {"name": "Obi-Wan Kenobi", ...}

http = urllib3.PoolManager()

encoded_data = json.dumps(data).encode('utf-8')

r = http.request(
    'POST',
    'https://httpbin.org/post',
    body=encoded_data,
    headers={'Content-Type': 'application/json'}
)

print(json.loads(r.data.decode('utf-8')))

urllib3.PoolManager() provides us with a Pool Manager object, handling connection pooling and thread safety. Subsequent requests are made with the request() method of the manager instance, providing it with the HTTP method and the desired URL. Connections to a specific hostname are cached/maintained in the background and re-used when applicable. For that reason, we also want to make sure PoolManager() is configured for the number of hostnames we are going to connect to.
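For example, here's a minimal sketch of tuning the pool manager - the numbers are purely illustrative assumptions:

import urllib3

# num_pools: how many per-host connection pools to keep cached
# maxsize: how many connections each of those pools may hold
http = urllib3.PoolManager(num_pools=2, maxsize=10)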

urllib3 also offers complex retry behavior. This is a really important consideration - we don't want our connection to time out because of a random, one-off overloaded server and then just give up. We'd like to try multiple times before we consider the data unavailable. You can find more details on this topic within the urllib3 documentation.
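As a hedged sketch, a retry policy could be attached to an individual request with urllib3's Retry utility roughly like this (the exact numbers and status codes are arbitrary choices):

import urllib3
from urllib3.util import Retry

http = urllib3.PoolManager()

# Retry up to 3 times, backing off between attempts, but only for these status codes
retries = Retry(total=3, backoff_factor=0.5, status_forcelist=[429, 500, 502, 503])

r = http.request('GET', 'https://swapi.dev/api/starships/9/', retries=retries)
print(r.status)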

A downside of urllib3 is that it isn't a stateful client, which makes working with cookies awkward. We have to set them manually as a header value on the request rather than having direct support from urllib3, or use something like the http.cookies module to manage them for us. For example:

r = http.request(
    'GET',
    'https://httpbin.org/cookies',
    headers={'Cookie': 'foo=bar; hello=world'}
)
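If we'd rather not assemble that header string by hand, a small sketch using the standard library's http.cookies module might look like this:

from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie['foo'] = 'bar'
cookie['hello'] = 'world'

# Build the value for the Cookie request header from the stored morsels
header_value = '; '.join(f'{key}={morsel.value}' for key, morsel in cookie.items())
# header_value == 'foo=bar; hello=world'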

Given that so many other libraries depend on urllib3, it's likely it will exist for some time to come.

2. Requests

Requests is an elegant and simple HTTP library for Python, built for human beings.

The Requests package is highly favored within the Python community, garnering over 110M downloads a month according to PePy. It's also recommended as a "higher level HTTP client interface" by the main urllib.request documentation.

Working with Requests is incredibly simple and, as such, the majority of developers in the Python community use it as their HTTP client of choice. It's maintained by the Python Software Foundation, has over 45k stars on Github, and is a dependency of many other Python libraries, such as gRPC and pandas.

Let's review how we'd make our requests with, well 🙂, Requests:

import requests

r = requests.get('https://swapi.dev/api/starships/9/')

print(r.json())

Posting data is similarly simple - we just need to change our get() call to post():

import requests

data = {"name": "Obi-Wan Kenobi", ...}

r = requests.post('https://httpbin.org/post', json=data)

print(r.json())

Here you can see why requests is so popular - its design is just so elegant! The example here is the most concise and requires the least code of all the examples given so far.

Requests incorporates HTTP verbs as methods (GET, POST) and we've even been able to convert straight to JSON without having to write our own decode method. As a developer this means it's dead simple to work with and understand, with only two method calls necessary to get the data we want from our API. In our POST example, we've also not had to bother with encoding our data dictionary or worry about setting the correct content type in the request headers. Requests does all that for us. Thanks a lot, Requests!

It's also easy to modify our POST call to submit form data instead by simply replacing our json argument with data.
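For instance, a quick sketch of the same call submitting form-encoded data (the payload is trimmed down here for illustration):

import requests

r = requests.post('https://httpbin.org/post', data={'name': 'Obi-Wan Kenobi'})

# httpbin echoes form fields back under the 'form' key
print(r.json()['form'])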

Another example of its simplicity is the way we can set cookies, which are just an additional argument of the post() method. For example:

r = requests.post('https://httpbin.org/post', data=data, cookies={'foo': 'bar', 'hello': 'world'})

Requests also offers a whole host of other advanced features like sessions, request hooks, and custom retry strategies. Sessions allow for statefulness with cookies being persisted across requests, something urllib3 didn't provide out-of-the-box.

A session example taken from the Requests documentation:

s = requests.Session()

s.get('https://httpbin.org/cookies/set/sessioncookie/123456789')
r = s.get('https://httpbin.org/cookies')

print(r.text)
# '{"cookies": {"sessioncookie": "123456789"}}'

Here, we initialised a session object and used it to send two GET requests. Any cookies we received from the server were managed by the session object and - automatically - sent back to the server on subsequent requests.

Additionally, hooks allow you to register common behavior you want to execute after each call. You may be familiar with this concept if you use git, which allows you to do the same. You can check out all the advanced features within the documentation of Requests.
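Before moving on, here's a small hedged sketch of what a response hook might look like (the logging function is made up purely for illustration):

import requests

def log_response(response, *args, **kwargs):
    # Called by Requests for every response on this request
    print(f'{response.request.method} {response.url} -> {response.status_code}')

r = requests.get('https://swapi.dev/api/starships/9/', hooks={'response': log_response})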

All of these advanced features make Requests a solid choice for a variety of applications.

3. aiohttp

Asynchronous HTTP Client/Server for asyncio and Python.

aiohttp is a package containing both a client and a server framework, meaning it might be well suited for an API which also makes requests elsewhere. It has 11k stars on Github and a number of third-party libraries build upon it. Running our usual Star Wars request with aiohttp would look like this:

import aiohttp
import asyncio

async def main():
    async with aiohttp.ClientSession() as session:
        async with session.get('https://swapi.dev/api/starships/9/') as response:
            print(await response.json())

asyncio.run(main())

and our POST request:

import aiohttp
import asyncio

data = {"name": "Obi-Wan Kenobi", ...}

async def main():
    async with aiohttp.ClientSession() as session:
        async with session.post('https://httpbin.org/post', json=data) as response:
            print(await response.json())

asyncio.run(main())

You can see that the aiohttp.ClientSession() object uses similar syntax to Requests, but the overall code is more complex than the previous examples: we now have method calls using async and await, along with an additional import for asyncio. The aiohttp documentation gives a good overview of why all this extra code is necessary compared to, say, Requests.

It will take some time to understand the asynchronous programming concepts if you're not familiar with them, but what it ultimately means is that it's possible to make a number of requests at the same time without waiting for each one to return a response before starting the next. For situations where we only make a single request this might not be a concern, but if we need to make tens or even thousands of requests, all the time the CPU spends waiting for a response could be better used for something else (like making another request!). We don't want to be paying for CPU cycles when we're just waiting around. As an example, let's take a look at some code looking up data for the first 50 starships from the Star Wars API.

import aiohttp
import asyncio
import time

async def get_starship(ship_id: int):
    async with aiohttp.ClientSession() as session:
        async with session.get(f'https://swapi.dev/api/starships/{ship_id}/') as response:
            print(await response.json())

async def main():
    tasks = []
    for ship_id in range(1, 51):
        tasks.append(get_starship(ship_id))
    await asyncio.gather(*tasks)

asyncio.run(main())

We first use asyncio's run() method to execute our main() async function, in which a for-loop builds our 50 requests. Usually, this would mean running them one by one and waiting for each to complete before starting the next. With asyncio and aiohttp, however, calling our async get_starship() function immediately returns an awaitable coroutine object without performing the request yet. We store these in our task list and eventually hand them to gather(), which runs them concurrently and waits for all of them to complete.

This consistently took under 2 seconds to run on my machine, whilst requesting the same data using a session with Requests took just over 4 seconds. So we're able to speed up the time it takes to retrieve our data, provided we can deal with the additional complexity it introduces to our code.
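For reference, a minimal sketch of the synchronous Requests version used for that comparison might look like this (assuming a shared Session for connection reuse):

import requests

with requests.Session() as session:
    for ship_id in range(1, 51):
        r = session.get(f'https://swapi.dev/api/starships/{ship_id}/')
        print(r.json())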

aiohttp offers thorough documentation along with a host of advanced features like sessions, cookies, pools, DNS caching and client tracing. An area where it is still lacking, however, is support for complex retry behavior, which is only available via third-party modules.

4. GRequests

GRequests introduces Gevent - a "coroutine-based Python networking library" - to Requests to allow requests to be made asynchronously. It's an older library, first released in 2012, which doesn't use Python's standard asyncio module. Individual requests can be made as we would do with Requests, but we can also leverage the Gevent module to make a number of requests like so:

import grequests

reqs = []

for ship_id in range(1, 51):
    reqs.append(grequests.get(f'https://swapi.dev/api/starships/{ship_id}/'))

for r in grequests.map(reqs):
    print(r.json())

The GRequests documentation is a bit sparse and even goes as far as to recommend other libraries over it on its Github page. At just 165 lines of code, it doesn't offer much advanced functionality over Requests itself. Over its nine years it's had a total of six releases, so it's probably only really worth considering if you find asynchronous programming particularly confusing.

5. HTTPX

HTTPX offers a "broadly Requests-compatible API", is the only library in our list to offer HTTP/2 support, and also offers async APIs.

Using HTTPX is very similar to Requests:

import httpx

r = httpx.get('https://swapi.dev/api/starships/9/')

print(r.json())

and for our POST request:

import httpx

data = {"name": "Obi-Wan Kenobi", ...}

r = httpx.post('https://httpbin.org/post', json=data)

print(r.json())

We've simply changed the name of our module and still didn't have to manage any JSON conversion. In our example we use the synchronous approach, but we could have also opted for an asynchronous version by simply using httpx.AsyncClient. Here, the code is quite similar to our previous example for aiohttp:

import httpx
import asyncio

async def get_starship(ship_id: int):
    async with httpx.AsyncClient() as client:
        r = await client.get(f'https://swapi.dev/api/starships/{ship_id}/')
        print(r.json())
...

For requests which take some time to return a response, this again means our client does not have to wait around. It's definitely worth considering if you have a large number of requests to make simultaneously and want to save on CPU cycles. If you are also looking to refactor requests-based scripts to something asynchronous, then HTTPX would seem to be a good replacement.
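To round the example off, here's a hedged sketch of how the elided part might look, sharing a single AsyncClient across all requests so connections are pooled (the concurrency pattern mirrors our earlier aiohttp example):

import httpx
import asyncio

async def get_starship(client: httpx.AsyncClient, ship_id: int):
    r = await client.get(f'https://swapi.dev/api/starships/{ship_id}/')
    print(r.json())

async def main():
    # One shared client, so connections are reused across all requests
    async with httpx.AsyncClient() as client:
        await asyncio.gather(*(get_starship(client, ship_id) for ship_id in range(1, 51)))

asyncio.run(main())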

6. Uplink

Uplink is one of the newer libraries on our list and follows a slightly different approach than the others, as it does not primarily focus on arbitrary HTTP requests, but rather on typical REST-structured request flows, with path patterns (routes) and REST parameters.

By default, Uplink uses the familiar Requests library for assembling the actual HTTP request, but it also features support for aiohttp and Twisted, should you prefer asynchronous request invocations. In that sense, it is not so much an HTTP client itself, but more a convenience wrapper around existing HTTP clients.

To implement our basic GET example once more:

from uplink import Consumer, get

class SWAPI(Consumer):

    @get("api/starships/{ship_id}")
    def get_starship(self, ship_id: int):
        """Fetches the data for the provided ship ID."""

swapi = SWAPI(base_url="https://swapi.dev")

response = swapi.get_starship(9)

print(response.json())

What we did here was:

  1. Create a SWAPI class, based on Uplink's base Consumer class.
  2. Add a stub method get_starship and annotate it with @get to specify the desired REST path and parameters. The parent class provides the implementation at runtime.
  3. Instantiate SWAPI and pass our base_url.
  4. Call our get_starship method and pass the ship ID.
  5. Receive the response and simply use json() to parse the body into a Python dictionary.

Voilà, we have a lightweight Python class with an aptly named method and could send our request without having to deal with the actual HTTP details.

It is similar for our POST example, where we specify a @post decorator to indicate a POST request and define the path/route. Additionally, we use a @json decorator to indicate that our data parameter should be used as JSON body for the HTTP request.

from uplink import Consumer, post, json, Body

class HTTPBIN(Consumer):

    @json
    @post("/post")
    def send_person(self, data: Body):
        """Sends "data" as the POST request body."""

httpbin = HTTPBIN(base_url="https://httpbin.org")

response = httpbin.send_person({"name": "Obi-Wan Kenobi"})

print(response.json())

Once again, that's pretty much it and, with just a couple of lines of code, we managed to create a class which allows us to access a REST interface in a native Python fashion, without the need to handle HTTP ourselves.

Feature Comparison

All the libraries in our list, of course, come with the same fundamental functionality of composing and sending an HTTP request. In this regard, they are all quite similar and support the same basic set of features (e.g. support for SSL and proxies).

Where they do differ is in the more advanced areas of Python and HTTP, such as support for asynchronous request invocation, cross-request session support, and modern HTTP versions. In particular, the latter is (as of now) exclusive to HTTPX, which is the only Python library in our list with support for HTTP/2.
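As a brief hedged aside, enabling HTTP/2 in HTTPX is a one-line change, assuming the library was installed with its optional http2 extra (pip install 'httpx[http2]'):

import httpx

client = httpx.Client(http2=True)
r = client.get('https://swapi.dev/api/starships/9/')

# Reports 'HTTP/2' if the server negotiated it, otherwise falls back to HTTP/1.1
print(r.http_version)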

| Library | Monthly downloads | Github ⭐ | Async | Sessions | Proxy support | SSL | HTTP/2 |
|---|---|---|---|---|---|---|---|
| aiohttp | 58M | 12.4k | ✔️ | ✔️ | ✔️ | ✔️ | - |
| GRequests | 288k | 4k | ✔️ | ✔️ | ✔️ | ✔️ | - |
| HTTPX | 9M | 8.8k | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| Requests | 214M | 47.5k | - | ✔️ | ✔️ | ✔️ | - |
| Uplink | 377k | 900 | - | - | ✔️ | ✔️ | - |
| urllib | N/A | N/A | - | - | ✔️ | ✔️ | - |
| urllib3 | 226M | 3k | - | - | ✔️ | ✔️ | - |

Numbers of downloads and Github stars certainly have to be taken with a grain of salt, but they can nonetheless serve as an indicator of how popular a library is and what level of community support you can expect. Here, Requests is the clear winner (while urllib3 is downloaded more often, keep in mind our earlier note that it is a dependency of Requests).

Conclusion

We have seen throughout this article that Requests has inspired the design of many of the libraries shown. It is incredibly popular within the Python community, with it being the default choice for most developers. With the additional features it offers, like sessions and simple retry behavior, you should likely be looking at it if you have simple needs or want to maintain simple code.

If your requirements are slightly more complex - and in particular if you need to handle concurrent requests - you may want to check out aiohttp. In our tests it showed the best performance with asynchronous web requests, it has the highest popularity among async libraries, and is actively supported. HTTPX is a close contender however and does support some of the features aiohttp is still lacking, in particular HTTP/2.

On the other hand, Uplink provides a very elegant and lightweight abstraction layer, but it may not be your first choice for generic web scraping and will, instead, primarily shine if you need to access well-defined REST interfaces.

💡 The easiest approach, of course, is the no code approach. That's what we focus on at ScrapingBee, making web scraping as easy as possible and letting you focus on the data. Check it out and the first 1k requests are on us!

There are other Python HTTP clients that we didn't cover in this article. For example, there is a Python binding for cURL, which we cover in a separate article: How to use Python with cURL?

But whatever your client needs, there's a Python package out there for you!

 

Originally published on June 28th 2021, updated by Alexander Mueller

Ian Wootten

Ian is a freelance developer with a passion for simple solutions. He has written code to power surveys, studio pipelines and holds a PhD in distributed computing.