There are two ways to get the file type of a URL in Python.
- Use the mimetypes module
The mimetypes module ships with Python's standard library and infers the file type from the URL alone, without making a network request. It relies on a file extension being present in the URL, so it returns None for the type when there is none. Here is some sample code:
import mimetypes
mimetypes.guess_type("http://example.com/file.pdf")
# Output: ('application/pdf', None)
mimetypes.guess_type("http://example.com/file")
# Output: (None, None)
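Real URLs often carry a query string or fragment after the extension. As a minimal sketch, assuming you only care about the extension in the path, you can strip everything else before guessing. The helper name guess_type_from_url is hypothetical and not part of the mimetypes module:

```python
import mimetypes
from urllib.parse import urlsplit


def guess_type_from_url(url):
    """Guess (media_type, encoding) from the extension in the URL's path."""
    # Keep only the path so "?download=1" or "#page=2" cannot hide the extension.
    path = urlsplit(url).path
    return mimetypes.guess_type(path)


print(guess_type_from_url("http://example.com/file.pdf?download=1"))
# Output: ('application/pdf', None)
print(guess_type_from_url("http://example.com/archive.tar.gz"))
# Output: ('application/x-tar', 'gzip')
print(guess_type_from_url("http://example.com/file"))
# Output: (None, None)
```

The second element of the tuple is the encoding (for example gzip), which is why most of the results above end in None.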
- Perform a HEAD request to the URL and investigate the response headers
A HEAD request asks the server for the response headers only, without downloading the body. One of the most useful headers is Content-Type, which states the media type the server assigns to the resource. This is usually a good indicator of the file type of a URL, although it is only as accurate as the server's configuration. Here is some sample code for making a HEAD request and reading the file type:
import requests
response = requests.head("https://scrapingbee.com")
print(response.headers['Content-Type'])
# Output: text/html; charset=utf-8
response = requests.head("https://practicalpython.yasoob.me/_static/images/book-cover.png")
print(response.headers['Content-Type'])
# Output: image/png
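The two methods can also be combined. The following is a rough sketch rather than a definitive implementation: it reads Content-Type from a HEAD response and falls back to the extension when the header is missing or too generic to be useful. The names media_type_from_headers, detect_file_type, and GENERIC_TYPES are hypothetical, and treating application/octet-stream as "unknown" is an assumption of this example:

```python
import mimetypes
from urllib.parse import urlsplit

import requests

# Assumption: these values tell us nothing about the real content.
GENERIC_TYPES = {"application/octet-stream", "binary/octet-stream"}


def media_type_from_headers(headers, url):
    """Return the media type from Content-Type, else guess from the URL's extension."""
    raw = headers.get("Content-Type") or ""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = raw.split(";", 1)[0].strip().lower()
    if media_type and media_type not in GENERIC_TYPES:
        return media_type
    # Fallback: extension-based guess on the path only.
    return mimetypes.guess_type(urlsplit(url).path)[0]


def detect_file_type(url, timeout=10):
    """HEAD the URL and return its media type, or None if it cannot be determined."""
    # requests.head() does not follow redirects unless asked to.
    response = requests.head(url, allow_redirects=True, timeout=timeout)
    # An error page usually reports text/html, which would be misleading.
    response.raise_for_status()
    return media_type_from_headers(response.headers, response.url)
```

Calling detect_file_type("https://scrapingbee.com") would issue the same HEAD request as above. Some servers reject HEAD requests, in which case raise_for_status() raises requests.HTTPError and the caller has to decide whether to retry with a GET request.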