HTTPS Certificate Verification in Python With urllib2

This post is a duplicate of one on my former site, muchtooscrawled.com. That site is no more, and this is the only post of any real quality, so I thought I would copy it over.

Everyone loves Python. I particularly feel encased in Python’s womb-like warmth and comfort when I am trying to do client-side communication with web servers or web services. Most of the magic has already been accomplished by the time I type import urllib2 – super simple and clean interfaces that seem to go increasingly deep as you need them. Request a page with a single line, do a GET or POST request with two lines, modify headers as needed, do secure communication with SSL; all of these things are simple and elegant, adding complexity only when needed for more complex goals.

Recently, I found a hole in this seemingly infinitely deep well of value added by urllib2. While the module will happily do SSL-secured communication for you, it fails to provide any easy way to verify server certificates. This is a critical feature, especially when using web services. For instance, if I wanted to use a service to version-check files on my system with files on a central server, allowing me to download the updates as needed, communicating with an unverified server could be disastrous. After poking around a bit online, I still hadn’t found anything useful in the urllib2 interface to help me accomplish this, so I started opening up the library files themselves. My goal was to use SSL with cert verification while still leveraging urllib2 for all of my high-level interface needs.

It turns out that this isn’t very difficult at all, even though the interfaces don’t make this kind of extension as easy as it could be. The ssl module already includes certificate verification, although you must supply your own trusted root certificates. These are easy to find, as it is in the interest of CAs like Verisign and Thawte to publish them (for instance, your browser already has copies that it uses for certificate verification). The question then is: how does one supply the appropriate parameters to the ssl.wrap_socket(...) function?

The answer, in this case, is to subclass the httplib.HTTPSConnection class and pass in the appropriate data. Here is an example:

import httplib
import socket
import ssl

class VerifiedHTTPSConnection(httplib.HTTPSConnection):
    def connect(self):
        # overrides the version in httplib so that we do
        #    certificate verification
        sock = socket.create_connection((self.host, self.port), self.timeout)
        if self._tunnel_host:
            # when going through a proxy, set up the tunnel on the
            #    plain socket before starting SSL
            self.sock = sock
            self._tunnel()
        # wrap the socket using verification with the root
        #    certs in trusted_root_certs
        self.sock = ssl.wrap_socket(sock,
                                    self.key_file,
                                    self.cert_file,
                                    cert_reqs=ssl.CERT_REQUIRED,
                                    ca_certs="trusted_root_certs")

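The key is the two extra parameters, cert_reqs and ca_certs, in the call to wrap_socket. Setting cert_reqs to ssl.CERT_REQUIRED makes the handshake fail unless the server presents a certificate that validates, and ca_certs names the file of trusted root certificates to validate against. For a more complete discussion of these parameters, please refer to the ssl module documentation. Note that this verifies the certificate chain only; wrap_socket does not check that the certificate was issued for the host you connected to.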
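
For readers on Python 3, where ssl.wrap_socket was deprecated and then removed (in 3.12), the same two settings live on an ssl.SSLContext object. The following is only a minimal sketch, not the code from this post: the function name make_verifying_context and its ca_file parameter are made up for illustration, and the fallback to the system's default roots is my assumption. PROTOCOL_TLS_CLIENT also turns on hostname checking, which the wrap_socket call above does not do.

```python
import ssl


def make_verifying_context(ca_file=None):
    """Build a client-side context that requires a valid server certificate.

    ca_file is a hypothetical parameter: a path to a PEM file of trusted
    roots, playing the role of ca_certs="trusted_root_certs" above.
    """
    # PROTOCOL_TLS_CLIENT defaults to CERT_REQUIRED with hostname checking on.
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if ca_file is not None:
        # Trust only the roots found in the given file.
        context.load_verify_locations(cafile=ca_file)
    else:
        # Assumption: fall back to the operating system's default roots.
        context.load_default_certs()
    return context


context = make_verifying_context()
print(context.verify_mode == ssl.CERT_REQUIRED)
print(context.check_hostname)
```

Passing a path such as "trusted_root_certs" as ca_file would restrict trust to that file, much like the ca_certs argument does in the class above.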

The next step is integrating our new connection in such a way that allows us to use it with urllib2. This is done by installing a non-default HTTPS handler, by first subclassing the urllib2.HTTPSHandler class, then installing it as a handler in an OpenerDirector object using the urllib2.build_opener(...) function. Here is the example subclass:

import urllib2

# wraps https connections with ssl certificate verification
class VerifiedHTTPSHandler(urllib2.HTTPSHandler):
    def __init__(self, connection_class=VerifiedHTTPSConnection):
        self.specialized_conn_class = connection_class
        urllib2.HTTPSHandler.__init__(self)

    def https_open(self, req):
        # hand our verifying connection class to the stock machinery
        return self.do_open(self.specialized_conn_class, req)

As you can see, I have added the connection class as a parameter to the constructor. Because of the way the handler classes are used, it would require substantially more work to be able to pass in the value of the ca_certs parameter to wrap_socket. Instead, you can just create different subclasses for different root certificate sets. This would be useful if you had a development server with a self-signed certificate and a production server with a CA-signed certificate, as you could swap them out at runtime or delivery time using the parameter to the constructor above.
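
One way to get "different subclasses for different root certificate sets" without writing each one by hand is a small class factory. This is a sketch under assumptions rather than part of the original code: it is written for Python 3 (http.client instead of httplib, and an ssl context instead of wrap_socket), and the factory name and the two .pem file names are hypothetical.

```python
import http.client
import ssl


def make_verified_connection_class(ca_file):
    """Return an HTTPSConnection subclass tied to one root certificate file."""

    class VerifiedHTTPSConnection(http.client.HTTPSConnection):
        # Recorded on the class so it is easy to see which roots are in use.
        ca_certs = ca_file

        def __init__(self, host, **kwargs):
            # The context is built when a connection is created, so the
            # file only has to exist at that point.
            context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
            context.load_verify_locations(cafile=self.ca_certs)
            super().__init__(host, context=context, **kwargs)

    return VerifiedHTTPSConnection


# Hypothetical file names for the two environments.
DevConnection = make_verified_connection_class("dev_root.pem")
ProdConnection = make_verified_connection_class("prod_root.pem")
```

Either class could then be given as the connection_class argument of a handler like the one above, once that handler is ported to urllib.request.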

With this class, you can either create an OpenerDirector object, or you can install a handler into urllib2 itself for use in the urlopen(...) function. Here is how to create the opener and use it to open a secure site with certificate verification:

https_handler = VerifiedHTTPSHandler()
url_opener = urllib2.build_opener(https_handler)
handle = url_opener.open('https://www.example.com')
response = handle.readlines()
handle.close()
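
The other option, installing the handler globally so that plain urlopen(...) calls go through it, uses install_opener (urllib2.install_opener(url_opener) with the objects above). Below is a minimal sketch of the same idea on Python 3, where urllib2 became urllib.request and HTTPSHandler accepts an ssl context directly. Trusting the system's default roots here is my assumption, not something the post does.

```python
import ssl
import urllib.request

# Assumption: trust the system's default roots. To pin a specific file,
# pass cafile="trusted_root_certs" to create_default_context instead.
context = ssl.create_default_context()
https_handler = urllib.request.HTTPSHandler(context=context)
url_opener = urllib.request.build_opener(https_handler)

# After this call, urllib.request.urlopen(...) goes through url_opener.
urllib.request.install_opener(url_opener)
```

Installing an opener changes behavior for every urlopen call in the process, so the explicit opener object is the safer choice when only some requests should use the custom root set.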

If the certificate for example.com is not signed by one of the trusted authority keys in the file trusted_root_certs (from the VerifiedHTTPSConnection class), then the call to url_opener.open(...) will raise a urllib2.URLError exception with some debugging-type information from the ssl module. Otherwise, urllib2 functions just as normal, albeit now communicating with a trusted source.
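
A caller will usually want to catch that exception rather than let it propagate. Here is a minimal sketch, written for Python 3 (urllib.error.URLError is the counterpart of urllib2.URLError); the helper name fetch_lines and its return-None-on-failure convention are made up for illustration.

```python
import urllib.error


def fetch_lines(opener, url):
    """Return the response lines, or None if the request fails."""
    try:
        handle = opener.open(url)
    except urllib.error.URLError as err:
        # err.reason carries the underlying error, for example the ssl
        # module's message when certificate verification fails.
        print("request failed: %s" % (err.reason,))
        return None
    try:
        return handle.readlines()
    finally:
        handle.close()
```

It would be called with an opener built as above, for example fetch_lines(url_opener, 'https://www.example.com').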


Comments

  1. Hi,
    i need some help to use your code. i’m a beginner in python and i need to communicate with my web service using an ssl certificate.
    here is my code for a simple communication over http:
    ————-
    from PtSystemServiceImplService_client import *
    from ZSI.client import AUTH,Binding

    loc = PtSystemServiceImplServiceLocator()
    url_string="http://localhost/repository/services/PtSystemService"
    ssl_url_string="https://localhost8443/repository/services/PtSystemService"
    url=loc.getPtSystemServiceImplPortAddress()

    upper_class= PtSystemServiceImplServiceSoapBindingSOAP(url)

    #list all ptSystem
    req= listPs()
    buh= upper_class.listPs(req)
    print 'Resultat : '
    for i in range(len(buh._return)):
        print buh._return[i]
    print "fin\n\n"
    ————-
    my certif ="cacerts"
    my keyStore="repository.jks"

    can you help me to use your function
    thx

    • Hi radhwan,

      It looks like you are using a library other than urllib2. The ZSI library may well already handle cert verification; if it is failing because of cert verification, then it is already performing the verification. Check the documentation for how to include your trusted roots file, and you should be good to go.

      Cheers,
      Joseph

  2. Hi Joseph,

    Thanks for posting this. Looks very helpful.

    Are there publicly available places to get a trusted_root_certs file?

    If not, is there a tidy way to generate the trusted_root_certs file – perhaps from a Firefox cert store or other readily available repository?

    Thanks,

    Steve

    • Hi Steve,

      In my application, I wanted to be more assured of the certificate chain, so I grabbed our trusted root certificate straight from our signing source, Thawte; that way, only Thawte-signed certificates (like ours) would be accepted. Other providers also make their root certificates available. If you need more roots, you can just concatenate them into a single file that you pass in as the trusted root file. Both Chrome and Firefox (and probably IE, though why bother) allow you to export root certificates, though you may need to do a bit of hand concatenating. I would be surprised if Firefox didn’t have a plugin for doing the exporting.

      Note that you need to export the certs in base-64 encoded X.509 format for the ssl module to do its magic.

      Hope that helps!

      Joseph

    • Andrew, if you are getting a “path to cert file” error, it means the code can’t find your trusted roots file. In the example code, it is pointing at a file called “trusted_root_certs” in the local directory; either place a certs file in that location, or change the code to be a full path to your certs file.

      Cheers,
      Joseph

  3. Pingback: HTTPS Certificate Verification in Python With urllib2: The Githubening | Joseph Turner's Blog

  4. Hi Joseph,

    Thanks for posting this article. I’m developing an API and I found this to be very valuable. I created a class to make it easy to use an API with basic auth over https: https://gist.github.com/1346673. Please let me know what you think.

    One thing I noticed testing: if I modify my CA’s root cert, it still seems to verify without any problems. I would expect that to create an error. I’m on a mac, and the OS maintains a separate set of root certificates that it shares with the browser. I wonder if the SSL library is also accessing the root certs… otherwise, something is wrong with the way I’m wrapping the socket with ssl.

  5. Hi Joseph, Thanks for the hack.
    can i use the verified connection to send multiple requests, without requiring to validate certificate on each request ??

  6. Thanks for the quick response, i thought you would be on vacation .
    Wish you a very happy new year :)

    here is what i am trying to do,
    i wanted to upload a file via https post after validating server cert, i am chunking the file and sending chunks in the requests.

    if first req:

    cookie stufff….

    cookies_handler=urllib.request.HTTPCookieProcessor(cookies)

    https_handler=VerifiedHTTPSHandler()
    opener=urllib.request.build_opener(cookies_handler,https_handler)

    response=opener.open(req)

    but i see that for every request, verification of cert is happening ? (i inserted some print before this call)

    self.sock = ssl.wrap_socket(sock,
                                self.key_file,
                                self.cert_file,
                                cert_reqs=ssl.CERT_REQUIRED,
                                ca_certs="trusted_root_certs")

    what is it that i am missing here ??

    • The cert verification happens as part of the SSL handshake, and since urllib2 (like urllib.request) opens a new connection for every request, the handshake and the verification happen for each request you make. If that were not the case, you would be vulnerable to a man-in-the-middle attack on the later requests. If you want to change that behavior, you could turn off cert verification after your initial connection.

  7. Thanks Joseph Turner for such a beautiful insight. You saved my life.
    I’m grateful.
    It worked perfectly for me.
