Why Python Pickle is Insecure

Python pickle is a powerful serialization module. It is the most common method to serialize and deserialize Python object structures. The pickle module has an optimized cousin called cPickle that is written in C. In this post I'm going to refer to both modules by the name pickle unless I mention otherwise. The security issues I'm going to discuss apply to both of them.

What This is All About

Pickle was never claimed to be secure. In the pickle documentation there is a warning in red that says:

Warning The pickle module is not intended to be secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.

This clearly states that pickle is insecure. Many think this is because it can load classes other than the ones you expect and may trick you into running their functions. But the actual security risk is far more dangerous: unpickling can be exploited to execute arbitrary commands on your machine!
Take this little example:

import pickle
pickle.loads("cos\nsystem\n(S'ls ~'\ntR.") # This will run: ls ~

Or if you are running Windows, try this instead:

import pickle
pickle.loads("cos\nsystem\n(S'dir'\ntR.") # This will run: dir

You can replace ls and dir with any other command.

I will use pickletools.dis to disassemble the pickle and show you how this works:

import pickletools
print pickletools.dis("cos\nsystem\n(S'ls ~'\ntR.")

Output:

    0: c    GLOBAL     'os system'
   11: (    MARK
   12: S        STRING     'ls ~'
   20: t        TUPLE      (MARK at 11)
   21: R    REDUCE
   22: .    STOP

A pickle is a program for a simple stack-based virtual machine: it records the instructions used to reconstruct the object. In other words, the pickled instructions in our example are:

  1. Push self.find_class(module_name, class_name) i.e. push os.system
  2. Push the string 'ls ~'
  3. Build a tuple from the topmost stack items (everything pushed after the MARK)
  4. Apply callable to argtuple, both on stack. i.e. os.system(*('ls ~',))
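
The steps above can also be inspected from code. Here is a minimal sketch, assuming Python 3 (where the payload has to be a bytes literal); the helper name list_opcodes is made up for this illustration. pickletools.genops only parses the opcode stream and never executes it, so this is safe to run:

```python
import pickletools

# The same payload as above, written as bytes for Python 3.
PAYLOAD = b"cos\nsystem\n(S'ls ~'\ntR."


def list_opcodes(data):
    """Return (position, opcode name, argument) for every opcode in a pickle.

    genops() only parses the byte stream; nothing is imported or called.
    """
    return [(pos, op.name, arg) for op, arg, pos in pickletools.genops(data)]


for pos, name, arg in list_opcodes(PAYLOAD):
    print(pos, name, arg)
```

The positions and opcode names line up with the pickletools.dis listing above: GLOBAL, MARK, STRING, TUPLE, REDUCE, STOP.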

The example is not exploiting a bug in pickle. Reduce is a vital step to instantiate objects from their classes. Take this example where I am unpickling an instance of the built-in object class:

import pickletools
import pickle
print pickletools.dis(pickle.dumps(object()))

Output:

    0: c    GLOBAL     'copy_reg _reconstructor'
   25: p    PUT        0
   28: (    MARK
   29: c        GLOBAL     '__builtin__ object'
   49: p        PUT        1
   52: g        GET        1
   55: N        NONE
   56: t        TUPLE      (MARK at 28)
   57: p    PUT        2
   60: R    REDUCE
   61: p    PUT        3
   64: .    STOP

Note the REDUCE step. To create an instance of the class object, pickle has to look up copy_reg._reconstructor and the __builtin__.object class, and then apply the former to the given arguments.
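
To see how a REDUCE step ends up in a pickle in the first place, here is a small sketch, assuming Python 3. The class name Harmless is hypothetical, and I picked len as a deliberately harmless callable. An object's __reduce__ method can return any (callable, arguments) pair, and the unpickler will call it:

```python
import pickle
import pickletools


class Harmless(object):
    """Hypothetical class whose pickle asks for len('abc') to be called."""

    def __reduce__(self):
        # (callable, argument tuple): the unpickler will run callable(*args).
        return (len, ("abc",))


data = pickle.dumps(Harmless())

# Parse (not execute) the pickle to confirm it contains a REDUCE opcode.
opcode_names = [op.name for op, arg, pos in pickletools.genops(data)]

# Loading calls len("abc"); the result is whatever the callable returns,
# not a Harmless instance.
result = pickle.loads(data)
print(opcode_names, result)
```

Swap len for os.system and you are back at the exploit from the start of this post, which is why REDUCE cannot simply be dropped from the format.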

As of 2.3, Python abandoned any pretense that it might be safe to load pickles received from untrusted parties, because no sufficient security analysis has been done to guarantee this and there isn't a use case that warrants the expense of such an analysis. As a result, all tests for __safe_for_unpickling__ or for copy_reg.safe_constructors were removed from the unpickling code. (Source: pickletools.py source code comments.)

How to Make Unpickling Safer

To make unpickling safer, you need to control exactly which classes will get created. (There is no 100% safety guarantee; pickle was never intended to be secure.) In pickle this can be done by overriding the find_class method. For example:

import sys
import pickle
import StringIO
 
class SafeUnpickler(pickle.Unpickler):
    PICKLE_SAFE = {
        'copy_reg': set(['_reconstructor']),
        '__builtin__': set(['object'])
    }
    def find_class(self, module, name):
        if not module in self.PICKLE_SAFE:
            raise pickle.UnpicklingError(
                'Attempting to unpickle unsafe module %s' % module
            )
        __import__(module)
        mod = sys.modules[module]
        if not name in self.PICKLE_SAFE[module]:
            raise pickle.UnpicklingError(
                'Attempting to unpickle unsafe class %s' % name
            )
        klass = getattr(mod, name)
        return klass
 
    @classmethod
    def loads(cls, pickle_string):
        return cls(StringIO.StringIO(pickle_string)).load()
 
 
SafeUnpickler.loads("cos\nsystem\n(S'ls ~'\ntR.") # UnpicklingError: Attempting to unpickle unsafe module os

To extend the PICKLE_SAFE dictionary with your pickle safe classes and modules:

SafeUnpickler.PICKLE_SAFE.update({'__main__': set(['MyClass1', 'MyClass2']), 'MyModule': set(['MyClass3'])})

You need to be really careful with what you include in the PICKLE_SAFE dictionary. The __builtin__ module, for example, contains the eval function, which can be as dangerous as os.system.
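
To make that warning concrete, here is a sketch, assuming Python 3 (where __builtin__ is named builtins). The class TooPermissiveUnpickler is hypothetical and shows what NOT to do: it allows a whole module instead of specific names. The payload is a harmless stand-in that only asks for eval('1 + 1'):

```python
import builtins
import io
import pickle


class TooPermissiveUnpickler(pickle.Unpickler):
    """Hypothetical anti-example: allows every name in an allowed module."""

    ALLOWED_MODULES = {"builtins"}

    def find_class(self, module, name):
        if module not in self.ALLOWED_MODULES:
            raise pickle.UnpicklingError("module %s is not allowed" % module)
        # No per-name check, so eval, exec, __import__ ... all get through.
        return getattr(builtins, name)


# Harmless stand-in for an attack: the pickle asks for eval("1 + 1").
payload = b"cbuiltins\neval\n(S'1 + 1'\ntR."
result = TooPermissiveUnpickler(io.BytesIO(payload)).load()
print(result)
```

The string handed to eval is entirely under the control of whoever wrote the pickle, so allowing the module as a whole is no better than allowing os.system.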

In cPickle this has to be implemented a bit differently. There is a special attribute called find_global that needs to be set to a function that accepts a module name and a class name, and returns the corresponding class object. cPickle.Unpickler can't be subclassed directly, so instead we are going to wrap it in another class:

import sys
import cPickle
import StringIO
 
class SafeUnpickler(object):
    PICKLE_SAFE = {
        'copy_reg': set(['_reconstructor']),
        '__builtin__': set(['object'])
    }
 
    @classmethod
    def find_class(cls, module, name):
        if not module in cls.PICKLE_SAFE:
            raise cPickle.UnpicklingError(
                'Attempting to unpickle unsafe module %s' % module
            )
        __import__(module)
        mod = sys.modules[module]
        if not name in cls.PICKLE_SAFE[module]:
            raise cPickle.UnpicklingError(
                'Attempting to unpickle unsafe class %s' % name
            )
        klass = getattr(mod, name)
        return klass
 
    @classmethod
    def loads(cls, pickle_string):
        pickle_obj = cPickle.Unpickler(StringIO.StringIO(pickle_string))
        pickle_obj.find_global = cls.find_class
        return pickle_obj.load()
 
 
SafeUnpickler.loads("cos\nsystem\n(S'ls ~'\ntR.") # UnpicklingError: Attempting to unpickle unsafe module os
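
On Python 3 the two modules were merged: pickle uses the C implementation automatically, and its Unpickler can be subclassed directly. Here is a minimal sketch of the same whitelist idea for Python 3, assuming the module names builtins and copyreg (the Python 3 names for __builtin__ and copy_reg) and bytes input. Treat it as an illustration, not as a vetted security control:

```python
import io
import pickle


class SafeUnpickler(pickle.Unpickler):
    # module name -> set of names that may be looked up in it
    PICKLE_SAFE = {
        "builtins": {"object"},
        "copyreg": {"_reconstructor"},
    }

    def find_class(self, module, name):
        if module not in self.PICKLE_SAFE:
            raise pickle.UnpicklingError(
                "Attempting to unpickle unsafe module %s" % module
            )
        if name not in self.PICKLE_SAFE[module]:
            raise pickle.UnpicklingError(
                "Attempting to unpickle unsafe class %s" % name
            )
        # Only whitelisted names ever reach the normal lookup.
        return super().find_class(module, name)

    @classmethod
    def loads(cls, pickle_bytes):
        return cls(io.BytesIO(pickle_bytes)).load()


# Plain data contains no global lookups, so it loads without touching
# the whitelist at all.
plain = SafeUnpickler.loads(pickle.dumps([1, "a", {"k": 2.0}]))

try:
    SafeUnpickler.loads(b"cos\nsystem\n(S'ls ~'\ntR.")
except pickle.UnpicklingError as exc:
    print(exc)  # the os module is rejected before anything is called
```

One caveat with this sketch: the overridden find_class sees module names exactly as they are written in the stream, so pickles written with protocol 2 or lower may carry the old Python 2 names and be rejected by this whitelist.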

As you can see, this solution works, but it is hardly practical in many cases: you need to tell pickle in advance exactly which classes you want to allow. The moral of the story, according to the pickle documentation:

You should be really careful about the source of the strings your application unpickles.

Safer Alternatives

Fortunately, there are alternatives to pickle. They may not be as powerful when it comes to serializing Python objects and classes, but in most cases all we need to serialize is basic types and simple data structures.

JSON

JSON is a lightweight data interchange format. Its human-readable format gives it an advantage over pickle. The json.org website provides a comprehensive listing of existing JSON bindings, including Python. The json module has been part of the Python standard library since 2.6.
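
A minimal sketch of a JSON round trip with the standard json module, assuming Python 3; the record contents are made up for illustration. Decoding only ever produces dicts, lists, strings, numbers, booleans and None, so there is no equivalent of REDUCE for an attacker to abuse:

```python
import json

# Made-up example data built only from JSON-compatible basic types.
record = {
    "name": "example",
    "tags": ["a", "b"],
    "count": 3,
    "ratio": 0.5,
    "active": True,
    "parent": None,
}

encoded = json.dumps(record)   # a plain text string
decoded = json.loads(encoded)  # rebuilt from basic types only

# Arbitrary objects are refused at encoding time instead of being
# turned into instructions for the decoder.
try:
    json.dumps(object())
    refused = False
except TypeError:
    refused = True

print(decoded == record, refused)
```

The price is the one mentioned above: anything that is not a basic type has to be converted by hand before encoding.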

YAML

YAML is a human-readable data serialization format. YAML has additional features lacking in JSON, such as extensible data types, relational anchors, strings without quotation marks, and mapping types preserving key order. PyYAML is a Python binding for YAML. PyYAML allows sophisticated object instantiation to be executed, which opens the potential for an injection attack. According to the PyYAML documentation, you need to use the yaml.safe_load function to load data from untrusted sources.

Others

Depending on your application, there are many other alternatives, such as XML, Protocol Buffers and Thrift. There is a useful comparison of data serialization formats on Wikipedia.

Comments

A drop-in (more) secure pickle replacement

Here's a fast and more secure alternative to Pickle with the same API (in pure python):

http://home.gna.org/oomadness/en/cerealizer/index.html

It's worked well in my projects.

json

Good suggestions. I have been fighting the JSON and YAML battle against other binary serialization, XML and old-school INIs. I hope it wins; JSON, like Python, is a thing of computer science beauty. It is simple but makes a new platform for syndicating data and content. Python and JSON have basic types in common, and they serialize and deserialize to one another very easily with simplejson. Supporting basic types, arrays and objects is all you need...

JSON and other specification limitations, e.g., Unicode

Other systems for serializing and deserializing objects will run afoul of conflicting standards. This causes objects to change during the process. For example, one would expect that "var == json.loads(json.dumps(var))" but this is not the case for strings. Strings, by the json specification, are all converted to Unicode. It is not possible to serialize and read back any structure that has byte strings in Python 2.6 using the JSON module.

Nice info

I wasn't aware of these possible "exploits". Thanks for sharing!

Comparison of data serialization formats

Glad you found the Wikipedia article useful! The syntax table is now much more complete than it was before.

No teme

Hello from Russia!
Can I quote a post in your blog with the link to you?

Sure

Of course you can.

Should be in the docs

First of all, this is a really nice article, thanks for that :)

It's a shame that this security risk is not explained in detail in the Python documentation. You should propose this to be added! The docs only advise subclassing Unpickler, but not why.

grr shelve uses Pickle too,

grr shelve uses Pickle too, except they don't have a fat warning on the page (they assume you read Pickle page I guess...)

test.dir:
'a', (0, 24)

test.dat:
cos
system
(S'ls ~'
tR.

import shelve; s = shelve.open('test'); s # boom

Now I have no built in option left because Jython is still 2.5 and has no JSON :'(

@AndiDog: There are too many security implications to explain (even if there was no arbitrary code execution, you could still refer to arbitrary names in the current scope, there are probably other bad things too) on the pickle page, you should just take their word that it's unsafe to "unpickle" untrusted data.
