Thursday, 20 September 2018

Securing Python Code through Compilation

We recently had a task that involved deploying a Python application on a machine that wasn't ours. Since we usually deploy our applications together with their Python source code, this raised the question of whether we have to protect our code and how we could do it. So here are some of our thoughts.

Assume we have code that needs to be deployed somewhere where nobody should be able to read it. Let's start with some example code (my_code.py):
import pandas as pd

def my_awesome_function(dataframe):
    """ This is a docstring """
    retval = dataframe.copy()
    return retval

Importing from the .py file

We can use the code with either the Python or the IPython interpreter.
>>> import pandas as pd
>>> import my_code
>>> my_code.my_awesome_function(pd.DataFrame())
Empty DataFrame
Columns: []
Index: []
As long as we have the .py file we can even read it from within the interpreter!
>>> import inspect
>>> print(inspect.getsource(my_code.my_awesome_function))
def my_awesome_function(dataframe):
    """ This is a docstring """
    retval = dataframe.copy()
    return retval
 
OK, we can import and use the code, but the .py file is readable by everyone.
Next, let's try using just the compiled Python file (.pyc).

Importing from the .pyc file

Even though .py files are compiled automatically when they are imported, we can also compile them explicitly:
$ python -m compileall -b my_code.py 
Compiling 'my_code.py'...
$ rm my_code.py 
Now we have a .pyc file and no .py file, and we can still import and use the module.
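The same steps can be sketched from within Python. This is a minimal illustration, not part of our deployment: the module name my_code_demo is hypothetical, and the function works on a plain list instead of a DataFrame so that the example runs without pandas. It compiles a throwaway module, deletes the source, imports the remaining .pyc, and checks that inspect can no longer show the source.

```python
import importlib
import inspect
import pathlib
import py_compile
import shutil
import sys
import tempfile

# Stand-in for my_code.py (hypothetical, pandas-free so the sketch is self-contained)
SOURCE = '''
def my_awesome_function(values):
    """ This is a docstring """
    return list(values)
'''

def import_from_pyc_only(module_name="my_code_demo"):
    """Compile a throwaway module, delete its source, and import the .pyc."""
    workdir = pathlib.Path(tempfile.mkdtemp())
    source_path = workdir / f"{module_name}.py"
    pyc_path = workdir / f"{module_name}.pyc"
    source_path.write_text(SOURCE)

    # Like `compileall -b`, write the .pyc next to the source
    # instead of into __pycache__.
    py_compile.compile(str(source_path), cfile=str(pyc_path), doraise=True)
    source_path.unlink()

    sys.path.insert(0, str(workdir))
    importlib.invalidate_caches()
    try:
        module = importlib.import_module(module_name)
        loaded_from = module.__file__
    finally:
        sys.path.remove(str(workdir))
        shutil.rmtree(workdir, ignore_errors=True)
    return module, loaded_from

module, loaded_from = import_from_pyc_only()
result = module.my_awesome_function((1, 2, 3))

# Without the .py file, inspect has nothing to read the source from.
try:
    inspect.getsource(module.my_awesome_function)
    source_available = True
except OSError:
    source_available = False

print(loaded_from, result, source_available)
```

The module still works, but it was loaded from the .pyc file and its source text is no longer available to the interpreter.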
If we look at the .pyc file, it is not obvious what it does:
3^M2<84>j<9f>[<8c>^@^@^@ã^@^@^@^@^@^@^@^@^@^@^@^@^B^@^@^@@^@^@^@s^T^@^@^@d^@d^Al^
@Z^Ad^Bd ^C<84>^@Z^Bd^AS^@)^Dé^@^@^@^@Nc^A^@^@^@^@^@^@^@^B^@^@^@^A^@^@^@C^@^@^@s^
L^@^@^@|^@j^@<83>^@}^A|^AS^@)^Az^U This is a docstring )^AÚ^Dcopy)^BZ dataframeZ^
Fretval©^@r^C^@^ @^@ú^Mmy_code.pyÚ^Vmy_awesome_function^C^@^@^@s^D^@^@^@^@^B^H^Ar^
E^@^@^@)^CZ^FpandasÚ^Bpdr^E^@^@^@r^C^@^@^@r^C^@^@^@r^C^@^@^@r^D^@^@^@Ú^H<module>^
A^@^@^@s^B^@^@^@^H^B
However, there are ways to decompile a .pyc file, and even that compiled gibberish contains the variable and function names we used in the code!
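How easily those names can be recovered is shown by the following sketch. It uses only the standard library and assumes CPython 3.7 or newer, where a .pyc file starts with a 16-byte header followed by the marshalled code object; the helper names collect_names and names_in_pyc are ours, chosen for illustration. The file has to be read with the same Python version that wrote it.

```python
import marshal
import pathlib
import py_compile
import tempfile
import types

# The example module from above; it is only compiled here, never executed,
# so pandas does not need to be installed.
SOURCE = '''import pandas as pd

def my_awesome_function(dataframe):
    """ This is a docstring """
    retval = dataframe.copy()
    return retval
'''

def collect_names(code):
    """Recursively gather the identifiers stored in a code object."""
    names = {code.co_name, *code.co_names, *code.co_varnames}
    for const in code.co_consts:
        if isinstance(const, types.CodeType):  # nested functions, classes, ...
            names |= collect_names(const)
    return names

def names_in_pyc(pyc_path):
    """Load the code object from a .pyc file and list its identifiers."""
    data = pathlib.Path(pyc_path).read_bytes()
    # Assumption: 16-byte header (magic number, flags, mtime/hash, size).
    code = marshal.loads(data[16:])
    return collect_names(code)

with tempfile.TemporaryDirectory() as workdir:
    source_path = pathlib.Path(workdir) / "my_code.py"
    pyc_path = pathlib.Path(workdir) / "my_code.pyc"
    source_path.write_text(SOURCE)
    py_compile.compile(str(source_path), cfile=str(pyc_path), doraise=True)
    source_path.unlink()  # only the .pyc is left
    recovered = names_in_pyc(pyc_path)

print(sorted(recovered))
```

The printed set includes my_awesome_function, dataframe, retval and copy, even though the source file is gone. A real decompiler goes further and reconstructs the statements as well.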
To remove the code altogether, we can compile it with Cython.

Compiling the .py file with Cython

Before we can compile the .py code, we need to create C code from it.
For that we write a short setup-like file called compile.py:

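# Note: distutils was removed from the standard library in Python 3.12;
# setuptools provides setup and Extension as replacements.
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

# Build an extension module named my_code from my_code.py
ext_modules = [
    Extension("my_code", ["my_code.py"])
]

setup(
    name = 'My code',
    cmdclass = {'build_ext': build_ext},
    ext_modules = ext_modules
)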
With this compile script we can generate a .c file and compile it with our C compiler:
$ python compile.py build_ext --inplace
$ rm my_code.py
$ rm my_code.c
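After removing both the Python and the C code, all that remains is the compiled .so file. First, it is much larger than the previous .pyc file. Second, if you open it, there is no obvious hint of the Python code we wrote, and we could not spot our variable or function names at a glance. (Cython still needs some identifiers at runtime, such as the names of imported modules and called attributes, so these may survive as string constants in the binary.)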
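Eyeballing a binary in an editor is not very reliable, so it can be worth checking this for a particular build. The sketch below searches a file's raw bytes for given identifiers. The helper identifiers_in_binary is hypothetical, and the demonstration runs on synthetic bytes rather than a real .so file, so it makes no claim about what a Cython build actually contains.

```python
import pathlib
import tempfile

def identifiers_in_binary(path, identifiers):
    """Report, for each identifier, whether it occurs as ASCII bytes in the file."""
    data = pathlib.Path(path).read_bytes()
    return {name: name.encode("ascii") in data for name in identifiers}

# Demonstration on a synthetic file standing in for a compiled module.
with tempfile.TemporaryDirectory() as workdir:
    fake_binary = pathlib.Path(workdir) / "fake_module.bin"
    fake_binary.write_bytes(b"\x7fELF\x00\x01my_awesome_function\x00\x02copy\x00")
    report = identifiers_in_binary(
        fake_binary, ["my_awesome_function", "retval", "copy"]
    )

print(report)
```

Pointing the helper at the real .so file (its exact name depends on the platform and Python version) shows which of the names from my_code.py are still present.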
However, while it is safe to say that your Python code is gone, it is still possible to reconstruct (and possibly reproduce) what your program is doing. Whoever is interested in your code now has to reverse-engineer the compiled C instead of the compiled Python, which is arguably harder but not impossible.
Lastly, we like to deploy applications using Docker images. At first glance you might think that this also adds some protection to the underlying code, but it is very easy to extract all files from a Docker image. If your application code must be protected at all costs, the safest way (to our knowledge) is to run it only on hardware that nobody you want to keep the code from can access. And even then, the code is only as safe as your system is.
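To give an idea of how little protection an image offers, here is a sketch that lists the files in an image archive using only the standard library. It assumes the archive was written by `docker save` and contains a top-level manifest.json whose entries list their layer tarballs under "Layers". The function list_image_files and the tiny synthetic archive are made up for this illustration, so no Docker installation is needed to run it.

```python
import io
import json
import pathlib
import tarfile
import tempfile

def list_image_files(archive_path):
    """List regular files in every layer of a `docker save`-style archive."""
    files = []
    with tarfile.open(archive_path) as image:
        manifest = json.load(image.extractfile("manifest.json"))
        for entry in manifest:
            for layer_name in entry["Layers"]:
                # Each layer is itself a tar archive nested in the image archive.
                with tarfile.open(fileobj=image.extractfile(layer_name)) as layer:
                    files.extend(m.name for m in layer.getmembers() if m.isfile())
    return files

def tar_bytes(files):
    """Build an in-memory tar archive from a {name: bytes} mapping."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, content in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(content)
            archive.addfile(info, io.BytesIO(content))
    return buffer.getvalue()

# Synthetic stand-in for a real image archive: one layer with one file.
layer = tar_bytes({"app/my_code.py": b"print('hello')\n"})
manifest = json.dumps([{"Layers": ["layer.tar"]}]).encode()

with tempfile.TemporaryDirectory() as workdir:
    archive_path = pathlib.Path(workdir) / "image.tar"
    archive_path.write_bytes(tar_bytes({"manifest.json": manifest, "layer.tar": layer}))
    image_files = list_image_files(archive_path)

print(image_files)
```

Reading a file's contents instead of its name only takes one more extractfile call on the layer, so anything copied into an image should be treated as readable by whoever holds the image.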

Here is the documentation for Python's compile library, and we got the idea of using C-compiled code instead of Python-compiled code from this blog. We also found more on .pyc-only packages, used for reasons other than security, on this blog.
