minimum install & pickling

  • Aaron "Castironpi" Brady

    minimum install & pickling

    Sometimes questions come up on here about unpickling safely and
    executing foreign code. I was thinking a minimum install that didn't
    even have access to modules like 'os' could be safe. (Potentially.)
    I have time to entertain this a little, though all the devs are busy.
    I can bring it up again in a few months if it's a better time.
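Since the thread starts from unpickling safely: Python 3's pickle documentation ("Restricting Globals") suggests overriding `Unpickler.find_class` so that a pickle can only load an allowlisted set of globals. Below is a minimal sketch along those lines, written in Python 3 rather than the 2.5 used in this thread. The allowlist is an illustrative assumption. This approach narrows the risk; it does not make untrusted pickles safe in general.

```python
import builtins
import io
import pickle

# Illustrative allowlist (an assumption): the only globals a pickle may name.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Every class or function a pickle refers to is resolved here, so
        # refusing anything off the allowlist blocks payloads such as os.system.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Like pickle.loads(), but only allowlisted globals can be loaded."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


print(restricted_loads(pickle.dumps([1, 2, range(4)])))  # [1, 2, range(0, 4)]

# A classic protocol-0 payload that would call os.system("echo hello").
try:
    restricted_loads(b"cos\nsystem\n(S'echo hello'\ntR.")
except pickle.UnpicklingError as exc:
    print("refused:", exc)
```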

    I browsed for info on 'rexec'. Two c-l-py threads:



    A lot of modules would have to go: a long list of IPC and OS-access
    modules such as subprocess, socket, signal, popen2, asyncore, asynchat,
    ctypes, mmap, platform.popen, glob, shutil, dircache, and many more.

    I tested it out: I renamed the 'Lib' directory and ran the interpreter.

    'import site' failed; use -v for traceback
    Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import os
    ImportError: No module named os
    >>> import socket
    ImportError: No module named socket
    >>> del __builtins__.__import__
    >>> __import__
    NameError: name '__import__' is not defined
    >>> del __builtins__.open, __builtins__.file
    >>> open
    NameError: name 'open' is not defined
    >>> file
    NameError: name 'file' is not defined

    Even a function created from a raw bytecode string can't do anything
    without __import__ or 'open'. And you can't get a second instance
    running without subprocess or os.system.

    'rexec' may be as full of holes as Swiss cheese and beyond repair, but
    maybe it would work to start from a bare-bones install and add pieces
    known to be safe. I don't think this sort of thing would need any
    standard library support either.
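A rough Python 3 analogue of deleting `__import__` and `open` from `__builtins__` is to pass `exec()` a globals dict whose `__builtins__` entry is a small allowlist. The sketch below assumes a hypothetical allowlist (`ALLOWED_BUILTINS`) and a hypothetical helper (`run_foreign`). It shows that imports and `open` fail while plain computation still works. It is not a security boundary on its own.

```python
# Hypothetical allowlist: the only builtins the foreign code can see.
ALLOWED_BUILTINS = {"len": len, "range": range, "sum": sum, "min": min, "max": max}


def run_foreign(source, namespace=None):
    """Execute source with a pared-down builtins table; return its globals.

    Roughly the Python 3 analogue of deleting __import__ and open from
    __builtins__. It is NOT a security boundary on its own.
    """
    env = dict(namespace or {})
    env["__builtins__"] = ALLOWED_BUILTINS
    exec(source, env)
    return env


print(run_foreign("total = sum(range(5))")["total"])  # 10

for attempt in ("import os", "f = open('data.txt')"):
    try:
        run_foreign(attempt)
    except (ImportError, NameError) as exc:
        print(type(exc).__name__)  # ImportError, then NameError
```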
  • greg

    #2
    Re: minimum install & pickling

    Aaron "Castironpi" Brady wrote:
    > Even a function created from raw bytecode string can't do anything
    > without __import__ or 'open'.

    Not true:

        # (1).__class__.__bases__[0] is object; its subclasses include 'file'
        for cls in (1).__class__.__bases__[0].__subclasses__():
            if cls.__name__ == "file":
                F = cls

        F(my_naughty_path, "w").write(my_naughty_data)

    --
    Greg
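Python 3 has no `file` type, but the same walk still turns up `_io.FileIO`, which can open a file by path. The sketch below runs greg's idea in Python 3 inside `exec()` with an empty builtins table. The foreign source never names a builtin: it reaches `object`, `type` and the whole class tree through attribute access on a literal. It only locates the class and writes nothing. `WALK_SOURCE` and `find_class_without_builtins` are hypothetical names.

```python
# Foreign source. It never names a builtin: object, type and the class
# tree are all reached through attribute access on a tuple literal.
WALK_SOURCE = """
base = ().__class__.__bases__[0]                 # object
subclasses_of = base.__class__.__subclasses__    # type.__subclasses__, unbound
stack, seen, found = [base], {}, None
while stack and found is None:
    cls = stack.pop()
    if cls in seen:
        continue
    seen[cls] = True
    if cls.__name__ == target:
        found = cls
    else:
        stack.extend(subclasses_of(cls))
"""


def find_class_without_builtins(target):
    """Run WALK_SOURCE with an empty builtins table; return the class found."""
    env = {"__builtins__": {}, "target": target}
    exec(WALK_SOURCE, env)
    return env["found"]


cls = find_class_without_builtins("FileIO")
print(cls.__name__, cls.__module__)
```

The unbound `type.__subclasses__` is used instead of `cls.__subclasses__()` because calling the latter on a metaclass such as `type` itself requires an explicit argument.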


    • Aaron "Castironpi" Brady

      #3
      Re: minimum install & pickling

      On Sep 17, 6:06 pm, greg <g...@cosc.canterbury.ac.nz> wrote:
      > Aaron "Castironpi" Brady wrote:
      > > Even a function created from raw bytecode string can't do anything
      > > without __import__ or 'open'.
      >
      > Not true:
      >
      >    for cls in (1).__class__.__bases__[0].__subclasses__():
      >      if cls.__name__ == "file":
      >        F = cls
      >
      >    F(my_naughty_path, "w").write(my_naughty_data)
      >
      > --
      > Greg

      You're right, the list is a little longer. See above, where I renamed
      the Lib/ folder.

      'import site' failed; use -v for traceback
      Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Intel)] on win32
      Type "help", "copyright", "credits" or "license" for more information.
      >>> for cls in (1).__class__.__bases__[0].__subclasses__():
      ...     if cls.__name__ == "file":
      ...         F = cls
      ...
      >>> F
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      NameError: name 'F' is not defined
      >>>

      'file' here is still defined.

      >>> file
      <type 'file'>
      >>> del __builtins__.file
      >>> file
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      NameError: name 'file' is not defined

      This one stands a chance.


      • Paul Boddie

        #4
        Re: minimum install & pickling

        On 17 Sep, 22:18, "Aaron \"Castironpi\" Brady" <castiro...@gmail.com>
        wrote:
        > On Sep 17, 4:43 am, Paul Boddie <p...@boddie.org.uk> wrote:
        >>
        > These solutions have at least the same bugs that the bare bones
        > solution in the corresponding framework has.  Malicious code has fewer
        > options, but constructive code does too.  If you're running foreign
        > code, what do you want it to do?  What does it want to do?  The more
        > options it needs, the more code you have to trust.

        As I noted, instead of just forbidding access to external resources,
        what you'd want to do is to control access instead. This idea is not
        exactly new: although Brett Cannon was working on a sandbox capability
        for CPython, the underlying concepts involving different privilege
        domains have been around since Safe-Tcl, if not longer. The advantage
        of using various operating system features, potentially together with
        tools like fakechroot or, I believe, Plash, is that they should work
        for non-Python programs. Certainly, the chances of successfully
        introducing people to such capabilities are increased if you don't
        have to persuade the CPython core developers to incorporate your
        changes into their code.

        > The only way a Python script can return a value is with sys.exit, and
        > only an integer at that.  It is going to have output; maybe there's a
        > way to place a maximum limit on its consumption.  It's going to have
        > input, so that the output is relative to something.  You just make
        > copies to prevent it from destroying data.  Maybe command-line
        > parameters are enough.  IIRC, Win32 has a way to examine how much
        > time a process has owned so far, and a way to terminate it, which
        > could be in Python's future.

        There is support for imposing limits on processes in the Python
        standard library:



        My experimental package, jailtools, relies on each process's sandbox
        being set up explicitly before the process is run, so you'd definitely
        want to copy data into the sandbox. Setting limits on the amount of
        data produced would probably require support from the operating
        system. Generally, when looking into these kinds of systems, most of
        the solutions ultimately come from the operating system: process
        control, resource utilisation, access control, and so on. (This is the
        amusing thing about Java: that Sun attempted to reproduce lots of
        things that a decent operating system would provide *and* insist on
        their use when deploying Java code in a controlled server environment,
        despite actually having a decent operating system to offer already.)
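On Unix, the standard library's `resource` module is one way to impose such limits. Below is a minimal sketch, assuming a POSIX system and Python 3.7+. It runs foreign source in a separate interpreter whose CPU time is capped with `RLIMIT_CPU`, set in the child through `subprocess`'s `preexec_fn`. `run_with_cpu_limit` is a hypothetical helper, not part of any library.

```python
import resource
import subprocess
import sys


def run_with_cpu_limit(source, cpu_seconds):
    """Run Python source in a child interpreter capped at cpu_seconds of CPU.

    Hypothetical helper; POSIX only (resource module, preexec_fn).
    """
    def set_limits():
        # Runs in the child after fork() and before exec(); soft == hard.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))

    return subprocess.run(
        [sys.executable, "-I", "-c", source],
        preexec_fn=set_limits,
        capture_output=True,
        timeout=cpu_seconds + 30,  # wall-clock backstop if the limit is not enforced
    )


ok = run_with_cpu_limit("print(sum(range(1000)))", cpu_seconds=5)
print(ok.returncode, ok.stdout)  # 0 b'499500\n'

spin = run_with_cpu_limit("while True: pass", cpu_seconds=1)
print(spin.returncode)  # negative: the kernel killed the child with a signal
```

Running the foreign code in its own process means the operating system, rather than the interpreter, enforces the limit. This matches the point about OS-level process control.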

        > PyPy sandbox says:  "The C code generated by PyPy is not
        > segfaultable."  I find that to be a bold claim (whether it's true or
        > not).
        >
        > I'm imagining in the general case, you want the foreign code to make
        > changes to objects in your particular context, such as exec x in
        > vars.  In that case, x can still be productive without any libraries,
        > just less productive.

        Defining an interface between trusted and untrusted code can be
        awkward. When I looked into this kind of thing for my undergraduate
        project, I ended up using something similar to CORBA, and my
        conclusion was that trusted code would need to expose an interface
        that untrusted "agents" would rely on to request operations outside
        the sandbox. That seems restrictive, but as the situation with rexec
        has shown, if you expose a broad interface to untrusted programs, it
        becomes increasingly difficult to verify whether or not the solution
        is actually secure.

        Paul
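The "narrow interface" idea can be sketched in Python 3 as follows. Trusted code builds a single `request` function that checks every operation, and the foreign code is run with `exec()` so that it sees only that function and no builtins. `make_interface`, `run_agent` and the key allowlist are all hypothetical. The sketch shows the shape of such an interface; on its own, it is not a security boundary.

```python
def make_interface(store, allowed_keys):
    """Build the single function foreign code may call (hypothetical API)."""
    def request(op, key, value=None):
        # Every request is checked by trusted code before touching `store`.
        if key not in allowed_keys:
            raise PermissionError(f"key {key!r} is not allowed")
        if op == "get":
            return store.get(key)
        if op == "put" and isinstance(value, int):
            store[key] = value
            return None
        raise PermissionError(f"operation {op!r} refused")
    return request


def run_agent(source, request):
    """Run untrusted source that sees only `request` and no builtins."""
    env = {"__builtins__": {}, "request": request}
    exec(source, env)
    return env


store = {"score": 1, "secret": "do not share"}
request = make_interface(store, allowed_keys={"score"})
run_agent("request('put', 'score', request('get', 'score') + 41)", request)
print(store["score"])  # 42

try:
    run_agent("request('get', 'secret')", request)
except PermissionError as exc:
    print(exc)  # key 'secret' is not allowed
```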


        • Aaron "Castironpi" Brady

          #5
          Re: minimum install & pickling

          On Sep 18, 5:20 am, Paul Boddie <p...@boddie.org.uk> wrote:
          > Defining an interface between trusted and untrusted code can be
          > awkward. When I looked into this kind of thing for my undergraduate
          > project, I ended up using something similar to CORBA, and my
          > conclusion was that trusted code would need to expose an interface
          > that untrusted "agents" would rely on to request operations outside
          > the sandbox. That seems restrictive, but as the situation with rexec
          > has shown, if you expose a broad interface to untrusted programs, it
          > becomes increasingly difficult to verify whether or not the solution
          > is actually secure.
          >
          > Paul

          I think you could autogenerate a file with a copy of the data, then
          run a bare-bones Python installation with foreign code that imports
          the copy, or just concatenate the foreign code and the copy. That
          covers input, at least. For output, you'd need a file with an upper
          bound on its size.
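For the output side, a POSIX system offers `RLIMIT_FSIZE`, which puts an upper bound on the size of any file a process writes. Below is a sketch in Python 3, assuming POSIX. `run_with_output_cap` and the foreign snippet are hypothetical. The output path is handed to the child as `sys.argv[1]`.

```python
import os
import resource
import subprocess
import sys
import tempfile


def run_with_output_cap(source, out_path, max_bytes):
    """Run Python source in a child interpreter whose file writes are capped.

    Hypothetical helper; POSIX only. The child receives out_path as sys.argv[1].
    """
    def set_limits():
        # RLIMIT_FSIZE bounds the size of files the child writes. CPython
        # ignores SIGXFSZ, so an oversized write raises OSError (EFBIG).
        resource.setrlimit(resource.RLIMIT_FSIZE, (max_bytes, max_bytes))

    return subprocess.run(
        [sys.executable, "-I", "-B", "-c", source, out_path],
        preexec_fn=set_limits,
        capture_output=True,
        timeout=30,
    )


with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "output.bin")
    # Hypothetical foreign code that tries to write 10,000 bytes.
    foreign = "import sys\nopen(sys.argv[1], 'wb').write(b'x' * 10000)"
    result = run_with_output_cap(foreign, out, max_bytes=1000)
    size = os.path.getsize(out)
    print(result.returncode, size)
```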

          The problem with Python is that if an agent has access to part of an
          object, it has the whole thing. Take the trusted agents: if they can
          perform an operation, then anything with access to an agent can too.
          If they're just policy makers, then whatever means an authorized agent
          uses to perform the action is available to an unauthorized one as
          well. You'd still need a 'struct' instance to write your output, since
          memory is upper-bounded and you can't permit foreign code to store any
          form of Python objects.
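Here is a small Python 3 illustration of the "part of an object gives you the whole thing" point. Handing foreign code a bound method or a closure also hands it the object behind it, because CPython exposes that object through `__self__` and `__closure__`. `Account`, its deposit limit and `make_deposit` are hypothetical.

```python
class Account:
    """Trusted object; the hypothetical policy allows deposits of 1..100."""

    def __init__(self, balance):
        self.balance = balance

    def deposit(self, amount):
        if 0 < amount <= 100:
            self.balance += amount


def make_deposit(account):
    """Alternative: hand out a closure instead of a bound method."""
    def deposit(amount):
        account.deposit(amount)
    return deposit


acct = Account(10)

# Foreign code given only a bound method reaches the whole Account.
exec("deposit.__self__.balance = 10 ** 6",
     {"__builtins__": {}, "deposit": acct.deposit})
print(acct.balance)  # 1000000

# A closure is no better: the captured object sits in __closure__.
exec("deposit.__closure__[0].cell_contents.balance = 0",
     {"__builtins__": {}, "deposit": make_deposit(acct)})
print(acct.balance)  # 0
```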
