Let's talk about monkey patch 2
林菡 2019-11-20

~~Monkey patch tortured me for a whole week!!! Damn it!!!~~
~~Once I have some free time, I'm definitely going to turn it inside out and take a good look!!!~~

True to my word, today I'm going to pull open monkey patch's overcoat and see what is really inside.

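It actually isn't mysterious at all; it's a pretty simple thing. But when you don't know that someone has monkey patched your program, you really will go crazy, just like I did. Good tools need to be used well, otherwise they turn into one pit after another ~_~ !!!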

Definition

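First, what is a monkey patch? There are several accounts online of where the name "monkey patch" comes from; if you're curious you can look them up yourself. I'll go straight to the point. A monkey patch modifies modules, classes, or methods dynamically while the program is running, instead of changing the implementation in the static source code.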

Let's start with a simple example.

Say that as a kid Xiao Ming's favorite thing was Apple; we can find that out by calling his favorite method.

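But one day God decides that Xiao Ming should no longer like apple, because God likes banana. Xiao Ming has already been made, and God doesn't want to change the manufacturing process. What to do? Give him a monkey patch!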
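A minimal sketch of the Xiao Ming story, to make it concrete. The class name `XiaoMing` and the return values are assumptions made for illustration; only the method name `favorite` comes from the text.

```python
class XiaoMing:
    """Xiao Ming as originally manufactured."""

    def favorite(self):
        return "apple"


xiaoming = XiaoMing()
print(xiaoming.favorite())  # apple


# God steps in: the class definition is left alone, and the method is
# replaced at runtime. This assignment is the monkey patch.
def banana_favorite(self):
    return "banana"


XiaoMing.favorite = banana_favorite
print(xiaoming.favorite())  # banana
```

The instance created before the patch also changes its answer, because method lookup goes through the class each time the method is called.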

A bare assignment like that may look a bit crude, so let's switch to a fancier style.
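One way such a fancier version might look is to wrap the replacement in a dedicated function that is called once at startup. This is only a sketch: `XiaoMing`, `_banana_favorite` and this `monkey_patch` function are hypothetical names chosen for the example.

```python
class XiaoMing:
    def favorite(self):
        return "apple"


def _banana_favorite(self):
    # Replacement implementation that the patch installs.
    return "banana"


def monkey_patch():
    """Swap in the replacement method at runtime."""
    XiaoMing.favorite = _banana_favorite


xiaoming = XiaoMing()
before = xiaoming.favorite()

monkey_patch()
after = xiaoming.favorite()

print(before, after)  # apple banana
```

Keeping the patch inside one named function makes it easier to find where behavior was changed, which is exactly the thing that is hard to track down when a patch is applied silently.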

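Looks a lot like real-world usage now, doesn't it? Of course, in practice monkey patches are usually applied to modules, which is somewhat more complicated; eventlet, for example, can monkey patch modules such as thread and socket.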


How it works

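So, in the spirit of striving to be a first-rate programmer, we can't help asking: why is it possible to do this at all? That is the real focus of this article.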

NameSpace

Before that, we need to understand one of Python's core concepts: the namespace.

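What is a namespace? Put simply, it is a mapping from names to objects. To make it easier to picture, think of a Python dictionary; in fact some namespaces really are implemented as dictionaries. Python has four main kinds of namespace: local, enclosing function, global, and built-in.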

So what is a namespace for? In Python, accessing any object (a variable, a module, a function, and so on) means looking its name up in the namespaces. The lookup follows a fixed order known as LEGB:

locals -->> enclosing function -->> globals -->> __builtins__

If none of these four namespaces contains the name, a NameError is raised.
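The lookup order can be observed directly. The sketch below is illustrative only; all function and variable names in it are made up for the example, and `undefined_name` is deliberately never defined.

```python
import builtins

x = "global"


def outer():
    x = "enclosing"

    def inner_local():
        x = "local"      # L: found in the function's own locals
        return x

    def inner_enclosing():
        return x         # E: found in the enclosing function's scope

    return inner_local(), inner_enclosing()


def read_global():
    return x             # G: found in the module's globals


def read_builtin():
    return len           # B: found in the builtins namespace


def read_missing():
    try:
        return undefined_name   # in none of the four namespaces
    except NameError as exc:
        return type(exc).__name__


results = (outer(), read_global(), read_builtin() is builtins.len, read_missing())
print(results)
```

Each function resolves the same kind of bare name, and the namespace that answers depends only on where the name is first found along the LEGB chain.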

Module Import

Monkey patching also involves another core part of Python: module import.

At startup, Python creates a global dictionary: sys.modules

When we import a new module, two things happen:

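  1. A key-value pair is inserted into sys.modules: the key is the module name and the value is the imported module object. The next time the same module is imported, Python looks in sys.modules first and, if the module is there, uses that module object directly.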

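  2. The module object is bound to a name in the global namespace of the importing code (for a top-level import); when the program later uses the module, that name is looked up there.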
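Both effects are easy to check from a script. The sketch below uses the standard `json` module purely as a convenient example; any importable module would do.

```python
import sys
import json

# Effect 1: the module object is cached in sys.modules under its name.
cached = sys.modules["json"]
print(cached is json)

# A second import does not load the module again; it hands back the
# very same object from the cache.
import json as json_again
same_object = json_again is cached
print(same_object)

# Effect 2: at the top level of a script, the import statement binds
# the name "json" in the global namespace.
print("json" in globals())
```

Because every importer receives the same cached object, a change made to that object is visible everywhere the module is used.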

Implementing monkey patch

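Given these two points, can you guess how monkey patching is implemented? That's right, it's very simple: replace the object held in sys.modules with the new one, loading the module first if it has not been imported yet. After that, whenever the program imports the module, it gets the modified module object from sys.modules, and that is the monkey patch.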
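The idea can be tried out without touching any real module. In the sketch below, `fruit_demo` is a hypothetical module built in memory with `types.ModuleType`, and `client` stands in for unrelated code elsewhere in a program that imports it.

```python
import sys
import types

# A throwaway module registered by hand, so no file on disk is needed
# and no real library is modified.
fruit = types.ModuleType("fruit_demo")
fruit.favorite = lambda: "apple"
sys.modules["fruit_demo"] = fruit


def client():
    # The import is satisfied from sys.modules, so this code sees
    # whatever object is currently stored there.
    import fruit_demo
    return fruit_demo.favorite()


before = client()

# The monkey patch: change the object that sys.modules hands out.
sys.modules["fruit_demo"].favorite = lambda: "banana"
after = client()

print(before, after)  # apple banana

# Clean up the demo entry.
del sys.modules["fruit_demo"]
```

Note that `client` was not edited at all; its behavior changed only because the shared module object changed.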

Here is a real-world example: eventlet's monkey patch of thread, socket, and other modules. Straight to the code:

def monkey_patch(**on):
    """Globally patches certain system modules to be greenthread-friendly.

    The keyword arguments afford some control over which modules are patched.
    If no keyword arguments are supplied, all possible modules are patched.
    If keywords are set to True, only the specified modules are patched.  E.g.,
    ``monkey_patch(socket=True, select=True)`` patches only the select and
    socket modules.  Most arguments patch the single module of the same name
    (os, time, select).  The exceptions are socket, which also patches the ssl
    module if present; and thread, which patches thread, threading, and Queue.

    It's safe to call monkey_patch multiple times.
    """

    # Workaround for import cycle observed as following in monotonic
    # RuntimeError: no suitable implementation for this system
    # see https://github.com/eventlet/eventlet/issues/401#issuecomment-325015989
    #
    # Make sure the hub is completely imported before any
    # monkey-patching, or we risk recursion if the process of importing
    # the hub calls into monkey-patched modules.
    eventlet.hubs.get_hub()

    accepted_args = set(('os', 'select', 'socket',
                         'thread', 'time', 'psycopg', 'MySQLdb',
                         'builtins', 'subprocess'))
    # To make sure only one of them is passed here
    assert not ('__builtin__' in on and 'builtins' in on)
    try:
        b = on.pop('__builtin__')
    except KeyError:
        pass
    else:
        on['builtins'] = b

    default_on = on.pop("all", None)

    for k in six.iterkeys(on):
        if k not in accepted_args:
            raise TypeError("monkey_patch() got an unexpected "
                            "keyword argument %r" % k)
    if default_on is None:
        default_on = not (True in on.values())
    for modname in accepted_args:
        if modname == 'MySQLdb':
            # MySQLdb is only on when explicitly patched for the moment
            on.setdefault(modname, False)
        if modname == 'builtins':
            on.setdefault(modname, False)
        on.setdefault(modname, default_on)

    if on['thread'] and not already_patched.get('thread'):
        _green_existing_locks()

    modules_to_patch = []
    for name, modules_function in [
        ('os', _green_os_modules),
        ('select', _green_select_modules),
        ('socket', _green_socket_modules),
        ('thread', _green_thread_modules),
        ('time', _green_time_modules),
        ('MySQLdb', _green_MySQLdb),
        ('builtins', _green_builtins),
        ('subprocess', _green_subprocess_modules),
    ]:
        if on[name] and not already_patched.get(name):
            modules_to_patch += modules_function()
            already_patched[name] = True

    if on['psycopg'] and not already_patched.get('psycopg'):
        try:
            from eventlet.support import psycopg2_patcher
            psycopg2_patcher.make_psycopg_green()
            already_patched['psycopg'] = True
        except ImportError:
            # note that if we get an importerror from trying to
            # monkeypatch psycopg, we will continually retry it
            # whenever monkey_patch is called; this should not be a
            # performance problem but it allows is_monkey_patched to
            # tell us whether or not we succeeded
            pass

    imp.acquire_lock()
    try:
        for name, mod in modules_to_patch:
            orig_mod = sys.modules.get(name)
            if orig_mod is None:
                orig_mod = __import__(name)
            for attr_name in mod.__patched__:
                patched_attr = getattr(mod, attr_name, None)
                if patched_attr is not None:
                    setattr(orig_mod, attr_name, patched_attr)
            deleted = getattr(mod, '__deleted__', [])
            for attr_name in deleted:
                if hasattr(orig_mod, attr_name):
                    delattr(orig_mod, attr_name)
    finally:
        imp.release_lock()

    if sys.version_info >= (3, 3):
        import importlib._bootstrap
        thread = original('_thread')
        # importlib must use real thread locks, not eventlet.Semaphore
        importlib._bootstrap._thread = thread

        # Issue #185: Since Python 3.3, threading.RLock is implemented in C and
        # so call a C function to get the thread identifier, instead of calling
        # threading.get_ident(). Force the Python implementation of RLock which
        # calls threading.get_ident() and so is compatible with eventlet.
        import threading
        threading.RLock = threading._PyRLock


First the accepted args are checked: every module to be patched must fall within the allowed set ('os', 'select', 'socket', 'thread', 'time', 'psycopg', 'MySQLdb', 'builtins', 'subprocess'). Then the code works out which modules actually need to be patched:

for name, modules_function in [
    ('os', _green_os_modules),
    ('select', _green_select_modules),
    ('socket', _green_socket_modules),
    ('thread', _green_thread_modules),
    ('time', _green_time_modules),
    ('MySQLdb', _green_MySQLdb),
    ('builtins', _green_builtins),
    ('subprocess', _green_subprocess_modules),
]:
    if on[name] and not already_patched.get(name):
        modules_to_patch += modules_function()
        already_patched[name] = True
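To see how the keyword arguments turn into on/off flags, the argument handling can be pulled out into a standalone function. This is a simplified re-implementation for illustration, not part of eventlet: the name `resolve_flags` is made up, and the `__builtin__` alias handling from the original is left out.

```python
ACCEPTED_ARGS = ('os', 'select', 'socket', 'thread', 'time',
                 'psycopg', 'MySQLdb', 'builtins', 'subprocess')


def resolve_flags(**on):
    """Return a dict mapping every accepted name to True or False."""
    default_on = on.pop("all", None)

    # Reject names outside the accepted set.
    for key in on:
        if key not in ACCEPTED_ARGS:
            raise TypeError("unexpected keyword argument %r" % key)

    # With no explicit True, everything defaults to on;
    # once any name is set to True, the rest default to off.
    if default_on is None:
        default_on = not (True in on.values())

    for modname in ACCEPTED_ARGS:
        # MySQLdb and builtins are only patched on explicit request.
        if modname in ('MySQLdb', 'builtins'):
            on.setdefault(modname, False)
        on.setdefault(modname, default_on)
    return on


everything = resolve_flags()
only_two = resolve_flags(socket=True, select=True)

print(sorted(name for name, flag in everything.items() if flag))
print(sorted(name for name, flag in only_two.items() if flag))  # ['select', 'socket']
```

The first call turns on everything except MySQLdb and builtins, while the second turns on only the two modules that were named, which matches the behavior described in the docstring.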

Next comes the core of the implementation:

imp.acquire_lock()
try:
    for name, mod in modules_to_patch:
        orig_mod = sys.modules.get(name)
        if orig_mod is None:
            orig_mod = __import__(name)
        for attr_name in mod.__patched__:
            patched_attr = getattr(mod, attr_name, None)
            if patched_attr is not None:
                setattr(orig_mod, attr_name, patched_attr)
        deleted = getattr(mod, '__deleted__', [])
        for attr_name in deleted:
            if hasattr(orig_mod, attr_name):
                delattr(orig_mod, attr_name)
finally:
    imp.release_lock()

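This part of the code is the core of eventlet's monkey patch. It iterates over every module that needs patching; if a module has not yet been imported into sys.modules, it is imported first. The relevant attributes of that module are then replaced using setattr.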
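The same loop can be run against toy modules to watch it work. Everything named `demo_*` below is hypothetical and built in memory; `apply_patch` is a made-up name wrapping the loop body, and the import lock from the original is left out to keep the sketch short.

```python
import sys
import types

# Stand-in for the original (blocking) module.
orig = types.ModuleType("demo_socket")
orig.connect = lambda: "blocking connect"
orig.legacy = lambda: "legacy helper"
sys.modules["demo_socket"] = orig

# Stand-in for a "green" module that lists what to replace and remove.
green = types.ModuleType("demo_green_socket")
green.__patched__ = ["connect"]
green.__deleted__ = ["legacy"]
green.connect = lambda: "green connect"


def apply_patch(name, mod):
    # Fetch the target from sys.modules, importing it if necessary.
    orig_mod = sys.modules.get(name)
    if orig_mod is None:
        orig_mod = __import__(name)
    # Copy each patched attribute onto the original module object.
    for attr_name in mod.__patched__:
        patched_attr = getattr(mod, attr_name, None)
        if patched_attr is not None:
            setattr(orig_mod, attr_name, patched_attr)
    # Remove attributes the replacement wants gone.
    for attr_name in getattr(mod, "__deleted__", []):
        if hasattr(orig_mod, attr_name):
            delattr(orig_mod, attr_name)


apply_patch("demo_socket", green)

import demo_socket
result = demo_socket.connect()
has_legacy = hasattr(demo_socket, "legacy")
print(result, has_legacy)  # green connect False
```

The module object in sys.modules stays the same object throughout; only its attributes are swapped, so code that imported the module earlier also sees the patched functions when it accesses them through the module.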

