Function local name binding from an outer scope (answers)

【Question title】: Function local name binding from an outer scope
【Posted】: 2010-10-11 17:01:33
【Question】:

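I need a way to "inject" names into a function from an outer code block, so that they are accessible locally and do not need to be handled specifically by the function's code (defined as function parameters, loaded from *args, etc.).

The simplified scenario: providing a framework within which users can define (with as little syntax as possible) custom functions that manipulate other objects of the framework (which are not necessarily global).

Ideally, the user defines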

def user_func():
    Mouse.eat(Cheese)
    if Cat.find(Mouse):
        Cat.happy += 1

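Here Cat, Mouse and Cheese are framework objects that, for good reasons, cannot be bound to the global namespace.

I want to write a wrapper for this function so that it behaves like this: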

def framework_wrap(user_func):
    # this is a framework internal and has name bindings to Cat, Mouse and Cheese
    def f():
        inject(user_func, {'Cat': Cat, 'Mouse': Mouse, 'Cheese': Cheese})
        user_func()
    return f

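This wrapper could then be applied to all user-defined functions (as a decorator, either by the user or automatically, although I plan to use a metaclass).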

@framework_wrap
def user_func():
    ...  # same body as above

I am aware of Python 3's nonlocal keyword, but I still consider it ugly (from the framework user's perspective) to add an additional line:

nonlocal Cat, Mouse, Cheese

and to make the user worry about adding every object he needs to this line.

Any suggestion is greatly appreciated.

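【Question discussion】:

What is the good reason for not being able to use the global namespace? If it is threading, you could use thread-local storage.
@Ivo van der Wijk: I know about thread locals as well, but threading is not the issue here. One of the reasons is that I want some generic names (like this) injected into functions defined in multiple classes, with each class resolving them in its own class-specific way. So this cannot be global.
@amadaeus If the function adds something to the globals dict or uses global to modify a global, do you want that change to affect the actual globals?
@AaronMcSmooth Yes, I do. I only want to provide additional convenient references in its namespace, not to sandbox or replace the function's environment.
@amadaeus, I have posted a pretty awesome solution, but if you are willing to have sandboxed globals, there is a much simpler one available that does not involve hacking bytecode.

Tags: python scope decorator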

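【Solution 1】:

The more I mess around with the stack, the more I wish I hadn't. Don't hack globals to do what you want. Hack bytecode instead. I can think of two ways to do this.

1) Add cells wrapping the references that you want to f.func_closure. You would have to reassemble the function's bytecode to use LOAD_DEREF instead of LOAD_GLOBAL and generate a cell for each value. You then pass a tuple of the cells and the new code object to types.FunctionType and get a function with the appropriate bindings. Different copies of the function can have different local bindings, so it should be as thread-safe as you want to make it.

2) Add arguments for your new locals at the end of the function's argument list. Replace the appropriate occurrences of LOAD_GLOBAL with LOAD_FAST. Then construct a new function using types.FunctionType, passing in the new code object and a tuple of the bindings that you want as the default arguments. This is limited in the sense that Python limits function arguments to 255, and it cannot be used on functions that take variable arguments. It nonetheless struck me as the more challenging of the two, so that is the one I implemented (plus there is other stuff that can be done with this one). Again, you can either make different copies of the function with different bindings or call the function with the bindings you want from each call site. So it too can be as thread-safe as you want to make it.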
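Both options rely on the fact that a name the function never assigns, such as Cat, is compiled as a LOAD_GLOBAL lookup. A quick way to see which names a rewrite would have to touch is to list those instructions with the standard dis module. This is only a sketch in Python 3 (the answer's own code targets Python 2), and global_loads is a hypothetical helper name, not part of the answer:

```python
import dis


def user_func():
    # Never called here; it is only compiled so its bytecode can be inspected.
    Mouse.eat(Cheese)
    if Cat.find(Mouse):
        Cat.happy += 1


def global_loads(func):
    """Return the sorted names that func looks up via LOAD_GLOBAL."""
    return sorted({ins.argval
                   for ins in dis.get_instructions(func)
                   if ins.opname == "LOAD_GLOBAL"})


print(global_loads(user_func))  # ['Cat', 'Cheese', 'Mouse']
```

Attribute names such as eat, find and happy do not show up, because they are accessed through attribute opcodes rather than LOAD_GLOBAL. That is the same distinction the saved_names pass in the answer's code has to make.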

import types
import opcode

# Opcode constants used for comparison and replacement
LOAD_FAST = opcode.opmap['LOAD_FAST']
LOAD_GLOBAL = opcode.opmap['LOAD_GLOBAL']
STORE_FAST = opcode.opmap['STORE_FAST']

DEBUGGING = True

def append_arguments(code_obj, new_locals):
    co_varnames = code_obj.co_varnames   # Old locals
    co_names = code_obj.co_names      # Old globals
    co_argcount = code_obj.co_argcount     # Argument count
    co_code = code_obj.co_code         # The actual bytecode as a string

    # Make one pass over the bytecode to identify names that should be
    # left in code_obj.co_names.
    not_removed = set(opcode.hasname) - set([LOAD_GLOBAL])
    saved_names = set()
    for inst in instructions(co_code):
        if inst[0] in not_removed:
            saved_names.add(co_names[inst[1]])

    # Build co_names for the new code object. This should consist of 
    # globals that were only accessed via LOAD_GLOBAL
    names = tuple(name for name in co_names
                  if name not in set(new_locals) - saved_names)

    # Build a dictionary that maps the indices of the entries in co_names
    # to their entry in the new co_names
    name_translations = dict((co_names.index(name), i)
                             for i, name in enumerate(names))

    # Build co_varnames for the new code object. This should consist of
    # the entirety of co_varnames with new_locals spliced in after the
    # arguments
    new_locals_len = len(new_locals)
    varnames = (co_varnames[:co_argcount] + new_locals +
                co_varnames[co_argcount:])

    # Build the dictionary that maps indices of entries in the old co_varnames
    # to their indices in the new co_varnames
    range1, range2 = xrange(co_argcount), xrange(co_argcount, len(co_varnames))
    varname_translations = dict((i, i) for i in range1)
    varname_translations.update((i, i + new_locals_len) for i in range2)

    # Build the dictionary that maps indices of deleted entries of co_names
    # to their indices in the new co_varnames
    names_to_varnames = dict((co_names.index(name), varnames.index(name))
                             for name in new_locals)

    if DEBUGGING:
        print "injecting: {0}".format(new_locals)
        print "names: {0} -> {1}".format(co_names, names)
        print "varnames: {0} -> {1}".format(co_varnames, varnames)
        print "names_to_varnames: {0}".format(names_to_varnames)
        print "varname_translations: {0}".format(varname_translations)
        print "name_translations: {0}".format(name_translations)


    # Now we modify the actual bytecode
    modified = []
    for inst in instructions(code_obj.co_code):
        # If the instruction is a LOAD_GLOBAL, we have to check to see if
        # it's one of the globals that we are replacing. Either way,
        # update its arg using the appropriate dict.
        if inst[0] == LOAD_GLOBAL:
            if DEBUGGING:
                print "LOAD_GLOBAL: {0}".format(inst[1])
            if inst[1] in names_to_varnames:
                if DEBUGGING:
                    print "replacing with {0}: ".format(names_to_varnames[inst[1]])
                inst[0] = LOAD_FAST
                inst[1] = names_to_varnames[inst[1]]
            elif inst[1] in name_translations:
                inst[1] = name_translations[inst[1]]
            else:
                raise ValueError("a name was lost in translation")
        # If it accesses co_varnames or co_names then update its argument.
        elif inst[0] in opcode.haslocal:
            inst[1] = varname_translations[inst[1]]
        elif inst[0] in opcode.hasname:
            inst[1] = name_translations[inst[1]]
        modified.extend(write_instruction(inst))

    code = ''.join(modified)
    # Done modifying codestring - make the code object

    return types.CodeType(co_argcount + new_locals_len,
                          code_obj.co_nlocals + new_locals_len,
                          code_obj.co_stacksize,
                          code_obj.co_flags,
                          code,
                          code_obj.co_consts,
                          names,
                          varnames,
                          code_obj.co_filename,
                          code_obj.co_name,
                          code_obj.co_firstlineno,
                          code_obj.co_lnotab)


def instructions(code):
    code = map(ord, code)
    i, L = 0, len(code)
    extended_arg = 0
    while i < L:
        op = code[i]
        i+= 1
        if op < opcode.HAVE_ARGUMENT:
            yield [op, None]
            continue
        oparg = code[i] + (code[i+1] << 8) + extended_arg
        extended_arg = 0
        i += 2
        if op == opcode.EXTENDED_ARG:
            extended_arg = oparg << 16
            continue
        yield [op, oparg]

def write_instruction(inst):
    op, oparg = inst
    if oparg is None:
        return [chr(op)]
    elif oparg <= 0xFFFF:  # fits in the two argument bytes
        return [chr(op), chr(oparg & 255), chr((oparg >> 8) & 255)]
    elif oparg <= 0xFFFFFFFF:  # high 16 bits go into an EXTENDED_ARG prefix
        return [chr(opcode.EXTENDED_ARG),
                chr((oparg >> 16) & 255),
                chr((oparg >> 24) & 255),
                chr(op),
                chr(oparg & 255),
                chr((oparg >> 8) & 255)]
    else:
        raise ValueError("Invalid oparg: {0} is too large".format(oparg))



if __name__=='__main__':
    import dis

    class Foo(object):
        y = 1

    z = 1
    def test(x):
        foo = Foo()
        foo.y = 1
        foo = x + y + z + foo.y
        print foo

    code_obj = append_arguments(test.func_code, ('y',))
    f = types.FunctionType(code_obj, test.func_globals, argdefs=(1,))
    if DEBUGGING:
        dis.dis(test)
        print '-'*20
        dis.dis(f)
    f(1)

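Note that a whole branch of this code (the one dealing with EXTENDED_ARG) is untested, but for common cases it seems to be pretty solid. I'll keep hacking on it and am currently writing some code to validate the output. Then (when I get around to it) I'll run it against the whole standard library and fix any bugs.

I'll probably implement the first option as well.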
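The rebinding half of the first option can be tried without touching bytecode, provided the function already refers to the names as free variables. The sketch below assumes Python 3.8+ (for types.CellType) and sidesteps the LOAD_GLOBAL to LOAD_DEREF rewrite by compiling the function inside a factory. make_template and rebind are hypothetical names used only for illustration:

```python
import types


def make_template():
    # Placeholders so that Cat and Mouse compile as free variables
    # (LOAD_DEREF) of user_func instead of globals.
    Cat = Mouse = None

    def user_func():
        return Cat, Mouse

    return user_func


def rebind(func, **bindings):
    """Return a copy of func whose closure cells hold the given values."""
    cells = tuple(types.CellType(bindings[name])
                  for name in func.__code__.co_freevars)
    return types.FunctionType(func.__code__, func.__globals__,
                              func.__name__, func.__defaults__, cells)


template = make_template()
first = rebind(template, Cat="cat A", Mouse="mouse A")
second = rebind(template, Cat="cat B", Mouse="mouse B")
print(first())
print(second())
```

Each copy carries its own cells, which is what makes per-copy bindings possible. The hard part the answer describes, rewriting an ordinary function so that its globals become free variables, is not shown here.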

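【Discussion】:

Awesome indeed! Personally, I find the first approach (involving the function's closure cells) cleaner (if you can label bytecode hacking as "clean"). I think I'll try using Byteplay (wiki.python.org/moin/ByteplayDoc).
@amadaeus I agree with you that the first approach is cleaner. I'm writing test code that will work for both approaches and will post it when I'm done. I didn't know about the existing bytecode modules, though. I'll have to look at them. Thanks for posting.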

【Solution 2】:

Edited answer: restores the namespace dict after calling user_func()

Tested with Python 2.7.5 and 3.3.2

File framework.py:

# framework objects
class Cat: pass
class Mouse: pass
class Cheese: pass

_namespace = {'Cat':Cat, 'Mouse':Mouse, 'Cheese':Cheese } # names to be injected

# framework decorator
from functools import wraps
def wrap(f):
    func_globals = f.func_globals if hasattr(f,'func_globals') else f.__globals__
    @wraps(f)
    def wrapped(*args, **kwargs):
        # determine which names in framework's _namespace collide and don't
        preexistent = set(name for name in _namespace if name in func_globals)
        nonexistent = set(name for name in _namespace if name not in preexistent)
        # save any preexistent name's values
        f.globals_save = {name: func_globals[name] for name in preexistent}
        # temporarily inject framework's _namespace
        func_globals.update(_namespace)

        retval = f(*args, **kwargs) # call function and save return value

        # clean up function's namespace
        for name in nonexistent:
            del func_globals[name] # remove those that didn't exist
        # restore the values of any names that collided
        func_globals.update(f.globals_save)
        return retval

    return wrapped

Example usage:

from __future__ import print_function
import framework

class Cat: pass  # name that collides with framework object

@framework.wrap
def user_func():
    print('in user_func():')
    print('  Cat:', Cat)
    print('  Mouse:', Mouse)
    print('  Cheese:', Cheese)

user_func()

print()
print('after user_func():')
for name in framework._namespace:
    if name in globals():
        print('  {} restored to {}'.format(name, globals()[name]))
    else:
        print('  {} not restored, does not exist'.format(name))

Output:

in user_func():
  Cat: <class 'framework.Cat'>
  Mouse: <class 'framework.Mouse'>
  Cheese: <class 'framework.Cheese'>

after user_func():
  Cheese not restored, does not exist
  Mouse not restored, does not exist
  Cat restored to <class '__main__.Cat'>
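
One thing the decorator above does not cover is a user function that raises: the cleanup code is then skipped and the injected names stay in the module. A possible variant, sketched here for Python 3 only, moves the restore step into try/finally. inject_globals and the _MISSING sentinel are hypothetical names, and like the original it is not thread-safe, because the module globals are shared:

```python
from functools import wraps

_MISSING = object()  # marks names that did not exist before injection


def inject_globals(namespace):
    """Temporarily add namespace to the function's globals for each call."""
    def decorator(func):
        func_globals = func.__globals__

        @wraps(func)
        def wrapped(*args, **kwargs):
            # Remember the previous value (or absence) of every injected name.
            saved = {name: func_globals.get(name, _MISSING)
                     for name in namespace}
            func_globals.update(namespace)
            try:
                return func(*args, **kwargs)
            finally:
                # Runs on normal return and on exceptions alike.
                for name, old in saved.items():
                    if old is _MISSING:
                        func_globals.pop(name, None)
                    else:
                        func_globals[name] = old
        return wrapped
    return decorator


Cheese = "user cheese"  # pre-existing global that collides


@inject_globals({"Mouse": "framework mouse", "Cheese": "framework cheese"})
def user_func(fail=False):
    if fail:
        raise RuntimeError("boom")
    return Mouse, Cheese
```

After either a normal call or a failing one, Mouse is gone from the module again and Cheese is back to the user's own value.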

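【Discussion】:

I stumbled upon this approach and quickly abandoned it because of docs.python.org/reference/datamodel.html#index-843 (func_globals is described as read-only). I understand this means you cannot reassign func_globals to another dict, but is it safe to modify it?
@amadaeus: Yes, I saw the RO attribute indication, but took it to mean rebinding to another dictionary; it doesn't say to leave the mutable value alone. @AaronMcSmooth: Thanks for fixing the triple-quoted docstring. I really hate StackOverflow's syntax highlighter, which doesn't realize it's doing Python...
@martineau I just realized that f.func_globals is actually a reference to the globals() dictionary, so your code actually binds the names into the global namespace.
@amadaeus: One definite problem is that the injected names do get put into the module's namespace and persist after user_func() is called. If careful, the decorator could clean them up (but it would be a bit more complicated).
@amadaeus: See the revised answer.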

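【Solution 3】:

It sounds like you may want to use exec code in dict, where code is the user's function and dict is a dictionary you provide, which can

be pre-filled with references to objects that the user code should be able to use
store any functions or variables declared by the user's code for later use by your framework.

Docs for exec: http://docs.python.org/reference/simple_stmts.html#the-exec-statement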

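However, I'm pretty sure this only works if the user's code is brought in as a string and you need to exec it. If the function is already compiled, it will already have its global bindings set. So doing something like exec "user_func(*args)" in framework_dict won't work, because user_func's globals are already set to the module in which it was defined.

Since func_globals is read-only, I think you'll have to do something like what martineau suggests in order to modify the function's globals.

I think it is likely (unless you're doing something unprecedentedly awesome, or I'm missing some critical subtlety) that you would be better off putting your framework objects into a module and then having the user code import that module. Once the module has been imported, its variables can readily be reassigned, mutated or accessed by code defined outside that module.

I think this would also be better for code readability, because user_func will end up with explicit namespacing for Cat, Dog, etc., rather than leaving readers unfamiliar with your framework to wonder where they came from. E.g. animal_farm.Mouse.eat(animal_farm.Cheese), or maybe lines like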
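
The attribute being read-only only rules out rebinding it on an existing function. A new function object can still be built from the same code object with a different globals dict. The following Python 3 sketch does that with a merged copy; with_namespace is a hypothetical name. Because the function then runs against a snapshot, global statements inside it write to the copy and later module-level changes are not seen, so it does not meet the asker's stated wish that global changes affect the real globals:

```python
import types
from functools import update_wrapper


def with_namespace(namespace):
    """Rebuild the function against a copy of its globals plus namespace."""
    def decorator(func):
        merged = dict(func.__globals__)  # snapshot of the module globals
        merged.update(namespace)         # framework names win on collision
        new_func = types.FunctionType(func.__code__, merged, func.__name__,
                                      func.__defaults__, func.__closure__)
        new_func.__kwdefaults__ = func.__kwdefaults__
        # Copies __doc__, __module__ etc. and sets __wrapped__ to func.
        return update_wrapper(new_func, func)
    return decorator


@with_namespace({"Mouse": "framework mouse"})
def user_func():
    return Mouse


print(user_func())
```

The module's own namespace is left untouched: Mouse exists only in the merged dict that the rebuilt function uses.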

from animal_farm import Goat
cheese = make_cheese(Goat().milk())

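If you are doing something unprecedentedly awesome, I think you'll need to use the C API to pass arguments to a code object. It looks like the function PyEval_EvalCodeEx is the one you want.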

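【Discussion】:

I like how clean your approach is. But there are some issues: to avoid an extra code compilation I'd like to exec user_func.func_code (the code object), but I can't find any way to pass extra arguments to the user_func call (should the function definition require them). Another potential problem is the handling of globals in certain cases, but that is not really an issue at the moment.
If you add code to dict, you can exec "code(parameters)" in dict.
But of course, in that case you don't avoid the extra compilation, my bad. Although if you have performance(?) concerns about compiling a simple function call, a (mostly) interpreted language isn't the best choice anyway.
Oh wait, you want to pass arguments to an already-declared wrapped function. Sorry, I think I missed the point of the question the first time around.
Hmm, this is interesting. I'm pretty sure you can't pass arguments to a code object from straight Python code; you need to use the C API. I gather that the function you'd want to call is PyEval_EvalCodeEx. exec indirectly calls this function through the simplified C function PyEval_EvalCode, which doesn't take arguments for the code object as a parameter. I did a bit of digging in Python's C source, and it looks like there aren't any Python functions that will pass args to a code object.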

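【Solution 4】:

If your application is strictly Python 3, I don't see how using Python 3's nonlocal is any uglier than writing a decorator to manipulate a function's local namespace. I say give the nonlocal solution a try or rethink this strategy.

【Discussion】:

I introduced the decorator approach mainly because of its simplicity. I want the function's wrapper to be invoked through a metaclass, so the user doesn't need to apply a decorator manually. I'd also like to keep the project backwards compatible with Python 2.x (>=2.6).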
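
For what it's worth, applying a wrapper through a metaclass could look roughly like the sketch below. It uses Python 3 class syntax (Python 2 would need __metaclass__ instead), and framework_wrap is only a stand-in that marks the function. AutoWrap and Behaviour are hypothetical names:

```python
import functools
import types


def framework_wrap(func):
    """Stand-in for the real wrapper; it just forwards the call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    wrapper.wrapped_by_framework = True
    return wrapper


class AutoWrap(type):
    """Metaclass that wraps every plain function defined in the class body."""
    def __new__(mcls, name, bases, namespace):
        for attr, value in list(namespace.items()):
            # Skip dunder methods and anything that is not a plain function.
            if isinstance(value, types.FunctionType) and not attr.startswith("__"):
                namespace[attr] = framework_wrap(value)
        return super().__new__(mcls, name, bases, namespace)


class Behaviour(metaclass=AutoWrap):
    def act(self):
        return "acted"


print(Behaviour().act())
```

Any of the injection strategies discussed in this thread could be dropped into framework_wrap; the metaclass only decides where the wrapper is applied.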