【Question title】: Get value of substrings after splitting
【Posted】: 2019-06-28 04:38:59
【Question】:

I have a JSON file that looks like this:

{
    "model": "Sequential",
    "layers": [
        {
            "L1": "Conv2D(filters = 64, kernel_size=(2,2), strides=(2,2), padding='same', data_format='channels_last', activation='relu', use_bias=True, kernel_initializer='zeros', bias_initializer='zeros', kernel_regularizer=regularizers.l1(0.), bias_regularizer=regularizers.l1(0.), activity_regularizer=regularizers.l1(0.), kernel_constraint=max_norm(2.), bias_constraint=max_norm(2.), input_shape=(224,224,3))",
            "L2": "MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same', data_format='channels_last')",
            "L3": "Conv2D(filters = 64, kernel_size=(2,2), strides=(2,2), padding='same', data_format='channels_last', activation='relu', use_bias=True, kernel_initializer='zeros', bias_initializer='zeros', kernel_regularizer=regularizers.l1(0.), bias_regularizer=regularizers.l1(0.), activity_regularizer=regularizers.l1(0.), kernel_constraint=max_norm(2.), bias_constraint=max_norm(2.))",
            "L4": "MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same', data_format='channels_last')",
            "L5": "Conv2D(filters = 64, kernel_size=(2,2), strides=(2,2), padding='same', data_format='channels_last', activation='relu', use_bias=True, kernel_initializer='zeros', bias_initializer='zeros', kernel_regularizer=regularizers.l1(0.), bias_regularizer=regularizers.l1(0.), activity_regularizer=regularizers.l1(0.), kernel_constraint=max_norm(2.), bias_constraint=max_norm(2.))",
            "L6": "Conv2D(filters = 64, kernel_size=(2,2), strides=(2,2), padding='same', data_format='channels_last', activation='relu', use_bias=True, kernel_initializer='zeros', bias_initializer='zeros', kernel_regularizer=regularizers.l1(0.), bias_regularizer=regularizers.l1(0.), activity_regularizer=regularizers.l1(0.), kernel_constraint=max_norm(2.), bias_constraint=max_norm(2.))",
            "L7": "Conv2D(filters = 64, kernel_size=(2,2), strides=(2,2), padding='same', data_format='channels_last', activation='relu', use_bias=True, kernel_initializer='zeros', bias_initializer='zeros', kernel_regularizer=regularizers.l1(0.), bias_regularizer=regularizers.l1(0.), activity_regularizer=regularizers.l1(0.), kernel_constraint=max_norm(2.), bias_constraint=max_norm(2.))",
            "L8": "MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same', data_format='channels_last')",
            "L9": "Flatten()",
            "L10": "Dense(4096, activation='softmax', use_bias=True, kernel_initializer='zeros', bias_initializer='zeros', kernel_regularizer=regularizers.l1(0.), bias_regularizer=regularizers.l1(0.), activity_regularizer=regularizers.l1(0.), kernel_constraint=max_norm(2.), bias_constraint=max_norm(2.))",
            "L11": "Dropout(0.4)",
            "L12": "Dense(2048, activation='softmax', use_bias=True, kernel_initializer='zeros', bias_initializer='zeros', kernel_regularizer=regularizers.l1(0.), bias_regularizer=regularizers.l1(0.), activity_regularizer=regularizers.l1(0.), kernel_constraint=max_norm(2.), bias_constraint=max_norm(2.))",
            "L13": "Dropout(0.4)",
            "L14": "Dense(1000, activation='softmax', use_bias=True, kernel_initializer='zeros', bias_initializer='zeros', kernel_regularizer=regularizers.l1(0.), bias_regularizer=regularizers.l1(0.), activity_regularizer=regularizers.l1(0.), kernel_constraint=max_norm(2.), bias_constraint=max_norm(2.))",
            "L15": "Dropout(0.4)"
        }
    ]
}

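I want to find out which layers are present in the JSON file, for example Conv2D, MaxPooling2D, Flatten, and so on.

I also want the values of arguments such as filters, kernel_size, strides, and activation.

I tried getting the layer name like this: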

import json

with open('model.json', 'r') as fb:
    con = json.load(fb)
# Everything before the first '(' is the layer name.
con['layers'][0]['L1'].split('(', 1)[0].rstrip()

The output is 'Conv2D'. I got the other layer names the same way.

What I need help with is getting the value of filters (for example, 64 in L1).

I tried this:

c = con['layers'][0]['L1'].split('(', 1)[1].rstrip()
c.split(',')
['filters = 8', ' kernel_size=(3', '3)', ' strides=(1', ' 1)', " padding='valid'", " data_format='channels_last'", " activation='relu'", ' use_bias=True', " kernel_initializer='zeros'", " bias_initializer='zeros'", ' kernel_regularizer=regularizers.l1(0.)', ' bias_regularizer=regularizers.l1(0.)', ' activity_regularizer=regularizers.l2(0.)', ' kernel_constraint=max_norm(2.)', ' bias_constraint=max_norm(2.)', ' input_shape=(28', '28', '1))']

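But I still cannot get the value.

Does anyone know how to extract this information?

【Comments】:

  • What if the string does not contain filters?
  • Then it should return 0.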
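
A minimal sketch of the behaviour described in the comments above: read the integer after `filters=` and fall back to 0 when the keyword is missing. The function name `get_filters` is made up for illustration, and the sketch assumes the value is always a plain integer.

```python
import re

def get_filters(layer: str) -> int:
    """Return the integer after `filters=` in a layer string, or 0 if absent."""
    # \s* tolerates the spaces around '=' seen in "filters = 64".
    match = re.search(r"\bfilters\s*=\s*(\d+)", layer)
    return int(match.group(1)) if match else 0

print(get_filters("Conv2D(filters = 64, kernel_size=(2,2))"))  # 64
print(get_filters("Flatten()"))                                # 0
```

The same pattern could be adapted to other integer-valued arguments by swapping the keyword name.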

Tags: python regex python-3.x split


【Solution 1】:

Use a regular expression; see the `re` documentation for further reference.

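import re

# `con` is the dict loaded with json.load() in the question.
# 'stride' is a prefix of 'strides'; the regex below captures the trailing
# "s=(2,2)" and the split on '=' discards the leftover "s".
string_lst = ['filters', 'kernel_size', 'stride', 'activation']
my_dict = {}
for key, value in con['layers'][0].items():
    my_dict[key] = {}
    layer_names = value.split('(')[0].rstrip()
    my_dict[key][layer_names] = {}
    for i in string_lst:
        # Lazily capture everything between the keyword and the next ", ".
        match = re.search(i + '(.+?), ', value)
        if match:
            # Keep the text after '=' as the value, e.g. ' = 64' -> '64'.
            filters = match.group(1).split("=")[1].strip()
            my_dict[key][layer_names][i] = filters

    # Drop layers in which none of the keywords were found.
    if len(my_dict[key][layer_names]) <= 0:
        del my_dict[key]

print(my_dict)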

Output:

{
    'L1': {'Conv2D': {'filters': '64', 'kernel_size': '(2,2)', 'stride': '(2,2)', 'activation': "'relu'"}},
    'L2': {'MaxPooling2D': {'stride': '(2,2)'}},
    'L3': {'Conv2D': {'filters': '64', 'kernel_size': '(2,2)', 'stride': '(2,2)', 'activation': "'relu'"}},
    'L4': {'MaxPooling2D': {'stride': '(2,2)'}},
    'L5': {'Conv2D': {'filters': '64', 'kernel_size': '(2,2)', 'stride': '(2,2)', 'activation': "'relu'"}},
    'L6': {'Conv2D': {'filters': '64', 'kernel_size': '(2,2)', 'stride': '(2,2)', 'activation': "'relu'"}},
    'L7': {'Conv2D': {'filters': '64', 'kernel_size': '(2,2)', 'stride': '(2,2)', 'activation': "'relu'"}},
    'L8': {'MaxPooling2D': {'stride': '(2,2)'}},
    'L10': {'Dense': {'activation': "'softmax'"}},
    'L12': {'Dense': {'activation': "'softmax'"}},
    'L14': {'Dense': {'activation': "'softmax'"}}
}

The JSON contains repeated layer names. If you want unique records, replace every occurrence of

my_dict[key][layer_names]

with

my_dict[layer_names]

and remove the line my_dict[key] = {}.
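
The values collected above are all strings ('64', '(2,2)', "'relu'"). One possible follow-up, not part of the original answer, is to turn them into Python values with `ast.literal_eval` from the standard library. The helper name `to_python` is hypothetical; text that is not a plain literal (for example a call such as `max_norm(2.)`) is left unchanged.

```python
import ast

def to_python(value: str):
    """Convert a captured string to a Python value when it is a plain literal."""
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        # Not a literal (e.g. a call such as max_norm(2.)); keep the raw text.
        return value

# One inner dict copied from the output shown above.
sample = {'filters': '64', 'kernel_size': '(2,2)', 'stride': '(2,2)', 'activation': "'relu'"}
converted = {name: to_python(text) for name, text in sample.items()}
print(converted)
# {'filters': 64, 'kernel_size': (2, 2), 'stride': (2, 2), 'activation': 'relu'}
```

`ast.literal_eval` only accepts literals, so it is a safer choice here than `eval` for text read from a file.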

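【Comments】:

  • When I search for 'activation', it does not return the string. For example, for L1 it should return 'relu'.
  • Yes, it works, but now for kernel size, strides, etc. it does not return the whole tuple (e.g. (3,3)).
  • @AshutoshMishra I updated my answer; try it now.
【Solution 2】:

I would do it in two steps. First, write a regex that captures the outer name and the contents of its parentheses: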

re.compile(r"^\s*([^(]*)\s*\((.*)\)\s*$")

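This has two groups: the name and the content inside the parentheses (...).

Then create a regex that splits on commas that are not inside parentheses (the original post links to an in-depth explanation of this pattern):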

re.compile(r',\s*(?![^()]*\))')

Demo:

import re

main_regex = re.compile(r"^\s*([^(]*)\s*\((.*)\)\s*$")
split_regex = re.compile(r',\s*(?![^()]*\))')

# Named `text` rather than `input` to avoid shadowing the built-in.
text = "Conv2D(filters = 64, kernel_size=(2,2), padding='same')"

main_match = main_regex.match(text)
print(main_match.group(1))  # layer name
# Split the argument list on commas that are outside parentheses.
parts = split_regex.split(main_match.group(2))
print(parts)

This prints:

Conv2D
['filters = 64', 'kernel_size=(2,2)', "padding='same'"]
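
As a sketch of how the two regexes could be combined to answer the original question, the function below returns the layer name together with its positional and keyword arguments. The name `parse_layer` and the tuple it returns are assumptions made for illustration; values are kept as strings.

```python
import re

main_regex = re.compile(r"^\s*([^(]*)\s*\((.*)\)\s*$")
split_regex = re.compile(r',\s*(?![^()]*\))')

def parse_layer(text):
    """Return (layer_name, positional_args, keyword_args) for one layer string."""
    match = main_regex.match(text)
    if match is None:
        raise ValueError("not a call-like string: {!r}".format(text))
    name, body = match.group(1).strip(), match.group(2).strip()
    args, kwargs = [], {}
    if not body:
        # e.g. "Flatten()" has no arguments at all.
        return name, args, kwargs
    for part in split_regex.split(body):
        # Split on the first '=' only; parts without '=' are positional.
        key, sep, value = part.partition('=')
        if sep:
            kwargs[key.strip()] = value.strip()
        else:
            args.append(part.strip())
    return name, args, kwargs

name, args, kwargs = parse_layer("Dense(4096, activation='softmax', use_bias=True)")
print(name, args, kwargs.get('activation'))
```

With this, `kwargs.get('filters', 0)` gives the fallback of 0 requested in the question's comments for layers that have no filters argument.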

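【Comments】:

    【Solution 3】:

    Update: using a regex, you can extract the keyword arguments, then split each one on '=' to find the value of every keyword argument in every layer.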

    import json
    import re

    with open('model.json', 'r') as fb:
        con = json.load(fb)

    for layer_key, layer in con['layers'][0].items():
        print("Layer: {}".format(layer_key))
        # Drop everything up to the first '(' and the final ')'.
        layers_kwargs = re.sub(r'^(.*?)\(', '', layer)[:-1]
        if not layers_kwargs:
            print('No kwargs')
            continue
        for kwarg in layers_kwargs.split(', '):
            # Split on the first '=' only, so values containing '=' stay intact.
            kwarg = [i.strip() for i in kwarg.split('=', 1)]
            if len(kwarg) != 2:
                # Positional argument such as the 4096 in Dense(4096, ...).
                print('No key', kwarg)
                continue
            k, v = kwarg
            print(k, v)
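
The `split(', ')` call in this answer depends on tuples being written without spaces, as in `(2,2)`. A hedged alternative is to walk the string and track parenthesis depth, splitting only on top-level commas. The helper name `split_top_level` is hypothetical, and this sketch does not handle commas inside quoted strings.

```python
def split_top_level(body):
    """Split `body` on commas that are not nested inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in body:
        if ch == ',' and depth == 0:
            # Top-level comma: close the current argument.
            parts.append(''.join(current).strip())
            current = []
            continue
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
        current.append(ch)
    tail = ''.join(current).strip()
    if tail:
        parts.append(tail)
    return parts

print(split_top_level("4096, activation='softmax', input_shape=(28, 28, 1)"))
# ['4096', "activation='softmax'", 'input_shape=(28, 28, 1)']
```

Unlike a single regex lookahead, the depth counter also copes with parentheses nested more than one level deep.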
    

    【Comments】:

    • It gives IndexError: list index out of range