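[Question title]: How to use tensorflow feature_columns as input to a keras model
[Posted]: 2019-06-19 21:07:08
[Question]:

Tensorflow's feature_columns API is very useful for processing non-numeric features. However, the current API documentation is mostly about using feature_columns with the tensorflow Estimator. Is it possible to use feature_columns for categorical feature representation and then build a model based on tf.keras?

The only reference I found is the following tutorial. It shows how to feed feature columns to a Keras Sequential model: Link

The code snippet is as follows: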

import tensorflow as tf
from tensorflow.python.feature_column import feature_column_v2 as fc

feature_columns = [fc.embedding_column(ccv, dimension=3), ...]
feature_layer = fc.FeatureLayer(feature_columns)
model = tf.keras.Sequential([
    feature_layer,
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dense(64, activation=tf.nn.relu),
    tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)
])
...
model.fit(dataset, steps_per_epoch=8) # dataset is created from tensorflow Dataset API

The question is how to use a custom model with the Keras functional API. I tried the following, but it does not work (tensorflow version 1.12):

feature_layer = fc.FeatureLayer(feature_columns)
dense_features = feature_layer(features) # features is a dict of ndarrays in dataset
layer1 = tf.keras.layers.Dense(128, activation=tf.nn.relu)(dense_features)
layer2 = tf.keras.layers.Dense(64, activation=tf.nn.relu)(layer1)
output = tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)(layer2)
model = Model(inputs=dense_features, outputs=output)

Error log:

ValueError: Input tensors to a Model must come from `tf.layers.Input`. Received: Tensor("feature_layer/concat:0", shape=(4, 3), dtype=float32) (missing previous layer metadata).

I don't know how to convert feature columns into inputs for a keras model.

[Comments]:

    Tags: tensorflow keras


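    [Solution 1]:

    The behavior you want can be achieved: tf.feature_column can be combined with the keras functional API. In fact, this is not mentioned in the TF docs.

    This works at least in TF 2.0.0-beta1, but it may be changed or even simplified in later releases.

    Please check the issue in the TensorFlow github repository, Unable to use FeatureColumn with Keras Functional API #27416. There you will find useful comments about using tf.feature_column with the Keras Functional API.

    Because you are asking about the general approach, I will copy the snippet with an example from the link above. Update: the code below should work.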

    from __future__ import absolute_import, division, print_function
    
    import numpy as np
    import pandas as pd
    
    #!pip install tensorflow==2.0.0-alpha0
    import tensorflow as tf
    
    from tensorflow import feature_column
    from tensorflow import keras
    from tensorflow.keras import layers
    from sklearn.model_selection import train_test_split
    
    csv_file = tf.keras.utils.get_file('heart.csv', 'https://storage.googleapis.com/download.tensorflow.org/data/heart.csv')
    dataframe = pd.read_csv(csv_file, nrows = 10000)
    dataframe.head()
    
    train, test = train_test_split(dataframe, test_size=0.2)
    train, val = train_test_split(train, test_size=0.2)
    print(len(train), 'train examples')
    print(len(val), 'validation examples')
    print(len(test), 'test examples')
    
    # Define method to create tf.data dataset from Pandas Dataframe
    # This worked with tf 2.0 but does not work with tf 2.2
    def df_to_dataset_tf_2_0(dataframe, label_column, shuffle=True, batch_size=32):
        dataframe = dataframe.copy()
        #labels = dataframe.pop(label_column)
        labels = dataframe[label_column]
    
        ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
        if shuffle:
            ds = ds.shuffle(buffer_size=len(dataframe))
        ds = ds.batch(batch_size)
        return ds
    
    def df_to_dataset(dataframe, label_column, shuffle=True, batch_size=32):
        dataframe = dataframe.copy()
        labels = dataframe.pop(label_column)
        #labels = dataframe[label_column]
    
        ds = tf.data.Dataset.from_tensor_slices((dataframe.to_dict(orient='list'), labels))
        if shuffle:
            ds = ds.shuffle(buffer_size=len(dataframe))
        ds = ds.batch(batch_size)
        return ds
    
    
    batch_size = 5 # A small batch sized is used for demonstration purposes
    train_ds = df_to_dataset(train, label_column = 'target', batch_size=batch_size)
    val_ds = df_to_dataset(val,label_column = 'target',  shuffle=False, batch_size=batch_size)
    test_ds = df_to_dataset(test, label_column = 'target', shuffle=False, batch_size=batch_size)
    
    age = feature_column.numeric_column("age")
    
    feature_columns = []
    feature_layer_inputs = {}
    
    # numeric cols
    for header in ['age', 'trestbps', 'chol', 'thalach', 'oldpeak', 'slope', 'ca']:
        feature_columns.append(feature_column.numeric_column(header))
        feature_layer_inputs[header] = tf.keras.Input(shape=(1,), name=header)
    
    # bucketized cols
    age_buckets = feature_column.bucketized_column(age, boundaries=[18, 25, 30, 35])
    feature_columns.append(age_buckets)
    
    # indicator cols
    thal = feature_column.categorical_column_with_vocabulary_list(
          'thal', ['fixed', 'normal', 'reversible'])
    thal_one_hot = feature_column.indicator_column(thal)
    feature_columns.append(thal_one_hot)
    feature_layer_inputs['thal'] = tf.keras.Input(shape=(1,), name='thal', dtype=tf.string)
    
    # embedding cols
    thal_embedding = feature_column.embedding_column(thal, dimension=8)
    feature_columns.append(thal_embedding)
    
    # crossed cols
    crossed_feature = feature_column.crossed_column([age_buckets, thal], hash_bucket_size=1000)
    crossed_feature = feature_column.indicator_column(crossed_feature)
    feature_columns.append(crossed_feature)
    
    
    
    feature_layer = tf.keras.layers.DenseFeatures(feature_columns)
    feature_layer_outputs = feature_layer(feature_layer_inputs)
    
    x = layers.Dense(128, activation='relu')(feature_layer_outputs)
    x = layers.Dense(64, activation='relu')(x)
    
    baggage_pred = layers.Dense(1, activation='sigmoid')(x)
    
    model = keras.Model(inputs=[v for v in feature_layer_inputs.values()], outputs=baggage_pred)
    
    model.compile(optimizer='adam',
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    
    model.fit(train_ds)
    

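    [Discussion]:

    • It should work now. The trick is to set the inputs to a list of input layers, like this: [v for v in feature_layer_inputs.values()]
    • Thanks! I was trying to add DenseFeatures to an existing Sequential model, but in the end it only worked by using both in a functional Model with inputs=feature_layer_inputs.
    • Why is the Input shape=1 in this line: feature_layer_inputs['thal'] = tf.keras.Input(shape=(1,), name='thal', dtype=tf.string)
    • @HARSHNILESHPATHAK, the 'thal' column example illustrates preprocessing of string values. It means that each record of the input dataset contains only one string value in the 'thal' column, which is why we need shape=(1,) for tf.keras.Input(). The input layer then passes this string value to the feature_columns defined in the DenseFeatures(feature_columns) layer. Each feature_column expands the shape according to its own logic, as thal_one_hot and thal_embedding do for 'thal' here.
    • @prog_guy, no, thal_one_hot and thal_embedding are just examples to distinguish the different types of feature_columns.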
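
The shape expansion described in the comments above can be illustrated without TensorFlow. The following is a minimal pure-Python sketch, not the TensorFlow implementation: the helper names (bucketize, one_hot, dense_width) are hypothetical, and the code only mimics how the column types from the answer turn one input value per record into a fixed-width vector. It assumes the default settings used in the answer (no out-of-vocabulary buckets, left-inclusive bucket boundaries).

```python
from bisect import bisect_right

AGE_BOUNDARIES = [18, 25, 30, 35]                # same boundaries as age_buckets
THAL_VOCAB = ['fixed', 'normal', 'reversible']   # same vocabulary as thal


def bucketize(value, boundaries):
    """Return a one-hot bucket vector; n boundaries give n + 1 buckets."""
    # bisect_right makes each boundary belong to the bucket on its right,
    # i.e. the value 18 falls into the bucket [18, 25).
    index = bisect_right(boundaries, value)
    return [1.0 if i == index else 0.0 for i in range(len(boundaries) + 1)]


def one_hot(value, vocabulary):
    """Return a one-hot vector over the vocabulary.

    In this sketch an unknown value maps to all zeros.
    """
    return [1.0 if value == item else 0.0 for item in vocabulary]


def dense_width(num_numeric, num_boundaries, vocab_size,
                embedding_dim, hash_bucket_size):
    """Sum the per-column widths of the concatenated dense output."""
    return (num_numeric             # each numeric column contributes width 1
            + (num_boundaries + 1)  # bucketized column, one-hot over buckets
            + vocab_size            # indicator column over the vocabulary
            + embedding_dim         # embedding column
            + hash_bucket_size)     # indicator of the crossed column


print(bucketize(27, AGE_BOUNDARIES))   # [0.0, 0.0, 1.0, 0.0, 0.0]
print(one_hot('normal', THAL_VOCAB))   # [0.0, 1.0, 0.0]
print(dense_width(7, len(AGE_BOUNDARIES), len(THAL_VOCAB), 8, 1000))  # 1023
```

So every Input carries a single value per record (shape=(1,)), while the concatenated width is the sum of what each column produces. Under the assumptions above, that is 7 + 5 + 3 + 8 + 1000 = 1023 for the columns defined in the answer.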
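    [Solution 2]:

    I have recently been reading this document from the TensorFlow 2.0 alpha release. It has examples that use Keras with the feature column API. Not sure whether you want to use TF 2.0.

    [Discussion]:

      [Solution 3]:

      If you use the tensorflow Dataset API, then this code works well.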

      featurlayer = keras.layers.DenseFeatures(feature_columns=feature_columns)
      train_dataset = train_dataset.map(lambda x, y: (featurlayer(x), y))
      test_dataset = test_dataset.map(lambda x, y: (featurlayer(x), y))
      
      model.fit(train_dataset, epochs=, steps_per_epoch=, # all_data/batch_num = 
           validation_data=test_dataset,
           validation_steps=)
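
The comment in the snippet above suggests that steps_per_epoch is the dataset size divided by the batch size. A minimal sketch of that arithmetic, assuming the last partial batch is kept (so the division rounds up); the function name and the numbers are hypothetical and not taken from the answer:

```python
import math


def steps_per_epoch(num_examples, batch_size):
    """Number of batches needed to see every example once."""
    # Round up so that a final, smaller batch still counts as a step.
    return math.ceil(num_examples / batch_size)


print(steps_per_epoch(1000, 32))  # 32, because 1000 / 32 = 31.25
```

The same calculation applied to the validation set would give a value for validation_steps.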
      

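      [Discussion]:

      • This is the correct answer; I tested it under TF 1.13. It deserves more votes. However, you need to use from tensorflow.python.feature_column import feature_column_v2 as fc and then dense_features = fc.DenseFeatures(columns)
      [Solution 4]:

      Use the tf.feature_column.input_layer function; its API documentation has an example. You can convert the feature_columns into a Tensor and then use it for Model().

      [Discussion]: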
