【Question title】: Categorical and continuous cross feature column in TensorFlow
【Posted】: 2018-10-16 09:12:19
【Question description】:

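When using TensorFlow's estimators and feature_column, it is possible to cross a categorical column with a bucketized continuous column (a crossed column), but not to cross a categorical column with a numeric one. Would it be possible to implement this starting from https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/feature_column/feature_column.py#L704 ?

It would also be great to see any alternative way of achieving the same result inside the TensorFlow graph.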

import numpy as np

cont = np.array([1,2,3])
cat = np.array(['cat', 'dog', 'cat'])

# Desired result (cross_function is the function being asked for):
# cross_function(cat, cont) == np.array([[1,0],[0,2],[3,0]])
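
For reference, the target behaviour can be sketched outside of TensorFlow in plain NumPy. This is only an illustration of the desired output, not a feature column: the name cross_function comes from the question, and deriving the category order from np.unique (sorted order) is an assumption made here.

```python
import numpy as np

def cross_function(cat, cont):
    # Integer-encode each category; np.unique returns the sorted vocabulary
    # and, with return_inverse=True, the index of each element in it.
    categories, indices = np.unique(cat, return_inverse=True)
    # One-hot encode by picking rows of the identity matrix.
    onehot = np.eye(len(categories))[indices]
    # Scale each one-hot row by its continuous value.
    return onehot * np.asarray(cont)[:, None]

cont = np.array([1, 2, 3])
cat = np.array(['cat', 'dog', 'cat'])
print(cross_function(cat, cont))
```

Each row keeps a single non-zero entry, in the column of its category, holding that row's continuous value.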

【Comments】:

    Tags: python python-3.x tensorflow


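    【Solution 1】:

    Answering my own question here. The steps involved are:

    1. Integer-encode the categorical feature
      • inside the graph, so that it works for both training and serving
    2. One-hot encode the resulting integer indices
    3. Multiply the one-hot encoding by the continuous variable

    Code: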

    import numpy as np
    import tensorflow as tf
    
    cont = np.array([1,2,3])
    cat = np.array(['cat', 'dog', 'cat'])
    categories = np.unique(cat)
    
    def categorical_continuous_interaction(categorical_onehot, continuous):
    
        cont = tf.expand_dims(continuous, 0)
        return tf.transpose(tf.multiply(tf.transpose(categorical_onehot), cont))
    
    def transformation_function(feature_dictionary, mapping_table):
    
        continuous_feature = feature_dictionary['cont']
    
        categorical_feature = mapping_table.lookup(feature_dictionary['cat'])
        onehot = tf.one_hot(categorical_feature, categories.shape[0])
        cross_feature = categorical_continuous_interaction(onehot, continuous_feature)
    
        return {'feature_name': cross_feature}
    
    def input_function(dataframe, label_key, ...):
        # categorical mapping tables, these must be generated outside of the dataset 
        # transformation function but within the input function
        mapping_table = tf.contrib.lookup.index_table_from_tensor(
            mapping=tf.constant(categories),
            num_oov_buckets=0, 
            default_value=-1
        )
    
        # Generate the dataset of a dictionary of all of the dataframes columns
        dataset = tf.data.Dataset.from_tensor_slices(dict(dataframe))
        # Convert to a dataset of tuples of dicts with the labels as one tuple
        dataset = dataset.map(lambda x: split_label(x, label_key))
        # Transform the features dict within the dataset
        dataset = dataset.map(lambda features, labels: (transformation_function(
            features, mapping_table=mapping_table), labels))
    
        ...
    
        return dataset
    
    def serving_input_fn():
        # categorical mapping tables, these must be generated outside of the dataset 
        # transformation function but within the input function
        mapping_table=tf.contrib.lookup.index_table_from_tensor(
            mapping=tf.constant(categories),
            num_oov_buckets=0, 
            default_value=-1
        )
        numeric_receiver_tensors = {
            name: tf.placeholder(dtype=tf.float32, shape=[1], name=name+"_placeholder")
            for name in numeric_feature_column_names
        }
        categorical_receiver_tensors = {
            name: tf.placeholder(dtype=tf.string, shape=[1], name=name+"_placeholder")
            for name in categorical_feature_column_names
        }
        receiver_tensors = {**numeric_receiver_tensors, **categorical_receiver_tensors}
    
        # Pass the table built above, using the keyword that
        # transformation_function actually declares
        features = transformation_function(receiver_tensors, 
            mapping_table=mapping_table)
    
        return tf.estimator.export.ServingInputReceiver(features, receiver_tensors)
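
    To make the three steps easier to check by hand, here is a minimal NumPy stand-in for the graph operations above. It is a sketch, not TensorFlow code: the dictionary lookup stands in for the mapping table with default_value=-1, the comparison against np.arange stands in for tf.one_hot, and the 'bird' value is a hypothetical out-of-vocabulary input added for illustration.

```python
import numpy as np

categories = np.array(['cat', 'dog'])        # vocabulary, as in the answer
cat = np.array(['cat', 'dog', 'bird'])       # 'bird' is not in the vocabulary
cont = np.array([1.0, 2.0, 3.0])

# Step 1: integer-encode, using -1 for unknown values (mirrors default_value=-1).
lookup = {name: i for i, name in enumerate(categories)}
indices = np.array([lookup.get(name, -1) for name in cat])

# Step 2: one-hot encode; an index of -1 matches no column, so the row is all zeros.
onehot = (indices[:, None] == np.arange(len(categories))).astype(np.float32)

# Step 3a: the transpose / multiply / transpose formulation used in the answer.
via_transpose = (onehot.T * cont[None, :]).T
# Step 3b: the same product written by broadcasting over a trailing axis.
via_broadcast = onehot * cont[:, None]

print(via_transpose)
print(np.array_equal(via_transpose, via_broadcast))
```

    The two formulations give the same matrix, and a value outside the vocabulary contributes an all-zero row rather than raising an error.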
    

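    【Discussion】:

    • You can also wrap the numeric column with tf.feature_column.categorical_column_with_identity, which generates the one-hot encoding for you.
    • Thanks @rodrigo, could you give an example? Would this allow categorical_continuous_interaction to be included in a TensorFlow estimator while staying sparse?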