【Question title】: Concatenated columns should have the same type (How to convert or cast DateTime to the right type) ML.NET C# SQL
【Posted】: 2021-03-19 11:22:42
【Problem description】:

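I want to use a DateTime value as a feature for my machine learning model.

It throws the following error:

System.InvalidOperationException: Concatenated columns should have the same type. Column 'DateTime' has type DateTime, but expected column type is Single.

I have the following code: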

public static IEstimator<ITransformer> BuildTrainingPipeLine(MLContext mLContext)
{
    // Data process configuration with pipeline data transformations 
    var dataProcessPipeline = mLContext.Transforms.Categorical.OneHotEncoding(outputColumnName: "LatitudeEncoded", inputColumnName:"Latitude")
        .Append(mLContext.Transforms.Categorical.OneHotEncoding(outputColumnName:"LongitudeEncoded", inputColumnName:"Longitude"))
        .Append(mLContext.Transforms.Concatenate("Features", new[] { "LatitudeEncoded", "LongitudeEncoded", "DateTime", "Temperature", "Unit" }));

    // Set the training algorithm 
    var trainer = mLContext.Regression.Trainers.FastTree(new FastTreeRegressionTrainer.Options()
    {
        NumberOfLeaves = 20,
        MinimumExampleCountPerLeaf = 10,
        NumberOfTrees = 500,
        LearningRate = 0.2822519f,
        Shrinkage = 2.151229f,
        LabelColumnName = "FillLevel",
        FeatureColumnName = "Features"
    });

    var trainingPipeline = dataProcessPipeline.Append(trainer);

    return trainingPipeline;
}

The error is thrown on this line, inside the Train method below:

ITransformer model = trainingPipeline.Fit(dataView);

public static ITransformer Train(MLContext mLContext, IDataView dataView, IEstimator<ITransformer> trainingPipeline)
{
    Console.WriteLine("Training start");
    ITransformer model = trainingPipeline.Fit(dataView);
    Console.WriteLine("Training done");
    return model;
}

My Main method looks like this:

static void Main(string[] args)
{
    var mLContext = new MLContext();
    var loader = mLContext.Data.CreateDatabaseLoader<Message>();
    var connectionString = GetDbConnection();

    var sqlCommand = "SELECT CAST(MessageId as REAL) as MessageId, CAST(DateTime as string) as DateTime, CAST(FillLevel as REAL) as FillLevel, " +
        "CAST(Temperature as REAL) as Temperature, CAST(Latitude as REAL) as Latitude, CAST(Longitude as REAL) as Longitude, " +
        "CAST(MessageType as REAL) as MessageType, CAST(Unit as REAL) as Unit from Test WHERE Unit = 1";

    var dbSource = new DatabaseSource(SqlClientFactory.Instance, connectionString, sqlCommand);
    Console.WriteLine("Loading data from database");
    IDataView data = loader.Load(dbSource);
    var set = mLContext.Data.TrainTestSplit(data, testFraction: 0.2);
    Console.WriteLine("Preparing training operations");
    var trainingData = set.TrainSet;
    var testData = set.TestSet;
    IEstimator<ITransformer> trainingPipeline = BuildTrainingPipeLine(mLContext);
    ITransformer model = Train(mLContext, trainingData, trainingPipeline);
    Evaluate(mLContext, model, testData, trainingPipeline);
}

The GetDbConnection() function:

private static string GetDbConnection()
{
    var builder = new ConfigurationBuilder().SetBasePath(Directory.GetCurrentDirectory()).AddJsonFile("appsettings.json", optional: true, reloadOnChange: true);
    return builder.Build().GetConnectionString("DbConnection");
}

My Message class looks like this:

public class Message
{
    [ColumnName("MessageId"), LoadColumn(0)]
    public float Messageid;

    [ColumnName("DateTime"), LoadColumn(1)]
    public DateTime DateTime;

    [ColumnName("FillLevel"), LoadColumn(2)]
    public float FillLevel;

    [ColumnName("Temperature"), LoadColumn(3)]
    public float Temperature;

    [ColumnName("Latitude"), LoadColumn(4)]
    public float Latitude;

    [ColumnName("Longitude"), LoadColumn(5)]
    public float Longitude;

    [ColumnName("MessageType"), LoadColumn(6)]
    public float MessageType;

    [ColumnName("Unit"), LoadColumn(7)]
    public float Unit;
}

【Comments】:

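  • Fix the errors instead of trying to cover them up. Dates are neither strings nor numbers. The SQL query is wrong because it tries to convert dates to strings. Casting the other fields is a very strong smell: either no conversion is needed, or the numbers are stored as text, which is a serious bug. What is 100,000? In most of the world it's 100. In China, India and the US it's 100K. In Canada, it's both.
  • Also, where is this error thrown? Are you trying to combine datasets with different types? Maybe the data was loaded incorrectly from Excel? The right type for dates is DateTime, not Single. Excel stores dates as floating-point numbers. Many libraries will treat them as proper DateTime values, but some won't.
  • Why, of all data types, are you casting almost every column to real? real is an imprecise data type, and casting all those columns to real will almost certainly cause data loss. As for CAST(DateTime as string) as DateTime: string is not a data type in SQL Server, so I assume it is a user-defined data type; ideally you should prefix the data type with its schema (i.e. dbo.string).
  • You can convert types with a conversion transform: context.Transforms.Conversion.ConvertType
  • @JJNL77 Glad it helped!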
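The precision point about real can be seen directly in .NET: SQL Server's real is a 4-byte floating-point type, the same size as .NET float (ML.NET's Single), so whole numbers above 2^24 can no longer all be stored exactly. A minimal sketch; RealPrecisionDemo and ToReal are hypothetical names, and it only demonstrates the .NET side, not SQL Server's own CAST.

```csharp
using System;

// Illustrative only: RealPrecisionDemo / ToReal are hypothetical names.
public static class RealPrecisionDemo
{
    // .NET float (ML.NET Single) is a 32-bit IEEE 754 value with a 24-bit significand,
    // the same storage size as SQL Server's real.
    public static float ToReal(long value) => (float)value;

    public static void Main()
    {
        long messageId = 16_777_217;          // 2^24 + 1
        float stored = ToReal(messageId);     // rounded to the nearest representable float

        Console.WriteLine((long)stored);               // prints 16777216
        Console.WriteLine((long)stored == messageId);  // prints False
    }
}
```

An ID column such as MessageId is a typical victim: two distinct large IDs can collapse to the same real value.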

Tags: c# sql-server ml.net


【Solution 1】:

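You can solve this either with a custom mapping (CustomMappingFunctions) or with a general type conversion (General Conversions):

Add a custom output class next to your model input (the Message class from the question):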

public class Message
{
    [ColumnName("MessageId"), LoadColumn(0)]
    public float Messageid;

    [ColumnName("DateTime"), LoadColumn(1)]
    public DateTime DateTime;

    [ColumnName("FillLevel"), LoadColumn(2)]
    public float FillLevel;

    [ColumnName("Temperature"), LoadColumn(3)]
    public float Temperature;

    [ColumnName("Latitude"), LoadColumn(4)]
    public float Latitude;

    [ColumnName("Longitude"), LoadColumn(5)]
    public float Longitude;

    [ColumnName("MessageType"), LoadColumn(6)]
    public float MessageType;

    [ColumnName("Unit"), LoadColumn(7)]
    public float Unit;
}

public class CustomMappingOutput
{
    [ColumnName("CustomMappingOutput")]
    public float CustomDateHour { get; set; }
}

Create the custom mapping:

    // Needs: using System; using Microsoft.ML.Transforms;
    [CustomMappingFactoryAttribute("CustomDateMapping")]
    private class CustomDate : CustomMappingFactory<Message, CustomMappingOutput>
    {
        // Extracts the hour of day (0-23) as a float, so the result can be
        // concatenated with the other Single feature columns.
        public static void CustomAction(Message input, CustomMappingOutput output)
        {
            output.CustomDateHour = (float)input.DateTime.Hour;
        }

        public override Action<Message, CustomMappingOutput> GetMapping()
            => CustomAction;
    }
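The mapping above keeps only the hour. The same idea extends to other parts of the date; here is a plain-.NET sketch (no ML.NET types) that extracts several Single-valued parts, including a sine/cosine encoding of the hour so that 23:00 and 00:00 end up numerically close. DateFeatures and FromDateTime are hypothetical names; in a pipeline you would copy these values into float properties of the output class used by CustomMapping.

```csharp
using System;

// Hypothetical helper, not an ML.NET API: splits a DateTime into Single-valued parts
// that could be assigned to float properties of a custom-mapping output class.
public static class DateFeatures
{
    public static float[] FromDateTime(DateTime dt)
    {
        // Put the hour on a circle so 23:00 and 00:00 are neighbours rather than 23 apart.
        double angle = 2 * Math.PI * dt.Hour / 24.0;

        return new[]
        {
            (float)dt.Hour,             // 0-23, the value the answer's mapping produces
            (float)(int)dt.DayOfWeek,   // 0 = Sunday ... 6 = Saturday
            (float)dt.Month,            // 1-12
            (float)Math.Sin(angle),     // cyclic hour, sine component
            (float)Math.Cos(angle),     // cyclic hour, cosine component
        };
    }

    public static void Main()
    {
        float[] f = FromDateTime(new DateTime(2021, 3, 19, 11, 22, 42));
        Console.WriteLine(string.Join(", ", f));
    }
}
```

The cyclic columns are optional: tree ensembles such as FastTree can split on the raw hour directly, so they may matter less there than for linear trainers.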

Add it to the pipeline:

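    // Needs: using Microsoft.ML;
    var dataProcessPipeline =
        _mlContext.Transforms.CustomMapping(new CustomDate().GetMapping(), "CustomDateMapping")
            // "CustomMappingOutput" is the float hour column produced by CustomDate
            .Append(_mlContext.Transforms.Concatenate("Features",
                "CustomMappingOutput",
                nameof(Message.Temperature),
                nameof(Message.Unit)))
            .AppendCacheCheckpoint(_mlContext);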

See here: MlNetcookBook

General conversion:

// Converts the DateTime column in place to Single (DataKind lives in Microsoft.ML.Data)
_mlContext.Transforms.Conversion.ConvertType(nameof(Message.DateTime), outputKind: DataKind.Single)

TransformExtensionsCatalog
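A caution about turning a whole timestamp into one Single, however it is done: at the magnitude of DateTime.Ticks (around 6 × 10^17 for dates in 2021), consecutive Single values are almost two hours apart. Which number ConvertType actually produces from a DateTime is not confirmed here, so treat this as background rather than a description of ML.NET's behavior. The plain-.NET sketch below measures that step size and shows a hypothetical relative encoding (hours since a fixed origin) that keeps the value small. TimestampPrecision, UlpOf and HoursSince are made-up names, and MathF.BitIncrement assumes .NET Core 3.0 or later.

```csharp
using System;

// Illustrative sketch; TimestampPrecision, UlpOf and HoursSince are hypothetical names.
public static class TimestampPrecision
{
    // Gap between a float and the next larger float: the finest step a Single can
    // represent at that magnitude.
    public static float UlpOf(float value) => MathF.BitIncrement(value) - value;

    // Alternative encoding: hours elapsed since a fixed origin. For dates within a few
    // years of the origin the value stays small, so a Single still separates minutes.
    public static float HoursSince(DateTime dt, DateTime origin) =>
        (float)(dt - origin).TotalHours;

    public static void Main()
    {
        var dt = new DateTime(2021, 3, 19, 11, 22, 42);
        float ticksAsSingle = dt.Ticks;   // implicit long -> float conversion, rounds

        // Roughly 1.9: one float step is almost two hours' worth of ticks here.
        Console.WriteLine(UlpOf(ticksAsSingle) / TimeSpan.TicksPerHour);
        Console.WriteLine(HoursSince(dt, new DateTime(2020, 1, 1)));
    }
}
```

The origin would have to be fixed once and reused at prediction time, otherwise training and scoring inputs would not be comparable.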
