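【Posted】:2021-03-19 11:22:42
【Problem description】:
I want to use a DateTime column as a feature for my machine learning model.
Doing so produces the following error:
System.InvalidOperationException: Concatenated columns should have the same type. Column 'DateTime' has type of DateTime, but expected column type is Single.
I have the following code: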
public static IEstimator<ITransformer> BuildTrainingPipeLine(MLContext mLContext)
{
    // Data process configuration with pipeline data transformations
    var dataProcessPipeline = mLContext.Transforms.Categorical.OneHotEncoding(outputColumnName: "LatitudeEncoded", inputColumnName: "Latitude")
        .Append(mLContext.Transforms.Categorical.OneHotEncoding(outputColumnName: "LongitudeEncoded", inputColumnName: "Longitude"))
        .Append(mLContext.Transforms.Concatenate("Features", new[] { "LatitudeEncoded", "LongitudeEncoded", "DateTime", "Temperature", "Unit" }));
    // Set the training algorithm
    var trainer = mLContext.Regression.Trainers.FastTree(new FastTreeRegressionTrainer.Options()
    {
        NumberOfLeaves = 20,
        MinimumExampleCountPerLeaf = 10,
        NumberOfTrees = 500,
        LearningRate = 0.2822519f,
        Shrinkage = 2.151229f,
        LabelColumnName = "FillLevel",
        FeatureColumnName = "Features"
    });
    var trainingPipeline = dataProcessPipeline.Append(trainer);
    return trainingPipeline;
}
The error occurs on this line, inside the Train method:
ITransformer model = trainingPipeline.Fit(dataView);
public static ITransformer Train(MLContext mLContext, IDataView dataView, IEstimator<ITransformer> trainingPipeline)
{
    Console.WriteLine("Training start");
    ITransformer model = trainingPipeline.Fit(dataView);
    Console.WriteLine("Training done");
    return model;
}
My Main method looks like this:
static void Main(string[] args)
{
    var mLContext = new MLContext();
    var loader = mLContext.Data.CreateDatabaseLoader<Message>();
    var connectionString = GetDbConnection();
    var sqlCommand = "SELECT CAST(MessageId as REAL) as MessageId, CAST(DateTime as string) as DateTime, CAST(FillLevel as REAL) as FillLevel, " +
        "CAST(Temperature as REAL) as Temperature, CAST(Latitude as REAL) as Latitude, CAST(Longitude as REAL) as Longitude, " +
        "CAST(MessageType as REAL) as MessageType, CAST(Unit as REAL) as Unit from Test WHERE Unit = 1";
    var dbSource = new DatabaseSource(SqlClientFactory.Instance, connectionString, sqlCommand);
    Console.WriteLine("Loading data from database");
    IDataView data = loader.Load(dbSource);
    var set = mLContext.Data.TrainTestSplit(data, testFraction: 0.2);
    Console.WriteLine("Preparing training operations");
    var trainingData = set.TrainSet;
    var testData = set.TestSet;
    IEstimator<ITransformer> trainingPipeline = BuildTrainingPipeLine(mLContext);
    ITransformer model = Train(mLContext, trainingData, trainingPipeline);
    Evaluate(mLContext, model, testData, trainingPipeline);
}
The GetDbConnection() function:
private static string GetDbConnection()
{
    var builder = new ConfigurationBuilder()
        .SetBasePath(Directory.GetCurrentDirectory())
        .AddJsonFile("appsettings.json", optional: true, reloadOnChange: true);
    return builder.Build().GetConnectionString("DbConnection");
}
My Message class looks like this:
public class Message
{
    [ColumnName("MessageId"), LoadColumn(0)]
    public float Messageid;
    [ColumnName("DateTime"), LoadColumn(1)]
    public DateTime DateTime;
    [ColumnName("FillLevel"), LoadColumn(2)]
    public float FillLevel;
    [ColumnName("Temperature"), LoadColumn(3)]
    public float Temperature;
    [ColumnName("Latitude"), LoadColumn(4)]
    public float Latitude;
    [ColumnName("Longitude"), LoadColumn(5)]
    public float Longitude;
    [ColumnName("MessageType"), LoadColumn(6)]
    public float MessageType;
    [ColumnName("Unit"), LoadColumn(7)]
    public float Unit;
}
【Discussion】:
-
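Fix errors instead of trying to cover them up. A date is neither a string nor a number. The SQL query is wrong because it tries to convert the date to a string. Casting the other fields is a very strong smell: either the casts are unnecessary, or the numbers are stored as text, which is a serious bug.
What is 100,000? In most of the world it is 100. In China, India and the US it is 100K. In Canada it can be either -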
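Also, where is this error thrown? Are you trying to combine datasets with different types? Was the data perhaps loaded from Excel incorrectly? The right type for working with dates is DateTime, not Single. Excel stores dates as floating-point numbers. Many libraries treat them as proper DateTime values, but some do not -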
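Why, of all data types, are you converting almost every column to real? real is an imprecise data type, and converting all of these columns to real will almost certainly lose data. As for CAST(DateTime as string) as DateTime, string is not a data type in SQL Server, so I assume it is a user-defined data type; ideally you should prefix the data type with its schema (i.e. dbo.string). -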
You can convert the column type with a conversion transform: context.Transforms.Conversion.ConvertType -
@JJNL77 Glad it helped!
Tags: c# sql-server ml.net
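To illustrate the ConvertType suggestion, here is a minimal sketch that converts a DateTime column to Single before Concatenate. It is not the asker's final code: the Reading class, the sample rows, and the "DateTimeSingle" column name are made up for illustration, and it assumes your ML.NET version supports a standard DateTime-to-Single conversion (as far as I know it is based on the tick count, but check this against the version you use).

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical row type, reduced to the columns needed for the illustration.
public class Reading
{
    public DateTime DateTime { get; set; }
    public float Temperature { get; set; }
    public float FillLevel { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);

        // Made-up in-memory sample data standing in for the database rows.
        var rows = new[]
        {
            new Reading { DateTime = new DateTime(2021, 3, 19, 11, 0, 0), Temperature = 12.5f, FillLevel = 40f },
            new Reading { DateTime = new DateTime(2021, 3, 20, 11, 0, 0), Temperature = 13.0f, FillLevel = 42f },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);

        // Convert DateTime to Single first, so that every column passed
        // to Concatenate has the same type.
        var pipeline = mlContext.Transforms.Conversion
            .ConvertType(outputColumnName: "DateTimeSingle",
                         inputColumnName: "DateTime",
                         outputKind: DataKind.Single)
            .Append(mlContext.Transforms.Concatenate("Features", "DateTimeSingle", "Temperature"));

        IDataView transformed = pipeline.Fit(data).Transform(data);

        // Print the name and type of the resulting feature column.
        var features = transformed.Schema["Features"];
        Console.WriteLine($"{features.Name}: {features.Type}");
    }
}
```

One caveat: Single keeps only about seven significant digits, so timestamps that are close together may collapse to the same value after conversion. If that matters, deriving smaller numeric features (hour of day, day of week, and so on) from the DateTime may work better than converting the raw value.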