[Posted]: 2019-07-03 20:49:01
[Question]:
How can I create a pandas DataFrame in the following format:
   A          B          C          D
0  [1,2,3,4]  [2,3,4,5]  [4,5,5,6]  [6,3,4,5]
1  [2,3,5,6]  [3,4,6,6]  [3,4,5,7]  [2,6,3,4]
2  [8,9,6,7]  [5,7,9,5]  [3,7,9,5]  [5,7,9,8]
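Basically, every cell in each row holds a list. I am trying to classify the data using machine learning. Each data point has 40 x 6 values. Is there another format that is better suited as input to a classifier?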
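A minimal sketch of one way to build that layout, assuming the values are already available as Python lists. The column names and numbers are copied from the table above; `stacked` is a name introduced here only for illustration.

```python
import numpy as np
import pandas as pd

# Each column is a list of rows, and each row entry is itself a list,
# so pandas stores the inner lists as objects in the cells.
df = pd.DataFrame({
    "A": [[1, 2, 3, 4], [2, 3, 5, 6], [8, 9, 6, 7]],
    "B": [[2, 3, 4, 5], [3, 4, 6, 6], [5, 7, 9, 5]],
    "C": [[4, 5, 5, 6], [3, 4, 5, 7], [3, 7, 9, 5]],
    "D": [[6, 3, 4, 5], [2, 6, 3, 4], [5, 7, 9, 8]],
})

# The same data as a regular 3-D array: (rows, columns, values per cell).
stacked = np.array(df.values.tolist())
print(df)
print(stacked.shape)
```

Because every cell is a generic Python object, numeric operations on such a frame are not vectorized, which is why the plain array form is usually easier to pass on to other libraries.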
Edit:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plot
from sklearn.neighbors import KNeighborsClassifier
# Read csv data into pandas data frame
data_frame = pd.read_csv('data.csv')
extract_columns = ['LinearAccX', 'LinearAccY', 'LinearAccZ', 'Roll', 'pitch', 'compass']
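# Number of samples in one shot
samples_per_shot = 40
# Calculate number of shots in the data frame
# (integer division keeps the count an int, which range() needs on Python 3)
count_of_shots = len(data_frame.index) // samples_per_shot
# Index of shots, and an empty list that collects one entry per shot
training_index = range(count_of_shots)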
training_data_list = []
# flag for backward compatibility
make_old_data_compatible_with_new = 0
if make_old_data_compatible_with_new:
    # Convert 40 shot data to 25 shot data
    # New logic takes 25 samples/shot
    # old logic takes 40 samples/shot
    start_shot_sample_index = 9
    end_shot_sample_index = 34
else:
    # Start index from 1 and continue till lets say 40
    start_shot_sample_index = 1
    end_shot_sample_index = samples_per_shot
# Extract each shot and collect its columns as lists
for shot in range(count_of_shots):
    # Extract current shot
    current_shot_data = data_frame[data_frame['shot_no'] == (shot + 1)]
    # Select only the following columns
    selected_columns_from_shot = current_shot_data[extract_columns]
    # Find start and end row labels for this shot
    current_shot_data_start_index = shot * samples_per_shot + start_shot_sample_index
    current_shot_data_end_index = shot * samples_per_shot + end_shot_sample_index
    # Select rows by index label (.ix is deprecated; .loc slices labels inclusively)
    selected_rows_from_shot = selected_columns_from_shot.loc[current_shot_data_start_index:current_shot_data_end_index]
    # Append to list of lists: one list of sample values per column
    training_data_list.append([selected_rows_from_shot[column].values.tolist() for column in extract_columns])
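# Build the training data: one row per shot, each cell holds a list of samples
training_data = pd.DataFrame(training_data_list, columns=extract_columns)
# Placeholder class labels, one per shot
training_labels = [1 for i in range(count_of_shots)]
knn = KNeighborsClassifier(n_neighbors=3)
# Fitting on a data frame whose cells are lists is the step this question is about
knn.fit(training_data, training_labels)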
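[Comments]:
-
What have you tried? What is the input format? Include these in your question. Create a minimal reproducible example.
-
Are you sure you want a DataFrame of lists? That rarely makes sense.
-
I am trying to classify gestures using IMU data. Each gesture has 40 values each for Roll, Pitch, Yaw, AccelerationX, AccY and AccZ. All 6 columns have 40 values, and together they make up one data point. Is there a better way to represent this?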
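One commonly used representation for this kind of data is a plain 2-D feature matrix with one row per gesture, where the 40 x 6 readings are flattened into 240 columns. The sketch below illustrates that idea under stated assumptions: the sensor readings are random stand-ins for `data.csv`, the number of shots (`n_shots`) and the class `labels` are made up for illustration, and only the column names, `shot_no` and `samples_per_shot` come from the question.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

feature_columns = ['LinearAccX', 'LinearAccY', 'LinearAccZ', 'Roll', 'pitch', 'compass']
samples_per_shot = 40
n_shots = 6  # hypothetical number of recorded gestures

# Synthetic stand-in for data.csv: 40 consecutive rows per shot.
rng = np.random.default_rng(0)
data_frame = pd.DataFrame(
    rng.normal(size=(n_shots * samples_per_shot, len(feature_columns))),
    columns=feature_columns,
)
data_frame['shot_no'] = np.repeat(np.arange(1, n_shots + 1), samples_per_shot)

# Flatten each shot's (40, 6) block into a single row of 240 values.
features = np.stack([
    shot[feature_columns].to_numpy().ravel()
    for _, shot in data_frame.groupby('shot_no', sort=True)
])
print(features.shape)

# Hypothetical class labels, one per shot.
labels = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(features, labels)
predictions = knn.predict(features)
```

The design choice here is that scikit-learn estimators expect input of shape `(n_samples, n_features)`, so each gesture becomes one sample. Flattening discards the explicit time structure, so it is a starting point rather than the only option.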