【Question title】: Create Pandas dataframe with list as values in rows
【Posted】: 2019-07-03 20:49:01
【Question description】:

How can I create a pandas DataFrame in the following format:

      A            B            C             D
0    [1,2,3,4]    [2,3,4,5]     [4,5,5,6]     [6,3,4,5]
1    [2,3,5,6]    [3,4,6,6]     [3,4,5,7]     [2,6,3,4]
2    [8,9,6,7]    [5,7,9,5]     [3,7,9,5]     [5,7,9,8]

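Basically, every row has a list as each of its elements. I am trying to classify the data using machine learning. Each data point has 40 x 6 values. Is there another format that is better suited as input to a classifier?

Edit: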

import pandas as pd
import numpy as np
import matplotlib.pyplot as plot

from sklearn.neighbors import KNeighborsClassifier

# Read csv data into pandas data frame
data_frame = pd.read_csv('data.csv')

extract_columns = ['LinearAccX', 'LinearAccY', 'LinearAccZ', 'Roll', 'pitch', 'compass']

# Number of sample in one shot
samples_per_shot = 40

# Calculate number of shots in dataframe
# (integer division, because range() below does not accept a float)
count_of_shots = len(data_frame.index) // samples_per_shot

# Initialize Empty data frame
training_index = range(count_of_shots)
training_data_list = []

# flag for backward compatibility
make_old_data_compatible_with_new = 0

if make_old_data_compatible_with_new:
    # Convert 40 shot data to 25 shot data
    # New logic takes 25 samples/shot
    # old logic takes 40 samples/shot
    start_shot_sample_index = 9
    end_shot_sample_index = 34
else:
    # Start index from 1 and continue till lets say 40
    start_shot_sample_index = 1
    end_shot_sample_index = samples_per_shot

# Extract each shot into pandas series
for shot in range(count_of_shots):
    # Extract current shot
    current_shot_data = data_frame[data_frame['shot_no']==(shot+1)]

    # Select only the following column
    selected_columns_from_shot = current_shot_data[extract_columns]

    # Select columns from selected rows
    # Find start and end row indexes
    current_shot_data_start_index = shot * samples_per_shot + start_shot_sample_index
    current_shot_data_end_index = shot * samples_per_shot + end_shot_sample_index
    # .ix has been removed from pandas; .loc does the same label-based slice here
    selected_rows_from_shot = selected_columns_from_shot.loc[current_shot_data_start_index:current_shot_data_end_index]

    # Append to list of lists (inside the loop, once per shot)
    # Convert the selected shot into one list per column
    training_data_list.append([selected_rows_from_shot[column].values.tolist() for column in extract_columns])

# Append each sliced shot into training data
training_data = pd.DataFrame(training_data_list, columns=extract_columns)
training_features = [1 for i in range(count_of_shots)]
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(training_data, training_features)

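【Question discussion】:

  • What have you tried? What is the input format? Include these in your question. Create a minimal reproducible example.
  • Are you sure you want a DataFrame of lists? That rarely makes sense.
  • I am trying to classify gestures using IMU data. Each gesture has 40 values each of Roll, Pitch, Yaw, AccelerationX, AccY and AccZ. All 6 columns have 40 values, and together they make up one data point. Is there a better way to represent this?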
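One common alternative to list-valued cells, offered here only as a sketch, is to give each shot one flat row of 40 x 6 = 240 numeric features. The example below uses randomly generated stand-in data rather than the real `data.csv`, and it assumes every shot has exactly 40 samples; the names `frame`, `channels`, `shots_3d` and `X` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data.csv: 3 shots x 40 samples x 6 channels.
channels = ['LinearAccX', 'LinearAccY', 'LinearAccZ', 'Roll', 'pitch', 'compass']
samples_per_shot = 40
n_shots = 3

rng = np.random.default_rng(0)
frame = pd.DataFrame(
    rng.normal(size=(n_shots * samples_per_shot, len(channels))),
    columns=channels,
)
frame['shot_no'] = np.repeat(np.arange(1, n_shots + 1), samples_per_shot)

# Assumption: every shot has exactly samples_per_shot rows.
# A stable sort keeps the sample order within each shot.
frame = frame.sort_values('shot_no', kind='stable')

# (n_shots * 40, 6) -> (n_shots, 40, 6): one 40 x 6 block per shot
shots_3d = frame[channels].to_numpy().reshape(n_shots, samples_per_shot, len(channels))

# (n_shots, 40, 6) -> (n_shots, 240): one flat feature row per shot
X = shots_3d.reshape(n_shots, -1)
```

A purely numeric 2-D array like `X` (one row per shot) is the shape that scikit-learn estimators expect as training input.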

Tags: python pandas numpy


【Solution 1】:

Simple:

pd.DataFrame(
    [[[1, 2, 3, 4], [2, 3, 4, 5], [4, 5, 5, 6], [6, 3, 4, 5]],
     [[2, 3, 5, 6], [3, 4, 6, 6], [3, 4, 5, 7], [2, 6, 3, 4]],
     [[8, 9, 6, 7], [5, 7, 9, 5], [3, 7, 9, 5], [5, 7, 9, 8]]],
    columns=list('ABCD')
)

Or

Build a Series with a MultiIndex and unstack it:

lst = [
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [4, 5, 5, 6],
    [6, 3, 4, 5],
    [2, 3, 5, 6],
    [3, 4, 6, 6],
    [3, 4, 5, 7],
    [2, 6, 3, 4],
    [8, 9, 6, 7],
    [5, 7, 9, 5],
    [3, 7, 9, 5],
    [5, 7, 9, 8]]

pd.Series(lst, pd.MultiIndex.from_product([[0, 1, 2], list('ABCD')])).unstack()

              A             B             C             D
0  [1, 2, 3, 4]  [2, 3, 4, 5]  [4, 5, 5, 6]  [6, 3, 4, 5]
1  [2, 3, 5, 6]  [3, 4, 6, 6]  [3, 4, 5, 7]  [2, 6, 3, 4]
2  [8, 9, 6, 7]  [5, 7, 9, 5]  [3, 7, 9, 5]  [5, 7, 9, 8]
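To make the MultiIndex step easier to follow, here is a smaller sketch of the same idea with two rows and two columns. `MultiIndex.from_product` pairs every row label with every column label in order, so the flat list has to be ordered row by row. The names `index`, `cells`, `series` and `wide` are illustrative only.

```python
import pandas as pd

# Row labels 0..1 crossed with column labels 'A'..'B'
index = pd.MultiIndex.from_product([[0, 1], list('AB')])
# The pairs come out as (0, 'A'), (0, 'B'), (1, 'A'), (1, 'B')

# One list per (row, column) pair, in that same order
cells = [[1, 2], [3, 4], [5, 6], [7, 8]]
series = pd.Series(cells, index=index)

# unstack() moves the inner index level ('A', 'B') into the columns
wide = series.unstack()
```

After the call, `wide` has rows 0 and 1 and columns A and B, with each cell still holding its original list.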

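【Discussion】:

  • I am trying to perform binary classification. The label list for the data above is just [1,1,...1]. When I feed the DataFrame above and the training labels into a KNN classifier, I get the error "ValueError: setting an array element with a sequence". The training data and the training labels have the same number of rows.
  • Yes, this format is atypical. I trusted that you knew what you wanted. But I don't know what you need, format-wise. You should post another question with those details.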
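The error in the comment above is consistent with scikit-learn trying to convert list-valued cells into a single numeric array. One possible workaround, sketched here under the assumption that every list has the same length, is to unpack the cells into a plain 2-D array before fitting. The names `df`, `cube` and `X` are illustrative.

```python
import numpy as np
import pandas as pd

# Same DataFrame of lists as in the answer above
df = pd.DataFrame(
    [[[1, 2, 3, 4], [2, 3, 4, 5], [4, 5, 5, 6], [6, 3, 4, 5]],
     [[2, 3, 5, 6], [3, 4, 6, 6], [3, 4, 5, 7], [2, 6, 3, 4]],
     [[8, 9, 6, 7], [5, 7, 9, 5], [3, 7, 9, 5], [5, 7, 9, 8]]],
    columns=list('ABCD'),
)

# Assumption: all lists have equal length, so the nested lists
# form a regular (rows, columns, list_length) numeric array.
cube = np.array(df.to_numpy().tolist())

# Flatten each row's lists into one feature vector: (rows, columns * list_length)
X = cube.reshape(len(df), -1)
```

`X` is now an ordinary numeric array with one row per data point, which can be passed to a classifier's `fit` method together with the labels in place of the DataFrame of lists.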
【Solution 2】:

You can try this.

import pandas as pd

data = [{'A': [1,2,3,4], 'B': [2,3,4,5], 'C': [4,5,5,6], 'D': [6,3,4,5]}, {'A': [2,3,5,6], 'B': [3,4,6,6], 'C': [3,4,5,7], 'D': [2,6,3,4]}, {'A': [8,9,6,7], 'B': [5,7,9,5], 'C': [3,7,9,5], 'D': [5,7,9,8]}]
df = pd.DataFrame(data)
print(df)

# Output
              A             B             C             D
0  [1, 2, 3, 4]  [2, 3, 4, 5]  [4, 5, 5, 6]  [6, 3, 4, 5]
1  [2, 3, 5, 6]  [3, 4, 6, 6]  [3, 4, 5, 7]  [2, 6, 3, 4]
2  [8, 9, 6, 7]  [5, 7, 9, 5]  [3, 7, 9, 5]  [5, 7, 9, 8]

【Discussion】: