【Question title】: Combining two pandas dataframes together Python [duplicate]
【Posted】: 2021-11-08 05:14:42
【Question description】:

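The code below computes the statistics named in list_val for the values vals and vals_2, which are associated with the dates month_changes and month_changes_2. It groups the values by year and calculates either 'mean', 'median' or 'max', 'min' for each year. I want to change the code so that the two outputs, graph and graph_2, are combined into a single dataframe that matches the expected output below. How can I do that? The code below comes from an answer to this question: link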

Code:

import numpy as np 
import pandas as pd 

month_changes = np.array(["2018-04-01 00:00:00", "2018-05-01 00:00:00", "2019-03-01 00:00:00", "2019-04-01 00:00:00","2019-08-01 00:00:00", "2019-11-01 00:00:00", "2019-12-01 00:00:00","2021-01-01 00:00:00"]) 
vals = np.array([10, 23, 45, 4,5,12,4,-6])

month_changes_2 = np.array(["2018-04-06 00:00:00", "2018-05-13 00:00:00", "2018-03-01 00:00:00", "2019-02-01 00:00:00","2019-03-12 00:00:00", "2019-12-01 00:00:00", "2019-12-22 00:00:00","2020-04-01 00:00:00","2021-01-01 00:00:00"]) 
vals_2 = np.array([140, 213, 15, 4,53,1,42,-63,120])

list_val = ['mean', 'median', 'max', 'min']
def yearly_intervals(mc, vs, start_year, end_year,series_val):
    print(series_val)
    data = pd.DataFrame({
        "Date": pd.to_datetime(mc),  # Convert to_datetime immediately
        "Averages": vs
    })
    out = (
        data.groupby(data["Date"].dt.year)["Averages"]  # Access Series
            .agg(list_val[series_val[0]:series_val[-1]])
            .rename(columns=lambda x: 'Average' if x == 'mean' else x.title())
    )
    # If start_year
    if start_year is not None:
        # Reindex to ensure index contains all years in range
        out = out.reindex(range(
            start_year,
            # Use last year (maximum value) from index or user defined arg
            (end_year if end_year is not None else out.index.max()) + 1
        ), fill_value=0)
    return out

graph= yearly_intervals(month_changes, vals, start_year=2016, end_year=2021,series_val=[0,2])
graph_2= yearly_intervals(month_changes_2, vals_2, start_year=2016, end_year=2021,series_val = [2,4])

Output:

      Average  Median
Date                 
2016      0.0     0.0
2017      0.0     0.0
2018     16.5    16.5
2019     14.0     5.0
2020      0.0     0.0
2021     -6.0    -6.0

      Max  Min
Date          
2016    0    0
2017    0    0
2018  213   15
2019   53    1
2020  -63  -63
2021  120  120

Expected output:

      Average  Median  Max  Min
Date
2016      0.0     0.0    0    0
2017      0.0     0.0    0    0
2018     16.5    16.5  213   15
2019     14.0     5.0   53    1
2020      0.0     0.0  -63  -63
2021     -6.0    -6.0  120  120

【Question comments】:

  • df1.join(df2)?

Tags: python arrays pandas numpy datetime


【Solution 1】:

Something like this?


import pandas as pd
df1 = pd.DataFrame({
    'Average' : [0.0, 0.0, 16.5],
    'Median' : [0.0, 0.0, 16.5]
}, index=[2016, 2017, 2018])

df2 = pd.DataFrame({
    'Max' : [0, 0, 213],
    'Min' : [0, 0, 15]
}, index= [2016, 2017, 2018])


print(df1)
print(df2)

df = pd.concat([df1, df2], axis=1)

print(df)
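
One point worth spelling out: pd.concat with axis=1 lines rows up by index label, not by position. The sketch below uses two small hypothetical frames (left and right are illustrative names, not from the question) whose years only partly overlap, to show what happens when the indexes differ.

```python
import pandas as pd

# Hypothetical frames whose year indexes only partly overlap.
left = pd.DataFrame({"Average": [0.0, 16.5]}, index=[2017, 2018])
right = pd.DataFrame({"Max": [213, 53]}, index=[2018, 2019])

# Default join="outer": keep every year seen in either frame; gaps become NaN.
outer = pd.concat([left, right], axis=1)

# join="inner": keep only the years present in both frames.
inner = pd.concat([left, right], axis=1, join="inner")

print(outer)
print(inner)
```

In the question both frames are reindexed to the same range of years, so neither case arises there and the default behaviour is enough.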

【Comments】:

【Solution 2】:

I assume you have already created and processed the two dataframes graph and graph_2.

Try this:

combined_df = pd.concat([graph, graph_2], axis=1)
print(combined_df)

It will output:

      Average  Median  Max  Min
Date
2016      0.0     0.0    0    0
2017      0.0     0.0    0    0
2018     16.5    16.5  213   15
2019     14.0     5.0   53    1
2020      0.0     0.0  -63  -63
2021     -6.0    -6.0  120  120
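
Since graph and graph_2 are computed from different inputs (vals and vals_2), it may help to record which input each column came from. A minimal sketch, assuming the two frames look like the outputs shown in the question; the key labels "vals" and "vals_2" are chosen here for illustration only:

```python
import pandas as pd

# Rebuild the two result frames from the output shown in the question.
years = pd.Index(range(2016, 2022), name="Date")
graph = pd.DataFrame(
    {"Average": [0.0, 0.0, 16.5, 14.0, 0.0, -6.0],
     "Median": [0.0, 0.0, 16.5, 5.0, 0.0, -6.0]},
    index=years,
)
graph_2 = pd.DataFrame(
    {"Max": [0, 0, 213, 53, -63, 120],
     "Min": [0, 0, 15, 1, -63, 120]},
    index=years,
)

# keys= adds an outer column level naming the source of each block.
# The labels "vals" and "vals_2" are illustrative, not part of the question.
labelled = pd.concat([graph, graph_2], axis=1, keys=["vals", "vals_2"])

print(labelled)
print(labelled["vals_2"])  # select every column that came from graph_2
```

The result has two-level column labels, so this only fits if the flat layout of the expected output is not a hard requirement.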
    

【Comments】:

【Solution 3】:

Just keep your existing code and run graph.join(graph_2):

import numpy as np
import pandas as pd

month_changes = np.array(["2018-04-01 00:00:00", "2018-05-01 00:00:00", "2019-03-01 00:00:00", "2019-04-01 00:00:00","2019-08-01 00:00:00", "2019-11-01 00:00:00", "2019-12-01 00:00:00","2021-01-01 00:00:00"])
vals = np.array([10, 23, 45, 4,5,12,4,-6])

month_changes_2 = np.array(["2018-04-06 00:00:00", "2018-05-13 00:00:00", "2018-03-01 00:00:00", "2019-02-01 00:00:00","2019-03-12 00:00:00", "2019-12-01 00:00:00", "2019-12-22 00:00:00","2020-04-01 00:00:00","2021-01-01 00:00:00"])
vals_2 = np.array([140, 213, 15, 4,53,1,42,-63,120])

list_val = ['mean', 'median', 'max', 'min']
def yearly_intervals(mc, vs, start_year, end_year,series_val):
    print(series_val)
    data = pd.DataFrame({
        "Date": pd.to_datetime(mc),  # Convert to_datetime immediately
        "Averages": vs
    })
    out = (
        data.groupby(data["Date"].dt.year)["Averages"]  # Access Series
            .agg(list_val[series_val[0]:series_val[-1]])
            .rename(columns=lambda x: 'Average' if x == 'mean' else x.title())
    )
    # If start_year
    if start_year is not None:
        # Reindex to ensure index contains all years in range
        out = out.reindex(range(
            start_year,
            # Use last year (maximum value) from index or user defined arg
            (end_year if end_year is not None else out.index.max()) + 1
        ), fill_value=0)
    return out

graph= yearly_intervals(month_changes, vals, start_year=2016, end_year=2021,series_val=[0,2])
graph_2= yearly_intervals(month_changes_2, vals_2, start_year=2016, end_year=2021,series_val = [2,4])

print(graph.join(graph_2))

This prints:

[0, 2]
[2, 4]

      Average  Median  Max  Min
Date
2016      0.0     0.0    0    0
2017      0.0     0.0    0    0
2018     16.5    16.5  213   15
2019     14.0     5.0   53    1
2020      0.0     0.0  -63  -63
2021     -6.0    -6.0  120  120
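
For frames that share the same index and have no column names in common, join and pd.concat(..., axis=1) should give the same result. The sketch below checks that on a two-year excerpt of the question's data, and also shows the case where join needs extra help: overlapping column names. The suffixes "_a" and "_b" are arbitrary choices for illustration.

```python
import pandas as pd

# Two-year excerpt of the frames from the question.
idx = pd.Index([2018, 2019], name="Date")
graph = pd.DataFrame({"Average": [16.5, 14.0], "Median": [16.5, 5.0]}, index=idx)
graph_2 = pd.DataFrame({"Max": [213, 53], "Min": [15, 1]}, index=idx)

joined = graph.join(graph_2)
concatenated = pd.concat([graph, graph_2], axis=1)

# Raises AssertionError if the two results differ in any way.
pd.testing.assert_frame_equal(joined, concatenated)

# join refuses to guess when both frames share column names ...
try:
    graph.join(graph)
    overlap_error = None
except ValueError as exc:
    overlap_error = str(exc)

# ... unless suffixes are given to tell the columns apart.
suffixed = graph.join(graph, lsuffix="_a", rsuffix="_b")
print(suffixed.columns.tolist())
```

Note that join is a left join by default, so years missing from the left frame are dropped, whereas concat keeps the union of both indexes unless join="inner" is passed.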
      

【Comments】:
