Python - Finding the average of a value in parallel dictionaries

【Question title】: Python - Finding the average of a value in parallel dictionaries
【Posted】: 2019-03-14 09:12:33
【Question】:

I have some .csv data files that need cleaning. An example of one row of the data:

u[i] = {'age': '44', 'salary': '117681.0', 'suburb': None, 'language': 'English'}

I have filtered out the data I don't want, which leaves multiple rows of relevant dictionaries. For example:

{'age': '44', 'salary': '117681.0', 'suburb': None}
{'age': '34', 'salary': '56456.0', 'suburb': 'Frankston'}
{'age': '37', 'salary': '59370.0', 'suburb': 'Richmond'}
{'age': '44', 'salary': '91399.0', 'suburb': 'Collingwood'}
{'age': '36', 'salary': '74437.0', 'suburb': 'Toorak'}
{'age': '41', 'salary': '89121.0', 'suburb': 'Frankston'}

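I now want to sum the salaries in these dictionaries to find the average salary, but I can't for the life of me figure it out.

I tried isolating just the salary values and using a counter, but I can't get it to work. I also tried building a list I could loop over, but I can't get all the values into one list. My problem is that when I isolate the values they come out as parallel values/lists, and I don't know how to work with that.

Any help is much appreciated, this is driving me crazy! Thanks!

This is my code so far, though at this stage there isn't really anything worth looking at: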

def average_salary(data, lower_age, upper_age): 
    u = dict(sorted(data_cleaned.items()))
    count = 0  

    for i in u:
        age = u[i]['age']
        sal = u[i]['salary']
        tally = 0

        if age is not None and sal is not None and lower_age < float(age) < upper_age:
            tally += float(u[i]['salary'])
            print(u[i]['salary'])

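【Comments on the question】:

While @blue_note's solution works for your specific problem, it looks like you want to perform operations on tabular data from a csv. You may therefore want to look into a more specialized library for this kind of work, such as pandas or numpy.

Tags: python list loops dictionary counter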

【Solution 1】:

Collect the salaries in a list:

salaries = [float(my_dict['salary']) for my_dict in my_dicts]
average = sum(salaries) / len(salaries)

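【Comments】:

Sorry, I'm a bit confused (new to coding): is my_dict['salary'] the equivalent of u[i]['salary'] in my code? And what exactly does my_dicts represent?
@user10276362: my_dicts is the name of the list containing all the dictionaries (if I understand correctly, probably your variable u). The rest is just a list comprehension (look it up, it's very useful for shortening code).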
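
To tie this back to the question's code, here is a minimal sketch. It assumes u is a dict of row dictionaries keyed by row index (as u[i] in the question suggests) and that the age bounds are exclusive, as in the question's comparison. The parameter name rows and the choice to return None when nothing matches are illustrative assumptions, not part of the original answer.

```python
def average_salary(rows, lower_age, upper_age):
    """Average salary of rows whose age lies strictly between the bounds."""
    salaries = [
        float(row['salary'])
        for row in rows.values()  # rows is assumed to be a dict of dicts
        if row['age'] is not None
        and row['salary'] is not None
        and lower_age < float(row['age']) < upper_age
    ]
    # Return None instead of dividing by zero when no row matches.
    return sum(salaries) / len(salaries) if salaries else None


# Sample rows taken from the question, keyed by a hypothetical row index.
u = {
    0: {'age': '44', 'salary': '117681.0', 'suburb': None},
    1: {'age': '34', 'salary': '56456.0', 'suburb': 'Frankston'},
    2: {'age': '37', 'salary': '59370.0', 'suburb': 'Richmond'},
}
print(average_salary(u, 30, 40))  # averages the rows aged 34 and 37
```

If u is already a plain list of dictionaries, iterating over rows directly instead of rows.values() should be the only change needed.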

【Solution 2】:

Assuming you have them in a list:

i = [{'age': '44', 'salary': '117681.0', 'suburb': None},
{'age': '34', 'salary': '56456.0', 'suburb': 'Frankston'},
{'age': '37', 'salary': '59370.0', 'suburb': 'Richmond'},
{'age': '44', 'salary': '91399.0', 'suburb': 'Collingwood'},
{'age': '36', 'salary': '74437.0', 'suburb': 'Toorak'},
{'age': '41', 'salary': '89121.0', 'suburb': 'Frankston'}]

age_avg = sum(int(item["age"]) for item in i) / len(i)
salary_avg = sum(float(item["salary"]) for item in i) / len(i)

print(age_avg, salary_avg)

Result:

39.333333333333336 81410.66666666667
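
The question's code checks for None before converting, so the cleaned data may contain missing salaries. As a rough sketch of one way to handle that, the version below skips missing values and uses the standard library's statistics.mean. The row with a missing salary is made up for illustration and is not from the question's data.

```python
from statistics import mean

rows = [
    {'age': '44', 'salary': '117681.0', 'suburb': None},
    {'age': '34', 'salary': None, 'suburb': 'Frankston'},  # hypothetical missing salary
    {'age': '37', 'salary': '59370.0', 'suburb': 'Richmond'},
]

# Convert only the salaries that are present; None values are left out.
salaries = [float(r['salary']) for r in rows if r['salary'] is not None]

# statistics.mean raises StatisticsError on an empty list, so an empty
# selection fails loudly rather than with a ZeroDivisionError.
salary_avg = mean(salaries)
print(salary_avg)
```

Note that the average is taken over the rows that actually have a salary, so dividing by len(rows) instead would give a different (lower) figure.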

【Comments】:

【Solution 3】:

Assuming pandas is installed (if not, install it with pip install pandas or through Anaconda), you can do this:

import pandas as pd
a=[{'age': '44', 'salary': '117681.0', 'suburb': None},
{'age': '34', 'salary': '56456.0', 'suburb': 'Frankston'},
{'age': '37', 'salary': '59370.0', 'suburb': 'Richmond'},
{'age': '44', 'salary': '91399.0', 'suburb': 'Collingwood'},
{'age': '36', 'salary': '74437.0', 'suburb': 'Toorak'},
{'age': '41', 'salary': '89121.0', 'suburb': 'Frankston'}]
df=pd.DataFrame(a)
df['salary']=pd.to_numeric(df['salary'],errors='coerce')
df['age']=pd.to_numeric(df['age'],errors='coerce')
print(df['salary'].mean())
print(df['age'].mean())

Output:

81410.66666666667
39.333333333333336

【Comments】:

【Solution 4】:

I also came up with a solution like this:

#!/usr/bin/env ipython

u = []
u.append({'age': '44', 'salary': '117681.0', 'suburb': None})
u.append({'age': '34', 'salary': '56456.0', 'suburb': 'Frankston'})
u.append({'age': '37', 'salary': '59370.0', 'suburb': 'Richmond'})
u.append({'age': '44', 'salary': '91399.0', 'suburb': 'Collingwood'})
u.append({'age': '36', 'salary': '74437.0', 'suburb': 'Toorak'})
u.append({'age': '41', 'salary': '89121.0', 'suburb': 'Frankston'})
# ------------------------------------------------------------------
def avg_salary(data, lower_age, upper_age):
    # Iterate over the data argument rather than the global u;
    # both age bounds are exclusive.
    salaries = [float(val['salary']) for val in data
                if lower_age < float(val['age']) < upper_age]
    return sum(salaries) / len(salaries)
# -------------------------------------------------------------------
print(avg_salary(u, 5, 65))

【Comments】: