Organizing data from a csv file from a list: answers

【Question title】: Organizing data from csv file from a list
【Posted】: 2018-09-25 01:08:13
【Question description】:

So I need help with my code to get this output:

No Column Sum
0 Company 28
1 Booth 28
2 Full-Time 25
3 Full-Time Visa Sponsor 5
4 Part-Time 1
5 Internship 18
6 Freshman 7
7 Sophomore 9
8 Junior 17
9 Senior 24
10 Post-Bacs 17
11 MS 17
12 PhD 6
13 Alumni 15

But my current code is outputting this:

Column Sum
Company 27
Booth 27
Full-Time 27
Full-Time Visa Sponsor 27
Part-Time 27
Internship 27
Freshman 27
Sophomore 27
Junior 27
Senior 27
Post-Bacs 27
MS 27
PhD 27
Alumni 27

I have to take the information from a csv file, clean it up, and now organize it as shown above. This part of my code is below:

company_dict = {0:"Company", 1:"Booth",
                2:"Full-Time", 3:"Full-Time Visa Sponsor",
                4:"Part-Time", 5:"Internship",
                6:"Freshman", 7:"Sophomore",
                8:"Junior", 9:"Senior",
                10:"Post-Bacs", 11:"MS",
                12:"PhD", 13:"Alumni"}

                                            #Loop to organize the company_dict
for lines in company_dict:
    print(repr(lines),company_dict[lines])

keywords = ("AIG","Baylor","CGG","Citi","ExxonMobil","Flow-Cal Inc.",                   #I used a list to help me get the information I wanted from the csv file
           "Global SHop Solutions","Harris Count CTS","HCSS",
           "Hitachi Consulting", "HP Inc.","INT Inc.","JPMorgan Chase & Co",
           "Leidos","McKesson","MRE Consulting Ltd.","NetIQ","PROS",
           "San Jacinto College","SAS","Smartbridge","Sogeti USA",
           "Southwest Research Institute","The Reynolds and Reynolds Company",
           "UH Enterprise Systems","U.S. Marine Corps","ValuD Consuting LLC","Wipro")

DataList = []                                                                           #I made a blank list
with f as filterf:                                                                      #This loop will look for the keywords in the file, and only add those keywords
    output_line_counter = 0                                                             #I needed it to print with rows, so I set it to 0
    for line in filterf:
        if any(keyword in line for keyword in keywords):                                #The actual code that looks for keywords in the line in my file
            output_line_counter += 1                                                    #Adds the column (might not be necessary but it works for me)
            DataList.append(line)

CleanerData = sorted(set(DataList))                                                     #I made a new 'cleaner' list so that it would be alphabetically without spaces
line_counter = 0
for i in CleanerData:                                                                   #I had to do another loop to add rows again, it now prints what is required in the question
    line_counter += 1
    print(line_counter, i, end='')

data_employer = {'No': ('Column', 'Sum')}
for empdata in range(14):
    sum = 0
    for i in CleanerData:
        if i[empdata] != '':
            sum += 1
    data_employer[empdata] = (company_dict[empdata], sum)
for k in data_employer:
    print(list(data_employer.keys()).index(k), data_employer[k][0], data_employer[k][1])

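I really don't know where the 27 is coming from; I'm guessing it's some logic error that I'm not seeing. This is my attempt at the code, and any input would be greatly appreciated.

Thanks!

Original CSV file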

ALPHABETICAL ORDER,,,,,,,,,,,,,
,,Positions,,,,Classifications,,,,,,,
Company,Booth,Full-Time,"Full-Time Visa Sponsor",Part-Time,Internship,Freshman,Sophomore,Junior,Senior,Post-Bacs,MS,PhD,Alumni
AIG,10,,,,Yes,,,Jr,,,MS,,
Baylor College of Medicine,19,Yes,Yes,,,,,,,,,,Recent
CGG,17,Yes,Yes,,,,,,,,MS,PhD,Recent
Citi,27/28,Yes,,,Yes,,,Jr,Sr,,,,
ExxonMobil,11,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,,,
,...
Flow-Cal Inc.,16,Yes,,,Yes,,,Jr,Sr,,,,All
Global Shop Solutions,18,Yes,,,Yes,,,,Sr,PB,,,All
Harris County CTS,22,Yes,,,Yes,,,Jr,Sr,PB,MS,PhD,All
HCSS,29,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,Recent
Hitachi Consulting,13,Yes,,,,,,,Sr,,MS,,
HP Inc.,1,Yes,,,Yes,,,Jr,,,MS,,Recent
INT Inc.,20,Yes,Yes,,Yes,,,Jr,Sr,,MS,PhD,
JPMorgan Chase & Co,3,Yes,,,Yes,,,Jr,Sr,,,,
Leidos,390,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,
McKesson,26,Yes,,,,,,,Sr,,,,
,,,,,,,,,,,,,
MRE Consulting Ltd.,2,Yes,,,,,,,Sr,PB,MS,,All
NetIQ,7,,,,Yes,,Soph,Jr,Sr,PB,,,
PROS,21,Yes,,,,,,,Sr,,MS,PhD,All
San Jacinto College  ,14,,,,Yes,,Soph,Jr,Sr,PB,MS,,
SAS,4,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,Recent
Smartbridge,8,Yes,,,,,,,Sr,PB,MS,,
Sogeti USA,15,Yes,,,,,,,Sr,PB,MS,,
Southwest Research Institute,12,Yes,,,Yes,,,Jr,Sr,PB,MS,PhD,All
The Reynolds and Reynolds Company,23,Yes,Yes,,Yes,Fr,Soph,Jr,Sr,PB,,,All
UH Enterprise Systems,9,Yes,Yes,Yes,Yes,Fr,Soph,Jr,Sr,PB,MS,PhD,All
U.S. Marine Corps,25,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,All
ValuD Consuting LLC,5,Yes,,,,,,,Sr,PB,,,All
Wipro,24,Yes,,,,,,,Sr,PB,,,
BOOTH ORDER,,,,,,,,,,,,,
,Booth,Positions,,,,Classifications,,,,,,,
Company,#,Full-Time,"Full-Time
Visa Sponsor",Part-Time,Internship,Freshman,Sophomore,Junior,Senior,Post-Bacs,MS,PhD,Alumni
HP�Inc.,1,Yes,,,Yes,,,Jr,,,MS,,Recent
"MRE Consulting, Ltd.",2,Yes,,,,,,,Sr,PB,MS,,All
JPMorgan Chase & Co,3,Yes,,,Yes,,,Jr,Sr,,,,
SAS,4,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,Recent
ValuD Consuting LLC,5,Yes,,,,,,,Sr,PB,,,All
NetIQ,7,,,,Yes,,Soph,Jr,Sr,PB,,,
Smartbridge,8,Yes,,,,,,,Sr,PB,MS,,
UH Enterprise Systems,9,Yes,Yes,Yes,Yes,Fr,Soph,Jr,Sr,PB,MS,PhD,All
AIG,10,,,,Yes,,,Jr,,,MS,,
ExxonMobil,11,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,,,
Southwest Research Institute,12,Yes,,,Yes,,,Jr,Sr,PB,MS,PhD,All
Hitachi Consulting,13,Yes,,,,,,,Sr,,MS,,
San Jacinto College  ,14,,,,Yes,,Soph,Jr,Sr,PB,MS,,
Sogeti USA,15,Yes,,,,,,,Sr,PB,MS,,
"Flow-Cal, Inc.",16,Yes,,,Yes,,,Jr,Sr,,,,All
CGG,17,Yes,Yes,,,,,,,,MS,PhD,Recent
Global Shop Solutions,18,Yes,,,Yes,,,,Sr,PB,,,All
Baylor College of Medicine,19,Yes,Yes,,,,,,,,,,Recent
"INT, Inc.",20,Yes,Yes,,Yes,,,Jr,Sr,,MS,PhD,
PROS,21,Yes,,,,,,,Sr,,MS,PhD,All
Harris County CTS,22,Yes,,,Yes,,,Jr,Sr,PB,MS,PhD,All
The Reynolds and Reynolds Company,23,Yes,Yes,,Yes,Fr,Soph,Jr,Sr,PB,,,All
Wipro,24,Yes,,,,,,,Sr,PB,,,
U.S. Marine Corps,25,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,All
McKesson,26,Yes,,,,,,,Sr,,,,
Citi,27/28,Yes,,,Yes,,,Jr,Sr,,,,
HCSS,29,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,Recent
Leidos,30,Yes,,,Yes,Fr,Soph,Jr,Sr,PB,MS,,

Update: I added more code to help clarify. I'm still figuring out why it only prints 27 for every row of the list. I can't use pandas for this project.

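【Question discussion】:

Without the original format of the csv we can't help you.
Edited it in, my mistake.
Try looking at pandas for working with csv. It would probably be easier and cleaner.
What is DataList?? Can you post a print(DataList)?
"Almost" but not quite? Where does the constant in range(14) come from? Why are you making a set() of your data; doesn't that discard rows from the original data?

Tags: python list loops csv filter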

【Solution 1】:

Replace the last line print(data_employer[k][0], data_employer[k][1])

with

print(list(data_employer.keys()).index(k), data_employer[k][0], data_employer[k][1])

【Discussion】:

It does print the numbers, but the sums are still incorrect. New output: 0 Column Sum 1 Company 27 2 Booth 27 3 Full-Time 27 4 Full-Time Visa Sponsor 27 5 Part-Time 27 6 Internship 27 7 Freshman 27 8 Sophomore 27 9 Junior 27 10 Senior 27 11 Post-Bacs 27 12 MS 27 13 PhD 27 14 Alumni 27
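
A likely reason every sum comes out the same, assuming each element of CleanerData is a raw text line as the code in the question suggests: `i[empdata]` then picks a single character out of the string, and a single character is never equal to `''`, so every line is counted for every column and each sum equals the number of lines. The sketch below contrasts that with splitting each line into fields first. The two sample lines are copied from the CSV in the question; the helper names are made up for illustration.

```python
# Two sample lines standing in for the contents of CleanerData.
lines = [
    "AIG,10,,,,Yes,,,Jr,,,MS,,\n",
    "CGG,17,Yes,Yes,,,,,,,,MS,PhD,Recent\n",
]

def count_by_character(rows, position):
    # Mirrors the loop in the question: each row is a str,
    # so row[position] is one character and never ''.
    return sum(1 for row in rows if row[position] != '')

def count_by_field(rows, position):
    # Splits the line into fields first, so an empty cell really is ''.
    # A plain split(',') is only safe when no field contains a quoted comma.
    return sum(1 for row in rows if row.rstrip('\n').split(',')[position] != '')

char_counts = [count_by_character(lines, p) for p in range(14)]
field_counts = [count_by_field(lines, p) for p in range(14)]
print(char_counts)   # every entry equals len(lines)
print(field_counts)  # entries differ per column
```

Lines such as "MRE Consulting, Ltd." in the second table contain a comma inside quotes, so for the real file the csv module is probably a safer way to split fields than `split(',')`.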

【Solution 2】:

Here is a simple solution using pandas

import pandas as pd

csv_file_in = 'lines.csv'
csv_file_out = 'return.csv'

df = pd.read_csv(csv_file_in, header=2) # Read in the CSV; header=2 uses row 2 (Company, Booth, ..., Alumni) as the header

headers = list(df.columns.values) # get a list of columns to count (headers as row 2)

temp_df = pd.DataFrame([]) # create temp df

for i in headers: #iterate through the columns
    try:
        new_df = pd.DataFrame({'Sum': df[i].count().sum()}, index=[i]) # one-row holding frame for this column (overwritten on the next iteration)
        temp_df = pd.concat([new_df, temp_df]) # concat to temp_df
    except KeyError as e:
        print(e)


temp_df.to_csv(csv_file_out) #output to csv
print(temp_df)

Output

                    Sum
Alumni                   15
PhD                       6
MS                       17
Post-Bacs                17
Senior                   24
Junior                   17
Sophomore                 9
Freshman                  7
Internship               18
Part-Time                 1
Full-Time Visa Sponsor    5
Full-Time                25
Booth                    28
Company                  29

【Discussion】:

Thank you for taking the time to reply, but I don't think I can use pandas for the assignment.
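
Since pandas is ruled out, here is a minimal sketch of the same per-column count using only the standard-library csv module. It is not from the thread: the function name `column_sums`, its parameters, and the choice to stop at the "BOOTH ORDER" row (to skip the duplicated second table) are all assumptions, and the input is only a short excerpt of the CSV from the question, so the totals will not match the expected output for the full file.

```python
import csv
import io

# Short excerpt of the CSV from the question, used as stand-in input.
SAMPLE = """ALPHABETICAL ORDER,,,,,,,,,,,,,
,,Positions,,,,Classifications,,,,,,,
Company,Booth,Full-Time,"Full-Time Visa Sponsor",Part-Time,Internship,Freshman,Sophomore,Junior,Senior,Post-Bacs,MS,PhD,Alumni
AIG,10,,,,Yes,,,Jr,,,MS,,
CGG,17,Yes,Yes,,,,,,,,MS,PhD,Recent
,,,,,,,,,,,,,
Citi,27/28,Yes,,,Yes,,,Jr,Sr,,,,
BOOTH ORDER,,,,,,,,,,,,,
"""

def column_sums(handle, header_row=2, stop_marker="BOOTH ORDER"):
    """Count non-empty cells per column (hypothetical helper)."""
    rows = list(csv.reader(handle))
    header = rows[header_row]
    counts = [0] * len(header)
    for row in rows[header_row + 1:]:
        if row and row[0] == stop_marker:
            break  # assumed: ignore the second, booth-ordered table
        if not any(cell.strip() for cell in row):
            continue  # skip blank separator rows
        for index, cell in enumerate(row[:len(header)]):
            if cell.strip():
                counts[index] += 1
    return list(zip(header, counts))

result = column_sums(io.StringIO(SAMPLE))
print("No Column Sum")
for number, (name, total) in enumerate(result):
    print(number, name, total)
```

With a real file, the `io.StringIO(SAMPLE)` argument would be replaced by a handle from `open(path, newline='')`. Rows like the elided `,...` line in the question would still need their own handling, because their second cell is not empty.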