【Question Title】: Changing the input after the first function - how to stop this?
【Posted】: 2019-12-09 10:46:29
【Question】:

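I am trying to create two functions in the same script. However, when I import them into my main script and run the first one, finalGrade(grades), grades has been changed and now has an extra column, 'Final Grade'. How can I prevent this?

In the first function I use another function that returns the DataFrame with each student's final grade, like this: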

# Made Andreas Døssing Mortensen s184507 & Mads Westergaard s180799

import numpy as np
import pandas as pd
from roundGrade import roundGrade


def computeFinalGrades(grades):
    #making an array from dataFrame
    arr=grades.values
    #deleting the first two columns and sorting each row (lowest grade first)
    gradesarr=np.sort(np.delete(arr,(0,1),axis=1))
    #setting up an array for the final grades
    gradesInList=np.zeros(len(gradesarr))
    for i in range(len(gradesarr)):
        #if -3 is in a row, the final grade should be = -3
        if -3 in gradesarr[i]:
            gradesInList[i]= -3
        #if there are 2 or more grades in a row, execute the procedure below
        elif len(gradesarr[i])>=2:
            #delete the smallest grade (the first one, since the row is sorted) and take the mean
            finalgrade = np.mean(gradesarr[i][1:])
            #round to a valid grade and store it
            gradesInList[i] = roundGrade(finalgrade)
        #if there is only one value in the row, use it as the final grade
        elif len(gradesarr[i])==1:
            gradesInList[i] = gradesarr[i][0]

    #Setting up the DataFrame again
    df = pd.DataFrame(grades)
    #Adding final grades to DataFrame
    df['Final Grade']=gradesInList
    #Show all columns
    pd.set_option('display.max_columns', None)
    gradesFinal=df

    return gradesFinal


My plotting functions look like this:

#Importing add-ins
import matplotlib.pyplot as plt
import numpy as np
#import pandas as pd

#importing function
from computeFinalGrades import computeFinalGrades



def finalGrade(grades):


    #Counting number of occurrences of each grade and set labels for x,y
    computeFinalGrades(grades)['Final Grade'].value_counts().sort_index().plot(kind="bar",title="Final grades").set(xlabel='Grades',ylabel='Count')


    #show plot
    plt.show()

    return



def assignmentGrades(grades):

    #create an array with all grades
    array = grades.values
    #Deleting "Name" and "StudentID" from the array (no sorting, so each column stays one assignment)
    gradesarray = np.delete(array,(0,1),axis=1).astype(float)
    #number of assignments defines the max x-value
    num_ass = gradesarray.shape[1]
    #x-axis positions: one per assignment
    x = np.arange(1,num_ass+1)
    #making a for loop, to iterate through the students
    for i in range(len(gradesarray)):
        #add a separate random jiggle in [-0.1,0.1] to every point on both axes
        xj = x + np.random.uniform(-0.1,0.1,size=num_ass)
        yj = gradesarray[i,:] + np.random.uniform(-0.1,0.1,size=num_ass)
        #Plotting the x,y "o" for creating scatterplot
        plt.plot(xj, yj,"o")

    #Mean grade of each assignment
    meangrade = np.mean(gradesarray,0)
    #Plotting the mean grade as a line
    plt.plot(x,meangrade)
    #Set labels for x,y
    plt.xlabel('Assignments')
    plt.ylabel('Grades')
    #show plot
    plt.show()

    return

I hope someone can help with this simple problem; I can't seem to figure it out.

【Question discussion】:

    Tags: python pandas numpy dataframe matplotlib


    【Solution 1】:

    I'd like to show you how I found the solution to your problem:

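    • You described the behaviour: grades changes after calling finalGrade (which calls computeFinalGrades) and then has an additional column, Final Grade.
    • Since you don't change grades directly in computeFinalGrades, you are probably passing a reference to it to another variable and then changing that variable. You do this twice:
      • once with arr = grades.values
      • once with df = pd.DataFrame(grades)
      • and then you add the Final Grade column to df
    • I got suspicious and found that grades is probably a pd.DataFrame, and that the constructor, when given another pd.DataFrame, does not make a copy. So in the end df and grades operate on the same object: when you change one, you change the other.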

    To confirm this, I wrote the following code:

    import pandas as pd

    a = pd.DataFrame({ "first": [1, 2], "second": [3, 4]})
    b = pd.DataFrame(a)
    # b = pd.DataFrame(a.copy()) # Fix
    
    b['new'] = [5, 6]
    print(a)
    print(b)
    

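    It did indeed print a and b both changed. (This was with the pandas version of the time; newer pandas releases may make a shallow copy in the constructor, so you might not reproduce this exact output.)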

       first  second  new
    0      1       3    5
    1      2       4    6
       first  second  new
    0      1       3    5
    1      2       4    6
    

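    To fix your code, use a copy of grades (grades.copy()) when creating df, for example df = pd.DataFrame(grades.copy()) or simply df = grades.copy().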
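    Applied to the question's function, the fix might look like the sketch below. It is a simplified, self-contained version, not the asker's exact code: round_grade is a hypothetical stand-in for roundGrade (assumed here to round to the nearest grade on the Danish 7-point scale), and the grade columns are assumed to be every column after the first two.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's roundGrade (an assumption):
# round to the nearest grade on the Danish 7-point scale.
SCALE = np.array([-3, 0, 2, 4, 7, 10, 12])

def round_grade(value):
    return SCALE[np.argmin(np.abs(SCALE - value))]

def compute_final_grades(grades):
    # Work on an explicit copy so the caller's DataFrame is never modified
    gradesFinal = grades.copy()
    # Assume the first two columns are StudentID and Name; sort each row
    scores = np.sort(grades.iloc[:, 2:].to_numpy(dtype=float), axis=1)
    final = []
    for row in scores:
        if -3 in row:
            final.append(-3)                           # any -3 gives -3
        elif len(row) >= 2:
            final.append(round_grade(row[1:].mean()))  # drop lowest grade
        else:
            final.append(row[0])                       # single grade
    gradesFinal['Final Grade'] = final
    return gradesFinal

grades = pd.DataFrame({"StudentID": ["s1", "s2"], "Name": ["A", "B"],
                       "A1": [4, -3], "A2": [10, 12], "A3": [10, 12]})
result = compute_final_grades(grades)
print(grades.columns.tolist())        # ['StudentID', 'Name', 'A1', 'A2', 'A3']
print(result['Final Grade'].tolist())  # [10, -3]
```

    Because the copy is made inside the function, callers such as finalGrade no longer need to remember to copy before calling it.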

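    What could have helped you solve this yourself is a debugger, which lets you step through the code and inspect the current values of variables. That way you would have seen when grades changes and could have drawn your conclusions from there.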
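    Besides a debugger, a quick way to catch this kind of side effect is to snapshot the input before the call and compare it afterwards. The sketch below uses pandas.testing.assert_frame_equal; add_final_grade and leaves_input_untouched are hypothetical helpers, with add_final_grade deliberately mutating its argument the way computeFinalGrades did.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

def add_final_grade(grades):
    # Hypothetical function that mutates its argument on purpose
    grades['Final Grade'] = 0
    return grades

def leaves_input_untouched(func, df):
    """Return True if calling func(df) does not modify df."""
    before = df.copy()        # deep snapshot of the input
    func(df)
    try:
        assert_frame_equal(before, df)
        return True
    except AssertionError:
        return False

grades = pd.DataFrame({"StudentID": ["s1"], "Name": ["A"], "A1": [7]})
# Copying inside the call protects the original
print(leaves_input_untouched(lambda g: add_final_grade(g.copy()), grades))  # True
# Mutating the argument directly is detected
print(leaves_input_untouched(add_final_grade, grades))                     # False
```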


    I'd also suggest renaming df to gradesFinal, since gradesFinal = df does nothing except give the same object a second name.
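    Another option that sidesteps the shared-object problem is DataFrame.assign, which returns a new DataFrame with the added column and leaves the original unchanged. A minimal sketch using the same toy frame as the test code in this answer:

```python
import pandas as pd

a = pd.DataFrame({"first": [1, 2], "second": [3, 4]})
# assign returns a new DataFrame; 'a' itself is not modified.
# A dict is unpacked because the column name contains a space.
b = a.assign(**{"Final Grade": [5, 6]})
print(a.columns.tolist())  # ['first', 'second']
print(b.columns.tolist())  # ['first', 'second', 'Final Grade']
```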
