【Question title】: How to load a Python DataFrame into a GitHub repository as a CSV file?
【Posted】: 2020-07-21 09:16:36
【Question description】:

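I need to deploy a Dash application on a server, and I am using GitHub as the data repository. All of the processed data needs to be stored on GitHub so that my Dash application can access it.

Every solution I have found requires saving the DataFrame as a local CSV file and then committing that file to GitHub. That is not possible in my case: I need to commit the DataFrame to GitHub as a CSV directly.

Thanks in advance for your help.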

【Comments】:

    Tags: python github github-api


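    【Solution 1】:

    The trick is to convert your pandas DataFrame to text and then upload that text as the file's contents. This answer was very helpful: https://stackoverflow.com/a/50072113/7375722

    I am sharing the code I currently use: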

    #Import required packages
    import pandas as pd
    from github import Github
    from github import InputGitTreeElement
    from datetime import datetime
    
    #create test pd df to upload
    d = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
    df = pd.DataFrame(d)
    #convert pd.df to text. This avoids writing the file as csv to local and again reading it
    df2 = df.to_csv(sep=',', index=False)
    
    #list files to upload and desired file names with which you want to save on GitHub
    file_list = [df2,df2]
    file_names = ['Test.csv','Test2.csv']
    
    #Specify commit message
    commit_message = 'Test Python'
    
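    #Create connection with GitHub
    #Note: GitHub no longer accepts account passwords for API requests; supply a personal access token as the password value
    user = "{your-user-id}"
    password = "{your-password}"
    g = Github(user,password)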
    
    #Get list of repos (read-only; the original loop also called repo.edit(has_wiki=False), which disabled the wiki on every repo)
    for repo in g.get_user().get_repos():
        print(repo.name)
    
    #Create connection with desired repo
    repo = g.get_user().get_repo('{your-repo-name}')
    
    #Check files under the selected repo
    x = repo.get_contents("")
    for labels in x:
        print(labels)
    x = repo.get_contents("Test.csv") #read a specific file from your repo (raises an error if the file does not exist yet)
    
    #Get available branches in your repo
    x = repo.get_git_refs()
    for y in x:
        print(y)
    # output eg:- GitRef(ref="refs/heads/master")
    
    #Select required branch where you want to upload your file.
    master_ref = repo.get_git_ref("heads/master")
    
    #Finally, putting everything in a function to make it re-usable
    
    def updategitfiles(file_names, file_list, userid, pwd, Repo, branch, commit_message=""):
        # Default to a timestamped commit message
        if commit_message == "":
            commit_message = "Data Updated - " + datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    
        g = Github(userid, pwd)
        repo = g.get_user().get_repo(Repo)
        # Resolve the branch head and use its tree as the base for the new tree
        master_ref = repo.get_git_ref("heads/" + branch)
        master_sha = master_ref.object.sha
        base_tree = repo.get_git_tree(master_sha)
        # One tree element per file: path, mode (regular file), type, and text content
        element_list = list()
        for name, content in zip(file_names, file_list):
            element = InputGitTreeElement(name, '100644', 'blob', content)
            element_list.append(element)
        # Create the tree and the commit, then move the branch ref to the new commit
        tree = repo.create_git_tree(element_list, base_tree)
        parent = repo.get_git_commit(master_sha)
        commit = repo.create_git_commit(commit_message, tree, [parent])
        master_ref.edit(commit.sha)
        print('Update complete')
    
    updategitfiles(file_names,file_list,user,password,'{your-repo-name}','{your-branch-name}')
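
If only one CSV changes per commit, a shorter route may be enough. The following is a minimal sketch, not part of the original answer, that swaps the tree/commit calls above for PyGithub's `get_contents`, `create_file` and `update_file` methods. The helper name `upsert_text_file` is hypothetical, `repo` is assumed to be a PyGithub `Repository` object, and the parent directory of `path` is assumed to already exist on the branch.

```python
import posixpath


def upsert_text_file(repo, path, text, message, branch="master"):
    """Create `path` on `branch` if it is missing, otherwise overwrite it.

    `repo` is assumed to be a PyGithub Repository (anything exposing
    get_contents, create_file and update_file with the same arguments works).
    Returns "created" or "updated".
    """
    # List the parent directory ("" means the repository root) and look
    # for an entry whose full path matches the target file.
    parent = posixpath.dirname(path)
    listing = repo.get_contents(parent, ref=branch)
    existing = next((f for f in listing if f.path == path), None)

    if existing is None:
        # No such file yet: a plain create is enough.
        repo.create_file(path, message, text, branch=branch)
        return "created"

    # Updating requires the blob SHA of the version being replaced.
    repo.update_file(path, message, text, existing.sha, branch=branch)
    return "updated"


# Hypothetical usage with a DataFrame `df` and a PyGithub repo object:
#   csv_text = df.to_csv(index=False)   # no path argument, so a string is returned
#   upsert_text_file(repo, "Test.csv", csv_text, "Data updated", branch="master")
```

Each call produces one commit per file, so the tree-based function above is still the better fit when several files should land in a single commit.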
    
    

    【Comments】:
