Accessing Microsoft SharePoint files and data using Python: answers

【Question title】: Accessing Microsoft Sharepoint files and data using Python
【Posted】: 2020-05-15 16:21:30
【Question description】:

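I am working with Microsoft SharePoint. I have a URL, and from that URL I need to fetch all of the data behind it: photos, videos, folders, subfolders, files, posts, and so on. I then need to store this data in a database (SQL Server). I am using Python.

Could anyone suggest how to do this? I am a beginner at accessing SharePoint and at this kind of work.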

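【Question discussion】:

Welcome to Stack Overflow! Could you explain what you have tried so far and where you started? To get a proper answer, you also need to show some effort of your own.
I have the URL and, using the Microsoft Graph API, I tried to fetch the data behind it, but I could not retrieve all of it. When I open the URL in a browser I can see the information I need, but I do not know how to fetch that data and store it in my database.

Tags: python sharepoint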

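【Solution 1】:

Here is starter code for connecting to SharePoint from Python and for accessing the file list, the folder list, and the content of an individual file. You can build on it to suit your needs.

Note that this approach works for public SharePoint sites that are reachable over the Internet. I have not tested this code against organization-restricted SharePoint sites hosted on a corporate intranet.

You have to modify the link to the SharePoint file slightly, because a file URL copied from a web browser cannot be used directly to access the SharePoint file from Python.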
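As a rough illustration of that link modification, the sketch below turns a URL copied from the browser into the server-relative form used in the code that follows. It assumes the browser URL either points straight at the item or carries the item path in an `id` query parameter (as library view links often do); the function name `server_relative_url` and the `contoso` URLs are made up for this example, and other link shapes (sharing links, for instance) are not handled.

```python
from urllib.parse import urlparse, parse_qs, quote


def server_relative_url(browser_url):
    """Return the server-relative path for a SharePoint URL copied from a browser.

    Hypothetical helper: only handles direct item URLs and view URLs
    that carry the item path in an "id" query parameter.
    """
    parts = urlparse(browser_url)
    query = parse_qs(parts.query)
    if "id" in query:
        # parse_qs percent-decodes the value, so re-encode spaces etc.
        return quote(query["id"][0])
    # A direct URL already contains the (percent-encoded) path
    return parts.path


direct = "https://contoso.sharepoint.com/sites/Team/Shared%20Documents/Reports/plan.docx"
view = ("https://contoso.sharepoint.com/sites/Team/Shared%20Documents/Forms/"
        "AllItems.aspx?id=%2Fsites%2FTeam%2FShared%20Documents%2FReports")

print(server_relative_url(direct))
print(server_relative_url(view))
```

The returned strings have the same shape as `folder_url_shrpt` and `file_url_shrpt` in the code below.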


# Requires the Office365-REST-Python-Client package:
#   pip install Office365-REST-Python-Client
from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File

####inputs########
# This will be the URL that points to your sharepoint site. 
# Make sure you change only the parts of the link that start with "Your"
url_shrpt = 'https://YourOrganisation.sharepoint.com/sites/YourSharepointSiteName'
username_shrpt = 'YourUsername'
password_shrpt = 'YourPassword'
folder_url_shrpt = '/sites/YourSharepointSiteName/Shared%20Documents/YourSharepointFolderName/'

#######################



# Authenticate against your SharePoint site
ctx_auth = AuthenticationContext(url_shrpt)
if ctx_auth.acquire_token_for_user(username_shrpt, password_shrpt):
  ctx = ClientContext(url_shrpt, ctx_auth)
  web = ctx.web
  ctx.load(web)
  ctx.execute_query()
  print('Authenticated into sharepoint as: ',web.properties['Title'])
else:
  # Without a token there is no ctx, so the rest of the script cannot run
  raise SystemExit(ctx_auth.get_last_error())

# Return the names of the files in a SharePoint folder.
# To get folder names instead of file names, change "folder.files"
# to "folder.folders" in the function below.
def print_folder_contents(ctx, folder_url):
    try:
        folder = ctx.web.get_folder_by_server_relative_url(folder_url)
        fold_names = []
        sub_folders = folder.files  # Replace files with folders to get the list of folders
        ctx.load(sub_folders)
        ctx.execute_query()

        for s_folder in sub_folders:
            fold_names.append(s_folder.properties["Name"])

        return fold_names

    except Exception as e:
        print('Problem printing out library contents: ', e)
        return []  # Keep the return type consistent when the request fails
  
  
# Call the function by giving your folder URL as input  
filelist_shrpt=print_folder_contents(ctx,folder_url_shrpt) 

#Print the list of files present in the folder
print(filelist_shrpt)

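Now that we can retrieve and print the list of files in a specific SharePoint folder, the code below reads the content of a specific file and saves it to the local disk under a known file name and path.

#Specify the URL of the sharepoint file. Remember to change only the parts of the link that start with "Your"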
file_url_shrpt = '/sites/YourSharepointSiteName/Shared%20Documents/YourSharepointFolderName/YourSharepointFileName'

#Load the sharepoint file content to "response" variable
response = File.open_binary(ctx, file_url_shrpt)

#Save the file to your offline path
with open("Your_Offline_File_Path", 'wb') as output_file:  
    output_file.write(response.content)
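To download every file returned by the listing step rather than a single one, the file URLs and local paths can be derived from the folder URL and the file names. The following is only a sketch of that bookkeeping: `build_download_plan` and the `downloads` directory are hypothetical names, and the sample file names are invented.

```python
from pathlib import Path
from urllib.parse import quote


def build_download_plan(folder_url, file_names, local_dir):
    """Pair each file's server-relative URL with a local target path.

    Hypothetical helper: folder_url is a server-relative folder URL and
    file_names is a list of plain file names such as the listing returns.
    """
    folder_url = folder_url.rstrip("/")
    plan = []
    for name in file_names:
        # File names may contain spaces, so percent-encode them for the URL
        file_url = f"{folder_url}/{quote(name)}"
        plan.append((file_url, Path(local_dir) / name))
    return plan


plan = build_download_plan(
    "/sites/YourSharepointSiteName/Shared%20Documents/YourSharepointFolderName/",
    ["notes.txt", "Q1 report.xlsx"],
    "downloads",
)
for file_url, local_path in plan:
    print(file_url, "->", local_path)
```

Each `file_url` can then be passed to `File.open_binary(ctx, file_url)` and the response content written to `local_path`, exactly as in the single-file code above.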

To connect to SQL Server and store the content in a table, you can refer to the following link: Connecting to Microsoft SQL server using Python

https://datatofish.com/how-to-connect-python-to-sql-server-using-pyodbc/
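As a rough idea of what the storage step could look like, the sketch below inserts a file name and its bytes into a table through Python's DB-API. To stay runnable without a database server it uses the standard library's `sqlite3` as a stand-in; with SQL Server you would open the connection with `pyodbc` instead, which accepts the same `?` placeholders. The table name `sharepoint_files`, its columns, and the sample bytes are assumptions made for this example.

```python
import sqlite3


def store_file(conn, file_name, content):
    """Insert one downloaded file into the (hypothetical) sharepoint_files table."""
    cur = conn.cursor()
    # Parameterised query: the driver handles quoting and binary data
    cur.execute(
        "INSERT INTO sharepoint_files (file_name, content) VALUES (?, ?)",
        (file_name, content),
    )
    conn.commit()


# Stand-in database; for SQL Server, replace this with a pyodbc connection
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sharepoint_files (file_name TEXT, content BLOB)")

# In the real script, content would be response.content from File.open_binary
store_file(conn, "report.pdf", b"%PDF-1.4 example bytes")

rows = conn.execute(
    "SELECT file_name, length(content) FROM sharepoint_files"
).fetchall()
print(rows)
```

On SQL Server the `CREATE TABLE` statement would need SQL Server column types (for example a variable-length binary column for the content), so treat the schema here as a placeholder.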

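【Discussion】:

Thank you very much for the information. In my SharePoint, however, the documents sit behind a single URL together with several subsites and so on. When I visit the site, the content does not appear as folders but as posts/discussions. Could you say something about this case and how to fetch that data?
If all you have is a URL link to the SharePoint document, you have to extract the following parameters from it: "YourOrganisation", "YourSharepointSiteName", "YourSharepointFolderName" and "YourSharepointFileName". All of these parameters are embedded in the SharePoint link itself, so parse the URL, extract them, and then run the script above. A simple inspection of your SharePoint link gives you all of these details.
Please help me extract data that appears as dialog boxes/paragraphs (again, in box form). It is similar to a Quora page (quora.com/topic/Fitness). I cannot share my SharePoint data or details with you, so I have only given a link that resembles my page. Could you explain how to fetch that data?
Dear @sai. There is no single solution that extracts both files and posts from a SharePoint link. These are two different kinds of content and need to be handled differently. For file extraction, the solution I gave you works well. To extract post content, however, you have to use web scraping with Python's BeautifulSoup package. Web scraping is the technique needed to extract posts and any other content from a web page, and BeautifulSoup handles it well; have a look at dataquest.io/blog/web-scraping-beautifulsoup
Thank you very much for the suggestions and information.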
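To make the scraping suggestion from the discussion a little more concrete, here is a minimal sketch that pulls the text out of post-like boxes in an HTML page. It deliberately swaps BeautifulSoup for the standard library's `html.parser` so that it runs without extra packages; BeautifulSoup would express the same idea in fewer lines. The `post` class name and the sample HTML are invented, and a real SharePoint page would first have to be fetched with an authenticated request and may not contain its posts in the raw HTML at all.

```python
from html.parser import HTMLParser


class PostExtractor(HTMLParser):
    """Collect the text of every <div class="post"> element (hypothetical class name)."""

    def __init__(self):
        super().__init__()
        self.posts = []
        self._depth = 0    # nesting depth of <div> tags inside the current post
        self._chunks = []  # text fragments of the current post

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth:
            self._depth += 1  # nested <div> inside a post
        elif "post" in (dict(attrs).get("class") or "").split():
            self._depth = 1   # a new post starts here
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:  # the post's own <div> just closed
                self.posts.append(" ".join(self._chunks))

    def handle_data(self, data):
        if self._depth and data.strip():
            self._chunks.append(data.strip())


html = """
<div class="post"><h3>Welcome</h3><p>First post body.</p></div>
<div class="sidebar">Ignore me</div>
<div class="post"><p>Second <b>post</b> body.</p></div>
"""
parser = PostExtractor()
parser.feed(html)
print(parser.posts)
```

Each extracted string could then be stored in the database in the same way as the file contents.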

【Solution 2】:

You may want to consider Pysharepoint, which provides a simple interface for uploading files to and downloading files from SharePoint in Python.

Pysharepoint

import pysharepoint as ps

sharepoint_base_url = 'https://<abc>.sharepoint.com/'
username = 'username'
password = 'password'

site = ps.SPInterface(sharepoint_base_url,username,password)

source_path = 'Shared Documents/Shared/<Location>'
sink_path = '/full_sink_path/'
filename = 'filename.ext'
sharepoint_site = 'https://<abc>.sharepoint.com/sites/<site_name>'

site.download_file_sharepoint(source_path, sink_path,filename,sharepoint_site)
site.upload_file_sharepoint(source_path, sink_path,filename,sharepoint_site)

【Discussion】: