【Posted】: 2019-12-12 15:11:37
【Question】:
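I'm trying to read the header of a Word document using python-docx and watchdog. Whenever a file is created or modified, the script reads the file and extracts what is in its header, but I get the error
`docx.opc.exceptions.PackageNotFoundError: Package not found at 'Test6.docx'`
I've tried everything, including opening the file as a stream, but nothing works. And yes, the document has content. For reference, here is my code.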
**main.py**
```python
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler
import watchdog.observers
import watchdog.events
import os
import re
import xml.dom.minidom
import zipfile
from docx import Document


class Watcher:
    DIRECTORY_TO_WATCH = "/path/to/my/directory"

    def __init__(self):
        self.observer = Observer()

    def run(self):
        event_handler = Handler()
        self.observer.schedule(event_handler, path='C:/Users/abdsak11/OneDrive - Lärande', recursive=True)
        self.observer.start()
        try:
            while True:
                time.sleep(5)
        except:
            self.observer.stop()
            print("Error")

        self.observer.join()


class Handler(FileSystemEventHandler):

    @staticmethod
    def on_any_event(event):
        if event.is_directory:
            return None

        elif event.event_type == 'created':
            # Take any action here when a file is first created.
            path = event.src_path
            extenstion = '.docx'
            base = os.path.basename(path)
            if extenstion in path:
                print("Received created event - %s." % event.src_path)
                time.sleep(10)
                print(base)
                doc = Document(base)
                print(doc)
                section = doc.sections[0]
                header = section.header
                print(header)

        elif event.event_type == 'modified':
            # Take any action here when a file is modified.
            path = event.src_path
            extenstion = '.docx'
            base = os.path.basename(path)
            if extenstion in base:
                print("Received modified event - %s." % event.src_path)
                time.sleep(10)
                print(base)
                doc = Document(base)
                print(doc)
                section = doc.sections[0]
                header = section.header
                print(header)


if __name__ == '__main__':
    w = Watcher()
    w.run()
```
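The error message shows only `'Test6.docx'`, with no directory. In the handler, `Document(base)` receives `os.path.basename(path)`, so the file name is resolved against the script's current working directory instead of the watched OneDrive folder, and python-docx raises `PackageNotFoundError` when the path it is given is neither a directory nor a zip file. The substring test `extenstion in path` also lets through files that are not real Word documents. Below is a minimal sketch, assuming that Word's temporary lock files start with `~$`; `docx_path_for_event` and `wait_until_zip` are hypothetical helper names, and polling `zipfile.is_zipfile` is a swapped-in heuristic replacing the fixed `time.sleep(10)`, not something python-docx or watchdog provides.

```python
import os
import time
import zipfile
from typing import Optional


def docx_path_for_event(src_path: str, is_directory: bool = False) -> Optional[str]:
    """Hypothetical helper: return an absolute .docx path worth opening, else None."""
    if is_directory:
        return None
    name = os.path.basename(src_path)
    # Check the real extension instead of a substring test like '.docx' in path.
    if not name.lower().endswith(".docx"):
        return None
    # Assumption: Word's temporary lock files start with '~$'. They are not
    # zip packages, so python-docx cannot open them.
    if name.startswith("~$"):
        return None
    # Keep the directory part; a bare file name would be resolved against the
    # current working directory, not the watched folder.
    return os.path.abspath(src_path)


def wait_until_zip(path: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Hypothetical helper: poll until `path` looks like a complete zip archive.

    zipfile.is_zipfile() returns False (instead of raising) for missing or
    unreadable files, and for files whose end-of-archive record is not written
    yet. Returns True as soon as it succeeds, False once `timeout` expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        if zipfile.is_zipfile(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Inside `on_any_event`, the handler could compute `path = docx_path_for_event(event.src_path, event.is_directory)` and call `Document(path)` only when `path` is not `None` and `wait_until_zip(path)` returns `True`.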
EDIT: I tried changing the extension from doc to docx and that worked, but is there a way to open .docx files anyway? That is what I have found so far.
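Whether python-docx can open a file depends on its content rather than its extension: as far as I can tell, its package reader only checks whether the path is a zip archive, which is what `.docx` (Office Open XML) files are, while legacy binary `.doc` files are OLE compound files it cannot read. Renaming a file does not convert it. A minimal sketch that inspects the first bytes to tell the two containers apart; `sniff_word_format` and `sniff_file` are hypothetical names, and the check only recognises the container, not whether a zip actually holds a Word document.

```python
ZIP_MAGIC = b"PK\x03\x04"                      # zip local file header
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file (legacy .doc)


def sniff_word_format(head: bytes) -> str:
    """Classify a file by its leading bytes: 'zip', 'ole' or 'unknown'."""
    if head.startswith(ZIP_MAGIC):
        return "zip"  # .docx packages are zip archives
    if head.startswith(OLE_MAGIC):
        return "ole"  # Word 97-2003 binary format, not readable by python-docx
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 8 bytes of `path` and classify them."""
    with open(path, "rb") as f:
        return sniff_word_format(f.read(8))
```

For example, `sniff_file("Test6.docx")` would return `"zip"` for a genuine .docx file regardless of how it is named.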
One more thing: when I open the ".doc" file and try to read the header, all I get is

`<docx.document.Document object at 0x03195488>`

`<docx.section._Header object at 0x0319C088>`

What I want is to extract the text from the header.
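Printing `section.header` only shows the default representation of python-docx's `_Header` object; the text itself lives in the header's paragraphs, each of which has a `.text` property, and a header can also contain tables. A minimal sketch, where `header_text` is a hypothetical helper that relies only on the `.paragraphs`, `.tables`, `.rows`, `.cells` and `.text` attributes. It collects paragraphs first and tables afterwards, so it does not preserve the original order when both are mixed.

```python
def header_text(header) -> str:
    """Collect the visible text of a python-docx header (or footer).

    Hypothetical helper: paragraph text first, then table cells row by row
    (cells joined with tabs); blank lines are dropped.
    """
    lines = [paragraph.text for paragraph in header.paragraphs]
    for table in header.tables:
        for row in table.rows:
            lines.append("\t".join(cell.text for cell in row.cells))
    return "\n".join(line for line in lines if line.strip())
```

Usage would look like `print(header_text(Document(path).sections[0].header))`, where `path` is the full path of the file rather than its base name.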
【Comments】:
Tags: python ms-word python-docx python-watchdog