【Posted】: 2020-06-19 07:54:09
【Question】:
I have written some code that uses Python to read text from an image. The image is an invoice.
import pytesseract as tess
from PIL import Image

# Point pytesseract at the local Tesseract binary (Windows install path).
tess.pytesseract.tesseract_cmd = r'C:\Users\Me\AppData\Local\Tesseract-OCR\tesseract.exe'

# Load the invoice image and run OCR on it.
img = Image.open('C:/Users/Me/Desktop/PM/Invoice Formats/TestInv.png')
text = tess.image_to_string(img)
print(text)
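The code outputs the text of the invoice. I have multiple invoices in different formats. Can anyone help me extract the invoice number, invoice date, and invoice amount from this unstructured text?
For a few of the invoices, the extracted text looks something like this; for the others it is different.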
ABC Manufacturing Corporation
Invoice 1111 HHH BBB
‘MyCity, AB'11111-111'
(111)111-1111
My exporter details
\xyz.com
Page: 1 of 2
invoice No, b123456
Date: 01/02/2019,
‘My Oil Products My Bill-To No. 3333
PO Box 1234, Account Number.: 12345
sdlfjsdlf slsdo
Invoice Summary
Delivery Terms:
Payment Terms:
Contact:
DELIVERY POINT
Net 20 days date of invoice
MY NAME
111-111-1111
111-111-1111
abc@xyz.com
Copies of Invoices and Delivery Notes are available on
my url/ check site/ here.
Hf you have any, further questions relating to, your Invoice,
lease contact MY NAME immediately on
111111111
Quantity - Price uni
1000 KG KM = 1000M — KG = Kilogram
Hours Litre M3 = Cubic meter
EA = Each) Normal Cubic Meter
Pounds 7OF, 1atm)
Product Price |
Product Price 1000.28
Net value 1000.28
Total to be paid INR 80000.28
Thanks in advance.
【Comments】:
- Can you parse the output of your code?
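- Parsing the OCR output with regular expressions is one possible starting point. The sketch below is only an illustration under assumptions: the names `PATTERNS` and `extract_invoice_fields` are hypothetical, and the patterns are tuned to the sample text in the question (for example, they tolerate the OCR misreading "No." as "No,"). Invoices in other formats would likely need extra patterns or a different approach altogether.

```python
import re

# Hypothetical patterns tuned to the sample OCR text in the question;
# other invoice layouts will need their own patterns.
PATTERNS = {
    # "invoice No, b123456", "Invoice Number: 123", "Invoice # 123"
    "invoice_number": re.compile(
        r"invoice\s*(?:number|no\b|#)\s*[.,:]*\s*([A-Z0-9-]+)", re.IGNORECASE
    ),
    # "Date: 01/02/2019" (kept as a string: DD/MM vs MM/DD is ambiguous)
    "invoice_date": re.compile(
        r"\bdate\s*[:.]?\s*(\d{1,2}[/.-]\d{1,2}[/.-]\d{2,4})", re.IGNORECASE
    ),
    # "Total to be paid INR 80000.28" (optional 3-letter currency code)
    "invoice_amount": re.compile(
        r"total(?:\s+to\s+be\s+paid)?\s*:?\s*(?:[A-Z]{3}\s*)?([\d,]+\.\d{2})",
        re.IGNORECASE,
    ),
}


def extract_invoice_fields(text):
    """Return the first match for each field, or None if nothing matched."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


# A few lines copied from the OCR output in the question.
sample = """ABC Manufacturing Corporation
Invoice 1111 HHH BBB
Page: 1 of 2
invoice No, b123456
Date: 01/02/2019,
Net value 1000.28
Total to be paid INR 80000.28
"""

fields = extract_invoice_fields(sample)
print(fields)
```

In the question's script, the same function could be called as `extract_invoice_fields(text)` on the string returned by `tess.image_to_string(img)`. A field that comes back as `None` signals that the invoice uses a layout these patterns do not cover.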