Getting text out of an image in Python using Scrapy by combining the base URL? Answers

【Question title】: Getting text out of an image in Python using Scrapy by combining the base URL?
【Posted】: 2017-09-12 07:02:21
【Question description】:

I tried this code:

src1 = "https://hms.harvard.edu/"
src = response.css('div.person-line > div > img::attr("src")').extract_first()
# src is: sites/default/files/hms-faculty-emails/BX0UVXkP.jpg
import urlparse
urlparse.urljoin(src1, src)
# returns: https://hms.harvard.edu/sites/default/files/hms-faculty-emails/BX0UVXkP.jpg
src2 = urlparse.urljoin(src1, src)
email = pytesseract.image_to_string(Image.open(src2))

I got this error:

ioerror errno 22 invalid mode ('rb') or filename

How can I get the email text out of the text image? Can anyone help?
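The error most likely comes from the fact that Image.open treats a string argument as the name of a local file and opens it in 'rb' mode, so a URL cannot be opened that way. The sketch below reproduces that failure with the standard library only; `try_open_as_file` is a hypothetical helper introduced for illustration, and the exact errno and message vary by operating system.

```python
def try_open_as_file(path):
    """Try to open `path` as a local binary file; return the error, or None on success."""
    try:
        with open(path, "rb"):
            return None
    except OSError as exc:  # IOError is an alias of OSError on Python 3
        return exc


url = "https://hms.harvard.edu/sites/default/files/hms-faculty-emails/BX0UVXkP.jpg"
error = try_open_as_file(url)

# A URL is not a path on the local disk, so the open call fails.
# The errno and message depend on the operating system.
print(type(error).__name__, error)
```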

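【Question comments】:

Tags: python scrapy

【Solution 1】:

You should use an io.BytesIO buffer, because you are feeding image_to_string from an HTTP URL rather than a local file: Image.open expects a file path or a file-like object, not a URL. You need to write the code like this: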

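import io
from urllib.request import urlopen  # Python 2: from urllib2 import urlopen
import pytesseract
from PIL import Image

def get_text(src):
    response = urlopen(src)  # download the image over HTTP
    buffer = io.BytesIO(response.read())  # wrap the bytes in a file-like object
    return pytesseract.image_to_string(Image.open(buffer))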
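To tie this back to the question, here is a minimal sketch of how the pieces might fit together, assuming Python 3, the pytesseract and Pillow packages, and a locally installed Tesseract binary. The names `BASE_URL`, `absolute_image_url` and `fetch_image_buffer` are hypothetical helpers introduced for illustration; the relative path is the one quoted in the question.

```python
import io
from urllib.parse import urljoin
from urllib.request import urlopen

BASE_URL = "https://hms.harvard.edu/"


def absolute_image_url(relative_src, base_url=BASE_URL):
    """Join the site's base URL with the relative src taken from the <img> tag."""
    return urljoin(base_url, relative_src)


def fetch_image_buffer(url):
    """Download the image and wrap its bytes in an in-memory file-like object."""
    with urlopen(url) as response:
        return io.BytesIO(response.read())


def get_text(url):
    """Run OCR on the image behind `url` and return the recognized text."""
    # Third-party imports are kept local (an assumption of this sketch) so the
    # URL helpers above stay usable even where the OCR packages are missing.
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(fetch_image_buffer(url)))


src = "sites/default/files/hms-faculty-emails/BX0UVXkP.jpg"
image_url = absolute_image_url(src)
print(image_url)  # this prints only the link, not the text inside the image
```

Defining get_text is not enough on its own: it has to be called, for example print(get_text(image_url)), to obtain the recognized text. Inside a Scrapy callback, response.urljoin(src) is an alternative way to build the absolute URL relative to the page being parsed.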

【Comments】:

@marni... how do I get the text in src? What is the next piece of code? I am running print(src) and getting the image URL, not the text that is in the image.
@rajeshbojja What do you mean by "to get text in src"?
@marni... I am putting my code here:
src1 = "hms.harvard.edu/"
src2 = response.css('div.person-line > div > img::attr("src")').extract_first()
src = urlparse.urljoin(src1, src2)
def get_text(src):
    response = urlopen(src)
    buffer = io.BytesIO(response.read())
    return pytesseract.image_to_string(Image.open(buffer))
When I run print(src), I get this image link as output: hms.harvard.edu/sites/default/files/hms-faculty-emails/… I want the email text out of the text image at that link.
That is a different question.
@marni... can we get the text? If so, please show me with code how to get it.