【Question title】:Upload PDF file with gdata docs python v3.0 with OCR
【Posted】:2011-12-31 15:52:22
【Question】:

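I have the following implementation for uploading a file to Google Docs (taken from the gdata API samples; as written, it uploads an MS Word .doc file):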

import gdata.data
import gdata.docs.data

# CreateClient() and _GetDataFilePath() are helpers defined in the gdata
# sample script this function was taken from (not shown here).

def UploadResourceSample():
  """Upload a document, and convert to Google Docs."""
  client = CreateClient()
  doc = gdata.docs.data.Resource(type='document', title='My Sample Doc')

  # This is a convenient MS Word doc that we know exists
  path = _GetDataFilePath('test.0.doc')
  print 'Selected file at: %s' % path

  # Create a MediaSource, pointing to the file
  media = gdata.data.MediaSource()
  media.SetFileHandle(path, 'application/msword')

  # Pass the MediaSource when creating the new Resource
  doc = client.CreateResource(doc, media=media)
  print 'Created, and uploaded:', doc.title.text, doc.resource_id.text
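
The sample hard-codes 'application/msword' because it uploads a Word file; for a PDF, the content type passed to SetFileHandle would be 'application/pdf'. A minimal sketch, assuming you want to derive the type from the file extension with the standard-library mimetypes module; guess_content_type is a hypothetical helper, not part of gdata:

```python
import mimetypes


def guess_content_type(path, default='application/octet-stream'):
    """Guess a MIME type from the file extension (hypothetical helper).

    Falls back to a generic binary type when the extension is unknown.
    """
    content_type, _encoding = mimetypes.guess_type(path)
    return content_type or default


print(guess_content_type('report.pdf'))  # application/pdf
```

It could then be used as media.SetFileHandle(path, guess_content_type(path)).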

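Now I want to run OCR (optical character recognition) on the uploaded file, but I'm not sure how to enable OCR in the gdata docs Python API. So my question is: is there a way to enable OCR recognition for PDF files with the gdata Python v3.0 API?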

【Question discussion】:

    Tags: python pdf ocr gdata gdata-api


    【Answer 1】:

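    I managed to OCR my PDF documents with the following code, which adds the ocr and ocr-language query parameters to the upload URI: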

    import gdata.data
    import gdata.docs.client
    import gdata.docs.data

    def UploadResourceSample(filename, filepath, fullpath):
      """Upload a PDF with OCR enabled, and convert to Google Docs."""
      # CreateClient() is the helper from the gdata sample script (not shown).
      client = CreateClient()
      doc = gdata.docs.data.Resource(type='document', title=filename)

      # fullpath is the file to upload; filepath is not used here.
      path = fullpath
      print 'Selected file at: %s' % path

      # Create a MediaSource pointing to the file, with the PDF MIME type
      media = gdata.data.MediaSource()
      media.SetFileHandle(path, 'application/pdf')

      # Upload to the resource upload URI with OCR turned on;
      # ocr-language=de asks for German text recognition.
      create_uri = gdata.docs.client.RESOURCE_UPLOAD_URI + '?ocr=true&ocr-language=de'
      doc = client.CreateResource(doc, create_uri=create_uri, media=media)
      print 'Created, and uploaded:', doc.title.text, doc.resource_id.text
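
The answer builds the query string by hand with German hard-coded. A minimal sketch, assuming you want the OCR language as a parameter and the query string encoded with the standard library; build_ocr_upload_uri is a hypothetical helper, not part of gdata, and which language codes the service accepts is not covered here:

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2, which the gdata library targets


def build_ocr_upload_uri(base_uri, language=None):
    """Append ocr=true and, optionally, ocr-language to base_uri (hypothetical helper)."""
    params = [('ocr', 'true')]
    if language:
        params.append(('ocr-language', language))
    # Use '&' if base_uri already carries a query string, '?' otherwise.
    separator = '&' if '?' in base_uri else '?'
    # A list of pairs keeps the parameter order stable.
    return base_uri + separator + urlencode(params)


print(build_ocr_upload_uri('/upload', 'de'))  # /upload?ocr=true&ocr-language=de
```

In the answer's function this would become create_uri = build_ocr_upload_uri(gdata.docs.client.RESOURCE_UPLOAD_URI, 'de').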
    

    【Discussion】:
