【Question title】: readPDF (tm package) in R
【Posted】: 2013-07-07 23:37:57
【Question】:

I am trying to read some online PDF documents in R. I used the readPDF function. My script looks like this:

safex <- readPDF(PdftotextOptions='-layout')(elem=list(uri='C:/Users/FCG/Desktop/NoteF7000.pdf'),language='en',id='id1')

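R shows a message saying that running the command had status 309. I tried different pdftotext options, but the message is always the same, and the text file that gets created is empty.

Can anyone read this PDF?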

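【Comments】:

  • I assume all your PATHs are in order?
  • @RomanLuštrik, do you mean the paths I edited in the environment variables on Windows?
  • Yes, are those paths in order? Can you access all the programs that the function uses?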
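
One way to answer the question raised in the comments is to ask R itself whether it can see the external programs. This is a minimal sketch, assuming readPDF relies on the xpdf command-line tools pdftotext and pdfinfo (pdfinfo is an assumption about the tm version in use). Sys.which() returns an empty string for any program it cannot find on the PATH.

```r
# Look up the external programs on the PATH that R sees.
# The names below are assumptions; adjust them to the tools you installed.
tools <- Sys.which(c("pdftotext", "pdfinfo"))
print(tools)

# Sys.which() gives "" for anything it cannot locate.
missing_tools <- names(tools)[tools == ""]
if (length(missing_tools) > 0) {
  message("Not found on PATH: ", paste(missing_tools, collapse = ", "))
} else {
  message("All listed programs were found.")
}
```

If a program is reported as missing, R will not be able to run it by name, whatever options are passed to pdftotext.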

Tags: r cygwin tm


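【Answer 1】:

readPDF has bugs and is probably not worth the bother (see this well-documented struggle with it).

Assuming that...

  1. You have installed xpdf (see here for details).

  2. Your PATHs are all in order (see here for details on how to do that) and you have restarted your computer.

then you are probably better off avoiding readPDF and using this workaround instead: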

system(paste('"C:/Program Files/xpdf/pdftotext.exe"', 
             '"C:/Users/FCG/Desktop/NoteF7000.pdf"'), wait=FALSE)
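
Note that wait=FALSE lets R continue before the conversion has finished. As a hedged alternative, the sketch below waits for pdftotext and checks its exit status before returning the path of the text file. The helper name pdf_to_text and its arguments are hypothetical, not part of tm or xpdf; the paths in the usage comment are the ones from this thread.

```r
# Hypothetical helper: convert a PDF to text and fail loudly on any problem.
pdf_to_text <- function(pdf, exe = "pdftotext", options = "-layout") {
  # Accept either a full path to the executable or a name found on the PATH.
  if (!file.exists(exe) && !nzchar(Sys.which(exe))) {
    stop("Cannot find the program: ", exe)
  }
  if (!file.exists(pdf)) {
    stop("Cannot find the PDF: ", pdf)
  }
  # Name the output file explicitly so there is no guessing about where it goes.
  txt <- paste0(tools::file_path_sans_ext(pdf), ".txt")
  # system2() waits for the command by default and returns its exit status.
  status <- system2(exe, args = c(options, shQuote(pdf), shQuote(txt)))
  if (status != 0 || !file.exists(txt)) {
    stop("pdftotext failed with status ", status)
  }
  txt
}

# Usage (paths taken from the question and the answer):
# txt <- pdf_to_text("C:/Users/FCG/Desktop/NoteF7000.pdf",
#                    exe = "C:/Program Files/xpdf/pdftotext.exe")
```

Checking the status makes a failure visible at the conversion step instead of showing up later as an empty text file.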

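Then read the resulting text file into R like this (pdftotext writes a .txt file with the same base name next to the PDF; the example below reads a file named NoteF7001.txt)...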

require(tm)
mycorpus <- Corpus(URISource("C:/Users/FCG/Desktop/NoteF7001.txt"))

Have a look to confirm that it went well:

inspect(mycorpus)

A corpus with 1 text document

The metadata consists of 2 tag-value pairs and a data frame
Available tags are:
  create_date creator 
Available variables in the data frame are:
  MetaID 

[[1]]
Market Notice
Number: Date F7001 08 May 2013

New IDX SSF (EWJG) The following new IDX SSF contract will be added to the list and will be available for trade today.

Summary Contract Specifications Contract Code Underlying Instrument Bloomberg Code ISIN Code EWJG EWJG IShares MSCI Japan Index Fund (US) EWJ US EQUITY US4642868487 1 (R1 per point)

Contract Size / Nominal

Expiry Dates & Times

10am New York Time; 14 Jun 2013 / 16 Sep 2013

Underlying Currency Quotations Minimum Price Movement (ZAR) Underlying Reference Price

USD/ZAR Bloomberg Code (USDZAR Currency) Price per underlying share to two decimals. R0.01 (0.01 in the share price)

4pm underlying spot level as captured by the JSE.

Currency Reference Price

The same method as the one utilized for the expiry of standard currency futures on standard quarterly SAFEX expiry dates.

JSE Limited Registration Number: 2005/022939/06 One Exchange Square, Gwen Lane, Sandown, South Africa. Private Bag X991174, Sandton, 2146, South Africa. Telephone: +27 11 520 7000, Facsimile: +27 11 520 8584, www.jse.co.za

Executive Director: NF Newton-King (CEO), A Takoordeen (CFO) Non-Executive Directors: HJ Borkum (Chairman), AD Botha, MR Johnston, DM Lawrence, A Mazwai, Dr. MA Matooane , NP Mnxasana, NS Nematswerani, N Nyembezi-Heita, N Payne Alternate Directors: JH Burke, LV Parsons

Member of the World Federation of Exchanges

Company Secretary: GC Clarke
Settlement Method

Cash Settled

-

Clearing House Fees -

On-screen IDX Futures Trading: o 1 BP for Taker (Aggressor) o Zero Booking Fees for Maker (Passive) o No Cap o Floor of 0.01 Reported IDX Futures Trades o 1.75 BP for both buyer and seller o No Cap o Floor of 0.01

Initial Margin Class Spread Margin V.S.R. Expiry Date

R 10.00 R 5.00 3.5 14/06/2013, 16/09/2013

The above instrument has been designated as "Foreign" by the South African Reserve Bank

Should you have any queries regarding IDX Single Stock Futures, please contact the IDX team on 011 520-7399 or idx@jse.co.za

Graham Smale Director: Bonds and Financial Derivatives Tel: +27 11 520 7831 Fax:+27 11 520 8831 E-mail: grahams@jse.co.za

Distributed by the Company Secretariat +27 11 520 7346

Page 2 of 2
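
Since the question reported an empty text file, it can also help to check the extracted file directly, without tm. This is a minimal sketch using only base R; the helper name check_text_file is hypothetical, and the demo file is a temporary stand-in for the real pdftotext output.

```r
# Hypothetical helper: summarise a text file produced by pdftotext.
check_text_file <- function(path) {
  lines <- readLines(path, warn = FALSE)
  # Keep lines that contain at least one non-whitespace character.
  non_blank <- lines[grepl("[^[:space:]]", lines)]
  list(
    n_lines = length(lines),
    n_non_blank = length(non_blank),
    first = head(non_blank, 3)
  )
}

# Demo on a temporary file; replace `demo` with the path to your own .txt file.
demo <- tempfile(fileext = ".txt")
writeLines(c("Market Notice", "", "Number: Date F7001 08 May 2013"), demo)
str(check_text_file(demo))
```

A count of zero non-blank lines would indicate that the conversion ran but extracted no text.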

