[Question title]: How can I query RDF data using Haskell?
[Posted]: 2018-03-15 22:23:07
[Question]:

I'm a Haskell beginner. I have RDF/XML from Project Gutenberg that looks like this:

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xml:base="http://www.gutenberg.org/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dcterms="http://purl.org/dc/terms/"
  xmlns:cc="http://web.resource.org/cc/"
  xmlns:dcam="http://purl.org/dc/dcam/"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:pgterms="http://www.gutenberg.org/2009/pgterms/"
>
  <cc:Work rdf:about="">
    <rdfs:comment>Archives containing the RDF files for *all* our books can be downloaded at
            http://www.gutenberg.org/wiki/Gutenberg:Feeds#The_Complete_Project_Gutenberg_Catalog</rdfs:comment>
    <cc:license rdf:resource="https://creativecommons.org/publicdomain/zero/1.0/"/>
  </cc:Work>
  <pgterms:ebook rdf:about="ebooks/20">
    <pgterms:bookshelf>
      <rdf:Description rdf:nodeID="N3f8445072d8e4499b2646626f94866e0">
        <rdf:value>Poetry</rdf:value>
        <dcam:memberOf rdf:resource="2009/pgterms/Bookshelf"/>
      </rdf:Description>
    </pgterms:bookshelf>
    <dcterms:hasFormat>
      <pgterms:file rdf:about="http://www.gutenberg.org/ebooks/20.rdf">
        <dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2017-03-16T05:01:13.615047</dcterms:modified>
        <dcterms:extent rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">12133</dcterms:extent>
        <dcterms:isFormatOf rdf:resource="ebooks/20"/>
        <dcterms:format>
          <rdf:Description rdf:nodeID="N735ba077c8424051b6470a92682aaa5e">
            <dcam:memberOf rdf:resource="http://purl.org/dc/terms/IMT"/>
            <rdf:value rdf:datatype="http://purl.org/dc/terms/IMT">application/rdf+xml</rdf:value>
          </rdf:Description>
        </dcterms:format>
      </pgterms:file>
    </dcterms:hasFormat>
    <dcterms:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#date">1991-10-01</dcterms:issued>
    <dcterms:title>Paradise Lost</dcterms:title>
    <dcterms:subject>
      <rdf:Description rdf:nodeID="Ne259525c666c4886a996acbdddca0682">
        <rdf:value>PR</rdf:value>
        <dcam:memberOf rdf:resource="http://purl.org/dc/terms/LCC"/>
      </rdf:Description>
    </dcterms:subject>
    <dcterms:hasFormat>
      <pgterms:file rdf:about="http://www.gutenberg.org/files/20/20.txt">
        <dcterms:extent rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">507133</dcterms:extent>
        <dcterms:isFormatOf rdf:resource="ebooks/20"/>
        <dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2011-03-02T06:33:54</dcterms:modified>
        <dcterms:format>
          <rdf:Description rdf:nodeID="Nbd1740a2927845058b0fe43326dcc48b">
            <dcam:memberOf rdf:resource="http://purl.org/dc/terms/IMT"/>
            <rdf:value rdf:datatype="http://purl.org/dc/terms/IMT">text/plain; charset=us-ascii</rdf:value>
          </rdf:Description>
        </dcterms:format>
      </pgterms:file>
    </dcterms:hasFormat>
    <dcterms:hasFormat>
      <pgterms:file rdf:about="http://www.gutenberg.org/ebooks/20.epub.images">
        <dcterms:isFormatOf rdf:resource="ebooks/20"/>
        <dcterms:format>
          <rdf:Description rdf:nodeID="Nb08f3d2980e64e91a402eb5b205c10bc">
            <dcam:memberOf rdf:resource="http://purl.org/dc/terms/IMT"/>
            <rdf:value rdf:datatype="http://purl.org/dc/terms/IMT">application/epub+zip</rdf:value>
          </rdf:Description>
        </dcterms:format>
        <dcterms:extent rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">232622</dcterms:extent>
        <dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2017-03-01T01:04:17.425321</dcterms:modified>
      </pgterms:file>
    </dcterms:hasFormat>
    <dcterms:hasFormat>
      <pgterms:file rdf:about="http://www.gutenberg.org/ebooks/20.kindle.images">
        <dcterms:isFormatOf rdf:resource="ebooks/20"/>
        <dcterms:extent rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">933970</dcterms:extent>
        <dcterms:format>
          <rdf:Description rdf:nodeID="Nff1df57b9552466d96b114f20424b5a2">
            <dcam:memberOf rdf:resource="http://purl.org/dc/terms/IMT"/>
            <rdf:value rdf:datatype="http://purl.org/dc/terms/IMT">application/x-mobipocket-ebook</rdf:value>
          </rdf:Description>
        </dcterms:format>
        <dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2017-03-01T01:04:21.321235</dcterms:modified>
      </pgterms:file>
    </dcterms:hasFormat>
    <dcterms:language>
      <rdf:Description rdf:nodeID="N91273d0bffc74be393cda307d2b05137">
        <rdf:value rdf:datatype="http://purl.org/dc/terms/RFC4646">en</rdf:value>
      </rdf:Description>
    </dcterms:language>
    <dcterms:subject>
      <rdf:Description rdf:nodeID="N5e35fb378b37483ca6ef7a08f27cf936">
        <dcam:memberOf rdf:resource="http://purl.org/dc/terms/LCSH"/>
        <rdf:value>Eve (Biblical figure) -- Poetry</rdf:value>
      </rdf:Description>
    </dcterms:subject>
    <dcterms:license rdf:resource="license"/>
    <dcterms:hasFormat>
      <pgterms:file rdf:about="http://www.gutenberg.org/ebooks/20.html.images">
        <dcterms:extent rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">614618</dcterms:extent>
        <dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2017-03-01T01:04:16.685338</dcterms:modified>
        <dcterms:isFormatOf rdf:resource="ebooks/20"/>
        <dcterms:format>
          <rdf:Description rdf:nodeID="N7567260ec2fd48c0be3d2858e08ac35d">
            <dcam:memberOf rdf:resource="http://purl.org/dc/terms/IMT"/>
            <rdf:value rdf:datatype="http://purl.org/dc/terms/IMT">text/html</rdf:value>
          </rdf:Description>
        </dcterms:format>
      </pgterms:file>
    </dcterms:hasFormat>
    <dcterms:hasFormat>
      <pgterms:file rdf:about="http://www.gutenberg.org/ebooks/20.epub.noimages">
        <dcterms:isFormatOf rdf:resource="ebooks/20"/>
        <dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2017-03-01T01:04:17.695324</dcterms:modified>
        <dcterms:extent rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">232623</dcterms:extent>
        <dcterms:format>
          <rdf:Description rdf:nodeID="Nb640302bc2a84a31b0e154318df817d1">
            <rdf:value rdf:datatype="http://purl.org/dc/terms/IMT">application/epub+zip</rdf:value>
            <dcam:memberOf rdf:resource="http://purl.org/dc/terms/IMT"/>
          </rdf:Description>
        </dcterms:format>
      </pgterms:file>
    </dcterms:hasFormat>
    <dcterms:hasFormat>
      <pgterms:file rdf:about="http://www.gutenberg.org/ebooks/20.kindle.noimages">
        <dcterms:isFormatOf rdf:resource="ebooks/20"/>
        <dcterms:extent rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">933967</dcterms:extent>
        <dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2017-03-01T01:04:24.846165</dcterms:modified>
        <dcterms:format>
          <rdf:Description rdf:nodeID="N1857bba1f5484e3d84846e1a554ec593">
            <dcam:memberOf rdf:resource="http://purl.org/dc/terms/IMT"/>
            <rdf:value rdf:datatype="http://purl.org/dc/terms/IMT">application/x-mobipocket-ebook</rdf:value>
          </rdf:Description>
        </dcterms:format>
      </pgterms:file>
    </dcterms:hasFormat>
    <dcterms:publisher>Project Gutenberg</dcterms:publisher>
    <dcterms:rights>Public domain in the USA.</dcterms:rights>
    <dcterms:creator>
      <pgterms:agent rdf:about="2009/agents/17">
        <pgterms:deathdate rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">1674</pgterms:deathdate>
        <pgterms:webpage rdf:resource="http://en.wikipedia.org/wiki/John_Milton"/>
        <pgterms:name>Milton, John</pgterms:name>
        <pgterms:birthdate rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">1608</pgterms:birthdate>
      </pgterms:agent>
    </dcterms:creator>
    <dcterms:type>
      <rdf:Description rdf:nodeID="N0f6e6d76b1ff4ea9a2c5c37949efe82b">
        <dcam:memberOf rdf:resource="http://purl.org/dc/terms/DCMIType"/>
        <rdf:value>Text</rdf:value>
      </rdf:Description>
    </dcterms:type>
    <dcterms:subject>
      <rdf:Description rdf:nodeID="N202624c4b5994d39a3ab8bf0a2a31d95">
        <dcam:memberOf rdf:resource="http://purl.org/dc/terms/LCSH"/>
        <rdf:value>Adam (Biblical figure) -- Poetry</rdf:value>
      </rdf:Description>
    </dcterms:subject>
    <dcterms:hasFormat>
      <pgterms:file rdf:about="http://www.gutenberg.org/ebooks/20.html.noimages">
        <dcterms:extent rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">614618</dcterms:extent>
        <dcterms:format>
          <rdf:Description rdf:nodeID="N79f919d14da448e19eb05c444322ddd2">
            <dcam:memberOf rdf:resource="http://purl.org/dc/terms/IMT"/>
            <rdf:value rdf:datatype="http://purl.org/dc/terms/IMT">text/html</rdf:value>
          </rdf:Description>
        </dcterms:format>
        <dcterms:isFormatOf rdf:resource="ebooks/20"/>
        <dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2017-03-01T01:04:16.955332</dcterms:modified>
      </pgterms:file>
    </dcterms:hasFormat>
    <pgterms:bookshelf>
      <rdf:Description rdf:nodeID="Nec598f664c934ed49ba3c0168ef09615">
        <rdf:value>Banned Books from Anne Haight's list</rdf:value>
        <dcam:memberOf rdf:resource="2009/pgterms/Bookshelf"/>
      </rdf:Description>
    </pgterms:bookshelf>
    <dcterms:hasFormat>
      <pgterms:file rdf:about="http://www.gutenberg.org/ebooks/20.txt.utf-8">
        <dcterms:extent rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">507105</dcterms:extent>
        <dcterms:format>
          <rdf:Description rdf:nodeID="N069b84f8b10844e9a6c713f4c163880b">
            <rdf:value rdf:datatype="http://purl.org/dc/terms/IMT">text/plain</rdf:value>
            <dcam:memberOf rdf:resource="http://purl.org/dc/terms/IMT"/>
          </rdf:Description>
        </dcterms:format>
        <dcterms:isFormatOf rdf:resource="ebooks/20"/>
        <dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2017-03-01T01:04:15.953358</dcterms:modified>
      </pgterms:file>
    </dcterms:hasFormat>
    <dcterms:subject>
      <rdf:Description rdf:nodeID="Nb489692851fa496d96b1a7fdf7a71b21">
        <dcam:memberOf rdf:resource="http://purl.org/dc/terms/LCSH"/>
        <rdf:value>Fall of man -- Poetry</rdf:value>
      </rdf:Description>
    </dcterms:subject>
    <pgterms:downloads rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">2088</pgterms:downloads>
    <dcterms:subject>
      <rdf:Description rdf:nodeID="Naa6849a7660b4039baadec8af58f0c58">
        <dcam:memberOf rdf:resource="http://purl.org/dc/terms/LCSH"/>
        <rdf:value>Bible. Genesis -- History of Biblical events -- Poetry</rdf:value>
      </rdf:Description>
    </dcterms:subject>
    <dcterms:hasFormat>
      <pgterms:file rdf:about="http://www.gutenberg.org/files/20/20.zip">
        <dcterms:extent rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">205748</dcterms:extent>
        <dcterms:format>
          <rdf:Description rdf:nodeID="N19cf968278bc4922bd87b17209c20d94">
            <dcam:memberOf rdf:resource="http://purl.org/dc/terms/IMT"/>
            <rdf:value rdf:datatype="http://purl.org/dc/terms/IMT">text/plain; charset=us-ascii</rdf:value>
          </rdf:Description>
        </dcterms:format>
        <dcterms:isFormatOf rdf:resource="ebooks/20"/>
        <dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2011-03-02T06:34:42</dcterms:modified>
        <dcterms:format>
          <rdf:Description rdf:nodeID="N94c2881f340a49c18246b69af3abcf12">
            <rdf:value rdf:datatype="http://purl.org/dc/terms/IMT">application/zip</rdf:value>
            <dcam:memberOf rdf:resource="http://purl.org/dc/terms/IMT"/>
          </rdf:Description>
        </dcterms:format>
      </pgterms:file>
    </dcterms:hasFormat>
  </pgterms:ebook>
  <rdf:Description rdf:about="http://en.wikipedia.org/wiki/John_Milton">
    <dcterms:description>Wikipedia</dcterms:description>
  </rdf:Description>
</rdf:RDF>

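I'd like to turn this information into regular Haskell data structures that I can query and manipulate. For example, I might want to query the title of the work, or get all of its Wikipedia URLs.

I noticed there's an RDF library in Haskell, rdf4h, and it has an XML parser. But I can't make heads or tails of the documentation, and there don't seem to be any tutorials anywhere.

Another thing I've thought about doing is importing all of these RDF/XML files into some kind of database and then querying that database from Haskell somehow. But I'm not sure which database would be appropriate, or whether that's even possible.

Of course, I could treat this as plain XML data and ignore the RDF side of it, but that seems like a lot of work, and I'd have to write some very long data structures for every single thing I want to get out of this XML file.

Does anyone have ideas about how to query data like this using Haskell?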

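[Comments]:

  • I'm too tired to look into this properly, but why does nothing "make [..] work"? Are you getting type errors?
  • There's no point debugging what I tried to write, because I don't know what I'm doing. Really, I'm just looking for ideas on how to query RDF data in Haskell. I've updated my question to make that more explicit.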

Tags: xml haskell rdf


[Answer 1]:

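I noticed there's an RDF library in Haskell, rdf4h, and it has an XML parser. But I can't make heads or tails of the documentation, and there don't seem to be any tutorials anywhere.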

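Here is my attempt at making heads or tails of the docs. (Take it with a grain of salt: I know nothing about RDF and haven't actually tried using the library.)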

If we open the top-level module, the docs for Data.RDF, we find three seemingly relevant functions: parseString, parseFile and parseURL. For instance, the documentation for parseString is:

parseString :: Rdf a => p -> Text -> Either ParseFailure (RDF a)

Parse RDF from the given text, yielding a failure with error message or the resultant RDF.

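To call it, then, we need to supply a p and a Text (the string to be parsed). But what is p? If we scroll up a bit, we notice that parseString is a method of the RdfParser class. The list of instances (which helps a lot in making sense of a type class) shows that XmlParser is an instance of RdfParser. That looks useful!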

If we now follow the link to the XmlParser documentation entry, we find that it has an exposed (or "public", if you prefer) constructor:

XmlParser (Maybe BaseUrl) (Maybe Text)
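Putting those pieces together, here is a minimal sketch, assuming rdf4h 3.x (the version linked in the comments) and that XmlParser is importable from Text.RDF.RDF4H.XmlParser. It hands an XmlParser with no base URL and no document URL to parseString. The tiny document and the names sampleXml and parsedSample are made up for illustration; the signature on parsedSample pins the graph type to RDF TList.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Either (isRight)
import qualified Data.Text as T
import Data.RDF
import Text.RDF.RDF4H.XmlParser (XmlParser (..))

-- | A tiny, made-up RDF/XML document containing a single statement.
sampleXml :: T.Text
sampleXml = T.unlines
  [ "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
  , "         xmlns:dcterms=\"http://purl.org/dc/terms/\">"
  , "  <rdf:Description rdf:about=\"http://www.gutenberg.org/ebooks/20\">"
  , "    <dcterms:title>Paradise Lost</dcterms:title>"
  , "  </rdf:Description>"
  , "</rdf:RDF>"
  ]

-- | 'XmlParser Nothing Nothing': no base URL and no document URL.
-- The type signature fixes the graph representation to 'TList'.
parsedSample :: Either ParseFailure (RDF TList)
parsedSample = parseString (XmlParser Nothing Nothing) sampleXml

main :: IO ()
main = putStrLn (if isRight parsedSample then "parsed" else "parse failed")
```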

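Following the links further, we learn that BaseUrl is just a newtype around Text. There doesn't seem to be any helpful documentation on what the constructor's arguments are supposed to be, though. Short of appealing to the source code of the module, which is also reachable through a link, there isn't much else to do. Surprisingly, the source reveals useful documentation about the functions in it. Here is the relevant RdfParser instance: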

-- |'XmlParser' is an instance of 'RdfParser'.
instance RdfParser XmlParser where
  parseString (XmlParser bUrl dUrl)  = parseXmlRDF bUrl dUrl
  parseFile   (XmlParser bUrl dUrl)  = parseFile' bUrl dUrl
  parseURL    (XmlParser bUrl dUrl)  = parseURL'  bUrl dUrl
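Since the instance simply forwards the same two fields to all three methods, one XmlParser value can serve both in-memory text and files. A hedged sketch, assuming rdf4h 3.x exports the Triple, Node and LValue constructors through Data.RDF: it parses a file and prints every triple so you can see which URIs are worth querying. The path pg20.rdf and the helpers catalogParser, renderNode and renderTriples are hypothetical, and the output only resembles N-Triples; it is not a conforming serializer.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.Text as T
import Data.RDF
import Text.RDF.RDF4H.XmlParser (XmlParser (..))

-- | One parser value; parseString, parseFile and parseURL all accept it.
catalogParser :: XmlParser
catalogParser = XmlParser Nothing Nothing

-- | Hypothetical helper: render a node roughly the way N-Triples would.
renderNode :: Node -> String
renderNode (UNode u)             = "<" ++ T.unpack u ++ ">"
renderNode (BNode b)             = "_:" ++ T.unpack b
renderNode (BNodeGen n)          = "_:gen" ++ show n
renderNode (LNode (PlainL t))    = show t
renderNode (LNode (PlainLL t l)) = show t ++ "@" ++ T.unpack l
renderNode (LNode (TypedL t dt)) = show t ++ "^^<" ++ T.unpack dt ++ ">"

-- | One line per triple, for eyeballing what the parser produced.
renderTriples :: Rdf a => RDF a -> [String]
renderTriples g =
  [ unwords [renderNode s, renderNode p, renderNode o, "."]
  | Triple s p o <- triplesOf g ]

main :: IO ()
main = do
  -- "pg20.rdf" is a hypothetical local copy of the catalogue file.
  result <- parseFile catalogParser "pg20.rdf" :: IO (Either ParseFailure (RDF TList))
  either (putStrLn . ("parse failed: " ++) . show) (mapM_ putStrLn . renderTriples) result
```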

The Haddock comment on this instance is redundant; there is, however, useful information in the Haddock comments for parseURL'...

-- |Parse the document at the given location URL as an XML document, using an optional @BaseUrl@
-- as the base URI, and using the given document URL as the URI of the XML document itself.
--
-- The @BaseUrl@ is used as the base URI within the document for resolving any relative URI references.
-- It may be changed within the document using the @\@base@ directive. At any given point, the current
-- base URI is the most recent @\@base@ directive, or if none, the @BaseUrl@ given to @parseURL@, or
-- if none given, the document URL given to @parseURL@. For example, if the @BaseUrl@ were
-- @http:\/\/example.org\/@ and a relative URI of @\<b>@ were encountered (with no preceding @\@base@
-- directive), then the relative URI would expand to @http:\/\/example.org\/b@.
--
-- The document URL is for the purpose of resolving references to 'this document' within the document,
-- and may be different than the actual location URL from which the document is retrieved. Any reference
-- to @\<>@ within the document is expanded to the value given here. Additionally, if no @BaseUrl@ is
-- given and no @\@base@ directive has appeared before a relative URI occurs, this value is used as the
-- base URI against which the relative URI is resolved.
--p
-- Returns either a @ParseFailure@ or a new RDF containing the parsed triples.
parseURL' :: (Rdf a) =>
                 Maybe BaseUrl       -- ^ The optional base URI of the document.
                 -> Maybe T.Text -- ^ The document URI (i.e., the URI of the document itself); if Nothing, use location URI.
                 -> String           -- ^ The location URI from which to retrieve the XML document.
                 -> IO (Either ParseFailure (RDF a))
                                     -- ^ The parse result, which is either a @ParseFailure@ or the RDF
                                     --   corresponding to the XML document.
parseURL' bUrl docUrl = _parseURL (parseXmlRDF bUrl docUrl)

...and parseXmlRDF:

-- |Parse a xml T.Text to an RDF representation
parseXmlRDF :: (Rdf a)
            => Maybe BaseUrl           -- ^ The base URL for the RDF if required
            -> Maybe T.Text        -- ^ DocUrl: The request URL for the RDF if available
            -> T.Text              -- ^ The contents to parse
            -> Either ParseFailure (RDF a) -- ^ The RDF representation of the triples or ParseFailure
parseXmlRDF bUrl dUrl xmlStr = case runParseArrow of
                                (_,r:_) -> Right r
                                _ -> Left (ParseFailure "XML parsing failed")
  where runParseArrow = runSLA (xreadDoc >>> isElem >>> addMetaData bUrl dUrl >>> getRDF) initState (T.unpack xmlStr)
        initState = GParseState { stateGenId = 0 }

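These Haddock comments don't appear in the rendered documentation because the functions they belong to are not exported.

All in all, I'd say the documentation of this library could be improved. In this case, though, knowing your way around Hackage documentation softens the blow.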
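Going by the parseURL' comment, the first XmlParser field is the base URI used to resolve relative references (such as ebooks/20 in the question's file) and the second is the URI of the document itself. A hedged sketch under these assumptions: rdf4h 3.x, the Gutenberg base taken from the question's xml:base, the document URI taken from the ebooks/20.rdf entry in the question's data, and a hypothetical local path. The quoted comment talks about a Turtle-style @base directive rather than xml:base, so how an in-document xml:base interacts with the supplied base is worth checking before relying on it.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.RDF
import Text.RDF.RDF4H.XmlParser (XmlParser (..))

-- | First field: base URI for resolving relative references.
-- Second field: the URI of the document itself ("this document").
gutenbergParser :: XmlParser
gutenbergParser =
  XmlParser
    (Just (BaseUrl "http://www.gutenberg.org/"))
    (Just "http://www.gutenberg.org/ebooks/20.rdf")

-- | The signature fixes the graph type, so call sites need no annotation.
loadCatalogFile :: FilePath -> IO (Either ParseFailure (RDF TList))
loadCatalogFile = parseFile gutenbergParser

main :: IO ()
main = do
  result <- loadCatalogFile "pg20.rdf" -- hypothetical local path
  case result of
    Left err -> putStrLn ("parse failed: " ++ show err)
    Right _  -> putStrLn "parsed"
```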


So I tried parsed <- parseURL (XmlParser Nothing Nothing) testText, but it says Ambiguous type variable ‘a0’ arising from a use of ‘parseURL’ prevents the constraint ‘(Rdf a0)’ from being solved. Probable fix: use a type annotation to specify what ‘a0’ should be

The error is telling you that you have to specify what a is...

parseURL :: Rdf a => p -> String -> IO (Either ParseFailure (RDF a))

...either by using the result somewhere a concrete type is required, or by adding a type annotation.
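The same situation can be reproduced without rdf4h. The toy module below mirrors the shapes involved: a class constraint plus a data family whose index appears only in the result type of the parsing function. Graph, ListRep, GraphRep, parseToy and countItems are hypothetical stand-ins, not rdf4h names.

```haskell
{-# LANGUAGE TypeFamilies #-}
module Main where

-- Toy stand-ins for rdf4h's 'RDF' data family, 'TList' and 'Rdf' class.
data family Graph rep

data ListRep
newtype instance Graph ListRep = ListGraph [String]

class GraphRep rep where
  fromItems :: [String] -> Graph rep
  itemCount :: Graph rep -> Int

instance GraphRep ListRep where
  fromItems = ListGraph
  itemCount (ListGraph xs) = length xs

-- Like 'parseString': 'rep' appears only in the result type.
parseToy :: GraphRep rep => String -> Either String (Graph rep)
parseToy "" = Left "empty input"
parseToy s  = Right (fromItems (words s))

-- Writing 'itemCount <$> parseToy s' on its own is rejected: nothing fixes
-- 'rep', so GHC reports an ambiguous type variable. Annotating the parse
-- result with a concrete representation resolves it.
countItems :: String -> Either String Int
countItems s = itemCount <$> (parseToy s :: Either String (Graph ListRep))

main :: IO ()
main = print (countItems "a b c") -- prints: Right 3
```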

Further links in the documentation show that Rdf is a class with two instances (TList and AdjHashMap) and that RDF is a data family. That being so, you want something like this:

parsed <- parseURL (XmlParser Nothing Nothing) testText :: IO (Either ParseFailure (RDF TList))

(Note how the type annotation matches the result type given by parseURL's signature.)

Alternatively, enabling ScopedTypeVariables lets you write:

parsed :: Either ParseFailure (RDF TList) <- parseURL (XmlParser Nothing Nothing) testText
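Combining the ScopedTypeVariables form with rdf4h's query, here is a hedged sketch of the two lookups the question asks for. Assumptions: rdf4h 3.x, where query is an Rdf class method taking an optional subject, predicate and object; the Wikipedia URLs are taken to be pgterms:webpage objects whose URI contains wikipedia.org (as for John Milton in the question's file); the path pg20.rdf and all helper names are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE ScopedTypeVariables #-}
module Main where

import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Data.RDF
import Text.RDF.RDF4H.XmlParser (XmlParser (..))

-- Predicate URIs built from the namespaces declared in the question's file.
dctermsTitle, pgtermsWebpage :: Node
dctermsTitle   = UNode "http://purl.org/dc/terms/title"
pgtermsWebpage = UNode "http://www.gutenberg.org/2009/pgterms/webpage"

-- | Objects of all triples with the given predicate, for any subject.
objectsWith :: Rdf a => RDF a -> Node -> [Node]
objectsWith g p = [ o | Triple _ _ o <- query g Nothing (Just p) Nothing ]

-- | Lexical text of a literal, ignoring any language tag or datatype.
literalText :: LValue -> T.Text
literalText (PlainL t)    = t
literalText (PlainLL t _) = t
literalText (TypedL t _)  = t

-- | Title(s) of the work: literal objects of dcterms:title.
titles :: Rdf a => RDF a -> [T.Text]
titles g = [ literalText lv | LNode lv <- objectsWith g dctermsTitle ]

-- | pgterms:webpage objects that point at wikipedia.org.
wikipediaUrls :: Rdf a => RDF a -> [T.Text]
wikipediaUrls g =
  [ u | UNode u <- objectsWith g pgtermsWebpage, "wikipedia.org" `T.isInfixOf` u ]

main :: IO ()
main = do
  -- Pattern type signature (needs ScopedTypeVariables) selects TList.
  parsed :: Either ParseFailure (RDF TList) <-
    parseFile (XmlParser Nothing Nothing) "pg20.rdf" -- hypothetical path
  case parsed of
    Left err -> putStrLn ("parse failed: " ++ show err)
    Right g  -> do
      mapM_ TIO.putStrLn (titles g)
      mapM_ TIO.putStrLn (wikipediaUrls g)
```

Because titles and wikipediaUrls are polymorphic in the graph representation, switching to AdjHashMap would only mean changing the annotation in main.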

[Discussion]:

  • So I tried parsed <- parseURL (XmlParser Nothing Nothing) testText, but it says Ambiguous type variable ‘a0’ arising from a use of ‘parseURL’ prevents the constraint ‘(Rdf a0)’ from being solved. Probable fix: use a type annotation to specify what ‘a0’ should be. So I tried parsed::TList <- ... and fancier things such as Right (parsed::(RDF TList)) <- ..., but so far nothing has worked.
  • I was basically just hoping to find a working example of how to do this. There doesn't seem to be one anywhere in the docs.
  • @Jono I've added an extra section to the answer.
  • Thanks to you both for the feedback on the Haddock docs for XmlParser. I've taken your suggestion on board: hackage.haskell.org/package/rdf4h-3.0.4/docs/…