【Question title】: Why doesn't the where clause work with count in XQuery?
【Posted】: 2021-10-15 08:43:18
【Question】:

I'm currently working on a project that requires me to run some XQuery queries against a specific dataset. Here is a small part of that dataset:

<root>
 <Transaction>
  <Person>
   <Full_Name>  Katherine  Eaton</Full_Name>
   <Age>87</Age>
   <Ssn>314-44-0462</Ssn>
   <Credit_Card>
    <Cc_Provider>JCB  15  digit</Cc_Provider>
    <Cc_Number>5547858204343354 </Cc_Number>
   </Credit_Card>
   <Bought_From>
    <Date>2021-02-13</Date>
    <Price>$34478.90</Price>
    <Status>Undisputed </Status>
    <Merchant>
     <Shop> McDonalds</Shop>
     <Phone>+1-371-602-9171x83395</Phone>
     <Resides_In>
      <Province>Colorado</Province>
      <City>West  Morgantown</City>
      <Address>834  Walker  Canyon</Address>
      <Lat>80.2658445</Lat>
      <Lon>156.324095 </Lon>
     </Resides_In>
    </Merchant>
   </Bought_From>
  </Person>
 </Transaction>
 <Transaction>
  <Person>
   <Full_Name>  Charles  Wright</Full_Name>
   <Age>55</Age>
   <Ssn>420-62-7501</Ssn>
   <Credit_Card>
    <Cc_Provider>Diners  Club  /  Carte  Blanche</Cc_Provider>
    <Cc_Number>4743336688954504 </Cc_Number>
   </Credit_Card>
   <Bought_From>
    <Date>2020-09-24</Date>
    <Price>$477.99</Price>
    <Status>Undisputed </Status>
    <Merchant>
     <Shop> Subway</Shop>
     <Phone>6922856236</Phone>
     <Resides_In>
      <Province>Wisconsin</Province>
      <City>West  Sherri</City>
      <Address>807  Cordova  Ferry</Address>
      <Lat>-6.079631</Lat>
      <Lon>-150.485761 </Lon>
     </Resides_In>
    </Merchant>
   </Bought_From>
  </Person>
 </Transaction>
 <Transaction>
  <Person>
   <Full_Name>  Scott  Gibbs</Full_Name>
   <Age>52</Age>
   <Ssn>717-01-2401</Ssn>
   <Credit_Card>
    <Cc_Provider>VISA  19  digit</Cc_Provider>
    <Cc_Number>371936215412640 </Cc_Number>
   </Credit_Card>
   <Bought_From>
    <Date>2021-01-06</Date>
    <Price>$2.52</Price>
    <Status>Disputed </Status>
    <Merchant>
     <Shop> American  Apparel</Shop>
     <Phone>(453)737-9365</Phone>
     <Resides_In>
      <Province>Nebraska</Province>
      <City>Sheilamouth</City>
      <Address>70734  Frye  Ridge</Address>
      <Lat>51.8881985</Lat>
      <Lon>-147.147829 </Lon>
     </Resides_In>
    </Merchant>
   </Bought_From>
  </Person>
 </Transaction>
 <Transaction>
  <Person>
   <Full_Name>  Wesley  Underwood</Full_Name>
   <Age>82</Age>
   <Ssn>265-39-3658</Ssn>
   <Credit_Card>
    <Cc_Provider>Discover</Cc_Provider>
    <Cc_Number>30354748203291 </Cc_Number>
   </Credit_Card>
   <Bought_From>
    <Date>2021-07-20</Date>
    <Price>$691.93</Price>
    <Status>Disputed </Status>
    <Merchant>
     <Shop> Amazon</Shop>
     <Phone>(274)381-6022</Phone>
     <Resides_In>
      <Province>Minnesota</Province>
      <City>Jorgeview</City>
      <Address>877  Debra  Way  Apt.  305</Address>
      <Lat>-59.405851</Lat>
      <Lon>3.413555 </Lon>
     </Resides_In>
    </Merchant>
   </Bought_From>
  </Person>
 </Transaction>
 <Transaction>
  <Person>
   <Full_Name>  Scott  Gibbs</Full_Name>
   <Age>52</Age>
   <Ssn>717-01-2401</Ssn>
   <Credit_Card>
    <Cc_Provider>VISA  19  digit</Cc_Provider>
    <Cc_Number>371936215412640 </Cc_Number>
   </Credit_Card>
   <Bought_From>
    <Date>2020-12-03</Date>
    <Price>$1.21</Price>
    <Status>Disputed </Status>
    <Merchant>
     <Shop> Amazon</Shop>
     <Phone>(274)381-6022</Phone>
     <Resides_In>
      <Province>Minnesota</Province>
      <City>Jorgeview</City>
      <Address>877  Debra  Way  Apt.  305</Address>
      <Lat>-59.405851</Lat>
      <Lon>3.413555 </Lon>
     </Resides_In>
    </Merchant>
   </Bought_From>
  </Person>
 </Transaction>
</root>

With this query, I want to get the person with the most disputed transactions across all transactions:
for $xml in 
doc("dataset_100.xml")/root/Transaction
 where $xml//Status = "Disputed"
 for $x in
 (
  for $name in distinct-values(//Full_Name)
  order by count(//Full_Name[. = $name]) descending
  return <x>{$name}</x>
 )
 return fn:concat(
           $x, 
           ' - Contexted Transactions - ', 
           xs:string(count(//Full_Name[. = $x])))

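But the result is a list of everyone who made a transaction, from the first element to the last, repeated each time, whether the transactions were disputed or not: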

`Katherine  Eaton - Contexted Transactions - 3
 Charles  Wright - Contexted Transactions - 6
 Scott  Gibbs - Contexted Transactions - 3
 Wesley  Underwood - Contexted Transactions - 3
 Andres  Hanna - Contexted Transactions - 2`

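I know this is incorrect because I've tested it in neo4j, but I really don't know where to start fixing it.

Edit: I actually found out how to post the missing part of the code that I wasn't allowed to post before. So I'm really sorry, and thank you for your answer, Martin Honnen, but this is the actual XML.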

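【Comments】:

  • I certainly hope these are completely fictional names, SSNs, and credit card numbers.
  • Your query expects <root><Transaction>, but your sample data doesn't include them. Please make it representative.
  • By the way, in XQuery 3.0 you can use declare context item to embed your document and your query together in a single unit that can be run directly. That makes it much easier to build (and test) fully self-contained minimal reproducible examples.
  • ("Please make it representative" meaning "please test that your query can actually run against the data you provide, and that it returns the specific output you claim when run against that data")
  • Yes, the data is entirely randomly generated with the Faker class in Python.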
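
The `declare context item` suggestion from the comments can be sketched as follows. The two-transaction document is a hypothetical cut-down sample (only the name and status fields from the excerpt are kept), and the closing `count(...)` expression is just an illustration of a path that resolves against the embedded document:

```xquery
xquery version "3.0";

(: Hypothetical two-transaction sample bound as the context item. :)
declare context item := document {
  <root>
    <Transaction><Person>
      <Full_Name>  Scott  Gibbs</Full_Name>
      <Bought_From><Status>Disputed </Status></Bought_From>
    </Person></Transaction>
    <Transaction><Person>
      <Full_Name>  Katherine  Eaton</Full_Name>
      <Bought_From><Status>Undisputed </Status></Bought_From>
    </Person></Transaction>
  </root>
};

(: Paths starting with / now resolve against the embedded document. :)
count(/root/Transaction[normalize-space(Person/Bought_From/Status) = 'Disputed'])
```

With this sample, the query should return 1, since only one of the two embedded transactions is disputed once the trailing space in Status is normalized away.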

Tags: xml xquery basex


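【Solution 1】:

For the seemingly unordered, unstructured data you showed, you can first restructure it with a tumbling window, then group the result by Full_Name, count, sort, and select the first item: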

(for $person in
for tumbling window $transaction in root/*
start  $s when $s instance of element(Full_Name)
where $transaction[self::Bought_From]/Status/normalize-space() = 'Disputed'
return <transaction>{$transaction}</transaction>
group by $name := $person/Full_Name/normalize-space()
order by count($person) descending
return $name || ':' || count($person)) => head()
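
To see what the tumbling window step does on its own, here is a minimal sketch. The flat input is hypothetical (the unstructured sample is no longer shown in the question), and the variable name `$flat` is an assumption made for illustration:

```xquery
xquery version "3.0";

(: Hypothetical flat input: the fields of each purchase are siblings,
   with no wrapping Transaction or Person element. :)
declare variable $flat :=
  <root>
    <Full_Name>Scott Gibbs</Full_Name>
    <Age>52</Age>
    <Bought_From><Status>Disputed</Status></Bought_From>
    <Full_Name>Katherine Eaton</Full_Name>
    <Age>87</Age>
    <Bought_From><Status>Undisputed</Status></Bought_From>
  </root>;

(: A new window opens at every Full_Name and runs until the next one. :)
for tumbling window $w in $flat/*
  start $s when $s instance of element(Full_Name)
return <transaction>{$w}</transaction>
```

Because no end condition is given, each window closes just before the next Full_Name, so this sample should come back as two transaction elements with three children each. The where clause in the answer then keeps only the windows whose Bought_From child has a Disputed status.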

For the edited, structured sample, this simplifies to

(for $transaction in root/Transaction[normalize-space(Person/Bought_From/Status) = 'Disputed']
group by $name := normalize-space($transaction/Person/Full_Name)
let $count := count($transaction)
order by $count descending
return $name || ': ' || $count) => head()

which then yields Scott Gibbs: 2
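
As for why the where clause in the question appears to have no effect: the inner paths `//Full_Name` are not relative to `$xml`. A path starting with `//` is evaluated from the root of the context item's tree, so the counts cover every transaction in the document, and the outer `for` only repeats the whole list once per transaction that passes the filter. The trailing space in `<Status>Disputed </Status>` may also matter, since a plain comparison with "Disputed" does not match it without normalize-space. A sketch that stays closer to the shape of the original query, assuming a hypothetical inline sample in place of dataset_100.xml:

```xquery
xquery version "3.1";

(: Hypothetical inline sample standing in for dataset_100.xml;
   names and statuses are copied from the question's excerpt. :)
declare variable $root :=
  <root>
    <Transaction><Person>
      <Full_Name>  Katherine  Eaton</Full_Name>
      <Bought_From><Status>Undisputed </Status></Bought_From>
    </Person></Transaction>
    <Transaction><Person>
      <Full_Name>  Scott  Gibbs</Full_Name>
      <Bought_From><Status>Disputed </Status></Bought_From>
    </Person></Transaction>
    <Transaction><Person>
      <Full_Name>  Wesley  Underwood</Full_Name>
      <Bought_From><Status>Disputed </Status></Bought_From>
    </Person></Transaction>
    <Transaction><Person>
      <Full_Name>  Scott  Gibbs</Full_Name>
      <Bought_From><Status>Disputed </Status></Bought_From>
    </Person></Transaction>
  </root>;

(: Filter once; every later step starts from $disputed, not from the document root. :)
let $disputed :=
  $root/Transaction[normalize-space(Person/Bought_From/Status) = 'Disputed']
for $name in distinct-values($disputed/Person/Full_Name ! normalize-space(.))
let $count := count($disputed[normalize-space(Person/Full_Name) = $name])
order by $count descending
return $name || ' - disputed transactions - ' || $count
```

On this sample the result should list Scott Gibbs with 2 disputed transactions, followed by Wesley Underwood with 1. Wrapping the FLWOR expression in parentheses and applying `=> head()`, as in the answer, would keep only the top entry.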
