【Question title】: Get the child with the maximal value for each parent
【Posted】: 2018-09-08 20:51:00
【Question】:

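I created a mapping with a parent-child join data type. In a single query, I want to get, for each parent, the child with the maximal value.

Is this possible? I have tried several things, such as inner_hits definitions, the top_hits and children aggregations, and the has_parent and has_child queries.

My mapping is based on the Post, Question and Answer classes in this elasticsearch_dsl example.

A solution using elasticsearch_dsl code would be great, but a plain Elasticsearch query would also help.

Thanks :)

Edit: I'm attaching my code in case it helps.

LoggerLogBase (based on the Post class):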

from elasticsearch_dsl import Boolean, Date, Document, Float, Join, Keyword

class LoggerLogBase(Document):
    """
    A base class for :class:`~data_classes.Log` and :class:`~data_classes.Logger` data classes.
    """

    logger_log = Join(relations={'logger': 'log'})

    @classmethod
    def _matches(cls, hit):
        """
        Returns whether a hit matches this class or not.
        """
        return False

    class Index:
        """
        Meta-class for defining the index name.
        """
        name = 'logger-log'

Logger (based on the Question class):

class Logger(LoggerLogBase):
    """
    A class to represent a temperature logger.
    """
    name = Keyword()
    display_name = Keyword()
    is_displayed = Boolean()

    @classmethod
    def _matches(cls, hit):
        """
        Returns whether a hit matches this class or not.
        """
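        source = hit['_source']['logger_log']
        # save() below stores {'name': 'logger'}, so accept both the dict and the plain-string form
        return source == 'logger' or (isinstance(source, dict) and source.get('name') == 'logger')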

    @classmethod
    def search(cls, **kwargs):
        """
        Creates an :class:`~elasticsearch_dsl.Search` instance that will search
        over this index.
        """
        return cls._index.search(**kwargs).filter('term', logger_log='logger')

    def add_log(self, timestamp, heat_index_celsius, humidity, temperature_celsius):
        """
        Save a new log which was logged by this logger.
        """
        log = Log(
            _routing=self.meta.id,
            logger_log={'name': 'log', 'parent': self.meta.id},
            timestamp=timestamp,
            heat_index_celsius=heat_index_celsius,
            humidity=humidity,
            temperature_celsius=temperature_celsius
        )

        log.save()
        return log

    def search_logs(self):
        """
        Returns the search for this logger's logs.
        """
        search = Log.search()
        search = search.filter('parent_id', type='log', id=self.meta.id)
        search = search.params(routing=self.meta.id)
        return search

    def search_latest_log(self):
        """
        Returns the search for this logger's latest log.
        """
        search = self.search_logs()\
                        .params(size=0)
        search.aggs.metric('latest_log',
                           'top_hits',
                           sort=[{'timestamp': {'order': 'desc'}}],
                           size=1)
        return search

    def save(self, using=None, index=None, validate=True, **kwargs):
        """
        Saves the document into elasticsearch.
        See documentation for elasticsearch_dsl.Document.save for more information.
        """
        self.logger_log = {'name': 'logger'}
        return super().save(using, index, validate, **kwargs)

Log (based on the Answer class):

class Log(LoggerLogBase):
    """
    A class to represent a single temperature measurement log.
    """
    timestamp = Date()
    heat_index_celsius = Float()
    humidity = Float()
    temperature_celsius = Float()

    @classmethod
    def _matches(cls, hit):
        """
        Returns whether a hit matches this class or not.
        """
        return isinstance(hit['_source']['logger_log'], dict) \
            and hit['_source']['logger_log'].get('name') == 'log'

    @classmethod
    def search(cls, using=None, **kwargs):
        """
        Creates an :class:`~elasticsearch_dsl.Search` instance that will search
        over this index.
        """
        return cls._index.search(using=using, **kwargs).exclude('term', logger_log='logger')

    @property
    def logger(self):
        """
        Returns the logger that logged this log.
        """
        if 'logger' not in self.meta:
            self.meta.logger = Logger.get(id=self.logger_log.parent, index=self.meta.index)
        return self.meta.logger

    def save(self, using=None, index=None, validate=True, **kwargs):
        """
        Saves the document into elasticsearch.
        See documentation for elasticsearch_dsl.Document.save for more information.
        """
        self.meta.routing = self.logger_log.parent
        return super().save(using, index, validate, **kwargs)

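My current solution is to call logger.search_latest_log() for each logger, but that takes N queries. I'd like to do it in a single query to improve the performance of this operation.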

【Question discussion】:

    Tags: python elasticsearch


    【Solution 1】:

    I think what you need is a combination of the children aggregation and a top_hits aggregation:

    POST logger-log/_search?size=0
    {
      "aggs": {
        "top-loggers": {
          "terms": {
            "field": "name"
          },
          "aggs": {
            "to-logs": {
              "children": {
                "type": "log"
              },
              "aggs": {
                "top-logs": {
                  "top_hits": {
                    "size": 1,
                    "sort": [
                      {
                        "timestamp": {
                          "order": "desc"
                        }
                      }
                    ]
                  }
                }
              }
            }
          }
        }
      }
    }
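To run the same request from Python, you can build the body as a plain dict and then read the newest log out of each bucket. What follows is a minimal sketch, not part of the original answer. The helper names and the num_loggers parameter are hypothetical. The sketch puts size 0 in the body rather than in the URL, which is equivalent. It also sets the terms size explicitly, because a terms aggregation returns only 10 buckets by default and would otherwise silently drop loggers beyond the first ten.

```python
def latest_log_per_logger_body(num_loggers=10):
    """Request body from the answer: one terms bucket per logger name,
    a children hop to that logger's logs, and top_hits keeping the newest log."""
    return {
        "size": 0,  # only aggregations are needed, no top-level hits
        "aggs": {
            "top-loggers": {
                # hypothetical parameter: raise above the default of 10 buckets
                "terms": {"field": "name", "size": num_loggers},
                "aggs": {
                    "to-logs": {
                        "children": {"type": "log"},  # join from loggers to their logs
                        "aggs": {
                            "top-logs": {
                                "top_hits": {
                                    "size": 1,
                                    "sort": [{"timestamp": {"order": "desc"}}],
                                }
                            }
                        },
                    }
                },
            }
        },
    }


def latest_logs(response):
    """Map each logger name to the _source of its newest log (None if it has no logs)."""
    result = {}
    for bucket in response["aggregations"]["top-loggers"]["buckets"]:
        hits = bucket["to-logs"]["top-logs"]["hits"]["hits"]
        result[bucket["key"]] = hits[0]["_source"] if hits else None
    return result
```

With a 6.x-era elasticsearch-py client, you would pass the body as es.search(index='logger-log', body=latest_log_per_logger_body(50)) and feed the returned dict to latest_logs. In elasticsearch_dsl, you can express the same nesting by chaining s.aggs.bucket('top-loggers', 'terms', field='name').bucket('to-logs', 'children', type='log').metric('top-logs', 'top_hits', size=1, sort=[{'timestamp': {'order': 'desc'}}]).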
    

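    【Comments】:

    • Let me know whether it works or if you run into any problems ;-)
    • Nice, it works! (After changing "name.keyword" to "name".) I don't like the "terms" aggregation in "top-loggers", but I think I'll find a better way to handle it. I also want to add some conditions, so I'll build from here. Thanks!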
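The comment above would rather avoid the terms bucket on name. Grouping by name also merges loggers that share a name into one bucket. One possible alternative is sketched below. It is an untested assumption and not part of the accepted answer. The Elasticsearch parent-join documentation describes an internal field named "<join field>#<parent relation>", which would be logger_log#logger here and holds the parent id on each child document. If that holds for your version, you could group the logs by parent id directly and skip the children hop. The asker's extra conditions would go into the bool filter. The function names and the extra_filters and num_loggers parameters are hypothetical.

```python
def latest_log_per_parent_body(num_loggers=10, extra_filters=None):
    """Group log documents by parent (logger) id and keep the newest log per group.

    Assumption: the join field "logger_log" exposes an internal field
    "logger_log#logger" holding the parent id on each log document.
    """
    filters = [{"term": {"logger_log": "log"}}]  # restrict to child (log) documents
    filters.extend(extra_filters or [])  # hypothetical hook for additional conditions
    return {
        "size": 0,
        "query": {"bool": {"filter": filters}},
        "aggs": {
            "per-logger": {
                # one bucket per parent id; terms returns 10 buckets unless size is raised
                "terms": {"field": "logger_log#logger", "size": num_loggers},
                "aggs": {
                    "latest-log": {
                        "top_hits": {
                            "size": 1,
                            "sort": [{"timestamp": {"order": "desc"}}],
                        }
                    }
                },
            }
        },
    }


def latest_log_by_logger_id(response):
    """Map each logger id (bucket key) to the _source of its newest log.
    Every bucket is built from at least one log, so hits is never empty."""
    return {
        bucket["key"]: bucket["latest-log"]["hits"]["hits"][0]["_source"]
        for bucket in response["aggregations"]["per-logger"]["buckets"]
    }
```

Because the bucket keys are logger ids rather than names, you can fetch the matching Logger documents afterwards by id if you need their display fields.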