【问题标题】:Get all fields of a Haskell data contructor获取 Haskell 数据构造函数的所有字段
【发布时间】:2020-01-29 08:33:53
【问题描述】:

假设我有以下数据类型来映射我的数据库架构。

data Object = Object
    { classification :: Text
    , country :: Text
    , numberOfParts :: Int
    -- Lots of other fields
    }

我想提取数据库中所有对象的统计信息。例如,我想提取Person 数据构造函数中每个字段的频率。所以我会有以下功能:

-- In the return type, the key of the Map is the field name. 
-- Each map value represents possible values with frequency 
-- (ex. "classification" -> [("table", 10), ("chair", 3), ("computer", 2)])
generateStats :: [Object] -> Map Text [(Text, Integer)]

此函数将计算每个字段的频率,因此我必须调用id objectclassification objectcountry object 等。如果数据类型有 50 个字段,我将不得不调用 50 个函数来访问这些字段字段。

有没有办法概括这一点?

它可以推广到任何数据构造函数吗?

有没有更优雅的方法来解决这类问题?

【问题讨论】:

  • Generics 或 ScrapYourBoilerplate 在这里可能会有所帮助。不过,我对那些提出解决方案的人并不十分熟悉。
  • 对于像numberOfParts 这样的非Text 字段,generateStats 应该做什么?你想跳过它们吗?
  • @K.A.Buhr 在我的初始实现中,我将每个值都转换为文本

标签: haskell


【解决方案1】:

这类问题可以用泛型解决。通常,syb 包(Data.GenericsData.Data 或 SYB 或“废弃你的样板”泛型)是最容易使用的,因此值得先尝试它,只有在你不能使用时才转向更复杂的库让它为特定任务工作。

这里,syb 提供了一种从记录构造函数中检索字段名称列表的简单方法。如果您为某些Object 派生Data 实例:

{-# LANGUAGE DeriveDataTypeable #-}
import Data.Generics
import Data.Text (Text)
import qualified Data.Text as Text
data Object = Object
    { classification :: Text
    , country :: Text
    , numberOfParts :: Int
    } deriving (Data)

然后您可以在运行时使用以下函数获取字段名称:

-- Get field names (empty list if not record constructor)
getnames :: Data object => object -> [Text]
getnames = map Text.pack . constrFields . toConstr

像这样:

λ> :set -XOverloadedStrings
λ> getnames $ Object "prime" "Canada" 5
["classification","country","numberOfParts"]

您可以在运行时使用通用查询gmapQ 获取字段值Text,并编写通用帮助函数toText,将各种类型的字段值转换为Text

-- Get field values as Text.
getfields :: Data object => object -> [Text]
getfields = gmapQ toText

toText 函数的类型为:

toText :: (Data a) => a -> Text

并且需要准备好处理遇到的任何可能的字段。 Data.Data 泛型的一个限制是您只能处理一组固定的显式类型,其默认值为“其余”。在这里,我们处理TextStringIntDouble 类型,并为“其余”抛出unknown 错误:

{-# LANGUAGE TypeApplications #-}

toText = mkQ unknown           -- make a query with default value "unknown"
                id             -- handle:          id     :: Text -> Text
         `extQ` Text.pack      -- extend to:       pack   :: String -> Text
         `extQ` tshow @Int     -- extend to:       tshow  :: Int -> Text
         `extQ` tshow @Double  -- extend to:       tshow  :: Double -> Text
  where tshow :: (Show a) => a -> Text
        tshow = Text.pack . show
        unknown = error "unsupported type"

如果您想使用Show(或其他)实例处理所有类型,那么syb 将无法完成这项工作。 (如果您尝试删除上面的类型应用程序并编写 `extQ` tshow 来处理所有 Show 情况,您会收到错误消息。)相反,您需要升级到 syb-with-class 或其他一些泛型库来处理这个。

有了这一切,从任何对象中获取键/值对列表是直截了当的:

getpairs :: Data object => object -> [(Text,Text)]
getpairs = zip <$> getnames <*> getfields

这适用于Objects:

λ> concatMap getpairs [Object "prime" "Canada" 5, Object "substandard" "Fakeistan" 100]
[("classification","prime"),("country","Canada"),("numberOfParts","5")
,("classification","substandard"),("country","Fakeistan"),("numberOfParts","100")]

或任何其他带有Data 实例的东西。 Sum 类型和无记录构造函数应该可以正常工作。类型:

data OtherObject = Foo { foo :: String, factor :: Double }
                 | Bar { bar :: Int }
                 | NotARecord Int Int Int
                 deriving (Data)

我们得到:

λ> getpairs $ Foo "exchange" 0.75
[("foo","exchange"),("factor","0.75")]
λ> getpairs $ Bar 42
[("bar","42")]
λ> getpairs $ NotARecord 1 2 3
[]

这是一个完整的代码示例:

{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TypeApplications #-}

import Data.Generics
import Data.Text (Text)
import qualified Data.Text as Text

data Object = Object
    { classification :: Text
    , country :: Text
    , numberOfParts :: Int
    } deriving (Data)

data OtherObject = Foo { foo :: String, factor :: Double }
                 | Bar { bar :: Int }
                 | NotARecord Int Int Int
                 deriving (Data)

-- Get field names (empty list if not record constructor)
getnames :: Data object => object -> [Text]
getnames = map Text.pack . constrFields . toConstr

-- Get field vales as Text.
getfields :: Data object => object -> [Text]
getfields = gmapQ toText

-- Generic function to convert one field.
toText :: (Data a) => a -> Text
toText = mkQ unknown           -- make a query with default value "unknown"
                id             -- handle:          id     :: Text -> Text
         `extQ` Text.pack      -- extend to:       pack   :: String -> Text
         `extQ` tshow @Int     -- extend to:       tshow  :: Int -> Text
         `extQ` tshow @Double  -- extend to:       tshow  :: Double -> Text
  where tshow :: (Show a) => a -> Text
        tshow = Text.pack . show
        unknown = error "unsupported type"

-- Get field name/value pairs from any `Data` object.
getpairs :: Data object => object -> [(Text,Text)]
getpairs = zip <$> getnames <*> getfields

main :: IO ()
main = mapM_ print $
  [ getpairs $ Object "prime" "Canada" 5
  , getpairs $ Foo "exchange" 0.75
  , getpairs $ Bar 42
  , getpairs $ NotARecord 1 2 3
  ]

【讨论】:

    【解决方案2】:

    此解决方案依赖于来自generics-sop 的泛型机制和来自foldl 的流接收器。

    一些必需的编译指示和导入:

    {-# LANGUAGE DeriveGeneric,DeriveAnyClass,ScopedTypeVariables,FlexibleContexts,
                 GADTs,TypeApplications,OverloadedStrings,StandaloneDeriving, TypeOperators #-}
    module Main (main) where
    
    import qualified GHC.Generics as GHC
    import Generics.SOP (All,And,IsProductType,productTypeFrom,
                         DatatypeInfo(..),datatypeInfo,
                         ConstructorInfo(..),FieldInfo(..),FieldName,                         
                         projections, I(..), K(..),type (-.->)(Fn),type (:.:)(Comp),
                         Generic,HasDatatypeInfo)
    import Generics.SOP.NP -- All the *_NP functions come form here
    import Generics.SOP.Dict (Dict(..),zipAll)
    import qualified Control.Foldl as L
    import Data.Proxy
    import Data.Text (Text)
    import qualified Data.Map.Strict as Map
    

    用于计算直方图的数据类型和函数,与任何具体记录无关:

    newtype Histogram a = Histogram (Map.Map a Int) deriving Show
    
    -- Hides the exact type of the key behind an existential
    data SomeHistogram = forall a. (Ord a, Show a) => SomeHistogram (Histogram a) 
    deriving instance Show SomeHistogram
    
    -- Streaming sink for a single field
    histogram :: Ord a => L.Fold a (Histogram a)
    histogram = (L.Fold step Map.empty Histogram) 
      where
        step m a = Map.insertWith (+) a 1 m
    
    -- For any record with Generics.SOP.Generic instance, 
    -- create a streaming sink that accepts record values and
    -- returns a list of histograms, one for each field
    recordHistogram :: forall r xs . (IsProductType r xs, All Ord xs, All Show xs)
                    => L.Fold r [SomeHistogram]
    recordHistogram = 
        let productOfFolds = 
                cliftA_NP 
                    (Proxy @Ord)
                    (\(Fn proj) -> 
                         Comp (L.premap (\o -> let np = productTypeFrom @r @xs o
                                                   I r = proj (K np)
                                                in r) 
                                        histogram))
                    (projections @xs)
            foldToProduct = sequence'_NP productOfFolds -- pull the Fold outward
            -- convince GHC that we have a combination of Ord and Show for all fields
            ordAndShow = zipAll (Dict @(All Ord) @xs) (Dict @(All Show) @xs)
            foldToList = case ordAndShow of -- collapse result of Fold into a list
                Dict -> collapse_NP . cliftA_NP (Proxy @(Ord `And` Show)) (K . SomeHistogram) 
                        <$> 
                        foldToProduct
         in foldToList
    

    如果我们想要一个字段名称列表与直方图列表一起压缩:

    fieldNamesOf :: forall r xs. (IsProductType r xs, HasDatatypeInfo r) 
                 => Proxy r 
                 -> [FieldName] 
    fieldNamesOf _ =
        case datatypeInfo (Proxy @r) of
            ADT _ _ ((Record _ fields) :* Nil) _ -> 
                collapse_NP (liftA_NP (\(FieldInfo i) -> K i) fields)
            _ -> error "hey, not a record!"
    

    Object一起工作:

    data Object = Object
        { classification :: Text
        , country :: Text
        , numberOfParts :: Int
        } deriving (GHC.Generic,Generic,HasDatatypeInfo) 
    -- Generic and HasDatatypeInfo from generics-sop
    
    main :: IO ()
    main = print $ L.fold recordHistogram [Object "foo" "Spain" 4, Object "bar" "France" 4]
    

    这个解决方案有两个潜在的问题:

    • 在内部,recordHistogram 使用来自 generics-sop 的 n-ary products。构建和遍历这些产品可能会产生一些开销。
    • recordHistogram 返回的流接收器(Fold)中可能存在一些空间泄漏。可能需要一些额外的严格性。

    【讨论】:

    • 您实际上不需要在这里使用Dict。您可以将recordHistogram 的类型签名更改为要求All (Ord `And` Show) xs,然后在productOfFolds 中也传递Proxy @(Ord `And` Show)
    猜你喜欢
    • 2013-08-20
    • 1970-01-01
    • 1970-01-01
    • 2019-11-09
    • 2011-07-12
    • 2013-08-14
    • 1970-01-01
    • 1970-01-01
    • 2013-02-22
    相关资源
    最近更新 更多