【Question title】: Dask read_sql_query raised AttributeError: 'Select' object has no attribute 'subquery'
【Posted】: 2023-01-02 00:27:12
【Question】:

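I am trying to read data from MariaDB into a Dask DataFrame using SQL. According to the Dask documentation, the read_sql_query function takes its sql parameter as a SQLAlchemy Selectable.

So I tried to wrap my SQL query in a SQLAlchemy Select object as follows: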

sql = """

SELECT t2.wip_entity_id
       , t1.class_code
       , t1.attribute2
  FROM table_1 t1
       , table_2 t2
 WHERE t1.wip_entity_id = t2.wip_entity_id

"""
wip_entity_id = sql.column("wip_entity_id")
maria_conn_string = "xxxxx"
sel = text(sql)
sel = sel.columns()
sel = sel.alias('a')
sel = select([wip_entity_id, class_code]).select_from(sel)

data = read_sql_query(sql=sel, maria_conn_string, index_col=wip_entity_id)



AttributeError: 'Select' object has no attribute 'subquery'

However, if I take the same Select object and execute it directly with a SQLAlchemy engine, it works:

sel = text(sql)
sel = sel.columns()
sel = sel.alias('a')
sel = select([wip_entity_id, class_code]).select_from(sel)


engine = create_engine(maria_conn_string )
cursor = engine.execute(sel)
row = cursor.fetchone()

Does anyone know how to solve this?

    Tags: dataframe sqlalchemy dask


    【Solution 1】:

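    First, a few comments: the class_code variable is never defined, and I don't understand your empty sel.columns() call. What is it supposed to do?

    Then, if you only want to select certain columns from a textual query, the simplest way is to turn it into a subquery and select from the columns that the subquery defines.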

    from sqlalchemy import Integer, column, select, text
    
    raw_stmt = """
    SELECT t2.wip_entity_id
           , t1.class_code
           , t1.attribute2
      FROM foo t1
           , bar t2
     WHERE t1.wip_entity_id = t2.wip_entity_id
    """
    
    stmt = text(raw_stmt).columns(
        column("wip_entity_id", Integer),
        column("class_code", Integer),
        column("attribute2", Integer),
    ).subquery()
    
    # Positional arguments work on SQLAlchemy 1.4 and 2.0; the list form was removed in 2.0.
    stmt = select(stmt.c.wip_entity_id, stmt.c.class_code)
    

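    This works and emits a SELECT against a subquery that contains your SQL statement.

    But why keep all the columns in the textual statement and then trim them down with SQLAlchemy? Why not simplify the statement and use it directly?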
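To see what the subquery approach emits, the statement can be rendered as a string without connecting to a database. This is only a sketch: it assumes SQLAlchemy 1.4 or later (where `.subquery()` is available), and the variable names `subq` and `compiled_sql` are mine, not part of the original answer.

```python
from sqlalchemy import Integer, column, select, text

raw_stmt = """
SELECT t2.wip_entity_id
       , t1.class_code
       , t1.attribute2
  FROM foo t1
       , bar t2
 WHERE t1.wip_entity_id = t2.wip_entity_id
"""

# Declare the result columns so the textual statement can act as a subquery.
subq = (
    text(raw_stmt)
    .columns(
        column("wip_entity_id", Integer),
        column("class_code", Integer),
        column("attribute2", Integer),
    )
    .subquery("a")
)

# Select only the two columns of interest from the named subquery.
stmt = select(subq.c.wip_entity_id, subq.c.class_code)

# str() renders the statement with the default dialect; no engine is needed.
compiled_sql = str(stmt)
print(compiled_sql)
```

The printed SQL should show the outer SELECT naming only `a.wip_entity_id` and `a.class_code`, with the original text wrapped in parentheses as the subquery.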

    raw_stmt = """
    SELECT t2.wip_entity_id
           , t1.class_code
      FROM foo t1
           , bar t2
     WHERE t1.wip_entity_id = t2.wip_entity_id
    """
    
    stmt = text(raw_stmt).columns(
        column("wip_entity_id", Integer),
        column("class_code", Integer),
    )
    

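    Here is a complete demo of the original subquery approach using pandas (not dask, because I don't have it installed, and the read_sql_query functions behave the same for a simple statement like this).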

    import pandas as pd
    from sqlalchemy import Column, Integer, column, create_engine, select, text
    from sqlalchemy.orm import Session, declarative_base
    
    Base = declarative_base()
    
    
    class Foo(Base):
        __tablename__ = "foo"
        id = Column(Integer, primary_key=True, autoincrement=True)
        wip_entity_id = Column(Integer)
        class_code = Column(Integer)
        attribute2 = Column(Integer)
    
    
    class Bar(Base):
        __tablename__ = "bar"
        id = Column(Integer, primary_key=True, autoincrement=True)
        wip_entity_id = Column(Integer)
    
    
    engine = create_engine("sqlite://", future=True, echo=True)
    
    Base.metadata.create_all(engine)
    
    with Session(bind=engine) as session:
        session.add_all(
            [
                Foo(wip_entity_id=1, class_code=101, attribute2=41),
                Bar(wip_entity_id=1),
                Foo(wip_entity_id=2, class_code=102, attribute2=42),
                Foo(wip_entity_id=3, class_code=103, attribute2=43),
                Bar(wip_entity_id=3),
            ]
        )
        session.commit()
    
    raw_stmt = """
    SELECT t2.wip_entity_id
           , t1.class_code
           , t1.attribute2
      FROM foo t1
           , bar t2
     WHERE t1.wip_entity_id = t2.wip_entity_id
    """
    
    stmt = (
        text(raw_stmt)
        .columns(
            column("wip_entity_id", Integer),
            column("class_code", Integer),
            column("attribute2", Integer),
        )
        .subquery("a")
    )
    
    # Positional arguments work on SQLAlchemy 1.4 and 2.0; the list form was removed in 2.0.
    stmt = select(stmt.c.wip_entity_id, stmt.c.class_code)
    
    with engine.connect() as con:
        df = pd.read_sql_query(stmt, con, index_col="wip_entity_id")
    

    df contains:

                   class_code
    wip_entity_id            
    1                     101
    3                     103
    
