首先是一些 cmets,class_code 变量从未定义,我不明白你的空 sel.columns(),它应该做什么?
然后,如果您只想从文本查询中选择某些列,最简单的方法是将其转换为子查询并选择该子查询定义的列。
raw_stmt = """
SELECT t2.wip_entity_id
, t1.class_code
, t1.attribute2
FROM foo t1
, bar t2
WHERE t1.wip_entity_id = t2.wip_entity_id
"""
stmt = text(raw_stmt).columns(
column("wip_entity_id", Integer),
column("class_code", Integer),
column("attribute2", Integer),
).subquery()
stmt = select([stmt.c.wip_entity_id, stmt.c.class_code])
这将起作用并发出对包含您的 SQL 语句的子查询的选择。
但是,为什么要将所有列都保留在文本语句中,然后使用 SQLAlchemy 将其缩减?为什么不简化它直接使用呢?
raw_stmt = """
SELECT t2.wip_entity_id
, t1.class_code
FROM foo t1
, bar t2
WHERE t1.wip_entity_id = t2.wip_entity_id
"""
stmt = text(raw_stmt).columns(
column("wip_entity_id", Integer),
column("class_code", Integer),
)
这是使用 pandas 的原始子查询方法的完整演示(不是 dask,因为我没有安装它并且 read_sql_query funcs 对于这样的简单语句表现相同。
import pandas as pd
from sqlalchemy import Column, Integer, column, create_engine, select, text
from sqlalchemy.orm import Session, declarative_base
Base = declarative_base()
class Foo(Base):
__tablename__ = "foo"
id = Column(Integer, primary_key=True, autoincrement=True)
wip_entity_id = Column(Integer)
class_code = Column(Integer)
attribute2 = Column(Integer)
class Bar(Base):
__tablename__ = "bar"
id = Column(Integer, primary_key=True, autoincrement=True)
wip_entity_id = Column(Integer)
engine = create_engine("sqlite://", future=True, echo=True)
Base.metadata.create_all(engine)
with Session(bind=engine) as session:
session.add_all(
[
Foo(wip_entity_id=1, class_code=101, attribute2=41),
Bar(wip_entity_id=1),
Foo(wip_entity_id=2, class_code=102, attribute2=42),
Foo(wip_entity_id=3, class_code=103, attribute2=43),
Bar(wip_entity_id=3),
]
)
session.commit()
raw_stmt = """
SELECT t2.wip_entity_id
, t1.class_code
, t1.attribute2
FROM foo t1
, bar t2
WHERE t1.wip_entity_id = t2.wip_entity_id
"""
stmt = (
text(raw_stmt)
.columns(
column("wip_entity_id", Integer),
column("class_code", Integer),
column("attribute2", Integer),
)
.subquery("a")
)
stmt = select([stmt.c.wip_entity_id, stmt.c.class_code])
with engine.connect() as con:
df = pd.read_sql_query(stmt, con, index_col="wip_entity_id")
df 包含:
class_code
wip_entity_id
1 101
3 103