【Question title】: pandasql::sqldf not capturing looping variable
【Posted】: 2018-07-19 01:52:50
【Question】:

I am trying to run pandasql::sqldf in a loop over a list, but sqldf does not seem to pick up the loop variable. Below is a stylized outline of my problem:

import pandas as pd
from pandasql import sqldf
from datetime import datetime

FreqGamePlay = pd.DataFrame({'CONTACT_WID' : [1, 2, 3, 1, 4], 
                         'TITLE_NOMIN_DT' : pd.to_datetime(['20130102', '20140103', '20120518', 
                                        '20140317', '20111123']),
                        'FreqGamePlay' : [12, 9, 22, 4, 5]})
FreqGamePlay = FreqGamePlay[['CONTACT_WID', 'TITLE_NOMIN_DT', 'FreqGamePlay']]

periodsList = ['2012-12-26', '2012-02-28']
for i in periodsList:
    temp = sqldf("select CONTACT_WID, sum(FreqGamePlay) as FGP from FreqGamePlay where TITLE_NOMIN_DT > i group by CONTACT_WID;", globals())
    print(temp)

The program above raises the following error:

PandaSQLException: (sqlite3.OperationalError) no such column: i [SQL: 'select CONTACT_WID, sum(FreqGamePlay) as FGP from FreqGamePlay where TITLE_NOMIN_DT > i group by CONTACT_WID;']

But if I hard-code the date by hand, it works fine:

for i in periodsList:
    temp = sqldf("select CONTACT_WID, sum(FreqGamePlay) as FGP from FreqGamePlay where TITLE_NOMIN_DT > '2012-12-26' group by CONTACT_WID;", globals())
    print(temp)

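Hard-coding does not scale, though, because the list of dates in the real program is much longer. Any suggestions are much appreciated, thanks.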

【Comments】:

    Tags: python sql pandas pandasql


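    【Solution 1】:

    This happens because the "i" is written directly inside the SQL string, so Python treats it as part of the string and never substitutes the variable's value (note that in the error message, i has not been replaced by its value). SQLite then reads i as a column name, which is why it reports "no such column: i". I suggest reading up on how Python strings and variables work. In the meantime, try this: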

    for i in periodsList:
        query = "select CONTACT_WID, sum(FreqGamePlay) as FGP from FreqGamePlay where TITLE_NOMIN_DT > '{}' group by CONTACT_WID;".format(i)
        temp = sqldf(query, globals())
    

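    The curly braces act as a placeholder for the variable, and the format() method replaces the placeholder with the variable's value. The single quotes around the placeholder make the substituted date a SQL string literal, just like the hard-coded version.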
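As an alternative to building the SQL text with format(), the value can be bound as a query parameter. In this thread sqldf is only ever called with a query string and a namespace, so the sketch below does not use pandasql at all: it assumes the same rows are loaded into an in-memory SQLite database through Python's standard sqlite3 module, with the dates stored as ISO-8601 text. The `rows` list, the `query` string and the `results` dictionary are illustrative names, not part of the original post.

```python
import sqlite3

# Same rows as the question's FreqGamePlay frame (assumption: dates kept as
# ISO-8601 text, so string comparison matches chronological order).
rows = [
    (1, "2013-01-02", 12),
    (2, "2014-01-03", 9),
    (3, "2012-05-18", 22),
    (1, "2014-03-17", 4),
    (4, "2011-11-23", 5),
]
periodsList = ["2012-12-26", "2012-02-28"]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table FreqGamePlay "
    "(CONTACT_WID integer, TITLE_NOMIN_DT text, FreqGamePlay integer)"
)
conn.executemany("insert into FreqGamePlay values (?, ?, ?)", rows)

# The ? is a bound parameter: SQLite receives the value of i separately,
# so no quoting or string formatting is needed.
query = (
    "select CONTACT_WID, sum(FreqGamePlay) as FGP from FreqGamePlay "
    "where TITLE_NOMIN_DT > ? group by CONTACT_WID order by CONTACT_WID;"
)

results = {}
for i in periodsList:
    results[i] = conn.execute(query, (i,)).fetchall()
    print(i, results[i])

conn.close()
```

Binding keeps the SQL text constant across iterations and avoids quoting mistakes. For a fixed list of dates written in the script itself, the format() approach above is equally workable.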

    【Comments】:
