【Question title】: Tuple Index Out of Range - but only when run in a function
【Posted】: 2021-08-12 21:28:25
【Question】:

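I'm trying to use SQLParse to write a process that lists the tables present in a SQL statement, focusing for now on the "FROM" clause of a query. I'm also trying to identify nested queries (or subqueries) in the FROM clause and run the process again to identify the tables inside that nested query.

Using this example query: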

import sqlparse
from sqlparse.sql import IdentifierList, Identifier, Function, Where, Parenthesis, TokenList
from sqlparse.tokens import Keyword, DML, Punctuation

sql_2 = """select * from luv_main.test_table left join (select * from luv_all.fake_Table where (a = b)) x  where a = 4 order by A, B, C"""

Here is the code being run:

# parse the example query into a single Statement
parsed = sqlparse.parse(sql_2)[0]

full_tables = []
tables = []

from_seen = False
for item in parsed.tokens:
    
    #stop the process if the Where statement is reached
    if isinstance(item, Where):
        from_seen = False
    
    if from_seen:
     
        #multiple tables with Join statements in between, or one table. Doesn't consider subqueries
        if isinstance(item, Identifier):
            
            #checks to see if there is a parenthesis, meaning a subquery 
            if 'SELECT' in item.value.upper():
                subquery = item.value
                
            #returns the db name 
            tables.append(item.get_parent_name())
            
            #returns the table name
            tables.append(item.get_real_name())

            #returns the alias
            tables.append(item.get_alias())
            
            full_tables.append(tables)
            tables = []
        
        
        # if multiple tables separated by comma's will be an identifier list. Doesn't consider subqueries
        if isinstance(item, IdentifierList):
            for identifier in item.get_identifiers():
                #returns the db name
                tables.append(identifier.get_parent_name())
                
                #returns the table name
                tables.append(identifier.get_real_name())
                
                #returns the alias
                tables.append(identifier.get_alias())
                
                full_tables.append(tables)
                tables = []
                 
    else:
        if item.ttype is Keyword and item.value.upper() == 'FROM':
            from_seen = True
       
print(full_tables)
print(len(full_tables))

This starts from the query and identifies the subquery by searching for the word select. Then I have:

#process of removing outer-most parentheses and identifying aliases that sit outside that window

#new subquery string ready to parse
res_sub = ""

#capture the alias
alias = ""

#record the number of parentheses as they open and close
paren_cnt = 0


for char in subquery:
    
    #if ( and there's already been a ( , include it
    if char == '(' and paren_cnt > 0:
        res_sub += char
    
    #if (, add to the count
    if char == '(':
        paren_cnt += 1
   
    # if ) and there's at least 2 (, include it
    if char == ')' and paren_cnt > 1:
        res_sub += char
          
    # if ), subtract from the count        
    if char == ')':
        paren_cnt -= 1
    
    # capture the script
    if char != '(' and char != ')' and paren_cnt >0:
        res_sub += char
    
    # capture the alias
    if char != '(' and char != ')'  and char != ' ' and paren_cnt == 0:
        alias += char
        
subparsed = sqlparse.parse(res_sub)[0]

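This removes the outermost parentheses, and the result is parsed as a new SQL statement. That all works: if I manually run this parsed statement through the previous code block, it behaves as expected.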
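The character loop above can be wrapped in a function that returns both pieces. This is a minimal sketch with the same logic; `strip_outer_parens` is a hypothetical name, not something from the original code or from sqlparse:

```python
def strip_outer_parens(subquery):
    """Split '(inner sql) alias' into (inner_sql, alias).

    Hypothetical helper: drops only the outermost pair of parentheses
    and collects any non-space characters outside them as the alias.
    """
    inner = []
    alias = []
    depth = 0  # number of currently open parentheses
    for char in subquery:
        if char == '(':
            if depth > 0:          # nested '(' belongs to the inner SQL
                inner.append(char)
            depth += 1
        elif char == ')':
            if depth > 1:          # nested ')' belongs to the inner SQL
                inner.append(char)
            depth -= 1
        elif depth > 0:            # ordinary character inside the parentheses
            inner.append(char)
        elif char != ' ':          # ordinary character outside: part of the alias
            alias.append(char)
    return ''.join(inner), ''.join(alias)


inner_sql, alias = strip_outer_parens(
    "(select * from luv_all.fake_Table where (a = b)) x"
)
print(inner_sql)  # select * from luv_all.fake_Table where (a = b)
print(alias)      # x
```

Note that input without any parentheses produces an empty inner string, with all of the text ending up in the alias.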

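I then tried to put this into separate functions:

  • First, a function that parses the query and calls:
  • a function that scans the FROM clause and returns the tables, but if it identifies a subquery, it calls:
  • a function that removes the outermost parentheses from the script and then calls the first function to send it back through the process.

But a tuple index out of range error occurs when it tries to run sqlparse.parse(res_sub)[0]. It shouldn't be a tuple; it should be a str that is then parsed into a sqlparse.sql.Statement.

I don't understand why it behaves differently just because I've put it into a series of functions. The function code is below: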

def parse(sql):
    
    parsed = sqlparse.parse(sql)[0]
        
    #call function to assess the FROM statement of the query
    assess_from_clause(parsed)

def assess_from_clause(parsed):
    
    full_tables = []
    tables = []
    
    from_seen = False
    for item in parsed.tokens:
        #stop the process if the Where statement is reached
        if isinstance(item, Where):
            from_seen = False
        
        #checks to see if there is a parenthesis, meaning a subquery 
        if 'SELECT' in item.value.upper():
            subquery = item.value
            subquery_parsing(subquery)
        
        if from_seen:
            #multiple tables with Join statements in between, or one table. Doesn't consider subqueries
            if isinstance(item, Identifier):
                
                #returns the db name 
                tables.append(item.get_parent_name())
            
                #returns the table name
                tables.append(item.get_real_name())

                #returns the alias
                tables.append(item.get_alias())
            
                full_tables.append(tables)
                tables = []
        
        
            # if multiple tables separated by comma's will be an identifier list. Doesn't consider subqueries
            if isinstance(item, IdentifierList):
                for identifier in item.get_identifiers():
                    #returns the db name
                    tables.append(identifier.get_parent_name())
                
                    #returns the table name
                    tables.append(identifier.get_real_name())
                
                    #returns the alias
                    tables.append(identifier.get_alias())
                
                    full_tables.append(tables)
                    tables = []
                 
        else:
            if item.ttype is Keyword and item.value.upper() == 'FROM':
                from_seen = True
       
    print(full_tables)

def subquery_parsing(subquery):
    
    #new subquery string ready to parse
    res_sub = ''

    #capture the alias
    alias = ''

    #record the number of parentheses as they open and close
    paren_cnt = 0


    for char in subquery:
        #if ( and there's already been a ( , include it
        if char == '(' and paren_cnt > 0:
            res_sub += char
    
        #if (, add to the count
        if char == '(':
            paren_cnt += 1
   
        # if ) and there's at least 2 (, include it
        if char == ')' and paren_cnt > 1:
            res_sub += char
          
        # if ), subtract from the count        
        if char == ')':
            paren_cnt -= 1
    
        # capture the script
        if char != '(' and char != ')' and paren_cnt >0:
            res_sub += char
    
        # capture the alias
        if char != '(' and char != ')'  and char != ' ' and paren_cnt == 0:
            alias += char
    
    parse(res_sub)

I should stress that I'm not well versed in Python, and I'm learning a lot as I go!

Thanks

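【Comments】:

  • There's nothing inside the for item in parsed.tokens: loop in the "working" snippet.
  • Please fix your code snippets so the indentation matches your actual code.
  • The tuple it's referring to is the one returned by sqlparse.parse(). You're trying to index it with [0]. The error means the tuple is empty.
  • Why are you using triple quotes for """("""? Triple quotes are only needed for multi-line strings.
  • Try printing sql before each call to sqlparse.parse() so you can see which one is causing the error.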
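The last two suggestions can be combined into a small helper. This is only a sketch: `first_statement` is a hypothetical name, not part of sqlparse, and it takes the already-parsed tuple as an argument so the example runs without the library installed.

```python
def first_statement(statements, sql):
    """Return the first parsed statement, or fail with a readable message.

    `statements` is whatever sqlparse.parse(sql) returned (a tuple).
    This helper is hypothetical and not part of sqlparse.
    """
    print(f"parsing: {sql!r}")  # show the input before indexing the result
    if not statements:
        raise ValueError(f"nothing to parse in {sql!r}")
    return statements[0]


# Indexing an empty tuple reproduces the error message from the question.
empty = ()
try:
    empty[0]
except IndexError as exc:
    message = str(exc)
    print(message)
```

Assuming sqlparse is installed, the call in the question would become `parsed = first_statement(sqlparse.parse(sql), sql)`, which reports the offending input instead of a bare IndexError.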

Tags: python function tuples sql-parser


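【Solution 1】:

I believe I've solved it now: the part that triggers the third function was firing too early and was not parsing the subquery.

I changed this: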

def assess_from_clause(parsed):
    
    full_tables = []
    tables = []
    
    from_seen = False
    for item in parsed.tokens:
        #stop the process if the Where statement is reached
        if isinstance(item, Where):
            from_seen = False
        
        #checks to see if there is a parenthesis, meaning a subquery 
        if 'SELECT' in item.value.upper():
            subquery = item.value
            subquery_parsing(subquery)
        
        if from_seen:

to this:

def assess_from_clause(parsed):
    
    full_tables = []
    tables = []
    
    from_seen = False
    for item in parsed.tokens:
        #stop the process if the Where statement is reached
        if isinstance(item, Where):
            from_seen = False
        
        if from_seen:
        
            #checks to see if there is a parenthesis, meaning a subquery 
            if 'SELECT' in item.value.upper():
                subquery = item.value
                subquery_parsing(subquery)

Apologies, this is very much a trial-and-error learning process for me at the moment. Thanks to Barmar for the comments.
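To illustrate why the position of the check matters, here is a small simulation that needs no library. The list of token values is a hand-written approximation of how the example query might be split (the real sqlparse token stream may differ), and `subquery_candidates` is a hypothetical helper:

```python
# Hand-written approximation of the top-level token values for the example
# query; this is an assumption, not actual sqlparse output.
subquery_text = "(select * from luv_all.fake_Table where (a = b)) x"
token_values = [
    "select", " ", "*", " ", "from", " ", "luv_main.test_table", " ",
    "left join", " ", subquery_text,
]


def subquery_candidates(values, require_from):
    """Return the values that would be handed on as subqueries."""
    hits = []
    from_seen = False
    for value in values:
        # Before the fix the check ran for every token (require_from=False);
        # after the fix it runs only once FROM has been seen.
        if (from_seen or not require_from) and "SELECT" in value.upper():
            hits.append(value)
        if value.upper() == "FROM":
            from_seen = True
    return hits


before_fix = subquery_candidates(token_values, require_from=False)
after_fix = subquery_candidates(token_values, require_from=True)
print(before_fix)
print(after_fix)
```

Under these assumptions, the leading `select` keyword itself matches before the fix. It contains no parentheses, so stripping the outer parentheses would leave an empty string to parse.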

【Comments】:

    【Solution 2】:

    This is trivial with my library SQLGlot:

    import sqlglot
    import sqlglot.expressions as exp
    
    sql = """select * from luv_main.test_table left join (select * from luv_all.fake_Table where (a = b)) x  where a = 4 order by A, B, C"""
    
    # find_all walks the parsed tree and yields every table reference
    for table in sqlglot.parse_one(sql).find_all(exp.Table):
        print(table.text("this"))
    
    # Output:
    # fake_Table
    # test_table
    
