
【Question title】: Removing a variable number of indexes between two specified indexes
【Posted】: 2021-03-07 19:35:10
【Question description】:

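This is my first time working with JSON, which I pulled from the kanka.io API. I am trying to remove all the list items between 'entry' and either 'section' or 'entry_parsed', so that I can tell whether an ID belongs to a character or an attribute and append only the character names to a list.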

To test this in Python Tutor's live programming mode, I have shortened the list that I converted the JSON into.

# Request data from URL
response = requests.request("GET", url, headers=headers, data=payload)
# Open data
rtext=response.text
# Clean data
punct = ['{','}','[',']','\"',':',',']
rt = ""
for item in rtext:
    if item in punct:
        rt+=str(' ')
    else:
        rt+=str(item)
# Itemize string of text
rsplit = rt.split()
#rsplit = [
#'id', '260405', 'name', 'Frank', 'Burns', 'entry', 'null', 'entry_parsed', 'traits', 
#'id', '260406', 'name', 'Henry', 'Blake', 'entry', 'null', 'entry_parsed', 'null', 'image', 'null', 
#'id', '260407', 'name', 'Margret', 'Houlihan', 'entry', 'null', 'entry_parsed', 'null', 'image', 'true', 'is_private', 'true',  
#'id', '260408', 'name', 'John', 'MacInyre', 'entry', '\\n<p>Graduate', 'of', 'Darthmouth.<\\/p>\\n<p>\\u00a0<\\/p>\\n', 'entry_parsed',
#'id', '260409', 'name', 'Walter', 'O\'Reilly', 'entry', 'null', 'entry_parsed', 'null', 'image', 'image_full', 'https',
#'id', '260410', 'name', 'Benjiam', 'Franklin', 'Pierce', 'entry', 'null', 'entry_parsed', 'null', 'image', 'image_full', 'https', 
#'id', '165148', 'name', 'Eyes', 'entry', 'Blue', 'section', 'appearance', 'is_private', 'false', 'default_order', '1', 
#'id', '260411', 'name', 'Francis', 'Mulcahy', 'entry', 'null', 'entry_parsed', 'null',
#]

#########
# NAMES #
#########
# Append character names into list
this1=0
# Cycle through all the words
while this1 < len(rsplit):
  next1 = this1+1
  last1 = this1-1
# Stop at the first element after 'name'
  if rsplit[last1] == "name":
# Read and concatenate elements until the element 'entry'
    while rsplit[next1] != "entry":  
      nextword = rsplit[next1]
      rsplit[this1]+='_'+nextword
# Remove redundant elements by replacing next with last
      rsplit[next1]=rsplit[this1]
      rsplit.remove(rsplit[this1]) 

# Remove words in between entry and (entry_parsed or section)
    if rsplit[this1] == "entry":
      while rsplit[next1] != ("entry_parsed" or "section"):
        rsplit.remove(rsplit[descWord])
    print(rsplit[this1:next1+4])
    
  this1+=1

What I want the print line to output is:

['Frank_Burns', 'entry', 'entry_parsed', 'traits']
['Henry_Blake', 'entry', 'entry_parsed', 'null']
['Margret_Houlihan', 'entry', 'entry_parsed', 'null']
['John_MacInyre', 'entry','entry_parsed']
["Walter_O'Reilly", 'entry', 'entry_parsed', 'null']
['Benjiam_Franklin_Pierce', 'entry', 'entry_parsed', 'null']
['Eyes', 'entry', 'section', 'appearance']
['Francis_Mulcahy', 'entry', 'entry_parsed', 'null']

I tried different variations in which the index after 'entry' is this1, last1, or next1, but none of them actually removed the list items between 'entry' and 'entry_parsed' or 'section'. I also tried

if rsplit[this1] == "entry":
      while not rsplit[next1] == "entry_parsed" or "section":

It still keeps printing 'null', 'Blue', and so on.
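
A short sketch of why both loop conditions above behave unexpectedly: in Python, `("entry_parsed" or "section")` evaluates to just `"entry_parsed"`, and `not x == "entry_parsed" or "section"` is always truthy because the non-empty string `"section"` is truthy. The sample list and the helper name `drop_between` are made up for illustration; a membership test with `in` is one way to express the intended check.

```python
# The `or` expression collapses to its first truthy operand.
stop_words = ("entry_parsed" or "section")
print(stop_words)  # entry_parsed

# A non-empty string is truthy, so this condition is True for any word.
word = "entry_parsed"
always_true = bool(not word == "entry_parsed" or "section")
print(always_true)  # True


def drop_between(words, start="entry", stops=("entry_parsed", "section")):
    """Return a copy of words without the items between start and the next stop word."""
    result = []
    skipping = False
    for w in words:
        # Stop skipping as soon as one of the stop words is reached.
        if skipping and w in stops:
            skipping = False
        if not skipping:
            result.append(w)
        # Start skipping right after the start word.
        if w == start:
            skipping = True
    return result


# Hypothetical sample, shaped like one record of the shortened list.
sample = ["name", "Eyes", "entry", "Blue", "section", "appearance"]
print(drop_between(sample))
```

Building a new list avoids removing items from `rsplit` while indexing into it, which is where the original loop loses track of its position.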

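【Comments】:

How exactly did you get rsplit? It looks like it used to be a dictionary, but you somehow converted it into a list, and now you are struggling to extract values from the list, which would be easy with a dictionary?
This is what I use in my actual code to populate rsplit: `# Request data from URL response = requests.request("GET", url, headers=headers, data=payload) # Open data rtext=response.text # Clean data punct = ['{','}','[',']','\"',':',','] rt = "" for item in rtext: if item in punct: rt+=str(' ') else: rt+=str(item) # Itemize string of text rsplit = rt.split()`
Yes. Instead of replacing the JSON syntax with spaces and splitting the string into a list, parse the response as JSON.
Could you edit the question and show the result you want to get rather than what is currently printed?
See stackoverflow.com/questions/6386308/…

Tags: python json python-3.x python-requests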

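【Solution 1】:

Based on the information in the comments, you want to do the following:

Make a request to the kanka.io API
Parse the response as JSON, expecting a list of dictionaries
Select the dictionaries that have the key 'entry_parsed'
Create a list of the 'name' values of all selected dictionaries

So you should keep¹ only the first line of your code (the one that makes the request), discard the rest, and use this instead: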

# 1. Request data from URL
response = requests.get(url, headers=headers, data=payload)

# 2. parse as JSON
data = response.json()

# 3. + 4. list of 'name' values for all dicts having 'entry_parsed'
names = [d['name'] for d in data if 'entry_parsed' in d]

¹ Instead of requests.request('GET', ...), you can simply use requests.get(...).
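
To try the filtering step without calling the API, here is a minimal sketch that parses a small hand-written JSON string with `json.loads`. The sample records are hypothetical, modeled on the shortened list in the question; the real kanka.io response may be shaped differently.

```python
import json

# Hypothetical sample, modeled on the question's data; not real API output.
raw = """
[
  {"id": 260405, "name": "Frank Burns", "entry": null, "entry_parsed": null},
  {"id": 165148, "name": "Eyes", "entry": "Blue", "section": "appearance"},
  {"id": 260411, "name": "Francis Mulcahy", "entry": null, "entry_parsed": null}
]
"""

data = json.loads(raw)  # a list of dicts

# Keep the names of the dicts that have an 'entry_parsed' key.
names = [d["name"] for d in data if "entry_parsed" in d]
print(names)  # ['Frank Burns', 'Francis Mulcahy']
```

Because the names are never split on whitespace, multi-word names come out intact and need no re-joining.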


【Solution 2】:

I was able to extract just the name values (along with the values of 'entity_id' and 'tags') using the following code.

# Request data from URL
response = requests.get(url, headers=headers)

rj = response.json()

# using .items() allowed me to keep the tuples together so I could call keys to get their values

ri = rj.items()

name=[]
enid=[]
tags=[]

for i in ri:
  for j in i:
    for k in j:
      # since there is string metadata at the beginning and end of ri I narrowed it down to only the dictionaries which contained the values I needed
      if type(k) == dict:
        name.append(k['name'])
        enid.append(k['entity_id'])
        tags.append(k['tags'])

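Because the data stays as dictionaries rather than a string, I no longer need 'entry_parsed' or 'section' to tell characters from attributes: an attribute's 'id' and 'name' are nested inside the value of the 'traits' key.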
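
The nested loops above can also be written with explicit type checks. The following is only a sketch under assumptions: the payload shape (a top-level dict whose list-valued entries hold the character dicts, with string metadata alongside) is inferred from the description in this answer, and the sample values, the "data" and "sync" keys, and the `collect_field` helper are hypothetical.

```python
def collect_field(payload, field):
    """Collect the given field from every dict found in a top-level list."""
    values = []
    for value in payload.values():
        if not isinstance(value, list):
            continue  # skip metadata that is not a list of records
        for item in value:
            if isinstance(item, dict) and field in item:
                values.append(item[field])
    return values


# Hypothetical payload; the real response may differ.
sample = {
    "data": [
        {"name": "Frank Burns", "entity_id": 1, "tags": [],
         "traits": [{"id": 165148, "name": "Eyes", "entry": "Blue"}]},
        {"name": "Henry Blake", "entity_id": 2, "tags": [3], "traits": []},
    ],
    "sync": "2021-03-07T19:35:10",
}

print(collect_field(sample, "name"))       # ['Frank Burns', 'Henry Blake']
print(collect_field(sample, "entity_id"))  # [1, 2]
```

The attribute named 'Eyes' is not collected because it sits one level deeper, under 'traits', which is why no extra filtering is needed.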

Many thanks to @mkrieger1 for pointing me in the right direction!
