【Question title】: string (file1.txt) search from file2.txt
【Posted】: 2016-06-12 15:11:14
【Question description】:

file1.txt contains user names, for example:

tony  
peter  
john  
...

file2.txt contains user details, one user per line, for example:

alice 20160102 1101 abc  
john 20120212 1110 zjc9  
mary 20140405 0100 few3  
peter 20140405 0001 io90  
tango 19090114 0011 n4-8  
tony 20150405 1001 ewdf  
zoe 20000211 0111 jn09  
...

I want to get a shortlist of user details from file2.txt, limited to the users listed in file1.txt, i.e.:

john 20120212 1110 zjc9  
peter 20140405 0001 io90  
tony 20150405 1001 ewdf  

How can I do this with Python?

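【Comments】:

  • If each line starts with four spaces, it is rendered as code. Alternatively, you can use the {} button in the Markdown editor to format the highlighted code.
  • SO is neither a code-writing service nor a tutorial service. Please read How to Ask.
  • Please read up on Python files and strings. If you run into an error while programming, ask a question about it.

Tags: python string search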


【Solution 1】:

You can use .split(' '), assuming the name and the other information in file2.txt are always separated by a space.

Here is an example:

UserList = []

# Read the user names from file1.txt, one per line.
with open("file1.txt", "r") as fuser:
    for UserLine in fuser:
        UserList.append(UserLine.strip())    # Drop the newline and any spaces around the user name.

InfoUserList = []   # First field (user name) of each line in file2.txt
InfoList = []       # The full line, stored at the same index as its user name

with open("file2.txt", "r") as finfo:
    for InfoLine in finfo:
        InfoLine = InfoLine.rstrip()    # Drop the newline and trailing spaces
        InfoList.append(InfoLine)
        line1 = InfoLine.split(' ')
        InfoUserList.append(line1[0])   # Take just the user name to compare it later

# Print the details of every user listed in file1.txt.
for user in UserList:
    for i in range(len(InfoUserList)):
        if user == InfoUserList[i]:
            print(InfoList[i])
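
The same idea can be written more compactly with a set of wanted names, which avoids the nested loop. This is only a sketch: the function name `shortlist` is made up for illustration, and it assumes the user name is always the first whitespace-separated field of a line. Rows come back in file2.txt order, which matches the output asked for in the question.

```python
def shortlist(user_lines, detail_lines):
    """Return the detail lines whose first field is one of the wanted user names."""
    # Strip whitespace so trailing spaces or newlines do not break the match.
    wanted = {line.strip() for line in user_lines if line.strip()}
    result = []
    for line in detail_lines:
        fields = line.split()  # split on any run of whitespace
        if fields and fields[0] in wanted:
            result.append(line.strip())
    return result


if __name__ == "__main__":
    # In-memory stand-ins for file1.txt and file2.txt (sample data from the question).
    users = ["tony  \n", "peter  \n", "john  \n"]
    details = [
        "alice 20160102 1101 abc  \n",
        "john 20120212 1110 zjc9  \n",
        "mary 20140405 0100 few3  \n",
        "peter 20140405 0001 io90  \n",
        "tango 19090114 0011 n4-8  \n",
        "tony 20150405 1001 ewdf  \n",
        "zoe 20000211 0111 jn09  \n",
    ]
    for row in shortlist(users, details):
        print(row)
```

Open file objects iterate line by line, so with real files the two arguments can simply be the handles returned by open("file1.txt") and open("file2.txt").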

【Discussion】:

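【Solution 2】:

You can use pandas:

import pandas as pd

# dtype=str keeps every field as text, so values such as 0001 keep their leading zeros
file1 = pd.read_csv('file1.txt', sep=' ', header=None, dtype=str)
file2 = pd.read_csv('file2.txt', sep=' ', header=None, dtype=str)

# keep the rows of file2 whose first column appears in file1
shortlist = file2.loc[file2[0].isin(file1[0])]

This gives the following result:

       0         1     2     3
1   john  20120212  1110  zjc9
3  peter  20140405  0001  io90
5   tony  20150405  1001  ewdf

The result above is a DataFrame. To convert it back to an array, use shortlist.values.

【Discussion】: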

【Solution 3】:

import pandas as pd

# The data contains no commas, so each line is read as a single string column
df1 = pd.read_csv('file1.txt', header=None)
df2 = pd.read_csv('file2.txt', header=None)
df1[0] = df1[0].str.strip() # remove the whitespace that follows the field
df2 = df2[0].str.strip().str.split(expand=True) # remove surrounding whitespace and split into columns
df = df1.merge(df2) # inner join on the shared column 0 (the user name)

Out[26]: 
       0         1     2     3
0   tony  20150405  1001  ewdf
1  peter  20140405  0001  io90
2   john  20120212  1110  zjc9
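
The merge returns rows in the order the names appear in file1.txt (tony, peter, john). For comparison, here is a rough sketch of the same lookup without pandas, using a dict keyed by user name. The function name `details_in_user_order` is hypothetical, and it assumes each user appears at most once in file2.txt.

```python
def details_in_user_order(user_lines, detail_lines):
    """Return detail lines for the wanted users, in the order the users are listed."""
    # Map each user name (first field) to its full, stripped detail line.
    # Assumption: names are unique; a later duplicate overwrites an earlier one.
    by_name = {}
    for line in detail_lines:
        fields = line.split()
        if fields:
            by_name[fields[0]] = line.strip()

    result = []
    for line in user_lines:
        name = line.strip()
        if name in by_name:  # users without details are skipped, like an inner join
            result.append(by_name[name])
    return result


if __name__ == "__main__":
    # In-memory stand-ins for file1.txt and file2.txt (sample data from the question).
    users = ["tony  \n", "peter  \n", "john  \n"]
    details = [
        "alice 20160102 1101 abc  \n",
        "john 20120212 1110 zjc9  \n",
        "peter 20140405 0001 io90  \n",
        "tony 20150405 1001 ewdf  \n",
    ]
    for row in details_in_user_order(users, details):
        print(row)
```

Because the fields stay as plain strings, values such as 0001 keep their leading zeros here as well.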
      

【Discussion】:
