string (file1.txt) search from file2.txt: answers

【Question title】: string (file1.txt) search from file2.txt
【Posted】: 2016-06-12 15:11:14
【Question description】:

file1.txt contains user names, e.g.

tony  
peter  
john  
...

file2.txt contains user details, one user per line, e.g.

alice 20160102 1101 abc  
john 20120212 1110 zjc9  
mary 20140405 0100 few3  
peter 20140405 0001 io90  
tango 19090114 0011 n4-8  
tony 20150405 1001 ewdf  
zoe 20000211 0111 jn09  
...

I want to get from file2.txt a shortlist of the details for the users listed in file1.txt, i.e.

john 20120212 1110 zjc9  
peter 20140405 0001 io90  
tony 20150405 1001 ewdf

How can I do this with Python?

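【Comments】:

If you start each line with four spaces, it is rendered as code. Alternatively, highlight the code and use the {} button in the Markdown editor to format it.
SO is neither a code-writing service nor a tutorial service. Please learn How to Ask.
Please read up on Python files and strings. If you run into an error while programming, ask a question about it.

Tags: python string search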

【Solution 1】:

You can use .split(' '), assuming there is always a space between the name and the rest of the information in file2.txt.
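The whitespace assumption is easy to check on a single line. The short sketch below compares `.split(' ')` with the argument-free `.split()`. The sample line comes from the question, and the two trailing spaces in it are an assumption about how the file is stored on disk.

```python
# Sample line from file2.txt; the two trailing spaces are an assumption
# about how the file is stored on disk.
line = "john 20120212 1110 zjc9  \n"

# split(' ') cuts at every single space, so the trailing spaces
# produce an empty field and a leftover newline.
fields_single_space = line.split(' ')

# split() with no argument cuts on any run of whitespace
# and ignores leading and trailing whitespace.
fields_any_whitespace = line.split()

print(fields_single_space)    # ['john', '20120212', '1110', 'zjc9', '', '\n']
print(fields_any_whitespace)  # ['john', '20120212', '1110', 'zjc9']
```

Either way the user name is the first field, but `.split()` also tolerates tabs and repeated spaces.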

Here is a complete example:

UserList = []

with open("file1.txt", "r") as fuser:
    for UserLine in fuser:
        # Strip the newline and any surrounding spaces to keep just the user name.
        UserList.append(UserLine.strip())

InfoUserList = []
InfoList = []

with open("file2.txt", "r") as finfo:
    for InfoLine in finfo:
        InfoList.append(InfoLine.rstrip())   # Keep the full line without its trailing newline.
        line1 = InfoLine.split(' ')
        InfoUserList.append(line1[0])   # Take just the user name to compare it later.

# Print the details of every user listed in file1.txt, in file1.txt order.
for user in UserList:
    for i in range(len(InfoUserList)):
        if user == InfoUserList[i]:
            print(InfoList[i])
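The nested loops above compare every user against every detail line. A more compact variant, sketched here under the assumption that the user name is always the first whitespace-separated field, stores the wanted names in a set and scans the details once. The function name `shortlist` and the in-memory sample lists are illustrative only; the sample data is copied from the question.

```python
def shortlist(user_lines, detail_lines):
    """Return the detail lines whose first field is one of the wanted user names."""
    # A set gives fast membership tests; strip() drops newlines and stray spaces.
    wanted = {line.strip() for line in user_lines if line.strip()}
    matches = []
    for line in detail_lines:
        fields = line.split()  # split on any run of whitespace
        if fields and fields[0] in wanted:
            matches.append(line.strip())
    return matches


# Sample data from the question, held in lists instead of files.
users = ["tony  \n", "peter  \n", "john  \n"]
details = [
    "alice 20160102 1101 abc  \n",
    "john 20120212 1110 zjc9  \n",
    "mary 20140405 0100 few3  \n",
    "peter 20140405 0001 io90  \n",
    "tango 19090114 0011 n4-8  \n",
    "tony 20150405 1001 ewdf  \n",
    "zoe 20000211 0111 jn09  \n",
]

result = shortlist(users, details)
for row in result:
    print(row)
```

The rows come out in file2.txt order, which matches the shortlist requested in the question. Because open file objects iterate line by line, the two files could be opened with `with open(...)` and passed to the function in place of the lists.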

【Discussion】:

【Solution 2】:

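You can use pandas:

import pandas as pd

# sep=r'\s+' splits on any run of whitespace; dtype=str keeps leading zeros such as 0001.
file1 = pd.read_csv('file1.txt', sep=r'\s+', header=None, dtype=str)
file2 = pd.read_csv('file2.txt', sep=r'\s+', header=None, dtype=str)

# Keep the rows of file2 whose first column appears in file1.
shortlist = file2.loc[file2[0].isin(file1[0])]

Printing shortlist gives the following result:

       0         1     2     3
1   john  20120212  1110  zjc9
3  peter  20140405  0001  io90
5   tony  20150405  1001  ewdf

The result is a DataFrame; to convert it back to an array, just use shortlist.values.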

【Discussion】:

【Solution 3】:

import pandas as pd

# File names follow the question; each whole line is read into column 0.
df1 = pd.read_csv('file1.txt', header=None)
df2 = pd.read_csv('file2.txt', header=None)
df1[0] = df1[0].str.strip()  # remove the trailing whitespace after the field
df2 = df2[0].str.strip().str.split(expand=True)  # trim each line, then split it into columns
df = df1.merge(df2)  # inner join on the shared column 0

Out[26]: 
       0         1     2     3
0   tony  20150405  1001  ewdf
1  peter  20140405  0001  io90
2   john  20120212  1110  zjc9

【Discussion】: