【Question Title】:Find list of strings in UTF-16 hex bin file and record their offset positions
【Posted】:2021-02-16 20:22:50
【Question】:

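Let me preface this by saying I am very inexperienced with code. I took some courses over a decade ago and can remember some basic principles, but that's about it. I don't have a language I'm familiar with or actively use. Anyway, on to my problem.

I have a list of selected names that I'm trying to find in a large .bin master name file, and I want to record the offset position of each one. Each name may also have multiple matches, so I need each position recorded in a new column (assuming some kind of tabular output).

I can open the .bin file with a hex editor such as HxD or HexEditorNeo and see the names in the "decoded text" pane. The file is in UTF-16 format, so HexEditorNeo lets me set that encoding to remove the "." between each character (not an actual period, but how the editor displays the 00 null bytes).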
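
As an illustration of why every character is followed by a 00 byte, here is a minimal Python sketch (Python is an assumption, the question does not name a language). It encodes one name from the sample as UTF-16 little-endian and prints the bytes the way a hex editor would; the variable names are made up for the example.

```python
# Encode a name as UTF-16 little-endian (no byte-order mark) and show the
# resulting bytes in the same style as a hex editor.
name = "Gandalf"
encoded = name.encode("utf-16-le")

hex_view = " ".join(f"{b:02X}" for b in encoded)
print(hex_view)  # 47 00 61 00 6E 00 64 00 61 00 6C 00 66 00

# Decoding the same bytes recovers the original text.
assert encoded.decode("utf-16-le") == name
```

The printed bytes match the first row of the sample dump, minus the two-byte 00 00 terminator that follows each name in the file.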

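I can search for a name using the find tool, and I can view and copy the offset. However, I have a few thousand names, so doing this by hand is very tedious.

Here are examples of my input files and the desired output: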

Selected_Names.txt

John Williams
Howard Shore
Hans Zimmer

Master_Name_File.bin

47 00 61 00 6E 00 64 00 61 00 6C 00 66 00 00 00
48 00 6F 00 77 00 61 00 72 00 64 00 20 00 53 00 
68 00 6F 00 72 00 65 00 00 00 44 00 61 00 72 00 
6B 00 20 00 4B 00 6E 00 69 00 67 00 68 00 74 00 
00 00 48 00 61 00 6E 00 73 00 20 00 5A 00 69 00 
6D 00 6D 00 65 00 72 00 00 00 4C 00 75 00 6B 00 
65 00 20 00 53 00 6B 00 79 00 77 00 61 00 6C 00 
6B 00 65 00 72 00 00 00 4A 00 6F 00 68 00 6E 00 
20 00 57 00 69 00 6C 00 6C 00 69 00 61 00 6D 00 
73 00 00 00 48 00 6F 00 77 00 61 00 72 00 64 00 
20 00 53 00 68 00 6F 00 72 00 65 00 00 00 48 00 
61 00 6E 00 73 00 20 00 5A 00 69 00 6D 00 6D 00 
65 00 72 00 00 00 00 00 00 00 00 00 00 00 00 00

G.a.n.d.a.l.f...
H.o.w.a.r.d. .S.
h.o.r.e...D.a.r.
k. .K.n.i.g.h.t.
..H.a.n.s. .Z.i.
m.m.e.r...L.u.k.
e. .S.k.y.w.a.l.
k.e.r...J.o.h.n.
 .W.i.l.l.i.a.m.
s...H.o.w.a.r.d.
 .S.h.o.r.e...H.
a.n.s. .Z.i.m.m.
e.r.............

Desired output

John Williams, 00 00 00 78
Howard Shore, 00 00 00 10, 00 00 00 94
Hans Zimmer, 00 00 00 42, 00 00 00 AE
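
The offsets above are written as four space-separated hex bytes. A small sketch of one way to produce that format in Python 3.8 or later follows; the helper name `format_offset` is hypothetical, and a fixed 4-byte width is assumed (enough for files under 4 GiB).

```python
def format_offset(offset: int) -> str:
    """Render an offset as four big-endian bytes in upper-case hex."""
    # bytes.hex() accepts a separator argument from Python 3.8 onward.
    return offset.to_bytes(4, "big").hex(" ").upper()


formatted = [format_offset(n) for n in (0x10, 0x78, 0xAE)]
print(formatted)  # ['00 00 00 10', '00 00 00 78', '00 00 00 AE']
```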

I tried to think about what this might look like in code and came up with the following pseudocode:

// get list of names to search for in array
nameArray = read file of selected names to search for // this is from a txt list
nameCount = length(nameArray)
nameCounter = 0

// get master name file
masterNameArray = read master file of names to search within  // this is the hex file in UTF-16
masterNameCount = length(masterNameArray)

// loop through each name we're searching for
while nameCounter < nameCount

     // start the position over at 0 for each new name we are searching
     offset = 0
     match = 0

     // loop through each position of the nameArray
     while offset < masterNameCount

          if nameArray(nameCounter) == masterNameArray(offset)  // check if names match. THIS IS HEX, though, so a straight check can't be done. Need to convert, as well as account for how much of the array to check (i.e. name length)

               // record current offset position. record in new column for each match, since there may be multiple matches
               masterNamePosition(nameCounter,match) = offset
               match = match + 1
          end if

          offset = offset + 1
     end while

     nameCounter = nameCounter + 1

end while

write masterNamePosition to file
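
For comparison, the inner loop of the pseudocode can be sketched in Python as a sliding window: the name is converted to UTF-16 little-endian bytes, and at each offset a slice of the same length is compared against it. This is only a sketch under those assumptions; `find_offsets` and the in-memory `sample` are made up for the example and stand in for the real master file.

```python
def find_offsets(haystack: bytes, name: str) -> list:
    """Return every byte offset where `name` (as UTF-16-LE) occurs in `haystack`."""
    needle = name.encode("utf-16-le")
    offsets = []
    if not needle:
        return offsets
    # Slide a window of len(needle) bytes across the data, one byte at a time.
    for offset in range(len(haystack) - len(needle) + 1):
        if haystack[offset:offset + len(needle)] == needle:
            offsets.append(offset)
    return offsets


# Tiny in-memory stand-in for the master file: three null-terminated names.
sample = "Gandalf\0Howard Shore\0Gandalf\0".encode("utf-16-le")
positions = {
    name: find_offsets(sample, name)
    for name in ["Gandalf", "Howard Shore", "Hans Zimmer"]
}
print(positions)  # {'Gandalf': [0, 42], 'Howard Shore': [16], 'Hans Zimmer': []}
```

Comparing slice by slice is easy to follow but slow on large files; a built-in substring search such as `bytes.find` does the same job much faster.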

Thanks to anyone willing to read this and help! It means a lot to me!

【Question Comments】:

    Tags: search hex offset utf-16


    【Solution 1】:
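    #!/usr/bin/python3
    
    # Read the whole master file as raw bytes and the names as text lines.
    with open('Master_Name_File.bin', 'rb') as f:
        name_file = f.read()
    with open('Selected_Names.txt') as f:
        # Skip blank lines: an empty search string would match at every offset.
        names = [line for line in f.read().splitlines() if line]
    
    def h(n):
        """Format an offset as four space-separated hex bytes, e.g. 120 -> '00 00 00 78'."""
        s = '%08X' % n
        return ' '.join(s[i:i + 2] for i in range(0, len(s), 2))
    
    for n in names:
        # The sample dump is UTF-16 little-endian, so encode each name the same way.
        d = n.encode('utf-16le')
        indexes = []
        i = name_file.find(d)
        while i >= 0:
            indexes.append(i)
            # Resume one byte past the last hit to look for the next match.
            i = name_file.find(d, i + 1)
        if indexes:
            print(f'{n}, {", ".join(h(i) for i in indexes)}')
    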

    Output:

    John Williams, 00 00 00 78
    Howard Shore, 00 00 00 10, 00 00 00 94
    Hans Zimmer, 00 00 00 42, 00 00 00 AE
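
The question also asks for table-style output with one column per match. The following is a hedged sketch of that variant, not part of the original answer: it keeps only matches at even byte offsets (assuming the text region starts on an even offset, since UTF-16 code units are two bytes wide) and writes CSV rows. The helper names `find_aligned_offsets` and `write_table` and the in-memory sample are hypothetical.

```python
import csv
import io


def find_aligned_offsets(data: bytes, name: str) -> list:
    """Offsets of `name` (UTF-16-LE) in `data`, keeping only even byte offsets."""
    needle = name.encode("utf-16-le")
    offsets = []
    if not needle:
        return offsets
    i = data.find(needle)
    while i >= 0:
        # A hit at an odd offset would straddle two code units, so it is skipped.
        if i % 2 == 0:
            offsets.append(i)
        i = data.find(needle, i + 1)
    return offsets


def write_table(data: bytes, names, out) -> None:
    """Write one CSV row per matched name: the name, then one hex offset per column."""
    writer = csv.writer(out)
    for name in names:
        offsets = find_aligned_offsets(data, name)
        if offsets:
            writer.writerow([name] + [f"{o:08X}" for o in offsets])


# In-memory stand-in for Master_Name_File.bin; a real run would read the
# file with open(..., "rb") and pass an open text file as `out`.
sample = "Gandalf\0Howard Shore\0Hans Zimmer\0Howard Shore\0".encode("utf-16-le")
buffer = io.StringIO()
write_table(sample, ["John Williams", "Howard Shore", "Hans Zimmer"], buffer)
table = buffer.getvalue()
print(table)
```

When writing to a real file with the `csv` module, open it with `newline=""` so that line endings are not doubled on Windows.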
    
