如何从文件列表开始创建电影数据库答案

【问题标题】：How to create a movie database starting from a list of files如何从文件列表开始创建电影数据库
【发布时间】：2014-05-14 21:50:18
【问题描述】：

我的家庭服务器上有大量电影（大约 4000 部）。这些文件都命名为Title - Subtitle (year).extension。我想为我所有的电影创建一个数据库（即使在 excel 中也可以）。数据库应包含以下列：标题、副标题（如果存在）、服务器上文件的年份和位置（某些电影按类型或演员组织在文件夹中）。到目前为止，我有一个 bash 脚本，它只返回一个 txt 文件，其中包含每个硬盘驱动器的文件列表（每个文件都包含每个硬盘驱动器的列表）。如何在我的家庭服务器（运行 debian）上自动创建这种数据库？

使用一些电影数据库api自动检索有关电影的其他信息也很好，但我想这会很复杂。

【问题讨论】：

标签： python database debian

【解决方案1】：

这是一个相当广泛的问题，在这里并不合适（这更像是一个教程而不是一个快速代码问题），但这里有一些战略建议：

Excel 将打开一个 .csv 文件并将逗号/换行符视为单元格。所以
您需要对目录进行迭代，也许是递归的
扩展路径名——如果你使用像 Python 这样的高级语言，这可以通过标准函数来实现；然后使用正则表达式解析最后一位
将每个路径的格式化内容存储为列表中的行
将该列表打印到文本文件，用逗号连接每个元素，用换行符连接每行
为所述文件提供 .csv 后缀并在 Excel 中打开

请注意，如果您真的想要一个合适的数据库，Python 又是一个不错的选择——SQLite 是标准安装的一部分。

干杯，祝你好运

更新：哈哈，您在我回答时编辑了问题。似乎您需要的一切都在文件名中，但如果您打算使用元数据，请注意。如果元数据并非都来自同一个来源，那么从文件中提取元数据会变得更加棘手；并非每种媒体类型都具有相同的元数据结构，并非每个创建文件的应用程序都提供相同的元数据结构。因此，获取元数据的逻辑可能会变得混乱。

您是否有理由不能使用现有程序来执行此操作？

最后你提到在你的网络服务器上安装它；再次遵循 Python，标准包中也内置了向您的服务器发出所需请求的能力。

最终更新

无法帮助您使用 bash；我很喜欢，我也不是 Python 专家，但你的目标很简单。我还没有对此进行测试——可能有一两个错字，认为它是伪代码，主要是 Python 就绪的。

# import the standard libraries you'll need
import os # https://docs.python.org/2/library/os.html
import re # https://docs.python.org/2/library/re.html

# this function will walk your directories and output a list of file paths
def getFilePaths(directory):
    file_paths = []
    for root, directories, files in os.walk(directory):
        for filename in files:
            filepath = os.path.join(root, filename)
            file_paths.append(filepath)
    return file_paths



video_file_paths = getFilePaths("path/to/video/library")
output_to_csv = [];
for video_file in video_file_paths:
    base_path, fname = os.path.split(video_file) 

     """ This is a super simple bit of regex that, provided  your files are all formatted as
     written, will parse out title, subtitle, year and file extension. If your file names
     turn out to have more exceptions than you expect (I'd be shocked if not), you may need
     to make this part more robust, either with much more savvy regex, or else some conditional
     logic—maybe a recursive try... catch loop"""
    reg_ex = re.compile("/^(.*) - (.*) \((.*)\)\.(.*)$/");

    # now apply the compiled regex to each path
    name_components = reg_ex.match(fname);

    """Each output is a row of your CSV file; .join() will join the 4 elements of the regex
    match (assuming, again, that your filenames are as clean as you claim), and then add
    the basepath, so you should be building, in this loop, a list with elements like:
    title, subtitle, year, file_extension, full path"""

    output_to_csv.append("{0},{1}".format(name_components.join(","), base_path));

#create the file, making sure the location is writeable
csv_doc = open("my_video_database.csv", "w");

# now join all the rows with line breaks and write the compiled text to the file
csv_doc.write( ouput_to_csv.join("\n") ); 

#close  your new database
csv_doc.close()

【讨论】：

感谢您的回答！我真正需要的是一个 bash 脚本，它创建一个包含我提到的数据的行和列的输出文件：文件的标题、副标题、年份和目录。我可以像你说的那样轻松创建一个 .csv，但我不知道如何将所有这些信息收集到一个文件中，也不知道如何创建行和列。希望这能解释得更好一点。
我通过一个小 Python 脚本来帮助您入门。抱歉，不太了解bash。不过，祝你好运；我花了一周时间编写一个脚本来查找我所有的音乐，提取元标签，将每首歌曲转换为 WAV，然后重新应用元标签，以确保我所有的文件格式/比特率相同，并具有相同的信息.. ..我们只是说事实证明这比我预期的要多得多。祝你好运！
这令人印象深刻。我将在 24 小时内对其进行测试，nad 将提供反馈。非常感谢你！
非常欢迎。但是在你发现它是否有效之后感谢我，哈哈。 ;)
我在代码顶部添加了这一行来修复错误：# coding: utf-8。我收到以下错误：Traceback (most recent call last): File "listfilms.py", line 20, in <module> base_path, fname = os.path.split(ipath) NameError: name 'ipath' is not defined