Issue with *.ics splitting strings with more than one line *Python* (answers)

【Question title】: issue with *.ics splitting strings with more than one line *Python*
【Posted】: 2023-01-26 07:28:45
【Question】:

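I have tried as many approaches as I could, but I always get the same result. Surely there must be a solution?

I am downloading an ICS from a website, and one of its "SUMMARY" lines is split in two. When I load it into a string, the two lines are automatically joined into one string, unless there is a "\n".

So I tried replacing both "\n" and "\r", but that changed nothing.

Code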

from icalendar import Calendar, Event
from datetime import datetime
import icalendar
import urllib.request
import re
from clear import clear_screen

cal = Calendar()

def download_ics():
    url = "https://www.pogdesign.co.uk/cat/download_ics/7d903a054695a48977d46683f29384de"
    file_name = "pogdesign.ics"
    urllib.request.urlretrieve(url, file_name)

def get_start_time(time):
    time = datetime.strftime(time, "%A - %H:%M")
    return time

def get_time(time):
    time = datetime.strftime(time, "%H:%M")
    return time

def check_Summary(text):
    #newline = re.sub('[\r\n]', '', text)
    newline = text.translate(str.maketrans("", "", "\r\n"))
    return newline

def main():
    download_ics()
    clear_screen()
    e = open('pogdesign.ics', 'rb')
    ecal = icalendar.Calendar.from_ical(e.read())
    for component in ecal.walk():
        if component.name == "VEVENT":
            summary = check_Summary(component.get("SUMMARY"))
            print(summary)
            print("\t Start : " + get_start_time(component.decoded("DTSTART")) + " - " + get_time(component.decoded("DTEND")))

            print()
    e.close()

if __name__ == "__main__":
    main()

Output

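Young Sheldon S06E11 - Ruthless, Toothless, and a Week ofBed Rest
     Start : Friday - 02:00 - 02:30

The Good Doctor S06E11 - The Good Boy
     Start : Tuesday - 04:00 - 05:00

National Treasure: Edge of History S01E08 - Family Tree
     Start : Thursday - 05:59 - 06:59

National Treasure: Edge of History S01E09 A Meeting withSalazar
     Start : Thursday - 05:59 - 06:59

The Last of Us S01E03 - Long, Long Time
     Start : Monday - 03:00 - 04:00

The Last of Us S01E04 Please Hold My Hand
     Start : Monday - 03:00 - 04:00

Anne Rice's Mayfair Witches S01E04 - Curiouser and Curiouser
     Start : Monday - 03:00 - 04:00

Anne Rice's Mayfair Witches S01E05 - The Thrall
     Start : Monday - 03:00 - 04:00

The Ark S01E01 - Everyone Wanted to Be on This Ship
     Start : Thursday - 04:00 - 05:00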

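I have looked at various solutions, such as converting the text to "utf-8" and "ISO-8859-8". I tried some of the functions I found in icalendar. I even asked ChatGPT for help.

As you can see in the first line of the output: Young Sheldon S06E11 - Ruthless, Toothless, and a Week ofBed Rest, and also in National Treasure: Edge of History S01E09 A Meeting withSalazar.

In the downloaded file these two parts are on two separate lines, and I cannot manage to keep them separated, or to stop them from being joined at all...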

【Comments】:

Tags: python icalendar

【Solution 1】:

As far as the icalendar.Calendar class is concerned, that ical is incorrectly formatted.

icalendar.Calendar.from_ical() calls icalendar.parser.Contentlines.from_ical(), which is

    def from_ical(cls, ical, strict=False):
        """Unfold the content lines in an iCalendar into long content lines.
        """
        ical = to_unicode(ical)
        # a fold is carriage return followed by either a space or a tab
        return cls(uFOLD.sub('', ical), strict=strict)

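where uFOLD is re.compile('(\r?\n)+[ \t]')

This means it removes every run of line breaks that is followed by a space or a tab, rather than replacing it with a space. The ical file you are retrieving contains, for example,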

SUMMARY:Young Sheldon S06E11 - \nRuthless\, Toothless\, and a Week of
 Bed Rest

So when the fold between of and Bed is matched, the result is ofBed.
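The effect can be reproduced without the library. The sketch below assumes the fold pattern is `(\r?\n)+[ \t]` and that the file uses CRLF line endings; `unfold` is a hypothetical helper name for illustration, not part of icalendar.

```python
import re

# Assumed copy of the library's fold pattern: line break(s) plus one space or tab.
FOLD = re.compile(r'(\r?\n)+[ \t]')

def unfold(text: str) -> str:
    """Delete every line break that is followed by a single space or tab."""
    return FOLD.sub('', text)

# The folded SUMMARY property as quoted above (CRLF endings are an assumption).
folded = (
    "SUMMARY:Young Sheldon S06E11 - \\nRuthless\\, Toothless\\, and a Week of\r\n"
    " Bed Rest\r\n"
)

unfolded = unfold(folded)
print(unfolded)
```

The space after the line break is consumed as the fold marker, so nothing is left between `of` and `Bed`.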

This line-folding format is defined in RFC 2445, which gives this example:

For example the line:
DESCRIPTION:This is a long description that exists on a long line.
Can be represented as:
DESCRIPTION:This is a lo
 ng description
  that exists on a long line.
This clearly shows that the implementation in from_ical() is correct.
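As a quick check, the RFC example can be unfolded with the same rule. This is a minimal sketch that again assumes the pattern `(\r?\n)+[ \t]` and CRLF line endings; the variable names are illustrative only.

```python
import re

# Assumed copy of the fold pattern: line break(s) plus one space or tab.
FOLD = re.compile(r'(\r?\n)+[ \t]')

original = "DESCRIPTION:This is a long description that exists on a long line."

# The folded form from the RFC example. Note the second continuation line
# starts with two spaces: one marks the fold, one belongs to the text.
folded = (
    "DESCRIPTION:This is a lo\r\n"
    " ng description\r\n"
    "  that exists on a long line."
)

restored = FOLD.sub('', folded)
print(restored == original)
```

A writer that folds between words therefore has to keep the word-separating space in addition to the fold space, which the file in the question does not do.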

If you are quite sure that the source always folds lines on word boundaries, you can adjust for it by adding a space after each fold, for example:
    ecal = icalendar.Calendar.from_ical(e.read().replace(b'\n ', b'\n  '))
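The same workaround can be factored into a small helper so the assumption stays visible at the call site. This is only a sketch: `pad_folds` is a hypothetical name, and the regex stands in for the library's unfolding step under the assumption that its pattern is `(\r?\n)+[ \t]`. It is only safe if the source always folds between words; a fold inside a word, or one that already keeps its space, would gain an unwanted space.

```python
import re

def pad_folds(raw: bytes) -> bytes:
    """Double the space after each fold so unfolding leaves one space behind.

    Assumes every fold sits on a word boundary and is marked by a space.
    """
    return raw.replace(b"\n ", b"\n  ")

# Stand-in for the library's unfolding step (assumed pattern, bytes version).
FOLD = re.compile(rb"(\r?\n)+[ \t]")

raw = (
    b"SUMMARY:Young Sheldon S06E11 - \\nRuthless\\, Toothless\\, and a Week of\r\n"
    b" Bed Rest\r\n"
)

without_fix = FOLD.sub(b"", raw)
with_fix = FOLD.sub(b"", pad_folds(raw))

print(without_fix)
print(with_fix)
```

With the real library the call would then be `icalendar.Calendar.from_ical(pad_folds(e.read()))`, which is the one-liner above with the replacement pulled out into a function.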

【Comments】: