【Posted】: 2021-05-02 22:11:19
【Question】:
   Speaker ID  Utterances
0  S1          [alright Sue now it's like uh i dropped like C...
1  S2          [this year? this term?, ri- oh but you dropped...
2  S3          [yeah. hi, hi, yeah i already signed [S2: okay...
3  S4          [back in i was like w- what is that?, yeah and...
4  S5          [okay well i'm not here for a drop-add class [...
5  S6          [me, yeah. that's right, i have a question lik...
6  S7          [hello, hi, what was your name?, i thought i o...
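Ultimately, the goal is to create a new column in which everything under the "Utterances" column has had its punctuation removed and has been tokenized. I just need to convert each list of strings into a single string first, right?
P.S. I know the formatting looks odd, but I don't know how to fix it and haven't found an answer anywhere. If someone could tell me how to include the text I'm working with so that it doesn't look odd, that would be great. Thanks!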
【Comments】:
- Use df.to_dict() to post clean sample data here.
- df.Utterances.str.join(SEP), where SEP is the separator you want between words.
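
The suggestion above can be sketched roughly as follows. This is a minimal example, not a definitive solution: the sample rows are made up to resemble the frame in the question, the new column name "Tokens" is hypothetical, and it assumes each "Utterances" cell is a real Python list of strings (not a string that merely looks like a list, as can happen after reading from CSV).

```python
import string

import pandas as pd

# Hypothetical sample data shaped like the frame in the question:
# each "Utterances" cell holds a list of strings.
df = pd.DataFrame({
    "Speaker ID": ["S1", "S2"],
    "Utterances": [
        ["alright Sue now", "it's like uh"],
        ["this year? this term?", "oh but you dropped"],
    ],
})

SEP = " "  # assumed separator between list items

# Series.str.join concatenates the elements of each list-valued cell.
joined = df["Utterances"].str.join(SEP)

# Drop ASCII punctuation, then split on whitespace as a simple tokenizer.
table = str.maketrans("", "", string.punctuation)
df["Tokens"] = joined.str.translate(table).str.split()

print(df[["Speaker ID", "Tokens"]])
```

Note that stripping every character in string.punctuation also removes apostrophes and hyphens, so "it's" becomes "its"; a dedicated tokenizer may be a better fit if that matters. For sharing sample data in a question, something like df.head().to_dict() prints a form that others can paste straight into pd.DataFrame(...).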