[Posted]: 2014-06-22 11:36:15
[Question]:
I have tried a lot of things, but I cannot get this to work.
Input:
condor t airline airline
eight n 0 flightnumber
nine n 0 flightnumber
five n 0 flightnumber
hallo t 0 sentence
turn t com turn_heading
left t 0 direction
heading t com turn_heading
three n 0 degree_absolute
two n 0 degree_absolute
zero n 0 degree_absolute
Expected output:
<s> <callsign> <airline> condor </airline> <flightnumber> eight nine five </flightnumber> </callsign> hallo <command="turn_heading"> turn <direction> left </direction> heading <degree_absolute> three two zero </degree_absolute> </command> </s>
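Whenever I read the input, the tabs get in the way of tokenizing the strings, whether I read the lines in as a list or as a single string. This is what I get when I try to strip the tabs: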
['condor\tt\tairline\tairline\n', 'eight\tn\t \tflightnumber\n', 'nine\tn\t \tflightnumber\n', 'five\tn\t \tflightnumber\n', 'hallo\tt\t \tsentence\n', 'turn\tt\tcom\tturn_heading\n', 'left\tt\t \tdirection\n', 'heading\tt\tcom\tturn_heading\n', 'three\tn\t \tdegree_absolute\n', 'two\tn\t \tdegree_absolute\n', 'zero\tn\t \tdegree_absolute\n', '\n', 'aeh\tt\t \tsentence\n', 'two\tn\t \tflightnumber\n', 'eight\tn\t \tflightnumber\n', 'november\tt\tflightnumber\tflightnumber\n', 'hallo\tt\t \tsentence\n', 'reduce\tt\tcom\treduce\n', 'two\tn\t \tspeed\n', 'two\tn\t \tspeed\n', 'zero\tn\t \tspeed\n', 'knots\tt\t \treduce\n', '\n', 'condor\tt\tairline\tairline\n', 'eight\tn\t \tflightnumber\n', 'nine\tn\t \tflightnumber\n', 'five\tn\t \tflightnumber\n', 'descend\tt\tcom\tdescend\n', 'three\tn\t \taltitude\n', 'thousand\tn\t \taltitude\n', 'feet\tt\t \tdescend\n', 'turn\tt\tcom\tturn_heading\n', 'left\tt\t \tdirection\n', 'heading\tt\tcom\tturn_heading\n', 'two\tn\t \tdegree_absolute\n', 'six\tn\t \tdegree_absolute\n', 'zero\tn\t \tdegree_absolute\n', 'cleared\tt\tcom\tcleared_ils\n', 'ils\tt\t \tcleared_ils\n', 'runway\tt\t \tcleared_ils\n', 'two\tn\t \trunway\n', 'three\tn\t \trunway\n', 'left\tt\t \trunway\n', 'turn\tt\tcom\tturn_heading\n', 'left\tt\t \tdirection\n', 'heading\tt\tcom\tturn_heading\n', 'two\tn\t \tdegree_absolute\n', 'five\tn\t \tdegree_absolute\n', 'zero\tn\t \tdegree_absolute\n']
Can anyone help me strip the tabs, tokenize the lines, and convert them into the markup format?
The code I used to remove the control characters:
import string
with open('input.txt', 'r') as file1:
    lines = str(list(file1))
print lines.translate(string.maketrans("\n\t\r", " "))
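One likely reason the `translate` call appears to do nothing: `str(list(file1))` produces the printed representation of the list, in which every tab is the two characters backslash and `t` rather than a real tab character, so `translate` finds nothing to replace. Below is a minimal sketch of splitting each line on real tabs instead. It is written in Python 3 syntax, uses a few sample lines copied from the list output above in place of `input.txt`, and `parse_line` is a hypothetical helper name, not something from the question.

```python
# A few lines copied from the list shown in the question (real tab characters).
raw_lines = [
    'condor\tt\tairline\tairline\n',
    'eight\tn\t \tflightnumber\n',
    '\n',
    'aeh\tt\t \tsentence\n',
]

# str() of a list gives its repr: tabs become backslash + 't', not real tabs.
as_text = str(raw_lines)
print('\t' in as_text)    # False: no real tab left to translate
print('\\t' in as_text)   # True: only the escaped two-character form


def parse_line(line):
    """Split one tab-separated line into fields with surrounding whitespace removed."""
    return [field.strip() for field in line.rstrip('\r\n').split('\t')]


# Skip blank separator lines and tokenize the rest.
rows = [parse_line(line) for line in raw_lines if line.strip()]
for row in rows:
    print(row)
```

Working on the individual lines, rather than on one big string built with `str()`, keeps the tabs available as a delimiter.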
[Comments]:
-
Take a look at the post on removing specific control characters.
-
@KobiK Nope, it is still not working. It still gives the same output.
-
Share your code; that is the only way to know where your problem is.
-
@KobiK import string with open('input.txt', 'r') as file1:lines = str(list(file1)) print lines.translate(string.maketrans("\n\t\r", ""))
-
Why don't you use the csv module?
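The csv suggestion could be sketched roughly as follows. This is only an illustration in Python 3 syntax, not a confirmed solution: the nesting rules are guesses inferred from the single expected-output line in the question. It assumes `airline` and `flightnumber` tokens form a `<callsign>`, a `com` marker in the third column opens a `<command>` named after the fourth column, `sentence` tokens stay unwrapped, and every row has exactly four columns. The names `SAMPLE`, `CALLSIGN_TAGS`, `read_utterances`, and `to_markup` are hypothetical.

```python
import csv
import io
from itertools import groupby

# First utterance from the question, with real tab characters.
SAMPLE = (
    "condor\tt\tairline\tairline\n"
    "eight\tn\t \tflightnumber\n"
    "nine\tn\t \tflightnumber\n"
    "five\tn\t \tflightnumber\n"
    "hallo\tt\t \tsentence\n"
    "turn\tt\tcom\tturn_heading\n"
    "left\tt\t \tdirection\n"
    "heading\tt\tcom\tturn_heading\n"
    "three\tn\t \tdegree_absolute\n"
    "two\tn\t \tdegree_absolute\n"
    "zero\tn\t \tdegree_absolute\n"
    "\n"
)

CALLSIGN_TAGS = {"airline", "flightnumber"}  # assumed to form a <callsign>


def read_utterances(handle):
    """Yield a list of (word, tag, is_command) tuples per blank-line-separated block."""
    current = []
    for row in csv.reader(handle, delimiter="\t"):
        fields = [field.strip() for field in row]
        if not any(fields):              # blank line ends the utterance
            if current:
                yield current
            current = []
            continue
        word, _kind, marker, tag = fields    # assumes exactly four columns
        current.append((word, tag, marker == "com"))
    if current:
        yield current


def to_markup(utterance):
    """Render one utterance using the assumed nesting rules."""
    out = ["<s>"]
    in_callsign = False
    open_command = None
    # Consecutive rows sharing a tag are merged into one group of words.
    for tag, group in groupby(utterance, key=lambda item: item[1]):
        items = list(group)
        words = " ".join(item[0] for item in items)
        is_keyword = any(item[2] for item in items)
        element = "<{0}> {1} </{0}>".format(tag, words)
        # A callsign ends as soon as a non-callsign tag appears.
        if in_callsign and tag not in CALLSIGN_TAGS:
            out.append("</callsign>")
            in_callsign = False
        # A command ends at a callsign, a sentence word, or a different command keyword.
        new_command = is_keyword and tag != open_command
        if open_command and (tag in CALLSIGN_TAGS or tag == "sentence" or new_command):
            out.append("</command>")
            open_command = None
        if tag in CALLSIGN_TAGS:
            if not in_callsign:
                out.append("<callsign>")
                in_callsign = True
            out.append(element)
        elif tag == "sentence" or tag == open_command:
            out.append(words)            # plain words, no wrapper
        elif new_command:
            out.append('<command="{0}">'.format(tag))
            open_command = tag
            out.append(words)
        else:
            out.append(element)          # nested element such as <direction>
    if in_callsign:
        out.append("</callsign>")
    if open_command:
        out.append("</command>")
    out.append("</s>")
    return " ".join(out)


for utterance in read_utterances(io.StringIO(SAMPLE, newline="")):
    print(to_markup(utterance))
```

For the sample above this prints the expected output line given in the question. To read the real file, `open('input.txt', newline='')` could replace the `io.StringIO` object. Two back-to-back commands with the same name would be merged into one `<command>` under these assumed rules, so the closing condition may need adjusting for the full data.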
Tags: python string tokenize markup