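When working with subtitle files, converting between character encodings on Linux is a real nuisance.
Steps: first detect the encoding with Python, then convert it with iconv, then fix up the formatting with awk.
Can't the file command detect the encoding? It can, but file is sometimes inaccurate.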
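One likely reason that any detector can guess wrong is that the same bytes are often valid in more than one encoding. The small sketch below illustrates this with a made-up sample; the helper name decodes_as is hypothetical and is not part of the original script.

```python
# Hypothetical sample: the word "中文" encoded as GBK, giving bytes D6 D0 CE C4.
raw = "中文".encode("gbk")


def decodes_as(data, encoding):
    """Return the decoded text, or None if `data` is not valid in `encoding`."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# utf-8 rejects these bytes, gbk recovers the text, and iso-8859-1
# "succeeds" but produces mojibake, because it accepts any byte sequence.
for enc in ("utf-8", "gbk", "iso-8859-1"):
    print(enc, repr(decodes_as(raw, enc)))
```

A successful decode therefore only shows that the bytes are legal in that encoding, not that the encoding is the right one, which is why the order of the attempts matters.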
1. Detecting the encoding with Python
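$ cat t1.py
# -*- coding: utf-8 -*-
import sys

# f1 = open(sys.argv[2], 'w')
with open(sys.argv[1], 'rb') as f:
    for line in f:
        # Decode line by line, because the encoding is not consistent within the file.
        # Each fallback prints a marker so you can see which encoding matched.
        try:
            line = line.decode('utf-8')
        except UnicodeDecodeError:
            try:
                line = line.decode('GB2312')  # right
                print('hehe')
            except UnicodeDecodeError:
                try:
                    line = line.decode('gbk')
                    print('hehe1')
                except UnicodeDecodeError:
                    try:
                        line = line.decode('GB18030')
                        print('hehe2')
                    except UnicodeDecodeError:
                        try:
                            line = line.decode('iso-8859-1')  # wrong
                        except UnicodeDecodeError:
                            continue
        line = line.strip()  # strip leading/trailing spaces, tabs, CR and LF
        print(line)
        # f1.write(line)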
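The script above reports a match line by line. For the iconv step it may be handier to have a single encoding name for the whole file. Here is a minimal sketch of that idea. The function name guess_encoding and the candidate list are assumptions for illustration, not part of the original script.

```python
# Candidate encodings, tried in order (an assumption; adjust for your files).
# iso-8859-1 is left out on purpose: it accepts any byte sequence,
# so it would always "match".
CANDIDATES = ("utf-8", "gb2312", "gbk", "gb18030")


def guess_encoding(data, candidates=CANDIDATES):
    """Return the first candidate that decodes all of `data`, or None."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
        return name
    return None


# Demo on in-memory bytes; for a real file, read it with open(path, 'rb').
sample = "字幕".encode("gbk")
print(guess_encoding(sample))
```

The returned name can then serve as the source encoding for the iconv step. Because GBK is a superset of GB2312 and GB18030 is a superset of GBK, a file reported as gb2312 is also valid in the two larger encodings.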