【发布时间】:2021-09-06 14:34:55
【问题描述】:
我在AWS SageMaker 上运行JupyterLab。内核:conda_amazonei_mxnet_p27
找到的字段数:saw 9 每次运行时递增 1。
错误: ParserError: Error tokenizing data. C error: Expected 2 fields in line 50, saw 9
代码:
调用(在此之前运行所有单元格时不会出现错误,但在运行时会出现错误):
train = open('train_textcorrupted.csv', 'a')
val = open('val.csv', 'a')
classes = open('classes.txt', 'a')
uni_label = 'Organisation\tUniversity'
n_pad = 4
for i in range(len(unis)-n_pad):
record = ' '.join(unis[i:(i+n_pad)])
full_record = f'{uni_label}\t{record}\n'
if random.random() > 0.9:
val.write(full_record)
else:
train.write(full_record)
classes.write(uni_label)
classes.close()
val.close()
train.close()
追溯:
---------------------------------------------------------------------------
ParserError Traceback (most recent call last)
<ipython-input-8-89b1728bd5a6> in <module>
7 --gpus 1
8 """.split()
----> 9 run_training(args)
<ipython-input-5-091daf2638a1> in run_training(input)
55 csv_logger = pl.loggers.CSVLogger(save_dir=f'{args.modeldir}/csv_logs')
56 loggers = [logger, csv_logger]
---> 57 dm = OntologyTaggerDataModule.from_argparse_args(args)
58 if args.model_uri:
59 local_model_uri = os.environ.get('SM_CHANNEL_MODEL', '.')
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/pytorch_lightning/core/datamodule.py in from_argparse_args(cls, args, **kwargs)
324 datamodule_kwargs.update(**kwargs)
325
--> 326 return cls(**datamodule_kwargs)
327
328 @classmethod
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/pytorch_lightning/core/datamodule.py in __call__(cls, *args, **kwargs)
47
48 # Get instance of LightningDataModule by mocking its __init__ via __call__
---> 49 obj = type.__call__(cls, *args, **kwargs)
50
51 return obj
<ipython-input-3-66ee2be72e78> in __init__(self, traindir, train_file, validate_file, model_name, labels, batch_size)
30 print('tokenizer', tokenizer)
31 print('labels_file', labels_file)
---> 32 label_mapper = LabelMapper(labels_file)
33 self.batch_size = batch_size
34 self.num_classes = label_mapper.num_classes
<ipython-input-3-66ee2be72e78> in __init__(self, classes_file)
102
103 def __init__(self, classes_file):
--> 104 self._raw_labels = pd.read_csv(classes_file, header=None, sep='\t')
105
106 self._map = []
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/pandas/io/parsers.py in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
686 )
687
--> 688 return _read(filepath_or_buffer, kwds)
689
690
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
458
459 try:
--> 460 data = parser.read(nrows)
461 finally:
462 parser.close()
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/pandas/io/parsers.py in read(self, nrows)
1196 def read(self, nrows=None):
1197 nrows = _validate_integer("nrows", nrows)
-> 1198 ret = self._engine.read(nrows)
1199
1200 # May alter columns / col_dict
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/pandas/io/parsers.py in read(self, nrows)
2155 def read(self, nrows=None):
2156 try:
-> 2157 data = self._reader.read(nrows)
2158 except StopIteration:
2159 if self._first_chunk:
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()
pandas/_libs/parsers.pyx in pandas._libs.parsers.raise_parser_error()
ParserError: Error tokenizing data. C error: Expected 2 fields in line 50, saw 9
classes.txt(制表符分隔)运行前
Activity Event
Actor Person
Agent Person
Album Product
Animal Object
ArchitecturalStructure Location
Artist Person
Athlete Person
AutomobileEngine Product
Award Object
Biomolecule Object
Bird Object
BodyOfWater Location
Building Location
ChemicalSubstance Object
Company Organisation
Competition Event
Device Product
Disease Object
District Location
Eukaryote Object
Event Event
Film Object
Food Object
Language Object
Location Location
MeanOfTransportation Product
MotorsportSeason Event
Municipality Location
MusicalWork Product
Organisation Organisation
Painter Person
PeriodicalLiterature Product
Person Person
PersonFunction Person
Plant Object
Poet Person
Politician Person
River Location
School Organisation
Settlement Location
Software Product
Song Product
Species Object
SportsSeason Event
Station Location
Town Location
Village Location
Writer Person
Organisation University
Organisation University
Organisation University
Organisation University
Organisation University
Organisation University
Organisation University
Organisation University
Organisation University
Organisation University
Organisation University
Organisation University
Organisation University
Organisation University
Organisation University
【问题讨论】:
-
请发一个minimal reproducible example——没有那个,我们只能猜测。
-
感谢您发布代码。但是,(1)请将代码缩减到其基本核心——您发布的大部分内容与您所看到的问题无关。 (2)您还需要提供输入数据(再次:尽可能减少,同时仍然重现问题)。
-
当然,泰寻求帮助。下载数据集以便我可以添加更多详细信息以发布@KonradRudolph
-
我现在都添加了@KonradRudolph
-
请不要在问题中回答你自己的问题——而是把它放到一个答案中!
标签: python python-3.x amazon-web-services jupyter-lab amazon-sagemaker