【发布时间】:2011-10-08 23:40:54
【问题描述】:
自定义迭代器存在问题,因为它只会对文件进行一次迭代。我在迭代之间的相关文件对象上调用seek(0),但在第二次运行时第一次调用next() 时抛出StopIteration。我觉得我忽略了一些明显的东西,但希望对此有一些新的看法:
class MappedIterator(object):
"""
Given an iterator of dicts or objects and a attribute mapping dict,
will make the objects accessible via the desired interface.
Currently it will only produce dictionaries with string values. Can be
made to support actual objects later on. Somehow... :D
"""
def __init__(self, obj=None, mapping={}, *args, **kwargs):
self._obj = obj
self._mapping = mapping
self.cnt = 0
def __iter__(self):
return self
def reset(self):
self.cnt = 0
def next(self):
try:
try:
item = self._obj.next()
except AttributeError:
item = self._obj[self.cnt]
# If no mapping is provided, an empty object will be returned.
mapped_obj = {}
for mapped_attr in self._mapping:
attr = mapped_attr.attribute
new_attr = mapped_attr.mapped_name
val = item.get(attr, '')
val = str(val).strip() # get rid of whitespace
# TODO: apply transformers...
# This allows multi attribute mapping or grouping of multiple
# attributes in to one.
try:
mapped_obj[new_attr] += val
except KeyError:
mapped_obj[new_attr] = val
self.cnt += 1
return mapped_obj
except (IndexError, StopIteration):
self.reset()
raise StopIteration
class CSVMapper(MappedIterator):
def __init__(self, reader, mapping={}, *args, **kwargs):
self._reader = reader
self._mapping = mapping
self._file = kwargs.pop('file')
super(CSVMapper, self).__init__(self._reader, self._mapping, *args, **kwargs)
@classmethod
def from_csv(cls, file, mapping, *args, **kwargs):
# TODO: Parse kwargs for various DictReader kwargs.
return cls(reader=DictReader(file), mapping=mapping, file=file)
def __len__(self):
return int(self._reader.line_num)
def reset(self):
if self._file:
self._file.seek(0)
super(CSVMapper, self).reset()
示例用法:
file = open('somefile.csv', 'rb') # say this file has 2 rows + a header row
mapping = MyMappingClass() # this isn't really relevant
reader = CSVMapper.from_csv(file, mapping)
# > 'John'
# > 'Bob'
for r in reader:
print r['name']
# This won't print anything
for r in reader:
print r['name']
【问题讨论】:
-
文档说并非所有文件对象都可以使用 seek(),尽管它没有说明哪些类型。我猜不是文本文件,但可能值得研究docs.python.org/release/2.4.4/lib/bltin-file-objects.html
-
另外,如果您愿意,您是否可以不只是重新打开或重新实例化
reader对象以获得所需的效果? -
哦,这提出了一个很好的观点。这是 Django 文件对象的一个实例。 docs.djangoproject.com/en/1.3/ref/files/file