【Posted】: 2015-11-19 08:39:58
【Problem description】:
Consider this text:
...
bedeubedeu France The Provençal name for tripe
bee balmbee balm Bergamot
beechmastbeechmast Beech nut
beech nutbeech nut A small nut from the beech tree,
genus Fagus and Nothofagus, similar in
flavour to a hazelnut but not commonly used.
A flavoursome oil can be extracted from
them. Also called beechmast
beechwheatbeechwheat Buckwheat
beefbeef The meat of the animal known as a cow
(female) or bull (male) (NOTE: The Anglo-
saxon name ‘Ox’ is still used for some of what
were once the less desirable parts e.g. oxtail,
ox liver)
beef bourguignonnebeef bourguignonne See boeuf à la
bourguignonne
...
I want to parse this text with Python and keep only the strings that appear exactly twice, back to back. For example, an acceptable result would be
bedeu
bee balm
beechmast
beech nut
beechwheat
beef
beef bourguignonne
because each of these strings tends to be immediately followed by an identical copy of itself, like this:
bedeubedeu
bee balmbee balm
beechmastbeechmast
beech nutbeech nut
beechwheatbeechwheat
beefbeef
beef bourguignonnebeef bourguignonne
So, how can I use a regular expression to search for adjacent, identical strings? I am testing my attempts here. Thanks!
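One possible approach, offered as a minimal sketch rather than a definitive answer, is a backreference: capture some text at the start of a line and require the same text to follow immediately. The names `SAMPLE`, `DOUBLED` and `doubled_headwords` are hypothetical, and the sketch assumes every doubled string starts at the beginning of a line, as in the excerpt above.

```python
import re

# A shortened copy of the excerpt from the question (assumption: the doubled
# string always sits at the very start of its line).
SAMPLE = """\
bedeubedeu France The Provençal name for tripe
bee balmbee balm Bergamot
beechmastbeechmast Beech nut
beech nutbeech nut A small nut from the beech tree,
genus Fagus and Nothofagus, similar in
flavour to a hazelnut but not commonly used.
beefbeef The meat of the animal known as a cow
(female) or bull (male) (NOTE: The Anglo-
beef bourguignonnebeef bourguignonne See boeuf à la
bourguignonne
"""

# ^      start of a line (because of re.MULTILINE)
# (.+)   capture one or more characters; "." never crosses a newline here
# \1     the captured text must be repeated immediately afterwards
DOUBLED = re.compile(r"^(.+)\1", re.MULTILINE)


def doubled_headwords(text):
    """Return the string that is doubled at the start of each matching line."""
    # With a single capture group, findall returns the group contents.
    return DOUBLED.findall(text)


if __name__ == "__main__":
    for word in doubled_headwords(SAMPLE):
        print(word)
```

The group is greedy so that the longest doubled prefix wins; a lazy `(.+?)` would stop at the shortest one, which could cut a string that itself contains a repetition. Note that a line consisting only of `...` also matches this pattern (the captured text is `.`), so such lines may need to be filtered out first.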
【Discussion】:
Tags: python regex regex-negation regex-lookarounds