Home

Neil Cerutti <horpner@yahoo.com> wrote:

> It's probable that a simpler implementation using slice
> operations will be faster for shortish lengths of subseq. It was
> certainly easier to get it working correctly. ;)
>
> def find(seq, subseq):
> for i, j in itertools.izip(xrange(len(seq)-len(subseq)),
> xrange(len(subseq), len(seq))):
> if subseq == seq[i:j]:
> return i
> return -1

Simpler yet (though maybe slower!-):

def find(seq, subseq):
L = len(subseq)
for i in xrange(0, len(seq)-L):
if subseq == seq[i:i+L]: return i
return -1

also worth trying (may be faster in some cases, e.g. when the first item
of the subsequence occurs rarely in the sequence):

def find(seq, subseq):
L = len(subseq)
firstitem = subseq[0]
end = len(seq) - len(subseq)
i = -1
while 1:
try: i = seq.index(firstitem, i+1, end)
except ValueError: return -1
if subseq == seq[i:i+L]: return i

For particularly long sequences (with hashable items) it might even be
worth trying variants of Boyer-Moore, Horspool, or Knuth-Morris-Pratt;
while these search algorithms are mostly intended for text strings,
since you need tables indexed by the item values, using dicts for such
tables might yet be feasible (however, the program won't be quite that
simple). Benchmarking of various possibilities on typical input data
for your application is recommended, as performance may vary!


Alex

previous
next

Re: Looking for the "isfinite" function in C/C++?
virtual operator +
Re: Howto identify a string value vs. a numeric value in std::string
Re: Need a better understanding on how MRO works?
Re: Would Anonymous Functions Help in Learning Programming/Python?
Pajacyk
Podaruj Zycie
Fundacja Hobbit
Fundacja Iskierka
Kidprotect