【Question Title】: generate a sequence with respect to subsequences in python
【Posted】: 2021-02-01 17:08:11
【Question】:

I am trying to generate the following sequence.

text   = ACCCEBCE
target = 000000D0

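The text is random, generated from a set of distinct characters. If one of the following subsequences is found in the text sequence, the target is D or E respectively; otherwise the target is 0.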

ABC     -->  D
BCD     -->  E
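The rule table follows a sliding-window pattern over the alphabet: each window of three consecutive letters maps to the letter that comes right after it. The sketch below shows that pattern. `build_rules` is a hypothetical helper name, and the dictionary form is an assumption about how the rules could be stored.

```python
def build_rules(alphabet="ABCDE", seq_len=3):
    # Hypothetical helper: each window of seq_len consecutive letters
    # maps to the letter that immediately follows it in the alphabet
    return {alphabet[i:i + seq_len]: alphabet[i + seq_len]
            for i in range(len(alphabet) - seq_len)}

print(build_rules())  # {'ABC': 'D', 'BCD': 'E'}
```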

I wrote the following code. It works well when I generate a small number of characters, but if I set timesteps = 1000 or more, it does not produce any output.

import string
import random as rn
import numpy as np
def is_subseq(x, y):
    it = iter(y)
    return all(any(c == ch for c in it) for ch in x)


def count(a, b, m, n):  
  
    # If both first and second string  
    # is empty, or if second string  
    # is empty, return 1  
    if ((m == 0 and n == 0) or n == 0):  
        return 1
  
    # If only first string is empty  
    # and second string is not empty, 
    # return 0  
    if (m == 0): 
        return 0
  
    # If last characters are same  
    # Recur for remaining strings by  
    # 1. considering last characters  
    #    of both strings  
    # 2. ignoring last character  
    #    of first string  
    if (a[m - 1] == b[n - 1]):  
        return (count(a, b, m - 1, n - 1) + 
                count(a, b, m - 1, n))  
    else: 
          
        # If last characters are different,  
        # ignore last char of first string  
        # and recur for remaining string  
        return count(a, b, m - 1, n)  

# create a sequence classification instance
def get_sequence(n_timesteps):

    alphabet="ABCDE"#string.ascii_uppercase 
    text = ''.join(rn.choices(alphabet, k=n_timesteps))
    print(text)

    seq_length=3
    subseqX = []
    subseqY = []
    for i in range(0, len(alphabet) - seq_length, 1):
        seq_in = alphabet[i:i + seq_length]
        seq_out = alphabet[i + seq_length]
        subseqX.append([char for char in seq_in])
        subseqY.append(seq_out)
        print(seq_in, "\t-->\t",seq_out)
    
    y2 = []
    match = 0 
    countlist=np.zeros(len(subseqX))
    for i, val in enumerate(text):
        found = False
        counter = 0
        for g, val2 in enumerate(subseqX):
            listToStr = ''.join(map(str, subseqX[g]))
            howmany = count(text[:i], listToStr, len(text[:i]),len(listToStr))
            if is_subseq(listToStr, text[:i]):
                if countlist[g] < howmany:
                    match = match + howmany
                    countlist[g] = howmany
                    temp = g
                    found = True
        if found:
            y2.append(subseqY[temp])
        else:
            y2.append(0)
    print("counter:\t", counter)
    print(text)
    print(y2)
     
# define problem properties
n_timesteps = 100
get_sequence(n_timesteps)

This may be caused by the depth of the recursive function, but I need to generate 1000 or 10000 characters. How can I solve this? Any ideas?
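The recursive `count()` is not memoized, and `get_sequence` calls it from scratch for every prefix `text[:i]`. For a 3-letter pattern the number of recursive calls grows roughly with the cube of the prefix length, so the total work grows roughly with the fourth power of `n_timesteps`. For prefixes close to 1000 characters the recursion depth (about one frame per character) would also reach Python's default limit of 1000 frames. A standard alternative is the dynamic-programming count of subsequence occurrences. It is updated one character at a time and produces the counts for every prefix in a single O(len(text) · len(pattern)) pass with no recursion. The following is a minimal sketch; `prefix_subseq_counts` is a hypothetical name.

```python
def prefix_subseq_counts(text, pattern):
    """Return counts where counts[i] is the number of ways `pattern` occurs as a
    (not necessarily contiguous) subsequence of text[:i], for i = 0..len(text)."""
    k = len(pattern)
    # ways[j] = number of ways pattern[:j] occurs in the characters seen so far;
    # the empty pattern always occurs exactly once
    ways = [1] + [0] * k
    counts = [ways[k]]
    for ch in text:
        # Walk j downward so the current character is used at most once per occurrence
        for j in range(k, 0, -1):
            if pattern[j - 1] == ch:
                ways[j] += ways[j - 1]
        counts.append(ways[k])
    return counts

print(prefix_subseq_counts("ACCCEBCE", "ABC"))  # [0, 0, 0, 0, 0, 0, 0, 1, 1]
```

In `get_sequence`, `howmany` for the prefix `text[:i]` would then be `counts[i]`, computed once per pattern before the loop. `is_subseq(listToStr, text[:i])` is equivalent to `counts[i] > 0`.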

【Comments on the question】:

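  • You could create a dictionary of the required subsequences, e.g. d={'ABC':'A', 'BCD':'B'}, in which you keep track of the next required character so the requirements stack up. Once the last character is found, fill the list with the required letter (use another dictionary for that) and restart the loop from the beginning. Sorry, it takes a lot of code and I don't have time to work on it.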
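The comment's dictionary idea can be sketched as a single pass over the text. For each pattern, the code keeps the index of the next character that pattern is waiting for. When the last character arrives, it emits that pattern's output letter at the current position and restarts the pattern. This is one hedged reading of the comment, not the commenter's own code. It flags the character that completes a match and counts non-overlapping matches per pattern. `flag_greedy` and `rules` are hypothetical names.

```python
def flag_greedy(text, rules=None):
    # rules maps each subsequence to the letter emitted when it completes
    if rules is None:
        rules = {"ABC": "D", "BCD": "E"}
    # next_idx[pat] = index in pat of the character this pattern is waiting for
    next_idx = {pat: 0 for pat in rules}
    target = []
    for ch in text:
        flag = "0"
        for pat, out in rules.items():
            if pat[next_idx[pat]] == ch:
                next_idx[pat] += 1
                if next_idx[pat] == len(pat):
                    flag = out          # pattern completed at this character
                    next_idx[pat] = 0   # start looking for the next occurrence
        target.append(flag)
    return "".join(target)

print(flag_greedy("ACCCEBCE"))  # 000000D0
```

On the question's example this reproduces the target `000000D0`. Each character costs O(number of patterns), so 10000 characters is trivial.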

Tags: python algorithm sequence


【Answer 1】:

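I'm not sure I understand everything you are trying to do (there is a lot of code there), but I believe this simplified function should work. It maintains the set of subsequences seen so far and extends them only by appending a letter when the next letter of the alphabet is encountered. This lets the flag at each position know whether the required subsequence has already appeared in the text before the current character.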

def flagSequence(S,letters="ABCDE",seqLen=3):
    subSeqs    = set()
    result     = "0"
    for c in S[:-1]:
        p = letters.index(c)
        subSeqs.add(c)
        if p>0:
            subSeqs.update([s+c for s in subSeqs if s[-1]==letters[p-1]])
        if p in range(seqLen-1,len(letters)-1) and letters[p-seqLen+1:p+1] in subSeqs:
            result += letters[p+1]
        else:
            result += "0"
    return result

Output:

text = "BDBACCBECEECAEAEDCAACBCCDDDBBDEEDABDBDE"

print(text)
print(flagSequence(text))

BDBACCBECEECAEAEDCAACBCCDDDBBDEEDABDBDE
000000000D00D0000ED00D0DDEEE00E00E00E0E

With more letters:

import string
import random as rn
alphabet=string.ascii_uppercase
text  = ''.join(rn.choices(alphabet, k=10000))
flags = flagSequence(text,alphabet)
print(text[:60])
print(flags[:60])

CHUJKAMWCAAIBXGIZFHALAWWFDDELXREMOQQVXFPNYJRQESRVEJKIAQILYSJ...
000000000000000000000M000000FM00FN00000G0OZK0RFTS0FKLJ0RJMZT...

With longer subsequences (seqLen=10):

alphabet=string.ascii_uppercase 
text  = ''.join(rn.choices(alphabet, k=10000))
flags = flagSequence(text,alphabet,seqLen=10)
print(text[200:260])
print(flags[200:260])

...PMZCDQXAOHVMTRLYCNCJABGGNZYAWIHJJCQKMMAENQFHNQTOQOPPGHVQZXZU...
...00N0000Y000WN000Z0O0K0000O0Z0X00KK00LNN00O000O00P0PQQ00WR0Y0...
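The set in `flagSequence` only ever holds runs of consecutive alphabet letters, and every suffix of a stored run is also stored. That means the set can be summarized by one integer per letter: the length of the longest such run ending at that letter seen so far. The following is a sketch of that equivalent variant, which does O(1) work per character instead of scanning the whole set. `flag_sequence_fast` is a hypothetical name, and the equivalence argument above is the editor's, not the answerer's. It keeps the same convention as `flagSequence`: the flag at index i reflects only the characters before i.

```python
def flag_sequence_fast(S, letters="ABCDE", seq_len=3):
    # Position of each letter in the alphabet (assumes every char of S is in letters)
    pos = {ch: i for i, ch in enumerate(letters)}
    # run[p] = length of the longest run of consecutive letters ending at letters[p]
    # that has appeared as a subsequence of the characters processed so far
    run = [0] * len(letters)
    result = ["0"]  # as in flagSequence, position 0 is always "0"
    for c in S[:-1]:
        p = pos[c]
        # c either starts a run of length 1 or extends the best run ending at letters[p-1]
        prev = run[p - 1] if p > 0 else 0
        run[p] = max(run[p], prev + 1)
        # Flag the next position if letters[p-seq_len+1 .. p] has now been seen
        if seq_len - 1 <= p < len(letters) - 1 and run[p] >= seq_len:
            result.append(letters[p + 1])
        else:
            result.append("0")
    return "".join(result)

print(flag_sequence_fast("ACCCEBCE"))  # 0000000D
```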
