【Question Title】: Finding the ranking of a word (permutations) with duplicate letters
【Posted】: 2014-05-03 17:49:38
【Question】:

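I'm posting this even though there are already many posts about this problem. I didn't want to post it as an answer, because it doesn't work. The answer in this post (Finding the rank of the Given string in list of all possible permutations with Duplicates) didn't work for me.

So I tried this (it's a compilation of code I borrowed, plus my attempt at handling repeats). The non-repeating cases work fine. BOOKKEEPER produces 83863 instead of the expected 10743.

(The factorial function and the letter-counter array "repeats" work fine. I didn't post them, to save space.)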

while (pointer != length)
{
    if (sortedWordChars[pointer] != wordArray[pointer])
    {
        // Swap the current character with the one after that
        char temp = sortedWordChars[pointer];
        sortedWordChars[pointer] = sortedWordChars[next];
        sortedWordChars[next] = temp;
        next++;

        //For each position check how many characters left have duplicates, 
        //and use the logic that if you need to permute n things and if 'a' things 
        //are similar the number of permutations is n!/a!


        int ct = repeats[(sortedWordChars[pointer]-64)];
        // Increment the rank
        if (ct>1) { //repeats?
            System.out.println("repeating " + (sortedWordChars[pointer]-64));
            //In case of repetition of any character use: (n-1)!/(times)!
            //e.g. if there is 1 character which is repeating twice,
            //x* (n-1)!/2!                      
            int dividend = getFactorialIter(length - pointer - 1);
            int divisor = getFactorialIter(ct);
            int quo = dividend/divisor;
            rank += quo;
        } else {
            rank += getFactorialIter(length - pointer - 1);                 
        }                       
    } else
    {
        pointer++;
        next = pointer + 1;
    }
}
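To check any candidate implementation, including the loop above, a brute-force reference is handy. This is a minimal Python sketch that is not part of the original question; rank_bruteforce is a hypothetical helper name. Because it enumerates all n! orderings, it is only practical for short words.

```python
from itertools import permutations

def rank_bruteforce(word):
    """1-based lexicographic rank of word among the distinct permutations of its letters.

    Enumerates every ordering, so it is only practical for short words.
    """
    distinct = sorted(set(permutations(word)))  # the set collapses repeated letters
    return distinct.index(tuple(word)) + 1

print(rank_bruteforce("QUESTION"))  # 24572
```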

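【Question Comments】:

  • I take it you want lexicographic order?
  • Yes, David, e.g. QUESTION=24572 (which works in my code because there are no duplicates). Thanks for replying.

Tags: string algorithm permutation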


【Solution 1】:

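Note: this answer uses 1-based ranks, as the examples implicitly specify. Here is some Python that works for at least the two examples provided. The key fact is that suffixperms * ctr[y] // ctr[x] is the number of permutations of the length-(i + 1) suffix of perm whose first letter is y.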

from collections import Counter

def rankperm(perm):
    rank = 1
    suffixperms = 1
    ctr = Counter()
    for i in range(len(perm)):
        x = perm[((len(perm) - 1) - i)]
        ctr[x] += 1
        for y in ctr:
            if (y < x):
                rank += ((suffixperms * ctr[y]) // ctr[x])
        suffixperms = ((suffixperms * (i + 1)) // ctr[x])
    return rank
print(rankperm('QUESTION'))
print(rankperm('BOOKKEEPER'))
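The key fact can be checked by brute force on a short word. In this sketch (check_key_fact is a hypothetical helper, not part of the answer), the same suffixes as in rankperm are walked, and for every letter y seen so far the formula is compared with a direct count of the distinct suffix permutations that begin with y. It enumerates permutations, so keep the word short.

```python
from collections import Counter
from itertools import permutations

def check_key_fact(perm):
    """For every suffix of perm, compare suffixperms * ctr[y] // ctr[x]
    with a direct count of distinct suffix permutations starting with y."""
    suffixperms = 1  # distinct perms of the previous (shorter) suffix
    ctr = Counter()
    for i in range(len(perm)):
        x = perm[(len(perm) - 1) - i]
        ctr[x] += 1
        suffix = perm[(len(perm) - 1) - i:]  # the length-(i + 1) suffix
        distinct = set(permutations(suffix))
        for y in ctr:
            direct = sum(1 for p in distinct if p[0] == y)
            if direct != suffixperms * ctr[y] // ctr[x]:
                return False
        suffixperms = suffixperms * (i + 1) // ctr[x]
    return True

print(check_key_fact("BOOKKEE"))  # True
```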

Java version:

public static long rankPerm(String perm) {
    long rank = 1;
    long suffixPermCount = 1;
    java.util.Map<Character, Integer> charCounts =
        new java.util.HashMap<Character, Integer>();
    for (int i = perm.length() - 1; i > -1; i--) {
        char x = perm.charAt(i);
        int xCount = charCounts.containsKey(x) ? charCounts.get(x) + 1 : 1;
        charCounts.put(x, xCount);
        for (java.util.Map.Entry<Character, Integer> e : charCounts.entrySet()) {
            if (e.getKey() < x) {
                rank += suffixPermCount * e.getValue() / xCount;
            }
        }
        suffixPermCount *= perm.length() - i;
        suffixPermCount /= xCount;
    }
    return rank;
}

Unranking a permutation:

from collections import Counter

def unrankperm(letters, rank):
    ctr = Counter()
    permcount = 1
    for i in range(len(letters)):
        x = letters[i]
        ctr[x] += 1
        permcount = (permcount * (i + 1)) // ctr[x]
    # ctr is the histogram of letters
    # permcount is the number of distinct perms of letters
    perm = []
    for i in range(len(letters)):
        for x in sorted(ctr.keys()):
            # suffixcount is the number of distinct perms that begin with x
            suffixcount = permcount * ctr[x] // (len(letters) - i)
            if rank <= suffixcount:
                perm.append(x)
                permcount = suffixcount
                ctr[x] -= 1
                if ctr[x] == 0:
                    del ctr[x]
                break
            rank -= suffixcount
    return ''.join(perm)
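To make the control flow of rankperm explicit, here is an instrumented copy (the name rankperm_trace and the returned step list are additions for illustration only). It records, for each outer iteration, the index read from perm, the letter, and the running rank. The indices run from the last letter back to the first, the inner "for y" loop runs once per outer iteration, and suffixperms holds the number of distinct permutations of the suffix processed so far, updated incrementally rather than recomputed from factorials.

```python
from collections import Counter

def rankperm_trace(perm):
    """rankperm from Solution 1, plus a log of (index, letter, rank) per step."""
    rank = 1
    suffixperms = 1
    ctr = Counter()
    steps = []
    for i in range(len(perm)):  # outer loop: one iteration per letter
        j = (len(perm) - 1) - i  # index read: last letter first, first letter last
        x = perm[j]
        ctr[x] += 1
        for y in ctr:  # inner loop: runs in full on every outer iteration
            if y < x:
                rank += suffixperms * ctr[y] // ctr[x]
        # distinct perms of the length-(i + 1) suffix, updated incrementally
        suffixperms = suffixperms * (i + 1) // ctr[x]
        steps.append((j, x, rank))
    return rank, steps  # outside both loops: runs once at the end

rank, steps = rankperm_trace("BOOKKEEPER")
for j, x, r in steps:
    print(j, x, r)
```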

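【Comments】:

  • Thanks for the quick reply, David! Let me put on a Python hat (I don't know Python) and work through this elegant-looking code. I'll post an update. Thanks again, Max
  • @MaxTomlinson Transliterating it into your language of choice shouldn't be too hard. The loop for i in range(len(perm)) steps i from 0 to len(perm) - 1 in increments of 1. The // operator is truncating division. perm is indexed from 0. The variable ctr is a map from the letters of the permutation to their frequencies, in which every letter is implicitly initialized to frequency zero.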
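  • What threw me a little was where the for loops end (the implied braces), so the for loop includes everything up to the return of rank. The index into the string perm actually runs from the end to the start, right? The counter is bumped on every iteration, and the "for y" loop runs on every iteration, as a sort of on-the-fly factorial?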
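  • @David Eisenstat, very cool solution! But because of int and long overflow, your code can't handle very large strings such as "adsfadfjkzcvzadfadfasdfqq". I know one way around this is to use the modular multiplicative inverse, but I don't understand how to apply it to the rank-finding problem. Do you have any ideas for handling large strings, or know how to use the modular multiplicative inverse in this case?
  • @David Eisenstat, thanks for your reply. Do you mean replacing xCount in the line rank += suffixPermCount * e.getValue() / xCount;, in suffixPermCount /= xCount;, or in both? More importantly, I don't understand which variables to use when computing the modular multiplicative inverse (what value to use for the "modulus", and what "a" and "b" are). I looked at some of the math behind it, but I don't see how to use it in the context of finding the rank (for a large input string).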
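One way to apply the modular multiplicative inverse, sketched in Python under stated assumptions: both divisions by ctr[x] (the ones that correspond to / xCount in the Java version) become multiplications by the inverse of ctr[x] modulo a prime p. The result is the rank modulo p, not the rank itself. This relies on every division in rankperm being exact, and on p being a prime larger than every letter count so that each ctr[x] is invertible. The value 10**9 + 7 is just a common choice, not something from this thread, and pow(a, -1, p) needs Python 3.8 or later. (Python integers are arbitrary precision, so the plain Python rankperm does not overflow; the overflow concern applies to the Java version.)

```python
from collections import Counter

MOD = 10**9 + 7  # assumed prime modulus (a common choice, not from the thread)

def rankperm_mod(perm, mod=MOD):
    """rankperm(perm) % mod, for a prime mod larger than every letter count.

    Each exact division by ctr[x] in rankperm becomes a multiplication by
    the modular inverse of ctr[x].
    """
    rank = 1
    suffixperms = 1  # distinct perms of the suffix, reduced mod `mod`
    ctr = Counter()
    for i in range(len(perm)):
        x = perm[len(perm) - 1 - i]
        ctr[x] += 1
        inv_x = pow(ctr[x], -1, mod)  # modular inverse; needs Python 3.8+
        for y in ctr:
            if y < x:
                rank = (rank + suffixperms * ctr[y] * inv_x) % mod
        suffixperms = suffixperms * (i + 1) * inv_x % mod
    return rank % mod

print(rankperm_mod("BOOKKEEPER", 7))  # 5, since 10743 % 7 == 5
print(rankperm_mod("adsfadfjkzcvzadfadfasdfqq"))
```

With a prime modulus larger than the true rank (for example the Mersenne prime 2**127 - 1 for a 25-letter word), the same function returns the exact rank, which is a convenient way to cross-check the reduced result.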
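【Solution 2】:

Using a closed-form formula reduces the amount of work and finds the rank faster, which is especially useful for large strings. (More details here.)

I'd suggest implementing the method shown here programmatically.

【Comments】:

  • Cool, but how do you get the word back from a numeric rank?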
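【Solution 3】:

I'd say David's post (the accepted answer) is very cool, but I want to improve its speed a little further. The inner loop looks for inversion pairs and, for each such inversion, adds a contribution to the rank. If we use an ordered map (a binary search tree, BST) there, we can simply do an in-order traversal from the first (leftmost) node until we reach the current character, instead of traversing the whole map. In C++, std::map is a perfect fit for this BST. The following code reduces the number of iterations in the loop and removes the if check.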

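#include <map>
#include <string>

using namespace std;

long long rankofword(const string& s)
{
    long long rank = 1;
    long long suffixPermCount = 1;
    map<char, int> m;  // ordered map (a balanced BST): keys iterate in ascending order
    int size = s.size();
    for (int i = size - 1; i > -1; i--)
    {
        char x = s[i];
        int xCount = ++m[x];
        auto stop = m.find(x);  // look x up once instead of on every loop test
        // visit only the letters that sort before x
        for (auto it = m.begin(); it != stop; ++it)
            rank += suffixPermCount * it->second / xCount;

        suffixPermCount *= (size - i);
        suffixPermCount /= xCount;
    }
    return rank;
}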
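Taking the ordered-structure idea one step further (this is a swapped-in technique, not what the code above does): because each term suffixPermCount * count(y) / count(x) is an exact integer, the sum over all letters y smaller than x equals suffixPermCount * S / count(x), where S is the number of letters in the suffix that sort before x. S is a prefix sum over letter counts, which a Fenwick (binary indexed) tree answers in O(log σ) for σ distinct letters. A minimal Python sketch; rankperm_fenwick is a hypothetical name.

```python
def rankperm_fenwick(perm):
    """1-based rank of perm among its distinct permutations.

    Same suffix walk as rankperm, but the number of suffix letters smaller
    than x comes from a Fenwick tree instead of a loop over all letters.
    """
    letters = sorted(set(perm))
    pos = {c: k + 1 for k, c in enumerate(letters)}  # 1-based tree index per letter
    size = len(letters)
    tree = [0] * (size + 1)

    def add(k):  # record one more occurrence of the letter at index k
        while k <= size:
            tree[k] += 1
            k += k & -k

    def prefix(k):  # total occurrences of the letters at indices 1..k
        total = 0
        while k > 0:
            total += tree[k]
            k -= k & -k
        return total

    counts = {}
    rank = 1
    suffixperms = 1
    for i in range(len(perm)):
        x = perm[len(perm) - 1 - i]
        counts[x] = counts.get(x, 0) + 1
        add(pos[x])
        smaller = prefix(pos[x] - 1)  # suffix letters that sort before x
        rank += suffixperms * smaller // counts[x]
        suffixperms = suffixperms * (i + 1) // counts[x]
    return rank

print(rankperm_fenwick("BOOKKEEPER"))  # 10743
```

For the 26-letter alphabets in this thread the gain is small; the tree only pays off when the number of distinct symbols is large.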

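【Comments】:

    【Solution 4】:

    @David Eisenstat, this was really helpful. It took me a while to figure out what you were doing, since I'm still learning my first language (C#). I translated it into C# and thought I'd share that solution too, since this thread helped me so much!

    Thanks!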

    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;
    
    namespace CsharpVersion
    {
        class Program
        {
            //Reads a word and checks that it is between 1 and `high`
            //characters long (inclusive) and contains only letters
            static string readWord(string prompt, int high)
            {
                Regex rgx = new Regex("^[a-zA-Z]+$");
                string word;
                do
                {
                    Console.WriteLine(prompt);
                    word = Console.ReadLine();
                    if (word == null)
                    {
                        return "EXIT"; // end of input: behave as if the user typed Exit
                    }
                } while (word.Length > high || !rgx.IsMatch(word)); // || short-circuits
                return word.ToUpper();
            }
    
            //Creates a sorted dictionary containing the distinct letters
            //of the word, each initialized with frequency 0
            static SortedDictionary<char,int> Counter(string word)
            {
                SortedDictionary<char,int> count = new SortedDictionary<char,int>();
                foreach (char c in word)
                {
                    if (!count.ContainsKey(c))
                    {
                        count.Add(c, 0);
                    }
                }
                return count;
            }
    
            //Computes n! recursively; returns long because int overflows past 12!
            static long Factorial(int n)
            {
                if (n <= 1)
                {
                    return 1;
                }
                else
                {
                    return n * Factorial(n - 1);
                }
            }
            //Ranks the word input if there are no repeated characters
            //in the word
            static Int64 rankWord(char[] wordArray)
            {
                int n = wordArray.Length; 
                Int64 rank = 1; 
                //loops through the array of letters
                for (int i = 0; i < n-1; i++) 
                { 
                    int x=0; 
                //loops all letters after i and compares them for factorial calculation
                    for (int j = i+1; j<n ; j++) 
                    { 
                        if (wordArray[i] > wordArray[j]) 
                        {
                            x++;
                        }
                    }
                    rank = rank + x * (Factorial(n - i - 1)); 
                 }
                return rank;
            }
    
            //Ranks the word input if there are repeated characters
            //in the word
            static Int64 rankPerm(String word)
            {
                Int64 rank = 1;
                Int64 suffixPermCount = 1;
                SortedDictionary<char, int> counter = Counter(word);
                for (int i = word.Length - 1; i > -1; i--)
                {
                    char x = word[i];
                    // every letter starts at 0 in counter, so this is x's count in the suffix
                    int xCount = counter[x] + 1;
                    counter[x] = xCount;
                    foreach (KeyValuePair<char,int> e in counter)
                    {
                        if (e.Key < x)
                        {
                            rank += suffixPermCount * e.Value / xCount;
                        }
                    }

                    suffixPermCount *= word.Length - i;
                    suffixPermCount /= xCount;
                }
                return rank;
            }
    
    
    
    
            static void Main(string[] args)
            {
               Console.WriteLine("Type Exit to end the program.");
               string prompt = "Please enter a word using only letters:";
               // caution: results can overflow Int64 for long words (21! already exceeds Int64.MaxValue)
               const int MAX_VALUE = 25;
               Int64 rank;
               string theWord;
               do
               {
                   theWord = readWord(prompt, MAX_VALUE);
                   char[] wordLetters = theWord.ToCharArray();
                   Array.Sort(wordLetters);
                   // after sorting, any repeated letter appears as two equal neighbours
                   bool duplicate = false;
                   for (int i = 0; i < wordLetters.Length - 1; i++)
                   {
                       if (wordLetters[i] == wordLetters[i + 1])
                       {
                           duplicate = true;
                       }
                   }
                   if (duplicate)
                   {
                       rank = rankPerm(theWord);
                   }
                   else
                   {
                       rank = rankWord(theWord.ToCharArray());
                   }
                   Console.WriteLine("\n" + theWord + " = " + rank);
               } while (theWord != "EXIT");
               } while (theWord != "EXIT");
    
                Console.WriteLine("\nPress enter to escape..");
                Console.Read();
            }
        }
    }
    

    【Comments】:

      【Solution 5】:

      If there are k distinct characters and the i-th one is repeated n_i times, the total number of distinct permutations is

      $$\frac{(n_1 + n_2 + \cdots + n_k)!}{n_1!\, n_2! \cdots n_k!}$$

      This is the multinomial coefficient.
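      For example, BOOKKEEPER has k = 6 distinct letters with counts B = 1, E = 3, K = 2, O = 2, P = 1, R = 1, so the number of its distinct permutations is

      $$\frac{10!}{1!\,3!\,2!\,2!\,1!\,1!} = \frac{3628800}{24} = 151200$$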
      We can use it to compute the rank of a given permutation as follows.

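      Consider the first (leftmost) character, and say it is the r-th distinct character in sorted order.

      If you replace the first character with any of the 1st, 2nd, ..., (r-1)-th distinct characters and consider all arrangements of the remaining characters, every one of those permutations comes before the given permutation. Their total can be computed with the formula above.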

      Once the count for the first character is done, fix the first character and repeat the same step for the second character, and so on.
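      The procedure above can be sketched in Python with math.factorial (the helper names multinomial and rank_multinomial are hypothetical). It recomputes the multinomial coefficient from scratch for every candidate letter, so it is slower than the incremental suffixperms approach of Solution 1, but it follows the description step by step. Letters whose remaining count has dropped to zero must be skipped.

```python
from collections import Counter
from math import factorial

def multinomial(counts):
    """(n_1 + ... + n_k)! / (n_1! ... n_k!) for a Counter of letter counts."""
    result = factorial(sum(counts.values()))
    for c in counts.values():
        result //= factorial(c)
    return result

def rank_multinomial(word):
    """1-based rank of word, counting earlier permutations position by position."""
    remaining = Counter(word)  # letters not yet fixed
    rank = 1
    for c in word:
        for y in sorted(remaining):
            if y >= c:
                break  # only letters that sort before c start earlier permutations
            if remaining[y] == 0:
                continue  # this letter is already used up
            remaining[y] -= 1  # put y at this position...
            rank += multinomial(remaining)  # ...and arrange everything else freely
            remaining[y] += 1
        remaining[c] -= 1  # fix the actual letter and move to the next position
    return rank

print(rank_multinomial("BOOKKEEPER"))  # 10743
```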

      Here is a C++ implementation for your problem:

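      #include <cctype>
      #include <iostream>
      #include <string>
      
      using namespace std;
      
      // n! in 64-bit arithmetic (overflows for n > 20)
      long long fact(int f) {
          long long r = 1;
          for (int k = 2; k <= f; k++) r *= k;
          return r;
      }
      // 1-based rank of s among the distinct permutations of its letters
      long long solve(string s, int n) {
          long long ans = 1;
          int arr[26] = {0};       // letter counts of the not-yet-fixed part
          int len = n - 1;         // number of positions after the current one
          for (int i = 0; i < n; i++) {
              s[i] = toupper(static_cast<unsigned char>(s[i]));
              arr[s[i] - 'A']++;
          }
          for (int i = 0; i < n; i++) {
              long long temp = 0;  // remaining letters (with multiplicity) smaller than s[i]
              long long x = 1;     // product of the factorials of the remaining counts
              char c = s[i];
              for (int j = 0; j < c - 'A'; j++) temp += arr[j];
              for (int j = 0; j < 26; j++) x *= fact(arr[j]);
              arr[c - 'A']--;
              // multiply before dividing: len!/x alone can truncate (e.g. for "BAAB")
              ans += temp * fact(len) / x;
              len--;
          }
          return ans;
      }
      int main() {
          string s;
          cin >> s;
          cout << solve(s, s.size()) << endl;
          return 0;
      }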
      

      【Comments】:

        【Solution 6】:

        Java version of unranking a string:

        // Needs: import java.util.ArrayList; import java.util.Collections;
        //        import java.util.HashMap; import java.util.List; import java.util.Map;
        public static String unrankperm(String letters, long rank) {
            Map<Character, Integer> charCounts = new HashMap<>();
            long permcount = 1;  // long delays overflow compared with int
            for (int i = 0; i < letters.length(); i++) {
                char x = letters.charAt(i);
                int xCount = charCounts.containsKey(x) ? charCounts.get(x) + 1 : 1;
                charCounts.put(x, xCount);

                permcount = (permcount * (i + 1)) / xCount;
            }
            // charCounts is the histogram of letters
            // permcount is the number of distinct perms of letters
            StringBuilder perm = new StringBuilder();

            for (int i = 0; i < letters.length(); i++) {
                List<Character> sorted = new ArrayList<>(charCounts.keySet());
                Collections.sort(sorted);

                for (Character x : sorted) {
                    // suffixcount is the number of distinct perms that begin with x
                    int frequency = charCounts.get(x);
                    long suffixcount = permcount * frequency / (letters.length() - i);

                    if (rank <= suffixcount) {
                        perm.append(x);
                        permcount = suffixcount;
                        if (frequency == 1) {
                            charCounts.remove(x);
                        } else {
                            charCounts.put(x, frequency - 1);
                        }
                        break;
                    }
                    rank -= suffixcount;
                }
            }
            return perm.toString();
        }
        

        See also n-th-permutation-algorithm-for-use-in-brute-force-bin-packaging-parallelization

        【Comments】:
