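Finding the ranking of a word (permutation) with duplicate letters: answers

【Question title】: Finding the ranking of a word (permutations) with duplicate letters
【Posted】: 2014-05-03 17:49:38
【Question】:

I'm posting this even though much has already been posted about this problem. I didn't want to post it as an answer, because it doesn't work. The answer to this post (Finding the rank of the Given string in list of all possible permutations with Duplicates) did not work for me.

So I tried this (a compilation of code I've borrowed plus my own attempt to deal with repetitions). The non-repeating cases work fine. BOOKKEEPER produces 83863 instead of the desired 10743.

(The factorial function and the letter-counter array "repeats" work correctly. I didn't post them, to save space.)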

while (pointer != length)
{
    if (sortedWordChars[pointer] != wordArray[pointer])
    {
        // Swap the current character with the one after that
        char temp = sortedWordChars[pointer];
        sortedWordChars[pointer] = sortedWordChars[next];
        sortedWordChars[next] = temp;
        next++;

        //For each position check how many characters left have duplicates, 
        //and use the logic that if you need to permute n things and if 'a' things 
        //are similar the number of permutations is n!/a!


        int ct = repeats[(sortedWordChars[pointer]-64)];
        // Increment the rank
        if (ct>1) { //repeats?
            System.out.println("repeating " + (sortedWordChars[pointer]-64));
            //In case of repetition of any character use: (n-1)!/(times)!
            //e.g. if there is 1 character which is repeating twice,
            //x* (n-1)!/2!                      
                int dividend = getFactorialIter(length - pointer - 1);
                int divisor = getFactorialIter(ct);
                int quo = dividend/divisor;
                rank += quo;
        } else {
            rank += getFactorialIter(length - pointer - 1);                 
        }                       
    } else
    {
        pointer++;
        next = pointer + 1;
    }
}

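【Question comments】:

I take it you want lexicographic order?
Yes, David. For example, QUESTION=24572 (which works in my code because there are no dupes). Thanks for the response.

Tags: string algorithm permutation

【Solution 1】:

Note: this answer is for 1-based rankings, as specified implicitly by the examples. Here's some Python that works at least for the two examples provided. The key fact is that suffixperms * ctr[y] // ctr[x] is the number of permutations, with first letter y, of the length-(i + 1) suffix of perm.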

from collections import Counter

def rankperm(perm):
    rank = 1
    suffixperms = 1
    ctr = Counter()
    for i in range(len(perm)):
        x = perm[((len(perm) - 1) - i)]
        ctr[x] += 1
        for y in ctr:
            if (y < x):
                rank += ((suffixperms * ctr[y]) // ctr[x])
        suffixperms = ((suffixperms * (i + 1)) // ctr[x])
    return rank
print(rankperm('QUESTION'))
print(rankperm('BOOKKEEPER'))

Java version:

public static long rankPerm(String perm) {
    long rank = 1;
    long suffixPermCount = 1;
    java.util.Map<Character, Integer> charCounts =
        new java.util.HashMap<Character, Integer>();
    for (int i = perm.length() - 1; i > -1; i--) {
        char x = perm.charAt(i);
        int xCount = charCounts.containsKey(x) ? charCounts.get(x) + 1 : 1;
        charCounts.put(x, xCount);
        for (java.util.Map.Entry<Character, Integer> e : charCounts.entrySet()) {
            if (e.getKey() < x) {
                rank += suffixPermCount * e.getValue() / xCount;
            }
        }
        suffixPermCount *= perm.length() - i;
        suffixPermCount /= xCount;
    }
    return rank;
}

Unranking permutations:

from collections import Counter

def unrankperm(letters, rank):
    ctr = Counter()
    permcount = 1
    for i in range(len(letters)):
        x = letters[i]
        ctr[x] += 1
        permcount = (permcount * (i + 1)) // ctr[x]
    # ctr is the histogram of letters
    # permcount is the number of distinct perms of letters
    perm = []
    for i in range(len(letters)):
        for x in sorted(ctr.keys()):
            # suffixcount is the number of distinct perms that begin with x
            suffixcount = permcount * ctr[x] // (len(letters) - i)
            if rank <= suffixcount:
                perm.append(x)
                permcount = suffixcount
                ctr[x] -= 1
                if ctr[x] == 0:
                    del ctr[x]
                break
            rank -= suffixcount
    return ''.join(perm)
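
For short words, both directions can be cross-checked against a brute-force enumeration. The sketch below is not part of the original answer: `distinct_perms`, `brute_rank`, and `brute_unrank` are hypothetical helper names, and the approach is only practical for small inputs because it materializes every distinct permutation.

```python
from itertools import permutations

def distinct_perms(letters):
    """All distinct permutations of letters, in lexicographic order."""
    return sorted(set(permutations(letters)))

def brute_rank(word):
    """1-based rank of word among the distinct permutations of its letters."""
    return distinct_perms(word).index(tuple(word)) + 1

def brute_unrank(letters, rank):
    """Inverse of brute_rank: the permutation with the given 1-based rank."""
    return ''.join(distinct_perms(letters)[rank - 1])

print(brute_rank('QUESTION'))    # 24572
print(brute_unrank('AABB', 6))   # BBAA
```

Comparing these results with the faster functions on a handful of short words is a cheap way to catch off-by-one mistakes when porting the code to another language.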

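【Comments】:

Thanks for the quick reply, David! Let me find a Python hat (I don't know Python) and make sense of this elegant-looking code. I'll post an update. Thanks again, Max
@MaxTomlinson It shouldn't be too hard to transliterate into the language of your choice. The loop i in range(len(perm)) steps i from 0 to len(perm) - 1 in steps of 1. The operator // is truncating division. perm is indexed from 0. The variable ctr is a map from the letters of the permutation to their frequencies, where every letter implicitly starts at frequency zero.
What threw me off a bit was where the for loops end (implied braces), so the outer for loop extends all the way down to return rank. The index into the string perm actually runs from back to front (right)? Each iteration bumps the counter, and the "for y" loop runs on every iteration, which is a kind of on-the-fly factorial?
@David Eisenstat, very cool solution! But because of int and long overflow, your code can't handle very large strings such as "adsfadfjkzcvzadfadfasdfqq". I know that one solution to this problem is to use the modular multiplicative inverse, but I don't understand how it should be applied to the rank-finding problem. Maybe you have some ideas for handling large strings, or know how to use the modular multiplicative inverse in this case?
@David Eisenstat, thanks for your reply. Do you mean replacing xCount in the line rank += suffixPermCount * e.getValue() / xCount;, in suffixPermCount /= xCount;, or in both? More importantly, I don't get which variables to use when computing the modular multiplicative inverse (what value should I use for the "modulus", and what are "a" and "b"). I've looked at some of the math on this topic, but I don't understand how it should be used in the context of the rank-finding problem (with large input strings).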
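
One possible reading of the modular-inverse idea raised in these comments, offered as a sketch rather than as the answerer's own method: every division by ctr[x] in the ranking loop is exact, so when the rank is only needed modulo a prime, each division can be replaced by a multiplication with the inverse of ctr[x]. The function name `rankperm_mod` and the modulus 1,000,000,007 are assumptions for illustration; the thread does not specify a modulus. Python integers do not overflow, so this form matters mainly when porting to fixed-width types.

```python
from collections import Counter

MOD = 1_000_000_007  # assumed prime modulus; not specified in the thread

def rankperm_mod(perm, mod=MOD):
    """1-based rank of perm modulo a prime, with inverses replacing division."""
    rank = 1
    suffixperms = 1
    ctr = Counter()
    for i, x in enumerate(reversed(perm)):
        ctr[x] += 1
        # Fermat's little theorem: a^(mod-2) is the inverse of a modulo a prime.
        # This is valid here because 0 < ctr[x] <= len(perm) < mod.
        inv = pow(ctr[x], mod - 2, mod)
        for y in ctr:
            if y < x:
                rank = (rank + suffixperms * ctr[y] * inv) % mod
        suffixperms = suffixperms * (i + 1) * inv % mod
    return rank

print(rankperm_mod('BOOKKEEPER'))  # 10743
```

In the terms of the question above, the "modulus" is the prime, and the value being inverted is the letter count that the exact version divides by. The result is the rank reduced modulo the prime, not the full rank; recovering the exact rank of a very long string needs arbitrary-precision integers instead.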

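【Solution 2】:

If we use mathematics, the complexity comes down and the rank can be found more quickly. This is particularly helpful for large strings. (More details can be found here)

I suggest implementing the approach shown here programmatically (screenshot attached below).

【Comments】:

Cool, but how do you retrieve the word from its numeric rank?

【Solution 3】:

I'd say David's post (the accepted answer) is very cool. However, I'd like to improve it further for speed. The inner loop looks for inversion pairs, and each such inversion contributes to the increase in rank. If we use an ordered map structure (a binary search tree, or BST) there, we can simply do an in-order traversal from the first (bottom-left) node until we reach the current character, instead of traversing the whole map. In C++, std::map is a good fit because it keeps its keys in sorted order. The following code reduces the number of iterations in the inner loop and removes the if check.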

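#include <map>
#include <string>
using namespace std;

long long rankofword(string s)
{
    long long rank = 1;
    long long suffixPermCount = 1;  // distinct permutations of the suffix processed so far
    map<char, int> m;               // ordered histogram of the letters in that suffix
    int size = s.size();
    for (int i = size - 1; i > -1; i--)
    {
        char x = s[i];
        m[x]++;
        // In an ordered map, every key before x is a letter smaller than x.
        for (auto it = m.begin(); it != m.find(x); it++)
                rank += suffixPermCount * it->second / m[x];

        suffixPermCount *= (size - i);
        suffixPermCount /= m[x];
    }
    return rank;
}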
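
The same "visit only the smaller keys" idea can be sketched in Python. Python has no built-in BST, so this sketch swaps in a sorted list of distinct letters maintained with the standard `bisect` module; that substitution, and the name `rank_ordered`, are assumptions for illustration and not part of the answer above.

```python
from bisect import bisect_left, insort

def rank_ordered(perm):
    """1-based rank of perm, iterating only over letters smaller than the current one."""
    rank = 1
    suffix_perms = 1
    counts = {}
    keys = []  # distinct letters seen so far, kept in sorted order
    for i, x in enumerate(reversed(perm)):
        if x not in counts:
            counts[x] = 0
            insort(keys, x)
        counts[x] += 1
        # keys[:bisect_left(keys, x)] holds exactly the letters smaller than x.
        for y in keys[:bisect_left(keys, x)]:
            rank += suffix_perms * counts[y] // counts[x]
        suffix_perms = suffix_perms * (i + 1) // counts[x]
    return rank

print(rank_ordered('BOOKKEEPER'))  # 10743
```

With an alphabet of at most 26 letters the saving is a small constant factor either way; the ordered structure mainly removes the comparison from the inner loop.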

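【Comments】:

【Solution 4】:

@David Eisenstat, this was really helpful. It took me a while to figure out what you were doing, since I'm still learning my first language (C#). I translated it into C# and thought I'd share that solution too, because this listing helped me so much!

Thanks!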

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Text.RegularExpressions;

namespace CsharpVersion
{
    class Program
    {
        //Takes in the word and checks to make sure that the word
        //is between 1 and 25 characters inclusive and only
        //letters are used
        static string readWord(string prompt, int high)
        {
            Regex rgx = new Regex("^[a-zA-Z]+$");
            string word;
            string result;
            do
            {
                Console.WriteLine(prompt);
                word = Console.ReadLine();
            } while (word == "" | word.Length > high | rgx.IsMatch(word) == false);
            result = word.ToUpper();
            return result;
        }

        //Creates a sorted dictionary containing distinct letters 
        //initialized with 0 frequency
        static SortedDictionary<char,int> Counter(string word)
        {
            char[] wordArray = word.ToCharArray();
            int len = word.Length;
            SortedDictionary<char,int> count = new SortedDictionary<char,int>();
           foreach(char c in word)
           {
               if(count.ContainsKey(c))
               {
               }
               else
               {
                   count.Add(c, 0);
               }

           }
           return count;
        }

        //Creates a factorial function
        //(returns Int64 because an int overflows at 13!; Int64 holds up to 20!)
        static Int64 Factorial(int n)
        {
            if (n <= 1)
            {
                return 1;
            }
            else
            {
                return n * Factorial(n - 1);
            }
        }
        //Ranks the word input if there are no repeated characters 
        //in the word
        static Int64 rankWord(char[] wordArray)
        {
            int n = wordArray.Length; 
            Int64 rank = 1; 
            //loops through the array of letters
            for (int i = 0; i < n-1; i++) 
            { 
                int x=0; 
            //loops all letters after i and compares them for factorial calculation
                for (int j = i+1; j<n ; j++) 
                { 
                    if (wordArray[i] > wordArray[j]) 
                    {
                        x++;
                    }
                }
                rank = rank + x * (Factorial(n - i - 1)); 
             }
            return rank;
        }

        //Ranks the word input if there are repeated characters
        //in the word
        static Int64 rankPerm(String word) 
        {
        Int64 rank = 1;
        Int64 suffixPermCount = 1;
        SortedDictionary<char, int> counter = Counter(word);
        for (int i = word.Length - 1; i > -1; i--) 
        {
            char x = Convert.ToChar(word.Substring(i,1));
            int xCount;
            if(counter[x] != 0) 
            {
                xCount = counter[x] + 1; 
            }
            else
            {
               xCount = 1;
            }
            counter[x] = xCount;
            foreach (KeyValuePair<char,int> e in counter)
            {
                if (e.Key < x)
                {
                    rank += suffixPermCount * e.Value / xCount;
                }
            }

            suffixPermCount *= word.Length - i;
            suffixPermCount /= xCount;
        }
        return rank;
        }




        static void Main(string[] args)
        {
           Console.WriteLine("Type Exit to end the program.");
           string prompt = "Please enter a word using only letters:";
           const int MAX_VALUE = 25;
           Int64 rank = new Int64();
           string theWord;
           do
           {
               theWord = readWord(prompt, MAX_VALUE);
               char[] wordLetters = theWord.ToCharArray();
               Array.Sort(wordLetters);
               bool duplicate = false;
               for(int i = 0; i< theWord.Length - 1; i++)
               {
                 //After sorting, repeated letters sit next to each other
                 if(wordLetters[i] == wordLetters[i+1])
                 {
                     duplicate = true;
                 }
               }
               if(duplicate)
               {
               SortedDictionary<char, int> counter = Counter(theWord);
               rank = rankPerm(theWord);
               Console.WriteLine("\n" + theWord + " = " + rank);
               }
               else
               {
               char[] letters = theWord.ToCharArray();
               rank = rankWord(letters);
               Console.WriteLine("\n" + theWord + " = " + rank);
               }
           } while (theWord != "EXIT");

            Console.WriteLine("\nPress enter to escape..");
            Console.Read();
        }
    }
}

【Comments】:

【Solution 5】:

If there are k distinct characters and the i^th character is repeated n_i times, then the total number of permutations is given by

            (n_1 + n_2 + ..+ n_k)!
------------------------------------------------ 
              n_1! n_2! ... n_k!

This is the multinomial coefficient.
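
As a quick numeric check of the formula, here is a minimal Python sketch; the function name `count_distinct_perms` is hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import factorial

def count_distinct_perms(word):
    """(n_1 + ... + n_k)! / (n_1! ... n_k!) for the letter counts of word."""
    total = factorial(len(word))
    for n_i in Counter(word).values():
        total //= factorial(n_i)  # each step divides exactly
    return total

print(count_distinct_perms('BOOKKEEPER'))  # 151200
```

BOOKKEEPER has 10 letters with E three times and O and K twice each, so the count is 10! / (3! 2! 2!) = 151200.
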
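Now we can use it to compute the rank of a given permutation, as follows:

Consider the first (leftmost) character. Say it is the r^th one in the sorted order of the characters.

Now, if you replace the first character with any of the 1st, 2nd, 3rd, .., (r-1)^th characters and consider all possible permutations, each of those permutations precedes the given permutation. Their total number can be computed with the formula above.

After computing the number for the first character, fix the first character and repeat the same procedure for the second character, and so on.

Here is a C++ implementation for your question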

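#include <cctype>
#include <iostream>
#include <string>

using namespace std;

// Factorial; long long holds values up to 20!.
long long fact(int f) {
    if (f <= 1) return 1;
    return f * fact(f - 1);
}

// 1-based lexicographic rank of s among the distinct permutations of its letters.
// Assumes s contains only ASCII letters (compared case-insensitively).
long long solve(string s, int n) {
    long long ans = 1;
    int arr[26] = {0};  // arr[j] = remaining count of letter 'A' + j
    int len = n - 1;    // number of positions to the right of position i
    for (int i = 0; i < n; i++) {
        s[i] = toupper(static_cast<unsigned char>(s[i]));
        arr[s[i] - 'A']++;
    }
    for (int i = 0; i < n; i++) {
        long long temp = 0;  // remaining letters smaller than s[i]
        long long x = 1;     // product of the factorials of the remaining counts
        char c = s[i];
        for (int j = 0; j < c - 'A'; j++) temp += arr[j];
        for (int j = 0; j < 26; j++) x = x * fact(arr[j]);
        arr[c - 'A']--;
        // Multiply before dividing: fact(len) / x alone is not always an integer.
        ans = ans + (temp * fact(len)) / x;
        len--;
    }
    return ans;
}

int main() {
    string s;
    cin >> s;
    cout << solve(s, s.size());
    return 0;
}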
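
The left-to-right procedure described in this answer can also be traced position by position. The following is a minimal Python sketch under the same 1-based convention; `rank_contributions` is a hypothetical name, and Python's arbitrary-precision integers sidestep the overflow that fixed-width factorials run into.

```python
from collections import Counter
from math import factorial

def rank_contributions(word):
    """Per-position counts of permutations that precede word, scanning left to right."""
    counts = Counter(word)
    contributions = []
    for i, c in enumerate(word):
        remaining = len(word) - i - 1   # positions to the right of i
        denom = 1
        for n in counts.values():
            denom *= factorial(n)       # product of n_j! over the letters still available
        smaller = sum(n for y, n in counts.items() if y < c)
        # Multiply before dividing so that the division is exact.
        contributions.append(smaller * factorial(remaining) // denom)
        counts[c] -= 1
    return contributions

steps = rank_contributions('BOOKKEEPER')
print(steps)           # [0, 8400, 2100, 180, 60, 0, 0, 2, 0, 0]
print(1 + sum(steps))  # 10743
```

Each entry is the number of permutations that agree with the word on all earlier positions and have a smaller letter at that position; adding 1 to their sum gives the 1-based rank.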

【Comments】:

【Solution 6】:

Java version of unrank for a String:

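// Imports needed by the enclosing source file:
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public static String unrankperm(String letters, int rank) {
    Map<Character, Integer> charCounts = new HashMap<>();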
    int permcount = 1;
    for(int i = 0; i < letters.length(); i++) {
        char x = letters.charAt(i);
        int xCount = charCounts.containsKey(x) ? charCounts.get(x) + 1 : 1;
        charCounts.put(x, xCount);

        permcount = (permcount * (i + 1)) / xCount;
    }
    // charCounts is the histogram of letters
    // permcount is the number of distinct perms of letters
    StringBuilder perm = new StringBuilder();

    for(int i = 0; i < letters.length(); i++) {
        List<Character> sorted = new ArrayList<>(charCounts.keySet());
        Collections.sort(sorted);

        for(Character x : sorted) {
            // suffixcount is the number of distinct perms that begin with x
            Integer frequency = charCounts.get(x);
            int suffixcount = permcount * frequency / (letters.length() - i); 

            if (rank <= suffixcount) {
                perm.append(x);

                permcount = suffixcount;

                if(frequency == 1) {
                    charCounts.remove(x);
                } else {
                    charCounts.put(x, frequency - 1);
                }
                break;
            }
            rank -= suffixcount;
        }
    }
    return perm.toString();
}

See also n-th-permutation-algorithm-for-use-in-brute-force-bin-packaging-parallelization.

【Comments】: