Logging the split string shows where the problem is:
يَا
أَيُّهَا
الَّذِينَ
آمَنُوا
لَا
تَقْرَبُوا
الصَّلَاةَ
وَأَنْتُمْ
سُكَارَىٰ
حَتَّىٰ
تَعْلَمُوا
مَا
تَقُولُونَ
وَلَا
جُنُبًا
إِلَّا
عَابِرِي
سَبِيلٍ
حَتَّىٰ
تَغْتَسِلُوا
ۚ >>>>>>>>>>>>>>>>>>>>> Problem here
وَإِنْ
كُنْتُمْ
مَرْضَىٰ
أَوْ
عَلَىٰ
سَفَرٍ
أَوْ
جَاءَ
أَحَدٌ
مِنْكُمْ
مِنَ
الْغَائِطِ
أَوْ
لَامَسْتُمُ
النِّسَاءَ
فَلَمْ
تَجِدُوا
مَاءً
فَتَيَمَّمُوا
صَعِيدًا
طَيِّبًا
فَامْسَحُوا
بِوُجُوهِكُمْ
وَأَيْدِيكُمْ
ۗ >>>>>>>>>>>>>>>>>>>>> Problem here
إِنَّ
اللَّهَ
كَانَ
عَفُوًّا
غَفُورًا
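So the problem evidently lies in the high diacritics (or, more accurately, the annotation marks) such as ۚ and ۗ: they are separated by spaces just like words, but they are not words.
I believe the Kotlin result is more accurate than the Swift one, because what you asked for is:
Split this string using the space as the delimiter, period.
What Swift tends to do is not recognize these diacritics/marks as separate items: it does not take them into account and does not count them when splitting the string. A likely explanation is that Swift splits by Character (grapheme cluster), and a combining mark is clustered with the space before it, so that space never matches the separator. There may be another Swift function that detects them, but I am not sure, and that is not part of your question.
Because you have two such marks, the Kotlin count is higher than the Swift count by two (i.e. 51 instead of 49).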
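To make the counting difference concrete, here is a minimal Kotlin sketch. The helper name `splitBySpace` and the two-word sample are my own, chosen only for illustration; the sample assumes the mark is surrounded by single spaces, as in the logged output above.

```kotlin
// Hypothetical helper: plain split on a single space, dropping blank tokens.
fun splitBySpace(text: String): List<String> =
    text.split(" ").filter { it.isNotBlank() }

// Two words with ARABIC SMALL HIGH JEEM (U+06DA) standing alone between them.
val sample = "تَغْتَسِلُوا \u06DA وَإِنْ"

// The mark is surrounded by spaces, so Kotlin returns it as a token of its own:
// splitBySpace(sample) has 3 elements, and the middle one is just "\u06DA".
val sampleTokens = splitBySpace(sample)
```

Each standalone mark therefore adds one to the Kotlin count, which matches the difference of two for the two marks in your string.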
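So the question becomes: how do you remove these diacritics/marks from the string before splitting?
Thanks to this answer, which lists these kinds of marks. In Kotlin you can replace them with an empty string using the String replace() method.
Here is a snippet that fixes your example: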
var str = getString(R.string.valueHere)
str = str
.replace("\u0615", "") //ARABIC SMALL HIGH TAH
.replace("\u0616", "") //ARABIC SMALL HIGH LIGATURE ALEF WITH LAM WITH YEH
.replace("\u0617", "") //ARABIC SMALL HIGH ZAIN
.replace("\u0618", "") //ARABIC SMALL FATHA
.replace("\u0619", "") //ARABIC SMALL DAMMA
.replace("\u061A", "") //ARABIC SMALL KASRA
.replace("\u06D6", "") //ARABIC SMALL HIGH LIGATURE SAD WITH LAM WITH ALEF MAKSURA
.replace("\u06D7", "") //ARABIC SMALL HIGH LIGATURE QAF WITH LAM WITH ALEF MAKSURA
.replace("\u06D8", "") //ARABIC SMALL HIGH MEEM INITIAL FORM
.replace("\u06D9", "") //ARABIC SMALL HIGH LAM ALEF
.replace("\u06DA", "") //ARABIC SMALL HIGH JEEM
.replace("\u06DB", "") //ARABIC SMALL HIGH THREE DOTS
.replace("\u06DC", "") //ARABIC SMALL HIGH SEEN
.replace("\u06DD", "") //ARABIC END OF AYAH
.replace("\u06DE", "") //ARABIC START OF RUB EL HIZB
.replace("\u06DF", "") //ARABIC SMALL HIGH ROUNDED ZERO
.replace("\u06E0", "") //ARABIC SMALL HIGH UPRIGHT RECTANGULAR ZERO
.replace("\u06E1", "") //ARABIC SMALL HIGH DOTLESS HEAD OF KHAH
.replace("\u06E2", "") //ARABIC SMALL HIGH MEEM ISOLATED FORM
.replace("\u06E3", "") //ARABIC SMALL LOW SEEN
.replace("\u06E4", "") //ARABIC SMALL HIGH MADDA
.replace("\u06E5", "") //ARABIC SMALL WAW
.replace("\u06E6", "") //ARABIC SMALL YEH
.replace("\u06E7", "") //ARABIC SMALL HIGH YEH
.replace("\u06E8", "") //ARABIC SMALL HIGH NOON
.replace("\u06E9", "") //ARABIC PLACE OF SAJDAH
.replace("\u06EA", "") //ARABIC EMPTY CENTRE LOW STOP
.replace("\u06EB", "") //ARABIC EMPTY CENTRE HIGH STOP
.replace("\u06EC", "") //ARABIC ROUNDED HIGH STOP WITH FILLED CENTRE
.replace("\u06ED", "") //ARABIC SMALL LOW MEEM
// Each removed mark leaves an empty token behind, so count only the non-blank ones
val count = str.split(" ").count { it.isNotBlank() }
Log.d("count is ", "$count")
This is the test verification result on the Kotlin compiler.
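As a more compact alternative to the chain of replace() calls, the same 30 code points form two contiguous ranges (U+0615 to U+061A and U+06D6 to U+06ED), so a single character class can cover them. This is only a sketch; the names `annotationMarks`, `stripAnnotationMarks` and `countWords` are hypothetical and not part of any library.

```kotlin
// Same code points as the replace() chain: U+0615..U+061A and U+06D6..U+06ED.
val annotationMarks = Regex("[\\u0615-\\u061A\\u06D6-\\u06ED]")

// Removes every annotation mark, leaving the surrounding spaces untouched.
fun stripAnnotationMarks(text: String): String =
    text.replace(annotationMarks, "")

// Counts the non-blank, space-separated tokens once the marks are gone.
fun countWords(text: String): Int =
    stripAnnotationMarks(text).split(" ").count { it.isNotBlank() }
```

Calling `countWords(str)` on the string from the question should then give the same count as the snippet above, since both remove the same characters before splitting.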
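UPDATE:
I have a long string, and I need to color ranges within it with different colors inside a TextView. So I split it by spaces, get the required words by their start and end word indexes, then join them into one string in order to color its range within the long string. The answer above does give 49, but it removes the important characters listed in the replacements, so could you try adjusting your code to take this into account?
So, if you follow the approach above, you only need to drop the blanks from the split string. For this you can use filter {} after replacing all the marks with an empty string: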
fun getColorRange(input: String, wordFrom: Int, wordTo: Int): Range<Int> {
val text = input
.replace("\u0615", "") //ARABIC SMALL HIGH TAH
.replace("\u0616", "") //ARABIC SMALL HIGH LIGATURE ALEF WITH LAM WITH YEH
.replace("\u0617", "") //ARABIC SMALL HIGH ZAIN
.replace("\u0618", "") //ARABIC SMALL FATHA
.replace("\u0619", "") //ARABIC SMALL DAMMA
.replace("\u061A", "") //ARABIC SMALL KASRA
.replace("\u06D6", "") //ARABIC SMALL HIGH LIGATURE SAD WITH LAM WITH ALEF MAKSURA
.replace("\u06D7", "") //ARABIC SMALL HIGH LIGATURE QAF WITH LAM WITH ALEF MAKSURA
.replace("\u06D8", "") //ARABIC SMALL HIGH MEEM INITIAL FORM
.replace("\u06D9", "") //ARABIC SMALL HIGH LAM ALEF
.replace("\u06DA", "") //ARABIC SMALL HIGH JEEM
.replace("\u06DB", "") //ARABIC SMALL HIGH THREE DOTS
.replace("\u06DC", "") //ARABIC SMALL HIGH SEEN
.replace("\u06DD", "") //ARABIC END OF AYAH
.replace("\u06DE", "") //ARABIC START OF RUB EL HIZB
.replace("\u06DF", "") //ARABIC SMALL HIGH ROUNDED ZERO
.replace("\u06E0", "") //ARABIC SMALL HIGH UPRIGHT RECTANGULAR ZERO
.replace("\u06E1", "") //ARABIC SMALL HIGH DOTLESS HEAD OF KHAH
.replace("\u06E2", "") //ARABIC SMALL HIGH MEEM ISOLATED FORM
.replace("\u06E3", "") //ARABIC SMALL LOW SEEN
.replace("\u06E4", "") //ARABIC SMALL HIGH MADDA
.replace("\u06E5", "") //ARABIC SMALL WAW
.replace("\u06E6", "") //ARABIC SMALL YEH
.replace("\u06E7", "") //ARABIC SMALL HIGH YEH
.replace("\u06E8", "") //ARABIC SMALL HIGH NOON
.replace("\u06E9", "") //ARABIC PLACE OF SAJDAH
.replace("\u06EA", "") //ARABIC EMPTY CENTRE LOW STOP
.replace("\u06EB", "") //ARABIC EMPTY CENTRE HIGH STOP
.replace("\u06EC", "") //ARABIC ROUNDED HIGH STOP WITH FILLED CENTRE
.replace("\u06ED", "") //ARABIC SMALL LOW MEEM
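val all = text.split(" ").filter { it.isNotBlank() } // Remove the blanks left behind by the removed marks
val sub = (wordFrom..wordTo).joinToString(" ") { all[it] }
Log.d("LOG_TAG", "getColorRange: $sub")
// Walk through text word by word to find character offsets.
// Note: the offsets refer to text (marks removed), not to input.
var cursor = 0
var start = 0
for (i in 0..wordTo) {
    val at = text.indexOf(all[i], cursor)
    if (i == wordFrom) start = at
    cursor = at + all[i].length
}
return Range<Int>(start, cursor)
}
Example usage:
getColorRange(str, 18, 22)
// Output:
// حَتَّىٰ تَغْتَسِلُوا وَإِنْ كُنْتُمْ مَرْضَىٰ
getColorRange(str, 0, 48) // Should return the entire string as this is the total number of words
// Output:
// يَا أَيُّهَا الَّذِينَ آمَنُوا لَا تَقْرَبُوا الصَّلَاةَ وَأَنْتُمْ سُكَارَىٰ حَتَّىٰ تَعْلَمُوا مَا تَقُولُونَ وَلَا جُنُبًا إِلَّا عَابِرِي سَبِيلٍ حَتَّىٰ تَغْتَسِلُوا وَإِنْ كُنْتُمْ مَرْضَىٰ أَوْ عَلَىٰ سَفَرٍ أَوْ جَاءَ أَحَدٌ مِنْكُمْ مِنَ الْغَائِطِ أَوْ لَامَسْتُمُ النِّسَاءَ فَلَمْ تَجِدُوا مَاءً فَتَيَمَّمُوا صَعِيدًا طَيِّبًا فَامْسَحُوا بِوُجُوهِكُمْ وَأَيْدِيكُمْ إِنَّ اللَّهَ كَانَ عَفُوًّا غَفُورًا
Also note that there is a pitfall in how range is computed. Searching for the whole joined string is unreliable, because each removed mark leaves two consecutive spaces in text while sub is joined with single spaces, so the following can return -1:
val range = text.indexOf(sub)
Searching for only the first item, starting at wordFrom, does not work either, because sub[0] is a single Char and wordFrom is a word index rather than a character offset:
val range = text.indexOf(sub[0], wordFrom)
Instead, the function above locates the words one by one from the beginning of text, and uses the offset of the word at wordFrom as the start of the range and the end of the word at wordTo as its end.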
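Since the update asks to keep the marks in the displayed text, one possible variation is to leave the string untouched and simply skip tokens that consist only of marks when counting words. The sketch below is an assumption about what you need rather than a definitive implementation: `standaloneMarks` and `wordRangeKeepingMarks` are hypothetical names, it uses a plain Kotlin IntRange instead of android.util.Range, and it treats any run of non-whitespace characters as a token.

```kotlin
// A token made up only of the annotation marks listed earlier (U+0615..U+061A, U+06D6..U+06ED).
val standaloneMarks = Regex("[\\u0615-\\u061A\\u06D6-\\u06ED]+")

/**
 * Returns the inclusive character range, in the ORIGINAL input, that covers
 * words wordFrom..wordTo (zero-based, inclusive). Tokens consisting only of
 * annotation marks are not counted as words, but they stay in the string.
 */
fun wordRangeKeepingMarks(input: String, wordFrom: Int, wordTo: Int): IntRange {
    // Every run of non-whitespace characters is a token; drop the mark-only ones.
    val words = Regex("\\S+").findAll(input)
        .filterNot { standaloneMarks.matches(it.value) }
        .toList()
    require(wordFrom in words.indices && wordTo in wordFrom..words.lastIndex) {
        "word indexes out of bounds: $wordFrom..$wordTo of ${words.size} words"
    }
    // MatchResult.range holds the offsets of each word inside the original input.
    return words[wordFrom].range.first..words[wordTo].range.last
}
```

Because the offsets refer to the original string, a mark that falls between the first and last word stays inside the range. When applying a span, keep in mind that the returned range is inclusive, so an exclusive end offset would be `range.last + 1`.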