【Question title】: How to count the duplicate lines in a file and find the most duplicated line?
【Posted】: 2021-02-27 03:03:56
【Question description】:

Or, better yet, tell me how many times a given element is duplicated in the map. The map is created like this:

fun prirazovac() {
    var lineNumber = 0

    File("src/60.ips.txt").forEachLine {
        lineNumber++
        val ipcode = mutableMapOf(lineNumber to it)
        for (ii in 1..200) {
            for (i in 200 downTo 1) {
                val truth = (ipcode.get(ii)== ipcode.get(i))
                if (truth) {
                    println(ipcode)
                }
            }

        }
    }
}

60.ips.txt:

66.249.64.33
66.249.64.124
66.249.76.13
66.249.76.11
142.54.183.122
142.54.183.122
180.76.15.162
173.234.153.122
173.234.153.122
173.234.153.122
173.234.153.122
180.76.15.154
180.76.15.33
66.249.76.110
66.249.76.109
46.119.118.233
46.119.118.233
46.119.118.233
207.46.13.231
207.46.13.231
40.77.167.29
52.3.127.144
66.249.64.33
66.249.76.109
63.249.66.212
63.249.66.212
207.46.13.237
207.46.13.237
40.77.167.29
40.77.167.29
157.55.39.251
207.46.13.142
66.249.76.9
40.77.167.7
157.55.39.251
157.55.39.251
157.55.39.251
157.55.39.251
157.55.39.251
207.46.13.142
207.46.13.142
198.204.240.219
198.204.240.219
68.180.231.40
68.180.231.40
66.249.64.124
139.167.180.171
139.167.180.171
52.3.127.144
217.69.133.169
66.249.76.13
131.161.8.209
223.16.201.219
223.16.201.219
68.180.231.40
162.210.196.97
162.210.196.97
106.75.74.148
106.75.74.148
106.75.74.148
137.226.158.12
137.226.158.12
106.75.74.148
106.75.74.148
123.125.71.53
178.255.215.84
178.255.215.84
66.249.76.9
63.249.66.212
63.249.66.212
63.249.66.212
198.204.227.58
198.204.227.58
198.204.227.58
198.204.227.58
198.204.227.58
198.204.227.58
198.204.227.58
198.204.227.58
198.204.227.58
198.204.227.58
142.54.183.122
142.54.183.122
66.249.76.109
151.80.31.167
51.255.65.21
202.46.58.80
84.185.64.239
84.185.64.239
178.255.215.84
178.255.215.84
52.3.127.144
180.76.15.21
66.249.64.20
66.249.76.127
80.112.180.113
66.249.76.109
180.76.15.6
223.16.201.219
223.16.201.219
84.121.51.229
84.121.51.229
123.125.71.79
157.55.39.251
217.69.133.253
217.69.133.252
92.204.106.99
188.251.22.226
80.183.10.116
68.180.228.62
68.180.228.62
173.208.211.250
173.208.211.250
66.249.65.158
180.76.15.6
88.198.117.52
88.198.117.52
88.198.117.52
88.198.117.52
88.198.117.52
88.198.117.52
88.198.117.52
88.198.117.52
88.198.117.52
88.198.117.52
88.198.117.52
88.198.117.52
88.198.117.52
88.198.117.52
88.198.117.52
68.180.228.62
180.76.15.6
173.208.211.250
173.208.211.250
5.248.253.78
5.248.253.78
5.248.253.78
123.125.71.95
92.204.106.99
93.95.103.45
52.3.127.144
52.3.127.144
68.180.228.62
163.172.66.14
190.200.185.85
190.200.185.85
157.55.39.251
157.55.39.113
180.76.15.137
180.76.15.25
92.204.106.99
66.249.73.136
46.229.167.149
46.229.167.149
46.229.167.149
92.229.161.46
92.204.106.99
92.204.106.99
92.204.106.99
66.249.65.158
66.249.65.154
207.46.13.141
207.46.13.141
207.46.13.141
173.208.211.250
173.208.211.250
66.249.73.131
66.249.73.131
163.172.14.55
178.255.215.84
91.64.61.78
46.246.39.81
46.246.39.81
46.246.39.81
46.246.39.81
46.246.39.81
46.246.39.81
46.246.39.81
46.246.39.81
46.246.39.81
46.246.39.81
46.246.39.81
46.246.39.81
46.246.39.81
46.246.39.81
46.246.39.81
46.246.39.81
46.246.39.81
87.78.248.247
87.78.248.247
69.64.40.177
223.16.201.219
223.16.201.219
63.249.66.212
63.249.66.212
178.137.95.202
178.137.95.202
178.137.95.202
92.204.106.99

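It prints out thousands of results. I need each result only once and, ideally, the number of times it is repeated, for example: IP address - 20 times. I thought a HashMap() would help, but it did not. Any ideas?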

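【Comments on the question】:

  • I really can't tell which results you are checking for duplicates. A map does not track how many times an object has been put into it unless you specifically build the map to track frequencies.
  • Let me copy-paste an example of the results: {6=142.54.183.122} {6=142.54.183.122} {6=142.54.183.122} {6=142.54.183.122} I would like 4x {6=142.54.183.122}, or really just a way to get the count.
  • Your code does not show enough information to help you. Nothing here shows, for example, how 142.54.183.122 gets into the map.
  • I wanted to simplify it as much as possible to cut out the noise, but I will edit it.
  • I posted an answer showing how to count the duplicate lines in a file. What is your prirazovac function trying to compute?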
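
One possible reading of why the question's loop prints so much (this is an interpretation of the posted code, not something the commenters stated): every line of the file creates a fresh map with a single entry, so almost every lookup returns null, and null == null is true in Kotlin. The sketch below reproduces the same comparison for one line and counts the matches instead of printing them. The helper name matchesForLine is hypothetical.

```kotlin
// Hypothetical helper: counts how often the question's comparison is true
// for ONE line of the file, using the same one-entry map and loop bounds.
fun matchesForLine(lineNumber: Int, line: String): Int {
    val ipcode = mutableMapOf(lineNumber to line) // fresh map with a single entry
    var matches = 0
    for (ii in 1..200) {
        for (i in 200 downTo 1) {
            // Keys that are not in the map return null, and null == null is true.
            if (ipcode[ii] == ipcode[i]) matches++
        }
    }
    return matches
}

fun main() {
    // Line 6 of the sample file, as in the asker's pasted output.
    println(matchesForLine(6, "142.54.183.122")) // prints 39602
}
```

Of the 200 x 200 comparisons, 199 x 199 involve two missing keys and one compares the single entry with itself, so a single line already produces 39602 printed copies of the same one-entry map. None of those matches says anything about duplicate IP addresses.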

Tags: dictionary for-loop kotlin hashmap duplicates


【Solution 1】:

Kotlin has some great features: groupingBy and eachCount do exactly what you need:

import java.io.File

fun main() {
    File("src/60.ips.txt")
        .readLines()
        .groupingBy { it }
        .eachCount()
        .forEach { (ip, count) -> println("$ip -> $count times") }
}

Partial output:

66.249.64.33 -> 2 times
66.249.64.124 -> 2 times
66.249.76.13 -> 2 times
66.249.76.11 -> 1 times
142.54.183.122 -> 4 times
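
For comparison, roughly the same tally can be kept by hand in a mutable map, which is what the HashMap idea in the question was reaching for. This is only a minimal sketch: the helper name countLines and the sample list are made up for illustration, and the answer's groupingBy/eachCount version remains the shorter option.

```kotlin
// Hypothetical helper: tallies each distinct line in a mutable map.
fun countLines(lines: List<String>): Map<String, Int> {
    val counts = mutableMapOf<String, Int>() // keeps keys in first-seen order
    for (line in lines) {
        // Start at 0 for a line seen for the first time, then add one.
        counts[line] = (counts[line] ?: 0) + 1
    }
    return counts
}

fun main() {
    val sample = listOf("142.54.183.122", "142.54.183.122", "180.76.15.162", "142.54.183.122")
    val counts = countLines(sample)
    println(counts) // {142.54.183.122=3, 180.76.15.162=1}
    // The hand-written tally agrees with groupingBy + eachCount.
    println(counts == sample.groupingBy { it }.eachCount()) // true
}
```

The key of the map is the line itself and the value is its count, which is the opposite of the question's map, where the key was the line number.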

To find the most frequent duplicate, you can use maxByOrNull:

File("src/60.ips.txt")
    .readLines()
    .groupingBy { it }
    .eachCount()
    .maxByOrNull { it.value }
    ?.let { (ip, count) -> println("IP $ip appeared the most: $count times") }

Output:

IP 46.246.39.81 appeared the most: 17 times
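
Two caveats may be worth keeping in mind: maxByOrNull is available from Kotlin 1.4 onward, and it returns only the first entry with the largest value, so ties are silently dropped. A small sketch, assuming you want every tied line or a ranked list of the duplicated lines only, could look like this. The helper names mostFrequent and duplicatesByCount and the sample data are hypothetical.

```kotlin
// Hypothetical helper: returns every line that shares the highest count.
fun mostFrequent(lines: List<String>): List<Pair<String, Int>> {
    val counts = lines.groupingBy { it }.eachCount()
    val top = counts.values.maxOrNull() ?: return emptyList() // empty input
    return counts.filterValues { it == top }.toList()
}

// Hypothetical helper: only lines seen more than once, most frequent first.
fun duplicatesByCount(lines: List<String>): List<Pair<String, Int>> =
    lines.groupingBy { it }.eachCount()
        .filterValues { it > 1 }
        .toList()
        .sortedByDescending { it.second }

fun main() {
    val sample = listOf("a", "a", "b", "b", "b", "c", "c", "c", "d")
    println(mostFrequent(sample))      // [(b, 3), (c, 3)]
    println(duplicatesByCount(sample)) // [(b, 3), (c, 3), (a, 2)]
}
```

To run either helper against the file from the question, pass File("src/60.ips.txt").readLines() as the argument.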

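【Comments】:

  • Yes, that is perfect. Thank you very much. To answer your question, I was supposed to find the most frequently used IP address. Thanks for showing me.
  • OK, I have added an example for that as well.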