【Posted】: 2020-02-10 23:12:47
【Question】:
I ran this command
Command: mongodump --db=elastic --collection=q_moonx_notifications_2019-08-26 --out=/home/centos/mongo-dump/
to dump a single collection from my MongoDB instance.
Collection name: q_moonx_notifications_2019-08-26
Size: 80 GB
Storage size: 23.6 GB
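Shortly after I started the dump, the mongod service crashed. I traced the problem through /var/log/messages and can see that mongod was killed because the machine ran out of memory. Can someone help me understand how this happened, and how I can dump a single collection without affecting the running mongo service?
The machine has 32 GB of RAM and 0 swap.
Contents of /var/log/messages: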
Oct 11 07:07:33 ip-1.23.345.678 kernel: [<ffffffffbdb61e41>] dump_stack+0x19/0x1b
Oct 11 07:07:33 ip-1.23.345.678 kernel: [<ffffffffbdb5c86a>] dump_header+0x90/0x229
Oct 11 07:07:33 ip-1.23.345.678 kernel: [<ffffffffbd700bcb>] ? cred_has_capability+0x6b/0x120
Oct 11 07:07:33 ip-1.23.345.678 kernel: [<ffffffffbd5ba4e4>] oom_kill_process+0x254/0x3d0
Oct 11 07:07:33 ip-1.23.345.678 kernel: [<ffffffffbd700c9c>] ? selinux_capable+0x1c/0x40
Oct 11 07:07:33 ip-1.23.345.678 kernel: [<ffffffffbd5bad26>] out_of_memory+0x4b6/0x4f0
Oct 11 07:07:33 ip-1.23.345.678 kernel: [<ffffffffbdb5d36e>] __alloc_pages_slowpath+0x5d6/0x724
Oct 11 07:07:33 ip-1.23.345.678 kernel: [<ffffffffbd5c1105>] __alloc_pages_nodemask+0x405/0x420
Oct 11 07:07:33 ip-1.23.345.678 kernel: [<ffffffffbd60df68>] alloc_pages_current+0x98/0x110
Oct 11 07:07:33 ip-1.23.345.678 kernel: [<ffffffffbd5b6347>] __page_cache_alloc+0x97/0xb0
Oct 11 07:07:33 ip-1.23.345.678 kernel: [<ffffffffbd5b8fa8>] filemap_fault+0x298/0x490
Oct 11 07:07:33 ip-1.23.345.678 kernel: [<ffffffffc0400d0e>] __xfs_filemap_fault+0x7e/0x1d0 [xfs]
Oct 11 07:07:33 ip-1.23.345.678 kernel: [<ffffffffc0400f0c>] xfs_filemap_fault+0x2c/0x30 [xfs]
Oct 11 07:07:33 ip-1.23.345.678 kernel: [<ffffffffbd5e444a>] __do_fault.isra.59+0x8a/0x100
Oct 11 07:07:33 ip-1.23.345.678 kernel: [<ffffffffbd5e49fc>] do_read_fault.isra.61+0x4c/0x1b0
Oct 11 07:07:33 ip-1.23.345.678 kernel: [<ffffffffbd5e93a4>] handle_pte_fault+0x2f4/0xd10
Oct 11 07:07:33 ip-1.23.345.678 kernel: [<ffffffffbd50cbf8>] ? get_futex_key+0x1c8/0x2c0
Oct 11 07:07:33 ip-1.23.345.678 kernel: [<ffffffffbd5ebedd>] handle_mm_fault+0x39d/0x9b0
Oct 11 07:07:33 ip-1.23.345.678 kernel: [<ffffffffbdb6f5e3>] __do_page_fault+0x203/0x500
Oct 11 07:07:33 ip-1.23.345.678 kernel: [<ffffffffbdb6f915>] do_page_fault+0x35/0x90
Oct 11 07:07:33 ip-1.23.345.678 kernel: [<ffffffffbdb6b758>] page_fault+0x28/0x30
Oct 11 07:07:33 ip-1.23.345.678 kernel: Mem-Info:
Oct 11 07:07:33 ip-1.23.345.678 kernel: active_anon:7972560 inactive_anon:30565 isolated_anon:0#012 active_file:2831 inactive_file:4651 isolated_file:0#012 unevictable:0 dirty:0 writeback:5 unstable:0#012 slab_reclaimable:42065 slab_unreclaimable:12016#012 mapped:18928 shmem:55424 pagetables:19349 bounce:0#012 free:49154 free_pcp:558 free_cma:0
Oct 11 07:07:33 ip-1.23.345.678 kernel: Node 0 DMA free:15904kB min:32kB low:40kB high:48kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Oct 11 07:07:33 ip-1.23.345.678 kernel: lowmem_reserve[]: 0 3597 31992 31992
Oct 11 07:07:33 ip-1.23.345.678 kernel: Node 0 DMA32 free:121020kB min:7596kB low:9492kB high:11392kB active_anon:3397760kB inactive_anon:11020kB active_file:284kB inactive_file:508kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3915776kB managed:3684320kB mlocked:0kB dirty:0kB writeback:0kB mapped:5892kB shmem:14892kB slab_reclaimable:131668kB slab_unreclaimable:5368kB kernel_stack:1472kB pagetables:7364kB unstable:0kB bounce:0kB free_pcp:1184kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:2829 all_unreclaimable? yes
Oct 11 07:07:33 ip-1.23.345.678 kernel: lowmem_reserve[]: 0 0 28394 28394
Oct 11 07:07:33 ip-1.23.345.678 kernel: Node 0 Normal free:66532kB min:59952kB low:74940kB high:89928kB active_anon:28492480kB inactive_anon:111240kB active_file:11040kB inactive_file:11684kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:29622272kB managed:29079004kB mlocked:0kB dirty:0kB writeback:20kB mapped:69820kB shmem:206804kB slab_reclaimable:36592kB slab_unreclaimable:42696kB kernel_stack:6720kB pagetables:70032kB unstable:0kB bounce:0kB free_pcp:3320kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:8574 all_unreclaimable? no
Oct 11 07:07:33 ip-1.23.345.678 kernel: lowmem_reserve[]: 0 0 0 0
Oct 11 07:07:33 ip-1.23.345.678 kernel: Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB
Oct 11 07:07:33 ip-1.23.345.678 kernel: Node 0 DMA32: 2724*4kB (UEM) 1319*8kB (UEM) 2734*16kB (UEM) 1065*32kB (UEM) 296*64kB (UEM) 32*128kB (UEM) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 122312kB
Oct 11 07:07:33 ip-1.23.345.678 kernel: Node 0 Normal: 10268*4kB (UEM) 3615*8kB (UEM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 69992kB
Oct 11 07:07:33 ip-1.23.345.678 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Oct 11 07:07:33 ip-1.23.345.678 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Oct 11 07:07:33 ip-1.23.345.678 kernel: 61902 total pagecache pages
Oct 11 07:07:33 ip-1.23.345.678 kernel: 0 pages in swap cache
Oct 11 07:07:33 ip-1.23.345.678 kernel: Swap cache stats: add 0, delete 0, find 0/0
Oct 11 07:07:33 ip-1.23.345.678 kernel: Free swap = 0kB
Oct 11 07:07:33 ip-1.23.345.678 kernel: Total swap = 0kB
Oct 11 07:07:33 ip-1.23.345.678 kernel: 8388509 pages RAM
Oct 11 07:07:33 ip-1.23.345.678 kernel: 0 pages HighMem/MovableOnly
Oct 11 07:07:33 ip-1.23.345.678 kernel: 193702 pages reserved
Oct 11 07:07:33 ip-1.23.345.678 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Oct 11 07:07:33 ip-1.23.345.678 kernel: [ 2415] 0 2415 55590 32605 114 0 0 systemd-journal
Oct 11 07:07:33 ip-1.23.345.678 kernel: [ 2456] 0 2456 11953 611 25 0 -1000 systemd-udevd
Oct 11 07:07:33 ip-1.23.345.678 kernel: [ 2704] 0 2704 15511 170 29 0 -1000 auditd
Oct 11 07:07:33 ip-1.23.345.678 kernel: [ 4346] 32 4346 18412 189 38 0 0 rpcbind
Oct 11 07:07:33 ip-1.23.345.678 kernel: [ 4464] 81 4464 16600 204 34 0 -900 dbus-daemon
Oct 11 07:07:33 ip-1.23.345.678 kernel: [ 4532] 998 4532 29446 143 29 0 0 chronyd
Oct 11 07:07:33 ip-1.23.345.678 kernel: [ 4588] 0 4588 6652 156 19 0 0 systemd-logind
Oct 11 07:07:33 ip-1.23.345.678 kernel: [ 4589] 999 4589 153057 1381 63 0 0 polkitd
Oct 11 07:07:33 ip-1.23.345.678 kernel: [ 4592] 0 4592 5416 101 14 0 0 irqbalance
Oct 11 07:07:33 ip-1.23.345.678 kernel: [ 4596] 0 4596 50404 162 38 0 0 gssproxy
Oct 11 07:07:33 ip-1.23.345.678 kernel: [ 4940] 0 4940 26839 508 51 0 0 dhclient
Oct 11 07:07:33 ip-1.23.345.678 kernel: [ 5035] 0 5035 143455 3309 99 0 0 tuned
Oct 11 07:07:33 ip-1.23.345.678 kernel: [ 5102] 0 5102 31253 535 58 0 0 nginx
Oct 11 07:07:33 ip-1.23.345.678 kernel: [ 5103] 995 5103 31375 641 59 0 0 nginx
Oct 11 07:07:33 ip-1.23.345.678 kernel: [ 5104] 995 5104 31375 641 59 0 0 nginx
Oct 11 07:07:33 ip-1.23.345.678 kernel: [ 5105] 995 5105 31375 641 59 0 0 nginx
Oct 11 07:07:33 ip-1.23.345.678 kernel: [ 5106] 995 5106 31375 641 59 0 0 nginx
Oct 11 07:07:33 ip-1.23.345.678 kernel: [ 5107] 995 5107 31375 641 59 0 0 nginx
Oct 11 07:07:33 ip-1.23.345.678 kernel: [ 5108] 995 5108 31375 641 59 0 0 nginx
Oct 11 07:07:33 ip-1.23.345.678 kernel: [ 5109] 995 5109 31375 641 59 0 0 nginx
Oct 11 07:07:33 ip-1.23.345.678 kernel: [ 5110] 995 5110 31375 641 59 0 0 nginx
Oct 11 07:07:33 ip-1.23.345.678 kernel: [ 5183] 0 5183 22603 310 42 0 0 master
Oct 11 07:07:33 ip-1.23.345.678 kernel: [ 5189] 89 5189 22673 286 43 0 0 qmgr
Oct 11 07:07:33 ip-1.23.345.678 kernel: [ 5231] 997 5231 4365796 4120011 8150 0 0 mongod
Oct 11 07:07:33 ip-1.23.345.678 kernel: [ 5295] 0 5295 104225 16973 121 0 0 rsyslogd
Oct 11 07:07:33 ip-1.23.345.678 kernel: [ 5297] 0 5297 28189 267 58 0 -1000 sshd
Oct 11 07:07:33 ip-1.23.345.678 kernel: [ 5337] 0 5337 31580 194 18 0 0 crond
Oct 11 07:07:33 ip-1.23.345.678 kernel: [ 5343] 0 5343 27523 50 10 0 0 agetty
Oct 11 07:07:33 ip-1.23.345.678 kernel: [ 5347] 0 5347 27523 50 13 0 0 agetty
Oct 11 07:07:33 ip-1.23.345.678 kernel: [ 5662] 27 5662 691356 97174 303 0 0 mysqld
Oct 11 07:07:33 ip-1.23.345.678 kernel: [ 7361] 996 7361 12739660 2366239 5813 0 0 java
Oct 11 07:07:33 ip-1.23.345.678 kernel: [ 7462] 996 7462 17192 173 30 0 0 controller
Oct 11 07:07:33 ip-1.23.345.678 kernel: [ 7661] 1000 7661 3588101 1239871 2579 0 0 java
Oct 11 07:07:33 ip-1.23.345.678 kernel: [14974] 0 14974 371181 6509 107 0 0 metricbeat
Oct 11 07:07:33 ip-1.23.345.678 kernel: [ 6781] 994 6781 430375 60775 604 0 0 node
Oct 11 07:07:33 ip-1.23.345.678 kernel: [24008] 89 24008 22629 301 47 0 0 pickup
Oct 11 07:07:33 ip-1.23.345.678 kernel: [25163] 0 25163 39154 367 77 0 0 sshd
Oct 11 07:07:33 ip-1.23.345.678 kernel: [25167] 1000 25167 39154 366 74 0 0 sshd
Oct 11 07:07:33 ip-1.23.345.678 kernel: [25168] 1000 25168 28893 149 15 0 0 bash
Oct 11 07:07:33 ip-1.23.345.678 kernel: [28554] 1000 28554 260690 34619 132 0 0 mongodump
Oct 11 07:07:33 ip-1.23.345.678 kernel: Out of memory: Kill process 5231 (mongod) score 503 or sacrifice child
Oct 11 07:07:33 ip-1.23.345.678 kernel: Killed process 5231 (mongod) total-vm:17463184kB, anon-rss:16480044kB, file-rss:0kB, shmem-rss:0kB
Oct 11 07:07:34 ip-1.23.345.678 systemd: mongod.service: main process exited, code=killed, status=9/KILL
Oct 11 07:07:34 ip-1.23.345.678 systemd: Unit mongod.service entered failed state.
Oct 11 07:07:34 ip-1.23.345.678 systemd: mongod.service failed.
Oct 11 07:07:39 ip-1.23.345.678 systemd-logind: New session 1220 of user centos.
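To make the process table above easier to read, here is a minimal sketch that converts the `rss` column (counted in pages) into KiB and ranks the processes by resident memory. It assumes 4 KiB pages, which is the usual x86_64 default but is not stated in the log; `PAGE_KIB`, `SAMPLE`, `ROW` and `rss_by_process` are names made up for this illustration, and `SAMPLE` holds only a few rows copied from the log.

```python
import re

PAGE_KIB = 4  # assumption: 4 KiB pages (x86_64 default), not stated in the log

# A few rows copied from the OOM process table above (hostname prefix trimmed).
SAMPLE = """\
kernel: [ 5231] 997 5231 4365796 4120011 8150 0 0 mongod
kernel: [ 5662] 27 5662 691356 97174 303 0 0 mysqld
kernel: [ 7361] 996 7361 12739660 2366239 5813 0 0 java
kernel: [ 7661] 1000 7661 3588101 1239871 2579 0 0 java
kernel: [28554] 1000 28554 260690 34619 132 0 0 mongodump
"""

# Columns: [pid] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
ROW = re.compile(
    r"\[\s*(\d+)\]\s+\d+\s+\d+\s+\d+\s+(\d+)\s+\d+\s+\d+\s+-?\d+\s+(\S+)"
)


def rss_by_process(log_text):
    """Return (name, pid, rss_kib) tuples sorted by resident memory, largest first."""
    rows = []
    for match in ROW.finditer(log_text):
        pid = int(match.group(1))
        rss_pages = int(match.group(2))
        name = match.group(3)
        rows.append((name, pid, rss_pages * PAGE_KIB))
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for name, pid, rss_kib in rss_by_process(SAMPLE):
        print(f"{name:<10} pid={pid:<6} rss={rss_kib / 1024 / 1024:.2f} GiB")
```

For mongod this gives 4120011 × 4 = 16480044 KiB, which matches the `anon-rss:16480044kB` in the "Killed process" line, so the 4 KiB page assumption appears to hold here. Read this way, mongodump itself holds only about 135 MiB; most of the 32 GB is taken by mongod (roughly 15.7 GiB) and the two java processes (roughly 13.8 GiB together). With no swap, the extra memory mongod needs while scanning the collection may have been enough to trigger the OOM killer, though the log alone does not prove that.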
【Discussion】: