I recently upgraded my home server from Ubuntu 10.04 to 12.04.1. It runs the linux-image-server kernel on x86_64.
Nothing very unusual is running on it, I think: the deluge daemon, apache2, an iptables firewall with IP masquerading, a DHCP server, a bind DNS server whose zone files are updated automatically with the hostnames that DHCP clients identify themselves with, sshd, an NFS server, and a handful of other things. This machine is my router - it sits between the internet and the local network.
Since the upgrade it has been failing intermittently. It is fine for a while after boot, and then suddenly we lose our network connection over wifi. If I plug in a network cable I can't get an IP address from the DHCP server. If I set a static IP address myself I can still access the internet fine. This makes it look as though the DHCP server is what has failed (indeed, I ran dhclient -v eth0 and nothing responded to the DHCPDISCOVER broadcasts), which we notice when clients try to renew their IP leases. But over a cable with a static IP I can still reach the internet, so iptables is still working.
So I tried logging in to the machine over SSH, but it just seems to hang. If I make ssh verbose I can see that it connects to the server and then fails a little further down the line - it is hard to see exactly where.
I noticed that if I try to fetch a web page from its HTTP server, I get the page I requested, but none of the additional requests it triggers (for images, stylesheets, javascript) are served. I can, however, get those files if I request them directly, for example with curl.
Does this suggest that things go wrong whenever there is an attempt to fork?
I dragged a monitor and keyboard over to the server (it normally runs headless) and had a look - I saw a stack trace.
I switched to a new virtual terminal and tried to log in. I got a stack trace (general protection fault) after entering my password. Here it is:
Jan 6 20:19:54 localhost kernel: [ 1475.178245] general protection fault: 0000 [#12] SMP
Jan 6 20:19:54 localhost kernel: [ 1475.178292] CPU 1
Jan 6 20:19:54 localhost kernel: [ 1475.178309] Modules linked in: btrfs zlib_deflate libcrc32c ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs reiserfs ext2 nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc dm_crypt ppdev ipt_REJECT ipt_LOG ipt_MASQUERADE xt_state iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables joydev sp5100_tco edac_core i2c_piix4 serio_raw k8temp edac_mce_amd snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd soundcore parport_pc snd_page_alloc mac_hid shpchp lp parport radeon 8139too ttm drm_kms_helper drm pata_atiixp i2c_algo_bit usbhid hid wmi r8169
Jan 6 20:19:54 localhost kernel: [ 1475.178911]
Jan 6 20:19:54 localhost kernel: [ 1475.178927] Pid: 1305, comm: login Tainted: G B D 3.2.0-35-generic #55-Ubuntu Gigabyte Technology Co., Ltd. GA-MA785GM-US2H/GA-MA785GM-US2H
Jan 6 20:19:54 localhost kernel: [ 1475.179028] RIP: 0010:[<ffffffff8116589a>] [<ffffffff8116589a>] kmem_cache_alloc+0x5a/0x140
Jan 6 20:19:54 localhost kernel: [ 1475.179096] RSP: 0018:ffff88006b251d78 EFLAGS: 00010206
Jan 6 20:19:54 localhost kernel: [ 1475.179135] RAX: 0000000000000000 RBX: 00007f062bb91000 RCX: 000000000005b2ed
Jan 6 20:19:54 localhost kernel: [ 1475.179186] RDX: 000000000005b2ec RSI: 0000000000016da0 RDI: ffff88006d408a00
Jan 6 20:19:54 localhost kernel: [ 1475.179236] RBP: ffff88006b251dc8 R08: ffff88006fa96da0 R09: 0000000000000001
Jan 6 20:19:54 localhost kernel: [ 1475.179287] R10: 00000000000000d1 R11: ffff88006b23a8f0 R12: ffff88006d408a00
Jan 6 20:19:54 localhost kernel: [ 1475.179336] R13: 2665c4979a04b7b8 R14: ffffffff811447c5 R15: 00000000000080d0
Jan 6 20:19:54 localhost kernel: [ 1475.179387] FS: 00007f062bb81700(0000) GS:ffff88006fa80000(0000) knlGS:0000000000000000
Jan 6 20:19:54 localhost kernel: [ 1475.179445] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 6 20:19:54 localhost kernel: [ 1475.179486] CR2: 00007f9b4d79da00 CR3: 0000000059a34000 CR4: 00000000000006e0
Jan 6 20:19:54 localhost kernel: [ 1475.179536] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 6 20:19:54 localhost kernel: [ 1475.179586] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jan 6 20:19:54 localhost kernel: [ 1475.179637] Process login (pid: 1305, threadinfo ffff88006b250000, task ffff880036058000)
Jan 6 20:19:54 localhost kernel: [ 1475.179695] Stack:
Jan 6 20:19:54 localhost kernel: [ 1475.179711] ffff880036058000 0000000000000041 0000000000000001 ffffffff81188cec
Jan 6 20:19:54 localhost kernel: [ 1475.179777] 0000000000000282 00007f062bb91000 ffff88006822ce00 0000000000000001
Jan 6 20:19:54 localhost kernel: [ 1475.179841] 0000000000001000 0000000000000000 ffff88006b251e88 ffffffff811447c5
Jan 6 20:19:54 localhost kernel: [ 1475.179905] Call Trace:
Jan 6 20:19:54 localhost kernel: [ 1475.179928] [<ffffffff81188cec>] ? path_openat+0xfc/0x3f0
Jan 6 20:19:54 localhost kernel: [ 1475.179971] [<ffffffff811447c5>] mmap_region+0x2a5/0x4f0
Jan 6 20:19:54 localhost kernel: [ 1475.180012] [<ffffffff81144d58>] do_mmap_pgoff+0x348/0x360
Jan 6 20:19:54 localhost kernel: [ 1475.180054] [<ffffffff81144e36>] sys_mmap_pgoff+0xc6/0x230
Jan 6 20:19:54 localhost kernel: [ 1475.180098] [<ffffffff81018b12>] sys_mmap+0x22/0x30
Jan 6 20:19:54 localhost kernel: [ 1475.180136] [<ffffffff816655c2>] system_call_fastpath+0x16/0x1b
Jan 6 20:19:54 localhost kernel: [ 1475.180180] Code: 00 4d 8b 04 24 65 4c 03 04 25 50 da 00 00 49 8b 50 08 4d 8b 28 4d 85 ed 0f 84 d8 00 00 00 49 63 44 24 20 49 8b 34 24 48 8d 4a 01 <49> 8b 5c 05 00 4c 89 e8 65 48 0f c7 0e 0f 94 c0 84 c0 74 c2 4d
Jan 6 20:19:54 localhost kernel: [ 1475.180503] RIP [<ffffffff8116589a>] kmem_cache_alloc+0x5a/0x140
Jan 6 20:19:54 localhost kernel: [ 1475.180552] RSP <ffff88006b251d78>
Jan 6 20:19:54 localhost kernel: [ 1475.180603] ---[ end trace 766ef1ef52f774b9 ]---
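A note on reading the trace above: the "Tainted: G B D" field and the [#12] counter show that this was not the first fault since boot. As a rough sketch (the decode_taint helper and the TAINT_FLAGS table are illustrative names, not part of any tool, and the table covers only a subset of the flags described in the kernel documentation), the flag letters can be decoded like this:

```python
# Subset of kernel taint flags; meanings follow the kernel's
# "tainted kernels" documentation. Flags not listed here are
# reported as "unknown flag" by this sketch.
TAINT_FLAGS = {
    "G": "only GPL-compatible modules loaded",
    "P": "proprietary module loaded",
    "M": "machine check exception occurred",
    "B": "bad page state was detected",
    "D": "kernel has already died (earlier OOPS or BUG)",
    "W": "kernel issued a warning earlier",
    "O": "out-of-tree module loaded",
}


def decode_taint(line):
    """Return [(flag, meaning), ...] for a 'Pid: ..., comm: ...' oops line."""
    if "Not tainted" in line:
        return []
    _, sep, rest = line.partition("Tainted:")
    if not sep:
        return []
    flags = []
    # Flag letters come first; stop at the kernel version string.
    for token in rest.split():
        if not (token.isalpha() and token.isupper()):
            break
        flags.extend(token)
    return [(f, TAINT_FLAGS.get(f, "unknown flag")) for f in flags]


if __name__ == "__main__":
    line = ("Jan 6 20:19:54 localhost kernel: [ 1475.178927] Pid: 1305, "
            "comm: login Tainted: G B D 3.2.0-35-generic #55-Ubuntu")
    for flag, meaning in decode_taint(line):
        print(flag, "-", meaning)
```

For the login trace, B and D mean that a bad page and an earlier oops had already been recorded, so the first fault in the log is the more informative one to look at.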
If I watch for long enough I see more general protection faults. So far I have seen them for login, apache2, deluge-web, head and powerbtn.sh.
I have to reset the machine to get it back into a working state (I even get a powerbtn.sh general protection fault when I press the power button), but it is not long before it ends up like this again.
I have not found a way to reproduce this on demand - it seems to happen at random.
In case it is useful, I looked through kern.log and found the first error. There are tons of them all in a row, starting with zsh, then deluged, apache2, cron, head, console-kit-dae, irqbalance, nmbd... Here is the zsh one and the bad page state error that comes right after it:
Jan 6 20:13:35 localhost kernel: [ 1096.184250] general protection fault: 0000 [#1] SMP
Jan 6 20:13:35 localhost kernel: [ 1096.186339] CPU 1
Jan 6 20:13:35 localhost kernel: [ 1096.186355] Modules linked in: btrfs zlib_deflate libcrc32c ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs reiserfs ext2 nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc dm_crypt ppdev ipt_REJECT ipt_LOG ipt_MASQUERADE xt_state iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables joydev sp5100_tco edac_core i2c_piix4 serio_raw k8temp edac_mce_amd snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd soundcore parport_pc snd_page_alloc mac_hid shpchp lp parport radeon 8139too ttm drm_kms_helper drm pata_atiixp i2c_algo_bit usbhid hid wmi r8169
Jan 6 20:13:35 localhost kernel: [ 1096.188008]
Jan 6 20:13:35 localhost kernel: [ 1096.188008] Pid: 2564, comm: zsh Not tainted 3.2.0-35-generic #55-Ubuntu Gigabyte Technology Co., Ltd. GA-MA785GM-US2H/GA-MA785GM-US2H
Jan 6 20:13:35 localhost kernel: [ 1096.188008] RIP: 0010:[<ffffffff8116589a>] [<ffffffff8116589a>] kmem_cache_alloc+0x5a/0x140
Jan 6 20:13:35 localhost kernel: [ 1096.188008] RSP: 0018:ffff880059877d78 EFLAGS: 00010206
Jan 6 20:13:35 localhost kernel: [ 1096.188008] RAX: 0000000000000000 RBX: 00007f202c59d000 RCX: 000000000005b2ed
Jan 6 20:13:35 localhost kernel: [ 1096.188008] RDX: 000000000005b2ec RSI: 0000000000016da0 RDI: ffff88006d408a00
Jan 6 20:13:35 localhost kernel: [ 1096.188008] RBP: ffff880059877dc8 R08: ffff88006fa96da0 R09: 0000000000000001
Jan 6 20:13:35 localhost kernel: [ 1096.188008] R10: 0000000000100073 R11: ffff880059dbb2c0 R12: ffff88006d408a00
Jan 6 20:13:35 localhost kernel: [ 1096.188008] R13: 2665c4979a04b7b8 R14: ffffffff811447c5 R15: 00000000000080d0
Jan 6 20:13:35 localhost kernel: [ 1096.188008] FS: 00007f202c5ac700(0000) GS:ffff88006fa80000(0000) knlGS:0000000000000000
Jan 6 20:13:35 localhost kernel: [ 1096.188008] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 6 20:13:35 localhost kernel: [ 1096.188008] CR2: 00000000025991f0 CR3: 0000000059dbc000 CR4: 00000000000006e0
Jan 6 20:13:35 localhost kernel: [ 1096.188008] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 6 20:13:35 localhost kernel: [ 1096.188008] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jan 6 20:13:35 localhost kernel: [ 1096.188008] Process zsh (pid: 2564, threadinfo ffff880059876000, task ffff88006b6b5c00)
Jan 6 20:13:35 localhost kernel: [ 1096.188008] Stack:
Jan 6 20:13:35 localhost kernel: [ 1096.188008] 0000000000000001 0000000000001000 0000000000000001 ffffffff8129e2e0
Jan 6 20:13:35 localhost kernel: [ 1096.188008] 0000000000000001 00007f202c59d000 ffff88006822f480 0000000000000001
Jan 6 20:13:35 localhost kernel: [ 1096.188008] 0000000000001000 0000000000000000 ffff880059877e88 ffffffff811447c5
Jan 6 20:13:35 localhost kernel: [ 1096.188008] Call Trace:
Jan 6 20:13:35 localhost kernel: [ 1096.188008] [<ffffffff8129e2e0>] ? cap_vm_enough_memory+0x50/0x60
Jan 6 20:13:35 localhost kernel: [ 1096.188008] [<ffffffff811447c5>] mmap_region+0x2a5/0x4f0
Jan 6 20:13:35 localhost kernel: [ 1096.188008] [<ffffffff81144d58>] do_mmap_pgoff+0x348/0x360
Jan 6 20:13:35 localhost kernel: [ 1096.188008] [<ffffffff81144eb1>] sys_mmap_pgoff+0x141/0x230
Jan 6 20:13:35 localhost kernel: [ 1096.188008] [<ffffffff81018b12>] sys_mmap+0x22/0x30
Jan 6 20:13:35 localhost kernel: [ 1096.188008] [<ffffffff816655c2>] system_call_fastpath+0x16/0x1b
Jan 6 20:13:35 localhost kernel: [ 1096.188008] Code: 00 4d 8b 04 24 65 4c 03 04 25 50 da 00 00 49 8b 50 08 4d 8b 28 4d 85 ed 0f 84 d8 00 00 00 49 63 44 24 20 49 8b 34 24 48 8d 4a 01 <49> 8b 5c 05 00 4c 89 e8 65 48 0f c7 0e 0f 94 c0 84 c0 74 c2 4d
Jan 6 20:13:35 localhost kernel: [ 1096.188008] RIP [<ffffffff8116589a>] kmem_cache_alloc+0x5a/0x140
Jan 6 20:13:35 localhost kernel: [ 1096.188008] RSP <ffff880059877d78>
Jan 6 20:13:35 localhost kernel: [ 1096.274513] ---[ end trace 766ef1ef52f774ae ]---
Jan 6 20:13:37 localhost kernel: [ 1097.836149] BUG: Bad page state in process swapper/0 pfn:59a33
Jan 6 20:13:37 localhost kernel: [ 1097.838885] page:ffffea0001668cc0 count:0 mapcount:-1 mapping: (null) index:0xffff880059a33160
Jan 6 20:13:37 localhost kernel: [ 1097.841673] page flags: 0x100000000000000()
Jan 6 20:13:37 localhost kernel: [ 1097.844440] Modules linked in: btrfs zlib_deflate libcrc32c ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs reiserfs ext2 nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc dm_crypt ppdev ipt_REJECT ipt_LOG ipt_MASQUERADE xt_state iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables joydev sp5100_tco edac_core i2c_piix4 serio_raw k8temp edac_mce_amd snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd soundcore parport_pc snd_page_alloc mac_hid shpchp lp parport radeon 8139too ttm drm_kms_helper drm pata_atiixp i2c_algo_bit usbhid hid wmi r8169
Jan 6 20:13:37 localhost kernel: [ 1097.856881] Pid: 0, comm: swapper/0 Tainted: G D 3.2.0-35-generic #55-Ubuntu
Jan 6 20:13:37 localhost kernel: [ 1097.860020] Call Trace:
Jan 6 20:13:37 localhost kernel: [ 1097.863063] <IRQ> [<ffffffff8111fe8f>] bad_page.part.61+0x9f/0xf0
Jan 6 20:13:37 localhost kernel: [ 1097.866119] [<ffffffff8111fef8>] bad_page+0x18/0x30
Jan 6 20:13:37 localhost kernel: [ 1097.869158] [<ffffffff8112098e>] free_pages_prepare+0x10e/0x120
Jan 6 20:13:37 localhost kernel: [ 1097.872178] [<ffffffff81120af9>] free_hot_cold_page+0x49/0x1a0
Jan 6 20:13:37 localhost kernel: [ 1097.875183] [<ffffffff81120c7d>] __free_pages+0x2d/0x40
Jan 6 20:13:37 localhost kernel: [ 1097.878163] [<ffffffff8159a8fb>] tcp_v4_destroy_sock+0x25b/0x2c0
Jan 6 20:13:37 localhost kernel: [ 1097.881105] [<ffffffff81582695>] inet_csk_destroy_sock+0x55/0x140
Jan 6 20:13:37 localhost kernel: [ 1097.883970] [<ffffffff815849b0>] tcp_done+0x50/0x90
Jan 6 20:13:37 localhost kernel: [ 1097.886853] [<ffffffff81591d92>] tcp_rcv_state_process+0x422/0x5f0
Jan 6 20:13:37 localhost kernel: [ 1097.889724] [<ffffffff8159a597>] tcp_v4_do_rcv+0xc7/0x1d0
Jan 6 20:13:37 localhost kernel: [ 1097.892513] [<ffffffff8159c1f1>] tcp_v4_rcv+0x581/0x820
Jan 6 20:13:37 localhost kernel: [ 1097.895301] [<ffffffff81577b60>] ? ip_rcv_finish+0x370/0x370
Jan 6 20:13:37 localhost kernel: [ 1097.898110] [<ffffffff81577b60>] ? ip_rcv_finish+0x370/0x370
Jan 6 20:13:37 localhost kernel: [ 1097.900915] [<ffffffff81577c3d>] ip_local_deliver_finish+0xdd/0x280
Jan 6 20:13:37 localhost kernel: [ 1097.903716] [<ffffffff81577fa8>] ip_local_deliver+0x88/0x90
Jan 6 20:13:37 localhost kernel: [ 1097.906502] [<ffffffff815778fd>] ip_rcv_finish+0x10d/0x370
Jan 6 20:13:37 localhost kernel: [ 1097.909279] [<ffffffff815781e5>] ip_rcv+0x235/0x300
Jan 6 20:13:37 localhost kernel: [ 1097.912067] [<ffffffff81613dc7>] ? packet_rcv_spkt+0x47/0x190
Jan 6 20:13:37 localhost kernel: [ 1097.914831] [<ffffffff81543446>] __netif_receive_skb+0x4d6/0x550
Jan 6 20:13:37 localhost kernel: [ 1097.917624] [<ffffffff81544230>] netif_receive_skb+0x80/0x90
Jan 6 20:13:37 localhost kernel: [ 1097.920415] [<ffffffff81536474>] ? __netdev_alloc_skb+0x24/0x50
Jan 6 20:13:37 localhost kernel: [ 1097.923124] [<ffffffffa00d6e90>] rtl8139_rx+0x150/0x2b0 [8139too]
Jan 6 20:13:37 localhost kernel: [ 1097.925754] [<ffffffffa00d704a>] rtl8139_poll+0x5a/0xd0 [8139too]
Jan 6 20:13:37 localhost kernel: [ 1097.928274] [<ffffffff81544bd4>] net_rx_action+0x134/0x290
Jan 6 20:13:37 localhost kernel: [ 1097.930698] [<ffffffff8103df8b>] ? native_safe_halt+0xb/0x10
Jan 6 20:13:37 localhost kernel: [ 1097.933115] [<ffffffff8106f6e8>] __do_softirq+0xa8/0x210
Jan 6 20:13:37 localhost kernel: [ 1097.935495] [<ffffffff810967f5>] ? do_timer+0x25/0x30
Jan 6 20:13:37 localhost kernel: [ 1097.937836] [<ffffffff81035dc2>] ? ack_apic_level+0x72/0x190
Jan 6 20:13:37 localhost kernel: [ 1097.940163] [<ffffffff8166782c>] call_softirq+0x1c/0x30
Jan 6 20:13:37 localhost kernel: [ 1097.942464] [<ffffffff81016305>] do_softirq+0x65/0xa0
Jan 6 20:13:37 localhost kernel: [ 1097.944778] [<ffffffff8106face>] irq_exit+0x8e/0xb0
Jan 6 20:13:37 localhost kernel: [ 1097.947068] [<ffffffff816680e3>] do_IRQ+0x63/0xe0
Jan 6 20:13:37 localhost kernel: [ 1097.949327] [<ffffffff8165d46e>] common_interrupt+0x6e/0x6e
Jan 6 20:13:37 localhost kernel: [ 1097.951597] <EOI> [<ffffffff8103df8b>] ? native_safe_halt+0xb/0x10
Jan 6 20:13:37 localhost kernel: [ 1097.953891] [<ffffffff810900a8>] ? hrtimer_start+0x18/0x20
Jan 6 20:13:37 localhost kernel: [ 1097.956171] [<ffffffff8101c983>] default_idle+0x53/0x1d0
Jan 6 20:13:37 localhost kernel: [ 1097.958426] [<ffffffff8101cb5d>] amd_e400_idle+0x5d/0x120
Jan 6 20:13:37 localhost kernel: [ 1097.960704] [<ffffffff81013236>] cpu_idle+0xd6/0x120
Jan 6 20:13:37 localhost kernel: [ 1097.962970] [<ffffffff816235ee>] rest_init+0x72/0x74
Jan 6 20:13:37 localhost kernel: [ 1097.965195] [<ffffffff81cfbc03>] start_kernel+0x3b0/0x3bd
Jan 6 20:13:37 localhost kernel: [ 1097.967421] [<ffffffff81cfb388>] x86_64_start_reservations+0x132/0x136
Jan 6 20:13:37 localhost kernel: [ 1097.969660] [<ffffffff81cfb140>] ? early_idt_handlers+0x140/0x140
Jan 6 20:13:37 localhost kernel: [ 1097.971888] [<ffffffff81cfb459>] x86_64_start_kernel+0xcd/0xdc
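For anyone wanting to pull the same list of faulting processes out of their own kern.log, here is a minimal sketch. It assumes the log uses the same line layout as the excerpts above. The summarize_faults function name is made up for illustration, and the sample lines are shortened copies of lines from this post.

```python
import re

# Header line of each fault, e.g. "general protection fault: 0000 [#1] SMP".
FAULT = re.compile(r"general protection fault: \d+ \[#(\d+)\]")
# The "Pid: 2564, comm: zsh ..." line that follows the header,
# together with the kernel uptime stamp in square brackets.
PID_COMM = re.compile(r"\[\s*(\d+\.\d+)\]\s+Pid: (\d+), comm: (\S+)")


def summarize_faults(lines):
    """Return [(fault_number, uptime_seconds, pid, comm), ...] in log order."""
    faults = []
    pending = None  # fault number still waiting for its Pid/comm line
    for line in lines:
        m = FAULT.search(line)
        if m:
            pending = int(m.group(1))
            continue
        m = PID_COMM.search(line)
        if m and pending is not None:
            faults.append(
                (pending, float(m.group(1)), int(m.group(2)), m.group(3))
            )
            pending = None
    return faults


if __name__ == "__main__":
    sample = [
        "Jan 6 20:13:35 localhost kernel: [ 1096.184250] general protection fault: 0000 [#1] SMP",
        "Jan 6 20:13:35 localhost kernel: [ 1096.186339] CPU 1",
        "Jan 6 20:13:35 localhost kernel: [ 1096.188008] Pid: 2564, comm: zsh Not tainted 3.2.0-35-generic #55-Ubuntu",
        "Jan 6 20:13:37 localhost kernel: [ 1097.856881] Pid: 0, comm: swapper/0 Tainted: G D 3.2.0-35-generic #55-Ubuntu",
    ]
    for number, uptime, pid, comm in summarize_faults(sample):
        print(f"#{number} at {uptime:.3f}s: {comm} (pid {pid})")
```

Pid/comm lines that do not follow a general protection fault header, such as the one in the bad page state report, are skipped on purpose. To run it on a real log, pass an open file object instead of the sample list.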
What is going on here? What can I do?
Try memtest. But since your traces appear so early, I doubt it is the memory. What hardware does your server have? Have you done any tweaking or overclocking?