在VirtualBox上运行的Linux内核oops会破坏服务器上一些与IO相关的function

我们在Windows 7机器上的VirtualBox上运行的CentOS 6.3版本遇到问题。 症状如下:

一切正常工作几个小时,甚至几天。 然后发生一些破坏系统的事情。

事情发生之后,我们仍然可以做些什么

  • 访问Web服务器
  • 使用现有的SSH会话来运行顶级和免费的

什么不行

  • 开始新的SSH会话(input用户名和密码后挂起)
  • 在现有SSH会话中运行ls(挂起)
  • SSI包括从我们的Web服务器获取远程机器的数据
  • 可能更多

发生这种情况时,我们在服务器上看到的是以下内容

  • 负荷平均从基本没有到3左右
  • CPU使用率仍然很低(5%)
  • 磁盘活动不足(正在运行iostat)
  • 大量的内存可用
  • 大量的磁盘空间可用

在/ var / log / messages中,我们得到以下内容:

Jun 14 01:10:48 devvm kernel: e1000 0000:00:03.0: eth0: Detected Tx Unit Hang Jun 14 01:10:48 devvm kernel: Tx Queue <0> Jun 14 01:10:48 devvm kernel: TDH <2e> Jun 14 01:10:48 devvm kernel: TDT <30> Jun 14 01:10:48 devvm kernel: next_to_use <30> Jun 14 01:10:48 devvm kernel: next_to_clean <2e> Jun 14 01:10:48 devvm kernel: buffer_info[next_to_clean] Jun 14 01:10:48 devvm kernel: time_stamp <1038284db> Jun 14 01:10:48 devvm kernel: next_to_watch <2f> Jun 14 01:10:48 devvm kernel: jiffies <103828b42> Jun 14 01:10:48 devvm kernel: next_to_watch.status <0> Jun 14 01:10:50 devvm kernel: e1000 0000:00:03.0: eth0: Detected Tx Unit Hang Jun 14 01:10:50 devvm kernel: Tx Queue <0> Jun 14 01:10:50 devvm kernel: TDH <2e> Jun 14 01:10:50 devvm kernel: TDT <30> Jun 14 01:10:50 devvm kernel: next_to_use <30> Jun 14 01:10:50 devvm kernel: next_to_clean <2e> Jun 14 01:10:50 devvm kernel: buffer_info[next_to_clean] Jun 14 01:10:50 devvm kernel: time_stamp <1038284db> Jun 14 01:10:50 devvm kernel: next_to_watch <2f> Jun 14 01:10:50 devvm kernel: jiffies <103829312> Jun 14 01:10:50 devvm kernel: next_to_watch.status <0> Jun 14 01:10:52 devvm kernel: ------------[ cut here ]------------ Jun 14 01:10:52 devvm kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26d/0x280() (Not tainted) Jun 14 01:10:52 devvm kernel: Hardware name: VirtualBox Jun 14 01:10:52 devvm kernel: NETDEV WATCHDOG: eth0 (e1000): transmit queue 0 timed out Jun 14 01:10:52 devvm kernel: Modules linked in: vboxsf(U) ipv6 ppdev parport_pc parport microcode sg vboxguest(U) i2c_piix4 i2c_core e1000 snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc pcnet32 mii ext4 mbcache jbd2 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] Jun 14 01:10:52 devvm kernel: Pid: 0, comm: swapper Not tainted 2.6.32-279.el6.x86_64 #1 Jun 14 01:10:52 devvm kernel: Call Trace: Jun 14 01:10:52 devvm kernel: <IRQ> [<ffffffff8106b747>] ? warn_slowpath_common+0x87/0xc0 Jun 14 01:10:52 devvm kernel: [<ffffffff8106b836>] ? warn_slowpath_fmt+0x46/0x50 Jun 14 01:10:52 devvm kernel: [<ffffffff814595fd>] ? dev_watchdog+0x26d/0x280 Jun 14 01:10:52 devvm kernel: [<ffffffff81099138>] ? sched_clock_cpu+0xb8/0x110 Jun 14 01:10:52 devvm kernel: [<ffffffff81459390>] ? dev_watchdog+0x0/0x280 Jun 14 01:10:52 devvm kernel: [<ffffffff8107e897>] ? run_timer_softirq+0x197/0x340 Jun 14 01:10:52 devvm kernel: [<ffffffff810a21c0>] ? tick_sched_timer+0x0/0xc0 Jun 14 01:10:52 devvm kernel: [<ffffffff8102b40d>] ? lapic_next_event+0x1d/0x30 Jun 14 01:10:52 devvm kernel: [<ffffffff81073ec1>] ? __do_softirq+0xc1/0x1e0 Jun 14 01:10:52 devvm kernel: [<ffffffff81096c50>] ? hrtimer_interrupt+0x140/0x250 Jun 14 01:10:52 devvm kernel: [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30 Jun 14 01:10:52 devvm kernel: [<ffffffff8100de85>] ? do_softirq+0x65/0xa0 Jun 14 01:10:52 devvm kernel: [<ffffffff81073ca5>] ? irq_exit+0x85/0x90 Jun 14 01:10:52 devvm kernel: [<ffffffff81505be0>] ? smp_apic_timer_interrupt+0x70/0x9b Jun 14 01:10:52 devvm kernel: [<ffffffff8100bc13>] ? apic_timer_interrupt+0x13/0x20 Jun 14 01:10:52 devvm kernel: <EOI> [<ffffffff810387cb>] ? native_safe_halt+0xb/0x10 Jun 14 01:10:52 devvm kernel: [<ffffffff810149cd>] ? default_idle+0x4d/0xb0 Jun 14 01:10:52 devvm kernel: [<ffffffff81009e06>] ? cpu_idle+0xb6/0x110 Jun 14 01:10:52 devvm kernel: [<ffffffff814e433a>] ? rest_init+0x7a/0x80 Jun 14 01:10:52 devvm kernel: [<ffffffff81c21f7b>] ? start_kernel+0x424/0x430 Jun 14 01:10:52 devvm kernel: [<ffffffff81c2133a>] ? x86_64_start_reservations+0x125/0x129 Jun 14 01:10:52 devvm kernel: [<ffffffff81c21438>] ? x86_64_start_kernel+0xfa/0x109 Jun 14 01:10:52 devvm kernel: ---[ end trace 2c7bb984812cf120 ]--- Jun 14 01:10:52 devvm kernel: e1000 0000:00:03.0: eth0: Reset adapter Jun 14 01:10:53 devvm abrtd: Directory 'oops-2013-06-14-01:10:53-1537-0' creation detected Jun 14 01:10:53 devvm abrt-dump-oops: Reported 1 kernel oopses to Abrt Jun 14 01:10:53 devvm abrtd: Can't open file '/var/spool/abrt/oops-2013-06-14-01:10:53-1537-0/uid': No such file or directory Jun 14 01:10:55 devvm kernel: Bridge firewalling registered 

在此之后,我们每隔两分钟就会看到:

 Jun 14 01:14:22 devvm kernel: INFO: task events/0:19 blocked for more than 120 seconds. Jun 14 01:14:22 devvm kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 14 01:14:22 devvm kernel: events/0 D 0000000000000000 0 19 2 0x00000000 Jun 14 01:14:22 devvm kernel: ffff880116c4fb90 0000000000000046 00000000ffffffff 0000000000000008 Jun 14 01:14:22 devvm kernel: 0000000000016680 0000000000016680 ffff880028210400 0000000000016680 Jun 14 01:14:22 devvm kernel: ffff880116c4daf8 ffff880116c4ffd8 000000000000fb88 ffff880116c4daf8 Jun 14 01:14:22 devvm kernel: Call Trace: Jun 14 01:14:22 devvm kernel: [<ffffffff8105b483>] ? perf_event_task_sched_out+0x33/0x80 Jun 14 01:14:22 devvm kernel: [<ffffffff814fe6a5>] schedule_timeout+0x215/0x2e0 Jun 14 01:14:22 devvm kernel: [<ffffffff8100975d>] ? __switch_to+0x13d/0x320 Jun 14 01:14:22 devvm kernel: [<ffffffff814fe323>] wait_for_common+0x123/0x180 Jun 14 01:14:22 devvm kernel: [<ffffffff81060250>] ? default_wake_function+0x0/0x20 Jun 14 01:14:22 devvm kernel: [<ffffffff814fe43d>] wait_for_completion+0x1d/0x20 Jun 14 01:14:22 devvm kernel: [<ffffffff8108d093>] __cancel_work_timer+0x1b3/0x1e0 Jun 14 01:14:22 devvm kernel: [<ffffffff8108cbe0>] ? wq_barrier_func+0x0/0x20 Jun 14 01:14:22 devvm kernel: [<ffffffff8108d0f0>] cancel_work_sync+0x10/0x20 Jun 14 01:14:22 devvm kernel: [<ffffffffa01c5ca5>] e1000_down_and_stop+0x25/0x50 [e1000] Jun 14 01:14:22 devvm kernel: [<ffffffffa01cb695>] e1000_down+0x155/0x200 [e1000] Jun 14 01:14:22 devvm kernel: [<ffffffffa01cbcb0>] ? e1000_reset_task+0x0/0xe0 [e1000] Jun 14 01:14:22 devvm kernel: [<ffffffffa01cbd1e>] e1000_reset_task+0x6e/0xe0 [e1000] Jun 14 01:14:22 devvm kernel: [<ffffffff8108c760>] worker_thread+0x170/0x2a0 Jun 14 01:14:22 devvm kernel: [<ffffffff810920d0>] ? autoremove_wake_function+0x0/0x40 Jun 14 01:14:22 devvm kernel: [<ffffffff8108c5f0>] ? worker_thread+0x0/0x2a0 Jun 14 01:14:22 devvm kernel: [<ffffffff81091d66>] kthread+0x96/0xa0 Jun 14 01:14:22 devvm kernel: [<ffffffff8100c14a>] child_rip+0xa/0x20 Jun 14 01:14:22 devvm kernel: [<ffffffff81091cd0>] ? kthread+0x0/0xa0 Jun 14 01:14:22 devvm kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20 Jun 14 01:14:22 devvm kernel: INFO: task parted:8069 blocked for more than 120 seconds. Jun 14 01:14:22 devvm kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 14 01:14:22 devvm kernel: parted D 0000000000000003 0 8069 7994 0x00000080 Jun 14 01:14:22 devvm kernel: ffff8800908b3bb8 0000000000000082 0000000000000000 ffff88010ab50080 Jun 14 01:14:22 devvm kernel: ffff880116c7d500 0000000000000001 0000000000000000 0000000000000000 Jun 14 01:14:22 devvm kernel: ffff88010ab50638 ffff8800908b3fd8 000000000000fb88 ffff88010ab50638 Jun 14 01:14:22 devvm kernel: Call Trace: Jun 14 01:14:22 devvm kernel: [<ffffffff814fe6a5>] schedule_timeout+0x215/0x2e0 Jun 14 01:14:22 devvm kernel: [<ffffffff814fe323>] wait_for_common+0x123/0x180 Jun 14 01:14:22 devvm kernel: [<ffffffff81060250>] ? default_wake_function+0x0/0x20 Jun 14 01:14:22 devvm kernel: [<ffffffff8112b6d0>] ? lru_add_drain_per_cpu+0x0/0x10 Jun 14 01:14:22 devvm kernel: [<ffffffff814fe43d>] wait_for_completion+0x1d/0x20 Jun 14 01:14:22 devvm kernel: [<ffffffff8108d177>] flush_work+0x77/0xc0 Jun 14 01:14:22 devvm kernel: [<ffffffff8108cbe0>] ? wq_barrier_func+0x0/0x20 Jun 14 01:14:22 devvm kernel: [<ffffffff8108d2f3>] schedule_on_each_cpu+0x133/0x180 Jun 14 01:14:22 devvm kernel: [<ffffffff811ad440>] ? invalidate_bh_lru+0x0/0x50 Jun 14 01:14:22 devvm kernel: [<ffffffff8112ae35>] lru_add_drain_all+0x15/0x20 Jun 14 01:14:22 devvm kernel: [<ffffffff811adf6a>] invalidate_bdev+0x2a/0x50 Jun 14 01:14:22 devvm kernel: [<ffffffff8125e9a4>] blkdev_ioctl+0x3b4/0x6e0 Jun 14 01:14:22 devvm kernel: [<ffffffff811b381c>] block_ioctl+0x3c/0x40 Jun 14 01:14:22 devvm kernel: [<ffffffff8118dec2>] vfs_ioctl+0x22/0xa0 Jun 14 01:14:22 devvm kernel: [<ffffffff8118e064>] do_vfs_ioctl+0x84/0x580 Jun 14 01:14:22 devvm kernel: [<ffffffff8118e5e1>] sys_ioctl+0x81/0xa0 Jun 14 01:14:22 devvm kernel: [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b 

在/ var / spool / abrt / oops-2013-06-14-01:10:53-1537-0我们可以看到以下信息:

在回溯:

 WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26d/0x280() (Not tainted) Hardware name: VirtualBox NETDEV WATCHDOG: eth0 (e1000): transmit queue 0 timed out Modules linked in: vboxsf(U) ipv6 ppdev parport_pc parport microcode sg vboxguest(U) i2c_piix4 i2c_core e1000 snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc pcnet32 mii ext4 mbcache jbd2 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] Pid: 0, comm: swapper Not tainted 2.6.32-279.el6.x86_64 #1 Call Trace: <IRQ> [<ffffffff8106b747>] ? warn_slowpath_common+0x87/0xc0 [<ffffffff8106b836>] ? warn_slowpath_fmt+0x46/0x50 [<ffffffff814595fd>] ? dev_watchdog+0x26d/0x280 [<ffffffff81099138>] ? sched_clock_cpu+0xb8/0x110 [<ffffffff81459390>] ? dev_watchdog+0x0/0x280 [<ffffffff8107e897>] ? run_timer_softirq+0x197/0x340 [<ffffffff810a21c0>] ? tick_sched_timer+0x0/0xc0 [<ffffffff8102b40d>] ? lapic_next_event+0x1d/0x30 [<ffffffff81073ec1>] ? __do_softirq+0xc1/0x1e0 [<ffffffff81096c50>] ? hrtimer_interrupt+0x140/0x250 [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30 [<ffffffff8100de85>] ? do_softirq+0x65/0xa0 [<ffffffff81073ca5>] ? irq_exit+0x85/0x90 [<ffffffff81505be0>] ? smp_apic_timer_interrupt+0x70/0x9b [<ffffffff8100bc13>] ? apic_timer_interrupt+0x13/0x20 <EOI> [<ffffffff810387cb>] ? native_safe_halt+0xb/0x10 [<ffffffff810149cd>] ? default_idle+0x4d/0xb0 [<ffffffff81009e06>] ? cpu_idle+0xb6/0x110 [<ffffffff814e433a>] ? rest_init+0x7a/0x80 [<ffffffff81c21f7b>] ? start_kernel+0x424/0x430 [<ffffffff81c2133a>] ? x86_64_start_reservations+0x125/0x129 [<ffffffff81c21438>] ? x86_64_start_kernel+0xfa/0x109 

在cmdline:

 ro root=/dev/mapper/vg_01-lv_root rd_NO_LUKS LANG=en_US.UTF-8 KEYBOARDTYPE=pc KEYTABLE=sv-latin1 rd_NO_MD SYSFONT=latarcyrheb-sun16 rd_LVM_LV=vg_01/lv_root crashkernel=129M@0M rhgb quiet rd_LVM_LV=vg_01/lv_swap rd_NO_DM rhgb quie 

附加信息:

 # uname -a Linux devvm 2.6.32-279.el6.x86_64 #1 SMP Fri Jun 22 12:19:21 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux # cat /etc/redhat-release CentOS release 6.3 (Final) 

VirtualBox版本4.2.6。

任何有关如何进行故障排除的见解, 如果您需要更多信息,请告诉我。

尝试切换NICtypes。 可能是一个驱动程序的错误。