xen-devel

[Xen-devel] RE: Kernel BUG at arch/x86/mm/tlb.c:61

To: <giamteckchoon@xxxxxxxxx>
Subject: [Xen-devel] RE: Kernel BUG at arch/x86/mm/tlb.c:61
From: MaoXiaoyun <tinnycloud@xxxxxxxxxxx>
Date: Thu, 14 Apr 2011 19:16:37 +0800
Cc: jeremy@xxxxxxxx, xen devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, konrad.wilk@xxxxxxxxxx
Delivery-date: Thu, 14 Apr 2011 04:17:35 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
Importance: Normal
In-reply-to: <BLU157-w30A1A208238A9031F0D18EDAAD0@xxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <COL0-MC1-F14hmBzxHs00230882@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx>, <BLU157-w488E5FEBD5E2DBC0666EF1DAA70@xxxxxxx>, <BLU157-w5025BFBB4B1CDFA7AA0966DAA90@xxxxxxx>, <BLU157-w540B39FBA137B4D96278D2DAA90@xxxxxxx>, <BANLkTimgh_iip27zkDPNV9r7miwbxHmdVg@xxxxxxxxxxxxxx>, <BANLkTimkMgYNyANcKiZu5tJTL4==zdP3xg@xxxxxxxxxxxxxx>, <BLU157-w116F1BB57ABFDE535C7851DAA80@xxxxxxx>, <4DA3438A.6070503@xxxxxxxx>, <BLU157-w2C6CD57CEA345B8D115E8DAAB0@xxxxxxx>, <BLU157-w36F4E0A7503A357C9DE6A3DAAB0@xxxxxxx>, <20110412100000.GA15647@xxxxxxxxxxxx>, <BLU157-w14B84A51C80B41AB72B6CBDAAD0@xxxxxxx>, <BANLkTinNxLnJxtZD68ODLSJqafq0tDRPfw@xxxxxxxxxxxxxx>, <BLU157-w30A1A208238A9031F0D18EDAAD0@xxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Hi:
 
       As I go through the code:
       From tlb.c:60, it looks like cpu_tlbstate.state is TLBSTATE_OK, which
indicates the CPU is in user space, but the caller's check at mmu.c:1512,
(active_mm == mm), indicates kernel space, so the two conditions conflict.
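
   For reference, the per-cpu state that tlb.c:60 checks is defined in
arch/x86/include/asm/tlbflush.h. I am quoting the definitions below from memory
for this kernel era, so the exact layout may differ slightly, but they show what
TLBSTATE_OK means and why leave_mm() refuses to run in that state:

======arch/x86/include/asm/tlbflush.h (from memory, approximate)===
/* TLBSTATE_OK:   this CPU is actively using cpu_tlbstate.active_mm.
 * TLBSTATE_LAZY: this CPU only holds a lazy reference to it, so it is
 *                safe to switch %cr3 away behind the task's back. */
#define TLBSTATE_OK     1
#define TLBSTATE_LAZY   2

struct tlb_state {
        struct mm_struct *active_mm;    /* mm whose pagetables are loaded */
        int state;                      /* TLBSTATE_OK or TLBSTATE_LAZY */
};
DECLARE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate);

   leave_mm() is only meant to be called while the CPU is in lazy TLB mode,
hence the BUG() at tlb.c:61 when it runs with cpu_tlbstate.state == TLBSTATE_OK.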
    
   Well, the panicking CPU is handling an IPI at the time; could something be
wrong with the CPU mask?
 
   thanks. 
 
======arch/x86/mm/tlb.c===
 58 void leave_mm(int cpu)
 59 {
 60         if (percpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
 61                 BUG();
 62         cpumask_clear_cpu(cpu,
 63                           mm_cpumask(percpu_read(cpu_tlbstate.active_mm)));
 64         load_cr3(swapper_pg_dir);
 65 }
 66 EXPORT_SYMBOL_GPL(leave_mm);
 67
 
///arch/x86/xen/mmu.c
 
1502 #ifdef CONFIG_SMP
1503 /* Another cpu may still have their %cr3 pointing at the pagetable, so
1504    we need to repoint it somewhere else before we can unpin it. */
1505 static void drop_other_mm_ref(void *info)
1506 {
1507         struct mm_struct *mm = info;
1508         struct mm_struct *active_mm;
1509
1510         active_mm = percpu_read(cpu_tlbstate.active_mm);
1511
1512         if (active_mm == mm)
1513                 leave_mm(smp_processor_id());
1514
1515         /* If this cpu still has a stale cr3 reference, then make sure
1516            it has been flushed. */
1517         if (percpu_read(xen_current_cr3) == __pa(mm->pgd))
1518                 load_cr3(swapper_pg_dir);
1519 }
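
   For the CPU-mask question: drop_other_mm_ref() is delivered as an IPI from
the unpin path in the same file. The sketch below is reconstructed from memory
rather than copied from 2.6.32.36, so the exact body may differ, but it shows
how the target set is built from mm_cpumask(mm) plus the per-cpu
xen_current_cr3 check:

///arch/x86/xen/mmu.c (reconstructed sketch, not a verbatim quote)
static void xen_drop_mm_ref(struct mm_struct *mm)
{
        cpumask_var_t mask;
        unsigned cpu;

        if (current->active_mm == mm) {
                if (current->mm == mm)
                        load_cr3(swapper_pg_dir);
                else
                        leave_mm(smp_processor_id());
        }

        /* Without a scratch mask, fall back to per-CPU single calls. */
        if (!alloc_cpumask_var(&mask, GFP_ATOMIC)) {
                for_each_online_cpu(cpu) {
                        if (!cpumask_test_cpu(cpu, mm_cpumask(mm))
                            && per_cpu(xen_current_cr3, cpu) != __pa(mm->pgd))
                                continue;
                        smp_call_function_single(cpu, drop_other_mm_ref, mm, 1);
                }
                return;
        }

        /* Start from the "official" set of CPUs using this mm ... */
        cpumask_copy(mask, mm_cpumask(mm));

        /* ... and add CPUs whose %cr3 still points at mm's pagetable even
         * though they have already left mm_cpumask(mm). */
        for_each_online_cpu(cpu) {
                if (per_cpu(xen_current_cr3, cpu) == __pa(mm->pgd))
                        cpumask_set_cpu(cpu, mask);
        }

        if (!cpumask_empty(mask))
                smp_call_function_many(mask, drop_other_mm_ref, mm, 1);

        free_cpumask_var(mask);
}

   If the mm were re-activated on a target CPU between the time this mask is
built and the time the IPI arrives, that CPU would enter drop_other_mm_ref()
with TLBSTATE_OK and hit the BUG() above; that is only a guess along the lines
of the CPU-mask question, not something I have verified.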


 
> Date: Thu, 14 Apr 2011 15:26:14 +0800
> Subject: Re: Kernel BUG at arch/x86/mm/tlb.c:61
> From: giamteckchoon@xxxxxxxxx
> To: tinnycloud@xxxxxxxxxxx
> CC: xen-devel@xxxxxxxxxxxxxxxxxxx; jeremy@xxxxxxxx; konrad.wilk@xxxxxxxxxx
>
> 2011/4/14 MaoXiaoyun <tinnycloud@xxxxxxxxxxx>:
> > Hi:
> >
> >       I've done the test with "cpuidle=0 cpufreq=none"; two machines crashed.
> >
> > blktap_sysfs_destroy
> > blktap_sysfs_destroy
> > blktap_sysfs_create: adding attributes for dev ffff8800ad581000
> > blktap_sysfs_create: adding attributes for dev ffff8800a48e3e00
> > ------------[ cut here ]------------
> > kernel BUG at arch/x86/mm/tlb.c:61!
> > invalid opcode: 0000 [#1] SMP
> > last sysfs file: /sys/block/tapdeve/dev
> > CPU 0
> > Modules linked in: 8021q garp blktap xen_netback xen_blkback blkback_pagemap nbd bridge stp llc autofs4 ipmi_devintf ipmi_si ipmi_msghandler
> > lockd sunrpc bonding ipv6 xenfs dm_multipath video output sbs sbshc parport_pc lp parport ses enclosure snd_seq_dummy bnx2 serio_raw
> > snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm i2c_i801 snd_timer i2c_core snd iTCO_wdt
> > pata_acpi soundcore iTCO_vendor_support ata_generic snd_page_alloc pcspkr ata_piix shpchp mptsas mptscsih mptbase [last unloaded: freq_table]
> > Pid: 8022, comm: khelper Not tainted 2.6.32.36xen #1 Tecal RH2285
> > RIP: e030:[<ffffffff8103a3cb>]  [<ffffffff8103a3cb>] leave_mm+0x15/0x46
> > RSP: e02b:ffff88002803ee48  EFLAGS: 00010046
> > RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff81675980
> > RDX: ffff88002803ee78 RSI: 0000000000000000 RDI: 0000000000000000
> > RBP: ffff88002803ee48 R08: ffff8800a4929000 R09: dead000000200200
> > R10: dead000000100100 R11: ffffffff81447292 R12: ffff88012ba07b80
> > R13: ffff880028046020 R14: 00000000000004fb R15: 0000000000000000
> > FS:  00007f410af416e0(0000) GS:ffff88002803b000(0000) knlGS:0000000000000000
> > CS:  e033  DS: 0000 ES: 0000 CR0: 000000008005003b
> > CR2: 0000000000469000 CR3: 00000000ad639000 CR4: 0000000000002660
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > Process khelper (pid: 8022, threadinfo ffff8800a4846000, task ffff8800a9ed0000)
> > Stack:
> >  ffff88002803ee68 ffffffff8100e4a4 0000000000000001 ffff880097de3b88
> > <0> ffff88002803ee98 ffffffff81087224 ffff88002803ee78 ffff88002803ee78
> > <0> ffff88015f808180 00000000000004fb ffff88002803eea8 ffffffff810100e8
> > Call Trace:
> >  <IRQ>
> >  [<ffffffff8100e4a4>] drop_other_mm_ref+0x2a/0x53
> >  [<ffffffff81087224>] generic_smp_call_function_single_interrupt+0xd8/0xfc
> >  [<ffffffff810100e8>] xen_call_function_single_interrupt+0x13/0x28
> >  [<ffffffff810a936a>] handle_IRQ_event+0x66/0x120
> >  [<ffffffff810aac5b>] handle_percpu_irq+0x41/0x6e
> >  [<ffffffff8128c1a8>] __xen_evtchn_do_upcall+0x1ab/0x27d
> >  [<ffffffff8128dcf9>] xen_evtchn_do_upcall+0x33/0x46
> >  [<ffffffff81013efe>] xen_do_hypervisor_callback+0x1e/0x30
> >  <EOI>
> >  [<ffffffff81447292>] ? _spin_unlock_irqrestore+0x15/0x17
> >  [<ffffffff8100f8af>] ? xen_restore_fl_direct_end+0x0/0x1
> >  [<ffffffff81113f75>] ? flush_old_exec+0x3ac/0x500
> >  [<ffffffff81150dc9>] ? load_elf_binary+0x0/0x17ef
> >  [<ffffffff81150dc9>] ? load_elf_binary+0x0/0x17ef
> >  [<ffffffff81151161>] ? load_elf_binary+0x398/0x17ef
> >  [<ffffffff81042fcf>] ? need_resched+0x23/0x2d
> >  [<ffffffff811f463c>] ? process_measurement+0xc0/0xd7
> >  [<ffffffff81150dc9>] ? load_elf_binary+0x0/0x17ef
> >  [<ffffffff81113098>] ? search_binary_handler+0xc8/0x255
> >  [<ffffffff81114366>] ? do_execve+0x1c3/0x29e
> >  [<ffffffff8101155d>] ? sys_execve+0x43/0x5d
> >  [<ffffffff8106fc45>] ? __call_usermodehelper+0x0/0x6f
> >  [<ffffffff81013e28>] ? kernel_execve+0x68/0xd0
> >  [<ffffffff8106fc45>] ? __call_usermodehelper+0x0/0x6f
> >  [<ffffffff8100f8af>] ? xen_restore_fl_direct_end+0x0/0x1
> >  [<ffffffff8106fb64>] ? ____call_usermodehelper+0x113/0x11e
> >  [<ffffffff81013daa>] ? child_rip+0xa/0x20
> >  [<ffffffff8106fc45>] ? __call_usermodehelper+0x0/0x6f
> >  [<ffffffff81012f91>] ? int_ret_from_sys_call+0x7/0x1b
> >  [<ffffffff8101371d>] ? retint_restore_args+0x5/0x6
> >  [<ffffffff81013da0>] ? child_rip+0x0/0x20
> > Code: 41 5e 41 5f c9 c3 55 48 89 e5 0f 1f 44 00 00 e8 17 ff ff ff c9 c3 55 48 89 e5 0f 1f 44 00 00 65 8b 04 25 c8 55 01 00 ff c8 75 04 <0f> 0b eb fe  65 48 8b 34 25 c0 55 01 00 48 81 c6 b8 02 00 00 e8
> > RIP  [<ffffffff8103a3cb>] leave_mm+0x15/0x46
> >  RSP <ffff88002803ee48>
> > ---[ end trace 1522f17fdfc9162d ]---
> > Kernel panic - not syncing: Fatal exception in interrupt
> > Pid: 8022, comm: khelper Tainted: G      D    2.6.32.36xen #1
> > Call Trace:
> >  <IRQ>  [<ffffffff8105682e>] panic+0xe0/0x19a
> >  [<ffffffff8144006a>] ? init_amd+0x296/0x37a
> >  [<ffffffff8100f169>] ? xen_force_evtchn_callback+0xd/0xf
> >  [<ffffffff8100f8c2>] ? check_events+0x12/0x20
> >  [<ffffffff8100f8af>] ? xen_restore_fl_direct_end+0x0/0x1
> >  [<ffffffff81056487>] ? print_oops_end_marker+0x23/0x25
> >  [<ffffffff81448165>] oops_end+0xb6/0xc6
> >  [<ffffffff810166e5>] die+0x5a/0x63
> >  [<ffffffff81447a3c>] do_trap+0x115/0x124
> >  [<ffffffff810148e6>] do_invalid_op+0x9c/0xa5
> >  [<ffffffff8103a3cb>] ? leave_mm+0x15/0x46
> >  [<ffffffff8100f6e6>] ? xen_clocksource_read+0x21/0x23
> >  [<ffffffff8100f258>] ? HYPERVISOR_vcpu_op+0xf/0x11
> >  [<ffffffff8100f753>] ? xen_vcpuop_set_next_event+0x52/0x67
> >  [<ffffffff81013b3b>] invalid_op+0x1b/0x20
> >  [<ffffffff81447292>] ? _spin_unlock_irqrestore+0x15/0x17
> >  [<ffffffff8103a3cb>] ? leave_mm+0x15/0x46
> >  [<ffffffff8100e4a4>] drop_other_mm_ref+0x2a/0x53
> >  [<ffffffff81087224>] generic_smp_call_function_single_interrupt+0xd8/0xfc
> >  [<ffffffff810100e8>] xen_call_function_single_interrupt+0x13/0x28
> >  [<ffffffff810a936a>] handle_IRQ_event+0x66/0x120
> >  [<ffffffff810aac5b>] handle_percpu_irq+0x41/0x6e
> >  [<ffffffff8128c1a8>] __xen_evtchn_do_upcall+0x1ab/0x27d
> >  [<ffffffff8128dcf9>] xen_evtchn_do_upcall+0x33/0x46
> >  [<ffffffff81013efe>] xen_do_hypervisor_callback+0x1e/0x30
> >  <EOI>  [<ffffffff81447292>] ? _spin_unlock_irqrestore+0x15/0x17
> >  [<ffffffff8100f8af>] ? xen_restore_fl_direct_end+0x0/0x1
> >  [<ffffffff81113f75>] ? flush_old_exec+0x3ac/0x500
> >  [<ffffffff81150dc9>] ? load_elf_binary+0x0/0x17ef
> >  [<ffffffff81150dc9>] ? load_elf_binary+0x0/0x17ef
> >  [<ffffffff81151161>] ? load_elf_binary+0x398/0x17ef
> >  [<ffffffff81042fcf>] ? need_resched+0x23/0x2d
> >  [<ffffffff811f463c>] ? process_measurement+0xc0/0xd7
> >  [<ffffffff81150dc9>] ? load_elf_binary+0x0/0x17ef
> >  [<ffffffff81113098>] ? search_binary_handler+0xc8/0x255
> >  [<ffffffff81114366>] ? do_execve+0x1c3/0x29e
> >  [<ffffffff8101155d>] ? sys_execve+0x43/0x5d
> >  [<ffffffff8106fc45>] ? __call_usermodehelper+0x0/0x6f
> >  [<ffffffff81013e28>] ? kernel_execve+0x68/0xd0
> >  [<ffffffff8106fc45>] ? __call_usermodehelper+0x0/0x6f
> >  [<ffffffff8100f8af>] ? xen_restore_fl_direct_end+0x0/0x1
> >  [<ffffffff8106fb64>] ? ____call_usermodehelper+0x113/0x11e
> >  [<ffffffff81013daa>] ? child_rip+0xa/0x20
> >  [<ffffffff8106fc45>] ? __call_usermodehelper+0x0/0x6f
> >  [<ffffffff81012f91>] ? int_ret_from_sys_call+0x7/0x1b
> >  [<ffffffff8101371d>] ? retint_restore_args+0x5/0x6
> >  [<ffffffff81013da0>] ? child_rip+0x0/0x20
> > (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
> >
> >> Date: Tue, 12 Apr 2011 06:00:00 -0400
> >> From: konrad.wilk@xxxxxxxxxx
> >> To: tinnycloud@xxxxxxxxxxx
> >> CC: xen-devel@xxxxxxxxxxxxxxxxxxx; giamteckchoon@xxxxxxxxx;
> >> jeremy@xxxxxxxx
> >> Subject: Re: Kernel BUG at arch/x86/mm/tlb.c:61
> >>
> >> On Tue, Apr 12, 2011 at 05:11:51PM +0800, MaoXiaoyun wrote:
> >> >
> >> > Hi :
> >> >
> >> > We are using pvops kernel 2.6.32.36 + xen 4.0.1, but confronted a kernel
> >> > panic bug.
> >> >
> >> > 2.6.32.36 Kernel:
> >> > http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=bb1a15e55ec665a64c8a9c6bd699b1f16ac01ff4
> >> > Xen 4.0.1 http://xenbits.xen.org/hg/xen-4.0-testing.hg/rev/b536ebfba183
> >> >
> >> > Our test is simple: 24 HVMs (Win2003) on a single host, each HVM looping
> >> > through a restart every 15 minutes.
> >>
> >> What is the storage that you are using for your guests? AoE? Local disks?
> >>
> >> > About 17 machines are involved in the test; after a 10-hour run, one
> >> > confronted a crash at arch/x86/mm/tlb.c:61
> >> >
> >> > Currently I am trying "cpuidle=0 cpufreq=none" tests based on Teck's
> >> > suggestion.
> >> >
> >> > Any comments, thanks.
> >> >



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel