WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
xen-users

Re: [Xen-users] Xen 3.0.0 32bit-pae (testing changeset 8270) crashes(pgt

To: xen-users@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-users] Xen 3.0.0 32bit-pae (testing changeset 8270) crashes(pgtable.c:284, kernel bug?)
From: Ralph Passgang <ralph@xxxxxxxxxxxxx>
Date: Thu, 2 Feb 2006 15:09:48 +0100
Cc: Ian Pratt <m+Ian.Pratt@xxxxxxxxxxxx>, Keir Fraser <Keir.Fraser@xxxxxxxxxxxx>, ian.pratt@xxxxxxxxxxxx
Delivery-date: Thu, 02 Feb 2006 14:20:07 +0000
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <A95E2296287EAD4EB592B5DEEFCE0E9D40A4C4@xxxxxxxxxxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-users-request@lists.xensource.com?subject=help>
List-id: Xen user discussion <xen-users.lists.xensource.com>
List-post: <mailto:xen-users@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-users>, <mailto:xen-users-request@lists.xensource.com?subject=unsubscribe>
References: <A95E2296287EAD4EB592B5DEEFCE0E9D40A4C4@xxxxxxxxxxxxxxxxxxxxxxxxxxx>
Sender: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: KMail/1.9.1
On Thursday, 2 February 2006 12:52, Ian Pratt wrote:
> > 3.0.1 seems to fix the bug I saw on my two machines, but now there is
> > another (but somehow related) problem for me in 3.0.1-pae. I don't know
> > if it's still related to the 3ware controller, but at least it only
> > appears for domains that have memory above the 32-bit address space
> > again, so the first started domUs run fine. The big difference is that
> > I don't have any complete freezes of the Xen machine anymore; just
> > domUs are crashing this time.
>
> Interesting. It looks like xen is running out of memory below 4GB, and
> can't service the domain's request for a new L3 PGD, causing the domain
> to bug out.
>
> Are you using dom0_mem= on the xen command line to constrain dom0's
> memory usage or are you relying on dom0 releasing memory automatically as
> you start other domains? If the latter, I expect dom0 is hogging all the
> pages below 4GB. [Grrr, PAE is such a crock...]
>
> Given that your 3ware controller is already putting pressure on the
> bottom 4GB you'd be better off setting your initial dom0 memory at boot
> time.

You're right: with dom0_mem set to 196MB I can start 20 domains using all 
available RAM without any problems. No domU crashes, and no crash of the 
whole system either. It really seems to work now. Great, thanks a lot!
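
For anyone hitting the same problem, a minimal sketch of what the boot entry might look like with dom0's memory fixed at boot time. This assumes a GRUB legacy menu.lst; the title, kernel and initrd file names and the root device are placeholders, not the ones from my machine. Only the dom0_mem=196M option on the Xen line is the actual change.

```
# /boot/grub/menu.lst (file names and root device below are placeholders)
title  Xen 3.0.1 / XenLinux 2.6.12
    # dom0_mem on the hypervisor line sets dom0's initial allocation at boot
    kernel /boot/xen.gz dom0_mem=196M
    module /boot/vmlinuz-2.6.12-xen root=/dev/sda1 ro
    module /boot/initrd.img-2.6.12-xen
```

The option goes on the hypervisor ("kernel") line, not on the dom0 Linux ("module") line, and takes effect after a reboot.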

> Please let me know how you get on. BTW: can you get a serial line on the
> machine? It might be interesting to see some of xen's memory usage
> diagnostics.

Before I rebooted with dom0_mem set to 196MB I took some information from the 
half-crashed domU (which was still running, but not usable because more or 
less all commands were crashing):

(XEN) General information for domain 52:
(XEN)     flags=0 refcnt=3 nr_pages=32763 xenheap_pages=5 dirty_cpus={}
(XEN)     handle=e4bc6beb-2398-402e-9956-6c2975f74fea
(XEN) Rangesets belonging to domain 52:
(XEN)     Interrupts { }
(XEN)     I/O Memory { }
(XEN)     I/O Ports  { }
(XEN) Memory pages belonging to domain 52:
(XEN)     DomPage list too long to display
(XEN)     XenPage 00b70000: mfn=00000b70, caf=80000002, taf=f0000002
(XEN)     XenPage 00b71000: mfn=00000b71, caf=80000002, taf=f0000002
(XEN)     XenPage 00b72000: mfn=00000b72, caf=80000002, taf=f0000002
(XEN)     XenPage 00b73000: mfn=00000b73, caf=80000002, taf=f0000002
(XEN)     XenPage 00b63000: mfn=00000b63, caf=80000002, taf=f0000002
(XEN) VCPU information and callbacks for domain 52:
(XEN)     VCPU0: CPU3 [has=F] flags=15 upcall_pend = 00, upcall_mask = 00 dirty_cpus={3} cpu_affinity={0-31}
(XEN)     Notifying guest (virq 1, port 0, stat 0/-1/0)
(XEN)     VCPU1: CPU0 [has=F] flags=15 upcall_pend = 00, upcall_mask = 00 dirty_cpus={} cpu_affinity={0-31}
(XEN)     Notifying guest (virq 1, port 0, stat 0/-1/0)
(XEN)     VCPU2: CPU1 [has=F] flags=15 upcall_pend = 00, upcall_mask = 00 dirty_cpus={} cpu_affinity={0-31}
(XEN)     Notifying guest (virq 1, port 0, stat 0/-1/0)

(I hope I got the right information for you)
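
As a rough aid for reading the dump above, here is a minimal sketch that converts the nr_pages field into a memory size. It assumes 4 KiB pages (the normal x86 page size, also under PAE); the function name and the regular expression are illustrative and not part of any Xen tool.

```python
import re

PAGE_SIZE_KIB = 4  # assumption: 4 KiB pages, as on x86 (including PAE)


def domain_mem_mib(dump_line: str) -> float:
    """Return the memory implied by the nr_pages=... field of a dump line, in MiB."""
    match = re.search(r"nr_pages=(\d+)", dump_line)
    if match is None:
        raise ValueError("no nr_pages field in line")
    return int(match.group(1)) * PAGE_SIZE_KIB / 1024


line = "(XEN)     flags=0 refcnt=3 nr_pages=32763 xenheap_pages=5 dirty_cpus={}"
print(f"{domain_mem_mib(line):.2f} MiB")
```

For domain 52 this comes out at just under 128 MiB.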

I also saw "xen_net: Memory squeeze in netback driver." in dom0, and a lot 
of "Timer ISR/0: Time went backwards: delta=-3312252171 cpu_delta=107747829 
shadow=379343963229 ..." messages when I tried to get information with SysRq 
via serial.

Even though I have a solution now (using dom0_mem and 3.0.1), I would like to 
help solve that problem. So if you need more information, please let me know. 
I think I will have this testing machine for about a week from now. After 
that it will go into production as well, and no more debugging will be 
possible.

> Ian
>
> > the domU doesn't always crash at the very same place: sometimes at the
> > beginning of the init process, sometimes when it loads modules,
> > sometimes when services get started... Sometimes this crash happens
> > more than once before the domU panics.
> >
> > here is what I see in the domU console:
> >
> > ------------[ cut here ]------------
> > kernel BUG at <bad filename>:63723!
> > invalid operand: 0000 [#1]
> > SMP
> > Modules linked in: 8250 reiserfs efs isofs vfat fat ext3 jbd evdev pci_hotplug dm_mod sd_mod 3w_xxxx e1000 jedec_probe cfi_probe gen_probe chipreg mtdcore map_funcs i2c_i801 i2c_core parport_pc parport serial_core usbhid pcmcia yenta_socket rsrc_nonstatic pcmcia_core processor genrtc sbp2 ohci1394 ieee1394 usb_storage ohci_hcd uhci_hcd 3w_9xxx scsi_mod unix
> > CPU:    0
> > EIP:    0061:[<c01182b6>]    Not tainted VLI
> > EFLAGS: 00010282   (2.6.12.6-xen)
> > EIP is at pgd_ctor+0x26/0x30
> > eax: fffffff4   ebx: 00000001   ecx: f577e000   edx: 00000000
> > esi: c118fd80   edi: c12bd258   ebp: c12bd240   esp: c864dd38
> > ds: 007b   es: 007b   ss: 0069
> > Process rcS (pid: 1041, threadinfo=c864c000 task=c06f8a40)
> > Stack: c77ae000 00000000 00000020 c014dd51 c77ae000 c118fd80 00000001 c12bd240
> >        c77ae000 c118fd80 00000000 c014decd c118fd80 c12bd240 00000001 000000d0
> >        c118fde0 00000001 000000d0 c119d980 0000000c 000000d0 00000000 c014e0db
> > Call Trace:
> >  [<c014dd51>] cache_init_objs+0x71/0x80
> >  [<c014decd>] cache_grow+0x10d/0x1a0
> >  [<c014e0db>] cache_alloc_refill+0x17b/0x220
> >  [<c014e39f>] kmem_cache_alloc+0x7f/0x90
> >  [<c011833d>] pgd_alloc+0x1d/0x310
> >  [<c01216fe>] mm_init+0xce/0x100
> >  [<c0121a14>] copy_mm+0xd4/0x3d0
> >  [<c0121fdf>] copy_files+0x1af/0x320
> >  [<c03f9d00>] parse_header+0xb0/0xe0
> >  [<c03f9d04>] parse_header+0xb4/0xe0
> >  [<c01225af>] copy_process+0x3df/0xd00
> >  [<c0166f4f>] fd_install+0x2f/0x60
> >  [<c0122fc9>] do_fork+0x69/0x18f
> >  [<c0130e4a>] sys_rt_sigprocmask+0xaa/0x110
> >  [<c0108f91>] sys_fork+0x31/0x40
> >  [<c010a65d>] syscall_call+0x7/0xb
> > Code: 00 f3 ab 5f c3 83 ec 0c b8 20 00 00 00 89 44 24 08 31 c0 89 44 24 04 8b 44 24 10 89 04 24 e8 d2 2b 00 00 85 c0 75 04 83 c4 0c c3 <0f> 0b eb f8 8d b6 00 00 00 00 83 ec 08 b8 f8 e3 36 c0 89 5c 24
> >  /etc/init.d/rcS: line 57:  1041 Segmentation fault      ( trap - INT QUIT TSTP; set start; . $i )
> >
> > Is there something I can do to help resolve that?
> >
> > thx & regards,
> > -- Ralph
> >
> > > Ian
>
> _______________________________________________
> Xen-users mailing list
> Xen-users@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-users
