WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
xen-devel

[Xen-devel] Re: kernel BUG at arch/x86/xen/mmu.c:1872

To: MaoXiaoyun <tinnycloud@xxxxxxxxxxx>
Subject: [Xen-devel] Re: kernel BUG at arch/x86/xen/mmu.c:1872
From: Teck Choon Giam <giamteckchoon@xxxxxxxxx>
Date: Mon, 11 Apr 2011 04:14:45 +0800
Cc: jeremy@xxxxxxxx, xen devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, keir@xxxxxxx, ian.campbell@xxxxxxxxxx, konrad.wilk@xxxxxxxxxx, dave@xxxxxxxxxx
Delivery-date: Sun, 10 Apr 2011 13:15:51 -0700
In-reply-to: <BLU157-w540B39FBA137B4D96278D2DAA90@xxxxxxx>
References: <COL0-MC1-F14hmBzxHs00230882@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx> <BLU157-w488E5FEBD5E2DBC0666EF1DAA70@xxxxxxx> <BLU157-w5025BFBB4B1CDFA7AA0966DAA90@xxxxxxx> <BLU157-w540B39FBA137B4D96278D2DAA90@xxxxxxx>
2011/4/10 MaoXiaoyun <tinnycloud@xxxxxxxxxxx>:
> Hi Konrad & Jeremy:
>
> I think we have finally located the missing patch.
> We tested commit
> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=c97f681f138039425c87f35ea46a92385d81e70e
> which works.
>
> We tested commit
> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=221c64dbf860d37f841f40893bddf8d804aa55bd
> with which the server crashed.
>
> Later I found the comment for this commit:
>
> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=64141da587241301ce8638cc945f8b67853156ec
>
> So it looks like this fix has not been applied to 2.6.32.36. Could you
> take a look at this?
>
> Many thanks.
>
> =====================================================
>> Hi Konrad & Jeremy:
>>
>> I'd like to open this BUG in a new thread, since the old thread is too
>> long to read easily.
>>
>> We recently wanted to upgrade our kernel to 2.6.32, but unfortunately
>> we ran into a kernel crash bug.
>> Our test case is simple: start 24 Win2003 HVMs on our physical machine
>> and reboot each HVM every 15 minutes. The kernel crashes within half an
>> hour (that is, the crash happens when the VMs start for the second time).
>>
>> We took our testing further and tested different kernel versions:
>>2.6.32.10
>> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=d945b014ac5df9592c478bf9486d97e8914aab59
>>2.6.32.11
>> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=27f948a3bf365a5bc3d56119637a177d41147815
>>2.6.32.12
>> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=ba739f9abd3f659b907a824af1161926b420a2ce
>>2.6.32.13
>> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=f6fe6583b77a49b569eef1b66c3d761eec2e561b
>>2.6.32.15
>> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=27ed1b0e0dae5f1d5da5c76451bc84cb529128bd
>>2.6.32.21
>> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=69e50db231723596ed8ef9275d0068d6697f466a
>>
>> We met three basically different results.
>>
>> i1) Grant table issue
>> The host still functions, but xm dmesg shows abnormal log entries.
>> Please refer to the attached grant table log.
>>
>> i2) Kernel crash at a different place
>> The host dies during the test; after reboot, we can see nothing abnormal
>> in /var/log/messages.
>>
>> i3) kernel BUG at arch/x86/xen/mmu.c:1872
>> The host dies during the test; after reboot, we see the crash log in
>> /var/log/messages (refer to the attached 2.6.32.36 log).
>>
>> The test results can be classified into two groups:
>>
>> 1) 2.6.32.10
>> 30 machines were involved in the test: three hit issue (i1), two hit
>> issue (i2), and *none* hit issue (i3).
>> The other machines have run the test successfully so far, for more than
>> 8 hours.
>>
>> 2) 2.6.32.11 or later versions
>> Each version was tested on 10 machines, and all machines crashed in
>> less than half an hour.
>>
>> Conclusion:
>> 1) The grant table issue exists in all kernel versions.
>> 2) The kernel crash at a different place may exist in all kernel
>> versions, but it does not happen frequently (2 out of 30 machines).
>> 3) The major difference we observe is issue (i3); from the tests, it
>> looks like it was introduced between versions 2.6.32.10 and 2.6.32.11.
>>
>> Hope this helps to locate the bug.
>> Many thanks.
>>
>>
>
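The narrowing-down in the quoted report (2.6.32.10 good, 2.6.32.11 and later bad; commit c97f681f good, commit 221c64db bad) is essentially a bisection. Below is a minimal Python sketch of that search, assuming the versions are ordered oldest to newest, the first is known good, the last is known bad, and a hypothetical is_bad callback stands in for one full stress run of a build. In practice git bisect performs the same search over commits; this sketch is only an illustration, not the procedure used in the thread.

```python
from typing import Callable, Sequence, Tuple


def bisect_first_bad(
    versions: Sequence[str], is_bad: Callable[[str], bool]
) -> Tuple[str, str]:
    """Return (last_good, first_bad) for versions ordered oldest to newest.

    Assumes versions[0] is known good, versions[-1] is known bad, and that
    once the regression appears it stays in every later version.
    """
    if len(versions) < 2:
        raise ValueError("need at least one good and one bad version")
    lo, hi = 0, len(versions) - 1  # invariant: versions[lo] good, versions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # Each call here stands for one expensive stress run of that build.
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[lo], versions[hi]


# Kernel versions tested in the quoted report, oldest first.
TESTED = ["2.6.32.10", "2.6.32.11", "2.6.32.12", "2.6.32.13", "2.6.32.15", "2.6.32.21"]
```

With the reported outcome (only 2.6.32.10 survives), bisect_first_bad(TESTED, lambda v: v != "2.6.32.10") returns ("2.6.32.10", "2.6.32.11") after probing just two versions. Because an intermittent crash can make a bad build look good in a short run (compare issue i2, seen on 2 of 30 machines), each "good" verdict should come from a run long enough to trust.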

Hi,

Sorry, this mmu-related BUG has troubled me for a very long time... I
really want to "kill" this BUG, but my knowledge of kernel hacking
and/or Xen is very limited.

While waiting for Jeremy or Konrad or others ...

Many thanks for spending the time to track down this mmu-related BUG. I
have backported the commit from
http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=64141da587241301ce8638cc945f8b67853156ec
to the 2.6.32.36 PVOPS kernel; the patch is attached. I don't know
whether I backported it correctly, nor whether it affects anything else.
I am currently testing the 2.6.32.36 PVOPS kernel with this patch applied
and with CONFIG_DEBUG_PAGEALLOC unset. I am now running testcrash.sh for
1000 loops, since I was unable to reproduce this mmu BUG 1872 within 100
testcrash.sh loops. Please note that with CONFIG_DEBUG_PAGEALLOC unset,
I can easily reproduce this mmu BUG 1872 within fewer than 50
testcrash.sh loop cycles on PVOPS kernels 2.6.32.24 to 2.6.32.36. I am
now testing with this backported patch to see whether I can still
reproduce this mmu BUG...
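How convincing a clean 1000-loop run is depends on how often the bug fires per loop. As a rough sketch, assume each testcrash.sh loop independently reproduces the BUG with some probability p. The value p = 1/50 below is a hypothetical figure picked only to echo the "within fewer than 50 loops" observation, not a measured rate. The chance of n consecutive clean loops on a kernel that still has the bug is then:

```latex
\begin{aligned}
P(\text{no reproduction in } n \text{ loops}) &= (1 - p)^{n} \\
\left(1 - \tfrac{1}{50}\right)^{100} &\approx 0.133 \\
\left(1 - \tfrac{1}{50}\right)^{1000} &\approx 1.7 \times 10^{-9}
\end{aligned}
```

Under these assumptions, 100 clean loops would still leave roughly a 13% chance that the bug is present, while 1000 clean loops would make that very unlikely. Real loops need not be independent, so this is only a rough guide.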

Kindest regards,
Giam Teck Choon

Attachment: vmalloc__eagerly_clear_ptes_on_vunmap.patch
Description: Text Data

Attachment: testcrash.sh
Description: Bourne shell script
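The contents of testcrash.sh are not shown in this archive. As a hypothetical illustration of the same repeat-until-failure idea, the Python sketch below calls a step function up to N times and stops at the first failing iteration. Since a dom0 crash also kills the harness, it writes the current iteration to a progress file and fsyncs it before each step, so the loop count can be read after the host reboots. The names run_loops, command_step, and record_progress are assumptions for this sketch and are not part of the attached script.

```python
import os
import subprocess
from typing import Callable, Optional, Sequence


def record_progress(path: str, iteration: int) -> None:
    """Persist the current iteration number and flush it to disk.

    fsync makes it more likely the count survives a hard host crash.
    """
    with open(path, "w") as f:
        f.write(f"{iteration}\n")
        f.flush()
        os.fsync(f.fileno())


def run_loops(
    step: Callable[[int], bool], loops: int, progress_path: Optional[str] = None
) -> Optional[int]:
    """Call step(i) for i = 1..loops; return the first failing i, or None."""
    for i in range(1, loops + 1):
        if progress_path is not None:
            # Record the iteration *before* the risky step runs.
            record_progress(progress_path, i)
        if not step(i):
            return i
    return None


def command_step(cmd: Sequence[str]) -> Callable[[int], bool]:
    """Build a step that runs cmd once; a non-zero exit status counts as failure."""
    def step(_iteration: int) -> bool:
        return subprocess.run(list(cmd)).returncode == 0
    return step
```

For example, run_loops(command_step(["sh", "./one-iteration.sh"]), 1000, "/var/tmp/testcrash.progress") would repeat a hypothetical single-iteration script up to 1000 times; both the script name and the progress path are placeholders. Writing the counter before each step, rather than after, means a crash mid-iteration still leaves the number of the iteration that was running.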

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel