To: "Keir Fraser" <Keir.Fraser@xxxxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: [PATCH][RESEND]RE: [Xen-devel] [PATCH] Fix softlockup issue after vcpu hotplug
From: "Tian, Kevin" <kevin.tian@xxxxxxxxx>
Date: Wed, 31 Jan 2007 14:17:45 +0800
Delivery-date: Tue, 30 Jan 2007 22:17:39 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <C1E5053F.812F%Keir.Fraser@xxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AcdESFqDCWsISfq5RGeHgxcxVzRqmQACelaDAAAZiDAAAP4q2wADxf1QAAFVV3AAAMUYXAAACCkwAACFZyAAAV9OMAABEoucACBxTEA=
Thread-topic: [PATCH][RESEND]RE: [Xen-devel] [PATCH] Fix softlockup issue after vcpu hotplug
>From: Keir Fraser [mailto:Keir.Fraser@xxxxxxxxxxxx]
>Sent: January 30, 2007 22:23
>
>On 30/1/07 2:11 pm, "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
>
>> BTW, do you think it's worth destroying the vcpu in the scheduler
>> when it goes down and then re-initializing it in the scheduler when
>> it comes back up? I don't know whether this would have any influence
>> on scheduler accounting. Actually domain save/restore doesn't show
>> this bug, and one obvious difference compared to vcpu-hotplug is that
>> the domain is restored in a new context...
>
>I wouldn't expect this to make any significant difference to scheduling
>accounting, certainly over a multi-second time period.
>
>Does the time you hot-unplug the vcpu for make a difference to how
>often you see this problem? Did you try repro'ing with a 2.6.16 kernel?
>
> -- Keir

Hi, Keir,
        I verified that the attached patch does fix the issue by restricting 
the max timeout to 1s. Both vcpu unplug/plug and suspend cancel work fine. 
The domain ran well for several hours under intensive testing.
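
        For illustration only, the idea of restricting the max timeout can be 
sketched in user-space C as below. This is not the attached patch: the 
function name clamp_timeout and the hz parameter are hypothetical, and the 
value 100 used for HZ in the usage note is an assumption inferred from the 
"Delta 101 > HZ" prints.

```c
#include <assert.h>

/*
 * Hypothetical helper (not taken from the attached patch): bound the
 * number of jiffies until the next timer event to one second's worth
 * of ticks. hz is the tick rate, passed in because its real value is
 * an assumption here.
 */
static unsigned long clamp_timeout(unsigned long delta_jiffies,
                                   unsigned long hz)
{
    assert(hz > 0);
    /* Any sleep longer than 1s is shortened to exactly 1s. */
    return delta_jiffies > hz ? hz : delta_jiffies;
}
```

        With hz assumed to be 100, a delta of 951 jiffies would be cut to 100, 
while a delta of 50 would be left unchanged.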

        I also tried 2.6.16, and it's immune to this issue. I added some debug 
output to both 2.6.16 and 2.6.18 to print the delta value when delta > 1s. 
The results further support our analysis.

        In 2.6.16, all the prints are:
                Delta 101 > HZ for cpuN
                Delta 101 > HZ for cpuN
                Delta 101 > HZ for cpuN
                ...

        While in 2.6.18, the prints look like:
                Delta 199 > HZ for cpuN
                Delta 156 > HZ for cpuN
                Delta 192 > HZ for cpuN
                Delta 102 > HZ for cpuN
                ...
        After unplugging and re-plugging a cpu:
                Delta 951 > HZ for cpuN
                ...
        And then the softlockup warning appears.
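
        The kind of debug check that produces prints like the ones above can be 
sketched as follows. This is a user-space approximation, not the actual debug 
code: the helper name format_delta_warning and its parameters are 
hypothetical, and the message format is simply copied from the prints.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical debug helper: when the pending timeout (in jiffies)
 * exceeds one second's worth of ticks (hz), write a warning into buf.
 * Returns 1 if a warning was written, 0 otherwise (buf is then empty).
 */
static int format_delta_warning(char *buf, size_t len,
                                unsigned long delta, unsigned long hz,
                                int cpu)
{
    assert(buf != NULL && len > 0);
    if (delta <= hz) {
        buf[0] = '\0';
        return 0;
    }
    snprintf(buf, len, "Delta %lu > HZ for cpu%d", delta, cpu);
    return 1;
}
```

        In a kernel the message would go through printk rather than into a 
buffer; the buffer is used here only so the sketch runs outside the kernel.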

        So in 2.6.16, the watchdog thread itself bounds the max timeout 
to about 1s by arming a timer, while in 2.6.18 the max timeout 
value is not bounded and varies widely.
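
        A toy model of this difference, under stated assumptions: the sleep 
length of a cpu is taken to be the smallest expiry among its pending timers, 
and the 2.6.16 watchdog is modelled as one extra timer of about 1s (101 
jiffies, assuming HZ is 100). The function next_timeout and its arguments are 
hypothetical and only illustrate the reasoning; they are not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model: return how many jiffies a cpu may sleep, given the expiry
 * deltas of its pending timers. If no timer is pending, the caller's
 * no_timer value is returned unchanged.
 */
static unsigned long next_timeout(const unsigned long *pending, size_t n,
                                  unsigned long no_timer)
{
    unsigned long best = no_timer;

    assert(pending != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (pending[i] < best)
            best = pending[i];
    }
    return best;
}
```

        With pending timers {951, 101} (a watchdog timer present) the result is 
101; with only {951} the result is 951, which matches the shape of the prints 
seen after cpu unplug/plug on 2.6.18.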

        So I'm inclined to consider the patch a fix, since there's no easy way 
to deduce an appropriate timeout without explicit, hard-coded knowledge 
of requirements such as the watchdog thread's. What do you think? :-)

P.S. The warning reported by Simon on 2.6.16 may be fixed by my 
previous patch, due to the late check.

Thanks,
Kevin

Attachment: fix_softlockup_2618.patch
Description: fix_softlockup_2618.patch
