Discussion:
panic: fpudna from userland
Emmanuel Dreyfus
2024-04-21 14:22:41 UTC
Hello

On NetBSD-10.0 with XEN3PAE_DOMU kernel, I get this panic when building packages:
[ 35714.4540347] panic: fpudna from userland, ip 0xbbe74f, trapframe 0xdbe1dfa8
[ 35714.4540347] cpu0: Begin traceback...
[ 35714.4540347] vpanic(c0505984,dbe1df8c,dbe1df9c,c01322bb,c0505984,c05059a8,bbe74f,dbe1dfa8,1a62000,bf7fc880) at netbsd:vpanic+0x18e
[ 35714.4540347] panic(c0505984,c05059a8,bbe74f,dbe1dfa8,1a62000,bf7fc880,c0102f9e,dbe1dfa8,b3,ab) at netbsd:panic+0x18
[ 35714.4540347] fpudna(dbe1dfa8,b3,ab,bac3001f,bf7f001f,b9869d00,b98ea350,bf7fc880,1a62000,0) at netbsd:fpudna+0x3b
[ 35714.4540347] cpu0: End traceback...
[ 35714.4540347] fatal breakpoint trap in supervisor mode
[ 35714.4540347] trap type 1 code 0 eip 0xc0129304 cs 0x9 eflags 0x202 cr2 0xb98ee014 ilevel 0 esp 0xdbe1df70
[ 35714.4540347] curlwp 0xc36ea200 pid 26009 lid 26009 lowest kstack 0xdbe1c2c0

Reading the code, the reason why we must panic here seems obscure. Anyone
can explain?
--
Emmanuel Dreyfus
***@netbsd.org

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Manuel Bouyer
2024-04-21 15:32:48 UTC
Post by Emmanuel Dreyfus
Hello
[ 35714.4540347] panic: fpudna from userland, ip 0xbbe74f, trapframe 0xdbe1dfa8
[ 35714.4540347] cpu0: Begin traceback...
[ 35714.4540347] vpanic(c0505984,dbe1df8c,dbe1df9c,c01322bb,c0505984,c05059a8,bbe74f,dbe1dfa8,1a62000,bf7fc880) at netbsd:vpanic+0x18e
[ 35714.4540347] panic(c0505984,c05059a8,bbe74f,dbe1dfa8,1a62000,bf7fc880,c0102f9e,dbe1dfa8,b3,ab) at netbsd:panic+0x18
[ 35714.4540347] fpudna(dbe1dfa8,b3,ab,bac3001f,bf7f001f,b9869d00,b98ea350,bf7fc880,1a62000,0) at netbsd:fpudna+0x3b
[ 35714.4540347] cpu0: End traceback...
[ 35714.4540347] fatal breakpoint trap in supervisor mode
[ 35714.4540347] trap type 1 code 0 eip 0xc0129304 cs 0x9 eflags 0x202 cr2 0xb98ee014 ilevel 0 esp 0xdbe1df70
[ 35714.4540347] curlwp 0xc36ea200 pid 26009 lid 26009 lowest kstack 0xdbe1c2c0
Reading the code, the reason why we must panic here seems obscure. Anyone
can explain?
We're not supposed to get this trap from userland: between -9 and -10
the lazy FPU context switching was removed, and the FPU state is now
context-switched like all other (non-FPU) state. As a consequence, the FPU
is always enabled (or should be) before returning to userland, and no
"FPU not available" trap should be triggered.

I'm seeing this too for the daily anita tests from time to time.
It's hard to reproduce, and I didn't find a Xen-specific place where we could
fail to restore the FPU context. I'm also seeing FPU-related panics for
HVM or HVM guests, and also for amd64. So I suspect it may be a bug in Xen.
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--

Emmanuel Dreyfus
2024-04-21 16:15:10 UTC
Post by Manuel Bouyer
I'm seeing this too for the daily anita tests from time to time.
It's hard to reproduce
I got it thrice in a row while bulk-building packages.
--
Emmanuel Dreyfus
***@netbsd.org

Emmanuel Dreyfus
2024-04-22 15:24:38 UTC
Post by Emmanuel Dreyfus
Post by Manuel Bouyer
I'm seeing this too for the daily anita tests from time to time.
It's hard to reproduce
I got it thrice in a row while bulk-building packages.
I get it so often that it is difficult to build big packages: the
kernel will panic before completion.

Building on Xen/i386 is not possible because of this bug. It is
not possible to cross-build from Xen/amd64 either because of kern/58158.
This is a huge regression.
--
Emmanuel Dreyfus
***@netbsd.org

Brad Spencer
2024-04-22 20:00:30 UTC
Post by Emmanuel Dreyfus
Post by Emmanuel Dreyfus
Post by Manuel Bouyer
I'm seeing this too for the daily anita tests from time to time.
It's hard to reproduce
I got it thrice in a row while bulk-building packages.
I get it so often that it is difficult to build big packages: the
kernel will panic before completion.
Building on Xen/i386 is not possible because of this bug. It is
not possible to cross-build from Xen/amd64 either because of kern/58158.
This is a huge regression.
I had the panic happen a couple of times, mostly during a build of
pkgsrc packages and what I thought was a daily cron job (not sure as
much about that one any more), which is why I put in the PR for this
problem. When the i386 Xen guest is quiet, even with normal cron stuff,
it didn't panic, however... even after many days of running.

To test further, I reran the pkgsrc build and did manage to provoke the
FPU panic. In DDB nothing really unusual appeared to be going on at the
time, EXCEPT, that the cron job for backups (which around here uses
dump) started, so there would have been excessive disk activity right
before the panic. The build was working on devel/gmp and chunking along
just fine before the panic, although I am not sure that this panic is
related to any particular package being built.

I don't have a good provoking case for this problem, except to say that
building packages appears to help cause it to happen for me.

I am using pkgsrc from a NFS mounted file system with WRKOBJDIR set in
/etc/mk.conf to a local FFS filesystem. The artifacts are also local,
with a null mount on /usr/pkgsrc/packages over the NFS filesystem on
/usr/pkgsrc. I also ran the pkgsrc build from the Xen console (xl
console ...) so there would have been a lot of activity going on there.

My amd64, earmv6hf and earmv7hf builds never had any problems with
panics.
--
Brad Spencer - ***@anduin.eldar.org - KC8VKS - http://anduin.eldar.org

Emmanuel Dreyfus
2024-04-26 14:44:23 UTC
Post by Manuel Bouyer
We're not supposed to get this trap from userland, as between -9 and -10
the lazy FPU context switching was removed; the FPU is context-switched
as other non-FPU state. As a consequence, the FPU is always enabled before
return to userland (or should) and no "FPU not available" trap should
be triggered.
How does Xen's eager-fpu option interact here? It is related, right?
--
Emmanuel Dreyfus
***@netbsd.org

Manuel Bouyer
2024-04-28 18:14:19 UTC
Post by Emmanuel Dreyfus
Post by Manuel Bouyer
We're not supposed to get this trap from userland, as between -9 and -10
the lazy FPU context switching was removed; the FPU is context-switched
as other non-FPU state. As a consequence, the FPU is always enabled before
return to userland (or should) and no "FPU not available" trap should
be triggered.
How does Xen's eager-fpu option interact here? It is related, right?
It's related, but I don't think it's supposed to impact the guest's
behavior.
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--

Emmanuel Dreyfus
2024-05-13 14:56:06 UTC
Post by Emmanuel Dreyfus
[ 35714.4540347] panic: fpudna from userland, ip 0xbbe74f, trapframe 0xdbe1dfa8
It seems Xen produces spurious fpudna traps, as a comment in the netbsd-9
source says. What about the workaround below? I have been building stuff
with it without a crash.


--- sys/arch/x86/x86/fpu.c 25 Jul 2023 11:41:42 -0000 1.79.4.2
+++ sys/arch/x86/x86/fpu.c 13 May 2024 14:54:59 -0000
@@ -602,8 +602,17 @@

 void
 fpudna(struct trapframe *frame)
 {
+#ifdef XENPV
+	/*
+	 * Xen produces spurious fpudna traps, just do nothing.
+	 */
+	if (USERMODE(frame->tf_cs)) {
+		clts();
+		return;
+	}
+#endif
 	panic("fpudna from %s, ip %p, trapframe %p",
 	    USERMODE(frame->tf_cs) ? "userland" : "kernel",
 	    (void *)X86_TF_RIP(frame), frame);
 }
--
Emmanuel Dreyfus
***@netbsd.org

Brian Buhrow
2024-05-13 15:13:04 UTC
Hello Emmanuel. Do you know which versions of Xen are producing these spurious traps? I'm
running NetBSD domUs with an fpudna() function that just panics, i.e. with no #ifdef, and I'm
not seeing this issue at all with Xen-4.16.0 running on top of FreeBSD-13.1 as the dom0. I'm
about to run some domUs with Xen-4.18.1, so maybe it's happening there? Or is there
something special about our dom0?
-thanks
-Brian


Manuel Bouyer
2024-05-13 19:58:09 UTC
Post by Brian Buhrow
Hello Emmanuel. Do you know what versions of Xen are producing these spurious traps? I'm
running NetBSD-domu's with an fpudna() function that just panics, i.e. with no #ifdef and I'm
not seeing this issue at all, with Xen-4.16.0 running on top of FreeBSD-13.1 as the dom0. I'm
about to run some domu's with Xen-4.18.1, so maybe it's happening there? Or, is there
something special about our dom0?
I've seen it with 4.13, 4.15 and 4.18, so I don't think it's specific to a
Xen version. In my case it shows up once in a while when running
anita tests, but I've not seen it while building packages.
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--

Brad Spencer
2024-05-13 20:04:23 UTC
Post by Manuel Bouyer
Post by Brian Buhrow
Hello Emmanuel. Do you know what versions of Xen are producing these spurious traps? I'm
running NetBSD-domu's with an fpudna() function that just panics, i.e. with no #ifdef and I'm
not seeing this issue at all, with Xen-4.16.0 running on top of FreeBSD-13.1 as the dom0. I'm
about to run some domu's with Xen-4.18.1, so maybe it's happening there? Or, is there
something special about our dom0?
I've seen it with 4.13, 4.15 and 4.18 so I don't think it's specific to a
Xen version. But in my case it shows up once in a while while running
anita tests, but I've not seen it while building packages.
I only have 4.15 here. It has only happened on 32 bit i386 PAE guests.
Those guests also use PVSHIM (I have not tried them without that). It
started to appear with NetBSD 10.x (didn't happen before that) and can
happen in a nearly idle guest for me, but seems to occur more often when
building pkgsrc stuff.
--
Brad Spencer - ***@anduin.eldar.org - KC8VKS - http://anduin.eldar.org

Brad Spencer
2024-05-15 10:48:00 UTC
Post by Emmanuel Dreyfus
Post by Emmanuel Dreyfus
[ 35714.4540347] panic: fpudna from userland, ip 0xbbe74f, trapframe 0xdbe1dfa8
It seems Xen produces spurious fpudna traps, as a comment in the netbsd-9
source says. What about the workaround below? I have been building stuff
with it without a crash.
--- sys/arch/x86/x86/fpu.c 25 Jul 2023 11:41:42 -0000 1.79.4.2
+++ sys/arch/x86/x86/fpu.c 13 May 2024 14:54:59 -0000
@@ -602,8 +602,17 @@
 void
 fpudna(struct trapframe *frame)
 {
+#ifdef XENPV
+	/*
+	 * Xen produces spurious fpudna traps, just do nothing.
+	 */
+	if (USERMODE(frame->tf_cs)) {
+		clts();
+		return;
+	}
+#endif
 	panic("fpudna from %s, ip %p, trapframe %p",
 	    USERMODE(frame->tf_cs) ? "userland" : "kernel",
 	    (void *)X86_TF_RIP(frame), frame);
 }
The above patch appears to have helped. I was somewhat able to provoke
the panic by building a bunch of pkgsrc packages on a NetBSD/i386 PAE
Xen guest: without the patch, the panic would happen at some random
point during the pkgsrc build. With the patch, the build proceeded to
the end, and there don't appear to be any obvious negative side
effects.
--
Brad Spencer - ***@anduin.eldar.org - KC8VKS - http://anduin.eldar.org
