<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/arch/x86/kernel/process_64.c, branch v3.2.70</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.2.70</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.2.70'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2015-02-20T00:49:30Z</updated>
<entry>
<title>x86_64, switch_to(): Load TLS descriptors before switching DS and ES</title>
<updated>2015-02-20T00:49:30Z</updated>
<author>
<name>Andy Lutomirski</name>
<email>luto@amacapital.net</email>
</author>
<published>2014-12-08T21:55:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=cca3e6170e186ad88c11ee91cfd37d400dcaa9b0'/>
<id>urn:sha1:cca3e6170e186ad88c11ee91cfd37d400dcaa9b0</id>
<content type='text'>
commit f647d7c155f069c1a068030255c300663516420e upstream.

Otherwise, if buggy user code points DS or ES into the TLS
array, they would be corrupted after a context switch.

This also significantly improves the comments and documents some
gotchas in the code.

Before this patch, both of the tests below failed.  With this
patch, the es test passes, although the gsbase test still fails.
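
The ordering issue can be sketched in a few lines.  This is an
illustration only, not the patched code; savesegment()/loadsegment()
and load_TLS() are the real x86 helpers, everything else is elided:

	/* Before: ES was reloaded while the GDT still held the old
	 * task's TLS descriptors, so a selector pointing into the
	 * TLS area cached stale descriptor contents: */
	savesegment(es, prev-&gt;es);
	loadsegment(es, next-&gt;es);	/* may pick up a stale TLS slot */
	load_TLS(next, cpu);		/* too late */

	/* After: install the new TLS descriptors first, then reload
	 * the data segment registers: */
	load_TLS(next, cpu);
	savesegment(es, prev-&gt;es);
	loadsegment(es, next-&gt;es);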

 ----- begin es test -----

/*
 * Copyright (c) 2014 Andy Lutomirski
 * GPL v2
 */

#include &lt;err.h&gt;
#include &lt;stdio.h&gt;
#include &lt;unistd.h&gt;
#include &lt;sys/syscall.h&gt;
#include &lt;asm/ldt.h&gt;

static unsigned short GDT3(int idx)
{
	return (idx &lt;&lt; 3) | 3;
}

static int create_tls(int idx, unsigned int base)
{
	struct user_desc desc = {
		.entry_number    = idx,
		.base_addr       = base,
		.limit           = 0xfffff,
		.seg_32bit       = 1,
		.contents        = 0, /* Data, grow-up */
		.read_exec_only  = 0,
		.limit_in_pages  = 1,
		.seg_not_present = 0,
		.useable         = 0,
	};

	if (syscall(SYS_set_thread_area, &amp;desc) != 0)
		err(1, "set_thread_area");

	return desc.entry_number;
}

int main()
{
	int idx = create_tls(-1, 0);
	printf("Allocated GDT index %d\n", idx);

	unsigned short orig_es;
	asm volatile ("mov %%es,%0" : "=rm" (orig_es));

	int errors = 0;
	int total = 1000;
	for (int i = 0; i &lt; total; i++) {
		asm volatile ("mov %0,%%es" : : "rm" (GDT3(idx)));
		usleep(100);

		unsigned short es;
		asm volatile ("mov %%es,%0" : "=rm" (es));
		asm volatile ("mov %0,%%es" : : "rm" (orig_es));
		if (es != GDT3(idx)) {
			if (errors == 0)
				printf("[FAIL]\tES changed from 0x%hx to 0x%hx\n",
				       GDT3(idx), es);
			errors++;
		}
	}

	if (errors) {
		printf("[FAIL]\tES was corrupted %d/%d times\n", errors, total);
		return 1;
	} else {
		printf("[OK]\tES was preserved\n");
		return 0;
	}
}

 ----- end es test -----

 ----- begin gsbase test -----

/*
 * gsbase.c, a gsbase test
 * Copyright (c) 2014 Andy Lutomirski
 * GPL v2
 */

#include &lt;err.h&gt;
#include &lt;stdio.h&gt;
#include &lt;unistd.h&gt;
#include &lt;sys/mman.h&gt;
#include &lt;sys/syscall.h&gt;
#include &lt;asm/prctl.h&gt;

static unsigned char *testptr, *testptr2;

static unsigned char read_gs_testvals(void)
{
	unsigned char ret;
	asm volatile ("movb %%gs:%1, %0" : "=r" (ret) : "m" (*testptr));
	return ret;
}

int main()
{
	int errors = 0;

	testptr = mmap((void *)0x200000000UL, 1, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS, -1, 0);
	if (testptr == MAP_FAILED)
		err(1, "mmap");

	testptr2 = mmap((void *)0x300000000UL, 1, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS, -1, 0);
	if (testptr2 == MAP_FAILED)
		err(1, "mmap");

	*testptr = 0;
	*testptr2 = 1;

	if (syscall(SYS_arch_prctl, ARCH_SET_GS,
		    (unsigned long)testptr2 - (unsigned long)testptr) != 0)
		err(1, "ARCH_SET_GS");

	usleep(100);

	if (read_gs_testvals() == 1) {
		printf("[OK]\tARCH_SET_GS worked\n");
	} else {
		printf("[FAIL]\tARCH_SET_GS failed\n");
		errors++;
	}

	asm volatile ("mov %0,%%gs" : : "r" (0));

	if (read_gs_testvals() == 0) {
		printf("[OK]\tWriting 0 to gs worked\n");
	} else {
		printf("[FAIL]\tWriting 0 to gs failed\n");
		errors++;
	}

	usleep(100);

	if (read_gs_testvals() == 0) {
		printf("[OK]\tgsbase is still zero\n");
	} else {
		printf("[FAIL]\tgsbase was corrupted\n");
		errors++;
	}

	return errors == 0 ? 0 : 1;
}

 ----- end gsbase test -----

Signed-off-by: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/509d27c9fec78217691c3dad91cec87e1006b34a.1418075657.git.luto@amacapital.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>i387: re-introduce FPU state preloading at context switch time</title>
<updated>2012-02-27T18:25:55Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-02-18T20:56:35Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9016ec427136d5b5d025948319cf1114dc7734e4'/>
<id>urn:sha1:9016ec427136d5b5d025948319cf1114dc7734e4</id>
<content type='text'>
commit 34ddc81a230b15c0e345b6b253049db731499f7e upstream.

After all the FPU state cleanups and finally finding the problem that
caused all our FPU save/restore problems, this re-introduces the
preloading of FPU state that was removed in commit b3b0870ef3ff ("i387:
do not preload FPU state at task switch time").

However, instead of simply reverting the removal, this reimplements
preloading with several fixes, most notably

 - properly abstracted as a true FPU state switch, rather than as
   open-coded save and restore with various hacks.

   In particular, implementing it as a proper FPU state switch allows us
   to optimize the CR0.TS flag accesses: there is no reason to set the
   TS bit only to then almost immediately clear it again.  CR0 accesses
   are quite slow and expensive, so don't flip the bit back and forth
   for no good reason.

 - Make sure that the same model works for both x86-32 and x86-64, so
   that there are no gratuitous differences between the two arising
   from the way they save and restore segment state, an architectural
   difference that really doesn't matter to the FPU state.

 - Avoid exposing the "preload" state to the context switch routines,
   and in particular allow the concept of lazy state restore: if nothing
   else has used the FPU in the meantime, and the process is still on
   the same CPU, we can avoid restoring state from memory entirely, just
   re-expose the state that is still in the FPU unit.

   That optimized lazy restore isn't actually implemented here, but the
   infrastructure is set up for it.  Of course, older CPUs that use
   'fnsave' to save the state cannot take advantage of this, since the
   state saving also trashes the state.

In other words, there is now an actual _design_ to the FPU state saving,
rather than just random historical baggage.  Hopefully it's easier to
follow as a result.
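
A rough sketch of the shape of that abstraction (simplified and
illustrative, not the actual patch; the helper names follow the i387
code but the signatures and bodies are trimmed down):

	typedef struct { int preload; } fpu_switch_t;

	static inline fpu_switch_t switch_fpu_prepare(struct task_struct *old,
						      struct task_struct *new)
	{
		fpu_switch_t fpu;

		/* decide once whether the incoming state is worth
		 * preloading, based on how often the task uses the FPU */
		fpu.preload = tsk_used_math(new) &amp;&amp; new-&gt;fpu_counter &gt; 5;
		if (__thread_has_fpu(old)) {
			__save_init_fpu(old);	/* TS stays clear here */
			if (fpu.preload)
				prefetch(new-&gt;thread.fpu.state);
			else
				stts();	/* set TS only when not preloading */
		}
		return fpu;
	}

	static inline void switch_fpu_finish(struct task_struct *new,
					     fpu_switch_t fpu)
	{
		/* finish the preload after the rest of the switch; no
		 * intermediate TS flip was ever needed */
		if (fpu.preload)
			__math_state_restore(new);
	}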

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>i387: move AMD K7/K8 fpu fxsave/fxrstor workaround from save to restore</title>
<updated>2012-02-27T18:25:54Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-02-17T03:11:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9147fbe60acc9125e7b0deae409f1da5c3f8bdda'/>
<id>urn:sha1:9147fbe60acc9125e7b0deae409f1da5c3f8bdda</id>
<content type='text'>
commit 4903062b5485f0e2c286a23b44c9b59d9b017d53 upstream.

The AMD K7/K8 CPUs don't save/restore FDP/FIP/FOP unless an exception is
pending.  In order to not leak FIP state from one process to another, we
need to do a floating point load after the fxsave of the old process,
and before the fxrstor of the new FPU state.  That resets the state to
the (uninteresting) kernel load, rather than some potentially sensitive
user information.

We used to do this directly after the FPU state save, but that is
actually very inconvenient, since it

 (a) corrupts what is potentially perfectly good FPU state that we might
     want to lazily avoid restoring later, and

 (b) on x86-64 it resulted in a very annoying ordering constraint, where
     "__unlazy_fpu()" in the task switch needs to be delayed until after
     the DS segment has been reloaded just to get the new DS value.

Coupling it to the fxrstor instead of the fxsave automatically avoids
both of these issues, and also ensures that we only do it when actually
necessary (the FP state after a save may never actually get used).  It's
simply a much more natural place for the leaked state cleanup.
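
Roughly, the restore-side workaround looks like this (illustrative;
the real code patches the extra instructions in with
alternative_input() keyed on the X86_FEATURE_FXSAVE_LEAK bit, and
"dummy" stands in for a handy kernel variable likely to be in L1):

	/* scrub leaked FDP/FIP/FOP with a boring kernel FP load,
	 * just before the next task's state comes back in */
	asm volatile("emms\n\t"		/* clear x87 stack tags */
		     "fildl %[addr]"	/* set F?P to a defined value */
		     : : [addr] "m" (dummy));
	fxrstor_checking(&amp;next-&gt;thread.fpu.state-&gt;fxsave);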

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>i387: do not preload FPU state at task switch time</title>
<updated>2012-02-27T18:25:54Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-02-16T23:45:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ba6aaed5cc8f55b77644daf56e9ae3a75f042908'/>
<id>urn:sha1:ba6aaed5cc8f55b77644daf56e9ae3a75f042908</id>
<content type='text'>
commit b3b0870ef3ffed72b92415423da864f440f57ad6 upstream.

Yes, taking the trap to re-load the FPU/MMX state is expensive, but so
is spending several days looking for a bug in the state save/restore
code.  And the preload code has some rather subtle interactions with
both paravirtualization support and segment state restore, so it's not
nearly as simple as it should be.

Also, now that we no longer necessarily depend on a single bit (ie
TS_USEDFPU) for keeping track of the state of the FPU, we might be able
to do better.  If we are really switching between two processes that
keep touching the FP state, save/restore is inevitable, but in the case
of having one process that does most of the FPU usage, we may actually
be able to do much better than the preloading.

In particular, we may be able to keep track of which CPU the process ran
on last, and also per CPU keep track of which process' FP state that CPU
has.  For modern CPUs that don't destroy the FPU contents at save time,
that would allow us to do a lazy restore by just re-enabling the
existing FPU state - with no restore cost at all!
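
That bookkeeping could look roughly like this (a sketch of the idea
only; fpu_owner and last_cpu are illustrative names, not code from
this patch):

	/* remember which task's FP state each CPU is holding */
	DEFINE_PER_CPU(struct task_struct *, fpu_owner);

	static inline int fpu_lazy_restore(struct task_struct *new,
					   unsigned int cpu)
	{
		/* same CPU as last time and nobody else used the FPU:
		 * just re-enable the registers, no memory restore */
		return new-&gt;thread.fpu.last_cpu == cpu &amp;&amp;
		       per_cpu(fpu_owner, cpu) == new;
	}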

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>x86, nmi: Add in logic to handle multiple events and unknown NMIs</title>
<updated>2011-10-10T04:57:01Z</updated>
<author>
<name>Don Zickus</name>
<email>dzickus@redhat.com</email>
</author>
<published>2011-09-30T19:06:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b227e23399dc59977aa42c49bd668bdab7a61812'/>
<id>urn:sha1:b227e23399dc59977aa42c49bd668bdab7a61812</id>
<content type='text'>
Previous patches allow the NMI subsystem to process multiple NMI events
in one NMI.  As previously discussed, this can cause issues when an event
triggers another NMI but is processed in the current one.  This causes the
next NMI to go unprocessed and become an 'unknown' NMI.

To handle this, we first have to flag whether the NMI handler handled
more than one event.  If it did, then there exists a chance that
the next NMI might be already processed.  Once the NMI is flagged as a
candidate to be swallowed, we next look for a back-to-back NMI condition.

This is determined by looking at the %rip from pt_regs.  If it is the same
as in the previous NMI, it is assumed the CPU did not have a chance to jump
back into a non-NMI context and execute code, and instead handled another NMI.

If both of those conditions are true then we will swallow any unknown NMI.

There still exists a chance that we accidentally swallow a real unknown NMI,
but for now things seem better.
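
In rough code, the heuristic reads like this (illustrative only;
handle_nmi_events() and report_unknown_nmi() are stand-in names, and
the real implementation keeps this state per CPU):

	static unsigned long last_nmi_rip;
	static int swallow_nmi;	/* previous NMI handled &gt;1 event */

	static void do_nmi_sketch(struct pt_regs *regs)
	{
		int b2b = (regs-&gt;ip == last_nmi_rip);
		int handled;

		if (!b2b)
			swallow_nmi = 0;	/* we left NMI context in between */
		last_nmi_rip = regs-&gt;ip;

		handled = handle_nmi_events(regs);
		if (handled) {
			if (handled &gt; 1)
				swallow_nmi = 1;	/* next NMI may be stale */
			return;
		}

		if (b2b &amp;&amp; swallow_nmi)
			return;			/* swallow it */
		report_unknown_nmi(regs);	/* genuine unknown NMI */
	}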

An optimization has also been added to the nmi notifier routine.  Because x86
can latch up to one NMI while currently processing an NMI, we don't have to
worry about executing _all_ the handlers in a standalone NMI.  The idea is
that if multiple NMIs come in, the second NMI will represent them.  For those
back-to-back NMI cases, we have the potential to drop NMIs.  Therefore only
execute all the handlers in the second half of a detected back-to-back NMI.

Signed-off-by: Don Zickus &lt;dzickus@redhat.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/r/1317409584-23662-5-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>cpuidle: stop depending on pm_idle</title>
<updated>2011-08-03T23:06:37Z</updated>
<author>
<name>Len Brown</name>
<email>len.brown@intel.com</email>
</author>
<published>2011-04-01T23:34:59Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a0bfa1373859e9d11dc92561a8667588803e42d8'/>
<id>urn:sha1:a0bfa1373859e9d11dc92561a8667588803e42d8</id>
<content type='text'>
cpuidle users should call cpuidle_call_idle() directly
rather than via (pm_idle)() function pointer.

Architectures may choose to continue using (pm_idle)(),
but cpuidle need not depend on it:

  my_arch_cpu_idle()
	...
	if (cpuidle_call_idle())
		pm_idle();

cc: Kevin Hilman &lt;khilman@deeprootsystems.com&gt;
cc: Paul Mundt &lt;lethal@linux-sh.org&gt;
cc: x86@kernel.org
Acked-by: H. Peter Anvin &lt;hpa@linux.intel.com&gt;
Signed-off-by: Len Brown &lt;len.brown@intel.com&gt;
</content>
</entry>
<entry>
<title>exec: delay address limit change until point of no return</title>
<updated>2011-06-09T19:50:05Z</updated>
<author>
<name>Mathias Krause</name>
<email>minipli@googlemail.com</email>
</author>
<published>2011-06-09T18:05:18Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=dac853ae89043f1b7752875300faf614de43c74b'/>
<id>urn:sha1:dac853ae89043f1b7752875300faf614de43c74b</id>
<content type='text'>
Unconditionally changing the address limit to USER_DS and not restoring
it to its old value in the error path is wrong because it prevents us
from using kernel memory on repeated calls to this function.  This, in
fact, keeps the fallback through the hard-coded init paths from ever
succeeding if the first candidate fails to load.

With this patch applied, switching to USER_DS is delayed until the point
of no return is reached, which makes it possible to have a multi-arch
rootfs with one arch specific init binary for each of the (hard coded)
probed paths.

Since the address limit is already set to USER_DS when start_thread()
will be invoked, this redundancy can be safely removed.
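
The shape of the change, sketched (illustrative, not the actual diff;
load_candidate() stands in for the binary loader that may fail and
fall back to the next hard-coded path):

	/* before: limit changed up front, never restored on error */
	set_fs(USER_DS);
	retval = load_candidate(bprm);
	if (retval &lt; 0)
		return retval;	/* next candidate now lacks KERNEL_DS */

	/* after: delay until the old image is irrevocably gone */
	retval = load_candidate(bprm);
	if (retval &lt; 0)
		return retval;	/* old address limit still intact */
	set_fs(USER_DS);	/* point of no return */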

Signed-off-by: Mathias Krause &lt;minipli@googlemail.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>x86: mark associated mm when running a task in 32 bit compatibility mode</title>
<updated>2011-03-23T20:36:53Z</updated>
<author>
<name>Stephen Wilson</name>
<email>wilsons@start.ca</email>
</author>
<published>2011-03-13T19:49:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=375906f8765e131a4a159b1ffebf78c15db7b3bf'/>
<id>urn:sha1:375906f8765e131a4a159b1ffebf78c15db7b3bf</id>
<content type='text'>
This patch simply follows the same practice as for setting the TIF_IA32 flag.
In particular, an mm is marked as holding 32-bit tasks when a 32-bit binary is
exec'ed.  Both ELF and a.out formats are updated.
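
Sketched, the marking mirrors the existing TIF_IA32 handling (an
illustration; the hook shown is the x86 compat personality setup):

	void set_personality_ia32(void)
	{
		/* existing practice: flag the task */
		set_thread_flag(TIF_IA32);

		/* new: also mark the mm as holding 32-bit tasks */
		if (current-&gt;mm)
			current-&gt;mm-&gt;context.ia32_compat = 1;

		/* ... */
	}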

Signed-off-by: Stephen Wilson &lt;wilsons@start.ca&gt;
Reviewed-by: Michel Lespinasse &lt;walken@google.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>cpuidle/x86/perf: fix power:cpu_idle double end events and throw cpu_idle events from the cpuidle layer</title>
<updated>2011-01-12T23:05:16Z</updated>
<author>
<name>Thomas Renninger</name>
<email>trenn@suse.de</email>
</author>
<published>2011-01-07T10:29:44Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f77cfe4ea21760268c0277fa3e4b02dfd2a2c2f4'/>
<id>urn:sha1:f77cfe4ea21760268c0277fa3e4b02dfd2a2c2f4</id>
<content type='text'>
Currently the intel_idle and acpi_idle drivers show double cpu_idle "exit idle"
events -&gt; this patch fixes that and makes throwing cpu_idle events less complex.

It also introduces cpu_idle events for all architectures which use
the cpuidle subsystem, namely:
  - arch/arm/mach-at91/cpuidle.c
  - arch/arm/mach-davinci/cpuidle.c
  - arch/arm/mach-kirkwood/cpuidle.c
  - arch/arm/mach-omap2/cpuidle34xx.c
  - drivers/acpi/processor_idle.c (for all cases, not only mwait)
  - arch/x86/kernel/process.c (did throw events before, but was a mess)
  - drivers/idle/intel_idle.c (did throw events before)

Convention should be:
Fire cpu_idle events inside the current pm_idle function (not somewhere
down the callee tree) to keep things easy.

Current possible pm_idle functions on x86:
c1e_idle, poll_idle, cpuidle_idle_call, mwait_idle, default_idle
-&gt; this is really easy now.
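
Following that convention, an idle routine brackets its own hardware
idle call (a sketch; the state index and idle primitive are
illustrative):

	static void my_mwait_idle(void)
	{
		trace_cpu_idle(1, smp_processor_id());	/* enter state 1 */
		mwait_idle_with_hints(0, 0);
		trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
	}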

This affects userspace:
The type field of the cpu_idle power event can now directly get
mapped to:
/sys/devices/system/cpu/cpuX/cpuidle/stateX/{name,desc,usage,time,...}
instead of throwing very CPU/mwait-specific values.
This change is not visible for the intel_idle driver.
For the acpi_idle driver it should only be visible if the vendor
leaves out C-states in the BIOS.
Another (perf timechart) patch reads out cpuidle info for cpu_idle
events from:
/sys/.../cpuidle/stateX/*
The cpuidle events are then mapped to the correct C-/cpuidle state
again, even if vendors leave out C-states in their BIOS and, for
example, only export C1 and C3.
-&gt; everything is fine.

Signed-off-by: Thomas Renninger &lt;trenn@suse.de&gt;
CC: Robert Schoene &lt;robert.schoene@tu-dresden.de&gt;
CC: Jean Pihet &lt;j-pihet@ti.com&gt;
CC: Arjan van de Ven &lt;arjan@linux.intel.com&gt;
CC: Ingo Molnar &lt;mingo@elte.hu&gt;
CC: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
CC: linux-pm@lists.linux-foundation.org
CC: linux-acpi@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: linux-perf-users@vger.kernel.org
CC: linux-omap@vger.kernel.org
Signed-off-by: Len Brown &lt;len.brown@intel.com&gt;
</content>
</entry>
<entry>
<title>perf: Clean up power events by introducing new, more generic ones</title>
<updated>2011-01-04T07:16:54Z</updated>
<author>
<name>Thomas Renninger</name>
<email>trenn@suse.de</email>
</author>
<published>2011-01-03T16:50:44Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=25e41933b58777f2d020c3b0186b430ea004ec28'/>
<id>urn:sha1:25e41933b58777f2d020c3b0186b430ea004ec28</id>
<content type='text'>
Add these new power trace events:

 power:cpu_idle
 power:cpu_frequency
 power:machine_suspend

The old C-state/idle accounting events:
  power:power_start
  power:power_end

now have a replacement (but we are still keeping the old
tracepoints for compatibility):

  power:cpu_idle

and
  power:power_frequency

is replaced with:
  power:cpu_frequency

power:machine_suspend is newly introduced.
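
Sketched usage of the new generic events (illustrative arguments; the
tracepoints live in &lt;trace/events/power.h&gt;):

	trace_cpu_idle(state, smp_processor_id());	/* was power_start/power_end */
	trace_cpu_frequency(khz, smp_processor_id());	/* was power_frequency */
	trace_machine_suspend(state);			/* newly introduced */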

Jean Pihet has a patch integrated into the generic layer
(kernel/power/suspend.c) which will make use of it.

The type= field got removed from both; it was never
used, and the type is distinguished by the event type itself.

perf timechart userspace tool gets adjusted in a separate patch.

Signed-off-by: Thomas Renninger &lt;trenn@suse.de&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Acked-by: Arjan van de Ven &lt;arjan@linux.intel.com&gt;
Acked-by: Jean Pihet &lt;jean.pihet@newoldbits.com&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: rjw@sisk.pl
LKML-Reference: &lt;1294073445-14812-3-git-send-email-trenn@suse.de&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
LKML-Reference: &lt;1290072314-31155-2-git-send-email-trenn@suse.de&gt;
</content>
</entry>
</feed>
