<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel, branch v3.10.62</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.10.62</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.10.62'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2014-12-06T23:05:46Z</updated>
<entry>
<title>uprobes, x86: Fix _TIF_UPROBE vs _TIF_NOTIFY_RESUME</title>
<updated>2014-12-06T23:05:46Z</updated>
<author>
<name>Andy Lutomirski</name>
<email>luto@amacapital.net</email>
</author>
<published>2014-11-21T21:26:07Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0061b518b1b85224bd697f2474d72b18b67a8d53'/>
<id>urn:sha1:0061b518b1b85224bd697f2474d72b18b67a8d53</id>
<content type='text'>
commit 82975bc6a6df743b9a01810fb32cb65d0ec5d60b upstream.

x86 call do_notify_resume on paranoid returns if TIF_UPROBE is set but
not on non-paranoid returns.  I suspect that this is a mistake and that
the code only works because int3 is paranoid.

Setting _TIF_NOTIFY_RESUME in the uprobe code was probably a workaround
for the x86 bug.  With that bug fixed, we can remove _TIF_NOTIFY_RESUME
from the uprobes code.

Reported-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Srikar Dronamraju &lt;srikar@linux.vnet.ibm.com&gt;
Acked-by: Borislav Petkov &lt;bp@suse.de&gt;
Signed-off-by: Andy Lutomirski &lt;luto@amacapital.net&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>perf: Handle compat ioctl</title>
<updated>2014-11-21T17:22:55Z</updated>
<author>
<name>Pawel Moll</name>
<email>pawel.moll@arm.com</email>
</author>
<published>2014-06-13T15:03:32Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=858879737bb5b7b7dca1d84df9018e7eb46d7294'/>
<id>urn:sha1:858879737bb5b7b7dca1d84df9018e7eb46d7294</id>
<content type='text'>
commit b3f207855f57b9c8f43a547a801340bb5cbc59e5 upstream.

When running a 32-bit userspace on a 64-bit kernel (eg. i386
application on x86_64 kernel or 32-bit arm userspace on arm64
kernel) some of the perf ioctls must be treated with special
care, as they have a pointer size encoded in the command.

For example, PERF_EVENT_IOC_ID in 32-bit world will be encoded
as 0x80042407, but 64-bit kernel will expect 0x80082407. In
result the ioctl will fail returning -ENOTTY.

This patch solves the problem by adding code fixing up the
size as compat_ioctl file operation.

Reported-by: Drew Richardson &lt;drew.richardson@arm.com&gt;
Signed-off-by: Pawel Moll &lt;pawel.moll@arm.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@kernel.org&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Link: http://lkml.kernel.org/r/1402671812-9078-1-git-send-email-pawel.moll@arm.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: David Ahern &lt;daahern@cisco.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>audit: keep inode pinned</title>
<updated>2014-11-21T17:22:52Z</updated>
<author>
<name>Miklos Szeredi</name>
<email>mszeredi@suse.cz</email>
</author>
<published>2014-11-04T10:27:12Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=bd501a2eb28282b657555b32acbc65b6c102af1d'/>
<id>urn:sha1:bd501a2eb28282b657555b32acbc65b6c102af1d</id>
<content type='text'>
commit 799b601451b21ebe7af0e6e8f6e2ccd4683c5064 upstream.

Audit rules disappear when an inode they watch is evicted from the cache.
This is likely not what we want.

The guilty commit is "fsnotify: allow marks to not pin inodes in core",
which didn't take into account that audit_tree adds watches with a zero
mask.

Adding any mask should fix this.

Fixes: 90b1e7a57880 ("fsnotify: allow marks to not pin inodes in core")
Signed-off-by: Miklos Szeredi &lt;mszeredi@suse.cz&gt;
Signed-off-by: Paul Moore &lt;pmoore@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>posix-timers: Fix stack info leak in timer_create()</title>
<updated>2014-11-14T16:48:00Z</updated>
<author>
<name>Mathias Krause</name>
<email>minipli@googlemail.com</email>
</author>
<published>2014-10-04T21:06:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f9b6264a0fdf9268e61a065177dde1c8f823dfc0'/>
<id>urn:sha1:f9b6264a0fdf9268e61a065177dde1c8f823dfc0</id>
<content type='text'>
commit 6891c4509c792209c44ced55a60f13954cb50ef4 upstream.

If userland creates a timer without specifying a sigevent info, we'll
create one ourself, using a stack local variable. Particularly will we
use the timer ID as sival_int. But as sigev_value is a union containing
a pointer and an int, that assignment will only partially initialize
sigev_value on systems where the size of a pointer is bigger than the
size of an int. On such systems we'll copy the uninitialized stack bytes
from the timer_create() call to userland when the timer actually fires
and we're going to deliver the signal.

Initialize sigev_value with 0 to plug the stack info leak.

Found in the PaX patch, written by the PaX Team.

Fixes: 5a9fa7307285 ("posix-timers: kill -&gt;it_sigev_signo and...")
Signed-off-by: Mathias Krause &lt;minipli@googlemail.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Brad Spengler &lt;spender@grsecurity.net&gt;
Cc: PaX Team &lt;pageexec@freemail.hu&gt;
Link: http://lkml.kernel.org/r/1412456799-32339-1-git-send-email-minipli@googlemail.com
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>PM / Sleep: fix recovery during resuming from hibernation</title>
<updated>2014-11-14T16:48:00Z</updated>
<author>
<name>Imre Deak</name>
<email>imre.deak@intel.com</email>
</author>
<published>2014-10-24T17:29:10Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1eaaef74ada050d820e5dd2132a56fd3c66b1549'/>
<id>urn:sha1:1eaaef74ada050d820e5dd2132a56fd3c66b1549</id>
<content type='text'>
commit 94fb823fcb4892614f57e59601bb9d4920f24711 upstream.

If a device's dev_pm_ops::freeze callback fails during the QUIESCE
phase, we don't rollback things correctly calling the thaw and complete
callbacks. This could leave some devices in a suspended state in case of
an error during resuming from hibernation.

Signed-off-by: Imre Deak &lt;imre.deak@intel.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>OOM, PM: OOM killed task shouldn't escape PM suspend</title>
<updated>2014-11-14T16:47:58Z</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.cz</email>
</author>
<published>2014-10-20T16:12:32Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e033782a2669ae60db31d28127974bac18c63e33'/>
<id>urn:sha1:e033782a2669ae60db31d28127974bac18c63e33</id>
<content type='text'>
commit 5695be142e203167e3cb515ef86a88424f3524eb upstream.

PM freezer relies on having all tasks frozen by the time devices are
getting frozen so that no task will touch them while they are getting
frozen. But OOM killer is allowed to kill an already frozen task in
order to handle OOM situtation. In order to protect from late wake ups
OOM killer is disabled after all tasks are frozen. This, however, still
keeps a window open when a killed task didn't manage to die by the time
freeze_processes finishes.

Reduce the race window by checking all tasks after OOM killer has been
disabled. This is still not race free completely unfortunately because
oom_killer_disable cannot stop an already ongoing OOM killer so a task
might still wake up from the fridge and get killed without
freeze_processes noticing. Full synchronization of OOM and freezer is,
however, too heavy weight for this highly unlikely case.

Introduce and check oom_kills counter which gets incremented early when
the allocator enters __alloc_pages_may_oom path and only check all the
tasks if the counter changes during the freezing attempt. The counter
is updated so early to reduce the race window since allocator checked
oom_killer_disabled which is set by PM-freezing code. A false positive
will push the PM-freezer into a slow path but that is not a big deal.

Changes since v1
- push the re-check loop out of freeze_processes into
  check_frozen_processes and invert the condition to make the code more
  readable as per Rafael

Fixes: f660daac474c6f (oom: thaw threads if oom killed thread is frozen before deferring)
Signed-off-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>freezer: Do not freeze tasks killed by OOM killer</title>
<updated>2014-11-14T16:47:58Z</updated>
<author>
<name>Cong Wang</name>
<email>xiyou.wangcong@gmail.com</email>
</author>
<published>2014-10-21T07:27:12Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f95ad6ed20948dee0a7c1472250b530136f75db3'/>
<id>urn:sha1:f95ad6ed20948dee0a7c1472250b530136f75db3</id>
<content type='text'>
commit 51fae6da640edf9d266c94f36bc806c63c301991 upstream.

Since f660daac474c6f (oom: thaw threads if oom killed thread is frozen
before deferring) OOM killer relies on being able to thaw a frozen task
to handle OOM situation but a3201227f803 (freezer: make freezing() test
freeze conditions in effect instead of TIF_FREEZE) has reorganized the
code and stopped clearing freeze flag in __thaw_task. This means that
the target task only wakes up and goes into the fridge again because the
freezing condition hasn't changed for it. This reintroduces the bug
fixed by f660daac474c6f.

Fix the issue by checking for TIF_MEMDIE thread flag in
freezing_slow_path and exclude the task from freezing completely. If a
task was already frozen it would get woken by __thaw_task from OOM killer
and get out of freezer after rechecking freezing().

Changes since v1
- put TIF_MEMDIE check into freezing_slowpath rather than in __refrigerator
  as per Oleg
- return __thaw_task into oom_scan_process_thread because
  oom_kill_process will not wake task in the fridge because it is
  sleeping uninterruptible

[mhocko@suse.cz: rewrote the changelog]
Fixes: a3201227f803 (freezer: make freezing() test freeze conditions in effect instead of TIF_FREEZE)
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>modules, lock around setting of MODULE_STATE_UNFORMED</title>
<updated>2014-11-14T16:47:55Z</updated>
<author>
<name>Prarit Bhargava</name>
<email>prarit@redhat.com</email>
</author>
<published>2014-10-13T16:21:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5ade16953067e758d0ea86f133f74319a851328e'/>
<id>urn:sha1:5ade16953067e758d0ea86f133f74319a851328e</id>
<content type='text'>
commit d3051b489aa81ca9ba62af366149ef42b8dae97c upstream.

A panic was seen in the following sitation.

There are two threads running on the system. The first thread is a system
monitoring thread that is reading /proc/modules. The second thread is
loading and unloading a module (in this example I'm using my simple
dummy-module.ko).  Note, in the "real world" this occurred with the qlogic
driver module.

When doing this, the following panic occurred:

 ------------[ cut here ]------------
 kernel BUG at kernel/module.c:3739!
 invalid opcode: 0000 [#1] SMP
 Modules linked in: binfmt_misc sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw igb gf128mul glue_helper iTCO_wdt iTCO_vendor_support ablk_helper ptp sb_edac cryptd pps_core edac_core shpchp i2c_i801 pcspkr wmi lpc_ich ioatdma mfd_core dca ipmi_si nfsd ipmi_msghandler auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm isci drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: dummy_module]
 CPU: 37 PID: 186343 Comm: cat Tainted: GF          O--------------   3.10.0+ #7
 Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013
 task: ffff8807fd2d8000 ti: ffff88080fa7c000 task.ti: ffff88080fa7c000
 RIP: 0010:[&lt;ffffffff810d64c5&gt;]  [&lt;ffffffff810d64c5&gt;] module_flags+0xb5/0xc0
 RSP: 0018:ffff88080fa7fe18  EFLAGS: 00010246
 RAX: 0000000000000003 RBX: ffffffffa03b5200 RCX: 0000000000000000
 RDX: 0000000000001000 RSI: ffff88080fa7fe38 RDI: ffffffffa03b5000
 RBP: ffff88080fa7fe28 R08: 0000000000000010 R09: 0000000000000000
 R10: 0000000000000000 R11: 000000000000000f R12: ffffffffa03b5000
 R13: ffffffffa03b5008 R14: ffffffffa03b5200 R15: ffffffffa03b5000
 FS:  00007f6ae57ef740(0000) GS:ffff88101e7a0000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000404f70 CR3: 0000000ffed48000 CR4: 00000000001407e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Stack:
  ffffffffa03b5200 ffff8810101e4800 ffff88080fa7fe70 ffffffff810d666c
  ffff88081e807300 000000002e0f2fbf 0000000000000000 ffff88100f257b00
  ffffffffa03b5008 ffff88080fa7ff48 ffff8810101e4800 ffff88080fa7fee0
 Call Trace:
  [&lt;ffffffff810d666c&gt;] m_show+0x19c/0x1e0
  [&lt;ffffffff811e4d7e&gt;] seq_read+0x16e/0x3b0
  [&lt;ffffffff812281ed&gt;] proc_reg_read+0x3d/0x80
  [&lt;ffffffff811c0f2c&gt;] vfs_read+0x9c/0x170
  [&lt;ffffffff811c1a58&gt;] SyS_read+0x58/0xb0
  [&lt;ffffffff81605829&gt;] system_call_fastpath+0x16/0x1b
 Code: 48 63 c2 83 c2 01 c6 04 03 29 48 63 d2 eb d9 0f 1f 80 00 00 00 00 48 63 d2 c6 04 13 2d 41 8b 0c 24 8d 50 02 83 f9 01 75 b2 eb cb &lt;0f&gt; 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41
 RIP  [&lt;ffffffff810d64c5&gt;] module_flags+0xb5/0xc0
  RSP &lt;ffff88080fa7fe18&gt;

    Consider the two processes running on the system.

    CPU 0 (/proc/modules reader)
    CPU 1 (loading/unloading module)

    CPU 0 opens /proc/modules, and starts displaying data for each module by
    traversing the modules list via fs/seq_file.c:seq_open() and
    fs/seq_file.c:seq_read().  For each module in the modules list, seq_read
    does

            op-&gt;start()  &lt;-- this is a pointer to m_start()
            op-&gt;show()   &lt;- this is a pointer to m_show()
            op-&gt;stop()   &lt;-- this is a pointer to m_stop()

    The m_start(), m_show(), and m_stop() module functions are defined in
    kernel/module.c. The m_start() and m_stop() functions acquire and release
    the module_mutex respectively.

    ie) When reading /proc/modules, the module_mutex is acquired and released
    for each module.

    m_show() is called with the module_mutex held.  It accesses the module
    struct data and attempts to write out module data.  It is in this code
    path that the above BUG_ON() warning is encountered, specifically m_show()
    calls

    static char *module_flags(struct module *mod, char *buf)
    {
            int bx = 0;

            BUG_ON(mod-&gt;state == MODULE_STATE_UNFORMED);
    ...

    The other thread, CPU 1, in unloading the module calls the syscall
    delete_module() defined in kernel/module.c.  The module_mutex is acquired
    for a short time, and then released.  free_module() is called without the
    module_mutex.  free_module() then sets mod-&gt;state = MODULE_STATE_UNFORMED,
    also without the module_mutex.  Some additional code is called and then the
    module_mutex is reacquired to remove the module from the modules list:

        /* Now we can delete it from the lists */
        mutex_lock(&amp;module_mutex);
        stop_machine(__unlink_module, mod, NULL);
        mutex_unlock(&amp;module_mutex);

This is the sequence of events that leads to the panic.

CPU 1 is removing dummy_module via delete_module().  It acquires the
module_mutex, and then releases it.  CPU 1 has NOT set dummy_module-&gt;state to
MODULE_STATE_UNFORMED yet.

CPU 0, which is reading the /proc/modules, acquires the module_mutex and
acquires a pointer to the dummy_module which is still in the modules list.
CPU 0 calls m_show for dummy_module.  The check in m_show() for
MODULE_STATE_UNFORMED passed for dummy_module even though it is being
torn down.

Meanwhile CPU 1, which has been continuing to remove dummy_module without
holding the module_mutex, now calls free_module() and sets
dummy_module-&gt;state to MODULE_STATE_UNFORMED.

CPU 0 now calls module_flags() with dummy_module and ...

static char *module_flags(struct module *mod, char *buf)
{
        int bx = 0;

        BUG_ON(mod-&gt;state == MODULE_STATE_UNFORMED);

and BOOM.

Acquire and release the module_mutex lock around the setting of
MODULE_STATE_UNFORMED in the teardown path, which should resolve the
problem.

Testing: In the unpatched kernel I can panic the system within 1 minute by
doing

while (true) do insmod dummy_module.ko; rmmod dummy_module.ko; done

and

while (true) do cat /proc/modules; done

in separate terminals.

In the patched kernel I was able to run just over one hour without seeing
any issues.  I also verified the output of panic via sysrq-c and the output
of /proc/modules looks correct for all three states for the dummy_module.

        dummy_module 12661 0 - Unloading 0xffffffffa03a5000 (OE-)
        dummy_module 12661 0 - Live 0xffffffffa03bb000 (OE)
        dummy_module 14015 1 - Loading 0xffffffffa03a5000 (OE+)

Signed-off-by: Prarit Bhargava &lt;prarit@redhat.com&gt;
Reviewed-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>tracing/syscalls: Ignore numbers outside NR_syscalls' range</title>
<updated>2014-11-14T16:47:53Z</updated>
<author>
<name>Rabin Vincent</name>
<email>rabin@rab.in</email>
</author>
<published>2014-10-29T22:06:58Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3ad3add775181f56f51ed14324ed4e7f1c9d3d1e'/>
<id>urn:sha1:3ad3add775181f56f51ed14324ed4e7f1c9d3d1e</id>
<content type='text'>
commit 086ba77a6db00ed858ff07451bedee197df868c9 upstream.

ARM has some private syscalls (for example, set_tls(2)) which lie
outside the range of NR_syscalls.  If any of these are called while
syscall tracing is being performed, out-of-bounds array access will
occur in the ftrace and perf sys_{enter,exit} handlers.

 # trace-cmd record -e raw_syscalls:* true &amp;&amp; trace-cmd report
 ...
 true-653   [000]   384.675777: sys_enter:            NR 192 (0, 1000, 3, 4000022, ffffffff, 0)
 true-653   [000]   384.675812: sys_exit:             NR 192 = 1995915264
 true-653   [000]   384.675971: sys_enter:            NR 983045 (76f74480, 76f74000, 76f74b28, 76f74480, 76f76f74, 1)
 true-653   [000]   384.675988: sys_exit:             NR 983045 = 0
 ...

 # trace-cmd record -e syscalls:* true
 [   17.289329] Unable to handle kernel paging request at virtual address aaaaaace
 [   17.289590] pgd = 9e71c000
 [   17.289696] [aaaaaace] *pgd=00000000
 [   17.289985] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
 [   17.290169] Modules linked in:
 [   17.290391] CPU: 0 PID: 704 Comm: true Not tainted 3.18.0-rc2+ #21
 [   17.290585] task: 9f4dab00 ti: 9e710000 task.ti: 9e710000
 [   17.290747] PC is at ftrace_syscall_enter+0x48/0x1f8
 [   17.290866] LR is at syscall_trace_enter+0x124/0x184

Fix this by ignoring out-of-NR_syscalls-bounds syscall numbers.

Commit cd0980fc8add "tracing: Check invalid syscall nr while tracing syscalls"
added the check for less than zero, but it should have also checked
for greater than NR_syscalls.

Link: http://lkml.kernel.org/p/1414620418-29472-1-git-send-email-rabin@rab.in

Fixes: cd0980fc8add "tracing: Check invalid syscall nr while tracing syscalls"
Signed-off-by: Rabin Vincent &lt;rabin@rab.in&gt;
Signed-off-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>jiffies: Fix timeval conversion to jiffies</title>
<updated>2014-10-09T19:18:43Z</updated>
<author>
<name>Andrew Hunter</name>
<email>ahh@google.com</email>
</author>
<published>2014-09-04T21:17:16Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=00790d4526bd88e711999b9af04a0e896cfbf5a8'/>
<id>urn:sha1:00790d4526bd88e711999b9af04a0e896cfbf5a8</id>
<content type='text'>
commit d78c9300c51d6ceed9f6d078d4e9366f259de28c upstream.

timeval_to_jiffies tried to round a timeval up to an integral number
of jiffies, but the logic for doing so was incorrect: intervals
corresponding to exactly N jiffies would become N+1. This manifested
itself particularly repeatedly stopping/starting an itimer:

setitimer(ITIMER_PROF, &amp;val, NULL);
setitimer(ITIMER_PROF, NULL, &amp;val);

would add a full tick to val, _even if it was exactly representable in
terms of jiffies_ (say, the result of a previous rounding.)  Doing
this repeatedly would cause unbounded growth in val.  So fix the math.

Here's what was wrong with the conversion: we essentially computed
(eliding seconds)

jiffies = usec  * (NSEC_PER_USEC/TICK_NSEC)

by using scaling arithmetic, which took the best approximation of
NSEC_PER_USEC/TICK_NSEC with denominator of 2^USEC_JIFFIE_SC =
x/(2^USEC_JIFFIE_SC), and computed:

jiffies = (usec * x) &gt;&gt; USEC_JIFFIE_SC

and rounded this calculation up in the intermediate form (since we
can't necessarily exactly represent TICK_NSEC in usec.) But the
scaling arithmetic is a (very slight) *over*approximation of the true
value; that is, instead of dividing by (1 usec/ 1 jiffie), we
effectively divided by (1 usec/1 jiffie)-epsilon (rounding
down). This would normally be fine, but we want to round timeouts up,
and we did so by adding 2^USEC_JIFFIE_SC - 1 before the shift; this
would be fine if our division was exact, but dividing this by the
slightly smaller factor was equivalent to adding just _over_ 1 to the
final result (instead of just _under_ 1, as desired.)

In particular, with HZ=1000, we consistently computed that 10000 usec
was 11 jiffies; the same was true for any exact multiple of
TICK_NSEC.

We could possibly still round in the intermediate form, adding
something less than 2^USEC_JIFFIE_SC - 1, but easier still is to
convert usec-&gt;nsec, round in nanoseconds, and then convert using
time*spec*_to_jiffies.  This adds one constant multiplication, and is
not observably slower in microbenchmarks on recent x86 hardware.

Tested: the following program:

int main() {
  struct itimerval zero = {{0, 0}, {0, 0}};
  /* Initially set to 10 ms. */
  struct itimerval initial = zero;
  initial.it_interval.tv_usec = 10000;
  setitimer(ITIMER_PROF, &amp;initial, NULL);
  /* Save and restore several times. */
  for (size_t i = 0; i &lt; 10; ++i) {
    struct itimerval prev;
    setitimer(ITIMER_PROF, &amp;zero, &amp;prev);
    /* on old kernels, this goes up by TICK_USEC every iteration */
    printf("previous value: %ld %ld %ld %ld\n",
           prev.it_interval.tv_sec, prev.it_interval.tv_usec,
           prev.it_value.tv_sec, prev.it_value.tv_usec);
    setitimer(ITIMER_PROF, &amp;prev, NULL);
  }
    return 0;
}


Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Paul Turner &lt;pjt@google.com&gt;
Cc: Richard Cochran &lt;richardcochran@gmail.com&gt;
Cc: Prarit Bhargava &lt;prarit@redhat.com&gt;
Reviewed-by: Paul Turner &lt;pjt@google.com&gt;
Reported-by: Aaron Jacobs &lt;jacobsa@google.com&gt;
Signed-off-by: Andrew Hunter &lt;ahh@google.com&gt;
[jstultz: Tweaked to apply to 3.17-rc]
Signed-off-by: John Stultz &lt;john.stultz@linaro.org&gt;
[bwh: Backported to 3.16: adjust filename]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
</feed>
