<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/security, branch v3.12.34</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.12.34</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.12.34'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2014-11-13T18:02:18Z</updated>
<entry>
<title>selinux: fix inode security list corruption</title>
<updated>2014-11-13T18:02:18Z</updated>
<author>
<name>Stephen Smalley</name>
<email>sds@tycho.nsa.gov</email>
</author>
<published>2014-10-06T20:32:52Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ff6fe80314fb2b131503835ce59b5ba9210d5864'/>
<id>urn:sha1:ff6fe80314fb2b131503835ce59b5ba9210d5864</id>
<content type='text'>
commit 923190d32de4428afbea5e5773be86bea60a9925 upstream.

sb_finish_set_opts() can race with inode_free_security()
when initializing inode security structures for inodes
created prior to initial policy load or by the filesystem
during -&gt;mount().   This appears to have always been
a possible race, but commit 3dc91d4 ("SELinux:  Fix possible
NULL pointer dereference in selinux_inode_permission()")
made it more evident by immediately reusing the unioned
list/rcu element  of the inode security structure for call_rcu()
upon an inode_free_security().  But the underlying issue
was already present before that commit as a possible use-after-free
of isec.

Shivnandan Kumar reported the list corruption and proposed
a patch to split the list and rcu elements out of the union
as separate fields of the inode_security_struct so that setting
the rcu element would not affect the list element.  However,
this would merely hide the issue and not truly fix the code.

This patch instead moves up the deletion of the list entry
prior to dropping the sbsec-&gt;isec_lock initially.  Then,
if the inode is dropped subsequently, there will be no further
references to the isec.

Reported-by: Shivnandan Kumar &lt;shivnandan.k@samsung.com&gt;
Signed-off-by: Stephen Smalley &lt;sds@tycho.nsa.gov&gt;
Signed-off-by: Paul Moore &lt;pmoore@redhat.com&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
</entry>
<entry>
<title>evm: check xattr value length and type in evm_inode_setxattr()</title>
<updated>2014-11-13T18:02:14Z</updated>
<author>
<name>Dmitry Kasatkin</name>
<email>d.kasatkin@samsung.com</email>
</author>
<published>2014-10-28T12:28:49Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=54ae5738de038c08cc19eee64f58a5faf46d091a'/>
<id>urn:sha1:54ae5738de038c08cc19eee64f58a5faf46d091a</id>
<content type='text'>
commit 3b1deef6b1289a99505858a3b212c5b50adf0c2f upstream.

evm_inode_setxattr() can be called with no value. The function does not
check the length so that following command can be used to produce the
kernel oops: setfattr -n security.evm FOO. This patch fixes it.

Changes in v3:
* there is no reason to return different error codes for EVM_XATTR_HMAC
  and non EVM_XATTR_HMAC. Remove unnecessary test then.

Changes in v2:
* testing for validity of xattr type

[ 1106.396921] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 1106.398192] IP: [&lt;ffffffff812af7b8&gt;] evm_inode_setxattr+0x2a/0x48
[ 1106.399244] PGD 29048067 PUD 290d7067 PMD 0
[ 1106.399953] Oops: 0000 [#1] SMP
[ 1106.400020] Modules linked in: bridge stp llc evdev serio_raw i2c_piix4 button fuse
[ 1106.400020] CPU: 0 PID: 3635 Comm: setxattr Not tainted 3.16.0-kds+ #2936
[ 1106.400020] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 1106.400020] task: ffff8800291a0000 ti: ffff88002917c000 task.ti: ffff88002917c000
[ 1106.400020] RIP: 0010:[&lt;ffffffff812af7b8&gt;]  [&lt;ffffffff812af7b8&gt;] evm_inode_setxattr+0x2a/0x48
[ 1106.400020] RSP: 0018:ffff88002917fd50  EFLAGS: 00010246
[ 1106.400020] RAX: 0000000000000000 RBX: ffff88002917fdf8 RCX: 0000000000000000
[ 1106.400020] RDX: 0000000000000000 RSI: ffffffff818136d3 RDI: ffff88002917fdf8
[ 1106.400020] RBP: ffff88002917fd68 R08: 0000000000000000 R09: 00000000003ec1df
[ 1106.400020] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800438a0a00
[ 1106.400020] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 1106.400020] FS:  00007f7dfa7d7740(0000) GS:ffff88005da00000(0000) knlGS:0000000000000000
[ 1106.400020] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1106.400020] CR2: 0000000000000000 CR3: 000000003763e000 CR4: 00000000000006f0
[ 1106.400020] Stack:
[ 1106.400020]  ffff8800438a0a00 ffff88002917fdf8 0000000000000000 ffff88002917fd98
[ 1106.400020]  ffffffff812a1030 ffff8800438a0a00 ffff88002917fdf8 0000000000000000
[ 1106.400020]  0000000000000000 ffff88002917fde0 ffffffff8116d08a ffff88002917fdc8
[ 1106.400020] Call Trace:
[ 1106.400020]  [&lt;ffffffff812a1030&gt;] security_inode_setxattr+0x5d/0x6a
[ 1106.400020]  [&lt;ffffffff8116d08a&gt;] vfs_setxattr+0x6b/0x9f
[ 1106.400020]  [&lt;ffffffff8116d1e0&gt;] setxattr+0x122/0x16c
[ 1106.400020]  [&lt;ffffffff811687e8&gt;] ? mnt_want_write+0x21/0x45
[ 1106.400020]  [&lt;ffffffff8114d011&gt;] ? __sb_start_write+0x10f/0x143
[ 1106.400020]  [&lt;ffffffff811687e8&gt;] ? mnt_want_write+0x21/0x45
[ 1106.400020]  [&lt;ffffffff811687c0&gt;] ? __mnt_want_write+0x48/0x4f
[ 1106.400020]  [&lt;ffffffff8116d3e6&gt;] SyS_setxattr+0x6e/0xb0
[ 1106.400020]  [&lt;ffffffff81529da9&gt;] system_call_fastpath+0x16/0x1b
[ 1106.400020] Code: c3 0f 1f 44 00 00 55 48 89 e5 41 55 49 89 d5 41 54 49 89 fc 53 48 89 f3 48 c7 c6 d3 36 81 81 48 89 df e8 18 22 04 00 85 c0 75 07 &lt;41&gt; 80 7d 00 02 74 0d 48 89 de 4c 89 e7 e8 5a fe ff ff eb 03 83
[ 1106.400020] RIP  [&lt;ffffffff812af7b8&gt;] evm_inode_setxattr+0x2a/0x48
[ 1106.400020]  RSP &lt;ffff88002917fd50&gt;
[ 1106.400020] CR2: 0000000000000000
[ 1106.428061] ---[ end trace ae08331628ba3050 ]---

Reported-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Dmitry Kasatkin &lt;d.kasatkin@samsung.com&gt;
Signed-off-by: Mimi Zohar &lt;zohar@linux.vnet.ibm.com&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
</entry>
<entry>
<title>CAPABILITIES: remove undefined caps from all processes</title>
<updated>2014-09-17T14:55:03Z</updated>
<author>
<name>Eric Paris</name>
<email>eparis@redhat.com</email>
</author>
<published>2014-07-23T19:36:26Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ace57c92048721f447c4e9baffaf560ebaccef5b'/>
<id>urn:sha1:ace57c92048721f447c4e9baffaf560ebaccef5b</id>
<content type='text'>
commit 7d8b6c63751cfbbe5eef81a48c22978b3407a3ad upstream.

This is effectively a revert of 7b9a7ec565505699f503b4fcf61500dceb36e744
plus fixing it a different way...

We found, when trying to run an application from an application which
had dropped privs that the kernel does security checks on undefined
capability bits.  This was ESPECIALLY difficult to debug as those
undefined bits are hidden from /proc/$PID/status.

Consider a root application which drops all capabilities from ALL 4
capability sets.  We assume, since the application is going to set
eff/perm/inh from an array that it will clear not only the defined caps
less than CAP_LAST_CAP, but also the higher 28ish bits which are
undefined future capabilities.

The BSET gets cleared differently.  Instead it is cleared one bit at a
time.  The problem here is that in security/commoncap.c::cap_task_prctl()
we actually check the validity of a capability being read.  So any task
which attempts to 'read all things set in bset' followed by 'unset all
things set in bset' will not even attempt to unset the undefined bits
higher than CAP_LAST_CAP.

So the 'parent' will look something like:
CapInh:	0000000000000000
CapPrm:	0000000000000000
CapEff:	0000000000000000
CapBnd:	ffffffc000000000

All of this 'should' be fine.  Given that these are undefined bits that
aren't supposed to have anything to do with permissions.  But they do...

So lets now consider a task which cleared the eff/perm/inh completely
and cleared all of the valid caps in the bset (but not the invalid caps
it couldn't read out of the kernel).  We know that this is exactly what
the libcap-ng library does and what the go capabilities library does.
They both leave you in that above situation if you try to clear all of
you capapabilities from all 4 sets.  If that root task calls execve()
the child task will pick up all caps not blocked by the bset.  The bset
however does not block bits higher than CAP_LAST_CAP.  So now the child
task has bits in eff which are not in the parent.  These are
'meaningless' undefined bits, but still bits which the parent doesn't
have.

The problem is now in cred_cap_issubset() (or any operation which does a
subset test) as the child, while a subset for valid cap bits, is not a
subset for invalid cap bits!  So now we set durring commit creds that
the child is not dumpable.  Given it is 'more priv' than its parent.  It
also means the parent cannot ptrace the child and other stupidity.

The solution here:
1) stop hiding capability bits in status
	This makes debugging easier!

2) stop giving any task undefined capability bits.  it's simple, it you
don't put those invalid bits in CAP_FULL_SET you won't get them in init
and you won't get them in any other task either.
	This fixes the cap_issubset() tests and resulting fallout (which
	made the init task in a docker container untraceable among other
	things)

3) mask out undefined bits when sys_capset() is called as it might use
~0, ~0 to denote 'all capabilities' for backward/forward compatibility.
	This lets 'capsh --caps="all=eip" -- -c /bin/bash' run.

4) mask out undefined bit when we read a file capability off of disk as
again likely all bits are set in the xattr for forward/backward
compatibility.
	This lets 'setcap all+pe /bin/bash; /bin/bash' run

Signed-off-by: Eric Paris &lt;eparis@redhat.com&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Andrew Vagin &lt;avagin@openvz.org&gt;
Cc: Andrew G. Morgan &lt;morgan@kernel.org&gt;
Cc: Serge E. Hallyn &lt;serge.hallyn@canonical.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Steve Grubb &lt;sgrubb@redhat.com&gt;
Cc: Dan Walsh &lt;dwalsh@redhat.com&gt;
Signed-off-by: James Morris &lt;james.l.morris@oracle.com&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;

</content>
</entry>
<entry>
<title>evm: prohibit userspace writing 'security.evm' HMAC value</title>
<updated>2014-06-23T08:27:54Z</updated>
<author>
<name>Mimi Zohar</name>
<email>zohar@linux.vnet.ibm.com</email>
</author>
<published>2014-05-11T04:05:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f422f975187e23c15cf8be951490623428659eba'/>
<id>urn:sha1:f422f975187e23c15cf8be951490623428659eba</id>
<content type='text'>
commit 2fb1c9a4f2dbc2f0bd2431c7fa64d0b5483864e4 upstream.

Calculating the 'security.evm' HMAC value requires access to the
EVM encrypted key.  Only the kernel should have access to it.  This
patch prevents userspace tools(eg. setfattr, cp --preserve=xattr)
from setting/modifying the 'security.evm' HMAC value directly.

Signed-off-by: Mimi Zohar &lt;zohar@linux.vnet.ibm.com&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
</entry>
<entry>
<title>ima: introduce ima_kernel_read()</title>
<updated>2014-06-23T08:27:54Z</updated>
<author>
<name>Dmitry Kasatkin</name>
<email>d.kasatkin@samsung.com</email>
</author>
<published>2014-05-08T11:03:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4a3802293bfe49344739ff355d9d9f0514f31e19'/>
<id>urn:sha1:4a3802293bfe49344739ff355d9d9f0514f31e19</id>
<content type='text'>
commit 0430e49b6e7c6b5e076be8fefdee089958c9adad upstream.

Commit 8aac62706 "move exit_task_namespaces() outside of exit_notify"
introduced the kernel opps since the kernel v3.10, which happens when
Apparmor and IMA-appraisal are enabled at the same time.

----------------------------------------------------------------------
[  106.750167] BUG: unable to handle kernel NULL pointer dereference at
0000000000000018
[  106.750221] IP: [&lt;ffffffff811ec7da&gt;] our_mnt+0x1a/0x30
[  106.750241] PGD 0
[  106.750254] Oops: 0000 [#1] SMP
[  106.750272] Modules linked in: cuse parport_pc ppdev bnep rfcomm
bluetooth rpcsec_gss_krb5 nfsd auth_rpcgss nfs_acl nfs lockd sunrpc
fscache dm_crypt intel_rapl x86_pkg_temp_thermal intel_powerclamp
kvm_intel snd_hda_codec_hdmi kvm crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul
ablk_helper cryptd snd_hda_codec_realtek dcdbas snd_hda_intel
snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_seq_midi
snd_seq_midi_event snd_rawmidi psmouse snd_seq microcode serio_raw
snd_timer snd_seq_device snd soundcore video lpc_ich coretemp mac_hid lp
parport mei_me mei nbd hid_generic e1000e usbhid ahci ptp hid libahci
pps_core
[  106.750658] CPU: 6 PID: 1394 Comm: mysqld Not tainted 3.13.0-rc7-kds+ #15
[  106.750673] Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A08
09/19/2012
[  106.750689] task: ffff8800de804920 ti: ffff880400fca000 task.ti:
ffff880400fca000
[  106.750704] RIP: 0010:[&lt;ffffffff811ec7da&gt;]  [&lt;ffffffff811ec7da&gt;]
our_mnt+0x1a/0x30
[  106.750725] RSP: 0018:ffff880400fcba60  EFLAGS: 00010286
[  106.750738] RAX: 0000000000000000 RBX: 0000000000000100 RCX:
ffff8800d51523e7
[  106.750764] RDX: ffffffffffffffea RSI: ffff880400fcba34 RDI:
ffff880402d20020
[  106.750791] RBP: ffff880400fcbae0 R08: 0000000000000000 R09:
0000000000000001
[  106.750817] R10: 0000000000000000 R11: 0000000000000001 R12:
ffff8800d5152300
[  106.750844] R13: ffff8803eb8df510 R14: ffff880400fcbb28 R15:
ffff8800d51523e7
[  106.750871] FS:  0000000000000000(0000) GS:ffff88040d200000(0000)
knlGS:0000000000000000
[  106.750910] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  106.750935] CR2: 0000000000000018 CR3: 0000000001c0e000 CR4:
00000000001407e0
[  106.750962] Stack:
[  106.750981]  ffffffff813434eb ffff880400fcbb20 ffff880400fcbb18
0000000000000000
[  106.751037]  ffff8800de804920 ffffffff8101b9b9 0001800000000000
0000000000000100
[  106.751093]  0000010000000000 0000000000000002 000000000000000e
ffff8803eb8df500
[  106.751149] Call Trace:
[  106.751172]  [&lt;ffffffff813434eb&gt;] ? aa_path_name+0x2ab/0x430
[  106.751199]  [&lt;ffffffff8101b9b9&gt;] ? sched_clock+0x9/0x10
[  106.751225]  [&lt;ffffffff8134a68d&gt;] aa_path_perm+0x7d/0x170
[  106.751250]  [&lt;ffffffff8101b945&gt;] ? native_sched_clock+0x15/0x80
[  106.751276]  [&lt;ffffffff8134aa73&gt;] aa_file_perm+0x33/0x40
[  106.751301]  [&lt;ffffffff81348c5e&gt;] common_file_perm+0x8e/0xb0
[  106.751327]  [&lt;ffffffff81348d78&gt;] apparmor_file_permission+0x18/0x20
[  106.751355]  [&lt;ffffffff8130c853&gt;] security_file_permission+0x23/0xa0
[  106.751382]  [&lt;ffffffff811c77a2&gt;] rw_verify_area+0x52/0xe0
[  106.751407]  [&lt;ffffffff811c789d&gt;] vfs_read+0x6d/0x170
[  106.751432]  [&lt;ffffffff811cda31&gt;] kernel_read+0x41/0x60
[  106.751457]  [&lt;ffffffff8134fd45&gt;] ima_calc_file_hash+0x225/0x280
[  106.751483]  [&lt;ffffffff8134fb52&gt;] ? ima_calc_file_hash+0x32/0x280
[  106.751509]  [&lt;ffffffff8135022d&gt;] ima_collect_measurement+0x9d/0x160
[  106.751536]  [&lt;ffffffff810b552d&gt;] ? trace_hardirqs_on+0xd/0x10
[  106.751562]  [&lt;ffffffff8134f07c&gt;] ? ima_file_free+0x6c/0xd0
[  106.751587]  [&lt;ffffffff81352824&gt;] ima_update_xattr+0x34/0x60
[  106.751612]  [&lt;ffffffff8134f0d0&gt;] ima_file_free+0xc0/0xd0
[  106.751637]  [&lt;ffffffff811c9635&gt;] __fput+0xd5/0x300
[  106.751662]  [&lt;ffffffff811c98ae&gt;] ____fput+0xe/0x10
[  106.751687]  [&lt;ffffffff81086774&gt;] task_work_run+0xc4/0xe0
[  106.751712]  [&lt;ffffffff81066fad&gt;] do_exit+0x2bd/0xa90
[  106.751738]  [&lt;ffffffff8173c958&gt;] ? retint_swapgs+0x13/0x1b
[  106.751763]  [&lt;ffffffff8106780c&gt;] do_group_exit+0x4c/0xc0
[  106.751788]  [&lt;ffffffff81067894&gt;] SyS_exit_group+0x14/0x20
[  106.751814]  [&lt;ffffffff8174522d&gt;] system_call_fastpath+0x1a/0x1f
[  106.751839] Code: c3 0f 1f 44 00 00 55 48 89 e5 e8 22 fe ff ff 5d c3
0f 1f 44 00 00 55 65 48 8b 04 25 c0 c9 00 00 48 8b 80 28 06 00 00 48 89
e5 5d &lt;48&gt; 8b 40 18 48 39 87 c0 00 00 00 0f 94 c0 c3 0f 1f 80 00 00 00
[  106.752185] RIP  [&lt;ffffffff811ec7da&gt;] our_mnt+0x1a/0x30
[  106.752214]  RSP &lt;ffff880400fcba60&gt;
[  106.752236] CR2: 0000000000000018
[  106.752258] ---[ end trace 3c520748b4732721 ]---
----------------------------------------------------------------------

The reason for the oops is that IMA-appraisal uses "kernel_read()" when
file is closed. kernel_read() honors LSM security hook which calls
Apparmor handler, which uses current-&gt;nsproxy-&gt;mnt_ns. The 'guilty'
commit changed the order of cleanup code so that nsproxy-&gt;mnt_ns was
not already available for Apparmor.

Discussion about the issue with Al Viro and Eric W. Biederman suggested
that kernel_read() is too high-level for IMA. Another issue, except
security checking, that was identified is mandatory locking. kernel_read
honors it as well and it might prevent IMA from calculating necessary hash.
It was suggested to use simplified version of the function without security
and locking checks.

This patch introduces special version ima_kernel_read(), which skips security
and mandatory locking checking. It prevents the kernel oops to happen.

Signed-off-by: Dmitry Kasatkin &lt;d.kasatkin@samsung.com&gt;
Suggested-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Signed-off-by: Mimi Zohar &lt;zohar@linux.vnet.ibm.com&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
</entry>
<entry>
<title>ima: audit log files opened with O_DIRECT flag</title>
<updated>2014-06-20T15:34:22Z</updated>
<author>
<name>Mimi Zohar</name>
<email>zohar@linux.vnet.ibm.com</email>
</author>
<published>2014-06-20T13:12:18Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e6e3ee68f0906382b0440086b6b33f2bc9adcc0d'/>
<id>urn:sha1:e6e3ee68f0906382b0440086b6b33f2bc9adcc0d</id>
<content type='text'>
commit f9b2a735bdddf836214b5dca74f6ca7712e5a08c upstream.

Files are measured or appraised based on the IMA policy.  When a
file, in policy, is opened with the O_DIRECT flag, a deadlock
occurs.

The first attempt at resolving this lockdep temporarily removed the
O_DIRECT flag and restored it, after calculating the hash.  The
second attempt introduced the O_DIRECT_HAVELOCK flag. Based on this
flag, do_blockdev_direct_IO() would skip taking the i_mutex a second
time.  The third attempt, by Dmitry Kasatkin, resolves the i_mutex
locking issue, by re-introducing the IMA mutex, but uncovered
another problem.  Reading a file with O_DIRECT flag set, writes
directly to userspace pages.  A second patch allocates a user-space
like memory.  This works for all IMA hooks, except ima_file_free(),
which is called on __fput() to recalculate the file hash.

Until this last issue is addressed, do not 'collect' the
measurement for measuring, appraising, or auditing files opened
with the O_DIRECT flag set.  Based on policy, permit or deny file
access.  This patch defines a new IMA policy rule option named
'permit_directio'.  Policy rules could be defined, based on LSM
or other criteria, to permit specific applications to open files
with the O_DIRECT flag set.

Changelog v1:
- permit or deny file access based IMA policy rules

Signed-off-by: Mimi Zohar &lt;zohar@linux.vnet.ibm.com&gt;
Acked-by: Dmitry Kasatkin &lt;d.kasatkin@samsung.com&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
</entry>
<entry>
<title>device_cgroup: check if exception removal is allowed</title>
<updated>2014-06-09T13:53:24Z</updated>
<author>
<name>Aristeu Rozanski</name>
<email>aris@redhat.com</email>
</author>
<published>2014-05-05T15:18:59Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9c660fc8dad9a2c98c3cf16b31b3f5072914edf4'/>
<id>urn:sha1:9c660fc8dad9a2c98c3cf16b31b3f5072914edf4</id>
<content type='text'>
commit d2c2b11cfa134f4fbdcc34088824da26a084d8de upstream.

[PATCH v3 1/2] device_cgroup: check if exception removal is allowed

When the device cgroup hierarchy was introduced in
	bd2953ebbb53 - devcg: propagate local changes down the hierarchy

a specific case was overlooked. Consider the hierarchy bellow:

	A	default policy: ALLOW, exceptions will deny access
	 \
	  B	default policy: ALLOW, exceptions will deny access

There's no need to verify when an new exception is added to B because
in this case exceptions will deny access to further devices, which is
always fine. Hierarchy in device cgroup only makes sure B won't have
more access than A.

But when an exception is removed (by writing devices.allow), it isn't
checked if the user is in fact removing an inherited exception from A,
thus giving more access to B.

Example:

	# echo 'a' &gt;A/devices.allow
	# echo 'c 1:3 rw' &gt;A/devices.deny
	# echo $$ &gt;A/B/tasks
	# echo &gt;/dev/null
	-bash: /dev/null: Operation not permitted
	# echo 'c 1:3 w' &gt;A/B/devices.allow
	# echo &gt;/dev/null
	#

This shouldn't be allowed and this patch fixes it by making sure to never allow
exceptions in this case to be removed if the exception is partially or fully
present on the parent.

v3: missing '*' in function description
v2: improved log message and formatting fixes

Cc: cgroups@vger.kernel.org
Cc: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Aristeu Rozanski &lt;arozansk@redhat.com&gt;
Acked-by: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
</entry>
<entry>
<title>device_cgroup: rework device access check and exception checking</title>
<updated>2014-06-09T13:53:24Z</updated>
<author>
<name>Aristeu Rozanski</name>
<email>aris@redhat.com</email>
</author>
<published>2014-04-21T16:13:03Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b2daa1de05e24e824a90bfa87316f2c120debe0e'/>
<id>urn:sha1:b2daa1de05e24e824a90bfa87316f2c120debe0e</id>
<content type='text'>
commit 79d719749d23234e9b725098aa49133f3ef7299d upstream.

Whenever a device file is opened and checked against current device
cgroup rules, it uses the same function (may_access()) as when a new
exception rule is added by writing devices.{allow,deny}. And in both
cases, the algorithm is the same, doesn't matter the behavior.

First problem is having device access to be considered the same as rule
checking. Consider the following structure:

	A	(default behavior: allow, exceptions disallow access)
	 \
	  B	(default behavior: allow, exceptions disallow access)

A new exception is added to B by writing devices.deny:

	c 12:34 rw

When checking if that exception is allowed in may_access():

	if (dev_cgroup-&gt;behavior == DEVCG_DEFAULT_ALLOW) {
		if (behavior == DEVCG_DEFAULT_ALLOW) {
			/* the exception will deny access to certain devices */
			return true;

Which is ok, since B is not getting more privileges than A, it doesn't
matter and the rule is accepted

Now, consider it's a device file open check and the process belongs to
cgroup B. The access will be generated as:

	behavior: allow
	exception: c 12:34 rw

The very same chunk of code will allow it, even if there's an explicit
exception telling to do otherwise.

A simple test case:

	# mkdir new_group
	# cd new_group
	# echo $$ &gt;tasks
	# echo "c 1:3 w" &gt;devices.deny
	# echo &gt;/dev/null
	# echo $?
	0

This is a serious bug and was introduced on

	c39a2a3018f8 devcg: prepare may_access() for hierarchy support

To solve this problem, the device file open function was split from the
new exception check.

Second problem is how exceptions are processed by may_access(). The
first part of the said function tries to match fully with an existing
exception:

	list_for_each_entry_rcu(ex, &amp;dev_cgroup-&gt;exceptions, list) {
		if ((refex-&gt;type &amp; DEV_BLOCK) &amp;&amp; !(ex-&gt;type &amp; DEV_BLOCK))
			continue;
		if ((refex-&gt;type &amp; DEV_CHAR) &amp;&amp; !(ex-&gt;type &amp; DEV_CHAR))
			continue;
		if (ex-&gt;major != ~0 &amp;&amp; ex-&gt;major != refex-&gt;major)
			continue;
		if (ex-&gt;minor != ~0 &amp;&amp; ex-&gt;minor != refex-&gt;minor)
			continue;
		if (refex-&gt;access &amp; (~ex-&gt;access))
			continue;
		match = true;
		break;
	}

That means the new exception should be contained into an existing one to
be considered a match:

	New exception		Existing	match?	notes
	b 12:34 rwm		b 12:34 rwm	yes
	b 12:34 r		b *:34 rw	yes
	b 12:34 rw		b 12:34 w	no	extra "r"
	b *:34 rw		b 12:34 rw	no	too broad "*"
	b *:34 rw		b *:34 rwm	yes

Which is fine in some cases. Consider:

	A	(default behavior: deny, exceptions allow access)
	 \
	  B	(default behavior: deny, exceptions allow access)

In this case the full match makes sense, the new exception cannot add
more access than the parent allows

But this doesn't always work, consider:

	A	(default behavior: allow, exceptions disallow access)
	 \
	  B	(default behavior: deny, exceptions allow access)

In this case, a new exception in B shouldn't match any of the exceptions
in A, after all you can't allow something that was forbidden by A. But
consider this scenario:

	New exception	Existing in A	match?	outcome
	b 12:34 rw	b 12:34 r	no	exception is accepted

Because the new exception has "w" as extra, it doesn't match, so it'll
be added to B's exception list.

The same problem can happen during a file access check. Consider a
cgroup with allow as default behavior:

	Access		Exception	match?
	b 12:34 rw	b 12:34 r	no

In this case, the access didn't match any of the exceptions in the
cgroup, which is required since exceptions will disallow access.

To solve this problem, two new functions were created to match an
exception either fully or partially. In the example above, a partial
check will be performed and it'll produce a match since at least
"b 12:34 r" from "b 12:34 rw" access matches.

Cc: cgroups@vger.kernel.org
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Cc: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Aristeu Rozanski &lt;arozansk@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
</entry>
<entry>
<title>selinux: correctly label /proc inodes in use before the policy is loaded</title>
<updated>2014-04-13T14:35:51Z</updated>
<author>
<name>Paul Moore</name>
<email>pmoore@redhat.com</email>
</author>
<published>2014-03-19T20:46:18Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4a1bdd99a8f04c5dd591079a00b1c0b2dad7c51d'/>
<id>urn:sha1:4a1bdd99a8f04c5dd591079a00b1c0b2dad7c51d</id>
<content type='text'>
commit f64410ec665479d7b4b77b7519e814253ed0f686 upstream.

This patch is based on an earlier patch by Eric Paris, he describes
the problem below:

  "If an inode is accessed before policy load it will get placed on a
   list of inodes to be initialized after policy load.  After policy
   load we call inode_doinit() which calls inode_doinit_with_dentry()
   on all inodes accessed before policy load.  In the case of inodes
   in procfs that means we'll end up at the bottom where it does:

     /* Default to the fs superblock SID. */
     isec-&gt;sid = sbsec-&gt;sid;

     if ((sbsec-&gt;flags &amp; SE_SBPROC) &amp;&amp; !S_ISLNK(inode-&gt;i_mode)) {
             if (opt_dentry) {
                     isec-&gt;sclass = inode_mode_to_security_class(...)
                     rc = selinux_proc_get_sid(opt_dentry,
                                               isec-&gt;sclass,
                                               &amp;sid);
                     if (rc)
                             goto out_unlock;
                     isec-&gt;sid = sid;
             }
     }

   Since opt_dentry is null, we'll never call selinux_proc_get_sid()
   and will leave the inode labeled with the label on the superblock.
   I believe a fix would be to mimic the behavior of xattrs.  Look
   for an alias of the inode.  If it can't be found, just leave the
   inode uninitialized (and pick it up later) if it can be found, we
   should be able to call selinux_proc_get_sid() ..."

On a system exhibiting this problem, you will notice a lot of files in
/proc with the generic "proc_t" type (at least the ones that were
accessed early in the boot), for example:

   # ls -Z /proc/sys/kernel/shmmax | awk '{ print $4 " " $5 }'
   system_u:object_r:proc_t:s0 /proc/sys/kernel/shmmax

However, with this patch in place we see the expected result:

   # ls -Z /proc/sys/kernel/shmmax | awk '{ print $4 " " $5 }'
   system_u:object_r:sysctl_kernel_t:s0 /proc/sys/kernel/shmmax

Cc: Eric Paris &lt;eparis@redhat.com&gt;
Signed-off-by: Paul Moore &lt;pmoore@redhat.com&gt;
Acked-by: Eric Paris &lt;eparis@redhat.com&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
</entry>
<entry>
<title>SELinux: Increase ebitmap_node size for 64-bit configuration</title>
<updated>2014-03-12T12:25:43Z</updated>
<author>
<name>Waiman Long</name>
<email>Waiman.Long@hp.com</email>
</author>
<published>2013-07-23T21:38:41Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=17d633ca63add6ccaa77e125ba1e8693635bc9f1'/>
<id>urn:sha1:17d633ca63add6ccaa77e125ba1e8693635bc9f1</id>
<content type='text'>
commit a767f680e34bf14a36fefbbb6d85783eef99fd57 upstream.

Currently, the ebitmap_node structure has a fixed size of 32 bytes. On
a 32-bit system, the overhead is 8 bytes, leaving 24 bytes for being
used as bitmaps. The overhead ratio is 1/4.

On a 64-bit system, the overhead is 16 bytes. Therefore, only 16 bytes
are left for bitmap purpose and the overhead ratio is 1/2. With a
3.8.2 kernel, a boot-up operation will cause the ebitmap_get_bit()
function to be called about 9 million times. The average number of
ebitmap_node traversal is about 3.7.

This patch increases the size of the ebitmap_node structure to 64
bytes for 64-bit system to keep the overhead ratio at 1/4. This may
also improve performance a little bit by making node to node traversal
less frequent (&lt; 2) as more bits are available in each node.

Signed-off-by: Waiman Long &lt;Waiman.Long@hp.com&gt;
Acked-by:  Stephen Smalley &lt;sds@tycho.nsa.gov&gt;
Signed-off-by: Paul Moore &lt;pmoore@redhat.com&gt;
Signed-off-by: Eric Paris &lt;eparis@redhat.com&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
</entry>
</feed>
