<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/security, branch v3.12.25</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.12.25</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.12.25'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2014-06-23T08:27:54Z</updated>
<entry>
<title>evm: prohibit userspace writing 'security.evm' HMAC value</title>
<updated>2014-06-23T08:27:54Z</updated>
<author>
<name>Mimi Zohar</name>
<email>zohar@linux.vnet.ibm.com</email>
</author>
<published>2014-05-11T04:05:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f422f975187e23c15cf8be951490623428659eba'/>
<id>urn:sha1:f422f975187e23c15cf8be951490623428659eba</id>
<content type='text'>
commit 2fb1c9a4f2dbc2f0bd2431c7fa64d0b5483864e4 upstream.

Calculating the 'security.evm' HMAC value requires access to the
EVM encrypted key.  Only the kernel should have access to it.  This
patch prevents userspace tools(eg. setfattr, cp --preserve=xattr)
from setting/modifying the 'security.evm' HMAC value directly.

Signed-off-by: Mimi Zohar &lt;zohar@linux.vnet.ibm.com&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
</entry>
<entry>
<title>ima: introduce ima_kernel_read()</title>
<updated>2014-06-23T08:27:54Z</updated>
<author>
<name>Dmitry Kasatkin</name>
<email>d.kasatkin@samsung.com</email>
</author>
<published>2014-05-08T11:03:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4a3802293bfe49344739ff355d9d9f0514f31e19'/>
<id>urn:sha1:4a3802293bfe49344739ff355d9d9f0514f31e19</id>
<content type='text'>
commit 0430e49b6e7c6b5e076be8fefdee089958c9adad upstream.

Commit 8aac62706 "move exit_task_namespaces() outside of exit_notify"
introduced the kernel opps since the kernel v3.10, which happens when
Apparmor and IMA-appraisal are enabled at the same time.

----------------------------------------------------------------------
[  106.750167] BUG: unable to handle kernel NULL pointer dereference at
0000000000000018
[  106.750221] IP: [&lt;ffffffff811ec7da&gt;] our_mnt+0x1a/0x30
[  106.750241] PGD 0
[  106.750254] Oops: 0000 [#1] SMP
[  106.750272] Modules linked in: cuse parport_pc ppdev bnep rfcomm
bluetooth rpcsec_gss_krb5 nfsd auth_rpcgss nfs_acl nfs lockd sunrpc
fscache dm_crypt intel_rapl x86_pkg_temp_thermal intel_powerclamp
kvm_intel snd_hda_codec_hdmi kvm crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul
ablk_helper cryptd snd_hda_codec_realtek dcdbas snd_hda_intel
snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_seq_midi
snd_seq_midi_event snd_rawmidi psmouse snd_seq microcode serio_raw
snd_timer snd_seq_device snd soundcore video lpc_ich coretemp mac_hid lp
parport mei_me mei nbd hid_generic e1000e usbhid ahci ptp hid libahci
pps_core
[  106.750658] CPU: 6 PID: 1394 Comm: mysqld Not tainted 3.13.0-rc7-kds+ #15
[  106.750673] Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A08
09/19/2012
[  106.750689] task: ffff8800de804920 ti: ffff880400fca000 task.ti:
ffff880400fca000
[  106.750704] RIP: 0010:[&lt;ffffffff811ec7da&gt;]  [&lt;ffffffff811ec7da&gt;]
our_mnt+0x1a/0x30
[  106.750725] RSP: 0018:ffff880400fcba60  EFLAGS: 00010286
[  106.750738] RAX: 0000000000000000 RBX: 0000000000000100 RCX:
ffff8800d51523e7
[  106.750764] RDX: ffffffffffffffea RSI: ffff880400fcba34 RDI:
ffff880402d20020
[  106.750791] RBP: ffff880400fcbae0 R08: 0000000000000000 R09:
0000000000000001
[  106.750817] R10: 0000000000000000 R11: 0000000000000001 R12:
ffff8800d5152300
[  106.750844] R13: ffff8803eb8df510 R14: ffff880400fcbb28 R15:
ffff8800d51523e7
[  106.750871] FS:  0000000000000000(0000) GS:ffff88040d200000(0000)
knlGS:0000000000000000
[  106.750910] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  106.750935] CR2: 0000000000000018 CR3: 0000000001c0e000 CR4:
00000000001407e0
[  106.750962] Stack:
[  106.750981]  ffffffff813434eb ffff880400fcbb20 ffff880400fcbb18
0000000000000000
[  106.751037]  ffff8800de804920 ffffffff8101b9b9 0001800000000000
0000000000000100
[  106.751093]  0000010000000000 0000000000000002 000000000000000e
ffff8803eb8df500
[  106.751149] Call Trace:
[  106.751172]  [&lt;ffffffff813434eb&gt;] ? aa_path_name+0x2ab/0x430
[  106.751199]  [&lt;ffffffff8101b9b9&gt;] ? sched_clock+0x9/0x10
[  106.751225]  [&lt;ffffffff8134a68d&gt;] aa_path_perm+0x7d/0x170
[  106.751250]  [&lt;ffffffff8101b945&gt;] ? native_sched_clock+0x15/0x80
[  106.751276]  [&lt;ffffffff8134aa73&gt;] aa_file_perm+0x33/0x40
[  106.751301]  [&lt;ffffffff81348c5e&gt;] common_file_perm+0x8e/0xb0
[  106.751327]  [&lt;ffffffff81348d78&gt;] apparmor_file_permission+0x18/0x20
[  106.751355]  [&lt;ffffffff8130c853&gt;] security_file_permission+0x23/0xa0
[  106.751382]  [&lt;ffffffff811c77a2&gt;] rw_verify_area+0x52/0xe0
[  106.751407]  [&lt;ffffffff811c789d&gt;] vfs_read+0x6d/0x170
[  106.751432]  [&lt;ffffffff811cda31&gt;] kernel_read+0x41/0x60
[  106.751457]  [&lt;ffffffff8134fd45&gt;] ima_calc_file_hash+0x225/0x280
[  106.751483]  [&lt;ffffffff8134fb52&gt;] ? ima_calc_file_hash+0x32/0x280
[  106.751509]  [&lt;ffffffff8135022d&gt;] ima_collect_measurement+0x9d/0x160
[  106.751536]  [&lt;ffffffff810b552d&gt;] ? trace_hardirqs_on+0xd/0x10
[  106.751562]  [&lt;ffffffff8134f07c&gt;] ? ima_file_free+0x6c/0xd0
[  106.751587]  [&lt;ffffffff81352824&gt;] ima_update_xattr+0x34/0x60
[  106.751612]  [&lt;ffffffff8134f0d0&gt;] ima_file_free+0xc0/0xd0
[  106.751637]  [&lt;ffffffff811c9635&gt;] __fput+0xd5/0x300
[  106.751662]  [&lt;ffffffff811c98ae&gt;] ____fput+0xe/0x10
[  106.751687]  [&lt;ffffffff81086774&gt;] task_work_run+0xc4/0xe0
[  106.751712]  [&lt;ffffffff81066fad&gt;] do_exit+0x2bd/0xa90
[  106.751738]  [&lt;ffffffff8173c958&gt;] ? retint_swapgs+0x13/0x1b
[  106.751763]  [&lt;ffffffff8106780c&gt;] do_group_exit+0x4c/0xc0
[  106.751788]  [&lt;ffffffff81067894&gt;] SyS_exit_group+0x14/0x20
[  106.751814]  [&lt;ffffffff8174522d&gt;] system_call_fastpath+0x1a/0x1f
[  106.751839] Code: c3 0f 1f 44 00 00 55 48 89 e5 e8 22 fe ff ff 5d c3
0f 1f 44 00 00 55 65 48 8b 04 25 c0 c9 00 00 48 8b 80 28 06 00 00 48 89
e5 5d &lt;48&gt; 8b 40 18 48 39 87 c0 00 00 00 0f 94 c0 c3 0f 1f 80 00 00 00
[  106.752185] RIP  [&lt;ffffffff811ec7da&gt;] our_mnt+0x1a/0x30
[  106.752214]  RSP &lt;ffff880400fcba60&gt;
[  106.752236] CR2: 0000000000000018
[  106.752258] ---[ end trace 3c520748b4732721 ]---
----------------------------------------------------------------------

The reason for the oops is that IMA-appraisal uses "kernel_read()" when
file is closed. kernel_read() honors LSM security hook which calls
Apparmor handler, which uses current-&gt;nsproxy-&gt;mnt_ns. The 'guilty'
commit changed the order of cleanup code so that nsproxy-&gt;mnt_ns was
not already available for Apparmor.

Discussion about the issue with Al Viro and Eric W. Biederman suggested
that kernel_read() is too high-level for IMA. Another issue, except
security checking, that was identified is mandatory locking. kernel_read
honors it as well and it might prevent IMA from calculating necessary hash.
It was suggested to use simplified version of the function without security
and locking checks.

This patch introduces special version ima_kernel_read(), which skips security
and mandatory locking checking. It prevents the kernel oops to happen.

Signed-off-by: Dmitry Kasatkin &lt;d.kasatkin@samsung.com&gt;
Suggested-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Signed-off-by: Mimi Zohar &lt;zohar@linux.vnet.ibm.com&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
</entry>
<entry>
<title>ima: audit log files opened with O_DIRECT flag</title>
<updated>2014-06-20T15:34:22Z</updated>
<author>
<name>Mimi Zohar</name>
<email>zohar@linux.vnet.ibm.com</email>
</author>
<published>2014-06-20T13:12:18Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e6e3ee68f0906382b0440086b6b33f2bc9adcc0d'/>
<id>urn:sha1:e6e3ee68f0906382b0440086b6b33f2bc9adcc0d</id>
<content type='text'>
commit f9b2a735bdddf836214b5dca74f6ca7712e5a08c upstream.

Files are measured or appraised based on the IMA policy.  When a
file, in policy, is opened with the O_DIRECT flag, a deadlock
occurs.

The first attempt at resolving this lockdep temporarily removed the
O_DIRECT flag and restored it, after calculating the hash.  The
second attempt introduced the O_DIRECT_HAVELOCK flag. Based on this
flag, do_blockdev_direct_IO() would skip taking the i_mutex a second
time.  The third attempt, by Dmitry Kasatkin, resolves the i_mutex
locking issue, by re-introducing the IMA mutex, but uncovered
another problem.  Reading a file with O_DIRECT flag set, writes
directly to userspace pages.  A second patch allocates a user-space
like memory.  This works for all IMA hooks, except ima_file_free(),
which is called on __fput() to recalculate the file hash.

Until this last issue is addressed, do not 'collect' the
measurement for measuring, appraising, or auditing files opened
with the O_DIRECT flag set.  Based on policy, permit or deny file
access.  This patch defines a new IMA policy rule option named
'permit_directio'.  Policy rules could be defined, based on LSM
or other criteria, to permit specific applications to open files
with the O_DIRECT flag set.

Changelog v1:
- permit or deny file access based IMA policy rules

Signed-off-by: Mimi Zohar &lt;zohar@linux.vnet.ibm.com&gt;
Acked-by: Dmitry Kasatkin &lt;d.kasatkin@samsung.com&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
</entry>
<entry>
<title>device_cgroup: check if exception removal is allowed</title>
<updated>2014-06-09T13:53:24Z</updated>
<author>
<name>Aristeu Rozanski</name>
<email>aris@redhat.com</email>
</author>
<published>2014-05-05T15:18:59Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9c660fc8dad9a2c98c3cf16b31b3f5072914edf4'/>
<id>urn:sha1:9c660fc8dad9a2c98c3cf16b31b3f5072914edf4</id>
<content type='text'>
commit d2c2b11cfa134f4fbdcc34088824da26a084d8de upstream.

[PATCH v3 1/2] device_cgroup: check if exception removal is allowed

When the device cgroup hierarchy was introduced in
	bd2953ebbb53 - devcg: propagate local changes down the hierarchy

a specific case was overlooked. Consider the hierarchy bellow:

	A	default policy: ALLOW, exceptions will deny access
	 \
	  B	default policy: ALLOW, exceptions will deny access

There's no need to verify when an new exception is added to B because
in this case exceptions will deny access to further devices, which is
always fine. Hierarchy in device cgroup only makes sure B won't have
more access than A.

But when an exception is removed (by writing devices.allow), it isn't
checked if the user is in fact removing an inherited exception from A,
thus giving more access to B.

Example:

	# echo 'a' &gt;A/devices.allow
	# echo 'c 1:3 rw' &gt;A/devices.deny
	# echo $$ &gt;A/B/tasks
	# echo &gt;/dev/null
	-bash: /dev/null: Operation not permitted
	# echo 'c 1:3 w' &gt;A/B/devices.allow
	# echo &gt;/dev/null
	#

This shouldn't be allowed and this patch fixes it by making sure to never allow
exceptions in this case to be removed if the exception is partially or fully
present on the parent.

v3: missing '*' in function description
v2: improved log message and formatting fixes

Cc: cgroups@vger.kernel.org
Cc: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Aristeu Rozanski &lt;arozansk@redhat.com&gt;
Acked-by: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
</entry>
<entry>
<title>device_cgroup: rework device access check and exception checking</title>
<updated>2014-06-09T13:53:24Z</updated>
<author>
<name>Aristeu Rozanski</name>
<email>aris@redhat.com</email>
</author>
<published>2014-04-21T16:13:03Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b2daa1de05e24e824a90bfa87316f2c120debe0e'/>
<id>urn:sha1:b2daa1de05e24e824a90bfa87316f2c120debe0e</id>
<content type='text'>
commit 79d719749d23234e9b725098aa49133f3ef7299d upstream.

Whenever a device file is opened and checked against current device
cgroup rules, it uses the same function (may_access()) as when a new
exception rule is added by writing devices.{allow,deny}. And in both
cases, the algorithm is the same, doesn't matter the behavior.

First problem is having device access to be considered the same as rule
checking. Consider the following structure:

	A	(default behavior: allow, exceptions disallow access)
	 \
	  B	(default behavior: allow, exceptions disallow access)

A new exception is added to B by writing devices.deny:

	c 12:34 rw

When checking if that exception is allowed in may_access():

	if (dev_cgroup-&gt;behavior == DEVCG_DEFAULT_ALLOW) {
		if (behavior == DEVCG_DEFAULT_ALLOW) {
			/* the exception will deny access to certain devices */
			return true;

Which is ok, since B is not getting more privileges than A, it doesn't
matter and the rule is accepted

Now, consider it's a device file open check and the process belongs to
cgroup B. The access will be generated as:

	behavior: allow
	exception: c 12:34 rw

The very same chunk of code will allow it, even if there's an explicit
exception telling to do otherwise.

A simple test case:

	# mkdir new_group
	# cd new_group
	# echo $$ &gt;tasks
	# echo "c 1:3 w" &gt;devices.deny
	# echo &gt;/dev/null
	# echo $?
	0

This is a serious bug and was introduced on

	c39a2a3018f8 devcg: prepare may_access() for hierarchy support

To solve this problem, the device file open function was split from the
new exception check.

Second problem is how exceptions are processed by may_access(). The
first part of the said function tries to match fully with an existing
exception:

	list_for_each_entry_rcu(ex, &amp;dev_cgroup-&gt;exceptions, list) {
		if ((refex-&gt;type &amp; DEV_BLOCK) &amp;&amp; !(ex-&gt;type &amp; DEV_BLOCK))
			continue;
		if ((refex-&gt;type &amp; DEV_CHAR) &amp;&amp; !(ex-&gt;type &amp; DEV_CHAR))
			continue;
		if (ex-&gt;major != ~0 &amp;&amp; ex-&gt;major != refex-&gt;major)
			continue;
		if (ex-&gt;minor != ~0 &amp;&amp; ex-&gt;minor != refex-&gt;minor)
			continue;
		if (refex-&gt;access &amp; (~ex-&gt;access))
			continue;
		match = true;
		break;
	}

That means the new exception should be contained into an existing one to
be considered a match:

	New exception		Existing	match?	notes
	b 12:34 rwm		b 12:34 rwm	yes
	b 12:34 r		b *:34 rw	yes
	b 12:34 rw		b 12:34 w	no	extra "r"
	b *:34 rw		b 12:34 rw	no	too broad "*"
	b *:34 rw		b *:34 rwm	yes

Which is fine in some cases. Consider:

	A	(default behavior: deny, exceptions allow access)
	 \
	  B	(default behavior: deny, exceptions allow access)

In this case the full match makes sense, the new exception cannot add
more access than the parent allows

But this doesn't always work, consider:

	A	(default behavior: allow, exceptions disallow access)
	 \
	  B	(default behavior: deny, exceptions allow access)

In this case, a new exception in B shouldn't match any of the exceptions
in A, after all you can't allow something that was forbidden by A. But
consider this scenario:

	New exception	Existing in A	match?	outcome
	b 12:34 rw	b 12:34 r	no	exception is accepted

Because the new exception has "w" as extra, it doesn't match, so it'll
be added to B's exception list.

The same problem can happen during a file access check. Consider a
cgroup with allow as default behavior:

	Access		Exception	match?
	b 12:34 rw	b 12:34 r	no

In this case, the access didn't match any of the exceptions in the
cgroup, which is required since exceptions will disallow access.

To solve this problem, two new functions were created to match an
exception either fully or partially. In the example above, a partial
check will be performed and it'll produce a match since at least
"b 12:34 r" from "b 12:34 rw" access matches.

Cc: cgroups@vger.kernel.org
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Cc: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Aristeu Rozanski &lt;arozansk@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
</entry>
<entry>
<title>selinux: correctly label /proc inodes in use before the policy is loaded</title>
<updated>2014-04-13T14:35:51Z</updated>
<author>
<name>Paul Moore</name>
<email>pmoore@redhat.com</email>
</author>
<published>2014-03-19T20:46:18Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4a1bdd99a8f04c5dd591079a00b1c0b2dad7c51d'/>
<id>urn:sha1:4a1bdd99a8f04c5dd591079a00b1c0b2dad7c51d</id>
<content type='text'>
commit f64410ec665479d7b4b77b7519e814253ed0f686 upstream.

This patch is based on an earlier patch by Eric Paris, he describes
the problem below:

  "If an inode is accessed before policy load it will get placed on a
   list of inodes to be initialized after policy load.  After policy
   load we call inode_doinit() which calls inode_doinit_with_dentry()
   on all inodes accessed before policy load.  In the case of inodes
   in procfs that means we'll end up at the bottom where it does:

     /* Default to the fs superblock SID. */
     isec-&gt;sid = sbsec-&gt;sid;

     if ((sbsec-&gt;flags &amp; SE_SBPROC) &amp;&amp; !S_ISLNK(inode-&gt;i_mode)) {
             if (opt_dentry) {
                     isec-&gt;sclass = inode_mode_to_security_class(...)
                     rc = selinux_proc_get_sid(opt_dentry,
                                               isec-&gt;sclass,
                                               &amp;sid);
                     if (rc)
                             goto out_unlock;
                     isec-&gt;sid = sid;
             }
     }

   Since opt_dentry is null, we'll never call selinux_proc_get_sid()
   and will leave the inode labeled with the label on the superblock.
   I believe a fix would be to mimic the behavior of xattrs.  Look
   for an alias of the inode.  If it can't be found, just leave the
   inode uninitialized (and pick it up later) if it can be found, we
   should be able to call selinux_proc_get_sid() ..."

On a system exhibiting this problem, you will notice a lot of files in
/proc with the generic "proc_t" type (at least the ones that were
accessed early in the boot), for example:

   # ls -Z /proc/sys/kernel/shmmax | awk '{ print $4 " " $5 }'
   system_u:object_r:proc_t:s0 /proc/sys/kernel/shmmax

However, with this patch in place we see the expected result:

   # ls -Z /proc/sys/kernel/shmmax | awk '{ print $4 " " $5 }'
   system_u:object_r:sysctl_kernel_t:s0 /proc/sys/kernel/shmmax

Cc: Eric Paris &lt;eparis@redhat.com&gt;
Signed-off-by: Paul Moore &lt;pmoore@redhat.com&gt;
Acked-by: Eric Paris &lt;eparis@redhat.com&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
</entry>
<entry>
<title>SELinux: Increase ebitmap_node size for 64-bit configuration</title>
<updated>2014-03-12T12:25:43Z</updated>
<author>
<name>Waiman Long</name>
<email>Waiman.Long@hp.com</email>
</author>
<published>2013-07-23T21:38:41Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=17d633ca63add6ccaa77e125ba1e8693635bc9f1'/>
<id>urn:sha1:17d633ca63add6ccaa77e125ba1e8693635bc9f1</id>
<content type='text'>
commit a767f680e34bf14a36fefbbb6d85783eef99fd57 upstream.

Currently, the ebitmap_node structure has a fixed size of 32 bytes. On
a 32-bit system, the overhead is 8 bytes, leaving 24 bytes for being
used as bitmaps. The overhead ratio is 1/4.

On a 64-bit system, the overhead is 16 bytes. Therefore, only 16 bytes
are left for bitmap purpose and the overhead ratio is 1/2. With a
3.8.2 kernel, a boot-up operation will cause the ebitmap_get_bit()
function to be called about 9 million times. The average number of
ebitmap_node traversal is about 3.7.

This patch increases the size of the ebitmap_node structure to 64
bytes for 64-bit system to keep the overhead ratio at 1/4. This may
also improve performance a little bit by making node to node traversal
less frequent (&lt; 2) as more bits are available in each node.

Signed-off-by: Waiman Long &lt;Waiman.Long@hp.com&gt;
Acked-by:  Stephen Smalley &lt;sds@tycho.nsa.gov&gt;
Signed-off-by: Paul Moore &lt;pmoore@redhat.com&gt;
Signed-off-by: Eric Paris &lt;eparis@redhat.com&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
</entry>
<entry>
<title>SELinux: Reduce overhead of mls_level_isvalid() function call</title>
<updated>2014-03-12T12:25:42Z</updated>
<author>
<name>Waiman Long</name>
<email>Waiman.Long@hp.com</email>
</author>
<published>2013-07-23T21:38:41Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3b32517baa8fc344dd216ed0d57f1589a041746a'/>
<id>urn:sha1:3b32517baa8fc344dd216ed0d57f1589a041746a</id>
<content type='text'>
commit fee7114298cf54bbd221cdb2ab49738be8b94f4c upstream.

While running the high_systime workload of the AIM7 benchmark on
a 2-socket 12-core Westmere x86-64 machine running 3.10-rc4 kernel
(with HT on), it was found that a pretty sizable amount of time was
spent in the SELinux code. Below was the perf trace of the "perf
record -a -s" of a test run at 1500 users:

  5.04%            ls  [kernel.kallsyms]     [k] ebitmap_get_bit
  1.96%            ls  [kernel.kallsyms]     [k] mls_level_isvalid
  1.95%            ls  [kernel.kallsyms]     [k] find_next_bit

The ebitmap_get_bit() was the hottest function in the perf-report
output.  Both the ebitmap_get_bit() and find_next_bit() functions
were, in fact, called by mls_level_isvalid(). As a result, the
mls_level_isvalid() call consumed 8.95% of the total CPU time of
all the 24 virtual CPUs which is quite a lot. The majority of the
mls_level_isvalid() function invocations come from the socket creation
system call.

Looking at the mls_level_isvalid() function, it is checking to see
if all the bits set in one of the ebitmap structure are also set in
another one as well as the highest set bit is no bigger than the one
specified by the given policydb data structure. It is doing it in
a bit-by-bit manner. So if the ebitmap structure has many bits set,
the iteration loop will be done many times.

The current code can be rewritten to use a similar algorithm as the
ebitmap_contains() function with an additional check for the
highest set bit. The ebitmap_contains() function was extended to
cover an optional additional check for the highest set bit, and the
mls_level_isvalid() function was modified to call ebitmap_contains().

With that change, the perf trace showed that the used CPU time drop
down to just 0.08% (ebitmap_contains + mls_level_isvalid) of the
total which is about 100X less than before.

  0.07%            ls  [kernel.kallsyms]     [k] ebitmap_contains
  0.05%            ls  [kernel.kallsyms]     [k] ebitmap_get_bit
  0.01%            ls  [kernel.kallsyms]     [k] mls_level_isvalid
  0.01%            ls  [kernel.kallsyms]     [k] find_next_bit

The remaining ebitmap_get_bit() and find_next_bit() functions calls
are made by other kernel routines as the new mls_level_isvalid()
function will not call them anymore.

This patch also improves the high_systime AIM7 benchmark result,
though the improvement is not as impressive as is suggested by the
reduction in CPU time spent in the ebitmap functions. The table below
shows the performance change on the 2-socket x86-64 system (with HT
on) mentioned above.

+--------------+---------------+----------------+-----------------+
|   Workload   | mean % change | mean % change  | mean % change   |
|              | 10-100 users  | 200-1000 users | 1100-2000 users |
+--------------+---------------+----------------+-----------------+
| high_systime |     +0.1%     |     +0.9%      |     +2.6%       |
+--------------+---------------+----------------+-----------------+

Signed-off-by: Waiman Long &lt;Waiman.Long@hp.com&gt;
Acked-by:  Stephen Smalley &lt;sds@tycho.nsa.gov&gt;
Signed-off-by: Paul Moore &lt;pmoore@redhat.com&gt;
Signed-off-by: Eric Paris &lt;eparis@redhat.com&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
</entry>
<entry>
<title>SELinux: bigendian problems with filename trans rules</title>
<updated>2014-03-05T16:13:53Z</updated>
<author>
<name>Eric Paris</name>
<email>eparis@redhat.com</email>
</author>
<published>2014-02-20T15:56:45Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6c46a5ba4a133fec554853cd27fb94211ccdefa1'/>
<id>urn:sha1:6c46a5ba4a133fec554853cd27fb94211ccdefa1</id>
<content type='text'>
commit 9085a6422900092886da8c404e1c5340c4ff1cbf upstream.

When writing policy via /sys/fs/selinux/policy I wrote the type and class
of filename trans rules in CPU endian instead of little endian.  On
x86_64 this works just fine, but it means that on big endian arch's like
ppc64 and s390 userspace reads the policy and converts it from
le32_to_cpu.  So the values are all screwed up.  Write the values in le
format like it should have been to start.

Signed-off-by: Eric Paris &lt;eparis@redhat.com&gt;
Acked-by:  Stephen Smalley &lt;sds@tycho.nsa.gov&gt;
Signed-off-by: Paul Moore &lt;pmoore@redhat.com&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;

</content>
</entry>
<entry>
<title>SELinux: Fix kernel BUG on empty security contexts.</title>
<updated>2014-02-20T19:07:58Z</updated>
<author>
<name>Stephen Smalley</name>
<email>sds@tycho.nsa.gov</email>
</author>
<published>2014-01-30T16:26:59Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=81451fe1e821f3fd715d102faf6181bbfb301878'/>
<id>urn:sha1:81451fe1e821f3fd715d102faf6181bbfb301878</id>
<content type='text'>
commit 2172fa709ab32ca60e86179dc67d0857be8e2c98 upstream.

Setting an empty security context (length=0) on a file will
lead to incorrectly dereferencing the type and other fields
of the security context structure, yielding a kernel BUG.
As a zero-length security context is never valid, just reject
all such security contexts whether coming from userspace
via setxattr or coming from the filesystem upon a getxattr
request by SELinux.

Setting a security context value (empty or otherwise) unknown to
SELinux in the first place is only possible for a root process
(CAP_MAC_ADMIN), and, if running SELinux in enforcing mode, only
if the corresponding SELinux mac_admin permission is also granted
to the domain by policy.  In Fedora policies, this is only allowed for
specific domains such as livecd for setting down security contexts
that are not defined in the build host policy.

Reproducer:
su
setenforce 0
touch foo
setfattr -n security.selinux foo

Caveat:
Relabeling or removing foo after doing the above may not be possible
without booting with SELinux disabled.  Any subsequent access to foo
after doing the above will also trigger the BUG.

BUG output from Matthew Thode:
[  473.893141] ------------[ cut here ]------------
[  473.962110] kernel BUG at security/selinux/ss/services.c:654!
[  473.995314] invalid opcode: 0000 [#6] SMP
[  474.027196] Modules linked in:
[  474.058118] CPU: 0 PID: 8138 Comm: ls Tainted: G      D   I
3.13.0-grsec #1
[  474.116637] Hardware name: Supermicro X8ST3/X8ST3, BIOS 2.0
07/29/10
[  474.149768] task: ffff8805f50cd010 ti: ffff8805f50cd488 task.ti:
ffff8805f50cd488
[  474.183707] RIP: 0010:[&lt;ffffffff814681c7&gt;]  [&lt;ffffffff814681c7&gt;]
context_struct_compute_av+0xce/0x308
[  474.219954] RSP: 0018:ffff8805c0ac3c38  EFLAGS: 00010246
[  474.252253] RAX: 0000000000000000 RBX: ffff8805c0ac3d94 RCX:
0000000000000100
[  474.287018] RDX: ffff8805e8aac000 RSI: 00000000ffffffff RDI:
ffff8805e8aaa000
[  474.321199] RBP: ffff8805c0ac3cb8 R08: 0000000000000010 R09:
0000000000000006
[  474.357446] R10: 0000000000000000 R11: ffff8805c567a000 R12:
0000000000000006
[  474.419191] R13: ffff8805c2b74e88 R14: 00000000000001da R15:
0000000000000000
[  474.453816] FS:  00007f2e75220800(0000) GS:ffff88061fc00000(0000)
knlGS:0000000000000000
[  474.489254] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  474.522215] CR2: 00007f2e74716090 CR3: 00000005c085e000 CR4:
00000000000207f0
[  474.556058] Stack:
[  474.584325]  ffff8805c0ac3c98 ffffffff811b549b ffff8805c0ac3c98
ffff8805f1190a40
[  474.618913]  ffff8805a6202f08 ffff8805c2b74e88 00068800d0464990
ffff8805e8aac860
[  474.653955]  ffff8805c0ac3cb8 000700068113833a ffff880606c75060
ffff8805c0ac3d94
[  474.690461] Call Trace:
[  474.723779]  [&lt;ffffffff811b549b&gt;] ? lookup_fast+0x1cd/0x22a
[  474.778049]  [&lt;ffffffff81468824&gt;] security_compute_av+0xf4/0x20b
[  474.811398]  [&lt;ffffffff8196f419&gt;] avc_compute_av+0x2a/0x179
[  474.843813]  [&lt;ffffffff8145727b&gt;] avc_has_perm+0x45/0xf4
[  474.875694]  [&lt;ffffffff81457d0e&gt;] inode_has_perm+0x2a/0x31
[  474.907370]  [&lt;ffffffff81457e76&gt;] selinux_inode_getattr+0x3c/0x3e
[  474.938726]  [&lt;ffffffff81455cf6&gt;] security_inode_getattr+0x1b/0x22
[  474.970036]  [&lt;ffffffff811b057d&gt;] vfs_getattr+0x19/0x2d
[  475.000618]  [&lt;ffffffff811b05e5&gt;] vfs_fstatat+0x54/0x91
[  475.030402]  [&lt;ffffffff811b063b&gt;] vfs_lstat+0x19/0x1b
[  475.061097]  [&lt;ffffffff811b077e&gt;] SyS_newlstat+0x15/0x30
[  475.094595]  [&lt;ffffffff8113c5c1&gt;] ? __audit_syscall_entry+0xa1/0xc3
[  475.148405]  [&lt;ffffffff8197791e&gt;] system_call_fastpath+0x16/0x1b
[  475.179201] Code: 00 48 85 c0 48 89 45 b8 75 02 0f 0b 48 8b 45 a0 48
8b 3d 45 d0 b6 00 8b 40 08 89 c6 ff ce e8 d1 b0 06 00 48 85 c0 49 89 c7
75 02 &lt;0f&gt; 0b 48 8b 45 b8 4c 8b 28 eb 1e 49 8d 7d 08 be 80 01 00 00 e8
[  475.255884] RIP  [&lt;ffffffff814681c7&gt;]
context_struct_compute_av+0xce/0x308
[  475.296120]  RSP &lt;ffff8805c0ac3c38&gt;
[  475.328734] ---[ end trace f076482e9d754adc ]---

Reported-by:  Matthew Thode &lt;mthode@mthode.org&gt;
Signed-off-by: Stephen Smalley &lt;sds@tycho.nsa.gov&gt;
Signed-off-by: Paul Moore &lt;pmoore@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
</feed>
