<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/cgroup.h, branch v4.7</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.7</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.7'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2016-02-16T18:04:58Z</updated>
<entry>
<title>cgroup: introduce cgroup namespaces</title>
<updated>2016-02-16T18:04:58Z</updated>
<author>
<name>Aditya Kali</name>
<email>adityakali@google.com</email>
</author>
<published>2016-01-29T08:54:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a79a908fd2b080977b45bf103184b81c9d11ad07'/>
<id>urn:sha1:a79a908fd2b080977b45bf103184b81c9d11ad07</id>
<content type='text'>
Introduce the ability to create new cgroup namespace. The newly created
cgroup namespace remembers the cgroup of the process at the point
of creation of the cgroup namespace (referred as cgroupns-root).
The main purpose of cgroup namespace is to virtualize the contents
of /proc/self/cgroup file. Processes inside a cgroup namespace
are only able to see paths relative to their namespace root
(unless they are moved outside of their cgroupns-root, at which point
 they will see a relative path from their cgroupns-root).
For a correctly setup container this enables container-tools
(like libcontainer, lxc, lmctfy, etc.) to create completely virtualized
containers without leaking system level cgroup hierarchy to the task.
This patch only implements the 'unshare' part of the cgroupns.

Signed-off-by: Aditya Kali &lt;adityakali@google.com&gt;
Signed-off-by: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup</title>
<updated>2016-01-13T03:20:32Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2016-01-13T03:20:32Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=34a9304a96d6351c2d35dcdc9293258378fc0bd8'/>
<id>urn:sha1:34a9304a96d6351c2d35dcdc9293258378fc0bd8</id>
<content type='text'>
Pull cgroup updates from Tejun Heo:

 - cgroup v2 interface is now official.  It's no longer hidden behind a
   devel flag and can be mounted using the new cgroup2 fs type.

   Unfortunately, cpu v2 interface hasn't made it yet due to the
   discussion around in-process hierarchical resource distribution and
   only memory and io controllers can be used on the v2 interface at the
   moment.

 - The existing documentation which has always been a bit of mess is
   relocated under Documentation/cgroup-v1/. Documentation/cgroup-v2.txt
   is added as the authoritative documentation for the v2 interface.

 - Some features are added through for-4.5-ancestor-test branch to
   enable netfilter xt_cgroup match to use cgroup v2 paths.  The actual
   netfilter changes will be merged through the net tree which pulled in
   the said branch.

 - Various cleanups

* 'for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: rename cgroup documentations
  cgroup: fix a typo.
  cgroup: Remove resource_counter.txt in Documentation/cgroup-legacy/00-INDEX.
  cgroup: demote subsystem init messages to KERN_DEBUG
  cgroup: Fix uninitialized variable warning
  cgroup: put controller Kconfig options in meaningful order
  cgroup: clean up the kernel configuration menu nomenclature
  cgroup_pids: fix a typo.
  Subject: cgroup: Fix incomplete dd command in blkio documentation
  cgroup: kill cgrp_ss_priv[CGROUP_CANFORK_COUNT] and friends
  cpuset: Replace all instances of time_t with time64_t
  cgroup: replace unified-hierarchy.txt with a proper cgroup v2 documentation
  cgroup: rename Documentation/cgroups/ to Documentation/cgroup-legacy/
  cgroup: replace __DEVEL__sane_behavior with cgroup2 fs type
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2015-12-18T03:08:28Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2015-12-18T03:08:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b3e0d3d7bab14f2544a3314bec53a23dc7dd2206'/>
<id>urn:sha1:b3e0d3d7bab14f2544a3314bec53a23dc7dd2206</id>
<content type='text'>
Conflicts:
	drivers/net/geneve.c

Here we had an overlapping change, where in 'net' the extraneous stats
bump was being removed whilst in 'net-next' the final argument to
udp_tunnel6_xmit_skb() was being changed.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>sock, cgroup: add sock-&gt;sk_cgroup</title>
<updated>2015-12-09T03:02:33Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-12-07T22:38:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=bd1060a1d67128bb8fbe2e1384c518912cbe54e7'/>
<id>urn:sha1:bd1060a1d67128bb8fbe2e1384c518912cbe54e7</id>
<content type='text'>
In cgroup v1, dealing with cgroup membership was difficult because the
number of membership associations was unbound.  As a result, cgroup v1
grew several controllers whose primary purpose is either tagging
membership or pull in configuration knobs from other subsystems so
that cgroup membership test can be avoided.

net_cls and net_prio controllers are examples of the latter.  They
allow configuring network-specific attributes from cgroup side so that
network subsystem can avoid testing cgroup membership; unfortunately,
these are not only cumbersome but also problematic.

Both net_cls and net_prio aren't properly hierarchical.  Both inherit
configuration from the parent on creation but there's no interaction
afterwards.  An ancestor doesn't restrict the behavior in its subtree
in anyway and configuration changes aren't propagated downwards.
Especially when combined with cgroup delegation, this is problematic
because delegatees can mess up whatever network configuration
implemented at the system level.  net_prio would allow the delegatees
to set whatever priority value regardless of CAP_NET_ADMIN and net_cls
the same for classid.

While it is possible to solve these issues from controller side by
implementing hierarchical allowable ranges in both controllers, it
would involve quite a bit of complexity in the controllers and further
obfuscate network configuration as it becomes even more difficult to
tell what's actually being configured looking from the network side.
While not much can be done for v1 at this point, as membership
handling is sane on cgroup v2, it'd be better to make cgroup matching
behave like other network matches and classifiers than introducing
further complications.

In preparation, this patch updates sock-&gt;sk_cgrp_data handling so that
it points to the v2 cgroup that sock was created in until either
net_prio or net_cls is used.  Once either of the two is used,
sock-&gt;sk_cgrp_data reverts to its previous role of carrying prioidx
and classid.  This is to avoid adding yet another cgroup related field
to struct sock.

As the mode switching can happen at most once per boot, the switching
mechanism is aimed at lowering hot path overhead.  It may leak a
finite, likely small, number of cgroup refs and report spurious
prioidx or classid on switching; however, dynamic updates of prioidx
and classid have always been racy and lossy - socks between creation
and fd installation are never updated, config changes don't update
existing sockets at all, and prioidx may index with dead and recycled
cgroup IDs.  Non-critical inaccuracies from small race windows won't
make any noticeable difference.

This patch doesn't make use of the pointer yet.  The following patch
will implement netfilter match for cgroup2 membership.

v2: Use sock_cgroup_data to avoid inflating struct sock w/ another
    cgroup specific field.

v3: Add comments explaining why sock_data_prioidx() and
    sock_data_classid() use different fallback values.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Daniel Wagner &lt;daniel.wagner@bmw-carit.de&gt;
CC: Neil Horman &lt;nhorman@tuxdriver.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-4.5-ancestor-test' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup into for-4.5</title>
<updated>2015-12-07T22:24:10Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-12-07T22:24:10Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=177493987c1a15145922a65240f2f8ab6c63770a'/>
<id>urn:sha1:177493987c1a15145922a65240f2f8ab6c63770a</id>
<content type='text'>
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: kill cgrp_ss_priv[CGROUP_CANFORK_COUNT] and friends</title>
<updated>2015-12-03T15:24:08Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2015-12-03T15:24:08Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b53202e6308939d33ba0c78712e850f891b4e76f'/>
<id>urn:sha1:b53202e6308939d33ba0c78712e850f891b4e76f</id>
<content type='text'>
Now that nobody use the "priv" arg passed to can_fork/cancel_fork/fork we can
kill CGROUP_CANFORK_COUNT/SUBSYS_TAG/etc and cgrp_ss_priv[] in copy_process().

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: fix handling of multi-destination migration from subtree_control enabling</title>
<updated>2015-12-03T15:18:21Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-12-03T15:18:21Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1f7dd3e5a6e4f093017fff12232572ee1aa4639b'/>
<id>urn:sha1:1f7dd3e5a6e4f093017fff12232572ee1aa4639b</id>
<content type='text'>
Consider the following v2 hierarchy.

  P0 (+memory) --- P1 (-memory) --- A
                                 \- B
       
P0 has memory enabled in its subtree_control while P1 doesn't.  If
both A and B contain processes, they would belong to the memory css of
P1.  Now if memory is enabled on P1's subtree_control, memory csses
should be created on both A and B and A's processes should be moved to
the former and B's processes the latter.  IOW, enabling controllers
can cause atomic migrations into different csses.

The core cgroup migration logic has been updated accordingly but the
controller migration methods haven't and still assume that all tasks
migrate to a single target css; furthermore, the methods were fed the
css in which subtree_control was updated which is the parent of the
target csses.  pids controller depends on the migration methods to
move charges and this made the controller attribute charges to the
wrong csses often triggering the following warning by driving a
counter negative.

 WARNING: CPU: 1 PID: 1 at kernel/cgroup_pids.c:97 pids_cancel.constprop.6+0x31/0x40()
 Modules linked in:
 CPU: 1 PID: 1 Comm: systemd Not tainted 4.4.0-rc1+ #29
 ...
  ffffffff81f65382 ffff88007c043b90 ffffffff81551ffc 0000000000000000
  ffff88007c043bc8 ffffffff810de202 ffff88007a752000 ffff88007a29ab00
  ffff88007c043c80 ffff88007a1d8400 0000000000000001 ffff88007c043bd8
 Call Trace:
  [&lt;ffffffff81551ffc&gt;] dump_stack+0x4e/0x82
  [&lt;ffffffff810de202&gt;] warn_slowpath_common+0x82/0xc0
  [&lt;ffffffff810de2fa&gt;] warn_slowpath_null+0x1a/0x20
  [&lt;ffffffff8118e031&gt;] pids_cancel.constprop.6+0x31/0x40
  [&lt;ffffffff8118e0fd&gt;] pids_can_attach+0x6d/0xf0
  [&lt;ffffffff81188a4c&gt;] cgroup_taskset_migrate+0x6c/0x330
  [&lt;ffffffff81188e05&gt;] cgroup_migrate+0xf5/0x190
  [&lt;ffffffff81189016&gt;] cgroup_attach_task+0x176/0x200
  [&lt;ffffffff8118949d&gt;] __cgroup_procs_write+0x2ad/0x460
  [&lt;ffffffff81189684&gt;] cgroup_procs_write+0x14/0x20
  [&lt;ffffffff811854e5&gt;] cgroup_file_write+0x35/0x1c0
  [&lt;ffffffff812e26f1&gt;] kernfs_fop_write+0x141/0x190
  [&lt;ffffffff81265f88&gt;] __vfs_write+0x28/0xe0
  [&lt;ffffffff812666fc&gt;] vfs_write+0xac/0x1a0
  [&lt;ffffffff81267019&gt;] SyS_write+0x49/0xb0
  [&lt;ffffffff81bcef32&gt;] entry_SYSCALL_64_fastpath+0x12/0x76

This patch fixes the bug by removing @css parameter from the three
migration methods, -&gt;can_attach, -&gt;cancel_attach() and -&gt;attach() and
updating cgroup_taskset iteration helpers also return the destination
css in addition to the task being migrated.  All controllers are
updated accordingly.

* Controllers which don't care whether there are one or multiple
  target csses can be converted trivially.  cpu, io, freezer, perf,
  netclassid and netprio fall in this category.

* cpuset's current implementation assumes that there's single source
  and destination and thus doesn't support v2 hierarchy already.  The
  only change made by this patchset is how that single destination css
  is obtained.

* memory migration path already doesn't do anything on v2.  How the
  single destination css is obtained is updated and the prep stage of
  mem_cgroup_can_attach() is reordered to accomodate the change.

* pids is the only controller which was affected by this bug.  It now
  correctly handles multi-destination migrations and no longer causes
  counter underflow from incorrect accounting.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-and-tested-by: Daniel Wagner &lt;daniel.wagner@bmw-carit.de&gt;
Cc: Aleksa Sarai &lt;cyphar@cyphar.com&gt;
</content>
</entry>
<entry>
<title>cgroup: implement cgroup_get_from_path() and expose cgroup_put()</title>
<updated>2015-11-20T20:55:52Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-11-20T20:55:52Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=16af439645455fbf36984ca5e72f31073ee19ab7'/>
<id>urn:sha1:16af439645455fbf36984ca5e72f31073ee19ab7</id>
<content type='text'>
Implement cgroup_get_from_path() using kernfs_walk_and_get() which
obtains a default hierarchy cgroup from its path.  This will be used
to allow cgroup path based matching from outside cgroup proper -
e.g. networking and perf.

v2: Add EXPORT_SYMBOL_GPL(cgroup_get_from_path).

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: record ancestor IDs and reimplement cgroup_is_descendant() using it</title>
<updated>2015-11-20T20:55:52Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-11-20T20:55:52Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b11cfb5807e30333b36c02701382b820b7dcf0d5'/>
<id>urn:sha1:b11cfb5807e30333b36c02701382b820b7dcf0d5</id>
<content type='text'>
cgroup_is_descendant() currently walks up the hierarchy and compares
each ancestor to the cgroup in question.  While enough for cgroup core
usages, this can't be used in hot paths to test cgroup membership.
This patch adds cgroup-&gt;ancestor_ids[] which records the IDs of all
ancestors including self and cgroup-&gt;level for the nesting level.

This allows testing whether a given cgroup is a descendant of another
in three finite steps - testing whether the two belong to the same
hierarchy, whether the descendant candidate is at the same or a higher
level than the ancestor and comparing the recorded ancestor_id at the
matching level.  cgroup_is_descendant() is accordingly reimplmented
and made inline.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: fix cftype-&gt;file_offset handling</title>
<updated>2015-11-16T15:58:26Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-11-05T05:12:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=34c06254ff82a815fdccdfae7517a06c9b768cee'/>
<id>urn:sha1:34c06254ff82a815fdccdfae7517a06c9b768cee</id>
<content type='text'>
6f60eade2433 ("cgroup: generalize obtaining the handles of and
notifying cgroup files") introduced cftype-&gt;file_offset so that the
handles for per-css file instances can be recorded.  These handles
then can be used, for example, to generate file modified
notifications.

Unfortunately, it made the wrong assumption that files are created
once for a given css and removed on its destruction.  Due to the
dependencies among subsystems, a css may be hidden from userland and
then later shown again.  This is implemented by removing and
re-creating the affected files, so the associated kernfs_node for a
given cgroup file may change over time.  This incorrect assumption led
to the corruption of css-&gt;files lists.

Reimplement cftype-&gt;file_offset handling so that cgroup_file-&gt;kn is
protected by a lock and updated as files are created and destroyed.
This also makes keeping them on per-cgroup list unnecessary.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: James Sedgwick &lt;jsedgwick@fb.com&gt;
Fixes: 6f60eade2433 ("cgroup: generalize obtaining the handles of and notifying cgroup files")
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Zefan Li &lt;lizefan@huawei.com&gt;
</content>
</entry>
</feed>
