<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/cgroup.h, branch v4.4.29</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.4.29</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.4.29'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2015-12-03T15:18:21Z</updated>
<entry>
<title>cgroup: fix handling of multi-destination migration from subtree_control enabling</title>
<updated>2015-12-03T15:18:21Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-12-03T15:18:21Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1f7dd3e5a6e4f093017fff12232572ee1aa4639b'/>
<id>urn:sha1:1f7dd3e5a6e4f093017fff12232572ee1aa4639b</id>
<content type='text'>
Consider the following v2 hierarchy.

  P0 (+memory) --- P1 (-memory) --- A
                                 \- B
       
P0 has memory enabled in its subtree_control while P1 doesn't.  If
both A and B contain processes, they would belong to the memory css of
P1.  Now if memory is enabled on P1's subtree_control, memory csses
should be created on both A and B and A's processes should be moved to
the former and B's processes the latter.  IOW, enabling controllers
can cause atomic migrations into different csses.

The core cgroup migration logic has been updated accordingly but the
controller migration methods haven't and still assume that all tasks
migrate to a single target css; furthermore, the methods were fed the
css in which subtree_control was updated which is the parent of the
target csses.  pids controller depends on the migration methods to
move charges and this made the controller attribute charges to the
wrong csses often triggering the following warning by driving a
counter negative.

 WARNING: CPU: 1 PID: 1 at kernel/cgroup_pids.c:97 pids_cancel.constprop.6+0x31/0x40()
 Modules linked in:
 CPU: 1 PID: 1 Comm: systemd Not tainted 4.4.0-rc1+ #29
 ...
  ffffffff81f65382 ffff88007c043b90 ffffffff81551ffc 0000000000000000
  ffff88007c043bc8 ffffffff810de202 ffff88007a752000 ffff88007a29ab00
  ffff88007c043c80 ffff88007a1d8400 0000000000000001 ffff88007c043bd8
 Call Trace:
  [&lt;ffffffff81551ffc&gt;] dump_stack+0x4e/0x82
  [&lt;ffffffff810de202&gt;] warn_slowpath_common+0x82/0xc0
  [&lt;ffffffff810de2fa&gt;] warn_slowpath_null+0x1a/0x20
  [&lt;ffffffff8118e031&gt;] pids_cancel.constprop.6+0x31/0x40
  [&lt;ffffffff8118e0fd&gt;] pids_can_attach+0x6d/0xf0
  [&lt;ffffffff81188a4c&gt;] cgroup_taskset_migrate+0x6c/0x330
  [&lt;ffffffff81188e05&gt;] cgroup_migrate+0xf5/0x190
  [&lt;ffffffff81189016&gt;] cgroup_attach_task+0x176/0x200
  [&lt;ffffffff8118949d&gt;] __cgroup_procs_write+0x2ad/0x460
  [&lt;ffffffff81189684&gt;] cgroup_procs_write+0x14/0x20
  [&lt;ffffffff811854e5&gt;] cgroup_file_write+0x35/0x1c0
  [&lt;ffffffff812e26f1&gt;] kernfs_fop_write+0x141/0x190
  [&lt;ffffffff81265f88&gt;] __vfs_write+0x28/0xe0
  [&lt;ffffffff812666fc&gt;] vfs_write+0xac/0x1a0
  [&lt;ffffffff81267019&gt;] SyS_write+0x49/0xb0
  [&lt;ffffffff81bcef32&gt;] entry_SYSCALL_64_fastpath+0x12/0x76

This patch fixes the bug by removing @css parameter from the three
migration methods, -&gt;can_attach, -&gt;cancel_attach() and -&gt;attach() and
updating cgroup_taskset iteration helpers also return the destination
css in addition to the task being migrated.  All controllers are
updated accordingly.

* Controllers which don't care whether there are one or multiple
  target csses can be converted trivially.  cpu, io, freezer, perf,
  netclassid and netprio fall in this category.

* cpuset's current implementation assumes that there's single source
  and destination and thus doesn't support v2 hierarchy already.  The
  only change made by this patchset is how that single destination css
  is obtained.

* memory migration path already doesn't do anything on v2.  How the
  single destination css is obtained is updated and the prep stage of
  mem_cgroup_can_attach() is reordered to accomodate the change.

* pids is the only controller which was affected by this bug.  It now
  correctly handles multi-destination migrations and no longer causes
  counter underflow from incorrect accounting.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-and-tested-by: Daniel Wagner &lt;daniel.wagner@bmw-carit.de&gt;
Cc: Aleksa Sarai &lt;cyphar@cyphar.com&gt;
</content>
</entry>
<entry>
<title>cgroup: fix cftype-&gt;file_offset handling</title>
<updated>2015-11-16T15:58:26Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-11-05T05:12:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=34c06254ff82a815fdccdfae7517a06c9b768cee'/>
<id>urn:sha1:34c06254ff82a815fdccdfae7517a06c9b768cee</id>
<content type='text'>
6f60eade2433 ("cgroup: generalize obtaining the handles of and
notifying cgroup files") introduced cftype-&gt;file_offset so that the
handles for per-css file instances can be recorded.  These handles
then can be used, for example, to generate file modified
notifications.

Unfortunately, it made the wrong assumption that files are created
once for a given css and removed on its destruction.  Due to the
dependencies among subsystems, a css may be hidden from userland and
then later shown again.  This is implemented by removing and
re-creating the affected files, so the associated kernfs_node for a
given cgroup file may change over time.  This incorrect assumption led
to the corruption of css-&gt;files lists.

Reimplement cftype-&gt;file_offset handling so that cgroup_file-&gt;kn is
protected by a lock and updated as files are created and destroyed.
This also makes keeping them on per-cgroup list unnecessary.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: James Sedgwick &lt;jsedgwick@fb.com&gt;
Fixes: 6f60eade2433 ("cgroup: generalize obtaining the handles of and notifying cgroup files")
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Zefan Li &lt;lizefan@huawei.com&gt;
</content>
</entry>
<entry>
<title>cgroup: keep zombies associated with their original cgroups</title>
<updated>2015-10-15T20:41:53Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-10-15T20:41:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2e91fa7f6d451e3ea9fec999065d2fd199691f9d'/>
<id>urn:sha1:2e91fa7f6d451e3ea9fec999065d2fd199691f9d</id>
<content type='text'>
cgroup_exit() is called when a task exits and disassociates the
exiting task from its cgroups and half-attach it to the root cgroup.
This is unnecessary and undesirable.

No controller actually needs an exiting task to be disassociated with
non-root cgroups.  Both cpu and perf_event controllers update the
association to the root cgroup from their exit callbacks just to keep
consistent with the cgroup core behavior.

Also, this disassociation makes it difficult to track resources held
by zombies or determine where the zombies came from.  Currently, pids
controller is completely broken as it uncharges on exit and zombies
always escape the resource restriction.  With cgroup association being
reset on exit, fixing it is pretty painful.

There's no reason to reset cgroup membership on exit.  The zombie can
be removed from its css_set so that it doesn't show up on
"cgroup.procs" and thus can't be migrated or interfere with cgroup
removal.  It can still pin and point to the css_set so that its cgroup
membership is maintained.  This patch makes cgroup core keep zombies
associated with their cgroups at the time of exit.

* Previous patches decoupled populated_cnt tracking from css_set
  lifetime, so a dying task can be simply unlinked from its css_set
  while pinning and pointing to the css_set.  This keeps css_set
  association from task side alive while hiding it from "cgroup.procs"
  and populated_cnt tracking.  The css_set reference is dropped when
  the task_struct is freed.

* -&gt;exit() callback no longer needs the css arguments as the
  associated css never changes once PF_EXITING is set.  Removed.

* cpu and perf_events controllers no longer need -&gt;exit() callbacks.
  There's no reason to explicitly switch away on exit.  The final
  schedule out is enough.  The callbacks are removed.

* On traditional hierarchies, nothing changes.  "/proc/PID/cgroup"
  still reports "/" for all zombies.  On the default hierarchy,
  "/proc/PID/cgroup" keeps reporting the cgroup that the task belonged
  to at the time of exit.  If the cgroup gets removed before the task
  is reaped, " (deleted)" is appended.

v2: Build brekage due to missing dummy cgroup_free() when
    !CONFIG_CGROUP fixed.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: make css_set_rwsem a spinlock and rename it to css_set_lock</title>
<updated>2015-10-15T20:41:53Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-10-15T20:41:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f0d9a5f175753a371bc7fdff0d584a8d9cd72bb0'/>
<id>urn:sha1:f0d9a5f175753a371bc7fdff0d584a8d9cd72bb0</id>
<content type='text'>
css_set_rwsem is the inner lock protecting css_sets and is accessed
from hot paths such as fork and exit.  Internally, it has no reason to
be a rwsem or even mutex.  There are no internal blocking operations
while holding it.  This was rwsem because css task iteration used to
expose it to external iterator users.  As the previous patch updated
css task iteration such that the locking is not leaked to its users,
there's no reason to keep it a rwsem.

This patch converts css_set_rwsem to a spinlock and rename it to
css_set_lock.  It uses bh-safe operations as a planned usage needs to
access it from RCU callback context.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: don't hold css_set_rwsem across css task iteration</title>
<updated>2015-10-15T20:41:52Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-10-15T20:41:52Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ed27b9f7a17ddfbc007e16d4d11f33dff4fc2de7'/>
<id>urn:sha1:ed27b9f7a17ddfbc007e16d4d11f33dff4fc2de7</id>
<content type='text'>
css_sets are synchronized through css_set_rwsem but the locking scheme
is kinda bizarre.  The hot paths - fork and exit - have to write lock
the rwsem making the rw part pointless; furthermore, many readers
already hold cgroup_mutex.

One of the readers is css task iteration.  It read locks the rwsem
over the entire duration of iteration.  This leads to silly locking
behavior.  When cpuset tries to migrate processes of a cgroup to a
different NUMA node, css_set_rwsem is held across the entire migration
attempt which can take a long time locking out forking, exiting and
other cgroup operations.

This patch updates css task iteration so that it locks css_set_rwsem
only while the iterator is being advanced.  css task iteration
involves two levels - css_set and task iteration.  As css_sets in use
are practically immutable, simply pinning the current one is enough
for resuming iteration afterwards.  Task iteration is tricky as tasks
may leave their css_set while iteration is in progress.  This is
solved by keeping track of active iterators and advancing them if
their next task leaves its css_set.

v2: put_task_struct() in css_task_iter_next() moved outside
    css_set_rwsem.  A later patch will add cgroup operations to
    task_struct free path which may grab the same lock and this avoids
    deadlock possibilities.

    css_set_move_task() updated to use list_for_each_entry_safe() when
    walking task_iters and advancing them.  This is necessary as
    advancing an iter may remove it from the list.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: replace cgroup_has_tasks() with cgroup_is_populated()</title>
<updated>2015-10-15T20:41:50Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-10-15T20:41:50Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=27bd4dbb8d51c476298e62bd088225317b7853de'/>
<id>urn:sha1:27bd4dbb8d51c476298e62bd088225317b7853de</id>
<content type='text'>
Currently, cgroup_has_tasks() tests whether the target cgroup has any
css_set linked to it.  This works because a css_set's refcnt converges
with the number of tasks linked to it and thus there's no css_set
linked to a cgroup if it doesn't have any live tasks.

To help tracking resource usage of zombie tasks, putting the ref of
css_set will be separated from disassociating the task from the
css_set which means that a cgroup may have css_sets linked to it even
when it doesn't have any live tasks.

This patch replaces cgroup_has_tasks() with cgroup_is_populated()
which tests cgroup-&gt;nr_populated instead which locally counts the
number of populated css_sets.  Unlike cgroup_has_tasks(),
cgroup_is_populated() is recursive - if any of the descendants is
populated, the cgroup is populated too.  While this changes the
meaning of the test, all the existing users are okay with the change.

While at it, replace the open-coded -&gt;populated_cnt test in
cgroup_events_show() with cgroup_is_populated().

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Li Zefan &lt;lizefan@huawei.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup, memcg, cpuset: implement cgroup_taskset_for_each_leader()</title>
<updated>2015-09-22T16:46:53Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-09-11T19:00:19Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4530eddb59494b89650d6bcd980fc7f7717ad80c'/>
<id>urn:sha1:4530eddb59494b89650d6bcd980fc7f7717ad80c</id>
<content type='text'>
It wasn't explicitly documented but, when a process is being migrated,
cpuset and memcg depend on cgroup_taskset_first() returning the
threadgroup leader; however, this approach is somewhat ghetto and
would no longer work for the planned multi-process migration.

This patch introduces explicit cgroup_taskset_for_each_leader() which
iterates over only the threadgroup leaders and replaces
cgroup_taskset_first() usages for accessing the leader with it.

This prepares both memcg and cpuset for multi-process migration.  This
patch also updates the documentation for cgroup_taskset_for_each() to
clarify the iteration rules and removes comments mentioning task
ordering in tasksets.

v2: A previous patch which added threadgroup leader test was dropped.
    Patch updated accordingly.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Zefan Li &lt;lizefan@huawei.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
</content>
</entry>
<entry>
<title>cgroup: generalize obtaining the handles of and notifying cgroup files</title>
<updated>2015-09-18T21:54:23Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-09-18T21:54:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6f60eade2433cb3a38687d5f8a4f44b92c6c51bf'/>
<id>urn:sha1:6f60eade2433cb3a38687d5f8a4f44b92c6c51bf</id>
<content type='text'>
cgroup core handles creations and removals of cgroup interface files
as described by cftypes.  There are cases where the handle for a given
file instance is necessary, for example, to generate a file modified
event.  Currently, this is handled by explicitly matching the callback
method pointer and storing the file handle manually in
cgroup_add_file().  While this simple approach works for cgroup core
files, it can't for controller interface files.

This patch generalizes cgroup interface file handle handling.  struct
cgroup_file is defined and each cftype can optionally tell cgroup core
to store the file handle by setting -&gt;file_offset.  A file handle
remains accessible as long as the containing css is accessible.

Both "cgroup.procs" and "cgroup.events" are converted to use the new
generic mechanism instead of hooking directly into cgroup_add_file().
Also, cgroup_file_notify() which takes a struct cgroup_file and
generates a file modified event on it is added and replaces explicit
kernfs_notify() invocations.

This generalizes cgroup file handle handling and allows controllers to
generate file modified notifications.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Li Zefan &lt;lizefan@huawei.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
</content>
</entry>
<entry>
<title>cgroup: replace cgroup_on_dfl() tests in controllers with cgroup_subsys_on_dfl()</title>
<updated>2015-09-18T15:56:28Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-09-18T15:56:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9e10a130d9b62af976d17d120c95f3650769312c'/>
<id>urn:sha1:9e10a130d9b62af976d17d120c95f3650769312c</id>
<content type='text'>
cgroup_on_dfl() tests whether the cgroup's root is the default
hierarchy; however, an individual controller is only interested in
whether the controller is attached to the default hierarchy and never
tests a cgroup which doesn't belong to the hierarchy that the
controller is attached to.

This patch replaces cgroup_on_dfl() tests in controllers with faster
static_key based cgroup_subsys_on_dfl().  This leaves cgroup core as
the only user of cgroup_on_dfl() and the function is moved from the
header file to cgroup.c.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Zefan Li &lt;lizefan@huawei.com&gt;
Cc: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: implement static_key based cgroup_subsys_enabled() and cgroup_subsys_on_dfl()</title>
<updated>2015-09-18T15:56:28Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-09-18T15:56:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=49d1dc4b81797f88270832b11e9f73809e7e7209'/>
<id>urn:sha1:49d1dc4b81797f88270832b11e9f73809e7e7209</id>
<content type='text'>
Whether a subsys is enabled and attached to the default hierarchy
seldom changes and may be tested in the hot paths.  This patch
implements static_key based cgroup_subsys_enabled() and
cgroup_subsys_on_dfl() tests.

The following patches will update the users and remove duplicate
mechanisms.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Zefan Li &lt;lizefan@huawei.com&gt;
</content>
</entry>
</feed>
