user/sven/linux.git/include/linux/cpufreq.h, branch v4.19.114

cpufreq: Avoid leaving stale IRQ work items during CPU offline

2019-12-31T15:36:22Z

commit 85572c2c4a45a541e880e087b5b17a48198b2416 upstream. The scheduler code calling cpufreq_update_util() may run during CPU offline on the target CPU after the IRQ work lists have been flushed for it, so the target CPU should be prevented from running code that may queue up an IRQ work item on it at that point. Unfortunately, that may not be the case if dvfs_possible_from_any_cpu is set for at least one cpufreq policy in the system, because that allows the CPU going offline to run the utilization update callback of the cpufreq governor on behalf of another (online) CPU in some cases. If that happens, the cpufreq governor callback may queue up an IRQ work on the CPU running it, which is going offline, and the IRQ work may not be flushed after that point. Moreover, that IRQ work cannot be flushed until the "offlining" CPU goes back online, so if any other CPU calls irq_work_sync() to wait for the completion of that IRQ work, it will have to wait until the "offlining" CPU is back online and that may not happen forever. In particular, a system-wide deadlock may occur during CPU online as a result of that. The failing scenario is as follows. CPU0 is the boot CPU, so it creates a cpufreq policy and becomes the "leader" of it (policy->cpu). It cannot go offline, because it is the boot CPU. Next, other CPUs join the cpufreq policy as they go online and they leave it when they go offline. The last CPU to go offline, say CPU3, may queue up an IRQ work while running the governor callback on behalf of CPU0 after leaving the cpufreq policy because of the dvfs_possible_from_any_cpu effect described above. Then, CPU0 is the only online CPU in the system and the stale IRQ work is still queued on CPU3. When, say, CPU1 goes back online, it will run irq_work_sync() to wait for that IRQ work to complete and so it will wait for CPU3 to go back online (which may never happen even in principle), but (worse yet) CPU0 is waiting for CPU1 at that point too and a system-wide deadlock occurs. To address this problem notice that CPUs which cannot run cpufreq utilization update code for themselves (for example, because they have left the cpufreq policies that they belonged to), should also be prevented from running that code on behalf of the other CPUs that belong to a cpufreq policy with dvfs_possible_from_any_cpu set and so in that case the cpufreq_update_util_data pointer of the CPU running the code must not be NULL as well as for the CPU which is the target of the cpufreq utilization update in progress. Accordingly, change cpufreq_this_cpu_can_update() into a regular function in kernel/sched/cpufreq.c (instead of a static inline in a header file) and make it check the cpufreq_update_util_data pointer of the local CPU if dvfs_possible_from_any_cpu is set for the target cpufreq policy. Also update the schedutil governor to do the cpufreq_this_cpu_can_update() check in the non-fast-switch case too to avoid the stale IRQ work issues. Fixes: 99d14d0e16fa ("cpufreq: Process remote callbacks from any CPU if the platform permits") Link: https://lore.kernel.org/linux-pm/20191121093557.bycvdo4xyinbc5cb@vireshk-i7/ Reported-by: Anson Huang Tested-by: Anson Huang Cc: 4.14+ # 4.14+ Signed-off-by: Rafael J. Wysocki Acked-by: Viresh Kumar Tested-by: Peng Fan (i.MX8QXP-MEK) Signed-off-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman

cpufreq: Use struct kobj_attribute instead of struct global_attr

2019-03-10T06:17:15Z

commit 625c85a62cb7d3c79f6e16de3cfa972033658250 upstream. The cpufreq_global_kobject is created using kobject_create_and_add() helper, which assigns the kobj_type as dynamic_kobj_ktype and show/store routines are set to kobj_attr_show() and kobj_attr_store(). These routines pass struct kobj_attribute as an argument to the show/store callbacks. But all the cpufreq files created using the cpufreq_global_kobject expect the argument to be of type struct attribute. Things work fine currently as no one accesses the "attr" argument. We may not see issues even if the argument is used, as struct kobj_attribute has struct attribute as its first element and so they will both get same address. But this is logically incorrect and we should rather use struct kobj_attribute instead of struct global_attr in the cpufreq core and drivers and the show/store callbacks should take struct kobj_attribute as argument instead. This bug is caught using CFI CLANG builds in android kernel which catches mismatch in function prototypes for such callbacks. Reported-by: Donghee Han Reported-by: Sangkyu Kim Signed-off-by: Viresh Kumar Signed-off-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman

cpufreq: Rename cpufreq_can_do_remote_dvfs()

2018-05-23T08:37:08Z

This routine checks if the CPU running this code belongs to the policy of the target CPU or if not, can it do remote DVFS for it remotely. But the current name of it implies as if it is only about doing remote updates. Rename it to make it more relevant. Suggested-by: Rafael J. Wysocki Signed-off-by: Viresh Kumar Signed-off-by: Rafael J. Wysocki

cpufreq: Drop cpufreq_table_validate_and_show()

2018-04-10T06:40:45Z

This isn't used anymore. Remove the helper and update documentation accordingly. Signed-off-by: Viresh Kumar Signed-off-by: Rafael J. Wysocki

cpufreq: Validate frequency table in the core

2018-02-27T17:22:12Z

By design, cpufreq drivers are responsible for calling cpufreq_frequency_table_cpuinfo() from their ->init() callbacks to validate the frequency table. However, if a cpufreq driver is buggy and fails to do so properly, it lead to unexpected behavior of the driver or the cpufreq core at a later point in time. It would be better if the core could validate the frequency table during driver initialization. To that end, introduce cpufreq_table_validate_and_sort() and make the cpufreq core call it right after invoking the ->init() callback of the driver and destroy the cpufreq policy if the table is invalid. For the time being the validation of the table happens twice, once from the driver and then from the core. The individual drivers will be updated separately to drop table validation if they don't need it for other reasons. The frequency table is marked "sorted" or "unsorted" by the new helper now instead of in cpufreq_table_validate_and_show(), as it should only be done after validating the table (which the drivers won't do going forward). Signed-off-by: Viresh Kumar [ rjw: Subject/changelog ] Signed-off-by: Rafael J. Wysocki

cpufreq: Add and use cpufreq_for_each_{valid_,}entry_idx()

2018-02-08T09:21:39Z

Pointer subtraction is slow and tedious. Therefore, replace all instances where cpufreq_for_each_{valid_,}entry loops contained such substractions with an iteration macro providing an index to the frequency_table entry. Suggested-by: Al Viro Link: http://lkml.kernel.org/r/20180120020237.GM13338@ZenIV.linux.org.uk Acked-by: Viresh Kumar Signed-off-by: Dominik Brodowski Signed-off-by: Rafael J. Wysocki

x86 / CPU: Always show current CPU frequency in /proc/cpuinfo

2017-11-15T18:46:50Z

After commit 890da9cf0983 (Revert "x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"") the "cpu MHz" number in /proc/cpuinfo on x86 can be either the nominal CPU frequency (which is constant) or the frequency most recently requested by a scaling governor in cpufreq, depending on the cpufreq configuration. That is somewhat inconsistent and is different from what it was before 4.13, so in order to restore the previous behavior, make it report the current CPU frequency like the scaling_cur_freq sysfs file in cpufreq. To that end, modify the /proc/cpuinfo implementation on x86 to use aperfmperf_snapshot_khz() to snapshot the APERF and MPERF feedback registers, if available, and use their values to compute the CPU frequency to be reported as "cpu MHz". However, do that carefully enough to avoid accumulating delays that lead to unacceptable access times for /proc/cpuinfo on systems with many CPUs. Run aperfmperf_snapshot_khz() once on all CPUs asynchronously at the /proc/cpuinfo open time, add a single delay upfront (if necessary) at that point and simply compute the current frequency while running show_cpuinfo() for each individual CPU. Also, to avoid slowing down /proc/cpuinfo accesses too much, reduce the default delay between consecutive APERF and MPERF reads to 10 ms, which should be sufficient to get large enough numbers for the frequency computation in all cases. Fixes: 890da9cf0983 (Revert "x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"") Signed-off-by: Rafael J. Wysocki Acked-by: Thomas Gleixner Tested-by: Thomas Gleixner Acked-by: Ingo Molnar

cpufreq: provide default frequency-invariance setter function

2017-10-03T00:37:53Z

Frequency-invariant accounting support based on the ratio of current frequency and maximum supported frequency is an optional feature an arch can implement. Since there are cpufreq drivers (e.g. cpufreq-dt) which can be build for different arch's a default implementation of the frequency-invariance setter function arch_set_freq_scale() is needed. This default implementation is an empty weak function which will be overwritten by a strong function in case the arch provides one. The setter function passes the cpumask of related (to the frequency change) cpus (online and offline cpus), the (new) current frequency and the maximum supported frequency. Signed-off-by: Dietmar Eggemann Acked-by: Viresh Kumar Signed-off-by: Rafael J. Wysocki

Merge branch 'pm-cpufreq-sched'

2017-09-03T22:05:22Z

* pm-cpufreq-sched: cpufreq: schedutil: Always process remote callback with slow switching cpufreq: schedutil: Don't restrict kthread to related_cpus unnecessarily cpufreq: Return 0 from ->fast_switch() on errors cpufreq: Simplify cpufreq_can_do_remote_dvfs() cpufreq: Process remote callbacks from any CPU if the platform permits sched: cpufreq: Allow remote cpufreq callbacks cpufreq: schedutil: Use unsigned int for iowait boost cpufreq: schedutil: Make iowait boost more energy efficient

cpufreq: Simplify cpufreq_can_do_remote_dvfs()

2017-08-08T15:09:02Z

The if () in cpufreq_can_do_remote_dvfs() is superfluous, so drop it and simply return the value of the expression under it. Signed-off-by: Rafael J. Wysocki Acked-by: Viresh Kumar Signed-off-by: Rafael J. Wysocki