user/sven/linux.git/drivers/cpufreq, branch v3.18.51

cpufreq: intel_pstate: Fix ->set_policy() interface for no_turbo

2016-06-20T03:47:45Z

[ Upstream commit 983e600e88835f0321d1a0ea06f52d48b7b5a544 ] When turbo is disabled, the ->set_policy() interface is broken. For example, when turbo is disabled and cpuinfo.max = 2900000 (full max turbo frequency), setting the limits results in frequency less than the requested one: Set 1000000 KHz results in 0700000 KHz Set 1500000 KHz results in 1100000 KHz Set 2000000 KHz results in 1500000 KHz This is because the limits->max_perf fraction is calculated using the max turbo frequency as the reference, but when the max P-State is capped in intel_pstate_get_min_max(), the reference is not the max turbo P-State. This results in reducing max P-State. One option is to always use max turbo as reference for calculating limits. But this will not be correct. By definition the intel_pstate sysfs limits, shows percentage of available performance. So when BIOS has disabled turbo, the available performance is max non turbo. So the max_perf_pct should still show 100%. Signed-off-by: Srinivas Pandruvada [ rjw : Subject & changelog, rewrite in fewer lines of code ] Cc: All applicable Signed-off-by: Rafael J. Wysocki Signed-off-by: Sasha Levin

intel_pstate: Fix overflow in busy_scaled due to long delay

2015-10-28T02:14:30Z

[ Upstream commit 7180dddf7c32c49975c7e7babf2b60ed450cb760 ] The kernel may delay interrupts for a long time which can result in timers being delayed. If this occurs the intel_pstate driver will crash with a divide by zero error: divide error: 0000 [#1] SMP Modules linked in: btrfs zlib_deflate raid6_pq xor msdos ext4 mbcache jbd2 binfmt_misc arc4 md4 nls_utf8 cifs dns_resolver tcp_lp bnep bluetooth rfkill fuse dm_service_time iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ftp ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables intel_powerclamp coretemp vfat fat kvm_intel iTCO_wdt iTCO_vendor_support ipmi_devintf sr_mod kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel cdc_ether lrw usbnet cdrom mii gf128mul glue_helper ablk_helper cryptd lpc_ich mfd_core pcspkr sb_edac edac_core ipmi_si ipmi_msghandler ioatdma wmi shpchp acpi_pad nfsd auth_rpcgss nfs_acl lockd uinput dm_multipath sunrpc xfs libcrc32c usb_storage sd_mod crc_t10dif crct10dif_common ixgbe mgag200 syscopyarea sysfillrect sysimgblt mdio drm_kms_helper ttm igb drm ptp pps_core dca i2c_algo_bit megaraid_sas i2c_core dm_mirror dm_region_hash dm_log dm_mod CPU: 113 PID: 0 Comm: swapper/113 Tainted: G W -------------- 3.10.0-229.1.2.el7.x86_64 #1 Hardware name: IBM x3950 X6 -[3837AC2]-/00FN827, BIOS -[A8E112BUS-1.00]- 08/27/2014 task: ffff880fe8abe660 ti: ffff880fe8ae4000 task.ti: ffff880fe8ae4000 RIP: 0010:[] [] intel_pstate_timer_func+0x179/0x3d0 RSP: 0018:ffff883fff4e3db8 EFLAGS: 00010206 RAX: 0000000027100000 RBX: ffff883fe6965100 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000010 RDI: 000000002e53632d RBP: ffff883fff4e3e20 R08: 000e6f69a5a125c0 R09: ffff883fe84ec001 R10: 0000000000000002 R11: 0000000000000005 R12: 00000000000049f5 R13: 0000000000271000 R14: 00000000000049f5 R15: 0000000000000246 FS: 0000000000000000(0000) GS:ffff883fff4e0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7668601000 CR3: 000000000190a000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffff883fff4e3e58 ffffffff81099dc1 0000000000000086 0000000000000071 ffff883fff4f3680 0000000000000071 fbdc8a965e33afee ffffffff810b69dd ffff883fe84ec000 ffff883fe6965108 0000000000000100 ffffffff814a9100 Call Trace: [] ? run_posix_cpu_timers+0x51/0x840 [] ? trigger_load_balance+0x5d/0x200 [] ? pid_param_set+0x130/0x130 [] call_timer_fn+0x36/0x110 [] ? pid_param_set+0x130/0x130 [] run_timer_softirq+0x21f/0x320 [] __do_softirq+0xef/0x280 [] call_softirq+0x1c/0x30 [] do_softirq+0x65/0xa0 [] irq_exit+0x115/0x120 [] smp_apic_timer_interrupt+0x45/0x60 [] apic_timer_interrupt+0x6d/0x80 [] ? cpuidle_enter_state+0x52/0xc0 [] ? cpuidle_enter_state+0x48/0xc0 [] cpuidle_idle_call+0xc5/0x200 [] arch_cpu_idle+0xe/0x30 [] cpu_startup_entry+0xf1/0x290 [] start_secondary+0x1ba/0x230 Code: 42 0f 00 45 89 e6 48 01 c2 43 8d 44 6d 00 39 d0 73 26 49 c1 e5 08 89 d2 4d 63 f4 49 63 c5 48 c1 e2 08 48 c1 e0 08 48 63 ca 48 99 <48> f7 f9 48 98 4c 0f af f0 49 c1 ee 08 8b 43 78 c1 e0 08 44 29 RIP [] intel_pstate_timer_func+0x179/0x3d0 RSP The kernel values for cpudata for CPU 113 were: struct cpudata { cpu = 113, timer = { entry = { next = 0x0, prev = 0xdead000000200200 }, expires = 8357799745, base = 0xffff883fe84ec001, function = 0xffffffff814a9100 , data = 18446612406765768960, i_gain = 0, d_gain = 0, deadband = 0, last_err = 22489 }, last_sample_time = { tv64 = 4063132438017305 }, prev_aperf = 287326796397463, prev_mperf = 251427432090198, sample = { core_pct_busy = 23081, aperf = 2937407, mperf = 3257884, freq = 2524484, time = { tv64 = 4063149215234118 } } } which results in the time between samples = last_sample_time - sample.time = 4063149215234118 - 4063132438017305 = 16777216813 which is 16.777 seconds. The duration between reads of the APERF and MPERF registers overflowed a s32 sized integer in intel_pstate_get_scaled_busy()'s call to div_fp(). The result is that int_tofp(duration_us) == 0, and the kernel attempts to divide by 0. While the kernel shouldn't be delaying for a long time, it can and does happen and the intel_pstate driver should not panic in this situation. This patch changes the div_fp() function to use div64_s64() to allow for "long" division. This will avoid the overflow condition on long delays. [v2]: use div64_s64() in div_fp() Signed-off-by: Prarit Bhargava Signed-off-by: Rafael J. Wysocki Signed-off-by: Sasha Levin

cpufreq: dt: Tolerance applies on both sides of target voltage

2015-10-28T02:14:09Z

[ Upstream commit a2022001cebd0825b96aa0f3345ea3ad44ae79d4 ] Tolerance applies on both sides of the target voltage, i.e. both min and max sides. But while checking if a voltage is supported by the regulator or not, we haven't taken care of tolerance on the lower side. Fix that. Cc: Lucas Stach Fixes: 045ee45c4ff2 ("cpufreq: cpufreq-dt: disable unsupported OPPs") Signed-off-by: Viresh Kumar Reviewed-by: Lucas Stach Signed-off-by: Rafael J. Wysocki Signed-off-by: Sasha Levin

cpufreq: pcc: Enable autoload of pcc-cpufreq for ACPI processors

2015-08-27T17:25:58Z

[ Upstream commit 7e7e8fe69820c6fa31395dbbd8e348e3c69cd2a9 ] The pcc-cpufreq driver is not automatically loaded on systems where the platform's power management setting requires this driver. Instead, on those systems no CPU frequency driver is registered and active. Make the autoloading matching criteria for loading the pcc-cpufreq driver the same as done in acpi-cpufreq by commit c655affbd524d01 ("ACPI / cpufreq: Add ACPI processor device IDs to acpi-cpufreq"). x86 CPU frequency drivers are now typically autoloaded by specifying MODULE_DEVICE_TABLE entries and x86cpu model specific matching. But pcc-cpufreq was omitted when acpi-cpufreq and other drivers were changed to use this approach. Both acpi-cpufreq and pcc-cpufreq depend on a distinct and mutually exclusive set of ACPI methods which are not directly tied to specific processor model numbers. Both of these drivers have init routines which look for their required ACPI methods. As a result, only the appropriate driver registers as the cpu frequency driver and the other one ends up being unloaded. Tested on various systems where acpi-cpufreq, intel_pstate, and pcc-cpufreq are the expected cpu frequency drivers. Signed-off-by: Lenny Szubowicz Signed-off-by: Joseph Szczypek Reported-by: Trinh Dao Signed-off-by: Rafael J. Wysocki Signed-off-by: Sasha Levin

intel_pstate: set BYT MSR with wrmsrl_on_cpu()

2015-07-04T03:02:15Z

[ Upstream commit 0dd23f94251f49da99a6cbfb22418b2d757d77d6 ] Commit 007bea098b86 (intel_pstate: Add setting voltage value for baytrail P states.) introduced byt_set_pstate() with the assumption that it would always be run by the CPU whose MSR is to be written by it. It turns out, however, that is not always the case in practice, so modify byt_set_pstate() to enforce the MSR write done by it to always happen on the right CPU. Fixes: 007bea098b86 (intel_pstate: Add setting voltage value for baytrail P states.) Signed-off-by: Joe Konno Acked-by: Kristen Carlson Accardi Cc: 3.14+ # 3.14+ Signed-off-by: Rafael J. Wysocki Signed-off-by: Sasha Levin

cpufreq: Schedule work for the first-online CPU on resume

2015-04-24T21:13:44Z

[ Upstream commit c75de0ac0756d4b442f460e10461720c7c2412c2 ] All CPUs leaving the first-online CPU are hotplugged out on suspend and and cpufreq core stops managing them. On resume, we need to call cpufreq_update_policy() for this CPU's policy to make sure its frequency is in sync with cpufreq's cached value, as it might have got updated by hardware during suspend/resume. The policies are always added to the top of the policy-list. So, in normal circumstances, CPU 0's policy will be the last one in the list. And so the code checks for the last policy. But there are cases where it will fail. Consider quad-core system, with policy-per core. If CPU0 is hotplugged out and added back again, the last policy will be on CPU1 :( To fix this in a proper way, always look for the policy of the first online CPU. That way we will be sure that we are calling cpufreq_update_policy() for the only CPU that wasn't hotplugged out. Cc: 3.15+ # 3.15+ Fixes: 2f0aea936360 ("cpufreq: suspend governors on system suspend/hibernate") Reported-by: Saravana Kannan Signed-off-by: Viresh Kumar Acked-by: Saravana Kannan Signed-off-by: Rafael J. Wysocki Signed-off-by: Sasha Levin

cpufreq: s3c: remove last use of resume_clocks callback

2015-03-06T22:52:48Z

commit 67fadaa2768716209ee19a8b8bf05bc3ac399445 upstream. Commit 32726d2d550 ("ARM: SAMSUNG: Remove legacy clock code") already removed the callback pointer, but there was one remaining user: drivers/cpufreq/s3c24xx-cpufreq.c: In function 's3c_cpufreq_resume_clocks': drivers/cpufreq/s3c24xx-cpufreq.c:149:14: error: 'struct s3c_cpufreq_info' has no member named 'resume_clocks' cpu_cur.info->resume_clocks(); ^ Signed-off-by: Arnd Bergmann Fixes: 32726d2d550 ("ARM: SAMSUNG: Remove legacy clock code") Acked-by: Viresh Kumar Signed-off-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman

cpufreq: s3c: remove incorrect __init annotations

2015-03-06T22:52:48Z

commit 61882b63171736571e1139ab5aa929e3bb336016 upstream. The two functions s3c2416_cpufreq_driver_init and s3c_cpufreq_register are marked init but are called from a context that might be run after the __init sections are discarded, as the compiler points out: WARNING: vmlinux.o(.data+0x1ad9dc): Section mismatch in reference from the variable s3c2416_cpufreq_driver to the function .init.text:s3c2416_cpufreq_driver_init() WARNING: drivers/built-in.o(.text+0x35b5dc): Section mismatch in reference from the function s3c2410a_cpufreq_add() to the function .init.text:s3c_cpufreq_register() This removes the __init markings. Signed-off-by: Arnd Bergmann Acked-by: Viresh Kumar Signed-off-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman

cpufreq: speedstep-smi: enable interrupts when waiting

2015-03-06T22:52:48Z

commit d4d4eda23794c701442e55129dd4f8f2fefd5e4d upstream. On Dell Latitude C600 laptop with Pentium 3 850MHz processor, the speedstep-smi driver sometimes loads and sometimes doesn't load with "change to state X failed" message. The hardware sometimes refuses to change frequency and in this case, we need to retry later. I found out that we need to enable interrupts while waiting. When we enable interrupts, the hardware blockage that prevents frequency transition resolves and the transition is possible. With disabled interrupts, the blockage doesn't resolve (no matter how long do we wait). The exact reasons for this hardware behavior are unknown. This patch enables interrupts in the function speedstep_set_state that can be called with disabled interrupts. However, this function is called with disabled interrupts only from speedstep_get_freqs, so it shouldn't cause any problem. Signed-off-by: Mikulas Patocka Signed-off-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman

cpufreq: Set cpufreq_cpu_data to NULL before putting kobject

2015-03-06T22:52:47Z

commit 6ffae8c06fab058d6c3f8ecb7f921327721034e7 upstream. In __cpufreq_remove_dev_finish(), per-cpu 'cpufreq_cpu_data' needs to be cleared before calling kobject_put(&policy->kobj) and under cpufreq_driver_lock. Otherwise, if someone else calls cpufreq_cpu_get() in parallel with it, they can obtain a non-NULL policy from that after kobject_put(&policy->kobj) was executed. Consider this case: Thread A Thread B cpufreq_cpu_get() acquire cpufreq_driver_lock read-per-cpu cpufreq_cpu_data kobject_put(&policy->kobj); kobject_get(&policy->kobj); ... per_cpu(&cpufreq_cpu_data, cpu) = NULL And this will result in a warning like this one: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 4 at include/linux/kref.h:47 kobject_get+0x41/0x50() Modules linked in: acpi_cpufreq(+) nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c sd_mod ixgbe igb mdio ahci hwmon ... Call Trace: [] dump_stack+0x46/0x58 [] warn_slowpath_common+0x81/0xa0 [] warn_slowpath_null+0x1a/0x20 [] kobject_get+0x41/0x50 [] cpufreq_cpu_get+0x75/0xc0 [] cpufreq_update_policy+0x2e/0x1f0 [] ? up+0x32/0x50 [] ? acpi_ns_get_node+0xcb/0xf2 [] ? acpi_evaluate_object+0x22c/0x252 [] ? acpi_get_handle+0x95/0xc0 [] ? acpi_has_method+0x25/0x40 [] acpi_processor_ppc_has_changed+0x77/0x82 [] ? move_linked_works+0x66/0x90 [] acpi_processor_notify+0x58/0xe7 [] acpi_ev_notify_dispatch+0x44/0x5c [] acpi_os_execute_deferred+0x15/0x22 [] process_one_work+0x160/0x410 [] worker_thread+0x11b/0x520 [] ? rescuer_thread+0x380/0x380 [] kthread+0xe1/0x100 [] ? kthread_create_on_node+0x1b0/0x1b0 [] ret_from_fork+0x7c/0xb0 [] ? kthread_create_on_node+0x1b0/0x1b0 ---[ end trace 89e66eb9795efdf7 ]--- The actual code flow is as follows: Thread A: Workqueue: kacpi_notify acpi_processor_notify() acpi_processor_ppc_has_changed() cpufreq_update_policy() cpufreq_cpu_get() kobject_get() Thread B: xenbus_thread() xenbus_thread() msg->u.watch.handle->callback() handle_vcpu_hotplug_event() vcpu_hotplug() cpu_down() __cpu_notify(CPU_POST_DEAD..) cpufreq_cpu_callback() __cpufreq_remove_dev_finish() cpufreq_policy_put_kobj() kobject_put() cpufreq_cpu_get() gets the policy from per-cpu variable cpufreq_cpu_data under cpufreq_driver_lock, and once it gets a valid policy it expects it to not be freed until cpufreq_cpu_put() is called. But the race happens when another thread puts the kobject first and updates cpufreq_cpu_data before or later. And so the first thread gets a valid policy structure and before it does kobject_get() on it, the second one has already done kobject_put(). Fix this by setting cpufreq_cpu_data to NULL before putting the kobject and that too under locks. Reported-by: Ethan Zhao Reported-by: Santosh Shilimkar Signed-off-by: Viresh Kumar Signed-off-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman