Patch from Julie DeWandel.
This patch has solved the crashes observed during TPC-C runs on the
16-way box. (I'm confident it will fix the other reported cases as
well.)
The race is the setting of timer->base to NULL, by del_timer() or
__run_timers(). If new_base == old_base in __mod_timer() then we do not
re-check timer->base after getting the lock. (the only case where we do
not have to re-check the base is in the !old_base case, but the else
branch also includes the old_base==new_base case.)
The __run_timers() case made the lock_timer() patch not work fully - we
cannot use lock_timer() in __run_timers() due to lock ordering.
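A sketch of the fixed retry logic in __mod_timer() (simplified; base
selection and error paths omitted):

  repeat:
          old_base = timer->base;
          /* ... pick new_base = this CPU's timer vector ... */
          spin_lock_irqsave(&new_base->lock, flags);
          /* re-check even when new_base == old_base: del_timer() or
             __run_timers() may have set timer->base to NULL meanwhile */
          if (timer->base != old_base) {
                  spin_unlock_irqrestore(&new_base->lock, flags);
                  goto repeat;
          }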
|
|
From: Ingo Molnar <mingo@elte.hu>
It unifies the functionality of add_timer() and mod_timer(), and makes any
combination of the timer API calls completely SMP-safe. del_timer() is still
not using the timer lock.
This patch fixes the only timer bug in 2.6 I'm aware of: the del_timer_sync()
+ add_timer() combination in kernel/itimer.c is buggy. This was correct code
in 2.4, because there it was safe to do an add_timer() from the timer handler
itself, parallel to a del_timer_sync().
If we want to make this safe in 2.6 too (which I think we want to) then we
have to make add_timer() almost equivalent to mod_timer(), locking-wise. And
once we are at this point I think it's much cleaner to actually make
add_timer() a variant of mod_timer(). (There's no locking cost for
add_timer(), only the cost of an extra branch. And we've removed another
commonly used function from the icache.)
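Roughly, the result (a sketch, not the verbatim patch):

  void add_timer(struct timer_list *timer)
  {
          BUG_ON(timer_pending(timer));   /* add_timer() on a pending
                                             timer is illegal */
          __mod_timer(timer, timer->expires);
  }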
|
|
From: Albert Cahalan <albert@users.sourceforge.net>
This should improve timekeeping a bit @ 1000 HZ.
|
|
From: Peter Chubb <peterc@gelato.unsw.edu.au>
Currently, do_setitimer() is used in several files, but doesn't appear
in any header. Thus its declaration is repeated in some files, and
its use causes a warning in others (because there is no declaration
present).
This patch:
-- adds a couple of declarations to linux/times.h
-- removes the (now duplicate) declarations from other files.
|
|
In add_timer_internal() we simply leave the timer pending forever if the
expiry is in more than 0xffffffff jiffies. This means more than 48 days on
e.g. ia64 - which is not an unrealistic timeout. IIRC crond is happy to use
extremely large timeouts.
It's better to time out early (if you can call 48 days "early") than to
not time out at all.
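One way to implement the early timeout is to clamp oversized expiries when
the timer is queued - a hedged sketch, the actual patch may differ in
detail:

  /* in add_timer_internal(): */
  if (timer->expires - base->timer_jiffies > 0xffffffffUL)
          timer->expires = base->timer_jiffies + 0xffffffffUL;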
|
|
In general, it is better to use get_cpu_var() and __get_cpu_var()
to access per-cpu variables on this CPU than to use smp_processor_id()
and per_cpu(). In the current default implementation they are equivalent,
but on IA64 the former is already faster, and other archs will follow.
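For example, with a counter declared via DEFINE_PER_CPU(int, foo) (the
name is illustrative), these are equivalent today but the second form lets
the architecture optimise:

  per_cpu(foo, smp_processor_id())++;     /* explicit CPU lookup */
  __get_cpu_var(foo)++;                   /* this CPU directly */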
|
|
From: John Stultz, George Anzinger, Eric Piel
There was confusion over the definition of TICK_USEC. TICK_USEC is
supposed to be based on USER_HZ, however a recent change caused TICK_USEC
to be based on HZ. This broke the adjtimex() interface on systems where
USER_HZ != HZ. This patch reverts the change to TICK_USEC, removes an
added mis-use of the value and fixes some incorrect comments that could
lead to this sort of confusion.
Also this patch resolves the related LTP adjtimex failures.
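For reference, the USER_HZ-based definition this restores looks like the
following (quoted approximately from the timex headers of that era);
TICK_NSEC, by contrast, stays based on the real HZ:

  #define TICK_USEC ((1000000UL + USER_HZ/2) / USER_HZ)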
|
|
From: george anzinger <george@mvista.com>
This patch addresses issues of roundoff error in the time keeping and NTP
code as follows:
The conversion of "actual jiffies" to TICK_USEC and then to TICK_NSEC
introduced large errors when HZ is not a power of 10 (e.g. 1024 on
ia64). Most of this is avoided by converting directly to TICK_NSEC.
The calculation of MAX_SEC_IN_JIFFIES (the largest timespec or timeval the
kernel will attempt) had overflow problems in the 64-bit machines. We
introduce a different equation for those machines.
The NTP frequency update code was allowing a microsecond of error to
accumulate before applying the correction. We change FINEUSEC to FINENSEC
to do the correction as soon as a full nanosecond has accumulated.
The initial calculation of time_freq for NTP had severe roundoff errors for
HZ not a power of 10 (e.g. 1024). A new equation fixes this.
clock_nanosleep is changed to round up to the next jiffy to cover starting
between jiffies.
|
|
From: george anzinger <george@mvista.com>
This patch does the following:
Pushes down the change from timeval to timespec in the settime routines.
Fixes two places where time was set without updating the monotonic clock
offset. (Changes sys_stime() to call do_settimeofday() and changes
clock_warp to do the update directly.) These were bugs!
Changes the uptime code to use the posix_clock_monotonic notion of uptime
instead of jiffies. This time will track NTP changes and so should be
better than your standard wristwatch (if you're using NTP).
Changes posix_clock_monotonic to start at 0 on boot (was set to start at
initial jiffies).
Fixes a bug (never experienced) in timer_create() in posix-timers.c where
we "could" have released timer_id 0 if "id resources" were low.
Adds a test in do_settimeofday() to reject (EINVAL) attempts to use
unnormalized times. This is passed back up to both settimeofday() and
posix_setclock().
Warning: Requires changes in .../arch/???/kernel/time.c to change
do_settimeofday() to return an error if time is not normalized and to use a
timespec instead of timeval for its input.
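The normalization test amounts to something like this (a sketch; each
architecture's do_settimeofday() needs the same check and the timespec
argument):

  int do_settimeofday(struct timespec *tv)
  {
          if (tv->tv_nsec < 0 || tv->tv_nsec >= NSEC_PER_SEC)
                  return -EINVAL;         /* unnormalized time */
          /* ... set xtime and update the monotonic clock offset ... */
          return 0;
  }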
|
|
Trivial patch: when these were introduced cpu.h didn't exist.
|
|
- Add comment about slab ctor behaviour (Ingo Oeser)
- mm/slab.c:fprob() shows up in profiles a lot. Rename it to something more
meaningful.
- fatfs printk warning fix (Randy Dunlap)
- give the time interpolator list and lock file-static scope (hch)
|
|
From: Christoph Hellwig <hch@lst.de>
- don't add one level of indentation when taking a lock
- remove useless ti_global struct
|
|
From: David Mosberger <davidm@napali.hpl.hp.com>
Basically, what the patch does is provide two hooks such that platforms
(and subplatforms) can provide time-interpolation in a way that guarantees
that two causally related gettimeofday() calls will never see time going
backwards (unless there is a settimeofday() call, of course).
There is some evidence that the current scheme does work: we use it on ia64
both for cycle-counter-based interpolation and the SGI folks use it with a
chipset-based high-performance counter.
It seems like enough platforms do this sort of thing to provide _some_
support in the core, especially because it's rather tricky to guarantee
that time never goes backwards (short of a settimeofday, of course).
This patch is based on something Jes Sorensen wrote for the SGI Itanium 2
platform (which has a chipset-internal high-res clock). I adapted it so it
can be used for cycle-counter interpolation also. The net effect is that
"last_time_offset" can be removed completely from the kernel.
The basic idea behind the patch is simply: every time you advance xtime by
N nanoseconds, you call update_wall_time_hook(NSEC). Every time the time
gets set (i.e., discontinuity is OK), reset_wall_time_hook() is called.
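In other words, the call sites look roughly like this (a sketch; the hook
names are the ones quoted above):

  /* in update_wall_time(), after advancing xtime by 'nsec': */
  update_wall_time_hook(nsec);    /* time must only move forward */

  /* in do_settimeofday(), where a discontinuity is allowed: */
  reset_wall_time_hook();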
|
|
Don't depend on undefined preprocessor symbols evaluating to zero.
|
|
The POSIX CLOCK_MONOTONIC currently has only 1/HZ resolution. Further, it is
tied to jiffies (i.e. is a restatement of jiffies) rather than "xtime" or the
gettimeofday() clock.
This patch changes CLOCK_MONOTONIC to be a restatement of gettimeofday() plus
an offset that removes any clock-setting activity from CLOCK_MONOTONIC. The
offset represents the difference between CLOCK_MONOTONIC and
gettimeofday(), and is updated whenever the gettimeofday() clock is
set, to back the clock-setting change out of CLOCK_MONOTONIC (which, by the
standard, cannot be set).
With this change CLOCK_REALTIME (a direct restatement of gettimeofday()),
CLOCK_MONOTONIC and gettimeofday() will all tick at the same time and at
the same rate, and all will be affected by NTP adjustments (save those which
actually set the time).
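A minimal sketch of the mechanism, assuming a wall_to_monotonic offset that
is adjusted on every clock set (get_wall_time() is a hypothetical xtime
snapshot helper):

  void clock_get_monotonic(struct timespec *ts)
  {
          get_wall_time(ts);
          ts->tv_sec += wall_to_monotonic.tv_sec;
          ts->tv_nsec += wall_to_monotonic.tv_nsec;
          if (ts->tv_nsec >= NSEC_PER_SEC) {      /* renormalize */
                  ts->tv_nsec -= NSEC_PER_SEC;
                  ts->tv_sec++;
          }
  }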
|
|
Noted by David Mosberger:
"If someone happens to arm a periodic timer at exactly 256 jiffies (as
ohci happens to do on platforms with HZ=1024), then you end up getting
an endless loop of timer activations, causing a machine hang.
The problem is that __run_timers updates base->timer_jiffies _before_
running the callback routines. If a callback re-arms the timer at
exactly 256 jiffies, add_timer() will reinsert the timer into the list
that we're currently processing, which of course will cause the timer to
expire immediately again, etc., etc., ad nauseam..."
The answer here is to move the whole expired list onto a local list head and
to not look back.
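Roughly, as a sketch using the existing list helpers (simplified; base->lock
handling and the tv1/index naming are glossed over):

  LIST_HEAD(head);
  struct timer_list *timer;

  list_splice_init(base->tv1.vec + index, &head);
  while (!list_empty(&head)) {
          timer = list_entry(head.next, struct timer_list, entry);
          list_del(&timer->entry);
          timer->function(timer->data);   /* re-arms go into the live
                                             wheel, never into 'head' */
  }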
|
|
|
|
swap.h is basically the header for MM internals rather than the
public API (mm_internal.h would have been a better name...). Stop
including it in mm.h - this only needs moving one function that
should be in swap.h anyway to the right place, and fixing up a bunch
of places that use it.
|
|
From: george anzinger <george@mvista.com>
The recently-added code which avoids a lockup when a timer handler re-adds
the timer to expire immediately can be simplified.
If we change __run_timers() to increment base->timer_jiffies _before_ running
the timers, then any re-additions will not be inserted in the list which
__run_timers is presently walking.
|
|
From: george anzinger <george@mvista.com>
Remove the `index' field from the timer structures. It contains the same
info as the timer_jiffies field.
So just use the base->timer_jiffies field directly.
|
|
From: Tim Schmielau <tim@physik3.uni-rostock.de>
Fixes the problem wherein nanosleep() is sleeping for the wrong duration.
When starting out with timer_jiffies=0, the timer cascade is (unnecessarily)
triggered on the first timer interrupt, incrementing all the higher indices.
When starting with any other initial jiffies value, we miss that and end up
with all higher indices being off by one.
|
|
This is a forward-port of Andrea's fix in 2.4.
If a timer handler re-adds a timer to go off right now, __run_timers() will
never terminate. (I wrote a test. It happens.)
Fix that up by teaching internal_add_timer() to detect when it is being
called from within the context of __run_timers() and to park newly-added
timers onto a temp list instead. These timers are then added for real by
__run_timers(), after it has finished processing all pending timers.
|
|
- Use list_head functions rather than open-coding them
- Use time comparison macros rather than open-coding them (example below)
- Hide some ifdefs
- uninline internal_add_timer(). Saves half a kilobyte of text.
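For the second item, the conversion looks like this (expire() is just a
placeholder):

  /* open-coded, wrap-safe comparison: */
  if ((long)(jiffies - timeout) >= 0)
          expire();
  /* the same test with the helper macro: */
  if (time_after_eq(jiffies, timeout))
          expire();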
|
|
From: Tim Schmielau <tim@physik3.uni-rostock.de>
Force jiffies to start out at five minutes before wrap, to find
jiffy-wrapping bugs.
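The definition this amounts to (as it later appears in the jiffies header;
300*HZ is five minutes' worth of ticks):

  #define INITIAL_JIFFIES ((unsigned long)(unsigned int) (-300*HZ))

  unsigned long volatile jiffies = INITIAL_JIFFIES;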
|
|
x86-64 vsyscalls require mapping the sequence number used by
gettimeofday in a magic way, so that userland can access it via
vsyscalls for user space time-of-day access.
Instead of putting the magic into generic code I just made it possible to
move it into architecture-specific files.
|
|
This is version 23 or so of the POSIX timer code.
Internal changelog:
- Changed the signals code to match the new order of things. Also the
new xtime_lock code needed to be picked up. It made some things a lot
simpler.
- Fixed a spin lock hand off problem in locking timers (thanks
to Randy).
- Fixed nanosleep to test for out of bound nanoseconds
(thanks to Julie).
- Fixed a couple of id deallocation bugs that left old ids
lying around (hey, I get this one).
- This version has a new timer id manager. Andrew Morton
suggested elimination of recursion (done) and I added code
to allow it to release unused nodes. The prior version only
released the leaf nodes. (The id manager uses radix tree
type nodes.) Also added is a reuse count so ids will not
repeat for at least 256 alloc/free cycles.
- The changes for the new sys_call restart now allow one
restart function to handle both nanosleep and clock_nanosleep.
Saves a bit of code, nice.
- All the requested changes and Lindent too :).
- I also broke clock_nanosleep() apart much the same way
nanosleep() was with the 2.5.50-bk5 changes.
TIMER STORMS
The POSIX clocks and timers code prevents "timer storms" by
not putting repeating timers back in the timer list until
the signal is delivered for the prior expiry. Timer events
missed by this delay are accounted for in the timer overrun
count. The net result is MUCH lower system overhead while
presenting the same info to the user as would be the case if
an interrupt and timer processing were required for each
increment in the overrun count.
|
|
Add "seqlock" infrastructure for doing low-overhead optimistic reader
locks (writer increments a sequence number, reader verifies that no
writers came in during the critical region, and lots of careful memory
barriers to take care of business).
Make xtime/get_jiffies_64() use this new locking.
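Typical usage of the new primitives, writer and reader side (sketch):

  seqlock_t xtime_lock = SEQLOCK_UNLOCKED;
  unsigned long seq;

  /* writer: */
  write_seqlock_irq(&xtime_lock);
  /* ... update xtime, jiffies_64 ... */
  write_sequnlock_irq(&xtime_lock);

  /* reader: loops only if a writer slipped in */
  do {
          seq = read_seqbegin(&xtime_lock);
          /* ... snapshot xtime ... */
  } while (read_seqretry(&xtime_lock, seq));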
|
|
Use 64 bit jiffies for reporting uptime.
|
|
|
|
using the new system call restart infrastructure.
This breaks the compat layer - it really needs to do its own version
of restarting, since the restarting depends on the types.
|
|
This is the generic part of the start of the compatibility syscall
layer. I think I have made it generic enough that each architecture can
define what compatibility means.
To use this, an architecture must create asm/compat.h and provide
typedefs for (currently) 'compat_time_t', 'struct compat_timeval' and
'struct compat_timespec'.
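A sketch of what a minimal asm/compat.h provides - the field layout has to
match the 32-bit ABI being emulated:

  typedef s32 compat_time_t;

  struct compat_timeval {
          compat_time_t   tv_sec;
          s32             tv_usec;
  };

  struct compat_timespec {
          compat_time_t   tv_sec;
          s32             tv_nsec;
  };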
|
|
This changes sys_getppid() to be more POSIX-threading conformant.
sys_getppid() needs to return the PID of the "process' parent" (i.e. the
tgid of the parent thread), not the thread parent's PID. The patch has
no effect on non-CLONE_THREAD users, for them current->group_leader ==
current. The effect on CLONE_THREAD threads is that getppid() does not
return any PID within the thread group anymore. Plus if a threaded
application starts up a (non-thread) child then the child sees the
process PID of the parent process, not the thread PID of the parent
thread.
In theory we could introduce the getttid() variant to get the TID of
the parent thread, but I doubt it would be of any use. (And we can add
it if the need arises.)
The lockless algorithm is still safe because the ->group_leader pointer
never changes asynchronously. (The ->real_parent pointer might still
change asynchronously, so the SMP checks are still needed.)
I've also updated the comments (they referenced the nonexistent p_ooptr
field), plus I've changed the mb() to rmb() - we need to order the
reads; we don't do any global writes that need predictable ordering.
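The resulting semantics, stripped of the SMP re-read loop, amount to:

  asmlinkage long sys_getppid(void)
  {
          return current->group_leader->real_parent->tgid;
  }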
|
|
Patch from Bill Irwin. It has the potential to break userspace
monitoring tools a little bit, and I'm rather uncertain about
how useful the per-process per-cpu accounting is.
Bill sent this out as an RFC on July 29:
"These statistics severely bloat the task_struct and nothing in
userspace can rely on them as they're conditional on CONFIG_SMP. If
anyone is using them (or just wants them around), please speak up."
And nobody spoke up.
If we apply this, the contents of /proc/783/cpu will go from
cpu 1 1
cpu0 0 0
cpu1 0 0
cpu2 1 1
cpu3 0 0
to
cpu 1 1
And we shall save 256 bytes from the ia32 task_struct.
On my SMP build with NR_CPUS=32:
Without this patch, sizeof(task_struct) is 1824, slab uses a 1-order
allocation and we are getting 2 task_structs per page.
With this patch, sizeof(task_struct) is 1568, slab uses a 2-order
allocation and we are getting 2.5 task_structs per page.
So it seems worthwhile.
(Maybe this highlights a shortcoming in slab. For the 1824-byte case
it could have used a 0-order allocation)
|
|
The timer code is attempting to replicate the softirq characteristics at
the tasklet level, which is a little pointless. This patch converts
timers to be a first-class softirq citizen.
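Concretely, the timer tick registers as its own softirq instead of a
tasklet - a sketch using the softirq interfaces:

  static void run_timer_softirq(struct softirq_action *a)
  {
          /* ... __run_timers() on this CPU's base ... */
  }

  void __init init_timers(void)
  {
          open_softirq(TIMER_SOFTIRQ, run_timer_softirq, NULL);
  }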
|
|
If two CPUs run mod_timer against the same not-pending timer then they
have no locking relationship. They can both see the timer as
not-pending and they both add the timer to their cpu-local list. The
CPU which gets there second corrupts the first CPU's lists.
This was causing Dave Hansen's 8-way to oops after a couple of minutes
of specweb testing.
I believe that to fix this we need locking which is associated with the
timer itself. The easy fix is hashed spinlocking based on the timer's
address. The hard fix is a lock inside the timer itself.
It is hard because init_timer() becomes compulsory, to initialise that
spinlock. An unknown number of code paths in the kernel just wipe the
timer to all-zeroes and start using it.
I chose the hard way - it is cleaner and more idiomatic. The patch
also adds a "magic number" to the timer so we can detect when a timer
was not correctly initialised. A warning and stack backtrace are
generated and the timer is fixed up. After 16 such warnings the
warning mechanism shuts itself up until a reboot.
It took six patches to my kernel to stop the warnings from coming out.
The uninitialised timers are extremely easy to find and fix. But it
will take some time to weed them all out. Maybe we should go for
the hashed locking...
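The warn-and-fix-up path looks roughly like this (a sketch; TIMER_MAGIC is
the new magic value):

  static void check_timer(struct timer_list *timer)
  {
          if (timer->magic != TIMER_MAGIC) {
                  static int fixups = 16;

                  if (fixups) {
                          fixups--;
                          printk(KERN_WARNING "uninitialised timer!\n");
                          dump_stack();
                  }
                  init_timer(timer);      /* fix it up and carry on */
          }
  }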
Note that the new timer->lock means that we can clean up some awkward
"oh we raced, let's try again" code in timer.c. But to do that we'd
also need to take timer->lock in the commonly-called del_timer(), so I
left it as-is.
The lock is not needed in add_timer() because concurrent
add_timer()/add_timer() and concurrent add_timer()/mod_timer() are
illegal.
|
|
Patch from Ravikiran G Thirumalai <kiran@in.ibm.com>
1. Break out disk stats from kernel_stat and move disk stat to blkdev.h
2. Group cpu stat in kernel_stat and make them "per_cpu" instead of
the NR_CPUS array
3. Remove EXPORT_SYMBOL(kstat) from ksyms.c (as I noticed that no module is
using kstat)
|
|
Patch from Dipankar Sarma <dipankar@in.ibm.com>
This patch changes the per-CPU data in timer management (tvec_bases)
to use per_cpu data area and makes it safe for cpu_possible allocation
by using CPU notifiers. End result - saving space.
Depends on cpu_possible patch.
|
|
add_timer_on is like add_timer, except it takes a target CPU on which
to add the timer.
The slab code needs per-cpu timers for shrinking the per-cpu caches.
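Usage mirrors add_timer() with an explicit target (sketch; 'reap_timer' is
an illustrative per-cpu timer, not the actual slab name):

  void add_timer_on(struct timer_list *timer, int cpu);

  /* e.g. arming each online CPU's cache-shrink timer: */
  for (cpu = 0; cpu < NR_CPUS; cpu++)
          if (cpu_online(cpu))
                  add_timer_on(&per_cpu(reap_timer, cpu), cpu);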
|
|
This implements a simple hook into the profiling timer for x86 so that
non-perfctr machines can still use oprofile. This has proven useful for
laptops and the like.
It also reduces header dependencies a bit by centralising the readprofile
code.
|
|
This is my latest timer patchset, it makes del_timer_sync() a bit more
robust wrt. code that re-adds timers from the timer handler.
Other changes in the patch:
- clean up cascading a bit.
- do not save flags in __run_timer_list - we enter from an irqs-enabled
tasklet.
|
|
Comment above getpid() is wrong.
This patch fixes it, and expands the comment to explain why on earth
we have getpid() returning ->tgid and not ->pid.
|
|
I think I have found it and it only hits on a 64 bit machine.
If the timeout is big enough we still need to initialise timer->entry.
Otherwise bad things happen when we hit del_timer().
|
|
This does a number of timer subsystem enhancements:
- simplified timer initialization, now it's the cheapest possible thing:
  static inline void init_timer(struct timer_list *timer)
  {
          timer->base = NULL;
  }
since the timer functions already did a !timer->base check this did not
have any effect on their fastpath.
- the rule from now on is that timer->base is set upon activation of the
timer, and cleared upon deactivation. This also made it possible to:
- reorganize all the timer handling code to not assume anything about
timer->entry.next and timer->entry.prev - this also removed lots of
unnecessary cleaning of these fields. Removed lots of unnecessary list
operations from the fastpath.
- simplified del_timer_sync(): it now uses del_timer() plus some simple
synchronization code (sketched below). Note that this also fixes a bug: if mod_timer (or
add_timer) moves a currently executing timer to another CPU's timer
vector, then del_timer_sync() does not synchronize with the handler
properly.
- bugfix: moved run_local_timers() from scheduler_tick() into
update_process_times(). scheduler_tick() might be called from the fork
code, which would not quite have the intended effect ...
- removed the APIC-timer-IRQ shifting done on SMP, Dipankar Sarma's
testing shows no negative effects.
- cleaned up include/linux/timer.h:
- removed the timer_t typedef, and fixed up kernel/workqueue.c to use
the 'struct timer_list' name instead.
- removed unnecessary includes
- renamed the 'list' field to 'entry' (it's an entry not a list head)
- exchanged the 'function' and 'data' fields. This, besides being
more logical, also unearthed the last few remaining places that
initialized timers by assuming some given field ordering, the patch
also fixes these places. (fs/xfs/pagebuf/page_buf.c,
net/core/profile.c and net/ipv4/inetpeer.c)
- removed the defunct sync_timers(), timer_enter() and timer_exit()
prototypes.
- added docbook-style comments.
- other kernel/timer.c changes:
- base->running_timer does not have to be volatile ...
- added consistent comments to all the important functions.
- made the sync-waiting in del_timer_sync preempt- and lowpower-
friendly.
I've compiled, booted & tested the patched kernel on x86 UP and SMP. I
have tried moderately high networking load as well, to make sure the timer
changes are correct - they appear to be.
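The simplified del_timer_sync() amounts to roughly the following;
timer_running_anywhere() stands in for the real per-base ->running_timer
checks:

  void del_timer_sync(struct timer_list *timer)
  {
          for (;;) {
                  del_timer(timer);
                  if (!timer_running_anywhere(timer))
                          break;
                  cpu_relax();    /* the preempt- and lowpower-
                                     friendly wait */
          }
  }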
|
|
This is the smptimers patch plus the removal of old BHs and a rewrite of
task-queue handling.
Basically with the removal of TIMER_BH I think the time is right to get
rid of old BHs forever, and to do a massive cleanup of all related
fields. The following five basic 'execution context' abstractions are
supported by the kernel:
- hardirq
- softirq
- tasklet
- keventd-driven task-queues
- process contexts
I've done the following cleanups/simplifications to task-queues:
- removed the ability to define your own task-queue, what can be done is
to schedule_task() a given task to keventd, and to flush all pending
tasks.
This is actually a quite easy transition, since 90% of all task-queue
users in the kernel used BH_IMMEDIATE - which is very similar in
functionality to keventd.
I believe task-queues should not be removed from the kernel altogether.
It's true that they were written as a candidate replacement for BHs
originally, but they do make sense in a different way: it's perhaps the
easiest interface to do deferred processing from IRQ context, in
performance-uncritical code areas. They are easier to use than
tasklets.
Code that cares about performance should convert to tasklets - as the
timer code and the serial subsystem have done already. For extreme
performance softirqs should be used - the net subsystem does this.
And we can do this for 2.6 - there are only a couple of areas left after
fixing all the BH_IMMEDIATE places.
I have moved all the taskqueue handling code into kernel/context.c, and
only kept the basic 'queue a task' definitions in include/linux/tqueue.h.
I've converted three of the most commonly used BH_IMMEDIATE users:
tty_io.c, floppy.c and random.c. [random.c might need more thought
though.]
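A converted BH_IMMEDIATE user ends up looking roughly like this (the my_*
names are illustrative):

  static void my_deferred_work(void *data)
  {
          /* runs later, in keventd's process context */
  }

  static struct tq_struct my_task = {
          .routine = my_deferred_work,
  };

  static void my_interrupt(int irq, void *dev_id, struct pt_regs *regs)
  {
          schedule_task(&my_task);        /* instead of marking BH_IMMEDIATE */
  }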
I've also cleaned up kernel/timer.c over that of the stock smptimers
patch: privatized the timer-vec definitions (nothing needs it,
init_timer() used it mistakenly) and cleaned up the code. Plus i've moved
some code around that does not belong into timer.c, and within timer.c
i've organized data and functions along functionality and further
separated the base timer code from the NTP bits.
net_bh_lock: I have removed it, since it would synchronize to nothing. The
old protocol handlers should still run on UP, and on SMP the kernel prints
a warning upon use. Alexey, is this approach fine with you?
scalable timers: I've further improved the patch ported to 2.5 by wli and
Dipankar. There is only one pending issue I can see, the question of
whether to migrate timers in mod_timer() or not. I'm quite convinced that
they should be migrated, but I might be wrong. It's a 10-line change to
switch between migrating and non-migrating timers, we can do performance
tests later on. The current, more complex migration code is pretty fast
and has been stable under extremely high networking loads in the past 2
years, so we can immediately switch to the simpler variant if someone
proves it improves performance. (I'd say if non-migrating timers improve
Apache performance on one of the bigger NUMA boxes then the point is
proven, no further thought will be needed.)
|
|
and does the wrong thing for higher HZ values anyway.
|
|
I've been playing with different HZ values in the 2.4 kernel for a while
now, and apparently Linus also has decided to introduce a USER_HZ
constant (I used CLOCKS_PER_SEC) while raising the HZ value on x86 to
1000.
On x86 timekeeping has shown to be relatively fragile when raising HZ (OK,
I tried HZ=2048 which is quite high) because of the way the interrupt
timer is configured to fire HZ times each second. This is done by
configuring a divisor in the timer chip (LATCH) which divides a fixed
input clock (1193180 Hz) and makes the chip fire interrupts at the
resulting frequency.
Now comes the catch: NTP requires a clock accuracy of 500 ppm. For some
HZ values the clock is not accurate enough to meet this requirement,
hence NTP won't work well.
An example HZ value is 1020 which exceeds the 500 ppm requirement. In
this case the best approximation is 1019.8 Hz. The xtime.tv_usec value
is raised by 980 each tick, which means that after one
second the tv_usec value has increased by 999404 (it should be 1000000),
which is an accuracy of 596 ppm.
Some more examples:
HZ Accuracy (ppm)
---- --------------
100 17
1000 151
1024 632
2000 687
2008 343
2011 18
2048 1249
What I've been doing is replacing tv_usec by tv_nsec, meaning xtime is now
a timespec instead of a timeval. This allows the accuracy to be
improved by a factor of 1000 for any (well ... any?) HZ value.
Of course all kinds of calculations had to be adjusted as well. The
ACTHZ constant is introduced to approximate the actual HZ value; it's
used to do some approximations of other related values.
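The constants involved, approximately as they end up in the headers (LATCH
is the divisor programmed into the PIT, ACTHZ the interrupt rate that
divisor actually yields, as a fixed-point value shifted left by 8):

  #define CLOCK_TICK_RATE 1193180                         /* PIT input clock, in Hz */
  #define LATCH   ((CLOCK_TICK_RATE + HZ/2) / HZ)         /* rounded divisor */
  #define ACTHZ   (SH_DIV(CLOCK_TICK_RATE, LATCH, 8))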
|
|
|
|
I've noticed that xtime_lock and timerlist_lock end up on the same
cacheline all the time (at least on x86). Not a good thing for
loads with high xxx_timer and do_gettimeofday counts I guess (networking etc).
Here's a trivial fix.
|
|
- introduce new type of context-switch locking, this is a must-have for
ia64 and sparc64.
- load_balance() bug noticed by Scott Rhine and myself: scan the
whole list to find imbalance number of tasks, not just the tail
of the list.
- sched_yield() fix: use current->array not rq->active.
|
|
Stop using "struct tms" internally - always use timer ticks (or one of
the sane timeval/timespec types) instead.
Explicitly convert to clock_t when copying to user space for the old
broken interfaces that still use "clock_t".
Clean up and unify jiffies<->timeval conversion.
|