| Age | Commit message (Collapse) | Author |
|
settimeofday will set the time a little bit too early on systems using
time interpolation since it subtracts the current interpolator offset
from the time. This used to be necessary with the code in 2.6.9 and earlier
but the new code resets the time interpolator after setting the time.
Thus the time is set too early and gettimeofday will return a time slightly
before the time specified with settimeofday if invoked immeditely after
settimeofday.
This removes the obsolete subtraction of the time interpolator offset
and makes settimeofday set the time accurately.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The following exports are necessary to allow loadable modules to define new
clocks. Without these the mmtimer driver cannot be build correctly as a
module (there is another mmtimer specific fix necessary to get it to build
properly but that will be a separate patch):
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch changes the update of the cmos clock to be timer driven rather
than poll driven by the timer interrupt function. If the clock is not
being synced to an outside source the timer is removed and thus system
overhead is nill in that case. The update frequency is still ~11 minutes
and missing the update window still causes a retry in 60 seconds.
We want the calls to sync_cmos_clock() to be made in a consistent environment.
This was not true when calling it directly from the NTP call code. The
change means that sync_cmos_clock() is ALWAYS called from run_timers(), i.e.
as a timer call back function.
Also, call the timer code only through the timer interface (set a short timer
to do it from the ntp call).
Signed-off-by: George Anzinger <george@mvista.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch corrects a problem that was originally added with the nanosecond
timestamps in stat patch. The problem is that some file systems don't have
enough space in their on disk inode to save nanosecond timestamps, so they
truncate the c/a/mtime to seconds when flushing an dirty node. In core the
inode would have full jiffies granuality.
This can be observed by programs as a timestamp that jumps backwards under
specific loads when an inode is flushed and then reloaded from disk.
The problem was already known when the original patch went in, but it
wasn't deemed important enough at that time. So far there has been only
one report of it causing problems. Now Tridge is worried that it will
break running Excel over samba4 because Excel seems to do very anal
timestamp checking and samba4 will supply 100ns timestamps over the
network.
This patch solves it by putting the time resolution into the superblock of
a fs and always rounding the in core timestamps to that granuality.
This also supercedes some previous ext2/3 hacks to flush the inode less
often when only the subsecond timestamp changes.
I tried to keep the overhead low, in particular it tries to keep divisions
out of fast paths as far as possible.
The patch is quite big but 99% of it is just relatively straight forward
search'n'replace in a lot of fs. Unconverted filesystems will default to a
1ns granuality, but may still show the problem if they continue to use
CURRENT_TIME. I converted all in tree fs.
One possible future extension of this would be to have two time
granualities per superblock - one that specifies the visible resolution,
and the other to specify how often timestamps should be flushed to disk,
which could be tuned with a mount option per fs (e.g. often m/atimes don't
need to be flushed every second). Would be easy to do as an addon if
someone is interested.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
I realized that the best way to get the sys_time/sys_stime problem fixed is
to make sys_time 64 bit safe by using "time_t *" instead of "int *" and to
introduce two proper compat functions compat_sys_time and compat_sys_stime.
The prototype change of sys_time is transparent for 32 bit architectures
because both "int" and "time_t" are 32 bit. For 64 bit the type change
would be wrong but luckily no 64 bit architecture uses sys_time/sys_stime
in 64 bit mode. The patch makes the following change:
ia64 : Remove sys32_time, use compat_sys_time and
add (!!) compat_sys_stime to compat syscall table.
mips : Use compat_sys_time/compat_sys_stime in 32 bit syscall table.
Add #ifdef magic to compile sys_time/sys_stime and
compat_sys_time/compat_sys_stime only if needed.
parisc : Remove sys32_time, use compat_sys_time and compat_sys_stime.
ppc64 : remove sys32_time, ppc64_sys32_stime and ppc64_sys_stime.
Use common compat_sys_time, compat_sys_stime and sys_stime.
s390 : Use compat_sys_stime. Add #ifdef magic to compile
sys_time/sys_stime and compat_sys_time/compat_sys_stime only
if needed.
sparc64 : Use compat_sys_time/compat_Sys_stime in 32 bit syscall table.
um : Remove um_time and um_stime. Use common functions sys_time and
sys_stime. This adds a CAP_SYS_TIME check to UMs stime call.
x86_64 : Remove sys32_time. Use compat_sys_time and compat_sys_stime
in 32 bit syscall table.
The original stime bug is fixed for mips, parisc, s390, sparc64 and
x86_64. Can the arch-maintainers please take a look at this?
From: Martin Schwidefsky <schwidefsky@de.ibm.com>
Convert compat_time_t to time_t in 32 bit emulation for sys_stime and
consolidate all the different implementation of sys_time, sys_stime and
their 32-bit emulation parts.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This recently added function is only used by the posix timers code, no need
to be exported.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
I have received positive feedback from various individuals who have applied my
BSD Secure Levels LSM patch, and so at this point I am submitting it to you
with a request to merge it in. Nothing has changed in this patch since when I
last posted it to the LKML, so I am not re-sending it there.
This first patch adds hooks to catch attempts to set the system clock back.
Signed-off-by: Michael A. Halcrow <mahalcro@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
I found that the prototypes for sys_waitid and sys_fcntl in
<linux/syscalls.h> don't match the implementation. In order to keep all
prototypes in sync in the future, now include the header from each file
implementing any syscall.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
for IA64
This has been in the ia64 (and hence -mm) trees for a couple of months.
Changelog:
* Affects only architectures which define CONFIG_TIME_INTERPOLATION
(currently only IA64)
* Genericize time interpolation, make time interpolators easily usable
and provide instructions on how to use the interpolator for other
architectures.
* Provide nanosecond resolution for clock_gettime and an accuracy
up to the time interpolator time base.
* clock_getres() reports resolution of underlying time basis which
is typically <50ns and may be 1ns on some systems.
* Make time interpolator self-tuning to limit time jumps
and to make the interpolators work correctly on systems with
broken time base specifications.
* SMP scalability: Make clock_gettime and gettimeofday scale O(1)
by removing the cmpxchg for most clocks (tested for up to 512 CPUs)
* IA64: provide asm fastcall that doubles the performance
of gettimeofday and clock_gettime on SGI and other IA64 systems
(asm fastcalls scale O(1) together with the scalability fixes).
* IA64: provide nojitter kernel option so that IA64 systems with
correctly synchronized ITC counters may also enjoy the
scalability enhancements.
Performance measurements for single calls (ITC cycles):
A. 4 way Intel IA64 SMP system (kmart)
ITC offsets:
kmart:/usr/src/noship-tests # dmesg|grep synchr
CPU 1: synchronized ITC with CPU 0 (last diff 1 cycles, maxerr 417 cycles)
CPU 2: synchronized ITC with CPU 0 (last diff 2 cycles, maxerr 417 cycles)
CPU 3: synchronized ITC with CPU 0 (last diff 1 cycles, maxerr 417 cycles)
A.1. Current kernel code
kmart:/usr/src/noship-tests # ./dmt
gettimeofday cycles: 3737 220 215 215 215 215 215 215 215 215
clock_gettime(REAL) cycles: 4058 575 564 576 565 566 558 558 558 558
clock_gettime(MONO) cycles: 1583 621 609 609 609 609 609 609 609 609
clock_gettime(PROCESS) cycles: 71428 298 259 259 259 259 259 259 259 259
clock_gettime(THREAD) cycles: 3982 336 290 298 298 298 298 286 286 286
A.2 New code using cmpxchg
kmart:/usr/src/noship-tests # ./dmt
gettimeofday cycles: 3145 213 216 213 213 213 213 213 213 213
clock_gettime(REAL) cycles: 3185 230 210 210 210 210 210 210 210 210
clock_gettime(MONO) cycles: 284 217 217 216 216 216 216 216 216 216
clock_gettime(PROCESS) cycles: 68857 289 270 259 259 259 259 259 259 259
clock_gettime(THREAD) cycles: 3862 339 298 298 298 298 290 286 286 286
A.3 New code with cmpxchg switched off (nojitter kernel option)
kmart:/usr/src/noship-tests # ./dmt
gettimeofday cycles: 3195 219 219 212 212 212 212 212 212 212
clock_gettime(REAL) cycles: 3003 228 205 205 205 205 205 205 205 205
clock_gettime(MONO) cycles: 279 209 209 209 208 208 208 208 208 208
clock_gettime(PROCESS) cycles: 65849 292 259 259 268 270 270 259 259 259
B. SGI SN2 system running 512 IA64 CPUs.
B.1. Current kernel code
[root@ascender noship-tests]# ./dmt
gettimeofday cycles: 17221 1028 1007 1004 1004 1004 1010 25928 1002 1003
clock_gettime(REAL) cycles: 10388 1099 1055 1044 1064 1063 1051 1056 1061 1056
clock_gettime(MONO) cycles: 2363 96 96 96 96 96 96 96 96 96
clock_gettime(PROCESS) cycles: 46537 804 660 666 666 666 666 666 666 666
clock_gettime(THREAD) cycles: 10945 727 710 684 685 686 685 686 685 686
B.2 New code
ascender:~/noship-tests # ./dmt
gettimeofday cycles: 3874 610 588 588 588 588 588 588 588 588
clock_gettime(REAL) cycles: 3893 612 588 582 588 588 588 588 588 588
clock_gettime(MONO) cycles: 686 595 595 588 588 588 588 588 588 588
clock_gettime(PROCESS) cycles: 290759 322 269 269 259 265 265 265 259 259
clock_gettime(THREAD) cycles: 5153 358 306 298 296 304 290 298 298 298
Scalability of time functions (in time it takes to do a million calls):
=======================================================================
A. 4 way Intel IA SMP system (kmart)
A.1 Current code
kmart:/usr/src/noship-tests # ./todscale -n1000000
CPUS WALL WALL/CPUS
1 0.192 0.192
2 1.125 0.563
4 9.229 2.307
A.2 New code using cmpxchg
kmart:/usr/src/noship-tests # ./todscale
CPUS WALL WALL/CPUS
1 0.188 0.188
2 0.457 0.229
4 0.413 0.103
(the measurement with 4 cpus may fluctuate up to 15.x somehow)
A.3 New code without cmpxchg (nojitter kernel option)
kmart:/usr/src/noship-tests # ./todscale -n10000000
CPUS WALL WALL/CPUS
1 0.180 0.180
2 0.180 0.090
4 0.252 0.063
B. SGI SN2 system running 512 IA64 CPUs.
The system has a global monotonic clock and therefore has
no need for compensation. Current code uses a cmpxchg. New
code has no cmpxchg.
B.1 current code
ascender:~/noship-tests # ./todscale
CPUS WALL WALL/CPUS
1 0.850 0.850
2 1.767 0.884
4 6.124 1.531
8 20.777 2.597
16 57.693 3.606
32 164.688 5.146
64 456.647 7.135
128 1093.371 8.542
256 2778.257 10.853
(System crash at 512 CPUs)
B.2 New code
ascender:~/noship-tests # ./todscale -n1000000
CPUS WALL WALL/CPUS
1 0.426 0.426
2 0.429 0.215
4 0.436 0.109
8 0.452 0.057
16 0.454 0.028
32 0.457 0.014
64 0.459 0.007
128 0.466 0.004
256 0.474 0.002
512 0.518 0.001
Clock Accuracy
==============
A. 4 CPU SMP system
A.1 Old code
kmart:/usr/src/noship-tests # ./cdisp
Gettimeofday() = 1092124757.270305000
CLOCK_REALTIME= 1092124757.270382000 resolution= 0.000976563
CLOCK_MONOTONIC= 89.696726590 resolution= 0.000976563
CLOCK_PROCESS_CPUTIME_ID= 0.001242507 resolution= 0.000000001
CLOCK_THREAD_CPUTIME_ID= 0.001255310 resolution= 0.000000001
A.2 New code
kmart:/usr/src/noship-tests # ./cdisp
Gettimeofday() = 1092124478.194530000
CLOCK_REALTIME= 1092124478.194603399 resolution= 0.000000001
CLOCK_MONOTONIC= 88.198315204 resolution= 0.000000001
CLOCK_PROCESS_CPUTIME_ID= 0.001241235 resolution= 0.000000001
CLOCK_THREAD_CPUTIME_ID= 0.001254747 resolution= 0.000000001
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This issue was discussed on lkml and linux-ia64. The patch introduces
"getnstimeofday" and removes all the code scaling gettimeofday to
nanoseoncs. It makes it possible for the posix-timer functions to return
higher accuracy.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
|
|
From: David Mosberger <davidm@napali.hpl.hp.com>
Below is a patch that tries to sanitize the dropping of unneeded system-call
stubs in generic code. In some instances, it would be possible to move the
optional system-call stubs into a library routine which would avoid the need
for #ifdefs, but in many cases, doing so would require making several
functions global (and possibly exporting additional data-structures in
header-files). Furthermore, it would inhibit (automatic) inlining in the
cases in the cases where the stubs are needed. For these reasons, the patch
keeps the #ifdef-approach.
This has been tested on ia64 and there were no objections from the
arch-maintainers (and one positive response). The patch should be safe but
arch-maintainers may want to take a second look to see if some __ARCH_WANT_foo
macros should be removed for their architecture (I'm quite sure that's the
case, but I wanted to play it safe and only preserved the status-quo in that
regard).
|
|
From: "La Monte H.P. Yarroll" <piggy@timesys.com>
This is a Scott Wood patch against 2.6.3.
Use gettimeofday() rather than xtime.tv_sec in sys_time(), since
sys_stime() uses settimeofday() and thus subtracts the subtick correction
from the new xtime.
stime() used settimeofday(), but time() did not use gettimeofday(). Since
settimeofday() subtracts out the current intra-tick correction, and nsec
was 0 (since stime() only allows seconds), this resulted in xtime being
slightly earlier than the time that was set.
If time() had used gettimeofday(), the correction would have been applied,
and everything would be fine. However, instead time just reads the current
xtime.tv_sec, so if time() is called immediately after stime(), you'll
usually get a value one second earlier.
|
|
From: Stephen Hemminger <shemminger@osdl.org>
The following will prevent adjtime from causing time regression. It delays
starting the adjtime mechanism for one tick, and keeps gettimeofday inside
the window.
Only fixes i386, but changes to other arch would be similar.
Running a simple clock test program and playing with adjtime demonstrates
that this fixes the problem (and 2.6.0-test6 is broken). But given the
fragile nature of the timer code, it should go through some more testing
before inclusion.
|
|
|
|
From: davem
- sys_stime() takes an int*, but it should be a time_t*: they have different
sizes on (for example) sparc64.
- sys_settimeofday() is confusing timevals with timespecs. They have
different sizes on sparc64 and the time keeps on getting set to epoch.
|
|
The us->ns conversion must be in the "tv" if branch of settimeofday,
not in the "tz" if.
Also make it compile for x86-64 again. The 32bit emulation was forgotten
in the conversion.
|
|
From: george anzinger <george@mvista.com>
This patch addresses issues of roundoff error in the time keeping and NTP
code as follows:
The conversion of "actual jiffies" to TICK_USEC and then to TICK_NSEC
introduced large errors if jiffies was not a power of 10 (e.g. 1024 for
the ia64). Most of this is avoided by converting directly to TICK_NSEC.
The calculation of MAX_SEC_IN_JIFFIES (the largest timespec or timeval the
kernel will attempt) had overflow problems in the 64-bit machines. We
introduce a different equation for those machines.
The NTP frequency update code was allowing a micro second of error to
accumulate before applying the correction. We change FINEUSEC to FINENSEC
to do the correction as soon as a full nanosecond has accumulated.
The initial calculation of time_freq for NTP had severe roundoff errors for
HZ not a power of 10 (i.e. 1024). A new equation fixes this.
clock_nanosleep is changed to round up to the next jiffie to cover starting
between jiffies.
|
|
From: george anzinger <george@mvista.com>
This patch does the following:
Pushs down the change from timeval to timespec in the settime routines.
Fixes two places where time was set without updating the monotonic clock
offset. (Changes sys_stime() to call do_settimeofday() and changes
clock_warp to do the update directly.) These were bugs!
Changes the uptime code to use the posix_clock_monotonic notion of uptime
instead of the jiffies. This time will track NTP changes and so should be
better than your standard wristwatch (if your using ntp).
Changes posix_clock_monotonic to start at 0 on boot (was set to start at
initial jiffies).
Fixes a bug (never experienced) in timer_create() in posix-timers.c where
we "could" have released timer_id 0 if "id resources" were low.
Adds a test in do_settimeofday() to error out (EINVAL) attempts to use
unnormalized times. This is passed back up to both settimeofday and
posix_setclock().
Warning: Requires changes in .../arch/???/kernel/time.c to change
do_settimeofday() to return an error if time is not normalized and to use a
timespec instead of timeval for its input.
|
|
From: David Mosberger <davidm@napali.hpl.hp.com>
Basically, what the patch does is provide two hooks such that platforms
(and subplatforms) can provide time-interpolation in a way that guarantees
that two causally related gettimeofday() calls will never see time going
backwards (unless there is a settimeofday() call, of course).
There is some evidence that the current scheme does work: we use it on ia64
both for cycle-counter-based interpolation and the SGI folks use it with a
chipset-based high-performance counter.
It seems like enough platforms do this sort of thing to provide _some_
support in the core, especially because it's rather tricky to guarantee
that time never goes backwards (short of a settimeofday, of course).
This patch is based on something Jes Sorensen wrote for the SGI Itanium 2
platform (which has a chipset-internal high-res clock). I adapted it so it
can be used for cycle-counter interpolation also. The net effect is that
"last_time_offset" can be removed completely from the kernel.
The basic idea behind the patch is simply: every time you advance xtime by
N nanoseconds, you call update_wall_time_hook(NSEC). Every time the time
gets set (i.e., discontinuity is OK), reset_wall_time_hook() is called.
|
|
|
|
uninline get_jiffies_64() for 32-bit architectures
|
|
Add "seqlock" infrastructure for doing low-overhead optimistic reader
locks (writer increments a sequence number, reader verifies that no
writers came in during the critical region, and lots of careful memory
barriers to take care of business).
Make xtime/get_jiffies_64() use this new locking.
|
|
stat64 has been changed to return jiffies granuality as nsec in previously
unused fields. This allows make to make better decisions on when
to recompile a file. Follows losely the Solaris API.
CURRENT_TIME has been redefined to return struct timespec. The users
who don't use it in a inode/attr context have been changed to use a new
get_seconds() function. CURRENT_TIME is implemented by an out-of-line
function.
There is a small performance penalty in this patch. The previous
filemap code had an optimization to flush atime only once a second.
This is currently gone, which will increase flushes a bit. I believe
the correct solution if it should be a problem is to have per super
block fields that give an arbitary atime flush granuality - so that you
can set it to be only flushed once a hour if you prefer that. I will
work on that later in separate patches if the need should arise.
struct inode and the attr struct has been changed to store struct
timespec instead of time_t for [cma]time. Not all file systems support
this granuality, but some like XFS,NFSv3,CIFS,JFS do. The others will
currently truncate the nsec part on flushing to disk. There was some
discussion on this rounding on l-k previously. I went for simple
truncation because there is not much evidence IMHO that the more
complicated roundings have any advantages. In practice application will
be rather unlikely to notice the rounding anyways - they can only see a
difference when an inode is flush from memory and reloaded in less than
a second, which is rather unlikely.
|
|
I've been playing with different HZ values in the 2.4 kernel for a while
now, and apparantly Linus also has decided to introduce a USER_HZ
constant (I used CLOCKS_PER_SEC) while raising the HZ value on x86 to
1000.
On x86 timekeeping has shown to be relative fragile when raising HZ (OK,
I tried HZ=2048 which is quite high) because of the way the interrupt
timer is configured to fire HZ times each second. This is done by
configuring a divisor in the timer chip (LATCH) which divides a certain
clock (1193180) and makes the chip fire interrupts at the resulting
frequency.
Now comes the catch: NTP requires a clock accuracy of 500 ppm. For some
HZ values the clock is not accurate enough to meet this requirement,
hence NTP won't work well.
An example HZ value is 1020 which exceeds the 500 ppm requirement. In
this case the best approximation is 1019.8 Hz. the xtime.tv_usec value
is raised with a value of 980 each tick which means that after one
second the tv_usec value has increased with 999404 (should be 1000000)
which is an accuracy of 596 ppm.
Some more examples:
HZ Accuracy (ppm)
---- --------------
100 17
1000 151
1024 632
2000 687
2008 343
2011 18
2048 1249
What I've been doing is replace tv_usec by tv_nsec, meaning xtime is now
a timespec instead of a timeval. This allows the accuracy to be
improved by a factor of 1000 for any (well ... any?) HZ value.
Of course all kinds of calculations had te be improved as well. The
ACTHZ constantant is introduced to approximate the actual HZ value, it's
used to do some approximations of other related values.
|
|
On ia64 MP machines, we use the cycle counter register of each CPU to
obtain fine-grained time-stamps. At boot-time, we synchronize the
counters as close as possible (similar to x86, though with a different
algorithm). But even with this synchronization, there is still a
small (really: tiny) chance that a process bouncing from one CPU to
another could observe time going backwards. To guard against this, I
maintain a global variable called "last_time_offset" which keeps track
of the largest time-interpolation value returned so far. Most of this
is in platform-specific code (arch/ia64/kernel/time.c), but there are
a handful of places in platform-independent code where this variable
needs to be cleared to zero. This is what the patch below does. I
didn't put it inside CONFIG_IA64 because I think this can be useful
for other platforms, too. I suppose I could put it inside CONFIG_SMP
though this would make the code uglier. If you think it's OK, please
apply, otherwise, I'd appreciate your feedback.
|
|
|
|
|
|
Big bits first, I'll redo the smaller bits tomorrow after some sleep.
Same as last time, rediffed against pre5
|
|
- Patrick Mochel: initcall levels
- Patrick Mochel: devicefs updates, add PCI devices into the hierarchy
- Denis Oliver Kropp: neomagic fb driver
- David Miller: sparc64 and network updates
- Kai Mäkisara: scsi tape update
- Al Viro: more inode trimming, VFS cleanup
- Greg KH: USB update - proper urb allocations
- Eric Raymond: kdev_t updates for fb devices
|
|
|