|
elv_register() always returns 0, and there isn't anything it does where
it should return an error (the only error condition is so grave that
it's handled with a BUG_ON).
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
Some of the code has been gradually transitioned to using the proper
struct request_queue, but there's lots left. So do a full sweep of
the kernel, get rid of the request_queue_t typedef, and replace its
uses with the proper type.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
Currently we allow any merge, even if the io originates from different
processes. This can cause really bad starvation and unfairness, if those
ios happen to be synchronous (reads or direct writes).
So add an allow_merge hook to the io scheduler ops, so an io scheduler can
help decide whether a bio/process combination may be merged with an
existing request.
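For illustration only - the type and field names below are stand-ins rather
than the real elevator_ops API - the idea is a predicate the elevator core
consults before attempting a merge:

/* Illustrative stand-in types; the real kernel structures differ. */
struct ex_ioc     { int pid; };
struct ex_request { struct ex_ioc *ioc; long end_sector; };
struct ex_bio     { struct ex_ioc *ioc; long start_sector; };

/* A hypothetical allow_merge-style hook: refuse to merge io that comes
 * from a different process than the one owning the request. */
static int ex_allow_merge(struct ex_request *rq, struct ex_bio *bio)
{
        return rq->ioc == bio->ioc;     /* 1 = merge allowed, 0 = refuse */
}

/* The elevator core asks the scheduler before committing to a merge: */
static int ex_may_back_merge(struct ex_request *rq, struct ex_bio *bio)
{
        if (rq->end_sector != bio->start_sector)
                return 0;               /* not contiguous, no back merge anyway */
        return ex_allow_merge(rq, bio);
}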
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
- ->init_queue() does not need the elevator passed in
- ->put_request() is a hot path and need not have the queue passed in
- cfq_update_io_seektime() does not need cfqd passed in
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
The elevator_type field in the elevator_type structure is useless:
it isn't used anywhere in the kernel sources.
Signed-off-by: Vasily Tarasov <vtaras@openvz.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
Make it possible to disable the block layer. Not all embedded devices
require it; some can make do with just JFFS2, NFS, ramfs, etc., none of
which require the block layer to be present.
This patch does the following:
(*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev
support.
(*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls
an item that uses the block layer. This includes:
(*) Block I/O tracing.
(*) Disk partition code.
(*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS.
(*) The SCSI layer. As far as I can tell, even SCSI chardevs use the
block layer to do scheduling. Some drivers that use SCSI facilities -
such as USB storage - end up disabled indirectly from this.
(*) Various block-based device drivers, such as IDE and the old CDROM
drivers.
(*) MTD blockdev handling and FTL.
(*) JFFS - which uses set_bdev_super(), something it could avoid doing by
taking a leaf out of JFFS2's book.
(*) Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and
linux/elevator.h contingent on CONFIG_BLOCK being set. sector_div() is,
however, still used in places, and so is still available.
(*) Also made contingent are the contents of linux/mpage.h, linux/genhd.h and
parts of linux/fs.h.
(*) Makes a number of files in fs/ contingent on CONFIG_BLOCK.
(*) Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK.
(*) set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK
is not enabled.
(*) fs/no-block.c is created to hold out-of-line stubs and things that are
required when CONFIG_BLOCK is not set:
(*) Default blockdev file operations (to give error ENODEV on opening).
(*) Makes some /proc changes:
(*) /proc/devices does not list any blockdevs.
(*) /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK.
(*) Makes some compat ioctl handling contingent on CONFIG_BLOCK.
(*) If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if
given a command other than Q_SYNC or if a special device is specified.
(*) In init/do_mounts.c, no reference is made to the blockdev routines if
CONFIG_BLOCK is not defined. This does not prohibit NFS roots or JFFS2.
(*) The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return
error ENOSYS by way of cond_syscall if so).
(*) The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if
CONFIG_BLOCK is not set, since they can't then happen.
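To make the header changes above concrete, this is roughly the kind of guard
they boil down to (a simplified sketch; the declarations shown are
placeholders, not the real header contents):

#ifdef CONFIG_BLOCK

/* Block-layer declarations are only visible when the block layer is in. */
struct request_queue;
extern void example_blk_only_helper(struct request_queue *q);

#else /* CONFIG_BLOCK */

/* Non-block builds get stubs or nothing at all for these interfaces. */

#endif /* CONFIG_BLOCK */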
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
None of the in-kernel primitives for handling "atomic" counting seem
to be a good fit. We need something that is essentially free for
incrementing/decrementing, while the read side may be more expensive
as we only ever need to do that when a device is removed from the
kernel.
Use a per-cpu variable for maintaining a per-cpu ioc count and define
a reading mechanism that just sums up the values.
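A minimal sketch of that pattern, with illustrative names (the real counter
lives in the io context code): each CPU increments its own copy without
touching shared state, and the rare reader sums over all CPUs:

#include <linux/percpu.h>
#include <linux/cpumask.h>

/* Illustrative per-cpu counter, following the pattern described above. */
static DEFINE_PER_CPU(unsigned long, example_ioc_count);

static inline void example_ioc_get(void)
{
        get_cpu_var(example_ioc_count)++;       /* cheap: no shared cacheline */
        put_cpu_var(example_ioc_count);
}

static inline void example_ioc_put(void)
{
        get_cpu_var(example_ioc_count)--;
        put_cpu_var(example_ioc_count);
}

/* Expensive but rare: walk every cpu and sum the per-cpu values. */
static unsigned long example_ioc_count_read(void)
{
        unsigned long total = 0;
        int cpu;

        for_each_possible_cpu(cpu)
                total += per_cpu(example_ioc_count, cpu);
        return total;
}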
Signed-off-by: Jens Axboe <axboe@suse.de>
|
|
It's not needed for anything, so kill the bio passing.
Signed-off-by: Jens Axboe <axboe@suse.de>
|
|
The io schedulers can use this instead of having to allocate space for
it themselves.
Signed-off-by: Jens Axboe <axboe@suse.de>
|
|
The rbtree sort/lookup/reposition logic is mostly duplicated in
cfq/deadline/as, so move it to the elevator core. The io schedulers
still provide the actual rb root, as we don't want to impose any sort
of specific handling on the schedulers.
Introduce the helpers and rb_node in struct request to help migrate the
IO schedulers.
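Roughly, the shared helpers amount to a sector-sorted rbtree insert and
lookup along these lines (simplified, with stand-in names; the kernel's own
helpers differ in detail):

#include <linux/rbtree.h>

/* Simplified request with just what the sketch needs. */
struct ex_request {
        struct rb_node  rb_node;
        unsigned long   sector;
};

/* Insert a request into a sector-sorted rbtree. */
static void ex_rb_add(struct rb_root *root, struct ex_request *rq)
{
        struct rb_node **p = &root->rb_node, *parent = NULL;

        while (*p) {
                struct ex_request *cur = rb_entry(*p, struct ex_request, rb_node);

                parent = *p;
                if (rq->sector < cur->sector)
                        p = &(*p)->rb_left;
                else
                        p = &(*p)->rb_right;
        }
        rb_link_node(&rq->rb_node, parent, p);
        rb_insert_color(&rq->rb_node, root);
}

/* Look up the request starting at a given sector, if any. */
static struct ex_request *ex_rb_find(struct rb_root *root, unsigned long sector)
{
        struct rb_node *n = root->rb_node;

        while (n) {
                struct ex_request *rq = rb_entry(n, struct ex_request, rb_node);

                if (sector < rq->sector)
                        n = n->rb_left;
                else if (sector > rq->sector)
                        n = n->rb_right;
                else
                        return rq;
        }
        return NULL;
}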
Signed-off-by: Jens Axboe <axboe@suse.de>
|
|
Right now, every IO scheduler implements its own backmerging (except for
noop, which does no merging). That results in duplicated code for
essentially the same operation, which is never a good thing. This patch
moves the backmerging out of the io schedulers and into the elevator
core. We save 1.6kb of text and as a bonus get backmerging for noop as
well. Win-win!
Signed-off-by: Jens Axboe <axboe@suse.de>
|
|
There's a race between shutting down one io scheduler and firing up the
next, in which a new io could enter and cause the io scheduler to be
invoked with bad or NULL data.
To fix this, we need to maintain the queue lock for a bit longer.
Unfortunately we cannot do that, since the elevator init has to run
without the lock held. This isn't easily fixable without also changing
the mempool API. So split the initialization into two parts: an
alloc-init operation and an attach operation. Then we can
preallocate the io scheduler and related structures, and run the attach
inside the lock after we detach the old one.
This patch has survived 30 minutes of 1 second io scheduler switching
with a very busy io load.
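A hedged sketch of the resulting switch sequence - all names below are
stand-ins for the real alloc/attach helpers, and the locking is simplified:

#include <linux/spinlock.h>
#include <linux/errno.h>

/* All names below are illustrative stand-ins, not the real kernel API. */
struct ex_elevator;
struct ex_elevator_type;
struct ex_queue {
        spinlock_t              lock;
        struct ex_elevator      *elevator;
};

extern struct ex_elevator *ex_elevator_alloc_init(struct ex_queue *q,
                                                  struct ex_elevator_type *t);
extern void ex_elevator_detach(struct ex_queue *q);
extern void ex_elevator_attach(struct ex_queue *q, struct ex_elevator *e);

static int ex_switch_elevator(struct ex_queue *q, struct ex_elevator_type *t)
{
        struct ex_elevator *new_e;

        /* Phase 1: allocate and set up outside the queue lock, where
         * sleeping allocations (mempools etc.) are fine. */
        new_e = ex_elevator_alloc_init(q, t);
        if (!new_e)
                return -ENOMEM;

        /* Phase 2: do the actual swap under the lock, so no new io can
         * reach a half-torn-down scheduler. */
        spin_lock_irq(&q->lock);
        ex_elevator_detach(q);
        ex_elevator_attach(q, new_e);
        spin_unlock_irq(&q->lock);

        return 0;
}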
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
|
|
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
q->ordcolor must only be flipped on initial queueing of a hardbarrier
request.
Constructing an ordered sequence and requeueing used to pass through
__elv_add_request(), which flips q->ordcolor when it sees a barrier
request.
This patch separates out elv_insert() from __elv_add_request() and uses
elv_insert() when constructing an ordered sequence and when requeueing.
elv_insert() inserts the given request at the specified position and
does nothing else.
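Schematically, the split looks like this (illustrative names and fields
only): the wrapper keeps the colour flip, the bare insert does placement
and nothing else:

struct ex_request { int is_barrier; /* ... */ };
struct ex_queue   { int ordcolor;   /* ... */ };

/* Bare insert: place rq at the requested position (front, back, sorted)
 * and do nothing else - safe for requeue and ordered-sequence setup. */
static void ex_elv_insert(struct ex_queue *q, struct ex_request *rq, int where)
{
        /* ... list manipulation only ... */
}

/* The add_request wrapper keeps the barrier colour flip, so it only
 * happens on initial queueing of a barrier request. */
static void ex_elv_add_request(struct ex_queue *q, struct ex_request *rq, int where)
{
        if (rq->is_barrier)
                q->ordcolor ^= 1;
        ex_elv_insert(q, rq, where);
}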
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
elv_try_last_merge().
Signed-off-by: Coywolf Qi Hunt <qiyong@fc-cn.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
|
|
Reimplement handling of barrier requests.
* Flexible handling to deal with various capabilities of
target devices.
* Retry support for falling back.
* Tagged queues which don't support ordered tags can still do ordered sequences.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
|
|
git://brick.kernel.dk/data/git/linux-2.6-block
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
- Split elv_dispatch_insert() into two functions
- Rename rq_last_sector() to rq_end_sector()
Signed-off-by: Jens Axboe <axboe@suse.de>
|
|
Implements a generic dispatch queue which can replace all
dispatch queues implemented by each iosched. This reduces
code duplication, eases enforcing semantics over the dispatch
queue, and simplifies the specific ioscheds.
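The core of such a shared dispatch queue is just sorted insertion into the
queue's request list; a simplified sketch (illustrative names, not the exact
elv_dispatch_* code):

#include <linux/list.h>

/* Illustrative types and helper; the real dispatch code differs. */
struct ex_request {
        struct list_head queuelist;
        unsigned long    sector;
};
struct ex_queue {
        struct list_head queue_head;    /* the shared dispatch list */
};

/* Insert rq into the dispatch list, keeping it roughly sector-sorted,
 * so each iosched no longer needs its own dispatch queue. */
static void ex_dispatch_sort(struct ex_queue *q, struct ex_request *rq)
{
        struct list_head *entry;

        list_for_each_prev(entry, &q->queue_head) {
                struct ex_request *pos =
                        list_entry(entry, struct ex_request, queuelist);

                if (pos->sector <= rq->sector)
                        break;          /* insert after the first smaller one */
        }
        list_add(&rq->queuelist, entry);
}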
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
|
|
This updates the CFQ io scheduler to the new time sliced design (cfq
v3). It provides full process fairness, while giving excellent
aggregate system throughput even for many competing processes. It
supports io priorities, either inherited from the cpu nice value or set
directly with the ioprio_get/set syscalls. The latter closely mimic
set/getpriority.
This import is based on my latest from -mm.
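From userspace, the syscalls are used much like set/getpriority; a small
example, assuming the syscall number is exposed through <sys/syscall.h> and
using the ioprio constants (double-check them against your kernel headers):

/* Set the calling process to best-effort io priority 4 via the raw
 * syscall (there is no glibc wrapper for ioprio_set). */
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

#define IOPRIO_CLASS_SHIFT      13
#define IOPRIO_CLASS_BE         2
#define IOPRIO_WHO_PROCESS      1
#define IOPRIO_PRIO_VALUE(cls, data)  (((cls) << IOPRIO_CLASS_SHIFT) | (data))

int main(void)
{
        int prio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, 4);

        if (syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0, prio) < 0) {
                perror("ioprio_set");
                return 1;
        }
        printf("io priority set to best-effort, level 4\n");
        return 0;
}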
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
As promised to Andrew, here are the latest bits that fixup the block io
barrier handling.
- Add io scheduler ->deactivate hook to tell the io scheduler when a
request is suspended from the block layer. cfq and as need this hook.
- Locking updates
- Make sure a driver doesn't reuse the flush rq before a previous one
has completed
- Typo in the scsi_io_completion() function, the bit shift was wrong
- sd needs proper timeout on the flush
- remove silly debug leftover in ide-disk wrt "hdc"
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Here is the next incarnation of the CFQ io scheduler, so far known as
CFQ v2 locally. It attempts to address some of the limitations of the
original CFQ io scheduler (hence forth known as CFQ v1). Some of the
problems with CFQ v1 are:
- It does accounting for the lifetime of the cfq_queue, which is set up
and torn down for the time when a process has io in flight. For a
fork-heavy workload (such as a kernel compile, for instance), new
processes can effectively starve io of running processes. This is in
part due to the fact that CFQ v1 gives preference to new processes
to get better latency numbers. Removing that heuristic is not an
option exactly because of that.
- It makes no attempt to address inter-cfq_queue fairness.
- It makes no attempt to limit the upper latency bound of a single request.
- It only provides per-tgid grouping. You need to change the source to
group on different criteria.
- It uses a mempool for the cfq_queues. Theoretically this could
deadlock if io bound processes never exit.
- The may_queue() logic can be unfair since it fluctuates quickly, thus
leaving processes sleeping while new processes are allowed to allocate
a request.
CFQ v2 attempts to fix these issues. It uses the process io_context
logic to tie a cfq_queue's lifetime to the duration of the process
(and its io). This means we can now be a lot more clever in deciding
which process is allowed to queue or dispatch io to the device. The
cfq_io_context is per-process per-queue; this is an extension to what AS
currently does in that we truly do have a unique per-process identifier
for io grouping. Busy queues are sorted by service time used, sub-sorted
by in_flight requests. Queues that have no io in flight are also
preferred at dispatch time.
Accounting is done on completion time of a request, or with a fixed cost
for tagged command queueing. Requests are fifo'ed like with deadline, to
make sure that a single request doesn't stay in the io scheduler for
ages.
Process grouping is selectable at runtime. I provide 4 grouping
criteria: process group, thread group id, user id, and group id.
As usual, settings are sysfs tweakable in /sys/block/<dev>/queue/iosched
axboe@apu:[.]s/block/hda/queue/iosched $ ls
back_seek_max fifo_batch_expire find_best_crq queued
back_seek_penalty fifo_expire_async key_type show_status
clear_elapsed fifo_expire_sync quantum tagged
In order, each of these settings controls:
back_seek_max
back_seek_penalty:
Useful logic stolen from AS that allows small backwards seeks in
the io stream if we deem them useful. CFQ uses a strict
ascending elevator otherwise. _max controls the maximum allowed
backwards seek, defaulting to 16MiB. _penalty denotes how
expensive we account a backwards seek compared to a forward
seek. Default is 2, meaning it's twice as expensive.
clear_elapsed:
Really a debug switch, will go away in the future. It clears the
maximum values for completion and dispatch time, shown in
show_status.
fifo_batch_expire
fifo_expire_async
fifo_expire_sync:
The settings for the expiry fifo. batch_expire is how often we
allow the fifo expire to control which request to select.
Default is 125ms. _async is the deadline for async requests
(typically writes), _sync is the deadline for sync requests
(reads and sync writes). Defaults are, respectively, 5 seconds
and 0.5 seconds.
key_type:
The grouping key. Can be set to pgid, tgid, uid, or gid. The
current value is shown bracketed:
axboe@apu:[.]s/block/hda/queue/iosched $ cat key_type
[pgid] tgid uid gid
Default is tgid. To set, simply echo any of the 4 words into the
file.
quantum:
The number of requests we select for dispatch when the driver
asks for work to do and the current pending list is empty.
Default is 4.
queued:
The minimum number of requests a group is allowed to queue.
Default is 8.
show_status:
Debug output showing the current state of the queues.
tagged:
Set this to 1 if the device is using tagged command queueing.
This cannot be reliably detected by CFQ yet, since most drivers
don't use the block layer tagging support (well, it could, by looking
at the number of requests between dispatch and completion, but not
completely reliably). Default is 0.
The patch is a little big, but works reliably here on my laptop. There
are a number of other changes and fixes in there (like converting to
hlist for hashes). The code is commented a lot better; CFQ v1 has
basically no comments (reflecting that it was written in one go, not
touched or tuned much since then). This is of course only done to
increase the AAF, the akpm acceptance factor. Since I'm on the road, I
cannot provide any really good numbers for CFQ v1 compared to v2; maybe
someone will help me out there.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch makes the io schedulers completely modular, allowing them to
be built and loaded as modules. Additionally it enables online switching
of io schedulers. See
also http://lwn.net/Articles/102593/ .
There's a scheduler file in the sysfs directory for the block device
queue:
axboe@router:/sys/block/hda/queue> ls
iosched max_sectors_kb read_ahead_kb
max_hw_sectors_kb nr_requests scheduler
If you list the contents of the file, it will show available schedulers
and the active one:
axboe@router:/sys/block/hda/queue> cat scheduler
[cfq]
Let's load a few more.
router:/sys/block/hda/queue # modprobe deadline-iosched
router:/sys/block/hda/queue # modprobe as-iosched
router:/sys/block/hda/queue # cat scheduler
[cfq] deadline anticipatory
Changing is done with
router:/sys/block/hda/queue # echo deadline > scheduler
router:/sys/block/hda/queue # cat scheduler
cfq [deadline] anticipatory
deadline is now the new active io scheduler for hda.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
From: Jens Axboe <axboe@suse.de>
CFQ I/O scheduler
|
|
include/linux/elevator.h:106: sorry, unimplemented: inlining failed in call to 'elv_try_last_merge': function body not available
|
|
- Remove dead declaration from elevator.h (Nick Piggin)
- Fix the scheduler selection boot-time message. "Using anticipatory
scheduling io scheduler" is not grammatical.
- Remove last use of __SMP__ (Randy Dunlap)
|
|
The "insert_here" list pointer logic was broken, and unnecessary.
Kill it and its associated logic off completely - just tell the IO
scheduler what kind of insert it is.
This also makes the *_insert_request strategies much easier to follow,
imo.
|
|
Add kconfig options to allow excluding either or both of the I/O
schedulers. This can be useful for embedded systems (saves roughly 13KB).
All schedulers are enabled by default for non-embedded configurations.
|
|
into jet.(none):/home1/jejb/BK/scsi-for-linus-2.5
|
|
This patch removes the scsi mid layer dependency on __elv_add_request
and introduces a new blk_requeue_request() function so the block
layer specifically knows a requeue is in progress.
It also adds an elevator hook for elevators like AS which need to
hook into the requeue for correct adjustment of internal counters.
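In a driver, the requeue then becomes an explicit call rather than poking
the elevator directly; a sketch against the later 2.6-style API (locking
simplified, the helper name here is made up):

#include <linux/blkdev.h>
#include <linux/spinlock.h>

/* Hand a request back to the block layer; blk_requeue_request() lets the
 * elevator's requeue hook adjust its internal counters. */
static void ex_requeue(struct request_queue *q, struct request *rq)
{
        unsigned long flags;

        spin_lock_irqsave(q->queue_lock, flags);
        blk_requeue_request(q, rq);
        spin_unlock_irqrestore(q->queue_lock, flags);
}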
|
|
From: Nick Piggin <piggin@cyberone.com.au>
This gets rid of the global queue_nr_requests and usage of BLKDEV_MAX_RQ
(the latter is now only used to set the queues' defaults).
The queue depth becomes per-queue, controlled by a sysfs entry.
|
|
From: Nick Piggin <piggin@cyberone.com.au>
This is the core anticipatory IO scheduler. There are nearly 100 changesets
in this and five months work. I really cannot describe it fully here.
Major points:
- It works by recognising that reads are dependent: we don't know where the
next read will occur, but it's probably close-by the previous one. So once
a read has completed we leave the disk idle, anticipating that a request
for a nearby read will come in.
- There is read batching and write batching logic.
- when we're servicing a batch of writes we will refuse to seek away
for a read for some tens of milliseconds. Then the write stream is
preempted.
- when we're servicing a batch of reads (via anticipation) we'll do
that for some tens of milliseconds, then preempt.
- There are request deadlines, for latency and fairness.
The oldest outstanding request is examined at regular intervals. If
this request is older than a specific deadline, it will be the next
one dispatched. This gives a good fairness heuristic while being simple
because processes tend to have localised IO.
Just about all of the rest of the complexity involves an array of fixups
which prevent most of the obvious failure modes with anticipation: trying
not to leave the disk head pointlessly idle. Some of these algorithms are:
- Process tracking. If the process whose read we are anticipating submits
a write, abandon anticipation.
- Process exit tracking. If the process whose read we are anticipating
exits, abandon anticipation.
- Process IO history. We accumulate statistical info on the process's
recent IO patterns to aid in making decisions about how long to anticipate
new reads (a rough sketch of this sort of tracking follows at the end of
this entry).
Currently thinktime and seek distance are tracked. Thinktime is the
time between when a process's last request has completed and when it
submits another one. Seek distance is simply the number of sectors
between each read request. If either statistic becomes too high, then
it isn't anticipated that the process will submit another read.
The above all means that we need a per-process "io context". This is a fully
refcounted structure. In this patch it is AS-only. Later we generalise it a
little so other IO schedulers can use the same framework.
- Requests are grouped as synchronous and asynchronous whereas deadline
scheduler groups requests as reads and writes. This can provide better
sync write performance, and may give better responsiveness with journalling
filesystems (although we haven't done that yet).
We currently detect synchronous writes by nastily setting PF_SYNCWRITE in
current->flags. The plan is to remove this later, and to propagate the
sync hint from writeback_control.sync_mode into bio->bi_flags and thence into
request->flags. Once that is done, direct-io needs to set the BIO sync
hint as well.
- There is also quite a bit of complexity gone into bashing TCQ into
submission. Timing for a read batch is not started until the first read
request actually completes. A read batch also does not start until all
outstanding writes have completed.
AS is the default IO scheduler. deadline may be chosen by booting with
"elevator=deadline".
There are a few reasons for retaining deadline:
- AS is often slower than deadline in random IO loads with large TCQ
windows. The usual real world task here is OLTP database loads.
- deadline is presumably more stable.
- deadline is much simpler.
The tunable per-queue entries under /sys/block/*/iosched/ are all in
milliseconds:
* read_expire
Controls how long until a request becomes "expired".
It also controls the interval at which expired requests are served,
so if set to 50, a request might take anywhere up to 100ms to be serviced
_if_ it is the next on the expired list.
Obviously it can't make the disk go faster. The result is basically the
timeslice a reader gets in the presence of other IO: a reader streams for
roughly read_expire and then loses a seek to the next reader, so
100 / ((seek time / read_expire) + 1) is very roughly the % streaming read
efficiency your disk should get in the presence of multiple readers.
* read_batch_expire
Controls how much time a batch of reads is given before pending writes
are served. Higher value is more efficient. Shouldn't really be below
read_expire.
* write_ versions of the above
* antic_expire
Controls the maximum amount of time we can anticipate a good read before
giving up. Many other factors may cause anticipation to be stopped early,
or some processes will not be "anticipated" at all. Should be a bit higher
for big seek time devices, though not a linear correspondence - most
processes have only a few ms thinktime.
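Going back to the process IO history point above, the tracking is
conceptually just a decayed running mean per statistic; a rough standalone
sketch (the thresholds and the 7/8-1/8 weighting are made up for
illustration, the real AS code uses its own fixed-point scheme):

/* Illustrative per-process io statistics, decayed so recent behaviour
 * dominates (a 7/8 old + 1/8 new running mean). */
struct ex_io_hist {
        unsigned long mean_thinktime_us;   /* completion -> next submit */
        unsigned long mean_seek_sectors;   /* distance between reads    */
};

static void ex_update_thinktime(struct ex_io_hist *h, unsigned long sample_us)
{
        h->mean_thinktime_us = (h->mean_thinktime_us * 7 + sample_us) / 8;
}

static void ex_update_seek(struct ex_io_hist *h, unsigned long distance)
{
        h->mean_seek_sectors = (h->mean_seek_sectors * 7 + distance) / 8;
}

/* Anticipation is only worth it for processes that come back quickly
 * and stay close by; the thresholds here are invented for illustration. */
static int ex_worth_anticipating(const struct ex_io_hist *h)
{
        return h->mean_thinktime_us < 6000 && h->mean_seek_sectors < 8192;
}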
|
|
From: Nick Piggin <piggin@cyberone.com.au>
Introduces an elevator_completed_req() callback with which the generic
queueing layer may tell an IO scheduler that a particular request has
finished.
|
|
Introduces the elv_may_queue() predicate with which the IO scheduler may tell
the generic request layer that we may add another request to this queue.
It is used by the CFQ elevator.
|
|
The noop io scheduler has a data corrupting bug, because q->last_merge
doesn't get cleared properly. So do that in the io scheduler core, and
remove the same code from deadline.
Also kill bio_rq_in_between(), it's not used by anyone anymore. rbtrees
are the hot thing these days.
And finally, remove a direct test for REQ_CMD in rq flags, use
blk_fs_request() instead.
|
|
This patch adds dynamic allocation of request structures. Right now we
are reserving 256 requests per initialized queue, which adds up to quite
a lot of memory for even a modest number of queues. For the quoted 4000
disk systems, it's a disaster.
Instead, we mempool 4 requests per queue and put an upper limit on the
number of requests that we will put in-flight as well. I've kept the 128
read/write max in-flight limit for now. It is trivial to experiment
with larger queue sizes now, but I want to change one thing at a time
(the truncate scenario doesn't look all that good with a huge number of
requests, for instance).
Patch has been in -mm for a while, and I'm running it here against stock
2.5 as well. Additionally, it actually kills quite a bit of code.
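In today's terms, the reserve would be built with a small slab-backed
mempool along these lines (illustrative sketch using the current mempool
helpers rather than the 2.5-era API; names and the 4-request reserve mirror
the description above):

#include <linux/init.h>
#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/mempool.h>

/* Illustrative request struct and pool sizing; the real block layer uses
 * its own struct request cache and limits. */
struct ex_request { int dummy; };

#define EX_MIN_RESERVED_REQUESTS 4      /* per-queue reserve instead of 256 */

static struct kmem_cache *ex_request_cachep;
static mempool_t *ex_request_pool;

static int __init ex_request_pool_init(void)
{
        ex_request_cachep = kmem_cache_create("ex_request",
                                              sizeof(struct ex_request),
                                              0, 0, NULL);
        if (!ex_request_cachep)
                return -ENOMEM;

        /* Guarantee only a small reserve; everything above that is
         * allocated dynamically and bounded by a per-queue limit. */
        ex_request_pool = mempool_create_slab_pool(EX_MIN_RESERVED_REQUESTS,
                                                   ex_request_cachep);
        if (!ex_request_pool) {
                kmem_cache_destroy(ex_request_cachep);
                return -ENOMEM;
        }
        return 0;
}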
|
|
This file was _the_ header for block-device related stuff in earlier
Linux versions, but nowadays there are just a few prototypes left that
really belong in blkdev.h or genhd.h (and in one case elevator.h).
This patch moves them over and removes everything from blk.h except the
include of blkdev.h. Note that blkdev.h gets all the headers that
were included in blk.h implicitly too. Now we can start removing
all references to it and maybe kill it off before 2.6. *sniff*
|
|
This patch has a bunch of io scheduler goodies that are, by now, well
tested in -mm and by myself and Nick Piggin. In order of interest:
- Use rbtree data structure for sorting of requests. Even with the
default queue lengths that are fairly short, this cuts a lot of run
time for io scheduler intensive work loads. If we go to longer queue
lengths, it very quickly becomes a necessity.
- Add a sysfs interface for the tunables. At the same time, finally kill
the BLKELVGET/BLKELVSET ioctls completely. I made these return -ENOTTY in
2.5.1, but there are left-overs around the kernel. This old interface
was never any good; it was centered around just one io scheduler.
The io scheduler core itself has received countless hours of tuning by
myself and Nick, and should be in pretty good shape. Please apply.
Andrew, I made some sysfs changes to the version from 2.5.56-mm1. It
didn't even compile without warnings (or work, for that matter), as the
sysfs store/show procedures needed updating. Hmm?
|
|
Request insertion in the current tree is a mess. We have all sorts of
variants of *elv_add_request*, and it's not at all clear who does what
and with what locks (or not). This patch cleans it up to be:
o __elv_add_request(queue, request, at_end, plug)
Core function, requires queue lock to be held
o elv_add_request(queue, request, at_end, plug)
Like __elv_add_request(), but grabs queue lock
o __elv_add_request_pos(queue, request, position)
Insert request at a given location, lock must be held
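A hedged sketch of how callers pick between the variants listed above
(ex_-prefixed stand-ins for the real functions; flag values assume at_end=1
means "append" and plug=1 means "plug the queue"):

struct ex_queue;
struct ex_request;

/* Stand-ins for the variants listed above. */
extern void ex_elv_add_request(struct ex_queue *q, struct ex_request *rq,
                               int at_end, int plug);
extern void __ex_elv_add_request(struct ex_queue *q, struct ex_request *rq,
                                 int at_end, int plug);

/* No queue lock held: use the wrapper, which grabs it for us. */
static void ex_submit_unlocked(struct ex_queue *q, struct ex_request *rq)
{
        ex_elv_add_request(q, rq, 1, 1);
}

/* Already under the queue lock (e.g. a requeue path): use the __ variant. */
static void ex_submit_locked(struct ex_queue *q, struct ex_request *rq)
{
        __ex_elv_add_request(q, rq, 1, 0);
}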
|
|
Ingo spotted this one too; it's a leftover from when the elevator type
wasn't a variable. Also don't pass in &q->elevator, since it can always be
deduced from the queue itself, of course.
|
|
This fixes a problem with the deadline io scheduler, if the correct
insertion point is at the front of the list. This is something that we
never have gotten right in 2.4 either.
The problem is that the elevator merge function has to return a pointer
to a struct request, and for front insert we really have to return the
head of the list which cannot be expressed as a request of course.
The real issue is that the elevator_merge function actually performs two
functions - it scans for a merge, and if it can't find any, it selects
and insertion point. It's done this way for efficiency reasons, even if
the design isn't all that clean.
So we change the io scheduler merge functions to get passed a pointer to
a list_head pointer instead. This works for both inserts and merges.
In addition, deadline checks if it really should insert at the very
front.
Also don't pass in request to elv_try_last_merge(), the very name of the
function suggests that it's q->last_merge that we are interested in.
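Schematically, the reworked hook described above has this shape (names and
return values are illustrative, not the exact 2.5 signatures):

#include <linux/list.h>

enum { EX_NO_MERGE = 0, EX_FRONT_MERGE, EX_BACK_MERGE };

struct ex_queue { struct list_head queue_head; };
struct ex_bio;

/* Scan for a merge; if none is found, fill in where the new request
 * should be inserted. The head of the list is now a valid answer. */
static int ex_elevator_merge(struct ex_queue *q, struct list_head **insert,
                             struct ex_bio *bio)
{
        /* ... look for a back or front merge candidate for bio ... */

        /* No merge found: point *insert at the chosen position. */
        *insert = &q->queue_head;
        return EX_NO_MERGE;
}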
|
|
Some various small cleanups, optimizations, and fixes.
o Make fifo_batch=32 the default; from testing this appears to be a good
default value. We still get good throughput, and latency is good.
o Reintroduce the merge_cleanup logic. We need it for deadline for
rehashing requests when they have been merged.
o Clean up the last_merge logic. Move it to the new elv_merged_request(),
which is where it really belongs. Doing it inside the io scheduler core
can cause false positives, when the queue merge functions reject an
otherwise good merge.
o Have deadline_move_requests() account from last entry on the dispatch
queue, if it is non-empty. It doesn't really matter what the last
extracted sector was, if we are not right behind it.
o Clean/optimize deadline_move_requests()
o Account the size of a request just a little bit. Streaming transfer
isn't free; it's just a lot cheaper than a seek.
o Make deadline_check_fifo() more readable.
|
|
Patch killing off elevator_linus for good. Sniffle.
|
|
This introduces the deadline-ioscheduler, making it the default. 2nd
patch coming that deletes elevator_linus in a minute.
This one has read_expire at 500ms, and writes_starved at 2.
|
|
elevator_linus is seriously broken wrt accounting. Marcelo recently took
the patch to fix it in 2.4.20-pre, here's the 2.5 equiv.
Right now, we account merges as costly and seeks as not. The only thing that
prevents seek starvation is the aging scan. That is broken, very much
so. This patch fixes that to account merges and inserts differently. A
seek is ELV_LINUS_SEEK_COST more costly than a merge, currently that
define is at '16'. Doing the math on a disk, this sort of makes sense.
Defaults are read latency of 1024, which means 1024 merges or 64 seeks.
Writes are double that.
|
|
I've got a new i/o scheduler in testing, and some changes were needed in
the block layer to accommodate it, basically because right now
assumptions are made about q->queue_head being the sort list. The
changes in detail:
o elevator_merge_requests_fn takes queue argument as well
o __make_request() inits insert_here to NULL instead of
q->queue_head.prev, which means that the i/o schedulers must
explicitly check for this condition now.
o incorporate elv_queue_empty(), it was just a placeholder before
o add elv_get_sort_head(). it returns the sort head of the elevator for
a given request. attempt_{back,front}_merge uses it to determine
whether a request is valid or not. Maybe attempt_{back,front}_merge
should just be killed, I doubt they have much relevance with the wake
up batching.
o call the merge_cleanup functions of the elevator _after_ the merge has
been done, not before. This way the elevator functions get the new
state of the request, which is the most interesting.
o Kill extra nr_sectors check in ll_merge_requests_fn()
o bi->bi_bdev is always set in __make_request(), so kill check.
|