This is the fs/ part of the big kfree cleanup patch.
Remove pointless checks for NULL prior to calling kfree() in fs/.
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
We must be sure that the current data in the buffer are sent to disk. Hence we
have to call ll_rw_block() with SWRITE.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Fix a race between journal_commit_transaction() and other places, such as
journal_unmap_buffer(), that add buffers to the transaction's t_forget list.
We have to protect against such places by holding j_list_lock even when
traversing the t_forget list. The fact that other places can only add buffers
to the list makes the locking easier. OTOH the lock ranking complicates the
stuff...
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Fix destruction of in-use journal_head
journal_put_journal_head() can destroy a journal_head at any time as
long as the jh's b_jcount is zero and b_transaction is NULL. It has no
locking protection against the rest of the journaling code, as the lock
it uses to protect b_jcount and bh->b_private is not used elsewhere in
jbd.
However, there are small windows where b_transaction is getting set
temporarily to NULL during normal operations; typically this is
happening in
	__journal_unfile_buffer(jh);
	__journal_file_buffer(jh, ...);
call pairs, as __journal_unfile_buffer() will set b_transaction to NULL
and __journal_file_buffer() re-sets it afterwards. A truncate running
in parallel can lead to journal_unmap_buffer() destroying the jh if it
occurs between these two calls.
Fix this by adding a variant of __journal_unfile_buffer() which is only
used for these temporary jh unlinks, and which leaves the b_transaction
field intact so that we never leave a window open where b_transaction is
NULL.
Additionally, trap this error if it does occur, by checking against
jh->b_jlist being non-null when we destroy a jh.
Signed-off-by: Stephen Tweedie <sct@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
journal_commit_transaction() is 720 lines long. This patch pulls about 55
of them out into their own function, removes a goto and cleans up the
control flow a little.
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Dynamically allocate the holding array for kjournald write batching rather
than allocating it on the stack.
Signed-off-by: Alex Tomas <alex@clusterfs.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Fix a credits leak in journal_release_buffer().
The idea is to charge a buffer in journal_dirty_metadata(), not in
journal_get_*_access(). Each buffer has a flag that
journal_dirty_metadata() sets on the buffer.
Signed-off-by: Alex Tomas <alex@clusterfs.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
I found bugs in the error handling in the functions around the ext3 file
system which cause inadequate completions of synchronous write I/O
operations when disk I/O failures occur. Both 2.4 and 2.6 have this
problem.
I carried out the following experiment:
1. Mount an ext3 file system on a SCSI disk in ordered mode.
2. Open a file on the file system with the O_SYNC|O_RDWR|O_TRUNC|O_CREAT flags.
3. Write 512 bytes of data to the file by calling write() every 5 seconds,
and examine the return values from write().
4. Disconnect the SCSI cable, and examine messages from the kernel.
After the SCSI cable is disconnected, write() must fail. But the result
was different: write() succeeded for a while even though kernel messages
reported SCSI I/O errors.
By applying the following modifications, the above problem was solved.
Signed-off-by: Hisashi Hifumi <hifumi.hisashi@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The attached patch fixes long scheduling latencies in the ext3 code, and it
also cleans up the existing lock-break functionality to use the new
primitives.
This patch has been in the -VP patchset for quite some time.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
With the demise of intermezzo, the journal callback stuff in jbd is
entirely unused (neither ext3 nor ocfs2 use it), and thus will only bitrot
and bloat the kernel with code and data structure growth. If intermezzo
ever gets resurrected, this will be the least of the problems its
developers will have to face (with both the generic kernel and jbd).
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Processes can sleep in do_get_write_access(), waiting for buffers to be
removed from the BJ_Shadow state. We did this by doing a wake_up_buffer() in
the commit path and sleeping on the buffer in do_get_write_access().
With the filtered bit-level wakeup code this doesn't work properly any more -
the wake_up_buffer() accidentally wakes up tasks which are sleeping in
lock_buffer() as well. Those tasks now implicitly assume that the buffer came
unlocked. Net effect: Bogus I/O errors when reading journal blocks, because
the buffer isn't up to date yet. Hence the recent spate of journal_bmap()
failure reports.
The patch creates a new jbd-private BH flag purely for this wakeup function.
So a wake_up_bit(..., BH_Unshadow) doesn't wake up someone who is waiting for
a wake_up_bit(BH_Lock).
JBD was the only user of wake_up_buffer(), so remove it altogether.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Signed-off-by: Al Viro <viro@parcelfarce.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Mount with "mount -o barrier=1" to enable barriers.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
|
|
From: Chris Mason <mason@suse.com>
jbd needs to wait for any io to complete on the buffer before changing the
end_io function. Using set_buffer_locked means that it can change the
end_io function while the page is in the middle of writeback, and the
writeback bit on the page will never get cleared.
Since we set the buffer dirty earlier on, if the page was previously dirty,
pdflush or memory pressure might trigger a writepage call, which will race
with jbd's set_buffer_locked.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Fix a problem discovered by Jeff Mahoney <jeffm@suse.com>, based on an initial
patch from Chris Mason <mason@suse.com>.
journal_get_descriptor_buffer() is used to obtain a regular old buffer_head
against the blockdev mapping. The caller will populate that bh by hand and
will then submit it for writing.
But there are problems:
a) The function sets bh->b_state nonatomically. But this buffer is
accessible to other CPUs via pagecache lookup.
b) The function sets the buffer dirty and then the caller populates it and
then it is submitted for I/O. Wrong order: there's a window in which the
VM could write the buffer before it is fully populated.
c) The function fails to set the buffer uptodate after zeroing it. And one
caller forgot to mark it uptodate as well. So if the VM happens to decide
to write the containing page back __block_write_full_page() encounters a
dirty, not uptodate buffer, which is an illegal state. This was generating
buffer_error() warnings before we removed buffer_error().
Leaving the buffer not uptodate also means that a concurrent reader of
/dev/hda1 could cause physical I/O against the buffer, scribbling on what
we just put in it.
So journal_get_descriptor_buffer() is changed to mark the buffer
uptodate, under the buffer lock.
I considered changing journal_get_descriptor_buffer() to return a locked
buffer but there doesn't seem to be a need for this, and both callers end up
using ll_rw_block() anyway, which requires that the buffer be unlocked again.
Note that the journal_get_descriptor_buffer() callers dirty these buffers with
set_buffer_dirty(). That's a bit naughty, because it could create dirty
buffers against a clean page - an illegal state. They really should use
mark_buffer_dirty() to dirty the page and inode as well. But all callers will
immediately write and clean the buffer anyway, so we can safely leave this
optimising cheat in place.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Almost everywhere where JBD removes a buffer from the transaction lists the
caller then nulls out jh->b_transaction. Sometimes, the caller does that
without holding the locks which are defined to protect b_transaction. This
makes me queasy.
So change things so that __journal_unfile_buffer() nulls out b_transaction
inside both j_list_lock and jbd_lock_bh_state().
It cleans things up a bit, too.
|
|
Fix a few buglets spotted by Jeff Mahoney <jeffm@suse.com>. We're currently
only checking for I/O errors against journal buffers if they were locked when
they were first inspected.
We need to check buffer_uptodate() even if the buffers were already unlocked.
|
|
For data=ordered, kjournald at commit time has to write out and wait upon a
long list of buffers. It does this in a rather awkward way with a single
list. It causes complexity and long lock hold times, and makes the addition
of rescheduling points quite hard.
So what we do instead (based on Chris Mason's suggestion) is to add a new
buffer list (t_locked_list) to the journal. It contains buffers which have
been placed under I/O.
So as we walk the t_sync_datalist list we move buffers over to t_locked_list
as they are written out.
When t_sync_datalist is empty we may then walk t_locked_list waiting for the
I/O to complete.
As a side-effect this means that we can remove the nasty synchronous wait in
journal_dirty_data which is there to avoid the kjournald livelock which would
otherwise occur when someone is continuously dirtying a buffer.
|
|
There's some nasty code in commit which deals with a lock ranking problem.
Currently, if it fails to get the lock while the local variable `bufs' is
zero, we forget to write out some ordered-data buffers. So a subsequent
crash+recovery could yield stale data in existing files.
Fix it by correctly restarting the t_sync_datalist search.
|
|
The locking rules say that b_committed_data is covered by
jbd_lock_bh_state(), so implement that during the start of commit, while
throwing away unused shadow buffers.
I don't expect that there is really a race here, but them's the rules.
|
|
Sometimes kjournald has to refile a huge number of buffers, because someone
else wrote them out beforehand - they are all clean.
This happens under a lock and scheduling latencies of 88 milliseconds on a
2.7GHz CPU were observed.
The patch forward-ports a little bit of the 2.4 low-latency patch to fix this
problem.
Worst-case on ext3 is now sub-half-millisecond, except for when the RCU
dentry reaping softirq cuts in :(
|
|
Plug the two-megabyte-per-day memory leak.
|
|
We're getting assertion failures in commit in data=journal mode.
journal_unmap_buffer() has unexpectedly donated this buffer to the committing
transaction, and the commit-time assertion doesn't expect that to happen. It
doesn't happen in 2.4 because both paths are under lock_journal().
Simply remove the assertion: the commit code will uncheckpoint the buffer and
then recheckpoint it if needed.
|
|
From: Alex Tomas <bzzz@tmi.comex.ru>
start_this_handle() takes into account t_outstanding_credits when calculating
log free space, but journal_next_log_block() accounts for blocks being logged
also. Hence, blocks are accounted twice. This effectively reduces the
amount of log space available to transactions and forces more commits.
Fix it by decrementing t_outstanding_credits each time we allocate a new
journal block.
|
|
- remove accidental debug code from ext3 commit.
- /proc/profile documentation fix (Randy Dunlap)
- use sb_breadahead() in ext2_preread_inode()
- unused var in mpage_writepages()
|
|
CPU0                                    CPU1

journal_get_write_access(bh)
  (Add buffer to t_reserved_list)

                                        journal_get_write_access(bh)
                                          (It's already on t_reserved_list:
                                           nothing to do)

(We decide we don't want to
 journal the buffer after all)

journal_release_buffer()
  (It gets pulled off the transaction)

                                        journal_dirty_metadata()
                                          (The buffer isn't on the reserved
                                           list!  The kernel explodes)
Simple fix: just leave the buffer on t_reserved_list in
journal_release_buffer(). If nobody ends up claiming the buffer then it will
get thrown away at start of transaction commit.
|
|
We need to unconditionally brelse() the buffer in there, because
journal_remove_journal_head() leaves a ref behind.
release_buffer_page() does that. Call it all the time because we can usually
strip the buffers and free the page even if it was not marked buffer_freed().
Mainly affects data=journal mode
|
|
ext3 and JBD still have enormous numbers of lines which end in tabs. Fix
them all up.
|
|
With data=ordered it is often the case that a quick write-and-truncate will
leave large numbers of pages on the page LRU with no ->mapping, and attached
buffers, because ext3 was not ready to let the pages go at the time of
truncation.
These pages are trivially reclaimable, but their seeming absence makes the VM
overcommit accounting confused (they don't count as "free", nor as
pagecache). And they make the /proc/meminfo stats look odd.
So what we do here is to try to strip the buffers from these pages as the
buffers exit the journal commit.
|
|
start_this_handle() can decide to add this handle to a transaction, but
kjournald then moves the handle into commit phase.
Extend the coverage of j_state_lock so that start_this_handle()'s
examination of journal->j_state is atomic wrt journal_commit_transaction().
|
|
Plug a conceivable race with the freeing up of transactions, and add some
more debug checks.
|
|
This filesystem-wide sleeping lock is no longer needed. Remove it.
|
|
lock_kernel() is no longer needed in JBD. Remove all the lock_kernel() calls
from fs/jbd/.
Here is where I get to say "ex-parrot".
|
|
Remove the remaining sleep_on() calls from JBD.
|
|
From: Alex Tomas <bzzz@tmi.comex.ru>
We're about to remove lock_journal(), and it is lock_journal which separates
the running and committing transaction's revokes on the single revoke table.
So implement two revoke tables and rotate them at commit time.
|
|
Go through all sites which use j_committing_transaction and ensure that the
designed locking is correctly implemented there.
|
|
Implement the designed locking around journal->j_running_transaction.
A lot more of the new locking scheme falls into place.
|
|
Provide the designed locking around the transaction's t_jcb callback list.
It turns out that this is wholly redundant at present.
|
|
Provide the designed locking for transaction_t.t_updates.
|
|
This was a system-wide spinlock.
Simple transformation: make it a filesystem-wide spinlock, in the JBD
journal.
That's a bit lame, and later it might be nice to make it per-transaction_t.
But there are interesting ranking and ordering problems with that, especially
around __journal_refile_buffer().
|
|
Go through all use of b_transaction and implement the rules.
Fairly straightforward.
|
|
We now start to move across the JBD data structures' fields, from the
"innermost" fields outwards.
Start with journal_head.b_frozen_data, because the locking for this field was
partially implemented in jbd-010-b_committed_data-race-fix.patch.
It is protected by jbd_lock_bh_state(). We keep the lock_journal() and
spin_lock(&journal_datalist_lock) calls in place. Later,
spin_lock(&journal_datalist_lock) is replaced by
spin_lock(&journal->j_list_lock).
Of course, this completion of the locking around b_frozen_data also puts a
lot of the locking for other fields in place.
|
|
journal_unlock_journal_head() is misnamed: what it does is to drop a ref on
the journal_head and free it if that ref fell to zero. It doesn't actually
unlock anything.
Rename it to journal_put_journal_head().
|
|
buffer_heads and journal_heads are joined at the hip. We need a lock to
protect the joint and its refcounts.
JBD is currently using a global spinlock for that. Change it to use one bit
in bh->b_state.
|
|
This was a strange spinlock which was designed to prevent another CPU from
ripping a buffer's journal_head away while this CPU was inspecting its state.
Really, we don't need it - we can inspect that state directly from bh->b_state.
So kill it off, along with a few things which used it which are themselves
not actually used any more.
|
|
From: Alex Tomas <bzzz@tmi.comex.ru>
We have a race wherein the block allocator can decide that
journal_head.b_committed_data is present and then will use it. But kjournald
can concurrently free it and set the pointer to NULL. It goes oops.
We introduce per-buffer_head "spinlocking" based on a bit in b_state. To do
this we abstract out pte_chain_lock() and reuse the implementation.
The bit-based spinlocking is pretty inefficient CPU-wise (hence the warning
in there) and we may move this to a hashed spinlock later.
|
|
From: Hua Zhong <hzhong@cisco.com>
The current ext3 totally ignores I/O errors that happen at
journal_force_commit time, causing user space to falsely believe the
operation succeeded when it actually did not.
This patch checks for I/O errors during journal_commit_transaction() and
aborts the journal when there is an I/O error.
Originally I thought about reporting the error without aborting the
journal, but it probably needs a new flag. Aborting the journal seems to be
the easy way to signal "hey, something is wrong..".
|
|
Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> points out a bug in
ll_rw_block() usage.
Typical usage is:
	mark_buffer_dirty(bh);
	ll_rw_block(WRITE, 1, &bh);
	wait_on_buffer(bh);
The problem is that if the buffer was locked on entry to this code sequence
(due to in-progress I/O), ll_rw_block() will not wait, and start new I/O. So
this code will wait on the _old_ I/O, and will then continue execution,
leaving the buffer dirty.
It turns out that all callers were only writing one buffer, and they were all
waiting on that writeout. So I added a new sync_dirty_buffer() function:
	void sync_dirty_buffer(struct buffer_head *bh)
	{
		lock_buffer(bh);
		if (test_clear_buffer_dirty(bh)) {
			get_bh(bh);
			bh->b_end_io = end_buffer_io_sync;
			submit_bh(WRITE, bh);
		} else {
			unlock_buffer(bh);
		}
	}
which allowed a fair amount of code to be removed, while adding the desired
data-integrity guarantees.
UFS has its own wrappers around ll_rw_block() which got in the way, so this
operation was open-coded in that case.
|
|
This is the leak which Con found. Long story...
- If a dirty page is fed into ext3_writepage() during truncate,
block_write_full_page() will return -EIO (it's outside i_size) and will
leave the buffers dirty. In the expectation that discard_buffer() will
clean them.
- ext3_writepage() then adds the still-dirty buffers to the journal's
"async data list". These are buffers which are known to have had IO
started. All we need to do is to wait on them in commit.
- meanwhile, truncate will chop the pages off the address_space. But
truncate cannot invalidate the buffers (in journal_unmap_buffer()) because
the buffers are attached to the committing transaction. (hm. This
behaviour in journal_unmap_buffer() is bogus. We just never need to write
these buffers.)
- ext3 commit will "wait on writeout" of these writepage buffers (even
though it was never started) and will then release them from the
journalling system.
So we end up with pages which are attached to no mapping, which are clean and
which have dirty buffers. These are unreclaimable.
Aside:
ext3-ordered has two buffer lists: the "sync data list" and the "async
data list".
The sync list consists of probably-dirty buffers which were dirtied in
commit_write(). Transaction commit must write all these out and wait on
them.
The async list supposedly consists of clean buffers which were attached
to the journal in ->writepage. These have had IO started (by writepage) so
commit merely needs to wait on them.
This is all designed for the 2.4 VM really. In 2.5, tons of writeback
goes via writepage (instead of the buffer lru) and these buffers end up
madly hopping between the async and sync lists.
Plus it's arguably incorrect to just wait on the writes in commit - if
the buffers were set dirty again (say, by zap_pte_range()) then perhaps we
should write them again before committing.
So what the patch does is to remove the async list. All ordered-data buffers
are now attached to the single "sync data list". So when we come to commit,
those buffers which are dirty will have IO started and all buffers are waited
upon.
This means that the dirty buffers against a clean page which came about from
block_write_full_page()'s -EIO will be written to disk in commit - this
cleans them, and the page is now reclaimable. No leak.
It seems bogus to write these buffers in commit, and indeed it is. But ext3
will not allow those blocks to be reused until the commit has ended so there
is no corruption risk. And the amount of data involved is low - it only
comes about as a race between truncate and writepage().
|