Linux: Why Reiser4 Is Not in the Kernel

Submitted by Jeremy on July 17, 2006 - 7:10am.
Linux news

Whether and when Reiser4 will be merged into the mainline Linux kernel has been debated for a couple of years [story]. Hans Reiser [interview] described the filesystem as "fairly stable for average users" over two years ago, in March of 2004 [story]. It has been merged into Andrew Morton's [interview] -mm kernel [story], though issues such as Reiser4 plugins [story] and coding style [story] caused lengthy discussions last year. Two recent threads on the lkml raised the question again, asking at a non-technical level why Reiser4 has not been included in the Linux kernel. Some have offered theories that Reiser4 is being blocked for political reasons; others point to concerns that once Reiser4 is included, Namesys might abandon it and move on to another filesystem. Responses to these theories point out that there are technical issues that must be resolved before the filesystem will be merged, and that much progress has been made toward this end. Additional discussion can be found on a recently created Kernel Newbies wiki page.

Hans Reiser posted a "short term task list for Reiser4" to address the remaining technical issues. The todo list included getting batch_write merged into the -mm kernel [story], getting read optimization code merged into the -mm kernel, documenting everything in the Namesys wiki, exploring and addressing reports of system pauses when using Reiser4, a complete review of the crypt-compress code, a large effort in optimizing fsync, a review of installation instructions, and a review of the kernel documentation. Hans explains, "unfortunately, our code stability is going to decrease for a bit due to all these changes to the read and write code --- no way to cure that but passage of time. On the other hand, our CPU usage went way down. Reiser4's only performance weakness now is fsync. Once the crypt-compress code is ready, we will release Reiser4.1-beta (with plugins, releasing a beta means telling users that if they mount -o reiser4.1-beta then cryptcompress will be their default plugin, and if they don't, then they are using Reiser4.0 still). Doubling our performance and halving our disk usage is going to be fun."


From: "ivo welch" [email blocked]
To:  linux-kernel
Subject: reiserFS?
Date:	Sun, 16 Jul 2006 08:45:58 -0400

dear linux geeks:  may I ask why the "new" (many years old) reiser
filesystem is not making it into the kernel?  it does seem to have
some advantages, if nothing else at least over the old reiser v3.

sincerely,  /ivo welch

(please cc me personally. I am not a linux geek accdg to the lkml faq,
but just an end user.)


From: Matthias Andree <matthias.andree@gmx.de>
Subject: Re: reiserFS?
Date: Sun, 16 Jul 2006 15:50:38 +0200

On Sun, 16 Jul 2006, ivo welch wrote:

> dear linux geeks: may I ask why the "new" (many years old) reiser
> filesystem is not making it into the kernel? it does seem to have
> some advantages, if nothing else at least over the old reiser v3.

Why would anyone want ReiserFS in the kernel that is discontinued by its developers when it's just started to become stabile and useful, with bugs (hashing) remaining, as happened with 3.6? Who is going to make guarantees this won't happen again with reiser4?

Besides that, there had been technical discrepancies that prevented reiser4 inclusion into the baseline kernel when it was suggested, search the archives for details.

There's ext3, you can set the dir_index option (either for mke2fs, or afterwards with tune2fs, then unmount and run e2fsck -fD) and you're set.

--
Matthias Andree

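The dir_index procedure mentioned in the message above can be sketched as a small Python helper that only builds the command lines and never runs them. The function name and the device path /dev/sdX1 are hypothetical; the use of -O dir_index with mke2fs and tune2fs, and -j to request an ext3 journal, is an assumption about how the option would typically be passed, so check the e2fsprogs man pages before running anything.

```python
def dir_index_commands(device, new_filesystem=False):
    """Return argv lists that would enable ext3's dir_index feature.

    Nothing is executed here; the caller decides whether to run the commands.
    `device` is a placeholder such as /dev/sdX1 (hypothetical).
    """
    if new_filesystem:
        # Fresh filesystem: -j asks for a journal (ext3), -O turns the feature on.
        return [["mke2fs", "-j", "-O", "dir_index", device]]
    return [
        # Existing filesystem: set the feature flag first ...
        ["tune2fs", "-O", "dir_index", device],
        # ... then unmount, as the message says ...
        ["umount", device],
        # ... and force a check (-f) that rebuilds the directory indexes (-D).
        ["e2fsck", "-fD", device],
    ]


if __name__ == "__main__":
    for argv in dir_index_commands("/dev/sdX1"):
        print(" ".join(argv))
```

Printing the commands instead of executing them keeps the sketch safe to try; the e2fsck step rewrites directories and should only be run on an unmounted filesystem.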
From: Lexington Luthor <Lexington.Luthor@gmail.com>
Subject: Re: reiserFS?
Date: Sun, 16 Jul 2006 16:29:24 +0100

Matthias Andree wrote:

> Why would anyone want ReiserFS in the kernel that is discontinued by its
> developers when it's just started to become stabile and useful, with
> bugs (hashing) remaining, as happened with 3.6? Who is going to make
> guarantees this won't happen again with reiser4?

I looked at the reiser4 patch, and it does very little outside of the fs/reiser4 directory. If it is no longer supported by namesys, why can't it just be removed from the kernel like all the other bits that are obsoleted?

I am just saddened that kernel decisions are motivated by politics and a personal dislike of Hans Reiser rather than technical merit. :(

> There's ext3, you can set the dir_index option (either for mke2fs, or
> afterwards with tune2fs, then unmount and run e2fsck -fD) and you're set.

I am not arguing for the inclusion of reiser4 in the kernel, but you should know it has its uses. There are very many things that reiser4 can do that will make ext3 blow up. It simply the best filesystem for many kinds of usage patterns.

Regards,
LL

From: grundig [email blocked]
Subject: Re: reiserFS?
Date: Sun, 16 Jul 2006 18:55:30 +0200 (added by [email blocked])

El Sun, 16 Jul 2006 16:29:24 +0100, Lexington Luthor <Lexington.Luthor@gmail.com> escribió:

> I am just saddened that kernel decisions are motivated by politics and a
> personal dislike of Hans Reiser rather than technical merit. :(

You (and every reiser4 fan that FUDs against linux developers just because he can't understand why reiser4 still is not in) are seriously missing the point. Nobody is oppossing that reiser4 gets into the main linux tree.

Let me remember you that Linux has already merged quite a lot of filesystems, including a lot of them that don't even matter in the real world (BeFS, minix, ADFS, AFFS, EFS, VXFS, qnx4fs, sysvfs, ncpfs, codafs. Let me also remember you that we have had other relevant filesystems for year (reiser3, XFS, JFS), and everybody is ok with them, nobody is planning to kill anyone.

With this short panoramic view of Linux I find shocking that people dares even to _suggest_ that linux don't want to include a filesystem for "political reasons".

Just to point you how wrong you are, compare OCFS2 and GFS. Both of them are clustering filesystems, being OCFS from Oracle and GFS from Redhat. _Both_ filesystems have been asked to be included in the main tree. As for today, only OCFS2 is in (since 2.6.16, after several reviews and after fixing their code like everyone else). GFS still isn't in and seems to need work, but you won't see to the GFS technical leader tell everybody (like Hans has done here) that Linux politics don't allow GFS to go in. Instead of wining, they just go back to work on their issues.

From: Arjan van de Ven [email blocked]
Subject: Re: reiserFS?
Date: Mon, 17 Jul 2006 11:23:02 +0200

> With this short panoramic view of Linux I find shocking that people
> dares even to _suggest_ that linux don't want to include a filesystem
> for "political reasons".

that's just their way of doing politics; they think that by using that sort of psychological warfare the code gets merged without review rather than with ... and sometimes it even works unfortunately.

From: Matthias Andree <matthias.andree@gmx.de>
Subject: Re: reiserFS?
Date: Mon, 17 Jul 2006 16:56:04 +0200

On Sun, 16 Jul 2006, Lexington Luthor wrote:

> Matthias Andree wrote:
> >Why would anyone want ReiserFS in the kernel that is discontinued by its
> >developers when it's just started to become stabile and useful, with
> >bugs (hashing) remaining, as happened with 3.6? Who is going to make
> >guarantees this won't happen again with reiser4?
>
> I looked at the reiser4 patch, and it does very little outside of the
> fs/reiser4 directory. If it is no longer supported by namesys, why can't
> it just be removed from the kernel like all the other bits that are
> obsoleted?

People (including you) would scream blue murder if their file system were going away. The same would happen if it just didn't work for them.

Somebody, however skilled they may be, just trying out a patch and finding it works for them is certainly not sufficient reason to judge if a product is of adequate quality.

The code was reviewed, found to contain major misdesigns, and the maintainers refused to fix those, and that's it.

> I am just saddened that kernel decisions are motivated by politics and a
> personal dislike of Hans Reiser rather than technical merit. :(

If you had understood my postings, it had been clear to you that there have been technical reasons that blocked the inclusion, and there have additionally been precedences of such misconduct, or maintainers declaring the system stable when in fact it was years (literally) from that.

I respect namesys for the efforts they made in getting 3.6 and the toolchain workable, but some issues remain that some people never run into, are showstoppers for others, and at that point where minor polishing was due, namesys moved on to reiser4, dropping 3.6 support - and that was a decision that made me phase out reiserfs 3.6, and I'm certainly not looking into reiser4 until 2 years after a first major distro (that's currently Debian, Ubuntu, Fedora, Opensuse, Mandrake) ships it as the default root and user FS.

> >There's ext3, you can set the dir_index option (either for mke2fs, or
> >afterwards with tune2fs, then unmount and run e2fsck -fD) and you're set.
>
> I am not arguing for the inclusion of reiser4 in the kernel, but you
> should know it has its uses. There are very many things that reiser4 can
> do that will make ext3 blow up. It simply the best filesystem for many
> kinds of usage patterns.

Apparently, kernel coding standard applicability doesn't fall into the usage patterns you're referring to. SCNR.

I haven't heard the other side, but if you're going to contribute to some project you MUST please its maintainers - life's bad...

--
Matthias Andree

From: Xavier Roche <roche+kml2@exalead.com>
Subject: Re: reiserFS?
Date: Sun, 16 Jul 2006 18:16:31 +0200

> It simply the best filesystem for many kinds of usage patterns.

The most frightening too. Reiserfs might be suitable for very specific appliactions, but to use it in production machine, you need to have some guts.

My last reiserfs partition was blown up two days ago, because of a bad sector, plus a fatal oops, looping endlessly. This was the second time, and the last one, as none of my ext3 filesystems *ever* had similar problems, despite numerous other bad sector issues. Not mentionning the funny "recovery" tool, which generally finishes to trash your data.

Jul 14 23:35:29 linux kernel: hdh: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Jul 14 23:35:29 linux kernel: hdh: dma_intr: error=0x40 { UncorrectableError }, LBAsect=12458384, sector=12458383
Jul 14 23:35:29 linux kernel: ide: failed opcode was: unknown
Jul 14 23:35:29 linux kernel: end_request: I/O error, dev hdh, sector 12458383
Jul 14 23:35:29 linux kernel: ------------[ cut here ]------------
Jul 14 23:35:29 linux kernel: kernel BUG at fs/reiserfs/file.c:620!
..
Jul 14 23:35:29 linux kernel: <0>Fatal exception: panic in 5 seconds

The funny part is that 14 july is the french's fireworks day, generally launched around midnight.

From: Christian Trefzer [email blocked]
Subject: Re: reiserFS?
Date: Sun, 16 Jul 2006 18:28:31 +0200

On Sun, Jul 16, 2006 at 06:16:31PM +0200, Xavier Roche wrote:

> > It simply the best filesystem for many kinds of usage patterns.
>
> The most frightening too. Reiserfs might be suitable for very specific
> appliactions, but to use it in production machine, you need to have
> some guts.
>
> My last reiserfs partition was blown up two days ago, because of a bad
> sector, plus a fatal oops, looping endlessly. This was the second
> time, and the last one, as none of my ext3 filesystems *ever* had
> similar problems, despite numerous other bad sector issues. Not
> mentionning the funny "recovery" tool, which generally finishes to
> trash your data.

I don't quite understand. You are supposed to dd_rescue the whole block device to a working drive and use fsck on the copy. Whatever is lost in the process must of course be restored from a recent backup. But, as a friend of mine put it recently, people don't need backup, they only need restore ; )

fsck on a faulty drive might cause even more damage - how do you know that other areas of the device are OK? I also oppose the ReiserFS-v3.x design philosophy regarding faulty storage layer, but in any case where your _live_ data is valuable and uptime counts, you _really_should_ use a RAID of some sort.

Kind regards,
uziel

PS: Your mail was line-wrapped really bad, you might want to look into that.

From: Theodore Tso [email blocked]
Subject: Re: reiserFS?
Date: Sun, 16 Jul 2006 12:56:48 -0400

On Sun, Jul 16, 2006 at 06:28:31PM +0200, Christian Trefzer wrote:

> I don't quite understand. You are supposed to dd_rescue the whole block
> device to a working drive and use fsck on the copy. Whatever is lost in
> the process must of course be restored from a recent backup. But, as a
> friend of mine put it recently, people don't need backup, they only need
> restore ; )

If the disk is known to be bad, yes, and the number of bad blocks is growing. On the other hand, disks can and will have a few bad blocks, or bad writes that don't mean the disk is going bad, and a modern filesystem should be robust enough that a single failed sector doesn't cause the filesystem to go completely kaput.

In fact, one of the scary trends with hard drives is that size is continuing to grow expoentially, access times linearly (more or less), and error rates (errors per kilobytes per unit time) are remaining more or less constant.

The fact that reiserfs uses a single B-tree to store all of its data means that very entertaining things can happen if you lose a sector containing a high-level node in the tree. It's even more entertaining if you have image files (like initrd files) in reiserfs format stored in reiserfs, and you run the recovery program on the filesystem.....

Yes, I know that reiserfs4 is alleged to fix this problem, but as far as I know it is still using a single unitary tree, with all of the pitfalls that this entails.

Now, that being said, that by itself is not a reason not to decide not to include reseirfs4 into the mainline sources. (I might privately get amused when system administrators use reiserfs and then report massive data loss, but that's my own failure of chairty; I'm working on it.) For the technical reasons why resierfs4 hasn't been integrated, please see the mailing list archives.

- Ted

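The point in the message above about losing a high-level node can be illustrated with a rough count. This is an idealized model, not reiserfs's actual on-disk layout: it assumes a perfectly balanced tree in which every internal node has the same number of children, and it ignores any recovery scan that might later find orphaned nodes. The function names, fanout, and height are made up for the illustration.

```python
def leaves_unreachable(fanout, height, lost_depth):
    """Leaves cut off when one node at `lost_depth` becomes unreadable.

    Idealized balanced tree: the root is at depth 0, leaves are at depth
    `height`, and every internal node has exactly `fanout` children.
    """
    if fanout < 1 or not 0 <= lost_depth <= height:
        raise ValueError("need fanout >= 1 and 0 <= lost_depth <= height")
    # Everything below the lost node can no longer be reached from the root.
    return fanout ** (height - lost_depth)


def fraction_unreachable(fanout, height, lost_depth):
    """Share of all leaves that the single lost node takes with it."""
    return leaves_unreachable(fanout, height, lost_depth) / fanout ** height


if __name__ == "__main__":
    # Hypothetical tree: 100 children per node, leaves three levels below the root.
    for depth in range(4):
        print(depth, leaves_unreachable(100, 3, depth))
```

In this model a bad sector under a leaf costs one leaf, while the same bad sector under a node just below the root costs one percent of the whole tree, which is the asymmetry the message describes.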
From: Lexington Luthor <Lexington.Luthor@gmail.com>
Subject: Re: reiserFS?
Date: Sun, 16 Jul 2006 18:26:03 +0100

Theodore Tso wrote:

> If the disk is known to be bad, yes, and the number of bad blocks is
> growing. On the other hand, disks can and will have a few bad blocks,
> or bad writes that don't mean the disk is going bad, and a modern
> filesystem should be robust enough that a single failed sector doesn't
> cause the filesystem to go completely kaput.

I never trust a single hard drive with data that cannot be instantly re-generated anyway (eg squid caches). The fact that some people have hardware errors should not require every single fs to accommodate random bad-sectors.

Feel free to use ext3 or other fs which handles this situation (and other situations) better, but reiserfs works fine on good hardware. It has been my root filesystem on many systems with no problems whatsoever, even with cheap SATA drives.

> In fact, one of the scary trends with hard drives is that size is
> continuing to grow expoentially, access times linearly (more or less),
> and error rates (errors per kilobytes per unit time) are remaining
> more or less constant.
>
> The fact that reiserfs uses a single B-tree to store all of its data
> means that very entertaining things can happen if you lose a sector
> containing a high-level node in the tree. It's even more entertaining
> if you have image files (like initrd files) in reiserfs format stored
> in reiserfs, and you run the recovery program on the filesystem.....
>
> Yes, I know that reiserfs4 is alleged to fix this problem, but as far
> as I know it is still using a single unitary tree, with all of the
> pitfalls that this entails.
>
> Now, that being said, that by itself is not a reason not to decide not
> to include reseirfs4 into the mainline sources. (I might privately
> get amused when system administrators use reiserfs and then report
> massive data loss, but that's my own failure of chairty; I'm working
> on it.) For the technical reasons why resierfs4 hasn't been
> integrated, please see the mailing list archives.
>

I read the archives, and most of the problems pointed out during the review were fixed relatively quickly, followed by a flame war due to some suggesting that reiser4 should not be able to affect VFS semantics, and other such matters (which IMO should be outside of the scope of a code review). There has been no follow-up review as far as I can tell.

The discussion quickly degenerated into a personality argument against Mr Reiser, with several developers taking a strong position against reiser4 (the exact reasons for which are not specified).

I don't quite know where reiser4 stands at the moment, given that it is in -mm and has been for a very long time. I also looked at the patch again, and it is indeed quite well isolated from the rest of the kernel so I don't understand why it is not being merged as an EXPERIMENTAL option.

Regardless, it is available in -mm for anyone to use, and last I checked, works incredibly well leaving other filesystems miles behind in terms of speed. I haven't tested it enough to comment on the reliability, but if it is as reliable as reiserfs, it is sufficient for me and many others who use RAID and a UPS.

Regards,
LL

From: Theodore Tso [email blocked]
Subject: Re: reiserFS?
Date: Sun, 16 Jul 2006 13:48:04 -0400

On Sun, Jul 16, 2006 at 06:26:03PM +0100, Lexington Luthor wrote:

> I read the archives, and most of the problems pointed out during the
> review were fixed relatively quickly, followed by a flame war due to
> some suggesting that reiser4 should not be able to affect VFS semantics,
> and other such matters (which IMO should be outside of the scope of a
> code review). There has been no follow-up review as far as I can tell.

As far as I know not all of the problems were fixed. And it has been observed that given the abuse and accusations that were directed at the people who did decide to review it, that it would not at all surprising if some (all?) of reviewers may have decided they had better things to do. Getting things merged into mainline is not a right, and the reviewers are volunteers.....

Speaking for myself, since I don't enjoy being accused of partisanship and being ascribed of having a desire to backstab reiserfs, I have a personal policy to avoid reiserfs review, and recuse myself from any votes within program committee discussions regarding Hans Reiser. Being accused of taking unfair advantage of my volunteer activities is something I allow myself to get into once.

- Ted

From: Diego Calleja [email blocked]
Subject: "Why Reuser 4 still is not in" doc (was: 'reiserFS?')
Date: Sun, 16 Jul 2006 22:01:15 +0200

El Sun, 16 Jul 2006 13:48:04 -0400, Theodore Tso [email blocked] escribió:

> As far as I know not all of the problems were fixed. And it has been
> observed that given the abuse and accusations that were directed at
> the people who did decide to review it, that it would not at all
> surprising if some (all?) of reviewers may have decided they had
> better things to do. Getting things merged into mainline is not a
> right, and the reviewers are volunteers.....

Maybe it's too late and reiser 4 will get in in the next release, but I've written this doc into the kernelnewbies' wiki: http://wiki.kernelnewbies.org/WhyReiser4IsNotIn . If you disagree with something in that doc, edit it or just answer to this mail what you want to see in it and I'll add it myself.

From: "Dr. David Alan Gilbert" [email blocked]
Subject: Re: reiserFS?
Date: Sun, 16 Jul 2006 18:46:37 +0100

It is a sad reflection that we have these regular 'fs wars'; and that most of them are driven by peoples bad experiences with particular filesystems.

That leads me to ask what level of testing is performed on each filesystem - are there filesystem torture tests that are getting run by someone (who?) on various filesystems (preferably on large TB sized ones, preferably with simulated failures and resets)? The discussions on Ext4 a few weeks ago made me think that the thing I'd value more than anything else would be a damn good test regime.

It would be much nicer if the fs wars came down to peoples particularly good experiences with filesystems rather than people selecting file systems based on which one has lost them data most rarely.

Dave

P.S. For the record I use Reiser for large (>500GB) fs since I couldn't get Ext3 stable on one a year or so ago and xfs failed the 'recover from hitting reset' test. A couple of years ago I wouldn't touch Reiser because of NFS issues, but it seems to have grown out of that.

--
 -----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert    | Running GNU/Linux on Alpha,68K| Happy  \
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC & HPPA | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/

From: Theodore Tso [email blocked]
Subject: Re: reiserFS?
Date: Sun, 16 Jul 2006 14:14:31 -0400

On Sun, Jul 16, 2006 at 06:46:37PM +0100, Dr. David Alan Gilbert wrote:

> That leads me to ask what level of testing is performed on each
> filesystem - are there filesystem torture tests that are getting
> run by someone (who?) on various filesystems (preferably on large
> TB sized ones, preferably with simulated failures and resets)?
> The discussions on Ext4 a few weeks ago made me think that the
> thing I'd value more than anything else would be a damn good
> test regime.

As far as I know ext2/3 is the only filesystem with a fsck tool that has a regression test suite. It's always amazed me that filesystem consistency checkers and repair tools get so little attenion by most filesystem developers.

As far as random torture testing, Pavel has written a random test tool that punches random errors into random blocks of a filesystem, and that was used to uncover a couple of cases that e2fsprogs didn't handle cleanly. Those were reported to me, and I fixed them, and the edge cases were encorprated into the regression test suite. IIRC it came up in discussion a few weeks ago one LKML or linux-fsdevel (I can't remember which), and I believe either someone from XFS or Reiser team was going to take Pavel's torture tester and adapt it do some robustifying of their filesystem's repair capabilities.

Finally, the good folks at the Stanford Metacompilation group did some very interesting work to find bugs in three common Linux filesystems:

http://keeda.stanford.edu/~junfeng/papers/osdi04/osdi04.html

Regards,

- Ted

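The kind of random torture test described in the message above, punching random errors into random blocks, can be sketched in a few lines of Python. This is not Pavel's tool, whose design is not described here; the function name, the 4096-byte default block size, and the choice to overwrite whole blocks with random bytes are all assumptions made for illustration. It is meant for a scratch copy of a filesystem image file, never a device holding live data.

```python
import os
import random


def corrupt_random_blocks(image_path, count, block_size=4096, seed=None):
    """Overwrite `count` randomly chosen blocks of an image file with noise.

    Returns the sorted list of block numbers that were overwritten so a
    failing run can be reproduced and reported. Hypothetical helper; run it
    only on a throwaway copy of a filesystem image.
    """
    rng = random.Random(seed)  # a fixed seed makes the damage repeatable
    total_blocks = os.path.getsize(image_path) // block_size
    if total_blocks == 0:
        raise ValueError("image is smaller than one block")
    victims = sorted(rng.sample(range(total_blocks), min(count, total_blocks)))
    with open(image_path, "r+b") as image:
        for block in victims:
            image.seek(block * block_size)
            # Replace the whole block with random bytes.
            image.write(bytes(rng.getrandbits(8) for _ in range(block_size)))
    return victims
```

A test loop built on this would copy a known-good image, damage the copy, run the filesystem's checker on it, and keep the seed and block list whenever the checker crashes or leaves the image inconsistent, so the case can be turned into a regression test.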
From: Caleb Gray [email blocked]
Subject: Reiser4 Inclusion
Date: Sun, 16 Jul 2006 20:02:15 -0700

Dear Linux Kernel Developers,

I would like to express my experiences with the reiser4 filesystem and my reasons for its readiness to be officially included in the Linux kernel.

I have been putting together servers since 2001, all of which are still operational and serving web sites reliably. The earliest servers I built used ext3 for their primary filesystems. Overtime I realized that I needed a faster filesystem for my servers' so I tried reiserfs. Those servers were, in fact, more responsive but carried several headaches into my life due to severe unreliability and so I was forced to convert all of the reiserfs servers to ext3. It wasn't until two years ago that I read about reiser4 and felt as though I should give the new reiser filesystem a chance.

After two years of reiser4 and five years of ext3, I can attest to three things that reiser4 does just as well or better than ext3: speed, responsiveness, and reliability. This is not to say that reiser4 is _better_ than ext3, this is to simply say that it is as production ready as ext3 is.

The reliability of reiser4 does _NOT_ compete with ext3 but it doesn't falter from it either. For every time that I have to run fsck.ext3, I have to run fsck.reiser4. Every time one of my servers crash, whether it's ext3 or reiser4, I spend the exact same amount of time recovering lost/broken files. And to note: the atomic file saving system that reiser4 implements has never caused me any issues during file recovery.

Reiser4's responsiveness is undoubtedly at least twice as fast as ext3. I have deployed two nearly identical servers in Florida (I live in Washington state) but one difference: one uses ext3 and the other reiser4. The ping time of the reiser4 server is (on average) 20ms faster than the ext3 server. It has maintained this speed for the past two years against the ext3 server even with aging hardware and bulking file and directory structures. (Both of the filesystems have slowed down at a similar pace for the duration of their lifetime [about 15ms].)

And finally reiser4's speed. I am constantly transferring files between other servers, and hard drives. The servers are also (obviously) serving data to the viewers of web sites, dealing with huge email queues (a few gigabytes every few hours), and handling heavy cron jobs to tarball user dirs from one drive to another. The reiser4 and ext3 servers deal with relatively the same amount of data to compress (~190GiB each), and the reiser4 is and always has been the first to finish. Not only finish first though, it generally finishes about 45 minutes before the ext3 server. (You can ignore the idea that it's probably the CPUs that can't handle the compression not the filesystems, because while the backup is running on both dual core processors the load never surpasses 45%; the bottleneck comes down to the throughput efficiency of data between drives.)

The purpose of this email is not to bash ext3. As I have said I have a five year old ext3 server that runs great, and I intend to keep it that way. The reason that I have sent this is to present real life situations where reiser4 is reliable, fast, usable, and production ready. It is both realistic and reasonable to say that reiser4 is prepared to be officially supported in the Linux kernel. Please consider the fact that I have patched my servers' kernels time and time again, with all kinds of patches, and I have never once had an issue with the reiser4 patched kernels.

Thank you for taking time away from development to read this email (I'm a programmer too), I know how it is.

Sincerely,
Caleb Gray

From: Arjan van de Ven [email blocked]
Subject: Re: Reiser4 Inclusion
Date: Mon, 17 Jul 2006 11:25:54 +0200

On Sun, 2006-07-16 at 20:02 -0700, Caleb Gray wrote:

> Dear Linux Kernel Developers,
>
> I would like to express my experiences with the reiser4 filesystem and
> my reasons for its readiness to be officially included in the Linux kernel.

Hi,

may I ask why you are sending this? Have you done code audits to the code? Have you done anything that was on the "these need fixing before it can go in" list? If not, aren't you just doing campaigning on non-technical grounds? And isn't that a bad idea?

Arjan van de Ven -- who is starting to smell a directed PR campaign
leading to allergic reactions

From: Grzegorz Kulewski [email blocked]
Subject: Re: Reiser4 Inclusion
Date: Mon, 17 Jul 2006 13:48:02 +0200 (CEST)

Hi Arjan,

On Mon, 17 Jul 2006, Arjan van de Ven wrote:

> On Sun, 2006-07-16 at 20:02 -0700, Caleb Gray wrote:
>> Dear Linux Kernel Developers,
>>
>> I would like to express my experiences with the reiser4 filesystem and
>> my reasons for its readiness to be officially included in the Linux kernel.
>
> Hi,
>
> may I ask why you are sending this? Have you done code audits to the
> code? Have you done anything that was on the "these need fixing before
> it can go in" list?

Well, as I understand he is end user (an advanced one). He does his job as a end user: he does testing and reports back the results. This is not that common, as many users do not report problems / requests they have. He even did more: he tested (very hard and extensively) experimental, not even in the tree part of the kernel. And he reported problems / ideas he found in a very kind and gentle way. This is not so common and makes him a valuable person in the users comunity in my opinion.

As I understand you, as a developer, should say "thank you" to him and make everything you can to solve the problems he has and help implement the parts of the software he needs. No? That way you build comunity of users that not only are using the software but also are giving back in form of bug reports, feature requests, continuous testing on variety of setups (that no developer ever can have all), reviews, ideas, telling others about what a great software with friendly comunity they found and so on.

For me (I am active end user of most open source projects and developer on others) the comunity and good contacs between developers and end users is the most important part of the software. It gives me security. Even if the software is not yet stable it can be fixed by cooperation between users and developers. While people are really way harder to fix than software.

And he as a end user does not have to (and probably does not even have enough knowledge about the kernel internals) make code audits and review of new filesystem. So why are you demanding that he does one?

> If not, aren't you just doing campaigning on
> non-technical grounds? And isn't that a bad idea?

Well, his kind message was not very technical. But wasn't completly non technical or flamewar either. He tested software, compared and reported what he saw. He also expressed wish (that many users have) that Reiser4, as a usefull and even useable in some production evironments, should be integrated into the kernel. Because there are users for it.

> Arjan van de Ven -- who is starting to smell a directed PR campaign
> leading to allergic reactions

Come on. Another conspiracy theory? Why some people just can't understand that Reiser4 is not that bad (from end user's point of view)? Some people tested it and found it good and want to have it integrated ASAP. Some even can't live without it after they used it for a while and saw how good it is in something... I can assure you that it really is not some directed centrally controlled campaign. This is just what many users want.

I too tested Reiser4 some time ago. It didn't have any big problems for me. But I am not using (or testing) it now. Why? Mainly because of security: if Reiser4 is not merged (even as a experminental, subject to change, unstable, whatever) it will work with new kernels as long as Namesys will release patches. And if something happens to Namesys I will have to port it to new kernels (and that is usually trivial for kernel developers introducing incompatible internal kernel API changes but not for me) myself or will have to use old kernels. And _that_ is a problem for me. (Not to mention that I am regulary applying 4-7 patches, some big ones, for every kernel I am building and resolving merge problems in not your code is not easy thing to do and takes time. While I can live without staircase scheduler or vesafb-tng if my manual merge attempt fails I can not do so without my main filesystem. And -mm is a little too unstable for me recently.)

It is unfortunate that Hans Reiser pushed Reiser4 the way he did and that he got the reaction from some kernel developers he did got. But he and his developers did (and are still doing) very hard job to fix problems and make Reiser4 better and more suitable into the kernel. And having Reiser4 out of the kernel is hurting mainly end users. Really.

Arjan, is this really technically impossible to have Reiser4 merged into the kernel after fixing some worst problems that touch mm and VFS (in say 2 months), flagged unofficial-try-merge-for-testing, super-experimental and subject-to-change? I would make live of many end users easier and does not sound that bad for me especially in the 2.6 forever era...

If someone thinks that Reiser4 is too unstable or evil he can set it to N and be happy. And if Reiser4 will be abandoned by Namesys and not fixed further it could be maintained by kernel developers at a minimal level (porting to new kernel internal APIs as they change) for say 6-12 months while flagged for removal and then removed because of unofficial-try-merge-for-testing flag. This at least does give some time to migrate from it for end users (and maybe even time to fix it for some other developers?).

Thanks and sorry for such long post,
wrong as usual,

Grzegorz Kulewski

From: Diego Calleja [email blocked]
Subject: Re: Reiser4 Inclusion
Date: Mon, 17 Jul 2006 16:06:18 +0200

On Mon, 17 Jul 2006 13:48:02 +0200 (CEST), Grzegorz Kulewski [email blocked] wrote:

> If someone thinks that Reiser4 is too unstable or evil he can set it to N
> and be happy. And if Reiser4 will be abandoned by Namesys and not fixed

http://wiki.kernelnewbies.org/WhyReiser4IsNotIn
From: Hans Reiser [email blocked]
Subject: short term task list for Reiser4
Date: Tue, 11 Jul 2006 15:04:20 -0700

Please feel free to comment on this list and the order of its tasks:

0) fix all bugs as they arise

1) get batch_write into the -mm kernel --- small task

2) get read optimization code into the -mm kernel (coded and probably debugged but not fully tested and not sent in yet) --- small task

3) get EVERYTHING into wiki (migration has started already, thanks flx).

4) review complaints of pauses while using reiser4 --- size of task unknown, and it is also unknown how much we may have fixed it while writing recent patches.

5) review crypt-compress code --- full code review --- substantive task

6) optimize fsync --- substantive task which requires using fixed area for write twice logging, and using write twice logging for fsync'd data. It might require creating mount options to choose whether to optimize for serialized sequential fsyncs vs. lazy fsyncs.

7) review all of our installation instructions --- I am already doing that, but volunteers who will help out our wiki would be sorely appreciated. Installing reiser4 as the root for each distro needs step-by-step instructions.

8) review our kernel documentation --- I should do that but when will I have time?

Unfortunately, our code stability is going to decrease for a bit due to all these changes to the read and write code --- no way to cure that but passage of time. On the other hand, our CPU usage went way down. Reiser4's only performance weakness now is fsync.

Once the crypt-compress code is ready, we will release Reiser4.1-beta (with plugins, releasing a beta means telling users that if they mount -o reiser4.1-beta then cryptcompress will be their default plugin, and if they don't, then they are using Reiser4.0 still). Doubling our performance and halving our disk usage is going to be fun.




I personally find Arjan van d

July 17, 2006 - 11:32am
Anonymous (not verified)

I personally find Arjan van de Ven's first reaction to a user supporting reiserfs 4 rather absurd. He's got to change his attitude.

If you had been on lkml since

July 17, 2006 - 11:45am
Anonymous (not verified)

If you had been on lkml for a few years you'd understand it. The Reiser 4 people announced that reiser 4 was "ready"...3 years ago or so? In those three years Hans Reiser has been asking to merge reiser 4 and reacting with attacks on anyone who pointed out that it's "still not ready".

Today, still, after three years of several people working full time, they're cleaning things up - just imagine what the state was three years ago. To be fair, Linux people have been listening to requests for inclusion for years, all of them ignoring the fact that reiser 4 may not be ready. After so much time, one starts wondering if people are just blind or if there's some conspiracy behind it. It's not that the reiser people are bad programmers, but reiser 4 is a very complex filesystem and it is obviously taking years to finish. Do you think that ZFS was included in opensolaris as soon as it started working?

By that logic the Linux kerne

July 17, 2006 - 12:27pm
Anonymous (not verified)

By that logic the Linux kernel itself is "still not ready" and shouldn't ever be used (or merged into a distribution).

Some filesystems (like ext3) are merge and forget. Reiser4 isn't, mostly because Reiser4 provides _more_ than the traditional Unix filesystem semantics, and software and usage patterns have had 30 years to employ the traditional semantics, so there's not much churn.

As long as Reiser4 can provide the baseline functionality, then that should be good enough.

Unless, of course, this whole argument is in actuality a naming dispute, e.g. ext3dev versus ext4. Maybe if Hans used a different naming convention people would calm down.

Reiserfs has some bad baggage

July 18, 2006 - 1:22pm
-___Anonymous___- (not verified)

Reiserfs (and Hans) is carrying around some bad baggage in the form of abandoning a filesystem shortly after it is merged.
Namesys needs to show that they will stand behind their code for more than just the time it takes to get merged.
Hans/Namesys also need to open up a lot more and accept other people's patches more easily. Have you ever tried getting Hans to apply a fix for reiserfs? Even obviously correct fixes are a pain to get merged - either you just get ignored or it takes ages.

I'd say the above are just as big obstacles to merging reiser4 as the current code quality.

Just for an example of Hans'

July 17, 2006 - 12:28pm
Anonymous (not verified)

Just for an example of Hans' trying to sell reiser4 with marketing speak like he has done for years, read the announcement:

"crypt-compress code is ready, we will release Reiser4.1-beta (with plugins, releasing a beta means telling users that if they mount -o reiser4.1-beta then cryptcompress will be their default plugin, and if they don't, then they are using Reiser4.0 still). Doubling our performance and halving our disk usage is going to be fun."

...as everybody knows, "halving the disk usage" is a completely stupid idea for most computers out there. Your compression rate depends on the kind of data you save (now, you could write reiser4 plugins for every compression format in the world, getting lame and divx inside of the kernel of course)

I don't know. It might not be

July 18, 2006 - 12:36am
Anonymous (not verified)

I don't know. It might not be very useful for average users, but it is very useful for people like me developing on large projects from a laptop. I have around 20Gbyte of sourcecode alone lying around on the disk. This could be compressed by x5, and I have another 20Gbyte of binaries with debugging-info which could be halved using compression.

Since I am already using reiser4 because it is the only filesystem under which SVN doesn't use silly amounts of disk-space, this is a _very_ welcome addition, and I guess many other open source developers will feel the same way.

No doubt the compression is u

July 18, 2006 - 8:35pm

No doubt the compression is useful and as long as the CPU overhead isn't bad then it probably should be enabled by default. But to announce it as "Doubling our performance and halving our disk usage is going to be fun" is pure marketing speak and I think even a bit of holier-than-thou ego stroking on Hans' part.

Perhaps..

July 20, 2006 - 4:26am
Anonymous (not verified)

It very well could be market-speak only, or Reiser could be on to something. If you know right now that he is lying or misleading and can prove it, do it. Otherwise, hold off the flames until after his updates are finished and then verify his statements. If his claims fall flat, THEN call him a loser market-droid. But not now.

Perhaps... You should learn about compression

July 20, 2006 - 8:59am
Anonymous (not verified)

It's not like ReiserFS is using some completely new form of "Data halving", it's still just compression, just merged into the FS.

Normal compression will give good compression to highly predictable input (like text and bmp files), and bad to no compression to a lot of binaries like executables, mp3 files, jpeg files and encrypted files.

If your Reiser4 partition is only made up from text, then maybe, yes, you could use half the space.

If you only had mp3s on it, it is quite possible that you'd end up using MORE space (compression overhead, like headers).

So yes, it is just silly marketing speak...
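The dependence on input type described above is easy to check. The following is a minimal sketch, assuming Python's standard `zlib` as a stand-in compressor (it is not claimed to be what Reiser4's cryptcompress plugin uses), with repetitive text standing in for source code and random bytes standing in for already-compressed data such as MP3 or JPEG files.

```python
import os
import zlib

def compression_ratio(data: bytes, level: int = 6) -> float:
    """Return original_size / compressed_size for zlib at the given level."""
    return len(data) / len(zlib.compress(data, level))

# Highly repetitive text stands in for source code or log files.
text = b"int main(void) { return 0; } /* sample source line */\n" * 2000

# Random bytes stand in for data that is already compressed or encrypted.
random_like = os.urandom(len(text))

text_ratio = compression_ratio(text)
random_ratio = compression_ratio(random_like)

print(f"text-like data:   {text_ratio:.1f}:1")
print(f"random-like data: {random_ratio:.3f}:1")
```

On the random input the ratio comes out at or slightly below 1:1, because the container adds a few bytes of overhead to data it cannot shrink.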

Not quite...

July 20, 2006 - 5:29pm
Anonymous (not verified)

You gotta give Hans a little more credit than that. Last I heard the Reiser4 compression had a check in place that if the first few blocks of a file didn't compress sufficiently, it would not attempt to compress the rest of the file.
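The check described above (trial-compress the start of a file and give up if it does not shrink enough) could look roughly like this. It is only an illustrative sketch: the block size, the number of probed blocks, the 10% threshold and the use of `zlib` are all assumptions, not details of the Reiser4 code.

```python
import os
import zlib
from typing import Tuple

BLOCK_SIZE = 4096    # hypothetical block size in bytes
PROBE_BLOCKS = 4     # hypothetical number of leading blocks to test
MIN_SAVING = 0.10    # hypothetical threshold: require at least 10% saving

def should_compress(data: bytes) -> bool:
    """Trial-compress only the leading blocks and decide from their saving."""
    probe = data[: BLOCK_SIZE * PROBE_BLOCKS]
    if not probe:
        return False
    saved = 1.0 - len(zlib.compress(probe)) / len(probe)
    return saved >= MIN_SAVING

def store(data: bytes) -> Tuple[bool, bytes]:
    """Return (was_compressed, payload) the way a file system might store it."""
    if should_compress(data):
        return True, zlib.compress(data)
    return False, data   # incompressible data is stored untouched

log_like = b"Jul 17 11:32:01 host daemon: status ok\n" * 5000
mp3_like = os.urandom(100_000)   # stands in for already-compressed media

log_flag, log_payload = store(log_like)
mp3_flag, mp3_payload = store(mp3_like)
print(log_flag, len(log_like), len(log_payload))
print(mp3_flag, len(mp3_like), len(mp3_payload))
```

With a probe like this, a partition full of MP3s costs only the CPU time of the trial compression and never grows beyond its uncompressed size.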

This is just one of the benefits of having a general purpose file system that isn't geared towards one thing. For instance a "compressed file system" that just blindly compresses everything is obviously very specialized and would probably be slow for a lot of common scenarios. A file system (like Reiser4) that allows you to enable/disable compression on a per file basis is much more useful to a greater number of people. I might not want to compress certain files that require the maximum possible performance, but other files that just sit around and are rarely used can definitely be compressed.

Obviously the compression you get depends on the type of files you store, but on AVERAGE I believe users can easily save half their disk space.

My backup server is currently RAR'ing (maximum compression) the home directories of about 200 users that make up 110GB. After RAR is complete it will be less than 30GB (including 5% for recovery records). Using a compression algorithm that is more CPU friendly (which Reiser4 does) will obviously not compress quite that well, but 2:1 on average is definitely not some pipe dream.

Heck, tape drive manufacturers have been labeling their tape capacity at twice what it really is for years, based on the assumption that their compression will average 2:1. The 12TB array I used to work with averaged at least 2:1 with just basic compression algorithms.

a compress loop ?

July 19, 2006 - 1:04am
Anonymous (not verified)

Why not use a loop with a compress feature?

Performance?

July 20, 2006 - 2:58pm
Anonymous (not verified)

Because of the lower performance and greater overhead?

I believe XFS also does end p

July 19, 2006 - 4:00am
Anonymous (not verified)

I believe XFS also does end packing, which is probably the reason for the difference in disk use. Also, like reiserfs, XFS writes the inodes in as they are needed to minimise wasted space.

i disagree

July 18, 2006 - 6:00am
Anonymous (not verified)

Obviously ymmv, but reiser3 is very much geared towards workloads where you have lots and lots of small files. Tail packing and directory indexes helps here. I have used that file system for dozens of mail servers and squid servers over the years with very good results.

This is also the kind of workload that may benefit from compression. Those queue directories contain a lot of plaintext and base64 encoded data, and with the ratio of cpu speed vs. disk speed constantly increasing, I dare say that it is an educated guess that compression can help performance.

I personally find Arjan's remarks unfortunate. After all, the kernel is not the developers' personal playground, some people actually use the stuff and I am very interested in hearing other people's experiences. (Although this particular comment was a bit weird; low ping?)

I have been reading lkml sinc

July 17, 2006 - 12:58pm
Anonymous (not verified)

I have been reading lkml since 2001, and I've seen the topic of reiserfs coming up every few months. So I know the situation pretty well (and that of Suspend2, too).

There ARE technical reasons behind not merging it, I'd agree, but I also feel there are political sides too. Who is going to guarantee the reviewers are unbiased, especially if they come from other competing filesystem teams? (here I'm omitting names, but whoever is familiar with the flamewars should know)

I think the biggest problem for reiserfs4 is that it has too many new ideas, and thus making people worry about breaking the existing model too much. These are the things Hans should be solving. But on the other hand, I see people trying to deny it by pushing Hans to "bring those changes to VFS instead of Reiserfs 4 since these should be core functionality instead of fs-specific". I feel this is a trick trying to make the mission almost impossible. If the ideas are too new and could break the existing model, it should be tried in one fs and then only after being proven should it be ported to the core.

Anyway, fortunately I don't have much dependency on it, and I personally care more about suspend2 than reiserfs 4.

Don't piss off the volunteers

July 17, 2006 - 6:26pm

As Ted Ts'o pointed out, the reviewers are all volunteers. They aren't required to be interested in the software. So, yeah, some level of politics will erupt if attitudes don't mesh in the long run.

I think the fact that Reiser4's been in -mm and continues to be in -mm means it'll eventually end up in one of two states:

-- Determined to be broken beyond repair or not worth the effort to improve
-- Or, end up in the kernel.

It may take a long time. Hans and co. are making steady progress to get Reiser4 palatable enough for inclusion. In the meantime, its continuous exposure in -mm means it gets much more vetting than if it just stayed in Namesys' patch sets. I'm not too worried about it.

You know what else?

July 17, 2006 - 5:17pm
Anonymous (not verified)

He was hyping the looming threat of WinFS and all the database like stuff it was going to do and trying to force the issue that way.

Looks like WinFS is stillborn.

I respect the hell out of what Hans is doing. Filesystem work isn't terribly sexy, everyone has opinions on it too, but it takes more than fear to change some things and it takes more than just some benchmarks. Reiserfs3 isn't stable, I've witnessed a number of crashes with it that weren't recoverable. The politics here are that he's not trying to get some bit part filesystem in, he's selling it like it's the next great thing and doesn't have a good track record. If he keeps at it, he might come up with something revolutionary. He hasn't yet and he needs to learn how to play the game of life a little better such that if he ever does it will be recognized.

Amen!

July 17, 2006 - 5:28pm
SamBishop (not verified)

Hans strikes me as a bit of an idiot. If this recent push Arjan is responding to is some kind of PR move, it has certainly back-fired. Unfortunately for Hans, he opened up his mouth and has shown that he hasn't changed a bit. Check out his exchange with Jeff Mahoney:

http://marc.theaimsgroup.com/?l=git-commits-head&m=115294319726671&w=2

It's a long thread. But don't worry if you don't understand at first (like I didn't) what Jeff is talking about. He explains himself so many times that anyone ought to be able to catch on. Which is exactly the problem: Hans never does.

So, maybe you and other reise

July 17, 2006 - 10:35pm
Anonymous (not verified)

So, maybe you and other reiser4 bashers just don't like Hans personally? We call it "politics".

Even so, someone has to work

July 18, 2006 - 8:45pm

Even so, someone has to work with the people who wrote the code. Who's going to maintain Reiser4 after it's in the kernel? Yes the code is available so anyone can fix bugs and stuff but it's Hans' baby and he should look after it and that means that he has to work with the rest of the kernel developers. If he can't get along with anyone outside of Namesys how well is reiser4 going to fare? Probably about as well, or worse since it's more complicated, as reiser3 did.

Andre Hedrick is no longer working on the kernel for those very same reasons. Eventually you have to decide if it's even worth it to talk to these people, hopefully Hans will realize how much of an ass he is and fix his attitude so that people will be willing to work with him to get his stuff included. But I wouldn't get my hopes up. I'd bet that reiser4 will eventually get included in main and then a year or so later will be unsupported by Namesys and left on someone else's doorstep to maintain just like reiser3.

...

July 20, 2006 - 6:44am
Anonymous (not verified)

I don't understand why people think Hans has abandoned Reiser3? He hasn't at all! He is active in fixing any BUGS that come up in it, but he doesn't want to introduce any instability in to it by adding FEATURES that would be better off in Reiser4.

The guys from SuSE have been the ones that added ACLs and Xattrs to Reiser3 even after Hans respectfully requested them NOT to. (He would rather they spent the time adding them to Reiser4, and not introduce bugs to v3.) So it is on THEIR lap to fix bugs that they introduced, which Hans helps with as well.

You don't see the Kernel maintainers back porting major v2.6 kernel features to v2.4. Have they abandoned v2.4? NO. They simply don't want to destabilize it by adding features they don't absolutely need to.

Reiser4 has been designed to be around for a LONG time. Hans has no interest in abandoning it once it gets in the kernel, that's just when things start getting fun.

Yes, thank you for saying say

October 28, 2006 - 8:05pm
Anonymous (not verified)

Yes, thank you for saying this... I think from what I've read of what he's said that is the case. As always though, in large projects with great amounts of public prestige such as the linux kernel, there is a great deal of internal politics, unfortunately. And there tend to be a lot of people that don't check all of the facts surrounding something... such as SuSE writing patches for ReiserFS...

I also haven't had a ReiserFS machine crash unrecoverably ever, never ever... and I've been using it for about 4 years. I've had drives go bad, but never ReiserFS.

Small file performance

July 18, 2006 - 1:12am
Anonymous (not verified)

While I can partly agree with your sentiment, I've not seen reiserfs3 crash for 4 years, yet I have also seen both ext2 and ext3 crash unrecoverably.

One feature I think Hans Reiser really has given, is small file performance. Compare reiserfs3 to ext2, and ext2 wins in many ways, but look at small file performance, and reiserfs3 just beats the hell out of it.

ReiserV3 is as stable as it comes...

July 18, 2006 - 7:21am
Anonymous (not verified)

I'm so sick of seeing people post messages like:

Filesystem X sucks! I lost data on it, so how can anyone consider it to be stable?

EVERY filesystem loses data. Disk drives are UNRELIABLE and they can (and DO) randomly corrupt data without the file system ever knowing it. Run "smartctl -a /dev/hda" and you'll be amazed at the amount of errors your disk reports that don't corrupt data. I see at least 2-4 ECC errors a day on my desktop machine. Yesterday there were 15 entries in the syslog like:

smartd[2536]: Device: /dev/hda, SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 51 to 52

This is all normal in the world of disk drives, modern drives transparently move data around behind the scenes when sectors go bad. By the time you actually start noticing bad sectors it's usually too late. Doesn't matter what file system you use.
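To see how often a drive reports events like the smartd entry quoted above, the syslog lines can be tallied per device and attribute. This is a rough sketch: the regular expression is modelled only on that single quoted line, so other smartd message formats are deliberately not handled, and the sample input is made up for illustration.

```python
import re
from collections import Counter

# Pattern modelled on the one smartd line quoted in the comment above.
SMARTD_CHANGE = re.compile(
    r"smartd\[\d+\]: Device: (?P<device>\S+), SMART Usage Attribute: "
    r"(?P<attr_id>\d+) (?P<name>\S+) changed from (?P<old>\d+) to (?P<new>\d+)"
)

def count_attribute_changes(lines):
    """Count reported attribute changes per (device, attribute name)."""
    counts = Counter()
    for line in lines:
        match = SMARTD_CHANGE.search(line)
        if match:
            counts[(match["device"], match["name"])] += 1
    return counts

# Illustrative input: one matching line and one unrelated line.
sample = [
    "smartd[2536]: Device: /dev/hda, SMART Usage Attribute: 195 "
    "Hardware_ECC_Recovered changed from 51 to 52",
    "kernel: some unrelated message",
]
counts = count_attribute_changes(sample)
for (device, name), total in counts.items():
    print(device, name, total)
```

In practice the lines would come from the system log file, whose location varies between distributions.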

Just because you've lost data on one file system doesn't mean it's unstable. If you run enough machines with enough different file systems you WILL lose data on them ALL.

EVERY filesystem loses data.

July 18, 2006 - 11:50am
Anonymous (not verified)

EVERY filesystem loses data.

Well, maybe when your system crashes, sure.. I've never ever experienced dataloss with ext2/3, though. Never. I've been running linux for over 10 years. I guess I've just been very lucky!

I tried reiser 3 for a week, it fucked up after a reboot.
I tried reiser 4, which fucked up as well. It should be noted that there's no fsck for it that actually works, either. Nice touch.

Disk drives are UNRELIABLE and they can (and DO) randomly corrupt data without the file system ever knowing it.
smartd[2536]: Device: /dev/hda, SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 51 to 52

Thank god those disks have ECC! (Error-correcting Code)
Sorry, hard disks don't corrupt data without letting you know. In theory this can happen when both the checksum and the data are corrupted but the bad checksum happens to match anyway, but the odds of this happening are extremely small. Your hard disk is more likely to break down completely before giving you corrupt data.

This is all normal in the world of disk drives, modern drives transparently move data around behind the scenes when sectors go bad. By the time you actually start noticing bad sectors its usually too late.

Or until you run smartctl -a and take a look at the "Reallocated Sectors" and "Pending Sectors" value. ECC != Reallocated sectors.

http://zork.net/~nick/mail/why-reiserfs-is-teh-sukc

Note: I have nothing against reiser fs being merged into the kernel in its current state. I'm not going to use it, though.

plain wrong?

July 18, 2006 - 12:42pm
Anonymous (not verified)

Sorry, but something is wrong here:

"Well, maybe when your system crashes, sure.. I've never ever experienced dataloss with ext2/3, though. Never. I've been running linux for over 10 years. I guess I've just been very lucky!"

yes, you are!

"I tried reiser 3 for a week, it fucked up after a reboot.
I tried reiser 4, which fucked up as well. It should be noted that there's no fsck for it that actually works, either. Nice touch."

"Note: I have nothing against reiser fs being merged into the kernel in it's current state. I'm not going to use it, though."

Heey, hold on a second? Do you really think, that we're so dumb? please spread your FUD somewhere else. :P

What FUD? You obviously don't

July 18, 2006 - 2:16pm
Anonymous (not verified)

What FUD? You obviously don't know what "FUD" means.

I shouldn't, but I must ...

July 18, 2006 - 6:33pm
Anonymous (not verified)

So, at work we're using Reiser on our near-line backup array (bunch of Promise IDE and SATA to SCSI disk trays plugged into a dedicated rsync server). Due to a bug in Reiser, we have to use LVM to break our 40TB "disk" (the trays are glued together using software RAID-6) into 20 2TB chunks otherwise the filesystem doesn't survive a reboot. And we've paid Namesys to verify that "feature". They proposed that we ditch RHEL3's stock kernel and compile a plain-vanilla release, so we did. Same issue. They came back and told us we needed to apply a series of small patches - both to the kernel and to the filesystem code. No improvement. Back and forth, upgrade LVM tools, compile a special version of the MD admin bits, nothing helped. Once we found the 2TB chunk workaround, we decided to live with it until we get the space to migrate to JFS or XFS (or mebbe ZFS if we can convince management that x86 Solaris is viable). It was nice to be able to talk to the experts (and not too pricey either), but it wasn't so nice to still have the problem at the end of the day.

A big difference is that ext2

July 18, 2006 - 8:51pm

A big difference is that ext2/3, JFS, XFS, etc maintain backup superblocks so if a sector inside the main one goes bad you can use fsck with a backup to get your data back. reiserfs doesn't do this; if you lose a block in their tree you lose everything related to that block and anything that depends on it.

And if you happen to have a filesystem image containing a reiserfs filesystem, say for VMWare, Xen, an old filesystem dd backup, etc, and you run reiserfsck on the filesystem containing that image, it'll happily attempt to knit the image into the containing filesystem, resulting in unpredictable corruption.

I have used reiser3 from the

July 18, 2006 - 7:40am
Anonymous (not verified)

I have used reiser3 from the first time it was introduced in suse (5 years?) and the only time it crashed was a hardware problem.... What kind of problems have you had?

I had a nice instance where a

July 18, 2006 - 8:56pm

I had a nice instance where any time a certain file was accessed on a reiser3 filesystem the kernel would freak out and scribble on some random memory causing the screen to blank and the box to hang. reiserfsck said the filesystem was fine, mount said the filesystem was fine but as soon as syslog started (the offending file was a log file) the box would die. Because of the scribbling and screen blanking I had to hook up a serial cable to even see that it was reiserfs causing the oops and hang. This was within the last 2 years, I don't remember exactly when it happened but a quick rm badfile, backup/restore onto XFS and I haven't had a single problem since.

I power cycle tested XFS, JFS, Ext3 and Reiser3

July 19, 2006 - 3:58am
Anonymous (not verified)

For a settop box. Power cycled each for weeks.

Ext3 was once rendered unusable, never debugged it. It has some other things we didn't like for our application and it was never really a contender. It performed remarkably well though, better than I was expecting from the noise I've heard in different forums.

JFS and XFS were perfect, over 20,000 reboots. I was somewhat shocked and pleasantly surprised by this. The JFS project never really got a ton of support and has always been a fairly small player, it is a very good fs though, well balanced, good performance, great reliability.

ReiserFS had 2 strikes. Files being written to during the power cycle always had random crap in the last block or two when they came up (the semantics for what to do to an open file aren't clear, so this isn't exactly broken) while the others never did that. Hans and his team claimed it was a DMA thing, the DMA keeps writing as the power drops, it simply doesn't work that way though. It was a bug they couldn't or wouldn't explain so they propagated a myth which doesn't create a lot of confidence. It also had a number of failures, I don't remember the exact number but it was in the 100-200 range out of about 20,000 boots.

I'm not going to say this was scientific. It was a fairly simple set of hacks that we produced to get some ideas on what was better. Filesystem reliability is a very important issue (we're not talking about drive revectoring bad blocks like someone mentioned before, that is usually seamless to the filesystem, the fs doesn't lose data). It's deeply personal when you experience a failure and lose data.

It's not clear to me what people are considering failures. We were checksumming files, we were intentionally writing MPEG-like data to files and observing what happened to them. You know, if you pull the plug while you're installing some RPMs, you might crater the box, regardless of the filesystem. The failures I reported above were when the machine wouldn't boot up to runnable, regardless of whether it was fixable. Second, if you include the bullshit data in the files we were writing, then reiserfs fails way more than 50% of the time. That is an incredibly dangerous behavior when you start to factor in other files that you may or may not be writing during the failure. You botch /boot/grub/menu.lst and you're screwed, period; it's fixable but you're screwed in the meantime. Power failures are pretty rare, UPS isn't that expensive... Still, with 0.5 to 1.0% failure, most people won't see a reiserfs failure on the desktop and they likely won't recognize it if they did. I wouldn't be surprised if it garbled part of some xpm or bitmap and then GNOME or KDE refused to start up or some behavior like that; is that a filesystem failure though? The system might not boot all the way up to "usable". Again, writing files like that is pretty rare in the grand scheme of things but somehow it manages to happen to people.
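The poster does not describe the test harness, so the following is only a guess at its general shape: record a checksum for each file before the power cycle, then report which files are missing or changed afterwards. All function and file names here are hypothetical. The last helper reproduces the arithmetic behind the quoted 0.5 to 1.0% figure (100 to 200 failed boots out of about 20,000).

```python
import hashlib
import os
import tempfile

def sha256_of(path):
    """Hash a file in chunks so large recordings do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(paths):
    """Record the expected checksum of every file before the power cycle."""
    return {path: sha256_of(path) for path in paths}

def damaged_files(manifest):
    """Return the files that are missing or whose contents changed."""
    return sorted(
        path for path, expected in manifest.items()
        if not os.path.exists(path) or sha256_of(path) != expected
    )

def failure_rate(failures, trials):
    return failures / trials

# Demo in a scratch directory; appending bytes simulates a garbage tail.
workdir = tempfile.mkdtemp()
good = os.path.join(workdir, "good.bin")
bad = os.path.join(workdir, "bad.bin")
for path in (good, bad):
    with open(path, "wb") as handle:
        handle.write(b"MPEG-like payload" * 100)
manifest = build_manifest([good, bad])
with open(bad, "ab") as handle:
    handle.write(b"\xde\xad\xbe\xef")
damaged = damaged_files(manifest)

low, high = failure_rate(100, 20000), failure_rate(200, 20000)
print(damaged == [bad], f"{low:.1%} to {high:.1%}")
```

A checksum taken before the write completes cannot cover a file that was still open at power loss, which is why the poster treats those files as a separate category from boot failures.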

You can draw your own conclusions, I can tell you that I certainly don't personally use reiserfs for any data that I feel is important. In fact my systems are all like this: /boot is ext3 the rest is LVM with either JFS or XFS and I haven't lost data due to software failure since 1996.

Different types of journaling

July 19, 2006 - 6:57am
Anonymous (not verified)

Since you didn't mention anything about the journal mode you mounted each file system in, I assume you left them as default. By default the file systems do NOT guarantee that data is intact on power failures. Just that the file system metadata is intact and recover/mount can occur on reboot.

If you want to ensure that data is not corrupt you need to mount ReiserFS in journal mode, or EXT3 in journal mode. (I don't think JFS, or XFS have this option)

http://en.wikipedia.org/wiki/Journaling_file_system

From what you describe you got lucky that other file systems didn't corrupt data, and that ReiserFS was just doing what you TOLD it to do. The other issue is if you were using an IDE drive with write caching enabled, the drive will sometimes lie to the OS and claim data is written when it is just in the write cache and not actually on the disk. A power failure at this point is then a lot more likely to corrupt data.

If you don't want data to be corrupt on power failure, disable write caching, enable data=journal when you mount the file systems and try your test again. Most people don't do this though because it will cause performance to decrease, if this is you, then your data just isn't that important to you in the first place. You may be able to get away with just data=journal and keeping your write cache enabled, it all depends though.

Most of the problems I have read from people using one file system or the other just don't understand what is REALLY going on. "I tell my file system to NOT journal data (default settings), then when I pull the power cord data is corrupt! File system X sucks!"

"I just drove my Hummer off a cliff and now it doesn't run! Hummers suck!"

DUH, it just did what you told it to do!

Bullshit. You can enable d

July 19, 2006 - 6:49pm
Anonymous (not verified)

Bullshit.

You can enable data journaling at a performance penalty.

IDE write cache has nothing to do with Reiserfs writing bullshit to the file and Ext3, JFS, and XFS not doing that during the power failure.

Journaling of metadata is all you can really hope for in a DVR settop box, you lose some data during the failure, that's how it is. Putting bullshit data in to the file, for whatever reason, isn't acceptable.

Please explain why those other filesystems never did that and had equal or better performance?

Read the link...

July 19, 2006 - 8:51pm
Anonymous (not verified)

Read the link ( http://en.wikipedia.org/wiki/Journaling_file_system ), it explains what's going on:

"For example, appending to a file on a Unix file system typically involves three steps:

1. Increasing the size of the file in its inode.
2. Allocating space for the extension in the free space map.
3. Actually writing the appended data to the newly-allocated space.

In a metadata-only journal, it would not be clear after a crash whether step 3 was done or not, because it would not be logged. If step 3 was not done but steps 1 and 2 are replayed anyway after a crash, the file will gain a tail of garbage."
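The three steps quoted above can be acted out with a toy model. This is a deliberately simplified sketch, not how any real file system is implemented: the class, the 8-byte block size and the `?` filler standing in for stale on-disk bytes are all invented for illustration. Metadata changes (steps 1 and 2) go through a journal and are replayed after the crash; the data write (step 3) is not journaled.

```python
BLOCK = 8  # tiny hypothetical block size in bytes

class ToyDisk:
    """Toy model: metadata updates are journaled, data writes are not."""

    def __init__(self, nblocks):
        # Blocks start out holding stale bytes from earlier use.
        self.blocks = [b"?" * BLOCK for _ in range(nblocks)]
        self.file_size = 0      # step 1 updates this (size in the inode)
        self.file_blocks = []   # step 2 updates this (allocated blocks)
        self.next_free = 0
        self.journal = []       # metadata-only journal

    def append(self, data, crash_before_data=False):
        assert len(data) == BLOCK, "toy model appends whole blocks only"
        blk = self.next_free
        self.next_free += 1
        # Steps 1 and 2 are logged in the journal.
        self.journal.append((len(data), blk))
        if crash_before_data:
            return              # power lost: step 3 never reaches the disk
        self.blocks[blk] = data # step 3: the actual data write

    def replay(self):
        """After a crash, redo the journaled metadata changes."""
        for length, blk in self.journal:
            self.file_size += length
            self.file_blocks.append(blk)
        self.journal.clear()

    def read(self):
        raw = b"".join(self.blocks[b] for b in self.file_blocks)
        return raw[: self.file_size]

disk = ToyDisk(nblocks=4)
disk.append(b"AAAAAAAA")
disk.append(b"BBBBBBBB", crash_before_data=True)
disk.replay()
contents = disk.read()
print(contents)
```

After replay the file is 16 bytes long, but its second block still holds whatever was on the disk before, which is the "tail of garbage" the quoted passage describes.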

It goes on to say:

"The write cache in most operating systems will traditionally order its writes with an elevator sort (or some similar scheme) to maximize throughput. To avoid an out-of-order write hazard, writes for file data must be ordered in the sort so that they are committed to storage before their associated metadata. This can be tricky to implement because it requires coordination within the operating system kernel between the file system driver and write cache."

So the fact of the matter is that in meta-data journaling mode some things are out of the hands of the file system. In ANY case (and any file system), without data journaling enabled (or something similar), you're not guaranteed, you're just LUCKY if data isn't corrupt.
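The "data before metadata" ordering quoted above has a well-known user-space analogue that applications can use regardless of journal mode: write the new contents to a temporary file, force them to storage with `fsync`, and only then rename the file into place. The sketch below assumes a POSIX-like system and a hypothetical helper name; it illustrates the ordering idea and is not a description of what any file system does internally. Whether a drive's write cache honours the flush is a hardware matter outside the program's control.

```python
import os
import tempfile

def durable_write(path, data):
    """Replace `path` so that readers see the old or the new contents."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())    # data is pushed to storage first
        os.replace(tmp_path, path)    # then the name (metadata) is switched
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Best effort: also flush the directory entry (POSIX systems only).
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)

scratch = tempfile.mkdtemp()
target = os.path.join(scratch, "menu.lst")
durable_write(target, b"first version\n")
durable_write(target, b"second version\n")
with open(target, "rb") as handle:
    result = handle.read()
print(result)
```

The price is the same one discussed in this thread: each `fsync` forces a wait for the disk, so the pattern is usually reserved for files whose corruption would be costly.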

It seems to me that what you originally described is ReiserFS working as designed in meta-data journaling mode, while you EXPECTED it to work as if it was in data journaling mode. (which it obviously wasn't)

If you want a higher level of reliability, pay the performance price and enable data journaling, that's what it was designed for! If not, quit bitching that one FS sucks or another doesn't if you don't understand how to use them properly.

It is all very nice. But ther

July 22, 2006 - 9:03pm
Anonymous (not verified)

It is all very nice. But there is no LUCKY involved in 20,000 reboots. Also, he mentions that he talked to the reiserfs people; you would think they would mention something as trivial as enabling data journalling? In any event, you are right about highlighting the differences in journalling, but you have no argument at all for refuting the grandparent's assertion that reiserfs performs significantly worse in this area than the other filesystems.

This is not much of a problem, but reiserfs is somehow less reliable than the other options. And he is not the only one to observe this in a test setup. Most other people here just have anecdotal evidence of their own experience. I'd say this story is much more reliable.

Also, your last lines are very rude and in no way are appropiate here. The parent was not bitching, just stating a fact. In the end, I rather believe him than you.

Wikipedia is the filesystem bible now?

July 24, 2006 - 4:58pm
Anonymous (not verified)

UNIX doesn't define semantics for data journaling and what it means. Really only for meta-data. So think about how those operations can fail. You can truncate the file longer and then crash; that updates the directory structure (meta-data), and the operation either completes or doesn't. With non-logging filesystems you could fail in the middle of that operation and then wait on a lengthy fsck.

You can allocate space or not; again, this either fails or doesn't. Doing so correctly means zeroing out pages before you call them allocated, which is important for a lot of security reasons. It sounds like reiserfs might allocate pages and call them allocated before they are zeroed, and then during the power failure you end up with some data in the file from somewhere else.

Writing the data is the least important. If you have a power failure, data is always corrupted at some level. If you're writing a file, you lose the data that needs to be written the instant after the failure, regardless. What you want though is for the filesystem to maintain its integrity for the directory structure and anything allocated to the file to be "correct." Having pages "allocated" to the file that weren't actually or properly allocated is a bug.

I can't think of any valid reason or good reason to have random data in a file that was written to during a failure, especially if other filesystems don't suffer from that problem.
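For what it's worth, the "zeroed before allocated" expectation matches what POSIX promises in normal operation: when ftruncate() makes a file longer, the new region must read back as zeros. The sketch below, in C, checks exactly that. The function name is made up for illustration, and it says nothing about what a given filesystem leaves on disk after a crash, which is the case the thread is arguing about.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical check: create `path` with a short payload, extend it to
 * new_size with ftruncate(), and scan the extended region.
 * Returns 1 if every extended byte is zero, 0 if not, -1 on error. */
int extended_region_is_zero(const char *path, off_t new_size)
{
    static const char payload[] = "data";
    const off_t payload_len = (off_t)(sizeof payload - 1);
    assert(new_size >= payload_len);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fd < 0)
        return -1;

    int result = 1;
    if (write(fd, payload, (size_t)payload_len) != (ssize_t)payload_len ||
        ftruncate(fd, new_size) != 0 ||
        lseek(fd, payload_len, SEEK_SET) != payload_len)
        result = -1;

    /* Read everything after the payload and look for a non-zero byte. */
    unsigned char buf[4096];
    ssize_t n = 0;
    while (result == 1 && (n = read(fd, buf, sizeof buf)) > 0) {
        for (ssize_t i = 0; i < n; i++) {
            if (buf[i] != 0) {
                result = 0;
                break;
            }
        }
    }
    if (result == 1 && n < 0)
        result = -1;

    if (close(fd) != 0 && result == 1)
        result = -1;
    return result;
}
```

Garbage at the tail of a file after a power cut means this guarantee was not preserved across the crash, which is the behaviour being called a bug above.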

"Writing the data is the leas

July 25, 2006 - 1:16am
Anonymous (not verified)

"Writing the data is the least important. If you have a power failure, data is always corrupted at some level. If you're writing a file, you lose the data that needs to be written the instant after the failure, regardless. What you want though is for the filesystem to maintain its integrity for the directory structure and anything allocated to the file to be "correct." Having pages "allocated" to the file that weren't actually or properly allocated is a bug."

Writing the data is sure to succeed on certain other architectures such as later SGI MIPS computers. It is also sure to succeed if you have a UPS. If writing the data is not important to you, then put on write cache, or just fsck up your filesystem by doing sth stupid. The aim is to minimize or guarantee.

Journaling does not even aim to solve this though. Journaling aims for consistency of the filesystem.

"I can't think of any valid reason or good reason to have random data in a file that was written to during a failure, especially if other filesystems don't suffer from that problem."

It is explained on the Wikipedia entry... there is also the issue of enabling write cache which adds up to that...

out of order data on writes

August 28, 2006 - 6:15am
Anonymous (not verified)

This thread seems to hit on a problem we are having with the following code snippet - this is just a simple example to illustrate the problem. The code just appends data to a file, then writes the last record to offset 0 of the file.

    char *aaaa = "aaaaaaaaaaaa";
    remove( argv[1] );
    while ( 1 ) {
        fd = open( argv[1], O_RDWR|O_CREAT, S_IREAD|S_IWRITE );
        if ( fd >= 0 ) {
            memset( tmp, 0, sizeof(tmp) );
            sprintf( tmp, "%lld %s\n", nr, aaaa );
            len = strlen(tmp) + 1;
            lseek( fd, 0, SEEK_END );   /* append the new record */
            write( fd, tmp, len );
            lseek( fd, 0, SEEK_SET );   /* then copy it to offset 0 */
            write( fd, tmp, len );
            if ( elev ) { /* if set, always end up at end of file */
                lseek( fd, 0, SEEK_END );
            }
            close( fd );                /* release the descriptor before the next open() */
            nr++;
        }
    }

We start the program, wait a second or two, then power off and on.
An example of the results follows. As can be seen, a lot of data is lost.
The OS is Linux MontaVista 2.4.20_mvl31-pc_target; mount shows a filesystem of type ext3 (rw,noatime).
The /var/log/kern.log shows this:
kjournald starting. Commit interval 5 seconds
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Mounted devfs on /dev
Freeing unused kernel memory: 76k freed
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,1), internal journal
kjournald starting. Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,2), internal journal
EXT3-fs: mounted filesystem with ordered data mode.

>>
128180 aaaaaaaaaaaa

>>
99526 aaaaaaaaaaaa

The "if (elev ) {}" was an attempt to always position heads at end of file - same results however.

Is there something wrong with the code or is there maybe a change to the OS that would prevent this?

Thanks,

Jim

Sounds like you need the "sync" mount option or fsync(2)

August 28, 2006 - 11:34am

By default, unless asked otherwise, Linux will delay and reorder writes to increase performance. If this is bad for your application, there are at least four different ways to tell Linux that your writes must complete sooner (in rough order of performance, from worst to best):

  1. Use the "sync" mount option, which forces the kernel to write to disk immediately, rather than delay writes to a more convenient time.
  2. Use the O_SYNC flag to open(2), to tell the kernel to write changes to this file immediately, and not reorder them.
  3. Use the sync() function to write everything to disk at suitable checkpoints.
  4. Use the fsync() function to write this file's data to disk at a suitable checkpoint.

Only you know what tradeoff between performance and data integrity on unexpected shutdown suits your application.
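As an illustration of option 4, here is a sketch in C of one iteration of the loop from the question, reworked so that each record is flushed with fsync() before the function returns. The function name is hypothetical, and unlike the posted code it writes the record without its trailing NUL byte. Treat it as a starting point under those assumptions, not a drop-in fix.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: append one record, copy it to offset 0 as the
 * posted loop does, then fsync() as the checkpoint.
 * Returns 0 on success, -1 on any error. */
int write_record_with_checkpoint(const char *path, long long nr)
{
    char rec[64];
    int len = snprintf(rec, sizeof rec, "%lld %s\n", nr, "aaaaaaaaaaaa");
    assert(len > 0 && (size_t)len < sizeof rec);

    int fd = open(path, O_RDWR | O_CREAT, S_IRUSR | S_IWUSR);
    if (fd < 0)
        return -1;

    int rc = 0;
    if (lseek(fd, 0, SEEK_END) == (off_t)-1 ||
        write(fd, rec, (size_t)len) != (ssize_t)len ||
        lseek(fd, 0, SEEK_SET) == (off_t)-1 ||
        write(fd, rec, (size_t)len) != (ssize_t)len ||
        fsync(fd) != 0)      /* checkpoint: flush this file's data */
        rc = -1;

    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Calling fsync() on every record is the expensive end of the tradeoff; flushing every N records instead bounds the loss to the last N. Note that fsync() hands the data to the device, so a drive with its own volatile write cache enabled may still lose it on power-off.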

Arjan does have a point, thou

July 19, 2006 - 2:29am
Anonymous (not verified)

Arjan does have a point, though. There are technical objections to merging reiser4 at this point, in its current state, and until those are addressed, it cannot (should not) go in.

Things like "it's faster than ext3" etc. are totally irrelevant really; it could be dog slow, use antiquated technology, and have arbitrary restrictions, and it still could be merged if it's actually being used and if it plays along nicely with the rest of the kernel, both in terms of functionality (e.g., no side effects etc.) and in terms of conventions (e.g., coding style etc.).

I think what Arjan is annoyed by is the fact that people don't seem to get this. Whether reiser4 is faster or not may matter to those who actually decide to *use* it later on, but it's meaningless for the question of whether it should be merged.

Given that, suggesting that the user in question actually contribute something meaningful isn't absurd - it's an appropriate reaction to an email that's not only pointless, but in fact a waste of time. Arjan probably has better things to do than to read even more advocacy that completely misses the point and just shows that the one who wrote it didn't even understand just why reiser4 is not merged yet.

All the linux filesystems have the wrong focus IMO

July 18, 2006 - 8:47am
Anonymous (not verified)

I want the focus on bulletproof reliability and acceptable performance for that bulletproofness.

The focus in zfs is on checksumming all data blocks live, with checksumming and multiple copies of file metadata.

Self healing with mirror and raidz of data.

Self healing of metadata on all disks.

Atomic operations...no journalling to replay (though some for perf reasons).

There are large classes of errors that are not caught without these features...look at some of the zfs marketing stuff before you reply about raid and smart level stuff.

I don't see any interest in these features from reiserfs and ext but instead on performance optimizations etc.

As more sysadmins look at zfs more seriously I think they will increasingly be blown away by how inadequate the other filesystem designs are. The number of core problems addressed by zfs is revolutionary. More to do for sure, but it's dealing with real problems faced by sysadmins and not solely chasing performance.

The COW/snapshot features are just another really cool bonus but once again solve issues for working sysadmins.

I'd love to have a zfs-like (zfs fuse in a few months probably) filesystem for some disks and a super performing reiser/ext/xfs for some disks that can be easily reconstructed when typical flaky behaviour surfaces. Though zfs seems like it has pretty competitive performance anyway...

The kernel filesystem summit (lwn.net has coverage) detailed some interesting ideas for future filesystems but I was disappointed to read of only limp interest in ubiquitous checksumming and reliability.

I hope zfs-fuse is an acceptable solution because if not this is one sysadmin that will be deploying opensolaris/nexenta more and more over the coming years.

1) Criticism shall be aimed t

July 18, 2006 - 10:14am
Anonymous (not verified)

1) Criticism shall be aimed at FILESYSTEMS, not at the people behind them.

2) IF Reiser4 is stable enough to merge, then it SHOULD be merged.

I have hope that both options are viable, and that they will be reached in due time. Maybe even this year.

But give them enough time. The past is past, we want the present. And future.

Regarding 1): certainly, when

July 19, 2006 - 2:34am
Anonymous (not verified)

Regarding 1): certainly, when someone reviews your code, in their spare time, free of charge, you should treat them with some respect, too, though, instead of slinging mud at them and accusing them of having an agenda whenever they criticise something.

That's a problem with Hans Reiser: he feels cornered whenever his baby is criticised, gets defensive and lashes out. Is it a surprise that at some point, no one's really willing to put up with that anymore? I don't think so.

So if reiser4 is limping along because there are not enough people willing to review it, Hans Reiser only has himself to blame. I'm sure people would be willing to let the past be the past if he showed that he realised that he acted like a schmuck and that he was willing to change, but he hasn't even done that.

So, yes, it should be the code you object to, not the people; but on the other hand, while you don't have to be a very likeable fellow to get your code merged, you should at least try not to be a total *rse, and if you are, then you have no right to complain that people see you as just that.

Reiser4 is the closest we have...

July 18, 2006 - 2:04pm
Anonymous (not verified)

You gotta give Hans credit. He is the only one I have heard of that is truly pushing the limits of file system design. What other file system has a framework for plugins?

You want file checksumming like ZFS? Write a plugin. You want compression/encryption? Use the plugin that already exists. No need to switch file systems in the process. In my opinion this is one reason why Linux supports so many different file systems: because there aren't any that are FLEXIBLE. A great example is the Journalling Flash FS (JFFS); basically it's optimized for wear leveling on flash devices. A simple allocation plugin for Reiser4 would probably accomplish the same thing, and still give you all the other advantages of Reiser4 (speed, atomicity, repacker, etc...). And they said we can't have our cake and eat it too! :)

This doesn't even touch on the pseudo file functionality that Hans is working on for Reiser4 either.

In my opinion Reiser4 has the potential to offer more to the Linux community than any other kernel feature I've seen since v2.6 was released.

But only time will tell.

Why?

July 18, 2006 - 2:36pm
Carlos Rodrigues (not verified)

"What other file system has a framework for plugins?"

Plugins are a nice touch, if reiser4 were a graphical application I'd be expecting themability to come along with it...

Filesystems store data, reliability should be their primary concern, with performance as a distant second. Features are just for show, who actually cares about all those new features provided by reiser4? Almost nobody except the fanboys who won't actually use them, they just like them for the bragging value.

Plugins are no substitute for good design, and if the kernel maintainers say it isn't ready, then it isn't. With all the choice available now, why all the rush to have reiser4 included? If it were that important, most people wouldn't be using ext3...

Why not?

July 18, 2006 - 3:26pm
Anonymous (not verified)

Who said anything about reliability not being a primary concern? Of course it is, no sane person designs a file system without reliability as a primary concern. Reiser4 is no exception, it is an atomic file system after all.

I disagree with you completely, you could have some mystical file system that guaranteed you would never lose data, but if it was 10x slower than the next slowest file system, hardly anyone would use it. Both reliability and speed need to be there for a file system to gain any sort of mass appeal.

Features Reiser4 offers are features ANYONE can, and should use in a lot of cases. Compression? Come on, what desktop user wouldn't want to compress /var/log? (no, log rotation is not the same) What desktop user wouldn't want to compress their email folder, (I could save gigabytes!) or encrypt their email folder? Online compression is a killer feature alone.

Kernel maintainers aren't questioning Reiser4's design as a whole at all; they are questioning Reiser4 having code that should be in the VFS. The problem with that is twofold. One, no other file system currently _needs_ these new VFS calls, and two, someone has to agree on what the calls should be and what they should do, and then actually get around to coding them so EVERY file system can make use of them if it wants.

Fortunately my understanding is that most of that work is almost done. So if Hans and friends are so poor at design and coding, why are the kernel maintainers insisting that THEY code new VFS calls that every file system can use? (search LKML for batch_write/batch_read to see for yourself.)

What the kernel maintainers are basically saying is: Hey, you got a good idea here, we want every file system to be able to do this too! But we won't let you in the kernel until you do most of the leg work and code these features for everyone else too.

Which obviously makes a lot of sense from the kernel maintainers' point of view, and much less sense for Hans' time and payroll costs.

Why most people use EXT3 is because it and every other file system does just one thing. Stores data. No multi-purpose ones offer compression, none offer encryption, and none have plugins.

It's not a matter of IF you see Reiser4 in the kernel, but when.

Features Reiser4 offers are f

July 19, 2006 - 11:59pm
Anonymous (not verified)

Features Reiser4 offers are features ANYONE can, and should use in a lot of cases. Compression? Come on, what desktop user wouldn't want to compress /var/log? (no, log rotation is not the same) What desktop user wouldn't want to compress their email folder, (I could save gigabytes!) or encrypt their email folder? Online compression is a killer feature alone.

My /var/log is compressed using log rotation. Old logs, which I hardly ever touch, are compressed. New logs aren't compressed and are being written to. The result is a good performance/space ratio. In any case, zless/less allow me to read my logs whether compressed or not. Your feature wouldn't save me gigabytes at all since I already use log rotation (you don't? or you don't have a desktop system? wrong comparison then...). Besides, double compression would be quite stupid.

My e-mail folder is not encrypted since... I don't have to. If I'd like to encrypt my personal data I'd encrypt my whole /home using dm-crypt. What is wrong with dm-crypt?

All the above with Ext3FS! And userland applications. Never any corruption (different with ReiserFS3 and XFS). Which is the way I like it. I use Vi, you use Emacs. If you want to use your ReiserFS4 or sth then go patch a vanilla kernel for now.

Hmm... Some time ago some peo

July 19, 2006 - 9:21am
Kamil (not verified)

Hmm... Some time ago some people said that no one needs computers.

In 1993 people said that no one needs ext2 because it's too unstable, too "futuristic", and Linux users already have xiafs. Today ext3=ext2+journal; does anyone remember xiafs?

"Filesystems store data, reliability should be their primary concern, with performance as a distant second. Features are just for show, who actually cares about all those new features provided by reiser4?"

Funny, most people I know use ext3 and reiserfs in the default (fast but insecure for data) meta-journal mode, even though both filesystems can use a fully secure (but very slow) block journal. Reiser4 is the first fs on this planet that uses a fast block journal by default. So Reiser4 is more reliable than ext3/reiserfs. Reliability is most important for you? Move quickly to Reiser4.

No

July 19, 2006 - 12:29pm

>Reliability is most important for you? Move quickly to Reiser4.

Does Reiser4 have a reliable fsck yet?

I have been burned by ReiserFS 3's poor fsck, which didn't manage to recover a partition; that, plus it once wrote binary data at the end of the /etc/passwd file..

Once burned, twice shy, I'm not going to use something made by namesys for a *long* time..
