LinuxQuestions.org
07-11-2005, 06:51 PM   #1
dcabbar
LQ Newbie
 
Registered: Jul 2005
Posts: 4

Rep: Reputation: 0
10 million files on Linux?


I am planning a project where I need to store around 10 million (or more) images on one or more dedicated servers. At this point, I am trying to decide whether I should store them in a database or on the filesystem.

So my question is: do you think Linux can handle this many files? I will probably store them on a RAID 5 system, and of course I will partition them over 4 levels of directories so that no directory contains more than 100 files or 100 subdirectories, but I am still curious to know whether Linux can handle this.

Does anyone have experience with this many files on Linux filesystems? Also, which filesystem would you recommend (in terms of reliability and/or speed)? Any suggestions or feedback?

Thanks...
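The four-level layout described above (at most 100 entries per directory) can be derived mechanically from a numeric image ID. A minimal Python sketch, assuming integer IDs and two decimal digits per directory level; `image_path`, the `/srv/images` root and the `.jpg` extension are hypothetical:

```python
import posixpath


def image_path(root: str, image_id: int, levels: int = 4, ext: str = ".jpg") -> str:
    """Map a numeric image ID to root/d1/d2/.../dN/<id><ext>.

    Each directory level is named by two decimal digits, so it holds at most
    100 subdirectories, and each leaf directory holds at most 100 files
    (the last two digits of the ID select the file).
    """
    width = 2 * (levels + 1)  # two digits per directory level plus two for the file
    if not 0 <= image_id < 10 ** width:
        raise ValueError(f"image_id {image_id} does not fit in {width} digits")
    digits = f"{image_id:0{width}d}"  # zero-pad, e.g. 1234567 -> "0001234567"
    dirs = [digits[i:i + 2] for i in range(0, 2 * levels, 2)]
    return posixpath.join(root, *dirs, digits + ext)


if __name__ == "__main__":
    print(image_path("/srv/images", 1234567))  # prints /srv/images/00/01/23/45/0001234567.jpg
```

With IDs below 100 million the first level only ever contains `00`, so for about 10 million images three levels would give the same per-directory limits; hashing the ID before splitting it (not shown) is a common alternative when IDs are not dense.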
 
07-11-2005, 07:05 PM   #2
marghorp
Senior Member
 
Registered: Jan 2004
Location: Slovenia
Distribution: Slackware 10.1, SLAX to the MAX :)
Posts: 1,040

Rep: Reputation: 45
For speed and security I would suggest ReiserFS. I don't know whether it can handle that many files, though; read the docs.
 
07-11-2005, 11:22 PM   #3
Noth
Member
 
Registered: Jun 2005
Distribution: Debian
Posts: 356

Rep: Reputation: 30
For data integrity I would recommend against ReiserFS; every time I give it a try something bad happens. And I believe there's a problem with the hashing algorithms in reiser3 that can cause hash collisions and lose files.

If you want to store them on a filesystem, I would suggest XFS. It has a much better track record than reiser3, and its userland tools are much more complete.

I've currently got an XFS filesystem on a software mirror with 380,800 files on it. That's nowhere near 10 million, but the box is an older Sun Ultra2 with 1.5 GB of memory and it has no performance problems at all, so I would figure that on a more powerful machine you won't have any problems either.

But if you're going to store them in a database, you'll probably need something like Oracle for it to work well anyway.
 
07-11-2005, 11:27 PM   #4
KimVette
Senior Member
 
Registered: Dec 2004
Location: Lee, NH
Distribution: OpenSUSE, CentOS, RHEL
Posts: 1,794

Rep: Reputation: 46
Quote:
Originally posted by Noth
For data integrity I would recommend against ReiserFS; every time I give it a try something bad happens.
EVERY time? Have you considered the possibility that you may have hardware issues, or perhaps it's user error?
 
07-11-2005, 11:30 PM   #5
Noth
Member
 
Registered: Jun 2005
Distribution: Debian
Posts: 356

Rep: Reputation: 30
Quote:
Originally posted by KimVette
EVERY time? Have you considered the possibility that you may have hardware issues, or perhaps it's user error?
I have, except for the fact that the hardware has almost always been different. And the last time there was absolutely no way I could have caused the problem. I don't remember the incidents prior to that, but the fact that XFS has been flawless since the SGI 1.0 release and reiserfs bombs out regularly is reason enough to not use it, IMO.
 
07-11-2005, 11:55 PM   #6
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,971
Blog Entries: 11

Rep: Reputation: 876
Hmmm ... I've been using Reiser exclusively since 1999 (when
SuSE still patched it into the 2.2 kernels themselves), on varied
hardware, never had any troubles - the broken 2.4.xx kernels
(I forgot which ones they were) I didn't install - I suppose there's
a benefit to me being reasonably conservative with upgrades.

[edit]
To the OP:

Reiser (3.5 and 3.6) can hold roughly 4.3 billion files,
with a limit of ~518 million per directory for 3.5 and
~4.3 billion for 3.6.
[/edit]
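For scale (a back-of-the-envelope check, not part of the original post): the ~4.3 billion figure is about 2^32, so 10 million images would use well under 1% of it:

```latex
2^{32} = 4\,294\,967\,296 \approx 4.3 \times 10^{9},
\qquad
\frac{10^{7}}{2^{32}} \approx 0.0023 = 0.23\%
```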

Cheers,
Tink

Last edited by Tinkster; 07-12-2005 at 12:03 AM.
 
07-12-2005, 12:36 AM   #7
Noth
Member
 
Registered: Jun 2005
Distribution: Debian
Posts: 356

Rep: Reputation: 30
Quote:
Originally posted by Tinkster
Hmmm ... I've been using Reiser exclusively since 1999 (when
SuSE still patched it into the 2.2 kernels themselves), on varied
hardware, never had any troubles - the broken 2.4.xx kernels
(I forgot which ones they were) I didn't install - I suppose there's
a benefit to me being reasonably conservative with upgrades.
Well, SuSE essentially maintains reiser3 now since Hans and his team don't, so I wouldn't be surprised if the version in their kernels works better.

Quote:
[edit]
To the OP:

Reiser (3.5 and 3.6) can hold roughly 4.3 billion files,
with a limit of ~518 million per directory for 3.5 and
~4.3 billion for 3.6.
[/edit]

Cheers,
Tink
I don't know the exact numbers for XFS, but IMO the XFS userland tools put it way ahead of reiserfs and even ahead of ext3: xfs_admin, xfs_db, xfsdump, xfs_freeze, etc. And if he's going to use 4 levels of directories with no more than 100 entries per directory, just about any filesystem should work fine.
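To confirm that a directory tree really stays within a limit like 100 entries per directory, a quick walk can report the largest directory. A minimal Python sketch (the name `max_fanout` is illustrative); it counts files and subdirectories alike, and symlinked directories are counted as entries but not followed:

```python
import os
from typing import Tuple


def max_fanout(root: str) -> Tuple[int, str]:
    """Return (entry_count, path) for the directory under `root` with the most entries."""
    if not os.path.isdir(root):
        raise NotADirectoryError(root)
    best_count, best_path = 0, root
    # os.walk does not follow symlinked directories unless followlinks=True.
    for dirpath, dirnames, filenames in os.walk(root):
        count = len(dirnames) + len(filenames)
        if count > best_count:
            best_count, best_path = count, dirpath
    return best_count, best_path
```

Running `max_fanout("/srv/images")` (a hypothetical root) after loading data should return a count of at most 100 if the layout is working as intended.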
 
07-12-2005, 01:43 AM   #8
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,971
Blog Entries: 11

Rep: Reputation: 876
Quote:
Originally posted by Noth
Well, SuSE essentially maintains reiser3 now since Hans and his team don't, so I wouldn't be surprised if the version in their kernels works better.
I wouldn't know - I've been using Slack with a stock-kernel from
kernel.org for the last three years. ;}

And on namesys website there's no word of SuSE
running the joint. Where did you get the idea from
that "Hans and his team" gave the product up?


Cheers,
Tink
 
07-12-2005, 02:46 AM   #9
slackie1000
Senior Member
 
Registered: Dec 2003
Location: Brasil
Distribution: Arch
Posts: 1,037

Rep: Reputation: 45
hi there,
the information about reiser3 is incomplete.
you can take a better look here.
they still fix bugs.
regards
slackie1000
 
07-12-2005, 09:41 AM   #10
KimVette
Senior Member
 
Registered: Dec 2004
Location: Lee, NH
Distribution: OpenSUSE, CentOS, RHEL
Posts: 1,794

Rep: Reputation: 46
Quote:
Originally posted by Noth
I have, except for the fact that the hardware has almost always been different. And the last time there was absolutely no way I could have caused the problem. I don't remember the incidents prior to that, but the fact that XFS has been flawless since the SGI 1.0 release and reiserfs bombs out regularly is reason enough to not use it, IMO.
(thanks for not reading a flame into my post, I did not mean ill intent. I realized after I'd left the office that my brief comment could be misinterpreted as a flame)

That's odd. I've had nothing but great success with Reiser. In fact my previous motherboard (Abit VP6) caught fire (that was my last Abit board ever - I used to be a big Abit fan until last year after experiencing their now-piss-poor support and HORRIBLE RMA quality control) and my hard drives got scrambled, including customer quotes I was working on, so I downloaded a Suse live CD and used reiserfsck and was able to recover 100% of my data after reiserfsck replayed the journal and rebuilt the tree. I've experienced data loss on ext2 due to far less serious circumstances.

I haven't used XFS on anything other than IRIX (on SGI boxen of course), but based on what I've read about XFS on Linux, it seems (based on some /. postings) that XFS is an incomplete port and not fully implemented by SGI (particularly the journaling features), so I'm hesitant to trust XFS on Linux. Of course that's based on slashdotters who claim to be ex-SGI engineers who may have a chip on their shoulders due to SGI's apparent corporate suicide in recent years.

However if you say it's solid on Intel I'll definitely have to give it a try on our next Linux box since XFS is known for fragmenting even less than ReiserFS. One question though: is XFS on Linux a zero-slack filesystem? I have a LOT of small source code files that would not take up an entire block so if I were to use a conventional filesystem on a CVS or even development box I'd be wasting a lot of space in slack - that was the main reason I initially tried ReiserFS, with the second reason being that it was designed with journaling in mind from the beginning, and not an afterthought hack like ext3, and of course the (for all practical purposes) unlimited file size to allow for large video NLE projects is a nice plus as well.
 
07-12-2005, 09:58 AM   #11
Noth
Member
 
Registered: Jun 2005
Distribution: Debian
Posts: 356

Rep: Reputation: 30
Quote:
I wouldn't know - I've been using Slack with a stock-kernel from kernel.org for the last three years. ;}
My condolences =)

Quote:
And on namesys website there's no word of SuSE
running the joint. Where did you get the idea from
that "Hans and his team" gave the product up?
They gave up when they started working on reiser4. If you look at the commit logs (the URL was posted to lkml a few days ago) for the reiserfs code in the 2.6.x kernel, almost all of the patches have been from SuSE, with a few generic maintenance patches from other kernel devs. Hell, it was a SuSE developer who added journaling to reiser3; when Namesys submitted it to the kernel it didn't do journaling at all.

Quote:
That's odd. I've had nothing but great success with Reiser. In fact my previous motherboard (Abit VP6) caught fire (that was my last Abit board ever - I used to be a big Abit fan until last year after experiencing their now-piss-poor support and HORRIBLE RMA quality control) and my hard drives got scrambled, including customer quotes I was working on, so I downloaded a Suse live CD and used reiserfsck and was able to recover 100% of my data after reiserfsck replayed the journal and rebuilt the tree. I've experienced data loss on ext2 due to far less serious circumstances.
Well, I've had both a journal replay via mount and reiserfsck tell me a filesystem was fine, even though reading a certain file on that filesystem would cause the screen to go black and the box to panic, and that was only a few months ago.

Quote:
I haven't used XFS on anything other than IRIX (on SGI boxen of course), but based on what I've read about XFS on Linux, it seems (based on some /. postings) that XFS is an incomplete port and not fully implemented by SGI (particularly the journaling features), so I'm hesitant to trust XFS on Linux. Of course that's based on slashdotters who claim to be ex-SGI engineers who may have a chip on their shoulders due to SGI's apparent corporate suicide in recent years.
There are a few things missing, but mostly things caused by limitations in Linux, like the filesystem block size having to be the page size or smaller. I don't know about the journaling features of XFS on IRIX, so I can't comment on any of those features not being in the Linux port.
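The two sizes in that constraint can be read from Python on Linux: the memory page size via `os.sysconf("SC_PAGE_SIZE")` and the filesystem's block size via `os.statvfs()`. A minimal sketch; note that `f_bsize` is simply what statvfs reports, which on some filesystems is a preferred I/O size rather than the on-disk block size, so treat the result as indicative:

```python
import os
from typing import Tuple


def block_size_fits_page(path: str = "/") -> Tuple[int, int, bool]:
    """Return (fs_block_size, page_size, block_size <= page_size) for the filesystem holding `path`."""
    page_size = os.sysconf("SC_PAGE_SIZE")  # memory page size in bytes (POSIX systems only)
    block_size = os.statvfs(path).f_bsize  # block size as reported by statvfs for this filesystem
    return block_size, page_size, block_size <= page_size
```

On a typical x86 box both values come out as 4096; architectures with larger pages (such as the Alpha and Sparc64 machines mentioned here, which use 8 KB pages) allow correspondingly larger XFS block sizes.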

Quote:
However if you say it's solid on Intel I'll definitely have to give it a try on our next Linux box since XFS is known for fragmenting even less than ReiserFS.
I have XFS running on Intel, Alpha and Sparc64 right now without any issues. And XFS has a defragmentation tool, xfs_fsr, if you really need that sort of thing.

Quote:
One question though: is XFS on Linux a zero-slack filesystem? I have a LOT of small source code files that would not take up an entire block so if I were to use a conventional filesystem on a CVS or even development box I'd be wasting a lot of space in slack - that was the main reason I initially tried ReiserFS, with the second reason being that it was designed with journaling in mind from the beginning, and not an afterthought hack like ext3, and of course the (for all practical purposes) unlimited file size to allow for large video NLE projects is a nice plus as well.
That's a good question and I don't know the answer. But you could always run xfs_estimate on your CVS root and see how it compares to the current amount of space being used.
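A rough figure for slack can also be computed directly by rounding each file size up to a whole number of blocks. This Python sketch assumes a conventional filesystem with whole-block allocation and no tail packing or inline data, and it ignores metadata, so it approximates slack rather than measuring what XFS, ReiserFS or ext3 would actually use. The function names and the 4096-byte default block size are assumptions:

```python
import os
from typing import Tuple


def allocated_bytes(size: int, block_size: int) -> int:
    """Space a file of `size` bytes occupies if allocation is in whole blocks."""
    return -(-size // block_size) * block_size  # ceiling division, then back to bytes


def estimate_slack(root: str, block_size: int = 4096) -> Tuple[int, int]:
    """Return (data_bytes, allocated_bytes) for the regular files under `root`."""
    data = allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            size = os.path.getsize(path)
            data += size
            allocated += allocated_bytes(size, block_size)
    return data, allocated
```

The slack is `allocated - data`; running it on a CVS root with a few different block sizes shows how much small source files would waste on a filesystem without tail packing.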

And your second reason seems to be invalid: Hans told me recently that reiser3 didn't ship with journaling support at all and that it was added later by a SuSE developer. I had always thought it was there from the beginning as well, but apparently that's not true.
 
07-12-2005, 11:38 AM   #12
sundialsvcs
Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 5,327

Rep: Reputation: 1099
I would recommend that you index the images on the database, but store them in the filesystem.

As a general rule, databases aren't too good at managing "binary large objects." The acronym "BLOB" is apropos in more ways than one...

Plus, if you store the images as files, and you're getting to them through a browser, the web-server can deliver the files (and the browser can request them) without pushing that traffic through the SQL server.
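A minimal sketch of "index in the database, images on the filesystem", using Python's built-in `sqlite3` purely as a stand-in for whatever SQL server is actually used (the post doesn't name one). The `images` table, its columns, `shard_path`, and the `.jpg` extension are all hypothetical:

```python
import os
import sqlite3
from typing import Optional


def shard_path(image_id: int) -> str:
    """Hypothetical layout: 10-digit zero-padded ID, two digits per directory level."""
    d = f"{image_id:010d}"
    return os.path.join(d[0:2], d[2:4], d[4:6], d[6:8], d + ".jpg")


def open_index(db_path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        " id INTEGER PRIMARY KEY,"
        " rel_path TEXT NOT NULL,"
        " size_bytes INTEGER NOT NULL)"
    )
    return conn


def store_image(conn: sqlite3.Connection, root: str, image_id: int, data: bytes) -> str:
    """Write the image bytes to the filesystem and record only metadata in the database."""
    rel = shard_path(image_id)
    full = os.path.join(root, rel)
    os.makedirs(os.path.dirname(full), exist_ok=True)
    with open(full, "wb") as f:
        f.write(data)
    with conn:  # commits the row on success, rolls back on error
        conn.execute(
            "INSERT INTO images (id, rel_path, size_bytes) VALUES (?, ?, ?)",
            (image_id, rel, len(data)),
        )
    return full


def lookup(conn: sqlite3.Connection, root: str, image_id: int) -> Optional[str]:
    """Return the absolute file path for `image_id`, or None if it is not indexed."""
    row = conn.execute("SELECT rel_path FROM images WHERE id = ?", (image_id,)).fetchone()
    return None if row is None else os.path.join(root, row[0])
```

The file is written before the index row is inserted, so if writing the file raises, no row is added. The database only ever hands out a relative path, which the web server can then serve from `root` directly.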

Give some thought to your file-naming convention. You really don't want thousands of files in one directory. For example, Microsoft stores its technical documents in a directory structure suggested by the filename. A document named "TQ12345678" might be stored as "TQ/1/234/56/TQ12345678.doc". Thus no directory holds more than about a thousand entries (the three-digit level can have at most 1,000 subdirectories, 000 through 999), and you can "zoom" directly to the directory that you need. It's a pretty sensible arrangement.
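The Microsoft-style layout can be sketched as a function that peels fixed-width digit groups off the name. This assumes, as in the example, names made of a letter prefix followed by exactly eight digits and groups of 1, 3 and 2 digits; `doc_path` is a hypothetical name:

```python
import posixpath
import re
from typing import Sequence

# Assumed name format: letters followed by exactly eight digits, e.g. "TQ12345678".
_NAME = re.compile(r"([A-Za-z]+)(\d{8})")


def doc_path(name: str, widths: Sequence[int] = (1, 3, 2), ext: str = ".doc") -> str:
    """Build prefix/g1/g2/.../name+ext from fixed-width groups of the name's digits."""
    m = _NAME.fullmatch(name)
    if m is None:
        raise ValueError(f"unexpected document name: {name!r}")
    prefix, digits = m.groups()
    parts, pos = [prefix], 0
    for w in widths:
        parts.append(digits[pos:pos + w])
        pos += w
    return posixpath.join(*parts, name + ext)
```

`doc_path("TQ12345678")` gives `TQ/1/234/56/TQ12345678.doc`. With these widths the three-digit level is the widest (up to 1,000 subdirectories), and each leaf directory holds at most 100 documents, distinguished by the last two digits.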

The "symbolic links" capability of Linux allows you to distribute that directory structure among many different drives.
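As an illustration of spreading the tree across drives with symbolic links, this Python sketch creates each top-level directory on one of several mount points, round-robin, and links it into a single root. The mount points (for example `/mnt/disk0` and `/mnt/disk1`) and the round-robin policy are assumptions, not part of the original post:

```python
import os
from typing import Dict, List


def spread_top_level(root: str, mounts: List[str], top_dirs: List[str]) -> Dict[str, str]:
    """Create each top-level directory on one of `mounts` (round-robin) and link it into `root`.

    Returns {top_dir: real_directory}. Raises FileExistsError if root/<top_dir> already exists.
    """
    if not mounts:
        raise ValueError("need at least one mount point")
    os.makedirs(root, exist_ok=True)
    placement = {}
    for i, top in enumerate(top_dirs):
        target = os.path.join(mounts[i % len(mounts)], top)  # real directory on some drive
        os.makedirs(target, exist_ok=True)
        os.symlink(target, os.path.join(root, top))  # root/<top> -> target
        placement[top] = target
    return placement
```

Paths under the root stay the same no matter which drive holds the data, so application code and URLs don't change if a top-level directory is later moved to another disk and its link updated.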

The choice of filesystems is really up to you, but it must support journaling. (For example "ext3" instead of "ext2.") If the system goes down and there's a need to check and/or recover the drive, the journal will be your savior.
 
  


All times are GMT -5.
