Solaris / OpenSolaris
This forum is for the discussion of Solaris and OpenSolaris.
General Sun, SunOS and Sparc related questions also go here.
Regarding zpool and zfs, this is my understanding:
zpool = a group of (usually) drives together in a pool, e.g. 4x200GB drives = 1x800GB pool
zfs = the filesystem associated with a pool. It is here that you create a file system based on the redundancy you need. I.e., you could create 2x400GB fs in some RAID-Z config which would return 2x300GB of usable redundant file system space, yes?
A pool is indeed a group of disk space (whole disks, partitions, plain files), but it is also where you define whether mirroring or raidz is used, and where separate intent log and spare devices are configured.
A pool contains an unlimited number of zfs filesystems, including an implicit one, and possibly snapshots, clones and volumes. All of these beasts share the same zpool space. This is one of the very cool things about zfs. You no longer need to split your disks into smaller zones (primary partitions, slices, extended partitions) to lay out filesystems. They all share the same pool space, so there is no risk of one filesystem running out of space while plenty is available elsewhere.
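To make the shared-space point concrete, here is a minimal sketch of creating one raidz pool and two filesystems in it. The device names (c1t0d0 and so on) and the function name are made up for illustration, and the script only prints the commands unless DRY_RUN=0 is set on a machine that really has ZFS.

```shell
#!/bin/sh
# Sketch only: device and dataset names below are hypothetical.
# By default the commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

make_pool_and_filesystems() {
    pool=$1
    shift
    # One raidz group built from all the devices given as arguments.
    run zpool create "$pool" raidz "$@"
    # Two filesystems; neither is given a size, both draw on the pool.
    run zfs create "$pool/fs1"
    run zfs create "$pool/fs2"
    # List the pool's filesystems with their USED and AVAIL figures.
    run zfs list -r "$pool"
}

make_pool_and_filesystems myPool c1t0d0 c1t1d0 c1t2d0 c1t3d0
```

On a real system, the AVAIL column of the last command should show the same figure for fs1 and fs2, since neither has a reserved size of its own.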
Actually your first 2 sentences have explained things very well there, thanks!
Three questions though (you knew there'd be questions):
1. What is the intent log though?
2. On the implicit side of things, what is meant by an 'implicit one'? You can have many but exactly ONE implicit one? What would be the difference between one implicit one and a non-implicit one?
3. Disregarding snapshots and clones, what are volumes? Are they the file system locations used by the end user? I was thinking that the zfs was the end user file system. I think this is part of my confusion. And what would be the difference between creating zfs file systems and volumes?
Am I on the right track here? delete or correct whatever is wrong please
Code:
-+ (disk1 + disk2 + disk3 + disk4)
 |
 +-+ myPool (zpool) (should this be mountable and can it be used by the end user?)
   |
   +-+ fs1 (zfs) (mountable and end user can use this space)
   | |
   | +-+ myVol1 (volume) (mountable and usable space)
   | |
   | +-+ myVol2 (volume)
   |
   +-+ fs2 (zfs) (mountable and end user can use this space)
     |
     +-+ myVol3 (volume) (is this allowed? advantages/disadvantages?)
Quote:
1. What is the intent log though?
When synchronous writes are required by an application, performance can be greatly improved by using journaling (aka logging) techniques. Separating the intent log provides even better performance. You shouldn't care that much about it though for a simple file server.
Quote:
2. On the implicit side of things, what is meant by an 'implicit one'?
I mean this is a zfs filesystem you do not need to create with the "zfs create" command. As soon as you create a pool, there is a zfs filesystem available with the same name as the pool.
Quote:
You can have many but exactly ONE implicit one?
Yes.
Quote:
What would be the difference between one implicit one and a non-implicit one?
None, apart from the fact that you didn't create the former explicitly.
Quote:
3. Disregarding snapshots and clones, what are volumes?
Fixed-size raw data storage areas. You probably won't need them either in your use case. They are used by specific applications that prefer to manage their own data instead of using a filesystem, or when you want another kind of filesystem (ufs, fat32, whatever) while still taking advantage of the zpool redundancy and flexibility.
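For what it's worth, a volume is created with the -V option of zfs create and then shows up as a block device. A rough sketch, assuming a pool called myPool and a 1 GB volume (the names, size and mount point are made up); the /dev/zvol paths are the Solaris locations, and the commands are only printed unless DRY_RUN=0 is set:

```shell
#!/bin/sh
# Sketch only: pool, volume and mount point names are hypothetical.
# By default the commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

make_ufs_on_volume() {
    pool=$1
    vol=$2
    size=$3
    mnt=$4
    # -V creates a fixed-size volume instead of a zfs filesystem.
    run zfs create -V "$size" "$pool/$vol"
    # Build a UFS filesystem on the raw device (newfs asks for confirmation).
    run newfs "/dev/zvol/rdsk/$pool/$vol"
    # Mount the block device like any other disk slice.
    run mount "/dev/zvol/dsk/$pool/$vol" "$mnt"
}

make_ufs_on_volume myPool myVol1 1g /mnt
```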
Quote:
Are they the file system locations used by the end user?
Not at all.
Quote:
I was thinking that the zfs was the end user file system.
This is the case.
Quote:
Am I on the right track here? delete or correct whatever is wrong please
Code:
-+ (disk1 + disk2 + disk3 + disk4)
 |
 +-+ myPool (zpool) (is also a usable and mounted zfs)
   |
   +-+ fs1 (zfs) (mountable and end user can use this space)
   |
   +-+ myVol1 (volume) (unmountable, raw/block space)
   | |
   | + fs2 (ext2fs for example) (mountable, usable)
   |
   +-+ fs3 (zfs) (mountable and end user can use this space)
     |
     + fs4 (zfs) //
Wow, I actually wanted to get back and edit my map above before you saw it, I have realised some of the errors of my ways. But still didn't have my head around volumes. But I do now. That is great thanks for that.
I ran a test, very similar to this and I have to admit, I am very impressed. It was only a file based test, not using real devices, but I was able to corrupt the data on one drive (using dd) and then replaced one virtual drive with another, and it rebuilt (or resilvered, as they call it).
I used:
Code:
zpool create test raidz drive1 drive2 drive3
It works out that the overhead of the redundancy is a little larger than I expected, certainly larger than RAID 5. 3x110MB drives only made ~178MB available; with RAID 5 that would be closer to 200MB. Not a huge problem. The status of the pool is thus:
Code:
# zpool status test
pool: test
state: ONLINE
scrub: none requested
config:
NAME             STATE     READ WRITE CKSUM
test             ONLINE       0     0     0
  raidz1         ONLINE       0     0     0
    /jack/disk1  ONLINE       0     0     0
    /jack/disk2  ONLINE       0     0     0
    /jack/disk3  ONLINE       0     0     0
Now if I add another drive to the pool, I can only add it as either a new pool or fs; I can't explicitly add it to the raidz1 fs that is in place. I.e., after adding it to the pool...
Code:
# zpool add -f test /jack/disk4 // -f since this is a file and not a real device
# zpool status test
pool: test
state: ONLINE
scrub: none requested
config:
NAME             STATE     READ WRITE CKSUM
test             ONLINE       0     0     0
  raidz1         ONLINE       0     0     0
    /jack/disk1  ONLINE       0     0     0
    /jack/disk2  ONLINE       0     0     0
    /jack/disk3  ONLINE       0     0     0
  /jack/disk4    ONLINE       0     0     0
What is the redundancy on this new drive in relation to the rest of the pool? There WAS redundancy, I had confirmed that by
Code:
# dd if=/dev/random of=/jack/disk4 bs=512 count=5
5+0 records in
5+0 records out
2560 bytes (2.6kB) copied, 0.00636275 s, 402 kB/s
# zpool scrub test
# zpool status
pool: test
state: FAULTED
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://www.sun.com/msg/ZFS-8000-8A
scrub: scrub completed with 201 errors on Sun Mar 16 16:36:26:2008
config:
NAME             STATE     READ WRITE CKSUM
test             FAULTED      0     0     0  insufficient replicas
  raidz1         ONLINE       0     0     0
    /jack/disk1  ONLINE       0     0     0
    /jack/disk2  ONLINE       0     0     0
    /jack/disk3  ONLINE       0     0     0
  /jack/disk4    UNAVAIL      0     0     0  corrupted data
I replaced disk4 with a new disk5 (file) and the pool rebuilt just fine, although zpool status now returns:
Code:
# zpool status
pool: test
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://www.sun.com/msg/ZFS-8000-8A
scrub: resilver completed with 144 errors on Sun Mar 16 17:24:13:2008
config:
NAME             STATE     READ WRITE CKSUM
test             ONLINE       0     0   576
  raidz1         ONLINE       0     0     0
    /jack/disk1  ONLINE       0     0     0
    /jack/disk2  ONLINE       0     0     0
    /jack/disk3  ONLINE       0     0     0
  /jack/disk5    ONLINE       0     0   576
I ran an md5 test over the data I had saved there. Initially this checksummed OK (which is why I thought it was all good), but after moving data around some more, the checksum eventually failed. If I run these same tests with all the drives being part of the raidz, I have more luck (i.e., the md5 has come back correct every time so far).
In short, have I missed something that will allow me to extend the raidz specifically? I can't seem to locate it. Extending the pool is not sufficient if I have to extend it by a complete raidz every time. This could be a show stopper. Please let me know that I am expanding it incorrectly.
Ok, install of OpenSolaris has gone ahead without any dramas. I selected dhcp so my network card was configured to work out of the box, very good. But now I do want to assign a static IP to it. I just pulled it down and put it back up again with the required IP and that has worked, but how do I get this to persist?
ie, I want to find where it is initially assigned an address by DHCP and replace it with the new static one.
The only flaw I have picked up so far with OpenSolaris is that it hasn't picked up my WD 1TB drive. I have tried devfsadm -v but that has made no change.
Code:
dave@solaris:~# format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c7d0 <DEFAULT cyl 14590 alt 2 hd 255 sec 63>
/pci@0,0/pci-ide@1f,1/ide@0/cmdk@0,0
1. c8d0 <drive type unknown>
/pci@0,0/pci-ide@1f,2/ide@0/cmdk@0,0
Specify disk (enter its number):
I'm not sure if c8d0 is my CDROM/DVD burner (which is IDE) or if it is the WD drive.
A section of my dmesg, not sure if this is of any help
Code:
Mar 17 09:46:01 solaris genunix: [ID 773945 kern.info] UltraDMA mode 5 selected
Mar 17 09:46:01 solaris genunix: [ID 773945 kern.info] UltraDMA mode 2 selected
Mar 17 09:46:01 solaris last message repeated 2 times
Mar 17 09:46:01 solaris genunix: [ID 640982 kern.info] IDE device at targ 0, lun 0 lastlun 0x0
Mar 17 09:46:01 solaris genunix: [ID 846691 kern.info] model WDC WD10EACS-00ZJB0
Mar 17 09:46:01 solaris genunix: [ID 479077 kern.info] ATA/ATAPI-8 supported, majver 0x1fe minver 0x0
Mar 17 09:46:01 solaris genunix: [ID 228648 kern.info] ata_set_feature: (0x66,0x0) failed
Mar 17 09:46:01 solaris rge: [ID 801725 kern.info] NOTICE: rge0: Using FIXED interrupt type
Mar 17 09:46:01 solaris unix: [ID 954099 kern.info] NOTICE: IRQ17 is being shared by drivers with different interrupt levels.
Mar 17 09:46:01 solaris This may result in reduced system performance.
Mar 17 09:46:01 solaris mac: [ID 469746 kern.info] NOTICE: rge0 registered
Mar 17 09:46:01 solaris scsi: [ID 193665 kern.info] sd0 at ata0: target 1 lun 0
Mar 17 09:46:01 solaris genunix: [ID 936769 kern.info] sd0 is /pci@0,0/pci-ide@1f,1/ide@0/sd@1,0
edit: I found that I can add it to the zpool without a problem, it works. I just couldn't see/use it with fdisk or format. So the hard drive side is ok now.
But jlliagre, could you have another quick look at post #19 above, specifically:
Quote:
In short, have I missed something that will allow me to extend the raidz specifically? I can't seem to locate it. Extending the pool is not sufficient if I have to extend it by complete raidz everytime. This could be a show stopper. Please let me know that I am expanding it incorrectly.
Last edited by madivad; 03-16-2008 at 08:07 PM.
Reason: hdd problem fixed
I cannot locate any *.rge0 file on the system:
...
*note I already have the IP address I want because I manually assigned it
No problem if dhcp.rge0 doesn't exist. That depends on your Solaris release and how it was set up.
I forgot to specify the directory where these files are located, though. You need to create hostname.rge0 in /etc for the IP to persist between reboots.
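In other words, something along these lines. This is only a sketch: the address 192.168.1.50 is a placeholder, the function name is made up, and the function takes a root directory argument so it can be tried against a scratch directory before being pointed at the live system as root.

```shell
#!/bin/sh
# Sketch: make a static address persist by creating /etc/hostname.<interface>.
# The address used below is a placeholder, not taken from the thread.

set_static_ip() {
    root=$1    # "" for the live system, or a scratch directory for a trial run
    iface=$2   # interface name, e.g. rge0
    addr=$3    # static IP address

    # The file holds the address to assign to the interface at boot.
    echo "$addr" > "$root/etc/hostname.$iface"
    # A leftover dhcp.<interface> file would keep DHCP in charge, so remove it.
    rm -f "$root/etc/dhcp.$iface"
}

# Trial run in a scratch directory.
scratch=$(mktemp -d)
mkdir -p "$scratch/etc"
touch "$scratch/etc/dhcp.rge0"
set_static_ip "$scratch" rge0 192.168.1.50
cat "$scratch/etc/hostname.rge0"
```

On the real system the call would be set_static_ip "" rge0 followed by your address. A non-default netmask normally goes in /etc/netmasks and the gateway in /etc/defaultrouter.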
Quote:
I found that I can add it to the zpool without a problem, it works. I just couldn't see/use it with fdisk or format. So the hard drive side is ok now.
Yes. I think disks larger than 1 TB cannot be handled by traditional fdisk partitions and VTOCs. An EFI label is required, and this is what ZFS uses.
Quote:
But jlliagre, could you have another quick look at post #19 above, specifically:
You added a single disk so there can't be redundancy. You need to add at least three of them to expand the top level pool while keeping raidz functionality.
I just want to confirm this one, because expandability is one of the biggest issues I face with any RAID system. I was under the impression you could expand an existing raidz, not just add to the pool. I want my RAID to be as efficient as possible, which means adding more and more storage as it becomes cheaper and spreading the cost over time. The problem with any raid5 (and apparently raidz) is that you have to create the whole RAID in one hit; you can't simply extend it by adding another drive. So, I guess I am wrong, but just confirming:
You can expand a pool (obviously), but you CAN'T expand an actual raidz zfs at the top level?
(I'm hoping I am wrong, so if you can, can someone tell me how? <G>)
You certainly can expand a pool using raidz and the filesystems using that pool will have their available size increased.
Code:
#!/bin/ksh
# cleanup
zpool destroy raid 2>/dev/null
rm -f d1 d2 d3 d4 d5 d6 d7 d8
print "Creating 8x64 MB backend files"
mkfile 64m d1 d2 d3 d4 d5 d6 d7 d8
print "Creating a raidz pool using 4 devices"
zpool create raid raidz $PWD/d1 $PWD/d2 $PWD/d3 $PWD/d4
zpool list | egrep "(SIZE|raid)"
zfs list | egrep "(raid|AVAIL)"
print "Creating a filesystem on this pool"
zfs create raid/fs1
zfs list | egrep "(raid|AVAIL)"
print "Expanding the pool with raidz using 4 remaining devices"
zpool add raid raidz $PWD/d5 $PWD/d6 $PWD/d7 $PWD/d8
zpool list | egrep "(SIZE|raid)"
zfs list | egrep "(raid|AVAIL)"
Code:
Creating 8x64 MB backend files
Creating a raidz pool using 4 devices
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
raid 238M 143K 238M 0% ONLINE -
NAME USED AVAIL REFER MOUNTPOINT
raid 100K 146M 1,50K /raid
Creating a filesystem on this pool
NAME USED AVAIL REFER MOUNTPOINT
raid 162K 146M 26,9K /raid
raid/fs1 26,9K 146M 26,9K /raid/fs1
Expanding the pool with raidz using 4 remaining devices
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
raid 476M 236K 476M 0% ONLINE -
NAME USED AVAIL REFER MOUNTPOINT
raid 170K 324M 28,4K /raid
raid/fs1 26,9K 324M 26,9K /raid/fs1
So you have to add at least 3 drives at a time and preferably in the quantity of drives as found in the initial raid. If it's not the same number, you can use the -f option in the add.
When you look at the pool status though, it has created 2x raidz1. So has it actually added to the raid, or just to the pool? I notice that the overhead is reduced by adding the new drives, but not by as much as it would be if you put ALL the drives in one raidz:
Code:
dave@solaris:/raidtest# mkfile 100m d1 d2 d3 d4 d5 d6 d7 d8
dave@solaris:/raidtest# zpool create test raidz $PWD/d1 $PWD/d2 $PWD/d3 $PWD/d4
dave@solaris:/raidtest# zfs list test
NAME USED AVAIL REFER MOUNTPOINT
test 128K 254M 26.9K /test
dave@solaris:/raidtest# zpool add -f test raidz $PWD/d5 $PWD/d6 $PWD/d7 $PWD/d8
dave@solaris:/raidtest# zfs list test
NAME USED AVAIL REFER MOUNTPOINT
test 132K 539M 26.9K /test
dave@solaris:/raidtest# zpool destroy test
dave@solaris:/raidtest# zpool create test raidz $PWD/d1 $PWD/d2 $PWD/d3 $PWD/d4 $PWD/d5 $PWD/d6 $PWD/d7 $PWD/d8
dave@solaris:/raidtest# zfs list test
NAME USED AVAIL REFER MOUNTPOINT
test 111K 632M 1.74K /test
dave@solaris:/raidtest#
539M > 2x254M, so the overhead is reduced, but if all the disks are created in one raidz, the avail space is 632M...
The above is not really a question, just cause for discussion.
But this is a question I don't quite get. What is the purpose of creating a zfs? When you create a zpool, it comes pre-mounted and usable. What is the actual point of creating a zfs under it? I know you can create multiple zfss under it, but what is the point? the mountpoint is (usually) only an extension of the original mount point anyway. ie, pool = /test and zfs = /test/fs1
I don't understand the benefits. (yet)
edit: Expanding the question: basically, I can't see what the difference is between creating a folder within a zpool and creating a zfs. With symbolic links, a folder in a pool can appear anywhere.
Last edited by madivad; 03-18-2008 at 11:03 AM.
Reason: redefining the question
So has it actually added to the raid? or just the pool?
It is added to the pool. You cannot add devices to a raidz once it is created. This is by design.
Quote:
I notice that overhead is reduced with adding the new drives, but it is not reduced by as much as it would be if you add ALL the drives in one raidz
Indeed. More drives in a single raidz give better utilization.
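A back-of-the-envelope way to see it, ignoring metadata and other overhead: each raidz1 group gives up roughly one device's worth of space to parity, so two groups of four pay that price twice while one group of eight pays it once. The helper name below is made up for the illustration.

```shell
#!/bin/sh
# Rough estimate only: ignores labels, metadata and any other overhead.

# raidz_data_mb GROUPS DEVICES_PER_GROUP PARITY DEVICE_MB
# Prints the space left for data, in MB, once parity is taken out.
raidz_data_mb() {
    echo $(( $1 * ($2 - $3) * $4 ))
}

echo "2 x 4-disk raidz1: $(raidz_data_mb 2 4 1 100) MB"
echo "1 x 8-disk raidz1: $(raidz_data_mb 1 8 1 100) MB"
```

That gives 600 MB against 700 MB, a ratio of 6/7 (about 0.86), which is close to the 539M against 632M (about 0.85) reported in the test with 100 MB files. The absolute figures are lower because of the overhead this estimate ignores.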
Quote:
But this is a question I don't quite get. What is the purpose of creating a zfs? When you create a zpool, it comes pre-mounted and usable.
As I wrote, when you create a pool, a zfs is also implicitly created. It isn't the pool which is mountable, it is the associated zfs.
Quote:
What is the actual point of creating a zfs under it?
To isolate it from the pool.
Quote:
I know you can create multiple zfss under it, but what is the point?
Having multiple file systems is extremely useful and flexible.
You can have custom mount point for them.
You can turn compression on, add inner redundancy, create snapshots, clones, send and receive file systems, set quotas.
You can clone zones installed on their own zfs very quickly and efficiently.
You can share zfs (NFS).
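To give an idea of what this looks like in practice, here is a small sketch of per-filesystem settings. The pool name, the filesystem name "projects", the mount point and the quota value are all made up for the example, and the commands are only printed unless DRY_RUN=0 is set on a machine with ZFS.

```shell
#!/bin/sh
# Sketch only: dataset names, mount point and quota are hypothetical.
# By default the commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

demo_fs_properties() {
    pool=$1
    fs="$pool/projects"
    run zfs create "$fs"
    run zfs set compression=on "$fs"            # transparent compression
    run zfs set copies=2 "$fs"                  # extra copies of each block ("inner redundancy")
    run zfs set quota=10g "$fs"                 # cap this filesystem at 10 GB
    run zfs set mountpoint=/export/projects "$fs"
    run zfs snapshot "$fs@before-change"        # read-only point-in-time copy
    run zfs clone "$fs@before-change" "$pool/projects-test"
    run zfs set sharenfs=on "$fs"               # export over NFS
}

demo_fs_properties test
```

None of these settings can be applied to a plain directory; they are properties of a filesystem, which is why a separate zfs is worth creating.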
Quote:
the mountpoint is (usually) only an extension of the original mount point anyway. ie, pool = /test and zfs = /test/fs1
Yes but it is an independent filesystem.
A symbolic link isn't equivalent to a mount point. You can't install packaged software on a symlinked directory for instance.
As usual, you have explained it quite well, I hadn't considered the options one has with actual file systems themselves.
Quote:
You can share zfs (NFS).
What is with that? How does sharing with zfs work? It's not a replacement for samba at all, is it? What/who is a zfs share... sharing the data with?