Since FreeNAS® 9.1.0 is based on FreeBSD 9.1, it supports the same hardware found in the amd64 and i386 sections of the FreeBSD 9.1 Hardware Compatibility List.
Actual hardware requirements will vary depending upon what you are using your FreeNAS® system for. This section provides some guidelines to get you started. You can also skim through the FreeNAS® Hardware Forum for performance tips from other FreeNAS® users or to post questions regarding the hardware best suited to meet your requirements.
While FreeNAS® is available for both 32-bit and 64-bit architectures, 64-bit hardware is recommended for speed and performance. A 32-bit system can only address up to 4 GB of RAM, making it poorly suited to the RAM requirements of ZFS. If you only have access to a 32-bit system, consider using UFS instead of ZFS.
The best way to get the most out of your FreeNAS® system is to install as much RAM as possible. If your RAM is limited, consider using UFS until you can afford better hardware. FreeNAS® with ZFS typically requires a minimum of 8 GB of RAM in order to provide good performance and stability. The more RAM, the better the performance, and the FreeNAS® Forums provide anecdotal evidence from users on how much performance is gained by adding more RAM. For systems with large disk capacity (greater than 8 TB), a general rule of thumb is 1 GB of RAM for every 1 TB of storage. This post describes how RAM is used by ZFS.
It is possible to use ZFS on systems with less than 8 GB of RAM. However, FreeNAS® as distributed is configured to be suitable for systems meeting the sizing recommendations above. If you wish to use ZFS on a smaller memory system, some tuning will be necessary, and performance will be (likely substantially) reduced. ZFS will automatically disable pre-fetching (caching) on systems where it is not able to use at least 4 GB of memory just for ZFS cache and data structures. This post describes many of the relevant tunables.
If you plan to use ZFS deduplication, a general rule of thumb is 5 GB RAM per TB of storage to be deduplicated.
If you use Active Directory with FreeNAS®, add an additional 2 GB of RAM for winbind's internal cache.
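The RAM rules of thumb above can be combined into a rough estimate. The following Python sketch is illustrative only: the function name `estimate_ram_gb`, its parameters, and the decision to add the deduplication and Active Directory allowances on top of the base figure are assumptions made for this example, not part of FreeNAS®.

```python
def estimate_ram_gb(storage_tb, dedup_tb=0.0, active_directory=False):
    """Rough RAM estimate in GB for a ZFS system, using the rules of thumb above."""
    # 8 GB minimum; for more than 8 TB of storage, about 1 GB of RAM per TB.
    ram = max(8.0, float(storage_tb))
    # Assumption: the dedup allowance (5 GB per deduplicated TB) is added on top.
    ram += 5.0 * dedup_tb
    # Extra 2 GB for winbind's internal cache when Active Directory is used.
    if active_directory:
        ram += 2.0
    return ram


print(estimate_ram_gb(4))                                       # 8.0
print(estimate_ram_gb(12, dedup_tb=2, active_directory=True))   # 24.0
```

Treat the result as a starting point for sizing rather than a guarantee of good performance.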
If you are installing FreeNAS® on a headless system, disable the shared memory settings for the video card in the BIOS.
Compact or USB Flash
The FreeNAS® operating system is a running image. This means that it should not be installed onto a hard drive, but rather to a USB or compact flash device that is at least 4 GB in size. If you don't have compact flash, you can instead use a USB thumb drive that is dedicated to the running image and which stays inserted in the USB slot. While technically you can install FreeNAS® onto a hard drive, this is discouraged as you will lose the storage capacity of the drive. In other words, the operating system will take over the drive and will not allow you to store data on it, regardless of the size of the drive.
The FreeNAS® installation will partition the operating system drive into two partitions. One partition holds the current operating system and the other partition is used when you upgrade. This allows you to safely upgrade to a new image or to revert to an older image should you encounter problems.
Storage Disks and Controllers
The Disk section of the FreeBSD Hardware List lists the supported disk controllers. In addition, support for 3ware 6 Gbps RAID controllers has been added along with the CLI utility tw_cli for managing 3ware RAID controllers.
FreeNAS® supports hot pluggable drives. Make sure that AHCI is enabled in the BIOS. Note that hot plugging is not the same as hot swapping.
If you need reliable disk alerting, immediate reporting of a failed drive, and/or swapping, use a fully manageable hardware RAID controller such as an LSI MegaRAID controller or a 3ware twa-compatible controller. Until FreeBSD commits zfsd, its implementation of ZFS will not notice that a drive is gone until you reboot or put the volume on high load.
If you have some money to spend and wish to optimize your disk subsystem, consider your read/write needs, your budget, and your RAID requirements.
For example, moving the ZIL (ZFS Intent Log) to a dedicated SSD only helps performance if you have synchronous writes, like a database server. SSD cache devices only help if your working set is larger than system RAM, but small enough that a significant percentage of it will fit on the SSD.
If you have steady, non-contiguous writes, use disks with low seek times. Examples are 10K or 15K SAS drives which cost about $1/GB. An example configuration would be six 600 GB 15K SAS drives in a RAID 10 which would yield 1.8 TB of usable space or eight 600 GB 15K SAS drives in a RAID 10 which would yield 2.4 TB of usable space.
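The RAID 10 capacities quoted above follow from half of the disks holding mirror copies. A minimal Python sketch, assuming two-way mirrors and ignoring formatting overhead (the function name `raid10_usable_gb` is hypothetical):

```python
def raid10_usable_gb(disk_count, disk_size_gb):
    """Usable space of a RAID 10 built from two-way mirrors of equal-sized disks."""
    if disk_count < 4 or disk_count % 2 != 0:
        raise ValueError("RAID 10 needs an even number of disks, at least four")
    # Half of the disks store mirror copies, so only half the raw space is usable.
    return (disk_count // 2) * disk_size_gb


print(raid10_usable_gb(6, 600))  # 1800 GB, i.e. 1.8 TB
print(raid10_usable_gb(8, 600))  # 2400 GB, i.e. 2.4 TB
```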
7200 RPM SATA disks are designed for single-user sequential I/O and are not a good choice for multi-user writes.
If you have the budget and high performance is a key requirement, consider a Fusion-I/O card which is optimized for massive random access. These cards are expensive and are suited for high end systems that demand performance. A Fusion-I/O can be formatted with a filesystem and used as direct storage; when used this way, it does not have the write issues typically associated with a flash device. A Fusion-I/O can also be used as a cache device when your ZFS dataset size is bigger than your RAM. Due to the increased throughput, systems running these cards typically use multiple 10 GigE network interfaces.
If you will be using ZFS, Disk Space Requirements for ZFS Storage Pools recommends a minimum of 16 GB of disk space. Due to the way that ZFS creates swap, you cannot format less than 3 GB of space with ZFS. However, on a drive that is below the minimum recommended size you lose a fair amount of storage space to swap: for example, on a 4 GB drive, 2 GB will be reserved for swap.
If you are new to ZFS and are purchasing hardware, read through ZFS Storage Pools Recommendations first.
ZFS uses dynamic block sizing, meaning that it is capable of striping different sized disks. However, if you care about performance, use disks of the same size. Further, when creating a RAIDZ, only the size of the smallest disk will be used on each disk.
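To see what mixing disk sizes costs in a RAIDZ, the following Python sketch estimates raw data capacity. It assumes that every member contributes only as much space as the smallest disk, as described above, and that a RAIDZ with parity level p (1 for RAIDZ1, 2 for RAIDZ2, 3 for RAIDZ3) gives up the equivalent of p disks to parity. The function name is hypothetical, and swap, metadata, and padding overhead are ignored.

```python
def raidz_raw_capacity_gb(disk_sizes_gb, parity):
    """Approximate data capacity of one RAIDZ group built from the given disks."""
    if parity not in (1, 2, 3):
        raise ValueError("parity must be 1, 2, or 3")
    if len(disk_sizes_gb) <= parity:
        raise ValueError("need more disks than the parity level")
    # Only the size of the smallest disk is used on each member.
    smallest = min(disk_sizes_gb)
    # The equivalent of `parity` disks holds parity rather than data.
    return (len(disk_sizes_gb) - parity) * smallest


print(raidz_raw_capacity_gb([500, 1000, 1000], parity=1))   # 1000
print(raidz_raw_capacity_gb([1000, 1000, 1000], parity=1))  # 2000
```

In the first call, replacing the 500 GB disk with a 1000 GB disk doubles the estimated capacity, which is why same-sized disks are recommended.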
The FreeBSD Ethernet section of the Hardware Notes indicates which interfaces are supported by each driver. While many interfaces are supported, FreeNAS® users have seen the best performance from Intel and Chelsio interfaces, so consider these brands if you are purchasing a new interface. Realteks will perform poorly under CPU load as interfaces with these chipsets do not provide their own processors.
At a minimum you will want to use a GigE interface. While GigE interfaces and switches are affordable for home use, it should be noted that modern disks can easily saturate 110 MB/s. If you require a higher network throughput, you can "bond" multiple GigE cards together using the LACP type of Link Aggregation. However, any switches will need to support LACP which means you will need a more expensive managed switch rather than a home user grade switch.
If network performance is a requirement and you have some money to spend, use 10 GigE interfaces and a managed switch. If you are purchasing a managed switch, consider one that supports LACP and jumbo frames as both can be used to increase network throughput.
NOTE: at this time the following are not supported: InfiniBand, FibreChannel over Ethernet, or wireless interfaces.
If network speed is a requirement, consider both your hardware and the type of shares that you create. On the same hardware, CIFS will be slower than FTP or NFS as Samba is single-threaded. If you will be using CIFS, use a fast CPU.
Data redundancy and speed are important considerations for any network attached storage system. Most NAS systems use multiple disks to store data, meaning you should decide which type of RAID to use before installing FreeNAS®. This section provides an overview of RAID types to assist you in deciding which type best suits your requirements.
RAID 0 (stripe): provides optimal performance and allows you to add disks as needed. Provides zero redundancy, meaning if one disk fails, all of the data on all of the disks is lost. The more disks in the RAID 0, the more likely the chance of a failure.
RAID 1 (mirror): provides redundancy as data is copied (mirrored) to two or more drives. Provides good read performance but may have slower write performance, depending upon how the mirrors are set up and the number of ZILs and L2ARCs.
RAID 5: requires a minimum of three disks and can tolerate the loss of one disk without losing data. Disk reads are fast but write speed can be reduced by as much as 50%. If a disk fails, it is marked as degraded but the system will continue to operate until the drive is replaced and the RAID is rebuilt. However, should another disk fail before the RAID is rebuilt, all data will be lost. If your FreeNAS® system will be used for steady writes, RAID 5 is a poor choice due to the slow write speed.
RAID 6: requires a minimum of four disks and can tolerate the loss of two disks without losing data. Benefits from having many disks, as performance, fault tolerance, and cost efficiency all improve with more disks. The larger the failed drive, the longer it takes to rebuild the array. Reads are very fast but writes are slower than a RAID 5.
RAID 10: requires a minimum of four disks and the number of disks is always even as this type of RAID stripes mirrored sets. This type of RAID can survive the failure of any one drive. If you lose a second drive from the same mirrored set, you will lose the array. However, if you lose a second drive from a different mirrored set, the array will continue to operate in a degraded state. RAID 10 significantly outperforms RAIDZ2, especially on writes.
RAID 60: requires a minimum of eight disks. Combines RAID 0 striping with the distributed double parity of RAID 6 by striping two 4-disk RAID 6 arrays. RAID 60 rebuild times are half that of RAID 6.
RAIDZ1: ZFS software solution that is equivalent to RAID 5. Its advantage over RAID 5 is that it avoids the write-hole and does not require any special hardware, meaning it can be used on commodity disks. If your FreeNAS® system will be used for steady writes, RAIDZ is a poor choice due to the slow write speed.
RAIDZ2: double-parity ZFS software solution that is similar to RAID 6. It also avoids the write-hole and does not require any special hardware, meaning it can be used on commodity disks. RAIDZ2 allows you to lose one drive without any degradation as it basically becomes a RAIDZ1 until you replace the failed drive and restripe. At this time, RAIDZ2 on FreeBSD is slower than RAIDZ1.
RAIDZ3: triple-parity ZFS software solution. RAIDZ3 offers three parity drives and can operate in degraded mode if up to three drives fail with no restrictions on which drives can fail.
NOTE: instead of mixing ZFS RAID with hardware RAID, it is recommended that you place your hardware RAID controller in JBOD mode and let ZFS handle the RAID. According to Wikipedia: "ZFS can not fully protect the user's data when using a hardware RAID controller, as it is not able to perform the automatic self-healing unless it controls the redundancy of the disks and data. ZFS prefers direct, exclusive access to the disks, with nothing in between that interferes. If the user insists on using hardware-level RAID, the controller should be configured as JBOD mode (i.e. turn off RAID-functionality) for ZFS to be able to guarantee data integrity. Note that hardware RAID configured as JBOD may still detach disks that do not respond in time; and as such may require TLER/CCTL/ERC-enabled disks to prevent drive dropouts. These limitations do not apply when using a non-RAID controller, which is the preferred method of supplying disks to ZFS."
When determining the type of RAIDZ to use, consider whether your goal is to maximize disk space or to maximize performance:
- RAIDZ1 maximizes disk space and generally performs well when data is written and read in large chunks (128K or more).
- RAIDZ2 offers better data availability and significantly better mean time to data loss (MTTDL) than RAIDZ1.
- A mirror consumes more disk space but generally performs better with small random reads.
For better performance, a mirror is strongly favored over any RAIDZ, particularly for large, uncacheable, random read loads.
When determining how many disks to use in a RAIDZ, the following configurations provide optimal performance. Array sizes beyond 12 disks are not recommended.
- Start a RAIDZ1 at 3, 5, or 9 disks.
- Start a RAIDZ2 at 4, 6, or 10 disks.
- Start a RAIDZ3 at 5, 7, or 11 disks.
The recommended number of disks per group is between 3 and 9. If you have more disks, use multiple groups.
The following resources can also help you determine the RAID configuration best suited to your storage needs:
NOTE: NO RAID SOLUTION PROVIDES A REPLACEMENT FOR A RELIABLE BACKUP STRATEGY. BAD STUFF CAN STILL HAPPEN AND YOU WILL BE GLAD THAT YOU BACKED UP YOUR DATA WHEN IT DOES. See the sections on Periodic Snapshot Tasks and Replication Tasks if you would like to use ZFS snapshots and rsync as part of your backup strategy.
While ZFS isn't hardware, an overview is included in this section as the decision to use ZFS may affect your hardware choices and whether or not to use hardware RAID.
If you are new to ZFS, the Wikipedia entry on ZFS provides an excellent starting point to learn about its features. These resources are also useful to bookmark and refer to as needed:
ZFS version numbers change as features are introduced and are incremental, meaning that a version includes all of the features introduced by previous versions. Table 1.4a summarizes various ZFS versions, the features which were added by that ZFS version, and in which version of FreeNAS® that ZFS version was introduced.
Table 1.4a: Summary of ZFS Versions
| ZFS Version | Features Added | FreeNAS® Version |
|---|---|---|
| 11 | improved scrub performance | .7.x |
| 14 | passthrough-x aclinherit property | 8.0.x |
| 15 | user and group space accounting | 8.0.x |
| 16 | STMF property support | 8.3.0 |
| 18 | snapshot user holds | 8.3.0 |
| 19 | log device removal | 8.3.0 |
| 20 | compression using zle (zero-length encoding) | 8.3.0 |
| 23 | deferred update (slim ZIL) | 8.3.0 |
| 25 | improved scrub stats | 8.3.0 |
| 26 | improved snapshot deletion performance | 8.3.0 |
| 27 | improved snapshot creation performance | 8.3.0 |
| 28 | multiple vdev replacements | 8.3.0 |
The following is a glossary of terms used by ZFS:
Pool: a collection of devices that provides physical storage and data replication managed by ZFS. This pooled storage model eliminates the concept of volumes and the associated problems of partitions, provisioning, wasted bandwidth and stranded storage. Thousands of file systems can draw from a common storage pool, each one consuming only as much space as it actually needs. The combined I/O bandwidth of all devices in the pool is available to all file systems at all times. The Storage Pools Recommendations of the ZFS Best Practices Guide provides detailed recommendations for creating the storage pool.
Dataset: once a pool is created, it can be divided into datasets. A dataset is similar to a folder in that it supports permissions. A dataset is also similar to a filesystem in that you can set properties such as quotas and compression.
Zvol: ZFS storage pools can be divided into zvols for applications that need access to a raw device, such as swap devices or iSCSI device extents. In other words, a zvol is a virtual block device in a ZFS storage pool.
Snapshot: a read-only point-in-time copy of a file system. Snapshots can be created quickly and, if little data changes, new snapshots take up very little space. For example, a snapshot where no files have changed takes 0 MB of storage, but if you change a 10 GB file it will keep a copy of both the old and the new 10 GB version. Snapshots provide a clever way of keeping a history of files, should you need to recover an older copy or even a deleted file. For this reason, many administrators take snapshots often (e.g. every 15 minutes), store them for a period of time (e.g. for a month), and store them on another system. Such a strategy allows the administrator to roll the system back to a specific time or, if there is a catastrophic loss, an off-site snapshot can restore the system up to the last snapshot interval (e.g. within 15 minutes of the data loss). Snapshots can be cloned or rolled back, but the files on the snapshot cannot be accessed independently.
Clone: a writable copy of a snapshot which can only be created on the same ZFS volume. Clones provide an extremely space-efficient way to store many copies of mostly-shared data such as workspaces, software installations, and diskless clients. Clones do not inherit the properties of the parent dataset, but rather inherit the properties based on where the clone is created in the ZFS pool. Because a clone initially shares all its disk space with the original snapshot, its used property is initially zero. As changes are made to the clone, it uses more space.
Deduplication: the process of eliminating duplicate copies of data in order to save space. Once deduplication occurs, it can improve ZFS performance as less data is written and stored. However, the process of deduplicating the data is RAM intensive and a general rule of thumb is 5 GB RAM per TB of storage to be deduplicated. In most cases, enabling compression will provide comparable performance. Beginning with FreeNAS® 8.3.0, deduplication can be enabled at the dataset level. There is no way to undedup data once it is deduplicated: switching deduplication off has NO EFFECT on existing data. The more data you write to a deduplicated dataset, the more RAM it requires, and there is no upper bound on this. When the system starts storing the DDTs (dedup tables) on disk because they no longer fit into RAM, performance craters. Furthermore, importing an unclean pool can require between 3 and 5 GB of RAM per TB of deduped data, and if the system doesn't have the needed RAM it will panic, with the only solution being to add more RAM or to recreate the pool. Think carefully before enabling dedup!
ZIL: (ZFS Intent Log) is effectively a filesystem journal that manages writes. You can increase performance by dedicating a device (typically an SSD or a dedicated disk) to hold the ZIL by creating a log device in Volume Manager. If you are using VMware, the speed of the ZIL device is essentially the write performance bottleneck when using NFS. In this scenario, iSCSI will perform better than NFS. If you decide to create a dedicated log device to speed up NFS writes, it can be half the size of system RAM as anything larger than that is unused capacity. Mirroring the ZIL device won't increase the speed, but it will preserve performance and reliability if one of the devices fails. If you lose a non-mirrored ZIL device on a ZFSv15 pool, the pool is unrecoverable and the pool must be recreated and the data restored from a backup. If you lose a non-mirrored ZIL device on a ZFSv28 pool, only the data in the ZIL which has not been written to the pool will be lost. You can replace the lost ZIL device in the View Volumes → Volume Status screen. Note that a dedicated ZIL device cannot be shared between ZFS pools.
L2ARC: on-disk cache used to manage reads. Losing an L2ARC device will not affect the integrity of the storage pool, but may have an impact on read performance, depending upon the workload and the ratio of dataset size to cache size. You can learn more about how L2ARC works here. Note that a dedicated L2ARC device cannot be shared between ZFS pools.
Scrub: similar to ECC memory scrubbing, all data is read to detect latent errors while they are still correctable. A scrub traverses the entire storage pool to read every data block, validates it against its 256-bit checksum, and repairs it if necessary.
ZFS and IOPs
Notes on IOPs (Input/Output Operations per second) and ZPOOLS (example zpool name: abyss)
Choose any 2: speed | reliability | cost
If you pick speed and reliability, it will not be cheap
If you pick reliable and cost effective, it will not be fast
If you pick speed and cost effective, it will not be reliable
What does this all mean for my POOL?
| RAID Level | Read IO penalty | Write IO penalty |
|---|---|---|
| RAID 1, 10 | 1 | 2 |
IOPs needed = ( Total IOPs x % read ) + (Total IOPs x % write x RAID penalty)
So, if we needed 300 IOPs with a 50% read and 50% write workload from a RAID-Z (write penalty = 4) pool
IOPs needed = (300 x 0.5) + (300 x 0.5 x 4) = 150 + 600 = 750 IOPs
For a similar 300 IOPs with a 50% read and 50% write workload for a RAID 1, 10 (write penalty = 2) pool
IOPs needed = (300 x 0.5) + (300 x 0.5 x 2) = 150 + 300 = 450 IOPs
This says that the disks in a RAID-Z pool must be able to deliver 750 IOPs in order to support a 300 IOPs workload with a 50/50 read/write mix.
By comparison, the disks in a RAID 1, 10 pool only need to deliver 450 IOPs to support the same 300 IOPs workload with the same 50/50 read/write mix.
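The write-penalty formula used in these two examples can be checked with a few lines of Python. This is a minimal sketch; the function name `backend_iops` and its parameters are chosen for illustration, and the read penalty is taken to be 1 as in the formula.

```python
def backend_iops(total_iops, read_fraction, write_penalty):
    """IOPs the disks must deliver for a workload, given the RAID write penalty."""
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    write_fraction = 1.0 - read_fraction
    # Reads cost one back-end I/O each; writes are multiplied by the penalty.
    return total_iops * read_fraction + total_iops * write_fraction * write_penalty


print(backend_iops(300, 0.5, 4))  # 750.0 (RAID-Z example)
print(backend_iops(300, 0.5, 2))  # 450.0 (RAID 1, 10 example)
```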
Question: How do you calculate bandwidth when you only have IOPS?
Answer: The mathematical formula to calculate bandwidth is a function of IOPS and I/O size.
The formula is simply IOPS x I/O size. Example: 10,000 IOPS x 4k block size (4096 bytes) = 40,960,000 bytes/sec, or about 40.96 MB/sec.
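As a quick check of the bandwidth arithmetic, here is a small Python sketch. The function name is hypothetical, and it assumes decimal megabytes (1 MB = 1,000,000 bytes).

```python
def bandwidth_mb_per_sec(iops, io_size_bytes):
    """Bandwidth in decimal MB/sec for a given IOPS rate and I/O size."""
    bytes_per_sec = iops * io_size_bytes
    return bytes_per_sec / 1000000


print(bandwidth_mb_per_sec(10000, 4096))  # 40.96
```

Dividing by 1,048,576 instead gives the same rate in binary megabytes (MiB/sec), which for this example is 39.0625.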
See zpool IOPs read/write CLI metrics
zpool iostat -v 1
Optimal number of RAID-Zx pool members per vdev: 2^n + p
Where n is 1, 2, 3, 4, . . .
And p is the parity: p=1 for raid-z1, p=2 for raid-z2 and p=3 for raid-z3
RAID-Z = (2^1 + 1) ... (2^n + 1) = 3, 5, 9, 17, ...
RAID-Z2 = (2^1 + 2) ... (2^n + 2) = 4, 6, 10, 18, ...
RAID-Z3 = (2^1 + 3) ... (2^n + 3) = 5, 7, 11, 19, ...
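The member counts listed above (a power-of-two number of data disks plus p parity disks) can be generated with a short Python sketch. The function name and the `max_exponent` parameter are assumptions made for this example.

```python
def optimal_vdev_widths(parity, max_exponent=4):
    """Disk counts of the form 2**n + parity for n = 1 .. max_exponent."""
    return [2 ** n + parity for n in range(1, max_exponent + 1)]


for parity, name in ((1, "RAID-Z"), (2, "RAID-Z2"), (3, "RAID-Z3")):
    print(name, optimal_vdev_widths(parity))
# RAID-Z [3, 5, 9, 17]
# RAID-Z2 [4, 6, 10, 18]
# RAID-Z3 [5, 7, 11, 19]
```

Since array sizes beyond 12 disks are not recommended, in practice only the first three values of each list apply.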
RAID-Z: similar to RAID 5, but uses a variable width stripe for parity, which allows better performance than RAID 5.
RAID-Z2: similar to RAID 6, and allows 2 drive failures before being vulnerable to data loss.
RAID-Z3: allows 3 drive failures before being vulnerable to data loss.
Having a spare ready minimizes the time your pool is unprotected. You can begin replacement as soon as a failure occurs.