Summary of the content on page 1
StorNext®
File System Tuning Guide
StorNext 3.0
6-01376-05
Summary of the content on page 2
Document Title, 6-01376-05, Ver. A, Rel. 3.0, March 2007, Made in USA. Quantum Corporation provides this publication “as is” without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability or fitness for a particular purpose. Quantum Corporation may revise this publication from time to time without notice. COPYRIGHT STATEMENT Copyright 2007 by Quantum Corporation. All rights reserved. StorNext copyright (c) 1991-2007 Advanced Digi
Summary of the content on page 3
Contents
StorNext File System Tuning 1
The Underlying Storage System 1
RAID Cache Configuration 2
RAID Write-Back Caching 2
RAID Read-Ahead Caching 3
RAID Level, Segment Size, and Stripe Size ...
Summary of the content on page 4
StorNext File System Tuning
The StorNext File System (SNFS) provides extremely high performance for widely varying scenarios. Many factors determine the level of performance you will realize. In particular, the performance characteristics of the underlying storage system are the most critical factors. However, other components such as the Metadata Network and MDC systems also have a significant effect on performance. Furthermore, file size mix and application I/O characteristics may also
Summary of the content on page 5
The Underlying Storage System
RAID Cache Configuration
The single most important RAID tuning component is the cache configuration. This is particularly true for small I/O operations. Contemporary RAID systems such as the EMC CX series and the various Engenio systems provide excellent small I/O performance with properly tuned caching. So, for the best general-purpose performance characteristics, it is crucial to utilize the RAID system caching as fully as po
Summary of the content on page 6
metadata operations throughput. This is easily observed in the hourly File System Manager (FSM) statistics reports in the cvlog file. For example, here is a message line from the cvlog file:
PIO HiPriWr SUMMARY SnmsMetaDisk0 sysavg/350 sysmin/333 sysmax/367
This statistics message reports average, minimum, and maximum write latency (in microseconds) for the reporting period. If the observed average latency exceeds 500 microseconds,
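The summary line quoted above can be checked automatically. The following is a minimal Python sketch, not part of StorNext: the function names are hypothetical, the regular expression covers only the single line format shown in this guide, and the 500-microsecond threshold is the figure mentioned in the text.

```python
import re

# Matches only the "PIO ... SUMMARY" line format quoted in this guide.
# Other cvlog line formats are not handled (assumption).
SUMMARY_RE = re.compile(
    r"PIO\s+(?P<kind>\S+)\s+SUMMARY\s+(?P<disk>\S+)\s+"
    r"sysavg/(?P<avg>\d+)\s+sysmin/(?P<min>\d+)\s+sysmax/(?P<max>\d+)"
)


def parse_pio_summary(line):
    """Return the latency fields (microseconds) or None if the line does not match."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    return {
        "kind": match.group("kind"),
        "disk": match.group("disk"),
        "avg_us": int(match.group("avg")),
        "min_us": int(match.group("min")),
        "max_us": int(match.group("max")),
    }


def exceeds_threshold(stats, threshold_us=500):
    """True when the average latency is above the threshold given in the text."""
    return stats["avg_us"] > threshold_us


sample = "PIO HiPriWr SUMMARY SnmsMetaDisk0 sysavg/350 sysmin/333 sysmax/367"
stats = parse_pio_summary(sample)
print(stats, exceeds_threshold(stats))
```

A script like this could be run over the cvlog file line by line to flag reporting periods whose average write latency is above the threshold.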
Summary of the content on page 7
it severely degrades typical scenarios. Therefore, it is unsuitable for most environments.
RAID Level, Segment Size, and Stripe Size
Configuration settings such as RAID level, segment size, and stripe size are very important and cannot be changed after the system is put into production, so it is critical to determine appropriate settings during initial configuration. The best RAID level to use for high I/O throughput is usually RAID5. The stri
Summary of the content on page 8
File Size Mix and Application I/O Characteristics
It is always valuable to understand the file size mix of the target dataset as well as the application I/O characteristics. This includes the number of concurrent streams, proportion of read versus write streams, I/O size, sequential versus random, Network File System (NFS) or Common Internet File System (CIFS) access, and so on. For example, if the dataset is do
Summary of the content on page 9
Command Options on page 16). So, it is typically most important to optimize the RAID cache configuration settings described earlier in this document. It is usually best to configure the RAID stripe size no greater than 256K for optimal small file buffer cache performance. For more buffer cache configuration settings, see Mount Command Options on page 16. It is best to isolate NFS and/or CIFS traffic off of the me
Summary of the content on page 10
The Metadata Network
As with any client/server protocol, SNFS performance is subject to the limitations of the underlying network. Therefore, it is recommended that you use a dedicated Metadata Network to avoid contention with other network traffic. Either 100BaseT or 1000BaseT is required, but for a dedicated Metadata Network there is usually no benefit from using 1000BaseT over 100BaseT. Neither TCP offload nor jumbo frames are required.
Summary of the content on page 11
The Metadata Controller System
Some metadata operations such as file creation can be CPU intensive, and benefit from increased CPU power. The MDC platform is important in these scenarios because lower clock-speed CPUs such as Sparc and Mips degrade performance. Other operations, such as directory traversal, can benefit greatly from increased memory. SNFS provides three config file settings that can be used to realize performance gains from increased memory: Bu
Summary of the content on page 12
Example:
[stripeGroup RegularFiles]
Status UP
Exclusive No ##Non-Exclusive stripeGroup for all Files##
Read Enabled
Write Enabled
StripeBreadth 256K
MultiPathMethod Rotate
Node CvfsDisk6 0
Node CvfsDisk7 1

Affinities
Affinities are another stripe group feature that can be very beneficial. Affinities can direct file allocation to appropriate stripe groups according to performance requirements. For exampl
Summary of the content on page 13
StripeBreadth
This setting must match the RAID stripe size or be a multiple of the RAID stripe size. Matching the RAID stripe size is usually the optimal setting. However, depending on the RAID performance characteristics and application I/O size, it might be beneficial to use a multiple of the RAID stripe size. For example, if the RAID stripe size is 256K, the stripe group contains 4 LUNs, and the application to be optimiz
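The StripeBreadth rule above (equal to the RAID stripe size, or a multiple of it) is easy to check with a little arithmetic. The following Python sketch is illustrative only: the helper names are hypothetical, size suffixes are assumed to be binary units (K = 1024 bytes), and the "full stripe" figure assumes that one pass across the stripe group writes StripeBreadth bytes to each LUN.

```python
def parse_size(text):
    """Convert a size such as '256K' or '1M' to bytes (binary units assumed)."""
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    text = text.strip().upper()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


def valid_stripe_breadth(stripe_breadth, raid_stripe_size):
    """True when StripeBreadth equals the RAID stripe size or is a multiple of it."""
    breadth = parse_size(stripe_breadth)
    raid = parse_size(raid_stripe_size)
    return breadth >= raid and breadth % raid == 0


def full_stripe_bytes(stripe_breadth, num_luns):
    """Bytes covered by one pass across every LUN in the stripe group (assumption)."""
    return parse_size(stripe_breadth) * num_luns


# Values taken from the example in the text: 256K RAID stripe size, 4 LUNs.
print(valid_stripe_breadth("256K", "256K"))
print(valid_stripe_breadth("384K", "256K"))
print(full_stripe_bytes("256K", 4))
```

With the values from the text, a StripeBreadth of 256K across 4 LUNs gives a 1 MB pass across the stripe group under these assumptions, which is the kind of figure to compare against the application I/O size.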
Summary of the content on page 14
InodeCacheSize
This setting consumes about 800-1000 bytes of memory times the number specified. Increasing this value can reduce the latency of any metadata operation, because a hot cache access to inode information is about 100 to 1000 times faster than an I/O to read the inode information from disk. It is especially important to increase this setting if metadata I/O latency is high (for example, more than 2 ms average latency). You s
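The memory cost stated above can be estimated before changing the setting. This is a small Python sketch of that arithmetic; the function name is hypothetical, and the value 32768 is only an illustrative input, not a recommended or default setting.

```python
def inode_cache_memory_bytes(inode_cache_size, bytes_per_entry=(800, 1000)):
    """Return the (low, high) memory estimate in bytes for an InodeCacheSize value.

    The 800-1000 bytes per entry range is the figure given in the text.
    """
    low, high = bytes_per_entry
    return inode_cache_size * low, inode_cache_size * high


# 32768 is an arbitrary example value (assumption), not a default.
low, high = inode_cache_memory_bytes(32768)
print(f"{low / 2**20:.1f} MiB to {high / 2**20:.1f} MiB")
```

Comparing this range with the free memory on the MDC gives a quick sanity check on whether a larger cache is affordable.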
Summary of the content on page 15
severely consumes metadata space in cases where the file-to-directory ratio is less than 100 to 1. However, startup and failover time can be minimized by increasing FsBlockSize. This is very important for multi-terabyte file systems, and especially when the metadata servers have slow CPU clock speed (such as Sparc and Mips). A good rule of thumb is to use 16K unless other requirements such as directory ratio dictate otherwise. No
Summary of the content on page 16
It is also possible to trigger an instant FSM statistics report by setting the Once Only debug flag using cvadmin. For example:
cvadmin -F snfs1 -e 'debug 0x01000000' ; tail -100 /usr/cvfs/data/snfs1/log/cvlog
The following items are a few things to watch out for:
• A non-zero value for FSM wait SUMMARY journal waits indicates insufficient IOPS performance of the disks assigned to the metadata stripe group. This usually requires reduci
Summary of the content on page 17
The cvcp utility is a higher performance alternative to commands such as cp and tar. The cvcp utility achieves high performance by using threads, large I/O buffers, preallocation, stripe alignment, DMA I/O transfer, and Bulk Create. Also, the cvcp utility uses the SNFS External API for preallocation and stripe alignment. In the directory-to-directory copy mode (for example, cvcp source_dir destination_dir), cvcp conditionally use
Summary of the content on page 18
• Zr: Hole in file was zeroed
Both traces also report file offset, I/O size, latency (mics), and inode number.
Sample use cases:
• Verify that I/O properties are as expected. You can use the VFS trace to ensure that the displayed properties are consistent with expectations, such as being well formed; buffered versus DMA; shared/non-shared; or I/O size. If a small I/O is being performed using DMA, performance will be poor. If DMA I/O is n
Summary of the content on page 19
The latency-test command has the following syntax:
latency-test index-number [seconds]
latency-test all [seconds]
If an index-number is specified, the test is run between the currently selected FSM and the specified client. (Client index numbers are displayed by the cvadmin who command.) If all is specified, the test is run against each client in turn. The test is run for 2 seconds, unless a value for seconds is specified. Here is a
Summary of the content on page 20
The buffer cache I/O size is adjusted using the cachebufsize setting. The default setting is usually optimal; however, sometimes performance can be improved by increasing this setting to match the RAID5 stripe size. Unfortunately, this is often not possible on Linux due to kernel memory fragmentation. In this case, performance may degrade severely because the full amount of buffer cache cannot be allocated. Using a large cachebufs