Wednesday 8 February 2012

pNFS (Parallel Network File System)


pNFS Overview
"Parallel storage based on pNFS is the next evolution beyond clustered NFS storage and the best way for the industry to solve storage and I/O performance bottlenecks. Panasas was the first to identify the need for a production grade, standard parallel file system and has unprecedented experience in deploying commercial parallel storage solutions."

Introduction
High-performance data centers have been aggressively moving toward parallel technologies such as clustered computing and multi-core processors. While this increased parallelism overcomes most computational bottlenecks, it shifts the performance bottleneck to the storage I/O system. To ensure that compute clusters deliver maximum performance, the storage system must be optimized for parallelism as well. The industry-standard Network Attached Storage (NAS) architecture suffers serious performance bottlenecks and management challenges when deployed alongside large-scale, high-performance compute clusters.
Panasas® ActiveStor™ parallel storage takes a very different approach: using a proprietary protocol called DirectFlow®, compute clients read and write directly to the storage devices, eliminating the filer-head bottleneck entirely and allowing the capacity and performance of a single file system to scale linearly to extreme levels. Panasas has actively shared this core knowledge with a consortium of storage industry technology leaders to create an industry-standard protocol that will eventually replace the need for DirectFlow. That protocol, called parallel NFS (pNFS), is now an optional extension of the NFS v4.1 standard.

NFS Challenges
To understand how pNFS works, it is first necessary to understand what takes place in a typical NFS architecture when a client accesses a file. A traditional NFS deployment consists of a filer head placed in front of disk drives, exporting a file system via NFS. Because the NFS server sits in the data path between the client computers and the physical storage devices, it quickly becomes the bottleneck when large numbers of clients access the data, or when the data set grows too large, significantly degrading system performance.
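
The core problem can be seen with a back-of-envelope model: once the aggregate demand of the clients exceeds the bandwidth of the single filer head's network link, per-client throughput collapses no matter how fast the disks behind it are. A minimal Python sketch follows; the 10 Gb/s link and 1 Gb/s per-client demand figures are illustrative assumptions, not measurements.

# Back-of-envelope model of the single-filer-head bottleneck.
# The link and demand figures are illustrative assumptions.

def nas_throughput_per_client(num_clients: int,
                              filer_link_gbps: float = 10.0,
                              client_demand_gbps: float = 1.0) -> float:
    """Every byte traverses the one filer head, so its network link
    caps the aggregate; clients share whatever is left."""
    aggregate = min(num_clients * client_demand_gbps, filer_link_gbps)
    return aggregate / num_clients

for n in (4, 16, 64, 256):
    print(f"{n:>3} clients -> {nas_throughput_per_client(n):.3f} Gb/s each")

Adding clients past the saturation point only divides the same fixed link bandwidth into smaller shares, which is exactly the behavior seen in large compute clusters behind a single NFS server.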

pNFS Performance
pNFS removes the performance bottleneck of traditional NAS by allowing compute clients to read and write data directly, and in parallel, to and from the physical storage devices. The NFS server is used only to manage metadata and coordinate access, allowing incredibly fast access to very large data sets from many clients.
When a client wants to access a file, it first queries the metadata server, which returns a map of where the data lives along with credentials specifying the client's rights to read, modify, and write it. Once the client holds those two components, it communicates directly with the storage devices to access the data. With traditional NFS, every bit of data flows through the NFS server; with pNFS, the NFS server is removed from the primary data path, allowing free and fast access to the data. All the advantages of NFS are maintained, but the bottlenecks are removed: data can be accessed in parallel for very high throughput, and system capacity can be scaled without impacting overall performance.
[Figure: pNFS eliminates the performance bottleneck of traditional NAS]
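
The exchange described above can be sketched in a few lines of Python. MetadataServer, DataServer, Layout, Stripe, and layout_get below are hypothetical stand-ins for the real NFSv4.1 LAYOUTGET machinery, not an actual client API; the sketch only shows the shape of the protocol: one metadata round trip, then direct, parallel access to the data servers.

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

class DataServer:
    """Stands in for one physical storage device."""
    def __init__(self, blocks):
        self.blocks = blocks
    def read(self, offset, credential):
        # A real data server would verify the credential before serving data.
        return self.blocks[offset]

@dataclass
class Stripe:
    device: DataServer  # which storage device holds this piece of the file
    offset: int         # where the piece lives on that device

@dataclass
class Layout:
    stripes: list       # the "map" of where the data lives
    credential: str     # the client's proof of its access rights

class MetadataServer:
    """Answers layout and credential requests; it never touches file data."""
    def __init__(self, layouts):
        self.layouts = layouts
    def layout_get(self, path):
        return self.layouts[path]

def pnfs_read(mds, path):
    layout = mds.layout_get(path)       # one metadata round trip...
    with ThreadPoolExecutor() as pool:  # ...then direct, parallel data I/O
        parts = pool.map(
            lambda s: s.device.read(s.offset, layout.credential),
            layout.stripes)
    return b"".join(parts)

# Toy run: a file striped across two data servers.
d0 = DataServer({0: b"hello, "})
d1 = DataServer({0: b"pnfs!"})
mds = MetadataServer({"/f": Layout([Stripe(d0, 0), Stripe(d1, 0)], "token")})
print(pnfs_read(mds, "/f"))  # -> b'hello, pnfs!'

Note that the metadata server never handles file data, which is why both capacity and aggregate bandwidth can grow simply by adding data servers.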

The Future of pNFS
It is anticipated that pNFS will begin to be widely deployed in standard Linux distributions by 2012. The HPC market will be the first to adopt the pNFS standard, since it offers substantial performance benefits, especially for technical computing workloads. However, simply supporting pNFS will not guarantee the high performance that Panasas currently delivers on its ActiveStor storage systems with DirectFlow. When it comes to matching the pNFS protocol to the back-end storage architecture for maximum performance, the Object layout in pNFS has many advantages, and Panasas is ideally situated to continue delivering the highest parallel performance in the industry.
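
For readers who want to check whether a Linux client is already using NFS v4.1, the version that carries the optional pNFS extension, a small heuristic sketch is shown below. Parsing /proc/mounts this way is an assumption about how mount options appear, not an official interface, and the option names vary across kernel versions.

# Heuristic check (Linux only) for NFSv4.1 mounts via /proc/mounts.
def nfs41_mounts():
    # Each /proc/mounts line: device mountpoint fstype options dump pass
    with open("/proc/mounts") as f:
        for line in f:
            fields = line.split()
            mountpoint, fstype, options = fields[1], fields[2], fields[3]
            opts = options.split(",")
            # Depending on kernel version, the minor version appears as
            # either "vers=4.1" or "minorversion=1".
            if fstype == "nfs4" and ("vers=4.1" in opts or "minorversion=1" in opts):
                yield mountpoint

for m in nfs41_mounts():
    print("NFSv4.1 (pNFS-capable) mount:", m)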
