Monday, January 3, 2011

Analysis of new object storage file system technology for Linux

As high-performance computing evolves from traditional hosts to networked clusters, the traditional host-based storage model has gradually given way to networked storage, and the trend toward separating computing from storage is becoming increasingly clear.

To address the limitations of SAN and NAS, a new kind of cluster file system for Linux has emerged internationally: the object storage file system. This article describes the architecture and technical characteristics of object storage file systems and reports a preliminary test of the Lustre object storage file system. The results indicate that object storage file systems offer significantly better scalability, performance, and ease of use. As networked storage technology continues to mature, object storage file systems will become an important direction of development.

I. Introduction

High-performance computing has gradually evolved from traditional hosts to clusters. In the TOP500 list, for example, only 2 systems were clusters in 1998, while by 2003 there were 208 cluster systems. As high-performance computing architectures have developed, the traditional host-based storage architecture has become a new bottleneck and cannot meet the needs of cluster systems. A cluster storage system must effectively address two main issues: (1) providing shared access to data, which simplifies writing cluster applications and balancing the storage load; (2) providing high-performance storage whose I/O rate and data throughput can meet the aggregate access needs of hundreds of Linux cluster servers.

Networked storage has become an effective way to provide high-performance storage for cluster systems. Internationally, there are two main networked storage architectures, distinguished by the command set they use. The first is the SAN (Storage Area Network) architecture. It uses the SCSI block I/O command set and accesses data at the disk or FC (Fibre Channel) level, providing high-performance random I/O and data throughput. With its high bandwidth and low latency, SAN has a place in high-performance computing; SGI's CXFS file system, for example, builds a high-performance file system on SAN storage. However, SAN systems are expensive and scale poorly, so they cannot serve systems with thousands of CPUs.
The second is the NAS (Network Attached Storage) architecture. It uses the NFS or CIFS command set to access data, with the file as the unit of transfer, and implements networked storage over TCP/IP. NAS scales well, is inexpensive, and is easy for users to manage; the NFS file system, widely used in cluster computing today, is one example. However, NAS has high protocol overhead, low bandwidth, and high latency, which makes it poorly suited to high-performance cluster applications.

To meet Linux clusters' needs for high performance and data sharing, researchers abroad have begun to study new storage architectures and new types of file systems that aim to combine the strengths of SAN and NAS: direct access to disks for performance, and shared files and metadata for simpler management. Object storage file systems have now become the high-performance file systems for Linux clusters; examples include Lustre from Cluster File Systems and the ActiveScale file system from Panasas.

The Lustre file system uses object-based storage technology and originated in research on the Coda project at Carnegie Mellon University. Lustre 1.0 was released in December 2003, and version 2.0 is expected in 2005. Lustre has seen initial use on high-performance computing systems at the national laboratories of the U.S. Department of Energy (DOE): Lawrence Livermore, Los Alamos, Sandia, and Pacific Northwest. IBM's Blue Gene system, now under development, will also adopt the Lustre file system for its high-performance storage.

The ActiveScale file system technology traces back to Dr. Garth Gibson of Carnegie Mellon University and the DARPA-supported NASD (Network Attached Secure Disks) project. It is now one of the industry's more influential object storage file systems and won a ComputerWorld innovative technology award in 2004.
II. Object storage file systems

2.1 Object storage file system architecture

The core of an object storage file system is the separation of the data path (reading and writing data) from the control path (metadata), with the storage system built on object-based storage devices (OSDs). Each object storage device has some intelligence and can manage the distribution of its data automatically. An object storage file system usually consists of the following parts.

1. Object

The object is the basic unit of data storage in the system. An object is essentially a file's data together with a set of attributes, which can define file-based RAID parameters, data distribution, quality of service, and so on. Traditional storage systems use the file or the block as the basic storage unit, and a block storage system must also keep track of the attributes of every block in the system; an object, by contrast, maintains its own attributes by communicating with the storage system. Within a storage device, every object has an object identifier, and OSD commands access the object by that ID. There are usually several types of objects: the root object identifies the storage device and the device's various attributes, and a group object is a collection of objects that share a resource management policy on the storage device.

2. Object storage device

An object storage device has some intelligence: it has its own CPU, memory, network, and disk system. At present, OSDs are usually implemented as blades. An OSD provides three key capabilities:

(1) Data storage. The OSD manages object data and places it on standard disk systems. It does not provide a block interface; when a client requests data, it reads and writes using the object ID and an offset.

(2) Intelligent distribution. The OSD uses its own CPU and memory to optimize data distribution and supports data prefetching. Because the OSD can prefetch objects intelligently, it can optimize disk performance.

(3) Management of each object's metadata.
The OSD manages the metadata of the objects it stores. This metadata is similar to traditional inode metadata and usually includes the object's data blocks and the object's length. In a traditional NAS system this metadata is maintained by the file server; in an object storage architecture, most metadata management is done by the OSDs, which reduces the overhead on the client.

3. Metadata server (MDS)

The MDS controls the interaction between clients and OSD objects and provides the following functions:

(1) Object storage access. The MDS constructs and manages the view that describes how each file is distributed, allowing clients to access objects directly. The MDS gives the client a capability to access the objects that make up a file, and the OSD verifies this capability on each request it receives before granting access.

(2) File and directory access management. The MDS builds a file structure on top of the storage system, including quota control, directory and file creation and deletion, access control, and so on.

(3) Client cache consistency. To improve client performance, object storage file systems are usually designed to support client-side caching. Client-side caching introduces a cache consistency problem, so the MDS supports client-based file caching: when a cached file changes, the MDS notifies the client to refresh its cache, preventing problems caused by inconsistent caches.

4. Object storage file system client

To let compute nodes access objects on the OSDs, an object storage file system client must be implemented on each compute node. It usually provides a POSIX file system interface, so applications can operate as they would on a standard file system.

2.2 Key technologies of object storage file systems

1. Distributed metadata

In a traditional storage architecture, the metadata server typically provides two primary functions.
(1) It provides compute nodes with a logical view of the stored data (the Virtual File System, VFS, layer): the list of file names and the directory structure. (2) It organizes the distribution of data on the physical storage media (the inode layer).

The object storage architecture separates the logical view of the stored data from the physical view and distributes the load, avoiding the bottleneck that the metadata server causes in systems such as NAS. The VFS part of the metadata typically accounts for about 10% of the metadata server's load; the remaining 90% of the work (the inode part) is the physical distribution of data blocks on the storage media. In an object storage architecture, the inode work is distributed to the intelligent OSDs, each of which manages the distribution and retrieval of its own data. In this way 90% of the metadata management work is spread across the intelligent storage devices, which improves the system's metadata management performance. In addition, with distributed metadata management, adding more OSDs to the system increases both metadata performance and storage capacity.

2. Concurrent data access

The object storage architecture defines a new, more intelligent disk interface, the OSD. An OSD is a network-attached device that contains its own storage media, such as disk or tape, and has enough intelligence to manage the data it stores locally. Compute nodes communicate directly with the OSDs to access stored data; because the OSDs are intelligent, no file server needs to intervene. If a file system's data is spread across multiple OSDs, the aggregate I/O rate and data throughput grow linearly. For the vast majority of Linux cluster applications, sustained aggregate I/O bandwidth and throughput across many compute nodes is very important. The object storage architecture delivers performance that is hard to achieve with other storage architectures; the ActiveScale object storage file system, for example, can reach a bandwidth of 10 GB/s.
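To make the idea of spreading a file across several OSDs concrete, here is a minimal sketch in Python. It assumes a simple round-robin striping scheme; the names `StripeLayout`, `locate`, and `aggregate_bandwidth` are hypothetical and are not taken from Lustre or ActiveScale, and the linear scaling function is an idealization that ignores network and client limits.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StripeLayout:
    """Hypothetical round-robin layout, assumed for illustration only."""
    stripe_size: int  # bytes per stripe unit (value chosen by the caller)
    osd_count: int    # number of OSDs the file is spread across

    def locate(self, file_offset: int) -> Tuple[int, int]:
        """Map a file offset to (OSD index, offset inside that OSD's object)."""
        stripe_no = file_offset // self.stripe_size
        osd_index = stripe_no % self.osd_count
        # Each OSD holds every osd_count-th stripe unit, packed contiguously.
        object_offset = ((stripe_no // self.osd_count) * self.stripe_size
                         + file_offset % self.stripe_size)
        return osd_index, object_offset


def aggregate_bandwidth(per_osd_mb_s: float, osd_count: int) -> float:
    """Idealized linear scaling of throughput with the number of OSDs."""
    return per_osd_mb_s * osd_count


layout = StripeLayout(stripe_size=1 << 20, osd_count=4)
for offset in (0, 1 << 20, 5 << 20):
    print(offset, layout.locate(offset))
print(aggregate_bandwidth(100.0, 4))
```

Because consecutive stripe units land on different OSDs, a large sequential read can be served by all OSDs at once, which is where the roughly linear growth in aggregate throughput comes from.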
2.3 The Lustre object storage file system

The Lustre object storage file system consists of three main parts: the client, the storage server (OST, Object Storage Target), and the metadata server (MDS). The client is the interface to the Lustre file system: it interacts with the OSTs for file data I/O and with the MDS for namespace operations. To improve performance, the client, OST, and MDS usually run on separate systems, although these subsystems can also run on the same system. The three main parts are shown in Figure 1.

Lustre is a transparent global file system: clients can access data in the cluster file system transparently, without knowing where it is actually stored. Clients read data on the servers over the network. The storage servers are responsible for the actual file system reads and writes and for the attached storage devices. The metadata server is responsible for the file system's directory structure, file permissions, and extended file attributes, and for maintaining the data consistency of the whole file system and responding to client requests. Because Lustre treats files as objects located by the metadata server, the metadata server directs actual file I/O requests to the storage servers, and the storage servers manage the physical storage of objects on their disk groups. Because metadata and stored data are separated, computing and storage resources can be fully separated: client machines can focus on user and application requests, while the storage servers and metadata server focus on reading, transferring, and writing data. Data backup, storage configuration, and storage server expansion on the server side do not affect clients, and neither the storage servers nor the metadata server becomes a performance bottleneck.
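The division of labor between client, MDS, and OST can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: `ToyOST`, `ToyMDS`, and `ToyClient` are hypothetical names, each file is placed as a single object on one OST chosen round-robin, and nothing here reflects Lustre's actual protocol or API.

```python
from typing import Dict, List, Tuple


class ToyOST:
    """Toy storage server: stores byte objects keyed by object ID."""

    def __init__(self) -> None:
        self.objects: Dict[int, bytearray] = {}

    def write(self, object_id: int, offset: int, data: bytes) -> None:
        obj = self.objects.setdefault(object_id, bytearray())
        end = offset + len(data)
        if len(obj) < end:
            obj.extend(b"\0" * (end - len(obj)))  # grow the object if needed
        obj[offset:end] = data

    def read(self, object_id: int, offset: int, length: int) -> bytes:
        return bytes(self.objects[object_id][offset:offset + length])


class ToyMDS:
    """Toy metadata server: holds only the namespace and file-to-object map."""

    def __init__(self, ost_count: int) -> None:
        self.ost_count = ost_count
        self.layouts: Dict[str, Tuple[int, int]] = {}
        self.next_object_id = 1

    def create(self, path: str) -> Tuple[int, int]:
        # Assumption: one object per file, OST chosen round-robin.
        layout = (self.next_object_id % self.ost_count, self.next_object_id)
        self.next_object_id += 1
        self.layouts[path] = layout
        return layout

    def lookup(self, path: str) -> Tuple[int, int]:
        return self.layouts[path]


class ToyClient:
    """Toy client: asks the MDS where data lives, then talks to the OST."""

    def __init__(self, mds: ToyMDS, osts: List[ToyOST]) -> None:
        self.mds = mds
        self.osts = osts

    def write_file(self, path: str, data: bytes) -> None:
        ost_index, object_id = self.mds.create(path)    # control path
        self.osts[ost_index].write(object_id, 0, data)  # data path

    def read_file(self, path: str, length: int) -> bytes:
        ost_index, object_id = self.mds.lookup(path)            # control path
        return self.osts[ost_index].read(object_id, 0, length)  # data path


osts = [ToyOST(), ToyOST()]
client = ToyClient(ToyMDS(len(osts)), osts)
client.write_file("/data/a.txt", b"hello")
print(client.read_file("/data/a.txt", 5))
```

In this model the MDS never touches file contents: it only answers the question of which object on which OST holds a file, and the bytes themselves flow directly between client and OST.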
Lustre's global namespace gives all clients of the file system a single, globally unique directory tree. Data is divided into stripes, which are then assigned to the individual storage servers, providing a more flexible way of sharing access than the "block sharing" of a traditional SAN. The global directory tree eliminates the need to configure file system information on each client, and it remains valid when configuration information is updated.

III. Tests and conclusions

1. Lustre iozone test

We carried out a preliminary test of the Lustre object storage file system. The configuration was as follows: 3 dual-Xeon systems: CPU: 1.7GHz, memory: 1GB, Chin
