With the rapid development of the monitoring field, new technologies keep emerging one after another, and cloud storage is among the most popular of them. The main technologies behind it are described below.
Cloud storage is a new concept that extends the concept of cloud computing. It refers to a system that uses cluster applications, grid technology, or distributed file systems to bring together, through application software, a large number of storage devices of different types on the network, so that they work together to provide data storage and business access functions to the outside. When the core work of a cloud computing system is storing and managing large amounts of data, the system must be equipped with a large number of storage devices, and it effectively becomes a cloud storage system. In other words, cloud storage is a cloud computing system whose core is data storage and management.
Put differently, a cloud storage system can be regarded as a cloud computing system configured with large-capacity storage space. It has the following characteristics: data security, high scalability, pay-per-use billing, support across different applications, automatic failover, and ease of management. Cloud storage is mainly used in four areas: backup, archiving, distribution, and shared collaboration.
A cloud storage system is a collaboration of multiple devices, multiple applications, and multiple services, and its realization depends on the development of several technologies. Given the characteristics of cloud storage and its application areas, the main cloud storage technologies are storage virtualization, distributed file systems, cluster storage, centralized storage management, heterogeneous platform collaboration, and automatic tiered storage, along with deduplication, data compression, and other technologies.
The most common way to understand storage virtualization is as an abstract representation of storage hardware resources. By integrating one or more target services or functions with other additional functions, it provides a comprehensive and useful service in a unified manner. Typical virtualization covers the following situations: shielding the complexity of the system, adding or integrating new functions, and simulating, integrating, or decomposing existing service functions. Virtualization acts on one or more entities, and these entities are used to provide storage resources or services.
Storage virtualization is a technology used throughout the IT environment to simplify underlying infrastructure that might otherwise be relatively complex. The idea is to separate the logical image of a resource from its physical storage, thereby giving systems and administrators a simplified, seamless virtual view of the resource.
For users, virtualized storage resources look like one huge "storage pool". Users do not see specific disks and tapes, and they do not have to care which path their data takes or which specific storage device it ends up on.
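The "storage pool" idea can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not how any particular product works: the class names `PhysicalDevice` and `StoragePool`, the block-level mapping table, and the first-free allocation rule are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalDevice:
    """A hypothetical backing device (disk, tape, ...) with a fixed block count."""
    name: str
    capacity_blocks: int
    blocks: dict = field(default_factory=dict)  # physical block number -> bytes


class StoragePool:
    """Presents several devices as one flat logical block address space."""

    def __init__(self, devices):
        self.devices = list(devices)
        self.mapping = {}  # logical block -> (device, physical block)
        # Free physical blocks, in device order (assumed allocation rule).
        self._free = [(d, pb) for d in self.devices
                      for pb in range(d.capacity_blocks)]

    @property
    def capacity_blocks(self):
        return sum(d.capacity_blocks for d in self.devices)

    def write(self, logical_block, data):
        # The caller names only a logical block; the pool picks the device.
        if logical_block not in self.mapping:
            if not self._free:
                raise IOError("storage pool is full")
            self.mapping[logical_block] = self._free.pop(0)
        device, physical_block = self.mapping[logical_block]
        device.blocks[physical_block] = data

    def read(self, logical_block):
        device, physical_block = self.mapping[logical_block]
        return device.blocks[physical_block]


pool = StoragePool([PhysicalDevice("disk-a", 2), PhysicalDevice("tape-b", 2)])
for i in range(3):
    pool.write(i, f"block-{i}".encode())
print(pool.read(2))
```

The caller reads and writes logical block numbers only; which device holds a block is visible solely through the pool's internal mapping table.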
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on ordinary hardware. It has much in common with existing distributed file systems, but it also differs from them in significant ways. HDFS is highly fault-tolerant and is designed to be deployed on inexpensive hardware. It provides high-throughput access to application data and is suitable for applications with large data sets. HDFS relaxes a few POSIX requirements in order to allow streaming access to file system data.
HDFS has a master/slave architecture. A cluster has one name node, the master server, which manages the file system namespace and coordinates client access to files. There are also a number of data nodes, usually one per physical node, each of which manages the storage on the physical node where it runs. HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more data blocks, which are stored on a set of data nodes. The name node performs namespace operations on the file system, such as opening, closing, and renaming files or directories, and it determines the mapping of data blocks to data nodes. The data nodes serve read and write requests from clients. They also create, delete, and replicate data blocks on instruction from the name node.
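The division of labor between the name node and the data nodes can be illustrated with a toy in-memory model. This is a sketch for intuition only and does not use any real Hadoop API: the classes `NameNode` and `DataNode`, the tiny `BLOCK_SIZE`, the `REPLICATION` value, and the round-robin placement rule are all assumptions made for this example.

```python
import itertools

BLOCK_SIZE = 4    # bytes; deliberately tiny, real HDFS blocks are far larger
REPLICATION = 2   # assumed number of copies per block for this sketch


class DataNode:
    """Stores block contents; knows nothing about files or directories."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> bytes

    def store(self, block_id, data):
        self.blocks[block_id] = data

    def fetch(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """Holds metadata only: file -> block ids, block id -> data nodes."""

    def __init__(self, data_nodes):
        self.data_nodes = data_nodes
        self.files = {}      # path -> [block ids]
        self.locations = {}  # block id -> [DataNode, ...]
        self._ids = itertools.count()

    def allocate_block(self, path):
        block_id = next(self._ids)
        n = len(self.data_nodes)
        # Round-robin placement (a simplification chosen for this sketch).
        targets = [self.data_nodes[(block_id + i) % n]
                   for i in range(REPLICATION)]
        self.files.setdefault(path, []).append(block_id)
        self.locations[block_id] = targets
        return block_id, targets


def write_file(name_node, path, data):
    # Split the file into blocks; the name node decides where each one goes.
    for offset in range(0, len(data), BLOCK_SIZE):
        block_id, targets = name_node.allocate_block(path)
        for node in targets:
            node.store(block_id, data[offset:offset + BLOCK_SIZE])


def read_file(name_node, path):
    # Ask the name node for block locations, then read from a data node.
    return b"".join(name_node.locations[b][0].fetch(b)
                    for b in name_node.files[path])


nodes = [DataNode(f"dn{i}") for i in range(3)]
nn = NameNode(nodes)
write_file(nn, "/logs/a.txt", b"hello hdfs!")
print(read_file(nn, "/logs/a.txt"))
```

Note that file contents never pass through the name node in this model: it only hands out block ids and locations, while the bytes go to and from the data nodes.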
Cluster storage aggregates the storage space of multiple storage devices into a single storage pool that offers application servers a unified access interface and management interface. Through this access interface, applications can transparently access and use the disks on all of the storage devices, which makes full use of the devices' performance and improves disk utilization. Data is written to and read from multiple storage devices according to defined rules in order to achieve higher concurrent access performance.
The advantages of cluster storage lie mainly in improving overall performance through parallel or partitioned I/O, especially for workflows, read-intensive workloads, and access to large files. Using lower-cost servers also reduces the overall cost. There are two ways to implement cluster storage: one is hardware infrastructure plus software; the other is dedicated cluster storage, which is built on NAS infrastructure but implements clustering through the operating system.
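One simple rule for spreading data over several devices is round-robin striping, sketched below. The text does not say which rule a given cluster storage product uses, so striping is an assumption here, as are the names `stripe`, `read_striped`, and the small `STRIPE_SIZE`. The thread pool merely stands in for reading the devices in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

STRIPE_SIZE = 4  # bytes per stripe unit; small for illustration


def stripe(data, n_devices):
    """Distribute stripe units round-robin; returns one chunk list per device."""
    devices = [[] for _ in range(n_devices)]
    for i, offset in enumerate(range(0, len(data), STRIPE_SIZE)):
        devices[i % n_devices].append(data[offset:offset + STRIPE_SIZE])
    return devices


def read_striped(devices):
    """Read all devices concurrently, then interleave the chunks in order."""
    with ThreadPoolExecutor(max_workers=len(devices)) as pool:
        # list() is a stand-in for an actual per-device read operation.
        per_device = list(pool.map(list, devices))
    out = []
    longest = max(len(chunks) for chunks in per_device)
    for row in range(longest):
        for chunks in per_device:
            if row < len(chunks):
                out.append(chunks[row])
    return b"".join(out)


data = b"abcdefghijklmnopqrstuvwxyz"
layout = stripe(data, 3)
print([len(chunks) for chunks in layout])
print(read_striped(layout) == data)
```

Because consecutive stripe units sit on different devices, a large sequential read can be served by all devices at once, which is the source of the concurrency gain described above.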
The cloud storage management platform must support deployment and management across data centers, including functions such as user access scheduling, data migration, and off-site data storage and backup across data centers.
It also supports centralized management. The management platform is deployed in the central computer room of the cloud computing system, while the storage nodes can be deployed in branch computer rooms. The platform can then manage and schedule the storage devices in all of the branch computer rooms in a unified way.
Through the cloud storage management platform, users can easily see the service status of every node in the cloud storage system, including each node's capacity and performance (read IOPS, write IOPS, read traffic, write traffic). This lets users follow the resource information and operating status within the domain in real time and act on those resources. It also lets them learn of resource abnormalities promptly and, where necessary, take appropriate measures to keep the system running normally.
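The kind of status view described above can be sketched as a small aggregation over per-node reports. Everything here is hypothetical: the `NodeStatus` fields, the sample figures, and the 90% usage threshold for "abnormal" are assumptions chosen for illustration, not values from any real platform.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    """One status report from a storage node (hypothetical fields)."""
    node: str
    data_center: str
    capacity_gb: float
    used_gb: float
    read_iops: int
    write_iops: int


def summarize(statuses):
    """Aggregate capacity and IOPS per data center."""
    summary = {}
    for s in statuses:
        dc = summary.setdefault(s.data_center, {
            "capacity_gb": 0.0, "used_gb": 0.0,
            "read_iops": 0, "write_iops": 0,
        })
        dc["capacity_gb"] += s.capacity_gb
        dc["used_gb"] += s.used_gb
        dc["read_iops"] += s.read_iops
        dc["write_iops"] += s.write_iops
    return summary


def find_abnormal(statuses, max_usage=0.9):
    """Return names of nodes whose used fraction exceeds max_usage."""
    return [s.node for s in statuses if s.used_gb / s.capacity_gb > max_usage]


reports = [
    NodeStatus("n1", "dc-east", 1000, 950, 1200, 300),
    NodeStatus("n2", "dc-east", 1000, 400, 800, 200),
    NodeStatus("n3", "dc-west", 2000, 500, 500, 100),
]
overview = summarize(reports)
alerts = find_abnormal(reports)
print(overview["dc-east"], alerts)
```

A real platform would collect such reports continuously and attach actions (migration, alerts) to them; the sketch shows only the aggregation step.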
At present, storage solutions and technologies are numerous and complicated. In practice, an enterprise may have many types of storage devices in use at the same time, and the storage environments of different equipment vendors have long had compatibility problems. As a result, even after years of storage consolidation, it is still difficult to meet enterprise needs. This is also the biggest obstacle to the development of storage virtualization and cloud storage.
Although cloud storage is easy to adopt in some respects (such as online storage and backup), achieving comprehensive storage integration through private cloud storage is not easy. To reach the goal of storage virtualization, the enterprise's existing IT storage environment has to be improved. The improvement focuses on a shared storage architecture, a user-friendly environment, a simple, single operating interface, and efficient storage solutions. Among these, a single operating interface or a unified standard API is one of the keys to solving the coordination problem between different storage devices.
CDMI (Cloud Data Management Interface) is a standard interface for cloud storage developed by the Storage Networking Industry Association (SNIA). For cloud computing, CDMI provides a general management infrastructure. At the same time, the focus of information management has gradually shifted from storage management to data management. The CDMI standard lets users tag their data with special metadata, and this metadata tells the storage provider at the endpoint which data services (such as backup, archiving, or encryption) to apply to that data. With the CDMI standard interface in place, users can move data freely between different cloud providers without the pain of recoding against each provider's interface.
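To make the metadata idea concrete, the sketch below builds (but does not send) an HTTP request that creates a data object with metadata attached. It is an illustration under assumptions: the endpoint URL, the object path, and the helper name `build_cdmi_put` are hypothetical, and the header names, content type, version string, and the `cdmi_data_redundancy` metadata item are written as recalled from the SNIA CDMI specification and should be checked against the version a given provider implements.

```python
import json
from urllib.request import Request


def build_cdmi_put(base_url, path, value, metadata):
    """Build a PUT request for a CDMI-style data object (not sent here)."""
    body = json.dumps({
        "mimetype": "text/plain",
        "metadata": metadata,  # tells the provider which services to apply
        "value": value,
    }).encode("utf-8")
    return Request(
        url=f"{base_url}{path}",
        data=body,
        method="PUT",
        headers={
            # Header values below are assumptions to verify against the spec.
            "Content-Type": "application/cdmi-object",
            "Accept": "application/cdmi-object",
            "X-CDMI-Specification-Version": "1.1.1",
        },
    )


req = build_cdmi_put(
    "https://cloud.example.com/cdmi",   # hypothetical endpoint
    "/backups/report.txt",              # hypothetical object path
    "quarterly report",
    {
        "cdmi_data_redundancy": "3",    # assumed item: request three copies
        "department": "finance",        # free-form user metadata
    },
)
print(req.get_method(), req.full_url)
print(json.loads(req.data)["metadata"])
```

Because the request shape is the same for every compliant provider, moving data between providers would in principle only require changing the endpoint, which is the portability benefit the text describes.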
Improving storage management efficiency has become the top concern for many enterprises, and automatic tiered storage has become the most effective basic technology for it. It refers to the ability to migrate data blocks between different disk types and RAID levels. This strikes the right balance between performance and space usage, quickly puts data in the right place, and avoids so-called hot spots.
As this technology has received widespread attention, moving data between storage media of different tiers, such as FC disks and SATA disks, is now expected to be a fully automated migration process.
In a tiered data storage structure, the storage devices generally include tape libraries, disks, and disk arrays. Disks can be classified by performance into FC disks, SCSI disks, and SATA disks. Flash storage media (non-volatile random access memory, NVRAM) can also serve as a higher tier in the structure because of their higher performance. Generally, expensive and fast devices such as disks or disk arrays hold important information that is accessed frequently, while lower-cost storage resources such as tape libraries hold information that is accessed less often.
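A tiering policy of this kind can be sketched as a function that promotes frequently accessed blocks and demotes rarely accessed ones. This is a minimal illustration, not a vendor algorithm: the tier order in `TIERS`, the `hot` and `cold` thresholds, and the one-tier-per-pass rule are all assumptions.

```python
# Tiers from fastest to slowest, following the media named in the text.
TIERS = ["flash", "fc", "sata", "tape"]


def retier(placement, access_counts, hot=100, cold=5):
    """Return a new placement: hot blocks move up one tier, cold ones down.

    placement:     block id -> current tier name
    access_counts: block id -> accesses in the last period (missing = 0)
    """
    new_placement = {}
    for block, tier in placement.items():
        idx = TIERS.index(tier)
        hits = access_counts.get(block, 0)
        if hits >= hot and idx > 0:
            idx -= 1                      # promote toward faster media
        elif hits <= cold and idx < len(TIERS) - 1:
            idx += 1                      # demote toward cheaper media
        new_placement[block] = TIERS[idx]
    return new_placement


placement = {"b1": "sata", "b2": "fc", "b3": "flash", "b4": "tape"}
counts = {"b1": 500, "b2": 2, "b3": 50}
result = retier(placement, counts)
print(result)
```

Running such a pass periodically, with no administrator involved, is what makes the migration automatic; moving only one tier at a time is a design choice in this sketch that keeps a single busy period from shuffling data across the whole hierarchy.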