Published by: INFINIT Consulting
Written by: Daniel Schneiderman, Director of IT Services
Date Published: April 18, 2008
The State of Data Storage
Data storage has long been considered one of the most valuable pieces of the IT infrastructure puzzle. The logical real estate of the digital world, storage space, is being consumed in increasing amounts, with premiums placed on media speed and reliability, as well as data protection and availability. Various methods exist to satiate the storage appetite of today’s business environments. Depending on the needs of the environment, companies can choose from direct-attached storage, storage area networks (SAN) and network attached storage (NAS).
Over the years, direct-attached storage (storage directly connected to the systems accessing it, such as internal hard disks) has cemented its place as the simplest, most common form of data storage solution. Even today, workstation and server purchases typically come packaged with internal hard disks, while most also include options for upgrading or adding to these disks. However, the need for more reliable, available storage became increasingly apparent. Businesses needed a solution that would allow easy access to data from multiple locations. They also recognized the need to have this data, often critical to business activities, protected from loss and corruption.
Enter the technologies of NAS and SAN. NAS and SAN devices allowed companies to host data in centralized locations while providing data availability that was configurable and manageable. While seemingly similar on the surface, NAS and SAN devices differ in a number of ways. Essentially, a NAS device acts as a central repository of data in a networked environment. Other devices in the network such as servers and workstations are able to access this data at the file level using protocols such as NFS and SMB. While a NAS device could technically consist of one disk, most use large groups of disks configured for redundancy using one form of RAID or another. This allows for better performance and data redundancy.
A SAN is a combination of technologies designed to attach storage devices, such as disk arrays and tape libraries, to servers in a way that makes these devices appear local. From the data storage/disk perspective, SANs provide block-level access to data, most commonly by carrying SCSI commands over a network transport such as Fibre Channel or iSCSI. SAN disk arrays are typically configured for redundancy using one form of RAID or another, with multiple arrays of varying RAID configurations common in many environments. A SAN not only delivers the benefits of centralized data storage, it also increases data performance thanks to the high-speed connections between the devices making up the SAN. Other benefits of a SAN include increased data protection and virtualization capabilities.
Not all SANs are created equal, however, and the fight for SAN supremacy is one that continues even as this article is written. Due to the continuous advancement of today’s business needs, companies are demanding more. More performance, more redundancy, more availability, more compliance. While most SAN providers are trying to address these demands, one has based their entire platform on them.
A Compelling Solution
Compellent is a Minnesota-based storage vendor that seems to have gotten the proverbial “it” right. That “it” is a SAN solution that not only delivers the features that today’s businesses need from their storage solution, but also provides a multitude of features they didn’t even know they wanted. When it comes to efficient use of storage space, data clustering, data recovery, and data performance, no other SAN comes close. Proprietary features such as Thin Provisioning, Data Progression, and Data Instant Replay set Compellent’s Storage Center apart from the rest of the crowd. In addition, the ease of administration and expandability are second to none.
In terms of scalability, no other SAN offers anything quite like Compellent’s. Storage Center is a truly modular solution, allowing an administrator to implement a system with as little as one terabyte of storage and expand to hundreds of terabytes without the need to perform a ‘forklift’ upgrade. Even more importantly, adding storage and components requires no disruption in service or downtime whatsoever.
Optimal Efficiency – Is There Such a Thing?
The answer to the above question – it depends. If we are talking about data storage, then the answer is yes. Just ask yourself this question: How much total disk space is actively being used in my environment? OK, so the answer may not be immediately evident. However, if your answer is “not enough,” you are just like most other businesses. The fact is, with traditional storage solutions we are forced to plan for the future by acting today. This methodology causes us to purchase excessive amounts of disk storage now to allow for growth in the future. Well, what’s so bad about that? Nothing, if you are able to maintain that disk space over time. The truth is that storage space is guaranteed to become consumed. Whether you purchased 300GB or 30TB, at some point in time, that storage space will be spoken for. While this doesn’t mean the space will be truly filled, it does mean that it will become unavailable for use in one way or another.
For example, let’s say you purchase a SAN solution with a single array maxed out with 2TB of useable disk space. You configure the disks in such a way that you have 4 assignable logical disks, each of 500GB. You assign each of these logical disks to a different server. Traditionally, this disk space is for use by the assigned server only. As a result, if you came to need an additional 200GB of space for a new server, you would need to either break the existing configuration and re-assign the space, or purchase a new array of disks to add to your current solution. The first option can be an administrative nightmare while the second option can be very costly. The worst part of this is that if you look at the servers that had the space assigned to them in the first place, not one is using over 150GB. The antithesis of efficiency, no?
Now let’s assume you are able to create a new logical disk of 200GB and assign it to the new server. Problem solved, right? Wait a minute, that’s 2.2TB assigned on a SAN with only 2TB of space. How is that possible? Please allow me to introduce you to Dynamic Capacity, also known as Thin Provisioning. This key feature of Compellent Storage Center allows you to assign as much storage to your servers as you deem necessary, above and beyond the current useable capacity. It does this by consuming physical space only when data is written rather than by allocating space up front. As a result, planning becomes much easier. When the written data begins to approach the physical limit, simply add more drives, and all is right with the SAN. Gone are the headaches associated with inappropriately allocated storage space and server “low disk space” messages.
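To make the arithmetic concrete, here is a minimal Python sketch of the example above. It is a toy model, not Compellent's implementation: the `ThinPool` class and its method names are hypothetical, 2TB is treated as 2,000GB, and each of the four original servers is assumed to have written exactly 150GB (the upper bound given above).

```python
class ThinPool:
    """Toy model of a thinly provisioned pool (hypothetical; sizes in GB)."""

    def __init__(self, physical_gb):
        self.physical_gb = physical_gb
        self.volumes = {}  # volume name -> [assigned_gb, written_gb]

    def create_volume(self, name, assigned_gb):
        # Assigning space reserves nothing physically, so over-commit is allowed.
        self.volumes[name] = [assigned_gb, 0]

    def write(self, name, gb):
        assigned, written = self.volumes[name]
        if written + gb > assigned:
            raise ValueError("write exceeds the volume's assigned size")
        if self.written_gb() + gb > self.physical_gb:
            raise RuntimeError("pool out of physical space: add drives")
        # Physical space is consumed only here, when data is written.
        self.volumes[name][1] = written + gb

    def assigned_gb(self):
        return sum(assigned for assigned, _ in self.volumes.values())

    def written_gb(self):
        return sum(written for _, written in self.volumes.values())


pool = ThinPool(physical_gb=2000)
for i in range(1, 5):
    pool.create_volume(f"server{i}", 500)  # four 500GB logical disks
    pool.write(f"server{i}", 150)          # assumed actual usage per server
pool.create_volume("server5", 200)         # the new 200GB logical disk

print(pool.assigned_gb())  # 2200
print(pool.written_gb())   # 600
```

In this model, 2,200GB is assigned against 2,000GB of physical disk, yet only 600GB is actually consumed, which is why the fifth volume fits without re-carving the array.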
Location, Location, Location (Part 1)
Earlier, we referred to storage space as the digital equivalent of real estate. Many real estate professionals will say that the three most important elements in real estate valuation are location, location, location. Taking this same stance with regard to disk space would be a bit over-the-top, but let’s take a moment to discuss the effects of disk management.
You have a choice when it comes to the types of disks you put into your SAN. Whether you choose high-end Fibre Channel disks or cost-effective SATA disks, or a combination of the two, you are making a decision based on performance and budgetary considerations. In many cases, data stored on the SAN such as databases and frequently accessed files requires high-performance drives to meet the demands of high I/O activity. However, it has been said that the majority of data, once created, is rarely accessed after 30 to 90 days. Storing this data on costly high-performance Fibre Channel disks is the digital equivalent of using beach-front property as a junkyard. Could you do it? Sure, but when you consider that you could pay one tenth of the rent to put a junkyard five times as large further inland, it doesn’t make much sense to keep it on the beach where you can build something that provides more value. Leave important and high I/O data on the high-end disks and put the less important, stale data on larger, cheaper disks.
The concept of tiered storage is not new; however, it has traditionally been difficult and expensive to implement and maintain due to the need to purchase multiple point products and perform data classification and movement tasks manually. If only there were a way to automate the process, including tiered storage in your environment would be a no-brainer. Fortunately, Compellent felt the same way. Using a technology they call ‘Data Progression’, Compellent Storage Center is able to quickly and easily migrate stale data from expensive high-performance disks to cheaper, slower, high-capacity disks. What’s more, Storage Center will do this automatically based on the age of the data in question, eliminating the need for costly, time-consuming ILM software implementations. This is made possible by Compellent’s Dynamic Block Architecture, which records and keeps track of data blocks and specific information about those data blocks including associated volume, RAID level, time written, etc. Storage Center uses this metadata to identify and migrate data accordingly, without the need for user or administrator intervention.
Compellent’s Data Progression and Tiered Storage offerings enable you to safely and easily ensure that the prime real estate of your SAN is used as efficiently as possible, providing the most value-driven storage solution available.
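The idea of migrating blocks by age can be sketched in a few lines of Python. This is an illustration only, not Compellent's algorithm: the `Block` fields, the tier names "fast" and "capacity", and the 90-day threshold (taken from the upper end of the 30-to-90-day figure mentioned earlier) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Block:
    volume: str    # volume the block belongs to
    tier: str      # "fast" or "capacity" (hypothetical tier names)
    age_days: int  # days since the block was written


def progress(blocks, max_age_days=90):
    """Move blocks older than max_age_days from the fast tier to the capacity tier."""
    moved = 0
    for block in blocks:
        if block.tier == "fast" and block.age_days > max_age_days:
            block.tier = "capacity"
            moved += 1
    return moved


blocks = [
    Block("db", "fast", 2),
    Block("files", "fast", 120),
    Block("files", "fast", 400),
    Block("archive", "capacity", 900),
]
moved = progress(blocks)
print(moved)  # 2
```

Because the decision is made per block from metadata alone, no one has to classify files or move them by hand; the two stale "files" blocks leave the fast tier while the fresh database block stays put.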
The Digital Insurance Policy
Just as you take out insurance policies on items of importance in life (e.g. your home or automobile), it is important to ensure that your business assets are protected from loss. In most organizations, this insurance policy comes in the form of a disaster recovery plan. One of the key components of an effective disaster recovery plan is a data backup and recovery solution. Whether the solution involves backing data up to tape, disk, or both, an effective backup and recovery solution is a must in every business that deals with digital assets.
Recently, disk-based backups have become increasingly popular due to their simplicity and redundancy implications. Many companies have adopted a strategy of creating disk-based backups which in turn get dumped onto a tape or other digital or magnetic media. The benefits of having a disk-based backup layer include:
- Ease of use – Often the backup and recovery interface of a disk-based backup solution is simple and easy to navigate.
- Speed – Once the initial data snapshot (an identical copy of the data) is created, nightly changes are backed up quickly.
- Redundancy – The disk-based backup acts as an intermediate backup level between the live data and the tape-based backup. Should the need to initiate a data restore operation arise, it’s as simple as restoring from disk.
However, traditional disk-based backup is not without its faults. Oftentimes, the extra space required to store the data snapshots can be costly. The space taken up by the snapshot data could otherwise be used by production data and applications. Also, the initial snapshot of production data can take an extremely long time depending on the amount of data being copied. The bandwidth used by the copy process could also affect production performance, forcing this process to take place in off-hours. Finally, disk-based backup software typically has limitations around the number of snapshots that can be created and stored.
Compellent recognized both the benefits and drawbacks of traditional disk-based backup and worked to make a better solution. Data Instant Replay is the name given to the technology developed by Compellent which keeps all of the benefits of traditional disk-based backup, and then some, while doing away with the drawbacks.
In essence, Data Instant Replay makes a copy of production data without making a copy of anything. As a result, creating a snapshot of a 5TB volume is nearly instantaneous. How is that possible, you may ask? Data Instant Replay simply freezes the blocks of data that are selected for the snapshot. Any changes to this data are made on top of the frozen blocks. The frozen blocks will be there until the snapshot expires. As a result, a complete snapshot of all production data is created in seconds. Should you need to restore some amount of data, simply mount the replay volume to any server and voilà, your data is available in a matter of seconds.
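The "freeze the blocks, write changes on top" idea can be modeled with a short Python sketch. This is a simplified illustration under our own assumptions, not Compellent's implementation; the `Volume` class and its layer structure are hypothetical.

```python
class Volume:
    """Toy volume whose snapshots freeze existing blocks instead of copying them."""

    def __init__(self):
        self.frozen = []  # frozen layers, oldest first
        self.active = {}  # block number -> data written since the last snapshot

    def write(self, block_no, data):
        # New writes land in the active layer, on top of any frozen blocks.
        self.active[block_no] = data

    def snapshot(self):
        # Freeze the active layer; no block data is copied.
        self.frozen.append(self.active)
        self.active = {}
        return len(self.frozen)  # snapshot id

    def read(self, block_no, snapshot_id=None):
        # Read the live volume, or the volume as it was at a given snapshot.
        if snapshot_id is None:
            layers = self.frozen + [self.active]
        else:
            layers = self.frozen[:snapshot_id]
        for layer in reversed(layers):  # newest layer wins
            if block_no in layer:
                return layer[block_no]
        return None


vol = Volume()
vol.write(0, "report v1")
snap = vol.snapshot()
vol.write(0, "report v2")

print(vol.read(0))        # report v2
print(vol.read(0, snap))  # report v1
```

Taking the snapshot costs the same whether the volume holds one block or millions, since only a reference changes hands; extra space is consumed only by blocks written afterwards.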
Additionally, Data Instant Replay allows unlimited snapshots. The absence of this limitation combined with the fact that an Instant Replay snapshot takes up little-to-no additional storage space means you can create a Data Instant Replay schedule tailored to your business’s needs. Using automated Data Instant Replay, you can schedule snapshots at the frequency that is right for you, whether it be once a day, once per hour, or even every 5 minutes.
Location, Location, Location (Part 2)
Earlier we talked about the importance of data location with regard to expensive disks. Now, we’ll actually talk about data location with regard to geography. As discussed previously, Data Instant Replay provides a simple, effective, and robust method of creating backups of the data stored on the Compellent SAN. However, there may be a need for data to be stored at multiple geographic sites. Whether for offsite data protection or for multi-site data availability, Compellent has addressed the need for data replication with a solution called Remote Instant Replay.
Remote Instant Replay takes Data Instant Replay a step further by replicating snapshots to one or more business locations over long distances. The data synchronization between sites can be synchronous or asynchronous and can be performed over multiple link types such as Fibre Channel or Ethernet. Since Remote Instant Replay replicates data intelligently at the block level and optimizes bandwidth usage, the cost of remote data replication is drastically decreased while performance is drastically increased. Remote Instant Replay’s easy-to-use interface combined with online verification provides one of the most effective data replication solutions available. Additionally, the ability to perform data replication across multiple sites while each site remains active and available allows for one of the most robust data replication solutions available.
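To see why block-level replication saves bandwidth, consider the following Python sketch, which sends only the blocks that differ between two snapshots. It is a generic illustration, not Remote Instant Replay itself: the function names are hypothetical, snapshots are modeled as plain dictionaries, and deleted blocks are ignored for brevity.

```python
def changed_blocks(previous, current):
    """Return the blocks in `current` that are new or differ from `previous`."""
    return {no: data for no, data in current.items() if previous.get(no) != data}


def replicate(remote, delta):
    """Apply a set of changed blocks to the remote copy."""
    remote.update(delta)


site_a_prev = {0: "a", 1: "b", 2: "c"}          # last replicated snapshot
site_a_now = {0: "a", 1: "B", 2: "c", 3: "d"}   # latest snapshot
site_b = dict(site_a_prev)                      # remote site is in sync with the old one

delta = changed_blocks(site_a_prev, site_a_now)
replicate(site_b, delta)

print(delta)  # {1: 'B', 3: 'd'}
```

Only two of the four blocks cross the link, yet the remote site ends up identical to the latest snapshot.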
Welcome to the Real World
The real world, the world we live and operate in today, is one that, ironically, is becoming more virtual than real. The need to reduce operating costs while promoting efficiency, combined with the desire to help preserve our environment, has taken Information Technology in a whole new direction: Virtualization. Virtualization has become increasingly important for business and IT executives throughout the world, and for good reason.
By virtualizing various components of an infrastructure, businesses can realize increased cost savings and productivity. For example, let’s assume your business has 3 Active Directory domains with 6 domain controller servers (two for each domain). These 6 servers could easily be combined into two servers, with one copy of each domain controller running on a virtual machine on each server. Not only does this save on electrical costs, but management of these systems is now more centralized.
Compellent Storage Center virtualizes at the disk level. This spreads read/write access across all disks, dramatically increasing performance. Additionally, since all disks are pooled and virtualized, tasks such as capacity planning and provisioning become extremely simple and can be performed in a matter of minutes.
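As a rough illustration of how pooling spreads work across every disk, the Python sketch below places blocks round-robin over a pool. Round-robin placement is our own simplifying assumption, and the function name and the figures (1,000 blocks, 8 disks) are hypothetical; Storage Center's actual placement logic is not described here.

```python
from collections import Counter


def place_blocks(num_blocks, num_disks):
    """Round-robin placement: block i goes to disk i % num_disks."""
    return [i % num_disks for i in range(num_blocks)]


placement = place_blocks(num_blocks=1000, num_disks=8)
load = Counter(placement)

print(sorted(load.items()))
```

Each of the 8 disks ends up holding 125 of the 1,000 blocks, so reads and writes against the volume are shared by all spindles rather than hammering the few disks a traditional logical disk would be carved from.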
In addition, Compellent features Boot-from-SAN functionality. Used together with features such as Data Instant Replay and Server Instant Replay, Compellent’s built-in Boot-from-SAN features help lower costs, reduce administrative overhead and management time, and improve server availability and performance. By placing all server boot images on the SAN, they become centralized and easy to manage. Furthermore, utilizing Instant Replay, recovery of a corrupted server becomes easier than ever. The time required to deploy, provision, and recover a server can be reduced from 8 hours to less than 15 minutes on average.
Additionally, utilizing the SAN for all storage needs, including boot partitions for servers, drastically reduces infrastructure costs. In fact, Compellent claims that for every 25 servers booting from SAN, organizations will save an average of $70,000 in first-year server costs. By eliminating the need to purchase individual direct-attached storage disks and cutting out the energy consumption of those disks, boot-from-SAN exhibits a very positive return on investment.
Conclusion
Compellent’s SAN solution is, hands down, the most complete SAN solution available. By providing features such as Thin Provisioning, Data Instant Replay, Remote Instant Replay, Data Progression and Boot-from-SAN, in a completely scalable, modular architecture, they have taken all of the possible reasons for not owning a SAN out of the equation.