Thursday 13 December 2012

Points to remember when expanding the RAID group in Hitachi HUS (Hitachi Unified Storage)


A RAID group can be expanded on Hitachi HUS storage, but we need to take a lot of care during the expansion; if you overlook some of the points below, you can end up with data loss.
1. A RAID group is expanded by adding disks to it.
2. A maximum of 8 disks can be added at a time, or fewer if the addition would reach the maximum RAID group width.
3. R0 (RAID-0) groups cannot be expanded.
4. Any number of expansion requests can be queued, but at any point in time each controller will process only one RAID group expansion.
5. Expanding the RAID group will not expand the LUNs inside it (LUNs are known as LUs in Hitachi terminology).
6. When the RAID group is expanded, extra free space gets created in which we can create extra LUs, as the capacity sketch after this list illustrates.
7. RAID group expansion takes time and performance decreases slightly, so Hitachi recommends doing this operation when the IOPS on the storage are low.
8. Expanding the RAID group will not change the RAID level: an R5 (RAID-5) group will still be R5 after it is expanded.
9. Only RAID groups whose PG depth is 1 can be expanded.
10. A RAID group can be expanded but cannot be shrunk.
11. A RAID group expansion passes through two states: the waiting state and the expanding state.
12. An expansion in the waiting state can be cancelled; this state means the expansion has not started yet, so you can cancel it if you want.
13. An expansion in the expanding state cannot be cancelled; the expansion has already started, and trying to cancel it forcefully can end in data loss.
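To make point 6 concrete, here is a minimal capacity sketch in plain Python (not Hitachi tooling); the drive counts and sizes are made-up example values:

```python
def raid5_usable_tb(drive_count: int, drive_tb: float) -> float:
    """RAID-5 reserves one drive's worth of capacity for parity."""
    if drive_count < 3:
        raise ValueError("RAID-5 needs at least 3 drives")
    return (drive_count - 1) * drive_tb

before = raid5_usable_tb(5, 0.6)   # a 4D+1P group of 600 GB drives
after = raid5_usable_tb(8, 0.6)    # the same group after adding 3 drives
print(f"extra space for new LUs: {after - before:.1f} TB")   # 1.8 TB
```

The LUs that already exist keep their size (point 5); only the free space in the group grows, and that is where the new LUs are carved.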

Rules for expanding a RAID group
You cannot expand a RAID group under the following conditions (a pre-flight check covering these rules is sketched after the list):
1. An LU in the group has a forced parity correction status of Correcting, Waiting, Waiting Drive Reconstruction, Unexecuted, Unexecuted 1, or Unexecuted 2. In other words, while parity correction is going on, please don't perform the RAID group expansion; let the activity complete and then do the expansion.
2. An LU that is part of the RAID group you need to expand is being formatted. Don't expand the RAID group unless and until the formatting completes.
3. You have just set or changed the Cache Partition Manager configuration: the storage system must be rebooted, so expand the RAID group after the reboot. Similarly, for a storage system in which the power saving function is set, change the power saving status to "Normal (spin-on)" and then expand the RAID group.
4. Dynamic sparing, correction copy, or copy back is operating. Expand the RAID group after the drive has been restored.
5. Firmware is being installed. Expand the RAID group after the firmware installation completes.
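The rules above boil down to a simple checklist. Below is a hypothetical pre-flight check in Python that mirrors them; the status strings and the function itself are my illustration only, not an actual Hitachi API:

```python
# Parity-correction states that block expansion (illustrative strings).
BLOCKING_PARITY_STATES = {
    "correcting", "waiting", "waiting drive reconstruction",
    "unexecuted", "unexecuted 1", "unexecuted 2",
}

def safe_to_expand(lu_parity_states, lus_formatting,
                   cache_partition_changed, power_saving_status,
                   sparing_or_copy_back_running, firmware_installing):
    """Return (ok, reason) for a RAID group expansion request."""
    if any(s in BLOCKING_PARITY_STATES for s in lu_parity_states):
        return False, "forced parity correction in progress"
    if lus_formatting:
        return False, "an LU in the group is still formatting"
    if cache_partition_changed:
        return False, "reboot required after Cache Partition Manager change"
    if power_saving_status != "Normal (spin-on)":
        return False, "set power saving to Normal (spin-on) first"
    if sparing_or_copy_back_running:
        return False, "dynamic sparing / correction copy / copy back running"
    if firmware_installing:
        return False, "firmware installation in progress"
    return True, "ok"

ok, reason = safe_to_expand(
    lu_parity_states=["normal"], lus_formatting=False,
    cache_partition_changed=False, power_saving_status="Normal (spin-on)",
    sparing_or_copy_back_running=False, firmware_installing=False)
print(ok, reason)   # True ok
```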

Best practices for RAID group expansion
1. You can assign priority either to host I/O or to the RAID group expansion.
2. Perform a backup of all data before executing the expansion (see the note below).
3. Execute the RAID group expansion at a time when host I/O is at a minimum.
4. Add drives with the same capacity and rotational speed as the expansion target RAID group to maximize performance.
5. Add drives in multiples of 2 when expanding RAID-1 or RAID-1+0 groups.

The reason point 2 asks you to back up all data before the expansion is that if there is a power failure during the expansion, or something else happens to the system and a disaster strikes, the LUs associated with the expansion can become unformatted and there is a chance of data loss. So, to be on the safe side, please take a backup of the data before performing the expansion.

Friday 30 November 2012

Avamar


As I was going through the EMC Avamar product, I thought of writing something about it in my blog. As we know, backup is an evergreen technology that keeps improving day by day, and IT companies are really investing a lot of money in it. Symantec products like Backup Exec and NetBackup used to be the most heard names in the backup world, but nowadays EMC is eating into their market with its Avamar, NetWorker, and Data Domain products. EMC holds around 68% of the backup market, and these products are really giving a tough fight to Symantec and the other backup vendors.

Let me introduce Avamar.
Avamar is a software + hardware appliance that takes backups to disk. Avamar does not come model-wise; rather, it comes capacity-wise, like 3.9 TB and so on. Avamar does source-based deduplication, so it becomes easy to send data over the LAN or WAN at a good speed. There is one more good thing about this product: it takes a daily full backup, so restoring data becomes much faster than from traditional tape.

As we know, because of virtualization a lot of physical servers have moved into the virtual world, and the complexity of backing up those virtual machines has increased with them; with Avamar we can back up those virtual machines easily and at a good speed, and even NAS devices. Sometimes NDMP backup becomes quite time consuming, but with Avamar the NDMP backup really becomes fast and smooth, and the backup window gets reduced as well.

One more thing I forgot to mention: laptop and desktop backup. In most IT companies and other companies we don't take backups of our laptops and desktops, and if a laptop is lost or crashes it becomes difficult to retrieve the data; Avamar has the capability of taking desktop and laptop backups too. Data encryption in flight and at rest is an added advantage from a security perspective, and centralized management makes protecting hundreds of remote offices easy. With Avamar Data Transport, the deduplicated data can be transported onto tape for long-term retention. Finally, the Avamar grid architecture provides online scalability, and the patented Redundant Array of Independent Nodes (RAIN) technology provides high availability.
Now, when I was telling one of my friends about these Avamar features, he laughed and asked me what is new in this technology; other products also offer deduplication, so what's new in this deduplication? Right, what's new?
I would say that every company has its own deduplication algorithm, and its deduplication software uses that algorithm to dedupe the data. I don't want to go deep into algorithms, but I have worked on NetApp's deduplication technology: it breaks, or scans, the data in fixed 4 KB blocks to find duplicate data, and that takes a lot of CPU, which is why NetApp deduplication is always recommended to run in non-production hours, like nights after 12, or on Saturdays and Sundays. From this we can understand that its dedup technology eats a lot of CPU. Now, I am not saying NetApp deduplication is not good; what I want to say is that dedup technology depends very much on the type of algorithm a company uses, in particular whether it scans the data in fixed blocks or in variable-length segments or blocks.

Variable vs. fixed-length data segments
Segment size is the key factor for eliminating redundant data at a segment or sub-file level. Fixed-block or fixed-length segments are commonly used by a lot of deduplication technologies, but a small change to a dataset (for example, adding a small word at the beginning of a file) can shift every fixed-length segment in the dataset, despite the fact that very little of the data set has actually changed. Avamar uses an intelligent variable-length method for determining segment size that looks at the data itself to determine the logical boundary points, eliminating this inefficiency.
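To see why boundary placement matters, here is a small, generic Python demonstration. Avamar's actual boundary-detection algorithm is proprietary, so the simple rolling-hash rule and the tiny 8-byte blocks below are stand-ins chosen only to illustrate the fixed-versus-variable behaviour:

```python
import hashlib

def fixed_chunks(data: bytes, size: int = 8):
    """Split data into fixed-length blocks (8 bytes for the demo;
    real systems use sizes like 4 KB)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def variable_chunks(data: bytes):
    """Content-defined chunking: cut wherever a cheap rolling value over
    the most recent bytes hits a pattern, so cut points follow the
    content itself instead of fixed offsets."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) ^ b) & 0xFF      # 8-bit rolling value
        if h & 0x07 == 0x07:           # boundary: low 3 bits set (~1 in 8)
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def fingerprints(chunks):
    return {hashlib.sha1(c).hexdigest() for c in chunks}

old = (b"Pack my box with five dozen liquor jugs; the quick brown fox "
       b"jumps over the lazy dog while ten watchmen doze by the door.")
new = b"XX" + old                      # a tiny insertion at the front

shared_fixed = fingerprints(fixed_chunks(old)) & fingerprints(fixed_chunks(new))
shared_var = fingerprints(variable_chunks(old)) & fingerprints(variable_chunks(new))
print(len(shared_fixed), "fixed blocks survive the insertion")    # expect 0
print(len(shared_var), "variable chunks survive the insertion")   # most of them
```

The two-byte insertion shifts every fixed block, so their fingerprints all change; the content-defined boundaries re-align a few bytes after the change, so only the first chunk has to be backed up again.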

Logical segment determination
Avamar's algorithm analyzes the binary structure of a dataset (the 0s and 1s that make up the dataset) in order to determine segment boundaries that are context dependent, so Avamar's client agent is able to identify exactly the same segments for a dataset no matter where in the enterprise the dataset is stored. Avamar's variable-length segments average 24 KB in size and are then compressed to an average of just 12 KB. By analyzing the binary structure, Avamar's method works for all file types and sizes, including databases; for instance, if a paragraph is added at the beginning or in the middle of a text file, Avamar's algorithm will identify and back up only the newly added segments rather than backing up the whole file again.
For each 24 KB segment, Avamar generates a 20-byte ID using the SHA-1 algorithm; this unique ID is like a fingerprint for that segment. The Avamar software then uses the unique ID to determine whether a data segment has been stored before, and only unique data is stored, eliminating the duplicate data.
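Here is a toy, in-memory version of that fingerprint-and-skip step (plain Python; the real Avamar server-side structures are of course far more elaborate):

```python
import hashlib

# store maps a 20-byte segment ID to the segment's data; a real Avamar
# grid distributes this across nodes, but the skip-if-seen logic is the
# same idea.
store = {}

def backup_segment(segment: bytes) -> bool:
    """Store a segment only once; return True if it actually got stored."""
    seg_id = hashlib.sha1(segment).digest()   # the 20-byte "fingerprint"
    if seg_id in store:
        return False                          # seen before: send nothing
    store[seg_id] = segment
    return True

segments = [b"alpha", b"beta", b"alpha"]      # the third is a duplicate
stored = sum(backup_segment(s) for s in segments)
print(f"stored {stored} of {len(segments)} segments")   # stored 2 of 3
```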

Benefits of Avamar
1. Because of client-side deduplication there is a tremendous reduction in daily network bandwidth and backup data; the Avamar whitepapers claim up to 500x reduction in bandwidth and backup data.
2. Because only changed data is sent, you can see up to 10x faster daily full backups.
3. Restoration is fast because of the daily full backup.
4. If you are sending the backup data through the LAN/WAN to a remote site, there too you can see a tremendous amount of bandwidth reduction.
5. All the benefits of disk now apply to your backup: end-to-end protection, fast restoration, fast backup, reduction of the backup window, and so on.
6. We know deduplication technology is resource-intensive, which would suggest the client is heavily utilized while an Avamar backup is running; in fact the Avamar client deduplication feature runs at low priority and does not take much CPU.

Grid Server Architecture
The Avamar grid server architecture is another feature that makes Avamar a more reliable, scalable, performance-oriented, flexible, and highly available solution.
Avamar's global deduplication feature also works on the unique-ID process I have already discussed above: duplicated data is never copied again. But to maintain this there must be a good indexing technology. As we know, a centralized index is not very reliable: if the index file is corrupted, we will not be able to retrieve the data from tape or anywhere else, because we will not know where the data is stored. So Avamar uses a distributed indexing feature.

Distributed Indexing
As the data volume increases, a centralized index becomes increasingly complex and difficult to manage, often introducing a bottleneck into backup operations. In addition, corruption of a centralized index can leave an organization unable to recover its backup data. Avamar uses a distributed indexing method to overcome this challenge. Avamar uses the segment ID in a manner similar to a landline phone number: the area code tells you the general area where the call needs to be routed, and the number itself gives the exact location the call is targeting. In the same way, Avamar uses one portion of each unique ID to determine which storage node a segment will be stored on, and another portion of the unique ID to identify where the data will be stored within that specific storage node. By this process, identification of the data is quite easy, and hence restoration becomes faster and easier. An automatic load-balancing feature distributes the load, and the data, across the storage nodes.
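A rough sketch of that phone-number idea in Python (my illustration, not Avamar's actual layout; which bytes of the ID are used for routing is an assumption made only for the demo):

```python
import hashlib

NODES = 4   # number of storage nodes in this toy grid

def locate(segment: bytes):
    """Derive (node, slot) from the segment's 20-byte SHA-1 ID:
    one slice picks the node (the "area code"), another slice
    addresses the segment within that node."""
    seg_id = hashlib.sha1(segment).digest()
    node = seg_id[0] % NODES        # which storage node
    slot = seg_id[1:5].hex()        # address within that node
    return node, slot

for s in (b"payroll.db", b"mail.pst", b"photo.jpg"):
    node, slot = locate(s)
    print(f"{s.decode():>10} -> node {node}, slot {slot}")
```

Because the routing is derived from the ID itself, no central lookup table is needed, and segments spread across the nodes automatically, which is the load-balancing behaviour described above.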

RAIN Architecture (Redundant Array of Independent Nodes)
Avamar also supports a RAIN architecture across its nodes to provide fault tolerance and failover: if any node fails there is no effect on the data, and online reconstruction of that node's data is started. On top of that, RAID-5 or RAID-1 within a node protects against a disk failure. Avamar also performs a daily check of the backed-up data, so that what has been backed up can actually be restored properly; and because Avamar is a disk-based backup solution, block checksums are performed on disk to check for error blocks.

Flexible deployment options
Agent-only option: for smaller or remote offices, you can install the agents on servers, desktops, and laptops, dedupe the data on the source side, and send it to the centrally located Avamar server through the WAN.
EMC-certified server: Avamar software installed on an EMC-certified server running Red Hat Enterprise Linux, from vendors including Dell, HP, and IBM.
EMC Avamar Virtual Edition for VMware: the industry's first deduplication virtual appliance, meaning you can install it as a virtual machine on an ESX server host, leveraging the existing server CPU and disk storage.

EMC Avamar Data Store: an all-in-one packaged solution, meaning hardware + software. It comes in two models: a scalable model and a single node. The scalable model can be placed in a centralized datacenter, where it stores the data coming from all the remote sites and can grow up to petabytes of storage.
A single-node Avamar solution is ideal for deployment at remote offices that require fast local recovery performance. It provides up to 1 TB, 2 TB, or 3.3 TB of deduplicated backup capacity, which under a typical tape-and-disk backup solution could take up tens of terabytes of space.

Conclusion
Avamar is one of the best backup solutions for enterprise customers who are really facing a lot of problems backing up their data, but it becomes a costly solution for some SMB customers. So even though Avamar is among the best in the industry for desktop/laptop backup and VMware backup, it still faces a lot of challenges when it comes to cost, and because of the cost, Data Domain is taking a lot of the backup market in the SMB sector.
Avamar is best known for its desktop/laptop backup and VMware backup, and because it does only full backups, it makes restoration much faster.
One of the good things about this solution is that it is end to end, meaning software and hardware together, so if we implement it in our environment we get end-to-end support from EMC. If a backup fails or some issue happens, I don't need to call my backup software vendor and my hardware vendor separately, which is what usually happens when the backup software is from one vendor and the hardware from another: you end up logging two different support calls if something goes wrong with the backup.


Tuesday 30 October 2012

Hitachi Unified Storage Specification and Features


A few days back I attended the Hitachi HUS modular training, and while the training was going on, since I have already worked on NetApp storage, I kept comparing each Hitachi feature with the NetApp unified storage features. Apart from the architecture (hardware, CPU, and cache) I did not find anything new in Hitachi Unified Storage. One thing I can say, though: installing Hitachi storage and upgrading Hitachi firmware are quite a bit easier than on NetApp storage, and even management is easier than through the NetApp GUI.

Hitachi Unified Storage is not unified in hardware, but it is unified on the software front: there is Hitachi Command Suite for management of Hitachi SAN/NAS and even the VSP, AMS, and other products.

When I asked why they have not merged all the hardware into a single box like NetApp, their straight answer was that they don't want to compromise on performance. More hardware means better performance, because each piece of hardware has its own memory, RAM, and CPU to perform its own task, whereas merging everything into a single box increases the load on the central CPU, and then there would be more hardware failures and a decrease in performance.

Well, I have not used any Hitachi product myself, so I cannot say whether they are good in performance or not, but the customer feedback examples they presented in the training do suggest that Hitachi makes really well-performing storage boxes. If you look at the controller architecture, you can see there is 6 GB/s connectivity between the RAID processor (DCTL) chips; in other words, between the two controllers there is a 6 GB/s link, which helps the controllers do load balancing. That's good, and it was new to me, because I have not seen it in NetApp. Yes, controller failover and failback happen in NetApp, but NetApp never says that its failover/failback link does any kind of load-balancing activity.
Yes, I know that in NetApp the disks are owned by their respective controller, their load is handled by that owning controller only, and each controller has some load-handling capacity. But a lot of customers don't understand that, and they keep adding disk shelves to one of the controllers in the hope that in the future they will add shelves to the other controller. Because of this, the controller that owns more disks has to handle far more IOPS than the other, so its utilization increases and performance decreases.
I have also worked on some of the EMC VNX series storage boxes, which are EMC's unified storage; there they strictly recommend adding expansion shelves alternately. For example, if one expansion shelf is attached to one controller's bus, then the next expansion shelf should be attached to the other controller's bus, for load balancing.
So that clearly states that neither NetApp nor EMC has this type of 6 GB/s internal connectivity between the controllers that can do automatic load balancing like Hitachi. But I still cannot write much, because I do not have any experience with Hitachi's automatic load-balancing feature, so I cannot say whether it really works fine.
But after enquiring with some colleagues there who have good experience with Hitachi storage, they said they have hardly ever seen hardware failures in Hitachi other than disks; I mean controller failures, PCI card failures, fan failures, or power supply failures. I can say that hardware failure in NetApp is quite a bit higher than in other vendors' storage products. That's my personal experience, and I don't know why. You may even have experienced it: suddenly some mail drops into your inbox from NetApp stating please do this or that firmware upgrade urgently to save your controller from failing (W.T.F.), and then you have to plan for that activity. So bad. In my personal experience, NetApp has lots of bugs, and the good part is that they keep working on them.
In the Hitachi training they were focusing more on the cache part. In Hitachi storage you can do a lot of cache tuning, like setting the cache size as per your application. Data is first written to the cache and then to the disk, so if there is a power failure the cache data gets copied to flash memory, and the content can remain there indefinitely (that is, until you recover from the power failure).
There is one more feature in Hitachi Unified Storage: you can set a larger block size. Every storage system divides data into some default block size, like 4 KB, and then stores it on disk, but in Hitachi you can also increase the block striping size. All such changes should only be made on the recommendation of the Hitachi technical team.
Striping with a bigger block size is really a fantastic feature and it genuinely improves performance, so from all these good features you can see that Hitachi is strongly focused on performance, which is the really good and best thing about Hitachi storage. Apart from this, basic installation of Hitachi is quite a bit easier than NetApp and EMC, and part replacement is also easier than on NetApp and EMC; for a part replacement you don't have to apply any extra brainpower, just go and remove the faulty part, that's all.
But today's storage world is talking about deduplication and automatic tiering, which are still not there in Hitachi storage; these features are to be launched in a coming version.

Below are the file module specifications:

Below is the HUS block storage specification:
Note that with HUS unified storage you have to buy the block-level storage and the file level together; only then is it unified. So when you say you want Hitachi unified, you get both block and file.


Well, I have not explained the features of Hitachi storage in depth, because I still need to work with this product, and by working on any product you can easily understand it; so in the future I will be writing some more blogs on Hitachi products.