The Top 5 Things You Can Do To Mess Up Your CommVault Deployment
In the latest edition of this series, we tackle deploying CommVault Simpana. We have delivered a number of CommVault deployments, health checks, and upgrades over the years, and we thought we would share some of our experiences with the broader community. As with any complex enterprise storage or backup solution, there are pitfalls and hidden dangers that stand in the way of a successful project. Knowing what those are and how to address them can save you lots of time and money.
Needless to say, we are huge fans of CommVault. They write some incredible software and their focus on quality is second to none in the industry.
With that being said, let’s get to the list of the Top 5 Things You Can Do To Mess Up Your CommVault Deployment.
1. Don’t Do Any Upfront Capacity or Workload Analysis
Before you embark on your new project, what should you know?
- Know your data footprint
- Know your applications
- Know their criticality to your organization
In other words, take the time to truly understand what your landscape looks like, supported by some empirical data. Most people have an “idea” of what their data set looks like, but oftentimes, after we capture some data, we see a large discrepancy between that “idea” and “reality”.
What do I mean by this? Several of our clients in the healthcare and energy spaces have made big acquisitions over the last few years and have accumulated several holding companies, each with its own backup and recovery software. As they inherit more and more platforms and their footprint grows, their visibility into these environments becomes more and more clouded.
Here are some questions I would ask before the project gets started:
- Do I have a list of my backup Clients at each site?
- How many of these Clients are Physical versus Virtual Machines?
- What are the Operating Systems in use?
- What is the rate of change for each Client?
- What Application is in use (relational database, application server, etc.)?
- If deploying CommVault replication, what is the size of the circuit connecting your sites?
- What is my footprint size relative to the size of my pipe?
Do this analysis up front, prior to deploying, because you will use these inputs to answer some tough questions. Let me give you an example:
- If my tier 1 applications need to complete their backups inside a 24-hour window (all backup jobs and auxiliary copy jobs must complete within 24 hours), what is the maximum amount of data that I can back up to stay within this window? To answer, I would need to know my application’s daily change rate, my circuit bandwidth, and a calculation as to how long the backup and auxiliary copy jobs will take.
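That window check is just arithmetic, and it is worth sketching out before the project starts. Here is a back-of-the-envelope version in Python. All the figures are hypothetical examples, and the math deliberately ignores deduplication, compression, and protocol overhead, so treat it as a starting point rather than a sizing tool.

```python
# Back-of-the-envelope check: does the daily change fit the backup window?
# All figures below are hypothetical examples, not recommendations.

def transfer_hours(data_gb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Hours needed to move data_gb over a link_mbps circuit at a given efficiency."""
    usable_mbps = link_mbps * efficiency          # real links rarely sustain line rate
    seconds = (data_gb * 8 * 1000) / usable_mbps  # GB -> megabits, then divide by Mb/s
    return seconds / 3600

daily_change_gb = 500   # tier 1 daily change rate (hypothetical)
circuit_mbps = 100      # WAN circuit between sites (hypothetical)
backup_hours = 6        # time the local backup jobs need (hypothetical)
window_hours = 24

aux_copy_hours = transfer_hours(daily_change_gb, circuit_mbps)
total_hours = backup_hours + aux_copy_hours

print(f"Aux copy transfer: {aux_copy_hours:.1f} h")
print(f"Total: {total_hours:.1f} h -> {'fits' if total_hours <= window_hours else 'misses'} the window")
```

Even this crude model answers the key question: at these numbers the auxiliary copy alone eats roughly 16 of the 24 hours, so there is very little headroom if the change rate grows.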
There are several good tools that can be used to quickly canvass your environment, collect some metadata, and create the reports you are seeking.
2. Don’t Involve Your Network Team
This really applies when you are using CommVault’s auxiliary copy jobs that use DASH (Deduplication Accelerated Streaming Hash) Copy over WAN links.
DASH Copy is intended to use network bandwidth very efficiently, but if you don’t understand your network and firewall devices or you don’t have visibility into your circuit utilization, you will likely have problems when you start up the remote copies.
To work well, CommVault’s DASH Copy feature relies on network performance and firewall throughput. But, what if your network is not properly sized to begin with? What if your firewalls can’t handle this additional network throughput?
We have seen situations where entry-level firewalls at branch office sites can’t process the streams and start to drop packets. Instead of getting decent throughput, users get a trickle. What happens when CommVault sees a dropped packet during a DASH copy? It will place the job in a pending state and retry it, or worse, go into an error state.
Make sure your network team has a seat at the table, and that they have the network tools and analytics at their fingertips to give you and the backup team the circuit utilization and bandwidth reports.
3. Pick Disk Optimized Instead of Network Optimized (or Vice Versa)
Improving DASH copy performance is achieved in 1 of 2 ways – network optimized or disk optimized. Disk optimized is the default, which can mess things up if you don’t understand why you would want one option over the other. CommVault training and their Books Online (BOL) spend quite a bit of time explaining the differences, so I’m not going to do that here.
The big difference between the two is the process used to generate the hash that is compared against the source-side cache: Network Optimized re-hydrates the data and creates a new hash, while Disk Optimized uses the hash already stored in the chunk’s metadata. I recommend using network optimized if I am DASH copying small jobs over a VERY small pipe and using network throttling.
Think of it like your IRS tax bill. How do you want to pay the tax – with your network bandwidth or disk resources?
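To make the tradeoff concrete, here is a small conceptual sketch in Python. This is not CommVault’s implementation; it just contrasts the two ways of producing the signature that gets compared against the source-side cache: recomputing a hash from re-hydrated data versus reusing a hash stored with the chunk’s metadata at write time.

```python
import hashlib

# Conceptual sketch only -- NOT CommVault internals. It contrasts recomputing
# a signature from re-hydrated chunk data with reusing a signature that was
# stored in the chunk's metadata when the chunk was first written.

def signature(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Pretend this is a deduplicated chunk on disk, with its hash kept as metadata.
chunk_data = b"block of application data" * 100
chunk_metadata = {"signature": signature(chunk_data)}  # computed once, at backup time

def network_optimized_signature(data: bytes) -> str:
    # Re-hydrate the chunk and compute a fresh hash: extra read and CPU work,
    # in exchange for sending only compact signatures across the wire.
    rehydrated = bytes(data)  # stand-in for the re-hydration step
    return signature(rehydrated)

def disk_optimized_signature(metadata: dict) -> str:
    # Reuse the hash already stored with the chunk: no re-hydration needed.
    return metadata["signature"]

# Both paths yield the same signature to compare against the source-side cache;
# the difference is purely where the work lands -- the "tax bill" from above.
assert network_optimized_signature(chunk_data) == disk_optimized_signature(chunk_metadata)
```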
4. Forget How to Seed
Seeding allows you to transfer the initial Client backup between two sites using an available removable disk drive, such as a USB drive. Seeding is commonly used when remote office sites are separated from the main data center across a WAN and data needs to be either backed up remotely or replicated periodically to a central data center site – sort of a hub-and-spoke model.
Once the initial baseline backup is established, all subsequent remote Auxiliary Copy operations consume less network bandwidth, because only the changes are transferred.
Like I said above, when warranted, it’s important to change the aux copy properties to “network optimized” so as to cut down on the communication by only sending unique signatures over the network.
But more important, when you Seed, you MUST follow the proper Seeding process, especially when you seed a site over a slow connection, or you won’t cut down on the communication.
First, you attach a USB drive to the Media Agent at the source site and run what’s called a seed copy (basically, a tertiary copy).
Then, you send that device to the central data center site, associate the secondary copy with the seed copy, and ingest that seed. This process will create a folder called CV_CLDBD_AUX_<job id> (the directory name is appended with a unique number associated with the job).
Next, copy the AUX folder back to the source. If you skip this step and don’t copy that folder’s data back to the source, all you have essentially done is ingest the backup data into the CommServe to make it available for reporting and recovery. You haven’t done anything for DASH copy performance.
What generates a lot of communication over the WAN link is when the CV_CLDBD_AUX folder is not copied back: the source-side cache doesn’t know the other side has the chunks, so it keeps sending data the other side already has.
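A toy simulation makes the point. This is purely conceptual, not CommVault internals: a source-side cache of chunk signatures decides what crosses the WAN. With an empty cache (the AUX folder never copied back), every chunk is resent; with a primed cache, only the genuinely new chunk goes over the wire.

```python
import hashlib

# Toy model of a source-side deduplication cache -- conceptual only, not
# CommVault internals. Chunks whose signatures the cache already knows are
# skipped; unknown chunks are sent in full and then recorded in the cache.

def bytes_sent(chunks, cache):
    """Return how many bytes would cross the WAN for this set of chunks."""
    sent = 0
    for chunk in chunks:
        sig = hashlib.sha256(chunk).hexdigest()
        if sig not in cache:
            sent += len(chunk)   # chunk must cross the WAN
            cache.add(sig)       # the remote side now has it
    return sent

baseline = [b"A" * 1024, b"B" * 1024, b"C" * 1024]   # chunks seeded via USB drive
todays_backup = baseline + [b"D" * 1024]             # one new chunk of daily change

# AUX folder NOT copied back: the cache is empty, so everything is resent.
assert bytes_sent(todays_backup, set()) == 4 * 1024

# AUX folder copied back: the cache is primed with the seeded chunks,
# so only the one new chunk crosses the WAN.
primed = {hashlib.sha256(c).hexdigest() for c in baseline}
assert bytes_sent(todays_backup, primed) == 1 * 1024
```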
We recently found that after redoing the seeding and the disk/network optimization setting, DASH copy jobs ran 5X – 40X faster – a huge improvement.
5. Skip The Training
For a recent deployment, we found that one person had gone to only the very first CommVault training course. He then began deploying the software in a very sophisticated manner, utilizing many of the features we spoke of above and even more, like VirtualizeME.
There is a mismatch between the skill and expertise of the team deploying the software and the expectations of the organization’s leadership team. Ultimately, this leads to a challenging deployment and frustration.
No Shame Zone
If you find yourself in the midst of a challenging CommVault deployment, or even if you have made one of the mistakes I have outlined above, there is no shame in reaching out for help.
Headwaters is a “no shame” zone, I assure you.