Making the Case for Hybrid Cloud – What is your Cloud Strategy? Part 3

In the previous two parts of the Cloud Strategy blog we discussed the organizational changes required to adopt a cloud operating model and different phases of migration. In this part, let’s first look at a real customer scenario where Hybrid Cloud is a great fit and then explore some generic use cases.

Customer Case Study: E-commerce application that needed to rapidly innovate to provide business intelligence

Business Requirement: A major legacy e-commerce retailer needs to stay competitive against disruptive newcomers (StitchFix and the like), retain customers, and increase revenue from online and in-store shoppers.

Technology Requirement: To stay relevant, the e-commerce portal needs to be able to collect analytics from visitors to their site, use algorithms and data science to personalize clothing items based on size, budget, local trends and style.

Constraints: Moving the existing 3-tier app to a native public cloud provider is deemed too risky and expensive, so it is not an option. The project needs to be completed on a very short timeline. The application cannot be refactored, and the guidance is to make as few changes as possible to the stable working environment.

Assumptions: Sufficient in-house public cloud and DB/business intelligence technology experience is available. Any changes will not impact the core framework and will be performed with rigorous testing, in a manner that introduces the least amount of risk. Operational teams are skilled and agile enough to adapt to new solutions and technology stacks and can provide guardrails around security in the public cloud.

Risks: Introducing changes to any existing application inherently introduces risk. Adopting new solutions can expose security and operational gaps.

Current Architecture and Proposed Enhancements:

The current application was a 3-tier application deployed on-premises. It consisted of VMs and a SQL database for storing transactions. Developers wanted to use a secondary database for online analytical processing (OLAP). They needed a NoSQL database to support click-stream capture and gather intelligence about site visitors, their browsing history and preferences. The developers were unsure which NoSQL DB would be the right choice for them. They analyzed what the intended use would be and how big it would grow, but this data was not definitive until they were able to deploy and analyze the data they would actually be able to collect.

One group wanted to go with either CouchDB or MongoDB and use a document-based, JSON-compatible database. Another group was sure that they would be collecting a lot of data and wanted to start with a large columnar DB like Cassandra, or a managed NoSQL DB like DynamoDB.

From an IT perspective, supporting each of these requests takes months of preparation – from sourcing vendors and identifying compute resources to the security framework, operational models, training and education. These workloads are better suited to run on public cloud.

The cost of allowing developers to experiment on public cloud is minimal compared to supporting both DB options in house. Developers are given the freedom to explore the public cloud's virtually unlimited ecosystem, do some rapid prototyping and fail fast. This allows them to make an informed choice.

We designed this high-level hybrid cloud architecture to allow developers the fastest path to innovate without rewriting their entire application in public cloud or having to lift and shift.

The block architecture is kept intentionally simple but demonstrates an easy approach to add “hybridity” to any application that is running in a traditional data center.

Developers added some web-hooks to capture click-stream data in parallel. This data was exported to a NoSQL DB like DynamoDB on AWS. The export was made over a secure VPN connection over Direct Connect. API gateways are also a good way to import or export data into public cloud. Once this data was in a NoSQL DB on a public cloud, it was very easy to allow developers their choice of tools to analyze it. A simple Lambda workflow was proposed to break this new data into ingestible chunks, and a Kinesis stream was suggested to convert this data into business intelligence.
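For illustration only, here is a minimal sketch of what the click-stream capture half of this flow could look like, using boto3 to write browsing events into a DynamoDB table. The table name, key schema and attribute names are my own assumptions, not the retailer's actual implementation.

```python
import time
import uuid

import boto3

# Hypothetical table and region; the real deployment wrote over Direct Connect/VPN.
dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
clickstream_table = dynamodb.Table("clickstream-events")


def handle_click_event(visitor_id, page, item_sku=None):
    """Persist one click-stream event for later analytics (OLAP) processing."""
    clickstream_table.put_item(
        Item={
            "event_id": str(uuid.uuid4()),   # assumed partition key
            "visitor_id": visitor_id,
            "page": page,
            "item_sku": item_sku or "none",
            "event_time": int(time.time()),  # epoch seconds, assumed sort attribute
        }
    )


if __name__ == "__main__":
    handle_click_event("visitor-123", "/womens/dresses", item_sku="SKU-42")
```

From there, a DynamoDB Streams trigger or a scheduled Lambda could feed the same events into a Kinesis stream for the analytics workflow described above.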

The last part of this was to provide this data back to the business to make personalized recommendations, promo codes and other incentives to the shopper to complete a transaction.

Still not convinced of what business intelligence and data analytics can provide? See how Netflix suggests different title cards for the same shows based on user preferences.

VMC on AWS and Other Hosted VMware Engines

This “on-premises” 3-tier application could just as easily be migrated to VMware Cloud on AWS or other hyperscale solutions like Google Cloud VMware Engine or Azure VMware Solution, to reduce latency and provide better access to the cloud ecosystem. In the case of VMC on AWS, there are no egress charges between a VMware SDDC (which would typically host the 3-tier app in AWS) and services used from a native AWS VPC within the same region. This blog provides a good description of the egress charges.

The migration is greatly simplified by the ability to maintain the same operational experience and tools that the operations teams already use, now supporting a VMware stack in a public cloud.

Note: in the case above, latency is not an issue because the transactional DB was still on-premises and the business did not need sub-second granularity for the business analytics input.

Hybrid Cloud especially makes sense for these workloads:

  1. Legacy apps that are still business critical but are stable and have predictable resource requirements. In some cases, the supporting products are already End of Life, with extended support options.
  2. Applications where it doesn’t make sense to lift and shift to public cloud because there are ongoing multi-year efforts to re-architect them.
  3. Dev or production workloads with supporting services that IT is comfortable supporting on-premises – i.e. no developer/experimental projects that require a cutting-edge technology stack. The ecosystem of supporting services has mature, enterprise-grade support available, or SaaS options that can offload the in-house IT expertise required. For instance, it is straightforward for IT to support SQL Server 2019, as opposed to an on-premises NoSQL solution, for which the public cloud offers a plethora of options.
  4. Compliance requirements that restrict applications and their data from being stored in public cloud. While public cloud is inherently secure and provides the same level of security as on-premises data centers, there may be regulations that limit public cloud usage for certain applications.

Public Cloud vs Private Cloud vs Hybrid Cloud

With public cloud becoming more mainstream in the early part of the last decade, many established enterprises jumped to a “cloud first” mentality for all greenfield development, primarily to save data center OPEX and CAPEX. Many others mandated that all data centers be evacuated and workloads moved to the cloud. If you are a business with a relatively large footprint (2,000+ VMs, in my opinion), you could make the case that it may be cost-advantageous to run your own data center, either in a co-lo or at a hosted data center.

Today, I have far fewer conversations with C-levels about the cost-only advantages of public cloud. Many businesses that mandated data center evacuation are repatriating workloads onto on-premises data centers because of ballooning public-cloud costs. Whatever the case, it’s fair to say that public cloud was not the cost panacea that businesses sought.

Related IDC Report

Read about the savings realized by “Fat Tire” brewing company.

In all fairness to public cloud providers, they have continuously innovated to support flexible hybrid architectures: for instance, the ability to offload storage onto S3 with on-premises storage gateways, or the Netflix use case, where Netflix runs largely on AWS but uses its own CDN to distribute content across the globe.

“What is the right approach?”

There are no right or wrong answers; it really depends on your own Cloud Strategy. One could argue that with an ideal and evolving design (a fully serverless, service-oriented architecture, e.g. A Cloud Guru), you could run your public cloud operations on a shoestring budget. But many CxOs are coming to the conclusion that a hybrid approach is the best option.

Assuming we can transform the operating model of private data centers to look very similar to that of public cloud, we can run private data centers with the same flexibility, agility and resiliency as public cloud.

Summary

Hybrid cloud gives businesses the most flexible options to support business innovation while keeping operational and capital expenses under control. In addition, hybrid cloud with the VMware solution stack running in public cloud provides on-demand capacity or DR capability, adding a level of resiliency without the budgetary commitments normally required to support such a solution.

Thank you Prabhu Barathi for providing me excellent feedback. (Twitter: @prabhu_b)


What is your Cloud Strategy? (Part 2) – Phases of Migration

In the first part of the Cloud Strategy blog we discussed the organizational changes required to adopt a cloud operating model effectively. In the second part, we dive into different phases of migration.

Before we dive into the different phases of migration, an important question to consider for the workloads targeted for migration is: are there ways to adopt a more efficient, cloud-like operating model on-premises?

Legacy workloads more or less have a steady state: they are not undergoing rapid incremental or transformational change, are mostly in maintenance mode and probably don’t need much support from the cloud ecosystem. In this model, developers and operations teams use a self-service portal to deploy new workloads and rely on automation for patches, updates and change control. If this is a possible outcome, hybrid cloud becomes the most efficient and economical approach for the legacy workloads. (This is not to say that new workloads will not fit in this model; they are simply not part of the scope of migration.)

By maintaining an efficient on-premises data center and supplementing it with public cloud, customers can realize the economic benefits of running their own data center and also benefit from the public cloud ecosystem. Of the many examples of companies that pulled back from the cloud to save money, Dropbox is a popular and well-documented one. More on this in part 3.

Phases of Migration

Let’s dive into the different phases. This is not an original idea and is documented across various articles and cloud providers, but it has my own spin based on what has worked in my experience.

Phase 1: Which Application First?

Too often, customers end up choosing an application that is either too vast or too easy to migrate. The challenge with migrating a complex, multi-dimensional application is that it is likely to miss deadlines or success criteria. Picking an easy application doesn’t deliver the learning and decision-making experience that is a byproduct of this exercise.

Ideally, one would want to choose something that maximizes learning and minimizes risk. Any application that directly impacts a line of business’s revenue should be immediately excluded. A homegrown back-office application, say a non-critical ticketing system, may be a good example of a low-risk target. Moving an application to a SaaS version on any cloud is not considered a migration; it is a replacement.

Intrinsic and Extrinsic Factors?

Consider other intrinsic factors: are there any applications that, for whatever reason, are already very unstable? That is a great candidate! Do consider latency requirements and compliance requirements. Data storage for this application should ideally grow at a low or moderate rate. This will help you fine-tune your lifecycle and archival policies in public cloud without consuming your IT budget.
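As a concrete, purely illustrative example of such a policy, here is a boto3 sketch of an S3 lifecycle rule that ages out exported data. The bucket name, prefix and day thresholds are assumptions to be tuned against your own growth rate and retention requirements.

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-app-archive",              # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-app-exports",
                "Filter": {"Prefix": "exports/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},  # infrequent access after 30 days
                    {"Days": 180, "StorageClass": "GLACIER"},     # archive after 6 months
                ],
                "Expiration": {"Days": 730},                      # delete after 2 years
            }
        ]
    },
)
```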

Are there extrinsic factors, such as a data center evacuation, that are impacting an app? In such a scenario you are most likely going to take downtime anyway, so why not coordinate a migration to the cloud at the same time?

Phase 2: Planning – it pays to be wrong!

Confused? Start with the premise that everything will go wrong, execute every step incrementally, and try to prove your premise wrong; repeat until you are wrong at every step, i.e. the perfect migration.

This is probably the most important and arduous phase. The amount of effort you put into the planning stages will directly impact the outcome. A key element here is good old project planning. Do not commit to overworking your team, and do not commit to underworking them. Commit to a consistent workload every week and achieve that exact amount every week. (This is a great philosophy I first read in Great by Choice by Jim Collins.) This ensures teams have predictable workloads and can stay motivated. Too often, leaders commit their team to a migration plan that is heavily back-loaded.

For more specific, migration-related planning, here are some things to consider. Map dependencies for the applications that are to be migrated. Which other applications or resources do these applications talk to? What is the average flow rate, and what is the hourly and 24-hour bandwidth consumption between applications that communicate? These metrics will help identify which applications need to be grouped together to move into the public cloud. Group the VMs and create priorities for the different move groups. This step likely requires tools that specifically track these metrics.
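To make the grouping step concrete, here is a rough Python sketch. The field names, sample numbers and the 1 GB/day threshold are all illustrative assumptions; the flow records could come from whatever tool tracks east-west traffic in your environment (NetFlow exports, vRNI, etc.).

```python
from collections import defaultdict

# Hypothetical flow records exported from a traffic-monitoring tool.
flow_records = [
    {"src_app": "web", "dst_app": "orders-db", "bytes_per_day": 40e9},
    {"src_app": "web", "dst_app": "search", "bytes_per_day": 8e9},
    {"src_app": "reporting", "dst_app": "orders-db", "bytes_per_day": 0.2e9},
]

GROUPING_THRESHOLD = 1e9  # 1 GB/day: chatty pairs should migrate together (assumption)

# Sum traffic per application pair, regardless of direction.
traffic = defaultdict(float)
for record in flow_records:
    pair = frozenset((record["src_app"], record["dst_app"]))
    traffic[pair] += record["bytes_per_day"]

# Flag which pairs belong in the same move group, heaviest talkers first.
for pair, daily_bytes in sorted(traffic.items(), key=lambda kv: kv[1], reverse=True):
    verdict = "same move group" if daily_bytes >= GROUPING_THRESHOLD else "can move separately"
    print(f"{' <-> '.join(sorted(pair))}: {daily_bytes / 1e9:.1f} GB/day -> {verdict}")
```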

Creating a migration schedule with the input from planning sessions is very important. The migration schedule decides the order in which application groups will move. Upon successful completion of a group move, a sanity check is a must before moving on to the next priority group.

Lastly, VM resizing must be considered. On-premises workloads are almost always over-provisioned. Use the right tools to identify how you can right size over-provisioned resources. This step can also be done after the migration, with cloud native tools.
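As a hedged illustration of what right-sizing can look like, independent of any particular tool, this sketch sizes a VM from its observed 95th-percentile CPU demand plus headroom. The utilization samples and the 20% headroom factor are assumptions.

```python
import math
import statistics


def right_size(cpu_util_pct_samples, provisioned_vcpus, headroom=1.2):
    """Suggest a vCPU count from observed utilization (95th percentile plus headroom)."""
    p95 = statistics.quantiles(cpu_util_pct_samples, n=20)[18]  # ~95th percentile
    needed = provisioned_vcpus * (p95 / 100.0) * headroom
    return max(1, math.ceil(needed))


# e.g. an 8-vCPU VM that rarely exceeds ~30% CPU utilization
samples = [12, 18, 25, 30, 22, 15, 28, 31, 19, 24]
print(right_size(samples, provisioned_vcpus=8))  # suggests 3 vCPUs for this sample data
```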

Phase 3: Migration ‘Day’

The third phase is when the actual migration is executed. Assuming your Direct Connect or VPN tunnels are already set up, firewall ports are configured, etc., you start migrating workloads. Various cloud-native tools are available to get a data file like a VMDK over to the cloud. Storage vendors also provide ways to sync or back up your VMDKs to public cloud, but this requires you to consume their service in the cloud and hence should be the least preferred option. For instance, in the case of AWS, the storage vendor will likely use S3 to store the data rather than an EBS-like mountable volume.
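As one hedged example of such a cloud-native path on AWS, the sketch below stages an exported VMDK in S3 with boto3 and then kicks off an EC2 image import from that object. The bucket, key and file path are hypothetical, and the import step assumes the vmimport service role has already been configured in the account.

```python
import boto3

s3 = boto3.client("s3")
ec2 = boto3.client("ec2")

BUCKET = "example-migration-staging"          # hypothetical staging bucket
KEY = "exports/app-server-01.vmdk"

# Upload the exported disk; boto3 handles multipart upload for large files.
s3.upload_file("/exports/app-server-01.vmdk", BUCKET, KEY)

# Optionally convert the uploaded VMDK into an AMI (requires the vmimport role).
response = ec2.import_image(
    Description="app-server-01 migration",
    DiskContainers=[{
        "Format": "VMDK",
        "UserBucket": {"S3Bucket": BUCKET, "S3Key": KEY},
    }],
)
print(response["ImportTaskId"])
```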

Once the group migration is completed, verify that it was successful. If the migration broke at some point, retreat to the last known good spot and retrace your steps. In most cases, the data file is likely to have changed.

Exit Strategy

Exit Stage Left: To exit or disappear in a quiet, non-dramatic fashion, making way for more interesting events.

This is an often-overlooked, transparent phase that spans the first three phases. Having an exit strategy doesn’t mean you are abandoning the migration; it simply means retreating to the previous logical step and re-evaluating options. This is an option when multiple failures have been experienced during the migration and it is not worthwhile to debug them. Something important was likely overlooked during the planning phase or the app selection phase. Remember, it is acceptable to choose this option and suffer the intermediate setback. The migration is a marathon after all, not a sprint. Stay calm and use the well-documented exit runbook that was previously defined. Repeat Phases 1 and 2 with the new information. This may also be a good segue to bring in a partner who has done these migrations before.

Phase 4: Are we there yet?

Workloads are in the cloud, the various lines of business are not complaining, nothing major is broken; it feels like the migration is complete. Celebrate the team that reached this important milestone, but now is not the time to rest.

After a brief breather, start deploying and refining the monitoring and management tools that were previously defined in the Cloud Framework. Verify that they accurately trigger and classify alerts. Even if you right-sized your environment before migration, review workload optimization; you likely have access to many more resource optimization tools in public cloud. Start identifying candidates for automation, but it is important not to automate everything initially. Automation is great for reducing errors introduced by manual provisioning, although automation is only as good as the engineer doing the automating.
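As a small, hedged example of wiring up such an alert on AWS, a CloudWatch alarm on sustained CPU might look like the following; the instance ID, SNS topic and thresholds are placeholders, not recommendations.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="app-server-01-high-cpu",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # placeholder instance
    Statistic="Average",
    Period=300,                 # 5-minute samples
    EvaluationPeriods=3,        # sustained for 15 minutes before alerting
    Threshold=85.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],       # hypothetical SNS topic
)
```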

Automation, especially with incident response, can be tricky. It is hard to automate unless you have considered all the causes and the possible outcomes and the necessary action. There have been outages caused by false alarms, because someone relied on automation entirely.

As you start ramping up on adoption of cloud-native services, try to use managed services. This offloads patching, maintenance and planning for DR events to the cloud provider. For example, Aurora in AWS in place of a self-managed SQL database is a good choice.

Lastly, continuously refine SIEM (Security Information and Event Management) workflows. This should be a pinned agenda item in all cloud council meetings. All new information should be reviewed to see if it impacts existing SIEM workflows.

Post Migration

Congratulations! You have reached an important milestone. Celebratory lunches/team events are in order. It is important to quantify if the success criteria were met and to document the lessons learned from the process and factor this into the next migration. Share this knowledge with the entire organization and educate and uplift the team.

Too often, leaders set unrealistic goals for milestones. There is very little rationale behind the deadlines, and they often result in messy, outage-prone, overrun and costly migrations. Over time, these different phases will start to amalgamate into a familiar workflow with minimal process variation.

Here are some external references that are good reads.

https://cloud.google.com/solutions/migration-to-gcp-getting-started – Detailed overview, the concepts and approach are largely applicable to any cloud provider.

https://cloud.google.com/files/Lift-and-Shift-onto-Google-Cloud.pdf – Great read on slightly different phases of migration, but there is a significant number of common tasks that need to be considered for any cloud migration.

In the next part of this series, we will delve into how to create an effective Hybrid Cloud.

PS: Hat-tip Prabhu Barathi @prabhu_b for reviewing my work and providing me valuable feedback.


What is your Cloud Strategy?

Hello Again,

It is a new year and a topic worthy of writing started to form in my head through the holidays. In the past year and a half, I have discovered that it is extremely challenging to write original content. For those of you who do this on a regular basis, you have my utmost admiration.

In this entry, I will share some thoughts on why you need a Cloud Strategy and how to go about adopting a hybrid cloud approach. In a future post, I will share some approaches to a relatively easy application migration to the cloud.

As part of my new role as a Cloud Strategist, I spent a large part of 2019 advising decision makers at various enterprise businesses on adopting the cloud model. I visited many customers over the course of the year, and cloud adoption and cloud migration were among the common themes the key decision makers were pondering. Over the course of many whiteboard sessions, many hours of contentious discussions and back-and-forth viewpoints, and a lot of reading, I came up with this whiteboard for Cloud Strategy. It isn’t meant to go into significant detail; it is meant to help you formulate a high-level plan.

When I first started, I wasn’t really sure of the function of a Cloud Strategist. When I read the Gartner Report: The Cloud Strategy Cookbook, 2019 it served as validation that such a function is needed. The report further validated what we were advising customers.

Any migration conversation starts with the 6 R’s of migration, originally published by Gartner as the 5 R’s in 2011. Every IT decision maker has a cloud mandate. In many cases, they even have a mandate to exit private or on-premises data centers. That much is certain. What isn’t clear is the path to do so, especially for the people lower down in the organization who will eventually own this task.


Start with the WHY?

In other words, what is the business outcome expected from this mandate? In the majority of cases, it falls into one of two categories.

  1. Cost savings ($)
    • Cloud Agility, Flexibility, Scale to drive down cost of supporting the business
    • Data Center consolidation/Evacuation
    • Migration of workloads
  2. Drive business innovation
    • Cloud EcoSystem to support faster development cycles
    • IT as a business enabler

It is important to communicate this widely across the organization that will support this endeavor.

Migrate to VM form factor? Container?

The first task at hand is to identify whether all new development will go to the cloud and whether it will use cloud-native technologies such as containers, micro-services, managed services, serverless functions, and the alerting, monitoring and management functions that are commonly offered by every big cloud vendor. There are very few reasons not to go this route for new application development. Here is a sneak peek into the evolving ecosystem.

That leaves us with how we can take the current monolithic applications to the cloud.

There are even options to containerize existing VM applications and drop them into a public cloud provider in a ‘fat container’ format; GCP Anthos is one such solution. While one can argue that there are inherent benefits to this approach (the IT team no longer manages the OS, patching or availability of these apps), it is riskier and less beneficial than adopting a true cloud-native approach, which may require a complete redesign of the application.

Which TEAM will support this?

Digital Transformation starts first with People, then Processes and lastly the Tools. It isn’t that the tools are not important, but people’s mindsets need to adapt to thinking very differently.

The Fellowship of the Cloud Council

Most customers who successfully adopted a transformative approach started with a Cloud Council. This is a hybrid team that consists of team leads with varying expertise in networking, security, storage, virtualization, etc. This team typically reports to the project lead and could potentially report to the VP or CIO to provide updates. The task for this team is to create a standard framework or template that meets the requirements it sets forth. It is important that the members of the Cloud Council function as one hybrid team, communicate relevant information to their own teams and continuously drive the projects assigned to them. It is also necessary to constantly communicate changes; chaos reigns supreme during any kind of transformation. By bringing the team along, everyone in the IT organization stays motivated and understands how their role is a critical part of the larger effort.

In House Cloud Expertise

If specific public cloud expertise is lacking in house, it is best to bring in partners who specialize in this area. This provides a valuable function: learning from the missteps of others that the partner has supported.

Needless to say, the choice of partner is very important to the success of the project. Do not automatically assume that existing data-center-focused channel partners are the default option. Cloud adoption requires a different mindset; dare to reconsider which partner will suit your needs best. It is time to put existing partnerships to the test. Many organizations fail in picking the appropriate partner, dooming their project from the very start.

If a cloud provider has already been decided, a paid sales engagement with the cloud provider is also a very good approach, at least initially. If that choice is yet to be made, the Cloud Council needs to consider various aspects, such as:

  • Workload requirements and nature of workload
  • Developer skillset and requirements
  • Eco-system of cloud provider
  • Vertical that the business is in and any impact the cloud provider choice may have – e.g. are they considered competitive to your core business?
  • Existing relationships with vendors

Next up is to socialize the Cloud Framework across the teams. The Cloud Framework must be treated as a continuously evolving document that defines specific guidelines and boundaries with respect to security, RBAC, identity and access management, monitoring, alerts and visibility. In addition, well-defined response procedures and workflows must be highlighted. As the adoption curve increases, this document will evolve.

Start Small: Fire bullets before Cannonballs

It is of utmost importance to start small. What that means is allowing one IT team to lead the first migration effort, rather than having the whole IT organization jump into a dev-ops mindset and make radical changes, which will lead to confusion and a lack of motivation across large teams. Legacy companies especially need to be aware that not all dev-ops practices will fit as naturally as they would for a SaaS provider, and they may need a lot of tweaking to be relevant.

Lastly, define the success criteria. This is an important metric to have before the start of the actual migration. Some benefits are harder to quantify in the short term; for instance, it is hard to quantify savings from moving a couple of VMs from an application. Criteria should be defined accordingly.

In Part 2 of this blog, we will delve into the different phases of migration.

PS: Hat-tip Prabhu Barathi @prabhu_b for reviewing my work and providing me valuable feedback.


Unboxing of VeloCloud Edge 510


Recently, I got an opportunity to get my hands on some edge hardware (Working for VirtZilla, it took 2+ years and an acquisition for me to say that!) and I decided to set it up at home.

I was one of the lucky few to get a VeloCloud Edge 510 device. It is mostly meant for branch offices, but this nifty little device can do many things. Before we get into it, let’s take a quick look at VeloCloud.

VMware acquired VeloCloud in December 2017. VeloCloud is a cloud networking services company that simplifies branch WAN networking. This acquisition is part of VMware’s overall SDDC strategy and continued push into networking.

The main value propositions for VeloCloud are:

  1. Improves business uptime by making internet connectivity reliable and independent of expensive, dedicated MPLS circuits.
  2. Dynamic Multi-Path Optimization (DMPO) to leverage multiple internet connections, including 4G LTE, to optimize utilization as well as route around failures.
  3. Assured application performance by prioritizing time-sensitive applications such as voice and video.
  4. Higher priority for SaaS or cloud-based applications through the use of VeloCloud Gateways, which can reside in public cloud providers such as AWS.

One unique feature that VeloCloud can deliver is the ability to switch uplinks upon failure without dropping voice calls. This is made possible by patented IP such as the aforementioned DMPO.

DMPO also reorders UDP-based flows such as voice and video, and can work around a lossy network to improve performance. This is done by duplicating packets when loss is detected, in order to keep the TCP sliding window size at maximum.
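To see why masking loss matters so much for a file transfer session, here is a back-of-the-envelope calculation using the well-known Mathis approximation for TCP throughput (throughput is roughly MSS / (RTT * sqrt(loss))). This is a generic TCP model, not VeloCloud’s DMPO algorithm; it simply shows how quickly throughput collapses as loss rises on an unremediated uplink.

```python
# Rough TCP throughput vs. packet loss using the Mathis approximation.
from math import sqrt

MSS_BITS = 1460 * 8     # 1460-byte segments expressed in bits
RTT = 0.040             # 40 ms round-trip time (assumed)

for loss in (0.0001, 0.001, 0.01):
    throughput_mbps = MSS_BITS / (RTT * sqrt(loss)) / 1e6
    print(f"loss {loss:.2%}: ~{throughput_mbps:.0f} Mbps")
```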


Now that we have covered what the VeloCloud SD-WAN solution does, let’s take a look at how easy it was to set up.

I received the device in the mail and opened it up; I was pleasantly surprised by the attention to detail in the packaging.


I plugged in the device and followed the instructions to plug the uplink port from my wireless router into GE3. I had to log in to the temporary Wi-Fi SSID and change a few things to ensure there wasn’t any IP overlap with my existing network. Soon after, I was able to get an IP address. As part of this process, I was already added to a VCO (VeloCloud Orchestrator). As an admin setting up a branch office, you would set up a new edge from the VCO using a pre-created profile.


Once the new edge was provisioned, I was able to generate an email with the activation key and a link to activate the physical edge. The activation itself is uneventful from that point on.


Once the device is activated, you can control and manage it from the VCO. From the VCO, you can change the interface configuration, set up primary and backup uplinks, or set up multiple active-active uplinks. For my setup, since I did not have multiple uplink connections, I used the USB port to set up a 4G LTE uplink as a second active uplink. I planned to do some failover tests at a later point.

The VCO displays the applications running through my VeloCloud Edge, along with quality scores. To make things more interesting, I started Netflix and YouTube streams in the background; within 3 minutes those applications popped up as well.


Another really useful tool is the edge’s ability to display the quality of each uplink. The VeloCloud Enhancements bar shows the remediation provided by DMPO.


In summary, the VeloCloud Edge was very easy to set up and administer. In the second part of this blog, we will cover the architecture and a link failure demo.

Leave your comments below.

PS: Hat tip to George Shih (VeloCloud SE) for helping me with this blog.



What’s new in NSX 6.4!

The NSX 6.4 release was announced in January. This is a dot release, but one that brings many major features. The new features are broadly classified into core feature enhancements, advanced micro-segmentation, and ease-of-use and serviceability features.

Head over to the NSX 6.4 official page here; in the meantime, here are some thoughts.

Context Aware Micro-segmentation

The primary new feature that 6.4 delivers is L7 granularity in the Distributed Firewall. DFW adds Layer 7-based application context for flow control and micro-segmentation. Approximately 60 commonly used application signatures are supported initially in this release.

This official VMware blog does a great job of explaining context awareness in great detail. At a very high level, users can now use the App ID to define more granular policy. This allows security policy based on applications, even if they don’t use the standard ports.

There is also RDSH, or Multi-Session Identity Firewall, which allows application access on a shared desktop based on user ID. In other words, two users accessing the same desktop can be given access to different applications based on their user group affiliations. NSX previously supported Identity Firewall based on integration with Active Directory, which provides granularity at the virtual desktop level. See a detailed demo of RDSH here.

Ease of Use and Serviceability

Upgrade Coordinator: This feature is bound to significantly ease any NSX system upgrade pains. Upgrades can now be coordinated and managed from the NSX Upgrade Coordinator, which offers a single pane to manage the upgrade of the various components. This handy feature also allows you to automate the upgrade process. The tool also performs a pre-check so the system upgrade will proceed only if the system is healthy to begin with.

Several features are now available in HTML5 (as well as Flash), with the same functionality.

Here is a detailed video of the 6.4 Upgrade Coordinator.

Upgrade Coordinator offers two modes: a custom upgrade plan and One Click Upgrade. It also provides an inventory of the NSX components and lists the current version and target version. An NSX Manager upgrade is required as a precursor to this step.


One Click Upgrade pre-defines the upgrade sequence and displays the settings, which cannot be modified. The “Plan your upgrade” option allows you to choose which components you want to include in the upgrade and provides some other knobs.


Lastly, this also allows you to monitor the upgrade progress for each component.

Packet Capture

NSX 6.4 now allows the user to capture packets from the web UI itself. While this was previously supported via the CLI, users can now start a packet capture for debugging purposes from the web client without necessarily being familiar with the CLI. There are four points along the data path where packets can be captured: physical, VMkernel, vNIC and vDR port. A more expansive list is available via the CLI. The user must specify the direction of the traffic to be captured; only one direction is supported, but multiple sessions can be created for rx and tx and then combined using Wireshark.


Support Bundle

The support bundle can now be collected from the UI and uploaded directly to a remote server. It can be collected for NSX Manager, hosts, controllers and Edges by choosing which objects to include from a drop-down.

NSX Dashboard Improvements

Lastly, as you have probably noticed since 6.3, the dashboard continues to get enhanced with handy widgets that are great for a single point of visibility into NSX component health. System Scale is a new widget that provides alerts and warnings when you approach scale limits.


In addition, a new tab for System Scale provides a global view of object types and their counts per NSX Manager. This includes firewall rules, security groups, the number of hosts prepared, etc.

Other blogs related to this topic:

VMware NSX-v 6.4 Released

https://www.vmguru.com/2018/01/nsx-v-6-4-is-here-and-massive/



Making the case for VMC – VMware Cloud on AWS

If you are in the data center industry, you have probably already heard about the partnership between VMware and AWS. The excitement around all of VMware’s cloud initiatives was palpable at VMworld 2017. As I rode the shuttle back to the hotel from VMworld, I overheard a common theme: customers talked about how VMware’s newfound vision would help them adapt to the changing data center landscape.

A couple of days later, as I started dropping in on discussions on the web, I realized that there was a lot of confusion about what VMC is and what it isn’t. In a LinkedIn discussion, many admins were comparing the price of running a VM in AWS to that of VMC. If you are asking this question, you have probably misunderstood the value of VMC.

VMC is not meant to compete against AWS’s native offering. Rather, it is offered as a low resistance, immediate path to adopt the elasticity of AWS, without having to retool and relearn new skills. Here is a recap of the three main use cases of VMC:

  1. Maintain and Expand: Some customers are looking to continue maintaining their DC and to expand services into the cloud. If some developers need access to native AWS services like Lambda or Kinesis, this is a really fast way to provide them with all of AWS’s offerings, without having to learn how to manage an AWS environment at scale. There are other appealing use cases as well, such as DRaaS.
  2. Consolidate and Migrate: Other customers are looking to consolidate their data centers and begin a migration process to the cloud. Many smaller customers are freezing their DC spend and expanding into the cloud. This may make sense depending on the size of the footprint and the type of data they handle. Even small data centers need a team with security, virtualization, networking and storage skill sets. Smaller businesses could see significant OpEx and CapEx savings by adopting a public cloud strategy. Migration of existing data center workloads is not a trivial problem; this solution takes them one step closer to that goal.
  3. Capacity Flex: This is another great use case for a customer that needs seasonal capacity. Rather than invest in data center hardware that may otherwise remain idle for long periods, they can cloud-burst into AWS but manage their environment with the same familiar tools and operational overhead: vSphere, vROps, Log Insight, etc.

The key takeaway here is the ability for a business to be able to benefit from an AWS footprint without having to re-architect their application, learn new skills and review deployment models.

DR as a Service alone is such an important feature of VMC; it is very unique and will appeal to IT operations staff. How many businesses today are confident of executing their DR runbook? Based on the conversations I have had, the answer lies in the low 20-30%. Imagine having an on-demand DR site where you don’t have idle capacity adding to your OpEx (not to mention CapEx), but which can be summoned in the case of a DR event. When I paint this picture to a VP of IT Operations, their eyes light up!

Having the ability to extend IP spaces from your data center to the cloud will take away the hassle of re-IP-ing workloads. Your DR strategy just became super simplified.

In my opinion, if you have a cloud-native workload whose life cycle is entirely in the public cloud, it is better off on AWS EC2. If there are existing applications that were designed for your VMware private cloud but need the flexibility or services of a public cloud, they are good candidates for VMC.

This is the value that VMC brings. So, next time if you are asking the question “How does it compare in cost to AWS?”, you are asking the wrong question.

For additional information on VMC, please go to https://cloud.vmware.com/vmc-aws


NSX Proactive Health Check

The idea for this post came from an activity we engaged in with one of our premier customers, who was just entering their Q4 peak. We offered them a proactive health check before they headed into peak season.

Think of this as the equivalent of “check your oil, tire inflation and headlights before you head into the hills” for your NSX footprint.

Here is a quick checklist of Health Check items. This list is not meant to be comprehensive, but lists a few common sense techniques that can be addressed prior to a forecasted peak or in a scheduled interval.

  1. NSX Manager: Check that NTP, DNS and syslog are set up properly.
  2. NSX Manager: Regularly download the tech support bundle so that you have a known good state handy.
  3. NSX Manager: Check CPU, memory and storage usage on the NSX Manager. Temporary peaks in CPU and memory are acceptable.
  4. NSX Manager: Restore the NSX Manager backup on a test NSX Manager appliance to check the integrity of the file. This need not be a weekly or even monthly activity, but should be done as needed by the business (e.g. before an expected peak).
  5. Check system events under NSX Manager in the vSphere plugin and address any critical alerts.
  6. Log in to the ESGs and run the “show highavailability” command; look for sync status between active and standby instances and rx/tx errors. A few errors are acceptable, but a continuously increasing error count can point to a larger issue. (HA will be turned off if ECMP is enabled.)
  7. Log Insight: Log in to the appliance and look for critical errors. Also check that all ESXi hosts are reporting syslogs to Log Insight; admins may have forgotten to add logging to newly added hosts.
  8. vRNI: If you also have vRealize Network Insight, check the critical errors on vRNI. Periodically review system alerts and ensure they have appropriate email notifications set up. Remember to use the custom alert feature; it is easy to set up and can make a big difference in avoiding a crisis.
  9. vRNI System Alerts: vRNI provides 110 system alerts (and growing) that are already set up and only need to be toggled on, with the appropriate notification options. These are of varying severity but provide a proactive way to monitor critical infrastructure.

If there is anything else that comes to mind- please leave a comment below and I will add it to this list.
