02 Sep CEO Jonathan Cartu Announced – How to Use Chaos Engineering to Break Things Productively
- The benefits of chaos engineering. This process makes for more comprehensive and effective vulnerability testing, which ultimately benefits the customer.
- Is there a downside? Not really, unless you prefer to discover system weaknesses on the fly in the real world, which is not recommended.
- Overview of chaos engineering. A quick breakdown of how chaos engineering is the best way to stress a system and the four steps to implement it.
- Defining a measurable steady state that represents normal circumstances to use as a baseline.
- Developing a hypothesis that this state will continue in both control and challenge groups.
- Introducing realistic network stressors into a challenge group, such as server crashes, hardware malfunctions, and severed connections.
- Attempting to invalidate the hypothesis by noting differences in behavior between control and challenge groups after chaos is introduced.
- Repeat the process through automation. Tools like Metasploit and Nmap, along with VPNs, allow you to change variables and expand testing considerably.
- Chaos engineering best practices. Understand when to use manual testing, how to apply containerized chaos testing, as well as how to confine the blast radius.
- Testing tools and resources. As chaos testing gains popularity, expect an already impressive selection of tools and resources to increase.
More people connected to more servers, increased reliance on complex distributed networks, and a proliferation of apps in development mean more opportunities for data leaks and breaches.
Modern problems require modern solutions, as Amazon found out the hard way. Netflix escaped with minor inconvenience by being prepared.
What did they do differently?
Amazon Web Services (AWS), Amazon’s cloud-based platform, experienced an outage on September 20, 2015, that took its services down for several hours and affected many vendors. Netflix experienced the issue as a blip because it had already been through similar failures when it changed its service delivery model. That experience led its engineering team to craft a unique solution for testing software in production.
The solution? Chaos as a preventative for calamity. It’s predicated on the idea of failure as the rule rather than the exception, and it led to the development of the first dedicated chaos engineering tools. Known collectively as the Simian Army, they include Chaos Monkey, Chaos Kong, and the newest member of the family, the Chaos Automation Platform (ChAP).
What Are the Benefits of Chaos Engineering in DevOps?
The world of chaos engineering is quite large, so this article focuses only on the network environment and its associated security considerations. There, chaos engineering has already proven to be a positive force in a strong cybersecurity market: it improves business risk mitigation, fosters customer confidence, and reduces the workload for IT teams. If you’re a business owner, you can expect happier engineers, a reduced risk of revenue loss, and lower maintenance costs.
Customers, whether B2B or B2C, will enjoy service that is more available, more reliable, and less prone to disruptions. Tech teams will be able to reduce failure incidents and gain deeper insight into how their apps work. It will also lead to better design, a shorter mean time to respond to SEVs (high-severity incidents), and fewer repeat incidents.
Is There a Downside?
Critics feel that chaos engineering is just another industry buzzword or a cover-up for apps that were poorly designed in the first place. Some chaos engineering proponents counter that this criticism is the result of an ego-driven mentality: if you’re confident in your capabilities and work product, there should be nothing to fear in testing their limits.
Chaos engineering is meant to counter the eight fallacies of distributed computing, which plague many developers and software engineers who are new to distributed networks, while providing a system for more refined testing.
These incorrect assumptions are that:
- Networks are reliable
- Latency is zero
- Bandwidth is infinite
- Networks are secure
- Topology never changes
- Each system has only one admin, who also doesn’t change
- Transport cost is zero
- Networks are homogeneous
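Two of these assumptions, that networks are reliable and that latency is zero, can be challenged directly in a test harness. The following is a minimal Python sketch of that idea, not the method of any particular chaos tool. The names `FlakyNetwork` and `fetch_profile` are hypothetical, and the remote call is a local stand-in so the example stays self-contained.

```python
import random
import time


class FlakyNetwork:
    """Wrap a callable and inject random latency and connection failures.

    Hypothetical helper for illustration only; not part of any real library.
    """

    def __init__(self, func, failure_rate=0.1, max_delay_s=0.05,
                 seed=None, sleep=time.sleep):
        self.func = func
        self.failure_rate = failure_rate  # probability that a call fails
        self.max_delay_s = max_delay_s    # upper bound on injected latency
        self.rng = random.Random(seed)    # seeded so runs can be reproduced
        self.sleep = sleep                # replaceable so tests need not wait

    def __call__(self, *args, **kwargs):
        # Challenge "latency is zero": delay every call by a random amount.
        self.sleep(self.rng.uniform(0, self.max_delay_s))
        # Challenge "networks are reliable": fail a fraction of calls.
        if self.rng.random() < self.failure_rate:
            raise ConnectionError("injected network failure")
        return self.func(*args, **kwargs)


def fetch_profile(user_id):
    """Hypothetical stand-in for a call to a remote service."""
    return {"id": user_id, "name": "example"}


if __name__ == "__main__":
    flaky_fetch = FlakyNetwork(fetch_profile, failure_rate=0.3, seed=1)
    for user_id in range(5):
        try:
            print(flaky_fetch(user_id))
        except ConnectionError as err:
            print(f"user {user_id}: {err}")
```

Code that calls `flaky_fetch` instead of `fetch_profile` has to cope with slow and failed calls, which is exactly the behavior the fallacies assume away.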
A quick look at internet usage statistics around the world demonstrates the need for a focus on innovative network testing at all phases of software development. Achieving that means taking a non-traditional approach to DevOps.
Overview of Chaos Engineering and Use Cases
Cloud-based, distributed networks enable a level of scalability that was previously unseen. Because these networks are more complex and have uncertainty built into how they function, software engineers need an empirical approach to vulnerability testing that is both systematic and innovative.
This can be achieved through controlled experimentation that creates chaos in an effort to determine how much stress any given system can withstand. The goal is to observe and identify systemic weaknesses. According to principlesofchaos.org, this experiment should follow a four-step process that involves:
- Defining a measurable steady state that represents normal circumstances to use as a baseline.
- Developing a hypothesis that this state will continue in both control and challenge groups.
- Introducing realistic network stressors into a challenge group, such as server crashes, hardware malfunctions, and severed connections.
- Attempting to invalidate the hypothesis by noting differences in behavior between control and challenge groups after chaos is introduced.
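The four steps can be sketched as a small simulation. This is a minimal illustration under stated assumptions rather than a real experiment: requests are simulated with a seeded random number generator, client retries stand in for the resilience mechanism being tested, and every name and threshold (`run_experiment`, `tolerance`, the failure rates) is hypothetical.

```python
import random


def request_succeeds(rng, failure_rate, attempts):
    """Simulate one client request that retries until it succeeds or gives up."""
    return any(rng.random() >= failure_rate for _ in range(attempts))


def success_rate(rng, failure_rate, attempts, n_requests):
    """Fraction of simulated requests that succeed."""
    ok = sum(request_succeeds(rng, failure_rate, attempts)
             for _ in range(n_requests))
    return ok / n_requests


def run_experiment(attempts, injected_failure=0.20, baseline_failure=0.01,
                   tolerance=0.02, n_requests=2000, seed=42):
    rng = random.Random(seed)

    # Step 1: the steady state is the success rate under normal conditions,
    # measured on the control group.
    control = success_rate(rng, baseline_failure, attempts, n_requests)

    # Step 2: the hypothesis is that the challenge group's success rate
    # stays within `tolerance` of the control group's.

    # Step 3: inject a stressor (extra failed connections) into the
    # challenge group only.
    challenge = success_rate(rng, baseline_failure + injected_failure,
                             attempts, n_requests)

    # Step 4: try to invalidate the hypothesis by comparing the two groups.
    return {
        "control": control,
        "challenge": challenge,
        "hypothesis_holds": abs(control - challenge) <= tolerance,
    }


if __name__ == "__main__":
    print("no retries:   ", run_experiment(attempts=1))
    print("three retries:", run_experiment(attempts=3))
```

With these assumed numbers, a client that never retries diverges sharply from the control group, which invalidates the hypothesis and exposes a weakness. A client that retries up to three times stays within the tolerance.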
The wisdom behind this process proposes that the more difficult…