At Twitter, we strive to prepare for sustained traffic as well as spikes, some of which we can plan for and some of which come at unexpected times or in unexpected ways. To help us prepare for these varied types of traffic, we continuously run tests against our infrastructure to ensure it remains a scalable and highly available system.

Our Site Reliability Engineering (SRE) team has created a framework to perform different types of load and stress tests. We test different stages of a service life cycle in different environments (e.g., a release candidate service in a staging environment). These tests help us anticipate how our services will handle traffic spikes and ensure we are ready for such events. Additionally, these tests help us to be more confident that the loosely coupled distributed services that power Twitter's products are highly available and responsive at all times and under any circumstance.

As part of our deploy process, before releasing a new version of a service, we run a load test to check for performance regressions and to estimate how many requests a single instance can handle. While load testing a service in a staging environment is a good release practice, it does not provide insight into how the overall system behaves when it's overloaded. Services under load fail due to a variety of causes, including GC pressure, thread safety violations, and system bottlenecks (CPU, network). Below are the typical steps we follow to evaluate a service's performance.

We evaluate performance in several ways for different purposes, which can be broadly categorized as follows:

Load testing: Performing load tests against a few instances of a service in a non-production environment to identify a new service's performance baseline or compare a specific build's performance to the existing baseline for that service.

Tap compare: Sending production requests to instances of a service in both production and staging environments, comparing the results for correctness, and evaluating performance characteristics.

Dark traffic testing: Sending production traffic to a new service to monitor its health and performance characteristics. In this case, the response(s) won't be sent to the requester(s).

Canarying: Sending a small percentage of production traffic to some number of instances in a cluster which are running a different build (newer in most cases). The goal is to measure the performance characteristics and compare the results to the existing/older versions. Assuming the performance is in an acceptable range, the new version will be pushed to the rest of the cluster.

Stress testing: Sending traffic (with specific flags) to the production site to simulate unexpected load spikes or expected organic growth.

In this blog post, we focus primarily on our stress testing framework, challenges, lessons learned, and future work.
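
To make the idea of comparing a specific build's performance to the existing baseline for a service more concrete, here is a minimal sketch in Python. It is not taken from Twitter's framework: the function names `percentile` and `is_regression`, the choice of 99th-percentile latency as the metric, and the 10% tolerance are all assumptions made for illustration.

```python
import math


def percentile(samples, pct):
    """Return the nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    # Nearest-rank method: the smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def is_regression(baseline_ms, candidate_ms, pct=99, tolerance=0.10):
    """Flag a regression when the candidate's latency percentile exceeds
    the baseline's by more than the given fractional tolerance.

    The metric and threshold are hypothetical defaults, not Twitter's.
    """
    base = percentile(baseline_ms, pct)
    cand = percentile(candidate_ms, pct)
    return cand > base * (1 + tolerance)


if __name__ == "__main__":
    baseline = [10, 12, 11, 13, 50]   # latencies (ms) from the current build
    candidate = [10, 12, 11, 13, 80]  # latencies (ms) from the release candidate
    print(is_regression(baseline, candidate))
```

A real evaluation would likely look at more than one signal (throughput, error rate, GC behavior) and use many more samples; a single latency percentile is used here only to keep the comparison easy to follow.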