Is your web/mobile app or other solution not performing? Here is a checklist to make sure you have optimized everything possible and to help you troubleshoot errors.
- CloudWatch: Check CPU usage. Is it near the limit you have set? Have alarms been set up? Check network in and out: is there a spike in bytes received? (As a rule of thumb, if your disk utilization is below 40% you can probably cut your number of disks in half. If it's over 80% you might have performance issues.)
- Check system logs for exhausted-memory or disk-full errors: you might see errors like “oom-killer” or “failure to fork”.
- Do you have the correct instance type for the workload? It is surprisingly common for the same instance type to stay in use long after the workload has changed.
- General purpose (e.g. T2 or M5): good for web servers, code repositories, and simple web/mobile apps where workloads are fairly consistent.
- Burstable performance: consider these for eCommerce or other workloads where you may experience spikes in demand.
- Compute-optimized: video streaming or big data analytics. Check the specific instance type, as each is optimized for particular use cases, for example graphics processing.
- Memory-optimized: large in-memory datasets, for example with RDS or Cassandra. Check whether you need to back up solid-state storage, as some instance types do not persist data and it may be lost.
- Storage-optimized: these can save you money if, for example, you have to keep data for years for occasional compliance and auditing.
- Base your instance selection on the following: operating system, number of CPU cores, amount of system memory (RAM), storage requirements, GPU cores, and network bandwidth requirements. Does anything in that list not match the instance you have provisioned?
- Do you have an EBS volume mounted? Has it hit IOPS or throughput limits?
- Load balancers: Do you have any network connectivity issues? It could be worth checking your load balancers.
- Load balancers: Do you have high memory (RAM) usage on backend instances?
- Load balancers: Do you have high CPU usage on backend instances?
- Do you have the correct web server config?
- Check that DNS resolution isn't contributing to latency.
- Check target response times for the application load balancer — if the value is high there could be an issue with the backend instances.
- For backend instances with high CPU utilization you may need a bigger instance type. You may also have to reconsider your architecture over the longer term if usage is set to grow significantly and you’re running a monolith.
- DB performance: Check that your columns use the correct data types and that inefficient choices there aren’t creating issues.
- DB performance: Indexes can also be a problem. Performance suffers when indexes are excessive, improper, or missing.
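
A couple of the checks in the list above are easy to script. The following is a minimal sketch rather than a finished tool. It assumes you have already collected the log lines and a disk utilization figure yourself (for example from CloudWatch or from the instance), and the names `find_resource_errors` and `disk_advice` are hypothetical. The "oom-killer" and "failure to fork" patterns and the 40%/80% thresholds come from the checklist; the "no space left on device" pattern is an extra assumption about how a full disk is typically reported.

```python
import re

# Patterns for memory/disk exhaustion. The first two come from the checklist;
# the third is an assumed addition for full disks. Real log messages vary by
# OS and version, so treat this as a starting point, not an exhaustive list.
RESOURCE_ERROR_PATTERNS = [
    re.compile(r"oom-killer", re.IGNORECASE),
    re.compile(r"failure to fork", re.IGNORECASE),
    re.compile(r"no space left on device", re.IGNORECASE),
]


def find_resource_errors(log_lines):
    """Return the log lines that match any exhaustion pattern, in order."""
    return [
        line
        for line in log_lines
        if any(pattern.search(line) for pattern in RESOURCE_ERROR_PATTERNS)
    ]


def disk_advice(utilization_percent):
    """Apply the checklist's rule of thumb to a disk utilization percentage."""
    if not 0 <= utilization_percent <= 100:
        raise ValueError("utilization_percent must be between 0 and 100")
    if utilization_percent < 40:
        return "over-provisioned"  # candidate for halving the number of disks
    if utilization_percent > 80:
        return "at-risk"  # may already be causing performance issues
    return "ok"
```

Feeding `find_resource_errors` the contents of a system log and `disk_advice` a recent utilization reading gives a quick first pass before digging into instance types and load balancers.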
If you have been through the list and optimized everything, it could simply be that your architecture is no longer fit for purpose and that you need to consider rearchitecting the application and how it scales. You may have to rearchitect both your front-end application and your back-end solution.
Right-sizing instances for cost considerations:
A quick guide to considering scale:
- Scaling horizontally (auto-scaling): adding instances so your workloads can scale. This is ideal if you are using a containerized microservices architecture.
- Sharding (partitioning) your DB so frequent queries do not run across the entire DB to retrieve a result.
- Separating reads from writes (CQRS): for example, using Cassandra for writes and Elasticsearch for reads. This is not always appropriate, depending on your requirements.
- Load balancing: ensure this is set up to handle spikes in load.
- Read replicas on DB instances: available for RDS and Aurora databases. On Aurora you can add up to 15 read replicas.
- Consider whether your solution needs to be highly available (HA) or fault-tolerant. Few solutions need to be fault-tolerant unless you’re in a highly regulated environment like banking or healthcare.
- Caching your reads on the DB where possible.
- Optimize your querying — this is really a case of going through your business requirements and how your querying has been set up and scrutinizing whether what has been implemented has been done so efficiently. Nine times out of ten there is something to optimize.
- Cost and scaling are often a balancing act. Right-sizing your solutions can take practice, so make sure your monitoring is set up so that you can troubleshoot and cost-optimize as you gain experience with the solution.
- For batch workloads: large batch workloads may need to be broken down into smaller batches. However, if you require a lot of storage and parallel processing, then a simple batch job is the wrong solution and you will need to consider rearchitecting with solutions like Hadoop.
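
Two of the ideas in the list above, sharding and splitting large batches, can be illustrated in a few lines. This is a rough sketch under stated assumptions, not a production design: `shard_for` and `chunked` are hypothetical helper names, the shard key is assumed to be a string such as a customer ID, and simple hash-modulo routing is only one of several possible sharding schemes (changing the shard count with this scheme moves most keys to a different shard).

```python
import hashlib
from itertools import islice


def shard_for(key, shard_count):
    """Map a string key to a shard index in the range [0, shard_count).

    A cryptographic digest is used instead of Python's built-in hash(),
    which is salted per process and would route the same key differently
    on different machines.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def chunked(items, batch_size):
    """Yield successive lists of at most batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

With routing like this, a query for one customer only touches the shard returned by `shard_for`, and a large job can be processed as `for batch in chunked(records, 500): ...` so that each unit of work stays small enough to retry or run in parallel.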