One critical piece of code that many people overlook is retry logic in exception handling. A legacy assumption is that an operation either succeeds or fails, and that if it fails it will keep failing because some system, e.g. the database, is hard-down. In practice there are many causes of transient errors, such as a DB lock or a time-out while resources are in the process of auto-scaling.
It is critical to assume a non-zero error rate for legacy as well as modern, complex systems.
When transitioning from on-premises to the cloud, the underlying infrastructure gets abstracted and therefore becomes even more complex. This complexity provides tremendous value, including far greater scalability and resiliency, but the trade-off is a higher likelihood of transient errors. Exception handling that is simple yet thorough, combined with observability, is hard to get right but essential.
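As a rough illustration of retrying transient failures, here is a minimal Python sketch using exponential backoff with jitter and logging each attempt for observability. The names `TransientError` and `call_with_retries`, and the default attempt count and delays, are assumptions made for this example and do not come from any AWS SDK; real code would catch whichever exceptions its own client library raises for locks, throttling, or time-outs.

```python
import logging
import random
import time

logger = logging.getLogger("retry_sketch")


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (e.g. a DB lock or a time-out)."""


def call_with_retries(operation, max_attempts=4, base_delay=0.1, max_delay=2.0,
                      sleep=time.sleep):
    """Call operation(), retrying on TransientError with capped exponential backoff.

    Any other exception propagates immediately, since retrying a permanent
    failure only adds load. The sleep function is injectable so the helper
    can be exercised without real waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError as exc:
            if attempt == max_attempts:
                # Out of attempts: log and surface the last error to the caller.
                logger.error("giving up after %d attempts: %s", attempt, exc)
                raise
            # Full jitter: wait a random time up to the capped exponential delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            delay = random.uniform(0, cap)
            logger.warning("attempt %d failed (%s); retrying in %.2fs",
                           attempt, exc, delay)
            sleep(delay)
```

A caller would wrap the flaky operation, for example `call_with_retries(lambda: run_query(sql))`, where `run_query` stands in for your own database call. The warning and error log lines give you the retry counts and failure reasons you would want to see in your metrics.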
Hi, you need to add tangible details to your question (metrics, error logs, etc.) if you want to get meaningful support from the re:Post community. "Very unstable" can mean a million different things; describing exactly what is failing will definitely help. Thanks