Elasticache Redis Network Bandwidth In / Out Allowance Exceeded

Question

My question is similar to another question which was posted over a year ago: 
https://repost.aws/questions/QUv105xDmfQMGUiBbfeYW-iQ/elasticache-shows-network-in-and-out-as-exceeded-but-how  
In that question, the poster wanted to understand what triggers Network Bandwidth In / Out Allowance Exceeded relative to the instance's throughput. They seem to have been confused because the metric graphs report MB/s rather than the Mbit/s they assumed. It does appear that user was at or close to the baseline throughput.

We have a similar situation: 
![aws metrics graphs showing network bytes in and out as well as Network Bandwidth In / Out Allowance Exceeded ](/media/postImages/original/IMAmS7I8CURIO0z2bg3C7OmA)

However, in our case we are running t4g.medium instances, which should have 0.256 Gbps of baseline throughput, yet the spikes on Network Bytes In / Out are nowhere close to 32 MB/s. This is a 2-node cluster with 1 shard and cluster mode disabled, so the second node acts as a replica.
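
For reference, this is the conversion behind the 32 MB/s figure. It is a minimal sketch that assumes decimal units (1 Gbit = 10^9 bits, 1 MB = 10^6 bytes); the function name is made up for illustration.

```python
def gbps_to_mb_per_s(gbps: float) -> float:
    """Convert gigabits per second to megabytes per second (decimal units)."""
    bits_per_second = gbps * 1_000_000_000
    bytes_per_second = bits_per_second / 8   # 8 bits per byte
    return bytes_per_second / 1_000_000

# 0.256 Gbps is the t4g.medium figure quoted above.
baseline_mb_per_s = gbps_to_mb_per_s(0.256)
print(f"{baseline_mb_per_s:.1f} MB/s")  # prints "32.0 MB/s"
```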

I have a few questions: 
* Why is the network traffic being shaped when it appears we are not close to our allowable throughput? Is it possible that a metric other than Network Bytes In / Out is causing the traffic to be queued or dropped (e.g. packets in / out, or micro-bursting)?
* Is it possible to view the network I/O burst balance? 
* Can we differentiate between queued and dropped packets?

We are not seeing any performance degradation at this stage, and we have not tried scaling up to see if these metrics resolve.

Any assistance or thoughts are greatly appreciated.

Answer

Hi, yes, replicas can trigger traffic shaping when you have intense write activity. The replication bandwidth counts toward the node's total.
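
As a rough illustration of why replication matters, here is a back-of-the-envelope sketch. It assumes the primary's outbound traffic is simply client replies plus one full copy of the write stream per replica; that is a simplification, and the function name and the sample numbers are hypothetical.

```python
def primary_egress_mb_per_s(write_mb_per_s: float,
                            reply_mb_per_s: float,
                            replicas: int) -> float:
    """Estimate outbound MB/s on the primary node.

    Simplified model: replies to clients plus one copy of the
    incoming write stream for each replica.
    """
    return reply_mb_per_s + write_mb_per_s * replicas

# Hypothetical workload: 10 MB/s of writes, 5 MB/s of replies, 1 replica.
print(primary_egress_mb_per_s(10.0, 5.0, replicas=1))  # 15.0
```

With more replicas the write stream is multiplied accordingly, so a write-heavy workload can reach the outbound limit well before the inbound one.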

Have a look at https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/TroubleshootingConnections.html#Network-limits

```
Network traffic limits: Check the following CloudWatch metrics for Redis 
to identify possible network limits being hit on the ElastiCache node:

NetworkBandwidthInAllowanceExceeded / NetworkBandwidthOutAllowanceExceeded: 
Network packets shaped because the throughput exceeded the aggregated bandwidth limit.

It is important to note that every byte written to the primary node will be replicated 
to N replicas, N being the number of replicas. Clusters with small node types, multiple 
replicas, and intensive write requests may not be able to cope with the replication backlog. 
For such cases, it's a best practice to scale-up (change node type), scale-out (add shards
 in cluster-mode enabled clusters), reduce the number of replicas, or minimize the 
number of writes.

NetworkConntrackAllowanceExceeded: Packets shaped because the maximum number 
of connections tracked across all security groups assigned to the node has been exceeded. 
New connections will likely fail during this period.

NetworkPacketsPerSecondAllowanceExceeded: Maximum number of packets
 per second exceeded. Workloads based on a high rate of very small requests may 
hit this limit before the maximum bandwidth.

The metrics above are the ideal way to confirm nodes hitting their network limits. 
However, limits are also identifiable by plateaus on network metrics.

If the plateaus are observed for extended periods, they will likely be followed by 
replication lag, an increase in Bytes Used For Cache, a drop in freeable memory, 
and high swap and CPU usage. Amazon EC2 instances also have network limits that can 
be tracked through ENA driver metrics. Linux instances with enhanced networking support 
and ENA drivers 2.2.10 or newer can review the limit counters with the command:

# ethtool -S eth0 | grep "allowance_exceeded"
```
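
On the micro-bursting part of the question: a short burst can exceed the limit even when the per-minute graph looks low. The sketch below is only a toy model. It assumes the limit is checked over 100 ms slots (a made-up granularity), ignores burst credits entirely, and uses invented traffic numbers.

```python
SLOTS_PER_SECOND = 10     # assumed 100 ms enforcement slots (hypothetical)
WINDOW_SECONDS = 60       # one datapoint at 1-minute resolution
LIMIT_MB_PER_S = 32.0     # 0.256 Gbps expressed in MB/s

def summarize(rates_mb_per_s):
    """Return (average MB/s over the window, number of slots above the limit)."""
    average = sum(rates_mb_per_s) / len(rates_mb_per_s)
    over_limit = sum(1 for rate in rates_mb_per_s if rate > LIMIT_MB_PER_S)
    return average, over_limit

# Mostly quiet traffic at 1 MB/s, with one half-second burst at 100 MB/s.
rates = [1.0] * (WINDOW_SECONDS * SLOTS_PER_SECOND)
for slot in range(100, 105):
    rates[slot] = 100.0

average, over_limit = summarize(rates)
print(f"window average: {average:.1f} MB/s, slots over limit: {over_limit}")
```

In this model the one-minute average stays under 2 MB/s while five slots are above the limit, which is the kind of pattern that could raise the allowance-exceeded counters without a visible spike in Network Bytes In / Out.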
