EC2 T2, M7i, M6in, M6i, low performance in Mumbai compared to Frankfurt


Hi! We are experiencing serious issues with server performance in Mumbai, India (ap-south-1). We set up a dev environment in Germany and reached the capped 50 server FPS (a good server tick rate), which delivers a smooth multiplayer experience. We ran the game server on EC2 t2.medium On-Demand instances.

When we moved the same game server setup to India, again on t2.medium On-Demand instances, performance was much worse than in Germany. We averaged 30 server FPS, a huge difference that causes CPU lag during play. We also tried On-Demand M7i, M6i and M6in instances (which, per AWS, are good for game server deployment), but we get exactly the same result.

Why are there inconsistencies between the instance types? We need a solution in order to provide servers in India. Please help as soon as possible! Appreciate the help in advance!

  • @rePost-User-4872669

    Have you identified the problem with the performance gaps, or are you still seeing the same behaviour?

  • Hi, FYI - I am part of the APJ team of games industry specialists and would like to support directly. Please respond here, or find me on LinkedIn and connect there, to share contact info & AWS account details.

asked 3 months ago, 845 views
2 Answers

As Oleksii Bebych correctly advised, variable performance isn't surprising for burstable instance families, like the legacy t2 family. However, m7i, m6i, and so on are fixed-capacity instance families and use some of the latest and most powerful technologies. Given that you're describing the experience as similar for all the instance families, despite their massive relative differences, combined with the fact that instance family performance shouldn't differ between regions, it sounds more likely that your issue is elsewhere.

To be clear, by better performance in eu-central-1 (Frankfurt, Germany), do you mean that the FPS rate for players in or geographically near Germany against servers in eu-central-1 was better than the FPS rate for players in some areas of India against servers in ap-south-1?

If so, are you able to measure network characteristics, starting simply with the network round-trip (ping) times between German users and eu-central-1 versus Indian users and ap-south-1? India is roughly nine times the area of Germany, so the geographic distances between locations in Germany (just one country in Europe), and the consequent network latencies, are simply shorter than the much longer distances that might be typical between Mumbai and users across India.

If you can measure the "ping" (network round-trip) times, you can get a like-for-like comparison by looking at users in India whose RTTs to ap-south-1 are similar to the RTTs that users in Germany see to eu-central-1.
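
A minimal sketch of that like-for-like comparison, assuming you have already collected ping samples per player. The player labels, the `comparable_pairs` name and the 10 ms tolerance are hypothetical choices for illustration, not anything provided by AWS or the game:

```python
from statistics import median


def comparable_pairs(rtts_a, rtts_b, tolerance_ms=10.0):
    """Pair players from two groups whose median RTTs differ by at most tolerance_ms.

    rtts_a and rtts_b map a (hypothetical) player label to a list of
    ping samples in milliseconds. Each player is used in at most one pair.
    """
    med_a = {player: median(samples) for player, samples in rtts_a.items()}
    med_b = {player: median(samples) for player, samples in rtts_b.items()}
    unused_b = set(med_b)
    pairs = []
    # Walk group A from lowest to highest median RTT.
    for player_a, rtt_a in sorted(med_a.items(), key=lambda kv: kv[1]):
        candidates = [p for p in unused_b if abs(med_b[p] - rtt_a) <= tolerance_ms]
        if not candidates:
            continue  # nobody in group B has a comparable RTT
        # Closest median RTT wins; the label breaks ties deterministically.
        best = min(candidates, key=lambda p: (abs(med_b[p] - rtt_a), p))
        unused_b.remove(best)
        pairs.append((player_a, best))
    return pairs
```

Only the paired players would then be compared on perceived smoothness, so that a difference in RTT is not mistaken for a difference in server performance.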

EXPERT
Leo K
answered 3 months ago
  • Hi! Thank you for your responses!

    @Oleksii I know that t2.x instances are burstable, which is why I tried the other machines. And I know that this behavior shouldn’t happen on any of them, while in Germany it works with all of the same types. To clarify: I still have CPU credits available. More than enough. The performance is different in both regions.

  • @Leo it’s not network or code related. I can play on my German server through an Indian VPN with no problem. That’s because the German server runs at 50 FPS. Those FPS are measured on the server, not on the client: they are how fast the server computes the calculations for the multiplayer, and they only impact the multiplayer sync. This means that the German server runs smoothly because it reaches its 50 FPS compute goal, but none of the Indian instances manage to reach even 40 FPS.

    I measured ping, checked CPU credits etc., but to answer your question about ping: this problem has nothing to do with ping. To my Indian server I get a 120 ms ping. Going through a VPN to India and then to the German server I get a 200 ms ping. But the 200 ms connection plays more smoothly because the server behind it computes at 50 FPS, unlike the 120 ms connection to India, which has lower ping but feels choppier because the server doesn’t reach the 50 FPS needed for smooth multiplayer.

    I measured CPU performance on the M7i, M6in etc., and all of them run at a maximum of about 8% CPU usage, but the FPS stays at around 30. It seems like it’s throttling …

    Any ideas? I need to run Indian servers, and I don’t know what to do.

  • Are you able to see in CloudWatch metrics or inside the operating system that the workload is CPU-bound? Specifically, do you see CPU utilisation either at 100% in total, or the equivalent of 100% for an integer multiple of CPU cores' worth of power (like 25% for a four-core instance)? Also, is the performance of the game 30 FPS in Mumbai for all the instance types you tried, ranging from t2 to the far more powerful m7i?

  • If there's no clear CPU bottleneck, I think it's best to measure the potential bottlenecks one by one. There's a bunch of CPU tests linked in this article: https://www.tomshardware.com/reviews/cpu-hierarchy,4312.html#section-best-cpu-benchmarks-you-can-run If you run them with an equivalent m6i or m7i in eu-central-1 and ap-south-1, do you actually see substantially different results for the same test? I've never seen that happen anywhere in AWS, although I haven't run instances in India.

  • Hi! Yes, the performance is the same for all of those instance types, which is what is really weird.

    No. The CPU is stuck at about 8% utilization for the server process, but it should go higher to achieve a higher FPS. It seems like it’s just in a relaxed state and doesn’t even try to get to full power. I will try those tests and let you know. Thanks!
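
The CPU-bound check asked about in the comments above (total utilisation at 100%, or at the equivalent of a whole number of saturated cores, such as 25% on a four-core instance) can be sketched as follows. The function name and the 2-point tolerance are assumptions made for illustration:

```python
def saturated_core_equivalents(total_cpu_percent, vcpus, tolerance_percent=2.0):
    """Return how many vCPUs appear fully busy, or None if the reading fits none.

    total_cpu_percent is the instance-wide utilisation (0-100), as reported by
    CloudWatch or the operating system. A single-threaded process pinned at its
    limit on a 4-vCPU instance shows up as roughly 25% in total.
    """
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    per_core = 100.0 / vcpus  # share of the total that one saturated vCPU represents
    k = round(total_cpu_percent / per_core)
    if k >= 1 and abs(total_cpu_percent - k * per_core) <= tolerance_percent:
        return k
    return None
```

A reading of about 8% on an instance with two vCPUs returns `None`, which would point away from a plain CPU bottleneck and towards measuring the other candidates one by one.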


T2 is a burstable instance type

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/burstable-performance-instances.html

If you run out of CPU credits, CPU is throttled and performance degrades https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/burstable-performance-instances-monitoring-cpu-credits.html

Burstable performance instances have these additional CloudWatch metrics, which are updated every five minutes:

CPUCreditUsage – The number of CPU credits spent during the measurement period.

CPUCreditBalance – The number of CPU credits that an instance has accrued. This balance is depleted when the CPU bursts and CPU credits are spent more quickly than they are earned.

CPUSurplusCreditBalance – The number of surplus CPU credits spent to sustain CPU utilization when the CPUCreditBalance value is zero.

CPUSurplusCreditsCharged – The number of surplus CPU credits exceeding the maximum number of CPU credits that can be earned in a 24-hour period, and thus attracting an additional charge.

Maybe your T2 server in Germany was not as heavily loaded over time, so its CPU credits were fine, while in India the load is higher and the CPU credit balance has dropped to zero.
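
A minimal sketch for checking that hypothesis, assuming you have already fetched `CPUCreditBalance` datapoints for each instance (for example with boto3's CloudWatch `get_metric_statistics`, requesting the `Average` statistic; the fetch itself is not shown). The function name and the threshold of 1 credit are assumptions:

```python
from datetime import datetime, timezone


def credit_exhausted_at(datapoints, threshold=1.0):
    """Return the timestamps, oldest first, where the average balance was at or below threshold.

    datapoints is a list of dicts with "Timestamp" and "Average" keys,
    the shape of the Datapoints list in a get_metric_statistics response
    when the Average statistic is requested.
    """
    ordered = sorted(datapoints, key=lambda d: d["Timestamp"])
    return [d["Timestamp"] for d in ordered if d["Average"] <= threshold]


# Made-up sample values, only to show the expected input shape.
sample = [
    {"Timestamp": datetime(2024, 1, 1, 0, 10, tzinfo=timezone.utc), "Average": 0.0},
    {"Timestamp": datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc), "Average": 12.5},
    {"Timestamp": datetime(2024, 1, 1, 0, 5, tzinfo=timezone.utc), "Average": 3.0},
]
```

An empty result for the Mumbai instance over the period with low server FPS would rule out credit exhaustion as the cause on T2.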

If you need stable and predictable performance, use other instance types, for example the M family.

Why the M-type performance also differs between regions is unclear; you would need to check the metrics and investigate.

EXPERT
answered 3 months ago
