I have a client's website hosted on Amplify, which has been a great hosting platform for the past few years.
Around the start of February, we noticed a significant increase in requests to the website, which pushed us over our request limits in third-party tools such as Sentry and Weglot. After further investigation, we found that these requests all had "Bytespider" in the User-Agent header, which we've learned is a data scraper run by ByteDance. Many forum posts report that it doesn't respect a disallow rule in robots.txt.
We have an IP range (and, obviously, a User-Agent) that we could block. Unfortunately, because the website is hosted on Amplify rather than on our own AWS setup, we can't access the underlying CloudFront distribution or attach a WAF to add rules that block these requests.
Does anyone know a way we can solve this issue?
Is AWS WAF the only possible solution to this problem?
As far as I know, Amplify does not have a built-in feature for restricting requests by IP address or similar, so I think you would need AWS WAF or something like it to do this.
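
For what it's worth, the check itself is simple wherever you end up being able to run it. Below is a minimal Python sketch of the matching logic (User-Agent substring plus IP range), not an Amplify or WAF API. The function name `should_block` and both constants are hypothetical, and the CIDR shown is a reserved documentation range, not Bytespider's real range, so substitute the one you found.

```python
import ipaddress
from typing import Optional

# Hypothetical values for illustration only; replace with the real ones.
BLOCKED_AGENT_SUBSTRINGS = ("bytespider",)
# 203.0.113.0/24 is a reserved documentation range, NOT Bytespider's range.
BLOCKED_NETWORKS = (ipaddress.ip_network("203.0.113.0/24"),)


def should_block(user_agent: Optional[str], client_ip: str) -> bool:
    """Return True if the request matches a blocked User-Agent or IP range."""
    # Case-insensitive substring match on the User-Agent header.
    agent = (user_agent or "").lower()
    if any(fragment in agent for fragment in BLOCKED_AGENT_SUBSTRINGS):
        return True

    # Fall back to the IP range check; ignore addresses that do not parse.
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        return False
    return any(address in network for network in BLOCKED_NETWORKS)
```

A caller would respond with something like a 403 when `should_block(...)` returns True. Keep in mind the User-Agent check only works as long as the scraper keeps identifying itself, which is why having the IP range as a second condition is useful.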