Trying to call an API with a list of URLs but Lambda is timing out


I'm trying to call the PageSpeed Insights API and save the response back to DynamoDB. The Lambda timeout is 15 minutes, but I will eventually need to call the API with about 100 URLs at an average response time of 20-30 seconds each. Sequentially that is roughly 33-50 minutes of work, well past the Lambda limit.

What is the best approach to doing this?

My current code looks like this:

const { v4 } = require('uuid');
const axios = require('axios');
const urls = require('./urls.json'); // local file, so the path needs the ./ prefix
const endpoint = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed";
const API_KEY = "";
const dynamodb = require('aws-sdk/clients/dynamodb');
const docClient = new dynamodb.DocumentClient();
const tableName = process.env.SAMPLE_TABLE;

const insertRecords = async (_id, _url, _lighthouseResults) => {
    // Pull the summary metrics object out of the Lighthouse response
    const metrics = _lighthouseResults.lighthouseResult.audits.metrics.details.items[0];
    console.log("writing to DynamoDB:", _id, _url, metrics);

    const params = {
        TableName: tableName,
        Item: {
            id: _id,
            created_at: new Date().toISOString(),
            URL: _url,
            metrics
        },
    };

    return docClient.put(params).promise(); // convert the callback API to a Promise
};

exports.putItemHandler = async (event) => { // Async function
  for (const url of urls) {
    const id = v4(); // block-scoped per URL instead of a shared module-level variable
    console.log(url + " - " + id);

    const lighthouseResults = await getLighthouse(url);     

    await insertRecords(id, url, lighthouseResults) // wait until this finishes before moving to the next item
      .catch((error) => {
        console.log(error);
        // throw error; don't care about this error, just continue
      });
  }
  console.log("Done");
};

const getLighthouse = async (url) => {
    console.log("inside getLighthouse");

    try {
        const resp = await axios.get(endpoint, {
            params: {
                key: API_KEY,
                url: url,
                category: 'performance',
                strategy: 'mobile'
            }
        });

        return resp.data;
    }
    catch (err) {
        // On failure this returns undefined; insertRecords then rejects and the
        // loop's .catch() skips the URL
        console.error(err);
    }
};
tcope
Asked 2 years ago · 988 views

3 Answers
0

One option here is to decouple reading the URL list from calling the API. You can do this with SQS: as you read each URL, send it to SQS, and then trigger a Lambda function from the SQS queue. That way, each invocation gets its own runtime and doesn't share a fate with any of the other queries. You probably want to limit the concurrency there so that you're not overloading the target API with calls. This method also lets you scale well beyond 100 URLs.
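
A minimal sketch of that decoupling, assuming aws-sdk v2 as in the question, a QUEUE_URL environment variable, and that the question's getLighthouse and insertRecords are in scope (all three are assumptions, not part of this answer):

// Producer Lambda: fans each URL out to SQS; finishes in seconds
// regardless of how many URLs there are.
const AWS = require('aws-sdk');
const { v4 } = require('uuid');
const sqs = new AWS.SQS();
const urls = require('./urls.json');

exports.enqueueHandler = async () => {
  for (const url of urls) {
    await sqs.sendMessage({
      QueueUrl: process.env.QUEUE_URL, // assumed environment variable
      MessageBody: JSON.stringify({ url }),
    }).promise();
  }
};

// Consumer Lambda: triggered by SQS, one short-lived invocation per batch.
// Reuses getLighthouse and insertRecords from the question's code.
exports.workerHandler = async (event) => {
  for (const record of event.Records) {
    const { url } = JSON.parse(record.body);
    const lighthouseResults = await getLighthouse(url);
    await insertRecords(v4(), url, lighthouseResults);
  }
};

Setting the event source mapping's batch size to 1 and giving the worker function a low reserved concurrency is one way to apply the throttling mentioned above.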

AWS
EXPERT
Answered 2 years ago

Please take a look at this workshop: https://async-messaging.workshop.aws/scatter-gather.html

It appears very similar to what you are trying to achieve: one front-end call branching off to multiple backend calls, with the responses of the backend calls written to DynamoDB. The front end waits for all responses, or a finite amount of time, before responding to the client.
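
For a flavor of the gather side of that pattern, here is a minimal sketch that polls DynamoDB until every scattered request has written its response or a deadline passes. The RESULTS_TABLE variable, the requestId partition key, and the gatherResults helper are all illustrative assumptions, not code from the workshop:

const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

// Poll the results table until all expected responses arrive, or give up.
const gatherResults = async (requestId, expectedCount, deadlineMs = 60000) => {
  const deadline = Date.now() + deadlineMs;
  while (Date.now() < deadline) {
    const { Items } = await docClient.query({
      TableName: process.env.RESULTS_TABLE, // assumed environment variable
      KeyConditionExpression: 'requestId = :r',
      ExpressionAttributeValues: { ':r': requestId },
    }).promise();
    if (Items.length >= expectedCount) return Items; // every response arrived
    await new Promise((resolve) => setTimeout(resolve, 2000)); // wait, then re-check
  }
  return null; // deadline hit; respond with whatever arrived, or an error
};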

AWS
EXPERT
Answered 2 years ago

I would go for an approach with Step Functions (see the sketch after this list).

Step 1 - call a Lambda that returns the URLs to process
Step 2 - parallel iterator (you can configure the parallelism)
  Step 2.1 - call PageSpeed for one URL + store the result in DynamoDB (you can configure error handling and gather a result)

The output of the Step Functions execution might be an analysis report of the whole run.
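
As an illustration only, a Map state in Amazon States Language roughly matching those steps. The Lambda ARNs are placeholders, the first Lambda is assumed to return { "urls": [...] }, and MaxConcurrency is the parallelism knob mentioned above:

{
  "Comment": "Illustrative sketch only; Lambda ARNs are placeholders",
  "StartAt": "GetUrls",
  "States": {
    "GetUrls": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:get-urls",
      "Next": "ProcessUrls"
    },
    "ProcessUrls": {
      "Type": "Map",
      "ItemsPath": "$.urls",
      "MaxConcurrency": 5,
      "Iterator": {
        "StartAt": "PageSpeedToDynamoDB",
        "States": {
          "PageSpeedToDynamoDB": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:pagespeed-to-ddb",
            "Retry": [
              { "ErrorEquals": ["States.TaskFailed"], "MaxAttempts": 2 }
            ],
            "End": true
          }
        }
      },
      "End": true
    }
  }
}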

JaccoPK
Answered 2 years ago
