MWAA: Timeout on googleapiclient Connection


We are experiencing a timeout when connecting to Google Sheets, specifically through googleapiclient. The code had been working, but after a new deployment we started getting this error. Even after rolling back the changes, the error persists.

We run Airflow on MWAA (Airflow 2.6.3) and build our dependencies from Python WHL files. We tried installing the requirements from the Python Package Index instead, but that timed out with "WARNING: requirements.txt installation timed out after 9 minutes. Some requirements may not have installed." and the DAGs were left broken.

Airflow is able to connect to other third-party services (Jira and others), but DAGs that connect to the Google Sheets API are failing.

Please share any solution, or suggest where we could look to resolve the issue. Thanks.

Code Snippet

from googleapiclient.discovery import build

# <credentials> and <spreadsheet_id> are placeholders for the real values.
service = build(
    serviceName='sheets',
    version='v4',
    credentials=<credentials>).spreadsheets()
service.get(spreadsheetId=<spreadsheet_id>).execute()

We get the following stack trace:

Traceback (most recent call last):
  File "/usr/local/airflow/dags/common/spreadsheet.py", line 199, in get_spreadsheet
    return service.get(spreadsheetId=self._id).execute()
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/googleapiclient/http.py", line 923, in execute
    resp, content = _retry_request(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/googleapiclient/http.py", line 222, in _retry_request
    raise exception
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/googleapiclient/http.py", line 191, in _retry_request
    resp, content = http.request(uri, method, *args, **kwargs)
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/google_auth_httplib2.py", line 209, in request
    self.credentials.before_request(self._request, method, uri, request_headers)
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/google/auth/credentials.py", line 151, in before_request
    self.refresh(request)
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/google/oauth2/service_account.py", line 434, in refresh
    access_token, expiry, _ = _client.jwt_grant(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/google/oauth2/_client.py", line 312, in jwt_grant
    response_data = _token_endpoint_request(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/google/oauth2/_client.py", line 272, in _token_endpoint_request
    response_status_ok, response_data, retryable_error = _token_endpoint_request_no_throw(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/google/oauth2/_client.py", line 219, in _token_endpoint_request_no_throw
    request_succeeded, response_data, retryable_error = _perform_request()
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/google/oauth2/_client.py", line 195, in _perform_request
    response = request(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/google_auth_httplib2.py", line 119, in __call__
    response, data = self.http.request(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/httplib2/__init__.py", line 1724, in request
    (response, content) = self._request(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/httplib2/__init__.py", line 1444, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/httplib2/__init__.py", line 1366, in _conn_request
    conn.connect()
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/httplib2/__init__.py", line 1156, in connect
    sock.connect((self.host, self.port))
TimeoutError: timed out
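
The trace ends in sock.connect while the service-account token is being refreshed (jwt_grant), so the failure happens at the TCP-connect stage, before any Sheets request is sent. A minimal diagnostic sketch, not part of the original DAG code, is to try a plain TCP connect to every address the resolver returns and see which address families time out. The function name probe_addresses is hypothetical, and the host names in the usage note are assumptions about which endpoints the Google client contacts.

```python
import socket


def probe_addresses(host, port=443, timeout=5.0):
    """Try a TCP connect to every address the resolver returns for host.

    Returns a list of (family_label, ip, status) tuples so that IPv4 and
    IPv6 results can be compared side by side.
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return [("resolve", host, f"failed: {exc}")]

    results = []
    for family, socktype, proto, _canonname, sockaddr in infos:
        label = "IPv6" if family == socket.AF_INET6 else "IPv4"
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            status = "ok"
        except OSError as exc:  # includes timeouts and unreachable networks
            status = f"failed: {exc}"
        results.append((label, sockaddr[0], status))
    return results
```

Running probe_addresses("oauth2.googleapis.com") and probe_addresses("sheets.googleapis.com") from inside a task and logging the result would show whether only the IPv6 addresses fail. Both host names are assumed here and should be checked against the environment.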

Configurations:

1. MWAA: Airflow 2.6.3
2. Installed Packages (Using plugins.zip):
* Levenshtein-0.21.1
* PyGithub-1.59.0
* adtk-0.6.2
* apache-airflow-providers-atlassian-jira-2.1.1
* apache-airflow-providers-github-2.3.1
* apache-airflow-providers-mysql-5.1.1
* apache-airflow-providers-snowflake-4.2.0
* asttokens-2.2.1
* atlassian-python-api-3.39.0
* aws-requests-auth-0.4.3
* backcall-0.2.0
* cachetools-5.3.1
* comm-0.2.2
* cycler-0.12.1
* debugpy-1.8.1
* defusedxml-0.7.1
* executing-1.2.0
* fonttools-4.50.0
* google-api-core-2.11.0
* google-api-python-client-2.92.0
* google-auth-2.21.0
* google-auth-httplib2-0.1.0
* googleapis-common-protos-1.59.1
* gql-3.3.0
* graphql-core-3.2.3
* httplib2-0.22.0
* iniconfig-2.0.0
* ipykernel-6.25.1
* ipython-8.14.0
* jedi-0.18.2
* jira-3.5.2
* joblib-1.3.2
* jupyter-client-8.3.0
* jupyter-core-5.3.1
* kiwisolver-1.4.5
* matplotlib-3.5.2
* matplotlib-inline-0.1.6
* mpld3-0.5.9
* mysqlclient-2.2.0
* nest-asyncio-1.6.0
* numpy-1.24.4
* oauthlib-3.2.2
* oscrypto-1.3.0
* pandas-1.5.3
* parso-0.8.3
* patsy-0.5.6
* pickleshare-0.7.5
* pillow-10.2.0
* playwright-1.37.0
* protobuf-4.23.4
* pure-eval-0.2.2
* py-1.11.0
* pyOpenSSL-23.2.0
* pyasn1-0.4.8
* pyasn1-modules-0.2.8
* pycryptodomex-3.18.0
* pyee-9.0.4
* pynacl-1.5.0
* pypika-0.48.9
* pytest-7.4.0
* python-Levenshtein-0.21.1
* pyzmq-25.1.0
* requests-oauthlib-1.3.1
* retry-0.9.2
* rsa-4.9
* scikit-learn-1.3.0
* scipy-1.12.0
* snowflake-connector-python-3.0.4
* snowflake-sqlalchemy-1.4.7
* sortedcontainers-2.4.0
* sql-formatter-0.6.2
* stack-data-0.6.2
* statsmodels-0.14.1
* thefuzz-0.20.0
* threadpoolctl-3.4.0
* traitlets-5.9.0
* uritemplate-4.1.1
asked a month ago, 496 views
2 Answers

I understand the issue to be that you get a timed-out error when making a connection to Google Sheets. Is it possible that something changed on the Google Sheets side? The most common cause of this sort of error is permissions. Since the client is being called, the permission issue is most likely not the role you are using, but I would check that first.

AWS
evaleah
answered a month ago
Accepted Answer

After a lot of trial and error, we eventually traced the issue to IPv6 on the network interfering with the Google API packages (per this answer: https://stackoverflow.com/a/75375184/15938510). We removed IPv6 from the AWS network, and the code now works normally.
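
For anyone who cannot change the network, one possible code-side workaround (an assumption, not what we actually did) is to make the Python process resolve only IPv4 addresses while the Google call runs. The sketch below temporarily replaces socket.getaddrinfo; the name ipv4_only is hypothetical, and the approach relies on the HTTP library looking up addresses through socket.getaddrinfo, which should be verified for the installed httplib2 version.

```python
import socket
from contextlib import contextmanager


@contextmanager
def ipv4_only():
    """Temporarily force socket.getaddrinfo to return IPv4 results only.

    This patches a process-wide function, so it affects every thread for
    as long as the block is active. Treat it as a stopgap, not a fix.
    """
    original = socket.getaddrinfo

    def getaddrinfo_v4(host, port, family=0, type=0, proto=0, flags=0):
        # Ignore the requested family and always ask for AF_INET.
        return original(host, port, socket.AF_INET, type, proto, flags)

    socket.getaddrinfo = getaddrinfo_v4
    try:
        yield
    finally:
        # Always restore the original resolver, even if the call failed.
        socket.getaddrinfo = original
```

The Sheets call would then go inside a "with ipv4_only():" block, for example around service.get(spreadsheetId=...).execute(). Removing IPv6 at the network level, as described above, remains the fix that is known to have worked.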

answered a month ago
