MWAA: Timeout on googleapiclient Connection

0

We are experiencing a timeout issue when connecting to Google Sheets, specifically through googleapiclient. The code had been working, but after a new deployment we started getting this error. Even after we rolled back the changes, the error persists.

We run Airflow on MWAA (Airflow 2.6.3) and build our dependencies from Python WHL files. We tried installing the requirements from the Python Package Index instead, but that timed out with "WARNING: requirements.txt installation timed out after 9 minutes. Some requirements may not have installed." and the DAGs were broken.

Airflow is able to connect to other third-party services (Jira, etc.), but DAGs that connect to the Google Sheets API are failing.

Please share any solution, or any place we could look, to resolve the issue. Thanks.

Code Snippet

from googleapiclient.discovery import build

# <credentials> and <spreadsheet_id> are placeholders for our real values.
service = build(
    serviceName='sheets',
    version='v4',
    credentials=<credentials>,
).spreadsheets()
service.get(spreadsheetId=<spreadsheet_id>).execute()

And we get the following stack trace:

Traceback (most recent call last):
  File "/usr/local/airflow/dags/common/spreadsheet.py", line 199, in get_spreadsheet
    return service.get(spreadsheetId=self._id).execute()
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/googleapiclient/http.py", line 923, in execute
    resp, content = _retry_request(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/googleapiclient/http.py", line 222, in _retry_request
    raise exception
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/googleapiclient/http.py", line 191, in _retry_request
    resp, content = http.request(uri, method, *args, **kwargs)
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/google_auth_httplib2.py", line 209, in request
    self.credentials.before_request(self._request, method, uri, request_headers)
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/google/auth/credentials.py", line 151, in before_request
    self.refresh(request)
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/google/oauth2/service_account.py", line 434, in refresh
    access_token, expiry, _ = _client.jwt_grant(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/google/oauth2/_client.py", line 312, in jwt_grant
    response_data = _token_endpoint_request(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/google/oauth2/_client.py", line 272, in _token_endpoint_request
    response_status_ok, response_data, retryable_error = _token_endpoint_request_no_throw(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/google/oauth2/_client.py", line 219, in _token_endpoint_request_no_throw
    request_succeeded, response_data, retryable_error = _perform_request()
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/google/oauth2/_client.py", line 195, in _perform_request
    response = request(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/google_auth_httplib2.py", line 119, in __call__
    response, data = self.http.request(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/httplib2/__init__.py", line 1724, in request
    (response, content) = self._request(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/httplib2/__init__.py", line 1444, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/httplib2/__init__.py", line 1366, in _conn_request
    conn.connect()
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/httplib2/__init__.py", line 1156, in connect
    sock.connect((self.host, self.port))
TimeoutError: timed out
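
The last frames show that the timeout happens in `sock.connect` while the service account credentials are being refreshed, before any Sheets request is sent. A minimal diagnostic sketch, using only the standard library, is to attempt a plain TCP connection to every address the resolver returns and see which address family fails. The helper name `probe_tcp` is hypothetical, and the host in the example comment (`oauth2.googleapis.com`) is an assumption: it is the usual token endpoint host, but the value to check is the `token_uri` in your service account key.

```python
import socket


def probe_tcp(host, port=443, timeout=5.0):
    """Try a TCP connect to every address the resolver returns for host.

    Returns a list of (family_name, ip, error) tuples, where error is None
    if the connection succeeded.
    """
    results = []
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, 0, socket.SOCK_STREAM
    ):
        name = "IPv6" if family == socket.AF_INET6 else "IPv4"
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            error = None
        except OSError as exc:  # timeouts, refusals, unreachable networks
            error = f"{type(exc).__name__}: {exc}"
        results.append((name, sockaddr[0], error))
    return results


# Example (assumed host, run from inside a task so it uses the worker's network):
# for name, ip, error in probe_tcp("oauth2.googleapis.com"):
#     print(name, ip, error or "connected")
```

If the IPv4 rows connect and the IPv6 rows time out, the problem is in the network path rather than in credentials or permissions.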

Configurations:

1. MWAA: Airflow 2.6.3
2. Installed Packages (Using plugins.zip):
* Levenshtein-0.21.1
* PyGithub-1.59.0
* adtk-0.6.2
* apache-airflow-providers-atlassian-jira-2.1.1
* apache-airflow-providers-github-2.3.1
* apache-airflow-providers-mysql-5.1.1
* apache-airflow-providers-snowflake-4.2.0
* asttokens-2.2.1
* atlassian-python-api-3.39.0
* aws-requests-auth-0.4.3
* backcall-0.2.0
* cachetools-5.3.1
* comm-0.2.2
* cycler-0.12.1
* debugpy-1.8.1
* defusedxml-0.7.1
* executing-1.2.0
* fonttools-4.50.0
* google-api-core-2.11.0
* google-api-python-client-2.92.0
* google-auth-2.21.0
* google-auth-httplib2-0.1.0
* googleapis-common-protos-1.59.1
* gql-3.3.0
* graphql-core-3.2.3
* httplib2-0.22.0
* iniconfig-2.0.0
* ipykernel-6.25.1
* ipython-8.14.0
* jedi-0.18.2
* jira-3.5.2
* joblib-1.3.2
* jupyter-client-8.3.0
* jupyter-core-5.3.1
* kiwisolver-1.4.5
* matplotlib-3.5.2
* matplotlib-inline-0.1.6
* mpld3-0.5.9
* mysqlclient-2.2.0
* nest-asyncio-1.6.0
* numpy-1.24.4
* oauthlib-3.2.2
* oscrypto-1.3.0
* pandas-1.5.3
* parso-0.8.3
* patsy-0.5.6
* pickleshare-0.7.5
* pillow-10.2.0
* playwright-1.37.0
* protobuf-4.23.4
* pure-eval-0.2.2
* py-1.11.0
* pyOpenSSL-23.2.0
* pyasn1-0.4.8
* pyasn1-modules-0.2.8
* pycryptodomex-3.18.0
* pyee-9.0.4
* pynacl-1.5.0
* pypika-0.48.9
* pytest-7.4.0
* python-Levenshtein-0.21.1
* pyzmq-25.1.0
* requests-oauthlib-1.3.1
* retry-0.9.2
* rsa-4.9
* scikit-learn-1.3.0
* scipy-1.12.0
* snowflake-connector-python-3.0.4
* snowflake-sqlalchemy-1.4.7
* sortedcontainers-2.4.0
* sql-formatter-0.6.2
* stack-data-0.6.2
* statsmodels-0.14.1
* thefuzz-0.20.0
* threadpoolctl-3.4.0
* traitlets-5.9.0
* uritemplate-4.1.1
asked 2 months ago, 698 views
2 Answers
2

I understand the issue to be that you get a timed-out error when connecting to Google Sheets. Is it possible that something changed on the Google Sheets side? The most common cause of this sort of error is permissions. Since the client is being called, the permission issue is most likely not with the role you are using, but I would check that first.

AWS
evaleah
answered 2 months ago
0
Accepted Answer

After a lot of trial and error, we eventually traced the issue to IPv6 on the network interacting badly with the Google API packages (per this answer: https://stackoverflow.com/a/75375184/15938510). We removed IPv6 from the AWS network, and now the code works normally.
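
If changing the network is not an option, a workaround sometimes used for this class of problem is to make the Python process resolve hostnames to IPv4 addresses only. This is not what we did, so treat the following as an untested sketch: the function names are hypothetical, and it assumes the HTTP library looks up `socket.getaddrinfo` at call time, so the patch has to run before the first request is made.

```python
import socket

# Keep a reference to the real resolver so the wrapper can delegate to it.
_original_getaddrinfo = socket.getaddrinfo


def ipv4_only_getaddrinfo(host, port, family=0, type=0, proto=0, flags=0):
    """Resolve like socket.getaddrinfo, but always request AF_INET (IPv4)."""
    return _original_getaddrinfo(host, port, socket.AF_INET, type, proto, flags)


def force_ipv4():
    """Replace socket.getaddrinfo process-wide with the IPv4-only wrapper."""
    socket.getaddrinfo = ipv4_only_getaddrinfo
```

Calling `force_ipv4()` affects every library in the process that resolves names through `socket.getaddrinfo`, so fixing the network itself, as we did, is the cleaner solution.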

answered a month ago