While running a Glue job, I see these arguments passed to the job:
{
'job_bookmark_option': 'job-bookmark-disable',
'job_bookmark_from': None,
'job_bookmark_to': None,
'JOB_ID': 'j_c8afc16edb1420c2fb878249843e27280db60efcd37b4f6c7c469c4a55a1b5bd',
'JOB_RUN_ID': 'jr_d74caf9a56f744d09ac4d7fd076caa3d8da3cbc5d58f925ea532dc3c7dfcdf32',
'SECURITY_CONFIGURATION': None,
'encryption_type': None,
'enable_data_lineage': None,
'RedshiftTempDir': 's3://aws-glue-assets-myaccount-us-east-1/temporary/',
'TempDir': 's3://aws-glue-assets-myaccount-us-east-1/temporary/',
'JOB_NAME': 'my-job',
}
I spotted a parameter called "enable_data_lineage". For the next run I set this parameter to true by adding "--enable-data-lineage true" in the Job parameters section.
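For anyone who wants to reproduce this outside the console, here is a rough sketch of the same setting passed through boto3. The helper names (build_lineage_arguments, start_run_with_lineage) are made up for illustration, and I am only assuming that the undocumented "--enable-data-lineage" key behaves the same way when passed via Arguments as it does in the Job parameters section.

```python
def build_lineage_arguments(enabled: bool) -> dict:
    """Build the Arguments mapping for a Glue job run.

    Glue job parameters are plain strings, and the key keeps its leading "--".
    The "--enable-data-lineage" key is the one from this post, not a documented flag.
    """
    return {"--enable-data-lineage": "true" if enabled else "false"}


def start_run_with_lineage(job_name: str, enabled: bool = True) -> str:
    """Start a run of an existing Glue job with the lineage flag set; return the run ID."""
    import boto3  # imported here so the helper above works without boto3 installed

    glue = boto3.client("glue")
    response = glue.start_job_run(
        JobName=job_name,
        Arguments=build_lineage_arguments(enabled),
    )
    return response["JobRunId"]
```

Calling start_run_with_lineage("my-job") needs AWS credentials and an existing job with that name; build_lineage_arguments can be checked on its own without touching AWS.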
After this, my job startup time jumped from 7 seconds to 3 minutes 10 seconds. I checked the logs to see what was going on and found error messages like this:
2022-12-30 12:27:54,873 WARN [Thread-12] lineage.LineagePersistence$ (LineagePersistence.scala:isCatalogLineageSettingEnabled(99)): Exception occurred while getting catalog lineage settings, lineage for this job run will be disabled
com.amazonaws.services.lakeformation.model.InternalServiceException: Received an unexpected Content-Type: text/html', expected one of [application/json]. HTTP status code:503 (Service: AWSLakeFormation; Status Code: 500; Error Code: InternalServiceException; Request ID: ffe50f30-ec28-4648-814a-e267be0453da; Proxy: null)
I searched for documentation on this parameter, but had no luck.
How do I properly set up this feature? Are there any examples?