I need the AWS tech team's help here.
I've been using my job's temporary path, retrieved with the getResolvedOptions function, as the staging_path of the relationalize function.
I found that the job fails intermittently (not on every run) because it can't retrieve the staged table after the Relationalize function has executed.
To help you understand the problem, I've added some explanation and code below.
Please advise me on any issues you see, and kindly confirm whether we can keep using the arguments obtained via the getResolvedOptions function.
[Code 1]
import sys
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv, [..., "TempDir", ...])
...
# the name of the target field to be relationalized is "params"
flatten_dyc = dyc["post_log"].relationalize(
    root_table_name='root',
    staging_path=args["TempDir"],
    transformation_ctx='flatten_dyc'
)
flatten_dyc["root"].printSchema()
flatten_dyc["root_params"].printSchema()
When I ran it this morning, I got the following result:
flatten_dyc["root_params"] is empty, even though it should at least contain the id field used to join it with the flatten_dyc["root"] table.
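To show why an empty root_params is a real problem, here is a toy, pure-Python model of the table shape Relationalize produces for one array field: the root table keeps an integer join key where the array was, and the child table holds one row per element with id, index and a value column. This is an illustration only, not Glue's implementation; the function name toy_relationalize is hypothetical, and the "params.val" column name is an assumption based on the naming pattern in the AWS Glue examples (arrays of structs get columns like params.val.<field>).

```python
from itertools import count


def toy_relationalize(records, array_field, root_name="root"):
    """Toy model of the table shape Relationalize produces for one array field.

    Illustration only, not the Glue implementation: the root table keeps an
    integer join key where the array was, and the child table gets one row per
    array element with columns id, index and "<field>.val".
    """
    key_gen = count(1)
    root_rows, child_rows = [], []
    for rec in records:
        key = next(key_gen)
        # Copy every non-array column, then replace the array with its join key.
        row = {k: v for k, v in rec.items() if k != array_field}
        row[array_field] = key
        root_rows.append(row)
        # One child row per element; "id" points back to the root row.
        for idx, value in enumerate(rec.get(array_field, [])):
            child_rows.append({"id": key, "index": idx, f"{array_field}.val": value})
    return {root_name: root_rows, f"{root_name}_{array_field}": child_rows}


tables = toy_relationalize(
    [{"event": "a", "params": ["x", "y"]}, {"event": "b", "params": ["z"]}],
    "params",
)
# Re-join the child rows to their parent through root.params == root_params.id.
joined = [
    (r["event"], c["params.val"])
    for r in tables["root"]
    for c in tables["root_params"]
    if r["params"] == c["id"]
]
print(joined)  # [('a', 'x'), ('a', 'y'), ('b', 'z')]
```

If the child table comes back empty, as in the run described above, the same join yields no rows at all, so every parameter value is silently lost.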
[Code 2]
So I tried the same script with a hard-coded staging_path (see below) and found that the job read the staged table flatten_dyc["root"] with all fields successfully.
...
flatten_dyc = dyc["post_log"].relationalize(
    root_table_name='root',
    staging_path="s3://temp-glue-info/",  # hard-coded staging path
    transformation_ctx='flatten_dyc'
)
flatten_dyc["root"].printSchema()
flatten_dyc["root_params"].printSchema()
My questions are:
1/ Why couldn't the function read the staged table properly when the path was soft-coded (taken from the job arguments)?
2/ Moreover, when I ran [Code 1] again, flatten_dyc["root_params"] was read successfully, which suggests the function is not reliable. Can you look into this?
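Because the failure is intermittent, one defensive option is to check the relationalized output before using it, so the job fails loudly instead of continuing with an empty child table. The sketch below does this in plain Python over lists of row dicts; the helper names find_orphan_keys and check_relationalize_output are hypothetical. It assumes every root row's params array is non-empty; if empty arrays are possible, filter those rows out first, since they may legitimately have no child rows.

```python
def find_orphan_keys(root_rows, child_rows, fk_field, child_key="id"):
    """Return the root join keys that have no matching row in the child table."""
    child_keys = {row[child_key] for row in child_rows}
    return sorted(
        {
            row[fk_field]
            for row in root_rows
            if row.get(fk_field) is not None and row[fk_field] not in child_keys
        }
    )


def check_relationalize_output(root_rows, child_rows, fk_field):
    """Fail fast instead of silently continuing with an empty or partial child table."""
    orphans = find_orphan_keys(root_rows, child_rows, fk_field)
    if orphans:
        raise RuntimeError(
            f"{len(orphans)} root key(s) have no rows in the child table, e.g. {orphans[:5]}"
        )


root = [{"event": "a", "params": 1}, {"event": "b", "params": 2}]
check_relationalize_output(root, [{"id": 1}, {"id": 2}], "params")  # passes
try:
    check_relationalize_output(root, [], "params")  # empty child table
except RuntimeError as err:
    print(err)  # 2 root key(s) have no rows in the child table, e.g. [1, 2]
```

Inside a Glue job the same idea would more likely run on Spark, for example by converting both tables with toDF() and counting the rows of a "left_anti" join on root.params == root_params.id; treat that as a suggested approach rather than a tested recipe.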
@Gonzalo Herreros I'll try your suggestion. Could you explain in more detail why you recommend using a staging path separate from the job's base temporary directory? And why is there any chance that the staged table could be deleted by other jobs when there's no command to delete it? Thanks!
Normally the temporary directory is fine, but in your case it's better to move the staging path somewhere else to eliminate the possibility of the staged files being deleted by some other job.
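A minimal sketch of that suggestion, assuming you pick a dedicated base location (such as the s3://temp-glue-info/ bucket from the question) and want each run to stage into its own prefix. The helper build_staging_path, its prefix parameter and the run_id argument are hypothetical; pass any identifier unique to the run, or omit it to get a random UUID.

```python
import uuid


def build_staging_path(base_dir, run_id=None, prefix="relationalize"):
    """Return a per-run staging location under base_dir.

    base_dir: an S3 URI, e.g. a dedicated bucket/prefix rather than the shared TempDir.
    run_id:   any identifier unique to this run; a random UUID is used if omitted.
    """
    if not base_dir:
        raise ValueError("base_dir must be a non-empty S3 URI")
    suffix = run_id or uuid.uuid4().hex
    # Normalise the trailing slash so the result is always ".../<prefix>/<suffix>/".
    return f"{base_dir.rstrip('/')}/{prefix}/{suffix}/"


print(build_staging_path("s3://temp-glue-info/", "run-001"))
# s3://temp-glue-info/relationalize/run-001/
```

The result would then be passed as staging_path=build_staging_path("s3://temp-glue-info/") in the relationalize call. The per-run suffix keeps repeated or concurrent runs from reading or overwriting each other's staged files; moving the base location out of the shared temporary directory is what guards against another job's cleanup.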