I'm trying to customize a script built via Visual ETL and came across an error.
Steps I followed:
- A DynamicFrame created from a CSV file in S3 needs to have its data types evaluated.
- I enforce a custom schema on it using the apply_mapping function, which resets the frame's schema.
- Then I convert NaN to None (by converting the DynamicFrame to a DataFrame and then back to a DynamicFrame).
- I pass this transformed **Parsed_dynamic_Frame** (DynamicFrame) to EvaluateDataQuality().process_rows for validation with the ruleset Rules = [ ColumnDataType "Col_name" = "integer" ]
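The steps above can be sketched roughly as follows. This is a minimal sketch, not my exact script: `build_ruleset` and `nan_to_none` are hypothetical helper names, and I'm assuming NaN can only show up in float/double columns. The Glue and Spark imports sit inside the function so the ruleset helper also runs outside a Glue session.

```python
def build_ruleset(column_types):
    """Build a DQDL ruleset with one ColumnDataType rule per column."""
    rules = ", ".join(
        f'ColumnDataType "{name}" = "{dtype}"'
        for name, dtype in column_types.items()
    )
    return f"Rules = [ {rules} ]"


def nan_to_none(dyf, glue_context, name="Parsed_dynamic_Frame"):
    """Replace NaN with null via a DynamicFrame -> DataFrame -> DynamicFrame round trip."""
    # Local imports: awsglue and pyspark are only available inside a Glue session.
    from awsglue.dynamicframe import DynamicFrame
    from pyspark.sql import functions as F

    df = dyf.toDF()
    for col_name, col_type in df.dtypes:
        # Assumption: only floating-point columns can hold NaN.
        if col_type in ("float", "double"):
            df = df.withColumn(
                col_name,
                F.when(F.isnan(F.col(col_name)), F.lit(None)).otherwise(F.col(col_name)),
            )
    return DynamicFrame.fromDF(df, glue_context, name)


ruleset = build_ruleset({"Col_name": "integer"})
print(ruleset)  # prints: Rules = [ ColumnDataType "Col_name" = "integer" ]
```

The frame returned by `nan_to_none` and the `ruleset` string are what go into the `process_rows` call.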
data_schema_dquee_results_set = EvaluateDataQuality().process_rows(
    frame=Parsed_dynamic_Frame,
    ruleset=ruleset,
    publishing_options={
        "dataQualityEvaluationContext": "data_quality_schema_rule_set",
        # "enableDataQualityCloudWatchMetrics": True,
        "enableDataQualityResultsPublishing": True
    },
    additional_options={"performanceTuning.caching": "CACHE_NOTHING"}
)
I get the following error:
: An error occurred while calling z:com.amazonaws.services.glue.dq.EvaluateDataQuality.processRows.
: java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scala:529)
at scala.None$.get(Option.scala:527)
at com.amazonaws.services.glue.dq.EvaluateDataQualityHelper.processInner(EvaluateDataQuality.scala:283)
at com.amazonaws.services.glue.dq.EvaluateDataQualityHelper.process(EvaluateDataQuality.scala:157)
at com.amazonaws.services.glue.dq.EvaluateDataQualityHelper.process$(EvaluateDataQuality.scala:152)
at com.amazonaws.services.glue.dq.EvaluateDataQuality$.process(EvaluateDataQuality.scala:64)
at com.amazonaws.services.glue.dq.EvaluateDataQuality$.processRows(EvaluateDataQuality.scala:124)
at com.amazonaws.services.glue.dq.EvaluateDataQuality.processRows(EvaluateDataQuality.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:750)
I couldn't find much about this error.
Can someone help me here?
FYI, this is all done in AWS Glue Notebooks with the following settings:
%glue_version 4.0
%idle_timeout 2880
%worker_type G.1X
%number_of_workers 2
@Gonzalo Herreros, setting enableDataQualityResultsPublishing to False prevented the error. But does that mean I won't be able to access the DataQualityRulesPass, DataQualityRulesFail, DataQualityRulesSkip, and DataQualityEvaluationResult columns?
I actually need the job results for further processing.
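For context, this is roughly how I intend to consume the results. It is only a sketch under assumptions: that `process_rows` still returns its collection with the `rowLevelOutcomes` and `ruleOutcomes` keys when publishing is disabled (the part I'm trying to confirm), and that DataQualityEvaluationResult holds "Passed" or "Failed" per row. `split_outcomes` and `summarize` are names I made up.

```python
from collections import Counter


def split_outcomes(dq_results):
    """Return (row_level_df, rule_level_df) from the process_rows result."""
    # Local import: awsglue is only available inside a Glue session.
    from awsglue.transforms import SelectFromCollection

    row_level = SelectFromCollection.apply(
        dfc=dq_results, key="rowLevelOutcomes", transformation_ctx="rowLevelOutcomes"
    )
    rule_level = SelectFromCollection.apply(
        dfc=dq_results, key="ruleOutcomes", transformation_ctx="ruleOutcomes"
    )
    return row_level.toDF(), rule_level.toDF()


def summarize(rows):
    """Count row-level verdicts; works on collected Spark Rows or plain dicts."""
    return Counter(row["DataQualityEvaluationResult"] for row in rows)


# Stand-in rows (made-up data) so the counting logic can be checked without Glue.
sample = [
    {"DataQualityEvaluationResult": "Passed"},
    {"DataQualityEvaluationResult": "Failed"},
    {"DataQualityEvaluationResult": "Passed"},
]
print(summarize(sample))
```

In the notebook I would call `row_df, rule_df = split_outcomes(data_schema_dquee_results_set)` and then `summarize(row_df.select("DataQualityEvaluationResult").collect())`.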