AWS Glue data quality dynamic rules not working


I tried AWS Glue Data Quality dynamic rules in my AWS Glue pipeline. I wrote the following rule: RowCount > avg(last(3))

Then I processed 3 CSV files with 1000, 10000, and 100 rows. On the 4th run I again processed a file with 100 rows, expecting this rule to fail. But it didn't fail. I got the passing result below:

RowCount > avg(last(3)) Rule passed Dataset.*.RowCount: 100.00

  • Yep, it sounds like it thinks the average is 100; not sure why. What did it say on the previous evaluations?
    How did you make it process different files on different runs? Do you have any count confirming that the job actually read them all? (Maybe it threw away invalid rows.)

  • This worked on the 5th run. I think the issue is that the rule was not present in my first 3 runs; I added it only in the 4th run, so it didn't take the earlier runs into consideration. But this is not mentioned anywhere in the documentation. There is no mention of how Glue stores the 'state' of runs.

Asked 3 months ago, 173 views
1 Answer

That's a fair point about clarifying the documentation. The history is based on the historical metrics: if the rule wasn't applied on a run, the metric wasn't created for that run, and the rule cannot go back and process earlier data to reconstruct the history it is missing.

AWS
Expert
Answered 3 months ago
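
To make the behavior described in this thread concrete, here is a rough Python sketch of how the rule would evaluate if metrics are only recorded on runs where the rule is applied. It is a toy model, not Glue's actual implementation. The function names (`evaluate_row_count_rule`, `simulate`) are hypothetical, the row count of the 5th run is assumed to be 100 (the thread does not say), and the "pass when no history exists" branch is an assumption inferred from the 4th-run result reported above, not something confirmed by the documentation.

```python
from statistics import mean


def evaluate_row_count_rule(row_count, history, window=3):
    """Toy model of `RowCount > avg(last(3))` (hypothetical helper, not a Glue API)."""
    recent = history[-window:]
    if not recent:
        # Assumption: with no stored metrics the rule passes,
        # which matches the 4th-run result reported in the thread.
        return True
    return row_count > mean(recent)


def simulate(row_counts, rule_added_at_run):
    """Return one result per run: None before the rule exists, else True/False."""
    history = []  # metrics recorded only on runs where the rule was applied
    results = []
    for run_number, row_count in enumerate(row_counts, start=1):
        if run_number < rule_added_at_run:
            results.append(None)  # rule absent, so no metric is stored
            continue
        results.append(evaluate_row_count_rule(row_count, history))
        history.append(row_count)
    return results


# Rule added on the 4th run, as in the question (5th run's row count is assumed).
late_rule = simulate([1000, 10000, 100, 100, 100], rule_added_at_run=4)

# Rule present from the first run, for comparison.
early_rule = simulate([1000, 10000, 100, 100], rule_added_at_run=1)
```

Under these assumptions, the 4th run passes because there is nothing to compare against, and the 5th run fails because the only stored metric is 100 and 100 is not greater than 100. Had the rule been in place from the first run, the 4th run would have been compared against the average of 1000, 10000, and 100, and would have failed as the question expected.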
