No matches found between two tables even though matches are present - AWS Entity Resolution

I need to link two tables on the columns dummy_pubmed_author_name2.author_name and dummy_pubmed_hcp_crm_names2.hcp_names. I've created two schema mappings, one for each table, where I set the respective column as the input field for matching and chose the "name" attribute type. When I run the matching workflow (ML technique), no matches are found: every value is reported as unique and each row gets its own Match ID. Some rows in those columns hold similar or even identical names, so why are the matches not identified? This is not the expected behavior.

Below is the content of the output file from the matching workflow. As you can see, rows 6 and 12 contain exactly the same name in the hcp_name and author_name columns that I used for matching, yet they have different Match IDs, as if they weren't matched:

RowID | author_name      | author_name_id | hcp_name         | RecordId | MatchID
1     | robert martin    | 1              |                  | 1        | 111669149696
2     | michael flores   | 3              |                  | 2a       | 103079215104
3     |                  |                | andrew hill      | 2b       | 51539607552
4     |                  |                | christina cooper | 3a       | 0
5     | leslie kingkong  | 5              |                  | 3b       | 94489280512
6     | jason gomez      | 2              |                  | 4a       | 128849018880
7     | lisa hill taylor | 6              |                  | 4b       | 85899345920
8     | travis gordon    | 4              |                  | 5a       | 120259084288
9     |                  |                | leslie king      | 5b       | 60129542144
10    |                  |                | leah arias       | 6a       | 77309411328
11    |                  |                | travis gordon    | 6b       | 25769803776
12    |                  |                | jason gomez      | 7a       | 42949672960
13    |                  |                | michelle stewart | 7b       | 17179869184
14    |                  |                | lisa taylor      | 8a       | 68719476736
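
To make the problem concrete, here is a minimal Python sketch that takes a few rows from the output above (rows 6, 8, 10, 11 and 12) and lists the names that ended up under more than one MatchID. The tuple layout and the helper name `names_split_across_match_ids` are my own assumptions for illustration; they are not part of the Entity Resolution output format.

```python
from collections import defaultdict

# (row, name, MatchID) copied from rows 6, 8, 10, 11 and 12 of the output above.
rows = [
    (6, "jason gomez", 128849018880),
    (8, "travis gordon", 120259084288),
    (10, "leah arias", 77309411328),
    (11, "travis gordon", 25769803776),
    (12, "jason gomez", 42949672960),
]


def names_split_across_match_ids(records):
    """Return {name: sorted MatchIDs} for names that carry more than one MatchID."""
    ids_by_name = defaultdict(set)
    for _row, name, match_id in records:
        # Light normalization so that case or stray spaces do not hide duplicates.
        ids_by_name[name.strip().lower()].add(match_id)
    return {name: sorted(ids) for name, ids in ids_by_name.items() if len(ids) > 1}


split = names_split_across_match_ids(rows)
for name, ids in sorted(split.items()):
    print(name, ids)
```

Any name printed by this loop is an identical name that the workflow assigned to different match groups.
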
2 Answers

Hi,

In the given example, the machine learning matching may not have enough context to produce the matches you expect. A name alone may not be sufficient as input, since several individuals can share the same name. To improve this, consider adding another input such as a phone number, which gives the ML matching more context and lets it return more confident results. To test this, I used the same data as you, added a 'Phone Number' field to both datasets, and was then able to get the expected matches (even for names such as 'leslie kingkong').
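
As a rough illustration of the suggestion above, the sketch below builds the input-field list for a schema mapping that carries a phone number next to the name. The dictionary shape follows my understanding of the `mappedInputFields` parameter of the Entity Resolution `CreateSchemaMapping` API. The column names `phone_number` and `hcp_id`, the use of `author_name_id` as the unique ID, and the match key names are assumptions for illustration only, so please check them against your own tables and the current API reference.

```python
def build_mapped_input_fields(id_column, name_column, phone_column):
    """Return a mappedInputFields-style list: one unique ID, one name, one phone number."""
    return [
        # Unique ID column (assumed to exist in each table).
        {"fieldName": id_column, "type": "UNIQUE_ID"},
        # The name column that was already used for matching.
        {"fieldName": name_column, "type": "NAME", "matchKey": "name"},
        # The extra input that gives the ML matching more context.
        {"fieldName": phone_column, "type": "PHONE_NUMBER", "matchKey": "phone"},
    ]


# Hypothetical column names; only author_name and hcp_names come from the question.
author_fields = build_mapped_input_fields("author_name_id", "author_name", "phone_number")
hcp_fields = build_mapped_input_fields("hcp_id", "hcp_names", "phone_number")
```

Each list could then be supplied as `mappedInputFields` when creating the schema mapping for the corresponding table, for example through the boto3 `entityresolution` client's `create_schema_mapping` call.
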

Alternatively, if you want to keep using the same tables as they are, rule-based matching could be more suitable here. I have tested this on my end and it also works as expected.
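
For reference, a rule-based workflow with a single rule on the name could be described roughly as in the sketch below. The structure follows my understanding of the `resolutionTechniques` parameter of the `CreateMatchingWorkflow` API; the rule name `ExactName` and the match key `name` are hypothetical. As far as I understand, the key listed under `matchingKeys` has to be a match key defined in the schema mappings, so this assumes both name columns are mapped to the same match key.

```python
def build_rule_based_techniques(match_key="name", rule_name="ExactName"):
    """Return a resolutionTechniques-style dict with one rule on a single match key."""
    return {
        "resolutionType": "RULE_MATCHING",
        "ruleBasedProperties": {
            "attributeMatchingModel": "ONE_TO_ONE",
            # One rule that groups records sharing the same value for match_key.
            "rules": [{"ruleName": rule_name, "matchingKeys": [match_key]}],
        },
    }


techniques = build_rule_based_techniques()
```
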

If you are still facing issues despite the above recommendations, I would suggest raising a case with AWS Support, who will be able to assist further.

Hope this helps!

AWS
SUPPORT ENGINEER
Shoan_D
answered 6 months ago

Hello, thank you for clarifying the ML-based matching for me. I can see now that it won't be the right choice in our case: we only have one column to match on, and we want to find similar names, in case they are mispronounced or written in a different language (and find the same name in another language). I've tried rule-based matching as you suggested, matching only on "author_name" and "hcp_names", and unfortunately I didn't get the desired results either. Each time I run it this way, I get a column named "Error.dynamic Recod.match Rule" with the value "NoRule" in every row. I'm not sure why this error appears in the output, though.
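
To show what I mean by "similar names", here is a small local sketch using Python's standard-library `difflib`. This is not how Entity Resolution matches records; it is only a sanity check on a few names from my output file (rows 5, 6, 7, 9, 12 and 14). The 0.8 threshold and the helper name `similar_pairs` are arbitrary choices for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Row number -> name, copied from the output file in the question.
names = {
    5: "leslie kingkong",
    6: "jason gomez",
    7: "lisa hill taylor",
    9: "leslie king",
    12: "jason gomez",
    14: "lisa taylor",
}


def similar_pairs(named_rows, threshold=0.8):
    """Return {(row_a, row_b): score} for row pairs whose names score >= threshold."""
    pairs = {}
    for (row_a, a), (row_b, b) in combinations(sorted(named_rows.items()), 2):
        # ratio() is 1.0 for identical strings and falls toward 0.0 as they diverge.
        score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if score >= threshold:
            pairs[(row_a, row_b)] = score
    return pairs


pairs = similar_pairs(names)
for (row_a, row_b), score in pairs.items():
    print(row_a, row_b, names[row_a], "~", names[row_b], round(score, 2))
```

Pairs like these (identical names as well as close variants) are the ones I would expect the workflow to group under one Match ID.
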

Beg
answered 6 months ago
