Duplicates that are not duplicates


Baker
May 16th, 2006, 08:28 AM
1. I have a large number of records (over 8,000).

2. Not all are complete records. Some have only a name, only an email address, or a partial name plus an email address, and some are company entries.

3. I will therefore get a number of apparent duplicates that an automated tool cannot easily resolve. I am comfortable with these records being flagged when I run the tool to eliminate the duplicates.

The last time I ran the tool there were just over 100 apparent duplicates. Sometimes two records were thought to match; sometimes it was three or more. In some cases the entry had a full name, but the name was John Smith or something else equally common.

I want to be able to step through all the records that appear to match and resolve them once and for all. This means that when I find two or more records that are not actually duplicates, and that I want to keep as separate records, I would like some way to indicate this.

It could be that the indicator is advisory, so that the tool still shows the matches but I can skip all of the flagged ones by pushing one button. Or it could be that the tool gives me a way to 'skip' and leave a marker, so that the two records that matched will never match in the future.

I do not like that I see the same duplicates each time and have to skip over them one at a time, or that I need to manually edit the records enough to fool the system (fool is a relative term here, as the system is showing a false match).

John Smith @ XYZ company with a unique email address should not match
John Smith @ ABC company with a different unique email address.

Or John Smith with a phone number might not be the same person as John Smith with an email address, and only I will know this. In those cases I want to seed the tool, or otherwise help it, so that it skips the pair until told otherwise.
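
To make the request concrete, here is a rough sketch in Python of the "leave a marker" idea. It is not how any particular tool works. It assumes each record has a stable numeric `id`, that the matcher compares names only, and that the reviewed pairs live in a small JSON file; the function names, the field names and the file format are all made up for illustration.

```python
import json
from itertools import combinations
from pathlib import Path


def pair_key(id_a, id_b):
    """Order-independent key, so (1, 2) and (2, 1) are the same pair."""
    return tuple(sorted((id_a, id_b)))


def normalise(name):
    """Lower-case and trim a name; missing names become an empty string."""
    return (name or "").strip().lower()


def candidate_pairs(records):
    """Deliberately naive matcher (an assumption): same non-empty name."""
    for a, b in combinations(records, 2):
        name = normalise(a.get("name"))
        if name and name == normalise(b.get("name")):
            yield pair_key(a["id"], b["id"])


def load_exclusions(path):
    """Read the pairs previously marked 'not a duplicate', if any."""
    p = Path(path)
    if not p.exists():
        return set()
    return {tuple(pair) for pair in json.loads(p.read_text())}


def save_exclusions(path, exclusions):
    """Persist the marked pairs so they survive the next run."""
    Path(path).write_text(json.dumps(sorted(exclusions)))


def pairs_to_review(records, exclusions):
    """Candidate matches minus the pairs already resolved by hand."""
    return [p for p in candidate_pairs(records) if p not in exclusions]


records = [
    {"id": 1, "name": "John Smith", "company": "XYZ", "email": "js@xyz.example"},
    {"id": 2, "name": "John Smith", "company": "ABC", "email": "john@abc.example"},
    {"id": 3, "name": "John Smith", "phone": "555-0100"},
    {"id": 4, "name": "", "email": "info@abc.example"},
]

# Mark the XYZ and ABC John Smiths as different people.
exclusions = {pair_key(1, 2)}
remaining = pairs_to_review(records, exclusions)
```

With the pair (1, 2) marked, only the pairs involving record 3 are still offered for review. Removing a pair from the saved file would make it show up again, which covers the "until told otherwise" part.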