Question regarding the use of partial or blinded SSNs
Question
In our application we will not have full Social Security numbers; we'll have at most the last 4 or 5 digits. I know the current algorithm relies heavily on the SSN. If there is no SSN, how will the match be affected? What do you recommend?
Answer
Great question. I'd suggest zero-filling the SSN as part of data normalization before it comes into MM. You could do this in the Mirth Channel as messages arrive, or, for imports, export the data already zero-filled.
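A minimal sketch of the zero-fill step, shown in Python for illustration only; in practice the same logic would live in the Mirth Channel transformer or in the export job. The helper name zero_fill_ssn is hypothetical and is not part of MM or Mirth.

```python
import re
from typing import Optional

def zero_fill_ssn(raw: Optional[str]) -> Optional[str]:
    """Left-pad a partial SSN (e.g. the last 4 or 5 digits) with zeros to 9 digits.

    Hypothetical normalization helper for illustration; not an MM or Mirth API.
    Returns None when no digits are present.
    """
    if raw is None:
        return None
    digits = re.sub(r"\D", "", raw)  # drop dashes, spaces and other separators
    if not digits:
        return None
    if len(digits) > 9:
        raise ValueError(f"too many digits for an SSN: {raw!r}")
    return digits.zfill(9)  # e.g. "1234" -> "000001234"
```

For example, zero_fill_ssn("1234") returns "000001234" and zero_fill_ssn("5-6789") returns "000056789". The leading zeros are what make the value recognizably invalid downstream.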
As far as the algorithm goes, a zero-filled value is an invalid SSN (because of the leading zeroes), so the algorithm will not use it for comparison purposes. It will instead rely heavily on LastName, FirstName, Middle Initial, DOB and Gender for matching. This shouldn't be a major problem: in aggregate across different feed types (ADT/ORU), SSN is typically available in only about 20-30% of cases at best, so the algorithm already has to work effectively without it.
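The exact validation rules MM applies aren't spelled out here, so the sketch below assumes a structural check modeled on the SSA rules for numbers that are never issued (area 000, 666 or 900-999; group 00; serial 0000). Under that assumption, a zero-filled partial such as 000001234 fails on its 000 area number and drops out of the compared fields, leaving only the demographic ones. The function and field names are hypothetical.

```python
from typing import List, Optional

# Hypothetical field names; the real MM trait names may differ.
DEMOGRAPHIC_FIELDS = ["last_name", "first_name", "middle_initial", "dob", "gender"]

def is_valid_ssn(ssn: Optional[str]) -> bool:
    """Structural check modeled on SSA rules for numbers that are never issued."""
    if ssn is None or len(ssn) != 9 or not ssn.isdigit():
        return False
    area, group, serial = ssn[:3], ssn[3:5], ssn[5:]
    if area in ("000", "666") or area.startswith("9"):
        return False
    return group != "00" and serial != "0000"

def normalize_ssn(ssn: Optional[str]) -> Optional[str]:
    """Analogous in spirit to SSN_NORMAL: the SSN if usable, else None."""
    return ssn if is_valid_ssn(ssn) else None

def fields_to_compare(a: dict, b: dict) -> List[str]:
    """SSN is compared only when both sides normalize to a valid value."""
    fields = list(DEMOGRAPHIC_FIELDS)
    if normalize_ssn(a.get("ssn")) and normalize_ssn(b.get("ssn")):
        fields.append("ssn")
    return fields
```

With two zero-filled partials, fields_to_compare returns only the five demographic fields; with two valid full SSNs it appends "ssn".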
That said, if your data almost always has this field, we may want to modify the algorithm to account for it. All other things considered (last name, first name, etc.), even a partial SSN can serve as a tie-breaker, or as a negative offset to an otherwise high score based on name information: a non-matching partial SSN should lower the score, whereas a matching partial SSN might add some credit, though less than a full SSN match would.
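One way to express the tie-breaker and negative-offset idea, as a sketch only: the weights below are hypothetical placeholders that would need tuning against MM's actual scoring scale, and the functions are not part of the existing algorithm. The asymmetry (small bonus, larger penalty) reflects the point that a partial match is weak evidence, while a partial mismatch is strong evidence against.

```python
from typing import Optional

# Hypothetical weights; real values would need tuning against MM's score scale.
PARTIAL_MATCH_BONUS = 5.0        # deliberately less than a full-SSN match would earn
PARTIAL_MISMATCH_PENALTY = 15.0  # a mismatch is stronger evidence than a match

def partial_ssn_adjustment(a: Optional[str], b: Optional[str], digits: int = 4) -> float:
    """Score delta from comparing the trailing `digits` of two digit-only SSN strings."""
    if not a or not b or len(a) < digits or len(b) < digits:
        return 0.0  # missing on either side: neither reward nor penalize
    tail_a, tail_b = a[-digits:], b[-digits:]
    if tail_a == "0" * digits or tail_b == "0" * digits:
        return 0.0  # all-zero tail carries no information (serial 0000 is never issued)
    return PARTIAL_MATCH_BONUS if tail_a == tail_b else -PARTIAL_MISMATCH_PENALTY

def adjusted_score(name_dob_score: float, ssn_a: Optional[str], ssn_b: Optional[str]) -> float:
    """Apply the partial-SSN offset on top of an existing name/DOB-based score."""
    return name_dob_score + partial_ssn_adjustment(ssn_a, ssn_b)
```

For instance, two candidates that both score 80 on name and DOB would separate into 85 (matching last 4) and 65 (mismatching last 4), while a record with no partial SSN stays at 80.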
Another aspect of this issue concerns the blocking phase of the match, where we search the gross population for candidates to consider. As written, the algorithm will not block on, or search for, candidates with similar SSNs as it currently does, because SSN_NORMAL (the normalized or validated SSN) is null. We could improve the algorithm for your scenario by adding a new blocking trait on, say, LAST_4_SSN, which would hopefully bring more candidates into consideration.
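A rough sketch of what a LAST_4_SSN blocking trait could look like, assuming blocking works by looking up candidates in per-trait indexes and taking the union. The name-based index here is a hypothetical stand-in for whatever blocking MM already does; none of these functions are MM APIs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set

def last4(ssn: Optional[str]) -> Optional[str]:
    """Return the trailing 4 digits of a full, partial or zero-filled SSN, or None."""
    if not ssn:
        return None
    digits = "".join(ch for ch in ssn if ch.isdigit())
    key = digits[-4:]
    # Fewer than 4 digits, or a serial of 0000 (never issued), is not a usable key.
    if len(key) < 4 or key == "0000":
        return None
    return key

def build_last4_index(records: Iterable[dict]) -> Dict[str, Set[int]]:
    """Map each LAST_4_SSN value to the ids of the records that carry it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for rec in records:
        key = last4(rec.get("ssn"))
        if key:
            index[key].add(rec["id"])
    return dict(index)

def block_candidates(incoming: dict,
                     name_index: Dict[str, Set[int]],
                     last4_index: Dict[str, Set[int]]) -> Set[int]:
    """Union of a (hypothetical) existing name-based block and the LAST_4_SSN block."""
    candidates = set(name_index.get(incoming["last_name"].upper(), set()))
    key = last4(incoming.get("ssn"))
    if key:
        candidates |= last4_index.get(key, set())
    return candidates
```

Because a 4-digit key is far less selective than a full SSN, this block can pull in many unrelated records; the scoring phase, not blocking, is what has to reject them.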
Just some thoughts on your question. In short: it should work fine as is, though it may miss some possible matches for lack of an SSN. We could create for JH a version of the existing algorithm that attempts to take advantage of the data you do have. That latter work might be worthwhile if your data contains this field the vast majority of the time, and it might take a programmer roughly a week.


