Astaire Avenue, Garland Drive, Lamarr Avenue, Skelton Circle, and Hepburn Circle are real street names in Culver City, CA, and they are just as prone to spelling errors as person names. In fact, street, city, state/province, and building names frequently contain personal names. Identification also often involves more than a name alone: it relies on an entire set of identity attributes, such as address and date of birth.
Extending Rosette to match “embedded names” in addresses was the logical next step. Rosette now applies algorithmic smarts to postal addresses the same way it does for personal names in the
How Address Matching Works
Rosette accepts either fielded addresses or unstructured address strings, which it parses into address fields. It then applies the algorithm appropriate to each type of field. For alphanumeric fields such as postal code or street number, Rosette uses edit distance, which counts the character-level insertions, deletions, and substitutions needed to turn one value into the other.
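To make the idea concrete, here is a minimal sketch of the classic Levenshtein edit distance applied to alphanumeric fields. This is not Rosette's code, and the normalization to a 0.0 to 1.0 similarity score (dividing by the longer string's length) is an assumption chosen for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    # prev[j] holds the distance between the current prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution (or match)
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical strings, lower as edits grow."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


print(edit_distance("02155", "02154"))   # postal codes differing in one digit
print(edit_similarity("100", "1000"))    # street numbers with an extra digit
```

Here "02155" and "02154" are one substitution apart, and "100" vs. "1000" scores 0.75 under this normalization.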
Rosette’s specialized name matching algorithms compare text fields such as “street name,” “house” (that is, building name), “city,” “province/state,” and “country.” Within each text field, Rosette accounts for the following differences:
| Variation | Example |
|---|---|
| Phonetics and spelling differences | 100 Montvale Ave vs. 100 Montvail Av |
| Missing address field components | 100 Montvale Ave vs. 100 Montvale |
| Differences in upper and lowercase | 100 Montvale Ave vs. 100 MONTVALE AVE |
| Reordered address components within a field | 100 Montvale Ave. vs. 100 Avenue Montvale |
| Address field abbreviations | Montvale St. vs. Montvale Street |
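Several of the variations in the table can be absorbed by normalizing a field before comparing it. The sketch below is a simplified illustration, not Rosette's algorithm: it lowercases text, strips punctuation, maps words through a tiny hypothetical override table, drops a hypothetical stop-word list, and compares fields as unordered bags of tokens using a Dice coefficient. Phonetic and spelling variants ("Montvale" vs. "Montvail") would need a per-token fuzzy comparison that this sketch leaves out.

```python
import re
from collections import Counter

# Hypothetical override table mapping address words to one canonical form,
# in the spirit of the override files Rosette uses.
OVERRIDES = {"avenue": "ave", "av": "ave", "street": "st"}
# Hypothetical stop-word list.
STOP_WORDS = {"the"}


def normalize_tokens(field: str) -> Counter:
    """Lowercase, strip punctuation, apply overrides, drop stop words,
    and return an order-insensitive bag of tokens."""
    tokens = re.findall(r"[a-z0-9]+", field.lower())
    return Counter(OVERRIDES.get(t, t) for t in tokens if t not in STOP_WORDS)


def token_overlap(a: str, b: str) -> float:
    """Dice coefficient over token bags: 1.0 means the same tokens."""
    ta, tb = normalize_tokens(a), normalize_tokens(b)
    total = sum(ta.values()) + sum(tb.values())
    if total == 0:
        return 1.0
    return 2 * sum((ta & tb).values()) / total


print(token_overlap("100 Montvale Ave", "100 MONTVALE AVE"))     # case
print(token_overlap("100 Montvale Ave.", "100 Avenue Montvale"))  # reordering
print(token_overlap("Montvale St.", "Montvale Street"))           # abbreviation
print(token_overlap("100 Montvale Ave", "100 Montvale"))          # missing component
```

The first three pairs normalize to identical token bags and score 1.0; the missing "Ave" lowers the last pair to 0.8 rather than failing the match outright.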
When comparing two addresses, Rosette matches every field of one address against every field of the other to find the best match.
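A minimal sketch of that all-pairs search follows. Python's `difflib.SequenceMatcher` ratio stands in for Rosette's field-specific algorithms, and the field names and dictionary layout are assumptions for illustration.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """Stand-in text similarity (0.0 to 1.0) from Python's difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_field_matches(addr1: dict, addr2: dict) -> dict:
    """Compare every field of addr1 with every field of addr2 and keep,
    for each addr1 field, the addr2 field with the highest score."""
    best = {}
    for f1, v1 in addr1.items():
        scored = [(field_similarity(v1, v2), f2) for f2, v2 in addr2.items()]
        if scored:
            score, f2 = max(scored)
            best[f1] = (f2, round(score, 3))
    return best


a1 = {"road": "Astaire Avenue", "city": "Culver City", "state": "CA"}
a2 = {"road": "Astair Ave", "cityDistrict": "Culver City", "state": "CA"}
print(best_field_matches(a1, a2))
```

Because the search is exhaustive, "Culver City" still finds its counterpart even though the second address stores it under cityDistrict instead of city.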
Increased Accuracy with Address Field Groups
When Rosette parses unstructured addresses, some data may land in the wrong field (misfielding). To reduce the impact of misfielding, Rosette groups related fields, such as “state, stateDistrict, island” or “city, cityDistrict, suburb,” so that a match between two fields in the same group is penalized only lightly.
For example, if the parser assigns the cityDistrict “Williamsburg” to the city field instead, matching it against “Williamsburg” in another address's cityDistrict field incurs only a small penalty, because city and cityDistrict belong to the same address field group.
On the other hand, if the house field value “Hawaii Paradise Apartments” matches “Hawaii” in the state field of a different address, Rosette assesses a large penalty, because house and state do not belong to the same address field group.
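The group logic can be sketched as a multiplier on a raw value score. The groups below come from the examples above, but the penalty values are invented for illustration; Rosette's actual penalties are not given in this post.

```python
# Field groups taken from the examples in the text.
FIELD_GROUPS = [
    {"state", "stateDistrict", "island"},
    {"city", "cityDistrict", "suburb"},
]

SMALL_PENALTY = 0.9  # hypothetical multiplier for fields in the same group
LARGE_PENALTY = 0.3  # hypothetical multiplier for unrelated fields


def same_group(f1: str, f2: str) -> bool:
    """True if both field names belong to one address field group."""
    return any(f1 in group and f2 in group for group in FIELD_GROUPS)


def cross_field_score(f1: str, f2: str, value_score: float) -> float:
    """Scale a raw value-match score by how plausible the field pairing is."""
    if f1 == f2:
        return value_score
    if same_group(f1, f2):
        return value_score * SMALL_PENALTY
    return value_score * LARGE_PENALTY


print(cross_field_score("city", "cityDistrict", 1.0))  # "Williamsburg" misfielded
print(cross_field_score("house", "state", 1.0))        # unrelated fields
```

With these assumed values, the misfielded "Williamsburg" keeps most of its score, while a house-to-state match keeps little.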
Locale- and Language-specific Support
Address matching in Rosette currently supports the U.S., Canadian, and UK locales. Locale support means:
- Postal code structure and geographic mapping are understood, so even an irregularly formatted postal code is still recognized. For example, Canadian postal codes follow the pattern “A1A 1A1,” where A is a letter and 1 is a digit, with a space after the third character. If the space were missing or in another position, Rosette would still recognize the Canadian postal code pattern.
- Common address abbreviations for supported locales are handled through override files, which map common address words to their abbreviations.
For example, “Pennsylvania” maps to “PA”; “Street” maps to “St.”; “Calle” maps to “CLL.” Spanish street designations like “Calle” are common in California and other parts of the U.S.
- Stop words, such as “the” in “the United States,” are removed.
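Recognizing an irregularly spaced Canadian postal code can be sketched as "remove the separators, then check the pattern." This is an illustrative assumption about the approach, not Rosette's implementation, and it does not check which letters Canada Post actually permits in each position.

```python
import re
from typing import Optional

# Canadian pattern "A1A 1A1": letter-digit-letter, then digit-letter-digit.
_CA_POSTAL = re.compile(r"([A-Z]\d[A-Z])(\d[A-Z]\d)")


def normalize_ca_postal(raw: str) -> Optional[str]:
    """Return the code as 'A1A 1A1' if raw fits the pattern once case,
    spaces, and hyphens are ignored; otherwise return None."""
    compact = re.sub(r"[\s-]", "", raw.upper())
    m = _CA_POSTAL.fullmatch(compact)
    if m is None:
        return None
    return f"{m.group(1)} {m.group(2)}"


for raw in ["K1A 0B1", "k1a0b1", "K1A0 B1", "12345"]:
    print(repr(raw), "->", normalize_ca_postal(raw))
```

The missing and misplaced spaces all normalize to "K1A 0B1", so the two codes can then be compared exactly or by edit distance.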
Rosette also offers language-only support for Chinese, matching either two addresses written in Chinese script (Hanzi) or one address in Latin script against one in Chinese script. While Rosette provides Chinese stop words and basic overrides for common Chinese address abbreviations, as of version 7.36.0 it is not yet customized to a particular Chinese-speaking locale or country.
How Fuzzy Date Matching Works
Rosette’s fuzzy date matching complements address and name matching. For Gregorian calendar dates, it can compare partial dates and dates whose components are misordered (DDMMYY vs. MMDDYY). In particular, the matching engine considers several aspects of dates:
- Time: The number of days between Date 1 and Date 2
- Year: The difference of the year fields of Date 1 and Date 2
- Month: The difference of the month fields of Date 1 and Date 2
- Day: The difference of the day fields of Date 1 and Date 2 (even if they are close in time, 1 and 30 are considered far apart)
- String distance: Date 1 and Date 2 are converted to a standard format, and a string distance score is calculated from the edit distance between the two strings.
- Time proximity: Based on a given interval of years, Rosette computes the chronological distance between dates in years to determine similarity.
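The aspects above can be sketched as raw differences between two complete dates, plus a check for transposed day and month. How Rosette turns these into a score is not described here, so this sketch stops at the raw values. For brevity, the string distance uses a count of mismatched characters between equal-length ISO strings (a Hamming distance), a plainly named simplification of full edit distance, and the 365.25-day year in the time proximity check is an assumption.

```python
from datetime import date


def date_aspects(d1: date, d2: date) -> dict:
    """Raw per-aspect differences between two complete Gregorian dates."""
    s1, s2 = d1.isoformat(), d2.isoformat()  # standard format: YYYY-MM-DD
    return {
        "days_apart": abs((d1 - d2).days),       # time
        "year_diff": abs(d1.year - d2.year),     # year field
        "month_diff": abs(d1.month - d2.month),  # month field
        "day_diff": abs(d1.day - d2.day),        # day field: 1 vs. 30 is far apart
        # Simplification: mismatched characters in equal-length ISO strings.
        "char_mismatches": sum(c1 != c2 for c1, c2 in zip(s1, s2)),
    }


def day_month_swapped(d1: date, d2: date) -> bool:
    """True if d2 looks like d1 with day and month transposed (DDMM vs. MMDD)."""
    return (d1 != d2 and d1.year == d2.year
            and d1.day == d2.month and d1.month == d2.day)


def within_years(d1: date, d2: date, interval_years: float) -> bool:
    """Time proximity: are the dates no more than interval_years apart?"""
    return abs((d1 - d2).days) / 365.25 <= interval_years


print(date_aspects(date(1980, 3, 1), date(1980, 2, 29)))
print(day_month_swapped(date(1985, 4, 7), date(1985, 7, 4)))
```

March 1 and February 29, 1980 are only one day apart, yet their day fields differ by 28, which is why the day aspect and the time aspect are tracked separately.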
Date matching is currently available through the Rosette Enterprise SDK or its Elasticsearch plugin.
Who Needs Fuzzy Record Matching
Names, addresses, and dates are critical data points to check when matching records in many domains.
Know Your Business for Financial Compliance
Financial institutions are required by Know Your Customer regulations to avoid transacting business with known bad actors. Often these customers are other businesses. Suppose a new business customer requests a line of credit from a bank. Before approving it, the bank asks for information such as the customer’s places of business and the names of its executive board members. The bank compares the provided information against business directory listings to verify that the customer is who it says it is. For example, if a business is applying from the Cayman Islands, but the directory listing shows no offices there, that might be a red flag. Similarly, if the name of the executive applying for credit isn’t listed in the directory, the bank will weigh that discrepancy in its risk calculation of whether to take on the new client.
Assigning Unique IDs
In any system where privacy rules restrict the use of Social Security numbers as IDs, such as education or health care, each person might be assigned a unique ID. These records include person names, nicknames, dates of birth, and current and previous addresses. Using Rosette’s Elasticsearch plugin, the various record fields can be weighted differently depending on which data are known to be most dependable, so that an overall match score takes into account the separate fuzzy matching scores for the name, address, and date fields. Being able to positively identify matching and non-matching records prevents the creation of duplicate records, which are costly to ferret out and correct.
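Combining per-field scores with weights can be illustrated as a weighted average. This is not the Elasticsearch plugin's configuration syntax or scoring formula, just a generic sketch; the field names, weights, and scores below are hypothetical.

```python
def overall_score(field_scores: dict, weights: dict) -> float:
    """Weighted average of per-field fuzzy match scores (each 0.0 to 1.0).

    Fields with no positive weight are ignored.
    """
    fields = [f for f in field_scores if weights.get(f, 0) > 0]
    total_weight = sum(weights[f] for f in fields)
    if total_weight == 0:
        return 0.0
    return sum(field_scores[f] * weights[f] for f in fields) / total_weight


# Hypothetical weights: date of birth trusted most, address least
# (an assumption: addresses change when people move).
weights = {"name": 2.0, "dob": 3.0, "address": 1.0}
scores = {"name": 0.9, "dob": 1.0, "address": 0.6}
print(round(overall_score(scores, weights), 3))  # 0.9
```

A weak address match (0.6) pulls the total down only slightly here, because the more dependable date-of-birth field carries three times its weight.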
Try address or name matching at match.rosette.com/compare. Simply click on “Address” in the navigation bar across the top to try address matching.