VBcoders Guest



Don't have an account yet? Register
 


Forgot Password?



Jaro-Winkler String Comparison

by Ernanie F. Gregorio Jr. (3 Submissions)
Category: String Manipulation
Difficulty: Beginner
Date Added: Wed 3rd February 2021
Rating: (2 Votes)

I think this is the first Jaro-Winkler Algorithm here on VBC.
Description:
The Jaro–Winkler distance (Winkler, 1990) is a measure of similarity between two strings. It is a variant of the Jaro distance metric (Jaro, 1989, 1995) and mainly used in the area of record linkage (duplicate detection). The higher the Jaro–Winkler distance for two strings is, the more similar the strings are. The Jaro–Winkler distance metric is designed and best suited for short strings such as person names. The score is normalized such that 0 equates to no similarity and 1 is an exact match.

References:
https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance
lingpipe

https://lingpipe-blog.com/2006/12/13/code-spelunking-jaro-winkler-string-comparison/

Rate Jaro-Winkler String Comparison


Public Function JaroWrinkler(ByVal prmKeyword As String, prmCompareTo As String) As Double
  Dim iProximity As Integer ' set the number of adjacent characters to compare to
  Dim i As Integer
  Dim x As Integer
  Dim iFrom As Integer
  Dim iTo As Integer
  Dim iMatchCharacters As Integer
  Dim iTransposeCount As Integer
  Dim iJaro As Double
  
  prmCompareTo = UCase$(Trim$(prmCompareTo))
  prmKeyword = UCase$(Trim$(prmKeyword))
  
  If prmCompareTo <> prmKeyword Then ' check if the two words are the same
    If InStr(1, prmCompareTo, prmKeyword) <= 0 Then
    
      ' compute for the proximity of character checking
      ' allows matching characters to be up to X number of characters away.
      If Len(prmCompareTo) >= Len(prmKeyword) Then
        iProximity = (Len(prmCompareTo) / 2) - 1
      Else
        iProximity = (Len(prmKeyword) / 2) - 1
      End If
      
      
      For i = 1 To Len(prmKeyword)
        ' this is the index of the character to be compared to
        iTo = (i + iProximity) - 1
        
        ' get the left most side character based on the iProximity
        If i <= iProximity Then
          iFrom = 1
        Else
          iFrom = i - iProximity + 1
        End If
        
        ' start the letter by letter comparison
        For x = iFrom To iTo
          If Mid$(prmKeyword, i, 1) = Mid$(prmCompareTo, x, 1) Then
            If i = x Then
              iMatchCharacters = iMatchCharacters + 1
              GoTo exitfor
            End If
            iMatchCharacters = iMatchCharacters + 1
            iTransposeCount = iTransposeCount + 1
            Exit For
          End If
        Next
exitfor:
      Next
      
      
      iTransposeCount = iTransposeCount \ 2
      If iMatchCharacters > 0 Then
        x = 0
        For i = 1 To 4
          If Mid$(prmKeyword, i, 1) = Mid$(prmCompareTo, i, 1) Then
            x = x + 1
          Else
            Exit For
          End If
        Next
      
        iJaro = ((iMatchCharacters / Len(prmKeyword)) + _
              (iMatchCharacters / Len(prmCompareTo)) + _
              ((iMatchCharacters - iTransposeCount) / iMatchCharacters)) / 3
              
        If x > 0 Then
          JaroWrinkler = iJaro + 0.1 * x * (1 - iJaro)
        Else
          JaroWrinkler = iJaro
        End If
      Else
        JaroWrinkler = 0
      End If
    Else ' return 1 result if the keyword is within the search string
      JaroWrinkler = 1
    End If
  Else ' return a 1 result if the string are the same
    JaroWrinkler = 1
  End If
End Function

Download this snippet    Add to My Saved Code

Jaro-Winkler String Comparison Comments

No comments have been posted about Jaro-Winkler String Comparison. Why not be the first to post a comment about Jaro-Winkler String Comparison.

Post your comment

Subject:
Message:
0/1000 characters