Simple Mispelling Problem — GNUcode.me

Simple Mispelling Problem

by Joshua Branson — November 22, 2022

Edit: Yes I am aware that I misspelled "mispelling". I figure it's funny if I leave it as it is. :)

I had this simple coding problem that I wanted to solve. Here's the problem:

Suppose you are writing an guix service (like I happen to be), and you are sanitizing user input for various records. Suppose your user mispells an option. Wouldn't it be nice to include a nice helpful hint on what he probably did wrong?

(opensmtpd-option (option "forany"))

error: (option "forany") is invalid. hint: Try "for rcpt-to", "for domain", "for local", "for any", or "for".

Using string-prefix-length-ci, I was able to construct a fairly naive prococedure that tries to guess what the user meant to type. Here's what I came up with:

;; if strings is (list "auth" "for any" "from local")
;; Then this will return "Try \"auth\", \"for any\", or \"from local\"."
(define (try-string strings)
  (string-append "Try "
                 (let loop ((strings strings))
                   (cond ((= 1 (length strings))
                          (string-append
                           "or \"" (car strings) "\".\n"))
                         (else
                          (string-append
                           "\"" (car strings) "\", "
                           (loop (cdr strings))))))))

;; suppose string is "for anys"
;; and strings is (list "for any" "for local" "for domain")
;; then hint-string will return "Did you mean "for any"?"
(define* (hint-string string strings
                      #:key (fieldname #f))
  (if (not (string? string))
      (try-string strings)
      (let loop ((current-max 1)
                 (loop-strings strings)
                 (hint-strings '()))
        (if (null? loop-strings)
            (cond ((= 1 (length hint-strings)) ;; only one worthwhile match
                   (if fieldname
                       (string-append "Did you mean (" fieldname " \""
                                      (car hint-strings) "\") ?\n")
                       (string-append "Did you mean \"" (car hint-strings)
                                      "\"?\n")))
                  (else (if (null? hint-strings)
                            (try-string strings)
                            (try-string hint-strings))))
            (let* ((element-string (car loop-strings))
                   (element-max
                    (string-prefix-length-ci element-string string)))
              (cond ((> element-max current-max)
                     (loop element-max (cdr loop-strings)
                           (list element-string)))
                    ((= element-max current-max)
                     (loop current-max (cdr loop-strings)
                           (cons element-string hint-strings)))
                    (else (loop current-max
                                (cdr loop-strings) hint-strings))))))))

It won't recognize that "or any" or "bor any" should match "for any", but for most mispellings, it should be half decent, provided the user got the first character right.

What do you all think? How would you write such a procedure?

EDIT: Well it turns out that the guix developers actually have a (string-closest) procedure. The relevant code can be found in (guix utils) and (guix combinators):

(define fold2
  (case-lambda
    ((proc seed1 seed2 lst)
     "Like `fold', but with a single list and two seeds."
     (let loop ((result1 seed1)
                (result2 seed2)
                (lst     lst))
       (if (null? lst)
           (values result1 result2)
           (call-with-values
               (lambda () (proc (car lst) result1 result2))
             (lambda (result1 result2)
               (loop result1 result2 (cdr lst)))))))
    ((proc seed1 seed2 lst1 lst2)
     "Like `fold', but with two lists and two seeds."
     (let loop ((result1 seed1)
                (result2 seed2)
                (lst1    lst1)
                (lst2    lst2))
       (if (or (null? lst1) (null? lst2))
           (values result1 result2)
           (call-with-values
               (lambda () (proc (car lst1) (car lst2) result1 result2))
             (lambda (result1 result2)
               (loop result1 result2 (cdr lst1) (cdr lst2)))))))))

(define (string-distance s1 s2)
  "Compute the Levenshtein distance between two strings."
  ;; Naive implemenation
  (define loop
    (mlambda (as bt)
      (match as
        (() (length bt))
        ((a s ...)
         (match bt
           (() (length as))
           ((b t ...)
            (if (char=? a b)
                (loop s t)
                (1+ (min
                     (loop as t)
                     (loop s bt)
                     (loop s t))))))))))

  (let ((c1 (string->list s1))
        (c2 (string->list s2)))
    (loop c1 c2)))

(define* (string-closest trial tests #:key (threshold 3))
  "Return the string from TESTS that is the closest from the TRIAL,
according to 'string-distance'.  If the TESTS are too far from TRIAL,
according to THRESHOLD, then #f is returned."
  (identity                              ;discard second return value
    (fold2 (lambda (test closest minimal)
             (let ((dist (string-distance trial test)))
               (if (and  (< dist minimal) (< dist threshold))
                   (values test dist)
                   (values closest minimal))))
           #f +inf.0
           tests)))

A lot of the above code is a little bit above my head, but it sure looks cool.

And it actually works better than mine.:

;; old scheme code
(display (hint-string "bor any" (list "for any" "auth" "rdns")))

Try "for any", "auth", or "rdns".

;; It didn't match any string.  :(

;; Let's try guix's (string-closest _) ...

(string-closest "bor any" (list "for any" "auth" "rdns"))

$1 = "for any"

Awesome!