Context Navigation

Ticket #225: spelling-formulae.rst

File spelling-formulae.rst, 1.0 KB (added by Olly Betts, 15 years ago)
Simple mathematical model of spelling errors

Line
1	.. -- coding: UTF-8 --
2
3	Starting from Bayes' Theorem, assuming X and A are independent::
4
5	p(X\|A)·p(A) = p(X∩A) = p(A∩X) = p(A\|X)·p(X)
6
7	If X is "the intended word was X" and A is "the typed word was A", let m be the
8	number of edits required to get from X to A, and assume each edit is equally
9	likely with probability p, then we want to pick the most likely X which the
10	user intended, given they actually typed A - i.e. we want to::
11
12	find X to maximise p(X\|A) = p(A\|X)·p(X)/p(A)
13	⇔ maximise p(A\|X)·p(X)
14	⇔ maximise p(A\|X)·freq(X)
15	⇔ maximise pᵐ·freq(X)
16	⇔ maximise log(p).m + log(freq(X))
17
18	Since p < 1, log(p) is negative, so we want to::
19
20	minimise m + log(freq(X))/log(p)
21
22	It's likely X and A aren't actually independent, and that not all edits have
23	equal probability, but this does at least give us a formula which instinctively
24	feels plausible and which has some theoretical justification. We should also
25	be able to optimise it fairly well.
26
27	We do need an estimate of the p though.