Project Structure
A math formula is generally represented in either LaTeX or MathML format. Currently, we support only the `Presentation MathML` format.
Example: x = 2 + b / c in Presentation MathML format:
<math> <mi> x </mi> <mo> = </mo> <mn> 2 </mn> <mo> + </mo> <mfrac> <mi> b </mi> <mi> c </mi> </mfrac> </math>
We parse the math equation and build a symbol layout tree (SLT) from it. The symbol layout tree is a visual representation of the MathML expression: each node is a symbol of the equation, and each edge connecting two symbols is labelled with the spatial relationship between them, such as above, below, next (adjacent) or within.
[Figure: symbol layout tree of the above equation]
Symbol pair tuples are generated from the layout tree by taking combinations of symbol pairs that lie within a certain path distance of each other. Symbol pair tuple format: [S1, S2, path of spatial relations]. For example, [V!xO!=N], where `V!x` is the variable x, `O!=` is the operator =, and N stands for next.
==Key points about implementation in Xapian==
- Because math terms (symbol pair tuples) are structured differently from terms generated from free text, we can't use the existing `TermGenerator` class. We decided to add a new API class, `MathTermGenerator`, to handle equations in MathML format.
- I planned to store the tree structure in a `std::vector`; keeping all nodes in one contiguous buffer avoids frequent heap allocations and hence gives better performance. I use the equation size as a heuristic to estimate the size of the tree structure and the number of symbol pair tuples. These estimates are used to preallocate capacity for the `std::vector`s to avoid frequent reallocations. Once the symbol pair tuples have been generated from the layout tree, the memory for the tree is released.
Attachments (3)
- System_diagram.html (3.3 KB)
- system_diagram.png (47.6 KB)
- slt.png (9.3 KB)