==Project Structure==
Math formulae are generally represented in LaTeX or MathML format. Currently, we support only the `Presentation MathML` format.
Example: x = 2 + b / c in Presentation MathML format:
<math> <mi> x </mi> <mo> = </mo> <mn> 2 </mn> <mo> + </mo> <mfrac> <mi> b </mi> <mi> c </mi> </mfrac> </math>
We parse the math equation and create a symbol layout structure, which is a visual representation of the MathML. The structure is formed by connecting the symbols of the equation with edges, where each edge represents the spatial relationship between the two symbols it connects. The spatial relation can be above, below, adjacent, within, etc.
Symbol layout structure of the above equation:
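As a rough illustration, the layout structure for x = 2 + b / c could be written down by hand as below. This is only a sketch: the `LayoutTree`, `Node`, `Edge` and `Relation` names are hypothetical, the `V!`/`O!`/`N!`/`F!` type prefixes are assumed from the tuple example on this page, and treating the fraction as a node with the numerator above and the denominator below is an assumption about how `<mfrac>` is mapped.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Spatial relation carried by an edge (hypothetical names, taken from the
// relations listed in the text: adjacent/next, above, below, within).
enum class Relation { Next, Above, Below, Within };

struct Edge {
    Relation rel;
    int child;  // index of the child node in LayoutTree::nodes
};

struct Node {
    std::string label;        // e.g. "V!x"; the type prefixes are assumed
    std::vector<Edge> edges;  // outgoing edges to spatially related symbols
};

struct LayoutTree {
    std::vector<Node> nodes;  // node 0 is the root (leftmost symbol)

    // Append a symbol and return its index.
    int add(const std::string& label) {
        nodes.push_back(Node{label, {}});
        return static_cast<int>(nodes.size()) - 1;
    }

    // Connect two existing symbols with a spatial relation.
    void link(int parent, Relation rel, int child) {
        assert(parent >= 0 && parent < static_cast<int>(nodes.size()));
        assert(child >= 0 && child < static_cast<int>(nodes.size()));
        nodes[parent].edges.push_back(Edge{rel, child});
    }
};

// Build the layout tree for x = 2 + b / c by hand (no MathML parsing here).
LayoutTree build_example() {
    LayoutTree t;
    int x = t.add("V!x");
    int eq = t.add("O!=");
    int two = t.add("N!2");
    int plus = t.add("O!+");
    int frac = t.add("F!");
    int b = t.add("V!b");
    int c = t.add("V!c");
    t.link(x, Relation::Next, eq);
    t.link(eq, Relation::Next, two);
    t.link(two, Relation::Next, plus);
    t.link(plus, Relation::Next, frac);
    t.link(frac, Relation::Above, b);  // numerator (assumed mapping)
    t.link(frac, Relation::Below, c);  // denominator (assumed mapping)
    return t;
}
```

Calling `build_example()` gives seven nodes: a chain of four "next" edges from x to the fraction, plus one "above" and one "below" edge from the fraction node.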
Symbol pair tuples are generated from the layout tree by taking combinations of symbol pairs that lie within a certain path distance of each other. Symbol pair tuple format: [S1, S2, path with spatial relation]. E.g. [V!xO!=N], where N stands for next.
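The pair generation described above might be sketched as follows. This is not the actual implementation: the `Symbol` struct and `symbol_pairs` function are hypothetical, the single-letter codes `A` (above) and `B` (below) are assumed by analogy with `N` (next), and the concatenated term encoding is inferred from the `V!xO!=N` example. For each symbol it walks up through at most `max_dist` ancestors and emits one term per ancestor/descendant pair.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One symbol of the layout tree, stored flat (hypothetical layout).
struct Symbol {
    std::string label;  // e.g. "V!x"
    int parent;         // index of the parent symbol, -1 for the root
    char rel;           // relation from the parent: 'N', 'A' or 'B' (assumed codes)
};

// Emit "S1 S2 path" terms for every ancestor/descendant pair whose path
// length is at most max_dist. Parents must be stored before their children.
std::vector<std::string>
symbol_pairs(const std::vector<Symbol>& tree, std::size_t max_dist)
{
    std::vector<std::string> out;
    for (std::size_t i = 0; i < tree.size(); ++i) {
        std::string path;  // relations from the current ancestor down to i
        std::size_t cur = i;
        for (std::size_t d = 0; d < max_dist && tree[cur].parent >= 0; ++d) {
            std::size_t up = static_cast<std::size_t>(tree[cur].parent);
            assert(up < cur);  // guards against cycles in malformed input
            path.insert(path.begin(), tree[cur].rel);
            out.push_back(tree[up].label + tree[i].label + path);
            cur = up;
        }
    }
    return out;
}

// Layout tree for x = 2 + b / c, written out by hand.
std::vector<Symbol> example_tree()
{
    return {
        {"V!x", -1, '\0'},
        {"O!=", 0, 'N'},
        {"N!2", 1, 'N'},
        {"O!+", 2, 'N'},
        {"F!", 3, 'N'},
        {"V!b", 4, 'A'},
        {"V!c", 4, 'B'},
    };
}
```

With `max_dist` set to 2, `symbol_pairs(example_tree(), 2)` yields 11 terms, starting with `V!xO!=N`, `O!=N!2N` and `V!xN!2NN`. A larger distance produces more terms per equation, so the limit trades index size against how much of the formula structure each term captures.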
==Key points about implementation in Xapian==
- The math term structure (symbol pair tuple) is different from terms generated from free text, so we can't use the existing TermGenerator class. We decided to add a new API class, MathTermGenerator, to handle equations in MathML format.
- I planned to store the tree structure in a std::vector, which avoids frequent heap allocations and hence gives better performance. I use the equation size as a heuristic to estimate the tree structure size and the number of symbol pair tuples. These estimates are used to preallocate capacity for each std::vector and so avoid frequent reallocations. Once the symbol pair tuples have been generated from the layout tree, the memory for the tree is released.
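The preallocate-then-release pattern from the second point could look roughly like this. It is a sketch under stated assumptions, not the MathTermGenerator code: `estimate_nodes`, `estimate_pairs` and `terms_for_chain` are hypothetical, the divisor of 8 bytes per symbol is a made-up constant, and a simple left-to-right chain of symbols stands in for the real MathML parser.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Node {
    std::string label;  // e.g. "V!x"
    int parent;         // index of the parent node, -1 for the root
    char rel;           // relation from the parent, 'N' for next
};

// Hypothetical heuristics: guess the sizes from the equation's byte length.
// The divisor is an arbitrary illustrative constant, not a measured value.
std::size_t estimate_nodes(std::size_t mathml_bytes) {
    return mathml_bytes / 8 + 1;
}
std::size_t estimate_pairs(std::size_t mathml_bytes, std::size_t max_dist) {
    return estimate_nodes(mathml_bytes) * max_dist;
}

// Build a chain-shaped tree from already extracted symbol labels (a stand-in
// for parsing), generate the pair terms, then release the tree's memory.
std::vector<std::string>
terms_for_chain(const std::vector<std::string>& labels,
                std::size_t mathml_bytes, std::size_t max_dist)
{
    assert(max_dist >= 1);

    std::vector<Node> tree;
    tree.reserve(estimate_nodes(mathml_bytes));  // one allocation up front
    for (std::size_t i = 0; i < labels.size(); ++i)
        tree.push_back(Node{labels[i], static_cast<int>(i) - 1, 'N'});

    std::vector<std::string> terms;
    terms.reserve(estimate_pairs(mathml_bytes, max_dist));
    for (std::size_t i = 0; i < tree.size(); ++i) {
        std::string path;
        std::size_t cur = i;
        for (std::size_t d = 0; d < max_dist && tree[cur].parent >= 0; ++d) {
            std::size_t up = static_cast<std::size_t>(tree[cur].parent);
            path.insert(path.begin(), tree[cur].rel);
            terms.push_back(tree[up].label + tree[i].label + path);
            cur = up;
        }
    }

    // clear() alone keeps the capacity; swapping with an empty temporary
    // actually hands the tree's buffer back once the terms exist.
    std::vector<Node>().swap(tree);
    return terms;
}
```

If the estimate is too low the vectors simply grow as usual, so a wrong guess costs extra reallocations rather than correctness.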