Project Structure

A math formula is generally represented in LaTeX or MathML. Currently, we support only the `Presentation MathML` format.

Example: x = 2 + b / c in Presentation MathML format:

<math>
  <mi> x </mi>
  <mo> = </mo>
  <mn> 2 </mn>
  <mo> + </mo>
  <mfrac>
    <mi> b </mi> 
    <mi> c </mi>
  </mfrac>
</math>

We parse the math equation and create a symbol layout structure, which is a visual representation of the MathML input. The structure is formed by connecting the symbols of the equation with edges, where each edge represents the spatial relationship between the two symbols it connects. The spatial relation can be above, below, adjacent, within, etc.

Symbol layout structure of the above equation:
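As a rough sketch of how such a structure could be held in code, the example equation can be built by hand as a vector of symbols with labelled edges. Everything here is an assumption made for illustration and not the project's actual code: the names `Relation`, `Edge`, `Symbol`, `LayoutTree`, `add_symbol`, `add_edge` and `build_example_tree`, the `frac` label for the fraction, and the choice to attach the numerator with an "above" edge and the denominator with a "below" edge.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Spatial relation carried by an edge (illustrative subset only).
enum class Relation : char { Next = 'N', Above = 'A', Below = 'B' };

// An edge from one symbol to a connected symbol.
struct Edge {
    Relation rel;
    std::size_t child;  // index of the target symbol in the tree vector
};

// One symbol of the equation together with its outgoing edges.
struct Symbol {
    std::string label;
    std::vector<Edge> edges;
};

// All symbols live in one vector; index 0 is the root.
using LayoutTree = std::vector<Symbol>;

// Append a symbol and return its index.
std::size_t add_symbol(LayoutTree& tree, const std::string& label) {
    tree.push_back(Symbol{label, {}});
    return tree.size() - 1;
}

// Connect parent to child with the given spatial relation.
void add_edge(LayoutTree& tree, std::size_t parent, Relation rel,
              std::size_t child) {
    assert(parent < tree.size() && child < tree.size());
    tree[parent].edges.push_back(Edge{rel, child});
}

// Hand-built layout tree for x = 2 + b / c (assumed shape).
LayoutTree build_example_tree() {
    LayoutTree t;
    const std::size_t x = add_symbol(t, "x");
    const std::size_t eq = add_symbol(t, "=");
    const std::size_t two = add_symbol(t, "2");
    const std::size_t plus = add_symbol(t, "+");
    const std::size_t frac = add_symbol(t, "frac");
    const std::size_t b = add_symbol(t, "b");
    const std::size_t c = add_symbol(t, "c");
    add_edge(t, x, Relation::Next, eq);
    add_edge(t, eq, Relation::Next, two);
    add_edge(t, two, Relation::Next, plus);
    add_edge(t, plus, Relation::Next, frac);
    add_edge(t, frac, Relation::Above, b);  // numerator
    add_edge(t, frac, Relation::Below, c);  // denominator
    return t;
}
```

Storing children as indices rather than pointers keeps every node in one contiguous buffer, which fits the std::vector based storage described in the implementation notes.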

Symbol pair tuples are generated from the layout tree by taking combinations of symbol pairs that lie within a certain path distance of each other. Symbol pair tuple format: [S1, S2, path with spatial relations]. For example, [V!x, O!=, N], where N stands for next.
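The tuple generation step could be sketched as below. This is only one possible reading of the description: it assumes that pairs are formed between a symbol and its descendants in the layout tree, and that the path is the string of relation letters along the way. The names `Edge`, `Node`, `SymbolPair`, `collect`, `symbol_pairs`, `to_text` and `example_tree` are hypothetical, and the example tree covers only the fragment x = b so that it uses just the label prefixes shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Edge {
    char rel;           // relation letter, e.g. 'N' for next
    std::size_t child;  // index of the connected symbol
};

struct Node {
    std::string label;  // e.g. "V!x" or "O!="
    std::vector<Edge> edges;
};

struct SymbolPair {
    std::string s1, s2, path;
};

// Walk down from `cur`, emitting a tuple (start, descendant, path) for every
// descendant reachable within max_dist edges of `start`.
void collect(const std::vector<Node>& tree, std::size_t start, std::size_t cur,
             std::string& path, std::size_t max_dist,
             std::vector<SymbolPair>& out) {
    assert(start < tree.size() && cur < tree.size());
    if (path.size() == max_dist) return;  // distance limit reached
    for (const Edge& e : tree[cur].edges) {
        path.push_back(e.rel);
        out.push_back(SymbolPair{tree[start].label, tree[e.child].label, path});
        collect(tree, start, e.child, path, max_dist, out);
        path.pop_back();
    }
}

// Generate tuples for every symbol taken as the starting point.
std::vector<SymbolPair> symbol_pairs(const std::vector<Node>& tree,
                                     std::size_t max_dist) {
    std::vector<SymbolPair> out;
    std::string path;
    for (std::size_t i = 0; i < tree.size(); ++i)
        collect(tree, i, i, path, max_dist, out);
    return out;
}

// Render a tuple in the [S1, S2, path] format.
std::string to_text(const SymbolPair& p) {
    return "[" + p.s1 + ", " + p.s2 + ", " + p.path + "]";
}

// Layout tree for the fragment x = b: a chain of "next" edges.
std::vector<Node> example_tree() {
    std::vector<Node> tree(3);
    tree[0].label = "V!x";
    tree[0].edges.push_back(Edge{'N', 1});
    tree[1].label = "O!=";
    tree[1].edges.push_back(Edge{'N', 2});
    tree[2].label = "V!b";
    return tree;
}
```

With a maximum path distance of 2, `symbol_pairs(example_tree(), 2)` yields three tuples: x paired with =, x paired with b over the path NN, and = paired with b. Lowering the distance to 1 drops the NN tuple.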

==Key points about implementation in Xapian==

  • The math term structure (symbol pair tuple) is different from the terms generated from free text, so we can't use the existing TermGenerator class. We decided to add a new API class, MathTermGenerator, to handle equations in MathML format.

  • I planned to store the tree structure in a std::vector. This avoids frequent heap memory allocations and hence gives better performance. Using the equation size as a heuristic, I estimated the tree structure size and the symbol pair tuple size. These estimates are used to preallocate capacity for the std::vector to avoid frequent reallocations. Once the symbol pair tuples have been generated from the layout tree, the memory for the tree is released.
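The preallocation idea from the last point could look roughly like the sketch below. The heuristic itself is an assumption chosen for illustration, not the one used in the project: it supposes that the shortest token element, such as `<mi>x</mi>`, takes 10 bytes, and that each symbol contributes at most `max_dist` tuples. The names `Capacities`, `estimate_capacities`, `release` and `fills_without_reallocation` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Capacity estimates derived from the size of the MathML input.
struct Capacities {
    std::size_t nodes;   // estimated number of layout tree nodes
    std::size_t tuples;  // estimated number of symbol pair tuples
};

// Assumed heuristic: a token element needs at least 10 bytes of markup, so
// the input holds at most bytes / 10 symbols (plus one for rounding). If
// pairs are formed only between a symbol and its descendants, each symbol
// appears as the second element of at most max_dist tuples.
Capacities estimate_capacities(std::size_t mathml_bytes, std::size_t max_dist) {
    const std::size_t nodes = mathml_bytes / 10 + 1;
    return Capacities{nodes, nodes * max_dist};
}

// Give a vector's storage back; clear() alone keeps the capacity allocated.
template <typename T>
void release(std::vector<T>& v) {
    std::vector<T>().swap(v);
}

// Fill a preallocated vector and report whether its capacity stayed fixed,
// i.e. whether no reallocation happened while inserting.
bool fills_without_reallocation(std::size_t count) {
    std::vector<std::string> tuples;
    tuples.reserve(count);
    assert(tuples.capacity() >= count);
    const std::size_t before = tuples.capacity();
    for (std::size_t i = 0; i < count; ++i)
        tuples.push_back("tuple");
    return tuples.capacity() == before;
}
```

Swapping with an empty temporary is used in `release` because `clear()` leaves the allocation in place and `shrink_to_fit()` is only a non-binding request.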

Last modified on 09/08/18 11:09:54