wiki:GSoC2012/QueryParser/ErrorRecovery_API

Regarding the flags for the error recovery code, after some thought I arrived at the following scheme:

  • Place all the error recovery code under a SINGLE flag, say, FLAG_ERROR_RECOVERY. The justification for this flag is given later.
  • ENABLE this flag by default.
  • Add a new string, named, say, "modified_query", to the queryparser_internal. Whenever some error recovery code is run (except for BRA after text), the corresponding changes to the query are recorded in modified_query.
  • The string "modified_query" differs from "corrected_query" in the following ways:
    • corrected_query contains the spelling suggestions for the original query (if any). The original query can still be parsed (a Query object can be formed without a parse error), so the results returned are those for the original query, not for the spelling-corrected query.
    • API users can access corrected_query via the method get_corrected_query_string() and feed it to something like "Did you mean this?"
    • modified_query is the version of the query with its parse errors corrected, and the results returned are for this modified query (not for the original query), because without the modification the query gives a parse error and no Query object can be formed. By comparing modified_query with the original query, the user can see what changes (if any) were made to the original query to get it parsed.
  • Introduce a new method for API users, get_modified_query(), whose result they can feed to something like "Showing results for this." This is the same as what Google does in many cases.
  • Having a flag (FLAG_ERROR_RECOVERY) is useful for users who want to intervene in the error recovery process. Those users should not feel that the query is quietly corrected for them with no way to intervene. Disabling FLAG_ERROR_RECOVERY therefore disables the error recovery parts of the code.


  • The following pseudo code and testcases illustrate the idea described above.


/**** Pseudo Code *****/

#include <xapian.h>
#include <iostream>
#include <string>

Xapian::Stem stemmer("english");
Xapian::Database db;
Xapian::QueryParser parser;
parser.set_database(db);
parser.set_stemmer(stemmer);

// qs is the query string. No explicit flags are passed here, so the
// default flags are used. Under this proposal:
// FLAG_DEFAULT = FLAG_PHRASE | FLAG_BOOLEAN | FLAG_LOVEHATE | FLAG_ERROR_RECOVERY
Xapian::Query query = parser.parse_query(qs);

// Spelling suggestion for the original query (existing method).
// (Current Xapian only fills this in when FLAG_SPELLING_CORRECTION is also passed.)
const std::string correction = parser.get_corrected_query_string();
if (!correction.empty())
    std::cout << "Did you mean: " << correction << "\n\n";

// Query with its parse errors fixed (proposed method, does not exist yet).
const std::string modified = parser.get_modified_query();
if (!modified.empty())
    std::cout << "Showing results for: " << modified << "\n\n";

/**********************/


// Unless a change to the pseudo code is stated explicitly, the results in the
// following testcases are for the pseudo code above (i.e. using FLAG_DEFAULT).

/**
 * Testcases :
 *
 * 1. example query: a AND (b OR c      // Contains only parse error and no spelling suggestion
 *    
 *    Results:
 *    modified = "a AND (b OR c)"
 *    correction is empty.
 *    
 *    NOTE: since the default flags are used (and hence FLAG_ERROR_RECOVERY is enabled),
 *    the parser parses the modified query "a AND (b OR c)" and not the
 *    original query "a AND (b OR c".
 *    Hence we DON'T GET "parse error".
 *
 *
 *
 * 2. example query (same as 1.): a AND (b OR c  // Contains only parse error and no spelling suggestion
 *    
 *    Here we modify the pseudo code above as follows:
 *
 *        // 1 | 2 | 4 is FLAG_BOOLEAN | FLAG_PHRASE | FLAG_LOVEHATE, so passing
 *        // it leaves FLAG_ERROR_RECOVERY disabled.
 *        Xapian::Query query = parser.parse_query(qs, 1 | 2 | 4);
 *
 *    Results:
 *    modified = "a AND (b OR c)"
 *    correction is empty
 *    We GET "parse error"
 *
 *    NOTE that disabling the flag doesn't mean that the *suggestion* is disabled.
 *    It simply means that the parser will not automatically parse the modified query.
 *
 *
 *
 * 3. example query: xapain     // Contains only spelling suggestion and no parse error
 *
 *    Results:
 *    modified is empty
 *    correction = "xapian"
 *    
 *    NOTE that the parser only suggests that the user MAY have meant "xapian" rather
 *    than "xapain". The parser still parses the original query ("xapain"), not
 *    the spelling-corrected query ("xapian").
 *    Also note that since modified is empty in this case, enabling or
 *    disabling FLAG_ERROR_RECOVERY has no effect.
 *
 *
 *
 * 4. example query: xapain AND (b OR c    // Contains BOTH parse error as well as spelling suggestion
 *
 *    Results:
 *    modified = "xapain AND (b OR c)"  // No spelling suggestions contained in this.
 *    correction:  xapain -> xapian     // No error recovery changes contained in this.
 *    
 *    NOTE: since the default flags are used (and hence FLAG_ERROR_RECOVERY is enabled),
 *    the parser parses the modified query "xapain AND (b OR c)", and neither the
 *    original query "xapain AND (b OR c" nor the query "xapian AND (b OR c)".
 *    Hence we DON'T GET "parse error".
 *
 *    BUT the parser DOESN'T incorporate the spelling suggestion in the parsed query.
 *    There are two main reasons (that I thought of):
 *        
 *           a. At present we only offer the spelling suggestion and parse the
 *           original query. Keeping spelling suggestions out of "modified" is
 *           therefore compatible with the current behaviour. In the default mode
 *           we parse the modified query instead of the original query only IF
 *           modified is non-empty, i.e. only when the original query would
 *           otherwise give "parse error". In that case, correcting the query
 *           seems more reasonable than forming no Query object and just
 *           reporting a parse error.
 *
 *           b. For spellings, we CAN'T be 100% sure whether the suggestions
 *           are correct. Moreover, we CAN parse the original query without
 *           incorporating the spelling suggestion. This is not the case for parse
 *           errors: without handling them we CAN'T parse the original query,
 *           so the parser is certain to give "parse error" if we don't
 *           handle those cases.
 *
 *    So "modified" shall NOT contain any spelling suggestions and will contain only error recovery changes,
 *    whereas "correction" shall not contain any error recovery changes and will contain only spelling suggestions.
 *
 *
 *
 *    In the 4th example above, if we disable FLAG_ERROR_RECOVERY, then "modified" and
 *    "correction" remain the same. The only difference is that we get "parse error".
 * 
 */
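
The four testcases above can also be written down as a small executable model. The following is only a rough sketch and not QueryParser code: close_brackets(), suggest_spelling(), parse_sketch() and the Outcome struct are hypothetical names made up for illustration, the "recovery" handled is just a missing closing bracket, the spelling table contains a single entry, and the numeric value given to the proposed FLAG_ERROR_RECOVERY is an arbitrary assumption. The other three flag values match the 1 | 2 | 4 used in testcase 2.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// FLAG_ERROR_RECOVERY is the proposed flag; its value here is an assumption.
enum : unsigned {
    FLAG_BOOLEAN = 1,
    FLAG_PHRASE = 2,
    FLAG_LOVEHATE = 4,
    FLAG_ERROR_RECOVERY = 1u << 30,
    FLAG_DEFAULT = FLAG_PHRASE | FLAG_BOOLEAN | FLAG_LOVEHATE | FLAG_ERROR_RECOVERY
};

struct Outcome {
    std::string modified;    // what the proposed get_modified_query() would return
    std::string correction;  // what get_corrected_query_string() would return
    std::string parsed;      // the string that actually gets parsed (empty on error)
    bool parse_error;        // true if no Query object could be formed
};

// Toy error recovery: append one ')' for every '(' left unclosed.
std::string close_brackets(const std::string& qs) {
    std::size_t depth = 0;
    for (char c : qs) {
        if (c == '(') ++depth;
        else if (c == ')' && depth > 0) --depth;
    }
    return qs + std::string(depth, ')');
}

// Toy spelling suggestion: replace whitespace-separated words found in a
// one-entry table. Returns an empty string when nothing was changed.
std::string suggest_spelling(const std::string& qs) {
    static const std::map<std::string, std::string> fixes = {{"xapain", "xapian"}};
    std::istringstream in(qs);
    std::string word, out;
    bool changed = false;
    while (in >> word) {
        auto it = fixes.find(word);
        if (it != fixes.end()) { word = it->second; changed = true; }
        if (!out.empty()) out += ' ';
        out += word;
    }
    return changed ? out : std::string();
}

// Models the proposed behaviour: "modified" and "correction" are always
// reported, but the modified query is only parsed if the flag is set.
Outcome parse_sketch(const std::string& qs, unsigned flags = FLAG_DEFAULT) {
    Outcome out;
    const std::string fixed = close_brackets(qs);
    const bool broken = (fixed != qs);
    out.modified = broken ? fixed : std::string();
    out.correction = suggest_spelling(qs);
    out.parse_error = broken && !(flags & FLAG_ERROR_RECOVERY);
    if (!out.parse_error) out.parsed = broken ? fixed : qs;
    return out;
}

// The four testcases from above, as assertions.
void check_testcases() {
    Outcome r = parse_sketch("a AND (b OR c");                        // 1
    assert(r.modified == "a AND (b OR c)" && r.correction.empty());
    assert(!r.parse_error && r.parsed == "a AND (b OR c)");

    r = parse_sketch("a AND (b OR c", FLAG_BOOLEAN | FLAG_PHRASE | FLAG_LOVEHATE);  // 2
    assert(r.modified == "a AND (b OR c)" && r.parse_error);

    r = parse_sketch("xapain");                                       // 3
    assert(r.modified.empty() && r.correction == "xapian");
    assert(!r.parse_error && r.parsed == "xapain");

    r = parse_sketch("xapain AND (b OR c");                           // 4
    assert(r.modified == "xapain AND (b OR c)");
    assert(r.correction == "xapian AND (b OR c");
    assert(!r.parse_error && r.parsed == r.modified);
}
```

Calling check_testcases() from a main() runs all four cases. The point of the model is that "modified" and "correction" are computed independently of each other, and that the flag only decides whether the modified query is parsed or a parse error is reported.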


Last modified on 28/06/12 16:50:34