MT5: mt-search.cgi query containing bare word beginning with "not" does not return expected results

Description

Copied from https://movabletype.atlassian.net/browse/MTC-5424#icft=MTC-5424 for MT5.

Issue:

Public searches that contain a bare (unquoted) word beginning with the characters 'not', such as 'notebook', do not return the entries that contain that word.

Steps to reproduce:

1. Create and publish an entry containing the word "notebook" in the entry body.

2. Click the View Site button to browse to the weblog's main index page.

3. Enter "notebook" in the Search box (without the quotes), and click the Search button.

Expected result:

Entry containing the word "notebook" is displayed in search results.

Actual result:

All other entries (if any) are displayed in search results.

Search queries for "notebook" (with quotes) or "+notebook" (without quotes) work as expected.

Cause:

The Lucene::QueryParser Perl module (used by MT::App::Search and shipped with the Movable Type distribution in /extlib) does not correctly parse the 'not' logical operator in search terms. From extlib/Lucene/QueryParser.pm, line 25:

elsif (s/^(-|!|NOT)\s*//i){ $item->{type} = "PROHIBITED"; }

The above regex will match on 'not' as a logical operator at the beginning of a search word regardless of what immediately follows the 'not' characters.

A search for the word "notebook" is parsed into "not ebook", which results in incorrect search results being returned.
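
The misparse can be reproduced outside Movable Type with a minimal sketch. The helper name `strip_prohibited_prefix` is hypothetical and applies only the single prefix test quoted above; the real `parse_query` routine in Lucene::QueryParser does considerably more.

```perl
use strict;
use warnings;

# Hypothetical helper: applies only the PROHIBITED prefix test from
# Lucene::QueryParser and returns the detected type plus what is left.
sub strip_prohibited_prefix {
    my ($query) = @_;
    my $type = 'NORMAL';
    if ($query =~ s/^(-|!|NOT)\s*//i) {
        $type = 'PROHIBITED';
    }
    return ($type, $query);
}

for my $q ('notebook', 'NOT ebook', 'ebook') {
    my ($type, $rest) = strip_prohibited_prefix($q);
    printf "%-10s => %-10s %s\n", $q, $type, $rest;
}
```

Because `\s*` also accepts zero whitespace characters, 'notebook' and 'NOT ebook' both come out as a PROHIBITED term 'ebook'.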

Suggested resolution:

Modify the regex at line 25 of Lucene::QueryParser so that 'not' at the beginning of a word is treated as the 'not' logical operator only if it is immediately followed by a character that is neither a word character nor a colon (taken from Plucene::QueryParser):

elsif (s/^(-|!|NOT(?=[^\w:]))\s*//i){ $item->{type} = "PROHIBITED"; }
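
The effect of the suggested change can be checked with a small sketch. It assumes the pattern is anchored with `^` and uses the negated lookahead `(?=[^\w:])`, which is what the description above calls for; the helper name `strip_prohibited_prefix_fixed` is hypothetical and covers only this one test, not the whole parser.

```perl
use strict;
use warnings;

# Hypothetical helper: same prefix test, but 'NOT' only counts as an
# operator when the next character is neither a word character nor ':'.
sub strip_prohibited_prefix_fixed {
    my ($query) = @_;
    my $type = 'NORMAL';
    if ($query =~ s/^(-|!|NOT(?=[^\w:]))\s*//i) {
        $type = 'PROHIBITED';
    }
    return ($type, $query);
}

for my $q ('notebook', 'not ebook', '-ebook', 'not:ebook') {
    my ($type, $rest) = strip_prohibited_prefix_fixed($q);
    printf "%-10s => %-10s %s\n", $q, $type, $rest;
}
```

With this pattern 'notebook' and 'not:ebook' stay NORMAL and are left untouched, while 'not ebook' and '-ebook' are still recognised as a PROHIBITED term 'ebook'.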

Additional suggestion:

The regex which parses a search query for the 'and' and 'or' logical operators only matches on upper case 'AND' and 'OR', while the regex which parses a search query for the 'not' logical operator is case-insensitive. I think both regexes should behave in the same manner - the regex which parses a search query for the 'not' logical operator should also match only on upper case 'NOT':

elsif (s/^(-|!|NOT(?=[^\w:]))\s*//){ $item->{type} = "PROHIBITED"; }
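
To illustrate what dropping the `/i` flag would change, here is a short comparison sketch. It assumes the anchored, negated-lookahead form of the pattern; the `%variants` table and the `classify` helper are hypothetical names used only for this illustration.

```perl
use strict;
use warnings;

# Two variants of the assumed pattern: with and without the /i flag.
my %variants = (
    'case-insensitive' => qr/^(-|!|NOT(?=[^\w:]))\s*/i,
    'case-sensitive'   => qr/^(-|!|NOT(?=[^\w:]))\s*/,
);

# Hypothetical helper: reports how the prefix test would classify a query.
sub classify {
    my ($re, $query) = @_;
    return $query =~ $re ? 'PROHIBITED' : 'NORMAL';
}

for my $q ('NOT ebook', 'not ebook', 'Not ebook') {
    for my $name (sort keys %variants) {
        printf "%-10s %-17s %s\n", $q, $name, classify($variants{$name}, $q);
    }
}
```

Under the case-sensitive variant only the upper-case 'NOT ebook' is classified as PROHIBITED, which matches how the 'AND' and 'OR' operators are described above; lower-case and mixed-case 'not' would be searched as an ordinary word.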

Environment

None

Activity

Reiko Sato, March 13, 2020 at 2:12 AM

Thank you.

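Kenichi Ishigaki, March 13, 2020 at 2:10 AM

This ticket is about fixing an external module that MT uses, so I think it is fine to close it. The module we currently use had already stopped being maintained as of 2003, so honestly I think we should replace it, but that is a difficult call.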

Reiko Sato, March 13, 2020 at 1:59 AM

Reproduced in 7.3 as well.

Yuji Takayama, July 22, 2014 at 3:49 AM

Still true in 6.0.3.

We should open a ticket on CPAN and talk with the maintainer / developer.

Won't Fix

Created May 6, 2010 at 2:56 AM
Updated January 24, 2022 at 12:39 AM
Resolved June 12, 2018 at 3:10 AM