Change in search behavior after upgrading from the Elasticsearch 6 connector to the Elasticsearch 7 connector

Issue

After upgrading from Elasticsearch 6 to Elasticsearch 7, searching by a prefix no longer works on content fields. It still works on title fields.

For example, if a content field contains the word "legal" and you search for the prefix "leg", the behavior differs between Elasticsearch 6 and Elasticsearch 7:

  • Elasticsearch 6: The content is found
  • Elasticsearch 7: The content is not found

The behavior of the title field remains unchanged: if the word "legal" is in the title and you search for the prefix "leg", the content is found in both Elasticsearch 6 and 7.

At the technical level, we have detected a change in the Elasticsearch query:

  • Elasticsearch 6: Liferay searches the content fields using a match_phrase_prefix clause
  • Elasticsearch 7: Liferay searches the content fields using a match_phrase clause

Is this change of behavior expected?
Is it possible to configure the old behavior in the new Elasticsearch 7 version?
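
For illustration, the two clauses correspond to Elasticsearch request bodies along the following lines. This is only a minimal sketch: the field name "content" and the helper function names are assumptions made for the example, and the queries Liferay actually generates are larger and may target different field names.

```python
import json


def match_phrase_prefix_body(field, text):
    """Request body using match_phrase_prefix (last term is treated as a prefix)."""
    return {"query": {"match_phrase_prefix": {field: {"query": text}}}}


def match_phrase_body(field, text):
    """Request body using match_phrase (every term must match as a whole term)."""
    return {"query": {"match_phrase": {field: {"query": text}}}}


# "content" is an assumed field name used only for this example.
es6_style = match_phrase_prefix_body("content", "leg")
es7_style = match_phrase_body("content", "leg")

print(json.dumps(es6_style, indent=2))
print(json.dumps(es7_style, indent=2))
```

The only difference between the two bodies is the clause name, which is why the same search text can return different results after the upgrade.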

 

Environment

  • Liferay DXP 7.1
  • Liferay DXP 7.2

 

Resolution

This change of behavior is expected. Here is the full explanation:

1. Query types

Liferay has two query types:

  • title (TitleFieldQueryBuilder) => uses the match_phrase_prefix clause
  • description (DescriptionFieldQueryBuilder) => uses the match_phrase clause 

There are two query types because the two kinds of field are meant to be searched differently:

  • The "title" type is meant to be searched more strongly and closely, because users are expected to search often for the title of an asset and because the title is often an indication of the content.
  • The "description" type is expected to be searched less strictly, because content can be large and is expected to contain more tangential information.
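
The practical difference between the two clauses can be sketched with a small simulation. This is a simplified model written for this article, not Elasticsearch or Liferay code: it assumes whitespace tokenization and lowercasing, and it ignores analyzers, slop, and prefix expansion limits. The function names are hypothetical.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real analyzer)."""
    return text.lower().split()


def matches_phrase(doc_text, query_text, last_term_is_prefix=False):
    """Return True if the query terms occur consecutively in the document.

    With last_term_is_prefix=True the final query term only needs to be a
    prefix of the document term, mimicking match_phrase_prefix. Otherwise
    every term must match exactly, mimicking match_phrase.
    """
    doc = tokenize(doc_text)
    query = tokenize(query_text)
    if not query:
        return False
    for start in range(len(doc) - len(query) + 1):
        window = doc[start:start + len(query)]
        head_ok = window[:-1] == query[:-1]
        if last_term_is_prefix:
            tail_ok = window[-1].startswith(query[-1])
        else:
            tail_ok = window[-1] == query[-1]
        if head_ok and tail_ok:
            return True
    return False


document = "legal notice for all users"
prefix_hit = matches_phrase(document, "leg", last_term_is_prefix=True)
phrase_hit = matches_phrase(document, "leg")
whole_word_hit = matches_phrase(document, "legal")
print(prefix_hit, phrase_hit, whole_word_hit)  # True False True
```

In this model, the prefix "leg" finds the document only in prefix mode, while the full word "legal" is found in both modes.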

The query type used by each field is decided in Liferay code. The fields for each query type are configured in the properties of the FieldQueryBuilderFactoryImpl component, and a default query type applies to all fields that are not configured.

2. Default query type differences between Elasticsearch 6 and 7

The default query type is used for all the fields not configured in the FieldQueryBuilderFactoryImpl component.

It is obtained in the following Liferay code: https://github.com/liferay/liferay-portal/blob/ded0e9390637985231e962e4ad4cfa4639eabb26/modules/apps/portal-search/portal-search/src/main/java/com/liferay/portal/search/internal/query/field/FieldQueryFactoryImpl.java#L71

The default query type differs between the Elasticsearch 6 and 7 connectors:

  • The Elasticsearch 6 connector uses the TitleFieldQueryBuilder as the fallback query type when a field is not configured.
  • During the development of the Elasticsearch 7 connector, it was decided that the correct fallback query type should be the DescriptionFieldQueryBuilder.

This explains the change of behavior between Elasticsearch 6 and Elasticsearch 7. Because the fallback query type is now the DescriptionFieldQueryBuilder, the content field is searched with match_phrase, which is the new default clause in this version.
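
The fallback logic described above can be sketched as follows. This is a hypothetical model, not Liferay's actual implementation: the function name, the field sets, and the assumption that "content" is left unconfigured are for illustration only.

```python
TITLE = "TitleFieldQueryBuilder"
DESCRIPTION = "DescriptionFieldQueryBuilder"


def resolve_query_builder(field, title_fields, description_fields, fallback):
    """Pick the query builder for a field, using the fallback if unconfigured."""
    if field in title_fields:
        return TITLE
    if field in description_fields:
        return DESCRIPTION
    return fallback


# Assumed configuration for illustration only: "content" is deliberately
# left unconfigured so that it falls through to the default.
title_fields = {"title"}
description_fields = {"description"}

es6 = resolve_query_builder("content", title_fields, description_fields, fallback=TITLE)
es7 = resolve_query_builder("content", title_fields, description_fields, fallback=DESCRIPTION)
print(es6)  # TitleFieldQueryBuilder
print(es7)  # DescriptionFieldQueryBuilder
```

A configured field such as "title" resolves to the same builder under either fallback, which matches the unchanged behavior of title searches.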

3. Unsupported workaround to change the behavior

Warning: This workaround is unsupported.

You can change the behavior of FieldQueryBuilderFactoryImpl so that match_phrase_prefix is applied to other fields by setting custom values for the properties of the FieldQueryBuilderFactoryImpl OSGi component.

To configure a custom property of an OSGi component, you can follow the instructions of this community blog post:

To apply this change to the FieldQueryBuilderFactoryImpl, create a config file named com.liferay.portal.search.internal.analysis.FieldQueryBuilderFactoryImpl.config in the [LIFERAY_HOME]/osgi/configs folder with the desired description.fields and title.fields values.

Important: This configuration change is considered a customization and is outside the scope of the support service.

 

 

 
