Behavior change in searches for partial matches when using Elasticsearch 7

Issue

Prefix searches on content fields behave differently with Elasticsearch 7. Prefix searches on title fields continue to work as before.

For example, if a content field contains the word "legal" and you search for the prefix "leg", the result differs as follows:

  • With Elasticsearch 6: The content is found
  • With Elasticsearch 7: The content is not found

The behavior of the title field remains unchanged: if the word "legal" is in the title and you search for the prefix "leg", the content is still found.

Background

At the technical level, the query sent to Elasticsearch 7 has changed: previously the portal searched the content fields using a match_phrase_prefix clause; now it uses a match_phrase clause by default. This change was introduced because of a difference between Elasticsearch 6 and 7.
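To make the difference between the two clauses concrete, here is a minimal Python sketch. It builds the two clause shapes as they appear in the Elasticsearch query DSL and adds a toy matcher that only approximates their semantics on lowercased, whitespace-split text. The helper names (build_clause, toy_matches) and the field name "content" are illustrative assumptions. This is not the query Liferay actually generates, and real Elasticsearch matching also depends on the analyzer configured for the field.

```python
def build_clause(clause_type, field, keywords):
    """Return a query DSL body using a match_phrase or match_phrase_prefix clause."""
    if clause_type not in ("match_phrase", "match_phrase_prefix"):
        raise ValueError("unsupported clause type: " + clause_type)
    return {"query": {clause_type: {field: keywords}}}


def toy_matches(clause_type, text, keywords):
    """Rough approximation of the clause semantics (assumption: no analyzer,
    tokens are lowercased words split on whitespace).

    match_phrase: every term must appear consecutively as a whole token.
    match_phrase_prefix: same, but the last term may be a prefix of a token.
    """
    tokens = text.lower().split()
    terms = keywords.lower().split()
    if not terms:
        return False
    for start in range(len(tokens) - len(terms) + 1):
        window = tokens[start:start + len(terms)]
        head_ok = window[:-1] == terms[:-1]
        if clause_type == "match_phrase_prefix":
            last_ok = window[-1].startswith(terms[-1])
        else:
            last_ok = window[-1] == terms[-1]
        if head_ok and last_ok:
            return True
    return False


if __name__ == "__main__":
    print(build_clause("match_phrase_prefix", "content", "leg"))
    print(toy_matches("match_phrase_prefix", "Legal notice", "leg"))
    print(toy_matches("match_phrase", "Legal notice", "leg"))
```

Under these assumptions, the prefix "leg" matches text containing "legal" only with the match_phrase_prefix clause, which mirrors the behavior described in the Issue section.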

Environment

  • DXP 7.3-7.4

  • DXP 7.2 with Elasticsearch 7

  • DXP 7.1 with Elasticsearch 7

Resolution

This change of behavior is expected. Here is the full explanation:

1. Query types

Liferay has two query types:

  • title (TitleFieldQueryBuilder) => uses the match_phrase_prefix clause
  • content, description (DescriptionFieldQueryBuilder) => uses the match_phrase clause 

The two query types exist because each kind of field is meant to be searched differently:

  • "title" type is meant to be more strongly and closely searched because a user is expected to often search for the title of the asset and because the title is often an indication of the content.
  • "content" type is expected to be searched less strenuously as content can be large and is expected to contain more tangential information.

The query type used by each field is decided in the following Liferay code:

In this code, the fields are configured in the FieldQueryBuilderFactoryImpl component properties, and there is a default query type for all non-configured fields.

2. Default query type differences between Elasticsearch 6 and 7

The default query type is used for all the fields not configured in the FieldQueryBuilderFactoryImpl component.

It is obtained in the following Liferay code: https://github.com/liferay/liferay-portal/blob/ded0e9390637985231e962e4ad4cfa4639eabb26/modules/apps/portal-search/portal-search/src/main/java/com/liferay/portal/search/internal/query/field/FieldQueryFactoryImpl.java#L71

The default query type changes between the mentioned patch levels and versions:

  • Elasticsearch 6 connector uses the TitleFieldQueryBuilder as the fallback query type in case the field wasn't configured.
  • During the development of the Elasticsearch 7 connector, it was decided that the correct fallback query type should be the DescriptionFieldQueryBuilder.

This explains the change: because the fallback query type is now DescriptionFieldQueryBuilder, the content field is searched with match_phrase, which is the new default clause in this version.
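The fallback logic described above can be sketched as follows. This is a simplified Python model, not Liferay's Java code: the function name resolve_clause, the version-to-fallback table, and the default field lists are hypothetical, and the sketch assumes (as this article implies) that the content field is not explicitly configured, so it receives the fallback query type. It also matches field names exactly, which may differ from how the real component matches them.

```python
# Clause used by each query type, as described in this article.
TITLE = "match_phrase_prefix"   # TitleFieldQueryBuilder
DESCRIPTION = "match_phrase"    # DescriptionFieldQueryBuilder

# Fallback query type per Elasticsearch connector major version (from this article).
FALLBACK_BY_CONNECTOR = {6: TITLE, 7: DESCRIPTION}


def resolve_clause(field, connector_version,
                   title_fields=("title",), description_fields=()):
    """Pick the clause for a field: explicit configuration first, then the fallback.

    Assumption: field names are compared exactly; the default tuples are
    placeholders for illustration, not Liferay's real defaults.
    """
    if field in title_fields:
        return TITLE
    if field in description_fields:
        return DESCRIPTION
    return FALLBACK_BY_CONNECTOR[connector_version]


if __name__ == "__main__":
    for version in (6, 7):
        print(version, "content ->", resolve_clause("content", version))
        print(version, "title   ->", resolve_clause("title", version))
```

In this model, an unconfigured field switches from match_phrase_prefix to match_phrase when moving from the Elasticsearch 6 connector to the Elasticsearch 7 connector, while a field listed as a title field keeps match_phrase_prefix in both.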

3. Unsupported workaround to change the behavior

Warning: This workaround is unsupported.

You can change the behavior of FieldQueryBuilderFactoryImpl so that match_phrase_prefix is applied to other fields by setting custom values for the properties of the FieldQueryBuilderFactoryImpl OSGi component.

To configure a custom property of an OSGi component, you can follow the instructions of this community blog post:

If you want to apply this change to the FieldQueryBuilderFactoryImpl, you will have to create a config file com.liferay.portal.search.internal.analysis.FieldQueryBuilderFactoryImpl.config in the [LIFERAY_HOME]/osgi/configs folder with the desired description.fields and title.fields values, for example:

title.fields="name|title|myfield1|myfield2"
description.fields="content_en_US|description"

Important: This configuration change is considered a customization and is outside the scope of the support service.
