Issue
Prefix searches on content fields behave differently after moving from Elasticsearch 6 to Elasticsearch 7. Prefix searches on title fields continue to work.
For example, if a content field contains the word "legal" and you search for the prefix "leg", the result differs as follows:
- With Elasticsearch 6: The content is found
- With Elasticsearch 7: The content is not found
The behavior of the title field remains unchanged: if the word "legal" is in the title and you search for the prefix "leg", the content is still found.
Background
At the technical level, the query sent to Elasticsearch 7 has changed: previously the portal searched the content fields using a match_phrase_prefix clause; now it uses a match_phrase clause by default. This change had to be introduced because of a difference between Elasticsearch 6 and 7.
Environment
- DXP 7.3-7.4
- DXP 7.2 with Elasticsearch 7
- DXP 7.1 with Elasticsearch 7
Resolution
This change in behavior is expected. The full explanation follows:
1. Query types
Liferay has two query types:
- title (TitleFieldQueryBuilder) => uses the match_phrase_prefix clause
- content, description (DescriptionFieldQueryBuilder) => uses the match_phrase clause
There are two different query types because the two kinds of field are meant to be searched differently:
- The "title" type is meant to be searched more strongly and closely, because a user is expected to search often for the title of the asset and because the title is often an indication of the content.
- The "content" type is expected to be searched less strenuously, as content can be large and is expected to contain more tangential information.
The query type used by each field is decided in the following Liferay code:
In this code, the fields are configured in the properties of the FieldQueryBuilderFactoryImpl component, and there is a default query type for all fields that are not configured.
2. Default query type differences between Elasticsearch 6 and 7
The default query type is used for all the fields not configured in the FieldQueryBuilderFactoryImpl component.
It is obtained in the following Liferay code: https://github.com/liferay/liferay-portal/blob/ded0e9390637985231e962e4ad4cfa4639eabb26/modules/apps/portal-search/portal-search/src/main/java/com/liferay/portal/search/internal/query/field/FieldQueryFactoryImpl.java#L71
The default query type differs between the patch levels and versions mentioned above:
- The Elasticsearch 6 connector uses TitleFieldQueryBuilder as the fallback query type when a field is not configured.
- During the development of the Elasticsearch 7 connector, it was decided that the correct fallback query type should be DescriptionFieldQueryBuilder.
This explains the change: because the fallback query type is now DescriptionFieldQueryBuilder, the content field uses match_phrase, which is the new default clause in this version.
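The effect of the fallback change can be sketched as follows. This is a hypothetical illustration, not the code in FieldQueryFactoryImpl: the class, the enum, the resolve method, and the assumption that only "title" is explicitly configured are all invented for the example. The field name content_en_US is used only as a sample unconfigured field.

```java
import java.util.Map;

// Hypothetical sketch: how a changed fallback alters the clause used
// for a field that has no explicit configuration.
class FallbackQueryTypeSketch {

    enum QueryType {
        TITLE("match_phrase_prefix"),       // stands in for TitleFieldQueryBuilder
        DESCRIPTION("match_phrase");        // stands in for DescriptionFieldQueryBuilder

        final String clause;

        QueryType(String clause) {
            this.clause = clause;
        }
    }

    // Returns the configured query type for the field, or the fallback if none is configured.
    static QueryType resolve(String field, Map<String, QueryType> configured, QueryType fallback) {
        return configured.getOrDefault(field, fallback);
    }

    public static void main(String[] args) {
        // Assumption for the example: only "title" is explicitly configured.
        Map<String, QueryType> configured = Map.of("title", QueryType.TITLE);

        QueryType es6Fallback = QueryType.TITLE;        // Elasticsearch 6 connector
        QueryType es7Fallback = QueryType.DESCRIPTION;  // Elasticsearch 7 connector

        // Unconfigured field: the clause follows the fallback.
        System.out.println(resolve("content_en_US", configured, es6Fallback).clause); // match_phrase_prefix
        System.out.println(resolve("content_en_US", configured, es7Fallback).clause); // match_phrase

        // Configured field: the clause is the same with either fallback.
        System.out.println(resolve("title", configured, es7Fallback).clause);         // match_phrase_prefix
    }
}
```

In this sketch the title field keeps its clause regardless of the fallback, which is consistent with title searches being unaffected.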
3. Unsupported workaround to change the behavior
Warning: This workaround is unsupported.
You can change the behavior of FieldQueryBuilderFactoryImpl so that other fields use the match_phrase_prefix clause by setting custom values for the properties of this OSGi component.
To configure a custom property of an OSGi component, you can follow the instructions of this community blog post:
If you want to apply this change to FieldQueryBuilderFactoryImpl, you will have to create a config file named com.liferay.portal.search.internal.analysis.FieldQueryBuilderFactoryImpl.config in the [LIFERAY_HOME]/osgi/configs folder with the desired description.fields and title.fields values, for example:
title.fields="name|title|myfield1|myfield2"
description.fields="content_en_US|description"
Important: This configuration change is considered a customization and is outside the scope of the support service.
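To make the example values concrete, the sketch below shows which clause each field would end up with, assuming the two properties are read as pipe-separated lists of field names, as the example suggests. The class and method names are hypothetical, the parsing is not taken from Liferay's code, and letting title.fields win when a field appears in both lists is an arbitrary choice for the illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: maps the example property values to the clause each field would use.
class FieldConfigSketch {

    // Assumption: both properties are pipe-separated lists of field names.
    static Map<String, String> clausesByField(String titleFields, String descriptionFields) {
        Map<String, String> clauses = new LinkedHashMap<>();
        for (String field : descriptionFields.split("\\|")) {
            if (!field.trim().isEmpty()) {
                clauses.put(field.trim(), "match_phrase");
            }
        }
        // Arbitrary choice for this sketch: title.fields overrides description.fields.
        for (String field : titleFields.split("\\|")) {
            if (!field.trim().isEmpty()) {
                clauses.put(field.trim(), "match_phrase_prefix");
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        Map<String, String> clauses = clausesByField(
            "name|title|myfield1|myfield2",
            "content_en_US|description");

        // Fallback clause of the Elasticsearch 7 connector for unlisted fields.
        String fallback = "match_phrase";

        System.out.println(clauses.getOrDefault("myfield1", fallback));      // match_phrase_prefix
        System.out.println(clauses.getOrDefault("content_en_US", fallback)); // match_phrase
        System.out.println(clauses.getOrDefault("subtitle", fallback));      // match_phrase (not listed)
    }
}
```

Under these assumptions, a field regains prefix matching only if it is listed in title.fields.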