Maintaining Search with High Availability: Concurrent and Sync Reindex Modes (BETA)

Highlights

Reduce downtime and improve operational and resource-utilization efficiency with the new Concurrent and Sync reindex execution modes, which keep search capabilities available while a reindex is running.

Features

  1. Multiple Reindex Modes: Full Reindex, Concurrent Reindex (Blue/Green), and Sync Reindex.
  2. High Availability: Continuous service during reindexing operations in the Concurrent and Sync modes.
  3. System Configuration Options: Setting the default reindex mode.
  4. Enhanced User Interface: Grouping of related types, human-friendly names.
  5. Timestamp Implementation: Every document in Elasticsearch is timestamped.
  6. Resource Management: Disk space estimation in Elasticsearch for the Concurrent mode.
  7. Error Management: Fall back to the original index in case Concurrent reindexing fails.

Capabilities

  1. Clean-Slate Reindexing: Full Reindex mode deletes and recreates indexes.
  2. Parallel Index Creation: Concurrent mode creates a new index alongside the old one and populates it.
  3. Update-First Approach: Sync mode updates existing documents without initial deletion.
  4. Configuration Customization: Ability to set the default reindex mode.
  5. Easy Identification and Operation: Improved user interface for better navigation.
  6. Staleness Detection: Timestamps help identify outdated documents.
  7. Preventive Resource Management: Warns users when disk space might be insufficient.
  8. Continuous Operation: Even if reindexing encounters an issue, the system continues to operate (with Concurrent mode).

Benefits

  1. Reduced Downtime: Users can continue to search and access content during reindex operations.
  2. Flexibility: Administrators can choose the reindexing mode that best fits their scenario.
  3. Operational Efficiency: Prevents potential outages or slowdowns caused by resource constraints.
  4. Efficient Resource Utilization: Advance warnings prevent operations that might crash the system due to a lack of resources (with Concurrent mode).

Context

Search is a fundamental capability of any modern site for discovering content and products (from now on, simply content).

To do this right, content has to be stored in a special format optimized for (full-text) search, called a search index, which lives inside a search engine such as Elasticsearch. By default, Liferay indexes Web Content Articles, Object entries and their categories or tags, as well as other types such as users and organizations.

image01.png

Search indexes do not only serve user searches through a Search Bar, though: under the hood, they also drive many of Liferay’s out-of-the-box applications and features that users interact with through the UI or headless APIs.

To propagate changes and make sure that the database and the search index are in sync, Liferay has long offered the ability to perform an operation called reindex via Control Panel - Configuration - Search > Index Actions.

image02.png

Problems

When We Do a Reindex

Over time, the search index requires maintenance. As Liferay's functionality evolves, upgrades can change how data is indexed. In addition, a failed staging publication or an outage in the connection between Liferay and Elasticsearch can leave stale data in the index.

How We Do a Reindex

Maintaining the integrity of the search index is a challenge. Reindexing is the remedy, but Liferay's traditional "delete-first & index again" approach (described later) is resource-intensive and causes noticeable downtime, which negatively affects both the user experience and system operations.

Business Impact

  1. User Disruption: Traditional reindexing methods result in significant downtime, preventing users from accessing relevant content efficiently.
  2. Operational Overheads: Reindexing operations, if not optimized, can consume significant resources, leading to potential system lags or outages.
  3. Data Integrity: Without regular and efficient reindexing, the search index might display stale or irrelevant data, undermining the platform's credibility.
  4. Scalability Concerns: As data volumes grow, reindexing operations without high-availability modes can become increasingly challenging to manage, affecting scalability.
  5. Business Continuity Risks: Outdated or misaligned search indexes can hinder critical business operations that rely on search functionality, leading to potential revenue losses.
  6. Maintenance Costs: Frequent, inefficient reindexing can lead to higher maintenance costs.

Desired Outcomes

When we approached this complex problem domain, one of our goals was to provide a way to perform a reindex with minimal or zero impact on the search and indexing capabilities of the live environment, ensuring business continuity and high availability.

What we have done

Starting with DXP 7.4 U98 / DXP 2023.Q4, two new reindex execution modes become available as BETA:

  • Concurrent and
  • Sync

when Liferay is operating with Elasticsearch as the search engine.

To understand the benefits of these modes and when to use them, let’s first recap how a Full reindex works (Full remains available as the default mode).

image03.png

Control Panel - Configuration - Search > Index Actions with Execution Modes in U98+.

Full Reindex Mode

In this mode, the index actions behave as follows:

  • Reindex All Search Indexes: the indexes are deleted (erasing all content/data) and re-created at the beginning of the process, and then the content is indexed again;

  • Reindex Individual Types (e.g., users): documents of the selected type are deleted from the indexes at the beginning of the process, and then that content is indexed again.
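To make the downtime window concrete, here is a minimal Python sketch of the "delete-first & index again" behavior. It assumes a search index can be modeled as an in-memory dictionary of document ids to text; `search` and `full_reindex` are hypothetical helpers, not Liferay or Elasticsearch APIs. The `observe` callback records how many hits a search issued at each step would return.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory stand-in for a search index: document id -> text.
Index = Dict[str, str]


def search(index: Index, term: str) -> List[str]:
    """Return the ids of indexed documents whose text contains the term."""
    return sorted(doc_id for doc_id, text in index.items() if term in text)


def full_reindex(index: Index, database: Dict[str, str],
                 observe: Callable[[Index], None]) -> None:
    """Delete-first & index again: wipe the index, then re-add every document."""
    index.clear()        # all content disappears from search at this point
    observe(index)
    for doc_id, text in database.items():
        index[doc_id] = text
        observe(index)   # what a search issued right now would see


database = {"a1": "liferay search", "a2": "search index", "a3": "search engine"}
index: Index = dict(database)  # starts fully in sync with the database
hits_during_reindex: List[int] = []

full_reindex(index, database,
             lambda idx: hits_during_reindex.append(len(search(idx, "search"))))

print(hits_during_reindex)      # [0, 1, 2, 3]
print(search(index, "search"))  # ['a1', 'a2', 'a3']
```

The hit count drops to zero and only climbs back as documents are re-added; on a real installation that climb spans the entire reindex, which is the downtime the new modes are designed to avoid.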

Despite its known downsides, this mode is not going away: it remains the default because not all deployments are equally affected by its disruptive nature, and there are still certain cases (see later) where it is a viable (and sometimes the only) solution.

Concurrent Reindex Mode

(With Elasticsearch only)

At the beginning of the process, a second, new (“green”) index is created with up-to-date storage instructions (a.k.a. field mappings), and the content is indexed into it.

Meanwhile, the original (“blue”) index remains in use throughout the whole operation, serving searches in the interim and thus providing high availability.

Updates (originating from user actions that create, update, or delete content) are sent to both the original and the new index at the same time during the operation (this is where the concurrent nature comes from).

Once the new index is populated, the platform deletes the original index and directs requests (both search and write) to the new index.
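The blue/green flow above can be sketched in Python as follows. This is a simplified model under stated assumptions, not Liferay's implementation: dictionaries stand in for Elasticsearch indexes, and the `ConcurrentReindexer` class with its `start`, `write`, `backfill`, `finish`, and `abort` methods is hypothetical. It leaves out field mappings and concurrency control, and models the fall-back to the original index on failure (listed under Features) as simply discarding the new index.

```python
from typing import Dict, List, Optional

Index = Dict[str, str]  # hypothetical in-memory stand-in for an index


class ConcurrentReindexer:
    """Hypothetical model of the Concurrent (blue/green) reindex mode."""

    def __init__(self, database: Dict[str, str], live: Index) -> None:
        self.database = database             # source of truth
        self.live: Index = live              # "blue": serves every search
        self.green: Optional[Index] = None   # "green": exists only during a run

    def start(self) -> None:
        # Create the new, empty index alongside the original one.
        self.green = {}

    def write(self, doc_id: str, text: Optional[str]) -> None:
        """Create/update (text given) or delete (text is None) a document."""
        if text is None:
            self.database.pop(doc_id, None)
        else:
            self.database[doc_id] = text
        # During a run, every change goes to both indexes (the "concurrent" part).
        targets = [self.live] if self.green is None else [self.live, self.green]
        for index in targets:
            if text is None:
                index.pop(doc_id, None)
            else:
                index[doc_id] = text

    def backfill(self) -> None:
        # Populate the new index from the current database content.
        if self.green is None:
            raise RuntimeError("start() must be called first")
        for doc_id, text in self.database.items():
            self.green[doc_id] = text

    def finish(self) -> None:
        # Switch searches and writes to the new index; the original is dropped.
        if self.green is None:
            raise RuntimeError("start() must be called first")
        self.live, self.green = self.green, None

    def abort(self) -> None:
        # Fall back: discard the new index and keep serving from the original.
        self.green = None

    def search(self, term: str) -> List[str]:
        return sorted(d for d, t in self.live.items() if term in t)


db = {"a1": "liferay search", "a2": "search index"}
reindexer = ConcurrentReindexer(db, live=dict(db))
reindexer.start()
reindexer.write("a3", "search engine")  # lands in both indexes
reindexer.write("a1", None)             # deletes reach both indexes too
print(reindexer.search("search"))       # ['a2', 'a3'], served by the original
reindexer.backfill()
reindexer.finish()
print(reindexer.search("search"))       # ['a2', 'a3'], served by the new index
```

Because searches only ever read from the live index, they keep returning results at every step; the price is that two indexes exist side by side until the switch, which is why this mode needs extra disk space.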

Sync Reindex Mode

(With Elasticsearch only)

In a nutshell, the Sync reindex mode follows an “index again & delete-last” strategy. It starts by updating documents in the index without deleting anything. At the end of the process, any stale documents are deleted based on a timestamp field that is populated on all documents starting with DXP 7.4 U90.
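A minimal Python sketch of the “index again & delete-last” idea follows. The `Doc.timestamp` field, the `sync_reindex` function, and the use of plain integers as timestamps are hypothetical; the article only states that a timestamp field exists on every document. The sketch refreshes every document that exists in the database, then deletes whatever still carries a timestamp older than the start of the run.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Doc:
    text: str
    timestamp: int  # hypothetical field: when this document was last indexed


Index = Dict[str, Doc]  # hypothetical in-memory stand-in for an index


def search(index: Index, term: str) -> List[str]:
    return sorted(d for d, doc in index.items() if term in doc.text)


def sync_reindex(index: Index, database: Dict[str, str],
                 run_timestamp: int) -> List[str]:
    """Index again & delete-last. Returns the ids removed as stale."""
    # Phase 1: upsert every document that exists in the database. Nothing is
    # deleted here, so searches keep finding existing content throughout.
    for doc_id, text in database.items():
        index[doc_id] = Doc(text, timestamp=run_timestamp)
    # Phase 2: documents not refreshed by this run carry an older timestamp,
    # meaning they no longer exist in the database; delete them last.
    stale = [d for d, doc in index.items() if doc.timestamp < run_timestamp]
    for doc_id in stale:
        del index[doc_id]
    return stale


# The index drifted: "a2" changed and "a3" was deleted in the database while
# Elasticsearch was unreachable, so neither change reached the index.
index: Index = {
    "a1": Doc("liferay search", timestamp=100),
    "a2": Doc("old search text", timestamp=100),
    "a3": Doc("orphaned search doc", timestamp=100),
}
database = {"a1": "liferay search", "a2": "updated search text"}

removed = sync_reindex(index, database, run_timestamp=200)
print(removed)                  # ['a3']
print(search(index, "search"))  # ['a1', 'a2']
```

In this model, a document written by regular user activity during the run would also need a timestamp no older than `run_timestamp` to survive phase 2; the sketch leaves that bookkeeping out.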

Reindex Modes Comparison

This comparison is meant to help you understand the different modes, their main characteristics, and when each one is recommended.

                                         Full   Concurrent   Sync
Feature Status                           GA     BETA         BETA
Provides High Availability               No     Yes          Yes

Available with Action:
  Reindex All Search Indexes             Yes    Yes          Yes
  Reindex Single Type                    Yes    No           Yes
  Reindex Spell-Check Dictionaries       Yes    No           No

Behavior:
  Index Deleted/Created                  Yes    Yes          No
  Field Mappings Updated                 Yes    Yes          No
  Documents Updated                      Yes    Yes          Yes

Recommended After:
  Liferay Upgrades                       Yes    Yes          No
  Elasticsearch Upgrades (1)             Yes    Yes          Yes
  Connection Outages                     No     No           Yes
  Other Uptime Search Issues             No     Yes          Yes

(1) From 7.x to 8.x. Strictly speaking, a Full reindex is only required when Liferay is connected to a new, empty Elasticsearch cluster. When an existing Elasticsearch cluster is upgraded (so its index data is upgraded as well), a Sync reindex is currently sufficient.

Considerations

Concurrent mode requires more resources in Elasticsearch, primarily in the form of disk space. To prevent Elasticsearch from running out of space, the administrator is shown a warning confirmation dialog when starting a reindex if the estimated disk space available in Elasticsearch may not be enough to complete the operation.
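The article does not say how the disk space estimate is computed. Purely as an illustrative assumption, the Python sketch below treats the new indexes as needing roughly as much space as the current ones plus a safety margin, and flags a warning when free space falls short. `DiskEstimate`, `estimate_concurrent_reindex`, and the 10% default margin are hypothetical; in a real cluster, inputs like these could be read from Elasticsearch's `_cat/indices` (index store size) and `_cat/allocation` (available disk) APIs, which the sketch does not call.

```python
from dataclasses import dataclass
from typing import List

GIB = 1024 ** 3


@dataclass
class DiskEstimate:
    required_bytes: int
    available_bytes: int

    @property
    def should_warn(self) -> bool:
        # Ask for confirmation if the new indexes may not fit.
        return self.available_bytes < self.required_bytes


def estimate_concurrent_reindex(index_sizes_bytes: List[int],
                                available_bytes: int,
                                margin_percent: int = 10) -> DiskEstimate:
    """Hypothetical heuristic: while both coexist, the new indexes roughly
    duplicate the current ones, plus a safety margin (integer math only)."""
    current = sum(index_sizes_bytes)
    required = current * (100 + margin_percent) // 100
    return DiskEstimate(required_bytes=required, available_bytes=available_bytes)


estimate = estimate_concurrent_reindex([12 * GIB, 3 * GIB], available_bytes=15 * GIB)
print(estimate.required_bytes / GIB)  # 16.5
print(estimate.should_warn)           # True
```

A warning rather than a hard block fits an estimate like this: the real footprint of a rebuilt index can differ from the current one, so the final call is left to the administrator.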

image04.png

Warning dialog with Concurrent reindex mode in Index Actions.

Other Updates

Besides the Execution Mode selector, the Index Actions layout has received a solid visual revamp:

  • A confirmation dialog now appears before a reindex is executed, to avoid triggering a heavy operation accidentally:

image05.png

Confirmation dialog in Index Actions.

  • Related individual types are grouped together, and each type's human-friendly, localized name is displayed alongside its Entry Class Name, helping administrators find the right one more easily than before (especially for Object definitions).

image06.png

Grouped related single types and human-friendly names in Index Actions.

When index.on.startup is enabled (not recommended), the default reindex mode can be configured via Control Panel - Configuration - System Settings > Search:Reindex Configuration; it defaults to Full.

image07.png

Default Reindex Execution Mode configuration in System Settings.
