Configuring Elasticsearch

Liferay DXP is an open source project, so you won’t be surprised to learn that its default search engine is also an open source project. Elasticsearch is a highly scalable, full-text search and analytics engine.

By default, Elasticsearch runs as an embedded search engine, but it’s only supported in production as a separate server or cluster. This guide walks you through the process of configuring Elasticsearch.

If you’d rather use Solr, it’s also supported. See here for information on installing and configuring Solr.

To get up and running quickly with Elasticsearch as a remote server, refer to the Installing Elasticsearch article, which contains the basic instructions for installing and configuring Elasticsearch in a single-server environment. This article goes further: it covers clustering and tuning, and shows you how to configure your existing Elasticsearch installation for use in production environments.

If you’ve come here looking for information on search engines in general, or the low level search infrastructure of Liferay DXP, refer instead to the developer tutorial Introduction to Liferay Search.

These terms will be useful to understand as you read this guide:

  • Elasticsearch Home refers to the root folder of your unzipped Elasticsearch installation (for example, elasticsearch-2.4.0).

  • Liferay Home refers to the root folder of your Liferay DXP installation. It contains the osgi, deploy, data, and license folders, among others.

Embedded vs. Remote Operation Mode

When you install Liferay DXP, an embedded Elasticsearch is already installed. In embedded mode, Elasticsearch runs in the same JVM as Liferay DXP, which makes it easy to test-drive with minimal configuration. Running both servers in the same process has drawbacks:

  • Elasticsearch must use the same JVM options as Liferay DXP.
  • Liferay DXP and Elasticsearch compete for resources.

You wouldn’t run an embedded database like HSQL in production, and you shouldn’t run Elasticsearch in embedded mode in production either. Instead, run Elasticsearch in remote operation mode, as a standalone server or cluster of server nodes.

Configuring Elasticsearch

For detailed Elasticsearch configuration information, refer to the Elasticsearch documentation.

The name of your Elasticsearch cluster is important. When you’re running Elasticsearch in remote mode, the cluster name is used by Liferay DXP to recognize the Elasticsearch cluster. To learn about setting the Elasticsearch cluster name on the Liferay DXP side, refer below to the section called Configuring the Liferay Elasticsearch adapter.

Elasticsearch’s configuration files are written in YAML and kept in the [Elasticsearch Home]/config folder:

  • elasticsearch.yml is for configuring Elasticsearch modules
  • logging.yml is for configuring Elasticsearch logging

To set the name of the Elasticsearch cluster, open [Elasticsearch Home]/config/elasticsearch.yml and specify

cluster.name: LiferayElasticsearchCluster

Since LiferayElasticsearchCluster is the default name given to the cluster in Liferay DXP, this would work just fine. Of course, you can name your cluster whatever you’d like (we humbly submit the recommendation clustery_mcclusterface).[1] You can configure your node name using the same syntax (setting the node.name property).

If you’d rather work from the command line than in the configuration file, navigate to Elasticsearch Home and enter

./bin/elasticsearch --cluster.name clustery_mcclusterface --node.name nody_mcnodeface

Feel free to change the node name or the cluster name. Once you configure Elasticsearch to your liking, start it up.

Starting Elasticsearch

Start Elasticsearch by navigating to Elasticsearch Home and typing

./bin/elasticsearch

if you run Linux, or

.\bin\elasticsearch.bat

if you run Windows.

To run as a daemon in the background, add the -d switch to either command:

./bin/elasticsearch -d

When you have Elasticsearch installed and running, and Liferay DXP installed and running (do that if you haven’t already), you need to introduce Liferay DXP and Elasticsearch to each other. Fortunately, Liferay provides an adapter that helps it find and integrate your Elasticsearch cluster.

Configuring the Liferay Elasticsearch Adapter

The Elasticsearch connector is a module, titled Liferay Portal Search Elasticsearch, that ships with the Foundation Suite and is deployed to the OSGi runtime. This connector provides integration between Elasticsearch and Liferay DXP. Before you configure the adapter, make sure Elasticsearch is running.

There are two ways to configure the adapter:

  1. Use the System Settings application in the Control Panel.

  2. Manually create an OSGi configuration file.

It’s convenient to configure the Elasticsearch adapter from System Settings, but this is often only possible during development and testing. If you’re not familiar with System Settings, you can read about it here. Even if you need a configuration file so you can use the same configuration on another Liferay DXP system, you can still use System Settings. Just make the configuration edits you need, then export the .config file with your configuration.

Configuring the Adapter in the Control Panel

Here are the steps to configure the Elasticsearch adapter from the System Settings application:

  1. Start Liferay DXP.

  2. Navigate to Control Panel → Configuration → System Settings → Foundation.

  3. Find the Elasticsearch entry (scroll down and browse to it or use the search box) and click the Actions icon, then Edit.

    Figure 1: Use the System Settings application in Liferay DXP's Control Panel to configure the Elasticsearch adapter.

  4. Change Operation Mode to Remote, and then click Save.

    Figure 2: Set Operation Mode to *Remote* from System Settings.

  5. After you switch operation modes (EMBEDDED → REMOTE), you must trigger a re-index. Navigate to Control Panel → Server Administration, find the Index Actions section, and click Execute next to Reindex all search indexes.

Configuring the Adapter with an OSGi .config File

When preparing a system for production deployment, you want to set up a repeatable deployment process. Therefore, it’s best to use the OSGi configuration file, where your configuration is maintained in a controlled source.

Follow these steps to configure the Elasticsearch adapter using an OSGi configuration file:

  1. Create the following file to configure the default adapter (for Elasticsearch 2.4):

     [Liferay_Home]/osgi/configs/com.liferay.portal.search.elasticsearch.configuration.ElasticsearchConfiguration.config
    

    To configure the Liferay Connector to Elasticsearch 6, name your file thus:

     [Liferay_Home]/osgi/configs/com.liferay.portal.search.elasticsearch6.configuration.ElasticsearchConfiguration.config
    
  2. Add this to the configuration file you just created:

     operationMode="REMOTE"
     # If running Elasticsearch from a different computer:
     #transportAddresses="ip.of.elasticsearch.node:9300"
     # Highly recommended for all non-production usage (e.g., practice, tests, diagnostics):
     #logExceptionsOnly="false"
    
  3. Start Liferay DXP or re-index if already running.
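
If you want the configuration file from the steps above to be part of a repeatable deployment, a small script can write it for you. The following is a minimal sketch in Python, not Liferay tooling: the function names render_osgi_config and config_path are hypothetical, the address 192.0.2.10 is a placeholder, and the sketch only handles plain string values in the key="value" form shown above.

```python
from pathlib import Path

# File name taken from this article (default adapter, Elasticsearch 2.4).
CONFIG_FILE_NAME = (
    "com.liferay.portal.search.elasticsearch.configuration."
    "ElasticsearchConfiguration.config"
)


def config_path(liferay_home):
    """Return the adapter's .config path under [Liferay_Home]/osgi/configs."""
    return Path(liferay_home) / "osgi" / "configs" / CONFIG_FILE_NAME


def render_osgi_config(settings):
    """Render string settings as key="value" lines, one per setting."""
    lines = ['%s="%s"' % (key, value) for key, value in settings.items()]
    return "\n".join(lines) + "\n"


example = render_osgi_config({
    "operationMode": "REMOTE",
    "transportAddresses": "192.0.2.10:9300",  # placeholder address
    "logExceptionsOnly": "false",
})
print(example)

# To write the file (hypothetical Liferay Home location):
# config_path("/opt/liferay").write_text(example)
```

Keeping the settings in a dictionary under version control gives you one controlled source for every environment; the script only decides where the file lands.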

As you can see from the System Settings entry for Elasticsearch, there are a lot more configuration options available that help you tune your system for optimal performance. For a detailed accounting of these, refer to the reference article on Elasticsearch Settings.

What follows here are some known good configurations for clustering Elasticsearch. These, however, can’t replace the manual process of tuning, testing under load, and tuning again, so we encourage you to examine the settings as well as the Elasticsearch documentation and go through that process once you have a working configuration.

Configuring a Remote Elasticsearch Host

In production systems Elasticsearch and Liferay DXP are installed on different servers. To make Liferay DXP aware of the Elasticsearch cluster, set

transportAddresses=[IP address of Elasticsearch Node]:9300

in the Elasticsearch adapter’s OSGi configuration file. List as many or as few Elasticsearch nodes in this property as you’d like. This tells Liferay DXP the IP address or host name where search requests are to be sent. If using System Settings, set the value in the Transport Addresses property.

On the Elasticsearch side, set the network.host property in your elasticsearch.yml file. This property simultaneously sets both the bind host (the host Elasticsearch listens on for requests) and the publish host (the host name or IP address Elasticsearch uses to communicate with other nodes). See here for more information.

Clustering Elasticsearch in Remote Operation Mode

Clustering Elasticsearch is easy. Each time you run the Elasticsearch start script, a new local storage node is added to the cluster. If you want four nodes running locally, for example, just run ./bin/elasticsearch four times. If you only run the start script once, you have a cluster with just one node.

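Elasticsearch’s default configuration works for a cluster of up to ten nodes: the default number of primary shards is 5 and the default number of replicas is 1, which gives ten shard copies in total (five primaries plus one replica of each), enough to place one copy on each of ten nodes: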

index.number_of_shards: 5
index.number_of_replicas: 1
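
The arithmetic behind the ten-node figure can be sketched in a few lines of Python. This is only an illustration of the reasoning; the function name total_shard_copies is hypothetical and nothing here talks to Elasticsearch.

```python
def total_shard_copies(number_of_shards, number_of_replicas):
    """Each primary shard exists once, plus once more per replica."""
    return number_of_shards * (1 + number_of_replicas)


# Elasticsearch defaults quoted above: 5 primary shards, 1 replica.
default_copies = total_shard_copies(5, 1)
print(default_copies)
```

With one shard copy per node, nodes beyond this count would hold no part of the index, which is why the shard and replica settings are the first thing to revisit for larger clusters.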

For more information on configuring an Elasticsearch cluster, see the documentation on Elasticsearch Index Settings.

Advanced Configuration of the Liferay Elasticsearch Adapter

The default configurations for Liferay’s Elasticsearch adapter module are set in a Java class called ElasticsearchConfiguration.

While the Elasticsearch adapter has a lot of configuration options out of the box, you might find an Elasticsearch configuration you need that isn’t provided by default. In this case, add the configuration options you need. If something is configurable for Elasticsearch, it’s configurable using the Elasticsearch adapter.

Adding Settings and Mappings to the Liferay Elasticsearch Adapter

The available configuration options are divided into two groups: the ones you’ll use most often by default, and a catch-all for everything else. So if the necessary setting isn’t available by default, you can still configure it with the Liferay Elasticsearch adapter. Just specify the settings you need by using one or more of the additionalConfigurations, additionalIndexConfigurations, or additionalTypeMappings settings.

Figure 3: You can add Elasticsearch configurations to the ones currently available in System Settings.

Adding Configurations

additionalConfigurations is used to define extra settings (defined in YAML) for the embedded Elasticsearch or the local Elasticsearch client when running in remote mode. In production, only one additional configuration can be added here:

client.transport.ping_timeout

The rest of the settings for the client are available as default configuration options in the Liferay Elasticsearch adapter. See the Elasticsearch Settings reference article for more information. See the Elasticsearch documentation for a description of all the client settings and for an example.

Adding Index Configurations

additionalIndexConfigurations is used to define extra settings (in JSON or YAML format) that are applied to the Liferay DXP index when it’s created. For example, you can create custom analyzers and filters using this setting. For a complete list of available settings, see the Elasticsearch reference.

Here’s an example that shows how to configure analysis that can be applied to a field or dynamic template (see below).

{  
    "analysis": {
        "analyzer": {
            "kuromoji_liferay_custom": {
                "filter": [
                    "cjk_width",
                    "kuromoji_baseform",
                    "pos_filter"
                ],
                "tokenizer": "kuromoji_tokenizer"
            }
        },
        "filter": {
            "pos_filter": {
                "type": "kuromoji_part_of_speech"
            }
        }
    }
}

Adding Type Mappings

additionalTypeMappings is used to define extra mappings for the LiferayDocumentType type definition, which are applied when the index is created. Add mappings using JSON syntax. For more information see here and here. Use additionalTypeMappings for new field (properties) and dynamic template mappings, but do not try to override existing mappings. If any of the mappings set here overlap with existing mappings, index creation will fail. Use overrideTypeMappings to replace the default mappings.

As with dynamic templates, you can add sub-field mappings to Liferay DXP’s type mapping. These are referred to as properties in Elasticsearch.

{ 
    "LiferayDocumentType": {  
        "properties": {   
            "fooName": {
                "index": "not_analyzed",
                "store": "yes",
                "type": "string"
            }
        }   
    }
}

Elasticsearch 6: The above property mapping looks different in Elasticsearch 6.1:

{ 
    "LiferayDocumentType": {  
        "properties": {   
            "fooName": {
                "index": "true",
                "store": "true",
                "type": "keyword"
            }
        }   
    }
}

See here for more details on Elasticsearch’s field datatypes.

The above example shows how a fooName field might be added to Liferay DXP’s type mapping. Because fooName is not an existing property in the mapping, it will work just fine. If you try to override an existing property mapping, index creation will fail. Instead use the overrideTypeMappings setting to override properties in the mapping.
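
Because an overlap makes index creation fail, it can be worth checking your additions against the existing property names before saving them. The sketch below is one way to do that in Python; overlapping_properties is a hypothetical helper, and the existing_properties sample is a stand-in for the property names you would read from your own index mapping.

```python
import json


def overlapping_properties(existing_properties, additional_type_mappings_json):
    """Return the field names that appear in both mappings, sorted."""
    additional = json.loads(additional_type_mappings_json)
    new_properties = additional["LiferayDocumentType"].get("properties", {})
    return sorted(set(existing_properties) & set(new_properties))


# Stand-in for the "properties" object of the current mapping.
existing_properties = {"title": {"type": "string"}, "content": {"type": "string"}}

# The additionalTypeMappings value from the example above.
additional_json = """
{
    "LiferayDocumentType": {
        "properties": {
            "fooName": {"index": "not_analyzed", "store": "yes", "type": "string"}
        }
    }
}
"""

conflicts = overlapping_properties(existing_properties, additional_json)
print(conflicts or "no overlapping properties")
```

An empty result means the new properties are safe to add through additionalTypeMappings; any name that is reported belongs in overrideTypeMappings instead.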

To see that your additional mappings have been added to the LiferayDocumentType, navigate to this URL after saving your additions and reindexing:

http://[HOST]:[ES_PORT]/liferay-[COMPANY_ID]/_mapping/LiferayDocumentType?pretty

Here’s what it would look like for an Elasticsearch instance running on localhost:9200, with a Liferay DXP Company ID of 20116:

http://localhost:9200/liferay-20116/_mapping/LiferayDocumentType?pretty

In the above URL, liferay-20116 is the index name. Including it indicates that you want to see the mappings that were used to create the index with that name.
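
If you check mappings often, the URL can be assembled and fetched from a script. This is a minimal sketch using only the Python standard library; mapping_url and fetch_mapping are hypothetical names, and fetch_mapping assumes the Elasticsearch HTTP port is reachable without authentication from where the script runs.

```python
import json
import urllib.request


def mapping_url(host, es_port, company_id):
    """Build the _mapping URL for the company's Liferay index."""
    return "http://%s:%s/liferay-%s/_mapping/LiferayDocumentType?pretty" % (
        host, es_port, company_id)


def fetch_mapping(host, es_port, company_id, timeout=10):
    """Fetch and parse the mapping; needs a running Elasticsearch node."""
    url = mapping_url(host, es_port, company_id)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


url = mapping_url("localhost", 9200, 20116)
print(url)
```

Calling fetch_mapping("localhost", 9200, 20116) against a running node would return the same document the browser shows, parsed into a dictionary.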

Overriding Type Mappings

Use overrideTypeMappings to override Liferay DXP’s default type mappings. This is an advanced feature that should be used only if strictly necessary. If you set this value, the default mappings used to define the Liferay Document Type in Liferay DXP source code (for example, liferay-type-mappings.json) are ignored entirely, so include the whole mappings definition in this property, not just the segment you’re modifying. To make a modification, find the entire list of the current mappings being used to create the index by navigating to the URL

http://[HOST]:[ES_PORT]/liferay-[COMPANY_ID]/_mapping/LiferayDocumentType?pretty

Copy the contents in as the value of this property (either into System Settings or your OSGi configuration file). Leave the opening curly brace {, but delete lines 2-4 entirely:

"liferay-[COMPANY_ID]": {
    "mappings" : {
        "LiferayDocumentType" : {

Then, from the end of the mappings, delete the concluding three curly braces.

        }
    }
}

Now modify whatever mappings you’d like. The changes take effect once you save the changes and trigger a reindex from Server Administration. If you need to add new custom mappings without overriding any defaults, use additionalTypeMappings instead.

Here’s a partial example of a dynamic template that uses the analysis configuration above to analyze all string fields that end with _ja, overriding the default template_ja mapping.

{
    "LiferayDocumentType": {
        "dynamic_templates": [
            {
                "template_ja": {
                    "mapping": {
                        "analyzer": "kuromoji_liferay_custom",
                        "index": "analyzed",
                        "store": "true",
                        "term_vector": "with_positions_offsets",
                        "type": "string"
                    },
                    "match": "\\w+_ja\\b|\\w+_ja_[A-Z]{2}\\b",
                    "match_mapping_type": "string",
                    "match_pattern": "regex"
                }
                ...
            }
        ]
    }
}

Multi-line YAML Configurations

If you configure the settings from the last section using an OSGi configuration file, you might find yourself needing to write YAML snippets that span multiple lines. The syntax for that is straightforward: put a backslash after the equals sign, then end each line except the last with \n\, like this:

additionalConfigurations=\
                    cluster.routing.allocation.disk.threshold_enabled: false\n\
                    cluster.service.slow_task_logging_threshold: 600s\n\
                    index.indexing.slowlog.threshold.index.warn: 600s\n\
                    index.search.slowlog.threshold.fetch.warn: 600s\n\
                    index.search.slowlog.threshold.query.warn: 600s\n\
                    monitor.jvm.gc.old.warn: 600s\n\
                    monitor.jvm.gc.young.warn: 600s
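
Typing the line endings by hand is error-prone, so you may prefer to generate the property from a plain list of YAML lines. The following Python sketch reproduces the layout shown above; to_multiline_property is a hypothetical helper, and the indentation width is an arbitrary choice.

```python
def to_multiline_property(key, yaml_lines, indent="    "):
    """Join YAML lines into one .config property value.

    Every line except the last ends with a literal backslash-n followed
    by a line-continuation backslash, matching the example above.
    """
    continuation = "\\n\\\n"  # the characters \n\ and then a real newline
    body = continuation.join(indent + line for line in yaml_lines)
    return key + "=\\\n" + body


example = to_multiline_property(
    "additionalConfigurations",
    [
        "cluster.routing.allocation.disk.threshold_enabled: false",
        "monitor.jvm.gc.young.warn: 600s",
    ],
)
print(example)
```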

Troubleshooting Elasticsearch

Sometimes things don’t go as planned. If you’ve set up Liferay DXP with Elasticsearch in remote mode, but Liferay DXP can’t connect to Elasticsearch, check these things:

  • Cluster name: The value of the cluster.name property in Elasticsearch must match the clusterName property you configured for Liferay’s Elasticsearch adapter.

  • Transport address: The value of the transportAddresses property in the Elasticsearch adapter must contain the host and port where Elasticsearch is listening. If Liferay DXP is running in embedded mode and you start a standalone Elasticsearch node or cluster, the standalone node detects that port 9300 is taken and switches to port 9301. If you then set Liferay’s Elasticsearch adapter to remote mode, it continues to look for Elasticsearch at the default port (9300).
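
Both checks can be scripted. The sketch below, in Python with only the standard library, compares the two cluster names from the text of the configuration files and tests whether a transport port accepts TCP connections. The function names are hypothetical, the parsing assumes the simple one-line forms used in this article, and an open port only shows that something is listening there, not that it is the node you expect.

```python
import re
import socket

# Default cluster name in Liferay DXP, as stated earlier in this article.
DEFAULT_CLUSTER_NAME = "LiferayElasticsearchCluster"


def yaml_cluster_name(elasticsearch_yml_text):
    """Return the cluster.name value from elasticsearch.yml text, or None."""
    match = re.search(r"^\s*cluster\.name:\s*(\S+)",
                      elasticsearch_yml_text, re.MULTILINE)
    return match.group(1).strip("\"'") if match else None


def adapter_cluster_name(config_text):
    """Return clusterName from the adapter's .config text, or the default."""
    match = re.search(r'^\s*clusterName="([^"]*)"', config_text, re.MULTILINE)
    return match.group(1) if match else DEFAULT_CLUSTER_NAME


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


names_match = (
    yaml_cluster_name("cluster.name: LiferayElasticsearchCluster\n")
    == adapter_cluster_name('operationMode="REMOTE"\n')
)
print("cluster names match:", names_match)
```

For the port scenario described above, calling port_is_open with ports 9300 and 9301 on the Elasticsearch host shows which one the standalone node actually bound to.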

Now you have Elasticsearch configured for use. If you’re a Liferay DXP customer, you can read here to learn about configuring Shield to secure your Elasticsearch data.

Elasticsearch Connector System Settings, By Operation Mode

Some of the settings available for the Elasticsearch connector are applicable for only one operation mode (REMOTE or EMBEDDED). Refer to the table below:

Adapter Setting / Operation Mode      EMBEDDED  REMOTE
clusterName                           x         x
operationMode                         x         x
indexNamePrefix                       x         x
indexNumberOfReplicas*                x         x
indexNumberOfShards*                  x         x
bootstrapMlockAll                     x         -
logExceptionsOnly                     x         x
retryOnConflict                       x         x
discoveryZenPingUnicastHostsPort      x         -
networkHost                           x         -
networkBindHost                       x         -
networkPublishHost                    x         -
transportTcpPort                      x         -
transportAddresses                    -         x
clientTransportSniff                  -         x
clientTransportIgnoreClusterName      -         x
clientTransportPingTimeout*           -         x
clientTransportNodesSamplerInterval   -         x
httpEnabled                           x         -
httpCORSEnabled                       x         -
httpCORSAllowOrigin                   x         -
httpCORSConfigurations                x         -
additionalConfigurations              x         x
additionalIndexConfigurations         x         x
additionalTypeMappings                x         x
overrideTypeMappings                  x         x

* Note: Available in the Liferay Connector to Elasticsearch 6 only.
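
The table can double as a sanity check for a configuration file. As a rough sketch in Python, the sets below copy the single-mode rows of the table, and the hypothetical helper inapplicable_settings reports which settings in a configuration have no effect in the chosen operation mode.

```python
# Settings marked as applicable to only one mode in the table above.
EMBEDDED_ONLY = {
    "bootstrapMlockAll", "discoveryZenPingUnicastHostsPort", "networkHost",
    "networkBindHost", "networkPublishHost", "transportTcpPort",
    "httpEnabled", "httpCORSEnabled", "httpCORSAllowOrigin",
    "httpCORSConfigurations",
}
REMOTE_ONLY = {
    "transportAddresses", "clientTransportSniff",
    "clientTransportIgnoreClusterName", "clientTransportPingTimeout",
    "clientTransportNodesSamplerInterval",
}


def inapplicable_settings(operation_mode, setting_names):
    """Return the sorted settings that do not apply to operation_mode."""
    if operation_mode == "REMOTE":
        ignored = EMBEDDED_ONLY
    elif operation_mode == "EMBEDDED":
        ignored = REMOTE_ONLY
    else:
        raise ValueError("operation_mode must be EMBEDDED or REMOTE")
    return sorted(set(setting_names) & ignored)


unused = inapplicable_settings(
    "REMOTE", ["operationMode", "networkHost", "transportAddresses"])
print(unused)
```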

Introduction to Liferay Search

Customizing Liferay Search

[1] This is, of course, a nod to all those fans of Boaty McBoatface.
