Liferay DXP 7.0 has close ties to Elasticsearch for search-related features. As it happens, Elasticsearch is one part of the ELK stack. So it is natural to ask, "What is possible if I use the whole ELK stack?"
The answer is that there are many opportunities. One option, presented here, is building "dashboards" for DXP 7.0: charts and graphs that present complex information in a graphical format. Dashboards are popular because they synthesize information into a quickly consumable format.
This article walks through creating some dashboard graphs using the ELK stack (it assumes you already have the ELK stack installed).
The details presented here were developed and tested using:
- Liferay DXP 7.0 FP22+
- Elasticsearch 2.3.3
- Kibana 4.5.1
- Logstash 2.3.2
- Apache HTTPd 2.4
Liferay DXP
The dashboards being built are going to leverage the Liferay Audit functionality built into Liferay DXP. For that reason they will not work in Liferay Portal 7.0 CE, which lacks this functionality. Although it is possible to build an auditing layer for Liferay Portal 7.0 CE, that is outside of the scope of this document.
Elasticsearch
As mentioned previously, DXP 7.0 uses Elasticsearch for search-related features. In production environments, ES will be installed separately outside of the application container and likely in a back end system away from public view.
Kibana
Kibana is a visualization tool built on top of Elasticsearch. In Kibana one can run search queries against ES and build visualizations, as charts and graphs, based on the results. This document will present a few simple visualizations, but Kibana supports using complex queries to build fascinating visualizations. Complex queries and visualizations are, however, outside the scope of this document.
Kibana should also be installed in the back end, either on an ES server or a dedicated server. Client browsers will need to hit the Kibana server for resources, but this should be done through a proxy. In this document Apache httpd will be used to proxy requests to the Kibana server.
Logstash
The logstash name is misleading, giving the impression that it is a tool only for log files. Actually logstash is an Extract, Transform and Load (ETL) tool suited to extracting data from unstructured data sources and sending it to another location. While it started as a tool to extract information from log files and load it into Elasticsearch as documents, a wide variety of plugins have been developed that allow logstash to work with databases, files, Twitter feeds, etc. The filter and transformation mechanisms help turn extracted data into documents appropriate for loading into Elasticsearch.
Beats
A related tool to logstash is the beats framework. Logstash can be a resource-intensive tool, one that may be too heavy to install on every production system. The beats framework, including the Filebeat tool, is the solution. The beats framework provides lightweight agents that do some local reading and filtering before shipping data over the network to a centralized logstash instance; this allows the beats agents to be installed on the systems where the data lives while offloading the bulk of the logstash processing to another (dedicated) server.
This document does not use the beats tools, but in a production environment it would be advisable to install logstash in the back end (either on the ES server or perhaps a dedicated server) and use Filebeat agents on the other systems to handle data submission to logstash.
Apache HTTPd
This document is using Apache HTTPd to proxy requests to the Liferay Tomcat instance and the back end Kibana server.
Resolution
The implementation presented here was designed to satisfy the following requirements:
- Use the ELK stack to create a dashboard to show login activity:
  - Logins per day for the current week.
  - Logins per hour for the current day.
- Use the ELK stack to create a dashboard to show form submissions per day.
- Use the ELK stack to create a dashboard to show page popularity.
As the data needed to build these dashboards is activity based, the Audit framework will be leveraged to handle the data collection. The architecture diagram looks something like:
Using the Audit Framework separates the logic that generates the events from the code that maintains the audit file. It also means that out-of-the-box examples for generating audit messages are available right in the DXP source.
Implementation Details
To build out the dashboards to satisfy the requirements, the following will need to be generated:
- Custom auditing code to generate appropriate audit messages to provide the data needed for the dashboards. The login dashboards can be satisfied using the OOTB User model listener audit event generator.
- An AuditMessageProcessor to receive audit messages and write them to the audit file.
- A logstash configuration file to process the audit file and load documents into ES.
- Kibana visualizations for each of the dashboards.
- Apache setup to proxy messages to the Liferay/Tomcat and Kibana servers.
Auditing Code
For the dashboards, audit information will need to be triggered for the following actions:
- User Logins—Fortunately this is already handled by the OOTB audit event generation code in com.liferay.portal.security.audit.event.generators.events.LoginPostAction. When a login occurs, this post login handler generates an audit message, and these audit messages can be used for the dashboard graphs.
- Form Submissions: When forms are submitted in DXP 7.0, the form fields are serialized and stored as a DDMContent record, so a DDMContent ModelListener is needed to identify when forms are saved, updated or deleted. During testing, it was found that the DDMContent is saved before the corresponding DDMStorageLink element needed to deserialize the form data, so a ModelListener on DDMStorageLinks was added to audit the initial form submit (a sketch of such a listener follows this list).
- Page views can be audited in many ways, but for this implementation a ServicePreAction is used. Using code adapted from the LayoutAction class, the page about to be rendered is identified and audited.
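To give a feel for the shape of these generators, below is a minimal sketch of a ModelListener that audits new DDMStorageLink records. It is not the code from the attached project: the event type, the way the current user is resolved and the AuditMessage constructor used here are simplifying assumptions, and a real listener would also cover updates and deletes.

import com.liferay.dynamic.data.mapping.model.DDMContent;
import com.liferay.dynamic.data.mapping.model.DDMStorageLink;
import com.liferay.portal.kernel.audit.AuditMessage;
import com.liferay.portal.kernel.audit.AuditRouter;
import com.liferay.portal.kernel.model.BaseModelListener;
import com.liferay.portal.kernel.model.ModelListener;
import com.liferay.portal.kernel.security.auth.CompanyThreadLocal;

import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Reference;

/**
 * Sketch of a model listener that emits an audit message when a
 * DDMStorageLink is created, i.e. when a form record has been submitted.
 */
@Component(immediate = true, service = ModelListener.class)
public class DDMStorageLinkAuditModelListener
	extends BaseModelListener<DDMStorageLink> {

	@Override
	public void onAfterCreate(DDMStorageLink storageLink) {
		try {
			// "ADD" matches the event type searched for later in Kibana.
			// The zero userId and blank userName are placeholders; a real
			// listener resolves the current user before building the message.
			AuditMessage auditMessage = new AuditMessage(
				"ADD", CompanyThreadLocal.getCompanyId(), 0, "",
				DDMContent.class.getName(),
				String.valueOf(storageLink.getClassPK()));

			_auditRouter.route(auditMessage);
		}
		catch (Exception e) {
			// Auditing should never break the save; log and continue.
		}
	}

	@Reference
	private AuditRouter _auditRouter;

}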
In the attached project, the relevant classes are:
- com.liferay.portal.security.audit.event.generators.DDMContentModelListener
- com.liferay.portal.security.audit.event.generators.DDMStorageLinkModelListener
- com.liferay.portal.security.audit.event.generators.AuditServicePreAction
There are also some utility classes in the com.liferay.portal.security.audit.generators.util package used to create AuditMessage instances.
AuditMessageProcessor
In the DXP audit handling mechanism, AuditMessage instances are put on the Liferay Message Bus. The default message bus listener, com.liferay.portal.security.audit.router.internal.DefaultAuditRouter, will receive each audit message and forward it to each registered AuditMessageProcessor instance.
In order to leverage the logstash component of the ELK stack, the implementation presented here will be writing the audit messages to a JSON file. Each audit message will be serialized into JSON and written to a rotating file.
The implementation class for this is com.liferay.portal.security.audit.json.log.JsonLoggingAuditMessageProcessor and its corresponding configuration class is com.liferay.portal.security.audit.json.log.JsonLoggingAuditMessageProcessorConfiguration.
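For reference, here is a trimmed-down sketch of what a JSON-writing processor can look like. The AuditMessageProcessor interface and AuditMessage.toJSONObject() come from the audit API; the hard-coded file path, the lack of rotation and the eventTypes registration property are simplifications and assumptions compared to the attached project.

import com.liferay.portal.kernel.audit.AuditMessage;
import com.liferay.portal.kernel.log.Log;
import com.liferay.portal.kernel.log.LogFactoryUtil;
import com.liferay.portal.security.audit.AuditMessageProcessor;

import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;

import org.osgi.service.component.annotations.Component;

/**
 * Sketch: serializes each incoming AuditMessage to a line of JSON and
 * appends it to a file that logstash can tail.
 */
@Component(
	immediate = true, property = "eventTypes=*",
	service = AuditMessageProcessor.class
)
public class SketchJsonAuditMessageProcessor implements AuditMessageProcessor {

	@Override
	public void process(AuditMessage auditMessage) {
		// One JSON object per line keeps the logstash json codec happy.
		String json = auditMessage.toJSONObject().toString();

		synchronized (this) {
			try (Writer writer = new FileWriter(_AUDIT_FILE, true)) {
				writer.write(json);
				writer.write(System.lineSeparator());
			}
			catch (IOException ioe) {
				_log.error("Unable to write audit message", ioe);
			}
		}
	}

	// Hard coded for the sketch; the real processor gets the path and
	// rotation settings from its configuration class.
	private static final String _AUDIT_FILE = "logs/json/audit.json";

	private static final Log _log = LogFactoryUtil.getLog(
		SketchJsonAuditMessageProcessor.class);

}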
Deploy And Generate Events
Building out visualizations requires some documents in the ES index. Build and deploy the bundle to a DXP environment, then start generating events: log in and out a few times, navigate around in the portal, and define a form and submit some values.
As you go through the next steps, keep coming back to the environment and generating new events; this provides a span of time over which all of the events can be reported.
Logstash Configuration File
The logstash configuration file defines the input, filters and output for the particular process.
Input Configuration
For this implementation, the input is the audit json file:
input {
  file {
    # For the demo just using a fixed path to file.
    path => "/Users/dnebinger/liferay/clients/liferay/elk/bundles/logs/json/audit.json"

    # Since the file may roll over, we should start at the beginning
    start_position => beginning

    # Don't ignore older records
    ignore_older => 0

    # Our file is a json file so use it for the codec.
    codec => "json"
  }
}
Filter Configuration
The filter is going to do some transformations on the records as they are read in:
filter {
  # Apply some changes to the incoming records
  mutate {
    add_field => {
      # Copy the eventType field to the action field.
      "action" => "%{eventType}"
    }

    # Strip out the values that we don't care to include in the index.
    remove_field => [ "sessionID", "path", "host" ]
  }

  # If path is provided, clone to the not analyzed and not indexed fields.
  if "" in [additionalInfo][path] {
    mutate {
      add_field => { "[additionalInfo][pathUrl]" => "%{[additionalInfo][path]}" }
      add_field => { "[additionalInfo][pathString]" => "%{[additionalInfo][path]}" }
    }
  }

  # The audit timestamp is mashed together, we need to extract it out.
  # Date follows the following format: 20160608135725305
  date {
    match => [ "timestamp", "yyyyMMddHHmmssSSS" ]
  }
}
Output Configuration
For the output, the records will be uploaded to Elasticsearch:
output {
  # Target elasticsearch
  elasticsearch {
    # Point at the backend server. If a cluster we'd use multiple hosts.
    hosts => ["192.168.1.2"]

    # Specify the index where our records should go
    index => "audit-%{+YYYY.MM.dd}"
  }

  # For debugging purposes, also write the records to the console.
  stdout { codec => rubydebug }
}
Running Logstash
Preparing the Index
Before loading the index, some fields will be manually defined to disable analysis and indexing. These kinds of changes have to be made before the index is loaded for the first time; otherwise all of the data has to be reindexed.
Issue the following command on the command line to pre-define the fields:
curl -XPUT 192.168.1.2:9200/audit/logs/_mapping -d '
{
  "logs": {
    "properties": {
      "additionalInfo": {
        "properties": {
          "pathUrl": {
            "type": "string",
            "index": "not_analyzed"
          },
          "pathString": {
            "type": "string",
            "index": "no"
          }
        }
      }
    }
  }
}
'
When the audit log is processed by logstash, the remaining fields will be added with default mappings.
Run Logstash
When the configuration file is ready, logstash can be started. From the logstash directory, run the following command.
bin/logstash agent -f audit.conf
Since some audit events have already been created, when logstash starts it should begin processing the file and you should see some of the messages going by in the console like:
{ "companyId" => "20116", "classPK" => "20164", "clientHost" => "::1", "clientIP" => "::1", "serverName" => "localhost", "className" => "com.liferay.portal.kernel.model.User", "eventType" => "LOGIN", "serverPort" => 80, "userName" => "Test Test", "userId" => "20164", "timestamp" => "20160610231917907", "@version" => "1", "@timestamp" => "2016-06-11T03:19:17.907Z", "action" => "LOGIN" }
As these messages flow by, the records are being consumed from the file, transformed and loaded as documents into Elasticsearch.
Kibana Visualizations
Kibana visualizations are actually created within the Kibana UI. The first step is defining an index pattern; the index pattern is the foundation for building a visualization.
Create Index Pattern
Create an index pattern according to the diagram below:
Define Search Queries
The next step is to build a search query. The first one is the logins by hour for the current day. Click on the Discover tab to begin.
Start by changing the timeframe in the upper right corner. Click on the link and change the timeframe to Today; this limits the records displayed to that timeframe.
Next change the index in the drop down below the Kibana banner to the newly defined audit index pattern. The page will look something like:
From the Available Fields section, add the action, userName and userId fields. This will show the timestamp for the record, the action (like NAVIGATE or LOGIN) and the user details for the event. Since we are building a query for the logins per hour, change the search bar so it reads
action:LOGIN
and hit enter.
Click the Save button to save the query as Logins. This same search will be used for the Logins by Hour and Logins by Day visualizations coming later.
Next is the page hits search. Click the New Search link to start a new search. Set the query to:
action:NAVIGATION
Add the action, userName and additionalInfo.pathUrl fields to the selected fields. Save this search as Page Hits.
For the form submission search, click the New Search link to start a new search. Set the query to:
action:ADD AND className:com.liferay.dynamic.data.mapping.model.DDMContent
Add the action and className to the selected fields and save this search as Forms Submitted.
Create Visualizations
Click the Visualize link to switch views.
Click on the Vertical Bar Chart to create the Logins by Hour visualization. Select the From a saved search option and select the Logins search.
In the upper right corner, make sure Today is the selected option. Under the Y-Axis section, set the custom label to Logins.
Under the buckets section, select the X-Axis option. Choose the Date Histogram for the Aggregation, set the Interval to Hourly, and set the Custom Label to Hour.
Click the green > button to update the visualization. Hopefully you will see something like:
Save this visualization as Logins By Hour.
For the next visualization, click the New Visualization button, choose From a saved search, and select the Logins search. For the X-Axis label, use Day. In the upper right corner, change the timeframe to This Week. Save this visualization as Logins By Day.
For the next visualization, repeat the above steps but use the Forms Submitted search. For the Y-Axis, use Forms and for the X-Axis label use Day. Save this visualization as Forms Submitted By Day.
For the last visualization, click the New Visualization button, but this time choose the Pie Chart. Use the Page Hits search. Under the bucket type, select Split Slices and use Terms for the aggregation. Use additionalInfo.pathUrl for the field, 10 for the size, and Page for the custom label.
The pie chart should look something like:
Choice - Visualization or Dashboard
There are now four different visualizations that can be used individually within Liferay. Another choice is to create a Kibana Dashboard.
Kibana Dashboards are defined on the Kibana side and act like a freeform page: visualizations can be added to a dashboard, sized, and moved around.
Either can be embedded on a Liferay page although the dashboard option wants to take up a large area of a page. For that reason, individual visualizations may work better.
Placing Visualizations on Liferay
Since Kibana lives on a back-end server, it will not normally be addressable from the browser, yet Kibana needs to expose some resources (JS and CSS) to the browser for the visualizations to work.
The IFrame portlet will be used in Liferay to place the visualization on a page, but Liferay will not proxy the resources.
To expose the Kibana server to the browser, a fronting web server will be used to route most requests to Liferay and some requests to the Kibana server.
In the setup used here, Apache HTTPd was used to send requests to Liferay/Tomcat via AJP and route /kibana/ and /bundles/ requests to the Kibana server. Kibana needed to be configured with a server.basePath setting and the proxy statements defined for httpd. These files are available in the project as src/main/resources/kibana/kibana.yml and src/main/resources/apache/mod_jk.conf.
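As a rough illustration only (the host name and worker name are placeholders, and the exact proxy mapping depends on the Kibana version; the project's kibana.yml and mod_jk.conf show the working combination), the relevant pieces look something like this:

# kibana.yml - tell Kibana it is being served under a base path
server.basePath: "/kibana"

# Apache httpd (mod_jk.conf) - route Kibana UI and resource requests to
# the back-end Kibana server...
ProxyPass        /kibana/  http://kibana-host:5601/kibana/
ProxyPassReverse /kibana/  http://kibana-host:5601/kibana/
ProxyPass        /bundles/ http://kibana-host:5601/bundles/
ProxyPassReverse /bundles/ http://kibana-host:5601/bundles/

# ...and everything else to Liferay/Tomcat over AJP.
JkMount /* liferay-worker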
Getting the Visualization URL
To configure the IFrame portlet, a URL is necessary. These URLs come from Kibana (after the server.basePath has been set and Kibana restarted).
Whether using a visualization or a dashboard, the process is the same. Open the visualization or dashboard that is going to be placed in Liferay and click the Share ... button. Two lines are available: the top is the Embed (IFrame) link and the second is the Share link. Both links are otherwise the same, but the Share link does not have the surrounding IFrame tag.
On the right side of each of these links are two buttons. The first is the Generate Short URL button, which returns a short code string for the complete visualization URL; the second is the Copy to Clipboard button. The short URL is nicer to manage, but the long form has some value too. Pick the form you want to use and copy the value. On the Liferay side, place an IFrame portlet on the page and configure it to use the URL as the source URL. If everything is working correctly, the visualization should appear on the page:
The long form of the URL actually contains all of the details used to render the visualization or dashboard. Given some time and some understanding, it is possible to build out or modify Kibana visualizations without going into Kibana to do the work. It is certainly easier to leverage the Kibana UI to define the visualizations, but it is not required.
Additional Information
Creating dashboards is really easy using the ELK stack and leveraging some Liferay functionality. The flexibility of all of the tools in the stack really opens the door to all kinds of possibilities that were not so available before.
Take a spin over to Google and search for Kibana Dashboard Examples; there are some really cool ones out there. And now they can be embedded within a Liferay page and populated with Liferay data too.
This document, these examples and this sample code only scratch the surface of the possibilities, but these are real-world examples.
One project currently underway has requirements for dashboards showing the numbers of different types of forms submitted over a time period. Using the code from this example, all form submissions result in audit records, so all form data is in Elasticsearch in a consumable format. Visualizations based on the forms and the different data types produce the individual dashboard charts the requirements mandate. The client gets all of these snazzy responsive charts and graphs, while the implementation team has to do little more than add audit event generators and define visualizations based on Elasticsearch searches.