Amazon OpenSearch Service - Introduction

Daniel Lacko · 17.12.2021 · 15 minutes

This is the first part of a 3-part series on the Amazon OpenSearch Service. This part is introductory and covers the necessary theory. If you wish to go straight to deploying, please continue with the upcoming second part of the series.

In the first part, we will take a look at:

  • Introduction to Amazon OpenSearch service.
  • What happened to the Amazon ElasticSearch service?
  • Deployment architecture.
  • Accessibility & Authentication Methods.
  • Service pricing.
  • Cost estimation.

Introduction

Amazon OpenSearch Service is a managed solution for searching, analyzing, and visualizing your data. The data is indexed into performant data structures that enable fast and easy search and filtering. You can build further on the data with monitoring systems, anomaly detection, alarms, and visualizations, and put everything into dashboards. OpenSearch is a community-driven, open-source fork of ElasticSearch and Kibana. As such, the service still offers the option to deploy either ElasticSearch or OpenSearch; the choice is yours. At the time of writing, these versions are available:

  • OpenSearch: 1.0, 1.1
  • ElasticSearch: 1.5, 2.3, 5.1, 5.3, 5.5, 5.6, 6.0, 6.2, 6.3, 6.4, 6.5, 6.7, 6.8, 7.1, 7.4, 7.7, 7.8, 7.9, 7.10

If you opt for ElasticSearch, there is also an option to migrate to OpenSearch later. Your choice should also take these notes into consideration:

  • Amazon ElasticSearch won't receive any further updates; version 7.10 is the latest you will get.
  • OpenSearch alpha was released in April 2021 and the production version (1.0) was released on the 12th of July, 2021. As such, Amazon ElasticSearch might be more stable and mature than Amazon OpenSearch.
  • If you are interested in what is to come, check OpenSearch Roadmap.

To get a better idea of what working with OpenSearch can look like, imagine you need to track the performance metrics of your AWS Lambda function. AWS Lambda includes these metrics in the REPORT statement of its invocation logs. With the right filter, we can focus on these statements:

[Screenshot: the Discover view]
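
For illustration, the metrics embedded in such a REPORT statement can also be extracted programmatically; the log line below is a made-up example following the usual REPORT format:

```python
import re

# A made-up Lambda REPORT log line following the usual format.
REPORT_LINE = (
    "REPORT RequestId: 8f507cfc-xmpl-4697-b07a-ac58fc914c95\t"
    "Duration: 162.49 ms\tBilled Duration: 163 ms\t"
    "Memory Size: 128 MB\tMax Memory Used: 73 MB"
)

def parse_report(line):
    """Extract the performance metrics from a REPORT statement."""
    metrics = {}
    for key, pattern, cast in [
        ("duration_ms", r"\bDuration: ([\d.]+) ms", float),
        ("billed_duration_ms", r"Billed Duration: ([\d.]+) ms", float),
        ("memory_size_mb", r"Memory Size: (\d+) MB", int),
        ("max_memory_used_mb", r"Max Memory Used: (\d+) MB", int),
    ]:
        match = re.search(pattern, line)
        if match:
            metrics[key] = cast(match.group(1))
    return metrics
```

In Discover itself, a simple query such as `message: REPORT` achieves the same focus (the exact field name depends on how your logs are ingested).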

Continuing this effort, we can create visualizations from these metrics and put them on a dashboard:

[Screenshot: Visualize]

These metrics are also a great candidate for anomaly detection. Once you have anomaly detection in place, you can view the anomaly history:

[Screenshot: Anomaly detection]

Anomaly detection depends on the availability of data. In this image, you can see that we are missing some data points:

[Screenshot: Anomaly detection with missing data points]

What Happened To Amazon ElasticSearch Service?

For those who are unaware of the story: the 'Amazon OpenSearch service' was formerly named the 'Amazon ElasticSearch service'. The AWS service carried the ElasticSearch name from its release in 2015 until this year. Long story short, there are two sides to this story: Amazon and Elastic. Elastic did not like the way it was treated, both from a legal and an open-source point of view, so it changed the licensing of its products; Amazon responded by creating a fork of ElasticSearch 7.10 - OpenSearch. This is also the reason why you won't get any further updates for Amazon ElasticSearch. If you would like to read more about the story, check the Elastic.co blog.

Deployment Architecture

In general, these parameters have the highest impact on the cluster performance:

Availability zones - Deployments using only 1 Availability Zone are prone to outages and possibly even complete data loss, caused by events such as AWS outages or natural disasters. A single Availability Zone can also mean higher latency for people/resources accessing the OpenSearch cluster from a more remote location. Deployments using 2 or 3 Availability Zones run in data centers isolated from each other, which makes them far more resilient to outages and natural disasters, and latency can be lower as the deployment covers a wider geographical area.

Master nodes - Dedicated master nodes offload cluster management and maintenance tasks; they don't hold any data. Having dedicated master nodes increases the stability of the cluster. You can deploy a single master node, but then OpenSearch has no backup master, and if it fails, your cluster will be down. The recommended number of master nodes is 3, and you should always choose an odd number. Only one master node is active at a time; the rest sit idle and take over in case of failure. You are charged for the idle nodes as well, even though their only purpose is to back up the active master.

Master node instance type - The master node instance type depends on the total number of nodes (master + data), indices, and shards. The more data nodes you have, the larger the master instance type should be. AWS recommends the following:

+------------+----------------------------------------------------+
| Node count | Recommended minimum dedicated master instance type |
+------------+----------------------------------------------------+
| 1-10       | m5.large.search OR m6g.large.search                |
| 10-30      | c5.xlarge.search OR c6g.xlarge.search              |
| 30-75      | c5.2xlarge.search OR c6g.2xlarge.search            |
| 75-200     | r5.4xlarge.search OR r6g.4xlarge.search            |
+------------+----------------------------------------------------+
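
The table can be expressed as a small lookup helper; the function below is purely illustrative and simply mirrors the tiers above (using the Graviton variants):

```python
# Purely illustrative helper mirroring the AWS sizing table above
# (Graviton variants; swap in the x86 types if you prefer them).
def recommended_master_type(node_count):
    if node_count <= 10:
        return "m6g.large.search"
    if node_count <= 30:
        return "c6g.xlarge.search"
    if node_count <= 75:
        return "c6g.2xlarge.search"
    if node_count <= 200:
        return "r6g.4xlarge.search"
    raise ValueError("clusters above 200 nodes need a case-by-case review")
```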

Data nodes - Data nodes hold the data and execute operations such as uploading and querying it. You should always have at least 2 data nodes for high availability. The number of nodes depends on the total storage you need and the maximum storage allowed per instance type. You can start with a minimal number of data nodes, using the smallest instance type that can hold your data for your retention period.

Data node instance type - The data node instance type dictates the vCPU count, the amount of RAM, and the maximum amount of storage the node can hold. It's more or less trial and error until you find a well-balanced configuration. If you are running out of space but your CPU load is stable, it's better to just add another data node of the same type (scale out). If you are having performance issues and your CPU load is constantly high, it's time to scale up to a larger instance type. Use CloudWatch monitoring to review CPU/memory/storage usage.
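
As a sketch of that monitoring step, the relevant CloudWatch metrics (domain metrics live in the AWS/ES namespace) can be queried via boto3; the domain name and account ID below are placeholders:

```python
from datetime import datetime, timedelta

# Placeholder domain name and account ID; OpenSearch domain metrics
# are published to the "AWS/ES" CloudWatch namespace.
now = datetime.utcnow()
params = {
    "Namespace": "AWS/ES",
    "MetricName": "CPUUtilization",
    "Dimensions": [
        {"Name": "DomainName", "Value": "my-domain"},
        {"Name": "ClientId", "Value": "123456789012"},
    ],
    "StartTime": now - timedelta(hours=24),   # last 24 hours
    "EndTime": now,
    "Period": 3600,                           # one datapoint per hour
    "Statistics": ["Average", "Maximum"],
}

# import boto3
# datapoints = boto3.client("cloudwatch").get_metric_statistics(**params)["Datapoints"]
```

The same query shape works for FreeStorageSpace and the JVM memory pressure metrics when deciding between scaling out and scaling up.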

Warm/Cold storage nodes - Storage nodes are used for read-only data. Warm storage is for frequently accessed read-only data; cold storage is for infrequently accessed data. Warm storage uses S3 in the backend with a caching layer on top for reduced latency; cold storage also uses S3, but no compute is involved. Storage nodes can only be deployed to a cluster with dedicated master nodes. When you query data in warm storage, the data is moved from Amazon S3 to local storage and processed, which requires compute power. If you encounter high latency while querying warm storage, you need to scale out or up. A good approach is to test the performance of warm storage with an example data set from your workload while monitoring the warm storage node metrics.
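
The parameters discussed above map onto the ClusterConfig of a boto3 create_domain call. The following is a sketch only; the domain name, instance types, and counts are illustrative, not recommendations:

```python
# Sketch of the parameters above as a boto3 create_domain ClusterConfig;
# the instance types and counts are illustrative, not recommendations.
cluster_config = {
    "ZoneAwarenessEnabled": True,                        # multi-AZ deployment
    "ZoneAwarenessConfig": {"AvailabilityZoneCount": 3},
    "DedicatedMasterEnabled": True,
    "DedicatedMasterCount": 3,                           # odd number: 1 active, 2 standby
    "DedicatedMasterType": "m6g.large.search",
    "InstanceType": "r6g.large.search",                  # data nodes
    "InstanceCount": 6,
    "WarmEnabled": True,                                 # UltraWarm for read-only data
    "WarmType": "ultrawarm1.medium.search",
    "WarmCount": 2,
}

# import boto3
# boto3.client("opensearch").create_domain(
#     DomainName="my-domain",
#     ClusterConfig=cluster_config,
#     EBSOptions={"EBSEnabled": True, "VolumeSize": 100},
# )
```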

It's an industry standard to develop a project in multiple stages, and your architecture should reflect your use case and requirements for each stage. Each stage comes with different requirements: the closer you get to production, the more scaled out/up the resources are, the higher the usage/traffic, the more data is generated, and the less tolerable failure/downtime becomes. In the next few sections, we will describe each stage. Of course, everything depends on the type and scale of the project you are running; some companies might run a production cluster whose requirements fall below even your development cluster. Let's consider these 4 stages:

  • Test
  • Development
  • Staging
  • Production

Test Stage

OpenSearch usage:

  • You are trying out new technology and experimenting
  • You are aggregating only dummy/disposable data for the sake of learning/experimenting
  • Usage/traffic is very low
  • Data throughput is very low
  • Downtime is expected regularly

Architecture recommendations:

  • 1 Availability zone
  • No master nodes
  • 1-2 data nodes
  • Minimal storage per node (10GB)

Development stage

OpenSearch usage:

  • OpenSearch is part of the development process and helps with log inspection and debugging
  • You are aggregating all the data from the development stage of your project
  • Usage/traffic is low
  • Data throughput is low
  • Downtime is expected, but should not hinder the development process too much

Architecture recommendations:

  • 1 Availability zone
  • No master nodes
  • 2-5 data nodes
  • Storage per node according to your data throughput

Staging stage

If your staging stage is available to your customers as an open alpha/beta, then you might consider applying production stage recommendations, otherwise, you can use the development stage recommendations. In the end, it depends on the usage/traffic and data throughput.

Production stage

OpenSearch usage:

  • OpenSearch is a critical part of debugging errors on the production stage
  • You are aggregating all the data from the production stage of your project
  • Usage/traffic is high
  • Data throughput is high
  • Downtime is not desired

Architecture recommendations:

  • 2-3 Availability zones
  • At least 3 master nodes
  • At least 5 data nodes per 1 master node
  • Storage per node according to your data throughput
  • Use warm/cold storage nodes for historical data (good example is moving historical data to storage nodes due to legal obligation of holding the data for certain period of time)
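
As a sketch of the last recommendation, migrating indices to warm storage after a given age can be automated with an Index State Management (ISM) policy; the state names and the 30-day threshold below are illustrative:

```python
# Illustrative ISM (Index State Management) policy that migrates indices
# to warm storage once they are 30 days old; requires UltraWarm nodes.
ism_policy = {
    "policy": {
        "description": "Move month-old indices to warm storage",
        "default_state": "hot",
        "states": [
            {
                "name": "hot",
                "actions": [],
                "transitions": [
                    {"state_name": "warm",
                     "conditions": {"min_index_age": "30d"}}
                ],
            },
            {
                "name": "warm",
                "actions": [{"warm_migration": {}}],  # hot -> warm migration
                "transitions": [],
            },
        ],
    }
}
```

The policy would be registered via the `_plugins/_ism/policies` REST endpoint and attached to indices with an index pattern.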

Accessibility & Authentication Methods

Amazon OpenSearch Service incorporates a fork of the Kibana dashboard (OpenSearch Dashboards). Through this dashboard, you can access the data stored in OpenSearch and run queries and visualizations on it. The question is: how do you want to access this dashboard? You have 2 options for accessibility:

  • Public access from the Internet
  • VPC access

Authentication is also one of the things you have to configure right from the start. This part of the configuration is solely up to you. You have these options:

  • IAM user / IP-based access
  • HTTP basic auth
  • Cognito
  • SAML (external SSO providers)
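
For illustration, HTTP basic auth is just an Authorization header on requests to the domain's REST API; the endpoint and credentials below are placeholders:

```python
import base64
import urllib.request

# Placeholder endpoint and credentials.
endpoint = "https://search-my-domain.eu-central-1.es.amazonaws.com"
user, password = "admin", "s3cret"

# HTTP basic auth is a base64-encoded "user:password" pair in a header.
token = base64.b64encode(f"{user}:{password}".encode()).decode()
request = urllib.request.Request(
    f"{endpoint}/_cluster/health",
    headers={"Authorization": f"Basic {token}"},
)
# response = urllib.request.urlopen(request)  # run against a real domain
```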

Pricing

You are charged based on 4 metrics:

  • Instance uptime (compute) / per hour
  • Type of storage (standard EBS vs. provisioned IOPS) / per month
  • Amount of allocated storage / per month
  • Data transfer / per GB

Amazon OpenSearch is not a cheap service, and if your workloads generate a large amount of logs/data, that will translate directly into the requirements (and price) of the OpenSearch cluster you will need.

Cost Estimation

When it comes to cost estimation, you need to have a good understanding of your cluster requirements, data throughput, and data retention.

  • Cluster requirements: How many data/master nodes do I need?
  • Data throughput: How much data is my project generating? How many logs per minute? How big on average is 1 log?
  • Data retention: How long do I need to keep the data?

Data throughput can be tricky: OpenSearch stores the data with an overhead. The best way to estimate the throughput is to deploy a test cluster for a short period, feed it the data, and calculate the average. The minimal cluster you can deploy is this:

  • No master nodes
  • 1 data node: t3.small.search (2 vCPU, 2 GB RAM)
  • Standard EBS storage
  • Allocated 10GB of storage (per node)
  • Total cost: $31.38/month (Frankfurt, eu-central-1)
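
A back-of-the-envelope storage estimate along these lines might look as follows; every number, including the ~10% indexing overhead, is illustrative and should be replaced with measurements from your test cluster:

```python
# Every number here is illustrative; replace with measured values.
logs_per_minute = 2_000
avg_log_size_kb = 1.5        # average size of one log entry
retention_days = 30
index_overhead = 1.1         # ~10% indexing overhead; measure your own
replicas = 1                 # one replica copy of each shard

daily_gb = logs_per_minute * 60 * 24 * avg_log_size_kb / 1024 / 1024
total_gb = daily_gb * retention_days * index_overhead * (1 + replicas)
print(f"{daily_gb:.1f} GB/day -> {total_gb:.0f} GB total storage needed")
```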

Once you have all the numbers, you can use the AWS cost calculator.

Conclusion

This part should have given you an idea of what OpenSearch is, what you can do with it, its history with ElasticSearch, the options for deploying the service, how the pricing works, and how to create a cost estimate.

The second part will be mostly practical as we will guide you through the deployment of publicly accessible OpenSearch.

© 2022 Created by Remastr. All rights reserved.
