Friday, March 21, 2014

Elastic Search: Preparing for Prod Deployment

Elastic search comes with pretty good defaults. But they may not be suitable for your prod environment. I am giving here a list of most important configuration options in general with reasons. In all means this might not be the complete list for your requirement, so please explore all possible options you want to look for. One needs to give a serious thought to various configuration options based on requirements of the application before moving to prod environment.

  • Set cluster name to your application specific name, should be different from cluster name in your QA/Dev environments to prevent unwanted node to join the cluster in prod.
    • mycluster
  • Set node name to some meaningful name specific to application
    • mynode
  • Set paths to locations outside of ES installation directory to avoid any overwriting while upgrading to next versions
    • path.conf: path to directory containing elasticsearch.yml and logging.yml
    • path to directory where ES stores index related data
    • path to directory to hold temporary files
    • path.logs: path to directory where ES generates log files
  • Avoid deleting all indices by mistake
    • action.disable_delete_all_indices: true
  • If you do not want to create index automatically set following property to false. This may be desired if you want to add custom settings to index i.e. custom analyzers, number of shards and replicas
    • action.auto_create_index: false
  • Set the following property to false if you want to disable schemaless feature which may desired in prod. ES will not create mapping for unmapped types automatically if set to false.
    • index.mapper.dynamic: false
  • Specify default field for query, search is performed against this field when no field is specified. By default it's _all
    • index.query.default_field: _all
  • Set type of storage, set it to non-blocking I/O type for file system based storage as given below
    • niofs
  • Set the number of shards of an index, 5 is by default. This value can be overridden by specifying it in index settings json while creating index via REST API.
    • index.number_of_shards: n
  • Set the number of replicas of an index, 1 is by default. This value can be overridden by specifying it in index settings json while creating index via REST API.
    • index.number_of_replicas: 1
  • set minimum number of nodes that should come up after full restart of cluster before they start replication of data nodes will start swapping data as soon as they come up if this property is not set. This may result in very high I/O traffic in large clusters.
    • gateway.recover_after_nodes: n
  • Prefer setting node discovery to unicast and disable multicast. Add names of few hosts to seed list of unicast.
    • false
    • ["host1", "host2:port", "host3[portX-portY]"]
  • Set node discovery timeout to higher value for slower/congested network, default value is 3 seconds
    • 3s
  • Set minimum number of nodes to quorum to elect a master in split brain situation. So master is not elected in a split brain part if total number of master-eligible nodes in that part are not more than half of total number of nodes in normal cluster. Here n is total number of master-eligible nodes in cluster. This is a dynamic setting, so it's value can be changed on live cluster as number of nodes changes by using cluster API i.e. curl -XPUT localhost:9200/_cluster/settings -d '{"persistent" : {"discovery.zen.minimum_master_nodes": 3}}'
    • discovery.zen.minimum_master_nodes: (n/2) + 1
  • Disable memory swapping, swapping kills performance. Enabling mlockall tries to lock the process address space so it won’t be swapped.
    • bootstrap.mlockall: true
    • ulimit -l unlimited (unix/linux)        
  • Number of max. open file descriptors, in the machine running ES, should be set to a high value i.e. 65535. Verify this using curl -XGET localhost:9200/_nodes/process?pretty
JVM Heap
JVM heap size, ideally half of the available memory can be set to ES JVM. Set ES_MIN_MEM and ES_MAX_MEM env variables to same value in ES script and enable mlockall as explained above. Verify this using curl -XGET localhost:9200/_nodes/jvm

  • Prefer to use SSD in place of spinning disk. SSDs can give hundred of times faster IO ops.
  • More CPU cores the better. Since ES is asynchronous it can use multiple cores efficiently.
  • Low latency network