Elasticsearch
Distributed search and analytics engine: indexing, querying, cluster management, and performance tuning
Elasticsearch is a distributed, RESTful search and analytics engine built on Apache Lucene. It provides near real-time search capabilities and is the core storage and query engine of the ELK Stack.
| Concept | Description |
|---|---|
| Index | A collection of documents with similar characteristics (analogous to a database) |
| Document | A JSON object stored in an index (analogous to a row) |
| Shard | A subdivision of an index; each shard is a self-contained Lucene index |
| Replica | A copy of a primary shard for high availability and read throughput |
| Node | A single Elasticsearch instance in a cluster |
| Cluster | A collection of nodes that together hold all data |
PUT /logs-2024
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "refresh_interval": "5s",
    "index.lifecycle.name": "logs-policy"
  },
  "mappings": {
    "properties": {
      "@timestamp": { "type": "date" },
      "level": { "type": "keyword" },
      "service": { "type": "keyword" },
      "message": { "type": "text", "analyzer": "standard" },
      "host": { "type": "keyword" },
      "duration_ms": { "type": "float" },
      "request_id": { "type": "keyword" },
      "metadata": { "type": "object", "dynamic": true }
    }
  }
}
POST /logs-2024/_doc
{
  "@timestamp": "2024-03-15T10:30:00Z",
  "level": "ERROR",
  "service": "api-gateway",
  "message": "Connection timeout to upstream service",
  "host": "prod-api-01",
  "duration_ms": 30000,
  "request_id": "abc-123"
}
POST /_bulk
{ "index": { "_index": "logs-2024" } }
{ "@timestamp": "2024-03-15T10:30:00Z", "level": "INFO", "message": "Request processed" }
{ "index": { "_index": "logs-2024" } }
{ "@timestamp": "2024-03-15T10:30:01Z", "level": "ERROR", "message": "Connection failed" }
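The bulk request above is newline-delimited JSON: one action line followed by one document line, with a trailing newline at the end of the body, sent with the Content-Type application/x-ndjson. A minimal Python sketch of assembling such a body is shown here; the function name build_bulk_body is a hypothetical helper, not part of any Elasticsearch client.

```python
import json


def build_bulk_body(index_name, documents):
    """Serialize documents into a newline-delimited JSON body for POST /_bulk.

    build_bulk_body is a hypothetical helper used for illustration only.
    """
    lines = []
    for doc in documents:
        # Action line: tells Elasticsearch to index the document that follows.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(doc))
    if not lines:
        return ""
    # The bulk API expects the body to end with a newline character.
    return "\n".join(lines) + "\n"


docs = [
    {"@timestamp": "2024-03-15T10:30:00Z", "level": "INFO", "message": "Request processed"},
    {"@timestamp": "2024-03-15T10:30:01Z", "level": "ERROR", "message": "Connection failed"},
]
bulk_body = build_bulk_body("logs-2024", docs)
print(bulk_body, end="")
```

Because json.dumps never emits raw newlines, each action and document stays on its own line, which is what the bulk parser relies on.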
GET /logs-2024/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "message": "connection timeout" } }
      ],
      "filter": [
        { "term": { "level": "ERROR" } },
        { "term": { "service": "api-gateway" } },
        {
          "range": {
            "@timestamp": {
              "gte": "2024-03-15T00:00:00Z",
              "lte": "2024-03-15T23:59:59Z"
            }
          }
        }
      ]
    }
  },
  "sort": [{ "@timestamp": "desc" }],
  "size": 50
}
GET /logs-2024/_search
{
  "size": 0,
  "query": {
    "range": {
      "@timestamp": { "gte": "now-24h" }
    }
  },
  "aggs": {
    "errors_per_service": {
      "terms": { "field": "service", "size": 20 },
      "aggs": {
        "error_count": {
          "filter": { "term": { "level": "ERROR" } }
        },
        "avg_duration": {
          "avg": { "field": "duration_ms" }
        },
        "percentile_duration": {
          "percentiles": {
            "field": "duration_ms",
            "percents": [50, 90, 95, 99]
          }
        }
      }
    },
    "errors_over_time": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "1h"
      },
      "aggs": {
        "by_level": {
          "terms": { "field": "level" }
        }
      }
    }
  }
}
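The aggregation request above returns one bucket per service, each carrying the nested error_count, avg_duration, and percentile_duration results. The following Python sketch shows one way to turn that response into per-service error rates. The sample_response values are invented purely for illustration, and summarize_services is a hypothetical helper; only the response layout (buckets, doc_count, value, values) follows the standard aggregation response format.

```python
def summarize_services(response):
    """Flatten the errors_per_service buckets into rows sorted by error rate."""
    rows = []
    for bucket in response["aggregations"]["errors_per_service"]["buckets"]:
        total = bucket["doc_count"]
        errors = bucket["error_count"]["doc_count"]  # filter agg result
        rows.append({
            "service": bucket["key"],
            "total": total,
            "errors": errors,
            "error_rate": errors / total if total else 0.0,
            "avg_duration_ms": bucket["avg_duration"]["value"],
            # Percentile keys are strings such as "99.0" in the default keyed output.
            "p99_duration_ms": bucket["percentile_duration"]["values"].get("99.0"),
        })
    return sorted(rows, key=lambda row: row["error_rate"], reverse=True)


# Invented sample data, shaped like a real response to the request above.
sample_response = {
    "aggregations": {
        "errors_per_service": {
            "buckets": [
                {
                    "key": "billing",
                    "doc_count": 200,
                    "error_count": {"doc_count": 10},
                    "avg_duration": {"value": 120.0},
                    "percentile_duration": {
                        "values": {"50.0": 80.0, "90.0": 200.0, "95.0": 300.0, "99.0": 900.0}
                    },
                },
                {
                    "key": "api-gateway",
                    "doc_count": 400,
                    "error_count": {"doc_count": 100},
                    "avg_duration": {"value": 450.0},
                    "percentile_duration": {
                        "values": {"50.0": 150.0, "90.0": 900.0, "95.0": 2000.0, "99.0": 30000.0}
                    },
                },
            ]
        }
    }
}

service_rows = summarize_services(sample_response)
for row in service_rows:
    print(row["service"], f"{row['error_rate']:.1%}", row["p99_duration_ms"])
```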
| Role | Flag | Purpose |
|---|---|---|
| Master | node.roles: [master] | Cluster state management, index creation/deletion |
| Data | node.roles: [data] | Stores data, executes search and aggregation |
| Data Hot | node.roles: [data_hot] | Stores frequently queried recent data (fast SSDs) |
| Data Warm | node.roles: [data_warm] | Stores less frequently queried data (standard disks) |
| Data Cold | node.roles: [data_cold] | Stores rarely queried data (large, slow disks) |
| Ingest | node.roles: [ingest] | Runs ingest pipelines before indexing |
| Coordinating | node.roles: [] | Routes requests, merges results (no data) |
# elasticsearch.yml (data node)
cluster.name: production
node.name: data-node-01
node.roles: [data_hot, ingest]
path.data: /var/data/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
transport.port: 9300
discovery.seed_hosts:
  - master-01:9300
  - master-02:9300
  - master-03:9300
# Security
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.http.ssl.enabled: true
PUT _ilm/policy/logs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "1d"
          },
          "set_priority": { "priority": 100 }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "shrink": { "number_of_shards": 1 },
          "forcemerge": { "max_num_segments": 1 },
          "set_priority": { "priority": 50 }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "set_priority": { "priority": 0 },
          "allocate": {
            "require": { "data": "cold" }
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
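To make the timeline of the policy above concrete, the sketch below maps an index age to the phase whose min_age it has most recently passed. It is an illustration of the thresholds only (0, 7, 30, and 90 days), not of how ILM is implemented: in a real cluster the age of a rolled-over index is counted from its rollover time, and phase transitions happen when ILM next evaluates the policy. The names PHASE_MIN_AGE_DAYS and phase_for_age are hypothetical.

```python
# min_age of each phase in the logs-policy example, expressed in days.
PHASE_MIN_AGE_DAYS = [("hot", 0), ("warm", 7), ("cold", 30), ("delete", 90)]


def phase_for_age(age_days):
    """Return the last phase whose min_age is less than or equal to age_days."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    current = None
    for name, min_age in PHASE_MIN_AGE_DAYS:
        if age_days >= min_age:
            current = name
    return current


for age in (0, 3, 7, 29, 30, 89, 90):
    print(f"{age:>3} days -> {phase_for_age(age)}")
```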
PUT _ingest/pipeline/log-pipeline
{
  "description": "Parse and enrich log data",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}"]
      }
    },
    {
      "date": {
        "field": "timestamp",
        "formats": ["ISO8601"],
        "target_field": "@timestamp"
      }
    },
    {
      "geoip": {
        "field": "client_ip",
        "target_field": "geo",
        "ignore_missing": true
      }
    },
    {
      "remove": {
        "field": "timestamp"
      }
    }
  ]
}
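The effect of the grok, date, and remove processors above can be approximated offline in Python, which is handy for checking that a log layout will parse before it reaches the cluster. This is a rough sketch under stated assumptions: the regular expression is a simplified stand-in for the TIMESTAMP_ISO8601, LOGLEVEL, and GREEDYDATA grok patterns (it accepts fewer timestamp variants), the geoip step is skipped, and the exact @timestamp string Elasticsearch stores may be formatted differently. apply_pipeline is a hypothetical name.

```python
import re
from datetime import datetime

# Simplified stand-in for "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}".
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d{3})?(?:Z|[+-]\d{2}:\d{2})?)"
    r" (?P<level>[A-Za-z]+)"
    r" (?P<msg>.*)$"
)


def apply_pipeline(doc):
    """Approximate the grok -> date -> remove steps on a single document."""
    match = LOG_LINE.match(doc["message"])
    if match is None:
        raise ValueError("message does not match the expected log layout")
    out = dict(doc)
    out.update(match.groupdict())        # grok: adds timestamp, level, msg
    raw_ts = out.pop("timestamp")        # remove: drop the temporary field
    parsed = datetime.fromisoformat(raw_ts.replace("Z", "+00:00"))
    out["@timestamp"] = parsed.isoformat()  # date: write the parsed value
    return out


parsed_doc = apply_pipeline(
    {"message": "2024-03-15T10:30:00Z ERROR Connection timeout to upstream service"}
)
print(parsed_doc)
```

For an authoritative check against the real processors, the pipeline can instead be exercised with the simulate endpoint of the ingest API.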
| Setting | Recommendation | Impact |
|---|---|---|
| refresh_interval | 30s for bulk indexing, 1s for near real-time | A longer interval gives better indexing throughput but delays search visibility |
| number_of_replicas | 0 during bulk load, then increase | No replication overhead during indexing |
| index.translog.durability | async for high throughput | Recently acknowledged writes can be lost if a node crashes |
| Bulk size | 5-15 MB per request | Balance between throughput and memory |
| Thread pool | Default is usually optimal | Monitor the rejected count |
| Strategy | Description |
|---|---|
| Use filter context | Filter clauses skip relevance scoring, and frequently used filters can be cached |
| Prefer keyword fields | Exact match on a keyword field is faster than full-text search on a text field |
| Limit _source fields | Return only needed fields with _source: ["field1", "field2"] |
| Use search_after | More efficient than from/size for deep pagination |
| Avoid leading wildcards | *term is expensive; prefer term* or an exact match |
| Shard routing | Route related documents to the same shard |
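The search_after strategy in the table above works by passing the sort values of the last hit on one page as the starting point of the next. A minimal Python sketch of building those request bodies follows. It assumes a tiebreaker sort on request_id (a keyword field in the example mapping, assumed here to be unique per document) so that hits with the same @timestamp are not skipped; next_page_body is a hypothetical helper, and the hit shown is invented.

```python
def next_page_body(base_body, last_hit=None):
    """Return a search body for the next page using search_after.

    last_hit is the final hit of the previous page, or None for the first page.
    """
    body = dict(base_body)
    body.pop("from", None)  # from/size offsets are not combined with search_after here
    if last_hit is not None:
        # Each hit carries a "sort" array when the request specifies a sort.
        body["search_after"] = last_hit["sort"]
    return body


base_body = {
    "size": 50,
    "query": {"term": {"level": "ERROR"}},
    # Assumption: request_id is unique enough to act as a tiebreaker.
    "sort": [{"@timestamp": "desc"}, {"request_id": "asc"}],
}

first_page = next_page_body(base_body)
# Invented example of the last hit returned for the first page.
last_hit_of_first_page = {"_id": "1", "sort": [1710498600000, "abc-123"]}
second_page = next_page_body(base_body, last_hit_of_first_page)
print(second_page["search_after"])
```

The sort definition must stay identical across pages; only the search_after values change.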
# jvm.options
-Xms16g
-Xmx16g
# Keep the heap at or below 50% of available RAM and below about 31 GB,
# so that compressed object pointers stay enabled
# Set Xms equal to Xmx so the heap is never resized at runtime
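The heap sizing rule in the comments above reduces to simple arithmetic: take half of the machine's RAM, cap it at 31 GB, and use the same value for both flags. The sketch below works through that calculation in Python; recommended_heap_gb and jvm_options are hypothetical helper names, and whole gigabytes are assumed for simplicity.

```python
def recommended_heap_gb(total_ram_gb, ceiling_gb=31):
    """Half of RAM, capped at ceiling_gb, and never less than 1 GB."""
    if total_ram_gb <= 0:
        raise ValueError("total_ram_gb must be positive")
    return max(1, min(total_ram_gb // 2, ceiling_gb))


def jvm_options(total_ram_gb):
    """Render matching -Xms/-Xmx lines so the heap has a fixed size."""
    heap = recommended_heap_gb(total_ram_gb)
    return [f"-Xms{heap}g", f"-Xmx{heap}g"]


for ram in (8, 32, 64, 128):
    print(f"{ram} GB RAM -> {' '.join(jvm_options(ram))}")
```

A 32 GB machine yields the 16 GB setting used in the jvm.options example, while machines with 64 GB or more stop at the 31 GB cap.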
| Metric | Healthy Range | Warning Sign |
|---|---|---|
| Cluster status | Green | Yellow or Red |
| JVM heap usage | < 75% | > 85% sustained |
| Search latency (p99) | < 500ms | > 1s |
| Indexing rate | Stable | Sudden drops |
| Disk usage | < 80% | > 85% per node |
| Pending tasks | 0 | Growing queue |
| Circuit breaker trips | 0 | Any trips |
# Cluster health
GET _cluster/health
# Node stats
GET _nodes/stats
# Index stats
GET /logs-*/_stats
# Pending tasks
GET _cluster/pending_tasks
# Hot threads (debugging)
GET _nodes/hot_threads
# Shard allocation explanation
GET _cluster/allocation/explain
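The healthy ranges and warning signs in the monitoring table can be encoded as simple threshold checks once the numbers have been collected from the APIs above. The Python sketch below shows one possible encoding. The metric names (jvm_heap_percent, search_p99_ms, disk_percent) are hypothetical labels chosen for this example rather than field names from the stats responses, and the "watch" state is an assumption that covers the gap the table leaves between the healthy range and the warning sign.

```python
# (healthy_below, warning_above) pairs taken from the monitoring table.
THRESHOLDS = {
    "jvm_heap_percent": (75, 85),
    "search_p99_ms": (500, 1000),
    "disk_percent": (80, 85),
}


def classify_metric(name, value):
    """Return 'healthy', 'watch', or 'warning' for a numeric metric."""
    healthy_below, warning_above = THRESHOLDS[name]
    if value < healthy_below:
        return "healthy"
    if value > warning_above:
        return "warning"
    return "watch"  # between the two thresholds; not defined by the table


def classify_cluster_status(status):
    """Green is healthy; yellow and red are warning signs."""
    return "healthy" if status.lower() == "green" else "warning"


# Invented readings for illustration.
readings = {"jvm_heap_percent": 88, "search_p99_ms": 320, "disk_percent": 82}
report = {name: classify_metric(name, value) for name, value in readings.items()}
report["cluster_status"] = classify_cluster_status("yellow")
print(report)
```

The table qualifies the heap warning as sustained, so a real check would look at several consecutive samples rather than a single reading.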