The Complete Guide to Fixing Slow Elasticsearch Searches: How We Made a Production Query 20x Faster

Production incident: Thursday, 2 PM, 12-second search responses

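A few months ago, the customer support team reached out urgently: “Product search is far too slow. Users are leaving the page while they wait for results.” The monitoring dashboard showed that the Elasticsearch cluster's search response time had jumped from its usual 600ms to 12 seconds.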

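The problem query was a compound search in the electronics category that filtered on price range, brand, and rating at the same time. It normally returned results quickly across 5 million documents. Once the index grew to 12 million documents ahead of the Black Friday sale, performance collapsed.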

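Impact:

Search response time: 600ms → 12 s (20x slower)
Cluster CPU: 95% utilization (normal: 40%)
Search failure rate: 15% (timeout errors)
User abandonment: about 12,000 users left while waiting for results
Revenue loss: about $38,400 (the run-up to Black Friday)
Time to recover: 3 hours 45 minutes to re-shard and optimize the queries
Result: search time 12 s → 580ms (20.7x faster!)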

This article covers hands-on techniques for analyzing and optimizing slow Elasticsearch search queries.

Key factors in Elasticsearch search performance

1. Query vs. filter context

The single most important performance difference:

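Aspect	Query	Filter
Purpose	Computes a relevance score	Yes/no match only
Caching	Not cached	Eligible for the node query cache (frequently reused filters)
Speed	Slower	Faster (no scoring)
Typical use	Full-text search, similarity	Dates, categories, statuses, ranges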

**Slow query (query context):**

{
 "query": {
 "bool": {
 "must": [
 { "match": { "description": "laptop" } },
 { "range": { "price": { "gte": 500, "lte": 2000 } } },
 { "term": { "category": "electronics" } },
 { "range": { "rating": { "gte": 4.0 } } }
 ]
 }
 }
}

// Every condition takes part in relevance scoring
// price, category, and rating are scored too, and none of them can be cached → slow!
// Execution Time: 12345.678 ms

**Fast query (filter context):**

{
 "query": {
 "bool": {
 "must": [
 { "match": { "description": "laptop" } } // ← needs scoring
 ],
 "filter": [ // ← no scoring, cacheable!
 { "range": { "price": { "gte": 500, "lte": 2000 } } },
 { "term": { "category": "electronics" } },
 { "range": { "rating": { "gte": 4.0 } } }
 ]
 }
 }
}

// Filter clauses can be cached and reused across requests
// Execution Time: 587.234 ms (← 21x faster!)

Key rules:

Full-text search: query context (match, multi_match)
Exact-value filtering: filter context (term, range, exists)
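The rule can be applied mechanically to an existing bool query. Below is a hypothetical Python helper (not part of any Elasticsearch client) that treats the request body as a plain dict. It keeps full-text clauses such as match in must and moves term, terms, range, and exists clauses into filter. It assumes each clause is a single-key dict, as in the examples above.

```python
import copy

# Clause types that only answer yes/no and belong in filter context
NON_SCORING = {"term", "terms", "range", "exists"}


def move_exact_clauses_to_filter(query: dict) -> dict:
    """Return a copy of a bool query with exact-value clauses moved from
    "must" to "filter"; full-text clauses stay in "must"."""
    result = copy.deepcopy(query)  # never mutate the caller's request
    bool_q = result["query"]["bool"]
    must, filters = [], list(bool_q.get("filter", []))
    for clause in bool_q.get("must", []):
        clause_type = next(iter(clause))  # e.g. "match", "range"
        (filters if clause_type in NON_SCORING else must).append(clause)
    bool_q["must"] = must
    bool_q["filter"] = filters
    return result


slow = {
    "query": {"bool": {"must": [
        {"match": {"description": "laptop"}},
        {"range": {"price": {"gte": 500, "lte": 2000}}},
        {"term": {"category": "electronics"}},
        {"range": {"rating": {"gte": 4.0}}},
    ]}}
}
fast = move_exact_clauses_to_filter(slow)
print([next(iter(c)) for c in fast["query"]["bool"]["must"]])    # ['match']
print([next(iter(c)) for c in fast["query"]["bool"]["filter"]])  # ['range', 'term', 'range']
```

The set of matching documents stays the same. Only scoring changes, because the moved clauses no longer contribute to each hit's score.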

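2. Shard size and count

With too many shards:

Every search request has to visit every shard → more overhead
More cluster-state management work
Wasted resources

With too few shards:

Each shard gets too large → slower searches
Less parallelism

Recommended shard size:

10GB - 50GB per shard (ideally 20-30GB)
No more than 20 shards per GB of heap (e.g., 16GB heap = at most 320 shards); this is Elastic's older rule of thumb, and newer 8.x sizing guidance revises it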

Example:

# Bad: a 500GB index in a single shard
PUT /products
{
 "settings": {
 "number_of_shards": 1, # ← far too large!
 "number_of_replicas": 1
 }
}
# One shard has to search all 500GB → very slow

# Good: the same 500GB split into 20 shards
PUT /products
{
 "settings": {
 "number_of_shards": 20, # ~25GB per shard
 "number_of_replicas": 1
 }
}
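# The 20 shards can be searched in parallel across nodes → much faster (the gain depends on available nodes and CPUs; it is not a guaranteed 20x)
# number_of_shards is fixed at index creation; changing it later means reindexing or using the _split/_shrink APIs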

Finding slow queries with the slow log

Enabling the slow log

Per-index settings:

PUT /products/_settings
{
 "index.search.slowlog.threshold.query.warn": "1s",
 "index.search.slowlog.threshold.query.info": "500ms",
 "index.search.slowlog.threshold.query.debug": "200ms",
 "index.search.slowlog.threshold.query.trace": "100ms",
 "index.search.slowlog.threshold.fetch.warn": "1s",
 "index.search.slowlog.threshold.fetch.info": "500ms",
 "index.search.slowlog.level": "info" // 7.x only; this setting was removed in 8.0
}

Checking the log:

# Elasticsearch slow log file (file name and format vary by version)
tail -f /var/log/elasticsearch/my-cluster_index_search_slowlog.log

# Sample output:
[2025-11-11T14:32:15,123][WARN ][index.search.slowlog.query] [products] took[12.3s], took_millis[12345], types[], stats[], search_type[QUERY_THEN_FETCH], total_shards[20], source[{
 "query": {
 "bool": {
 "must": [
 { "match": { "description": "laptop" } },
 { "range": { "price": { "gte": 500, "lte": 2000 } } }
 ]
 }
 }
}]

Analysis:

12.3 seconds (very slow!)
All 20 shards were searched
The range clause sits inside must, so it takes part in scoring and can't be cached → needs optimization
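A short script helps rank entries once the slow log grows large. This is a minimal Python sketch that assumes the plain-text line format in the sample above, with key[value] pairs such as took_millis[...] and total_shards[...]. Newer versions log JSON instead, so the regular expressions would need adjusting there. Continuation lines of a pretty-printed query source return None. The function names are hypothetical.

```python
import re

# Patterns for the key[value] pairs in the plain-text slow log sample above
TOOK_RE = re.compile(r"took_millis\[(\d+)\]")
SHARDS_RE = re.compile(r"total_shards\[(\d+)\]")
# The index name is the bracketed value right before "took["
INDEX_RE = re.compile(r"\]\s\[([^\]]+)\]\stook\[")


def parse_slowlog_line(line: str):
    """Return {"index", "took_ms", "total_shards"}, or None for non-entry lines."""
    took = TOOK_RE.search(line)
    if took is None:
        return None  # e.g. a continuation line of the logged query source
    shards = SHARDS_RE.search(line)
    index = INDEX_RE.search(line)
    return {
        "index": index.group(1) if index else None,
        "took_ms": int(took.group(1)),
        "total_shards": int(shards.group(1)) if shards else None,
    }


def slowest(lines, top_n: int = 5) -> list:
    """Parse every line and return the top_n entries by took_ms."""
    entries = [e for e in map(parse_slowlog_line, lines) if e is not None]
    return sorted(entries, key=lambda e: e["took_ms"], reverse=True)[:top_n]


sample = ("[2025-11-11T14:32:15,123][WARN ][index.search.slowlog.query] [products] "
          "took[12.3s], took_millis[12345], types[], stats[], "
          "search_type[QUERY_THEN_FETCH], total_shards[20], source[{")
print(parse_slowlog_line(sample))
```

To scan a whole file, pass the open file object: `with open(path) as f: print(slowest(f, 10))`, where `path` is your slow log file.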

Finding bottlenecks with the Profile API

Using the Profile API

GET /products/_search
{
 "profile": true, // ← enable profiling
 "query": {
 "bool": {
 "must": [
 { "match": { "description": "laptop" } }
 ],
 "filter": [
 { "range": { "price": { "gte": 500, "lte": 2000 } } },
 { "term": { "category": "electronics" } }
 ]
 }
 }
}

Response (profile results):

{
 "profile": {
 "shards": [
 {
 "id": "[u8RkPnKXTmS...][products][0]",
 "searches": [
 {
 "query": [
 {
 "type": "BooleanQuery",
 "description": "+description:laptop #category:electronics #price:[500 TO 2000]",
 "time_in_nanos": 123456789, // ← 123ms
 "breakdown": {
 "match": 45678901, // 45ms
 "next_doc": 34567890, // 34ms
 "score": 12345678, // 12ms
 "build_scorer": 8765432 // 8ms
 },
 "children": [
 {
 "type": "TermQuery",
 "description": "category:electronics",
 "time_in_nanos": 5678901 // ← 5ms (fast)
 },
 {
 "type": "PointRangeQuery",
 "description": "price:[500 TO 2000]",
 "time_in_nanos": 87654321 // ← 87ms (slow!)
 }
 ]
 }
 ]
 }
 ]
 }
 ]
 }
}

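Analysis:

PointRangeQuery (price range): 87ms (the slowest component)
TermQuery (category): 5ms (fast)
Total work: 123ms per shard × 20 shards = 2.46 s of CPU time. Shards are searched in parallel, so wall-clock latency tracks the slowest shard (plus any queueing), not this sum

Optimization directions:

Keep price in filter context, as in this query, so the range filter can be cached and reused
Optimize how the price field is mapped and indexed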
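Profile responses get long quickly, especially across 20 shards. The following Python sketch walks the query tree of a parsed profile response. It uses only the fields shown in the sample above: id, searches, query, type, description, time_in_nanos, and children. It ranks child components by time, and it contrasts the summed work across shards with the slowest single shard, which is the figure closer to wall-clock latency. The function names are hypothetical.

```python
def collect_query_nodes(profile_response: dict) -> list:
    """Flatten every query node into (shard_id, depth, type, description, ms)."""
    rows = []

    def walk(node: dict, shard_id: str, depth: int) -> None:
        ms = node["time_in_nanos"] / 1_000_000
        rows.append((shard_id, depth, node["type"], node["description"], ms))
        for child in node.get("children", []):
            walk(child, shard_id, depth + 1)

    for shard in profile_response["profile"]["shards"]:
        for search in shard["searches"]:
            for root in search["query"]:
                walk(root, shard["id"], 0)
    return rows


def slowest_components(profile_response: dict, top_n: int = 3) -> list:
    """Rank non-root nodes by time (a root's time already includes its children)."""
    children = [row for row in collect_query_nodes(profile_response) if row[1] > 0]
    return sorted(children, key=lambda row: row[4], reverse=True)[:top_n]


def work_vs_slowest_shard(profile_response: dict) -> tuple:
    """(sum, max) of root query time per shard in ms. The sum is total work;
    latency is closer to the slowest shard because shards run in parallel."""
    per_shard = {}
    for shard_id, depth, _type, _desc, ms in collect_query_nodes(profile_response):
        if depth == 0:
            per_shard[shard_id] = per_shard.get(shard_id, 0.0) + ms
    return sum(per_shard.values()), max(per_shard.values())
```

Applied to all 20 shards of the response above, the first element of `work_vs_slowest_shard` corresponds to the 2.46 s figure. The second is the one to compare with the response's `took` time.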

Index mapping optimization

1. Choosing numeric types

**Wrong mapping:**

PUT /products
{
 "mappings": {
 "properties": {
 "product_id": {
 "type": "long" // ← only used for filtering, so why a numeric type?
 },
 "price": {
 "type": "double" // ← whole numbers are enough, so why double?
 }
 }
 }
}

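// product_id is only used in term queries → a numeric type buys nothing (numeric fields are optimized for range queries; keyword is better for term lookups)
// price can be stored as whole cents → integer is enough (range bounds must then be given in cents too)

**Optimized mapping:**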

PUT /products
{
 "mappings": {
 "properties": {
 "product_id": {
 "type": "keyword" // ← term 쿼리 최적화
 },
 "price": {
 "type": "integer" // ← 4 bytes per value instead of 8
 },
 "category": {
 "type": "keyword"
 },
 "description": {
 "type": "text",
 "analyzer": "english"
 }
 }
 }
}

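Effect (observed in this case; results vary with data and queries):

Memory: the raw value width halves (double → integer); actual savings are usually smaller because Lucene compresses numeric data
Search speed: about 30% faster product_id lookups (keyword)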
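Storing prices as integer cents means converting at index time and converting query bounds the same way. This is a minimal Python sketch that assumes prices arrive as decimal strings. `decimal.Decimal` avoids binary floating-point error: in float arithmetic, 19.99 * 100 comes out just below 1999, so `int()` would truncate it to 1998. The helper names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_cents(amount: str) -> int:
    """Convert a decimal price string such as "19.99" to integer cents."""
    cents = (Decimal(amount) * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(cents)


def price_range_filter(gte: str, lte: str) -> dict:
    """Range filter for a price field stored as integer cents.
    The bounds must use the same unit as the indexed values."""
    return {"range": {"price": {"gte": to_cents(gte), "lte": to_cents(lte)}}}


print(to_cents("19.99"))                  # 1999
print(price_range_filter("500", "2000"))  # {'range': {'price': {'gte': 50000, 'lte': 200000}}}
```

An `integer` field tops out at 2,147,483,647, which is about $21.4 million in cents. Use `long` if prices could ever exceed that.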

2. Skipping what you don't need (disabling _source)

Optimizing a log index:

PUT /logs
{
 "mappings": {
 "_source": {
 "enabled": false // ← don't store _source (only when you never need the original document)
 },
 "properties": {
 "timestamp": { "type": "date" },
 "level": { "type": "keyword" },
 "message": { "type": "text" }
 }
 }
}

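// Pro: a noticeably smaller index (roughly 40-50%, depending on the data)
// Cons: without _source you can't reindex or run update/update_by_query, and hits don't include the original document

Partial _source (store only specific fields; excluded fields are still indexed and searchable but have the same limitations):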

PUT /products
{
 "mappings": {
 "_source": {
 "includes": [ // ← store only the fields you need
 "name",
 "price",
 "category"
 ],
 "excludes": [ // ← leave out large fields
 "large_description",
 "image_data"
 ]
 }
 }
}

Query optimization techniques

1. Avoid wildcard/regex queries

**Very slow query:**

GET /products/_search
{
 "query": {
 "wildcard": {
 "name": "*laptop*" // ← leading wildcard: Lucene must walk every term in the field!
 }
 }
}

// Execution Time: 8765.432 ms

**Faster alternative (n-grams):**

// Configure n-grams when creating the index (analyzers can't be changed on an existing field)
PUT /products
{
 "settings": {
 "analysis": {
 "analyzer": {
 "ngram_analyzer": {
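 "tokenizer": "ngram_tokenizer",
 "filter": ["lowercase"] // ← the ngram tokenizer keeps case; lowercase so "Laptop" matches "laptop"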
 }
 },
 "tokenizer": {
 "ngram_tokenizer": {
 "type": "ngram",
 "min_gram": 3,
 "max_gram": 4,
 "token_chars": ["letter", "digit"]
 }
 }
 }
 },
 "mappings": {
 "properties": {
 "name": {
 "type": "text",
 "analyzer": "ngram_analyzer" // ← apply n-grams
 }
 }
 }
}

// Search (fast!)
GET /products/_search
{
 "query": {
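 "match": {
 "name": {
 "query": "laptop", // ← match instead of wildcard
 "operator": "and" // ← require every n-gram; the default "or" would also match e.g. "desktop" via "top"
 }
 }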
 }
}

// Execution Time: 234.567 ms (← 37x faster!)
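The Python approximation below shows why the lowercase filter and `operator: "and"` matter. It mimics the analyzer above: split on anything that is not a letter or digit (token_chars), lowercase, and emit every 3- and 4-character gram. This is a simplification for illustration, not Lucene's implementation. It models a match query as a plain set comparison of grams.

```python
import re


def ngrams(text: str, min_gram: int = 3, max_gram: int = 4) -> list:
    """Approximate ngram tokenizer (letters/digits only) plus lowercase filter."""
    grams = []
    for word in re.findall(r"[^\W_]+", text.lower()):
        for start in range(len(word)):
            for size in range(min_gram, max_gram + 1):
                if start + size <= len(word):
                    grams.append(word[start:start + size])
    return grams


def matches(query: str, doc: str, operator: str = "or") -> bool:
    """"or": any query gram matches; "and": every query gram must match."""
    doc_grams, query_grams = set(ngrams(doc)), set(ngrams(query))
    if operator == "and":
        return query_grams <= doc_grams
    return bool(query_grams & doc_grams)


print(ngrams("Laptop"))  # ['lap', 'lapt', 'apt', 'apto', 'pto', 'ptop', 'top']
print(matches("laptop", "Desktop PC", "or"), matches("laptop", "Desktop PC", "and"))  # True False
```

The price of this approach is a larger index, because every value is stored as many grams. That is the trade-off for avoiding a leading wildcard at query time.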

2. Minimize script queries

**Very slow script query:**

GET /products/_search
{
 "query": {
 "script_score": {
 "query": { "match_all": {} },
 "script": {
 "source": "doc['price'].value * doc['rating'].value" // ← runs for every matching document!
 }
 }
 }
}

// 12 million documents × one script execution each = disaster
// Execution Time: 45678.901 ms

**Faster alternative (precompute):**

// Compute the value once, at index time
PUT /products/_doc/1
{
 "name": "Laptop",
 "price": 1000,
 "rating": 4.5,
 "price_rating_score": 4500 // ← precomputed (price × rating)
}

// Search (fast!)
GET /products/_search
{
 "sort": [
 { "price_rating_score": "desc" } // ← a plain sort on a doc-values field
 ]
}

// Execution Time: 123.456 ms (← 370x faster!)
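In practice the precomputed field is added in the indexing pipeline. Here is a minimal Python sketch that builds an NDJSON body for the `_bulk` API and adds `price_rating_score` (price × rating, as above) to each document. The function names are hypothetical. Sending the body, whether with an HTTP client or an official client's bulk helpers, is left out.

```python
import json


def with_price_rating_score(doc: dict) -> dict:
    """Return a copy of doc with price_rating_score = price * rating."""
    enriched = dict(doc)
    enriched["price_rating_score"] = doc["price"] * doc["rating"]
    return enriched


def bulk_body(index: str, docs: dict) -> str:
    """NDJSON for POST /_bulk: one action line and one source line per document.
    The _bulk API requires the body to end with a newline."""
    lines = []
    for doc_id, doc in docs.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(with_price_rating_score(doc)))
    return "\n".join(lines) + "\n"


print(bulk_body("products", {"1": {"name": "Laptop", "price": 1000, "rating": 4.5}}), end="")
```

The score is derived data. It has to be recomputed whenever price or rating changes, ideally in the same code path that updates those fields.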

3. Pagination optimization (search_after)

**Slow approach (from/size):**

GET /products/_search
{
 "from": 100000, // ← skip the first 100,000 hits (page 10,001)
 "size": 10,
 "query": { "match_all": {} }
}

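// Rejected by default: from + size may not exceed index.max_result_window (10,000)
// Even with a raised limit, each shard collects its top 100,010 hits and the coordinating node discards all but 10
// Execution Time: 12345.678 ms

**Fast approach (search_after):**

// First request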
GET /products/_search
{
 "size": 10,
 "query": { "match_all": {} },
 "sort": [
 { "price": "asc" },
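 { "product_id": "asc" } // ← tie-breaker: a unique keyword field (sorting on _id needs fielddata, which 8.x disables for _id by default)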
 ]
}

// Save the sort values of the last hit in the response
// e.g. "sort": [1500, "abc123"]

// Next page
GET /products/_search
{
 "size": 10,
 "query": { "match_all": {} },
 "search_after": [1500, "abc123"], // ← sort values of the previous page's last hit
 "sort": [
 { "price": "asc" },
 { "product_id": "asc" }
 ]
}

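// Execution Time: 45.678 ms (← 270x faster!)
// To page through data that changes while you read it, combine search_after with a point in time (PIT)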
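The difference between the two approaches can be modeled in a few lines of Python. This is a simplified in-memory stand-in, not how Elasticsearch executes either request.

- `from_size_entries` counts the sort entries the coordinating node must merge for a from/size request, since each shard returns its top from + size.
- `search_after_page` mimics the cursor semantics. It returns hits whose (price, product_id) sort key is strictly greater than the previous page's last key.

The tie-breaker matters. Without product_id, two products with the same price could be skipped or repeated across pages.

```python
def from_size_entries(shards: int, from_: int, size: int) -> int:
    """Sort entries the coordinating node merges for a from/size request."""
    return shards * (from_ + size)


def sort_key(doc: dict) -> tuple:
    # Same order as the request above: price asc, then product_id asc
    return (doc["price"], doc["product_id"])


def search_after_page(docs: list, size: int, after=None):
    """Return (page, next_after): up to `size` docs that sort after `after`."""
    ordered = sorted(docs, key=sort_key)
    if after is not None:
        ordered = [d for d in ordered if sort_key(d) > tuple(after)]
    page = ordered[:size]
    next_after = list(sort_key(page[-1])) if page else None
    return page, next_after


print(from_size_entries(20, 100_000, 10))  # 2000200
```

Each request only carries the previous page's last sort values, so the work per page does not grow with the page number the way from + size does.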

Shard management strategy

1. Optimizing the shard count

**Too many shards:**

# A 100GB index split into 100 shards
PUT /products
{
 "settings": {
 "number_of_shards": 100 # ← 1GB per shard (too small!)
 }
}

# Problems:
# - every search fans out to 100 shards → overhead explodes
# - per-shard memory overhead adds up

**Right-sized shards:**

# The same 100GB index split into 5 shards
PUT /products
{
 "settings": {
 "number_of_shards": 5 # ← 20GB per shard (about right)
 }
}

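# Benefits:
# - each search touches only 5 shards
# - shards are still searched in parallel

Calculating the shard count:

Shard count = total primary data size / target shard size (20-30GB), rounded up

Examples:
- 500GB index → 20 shards (25GB each)
- 100GB index → 5 shards (20GB each)
- 20GB index → 1 shard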
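Here is the formula as a small Python helper that rounds up so no shard exceeds the target. `target_gb` is a parameter because the examples use both 25GB and 20GB. `heap_shard_budget` encodes the older 20-shards-per-GB-of-heap rule of thumb and is only a rough ceiling, not a target. Replicas multiply the number of shards the cluster must hold. The function names are hypothetical.

```python
import math


def primary_shard_count(total_gb: float, target_gb: float = 25) -> int:
    """Primary shards needed so that each shard stays at or under target_gb."""
    return max(1, math.ceil(total_gb / target_gb))


def total_shard_count(primaries: int, replicas: int) -> int:
    """Each replica is a full copy, so total shards = primaries * (1 + replicas)."""
    return primaries * (1 + replicas)


def heap_shard_budget(heap_gb: float, shards_per_gb: int = 20) -> int:
    """Older rule of thumb: at most 20 shards per GB of heap on a node."""
    return int(heap_gb * shards_per_gb)


for size_gb, target in [(500, 25), (100, 20), (20, 25)]:
    print(f"{size_gb}GB -> {primary_shard_count(size_gb, target)} primaries")
```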

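2. Shard allocation filtering

Controlling which nodes hold which shards. Each node must declare a custom attribute such as node.attr.box_type: hot or cold in elasticsearch.yml; on 7.10+ the built-in data tiers are an alternative:

// Put hot data on high-performance nodes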
PUT /products-2025-11
{
 "settings": {
 "index.routing.allocation.require.box_type": "hot"
 }
}

// Put cold data on lower-spec nodes
PUT /products-2024-01
{
 "settings": {
 "index.routing.allocation.require.box_type": "cold"
 }
}

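Memory and caching optimization

1. Heap memory settings

Recommended settings:

# jvm.options (or a custom file under jvm.options.d/)
# keep comments on their own lines
# initial heap size
-Xms16g
# maximum heap size (same as -Xms!)
-Xmx16g

# Rules:
# - no more than 50% of total RAM
# - stay below the compressed-oops threshold (just under 32GB; Elastic's docs call ~26GB safe on most systems)
# - initial = maximum (production bootstrap checks refuse to start otherwise)
# - since 7.11 Elasticsearch sizes the heap automatically; override it only with a reason

**Wrong settings:**

# 64GB RAM server
-Xms4g
-Xmx32g
# ← at 32GB compressed oops are disabled, and min ≠ max

# Problems:
# - at 32GB or more, object pointers double to 8 bytes → wasted memory
# - min ≠ max → the JVM resizes the heap at runtime, and production bootstrap checks reject it

**Correct settings:**

# 64GB RAM server
-Xms31g
-Xmx31g
# ← 31GB: usually still under the compressed-oops threshold; confirm the startup log reports "compressed ordinary object pointers [true]"

# Benefits:
# - compressed oops keep object pointers at 4 bytes
# - the remaining ~33GB goes to the OS file-system cache (faster searches!)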
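The two heap rules combine into a one-line calculation. This Python sketch assumes the article's 31GB cap. The real compressed-oops cutoff varies by JVM and platform (Elastic's docs call about 26GB safe on most systems), so treat the result as a starting point and confirm it in the startup log. The function names are hypothetical.

```python
def recommended_heap_gb(ram_gb: int, oops_cap_gb: int = 31) -> int:
    """At most half of RAM, and no more than the compressed-oops cap."""
    return min(ram_gb // 2, oops_cap_gb)


def jvm_heap_lines(ram_gb: int) -> str:
    """jvm.options lines with identical initial and maximum heap sizes."""
    heap = recommended_heap_gb(ram_gb)
    return f"-Xms{heap}g\n-Xmx{heap}g\n"


for ram in (16, 64, 128):
    print(f"{ram}GB RAM -> {recommended_heap_gb(ram)}GB heap")
```

On a 128GB machine the cap still applies, and everything above it is left to the OS file-system cache.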

2. Field data cache optimization

Use doc_values (the default):

PUT /products
{
 "mappings": {
 "properties": {
 "price": {
 "type": "integer",
 "doc_values": true // ← default (recommended)
 }
 }
 }
}

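// doc_values: stored on disk in a column-oriented layout
// → sorting and aggregations read from disk (through the OS page cache) instead of the heap
// → saves heap memory!

**Using fielddata (text fields):**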

PUT /products
{
 "mappings": {
 "properties": {
 "tags": {
 "type": "text",
 "fielddata": true // ← memory bomb!
 }
 }
 }
}

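// fielddata: loads the field's terms for every segment into heap memory
// → heap pressure, tripped fielddata circuit breakers, or out-of-memory errors
// To sort or aggregate on text, add a keyword sub-field instead

Real-world optimization cases

Case 1: Compound filter query (21x faster)

**Before:**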

GET /products/_search
{
 "query": {
 "bool": {
 "must": [
 { "match": { "description": "laptop" } },
 { "range": { "price": { "gte": 500, "lte": 2000 } } },
 { "term": { "category": "electronics" } },
 { "range": { "rating": { "gte": 4.0 } } },
 { "terms": { "brand": ["Apple", "Dell", "HP"] } }
 ]
 }
 }
}

// Every condition takes part in scoring
// Execution Time: 12345.678 ms

**After:**

GET /products/_search
{
 "query": {
 "bool": {
 "must": [
 { "match": { "description": "laptop" } } // ← needs scoring
 ],
 "filter": [ // ← cacheable!
 { "range": { "price": { "gte": 500, "lte": 2000 } } },
 { "term": { "category": "electronics" } },
 { "range": { "rating": { "gte": 4.0 } } },
 { "terms": { "brand": ["Apple", "Dell", "HP"] } }
 ]
 }
 }
}

// Execution Time: 587.234 ms (← 21x faster!)

Case 2: Aggregation query optimization (15x faster)

**Before:**

GET /products/_search
{
 "size": 0,
 "aggs": {
 "price_ranges": {
 "range": {
 "field": "price",
 "ranges": [
 { "to": 100 },
 { "from": 100, "to": 500 },
 { "from": 500, "to": 1000 },
 { "from": 1000 }
 ]
 },
 "aggs": {
 "avg_rating": {
 "avg": { "field": "rating" }
 },
 "top_brands": {
 "terms": { "field": "brand", "size": 10 }
 }
 }
 }
 }
}

// Aggregates all 12 million documents
// Execution Time: 15678.901 ms

**After:**

GET /products/_search
{
 "size": 0,
 "query": {
 "bool": {
 "filter": [ // ← filter before aggregating!
 { "term": { "category": "electronics" } },
 { "range": { "created_at": { "gte": "now-1y/d" } } } // last 12 months, rounded to the day so the filter can be cached
 ]
 }
 },
 "aggs": {
 "price_ranges": {
 "range": {
 "field": "price",
 "ranges": [
 { "to": 100 },
 { "from": 100, "to": 500 },
 { "from": 500, "to": 1000 },
 { "from": 1000 }
 ]
 },
 "aggs": {
 "avg_rating": {
 "avg": { "field": "rating" }
 }
 }
 }
 }
}

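// Filtering shrinks the set to 2 million documents before aggregating
// Execution Time: 1045.678 ms (← 15x faster!)
// Note: this version answers a narrower question (electronics, past year) and drops the top_brands sub-aggregation, so not all of the gain comes from filtering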
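Why round `now-1y` to `now-1y/d`? Elastic's search-tuning guidance notes that filters using `now` are usually not cacheable, because the resolved bound changes on every request. Rounding keeps the bound stable for a whole day. The Python sketch below approximates how the `gte` bound resolves for two requests five minutes apart. It is a simplified model of Elasticsearch date math (UTC only; Feb 29 falls back to Feb 28), and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def now_minus_1y(now: datetime, round_to_day: bool) -> datetime:
    """Approximate the gte bound of "now-1y" or "now-1y/d" (UTC)."""
    try:
        bound = now.replace(year=now.year - 1)
    except ValueError:  # Feb 29 has no counterpart in the previous year
        bound = now.replace(year=now.year - 1, day=28)
    if round_to_day:
        # For gte, "/d" rounds down to the first moment of the day
        bound = bound.replace(hour=0, minute=0, second=0, microsecond=0)
    return bound


first = datetime(2025, 11, 11, 14, 32, tzinfo=timezone.utc)
second = first + timedelta(minutes=5)
print(now_minus_1y(first, False) == now_minus_1y(second, False))  # False: bound moved
print(now_minus_1y(first, True) == now_minus_1y(second, True))    # True: same bound
```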

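Production performance-tuning checklist

Query level

Use filter context (move term and range into filter)
Remove wildcard/regex queries (use n-grams or match)
Minimize script queries (precompute values)
Use search_after pagination (instead of from/size)
Filter before aggregating (exclude irrelevant documents)

Index settings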

PUT /products
{
 "settings": {
 "number_of_shards": 20, // total size / 25GB
 "number_of_replicas": 1,
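 "refresh_interval": "30s", // default 1s → 30s (faster indexing; new documents take up to 30s to become searchable)
 "codec": "best_compression" // compresses stored fields such as _source (saves disk; savings vary and fetches get slightly slower)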
 },
 "mappings": {
 "properties": {
 "product_id": { "type": "keyword" }, // for term queries
 "price": { "type": "integer" }, // instead of double
 "category": { "type": "keyword" },
 "description": {
 "type": "text",
 "analyzer": "english",
 "fields": {
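 "keyword": { "type": "keyword", "ignore_above": 256 } // for sorting/aggregations; ignore_above keeps very long descriptions from exceeding Lucene's term-size limit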
 }
 }
 }
 }
}

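Cluster settings (elasticsearch.yml)

# Memory
bootstrap.memory_lock: true # the OS must allow it too (e.g. LimitMEMLOCK=infinity under systemd)

# Search thread pool
thread_pool.search.size: 13 # default formula: int(cores × 3 / 2) + 1 → 13 on 8 cores
thread_pool.search.queue_size: 10000 # default 1000; a very large queue hides overload rather than fixing it

# Bulk indexing
thread_pool.write.size: 8
thread_pool.write.queue_size: 10000

# Cache
indices.queries.cache.size: 10% # node query cache for filters (10% of heap is also the default)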

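Monitoring

# Per-node search stats (query count and cumulative query time)
GET /_nodes/stats/indices/search?human

# Shards sorted by size, largest first
GET /_cat/shards?v&s=store:desc

# Memory usage
GET /_cat/nodes?v&h=name,heap.percent,ram.percent,cpu

# Query cache hit/miss counts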
GET /_stats?filter_path=**.query_cache
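The query-cache counters are easier to read as a ratio. This minimal Python sketch assumes the parsed JSON response of `GET /_stats?filter_path=**.query_cache`, which includes an `_all.total.query_cache` section with `hit_count` and `miss_count`. Fetching the response is left out, and the function name is hypothetical.

```python
def query_cache_hit_rate(stats: dict) -> float:
    """Cluster-wide query cache hit rate from a parsed _stats response."""
    qc = stats["_all"]["total"]["query_cache"]
    lookups = qc["hit_count"] + qc["miss_count"]
    return qc["hit_count"] / lookups if lookups else 0.0


sample = {"_all": {"total": {"query_cache": {"hit_count": 750, "miss_count": 250}}}}
print(f"{query_cache_hit_rate(sample):.0%}")  # 75%
```

A persistently low rate suggests cached filters are rarely reused. Check whether requests send filters with ever-changing values.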

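Tools worth knowing in 2025

1. Kibana Search Profiler

Visual query analysis:

Kibana → Dev Tools
Open the "Search Profiler" tab
Enter the index and query, then click Profile
Read the per-shard breakdown of each query component's time to find the bottleneck

Benefits:

See the slow parts at a glance
Compare shards side by side
Break time down by component (e.g. build_scorer, next_doc, score)

2. Elastic APM

Real-time performance monitoring: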

# apm-server.yml
output.elasticsearch:
 hosts: ["localhost:9200"]

# Install the APM agent in your application
npm install elastic-apm-node

// Node.js example (start the agent before requiring any other module)
const apm = require('elastic-apm-node').start({
 serviceName: 'product-search',
 serverUrl: 'http://localhost:8200'
});

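Features:

Tracks response time per Elasticsearch query (as spans within each transaction)
Surfaces slow queries in the trace view
Service dependency map

3. Rally (benchmarking tool)

Automated performance testing:

# Install Rally
pip3 install esrally

# Run a benchmark against an existing cluster (benchmark-only: Rally does not provision Elasticsearch itself)
esrally race --track=geonames --target-hosts=localhost:9200 --pipeline=benchmark-only

# Custom benchmark
esrally race --track-path=./my-track --target-hosts=localhost:9200 --pipeline=benchmark-only

Sample results: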

Metric          Task    Value   Unit
--------------------------------------
Indexing time           45.2    min
Merge time              12.3    min
Refresh time            3.4     min
Query latency   term    12.3    ms
Query latency   range   45.6    ms

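Wrapping up

Most slow Elasticsearch searches can be fixed by separating query from filter context and managing shards properly. The key points from this article:

Key takeaways:

Use filter context: move term and range into filter (cacheable, 21x faster here)
Right-size shards: 20-30GB per shard (too many or too few both hurt)
Optimize index mappings: keyword vs. text, integer vs. double
Remove wildcard/script queries: n-grams, precomputed fields
search_after pagination: instead of from/size (270x faster)
Heap: no more than 50% of RAM, and below the compressed-oops threshold (about 31GB at most)
Profile API: pinpoint the exact bottleneck

Next steps:

Enable the slow log (log queries that take 1 s or more)
Analyze slow queries with the Profile API
Move exact-value clauses from query to filter context
Recalculate the shard count (total size / 25GB)
Remove wildcard/script queries
Adopt search_after pagination
Set up Elastic APM (viewed in Kibana)

Performance tuning is never done after one pass. Regular slow-log reviews, Profile API analysis, and shard rebalancing are essential. Just as a 12-second search dropped to 580ms here, using filter context correctly can by itself deliver as much as a 21x speedup. Check today whether your production Elasticsearch cluster is tuned!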