Fixing AWS Lambda Cold Starts: A Practical Guide to Serverless Latency Optimization

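"Our API response times suddenly jumped past 2 seconds." If your team has adopted a serverless architecture, you have probably heard this at least once. A Lambda function that normally responds in around 50 ms occasionally takes 2 seconds, or more than 5 seconds in bad cases. This is the notorious cold start problem.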

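Serverless's biggest advantage, "pay only for what you use," can also become its biggest weakness. When there are no requests, execution environments are shut down. When a new request arrives, an environment has to be built again from scratch.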

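This article starts with the root causes of cold starts and then covers optimization techniques that have proven effective in production. Instead of the obvious advice ("just use Provisioned Concurrency"), it shares practical strategies for balancing cost against performance.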

Cold Starts: What Exactly Is the Problem?

When a Lambda function is invoked, AWS goes through several stages internally:

┌─────────────────────────────────────────────────────────────────┐
│                    Cold start                                   │
├─────────────────────────────────────────────────────────────────┤
│  1. Create the execution environment (provision a microVM)      │
│  2. Download the deployment package                             │
│  3. Initialize the runtime                                      │
│  4. Load function code and initialize dependencies              │
│  5. Run code outside the handler (global initialization)        │
│  6. Run the handler                                             │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                    Warm start                                   │
├─────────────────────────────────────────────────────────────────┤
│  6. Run the handler ← starts immediately                        │
└─────────────────────────────────────────────────────────────────┘

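The problem is that steps 1-5 can take anywhere from a few hundred milliseconds to several seconds. The difference between runtimes and package sizes is stark.

These steps are not free either. Since August 2025, AWS bills the INIT phase for on-demand functions on managed runtimes packaged as ZIP files, as it already did for custom runtimes and container images.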
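To make the cold/warm distinction concrete, here is a deliberately simplified model. It is a sketch, not how Lambda actually schedules work. A request reuses an idle execution environment if one exists; otherwise it creates a new environment and pays the init cost. `EnvironmentPool` and the 300 ms init cost are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EnvironmentPool:
    """Toy model of Lambda execution environments (illustrative, not AWS's scheduler)."""
    init_ms: float = 300.0  # hypothetical cost of steps 1-5
    idle: int = 0           # warm environments waiting for work
    busy: int = 0           # environments currently running a handler
    cold_starts: int = 0
    warm_starts: int = 0

    def start_request(self) -> float:
        """Return the extra latency paid before the handler runs."""
        if self.idle > 0:
            # Reuse a warm environment: jump straight to step 6
            self.idle -= 1
            self.busy += 1
            self.warm_starts += 1
            return 0.0
        # No idle environment: build a new one (steps 1-5)
        self.busy += 1
        self.cold_starts += 1
        return self.init_ms

    def finish_request(self) -> None:
        """The environment stays alive and becomes idle for the next request."""
        self.busy -= 1
        self.idle += 1


pool = EnvironmentPool()
# Two overlapping requests need two environments, so both are cold
latencies = [pool.start_request(), pool.start_request()]
pool.finish_request()
pool.finish_request()
# A later request finds an idle environment, so it is warm
latencies.append(pool.start_request())
print(latencies)  # [300.0, 300.0, 0.0]
```

The same model shows why concurrency matters: overlapping requests each need their own environment, so one warm environment does not protect a burst of traffic.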

Cold Start Times by Runtime

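Benchmark figures show how much the choice of runtime matters. Treat the numbers below as rough ranges; actual values depend on package size, memory setting, and initialization code: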

Runtime	Average cold start	Worst case
Python 3.12	200-300ms	500ms
Node.js 20.x	200-350ms	600ms
Go (provided.al2023)	100-200ms	400ms
Java 21 (standard)	3,000-5,000ms	10,000ms+
Java 21 (SnapStart)	200-400ms	800ms
.NET 8	400-600ms	1,500ms

Java cold starts are especially long because JVM startup and class loading take time. SnapStart, however, can mitigate this substantially.

Practical Optimization Strategy 1: Shrink the Deployment Package

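This is the most basic approach and also one of the most effective. The smaller the deployment package, the less time Lambda spends downloading and extracting it.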
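Before optimizing, it helps to know how big the package actually is. The sketch below zips a build directory in memory with the standard library and reports the uncompressed and compressed sizes. The `build` directory name in the usage line is an assumption about your project layout.

```python
import io
import os
import zipfile


def package_sizes(root: str) -> tuple[int, int]:
    """Return (uncompressed_bytes, zipped_bytes) for every file under root."""
    raw_total = 0
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                raw_total += os.path.getsize(path)
                # Store paths relative to root, as in a Lambda deployment package
                zf.write(path, arcname=os.path.relpath(path, root))
    return raw_total, len(buffer.getvalue())


if __name__ == "__main__":
    raw, zipped = package_sizes("build")  # hypothetical build output directory
    print(f"uncompressed: {raw / 1_000_000:.1f} MB, zipped: {zipped / 1_000_000:.1f} MB")
```

Running it before and after removing a dependency gives a concrete number to compare, instead of a guess.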

Removing unnecessary dependencies in Node.js

// package.json - before (inline comments are for illustration; JSON itself does not allow them)
{
  "dependencies": {
    "aws-sdk": "^2.1000.0",      // the entire SDK v2, about 80MB
    "lodash": "^4.17.21",         // the entire lodash library
    "moment": "^2.29.4",          // heavy date library
    "axios": "^1.6.0"
  }
}

// package.json - after
{
  "dependencies": {
    "@aws-sdk/client-s3": "^3.450.0",    // only the clients you need, about 3MB
    "@aws-sdk/client-dynamodb": "^3.450.0",
    "lodash.get": "^4.4.2",               // only the function you need
    "dayjs": "^1.11.10",                  // lightweight alternative
    "axios": "^1.6.0"
  }
}

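AWS SDK v3 is modular, so you can import only the service clients you actually use. This alone can often cut the package size by 70% or more. The Node.js 18 and later runtimes also no longer include SDK v2, so a v2 dependency has to be bundled in full.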

Bundling with esbuild

// build.js
const esbuild = require('esbuild');

esbuild.build({
  entryPoints: ['src/handler.js'],
  bundle: true,
  minify: true,
  platform: 'node',
  target: 'node20',
  outfile: 'dist/handler.js',
  external: ['@aws-sdk/*'],  // exclude SDK v3, which the Lambda Node.js runtime already provides
  treeShaking: true,
}).catch(() => process.exit(1));

Tree shaking includes only the code you actually use. This can often shrink tens of megabytes of node_modules down to a bundle of a few hundred KB.

Optimizing Python

# requirements.txt - before
boto3==1.34.0
pandas==2.1.0
numpy==1.26.0
requests==2.31.0

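# requirements.txt - after
boto3==1.34.0  # already included in the Lambda Python runtime; omit it, or move a pinned version into a Layer
httpx==0.25.0  # used in place of requests; compare installed sizes before assuming it is lighter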

Ask yourself whether you really need pandas and numpy. For simple data processing, the standard library is often enough.

# Processing CSV without pandas
import csv
from io import StringIO

def process_csv(csv_content):
    reader = csv.DictReader(StringIO(csv_content))
    return [row for row in reader]
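A typical reason to pull in pandas is a simple group-and-sum. The standard library handles that too. This sketch assumes hypothetical `category` and `amount` columns.

```python
import csv
from collections import defaultdict
from io import StringIO


def sum_by_category(csv_content: str) -> dict[str, float]:
    """Stdlib equivalent of df.groupby('category')['amount'].sum()."""
    totals: dict[str, float] = defaultdict(float)
    # Assumes hypothetical 'category' and 'amount' columns in the header row
    for row in csv.DictReader(StringIO(csv_content)):
        totals[row["category"]] += float(row["amount"])
    return dict(totals)


sample = "category,amount\nbooks,12.5\nfood,3\nbooks,7.5\n"
print(sum_by_category(sample))  # {'books': 20.0, 'food': 3.0}
```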

Practical Optimization Strategy 2: Optimize Initialization Code

Code outside the handler runs only during a cold start, once per execution environment. Put expensive setup there, and every subsequent warm invocation reuses the result.

Connection reuse pattern

// handler.js
const { DynamoDBClient } = require('@aws-sdk/client-dynamodb');
const { DynamoDBDocumentClient, GetCommand } = require('@aws-sdk/lib-dynamodb');
const { NodeHttpHandler } = require('@smithy/node-http-handler');

// Create the client outside the handler: runs only once, during the cold start
const client = new DynamoDBClient({
  maxAttempts: 3,
  requestHandler: new NodeHttpHandler({
    connectionTimeout: 3000,  // ms allowed to establish a connection
    requestTimeout: 3000,     // ms allowed to wait for a response
  }),
});
const docClient = DynamoDBDocumentClient.from(client);

// The actual handler
exports.handler = async (event) => {
  // Reuse the client that was already created
  const result = await docClient.send(
    new GetCommand({
      TableName: process.env.TABLE_NAME,
      Key: { id: event.pathParameters.id },
    })
  );

  return {
    statusCode: 200,
    body: JSON.stringify(result.Item),
  };
};

Lazy initialization

When initialization is heavy but not every request needs it:

# Python example
import os

# Declare the global only; do not initialize it yet
_heavy_client = None

def get_heavy_client():
    global _heavy_client
    if _heavy_client is None:
        # Initialize only when actually needed
        # (some_heavy_library / HeavyClient are placeholders for your own dependency)
        from some_heavy_library import HeavyClient
        _heavy_client = HeavyClient(
            api_key=os.environ['API_KEY']
        )
    return _heavy_client

def handler(event, context):
    # Use the heavy client only under specific conditions
    if event.get('need_heavy_processing'):
        client = get_heavy_client()
        return client.process(event['data'])

    # Most requests are handled cheaply
    return {'statusCode': 200, 'body': 'OK'}
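If you prefer to avoid the `global` statement, put `functools.cache` on a zero-argument factory. It gives the same "build on first use, then reuse" behavior. `ExpensiveClient` is a hypothetical placeholder that counts how many times it is constructed.

```python
import functools


class ExpensiveClient:
    """Hypothetical heavy client; counts constructions to show the laziness."""
    instances = 0

    def __init__(self) -> None:
        ExpensiveClient.instances += 1

    def process(self, data: str) -> str:
        return data.upper()


@functools.cache
def get_heavy_client() -> ExpensiveClient:
    # Body runs only on the first call in this execution environment
    return ExpensiveClient()


def handler(event, context):
    if event.get("need_heavy_processing"):
        return get_heavy_client().process(event["data"])
    return {"statusCode": 200, "body": "OK"}


handler({}, None)                     # light request: nothing is built
print(ExpensiveClient.instances)      # 0
handler({"need_heavy_processing": True, "data": "a"}, None)
handler({"need_heavy_processing": True, "data": "b"}, None)
print(ExpensiveClient.instances)      # 1
```

Keep any heavy `import` inside the factory, as the original does, so that the import cost is deferred as well.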

Practical Optimization Strategy 3: Use Provisioned Concurrency Strategically

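Provisioned Concurrency keeps a configured number of execution environments initialized and ready. Requests served by those environments never see a cold start, but you pay for the environments whether or not they are used. Traffic beyond the provisioned amount still falls back to on-demand environments, which can cold start.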

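Cost calculation example

On-demand Lambda cost (1 million requests/month, 200 ms average, 512 MB; us-east-1 x86 prices, free tier ignored):
- Requests: 1M × $0.20 per million = $0.20
- Compute: 1M × 0.2 s × 0.5 GB = 100,000 GB-s × $0.0000166667 = $1.67
- Total: about $1.87/month

Adding Provisioned Concurrency (10 instances, kept on all month):
- PC charge: 10 × 0.5 GB × 730 h × 3,600 s = 13,140,000 GB-s × $0.0000041667 = about $54.75/month
- Compute on PC instances is billed at a lower duration rate: 100,000 GB-s × $0.0000097222 = about $0.97
- Total: about $55.92/month, roughly 30 times the on-demand cost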
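A small calculator for this kind of estimate lets you plug in your own numbers. The default prices are assumed us-east-1 x86 list prices. Check the current Lambda pricing page for your region and architecture. The model ignores the free tier and any spillover from PC to on-demand.

```python
# Assumed us-east-1 x86 prices (USD); verify against current AWS pricing
REQUEST_PRICE_PER_MILLION = 0.20
ON_DEMAND_GB_SECOND = 0.0000166667
PC_GB_SECOND = 0.0000041667           # charge for keeping PC configured
PC_DURATION_GB_SECOND = 0.0000097222  # duration rate for requests served by PC
HOURS_PER_MONTH = 730


def monthly_cost(requests: int, avg_ms: float, memory_mb: int,
                 pc_instances: int = 0) -> float:
    """Rough monthly cost, ignoring the free tier and spillover to on-demand."""
    gb = memory_mb / 1024
    compute_gb_s = requests * (avg_ms / 1000) * gb
    cost = requests / 1_000_000 * REQUEST_PRICE_PER_MILLION
    if pc_instances == 0:
        return cost + compute_gb_s * ON_DEMAND_GB_SECOND
    # PC is billed for every second it is configured, used or not
    provisioned_gb_s = pc_instances * gb * HOURS_PER_MONTH * 3600
    return (cost + provisioned_gb_s * PC_GB_SECOND
            + compute_gb_s * PC_DURATION_GB_SECOND)


print(f"{monthly_cost(1_000_000, 200, 512):.2f}")                   # 1.87
print(f"{monthly_cost(1_000_000, 200, 512, pc_instances=10):.2f}")  # 55.92
```

Most of the PC cost is the always-on charge, so the number of instances matters far more than request volume.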

When should you use Provisioned Concurrency?

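APIs where P99 latency matters: payments, authentication, and other paths that directly affect user experience
Predictable traffic patterns: clear peaks such as commute hours or lunchtime
SLA requirements: contractual terms such as "response time under 500 ms"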

Combining it with Auto Scaling

# serverless.yml
functions:
  api:
    handler: src/handler.main
    provisionedConcurrency: 5  # guaranteed minimum

resources:
  Resources:
    ApiProvisionedConcurrencyTarget:
      Type: AWS::ApplicationAutoScaling::ScalableTarget
      # Serverless Framework attaches provisioned concurrency to an alias named "provisioned"
      DependsOn: ApiProvConcLambdaAlias
      Properties:
        MaxCapacity: 50
        MinCapacity: 5
        ResourceId: !Sub function:${ApiLambdaFunction}:provisioned
        # RoleARN omitted: Application Auto Scaling then uses its service-linked role
        ScalableDimension: lambda:function:ProvisionedConcurrency
        ServiceNamespace: lambda

    ApiProvisionedConcurrencyPolicy:
      Type: AWS::ApplicationAutoScaling::ScalingPolicy
      Properties:
        PolicyName: ApiProvisionedConcurrencyPolicy
        PolicyType: TargetTrackingScaling
        ScalingTargetId: !Ref ApiProvisionedConcurrencyTarget
        TargetTrackingScalingPolicyConfiguration:
          TargetValue: 0.7  # keep utilization around 70%
          PredefinedMetricSpecification:
            PredefinedMetricType: LambdaProvisionedConcurrencyUtilization

With this setup, provisioned capacity scales automatically between 5 and 50 instances as traffic changes.
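Target tracking adds capacity when utilization rises above the target and removes it when utilization falls below. A rough mental model helps here. It is a simplification, not Application Auto Scaling's exact algorithm, which also involves CloudWatch alarms and cooldowns. The capacity the policy converges toward is roughly the in-use concurrency divided by the target value, clamped to the configured min and max:

```python
import math


def desired_provisioned_concurrency(in_use: float, target: float = 0.7,
                                    min_capacity: int = 5,
                                    max_capacity: int = 50) -> int:
    """Approximate steady-state PC under target tracking (simplified model)."""
    if not 0 < target <= 1:
        raise ValueError("target utilization must be in (0, 1]")
    wanted = math.ceil(in_use / target)  # capacity at which utilization == target
    return max(min_capacity, min(max_capacity, wanted))


for concurrent in (2, 10, 30, 60):
    print(concurrent, desired_provisioned_concurrency(concurrent))
# 2 5   (floor of MinCapacity)
# 10 15
# 30 43
# 60 50 (capped at MaxCapacity; the rest spills over to on-demand)
```

A lower target value leaves more headroom for sudden bursts, at the price of more idle (billed) capacity.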

Practical Optimization Strategy 4: SnapStart (Java/.NET/Python)

Since late 2024, SnapStart has also been available for Python (3.12 and later) and .NET (8 and later). For Java, it can reduce cold start time by 90% or more.

How SnapStart works

┌─────────────────────────────────────────────────────────────────┐
│  1. Run initialization when a version is published              │
│  2. Snapshot the memory state (Firecracker microVM snapshot)    │
│  3. Encrypt and store the snapshot                              │
└─────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────┐
│  On a cold start request:                                       │
│  1. Restore memory state from the snapshot (fast)               │
│  2. Run the handler                                             │
└─────────────────────────────────────────────────────────────────┘

Applying SnapStart to Java Spring Boot

// pom.xml
<properties>
    <maven.compiler.source>21</maven.compiler.source>
    <maven.compiler.target>21</maven.compiler.target>
</properties>

<dependencies>
    <dependency>
        <groupId>org.crac</groupId>
        <artifactId>crac</artifactId>
        <version>1.4.0</version>
    </dependency>
</dependencies>

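// Application.java
import javax.sql.DataSource;

import org.crac.Context;
import org.crac.Core;
import org.crac.Resource;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class Application implements Resource {

    // CRaC contexts hold registered resources weakly, so keep a strong reference
    private static final Application CRAC_HOOKS = new Application();

    private static DataSource dataSource;

    public static void main(String[] args) {
        Core.getGlobalContext().register(CRAC_HOOKS);
        SpringApplication.run(Application.class, args);
    }

    @Override
    public void beforeCheckpoint(Context<? extends Resource> context) throws Exception {
        // Before the snapshot: clean up connections
        if (dataSource != null) {
            // Close the DB connection pool here
        }
    }

    @Override
    public void afterRestore(Context<? extends Resource> context) throws Exception {
        // After restoring from the snapshot: re-establish connections
        // (createDataSource() is your own factory method, not shown)
        dataSource = createDataSource();
    }
}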

SnapStart caveats

Be careful with values that must be unique. Every environment restored from the same snapshot starts with identical memory:

// Wrong: the value is frozen at snapshot time and shared by every restored environment
public class BadExample {
    private static final String REQUEST_ID = UUID.randomUUID().toString();  // dangerous!
    private static final long STARTUP_TIME = System.currentTimeMillis();     // dangerous!
}

// Right: generate the value when the request arrives
public class GoodExample {
    public String handleRequest(Request request) {
        String requestId = UUID.randomUUID().toString();  // new value on every request
        long requestTime = System.currentTimeMillis();
        // ...
    }
}
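SnapStart for Python has the same trap. Anything computed at module import time is captured in the snapshot and shared by every environment restored from it. The sketch has no SnapStart dependency and only contrasts module-level values with per-request values; the `handler` name is illustrative.

```python
import time
import uuid

# Computed once at import time: under SnapStart this is frozen into the snapshot,
# so every restored environment would report the same values
STARTUP_ID = str(uuid.uuid4())
STARTUP_TIME = time.time()


def handler(event, context):
    # Computed per request: unique even across environments restored from one snapshot
    return {
        "startup_id": STARTUP_ID,
        "request_id": str(uuid.uuid4()),
        "request_time": time.time(),
    }


a = handler({}, None)
b = handler({}, None)
print(a["startup_id"] == b["startup_id"])  # True: module-level value never changes
print(a["request_id"] == b["request_id"])  # False: fresh value per request
```

For state that must be refreshed after a restore, AWS documents runtime hooks for Python: the `snapshot-restore-py` package's `register_after_restore` decorator. Check the SnapStart documentation before relying on the details.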

Practical Optimization Strategy 5: Warming

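If Provisioned Concurrency is too expensive, scheduled warming is a cheaper, best-effort alternative. It reduces how often cold starts happen but cannot guarantee they won't.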

Warming with a CloudWatch Events (EventBridge) schedule

# serverless.yml
functions:
  api:
    handler: src/handler.main
    events:
      - http:
          path: /users/{id}
          method: get
      - schedule:
          rate: rate(5 minutes)
          input:
            warmer: true

// handler.js
exports.main = async (event) => {
  // Detect a warming request and return early
  if (event.warmer) {
    console.log('Warming invocation');
    return { statusCode: 200, body: 'Warmed' };
  }

  // Actual business logic (handleRequest is your own function)
  return handleRequest(event);
};
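For Python functions, the warming check looks the same. This sketch adds an optional short pause for warming calls. Without it, several warming calls sent at the same time can finish so quickly that one environment serves them all. The 0.1-second value and the `handle_request` function are assumptions for illustration.

```python
import time

WARM_HOLD_SECONDS = 0.1  # assumed pause so concurrent warming calls overlap


def handle_request(event):
    """Hypothetical business logic."""
    return {"statusCode": 200, "body": f"hello {event.get('name', 'world')}"}


def main(event, context):
    # Warming request from the schedule: skip the business logic
    if event.get("warmer"):
        time.sleep(WARM_HOLD_SECONDS)
        return {"statusCode": 200, "body": "Warmed"}
    return handle_request(event)
```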

Warming with concurrency in mind

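A single warming call keeps only one execution environment warm. To warm several environments, invoke the function several times concurrently. The calls only land on separate environments if they overlap, so the target should hold each warming call briefly (for example, around 100 ms) instead of returning instantly: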

// warmer/handler.js
const { LambdaClient, InvokeCommand } = require('@aws-sdk/client-lambda');

const lambda = new LambdaClient({});

exports.handler = async () => {
  const functionName = process.env.TARGET_FUNCTION;
  const concurrency = parseInt(process.env.WARM_CONCURRENCY, 10) || 5;

  // Invoke several instances at the same time
  const promises = Array.from({ length: concurrency }, (_, i) =>
    lambda.send(new InvokeCommand({
      FunctionName: functionName,
      InvocationType: 'RequestResponse',
      Payload: Buffer.from(JSON.stringify({ warmer: true, instance: i })),
    }))
  );

  await Promise.all(promises);

  return { warmed: concurrency };
};
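A Python version of the warmer can be sketched with `concurrent.futures`. The `invoke` callable is injected so the fan-out logic can be tested locally. In Lambda, you would pass a function that calls boto3's Lambda client. The `warm` name is hypothetical.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def warm(invoke: Callable[[str, bytes], object], function_name: str,
         concurrency: int = 5) -> int:
    """Fire `concurrency` warming invocations at once; return how many completed."""
    if concurrency < 1:
        return 0
    payloads = [json.dumps({"warmer": True, "instance": i}).encode()
                for i in range(concurrency)]
    # One worker per call, so all invocations are in flight at the same time
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda payload: invoke(function_name, payload),
                                payloads))
    return len(results)
```

In a Lambda function, a call might look like this: `warm(lambda name, payload: client.invoke(FunctionName=name, InvocationType="RequestResponse", Payload=payload), os.environ["TARGET_FUNCTION"])`, with `client = boto3.client("lambda")`.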

Monitoring and Measurement

If you don't measure the effect of an optimization, you can't tell whether it helped.

CloudWatch Logs Insights queries

# Cold start ratio (run each query separately)
fields @timestamp, @message
| filter @type = "REPORT"
| parse @message /Init Duration: (?<initDuration>[0-9.]+) ms/
| stats
    count(*) as totalInvocations,
    count(initDuration) as coldStarts,
    count(initDuration) / count(*) * 100 as coldStartPercentage,
    avg(initDuration) as avgColdStartMs,
    max(initDuration) as maxColdStartMs
| limit 1

# Cold starts per hour
fields @timestamp, @message
| filter @type = "REPORT"
| parse @message /Init Duration: (?<initDuration>[0-9.]+) ms/
| filter ispresent(initDuration)
| stats count(*) as coldStarts by bin(1h) as hour
| sort hour asc
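To sanity-check the Logs Insights numbers, or to analyze exported logs offline, you can compute the same cold-start ratio from raw `REPORT` lines. The sketch relies on the `Init Duration: ... ms` field, which Lambda appends to the REPORT line only for invocations that went through initialization. The sample lines are made up for illustration.

```python
import re
from statistics import mean

INIT_RE = re.compile(r"Init Duration: ([0-9.]+) ms")


def cold_start_stats(lines):
    """Return (total, cold_starts, cold_pct, avg_init_ms) for Lambda REPORT lines."""
    reports = [line for line in lines if line.startswith("REPORT")]
    # Only cold-start invocations carry an Init Duration field
    inits = [float(m.group(1)) for line in reports
             if (m := INIT_RE.search(line))]
    total = len(reports)
    pct = 100 * len(inits) / total if total else 0.0
    avg = mean(inits) if inits else 0.0
    return total, len(inits), pct, avg


sample = [  # made-up log lines
    "START RequestId: a1 Version: $LATEST",
    "REPORT RequestId: a1 Duration: 12.0 ms Billed Duration: 13 ms "
    "Memory Size: 512 MB Max Memory Used: 80 MB Init Duration: 250.0 ms",
    "REPORT RequestId: b2 Duration: 8.0 ms Billed Duration: 8 ms "
    "Memory Size: 512 MB Max Memory Used: 80 MB",
    "REPORT RequestId: c3 Duration: 9.0 ms Billed Duration: 9 ms "
    "Memory Size: 512 MB Max Memory Used: 81 MB",
    "REPORT RequestId: d4 Duration: 11.0 ms Billed Duration: 11 ms "
    "Memory Size: 512 MB Max Memory Used: 82 MB Init Duration: 350.0 ms",
]
print(cold_start_stats(sample))  # (4, 2, 50.0, 300.0)
```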

Detailed analysis with X-Ray

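// Enable X-Ray tracing (Active tracing must also be turned on for the function)
const AWSXRay = require('aws-xray-sdk-core');
const { DynamoDBClient } = require('@aws-sdk/client-dynamodb');

// Instrument an SDK v3 client so its calls appear as subsegments
// (captureAWS(require('aws-sdk')) only works with SDK v2)
const ddb = AWSXRay.captureAWSv3Client(new DynamoDBClient({}));

let initialized = false;

exports.handler = async (event) => {
  const segment = AWSXRay.getSegment();

  // Measure initialization; it only does work on the first call in this environment
  if (!initialized) {
    const initSubsegment = segment.addNewSubsegment('Initialization');
    await initialize();  // your own setup code (not shown)
    initialized = true;
    initSubsegment.close();
  }

  // Measure the business logic
  const processSubsegment = segment.addNewSubsegment('Processing');
  try {
    return await processRequest(event, ddb);  // your own logic (not shown)
  } finally {
    processSubsegment.close();
  }
};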

Summary: Choosing a Strategy for Your Situation

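Situation	Recommended strategy	Expected effect (rough estimate)
Cost-sensitive, irregular traffic	Package optimization + warming	About 50% less cold start impact
Low latency required (payments, etc.)	Provisioned Concurrency	No cold starts within provisioned capacity
Java runtime	SnapStart	About 90% shorter cold starts
Predictable traffic	PC + Auto Scaling	Cold starts eliminated at optimized cost
Infrequently invoked functions	Package optimization only	30-50% shorter cold starts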

Cold starts stem from the fundamental nature of serverless. Even so, the right combination of strategies can deliver production-grade performance. The key is finding the balance between cost and performance.

You don't need Provisioned Concurrency on every function. The realistic approach is to apply it selectively to the critical APIs that directly affect user experience. Handle the rest with package and code optimization.