SPRi - Software Policy & Research Institute

IS-230

The Structural Role of Software in the AI Infrastructure Competition

강호준, Senior Researcher, AI Policy Research Office
안성원, Director, AI Policy Research Office

2026.04.09

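Global AI spending is projected to reach $2.5 trillion in 2026, with more than half concentrated in infrastructure such as servers, accelerators, and data centers. NVIDIA GPUs sit at the center of this investment, and NVIDIA maintains overwhelming dominance with roughly 86% of data center GPU revenue. This dominance, however, does not stem from hardware (HW) performance alone. Even on the same H100 chip, actual throughput differs by more than 3x depending on the level of software (SW) optimization, and the SW ecosystem NVIDIA has accumulated over roughly 20 years since launching CUDA in 2006 forms a structural barrier to entry.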

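The report divides the AI SW stack into four layers (framework, compiler, acceleration library, and driver/runtime) and traces the technical path by which each layer creates HW lock-in. From this it derives three types of lock-in mechanism: performance lock-in, in which optimal performance is achieved only on the CUDA path; design lock-in, in which the choice of SW determines the HW, as with JAX-XLA-TPU; and structural lock-in, in which a closed driver architecture physically blocks HW substitution. Each type operates through a different mechanism, and when the three overlap, switching costs increase exponentially. In addition, open-source inference serving engines such as vLLM and SGLang, together with KV cache optimization layers such as LMCache, are emerging as new variables that partially ease the existing lock-in structure.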

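Analyzing major countries and companies through the three-type lock-in framework shows that NVIDIA has built a dual barrier of performance lock-in and structural lock-in, Google has built a separate path of design lock-in, and Huawei is replicating and internalizing the three-type lock-in structure domestically.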

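Applied to K-NPU, this analytical framework places Korea's NPU ecosystem in a structural position where "entry into the framework layer has succeeded, but the performance gap in the compiler and library layers and the insufficient scale of the operational ecosystem constrain market expansion." With native PyTorch support and vLLM integration, Stage 1 (framework entry) has been achieved, Stage 2 (resolving performance lock-in) is in progress, and Stage 3 (securing an operational ecosystem) is at an early stage. In particular, a circular structure exists: unless the Stage 2 performance gap narrows, it is difficult to accumulate Stage 3 operational references, and without Stage 3 references it is difficult to justify Stage 2 investment.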

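Based on this diagnosis, the report proposes policy responses for each lock-in type. First, to resolve performance lock-in, a paradigm shift toward balanced HW-SW development is needed, expanding R&D support from its current focus on chip design to the entire SW ecosystem, including compilers, runtimes, and SDKs. Second, to mitigate performance lock-in, the optimization gap should be narrowed cooperatively by securing PyTorch compatibility and participating in global open-source standards such as OpenXLA and MLIR. Third, to circumvent structural lock-in, the public sector, including national AI data centers, should provide demonstration environments so that large-scale operational references can be secured and the circular structure broken. Fourth, a TCO-based evaluation system should be introduced to make visible the switching costs that all three lock-in types commonly generate. Fifth, a training system should be established for specialists in AI compilers and system SW, who are the people that will carry out all of these policies.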

Executive Summary

Global AI spending is projected to reach $2.5 trillion in 2026, with over half flowing into infrastructure. NVIDIA dominates this landscape, capturing roughly 86% of data center GPU revenue. Yet this dominance is not purely a hardware story: on identical H100 chips, software optimization alone can produce over 3x differences in actual throughput. The software ecosystem built over nearly two decades since CUDA's 2006 launch constitutes a structural barrier that competitors cannot easily replicate.

This report classifies the AI software stack into four layers—Framework, Compiler, Acceleration Library, and Driver/Runtime—and derives three distinct lock-in mechanisms: performance lock-in, where optimization asymmetries cause de facto convergence toward specific hardware; design lock-in, where framework-compiler-hardware co-design fixes the hardware path at the point of software selection; and structural lock-in, where the closed-source driver/runtime physically blocks hardware substitution. When these types overlap, switching costs increase exponentially. Meanwhile, open-source inference serving engines such as vLLM and SGLang are emerging as new variables that partially mitigate traditional lock-in structures.

Analyzing major players through this framework reveals that NVIDIA maintains a dual barrier of performance and structural lock-in, Google constructs a separate design lock-in pathway through TPU-XLA-JAX, and Huawei replicates all three lock-in types domestically through Ascend-CANN-MindSpore.

Applying this framework to K-NPU, we diagnose that Korea's NPU ecosystem has successfully entered the framework layer through PyTorch native support and vLLM integration, but faces a sequential three-stage challenge: Stage 1 (framework entry) is achieved, Stage 2 (resolving performance lock-in in compiler and library layers) is in progress, and Stage 3 (building operational ecosystem scale) remains nascent. A circular dependency exists between Stages 2 and 3—performance gaps hinder reference accumulation, while lack of references undermines investment justification.

Based on this diagnosis, we recommend lock-in-type-specific policy responses: (1) resolving performance lock-in by expanding R&D from chip design to the full software stack; (2) mitigating performance lock-in through participation in global open-source projects such as OpenXLA and MLIR; (3) circumventing structural lock-in by creating public-sector demand to build large-scale operational references that break the circular dependency; (4) introducing a TCO evaluation framework to make switching costs across all three lock-in types visible and quantifiable; and (5) establishing talent pipelines for AI compiler and system software specialists as the execution foundation for all policy measures.