Data Acquisition Modes in LC-MS/MS-Based Proteomics and the Core Principles of DIA Deconvolution

Modern LC-MS/MS-based proteomics has to analyze thousands to tens of thousands of peptides in a single run.

A mass spectrometer cannot isolate and fragment every precursor ion at once, so which precursors are selected, and how they are fragmented, matters a great deal.

This MS/MS acquisition strategy largely determines how well a proteomics experiment performs.

Two acquisition modes dominate proteomics today.

DDA (Data-Dependent Acquisition)
DIA (Data-Independent Acquisition)

Early shotgun proteomics developed mostly around DDA, but DIA has recently been spreading quickly as a core technology in proteomics workflows.

DIA in particular offers:

high reproducibility
deep proteome coverage
strong quantitative capability

In return, it brings:

highly complex chimeric MS/MS spectra
a need for sophisticated spectral deconvolution
reliance on AI-based peptide identification

This article explains how DDA and DIA acquire data, how the resulting data structures differ, the strengths and weaknesses of each, and why DIA depends on high-resolution mass spectrometers and complex deconvolution algorithms.


Comparison of DDA (Data-Dependent Acquisition) and DIA (Data-Independent Acquisition) workflows in LC-MS/MS proteomics. The figure illustrates precursor selection, fragmentation mechanisms, chimeric spectrum generation, and the role of chromatographic co-elution and spectral deconvolution in DIA analysis.

Why MS/MS Acquisition Matters in Proteomics

LC-MS/MS-based proteomics does not read peptides directly. Instead, it infers them through:

precursor ion detection
precursor fragmentation
fragment pattern analysis

Identification in proteomics therefore comes down to:

which precursors are selected
which fragment spectra are acquired
how accurately the fragments can be resolved

The acquisition strategies that decide these points are:

DDA (Data-Dependent Acquisition)
DIA (Data-Independent Acquisition)

What Is Data-Dependent Acquisition (DDA)?

DDA (Data-Dependent Acquisition) is the most traditional acquisition mode in proteomics.

In DDA, the analysis proceeds in this order:

perform an MS1 full scan
automatically select high-intensity precursors
fragment the selected precursors
acquire MS/MS spectra

In other words, acquisition depends on the data actually observed (the signal intensity), which is where the name Data-Dependent Acquisition comes from.

DDA Workflow

A typical workflow looks like this.

MS1 Full Scan
Top-N precursor selection
Isolation
CID/HCD/ETD fragmentation
MS/MS acquisition

On an Orbitrap instrument, for example, one cycle might:

select the 20 most intense precursors in the MS1 scan
fragment them sequentially
generate an MS/MS spectrum for each

This is commonly called Top-N DDA. Typical settings include:

Top10
Top20
Top40

Key Characteristics of DDA

At its core, DDA is:

signal-intensity-based precursor selection

That is, the more abundant peptides are fragmented first.
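The intensity-based selection described above can be sketched in a few lines of Python. This is only an illustration, not how any vendor's acquisition software is implemented: the function name `select_top_n`, the peak values, and the simple exclusion list (a stand-in for the dynamic exclusion that real instruments commonly apply) are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)

def select_top_n(ms1_peaks: List[Peak], n: int,
                 excluded_mz: Sequence[float] = (),
                 tol: float = 0.01) -> List[Peak]:
    """Return the n most intense MS1 peaks that are not on the exclusion list."""
    def is_excluded(mz: float) -> bool:
        # A peak is skipped if it lies within `tol` of an excluded m/z.
        return any(abs(mz - ex) <= tol for ex in excluded_mz)

    candidates = [p for p in ms1_peaks if not is_excluded(p[0])]
    # Sort by intensity, highest first, and keep the first n.
    return sorted(candidates, key=lambda p: p[1], reverse=True)[:n]

# Hypothetical MS1 scan with four precursors.
ms1 = [(445.12, 3.0e6), (512.27, 8.5e6), (623.84, 1.2e5), (701.40, 4.1e6)]

top2 = select_top_n(ms1, n=2)
top2_after_exclusion = select_top_n(ms1, n=2, excluded_mz=[512.27])
```

With `n=2` the weakest precursor (m/z 623.84) is never selected in either call, which is the abundance bias that the rest of this section discusses.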

Advantages of DDA

1. Relatively Clean MS/MS Spectra

Because a single precursor is isolated before fragmentation:

fragment assignment is unambiguous
spectral interpretation is comparatively easy
database searching is robust

2. Well Suited to Building Spectral Libraries

Because DDA yields clean spectra, it is well suited to building:

peptide spectral libraries
PTM spectral databases

3. Good Compatibility with Traditional Proteomics Workflows

DDA works well with classical search engines such as:

Mascot
Sequest
Andromeda
PEAKS

4. Strong for PTM Analysis

When interpreting complex PTM fragmentation, the following are often an advantage:

cleaner spectra
unambiguous fragment assignment

Limitations of DDA

DDA nevertheless has some serious problems.

1. Stochastic Sampling

Because DDA selects precursors by intensity, the set of precursors chosen can differ from run to run.

This is called stochastic sampling.

2. Missing Values

For example, the following can happen:

peptide A is selected in Run 1
peptide A is not selected in Run 2

This occurs most often with:

low abundance peptides
co-eluting peptides

It is a major problem for quantitative proteomics.
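One simple way to see the effect is to count, for each peptide, the fraction of runs in which it was not identified. The sketch below assumes the identifications from each run are already available as sets; the run names and peptide labels are made up for illustration.

```python
from typing import Dict, Set

def missing_fraction(runs: Dict[str, Set[str]]) -> Dict[str, float]:
    """Fraction of runs in which each peptide was NOT identified."""
    all_peptides = set().union(*runs.values())
    n_runs = len(runs)
    return {
        pep: sum(pep not in ids for ids in runs.values()) / n_runs
        for pep in all_peptides
    }

# Hypothetical identifications from three replicate DDA runs.
runs = {
    "run1": {"peptide A", "peptide B", "peptide C"},
    "run2": {"peptide B", "peptide C"},
    "run3": {"peptide A", "peptide C"},
}

missing = missing_fraction(runs)
# Peptides seen in every run are the only ones usable without imputation.
complete = sorted(p for p, frac in missing.items() if frac == 0)
```

Here only peptide C is quantified in all three runs; peptides A and B each have one missing value even though they were present in the sample.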

3. Limited Dynamic Range

In complex biological samples, two effects appear:

high abundance peptides dominate precursor selection
low abundance peptides are crowded out

4. Limited Run-to-Run Reproducibility

Because acquisition itself is dynamic in DDA, the following can be limited:

consistency between replicates
quantitative reproducibility

What Is Data-Independent Acquisition (DIA)?

DIA (Data-Independent Acquisition) emerged as a way to overcome the limitations of DDA.

Its core idea is very different.

In DIA, the instrument:

does not pick out specific precursors
fragments everything within a wide m/z window

Acquisition therefore does not depend on precursor intensity, hence the name Data-Independent Acquisition.

DIA Workflow

For example, isolation windows such as the following are fragmented one after another.

400–425 m/z
425–450 m/z
450–475 m/z
475–500 m/z

All precursors inside each window are fragmented together.

DIA is thus a strategy that aims to:

record fragment information for every detectable peptide in the scanned m/z range
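The window scheme above can be written down as a short Python sketch. It assumes fixed-width, non-overlapping windows, as in the example; the helper names and the precursor m/z values are hypothetical, and real methods often use variable or overlapping windows.

```python
from typing import Dict, List, Optional, Tuple

Window = Tuple[float, float]  # (lower m/z, upper m/z)

def make_windows(start: float, stop: float, width: float) -> List[Window]:
    """Build consecutive fixed-width isolation windows covering [start, stop)."""
    windows = []
    lo = start
    while lo < stop:
        windows.append((lo, min(lo + width, stop)))
        lo += width
    return windows

def window_for(mz: float, windows: List[Window]) -> Optional[int]:
    """Index of the window that contains mz, or None if it is out of range."""
    for i, (lo, hi) in enumerate(windows):
        if lo <= mz < hi:
            return i
    return None

windows = make_windows(400.0, 500.0, 25.0)

# Hypothetical precursor m/z values.
precursors = {"peptide A": 431.2, "peptide B": 438.7,
              "peptide C": 447.9, "peptide D": 462.3}

by_window: Dict[Optional[int], List[str]] = {}
for name, mz in precursors.items():
    by_window.setdefault(window_for(mz, windows), []).append(name)
```

In this toy case peptides A, B, and C all fall into the 425–450 m/z window, so a single MS/MS scan of that window would contain fragments from all three, regardless of their intensities.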

Key Characteristics of DIA

DIA is an acquisition mode that tries to:

retain fragmentation data for as many precursors as possible

Its strengths are therefore:

comprehensive acquisition
improved reproducibility
greater quantitative stability

Advantages of DIA

1. High Reproducibility

Because every precursor window is measured repeatedly, cycle after cycle, DIA substantially reduces:

run-to-run variability
missing values

2. Very Strong for Quantitative Proteomics

It provides robust quantitative stability, especially in:

large cohort studies
clinical proteomics
biomarker discovery

3. Better Detection of Low-Abundance Peptides

Peptides that would not have been selected in DDA:

are more likely to be fragmented in DIA

4. Deep Proteome Coverage

Information on more peptides is acquired continuously throughout the run.

The Biggest Problem with DIA: Data Complexity

DIA does, however, have one very large problem:

too many precursors are fragmented at the same time

Co-Fragmentation

For example, the 425–450 m/z window may contain all of the following at once:

peptide A
peptide B
peptide C

DIA fragments them together.

What Is a Chimeric Spectrum?

As a result, a single MS/MS spectrum contains a mixture of:

fragments from several peptides
several charge states
several isotope patterns

This is called a chimeric spectrum.

DIA data are therefore, by nature, mixed fragment spectra.

Why Deconvolution Matters in DIA

The central challenge in DIA is:

working out which precursor each fragment came from

This requires:

signal disentangling
fragment assignment
chromatographic correlation

The algorithms used for this are spectral deconvolution algorithms.

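Core Principles of DIA Deconvolution

Modern DIA software combines several kinds of information to separate mixed spectra.

1. Accurate Mass

High mass accuracy at the ppm level is used to match observed fragments to their theoretical m/z.

Typical tolerances:

5 ppm
sub-ppm accuracy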
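For reference, a ppm error is the relative difference between the observed and theoretical m/z, multiplied by one million. The short sketch below shows the arithmetic; the function names and the m/z values are illustrative assumptions.

```python
from typing import Tuple

def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def within_tolerance(observed_mz: float, theoretical_mz: float,
                     tol_ppm: float = 5.0) -> bool:
    """True if the observed m/z lies within +/- tol_ppm of the theoretical m/z."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm

def mz_window(theoretical_mz: float, tol_ppm: float) -> Tuple[float, float]:
    """Absolute m/z range that corresponds to a ppm tolerance."""
    delta = theoretical_mz * tol_ppm * 1e-6
    return theoretical_mz - delta, theoretical_mz + delta

# At m/z 500, an offset of 0.002 is about 4 ppm and 0.003 is about 6 ppm.
match_ok = within_tolerance(500.002, 500.0)
match_bad = within_tolerance(500.003, 500.0)
low, high = mz_window(500.0, 5.0)
```

At m/z 500 a 5 ppm tolerance corresponds to only about ±0.0025 in m/z, which is why a tight tolerance removes many candidate assignments in a chimeric spectrum.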

2. Retention Time (RT)

Fragments that come from the same peptide share:

the same chromatographic retention time profile

That is, their signals:

rise at the same time
fall at the same time

3. Fragment Co-Elution

Fragmentation happens after LC separation, so all fragment ions from one precursor inherit that precursor's elution profile. They keep:

the same retention time profile
the same chromatographic apex
a similar peak shape

DIA software therefore computes:

the correlation coefficient between XICs (Extracted Ion Chromatograms)

and uses it to:

group fragments that elute together
disentangle the mixed spectrum

This chromatographic correlation is one of the core principles of DIA deconvolution.
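The grouping idea can be illustrated with a Pearson correlation between fragment XICs. This is a minimal sketch and not the scoring used by any particular DIA tool: the intensity traces are invented, and the 0.9 threshold is an arbitrary assumption.

```python
from math import sqrt
from typing import Dict, List, Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long intensity traces."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat trace carries no shape information
    return cov / sqrt(vx * vy)

def coeluting(reference: str, xics: Dict[str, List[float]],
              threshold: float = 0.9) -> List[str]:
    """Names of fragments whose XIC correlates with the reference XIC."""
    ref = xics[reference]
    return [name for name, trace in xics.items()
            if name != reference and pearson(ref, trace) >= threshold]

# Hypothetical XICs sampled at the same seven retention time points.
xics = {
    "frag_1": [0, 10, 50, 100, 50, 10, 0],
    "frag_2": [0, 4, 20, 40, 20, 4, 0],    # same shape, lower intensity
    "frag_3": [100, 50, 10, 0, 0, 0, 0],   # elutes earlier
}

group = coeluting("frag_1", xics)
```

`frag_2` has the same peak shape and apex as `frag_1` and is grouped with it even though it is less intense, while `frag_3` elutes earlier and is left out.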

4. Spectral Library Matching

Observed fragments are compared against an existing peptide fragmentation library to check:

peptide identity
fragment consistency
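One common way to express "fragment consistency" is a cosine similarity (normalized dot product) between library and observed fragment intensities. The sketch below assumes this metric purely for illustration; actual tools combine many scores, and the fragment labels and intensities here are made up.

```python
from math import sqrt
from typing import Dict

def cosine_similarity(library: Dict[str, float],
                      observed: Dict[str, float]) -> float:
    """Cosine similarity between two fragment-intensity maps (label -> intensity)."""
    labels = set(library) | set(observed)
    # Fragments missing from one side contribute zero intensity.
    dot = sum(library.get(k, 0.0) * observed.get(k, 0.0) for k in labels)
    norm_lib = sqrt(sum(v * v for v in library.values()))
    norm_obs = sqrt(sum(v * v for v in observed.values()))
    if norm_lib == 0 or norm_obs == 0:
        return 0.0
    return dot / (norm_lib * norm_obs)

# Hypothetical library entry with relative fragment intensities.
library = {"y3": 1.0, "y4": 0.6, "y5": 0.3, "b2": 0.2}

# Same intensity ratios as the library, just on a different scale.
good_match = {"y3": 2000.0, "y4": 1200.0, "y5": 600.0, "b2": 400.0}
# Different ratios, and y5 was not observed at all.
poor_match = {"y3": 100.0, "y4": 900.0, "b2": 800.0}

score_good = cosine_similarity(library, good_match)
score_poor = cosine_similarity(library, poor_match)
```

Because the measure is scale-independent, `good_match` scores close to 1 despite its much larger absolute intensities, whereas `poor_match` scores clearly lower.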

5. Isotope Pattern Consistency

Checking that isotope patterns are consistent helps to:

reduce false assignments
increase confidence in fragment assignments
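A simple consistency check is the spacing between isotope peaks: for an ion of charge z, neighbouring isotope peaks are separated by roughly 1.00335/z in m/z (the 13C–12C mass difference divided by the charge). The sketch below uses that approximation; the function name, tolerance, and peak lists are assumptions for illustration.

```python
from typing import Optional, Sequence

C13_C12_DELTA = 1.003355  # approximate 13C - 12C mass difference in Da

def infer_charge(isotope_mzs: Sequence[float], max_charge: int = 4,
                 tol: float = 0.005) -> Optional[int]:
    """Charge state whose expected isotope spacing fits every observed spacing.

    Returns None when no charge from 1 to max_charge fits, which suggests
    that the peaks do not belong to one isotope envelope.
    """
    spacings = [b - a for a, b in zip(isotope_mzs, isotope_mzs[1:])]
    if not spacings:
        return None
    for z in range(1, max_charge + 1):
        expected = C13_C12_DELTA / z
        if all(abs(s - expected) <= tol for s in spacings):
            return z
    return None

# Hypothetical peak lists (m/z values in ascending order).
consistent = [500.2500, 500.7517, 501.2534]    # spacing ~0.5017, fits z = 2
inconsistent = [500.2500, 500.7500, 501.4500]  # spacings 0.5 and 0.7

z_consistent = infer_charge(consistent)
z_inconsistent = infer_charge(inconsistent)
```

Peaks that do not fit a single charge state, as in the second list, are likely to come from different ions and should not be assigned to the same fragment.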

Why High-Resolution Mass Spectrometry (HRMS) Matters in DIA

Because DIA data are so complex, two things are essential:

high mass accuracy
high resolving power

Why Orbitrap/QTOF Matter

High-resolution instruments enable:

narrow mass tolerances
isotope separation
fragment discrimination
interference reduction

What Goes Wrong at Low Resolution

With low-resolution MS, the following become serious problems:

more fragment overlap
more false assignments
more unresolved interference from co-isolated precursors
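Resolving power is usually defined as R = m/Δm, where Δm is the peak width (FWHM) at m/z m. As a rough rule of thumb, two peaks need a resolving power of at least their mean m/z divided by their separation to be distinguished. The sketch below applies that simplified criterion; real peak separation also depends on peak shape and relative intensity, and the m/z values are hypothetical.

```python
def required_resolving_power(mz1: float, mz2: float) -> float:
    """Rule-of-thumb resolving power (m / delta m) needed to separate two peaks."""
    return ((mz1 + mz2) / 2) / abs(mz2 - mz1)

def is_resolved(mz1: float, mz2: float, resolving_power: float) -> bool:
    """True if the given resolving power meets the rule-of-thumb requirement."""
    return resolving_power >= required_resolving_power(mz1, mz2)

# Two hypothetical fragments only 0.02 m/z apart near m/z 500.
needed = required_resolving_power(500.00, 500.02)  # roughly 25,000

resolved_high = is_resolved(500.00, 500.02, resolving_power=60_000)
resolved_low = is_resolved(500.00, 500.02, resolving_power=15_000)
```

Under this criterion the two fragments are separated at a resolving power of 60,000 but merge into one peak at 15,000, in which case neither can be assigned cleanly.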

DIA and AI-Based Algorithms

Recent progress in DIA has been driven mainly by:

advances in software algorithms

In particular, the following are becoming very important in DIA proteomics:

AI
machine learning
neural network scoring

Representative DIA Software

DIA-NN

A DIA analysis platform built on deep neural networks.

Spectronaut

A leading commercial DIA software package.

OpenSWATH

Open-source DIA workflow.

Skyline

Supports targeted proteomics and DIA.

EncyclopeDIA

Library-based DIA analysis software.

DIA Analysis Workflow

In practice, a DIA workflow generally goes through the following stages.

Raw DIA Data
Signal Extraction
Spectral Deconvolution
Library Matching
FDR Filtering
Quantification

In modern DIA proteomics, then:

software itself has become central to proteomics performance
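The FDR filtering stage is often based on a target-decoy approach, in which the number of decoy matches above a score cutoff estimates the number of false target matches. The text above does not specify a method, so the sketch below assumes this common scheme; the scores are invented, and the 30% threshold is used only to keep the toy example small (real analyses typically filter at around 1%).

```python
from typing import List, Tuple

Match = Tuple[float, bool]  # (score, is_decoy)

def filter_at_fdr(matches: List[Match], max_fdr: float = 0.01) -> List[float]:
    """Scores of target matches accepted at the given decoy-estimated FDR.

    Matches are ranked by score; the cutoff is the deepest rank at which
    (#decoys / #targets) is still within max_fdr.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = decoys = 0
    cutoff = 0
    for i, (_, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff = i
    return [score for score, is_decoy in ranked[:cutoff] if not is_decoy]

# Hypothetical scored matches; True marks a decoy.
matches = [(9.1, False), (8.7, False), (8.2, False), (7.9, False),
           (7.5, True), (7.1, False), (6.0, True), (5.5, False)]

accepted = filter_at_fdr(matches, max_fdr=0.3)
```

In this toy list the five best target matches are kept (one decoy among them gives an estimated FDR of 20%), and the lowest-scoring target is rejected because accepting it would require admitting a second decoy.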

DDA vs DIA Comparison Table

Item	DDA	DIA
Precursor selection	intensity-based	entire window
Data complexity	relatively low	very high
Reproducibility	limited	very good
Missing values	many	few
Low abundance detection	limited	good
Spectral purity	high	low
Need for deconvolution	limited	very important
Software dependence	moderate	very high
Quantitative proteomics	limited	very strong

Why DIA Is Changing the Proteomics Paradigm

Proteomics has recently been moving toward:

quantifying as many peptides as possible, reliably

DIA fits this goal very well.

DIA adoption is growing especially quickly in the following areas.

biomarker discovery
clinical proteomics
cohort analysis
longitudinal proteomics
systems biology

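Practical Troubleshooting: Points to Watch in DIA Analysis

1. Isolation Window Size

If the windows are too wide, the result is:

more co-fragmentation
greater chimeric complexity

2. Spectral Library Quality

A low-quality library leads to more:

false identifications
incorrect matches

3. Retention Time Alignment

RT drift has a large effect on DIA performance, because library retention times are used to decide where to look for each peptide.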
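As a minimal illustration of RT alignment, the sketch below fits a straight line that maps library retention times to those observed in a run, using a few anchor peptides. The linear model is an assumption chosen for brevity (DIA tools generally use more flexible, nonlinear calibration), and all retention times are invented.

```python
from typing import Sequence, Tuple

def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical anchor peptides: library RT vs RT observed in this run (minutes).
library_rt = [10.0, 20.0, 30.0, 40.0]
observed_rt = [10.5, 21.0, 31.5, 42.0]

slope, intercept = fit_line(library_rt, observed_rt)

def predict_rt(lib_rt: float) -> float:
    """Expected RT in this run for a peptide with the given library RT."""
    return slope * lib_rt + intercept

# A peptide with library RT 25.0 min is expected near 26.25 min in this run.
expected = predict_rt(25.0)
```

Searching around the calibrated retention time rather than the raw library value keeps the extraction window narrow, which limits the number of interfering signals that have to be considered.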

4. Mass Calibration

As the ppm error grows, the following get worse:

deconvolution performance
peptide assignment errors

5. Chromatographic Separation

Poor LC separation quality aggravates:

co-elution
spectral complexity

Conclusion

The difference between DDA and DIA goes beyond acquisition. It reflects:

two different philosophies of data interpretation in modern proteomics

The strengths of DDA are:

cleaner spectra
precursor-centered acquisition

DIA, in contrast, offers:

comprehensive acquisition
high reproducibility
strong quantitative capability

But because DIA inherently produces:

complex mixed fragment spectra

it depends on:

high-resolution mass spectrometry (HRMS)
chromatographic correlation
spectral deconvolution
AI-based peptide identification

In modern proteomics, software and data interpretation are becoming as important as instrument performance.

Spectra and workflow illustrations shown in this article were generated or adapted for educational purposes using Willy’s LCMS concepts and proteomics interpretation workflows.
