[Paper] EDNet: Efficient Disparity Estimation with Cost Volume Combination and Attention-based Spatial Residual

1. Motivation

Disparity estimation is a core task for computing depth from stereo image pairs. Existing cost-volume-based methods achieve high accuracy but are computationally expensive. EDNet combines Cost Volume Combination with an Attention-based Spatial Residual to make disparity estimation efficient.

(The original notes on this paper are incomplete; the content below was reconstructed from the paper title and its context.)

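2. Related Work

PSMNet, GwcNet: Process the cost volume with 3D convolutions, reaching high accuracy at a high computational cost.
DispNet: Fast encoder-decoder disparity estimation that prioritizes speed over accuracy.
Attention in Stereo Matching: A growing body of recent work applies self-attention to stereo matching in order to exploit non-local context.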

3. Proposed Method

Cost Volume Combination

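Instead of a single-resolution cost volume, EDNet combines cost volumes from multiple scales to improve computational efficiency and accuracy together. Combining the volumes integrates matching information across different disparity ranges and levels of detail.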
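
To make the idea concrete, here is a minimal single-scanline sketch in plain Python. It is not EDNet's implementation: these notes do not say how the volumes are built or merged, so the dot-product (correlation) cost, the pairwise-average downsampling, and the simple averaging of the two volumes are all assumptions, and every function name is hypothetical.

```python
def correlation_volume(left, right, max_disp):
    """Scanline correlation volume: vol[d][x] = <left[x], right[x - d]>.

    left, right: lists of per-pixel feature vectors (lists of floats).
    Positions where x - d falls outside the image get a score of 0.
    """
    volume = []
    for d in range(max_disp):
        row = []
        for x in range(len(left)):
            if x - d >= 0:
                row.append(sum(a * b for a, b in zip(left[x], right[x - d])))
            else:
                row.append(0.0)
        volume.append(row)
    return volume


def downsample(features):
    """Halve the scanline width by averaging neighbouring feature vectors."""
    return [
        [(a + b) / 2 for a, b in zip(features[i], features[i + 1])]
        for i in range(0, len(features) - 1, 2)
    ]


def combine_volumes(left, right, max_disp):
    """Average a full-resolution volume with an upsampled half-resolution one.

    Averaging is an assumed combination rule, chosen only for illustration.
    Requires a scanline of at least two pixels.
    """
    fine = correlation_volume(left, right, max_disp)
    coarse = correlation_volume(
        downsample(left), downsample(right), (max_disp + 1) // 2
    )
    last_x = len(coarse[0]) - 1
    # Nearest-neighbour upsampling: fine (d, x) reads coarse (d // 2, x // 2).
    return [
        [
            (fine[d][x] + coarse[d // 2][min(x // 2, last_x)]) / 2
            for x in range(len(left))
        ]
        for d in range(max_disp)
    ]


def winner_take_all(volume):
    """Pick, per pixel, the disparity with the highest matching score."""
    width = len(volume[0])
    return [max(range(len(volume)), key=lambda d: volume[d][x]) for x in range(width)]
```

The coarse volume covers the same disparity range with half as many candidates, which is where the cost saving would come from; the fine volume restores per-pixel detail. Passing the result of `combine_volumes` to `winner_take_all` gives an integer disparity per pixel.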

Attention-based Spatial Residual

An attention mechanism is used to estimate a spatial residual for the disparity map. Local and non-local information are combined in a context-aware way to improve accuracy around object boundaries and in regions with repetitive patterns.
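
The sketch below shows one simple way a spatial attention map can gate a residual correction on a single scanline. It is an illustration under stated assumptions, not the module from the paper: the attention here is a hand-written function of the photometric error between the left scanline and the warped right scanline rather than a learned map, the `residual` argument stands in for whatever a network would predict, and the names and the `sharpness` parameter are hypothetical.

```python
import math


def warp_right_to_left(right, disparity):
    """Sample the right scanline at x - d(x) with linear interpolation."""
    width = len(right)
    warped = []
    for x, d in enumerate(disparity):
        pos = min(max(x - d, 0.0), width - 1.0)  # clamp to the valid range
        x0 = int(math.floor(pos))
        x1 = min(x0 + 1, width - 1)
        t = pos - x0
        warped.append((1 - t) * right[x0] + t * right[x1])
    return warped


def error_attention(left, right, disparity, sharpness=4.0):
    """Map the photometric error |left - warped right| to weights in [0, 1).

    Zero error gives zero attention; larger errors approach 1.
    """
    warped = warp_right_to_left(right, disparity)
    return [1.0 - math.exp(-sharpness * abs(l - w)) for l, w in zip(left, warped)]


def refine(disparity, residual, attention):
    """Apply the residual only where the attention map says the estimate is poor."""
    return [d + a * r for d, r, a in zip(disparity, residual, attention)]
```

With this gating, pixels whose current disparity already explains the image pair are left untouched, while the correction is concentrated on poorly matched pixels, which tend to sit near boundaries and repetitive textures.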

Overall Architecture

For efficiency, these modules are paired with a lightweight backbone, and the network is designed to run in real time or near real time.

4. Experiments

On the KITTI Stereo benchmark, EDNet is compared against existing methods in terms of the speed-accuracy trade-off. An ablation study verifies the individual contributions of cost volume combination and the attention-based residual.
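
These notes do not state which accuracy metrics were reported. As a reference point, the sketch below computes two measures commonly used for stereo evaluation: the mean end-point error (EPE) and the KITTI 2015 style outlier rate, where a pixel counts as an outlier if its disparity error exceeds both 3 px and 5% of the ground-truth disparity. Treating a ground-truth value of 0 as "no measurement" is an assumption about the input encoding, and the function names are hypothetical.

```python
def valid_pairs(pred, gt):
    """Keep only pixels that have a ground-truth disparity (gt > 0)."""
    return [(p, g) for p, g in zip(pred, gt) if g > 0]


def end_point_error(pred, gt):
    """Mean absolute disparity error over valid pixels."""
    pairs = valid_pairs(pred, gt)
    return sum(abs(p - g) for p, g in pairs) / len(pairs)


def d1_outlier_rate(pred, gt, abs_thresh=3.0, rel_thresh=0.05):
    """Fraction of valid pixels whose error exceeds both thresholds."""
    pairs = valid_pairs(pred, gt)
    outliers = sum(
        1
        for p, g in pairs
        if abs(p - g) > abs_thresh and abs(p - g) > rel_thresh * g
    )
    return outliers / len(pairs)
```

For example, with predictions `[10, 20, 50, 104, 7]` and ground truth `[10, 24, 50, 100, 0]`, the last pixel is ignored, the EPE is 2.0, and only the second pixel is an outlier: the fourth has a 4 px error, but that is below 5% of its ground-truth disparity of 100.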

5. Conclusion & Limitation

By combining Cost Volume Combination with the Attention-based Spatial Residual, EDNet aims for efficient disparity estimation without a large sacrifice in accuracy.

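(Because the original notes are incomplete, the analysis of the method and the experimental results is limited. It should be supplemented later by consulting the paper itself.)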
