[Paper] VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples

Motivation

Extend the MoCo architecture to the video domain

(Omitted)

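Propose temporally adversarial learning to improve the feature representation of the encoder
A ConvLSTM outputs a frame mask → the discriminator (the encoder) outputs the query feature and the frame feature → the difference is 0 when the frames are identical and largest when frames are masked
Training maximizes this difference so that the features of the masked frames are learned well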
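To make the masking step concrete, here is a minimal pure-Python sketch. It rests on several assumptions that are not from the paper: the per-frame `scores` stand in for the ConvLSTM output, mean pooling stands in for the encoder, and the 25% `drop_ratio` and all function names are hypothetical. It only illustrates the point in the notes that the feature gap is 0 when no frame is removed and grows when informative frames are masked.

```python
def temporal_mask(scores, drop_ratio=0.25):
    """Return a 0/1 mask per frame that drops the highest-scoring frames.

    scores     : one importance score per frame (stand-in for a ConvLSTM output)
    drop_ratio : fraction of frames to remove; assumed to be < 1
    """
    n_drop = int(len(scores) * drop_ratio)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    dropped = set(ranked[:n_drop])
    return [0 if i in dropped else 1 for i in range(len(scores))]


def encode(frames, mask):
    """Toy encoder (assumption): average of the frames that the mask keeps."""
    kept = [f for f, m in zip(frames, mask) if m]
    dim = len(frames[0])
    return [sum(f[d] for f in kept) / len(kept) for d in range(dim)]


def feature_gap(frames, mask):
    """Squared L2 distance between full-clip and masked-clip features."""
    full = encode(frames, [1] * len(frames))
    masked = encode(frames, mask)
    return sum((a - b) ** 2 for a, b in zip(full, masked))


# Four 1-D "frames"; the last one carries all the signal.
frames = [[0.0], [0.0], [0.0], [4.0]]
mask = temporal_mask([0.1, 0.2, 0.3, 0.9])
gap_masked = feature_gap(frames, mask)
gap_identity = feature_gap(frames, [1, 1, 1, 1])
```

In this toy setup the mask removes the single most informative frame, so `gap_masked` is positive while `gap_identity` is exactly 0.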
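Propose temporal decay to reduce the effect of historical keys in the memory queues during contrastive learning
MoCo evaluates the contribution of each key in the queue from its representation alone, so it has no notion of how old a key is
The longer a key has stayed in the queue, the harder it becomes to compare its representation with current ones → hence temporal decay is introduced (does it play a role similar to positional encoding?)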
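One way to picture temporal decay is as a per-key weight inside an InfoNCE-style loss. The sketch below assumes an exponential weight `decay ** age` on each queued negative; the function name, the `ages` bookkeeping, and the default values are illustrative assumptions and not necessarily the exact formulation used in the paper.

```python
import math


def decayed_info_nce(q, k_pos, queue_keys, ages, tau=0.07, decay=0.99):
    """InfoNCE-style loss where each queued negative is weighted by decay**age.

    q, k_pos   : feature vectors (lists of floats), assumed L2-normalised
    queue_keys : negative key vectors held in the memory queue
    ages       : age of each queued key, 0 = newest (hypothetical bookkeeping)
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    pos = math.exp(dot(q, k_pos) / tau)
    # Older keys contribute less to the denominator.
    neg = sum((decay ** age) * math.exp(dot(q, k) / tau)
              for k, age in zip(queue_keys, ages))
    return -math.log(pos / (pos + neg))


q = [1.0, 0.0]
k_pos = [1.0, 0.0]
queue = [[0.0, 1.0], [0.0, 1.0]]  # two identical negatives
ages = [0, 5]                      # one fresh key, one stale key

loss_plain = decayed_info_nce(q, k_pos, queue, ages, tau=1.0, decay=1.0)
loss_decay = decayed_info_nce(q, k_pos, queue, ages, tau=1.0, decay=0.5)
```

With `decay=1.0` the weights are all 1 and the function reduces to an unweighted InfoNCE loss; with `decay < 1` the stale key is down-weighted, so `loss_decay` is smaller than `loss_plain` in this example.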

(Omitted)

(Omitted)