ECCV paper using the official ECCV style on the Springer LNCS track. Single-column body, anonymous submission mode, LNCS-style references via splncs04, and ORCID link support. 14-page body + unlimited references is the standard ECCV limit.
eccv/main.tex
% This is main.tex, a sample paper demonstrating the LLNCS + ECCV style.
\documentclass[runningheads]{llncs}
\usepackage[T1]{fontenc}
\usepackage{graphicx}
\usepackage{amsmath,amssymb}
\usepackage{booktabs}
\usepackage{caption}
\usepackage{subcaption}
\usepackage{multirow}
\usepackage{array}
\usepackage[accsupp]{axessibility} % Improves PDF readability for screen readers
\usepackage{color}
% ECCV style toggle file. Swap to the current year's variant.
\usepackage{eccv}
% For anonymous submission this line is commented out; for camera-ready uncomment:
% \eccvfinalcopy
\def\eccvPaperID{7777} % *** Enter the ECCV paper ID here ***
\usepackage[pagebackref,breaklinks,colorlinks]{hyperref}
\ifdefined\eccvfinalcopy
\else
\pagestyle{headings}
\mainmatter
\def\ECCVSubNumber{\eccvPaperID}
\fi
\begin{document}
\title{Self-Supervised Dense Correspondence via\\
Forward-Backward Cyclic Consistency}
\titlerunning{Self-Supervised Dense Correspondence}
\ifdefined\eccvfinalcopy
\author{First Last\inst{1}\orcidlink{0000-0000-0000-0000} \and
Jane Doe\inst{2}\orcidlink{0000-0000-0000-0000} \and
John Smith\inst{1}}
\authorrunning{F. Last et al.}
\institute{University of Example, City, Country\\
\email{[email protected]}
\and Example Research Labs, City, Country\\
\email{[email protected]}}
\else
\author{Anonymous ECCV submission\\\vspace{1em}Paper ID \eccvPaperID}
\authorrunning{ECCV-26 submission ID \eccvPaperID}
\institute{\email{Anonymous}}
\fi
\maketitle
\begin{abstract}
We propose a self-supervised method for learning dense pixel
correspondences across video frames by enforcing forward-backward
cyclic consistency. Given a pair of frames, we learn a bidirectional
flow field and penalize deviations from the identity under composition
of forward and backward maps. The method requires no labeled training
data and achieves state-of-the-art results among self-supervised
approaches on TAP-Vid-DAVIS, TAP-Vid-Kinetics, and YouTube-VOS,
improving average Jaccard by up to 9.5 points over a self-supervised
RAFT baseline. We further show that
cyclic consistency is a strict generalization of prior photometric
losses, explaining the consistent gains.
\keywords{Dense correspondence \and Self-supervised learning \and
Video tracking \and Cycle consistency}
\end{abstract}
\section{Introduction}
Dense correspondence underpins optical flow, tracking, video editing,
and 3D reconstruction. Supervised methods rely on synthetic or
expensively annotated data; self-supervised methods are abundant but
have lagged in accuracy, particularly for long temporal horizons.
We revisit cyclic consistency, a well-known idea in computer vision,
and propose an especially strong form that enforces agreement across
non-adjacent frames via composition of intermediate flows. This yields
a dense self-supervised signal that scales with video length.
\paragraph{Contributions.} (1) A cyclic self-supervised objective
applicable to any dense-correspondence architecture. (2) Theoretical
analysis showing that standard photometric self-supervision is a strict
weakening of our objective. (3) State-of-the-art self-supervised
results on three long-horizon tracking benchmarks.
\section{Related Work}
Self-supervised optical flow~\cite{meister2018unflow,liu2020flow}, point
tracking~\cite{doersch2022tap}, and cycle consistency in vision are the
directly relevant prior work. Our contribution is the combination of
high-order cycles with modern dense-tracker architectures.
\section{Method}
\subsection{Cyclic Consistency Loss}
Given a frame pair $(I_t, I_{t+1})$, we compute a forward map $F$ and
backward map $B$. Per-pixel consistency is enforced as
\begin{equation}
\mathcal{L}_{\text{cyc}} = \mathbb{E}_{x} \big\| B(F(x)) - x \big\|^2.
\label{eq:cyc}
\end{equation}
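% Illustrative expansion only: the displacement fields $u$ and $v$ are notation assumed here for exposition.
As an illustration, suppose the maps are parameterized by displacement
fields, $F(x) = x + u(x)$ and $B(y) = y + v(y)$ (a parameterization
assumed here for exposition). The residual in Eq.~\eqref{eq:cyc} then
expands to
\begin{equation*}
B(F(x)) - x = \big(x + u(x)\big) + v\big(x + u(x)\big) - x = u(x) + v\big(x + u(x)\big),
\end{equation*}
so the loss vanishes exactly when the backward displacement, sampled at
the forward-warped location, cancels the forward displacement.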
For frame triples we enforce a three-hop cycle:
$\mathcal{L}_{\text{cyc}}^{(3)} = \mathbb{E}_x \| (B \circ B \circ B \circ F \circ F \circ F)(x) - x \|^2$.
\subsection{Photometric Loss}
As a secondary signal, we use census-transform photometric matching
across warped frames.
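% Reference sketch only: the offset set $\mathcal{N}$, the warped frame $\tilde{I}_{t+1}$, and the hard (non-soft) census variant are assumptions for illustration.
For reference, a textbook form of such a loss, which may differ from the
exact variant used in practice, compares binary census descriptors by
Hamming distance. With $\tilde{I}_{t+1}(x) = I_{t+1}(F(x))$ denoting the
warped frame and $\mathcal{N}$ a small set of pixel offsets,
\begin{align*}
C_I(x) &= \big( \mathbf{1}[\, I(x + d) > I(x) \,] \big)_{d \in \mathcal{N}}, \\
\mathcal{L}_{\text{photo}} &= \mathbb{E}_{x} \, \big\| C_{I_t}(x) - C_{\tilde{I}_{t+1}}(x) \big\|_1 .
\end{align*}
Because the descriptor depends only on intensity orderings, it is
unchanged by strictly increasing brightness transformations, which is the
usual motivation for preferring it over raw intensity differences.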
\subsection{Architecture}
We build on RAFT~\cite{teed2020raft}, adding cyclic supervision on top
of its iterative refinement.
\section{Experiments}
\subsection{Setup}
We pretrain on an unlabeled 20k-video subset of Kinetics and fine-tune
on each benchmark's training split.
\begin{table}[t]
\centering
\small
\caption{Average Jaccard (AJ, higher is better) on TAP-Vid-DAVIS and
TAP-Vid-Kinetics. Best results in bold.}
\label{tab:tapvid}
\begin{tabular}{lcc}
\toprule
Method & DAVIS (AJ) & Kinetics (AJ) \\
\midrule
DINO-track~\cite{caron2021dino} & 32.4 & 27.8 \\
RAFT (self-supervised) & 38.1 & 31.2 \\
\textbf{Ours} & \textbf{47.6} & \textbf{40.3} \\
\bottomrule
\end{tabular}
\end{table}
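% Arithmetic on the entries of Table~\ref{tab:tapvid}; no new results are introduced.
Relative to the self-supervised RAFT baseline, the gains in
Table~\ref{tab:tapvid} work out to
\begin{align*}
\text{DAVIS:} \quad & 47.6 - 38.1 = 9.5~\text{AJ}, \\
\text{Kinetics:} \quad & 40.3 - 31.2 = 9.1~\text{AJ}.
\end{align*}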
\subsection{Ablations}
Removing the higher-order cycle hurts tracking by 11.3 AJ points.
Removing the photometric secondary signal costs 2.7 points.
\section{Discussion}
The method is self-supervised but is not invariant to occlusion: when
a point is occluded, the cycle deviates. We handle this with a learned
visibility mask, introduced in Section~3.3 of the supplementary material.
\section{Conclusion}
High-order cyclic consistency is a simple and powerful self-supervised
signal for learning dense correspondence over long horizons.
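% Acknowledgements are omitted in anonymous submission mode to preserve double-blind review.
\ifdefined\eccvfinalcopy
\subsubsection{Acknowledgements.} We thank the anonymous reviewers
and our colleagues at Example Research Labs.
\fi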
\bibliographystyle{splncs04}
\bibliography{refs}
\end{document}
