% USENIX OSDI paper template
\documentclass[letterpaper,twocolumn,10pt]{article}
\usepackage{usenix-2020-09}  % Current USENIX style file

\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{graphicx}
\usepackage{amsmath,amssymb}
\usepackage{booktabs}
\usepackage{url}
\usepackage[hidelinks]{hyperref}
\usepackage{listings}
\usepackage{xcolor}

\lstset{
  basicstyle=\ttfamily\footnotesize,
  columns=fullflexible,
  breaklines=true,
}

\begin{document}

\date{}

\title{\Large \bf Hyperion: A Zero-Downtime Schema Migration System\\
                   for Large-Scale OLTP Databases}

\author{
{\rm First Last}\\
University of Example
\and
{\rm Jane Doe}\\
Example Research Labs
\and
{\rm John Smith}\\
University of Example
}

\maketitle

\subsection*{Abstract}
Schema migrations on multi-terabyte OLTP databases frequently cause downtime
or inconsistency. Existing tools lock tables, tolerate temporary
inconsistencies, or require application cooperation. We present
Hyperion, a schema migration system that decouples the physical and
logical schema and performs migrations in the background, providing
linearizable reads and writes throughout. Hyperion has been in
production at a large e-commerce company for nine months, migrating 314
tables totaling 47\,TB with zero rollbacks and a median runtime overhead
of 2.3\%. This paper describes Hyperion's design, the two-phase cutover
protocol that guarantees linearizability, and operational lessons from
its deployment.

\section{Introduction}
Modern applications demand both rapid schema evolution and continuous
availability. Online migration tools such as gh-ost and pt-online-schema-change
introduce windows of reduced consistency that last seconds to minutes during
cutover. Many applications can tolerate these windows, but they are
unacceptable for financial or safety-critical systems.

We analyze 42 months of incident reports across three organizations
and find that 11\% of SEV-1 incidents involve schema changes. Most of
these arise during the cutover phase of online migrations.

\paragraph{Contributions.}
\begin{itemize}
\item A design that decouples physical and logical schemas, allowing
  migration to proceed without exposing intermediate states.
\item A two-phase cutover protocol that we prove linearizable under the
  crash-recovery model with reliable per-shard logs.
\item Results from a nine-month production deployment covering 314
  tables and 47\,TB of data.
\end{itemize}

\section{Motivation}
We surveyed 42 months of production incident reports and classified
schema-migration-related failures into three categories: cutover-phase
inconsistencies (62\%), tool crashes during long-running migrations
(23\%), and unexpected lock contention (15\%).

\section{Design}
\subsection{Logical-Physical Schema Split}
Hyperion separates the logical schema exposed to readers from the
physical schema in storage. Writes go through a rewriter that maintains
both representations until cutover, and reads always see the logical
schema.
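
One way to state the invariant such a rewriter could maintain is
sketched below. The notation is an assumption made for illustration and
is not part of the protocol specification: $P$ and $P'$ denote the pre-
and post-migration physical tables, $m$ the row transformation defined
by the migration, and $B$ the set of keys already copied by the
background migration.
\begin{align*}
\text{Invariant:}\quad & \forall k \in B:\; P'[k] = m(P[k]),\\
\text{Write } (k, v):\quad & P[k] \leftarrow v;\;\;
  \text{if } k \in B \text{ then } P'[k] \leftarrow m(v).
\end{align*}
Provided that a write and its mirrored update are applied atomically,
each write preserves the invariant, so once $B$ covers every key, $P'$
is a complete transformed copy of $P$.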

\subsection{Consistency Protocol}
During a migration, reads are served from the logical schema, and the
underlying physical representation remains opaque to clients. A
two-phase cutover finalizes the switch. We prove linearizability in
Section~\ref{sec:proof}.

\subsection{Throughput Management}
Hyperion monitors replication lag and adaptively throttles background
migration work to keep replicas within 500\,ms of the primary.
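
As an illustration of how such a throttle could operate, consider an
additive-increase/multiplicative-decrease rule. This rule and the
symbols $r_t$, $\ell_t$, $\alpha$, and $\beta$ are assumptions made for
exposition; they do not describe the deployed controller. Let $r_t$ be
the background migration rate in control interval $t$, $\ell_t$ the
measured replication lag, and $L = 500$\,ms the lag bound:
\begin{equation*}
r_{t+1} =
\begin{cases}
r_t + \alpha, & \ell_t \le L,\\
\beta\, r_t, & \ell_t > L,
\end{cases}
\qquad \alpha > 0,\; 0 < \beta < 1.
\end{equation*}
Under this rule the rate grows linearly while replicas keep up and
shrinks geometrically once lag exceeds the bound.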

\section{Correctness}\label{sec:proof}
\textbf{Theorem.} Under the standard crash-recovery model with reliable
per-shard logs, Hyperion's migration protocol is linearizable.

The proof decomposes the protocol into per-phase invariants and uses
standard shadowing arguments. Full details appear in Appendix~A.
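
For reference, the property can be stated as follows; the notation is
introduced here for exposition and is not taken from Appendix~A. Let $H$
be a history of operation invocations and responses, and write
$a <_H b$ when the response of operation $a$ precedes the invocation of
operation $b$ in $H$. The history $H$ is linearizable if some completion
of $H$ is equivalent to a sequential history $S$ that is legal under the
sequential specification of the database and satisfies
\begin{equation*}
\forall a, b:\quad a <_H b \;\Longrightarrow\; a <_S b .
\end{equation*}
Each operation thus appears to take effect at a single point between its
invocation and its response.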

\section{Implementation}
Hyperion is implemented as a sidecar in 18{,}000 lines of Go, with
backends for MySQL 8.0 and PostgreSQL 15. The cutover coordinator is a
separate 3{,}200-line service that uses Raft for leader election.

\section{Evaluation}
We evaluate Hyperion on a 10k-TPS benchmark workload modeled on an
e-commerce order-management system.

\begin{table}[t]
\centering
\small
\begin{tabular}{lcc}
\toprule
System & Median overhead (\%) & Downtime (s) \\
\midrule
gh-ost        & 4.6 & 18 \\
pt-osc        & 7.1 & 42 \\
\textbf{Hyperion} & \textbf{2.3} & \textbf{0} \\
\bottomrule
\end{tabular}
\caption{Median (p50) migration overhead and cutover downtime at 10k TPS.}
\label{tab:main}
\end{table}
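
As a worked reading of Table~\ref{tab:main}, the ratios below are
computed directly from the reported median overheads and introduce no
additional measurements:
\begin{equation*}
\frac{2.3}{4.6} = 0.50, \qquad \frac{2.3}{7.1} \approx 0.32 .
\end{equation*}
Hyperion's median overhead is therefore half that of gh-ost and roughly
one third that of pt-osc in this benchmark, while the baselines incur
18\,s and 42\,s of cutover downtime, respectively.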

\subsection{Production Experience}
Hyperion has been running in production at a large e-commerce company
for nine months, migrating 314 tables totaling 47TB with zero rollbacks.

\section{Related Work}
Hyperion relates to prior work on online schema
change~\cite{ghost,ptosc}, versioned storage, and multi-version
concurrency control.

\section{Conclusion}
Separating the logical schema from the physical schema enables
zero-downtime migrations at scale. Hyperion demonstrates this design
point in production.

\section*{Availability}
The Hyperion implementation and evaluation scripts are available at
\url{https://github.com/example/hyperion}. The artifact has been
evaluated through the USENIX artifact evaluation process.

\section*{Acknowledgments}
We thank our OSDI shepherd, the anonymous reviewers, and the
operations team at Example Corp whose feedback shaped the system.

{\footnotesize \bibliographystyle{acm}
\bibliography{refs}}

\end{document}