% USENIX OSDI paper template
\documentclass[letterpaper,twocolumn,10pt]{article}
\usepackage{usenix-2020-09} % Current USENIX style file
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{graphicx}
\usepackage{amsmath,amssymb}
\usepackage{booktabs}
\usepackage{url}
\usepackage[hidelinks]{hyperref}
\usepackage{listings}
\usepackage{xcolor}
\lstset{
basicstyle=\ttfamily\footnotesize,
columns=fullflexible,
breaklines=true,
}
\begin{document}
\date{}
\title{\Large \bf Hyperion: A Zero-Downtime Schema Migration System\\
for Large-Scale OLTP Databases}
\author{
{\rm First Last}\\
University of Example
\and
{\rm Jane Doe}\\
Example Research Labs
\and
{\rm John Smith}\\
University of Example
}
\maketitle
\subsection*{Abstract}
Schema migrations on multi-terabyte OLTP databases frequently cause downtime
or inconsistency. Existing tools either lock tables, tolerate temporary
inconsistencies, or require application cooperation. We present
Hyperion, a schema migration system that decouples the physical and
logical schema and performs migrations in the background, providing
linearizable reads and writes throughout. Hyperion has been in
production at a large e-commerce company for nine months, migrating 314
tables totaling 47\,TB with zero rollbacks and a median runtime overhead
of 2.3\%. This paper describes Hyperion's design, the two-phase cutover
protocol that guarantees linearizability, and operational lessons from
its deployment.
\section{Introduction}
Modern applications demand both rapid schema evolution and continuous
availability. Online migration tools such as gh-ost and pt-online-schema-change
introduce windows of reduced consistency during cutover that last from
seconds to minutes. Many applications can tolerate these windows, but
financial and safety-critical systems cannot.
We analyze 42 months of incident reports across three organizations
and find that 11\% of SEV-1 incidents involve schema changes. Most of
these arise during the cutover phase of online migrations.
\paragraph{Contributions.}
\begin{itemize}
\item A design that decouples physical and logical schemas, allowing
migration to proceed without exposing intermediate states.
\item A two-phase cutover protocol with proven linearizability under
realistic failure models.
\item Production deployment results over nine months.
\end{itemize}
\section{Motivation}
We surveyed 42 months of production incident reports and classified
schema-migration-related failures into three categories: cutover-phase
inconsistencies (62\%), tool crashes during long-running migrations
(23\%), and unexpected lock contention (15\%).
\section{Design}
\subsection{Logical-Physical Schema Split}
Hyperion introduces a logical-physical schema split. Writes go through
a rewriter that maintains both representations until cutover. Reads
always see the logical schema.
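\paragraph{Illustrative sketch.}
The dual-write behavior described above can be sketched as follows. This
is a single-threaded toy under stated assumptions, not Hyperion's
implementation: it assumes the logical schema stays fixed while the
physical layout changes, and it models the migration as a column rename.
The names \texttt{Rewriter}, \texttt{Row}, \texttt{rename}, and the
column names are hypothetical.

\begin{lstlisting}
package main

import "fmt"

// Row is one row keyed by column name (hypothetical toy type).
type Row map[string]string

// Rewriter applies every logical write to both physical layouts
// until cutover, and always returns rows in the logical layout.
type Rewriter struct {
	oldPhys map[string]Row // old layout; assumed equal to the logical layout
	newPhys map[string]Row // new layout being populated
	toNew   func(Row) Row  // logical -> new physical mapping
	fromNew func(Row) Row  // new physical -> logical mapping
	cutOver bool
}

func NewRewriter(toNew, fromNew func(Row) Row) *Rewriter {
	return &Rewriter{
		oldPhys: map[string]Row{},
		newPhys: map[string]Row{},
		toNew:   toNew,
		fromNew: fromNew,
	}
}

// rename returns a mapping that copies a row, renaming one column.
func rename(from, to string) func(Row) Row {
	return func(in Row) Row {
		out := make(Row, len(in))
		for k, v := range in {
			if k == from {
				k = to
			}
			out[k] = v
		}
		return out
	}
}

// Write maintains both representations before cutover and only the
// new one afterwards.
func (r *Rewriter) Write(key string, logical Row) {
	r.newPhys[key] = r.toNew(logical)
	if !r.cutOver {
		r.oldPhys[key] = rename("", "")(logical) // plain copy
	}
}

// Read hides which physical layout currently serves the row.
func (r *Rewriter) Read(key string) (Row, bool) {
	if r.cutOver {
		p, ok := r.newPhys[key]
		if !ok {
			return nil, false
		}
		return r.fromNew(p), true
	}
	p, ok := r.oldPhys[key]
	return p, ok
}

// Cutover switches reads to the new physical layout.
func (r *Rewriter) Cutover() { r.cutOver = true }

func main() {
	r := NewRewriter(rename("qty", "quantity"), rename("quantity", "qty"))
	r.Write("order-1", Row{"qty": "3"})
	before, _ := r.Read("order-1")
	r.Cutover()
	after, _ := r.Read("order-1")
	// The logical column "qty" is visible on both sides of cutover.
	fmt.Println(before["qty"], after["qty"])
}
\end{lstlisting}

The sketch omits concurrency control, backfill of existing rows, and
failure handling, all of which a real rewriter would need.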
\subsection{Consistency Protocol}
Reads during migration see the logical schema; the underlying physical
representation is opaque. A two-phase cutover finalizes the switch. We
prove linearizability in Section~\ref{sec:proof}.
\subsection{Throughput Management}
Hyperion monitors replication lag and adaptively throttles background
migration work to keep replicas within 500ms of the primary.
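\paragraph{Illustrative sketch.}
One way to realize lag-driven throttling is an additive-increase,
multiplicative-decrease (AIMD) controller. The text above does not state
Hyperion's actual control law, so AIMD is an assumption here, as are the
type \texttt{Throttle}, its fields, and all rate constants. Only the
500\,ms lag budget comes from the design.

\begin{lstlisting}
package main

import (
	"fmt"
	"time"
)

// Throttle adjusts the background copy rate (rows per second)
// from observed replication lag. Hypothetical sketch.
type Throttle struct {
	Target  time.Duration // lag budget (500ms in the design)
	MinRate float64       // floor, so the migration never stalls
	MaxRate float64       // ceiling on background work
	Step    float64       // additive increase per healthy sample
	Rate    float64       // current rate
}

// Observe updates the rate from one lag sample and returns it.
func (t *Throttle) Observe(lag time.Duration) float64 {
	if lag > t.Target {
		t.Rate /= 2 // back off quickly when replicas fall behind
	} else {
		t.Rate += t.Step // probe for spare capacity slowly
	}
	if t.Rate < t.MinRate {
		t.Rate = t.MinRate
	}
	if t.Rate > t.MaxRate {
		t.Rate = t.MaxRate
	}
	return t.Rate
}

func main() {
	t := &Throttle{
		Target:  500 * time.Millisecond,
		MinRate: 100,
		MaxRate: 10000,
		Step:    500,
		Rate:    1000,
	}
	samples := []time.Duration{
		120 * time.Millisecond,
		300 * time.Millisecond,
		650 * time.Millisecond,
		900 * time.Millisecond,
		200 * time.Millisecond,
	}
	for _, lag := range samples {
		fmt.Printf("lag=%v rate=%.0f rows/s\n", lag, t.Observe(lag))
	}
}
\end{lstlisting}

Halving on a violation and growing linearly otherwise favors staying
within the lag budget over migration speed; other control laws (for
example, a PID controller) would fit the same interface.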
\section{Correctness}\label{sec:proof}
\textbf{Theorem.} Under the standard crash-recovery model with reliable
per-shard logs, Hyperion's migration protocol is linearizable.
The proof decomposes the protocol into per-phase invariants and uses
standard shadowing arguments. Full details appear in Appendix~A.
\section{Implementation}
Hyperion is implemented as a sidecar in 18{,}000 lines of Go, with MySQL
8.0 and PostgreSQL 15 backends. The cutover coordinator is a separate
3{,}200-line service that uses Raft for leader election.
\section{Evaluation}
We evaluate Hyperion on a 10k-TPS benchmark workload modeled on an
e-commerce order-management system.
\begin{table}[t]
\centering
\small
\begin{tabular}{lcc}
\toprule
System & Median overhead (\%) & Downtime (s) \\
\midrule
gh-ost & 4.6 & 18 \\
pt-osc & 7.1 & 42 \\
\textbf{Hyperion} & \textbf{2.3} & \textbf{0} \\
\bottomrule
\end{tabular}
\caption{Migration overhead and cutover downtime at 10k TPS.}
\label{tab:main}
\end{table}
\subsection{Production Experience}
Hyperion has been running in production at a large e-commerce company
for nine months, migrating 314 tables totaling 47\,TB with zero rollbacks.
\section{Related Work}
Hyperion builds on prior work in online schema
change~\cite{ghost,ptosc}, versioned storage, and multi-version
concurrency control.
\section{Conclusion}
Principled logical-physical schema separation enables truly zero-downtime
migrations at scale. Hyperion demonstrates the design point in production.
\section*{Availability}
The Hyperion implementation and evaluation scripts are available at
\url{https://github.com/example/hyperion}. The artifact has been
evaluated through the USENIX artifact evaluation process.
\section*{Acknowledgments}
We thank our OSDI shepherd, the anonymous reviewers, and the
operations team at Example Corp whose feedback shaped the system.
{\footnotesize \bibliographystyle{acm}
\bibliography{refs}}
\end{document}
