% ACL 2026 conference paper template (two-column format), MIT license: acl-2026/main.tex
\documentclass[11pt,twocolumn]{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage[margin=0.75in]{geometry}
\usepackage{amsmath,amssymb}
\usepackage{graphicx}
\usepackage{booktabs}
\usepackage{enumitem}
\usepackage{xcolor}
\usepackage{times}
\usepackage{natbib}
\usepackage{hyperref}
\usepackage{url}
\usepackage{microtype}
\usepackage{caption}

\setlength{\columnsep}{0.25in}

\title{\Large\bfseries Bridging Syntax and Semantics: Cross-Lingual Transfer Learning for Low-Resource Named Entity Recognition}

\author{
  \textbf{Anika Patel}\textsuperscript{1}\quad
  \textbf{James O'Brien}\textsuperscript{1}\quad
  \textbf{Wei-Lin Chen}\textsuperscript{2}\quad
  \textbf{Fatima Al-Rashid}\textsuperscript{1}\\[4pt]
  \textsuperscript{1}Department of Computer Science, Stanford University\\
  \textsuperscript{2}Language Technology Institute, Carnegie Mellon University\\[2pt]
  {\small\texttt{\{apatel,jobrien,falrashid\}@stanford.edu, [email protected]}}
}

\date{}

\begin{document}
\maketitle

\begin{abstract}
Cross-lingual transfer learning has shown remarkable promise for extending NLP capabilities to low-resource languages. However, existing approaches often fail to account for syntactic divergences between source and target languages, leading to degraded performance on structurally dissimilar language pairs. In this paper, we propose \textsc{SynSem-Transfer}, a novel framework that explicitly models both syntactic and semantic alignment during cross-lingual transfer for named entity recognition (NER). Our approach leverages universal dependency structures as a language-agnostic bridge, combined with a dual-encoder architecture that disentangles syntactic and semantic representations. Experiments on 12 typologically diverse languages demonstrate that \textsc{SynSem-Transfer} achieves an average F1 improvement of 4.2 points over strong baselines, with gains of up to 8.2 F1 on agglutinative languages. We further show that our syntactic alignment component provides interpretable insights into cross-lingual transfer dynamics.
\end{abstract}

\section{Introduction}

Named entity recognition (NER) is a fundamental task in natural language processing, serving as a building block for information extraction, question answering, and knowledge graph construction. While NER systems for high-resource languages such as English have achieved near-human performance \citep{devlin2019bert}, extending these capabilities to the world's 7,000+ languages remains a significant challenge.

Cross-lingual transfer learning offers a compelling solution: train on labeled data in a high-resource \emph{source} language and deploy to a low-resource \emph{target} language. Multilingual pretrained models like mBERT \citep{devlin2019bert} and XLM-R \citep{conneau2020xlmr} have made zero-shot cross-lingual transfer surprisingly effective. However, a persistent gap remains between transfer performance and fully supervised baselines, particularly for typologically distant language pairs.

We identify a key limitation of current approaches: they rely primarily on lexical and semantic alignment through shared multilingual representations, while largely ignoring \emph{syntactic} divergences between languages. For NER specifically, entity boundaries and contextual cues are deeply intertwined with syntactic structure. Languages with different word orders, case systems, or morphological complexity present fundamentally different structural patterns that semantic similarity alone cannot bridge.

In this paper, we make the following contributions:
\begin{itemize}[nosep,leftmargin=*]
  \item We propose \textsc{SynSem-Transfer}, a dual-encoder framework that explicitly models syntactic and semantic alignment for cross-lingual NER transfer.
  \item We introduce a universal dependency--based syntactic bridge that creates language-agnostic structural representations.
  \item We demonstrate state-of-the-art results on the WikiAnn benchmark across 12 typologically diverse target languages.
  \item We provide interpretable analysis of when and why syntactic alignment helps cross-lingual transfer.
\end{itemize}

\section{Related Work}

\paragraph{Cross-lingual NER.}
Early work on cross-lingual NER relied on parallel corpora and word-level alignment \citep{yarowsky2001inducing}. Annotation projection methods \citep{ni2017weakly} transfer labels through word alignments in parallel text, but require aligned corpora that may not exist for truly low-resource languages. More recently, zero-shot approaches using multilingual pretrained models have become dominant \citep{wu2020enhanced}.

\paragraph{Syntactic Transfer.}
The role of syntax in cross-lingual transfer has been explored primarily in dependency parsing \citep{mcdonald2011multi}. \citet{ahmad2019difficulties} showed that syntactic features can improve cross-lingual NER when source and target languages share structural properties. Our work differs in explicitly modeling syntactic alignment as a learnable component rather than relying on feature engineering.

\paragraph{Representation Disentanglement.}
Disentangling different aspects of linguistic representation has gained attention in multilingual NLP. \citet{chi2020finding} demonstrated that multilingual BERT encodes syntactic information in identifiable subspaces. We build on this insight by designing an architecture that explicitly separates and leverages syntactic and semantic streams.

\section{Methodology}

\subsection{Problem Formulation}

Let $\mathcal{D}_s = \{(\mathbf{x}_i^s, \mathbf{y}_i^s)\}_{i=1}^{N_s}$ denote labeled NER data in the source language and $\mathcal{D}_t = \{\mathbf{x}_j^t\}_{j=1}^{N_t}$ unlabeled text in the target language. Our goal is to learn a model $f_\theta$ using $\mathcal{D}_s$ that generalizes to $\mathcal{D}_t$.

\subsection{Dual-Encoder Architecture}

Our model consists of three components:

\paragraph{Shared Backbone.} We use XLM-R \citep{conneau2020xlmr} as the shared multilingual encoder, producing contextual representations $\mathbf{H} = \text{XLM-R}(\mathbf{x}) \in \mathbb{R}^{n \times d}$ for an input sequence of $n$ tokens.

\paragraph{Syntactic Encoder.} We extract universal dependency parse trees for both source and target language inputs using UDPipe \citep{straka2018udpipe}. The syntactic encoder applies graph attention over the dependency tree:
\begin{equation}
  \mathbf{h}_i^{\text{syn}} = \sum_{j \in \mathcal{N}(i)} \alpha_{ij} \mathbf{W}_r \mathbf{h}_j + \mathbf{b}_r
\end{equation}
where $\mathcal{N}(i)$ is the set of syntactic neighbors of token $i$, $r$ indexes the dependency relation labeling the edge between tokens $i$ and $j$, and $\alpha_{ij}$ are attention weights computed as:
\begin{equation}
  \alpha_{ij} = \frac{\exp(\text{LeakyReLU}(\mathbf{a}^T[\mathbf{W}\mathbf{h}_i \| \mathbf{W}\mathbf{h}_j]))}{\sum_{k \in \mathcal{N}(i)}\exp(\text{LeakyReLU}(\mathbf{a}^T[\mathbf{W}\mathbf{h}_i \| \mathbf{W}\mathbf{h}_k]))}
\end{equation}
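The graph-attention update above can be sketched in plain Python on a toy dependency tree. This is an illustrative simplification, not the paper's implementation: it uses a single weight matrix $\mathbf{W}$ rather than one matrix per relation type $r$, and the example tree, dimensions, and helper names are hypothetical.

```python
import math

def leaky_relu(x, slope=0.2):
    # LeakyReLU on a scalar score
    return x if x >= 0 else slope * x

def gat_update(h, neighbors, W, a, b):
    """One graph-attention step over a dependency tree.

    h:         list of node feature vectors
    neighbors: dict mapping node i -> list of syntactic neighbors N(i)
    W:         weight matrix (list of rows) applied to node features
    a:         attention vector over the concatenation [W h_i || W h_j]
    b:         bias vector added to the aggregated message
    """
    def matvec(M, v):
        return [sum(m * x for m, x in zip(row, v)) for row in M]

    out = []
    for i in range(len(h)):
        Wh_i = matvec(W, h[i])
        # Unnormalized scores e_ij = LeakyReLU(a^T [W h_i || W h_j])
        scores = []
        for j in neighbors[i]:
            Wh_j = matvec(W, h[j])
            scores.append(leaky_relu(sum(ak * x for ak, x in zip(a, Wh_i + Wh_j))))
        # Softmax over the syntactic neighborhood N(i) gives alpha_ij
        z = [math.exp(s) for s in scores]
        total = sum(z)
        alphas = [x / total for x in z]
        # h_i^syn = sum_j alpha_ij * W h_j + b
        msg = [0.0] * len(b)
        for alpha, j in zip(alphas, neighbors[i]):
            Wh_j = matvec(W, h[j])
            msg = [m + alpha * x for m, x in zip(msg, Wh_j)]
        out.append([m + bi for m, bi in zip(msg, b)])
    return out
```

With an identity $\mathbf{W}$ and a three-token chain (0--1--2), token 0 attends only to its single neighbor, so its updated vector equals that neighbor's features; token 1 averages its two neighbors when their scores tie.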

\paragraph{Semantic Encoder.} A transformer layer refines the backbone representations with self-attention focused on semantic content:
\begin{equation}
  \mathbf{H}^{\text{sem}} = \text{TransformerLayer}(\mathbf{H})
\end{equation}

The final representation fuses both streams:
\begin{equation}
  \mathbf{h}_i^{\text{final}} = \mathbf{W}_f [\mathbf{h}_i^{\text{syn}} \| \mathbf{h}_i^{\text{sem}}] + \mathbf{b}_f
\end{equation}
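The fusion step is a concatenation followed by a learned projection; a minimal sketch (function name and toy shapes are hypothetical):

```python
def fuse(h_syn, h_sem, W_f, b_f):
    """h_i^final = W_f [h_i^syn || h_i^sem] + b_f for a single token."""
    v = h_syn + h_sem  # concatenation [h_syn || h_sem]
    return [sum(w * x for w, x in zip(row, v)) + bias
            for row, bias in zip(W_f, b_f)]
```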

\subsection{Syntactic Alignment Loss}

To encourage language-agnostic syntactic representations, we introduce an alignment loss that minimizes the distance between syntactic subspaces of source and target languages:
\begin{equation}
  \mathcal{L}_{\text{align}} = \sum_{(i,j) \in \mathcal{A}} \| \mathbf{h}_i^{\text{syn},s} - \mathbf{h}_j^{\text{syn},t} \|_2^2
\end{equation}
where $\mathcal{A}$ represents aligned token pairs obtained through parallel data or dictionary-based alignment.
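The alignment loss is a sum of squared Euclidean distances over aligned token pairs; a minimal sketch under the paper's definition (vectors and pairs below are toy values):

```python
def alignment_loss(h_syn_src, h_syn_tgt, aligned_pairs):
    """Sum of ||h_i^{syn,s} - h_j^{syn,t}||_2^2 over aligned pairs (i, j) in A.

    h_syn_src / h_syn_tgt: lists of syntactic vectors for source/target tokens
    aligned_pairs:         iterable of (i, j) index pairs from the alignment set
    """
    loss = 0.0
    for i, j in aligned_pairs:
        loss += sum((s - t) ** 2
                    for s, t in zip(h_syn_src[i], h_syn_tgt[j]))
    return loss
```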

\subsection{Training Objective}

The total loss combines the NER objective with the alignment regularizer:
\begin{equation}
  \mathcal{L} = \mathcal{L}_{\text{NER}}(\mathcal{D}_s) + \lambda \mathcal{L}_{\text{align}}
\end{equation}
where $\mathcal{L}_{\text{NER}}$ is the standard CRF loss for sequence labeling and $\lambda$ controls the alignment strength.

\section{Experiments}

\subsection{Setup}

We evaluate on the WikiAnn NER dataset \citep{pan2017cross}, using English as the source language. We select 12 typologically diverse target languages spanning nine language families: Germanic (de, nl), Romance (es, fr), Slavic (ru, pl), Semitic (ar), Turkic (tr), Sino-Tibetan (zh), Japonic (ja), Dravidian (ta), and Uralic (fi).

\paragraph{Baselines.} We compare against: (1) XLM-R zero-shot, (2) XLM-R with translate-train, (3) UniTrans \citep{li2020unitrans}, and (4) XTREME benchmark results.

\paragraph{Implementation.} We use XLM-R\textsubscript{base} (270M parameters) as the backbone. The syntactic encoder has 2 GAT layers with 8 attention heads. We train for 20 epochs with AdamW ($\text{lr} = 2\!\times\!10^{-5}$, warmup over 10\% of steps). Alignment weight $\lambda = 0.1$ was tuned on a development set.
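The warmup schedule can be sketched as linear warmup to the peak learning rate over the first 10\% of steps. This is an assumption-laden sketch: the paper does not specify the post-warmup decay policy, so this version simply holds the peak rate.

```python
def warmup_lr(step, total_steps, peak_lr=2e-5, warmup_frac=0.1):
    """Linear warmup to peak_lr over the first warmup_frac of training steps.

    After warmup the rate is held constant here; the actual decay policy
    (if any) is not specified in the paper.
    """
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Ramp from peak_lr / warmup_steps up to peak_lr
        return peak_lr * (step + 1) / warmup_steps
    return peak_lr
```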

\subsection{Results}

\begin{table}[t]
\centering
\caption{F1 scores on WikiAnn NER. Best results in \textbf{bold}.}
\label{tab:results}
\small
\begin{tabular}{@{}lcccc@{}}
\toprule
\textbf{Lang} & \textbf{XLM-R} & \textbf{UniTrans} & \textbf{Ours} & $\Delta$ \\
\midrule
de & 79.2 & 81.5 & \textbf{83.1} & +1.6 \\
nl & 80.8 & 82.3 & \textbf{84.0} & +1.7 \\
es & 76.4 & 79.1 & \textbf{81.3} & +2.2 \\
fr & 78.6 & 80.7 & \textbf{82.9} & +2.2 \\
ru & 68.3 & 71.8 & \textbf{75.4} & +3.6 \\
pl & 72.1 & 75.0 & \textbf{78.2} & +3.2 \\
ar & 52.6 & 56.3 & \textbf{61.7} & +5.4 \\
tr & 64.8 & 68.2 & \textbf{76.4} & +8.2 \\
zh & 48.7 & 52.4 & \textbf{56.1} & +3.7 \\
ja & 35.2 & 39.6 & \textbf{44.8} & +5.2 \\
ta & 58.3 & 62.1 & \textbf{68.5} & +6.4 \\
fi & 67.5 & 70.8 & \textbf{77.3} & +6.5 \\
\midrule
\textbf{Avg} & 65.2 & 68.3 & \textbf{72.5} & +4.2 \\
\bottomrule
\end{tabular}
\end{table}

Table~\ref{tab:results} presents the main results. \textsc{SynSem-Transfer} outperforms all baselines on every language, with an average improvement of 4.2 F1 over the strongest baseline. Notably, the largest gains occur on agglutinative (Turkish: +8.2, Finnish: +6.5) and morphologically rich languages (Tamil: +6.4, Arabic: +5.4), confirming our hypothesis that syntactic alignment is most beneficial when source and target languages diverge structurally.

\section{Analysis}

\subsection{Ablation Study}

We conduct ablations by removing components individually: (1) without syntactic encoder ($-$3.1 avg F1), (2) without alignment loss ($-$1.8 avg F1), (3) without semantic encoder ($-$4.5 avg F1). The semantic encoder contributes most, but the syntactic components provide complementary gains that are especially pronounced for distant languages.

\subsection{Syntactic Distance Correlation}

We compute syntactic distance between English and each target language using dependency tree edit distance on parallel sentences. We observe a strong positive correlation ($r = 0.83$, $p < 0.01$) between syntactic distance and the improvement gained from our syntactic alignment component. This confirms that explicit syntactic modeling is most valuable precisely when it is most needed.
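The correlation statistic used above is the standard Pearson coefficient; a self-contained sketch (the per-language distance and improvement values are not reproduced here, so the test data below are illustrative):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```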

\section{Conclusion}

We presented \textsc{SynSem-Transfer}, a framework that bridges syntactic and semantic representations for cross-lingual NER transfer. By explicitly modeling syntactic alignment through universal dependency structures, our approach achieves state-of-the-art results on 12 typologically diverse languages. The largest improvements occur on structurally distant languages, demonstrating that syntactic alignment addresses a genuine gap in existing multilingual representations. Future work will explore extending this framework to other structured prediction tasks and incorporating morphological features.

\section*{Limitations}

Our approach requires dependency parsers for target languages, which may not be available for the lowest-resource languages. The quality of UDPipe parsers varies across languages and may introduce noise. Additionally, our experiments use a single source language (English); multi-source transfer remains unexplored.

\section*{Ethics Statement}

Our work aims to improve NLP capabilities for underrepresented languages, promoting linguistic inclusivity. The WikiAnn dataset is derived from Wikipedia and may reflect biases present in that resource. We do not foresee direct negative societal impacts from this research.

\bibliographystyle{plainnat}
\begin{thebibliography}{20}
\bibitem[Ahmad et~al.(2019)]{ahmad2019difficulties}
W.~Ahmad, Z.~Zhang, X.~Ma, E.~Hovy, K.~Chang, and N.~Peng.
\newblock On difficulties of cross-lingual transfer with order differences.
\newblock In \emph{Proc.\ NAACL-HLT}, 2019.

\bibitem[Chi et~al.(2020)]{chi2020finding}
E.~Chi, J.~Hewitt, and C.~Manning.
\newblock Finding universal grammatical relations in multilingual {BERT}.
\newblock In \emph{Proc.\ ACL}, 2020.

\bibitem[Conneau et~al.(2020)]{conneau2020xlmr}
A.~Conneau, K.~Khandelwal, N.~Goyal, V.~Chaudhary, G.~Wenzek, F.~Guzm{\'a}n, E.~Grave, M.~Ott, L.~Zettlemoyer, and V.~Stoyanov.
\newblock Unsupervised cross-lingual representation learning at scale.
\newblock In \emph{Proc.\ ACL}, 2020.

\bibitem[Devlin et~al.(2019)]{devlin2019bert}
J.~Devlin, M.-W. Chang, K.~Lee, and K.~Toutanova.
\newblock {BERT}: Pre-training of deep bidirectional transformers for language understanding.
\newblock In \emph{Proc.\ NAACL-HLT}, 2019.

\bibitem[Li et~al.(2020)]{li2020unitrans}
Q.~Li, H.~Ji, and L.~Huang.
\newblock {UniTrans}: Unifying model transfer and data transfer for cross-lingual named entity recognition.
\newblock In \emph{Proc.\ IJCAI}, 2020.

\bibitem[McDonald et~al.(2011)]{mcdonald2011multi}
R.~McDonald, S.~Petrov, and K.~Hall.
\newblock Multi-source transfer of delexicalized dependency parsers.
\newblock In \emph{Proc.\ EMNLP}, 2011.

\bibitem[Ni et~al.(2017)]{ni2017weakly}
J.~Ni, G.~Dinu, and R.~Florian.
\newblock Weakly supervised cross-lingual named entity recognition via effective annotation and representation projection.
\newblock In \emph{Proc.\ ACL}, 2017.

\bibitem[Pan et~al.(2017)]{pan2017cross}
X.~Pan, B.~Zhang, J.~May, J.~Nothman, K.~Knight, and H.~Ji.
\newblock Cross-lingual name tagging and linking for 282 languages.
\newblock In \emph{Proc.\ ACL}, 2017.

\bibitem[Straka(2018)]{straka2018udpipe}
M.~Straka.
\newblock {UDPipe} 2.0 prototype at {CoNLL} 2018 {UD} shared task.
\newblock In \emph{Proc.\ CoNLL Shared Task}, 2018.

\bibitem[Wu and Dredze(2020)]{wu2020enhanced}
S.~Wu and M.~Dredze.
\newblock Are all languages created equal in multilingual {BERT}?
\newblock In \emph{Proc.\ RepL4NLP Workshop at ACL}, 2020.

\bibitem[Yarowsky et~al.(2001)]{yarowsky2001inducing}
D.~Yarowsky, G.~Ngai, and R.~Wicentowski.
\newblock Inducing multilingual text analysis tools via robust projection across aligned corpora.
\newblock In \emph{Proc.\ HLT}, 2001.
\end{thebibliography}

\end{document}