MarineEVT: Advancing Event-Centric Marine Video Understanding via Visual Tool Reasoning

Tuan-An To1, Yuk-Kwan Wong1, Tuan-Anh Vu1,2, Zheng Ziqiang1, Sai-Kit Yeung1

1Hong Kong University of Science and Technology, 2University of California, Los Angeles

The 19th European Conference on Computer Vision (ECCV) 2026


Overview

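Welcome to the project page for "MarineEVT: Advancing Event-Centric Marine Video Understanding via Visual Tool Reasoning". In this work, we propose a method that improves on the prior state of the art on standard benchmarks (for example, 84.2 vs. 78.9 mAP on Dataset A) while remaining computationally efficient, with 48.1M parameters against 120.5M for the strongest prior method.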


Figure 1: Teaser image showing the qualitative results of our proposed method compared to previous baselines.

Dataset

To facilitate research in this domain, we introduce AwesomeDataset-10K. It consists of 10,000 high-resolution images with dense annotations.

Method

Our architecture leverages a dual-branch transformer design. The first branch extracts local spatial features, while the second branch captures global contextual dependencies.

Video 1: A short explanation of our dual-branch transformer architecture.
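
The local/global split described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the function names `attend` and `dual_branch`, the use of windowed attention for the local branch, the absence of learned projections and feed-forward layers, and fusion by concatenation are all hypothetical choices.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(tokens, window=None):
    """Scaled dot-product self-attention over a list of feature vectors.

    window=None lets every token attend to all tokens (global branch);
    window=w restricts token i to neighbours within distance w (local branch).
    No learned projections are used: queries, keys and values are the tokens.
    """
    n, d = len(tokens), len(tokens[0])
    out = []
    for i, q in enumerate(tokens):
        lo = 0 if window is None else max(0, i - window)
        hi = n if window is None else min(n, i + window + 1)
        keys = tokens[lo:hi]
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * k[j] for w, k in zip(weights, keys)) for j in range(d)])
    return out

def dual_branch(tokens, window=1):
    # Branch 1: local spatial features from a small neighbourhood.
    local_feats = attend(tokens, window=window)
    # Branch 2: global context from all tokens.
    global_feats = attend(tokens, window=None)
    # Assumed fusion: concatenate the two feature vectors per token.
    return [l + g for l, g in zip(local_feats, global_feats)]

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
fused = dual_branch(tokens, window=1)
```

Each fused token has twice the input dimension, with the local features in the first half and the global features in the second. When the window covers the whole sequence, the two halves coincide, which is a quick sanity check on the sketch.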

Results

We evaluate our method on three standard benchmarks. The table below reports the quantitative comparison on Dataset A (mAP) and Dataset B (F1), together with parameter counts in millions.

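| Method          | Dataset A (mAP) | Dataset B (F1) | Params (M) |
|-----------------|-----------------|----------------|------------|
| Baseline (2023) | 72.4            | 68.1           | 45.2       |
| SOTA (2025)     | 78.9            | 74.5           | 120.5      |
| Ours            | 84.2            | 81.3           | 48.1       |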
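
To make the accuracy and size trade-off in the table explicit, the short sketch below computes absolute gains and parameter ratios from the reported rows. The numbers are copied from the table; the helper name `compare` and the rounding precision are illustrative assumptions.

```python
# Rows copied from the results table:
# (mAP on Dataset A, F1 on Dataset B, parameters in millions).
results = {
    "Baseline (2023)": (72.4, 68.1, 45.2),
    "SOTA (2025)": (78.9, 74.5, 120.5),
    "Ours": (84.2, 81.3, 48.1),
}

def compare(ours, other):
    """Return (absolute mAP gain, absolute F1 gain, parameter ratio ours/other)."""
    map_gain = round(ours[0] - other[0], 1)
    f1_gain = round(ours[1] - other[1], 1)
    param_ratio = round(ours[2] / other[2], 2)
    return map_gain, f1_gain, param_ratio

vs_sota = compare(results["Ours"], results["SOTA (2025)"])
vs_baseline = compare(results["Ours"], results["Baseline (2023)"])
```

On these figures, the proposed method gains 5.3 mAP and 6.8 F1 over the SOTA (2025) row while using about 40% of its parameters, and gains 11.8 mAP and 13.2 F1 over the baseline with roughly 6% more parameters.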

Citation

If you find our work useful, please consider citing our paper:

@inproceedings{to2026marineevt,
  title={MarineEVT: Advancing Event-Centric Marine Video Understanding via Visual Tool Reasoning},
  author={To, Tuan-An and Wong, Yuk-Kwan and Vu, Tuan-Anh and Zheng, Ziqiang and Yeung, Sai-Kit},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2026}
}