Subject title: Foundation Models for Time Series Analysis: Zero-shot Learning and Scaling Laws
Subject
Co-supervision: with Ievgen REDKO (HDR)
Duration: 48 months
Doctoral school: ED 130 - Informatique, Télécommunications et Électronique de Paris
Research unit and team:
LIPADE, diNo
Team contact details:
Sector: Physical Sciences and Engineering
Expected language: English
Expected language level: B2
Description
Subject description:
Introduction
Time series analysis is a critical component of many domains, including finance, energy, healthcare, and climate science. Traditional methods, such as ARIMA, exponential smoothing, and specialized machine learning techniques like recurrent neural networks (RNNs) or transformers, require significant domain expertise for feature engineering and hyperparameter tuning. Recent advances in foundation models (large, pre-trained neural networks capable of generalizing across tasks) offer a promising paradigm shift by enabling zero-shot learning, often in conjunction with emerging in-context learning capabilities.
Despite the ability of foundation models to generalize across diverse tasks without task-specific fine-tuning in natural language processing (NLP) and computer vision, their performance in time series analysis has remained somewhat subpar, with many smaller specialized models exceeding them. Similarly, the discovery of scaling laws (predictable relationships between model performance, size, and training data) has provided a theoretical backbone for their development in NLP, yet such laws remain largely unexplored in time series analysis. Time series data present unique challenges, including temporal dependencies, non-stationarities, and diverse structures across domains. As a result, foundation models for time series often exhibit inconsistent performance, particularly in zero-shot generalization, where they are applied to unseen datasets. When available, scaling laws guide choices of model architecture, data requirements, and computational resources, fostering a deeper understanding of generalization and efficiency.
This thesis aims to uncover scaling laws governing foundation models in time series analysis, focusing on fundamental questions:
- How do model size, data diversity, and task complexity influence generalization?
- Are there universal patterns, akin to those in NLP, that dictate the efficiency and accuracy of foundation models for time series?
- What architectural adaptations are necessary to align time series models with established scaling behaviors?
The research will provide a principled understanding of the relationship between scale, structure, and performance in time series foundation models, laying the groundwork for future advancements in this domain.
Related work
The discovery of scaling laws, as first detailed in [1], has been a cornerstone in understanding the performance of foundation models. That work demonstrated that, for autoregressive language models, performance improves predictably with increases in model size, training data volume, and computational resources. This finding, later corroborated and expanded upon in vision [2] and in the compute-optimal training of large language models [3], has solidified the principle that larger models, when trained on sufficiently diverse and extensive datasets, tend to achieve better generalization and downstream task performance. In NLP, these scaling laws have been especially consistent, with larger models like GPT-3 achieving state-of-the-art results across a variety of tasks, even in zero-shot or few-shot scenarios.
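For concreteness, the scaling behavior reported in [1] is summarized by simple power laws; the display below is a minimal illustration following the notation of [1], where L is the test loss, N the number of model parameters, D the training set size, and N_c, D_c, alpha_N, alpha_D empirically fitted constants:

\[
  L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
  L(D) = \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
  L(N, D) = \left[\left(\frac{N_c}{N}\right)^{\alpha_N / \alpha_D} + \frac{D_c}{D}\right]^{\alpha_D}.
\]

Whether an analogous functional form holds for time series foundation models is precisely the kind of question this thesis seeks to answer.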
In contrast, many recently proposed foundation models for time series, including adaptations of transformers such as Moirai [4], Lag-Llama [5], and Time-LLM [6], have not demonstrated analogous scaling behavior in zero-shot settings. Despite their large parameter counts and access to extensive training datasets, these models often fail to outperform much simpler baselines, such as compact deep learning models based on linear or MLP layers [7], mixers [8], or transformers [9], when applied directly to unseen time series tasks. This discrepancy suggests a fundamental issue, either with the training data, which may lack the diversity or representativeness needed for generalization, or with the architectures, which may not adequately capture the unique characteristics of time series data.
This limitation in scaling behavior may explain why foundation models for time series have yet to consistently outperform standalone models trained individually on specific datasets. Unlike in NLP, where scaling has unlocked robust zero-shot capabilities, time series models frequently require fine-tuning or task-specific adjustments to match the performance of simpler, specialized methods. This raises critical questions about how to design foundation models that generalize effectively across heterogeneous and dynamic time series tasks.
Research plan
Year 1: Establishing Baselines and Theoretical Foundations
Objectives:
- Conduct an extensive review of scaling laws in other domains (e.g., NLP, vision).
- Develop baseline experiments to evaluate the scalability of existing time series foundation models (a minimal zero-shot evaluation sketch is given after this year's plan).
- Identify key factors influencing model performance, such as sequence length, dataset diversity, and temporal resolution.
Deliverables:
- A research paper critically examining scaling laws for time series analysis, highlighting how current scaling behavior deviates from that observed in other domains and from the expected performance benefits.
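As an illustration of the baseline experiments mentioned above, zero-shot forecasts can be scored against a seasonal-naive baseline. The sketch below is only a sketch under assumptions: the MASE metric, the seasonal period, and the synthetic series are illustrative choices rather than commitments of this proposal, and the forecasts of any pretrained foundation model would be scored in the same way as baseline_pred.

import numpy as np

def seasonal_naive_forecast(history, horizon, season=24):
    # Repeat the last observed season as the forecast (a standard zero-shot baseline).
    last_season = history[-season:]
    reps = int(np.ceil(horizon / season))
    return np.tile(last_season, reps)[:horizon]

def mase(y_true, y_pred, history, season=24):
    # Mean Absolute Scaled Error: forecast error scaled by the in-sample seasonal-naive error.
    naive_errors = np.abs(history[season:] - history[:-season])
    scale = naive_errors.mean() + 1e-8  # guard against (seasonally) constant series
    return np.abs(y_true - y_pred).mean() / scale

# Synthetic hourly series with a daily cycle; in practice this would be a real dataset,
# and a pretrained foundation model's forecasts would be scored the same way.
rng = np.random.default_rng(0)
t = np.arange(24 * 40)
series = 10 + 5 * np.sin(2 * np.pi * t / 24) + rng.normal(scale=0.5, size=t.size)
history, target = series[:-24], series[-24:]

baseline_pred = seasonal_naive_forecast(history, horizon=24, season=24)
print("Seasonal-naive MASE:", mase(target, baseline_pred, history, season=24))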
Year 2: Formulating and Testing Scaling Laws
Objectives:
- Conduct controlled experiments to quantify relationships between model size, training data volume, and generalization performance (see the fitting sketch after this year's plan).
- Explore architectural modifications to enhance scalability (e.g., efficient attention mechanisms).
- Develop models with varying parameterizations to empirically validate scaling hypotheses.
Deliverables:
- A research paper on a model that obeys a meaningful scaling law, together with an open-source contribution of a new architecture and/or evaluation benchmark.
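To make the controlled scaling experiments concrete, one plausible setup (a sketch only; the functional form and the numbers below are assumptions, not results) fits a saturating power law L(N) = a * N^(-alpha) + c to (parameter count, validation loss) pairs collected from a sweep of model sizes:

import numpy as np
from scipy.optimize import curve_fit

def power_law(n_params, a, alpha, c):
    # Saturating power law: loss decays as a * N^(-alpha) towards an irreducible floor c.
    return a * np.power(n_params, -alpha) + c

# Hypothetical (parameter count, validation loss) pairs from a sweep of model sizes.
n_params = np.array([1e6, 5e6, 2e7, 1e8, 5e8, 2e9])
val_loss = np.array([1.10, 0.92, 0.80, 0.71, 0.66, 0.63])

# Nonlinear least-squares fit; p0 is a rough initial guess to help convergence.
(a, alpha, c), _ = curve_fit(power_law, n_params, val_loss, p0=[10.0, 0.2, 0.5], maxfev=10000)
print(f"fitted exponent alpha = {alpha:.3f}, irreducible loss c = {c:.3f}")

# Extrapolate to a larger model to check whether further scaling is predicted to pay off.
print("predicted loss at 1e10 parameters:", power_law(1e10, a, alpha, c))

A fitted exponent close to zero, or an extrapolation that flattens almost immediately, would itself be evidence that the current architecture or data mix does not scale meaningfully, which feeds directly into the Year 2 deliverable.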
Year 3: Scaling Law Optimization and Practical Implications
Objectives:
- Validate scaling laws across diverse real-world time series tasks (e.g., energy, healthcare, climate).
- Investigate trade-offs between scaling efficiency and robustness to domain-specific nuances.
- Provide recommendations for designing resource-efficient foundation models for time series.
Deliverables:
- A research paper on a series of trained models achieving strong performance on a chosen task, with the models contributed to the open-source community.
Expected Impact
This research will contribute to the fundamental understanding of scaling laws in time series foundation models, analogous to their success in NLP. By focusing on the theoretical and empirical underpinnings of model generalization and efficiency, the work will establish a rigorous framework for scaling time series models.
All findings (including source code, datasets, and research results in the form of papers) will be open-sourced, providing a valuable basis for researchers and practitioners aiming to develop scalable and generalizable models while avoiding sensitive or domain-specific applications.
Required skills:
Applicant Profile
Master's or engineering school student in applied mathematics, data science, or computer science. Very good knowledge of C. Excellent Python skills, very good knowledge of deep learning frameworks (PyTorch/GPU, etc.) and of data analysis libraries (NumPy, Matplotlib, etc.).
Bibliographic references:
Bibliography
[1] Kaplan, J., McCandlish, S., Henighan, T., et al. (2020). Scaling Laws for Neural Language Models. arXiv preprint arXiv:2001.08361.
[2] Zhai, X., Kolesnikov, A., et al. (2022). Scaling Vision Transformers. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Hoffmann, J., Borgeaud, S., et al. (2022). Training Compute-Optimal Large Language Models. arXiv preprint arXiv:2203.15556.
[4] Woo, G., Liu, C., Kumar, A., et al. (2024). Unified Training of Universal Time Series Forecasting Transformers. Proceedings of the International Conference on Machine Learning (ICML).
[5] Rasul, K., Ashok, A., Williams, A. R., et al. (2024). Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting. Proceedings of the International Conference on Machine Learning (ICML).
[6] Jin, M., Wang, S., Ma, L., Chu, Z., et al. (2024). Time-LLM: Time Series Forecasting by Reprogramming Large Language Models. Proceedings of the International Conference on Learning Representations (ICLR).
[7] Zeng, A., Chen, M., Zhang, L., Xu, Q. (2023). Are Transformers Effective for Time Series Forecasting? Proceedings of the AAAI Conference on Artificial Intelligence (AAAI).
[8] Chen, S.-A., Li, C.-L., Arik, S. O., et al. (2023). TSMixer: An All-MLP Architecture for Time Series Forecasting. Transactions on Machine Learning Research (TMLR).
[9] Ilbert, R., Odonnat, A., Feofanov, V., et al. (2024). Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise Attention. Proceedings of the International Conference on Machine Learning (ICML).