Publications

2026
Neural Prostheses for Communication Restoration: Algorithmic, Systemic, and Clinical Perspectives
Mohammad Ali Shaeri*, Amir Hossein Yari*, Jinhan Liu, Mahsa Shoaran
Abstract
Brain-computer interfaces (BCIs) are transforming the lives of individuals with severe motor and speech impairments by restoring communication or movement control through direct decoding of neural activity. This review provides a comprehensive overview of communication BCIs, covering cursor&click, typing, handwriting, speech, and emotion decoding applications from algorithmic, system-level, and clinical perspectives. Recent advances in machine learning have improved the decoding of both semantic tokens (e.g., words) and linguistic primitives (e.g., letters), enabling context-aware reconstruction of coherent text and speech. Current BCI systems, typically implemented on bulky rack-mounted or bench-top platforms, can synthesize speech, handwriting, or typing at roughly half the speed of natural conversation. Future-generation BCIs, however, are being developed as implantable systems for safe and convenient everyday use. Yet, key challenges remain for real-world deployment, including safety, reliability, portability, user-friendliness, and naturalness, along with ethical considerations and societal implications. Addressing these challenges requires careful attention to patient-centered factors, such as target populations, task paradigms, and implantation sites, which guide translational development. Looking ahead, improving model adaptability, cross-user generalization, and hardware efficiency will be essential for realizing practical, scalable, and fully embodied neural prostheses.
@article{shaeri2026neural,
  title={Neural Prostheses for Communication Restoration: Algorithmic, Systemic, and Clinical Perspectives},
  author={Shaeri, MohammadAli and Yari, Amir Hossein and Liu, Jinhan and Shoaran, Mahsa},
  year={2026},
  publisher={TechRxiv}
}
AMIR-GRPO: Inducing Implicit Preference Signals into GRPO
Amir Hossein Yari, Fajri Koto
Abstract
Despite its effectiveness for post-training reasoning models, group relative policy optimization (GRPO) suffers from several key limitations. Specifically, it introduces response-level length bias, weakly penalizes low-quality trajectories, and reduces rich intra-group preference information to a simplified scalar reward, limiting direct pairwise comparison between reasoning trajectories. To address these issues, we propose AMIR-GRPO, which augments GRPO with an implicit DPO-style contrastive regularizer constructed directly from intra-group reward rankings, without requiring additional annotations. This mechanism strengthens suppression of low-reward trajectories, mitigates length bias, and converts each rollout group into a denser set of supervision constraints. Across multiple mathematical reasoning benchmarks, AMIR-GRPO consistently outperforms strong GRPO baselines, achieves clearer separation between correct and incorrect reasoning chains, and delivers broader coverage beyond instances solved by standard GRPO.
@article{yari2026amir,
  title={AMIR-GRPO: Inducing Implicit Preference Signals into GRPO},
  author={Yari, Amir Hossein and Koto, Fajri},
  journal={arXiv preprint arXiv:2601.03661},
  year={2026}
}
Multilingual Idioms in Sentences and Conversations Across High-, Medium-, and Low-Resource Languages
Saeed Almheiri, Bilal Elbouardi, Salsabila Zahirah Pranida, Irina Nikishina, Ashwath Rao B, Parameswari Krishnamurthy, Muhammad Cendekia Airlangga, Rifo Ahmad Genadi, Nguyễn Phan Gia Bảo, Amir Hossein Yari, Hawau Olamide Toyin, Nurdaulet Mukhituly, Mena Attia, Besher Hassan, Ahmad Fathan Hidayatullah, Tatsuki Kuribayashi, Haonan Li, Suma Bhat, Fajri Koto
ACL 2026 San Diego, CA, USA
Abstract
Idiomatic expressions pose a major challenge for multilingual NLP because their meanings shift between figurative and literal usage, often requiring context for accurate interpretation. Prior work has focused on high-resource languages and typically evaluates isolated idiom-meaning questions, overlooking realistic discourse. We introduce MIDI, a multilingual idiom dataset spanning 3 high-, 3 medium-, and 12 low-resource languages, curated by native speakers. Unlike previous datasets, MIDI provides idioms embedded in both sentence-level and conversational contexts, capturing both literal and figurative readings. Benchmarking state-of-the-art models shows that idiom comprehension degrades in low-resource languages and that, in all resource tiers, literal interpretations are substantially harder than figurative ones. Conversational context improves performance but does not eliminate these disparities. Through controlled tests and interventions on hidden representations, we further separate memorization from reasoning, exposing core limitations of current models.


Revisiting Metric Reliability for Fine-grained Evaluation of Machine Translation and Summarization in Indian Languages
Amir Hossein Yari, Kalmit Kulkarni, Ahmad Raza Khan, Fajri Koto
ACL 2026 San Diego, CA, USA
Abstract
While automatic metrics drive progress in Machine Translation (MT) and Text Summarization (TS), existing metrics have been developed and validated almost exclusively for English and other high-resource languages. This narrow focus leaves Indian languages—spoken by over 1.5 billion people—largely overlooked, casting doubt on the universality of current evaluation practices. To address this gap, we introduce ITEM, a large-scale benchmark that systematically evaluates the alignment of 29 automatic metrics with human judgments across six major Indian languages, enriched with fine-grained annotations. Our extensive evaluation—covering agreement with human judgments, sensitivity to outliers, language-specific reliability, inter-metric correlations, and resilience to controlled perturbations—reveals four central findings: (1) LLM-based evaluators show the strongest alignment with human judgments at both segment and system levels; (2) outliers exert a significant impact on metric-human agreement; (3) in TS, metrics are more effective at capturing content fidelity, whereas in MT, they better reflect fluency; and (4) metrics differ in their robustness and sensitivity when subjected to diverse perturbations. Collectively, these findings offer critical guidance for advancing metric design and evaluation in Indian languages.
@article{yari2025revisiting,
  title={Revisiting Metric Reliability for Fine-grained Evaluation of Machine Translation and Summarization in Indian Languages},
  author={Yari, Amir Hossein and Kulkarni, Kalmit and Khan, Ahmad Raza and Koto, Fajri},
  journal={arXiv preprint arXiv:2510.07061},
  year={2025}
}
2025
Unveiling Cultural Blind Spots: Analyzing the Limitations of mLLMs in Procedural Text Comprehension
Amir Hossein Yari, Fajri Koto
ACL 2025 Vienna, Austria
Abstract
Despite the impressive performance of multilingual large language models (mLLMs) in various natural language processing tasks, their ability to understand procedural texts, particularly those with culture-specific content, remains largely unexplored. Texts describing cultural procedures, including rituals, traditional craftsmanship, and social etiquette, require an inherent understanding of cultural context, presenting a significant challenge for mLLMs. In this work, we introduce CAPTex, a benchmark designed to evaluate mLLMs’ ability to process and reason over culturally diverse procedural texts in multiple languages. Using a range of evaluation methods, we find that (1) mLLMs struggle with culturally contextualized procedural content, particularly in low-resource languages; (2) performance varies across cultural domains, with some proving more difficult than others; and (3) models perform better on multiple-choice tasks presented in conversational formats than on direct questions. These results highlight the current limitations of mLLMs and emphasize the need for culturally informed benchmarks like CAPTex to support more accurate and inclusive language understanding.
@inproceedings{yari-koto-2025-unveiling,
    title = "Unveiling Cultural Blind Spots: Analyzing the Limitations of m{LLM}s in Procedural Text Comprehension",
    author = "Yari, Amir Hossein  and
      Koto, Fajri",
    booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.acl-long.987/",
    doi = "10.18653/v1/2025.acl-long.987",
    pages = "20151--20170",
    ISBN = "979-8-89176-251-0",
}