2025-11-17

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
NL-DPE: An Analog In-memory Non-Linear Dot Product Engine for Efficient CNN and LLM Inference	Lei Zhao, Luca Buonanno, Archit Gajjar, John Moon, Aishwarya Natarajan, Sergey Serebryakov, Ron M. Roth, Xia Sheng, Youtao Zhang, Paolo Faraboschi, Jim Ignowski, Giacomo Pedretti	2025-11-17	Download	Resistive Random Access Memory (RRAM) based in-memory computing (IMC) accelerators offer significant performance and energy advantages for deep neural networks (DNNs), but face three major limitations...
QUILL: An Algorithm-Architecture Co-Design for Cache-Local Deformable Attention	Hyunwoo Oh, Hanning Chen, Sanggeon Yun, Yang Ni, Wenjun Huang, Tamoghno Das, Suyeon Jang, Mohsen Imani	2025-11-17	Download	Deformable transformers deliver state-of-the-art detection but map poorly to hardware due to irregular memory access and low arithmetic intensity.
T-SAR: A Full-Stack Co-design for CPU-Only Ternary LLM Inference via In-Place SIMD ALU Reorganization	Hyunwoo Oh, KyungIn Nam, Rajat Bhattacharjya, Hanning Chen, Tamoghno Das, Sanggeon Yun, Suyeon Jang, Andrew Ding, Nikil Dutt, Mohsen Imani	2025-11-17	Download	Recent advances in LLMs have outpaced the computational and memory capacities of edge platforms that primarily employ CPUs, thereby challenging efficient and scalable deployment.
Coliseum project: Correlating climate change data with the behavior of heritage materials	A Cormier, David Roqui, Fabrice Surma, Martin Labouré, Jean-Marc Vallet, Odile Guillon, N Grozavu, Ann Bourgès	2025-11-17	Download	Heritage materials are already affected by climate change, and increasing climatic variations reduces the lifespan of monuments. As weathering depends on many factors, it is also difficult to link its...
Assessing Large Language Models in Generating RTL Design Specifications	Hung-Ming Huang, Yu-Hsin Yang, Fu-Chieh Chang, Yun-Chia Hsu, Yin-Yu Lin, Ming-Fang Tsai, Chun-Chih Yang, Pei-Yuan Wu	2025-11-17	Download	As IC design grows more complex, automating comprehension and documentation of RTL code has become increasingly important. Engineers currently should manually interpret existing RTL code and write spe...
Think with Self-Decoupling and Self-Verification: Automated RTL Design with Backtrack-ToT	Zhiteng Chao, Yonghao Wang, Xinyu Zhang, Jiaxin Zhou, Tenghui Hua, Husheng Han, Tianmeng Yang, Jianan Mu, Bei Yu, Rui Zhang, Jing Ye, Huawei Li	2025-11-17	Download	Large language models (LLMs) hold promise for automating integrated circuit (IC) engineering using register transfer level (RTL) hardware description languages (HDLs) like Verilog.
Neo: Real-Time On-Device 3D Gaussian Splatting with Reuse-and-Update Sorting Acceleration	Changhun Oh, Seongryong Oh, Jinwoo Hwang, Yoonsung Kim, Hardik Sharma, Jongse Park	2025-11-17	Download	3D Gaussian Splatting (3DGS) rendering in real-time on resource-constrained devices is essential for delivering immersive augmented and virtual reality (AR/VR) experiences.
Dissecting and Re-architecting 3D NAND Flash PIM Arrays for Efficient Single-Batch Token Generation in LLMs	Yongjoo Jang, Sangwoo Hwang, Hojin Lee, Sangwoo Jung, Donghun Lee, Wonbo Shim, Jaeha Kung	2025-11-17	Download	The advancement of large language models has led to models with billions of parameters, significantly increasing memory and compute demands. Serving such models on conventional hardware is challenging...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
ParallelKittens: Systematic and Practical Simplification of Multi-GPU AI Kernels	Stuart H. Sul, Simran Arora, Benjamin F. Spector, Christopher Ré	2025-11-17	Download	Inter-GPU communication has become a major bottleneck for modern AI workloads as models scale and improvements in hardware compute throughput outpace improvements in interconnect bandwidth.
Comparative Analysis of Large Language Model Inference Serving Systems: A Performance Study of vLLM and HuggingFace TGI	Saicharan Kolluru	2025-11-17	Download	The deployment of Large Language Models (LLMs) in production environments requires efficient inference serving systems that balance throughput, latency, and resource utilization.
Asymptotic analysis of cooperative censoring policies in sensor networks	Jesus Fernandez-Bes, Rocío Arroyo-Valles, Jesús Cid-Sueiro	2025-11-17	Download	The problem of cooperative data censoring in battery-powered multihop sensor networks is analyzed in this paper. We are interested in scenarios where nodes generate messages (which are related to the ...
Do MPI Derived Datatypes Actually Help? A Single-Node Cross-Implementation Study on Shared-Memory Communication	Temitayo Adefemi	2025-11-17	Download	MPI's derived datatypes (DDTs) promise easier, copy-free communication of non-contiguous data, yet their practical performance remains debated and is often reported only for a single MPI stack.
InfoDecom: Decomposing Information for Defending Against Privacy Leakage in Split Inference	Ruijun Deng, Zhihui Lu, Qiang Duan	2025-11-17	Download	Split inference (SI) enables users to access deep learning (DL) services without directly transmitting raw data. However, recent studies reveal that data reconstruction attacks (DRAs) can recover the ...
Distributed Hierarchical Machine Learning for Joint Resource Allocation and Slice Selection in In-Network Edge Systems	Sulaiman Muhammad Rashid, Ibrahim Aliyu, Jaehyung Park, Jinsul Kim	2025-11-17	Download	The Metaverse promises immersive, real-time experiences; however, meeting its stringent latency and resource demands remains a major challenge.
Pico-Cloud: Cloud Infrastructure for Tiny Edge Devices	Mordechai Guri	2025-11-17	Download	This paper introduces the Pico-Cloud, a micro-edge cloud architecture built on ultra-minimal hardware platforms such as the Raspberry Pi Zero and comparable single-board computers.
Learning Process Energy Profiles from Node-Level Power Data	Jonathan Bader, Julius Irion, Jannis Kappel, Joel Witzke, Niklas Fomin, Diellza Sherifi, Odej Kao	2025-11-17	Download	The growing demand for data center capacity, driven by the growth of high-performance computing, cloud computing, and especially artificial intelligence, has led to a sharp increase in data center ene...
MACKO: Sparse Matrix-Vector Multiplication for Low Sparsity	Vladimír Macko, Vladimír Boža	2025-11-17	Download	Sparse Matrix-Vector Multiplication (SpMV) is a fundamental operation in the inference of sparse Large Language Models (LLMs). Because existing SpMV methods perform poorly under the low and unstructur...
On the Fundamental Limits of LLMs at Scale	Muhammad Ahmed Mohsin, Muhammad Umer, Ahsan Bilal, Zeeshan Memon, Muhammad Ibtsaam Qadir, Sagnik Bhattacharya, Hassan Rizwan, Abhiram R. Gorle, Maahe Zehra Kazmi, Nukhba Amir, Ali Subhan, Muhammad Usman Rafique, Zihao He, Pulkit Mehta, Muhammad Ali Jamshed, John M. Cioffi	2025-11-17	Download	Large Language Models (LLMs) have benefited enormously from scaling, yet these gains are bounded by five fundamental limitations: (1) hallucination, (2) context compression, (3) reasoning degradation,...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
A Secure Semantic Communication System Based on Knowledge Graph	Qin Guo, Haonan Tong, Sihua Wang, Peiyuan Si, Jun Zhao, Changchuan Yin	2025-11-17	Download	This study proposes a novel approach to ensure the security of textual data transmission in a semantic communication system. In the proposed system, a sender transmits textual information to a receive...
MM-Telco: Benchmarks and Multimodal Large Language Models for Telecom Applications	Gagan Raj Gupta, Anshul Kumar, Manish Rai, Apu Chakraborty, Ashutosh Modi, Abdelaali Chaoub, Soumajit Pramanik, Moyank Giri, Yashwanth Holla, Sunny Kumar, M. V. Kiran Sooraj	2025-11-17	Download	Large Language Models (LLMs) have emerged as powerful tools for automating complex reasoning and decision-making tasks. In telecommunications, they hold the potential to transform network optimization...
Sensing and Understanding the World over Air: A Large Multimodal Model for Mobile Networks	Zhuoran Duan, Yuhao Wei, Guoshun Nan, Zijun Wang, Yan Yan, Lihua Xiong, Yuhan Ran, Ji Zhang, Jian Li, Qimei Cui, Xiaofeng Tao, Tony Q. S. Quek	2025-11-17	Download	Large models (LMs), such as ChatGPT, have made a significant impact across diverse domains and hold great potential to facilitate the evolution of network intelligence.
Distributed Self-allocated Time Slot Reuse: Multi-hop Communication in Rigid UAV Formations	Amelia Samandari, Andreas Willig, Barry Wu, Philippa Martin	2025-11-17	Download	Deployment of Unmanned Aerial Vehicles (UAVs) in autonomous formations necessitates accurate and timely communication of safety information. A communication protocol that supports timely and successfu...
Indirect Coflow Scheduling	Alexander Lindermayr, Kirk Pruhs, Andréa W. Richa, Tegan Wilson	2025-11-17	Download	We consider routing in reconfigurable networks, which is also known as coflow scheduling in the literature. The algorithmic literature generally (perhaps implicitly) assumes that the amount of data to...

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
Sharpe-Driven Stock Selection and Liquidiy-Constrained Portfolio Optimization: Evidence from the Chinese Equity Market	Thanh Nguyen	2025-11-17	Download	This paper develops and empirically evaluates a Sharpe-driven stock selection and liquidity-constrained portfolio optimization framework designed for the Chinese equity market.
Talyxion: From Speculation to Optimization in Risk Managed Crypto Portfolio Allocation	Thanh Nguyen	2025-11-17	Download	Cryptocurrency trading has attracted tremendous attention from both retail and institutional investors. However, most traders fail to scale their assets under management due to fragile strategies that...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Enabling Heterogeneous Performance Analysis for Scientific Workloads	Maksymilian Graczyk, Vincent Desbiolles, Stefan Roiser, Andrea Guerrieri	2025-11-17	Download	Heterogeneous computing integrates diverse processing elements, such as CPUs, GPUs, and FPGAs, within a single system, aiming to leverage the strengths of each architecture to optimize performance and...
AutoSAGE: Input-Aware CUDA Scheduling for Sparse GNN Aggregation (SpMM/SDDMM) and CSR Attention	Aleksandar Stankovic	2025-11-17	Download	Sparse GNN aggregations (CSR SpMM/SDDMM) vary widely in performance with degree skew, feature width, and GPU micro-architecture. We present AutoSAGE, an input-aware CUDA scheduler that chooses tiling ...
Comparative Analysis of Large Language Model Inference Serving Systems: A Performance Study of vLLM and HuggingFace TGI	Saicharan Kolluru	2025-11-17	Download	The deployment of Large Language Models (LLMs) in production environments requires efficient inference serving systems that balance throughput, latency, and resource utilization.
Hardware optimization on Android for inference of AI models	Iulius Gherasim, Carlos García Sánchez	2025-11-17	Download	The pervasive integration of Artificial Intelligence models into contemporary mobile computing is notable across numerous use cases, from virtual assistants to advanced image processing.
Evaluation of Domain-Specific Architectures for General-Purpose Applications in Apple Silicon	Álvaro Corrochano López, Carlos García Sánchez	2025-11-17	Download	The rise of AI and its growing computational demands have driven the integration of domain-specific accelerators (such as GPUs, TPUs, and NPUs) across the entire computing infrastructure.
KForge: Program Synthesis for Diverse AI Hardware Accelerators	Taras Sereda, Tom St. John, Burak Bartan, Natalie Serrino, Sachin Katti, Zain Asgar	2025-11-17	Download	GPU kernels are critical for ML performance but difficult to optimize across diverse accelerators. We present KForge, a platform-agnostic framework built on two collaborative LLM-based agents: a gener...
Large-scale Multigrid with Adaptive Galerkin Coarsening	Fabian Böhm, Nils Kohl, Harald Köstler, Ulrich Rüde	2025-11-17	Download	We propose a robust, adaptive coarse-grid correction scheme for matrix-free geometric multigrid targeting PDEs with strongly varying coefficients.