2025-11-17

cs.AR - Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| NL-DPE: An Analog In-memory Non-Linear Dot Product Engine for Efficient CNN and LLM Inference | Lei Zhao, Luca Buonanno, Archit Gajjar, John Moon, Aishwarya Natarajan, Sergey Serebryakov, Ron M. Roth, Xia Sheng, Youtao Zhang, Paolo Faraboschi, Jim Ignowski, Giacomo Pedretti | 2025-11-17 | Download | Resistive Random Access Memory (RRAM) based in-memory computing (IMC) accelerators offer significant performance and energy advantages for deep neural networks (DNNs), but face three major limitations... |
| QUILL: An Algorithm-Architecture Co-Design for Cache-Local Deformable Attention | Hyunwoo Oh, Hanning Chen, Sanggeon Yun, Yang Ni, Wenjun Huang, Tamoghno Das, Suyeon Jang, Mohsen Imani | 2025-11-17 | Download | Deformable transformers deliver state-of-the-art detection but map poorly to hardware due to irregular memory access and low arithmetic intensity. |
| T-SAR: A Full-Stack Co-design for CPU-Only Ternary LLM Inference via In-Place SIMD ALU Reorganization | Hyunwoo Oh, KyungIn Nam, Rajat Bhattacharjya, Hanning Chen, Tamoghno Das, Sanggeon Yun, Suyeon Jang, Andrew Ding, Nikil Dutt, Mohsen Imani | 2025-11-17 | Download | Recent advances in LLMs have outpaced the computational and memory capacities of edge platforms that primarily employ CPUs, thereby challenging efficient and scalable deployment. |
| Coliseum project: Correlating climate change data with the behavior of heritage materials | A Cormier, David Roqui, Fabrice Surma, Martin Labouré, Jean-Marc Vallet, Odile Guillon, N Grozavu, Ann Bourgès | 2025-11-17 | Download | Heritage materials are already affected by climate change, and increasing climatic variations reduces the lifespan of monuments. As weathering depends on many factors, it is also difficult to link its... |
| Assessing Large Language Models in Generating RTL Design Specifications | Hung-Ming Huang, Yu-Hsin Yang, Fu-Chieh Chang, Yun-Chia Hsu, Yin-Yu Lin, Ming-Fang Tsai, Chun-Chih Yang, Pei-Yuan Wu | 2025-11-17 | Download | As IC design grows more complex, automating comprehension and documentation of RTL code has become increasingly important. Engineers currently should manually interpret existing RTL code and write spe... |
| Think with Self-Decoupling and Self-Verification: Automated RTL Design with Backtrack-ToT | Zhiteng Chao, Yonghao Wang, Xinyu Zhang, Jiaxin Zhou, Tenghui Hua, Husheng Han, Tianmeng Yang, Jianan Mu, Bei Yu, Rui Zhang, Jing Ye, Huawei Li | 2025-11-17 | Download | Large language models (LLMs) hold promise for automating integrated circuit (IC) engineering using register transfer level (RTL) hardware description languages (HDLs) like Verilog. |
| Neo: Real-Time On-Device 3D Gaussian Splatting with Reuse-and-Update Sorting Acceleration | Changhun Oh, Seongryong Oh, Jinwoo Hwang, Yoonsung Kim, Hardik Sharma, Jongse Park | 2025-11-17 | Download | 3D Gaussian Splatting (3DGS) rendering in real-time on resource-constrained devices is essential for delivering immersive augmented and virtual reality (AR/VR) experiences. |
| Dissecting and Re-architecting 3D NAND Flash PIM Arrays for Efficient Single-Batch Token Generation in LLMs | Yongjoo Jang, Sangwoo Hwang, Hojin Lee, Sangwoo Jung, Donghun Lee, Wonbo Shim, Jaeha Kung | 2025-11-17 | Download | The advancement of large language models has led to models with billions of parameters, significantly increasing memory and compute demands. Serving such models on conventional hardware is challenging... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| ParallelKittens: Systematic and Practical Simplification of Multi-GPU AI Kernels | Stuart H. Sul, Simran Arora, Benjamin F. Spector, Christopher Ré | 2025-11-17 | Download | Inter-GPU communication has become a major bottleneck for modern AI workloads as models scale and improvements in hardware compute throughput outpace improvements in interconnect bandwidth. |
| Comparative Analysis of Large Language Model Inference Serving Systems: A Performance Study of vLLM and HuggingFace TGI | Saicharan Kolluru | 2025-11-17 | Download | The deployment of Large Language Models (LLMs) in production environments requires efficient inference serving systems that balance throughput, latency, and resource utilization. |
| Asymptotic analysis of cooperative censoring policies in sensor networks | Jesus Fernandez-Bes, Rocío Arroyo-Valles, Jesús Cid-Sueiro | 2025-11-17 | Download | The problem of cooperative data censoring in battery-powered multihop sensor networks is analyzed in this paper. We are interested in scenarios where nodes generate messages (which are related to the ... |
| Do MPI Derived Datatypes Actually Help? A Single-Node Cross-Implementation Study on Shared-Memory Communication | Temitayo Adefemi | 2025-11-17 | Download | MPI's derived datatypes (DDTs) promise easier, copy-free communication of non-contiguous data, yet their practical performance remains debated and is often reported only for a single MPI stack. |
| InfoDecom: Decomposing Information for Defending Against Privacy Leakage in Split Inference | Ruijun Deng, Zhihui Lu, Qiang Duan | 2025-11-17 | Download | Split inference (SI) enables users to access deep learning (DL) services without directly transmitting raw data. However, recent studies reveal that data reconstruction attacks (DRAs) can recover the ... |
| Distributed Hierarchical Machine Learning for Joint Resource Allocation and Slice Selection in In-Network Edge Systems | Sulaiman Muhammad Rashid, Ibrahim Aliyu, Jaehyung Park, Jinsul Kim | 2025-11-17 | Download | The Metaverse promises immersive, real-time experiences; however, meeting its stringent latency and resource demands remains a major challenge. |
| Pico-Cloud: Cloud Infrastructure for Tiny Edge Devices | Mordechai Guri | 2025-11-17 | Download | This paper introduces the Pico-Cloud, a micro-edge cloud architecture built on ultra-minimal hardware platforms such as the Raspberry Pi Zero and comparable single-board computers. |
| Learning Process Energy Profiles from Node-Level Power Data | Jonathan Bader, Julius Irion, Jannis Kappel, Joel Witzke, Niklas Fomin, Diellza Sherifi, Odej Kao | 2025-11-17 | Download | The growing demand for data center capacity, driven by the growth of high-performance computing, cloud computing, and especially artificial intelligence, has led to a sharp increase in data center ene... |
| MACKO: Sparse Matrix-Vector Multiplication for Low Sparsity | Vladimír Macko, Vladimír Boža | 2025-11-17 | Download | Sparse Matrix-Vector Multiplication (SpMV) is a fundamental operation in the inference of sparse Large Language Models (LLMs). Because existing SpMV methods perform poorly under the low and unstructur... |
| On the Fundamental Limits of LLMs at Scale | Muhammad Ahmed Mohsin, Muhammad Umer, Ahsan Bilal, Zeeshan Memon, Muhammad Ibtsaam Qadir, Sagnik Bhattacharya, Hassan Rizwan, Abhiram R. Gorle, Maahe Zehra Kazmi, Nukhba Amir, Ali Subhan, Muhammad Usman Rafique, Zihao He, Pulkit Mehta, Muhammad Ali Jamshed, John M. Cioffi | 2025-11-17 | Download | Large Language Models (LLMs) have benefited enormously from scaling, yet these gains are bounded by five fundamental limitations: (1) hallucination, (2) context compression, (3) reasoning degradation,... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| A Secure Semantic Communication System Based on Knowledge Graph | Qin Guo, Haonan Tong, Sihua Wang, Peiyuan Si, Jun Zhao, Changchuan Yin | 2025-11-17 | Download | This study proposes a novel approach to ensure the security of textual data transmission in a semantic communication system. In the proposed system, a sender transmits textual information to a receive... |
| MM-Telco: Benchmarks and Multimodal Large Language Models for Telecom Applications | Gagan Raj Gupta, Anshul Kumar, Manish Rai, Apu Chakraborty, Ashutosh Modi, Abdelaali Chaoub, Soumajit Pramanik, Moyank Giri, Yashwanth Holla, Sunny Kumar, M. V. Kiran Sooraj | 2025-11-17 | Download | Large Language Models (LLMs) have emerged as powerful tools for automating complex reasoning and decision-making tasks. In telecommunications, they hold the potential to transform network optimization... |
| Sensing and Understanding the World over Air: A Large Multimodal Model for Mobile Networks | Zhuoran Duan, Yuhao Wei, Guoshun Nan, Zijun Wang, Yan Yan, Lihua Xiong, Yuhan Ran, Ji Zhang, Jian Li, Qimei Cui, Xiaofeng Tao, Tony Q. S. Quek | 2025-11-17 | Download | Large models (LMs), such as ChatGPT, have made a significant impact across diverse domains and hold great potential to facilitate the evolution of network intelligence. |
| Distributed Self-allocated Time Slot Reuse: Multi-hop Communication in Rigid UAV Formations | Amelia Samandari, Andreas Willig, Barry Wu, Philippa Martin | 2025-11-17 | Download | Deployment of Unmanned Aerial Vehicles (UAVs) in autonomous formations necessitates accurate and timely communication of safety information. A communication protocol that supports timely and successfu... |
| Indirect Coflow Scheduling | Alexander Lindermayr, Kirk Pruhs, Andréa W. Richa, Tegan Wilson | 2025-11-17 | Download | We consider routing in reconfigurable networks, which is also known as coflow scheduling in the literature. The algorithmic literature generally (perhaps implicitly) assumes that the amount of data to... |

cs.OS - Operating Systems

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Sharpe-Driven Stock Selection and Liquidiy-Constrained Portfolio Optimization: Evidence from the Chinese Equity Market | Thanh Nguyen | 2025-11-17 | Download | This paper develops and empirically evaluates a Sharpe-driven stock selection and liquidity-constrained portfolio optimization framework designed for the Chinese equity market. |
| Talyxion: From Speculation to Optimization in Risk Managed Crypto Portfolio Allocation | Thanh Nguyen | 2025-11-17 | Download | Cryptocurrency trading has attracted tremendous attention from both retail and institutional investors. However, most traders fail to scale their assets under management due to fragile strategies that... |

cs.PF - Performance

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Enabling Heterogeneous Performance Analysis for Scientific Workloads | Maksymilian Graczyk, Vincent Desbiolles, Stefan Roiser, Andrea Guerrieri | 2025-11-17 | Download | Heterogeneous computing integrates diverse processing elements, such as CPUs, GPUs, and FPGAs, within a single system, aiming to leverage the strengths of each architecture to optimize performance and... |
| AutoSAGE: Input-Aware CUDA Scheduling for Sparse GNN Aggregation (SpMM/SDDMM) and CSR Attention | Aleksandar Stankovic | 2025-11-17 | Download | Sparse GNN aggregations (CSR SpMM/SDDMM) vary widely in performance with degree skew, feature width, and GPU micro-architecture. We present AutoSAGE, an input-aware CUDA scheduler that chooses tiling ... |
| Comparative Analysis of Large Language Model Inference Serving Systems: A Performance Study of vLLM and HuggingFace TGI | Saicharan Kolluru | 2025-11-17 | Download | The deployment of Large Language Models (LLMs) in production environments requires efficient inference serving systems that balance throughput, latency, and resource utilization. |
| Hardware optimization on Android for inference of AI models | Iulius Gherasim, Carlos García Sánchez | 2025-11-17 | Download | The pervasive integration of Artificial Intelligence models into contemporary mobile computing is notable across numerous use cases, from virtual assistants to advanced image processing. |
| Evaluation of Domain-Specific Architectures for General-Purpose Applications in Apple Silicon | Álvaro Corrochano López, Carlos García Sánchez | 2025-11-17 | Download | The rise of AI and its growing computational demands have driven the integration of domain-specific accelerators (such as GPUs, TPUs, and NPUs) across the entire computing infrastructure. |
| KForge: Program Synthesis for Diverse AI Hardware Accelerators | Taras Sereda, Tom St. John, Burak Bartan, Natalie Serrino, Sachin Katti, Zain Asgar | 2025-11-17 | Download | GPU kernels are critical for ML performance but difficult to optimize across diverse accelerators. We present KForge, a platform-agnostic framework built on two collaborative LLM-based agents: a gener... |
| Large-scale Multigrid with Adaptive Galerkin Coarsening | Fabian Böhm, Nils Kohl, Harald Köstler, Ulrich Rüde | 2025-11-17 | Download | We propose a robust, adaptive coarse-grid correction scheme for matrix-free geometric multigrid targeting PDEs with strongly varying coefficients. |