2025-09-26

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Enhanced Hybrid Temporal Computing Using Deterministic Summations for Ultra-Low-Power Accelerators | Sachin Sachdeva, Jincong Lu, Wantong Li, Sheldon X.-D. Tan | 2025-09-26 | Download | This paper presents an accuracy-enhanced Hybrid Temporal Computing (E-HTC) framework for ultra-low-power hardware accelerators with deterministic additions. |
| CryptoSRAM: Enabling High-Throughput Cryptography on MCUs via In-SRAM Computing | Jingyao Zhang, Elaheh Sadredini | 2025-09-26 | Download | Secure communication is a critical requirement for Internet of Things (IoT) devices, which are often based on Microcontroller Units (MCUs). Current cryptographic solutions, which rely on software libr... |
| No One-Size-Fits-All: A Workload-Driven Characterization of Bit-Parallel vs. Bit-Serial Data Layouts for Processing-using-Memory | Jingyao Zhang, Elaheh Sadredini | 2025-09-26 | Download | Processing-in-Memory (PIM) is a promising approach to overcoming the memory-wall bottleneck. However, the PIM community has largely treated its two fundamental data layouts, Bit-Parallel (BP) and Bit-... |
| Latency Based Tiling | Jack Cashman | 2025-09-26 | Download | Latency Based Tiling provides a systems-based approach to deriving an approximate tiling solution that maximizes locality while maintaining a fast compile time. |
| AxLLM: accelerator architecture for large language models with computation reuse capability | Soroush Ahadi, Mehdi Modarressi, Masoud Daneshtalab | 2025-09-26 | Download | Large language models demand massive computational power and memory resources, posing significant challenges for efficient deployment. While quantization has been widely explored to reduce model size ... |
| NeuroScalar: A Deep Learning Framework for Fast, Accurate, and In-the-Wild Cycle-Level Performance Prediction | Shayne Wadle, Yanxin Zhang, Vikas Singh, Karthikeyan Sankaralingam | 2025-09-26 | Download | The evaluation of new microprocessor designs is constrained by slow, cycle-accurate simulators that rely on unrepresentative benchmark traces. |
| SAHM: State-Aware Heterogeneous Multicore for Single-Thread Performance | Shayne Wadle, Karthikeyan Sankaralingam | 2025-09-26 | Download | Improving single-thread performance remains a critical challenge in modern processor design, as conventional approaches such as deeper speculation, wider pipelines, and complex out-of-order execution ... |
| Privacy-Preserving Performance Profiling of In-The-Wild GPUs | Ian McDougall, Michael Davies, Rahul Chatterjee, Somesh Jha, Karthikeyan Sankaralingam | 2025-09-26 | Download | GPUs are the dominant platform for many important applications today including deep learning, accelerated computing, and scientific simulation. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| OptimES: Optimizing Federated Learning Using Remote Embeddings for Graph Neural Networks | Pranjal Naman, Yogesh Simmhan | 2025-09-26 | Download | Graph Neural Networks (GNNs) have experienced rapid advancements in recent years due to their ability to learn meaningful representations from graph data structures. |
| Ringleader ASGD: The First Asynchronous SGD with Optimal Time Complexity under Data Heterogeneity | Artavazd Maranjyan, Peter Richtárik | 2025-09-26 | Download | Asynchronous stochastic gradient methods are central to scalable distributed optimization, particularly when devices differ in computational capabilities. |
| Efficient Fine-Grained GPU Performance Modeling for Distributed Deep Learning of LLM | Biyao Zhang, Mingkai Zheng, Debargha Ganguly, Xuecen Zhang, Vikash Singh, Vipin Chaudhary, Zhao Zhang | 2025-09-26 | Download | Training Large Language Models (LLMs) is one of the most compute-intensive tasks in high-performance computing. Predicting end-to-end training time for multi-billion parameter models distributed across... |
| Agora: Bridging the GPU Cloud Resource-Price Disconnect | Ian McDougall, Noah Scott, Joon Huh, Kirthevasan Kandasamy, Karthikeyan Sankaralingam | 2025-09-26 | Download | The historic trend of Moore's Law, which predicted exponential growth in computational performance per dollar, has diverged for modern Graphics Processing Units (GPUs). |
| Role-Aware Multi-modal federated learning system for detecting phishing webpages | Bo Wang, Imran Khan, Martin White, Natalia Beloff | 2025-09-26 | Download | We present a federated, multi-modal phishing website detector that supports URL, HTML, and IMAGE inputs without binding clients to a fixed modality at inference: any client can invoke any modality hea... |
| Orientation does not help with 3-coloring a grid in online-LOCAL | Thomas Boudier, Filippo Casagrande, Avinandan Das, Massimo Equi, Henrik Lievonen, Augusto Modanese, Ronja Stimpert | 2025-09-26 | Download | The online-LOCAL and SLOCAL models are extensions of the LOCAL model where nodes are processed in a sequential but potentially adversarial order. |
| The AI_INFN Platform: Artificial Intelligence Development in the Cloud | Lucio Anderlini, Giulio Bianchini, Diego Ciangottini, Stefano Dal Pra, Diego Michelotto, Rosa Petrini, Daniele Spiga | 2025-09-26 | Download | Machine Learning (ML) is profoundly reshaping the way researchers create, implement, and operate data-intensive software. Its adoption, however, introduces notable challenges for computing infrastruct... |
| Code once, Run Green: Automated Green Code Translation in Serverless Computing | Sebastian Werner, Mathis Kähler, Alireza Hakamian | 2025-09-26 | Download | The rapid digitization and the increasing use of emerging technologies such as AI models have significantly contributed to the emissions of computing infrastructure. |
| VibeCodeHPC: An Agent-Based Iterative Prompting Auto-Tuner for HPC Code Generation Using LLMs | Shun-ichiro Hayashi, Koki Morita, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri | 2025-09-26 | Download | In this study, we propose VibeCodeHPC, a multi-agent system based on large language models (LLMs) for the automatic tuning of high-performance computing (HPC) programs on supercomputers. |
| Zeppelin: Balancing Variable-length Workloads in Data Parallel Large Model Training | Chang Chen, Tiancheng Chen, Jiangfei Duan, Qianchao Zhu, Zerui Wang, Qinghao Hu, Peng Sun, Xiuhong Li, Chao Yang, Torsten Hoefler | 2025-09-26 | Download | Training large language models (LLMs) with increasingly long and varying sequence lengths introduces severe load imbalance challenges in large-scale data-parallel training. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Bridging Language Models and Formal Methods for Intent-Driven Optical Network Design | Anis Bekri, Amar Abane, Abdella Battou, Saddek Bensalem | 2025-09-26 | Download | Intent-Based Networking (IBN) aims to simplify network management by enabling users to specify high-level goals that drive automated network design and configuration. |
| Bridging Technical Capability and User Accessibility: Off-grid Civilian Emergency Communication | Karim Khamaisi, Oliver Kamer, Bruno Rodrigues, Jan von der Assen, Burkhard Stiller | 2025-09-26 | Download | During large-scale crises disrupting cellular and Internet infrastructure, civilians lack reliable methods for communication, aid coordination, and access to trustworthy information. |
| Extreme Value Theory-enhanced Radio Maps for Handovers in Ultra-reliable Communications | Dian Echevarría Pérez, Onel L. Alcaraz López, Hirley Alves | 2025-09-26 | Download | Efficient handover (HO) strategies are essential for maintaining the stringent performance requirements of ultra-reliable communication (URC) systems. |
| Red Teaming Quantum-Resistant Cryptographic Standards: A Penetration Testing Framework Integrating AI and Quantum Security | Petar Radanliev | 2025-09-26 | Download | This study presents a structured approach to evaluating vulnerabilities within quantum cryptographic protocols, focusing on the BB84 quantum key distribution method and National Institute of Standards... |
| LLM Agent Communication Protocol (LACP) Requires Urgent Standardization: A Telecom-Inspired Protocol is Necessary | Xin Li, Mengbing Liu, Chau Yuen | 2025-09-26 | Download | This position paper argues that the field of LLM agents requires a unified, telecom-inspired communication protocol to ensure safety, interoperability, and scalability, especially within the context o... |
| Evaluating Open-Source Large Language Models for Technical Telecom Question Answering | Arina Caraus, Alessio Buscemi, Sumit Kumar, Ion Turcanu | 2025-09-26 | Download | Large Language Models (LLMs) have shown remarkable capabilities across various fields. However, their performance in technical domains such as telecommunications remains underexplored. |
| Leveraging Wireless Sensor Networks for Real-Time Monitoring and Control of Industrial Environments | Muhammad Junaid Asif, Abdul Rehman, Asim Mehmood, Rana Fayyaz Ahmad, Shazia Saqib | 2025-09-26 | Download | This research proposes an extensive technique for monitoring and controlling the industrial parameters using Internet of Things (IoT) technology based on wireless communication. |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Secure and Efficient Access Control for Computer-Use Agents via Context Space | Haochen Gong, Chenxiao Li, Rui Chang, Wenbo Shen | 2025-09-26 | Download | Large language model (LLM)-based computer-use agents represent a convergence of AI and OS capabilities, enabling natural language to control system- and application-level functions. |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Tiny-QMoE | Jack Cashman, Jiaqi Nie | 2025-09-26 | Download | The QMoE model provides a practical approach for compression of massive Mixture-of-Experts (MoE) models. QMoE offers a solution geared towards memory limitations that often reach terabyte scales, and ... |
| Latency Based Tiling | Jack Cashman | 2025-09-26 | Download | Latency Based Tiling provides a systems-based approach to deriving an approximate tiling solution that maximizes locality while maintaining a fast compile time. |
| SAHM: State-Aware Heterogeneous Multicore for Single-Thread Performance | Shayne Wadle, Karthikeyan Sankaralingam | 2025-09-26 | Download | Improving single-thread performance remains a critical challenge in modern processor design, as conventional approaches such as deeper speculation, wider pipelines, and complex out-of-order execution ... |
| Less is More: Faster Maximum Clique Search by Work-Avoidance | Hans Vandierendonck | 2025-09-26 | Download | The maximum clique (MC) problem is a challenging graph mining problem which, due to its NP-hard nature, can take a substantial amount of execution time. |
| Light Differentiable Logic Gate Networks | Lukas Rüttgers, Till Aczel, Andreas Plesner, Roger Wattenhofer | 2025-09-26 | Download | Differentiable logic gate networks (DLGNs) exhibit extraordinary efficiency at inference while sustaining competitive accuracy. But vanishing gradients, discretization errors, and high training cost i... |
