2025-09-26
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Enhanced Hybrid Temporal Computing Using Deterministic Summations for Ultra-Low-Power Accelerators | Sachin Sachdeva, Jincong Lu, Wantong Li, Sheldon X.-D. Tan | 2025-09-26 | Download | This paper presents an accuracy-enhanced Hybrid Temporal Computing (E-HTC) framework for ultra-low-power hardware accelerators with deterministic additions. |
| CryptoSRAM: Enabling High-Throughput Cryptography on MCUs via In-SRAM Computing | Jingyao Zhang, Elaheh Sadredini | 2025-09-26 | Download | Secure communication is a critical requirement for Internet of Things (IoT) devices, which are often based on Microcontroller Units (MCUs). Current cryptographic solutions, which rely on software libr... |
| No One-Size-Fits-All: A Workload-Driven Characterization of Bit-Parallel vs. Bit-Serial Data Layouts for Processing-using-Memory | Jingyao Zhang, Elaheh Sadredini | 2025-09-26 | Download | Processing-in-Memory (PIM) is a promising approach to overcoming the memory-wall bottleneck. However, the PIM community has largely treated its two fundamental data layouts, Bit-Parallel (BP) and Bit-... |
| Latency Based Tiling | Jack Cashman | 2025-09-26 | Download | Latency Based Tiling provides a systems based approach to deriving approximate tiling solution that maximizes locality while maintaining a fast compile time. |
| AxLLM: accelerator architecture for large language models with computation reuse capability | Soroush Ahadi, Mehdi Modarressi, Masoud Daneshtalab | 2025-09-26 | Download | Large language models demand massive computational power and memory resources, posing significant challenges for efficient deployment. While quantization has been widely explored to reduce model size ... |
| NeuroScalar: A Deep Learning Framework for Fast, Accurate, and In-the-Wild Cycle-Level Performance Prediction | Shayne Wadle, Yanxin Zhang, Vikas Singh, Karthikeyan Sankaralingam | 2025-09-26 | Download | The evaluation of new microprocessor designs is constrained by slow, cycle-accurate simulators that rely on unrepresentative benchmark traces. |
| SAHM: State-Aware Heterogeneous Multicore for Single-Thread Performance | Shayne Wadle, Karthikeyan Sankaralingam | 2025-09-26 | Download | Improving single-thread performance remains a critical challenge in modern processor design, as conventional approaches such as deeper speculation, wider pipelines, and complex out-of-order execution ... |
| Privacy-Preserving Performance Profiling of In-The-Wild GPUs | Ian McDougall, Michael Davies, Rahul Chatterjee, Somesh Jha, Karthikeyan Sankaralingam | 2025-09-26 | Download | GPUs are the dominant platform for many important applications today including deep learning, accelerated computing, and scientific simulation. |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| OptimES: Optimizing Federated Learning Using Remote Embeddings for Graph Neural Networks | Pranjal Naman, Yogesh Simmhan | 2025-09-26 | Download | Graph Neural Networks (GNNs) have experienced rapid advancements in recent years due to their ability to learn meaningful representations from graph data structures. |
| Ringleader ASGD: The First Asynchronous SGD with Optimal Time Complexity under Data Heterogeneity | Artavazd Maranjyan, Peter Richtárik | 2025-09-26 | Download | Asynchronous stochastic gradient methods are central to scalable distributed optimization, particularly when devices differ in computational capabilities. |
| Efficient Fine-Grained GPU Performance Modeling for Distributed Deep Learning of LLM | Biyao Zhang, Mingkai Zheng, Debargha Ganguly, Xuecen Zhang, Vikash Singh, Vipin Chaudhary, Zhao Zhang | 2025-09-26 | Download | Training Large Language Models (LLMs) is one of the most compute-intensive tasks in high-performance computing. Predicting end-to-end training time for multi-billion parameter models distributed across... |
| Agora: Bridging the GPU Cloud Resource-Price Disconnect | Ian McDougall, Noah Scott, Joon Huh, Kirthevasan Kandasamy, Karthikeyan Sankaralingam | 2025-09-26 | Download | The historic trend of Moore's Law, which predicted exponential growth in computational performance per dollar, has diverged for modern Graphics Processing Units (GPUs). |
| Role-Aware Multi-modal federated learning system for detecting phishing webpages | Bo Wang, Imran Khan, Martin White, Natalia Beloff | 2025-09-26 | Download | We present a federated, multi-modal phishing website detector that supports URL, HTML, and IMAGE inputs without binding clients to a fixed modality at inference: any client can invoke any modality hea... |
| Orientation does not help with 3-coloring a grid in online-LOCAL | Thomas Boudier, Filippo Casagrande, Avinandan Das, Massimo Equi, Henrik Lievonen, Augusto Modanese, Ronja Stimpert | 2025-09-26 | Download | The online-LOCAL and SLOCAL models are extensions of the LOCAL model where nodes are processed in a sequential but potentially adversarial order. |
| The AI_INFN Platform: Artificial Intelligence Development in the Cloud | Lucio Anderlini, Giulio Bianchini, Diego Ciangottini, Stefano Dal Pra, Diego Michelotto, Rosa Petrini, Daniele Spiga | 2025-09-26 | Download | Machine Learning (ML) is profoundly reshaping the way researchers create, implement, and operate data-intensive software. Its adoption, however, introduces notable challenges for computing infrastruct... |
| Code once, Run Green: Automated Green Code Translation in Serverless Computing | Sebastian Werner, Mathis Kähler, Alireza Hakamian | 2025-09-26 | Download | The rapid digitization and the increasing use of emerging technologies such as AI models have significantly contributed to the emissions of computing infrastructure. |
| VibeCodeHPC: An Agent-Based Iterative Prompting Auto-Tuner for HPC Code Generation Using LLMs | Shun-ichiro Hayashi, Koki Morita, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri | 2025-09-26 | Download | In this study, we propose VibeCodeHPC, a multi-agent system based on large language models (LLMs) for the automatic tuning of high-performance computing (HPC) programs on supercomputers. |
| Zeppelin: Balancing Variable-length Workloads in Data Parallel Large Model Training | Chang Chen, Tiancheng Chen, Jiangfei Duan, Qianchao Zhu, Zerui Wang, Qinghao Hu, Peng Sun, Xiuhong Li, Chao Yang, Torsten Hoefler | 2025-09-26 | Download | Training large language models (LLMs) with increasingly long and varying sequence lengths introduces severe load imbalance challenges in large-scale data-parallel training. |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Bridging Language Models and Formal Methods for Intent-Driven Optical Network Design | Anis Bekri, Amar Abane, Abdella Battou, Saddek Bensalem | 2025-09-26 | Download | Intent-Based Networking (IBN) aims to simplify network management by enabling users to specify high-level goals that drive automated network design and configuration. |
| Bridging Technical Capability and User Accessibility: Off-grid Civilian Emergency Communication | Karim Khamaisi, Oliver Kamer, Bruno Rodrigues, Jan von der Assen, Burkhard Stiller | 2025-09-26 | Download | During large-scale crises disrupting cellular and Internet infrastructure, civilians lack reliable methods for communication, aid coordination, and access to trustworthy information. |
| Extreme Value Theory-enhanced Radio Maps for Handovers in Ultra-reliable Communications | Dian Echevarría Pérez, Onel L. Alcaraz López, Hirley Alves | 2025-09-26 | Download | Efficient handover (HO) strategies are essential for maintaining the stringent performance requirements of ultra-reliable communication (URC) systems. |
| Red Teaming Quantum-Resistant Cryptographic Standards: A Penetration Testing Framework Integrating AI and Quantum Security | Petar Radanliev | 2025-09-26 | Download | This study presents a structured approach to evaluating vulnerabilities within quantum cryptographic protocols, focusing on the BB84 quantum key distribution method and National Institute of Standards... |
| LLM Agent Communication Protocol (LACP) Requires Urgent Standardization: A Telecom-Inspired Protocol is Necessary | Xin Li, Mengbing Liu, Chau Yuen | 2025-09-26 | Download | This position paper argues that the field of LLM agents requires a unified, telecom-inspired communication protocol to ensure safety, interoperability, and scalability, especially within the context o... |
| Evaluating Open-Source Large Language Models for Technical Telecom Question Answering | Arina Caraus, Alessio Buscemi, Sumit Kumar, Ion Turcanu | 2025-09-26 | Download | Large Language Models (LLMs) have shown remarkable capabilities across various fields. However, their performance in technical domains such as telecommunications remains underexplored. |
| Leveraging Wireless Sensor Networks for Real-Time Monitoring and Control of Industrial Environments | Muhammad Junaid Asif, Abdul Rehman, Asim Mehmood, Rana Fayyaz Ahmad, Shazia Saqib | 2025-09-26 | Download | This research proposes an extensive technique for monitoring and controlling the industrial parameters using Internet of Things (IoT) technology based on wireless communication. |
cs.OS - Operating Systems
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Secure and Efficient Access Control for Computer-Use Agents via Context Space | Haochen Gong, Chenxiao Li, Rui Chang, Wenbo Shen | 2025-09-26 | Download | Large language model (LLM)-based computer-use agents represent a convergence of AI and OS capabilities, enabling natural language to control system- and application-level functions. |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Tiny-QMoE | Jack Cashman, Jiaqi Nie | 2025-09-26 | Download | The QMoE model provides a practical approach for compression of massive Mixture-of-Experts (MoE) models. QMoE offers a solution geared towards memory limitations that often reach terabyte scales, and ... |
| Latency Based Tiling | Jack Cashman | 2025-09-26 | Download | Latency Based Tiling provides a systems based approach to deriving approximate tiling solution that maximizes locality while maintaining a fast compile time. |
| SAHM: State-Aware Heterogeneous Multicore for Single-Thread Performance | Shayne Wadle, Karthikeyan Sankaralingam | 2025-09-26 | Download | Improving single-thread performance remains a critical challenge in modern processor design, as conventional approaches such as deeper speculation, wider pipelines, and complex out-of-order execution ... |
| Less is More: Faster Maximum Clique Search by Work-Avoidance | Hans Vandierendonck | 2025-09-26 | Download | The maximum clique (MC) problem is a challenging graph mining problem which, due to its NP-hard nature, can take a substantial amount of execution time. |
| Light Differentiable Logic Gate Networks | Lukas Rüttgers, Till Aczel, Andreas Plesner, Roger Wattenhofer | 2025-09-26 | Download | Differentiable logic gate networks (DLGNs) exhibit extraordinary efficiency at inference while sustaining competitive accuracy. But vanishing gradients, discretization errors, and high training cost i... |