2025-06-17

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Scaling Intelligence: Designing Data Centers for Next-Gen Language Models	Jesmin Jahan Tithi, Hanjiang Wu, Avishaii Abuhatzera, Fabrizio Petrini	2025-06-17	Download	The explosive growth of Large Language Models (LLMs), such as GPT-4 with 1.8 trillion parameters, demands a fundamental rethinking of data center architecture to ensure scalability, efficiency, and co...
ASAP-FE: Energy-Efficient Feature Extraction Enabling Multi-Channel Keyword Spotting on Edge Processors	Jongin Choi, Jina Park, Woojoo Lee, Jae-Jin Lee, Massoud Pedram	2025-06-17	Download	Multi-channel keyword spotting (KWS) has become crucial for voice-based applications in edge environments. However, its substantial computational and energy requirements pose significant challenges.
Guaranteed Guess: A Language Modeling Approach for CISC-to-RISC Transpilation with Testing Guarantees	Ahmed Heakl, Sarim Hashmi, Chaimaa Abi, Celine Lee, Abdulrahman Mahmoud	2025-06-17	Download	The hardware ecosystem is rapidly evolving, with increasing interest in translating low-level programs across different instruction set architectures (ISAs) in a quick, flexible, and correct way to en...
Empirically-Calibrated H100 Node Power Models for Reducing Uncertainty in AI Training Energy Estimation	Alex C. Newkirk, Jared Fernandez, Jonathan Koomey, Imran Latif, Emma Strubell, Arman Shehabi, Constantine Samaras	2025-06-17	Download	As AI's energy demand continues to grow, it is critical to enhance the understanding of characteristics of this demand, to improve grid infrastructure planning and environmental assessment.
Tensor Manipulation Unit (TMU): Reconfigurable, Near-Memory Tensor Manipulation for High-Throughput AI SoC	Weiyu Zhou, Zheng Wang, Chao Chen, Yike Li, Yongkui Yang, Zhuoyu Wu, Anupam Chattopadhyay	2025-06-17	Download	While recent advances in AI SoC design have focused heavily on accelerating tensor computation, the equally critical task of tensor manipulation, centered on high-volume data movement with minimal com...
Comprehensive Verilog Design Problems: A Next-Generation Benchmark Dataset for Evaluating Large Language Models and Agents on RTL Design and Verification	Nathaniel Pinckney, Chenhui Deng, Chia-Tung Ho, Yun-Da Tsai, Mingjie Liu, Wenfei Zhou, Brucek Khailany, Haoxing Ren	2025-06-17	Download	We present the Comprehensive Verilog Design Problems (CVDP) benchmark, a new dataset and infrastructure to advance LLM and agent research in hardware design and verification.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Scaling Intelligence: Designing Data Centers for Next-Gen Language Models	Jesmin Jahan Tithi, Hanjiang Wu, Avishaii Abuhatzera, Fabrizio Petrini	2025-06-17	Download	The explosive growth of Large Language Models (LLMs), such as GPT-4 with 1.8 trillion parameters, demands a fundamental rethinking of data center architecture to ensure scalability, efficiency, and co...
Zarr-Based Chunk-Level Cumulative Sums in Reduced Dimensions	Hailiang Zhang, Dieu My T. Nguyen, Christine Smit, Mahabal Hegde	2025-06-17	Download	Data analysis on massive multi-dimensional data, such as high-resolution large-region time averaging or area averaging for geospatial data, often involves calculations over a significant number of dat...
Utility-Driven Speculative Decoding for Mixture-of-Experts	Anish Saxena, Po-An Tsai, Hritvik Taneja, Aamer Jaleel, Moinuddin Qureshi	2025-06-17	Download	GPU memory bandwidth is the main bottleneck for low-latency Large Language Model (LLM) inference. Speculative decoding leverages idle GPU compute by using a lightweight drafter to propose K tokens, wh...
Event-Driven Online Vertical Federated Learning	Ganyu Wang, Boyu Wang, Bin Gu, Charles Ling	2025-06-17	Download	Online learning is more adaptable to real-world scenarios in Vertical Federated Learning (VFL) compared to offline learning. However, integrating online learning into VFL presents challenges due to th...
Scalable GPU Performance Variability Analysis framework	Ankur Lahiry, Ayush Pokharel, Seth Ockerman, Amal Gueroudji, Line Pouchard, Tanzima Z. Islam	2025-06-17	Download	Analyzing large-scale performance logs from GPU profilers often requires terabytes of memory and hours of runtime, even for basic summaries. These constraints prevent timely insight and hinder the int...
Resource Optimization with MPI Process Malleability for Dynamic Workloads in HPC Clusters	Sergio Iserte, Iker Martín-Álvarez, Krzysztof Rojek, José I. Aliaga, Maribel Castillo, Weronika Folwarska, Antonio J. Peña	2025-06-17	Download	Dynamic resource management is essential for optimizing computational efficiency in modern high-performance computing (HPC) environments, particularly as systems scale.
SETI@home: Data Acquisition and Front-End Processing	Eric J. Korpela, David P. Anderson, Jeff Cobb, Matt Lebofsky, Wei Liu, Dan Werthimer	2025-06-17	Download	SETI@home is a radio Search for Extraterrestrial Intelligence (SETI) project, looking for technosignatures in data recorded at multiple observatories from 1998 to 2020.
ClusterRCA: An End-to-End Approach for Network Fault Localization and Classification for HPC System	Yongqian Sun, Xijie Pan, Xiao Xiong, Lei Tao, Jiaju Wang, Shenglin Zhang, Yuan Yuan, Yuqi Li, Kunlin Jian	2025-06-17	Download	Network failure diagnosis is challenging yet critical for high-performance computing (HPC) systems. Existing methods cannot be directly applied to HPC scenarios due to data heterogeneity and lack of a...
Keigo: Co-designing Log-Structured Merge Key-Value Stores with a Non-Volatile, Concurrency-aware Storage Hierarchy (Extended Version)	Rúben Adão, Zhongjie Wu, Changjun Zhou, Oana Balmau, João Paulo, Ricardo Macedo	2025-06-17	Download	We present Keigo, a concurrency- and workload-aware storage middleware that enhances the performance of log-structured merge key-value stores (LSM KVS) when they are deployed on a hierarchy of storage...
Concepts for designing modern C++ interfaces for MPI	C. Nicole Avans, Alfredo A. Correa, Sayan Ghosh, Matthias Schimek, Joseph Schuchart, Anthony Skjellum, Evan D. Suggs, Tim Niklas Uhl	2025-06-17	Download	Since the C++ bindings were deleted in 2008, the Message Passing Interface (MPI) community has revived efforts in building high-level modern C++ interfaces.
Consensus Power Inequality: A Comparative Study of Blockchain Networks	Kamil Tylinski, Abylay Satybaldy, Paolo Tasca	2025-06-17	Download	The distribution of consensus power is a cornerstone of decentralisation, influencing the security, resilience, and fairness of blockchain networks while ensuring equitable impact among participants.
Convergence-Privacy-Fairness Trade-Off in Personalized Federated Learning	Xiyu Zhao, Qimei Cui, Weicai Li, Wei Ni, Ekram Hossain, Quan Z. Sheng, Xiaofeng Tao, Ping Zhang	2025-06-17	Download	Personalized federated learning (PFL), e.g., the renowned Ditto, strikes a balance between personalization and generalization by conducting federated learning (FL) to guide personalized learning (PL).
A Novel Indicator for Quantifying and Minimizing Information Utility Loss of Robot Teams	Xiyu Zhao, Qimei Cui, Wei Ni, Quan Z. Sheng, Abbas Jamalipour, Guoshun Nan, Xiaofeng Tao, Ping Zhang	2025-06-17	Download	The timely exchange of information among robots within a team is vital, but it can be constrained by limited wireless capacity. The inability to deliver information promptly can result in estimation e...
The Redundancy of Full Nodes in Bitcoin: A Network-Theoretic Demonstration of Miner-Centric Propagation Topologies	Dr Craig S Wright	2025-06-17	Download	This paper formally examines the network structure of Bitcoin CORE (BTC) and Bitcoin Satoshi Vision (BSV) using complex graph theory to demonstrate that home-hosted full nodes are incapable of partici...
Agentic Plan Caching: Test-Time Memory for Fast and Cost-Efficient LLM Agents	Qizheng Zhang, Michael Wornow, Gerry Wan, Kunle Olukotun	2025-06-17	Download	LLM-based agent applications have shown increasingly remarkable capabilities in complex workflows but incur substantial costs and latency due to extensive planning and reasoning requirements.
Efficient Serving of LLM Applications with Probabilistic Demand Modeling	Yifei Liu, Zuo Gan, Zhenghao Gan, Weiye Wang, Chen Chen, Yizhou Shan, Xusheng Chen, Zhenhua Han, Yifei Zhu, Shixuan Sun, Minyi Guo	2025-06-17	Download	Applications based on Large Language Models (LLMs) contain a series of tasks to address real-world problems with boosted capability, which have dynamic demand volumes on diverse backends.
Déjà Vu: Efficient Video-Language Query Engine with Learning-based Inter-Frame Computation Reuse	Jinwoo Hwang, Daeun Kim, Sangyeop Lee, Yoonsung Kim, Guseul Heo, Hojoon Kim, Yunseok Jeong, Tadiwos Meaza, Eunhyeok Park, Jeongseob Ahn, Jongse Park	2025-06-17	Download	Recently, Video-Language Models (VideoLMs) have demonstrated remarkable capabilities, offering significant potential for flexible and powerful video query systems.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
GCN-Driven Reinforcement Learning for Probabilistic Real-Time Guarantees in Industrial URLLC	Eman Alqudah, Ashfaq Khokhar	2025-06-17	Download	Ensuring packet-level communication quality is vital for ultra-reliable, low-latency communications (URLLC) in large-scale industrial wireless networks.
CNN-Enabled Scheduling for Probabilistic Real-Time Guarantees in Industrial URLLC	Eman Alqudah, Ashfaq Khokhar	2025-06-17	Download	Ensuring packet-level communication quality is vital for ultra-reliable, low-latency communications (URLLC) in large-scale industrial wireless networks.
Determinação Automática de Limiar de Detecção de Ataques em Redes de Computadores Utilizando Autoencoders	Luan Gonçalves Miranda, Pedro Ivo da Cruz, Murilo Bellezoni Loiola	2025-06-17	Download	Currently, digital security mechanisms like Anomaly Detection Systems using Autoencoders (AE) show great potential for bypassing problems intrinsic to the data, such as data imbalance.
Vulnerability Disclosure or Notification? Best Practices for Reaching Stakeholders at Scale	Ting-Han Chen, Jeroen van der Ham-de Vos	2025-06-17	Download	Security researchers are interested in security vulnerabilities, but these security vulnerabilities create risks for stakeholders. Coordinated Vulnerability Disclosure has been an accepted best practi...
A Novel Dynamic Bandwidth Allocation Design for 100G Coherent Passive Optical Network	Rujia Zou, Haipeng Zhang, Karthik Sundaresan, Zhensheng Jia, Suresh Subramaniam	2025-06-17	Download	With the rapid advancements in coherent Passive Optical Network (PON) technologies featuring 100G and higher data rates, this paper addresses the urgent requirement for sophisticated simulation and MA...
Optimizing System Latency for Blockchain-Encrypted Edge Computing in Internet of Vehicles	Cui Zhang, Maoxin Ji, Qiong Wu, Pingyi Fan, Qiang Fan	2025-06-17	Download	As Internet of Vehicles (IoV) technology continues to advance, edge computing has become an important tool for assisting vehicles in handling complex tasks.
TraGe: A Generic Packet Representation for Traffic Classification Based on Header-Payload Differences	Chungang Lin, Yilong Jiang, Weiyao Zhang, Xuying Meng, Tianyu Zuo, Yujun Zhang	2025-06-17	Download	Traffic classification has a significant impact on maintaining the Quality of Service (QoS) of the network. Since traditional methods heavily rely on feature extraction and large scale labeled data, s...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Scaling Intelligence: Designing Data Centers for Next-Gen Language Models	Jesmin Jahan Tithi, Hanjiang Wu, Avishaii Abuhatzera, Fabrizio Petrini	2025-06-17	Download	The explosive growth of Large Language Models (LLMs), such as GPT-4 with 1.8 trillion parameters, demands a fundamental rethinking of data center architecture to ensure scalability, efficiency, and co...
Determinação Automática de Limiar de Detecção de Ataques em Redes de Computadores Utilizando Autoencoders	Luan Gonçalves Miranda, Pedro Ivo da Cruz, Murilo Bellezoni Loiola	2025-06-17	Download	Currently, digital security mechanisms like Anomaly Detection Systems using Autoencoders (AE) show great potential for bypassing problems intrinsic to the data, such as data imbalance.
Scalable GPU Performance Variability Analysis framework	Ankur Lahiry, Ayush Pokharel, Seth Ockerman, Amal Gueroudji, Line Pouchard, Tanzima Z. Islam	2025-06-17	Download	Analyzing large-scale performance logs from GPU profilers often requires terabytes of memory and hours of runtime, even for basic summaries. These constraints prevent timely insight and hinder the int...
Agentic Plan Caching: Test-Time Memory for Fast and Cost-Efficient LLM Agents	Qizheng Zhang, Michael Wornow, Gerry Wan, Kunle Olukotun	2025-06-17	Download	LLM-based agent applications have shown increasingly remarkable capabilities in complex workflows but incur substantial costs and latency due to extensive planning and reasoning requirements.