2025-04-14

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
LLM-based AI Agent for Sizing of Analog and Mixed Signal Circuit	Chang Liu, Emmanuel A. Olowe, Danial Chitnis	2025-04-14	Download	The design of Analog and Mixed-Signal (AMS) integrated circuits (ICs) often involves significant manual effort, especially during the transistor sizing process.
FPGA-Optimized Hardware Accelerator for Fast Fourier Transform and Singular Value Decomposition in AI	Hong Ding, Chia Chao Kang, SuYang Xi, Zehang Liu, Xuan Zhang, Yi Ding	2025-04-14	Download	This research introduces an FPGA-based hardware accelerator to optimize the Singular Value Decomposition (SVD) and Fast Fourier Transform (FFT) operations in AI models.
SymRTLO: Enhancing RTL Code Optimization with LLMs and Neuron-Inspired Symbolic Reasoning	Yiting Wang, Wanghao Ye, Ping Guo, Yexiao He, Ziyao Wang, Bowei Tian, Shwai He, Guoheng Sun, Zheyu Shen, Sihan Chen, Ankur Srivastava, Qingfu Zhang, Gang Qu, Ang Li	2025-04-14	Download	Optimizing Register Transfer Level (RTL) code is crucial for improving the power, performance, and area (PPA) of digital circuits in the early stages of synthesis.
AraOS: Analyzing the Impact of Virtual Memory Management on Vector Unit Performance	Matteo Perotti, Vincenzo Maisto, Moritz Imfeld, Nils Wistoff, Alessandro Cilardo, Luca Benini	2025-04-14	Download	Vector processor architectures offer an efficient solution for accelerating data-parallel workloads (e.g., ML, AI), reducing instruction count, and enhancing processing efficiency.
Shield Bash: Abusing Defensive Coherence State Retrieval to Break Timing Obfuscation	Kartik Ramkrishnan, Antonia Zhai, Stephen McCamant, Pen Chung Yew	2025-04-14	Download	Microarchitectural attacks are a significant concern, leading to many hardware-based defense proposals. However, different defenses target different classes of attacks, and their impact on each other ...
Graph Neural Networks Based Analog Circuit Link Prediction	Guanyuan Pan, Tiansheng Zhou, Jianxiang Zhao, Zhi Li, Yugui Lin, Bingtao Ma, Yaqi Wang, Pietro Liò, Shuai Wang	2025-04-14	Download	Circuit link prediction, which identifies missing component connections from incomplete netlists, is crucial in analog circuit design automation.
Ember: A Compiler for Efficient Embedding Operations on Decoupled Access-Execute Architectures	Marco Siracusa, Olivia Hsu, Victor Soria-Pardos, Joshua Randall, Arnaud Grasset, Eric Biscondi, Doug Joseph, Randy Allen, Fredrik Kjolstad, Miquel Moretó Planas, Adrià Armejach	2025-04-14	Download	Irregular embedding lookups are a critical bottleneck in recommender models, sparse large language models, and graph learning models. In this paper, we first demonstrate that, by offloading these look...
Carbon-Efficient 3D DNN Acceleration: Optimizing Performance and Sustainability	Aikaterini Maria Panteleaki, Konstantinos Balaskas, Georgios Zervakis, Hussam Amrouch, Iraklis Anagnostopoulos	2025-04-14	Download	As Deep Neural Networks (DNNs) continue to drive advancements in artificial intelligence, the design of hardware accelerators faces growing concerns over embodied carbon footprint due to complex fabri...
Understanding and Optimizing Multi-Stage AI Inference Pipelines	Abhimanyu Rajeshkumar Bambhaniya, Hanjiang Wu, Suvinay Subramanian, Sudarshan Srinivasan, Souvik Kundu, Amir Yazdanbakhsh, Midhilesh Elavazhagan, Madhu Kumar, Tushar Krishna	2025-04-14	Download	The rapid evolution of Large Language Models (LLMs) has driven the need for increasingly sophisticated inference pipelines and hardware platforms.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
PDSP-Bench: A Benchmarking System for Parallel and Distributed Stream Processing	Pratyush Agnihotri, Boris Koldehofe, Roman Heinrich, Carsten Binnig, Manisha Luthra	2025-04-14	Download	The paper introduces PDSP-Bench, a novel benchmarking system designed for a systematic understanding of performance of parallel stream processing in a distributed environment.
Container-level Energy Observability in Kubernetes Clusters	Bjorn Pijnacker, Brian Setz, Vasilios Andrikopoulos	2025-04-14	Download	Kubernetes has been for a number of years the default cloud orchestrator solution across multiple application and research domains. As such, optimizing the energy efficiency of Kubernetes-deployed wor...
Optimizing Data Distribution and Kernel Performance for Efficient Training of Chemistry Foundation Models: A Case Study with MACE	Jesun Firoz, Franco Pellegrini, Mario Geiger, Darren Hsu, Jenna A. Bilbrey, Han-Yi Chou, Maximilian Stadler, Markus Hoehnerbach, Tingyu Wang, Dejun Lin, Emine Kucukbenli, Henry W. Sprueill, Ilyes Batatia, Sotiris S. Xantheas, MalSoon Lee, Chris Mundy, Gabor Csanyi, Justin S. Smith, Ponnuswamy Sadayappan, Sutanay Choudhury	2025-04-14	Download	Chemistry Foundation Models (CFMs) that leverage Graph Neural Networks (GNNs) operating on 3D molecular graph structures are becoming indispensable tools for computational chemists and materials scien...
Overcoming Bottlenecks in Homomorphic Encryption for the 2024 Mexican Federal Election	Eric Landquist, Nimit Sawhney, Simer Sawhney	2025-04-14	Download	On June 2, 2024, Mexico held its federal elections. The majority of Mexican citizens voted in person at the polls in this historic election. For the first time though, Mexican citizens living outside ...
Load Balancing with Network Latencies via Distributed Gradient Descent	Santiago R. Balseiro, Vahab S. Mirrokni, Bartek Wydrowski	2025-04-14	Download	Motivated by the growing demand for serving large language model inference requests, we study distributed load balancing for global serving systems with network latencies.
PlantD: Performance, Latency ANalysis, and Testing for Data Pipelines -- An Open Source Measurement, Testing, and Simulation Framework	Christopher Bogart, Rajeev Chhajer, Baljit Singh, Tony Fontana, Majd Sakr	2025-04-14	Download	As the volume of data available from sensor-enabled devices such as vehicles expands, it is increasingly hard for companies to make informed decisions about the cost of capturing, processing, and stor...
A Real-Time, Auto-Regression Method for In-Situ Feature Extraction in Hydrodynamics Simulations	Kewei Yan, Yonghong Yan	2025-04-14	Download	Hydrodynamics simulations are powerful tools for studying fluid behavior under physical forces, enabling extraction of features that reveal key flow characteristics.
Silent Self-Stabilizing Ranking: Time Optimal and Space Efficient	Petra Berenbrink, Robert Elsässer, Thorsten Götte, Lukas Hintze, Dominik Kaaser	2025-04-14	Download	We present a silent, self-stabilizing ranking protocol for the population protocol model of distributed computing, where agents interact in randomly chosen pairs to solve a common task.
Satellite Federated Fine-Tuning for Foundation Models in Space Computing Power Networks	Yan Zhu, Jingyang Zhu, Ting Wang, Yuanming Shi, Chunxiao Jiang, Khaled Ben Letaief	2025-04-14	Download	Advancements in artificial intelligence (AI) and low-earth orbit (LEO) satellites have promoted the application of large remote sensing foundation models for various downstream tasks.
Optimal Graph Stretching for Distributed Averaging	Florine W. Dekker, Zekeriya Erkin, Mauro Conti	2025-04-14	Download	The performance of distributed averaging depends heavily on the underlying topology. In various fields, including compressed sensing, multi-party computation, and abstract graph theory, graphs may be ...
Bingo: Radix-based Bias Factorization for Random Walk on Dynamic Graphs	Pinhuan Wang, Chengying Huan, Zhibin Wang, Chen Tian, Yuede Ji, Hang Liu	2025-04-14	Download	Random walks are a primary means for extracting information from large-scale graphs. While most real-world graphs are inherently dynamic, state-of-the-art random walk engines failed to efficiently sup...
Dispatching Odyssey: Exploring Performance in Computing Clusters under Real-world Workloads	Mert Yildiz, Alexey Rolich, Andrea Baiocchi	2025-04-14	Download	Recent workload measurements in Google data centers provide an opportunity to challenge existing models and, more broadly, to enhance the understanding of dispatching policies in computing clusters.
Lightweight Trustworthy Distributed Clustering	Hongyang Li, Caesar Wu, Mohammed Chadli, Said Mammar, Pascal Bouvry	2025-04-14	Download	Ensuring data trustworthiness within individual edge nodes while facilitating collaborative data processing poses a critical challenge in edge computing systems (ECS), particularly in resource-constra...
Solvers for the Hermitian and the pseudo-Hermitian Bethe-Salpeter equation in the Yambo code: Implementation and Performance	Petru Milev, Blanca Mellado-Pinto, Muralidhar Nalabothula, Ali Esquembre Kucukalic, Fernando Alvarruiz, Enrique Ramos, Francesco Filippone, Alejandro Molina-Sanchez, Ludger Wirtz, Jose E. Roman, Davide Sangalli	2025-04-14	Download	We analyze the performance of two strategies in solving the structured eigenvalue problem deriving from the Bethe-Salpeter equation (BSE) in condensed matter physics.
Training LLMs on HPC Systems: Best Practices from the OpenGPT-X Project	Carolin Penke, Chelsea Maria John, Jan Ebert, Stefan Kesselheim, Andreas Herten	2025-04-14	Download	The training of large language models (LLMs) requires substantial computational resources, complex software stacks, and carefully designed workflows to achieve scalability and efficiency.
COUNTER: Cluster GCN based Energy Efficient Resource Management for Sustainable Cloud Computing Environments	Han Wang, Sukhpal Singh Gill, Steve Uhlig	2025-04-14	Download	Cloud computing, thanks to the pervasiveness of information technologies, provides a foundational environment for developing IT applications, offering organizations virtually unlimited and flexible co...
FTHP-MPI: Towards Providing Replication-based Fault Tolerance in a Fault-Intolerant Native MPI Library	Sarthak Joshi, Sathish Vadhiyar	2025-04-14	Download	Faults in high-performance systems are expected to be very large in the current exascale computing era. To compensate for a higher failure rate, the standard checkpoint/restart technique would need to...
DeepCompile: A Compiler-Driven Approach to Optimizing Distributed Deep Learning Training	Masahiro Tanaka, Du Li, Umesh Chand, Ali Zafar, Haiying Shen, Olatunji Ruwase	2025-04-14	Download	The rapid growth of deep learning models has increased the demand for efficient distributed training strategies. Fully sharded approaches like ZeRO-3 and FSDP partition model parameters across GPUs an...
MegaScale-Data: Scaling Dataloader for Multisource Large Foundation Model Training	Juntao Zhao, Qi Lu, Wei Jia, Borui Wan, Lei Zuo, Junda Feng, Jianyu Jiang, Yangrui Chen, Shuaishuai Cao, Jialing He, Kaihua Jiang, Yuanzhe Hu, Shibiao Nong, Yanghua Peng, Haibin Lin, Chuan Wu	2025-04-14	Download	Modern frameworks for training large foundation models (LFMs) employ dataloaders in a data-parallel manner, with each loader processing a disjoint subset of training data.
You can lie but not deny: SWMR registers with signature properties in systems with Byzantine processes	Xing Hu, Sam Toueg	2025-04-14	Download	We define and show how to implement SWMR registers that provide properties of unforgeable digital signatures - without actually using such signatures - in systems with Byzantine processes.
Understanding and Optimizing Multi-Stage AI Inference Pipelines	Abhimanyu Rajeshkumar Bambhaniya, Hanjiang Wu, Suvinay Subramanian, Sudarshan Srinivasan, Souvik Kundu, Amir Yazdanbakhsh, Midhilesh Elavazhagan, Madhu Kumar, Tushar Krishna	2025-04-14	Download	The rapid evolution of Large Language Models (LLMs) has driven the need for increasingly sophisticated inference pipelines and hardware platforms.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Introducing Large Language Models as the Next Challenging Internet Traffic Source	Nataliia Koneva, Alejandro Leonardo García Navarro, Alfonso Sánchez-Macián, José Alberto Hernández, Moshe Zukerman, Óscar González de Dios	2025-04-14	Download	This article explores the growing impact of large language models (LLMs) and Generative AI (GenAI) tools on Internet traffic, focusing on their role as a new and significant source of network load.
Satellite Federated Fine-Tuning for Foundation Models in Space Computing Power Networks	Yan Zhu, Jingyang Zhu, Ting Wang, Yuanming Shi, Chunxiao Jiang, Khaled Ben Letaief	2025-04-14	Download	Advancements in artificial intelligence (AI) and low-earth orbit (LEO) satellites have promoted the application of large remote sensing foundation models for various downstream tasks.
Staggering and Fragmentation for Improved Large Message Handling in libp2p GossipSub	Muhammad Umar Farooq, Tanguy Cizain, Daniel Kaiser	2025-04-14	Download	The libp2p GossipSub protocol leverages a full-message mesh with a lower node degree and a more densely connected metadata-only (gossip) mesh.
IRR-Based AS Type of Relationship Inference	Amit Zulan, Omer Miron, Tal Shapira, Yuval Shavitt	2025-04-14	Download	The Internet comprises tens of thousands of autonomous systems (ASes) whose commercial relationships are not publicly announced. The classification of the Type of Relationship (ToR) between ASes has b...
Implementation and Performance Evaluation of TCP over QUIC Tunnels	Xuanhong Guo, Zekun Bao, Ying Chen	2025-04-14	Download	QUIC, a UDP-based transport protocol, addresses several limitations of TCP by offering built-in encryption, stream multiplexing, and improved loss recovery.
Vermilion: A Traffic-Aware Reconfigurable Optical Interconnect with Formal Throughput Guarantees	Vamsi Addanki, Chen Avin, Goran Dario Knabe, Giannis Patronas, Dimitris Syrivelis, Nikos Terzenidis, Paraskevas Bakopoulos, Ilias Marinos, Stefan Schmid	2025-04-14	Download	The increasing gap between datacenter traffic volume and the capacity of electrical switches has driven the development of reconfigurable network designs utilizing optical circuit switching.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
PlantD: Performance, Latency ANalysis, and Testing for Data Pipelines -- An Open Source Measurement, Testing, and Simulation Framework	Christopher Bogart, Rajeev Chhajer, Baljit Singh, Tony Fontana, Majd Sakr	2025-04-14	Download	As the volume of data available from sensor-enabled devices such as vehicles expands, it is increasingly hard for companies to make informed decisions about the cost of capturing, processing, and stor...
Improving Upon the generalized c-mu rule: a Whittle approach	Zhouzi Li, Keerthana Gurushankar, Mor Harchol-Balter, Alan Scheller-Wolf	2025-04-14	Download	Scheduling a stream of jobs whose holding cost changes over time is a classic and practical problem. Specifically, each job is associated with a holding cost (penalty), where a job's instantaneous hol...
