2025-04-14

cs.AR - Architecture

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| LLM-based AI Agent for Sizing of Analog and Mixed Signal Circuit | Chang Liu, Emmanuel A. Olowe, Danial Chitnis | 2025-04-14 | The design of Analog and Mixed-Signal (AMS) integrated circuits (ICs) often involves significant manual effort, especially during the transistor sizing process. |
| FPGA-Optimized Hardware Accelerator for Fast Fourier Transform and Singular Value Decomposition in AI | Hong Ding, Chia Chao Kang, SuYang Xi, Zehang Liu, Xuan Zhang, Yi Ding | 2025-04-14 | This research introduces an FPGA-based hardware accelerator to optimize the Singular Value Decomposition (SVD) and Fast Fourier transform (FFT) operations in AI models. |
| SymRTLO: Enhancing RTL Code Optimization with LLMs and Neuron-Inspired Symbolic Reasoning | Yiting Wang, Wanghao Ye, Ping Guo, Yexiao He, Ziyao Wang, Bowei Tian, Shwai He, Guoheng Sun, Zheyu Shen, Sihan Chen, Ankur Srivastava, Qingfu Zhang, Gang Qu, Ang Li | 2025-04-14 | Optimizing Register Transfer Level (RTL) code is crucial for improving the power, performance, and area (PPA) of digital circuits in the early stages of synthesis. |
| AraOS: Analyzing the Impact of Virtual Memory Management on Vector Unit Performance | Matteo Perotti, Vincenzo Maisto, Moritz Imfeld, Nils Wistoff, Alessandro Cilardo, Luca Benini | 2025-04-14 | Vector processor architectures offer an efficient solution for accelerating data-parallel workloads (e.g., ML, AI), reducing instruction count, and enhancing processing efficiency. |
| Shield Bash: Abusing Defensive Coherence State Retrieval to Break Timing Obfuscation | Kartik Ramkrishnan, Antonia Zhai, Stephen McCamant, Pen Chung Yew | 2025-04-14 | Microarchitectural attacks are a significant concern, leading to many hardware-based defense proposals. However, different defenses target different classes of attacks, and their impact on each other ... |
| Graph Neural Networks Based Analog Circuit Link Prediction | Guanyuan Pan, Tiansheng Zhou, Jianxiang Zhao, Zhi Li, Yugui Lin, Bingtao Ma, Yaqi Wang, Pietro Liò, Shuai Wang | 2025-04-14 | Circuit link prediction, which identifies missing component connections from incomplete netlists, is crucial in analog circuit design automation. |
| Ember: A Compiler for Efficient Embedding Operations on Decoupled Access-Execute Architectures | Marco Siracusa, Olivia Hsu, Victor Soria-Pardos, Joshua Randall, Arnaud Grasset, Eric Biscondi, Doug Joseph, Randy Allen, Fredrik Kjolstad, Miquel Moretó Planas, Adrià Armejach | 2025-04-14 | Irregular embedding lookups are a critical bottleneck in recommender models, sparse large language models, and graph learning models. In this paper, we first demonstrate that, by offloading these look... |
| Carbon-Efficient 3D DNN Acceleration: Optimizing Performance and Sustainability | Aikaterini Maria Panteleaki, Konstantinos Balaskas, Georgios Zervakis, Hussam Amrouch, Iraklis Anagnostopoulos | 2025-04-14 | As Deep Neural Networks (DNNs) continue to drive advancements in artificial intelligence, the design of hardware accelerators faces growing concerns over embodied carbon footprint due to complex fabri... |
| Understanding and Optimizing Multi-Stage AI Inference Pipelines | Abhimanyu Rajeshkumar Bambhaniya, Hanjiang Wu, Suvinay Subramanian, Sudarshan Srinivasan, Souvik Kundu, Amir Yazdanbakhsh, Midhilesh Elavazhagan, Madhu Kumar, Tushar Krishna | 2025-04-14 | The rapid evolution of Large Language Models (LLMs) has driven the need for increasingly sophisticated inference pipelines and hardware platforms. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| PDSP-Bench: A Benchmarking System for Parallel and Distributed Stream Processing | Pratyush Agnihotri, Boris Koldehofe, Roman Heinrich, Carsten Binnig, Manisha Luthra | 2025-04-14 | The paper introduces PDSP-Bench, a novel benchmarking system designed for a systematic understanding of performance of parallel stream processing in a distributed environment. |
| Container-level Energy Observability in Kubernetes Clusters | Bjorn Pijnacker, Brian Setz, Vasilios Andrikopoulos | 2025-04-14 | Kubernetes has been for a number of years the default cloud orchestrator solution across multiple application and research domains. As such, optimizing the energy efficiency of Kubernetes-deployed wor... |
| Optimizing Data Distribution and Kernel Performance for Efficient Training of Chemistry Foundation Models: A Case Study with MACE | Jesun Firoz, Franco Pellegrini, Mario Geiger, Darren Hsu, Jenna A. Bilbrey, Han-Yi Chou, Maximilian Stadler, Markus Hoehnerbach, Tingyu Wang, Dejun Lin, Emine Kucukbenli, Henry W. Sprueill, Ilyes Batatia, Sotiris S. Xantheas, MalSoon Lee, Chris Mundy, Gabor Csanyi, Justin S. Smith, Ponnuswamy Sadayappan, Sutanay Choudhury | 2025-04-14 | Chemistry Foundation Models (CFMs) that leverage Graph Neural Networks (GNNs) operating on 3D molecular graph structures are becoming indispensable tools for computational chemists and materials scien... |
| Overcoming Bottlenecks in Homomorphic Encryption for the 2024 Mexican Federal Election | Eric Landquist, Nimit Sawhney, Simer Sawhney | 2025-04-14 | On June 2, 2024, Mexico held its federal elections. The majority of Mexican citizens voted in person at the polls in this historic election. For the first time though, Mexican citizens living outside ... |
| Load Balancing with Network Latencies via Distributed Gradient Descent | Santiago R. Balseiro, Vahab S. Mirrokni, Bartek Wydrowski | 2025-04-14 | Motivated by the growing demand for serving large language model inference requests, we study distributed load balancing for global serving systems with network latencies. |
| PlantD: Performance, Latency ANalysis, and Testing for Data Pipelines -- An Open Source Measurement, Testing, and Simulation Framework | Christopher Bogart, Rajeev Chhajer, Baljit Singh, Tony Fontana, Majd Sakr | 2025-04-14 | As the volume of data available from sensor-enabled devices such as vehicles expands, it is increasingly hard for companies to make informed decisions about the cost of capturing, processing, and stor... |
| A Real-Time, Auto-Regression Method for In-Situ Feature Extraction in Hydrodynamics Simulations | Kewei Yan, Yonghong Yan | 2025-04-14 | Hydrodynamics simulations are powerful tools for studying fluid behavior under physical forces, enabling extraction of features that reveal key flow characteristics. |
| Silent Self-Stabilizing Ranking: Time Optimal and Space Efficient | Petra Berenbrink, Robert Elsässer, Thorsten Götte, Lukas Hintze, Dominik Kaaser | 2025-04-14 | We present a silent, self-stabilizing ranking protocol for the population protocol model of distributed computing, where agents interact in randomly chosen pairs to solve a common task. |
| Satellite Federated Fine-Tuning for Foundation Models in Space Computing Power Networks | Yan Zhu, Jingyang Zhu, Ting Wang, Yuanming Shi, Chunxiao Jiang, Khaled Ben Letaief | 2025-04-14 | Advancements in artificial intelligence (AI) and low-earth orbit (LEO) satellites have promoted the application of large remote sensing foundation models for various downstream tasks. |
| Optimal Graph Stretching for Distributed Averaging | Florine W. Dekker, Zekeriya Erkin, Mauro Conti | 2025-04-14 | The performance of distributed averaging depends heavily on the underlying topology. In various fields, including compressed sensing, multi-party computation, and abstract graph theory, graphs may be ... |
| Bingo: Radix-based Bias Factorization for Random Walk on Dynamic Graphs | Pinhuan Wang, Chengying Huan, Zhibin Wang, Chen Tian, Yuede Ji, Hang Liu | 2025-04-14 | Random walks are a primary means for extracting information from large-scale graphs. While most real-world graphs are inherently dynamic, state-of-the-art random walk engines failed to efficiently sup... |
| Dispatching Odyssey: Exploring Performance in Computing Clusters under Real-world Workloads | Mert Yildiz, Alexey Rolich, Andrea Baiocchi | 2025-04-14 | Recent workload measurements in Google data centers provide an opportunity to challenge existing models and, more broadly, to enhance the understanding of dispatching policies in computing clusters. |
| Lightweight Trustworthy Distributed Clustering | Hongyang Li, Caesar Wu, Mohammed Chadli, Said Mammar, Pascal Bouvry | 2025-04-14 | Ensuring data trustworthiness within individual edge nodes while facilitating collaborative data processing poses a critical challenge in edge computing systems (ECS), particularly in resource-constra... |
| Solvers for the Hermitian and the pseudo-Hermitian Bethe-Salpeter equation in the Yambo code: Implementation and Performance | Petru Milev, Blanca Mellado-Pinto, Muralidhar Nalabothula, Ali Esquembre Kucukalic, Fernando Alvarruiz, Enrique Ramos, Francesco Filippone, Alejandro Molina-Sanchez, Ludger Wirtz, Jose E. Roman, Davide Sangalli | 2025-04-14 | We analyze the performance of two strategies in solving the structured eigenvalue problem deriving from the Bethe-Salpeter equation (BSE) in condensed matter physics. |
| Training LLMs on HPC Systems: Best Practices from the OpenGPT-X Project | Carolin Penke, Chelsea Maria John, Jan Ebert, Stefan Kesselheim, Andreas Herten | 2025-04-14 | The training of large language models (LLMs) requires substantial computational resources, complex software stacks, and carefully designed workflows to achieve scalability and efficiency. |
| COUNTER: Cluster GCN based Energy Efficient Resource Management for Sustainable Cloud Computing Environments | Han Wang, Sukhpal Singh Gill, Steve Uhlig | 2025-04-14 | Cloud computing, thanks to the pervasiveness of information technologies, provides a foundational environment for developing IT applications, offering organizations virtually unlimited and flexible co... |
| FTHP-MPI: Towards Providing Replication-based Fault Tolerance in a Fault-Intolerant Native MPI Library | Sarthak Joshi, Sathish Vadhiyar | 2025-04-14 | Faults in high-performance systems are expected to be very large in the current exascale computing era. To compensate for a higher failure rate, the standard checkpoint/restart technique would need to... |
| DeepCompile: A Compiler-Driven Approach to Optimizing Distributed Deep Learning Training | Masahiro Tanaka, Du Li, Umesh Chand, Ali Zafar, Haiying Shen, Olatunji Ruwase | 2025-04-14 | The rapid growth of deep learning models has increased the demand for efficient distributed training strategies. Fully sharded approaches like ZeRO-3 and FSDP partition model parameters across GPUs an... |
| MegaScale-Data: Scaling Dataloader for Multisource Large Foundation Model Training | Juntao Zhao, Qi Lu, Wei Jia, Borui Wan, Lei Zuo, Junda Feng, Jianyu Jiang, Yangrui Chen, Shuaishuai Cao, Jialing He, Kaihua Jiang, Yuanzhe Hu, Shibiao Nong, Yanghua Peng, Haibin Lin, Chuan Wu | 2025-04-14 | Modern frameworks for training large foundation models (LFMs) employ dataloaders in a data-parallel manner, with each loader processing a disjoint subset of training data. |
| You can lie but not deny: SWMR registers with signature properties in systems with Byzantine processes | Xing Hu, Sam Toueg | 2025-04-14 | We define and show how to implement SWMR registers that provide properties of unforgeable digital signatures - without actually using such signatures - in systems with Byzantine processes. |
| Understanding and Optimizing Multi-Stage AI Inference Pipelines | Abhimanyu Rajeshkumar Bambhaniya, Hanjiang Wu, Suvinay Subramanian, Sudarshan Srinivasan, Souvik Kundu, Amir Yazdanbakhsh, Midhilesh Elavazhagan, Madhu Kumar, Tushar Krishna | 2025-04-14 | The rapid evolution of Large Language Models (LLMs) has driven the need for increasingly sophisticated inference pipelines and hardware platforms. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Introducing Large Language Models as the Next Challenging Internet Traffic Source | Nataliia Koneva, Alejandro Leonardo García Navarro, Alfonso Sánchez-Macián, José Alberto Hernández, Moshe Zukerman, Óscar González de Dios | 2025-04-14 | This article explores the growing impact of large language models (LLMs) and Generative AI (GenAI) tools on Internet traffic, focusing on their role as a new and significant source of network load. |
| Satellite Federated Fine-Tuning for Foundation Models in Space Computing Power Networks | Yan Zhu, Jingyang Zhu, Ting Wang, Yuanming Shi, Chunxiao Jiang, Khaled Ben Letaief | 2025-04-14 | Advancements in artificial intelligence (AI) and low-earth orbit (LEO) satellites have promoted the application of large remote sensing foundation models for various downstream tasks. |
| Staggering and Fragmentation for Improved Large Message Handling in libp2p GossipSub | Muhammad Umar Farooq, Tanguy Cizain, Daniel Kaiser | 2025-04-14 | The libp2p GossipSub protocol leverages a full-message mesh with a lower node degree and a more densely connected metadata-only (gossip) mesh. |
| IRR-Based AS Type of Relationship Inference | Amit Zulan, Omer Miron, Tal Shapira, Yuval Shavitt | 2025-04-14 | The Internet comprises tens of thousands of autonomous systems (ASes) whose commercial relationships are not publicly announced. The classification of the Type of Relationship (ToR) between ASes has b... |
| Implementation and Performance Evaluation of TCP over QUIC Tunnels | Xuanhong Guo, Zekun Bao, Ying Chen | 2025-04-14 | QUIC, a UDP-based transport protocol, addresses several limitations of TCP by offering built-in encryption, stream multiplexing, and improved loss recovery. |
| Vermilion: A Traffic-Aware Reconfigurable Optical Interconnect with Formal Throughput Guarantees | Vamsi Addanki, Chen Avin, Goran Dario Knabe, Giannis Patronas, Dimitris Syrivelis, Nikos Terzenidis, Paraskevas Bakopoulos, Ilias Marinos, Stefan Schmid | 2025-04-14 | The increasing gap between datacenter traffic volume and the capacity of electrical switches has driven the development of reconfigurable network designs utilizing optical circuit switching. |

cs.PF - Performance

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| PlantD: Performance, Latency ANalysis, and Testing for Data Pipelines -- An Open Source Measurement, Testing, and Simulation Framework | Christopher Bogart, Rajeev Chhajer, Baljit Singh, Tony Fontana, Majd Sakr | 2025-04-14 | As the volume of data available from sensor-enabled devices such as vehicles expands, it is increasingly hard for companies to make informed decisions about the cost of capturing, processing, and stor... |
| Improving Upon the generalized c-mu rule: a Whittle approach | Zhouzi Li, Keerthana Gurushankar, Mor Harchol-Balter, Alan Scheller-Wolf | 2025-04-14 | Scheduling a stream of jobs whose holding cost changes over time is a classic and practical problem. Specifically, each job is associated with a holding cost (penalty), where a job's instantaneous hol... |