2025-04-14
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| LLM-based AI Agent for Sizing of Analog and Mixed Signal Circuit | Chang Liu, Emmanuel A. Olowe, Danial Chitnis | 2025-04-14 | Download | The design of Analog and Mixed-Signal (AMS) integrated circuits (ICs) often involves significant manual effort, especially during the transistor sizing process. |
| FPGA-Optimized Hardware Accelerator for Fast Fourier Transform and Singular Value Decomposition in AI | Hong Ding, Chia Chao Kang, SuYang Xi, Zehang Liu, Xuan Zhang, Yi Ding | 2025-04-14 | Download | This research introduces an FPGA-based hardware accelerator to optimize the Singular Value Decomposition (SVD) and Fast Fourier transform (FFT) operations in AI models. |
| SymRTLO: Enhancing RTL Code Optimization with LLMs and Neuron-Inspired Symbolic Reasoning | Yiting Wang, Wanghao Ye, Ping Guo, Yexiao He, Ziyao Wang, Bowei Tian, Shwai He, Guoheng Sun, Zheyu Shen, Sihan Chen, Ankur Srivastava, Qingfu Zhang, Gang Qu, Ang Li | 2025-04-14 | Download | Optimizing Register Transfer Level (RTL) code is crucial for improving the power, performance, and area (PPA) of digital circuits in the early stages of synthesis. |
| AraOS: Analyzing the Impact of Virtual Memory Management on Vector Unit Performance | Matteo Perotti, Vincenzo Maisto, Moritz Imfeld, Nils Wistoff, Alessandro Cilardo, Luca Benini | 2025-04-14 | Download | Vector processor architectures offer an efficient solution for accelerating data-parallel workloads (e.g., ML, AI), reducing instruction count, and enhancing processing efficiency. |
| Shield Bash: Abusing Defensive Coherence State Retrieval to Break Timing Obfuscation | Kartik Ramkrishnan, Antonia Zhai, Stephen McCamant, Pen Chung Yew | 2025-04-14 | Download | Microarchitectural attacks are a significant concern, leading to many hardware-based defense proposals. However, different defenses target different classes of attacks, and their impact on each other ... |
| Graph Neural Networks Based Analog Circuit Link Prediction | Guanyuan Pan, Tiansheng Zhou, Jianxiang Zhao, Zhi Li, Yugui Lin, Bingtao Ma, Yaqi Wang, Pietro Liò, Shuai Wang | 2025-04-14 | Download | Circuit link prediction, which identifies missing component connections from incomplete netlists, is crucial in analog circuit design automation. |
| Ember: A Compiler for Efficient Embedding Operations on Decoupled Access-Execute Architectures | Marco Siracusa, Olivia Hsu, Victor Soria-Pardos, Joshua Randall, Arnaud Grasset, Eric Biscondi, Doug Joseph, Randy Allen, Fredrik Kjolstad, Miquel Moretó Planas, Adrià Armejach | 2025-04-14 | Download | Irregular embedding lookups are a critical bottleneck in recommender models, sparse large language models, and graph learning models. In this paper, we first demonstrate that, by offloading these look... |
| Carbon-Efficient 3D DNN Acceleration: Optimizing Performance and Sustainability | Aikaterini Maria Panteleaki, Konstantinos Balaskas, Georgios Zervakis, Hussam Amrouch, Iraklis Anagnostopoulos | 2025-04-14 | Download | As Deep Neural Networks (DNNs) continue to drive advancements in artificial intelligence, the design of hardware accelerators faces growing concerns over embodied carbon footprint due to complex fabri... |
| Understanding and Optimizing Multi-Stage AI Inference Pipelines | Abhimanyu Rajeshkumar Bambhaniya, Hanjiang Wu, Suvinay Subramanian, Sudarshan Srinivasan, Souvik Kundu, Amir Yazdanbakhsh, Midhilesh Elavazhagan, Madhu Kumar, Tushar Krishna | 2025-04-14 | Download | The rapid evolution of Large Language Models (LLMs) has driven the need for increasingly sophisticated inference pipelines and hardware platforms. |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| PDSP-Bench: A Benchmarking System for Parallel and Distributed Stream Processing | Pratyush Agnihotri, Boris Koldehofe, Roman Heinrich, Carsten Binnig, Manisha Luthra | 2025-04-14 | Download | The paper introduces PDSP-Bench, a novel benchmarking system designed for a systematic understanding of performance of parallel stream processing in a distributed environment. |
| Container-level Energy Observability in Kubernetes Clusters | Bjorn Pijnacker, Brian Setz, Vasilios Andrikopoulos | 2025-04-14 | Download | Kubernetes has been for a number of years the default cloud orchestrator solution across multiple application and research domains. As such, optimizing the energy efficiency of Kubernetes-deployed wor... |
| Optimizing Data Distribution and Kernel Performance for Efficient Training of Chemistry Foundation Models: A Case Study with MACE | Jesun Firoz, Franco Pellegrini, Mario Geiger, Darren Hsu, Jenna A. Bilbrey, Han-Yi Chou, Maximilian Stadler, Markus Hoehnerbach, Tingyu Wang, Dejun Lin, Emine Kucukbenli, Henry W. Sprueill, Ilyes Batatia, Sotiris S. Xantheas, MalSoon Lee, Chris Mundy, Gabor Csanyi, Justin S. Smith, Ponnuswamy Sadayappan, Sutanay Choudhury | 2025-04-14 | Download | Chemistry Foundation Models (CFMs) that leverage Graph Neural Networks (GNNs) operating on 3D molecular graph structures are becoming indispensable tools for computational chemists and materials scien... |
| Overcoming Bottlenecks in Homomorphic Encryption for the 2024 Mexican Federal Election | Eric Landquist, Nimit Sawhney, Simer Sawhney | 2025-04-14 | Download | On June 2, 2024, Mexico held its federal elections. The majority of Mexican citizens voted in person at the polls in this historic election. For the first time though, Mexican citizens living outside ... |
| Load Balancing with Network Latencies via Distributed Gradient Descent | Santiago R. Balseiro, Vahab S. Mirrokni, Bartek Wydrowski | 2025-04-14 | Download | Motivated by the growing demand for serving large language model inference requests, we study distributed load balancing for global serving systems with network latencies. |
| PlantD: Performance, Latency ANalysis, and Testing for Data Pipelines -- An Open Source Measurement, Testing, and Simulation Framework | Christopher Bogart, Rajeev Chhajer, Baljit Singh, Tony Fontana, Majd Sakr | 2025-04-14 | Download | As the volume of data available from sensor-enabled devices such as vehicles expands, it is increasingly hard for companies to make informed decisions about the cost of capturing, processing, and stor... |
| A Real-Time, Auto-Regression Method for In-Situ Feature Extraction in Hydrodynamics Simulations | Kewei Yan, Yonghong Yan | 2025-04-14 | Download | Hydrodynamics simulations are powerful tools for studying fluid behavior under physical forces, enabling extraction of features that reveal key flow characteristics. |
| Silent Self-Stabilizing Ranking: Time Optimal and Space Efficient | Petra Berenbrink, Robert Elsässer, Thorsten Götte, Lukas Hintze, Dominik Kaaser | 2025-04-14 | Download | We present a silent, self-stabilizing ranking protocol for the population protocol model of distributed computing, where agents interact in randomly chosen pairs to solve a common task. |
| Satellite Federated Fine-Tuning for Foundation Models in Space Computing Power Networks | Yan Zhu, Jingyang Zhu, Ting Wang, Yuanming Shi, Chunxiao Jiang, Khaled Ben Letaief | 2025-04-14 | Download | Advancements in artificial intelligence (AI) and low-earth orbit (LEO) satellites have promoted the application of large remote sensing foundation models for various downstream tasks. |
| Optimal Graph Stretching for Distributed Averaging | Florine W. Dekker, Zekeriya Erkin, Mauro Conti | 2025-04-14 | Download | The performance of distributed averaging depends heavily on the underlying topology. In various fields, including compressed sensing, multi-party computation, and abstract graph theory, graphs may be ... |
| Bingo: Radix-based Bias Factorization for Random Walk on Dynamic Graphs | Pinhuan Wang, Chengying Huan, Zhibin Wang, Chen Tian, Yuede Ji, Hang Liu | 2025-04-14 | Download | Random walks are a primary means for extracting information from large-scale graphs. While most real-world graphs are inherently dynamic, state-of-the-art random walk engines failed to efficiently sup... |
| Dispatching Odyssey: Exploring Performance in Computing Clusters under Real-world Workloads | Mert Yildiz, Alexey Rolich, Andrea Baiocchi | 2025-04-14 | Download | Recent workload measurements in Google data centers provide an opportunity to challenge existing models and, more broadly, to enhance the understanding of dispatching policies in computing clusters. |
| Lightweight Trustworthy Distributed Clustering | Hongyang Li, Caesar Wu, Mohammed Chadli, Said Mammar, Pascal Bouvry | 2025-04-14 | Download | Ensuring data trustworthiness within individual edge nodes while facilitating collaborative data processing poses a critical challenge in edge computing systems (ECS), particularly in resource-constra... |
| Solvers for the Hermitian and the pseudo-Hermitian Bethe-Salpeter equation in the Yambo code: Implementation and Performance | Petru Milev, Blanca Mellado-Pinto, Muralidhar Nalabothula, Ali Esquembre Kucukalic, Fernando Alvarruiz, Enrique Ramos, Francesco Filippone, Alejandro Molina-Sanchez, Ludger Wirtz, Jose E. Roman, Davide Sangalli | 2025-04-14 | Download | We analyze the performance of two strategies in solving the structured eigenvalue problem deriving from the Bethe-Salpeter equation (BSE) in condensed matter physics. |
| Training LLMs on HPC Systems: Best Practices from the OpenGPT-X Project | Carolin Penke, Chelsea Maria John, Jan Ebert, Stefan Kesselheim, Andreas Herten | 2025-04-14 | Download | The training of large language models (LLMs) requires substantial computational resources, complex software stacks, and carefully designed workflows to achieve scalability and efficiency. |
| COUNTER: Cluster GCN based Energy Efficient Resource Management for Sustainable Cloud Computing Environments | Han Wang, Sukhpal Singh Gill, Steve Uhlig | 2025-04-14 | Download | Cloud computing, thanks to the pervasiveness of information technologies, provides a foundational environment for developing IT applications, offering organizations virtually unlimited and flexible co... |
| FTHP-MPI: Towards Providing Replication-based Fault Tolerance in a Fault-Intolerant Native MPI Library | Sarthak Joshi, Sathish Vadhiyar | 2025-04-14 | Download | Faults in high-performance systems are expected to be very large in the current exascale computing era. To compensate for a higher failure rate, the standard checkpoint/restart technique would need to... |
| DeepCompile: A Compiler-Driven Approach to Optimizing Distributed Deep Learning Training | Masahiro Tanaka, Du Li, Umesh Chand, Ali Zafar, Haiying Shen, Olatunji Ruwase | 2025-04-14 | Download | The rapid growth of deep learning models has increased the demand for efficient distributed training strategies. Fully sharded approaches like ZeRO-3 and FSDP partition model parameters across GPUs an... |
| MegaScale-Data: Scaling Dataloader for Multisource Large Foundation Model Training | Juntao Zhao, Qi Lu, Wei Jia, Borui Wan, Lei Zuo, Junda Feng, Jianyu Jiang, Yangrui Chen, Shuaishuai Cao, Jialing He, Kaihua Jiang, Yuanzhe Hu, Shibiao Nong, Yanghua Peng, Haibin Lin, Chuan Wu | 2025-04-14 | Download | Modern frameworks for training large foundation models (LFMs) employ dataloaders in a data-parallel manner, with each loader processing a disjoint subset of training data. |
| You can lie but not deny: SWMR registers with signature properties in systems with Byzantine processes | Xing Hu, Sam Toueg | 2025-04-14 | Download | We define and show how to implement SWMR registers that provide properties of unforgeable digital signatures - without actually using such signatures - in systems with Byzantine processes. |
| Understanding and Optimizing Multi-Stage AI Inference Pipelines | Abhimanyu Rajeshkumar Bambhaniya, Hanjiang Wu, Suvinay Subramanian, Sudarshan Srinivasan, Souvik Kundu, Amir Yazdanbakhsh, Midhilesh Elavazhagan, Madhu Kumar, Tushar Krishna | 2025-04-14 | Download | The rapid evolution of Large Language Models (LLMs) has driven the need for increasingly sophisticated inference pipelines and hardware platforms. |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Introducing Large Language Models as the Next Challenging Internet Traffic Source | Nataliia Koneva, Alejandro Leonardo García Navarro, Alfonso Sánchez-Macián, José Alberto Hernández, Moshe Zukerman, Óscar González de Dios | 2025-04-14 | Download | This article explores the growing impact of large language models (LLMs) and Generative AI (GenAI) tools on Internet traffic, focusing on their role as a new and significant source of network load. |
| Satellite Federated Fine-Tuning for Foundation Models in Space Computing Power Networks | Yan Zhu, Jingyang Zhu, Ting Wang, Yuanming Shi, Chunxiao Jiang, Khaled Ben Letaief | 2025-04-14 | Download | Advancements in artificial intelligence (AI) and low-earth orbit (LEO) satellites have promoted the application of large remote sensing foundation models for various downstream tasks. |
| Staggering and Fragmentation for Improved Large Message Handling in libp2p GossipSub | Muhammad Umar Farooq, Tanguy Cizain, Daniel Kaiser | 2025-04-14 | Download | The libp2p GossipSub protocol leverages a full-message mesh with a lower node degree and a more densely connected metadata-only (gossip) mesh. |
| IRR-Based AS Type of Relationship Inference | Amit Zulan, Omer Miron, Tal Shapira, Yuval Shavitt | 2025-04-14 | Download | The Internet comprises tens of thousands of autonomous systems (ASes) whose commercial relationships are not publicly announced. The classification of the Type of Relationship (ToR) between ASes has b... |
| Implementation and Performance Evaluation of TCP over QUIC Tunnels | Xuanhong Guo, Zekun Bao, Ying Chen | 2025-04-14 | Download | QUIC, a UDP-based transport protocol, addresses several limitations of TCP by offering built-in encryption, stream multiplexing, and improved loss recovery. |
| Vermilion: A Traffic-Aware Reconfigurable Optical Interconnect with Formal Throughput Guarantees | Vamsi Addanki, Chen Avin, Goran Dario Knabe, Giannis Patronas, Dimitris Syrivelis, Nikos Terzenidis, Paraskevas Bakopoulos, Ilias Marinos, Stefan Schmid | 2025-04-14 | Download | The increasing gap between datacenter traffic volume and the capacity of electrical switches has driven the development of reconfigurable network designs utilizing optical circuit switching. |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| PlantD: Performance, Latency ANalysis, and Testing for Data Pipelines -- An Open Source Measurement, Testing, and Simulation Framework | Christopher Bogart, Rajeev Chhajer, Baljit Singh, Tony Fontana, Majd Sakr | 2025-04-14 | Download | As the volume of data available from sensor-enabled devices such as vehicles expands, it is increasingly hard for companies to make informed decisions about the cost of capturing, processing, and stor... |
| Improving Upon the generalized c-mu rule: a Whittle approach | Zhouzi Li, Keerthana Gurushankar, Mor Harchol-Balter, Alan Scheller-Wolf | 2025-04-14 | Download | Scheduling a stream of jobs whose holding cost changes over time is a classic and practical problem. Specifically, each job is associated with a holding cost (penalty), where a job's instantaneous hol... |