
2025-10-23

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
|---|---|---|---|---|
| FIFOAdvisor: A DSE Framework for Automated FIFO Sizing of High-Level Synthesis Designs | Stefan Abi-Karam, Rishov Sarkar, Suhail Basalama, Jason Cong, Callie Hao | 2025-10-23 | Download | Dataflow hardware designs enable efficient FPGA implementations via high-level synthesis (HLS), but correctly sizing first-in-first-out (FIFO) channel buffers remains challenging. |
| Lincoln AI Computing Survey (LAICS) and Trends | Albert Reuther, Peter Michaleas, Michael Jones, Vijay Gadepally, Jeremy Kepner | 2025-10-23 | Download | In the past year, generative AI (GenAI) models have received a tremendous amount of attention, which in turn has increased attention to computing systems for training and inference for GenAI. |
| Hardware-Aware DNN Compression for Homogeneous Edge Devices | Kunlong Zhang, Guiying Li, Ning Lu, Peng Yang, Ke Tang | 2025-10-23 | Download | Deploying deep neural networks (DNNs) across homogeneous edge devices (the devices with the same SKU labeled by the manufacturer) often assumes identical performance among them. |
| Squire: A General-Purpose Accelerator to Exploit Fine-Grain Parallelism on Dependency-Bound Kernels | Rubén Langarita, Jesús Alastruey-Benedé, Pablo Ibáñez-Marín, Santiago Marco-Sola, Miquel Moretó, Adrià Armejach | 2025-10-23 | Download | Multiple HPC applications are often bottlenecked by compute-intensive kernels implementing complex dependency patterns (data-dependency bound). |
| In-DRAM True Random Number Generation Using Simultaneous Multiple-Row Activation: An Experimental Study of Real DRAM Chips | Ismail Emir Yuksel, Ataberk Olgun, F. Nisa Bostanci, Oguzhan Canpolat, Geraldo F. Oliveira, Mohammad Sadrosadati, Abdullah Giray Yaglikci, Onur Mutlu | 2025-10-23 | Download | In this work, we experimentally demonstrate that it is possible to generate true random numbers at high throughput and low latency in commercial off-the-shelf (COTS) DRAM chips by leveraging simultane... |
| HALOC-AxA: An Area/-Energy-Efficient Approximate Adder for Image Processing Application | Hasnain A. Ziad, Ashiq A. Sakib | 2025-10-23 | Download | The design of approximate adders has been widely researched to advance energy-efficient hardware for computation-intensive multimedia applications, such as image, audio, or video processing. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
|---|---|---|---|---|
| xMem: A CPU-Based Approach for Accurate Estimation of GPU Memory in Deep Learning Training Workloads | Jiabo Shi, Dimitrios Pezaros, Yehia Elkhatib | 2025-10-23 | Download | The global scarcity of GPUs necessitates more sophisticated strategies for Deep Learning jobs in shared cluster environments. Accurate estimation of how much GPU memory a job will require is fundament... |
| CudaForge: An Agent Framework with Hardware Feedback for CUDA Kernel Optimization | Zijian Zhang, Rong Wang, Shiyang Li, Yuebo Luo, Mingyi Hong, Caiwen Ding | 2025-10-23 | Download | Developing efficient CUDA kernels is increasingly critical for AI applications such as large-scale LLM training. However, manual kernel design is both costly and time-consuming, motivating automatic a... |
| JSTprove: Pioneering Verifiable AI for a Trustless Future | Jonathan Gold, Tristan Freiberg, Haruna Isah, Shirin Shahabi | 2025-10-23 | Download | The integration of machine learning (ML) systems into critical industries such as healthcare, finance, and cybersecurity has transformed decision-making processes, but it also brings new challenges ar... |
| Lincoln AI Computing Survey (LAICS) and Trends | Albert Reuther, Peter Michaleas, Michael Jones, Vijay Gadepally, Jeremy Kepner | 2025-10-23 | Download | In the past year, generative AI (GenAI) models have received a tremendous amount of attention, which in turn has increased attention to computing systems for training and inference for GenAI. |
| Decentralized Exchange that Mitigate a Bribery Attack | Nitin Awathare | 2025-10-23 | Download | Despite the popularity of Hashed Time-Locked Contracts (HTLCs) because of their use in wide areas of applications such as payment channels, atomic swaps, etc., their use in exchange is still questionab... |
| Morpheus: Lightweight RTT Prediction for Performance-Aware Load Balancing | Panagiotis Giannakopoulos, Bart van Knippenberg, Kishor Chandra Joshi, Nicola Calabretta, George Exarchakos | 2025-10-23 | Download | Distributed applications increasingly demand low end-to-end latency, especially in edge and cloud environments where co-located workloads contend for limited resources. |
| GPU-Accelerated Primal Heuristics for Mixed Integer Programming | Akif Çördük, Piotr Sielski, Alice Boucher, Kumar Aatish | 2025-10-23 | Download | We introduce a fusion of GPU accelerated primal heuristics for Mixed Integer Programming. Leveraging GPU acceleration enables exploration of larger search regions and faster iterations. |
| Accurate Performance Predictors for Edge Computing Applications | Panagiotis Giannakopoulos, Bart van Knippenberg, Kishor Chandra Joshi, Nicola Calabretta, George Exarchakos | 2025-10-23 | Download | Accurate prediction of application performance is critical for enabling effective scheduling and resource management in resource-constrained dynamic edge environments. |
| Symmetry in Software Platforms as an Architectural Principle | Bjorn Remseth | 2025-10-23 | Download | Software platforms often act as structure preserving systems. They provide consistent interfaces and behaviors that remain stable under specific transformations that we denote as symmetries. |
| FLAS: a combination of proactive and reactive auto-scaling architecture for distributed services | Víctor Rampérez, Javier Soriano, David Lizcano, Juan A. Lara | 2025-10-23 | Download | Cloud computing has established itself as the support for the vast majority of emerging technologies, mainly due to the characteristic of elasticity it offers. |
| In-DRAM True Random Number Generation Using Simultaneous Multiple-Row Activation: An Experimental Study of Real DRAM Chips | Ismail Emir Yuksel, Ataberk Olgun, F. Nisa Bostanci, Oguzhan Canpolat, Geraldo F. Oliveira, Mohammad Sadrosadati, Abdullah Giray Yaglikci, Onur Mutlu | 2025-10-23 | Download | In this work, we experimentally demonstrate that it is possible to generate true random numbers at high throughput and low latency in commercial off-the-shelf (COTS) DRAM chips by leveraging simultane... |
| HHEML: Hybrid Homomorphic Encryption for Privacy-Preserving Machine Learning on Edge | Yu Hin Chan, Hao Yang, Shiyu Shen, Xingyu Fan, Shengzhe Lyu, Patrick S. Y. Hung, Ray C. C. Cheung | 2025-10-23 | Download | Privacy-preserving machine learning (PPML) is an emerging topic to handle secure machine learning inference over sensitive data in untrusted environments. |
| HGraphScale: Hierarchical Graph Learning for Autoscaling Microservice Applications in Container-based Cloud Computing | Zhengxin Fang, Hui Ma, Gang Chen, Rajkumar Buyya | 2025-10-23 | Download | Microservice architecture has become a dominant paradigm in application development due to its advantages of being lightweight, flexible, and resilient. |
| Collective Communication for 100k+ GPUs | Min Si, Pavan Balaji, Yongzhou Chen, Ching-Hsiang Chu, Adi Gangidi, Saif Hasan, Subodh Iyengar, Dan Johnson, Bingzhe Liu, Regina Ren, Deep Shah, Ashmitha Jeevaraj Shetty, Greg Steinbrecher, Yulun Wang, Bruce Wu, Xinfeng Xie, Jingyi Yang, Mingran Yang, Kenny Yu, Minlan Yu, Cen Zhao, Wes Bland, Denis Boyda, Suman Gumudavelli, Prashanth Kannan, Cristian Lumezanu, Rui Miao, Zhe Qu, Venkat Ramesh, Maxim Samoylov, Jan Seidel, Srikanth Sundaresan, Feng Tian, Qiye Tan, Shuqiang Zhang, Yimeng Zhao, Shengbao Zheng, Art Zhu, Hongyi Zeng | 2025-10-23 | Download | The increasing scale of large language models (LLMs) necessitates highly efficient collective communication frameworks, particularly as training workloads extend to hundreds of thousands of GPUs. |
| ADP-VRSGP: Decentralized Learning with Adaptive Differential Privacy via Variance-Reduced Stochastic Gradient Push | Xiaoming Wu, Teng Liu, Xin Wang, Ming Yang, Jiguo Yu | 2025-10-23 | Download | Differential privacy is widely employed in decentralized learning to safeguard sensitive data by introducing noise into model updates. However, existing approaches that use fixed-variance noise often ... |
| A Full Stack Framework for High Performance Quantum-Classical Computing | Xin Zhan, K. Grace Johnson, Aniello Esposito, Barbara Chapman, Marco Fiorentino, Kirk M. Bresniker, Raymond G. Beausoleil, Masoud Mohseni | 2025-10-23 | Download | To address the growing needs for scalable High Performance Computing (HPC) and Quantum Computing (QC) integration, we present our HPC-QC full stack framework and its hybrid workload development capabi... |
| AsyncHZP: Hierarchical ZeRO Parallelism with Asynchronous Scheduling for Scalable LLM Training | Huawei Bai, Yifan Huang, Wenqi Shi, Ansheng You, Feifan Shao, Tengfei Han, Minghui Yu | 2025-10-23 | Download | The training efficiency and scalability of language models on massive clusters currently remain a critical bottleneck. Mainstream approaches like ND parallelism are often cumbersome and complex, while... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
|---|---|---|---|---|
| AI-Enabled Digital Twins for Next-Generation Networks: Forecasting Traffic and Resource Management in 5G/6G | John Sengendo, Fabrizio Granelli | 2025-10-23 | Download | As 5G and future 6G mobile networks become increasingly more sophisticated, the requirements for agility, scalability, resilience, and precision in real-time service provisioning cannot be met using t... |
| Trust, But Verify: An Empirical Evaluation of AI-Generated Code for SDN Controllers | Felipe Avencourt Soares, Muriel F. Franco, Eder J. Scheid, Lisandro Z. Granville | 2025-10-23 | Download | Generative Artificial Intelligence (AI) tools have been used to generate human-like content across multiple domains (e.g., sound, image, text, and programming). |
| On the cybersecurity of LoRaWAN-based system: a Smart-Lighting case study | Florian Hofer, Barbara Russo | 2025-10-23 | Download | Cyber-physical systems and the Internet of Things (IoT) are key technologies in the Industry 4.0 vision. They incorporate sensors and actuators to interact with the physical environment. |
| Multicast-partitioning in Time-triggered Stream Planning for Time-Sensitive Networks | Heiko Geppert, Frank Dürr, Simon Naß, Kurt Rothermel | 2025-10-23 | Download | Multicast allows sending a message to multiple recipients without having to create and send a separate message for each recipient. This preserves network bandwidth, which is particularly important in ... |
| MAC Aggregation over Lossy Channels in DTLS 1.3 | Eric Wagner, David Heye, Jan Bauer, Klaus Wehrle, Martin Serror | 2025-10-23 | Download | Aggregating Message Authentication Codes (MACs) promises to save valuable bandwidth in resource-constrained environments. The idea is simple: Instead of appending an authentication tag to each message... |
| Rediscovering Recurring Routing Results | Xiao Song, John Heidemann | 2025-10-23 | Download | Routing is central to networking performance, including: (1) latency in anycast services and websites served from multiple locations, (2) networking expenses and throughput in multi-homed enterprises, ... |
| Collective Communication for 100k+ GPUs | Min Si, Pavan Balaji, Yongzhou Chen, Ching-Hsiang Chu, Adi Gangidi, Saif Hasan, Subodh Iyengar, Dan Johnson, Bingzhe Liu, Regina Ren, Deep Shah, Ashmitha Jeevaraj Shetty, Greg Steinbrecher, Yulun Wang, Bruce Wu, Xinfeng Xie, Jingyi Yang, Mingran Yang, Kenny Yu, Minlan Yu, Cen Zhao, Wes Bland, Denis Boyda, Suman Gumudavelli, Prashanth Kannan, Cristian Lumezanu, Rui Miao, Zhe Qu, Venkat Ramesh, Maxim Samoylov, Jan Seidel, Srikanth Sundaresan, Feng Tian, Qiye Tan, Shuqiang Zhang, Yimeng Zhao, Shengbao Zheng, Art Zhu, Hongyi Zeng | 2025-10-23 | Download | The increasing scale of large language models (LLMs) necessitates highly efficient collective communication frameworks, particularly as training workloads extend to hundreds of thousands of GPUs. |
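The MAC aggregation abstract is cut off before it describes the mechanism. To illustrate the general idea, here is a minimal sketch of one well-known generic construction: XOR-ing per-message HMAC-SHA256 tags into a single fixed-size tag, as in the aggregate MAC of Katz and Lindell. This is an assumption for illustration only. It is not taken from the listed paper, DTLS 1.3 does not itself define MAC aggregation, and the helper names `tag`, `aggregate`, and `verify` are hypothetical.

```python
import hashlib
import hmac
from typing import Iterable

TAG_LEN = 32  # HMAC-SHA256 output size in bytes


def tag(key: bytes, msg: bytes) -> bytes:
    """Hypothetical helper: compute a per-message HMAC-SHA256 tag."""
    return hmac.new(key, msg, hashlib.sha256).digest()


def aggregate(tags: Iterable[bytes]) -> bytes:
    """Hypothetical helper: XOR all per-message tags into one TAG_LEN-byte tag."""
    agg = bytes(TAG_LEN)
    for t in tags:
        agg = bytes(a ^ b for a, b in zip(agg, t))
    return agg


def verify(key: bytes, msgs: Iterable[bytes], agg_tag: bytes) -> bool:
    """Recompute every tag from the received messages and compare aggregates
    in constant time."""
    expected = aggregate(tag(key, m) for m in msgs)
    return hmac.compare_digest(expected, agg_tag)


if __name__ == "__main__":
    key = b"k" * 32  # illustrative shared key, not a real key-derivation step
    msgs = [b"record-1", b"record-2", b"record-3"]
    agg = aggregate(tag(key, m) for m in msgs)
    print(verify(key, msgs, agg))      # expected: True
    print(verify(key, msgs[:2], agg))  # expected: False (one message lost)
```

Sending one 32-byte aggregate instead of three 32-byte tags is where the bandwidth saving comes from. The second check shows the catch on a lossy channel: if any message in the group is dropped, the receiver cannot recompute the aggregate, so none of the surviving messages in that group can be authenticated.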

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
|---|---|---|---|---|
| xMem: A CPU-Based Approach for Accurate Estimation of GPU Memory in Deep Learning Training Workloads | Jiabo Shi, Dimitrios Pezaros, Yehia Elkhatib | 2025-10-23 | Download | The global scarcity of GPUs necessitates more sophisticated strategies for Deep Learning jobs in shared cluster environments. Accurate estimation of how much GPU memory a job will require is fundament... |
| Prefetching Cache Optimization Using Graph Neural Networks: A Modular Framework and Conceptual Analysis | F. I. Qowy | 2025-10-23 | Download | Caching and prefetching techniques are fundamental to modern computing, serving to bridge the growing performance gap between processors and memory. |
