2025-10-23
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| FIFOAdvisor: A DSE Framework for Automated FIFO Sizing of High-Level Synthesis Designs | Stefan Abi-Karam, Rishov Sarkar, Suhail Basalama, Jason Cong, Callie Hao | 2025-10-23 | Download | Dataflow hardware designs enable efficient FPGA implementations via high-level synthesis (HLS), but correctly sizing first-in-first-out (FIFO) channel buffers remains challenging. |
| Lincoln AI Computing Survey (LAICS) and Trends | Albert Reuther, Peter Michaleas, Michael Jones, Vijay Gadepally, Jeremy Kepner | 2025-10-23 | Download | In the past year, generative AI (GenAI) models have received a tremendous amount of attention, which in turn has increased attention to computing systems for training and inference for GenAI. |
| Hardware-Aware DNN Compression for Homogeneous Edge Devices | Kunlong Zhang, Guiying Li, Ning Lu, Peng Yang, Ke Tang | 2025-10-23 | Download | Deploying deep neural networks (DNNs) across homogeneous edge devices (the devices with the same SKU labeled by the manufacturer) often assumes identical performance among them. |
| Squire: A General-Purpose Accelerator to Exploit Fine-Grain Parallelism on Dependency-Bound Kernels | Rubén Langarita, Jesús Alastruey-Benedé, Pablo Ibáñez-Marín, Santiago Marco-Sola, Miquel Moretó, Adrià Armejach | 2025-10-23 | Download | Multiple HPC applications are often bottlenecked by compute-intensive kernels implementing complex dependency patterns (data-dependency bound). |
| In-DRAM True Random Number Generation Using Simultaneous Multiple-Row Activation: An Experimental Study of Real DRAM Chips | Ismail Emir Yuksel, Ataberk Olgun, F. Nisa Bostanci, Oguzhan Canpolat, Geraldo F. Oliveira, Mohammad Sadrosadati, Abdullah Giray Yaglikci, Onur Mutlu | 2025-10-23 | Download | In this work, we experimentally demonstrate that it is possible to generate true random numbers at high throughput and low latency in commercial off-the-shelf (COTS) DRAM chips by leveraging simultane... |
| HALOC-AxA: An Area/-Energy-Efficient Approximate Adder for Image Processing Application | Hasnain A. Ziad, Ashiq A. Sakib | 2025-10-23 | Download | The design of approximate adders has been widely researched to advance energy-efficient hardware for computation-intensive multimedia applications, such as image, audio, or video processing. |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| xMem: A CPU-Based Approach for Accurate Estimation of GPU Memory in Deep Learning Training Workloads | Jiabo Shi, Dimitrios Pezaros, Yehia Elkhatib | 2025-10-23 | Download | The global scarcity of GPUs necessitates more sophisticated strategies for Deep Learning jobs in shared cluster environments. Accurate estimation of how much GPU memory a job will require is fundament... |
| CudaForge: An Agent Framework with Hardware Feedback for CUDA Kernel Optimization | Zijian Zhang, Rong Wang, Shiyang Li, Yuebo Luo, Mingyi Hong, Caiwen Ding | 2025-10-23 | Download | Developing efficient CUDA kernels is increasingly critical for AI applications such as large-scale LLM training. However, manual kernel design is both costly and time-consuming, motivating automatic a... |
| JSTprove: Pioneering Verifiable AI for a Trustless Future | Jonathan Gold, Tristan Freiberg, Haruna Isah, Shirin Shahabi | 2025-10-23 | Download | The integration of machine learning (ML) systems into critical industries such as healthcare, finance, and cybersecurity has transformed decision-making processes, but it also brings new challenges ar... |
| Lincoln AI Computing Survey (LAICS) and Trends | Albert Reuther, Peter Michaleas, Michael Jones, Vijay Gadepally, Jeremy Kepner | 2025-10-23 | Download | In the past year, generative AI (GenAI) models have received a tremendous amount of attention, which in turn has increased attention to computing systems for training and inference for GenAI. |
| Decentralized Exchange that Mitigate a Bribery Attack | Nitin Awathare | 2025-10-23 | Download | Despite the popularity of Hashed Time-Locked Contracts (HTLCs) because of their use in wide areas of applications such as payment channels, atomic swaps, etc., their use in exchange is still questionab... |
| Morpheus: Lightweight RTT Prediction for Performance-Aware Load Balancing | Panagiotis Giannakopoulos, Bart van Knippenberg, Kishor Chandra Joshi, Nicola Calabretta, George Exarchakos | 2025-10-23 | Download | Distributed applications increasingly demand low end-to-end latency, especially in edge and cloud environments where co-located workloads contend for limited resources. |
| GPU-Accelerated Primal Heuristics for Mixed Integer Programming | Akif Çördük, Piotr Sielski, Alice Boucher, Kumar Aatish | 2025-10-23 | Download | We introduce a fusion of GPU accelerated primal heuristics for Mixed Integer Programming. Leveraging GPU acceleration enables exploration of larger search regions and faster iterations. |
| Accurate Performance Predictors for Edge Computing Applications | Panagiotis Giannakopoulos, Bart van Knippenberg, Kishor Chandra Joshi, Nicola Calabretta, George Exarchakos | 2025-10-23 | Download | Accurate prediction of application performance is critical for enabling effective scheduling and resource management in resource-constrained dynamic edge environments. |
| Symmetry in Software Platforms as an Architectural Principle | Bjorn Remseth | 2025-10-23 | Download | Software platforms often act as structure preserving systems. They provide consistent interfaces and behaviors that remain stable under specific transformations that we denote as symmetries. |
| FLAS: a combination of proactive and reactive auto-scaling architecture for distributed services | Víctor Rampérez, Javier Soriano, David Lizcano, Juan A. Lara | 2025-10-23 | Download | Cloud computing has established itself as the support for the vast majority of emerging technologies, mainly due to the characteristic of elasticity it offers. |
| In-DRAM True Random Number Generation Using Simultaneous Multiple-Row Activation: An Experimental Study of Real DRAM Chips | Ismail Emir Yuksel, Ataberk Olgun, F. Nisa Bostanci, Oguzhan Canpolat, Geraldo F. Oliveira, Mohammad Sadrosadati, Abdullah Giray Yaglikci, Onur Mutlu | 2025-10-23 | Download | In this work, we experimentally demonstrate that it is possible to generate true random numbers at high throughput and low latency in commercial off-the-shelf (COTS) DRAM chips by leveraging simultane... |
| HHEML: Hybrid Homomorphic Encryption for Privacy-Preserving Machine Learning on Edge | Yu Hin Chan, Hao Yang, Shiyu Shen, Xingyu Fan, Shengzhe Lyu, Patrick S. Y. Hung, Ray C. C. Cheung | 2025-10-23 | Download | Privacy-preserving machine learning (PPML) is an emerging topic to handle secure machine learning inference over sensitive data in untrusted environments. |
| HGraphScale: Hierarchical Graph Learning for Autoscaling Microservice Applications in Container-based Cloud Computing | Zhengxin Fang, Hui Ma, Gang Chen, Rajkumar Buyya | 2025-10-23 | Download | Microservice architecture has become a dominant paradigm in application development due to its advantages of being lightweight, flexible, and resilient. |
| Collective Communication for 100k+ GPUs | Min Si, Pavan Balaji, Yongzhou Chen, Ching-Hsiang Chu, Adi Gangidi, Saif Hasan, Subodh Iyengar, Dan Johnson, Bingzhe Liu, Regina Ren, Deep Shah, Ashmitha Jeevaraj Shetty, Greg Steinbrecher, Yulun Wang, Bruce Wu, Xinfeng Xie, Jingyi Yang, Mingran Yang, Kenny Yu, Minlan Yu, Cen Zhao, Wes Bland, Denis Boyda, Suman Gumudavelli, Prashanth Kannan, Cristian Lumezanu, Rui Miao, Zhe Qu, Venkat Ramesh, Maxim Samoylov, Jan Seidel, Srikanth Sundaresan, Feng Tian, Qiye Tan, Shuqiang Zhang, Yimeng Zhao, Shengbao Zheng, Art Zhu, Hongyi Zeng | 2025-10-23 | Download | The increasing scale of large language models (LLMs) necessitates highly efficient collective communication frameworks, particularly as training workloads extend to hundreds of thousands of GPUs. |
| ADP-VRSGP: Decentralized Learning with Adaptive Differential Privacy via Variance-Reduced Stochastic Gradient Push | Xiaoming Wu, Teng Liu, Xin Wang, Ming Yang, Jiguo Yu | 2025-10-23 | Download | Differential privacy is widely employed in decentralized learning to safeguard sensitive data by introducing noise into model updates. However, existing approaches that use fixed-variance noise often ... |
| A Full Stack Framework for High Performance Quantum-Classical Computing | Xin Zhan, K. Grace Johnson, Aniello Esposito, Barbara Chapman, Marco Fiorentino, Kirk M. Bresniker, Raymond G. Beausoleil, Masoud Mohseni | 2025-10-23 | Download | To address the growing needs for scalable High Performance Computing (HPC) and Quantum Computing (QC) integration, we present our HPC-QC full stack framework and its hybrid workload development capabi... |
| AsyncHZP: Hierarchical ZeRO Parallelism with Asynchronous Scheduling for Scalable LLM Training | Huawei Bai, Yifan Huang, Wenqi Shi, Ansheng You, Feifan Shao, Tengfei Han, Minghui Yu | 2025-10-23 | Download | The training efficiency and scalability of language models on massive clusters currently remain a critical bottleneck. Mainstream approaches like ND parallelism are often cumbersome and complex, while... |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| AI-Enabled Digital Twins for Next-Generation Networks: Forecasting Traffic and Resource Management in 5G/6G | John Sengendo, Fabrizio Granelli | 2025-10-23 | Download | As 5G and future 6G mobile networks become increasingly more sophisticated, the requirements for agility, scalability, resilience, and precision in real-time service provisioning cannot be met using t... |
| Trust, But Verify: An Empirical Evaluation of AI-Generated Code for SDN Controllers | Felipe Avencourt Soares, Muriel F. Franco, Eder J. Scheid, Lisandro Z. Granville | 2025-10-23 | Download | Generative Artificial Intelligence (AI) tools have been used to generate human-like content across multiple domains (e.g., sound, image, text, and programming). |
| On the cybersecurity of LoRaWAN-based system: a Smart-Lighting case study | Florian Hofer, Barbara Russo | 2025-10-23 | Download | Cyber-physical systems and the Internet of Things (IoT) are key technologies in the Industry 4.0 vision. They incorporate sensors and actuators to interact with the physical environment. |
| Multicast-partitioning in Time-triggered Stream Planning for Time-Sensitive Networks | Heiko Geppert, Frank Dürr, Simon Naß, Kurt Rothermel | 2025-10-23 | Download | Multicast allows sending a message to multiple recipients without having to create and send a separate message for each recipient. This preserves network bandwidth, which is particularly important in ... |
| MAC Aggregation over Lossy Channels in DTLS 1.3 | Eric Wagner, David Heye, Jan Bauer, Klaus Wehrle, Martin Serror | 2025-10-23 | Download | Aggregating Message Authentication Codes (MACs) promises to save valuable bandwidth in resource-constrained environments. The idea is simple: Instead of appending an authentication tag to each message... |
| Rediscovering Recurring Routing Results | Xiao Song, John Heidemann | 2025-10-23 | Download | Routing is central to networking performance, including: (1) latency in anycast services and websites served from multiple locations, (2) networking expenses and throughput in multi-homed enterprises, ... |
| Collective Communication for 100k+ GPUs | Min Si, Pavan Balaji, Yongzhou Chen, Ching-Hsiang Chu, Adi Gangidi, Saif Hasan, Subodh Iyengar, Dan Johnson, Bingzhe Liu, Regina Ren, Deep Shah, Ashmitha Jeevaraj Shetty, Greg Steinbrecher, Yulun Wang, Bruce Wu, Xinfeng Xie, Jingyi Yang, Mingran Yang, Kenny Yu, Minlan Yu, Cen Zhao, Wes Bland, Denis Boyda, Suman Gumudavelli, Prashanth Kannan, Cristian Lumezanu, Rui Miao, Zhe Qu, Venkat Ramesh, Maxim Samoylov, Jan Seidel, Srikanth Sundaresan, Feng Tian, Qiye Tan, Shuqiang Zhang, Yimeng Zhao, Shengbao Zheng, Art Zhu, Hongyi Zeng | 2025-10-23 | Download | The increasing scale of large language models (LLMs) necessitates highly efficient collective communication frameworks, particularly as training workloads extend to hundreds of thousands of GPUs. |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| xMem: A CPU-Based Approach for Accurate Estimation of GPU Memory in Deep Learning Training Workloads | Jiabo Shi, Dimitrios Pezaros, Yehia Elkhatib | 2025-10-23 | Download | The global scarcity of GPUs necessitates more sophisticated strategies for Deep Learning jobs in shared cluster environments. Accurate estimation of how much GPU memory a job will require is fundament... |
| Prefetching Cache Optimization Using Graph Neural Networks: A Modular Framework and Conceptual Analysis | F. I. Qowy | 2025-10-23 | Download | Caching and prefetching techniques are fundamental to modern computing, serving to bridge the growing performance gap between processors and memory. |