2025-07-15
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Double Duty: FPGA Architecture to Enable Concurrent LUT and Adder Chain Usage | Junius Pun, Xilai Dai, Grace Zgheib, Mahesh A. Iyer, Andrew Boutros, Vaughn Betz, Mohamed S. Abdelfattah | 2025-07-15 | Download | Flexibility and customization are key strengths of Field-Programmable Gate Arrays (FPGAs) when compared to other computing devices. For instance, FPGAs can efficiently implement arbitrary-precision ar... |
| ELK: Exploring the Efficiency of Inter-core Connected AI Chips with Deep Learning Compiler Techniques | Yiqi Liu, Yuqi Xue, Noelle Crawford, Jilong Xue, Jian Huang | 2025-07-15 | Download | To meet the increasing demand of deep learning (DL) models, AI chips are employing both off-chip memory (e.g., HBM) and high-bandwidth low-latency interconnect for direct inter-core data exchange. |
| SystolicAttention: Fusing FlashAttention within a Single Systolic Array | Jiawei Lin, Yuanlong Li, Guokai Chen, Thomas Bourgeat | 2025-07-15 | Download | Transformer models rely heavily on the scaled dot-product attention (SDPA) operation, typically implemented as FlashAttention. Characterized by its frequent interleaving of matrix multiplications and ... |
| Fault-Free Analog Computing with Imperfect Hardware | Zhicheng Xu, Jiawei Liu, Sitao Huang, Zefan Li, Shengbo Wang, Bo Wen, Ruibin Mao, Mingrui Jiang, Giacomo Pedretti, Jim Ignowski, Kaibin Huang, Can Li | 2025-07-15 | Download | The growing demand for edge computing and AI drives research into analog in-memory computing using memristors, which overcome data movement bottlenecks by computing directly within memory. |
| Security Enclave Architecture for Heterogeneous Security Primitives for Supply-Chain Attacks | Kshitij Raj, Atri Chatterjee, Patanjali SLPSK, Swarup Bhunia, Sandip Ray | 2025-07-15 | Download | Designing secure architectures for system-on-chip (SoC) platforms is a highly intricate and time-intensive task, often requiring months of development and meticulous verification. |
| Mapping Fusion: Improving FPGA Technology Mapping with ASIC Mapper | Cunxi Yu | 2025-07-15 | Download | LUT (Look-Up Table) mapping is a critical step in FPGA logic synthesis, where a logic network is transformed into a form that can be directly implemented using the FPGA's LUTs. |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| PGT-I: Scaling Spatiotemporal GNNs with Memory-Efficient Distributed Training | Seth Ockerman, Amal Gueroudji, Tanwi Mallick, Yixuan He, Line Pouchard, Robert Ross, Shivaram Venkataraman | 2025-07-15 | Download | Spatiotemporal graph neural networks (ST-GNNs) are powerful tools for modeling spatial and temporal data dependencies. However, their applications have been limited primarily to small-scale datasets b... |
| ZKP-FedEval: Verifiable and Privacy-Preserving Federated Evaluation using Zero-Knowledge Proofs | Daniel Commey, Benjamin Appiah, Griffith S. Klogo, Garth V. Crosby | 2025-07-15 | Download | Federated Learning (FL) enables collaborative model training on decentralized data without exposing raw data. However, the evaluation phase in FL may leak sensitive information through shared performa... |
| Scaling the memory wall using mixed-precision -- HPG-MxP on an exascale machine | Aditya Kashi, Nicholson Koukpaizan, Hao Lu, Michael Matheson, Sarp Oral, Feiyi Wang | 2025-07-15 | Download | Mixed-precision algorithms have been proposed as a way for scientific computing to benefit from some of the gains seen for artificial intelligence (AI) on recent high performance computing (HPC) platf... |
| ELK: Exploring the Efficiency of Inter-core Connected AI Chips with Deep Learning Compiler Techniques | Yiqi Liu, Yuqi Xue, Noelle Crawford, Jilong Xue, Jian Huang | 2025-07-15 | Download | To meet the increasing demand of deep learning (DL) models, AI chips are employing both off-chip memory (e.g., HBM) and high-bandwidth low-latency interconnect for direct inter-core data exchange. |
| D3FL: Data Distribution and Detrending for Robust Federated Learning in Non-linear Time-series Data | Harsha Varun Marisetty, Manik Gupta, Yogesh Simmhan | 2025-07-15 | Download | With advancements in computing and communication technologies, the Internet of Things (IoT) has seen significant growth. IoT devices typically collect data from various sensors, such as temperature, h... |
| Uniting the World by Dividing it: Federated Maps to Enable Spatial Applications | Sagar Bharadwaj, Srinivasan Seshan, Anthony Rowe | 2025-07-15 | Download | The emergence of the Spatial Web -- the Web where content is tied to real-world locations has the potential to improve and enable many applications such as augmented reality, navigation, robotics, and... |
| FLsim: A Modular and Library-Agnostic Simulation Framework for Federated Learning | Arnab Mukherjee, Raju Halder, Joydeep Chandra | 2025-07-15 | Download | Federated Learning (FL) has undergone significant development since its inception in 2016, advancing from basic algorithms to complex methodologies tailored to address diverse challenges and use cases... |
| Quantifying the Energy Consumption and Carbon Emissions of LLM Inference via Simulations | Miray Özcan, Philipp Wiesner, Philipp Weiß, Odej Kao | 2025-07-15 | Download | The environmental impact of Large Language Models (LLMs) is rising significantly, with inference now accounting for more than half of their total lifecycle carbon emissions. |
| A new Dune grid for scalable dynamic adaptivity based on the p4est software library | Carsten Burstedde, Mikhail Kirilin, Robert Klöfkorn | 2025-07-15 | Download | In this work we extend the Dune solver library with another grid interface to the open-source p4est software. While Dune already supports about a dozen different mesh implementations through its mesh ... |
| Cyclic Data Streaming on GPUs for Short Range Stencils Applied to Molecular Dynamics | Martin Rose, Simon Homes, Lukas Ramsperger, Jose Gracia, Christoph Niethammer, Jadran Vrabec | 2025-07-15 | Download | In the quest for highest performance in scientific computing, we present a novel framework that relies on high-bandwidth communication between GPUs in a compute cluster. |
| FedFlex: Federated Learning for Diverse Netflix Recommendations | Sven Lankester, Gustavo de Carvalho Bertoli, Matias Vizcaino, Emmanuelle Beauxis Aussalet, Manel Slokom | 2025-07-15 | Download | The drive for personalization in recommender systems creates a tension between user privacy and the risk of "filter bubbles". Although federated learning offers a promising paradigm for privacy-preser... |
| Deterministic Lower Bounds for k-Edge Connectivity in the Distributed Sketching Model | Peter Robinson, Ming Ming Tan | 2025-07-15 | Download | We study the k-edge connectivity problem on undirected graphs in the distributed sketching model, where we have n nodes and a referee. Each node sends a single message to the referee based on its ... |
| Boosting Scientific Error-Bounded Lossy Compression through Optimized Synergistic Lossy-Lossless Orchestration | Shixun Wu, Jinwen Pan, Jinyang Liu, Jiannan Tian, Ziwei Qiu, Jiajun Huang, Kai Zhao, Xin Liang, Sheng Di, Zizhong Chen, Franck Cappello | 2025-07-15 | Download | As high-performance computing architectures evolve, more scientific computing workflows are being deployed on advanced computing platforms such as GPUs. |
| Generating Dynamic Graph Algorithms for Multiple Backends for a Graph DSL | Nibedita Behera, Ashwina Kumar, Atharva Chougule, Mohammed Shan P S, Rushabh Nirdosh Lalwani, Rupesh Nasre | 2025-07-15 | Download | With the rapid growth of unstructured and semistructured data, parallelizing graph algorithms has become essential for efficiency. However, due to the inherent irregularity in computation, memory acce... |
| MMStencil: Optimizing High-order Stencils on Multicore CPU using Matrix Unit | Yinuo Wang, Tianqi Mao, Lin Gan, Wubing Wan, Zeyu Song, Jiayu Fu, Lanke He, Wenqiang Wang, Zekun Yin, Wei Xue, Guangwen Yang | 2025-07-15 | Download | Matrix-accelerated stencil computation is a hot research topic, yet its application to three-dimensional (3D) high-order stencils and HPC remains underexplored. |
| Arcturus: A Cloud Overlay Network for Global Accelerator with Enhanced Performance and Stability | Matthew Yang Liu, Chuang Chen, Pengcheng Lv, Hui Guo, Yanan Zhang, Cong Wang, Yusen Li, Zhenyu Li, Yu-Chu Tian | 2025-07-15 | Download | Global Accelerator (GA) services play a vital role in ensuring low-latency, high-reliability communication for real-time interactive applications. |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| On QoE-Aware Traffic Management for Real-time, Interactive Video with Time-variant Spatial Complexity | Szilveszter Nádas, Lars Ernström, David Lindero, Jonathan Lynam | 2025-07-15 | Download | We analyzed spatial complexity, defined as the relationship between the required bitrate and a corresponding picture Quality of Experience (QoE) metric, for realistic, long, real-time, interactive vid... |
| Towards a Non-Binary View of IPv6 Adoption | Sulyab Thottungal Valapu, John Heidemann | 2025-07-15 | Download | Twelve years have passed since World IPv6 Launch Day, but what is the current state of IPv6 deployment? Prior work has examined IPv6 status as a binary: can a user do any IPv6? As deployment increases... |
| ZKP-FedEval: Verifiable and Privacy-Preserving Federated Evaluation using Zero-Knowledge Proofs | Daniel Commey, Benjamin Appiah, Griffith S. Klogo, Garth V. Crosby | 2025-07-15 | Download | Federated Learning (FL) enables collaborative model training on decentralized data without exposing raw data. However, the evaluation phase in FL may leak sensitive information through shared performa... |
| JamShield: A Machine Learning Detection System for Over-the-Air Jamming Attacks | Ioannis Panitsas, Yagmur Yigit, Leandros Tassiulas, Leandros Maglaras, Berk Canberk | 2025-07-15 | Download | Wireless networks are vulnerable to jamming attacks due to the shared communication medium, which can severely degrade performance and disrupt services. |
| Resilient Time-Sensitive Networking for Industrial IoT: Configuration and Fault-Tolerance Evaluation | Mohamed Seliem, Dirk Pesch, Utz Roedig, Cormac Sreenan | 2025-07-15 | Download | Time-Sensitive Networking (TSN) is increasingly adopted in industrial systems to meet strict latency, jitter, and reliability requirements. However, evaluating TSN's fault tolerance under realistic fa... |
| An Agentic Flow for Finite State Machine Extraction using Prompt Chaining | Fares Wael, Youssef Maklad, Ali Hamdi, Wael Elsersy | 2025-07-15 | Download | Finite-State Machines (FSMs) are critical for modeling the operational logic of network protocols, enabling verification, analysis, and vulnerability discovery. |
| PRATA: A Framework to Enable Predictive QoS in Vehicular Networks via Artificial Intelligence | Federico Mason, Tommaso Zugno, Matteo Drago, Marco Giordani, Mate Boban, Michele Zorzi | 2025-07-15 | Download | Predictive Quality of Service (PQoS) makes it possible to anticipate QoS changes, e.g., in wireless networks, and trigger appropriate countermeasures to avoid performance degradation. |
| Improving Wi-Fi Network Performance Prediction with Deep Learning Models | Gabriele Formis, Amanda Ericson, Stefan Forsstrom, Kyi Thar, Gianluca Cena, Stefano Scanzio | 2025-07-15 | Download | The increasing need for robustness, reliability, and determinism in wireless networks for industrial and mission-critical applications is the driver for the growth of new innovative methods. |
| White paper: Towards Human-centric and Sustainable 6G Services -- the fortiss Research Perspective | Rute C. Sofia, Hao Shen, Yuanting Liu, Severin Kacianka, Holger Pfeifer | 2025-07-15 | Download | As a leading research institute in software-intensive systems, fortiss is actively shaping the vision of Sixth Generation Mobile Communication (6G). |
| Graph-based Fingerprint Update Using Unlabelled WiFi Signals | Ka Ho Chiu, Handi Yin, Weipeng Zhuo, Chul-Ho Lee, S.-H. Gary Chan | 2025-07-15 | Download | WiFi received signal strength (RSS) environment evolves over time due to movement of access points (APs), AP power adjustment, installation and removal of APs, etc. |
| SIMCODE: A Benchmark for Natural Language to ns-3 Network Simulation Code Generation | Tasnim Ahmed, Mirza Mohammad Azwad, Salimur Choudhury | 2025-07-15 | Download | Large language models (LLMs) have demonstrated remarkable capabilities in code generation across various domains. However, their effectiveness in generating simulation scripts for domain-specific envi... |
| Arcturus: A Cloud Overlay Network for Global Accelerator with Enhanced Performance and Stability | Matthew Yang Liu, Chuang Chen, Pengcheng Lv, Hui Guo, Yanan Zhang, Cong Wang, Yusen Li, Zhenyu Li, Yu-Chu Tian | 2025-07-15 | Download | Global Accelerator (GA) services play a vital role in ensuring low-latency, high-reliability communication for real-time interactive applications. |
| LiLM-RDB-SFC: Lightweight Language Model with Relational Database-Guided DRL for Optimized SFC Provisioning | Parisa Fard Moshiri, Xinyu Zhu, Poonam Lohan, Burak Kantarci, Emil Janulewicz | 2025-07-15 | Download | Effective management of Service Function Chains (SFCs) and optimal Virtual Network Function (VNF) placement are critical challenges in modern Software-Defined Networking (SDN) and Network Function Vir... |
cs.OS - Operating Systems
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Oneiros: KV Cache Optimization through Parameter Remapping for Multi-tenant LLM Serving | Ruihao Li, Shagnik Pal, Vineeth Narayan Pullu, Prasoon Sinha, Jeeho Ryoo, Lizy K. John, Neeraja J. Yadwadkar | 2025-07-15 | Download | KV cache accelerates LLM inference by avoiding redundant computation, at the expense of memory. To support larger KV caches, prior work extends GPU memory with CPU memory via CPU-offloading. |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Scaling the memory wall using mixed-precision -- HPG-MxP on an exascale machine | Aditya Kashi, Nicholson Koukpaizan, Hao Lu, Michael Matheson, Sarp Oral, Feiyi Wang | 2025-07-15 | Download | Mixed-precision algorithms have been proposed as a way for scientific computing to benefit from some of the gains seen for artificial intelligence (AI) on recent high performance computing (HPC) platf... |
| Cyclic Data Streaming on GPUs for Short Range Stencils Applied to Molecular Dynamics | Martin Rose, Simon Homes, Lukas Ramsperger, Jose Gracia, Christoph Niethammer, Jadran Vrabec | 2025-07-15 | Download | In the quest for highest performance in scientific computing, we present a novel framework that relies on high-bandwidth communication between GPUs in a compute cluster. |