2025-07-15

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Double Duty: FPGA Architecture to Enable Concurrent LUT and Adder Chain Usage | Junius Pun, Xilai Dai, Grace Zgheib, Mahesh A. Iyer, Andrew Boutros, Vaughn Betz, Mohamed S. Abdelfattah | 2025-07-15 | Download | Flexibility and customization are key strengths of Field-Programmable Gate Arrays (FPGAs) when compared to other computing devices. For instance, FPGAs can efficiently implement arbitrary-precision ar... |
| ELK: Exploring the Efficiency of Inter-core Connected AI Chips with Deep Learning Compiler Techniques | Yiqi Liu, Yuqi Xue, Noelle Crawford, Jilong Xue, Jian Huang | 2025-07-15 | Download | To meet the increasing demand of deep learning (DL) models, AI chips are employing both off-chip memory (e.g., HBM) and high-bandwidth low-latency interconnect for direct inter-core data exchange. |
| SystolicAttention: Fusing FlashAttention within a Single Systolic Array | Jiawei Lin, Yuanlong Li, Guokai Chen, Thomas Bourgeat | 2025-07-15 | Download | Transformer models rely heavily on the scaled dot-product attention (SDPA) operation, typically implemented as FlashAttention. Characterized by its frequent interleaving of matrix multiplications and ... |
| Fault-Free Analog Computing with Imperfect Hardware | Zhicheng Xu, Jiawei Liu, Sitao Huang, Zefan Li, Shengbo Wang, Bo Wen, Ruibin Mao, Mingrui Jiang, Giacomo Pedretti, Jim Ignowski, Kaibin Huang, Can Li | 2025-07-15 | Download | The growing demand for edge computing and AI drives research into analog in-memory computing using memristors, which overcome data movement bottlenecks by computing directly within memory. |
| Security Enclave Architecture for Heterogeneous Security Primitives for Supply-Chain Attacks | Kshitij Raj, Atri Chatterjee, Patanjali SLPSK, Swarup Bhunia, Sandip Ray | 2025-07-15 | Download | Designing secure architectures for system-on-chip (SoC) platforms is a highly intricate and time-intensive task, often requiring months of development and meticulous verification. |
| Mapping Fusion: Improving FPGA Technology Mapping with ASIC Mapper | Cunxi Yu | 2025-07-15 | Download | LUT (Look-Up Table) mapping is a critical step in FPGA logic synthesis, where a logic network is transformed into a form that can be directly implemented using the FPGA's LUTs. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| PGT-I: Scaling Spatiotemporal GNNs with Memory-Efficient Distributed Training | Seth Ockerman, Amal Gueroudji, Tanwi Mallick, Yixuan He, Line Pouchard, Robert Ross, Shivaram Venkataraman | 2025-07-15 | Download | Spatiotemporal graph neural networks (ST-GNNs) are powerful tools for modeling spatial and temporal data dependencies. However, their applications have been limited primarily to small-scale datasets b... |
| ZKP-FedEval: Verifiable and Privacy-Preserving Federated Evaluation using Zero-Knowledge Proofs | Daniel Commey, Benjamin Appiah, Griffith S. Klogo, Garth V. Crosby | 2025-07-15 | Download | Federated Learning (FL) enables collaborative model training on decentralized data without exposing raw data. However, the evaluation phase in FL may leak sensitive information through shared performa... |
| Scaling the memory wall using mixed-precision -- HPG-MxP on an exascale machine | Aditya Kashi, Nicholson Koukpaizan, Hao Lu, Michael Matheson, Sarp Oral, Feiyi Wang | 2025-07-15 | Download | Mixed-precision algorithms have been proposed as a way for scientific computing to benefit from some of the gains seen for artificial intelligence (AI) on recent high performance computing (HPC) platf... |
| ELK: Exploring the Efficiency of Inter-core Connected AI Chips with Deep Learning Compiler Techniques | Yiqi Liu, Yuqi Xue, Noelle Crawford, Jilong Xue, Jian Huang | 2025-07-15 | Download | To meet the increasing demand of deep learning (DL) models, AI chips are employing both off-chip memory (e.g., HBM) and high-bandwidth low-latency interconnect for direct inter-core data exchange. |
| D3FL: Data Distribution and Detrending for Robust Federated Learning in Non-linear Time-series Data | Harsha Varun Marisetty, Manik Gupta, Yogesh Simmhan | 2025-07-15 | Download | With advancements in computing and communication technologies, the Internet of Things (IoT) has seen significant growth. IoT devices typically collect data from various sensors, such as temperature, h... |
| Uniting the World by Dividing it: Federated Maps to Enable Spatial Applications | Sagar Bharadwaj, Srinivasan Seshan, Anthony Rowe | 2025-07-15 | Download | The emergence of the Spatial Web -- the Web where content is tied to real-world locations has the potential to improve and enable many applications such as augmented reality, navigation, robotics, and... |
| FLsim: A Modular and Library-Agnostic Simulation Framework for Federated Learning | Arnab Mukherjee, Raju Halder, Joydeep Chandra | 2025-07-15 | Download | Federated Learning (FL) has undergone significant development since its inception in 2016, advancing from basic algorithms to complex methodologies tailored to address diverse challenges and use cases... |
| Quantifying the Energy Consumption and Carbon Emissions of LLM Inference via Simulations | Miray Özcan, Philipp Wiesner, Philipp Weiß, Odej Kao | 2025-07-15 | Download | The environmental impact of Large Language Models (LLMs) is rising significantly, with inference now accounting for more than half of their total lifecycle carbon emissions. |
| A new Dune grid for scalable dynamic adaptivity based on the p4est software library | Carsten Burstedde, Mikhail Kirilin, Robert Klöfkorn | 2025-07-15 | Download | In this work we extend the Dune solver library with another grid interface to the open-source p4est software. While Dune already supports about a dozen different mesh implementations through its mesh ... |
| Cyclic Data Streaming on GPUs for Short Range Stencils Applied to Molecular Dynamics | Martin Rose, Simon Homes, Lukas Ramsperger, Jose Gracia, Christoph Niethammer, Jadran Vrabec | 2025-07-15 | Download | In the quest for highest performance in scientific computing, we present a novel framework that relies on high-bandwidth communication between GPUs in a compute cluster. |
| FedFlex: Federated Learning for Diverse Netflix Recommendations | Sven Lankester, Gustavo de Carvalho Bertoli, Matias Vizcaino, Emmanuelle Beauxis Aussalet, Manel Slokom | 2025-07-15 | Download | The drive for personalization in recommender systems creates a tension between user privacy and the risk of "filter bubbles". Although federated learning offers a promising paradigm for privacy-preser... |
| Deterministic Lower Bounds for k-Edge Connectivity in the Distributed Sketching Model | Peter Robinson, Ming Ming Tan | 2025-07-15 | Download | We study the k-edge connectivity problem on undirected graphs in the distributed sketching model, where we have n nodes and a referee. Each node sends a single message to the referee based on its ... |
| Boosting Scientific Error-Bounded Lossy Compression through Optimized Synergistic Lossy-Lossless Orchestration | Shixun Wu, Jinwen Pan, Jinyang Liu, Jiannan Tian, Ziwei Qiu, Jiajun Huang, Kai Zhao, Xin Liang, Sheng Di, Zizhong Chen, Franck Cappello | 2025-07-15 | Download | As high-performance computing architectures evolve, more scientific computing workflows are being deployed on advanced computing platforms such as GPUs. |
| Generating Dynamic Graph Algorithms for Multiple Backends for a Graph DSL | Nibedita Behera, Ashwina Kumar, Atharva Chougule, Mohammed Shan P S, Rushabh Nirdosh Lalwani, Rupesh Nasre | 2025-07-15 | Download | With the rapid growth of unstructured and semistructured data, parallelizing graph algorithms has become essential for efficiency. However, due to the inherent irregularity in computation, memory acce... |
| MMStencil: Optimizing High-order Stencils on Multicore CPU using Matrix Unit | Yinuo Wang, Tianqi Mao, Lin Gan, Wubing Wan, Zeyu Song, Jiayu Fu, Lanke He, Wenqiang Wang, Zekun Yin, Wei Xue, Guangwen Yang | 2025-07-15 | Download | Matrix-accelerated stencil computation is a hot research topic, yet its application to three-dimensional (3D) high-order stencils and HPC remains underexplored. |
| Arcturus: A Cloud Overlay Network for Global Accelerator with Enhanced Performance and Stability | Matthew Yang Liu, Chuang Chen, Pengcheng Lv, Hui Guo, Yanan Zhang, Cong Wang, Yusen Li, Zhenyu Li, Yu-Chu Tian | 2025-07-15 | Download | Global Accelerator (GA) services play a vital role in ensuring low-latency, high-reliability communication for real-time interactive applications. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| On QoE-Aware Traffic Management for Real-time, Interactive Video with Time-variant Spatial Complexity | Szilveszter Nádas, Lars Ernström, David Lindero, Jonathan Lynam | 2025-07-15 | Download | We analyzed spatial complexity, defined as the relationship between the required bitrate and a corresponding picture Quality of Experience (QoE) metric, for realistic, long, real-time, interactive vid... |
| Towards a Non-Binary View of IPv6 Adoption | Sulyab Thottungal Valapu, John Heidemann | 2025-07-15 | Download | Twelve years have passed since World IPv6 Launch Day, but what is the current state of IPv6 deployment? Prior work has examined IPv6 status as a binary: can a user do any IPv6? As deployment increases... |
| ZKP-FedEval: Verifiable and Privacy-Preserving Federated Evaluation using Zero-Knowledge Proofs | Daniel Commey, Benjamin Appiah, Griffith S. Klogo, Garth V. Crosby | 2025-07-15 | Download | Federated Learning (FL) enables collaborative model training on decentralized data without exposing raw data. However, the evaluation phase in FL may leak sensitive information through shared performa... |
| JamShield: A Machine Learning Detection System for Over-the-Air Jamming Attacks | Ioannis Panitsas, Yagmur Yigit, Leandros Tassiulas, Leandros Maglaras, Berk Canberk | 2025-07-15 | Download | Wireless networks are vulnerable to jamming attacks due to the shared communication medium, which can severely degrade performance and disrupt services. |
| Resilient Time-Sensitive Networking for Industrial IoT: Configuration and Fault-Tolerance Evaluation | Mohamed Seliem, Dirk Pesch, Utz Roedig, Cormac Sreenan | 2025-07-15 | Download | Time-Sensitive Networking (TSN) is increasingly adopted in industrial systems to meet strict latency, jitter, and reliability requirements. However, evaluating TSN's fault tolerance under realistic fa... |
| An Agentic Flow for Finite State Machine Extraction using Prompt Chaining | Fares Wael, Youssef Maklad, Ali Hamdi, Wael Elsersy | 2025-07-15 | Download | Finite-State Machines (FSMs) are critical for modeling the operational logic of network protocols, enabling verification, analysis, and vulnerability discovery. |
| PRATA: A Framework to Enable Predictive QoS in Vehicular Networks via Artificial Intelligence | Federico Mason, Tommaso Zugno, Matteo Drago, Marco Giordani, Mate Boban, Michele Zorzi | 2025-07-15 | Download | Predictive Quality of Service (PQoS) makes it possible to anticipate QoS changes, e.g., in wireless networks, and trigger appropriate countermeasures to avoid performance degradation. |
| Improving Wi-Fi Network Performance Prediction with Deep Learning Models | Gabriele Formis, Amanda Ericson, Stefan Forsstrom, Kyi Thar, Gianluca Cena, Stefano Scanzio | 2025-07-15 | Download | The increasing need for robustness, reliability, and determinism in wireless networks for industrial and mission-critical applications is the driver for the growth of new innovative methods. |
| White paper: Towards Human-centric and Sustainable 6G Services -- the fortiss Research Perspective | Rute C. Sofia, Hao Shen, Yuanting Liu, Severin Kacianka, Holger Pfeifer | 2025-07-15 | Download | As a leading research institute in software-intensive systems, fortiss is actively shaping the vision of Sixth Generation Mobile Communication (6G). |
| Graph-based Fingerprint Update Using Unlabelled WiFi Signals | Ka Ho Chiu, Handi Yin, Weipeng Zhuo, Chul-Ho Lee, S.-H. Gary Chan | 2025-07-15 | Download | WiFi received signal strength (RSS) environment evolves over time due to movement of access points (APs), AP power adjustment, installation and removal of APs, etc. |
| SIMCODE: A Benchmark for Natural Language to ns-3 Network Simulation Code Generation | Tasnim Ahmed, Mirza Mohammad Azwad, Salimur Choudhury | 2025-07-15 | Download | Large language models (LLMs) have demonstrated remarkable capabilities in code generation across various domains. However, their effectiveness in generating simulation scripts for domain-specific envi... |
| Arcturus: A Cloud Overlay Network for Global Accelerator with Enhanced Performance and Stability | Matthew Yang Liu, Chuang Chen, Pengcheng Lv, Hui Guo, Yanan Zhang, Cong Wang, Yusen Li, Zhenyu Li, Yu-Chu Tian | 2025-07-15 | Download | Global Accelerator (GA) services play a vital role in ensuring low-latency, high-reliability communication for real-time interactive applications. |
| LiLM-RDB-SFC: Lightweight Language Model with Relational Database-Guided DRL for Optimized SFC Provisioning | Parisa Fard Moshiri, Xinyu Zhu, Poonam Lohan, Burak Kantarci, Emil Janulewicz | 2025-07-15 | Download | Effective management of Service Function Chains (SFCs) and optimal Virtual Network Function (VNF) placement are critical challenges in modern Software-Defined Networking (SDN) and Network Function Vir... |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Oneiros: KV Cache Optimization through Parameter Remapping for Multi-tenant LLM Serving | Ruihao Li, Shagnik Pal, Vineeth Narayan Pullu, Prasoon Sinha, Jeeho Ryoo, Lizy K. John, Neeraja J. Yadwadkar | 2025-07-15 | Download | KV cache accelerates LLM inference by avoiding redundant computation, at the expense of memory. To support larger KV caches, prior work extends GPU memory with CPU memory via CPU-offloading. |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Scaling the memory wall using mixed-precision -- HPG-MxP on an exascale machine | Aditya Kashi, Nicholson Koukpaizan, Hao Lu, Michael Matheson, Sarp Oral, Feiyi Wang | 2025-07-15 | Download | Mixed-precision algorithms have been proposed as a way for scientific computing to benefit from some of the gains seen for artificial intelligence (AI) on recent high performance computing (HPC) platf... |
| Cyclic Data Streaming on GPUs for Short Range Stencils Applied to Molecular Dynamics | Martin Rose, Simon Homes, Lukas Ramsperger, Jose Gracia, Christoph Niethammer, Jadran Vrabec | 2025-07-15 | Download | In the quest for highest performance in scientific computing, we present a novel framework that relies on high-bandwidth communication between GPUs in a compute cluster. |
