2026-03-31
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Computer Architecture's AlphaZero Moment: Automated Discovery in an Encircled World | Karthikeyan Sankaralingam | 2026-03-31 | Download | The end of Moore's Law and Dennard scaling has fundamentally changed the economics of computer architecture. With transistor scaling delivering diminishing returns, architectural innovation is now the... |
| A Security-Aware Nonlinearity Study of FPGA-Based Time-to-Digital Converters for Quantum Key Distribution Systems | Kun Qin, Carsten Trinitis | 2026-03-31 | Download | Intrinsic nonlinearity in FPGA-based time-to-digital converters (TDCs) is often treated as a calibration issue and evaluated mainly through post-correction metrics. |
| SISA: A Scale-In Systolic Array for GEMM Acceleration | Luigi Altamura, Alessio Cicero, Mateo Vázquez Maceiras, Mohammad Ali Maleki, Pedro Trancoso | 2026-03-31 | Download | The currently dominant AI/ML workloads, such as Large Language Models (LLMs), rely on the efficient execution of General Matrix-Matrix Multiplication (GEMM) operations. |
| HLC: A High-Quality Lightweight Mezzanine Codec Featuring High-Throughput Palette | Chenlong He, Leilei Huang, Wei Li, Hanyang Cui, Zhijian Hao, Xiaoyang Zeng, Yibo Fan | 2026-03-31 | Download | Existing mezzanine image codecs lack specialized screen content coding tools and therefore struggle to maintain high image quality under bandwidth constraints, especially in areas with dense text. |
| CXLRAMSim v1.0: System-Level Exploration of CXL Memory Expander Cards | Karan Pathak, David Atienza, Marina Zapater | 2026-03-31 | Download | The growing demands in the training and inference of Large Language Models (LLMs) are accelerating the adoption of scale-up systems that extend server shared memory through the use of Compute Express ... |
| Deep Learning-Based Anomaly Detection in Spacecraft Telemetry on Edge Devices | Christopher Goetze, Tim Schlippe, Daniel Lakey | 2026-03-31 | Download | Spacecraft anomaly detection is critical for mission safety, yet deploying sophisticated models on-board presents significant challenges due to hardware constraints. |
| AP-DRL: A Synergistic Algorithm-Hardware Framework for Automatic Task Partitioning of Deep Reinforcement Learning on Versal ACAP | Enlai Li, Zhe Lin, Sharad Sinha, Wei Zhang | 2026-03-31 | Download | Deep reinforcement learning has demonstrated remarkable success across various domains. However, the tight coupling between training and inference processes makes accelerating DRL training an essentia... |
| From Physics to Surrogate Intelligence: A Unified Electro-Thermo-Optimization Framework for TSV Networks | Mohamed Gharib, Leonid Popryho, Inna Partin-Vaisband | 2026-03-31 | Download | High-density through-substrate vias (TSVs) enable 2.5D/3D heterogeneous integration but introduce significant signal-integrity and thermal-reliability challenges due to electrical coupling, insertion ... |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| From Skew to Symmetry: Node-Interconnect Multi-Path Balancing with Execution-time Planning for Modern GPU Clusters | Jinghan Yao, Kaushik Kandadi, Bharath Ramesh, Hari Subramoni, Dhabaleswar K. Panda | 2026-03-31 | Download | Modern GPU-based high-performance computing clusters offer unprecedented communication bandwidth through heterogeneous intra-node interconnects and inter-node networks. |
| MAC-Attention: a Match-Amend-Complete Scheme for Fast and Accurate Attention Computation | Jinghan Yao, Sam Adé Jacobs, Walid Krichene, Masahiro Tanaka, Dhabaleswar K Panda | 2026-03-31 | Download | Long-context decoding in LLMs is IO-bound: each token re-reads an ever-growing KV cache. Prior accelerations cut bytes via compression, which lowers fidelity, or selection/eviction, which restricts wh... |
| Source Known Identifiers: A Three-Tier Identity System for Distributed Applications | Duran Serkan Kılıç | 2026-03-31 | Download | Distributed applications need identifiers that satisfy storage efficiency, chronological sortability, origin metadata embedding, zero-lookup verifiability, confidentiality for external consumers, and ... |
| A Lightweight Hybrid Publish/Subscribe Event Fabric for IPC and Modular Distributed Systems | Dimitris Gkoulis | 2026-03-31 | Download | Modular software deployed on mini compute units in controlled distributed environments often needs two messaging paths: low-overhead in-process coordination and selective cross-node distribution. |
| Scalable AI-assisted Workflow Management for Detector Design Optimization Using Distributed Computing | Derek Anderson, Amit Bashyal, Markus Diefenthaler, Cristiano Fanelli, Wen Guan, Tanja Horn, Alex Jentsch, Meifeng Lin, Tadashi Maeno, Kei Nagai, Hemalata Nayak, Connor Pecar, Karthik Suresh, Fang-Ying Tsai, Anselm Vossen, Tianle Wang, Torre Wenaus | 2026-03-31 | Download | The Production and Distributed Analysis (PanDA) system, originally developed for the ATLAS experiment at the CERN Large Hadron Collider (LHC), has evolved into a robust platform for orchestrating larg... |
| A Precision Emulation Approach to the GPU Acceleration of Ab Initio Electronic Structure Calculations | Hang Liu, Junjie Li, Yinzhi Wang, Niraj K. Nepal, Yang Wang | 2026-03-31 | Download | This study explores the use of INT8-based emulation for accelerating traditional FP64-based HPC workloads on modern GPU architectures. Through SCILIB-Accel automatic BLAS offload tool for cache-cohere... |
| M3SA: Exploring Datacenter Performance and Climate-Impact with Multi- and Meta-Model Simulation and Analysis | Radu Nicolae, Dante Niewenhuis, Sacheendra Talluri, Alexandru Iosup | 2026-03-31 | Download | Datacenters are vital to our digital society, but consume a considerable fraction of global electricity and demand is projected to increase. To improve their sustainability and performance, we envisio... |
| Storing Less, Finding More: How Novelty Filtering Improves Cross-Modal Retrieval on Edge Cameras | Sherif Abdelwahab | 2026-03-31 | Download | Always-on edge cameras generate continuous video streams where redundant frames degrade cross-modal retrieval by crowding correct results out of top-k search. |
| Efficient Parallel Compilation and Profiling of Quantum Circuits at Large Scales | Jane Moore, Michael Hart, John McAllister | 2026-03-31 | Download | Compiling quantum circuits is a major bottleneck in quantum computing, and given the scale required in a few years, is likely to become infeasibly long. |
| Polynomial Time Local Decision Revisited | Laurent Feuilloley, Soumyadeep Paul, Ami Paz | 2026-03-31 | Download | We consider three classification systems for distributed decision tasks: With unbounded computation and certificates, defined by Balliu, D'Angelo, Fraigniaud, and Olivetti [JCSS'18], and with (two fla... |
| Exploration of Energy and Throughput Tradeoffs for Dataflow Networks | Abrarul Karim, Joachim Falk, Jürgen Teich | 2026-03-31 | Download | The introduction of dynamic power management strategies such as clock gating and power gating in dataflow networks has been shown to provide significant energy savings when applied during idle times. |
| Beyond Corner Patches: Semantics-Aware Backdoor Attack in Federated Learning | Kavindu Herath, Joshua Zhao, Saurabh Bagchi | 2026-03-31 | Download | Backdoor attacks on federated learning (FL) are most often evaluated with synthetic corner patches or out-of-distribution (OOD) patterns that are unlikely to arise in practice. |
| Downsides of Smartness Across Edge-Cloud Continuum in Modern Industry | Akhil Gupta Chigullapally, Sharvan Vittala, Razin Farhan Hussian, Mohsen Amini Salehi | 2026-03-31 | Download | The fast pace of modern AI is rapidly transforming traditional industrial systems into vast, intelligent and potentially unmanned autonomous operational environments driven by AI-based solutions. |
| 1.5 Million Messages Per Second on 3 Machines: Benchmarking and Latency Optimization of Apache Pulsar at Enterprise Scale | Muhamed Ramees Cheriya Mukkolakkal | 2026-03-31 | Download | This paper presents two independent contributions for Apache Pulsar practitioners. First, we validate 1,499,947 msg/s at 3.88 ms median publish latency on just three bare-metal Kubernetes nodes runnin... |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| From Skew to Symmetry: Node-Interconnect Multi-Path Balancing with Execution-time Planning for Modern GPU Clusters | Jinghan Yao, Kaushik Kandadi, Bharath Ramesh, Hari Subramoni, Dhabaleswar K. Panda | 2026-03-31 | Download | Modern GPU-based high-performance computing clusters offer unprecedented communication bandwidth through heterogeneous intra-node interconnects and inter-node networks. |
| Making Sense of AI Agents Hype: Adoption, Architectures, and Takeaways from Practitioners | Ruoyu Su, Matteo Esposito, Roberta Capuano, Rafiullah Omar, June Sallou, Henry Muccini, Davide Taibi | 2026-03-31 | Download | To support practitioners in understanding how agentic systems are designed in real-world industrial practice, we present a review of practitioner conference talks on AI agents. |
| GreenFLag: A Green Agentic Approach for Energy-Efficient Federated Learning | Theodora Panagea, Nikolaos Koursioumpas, Lina Magoula, Ramin Khalili | 2026-03-31 | Download | Progressing toward a new generation of mobile networks, a clear focus on integrating distributed intelligence across the system is observed to drive performance, autonomy, and real-time adaptability. |
| 6GAgentGym: Tool Use, Data Synthesis, and Agentic Learning for Network Management | Jiao Chen, Jianhua Tang, Xiaotong Yang, Zuohong Lv | 2026-03-31 | Download | Autonomous 6G network management requires agents that can execute tools, observe the resulting state changes, and adapt their decisions accordingly. |
| Mean Masked Autoencoder with Flow-Mixing for Encrypted Traffic Classification | Xiao Liu, Xiaowei Fu, Fuxiang Huang, Lei Zhang | 2026-03-31 | Download | Network traffic classification using self-supervised pre-training models based on Masked Autoencoders (MAE) has demonstrated a huge potential. |
| TrafficMoE: Heterogeneity-aware Mixture of Experts for Encrypted Traffic Classification | Qing He, Xiaowei Fu, Lei Zhang | 2026-03-31 | Download | Encrypted traffic classification is a critical task for network security. While deep learning has advanced this field, the occlusion of payload semantics by encryption severely challenges standard mod... |
| Multi-AUV Cooperative Target Tracking Based on Supervised Diffusion-Aided Multi-Agent Reinforcement Learning | Jiaao Ma, Chuan Lin, Guangjie Han, Shengchao Zhu, Zhenyu Wang, Chen An | 2026-03-31 | Download | In recent years, advances in underwater networking and multi-agent reinforcement learning (MARL) have significantly expanded multi-autonomous underwater vehicle (AUV) applications in marine exploratio... |
| TORCH: Characterizing Invalid Route Filtering via Tunnelled Observation | Renrui Tian, Yahui Li, Xia Yin, Han Zhang, Xingang Shi, Zhiliang Wang | 2026-03-31 | Download | To mitigate BGP prefix hijacking, the Resource Public Key Infrastructure (RPKI) provides prefix origin authentication via Route Origin Validation (ROV). |
| Needle in a Haystack: Tracking UAVs from Massive Noise in Real-World 5G-A Base Station Data | Chengzhen Meng, Chenming He, Yidong Jiang, Xiaoran Fan, Dequan Wang, Lingyu Wang, Jianmin Ji, Yanyong Zhang | 2026-03-31 | Download | The potential usage of UAVs in daily life has made monitoring them essential. However, existing systems for monitoring UAVs typically rely on cameras, LiDARs, or radars, whose limited sensing range or... |
| Enabling Programmable Inference and ISAC at the 6GR Edge with dApps | Michele Polese, Rajeev Gangula, Tommaso Melodia | 2026-03-31 | Download | The convergence of communication, sensing, and Artificial Intelligence (AI) in the Radio Access Network (RAN) offers compelling economic advantages through shared spectrum and infrastructure. |
| A Multi-Sensor Fusion Parking Barrier System with Lightweight Vision on Edge | Yuwen Zhu, Feiyang Qi, Zhengzhe Xiang | 2026-03-31 | Download | To address the challenges of simultaneously satisfying detection accuracy, edge real-time performance, low-power operation, and end-to-end business linkage in parking scenarios, this paper proposes an... |
| 1.5 Million Messages Per Second on 3 Machines: Benchmarking and Latency Optimization of Apache Pulsar at Enterprise Scale | Muhamed Ramees Cheriya Mukkolakkal | 2026-03-31 | Download | This paper presents two independent contributions for Apache Pulsar practitioners. First, we validate 1,499,947 msg/s at 3.88 ms median publish latency on just three bare-metal Kubernetes nodes runnin... |
| LoRaWAN Gateway Placement for Network Planning Using Ray Tracing-based Channel Models | Cláudio Modesto, Lucas Mozart, Glauco Gonçalves, Cleverson Nahum, Bruno Castro, Aldebaro Klautau | 2026-03-31 | Download | Network planning is a fundamental task in wireless communications, primarily focused on guaranteeing adequate coverage for every network device. |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Risk-Aware Batch Testing for Performance Regression Detection | Ali Sayedsalehi, Peter C. Rigby, Gregory Mierzwinski | 2026-03-31 | Download | Performance regression testing is essential in large-scale continuous-integration (CI) systems, yet executing full performance suites for every commit is prohibitively expensive. |
| An Empirical Study on How Architectural Topology Affects Microservice Performance and Energy Usage | Irena Ristova, Vincenzo Stoico | 2026-03-31 | Download | Microservice architectures form the backbone of modern software systems for their scalability, resilience, and maintainability, but their rise in cloud-native environments raises energy efficiency con... |
| A Precision Emulation Approach to the GPU Acceleration of Ab Initio Electronic Structure Calculations | Hang Liu, Junjie Li, Yinzhi Wang, Niraj K. Nepal, Yang Wang | 2026-03-31 | Download | This study explores the use of INT8-based emulation for accelerating traditional FP64-based HPC workloads on modern GPU architectures. Through SCILIB-Accel automatic BLAS offload tool for cache-cohere... |
| SysOM-AI: Continuous Cross-Layer Performance Diagnosis for Production AI Training | Yusheng Zheng, Wenan Mao, Shuyi Cheng, Fuqiu Feng, Guangshui Li, Zhaoyan Liao, Yongzhuo Huang, Zhenwei Xiao, Yuqing Li, Andi Quinn, Tao Ma | 2026-03-31 | Download | Performance diagnosis in production-scale AI training is challenging because subtle OS-level issues can trigger cascading GPU delays and network slowdowns, degrading training efficiency across thousan... |
| Closed-Loop Integrated Sensing, Communication, and Control for Efficient Drone Flight | Jingli Li, Yiyan Ma, Bo Ai, Wei Chen, Weijie Yuan, Qingqing Cheng, Tongyang Xu, Guoyu Ma, Mi Yang, Yunlong Lu, Wenwei Yue, Christos Masouros, Zhangdui Zhong | 2026-03-31 | Download | Low-altitude wireless networks (LAWN) require drones to follow specific trajectories controlled by ground base stations (GBSs). However, given complex low-altitude channel conditions and limited spect... |