
2025-03-20

cs.AR - Architecture

| Title | Authors | Date | Abstract |
| --- | --- | --- | --- |
| Revisiting DRAM Read Disturbance: Identifying Inconsistencies Between Experimental Characterization and Device-Level Studies | Haocong Luo, İsmail Emir Yüksel, Ataberk Olgun, A. Giray Yağlıkçı, Onur Mutlu | 2025-03-20 | Modern DRAM is vulnerable to read disturbance (e.g., RowHammer and RowPress) that significantly undermines the robust operation of the system. |
| Design and Implementation of an FPGA-Based Hardware Accelerator for Transformer | Richie Li, Sicheng Chen | 2025-03-20 | Transformer-based large language models (LLMs) rely heavily on intensive matrix multiplications for attention and feed-forward layers, with the Q, K, and V linear projections in the Multi-Head Self-At... |
| GauRast: Enhancing GPU Triangle Rasterizers to Accelerate 3D Gaussian Splatting | Sixu Li, Ben Keller, Yingyan Celine Lin, Brucek Khailany | 2025-03-20 | 3D intelligence leverages rich 3D features and stands as a promising frontier in AI, with 3D rendering fundamental to many downstream applications. |
| A Scalable and Robust Compilation Framework for Emitter-Photonic Graph State | Xiangyu Ren, Yuexun Huang, Zhiding Liang, Antonio Barbalace | 2025-03-20 | Quantum graph states are critical resources for various quantum algorithms, and also determine essential interconnections in distributed quantum computing. |
| Explainable AI-Guided Efficient Approximate DNN Generation for Multi-Pod Systolic Arrays | Ayesha Siddique, Khurram Khalil, Khaza Anuarul Hoque | 2025-03-20 | Approximate deep neural networks (AxDNNs) are promising for enhancing energy efficiency in real-world devices. One of the key contributors behind this enhanced energy efficiency in AxDNNs is the use o... |
| DSLUT: An Asymmetric LUT and its Automatic Design Flow Based on Practical Functions | Moucheng Yang, Kaixiang Zhu, Lingli Wang, Xuegong Zhou | 2025-03-20 | The conventional LUT is redundant since practical functions in real-world benchmarks only occupy a small proportion of all the functions. For example, there are only 3881 out of more than $10^{14}$ NP... |
| ALLMod: Exploring Area-Efficiency of LUT-based Large Number Modular Reduction via Hybrid Workloads | Fangxin Liu, Haomin Li, Zongwu Wang, Bo Zhang, Mingzhe Zhang, Shoumeng Yan, Li Jiang, Haibing Guan | 2025-03-20 | Modular arithmetic, particularly modular reduction, is widely used in cryptographic applications such as homomorphic encryption (HE) and zero-knowledge proofs (ZKP). |
| Physically Grounded Monocular Depth via Nanophotonic Wavefront Prompting | Bingxuan Li, Jiahao Wu, Yuan Xu, Zezheng Zhu, Yunxiang Zhang, Kenneth Chen, Yanqi Liang, Nanfang Yu, Qi Sun | 2025-03-20 | Depth foundation models offer strong learned priors for 3D perception but lack physical depth cues, leading to ambiguities in metric scale. We introduce a birefringent metalens -- a planar nanophotoni... |
| CATCH: a Cost Analysis Tool for Co-optimization of chiplet-based Heterogeneous systems | Alexander Graening, Jonti Talukdar, Saptadeep Pal, Krishnendu Chakrabarty, Puneet Gupta | 2025-03-20 | With the increasing prevalence of chiplet systems in high-performance computing applications, the number of design options has increased dramatically. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Date | Abstract |
| --- | --- | --- | --- |
| HiAER-Spike: Hardware-Software Co-Design for Large-Scale Reconfigurable Event-Driven Neuromorphic Computing | Gwenevere Frank, Gopabandhu Hota, Keli Wang, Abhinav Uppal, Omowuyi Olajide, Kenneth Yoshimoto, Leif Gibb, Qingbo Wang, Johannes Leugering, Stephen Deiss, Gert Cauwenberghs | 2025-03-20 | In this work, we present HiAER-Spike, a modular, reconfigurable, event-driven neuromorphic computing platform designed to execute large spiking neural networks with up to 160 million neurons and 40 bi... |
| Random-sketching Techniques to Enhance the Numerical Stability of Block Orthogonalization Algorithms for s-step GMRES | Ichitaro Yamazaki, Andrew J. Higgins, Erik G. Boman, Daniel B. Szyld | 2025-03-20 | We integrate random sketching techniques into block orthogonalization schemes needed for s-step GMRES. The resulting block orthogonalization schemes generate the basis vectors whose overall orthogonal... |
| Flowshop Machine Scheduling: Markov Modeling, Optimal Schedules and Heuristics | Samah A. M. Ghanem | 2025-03-20 | Flowshop machine scheduling has been of main interest in several applications where the timing of its processes plays a fundamental role in the utilization of system resources. |
| Graph of Effort: Quantifying Risk of AI Usage for Vulnerability Assessment | Anket Mehra, Andreas Aßmuth, Malte Prieß | 2025-03-20 | With AI-based software becoming widely available, the risk of exploiting its capabilities, such as high automation and complex pattern recognition, could significantly increase. |
| A parallel algorithm for the odd two-face shortest k-disjoint path problem | Srijan Chakraborty, Samir Datta | 2025-03-20 | The shortest Disjoint Path problem (SDPP) requires us to find pairwise vertex disjoint paths between k designated pairs of terminal vertices such that the sum of the path lengths is minimum. |
| RESFL: An Uncertainty-Aware Framework for Responsible Federated Learning by Balancing Privacy, Fairness and Utility in Autonomous Vehicles | Dawood Wasif, Terrence J. Moore, Jin-Hee Cho | 2025-03-20 | Autonomous vehicles (AVs) increasingly rely on Federated Learning (FL) to enhance perception models while preserving privacy. However, existing FL frameworks struggle to balance privacy, fairness, and... |
| Empirical Analysis of Privacy-Fairness-Accuracy Trade-offs in Federated Learning: A Step Towards Responsible AI | Dawood Wasif, Dian Chen, Sindhuja Madabushi, Nithin Alluru, Terrence J. Moore, Jin-Hee Cho | 2025-03-20 | Federated Learning (FL) enables collaborative model training while preserving data privacy; however, balancing privacy preservation (PP) and fairness poses significant challenges. |
| Distributed LLMs and Multimodal Large Language Models: A Survey on Advances, Challenges, and Future Directions | Hadi Amini, Md Jueal Mia, Yasaman Saadati, Ahmed Imteaj, Seyedsina Nabavirazavi, Urmish Thakker, Md Zarif Hossain, Awal Ahmed Fime, S. S. Iyengar | 2025-03-20 | Language models (LMs) are machine learning models designed to predict linguistic patterns by estimating the probability of word sequences based on large-scale datasets, such as text. |
| Dispersion is (Almost) Optimal under (A)synchrony | Ajay D. Kshemkalyani, Manish Kumar, Anisur Rahaman Molla, Gokarna Sharma | 2025-03-20 | The dispersion problem has received much attention recently in the distributed computing literature. In this problem, $k \leq n$ agents placed initially arbitrarily on the nodes of an $n$-node, $m$-edg... |
| The Merit of Simple Policies: Buying Performance With Parallelism and System Architecture | Mert Yildiz, Alexey Rolich, Andrea Baiocchi | 2025-03-20 | While scheduling and dispatching of computational workloads is a well-investigated subject, only recently has Google provided publicly a vast high-resolution measurement dataset of its cloud workloads... |
| iDynamics: A Configurable Emulation Framework for Evaluating Microservice Scheduling Policies under Controllable Cloud-Edge Dynamics | Ming Chen, Muhammed Tawfiqul Islam, Maria Rodriguez Read, Rajkumar Buyya | 2025-03-20 | This paper presents iDynamics, a configurable emulation framework that exposes these dynamics as controllable experimental factors while running real microservice code on a Kubernetes-based cloud-edge... |
| On the Effectiveness of the 'Follow-the-Sun' Strategy in Mitigating the Carbon Footprint of AI in Cloud Instances | Roberto Vergallo, Luís Cruz, Alessio Errico, Luca Mainetti | 2025-03-20 | 'Follow-the-Sun' (FtS) is a theoretical computational model aimed at minimizing the carbon footprint of computer workloads. It involves dynamically moving workloads to regions with cleaner energy sour... |
| SPIN: Accelerating Large Language Model Inference with Heterogeneous Speculative Models | Fahao Chen, Peng Li, Tom H. Luan, Zhou Su, Jing Deng | 2025-03-20 | Speculative decoding has been shown as an effective way to accelerate Large Language Model (LLM) inference by using a Small Speculative Model (SSM) to generate candidate tokens in a so-called speculat... |
| Prediction of Permissioned Blockchain Performance for Resource Scaling Configurations | Seungwoo Jung, Yeonho Yoo, Gyeongsik Yang, Chuck Yoo | 2025-03-20 | Blockchain is increasingly offered as blockchain-as-a-service (BaaS) by cloud service providers. However, configuring BaaS appropriately for optimal performance and reliability resorts to try-and-erro... |
| ATTENTION2D: Communication Efficient Distributed Self-Attention Mechanism | Venmugil Elango | 2025-03-20 | Transformer-based models have emerged as a leading architecture for natural language processing, natural language generation, and image generation tasks. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Date | Abstract |
| --- | --- | --- | --- |
| Ordered Topological Deep Learning: a Network Modeling Case Study | Guillermo Bernárdez, Miquel Ferriol-Galmés, Carlos Güemes-Palau, Mathilde Papillon, Pere Barlet-Ros, Albert Cabellos-Aparicio, Nina Miolane | 2025-03-20 | Computer networks are the foundation of modern digital infrastructure, facilitating global communication and data exchange. As demand for reliable high-bandwidth connectivity grows, advanced network m... |
| Comparative Analysis of Deep Learning Models for Real-World ISP Network Traffic Forecasting | Josef Koumar, Timotej Smoleň, Kamil Jeřábek, Tomáš Čejka | 2025-03-20 | Accurate network traffic forecasting is essential for Internet Service Providers (ISP) to optimize resources, enhance user experience, and mitigate anomalies. |
| Distributed Split Computing Using Diffusive Metrics for UAV Swarms | Talip Tolga Sarı, Gökhan Seçinti, Angelo Trotta | 2025-03-20 | In large-scale UAV swarms, dynamically executing machine learning tasks can pose significant challenges due to network volatility and the heterogeneous resource constraints of each UAV. |
| PromptMobile: Efficient Promptus for Low Bandwidth Mobile Video Streaming | Liming Liu, Jiangkai Wu, Haoyang Wang, Peiheng Wang, Zongming Guo, Xinggong Zhang | 2025-03-20 | Traditional video compression algorithms exhibit significant quality degradation at extremely low bitrates. Promptus emerges as a new paradigm for video streaming, substantially cutting down the bandw... |
| Sustainable Open-Data Management for Field Research: A Cloud-Based Approach in the Underlandscape Project | Augusto Ciuffoletti, Letizia Chiti | 2025-03-20 | Field-based research projects require a robust suite of ICT services to support data acquisition, documentation, storage, and dissemination. A key challenge lies in ensuring the sustainability of data... |
| Energy-Efficient Federated Learning and Migration in Digital Twin Edge Networks | Yuzhi Zhou, Yaru Fu, Zheng Shi, Howard H. Yang, Kevin Hung, Yan Zhang | 2025-03-20 | The digital twin edge network (DITEN) is a significant paradigm in the sixth-generation wireless system (6G) that aims to organize well-developed infrastructures to meet the requirements of evolving a... |
| Enhancing Physical Layer Security in Cognitive Radio-Enabled NTNs with Beyond Diagonal RIS | Wali Ullah Khan, Chandan Kumar Sheemar, Eva Lagunas, Symeon Chatzinotas | 2025-03-20 | Beyond diagonal reconfigurable intelligent surfaces (BD-RIS) have emerged as a transformative technology for enhancing wireless communication by intelligently manipulating the propagation environment. |
| Towards Agentic AI Networking in 6G: A Generative Foundation Model-as-Agent Approach | Yong Xiao, Guangming Shi, Ping Zhang | 2025-03-20 | The promising potential of AI and network convergence in improving networking performance and enabling new service capabilities has recently attracted significant interest. |

cs.PF - Performance

| Title | Authors | Date | Abstract |
| --- | --- | --- | --- |
| Flowshop Machine Scheduling: Markov Modeling, Optimal Schedules and Heuristics | Samah A. M. Ghanem | 2025-03-20 | Flowshop machine scheduling has been of main interest in several applications where the timing of its processes plays a fundamental role in the utilization of system resources. |
| A Dataset of Performance Measurements and Alerts from Mozilla (Data Artifact) | Mohamed Bilel Besbes, Diego Elias Costa, Suhaib Mujahid, Gregory Mierzwinski, Marco Castelluccio | 2025-03-20 | Performance regressions in software systems can lead to significant financial losses and degraded user satisfaction, making their early detection and mitigation critical. |
