2025-10-09

cs.AR - Architecture

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Production-Grade Local LLM Inference on Apple Silicon: A Comparative Study of MLX, MLC-LLM, Ollama, llama.cpp, and PyTorch MPS | Varun Rajesh, Om Jodhpurkar, Pooja Anbuselvan, Mantinder Singh, Ashok Jallepali, Shantanu Godbole, Pradeep Kumar Sharma, Hritvik Shrivastava | 2025-10-09 | We present a systematic, empirical evaluation of five local large language model (LLM) runtimes on Apple Silicon: MLX, MLC-LLM, llama.cpp, Ollama, and PyTorch MPS. |
| LOTION: Smoothing the Optimization Landscape for Quantized Training | Mujin Kwun, Depen Morwani, Chloe Huangyuan Su, Stephanie Gil, Nikhil Anand, Sham Kakade | 2025-10-09 | Optimizing neural networks for quantized objectives is fundamentally challenging because the quantizer is piece-wise constant, yielding zero gradients everywhere except at quantization thresholds wher... |
| SPAD: Specialized Prefill and Decode Hardware for Disaggregated LLM Inference | Hengrui Zhang, Pratyush Patel, August Ning, David Wentzlaff | 2025-10-09 | Large Language Models (LLMs) have gained popularity in recent years, driving up the demand for inference. LLM inference is composed of two phases with distinct characteristics: a compute-bound prefill... |
| Fletch: File-System Metadata Caching in Programmable Switches | Qingxiu Liu, Jiazhen Cai, Siyuan Sheng, Yuhui Chen, Lu Tang, Zhirong Shen, Patrick P. C. Lee | 2025-10-09 | Fast and scalable metadata management across multiple metadata servers is crucial for distributed file systems to handle numerous files and directories. |
| Efficient Deployment of CNN Models on Multiple In-Memory Computing Units | Eleni Bougioukou, Theodore Antonakopoulos | 2025-10-09 | In-Memory Computing (IMC) represents a paradigm shift in deep learning acceleration by mitigating data movement bottlenecks and leveraging the inherent parallelism of memory-based computations. |
| A Scalable FPGA Architecture With Adaptive Memory Utilization for GEMM-Based Operations | Anastasios Petropoulos, Theodore Antonakopoulos | 2025-10-09 | Deep neural network (DNN) inference relies increasingly on specialized hardware for high computational efficiency. This work introduces a field-programmable gate array (FPGA)-based dynamically configu... |
| VeriPy -- A New Python-Based Approach for SDR Pipelined/Unrolled Hardware Accelerator Generation | Yuqin Zhao, Linghui Ye, Haihang Xia, Luke Seed, Tiantai Deng | 2025-10-09 | Software-defined radio (SDR) plays an important role in the communication field by providing a flexible and customized communication system for different purposes according to the needs. |
| DL-PIM: Improving Data Locality in Processing-in-Memory Systems | Parker Hao Tian, Zahra Yousefijamarani, Alaa Alameldeen | 2025-10-09 | PIM architectures aim to reduce data transfer costs between processors and memory by integrating processing units within memory layers. Prior PIM architectures have shown potential to improve energy e... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Comparative Performance Analysis of Modern NoSQL Data Technologies: Redis, Aerospike, and Dragonfly | Deep Bodra, Sushil Khairnar | 2025-10-09 | The rise of distributed applications and cloud computing has created a demand for scalable, high-performance key-value storage systems. This paper presents a performance evaluation of three prominent ... |
| Maple: A Multi-agent System for Portable Deep Learning across Clusters | Molang Wu, Zhao Zhang | 2025-10-09 | Training deep learning (DL) models across Graphics Processing Unit (GPU) clusters is technically challenging. One aspect is that users have to compose command lines to adapt to the heterogeneous launc... |
| Reinforcement Learning-Driven Edge Management for Reliable Multi-view 3D Reconstruction | Motahare Mounesan, Sourya Saha, Houchao Gan, Md. Nurul Absur, Saptarshi Debroy | 2025-10-09 | Real-time multi-view 3D reconstruction is a mission-critical application for key edge-native use cases, such as fire rescue, where timely and accurate 3D scene modeling enables situational awareness a... |
| Man-Made Heuristics Are Dead. Long Live Code Generators! | Rohit Dwivedula, Divyanshu Saxena, Aditya Akella, Swarat Chaudhuri, Daehyeok Kim | 2025-10-09 | Policy design for various systems controllers has conventionally been a manual process, with domain experts carefully tailoring heuristics for the specific instance in which the policy will be deploye... |
| Are Voters Willing to Collectively Secure Elections? Unraveling a Practical Blockchain Voting System | Zhuolun Li, Haluk Sonmezler, Faiza Shirazi, Febin Shaji, Tymoteusz Mroczkowski, Dexter Lardner, Matthew Alain Camus, Evangelos Pournaras | 2025-10-09 | Ensuring ballot secrecy is critical for fair and trustworthy electronic voting systems, yet achieving strong secrecy guarantees in decentralized, large-scale elections remains challenging. |
| SPAD: Specialized Prefill and Decode Hardware for Disaggregated LLM Inference | Hengrui Zhang, Pratyush Patel, August Ning, David Wentzlaff | 2025-10-09 | Large Language Models (LLMs) have gained popularity in recent years, driving up the demand for inference. LLM inference is composed of two phases with distinct characteristics: a compute-bound prefill... |
| Investigating Matrix Repartitioning to Address the Over- and Undersubscription Challenge for a GPU-based CFD Solver | Gregor Olenik, Marcel Koch, Hartwig Anzt | 2025-10-09 | Modern high-performance computing (HPC) increasingly relies on GPUs, but integrating GPU acceleration into complex scientific frameworks like OpenFOAM remains a challenge. |
| DYNAMIX: RL-based Adaptive Batch Size Optimization in Distributed Machine Learning Systems | Yuanjun Dai, Keqiang He, An Wang | 2025-10-09 | Existing batch size selection approaches in distributed machine learning rely on static allocation or simplistic heuristics that fail to adapt to heterogeneous, dynamic computing environments. |
| Conceptual Design Report for FAIR Computing | Johan Messchendorp, Mohammad Al-Turany, Volker Friese, Thorsten Kollegger, Bastian Loeher, Jochen Markert, Andrew Mistry, Thomas Neff, Adrian Oeftiger, Michael Papenbrock, Stephane Pietri, Shahab Sanjari, Tobias Stockmanns | 2025-10-09 | This Conceptual Design Report (CDR) presents the plans of the computing infrastructure for research at FAIR, Darmstadt, Germany. It presents the computing requirements of the various research groups, ... |
| Energy-Efficient Maximal Independent Sets in Radio Networks | Dominick Banasik, Varsha Dani, Fabien Dufoulon, Aayush Gupta, Thomas P. Hayes, Gopal Pandurangan | 2025-10-09 | The maximal independent set (MIS) is one of the most fundamental problems in distributed computing, and it has been studied intensively for over four decades. |
| pyGinkgo: A Sparse Linear Algebra Operator Framework for Python | Keshvi Tuteja, Gregor Olenik, Roman Mishchuk, Yu-Hsiang Tsai, Markus Götz, Achim Streit, Hartwig Anzt, Charlotte Debus | 2025-10-09 | Sparse linear algebra is a cornerstone of many scientific computing and machine learning applications. Python has become a popular choice for these applications due to its simplicity and ease of use. |
| Distributed Resource Selection for Self-Organising Cloud-Edge Systems | Quentin Renau, Amjad Ullah, Emma Hart | 2025-10-09 | This paper presents a distributed resource selection mechanism for diverse cloud-edge environments, enabling dynamic and context-aware allocation of resources to meet the demands of complex distribute... |
| Towards Energy-Efficient Serverless Computing with Hardware Isolation | Natalie Carl, Tobias Pfandzelter, David Bermbach | 2025-10-09 | Serverless computing provides just-in-time infrastructure provisioning with rapid elasticity and a finely-grained pricing model. As full control of resource allocation is in the hands of the cloud pro... |
| A Multi-Simulation Bridge for IoT Digital Twins | Marco Picone, Samuele Burattini, Marco Melloni, Prasad Talasila, Davide Ziglioli, Matteo Martinelli, Nicola Bicocchi, Alessandro Ricci, Peter Gorm Larsen | 2025-10-09 | The increasing capabilities of Digital Twins (DTs) in the context of the Internet of Things (IoT) and Industrial IoT (IIoT) call for seamless integration with simulation platforms to support system de... |
| BlockSDN: Towards a High-Performance Blockchain via Software-Defined Cross Networking optimization | Wenyang Jia, Jingjing Wang, Ziwei Yan, Xiangli Peng, Guohui Yuan | 2025-10-09 | The scalability of blockchain systems is constrained by inefficient P2P broadcasting, as most existing optimizations focus only on the logical layer without considering physical network conditions. |
| A Semantic Model for Audit of Cloud Engines based on ISO/IEC TR 3445:2022 | Morteza Sargolzaei Javan | 2025-10-09 | Cloud computing has become the foundation of modern digital infrastructure, yet the absence of a unified architectural and compliance framework impedes interoperability, auditability, and robust secur... |
| When Light Bends to the Collective Will: A Theory and Vision for Adaptive Photonic Scale-up Domains | Vamsi Addanki | 2025-10-09 | As chip-to-chip silicon photonics gain traction for their bandwidth and energy efficiency, collective communication has emerged as a critical bottleneck in scale-up systems. |
| From Tokens to Layers: Redefining Stall-Free Scheduling for LLM Serving with Layered Prefill | Gunjun Lee, Jiwon Kim, Jaiyoung Park, Younjoo Lee, Jung Ho Ahn | 2025-10-09 | Large Language Model (LLM) inference in production must meet stringent service-level objectives for both time-to-first-token (TTFT) and time-between-token (TBT) while maximizing throughput under fixed... |
| SketchGuard: Scaling Byzantine-Robust Decentralized Federated Learning via Sketch-Based Screening | Murtaza Rangwala, Farag Azzedin, Richard O. Sinnott, Rajkumar Buyya | 2025-10-09 | Decentralized Federated Learning enables privacy-preserving collaborative training without centralized servers but remains vulnerable to Byzantine attacks. |
| Decentralised Blockchain Management Through Digital Twins | Georgios Diamantopoulos, Nikos Tziritas, Rami Bahsoon, Georgios Theodoropoulos | 2025-10-09 | The necessity of blockchain systems to remain decentralised limits current solutions to blockchain governance and dynamic management, forcing a trade-off between control and decentralisation. |
| Adaptive Execution Scheduler for DataDios SmartDiff | Aryan Poduri | 2025-10-09 | We present an adaptive scheduler for a single differencing engine (SmartDiff) with two execution modes: (i) in-memory threads and (ii) Dask based parallelism. |
| FedQS: Optimizing Gradient and Model Aggregation for Semi-Asynchronous Federated Learning | Yunbo Li, Jiaping Gui, Zhihang Deng, Fanchao Meng, Yue Wu | 2025-10-09 | Federated learning (FL) enables collaborative model training across multiple parties without sharing raw data, with semi-asynchronous FL (SAFL) emerging as a balanced approach between synchronous and ... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Prioritizing Latency with Profit: A DRL-Based Admission Control for 5G Network Slices | Proggya Chakraborty, Aaquib Asrar, Jayasree Sengupta, Sipra Das Bit | 2025-10-09 | 5G networks enable diverse services such as eMBB, URLLC, and mMTC through network slicing, necessitating intelligent admission control and resource allocation to meet stringent QoS requirements while ... |
| Robust Heuristic Algorithm Design with LLMs | Pantea Karimi, Dany Rouhana, Pooria Namyar, Siva Kesava Reddy Kakarla, Venkat Arun, Behnaz Arzani | 2025-10-09 | We posit that we can generate more robust and performant heuristics if we augment approaches using LLMs for heuristic design with tools that explain why heuristics underperform and suggestions about h... |
| Curated Wireless Datasets for Aerial Network Research | Amir Hossein Fahim Raouf, Donggu Lee, Mushfiqur Rahman, Saad Masrur, Gautham Reddy, Cole Dickerson, Md Sharif Hossen, Sergio Vargas Villar, Anıl Gürses, Simran Singh, Sung Joon Maeng, Martins Ezuma, Christopher Roberts, Mohamed Rabeek Sarbudeen, Thomas J. Zajkowski, Magreth Mushi, Ozgur Ozdemir, Ram Asokan, Ismail Guvenc, Mihail L. Sichitiu, Rudra Dutta | 2025-10-09 | This Review consolidates publicly available aerial wireless measurement datasets collected using AERPAW. We organize signal-level, power-level, and KPI-level datasets under a unified taxonomy, harmoni... |
| Dynamic Features Adaptation in Networking: Toward Flexible training and Explainable inference | Yannis Belkhiter, Seshu Tirupathi, Giulio Zizzo, Merim Dzaferagic, John D. Kelleher | 2025-10-09 | As AI becomes a native component of 6G network control, AI models must adapt to continuously changing conditions, including the introduction of new features and measurements driven by multi-vendor dep... |
| Serv-Drishti: An Interactive Serverless Function Request Simulation Engine and Visualiser | Siddharth Agarwal, Maria A. Rodriguez, Rajkumar Buyya | 2025-10-09 | The rapid adoption of serverless computing necessitates a deeper understanding of its underlying operational mechanics, particularly concerning request routing, cold starts, function scaling, and reso... |
| BlockSDN: Towards a High-Performance Blockchain via Software-Defined Cross Networking optimization | Wenyang Jia, Jingjing Wang, Ziwei Yan, Xiangli Peng, Guohui Yuan | 2025-10-09 | The scalability of blockchain systems is constrained by inefficient P2P broadcasting, as most existing optimizations focus only on the logical layer without considering physical network conditions. |
| URLLC for 6G Enabled Industry 5.0: A Taxonomy of Architectures, Cross Layer Techniques, and Time Critical Applications | Abdikarim Mohamed Ibrahim, Rosdiadee Nordin, Yahya S. M. Khamayseh, Angela Amphawan, Muhammed Basheer Jasser | 2025-10-09 | The evolution from Industry 4.0 to Industry 5.0 introduces stringent requirements for ultra reliable low latency communication (URLLC) to support human centric, intelligent, and resilient industrial s... |
| When Light Bends to the Collective Will: A Theory and Vision for Adaptive Photonic Scale-up Domains | Vamsi Addanki | 2025-10-09 | As chip-to-chip silicon photonics gain traction for their bandwidth and energy efficiency, collective communication has emerged as a critical bottleneck in scale-up systems. |
| TDoA-Based Self-Supervised Channel Charting with NLoS Mitigation | Mohsen Ahadi, Omid Esrafilian, Florian Kaltenberger, Adeel Malik | 2025-10-09 | Channel Charting (CC) has emerged as a promising framework for data-driven radio localization, yet existing approaches often struggle to scale globally and to handle the distortions introduced by non-... |

cs.OS - Operating Systems

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Man-Made Heuristics Are Dead. Long Live Code Generators! | Rohit Dwivedula, Divyanshu Saxena, Aditya Akella, Swarat Chaudhuri, Daehyeok Kim | 2025-10-09 | Policy design for various systems controllers has conventionally been a manual process, with domain experts carefully tailoring heuristics for the specific instance in which the policy will be deploye... |
| Rethinking Provenance Completeness with a Learning-Based Linux Scheduler | Jinsong Mao, Benjamin E. Ujcich, Shiqing Ma | 2025-10-09 | Provenance plays a critical role in maintaining traceability of a system's actions for root cause analysis of security threats and impacts. Provenance collection is often incorporated into the referen... |

cs.PF - Performance

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Prioritizing Latency with Profit: A DRL-Based Admission Control for 5G Network Slices | Proggya Chakraborty, Aaquib Asrar, Jayasree Sengupta, Sipra Das Bit | 2025-10-09 | 5G networks enable diverse services such as eMBB, URLLC, and mMTC through network slicing, necessitating intelligent admission control and resource allocation to meet stringent QoS requirements while ... |
| pyGinkgo: A Sparse Linear Algebra Operator Framework for Python | Keshvi Tuteja, Gregor Olenik, Roman Mishchuk, Yu-Hsiang Tsai, Markus Götz, Achim Streit, Hartwig Anzt, Charlotte Debus | 2025-10-09 | Sparse linear algebra is a cornerstone of many scientific computing and machine learning applications. Python has become a popular choice for these applications due to its simplicity and ease of use. |