Siavash Katebzadeh

About


Who I am

My name is Siavash, also known as MohammadReza. I was born in Shiraz, Iran, a city rich in Persian culture and history.

What I am

I currently serve as a postdoctoral research associate at the University of Edinburgh. I obtained my PhD in Computer Science from the University of Edinburgh under the guidance of Prof. Boris Grot. Before that, I received my M.Sc. and B.Sc. in Software Engineering from the Department of Computer Science and Engineering (CSE) at Shiraz University.

What I do

My current research centers on distributed systems, replication protocols, and performance predictability in datacenter networks. Previously, I worked on projects involving RISC-V on FPGAs and high-performance computing systems. For more details, please see my brief CV here.

In my free time, I enjoy photography, playing chess, reading manga, and watching anime!



News

  • Apr 01, 2025: I will give a talk on Reliable Replication Protocols on SmartNICs at Huawei Workshop. 🎤
  • Mar 31, 2025: I will give a talk on Reliable Replication Protocols on SmartNICs at PaPoC'25. 🎤
  • Oct 21, 2024: I will give a talk on Hardware-Accelerated Reliable Replication Protocols at Huawei Workshop. 🎤
  • Jul 01, 2024: I will be serving on the Artifact Evaluation Committee for EuroSys’25. 📝
  • Feb 01, 2024: Started working as a Postdoctoral Research Associate at the University of Edinburgh!
  • Jul 17, 2023: I joined Huawei Technologies Research & Development as a CPU architect!
  • May 23, 2023: I passed my PhD viva! New chapter begins! 🎓
  • May 01, 2023: I will be serving as the Web Co-Chair in HPCA’24.
  • May 01, 2023: I will be serving on the Artifact Evaluation Committee for HPCA’24. 📝
  • Mar 01, 2023: Saba has been accepted to EuroSys’23.
  • Jan 01, 2022: I will be serving as the Submission Co-Chair in MICRO’22.
  • Jan 15, 2021: Hermes has been selected as an IEEE Micro Top Pick Honorable Mention!
  • Jul 15, 2020: Bankrupt Covert Channel has been accepted to WOOT’20, co-located with USENIX Sec’20.
  • Feb 29, 2020: Smart Priority Assignment in Datacenter Networks has been accepted at the Yarch’20 workshop.
  • Jan 10, 2020: RPerf has been accepted to ISPASS’20.
  • Nov 11, 2019: Hermes has been accepted to ASPLOS’20.
  • Jul 10, 2019: I attended the ACACES’19 summer school. What an enjoyable week that was!
  • Dec 01, 2017: Our collaboration with Microsoft Research has begun.
  • Nov 05, 2017: My PhD at the University of Edinburgh has begun!
  • Jul 04, 2017: My MSc is finished and I graduated with Highest Distinction! 🎓
  • Apr 08, 2017: We took 5th place in Soccer Simulation 2D at IranOpen 2017, Tehran.
  • Mar 01, 2017: We took 1st place in Shiraz Startup Weekend (SWShiraz). Let the feast begin! ✨
  • Sep 30, 2016: I’m back from my internship at EPFL. My Master’s thesis is a joint project with the PARSA Lab.
  • Jul 04, 2016: My internship at EPFL has begun. They are awesome! ✨
  • Jul 02, 2016: We took 6th place in Soccer Simulation 2D at Robocup 2016, Leipzig.

Publications


Reliable Replication Protocols on SmartNICs

MR Siavash Katebzadeh, Antonios Katsarakis, and Boris Grot

abstract PDF

Today's datacenter applications rely on datastores that are required to provide high availability, consistency, and performance. To achieve high availability, these datastores replicate data across several nodes. Such replication is managed through a reliable protocol designed to keep the replicas consistent using a consistency model, even in the presence of faults. For several applications, strong consistency models are favored over weaker consistency models, as the former guarantee a more intuitive behavior for clients. Furthermore, to meet the demands of high online traffic, datastores must offer high throughput and low latency. However, delivering both strong consistency and high performance simultaneously can be challenging. Reliable replication protocols typically require multiple rounds of communication over the network stack, which introduces latency and increases the load on network resources. Moreover, these protocols consume considerable CPU resources, which impacts the overall performance of applications, especially in high-throughput environments. In this work, we aim to design a hardware-accelerated system for replication protocols to address these challenges. Our approach offloads the replication protocol onto SmartNICs, specialized network interface cards that can be programmed to implement custom logic directly on the NIC. By doing so, we aim to enhance performance while preserving strong consistency, all while saving valuable CPU cycles that can be used for applications' logic.

Saba: Rethinking Datacenter Network Allocation from Application’s Perspective

MR Siavash Katebzadeh, Paolo Costa, and Boris Grot

In the Eighteenth European Conference on Computer Systems (EuroSys), 2023

abstract PDF

Today’s datacenter workloads increasingly comprise distributed data-intensive applications, including data analytics, graph processing, and machine-learning training. These applications are bandwidth-hungry and often congest the datacenter network, resulting in poor network performance, which hurts application completion time. Efforts made to address this problem generally aim to achieve max-min fairness at the flow or application level. We observe that splitting the bandwidth equally among workloads is sub-optimal for aggregate application-level performance because various workloads exhibit different sensitivity to network bandwidth: for some workloads, even a small reduction in the available bandwidth yields a significant increase in completion time; for others, the completion time is largely insensitive to the available bandwidth. Building on this insight, we propose Saba, an application-aware bandwidth allocation framework that distributes network bandwidth based on application-level sensitivity. Saba combines ahead-of-time application profiling to determine bandwidth sensitivity with runtime bandwidth allocation using lightweight software support with no modifications to network hardware or protocols. Experiments with a 32-server hardware testbed show that Saba improves average completion time by 1.88× (and by 1.27× in a simulated 1,944-server cluster).
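
A toy sketch of the core idea in the abstract above: divide link bandwidth in proportion to profiled bandwidth sensitivity rather than equally. The workload names, sensitivity scores, and the simple proportional policy are illustrative assumptions, not Saba's actual profiling or allocation mechanism.

```python
# Toy sketch of sensitivity-proportional bandwidth allocation. The workload
# names, sensitivity scores, and the proportional policy are illustrative
# assumptions, not Saba's actual mechanism.

def allocate(link_capacity_gbps, sensitivities):
    """Split link bandwidth in proportion to each application's profiled
    bandwidth sensitivity, instead of equally across applications."""
    total = sum(sensitivities.values())
    return {app: link_capacity_gbps * s / total
            for app, s in sensitivities.items()}

# Hypothetical ahead-of-time profiling: a higher score means completion time
# degrades more sharply when bandwidth is taken away.
profiles = {"spark-sort": 0.9, "graph-analytics": 0.4, "ml-training": 0.2}

shares = allocate(100.0, profiles)
for app, gbps in shares.items():
    print(f"{app}: {gbps:.1f} Gbps")
```

An equal (max-min style) split would give each workload about 33.3 Gbps here; weighting by sensitivity shifts bandwidth toward the workload whose completion time suffers most from congestion.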

Hermes: A fast, fault-tolerant and linearizable replication protocol

Antonios Katsarakis, Vasilis Gavrielatos, MR Siavash Katebzadeh, and 4 more authors

In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2020

abstract PDF

Today’s datacenter applications are underpinned by datastores that are responsible for providing availability, consistency, and performance. For high availability in the presence of failures, these datastores replicate data across several nodes. This is accomplished with the help of a reliable replication protocol that is responsible for keeping the replicas strongly consistent even when faults occur. Strong consistency is preferred to weaker consistency models that cannot guarantee an intuitive behavior for the clients. Furthermore, to accommodate high demand at real-time latencies, datastores must deliver high throughput and low latency. This work introduces Hermes, a broadcast-based reliable replication protocol for in-memory datastores that provides both high throughput and low latency by enabling local reads and fully-concurrent fast writes at all replicas. Hermes couples logical timestamps with cache-coherence-inspired invalidations to guarantee linearizability, avoid write serialization at a centralized ordering point, resolve write conflicts locally at each replica (hence ensuring that writes never abort) and provide fault-tolerance via replayable writes. Our implementation of Hermes over an RDMA-enabled reliable datastore with five replicas shows that Hermes consistently achieves higher throughput than state-of-the-art RDMA-based reliable protocols (ZAB and CRAQ) across all write ratios while also significantly reducing tail latency. At 5% writes, the tail latency of Hermes is 3.6× lower than that of CRAQ and ZAB.
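
A highly simplified sketch of the invalidation-plus-timestamp write path described in the abstract above. The class and method names are my own, and message transport, membership, and fault handling are all omitted; this is a mental model, not the Hermes implementation.

```python
# Minimal, illustrative sketch of an invalidation-based replicated write in the
# spirit of Hermes. Names and structure are my own simplifications; failure
# handling and real message transport are omitted.

VALID, INVALID = "valid", "invalid"

class Replica:
    def __init__(self, node_id):
        self.node_id = node_id
        self.value = None
        self.ts = (0, node_id)   # (logical clock, writer id): totally ordered
        self.state = VALID

    def recv_inv(self, ts, value):
        # Invalidation: adopt the write iff its timestamp is higher; the
        # writer id breaks ties, so conflicts resolve locally and never abort.
        if ts > self.ts:
            self.ts, self.value, self.state = ts, value, INVALID
        return ts  # ack

    def recv_val(self, ts):
        # Validation: the write with timestamp ts is now safe to expose.
        if ts == self.ts:
            self.state = VALID

    def local_read(self):
        # Reads are served locally, but only from a valid replica.
        return self.value if self.state == VALID else None

def write(coordinator, replicas, value):
    # Any replica can coordinate a write: pick a fresh timestamp, broadcast
    # invalidations, wait for all acks, then broadcast validations.
    ts = (coordinator.ts[0] + 1, coordinator.node_id)
    acks = [r.recv_inv(ts, value) for r in replicas]
    assert all(a == ts for a in acks)
    for r in replicas:
        r.recv_val(ts)

nodes = [Replica(i) for i in range(3)]
write(nodes[0], nodes, "x = 1")
print([n.local_read() for n in nodes])
```

Ordering conflicting writes by (logical clock, writer id) is one simple way to mirror how invalidations let every replica converge on the same winner without a centralized ordering point.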

Evaluation of an InfiniBand Switch: Choose Latency or Bandwidth, but Not Both

MR Siavash Katebzadeh, Paolo Costa, and Boris Grot

In 2020 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2020

abstract PDF

Today’s cloud datacenters feature a large number of concurrently executing applications with diverse intradatacenter latency and bandwidth requirements. To remove the network as a potential performance bottleneck, datacenter operators have begun deploying high-end HPC-grade networks, such as InfiniBand (IB), which offer fully offloaded network stacks, remote direct memory access (RDMA) capability, and non-discarding links. While known to provide both low latency and high bandwidth for a single application, it is not clear how well such networks accommodate a mix of latency- and bandwidth-sensitive traffic that is likely in a real-world deployment. As a step toward answering this question, we develop a performance measurement tool for RDMA-based networks, RPerf, that is capable of precisely measuring the IB switch performance without hardware support. Using RPerf, we benchmark a rack-scale IB cluster in isolated and mixed-traffic scenarios. Our key finding is that the evaluated switch can provide either low latency or high bandwidth, but not both simultaneously in a mixed-traffic scenario. We evaluate several options to improve the latency-bandwidth trade-off and demonstrate that none are ideal.

Bankrupt covert channel: Turning network predictability into vulnerability

Dmitrii Ustiugov, Plamen Petrov, MR Siavash Katebzadeh, and 1 more author

In the 14th USENIX Workshop on Offensive Technologies (WOOT), 2020

abstract PDF

Recent years have seen a surge in the number of data leaks despite aggressive information-containment measures deployed by cloud providers. When attackers acquire sensitive data in a secure cloud environment, covert communication channels are a key tool to exfiltrate the data to the outside world. While the bulk of prior work focused on covert channels within a single CPU, such channels require the spy (transmitter) and the receiver to share the CPU, which might be difficult to achieve in a cloud environment with hundreds or thousands of machines. This work presents Bankrupt, a high-rate, highly clandestine channel that enables covert communication between the spy and the receiver running on different nodes in an RDMA network. In Bankrupt, the spy communicates with the receiver by issuing RDMA network packets to a private memory region allocated to it on a different machine (an intermediary). The receiver similarly allocates a separate memory region on the same intermediary, also accessed via RDMA. By steering RDMA packets to a specific set of remote memory addresses, the spy causes deep queuing at one memory bank, which is the finest addressable internal unit of main memory. This exposes a timing channel that the receiver can listen on by issuing probe packets to addresses mapped to the same bank but in its own private memory region. The Bankrupt channel delivers 74 Kb/s throughput in CloudLab’s public cloud while remaining undetectable to existing monitoring capabilities, such as CPU and NIC performance counters.
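
As a rough mental model, the bank-contention timing channel described above can be simulated in a few lines. All latency constants, the decision threshold, and the noise model below are invented for illustration; they are not measurements from the paper.

```python
# Toy simulation of the bank-contention timing channel: the spy encodes a bit
# by creating (or not creating) deep queuing at a shared memory bank, and the
# receiver decodes by timing probes to that bank. All numbers are made up.

import random

BASE_LATENCY_NS = 300    # assumed uncontended access latency to the bank
CONTENDED_NS = 900       # assumed latency when the spy causes deep bank queuing
THRESHOLD_NS = 600       # receiver's decision boundary

def probe_latency(spy_sends_one):
    """Latency of one receiver probe to its own region in the shared bank:
    slow when the spy hammers the same bank (bit 1), fast otherwise (bit 0)."""
    base = CONTENDED_NS if spy_sends_one else BASE_LATENCY_NS
    return base + random.uniform(-50, 50)   # measurement noise

def transmit(bits):
    # One probe interval per bit of the spy's message.
    return [probe_latency(bit == 1) for bit in bits]

def decode(latencies):
    # The receiver thresholds its measured probe latencies to recover the bits.
    return [1 if lat > THRESHOLD_NS else 0 for lat in latencies]

message = [1, 0, 1, 1, 0, 0, 1]
assert decode(transmit(message)) == message
```

The key property the toy model captures is that the channel uses only memory-bank queuing observed through network probes, so nothing in it touches the CPU or NIC counters that monitoring tools watch.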

Projects


Saba

Joint project with Microsoft Research

A performance-centric bandwidth allocation scheme designed for datacenter networks and optimized for bandwidth-hungry frameworks (e.g., Apache Spark).


RPerf

Joint project with Microsoft Research

An accurate performance measurement system designed for RDMA-based networks.

Docs Code


Hermes

A fast, fault-tolerant, and linearizable replication protocol.

Docs Code


Deterministic QEMU

Joint project with PARSA Lab, EPFL

Detection and elimination of various sources of non-deterministic behavior of QEMU.

Code


BrainF Compiler

A modern compiler for the BrainF language, written in Rust. The compiler aims to be one of the richest implementations of the language, employing various optimization techniques.

Docs Code


Bigkernel

Joint project with the University of Toronto

A high-performance CPU-GPU communication pipelining scheme for big-data-style applications, applied at compile time and developed using LLVM.


FTwitter

Joint project with DA Research Group at UNSW

A rich application programming interface (API) for Twitter data manipulation, both locally and online.

Experience


The University of Edinburgh

Since Dec 2023

Postdoctoral Research Associate

Project: Hardware-Accelerated Reliable Replication Protocols


Huawei Technologies Research & Development

Jul 2023 - Dec 2023

CPU Architect

Project: TLB optimizations on ARM cores.


The University of Edinburgh

2017-2022

Postgraduate Researcher

Project: Performance-Centric Bandwidth Allocation in Datacenters


Parallel Systems Architecture Lab @ EPFL

2016-2017

Intern

Projects: QFlex Timing, Deterministic QEMU


Software Systems Laboratory @ Shiraz University

2014-2015

Researcher

Project: Research on performance optimization on GPU and dealing with big data


CERT Center (Shiraz APA) @ Shiraz University

2012-2013

Intern

Project: Research on pattern matching algorithms for anti-malware software


Network Administration Team @ Shiraz University

2011-2012

Intern

Project: Implementation of a monitoring system on Windows and Linux stations of CSE department


The University of Edinburgh

2018-2021

Teaching Assistant

  • Operating Systems, Winter 2018, 2019
  • Introduction to Computer Systems, Fall 2018, 2019, 2020, 2021

Shiraz University

2011-2015

Teaching Assistant

  • Operating Systems, Spring 2013, 2014, Fall 2013, 2015
  • Principles of Programming, Spring 2011, 2013, 2015, Fall 2012
  • Operating Systems Laboratory, Fall 2014
  • Database Laboratory, Spring 2015
  • Machine Language & Assembly Programming, Fall 2013
  • Computer Architecture, Spring 2014