Siavash Katebzadeh

About


Who I am

My name is Siavash, also known as MohammadReza. I was born in Shiraz, Iran, a city rich in Persian culture and history.

What I am

I currently serve as a postdoctoral research associate at the University of Edinburgh. I obtained my PhD in Computer Science from the University of Edinburgh under the guidance of Prof. Boris Grot. Before that, I received my M.Sc. and B.Sc. in Software Engineering from the Department of Computer Science and Engineering (CSE) at Shiraz University.

What I do

My current research centers on distributed systems, replication protocols, and performance predictability in datacenter networks. Previously, I worked on projects involving RISC-V on FPGAs and high-performance computing systems. For more details, please see my brief CV here.

In my free time, I enjoy photography, playing chess, reading manga, and watching anime!



News

  • Apr 01, 2025: I will give a talk on Reliable Replication Protocols on SmartNICs at Huawei Workshop. 🎤
  • Mar 31, 2025: I will give a talk on Reliable Replication Protocols on SmartNICs at PaPoC'25. 🎤
  • Oct 21, 2024: I will give a talk on Hardware-Accelerated Reliable Replication Protocols at Huawei Workshop. 🎤
  • Jul 01, 2024: I will be serving on the Artifact Evaluation Committee for EuroSys’25. 📝
  • Feb 01, 2024: Started working as a Postdoctoral Research Associate at the University of Edinburgh!
  • Jul 17, 2023: I joined Huawei Technologies Research & Development as a CPU architect!
  • May 23, 2023: I passed my PhD viva! New chapter begins! 🎓
  • May 01, 2023: I will be serving as the Web Co-Chair in HPCA’24.
  • May 01, 2023: I will be serving on the Artifact Evaluation Committee for HPCA’24. 📝
  • Mar 01, 2023: Saba has been accepted to EuroSys’23.
  • Jan 01, 2022: I will be serving as the Submission Co-Chair in MICRO’22.
  • Jan 15, 2021: Hermes has been selected as an IEEE Micro Top Pick Honorable Mention!
  • Jul 15, 2020: Bankrupt Covert Channel has been accepted to WOOT’20, co-located with USENIX Sec’20.
  • Feb 29, 2020: Smart Priority Assignment in Datacenter Networks has been accepted at the Yarch’20 workshop.
  • Jan 10, 2020: RPerf has been accepted to ISPASS’20.
  • Nov 11, 2019: Hermes has been accepted to ASPLOS’20.
  • Jul 10, 2019: I attended the ACACES’19 summer school. What an enjoyable week that was!
  • Dec 01, 2017: Our collaboration with Microsoft Research has begun.
  • Nov 05, 2017: My PhD at the University of Edinburgh has begun!
  • Jul 04, 2017: My MSc is finished and I graduated with Highest Distinction! 🎓
  • Apr 08, 2017: We took 5th place in Soccer Simulation 2D at IranOpen 2017, Tehran.
  • Mar 01, 2017: We took 1st place in Shiraz Startup Weekend (SWShiraz). Let the feast begin! ✨
  • Sep 30, 2016: I’m back from my internship at EPFL. My Master’s thesis is a joint project with the PARSA Lab.
  • Jul 04, 2016: My internship at EPFL has begun. They are awesome! ✨
  • Jul 02, 2016: We took 6th place in Soccer Simulation 2D at Robocup 2016, Leipzig.

Publications


Reliable Replication Protocols on SmartNICs

MR Siavash Katebzadeh, Antonios Katsarakis, and Boris Grot

abstract PDF

Today's datacenter applications rely on datastores that are required to provide high availability, consistency, and performance. To achieve high availability, these datastores replicate data across several nodes. Such replication is managed through a reliable protocol designed to keep the replicas consistent using a consistency model, even in the presence of faults. For several applications, strong consistency models are favored over weaker consistency models, as the former guarantee a more intuitive behavior for clients. Furthermore, to meet the demands of high online traffic, datastores must offer high throughput and low latency. However, delivering both strong consistency and high performance simultaneously can be challenging. Reliable replication protocols typically require multiple rounds of communication over the network stack, which introduces latency and increases the load on network resources. Moreover, these protocols consume considerable CPU resources, which impacts the overall performance of applications, especially in high-throughput environments. In this work, we aim to design a hardware-accelerated system for replication protocols to address these challenges. Our approach offloads the replication protocol onto SmartNICs, specialized network interface cards that can be programmed to implement custom logic directly on the NIC. By doing so, we aim to enhance performance while preserving strong consistency, all while saving valuable CPU cycles that can be used for applications' logic.

Saba: Rethinking Datacenter Network Allocation from Application’s Perspective

MR Siavash Katebzadeh, Paolo Costa, and Boris Grot

In the Eighteenth European Conference on Computer Systems (EuroSys), 2023

abstract PDF

Today’s datacenter workloads increasingly comprise distributed data-intensive applications, including data analytics, graph processing, and machine-learning training. These applications are bandwidth-hungry and often congest the datacenter network, resulting in poor network performance, which hurts application completion time. Efforts made to address this problem generally aim to achieve max-min fairness at the flow or application level. We observe that splitting the bandwidth equally among workloads is sub-optimal for aggregate application-level performance because various workloads exhibit different sensitivity to network bandwidth: for some workloads, even a small reduction in the available bandwidth yields a significant increase in completion time; for others, the completion time is largely insensitive to the available bandwidth. Building on this insight, we propose Saba, an application-aware bandwidth allocation framework that distributes network bandwidth based on application-level sensitivity. Saba combines ahead-of-time application profiling to determine bandwidth sensitivity with runtime bandwidth allocation using lightweight software support with no modifications to network hardware or protocols. Experiments with a 32-server hardware testbed show that Saba improves average completion time by 1.88× (and by 1.27× in a simulated 1,944-server cluster).
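
A toy sketch of the core idea in the abstract above: divide link bandwidth in proportion to profiled bandwidth sensitivity rather than equally. The workload names, sensitivity scores, and the simple proportional policy are illustrative assumptions, not Saba's actual profiling or allocation mechanism.

```python
# Toy sketch of sensitivity-proportional bandwidth allocation. The workload
# names, sensitivity scores, and the proportional policy are illustrative
# assumptions, not Saba's actual mechanism.

def allocate(link_capacity_gbps, sensitivities):
    """Split link bandwidth in proportion to each application's profiled
    bandwidth sensitivity, instead of equally across applications."""
    total = sum(sensitivities.values())
    return {app: link_capacity_gbps * s / total
            for app, s in sensitivities.items()}

# Hypothetical ahead-of-time profiling: a higher score means completion time
# degrades more sharply when bandwidth is taken away.
profiles = {"spark-sort": 0.9, "graph-analytics": 0.4, "ml-training": 0.2}

shares = allocate(100.0, profiles)
for app, gbps in shares.items():
    print(f"{app}: {gbps:.1f} Gbps")
```

An equal (max-min style) split would give each workload about 33.3 Gbps here; weighting by sensitivity shifts bandwidth toward the workload whose completion time suffers most from congestion.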

Hermes: A fast, fault-tolerant and linearizable replication protocol

Antonios Katsarakis, Vasilis Gavrielatos, MR Siavash Katebzadeh, and 4 more authors

In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2020

abstract PDF

Today’s datacenter applications are underpinned by datastores that are responsible for providing availability, consistency, and performance. For high availability in the presence of failures, these datastores replicate data across several nodes. This is accomplished with the help of a reliable replication protocol that is responsible for keeping the replicas strongly consistent even when faults occur. Strong consistency is preferred to weaker consistency models that cannot guarantee an intuitive behavior for the clients. Furthermore, to accommodate high demand at real-time latencies, datastores must deliver high throughput and low latency. This work introduces Hermes, a broadcast-based reliable replication protocol for in-memory datastores that provides both high throughput and low latency by enabling local reads and fully-concurrent fast writes at all replicas. Hermes couples logical timestamps with cache-coherence-inspired invalidations to guarantee linearizability, avoid write serialization at a centralized ordering point, resolve write conflicts locally at each replica (hence ensuring that writes never abort) and provide fault-tolerance via replayable writes. Our implementation of Hermes over an RDMA-enabled reliable datastore with five replicas shows that Hermes consistently achieves higher throughput than state-of-the-art RDMA-based reliable protocols (ZAB and CRAQ) across all write ratios while also significantly reducing tail latency. At 5% writes, the tail latency of Hermes is 3.6× lower than that of CRAQ and ZAB.
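
A highly simplified sketch of the invalidation-plus-timestamp write path described in the abstract above. The class and method names are my own, and message transport, membership, and fault handling are all omitted; this is a mental model, not the Hermes implementation.

```python
# Minimal, illustrative sketch of an invalidation-based replicated write in the
# spirit of Hermes. Names and structure are my own simplifications; failure
# handling and real message transport are omitted.

VALID, INVALID = "valid", "invalid"

class Replica:
    def __init__(self, node_id):
        self.node_id = node_id
        self.value = None
        self.ts = (0, node_id)   # (logical clock, writer id): totally ordered
        self.state = VALID

    def recv_inv(self, ts, value):
        # Invalidation: adopt the write iff its timestamp is higher; the
        # writer id breaks ties, so conflicts resolve locally and never abort.
        if ts > self.ts:
            self.ts, self.value, self.state = ts, value, INVALID
        return ts  # ack

    def recv_val(self, ts):
        # Validation: the write with timestamp ts is now safe to expose.
        if ts == self.ts:
            self.state = VALID

    def local_read(self):
        # Reads are served locally, but only from a valid replica.
        return self.value if self.state == VALID else None

def write(coordinator, replicas, value):
    # Any replica can coordinate a write: pick a fresh timestamp, broadcast
    # invalidations, wait for all acks, then broadcast validations.
    ts = (coordinator.ts[0] + 1, coordinator.node_id)
    acks = [r.recv_inv(ts, value) for r in replicas]
    assert all(a == ts for a in acks)
    for r in replicas:
        r.recv_val(ts)

nodes = [Replica(i) for i in range(3)]
write(nodes[0], nodes, "x = 1")
print([n.local_read() for n in nodes])
```

Ordering conflicting writes by (logical clock, writer id) is one simple way to mirror how invalidations let every replica converge on the same winner without a centralized ordering point.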

Evaluation of an InfiniBand Switch: Choose Latency or Bandwidth, but Not Both

MR Siavash Katebzadeh, Paolo Costa, and Boris Grot

In 2020 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2020

abstract PDF

Today’s cloud datacenters feature a large number of concurrently executing applications with diverse intradatacenter latency and bandwidth requirements. To remove the network as a potential performance bottleneck, datacenter operators have begun deploying high-end HPC-grade networks, such as InfiniBand (IB), which offer fully offloaded network stacks, remote direct memory access (RDMA) capability, and non-discarding links. While known to provide both low latency and high bandwidth for a single application, it is not clear how well such networks accommodate a mix of latency- and bandwidth-sensitive traffic that is likely in a real-world deployment. As a step toward answering this question, we develop a performance measurement tool for RDMA-based networks, RPerf, that is capable of precisely measuring the IB switch performance without hardware support. Using RPerf, we benchmark a rack-scale IB cluster in isolated and mixed-traffic scenarios. Our key finding is that the evaluated switch can provide either low latency or high bandwidth, but not both simultaneously in a mixed-traffic scenario. We evaluate several options to improve the latency-bandwidth trade-off and demonstrate that none are ideal.

Bankrupt covert channel: Turning network predictability into vulnerability

Dmitrii Ustiugov, Plamen Petrov, MR Siavash Katebzadeh, and 1 more author

In the 14th USENIX Workshop on Offensive Technologies (WOOT), 2020

abstract PDF

Recent years have seen a surge in the number of data leaks despite aggressive information-containment measures deployed by cloud providers. When attackers acquire sensitive data in a secure cloud environment, covert communication channels are a key tool to exfiltrate the data to the outside world. While the bulk of prior work focused on covert channels within a single CPU, such channels require the spy (transmitter) and the receiver to share the CPU, which might be difficult to achieve in a cloud environment with hundreds or thousands of machines. This work presents Bankrupt, a high-rate, highly clandestine channel that enables covert communication between the spy and the receiver running on different nodes in an RDMA network. In Bankrupt, the spy communicates with the receiver by issuing RDMA network packets to a private memory region allocated to it on a different machine (an intermediary). The receiver similarly allocates a separate memory region on the same intermediary, also accessed via RDMA. By steering RDMA packets to a specific set of remote memory addresses, the spy causes deep queuing at one memory bank, which is the finest addressable internal unit of main memory. This exposes a timing channel that the receiver can listen on by issuing probe packets to addresses mapped to the same bank but in its own private memory region. The Bankrupt channel delivers 74 Kb/s throughput in CloudLab’s public cloud while remaining undetectable to existing monitoring capabilities, such as CPU and NIC performance counters.
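
As a rough mental model, the bank-contention timing channel described above can be simulated in a few lines. All latency constants, the decision threshold, and the noise model below are invented for illustration; they are not measurements from the paper.

```python
# Toy simulation of the bank-contention timing channel: the spy encodes a bit
# by creating (or not creating) deep queuing at a shared memory bank, and the
# receiver decodes by timing probes to that bank. All numbers are made up.

import random

BASE_LATENCY_NS = 300    # assumed uncontended access latency to the bank
CONTENDED_NS = 900       # assumed latency when the spy causes deep bank queuing
THRESHOLD_NS = 600       # receiver's decision boundary

def probe_latency(spy_sends_one):
    """Latency of one receiver probe to its own region in the shared bank:
    slow when the spy hammers the same bank (bit 1), fast otherwise (bit 0)."""
    base = CONTENDED_NS if spy_sends_one else BASE_LATENCY_NS
    return base + random.uniform(-50, 50)   # measurement noise

def transmit(bits):
    # One probe interval per bit of the spy's message.
    return [probe_latency(bit == 1) for bit in bits]

def decode(latencies):
    # The receiver thresholds its measured probe latencies to recover the bits.
    return [1 if lat > THRESHOLD_NS else 0 for lat in latencies]

message = [1, 0, 1, 1, 0, 0, 1]
assert decode(transmit(message)) == message
```

The key property the toy model captures is that the channel uses only memory-bank queuing observed through network probes, so nothing in it touches the CPU or NIC counters that monitoring tools watch.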

Projects


Saba

Joint project with Microsoft Research

A performance-centric bandwidth allocation scheme designed for datacenter networks and optimized for bandwidth-hungry frameworks (e.g., Apache Spark).


RPerf

Joint project with Microsoft Research

An accurate performance measurement system designed for RDMA-based networks.

Docs Code


Hermes

A fast, fault-tolerant, and linearizable replication protocol.

Docs Code


Deterministic QEMU

Joint project with PARSA Lab, EPFL

Detection and elimination of various sources of non-deterministic behavior of QEMU.

Code


BrainF Compiler

A modern compiler for the BrainF language, written in Rust. The compiler aims to be one of the richest implementations of the language, employing various optimization techniques.

Docs Code


Bigkernel

Joint project with the University of Toronto

A high-performance CPU-GPU communication pipelining scheme for big-data-style applications, applied at compile time and developed using LLVM.


FTwitter

Joint project with DA Research Group at UNSW

A rich application programming interface (API) for Twitter data manipulation, both locally and online.

Experience


The University of Edinburgh

Since Dec 2023

Postdoctoral Research Associate

Project: Hardware-Accelerated Reliable Replication Protocols


Huawei Technologies Research & Development

Jul 2023 - Dec 2023

CPU Architect

Project: TLB optimizations on ARM cores.


The University of Edinburgh

2017-2022

Postgraduate Researcher

Project: Performance-Centric Bandwidth Allocation in Datacenters


Parallel Systems Architecture Lab @ EPFL

2016-2017

Intern

Projects: QFlex Timing, Deterministic QEMU


Software Systems Laboratory @ Shiraz University

2014-2015

Researcher

Project: Research on performance optimization on GPU and dealing with big data


CERT Center (Shiraz APA) @ Shiraz University

2012-2013

Intern

Project: Research on pattern matching algorithms for anti-malware software


Network Administration Team @ Shiraz University

2011-2012

Intern

Project: Implementation of a monitoring system on Windows and Linux stations of CSE department


The University of Edinburgh

2018-2021

Teaching Assistant

  • Operating Systems, Winter 2018, 2019
  • Introduction to Computer Systems, Fall 2018, 2019, 2020, 2021

Shiraz University

2011-2015

Teaching Assistant

  • Operating Systems, Spring 2013, 2014, Fall 2013, 2015
  • Principles of Programming, Spring 2011, 2013, 2015, Fall 2012
  • Operating Systems Laboratory, Fall 2014
  • Database Laboratory, Spring 2015
  • Machine Language & Assembly Programming, Fall 2013
  • Computer Architecture, Spring 2014