Izzat's research interests center on programming systems for emerging processor and memory technologies, an area that lies at the intersection of compilers, parallel programming, and computer architecture. His work is divided into the following main thrusts:

Thrust #1: Programming Systems for GPUs

The emergence of GPUs as general-purpose accelerators has provided opportunities for massive performance gains and energy savings, but at the cost of increased programming difficulty. Izzat's work aims to broaden GPU adoption by making it easier and more accessible to program GPUs for high performance. His initial research on GPU programming systems focused on performance portability, taking two different approaches. The first, MxPA (CGO’15), is a compiler from OpenCL to multi-core CPUs that achieves high performance by exploiting the vectorization opportunities that OpenCL exposes as well as locality-centric static scheduling of work-items, outperforming the Intel and AMD OpenCL stacks. The second, Tangram (MICRO’16a), is a high-level programming system for targeting different processors and processor generations from the same source code. Tangram is equipped with a smart compiler that performs codelet selection, composition, and tuning to synthesize kernels for different devices from simple code fragments, performing comparably to state-of-the-art vendor libraries.

More recently, his work has focused on features that make GPUs more autonomous within systems, such as unified memory, which lets GPUs access system memory without relying on CPUs to copy data, and dynamic parallelism, which lets GPUs launch device kernels without relying on CPUs for control assistance. Chai (ISPASS’17) is a taxonomy and benchmark suite that demonstrates how accelerators such as GPUs and FPGAs communicate and collaborate with host CPUs via unified memory. KLAP (MICRO’16b) is a compiler and software runtime that employ techniques to reduce dynamic parallelism launch overhead and to achieve better parallelism and resource utilization through more efficient thread block scheduling.
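One way to reduce dynamic parallelism launch overhead, and the intuition behind launch aggregation, can be illustrated with a minimal Python sketch: instead of every parent thread issuing its own small child-kernel launch, the per-parent child counts are combined (via a prefix sum) into a single batched launch. This is a simplified illustration of the general idea only, not KLAP's actual implementation; all function names are hypothetical.

```python
# Hypothetical sketch of launch aggregation for GPU dynamic parallelism.
# Not KLAP's implementation; it only models launch counts to show why
# one batched launch beats many per-parent launches.

def count_naive_launches(child_counts):
    """Each parent thread launches its own child kernel."""
    launches = 0
    threads = 0
    for c in child_counts:
        if c > 0:          # parents with no children launch nothing
            launches += 1  # one launch per parent: overhead scales with parents
            threads += c
    return launches, threads

def count_aggregated_launch(child_counts):
    """All parents contribute their children to one batched launch.

    A prefix sum over per-parent child counts gives each child thread
    an offset from which it can recover which parent it works for.
    """
    offsets = [0]
    for c in child_counts:
        offsets.append(offsets[-1] + c)
    total = offsets[-1]
    launches = 1 if total > 0 else 0  # a single launch amortizes the overhead
    return launches, total, offsets
```

For example, four parents with child counts [3, 0, 5, 2] would issue three separate launches naively, but only one aggregated launch covering the same ten child threads.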

Thrust #2: Programming Systems for Resistive Memories

Emerging resistive memory devices such as memristors are a promising technology both for building large byte-addressable non-volatile memories and for building neuromorphic in-memory accelerators that use memristors for computation. Izzat's work aims to provide the programming support needed to take advantage of these technologies.

Byte-addressable non-volatile memories based on emerging resistive memory technologies have enabled new programming paradigms centered around persistent objects. A key challenge is how to represent persistent objects so that they remain usable across processes. SpaceJMP (ASPLOS’16) addresses this challenge with operating system and compiler support for using multiple virtual address spaces per process. SAVI Objects (OOPSLA’17) enable sharing of polymorphic persistent objects via compiler techniques that represent virtual function table pointers in portable ways, so that the objects can be used across processes.

Beyond acting as storage devices, memristors can also be used as analog compute devices that provide fast and efficient matrix-vector multiplication when assembled into crossbars. This technology can be used to build highly efficient neuromorphic accelerators that are well suited to deep neural networks. Based on this technology, Izzat is currently involved in a multi-institution project to design a programmable memristor-based accelerator and a complete software stack to support it.
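The crossbar computation itself is simple to state: programming each cell's conductance to a matrix entry and applying voltages to the rows yields, by Ohm's law and Kirchhoff's current law, output currents that are the matrix-vector product. The following minimal Python sketch shows the idealized math only, ignoring device non-idealities such as wire resistance, noise, and limited conductance precision; the function name is illustrative.

```python
# Idealized model of a memristor crossbar performing an analog
# matrix-vector multiplication. Each cell's conductance G[i][j]
# encodes a matrix entry; applying voltages V[j] to the columns
# produces, by Ohm's law (I = G * V per cell) and Kirchhoff's
# current law (currents sum along each output wire), a current
# vector equal to the matrix-vector product G @ V.

def crossbar_mvm(conductances, voltages):
    """Return the output currents of an ideal crossbar: I = G @ V."""
    return [
        sum(g * v for g, v in zip(row, voltages))  # currents sum on each wire
        for row in conductances
    ]
```

The appeal is that this multiply-accumulate happens in the analog domain in a single step per crossbar, which is why such arrays are attractive for the dense matrix-vector products that dominate deep neural network inference.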

Thrust #3: Application Engagement

Application engagement is critical in programming systems research to ensure that these systems are developed not in a vacuum, but with an awareness of the challenges that end users face. Izzat has been involved in numerous application acceleration efforts; a few notable ones are listed here. He participated in developing Tiger (BMC Bioinformatics’12), a framework for DNA sequence assembly that parallelizes an inherently sequential algorithm using an iterative approach. He also participated in the development of a solver for nonlinear tomographic image reconstruction (IPDPS’18) that enables fast and accurate image reconstruction by capturing multiple scattering effects and using Multilevel Fast Multipole Methods. The latter work broke the record for the largest object reconstructed to date, both in terms of the number of unknowns and the computational resources used.