System-Level Model Parallelism 

We are developing solutions based on graph partitioning that implement model parallelism, so that deep learning (DL) frameworks can scale across multiple GPUs. Unlike prior work, we design system-level solutions that are agnostic to the DL model. As a result, the same approach can be applied to different models without model-specific changes.

Accelerated Machine Learning

Because machine learning algorithms process complex data structures iteratively, performance optimizations play a crucial role: any inefficiency is paid again on every iteration. We develop performance optimizations and performance models for machine learning applications.