Architectures for High Performance, High Confidence, and Low Power

Continued hardware advances governed by Moore’s law provide ever faster and more plentiful transistors. However, the power consumption of these transistors and their susceptibility to transient faults are raising serious concerns. Our research aims to architect hardware that meets the three-pronged goals of performance, power, and reliability in a cost- and complexity-effective manner. It does this by identifying which mechanisms the hardware should provide and how software can exploit them.

Cloud Computing and Datacenters

The growing complexity of systems software and hardware is making system management a serious concern for many enterprises. Hardware and software procurement costs are relatively low. However, the Total Cost of Operation (TCO) now makes up a large part of the operating budget of data centers and supercomputing environments, driven by personnel costs and by energy consumption, power delivery, and cooling. Our research is developing self-* (self-tuning, self-healing, and self-managing) techniques for the autonomic management of high-end servers and clusters. The goal is to make these systems easier to use and deploy while optimizing their performance, power, and reliability in software.

Storage Systems and Intelligent Memory Hierarchy

Denser and faster silicon integration has only exacerbated the problem of moving data to and from the storage subsystem. As applications grow larger and more data-centric, the I/O subsystem has become a compelling target for optimization. Our research is examining techniques for bringing the right data to the right place at the right time in order to reduce I/O latencies. In addition, storage virtualization is becoming central to hosting centers as a way to isolate data-centric services from one another. We are examining feedback control for QoS management at all levels of the storage hierarchy. Finally, because the I/O subsystem accounts for a significant portion of the overall power budget in data centers, we are looking to optimize the energy consumption of large disk arrays.

Scalable Cluster and Parallel Computing

Clusters built from off-the-shelf workstations/PCs and off-the-shelf high-bandwidth networks are being used to host a diverse spectrum of demanding applications. Our research has examined numerous architectural, systems software, and application-level issues in this environment. Unlike many other projects with similar goals, we focus on the issues that arise when several users or applications execute concurrently on the cluster. These issues (scalability, Quality-of-Service, interference between users, etc.) are extremely important in the multi-user, time-shared server environments where clusters are deployed today. We are also examining new and emerging cluster applications whose requirements can differ significantly from those of traditional scientific applications.

Resource-constrained and Mobile Computing

With computing becoming ubiquitous, users expect computational capabilities from a wide spectrum of constrained devices. These range from PDAs and cell phones to numerous embedded applications (automobiles, medical devices, home appliances, etc.). Conventional system solutions are often inadequate for these devices, which are limited in computational power, memory and storage capacity, and battery power, and which may be disconnected from the network. We are investigating a gamut of computing issues on these constrained devices: hardware and architectures, compilation techniques, OS and runtime support, communication mechanisms, wireless networking, and application development. The goal is to let applications run seamlessly across diverse platforms.