Ph.D. Department of Computer Science, University of Wisconsin, expected 2014.
M.S. Department of Computer Science, University of Wisconsin, 2010.
B.S. Department of Computer Science, University of Wisconsin, 2006.
My Ph.D. research has focused on two broad areas: full-text search in relational databases and executing relational queries in a Hadoop environment.
Many database management systems, both commercial and open-source, have support for storing and indexing text documents for fast retrieval as part of SQL queries. A number of these systems implement this feature by utilizing a specialized, purpose-built full-text storage format and execution engine; however, work by Grossman et al. in the 1990s showed how a full-text query engine could be constructed by utilizing the database’s standard relational processing engine. In my work, I evaluate how the performance characteristics of both of these approaches have changed after nearly two decades’ worth of innovation in the relational query engine, using both single-node and parallel RDBMSs.
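As a rough illustration of the relational approach (a minimal sketch, not the implementation evaluated in this work), an inverted index can be stored as an ordinary table and queried with plain SQL. The table name `postings`, its columns, the whitespace tokenizer, and the term-frequency score are all assumptions made for this example; SQLite stands in for a full RDBMS.

```python
import sqlite3


def build_index(conn, docs):
    """Store an inverted index as a plain relation: (term, doc_id, tf)."""
    conn.execute("CREATE TABLE postings (term TEXT, doc_id INTEGER, tf INTEGER)")
    conn.execute("CREATE INDEX postings_term ON postings (term)")
    for doc_id, text in docs.items():
        counts = {}
        # Hypothetical tokenizer: lowercase and split on whitespace.
        for token in text.lower().split():
            counts[token] = counts.get(token, 0) + 1
        conn.executemany(
            "INSERT INTO postings VALUES (?, ?, ?)",
            [(term, doc_id, tf) for term, tf in counts.items()],
        )


def search(conn, terms):
    """Return ids of documents containing every term, highest score first."""
    terms = sorted({t.lower() for t in terms})
    if not terms:
        return []
    placeholders = ",".join("?" for _ in terms)
    sql = (
        "SELECT doc_id, SUM(tf) AS score FROM postings "
        f"WHERE term IN ({placeholders}) "
        "GROUP BY doc_id "
        "HAVING COUNT(DISTINCT term) = ? "  # conjunctive (AND) semantics
        "ORDER BY score DESC, doc_id"
    )
    return [row[0] for row in conn.execute(sql, (*terms, len(terms)))]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_index(conn, {
        1: "relational query engine",
        2: "full text query engine query",
        3: "hadoop mapreduce",
    })
    print(search(conn, ["query", "engine"]))
```

Here the keyword query is just a selection, grouping, and sort over the `postings` relation, so it runs on the database's standard processing engine rather than on a purpose-built full-text engine.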
On the Hadoop side, there is a great deal of interest in connecting the relational world to the Hadoop distributed filesystem and MapReduce processing environment. Nearly all major commercial database vendors offer some level of Hadoop interoperability, and there’s growing interest in supporting this platform in the academic and open-source communities through projects like Apache Hive, Apache Pig, and HadoopDB. However, Hadoop’s MapReduce interface is not a comfortable match for complex relational processing because of its relatively high latency and its materialization of intermediate results, so these efforts are hindered by the work required to fit relational queries into the MapReduce world.
My work attempts to address this issue by developing Knot, a relational processing engine built on top of YARN, the next-generation job scheduler in Hadoop 2.0. Knot is a dedicated relational engine that allows relational queries to be formulated in terms of operator trees, with a great deal of flexibility in scheduling and pipelining of operators. By building on top of YARN, Knot maintains good compatibility with Hadoop, but provides a much more natural interface to execute relational operations.
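To make the idea of operator trees with pipelining concrete, here is a small single-process sketch. It is not Knot's interface, which is not described here; the class names `Scan`, `Select`, and `HashJoin` and the dictionary row format are hypothetical. Each operator pulls rows from its children through Python iterators, so the probe side of the join streams through the tree without being materialized between operators.

```python
class Scan:
    """Leaf operator: yields rows from an in-memory list of dicts."""

    def __init__(self, rows):
        self.rows = rows

    def __iter__(self):
        for row in self.rows:
            yield row


class Select:
    """Filters rows from its child one at a time (fully pipelined)."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def __iter__(self):
        for row in self.child:
            if self.predicate(row):
                yield row


class HashJoin:
    """Equi-join: builds a hash table on one input, streams the other."""

    def __init__(self, build, probe, build_key, probe_key):
        self.build = build
        self.probe = probe
        self.build_key = build_key
        self.probe_key = probe_key

    def __iter__(self):
        table = {}
        # The build side is the only input that must be consumed up front.
        for row in self.build:
            table.setdefault(row[self.build_key], []).append(row)
        # The probe side is pipelined: each row is joined and emitted at once.
        for row in self.probe:
            for match in table.get(row[self.probe_key], []):
                yield {**match, **row}


if __name__ == "__main__":
    users = [{"uid": 1, "name": "ann"}, {"uid": 2, "name": "bob"}]
    orders = [
        {"oid": 10, "uid": 1, "total": 5},
        {"oid": 11, "uid": 2, "total": 50},
        {"oid": 12, "uid": 1, "total": 70},
    ]
    plan = HashJoin(
        Scan(users),
        Select(Scan(orders), lambda r: r["total"] > 20),
        "uid",
        "uid",
    )
    for row in plan:
        print(row)
```

Expressing the same plan as MapReduce jobs would typically write the filtered orders to storage before the join reads them, which is the kind of intermediate materialization an operator-tree engine can avoid.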
Akanksha Baid, Ian Rae, Jiexing Li, AnHai Doan, Jeffrey F. Naughton: Toward Scalable Keyword Search over Relational Data. PVLDB 3(1): 140-149 (2010)
Akanksha Baid, Ian Rae, AnHai Doan, Jeffrey F. Naughton: Toward industrial-strength keyword search systems over relational data. ICDE 2010: 717-720
Jeffrey R. Ballard, Ian Rae, and Aditya Akella. 2010. Extensible and scalable network monitoring using OpenSAFE. INM/WREN'10.