Educational Background
Ph.D., University of Michigan, 1976
M.S.E., University of Michigan, 1971
A.B., Colgate University, 1970
Employment
April 2008 to present, Technical Fellow
Microsoft Corporation, Madison, WI
April 2008 to present, John P. Morgridge Professor, Emeritus
Computer Sciences Department, University of Wisconsin-Madison
1999 to March 2008, John P. Morgridge Professor,
Computer Sciences Department, University of Wisconsin-Madison
1999-2004, Chair,
Computer Sciences Department, University of Wisconsin-Madison
1984-1999, Professor and Romnes Fellow,
Computer Sciences Department, University of Wisconsin-Madison
1982-1984, Associate Professor,
Computer Sciences Department, University of Wisconsin-Madison
1976-1982, Assistant Professor,
Computer Sciences Department, University of Wisconsin-Madison
Professional Societies & Honors
National Academy of Engineering, 1998-
Fellow, American Academy of Arts and Sciences, 2007-
ACM Fellow, 1995-
ACM Software Systems Award, 2009
IEEE Emanuel R. Piore Award, 2009
ACM SIGMOD Innovations Award, 1995
Talks
Polybase
One of the newest projects at GSL is Polybase, which tightly integrates SQL Server PDW with Hadoop. This talk provides a good overview of the goals of the project. At this point no decision has been made on whether to commercialize the Polybase prototype.
PASS Keynotes
Since I joined Microsoft in 2008, I have been asked each year to give a keynote talk at PASS – the annual SQL Server users group meeting. The talks have covered the following topics:
2011 – Big Data, What's the Big Deal, A talk on big data and Hadoop (file is 20MB)
2010 – SQL Query Optimization: Why is it So Hard to Get Right, Introduction to relational query optimization (file is 15MB)
2009 – From 1 to 1000 MIPS, A general talk on how technology trends have impacted the design of DB systems
2008 – Parallel Database Systems 101, An introduction to the key techniques used by parallel database systems
Feel free to use these PASS talks, in whole or in part. The talks are each about 75 minutes long. There is no copyright on the decks. All I ask is that if you “lift” slides for a talk, you acknowledge where you got them.
Publications
Below are a number of selected papers organized by research area and project. Additional papers can be found on the DBLP web site and the Wisconsin Database Group web site.
Hadoop and Big Data
A Comparison of Approaches to Large-Scale Data Analysis, (with Pavlo, A., Paulson, E., Rasin, A., Abadi, D., Madden, S., and M. Stonebraker), Proceedings of the 2009 SIGMOD Conference, Providence, R.I., May 2009.
MapReduce and Parallel DBMSs: Friends or Foes? (with Stonebraker, M., Abadi, D., Madden, S., Paulson, E., Pavlo, A., and A. Rasin), CACM, January 2010, Vol. 53, No. 1.
Clustera: An Integrated Computation and Data Management System (with Robinson, E., Shankar, S., Paulson, E., Naughton, J., Krioukov, A. and J. Royalty), Proceedings of the 2008 VLDB Conference, Auckland, NZ, August 2008.
Parallel Database Systems
Over the 32-year period that I was a professor at Wisconsin, we implemented three parallel database systems: DIRECT (1977-1984), Gamma (1984-1992) and Paradise (1993-1997). Unfortunately, I no longer have copies of the papers that are listed without links.
The following paper presents a high-level overview of the mechanisms used by today's commercial parallel database products.
Parallel Database Systems: The Future of Database Processing or a Passing Fad? (with J. Gray), Communications of the ACM, June, 1992.
DIRECT
The DIRECT project ran from 1977 until 1984. It was one of the first operational parallel database systems. Several versions of the system were built, starting with PDP 11/03s and ending with PDP 11/23 processors connected by a 1 megabit token ring for passing messages and a shared memory constructed using CCD chips.
DIRECT - A Multiprocessor Organization for Supporting Relational Data Base Management Systems, IEEE Transactions on Computers, Vol. C-28, No. 6, June 1979.
Query Execution in Direct, Proceedings of the 1979 SIGMOD Conference, Boston, MA, May 1979.
Implementation of the Database Machine DIRECT (with H. Boral, D. Friedland, N. Jarrell, and W. K. Wilkinson), IEEE Transactions on Software Engineering, Vol. SE-8, No. 6, November, 1982.
Gamma
The GAMMA project began in January 1984 and ran until late 1992, at which point the code was so broken from years of patching that we gave up. The first version of GAMMA became operational in the fall of 1985 on a collection of 20 VAX 11/750s connected by a 100 Mbit/second token ring constructed by Proteon for us. Later the system was ported to a 32-processor Intel iPSC-2 hypercube configured with one disk per processor.
GAMMA - A High Performance Dataflow Database Machine (with B. Gerber, G. Graefe, M. Heytens, K. Kumar, and M. Muralikrishna), Proceedings of the 1986 VLDB Conference, Japan, August 1986.
The GAMMA Database Machine Project (with S. Ghandeharizadeh, D. Schneider, H. Hsiao, A. Bricker, R. Rasmussen), IEEE Transactions on Knowledge and Data Engineering, Vol. 2, No. 1, March, 1990.
A Performance Analysis of the Gamma Database Machine (with S. Ghandeharizadeh and D. Schneider), Proceedings of the 1988 SIGMOD Conference, Chicago, Ill., June, 1988.
Multiprocessor Hash-Based Join Algorithms (with Bob Gerber), Proceedings of the 1985 VLDB Conference, Stockholm, Sweden, August, 1985.
A Performance Evaluation of Four Parallel Join Algorithms in a Shared-Nothing Multiprocessor Environment (with D. Schneider), Proceedings of the 1989 SIGMOD Conference, Portland, Oregon, May 1989.
A Comparison of Non-Equijoin Algorithms (with J. Naughton and D. Schneider), Proceedings of the 1991 VLDB Conference, Barcelona, Spain, August, 1991.
Parallel Sorting on a Shared-Nothing Architecture using Probabilistic Splitting (with J. Naughton and D. Schneider), Proceedings of the Parallel and Distributed Information Systems Conference, Miami Beach, Florida, December, 1991.
Practical Skew Handling in Parallel Joins (with J. Naughton, D. Schneider, and S. Seshadri), Proceedings of the 1992 Very Large Data Base Conference, Vancouver, Canada, August 1992.
Nested Loops Revisited (with J. Naughton and J. Burger), Proceedings of the Second International Conference on Parallel and Distributed Information Systems, San Diego, CA, January, 1993.
Tradeoffs in Processing Multi-Way Join Queries via Hashing in Multiprocessor Database Machines (with D. Schneider), Proceedings of the 1990 VLDB Conference, Brisbane, Australia, August, 1990.
Dynamic Memory Allocation for Multiple Query Workloads (with M. Mehta), Proceedings of the 1993 Very Large Data Base Conference, Dublin, Ireland, August 1993.
Managing Intra-Operator Parallelism in Parallel Database Systems (with M. Mehta), Proceedings of the 1995 VLDB Conference, Zurich, September 1995.
Chained Declustering: A New Availability Strategy for Multiprocessor Database Machines (with H. Hsiao), Proceedings of the 6th International Conference on Data Engineering, Los Angeles, CA, February 1990.
A Performance Study of Three High Availability Data Replication Strategies (with Hui-I Hsiao), Proceedings of the Parallel and Distributed Information Systems Conference, Miami Beach, Florida, December, 1991.
Paradise
Client-Server Paradise (with J. Patel, J. Luo, and J. Yu), Proceedings of the 1994 VLDB Conference, Chile, August 1994.
Building A Scalable GeoSpatial Database System: Technology, Implementation, and Evaluation (with J. Naughton, J. Patel, J. Yu, N. Kabra and a cast of dozens), Proceedings of the 1997 SIGMOD Conference, Tucson, Arizona, May, 1997.
Query Pre-Execution and Batching in Paradise: A Two-Pronged Approach to the Efficient Processing of Queries in Tape-Resident Data Sets (with JieBing Yu), Proceedings of the 9th International Conference on Scientific and Statistical Database Management, Olympia, Washington, August 1997.
Processing Satellite Images on Tertiary Storage: A Study of the Impact of Tile Size on Performance (with JieBing Yu), Proceedings of the 1996 NASA Conference on Mass Storage Systems, College Park, MD., Sept. 1996.
Partition Based Spatial Merge Join (with Jignesh Patel), Proceedings of the 1996 SIGMOD Conference, Montreal, Canada, June, 1996.
Benchmarking
Benchmarking Database Systems - A Systematic Approach (with D. Bitton and C. Turbyfill), Proceedings of the 1983 Very Large Database Conference, October 1983. Here is a link to a tar file that contains the benchmark queries and generator.
A Methodology for Database System Performance Evaluation (with H. Boral), Proceedings of the 1984 SIGMOD Conference, June, 1984.
The OO7 Benchmark (with M. Carey and J. Naughton), Proceedings of the 1993 SIGMOD Conference, Washington, D.C., May 1993.
The Bucky Object Relational Benchmark (with M. Carey, J. Naughton, M. Asgarian, J. Gehrke, D. Shah), Proceedings of the 1997 SIGMOD Conference, Tucson, Arizona, May, 1997.
Query Optimization
The EXODUS Optimizer Generator (with G. Graefe), Proceedings of the 1987 SIGMOD Conference, San Francisco, CA, May 1987.
Opt++ - an Object Oriented Approach to Query Optimization (with N. Kabra), VLDB Journal, November 1997.
Efficient Mid-Query Re-Optimization of SubOptimal Query Execution Plans (with N. Kabra), Proceedings of the 1998 SIGMOD Conference, Seattle, WA, June, 1998.
Buffer Pool Aware Query Optimization (with R. Ramamurthy), Proceedings of the 2005 CIDR Conference, Asilomar, CA, January 2005.
Proactive Re-Optimization (with Babu, S. and P. Bizarro), Proceedings of the SIGMOD 2005 Conference, Baltimore, MD, June 2005.
Object-Oriented Database Systems
Of Objects and Databases: A Decade of Turmoil (with M. Carey), Invited Paper, Proceedings of the 1996 VLDB Conference, Bombay, India, August, 1996.
Shoring Up Persistent Applications (with M. Carey, J. Naughton, M. Solomon, ...), Proceedings of the 1994 SIGMOD Conference, Minneapolis, Minn, May 1994.
QuickStore: A High Performance Mapped Object Store (with S. White), Proceedings of the 1994 SIGMOD Conference, Minneapolis, Minn, May 1994. Also in the "Best of SIGMOD 1994" issue, VLDB Journal, Vol. 4, No. 4, October 1995.
Implementing Crash Recovery in QuickStore: A Performance Study (with S. White), Proceedings of the 1995 SIGMOD Conference, San Francisco, CA, May 1995.
A Performance Study of Alternative Object Faulting and Pointer Swizzling Strategies (with Seth White), Proceedings of the 1992 Very Large Data Base Conference, Vancouver, Canada, August 1992.
A Study of Three Alternative Workstation-Server Architectures for Object Oriented Database Systems (with P. Futtersack, D. Maier, and F. Velez), Proceedings of the 1990 VLDB Conference, Brisbane, Australia, August, 1990.
The Architecture of the EXODUS Extensible DBMS (with M. Carey, D. Frank, G. Graefe, J. E. Richardson, E. J. Shekita and M. Muralikrishna), Proceedings of the International Workshop on Object Oriented Database Systems, Asilomar, CA. September, 1986.
The EXODUS Extensible DBMS Project: An Overview (with M. Carey, Graefe, G., Haight, D., Richardson, J., Schuh, D., Shekita, E., and Vandenberg, S.), in Readings in Object-Oriented Database Systems, S. Zdonik and D. Maier, eds., Morgan Kaufmann Publ. Co., 1989.
Object and File Management in the EXODUS Extensible Database System (with M. Carey, J. Richardson, and E. Shekita), Proceedings of the 1986 VLDB Conference, Japan, August 1986.
Storage Management for Objects in EXODUS (with Carey, M., Richardson, J., and Shekita, E.), in Object-Oriented Concepts, Applications, and Databases, W. Kim and F. Lochovsky, eds., Addison-Wesley Publishing Co., 1988.
Crash Recovery in Client-Server EXODUS (with M. Franklin, M. Zwilling, C. Tan, and M. Carey), Proceedings of the 1992 SIGMOD Conference, San Diego, CA, June 1992.