Important Dates

by April 15, 2012

Meeting May 8-9, 2012




Contact Info:

For questions please contact:

Chaitan Baru


CLDS
SDSC


NSF - National Science Foundation, Mellanox Technologies, Seagate, Brocade


Participant Bios

Dave B Anderson
Dave Anderson is Director of Strategic Planning for Seagate and has over 30 years of experience in the computer field. He was involved in the architecture and planning of Fibre Channel since its first proposal as a disc interface. He was original author and editor of the Object-Based Storage Device proposal submitted to the SCSI standards committee. Mr. Anderson was one of the original nine elected members of the SNIA (Storage Networking Industry Association) Technical Council and on the original steering committee for the first File and Storage Technology (FAST) Conference. He has been awarded 6 patents related to disc storage. He was most recently instrumental in the integration of cryptographic security services in hard drives. He is a member of ACM and the IEEE Computer Society.

Magdalena Balazinska
University of Washington
Magdalena Balazinska is an Assistant Professor in the department of Computer Science and Engineering at the University of Washington. Magdalena's research interests are broadly in the fields of databases and distributed systems. Her current research focuses on big-data analytics, sensor and scientific data management, and cloud computing. Magdalena holds a PhD from the Massachusetts Institute of Technology (2006). She is a Microsoft Research New Faculty Fellow (2007), received an NSF CAREER Award (2009), a 10-year most influential paper award (2010), an HP Labs Research Innovation Award (2009-2011), a Rogel Faculty Support Award (2006), a Microsoft Research Graduate Fellowship (2003-2005), and several best-paper awards (2002, 2010, and 2011).

Chaitan Baru
UC San Diego
Chaitan Baru is Director of the Center for Large-scale Data Systems research (CLDS) and a Research Scientist at the San Diego Supercomputer Center, UC San Diego. His technical interests are in the areas of scientific data management, large-scale data systems, data integration, data analytics, and parallel database systems. He has been involved in cyberinfrastructure projects across a range of science disciplines, e.g. earth sciences, ecological sciences, hydrology, earthquake engineering, biomedical sciences, and others. He has been at SDSC for the past 15 years. Prior to that, he was one of the group leads for DB2 Parallel Edition at IBM, where he also participated in the development of the TPC-D benchmark. He led a team at IBM that produced the first audited TPC-D result, a 100GB benchmark database on 100 nodes of an IBM SP-2.

Milind Bhandarkar
Milind Bhandarkar was the founding member of the team at Yahoo! that took Apache Hadoop from a 20-node prototype to a datacenter-scale production system, and has been contributing to and working with Hadoop since version 0.1.0. He started the Yahoo! Grid solutions team focused on training, consulting, and supporting hundreds of new migrants to Hadoop. Parallel programming languages and paradigms have been his area of focus for over 20 years. He worked at the Center for Development of Advanced Computing (C-DAC), National Center for Supercomputing Applications (NCSA), Center for Simulation of Advanced Rockets, Siebel Systems, Pathscale Inc. (acquired by QLogic), Yahoo! and LinkedIn. Currently, he is the Chief Architect at Greenplum Labs, a division of EMC.

Andrew Bond
Principal Software Engineer - Red Hat, Inc.
Andrew has been involved with benchmarking during his entire 23-year career. He has worked for NCR, Sequent Computer Systems, Compaq, HP, and now Red Hat in groups involved in hardware/software performance characterization and improvement. He has used a wide variety of benchmarks, published official benchmark results, and worked directly with industry consortia to develop industry standard benchmarks. He now represents Red Hat in both the Standard Performance Evaluation Corporation (SPEC) and the Transaction Processing Performance Council (TPC) industry consortia.

Dhruba Borthakur
Dhruba Borthakur is a core contributor to the Hadoop Distributed File System. He currently works for Facebook as part of the big data engineering team that uses Hadoop and Hive technologies. Dhruba also actively contributes code to Apache HBase. Earlier, he worked with various storage technologies at Yahoo Inc, Veritas Software and IBM Transarc Labs. He has a Master's degree in CS from the University of Wisconsin-Madison.

Michael J. Carey
University of California, Irvine
Michael J. Carey is currently a Bren Professor of Information and Computer Sciences at UC Irvine, where his current research interests are centered around data-intensive computing and scalable data management. Prior to rejoining academia in 2008, Carey worked at BEA Systems as the chief architect and an engineering director for BEA's AquaLogic Data Services Platform team. He also spent a number of years as a Professor at the University of Wisconsin-Madison, at IBM Almaden as a database researcher/manager, and as a Fellow at e-commerce software startup Propel Software. Carey is an ACM Fellow, a member of the National Academy of Engineering, and a recipient of the ACM SIGMOD Edgar F. Codd Innovations Award.

Vinoth Chandar
LinkedIn Corporation
Vinoth Chandar is a software engineer in the Distributed Data System group at LinkedIn, working on Voldemort. He has a master's degree in Computer Science from UT Austin. His interests are distributed systems and mobile computing.

Yanpei Chen
UC Berkeley/Cloudera
Yanpei Chen is a Ph.D. student at the University of California, Berkeley. His research focuses on workload-driven design and evaluation of large-scale Internet datacenter systems, and includes industrial collaborations with Cloudera, NetApp, and Facebook. He is a member of the Algorithms, Machines, and People Laboratory, and he holds a National Science Foundation Graduate Research Fellowship. He will graduate in May 2012 and work at Cloudera thereafter.

Craig Cutforth
Craig Cutforth completed his undergraduate electrical engineering studies at Rose-Hulman. Following this, he went on to receive his M.S. and Ph.D. from the University of Colorado, where his research was on adaptive control systems. His controls background prepared him well for starting in the servo controls group at Seagate Technology, a worldwide leader of disk drive systems. Craig worked in the servo group for ten years, where his responsibilities included understanding how track-to-track seeks impacted drive-level performance and leading the servo design team for multiple products. He just started his new position as a system lead on the emerging markets team and hopes to extend his knowledge of the disk drive system to include a better understanding of big data.

Stephen Daniel
Technical Director - NetApp
Stephen Daniel has spent the past 11 years working as a senior technologist at NetApp. His current responsibilities include architecting solutions for Hadoop and NoSQL scale-out analytic databases. Mr. Daniel represents NetApp on the Storage Performance Council (SPC), and has been a representative to the Transaction Processing Performance Council (TPC) in the past.

Hulya Emir Farinas
Hulya Farinas has extensive experience in the application of algorithmic approaches to complex problems in multiple verticals. Before joining Greenplum, she held positions at IBM and M-Factor, where she helped her customers make optimal business decisions under uncertainty by marrying machine learning algorithms with optimization routines. She is currently a principal data scientist at Greenplum, where she is the lead for the health care vertical. She holds a Ph.D. in operations research from the University of Florida.

Andries Engelbrecht
Hewlett-Packard Alliance
Andries Engelbrecht is an Architect in the Hewlett-Packard Alliance, Performance and Solutions group, focused on developing solutions and appliances for Data Management. This includes solutions for Hadoop and Relational Databases for Analytical purposes. Andries has been involved in developing numerous HP solutions for Business Intelligence with various software partners and customers, and utilizing different workloads and benchmarks to characterize these solutions.

Dan Ferber
Dan holds a Master's in Computer Systems from the University of St. Thomas, and has worked in testing, support, software development, and business development for Cray, SGI, Sun, Oracle, and now Whamcloud, Inc. As a storage vendor-neutral provider of Lustre and Lustre services, Dan and Whamcloud have an interest in seeing Lustre and storage system benchmarks of various vendors' scalable storage units that are used in conjunction with global parallel file systems. Dan's previous testing work, along with his current work related to Whamcloud customers and markets, is what interests him in this area. Dan is based in Eagan, MN.

John R. Galloway, Jr.
For the past five years I was an architect in the performance group for Actian (previously Ingres), looking at all aspects of performance analysis for both Ingres classic (OLTP) and VectorWise (BI), generally and for specific customer issues. For the last two years I was also the chair of the TPC-H technical subcommittee. Most of my 37 years in the industry has been spent building operating systems, file systems, database systems, and doing performance analysis at the system level, with a couple of minor excursions into cell phone software and robotics.

Ahmad Ghazal
Teradata Corporation
Ahmad Ghazal has worked for Teradata R&D for the past 15 years. For the past year, he has been a database architect with the Teradata Aster group. Prior to that, Ahmad worked as an architect with the Teradata query optimizer group. Ahmad earned his Ph.D. in computer science from the University of Illinois at Chicago. He is an inventor or co-inventor of 13 issued US patents and has published numerous papers in the database area.

Goetz Graefe
Hewlett-Packard Laboratories
Goetz Graefe is an HP Fellow researching database systems at Hewlett-Packard Laboratories, specifically robust query processing, query execution algorithms, automatic indexing, and transactional storage. Prior work includes academic research into database query optimization and product architecture for a major commercial database server. His best-known papers are surveys on query execution and on B-tree indexing.

Amarnath Gupta
Amarnath Gupta is a research scientist at the San Diego Supercomputer Center of UC San Diego. His Advanced Query Processing Lab specializes in semantic search engines, scientific data integration, ontology management, graph data management, and implementation techniques of the above in various architectural platforms including the cloud.

Ron Hawkins
Ron Hawkins is the director of industry relations at the University of California's San Diego Supercomputer Center, where he is responsible for developing industry partnerships and research collaborations in high performance computing. Mr. Hawkins is a technology industry veteran, having held VP-level management, engineering, and product development positions at companies such as SONY, SAIC, and Titan. Mr. Hawkins' technology background and interests include high performance and cloud computing, "big data," and data-intensive systems. Mr. Hawkins also volunteers as an entrepreneur-in-residence in the CONNECT Springboard entrepreneurship program, is a consultant or advisor to several early stage technology companies, and serves on the advisory board of the engineering school at the University of San Diego. He received the Master of Information Systems degree from Virginia Tech and the BSEE degree from the U.S. Naval Academy.

Jeff Jackson
HPC COE (Center of Expertise) - Geophysics Solution Lead - Shell
Responsible for setting standards and strategy for HPC and helping to align HPC activities across Shell (Infrastructure, Production and R&D Applications, Software Development). Focal point for Geophysics HPC and Infrastructure.

Hans-Arno Jacobsen
Hans-Arno Jacobsen is a professor of computer science and computer engineering. He directs and leads the research activities of the Middleware Systems Research Group. His research aims to ease the development of scalable, reliable, and secure ultra-large-scale distributed applications. In pursuit of these objectives, he engages in basic research on event processing, publish/subscribe, service-orientation, aspect-orientation, and green middleware. In research and development engagements with various companies, he pursues projects on large-scale business process management, service delivery models, service and infrastructure management, and e-energy. Arno has served as program committee member of various international conferences, including ICDCS, ICDE, Middleware, SIGMOD, OOPSLA and VLDB. He was the Program Chair of the 5th International Middleware Conference and the General Chair of the Inaugural International Conference on Distributed Event-Based Systems 2007. He is among the initiators of the DEBS conference series and the ( research portal. Further information is available from

Gabriele Jost
Advanced Micro Devices (AMD)
Gabriele Jost obtained her doctorate in Applied Mathematics from the University of Göttingen, Germany. Her background comprises a combination of industrial and academic experience in application parallelization and optimization, compiler technology and processor architectures. She has worked for various vendors (Suprenum GmbH, Thinking Machines Corporation, NEC, Sun Microsystems and Oracle) of high performance parallel computers in the areas of compiler technology, parallelization, optimization and benchmarking using customer codes and industry standard benchmarks such as SpecOMP. Her position as a Research Scientist at the NASA Ames Research Center in Moffett Field, California, USA, was focused on evaluating and enhancing tools for parallel program development and different parallel programming paradigms.
During her appointment as a Research Scientist at the Texas Advanced Computing Center of the University of Texas in Austin she worked on site at the Naval Postgraduate School in Monterey, CA, supporting TeraGrid usage for applications from the areas of Climate Modeling, Meteorology and Oceanography. The work involved handling large amounts of measured as well as simulated data. This made her appreciate the need for infrastructure and tools to handle large data sets and sparked her interest in Cloud Computing.
She is now working at AMD (Advanced Micro Devices) on systems performance optimization and performance metrics with a focus on Cloud Computing. She is also active on the SpecOMP development committee and has been active in the OpenMP Language committee with a focus on OpenMP accelerator support.

Mark Kelly
Convey Computer
Mr. Kelly has over 20 years of experience in developing and optimizing applications for High Performance Computing platforms and is currently a senior staff scientist at Convey Computer, focusing on acceleration of bioscience and big data applications. Prior to Convey, Mark was a master-level staff scientist with Hewlett-Packard, where he led an experienced team of engineers and developers. He holds US patents in software design, is published in IEEE journals, and is a senior-level ACM award recipient.

Paul Kent
Paul Kent is Vice President of Big Data initiatives at SAS, and was previously VP Platform R&D at SAS, where he led groups responsible for developing the SAS foundation and mid-tier technologies.
Paul joined SAS in 1984 and has contributed to the development of SAS software components including PROC SQL (principal author of query planner/optimizer), TCP/IP connectivity, the Output Delivery System (ODS) and more recently the Inside-Database and High-Performance initiatives implementing SAS mathematics inside MPP database machines.
Paul was educated at WITS in South Africa, graduating with a Bachelor of Commerce (with honors) followed by an almost complete MBA (interrupted to try a North American posting), and got his commercial introduction to using computers to make better business decisions in the gold division of Anglo American.

Onur Kocberber
Onur Kocberber is a third-year PhD candidate in the Parallel Systems Architecture Laboratory at EPFL, where he is advised by Prof. Babak Falsafi. He is the release co-manager of CloudSuite and a co-developer of the Flexus simulation framework. His research interests target utilizing dark silicon for emerging data-oriented services.

Aleksander Kolcz
Aleksander Kołcz is currently with Twitter, where he leads the User Modeling Team. His work is focused on applying Machine Learning and Data Mining techniques to modeling user interests and to preventing service abuse. He has 12 years of industrial R&D experience at Microsoft, AOL and Personalogy. He received his PhD in 1996 from the University of Manchester Institute of Science and Technology.

Dan Koren
Actian Corporation
Actian Corporation is an industry leader enabling corporations to take immediate action on big data through its revolutionary Action Apps™, Cloud Action Platform™ and Vectorwise Analytical Database™. Dan Koren has served the computer system and database industries far longer than he would like to admit, making every product he touched run a lot faster. He is responsible for Actian’s Performance Engineering Program. Dan is a certified trouble maker and rabble rouser who can easily talk his way into any situation. He enjoys acting as a catalyst, devil’s advocate and enabler for new ideas. Even his opinions have opinions!

Anu Krishnan
HPC COE & Innovation Labs Manager - Shell
Responsible for Shell's HPC COE, Advanced Sensing and Innovation Labs.

Paul Krneta
Paul Krneta is the CTO of BMMsoft. He served as CTO for Sybase IQ, where he designed IQ Multiplex (shared-disk MPP DW) and NonStopIQ, and designed two of the world's largest DWs. As Technical Director for Database Technology at Digital (DEC), he optimized Oracle to be the first 64-bit database capable of VLM ("Very Large Memory"), the in-memory database.

Stefan Manegold
Stefan Manegold (PhD, University of Amsterdam, 2002) is head of the Database Architectures research group at CWI in Amsterdam, The Netherlands. Manegold's research work comprises database architectures, query processing algorithms and data management on modern hardware, as well as leveraging column-store database technology for efficient and scalable XML / XQuery processing, with a particular focus on optimization, performance, benchmarking and testing. Manegold co-authored more than 40 scientific publications, and recently received the VLDB 2009 10-year Best Paper Award together with his co-authors Peter Boncz and Martin Kersten. Stefan Manegold is a core member of the developers team of the open-source column-oriented database system MonetDB, co-founder of the DaMoN workshop series (co-located with SIGMOD since 2005), and co-chair of the Repeatability and Workability Evaluation for SIGMOD 2009 and 2010.

Serge Mankovski
Research Staff Member, CA Labs
Serge Mankovski is a Research Staff Member with CA Labs. He is based in Ontario, Canada. He coordinates CA Labs research activities with Canadian universities. Serge has over 20 years of industry experience in operating systems, expert systems, machine learning, reasoning, telecommunication software, enterprise job scheduling and event-based enterprise integration. Serge joined CA Technologies from Cybermation, where he worked as a Technical Architect and Technology Evangelist. Serge has been involved in industry-sponsored research for over nine years. He co-authored 19 filed patent applications, and has 11 granted patents in the United States, the United Kingdom and Canada. Serge received a Master's degree in Automatic System Control from Kishinev Polytechnic Institute in Moldova.

Casey Miles
Casey Miles focuses on Ethernet solutions for High Performance Computing (HPC) and Big Data applications. He has been working at Brocade for the past 9 years, and prior to that he was a network engineer in the United States Air Force. He has served as Brocade's liaison to the annual Super Computing shows for the past 4 years and is a regular fixture of the SCiNet community. In the past year, he has worked on the Mira cluster at Argonne, the Mustang computer at Los Alamos National Labs, the Defense Supercomputer Resource Center at the US Air Force Research Labs, and most recently with the CSIRO HPC project in Perth, Australia. He is a UC Irvine MBA candidate for 2014 and lives with his wife and three small children in San Diego, CA.

Raghunath Nambiar
Raghunath Nambiar is an architect at Cisco's Data Center Group responsible for performance engineering and solution strategies. His current focus areas include scale-out and big data solutions. He has 18 years of technical accomplishments with significant expertise in computer system architecture and performance engineering. He has served on several industry standard committees for performance evaluation and benchmarking. He is a member of the board of directors of the Transaction Processing Performance Council (TPC) and chair of its International Conference Series on Performance Evaluation and Benchmarking. He has published three books and over 30 papers. Raghu holds master's degrees from the University of Massachusetts and Goa University, and completed an advanced project management program at Stanford University.

Owen O'Malley
Owen O'Malley is a software architect who has worked exclusively on Hadoop since the project's start. He was the first committer added to Hadoop and was the original chair of the Hadoop Project Management Committee. Owen was the technology lead for both MapReduce and the project to add security to Hadoop. In 2009, he optimized Hadoop to set the record for the Terasort, Gray Sort, and Minute Sort benchmarks. In July 2011, he helped co-found Hortonworks, which is accelerating development and adoption of Hadoop for the enterprise.

Ken Osterberg
Director, Enterprise PLM Portfolio Strategy - Seagate
Ken is a member of the Enterprise Product Line Management (PLM) team at Seagate and is responsible for the Enterprise roadmap strategy, product planning process, and long-term portfolio. Ken has 25+ years of high-tech engineering, marketing, and strategy experience. He holds an MBA in Strategy and Marketing from the Carlson School of Management, University of Minnesota, as well as MSEE and BSEE degrees from the University of Minnesota, where his focus was VLSI Engineering.

Scott Pearson
Director, High Performance Environments - Brocade
Scott Pearson is Director of High Performance Environments for Brocade. Scott has been very involved in the Linux Open Source Community for many years. Before joining Brocade, he served as Director Federal Sales at Linux Networks, where he collaborated with Sandia National Laboratories to architect and deliver the first ever InfiniBand Cluster along with numerous other Top500 HPC Systems. Prior to that, Scott was an early employee with start-up VA Linux Systems, directing the Datacenter Program. He was instrumental in the VA Linux record NASDAQ IPO in 1999. Scott holds a Bachelor of Science Degree from the University of Utah.

Beth Plale
Indiana University
Beth Plale is Director of the Data To Insight Center, Managing Director of Pervasive Technology Institute, and Professor of Computer Science in the School of Informatics and Computing at Indiana University. Professor Plale has broad research and governance interest in long term preservation and access to scientific data, and in enabling computational access to large-scale data for broader groups of researchers. Her specific research interests are in tools for metadata and provenance capture, data repositories, cyberinfrastructure for large-scale data analysis, and workflow systems. Plale has substantive experience in developing stable and useable data cyberinfrastructure.

Meikel Poess
Meikel Poess is a software developer in the Server Technology group at Oracle. His primary focus is on performance improvement and benchmarking of Oracle's RDBMS system for large decision support and business intelligence systems. He is also Oracle's representative to the Transaction Processing Performance Council (TPC), where he has held several positions, including chairman of the TPC-H, TPC-R and TPC-DS subcommittees. He architected TPC's latest decision support benchmark, TPC-DS. He is also on TPC's public relations subcommittee. In addition, he serves on the steering committee of the SPEC Research Group, which is a technical group within SPEC established to serve as a platform for collaborative research efforts in the areas of computer benchmarking, performance evaluation and experimental system analysis, fostering the interaction between industry and academia in the field.

Steven Puzio
Steven Puzio has been working in the high tech industry for over 10 years. At Brocade, he is Manager of OEM Systems Engineering, developing the IBM partnership through product qualification, technical training, and marketing development. He is also a technical lead for High Performance Computing network solutions for IBM. Prior to Brocade, Mr. Puzio worked at Force10 Networks, where he was a High Performance Computing Technical Architect. He was responsible for architectural review and technical sales development for engineers and sales teams in IBM e1350/iDataPlex, System P/X, GPFS, Deep Computing and BlueGene Systems. Steven has also worked with network integrators in the New York City area. He is a Brocade & Cisco Certified Network/Design Professional and has worked with various technologies from Brocade, Force10, Cisco, HP, 3Com and others to gain a strong understanding of solution integration. Mr. Puzio graduated from Stevens Institute of Technology with a Bachelor of Engineering in Computer Engineering. He currently resides in Raleigh, North Carolina.

Francois Raab
Francois Raab is President of InfoSizing. As the original author of TPC-C, Raab brings 25 years of experience in defining industry standard performance benchmarks and validating benchmark results. Prior to creating InfoSizing, he developed and taught classes on database internals and tuning.

Tilmann Rabl
MSRG - University of Toronto
Tilmann Rabl is a postdoctoral researcher at the Middleware Systems Research Group led by Prof. Dr. Hans-Arno Jacobsen at the University of Toronto. He finished his doctoral thesis on allocation and scaling in relational database systems in 2011 at the University of Passau. A significant part of his thesis was the development of the Parallel Data Generation Framework (PDGF). For this work he received a technical contribution award from the Transaction Processing Performance Council in 2011. His current work focuses on big data challenges in application performance management, in traffic monitoring and in power monitoring. He is involved in several benchmarking development efforts in the area of big data, ETL and data warehouses.

Nicholas Schork
Scripps Research Institute
Nicholas J. Schork, Ph.D. is Professor, Department of Molecular and Experimental Medicine, The Scripps Research Institute and Director of Biostatistics and Bioinformatics at the Scripps Translational Science Institute. His research focuses on quantitative aspects of biomedical science with an emphasis on human genetics and genomics. He is currently involved in a number of very large-scale human genomics projects, including the NIA-sponsored Longevity Consortium and the Stand-Up-To-Cancer Dream Team to develop genomically-guided therapeutic strategies for treating melanoma.

Shashi Shekhar
U Minnesota
Shashi Shekhar is a McKnight Distinguished University Professor at the University of Minnesota. For contributions to spatial databases, spatial data mining, and geographic information systems (GIS), he received the IEEE-CS Technical Achievement Award and was elected a Fellow of the IEEE and the AAAS. He co-authored a textbook on Spatial Databases, and co-edited an Encyclopedia of GIS. He is serving as a member of the Computing Community Consortium Council, a co-Editor-in-Chief of Geo-Informatica journal, and a program co-chair of G. I. Science Conference (2012). Earlier he served on National Academies' committees (Mapping Sciences, GEOINT Research Priorities, and GEOINT Workforce), and on the editorial board of IEEE Trans. on Knowledge and Data Eng. He also co-chaired the Symposium on Spatial and Temporal Databases and the ACM Conference on GIS.

Reza Taheri
I have been doing system performance analysis for 30 years, first at Bell Labs, then for 20 years at HP, and the last 5 at VMware. I have focused on tuning large systems, typically running databases and transaction processing workloads. I first published a TPC benchmark result in 1990, and too many to count since. I chair the TPC-V virtualization benchmark development subcommittee.

Nicholas Wakou
Nicholas is a Systems Engineer/Senior Consultant with the Cloud Computing team of the Dell Next Generation Compute Solutions Group. He has been actively involved in developing and running TPC benchmarks since the year 2000. He is Dell’s primary representative at the Transaction Processing Performance Council (TPC) and the SPECcloud development committee. He is a past Chair of the TPC Public Relations Committee and a past member of the TPC Technical Advisory Board (TAB).

Len Wyatt
Len Wyatt is a Principal Program Manager at Microsoft, working at the intersection of the Big Data ecosystem with the traditional Data Warehousing ecosystem. His background includes Data Warehousing, Business Intelligence, On-Line Analytical Processing, ETL and a dash of system performance. Len has a nasty habit of getting involved with unreasonable projects, such as creating the biggest OLAP cube ever, or setting the ETL World Record. Len is the instigator and chair of the TPC committee developing a Data Integration benchmark.

Jerry Zhao
Jerry Zhao is a Senior Staff Software Engineer at Google. He is the Tech Lead Manager of several projects on large-scale data processing and workflow management, including MapReduce. His team provides infrastructure support for applications ranging from interactive SQL queries on multi-TB data sets to multi-PB data crunching for web search.