PilotScope

PilotScope is a middleware to bridge the gaps of deploying AI4DB (Artificial Intelligence for Databases) algorithms into actual database systems. It aims at hindering the underlying details of different databases so that an AI4DB driver could steer any database in a unified manner. By applying PilotScope, we obtain the following benefits:

  • The DB users could experience any AI4DB algorithm as a plug-in unit on their databases with little cost. The cloud computing service providers could operate and maintain AI4DB algorithms on their database products as a service to users. (More Convenient for Usage! πŸ‘πŸ‘πŸ‘)

  • The ML researchers could easily benchmark and iterate their AI4DB algorithms in practical scenarios. (Much Faster to Iterate! ⬆️⬆️⬆️)

  • The ML and DB developers are liberated from learning the details in other side. They could play their own strengths to write the codes in their own sides. (More Freedom to Develop! πŸ„β€β™€οΈπŸ„β€β™€οΈπŸ„β€β™€οΈ)

  • All contributors could extend PilotScope to support more AI4DB algorithms, more databases and more functions. (We highly encourage this! 😊😊😊)

For more details, please kindly read this document and refer to our paper.


News

  • πŸŽ‰ [2023-12-15] Our paper on PilotScope has been accepted by VLDB 2024!


1. Overview

At first, we illustrate the general system architecture of PilotScope as follows:

PilotScope System Architecture

PilotScope provides a console to operate the whole system. It manages multiple AI4DB drivers and the DB interactor. The DB interactor contains an interface connecting AI4DB drivers with databases. It contains multiple APIs, namely injection interface to inject AI4DB algorithms to replace database tasks, data collection to collect training data and push/pull operators to allow AI4DB algorithms to exchange data with the databases. Each database steered by the AI4DB drivers is attached with its specific implementations of DB interactor.

For the detailed working pineline of PilotScope and in-depth description of each component in PilotScope, please refer to the section System Pipeline and section Core Component, respectively.

1.1. How to Apply PilotScope?

To database users (πŸ‘¨β€πŸ’Ό , πŸ‘©β€βš•οΈ or πŸ‘€), PilotScope is totally transparent. The database user can access PilotScope through its console, log into the native database, and then perform database operations using the same commands as usual. The user can start any AI4DB driver using a simple command. Then, PilotScope would automatically execute the AI4DB algorithm whenever necessary. For example, if the user starts the AI4DB driver of a learned query optimizer, the database would automatically apply this learned query optimizer to generate execution plan for each input SQL query.

The general workflow on applying an AI4DB driver in PilotScope is illustrated as follows:

PilotScope Workflow

1.1.1. Supported Tasks and Algorithms

Currently, PilotScope focuses on database tasks in the query optimization workflow shown as follows.

Query Optimizer Workflow

At the moment, PilotScope supports to apply 15 AI4DB algorithms on 2 well-known databases for 4 representative database tasks. We list the details as follows:

1.1.1.1. Knob Tuning

Supported Databases: PostgreSQL PostgreSQL and Spark Spark

  • Bayesian Optimization Method:

    • SMAC: Sequential Model-Based Optimization for General Algorithm Configuration

    • GP-BO: Bayesian Optimisation over Multiple Continuous and Categorical Inputs

  • RL-based Method:

    • DDPG: An End-to-End Automatic Cloud Database Tuning System Using Deep Reinforcement Learning

For more knob tuning methods, please refer to the LlamaTune benchmark and paper.

1.1.1.2. Index Recommendation

Supported Databases: : PostgreSQL PostgreSQL

  • Enlarging Empty Set:

    • AutoAdmin: An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server (Applied in Microsoft SQL Server)

    • DB2Advisor: DB2 Advisor: An Optimizer Smart Enough to Recommend Its Own Indexes (Applied in DB2)

    • DTA: Anytime Algorithm of Database Tuning Advisor for Microsoft SQL Server (Applied in Microsoft SQL Server)

    • Extend: Efficient Scalable Multi-attribute Index Selection Using Recursive Strategies

  • Reducing Large Set:

    • Drop: Index Selection in Relational Databases

    • Relaxation: Automatic Physical Database Tuning: A Relaxation-based Approach

  • Linear Programming Method:

    • CoPhy: CoPhy: A Scalable, Portable, and Interactive Index Advisor for Large Workloads

For more index recommendation methods, please refer to the benchmark and paper.

1.1.1.3. Cardinality Estimation

Supported Databases: : PostgreSQL PostgreSQL and Spark Spark

  • Query-driven Method:

    • MSCN: Learned Cardinalities: Estimating Correlated Joins with Deep Learning

  • Data-driven Method:

    • NeuroCard: NeuroCard: One Cardinality Estimator for All Tables

    • DeepDB: DeepDB: Learn from Data, not from Queries!

For more cardinality estimation methods, please refer to the section Model Tuning in our benchmark and paper. There provides the paper and codes for a number of traditional, data-driven and query-driven methods.

1.1.1.4. End-to-end Learned Query Optimizer

Supported Databases: : PostgreSQL PostgreSQL and Spark Spark

  • Learning-to-rank based Method:

    • Lero: A Learning-to-rank Query Optimizer

We could also simply adapt the AI4DB driver for Lero to support some other end-to-end learned query optimizers, including:

  • Hint Set Tuning Method:

    • Bao: Bao: Making Learned Query Optimization Practical

    • HyperQO: Cost-based or Learning-based? A Hybrid Query Optimizer for Query Plan Selection

  • Plan Exploration Method:

    • Neo: Neo: A Learned Query Optimizer

    • Balsa: Balsa: Learning a Query Optimizer Without Expert Demonstrations

1.1.1.5. More AI4DB Methods …

For more AI4DB methods, please refer to the AI4DB paper reading list.

1.1.1.6. Usage Details

For the detailed steps to apply these AI4DB drivers in PilotScope, please refer to the chapter Example.

1.2. How to Develop in PilotScope?

PilotScope could support to deploy any AI4DB method on top of any native database.

To ML developers (πŸ‘©β€πŸ’» or πŸ§‘πŸ»β€πŸ’»), as well as researchers (πŸ‘¨β€πŸ”¬ or πŸ‘©πŸ»β€πŸ”¬), they could focus on writing the ML programs using AI-friendly language, e.g., Python. The ML programms could directly invoke the APIs in the interface of DB interactor to inject the AI4DB method or enforce actions into databases, and exchange data with the databases. The interface of DB interactor keeps the same for any database. In such way, the ML developers are liberated from the underlying details of the database codes. Meanwhile, the same AI4DB dirver could be applied to steer multiple different database systems.

To DB developers (πŸ‘¨πŸ»β€πŸ’» or πŸ₯·), they are responsible to implement all APIs in the interface of DB interactor in each specific database. They are familiar with the codebase of the database so they could find the best way to implement these interaction APIs as lightweight patches. Implementing these patches is much easier than integrating a complete ML program into databases. What they do is just like their ordinary work on developing new features on the native database.

The general working paradigm on developing an AI4DB driver in PilotScope is illustrated as follows:

PilotScope Workflow

1.2.1. Developing Details

For the details on the interaction and other APIs in PilotScope, please refer to the sections API (ML Side) and Database Side Develop.

1.3. What to Do in PilotScope?

PilotScope can be easily extended to support more AI4DB algorithms for more AI4DB tasks on more databases. According to its working paradigm, we warmly welcome the open-source community to contribute their efforts on PilotScope from these aspects:

  • Write more AI4DB algorithms into ML programs;

  • Propose more APIs in the interface of the DB interactor;

  • Implement the interaction APIs on more databases;

  • Develop more features to enrich the functionality in PilotScope.

We sincerely invite all contributors (πŸ‘¦, πŸ‘©, πŸ‘±πŸ»β€β™‚οΈ, πŸ‘©πŸ»β€πŸ¦³, πŸ§“, πŸ‘΅πŸ», πŸ‘©β€πŸ«, πŸ‘¨β€πŸ«, or πŸ‘₯) to build this open-source community around PilotScope together! πŸ’ͺ🏻 πŸ’ͺ🏻 πŸ’ͺ🏻

We could foster collaborations and accelerate the research iterations πŸ‘£ πŸ‘£ πŸ‘£ of AI4DB methods, and simultaneously make AI4DB more applicable and viable for production scenarios (🏭, πŸ–₯ orπŸ“±).

1.4. License

PilotScope is released under Apache License 2.0.

1.5. References

If you find PilotScope useful for your research, development and product, please kindly cite the following paper:

@article{zhu2023pilotscope,
	title={PilotScope: Steering Databases with Machine Learning Drivers},
	author={Rong Zhu and Lianggui Weng and Wenqing Wei and Di Wu and Jiazhen Peng and Yifan Wang and Bolin Ding and Defu Lian Bolong Zheng and Jingren Zhou},
	journal = {Proceedings of the VLDB Endowment},
	year={2024}}