6.3.1. pilotscope.DBInteractor.PilotDataInteractor
- class PilotDataInteractor(config: pilotscope.PilotConfig.PilotConfig, enable_simulate_index=False)[source]
Bases:
object
The core module for interacting with DBMS and handling push-and-pull operators
- __init__(config: pilotscope.PilotConfig.PilotConfig, enable_simulate_index=False) → None[source]
- Parameters
config – The configuration of PilotScope.
enable_simulate_index – A flag indicating whether to enable the simulated index. This flag is only valid for PostgreSQL.
- push_hint(key_2_value_for_hint: dict)[source]
Set the value of each hint (i.e., the run-time config) when execute SQL queries. The hints can be used to control the behavior of the database system in a session.
For PostgreSQL, you can find all valid hints in https://www.postgresql.org/docs/13/runtime-config.html.
For Spark, you can find all valid hints (called conf in Spark) in https://spark.apache.org/docs/latest/configuration.html#runtime-sql-configuration
Note that you need to distinguish which parameter is static (push_knob) and which is run-time (push_hint).
- Parameters
key_2_value_for_hint – A dictionary containing key-value pairs representing hint information, e.g. {“hint_key”: “hint_value”}.
- push_knob(key_2_value_for_knob: dict)[source]
Set the value of each knob parameter (i.e., the static config) of database. These parameters are valid after the database is restarted. Thus, pilotscope will restart the database if this function is called.
For PostgreSQL, you can find all valid knob in https://www.postgresql.org/docs/13/runtime-config.html. Pilotscope will modify the configuration file (i.e., postgresql.conf) of PostgreSQL to set the knob.
For Spark, you can find all valid hints (called conf in Spark) in https://spark.apache.org/docs/latest/configuration.html
Note that you need to distinguish which parameter is static (push_knob) and which is run-time (push_hint).
- Parameters
key_2_value_for_knob – A dictionary containing key-value pairs for knob settings, e.g. {“knob_key”: “knob_value”}
- push_card(subquery_2_value: dict, enable_parameterized_subquery=False)[source]
Set the cardinality of each sub-plan query when execute a SQL query. The cardinality of each sub-plan query will set to the default estimated value if you don’t provide the value.
- Parameters
subquery_2_value – A dictionary containing subquery to card value mappings, e.g. {“subquery_1”: “card_1”, “subquery_2”: “card_2”}
enable_parameterized_subquery – A flag indicating whether to enable parameterized subquery.
- push_pg_hint_comment(pg_hint_comment: str)[source]
Set the comment of pg_hint_plan into input SQL query. The function is only valid for PostgreSQL. The pg_hint_plan is an extension for the PostgreSQL database that allows users to influence the query planner’s choice of execution plans. You can find more information in https://github.com/ossc-db/pg_hint_plan.
- Parameters
pg_hint_comment – a pg_hint_comment like
/*+ HashJoin(a b) SeqScan(a) */
, which indicate the join method of a and b is HashJoin and the scan method of a is SeqScan.
- push_index(indexes: List[pilotscope.Common.Index.Index], drop_other=True)[source]
Set the index information of each table when execute SQL queries. The index will be built before executing all SQL queries in a session of PilotDataInteractor.
- Parameters
indexes – A list of Index instances to be handled.
drop_other – A flag indicating whether to drop other indexes not in the indexes list.
- pull_subquery_card(enable_parameterized_subquery=False)[source]
Require PilotScope to collect all cardinality information of each sub-plan queries when execute a SQL query.
- Parameters
enable_parameterized_subquery – A flag indicating whether to enable parameterized subquery.
- pull_execution_time()[source]
Require PilotScope to collect execution time when execute a SQL query.
- pull_estimated_cost()[source]
Require PilotScope to collect estimated cost from database’s optimizer when execute a SQL query.
- pull_record()[source]
Require PilotScope to collect execution records when execute a SQL query. PilotScope will not to execute a complete process of recode retrieval, unless you require it to collect records.
- pull_buffercache()[source]
Retrieves the buffer cache, where each item include a table name and the numbers of its buffer.
- execute_batch(sqls, is_reset=True) → List[Optional[pilotscope.PilotTransData.PilotTransData]][source]
Execute all sqls sequentially in a session. All SQL queries is executed using the identical push/pull configuration.
- Parameters
sqls – a list of sqls to be executed
is_reset – If it is true, all push/pull will be removed after execution
- Returns
list of the results for each sql
- execute_parallel(sqls, parallel_num=10, is_reset=True)[source]
Execute all SQL statements in parallel within a session. All SQL queries is executed using the identical push/pull configuration.
- Parameters
sqls – a list of sqls to be executed
parallel_num – the number of threads, defaults to 10
is_reset – If it is true, all push/pull will be removed after execution
- Raises
RuntimeError – simulate index does not support execute_parallel
- Returns
list of the results for each sql
- execute(sql, is_reset=True) → Optional[pilotscope.PilotTransData.PilotTransData][source]
Execute this SQL and finish all registered push-and-pull operators before.
- Parameters
sql – a sql statement to be executed
is_reset – If it is true, all push/pull will be removed after execution
- Returns
If no exceptions, it returns a PilotTransData representing extended result; otherwise, it returns None.