6.3.1. pilotscope.DBInteractor.PilotDataInteractor

class PilotDataInteractor(config: pilotscope.PilotConfig.PilotConfig, enable_simulate_index=False)[source]

Bases: object

The core module for interacting with DBMS and handling push-and-pull operators

__init__(config: pilotscope.PilotConfig.PilotConfig, enable_simulate_index=False)None[source]
Parameters
  • config – The configuration of PilotScope.

  • enable_simulate_index – A flag indicating whether to enable the simulated index. This flag is only valid for PostgreSQL.

push_hint(key_2_value_for_hint: dict)[source]

Set the value of each hint (i.e., the run-time config) when execute SQL queries. The hints can be used to control the behavior of the database system in a session.

For PostgreSQL, you can find all valid hints in https://www.postgresql.org/docs/13/runtime-config.html.

For Spark, you can find all valid hints (called conf in Spark) in https://spark.apache.org/docs/latest/configuration.html#runtime-sql-configuration

Note that you need to distinguish which parameter is static (push_knob) and which is run-time (push_hint).

Parameters

key_2_value_for_hint – A dictionary containing key-value pairs representing hint information, e.g. {“hint_key”: “hint_value”}.

push_knob(key_2_value_for_knob: dict)[source]

Set the value of each knob parameter (i.e., the static config) of database. These parameters are valid after the database is restarted. Thus, pilotscope will restart the database if this function is called.

For PostgreSQL, you can find all valid knob in https://www.postgresql.org/docs/13/runtime-config.html. Pilotscope will modify the configuration file (i.e., postgresql.conf) of PostgreSQL to set the knob.

For Spark, you can find all valid hints (called conf in Spark) in https://spark.apache.org/docs/latest/configuration.html

Note that you need to distinguish which parameter is static (push_knob) and which is run-time (push_hint).

Parameters

key_2_value_for_knob – A dictionary containing key-value pairs for knob settings, e.g. {“knob_key”: “knob_value”}

push_card(subquery_2_value: dict, enable_parameterized_subquery=False)[source]

Set the cardinality of each sub-plan query when execute a SQL query. The cardinality of each sub-plan query will set to the default estimated value if you don’t provide the value.

Parameters
  • subquery_2_value – A dictionary containing subquery to card value mappings, e.g. {“subquery_1”: “card_1”, “subquery_2”: “card_2”}

  • enable_parameterized_subquery – A flag indicating whether to enable parameterized subquery.

push_pg_hint_comment(pg_hint_comment: str)[source]

Set the comment of pg_hint_plan into input SQL query. The function is only valid for PostgreSQL. The pg_hint_plan is an extension for the PostgreSQL database that allows users to influence the query planner’s choice of execution plans. You can find more information in https://github.com/ossc-db/pg_hint_plan.

Parameters

pg_hint_comment – a pg_hint_comment like /*+ HashJoin(a b) SeqScan(a) */, which indicate the join method of a and b is HashJoin and the scan method of a is SeqScan.

push_index(indexes: List[pilotscope.Common.Index.Index], drop_other=True)[source]

Set the index information of each table when execute SQL queries. The index will be built before executing all SQL queries in a session of PilotDataInteractor.

Parameters
  • indexes – A list of Index instances to be handled.

  • drop_other – A flag indicating whether to drop other indexes not in the indexes list.

pull_subquery_card(enable_parameterized_subquery=False)[source]

Require PilotScope to collect all cardinality information of each sub-plan queries when execute a SQL query.

Parameters

enable_parameterized_subquery – A flag indicating whether to enable parameterized subquery.

pull_physical_plan()[source]

Require PilotScope to collect physical plan when execute a SQL query.

pull_execution_time()[source]

Require PilotScope to collect execution time when execute a SQL query.

pull_estimated_cost()[source]

Require PilotScope to collect estimated cost from database’s optimizer when execute a SQL query.

pull_record()[source]

Require PilotScope to collect execution records when execute a SQL query. PilotScope will not to execute a complete process of recode retrieval, unless you require it to collect records.

pull_buffercache()[source]

Retrieves the buffer cache, where each item include a table name and the numbers of its buffer.

execute_batch(sqls, is_reset=True)List[Optional[pilotscope.PilotTransData.PilotTransData]][source]

Execute all sqls sequentially in a session. All SQL queries is executed using the identical push/pull configuration.

Parameters
  • sqls – a list of sqls to be executed

  • is_reset – If it is true, all push/pull will be removed after execution

Returns

list of the results for each sql

execute_parallel(sqls, parallel_num=10, is_reset=True)[source]

Execute all SQL statements in parallel within a session. All SQL queries is executed using the identical push/pull configuration.

Parameters
  • sqls – a list of sqls to be executed

  • parallel_num – the number of threads, defaults to 10

  • is_reset – If it is true, all push/pull will be removed after execution

Raises

RuntimeError – simulate index does not support execute_parallel

Returns

list of the results for each sql

execute(sql, is_reset=True)Optional[pilotscope.PilotTransData.PilotTransData][source]

Execute this SQL and finish all registered push-and-pull operators before.

Parameters
  • sql – a sql statement to be executed

  • is_reset – If it is true, all push/pull will be removed after execution

Returns

If no exceptions, it returns a PilotTransData representing extended result; otherwise, it returns None.

reset()[source]

Resets the internal state of the object. This method clears all registered push and pull operators, and database connection.