6.10.3. pilotscope.DBController.SparkSQLController

class SparkSQLController(config: pilotscope.PilotConfig.SparkConfig, echo=False)[source]

Bases: pilotscope.DBController.BaseDBController.BaseDBController

__init__(config: pilotscope.PilotConfig.SparkConfig, echo=False)[source]
Parameters
  • config – The config of PilotScope including the config of database.

  • echo – if true, the more detailed information will be printed when executing the sql statement.

load_all_tables_from_datasource()[source]
persist_table(table_name)[source]
persist_tables()[source]
exist_table(table_name, where='session')bool[source]

If the table named table_name exist or not

Returns

the table named table_name exist, it returns True; otherwise, it returns False

load_table_if_exists_in_datasource(table_name)[source]
analyze_table_stats(table_name)[source]
analyze_all_table_stats()[source]
clear_all_tables()[source]
set_hint(key, value)[source]

Set the value of each hint (i.e., the run-time config) when execute SQL queries. The hints can be used to control the behavior of the database system in a session.

For PostgreSQL, you can find all valid hints in https://www.postgresql.org/docs/13/runtime-config.html.

For Spark, you can find all valid hints (called conf in Spark) in https://spark.apache.org/docs/latest/configuration.html#runtime-sql-configuration

Parameters
  • key – The key associated with the hint.

  • value – The value of the hint to be set.

create_table_if_absences(table_name, column_2_value, primary_key_column=None, enable_autoincrement_id_key=True)[source]

Create a table according to parameters if absences. This function will not insert any data into the table. The column names and types of the table will be inferred from column_2_value.

Parameters
  • table_name – the name of the table you want to create

  • column_2_value – a dict, whose keys are the names of columns and values. This data will be used to infer the column names and types of the table.

  • primary_key_column – A column name in column_2_value. The corresponding column will be set as primary key. Otherwise, there will be no primary key.

  • enable_autoincrement_id_key – If it is True, the primary_key_column will be autoincrement. It is only meaningful when primary_key_column is not None.

get_table_row_count(table_name)[source]

Get the row count of the table

Parameters

table_name – the name of the table

Returns

the row count

insert(table_name, column_2_value: dict, persist=True)[source]

Insert a new row into the table with each column’s value set as column_2_value.

Parameters
  • table_name – the name of the table

  • column_2_value – a dict where the keys are column names and the values are the values to be inserted

execute(sql, fetch=False, fetch_column_name=False)Union[pandas.core.frame.DataFrame, pyspark.sql.dataframe.DataFrame][source]

Execute a sql statement.

Parameters
  • sql – A SQL statement to be executed.

  • fetch – fetch result or not. If True, the function will return a list of tuple representing the result of the sql.

explain_physical_plan(sql, comment='')Dict[source]

Get a physical plan from database’s optimizer for a given SQL query.

Parameters

sql – The SQL query to be explained.

get_estimated_cost(sql, comment='')Tuple[int][source]

Get an estimated cost of a SQL query.

Parameters
  • sql – The SQL query for which to estimate the cost.

  • comment – An optional comment to include with the query plan. Useful for debugging.

Returns

The estimated total cost of executing the SQL query.

write_knob_to_file(key_2_value_knob)[source]

Write knobs to config file, you should restart database to make it work.

Parameters

key_2_value_knob – a dict with keys as the names of the knobs and values as the values to be set.

recover_config()[source]

Recover config file of database to the lasted saved config file by backup_config()

shutdown()[source]

shutdown the database

start()[source]

start the database

explain_execution_plan(sql, comment='')[source]

Get an execution plan from database’s optimizer for a given SQL query.

Parameters

sql – The SQL query to be explained.

status()[source]
get_buffercache()[source]
create_index(index)[source]

Create an index on columns index.columns of table index.table with name index.index_name.

Parameters

index – a Index object including the information of the index

drop_index(index)[source]

Drop an index by its index name.

Parameters

index – an index that will be dropped

drop_all_indexes()[source]

Drop all indexes across all tables in the database. This will not delete the system indexes and unique indexes.

get_all_indexes_byte()[source]

Get the size of all indexes across all tables in the database in bytes. This will include the system indexes and unique indexes.

Returns

the size of all indexes in bytes

get_table_indexes_byte(table_name)[source]

Get the size of all indexes on a table in bytes. This will include the system indexes and unique indexes.

Parameters

table_name – a table name that the indexes belong to

Returns

the size of all indexes on the table in bytes

get_index_byte(index: pilotscope.Common.Index.Index)[source]

Get the size of an index in bytes by its index name.

Parameters

index – the index to get size

Returns

the size of the index in bytes