6.10.3. pilotscope.DBController.SparkSQLController

class SparkSQLController(config: pilotscope.PilotConfig.SparkConfig, echo=False)[source]

Bases: pilotscope.DBController.BaseDBController.BaseDBController

__init__(config: pilotscope.PilotConfig.SparkConfig, echo=False)[source]

Parameters

config – The config of PilotScope including the config of database.
echo – if true, the more detailed information will be printed when executing the sql statement.

load_all_tables_from_datasource()[source]

persist_table(table_name)[source]

persist_tables()[source]

exist_table(table_name, where='session') → bool[source]

If the table named table_name exist or not

Returns: the table named table_name exist, it returns True; otherwise, it returns False

load_table_if_exists_in_datasource(table_name)[source]

analyze_table_stats(table_name)[source]

analyze_all_table_stats()[source]

clear_all_tables()[source]

set_hint(key, value)[source]

Set the value of each hint (i.e., the run-time config) when execute SQL queries. The hints can be used to control the behavior of the database system in a session.

For PostgreSQL, you can find all valid hints in https://www.postgresql.org/docs/13/runtime-config.html.

For Spark, you can find all valid hints (called conf in Spark) in https://spark.apache.org/docs/latest/configuration.html#runtime-sql-configuration

Parameters

key – The key associated with the hint.
value – The value of the hint to be set.

create_table_if_absences(table_name, column_2_value, primary_key_column=None, enable_autoincrement_id_key=True)[source]

Create a table according to parameters if absences. This function will not insert any data into the table. The column names and types of the table will be inferred from column_2_value.

Parameters

table_name – the name of the table you want to create
column_2_value – a dict, whose keys are the names of columns and values. This data will be used to infer the column names and types of the table.
primary_key_column – A column name in column_2_value. The corresponding column will be set as primary key. Otherwise, there will be no primary key.
enable_autoincrement_id_key – If it is True, the primary_key_column will be autoincrement. It is only meaningful when primary_key_column is not None.

get_table_row_count(table_name)[source]

Get the row count of the table

Parameters: table_name – the name of the table
Returns: the row count

insert(table_name, column_2_value: dict, persist=True)[source]

Insert a new row into the table with each column’s value set as column_2_value.

Parameters

table_name – the name of the table
column_2_value – a dict where the keys are column names and the values are the values to be inserted

execute(sql, fetch=False, fetch_column_name=False) → Union[pandas.core.frame.DataFrame, pyspark.sql.dataframe.DataFrame][source]

Execute a sql statement.

Parameters

sql – A SQL statement to be executed.
fetch – fetch result or not. If True, the function will return a list of tuple representing the result of the sql.

explain_physical_plan(sql, comment='') → Dict[source]

Get a physical plan from database’s optimizer for a given SQL query.

Parameters: sql – The SQL query to be explained.

get_estimated_cost(sql, comment='') → Tuple[int][source]

Get an estimated cost of a SQL query.

Parameters

sql – The SQL query for which to estimate the cost.
comment – An optional comment to include with the query plan. Useful for debugging.

Returns

The estimated total cost of executing the SQL query.

write_knob_to_file(key_2_value_knob)[source]

Write knobs to config file, you should restart database to make it work.

Parameters: key_2_value_knob – a dict with keys as the names of the knobs and values as the values to be set.

recover_config()[source]: Recover config file of database to the lasted saved config file by backup_config()

shutdown()[source]: shutdown the database

start()[source]: start the database

explain_execution_plan(sql, comment='')[source]

Get an execution plan from database’s optimizer for a given SQL query.

Parameters: sql – The SQL query to be explained.

status()[source]

get_buffercache()[source]

create_index(index)[source]

Create an index on columns index.columns of table index.table with name index.index_name.

Parameters: index – a Index object including the information of the index

drop_index(index)[source]

Drop an index by its index name.

Parameters: index – an index that will be dropped

drop_all_indexes()[source]: Drop all indexes across all tables in the database. This will not delete the system indexes and unique indexes.

get_all_indexes_byte()[source]

Get the size of all indexes across all tables in the database in bytes. This will include the system indexes and unique indexes.

Returns: the size of all indexes in bytes

get_table_indexes_byte(table_name)[source]

Get the size of all indexes on a table in bytes. This will include the system indexes and unique indexes.

Parameters: table_name – a table name that the indexes belong to
Returns: the size of all indexes on the table in bytes

get_index_byte(index: pilotscope.Common.Index.Index)[source]

Get the size of an index in bytes by its index name.

Parameters: index – the index to get size
Returns: the size of the index in bytes