6.10.3. pilotscope.DBController.SparkSQLController
- class SparkSQLController(config: pilotscope.PilotConfig.SparkConfig, echo=False)[source]
Bases:
pilotscope.DBController.BaseDBController.BaseDBController
- __init__(config: pilotscope.PilotConfig.SparkConfig, echo=False)[source]
- Parameters
config – The config of PilotScope including the config of database.
echo – if true, the more detailed information will be printed when executing the sql statement.
- exist_table(table_name, where='session') → bool[source]
If the table named table_name exist or not
- Returns
the table named table_name exist, it returns True; otherwise, it returns False
- set_hint(key, value)[source]
Set the value of each hint (i.e., the run-time config) when execute SQL queries. The hints can be used to control the behavior of the database system in a session.
For PostgreSQL, you can find all valid hints in https://www.postgresql.org/docs/13/runtime-config.html.
For Spark, you can find all valid hints (called conf in Spark) in https://spark.apache.org/docs/latest/configuration.html#runtime-sql-configuration
- Parameters
key – The key associated with the hint.
value – The value of the hint to be set.
- create_table_if_absences(table_name, column_2_value, primary_key_column=None, enable_autoincrement_id_key=True)[source]
Create a table according to parameters if absences. This function will not insert any data into the table. The column names and types of the table will be inferred from column_2_value.
- Parameters
table_name – the name of the table you want to create
column_2_value – a dict, whose keys are the names of columns and values. This data will be used to infer the column names and types of the table.
primary_key_column – A column name in column_2_value. The corresponding column will be set as primary key. Otherwise, there will be no primary key.
enable_autoincrement_id_key – If it is True, the primary_key_column will be autoincrement. It is only meaningful when primary_key_column is not None.
- get_table_row_count(table_name)[source]
Get the row count of the table
- Parameters
table_name – the name of the table
- Returns
the row count
- insert(table_name, column_2_value: dict, persist=True)[source]
Insert a new row into the table with each column’s value set as column_2_value.
- Parameters
table_name – the name of the table
column_2_value – a dict where the keys are column names and the values are the values to be inserted
- execute(sql, fetch=False, fetch_column_name=False) → Union[pandas.core.frame.DataFrame, pyspark.sql.dataframe.DataFrame][source]
Execute a sql statement.
- Parameters
sql – A SQL statement to be executed.
fetch – fetch result or not. If True, the function will return a list of tuple representing the result of the sql.
- explain_physical_plan(sql, comment='') → Dict[source]
Get a physical plan from database’s optimizer for a given SQL query.
- Parameters
sql – The SQL query to be explained.
- get_estimated_cost(sql, comment='') → Tuple[int][source]
Get an estimated cost of a SQL query.
- Parameters
sql – The SQL query for which to estimate the cost.
comment – An optional comment to include with the query plan. Useful for debugging.
- Returns
The estimated total cost of executing the SQL query.
- write_knob_to_file(key_2_value_knob)[source]
Write knobs to config file, you should restart database to make it work.
- Parameters
key_2_value_knob – a dict with keys as the names of the knobs and values as the values to be set.
- recover_config()[source]
Recover config file of database to the lasted saved config file by backup_config()
- explain_execution_plan(sql, comment='')[source]
Get an execution plan from database’s optimizer for a given SQL query.
- Parameters
sql – The SQL query to be explained.
- create_index(index)[source]
Create an index on columns index.columns of table index.table with name index.index_name.
- Parameters
index – a Index object including the information of the index
- drop_index(index)[source]
Drop an index by its index name.
- Parameters
index – an index that will be dropped
- drop_all_indexes()[source]
Drop all indexes across all tables in the database. This will not delete the system indexes and unique indexes.
- get_all_indexes_byte()[source]
Get the size of all indexes across all tables in the database in bytes. This will include the system indexes and unique indexes.
- Returns
the size of all indexes in bytes