Optimize Hive queries for Tableau - Five Tips

1. Denormalize your data.

In Hive a good practice is to denormalize your data. With De-normalisation, the data is present in the same table so there is no need for any joins, hence the selects are very fast.

2. Execution Engine

Using your execution engine as tez or spark will optimize hive queries.

3. Usage of proper file format

Using proper file format based on your data will drastically optimize hive query performance. Generally ORC file format is best suitable for increasing your query performance. PARQUET format is another alternative. PARQUT format can be read by both Impala and Hive.

Optimize Hive

4. Optimize Hive Query by Vectorization

Vectorization improves operations like scans, aggregations, filters and joins, by performing them in batches of 1024 rows at once instead of single row each time. Operations are performed on the entire column vector, which improves the instruction pipelines and cache usage. This can optimize hive queries.

5. Cost based optimization

Optimize hive query’s logical and physical execution plan before submitting for final execution. These optimizations are not based on the cost of the query.

To collect statistics for the Hive query optimizer, set two parameters:

Optimize Hive

and then within Hive, enter the command:

Hive Optimization

There are a number of techniques available for improving the performance of Tableau visualizations and dashboards built from Hive. While Hadoop is a batch-oriented system, optimization hints listed above could reduce latency.

Visit our Tableau Dashboard Gallery.

Optimize Hive queries for Tableau – Five Tips

Get your FREE Tableau started

Drought Severeness in USA

Popular Now

Less is more. Here is guide to develop a dashboard that make sense.

Storytelling with bar charts? Check these beautiful Tableau bar charts

Tableau Sales Dashboard using Super Store Data

1. Denormalize your data.

2. Execution Engine

3. Usage of proper file format

4. Optimize Hive Query by Vectorization

5. Cost based optimization

PaintedData

Optimize Hive queries for Tableau – Five Tips

Get your FREE Tableau started

Drought Severeness in USA

Popular Now

1. Denormalize your data.

2. Execution Engine

3. Usage of proper file format

4. Optimize Hive Query by Vectorization

5. Cost based optimization

Related Posts