[Solved] Spark SQL build for hive?

I have downloaded the Spark 1.3.1 release, package type "Pre-built for Hadoop 2.6 and later".

Now I want to run the Scala code below using the Spark shell, so I followed these steps (the full sequence is shown as one sketch after the list):

1. bin/spark-shell

2. val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

3. sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
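Put together, the whole sequence I am running in spark-shell looks like this (Spark 1.3.x, where the shell already provides the SparkContext as sc; the SHOW TABLES check at the end is just my way of confirming the table from the Spark side):

    // Inside bin/spark-shell; `sc` is provided by the shell.
    val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

    // Create the table if it does not already exist.
    sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")

    // List the tables Spark can see, to confirm the CREATE took effect.
    sqlContext.sql("SHOW TABLES").collect().foreach(println)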

Now the problem is that when I verify it in the Hue browser with:

select * from src;

I get a

"table not found" exception

which means the table was not created. How do I configure Hive with the Spark shell to make this work? I want to use Spark SQL, and I also need to read and write data from Hive.
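From what I understand, the read/write part would look something like the sketch below with HiveContext (Spark 1.3.x; the src_copy table is just a hypothetical name used for illustration):

    val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

    // Read: a HiveQL query returns a DataFrame.
    val df = sqlContext.sql("SELECT key, value FROM src")
    df.show()

    // Write: insert query results into another Hive table.
    // `src_copy` is a hypothetical table used only for this example.
    sqlContext.sql("CREATE TABLE IF NOT EXISTS src_copy (key INT, value STRING)")
    sqlContext.sql("INSERT INTO TABLE src_copy SELECT key, value FROM src")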

I have heard that we need to copy the hive-site.xml file somewhere into the Spark directory.
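For what it's worth, that matches how HiveContext behaves: without a hive-site.xml on Spark's classpath, it falls back to a local embedded Derby metastore (creating a metastore_db directory wherever spark-shell was launched), so tables created in the shell never reach the metastore that Hive and Hue query. Copying the cluster's hive-site.xml into Spark's conf directory points Spark at the real metastore; the paths below are typical but vary by distribution and install location:

    cp /etc/hive/conf/hive-site.xml /path/to/spark-1.3.1/conf/

After restarting spark-shell, tables created through HiveContext should then be visible from Hue.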

Can someone please explain the steps for configuring Spark SQL with Hive?

Thanks, Tushar

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Avoid Selecting Unnecessary Columns (query line: 2): Avoid selecting all columns with the '*' wildcard, unless you intend to use them all. Selecting redundant columns may result in unnecessary performance degradation.
The optimized query:
SELECT
    *
FROM
    src
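Note that the re-written query still selects every column with the * wildcard; to actually follow recommendation 1, list only the columns you need. For the two-column src table from the question, that would be something like:

    SELECT
        key,
        value
    FROM
        src;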

* The original question was posted on StackOverflow.