How do I set up a Spark SQL JDBC connection on Amazon EMR?

2 minute read
0

I want to set up a Spark SQL Java Database Connectivity (JDBC) connection on Amazon EMR.

Resolution

Note: The following steps use SQuirreL SQL Client for SQL client on a local machine.

To set up a Spark SQL JDBC connection on Amazon EMR, complete the following steps:

  1. Download and install SQuirreL SQL Client. For more information, see Download and installation on the SQuirreL SQL website.

  2. To connect to the Amazon EMR primary node, use SSH.

  3. To start Spark Thrift Server, run the following command on the primary node:

    sudo /usr/lib/spark/sbin/start-thriftserver.sh
  4. On the primary node, copy all .jar files from the /usr/lib/spark/jars directory to your local machine.

  5. Open SQuirreL SQL Client, and then create a new driver as follows:
    For Name, enter Spark JDBC Driver.
    For Example URL, enter jdbc:hive2://localhost:10001.

  6. On the JDBC Driver Class Path tab, choose Add.

  7. In the dialog box, navigate to the directory that you previously copied the .jar files to, and then select all the files.

  8. In the Class Name field, enter org.apache.hive.jdbc.HiveDriver, and then choose OK.

  9. To set up an SSH tunnel, use local port forwarding on your local machine:

    ssh -o ServerAliveInterval=10 -i path-to-key-file -N -L 10001:localhost:10001 hadoop@example-primary-public-dns

    Note: Replace example-primary-public-dns with your primary public DNS name.

  10. To connect to the Spark Thrift Server, create a new alias in SQuirreL SQL Client as follows:
    For Name, enter Spark JDBC.
    For Driver, enter Spark JDBC Driver.
    For URL, enter jdbc:hive2://localhost:10001.
    For Username, enter hadoop.

Note: You can now run queries from SQuirreL SQL Client.

AWS OFFICIAL
AWS OFFICIALUpdated 3 months ago