Qubole ODBC from C#


Here is a detailed tutorial for accessing the Qubole ODBC driver from .NET. Because of the level of detail, you may want to scan the headers and skip to the section you want. I’m assuming you’re going to use Visual Studio to develop in some sort of Windows environment.

Setting up the Qubole ODBC Driver:

This is how to set up the Qubole ODBC Driver as of 6/18/2014. They may add functionality, so get more current documentation if what you see does not fit what you see when setting up.

1. Get the driver:

According to Qubole, the driver can be obtained by contacting Qubole Support. They will provide you with one or two *.msi files and possibly some *.tdc files. We’re going to ignore the .tdc files for the purpose of this demo. Pick the *.msi file that matches your processor architecture (I got files for x86 or x64).

2. Install the driver: This works like most install files.

  • Click the *.msi file that you chose above..
  • You’ll get the Welcome page. Click Next.
  • Then comes Terms of Use. Read (you do read these things, don’t you?) and click the Accept checkbox (assuming of course you accept the terms).
  • Next is the Destination Folder dialog. I chose the default.
  • Then there is the Account Token dialog. You get your account token by navigating to the User’s Tab of the Control Panel in the Qubole UI. The account token is right under the label “API Token for Account:” as shown on the bottom of this screenshot:

  • Then you can enter the cluster labels. Qubole recommends using a different cluster for Presto use, so there are dialogs for Hive and Presto. You get these from the labels. Mine were “default” and “Presto”.

  • On the next dialog, you click the Install button, and the ODBC installation is complete.

3. Check installation:

  • From your Windows Start Menu, search for “Data Sources (ODBC)”. Click this program.
  • Click the “System DSN” tab and you should find 2 new DSNs one for Qubole Hive and one for Qubole Presto. Pay attention to those names. We’ll use them below.

Creating test data:

I generally have a small Hive table to use for initial work on a project. It saves delay for example if I forget to use the Limit keyword in my query, or the Test or Constrained Run query modes.

In this case I grabbed 1000 lines from the default_qubole_airline_origin_destination table and put them in table ODBCTestData with the following Hive statements:

drop table if exists ODBCTestData;
create table ODBCTestData like default_qubole_airline_origin_destination;
insert overwrite table ODBCTestData select * from default_qubole_airline_origin_destination limit 1000;

Making a Hive query in Qubole and retrieving the results:

Given this is a programming blog, I’ll provide the code here. Aside from pasting this into a C# console application, the following things may be necessary for this to work:

  • Change the filename in line 30 to go into a file and folder of your choice.
  • If the name of the datasource in the ODBC Data Source Administrator is not the same as the DSN in line 27, correct that.
  • If you are using the 64-bit driver, change your Visual Studio project’s properties so that:
    • “Prefer 32-bit” is unchecked in the Build Tab.
    • “Enable the Visual Studio hosting process” is unchecked in the Debug Tab.

The last 2 items avoid an architecture mismatch between a 32-bit application and the 64-bit driver.

using System;
using System.Data.Odbc;
using System.IO;

namespace QuboleOdbcCsharp
{
	class Program
	{
		static void Main(string[] args)
		{
			ConnectToData();
		}

		public static void ConnectToData()
		{
			OdbcConnection odbcConnection = null;
			StreamWriter streamWriter = null;
			OdbcDataReader reader = null;

			try
			{
				//Define the connection
				odbcConnection = new System.Data.Odbc.OdbcConnection()
				{
					//Note: "Qubole Hive ODBC Driver DSN" below is the name of the 
					//DSN from ODBC Data Source Administrator application
					ConnectionString = "DSN=Qubole Hive ODBC Driver DSN;Initial Catalog=default"
				};
				//Set up an output file
				streamWriter = new StreamWriter(@"c:\junk\OdbcTest.txt");
				{
					//Open the connection
					odbcConnection.Open();

					//Issue a query against this connection returning an OdbcDataReader
					var command = odbcConnection.CreateCommand();
					command.CommandText = "select * from odbctestdata;";
					reader = command.ExecuteReader();

					//Loop through the data reader getting 4 columns from the table.
					while (reader.Read())
					{
						streamWriter.WriteLine("{0}\t{1}\t{2}\t{3}",
							reader.GetString(reader.GetOrdinal("origin")),
							reader.GetString(reader.GetOrdinal("dest")),
							reader.GetString(reader.GetOrdinal("tkcarrier")),
							reader.GetString(reader.GetOrdinal("passengers")));
					}
				}
			}
			catch (Exception ex)
			{
				// Should handle exceptions.  This is a demo.
			}
			finally
			{
				if (reader != null && !reader.IsClosed)
				{
					reader.Close();
				}
				if (streamWriter != null)
				{
					streamWriter.Close();
				}
				if (odbcConnection != null)
				{
					odbcConnection.Close();
				}
			}
		}
	}
}
Categories: BI and Big Data
Tags: Qubole;

SEARCH ARTICLES