HPL/SQL (formerly PL/HQL) is a language translation and execution layer developed by Dmitry Tolpeko. It was introduced into the Hive source code in June, 2015 (JIRA-11055) and included this February, 2016 in Hive 2.0. However, it doesn't need Hive to function.
Let me introduce PL/HQL, an open source tool that implements procedural SQL
can be used with any SQL-on-Hadoop solution.
Motivation:
- Writing the driver code using well-known procedural SQL (not bash)
that enables Hadoop to even more wider audience
- Allowing dynamic SQL, iterations, flow-of-control and SQL exception
handling
- Facilitating migration of RDBMS workload to Hadoop
Plans (besides extending syntax):
- Supporting CREATE PROCEDURE/FUNCTION/PACKAGE to reuse code
- Allowing connections to multiple databases (i.e. lookup tables in
relational databases)
- On-the-fly SQL conversion (SELECT i.e.), compatibility layer
Current steps to install in a Hortonworks HDP 2.3.2 environment. Substitute for the version you are using.
Download and Install
tar xvf
hplsql-0.3.13.tar.gzz /usr/hdp/2.3.2.0-2950/
ln -s
/usr/hdp/2.3.2.0-2950/hplsql-0.3.13/ /usr/hdp/current/hplsqlConfigure HADOOP_CLASSPATH
Edit /usr/hdp/current/hplsql/hplsql
Replace /usr/lib/ with /usr/hdp/2.3.2.0-2950/
Add to Path (in this case globally)
echo
"PATH=${PATH}:/usr/hdp/current/hplsql" >
/etc/profile.d/hplsql-path.sh && chmod 755
/etc/profile.d/hplsql-path.sh
Configure plhql-site.xml
To configure Hive connection settings, and connectivity to other databases (mySQL, Teradata, IBM DB/2, Oracle, MSSQL)
Test from Command Line
hplsql
--version
Get the previous date:
START=$(hplsql -e 'CURRENT_DATE - 1')
Copy table to a file:
COPY (SELECT id, name FROM sales.users WHERE local_dt = CURRENT_DATE) TO /data/users.txt DELIMITER '\t';
Copy table from default connection (Hive) to Teradata connection
COPY sales.users TO sales.users2 AT tdconn;
Log to mySQL from Hive
MAP OBJECT log TO log.log_data AT mysqlconn; DECLARE cnt INT; SELECT count(*) INTO cnt FROM sales.users WHERE local_dt = CURRENT_DATE; INSERT INTO log (message) VALUES ('Number of users: ' || cnt);
Compare Hive table totals to mySQL
CMP SUM sales.users WHERE local_dt = CURRENT_DATE, users_daily AT mysqlconn;
Great addition to the Hive codebase.