Azure Data Lake Analytics (ADLA) provides the U-SQL language (think Pig + SQL + C# + more), which is based on Microsoft's internal language SCOPE. SCOPE is used for tools like Bing Search. U-SQL has the same concepts as Hadoop: schema on read, custom reducers, extractors/SerDes, etc. Part of ADLA is based on Cosmos, Microsoft's internal job scheduler and compute engine. ADLA uses Apache YARN to schedule jobs and manage its in-memory components.
Azure Data Lake Store (ADLS) is a blob storage layer for ADLA that behaves more like HDFS and uses WebHDFS / Apache Hadoop behind the scenes. ADLA includes the concepts of tables, views, stored procedures, table-valued functions, and partitions, and it stores these objects in its internal metastore catalog, similar to Hive.
Currently ADLA supports the TSV and CSV formats out of the box, with extensions for JSON and the ability to write custom extractors for pretty much any format that you can read with .NET or the .NET SDK for Hadoop.
A U-SQL script looks something like this:
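// Both paths are placeholders; EXTERNAL lets the caller override @inputfile.
DECLARE EXTERNAL @inputfile string = "myinputdir/myinputfile";
DECLARE @outputlocation string = "myoutputdir/myoutputfile";
// Schema on read: column names and types are applied during extraction.
@indataset = EXTRACT
col1 string,
col2 int?
FROM @inputfile
USING Extractors.Tsv(skipFirstNRows:1, silent:false);
// Replace a missing (null) col2 with 0.
@outdataset = SELECT
col1,
(col2 == null) ? 0 : col2 AS isblankcol
FROM @indataset;
OUTPUT @outdataset TO @outputlocation
USING Outputters.Tsv(outputHeader : true, quoting: false);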
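If you don't have an ADLA account handy, the script above can be mimicked locally in plain Python. This is only a rough sketch of the same extract / select / output steps, not U-SQL and not how ADLA actually runs a job. The function names (extract_tsv, select_default_zero, output_tsv) and the sample data are made up for illustration, and it assumes an empty col2 field should be read as a null.

```python
import csv
import io
from typing import Iterable, Iterator, Optional, Tuple

Row = Tuple[str, Optional[int]]


def extract_tsv(text: str, skip_first_n_rows: int = 0) -> Iterator[Row]:
    """Schema on read: apply (col1 string, col2 int?) while parsing the TSV."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    for i, row in enumerate(reader):
        if i < skip_first_n_rows:
            continue  # plays the role of skipFirstNRows
        col1, raw = row[0], row[1]
        # Assumption: an empty field stands for a null in the nullable int column.
        yield col1, (int(raw) if raw != "" else None)


def select_default_zero(rows: Iterable[Row]) -> Iterator[Tuple[str, int]]:
    """Replace a null col2 with 0, like the conditional in the SELECT."""
    for col1, col2 in rows:
        yield col1, (0 if col2 is None else col2)


def output_tsv(rows: Iterable[Tuple[str, int]],
               header: Tuple[str, str] = ("col1", "isblankcol")) -> str:
    """Write tab-separated rows with a header line and no quoting."""
    lines = ["\t".join(header)]
    lines.extend(f"{col1}\t{col2}" for col1, col2 in rows)
    return "\n".join(lines) + "\n"


# Made-up sample input: one header row, one complete row, one row with a blank col2.
sample = "col1\tcol2\nalpha\t7\nbeta\t\n"
result = output_tsv(select_default_zero(extract_tsv(sample, skip_first_n_rows=1)))
print(result, end="")
```

The point is that the schema lives in the reading code rather than in the file, which is what schema on read means for the EXTRACT statement.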
One problem I have with USQL is the name. Every search on Google comes back with "We searched for SQL. Did you mean USQL?"
U-SQL uses C# syntax and .NET data types, and it supports code-behind files and custom assemblies.
A U-SQL script job can be submitted either locally for testing or to Azure Data Lake Analytics. It is a batch process, so interactive functionality is limited.
For those familiar with hdfs / hadoop commands, a Python shell for ADLS is in development, and it offers some familiar commands:
cat chmod close du get help ls mv quit rmdir touch
chgrp chown df exists head info mkdir put rm tail
As with any Azure service, you can also use the Azure xplat CLI, PowerShell, and the web APIs.