Apache hive is a part of the Hadoop echo system that provides the functionality to create a data warehouse at the Hadoop distributed file system. let us Understand What is Hive and how we can work on windowing function in apache hive.
Firstly, the hive is not an OLTP tool rather it is used to process big data that are distributed at the Hadoop cluster apart from this the apache hive is OLAP (online analytical processing) tool of Hadoop. using apache hive you can create different types of the table like internal, external, partitioned, buckets or you can apply windowing function in Apache Hive to process data that is stored at apache hive warehouse.
A windowing function in apache hive is not a new concept as from the past decade we were using windowing functions features in various relational database management systems like MySQL and oracle apart from this, windowing function techniques are used to execute SQL queries at the top of the temporary inner table that is a window.
In this blog, we are going to cover all windowing functions in apache hive so let us start with what is windowing operations on tables.window is a temp table where you perform a couple of aggregate function like count on the window, sum on the window,avg on the window, etc.
Step-1: Create a table in the hive to better understand the functionalities of the windowing function.
Creating a table in the hive.
CREATE TABLE IF NOT EXISTS EMP
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ‘,’
STORED AS TEXTFILE;
Loading Data into Table.
Step-2: Now load your data into table EMP
LOAD DATA LOCAL INPATH ‘/home/cloudera/Desktop/inputdata.csv’ OVERWRITE INTO TABLE EMP;
Step-3: Then verify your data using select * from EMP;
Applying Window Functions on data sets
Performing window operation on the partitions as in the below command we have only two partitions 1-male 2-female so this query will divide all data into 2 parts and will perform count, sum, min, max,avg operation on part1 and part2.
,count(*) over w as countsal
,sum(Salary) over w as sumsalary
,min(salary) over w as minsalary
,max(salary) over w as maxsalary
,avg(salary) over w as avgsalary
WINDOW w as (partition by gender);
Finding First and Last Value Over a Window:-
Now we are going to fetch the first and last value over a window.
select name,gender,first_value(salary) over(partition by gender) as first_high from emp;
First Value Result:-
select name,gender,last_value(salary) over(partition by gender) as last_high from emp;
Last Value Result:-
Finding Rank and dense Rank over a window.
Finding row number, rank and dense rank using window.
ROW_NUMBER() OVER W as row_number,
RANK() OVER W as rank,
DENSE_RANK() OVER W as dens_erank
window W as (ORDER BY salary ASC);