Group Functions in SQL
The syntax for the GROUP BY clause is:
SELECT column1, column2, … column_n, aggregate_function (expression)
FROM tables
WHERE predicates
GROUP BY column1, column2, … column_n;
aggregate_function can be a function such as SUM, COUNT, MIN, or MAX.
Here’s a list of the available group functions:
- avg(x) Averages all x column values returned by the select statement
- count(x) Counts the number of non-NULL values returned by the select statement for column x
- max(x) Determines the maximum value in column x for all rows returned by the select statement
- min(x) Determines the minimum value in column x for all rows returned by the select statement
- stddev(x) Calculates the standard deviation for all values in column x in all rows returned by the select statement
- sum(x) Calculates the sum of all values in column x in all rows returned by the select statement
- variance(x) Calculates the variance for all values in column x in all rows returned by the select statement
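To see how these functions treat NULL values, here is a minimal sketch that runs a few of them through Python's built-in sqlite3 module. SQLite is only a convenient stand-in here, and the table t and its column x are hypothetical. stddev and variance are left out because SQLite does not ship them as built-in functions.

```python
import sqlite3

# In-memory SQLite database used as a stand-in; table "t" is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(10,), (20,), (None,), (30,)])

# count(*) counts every row; count(x) skips the row where x is NULL.
# sum, avg, min and max also ignore the NULL.
row = conn.execute(
    "SELECT count(*), count(x), sum(x), avg(x), min(x), max(x) FROM t"
).fetchone()
print(row)  # (4, 3, 60, 20.0, 10, 30)
```

Note that avg(x) divides by the three non-NULL values, not by the four rows in the table.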
Example using the SUM function
For example, you could use the SUM function to return the name of each department and the total sales in that department.
SELECT department, SUM(sales) AS "Total sales"
FROM order_details
GROUP BY department;
Because you have listed one column in your SELECT statement that is not encapsulated in the SUM function, you must use a GROUP BY clause. The department field must, therefore, be listed in the GROUP BY section.
Example using the COUNT function
For example, you could use the COUNT function to return the name of the department and the number of employees (in the associated department) that make over $25,000 / year.
SELECT department, COUNT(*) AS "Number of employees"
FROM employees
WHERE salary > 25000
GROUP BY department;
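The WHERE clause in the query above filters individual rows before any grouping happens. The sketch below runs the same query against a small hypothetical employees table in SQLite (the names, departments, and salaries are invented for illustration) to show that a department with no qualifying employee disappears from the result entirely.

```python
import sqlite3

# Hypothetical sample data; only the query itself comes from the text above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ann", "Sales", 30000),
        ("Bob", "Sales", 20000),
        ("Cy", "IT", 40000),
        ("Di", "IT", 26000),
        ("Ed", "HR", 18000),
    ],
)

# Rows with salary <= 25000 are discarded first, then the rest are grouped.
counts = conn.execute(
    """
    SELECT department, COUNT(*) AS "Number of employees"
    FROM employees
    WHERE salary > 25000
    GROUP BY department
    ORDER BY department
    """
).fetchall()
print(counts)  # [('IT', 2), ('Sales', 1)]
```

HR does not appear with a count of 0, because its only row was removed before the groups were formed.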
ROLLUP
This group by operation is used to produce subtotals at any level of aggregation needed. These subtotals then “roll up” into a grand total, according to items listed in the group by expression. The totaling is based on a one-dimensional data hierarchy of grouped information. For example, let’s say we wanted to get a payroll breakdown for our company by department and job position. The following code block would give us that information:
select deptno, job, sum(sal) as salary
from emp
group by rollup(deptno, job);

DEPTNO  JOB        SALARY
------  ---------  ------
    10  CLERK        1300
    10  MANAGER      2450
    10  PRESIDENT    5000
    10               8750
    20  ANALYST      6000
    20  CLERK        1900
    20  MANAGER      2975
    20              10875
    30  CLERK         950
    30  MANAGER      2850
    30  SALESMAN     5600
    30               9400
                    29025
Notice that NULL values in the output of rollup operations typically mean that the row contains subtotal or grand total information. If you want, you can use the nvl() function to substitute a more meaningful value.
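The sketch below illustrates both ideas outside Oracle. SQLite has no rollup, so the three levels of aggregation are emulated with UNION ALL, which is a swapped-in technique and not what Oracle does internally. COALESCE stands in for nvl(), and the labels 'ALL DEPTS' and 'ALL JOBS' are hypothetical. The rows assume the classic EMP sample data, reduced to three columns, which is consistent with the totals shown above.

```python
import sqlite3

# Assumed EMP sample rows as (deptno, job, sal).
EMP = [
    (10, "CLERK", 1300), (10, "MANAGER", 2450), (10, "PRESIDENT", 5000),
    (20, "ANALYST", 3000), (20, "ANALYST", 3000), (20, "CLERK", 800),
    (20, "CLERK", 1100), (20, "MANAGER", 2975),
    (30, "CLERK", 950), (30, "MANAGER", 2850), (30, "SALESMAN", 1600),
    (30, "SALESMAN", 1250), (30, "SALESMAN", 1250), (30, "SALESMAN", 1500),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (deptno INTEGER, job TEXT, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", EMP)

# Emulation of rollup(deptno, job): detail rows, per-department subtotals,
# and one grand total, stitched together with UNION ALL.
ROLLUP_SQL = """
SELECT deptno, job, SUM(sal) AS salary FROM emp GROUP BY deptno, job
UNION ALL
SELECT deptno, NULL, SUM(sal) FROM emp GROUP BY deptno
UNION ALL
SELECT NULL, NULL, SUM(sal) FROM emp
"""

# COALESCE plays the role of nvl(): replace the NULL markers with labels.
labelled = conn.execute(
    f"""
    SELECT COALESCE(CAST(deptno AS TEXT), 'ALL DEPTS') AS deptno,
           COALESCE(job, 'ALL JOBS') AS job,
           salary
    FROM ({ROLLUP_SQL})
    """
).fetchall()

for line in labelled:
    print(line)
```

One caveat with this kind of substitution: if the underlying column can itself contain NULL, a label alone cannot distinguish a genuine NULL group from a subtotal row.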
CUBE
This is an extension similar to rollup. The difference is that cube allows you to take a specified set of grouping columns and create subtotals for all possible combinations of them. The cube operation calculates all levels of subtotals on horizontal lines across spreadsheets of output and creates cross-tab summaries on multiple vertical columns in those spreadsheets. The result is a summary that shows subtotals for every combination of columns or expressions in the group by clause, which is also known as n-dimensional cross-tabulation. In the following example, notice how cube not only gives us the payroll breakdown of our company by DEPTNO and JOB, but also gives us the breakdown of payroll by JOB across all departments:
select deptno, job, sum(sal) as salary
from emp
group by cube(deptno, job);

DEPTNO  JOB        SALARY
------  ---------  ------
    10  CLERK        1300
    10  MANAGER      2450
    10  PRESIDENT    5000
    10               8750
    20  ANALYST      6000
    20  CLERK        1900
    20  MANAGER      2975
    20              10875
    30  CLERK         950
    30  MANAGER      2850
    30  SALESMAN     5600
    30               9400
        ANALYST      6000
        CLERK        4150
        MANAGER      8275
        PRESIDENT    5000
        SALESMAN     5600
                    29025
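To make "all possible combinations" concrete, here is a rough sketch that builds one GROUP BY query per subset of the grouping columns and joins them with UNION ALL. This is an emulation written for SQLite, which has no cube, and is not how Oracle implements the operation. The EMP rows are the assumed classic sample data, consistent with the totals shown above.

```python
import sqlite3
from itertools import combinations

# Assumed EMP sample rows as (deptno, job, sal).
EMP = [
    (10, "CLERK", 1300), (10, "MANAGER", 2450), (10, "PRESIDENT", 5000),
    (20, "ANALYST", 3000), (20, "ANALYST", 3000), (20, "CLERK", 800),
    (20, "CLERK", 1100), (20, "MANAGER", 2975),
    (30, "CLERK", 950), (30, "MANAGER", 2850), (30, "SALESMAN", 1600),
    (30, "SALESMAN", 1250), (30, "SALESMAN", 1250), (30, "SALESMAN", 1500),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (deptno INTEGER, job TEXT, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", EMP)

cols = ("deptno", "job")
branches = []
# One branch per subset of the grouping columns: 2 columns give 2**2 = 4 branches.
for size in range(len(cols), -1, -1):
    for subset in combinations(cols, size):
        # Columns left out of the subset are reported as NULL (the subtotal marker).
        select_list = ", ".join(c if c in subset else f"NULL AS {c}" for c in cols)
        group_by = f" GROUP BY {', '.join(subset)}" if subset else ""
        branches.append(
            f"SELECT {select_list}, SUM(sal) AS salary FROM emp{group_by}"
        )

cube_rows = conn.execute(" UNION ALL ".join(branches)).fetchall()
for line in cube_rows:
    print(line)
```

With two grouping columns this produces the detail rows, the per-department subtotals, the per-job subtotals, and the grand total. The number of branches doubles with every extra column, which is worth keeping in mind before cubing many columns.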
Excluding group data with having
Once the data is grouped using the group by statement, it is sometimes useful to weed out unwanted data. For example, let's say we want to list the average salary paid to employees in our company, broken down by department and job title. However, for this query, we only care about departments and job titles where the average salary is over $2000. In effect, we want to put a where clause on the group by clause to limit the results we see to departments and job titles where the average salary is greater than $2000. This effect can be achieved with the use of a special clause called the having clause, which is associated with group by statements. Take a look at an example of this clause:
select deptno, job, avg(sal)
from emp
group by deptno, job
having avg(sal) > 2000;

DEPTNO  JOB        AVG(SAL)
------  ---------  --------
    10  MANAGER        2450
    10  PRESIDENT      5000
    20  ANALYST        3000
    20  MANAGER        2975
    30  MANAGER        2850
Consider the output of this query for a moment. First, Oracle computes the average for every department and job title in the entire company. Then, the having clause eliminates departments and titles whose constituent employees’ average salary is $2000 or less. This selectivity cannot easily be accomplished with an ordinary where clause, because the where clause selects individual rows, whereas this example requires that groups of rows be selected. In this query, you successfully limit output on the group by rows by using the having clause.
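The difference between filtering rows and filtering groups is easier to see side by side. The sketch below uses SQLite as a stand-in and the assumed classic EMP sample data, consistent with the averages shown above. It first repeats the having query, then compares where and having at a threshold of 1500, a value chosen purely for illustration because the two clauses disagree there.

```python
import sqlite3

# Assumed EMP sample rows as (deptno, job, sal).
EMP = [
    (10, "CLERK", 1300), (10, "MANAGER", 2450), (10, "PRESIDENT", 5000),
    (20, "ANALYST", 3000), (20, "ANALYST", 3000), (20, "CLERK", 800),
    (20, "CLERK", 1100), (20, "MANAGER", 2975),
    (30, "CLERK", 950), (30, "MANAGER", 2850), (30, "SALESMAN", 1600),
    (30, "SALESMAN", 1250), (30, "SALESMAN", 1250), (30, "SALESMAN", 1500),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (deptno INTEGER, job TEXT, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", EMP)

# The query from the text: groups are formed first, then filtered.
having_2000 = conn.execute(
    "SELECT deptno, job, AVG(sal) FROM emp "
    "GROUP BY deptno, job HAVING AVG(sal) > 2000 ORDER BY deptno, job"
).fetchall()

# where filters individual rows: only the 1600 salesman survives,
# so the group's "average" becomes 1600.
where_1500 = conn.execute(
    "SELECT deptno, job, AVG(sal) FROM emp "
    "WHERE sal > 1500 GROUP BY deptno, job ORDER BY deptno, job"
).fetchall()

# having filters whole groups: the salesmen average 1400, so the group is dropped.
having_1500 = conn.execute(
    "SELECT deptno, job, AVG(sal) FROM emp "
    "GROUP BY deptno, job HAVING AVG(sal) > 1500 ORDER BY deptno, job"
).fetchall()

print(having_2000)
print(where_1500)
print(having_1500)
```

The salesmen in department 30 show up in the where result with a misleading average of 1600, while the having result leaves them out because their true group average is 1400.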