Compute Percentiles in Java

Computing percentiles is a foundational task in data analysis, because percentiles describe how the values of a numerical dataset are distributed. This article shows how to compute percentiles in Java.

1. Understanding Percentiles

A percentile is a value below which a given percentage of the observations in a dataset fall. For example, the 25th percentile (also known as the first quartile) is the value below which 25% of the observations fall. There are different methods to calculate percentiles, including the nearest rank method, the linear interpolation method, and more complex methods such as the Harrell-Davis quantile estimator. These methods can return slightly different values for the same data, so it helps to state which one is in use.
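As a point of comparison with the interpolation approach used later in this article, the nearest rank method can be sketched as follows. This is a minimal illustration, not a reference implementation: the class and method names (NearestRankPercentile, nearestRank) are our own, and the rule assumed here is rank = ceil(p / 100 * N), clamped to 1 so that the 0th percentile maps to the smallest value.

```java
import java.util.Arrays;

final class NearestRankPercentile {

    // Nearest-rank percentile: the smallest value such that at least
    // `percentile` percent of the sorted data is less than or equal to it.
    static double nearestRank(double[] data, double percentile) {
        if (data.length == 0) {
            throw new IllegalArgumentException("data must not be empty");
        }
        if (!(percentile >= 0 && percentile <= 100)) {
            throw new IllegalArgumentException("percentile must be between 0 and 100");
        }
        double[] sorted = data.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        // 1-based rank; clamped to 1 so the 0th percentile returns the minimum
        int rank = Math.max(1, (int) Math.ceil(percentile / 100.0 * sorted.length));
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        double[] data = {12, 15, 17, 20, 23, 25, 28, 30, 35, 40};
        // rank = ceil(0.75 * 10) = ceil(7.5) = 8, so the 8th smallest value is returned
        System.out.println(nearestRank(data, 75)); // prints 30.0
    }
}
```

Unlike interpolation, this method always returns a value that actually occurs in the data.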

Percentiles are widely used in fields such as finance, healthcare, and education. They help in comparing data sets, identifying outliers, and making informed decisions based on data distribution. Calculating them is a common step in statistical analysis, data mining, and machine learning. In Java, a percentile routine that accepts both arrays and collections is convenient to use with different data structures.

2. Calculating Percentile

Calculating a percentile involves determining the value below which a certain percentage of the data points in a dataset fall. The example below does this in Java, using linear interpolation between the two closest values in the sorted data.

package com.jcg.example;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.stream.Collectors;

public class PercentileCalculator {

    // Overload for primitive arrays: box the values into a List<Double> and delegate.
    // (Arrays.asList(double[]) would yield a List<double[]>, not a List<Double>.)
    public static double calculatePercentile(double[] data, double percentile) {
        return calculatePercentile(
                Arrays.stream(data).boxed().collect(Collectors.toList()), percentile);
    }

    // Calculates a percentile (0 to 100) from a collection using linear interpolation
    public static double calculatePercentile(Collection<Double> data, double percentile) {
        if (data.isEmpty()) {
            throw new IllegalArgumentException("data must not be empty");
        }
        // Written as a negation so that NaN is rejected as well
        if (!(percentile >= 0 && percentile <= 100)) {
            throw new IllegalArgumentException("percentile must be between 0 and 100");
        }
        double[] dataArray = data.stream().mapToDouble(Double::doubleValue).toArray();
        Arrays.sort(dataArray);
        // Fractional, 1-based rank of the requested percentile in the sorted data
        double n = (percentile / 100.0) * (dataArray.length - 1) + 1;
        if (n == 1) return dataArray[0];
        else if (n == dataArray.length) return dataArray[dataArray.length - 1];
        else {
            int k = (int) n;      // integer part: rank of the lower neighbour
            double d = n - k;     // fractional part: distance towards the upper neighbour
            return dataArray[k - 1] + d * (dataArray[k] - dataArray[k - 1]);
        }
    }

    public static void main(String[] args) {
        // Example with array
        double[] arrayData = {12, 15, 17, 20, 23, 25, 28, 30, 35, 40};
        double percentile = 75;
        double resultArray = calculatePercentile(arrayData, percentile);
        System.out.println("From Array: The " + percentile + "th percentile is: " + resultArray);

        // Example with collection
        Collection<Double> collectionData = new ArrayList<>();
        collectionData.add(12.0);
        collectionData.add(15.0);
        collectionData.add(17.0);
        collectionData.add(20.0);
        collectionData.add(23.0);
        collectionData.add(25.0);
        collectionData.add(28.0);
        collectionData.add(30.0);
        collectionData.add(35.0);
        collectionData.add(40.0);
        double resultCollection = calculatePercentile(collectionData, percentile);
        System.out.println("From Collection: The " + percentile + "th percentile is: " + resultCollection);
    }
}

2.1 Code Explanation

Below is the breakdown of the code:

  • Method Overloading: Two versions of the calculatePercentile method are defined to handle both arrays and collections.
  • calculatePercentile Method for Arrays: This method boxes the values of the primitive array into a list of Double objects and delegates the calculation to the other method.
  • calculatePercentile Method for Collections: This method calculates the percentile from a collection of numerical data.
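  • Conversion to Array: The collection is converted to a primitive double array using Java Streams so that it can be sorted.
  • Sorting and Calculation: The array is sorted, and the percentile is calculated by linear interpolation. The rank n = (percentile / 100) * (length - 1) + 1 is split into its integer part k and its fractional part d, and the result lies a fraction d of the way from the k-th to the (k+1)-th smallest value. When n is 1 or equals the array length, the first or last element is returned directly.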
  • Main Method: Entry point of the program where the execution starts. It demonstrates how to use the calculatePercentile method with both arrays and collections of sample data.
  • Example with Array: An array containing sample numerical data is defined, and the percentile is calculated.
  • Example with Collection: A collection containing sample numerical data is defined, and the percentile is calculated.
  • Output: The calculated percentiles are printed to the console for both the array and the collection.
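The interpolation step can also be written out by hand. The block below is a worked check rather than part of the original program: N, p, n, k and d mirror the variables in the code, while x_(i) is our own notation for the i-th smallest value (1-based). The numbers use the sample data from the main method, whose 7th and 8th smallest values are 28 and 30.

```latex
\begin{align*}
n &= \frac{p}{100}\,(N - 1) + 1, \qquad k = \lfloor n \rfloor, \qquad d = n - k \\
P_p &= x_{(k)} + d\,\bigl(x_{(k+1)} - x_{(k)}\bigr) \\[4pt]
\text{For } p = 75,\ N = 10:\quad
n &= 0.75 \cdot 9 + 1 = 7.75, \qquad k = 7, \qquad d = 0.75 \\
P_{75} &= x_{(7)} + 0.75\,\bigl(x_{(8)} - x_{(7)}\bigr) = 28 + 0.75\,(30 - 28) = 29.5
\end{align*}
```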

2.2 Code Output

Running the program prints the following. The percentile is shown as 75.0 because it is stored in a double.

From Array: The 75.0th percentile is: 29.5

From Collection: The 75.0th percentile is: 29.5
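To see how the same interpolation rule behaves for other percentiles, including the two boundary cases, the short sketch below prints the minimum, the three quartiles, and the maximum of the sample data. The class QuartileDemo and its percentile helper are hypothetical names introduced for this illustration. The helper restates the formula so that the example compiles on its own, and it assumes a non-empty input and a percentile between 0 and 100.

```java
import java.util.Arrays;

final class QuartileDemo {

    // Linear-interpolation percentile, restated from the formula used in this
    // article. Assumes data is non-empty and p lies between 0 and 100.
    static double percentile(double[] data, double p) {
        double[] sorted = data.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        double n = (p / 100.0) * (sorted.length - 1) + 1; // fractional 1-based rank
        if (n <= 1) return sorted[0];
        if (n >= sorted.length) return sorted[sorted.length - 1];
        int k = (int) n;   // rank of the lower neighbour
        double d = n - k;  // fraction of the way to the upper neighbour
        return sorted[k - 1] + d * (sorted[k] - sorted[k - 1]);
    }

    public static void main(String[] args) {
        double[] data = {12, 15, 17, 20, 23, 25, 28, 30, 35, 40};
        // Prints: 0.0 -> 12.0, 25.0 -> 17.75, 50.0 -> 24.0, 75.0 -> 29.5, 100.0 -> 40.0
        for (double p : new double[] {0, 25, 50, 75, 100}) {
            System.out.println(p + " -> " + percentile(data, p));
        }
    }
}
```

The 50th percentile (24.0) is the median of the ten values, halfway between the 5th and 6th smallest (23 and 25).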

3. Conclusion

In conclusion, calculating percentiles gives a deeper understanding of how numerical data is spread and of its key features. This knowledge supports informed decisions and more effective problem solving across many domains.

Yatin Batra

An experienced full-stack engineer well versed in Core Java, Spring/Springboot, MVC, Security, AOP, Frontend (Angular & React), and cloud technologies (such as AWS, GCP, Jenkins, Docker, K8).