Percentile Plot: Exploring the percentile distribution of numeric data
While histograms are great for showing the distribution of numeric data, they are not well suited to comparing different groups of data. To make distributions easier to compare across groups, Keshif features a unique percentile plot design, a variation of the box-and-whisker plot.
Keshif's percentile plot reveals the median (50th percentile) and other percentile ranges in your dataset. Keshif creates a box for every percentile range: (10-20), (20-30), (30-40), ..., (80-90). These boxes let you see the distribution of your data and whether it is skewed toward lower or upper values. Each percentile range includes roughly the same number of records (one-tenth of the data).
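To make the box construction concrete, here is a minimal Python sketch that computes the 10-20 through 80-90 percentile boxes for a list of values and counts the records in each. It uses the standard library's `statistics.quantiles` with the `"inclusive"` interpolation method; that choice is an assumption, since Keshif's exact percentile computation is not described here, and the `prices` list is hypothetical.

```python
import statistics

def percentile_boxes(values):
    """Return (low, high) value ranges for the 10-20, 20-30, ..., 80-90 percentile boxes."""
    # quantiles(n=10) yields the 9 decile cut points P10, P20, ..., P90.
    # The "inclusive" method is an assumption; other tools may interpolate differently.
    cuts = statistics.quantiles(values, n=10, method="inclusive")
    # Pair consecutive cut points: (P10, P20), (P20, P30), ..., (P80, P90).
    return list(zip(cuts, cuts[1:]))

def records_per_box(values, boxes):
    """Count the values falling in each half-open box [low, high)."""
    return [sum(1 for v in values if low <= v < high) for low, high in boxes]

prices = list(range(1, 101))  # hypothetical sales prices: 1, 2, ..., 100
boxes = percentile_boxes(prices)
for (low, high), count in zip(boxes, records_per_box(prices, boxes)):
    print(f"{low:6.1f} to {high:6.1f}: {count} records")
```

With 100 evenly spaced values, each of the eight boxes holds 10 records, one-tenth of the data; the values below P10 and above P90 fall outside the boxes.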
- Percentile plots show the distribution of data effectively, including across multiple data groups.
- Percentile plots reveal the median and other percentile characteristics of the data.
- Percentile plots present information in a compact space.
- However, they cannot show frequencies across binned ranges the way histograms do.
Let us consider the example below and observe detailed sales price trends across different categories.
To show or hide the percentile plot, click the chart configuration icon, and then click the percentile chart option.
Highlighted and compared record groups get their own percentile row in the plot, allowing direct comparison across groups and the complete dataset.
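One way to think about the per-group rows is that the same percentile computation runs once on all records and once on each group's records. The sketch below assumes records are plain dictionaries with hypothetical `category` and `sales` keys, and again uses `statistics.quantiles` as a stand-in for Keshif's own computation.

```python
import statistics
from collections import defaultdict

def percentile_rows(records, value_key, group_key):
    """Return decile cut points (P10..P90) for all records and for each group."""
    rows = {"All": statistics.quantiles(
        [r[value_key] for r in records], n=10, method="inclusive")}
    groups = defaultdict(list)
    for r in records:
        groups[r[group_key]].append(r[value_key])
    for name, values in groups.items():
        rows[name] = statistics.quantiles(values, n=10, method="inclusive")
    return rows

# Hypothetical orders; real data would have many more records per category.
records = (
    [{"category": "Technology", "sales": s} for s in (120, 250, 400, 610, 900, 1500)]
    + [{"category": "Furniture", "sales": s} for s in (90, 210, 380, 560, 820, 1300)]
    + [{"category": "Office Supplies", "sales": s} for s in (3, 8, 15, 22, 40, 95)]
)

for name, cuts in percentile_rows(records, "sales", "category").items():
    # cuts[4] is P50, the median of that row.
    print(f"{name:16} median = {cuts[4]:.1f}")
```

Because every row is computed the same way, rows can be compared directly even when the groups have very different sizes.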
You can view the median (50th percentile) and the percentile ranges by hovering the mouse over them. You can also filter the numeric data directly by clicking on a percentile range.
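Filtering by a percentile range amounts to keeping only the records whose value lies between that range's two cut points. The helper below is hypothetical, not Keshif's API, and it treats the range as half-open (lower bound included, upper bound excluded), which is an assumption.

```python
import statistics

def filter_by_percentile_range(records, value_key, low_pct, high_pct):
    """Keep records whose value lies in [P(low_pct), P(high_pct)).

    low_pct and high_pct are whole percentiles between 1 and 99.
    """
    if not 1 <= low_pct < high_pct <= 99:
        raise ValueError("expected 1 <= low_pct < high_pct <= 99")
    values = [r[value_key] for r in records]
    # quantiles(n=100) yields the 99 cut points P1, P2, ..., P99.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[low_pct - 1], cuts[high_pct - 1]
    return [r for r in records if low <= r[value_key] < high]

orders = [{"order_id": i, "sales": float(i)} for i in range(1, 101)]  # hypothetical data
selected = filter_by_percentile_range(orders, "sales", 20, 30)
print([r["order_id"] for r in selected])  # orders with sales 21 through 30
```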
Let's focus on a few specific observations:
The four rows in the percentile plot below the sales histogram are: 1) the total distribution, 2) the Technology group (orange), 3) the Furniture group (purple), and 4) the Office Supplies group (dark green).
Note that office supplies are significantly cheaper than furniture and technology sales, while furniture and technology sales are similar in overall distribution. This is harder to observe using the histogram alone.
The median of all sales prices is $55; that is, half of the products sold were cheaper than $55, and the other half were more expensive.
Similarly, we can observe that 20% of the sales were less than $14, and 30% were less than $21. Each percentile range contains about 1,000 sales (9,994/10 ≈ 999).
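Statements such as "20% of the sales were less than $14" read the plot in the other direction: given a price, what share of records falls below it? A minimal sketch using the standard `bisect` and `statistics` modules on hypothetical prices (not the dataset above):

```python
import bisect
import statistics

def share_below(values, threshold):
    """Fraction of values strictly below threshold."""
    ordered = sorted(values)
    # bisect_left returns how many sorted values are < threshold.
    return bisect.bisect_left(ordered, threshold) / len(ordered)

prices = [5, 9, 12, 14, 21, 40, 55, 80, 120, 300]  # hypothetical sales prices
print(share_below(prices, 14))     # 0.3: three of ten prices are below 14
print(statistics.median(prices))  # 30.5: half the prices lie on each side
print(9994 / 10)                  # 999.4: records per percentile range for 9,994 sales
```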
The percentile plot is not affected by the aggregation metric or view settings. For example, if you compute the average quantity of orders in different sales price ranges, the percentiles will still be based on the count and order of the records.
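To illustrate why the aggregation metric does not move the percentiles, the sketch below computes sales-price deciles from the records themselves and, separately, a histogram-style average quantity per price bin. The record fields, bin edges, and helper names are hypothetical.

```python
import statistics

def sales_deciles(records):
    # Each record counts once, based only on its sales value;
    # the quantity field plays no part in the cut points.
    return statistics.quantiles([r["sales"] for r in records], n=10, method="inclusive")

def average_quantity_by_bin(records, edges):
    """Mean quantity per sales-price bin [edges[i], edges[i + 1]); None for empty bins."""
    averages = []
    for low, high in zip(edges, edges[1:]):
        qty = [r["quantity"] for r in records if low <= r["sales"] < high]
        averages.append(statistics.mean(qty) if qty else None)
    return averages

# Hypothetical orders: (sales price, quantity)
orders = [{"sales": s, "quantity": q} for s, q in
          [(12, 1), (18, 3), (25, 2), (40, 5), (55, 2),
           (70, 4), (90, 1), (150, 6), (300, 2), (800, 3)]]
edges = [0, 50, 100, 1000]

# Scaling every quantity changes the aggregated histogram ...
bulk = [dict(r, quantity=r["quantity"] * 10) for r in orders]
print(average_quantity_by_bin(orders, edges)[0], average_quantity_by_bin(bulk, edges)[0])
# ... but leaves the percentile cut points untouched.
print(sales_deciles(orders) == sales_deciles(bulk))
```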