Introduction

The summarize operator in APL aggregates data and produces summary tables from large datasets. You can use it to group data by specified fields and apply aggregation functions such as count(), sum(), avg(), min(), max(), and many others. This is particularly useful when analyzing logs, OpenTelemetry traces, or security events, where reducing the granularity of a dataset helps you extract insights and trends.

For users of other query languages

If you come from other query languages, this section explains how to adjust your existing queries to achieve the same results in APL.

Usage

Syntax

| summarize [[Field1 =] AggregationFunction [, ...]] [by [Field2 =] GroupExpression [, ...]]

Parameters

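  • Field1: An optional name for the field that holds the result of the aggregation.
  • AggregationFunction: The aggregation function to apply. Examples include count(), sum(), avg(), min(), and max().
  • Field2: An optional name for the field that holds the result of the group expression.
  • GroupExpression: A scalar expression that can reference fields of the input dataset. Rows with the same value of this expression fall into the same group.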
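
As a sketch of how these parameters fit together, the query below names both the aggregates and the group field. The dataset and the method and req_duration_ms fields come from the examples on this page. The names request_count, max_duration, and http_method are illustrative assumptions, not fields that exist in the dataset.

```kusto
['sample-http-logs']
// request_count, max_duration, and http_method are hypothetical names chosen for this example
| summarize request_count = count(), max_duration = max(req_duration_ms) by http_method = method
```

Naming the aggregates this way can make later steps in a query easier to read than relying on default field names such as count_.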

Returns

The summarize operator returns a table where:

  • The input rows are arranged into groups having the same values of the by expressions.
  • The specified aggregation functions are computed over each group, producing a row for each group.
  • The result contains the by fields and also at least one field for each computed aggregate. Some aggregation functions return multiple fields.
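
Because the by clause is optional in the syntax, a query can also aggregate over the whole input. A minimal sketch, assuming that omitting by treats all input rows as a single group and therefore yields one result row:

```kusto
['sample-http-logs']
// No by clause: assumed to aggregate over every row in the dataset
| summarize count(), avg(req_duration_ms)
```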

Use case examples

In log analysis, you can use summarize to count the number of HTTP requests grouped by method, or to compute the average request duration.

Query

['sample-http-logs']
| summarize count() by method

Output

method    count_
GET       1000
POST      450

This query groups the HTTP requests by the method field and counts how many times each method is used.
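
The average request duration mentioned above can be computed the same way. A minimal sketch, assuming the req_duration_ms field used elsewhere on this page holds the request duration in milliseconds:

```kusto
['sample-http-logs']
// Average request duration for each HTTP method
| summarize avg(req_duration_ms) by method
```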

Other examples

['sample-http-logs']
| summarize topk(content_type, 20)

['github-push-event']
| summarize topk(repo, 20) by bin(_time, 24h)

The following query returns a table that shows a heatmap of req_duration_ms values grouped into intervals. The result has a cell for histogram(req_duration_ms).

['sample-http-logs']
| summarize histogram(req_duration_ms, 30)

['github-push-event']
| where _time > ago(7d)
| where repo contains "axiom"
| summarize count(), numCommits=sum(size) by _time=bin(_time, 3h), repo
| take 100

List of related operators

  • count: Use when you only need to count rows without grouping by specific fields.
  • extend: Use to add new calculated fields to a dataset.
  • project: Use to select specific fields or create new calculated fields, often in combination with summarize.