Aggregate - Maple Help

TimeSeriesAnalysis

 Aggregate
 aggregate time series data

Calling Sequence

 Aggregate(timeseries, frequency, opts)

Parameters

 timeseries - TimeSeries data set
 frequency - name or string or positive integer indicating the new frequency of the data
 opts - (optional) equation(s) of the form combine = combine_value, date = date_value, or options accepted by the TimeSeries constructor

Description

 • The Aggregate command takes a TimeSeries data set, and transforms it into a data set with a different (typically lower) frequency. For example, it transforms weekly into monthly data.
 • The Aggregate command examines the first and last time points for which data exist in timeseries. It then divides the intervening time into periods corresponding to the frequency argument; for example, if frequency is monthly, the periods run from the month that includes the first data point through the month that includes the last data point. The existing data for each period are combined in a way determined by the combine option; by default, the last data value from each period is used. Finally, a new TimeSeries object is created that contains the combined data, with associated time points chosen according to the date option.
 • The frequency parameter determines the new frequency of the data.
 – Possible values are the names hourly, daily, weekly, monthly, quarterly, yearly, or annual; or equivalently, the same as strings, "hourly", etc. The values yearly and annual are the same.
 – Another option is to use a positive integer for this parameter. This represents a time period of that number of seconds. For example, using 3600 would have the same result as hourly.
 • The combine option can take one of a few keyword values, or a user-supplied procedure. The keywords are max, min, sum, mean, weightedmean, first, and last; or equivalently, the same as strings, "max", etc. The name or string weightedmean can be indexed by the name first, last, or middle (but not by the corresponding strings - one cannot index strings by strings).
 – With combine = max or combine = min or combine = sum, the value for the new data set is the maximum or minimum or sum of the values for the given period, respectively.
 – With combine = mean or combine = weightedmean, the value for the new data set is the arithmetic mean of the values for the given period; in the first case, unweighted, and in the second case, weighted by the length of the interval. The index on weightedmean determines how intervals are assigned to dates:
 • If weightedmean is indexed by first, then it is assumed that the dates for the time series are the first in their period, and the weight of a value is the length of the period between the current date and the next one.
 • If weightedmean is indexed by last, then it is assumed that the dates for the time series are the last in their period, and the weight of a value is the length of the time period between the previous date and the current one.
 • If weightedmean is indexed by middle, or not indexed at all, then it is assumed that the period boundaries are in the middle between the specified dates, and the weight of a value is half the length of the time period between the previous date and the next one.
 – With combine = first or combine = last (the default), the value for the new data set is the first or last of the values for the given period, respectively.
 – When using combine = p or combine = [p, nodates] or combine = [p, dates], where p is a custom procedure and the names nodates and dates are included literally, the procedure p is called for every period. When using combine = p or combine = [p, nodates], it is called as p(v), and when using combine = [p, dates], it is called as p(v, d); the values of the parameters are as follows:
 • v is the Vector of the n data values for the given period; n is zero or more.
 • d is the Vector of n+2 dates corresponding to the data in v, and the last date before it and the first date after it, in order, expressed as the number of seconds elapsed since January 1st, 1970. For the purposes of this argument d, the sequence of dates in the time series is temporarily extended by an interval at the beginning equal to the first interval, and an interval at the end equal to the last interval.
 p should return the value to be used for the given period. This should be a value that can be stored in a Matrix with datatype float. If it throws an error for any value, $\mathrm{undefined}$ will be used (that is, the data will be considered missing).
 – For all of the combining methods, if a period contains no data points, the result for that period is obtained by applying the given procedure to a Vector with zero entries. In particular, max will use $-\mathrm{\infty }$, min will use $\mathrm{\infty }$, sum will use $0$, and mean, first, and last will all use $\mathrm{undefined}$.
 • The date option (which can also be spelled dates) can have the values first, last, or default; or equivalently, the same as strings, "first", etc. This option determines the time points with which the given periods are marked in the resulting TimeSeries.
 – With dates = first, the resulting data are considered to occur at the first second of the given period.
 – With dates = last, the resulting data are considered to occur at the last second of the given period.
 – The option dates = default (the default) works like dates = last, unless combine = first is given, in which case it works like dates = first.
 • You can specify extra options, such as period or headers, that are accepted by the TimeSeries constructor. These are passed on to that constructor.
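
The steps described above (split the time span into periods, combine the values in each period, then label each period) can be sketched outside Maple. The following Python code is an illustration only, not Maple's implementation. The function names combine_values and aggregate_seconds are hypothetical, periods are assumed to be aligned to multiples of the period length counted from time zero, NaN stands in for undefined, and only the integer-seconds form of frequency and the unweighted keywords are covered.

```python
import math


def combine_values(values, how="last"):
    """Combine one period's values following the keyword rules described above."""
    if how == "max":
        return max(values, default=-math.inf)   # empty period -> -infinity
    if how == "min":
        return min(values, default=math.inf)    # empty period -> infinity
    if how == "sum":
        return float(sum(values))               # empty period -> 0
    if not values:
        return math.nan                         # mean/first/last: stands in for undefined
    if how == "mean":
        return sum(values) / len(values)
    if how == "first":
        return values[0]
    if how == "last":
        return values[-1]
    raise ValueError(f"unknown combine keyword: {how}")


def aggregate_seconds(times, values, period, how="last", dates="default"):
    """Aggregate (time, value) pairs into periods of `period` seconds.

    `times` are in seconds, in increasing order. One (label, value) pair is
    returned for every period from the one containing the first point to the
    one containing the last point, including periods without any data.
    """
    if dates == "default":
        # default behaves like "last", unless combine is "first"
        dates = "first" if how == "first" else "last"
    first_bucket = times[0] // period
    last_bucket = times[-1] // period
    buckets = {b: [] for b in range(first_bucket, last_bucket + 1)}
    for t, v in zip(times, values):
        buckets[t // period].append(v)
    result = []
    for b, vals in buckets.items():
        start = b * period
        # Assumption: "last second" is taken to be the final whole second of the period.
        label = start if dates == "first" else start + period - 1
        result.append((label, combine_values(vals, how)))
    return result


# Hourly aggregation (3600 seconds); no data point falls in the third hour.
times = [0, 1800, 3600, 12000]
values = [1.0, 2.0, 3.0, 4.0]
for how in ("sum", "max", "last", "first"):
    print(how, aggregate_seconds(times, values, 3600, how))
```

The empty third hour shows the difference between the keywords: sum contributes 0 and max contributes negative infinity, while last and first yield NaN, the sketch's stand-in for missing data.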

Examples

 > $\mathrm{with}\left(\mathrm{TimeSeriesAnalysis}\right):$
 > $\mathrm{sales_numbers}≔⟨150,147,114,113,91,164,56,39,32,86,91,125,100,106,88,151,90,104,86,103,96,77,90,94,88,86,87,113,100,93,97,99,95,92,81,89,71,110,127,105⟩$
 > $\mathrm{sales}≔\mathrm{TimeSeries}\left(\mathrm{sales_numbers},\mathrm{startdate}="2010-01-01",\mathrm{frequency}="weekly",\mathrm{header}="Weekly Sales"\right)$
 ${\mathrm{sales}}{≔}\left[\begin{array}{c}{\mathrm{Time series}}\\ {\mathrm{Weekly Sales}}\\ {\mathrm{40 rows of data:}}\\ {\mathrm{2010-01-01 - 2010-10-01}}\end{array}\right]$ (1)
 > ${\mathrm{GetData}\left(\mathrm{sales}\right)}_{\left(\right)..5}$
 $\left[\begin{array}{c}150.0\\ 147.0\\ 114.0\\ 113.0\\ 91.0\end{array}\right]$ (2)
 > $\mathrm{TimeSeriesPlot}\left(\mathrm{sales}\right)$

Let's compute approximate total sales per month. (Approximate, because weeks don't fully line up with months.)

 > $\mathrm{total_monthly_sales}≔\mathrm{Aggregate}\left(\mathrm{sales},\mathrm{monthly},\mathrm{combine}=\mathrm{sum},\mathrm{header}="Approx. Total Monthly Sales"\right)$
 ${\mathrm{total_monthly_sales}}{≔}\left[\begin{array}{c}{\mathrm{Time series}}\\ {\mathrm{Approx. Total Monthly Sales}}\\ {\mathrm{10 rows of data:}}\\ {\mathrm{2010-01-31 - 2010-10-31}}\end{array}\right]$ (3)
 > ${\mathrm{GetData}\left(\mathrm{total_monthly_sales}\right)}_{\left(\right)..5}$
 $\left[\begin{array}{c}615.0\\ 291.0\\ 402.0\\ 539.0\\ 362.0\end{array}\right]$ (4)
 > ${\mathrm{GetDates}\left(\mathrm{total_monthly_sales}\right)}_{\left(\right)..5}$
 $\left[\begin{array}{c}{"2010-01-31"}\\ {"2010-02-28"}\\ {"2010-03-31"}\\ {"2010-04-30"}\\ {"2010-05-31"}\end{array}\right]$ (5)
 > $\mathrm{TimeSeriesPlot}\left(\mathrm{sales},\mathrm{total_monthly_sales}\right)$

If we use the dates = first option, the data for each month is the same, but it is marked as occurring on the first of the month, rather than the last.

 > $\mathrm{total_monthly_sales_2}≔\mathrm{Aggregate}\left(\mathrm{sales},\mathrm{monthly},\mathrm{combine}=\mathrm{sum},\mathrm{dates}=\mathrm{first},\mathrm{header}="Approx. Total Monthly Sales"\right)$
 ${\mathrm{total_monthly_sales_2}}{≔}\left[\begin{array}{c}{\mathrm{Time series}}\\ {\mathrm{Approx. Total Monthly Sales}}\\ {\mathrm{10 rows of data:}}\\ {\mathrm{2010-01-01 - 2010-10-01}}\end{array}\right]$ (6)
 > ${\mathrm{GetData}\left(\mathrm{total_monthly_sales_2}\right)}_{\left(\right)..5}$
 $\left[\begin{array}{c}615.0\\ 291.0\\ 402.0\\ 539.0\\ 362.0\end{array}\right]$ (7)
 > ${\mathrm{GetDates}\left(\mathrm{total_monthly_sales_2}\right)}_{\left(\right)..5}$
 $\left[\begin{array}{c}{"2010-01-01"}\\ {"2010-02-01"}\\ {"2010-03-01"}\\ {"2010-04-01"}\\ {"2010-05-01"}\end{array}\right]$ (8)
 > $\mathrm{TimeSeriesPlot}\left(\mathrm{sales},\mathrm{total_monthly_sales},\mathrm{total_monthly_sales_2}\right)$

The weekly sales dates start on January 1st, 2010, which was a Friday. Some months have more Fridays than others, which will skew the total monthly sales. This effect is somewhat mitigated if we compute the average sales per week for each month, rather than the total sales.

 > $\mathrm{weekly_sales_avg_per_month}≔\mathrm{Aggregate}\left(\mathrm{sales},\mathrm{monthly},\mathrm{combine}=\mathrm{mean},\mathrm{header}="Approx. Weekly Sales, Avg. per Month"\right)$
 ${\mathrm{weekly_sales_avg_per_month}}{≔}\left[\begin{array}{c}{\mathrm{Time series}}\\ {\mathrm{Approx. Weekly Sales, Avg. per Month}}\\ {\mathrm{10 rows of data:}}\\ {\mathrm{2010-01-31 - 2010-10-31}}\end{array}\right]$ (9)
 > ${\mathrm{GetData}\left(\mathrm{weekly_sales_avg_per_month}\right)}_{\left(\right)..5}$
 $\left[\begin{array}{c}123.0\\ 72.75\\ 100.5\\ 107.8\\ 90.5\end{array}\right]$ (10)
 > $\mathrm{TimeSeriesPlot}\left(\mathrm{sales},\mathrm{weekly_sales_avg_per_month}\right)$
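
As a cross-check of the monthly totals in (4), the period labels in (5), and the averages in (10), the grouping can be reproduced by hand. The following Python sketch (Python rather than Maple, for illustration only) assigns the first 22 weekly values to calendar months, assuming each weekly date falls seven days after the previous one, starting on 2010-01-01. It does not call or imitate any Maple internals.

```python
import calendar
from datetime import date, timedelta

# First 22 weekly values from sales_numbers (January through May 2010).
sales_numbers = [150, 147, 114, 113, 91, 164, 56, 39, 32, 86, 91, 125, 100,
                 106, 88, 151, 90, 104, 86, 103, 96, 77]

start = date(2010, 1, 1)  # a Friday
weekly_dates = [start + timedelta(weeks=i) for i in range(len(sales_numbers))]

# Group the values by (year, month) of their date.
by_month = {}
for d, v in zip(weekly_dates, sales_numbers):
    by_month.setdefault((d.year, d.month), []).append(v)

# Label each month by its last day, as in the default date labeling for sum.
labels = [date(y, m, calendar.monthrange(y, m)[1]).isoformat() for (y, m) in by_month]
counts = [len(vals) for vals in by_month.values()]
sums = [sum(vals) for vals in by_month.values()]
means = [sum(vals) / len(vals) for vals in by_month.values()]

for row in zip(labels, counts, sums, means):
    print(*row)
```

Under these assumptions the sums and means for January through May agree with (4) and (10), and the counts show which months contain five Fridays (January and April) rather than four.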

Alternatively, we can make a time series of the inter-week variability per month; in particular, the standard deviation of weekly sales between the weeks in one month. We can do this by using the command Statistics:-StandardDeviation, which accepts a Vector and returns the standard deviation of the values in that Vector. This is exactly the format we need for a custom combine procedure.

 > $\mathrm{sales_variability}≔\mathrm{Aggregate}\left(\mathrm{sales},\mathrm{monthly},\mathrm{combine}=\mathrm{Statistics}:-\mathrm{StandardDeviation},\mathrm{header}="Sales Variability"\right)$
 ${\mathrm{sales_variability}}{≔}\left[\begin{array}{c}{\mathrm{Time series}}\\ {\mathrm{Sales Variability}}\\ {\mathrm{10 rows of data:}}\\ {\mathrm{2010-01-31 - 2010-10-31}}\end{array}\right]$ (11)
 > $\mathrm{TimeSeriesPlot}\left(\mathrm{sales},\mathrm{weekly_sales_avg_per_month},\mathrm{sales_variability}\right)$
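
The calling convention for custom combine procedures, p(v) versus p(v, d), can be made concrete with a small sketch. The Python code below is an illustration under stated assumptions, not Maple code: the helper names period_arguments, call_combine, value_range, and weighted_by_next_gap are hypothetical, dates are plain numbers of seconds, a period is taken to be the half-open range from start up to but not including end, and NaN stands in for undefined. It builds the n values and the n+2 dates for one period, using the temporary one-interval extension at each end of the date sequence, and turns any error raised by the procedure into a missing value.

```python
import math
from bisect import bisect_left


def period_arguments(dates, values, start, end):
    """Return (v, d) for the period start <= t < end.

    v holds the n values in the period; d holds their n dates plus the
    neighbouring date on each side, so len(d) == len(v) + 2.
    """
    # Temporarily extend the dates by one interval at each end.
    extended = ([dates[0] - (dates[1] - dates[0])] + list(dates)
                + [dates[-1] + (dates[-1] - dates[-2])])
    lo = bisect_left(dates, start)
    hi = bisect_left(dates, end)
    # dates[i] sits at extended[i + 1], so this slice adds one neighbour per side.
    return list(values[lo:hi]), extended[lo:hi + 2]


def call_combine(p, v, d=None):
    """Call p(v) or p(v, d); any exception becomes NaN (standing in for undefined)."""
    try:
        return float(p(v) if d is None else p(v, d))
    except Exception:
        return math.nan


def value_range(v):
    """Example p(v): spread of the values; raises on an empty period."""
    return max(v) - min(v)


def weighted_by_next_gap(v, d):
    """Example p(v, d): weight v[i] by the time from its date to the next date."""
    weights = [d[i + 2] - d[i + 1] for i in range(len(v))]
    return sum(w * x for w, x in zip(weights, v)) / sum(weights)


dates = [0, 10, 30, 40, 100]
values = [1.0, 2.0, 4.0, 8.0, 16.0]
for start, end in [(0, 50), (50, 100), (100, 150)]:
    v, d = period_arguments(dates, values, start, end)
    print(v, d, call_combine(value_range, v), call_combine(weighted_by_next_gap, v, d))
```

The middle period contains no data, so v is empty and d holds only the two neighbouring dates; both example procedures raise an error there, and the wrapper reports the value as missing.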

Now let's look at a time series that consists of monthly data to begin with.

 > $\mathrm{results}≔⟨15,30,26,43,39,76,9,74,47,65,51,18,47,49,49,47,56,51,52,52,59,43,49,77,69,87,67,62,55,42⟩$
 > $\mathrm{ts}≔\mathrm{TimeSeries}\left(\mathrm{results},\mathrm{startdate}="2010-01-01",\mathrm{frequency}="monthly",\mathrm{header}="Results"\right)$
 ${\mathrm{ts}}{≔}\left[\begin{array}{c}{\mathrm{Time series}}\\ {\mathrm{Results}}\\ {\mathrm{30 rows of data:}}\\ {\mathrm{2010-01-01 - 2012-06-01}}\end{array}\right]$ (12)
 > ${\mathrm{GetData}\left(\mathrm{ts}\right)}_{\left(\right)..5}$
 $\left[\begin{array}{c}15.0\\ 30.0\\ 26.0\\ 43.0\\ 39.0\end{array}\right]$ (13)
 > $\mathrm{TimeSeriesPlot}\left(\mathrm{ts}\right)$

If we want to aggregate this to quarterly data by averaging, it might make sense to use a weighted mean: unlike weeks, months are not all the same length. The dates for the time series use the first day of each month, so we use weightedmean[first]. We also compute the unweighted mean for comparison.

 > $\mathrm{umean_results}≔\mathrm{Aggregate}\left(\mathrm{ts},\mathrm{quarterly},\mathrm{combine}=\mathrm{mean},\mathrm{header}="Unweighted"\right)$
 ${\mathrm{umean_results}}{≔}\left[\begin{array}{c}{\mathrm{Time series}}\\ {\mathrm{Unweighted}}\\ {\mathrm{10 rows of data:}}\\ {\mathrm{2010-03-31 - 2012-06-30}}\end{array}\right]$ (14)
 > $\mathrm{wmean_results}≔\mathrm{Aggregate}\left(\mathrm{ts},\mathrm{quarterly},\mathrm{combine}={\mathrm{weightedmean}}_{\mathrm{first}},\mathrm{header}="Weighted"\right)$
 ${\mathrm{wmean_results}}{≔}\left[\begin{array}{c}{\mathrm{Time series}}\\ {\mathrm{Weighted}}\\ {\mathrm{10 rows of data:}}\\ {\mathrm{2010-03-31 - 2012-06-30}}\end{array}\right]$ (15)

The difference between the results is quite small, though. We can see a difference when looking at the actual data, but not in the plot.

 > ${\mathrm{GetData}\left(\mathrm{umean_results}\right)}_{\left(\right)..5}$
 $\left[\begin{array}{c}23.666666666666668\\ 52.666666666666664\\ 43.333333333333336\\ 44.666666666666664\\ 48.333333333333336\end{array}\right]$ (16)
 > ${\mathrm{GetData}\left(\mathrm{wmean_results}\right)}_{\left(\right)..5}$
 $\left[\begin{array}{c}23.45437702640111\\ 52.51648351648352\\ 43.29347826086956\\ 44.600724309642374\\ 48.310792033348775\end{array}\right]$ (17)
 > $\mathrm{TimeSeriesPlot}\left(\mathrm{ts},\mathrm{umean_results},\mathrm{wmean_results}\right)$
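
The second entries of (16) and (17) can be checked by hand. The second quarter of 2010 contains the values 43, 39, and 76, dated April 1, May 1, and June 1. With weightedmean[first], each value is weighted by the time until the next date; the worked arithmetic below assumes those weights are simply the month lengths of 30, 31, and 30 days.

```latex
\begin{align*}
\text{unweighted:}\quad & \frac{43 + 39 + 76}{3} = \frac{158}{3} \approx 52.6667 \\
\text{weighted (first):}\quad & \frac{30\cdot 43 + 31\cdot 39 + 30\cdot 76}{30 + 31 + 30} = \frac{4779}{91} \approx 52.5165
\end{align*}
```

Both values agree with the output shown. For the first quarter, day-based weights of 31, 28, and 31 would give 2111/90 ≈ 23.4556 rather than the 23.4544 shown in (17). The shown value is reproduced if March is counted as one hour shorter (weights of 744, 672, and 743 hours give 50638/2159 ≈ 23.4544), which would be consistent with weights measured in seconds of local time across a daylight-saving change. That explanation is an inference from the numbers, not something this page states.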

Compatibility

 • The TimeSeriesAnalysis[Aggregate] command was introduced in Maple 2015.