Introduction to PHEindicatormethods

Georgina Anderson

2024-01-25

Introduction

This vignette introduces the following functions from the PHEindicatormethods package and provides basic sample code to demonstrate their execution. The code included is based on the code provided within the ‘examples’ section of the function documentation. This vignette does not explain the methods applied in detail but these can (optionally) be output alongside the statistics or for a more detailed explanation, please see the references section of the function documentation.

The following packages must be installed and loaded if not already available

library(PHEindicatormethods)
library(dplyr)

Package functions

This vignette covers the following functions available within the first release of the package (v1.0.8) but has been updated to apply to these functions in their latest release versions. If further functions are added to the package in future releases these will be explained elsewhere.

Function Type Description
phe_proportion Non-aggregate Performs a calculation on each row of data (unless data is grouped)
phe_rate Non-aggregate Performs a calculation on each row of data (unless data is grouped)
phe_mean Aggregate Performs a calculation on each grouping set
phe_dsr Aggregate, standardised Performs a calculation on each grouping set and requires additional reference inputs
calculate_ISRatio Aggregate, standardised Performs a calculation on each grouping set and requires additional reference inputs
calculate_ISRate Aggregate, standardised Performs a calculation on each grouping set and requires additional reference inputs

Non-aggregate functions

Create some test data for the non-aggregate functions

The following code chunk creates a data frame containing observed number of events and populations for 4 geographical areas over 2 time periods that is used later to demonstrate the PHEindicatormethods package functions:

df <- data.frame(
        area = rep(c("Area1","Area2","Area3","Area4"), 2),
        year = rep(2015:2016, each = 4),
        obs = sample(100, 2 * 4, replace = TRUE),
        pop = sample(100:200, 2 * 4, replace = TRUE))
df
#>    area year obs pop
#> 1 Area1 2015  77 196
#> 2 Area2 2015   3 140
#> 3 Area3 2015  12 173
#> 4 Area4 2015   5 183
#> 5 Area1 2016  70 195
#> 6 Area2 2016  65 148
#> 7 Area3 2016  54 168
#> 8 Area4 2016  16 188

Execute phe_proportion and phe_rate

INPUT: The phe_proportion and phe_rate functions take a single data frame as input with columns representing the numerators and denominators for the statistic. Any other columns present will be retained in the output.

OUTPUT: The functions output the original data frame with additional columns appended. By default the additional columns are the proportion or rate, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: The functions also accept additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.

Here are some example code chunks to demonstrate these two functions and the arguments that can optionally be specified

# default proportion
phe_proportion(df, obs, pop)
#>    area year obs pop      value     lowercl    uppercl confidence
#> 1 Area1 2015  77 196 0.39285714 0.327173007 0.46266039        95%
#> 2 Area2 2015   3 140 0.02142857 0.007314003 0.06110479        95%
#> 3 Area3 2015  12 173 0.06936416 0.040121258 0.11731614        95%
#> 4 Area4 2015   5 183 0.02732240 0.011725747 0.06235556        95%
#> 5 Area1 2016  70 195 0.35897436 0.294968065 0.42842966        95%
#> 6 Area2 2016  65 148 0.43918919 0.361774515 0.51968079        95%
#> 7 Area3 2016  54 168 0.32142857 0.255479343 0.39536161        95%
#> 8 Area4 2016  16 188 0.08510638 0.053063732 0.13376480        95%
#>         statistic method
#> 1 proportion of 1 Wilson
#> 2 proportion of 1 Wilson
#> 3 proportion of 1 Wilson
#> 4 proportion of 1 Wilson
#> 5 proportion of 1 Wilson
#> 6 proportion of 1 Wilson
#> 7 proportion of 1 Wilson
#> 8 proportion of 1 Wilson

# specify confidence level for proportion
phe_proportion(df, obs, pop, confidence=99.8)
#>    area year obs pop      value     lowercl    uppercl confidence
#> 1 Area1 2015  77 196 0.39285714 0.292449398 0.50322029      99.8%
#> 2 Area2 2015   3 140 0.02142857 0.004313174 0.09966265      99.8%
#> 3 Area3 2015  12 173 0.06936416 0.029566868 0.15421632      99.8%
#> 4 Area4 2015   5 183 0.02732240 0.007549329 0.09398057      99.8%
#> 5 Area1 2016  70 195 0.35897436 0.261701252 0.46941522      99.8%
#> 6 Area2 2016  65 148 0.43918919 0.320634897 0.56511532      99.8%
#> 7 Area3 2016  54 168 0.32142857 0.222297357 0.43976878      99.8%
#> 8 Area4 2016  16 188 0.08510638 0.040616613 0.16970803      99.8%
#>         statistic method
#> 1 proportion of 1 Wilson
#> 2 proportion of 1 Wilson
#> 3 proportion of 1 Wilson
#> 4 proportion of 1 Wilson
#> 5 proportion of 1 Wilson
#> 6 proportion of 1 Wilson
#> 7 proportion of 1 Wilson
#> 8 proportion of 1 Wilson

# specify to output proportions as percentages
phe_proportion(df, obs, pop, multiplier=100)
#>    area year obs pop     value    lowercl   uppercl confidence  statistic
#> 1 Area1 2015  77 196 39.285714 32.7173007 46.266039        95% percentage
#> 2 Area2 2015   3 140  2.142857  0.7314003  6.110479        95% percentage
#> 3 Area3 2015  12 173  6.936416  4.0121258 11.731614        95% percentage
#> 4 Area4 2015   5 183  2.732240  1.1725747  6.235556        95% percentage
#> 5 Area1 2016  70 195 35.897436 29.4968065 42.842966        95% percentage
#> 6 Area2 2016  65 148 43.918919 36.1774515 51.968079        95% percentage
#> 7 Area3 2016  54 168 32.142857 25.5479343 39.536161        95% percentage
#> 8 Area4 2016  16 188  8.510638  5.3063732 13.376480        95% percentage
#>   method
#> 1 Wilson
#> 2 Wilson
#> 3 Wilson
#> 4 Wilson
#> 5 Wilson
#> 6 Wilson
#> 7 Wilson
#> 8 Wilson

# specify level of detail to output for proportion
phe_proportion(df, obs, pop, confidence=99.8, multiplier=100)
#>    area year obs pop     value    lowercl   uppercl confidence  statistic
#> 1 Area1 2015  77 196 39.285714 29.2449398 50.322029      99.8% percentage
#> 2 Area2 2015   3 140  2.142857  0.4313174  9.966265      99.8% percentage
#> 3 Area3 2015  12 173  6.936416  2.9566868 15.421632      99.8% percentage
#> 4 Area4 2015   5 183  2.732240  0.7549329  9.398057      99.8% percentage
#> 5 Area1 2016  70 195 35.897436 26.1701252 46.941522      99.8% percentage
#> 6 Area2 2016  65 148 43.918919 32.0634897 56.511532      99.8% percentage
#> 7 Area3 2016  54 168 32.142857 22.2297357 43.976878      99.8% percentage
#> 8 Area4 2016  16 188  8.510638  4.0616613 16.970803      99.8% percentage
#>   method
#> 1 Wilson
#> 2 Wilson
#> 3 Wilson
#> 4 Wilson
#> 5 Wilson
#> 6 Wilson
#> 7 Wilson
#> 8 Wilson

# specify level of detail to output for proportion and remove metadata columns
phe_proportion(df, obs, pop, confidence=99.8, multiplier=100, type="standard")
#>    area year obs pop     value    lowercl   uppercl
#> 1 Area1 2015  77 196 39.285714 29.2449398 50.322029
#> 2 Area2 2015   3 140  2.142857  0.4313174  9.966265
#> 3 Area3 2015  12 173  6.936416  2.9566868 15.421632
#> 4 Area4 2015   5 183  2.732240  0.7549329  9.398057
#> 5 Area1 2016  70 195 35.897436 26.1701252 46.941522
#> 6 Area2 2016  65 148 43.918919 32.0634897 56.511532
#> 7 Area3 2016  54 168 32.142857 22.2297357 43.976878
#> 8 Area4 2016  16 188  8.510638  4.0616613 16.970803

# default rate
phe_rate(df, obs, pop)
#>    area year obs pop     value    lowercl   uppercl confidence       statistic
#> 1 Area1 2015  77 196 39285.714 31002.5473 49101.036        95% rate per 100000
#> 2 Area2 2015   3 140  2142.857   441.9087  6262.338        95% rate per 100000
#> 3 Area3 2015  12 173  6936.416  3580.0635 12117.259        95% rate per 100000
#> 4 Area4 2015   5 183  2732.240   887.1510  6376.138        95% rate per 100000
#> 5 Area1 2016  70 195 35897.436 27982.6159 45354.905        95% rate per 100000
#> 6 Area2 2016  65 148 43918.919 33894.0458 55979.148        95% rate per 100000
#> 7 Area3 2016  54 168 32142.857 24145.0777 41940.269        95% rate per 100000
#> 8 Area4 2016  16 188  8510.638  4861.4409 13821.510        95% rate per 100000
#>   method
#> 1  Byars
#> 2  Exact
#> 3  Byars
#> 4  Exact
#> 5  Byars
#> 6  Byars
#> 7  Byars
#> 8  Byars

# specify rate parameters
phe_rate(df, obs, pop, confidence=99.8, multiplier=100)
#>    area year obs pop     value    lowercl   uppercl confidence    statistic
#> 1 Area1 2015  77 196 39.285714 26.8789289 55.195935      99.8% rate per 100
#> 2 Area2 2015   3 140  2.142857  0.1360953  9.330172      99.8% rate per 100
#> 3 Area3 2015  12 173  6.936416  2.3123561 15.653734      99.8% rate per 100
#> 4 Area4 2015   5 183  2.732240  0.4040283  8.991664      99.8% rate per 100
#> 5 Area1 2016  70 195 35.897436 24.0728005 51.247396      99.8% rate per 100
#> 6 Area2 2016  65 148 43.918919 28.9731496 63.513710      99.8% rate per 100
#> 7 Area3 2016  54 168 32.142857 20.2857877 48.105200      99.8% rate per 100
#> 8 Area4 2016  16 188  8.510638  3.3866809 17.378534      99.8% rate per 100
#>   method
#> 1  Byars
#> 2  Exact
#> 3  Byars
#> 4  Exact
#> 5  Byars
#> 6  Byars
#> 7  Byars
#> 8  Byars

# specify rate parameters and reduce columns output and remove metadata columns
phe_rate(df, obs, pop, type="standard", confidence=99.8, multiplier=100)
#>    area year obs pop     value    lowercl   uppercl
#> 1 Area1 2015  77 196 39.285714 26.8789289 55.195935
#> 2 Area2 2015   3 140  2.142857  0.1360953  9.330172
#> 3 Area3 2015  12 173  6.936416  2.3123561 15.653734
#> 4 Area4 2015   5 183  2.732240  0.4040283  8.991664
#> 5 Area1 2016  70 195 35.897436 24.0728005 51.247396
#> 6 Area2 2016  65 148 43.918919 28.9731496 63.513710
#> 7 Area3 2016  54 168 32.142857 20.2857877 48.105200
#> 8 Area4 2016  16 188  8.510638  3.3866809 17.378534

These functions can also return aggregate data if the input dataframes are grouped:

# default proportion - grouped
df %>%
  group_by(year) %>%
  phe_proportion(obs, pop)
#> # A tibble: 2 × 9
#> # Groups:   year [2]
#>    year   obs   pop value lowercl uppercl confidence statistic       method
#>   <int> <int> <int> <dbl>   <dbl>   <dbl> <chr>      <chr>           <chr> 
#> 1  2015    97   692 0.140   0.116   0.168 95%        proportion of 1 Wilson
#> 2  2016   205   699 0.293   0.261   0.328 95%        proportion of 1 Wilson

# default rate - grouped
df %>%
  group_by(year) %>%
  phe_rate(obs, pop)
#> # A tibble: 2 × 9
#> # Groups:   year [2]
#>    year   obs   pop  value lowercl uppercl confidence statistic       method
#>   <int> <int> <int>  <dbl>   <dbl>   <dbl> <chr>      <chr>           <chr> 
#> 1  2015    97   692 14017.  11367.  17100. 95%        rate per 100000 Byars 
#> 2  2016   205   699 29328.  25450.  33629. 95%        rate per 100000 Byars



Aggregate functions

The remaining functions aggregate the rows in the input data frame to produce a single statistic. It is also possible to calculate multiple statistics in a single execution of these functions if the input data frame is grouped - for example by indicator ID, geographic area or time period (or all three). The output contains only the grouping variables and the values calculated by the function - any additional unused columns provided in the input data frame will not be retained in the output.

The df test data generated earlier can be used to demonstrate phe_mean:

Execute phe_mean

INPUT: The phe_mean function take a single data frame as input with a column representing the numbers to be averaged.

OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values (if applicable), the mean, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: The function also accepts additional arguments to specify the level of confidence and a reduced level of detail to be output.

Here are some example code chunks to demonstrate the phe_mean function and the arguments that can optionally be specified

# default mean
phe_mean(df,obs)
#>   value_sum value_count    stdev value  lowercl  uppercl confidence statistic
#> 1       302           8 31.63068 37.75 11.30609 64.19391        95%      mean
#>                     method
#> 1 Student's t-distribution

# multiple means in a single execution with 99.8% confidence
df %>%
    group_by(year) %>%
        phe_mean(obs, confidence=0.998)
#> # A tibble: 2 × 10
#> # Groups:   year [2]
#>    year value_sum value_count stdev value lowercl uppercl confidence statistic
#>   <int>     <int>       <int> <dbl> <dbl>   <dbl>   <dbl> <chr>      <chr>    
#> 1  2015        97           4  35.4  24.2  -156.     205. 99.8%      mean     
#> 2  2016       205           4  24.4  51.2   -73.5    176. 99.8%      mean     
#> # ℹ 1 more variable: method <chr>

# multiple means in a single execution with 99.8% confidence and data-only output
df %>%
    group_by(year) %>%
        phe_mean(obs, type = "standard", confidence=0.998)
#> # A tibble: 2 × 7
#> # Groups:   year [2]
#>    year value_sum value_count stdev value lowercl uppercl
#>   <int>     <int>       <int> <dbl> <dbl>   <dbl>   <dbl>
#> 1  2015        97           4  35.4  24.2  -156.     205.
#> 2  2016       205           4  24.4  51.2   -73.5    176.

Standardised Aggregate functions

Create some test data for the standardised aggregate functions

The following code chunk creates a data frame containing observed number of events and populations by age band for 4 areas, 5 time periods and 2 sexes:

df_std <- data.frame(
            area = rep(c("Area1", "Area2", "Area3", "Area4"), each = 19 * 2 * 5),
            year = rep(2006:2010, each = 19 * 2),
            sex = rep(rep(c("Male", "Female"), each = 19), 5),
            ageband = rep(c(0, 5,10,15,20,25,30,35,40,45,
                           50,55,60,65,70,75,80,85,90), times = 10),
            obs = sample(200, 19 * 2 * 5 * 4, replace = TRUE),
            pop = sample(10000:20000, 19 * 2 * 5 * 4, replace = TRUE))
head(df_std)
#>    area year  sex ageband obs   pop
#> 1 Area1 2006 Male       0  44 15092
#> 2 Area1 2006 Male       5  15 10418
#> 3 Area1 2006 Male      10 187 13220
#> 4 Area1 2006 Male      15  89 11665
#> 5 Area1 2006 Male      20  21 15458
#> 6 Area1 2006 Male      25 139 14885

Execute phe_dsr

INPUT: The minimum input requirement for the phe_dsr function is a single data frame with columns representing the numerators and denominators for each standardisation category. This is sufficient if the data is:

The 2013 European Standard Population is provided within the package in vector form (esp2013) and is used by default by this function. Alternative standard populations can be used but must be provided by the user. When the function joins a standard population vector to the input data frame it does this by position so it is important that the data is sorted accordingly. This is a user responsibility.

The function can also accept standard populations provided as a column within the input data frame.

OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values, the total count, the total population, the dsr, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: If standard populations are being provided as a column within the input data frame then the user must specify this using the stdpoptype argument as the function expects a vector by default. The function also accepts additional arguments to specify the standard populations, the level of confidence, the multiplier and a reduced level of detail to be output.

Here are some example code chunks to demonstrate the phe_dsr function and the arguments that can optionally be specified

# calculate separate dsrs for each area, year and sex
df_std %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop)
#> # A tibble: 40 × 11
#> # Groups:   area, year, sex [40]
#>    area   year sex    total_count total_pop value lowercl uppercl confidence
#>    <chr> <int> <chr>        <int>     <int> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Female        1614    278486  632.    600.    666. 95%       
#>  2 Area1  2006 Male          1637    263156  608.    577.    641. 95%       
#>  3 Area1  2007 Female        1584    287686  554.    525.    584. 95%       
#>  4 Area1  2007 Male          1975    286970  723.    688.    758. 95%       
#>  5 Area1  2008 Female        2166    271994  827.    790.    864. 95%       
#>  6 Area1  2008 Male          1690    297058  584.    554.    615. 95%       
#>  7 Area1  2009 Female        1674    299153  653.    620.    687. 95%       
#>  8 Area1  2009 Male          2049    279695  784.    748.    822. 95%       
#>  9 Area1  2010 Female        2141    267544  782.    746.    819. 95%       
#> 10 Area1  2010 Male          1435    280798  554.    524.    585. 95%       
#> # ℹ 30 more rows
#> # ℹ 2 more variables: statistic <chr>, method <chr>

# calculate separate dsrs for each area, year and sex and drop metadata fields from output
df_std %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop, type="standard")
#> # A tibble: 40 × 8
#> # Groups:   area, year, sex [40]
#>    area   year sex    total_count total_pop value lowercl uppercl
#>    <chr> <int> <chr>        <int>     <int> <dbl>   <dbl>   <dbl>
#>  1 Area1  2006 Female        1614    278486  632.    600.    666.
#>  2 Area1  2006 Male          1637    263156  608.    577.    641.
#>  3 Area1  2007 Female        1584    287686  554.    525.    584.
#>  4 Area1  2007 Male          1975    286970  723.    688.    758.
#>  5 Area1  2008 Female        2166    271994  827.    790.    864.
#>  6 Area1  2008 Male          1690    297058  584.    554.    615.
#>  7 Area1  2009 Female        1674    299153  653.    620.    687.
#>  8 Area1  2009 Male          2049    279695  784.    748.    822.
#>  9 Area1  2010 Female        2141    267544  782.    746.    819.
#> 10 Area1  2010 Male          1435    280798  554.    524.    585.
#> # ℹ 30 more rows

# calculate same specifying standard population in vector form
df_std %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop, stdpop = esp2013)
#> # A tibble: 40 × 11
#> # Groups:   area, year, sex [40]
#>    area   year sex    total_count total_pop value lowercl uppercl confidence
#>    <chr> <int> <chr>        <int>     <int> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Female        1614    278486  632.    600.    666. 95%       
#>  2 Area1  2006 Male          1637    263156  608.    577.    641. 95%       
#>  3 Area1  2007 Female        1584    287686  554.    525.    584. 95%       
#>  4 Area1  2007 Male          1975    286970  723.    688.    758. 95%       
#>  5 Area1  2008 Female        2166    271994  827.    790.    864. 95%       
#>  6 Area1  2008 Male          1690    297058  584.    554.    615. 95%       
#>  7 Area1  2009 Female        1674    299153  653.    620.    687. 95%       
#>  8 Area1  2009 Male          2049    279695  784.    748.    822. 95%       
#>  9 Area1  2010 Female        2141    267544  782.    746.    819. 95%       
#> 10 Area1  2010 Male          1435    280798  554.    524.    585. 95%       
#> # ℹ 30 more rows
#> # ℹ 2 more variables: statistic <chr>, method <chr>

# calculate the same dsrs by appending the standard populations to the data frame
df_std %>%
    mutate(refpop = rep(esp2013,40)) %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs,pop, stdpop=refpop, stdpoptype="field")
#> # A tibble: 40 × 11
#> # Groups:   area, year, sex [40]
#>    area   year sex    total_count total_pop value lowercl uppercl confidence
#>    <chr> <int> <chr>        <int>     <int> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Female        1614    278486  632.    600.    666. 95%       
#>  2 Area1  2006 Male          1637    263156  608.    577.    641. 95%       
#>  3 Area1  2007 Female        1584    287686  554.    525.    584. 95%       
#>  4 Area1  2007 Male          1975    286970  723.    688.    758. 95%       
#>  5 Area1  2008 Female        2166    271994  827.    790.    864. 95%       
#>  6 Area1  2008 Male          1690    297058  584.    554.    615. 95%       
#>  7 Area1  2009 Female        1674    299153  653.    620.    687. 95%       
#>  8 Area1  2009 Male          2049    279695  784.    748.    822. 95%       
#>  9 Area1  2010 Female        2141    267544  782.    746.    819. 95%       
#> 10 Area1  2010 Male          1435    280798  554.    524.    585. 95%       
#> # ℹ 30 more rows
#> # ℹ 2 more variables: statistic <chr>, method <chr>

# calculate for under 75s by filtering out records for 75+ from input data frame and standard population
df_std %>%
    filter(ageband <= 70) %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop, stdpop = esp2013[1:15])
#> # A tibble: 40 × 11
#> # Groups:   area, year, sex [40]
#>    area   year sex    total_count total_pop value lowercl uppercl confidence
#>    <chr> <int> <chr>        <int>     <int> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Female        1306    224154  618.    584.    654. 95%       
#>  2 Area1  2006 Male          1182    207056  585.    552.    620. 95%       
#>  3 Area1  2007 Female        1191    225394  556.    524.    588. 95%       
#>  4 Area1  2007 Male          1590    222293  748.    710.    786. 95%       
#>  5 Area1  2008 Female        1791    216099  848.    809.    889. 95%       
#>  6 Area1  2008 Male          1287    226874  577.    545.    611. 95%       
#>  7 Area1  2009 Female        1449    239270  672.    636.    709. 95%       
#>  8 Area1  2009 Male          1515    226984  757.    718.    797. 95%       
#>  9 Area1  2010 Female        1603    209767  790.    751.    830. 95%       
#> 10 Area1  2010 Male          1294    220315  590.    558.    624. 95%       
#> # ℹ 30 more rows
#> # ℹ 2 more variables: statistic <chr>, method <chr>
    
# calculate separate dsrs for persons for each area and year)
df_std %>%
    group_by(area, year, ageband) %>%
    summarise(obs = sum(obs),
              pop = sum(pop),
              .groups = "drop_last") %>%
    phe_dsr(obs,pop)
#> # A tibble: 20 × 10
#> # Groups:   area, year [20]
#>    area   year total_count total_pop value lowercl uppercl confidence statistic 
#>    <chr> <int>       <int>     <int> <dbl>   <dbl>   <dbl> <chr>      <chr>     
#>  1 Area1  2006        3251    541642  602.    581.    624. 95%        dsr per 1…
#>  2 Area1  2007        3559    574656  635.    612.    657. 95%        dsr per 1…
#>  3 Area1  2008        3856    569052  698.    674.    721. 95%        dsr per 1…
#>  4 Area1  2009        3723    578848  668.    646.    691. 95%        dsr per 1…
#>  5 Area1  2010        3576    548342  665.    642.    689. 95%        dsr per 1…
#>  6 Area2  2006        3359    580193  572.    552.    594. 95%        dsr per 1…
#>  7 Area2  2007        4308    552969  759.    735.    784. 95%        dsr per 1…
#>  8 Area2  2008        4134    556799  759.    734.    784. 95%        dsr per 1…
#>  9 Area2  2009        3310    545631  651.    628.    674. 95%        dsr per 1…
#> 10 Area2  2010        3935    562442  714.    690.    738. 95%        dsr per 1…
#> 11 Area3  2006        4071    519900  789.    763.    815. 95%        dsr per 1…
#> 12 Area3  2007        3673    555876  709.    685.    734. 95%        dsr per 1…
#> 13 Area3  2008        4390    580990  756.    732.    781. 95%        dsr per 1…
#> 14 Area3  2009        3604    554507  649.    627.    671. 95%        dsr per 1…
#> 15 Area3  2010        3357    546450  669.    646.    693. 95%        dsr per 1…
#> 16 Area4  2006        4103    607716  694.    672.    717. 95%        dsr per 1…
#> 17 Area4  2007        3673    602685  632.    611.    654. 95%        dsr per 1…
#> 18 Area4  2008        4003    564326  693.    671.    717. 95%        dsr per 1…
#> 19 Area4  2009        3056    552974  563.    541.    584. 95%        dsr per 1…
#> 20 Area4  2010        3525    578959  590.    570.    611. 95%        dsr per 1…
#> # ℹ 1 more variable: method <chr>

Execute calculate_ISRatio and calculate_ISRate

INPUT: Unlike the phe_dsr function, there is no default standard or reference data for the calculate_ISRatio and calculate_ISRate functions. These functions take a single data frame as input, with columns representing the numerators and denominators for each standardisation category, plus reference numerators and denominators for each standardisation category.

The reference data can either be provided in a separate data frame/vectors or as columns within the input data frame:

OUTPUT: By default, the functions output one row per grouping set containing the grouping variable values, the observed and expected counts, the reference rate (ISRate only), the indirectly standardised rate or ratio, the lower 95% confidence limit, and the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: If reference data are being provided as columns within the input data frame then the user must specify this as the function expects vectors by default. The function also accepts additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.

The following code chunk creates a data frame containing the reference data - this example uses the all area data for persons in the baseline year:

df_ref <- df_std %>%
    filter(year == 2006) %>%
    group_by(ageband) %>%
    summarise(obs = sum(obs),
              pop = sum(pop),
              .groups = "drop_last")
    
head(df_ref)
#> # A tibble: 6 × 3
#>   ageband   obs    pop
#>     <dbl> <int>  <int>
#> 1       0  1008 130955
#> 2       5   703 125062
#> 3      10   758 115161
#> 4      15   848 113157
#> 5      20   656 119685
#> 6      25   844 105940

Here are some example code chunks to demonstrate the calculate_ISRatio function and the arguments that can optionally be specified

# calculate separate smrs for each area, year and sex
# standardised against the all-year, all-sex, all-area reference data
df_std %>%
    group_by(area, year, sex) %>%
    calculate_ISRatio(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 × 11
#> # Groups:   area, year, sex [40]
#>    area   year sex    observed expected value lowercl uppercl confidence
#>    <chr> <int> <chr>     <int>    <dbl> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Female     1614    1836. 0.879   0.837   0.923 95%       
#>  2 Area1  2006 Male       1637    1761. 0.929   0.885   0.976 95%       
#>  3 Area1  2007 Female     1584    1919. 0.825   0.785   0.867 95%       
#>  4 Area1  2007 Male       1975    1911. 1.03    0.989   1.08  95%       
#>  5 Area1  2008 Female     2166    1764. 1.23    1.18    1.28  95%       
#>  6 Area1  2008 Male       1690    1968. 0.859   0.818   0.901 95%       
#>  7 Area1  2009 Female     1674    1995. 0.839   0.800   0.880 95%       
#>  8 Area1  2009 Male       2049    1848. 1.11    1.06    1.16  95%       
#>  9 Area1  2010 Female     2141    1780. 1.20    1.15    1.25  95%       
#> 10 Area1  2010 Male       1435    1857. 0.773   0.733   0.814 95%       
#> # ℹ 30 more rows
#> # ℹ 2 more variables: statistic <chr>, method <chr>

# calculate the same smrs by appending the reference data to the data frame
# and drop metadata columns from output
df_std %>%
    mutate(refobs = rep(df_ref$obs,40),
           refpop = rep(df_ref$pop,40)) %>%
    group_by(area, year, sex) %>%
    calculate_ISRatio(obs, pop, refobs, refpop, refpoptype="field",
                      type = "standard")
#> # A tibble: 40 × 8
#> # Groups:   area, year, sex [40]
#>    area   year sex    observed expected value lowercl uppercl
#>    <chr> <int> <chr>     <int>    <dbl> <dbl>   <dbl>   <dbl>
#>  1 Area1  2006 Female     1614    1836. 0.879   0.837   0.923
#>  2 Area1  2006 Male       1637    1761. 0.929   0.885   0.976
#>  3 Area1  2007 Female     1584    1919. 0.825   0.785   0.867
#>  4 Area1  2007 Male       1975    1911. 1.03    0.989   1.08 
#>  5 Area1  2008 Female     2166    1764. 1.23    1.18    1.28 
#>  6 Area1  2008 Male       1690    1968. 0.859   0.818   0.901
#>  7 Area1  2009 Female     1674    1995. 0.839   0.800   0.880
#>  8 Area1  2009 Male       2049    1848. 1.11    1.06    1.16 
#>  9 Area1  2010 Female     2141    1780. 1.20    1.15    1.25 
#> 10 Area1  2010 Male       1435    1857. 0.773   0.733   0.814
#> # ℹ 30 more rows

The calculate_ISRate function works exactly the same way but instead of expressing the result as a ratio of the observed and expected rates the result is expressed as a rate and the reference rate is also provided. Here are some examples:

# calculate separate indirectly standardised rates for each area, year and sex
# standardised against the all-year, all-sex, all-area reference data
df_std %>%
    group_by(area, year, sex) %>%
    calculate_ISRate(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 × 12
#> # Groups:   area, year, sex [40]
#>    area   year sex   observed expected ref_rate value lowercl uppercl confidence
#>    <chr> <int> <chr>    <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema…     1614    1836.     657.  578.    550.    607. 95%       
#>  2 Area1  2006 Male      1637    1761.     657.  611.    582.    641. 95%       
#>  3 Area1  2007 Fema…     1584    1919.     657.  542.    516.    570. 95%       
#>  4 Area1  2007 Male      1975    1911.     657.  679.    650.    710. 95%       
#>  5 Area1  2008 Fema…     2166    1764.     657.  807.    773.    842. 95%       
#>  6 Area1  2008 Male      1690    1968.     657.  564.    538.    592. 95%       
#>  7 Area1  2009 Fema…     1674    1995.     657.  552.    525.    579. 95%       
#>  8 Area1  2009 Male      2049    1848.     657.  729.    698.    761. 95%       
#>  9 Area1  2010 Fema…     2141    1780.     657.  791.    757.    825. 95%       
#> 10 Area1  2010 Male      1435    1857.     657.  508.    482.    535. 95%       
#> # ℹ 30 more rows
#> # ℹ 2 more variables: statistic <chr>, method <chr>

# calculate the same indirectly standardised rates by appending the reference data to the data frame
# and drop metadata columns from output
df_std %>%
    mutate(refobs = rep(df_ref$obs,40),
           refpop = rep(df_ref$pop,40)) %>%
    group_by(area, year, sex) %>%
    calculate_ISRate(obs, pop, refobs, refpop, refpoptype="field",
                     type = "standard")
#> # A tibble: 40 × 9
#> # Groups:   area, year, sex [40]
#>    area   year sex    observed expected ref_rate value lowercl uppercl
#>    <chr> <int> <chr>     <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl>
#>  1 Area1  2006 Female     1614    1836.     657.  578.    550.    607.
#>  2 Area1  2006 Male       1637    1761.     657.  611.    582.    641.
#>  3 Area1  2007 Female     1584    1919.     657.  542.    516.    570.
#>  4 Area1  2007 Male       1975    1911.     657.  679.    650.    710.
#>  5 Area1  2008 Female     2166    1764.     657.  807.    773.    842.
#>  6 Area1  2008 Male       1690    1968.     657.  564.    538.    592.
#>  7 Area1  2009 Female     1674    1995.     657.  552.    525.    579.
#>  8 Area1  2009 Male       2049    1848.     657.  729.    698.    761.
#>  9 Area1  2010 Female     2141    1780.     657.  791.    757.    825.
#> 10 Area1  2010 Male       1435    1857.     657.  508.    482.    535.
#> # ℹ 30 more rows