Get Data Out technical documentation: Incidence

Background

The incidence counts and crude incidence rates are calculated for each Get Data Out grouping. Crude rates are helpful in determining the cancer burden and specific needs for services for a given population, compared with another population, regardless of size.

The NCRAS SOP - Crude incidence rates, age specific rates and ASRs was followed. This technical documentation should be read together with the SOP as it provides additional information about the snapshots that were used and the code that was run.

1. Populations of interest

The populations of interest were defined as in the documents below which are available on the Get Data Out website:

Data for 2013-2019 are included. The documents explain the partition of cases into groups. Incidence counts, crude rates and confidence intervals were produced for all these groups.

2. Counting the number of cancer cases

The number of cancer cases was taken from AV2019.at_tumour_england on CASREF01. The CAS SOP #1 - Counting Cancer Cases was followed, with one additional exclusion: for tumours whose ICD code starts with D, testicular tumours in female patients and ovarian tumours in male patients were removed. (The SOP handles C-coded tumours but does not yet perform any exclusion on D-coded tumours; for our dataset, sex exclusions are performed on all tumours.)
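The additional exclusion can be illustrated with a minimal Python sketch. It is not the extraction code referred to in this document: the field names (icd_code, sex, site), the sex coding ("M"/"F") and the placeholder codes "Cxx" and "Dxx" are all hypothetical and chosen only for illustration.

```python
# Hypothetical record layout: each tumour is a dict with
# "icd_code", "sex" ("M" or "F") and "site" ("testis", "ovary", ...).
IMPOSSIBLE_SITE_FOR_SEX = {("testis", "F"), ("ovary", "M")}


def keep_tumour(record):
    """Return False for D-coded tumours whose site cannot occur in the recorded sex."""
    if not record["icd_code"].upper().startswith("D"):
        # C-coded tumours are assumed to have been handled by the SOP already.
        return True
    return (record["site"], record["sex"]) not in IMPOSSIBLE_SITE_FOR_SEX


# Illustrative records only; "Cxx" and "Dxx" are placeholders, not real codes.
tumours = [
    {"icd_code": "Dxx", "sex": "F", "site": "testis"},  # excluded
    {"icd_code": "Dxx", "sex": "M", "site": "ovary"},   # excluded
    {"icd_code": "Dxx", "sex": "F", "site": "ovary"},   # kept
    {"icd_code": "Cxx", "sex": "M", "site": "testis"},  # kept (not D-coded)
]
kept = [t for t in tumours if keep_tumour(t)]
```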

The code used to extract the cohort (and organise it by cancer type) is in the following documents:

3. Counting At Risk populations

The number of people in the At Risk population was taken from ONS2019.POPULATIONS_NORMALISED on CASREF01. The code above also extracts the populations.

4. Calculating the Crude Rates and Confidence Intervals

The crude rates and their confidence intervals were calculated in RStudio using phe_rate from the PHEindicatormethods package. For numerators equal to or greater than 10, Byar’s method is applied. For smaller numerators, Byar’s method is less accurate, so an exact method based on the Poisson distribution is used. This follows the APHO Technical Briefing 3 - Commonly Used Public Health Statistics and their Confidence Intervals, which notes that the rates can be approximately described by a Poisson distribution and confidence intervals around them produced accordingly: a precise interval from Poisson functions for small observed counts, and Byar’s approximation for larger numbers. This methodology agrees with the NCRAS SOP - Crude incidence rates, age specific rates and ASRs.
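The two interval methods can be sketched in Python to show how the switch at a numerator of 10 works. This is an illustration only, not the package code that was run: the function names are hypothetical, and both the 95% confidence level and the multiplier of 100,000 are assumptions, as neither is stated here.

```python
import math
from statistics import NormalDist


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu), summed term by term."""
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total


def bisect_decreasing(f, lo, hi, steps=200):
    """Find the root of a decreasing function f on [lo, hi] by bisection."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_poisson_limits(count, confidence=0.95):
    """Exact Poisson limits for an observed count (used for small counts)."""
    tail = (1 - confidence) / 2
    hi = count + 10 * math.sqrt(count + 1) + 20  # generous search bracket
    if count == 0:
        lower = 0.0
    else:
        # Lower limit: the mean at which P(X >= count) equals the tail area.
        lower = bisect_decreasing(
            lambda mu: tail - (1 - poisson_cdf(count - 1, mu)), 0.0, hi)
    # Upper limit: the mean at which P(X <= count) equals the tail area.
    upper = bisect_decreasing(lambda mu: poisson_cdf(count, mu) - tail, 0.0, hi)
    return lower, upper


def byar_limits(count, confidence=0.95):
    """Byar's approximation to the Poisson limits for an observed count."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    lower = count * (1 - 1 / (9 * count) - z / (3 * math.sqrt(count))) ** 3
    c1 = count + 1
    upper = c1 * (1 - 1 / (9 * c1) + z / (3 * math.sqrt(c1))) ** 3
    return lower, upper


def crude_rate(count, population, multiplier=100_000, confidence=0.95):
    """Return (rate, lower, upper) per `multiplier` people at risk."""
    # Byar's method for counts of 10 or more, exact Poisson otherwise.
    limits = byar_limits if count >= 10 else exact_poisson_limits
    lower, upper = limits(count, confidence)
    scale = multiplier / population
    return count * scale, lower * scale, upper * scale
```

For example, crude_rate(100, 1_000_000) uses Byar’s method and gives a rate of 10 per 100,000 with limits of roughly 8.1 to 12.2, while crude_rate(5, 1_000_000) falls below the threshold and uses the exact Poisson limits instead.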

5. Quality Assurance

The Quality Assurance (QA) was done internally by the Get Data Out team.

For any questions, please contact sally.vernon1@nhs.net, charlotte.eversfield@nhs.net or hannah.maconochie1@nhs.net.