Get Data Out technical documentation: Survival

Background

Survival estimates are calculated for each Get Data Out (GDO) grouping. GDO survival is calculated following the UK and Ireland Association of Cancer Registries (UKIACR) Guidelines on Population Based Cancer Survival Analysis SOP and NDRS Cancer Survival Methodology. These are the same guidelines used to produce the National Statistics on cancer survival in England.

This technical documentation should be read together with the UKIACR SOP and Cancer Survival Methodology as they provide additional information on methodology and data quality checks. This document explains any differences in methodology between the GDO survival analysis and the National Statistics survival approach.

Population of interest

The population of interest was selected as the first tumour for patients diagnosed with a cancer listed in Table 1 between 2013 and 2019 in England. The patient information used for the survival analysis was extracted from AV2019.AV_TUMOUR_ENGLAND on CASREF01 (and from the corresponding end-of-year snapshot for skin tumours) in the Cancer Analysis System (CAS).

To ensure high quality data, the inclusion and exclusion criteria set out on pages 14 to 15 of the Data collection and quality assurance of administrative data publication and on pages 11 to 12 of the Guidelines on Population Based Cancer Survival Analysis SOP were followed. In contrast to the National Statistics survival approach, individuals under the age of 15 with tumours are not excluded from GDO survival estimates.

Patients were excluded from survival analysis if they had a previous primary cancer of the same type diagnosed before the period of interest. This is because, if a patient has two or more cancers of the same type, it is not clear whether survival time from that type of cancer should be measured from the first or a later diagnosis. What counts as a previous cancer of the same type differs by GDO site, and some single GDO cancer sites were separated into multiple sites for survival analysis. These details can be found in Table 1.

Data were extracted for tumours diagnosed from 1971 to 2019 (1995 to 2019 for skin tumours) to determine each patient's first primary cancer of a particular Get Data Out grouping. For the majority of Get Data Out site groups, a single definition was used to define the grouping for all diagnosis years (1971 to 2019). For some site groups, the classification system or codes used to define the Get Data Out group were not applicable to cancers diagnosed prior to 2013. For these groups, a pre-2013 site definition was created so that previous tumours of the same type could be identified accurately both for diagnoses prior to 2013 and for diagnoses from 2013 onwards. These are outlined in Table 1.
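The exclusion rule described above can be sketched in Python. This is only an illustration under stated assumptions: the Tumour record, its field names and the select_cohort function are hypothetical, tumours are assumed to have already been assigned to a survival grouping, and diagnosis dates are reduced to years. It is not the extraction code used for GDO.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tumour:
    # Hypothetical record layout, for illustration only.
    patient_id: str
    group: str            # survival grouping label (assumed already assigned)
    diagnosis_year: int


def select_cohort(tumours, group, start=2013, end=2019):
    """Return each patient's first tumour in `group` if it was diagnosed in
    the period of interest; patients whose first tumour of that group was
    diagnosed before the period are dropped."""
    by_patient = {}
    for t in tumours:
        if t.group == group:
            by_patient.setdefault(t.patient_id, []).append(t)

    cohort = []
    for records in by_patient.values():
        first = min(records, key=lambda t: t.diagnosis_year)
        # An earlier same-group tumour means `first` falls before `start`,
        # so the patient is excluded from the analysis.
        if start <= first.diagnosis_year <= end:
            cohort.append(first)
    return cohort
```

In this sketch, a patient with tumours of the same group in 2005 and 2015 is excluded, while a patient with tumours in 2014 and 2016 contributes only the 2014 tumour.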

Notes:

Please review the GDO site specific grouping documents for more comprehensive explanations and definitions on groupings.

*Please review the Bladder grouping document for a more comprehensive definition of bladder 2013 onwards

**Sarcomas in site C48 are excluded, defined as ICD-O-2 code

***C48 restricted to sarcomas, defined as ICD-O-2 code

Cancer in adults is defined using the International Statistical Classification of Diseases 10th Revision (ICD-10) and by morphology and behaviour codes in the International Classification of Diseases for Oncology, Third Edition (ICD-O-3). Get Data Out metrics include some cancers whose ICD-10 codes are not classified as malignant in ICD-10.

Survival analysis methodology

GDO survival data present two measures of survival for each grouping: Kaplan-Meier estimates and Net survival estimates at 3, 6, 9, 12 (1 year), 24 (2 years), 36 (3 years), 48 (4 years), 60 (5 years), 72 (6 years) and 84 (7 years) months.

The Kaplan-Meier and Net survival methods are briefly explained below. A more extensive explanation can be found in the Guidelines on Population Based Cancer Survival Analysis SOP.

The National Statistics use the ‘complete’ approach for adult survival estimates (individuals aged 15 to 99 years), whilst the ‘cohort’ and ‘period’ approaches are used for childhood survival estimates (individuals under the age of 15). All survival estimates published in GDO use the ‘complete’ approach, where some patients may have been followed up for less than the full period. The complete approach is used for consistency because some GDO groups contain both adults and children. (For a discussion of other survival methodology see pages 15 to 18 of NDRS Cancer Survival Methodology.)

Estimates were calculated in Stata 17.

Kaplan-Meier survival

The Kaplan-Meier estimator is used to measure the proportion of patients living to a certain amount of time after diagnosis. It is a non-parametric method, which calculates the cumulative probability of “all cause” survival. This estimator accounts for the total amount of time for which patients are alive after diagnosis and also for those patients who are lost to follow-up. The time between diagnosis and last known vital status date is the available survival time.

The variance of the Kaplan-Meier estimator was estimated using Greenwood’s formula. A 95% confidence interval was obtained using a complementary log-log transformation to constrain the limits to be between 0% and 100%.
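The calculation described above can be illustrated with a minimal Python sketch. It is not the Stata implementation used for GDO; the function name kaplan_meier and its inputs are hypothetical. It computes the product-limit estimate, the Greenwood variance sum, and a complementary log-log confidence interval at each death time.

```python
import math


def kaplan_meier(times, events, z=1.96):
    """times: follow-up time per patient; events: 1 = died, 0 = censored.
    Returns a list of (time, survival, lower, upper) at each death time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    greenwood_sum = 0.0   # running sum of d / (n * (n - d))
    results = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            surv *= 1 - deaths / n_at_risk
            if n_at_risk > deaths:
                greenwood_sum += deaths / (n_at_risk * (n_at_risk - deaths))
            if 0 < surv < 1:
                # Standard error of log(-log S) from Greenwood's formula.
                se = math.sqrt(greenwood_sum) / abs(math.log(surv))
                lower = surv ** math.exp(z * se)
                upper = surv ** math.exp(-z * se)
            else:
                # Interval is not defined when the estimate is exactly 0 or 1.
                lower = upper = surv
            results.append((t, surv, lower, upper))
        n_at_risk -= leaving
    return results
```

Because the interval is built on the log(-log S) scale and transformed back, its limits always fall between 0 and 1, which is the property the transformation is chosen for.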

Estimates were obtained in Stata using the sts list command.

Net survival

Net survival estimates the survival of cancer patients compared with the background mortality that patients would have experienced if they had not been diagnosed with cancer. Net survival is a variant of relative survival that is preferred as a measure of cancer survival in adults because it is an unbiased estimator. The Pohar-Perme estimator is used to create an unbiased estimate of net survival that accounts for informative censoring bias.

Survival greater than 100% can occur if the survival experience in cancer patients is greater than the survival experience of the general population. For example, a high proportion of breast cancers are screen-detected and women who attend screening have on average better health status, therefore are less likely to die from non-cancer causes than the general population.
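The way an estimate can exceed 100% is easiest to see with the simple ratio form of relative survival, sketched below in Python. This is deliberately not the Pohar-Perme estimator used for GDO, which weights individual patients over time; the function name and the figures in the usage note are made up for illustration.

```python
def relative_survival(observed, expected):
    """Ratio of the observed all-cause survival of a patient cohort to the
    expected survival of a matched general population (simple ratio form,
    not the Pohar-Perme estimator)."""
    if not 0 < expected <= 1:
        raise ValueError("expected survival must be in (0, 1]")
    if not 0 <= observed <= 1:
        raise ValueError("observed survival must be in [0, 1]")
    return observed / expected
```

For example, with a hypothetical observed survival of 0.97 and a hypothetical expected survival of 0.96, the ratio is above 1, which corresponds to a survival estimate greater than 100%.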

Estimates were calculated using the stns command in Stata 17.

Lifetables

The background mortality for the general population is derived from population life tables produced and published by NCRAS. When using these life tables, the mortality of cancer patients is compared with that of individuals in the general population who belong to the same single year of age (0 to 99 years), sex, population weighted quintile of the index of multiple deprivation (IMD) and region. Age is capped at 99 in order to align with the ICSS ages. For children (individuals aged 0 to 14) survival is calculated by setting the rate of death in the general population lifetables to be 0, as the assumption is made that a death of a child within 10 years of a cancer diagnosis is almost always due to their cancer diagnosis.
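The matching described above can be sketched as a lookup in Python. The dictionary layout, key order, function name and rates are all hypothetical and do not reflect the format of the published life tables; the sketch only shows the age cap at 99 and the zero background rate for children.

```python
def expected_mortality_rate(life_table, age, sex, imd_quintile, region):
    """Look up the background mortality rate for one patient.

    life_table is assumed to be a dict keyed by
    (single year of age, sex, IMD quintile, region); this layout is
    hypothetical.
    """
    if age < 15:
        # Children: the general population death rate is set to 0.
        return 0.0
    capped_age = min(age, 99)   # ages above 99 use the rate for age 99
    return life_table[(capped_age, sex, imd_quintile, region)]
```

With this lookup, a patient aged 105 is matched to the age 99 rate for their sex, deprivation quintile and region, and any patient aged 0 to 14 has an expected mortality rate of zero.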

Further information on the methodology used to create the lifetables can be found in the Lifetables Methodology document.

Caveats

Kaposi sarcoma estimates may be particularly affected by limitations in our assumptions about the baseline population. Net survival is intended to measure the probability of surviving cancer in the absence of other causes of death. To calculate it, the observed survival of the cohort is compared with the survival the cohort would be expected to have without cancer, which is taken from life tables describing the survival of the baseline population.

However, Kaposi sarcoma is mostly seen in people with advanced HIV infection. This may act as a confounder, because people with Kaposi sarcoma are compared with the population of England as the baseline population. Ideally, the baseline population would be a cohort without Kaposi sarcoma in which the percentage of people with HIV matched the percentage among people with Kaposi sarcoma. This would give the best measure of the impact of Kaposi sarcoma alone on survival. If the life expectancy of a population with HIV but no Kaposi sarcoma is lower than the life expectancy of the same population with no HIV, then net survival, as a measure of the probability of surviving cancer in the absence of other causes of death, may be biased downwards.

Similar problems with net survival may occur for other cancer sites if there is a strongly correlated comorbidity that also significantly reduces life expectancy. Kaplan-Meier survival, however, is a simpler measure of how many of the people diagnosed were still alive after a given time period (with no adjustment for the cause of death) and does not suffer from this problem.

For skin tumours, there are a small number of patients for whom a subsequent BCC or cSCC tumour is included in the analysis rather than the first BCC or cSCC tumour. This is due to a data issue affecting four CCGs between 1995 and 2007. The impact of this on the survival estimates is thought to be minimal and does not change any conclusions that may be drawn from the data.

Suppression rules

In the survival analyses, where the estimates for a group do not meet a particular data quality criterion, the results are suppressed and replaced with the code corresponding to that criterion:

A minimum of ten patients should be alive at the beginning of the survival period being estimated (for example, the first year of follow-up for a 1-year estimate). Cohorts failing this criterion are denoted “.e” in the results; no cohorts failed this criterion.
At least two deaths should be registered in the years before or after the duration(s) being estimated. Cohorts failing this criterion are denoted “.f” in the results.
The standard error of the survival estimates should be lower than 20%. Cohorts failing this criterion are denoted “.g” in the results.
The level of the survival estimates should not increase with duration; for example, the survival estimated at 6 months following diagnosis should not be higher than the survival estimated at 3 months following diagnosis. Cohorts failing this criterion are denoted “.h” in the results.
For cohorts denoted “.a”, the data for a particular length of survival are not yet available.
Where a group size is very small, data are not available as a measure to protect patient confidentiality. This is indicated with “.k”.
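The four data quality criteria above could be applied along the following lines. This is a hedged Python sketch: the function name, its parameters and the order in which the checks are applied are assumptions, since the source does not say which code takes precedence when more than one criterion fails. The “.a” and “.k” codes depend on data availability and disclosure control and are not modelled here.

```python
def suppression_code(alive_at_start, deaths_registered, std_error_pct,
                     estimate, previous_estimate=None):
    """Return the suppression code for one survival estimate, or None if
    the estimate passes all four checks.

    The order of the checks is an assumption made for this sketch.
    """
    if alive_at_start < 10:
        return ".e"   # fewer than ten patients alive at the start of the period
    if deaths_registered < 2:
        return ".f"   # fewer than two deaths registered
    if std_error_pct >= 20:
        return ".g"   # standard error not lower than 20%
    if previous_estimate is not None and estimate > previous_estimate:
        return ".h"   # survival increases with duration
    return None
```

Here previous_estimate stands for the estimate at the preceding, shorter duration (for example the 3-month estimate when checking the 6-month estimate).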

For skin cancers:

Survival of skin groups that are subsequent are denoted “.c” in the results. Even if we had perfect data quality, it is not helpful to calculate ‘survival after first tumour’ for a group that is ‘subsequent tumours’.
Survival of skin groups that are top of the tree and a mix of groups (e.g. BCC and Melanoma) are denoted “.i” in the results. It is theoretically possible to calculate survival for this group, but the ‘previous tumour’ cannot yet be defined for these mixed groups in a meaningful way.

A full definition of these codes can be found in the GDO missing file.

Appendices

Appendix 1

This is the sarcoma lookup with all codes associated with sarcoma. For more information on sarcoma, please read the Sarcoma grouping document.

Appendix 2

This is the skin tumour lookup with all codes associated with skin tumours. For more information on skin tumours, please read the Skin grouping document.

If you have any questions, please contact sally.vernon1@nhs.net, polly.jeffrey1@nhs.net, hannah.maconochie1@nhs.net or thomas.higgins@nhs.net.