This page documents the known limitations with the data.

Current known data limitations

COVID-19 effects

  • The coronavirus (COVID-19) lockdowns during 2020 caused a noticeable change in the patterns of cancer diagnoses, with a significant reduction in the number and rates of diagnosis compared to 2019 (approximately an 11% fall in incidence over all cancers). Therefore, trend data should be interpreted with care over the COVID-19 period.

  • The registration of 2019 tumours was being completed during the COVID-19 pandemic. This led to reduced access to the usual data sources and, despite the registry’s best efforts, a noticeable decrease in data quality in some fields. This is most visible as an increase in ‘stage unknown’ tumours and a corresponding decrease in other stage groups. This should be noted when undertaking time-series analysis on the data.
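
As a minimal sketch of how an analyst might mark the COVID-19 period in a trend, the Python snippet below computes year-on-year percentage changes and annotates the affected year. The counts are made up (chosen so that 2020 shows an 11% fall, the scale quoted above), and the name COVID_YEARS is an assumption for illustration, not part of the published data.

```python
def percent_change(previous, current):
    """Percentage change from the previous year's count to the current one."""
    return (current - previous) / previous * 100.0

# Made-up incidence counts, for illustration only.
counts = {2017: 980, 2018: 990, 2019: 1000, 2020: 890}

# Assumption: which years count as the COVID-19 period is the analyst's choice.
COVID_YEARS = {2020}

# Year-on-year change for every year that has a preceding year in the data.
changes = {
    year: percent_change(counts[year - 1], counts[year])
    for year in sorted(counts)
    if year - 1 in counts
}

for year, change in changes.items():
    note = " (COVID-19 period, interpret with care)" if year in COVID_YEARS else ""
    print(f"{year}: {change:+.1f}%{note}")
```

The same annotation could be applied to the 2019 ‘stage unknown’ proportion before fitting any time-series model.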

Skin tumours

  • Keratinocyte skin cancer is very common. Due to the burden of work in processing these cases, they are generally not manually reviewed but are run through an auto-processor. This creates a cancer registration for the first basal cell carcinoma (BCC) and cutaneous squamous cell carcinoma (cSCC) per patient. Subsequent tumours are imputed based on the presence of pathology, following the Z Venables 1stPPPA methodology. This is described in more detail in the full documentation, and has been validated as a good measure of skin cancer incidence, but these cases do not have the full detail of complete cancer registrations. This methodology was initially developed in England. When comparing keratinocyte cancer rates between countries, the English rates may appear worryingly high. This is usually explained by other countries counting only the first tumour, so it reflects an artifact of cancer registration and analysis rather than an increased risk of skin cancer in England.

  • The treatment statistics for keratinocyte cancers (BCC and cSCC) are currently experimental. Although NDRS now uses HES outpatient records as well as HES inpatient records to identify skin surgical treatments (an improvement on the methodology used prior to the 01/06/2023 release), some quality concerns remain around the surgery statistics, particularly for BCC tumours. Based on clinical treatment patterns, the current statistics suggest that undercounting is still present.

Missing denominator populations

  • The Lung cancer partition is the first GDO partition to divide groups by performance status. As we do not know the performance status of the general (non-cancer) population, we do not have population estimates for the performance status groups and hence cannot compute incidence rates for individual performance status groups in the data. The Population and incidence rate columns are therefore stubbed with “.p” and only the raw incidence is reported.
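
When reading the data programmatically, stubbed cells such as “.p” need to be separated from numeric values before any arithmetic. The Python sketch below shows one way to do this. The column names and counts in SAMPLE are hypothetical, and the rule used to recognise a flag (a dot followed only by letters) is an assumption based on the flags quoted on this page.

```python
import csv
import io

def parse_cell(cell):
    """Split a cell into (number, flag); exactly one of the two is None."""
    text = cell.strip()
    # Assumption: a stub flag is a dot followed only by letters, e.g. ".p".
    if text.startswith(".") and text[1:].isalpha():
        return None, text
    return float(text), None

# Hypothetical extract: column names and counts are made up for illustration.
SAMPLE = """Group,Incidence,Population,Incidence Rate
Performance status 0,1200,.p,.p
Performance status 1,950,.p,.p
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for row in rows:
    count, _ = parse_cell(row["Incidence"])
    rate, flag = parse_cell(row["Incidence Rate"])
    # rate is None here because the column is stubbed; flag records why.
    print(row["Group"], count, rate, flag)
```

Keeping the flag alongside the missing value lets downstream code distinguish a stubbed statistic from a genuine zero.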

Improvements in data quality over time

  • The National Disease Registry has been one organisation since 2013, which is why Get Data Out statistics start in this year. However, the training and development of staff to work consistently as one organisation was an ongoing process after the move to one registry. Because of this, rapid improvement in fields such as stage completeness, and more specific coding, may be observed in the early years of data.

Lung performance status data in early years

  • The performance status data in 2013 and 2014 has a relatively high proportion of unknowns (in the data released on 04/04/2024). We are working to improve the data completeness in these years by pulling in data from the national lung cancer audit and this will be included in future releases.

Not otherwise specified code

  • Improvements in coding can cause artificial changes in cancer statistics over time. For example, poorly coded data could make greater use of the ‘not otherwise specified’ (NOS) code. As data quality improves, the incidence of specific types of cancer may appear to increase, while the incidence of NOS cancers can decrease. When interpreting changes over time for very specifically coded cancers, reviewing the incidence of adjacent ‘NOS’ groups may be helpful.
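
One way to review an adjacent NOS group is to track its share of all tumours at the site alongside the specific groups. The Python sketch below uses made-up counts for a hypothetical site with two specific groups (“Type A”, “Type B”) and one NOS group. It is an illustration of the check, not a published method.

```python
def nos_share(specific_counts, nos_count):
    """Fraction of a site's tumours that fall in the NOS group."""
    total = sum(specific_counts.values()) + nos_count
    return nos_count / total

# Made-up counts: (specific groups, NOS group) for two diagnosis years.
by_year = {
    2013: ({"Type A": 300, "Type B": 200}, 500),
    2019: ({"Type A": 450, "Type B": 300}, 250),
}

shares = {year: nos_share(spec, nos) for year, (spec, nos) in by_year.items()}

for year in sorted(shares):
    print(f"{year}: NOS share = {shares[year]:.0%}")
```

In this made-up example the site total is unchanged between the two years, so the apparent 50% rise in each specific group is matched by the fall in the NOS group and is consistent with better coding rather than a real increase.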

Staging system changes over time

  • Get Data Out reports on many statistics by stage. Different cancers are staged in different systems; the main staging systems used for each site are described in that site’s grouping document. However, as understanding of cancers improves, the staging system that cancers are registered in continues to change and improve. In the GDO data, this is most often seen as a move from using TNM 7 to TNM 8, usually between 2017 and 2018 diagnoses. Please note that although groups may seem the same either side of this change, there may be subtle changes in their definitions between the staging systems. The main changes that may be seen are in the size of a group or in the survival of patients in the group. Times of staging system changes can be identified from the table below.

  • The table below lists the percentage of tumours staged in each system for all staged tumours at each cancer site, diagnosed 2013-2020 by year.
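
A table of this shape can be scanned for the year in which the dominant staging system changes. The Python sketch below is a minimal illustration. The percentages are made up, and the function names dominant_system and change_years are hypothetical helpers, not part of any GDO tooling.

```python
def dominant_system(shares):
    """Return the staging system with the largest percentage in one year."""
    return max(shares, key=shares.get)

def change_years(table):
    """Return the years whose dominant system differs from the previous year's."""
    years = sorted(table)
    changes = []
    for prev, cur in zip(years, years[1:]):
        if dominant_system(table[prev]) != dominant_system(table[cur]):
            changes.append(cur)
    return changes

# Made-up percentages of staged tumours by system, for one hypothetical site.
table = {
    2016: {"TNM 7": 100.0, "TNM 8": 0.0},
    2017: {"TNM 7": 98.0, "TNM 8": 2.0},
    2018: {"TNM 7": 5.0, "TNM 8": 95.0},
    2019: {"TNM 7": 1.0, "TNM 8": 99.0},
}

print(change_years(table))
```

Years returned by such a check are natural places to split a stage-specific time series or to add a caveat about changed group definitions.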

Liver and biliary tract tumours

  • Every year there are around 10-30 liver neuroendocrine tumours that are classified into the ‘Other liver’ group. A sample of these liver neuroendocrine tumours was reviewed and most of them were secondary liver cancers, rather than primary, and so would usually be excluded from the cohort. These tumours are being reviewed and a QA process is being put in place to improve our data quality in the future. In the data released on 01/06/2023, the size of the ‘Other liver’ group is likely to be slightly inflated.

Haematological malignancies survival

  • In the survival data released on 04/04/2024, we do not provide survival estimates for Immunoglobulin deposition disease (IDD). This is because the ICD-10 code corresponding to IDD (which is defined by GDO in ICD-O-3) is E85, and we can currently only produce survival estimates for groups with an ICD-10 code prefixed by C or D. We are working to resolve this issue and plan to release survival estimates for this group in future releases.

Haematological malignancy transformations statistics

  • Only incidence statistics are produced for haematological malignancy transformations, and we do not publish statistics on treatment, routes to diagnosis, or survival (they are stubbed with “.i”). In the case of survival, this is because the group is made up entirely of subsequent tumours, by definition. We do not yet have the appropriate definitions to define treatment or routes to diagnosis flags for transformation events.

Data issues in previous GDO releases, now corrected

Eye cancer surgery data

  • This problem was live between 01/06/2023 and 04/04/2024, but is now fixed. Resection procedure codes for eye cancer have not yet been defined and hence all eye cancer surgery treatment combinations should be flagged with “.m” for “Data are not available as resection procedure codes have not been defined, i.e. we do not know what codes count as surgery for this group”. However, this flagging was missed in the data released on 01/06/2023, resulting in all eye surgery combinations having a count of 0, which is incorrect. This issue was fixed in the data released on the website on 04/04/2024.

Bladder, Urethra, Renal Pelvis and Ureter data for 2013-2019

  • This problem was live between 01/06/2023 and 09/06/2023, but is now fixed. There were 96 rows of inaccurate data in the GDO_data_wide.csv file released on 01/06/2023. For all years, 2013-2019, some bladder groups had data published for treatment, survival, and routes to diagnosis, where they should have been given a “.j” flag for “Data are not available as the grouping is new and not all statistics have been calculated yet (statistics are calculated annually)”. These groups are the four stages (Stage localised, Stage locally advanced, Stage metastatic, and Stage unknown) of the group Renal Pelvis and Ureter > Malignant and in situ > Muscle-invasive, accounting for four rows of data per year, and the same four stages of the Bladder > Malignant and in situ > Urothelial > Muscle-invasive group, accounting for another four rows of data per year. This issue was fixed in the data and re-released on the website on 09/06/2023.

2019 treatment data

  • This problem was live between 03/11/2022 and 01/06/2023, but is now fixed. Treatment data downloaded from the GDO website between 03/11/2022 and 01/06/2023 had minor inaccuracies, where the 2019 surgery rates were artificially reduced due to missing surgery data. This issue was resolved in the treatment data released on 01/06/2023. This effect was generally estimated as no more than a 2% reduction (although greater for non-melanoma skin cancers) which only affected 2019 cases. It is recommended that all users refresh their treatment data to the latest data available on the GDO website to avoid this bug.
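
To check whether a previously downloaded copy of the data could be affected by any of the corrected issues above, the download date can be compared against the windows quoted in this section. The Python sketch below is a hedged illustration: it reads the dates as day/month/year, and it assumes that each window includes the start date and excludes the date of the fix. The function name issues_live_on is hypothetical.

```python
from datetime import date

# (issue, date the problem went live, date the fix was released),
# transcribed from the text above and read as day/month/year.
CORRECTED_ISSUES = [
    ("Eye cancer surgery data", date(2023, 6, 1), date(2024, 4, 4)),
    ("Bladder, Urethra, Renal Pelvis and Ureter data for 2013-2019",
     date(2023, 6, 1), date(2023, 6, 9)),
    ("2019 treatment data", date(2022, 11, 3), date(2023, 6, 1)),
]

def issues_live_on(download_date):
    """List the corrected issues that were live on the given download date."""
    # Assumption: start date inclusive, fix date exclusive.
    return [
        name
        for name, start, fixed in CORRECTED_ISSUES
        if start <= download_date < fixed
    ]

for name in issues_live_on(date(2023, 6, 5)):
    print(name)
```

A download for which this check returns any issue is a candidate for refreshing from the latest release.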