This page documents the known limitations with the data.
Current known data limitations
COVID-19 effects
The coronavirus (COVID-19) lockdowns during 2020 caused a
noticeable change in the patterns of cancer diagnoses, with a
significant reduction in the number and rates of diagnoses compared with
2019 (approximately an 11% fall in incidence across all cancers).
Therefore, trend data should be interpreted with care over the COVID-19
period.
The registration of 2019 tumours was being completed during the
COVID-19 pandemic. This led to reduced access to the usual data sources
and, despite the registry’s best efforts, a noticeable decrease in data
quality in some fields. This is most clearly seen in an increase in ‘stage
unknown’ tumours, and a corresponding decrease in other stage groups.
This should be noted when undertaking time-series analysis on the
data.
Skin tumours
Keratinocyte skin cancer is very common. Because of the workload
involved in processing these cases, they are generally not manually reviewed
but are run through an auto-processor. This creates a cancer
registration for the first BCC (basal cell carcinoma) and cSCC (cutaneous
squamous cell carcinoma) per patient. Subsequent tumours
are imputed based on the presence of pathology, following the Z Venables
1stPPPA methodology. This is described in more detail in the full
documentation, and has been validated as a good measure of skin
cancer incidence, but these cases do not have the full detail of complete
cancer registrations. This methodology was initially developed in
England. When comparing keratinocyte cancer rates between countries, the
English rates may appear markedly higher. This is usually explained by
other countries counting only the first tumour, so the difference is an
artifact of cancer registration and analysis rather than an increased
risk of skin cancer in England.
The treatment statistics for keratinocyte cancers (BCC and cSCC)
are currently experimental. Although NDRS now uses HES outpatient
records as well as HES inpatient records to identify skin surgical
treatments (an improvement on the methodology used prior to the
01/06/2023 release), some quality concerns remain around the surgery
statistics, particularly for BCC tumours. Compared with expected clinical
treatment patterns, the current statistics suggest that undercounting is
still present.
Missing denominator populations
- The Lung cancer partition is the first GDO partition to divide
groups by performance status. As we do not know the performance status
of the general (non-cancer) population, we do not have population
estimates for the performance status groups and hence cannot compute
incidence rates for individual performance status groups in the data.
The Population and incidence rate columns are therefore stubbed with
“.p” and only the raw incidence is reported.
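When loading the data, flagged cells such as “.p” need to be separated from numeric values before any arithmetic is done. The following is a minimal Python sketch of one way to do that; it is not part of the GDO tooling. The function name, column names and sample values are invented for illustration, and the real files may use other flags or layouts.

```python
import csv
import io


def parse_cell(raw):
    """Split a cell into (value, flag).

    Returns (float, None) for a number, (None, flag) for a dot-flag such
    as ".p", and (None, None) for an empty cell.
    """
    text = raw.strip()
    if text == "":
        return None, None
    # Assumption: a flag is a dot followed by letters only (".p", ".m", ".j").
    # A value such as ".5" is therefore still read as a number.
    if text.startswith(".") and text[1:].isalpha():
        return None, text
    return float(text), None


# Hypothetical extract: column names and numbers are made up for this example.
SAMPLE = (
    "group,incidence,population,incidence_rate\n"
    "Example group PS 0,100,.p,.p\n"
    "Example group all,400,200000,200.0\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
parsed = [
    {key: (value if key == "group" else parse_cell(value)) for key, value in row.items()}
    for row in rows
]

for row in parsed:
    print(row["group"], row["incidence"], row["population"], row["incidence_rate"])
```

Keeping the flag next to the missing value, rather than discarding it, lets later analysis tell apart the different reasons a statistic is unavailable.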
Improvements in data quality over time
- The National Disease Registry has been one organisation since 2013,
which is why Get Data Out statistics start in this year. However, the
training and development of staff to work consistently as one
organisation was an ongoing process after the move to one registry.
Because of this, rapid improvement in areas such as stage completeness
and coding specificity may be observed in the early years of
data.
Not otherwise specified code
- Improvements in coding can cause artificial changes in cancer
statistics over time. For example, poorly coded data could make greater
use of the ‘not otherwise specified’ (NOS) code. As data quality
improves, the incidence of specific types of cancer may appear to
increase, while the incidence of NOS cancers can decrease. When
interpreting changes over time for very specifically coded cancers,
reviewing the incidence of adjacent ‘NOS’ groups may be helpful.
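One way to carry out this check is to track the share of incidence coded as NOS within a cancer site for each year. The sketch below is illustrative only: the function name, group names and counts are invented, and it assumes NOS groups can be recognised by "NOS" appearing in the group name, which may not hold for every GDO grouping.

```python
def nos_share_by_year(counts):
    """Return {year: share of incidence in NOS groups}.

    counts maps year -> {group name: incidence count}. A year with no
    cases maps to None.
    """
    shares = {}
    for year, groups in counts.items():
        total = sum(groups.values())
        # Assumption: NOS groups carry "NOS" in their name.
        nos = sum(n for name, n in groups.items() if "NOS" in name)
        shares[year] = nos / total if total else None
    return shares


# Invented counts for a single hypothetical cancer site.
example = {
    2013: {"Type A": 60, "Type B": 20, "NOS": 20},
    2014: {"Type A": 70, "Type B": 25, "NOS": 5},
}

shares = nos_share_by_year(example)
for year in sorted(shares):
    print(year, shares[year])
```

In this made-up example the specific types grow while the NOS share falls and the total stays the same, which is the pattern expected from better coding rather than from a real change in incidence.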
Staging system changes over time
Get Data Out reports many statistics by stage. Different
cancers are staged using different systems, and the main staging systems
used for each site are described in that site’s grouping document.
However, as understanding of cancers improves, the staging systems in which
cancers are registered continue to change. In the GDO
data, this is most often seen as a move from using TNM 7 to TNM 8,
usually between 2017 and 2018 diagnoses. Please note that although
groups may seem the same either side of this change, there may be subtle
changes in their definitions between the staging systems. The main
effects that may be seen are a change in the size of a group or in the
survival of patients in the group. The timing of staging system
changes can be identified from the table below.
The table below lists, by year of diagnosis (2013-2020), the percentage
of staged tumours at each cancer site that were staged in each
system.
Liver and biliary tract tumours
- Every year there are around 10-30 liver neuroendocrine tumours that
are classified into the ‘Other liver’ group. When a sample of these liver
neuroendocrine tumours was reviewed, most were found to be secondary
liver cancers rather than primary, and so would usually be excluded
from the cohort. These tumours are being reviewed and a QA process is
being put in place to improve our data quality in the future. In the
data released on 01/06/2023, the size of the ‘Other liver’ group is
likely to be slightly inflated.
Haematological malignancies survival
- In the survival data released on 04/04/2024, we do not provide
survival estimates for Immunoglobulin deposition disease (IDD). This is
because the ICD-10 code corresponding to IDD (which is defined by GDO in
ICD-O-3) is E85, and we can currently only produce survival estimates
for groups with an ICD-10 code prefixed by C or D. We are working to
resolve this issue and plan to release survival estimates for this group
in future releases.
Data issues in previous GDO releases, now corrected
Eye cancer surgery data
- This problem was live between 01/06/2023 and 04/04/2024, but is now
fixed. Resection procedure codes for eye cancer have not yet been
defined, and hence all eye cancer surgery treatment combinations should
be flagged with “.m” for “Data are not available as resection procedure
codes have not been defined, i.e. we do not know what codes count as
surgery for this group”. However, this flagging was missed in the data
released on 01/06/2023, resulting in all eye surgery combinations having a
count of 0, which is incorrect. This issue was fixed in the data
released on the website on 04/04/2024.
Bladder, Urethra, Renal Pelvis and Ureter data for 2013-2019
- This problem was live between 01/06/2023 and 09/06/2023, but is now
fixed. There were 96 rows of inaccurate data in the GDO_data_wide.csv
file released on 01/06/2023. For all years, 2013-2019, some bladder
groups had data published for treatment, survival, and routes to
diagnosis, where they should have been given a “.j” flag for “Data are
not available as the grouping is new and not all statistics have been
calculated yet (statistics are calculated annually)”. The affected groups are
the four stages (Stage localised, Stage locally advanced, Stage
metastatic, and Stage unknown) of the group Renal Pelvis and Ureter >
Malignant and in situ > Muscle-invasive, accounting for four rows of
data per year, and the same four stages of the Bladder > Malignant
and in situ > Urothelial > Muscle-invasive group, accounting for
another four rows of data per year. This issue was fixed and the data
re-released on the website on 09/06/2023.
2019 treatment data
- This problem was live between 03/11/2022 and 01/06/2023, but is now
fixed. Treatment data downloaded from the GDO website between 03/11/2022
and 01/06/2023 had minor inaccuracies, in which the 2019 surgery rates were
artificially reduced due to missing surgery data. This issue was
resolved in the treatment data released on 01/06/2023. The effect was
generally estimated at no more than a 2% reduction (although greater for
non-melanoma skin cancers) and only affected 2019 cases. Users
should refresh their treatment data from the latest release
available on the GDO website to avoid this issue.
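For users holding older downloads, the live windows listed in this section can be checked against the date a file was obtained. The following Python sketch is an illustration under stated assumptions, not an official tool: the function name is hypothetical, dates on this page are read as day/month/year, and a download made on the day a fix was released is assumed to already contain the fix.

```python
from datetime import date

# Live windows copied from this page: (issue, first affected day, fix release day).
AFFECTED_WINDOWS = [
    ("Eye cancer surgery data", date(2023, 6, 1), date(2024, 4, 4)),
    (
        "Bladder, Urethra, Renal Pelvis and Ureter data for 2013-2019",
        date(2023, 6, 1),
        date(2023, 6, 9),
    ),
    ("2019 treatment data", date(2022, 11, 3), date(2023, 6, 1)),
]


def issues_for_download(downloaded_on):
    """List the corrected issues that were live on the given download date.

    Assumption: the start of each window is inclusive and the fix release
    day is exclusive.
    """
    return [
        name
        for name, start, fixed in AFFECTED_WINDOWS
        if start <= downloaded_on < fixed
    ]


print(issues_for_download(date(2023, 6, 5)))
print(issues_for_download(date(2023, 1, 15)))
```

An empty list only means that none of the issues listed here applied on that date; refreshing to the latest release remains the safer option.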