The Get Data Out programme aims to publish statistics on small groups of patients with the same characteristics, for example tumour type, age, and gender. Groups of approximately 100 patients protect patient anonymity whilst still allowing statistically meaningful analysis. In some cases, where a set of characteristics is considered particularly clinically important, groups are smaller than 100, but they are always large enough to protect patient anonymity.
The Get Data Out programme reduces the risk of associating data with any individual patient by making the unit of data a small group of patients, not individual patients. When each patient is in one and only one group, we know that the risk of data release is minimised and the data passes the anonymisation standard. If larger groups were split by multiple factors independently, patients would appear in multiple groups. For example, if Chromophobe RCC kidney tumours were split by age and also, separately, by gender, a 30-year-old female patient would appear in two groups: ‘under 60 year olds’ and ‘female’. Assigning patients to more than one group increases the amount of information we are publishing on them, and makes it harder to assess the risk in a data release. To minimise risk, we are prioritising putting all patients in one standard grouping. In the future, there may be opportunities to publish on overlapping partitions, where patients are assigned to more than one group. We have already experimented with this for the sarcoma partition, and initial findings are promising.
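The "one and only one group" condition can be illustrated with a short Python sketch. This is not the programme's own code: the function names, patient identifier and group labels are hypothetical and chosen only to mirror the Chromophobe RCC example above.

```python
from collections import Counter


def groups_per_patient(assignments):
    """Count how many distinct groups each patient appears in.

    `assignments` is an iterable of (patient_id, group_label) pairs.
    Duplicate pairs are ignored so that a repeated row is not
    mistaken for membership of a second group.
    """
    return Counter(patient_id for patient_id, _ in set(assignments))


def is_partition(assignments):
    """Return True when every patient appears in exactly one group."""
    return all(n == 1 for n in groups_per_patient(assignments).values())


# Hypothetical data: one 30-year-old female patient.
# Splitting by age and by gender separately puts her in two groups.
overlapping = [("patient_1", "under 60 year olds"), ("patient_1", "female")]

# A single standard grouping places her in one combined group only.
single = [("patient_1", "female, under 60 year olds")]
```

Under these assumptions, `is_partition(single)` holds while `is_partition(overlapping)` does not, which is the distinction between a standard grouping and an overlapping partition.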
There are a number of reasons why we do not publish data for some groups and/or statistics. Where we are not publishing data, the numeric value is replaced with a letter. The meaning of each letter used can be found here and is listed below.
Due to the small size of some groups, publishing on that group every year might put patient confidentiality at risk. If the group is large enough when counted over the course of three years, it is published as a three-year statistic, e.g. 2013-2015. This allows us to provide data on these groups without compromising patient confidentiality.
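The choice between annual and three-year publication can be sketched roughly as follows. The exact rules and thresholds used by Get Data Out are not stated here, so the minimum size of 100, the function name and the returned labels are all assumptions made for illustration.

```python
# Hypothetical threshold, loosely based on the "approximately 100" group size.
MIN_GROUP_SIZE = 100


def publication_period(yearly_counts, min_size=MIN_GROUP_SIZE):
    """Suggest how a group could be published, given its yearly patient counts.

    `yearly_counts` maps a year to the number of patients in the group
    for three consecutive years. The decision rule is an assumption,
    not the programme's documented method.
    """
    counts = list(yearly_counts.values())
    if all(count >= min_size for count in counts):
        return "annual"        # large enough to publish every year
    if sum(counts) >= min_size:
        return "three-year"    # only large enough when years are pooled
    return "not published"     # too small even over three years


# Hypothetical counts for a small group over 2013-2015.
small_group = {2013: 40, 2014: 35, 2015: 45}
```

With these made-up counts, no single year reaches the assumed threshold but the pooled total does, so `publication_period(small_group)` suggests a three-year statistic.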
These statistics were chosen because they align nicely with the National Statistics that NDRS publish on. In the future, Get Data Out may publish on more statistical outputs, for example age-standardised rates. If there is one that you’re particularly keen to see, please get in touch here.
The initial goal of the Get Data Out programme, steered by Cancer52, was to publish detailed statistics on rare cancer sites for which there is often no data publicly available. The eventual aim is to publish data on all cancer sites, and the most recent additions are the “Liver and biliary tract”, “Haematological malignancies”, and “Haematological malignancy transformations” sites. If there is a cancer site that you’re particularly keen to see, please get in touch here.
Splitting a group by ethnicity is an option which is discussed in our working group meetings when we develop a new cancer site partition. However, given that there are often small numbers of patients in a group, usually the only publishable split would be into “White” and “Non-white”, to protect patient anonymity. At that level of granularity, it has not been chosen so far as the most clinically meaningful split for the outcome of that site or group.
In calculating our statistics, the ONS’ population tables are used to give a population denominator. This data has age bands of ‘0-4’, ‘5-9’, ‘10-14’ and ‘15-19’, so no band boundary falls at age 18. This prevents Get Data Out from having an ‘Under 18’ split that would align with the patient being treated as a child. Some of the cancer groups that are split out by morphology, e.g. Retinoblastoma, are by their nature cancers that tend to affect children and young people.
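The age-band constraint can be made concrete with a small Python sketch. The band list comes from the text above; the function names are hypothetical and the check is only an illustration of why an ‘Under 18’ split has no matching population denominator.

```python
# Five-year age bands as (lowest age, highest age), taken from the text above.
AGE_BANDS = [(0, 4), (5, 9), (10, 14), (15, 19)]


def aligned_cutoffs(bands):
    """Return the 'under N' cut-offs that fall exactly on a band boundary."""
    return [upper + 1 for _, upper in bands]


def can_split_under(age, bands=AGE_BANDS):
    """Return True if an 'under `age`' group can be built from whole bands."""
    return age in aligned_cutoffs(bands)
```

Here `can_split_under(15)` and `can_split_under(20)` hold, but `can_split_under(18)` does not, because the ‘15-19’ band would have to be divided.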
In the future, Get Data Out is considering developing a partition of all the cancers in children, teenagers and young adults that would then be split into Get Data Out cancer sites.