Understanding the ‘Get Data Out’ data
This page is a guide to help you understand more about the data from Get Data Out.
- How are patients grouped?
- What statistics do you release?
- How do I get more technical detail about this data?
- How should I use the data?
If you have any questions, please get in touch with us here. Please mention ‘Get Data Out’ in your email.
How are patients grouped?
The Get Data Out programme publishes data about small groups of patients. Patients with a particular type of tumour are divided into many small groups by certain characteristics, such as their year of diagnosis, more specific classification of their tumour, patient age or patient sex. Patients with different types of tumours are divided into groups differently because different tumours affect patients in different ways.
Each small group contains approximately 100 patients with the same characteristics. Statistics about each group are routinely calculated and released. Because patients have been grouped together in this way, the anonymisation standard has been ‘designed in’, and we are able to release this data safely without any risks to patient confidentiality. All data is anonymous.
You can imagine how we have divided patients into groups by thinking of a tree (or a sideways tree), as in the image shown. At the root of the tree are all the patients with a diagnosis of one type of tumour, and as you move along the branches these patients are divided into smaller groups by a certain characteristic. If dividing a branch further would leave too few patients in a group, the branch is not divided any further. Because the data is structured in a hierarchical way, we do not calculate statistics for combinations of the small groups. In this example, we do not have data on incidence for all endocrine tumours of the brain combined for a certain year, or on the routes to diagnosis for patients of all genders combined. This is to protect patient confidentiality.
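The tree idea can be sketched in a few lines of Python. This is only an illustration of hierarchical splitting with a minimum group size, not the programme's actual procedure: the function name, the record fields (`sex`, `age_band`) and the threshold value are all assumptions made for the example.

```python
from collections import defaultdict

MIN_GROUP_SIZE = 100  # assumed threshold, for illustration only


def split_into_groups(patients, characteristics, min_size=MIN_GROUP_SIZE, path=()):
    """Recursively divide patients along the given characteristics.

    Returns a list of (path, patients) leaves, where path is a tuple of
    (characteristic, value) pairs describing the branch that was followed.
    """
    if not characteristics:
        return [(path, patients)]

    first, rest = characteristics[0], characteristics[1:]
    branches = defaultdict(list)
    for patient in patients:
        branches[patient[first]].append(patient)

    # If any resulting branch would hold too few patients, stop dividing here.
    if any(len(members) < min_size for members in branches.values()):
        return [(path, patients)]

    leaves = []
    for value, members in sorted(branches.items()):
        leaves.extend(
            split_into_groups(members, rest, min_size, path + ((first, value),))
        )
    return leaves


# Tiny made-up cohort, with a small threshold so the behaviour is visible.
patients = (
    [{"sex": "F", "age_band": "0-49"}] * 3
    + [{"sex": "F", "age_band": "50+"}] * 3
    + [{"sex": "M", "age_band": "0-49"}] * 1
    + [{"sex": "M", "age_band": "50+"}] * 4
)
leaves = split_into_groups(patients, ["sex", "age_band"], min_size=2)
for path, members in leaves:
    print(path, len(members))
```

In this made-up cohort the female branch divides again by age band, while the male branch stops at sex because one of its age bands would contain a single patient. Statistics would then be produced only for the resulting leaves, never for combinations of them.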
What statistics do you release?
We calculate four sets of statistics to describe each group of patients within each tumour group.
Incidence. Statistics are provided on the number of new tumours diagnosed in each group and the incidence rate of cancer in that group, with upper and lower confidence limits.
Treatment. Statistics are provided on the number of tumours treated with surgery, chemotherapy, radiotherapy and all combinations of these treatments in each group, the percentage of tumours treated, and the upper and lower confidence limits around the percentage.
Survival. Statistics are provided on the number of tumours included in the survival calculation and the net and crude survival rates in each group at 3, 6, 9, 12, 24, 36 and 48 months after diagnosis, with upper and lower confidence limits.
Routes to Diagnosis. Statistics are provided on the number of tumours diagnosed by each 'route to diagnosis' and the percentage of tumours diagnosed by each route, with upper and lower confidence limits. The eight standard diagnostic routes (two week wait; GP referral; screening; other outpatient; inpatient elective; emergency presentation; death certificate only; and unknown) are provided, along with a 'not classified' group. To find out more, please visit http://ncin.org.uk/publications/routes_to_diagnosis.
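Several of the statistics above pair a count with a rate or percentage and a confidence interval. The sketch below shows one common way such figures can be computed. This page does not say which methods are used, so everything here is an assumption: Byar's approximation for the rate, the Wilson score interval for the percentage, the 95% level, the "per 100,000" scale, the population denominator and the function names. The technical page is the place to check the actual calculations.

```python
import math

Z_95 = 1.96  # normal quantile for a 95% interval (the 95% level is an assumption)


def incidence_rate_with_limits(count, population, per=100_000, z=Z_95):
    """Crude rate per `per` people, with Byar's approximate Poisson limits."""
    if count == 0:
        lower_count = 0.0
    else:
        lower_count = count * (1 - 1 / (9 * count) - z / (3 * math.sqrt(count))) ** 3
    upper_base = count + 1
    upper_count = upper_base * (
        1 - 1 / (9 * upper_base) + z / (3 * math.sqrt(upper_base))
    ) ** 3
    scale = per / population
    return count * scale, lower_count * scale, upper_count * scale


def percentage_with_limits(events, total, z=Z_95):
    """Percentage of `total` with the event, with Wilson score limits."""
    p = events / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return 100 * p, 100 * (centre - half_width), 100 * (centre + half_width)


# Made-up numbers, purely for illustration.
rate, rate_low, rate_high = incidence_rate_with_limits(count=100, population=1_000_000)
pct, pct_low, pct_high = percentage_with_limits(events=50, total=100)
print(f"{rate:.2f} per 100,000 ({rate_low:.2f} to {rate_high:.2f})")
print(f"{pct:.1f}% ({pct_low:.1f}% to {pct_high:.1f}%)")
```

With groups of around 100 patients the intervals are fairly wide, which is worth keeping in mind when comparing one group with another.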
The data starts from diagnoses in 2013, because this was the first year the National Cancer Registration and Analysis Service (NCRAS) collected data in a consistent, high quality, and comparable way on a single system.
How do I get more technical detail about this data?
For information about how we calculate our statistics, as well as information about the metadata, descriptions of our units and a list of all our releases, please visit our technical page.
How should I use the data?
Download the data by following the links below. You can download the data separately for each tumour group, or you can download all the data we have at once.
- All data published so far
- Brain, meningeal and other primary CNS tumours
- Ovary, fallopian tube and primary peritoneal carcinomas
- Testicular tumours including post-pubertal teratomas
The data is signed off as non-disclosive and is released under an Open Government Licence. You are free to copy, publish, distribute and transmit the information, and to adapt it and include it in your own products. The attribution statement that must be included with any reuse of the data is:
Data for this [study/ project/ report/tool] is based on patient-level information collected by the NHS, as part of the care and support of cancer patients. The data is collated, maintained and quality assured by the National Cancer Registration and Analysis Service, which is part of Public Health England (PHE). The data is taken from the Get Data Out tables.