2. Data Collected
In addition to the fields listed at the back of the Silver Book, some centres collect additional data and include it in their submitted files. This has been done both by adding fields to the Access database and by collecting extra data in Excel. While individual units may find this data useful, and may feel it is useful to us, we do not incorporate it into the merged file for analysis, and we would encourage units to ensure that data outside the required fields is deleted prior to submission. The analysis we undertake is based solely on the data items outlined at the back of the Silver Book (listed in the EDI file). In addition, some units attach comments to individual cells of the data collected in Excel. The contents of these comments are not acted upon, and they do not migrate into SPSS when the data is made ready for analysis. Again, we would discourage their use and request that they be removed prior to data submission.
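As an illustration only, a unit holding its data as a CSV file could strip any extra columns before submission along the following lines. This is a minimal sketch, not part of the audit's own tooling: the field names in REQUIRED_FIELDS and the function keep_required_fields are hypothetical placeholders, and the real field list is the one given in the EDI file.

```python
import csv
import io

# Hypothetical subset of field names; the real list is in the EDI file.
REQUIRED_FIELDS = ["PatientID", "DateOfBirth", "DateOfDiagnosis"]

def keep_required_fields(csv_text, required=REQUIRED_FIELDS):
    """Return CSV text containing only the required columns, in the required order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    # extrasaction="ignore" silently drops any column not named in `required`.
    writer = csv.DictWriter(out, fieldnames=required,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()

# Example: a locally added "LocalNotes" column is removed.
sample = (
    "PatientID,DateOfBirth,DateOfDiagnosis,LocalNotes\n"
    "1,01/02/1940,14/03/2005,seen in clinic B\n"
)
cleaned = keep_required_fields(sample)
print(cleaned)
```

Cell comments in Excel are not carried into a CSV export, so a file prepared this way contains neither extra fields nor comments.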
3. Identifying Data
Data collected in the ACPGBI Access database is readily identifiable, since each data item has a unique field name which cannot be changed. When data is collected using Excel or CSV file formats, it is imperative that a header row be included in the data sheet so that we can readily identify the data each column contains and ensure that it is merged correctly. Would all units not using the ACPGBI Access database please submit files as detailed in the EDI document. Some units have not provided header rows with their submitted data, and this causes problems and delays. Because a large amount of the data is numerically encoded, it becomes impossible to identify which field a column belongs to without a heading, and we would have to disregard that data. It has been our practice to contact these units and request that they send a new file with the column headings. This can sometimes be done quickly, but for others it becomes a lengthier process. This leads to delays in preparing the data for analysis, which increases the pressure on those individuals performing the analysis and writing the report. In future, where data is submitted without column headings that readily identify the data being stored, only those fields that can be confidently identified will be included in the analysis. This may mean that all of your data is disregarded.
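A simple check of this kind can be run before a CSV file is sent. The sketch below is an assumption about how a unit might do this, not a description of our own process: EXPECTED_HEADERS and check_header_row are hypothetical names, and the headings shown stand in for the field names given in the EDI document.

```python
import csv
import io

# Hypothetical headings; the real field names are given in the EDI document.
EXPECTED_HEADERS = ["PatientID", "DateOfBirth", "DateOfDiagnosis"]

def check_header_row(csv_text, expected=EXPECTED_HEADERS):
    """Compare the first row of the file with the expected headings.

    Returns (missing, unexpected): expected headings that are absent, and
    first-row entries that are not recognised headings.
    """
    first_row = next(csv.reader(io.StringIO(csv_text)), [])
    found = [cell.strip() for cell in first_row]
    missing = [name for name in expected if name not in found]
    unexpected = [name for name in found if name not in expected]
    return missing, unexpected

with_header = "PatientID,DateOfBirth,DateOfDiagnosis\n1,01/02/1940,14/03/2005\n"
without_header = "1,01/02/1940,14/03/2005\n"

print(check_header_row(with_header))     # nothing missing, nothing unexpected
print(check_header_row(without_header))  # every heading missing: first row is data
```

If every expected heading is reported as missing, the first row is almost certainly a row of data rather than a header row.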
4. Formatting of Data
This applies solely to data collected in Excel or as CSV files. The use of the ACPGBI Access database ensures that all fields have the correct formatting applied to them. When data is being collected in Excel or as CSV files, some units are submitting data with all of the cells in the spreadsheet formatted as Text. This can be easily recognised by the appearance of a green triangle in the top left corner of each cell that contains numerical data. Clicking on the cell will result in an exclamation mark appearing in a box adjacent to the cell warning that the numerical data is formatted as text. Numerical data can be easily changed to the correct format by highlighting the column of data and then clicking on the exclamation mark. An option will be listed for ‘Convert to Number’. Clicking this option will ensure the data is converted to the correct format.
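For units that prepare their files with a script rather than by hand, the equivalent of 'Convert to Number' can be sketched as below. This is an illustration under stated assumptions: it works on values that have already been read out of the file, it does not alter Excel's cell formatting, and to_number is a hypothetical helper name.

```python
def to_number(value):
    """Convert text such as '42' or '3.5' to int or float.

    Values that are not text, or text that is not numeric, are returned unchanged.
    """
    if not isinstance(value, str):
        return value
    text = value.strip()
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return value

# A column in which numbers have been stored as text.
column = ["1", "2", " 3 ", "4.5", "unknown", 6]
converted = [to_number(v) for v in column]

# Anything still held as text after conversion needs checking by hand.
still_text = [v for v in converted if isinstance(v, str)]
print(converted)
print(still_text)
```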
Unfortunately, Excel does not highlight an error like this with dates in cells formatted as text. The correct formatting of dates can be checked by the following steps:
If the dates remain unchanged then they will need to be correctly formatted. There is no quick way to do this. Every cell in the column will need to be manually updated so that Excel recognises it as date data. To do this, double click on the cell and then press Enter. The date will then automatically change to the dd/mm/yyyy format. This will need to be repeated for every cell that contains a date. Once all the cells have been updated, the date format can be changed to the ‘*14/03/2001’ format using the above steps.
When preparing the data for analysis we need to run calculations to determine the patient's age at diagnosis, length of stay, and time from first referral to when first seen, and to ensure that patients are analysed in the correct data period. If dates are stored in Text format, these calculations will fail. Further, when importing data into SPSS, errors may occur that result in incorrectly formatted dates being lost. In the complete NBOCAP dataset there are 30 fields that contain date data. For the 05-06 data set there were just over 9,000 patients. If all the dates were incorrectly formatted, this would result in over 270,000 cells of data requiring manual correction. It is not feasible for us to undertake this task, nor would it be permitted under the Data Protection Act.
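The effect of a badly formatted date on these calculations, and the scale of the correction involved, can be illustrated with a short sketch. It assumes dates are written as dd/mm/yyyy. The function names parse_date and days_between are hypothetical, and this is not the calculation we run in SPSS.

```python
from datetime import datetime

DATE_FORMAT = "%d/%m/%Y"  # dd/mm/yyyy

def parse_date(text):
    """Return a date for a dd/mm/yyyy string, or None if it cannot be read as one."""
    try:
        return datetime.strptime(text.strip(), DATE_FORMAT).date()
    except ValueError:
        return None

def days_between(start_text, end_text):
    """Number of days from start to end, or None if either date is unusable."""
    start, end = parse_date(start_text), parse_date(end_text)
    if start is None or end is None:
        return None
    return (end - start).days

# A length-of-stay style calculation with two well-formed dates.
length_of_stay = days_between("14/03/2005", "21/03/2005")

# The same calculation fails when one date is free text.
unusable = days_between("14/03/2005", "21 March 05")

# Scale of manual correction: 30 date fields for just over 9,000 patients.
date_fields = 30
patients = 9000  # lower bound; the 05-06 data set had slightly more
cells_to_correct = date_fields * patients

print(length_of_stay, unusable, cells_to_correct)
```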
In this and future years, the onus will be on the submitting units to ensure that the data they submit is in the correct format. If errors occur because incorrectly formatted data is submitted, we will make no attempt to rectify these and will mark the items as missing. The report produced in the Silver Book is for the benefit of everyone, in particular those units who submit data. The quality of the analysis we produce can only ever be as good as the quality of the data you submit, and it is therefore up to the individual units to ensure that their data submission is of the highest possible quality. If you want further advice on preparing your data for submission, please do not hesitate to contact us, but do allow plenty of time before the submission deadline.