NACC Data Validation Pipeline
Submitting a form data batch CSV file to the center's ingest project (via the ADRC portal or API) triggers the data validation pipeline in the NACC Data Platform.
For centers using REDCap direct entry, NACC transfers records marked as "Ready for Data Platform Upload" to the center's ingest project each night, and the same validation pipeline is triggered.
The stages of the data validation pipeline are listed below. A submission can be rejected at any stage. Whether the entire CSV file or only the erroneous visits are rejected depends on the stage and the type of error.
CSV Screening:
Check whether the submitted CSV file is correctly named with a module suffix. Find the list of file name suffixes for currently supported modules here. The entire file will be rejected if it does not follow this naming convention.
Fix: Correct the filename and re-upload the entire file.
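The naming check can be pictured with a short Python sketch. It is only an illustration. It assumes the suffix sits right before the ".csv" extension and that matching is case-insensitive. The values in `EXAMPLE_SUFFIXES` and the function name `has_module_suffix` are hypothetical. Use the suffix list NACC publishes for the real values.

```python
from pathlib import Path

# Hypothetical placeholder suffixes, NOT the real NACC module suffixes;
# substitute the published list for the currently supported modules.
EXAMPLE_SUFFIXES = ("_modulea", "_moduleb")


def has_module_suffix(filename: str, suffixes=EXAMPLE_SUFFIXES) -> bool:
    """Return True if the file is a .csv whose name ends with a module suffix.

    Assumes (not confirmed by NACC) that the suffix immediately precedes the
    ".csv" extension and that comparison is case-insensitive.
    """
    path = Path(filename)
    if path.suffix.lower() != ".csv":
        return False
    stem = path.stem.lower()
    return any(stem.endswith(s.lower()) for s in suffixes)


print(has_module_suffix("batch_2024_modulea.csv"))  # True
print(has_module_suffix("batch_2024.csv"))          # False
```

A `False` result matches the case where the whole file is rejected and must be renamed and re-uploaded.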
CSV Format Check:
Check whether the submitted CSV has the correct headers and data types. The CSV should include only NACC-accepted variables for the respective module and must match the NACC-published Data Element Dictionary (DED) for that module. The entire file will be rejected if any of the following is detected:
- CSV file is missing the header row
- There are extra fields in the CSV header that are not listed in the DED
- There are duplicate fields in the CSV header
- Input file has incorrect formatting and cannot be parsed
Fix: Correct the errors and re-upload the entire file with the same name.
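The header scenarios listed above can be approximated with Python's standard `csv` module. This sketch covers header problems and parse failures only. It does not check data types. `check_csv_format` is a hypothetical name, and `ded_fields` stands in for the variable names taken from the module's DED. One heuristic is an assumption: if the first row shares no names with the DED, the header row is treated as missing.

```python
import csv
import io
from collections import Counter


def check_csv_format(text: str, ded_fields: set[str]) -> list[str]:
    """Return file-level errors for the CSV header; an empty list means it passed."""
    try:
        # strict=True makes the reader raise csv.Error on malformed quoting.
        rows = list(csv.reader(io.StringIO(text), strict=True))
    except csv.Error as exc:
        return [f"cannot parse file: {exc}"]
    # Heuristic (assumption): no DED names in the first row means no header.
    if not rows or not (set(rows[0]) & ded_fields):
        return ["missing header row"]
    header = rows[0]
    errors = []
    extra = [f for f in header if f not in ded_fields]
    if extra:
        errors.append(f"fields not in DED: {sorted(set(extra))}")
    dupes = [f for f, n in Counter(header).items() if n > 1]
    if dupes:
        errors.append(f"duplicate fields: {sorted(dupes)}")
    return errors
```

Any non-empty result corresponds to rejecting the entire file, not individual rows.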
Identifier Lookup:
For each row in the CSV file, look up the NACCID using the ADCID and PTID for that record. A record will not be processed further if no matching NACCID is found in the system. Records that pass the NACCID lookup proceed to the next step in the pipeline.
Fix: Correct and re-upload the erroneous rows as a new file.
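The lookup works row by row, so one file can be split into rows that continue and rows that stop. The Python sketch below shows that split. `registry` is a hypothetical in-memory stand-in for the platform's identifier store. The lowercase column names `adcid` and `ptid` are assumptions about the CSV layout.

```python
from typing import Iterable


def split_by_naccid(rows: Iterable[dict], registry: dict[tuple[str, str], str]):
    """Attach NACCIDs to rows whose (ADCID, PTID) pair is known.

    `registry` is a hypothetical mapping, not the real identifier service.
    Returns (matched, unmatched); only matched rows continue in the pipeline.
    """
    matched, unmatched = [], []
    for row in rows:
        # Normalize both identifiers to stripped strings before lookup.
        key = (str(row.get("adcid", "")).strip(), str(row.get("ptid", "")).strip())
        naccid = registry.get(key)
        if naccid is None:
            unmatched.append(row)
        else:
            matched.append({**row, "naccid": naccid})
    return matched, unmatched
```

The `unmatched` rows are the ones to correct and re-upload as a new file.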
Data Transformations and Pre-processing Checks:
For each row in the CSV file, apply any data transformations required for the submitted module and run pre-processing checks. A record will not be processed further if any of the following occurs:
- The transformations fail
- The pre-processing checks fail
- A duplicate record already exists for the visit (in this case, the system does not report an error, but the record does not proceed to the next step because the visit data has not changed)
If the transformations and pre-processing checks succeed, a JSON-formatted file for the visit record is generated and uploaded to the ingest project as acquisition data.
Fix: Correct and re-upload the erroneous rows as a new file.
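The Python sketch below shows how one row might move through this stage. Several parts are assumptions and are not taken from NACC. The `transform` and `precheck` callables are hypothetical. Duplicates are detected by hashing a canonical JSON serialization of the visit. The upload to the ingest project is not modeled.

```python
import hashlib
import json
from typing import Callable, Optional


def prepare_visit(
    row: dict,
    transform: Callable[[dict], dict],
    precheck: Callable[[dict], list[str]],
    existing_hashes: set[str],
) -> tuple[Optional[str], list[str]]:
    """Return (visit_json, errors) for one row; mutates existing_hashes.

    visit_json is None when the row fails, or when identical visit data was
    already seen (in that case no error is reported).
    """
    try:
        record = transform(row)
    except (KeyError, ValueError) as exc:
        return None, [f"transformation failed: {exc}"]
    errors = precheck(record)
    if errors:
        return None, errors
    # Canonical JSON (sorted keys) so identical visit data hashes identically.
    visit_json = json.dumps(record, sort_keys=True)
    digest = hashlib.sha256(visit_json.encode("utf-8")).hexdigest()
    if digest in existing_hashes:
        return None, []  # duplicate: skipped silently, visit data unchanged
    existing_hashes.add(digest)
    return visit_json, []
```

Returning `(None, [])` for a duplicate mirrors the behavior described above. The record stops here, but no error is raised.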
Data Quality Checks:
Run the NACC data quality checks according to the NACC-published error checks. Any row that fails the data quality checks will be rejected.
Fix: Correct and re-upload the erroneous rows as a new file.
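Rejection at this stage applies per row. The Python sketch below models each check as a function that returns an error message or `None`. `run_quality_checks` and the example rule `visitdate_present` are hypothetical. The rule is not an actual NACC error check, and the sketch does not model alerts.

```python
from typing import Callable, Optional

# A rule inspects one visit and returns an error message, or None if it passes.
Rule = Callable[[dict], Optional[str]]


def run_quality_checks(visits: list[dict], rules: list[Rule]):
    """Split visits into (accepted, rejected); each rejected visit keeps its errors."""
    accepted, rejected = [], []
    for visit in visits:
        errors = [m for m in (rule(visit) for rule in rules) if m is not None]
        if errors:
            rejected.append((visit, errors))
        else:
            accepted.append(visit)
    return accepted, rejected


# Hypothetical example rule, not an actual NACC error check.
def visitdate_present(visit: dict) -> Optional[str]:
    return None if visit.get("visitdate") else "visitdate is missing"
```

Each visit is checked against every rule before the decision, so one error report can list all of a row's failures at once.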
Check the Error Report and Alert Verification sections for details on how to view the error report and approve alerts.