Crosswalks SOP

Introduction

Ensuring variable consistency across multiple dataset versions is essential for maintaining data integrity during migrations. To meet this need, NACC, together with the Electronic Data Capture Working Group, has developed a series of UDS Variable Crosswalks. These tools support the Alzheimer's Disease Research Centers (ADRCs) and other entities that use the Uniform Data Set (UDS) in integrating their data across UDS3 and UDS4.

The Crosswalks comprise two components, one for human interpretation and one for machine processing. For human users, we provide detailed tables that contain data element names, crosswalk change indicators, and essential variable components from the data dictionary for both UDS3 and UDS4. These tables also tabulate all identified mapping rules. For those working with automated processes in data management and analysis, we offer equivalent JSON outputs that can be used directly to develop mappings from UDS3 to UDS4 in environments such as R, Python, and SQL. Together, these resources give ADRCs and other users the tools to transition to UDS4 efficiently while continuing to collect UDS3 as dictated by study needs, so that data can be harmonized across UDS versions.

Scope

The UDS Variable Crosswalks are designed to help ADRCs harmonize their data for internal and operational purposes. Centers can use them to build datasets for analysis or for sharing their collected data, but the crosswalks are not intended to replace the complete, multi-center UDS datasets that NACC provides in response to query requests. Their primary aim is to support operational activities such as consensus conferences and dataset merging. Specific use cases include the longitudinal unification of data for Center participants who have records in both UDS3 and UDS4 formats, and the merging of visits across multiple participants, whether cross-sectional or repeated over time. This targeted approach lets Centers maintain data continuity and integrity while moving between UDS3 and UDS4.

Content -- Human Readable Tables

The human-readable tables in the UDS Variable Crosswalk are designed to assist ADRCs in transitioning from UDS3 to UDS4, ensuring data compatibility and integrity. These tables consist of several key components:

  1. Data Element and Crosswalk Change Dictionaries: These are provided for both UDS3 and UDS4 versions. The format is standardized across versions to facilitate ease of use. Each table includes:

    • Data Element Names: Identifies the variable names in UDS3 and UDS4, highlighting the nearest equivalent variable according to informational content or noting when a variable is new or has no direct counterpart.

    • Crosswalk Change Indicators: These binary indicators detail the nature of the data element changes, such as whether variables are new or removed, and whether there is potential for mapping between UDS3 and UDS4. Additionally, they indicate the complexity of the mapping (e.g., partial, complex), changes in forms or positions, alterations in data type or conformity, and changes in response levels and labels for factors.

  2. Data Dictionary Components: Drawn from the UDS3 data element dictionaries and the REDCap-derived data dictionaries for UDS4, these inform the various mapping rules. Key components include:

    • UDS form and question number specifics for each dataset version.

    • Data types categorized into character, numeric factor, and entry integer/numeric.

    • Conformity details specifying allowable values.

    • Detailed response levels and labels, following REDCap conventions.

  3. Individual Mapping Rules: Specifically tailored to facilitate forward mapping from UDS3 to UDS4, these rules encompass:

    • Designated UDS4 target elements and their nearest UDS3 counterparts.

    • Defined mapping types based on the crosswalk indicators.

    • Specific old (UDS3) and new (UDS4) values linked by the mapping rule.

    • Potential for reversible mapping to UDS3, with conditions for partial or complex reversibility.

    • Additional notes and discussion points to aid in understanding and applying the mappings.
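The conformity details and reversibility conditions described above lend themselves to a simple programmatic check. The following is a minimal sketch in Python, assuming conformity strings take the compact form seen in the RESIDENC example later in this document ("1-4, 9" in UDS3, "1-5, 9" in UDS4). The function name `parse_conformity` is hypothetical, and entries such as "1 or blank" or "1900-current year, 9999" are deliberately not handled; they raise a ValueError.

```python
from __future__ import annotations


def parse_conformity(spec: str) -> set[int]:
    """Expand a simple conformity string such as "1-4, 9" into a set of ints.

    Hypothetical helper: only comma-separated integers and integer ranges are
    supported. Anything else (e.g. "1 or blank") raises ValueError.
    """
    allowed: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-")
            allowed.update(range(int(low), int(high) + 1))
        else:
            allowed.add(int(part))
    return allowed


# Conformity values taken from the RESIDENC example in this document.
uds3_allowed = parse_conformity("1-4, 9")
uds4_allowed = parse_conformity("1-5, 9")

# UDS4 levels with no UDS3 counterpart are what make the reverse mapping partial.
not_reversible = sorted(uds4_allowed - uds3_allowed)
print(not_reversible)  # [5]
```

A check like this can flag, before any reverse mapping is attempted, which UDS4 responses have no place to go in UDS3.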

Content -- JSON Files

The JSON files streamline the application of crosswalks by focusing on mapping rules, which are vital for programmatic use. This approach enhances flexibility for Centers, enabling efficient auditing, quality checks, and troubleshooting. The mapping rules are presented in a hierarchy based on the level of complexity, which informs their organization within the JSON structure. Here are some examples, categorized for clarity:

  1. Direct Variable Mapping: Variables retain the same names across UDS3 and UDS4, with only minor changes in metadata such as question text or position, or slight alterations in the wording of response labels. These mappings are straightforward, fully reversible, and involve no data loss. Variables are listed according to their UDS3 identifiers but can be inverted if necessary.

  2. Conditional Consistency: The core informational and clinical concepts of the mapped variables remain unchanged between UDS3 and UDS4, even if variable names change. However, there is a risk of data loss if responses are merged or new ones added, which may complicate reversibility. Conditional mappings are often required.

  3. Structured Transformations: Data type alterations necessitate more deliberate actions to accurately map responses. This includes transitioning from free text to structured data or implementing calculations for proper alignment between UDS3 and UDS4. Informational loss is generally minimal between UDS3 and UDS4.

  4. High Complexity: Mappings in this category are potentially viable but are best left to individual Centers to define based on specific use cases. Clinical interpretations may not be straightforward, response mappings could be impractical, and conceptual misalignment may exist. Centers are advised to proceed with caution or create bespoke mapping rules informed by established structures. All such rules are denoted as being complex within the conformity entry.

This classification helps Centers prioritize their mapping efforts, beginning with the simplest cases and progressing towards those requiring the most scrutiny and customized handling.
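The simplest-first ordering can be made explicit in code. The sketch below is illustrative only: it assumes the four rule classes sit under the top-level keys used in the examples in this document (`Direct_Mappings`, `Conditional_Consistency`, `Structured_Transformation`, `High_Complexity`) and that they have been loaded into a single dictionary, which may not match how the files are actually distributed. The function name `rules_by_complexity` is hypothetical.

```python
import json

# Top-level keys as they appear in the examples in this document, ordered
# from simplest to most complex. Holding them in one dict is an assumption.
RULE_CLASS_ORDER = [
    "Direct_Mappings",
    "Conditional_Consistency",
    "Structured_Transformation",
    "High_Complexity",
]


def rules_by_complexity(crosswalk):
    """Yield (rule_class, rule) pairs, simplest rule class first."""
    for rule_class in RULE_CLASS_ORDER:
        for rule in crosswalk.get(rule_class, []):
            yield rule_class, rule


# Small inline stand-in for crosswalk JSON; only variable names are included.
crosswalk = json.loads("""
{
  "High_Complexity": [{"UDS3_variable": "REFLEARNED", "UDS4_variable": "LEARNED"}],
  "Direct_Mappings": [{"UDS3_variable": "SOURCENW", "UDS4_variable": "SOURCENW"}]
}
""")

ordered = [(cls, rule["UDS3_variable"]) for cls, rule in rules_by_complexity(crosswalk)]
print(ordered)  # [('Direct_Mappings', 'SOURCENW'), ('High_Complexity', 'REFLEARNED')]
```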

Maximizing Crosswalk Efficacy

  1. Core Collaboration: Each ADRC's Cores are encouraged to collaboratively review the mapping rules using the Human Readable Tables to select the most pertinent mappings for their unique objectives. Direct Variable Mappings are universally applicable, while Conditional Consistency and Structured Transformations typically facilitate a smooth transition from UDS3 to UDS4 with careful application. High Complexity variables, however, demand a meticulous examination to ensure their relevance and accurate deployment.

  2. JSON Scripting by Data Cores: Data Core personnel should write scripts that use the JSON files to transform UDS3 data on their data platform of choice. The JSON files can be read from common data languages including R, Python, and SQL. These scripts should handle:

    • The conversion of response levels and labels

    • The consolidation of responses, utilizing the consistent " | " separators for efficient regex operations

    • The accurate mapping of question text to corresponding metadata during harmonization processes

    • The intelligent aggregation of UDS3 and UDS4 variables to uphold data integrity.

  3. Post-transformation Merging: After mapping, the transformed variables should be merged with care. The crosswalk mappings are written for the UDS3-to-UDS4 direction, but reversibility is flagged wherever it applies.

  4. Data Retention Protocol: As a best practice, legacy data variables that have been transformed should be retained within datasets. This preservation strategy empowers downstream researchers to make informed decisions regarding variable applicability.

  5. Custom Applications: Custom applications developed by the ADRCs, for example those which aid in consensus conferences, can easily be adapted to choose specific variable mappings that align with their utilization frameworks.

  6. Adaptability and Customization: The provided JSON rule sets and corresponding scripts are designed for adaptability. Centers can customize their migration processes by omitting certain rules or by bypassing rule classes, such as the High Complexity rules, entirely to suit their distinct requirements.
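Omitting individual rules or bypassing a whole rule class, as described in the last item, can be a small filtering step applied before any data is touched. This is a sketch under assumptions: the rule classes are held in one dictionary keyed by the top-level names used in the examples in this document, and `select_rules`, `skip_classes`, and `skip_variables` are hypothetical names.

```python
def select_rules(crosswalk, skip_classes=(), skip_variables=()):
    """Return a copy of the crosswalk without the skipped classes or variables.

    skip_classes:   top-level rule classes to bypass entirely.
    skip_variables: UDS3 variable names whose rules should be omitted.
    """
    selected = {}
    for rule_class, rules in crosswalk.items():
        if rule_class in skip_classes:
            continue
        kept = [r for r in rules if r["UDS3_variable"] not in skip_variables]
        if kept:
            selected[rule_class] = kept
    return selected


# Variable names are taken from the examples in this document.
crosswalk = {
    "Direct_Mappings": [{"UDS3_variable": "SOURCENW", "UDS4_variable": "SOURCENW"}],
    "Conditional_Consistency": [
        {"UDS3_variable": "RESIDENC", "UDS4_variable": "RESIDENC"},
        {"UDS3_variable": "HISPANIC", "UDS4_variable": "ETHISPANIC"},
    ],
    "High_Complexity": [{"UDS3_variable": "REFLEARNED", "UDS4_variable": "LEARNED"}],
}

selected = select_rules(
    crosswalk,
    skip_classes={"High_Complexity"},
    skip_variables={"HISPANIC"},
)
print(sorted(selected))  # ['Conditional_Consistency', 'Direct_Mappings']
```

Filtering the rule set rather than editing the JSON files keeps the distributed crosswalk intact and makes each Center's choices easy to audit.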

By adhering to these guidelines, ADRCs can ensure a methodical and strategic approach to harmonization between UDS3 and UDS4 data, enhancing both the quality and utility of their UDS research data.
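The data retention practice in item 4 can be sketched as follows. This is one possible approach rather than a prescribed one: records are plain dictionaries, the `_UDS3` suffix, the `PTID` field, and the function name `add_uds4_variable` are hypothetical, and the identity lookup used for RESIDENC is an assumption based on its example listing no response-level changes. The 8-to-88 mapping is taken from the REFLEARNED example in this document.

```python
def add_uds4_variable(record, uds3_name, uds4_name, lookup, legacy_suffix="_UDS3"):
    """Add the mapped UDS4 value to a record while keeping the legacy value.

    If both versions share a variable name, the legacy value is first copied
    to <name><legacy_suffix> so that it is not overwritten. Values missing
    from the lookup map to None so they can be reviewed later.
    """
    out = dict(record)  # leave the caller's record untouched
    legacy_value = out[uds3_name]
    if uds3_name == uds4_name:
        out[uds3_name + legacy_suffix] = legacy_value
    out[uds4_name] = lookup.get(legacy_value)
    return out


# Different names: the legacy variable simply stays alongside the new one.
renamed = add_uds4_variable(
    {"PTID": "example-001", "REFLEARNED": 8}, "REFLEARNED", "LEARNED", {8: 88, 9: 99}
)
print(renamed)  # {'PTID': 'example-001', 'REFLEARNED': 8, 'LEARNED': 88}

# Same name: the legacy value moves to RESIDENC_UDS3 (identity lookup assumed).
same_name = add_uds4_variable(
    {"PTID": "example-001", "RESIDENC": 4},
    "RESIDENC",
    "RESIDENC",
    {1: 1, 2: 2, 3: 3, 4: 4, 9: 9},
)
print(same_name["RESIDENC_UDS3"], same_name["RESIDENC"])  # 4 4
```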

Direct Variable Mapping Example – SOURCENW

{
"Direct_Mappings": [
{
"UDS3_variable": "SOURCENW",
"UDS4_variable": "SOURCENW",
"crosswalk_remappings": [
{
"change_type": "Positional change",
"mappings": [
{
"UDS3_value": "5",
"UDS4_value": "24",
"reversible": "Yes"
}
]
},
{
"change_type": "Question label change",
"mappings": [
{
"UDS3_value": "ADC enrollment type:",
"UDS4_value": "ADRC enrollment type:",
"reversible": "Yes"
}
]
},
{
"change_type": "Change in response labels",
"mappings": [
{
"UDS3_value": "Primarily ADC-funded (Clinical Core, Satellite Core, or other ADC Core or project)",
"UDS4_value": "Participant is supported primarily by ADRC funding (Clinical Core, Satellite Core, or other ADC Core or project)",
"reversible": "Yes"
},
{
"UDS3_value": "Subject is supported primarily by a non-ADC study (e.g., R01, including non-ADC grants supporting the FTLD Module participation)",
"UDS4_value": "Participant is supported primarily by a non-ADRC study (e.g., R01, including non-ADRC grants supporting the FTLD Module participation)",
"reversible": "Yes"
}
]
}
]
}
]
}
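A direct mapping like the one above mostly amounts to relabeling. The sketch below builds a UDS3-to-UDS4 label lookup from a rule of this shape. Note that the SOURCENW example uses the keys `crosswalk_remappings` and `change_type`, while the later examples in this document use `crosswalk_mappings` and `mapping_type`; accepting either spelling is an assumption about the files, and the function names are hypothetical.

```python
def label_lookup(rule):
    """Collect UDS3 -> UDS4 response labels from one crosswalk rule."""
    # Accept both key spellings seen in this document's examples (assumption).
    groups = rule.get("crosswalk_mappings") or rule.get("crosswalk_remappings") or []
    labels = {}
    for group in groups:
        kind = group.get("mapping_type") or group.get("change_type")
        if kind == "Change in response labels":
            for entry in group["mappings"]:
                labels[entry["UDS3_value"]] = entry["UDS4_value"]
    return labels


# One response-label entry from the SOURCENW example, split across lines.
sourcenw_rule = {
    "UDS3_variable": "SOURCENW",
    "UDS4_variable": "SOURCENW",
    "crosswalk_remappings": [
        {
            "change_type": "Change in response labels",
            "mappings": [
                {
                    "UDS3_value": "Primarily ADC-funded (Clinical Core, "
                    "Satellite Core, or other ADC Core or project)",
                    "UDS4_value": "Participant is supported primarily by ADRC "
                    "funding (Clinical Core, Satellite Core, or other ADC Core "
                    "or project)",
                    "reversible": "Yes",
                }
            ],
        }
    ],
}

labels = label_lookup(sourcenw_rule)


def relabel(uds3_label):
    """Return the UDS4 label, or the input unchanged if no rule lists it."""
    return labels.get(uds3_label, uds3_label)
```

Because every entry here is marked reversible, swapping the keys and values of `labels` gives the UDS4-to-UDS3 direction.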

Conditional Consistency Example – RESIDENC and HISPANIC-to-ETHISPANIC

{
"Conditional_Consistency": [
{
"UDS3_variable": "RESIDENC",
"UDS4_variable": "RESIDENC",
"crosswalk_mappings": [
{
"mapping_type": "Positional change",
"mappings": [
{
"UDS3_value": "17",
"UDS4_value": "13",
"reversible": "Yes"
}
]
},
{
"mapping_type": "Question label change",
"mappings": [
{
"UDS3_value": "What is the subject's primary type of residence?",
"UDS4_value": "What is your primary type of residence?",
"reversible": "Yes"
}
]
},
{
"mapping_type": "Conformity change",
"mappings": [
{
"UDS3_value": "1-4, 9",
"UDS4_value": "1-5, 9",
"reversible": "Partial",
"note": "Only response levels 1-4,9 mappable to UDS3"
}
]
}
]
},
{
"UDS3_variable": "HISPANIC",
"UDS4_variable": "ETHISPANIC",
"crosswalk_mappings": [
{
"mapping_type": "Positional change",
"mappings": [
{
"UDS3_value": "8",
"UDS4_value": "3d",
"reversible": "Yes"
}
]
},
{
"mapping_type": "Question label change",
"mappings": [
{
"UDS3_value": "Does the subject report of being Hispanic/Latino ethnicity (i.e., having origins from a mainly Spanish-speaking, Latin American country), regardless of race?",
"UDS4_value": "Race--Hispanic or Latino",
"reversible": "Yes"
}
]
},
{
"mapping_type": "Conformity change",
"mappings": [
{
"UDS3_value": "0-1, 9",
"UDS4_value": "1 or blank",
"reversible": "Yes"
}
]
},
{
"mapping_type": "Change in response levels",
"mappings": [
{
"UDS3_value": "0 | 9",
"UDS4_value": "blank",
"reversible": "Yes"
}
]
},
{
"mapping_type": "Change in response labels",
"mappings": [
{
"UDS3_value": "Yes",
"UDS4_value": "Hispanic or Latino",
"reversible": "Yes"
}
]
}
]
}
]
}
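The HISPANIC-to-ETHISPANIC rule above consolidates two UDS3 levels with the " | " separator. A minimal sketch of expanding such an entry into a per-level lookup and applying it follows. It assumes that levels not listed under "Change in response levels" carry over unchanged (consistent with the "1 or blank" conformity entry, but not stated explicitly in the rule) and uses None to stand in for a blank field. The function names are hypothetical.

```python
import re

# Split on the " | " separator, tolerating stray whitespace around the bar.
SEPARATOR = re.compile(r"\s*\|\s*")


def build_level_lookup(mappings):
    """Expand consolidated UDS3 values into a one-to-one UDS3 -> UDS4 lookup."""
    lookup = {}
    for entry in mappings:
        for uds3_level in SEPARATOR.split(entry["UDS3_value"].strip()):
            lookup[uds3_level] = entry["UDS4_value"]
    return lookup


# The "Change in response levels" entry from the HISPANIC example.
hispanic_level_rules = [
    {"UDS3_value": "0 | 9", "UDS4_value": "blank", "reversible": "Yes"}
]
lookup = build_level_lookup(hispanic_level_rules)
print(lookup)  # {'0': 'blank', '9': 'blank'}


def hispanic_to_ethispanic(value):
    """Map a UDS3 HISPANIC value to UDS4 ETHISPANIC (None means blank)."""
    target = lookup.get(str(value))
    if target == "blank":
        return None
    if target is None:
        # Not listed in the level changes: carried over unchanged (assumption).
        return value
    return target


print([hispanic_to_ethispanic(v) for v in (0, 1, 9)])  # [None, 1, None]
```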

Structured Transformation Example – PDYR-to-PDAGE

{
"Structured_Transformation": [
{
"UDS3_variable": "PDYR",
"UDS4_variable": "PDAGE",
"crosswalk_mappings": [
{
"mapping_type": "Change in form",
"mappings": [
{
"UDS3_value": "D2",
"UDS4_value": "A5/D2",
"reversible": "Yes"
}
]
},
{
"mapping_type": "Question label change",
"mappings": [
{
"UDS3_value": "Year of PD diagnosis",
"UDS4_value": "Age at estimated PD symptom onset",
"reversible": "Yes"
}
]
},
{
"mapping_type": "Conformity change",
"mappings": [
{
"UDS3_value": "1900-current year, 9999",
"UDS4_value": "10-110, 999",
"reversible": "Yes"
}
]
},
{
"mapping_type": "Calculated mapping",
"mappings": [
{
"UDS3_value": "PDYR",
"UDS4_value": "PDYR - BIRTHYR",
"reversible": "Yes",
"note": "PDAGE is calculated by subtracting BIRTHYR from PDYR"
}
]
}
]
}
]
}
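The calculated mapping above (PDAGE = PDYR - BIRTHYR) can be sketched as below. Two behaviors are assumptions that the rule does not state: the UDS3 unknown code 9999 is translated to the UDS4 unknown code 999 (both codes appear in the conformity entries), and results outside the UDS4 range of 10 to 110 are returned as None for manual review. The function name is hypothetical. Because only years are subtracted, the result can differ from the true age by up to one year.

```python
UDS3_UNKNOWN_YEAR = 9999  # unknown code listed in the UDS3 conformity entry
UDS4_UNKNOWN_AGE = 999    # unknown code listed in the UDS4 conformity entry


def pdyr_to_pdage(pdyr, birthyr):
    """Derive UDS4 PDAGE from UDS3 PDYR and BIRTHYR.

    Returns None when the input is missing or the computed age falls outside
    the UDS4 conformity range (10-110), so the record can be reviewed.
    """
    if pdyr is None or birthyr is None:
        return None
    if pdyr == UDS3_UNKNOWN_YEAR:
        # Assumed translation between the two "unknown" codes.
        return UDS4_UNKNOWN_AGE
    age = pdyr - birthyr
    if not 10 <= age <= 110:
        return None
    return age


print(pdyr_to_pdage(2015, 1950))  # 65
print(pdyr_to_pdage(9999, 1950))  # 999
print(pdyr_to_pdage(1955, 1950))  # None
```

The reverse direction would add BIRTHYR back to PDAGE, with the same caveat about the unknown codes.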

High Complexity Example – REFLEARNED to LEARNED

{
"High_Complexity": [
{
"UDS3_variable": "REFLEARNED",
"UDS4_variable": "LEARNED",
"crosswalk_mappings": [
{
"mapping_type": "Positional change",
"mappings": [
{
"UDS3_value": "2b",
"UDS4_value": "26",
"reversible": "Yes"
}
]
},
{
"mapping_type": "Question label change",
"mappings": [
{
"UDS3_value": "If the referral source was self-referral or non-professional contact, how did the referral source learn of the ADC?",
"UDS4_value": "If the referral source was a self-referral or a nonprofessional contact, how did the referral source learn of the ADRC?",
"reversible": "Yes"
}
]
},
{
"mapping_type": "Conformity change",
"mappings": [
{
"UDS3_value": "1-4, 8, 9",
"UDS4_value": "1-10, 88, 99",
"reversible": "Complex",
"note": "Mapping between REFLEARNED and LEARNED is complex; exercise caution as many mappings may be non-applicable or conceptually misaligned."
}
]
},
{
"mapping_type": "Change in response levels",
"mappings": [
{
"UDS3_value": "8",
"UDS4_value": "88",
"reversible": "Yes"
},
{
"UDS3_value": "9",
"UDS4_value": "99",
"reversible": "Yes"
}
]
}
]
}
]
}
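For a High Complexity rule such as this one, a cautious script might apply only the level mappings that the rule lists explicitly (8 to 88 and 9 to 99) and set everything else aside for review instead of guessing. The sketch below illustrates that idea; leaving other levels unmapped is a design choice made for this example, not a rule taken from the crosswalk, and the function name is hypothetical.

```python
# Only the level changes listed explicitly in the REFLEARNED example.
EXPLICIT_LEVELS = {8: 88, 9: 99}


def map_reflearned(values):
    """Map REFLEARNED values to LEARNED where the rule is explicit.

    Returns (mapped, needs_review): mapped holds the UDS4 value or None for
    each input, and needs_review lists (index, value) pairs for non-missing
    inputs that the explicit rules do not cover.
    """
    mapped = []
    needs_review = []
    for index, value in enumerate(values):
        if value in EXPLICIT_LEVELS:
            mapped.append(EXPLICIT_LEVELS[value])
        else:
            mapped.append(None)
            if value is not None:
                needs_review.append((index, value))
    return mapped, needs_review


mapped, needs_review = map_reflearned([8, 2, 9, None])
print(mapped)        # [88, None, 99, None]
print(needs_review)  # [(1, 2)]
```

A Center that has agreed on mappings for the remaining levels could extend `EXPLICIT_LEVELS` with its own bespoke entries.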