Overview

The data quality module is where users configure the data quality checks they want to run on incoming data from their SurveyCTO form. These checks feed into a data quality monitoring dashboard with different metrics such as the percentage of data quality violations aggregated by enumerator and location.

To enable the data quality module, select “Data Quality Dashboard” under Feature Selection.

Prerequisites

This step has the following prerequisites:

  1. Configure the main form on SurveyStream:

    Complete the SurveyCTO Integration step on SurveyStream and load the main form definition since all data quality checks are linked to a main form.

  2. (Optional) Deploy required data quality forms on SurveyCTO:

    Before configuring your data quality checks on SurveyStream, make sure your data quality forms are deployed on SurveyCTO. This step is required only if you want to include checks based on data quality forms.

    Please check out the data quality form requirements section below for things to keep in mind while coding these forms.

  3. Configure Survey Status for Targets:

    SurveyStream provides an option to filter data based on the survey status variable before applying checks. This can be used to run checks only on completed and partially completed submissions. Therefore, ensure that the Survey Status for Targets module is complete and includes all the survey status values on which you would like to run the checks.

  4. Decide which checks to run and associated inputs:

    It will help if you have discussed and decided which checks to run and the inputs needed for each check before starting the configuration.

Configuration

Key concepts

  Data quality forms

Survey teams often create separate SurveyCTO forms for data quality processes that are carried out by a monitor. SurveyStream can use these data quality forms to calculate metrics like mismatch, protocol violation and spotcheck scores. SurveyStream supports the following types of data quality forms:

  1. Spotcheck: Monitor accompanies a surveyor to ensure they are following protocols
  2. Backcheck: Monitor calls back or revisits a respondent in person to check on a few responses to questions
  3. Audio Audit: Monitor listens to a recording of the survey to ensure that surveyors didn’t make entry errors and followed protocols correctly

  Data quality checks

SurveyStream supports ten different types of data quality checks:

  1. Logic: Check that certain skip patterns and logical relationships among variables are followed.
  2. Constraint: Check that the variable values fall within provided minimum and maximum constraints. Soft and hard constraint checks are available for finer grained monitoring.
  3. Outlier: Check whether continuous variables contain outliers, where an outlier is a value that falls outside a range defined by a certain multiple of the interquartile range or standard deviation, or beyond a given percentile.
  4. Missing: Check if certain variables have a high percentage of missing values.
  5. Don’t Know: Check if certain variables have a high percentage of don’t know values.
  6. Refusal: Check if certain variables have a high percentage of refusal values.
  7. Mismatch: Check that a variable value in the main form matches the value of the same variable recorded in a data quality form.
  8. Protocol Violation: Check if a protocol has been violated as per entries in a data quality form.
  9. Spotcheck Score: Average the spotcheck scores recorded in data quality forms.
  10. GPS: Verify that the GPS location of the household is within the sampled grid boundary coordinates, or that it matches the GPS location of the sampled household within a margin of error.
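
As an illustration of the outlier check described above, here is a minimal Python sketch of the interquartile range variant. It is not SurveyStream's implementation: the function name `iqr_outliers`, the default multiplier of 1.5 and the choice of quartile method are assumptions made for this example.

```python
from statistics import quantiles


def iqr_outliers(values, multiplier=1.5):
    """Return values lying more than `multiplier` * IQR outside the quartiles.

    Hypothetical helper: the multiplier default and the "inclusive"
    quartile method are assumptions, not SurveyStream settings.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - multiplier * iqr
    upper = q3 + multiplier * iqr
    # Keep the original order so flagged values can be traced to submissions.
    return [v for v in values if v < lower or v > upper]


household_sizes = [10, 12, 11, 13, 12, 11, 95]
flagged = iqr_outliers(household_sizes)
```

The standard deviation and percentile variants follow the same pattern, with only the computation of the lower and upper bounds changing.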

Process

Configuring data quality forms

Adding data quality forms is very similar to adding a main form:

1. Form details

The first step is to provide the form details, which include:

Input            Description
Main SCTO form   Form ID for the main SurveyCTO form linked to the data quality form
DQ form type     Type of data quality form - audio audit, spotcheck or backcheck
DQ form ID       Form ID of the data quality SurveyCTO form. This must match the form ID on the SurveyCTO form definition.
DQ form name     Form name of the data quality SurveyCTO form

2. SurveyCTO questions

The second step is to map the variables in the SurveyCTO form for the following required metadata fields:

  1. Target ID - Unique identifier for the survey respondent
  2. Enumerator ID - Unique identifier for the enumerator
  3. DQ enumerator ID - Unique identifier for the monitor who is filling out the data quality form
  4. Location variables (dynamic) - Unique identifier for each of the location levels configured in the survey

You can add multiple data quality forms for each main form and also edit/delete them if required.
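
Before submitting the mapping, it can help to confirm that every required metadata field has a SurveyCTO variable assigned. The sketch below shows one way to do that in Python. The field keys (`target_id`, `enumerator_id`, `dq_enumerator_id`, `location:<level>`) and the function name are hypothetical and do not reflect SurveyStream's internal names.

```python
# Hypothetical keys for the three fixed metadata fields.
REQUIRED_FIELDS = ("target_id", "enumerator_id", "dq_enumerator_id")


def missing_mappings(mapping, location_levels):
    """List required metadata fields that have no SurveyCTO variable mapped.

    `location_levels` holds the location levels configured in the survey,
    which is why the set of required fields is dynamic.
    """
    required = list(REQUIRED_FIELDS)
    required += [f"location:{level}" for level in location_levels]
    return [field for field in required if not mapping.get(field)]


mapping = {
    "target_id": "resp_id",
    "enumerator_id": "enum_id",
    "location:district": "district_id",
}
unmapped = missing_mappings(mapping, ["district", "village"])
```

An empty result means all required fields, including one per location level, are mapped.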

Configuring data quality checks

There are two primary tasks to complete for this step:

1. Global configuration

This step has the following inputs:

  1. Select survey status values:

    Checks run only on SurveyCTO submissions whose survey status is one of the values selected in this step. The dropdown lists all survey status values configured in the Survey Status for Targets module. This option is generally used to run checks only on fully completed submissions.

  2. Group by module name:

    When this option is selected, all checks will have a ‘Module name’ input and the metrics on the data quality monitoring dashboard can be grouped by module.
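
The two global settings above can be pictured with a small Python sketch: submissions are filtered by survey status before any check runs, and checks carrying a module name can be grouped for reporting. The record layout, the field names `survey_status` and `module_name`, and both function names are assumptions for illustration only.

```python
from collections import defaultdict


def filter_by_status(submissions, selected_statuses, status_field="survey_status"):
    """Keep only submissions whose survey status was selected for checking."""
    return [s for s in submissions if s.get(status_field) in selected_statuses]


def group_checks_by_module(checks):
    """Group check names under their module name (hypothetical record layout)."""
    grouped = defaultdict(list)
    for check in checks:
        grouped[check.get("module_name", "")].append(check["name"])
    return dict(grouped)


submissions = [
    {"id": 1, "survey_status": "Fully complete"},
    {"id": 2, "survey_status": "Refused"},
    {"id": 3, "survey_status": "Partially complete"},
]
eligible = filter_by_status(submissions, {"Fully complete", "Partially complete"})

checks = [
    {"name": "age_range", "module_name": "Demographics"},
    {"name": "income_outlier", "module_name": "Income"},
    {"name": "hh_size", "module_name": "Demographics"},
]
by_module = group_checks_by_module(checks)
```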

2. Configure checks

Here, you can provide the inputs for each check type. The inputs vary based on type of check. Below is a short description of inputs per check type:

Walkthrough

[Add a configuration walkthrough video]

Adding/editing checks during the survey

You can add/edit checks during the survey following the same process as configuring checks for the first time. The changes take roughly 30 minutes to 1 hour to appear on the dashboard. Changes apply to all submissions of the form: all flags corresponding to a deleted or inactive check are removed, and newly added checks run on all submissions, including those received before the change.

Handling inactive checks

During the survey, SurveyStream refreshes the form definition from SurveyCTO every 30 minutes. If a variable is removed from the form definition, any check using that variable will automatically be marked as inactive and Survey Admins will receive a warning notification regarding this change. When inactive, the check is not run and the corresponding flags are removed. You can edit such inactive checks to replace the removed variables and then mark them as active again.
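
The deactivation rule described above can be sketched in Python as follows. This is an illustrative approximation, not SurveyStream's code: the check record layout (`name`, `variables`, `active`) and the function name `refresh_check_status` are hypothetical.

```python
def refresh_check_status(checks, form_variables):
    """Mark active checks inactive when a variable they use is missing.

    Returns (check name, missing variables) pairs for the checks that were
    deactivated, which could feed a warning notification to Survey Admins.
    """
    available = set(form_variables)
    deactivated = []
    for check in checks:
        missing = [v for v in check["variables"] if v not in available]
        if missing and check["active"]:
            check["active"] = False
            deactivated.append((check["name"], missing))
    return deactivated


checks = [
    {"name": "age_constraint", "variables": ["age"], "active": True},
    {"name": "income_logic", "variables": ["income", "employed"], "active": True},
]
# The refreshed form definition no longer contains "employed".
warnings = refresh_check_status(checks, ["age", "income"])
```

In this sketch, reactivation is left as a manual step, mirroring the process above: the check is edited to replace the removed variable and then marked as active again.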

Additional notes

Data quality form requirements

  1. Ensure each data quality form has the following variables:

    1. Target ID - Unique identifier for the survey respondent
    2. Enumerator ID - Unique identifier for the enumerator
    3. DQ enumerator ID - Unique identifier for the monitor who is filling out the data quality form
    4. Location variables (dynamic) - Unique identifier for each of the location levels configured in the survey
  2. For mismatch checks, ensure the variable name on the data quality form matches the variable name on the main form

  3. For protocol violation checks, ensure that 0 indicates a violation on all protocol-related questions
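
To show why the last two requirements matter, here is a minimal Python sketch of how mismatch and protocol violation rates could be computed from main form and data quality form records. It assumes records are joined on the Target ID, that the compared variable has the same name in both forms, and that 0 encodes a violation. The function names and the record layout are hypothetical and do not describe SurveyStream's actual computation.

```python
def mismatch_rate(main_rows, dq_rows, variable, key="target_id"):
    """Share of data quality records whose value differs from the main form.

    Relies on `variable` having the same name in both forms.
    """
    main_by_target = {row[key]: row for row in main_rows}
    compared = mismatched = 0
    for dq in dq_rows:
        main = main_by_target.get(dq[key])
        if main is None or variable not in main or variable not in dq:
            continue  # Nothing to compare for this target.
        compared += 1
        if main[variable] != dq[variable]:
            mismatched += 1
    return mismatched / compared if compared else None


def protocol_violation_rate(dq_rows, question):
    """Share of answers equal to 0, where 0 is taken to mean a violation."""
    answers = [row[question] for row in dq_rows if question in row]
    if not answers:
        return None
    return sum(1 for a in answers if a == 0) / len(answers)


main_rows = [{"target_id": 1, "age": 30}, {"target_id": 2, "age": 41}]
dq_rows = [
    {"target_id": 1, "age": 30, "consent_read": 1},
    {"target_id": 2, "age": 40, "consent_read": 0},
]
age_mismatch = mismatch_rate(main_rows, dq_rows, "age")
consent_violations = protocol_violation_rate(dq_rows, "consent_read")
```

If the variable were named differently on the data quality form, or if a form coded violations as 1, a computation like this would silently skip or miscount records, which is what the requirements above are meant to prevent.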