Feature Selection
Data can be filtered on the Survey status variable before applying checks. This can be used to run checks only on completed and partially completed submissions, so ensure that the Survey Status for Targets module is complete and includes all the survey status values on which you want to run the checks.
Survey teams often create separate SurveyCTO forms for data quality processes that are carried out by a monitor. SurveyStream can use these data quality forms to calculate metrics like mismatch, protocol violation and spotcheck scores. SurveyStream supports the following types of data quality forms:
- Spotcheck: Monitor accompanies a surveyor to ensure they are following protocols
- Backcheck: Monitor calls back or revisits a respondent in person to verify their responses to a few questions
- Audio Audit: Monitor listens to a recording of the survey to ensure that the surveyor didn’t make entry errors and followed protocols correctly
SurveyStream supports ten different types of data quality checks:
- Logic: Check that certain skip patterns and logical relationships among variables are followed.
- Constraint: Check that variable values fall within the provided minimum and maximum constraints. Soft and hard constraint checks are available for finer-grained monitoring.
- Outlier: Check whether continuous variables contain outliers, where an outlier is a value lying beyond a given multiple of the Interquartile Range or Standard Deviation, or beyond a given percentile.
- Missing: Check if certain variables have a high percentage of missing values.
- Don’t Know: Check if certain variables have a high percentage of don’t know values.
- Refusal: Check if certain variables have a high percentage of refusal values.
- Mismatch: Check that a variable value in the main form matches the value of the same variable recorded in a data quality form.
- Protocol Violation: Check if a protocol has been violated as per entries in a data quality form.
- Spotcheck Score: Average the spotcheck scores recorded in data quality forms.
- GPS: Verify that the GPS location of the household is within the sampled grid boundary, or that it matches the GPS location of the sampled household within a margin of error.
Form details
Input | Description |
---|---|
Main SCTO form | Form ID for the main SurveyCTO form linked to the data quality form |
DQ form type | Type of data quality form: audio audit, spotcheck or backcheck |
DQ form ID | Form ID of the data quality SurveyCTO form. This must match the form ID on the SurveyCTO form definition. |
DQ form name | Form name of the data quality SurveyCTO form |
SurveyCTO questions
Global configuration
Configure checks
Common
Input | Description |
---|---|
Select variable | The variable that will be flagged if the check is violated. |
Flag description | (Optional) A short description of the flag that can be added on the dashboard for more context. |
Filter group | (Optional) Conditions for filtering the data before applying the check. The filter groups are joined by an OR operator and conditions within a group are joined by an AND operator. |
Module Name | (Optional) Enabled when Group by module name is selected under Global configuration. The value entered here is used to group results on the dashboard. |
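The following Python sketch illustrates how filter groups combine: every condition within a group must hold (AND), and a row is checked if any group holds (OR). The (variable, operator, value) condition format and the function name passes_filters are hypothetical, not SurveyStream's internal representation.

```python
from __future__ import annotations

import operator
from typing import Any

# Hypothetical condition format: (variable, operator, value), e.g. ("age", ">", 30).
OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def passes_filters(row: dict[str, Any], filter_groups: list[list[tuple[str, str, Any]]]) -> bool:
    """Return True if the row should be checked.

    Conditions inside a group are joined by AND; groups are joined by OR.
    With no filter groups configured, every row is checked.
    """
    if not filter_groups:
        return True
    return any(
        all(OPERATORS[op](row[var], value) for var, op, value in group)
        for group in filter_groups
    )


# (age > 30 AND land == 1) OR (district == "North")
groups = [[("age", ">", 30), ("land", "==", 1)], [("district", "==", "North")]]
rows = [
    {"age": 45, "land": 1, "district": "South"},  # matches the first group
    {"age": 25, "land": 1, "district": "North"},  # matches the second group
    {"age": 25, "land": 0, "district": "South"},  # matches neither group
]
checked = [row for row in rows if passes_filters(row, groups)]
print(len(checked))  # 2
```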
Logic
Input | Description |
---|---|
Other variables | (Optional) Additional variables needed for the logic check’s assert conditions. These variables are assigned the aliases B, C and so on (the main variable is given the alias A). |
Assertions | Assert conditions such as A == B, where A and B are aliases for the selected variables. The allowed operators in a condition are: +, -, *, /, **, >, >=, <, <=, ==, !=. Assertion groups are joined by an OR operator, and assertions within a group are joined by an AND operator. |
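To make the alias and operator rules concrete, here is a minimal Python sketch of evaluating assertion groups against one submission. It assumes assertions are written as expressions over the aliases (A for the main variable, then B, C and so on) and parses them with Python's ast module, whitelisting only the listed operators plus unary minus for negative numbers (an added assumption). The function names are hypothetical and this is not SurveyStream's parser. A submission is flagged when no assertion group holds.

```python
from __future__ import annotations

import ast
import operator
from typing import Any

# Only the operators listed for logic assertions are allowed.
BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
           ast.Div: operator.truediv, ast.Pow: operator.pow}
CMP_OPS = {ast.Gt: operator.gt, ast.GtE: operator.ge, ast.Lt: operator.lt,
           ast.LtE: operator.le, ast.Eq: operator.eq, ast.NotEq: operator.ne}


def _eval(node: ast.AST, env: dict[str, Any]) -> Any:
    """Evaluate a whitelisted expression tree; anything else raises ValueError."""
    if isinstance(node, ast.Expression):
        return _eval(node.body, env)
    if isinstance(node, ast.Name):  # an alias such as A or B
        return env[node.id]
    if isinstance(node, ast.Constant):  # a literal such as 0 or 2.5
        return node.value
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):  # e.g. -1
        return -_eval(node.operand, env)
    if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
        return BIN_OPS[type(node.op)](_eval(node.left, env), _eval(node.right, env))
    if isinstance(node, ast.Compare) and len(node.ops) == 1 and type(node.ops[0]) in CMP_OPS:
        return CMP_OPS[type(node.ops[0])](_eval(node.left, env), _eval(node.comparators[0], env))
    raise ValueError(f"unsupported expression: {ast.dump(node)}")


def assertion_holds(expr: str, env: dict[str, Any]) -> bool:
    return bool(_eval(ast.parse(expr, mode="eval"), env))


def make_aliases(main_var: str, other_vars: list[str]) -> dict[str, str]:
    """A is the main variable; B, C, ... follow the other variables in order."""
    return {chr(ord("A") + i): name for i, name in enumerate([main_var, *other_vars])}


def logic_check_passes(groups: list[list[str]], env: dict[str, Any]) -> bool:
    """Assertions within a group are ANDed; assertion groups are ORed."""
    return any(all(assertion_holds(expr, env) for expr in group) for group in groups)


row = {"income": 0, "expenses": 500}
aliases = make_aliases("income", ["expenses"])  # {"A": "income", "B": "expenses"}
env = {alias: row[var] for alias, var in aliases.items()}
# Pass if income is positive, or if both income and expenses are zero.
flagged = not logic_check_passes([["A > 0"], ["A == 0", "B == 0"]], env)
print(flagged)  # True
```

Parsing into a syntax tree and rejecting unknown node types avoids calling eval on user-entered text.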
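For example, if the assertion is income > 0 and the filters are age > 30 and land == 1 (within one filter group), the check flags submissions with age > 30 and land == 1 where income is not greater than 0.
Constraint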
Input | Description |
---|---|
Hard Min/Max | Strict minimum/maximum values allowed for a variable |
Soft Min/Max | Preferred minimum/maximum values for a variable |
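A minimal sketch of how hard and soft bounds could be classified for a single value. It assumes bounds are inclusive (a value equal to the minimum or maximum passes) and that a value outside both ranges is reported as a hard violation; the function name and the "hard"/"soft" labels are hypothetical.

```python
from __future__ import annotations

from typing import Optional


def constraint_flag(
    value: float,
    hard_min: Optional[float] = None,
    hard_max: Optional[float] = None,
    soft_min: Optional[float] = None,
    soft_max: Optional[float] = None,
) -> Optional[str]:
    """Return "hard", "soft", or None if the value is within all configured bounds."""
    # Hard bounds are checked first, so they take precedence over soft bounds.
    if (hard_min is not None and value < hard_min) or (hard_max is not None and value > hard_max):
        return "hard"
    if (soft_min is not None and value < soft_min) or (soft_max is not None and value > soft_max):
        return "soft"
    return None


# Respondent age: hard range 0 to 120, soft range 15 to 90.
for age in (130, 95, 40):
    print(age, constraint_flag(age, hard_min=0, hard_max=120, soft_min=15, soft_max=90))
# 130 hard
# 95 soft
# 40 None
```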
Outlier
Input | Description |
---|---|
Measure | The metric used for outlier calculation: Interquartile Range, Standard Deviation or Percentile |
Multiplier / Value | The multiple of the interquartile range or standard deviation (such as 1.96 times the standard deviation), or the percentile value (such as ±5th percentile), beyond which a value is treated as an outlier. |
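The three measures can be sketched with Python's statistics module. This assumes common conventions: for IQR, values beyond Q1 - k×IQR or Q3 + k×IQR; for standard deviation, values beyond mean ± k×SD (sample SD); and for percentile p, values below the p-th or above the (100 - p)-th percentile, which is one reading of "±5th percentile". SurveyStream's exact quartile method may differ, and the function and measure names are hypothetical.

```python
from __future__ import annotations

import statistics


def outlier_bounds(values: list[float], measure: str, k: float) -> tuple[float, float]:
    """Return (lower, upper); values strictly outside are treated as outliers."""
    if measure == "iqr":
        q1, _, q3 = statistics.quantiles(values, n=4)
        iqr = q3 - q1
        return q1 - k * iqr, q3 + k * iqr
    if measure == "sd":
        mean, sd = statistics.fmean(values), statistics.stdev(values)
        return mean - k * sd, mean + k * sd
    if measure == "percentile":
        # k is an integer percentile p in 1..49; cuts[i] is the (i + 1)-th percentile.
        p = int(k)
        cuts = statistics.quantiles(values, n=100)
        return cuts[p - 1], cuts[100 - p - 1]
    raise ValueError(f"unknown measure: {measure}")


def outliers(values: list[float], measure: str, k: float) -> list[float]:
    lower, upper = outlier_bounds(values, measure, k)
    return [v for v in values if v < lower or v > upper]


weekly_hours = [10, 12, 11, 13, 12, 11, 10, 12, 95]
print(outlier_bounds(weekly_hours, "iqr", 1.5))  # (7.5, 15.5)
print(outliers(weekly_hours, "iqr", 1.5))        # [95]
print(outliers(weekly_hours, "sd", 1.96))        # [95]
```

With few submissions, percentile cut-offs are extrapolated and may flag nothing, so the IQR or SD measure is usually easier to interpret on small samples.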
Missings, Don't Knows and Refusals
Input | Description |
---|---|
Value | The value or list of values that corresponds to missing/don’t know/refusal as per the form definition. |
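As a minimal sketch, these checks can be pictured as computing, for each selected variable, the share of submissions whose value is one of the configured codes, and flagging variables whose share exceeds a threshold. The threshold value, the -999 code and the function names are hypothetical; the required != 'yes' filter mirrors the rule for empty-type missing values in this section.

```python
from __future__ import annotations

from typing import Any, Iterable


def share_with_codes(values: list[Any], codes: Iterable[Any]) -> float:
    """Fraction of values equal to one of the codes (None stands for an empty answer)."""
    code_set = set(codes)
    return sum(v in code_set for v in values) / len(values) if values else 0.0


def flag_variables(rows: list[dict[str, Any]], variables: list[str],
                   codes: Iterable[Any], threshold: float) -> dict[str, float]:
    """Return {variable: share} for variables whose share of codes exceeds the threshold."""
    code_list = list(codes)
    flagged = {}
    for var in variables:
        share = share_with_codes([row.get(var) for row in rows], code_list)
        if share > threshold:
            flagged[var] = share
    return flagged


def non_required_questions(survey_rows: list[dict[str, str]]) -> list[str]:
    """Question names whose 'required' column is not 'yes' in the form definition."""
    return [q["name"] for q in survey_rows if q.get("required") != "yes"]


rows = [{"q1": -999, "q2": 3}, {"q1": -999, "q2": 4}, {"q1": 2, "q2": -999}, {"q1": -999, "q2": 5}]
print(flag_variables(rows, ["q1", "q2"], codes=[-999], threshold=0.5))  # {'q1': 0.75}

survey = [{"name": "q1", "required": "yes"}, {"name": "q2", "required": ""}, {"name": "q3"}]
print(non_required_questions(survey))  # ['q2', 'q3']
```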
For these checks, choose either Apply check on all variables in the form or Apply check on select variables. If Apply check on all variables in the form is selected, SurveyStream checks the form definition to find all questions for which the specified value is allowed by the choice list and runs the check on those variables. For missing value checks, if the value is one of (empty), NULL, NA or NAN, the check is run on all variables that are not mandatory (required != 'yes').
Mismatch
Input | Description |
---|---|
Data quality form | The data quality form containing the variable to check against |
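A minimal sketch of a mismatch check, assuming each data quality submission can be linked to its main submission by a shared identifier (the hhid key here is hypothetical) and that the variable has the same name in both forms. Data quality records without a linked main submission are skipped.

```python
from __future__ import annotations

from typing import Any


def find_mismatches(main_rows: list[dict[str, Any]], dq_rows: list[dict[str, Any]],
                    key: str, variable: str) -> list[tuple[Any, Any, Any]]:
    """Return (id, main_value, dq_value) for each linked pair whose values differ."""
    main_by_id = {row[key]: row for row in main_rows}
    result = []
    for dq in dq_rows:
        main = main_by_id.get(dq[key])
        if main is None:  # no linked main submission
            continue
        if main.get(variable) != dq.get(variable):
            result.append((dq[key], main.get(variable), dq.get(variable)))
    return result


main = [{"hhid": 1, "hh_size": 5}, {"hhid": 2, "hh_size": 3}]
backcheck = [{"hhid": 1, "hh_size": 5}, {"hhid": 2, "hh_size": 4}, {"hhid": 3, "hh_size": 2}]
print(find_mismatches(main, backcheck, key="hhid", variable="hh_size"))  # [(2, 3, 4)]
```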
Protocol Violation
Input | Description |
---|---|
Data quality form | The data quality form containing the protocol question |
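A minimal sketch of flagging protocol violations, assuming the protocol question records a specific value that marks a violation (here a hypothetical 0 for "consent script not read") and that each data quality submission has an identifier. The question name, violation value and key are hypothetical.

```python
from __future__ import annotations

from typing import Any


def protocol_violations(dq_rows: list[dict[str, Any]], question: str,
                        violation_value: Any, key: str) -> list[Any]:
    """IDs of data quality submissions where the protocol question records a violation."""
    return [row[key] for row in dq_rows if row.get(question) == violation_value]


spotchecks = [
    {"submission_id": "a1", "consent_read": 1},
    {"submission_id": "a2", "consent_read": 0},     # violation
    {"submission_id": "a3", "consent_read": None},  # not observed, so not flagged
]
print(protocol_violations(spotchecks, "consent_read", violation_value=0, key="submission_id"))  # ['a2']
```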
Spotcheck score
Input | Description |
---|---|
Data quality form | The data quality form containing the spotcheck score question |
Score Name | (Optional) Scores from multiple questions can be combined and aggregated against this score name. If not provided, the question name is taken as the score name by default. |
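One way to picture score names is the sketch below: each configured question maps to an optional score name, questions sharing a name are pooled, and questions without a name fall back to their own question name. Pooling all item values before averaging is an assumption (SurveyStream might, for example, sum per submission first), and the question and score names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import fmean
from typing import Any, Optional


def average_scores(dq_rows: list[dict[str, Any]],
                   score_names: dict[str, Optional[str]]) -> dict[str, float]:
    """Average spotcheck scores per score name; None falls back to the question name."""
    pooled: dict[str, list[float]] = defaultdict(list)
    for row in dq_rows:
        for question, score_name in score_names.items():
            value = row.get(question)
            if value is not None:  # skip unanswered score questions
                pooled[score_name or question].append(value)
    return {name: fmean(values) for name, values in pooled.items()}


config = {"intro_score": "Introduction", "consent_score": "Introduction", "probing_score": None}
spotchecks = [
    {"intro_score": 4, "consent_score": 2, "probing_score": 3},
    {"intro_score": 5, "consent_score": 3, "probing_score": None},
]
print(average_scores(spotchecks, config))  # {'Introduction': 3.5, 'probing_score': 3.0}
```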
GPS
Input | Description |
---|---|
Type | The type of check: Point to Shape or Point to Point. A Point to Shape check verifies whether the GPS location of the surveyed household is within the expected grid cell or shape boundary. A Point to Point check verifies, using GPS coordinates, whether the surveyed household is the correct sampled household. |
Grid ID Variable | SurveyCTO question for the grid ID. This is a mandatory input for Point to Shape check type. |
Expected GPS Variable | SurveyCTO question for the expected GPS coordinates of the household surveyed based on a listing/sampling exercise. The GPS coordinates are expected to be in the format: “latitude longitude”. This is a mandatory input for Point to Point check type. |
Threshold distance (m) | The value of ‘X’ for checking whether the GPS location of the surveyed household is within ‘X’ meters of a grid cell boundary or of the sampled household’s GPS coordinates |
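The sketch below illustrates both check types under simplifying assumptions. For Point to Point, it parses the "latitude longitude" strings (ignoring any extra tokens such as altitude or accuracy, which is an assumption) and compares the haversine distance, using a mean Earth radius of 6,371 km, with the threshold. For Point to Shape, it uses ray casting on a polygon given as (latitude, longitude) vertices and treats coordinates as planar. It does not apply the threshold margin around the boundary, and reading the grid shape files would require a GIS library, which is out of scope here. All function names are hypothetical.

```python
from __future__ import annotations

import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def parse_gps(text: str) -> tuple[float, float]:
    """Parse 'latitude longitude'; any extra tokens (e.g. altitude, accuracy) are ignored."""
    lat, lon = text.split()[:2]
    return float(lat), float(lon)


def haversine_m(p1: tuple[float, float], p2: tuple[float, float]) -> float:
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def point_to_point_ok(actual: str, expected: str, threshold_m: float) -> bool:
    """True if the surveyed location is within threshold_m meters of the expected one."""
    return haversine_m(parse_gps(actual), parse_gps(expected)) <= threshold_m


def point_in_polygon(lat: float, lon: float, polygon: list[tuple[float, float]]) -> bool:
    """Ray casting: count edge crossings of a ray from the point toward increasing longitude."""
    inside = False
    for i in range(len(polygon)):
        y1, x1 = polygon[i]
        y2, x2 = polygon[(i + 1) % len(polygon)]
        if (y1 > lat) != (y2 > lat):  # edge spans the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


print(point_to_point_ok("12.9716 77.5946", "12.9717 77.5946", threshold_m=20))  # True (about 11 m apart)
print(point_to_point_ok("12.9716 77.5946", "12.9717 77.5946", threshold_m=10))  # False
grid_cell = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
print(point_in_polygon(0.5, 0.5, grid_cell), point_in_polygon(1.5, 0.5, grid_cell))  # True False
```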
For Point to Shape checks, the team must also share the shape files for the grids with the SurveyStream team. The file names of these shape files must follow the format <grid id>.gpkg.
If variables used in a check are removed from the SurveyCTO form, the check is marked inactive and Survey Admins will receive a warning notification regarding this change. When a check is inactive, it is not run and its flags are removed. You can edit such inactive checks to replace the removed variables and then mark them as active again.