Quality Statement
Educational institution address is the physical location of the individual’s place of study. Distinguishing details can include the educational institution name; campus and/or suburb; and city, town, or district. Educational institutions include early childhood education, primary school, secondary school, and tertiary education institutions. For individuals who study at home, educational institution address can be a home address.
Two new variables have been derived from the 2023 Census educational institution address data: Māori/English medium education indicator and Education institution type. They are only available in the Integrated Data Infrastructure and through customised data requests.
Moderate quality
Data quality processes section below has more detail on the rating.
Priority level 2
A priority level is assigned to all census concepts: priority 1, 2, or 3 (with 1 being the highest and 3 the lowest priority).
Educational institution address is a priority 2 concept. Priority 2 concepts cover key subject populations that are important for policy development, evaluation, or monitoring. These concepts are given second priority in terms of quality, time, and resources across all phases of a census.
The census priority level for educational institution address is the same as in 2018.
2023 Census: Final content report has more information on priority ratings for census concepts.
Census usually resident population count participating in study
Census usually resident population count participating in study includes individuals studying either part-time or full-time, at any educational institution, from early childhood education (childcare) to tertiary education.
‘Subject population’ means the people, families, households, or dwellings that the variable applies to.
The educational institution address classification consists of a combination of geographic classifications that are ordinarily stored independently of each other. There is a hierarchical relationship between the New Zealand geographic classifications of meshblock, statistical area 2, statistical area 3, territorial authority and regional council. They are combined for the purpose of providing as much detail as possible for educational institution address. Ideally, educational institution address would be defined at the meshblock level. Where this is not possible, the next most detailed geography available is used.
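The fallback rule (meshblock if possible, otherwise the next most detailed geography) can be sketched as a search down an ordered list of levels. This is a minimal Python illustration under stated assumptions, not Stats NZ's coding system: it assumes each response has already been matched to zero or more geographic levels, and the level names, the `most_detailed_geography` function, and the example codes are all hypothetical.

```python
from typing import Mapping, Optional, Tuple

# Geographic levels from most to least detailed (hypothetical names).
LEVELS = (
    "meshblock",
    "statistical_area_2",
    "statistical_area_3",
    "territorial_authority",
    "regional_council",
)


def most_detailed_geography(coded: Mapping[str, str]) -> Optional[Tuple[str, str]]:
    """Return (level, code) for the most detailed level that has a code.

    `coded` maps a level name to the code matched for one response; levels
    that could not be determined are absent or map to an empty string.
    Returns None when no level could be coded.
    """
    for level in LEVELS:
        code = coded.get(level)
        if code:
            return level, code
    return None


# A response that could only be placed at statistical area 3 and above
# (made-up codes, for illustration only).
print(most_detailed_geography({"statistical_area_3": "12345",
                               "territorial_authority": "076"}))
```

Ordering the levels in a single tuple keeps the hierarchy explicit, so the fallback order can be checked at a glance.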
Workplace or Educational Institution Address V2.0.0
Educational institution address is a flat classification. The standard codes are:
- Meshblock codes (7 digits)
- Statistical area 2 codes (6 digits) prefixed by ‘9’
- Statistical area 3 codes (5 digits) prefixed by ‘99’
- Territorial authority codes (3 digits) prefixed by ‘9999’
- Regional council codes (2 digits) prefixed by ‘99999’
- 8888888 Overseas
- 9999977 Response unidentifiable
- 9999988 Response outside scope
- 9999996 No fixed address
- 9999998 New Zealand not further defined
- 9999999 Not stated
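Because every level is padded to seven digits, turning a level's native code into the standard code is a prefix lookup. The Python sketch below follows the prefixes and special codes listed above; the function name and the example native codes are hypothetical. Decoding in the other direction is not shown. For example, 9999977 also begins with the regional council prefix 99999, so any decoder would need to check the special codes first.

```python
from typing import Dict

# Prefix added to each level's native code so every standard code is 7 digits.
PREFIXES: Dict[str, str] = {
    "meshblock": "",                  # 7-digit code, no prefix
    "statistical_area_2": "9",        # 6-digit code
    "statistical_area_3": "99",       # 5-digit code
    "territorial_authority": "9999",  # 3-digit code
    "regional_council": "99999",      # 2-digit code
}

# Special and residual codes, as listed in the classification.
SPECIAL_CODES: Dict[str, str] = {
    "8888888": "Overseas",
    "9999977": "Response unidentifiable",
    "9999988": "Response outside scope",
    "9999996": "No fixed address",
    "9999998": "New Zealand not further defined",
    "9999999": "Not stated",
}


def to_standard_code(level: str, native_code: str) -> str:
    """Pad a native geographic code to the 7-digit standard code for its level."""
    prefix = PREFIXES[level]
    width = 7 - len(prefix)
    if len(native_code) != width or not native_code.isdigit():
        raise ValueError(f"{level} codes should be {width} digits, got {native_code!r}")
    return prefix + native_code


print(to_standard_code("statistical_area_2", "123456"))  # 9123456
print(to_standard_code("territorial_authority", "076"))  # 9999076
print(SPECIAL_CODES["9999996"])                          # No fixed address
```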
Address information (institution name; campus and/or suburb; and city, town, or district) is used to place a person into the classification.
The 2023 classification for educational institution address is similar in structure to the 2018 classification; however, area codes have been updated to reflect the 2023 geographic pattern, including the removal of urban rural areas and the addition of statistical area 3 categories.
Geographic data and maps has more information on geographical boundaries.
Standards and classifications has information on what classifications are, how they are reviewed, where they are stored, and how to provide feedback on them.
Educational institution address data is collected from the individual form (question 20 on the paper form).
Individuals were only required to answer this question if they indicated in earlier questions that they were enrolled in study (part-time or full-time) at any level (early childhood to tertiary), and they did not study at home.
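The routing rule above reduces to a simple predicate. This Python sketch assumes the two earlier answers are available as yes/no values; the function and parameter names are hypothetical, not the census form's field names.

```python
def asks_educational_institution_address(enrolled_in_study: bool,
                                         studies_at_home: bool) -> bool:
    """Return True if the educational institution address question applies.

    It applies to people enrolled in study (part-time or full-time, early
    childhood to tertiary) who do not study at home.
    """
    return enrolled_in_study and not studies_at_home


for enrolled in (True, False):
    for at_home in (True, False):
        print(enrolled, at_home, asks_educational_institution_address(enrolled, at_home))
```

On the online form this check drives routing; on the paper form nothing enforces it, which is why out-of-scope paper responses can occur.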
There were differences in the way a person could respond between the modes of collection (online and paper forms).
On the online form:
- routing made the question available to answer only if respondents were in the subject population
- respondents could choose to use an as-you-type list of educational institution names and addresses, sourced from Ministry of Education data, or to type a free-text response. An instruction explained that when a name was selected from the list, the rest of the address fields would fill automatically. It also noted that if the suburb/campus or city was not correct, respondents should check that the correct name had been selected from the list, and that the fields could be changed manually.
On the paper form:
- anyone could see and respond to the question, even if they were not in the subject population or studied at home
- only free-text responses were possible.
Data from online forms may therefore be of higher overall quality than data from paper forms. However, processing checks and edits were in place to improve the quality of the paper form data.
Stats NZ Store House has samples for both the individual and dwelling paper forms.
Data use outside Stats NZ:
- by transport planners to plan and manage transport, particularly in large cities where congestion is a problem
- to assess daytime population in specific areas for town planning, traffic and transport management, and civil defence purposes
- for monitoring investment in certain travel modes, such as investments to support walking and cycling
- to focus targeted initiatives aimed at encouraging more children and tertiary students to use public transport
- to provide information for programmes aimed at improving physical health via use of active transport
- in developing fare structures to promote public transport use amongst tertiary students.
Data use by Stats NZ:
- in conjunction with main means of travel to education and usual residence address to measure traffic flows of the student population
- in conjunction with travel to work data to measure traffic flows more generally.
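As a rough illustration of combining these variables, the Python sketch below counts student flows between residence and education areas by travel mode, and the number of students whose place of study falls in each area (one input to daytime population estimates). The record layout, area names, and travel mode labels are hypothetical; this is not Stats NZ's flow methodology.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class Student(NamedTuple):
    residence_area: str  # area of the usual residence address
    education_area: str  # area of the educational institution address
    travel_mode: str     # main means of travel to education


def od_flows(students: Iterable[Student]) -> Counter:
    """Count students for each (residence area, education area, travel mode)."""
    return Counter((s.residence_area, s.education_area, s.travel_mode)
                   for s in students)


def students_studying_in(students: Iterable[Student]) -> Dict[str, int]:
    """Count students by the area of their educational institution."""
    return dict(Counter(s.education_area for s in students))


students = [
    Student("Area A", "Area B", "Public bus"),
    Student("Area A", "Area B", "Public bus"),
    Student("Area B", "Area B", "Walk or jog"),
]
print(od_flows(students)[("Area A", "Area B", "Public bus")])  # 2
print(students_studying_in(students))                          # {'Area B': 3}
```

The records are held in a list because both functions iterate over them; a one-shot iterator would be exhausted after the first call.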
Alternative data sources were used for missing and residual census responses and responses that could not be classified or did not provide the type of information asked for. The table below shows the distribution of data sources for educational institution address data.
Data sources for educational institution address data, as a percentage of the census usually resident population count participating in study, 2023 Census

Source of educational institution address data | Percent
---|---
2023 Census response | 82.2
Historical census | 0.0
Admin data | 0.0
Deterministic derivation | 17.8
Deterministic derivation from census response | 8.2
Deterministic derivation from admin data | 9.4
Deterministic derivation from CANCEIS(1) | 0.1
Statistical imputation | 0.0
No information | 0.0
Total | 100.0

1. CANCEIS = imputation based on CANadian Census Edit and Imputation System.
Note: Due to rounding, individual figures may not always sum to the stated total(s) or score contributions.
While the data sources themselves have not changed since 2018, how they are reported has changed. In the 2023 Census, educational institution address data is primarily sourced from 2023 Census responses.
Deterministic derivation is used in the following cases:
- Where main means of travel to education was imputed to ‘study at home’, usual residence address was used to derive educational institution address. The data source of the usual residence address determined the type of deterministic derivation: 2023 Census response (from either an individual form or the household listing), admin data, or CANCEIS imputation.
- Where main means of travel to education was not ‘study at home’, but educational institution address was missing or could not be coded, educational institution address was set to the territorial authority of the person’s usual residence address. Again, the data source of the usual residence address determined the type of deterministic derivation.
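The two derivation rules above can be sketched as a single function. This is a minimal Python illustration, assuming each person record already carries a coded usual residence address, its territorial authority, and that address's data source; the `Person` fields, the category label, and the example codes are hypothetical. Cases the rules do not describe (such as a reported, rather than imputed, ‘study at home’) return `None` here rather than guessing.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

STUDY_AT_HOME = "study at home"  # hypothetical category label


@dataclass
class Person:
    travel_mode: str                  # main means of travel to education
    travel_mode_imputed: bool         # True if travel_mode was imputed
    education_address: Optional[str]  # coded address; None if missing or uncodable
    residence_address: str            # coded usual residence address
    residence_ta: str                 # territorial authority of that address
    residence_source: str             # e.g. "2023 Census response", "admin data", "CANCEIS"


def derive_education_address(p: Person) -> Optional[Tuple[str, str]]:
    """Return (educational institution address, data source), or None for
    cases outside the two derivation rules sketched here."""
    derived = f"deterministic derivation from {p.residence_source}"
    if p.travel_mode == STUDY_AT_HOME:
        # Rule 1: an imputed 'study at home' takes the usual residence address.
        return (p.residence_address, derived) if p.travel_mode_imputed else None
    if p.education_address is not None:
        # A codable census response is used as-is.
        return p.education_address, "2023 Census response"
    # Rule 2: a missing or uncodable address falls back to the residence TA.
    return p.residence_ta, derived


print(derive_education_address(
    Person("public bus", False, None, "0100100", "9999076", "admin data")))
```

In both rules the label of the derived value comes from the residence address's own source, which mirrors how the table splits deterministic derivation into three rows.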
In the 2018 Census, these data sources were reported as census response, response from a partial form, or admin data.
Editing, data sources, and imputation in the 2023 Census describes how data quality is improved by editing, and how missing and residual responses are filled with alternative data sources (admin data and historical census responses) or statistical imputation. The paper also describes the use of CANCEIS (the CANadian Census Edit and Imputation System), which is used to perform imputation.
Missing and residual responses represent data gaps where respondents either did not provide answers (missing responses) or provided answers that did not fit predefined categories (residual responses).
There are no residual values in the educational institution address data. Responses that could not be classified or did not provide the type of information asked for were replaced by deterministic derivation at the territorial authority level. Where a territorial authority spans more than one regional council and so does not map to a single regional council, these responses are coded as ‘Response unidentifiable’ for the regional council classification.
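The regional council rule can be sketched as a lookup against a territorial authority to regional council concordance. In this Python sketch the concordance, its codes, and the function name are made up for illustration; only the 99999 prefix and the 9999977 ‘Response unidentifiable’ code come from the classification.

```python
from typing import Dict, Set

RESPONSE_UNIDENTIFIABLE = "9999977"


def regional_council_code(ta_code: str, ta_to_regions: Dict[str, Set[str]]) -> str:
    """Return the 7-digit regional council code for a territorial authority.

    A TA that lies in exactly one regional council maps to that council
    (2-digit code prefixed by '99999'); a TA that spans several regional
    councils is coded as 'Response unidentifiable'.
    """
    regions = ta_to_regions[ta_code]  # raises KeyError for an unknown TA
    if len(regions) == 1:
        (region,) = regions
        return "99999" + region
    return RESPONSE_UNIDENTIFIABLE


# Made-up concordance: TA '001' lies in one region, TA '002' spans two.
concordance = {"001": {"01"}, "002": {"01", "02"}}
print(regional_council_code("001", concordance))  # 9999901
print(regional_council_code("002", concordance))  # 9999977
```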
Percentage of regional council ‘Response unidentifiable’ for the census usually resident population count participating in study:
- 2023: 0.9 percent
- 2018: 0.2 percent
Overall quality rating: Moderate
Data has been evaluated to assess whether it meets quality standards and is suitable for use.
Three quality metrics contributed to the overall quality rating:
- data sources and coverage
- consistency and coherence
- accuracy of response.
The lowest rated metric determines the overall quality rating.
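Because the lowest-rated metric sets the overall rating, the overall rating is a minimum over an ordered scale. This Python sketch assumes the five-level scale used for the metric ratings in this statement (very poor to very high); the function name and dictionary keys are illustrative.

```python
from typing import Dict

# Rating scale ordered from lowest to highest quality.
SCALE = ("very poor", "poor", "moderate", "high", "very high")


def overall_rating(metric_ratings: Dict[str, str]) -> str:
    """Return the lowest rating among the metrics (ValueError if none given)."""
    return min(metric_ratings.values(), key=SCALE.index)


print(overall_rating({
    "data sources and coverage": "moderate",
    "consistency and coherence": "moderate",
    "accuracy of response": "moderate",
}))  # moderate
```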
Data quality assurance in the 2023 Census provides more information on the quality rating scale.
Data sources and coverage: Moderate quality
The quality of all the data sources that contribute to the output for the variable was assessed. The rating for a valid census response is defined as 1.00; ratings for other sources are the best available estimates of their quality relative to a census response.
To calculate the data sources and coverage quality score for a variable, each source's rating is multiplied by the proportion it contributes to the total output, and these contributions are summed. The total score then determines the metric rating according to the following range:
- 0.98–1.00 = very high
- 0.95–<0.98 = high
- 0.90–<0.95 = moderate
- 0.75–<0.90 = poor
- <0.75 = very poor.
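The score and the rating bands can be sketched in a few lines of Python. This is an illustrative sketch, not Stats NZ's production code: the function names are hypothetical, percentages are converted to proportions, and the example inputs are the ratings and percentages from the rating calculation table for this variable.

```python
from typing import Iterable, Tuple


def coverage_score(sources: Iterable[Tuple[float, float]]) -> float:
    """Sum of rating x proportion, with each proportion given as a percentage."""
    return sum(rating * percent / 100 for rating, percent in sources)


def coverage_rating(score: float) -> str:
    """Map a score to the data sources and coverage rating bands."""
    if score >= 0.98:
        return "very high"
    if score >= 0.95:
        return "high"
    if score >= 0.90:
        return "moderate"
    if score >= 0.75:
        return "poor"
    return "very poor"


# (rating, percent) for each source of educational institution address data.
sources = [(1.00, 82.25), (0.56, 8.17), (0.54, 9.44), (0.50, 0.14), (0.00, 0.00)]
score = coverage_score(sources)
print(round(score, 2), coverage_rating(score))  # 0.92 moderate
```

Each band is checked from the top down, so a lower bound such as 0.95 falls into the higher band, matching the "0.95–<0.98" style ranges.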
Most of the data comes from census responses. The only alternative data source, deterministic derivation, has lower ratings, resulting in a score of 0.92 and a metric rating of moderate.
Data sources and coverage rating calculation for educational institution address data, census usually resident population count participating in study, 2023 Census

Source of educational institution address data | Rating | Percent | Score contribution
---|---|---|---
2023 Census response | 1.00 | 82.25 | 0.82
Deterministic derivation from census response | 0.56 | 8.17 | 0.05
Deterministic derivation from admin data | 0.54 | 9.44 | 0.05
Deterministic derivation from CANCEIS(1) | 0.50 | 0.14 | <0.01
No information | 0.00 | 0.00 | 0.00
Total | | 100.00 | 0.92

1. CANCEIS = imputation based on CANadian Census Edit and Imputation System.
Note: Due to rounding, individual figures may not always sum to stated total(s) or score contributions.
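The total score in the rating calculation table can be checked by hand. Writing $r_i$ for the rating of source $i$ and $p_i$ for its share of the output (the percentage divided by 100), the score is the weighted sum:

```latex
\begin{aligned}
S &= \sum_i r_i\,p_i \\
  &= 1.00(0.8225) + 0.56(0.0817) + 0.54(0.0944) + 0.50(0.0014) + 0.00(0.0000) \\
  &\approx 0.8225 + 0.0458 + 0.0510 + 0.0007 \\
  &\approx 0.920
\end{aligned}
```

This falls in the 0.90–<0.95 band, which gives the metric rating of moderate.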
Introducing deterministic derivation as a data source makes it clearer how the data is sourced, and the ratings now better reflect the accuracy of the data.
Consistency and coherence: Moderate quality
Educational institution address data is mostly consistent with expectations across consistency checks. There is an overall difference in the data compared with expectations and benchmarks, which can be explained through a combination of real-world change, incorporation of other sources of data, or a change in how the variable has been collected.
This variation is due to substantial changes in the coding methodology, especially at lower geography levels. These changes include:
- improvements to the coding logic, data processing, and specificity of coding, and an increase in the proportion of responses coded, which together have led to higher-quality data
- coding all remaining residual values to the territorial authority of the individual's usual residence address, which has had a notable impact on the data (in 2018, non-response was coded to the regional council of the usual residence address).
These changes appear to have improved the quality of the data overall, with more census responses coded and a much greater specificity of coding (for example, more coded to the meshblock level). However, such a substantial change from 2018 makes the data somewhat inconsistent with the time series, particularly for alternatively sourced values.
Accuracy of responses: Moderate quality
Educational institution address data has various data quality issues involving several categories or aspects of the data, or an entire level of a hierarchical classification. These could include problems with the classification or coding of data, such as vague responses that are difficult to code, or responses that cannot be coded to a specific (non-residual) category, which reduces the amount of useful, meaningful data available for analysis.
The data quality of educational institution address data has improved through coding to lower levels of geography, higher quality scanning of paper forms, and more scanning repair, but there are still unavoidable quality issues due to respondents misunderstanding or misinterpreting the question and providing vague answers.
Users should note the following improvements since 2018:
- scanning repair improvements
- expanded use of manual intervention
- the pre-emptive inclusion of place look-ups to code responses that are not addresses
- extensive revision of the coding rules both prior to the data being received and in response to identified improvements after analysing the data.
Educational institution address data can be used in a manner comparable to the 2018 Census data.
When using this data, users should be aware that:
- the handling of non-response and the way data sources are captured has changed substantially since the 2018 Census
- people who study at home should be excluded from the data if the data is being used to capture travel patterns
- due to small counts for some categories, analysis should be performed with care, especially at smaller geographies.
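The first and last cautions can be applied mechanically before analysis. This Python sketch assumes records arrive as dictionaries; the field name, the ‘Study at home’ label, and the default threshold of 20 are hypothetical choices for illustration, not Stats NZ confidentiality or quality rules.

```python
from typing import Iterable, List, Mapping

# Hypothetical field name and category label; substitute those in your extract.
TRAVEL_FIELD = "main_means_of_travel_to_education"
STUDY_AT_HOME = "Study at home"


def exclude_study_at_home(records: Iterable[Mapping[str, str]]) -> List[Mapping[str, str]]:
    """Keep only people who travel to a place of study."""
    return [r for r in records if r.get(TRAVEL_FIELD) != STUDY_AT_HOME]


def small_counts(counts: Mapping[str, int], threshold: int = 20) -> List[str]:
    """List areas whose count falls below an analyst-chosen threshold."""
    return sorted(area for area, n in counts.items() if n < threshold)


records = [{TRAVEL_FIELD: "Study at home"}, {TRAVEL_FIELD: "Walk or jog"}]
print(len(exclude_study_at_home(records)))               # 1
print(small_counts({"Area A": 5, "Area B": 50, "Area C": 19}))  # ['Area A', 'Area C']
```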
Users should also be aware of several complexities at different geographic levels:
- At lower geographic levels, there will be variability in the percentage of missing data for a given area. This means some small geographic areas will have poorer quality data than the overall quality rating and some patterns or trends may not always fully reflect real-world change.
- In 2018, the address associated with some educational institutions was incorrect and these addresses have been corrected in coding for the 2023 Census. This may cause shifts at lower geographic levels when comparing counts between census years.
To assess how this concept aligns with the variables from the previous census, use the links below:
- Educational institution address – 2018 Information by variable
- Educational institution address was not published in 2013.
Contact our Information centre for further information about using this concept.