Study participation (information about this variable and its quality)


Participation in education or training covers people who are attending, studying, or enrolled in tertiary institutions, school, early childhood education, or any other place of education or training. It is grouped into full-time study (20 hours or more a week), part-time study (fewer than 20 hours a week), and not studying. In 2013 the subject population for this variable was the census usually resident population aged 15 years and over. In 2018 it is the census usually resident population.



Variable Details

Other Variable Information

The quality rating for the study participation variable has changed from high to moderate.

The consistency and coherence rating for study participation has been lowered from high to moderate quality, which lowers the overall quality rating for the variable from high to moderate. The Data quality processes section (consistency and coherence subsection) has more information.

Priority level

Priority level 2

We assign a priority level to all census variables: priority 1, 2, or 3 (1 is the highest priority and 3 the lowest).

Study participation is a priority 2 variable. Priority 2 variables cover key subject populations that are important for policy development, evaluation, or monitoring. These variables are given second priority in terms of quality, time, and resources across all phases of a census.

The census priority level for study participation is the same as in 2013.

Quality Management Strategy and the Information by variable for study participation (2013) have more information on the priority rating.

Overall quality rating for 2018 Census

Moderate quality

The Data quality processes section below has more detail on the rating for this variable.

The External Data Quality Panel has provided an independent assessment of the quality of this variable and has rated it moderate/poor quality. 2018 Census External Data Quality Panel: Assessment of Variables has more information.

Subject population

Census usually resident population

‘Subject population’ means the people, families, households, or dwellings to whom the variable applies.

Study participation is also output for the census usually resident population aged 15 years and over.

In the 2018 Census, the subject population was extended to include children in the usually resident population aged 0 to 14 years. In the 2013 Census, the subject population for study participation was the usually resident population aged 15 years and over. The subject population was changed so that the study participation question could serve as a filter for the travel to education question.

How this data is classified

Census Study Classification 2V3.0.0

Study participation has a flat classification with the following categories:

01 Full-time study

02 Part-time study

04 Not studying

99 Not elsewhere included

In 2013, a category 3 ‘full-time and part-time study’ was used for respondents who ticked both 1 ‘full-time study’ and 2 ‘part-time study’. This category was not used in the 2018 Census. In 2018, responses with both categories 1 and 2 ticked (possible on paper forms only) were coded to ‘full-time study’, because the combined study meets the full-time threshold of 20 hours or more a week.

The ‘not elsewhere included’ category contains residual categories such as ‘response unidentifiable’ and ‘not stated’. This is a change from 2013 and 2006, when the classification had no ‘not elsewhere included’ category and respondents who ticked more than one study participation option were coded to ‘full-time and part-time study’.

The Standards and Classifications page provides background information on classifications and standards.

Question format

Study participation is collected on the individual form (question 18 on the paper form).

Stats NZ Store House has samples for both the individual and dwelling paper forms.

The wording and format of this question were the same in the online and paper versions. However, the two modes of collection (paper and online form) handled responses differently.

On the online individual form:

  • the online form did not allow inconsistent responses, that is, ‘no – neither of these’ selected together with full-time or part-time study. If the ‘neither’ box was marked, any other study participation responses were cleared.
  • built-in routing on the online form directed individuals in the subject population to the appropriate questions.

On the paper individual form:

  • inconsistent multiple responses to this question were possible. If both full-time and part-time study were marked, the response was coded to full-time study. If ‘no – neither of these’ was marked together with full-time or part-time study, the response was coded to ‘response unidentifiable’.

Data from the online forms may therefore be of higher overall quality than data from paper forms.
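The paper-form coding rules described above can be sketched as a small function. This is a minimal illustration under stated assumptions, not Stats NZ's processing code: the function name, the boolean tick-box inputs, and the treatment of a form with no box ticked as ‘not stated’ are hypothetical choices made for the example. Category codes follow the 2018 classification (01, 02, 04, 99).

```python
from enum import Enum
from typing import Optional, Tuple


class StudyParticipation(Enum):
    """Flat 2018 classification codes for study participation."""
    FULL_TIME = "01"
    PART_TIME = "02"
    NOT_STUDYING = "04"
    NOT_ELSEWHERE_INCLUDED = "99"


def code_paper_response(
    full_time: bool, part_time: bool, neither: bool
) -> Tuple[StudyParticipation, Optional[str]]:
    """Hypothetical helper: code one paper-form answer.

    Inputs say whether each tick box was marked. Returns the category and,
    for residual responses, a residual label (otherwise None).
    """
    if neither and (full_time or part_time):
        # 'No - neither of these' alongside a study box is inconsistent.
        return StudyParticipation.NOT_ELSEWHERE_INCLUDED, "response unidentifiable"
    if full_time:
        # Full-time alone, or full-time plus part-time, is coded as full-time.
        return StudyParticipation.FULL_TIME, None
    if part_time:
        return StudyParticipation.PART_TIME, None
    if neither:
        return StudyParticipation.NOT_STUDYING, None
    # Assumption: no box ticked is treated as 'not stated'.
    return StudyParticipation.NOT_ELSEWHERE_INCLUDED, "not stated"


# Both study boxes ticked on a paper form.
print(code_paper_response(full_time=True, part_time=True, neither=False))
```

In the published 2018 data, residual responses like these were subsequently replaced using administrative data or imputation, so this sketch covers only the initial coding step.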

How this data is used

Outside Stats NZ

  • Used by central government agencies to monitor change for those participating in study, and for policies targeting at-risk groups.
  • Cross-tabulated with a wide variety of other census variables, such as income, age, and ethnic group, to understand the demographic, social, and economic differences between people who are studying and those who are not.
  • Used by local government to deliver targeted programmes.
  • Informs analysis of work and labour force status and rates of young people who are not in employment, education or training (NEET).

Within Stats NZ

  • Used in population statistics to produce student population estimates, because the migration behaviour of students is highly distinctive.
  • Analysis of the relationship between studying and the labour market.
  • Used to derive NEET rates.

2018 data sources

We used alternative data sources for missing census responses and responses that could not be classified or did not provide the type of information asked for. Where possible, we used responses from the 2013 Census, administrative data from the Integrated Data Infrastructure (IDI), or imputation.

The table below shows the breakdown of the various data sources used for this variable.

2018 Study participation – census usually resident population

Source                       Percent
Response from 2018 Census       83.0
2013 Census data                 0.0
Administrative data              9.5
Statistical imputation           7.4
No information                   0.0
Total                          100.0

Due to rounding, individual figures may not always sum to the stated total(s).

The ‘no information’ percentage is the share of the subject population for whom we could not source study participation data.

Administrative data sources

Data from the following administrative source was used:

  • information on Courses, Enrolments, TEC IT learners, Targeted Training - Ministry of Education.

Addition of administrative records to the New Zealand 2018 Census Dataset: An overview of statistical methods has more information on the timeliness of administrative data.

When examining study participation data for specific population groups within the subject population, note that the percentages sourced from 2013 Census data, administrative data, and statistical imputation may differ from those for the overall subject population.

Missing and residual responses

‘No information’ in the data sources table is the percentage of the subject population coded to ‘not stated’. In previous censuses, non-response was the percentage of the subject population coded to ‘not stated’.

In 2018, the percentage of ‘not stated’ is zero due to the use of the additional data sources described above.

Percentage of ‘not stated’ for the census usually resident population (2018), and census usually resident population aged 15 years and over (2013, 2006):

  • 2018: 0.0 percent
  • 2013: 10.4 percent
  • 2006: 10.1 percent.

There was no ‘not elsewhere included’ category for this variable in 2013 and 2006. In 2018, admin data or imputation was used to replace all residual responses, resulting in a ‘not elsewhere included’ percentage of zero.

2013 Census data user guide provides more information about non-response in the 2013 Census.

Data quality processes

Overall quality rating: Moderate quality

Data was evaluated to assess whether it meets quality standards and is suitable for use.

Three quality metrics contributed to the overall quality rating:

  • data sources and coverage
  • consistency and coherence
  • data quality.

The lowest rated metric determines the overall quality rating.
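The rule that the lowest-rated metric sets the overall rating amounts to taking a minimum over an ordered scale. The sketch below assumes that the five-level scale listed for the data sources and coverage metric (very poor to very high) applies to all three metrics; the metric ratings used are the 2018 values reported for this variable in the subsections that follow. The function name is hypothetical.

```python
from typing import Dict

# Assumed ordering of the rating scale, lowest first.
RATING_ORDER = ["very poor", "poor", "moderate", "high", "very high"]


def overall_rating(metric_ratings: Dict[str, str]) -> str:
    """Return the lowest metric rating, which sets the overall rating."""
    return min(metric_ratings.values(), key=RATING_ORDER.index)


study_participation_2018 = {
    "data sources and coverage": "high",
    "consistency and coherence": "moderate",
    "data quality": "high",
}
print(overall_rating(study_participation_2018))  # moderate
```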

Data quality assurance for 2018 Census provides more information on the quality rating scale.

Data sources and coverage: High quality

We have assessed the quality of all the data sources that contribute to the output for the variable. To calculate a data sources and coverage quality score for a variable, each data source is given a rating, and that rating is multiplied by the proportion the source contributes to the total output.

The rating for a valid census response is defined as 1.00. Ratings for other sources are the best estimates available of their quality relative to a census response. The weighted ratings are then summed. The total score, expressed on a 0–100 scale (so a score of 0.96 corresponds to 96), determines the metric rating according to the following ranges:

  • 98–100 = very high
  • 95–<98 = high
  • 90–<95 = moderate
  • 75–<90 = poor
  • <75 = very poor.

Admin data was mostly comparable to 2018 Census responses, while data sourced through statistical imputation was moderately comparable. Weighting each source's rating by its share of the output gives a total score of 0.96, which corresponds to a high quality rating.

Note: The administrative data used to complete the dataset held study participation records as at 31 December 2017, so it is older than data from census day (6 March 2018) would be. It also counts fewer enrolments than an extract taken during term time would.

Quality rating calculation table for the sources of study participation data – 2018 census usually resident population

Source                                      Rating    Percent of total    Score contribution
2018 Census form                              1.00               83.04                  0.83
Admin data                                    0.89                9.55                  0.08
Donor’s 2018 Census form                      0.60                6.92                  0.04
Donor’s response sourced from admin data      0.53                0.49                  0.00
No information                                0.00                0.00                  0.00
Total                                                           100.00                  0.96

Due to rounding, individual figures may not always sum to the stated total(s) or score contributions.
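The score in the table can be reproduced as a weighted sum: multiply each source's rating by its share of the output, add the results, and compare the total (on a 0–100 scale) with the rating bands. The sketch below uses the rounded percentages from the table, so it approximates rather than exactly reproduces Stats NZ's unrounded calculation; the function and constant names are hypothetical.

```python
from typing import List, Tuple

# (source, rating relative to a census response, percent of total output),
# taken from the quality rating calculation table above.
SOURCES: List[Tuple[str, float, float]] = [
    ("2018 Census form", 1.00, 83.04),
    ("Admin data", 0.89, 9.55),
    ("Donor's 2018 Census form", 0.60, 6.92),
    ("Donor's response sourced from admin data", 0.53, 0.49),
    ("No information", 0.00, 0.00),
]

# Lower bound of each band on the 0-100 scale, highest band first.
BANDS = [(98, "very high"), (95, "high"), (90, "moderate"), (75, "poor")]


def coverage_score(sources: List[Tuple[str, float, float]]) -> float:
    """Sum of rating x proportion of output over all sources (0 to 1)."""
    return sum(rating * percent / 100 for _, rating, percent in sources)


def coverage_rating(score: float) -> str:
    """Map a 0-1 score onto the published 0-100 rating bands."""
    points = score * 100
    for lower_bound, label in BANDS:
        if points >= lower_bound:
            return label
    return "very poor"


score = coverage_score(SOURCES)
print(f"{score:.2f} -> {coverage_rating(score)}")  # 0.96 -> high
```

The unrounded total is about 0.9595. Adding the rounded score contributions instead (0.83 + 0.08 + 0.04 + 0.00) gives 0.95, which is why the table notes that contributions may not sum exactly to the stated total.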

Data sources, editing, and imputation in the 2018 Census has more information on the Canadian census edit and imputation system (CANCEIS) that was used to derive donor responses.

Consistency and coherence: Moderate quality

Study participation data is mostly consistent with expectations across consistency checks. There is an overall difference from expectations and benchmarks, which can be explained by a combination of real-world change, the incorporation of other data sources, or a change in how the variable was collected.

Quality issues to note:

  • the subject population has changed since 2013 from the usually resident population aged 15 years and over to the usually resident population, which includes those aged under 15 years. Study participation data for under 15-year-olds shows an undercount compared with Ministry of Education data, especially for early childhood (0–4 years). The large difference in counts for the 0–4 age group is likely due to respondents regarding early childhood education as childcare rather than education. However, consistency checks of study participation for the usually resident population aged 15 years and over show trends consistent with the 2006 and 2013 Censuses, so this part of the subject population is of high quality.
  • more respondents were recorded as not studying than in 2013. This is partly because the high volume of ‘not stated’ responses seen in previous censuses (around 10 percent in 2006 and 2013) was reduced to zero in 2018 through the use of admin data and imputation for study participation.

Data quality: High quality

The data quality checks for study participation included edits for consistency within the dataset and cross-tabulations to the regional council level of geography.

Study participation has only minor data quality issues. The quality of coding and responses within classification categories is high. Any impact of other data sources used is minor. Any issues with the variable appear in a low number of cases (typically in the low hundreds).

Overall, online collection improved data quality: online forms had fewer multiple responses, fewer scanning errors, and lower non-response than paper forms.

Recommendations for use and further information

We recommend that this data be used in a similar way to 2013 data, with some caution.

  • The subject population has changed from the census usually resident population aged 15 years and over to the census usually resident population.
  • Study participation data for under 15-year-olds should be used with caution because there is an undercount compared with admin data, especially for early childhood (0 to 4 years). This may be due to parents regarding early childhood education as childcare rather than education.
  • Census does not distinguish between formal and informal study. An individual may consider themselves as ‘studying’ without a formal enrolment. For example, adults may consider themselves as studying when attending an informal adult community education class.
  • Census respondents aged 18 and over who reported study participation may include those undergoing on-the-job industry training as well as those enrolled with industry training organisations.
  • Census respondents who were enrolled in an educational institution but not actively studying at the time of the census may have reported that they were not studying, whereas admin data would include them as enrolled.

Comparisons with other data sources

Surveys and other sources besides the census also collect study participation data. Data users are advised to familiarise themselves with the strengths and limitations of each source before use.

Key considerations when comparing study participation information from the 2018 Census with other sources include:

  • census is a key source of information on study participation for small areas and small populations. Many other sources do not provide detail at this level.
  • census aims to count every individual in the national population, while other sources that measure this variable, such as the Household Economic Survey (HES) and Household Labour Force Survey (HLFS), are based on a sample of the population.

Contact our Information Centre for further information about using this variable.

Revision Information

Currently viewing revision 14 by on 6/04/2020 3:11:24 a.m.

Revision 14 *
15/04/2020 12:26:28 a.m.
Revision 13
11/03/2020 11:39:05 p.m.
Revision 11
19/02/2020 2:59:11 a.m.
Revision 10
27/11/2019 8:24:13 p.m.
Revision 8
19/11/2019 2:19:53 a.m.
Revision 6
3/10/2019 2:16:37 a.m.
Revision 4
22/09/2019 9:53:26 p.m.
