Quality Statement

Label
Languages spoken - 2023 Census: Information by concept
Definition

Languages spoken provides information on which languages, and how many, a person can speak or use. This includes New Zealand Sign Language.
This concept also includes number of languages spoken and official language indicator.

Number of languages spoken

The number of languages an individual speaks.

Official language indicator

The combinations of official languages and English spoken for individuals.

Overall quality rating

High quality

The data quality processes section below has more detail on the rating.

Priority level

Priority level 2
A priority level is assigned to all census concepts: priority 1, 2, or 3 (with 1 being highest and 3 being the lowest priority).
Languages spoken is a priority 2 concept. Priority 2 concepts cover key subject populations that are important for policy development, evaluation, or monitoring. These concepts are given second priority in terms of quality, time, and resources across all phases of a census.
The census priority level for languages spoken has increased from priority level 3 in 2018 to acknowledge key statutory duties, use of the concept in policy and monitoring, and the importance of te reo Māori data.
The 2023 Census: Final content report has more information on priority ratings for census concepts.

Subject population

Census usually resident population count
‘Subject population’ means the people, families, households, or dwellings that the variable applies to.

How this data is classified

Languages spoken
Languages spoken is classified into the following categories:

Language - Standard Classification 2 V3.0.0 – Level 1 of 4

Code Category
0001 Germanic
0002 Romance
0003 Greek
0004 Balto-Slavic
0005 Albanian
0006 Armenian
0007 Indo-Aryan
0008 Celtic
0009 Iranian
0010 Turko-Altaic
0011 Uralic
0012 Dravidian
0013 Sino-Tibeto-Burman
0014 Austroasiatic
0015 Tai-Kadai
0016 Central-Eastern Malayo-Polynesian
0017 Western Malayo-Polynesian
0018 Afro-Asiatic
0019 Niger-Congo (Congo-Kordafanian)
0020 Pidgins and Creoles
0021 Language Isolates
0022 Miscellaneous Language Groupings
0023 Artificial Languages
0024 Sign Language
0066 None (eg, too young to talk)
9999 Not elsewhere included

Languages spoken uses a 4-level hierarchical classification with level 1 categories presented in the table above. The level 1 residual category “Not elsewhere included” contains the residual categories ‘Don’t know’, ‘Refused to answer’, ‘Response unidentifiable’, ‘Response outside scope’ and ‘Not stated’. Follow the link to examine the classification and find more detail.

The classification for ‘Languages spoken’ has changed since the 2018 Census. These changes are:

  • Moriori added
  • Miaow-Yao changed to Miao-Yao.

Respondents could provide multiple answers to the languages spoken question, up to a maximum of six responses. Because languages spoken is a multiple-response variable, the total number of responses in a table will usually be greater than the total number of people.
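The effect of multiple responses on table totals can be illustrated with a minimal sketch. The three records below are hypothetical and are not census data; the only value taken from the text is the maximum of six responses.

```python
from collections import Counter

MAX_RESPONSES = 6  # maximum number of responses per person, per the text above

# Hypothetical records: the set of languages each person reported.
people = [
    {"English"},
    {"English", "Māori"},
    {"English", "Samoan", "New Zealand Sign Language"},
]

# Every record must respect the six-response limit.
assert all(len(languages) <= MAX_RESPONSES for languages in people)

# One count per (person, language) pair, as in a multiple-response table.
response_counts = Counter(
    language for languages in people for language in languages
)

total_people = len(people)
total_responses = sum(response_counts.values())

print(f"People: {total_people}, responses: {total_responses}")
```

In this sketch three people give six responses in total, so percentages calculated against the number of people can sum to more than 100.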

Official language indicator
Official language indicator uses a flat, single-level classification, as presented in the table below. The census official language indicator classification groups combinations of ‘Māori’, ‘English’, ‘NZ Sign Language’, and ‘Other’. Follow the link to examine the classification and find more detail.

Census official language indicator recode V1.0.0 – Level 1 of 1

Code Category
00 No language
11 Māori only
12 English only
13 NZ Sign Language only
21 Māori and English only (not NZ Sign Language)
24 English and NZ Sign Language only (not Māori)
25 English and Other only (not Māori or NZ Sign Language)
31 Māori, English and NZ Sign Language (not Other)
32 Māori, English, and Other (not NZ Sign Language)
34 English, NZ Sign Language, and Other (not Māori)
41 Māori, English, NZ Sign Language, and Other
51 Other languages only (neither English, Māori, nor NZ Sign Language)
52 Other combination of Māori, English, NZ Sign Language, and Other
99 Not elsewhere included
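As an illustration of how the indicator groups language combinations, the sketch below maps a person's reported languages to a code from the table above. It is not the recode Stats NZ applies. The function name, the use of plain strings for language names, the treatment of an empty list as ‘No language’, and the use of `None` to stand for a residual response are all assumptions made for the example.

```python
OFFICIAL = {"Māori", "English", "NZ Sign Language"}

# Codes copied from the level 1 table, keyed by the set of groups spoken.
CODES = {
    frozenset(): "00",
    frozenset({"Māori"}): "11",
    frozenset({"English"}): "12",
    frozenset({"NZ Sign Language"}): "13",
    frozenset({"Māori", "English"}): "21",
    frozenset({"English", "NZ Sign Language"}): "24",
    frozenset({"English", "Other"}): "25",
    frozenset({"Māori", "English", "NZ Sign Language"}): "31",
    frozenset({"Māori", "English", "Other"}): "32",
    frozenset({"English", "NZ Sign Language", "Other"}): "34",
    frozenset({"Māori", "English", "NZ Sign Language", "Other"}): "41",
    frozenset({"Other"}): "51",
}


def official_language_indicator(languages):
    """Return an indicator code for an iterable of language names.

    Assumption: None stands for a residual response (code 99), and any
    combination not listed in CODES falls into code 52.
    """
    if languages is None:
        return "99"
    # Collapse every non-official language into the single group 'Other'.
    groups = frozenset(
        lang if lang in OFFICIAL else "Other" for lang in languages
    )
    return CODES.get(groups, "52")


print(official_language_indicator(["Māori", "English", "Samoan"]))
```

Under these assumptions, a combination such as Māori and Samoan without English has no code of its own in the table and so falls into 52, ‘Other combination of Māori, English, NZ Sign Language, and Other’.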

Number of languages spoken
Number of languages spoken uses a 2-level hierarchical classification with level 1 categories presented in the table below. Follow the link to examine the classification and find more detail.

Census number of languages spoken V2.0.0 – Level 1 of 2

Code Category
00 None
01 One language
02 Two languages
06 Six languages
99 Not elsewhere included

The 2023 Census classification for number of languages spoken is consistent with that used in 2018 Census.

The level 1 residual category “Not elsewhere included” contains the residual categories ‘Response unidentifiable’, ‘Response outside scope’ and ‘Not stated’.

Data collected on languages spoken is classified through the languages spoken standard classification, as well as a number of additional classifications.

Languages spoken

Number of languages spoken

Note that the ‘Census number of languages spoken – 2 or more recode V1.0.0’ classification counts people who reported speaking either one language or two or more languages (which may or may not include English or Māori).

Standards and classifications has more information on what classifications are, how they are reviewed, where they are stored, and how to provide feedback on them.

Question format

Languages spoken is collected on the individual form (paper form question 15).

There were the following changes to the question format:

  • the reminder that multiple responses could be selected was rephrased and moved to before the question
  • ‘New Zealand Sign Language’ was moved up from fourth response option to third because of its status as an official language of New Zealand.

There were differences in the way a person could respond between the modes of collection (online and paper forms).

On the online form:

  • as-you-type functionality helped respondents provide valid languages in the classification.

On the paper form:

  • responses outside the valid range were possible. Alternative data sources were used to replace these responses.

Data from the online forms may therefore be of higher overall quality than data from paper forms. However, processing checks and edits were in place to improve the quality of the paper forms. 

Stats NZ Store House has samples for both the individual and dwelling paper forms.

Examples of how this data is used

Data-use outside Stats NZ:

  • to formulate, target, and monitor policies and programmes to revitalise the Māori language as an official language of New Zealand, such as the Maihi Karauna (the Crown’s Strategy for Māori Language Revitalisation 2019–2023)
  • to provide information on use of New Zealand Sign Language, to help maintain and promote its use
  • to assess the need to provide multi-lingual pamphlets and other translation services in areas such as education, health, and welfare
  • to evaluate and monitor existing language education programmes and services
  • to provide information for television and radio programmes and services
  • to understand the diversity and diversification of the New Zealand population over time, as well as language maintenance, retention, and distribution.
Data sources

Alternative data sources were used for missing and residual census responses and responses that could not be classified or did not provide the type of information asked for. The table below shows the distribution of data sources for languages spoken data.

Data sources for languages spoken data, as a percentage of census usually resident population count, 2023 Census
Source of languages spoken data Percent
2023 Census response 85.3
Historical census 8.8
 2018 Census 6.3
 2013 Census 2.5
Admin data 0.0
Deterministic derivation 0.4
Statistical imputation 5.4
 Probabilistic imputation 2.5
 CANCEIS(1) donor’s response sourced from 2023 Census form 2.5
 CANCEIS donor’s response sourced from 2018 Census 0.2
 CANCEIS donor’s response sourced from 2013 Census 0.1
 CANCEIS donor’s response sourced from probabilistic imputation 0.1
 CANCEIS donor’s response sourced from deterministic derivation <0.1
No information 0.0
Total 100.0
1. CANCEIS = imputation based on CANadian Census Edit and Imputation System.
Note: Due to rounding, individual figures may not always sum to the stated total(s) or score contributions.

Where appropriate, responses were used from the 2018 and 2013 Censuses to replace missing or residual responses.

If it was not possible to obtain languages spoken data from historical census data, probabilistic imputation was used. Here, the languages spoken by the person closest in age in the same household were used to fill in the missing languages spoken information for the record.

Statistical imputation was used for remaining records with missing or residual responses.

Deterministic derivation was used for records of people aged zero that had responses other than ‘None (eg, too young to talk)’. A consistency edit changed the languages spoken in these records to ‘None (eg, too young to talk)’.
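The order in which sources are tried can be sketched roughly as follows. This is a simplified illustration, not the production method: the record layout (`age`, `census_2023`, `census_2018`, `census_2013`), the function names, and the preference for 2018 over 2013 responses are assumptions, and the closest-in-age rule stands in for a probabilistic imputation that is likely to involve more than this.

```python
NONE_CATEGORY = "None (eg, too young to talk)"


def closest_age_donor(person, household):
    """Pick the household member closest in age who has a 2023 response."""
    donors = [
        m for m in household
        if m is not person and m.get("census_2023")
    ]
    if not donors:
        return None
    return min(donors, key=lambda m: abs(m["age"] - person["age"]))


def resolve_languages(person, household):
    """Return (languages, source) for one record, trying sources in order."""
    response = person.get("census_2023")
    if response:
        if person["age"] == 0 and response != [NONE_CATEGORY]:
            # Consistency edit for records aged zero.
            return [NONE_CATEGORY], "Deterministic derivation"
        return response, "2023 Census response"
    # Assumed order: the more recent historical census is tried first.
    for key, source in (("census_2018", "2018 Census"),
                        ("census_2013", "2013 Census")):
        if person.get(key):
            return person[key], source
    donor = closest_age_donor(person, household)
    if donor is not None:
        return donor["census_2023"], "Probabilistic imputation"
    # Anything still unresolved would go to CANCEIS imputation.
    return None, "Left for CANCEIS imputation"


# Hypothetical household used only to exercise the sketch.
household = [
    {"age": 40, "census_2023": ["English", "Māori"]},
    {"age": 12, "census_2023": ["English"]},
    {"age": 38},
    {"age": 0, "census_2023": ["English"]},
    {"age": 70, "census_2018": ["Samoan"]},
]

for member in household:
    print(member["age"], resolve_languages(member, household))
```

In this hypothetical household, the 38-year-old with no response takes the languages of the 40-year-old, and the response for the child aged zero is changed to the ‘None’ category.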

Editing, data sources, and imputation in the 2023 Census describes how data quality is improved by editing, and how missing and residual responses are filled with alternative data sources (admin data and historical census responses) or statistical imputation. The paper also describes the use of CANCEIS (the CANadian Census Edit and Imputation System), which is used to perform imputation.

Missing and residual responses

Missing and residual responses represent data gaps where respondents either did not provide answers (missing responses) or provided answers that were not valid (residual responses).

Alternative data sources have been used to fill all missing and residual responses in the 2023 and 2018 Censuses.

Percentage of ‘Not stated’ for the census usually resident population count:

  • 2023: 0.0 percent
  • 2018: 0.0 percent
  • 2013: 6.3 percent

For output purposes, responses in the other residual categories are grouped with ‘Not stated’ and classified as ‘Not elsewhere included’.

Percentage of ‘Not elsewhere included’ for the census usually resident population count:

  • 2023: 0.0 percent
  • 2018: 0.0 percent
  • 2013: 6.5 percent.
Data quality processes

Overall quality rating: High
Data has been evaluated to assess whether it meets quality standards and is suitable for use.

Three quality metrics contributed to the overall quality rating:

  • data sources and coverage
  • consistency and coherence
  • accuracy of responses.

The lowest-rated metric determines the overall quality rating.
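The ‘lowest-rated metric’ rule can be expressed as taking the minimum over an ordered rating scale. The sketch below uses the three metric ratings reported in this statement; the scale ordering follows the rating bands used for the metrics, and the function name is an assumption.

```python
# Rating scale, ordered from worst to best.
SCALE = ["Very poor", "Poor", "Moderate", "High", "Very high"]

# Metric ratings reported for languages spoken in this statement.
metric_ratings = {
    "Data sources and coverage": "Very high",
    "Consistency and coherence": "High",
    "Accuracy of responses": "Very high",
}


def overall_rating(ratings):
    """Return the lowest rating among the metrics, by position in SCALE."""
    return min(ratings.values(), key=SCALE.index)


overall = overall_rating(metric_ratings)
print(overall)
```

With two metrics rated very high and one rated high, the overall rating is high.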

Data quality assurance in the 2023 Census provides more information on the quality rating scale.

Data sources and coverage: Very high quality

The quality of all the data sources that contribute to the output for the variable has been assessed. To calculate the data sources and coverage quality score for a variable, each data source was given a rating.

The rating for a valid census response is defined as 1.00. Ratings for other sources are the best estimates available of their quality relative to a census response. The rating of each source that contributes to the output for the variable is then multiplied by the proportion that source contributes to the total output, and the results are summed. The total score then determines the metric rating according to the following ranges:

  • 0.98–1.00 = very high
  • 0.95–<0.98 = high
  • 0.90–<0.95 = moderate
  • 0.75–<0.90 = poor
  • <0.75 = very poor.

The high proportion of data received from the 2023 Census forms, alongside the high quality of alternative data sources, resulted in a score of 0.98, leading to a quality rating of very high.

Data sources and coverage rating calculation for languages spoken data, census usually resident population count, 2023 Census
Source of languages spoken data Rating Percent Score contribution
2023 Census response 1.00 85.34 0.85
2018 Census 0.92 6.28 0.06
2013 Census 0.89 2.52 0.02
Deterministic derivation 1.00 0.45 <0.01
Probabilistic imputation 0.80 2.50 0.02
CANCEIS(1) nearest neighbour imputation 0.80 2.92 0.02
No information 0.00 0.00 0.00
Total 100.00 0.98
1. CANCEIS = imputation based on CANadian Census Edit and Imputation System.
Note: Due to rounding, individual figures may not always sum to stated total(s) or score contributions.
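The calculation in the table above can be reproduced with a short sketch. The ratings and percentages are copied from the table. The function name is an assumption, as is the choice to apply the rating bands to the unrounded score.

```python
# (source, rating, percent of output), copied from the table above.
sources = [
    ("2023 Census response", 1.00, 85.34),
    ("2018 Census", 0.92, 6.28),
    ("2013 Census", 0.89, 2.52),
    ("Deterministic derivation", 1.00, 0.45),
    ("Probabilistic imputation", 0.80, 2.50),
    ("CANCEIS nearest neighbour imputation", 0.80, 2.92),
    ("No information", 0.00, 0.00),
]

# Score contribution of each source = rating x proportion of total output.
contributions = {
    name: rating * percent / 100 for name, rating, percent in sources
}
score = sum(contributions.values())


def metric_rating(value):
    """Map a score to the rating bands listed in this section."""
    if value >= 0.98:
        return "very high"
    if value >= 0.95:
        return "high"
    if value >= 0.90:
        return "moderate"
    if value >= 0.75:
        return "poor"
    return "very poor"


print(f"Score: {score:.2f}, rating: {metric_rating(score)}")
```

The published percentages are rounded, so they sum to 100.01 rather than exactly 100. The total score still rounds to 0.98, in line with the table.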

Consistency and coherence: High quality

Languages spoken data is consistent with expectations across nearly all consistency checks, with some minor variation from expectations or benchmarks, which makes sense due to real-world change, incorporation of other sources of data, or a change in how the variable has been collected.

The Central-Eastern Malayo-Polynesian language group (which includes te reo Māori) has a relatively high proportion of data sourced from historical census and statistical imputation. This, coupled with the ability for type and number of languages spoken to change over time, contributed to the minor variation in data from expectations.

Accuracy of responses: Very high quality

Languages spoken data has no data quality issues that have an observable effect on the data. The quality of coding is very high. Any issues with the variable appear in a very low number of cases (typically fewer than a hundred).

Improvement in scanning repair for paper forms reduced the number of responses needing to be sourced from alternative sources. The consistency edit applied to inconsistent and contradictory responses further improved data quality.

Recommendations for use and further information

Languages spoken data can be used in a comparable manner to the 2018 and 2013 Censuses.

The language group with the highest imputation rate (CANCEIS and probabilistic) is the Central-Eastern Malayo-Polynesian language group. This category includes te reo Māori and most of the languages spoken in the Pacific. It is recommended users be aware of the proportion of alternatively sourced data when analysing languages spoken data at low levels of geography or lower levels of the classification.

Comparisons to other data sources

Although surveys and sources other than the census collect language data, data users are advised to familiarise themselves with the strengths and limitations of the sources before use.

Key considerations when comparing languages spoken information from the 2023 Census with other sources include the following:

  • Census is a key source of information on languages spoken for small areas and small populations. Many other sources do not provide detail at this level.
  • Census aims to be a national count of all individuals in a population, while other sources that measure this variable, such as Te Kupenga, are based only on a sample of the population.
Information by variables from previous censuses

To assess how this concept aligns with the variables from the previous census, use the links below:

Contact our Information centre for further information about using this concept.


Information

History

Revision Date Responsibility Rationale
66 26/09/2024 11:33:55 AM
64 26/09/2024 10:00:57 AM