Variable Description
Languages spoken provides information on which languages, and how many, a person can speak or use.
This includes New Zealand Sign Language.
Priority level
Priority level 3
We assign a priority level to all census variables: Priority 1, 2, or 3 (with 1 being highest and 3 being the lowest priority).
Languages spoken is a priority 3 variable. Priority 3 variables do not fit in directly with the main purpose of a census but are still important to certain groups. These variables are given third priority in terms of quality, time, and resources across all phases of a census.
The census priority level for languages spoken remains the same as 2013.
Quality Management Strategy and the Information by variable for languages spoken (2013) have more information on the priority rating.
Overall quality rating for 2018 Census
High quality
Data quality processes section below has more detail on the rating for this variable.
The External Data Quality Panel has provided an independent assessment of the quality of this variable and has rated it as very high to poor quality, depending on the language. 2018 Census External Data Quality Panel: Assessment of Variables and Final report of the 2018 Census External Data Quality Panel have more information.
Subject population
Census usually resident population
‘Subject population’ means the people, families, households, or dwellings to whom the variable applies.
How this data is classified
Language - Standard Classification 2V2.0.0
Languages spoken is a hierarchical classification with four levels. Level 1 has 26 categories. Level 2 contains 30 categories. Level 3 contains 49 categories. Level 4 contains 196 categories. The categories in the first level are:
0001 Germanic
0002 Romance
0003 Greek
0004 Balto-Slavic
0005 Albanian
0006 Armenian
0007 Indo-Aryan
0008 Celtic
0009 Iranian
0010 Turko-Altaic
0011 Uralic
0012 Dravidian
0013 Sino-Tibeto-Burman
0014 Austroasiatic
0015 Tai-Kadai
0016 Central-Eastern Malayo-Polynesian
0017 Western Malayo-Polynesian
0018 Afro-Asiatic
0019 Niger-Congo (Congo-Kordofanian)
0020 Pidgins and Creoles
0021 Language Isolates
0022 Miscellaneous Language Groupings
0023 Artificial Languages
0024 Sign Language
0066 None (eg too young to talk)
9999 Not elsewhere included
‘Not elsewhere included’ contains the residual categories such as ‘response unidentifiable’, ‘response outside scope’, and ‘not stated’.
Specific languages such as Italian, Japanese, English, and Te Reo Māori are at level 3 of the classification. Up to 6 languages can be selected across all levels in a valid response.
The classifications for the variables derived from languages spoken are:
Census official language indicator 2V2.0.0
000 No language
011 Māori only
012 English only
013 NZ Sign Language only
021 Māori and English only (not NZ Sign Language)
022 Māori and NZ Sign Language only (not English)
023 Māori and other only (not English or NZ Sign Language)
024 English and NZ Sign Language only (not Māori)
025 English and other only (not Māori or NZ Sign Language)
026 NZ Sign Language and other only (not English or Māori)
031 Māori, English and NZ Sign Language (not other)
032 Māori, English and other (not NZ Sign Language)
033 Māori, NZ Sign Language and other (not English)
034 English, NZ Sign Language and other (not Māori)
041 Māori, English, NZ Sign Language and other
051 Other languages only (neither Māori, English nor NZ Sign Language)
099 Not elsewhere included
‘Not elsewhere included’ contains the residual categories of ‘response unidentifiable’, ‘response outside scope’, and ‘languages not stated’.
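The indicator categories above cover every combination of Māori, English, NZ Sign Language, and 'other'. As a minimal sketch of how such a code could be derived from a person's set of languages, the example below maps the four yes/no flags to the listed codes. The codes come from the list above; the input format (a set of plain language names), the function name, and the treatment of an empty set as 'No language' and of `None` as a residual response are assumptions made for illustration, not the method Stats NZ uses.

```python
# Illustrative only: codes are copied from the classification listed above.
# The input format and function name are hypothetical.
OFFICIAL = ("Māori", "English", "NZ Sign Language")

# (Māori, English, NZ Sign Language, other) -> indicator code
INDICATOR_CODES = {
    (False, False, False, False): "000",
    (True, False, False, False): "011",
    (False, True, False, False): "012",
    (False, False, True, False): "013",
    (True, True, False, False): "021",
    (True, False, True, False): "022",
    (True, False, False, True): "023",
    (False, True, True, False): "024",
    (False, True, False, True): "025",
    (False, False, True, True): "026",
    (True, True, True, False): "031",
    (True, True, False, True): "032",
    (True, False, True, True): "033",
    (False, True, True, True): "034",
    (True, True, True, True): "041",
    (False, False, False, True): "051",
}


def official_language_indicator(languages):
    """Return the indicator code for a set of language names.

    Assumption: None stands for a residual response (coded 099) and an
    empty set stands for 'No language' (coded 000).
    """
    if languages is None:
        return "099"
    langs = set(languages)
    flags = tuple(name in langs for name in OFFICIAL)
    has_other = any(name not in OFFICIAL for name in langs)
    return INDICATOR_CODES[flags + (has_other,)]


print(official_language_indicator({"Māori", "English"}))
print(official_language_indicator({"English", "Samoan"}))
```

Using a lookup table keyed on the four flags keeps the mapping easy to check against the published category list, since each of the 16 combinations appears exactly once.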
Census number of languages spoken V2.0.0
00 None
01 One language
02 Two languages
03 Three languages
04 Four languages
05 Five languages
06 Six languages
99 Not elsewhere included
‘Not elsewhere included’ contains the residual categories of ‘response unidentifiable’, ‘response outside scope’, and ‘languages not stated’.
The classification of languages spoken in the 2018 Census is consistent with that of both the 2006 Census and the 2013 Census.
The Standards and Classifications page provides background information on classifications and standards.
Question format
Languages spoken is collected on the individual form (question 15 on the paper form).
Stats NZ Store House has samples for both the individual and dwelling paper forms.
There were no differences between the wording or question format in the online and paper versions of this question. However, there were differences in the way a person could respond.
On the online individual form:
- as-you-type functionality helped respondents provide valid languages in the classification.
On the paper individual forms:
- responses outside the valid range were possible. Alternative data sources were used to replace these responses.
How this data is used
Outside Stats NZ
- To formulate, target and monitor policies and programmes to revitalise the Māori language as an official language of New Zealand.
- As an indicator of iwi vitality and cultural resources.
- To assess the need to provide multi-lingual pamphlets and other translation services in a variety of areas such as education, health and welfare.
- To evaluate and monitor existing language education programmes and services.
- To provide information for television and radio programmes and services.
- To understand the diversity and diversification of the New Zealand population over time, as well as language maintenance, retention and distribution.
2018 data sources
We used alternative data sources for missing census responses and responses that could not be classified or did not provide the type of information asked for. Where possible, we used responses from the 2013 Census, administrative data from the Integrated Data Infrastructure (IDI), or imputation.
The table below shows the breakdown of the various data sources used for this variable.
2018 languages spoken – census usually resident population

Source | Percent |
---|---|
Response from 2018 Census | 83.8 percent |
2013 Census data | 8.2 percent |
Administrative data | 0.0 percent |
Statistical imputation | 8.0 percent |
No information | 0.0 percent |
Total | 100 percent |

Due to rounding, individual figures may not always sum to the stated total(s).
The ‘no information’ percentage is where we were not able to source languages spoken data for a person in the subject population.
Please note that when examining languages spoken data for specific population groups within the subject population, the percentage that is from 2013 Census data and statistical imputation may differ from that for the overall subject population.
Missing and residual responses
‘No information’ in the data sources table above is the percentage of the subject population coded to ‘not stated’. In previous censuses, non-response was the percentage of the subject population coded to ‘not stated.’
In 2018, the percentage of ‘not stated’ is zero due to the use of the additional data sources described above.
Percentage of ‘not stated’ for the census usually resident population:
- 2018: 0.0 percent
- 2013: 6.3 percent
- 2006: 4.9 percent.
In 2018, there were no residual responses remaining in the data due to the use of 2013 Census data and imputation to replace them. In output for previous censuses, responses that could not be classified or did not provide the type of information asked for were grouped with ‘not stated’ and classified as ‘not elsewhere included’.
Percentage of ‘not elsewhere included’ for the census usually resident population:
- 2018: 0.0 percent
- 2013: 6.5 percent
- 2006: 5.1 percent.
2013 Census data user guide provides more information about non-response in the 2013 Census.
Data quality processes
Overall quality rating: High quality
Data was evaluated to assess whether it meets quality standards and is suitable for use.
Three quality metrics contributed to the overall quality rating:
- data sources and coverage
- consistency and coherence
- data quality.
The lowest rated metric determines the overall quality rating.
Data quality assurance for 2018 Census provides more information on the quality rating scale.
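The 'lowest rated metric' rule can be illustrated with a short sketch. The rating scale names are the ones used on this page; the function and variable names are hypothetical, and the metric ratings are those reported for languages spoken.

```python
# Rating scale ordered from lowest to highest, using the names on this page.
SCALE = ["very poor", "poor", "moderate", "high", "very high"]


def overall_rating(metric_ratings):
    """Return the lowest rating among the metrics (hypothetical helper)."""
    return min(metric_ratings.values(), key=SCALE.index)


# Metric ratings reported for languages spoken in the 2018 Census.
languages_spoken = {
    "data sources and coverage": "high",
    "consistency and coherence": "high",
    "data quality": "high",
}

print(overall_rating(languages_spoken))  # high
```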
Data sources and coverage: High quality
We have assessed the quality of all the data sources that contribute to the output for the variable. To calculate a data sources and coverage quality score for a variable, each data source is rated and multiplied by the proportion it contributes to the total output.
The rating for a valid census response is defined as 1.00. Ratings for other sources are the best estimates available of their quality relative to a census response. Each source's rating is multiplied by the proportion it contributes to the total output, and the results are summed. The total score, read on a scale of 0 to 100 (so a score of 0.96 corresponds to 96), then determines the metric rating according to the following ranges:
- 98–100 = very high
- 95–<98 = high
- 90–<95 = moderate
- 75–<90 = poor
- <75 = very poor.
2013 Census data was highly comparable to 2018 Census responses while data from within household imputation was mostly comparable to census forms. Data imputed from donor responses was moderately comparable to census forms. The proportions of data from each source along with the ‘no information’ rate of zero percent contributed to the score of 0.96, determining the high quality rating.
Quality rating calculation table for the sources of languages spoken – 2018 census usually resident population

Source | Rating | Percent of total | Score contribution |
---|---|---|---|
2018 Census form | 1.00 | 83.83 | 0.84 |
2013 Census | 0.93 | 8.22 | 0.08 |
Imputation | | | |
Within household donor | 0.70 | 1.61 | 0.01 |
Donor’s 2018 Census form | 0.60 | 5.55 | 0.03 |
Donor’s response sourced from 2013 Census | 0.56 | 0.61 | 0.00 |
Donor’s response sourced from within household | 0.42 | 0.18 | 0.00 |
No information | 0.00 | 0.00 | 0.00 |
Total | | 100.00 | 0.96 |

Due to rounding, individual figures may not always sum to the stated total(s) or score contributions.
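The score in the table can be reproduced as a weighted sum of each source's rating and its share of the output. The sketch below uses the ratings and percentages from the table; the function names are hypothetical, and it assumes the 0.96 score is read as 96 when compared against the rating ranges listed earlier.

```python
# (source, rating, percent of total) taken from the table above.
SOURCES = [
    ("2018 Census form", 1.00, 83.83),
    ("2013 Census", 0.93, 8.22),
    ("Within household donor", 0.70, 1.61),
    ("Donor's 2018 Census form", 0.60, 5.55),
    ("Donor's response sourced from 2013 Census", 0.56, 0.61),
    ("Donor's response sourced from within household", 0.42, 0.18),
    ("No information", 0.00, 0.00),
]


def coverage_score(sources):
    """Sum of rating x percent of total, on a 0-100 scale."""
    return sum(rating * percent for _, rating, percent in sources)


def metric_rating(score):
    """Map a 0-100 score to the rating ranges listed on this page."""
    if score >= 98:
        return "very high"
    if score >= 95:
        return "high"
    if score >= 90:
        return "moderate"
    if score >= 75:
        return "poor"
    return "very poor"


score = coverage_score(SOURCES)
print(round(score / 100, 2))  # 0.96
print(metric_rating(score))   # high
```

Working from the unrounded percentages gives roughly 96.3 before rounding, which falls in the 95 to <98 range and matches the published high quality rating.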
Data sources, editing, and imputation in the 2018 Census has more information on the Canadian census edit and imputation system (CANCEIS) that was used to derive donor responses.
Consistency and coherence: High quality
At level 1 of the classification, this data is highly comparable with the 2013 and 2006 Census data.
Languages spoken data is consistent with expectations across nearly all consistency checks, with some minor variation from expectations or benchmarks that makes sense due to real-world change, incorporation of other sources of data, or a change in how the variable has been collected.
Level 3 of the classification was checked at a national level. There were some inconsistencies with time series at this level. At level 3 of the classification, as-you-type functionality enabled respondents to provide more detailed responses (for example by stating they spoke Fiji Hindi, rather than Hindi). This has resulted in some deviations from time series.
Data quality: High quality
The data quality checks for languages spoken included edits for consistency within the dataset and cross-tabulations to the national and regional council level of geography.
Languages spoken data has only minor data quality issues. The quality of coding and responses within classification categories is high. Any impact of other data sources used is minor. Any issues with the variable appear in a low number of cases (typically in the low hundreds).
Recommendations for use and further information
We recommend that this data be used in a similar way to the 2013 data.
When using this data you should be aware that:
- the language category with the highest imputation rate (CANCEIS and probabilistic) is the Central-Eastern Malayo-Polynesian language group. This category includes Te Reo Māori and most of the languages spoken in the Pacific.
- data has been assessed to be consistent at level 3 of the classification at the national level. Some variation is possible at geographies below this level.
Comparisons with other data sources
Although surveys and sources other than the census collect language data, data users are advised to familiarise themselves with the strengths and limitations of the sources before use.
Key considerations when comparing languages spoken information from the 2018 Census with other sources include:
- census is a key source of information on languages spoken for small areas and small populations. Many other sources do not provide detail at this level.
- census aims to be a national count of all individuals in a population, while other sources that measure this variable, such as Te Kupenga, are based only on a sample of the population.
Contact our Information Centre for further information about using this variable.