Series
Household income and housing cost statistics – HES (Income)
HES, HES (Income)
The HES collects information on household income, savings, and expenditure, as well as demographic information on individuals and households. It has three components: HES income, HES expenditure, and HES household net worth. The survey runs every year, from 1 July to 30 June of the following year.
HES income is the main vehicle for collecting household information and runs every year; it gathers household income, housing costs, and material wellbeing data – this is the ‘core HES’.
HES expenditure includes additional components – an expenditure diary and an expanded household expenditure questionnaire. It runs every three years.
HES household net worth includes additional questions on household assets and liabilities. It also runs every three years.
For information on Questionnaires, please see HES 2022-2023
The primary objective of the HES is to facilitate the analysis and monitoring of the social and economic welfare of New Zealanders. The main users are government and others involved in the development, implementation, and evaluation of social and economic policies. They use economic standard of living data to:
understand the distribution of economic resources among private households in New Zealand
identify households most at risk of experiencing economic hardship and poverty
understand the effects of taxation and income support systems on the wellbeing of people and households.
The social and economic welfare of New Zealanders is measured through income, expenditure, housing costs, net worth, and non-monetary measures. Specifically, the HES provides annual information on:
the distribution of household gross and disposable annual income
the economic resources available to households after accounting for housing costs
the distribution of non-monetary material wellbeing and hardship
child poverty.
Every three years, HES also provides information on:
average weekly household expenditure
itemised household expenditure (which supports weighting for the consumers price index (CPI))
the distribution and composition of net worth (assets minus liabilities) in New Zealand.
The core HES is designed to provide robust estimates of household income and housing costs at the national level and for 12 regional council areas, whereas the expenditure and net worth components do not provide information by regional level.
It also allows for analysis of income and material wellbeing by ethnicity, particularly for Māori and Pacific communities; by disability status; and by gender. It also allows analysis for potentially disadvantaged groups such as pensioners, one-parent families, and people who are unemployed.
HES is subject to revision on a regular basis due to changes to income-related information requirements, or when supplementary questions are added to or deleted from HES.
This results in possible changes to the definitions of items of information collected in the survey; changes to item codes and coding instructions; the addition or deletion of items from the survey; changes to descriptions of items in the survey, as well as changes to specific questions.
The survey began in July 1973 and operated on a July to June year until 30 June 1975. It then changed to an April to March survey year for the year ended March 1976, and ran annually on a March year until 1998. It then became a three-yearly survey, and moved back to a July–June year.
The survey was renamed the Household Economic Survey during 1993/94, from the Household Expenditure and Income Survey (HEIS) between 1983/84 and 1992/93, and the Household Survey before that.
In the three survey years ending March 1995, March 1996, and March 1997, we asked a series of questions on health status and the use of various health-related services in a health supplement. We asked them of everyone in the survey, then introduced the questions into a revised HES as part of a plan to develop a household statistics collection strategy.
One component of this strategy was to develop the capacity to include supplements in HES. For 2000/01, we included an internet questionnaire as a supplement. It asked people to record any purchases they had made over the internet in the past year.
2000/01 was the first year of a new three-yearly cycle. We also introduced integrated weighting that year. This new method was successfully adopted and applied to back years.
Between the 2003/04 and 2006/07 periods, HES underwent significant redevelopment, with major changes to the collection methodology and classifications used.
Until the 2006/07 survey, the core HES was interviewer-administered using paper questionnaires. HES now uses computer-assisted interviewing (CAI) to collect and store the data, with interviewers using laptop computers to administer most of the survey.
We developed a new expenditure classification to meet the need for a common household consumption classification – to better align with the consumers price index, national accounts, and international standards. Consequently, there is a break in the expenditure time series, and 2006/07 expenditure data is not directly comparable with previous years. The income time series is relatively unaffected.
We included the Economic Living Standard Index (ELSI) (short-form version) questionnaire in HES for the first time in the 2006/07 survey.
For HES 2018/19, we used income data from the Integrated Data Infrastructure (IDI) to replace some sources of income for eligible individuals. We also increased the sample size to 28,500 dwellings for HES income to provide a more accurate picture of child poverty in New Zealand.
For HES 2019/20 we added a self-complete component to the questionnaire wherein we asked questions relating to personal demographics. We also asked questions relating to gender, sexual orientation and sex at birth for the first time in this module. Questions on disability were also included in this module. Also, respondents were not asked to provide amounts for income variables (wages and salaries; benefits; and other payments received from the New Zealand Government) since we are able to obtain these from admin data sources.
Data collection for the 2019/20, 2020/21, and 2021/22 surveys was significantly affected by COVID-19 alert level restrictions, lockdowns, and other disruptions, meaning that Stats NZ was unable to conduct face-to-face interviewing for portions of the collection period in each year. The achieved sample sizes for these years were 16,151, 16,196, and 8,900 households respectively, compared with the target of 20,000 households. For the 2022/2023 HES, the target achieved sample size was reduced from 20,000 to 15,000 households. The final achieved sample in the 2022/2023 HES included 14,100 households, slightly fewer than the target of 15,000. Users should be aware that the reduction in sample size means that the statistics are subject to higher sampling error. Caution is advised when interpreting statistics for subpopulations, where the sampling error and risk of bias are higher.
For HES 2021/22, child poverty statistics have not been published by regional council level.
- Annual
The Treasury and Ministry of Social Development
Studies
Coverage
Data Collection
In all cases, contact with the selected households is made through personal visits by interviewers. We included the option of phone interviewing to maximise responses, particularly in case of disruptions like COVID-19 or extreme weather events. The number of eligible households on the panel list is the target number of respondents for the selected area. Thus, the aim of the data collection operation is to obtain completed documents from as many eligible addresses as the survey’s financial and time constraints allow. On average, this is four eligible responding households per panel.
When we cannot contact a household on the first visit, the interviewer makes at least two further visits at different times of the day to establish contact with the household. If, after the fifth visit, we still can’t contact the household, the household is a non-respondent. If an address contains more than one household, the interviewer randomly selects and surveys one household. Of the selected households, 5,500 are asked to keep an expenditure diary for the following week. See household expenditure for more information on this.
Methodology
Target population and survey scope
The survey scope defines the target population for the survey. This then defines the benchmarks that are used (the size of the target population is equal to the total of the benchmarks).
The target population is all usually resident individuals of private dwellings in urban and rural areas in North Island, South Island, and Waiheke Island of New Zealand.
The following people are out of scope:
overseas visitors who have been or expect to be in New Zealand for less than 12 months
residents of non-private dwellings such as hotels, motels, hostels, and boarding houses
long-term (more than six weeks) residents of institutions such as hospitals and/or prisons
persons in homes for the aged (including rest homes) where there are communal cooking facilities
members of permanent armed forces who live in non-private dwellings, such as barracks
members of New Zealand armed forces serving overseas
non-New Zealand diplomats and their families
New Zealand usual residents of offshore islands (except for Waiheke Island).
Survey population
The survey population is the target population with some exclusions due to practical survey difficulties. The following are excluded:
- New Zealand usual residents temporarily overseas who do not return to New Zealand within the survey period
- New Zealand usual residents temporarily staying elsewhere in New Zealand who do not return to the selected household within the survey period
- people residing at a wharf or landing place (for example, people on ships).
Children over the age of 15 who are away at boarding school who would be surveyed if they lived at home are included as part of their parents’/caregivers’ household.
Sample design information
The HES uses a stratified, multi-stage, cluster design. Primary sampling units (PSUs) are selected from the household survey frame, then dwellings within PSUs, then eligible persons within selected dwellings.
We use selected PSUs for several years, selecting different dwellings each year. Using the same set of PSUs provides some stability in the sample characteristics between years as well as efficient use of the surveyed PSUs. In the case of the HES, a selected PSU can supply three years of survey sample. The set of PSUs selected for the survey expansion in 2018 were used from 2018/19 to 2020/21.
A new sample of PSUs was selected for the 2021/22 HES, in conjunction with the Living in Aotearoa (LIA) survey, using the sample design of the HES introduced in 2018/19. The information on the frame used in the design was updated using 2018 Census data.
The first stage involved selecting a sub-sample of 2,500 PSUs from the 4,368 selected for the LIA survey from the household survey frame. A new sample of PSUs was again selected for the 2022/23 HES from the LIA sample, which included some overlap with the PSUs selected for 2021/22.
PSUs are stratified according to census information for each PSU on the household frame. Stratification primarily helps to ensure that different subgroups are represented in the sample (for example, regional council areas, NZ deprivation index areas), as well as to manage the sampling rate in more costly strata to reduce total survey cost, and to reduce the sampling variance of the estimates.
The sample was stratified by:
- region – 12 geographical strata based on regional council areas (West Coast, Tasman, Nelson, and Marlborough were collapsed into one region, as were Gisborne and Hawke’s Bay, to create larger regions)
- urban/rural – we used an urban/rural sampling ratio of 1.4:1 to control collection costs for Stats NZ, as rural PSUs are more expensive than urban PSUs to survey
- NZDep2018 Index – the inclusion of NZDep2018 in the stratification ensures a good spread of areas by socio-economic status
- estimated child poverty indicators from census data (at PSU level) – this ensures PSUs with high numbers of lower socio-economic children are represented
- household income – defined as total gross income from the 2018 Census.
Next, we select dwellings within selected PSUs. On average 11.4 dwellings are selected within each PSU, as we aim to achieve around eight households per PSU in our final sample. To do this, dwellings within the PSU are allocated systematically to one of six groups (called panels); each annual survey uses two of these panels – one for the main survey and one for increasing the sample size of Māori (see Oversampling Māori).
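The systematic allocation of dwellings to panels can be sketched as below. This is a minimal illustration, assuming dwellings are held in frame order and dealt out round-robin; the function name, the dwelling identifiers, and the ordering rule are hypothetical and are not taken from Stats NZ's systems.

```python
def allocate_to_panels(dwelling_ids, n_panels=6):
    """Deal dwellings out to panels in list order (round-robin).

    Assumption: 'systematically' means every n-th dwelling on the
    ordered frame goes to the same panel.
    """
    panels = {p: [] for p in range(1, n_panels + 1)}
    for position, dwelling in enumerate(dwelling_ids):
        panels[position % n_panels + 1].append(dwelling)
    return panels


# Hypothetical PSU containing 34 dwellings, labelled D001 to D034.
dwellings = [f"D{i:03d}" for i in range(1, 35)]
panels = allocate_to_panels(dwellings)

# Two of the six panels would be used in any one survey year.
selected_this_year = panels[1] + panels[2]
```

Because each panel takes every sixth dwelling, the dwellings in a panel are spread across the whole PSU rather than clustered at one end of the list.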
Finally, we interview all individuals aged 15 years and over within each selected dwelling. Selections are distributed across the 12-month survey period so that survey results are representative of income patterns across the year.
Oversampling Māori
We oversample Māori to increase the likelihood of achieving a higher number of Māori in our sample than we would by chance. We use Māori descent information from the 2018 Census to identify if a dwelling has a Māori household member. This enables us to identify dwellings likely to contain at least one person identifying as Māori and then sample them at a higher rate. From 2018/2019 to 2020/2021, the electoral roll was used for this purpose.
Within each PSU, dwellings with at least one Māori household member have a higher chance of being selected into the sample. The resulting differential selection probabilities are adjusted for during weighting. Analysis comparing the use of the electoral roll and census data showed that either method would pick up a similar number of households.
Sample size
A sample size of 20,000 responding households is required to meet the accuracy objectives for the survey described above. With 2,500 PSUs, and just over 11.4 households selected per PSU, we get a total selected sample of approximately 28,600 households. We assume that at least 70 percent of households will provide a full response, leaving a final (achieved) sample of at least 20,000 households. The target sample sizes for the expenditure and net worth components, which are subsamples of the core HES, are 5,500 and 8,500 households, respectively.
With ongoing collection difficulties – including tight labour conditions, heightened awareness of illness, decreasing trust and willingness to provide information, and increasingly mobile individuals and households – Stats NZ reduced the target achieved sample size from 20,000 to 15,000 households. This was to ensure that we could focus the collection workforce on collecting a smaller but representative sample for the 2022/2023 HES. The subsample for HES expenditure was not reduced.
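The arithmetic behind the selected and achieved sample sizes can be checked with a short sketch. The function name is illustrative only. Using exactly 11.4 dwellings per PSU gives slightly lower figures than the rounded 28,600 and 20,000 quoted, which rest on "just over" 11.4.

```python
def expected_sample(n_psus, dwellings_per_psu, full_response_rate):
    """Return (selected households, expected achieved households)."""
    selected = n_psus * dwellings_per_psu
    achieved = selected * full_response_rate
    return round(selected), round(achieved)


# Figures from the sample design: 2,500 PSUs, about 11.4 dwellings
# per PSU, and an assumed 70 percent full-response rate.
selected, achieved = expected_sample(2_500, 11.4, 0.70)
```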
This was implemented by removing half of the PSUs allocated to each month from January to July, with the remaining PSUs in the sample retaining the principles and assumptions of the initial sample design. Unbiased collection of the 15,000 households would be expected to achieve the same population and concept representation as the full 20,000 households, but with higher variance and sample error. The total selected sample included approximately 21,100 households.
Reliability of survey estimates
Two types of error are possible in estimates based on a sample survey – sampling error and non-sampling error.
Sampling error is a measure of the variability that occurs by chance because a sample, rather than an entire population, is surveyed. We can calculate the level of uncertainty around a survey estimate by exploring how that estimate would change if we were to draw many survey samples for the same period instead of just one. This allows us to define a range around the estimate (known as a confidence interval) and to state how likely it is that the real value that the survey is trying to measure lies within that range. Confidence intervals are typically set up so that we can be 95 percent sure that the true value lies within the range – in which case this range is referred to as a 95 percent confidence interval.
Confidence intervals are used as a guide to the size of the sample error. A wider confidence interval indicates a greater uncertainty around the estimate. Generally, a smaller sample size will lead to estimates that have a wider confidence interval than estimates from larger sample sizes. This is because a smaller sample is less likely than a larger sample to reflect the characteristics of the total population and therefore there will be more uncertainty around the estimate derived from the sample. The 95 percent confidence interval is used in HES reporting and is calculated as the estimate plus or minus the sample error.
We calculate sample errors using the jackknife method, which is based on the variation between estimates of different sub-samples taken from the whole sample.
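A delete-a-group jackknife for a weighted proportion can be sketched as follows. This is an illustration of the general technique only: the grouping of records, the number of groups, and the use of 1.96 to convert a standard error to a 95 percent sample error are assumptions, and the replicate scheme actually used for HES is not described here.

```python
import math


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def jackknife_sample_error(values, weights, groups, z=1.96):
    """Sample error (half-width of a 95 percent CI) for a weighted mean.

    Each replicate drops one group and re-estimates from the rest.
    Rescaling the remaining weights by G/(G-1) is omitted because it
    cancels out of a ratio such as a weighted mean.
    """
    labels = sorted(set(groups))
    g = len(labels)
    replicates = []
    for dropped in labels:
        kept = [(v, w) for v, w, lab in zip(values, weights, groups)
                if lab != dropped]
        replicates.append(weighted_mean([v for v, _ in kept],
                                        [w for _, w in kept]))
    centre = sum(replicates) / g
    variance = (g - 1) / g * sum((r - centre) ** 2 for r in replicates)
    return z * math.sqrt(variance)


# Hypothetical data: 1 = child in a household below a poverty line.
in_poverty = [1, 0, 0, 1, 0, 1, 0, 0]
weights = [100.0] * 8
groups = [0, 1, 2, 3, 0, 1, 2, 3]

sample_error = jackknife_sample_error(in_poverty, weights, groups)
```

The more the replicate estimates differ from one another, the larger the resulting sample error.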
Sample errors can be used to identify changes in the data that are due to real-world effects and are unlikely to have occurred by chance due to a particular sample being chosen.
If an observed annual change is larger than the associated sample error on the change, this change is unlikely to be the result of chance and is therefore considered ‘statistically significant’.
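The two rules above, the confidence interval as the estimate plus or minus the sample error, and significance as a change larger than its sample error, can be written as a small sketch. The function names and the example figures are hypothetical.

```python
def confidence_interval(estimate, sample_error):
    """95 percent confidence interval: estimate plus or minus sample error."""
    return (estimate - sample_error, estimate + sample_error)


def is_statistically_significant(change, sample_error_on_change):
    """A change counts as significant if it exceeds its sample error."""
    return abs(change) > sample_error_on_change


# Hypothetical rate of 12.0 percent with a sample error of 1.5 points.
interval = confidence_interval(12.0, 1.5)

# Hypothetical annual changes, each with a sample error of 1.5 points.
large_change = is_statistically_significant(-1.8, 1.5)
small_change = is_statistically_significant(0.9, 1.5)
```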
With an achieved sample size of 20,000, it is expected that:
- sample errors (95 percent confidence intervals) for the annual change in rates for the nine child poverty measures will be 1.5 percentage points or less
- sample errors (95 percent confidence intervals) for the annual change in rates for Māori children will be 3.9 percentage points or less.
Disrupted data collection resulted in sample errors on some estimates that were higher than designed for. Sample errors are published alongside each of the child poverty measures.
For HES 2022/23, for all children, the sample errors ranged from 1.4 to 3.0 across the nine poverty measures. For Māori children, they ranged from 3.0 to 4.8.
Non-sample error can occur in any survey, whether the estimates are derived from a sample or a census. Sources of non-sample error include non-response, errors in respondents’ reporting or interviewers’ recording of answers, and errors in data processing.
Every effort is made to minimise non-sample error by careful design and testing, training of data collection specialists and editing and quality control procedures during data processing. Any remaining error is very difficult to identify and quantify.
Non-response can affect the reliability of results and introduce bias if the people who do not respond systematically differ in some important characteristic from those who do respond. For example, if the response rate is low among people with low income, not only can we be less confident in the income estimates for this group, but national estimates will also be biased towards higher incomes.
We employ additional effort in the field to achieve as high a response rate as possible from low socio-economic groups and from different regions.
Our weighting methodology is also designed to mitigate the impact of lower response rates from certain subgroups of the population (that is, by adjusting the weights upwards). However, some bias will remain if the missing respondents have substantially different income to those who do respond.
Since the 2019/20 HES, the collection difficulties meant that response rates were lower than previous years and varied across the country. We conducted extensive investigation into potential bias arising from non-response, given the variable response rates, and concluded that the national child poverty measures were reliable, but there was a non-trivial risk of bias for statistics at the regional level.
Imputation
We use imputation to replace missing data for households with partially completed surveys (item non-response), as well as for non-responding individuals (unit non-response) residing in otherwise fully responding households. Households are defined as fully responding when all questions about household characteristics, such as household membership and housing costs, have been answered.
The demographic information collected allows us to impute the record of the non-responding household member by linking to the IDI or via imputation software (the imputation method is described below). Rather than discarding incomplete records, these methods allow us to make the best use of the data collected in HES.
We use imputation software when IDI information or supporting responses (in the case of sex at birth and gender) is unavailable, for the following variables:
Income
employment earnings and government transfers where a respondent has not been linked to the IDI
self-employment income where respondent is known to have such but has not provided a value
investment income where respondent is known to have such but has not provided a value.
Housing costs
- local and regional authority property rates for primary property.
Person demographics
age
gender
sex at birth
ethnicity
disability status for people over the age of 2 years.
We use the Canadian Census Editing and Imputation System (CANCEIS) software developed by Statistics Canada to perform deterministic and nearest neighbour donor imputation for the variables above. The nearest neighbour donor imputation method (NIM) identifies a donor (respondent) ‘nearest’ to the recipient (the non-respondent).
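The idea of nearest neighbour donor imputation can be sketched in a few lines. This is not the CANCEIS interface or its distance function; the matching variables, the unweighted absolute-difference distance, and all record values are assumptions chosen for illustration.

```python
def nearest_donor_impute(recipient, donors, match_vars, target):
    """Copy `target` from the donor closest to the recipient.

    Distance is a plain sum of absolute differences over `match_vars`
    (an assumption; production systems use tuned, weighted distances).
    """
    def distance(a, b):
        return sum(abs(a[v] - b[v]) for v in match_vars)

    # Only records that actually reported the target can donate.
    eligible = [d for d in donors if d.get(target) is not None]
    donor = min(eligible, key=lambda d: distance(recipient, d))
    completed = dict(recipient)
    completed[target] = donor[target]
    return completed


# Hypothetical records: the recipient is known to have self-employment
# income but did not provide a value.
recipient = {"age": 34, "hours": 40, "self_employment_income": None}
donors = [
    {"age": 35, "hours": 38, "self_employment_income": 52_000},
    {"age": 60, "hours": 20, "self_employment_income": 18_000},
    {"age": 34, "hours": 40, "self_employment_income": None},
]

imputed = nearest_donor_impute(
    recipient, donors, ["age", "hours"], "self_employment_income"
)
```

The third record matches the recipient exactly but cannot donate because it is also missing the value, so the first record is used.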
Weighting
Weighting is used to estimate the population from the sample. A weight is attached to each unit in the sample that indicates the number of households and people it represents in the final population estimate. Weighting ensures that estimates reflect the sample design, adjust for non-response, and align with current population estimates. For HES, deriving weights is a multi-stage process.
First, we calculate a household’s initial weight. This depends on the sample design and equals the inverse of the household’s selection probability (which itself depends on the selection probability of the PSU from the household survey frame, as well as the selection probability of the dwelling within the PSU).
For 2022/2023, the selection probability for the dwelling was calculated as the inverse of the number of panels in the PSU (for example, 1/6). This method is used in other Stats NZ surveys and aligns with common practice.
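The initial weight calculation can be sketched as below, assuming a single dwelling selection probability of one over the number of panels, as in the example above. The PSU selection probability of 0.05 is hypothetical, and the higher selection rate for dwellings in the Māori oversample is not modelled.

```python
def initial_weight(psu_selection_prob, n_panels=6):
    """Inverse of the household's overall selection probability."""
    dwelling_selection_prob = 1 / n_panels
    household_selection_prob = psu_selection_prob * dwelling_selection_prob
    return 1 / household_selection_prob


# Hypothetical PSU selected with probability 0.05: each responding
# household initially represents about 120 households.
weight = initial_weight(psu_selection_prob=0.05)
```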
Second, we adjust the initial weights to account for unit non-response. Non-responding households are given weights of zero, while the initial weights of responding households are scaled by a rate-up factor based on the inverse of the weighted response rate of households. This is done in weighting classes formed by cross-classifying variables that are correlated with likelihood to respond. The weighting classes used for the 2022/2023 HES were: region, NZDep2018, ethnic densities, urban/rural, and interview quarter. This step creates adjusted response weights from the initial weights. Given the reduced achieved sample size, we completed extra investigation into the performance of the weighting to ensure we did not have unusual response factors.
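The non-response adjustment within weighting classes can be sketched as follows. The field names and the two example classes are hypothetical; real weighting classes cross-classify several variables, and all selected households here are assumed to be eligible.

```python
from collections import defaultdict


def adjust_for_nonresponse(households):
    """Scale respondents' weights by the inverse weighted response rate.

    Within each weighting class the rate-up factor is
    (sum of initial weights, all households) /
    (sum of initial weights, responding households).
    Non-respondents receive a weight of zero.
    """
    total = defaultdict(float)
    responding = defaultdict(float)
    for h in households:
        total[h["weighting_class"]] += h["initial_weight"]
        if h["responded"]:
            responding[h["weighting_class"]] += h["initial_weight"]

    adjusted = []
    for h in households:
        if h["responded"]:
            cls = h["weighting_class"]
            factor = total[cls] / responding[cls]
            adjusted.append(h["initial_weight"] * factor)
        else:
            adjusted.append(0.0)
    return adjusted


households = [
    {"weighting_class": "A", "initial_weight": 100.0, "responded": True},
    {"weighting_class": "A", "initial_weight": 100.0, "responded": True},
    {"weighting_class": "A", "initial_weight": 100.0, "responded": False},
    {"weighting_class": "B", "initial_weight": 80.0, "responded": True},
    {"weighting_class": "B", "initial_weight": 120.0, "responded": False},
]

adjusted_weights = adjust_for_nonresponse(households)
```

After the adjustment, the respondents in each class carry the full initial weight of that class, so the class total is preserved.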
Finally, we calibrate the adjusted response weights so that estimates reflect expected population totals or benchmarks. Calibration adjusts for under-coverage of the target population. We use a form of calibration called integrated weighting to ensure that all individuals in the same household are given the same weight and that household statistics derived from person-level data match the same statistics calculated directly from household-level data.
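As a much-simplified illustration of calibration, the sketch below post-stratifies weights to a single set of benchmarks. It is not integrated weighting: it handles one benchmark variable only and does not force household members to share a weight. The regions, weights, and benchmark totals are hypothetical.

```python
from collections import defaultdict


def post_stratify(units, benchmarks, key="region"):
    """Scale weights so weighted totals match the benchmark per category."""
    weighted_totals = defaultdict(float)
    for u in units:
        weighted_totals[u[key]] += u["weight"]

    calibrated = []
    for u in units:
        factor = benchmarks[u[key]] / weighted_totals[u[key]]
        calibrated.append({**u, "weight": u["weight"] * factor})
    return calibrated


units = [
    {"region": "Auckland", "weight": 400.0},
    {"region": "Auckland", "weight": 600.0},
    {"region": "Otago", "weight": 250.0},
]
# Hypothetical population benchmarks for each region.
benchmarks = {"Auckland": 1_200.0, "Otago": 200.0}

calibrated = post_stratify(units, benchmarks)
```

Here the Auckland weights are scaled up to correct for under-coverage and the Otago weight is scaled down, so each regional total matches its benchmark.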
For 2022/2023 we used benchmarks based on the estimated resident population (ERP) and administrative data (admin data) on income and benefit receipt available in the Integrated Data Infrastructure (IDI).
The ERP for a particular year uses census information adjusted for births, deaths, and net migration since the most recent census (the base year). The 2018 Census has been used as the base year for the ERP, from 2018/2019 through to 2022/2023, though the 2018/2019 HES was revised to use 2018 Census information, as it initially used the 2013 Census as the base year. A similar revision will likely be required for the 2023/2024 HES once population information from the 2023 Census becomes available.
We calibrated the HES income distribution of adults to the income distribution of adults available in the IDI, and then calibrated these (adjusted) weights to the other benchmarks (that is, ERP benchmarks and the number of people in the IDI who received any government benefit). The benchmark variables/categories used in the calibration process are listed below.
Benchmarks based on the ERP are:
children – three 5-year age groups: 0‒4, 5‒9, 10‒14 years
sex by age groups – males and females by 14 age groups: 15‒17, 18‒19, 20‒24, 25‒29, 30‒34, 35‒39, 40‒44, 45‒49, 50‒54, 55‒59, 60‒64, 65‒69, 70‒74, 75+ years
region – 12 regions: Northland, Auckland, Waikato, Bay of Plenty, Gisborne-Hawke’s Bay, Taranaki, Manawatū-Whanganui, Wellington, West Coast-Tasman-Nelson-Marlborough, Canterbury, Otago, and Southland
Māori adults by age – two age groups for Māori: 15‒29, 30+ years
households by region and household type – 12 regions for households with two adults or without two adults, separately.
Benchmarks from admin data are:
people who received any government benefit, excluding New Zealand Superannuation and Veterans’ pension
the income distribution of adults – income deciles using total income (sum of income from all regular income sources) at the individual level.
Population rebase
Weights are revised following each census, based on the latest population counts (called a population rebase). For the current HES, we used the weights based on the 2018 Census population.
The last major rebase was for Census 2013 and was implemented in the 2014/15 HES (Income) year. The revised data applied to the income, housing costs, and material wellbeing data from 2006/07 to 2013/14, and to the expenditure data for 2006/07, 2009/10, and 2012/13.
For Census 2018, we only rebased the 2018/19 year in the 2019/20 year.
When 2023 Census information becomes available, we will undertake another rebase which will likely be applied to years back to 2018/19.
Data collection methodology
HES (Income) has four survey components:
- a household questionnaire
- a housing expenditure questionnaire
- an income questionnaire for each household member aged 15+. Demographic information for each member aged 15+ is collected within the income questionnaire and can be self-completed by the respondent. Demographic information for children (aged under 15) is collected from the designated parent (designated in the household questionnaire) in their income questionnaire.
- a material well-being questionnaire for one member per household who is aged 18+ (chosen randomly).
Data is collected by Stats NZ's team of data collection specialists, who visit selected households and complete face-to-face interviews with each eligible household member. Information is collected using:
- a household-level computer-assisted interview questionnaire that collects information on household characteristics, including housing costs
- an individual computer-assisted interview questionnaire that collects information on income, employment, and other personal characteristics from each usual resident aged 15 years and over
- an individual computer-assisted interview questionnaire that collects material wellbeing information from one randomly selected usual resident aged 18 years or over
- a section of computer-assisted self-complete questions used to collect personal demographics.
Since HES 2021/22, some interviews have been conducted by phone.
Data processing methodology
Assignment of cases is centralised in Salesforce, a system that allows for real-time observation of response rates. The team of data collection specialists use BLAISE to conduct household surveys. BLAISE is a computer assisted interviewing (CAI) software that guides the interviewer through the correct sequence of questions. They are displayed one at a time and have automatic routing built in to make sure that respondents are only asked questions that are relevant to them.
Once submitted, the data is stored and a response is logged in our centralised tracking software, Salesforce. This allows for a real-time overview of progress in reaching the target response rates for demographic groups.
The data is then fed through various editing stages, before being loaded into the processing database, EPIC.
Admin data is used to replace survey data
Despite our best efforts to obtain accurate information about respondents’ income, relying on survey responses for data of this nature inevitably introduces uncertainty. For example, respondents may not remember or may fail to disclose all sources of income over the past year to the interviewer. They may provide only ‘rough estimates’, describe income after tax, or forget changes to their regular income over the year. In some cases, family members may not know the income of other family members. Benefit income is often understated – as people can forget benefit income when it was received only for small periods throughout the year.
We combine survey data on income with admin data from the Integrated Data Infrastructure (IDI), a large research database managed by Stats NZ holding microdata about people and households. It contains full tax information related to individuals, including data provided by employers for each employee (the employee monthly schedule), self-employment income, and some investment income. We also use data provided by the Ministry of Social Development (MSD) about benefits paid, including Working for Families (WFF) tax credits, and accommodation supplement.
Since HES 2018/19 we have used admin data to provide annual salary and wage and government transfer income. In HES 2018/19, we asked respondents their income but used the admin data in the published statistics. Starting with HES 2019/20, respondents have not been asked to provide their income amounts for these income variables.
Some income sources are not currently available in the IDI, including investment income, some sources of irregular income, and non-taxable income. We collect these income variables directly from respondents.
Salary and wage income is provided on an individual’s pay day and is updated in the IDI on a quarterly basis. However, other income (for example, from self-employment) relies on individuals providing their tax returns, which may be delayed before being included in the IDI. Due to this timeliness issue, we use the self-employment income provided to us by the respondent.
Information on income received from the WFF scheme is available from IRD and from MSD, and this data is used for relevant households. However, for some households this income is received annually, and there can be delays in this information being incorporated in the IDI due to delays in filing tax returns. For this reason, annual income from WFF is estimated for some families and is revised in the following year when more information is available.
Consistency with other periods
Although we adjust survey results for various demographic variables (age, sex, and region), there can be variability in survey estimates from one survey collection period to the next. This variability is because a different group of households is selected for each survey.
Suppressed estimates
For confidentiality purposes, we suppress data in the released tables if a cell is based on fewer than five people or households. Any suppressed cells are identified in the tables with an ‘S’.
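The suppression rule can be expressed as a short sketch. The table layout (an estimate paired with its underlying count of people or households) and the cell labels are hypothetical.

```python
def suppress_small_cells(cells, threshold=5):
    """Replace estimates based on fewer than `threshold` units with 'S'."""
    return {
        label: (estimate if count >= threshold else "S")
        for label, (estimate, count) in cells.items()
    }


# Hypothetical table cells: label -> (estimate, contributing count).
cells = {
    "Region X": (41_200, 37),
    "Region Y": (38_900, 4),
    "Region Z": (45_000, 5),
}

released = suppress_small_cells(cells)
```

A cell based on exactly five units is released, because only cells with fewer than five are suppressed.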
For information on Methodology, please see Child poverty statistics: Year ended June 2023 – technical appendix