Household Economic Survey data collection
In all cases, contact with the selected households is made through personal visits by interviewers. The number of eligible households on the panel list is the target number of respondents for the selected area. Thus, the aim of the data collection operation is to obtain completed documents from as many eligible addresses as the survey’s financial and time constraints allow. On average, this is four eligible responding households per panel.
When we cannot contact a household on the first visit, the interviewer makes at least two further visits at different times of the day to establish contact with the household. If the household still cannot be contacted after the third visit, it is classed as a non-respondent. If an address contains more than one household, the interviewer randomly selects and surveys one household. Each household is interviewed and asked to keep an expenditure diary for the following two weeks.
##Target population and survey population
The survey scope defines the target population and who is of interest for the survey. This defines the benchmarks that are used (the size of the target population is equal to the total of the benchmarks).
The target population is all usually resident individuals living in private dwellings in urban and rural areas in the North Island, South Island and Waiheke Island of New Zealand. The following people are out of scope:
*Overseas visitors who have been, or expect to be, in New Zealand for less than 12 months
*Residents of non-private dwellings such as hotels, motels, hostels, and boarding houses
*Long-term residents of institutions such as hospitals and prisons, and persons in homes for the aged (including rest homes) where there are communal cooking facilities (long-term is defined as more than 6 weeks)
*Members of permanent armed forces who live in non-private dwellings, e.g. barracks
*Members of overseas New Zealand armed forces
*Non-NZ diplomats and their families
*New Zealand usual residents of offshore islands, with the exception of Waiheke Island.
The survey population identifies any exclusions from the target population due to practical survey difficulties. The survey population is the target population excluding:
*New Zealand usual residents temporarily overseas who do not return within the survey period (defined as 1 month)
*New Zealand usual residents temporarily staying elsewhere in NZ who don’t return within the survey period (see below)
*People residing at a wharf or landing place (i.e. people in ships, boats etc).
Children at boarding schools are also not surveyed, but housing costs on behalf of those children are included in the record-keeping of the parent or guardian. The survey population is therefore marginally different from the target population.
HES (Income) has four survey components:
- a household questionnaire
- a housing expenditure questionnaire
- an income questionnaire for each household member aged 15+
- a material well-being questionnaire for one member per household who is aged 18+ (chosen randomly).
Data collection methodology
Data is collected by Stats NZ’s team of survey interviewers, who visit selected households and complete face-to-face interviews (where possible) with each eligible household member. Information is collected using:
*A household-level computer-assisted interview questionnaire, which collects information on household characteristics, including housing costs
*An individual computer-assisted interview questionnaire, which collects information on income, employment and other personal characteristics from each usual resident aged 15 years and over
*An individual computer-assisted interview questionnaire, which collects material well-being information from one randomly selected usual resident aged 18 years or over
*A section of computer-assisted self-complete questions used to collect personal demographics.
The 2019/20 Household Economic Survey flowcharts can be found under the Questionnaire & Forms section.
Sample design information
The HES uses a stratified, multistage, cluster design. The first stage involves the selection of a sample of Primary Sampling Units (PSUs) from the Household Survey Frame (HSF). We select a first-stage sample of 2,500 PSUs.
At the first stage of sample selection, PSUs are stratified according to Census information for each PSU on the household frame. Stratification is done to:
*Ensure representation of different subgroups (e.g. regional councils) in the sample.
*Control the sampling rate in more expensive strata, which reduces total survey cost.
*Reduce the sampling variance of the estimates.
The HES 19/20 sample design uses the following stratification:
*Regional stratification: 12 geographical strata based on regional council areas. West Coast, Tasman, Nelson, and Marlborough regions were merged, as were Gisborne and Hawke’s Bay regions, to create larger regions.
*Urban/rural stratification: we used an urban/rural sampling ratio of 1.35:1 to control collection costs for Stats NZ, as rural PSUs are more expensive to survey than urban PSUs.
*Stratification according to the NZDep13 Index, an index of socioeconomic deprivation that provides a deprivation score for each meshblock (smallest geographical area) in New Zealand (University of Otago, 2014). The inclusion of NZDep13 ensures a good spread of areas by socio-economic conditions.
*Stratification according to child poverty indicators at PSU level. This ensures a good representation of PSUs with high numbers of children in lower socio-economic circumstances.
*Stratification according to household income (defined as total gross income from the 2013 Census).
The second stage of the sample selection consists of selecting dwellings within the selected PSUs. We aim for an achieved cluster size of eight households per PSU, which means an initial sample of 11.4 households, on average, is selected in each PSU. Once sample dwellings are selected, all individuals aged 15 years and over within each selected dwelling are interviewed. Selections are distributed across the twelve-month survey period so that survey results are representative of income patterns across the year.
###Oversampling Māori
To ensure we achieved a better representation of Māori in our sample than we would achieve by chance, we implemented an oversample of Māori using the Māori descent information on the electoral roll. This information is available for the whole population regardless of which roll they chose to be on for voting purposes. This information enables us to identify dwellings that are likely to contain at least one person who identified as Māori and then sample them at a higher rate.
Within each PSU, dwellings with at least one Māori household member have a higher chance of being selected into the sample. Once selected, we interview the household in the normal way regardless of whether they are of Māori descent or not.
In the end, we added 4,387 households in the form of a booster panel, all of which had at least one person of potential Māori descent as per the electoral roll. At the analysis stage, our weighting methodology adjusts for these different selection probabilities as well as for non-response.
##Sample size
A sample of 2,500 primary sampling units (PSUs) is selected, which is equal to around 28,500 households, to yield a sample of 20,000 responding households (an achieved sample rate of 70 percent).
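The sample size figures quoted in this section are linked by simple arithmetic. The following is a minimal Python sketch of that arithmetic, using only the numbers given in the text; the function and variable names are illustrative and are not part of any Stats NZ system.

```python
# Figures quoted in the sample design text.
N_PSUS = 2_500            # first-stage sample of PSUs
ACHIEVED_PER_PSU = 8      # target responding households per PSU
INITIAL_PER_PSU = 11.4    # average households initially selected per PSU


def expected_sample(n_psus, achieved_per_psu, initial_per_psu):
    """Return (selected households, responding households, implied rate)."""
    selected = n_psus * initial_per_psu
    responding = n_psus * achieved_per_psu
    return selected, responding, responding / selected


selected, responding, rate = expected_sample(
    N_PSUS, ACHIEVED_PER_PSU, INITIAL_PER_PSU
)
print(f"selected: {selected:,.0f}")
print(f"responding: {responding:,}")
print(f"implied rate: {rate:.1%}")
```

With these inputs the selected sample is about 28,500 households, the responding sample is 20,000, and the implied rate is roughly 70 percent, which matches the figures stated above.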
Reliability of survey estimates
Two types of error are possible in estimates based on a sample survey – sampling error and non-sampling error.
Sampling error is a measure of the variability that occurs by chance because a sample rather than an entire population is surveyed. We can calculate the level of uncertainty around a survey estimate by exploring how that estimate would change if we were to draw many survey samples for the same time period instead of just one. This allows us to define a range around the estimate (known as a “confidence interval”) and to state how likely it is that the real value that the survey is trying to measure lies within that range. Confidence intervals are typically set up so that we can be 95% sure that the true value lies within the range – in which case this range is referred to as a “95% confidence interval”.
Confidence intervals are used as a guide to the size of sampling error. A wider confidence interval indicates a greater uncertainty around the estimate. Generally, a smaller sample size will lead to estimates that have a wider confidence interval than estimates from larger sample sizes. This is because a smaller sample is less likely than a larger sample to reflect the characteristics of the total population and therefore there will be more uncertainty around the estimate derived from the sample. The 95% confidence interval (CI) is used in HES reporting and is calculated as the estimate plus or minus the sampling error.
Statistical significance - some changes in estimates from one year to the next will be the result of different samples being chosen, whilst other changes will reflect real world changes in income across the population. Sample errors can be used to identify changes in the data that are statistically significant; that is, they are unlikely to have occurred by chance due to a particular sample being chosen. If an observed change from one year to the next is larger than the associated sample error, then this change is unlikely to be the result of chance and is therefore statistically significant.
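The two rules described above (the confidence interval as the estimate plus or minus the sampling error, and a change being significant when it exceeds its sampling error) can be sketched in a few lines of Python. The function names and the example figures are hypothetical and are used only for illustration; the sampling error is assumed to be already expressed at the 95% level, as in HES reporting.

```python
def confidence_interval(estimate, sampling_error):
    """95% CI as described in the text: estimate plus or minus sampling error."""
    return estimate - sampling_error, estimate + sampling_error


def is_significant_change(change, sampling_error_of_change):
    """A change is treated as significant if it is larger in size than
    the sampling error associated with that change."""
    return abs(change) > sampling_error_of_change


# Hypothetical figures, in percentage points, for illustration only.
lower, upper = confidence_interval(13.0, 1.5)
print(lower, upper)
print(is_significant_change(-2.0, 1.5))
print(is_significant_change(1.0, 1.5))
```

Here a change of 2.0 percentage points with a sampling error of 1.5 would be flagged as significant, while a change of 1.0 would not.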
We calculate sampling errors using the jackknife method which is based on the variation between estimates of different subsamples taken from the whole sample.
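To illustrate the idea of measuring variation between subsample estimates, the sketch below implements a generic delete-a-group jackknife for a weighted mean. This is an assumption for illustration: the text does not specify the jackknife variant, the number of groups, or how groups are formed in HES, so the grouping, the function names, and the 1.96 multiplier for a 95% sampling error are all choices made for this example.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def jackknife_sampling_error(values, weights, groups, z=1.96):
    """Delete-a-group jackknife sampling error (95% half-width by default).

    groups[i] is the (assumed) jackknife group label of unit i. Each
    replicate drops one group and re-estimates the weighted mean.
    """
    full_estimate = weighted_mean(values, weights)
    labels = sorted(set(groups))
    n_groups = len(labels)
    replicates = []
    for label in labels:
        kept = [(v, w) for v, w, g in zip(values, weights, groups) if g != label]
        replicates.append(
            weighted_mean([v for v, _ in kept], [w for _, w in kept])
        )
    # Variance from the spread of replicate estimates around the full estimate.
    variance = (n_groups - 1) / n_groups * sum(
        (r - full_estimate) ** 2 for r in replicates
    )
    return z * math.sqrt(variance)


# Toy data: four units, each in its own group (a delete-one jackknife).
error = jackknife_sampling_error([1, 2, 3, 4], [1, 1, 1, 1], [0, 1, 2, 3])
print(round(error, 4))
```

For a ratio such as a weighted mean, rescaling the remaining weights after a group is dropped cancels out, so it is omitted here; an estimate of a total would need that rescaling.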
With an achieved sample size of 20,000, we expect that:
*Sample errors (95% confidence intervals) for the annual change in rates for the 9 child poverty measures will be 1.5 percentage points (pp) or less.
*Sample errors (95% confidence intervals) for the annual change in rates for Māori children will be 3.9 pp or less.
Note that the achieved sample size in HES 2020 was less than the target of 20,000 responding households because survey collection ended in March 2020 due to the COVID-19 lockdown. Sample errors are therefore higher than for HES 2019.
Non-sampling error can occur in any survey, whether the estimates are derived from a sample or a census. Sources of non-sampling error include non-response, errors in reporting by respondents or recording of answers by interviewers, and errors in coding and processing the data.
Non-sampling errors are difficult to quantify. Non-response can affect the reliability of results and can introduce a bias if the people who don’t respond differ from those who do on some important characteristic. For example, if people with low incomes tend not to respond to surveys as much as others, then the survey may not represent this end of the distribution well.
There is some evidence that, before the new HES was developed in 2018/19, the previous, much smaller HES may have had some non-response bias, particularly in years where expenditure was also collected and response rates were consequently a little lower (for example, the 2015/16 HES). The changes made to the design and weighting methodology in 2019 are expected to reduce the likelihood of non-response bias. As a consequence, we expect low-income and material hardship rates could be slightly higher than they otherwise would have been.
Every effort is made to reduce non-sampling error to a minimum through careful design and testing, training of interviewers, and editing and quality control procedures during data processing. We have also made additional effort in the field to achieve as high a response rate as possible from low socio-economic groups.
The use of admin data from tax records will reduce the likelihood of respondent error in recalling wage and salary and benefit amounts affecting the results.
Our weighting methodology described below under Population weighting adjustments will also reduce any remaining non-response bias as much as possible.
Attempts are made to ensure that, as much as possible, interviews are completed by the selected respondent, to ensure collection of accurate information. However, there are some circumstances where all members of a household cannot be reached to complete an interview. In some circumstances, information collected from another person in the household on behalf of the respondent is allowed. This is known as a proxy response.
In HES a proxy may provide information in ‘family type’ households where:
*the whole household is informed about the survey. All agree to participate, but are not able to be present when the questionnaires are administered
*children are away at boarding school
*people don't work and have no source of income
*people are elderly, sick, or mentally incapacitated.
In all proxy interviews, the interviewer must be convinced the proxy is totally familiar with the other respondent’s information.
##Non-response and Imputation
Despite the best efforts of our survey interviewers there will always be a proportion of households that cannot be contacted or do not respond to the survey. There are two types of non-response: total or unit non-response (when no information is collected on a sampled unit or where the amount or quality of the information supplied is insufficient to be a response) and partial or item non-response (when the absence of information is limited only to some variables). Ignoring non-response has the potential to introduce bias in survey results if respondents and non-respondents are different according to key characteristics of interest. For example, if people on low income are more likely to be unable to be contacted for our survey or refuse to participate then the resulting estimates may under-represent the number of low-income people in the population if left untreated.
In general, the effects of unit non-response bias in our surveys are treated using a form of weight adjustment to increase the survey weight of respondents.
We imputed the following variables in HES 2019/20:
HES is a sample survey that uses statistical weights to calculate income, housing costs and material well-being estimates for the total New Zealand population. We revise the weights following each census, based on the latest population counts (called a population rebase). For the current HES, we used the weights based on the 2013 Census population.
We applied the last rebase to HES in the 2014/15 HES (Income) year. The revised data applied to the income, housing costs and material well-being data from 2006/07 to 2013/14, and to the expenditure data for 2006/07, 2009/10, and 2012/13.
See Household Economic Survey population rebase: year ended June 2007–15 for more information about the revisions.
##Population weighting adjustments
Weighting is used to estimate the population from the sample. Each unit in the sample is given a weight that indicates the number of households and people it represents in the final population estimate. Weighting ensures that estimates reflect the sample design, adjust for non-response, and align with the current population estimates.
For HES, deriving the weight is a multi-stage process.
The first stage of weighting involves calculating a household’s initial weight. The initial weight depends on the sample design and equals the inverse of the selection probability.
The second stage involves adjusting the initial weights to account for unit non-response. The initial weight of a non-responding household is reduced to zero, while initial weights of responding households are scaled up by a rate-up factor based on the inverse of the weighted response rate of households. This is done in weighting classes formed by cross-classifying variables that are correlated with likelihood to respond (e.g. region, NZDep13, ethnic densities, urban/rural, and interview quarter).
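The first two weighting stages can be sketched as follows. This is a simplified Python illustration, not the production method: the record layout, the field names (`selection_prob`, `weight_class`, `responded`) and the example values are hypothetical, and every selected household is assumed to be eligible.

```python
from collections import defaultdict


def nonresponse_adjusted_weights(households):
    """Return one adjusted weight per household, in input order.

    Stage 1: initial weight = 1 / selection probability.
    Stage 2: within each weighting class, responders are scaled up by the
    inverse of the weighted response rate; non-responders get weight zero.
    """
    selected_total = defaultdict(float)    # sum of initial weights, all units
    responding_total = defaultdict(float)  # sum of initial weights, responders
    for hh in households:
        initial = 1.0 / hh["selection_prob"]
        selected_total[hh["weight_class"]] += initial
        if hh["responded"]:
            responding_total[hh["weight_class"]] += initial

    adjusted = []
    for hh in households:
        if not hh["responded"]:
            adjusted.append(0.0)
            continue
        cls = hh["weight_class"]
        rate_up = selected_total[cls] / responding_total[cls]
        adjusted.append(rate_up / hh["selection_prob"])
    return adjusted


# Hypothetical households in two weighting classes.
sample = [
    {"selection_prob": 0.01, "weight_class": "A", "responded": True},
    {"selection_prob": 0.01, "weight_class": "A", "responded": False},
    {"selection_prob": 0.02, "weight_class": "A", "responded": True},
    {"selection_prob": 0.05, "weight_class": "B", "responded": True},
]
weights = nonresponse_adjusted_weights(sample)
print([round(w, 2) for w in weights])
```

In this example the sum of the adjusted weights equals the sum of the initial weights, so responders in each class stand in for the non-responders in that class.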
The final stage in the weighting process is calibration. Calibration adjusts the non-response-adjusted weights so that they sum to expected population totals, or benchmarks. It also adjusts for under-coverage of the target population. A form of calibration, integrated weighting, is used to ensure that all individuals in the same household are given the same weight and that household statistics derived from person-level data match the same statistic calculated directly from household-level data.
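As a rough illustration of what calibrating to benchmarks means, the sketch below scales weights within each category of a single benchmark variable so that the weighted totals match the benchmark totals (simple post-stratification). This is a deliberately simplified stand-in, not integrated weighting: it handles one benchmark variable only and does not force members of the same household to share a weight. The function name, categories and benchmark figures are hypothetical.

```python
def calibrate_to_benchmarks(weights, categories, benchmarks):
    """Scale weights so each category's weighted total equals its benchmark.

    weights[i] and categories[i] describe unit i; benchmarks maps each
    category to its expected population total.
    """
    current_totals = {}
    for w, c in zip(weights, categories):
        current_totals[c] = current_totals.get(c, 0.0) + w
    return [
        w * benchmarks[c] / current_totals[c]
        for w, c in zip(weights, categories)
    ]


# Hypothetical non-response-adjusted weights and population benchmarks.
input_weights = [100.0, 150.0, 50.0, 200.0]
age_groups = ["0-14", "15+", "0-14", "15+"]
population = {"0-14": 180.0, "15+": 420.0}

calibrated = calibrate_to_benchmarks(input_weights, age_groups, population)
print([round(w, 1) for w in calibrated])
```

After the adjustment, the weights in each age group sum to that group's benchmark, which is the property calibration is designed to deliver across all benchmark variables at once.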
The benchmark variables used for the HES income were obtained from two sources: benchmarks based on the estimated resident population (ERP), and benchmarks from admin data available in the IDI. From HES 2019/20 onwards we use Census 2018-base ERP estimates, adjusted by births, deaths and net migration. HES 2018/19 has been revised from the Census 2013-base ERP to Census 2018-base ERP estimates. See Rebase for more information.
The benchmark variables/categories used in the calibration process are listed below:
Benchmarks based on the estimated resident population (ERP) are:
*Children benchmarks (3 age groups: 0-4, 5-9, 10-14)
*Age by sex benchmarks (2 sex groups: Male, Female by 14 age groups: 15-17, 18-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59, 60-64, 65-69, 70-74, 75+)
*Regional benchmarks. There are 12 regions (Northland, Auckland, Waikato, Bay of Plenty, Gisborne and Hawke’s Bay, Taranaki, Manawatu-Wanganui, Wellington, West Coast-Tasman-Nelson-Marlborough, Canterbury, Otago, Southland)
*Number of Māori adults by age group (2 Māori age groups: 15-29, 30+)
*Number of households by 12 HLFS regions by 2 household types (2-adult, non 2-adult households).
Benchmarks from the admin data:
*Number of people who received any Government benefit excluding NZ Superannuation and Veterans Pension in the 18/19 year.
*Income distribution of adults: income deciles based on total income (the sum of income from all regular income sources) at the individual level in the IDI.
The calibration process was carried out in two steps. In the first step, the HES income distribution of adults was calibrated to the income distribution of adults obtained from the IDI. In the second step, the adjusted weights from the first step were calibrated to the other benchmarks (i.e. ERP benchmarks and the number of people who received any benefit from the IDI).
The person benchmarks used for HES are: regional population estimates; children sub-population estimates by three age groups; adult sub-population estimates by sex and 13 age groups (including 75 years and over); and adult Māori sub-population estimates by two age groups (including 30 years and over).
The household benchmarks are two categories of household composition (two-adult households and non-two-adult households), and these categories split further by regions.
##Consistency with other periods
Although we adjust survey results for various demographic variables (age, sex, and region), there can be variability in survey estimates from one survey collection period to the next. This variability is because a different group of households is selected for each survey.
##Using material well-being data
The material well-being questionnaire asks about ownership of particular items, or doing certain activities, and the extent that people economise. We also ask respondents how they rate their life satisfaction and whether income meets everyday needs.
From the material well-being questionnaire we publish selected results for satisfaction levels, and for adequacy of income to meet everyday needs. Stats NZ does not produce an index measurement of material well-being from this data. Other agencies can use such index data in conjunction with other measures (e.g. income, expenditure on housing costs, or household demographics) to give an indication of the material standard of living of New Zealanders.
For confidentiality purposes, we suppress data in the released tables if a cell is based on fewer than five people or households. Data is no longer suppressed if a relative sample error is 51 percent or higher (21 percent for cross-tabulated data).
See the Reliability of survey estimates section above.