Cover Page

Contents

Cover

Half Title page

Title page

Copyright page

Preface

Chapter 1: Register Surveys — An Introduction

1.1 The purpose of the book

1.2 The need for a new theory and new methods

1.3 Four ways of using administrative registers

1.4 Preconditions for register-based statistics

1.5 Basic concepts and terms

1.6 Comparing sample surveys and register surveys

1.7 Conclusions

Chapter 2: The Nature of Administrative Data

2.1 Different kinds of administrative data

2.2 How are data recorded?

2.3 Administrative and statistical information systems

2.4 Measurement errors in statistical and administrative data

2.5 Why use administrative data for statistics?

2.6 Comparing sample survey and administrative data

2.7 Conclusions

Chapter 3: Protection of Privacy and Confidentiality

3.1 Internal security

3.2 Disclosure risks – tables

3.3 Disclosure risks – microdata

3.4 Conclusions

Chapter 4: The Register System

4.1 A register model based on object types and relations

4.2 Organising the work with the system

4.3 The populations in the system

4.4 The variables in the system

4.5 Using the system for micro integration

4.6 Three kinds of registers with different roles

4.7 Register systems and register surveys within enterprises

4.8 Conclusions

Chapter 5: The Base Registers in the System

5.1 Characteristics of a base register

5.2 Requirements for base registers

5.3 The Population Register

5.4 The Business Register

5.5 The Real Estate Register

5.6 The Activity Register

5.7 Everyone should support the base registers

5.8 Conclusions

Chapter 6: How to Create a Register — Matching and Combining Sources

6.1 Preconditions in different countries

6.2 Matching methods and problems

6.3 Matching sources with different object types

6.4 Conclusions

Chapter 7: How to Create a Register — The Population

7.1 How should register surveys be structured?

7.2 Register survey design

7.3 Defining a register’s object set

7.4 Defining the statistical units

7.5 Creating longitudinal registers – the population

7.6 Conclusions

Chapter 8: How to Create a Register — The Variables

8.1 The variables in the register

8.2 Forming derived variables using models

8.3 Activity data

8.4 Creating longitudinal registers – the variables

8.5 Conclusions

Chapter 9: How to Create a Register — Editing

9.1 Editing register data

9.2 Case studies – editing register data

9.3 Editing, quality assurance and survey design1

9.4 Conclusions

Chapter 10: Metadata

10.1 Primary registers – the need for metadata

10.2 Changes over time – the need for metadata

10.3 Integrated registers – the need for metadata

10.4 Classification and definitions database

10.5 The need for metadata for registers

10.6 Conclusions

Chapter 11: Estimation Methods — Introduction

11.1 Estimation in sample surveys and register surveys

11.2 Estimation methods for register surveys that use weights

11.3 Calibration of weights in register surveys

11.4 Using weights for estimation

11.5 Conclusions

Chapter 12: Estimation Methods — Missing Values

12.1 Make no adjustments, publish ‘value unknown’

12.2 Adjustment for missing values using weights

12.3 Adjustment for missing values by imputation

12.4 Missing values in a system of registers

12.5 Conclusions

Chapter 13: Estimation Methods — Coverage Problems

13.1 Reducing overcoverage and undercoverage

13.2 Estimation methods to correct for overcoverage

13.3 Undercoverage in the administrative system

13.4 Conclusions

Chapter 14: Estimation Methods — Multi-valued Variables

14.1 Multi-valued variables

14.2 Estimation methods

14.3 Application of the method

14.4 Linking of time series using combination objects

14.5 Conclusions

Chapter 15: Theory and Quality of Register-based Statistics

15.1 Is there a theory for register surveys?

15.2 Measuring quality – why and how?

15.3 Analysing administrative sources – input data quality

15.4 Output data quality

15.5 The integration process – integration errors

15.6 Random variation in register data

15.7 The register system and data warehousing

15.8 Conclusions

Chapter 16: Conclusions

References

Index

Register-based Statistics

WILEY SERIES IN SURVEY METHODOLOGY

Established in Part by WALTER A. SHEWHART AND SAMUEL S. WILKS

Editors: Mick P. Couper, Graham Kalton, Lars Lyberg, J. N. K. Rao, Norbert Schwarz, Christopher Skinner

A complete list of the titles in this series appears at the end of this volume.

Title Page

Preface

From the preface to the first edition

Register surveys are becoming increasingly common within a growing number of national statistical offices. However, they are also common within enterprises and other organisations, where data from the organisation’s own administrative systems are used to produce statistics on, for example, production, sales and wages.

Although register-based statistics are the most common form of statistics, no well-established theory in the field has existed up to now. There have been no well-known terms or principles, which have made the development of both register-based statistics and register-statistical methodology all the more difficult. As a consequence of this, ad hoc methods have been used instead of methods based on a generally accepted theory.

Many countries are investigating the possibilities to use an increasing amount of administrative data for statistical purposes. It is necessary to reduce response burden and costs; increasing nonresponse in censuses and sample surveys also makes this new strategy necessary. A new approach is necessary and register surveys require that suitable statistical methods be developed.

We have studied the requirements for register-based statistics through analysis of Statistics Sweden’s system of statistical registers. Since 1994, we have devoted an increasing part of our work, at the Department of Research and Development at Statistics Sweden, to the study of register surveys. We have also worked together with a number of manufacturing enterprises and analysed their administrative data for the purposes of management. These experiences are also used in this book.

The first version of this book was published in 2004 in Swedish. It has been used in a number of study groups within Statistics Sweden. Around 50 people at Statistics Sweden have read and commented on different parts of the first Swedish version of this book. In addition, several individuals were interviewed to provide material for different examples and methodological sections.

The study groups based on the Swedish book gave us a very good overview of methodological problems regarding the register-based statistics produced by Statistics Sweden and helped us in our work with the first edition of the English version that was published in 2007.

Our work on the second edition

We have used the first edition in a number of courses given in Europe and Latin America. The first edition was translated into Spanish by INEGI, the national statistical office in Mexico. It was very important for us to have the opportunity to discuss register-based statistics with colleagues from Latin America and learn about their quite different preconditions regarding administrative data and statistics production. Our experiences from these courses and discussions have been incorporated in the new edition.

Since 2010 we have worked together with Professor Thomas Laitila at Örebro University. He has inspired us to think about the entire production system at a national statistical office. In the first edition we mainly discussed the register system, but in the second edition we also discuss the production system as a whole. Together with Thomas Laitila, we have worked with a research project regarding the quality of administrative data for economic statistics. The main results of this project are used in the new edition.

Our supporters and sources of inspiration

Our work with register-based statistics at Statistics Sweden was supported by Jan Carling, Director General 1993–1999, and Svante Öberg, Director General 1999–2005. Their active support was necessary for the success of our work.

Our courses in Latin America have been sponsored by the Inter-American Development Bank (IDB) and the United Nations Population Fund (UNFPA). The Spanish translation of the first edition was sponsored by the IDB. Finally, the research project on the quality of administrative data for economic statistics was a part of the BLUE-ETS project financed by the European Commission. Thanks to these sponsors, we have acquired experiences that have been very important for our work on the second edition.

Professor Carl-Erik Särndal has been a very important discussion partner during our work on the book. We have discussed important and difficult issues with him from the beginning of our work with the Swedish version to when we completed the second English edition. His broad experience from statistical offices in different countries and his background as a specialist in sample surveys have been enormously useful.

It is our hope that Register-based Statistics – Statistical Methods for Administrative Data and its proposals will stimulate the discussion of register statistics and give support to those who work with administrative data at national statistical offices.

Örebro, Sweden

Anders Wallgren
Britt Wallgren
ba.statistik@telia.com

CHAPTER 1

Register Surveys – An Introduction

Three types of statistics based on microdata are published by national statistical offices – statistics based on sample surveys, statistics based on censuses and statistics based on administrative registers. This book deals with the third type, statistics based on administrative registers, where instead of collecting data through sample surveys and censuses, administrative registers from different sources are adapted and processed to make the data suitable for statistical purposes. This kind of survey is called a register survey.

We introduce a number of concepts and principles that are used when discussing register surveys. These concepts and principles form the basis for a theory of this type of survey. We primarily discuss register surveys at national statistical offices. There is growing interest in this area; many countries increasingly use administrative data for statistical purposes, and there is a growing demand for a theory of register surveys.

1.1 The purpose of the book

Our main purpose is to describe and explain the methods that should be used for register surveys. Conducting a register survey means that a new statistical register is created with existing sources. The statistical register is then used to produce estimates required for the survey. What methods should be used in creating such a statistical register? One or more administrative registers are used when a new statistical register is created and the statistical register can differ from the administrative sources in many ways.

A system of statistical registers consists of a number of registers that can be linked to each other. In the Nordic countries, the national statistical offices have developed systems of registers that are used in the production of statistics. When new statistical registers are created, this register system becomes an important source that can be used together with different administrative sources. Another purpose of the book is to explain how such register systems should be designed and used in the production of statistics.

When a national statistical office starts using more and more administrative sources, the statistical production system of that office will gradually change. From a system based on enumerators or interviewers, address lists or maps, the system will become increasingly register-based. Sample surveys will be based on the Population Register or the Business Register instead of address lists or maps – variables in sample surveys can come from administrative registers as well as from telephone interviews or questionnaires. In addition to the change in methods used for sample surveys, new kinds of register-based statistics can also be produced. A third purpose of the book is to explain how administrative registers can be used to change the statistical production system of a national statistical office to improve cost efficiency and statistical quality.

Preconditions in different countries

The Nordic countries started to use administrative registers during the 1960s when paper-based administrative registers were transformed into computer-based flat files. The preconditions for using administrative registers for statistical purposes were good. This explains why the Nordic statistical offices now have access to large amounts of administrative data,1 and that the quality of these data is high in comparison with most other countries. Consequently, it has been possible to create statistical register systems that have made statistics production efficient and even to conduct completely register-based population and housing censuses. Identifying variables as identity numbers for persons and enterprises have high quality and deterministic matching is therefore easy.

The preconditions for using administrative data in many countries are today not as good, and changing the production system into a register-based system will take many years. During that period, administrative systems will gradually be improved, so many other countries will be able to use administrative data efficiently in the future. Therefore, a clear understanding of the Nordic experiences from the beginning will facilitate development in new register countries.

However, we also discuss problems that arise in statistical offices in countries without the same preconditions. In North America, there is another tradition of working with administrative data. When identifying variables are of lower quality and coverage of administrative systems is poorer, methods have been developed for linking records and estimating population size that are important to use under these circumstances.

Our aim is to present statistical methods and principles of general interest, and we rely mostly on experiences and case studies from Statistics Sweden to illustrate these general methodological issues. As a complement to this aim, we also present some cases from new register countries that have recently started to develop register-based statistics.

We started writing books on register-based statistics during the 1990s, and during these years we have had access to registers and colleagues at Statistics Sweden. This access to a fully register-based production system has been vital for analysing and discussing register-based statistics.

Case studies are essential – in a book on register-based statistics we cannot present ideas with formulas as in books on sampling theory. We use case studies based on real data and charts with small miniature registers to illustrate register-statistical methods and quality issues.

1.2 The need for a new theory and new methods

Sample surveys are based on methods that have been derived from an established theory – sampling theory. This theory has been developed within the academic world and statistical offices, and consists of terms and principles that are generally well known. Scientific literature and journals develop and spread the methodologies for sampling and estimation. Because the terms and principles are well known, people working with sample surveys can easily communicate and exchange their experiences.

Censuses with their own data collection are based on a long tradition of population censuses and the collection of data from local authorities, schools and enterprises. Measurement errors, design of questionnaires and nonresponse are methodological issues that also apply to sample surveys. Censuses and sample surveys are closely related in terms of methodology – censuses are often considered as special cases where the sample is the entire population.

Although register-based statistics are a common form of statistics used for official statistics and business reports, no well-established theory in the field exists. There are no recognised terms or principles, which makes the development of register-based statistics and register-statistical methodology all the more difficult. As a consequence, ad hoc methods are used instead of methods based on a generally accepted theory.

One important reason for this shortfall is that the subject field of register surveys is not included in academic statistics. Statistical theory within statistical science is understood as consisting of probability theory and statistical inference. Sampling theory is included within this theoretical school of thought, but register surveys based on total enumeration are not.

Unfortunately, statistical science has so far not included any theory on statistical systems. Statistical offices, larger enterprises and organisations do not often carry out separate surveys. It is more common that statistical information systems are built, which constantly generate new data. A statistical theory is necessary to describe the general principles and to develop the conceptual apparatus for such statistical information systems. Register surveys should be included in this theory. We formulate four basic principles for using administrative registers (Chart 1.1).

Chart 1.1 Four principles for using administrative registers for statistics

We use these principles in the book and gradually introduce the register-statistical terms that are needed for the discussions.

Chart 1.2 illustrates the present situation. Estimates from four different surveys are compared, and these comparisons show clearly that the systems approach often is missing in the work with statistical surveys. People are fully occupied with their own surveys and different surveys are also published at different points in time. As a rule most estimates are unique for one survey, but in Chart 1.2 we have found one identical variable and created the table with corresponding estimates from each survey. If we look at one survey at a time, we do not see any errors except for the sample survey in (4) where we have margins for the sampling error. But when we look at the four surveys together, we understand that there must be more serious errors in these surveys. We thus need a theory for systems of surveys and new methods for quality assessment. We return to this example in later chapters.

Chart 1.2 Employees by economic activity, November 2004, thousands

Why are there such large differences between the surveys? The estimates for mining, quarrying and manufacturing can be 636 or 717 thousands – the inconsistencies are more serious than the sampling error. The methodological work should consist of three steps: compare surveys and find errors and inconsistencies; find out why we have these inconsistencies; and finally, reduce the errors and inconsistencies.

Chart 1.2 also illustrates that we only have one established way of giving a numerical description of the quality of published estimates – margins for the sampling error. There is no commonly used way of describing the quality of register-based statistics. However, the non-sampling errors of sample surveys are as a rule not described in the same clear manner as the sampling errors; here we also lack methods for giving a numerical description of the quality of published estimates.

In 1995, Statistics Denmark published Statistics on Persons in Denmark – A Register-based Statistical System. The Danish book presents a systematic review of register-statistical work and describes how to design a well-prepared register system. The book was the first attempt to create a theory for register-based statistics and to describe the methods that are used. We build on and add to that work in this book.

1.3 Four ways of using administrative registers

When a statistical office plans to use administrative registers for statistical purposes, the office faces a survey design issue. How should the new sources be used? How should the existing surveys be modified or reduced? To answer these questions the administrative sources should be analysed by experienced subject-matter specialists and methodologists with a good overview of the production system.

An administrative register or source can be used in four different ways:

1. Completely alone.
If the source has good coverage and the variables in the source are of good quality, then the source can be used alone for producing statistics. Trade statistics based on only administrative registers with monthly data from Customs are an example of a source that many countries use alone for statistics production.
2. Alone, but combined with a base register.
The Population Register and the Business Register are two important base registers that are used for all surveys regarding persons or enterprises in the Nordic countries. Base registers are discussed in Chapter 5. If an administrative register or source is combined with a base register, the quality can be improved and controlled. It will then be possible to produce consistent register-based statistics. The base register contains important classification variables that can be used together with the administrative source. The Annual Pay Register in Section 1.5.4 is an example of using a source in this way.
3. In combination with a base register and other administrative registers.
In many cases an administrative register does not have sufficient coverage and the variable content is too limited. Then it is not advisable to use the source alone for statistics production. But if many sources are combined, it may often be possible to use the combined data set to produce register-based statistics. We mention two examples of this kind.
Example: In the Swedish Income and Taxation Register of persons, about 30 different sources are used regarding different kinds of income. If all these different kinds of income are combined, it is possible to create disposable income of good quality for all persons.
Example: A business register at a national statistical office is based on administrative sources. With five sources we created a Business Register for Sweden containing all enterprises active during a specific year. Each source consists of the legal units in one taxation system. In the table below, undercoverage and overcoverage of the sources are compared with our final Business Register. The administrative object sets in each source are adequate for each of the five taxation systems. Taken alone, each source is of low statistical quality; however, if all sources are combined, the coverage is good.
Over- and undercoverage in five administrative sources, per cent of all legal units

4. To improve other surveys, i.e. to improve the production system.
Example: There was no information on economic activity for some small enterprises in the Business Register. In the yearly income tax returns from small enterprises, there is text information from the enterprise that describes economic activity. This text was automatically coded into economic activity. In this way the yearly income tax returns were used to improve the Business Register.

In the Nordic countries, most register surveys use a base register as in 2 and 3 above. New register countries that have not yet developed good base registers will start with register surveys of the simple kind as in 1 above. When base registers have been developed, it will be possible to create register surveys according to 2 and 3.

1.4 Preconditions for register-based statistics

Preconditions differ between countries for sample surveys, censuses and register surveys; hence, the preconditions for statistical methods are different. The choice between cluster sampling and one-stage sampling depends on whether you have a Population Register or if you must use address lists. Regression estimation and calibration are methods that depend on the number and quality of available register variables. This means that an increased use of administrative registers will change the preconditions for all kinds of surveys.

For register surveys, the differences between countries are even more significant. Legislation on national registration and the taxation of persons and enterprises determine the character of the administrative systems that are used in each country. The legislation regarding statistical production and protection of statistical data also differs, and as a consequence certain methodological issues are important in some countries but not in others. The two main preconditions for using administrative registers for statistical purposes are stated in Chart 1.3.

Chart 1.3 Two preconditions for using administrative registers for statistics

1.4.1 Reliable administrative systems

Reliable administrative systems will generate data of good administrative quality. Good administrative quality is a necessary but not sufficient condition for good statistical quality. The systems for tax administration and welfare programmes will gradually develop and change, and these changes will determine what administrative data can be used for statistical purposes in the future. It is therefore important that national statistical offices maintain close and long-term relations with administrative authorities and politicians.

The long-term strategy requires high-level contacts to promote strategic changes that will improve statistics production. The statistical office must explain to the administrative authorities how their data are used for statistical purposes. The statistical office also needs detailed information on how the administrative systems are organised and what changes are planned. Close and long-term contacts at all levels are required for these purposes.

What aspects of national administrative systems are important for statistical offices? We note two such aspects here, coverage and identity codes.

Coverage – the systems should cover all

The Nordic systems for child benefits are good examples. All children in defined age groups are entitled to a sum of money. All parents want the entitlement – but to receive the money, the parents must be registered as parents to the child in question and national identity numbers are required for the parents and child. This system covers all children and all parents. As the information in the system’s registers is maintained and updated, all persons in the country will gradually be covered and the register will contain administrative, but also statistically important, links between all parents and children.

It is important for good coverage that the administrative systems cover both urban and rural populations, rich and poor citizens, and small and big enterprises. The ideal is that there is no selectivity. If suitable methods are not developed, selectivity will result in biased statistical estimates. For instance, in the Nordic countries all seriously ill persons will see a doctor, and all doctors know that cancer patients should be reported to the National Cancer Register. In this way we can be almost absolutely sure that all patients with a cancer diagnosis are in the Cancer Register. If rural or poor persons are underrepresented, estimated cancer incidence and mortality figures would be of low quality.

Unified systems of identity codes

Identities are important in administrative systems. Legally important relations between persons, such as husband and wife, or parents and children, are registered with the identities of the persons in question. In many registers the legally important relations between owners and different kinds of property are recorded with both the identities of owners and identity of property. For taxpayers, it is important that the tax paid is recorded together with the identity of the taxpayer. It is therefore in the interest of each taxpayer to use a correct identity in each transaction. The legal importance of identities explains why identity data as a rule are of high quality in many administrative sources.

The best way to handle identities in administrative systems is to use national identity numbers. Persons, enterprises and property should be given unique identity numbers that are used in all administrative systems in the country, and the same number should follow each person, enterprise or property over its lifetime.

Not only will administration become efficient; the statistical production system will become efficient when administrative data are used for statistical purposes, as it will be possible to link records and create important statistical comparisons. With unique national identity numbers, record linkage will be easy and the risk of false matches and false non-matches will be low. The statistical possibilities that national identity numbers create will be explained in the following chapters.

It is advantageous if the identity numbers have no relation to any attributes of the objects that are to be identified. For example, identity numbers for persons should not depend on name, sex, or address of the persons, because such attributes can change over time. Throughout the book we will use the abbreviation PIN for national identity numbers for persons and BIN for national identity numbers for legal units representing enterprises.

1.4.2 Legal base and public approval

There are preconditions concerning legal base and public approval that make possible the efficient use of administrative registers for statistics. These preconditions are discussed in UN/ECE (2007) and we build on that discussion here.

Legislation determines what data are generated

The national administrative systems for taxation and welfare are based on legislation that determines the kind of administrative data that are generated within these systems. If, for example, citizens pay income tax to municipalities, then the authorities must know where each citizen lives. The municipal taxation and welfare systems are the legal base for the Nordic administrative population registers. They are used not only for taxation and municipal welfare, but also for elections where the population register defines where each voter votes. For statistical purposes, this creates very good links between persons and geography that facilitate regional statistics. The administrative registers are updated every day, which makes possible timely monthly demographic statistics.

Legislation to improve the national statistical system

Politicians want to reduce the response burden of persons and enterprises as well as the direct costs for the production of community statistics.

– Legislation should provide the national statistical offices access to administrative microdata including identities, and the right to use the data for official statistics and research.
– Legislation should provide statistical offices the authority to match data from different sources and use data that were not originally generated for statistical purposes.
– Legislation could also instruct statistical offices to first use data from administrative registers and to conduct sample surveys or censuses only if available administrate data are insufficient.
– Some laws have the sole purpose of making register-based housing and population censuses possible. For example, the Nordic parliaments have decided that all employers must provide information on where all employees work – the local unit address for all. This information is given with income statements with data on employer identity, local unit identity, employee identity and wages and preliminary tax paid. These income statements play an important role in the Nordic statistical systems, as we obtain important links between three different object types. The parliaments have also decided that all persons should be registered at the dwelling where they live. It will then be possible to create statistics for households defined by the common dwelling in the register-based census.

Legislation on data protection

According to the second precondition in Chart 1.3, a national statistical office should have access to administrative registers kept by public authorities. This right should be supported by law and the protection of privacy must also be protected by law. Legislation that gives a statistical office access to administrative data is discussed above, and the protection of privacy and integrity are discussed below.

The principle of one-way traffic is important for data protection. Microdata can go from administrative authorities to the statistical office but never in the reverse direction.

The legislation on data protection should rest on a reasonable balance between protection of integrity on the one hand and increased costs and difficulties for statistics production on the other. An important task for top management at a national statistical office is to explain the consequences generated by proposed legislation to lawyers and politicians.

Public approval

The cooperation between register authorities and national statistical offices should be open and transparent. The fact that administrative data are used for statistical purposes should not be kept quiet; instead, the benefits and the efforts to protect integrity should be explained in open discussion and public debate.

It is important to explain that individual records regarding persons are anonymous in statistics production, in contrast to how administrative authorities handle the same data.

If the national statistical office has a good reputation as trustworthy, it will be easier to gain access to administrative data for statistics production. However, one mistake in the protection of integrity can immediately destroy this reputation.

Persons and enterprises do not want to be required to report to both an administrative authority and the national statistical office. Not having to do so will make public opinion more favourable to the use of administrative data for statistical purposes. It will become more difficult to motivate the double provision of data – why respond to a questionnaire on the enterprise’s turnover when you also submit a value-added tax return to the Tax Agency which includes the same information?

Evidence that double provision of data to Statistics Sweden and to another authority is regarded as unreasonable can be seen in this newspaper clipping:

Translated from a newspaper article:

Refuse to send statistics to Statistics Sweden!

Mr R from the B-farm thinks that the authorities should be able to find the information from their own registers. Mr R refuses to send in statistics to Statistics Sweden. Because he already sends in information every other week to the Swedish Board of Agriculture, he thinks that the authorities should cooperate with each other instead. …

1.5 Basic concepts and terms

Two principles form the basis of this book – the survey approach to administrative data and the systems approach. The survey approach means that we discuss estimates, estimators and quality as in a book on sample surveys. The systems approach builds on the register system concept that is introduced in Chapter 4 and is used throughout the book. We also discuss the production system at a national statistical office and the role of administrative registers in the design and development of that system.

We discuss three concepts in this section: what is a statistical survey, what is a register and what is a register survey? We also give examples of register surveys that illustrate some important principles discussed in later chapters: The Income and Taxation Register is a survey of persons and households and the Quarterly and Annual Pay Registers are business surveys.

1.5.1 What is a statistical survey?

This term is a central term used by statisticians at all national statistical offices. For many statisticians, however, the term is synonymous with sample survey. This will cause confusion when we discuss statistics based on administrative registers.

To avoid this confusion, we follow the distinction between different kinds of surveys that Statistics Canada (2009) use in their Quality Guidelines. The guidelines are written with censuses and sample surveys as the main focus. In this book, we focus on register surveys (3 below), but also discuss and compare other survey methodologies.

Statistics Canada, Quality Guidelines:

The term survey is used generically to cover any activity that collects or acquires statistical data. Included are:

1. a census, which attempts to collect data from all members of a population;
2. a sample survey, in which data are collected from a (usually random) sample of population members;
3. collection of data from administrative records, in which data are derived from records originally kept for non-statistical purposes;
4. a derived statistical activity, in which data are estimated, modelled, or otherwise derived from existing statistical data sources.

Estimates of, for example, number of employees by industry (as in Chart 1.2) can be based on a census, on a sample survey, or on a register survey. We can choose between these three different survey methodologies to estimate the same parameters. This is the reason why we have chosen to use the survey approach to administrative data – register surveys are only a new alternative to the two other well-established survey methods.

The forth survey method above is the method that is used for the National Accounts. The National Accounts survey is based on a model-based compilation of macrodata (or estimates) from a system of economic surveys. Chart 1.4 compares the four kinds of surveys.

Chart 1.4 The four different survey methodologies

Sample surveys are based on a mathematical theory – probability and inference theory. Censuses and sample surveys are based on a non-mathematical survey methodology based on behavioural science – psychology and cognition are important aspects that are used to discuss errors that arise during the collection of statistical data through interviews and questionnaires.

Register surveys require a non-mathematical theory based on a systems approach. Macrodata surveys should also be based on a theory of systems of surveys. We discuss these issues later in this book when we introduce the concept of survey system design.

1.5.2 What is a register?

An administrative register is maintained to store records on all objects to be administered, and the administrative process requires that all objects can be identified. The following definition is valid for administrative and statistical registers:

A register aims to be a complete list of the objects in a specific group of objects or population. However, data on some objects can be missing due to quality deficiencies.

Data on an object’s identity should be available so that the register can be updated and expanded with new variable values for each object.

Complete listing and
known identities are thus the characteristics of a register.

Catalogue, directory, list, register, registry are different terms for the same concept. We will only use the term register.

The following are examples of registers:

– Civic, civil or national registration of the population in a country results in registers of citizens, births and deaths.
– Income self-assessments from persons give registers of all taxpayers for a given year.
– In Sweden, enterprises with a turnover of SEK 40 million or more should report monthly. This gives monthly registers of all enterprises that have reported. For smaller enterprises, we obtain quarterly or yearly registers.
– All export and import transactions are registered by Customs. Monthly registers are created with all transactions for a specific month.
– A census file with data from a housing and population census is a register if there are identities of the persons in the file.

The identities used in register processing can either be identity numbers that are unique within a national administrative system or an identity number in a subsystem with keys to the identities in other systems. It is also possible to use identities defined by, for instance name, address, date of birth and place of birth.

These identities will be used in deterministic matching of the objects in different registers, where the aim is to find identical or related objects in two registers. In deterministic matching, two records are linked if the identifiers agree exactly. This is the most efficient method when the identifying variables are of good quality.

Because person PIN3 is not in the population register and person PIN8 is not in the administrative income register, the combined register after deterministic matching will have two records with missing values due to this non-match.

Many administrative registers consist only of persons or enterprises of a defined category. Only persons with income are in the administrative income register in the example in Chart 1.5. When such registers are combined with the population register, the non-match will generate missing values. Zero income must be imputed for persons not in the administrative income register, such as person PIN8. Person PIN3 is not in the population register and if that person is not found in any other register the non-match will result in missing values (*) for sex and age.

Chart 1.5 Deterministic matching with Personal Identity Numbers, PIN

1.5.3 What is a register survey?

The original data are generated in public administrative systems. Definitions of object sets, objects and variables are adapted to administrative purposes. Every authority carries out controls, corrections and other processing suited to their administrative aims.

When an authority delivers data to a national statistical office, further selections and processing may be carried out to meet the needs of the statistical office. The authorities also have metadata as definitions, administrative rules and quality aspects, based on the administrative authority’s experiences and investigations. This information is important for those receiving the data at the statistical office.

It is generally not a good idea to produce statistics directly from the received administrative registers because these are not adapted to statistical requirements. The object sets, object definitions and variables need to be edited, and as a rule it will be necessary to carry out some processing so that the register fulfils the statistical requirements for population, objects and variables. The register-statistical processing, which aims to transform one or several administrative registers into one statistical register, should be based on generally accepted statistical methods.

Chart 1.6a shows three important components of this work. We have found that people have a tendency to use administrative concepts as they are, and in some cases this can be acceptable – but in other cases it can be unacceptable. The three issues of how to define population, units and variables of a statistical register are important for the quality of the statistics to be produced with the newly created statistical register.

Chart 1.6a From administrative registers to statistical registers

A statistical population or administrative object set consists of N objects or units or elements. Of these three synonyms, we will as a rule use the term object for the units in an administrative object set and the term statistical unit for the units in a statistical population. The register-statistical processing is described in Chart 1.6b.

Chart 1.6b From administrative registers to statistical registers

1.5.4 The Income and Taxation Register

The Income and Taxation Register (I&T) is an important part of Statistics Sweden’s register system. It is used to describe income distribution and for regional income statistics, and it is the basis for longitudinal income registers used by university researchers.

This register utilises many administrative sources, and many administrative variables are used to create important statistical variables. Besides these administrative sources, it is necessary to use the register system at Statistics Sweden: the Population Register is used to define the population of the Income and Taxation Register, and important classification variables are imported from other registers in the system to the Income and Taxation Register.

1. Data generation at the National Tax Agency
The annual income self-assessment is based on tax returns from income earners and the taxation decisions of the local tax authority. Both the income earner and the tax authority use statements of earnings for salary, sickness benefits and interest payments that are the responsibility of employers, social insurance office and finance companies. The National Tax Agency ultimately compiles this information. Tax returns, statements of earnings and taxation decisions can be changed and supplemented. Data for one person can thus be very complex.
2. Microdata deliveries to the Income and Taxation Register
The Swedish National Tax Agency annually creates databases that contain information on Sweden’s population. The data files for one year – containing around 9 million records, each with around 300 variables – are delivered to the Income and Taxation Register at Statistics Sweden.
3. Metadata to the Income and Taxation Register
Record descriptions with names and definitions of variables accompany the deliveries from the National Tax Agency. Tax return forms, statement of earnings forms, taxation decisions and tax return instructions are also necessary for the correct interpretation of the data.
4. Editing of data
The I&T Register receives data from many different suppliers outside and inside Statistics Sweden. External data are edited. Data from other Statistics Sweden registers have already been edited. Contacts with suppliers are important to obtain knowledge of changes in the administrative system, which in turn is important to ensure the quality of the register statistics – administrative changes should not be interpreted as actual income changes.
5. Matching and selections
There is a large number of registers that should be processed to create the different sub-registers that are included in the Income and Taxation Register. Records from different sources are matched using Personal Identification Numbers (PIN), and aggregation is carried out at the same time, i.e. all the statements of earnings data for a specific person are aggregated so that the person’s income from work can be put together. One type of processing is to select persons aged 16 and older who were also parts of the population on 31 December.
6. Derived objects are created
More information on certain relations helps to form household units. Between adults, the relations married or cohabiting adults with children in common
7. Derived variables are created
A large number of derived income variables are formed. For instance, the wage or salary amounts are aggregated from the different earnings data to become an individual’s income from work. Every person’s total income from work and capital plus transfer payments minus tax becomes the person’s disposable income. For households, variables such as household type, number of consumption units and disposable income are formed.