Introduction

The insee package gathers tools to easily download data and metadata from INSEE's BDM database.

It uses SDMX queries under the hood. Have a look at the detailed SDMX webservice page.

The first version of the package was published on CRAN on 2020-07-29.

Proxy issues

Requirement for INSEE employees

To use insee from behind a proxy server, set the proxy environment variables as follows.

Sys.setenv(http_proxy = "my_proxy_server")
Sys.setenv(https_proxy = "my_proxy_server")
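
A slightly more defensive variant can be sketched as follows. The helper name set_proxy_if_unset and the proxy address are placeholders introduced for illustration (neither comes from insee). The helper leaves an existing setting untouched and then reads the values back.

```r
# Hypothetical helper, not part of insee: set a proxy variable only if it is unset.
set_proxy_if_unset <- function(name, value) {
  if (!nzchar(Sys.getenv(name))) {
    args <- list(value)
    names(args) <- name
    do.call(Sys.setenv, args)
  }
  invisible(Sys.getenv(name))
}

proxy <- "http://proxy.example.org:8080"  # placeholder address (assumption)
set_proxy_if_unset("http_proxy", proxy)
set_proxy_if_unset("https_proxy", proxy)

# Read the values back to confirm what R will use
Sys.getenv(c("http_proxy", "https_proxy"))
```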

Installation & Loading

You can install insee with the following code:

# Get the development version from GitHub (required for the tutorial)
# install.packages("devtools")
devtools::install_github("hadrilec/insee")

# CRAN version
# install.packages("insee")

# Load the package
library(insee)
# the tutorial's examples use tidyverse packages
library(tidyverse)

Functionalities

This section will give you an overview of what you can do with insee.

Each series has two identifiers: the SDMX identifier and the so-called idbank. Either can be used to download data.

Datasets List

INSEE's BDM database offers more than 200 datasets. The get_dataset_list() function returns the dataset catalogue:

insee_dataset = get_dataset_list() 
id | Name.en | Name.fr | url | n_series
BALANCE-PAIEMENTS | Balance of payments | Balance des paiements | https://www.insee.fr/fr/statistiques/series/103212755 | 197
CHOMAGE-TRIM-NATIONAL | Unemployment, unemployment rate and halo by sex and age (ILO) | Chômage, taux de chômage par sexe et âge (sens BIT) | https://www.insee.fr/fr/statistiques/series/103167923 | 166
CLIMAT-AFFAIRES | Business climate composite indicators | Indicateurs synthétiques du climat des affaires | https://www.insee.fr/fr/statistiques/series/103047029 | 3
CNA-2010-CONSO-MEN | Households’ consumption - Results by product, function and durability | Consommation des ménages - Résultats par produit, fonction et durabilité | https://www.insee.fr/fr/statistiques/series/102331845 | 2247
CNA-2010-CONSO-SI | Final consumption expenditure by institutional sectors - Results by transaction and product | Dépenses de consommation finale par secteur institutionnel - Résultats par opération et produit | https://www.insee.fr/fr/statistiques/series/102809534 | 1391
CNA-2010-CPEB | Production and operating accounts by branch | Comptes de production et d’exploitation par branche | https://www.insee.fr/fr/statistiques/series/102852781 | 2739
CNA-2010-CSI | Institutional sectors accounts - Annual results | Comptes des secteurs institutionnels - Résultats annuels | https://www.insee.fr/fr/statistiques/series/102321506 | 1173
CNA-2010-DEP-APU | General government expenditure - Result by function and operation | Dépenses des administrations publiques - Résultats par fonction et opération | https://www.insee.fr/fr/statistiques/series/102334196 | 4400
CNA-2010-DETTE-APU | Maastricht general government debt and public deficit | Dette et déficit des administrations publiques au sens de Maastricht | https://www.insee.fr/fr/statistiques/series/103212800 | 72
CNA-2010-EMPLOI | Domestic employment, hours worked and hourly productivity | Emploi intérieur, durée effective travaillée et productivité horaire | https://www.insee.fr/fr/statistiques/series/102800357 | 702
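
Since the catalogue is an ordinary data frame, it can be searched by keyword. The sketch below works offline on a few rows copied from the output above instead of calling get_dataset_list(). It assumes the column names id, Name.en and n_series shown in that output.

```r
# A few rows copied from the catalogue output above
# (offline stand-in for the result of get_dataset_list())
catalogue <- data.frame(
  id = c("BALANCE-PAIEMENTS", "CLIMAT-AFFAIRES",
         "CNA-2010-CONSO-SI", "CNA-2010-EMPLOI"),
  Name.en = c(
    "Balance of payments",
    "Business climate composite indicators",
    "Final consumption expenditure by institutional sectors - Results by transaction and product",
    "Domestic employment, hours worked and hourly productivity"
  ),
  n_series = c(197, 3, 1391, 702),
  stringsAsFactors = FALSE
)

# Keep the datasets whose English name mentions "consumption"
hits <- catalogue[grepl("consumption", catalogue$Name.en, ignore.case = TRUE), ]
hits$id
```

The same grepl() call should apply unchanged to the full catalogue returned by get_dataset_list(), provided the column names match.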

Series Keys List

INSEE's BDM database currently offers more than 140,000 series. The get_idbank_list() function returns the series catalogue.

idbank_list = get_idbank_list()
nomflow idbank cleFlow
BALANCE-PAIEMENTS 001694056 M.BALANCE_DES_PAIEMENTS.CREDITS.009.VALEUR_ABSOLUE.FE.EUROS.BRUT.SO
CHOMAGE-TRIM-NATIONAL 001688370 T.CTCHC.VALEUR_ABSOLUE.FM.1.00-24.INDIVIDUS.CVS
CLIMAT-AFFAIRES 001565530 M.CLIMAT_AFFAIRES.ALL_SECT.INDICE.FM.SO.BRUT
CNA-2010-CONSO-MEN 001691912 A.CNA_CONSO_MENAGES_FONCTION.PCH.P4.FON01.VALEUR_ABSOLUE..FR-D976.EUR2010.BRUT.2010
CNA-2010-CONSO-SI 001703713 A.CNA_CONSO_SI.S13.PCH.P3.A10-JZ.VALEUR_ABSOLUE.FE.EUR2010.BRUT
CNA-2010-CPEB 001710115 A.CNA_CPEB.A10-RU.IPCH.P1.INDICE.FE.SO.BRUT
CNA-2010-CSI 001719388 A.CNA_COMPTES_SI_EA.S0.EA.D7S.SO.FR-D976.EUROS_COURANTS.BRUT
CNA-2010-DEP-APU 001730327 A.CNA_DEP_APU.S1313.D1.VALEUR_ABSOLUE.FONTOTAL.FR-D976.EUROS_COURANTS.BRUT.2010
CNA-2010-DETTE-APU 001710846 A.CNA_FINANCES_DETTE.S13.VAL.VALEUR_ABSOLUE.FE.EUROS_COURANTS.BRUT
CNA-2010-EMPLOI 001693569 A.CNA_EMPLOI_INTERIEUR.A10-RU.S10.VALEUR_ABSOLUE.E10.FE.NOMBRE_ACTIFS_OCCUPES_PP.BRUT
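
The cleFlow column holds the SDMX series key, a dot-separated list of dimension values. As a minimal sketch, the key of the CLIMAT-AFFAIRES row above can be split into its positions. Naming the pieces dim1, dim2, ... is an assumption made here to mirror the dimN columns used in the filters later in this tutorial.

```r
# Series key copied from the CLIMAT-AFFAIRES row above
key <- "M.CLIMAT_AFFAIRES.ALL_SECT.INDICE.FM.SO.BRUT"

# Split on literal dots; fixed = TRUE avoids treating "." as a regex wildcard
dims <- strsplit(key, ".", fixed = TRUE)[[1]]

# Label each position (naming scheme assumed for illustration)
names(dims) <- paste0("dim", seq_along(dims))
dims
```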

Find a series key

The best way to download data is to find the right series key (idbank). This is not always easy, because the differences between series can be hard to tell apart, especially for non-French speakers. To make the search easier, the insee package provides the add_insee_title function, which retrieves the titles of idbanks in either English or French.

Using the function on the whole idbank dataset is not advised: each SDMX query is limited to 400 idbanks, so add_insee_title splits the list into several lists of at most 400 idbanks each, each requiring its own query. To limit this bottleneck, filter the idbank dataset before calling the function, as the following example shows. After retrieving the data, the split_title function can be applied to the dataframe to get more readable titles that are easy to use in plots.

idbank_list = get_idbank_list()

idbank_list_selected =
  idbank_list %>%
  filter(nomflow == "IPI-2015") %>% #industrial production index dataset
  filter(dim1 == "M") %>% #monthly
  filter(dim5 == "INDICE") %>% #index
  filter(dim8 == "CVS-CJO") %>% #Working day and seasonally adjusted SA-WDA
  #automotive industry and overall industrial production
  filter(str_detect(dim4,"^29$|A10-BE")) %>% 
  add_insee_title()

idbank_list_selected
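
To see why filtering first matters, the sketch below shows how a list of identifiers breaks into chunks of 400. It illustrates the arithmetic only and is not necessarily how add_insee_title is implemented. The identifiers are made up.

```r
# 1000 made-up identifiers, only to illustrate the chunking (not real idbanks)
ids <- sprintf("%09d", 1:1000)

chunk_size <- 400  # per-query limit stated in the text

# Assign each identifier to chunk 1, 2, 3, ... and split accordingly
chunks <- split(ids, ceiling(seq_along(ids) / chunk_size))

length(chunks)   # number of chunks, hence of queries
lengths(chunks)  # size of each chunk
```

Here 1000 identifiers give three chunks (400, 400 and 200), so three queries instead of one.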

Download data

Download using a list of idbanks

The get_insee_idbank function is designed to handle up to 1200 idbanks per call, so it is advisable to narrow down the list of idbanks passed to it. To go beyond that, set the limit argument to FALSE, which makes the function ignore its idbank limit.

library(tidyverse)
library(insee)

# the user can make a manual list of idbanks to get the data 
# example 1

data = get_insee_idbank("001558315", "010540726")

# using a list of idbanks extracted from the insee idbank dataset
# example 2 : households' confidence survey

idbank_dataset = get_idbank_list()

df_idbank = idbank_dataset %>%
  filter(nomflow == "ENQ-CONJ-MENAGES") %>%  #monthly households' confidence survey
  mutate(title = get_insee_title(idbank)) %>%
  filter(dim7 == "CVS") #seasonally adjusted

list_idbank = df_idbank %>% pull(idbank)

data = get_insee_idbank(list_idbank) %>% split_title()

# example 3 : get more than 1200 idbanks

idbank_dataset = get_idbank_list()

df_idbank = 
  idbank_dataset %>%
  slice(1:1201)

list_idbank = df_idbank %>% pull(idbank)

data = get_insee_idbank(list_idbank, firstNObservations = 1, limit = FALSE)
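
An alternative to limit = FALSE is to stay under the limit and download in batches. The sketch below is not part of insee: fetch_in_batches is a hypothetical helper, and fake_fetch is an offline stand-in used in place of get_insee_idbank so the mechanics can be checked without a network call.

```r
# Hypothetical helper, not part of insee.
# `fetch` is any function that takes a character vector of idbanks and
# returns a data frame; get_insee_idbank is the intended candidate.
fetch_in_batches <- function(ids, fetch, batch_size = 1200) {
  batches <- split(ids, ceiling(seq_along(ids) / batch_size))
  results <- lapply(batches, fetch)
  do.call(rbind, results)  # stack the per-batch data frames
}

# Offline stand-in for get_insee_idbank (made-up column name and identifiers)
fake_fetch <- function(ids) data.frame(IDBANK = ids, stringsAsFactors = FALSE)

out <- fetch_in_batches(sprintf("%09d", 1:2500), fetch = fake_fetch)
nrow(out)
```

With insee loaded, fetch_in_batches(list_idbank, get_insee_idbank) would presumably behave the same way, assuming every batch returns the same columns.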

Download using a dataset name

For some datasets, such as IPC-2015 (inflation), a filter is necessary.

insee_dataset = get_dataset_list() 

# example 1 : full dataset
data = get_insee_dataset("CLIMAT-AFFAIRES")

# example 2 : filtered dataset 
# the user can filter the data
data = get_insee_dataset("IPC-2015", filter = "M+A.........CVS.", startPeriod = "2015-03")

# in the filter, + selects several values within one dimension (here M and A), i.e. series matching either value
# an empty position means all available values

# example 3 : only one series
# by filtering with the full SDMX series key, the user will get only one series
data = 
  get_insee_dataset("CNA-2014-CPEB",
                    filter = "A.CNA_CPEB.A38-CB.VAL.D39.VALEUR_ABSOLUE.FE.EUROS_COURANTS.BRUT",
                    lastNObservations = 10)
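
Filter strings are easy to get wrong by one dot, so it can help to build them from a list with one element per dimension. The helper below, build_sdmx_filter, is a hypothetical name introduced for this sketch and is not an insee function. The three-dimension example is illustrative only; a real dataset such as IPC-2015 has more positions.

```r
# Hypothetical helper, not part of insee.
# `dims` is a list with one element per dimension, in key order:
#   - several values are joined with "+"
#   - character(0) leaves the position empty, meaning "all values"
build_sdmx_filter <- function(dims) {
  parts <- vapply(dims, function(values) paste(values, collapse = "+"), character(1))
  paste(parts, collapse = ".")
}

# Illustrative three-dimension key: two frequencies, any value, one adjustment
example_filter <- build_sdmx_filter(list(c("M", "A"), character(0), "CVS"))
example_filter
```

The resulting string can then be passed as the filter argument of get_insee_dataset, provided the list has as many elements as the dataset has dimensions.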

Support

Feel free to contact me with any question about this package using this e-mail address.