class: center, middle, inverse, title-slide .title[ # Processing data with the tidyverse package ] .subtitle[ ## R + Social Sciences ] .author[ ### ] --- <style type="text/css"> .remark-slide-content { font-size: 25px; padding: 1em 1em 1em 1em; } </style> --- # Initial setup for today -- - Have your own class project set up: `File -> New Project` -- - Create your working Rmd: `File -> New File -> R Markdown` -- - Load the following libraries ```r library(tidyverse) library(eph) library(questionr) # install.packages("paquete") # run only if a package is missing; replace "paquete" with the name of the package to install ``` -- - Load the EPH dataset `b_eph_ind <- get_microdata(year = 2019, period = 3, type = "individual")` --- class: inverse, middle, center # Shall we start? 🚀 *** --- class: inverse, middle, center # What is the [Tidyverse](https://www.tidyverse.org/)? *** --- # Tidyverse .pull-left[ #### `Tidyverse` is a collection of R packages designed for so-called "data science". #### They share the same design philosophy, so they work in harmony with one another. ] .pull-right[ <img src="../img/tidyverse.png" width="781" style="display: block; margin: auto;" /> ] --- class: inverse, middle, center # Why tidyverse? 
<html> <div style='float:left'></div> <hr color='#EB811B' size=1px width=1125px> </html> --- # __Why tidyverse?__ - ### Designed to be read and written by and for human beings -- - ### Functions designed for a whole workflow rather than for one specific task <img src="../img/circuito del dato.png" width="50%" style="display: block; margin: auto;" /> -- - ### Its community, built on the principles of open source and collaborative work --- # __Installation and use__ * Only once (per computer): ```r install.packages("tidyverse") ``` -- * At the start of every R or RStudio session: ```r library(tidyverse) ``` -- _This is not necessary (these packages are already installed with `tidyverse`):_ ```r install.packages("dplyr") install.packages("tidyr") install.packages("ggplot2") ``` --- # Roadmap ### Introducing the `dplyr` and `tidyr` packages .pull-left[ ## ✔️ dplyr ☑️️ `select()` ☑️️ `filter()` ☑️️ `mutate()` ☑️️ `rename()` ☑️️ `arrange()` ☑️️ `summarise()` ☑️️ `group_by()` ] .pull-right[ ## ✔️ tidyr ☑️ `pivot_longer()` ☑️ `pivot_wider()` <br> ## ✔️ magrittr ☑️ `%>%` (_the pipe_) ] *** ```r library(eph) b_eph_ind <- get_microdata(year = 2019, period = 3, type = "individual") ``` --- class: middle, center, inverse THE PIPE <img src="../img/pipe.png" alt="The pipe operator" width="7%"> *** _<p style="color:grey;" align="center">A way of writing</p>_ --- # THE PIPE <br><br> .pull-left[ ```r base_de_datos `%>%` funcion1 `%>%` funcion2 `%>%` funcion3 ``` ] .pull-right[ <img src="img/pipe_paso_a_paso.gif"> ] --- # magrittr - a way of writing <br><br> ### **Case:** I want to obtain the relative distribution of cases by sex: #### Functions: `table()` - `prop.table()` - `round()` --- # THE PIPE .pull-left[ ### **Without THE PIPE:** ```r # Step2(Step1(base_de_datos$variable)) prop.table(table(b_eph_ind$CH04)) ``` ``` 1 2 0.4818711 0.5181289 ``` ] -- .pull-right[ ### **With THE PIPE** ```r b_eph_ind$CH04 %>% # base_de_datos$variable table() %>% # Step 1 prop.table() # Step 2 ``` ``` . 
1 2 0.4818711 0.5181289 ``` ] --- class: middle, center, inverse <img src="../img/logo dplyr.png" width="30%" style="display: block; margin: auto;" /> --- # dplyr ## Functions in the dplyr package: <br> | __Function__ | __Action__ | | :--- | ---: | | `select()` | *selects or drops variables*| | `filter()` | *selects rows*| | `mutate()` | *creates / edits variables*| | `rename()` | *renames variables*| | `group_by()` | *groups the data by a variable*| | `summarize()` | *builds a summary table*| --- class: inverse, middle, center # __select()__ <html> <div style='float:left'></div> <hr color='#EB811B' size=1px width=1125px> </html> _<p style="color:grey;" align="center">Chooses or drops columns of a dataset</p>_ --- # select() ### The function has the following structure: ```r base_de_datos %>% * select(id, nombre) ``` <img src="../img/select_presentacion.png" width="65%" style="display: block; margin: auto;" /> --- # **Case** ### - **Indicator 1:** *Main labor market rates for the agglomerations of CABA and Partidos del GBA* ### - **Indicator 2:** *Indicator 1 by people's __sex__ and __age__.* -- According to the [**Diseño de registro**](https://www.indec.gob.ar/ftp/cuadros/menusuperior/eph/EPH_registro_3t19.pdf) (record layout), the working variables are: - **Agglomeration of residence** = `AGLOMERADO` - **Activity status** = `ESTADO` - **Sex** = `CH04` - **Age** = `CH06` - **Weighting factor** = `PONDERA` --- # **Case** ### Working libraries and data import: ```r library(tidyverse) library(eph) b_eph_ind <- read.table("entradas/usu_individual_T32019.txt", header = TRUE, sep = ";") ``` --- # select() - by variable name ### Select the columns I want from the dataset: ```r b_eph_ind_seleccion <- `b_eph_ind` %>% `select`(ESTADO, CH04, CH06, PONDERA) ``` -- ### Check the operation: ```r colnames(b_eph_ind_seleccion) ``` ``` [1] "ESTADO" "CH04" "CH06" "PONDERA" ``` --- # select() - by column position ```r b_eph_ind_seleccion <- b_eph_ind %>% select(`10, 12, 14, 28`) ``` -- ### Check the selection: ```r colnames(b_eph_ind_seleccion) ``` ``` [1] "PONDERA" "CH04" "CH06" "ESTADO" ``` --- count: false # Another way to select .panel1-select_1-auto[ ```r *b_eph_ind ``` ] .panel2-select_1-auto[ ``` # A tibble: 57,229 × 177 CODUSU ANO4 TRIMESTRE NRO_HOGAR COMPONENTE H15 REGION MAS_500 AGLOMERADO <fct> <int> <int> <int> <int> <int> <int> <fct> <int> 1 TQRMNOQ… 2019 3 1 1 1 43 S 2 2 TQRMNOQ… 2019 3 1 2 1 43 S 2 3 TQRMNOQ… 2019 3 1 3 1 43 S 2 4 TQRMNOQ… 2019 3 1 4 1 43 S 2 5 TQRMNOQ… 2019 3 1 2 1 43 S 2 6 TQRMNOQ… 2019 3 1 3 0 43 S 2 7 TQRMNOQ… 2019 3 1 4 0 43 S 2 8 TQRMNOQ… 2019 3 1 5 0 43 S 2 9 TQRMNOS… 2019 3 1 1 1 43 S 2 10 TQRMNOS… 2019 3 1 2 1 43 S 2 # ℹ 57,219 more rows # ℹ 168 more variables: PONDERA <int>, CH03 <int>, CH04 <int>, CH05 <fct>, # CH06 <int>, CH07 <int>, CH08 <int>, CH09 <int>, CH10 <int>, CH11 <int>, # CH12 <int>, CH13 <int>, CH14 <chr>, CH15 <int>, CH15_COD <int>, CH16 <int>, # CH16_COD <int>, NIVEL_ED <int>, ESTADO <int>, CAT_OCUP <int>, # CAT_INAC <int>, IMPUTA <int>, PP02C1 <int>, PP02C2 <int>, PP02C3 <int>, # PP02C4 <int>, PP02C5 <int>, PP02C6 <int>, PP02C7 <int>, PP02C8 <int>, … ``` ] --- count: false # Another way to select .panel1-select_1-auto[ ```r b_eph_ind %>% * select(12:16) ``` ] .panel2-select_1-auto[ ``` # A tibble: 57,229 × 5 CH04 CH05 CH06 CH07 CH08 <int> <fct> <int> <int> <int> 1 1 12/04/1963 56 2 1 2 2 24/09/1972 46 2 1 3 2 14/09/1998 20 1 1 4 1 11/04/2007 12 5 1 5 2 03/03/1981 38 2 4 6 2 17/12/2011 7 5 4 7 1 10/12/2013 5 5 4 8 1 27/02/2016 3 5 4 9 1 15/07/1965 54 3 4 10 1 19/08/2000 19 5 4 # ℹ 57,219 more rows ``` ] <style> .panel1-select_1-auto { color: black; width: 45.7333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-select_1-auto { color: black; width: 52.2666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-select_1-auto { color: black; width: NA%; 
hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- <img src="https://media.tenor.com/images/4474c747b4bba7b72172078cbf2e797b/tenor.gif" width="65%" style="display: block; margin: auto;" /> --- class: inverse, middle, center ## One more. --- count: false # Another way to select .panel1-select_2-auto[ ```r *b_eph_ind ``` ] .panel2-select_2-auto[ ``` # A tibble: 57,229 × 177 CODUSU ANO4 TRIMESTRE NRO_HOGAR COMPONENTE H15 REGION MAS_500 AGLOMERADO <fct> <int> <int> <int> <int> <int> <int> <fct> <int> 1 TQRMNOQ… 2019 3 1 1 1 43 S 2 2 TQRMNOQ… 2019 3 1 2 1 43 S 2 3 TQRMNOQ… 2019 3 1 3 1 43 S 2 4 TQRMNOQ… 2019 3 1 4 1 43 S 2 5 TQRMNOQ… 2019 3 1 2 1 43 S 2 6 TQRMNOQ… 2019 3 1 3 0 43 S 2 7 TQRMNOQ… 2019 3 1 4 0 43 S 2 8 TQRMNOQ… 2019 3 1 5 0 43 S 2 9 TQRMNOS… 2019 3 1 1 1 43 S 2 10 TQRMNOS… 2019 3 1 2 1 43 S 2 # ℹ 57,219 more rows # ℹ 168 more variables: PONDERA <int>, CH03 <int>, CH04 <int>, CH05 <fct>, # CH06 <int>, CH07 <int>, CH08 <int>, CH09 <int>, CH10 <int>, CH11 <int>, # CH12 <int>, CH13 <int>, CH14 <chr>, CH15 <int>, CH15_COD <int>, CH16 <int>, # CH16_COD <int>, NIVEL_ED <int>, ESTADO <int>, CAT_OCUP <int>, # CAT_INAC <int>, IMPUTA <int>, PP02C1 <int>, PP02C2 <int>, PP02C3 <int>, # PP02C4 <int>, PP02C5 <int>, PP02C6 <int>, PP02C7 <int>, PP02C8 <int>, … ``` ] --- count: false # Another way to select .panel1-select_2-auto[ ```r b_eph_ind %>% * select(CH03:CH10) ``` ] .panel2-select_2-auto[ ``` # A tibble: 57,229 × 8 CH03 CH04 CH05 CH06 CH07 CH08 CH09 CH10 <int> <int> <fct> <int> <int> <int> <int> <int> 1 1 1 12/04/1963 56 2 1 1 2 2 2 2 24/09/1972 46 2 1 1 2 3 3 2 14/09/1998 20 1 1 1 2 4 3 1 11/04/2007 12 5 1 1 1 5 2 2 03/03/1981 38 2 4 1 2 6 3 2 17/12/2011 7 5 4 1 1 7 3 1 10/12/2013 5 5 4 2 1 8 3 1 27/02/2016 3 5 4 2 1 9 1 1 15/07/1965 54 3 4 1 2 10 3 1 19/08/2000 19 5 4 1 1 # ℹ 57,219 more rows ``` ] <style> .panel1-select_2-auto { color: black; width: 45.7333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } 
.panel2-select_2-auto { color: black; width: 52.2666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-select_2-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- class: inverse, middle, center ## One more. --- count: false # Another way to select .panel1-select_3-auto[ ```r *b_eph_ind ``` ] .panel2-select_3-auto[ ``` # A tibble: 57,229 × 177 CODUSU ANO4 TRIMESTRE NRO_HOGAR COMPONENTE H15 REGION MAS_500 AGLOMERADO <fct> <int> <int> <int> <int> <int> <int> <fct> <int> 1 TQRMNOQ… 2019 3 1 1 1 43 S 2 2 TQRMNOQ… 2019 3 1 2 1 43 S 2 3 TQRMNOQ… 2019 3 1 3 1 43 S 2 4 TQRMNOQ… 2019 3 1 4 1 43 S 2 5 TQRMNOQ… 2019 3 1 2 1 43 S 2 6 TQRMNOQ… 2019 3 1 3 0 43 S 2 7 TQRMNOQ… 2019 3 1 4 0 43 S 2 8 TQRMNOQ… 2019 3 1 5 0 43 S 2 9 TQRMNOS… 2019 3 1 1 1 43 S 2 10 TQRMNOS… 2019 3 1 2 1 43 S 2 # ℹ 57,219 more rows # ℹ 168 more variables: PONDERA <int>, CH03 <int>, CH04 <int>, CH05 <fct>, # CH06 <int>, CH07 <int>, CH08 <int>, CH09 <int>, CH10 <int>, CH11 <int>, # CH12 <int>, CH13 <int>, CH14 <chr>, CH15 <int>, CH15_COD <int>, CH16 <int>, # CH16_COD <int>, NIVEL_ED <int>, ESTADO <int>, CAT_OCUP <int>, # CAT_INAC <int>, IMPUTA <int>, PP02C1 <int>, PP02C2 <int>, PP02C3 <int>, # PP02C4 <int>, PP02C5 <int>, PP02C6 <int>, PP02C7 <int>, PP02C8 <int>, … ``` ] --- count: false # Another way to select .panel1-select_3-auto[ ```r b_eph_ind %>% * select(starts_with("CH")) ``` ] .panel2-select_3-auto[ ``` # A tibble: 57,229 × 16 CH03 CH04 CH05 CH06 CH07 CH08 CH09 CH10 CH11 CH12 CH13 CH14 CH15 <int> <int> <fct> <int> <int> <int> <int> <int> <int> <int> <int> <chr> <int> 1 1 1 12/0… 56 2 1 1 2 0 4 1 <NA> 1 2 2 2 24/0… 46 2 1 1 2 0 4 2 3 1 3 3 2 14/0… 20 1 1 1 2 0 7 2 1 1 4 3 1 11/0… 12 5 1 1 1 2 4 2 0 1 5 2 2 03/0… 38 2 4 1 2 0 4 2 2 4 6 3 2 17/1… 7 5 4 1 1 1 2 2 1 1 7 3 1 10/1… 5 5 4 2 1 1 1 2 4 1 8 3 1 27/0… 3 5 4 2 1 1 1 2 0 1 9 1 1 15/0… 54 3 4 1 2 0 2 1 <NA> 3 10 3 1 19/0… 19 5 4 1 1 1 4 2 5 1 # ℹ 57,219 more 
rows # ℹ 3 more variables: CH15_COD <int>, CH16 <int>, CH16_COD <int> ``` ] <style> .panel1-select_3-auto { color: black; width: 45.7333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-select_3-auto { color: black; width: 52.2666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-select_3-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- class: inverse, middle, center ## One more! --- count: false # Another way to select .panel1-select_4-auto[ ```r *b_eph_ind ``` ] .panel2-select_4-auto[ ``` # A tibble: 57,229 × 177 CODUSU ANO4 TRIMESTRE NRO_HOGAR COMPONENTE H15 REGION MAS_500 AGLOMERADO <fct> <int> <int> <int> <int> <int> <int> <fct> <int> 1 TQRMNOQ… 2019 3 1 1 1 43 S 2 2 TQRMNOQ… 2019 3 1 2 1 43 S 2 3 TQRMNOQ… 2019 3 1 3 1 43 S 2 4 TQRMNOQ… 2019 3 1 4 1 43 S 2 5 TQRMNOQ… 2019 3 1 2 1 43 S 2 6 TQRMNOQ… 2019 3 1 3 0 43 S 2 7 TQRMNOQ… 2019 3 1 4 0 43 S 2 8 TQRMNOQ… 2019 3 1 5 0 43 S 2 9 TQRMNOS… 2019 3 1 1 1 43 S 2 10 TQRMNOS… 2019 3 1 2 1 43 S 2 # ℹ 57,219 more rows # ℹ 168 more variables: PONDERA <int>, CH03 <int>, CH04 <int>, CH05 <fct>, # CH06 <int>, CH07 <int>, CH08 <int>, CH09 <int>, CH10 <int>, CH11 <int>, # CH12 <int>, CH13 <int>, CH14 <chr>, CH15 <int>, CH15_COD <int>, CH16 <int>, # CH16_COD <int>, NIVEL_ED <int>, ESTADO <int>, CAT_OCUP <int>, # CAT_INAC <int>, IMPUTA <int>, PP02C1 <int>, PP02C2 <int>, PP02C3 <int>, # PP02C4 <int>, PP02C5 <int>, PP02C6 <int>, PP02C7 <int>, PP02C8 <int>, … ``` ] --- count: false # Another way to select .panel1-select_4-auto[ ```r b_eph_ind %>% * select(ends_with("_COD")) ``` ] .panel2-select_4-auto[ ``` # A tibble: 57,229 × 6 CH15_COD CH16_COD PP04B_COD PP04D_COD PP11B_COD PP11D_COD <int> <int> <chr> <chr> <chr> <chr> 1 NA NA 8401 34323 <NA> <NA> 2 NA NA 9700 55314 <NA> <NA> 3 NA NA 1009 20333 <NA> <NA> 4 NA NA <NA> <NA> <NA> <NA> 5 202 NA 4803 30113 <NA> <NA> 6 NA NA <NA> <NA> <NA> <NA> 7 NA NA <NA> <NA> 
<NA> <NA> 8 NA NA <NA> <NA> <NA> <NA> 9 22 NA 1009 30113 <NA> <NA> 10 NA NA 1009 30314 <NA> <NA> # ℹ 57,219 more rows ``` ] <style> .panel1-select_4-auto { color: black; width: 45.7333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-select_4-auto { color: black; width: 52.2666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-select_4-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- <img src="https://media.tenor.com/images/31210518c407ef4392726bd7ab3a1625/tenor.gif" width="65%" style="display: block; margin: auto;" /> --- class: inverse, middle, center ## One more. --- count: false # Another way to select .panel1-select_5-auto[ ```r *b_eph_ind ``` ] .panel2-select_5-auto[ ``` # A tibble: 57,229 × 177 CODUSU ANO4 TRIMESTRE NRO_HOGAR COMPONENTE H15 REGION MAS_500 AGLOMERADO <fct> <int> <int> <int> <int> <int> <int> <fct> <int> 1 TQRMNOQ… 2019 3 1 1 1 43 S 2 2 TQRMNOQ… 2019 3 1 2 1 43 S 2 3 TQRMNOQ… 2019 3 1 3 1 43 S 2 4 TQRMNOQ… 2019 3 1 4 1 43 S 2 5 TQRMNOQ… 2019 3 1 2 1 43 S 2 6 TQRMNOQ… 2019 3 1 3 0 43 S 2 7 TQRMNOQ… 2019 3 1 4 0 43 S 2 8 TQRMNOQ… 2019 3 1 5 0 43 S 2 9 TQRMNOS… 2019 3 1 1 1 43 S 2 10 TQRMNOS… 2019 3 1 2 1 43 S 2 # ℹ 57,219 more rows # ℹ 168 more variables: PONDERA <int>, CH03 <int>, CH04 <int>, CH05 <fct>, # CH06 <int>, CH07 <int>, CH08 <int>, CH09 <int>, CH10 <int>, CH11 <int>, # CH12 <int>, CH13 <int>, CH14 <chr>, CH15 <int>, CH15_COD <int>, CH16 <int>, # CH16_COD <int>, NIVEL_ED <int>, ESTADO <int>, CAT_OCUP <int>, # CAT_INAC <int>, IMPUTA <int>, PP02C1 <int>, PP02C2 <int>, PP02C3 <int>, # PP02C4 <int>, PP02C5 <int>, PP02C6 <int>, PP02C7 <int>, PP02C8 <int>, … ``` ] --- count: false # Another way to select .panel1-select_5-auto[ ```r b_eph_ind %>% * select(contains("03")) ``` ] .panel2-select_5-auto[ ``` # A tibble: 57,229 × 7 CH03 PP03C PP03D PP03G PP03H PP03I PP03J <int> <int> <int> <int> <int> <int> <int> 1 1 0 0 2 0 2 2 
2 2 2 2 2 0 2 1 3 3 1 0 1 1 1 1 4 3 NA NA NA NA NA NA 5 2 1 0 2 0 2 2 6 3 NA NA NA NA NA NA 7 3 NA NA NA NA NA NA 8 3 NA NA NA NA NA NA 9 1 1 0 2 0 2 2 10 3 1 0 2 0 2 2 # ℹ 57,219 more rows ``` ] <style> .panel1-select_5-auto { color: black; width: 45.7333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-select_5-auto { color: black; width: 52.2666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-select_5-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- <img src="https://media.tenor.com/images/b8718c934090ad1a36acd7ef9d0b846c/tenor.gif" width="65%" style="display: block; margin: auto;" /> --- class: inverse, middle, center # _GROUP PRACTICE_ <html> <div style='float:left'></div> <hr color='#EB811B' size=1px width=1125px> </html> --- class: inverse, middle ## Group Practice 1) Create an object into which we import the EPH dataset (remember to take the file extension into account) 2) Create another object that selects 3 columns of interest by name 3) Create another object that selects 3 columns of interest by position 4) Rewrite the following code in the "step by step (with pipes)" style ```r base_ejercicio <- select(b_eph_ind, ESTADO, CH04, CAT_OCUP) ``` --- class: inverse, middle, center # filter() *** _<p style="color:grey;" align="center">Keeps the cases (rows) that meet a condition</p>_ --- # filter() ### The function has the following structure: ```r base_de_datos %>% filter(condicion) ``` <img src="../img/filter_presentacion.png" width="65%" style="display: block; margin: auto;" /> --- # filter() - ### For example: ```r base %>% `filter(Edad > 65)` ``` <img src="../img/filter_presentacion.png" width="65%" style="display: block; margin: auto;" /> --- # filter() ### To compute the proposed **indicator**, we will restrict the universe to **people aged 14 or older** --- count: false # filter() 
.panel1-filter-auto[ ```r *b_eph_ind ``` ] .panel2-filter-auto[ ``` # A tibble: 57,229 × 177 CODUSU ANO4 TRIMESTRE NRO_HOGAR COMPONENTE H15 REGION MAS_500 AGLOMERADO <fct> <int> <int> <int> <int> <int> <int> <fct> <int> 1 TQRMNOQ… 2019 3 1 1 1 43 S 2 2 TQRMNOQ… 2019 3 1 2 1 43 S 2 3 TQRMNOQ… 2019 3 1 3 1 43 S 2 4 TQRMNOQ… 2019 3 1 4 1 43 S 2 5 TQRMNOQ… 2019 3 1 2 1 43 S 2 6 TQRMNOQ… 2019 3 1 3 0 43 S 2 7 TQRMNOQ… 2019 3 1 4 0 43 S 2 8 TQRMNOQ… 2019 3 1 5 0 43 S 2 9 TQRMNOS… 2019 3 1 1 1 43 S 2 10 TQRMNOS… 2019 3 1 2 1 43 S 2 # ℹ 57,219 more rows # ℹ 168 more variables: PONDERA <int>, CH03 <int>, CH04 <int>, CH05 <fct>, # CH06 <int>, CH07 <int>, CH08 <int>, CH09 <int>, CH10 <int>, CH11 <int>, # CH12 <int>, CH13 <int>, CH14 <chr>, CH15 <int>, CH15_COD <int>, CH16 <int>, # CH16_COD <int>, NIVEL_ED <int>, ESTADO <int>, CAT_OCUP <int>, # CAT_INAC <int>, IMPUTA <int>, PP02C1 <int>, PP02C2 <int>, PP02C3 <int>, # PP02C4 <int>, PP02C5 <int>, PP02C6 <int>, PP02C7 <int>, PP02C8 <int>, … ``` ] --- count: false # filter() .panel1-filter-auto[ ```r b_eph_ind %>% * select(AGLOMERADO, CH04, CH06, ESTADO, PONDERA) ``` ] .panel2-filter-auto[ ``` # A tibble: 57,229 × 5 AGLOMERADO CH04 CH06 ESTADO PONDERA <int> <int> <int> <int> <int> 1 2 1 56 1 547 2 2 2 46 1 547 3 2 2 20 1 547 4 2 1 12 3 547 5 2 2 38 1 584 6 2 2 7 4 584 7 2 1 5 4 584 8 2 1 3 4 584 9 2 1 54 1 584 10 2 1 19 1 584 # ℹ 57,219 more rows ``` ] --- count: false # filter() .panel1-filter-auto[ ```r b_eph_ind %>% select(AGLOMERADO, CH04, CH06, ESTADO, PONDERA) %>% * filter(CH06 >= 14) ``` ] .panel2-filter-auto[ ``` # A tibble: 45,344 × 5 AGLOMERADO CH04 CH06 ESTADO PONDERA <int> <int> <int> <int> <int> 1 2 1 56 1 547 2 2 2 46 1 547 3 2 2 20 1 547 4 2 2 38 1 584 5 2 1 54 1 584 6 2 1 19 1 584 7 2 2 44 1 815 8 2 1 16 3 815 9 2 1 31 1 815 10 2 1 58 1 563 # ℹ 45,334 more rows ``` ] <style> .panel1-filter-auto { color: black; width: 55.5333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-filter-auto 
{ color: black; width: 42.4666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-filter-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- # filter() #### Comparison and logical operators for filtering: <br> .pull-left[ |Condition |Meaning | | :--- | :--- | | | | | `==` | *equal to* | | `%in%` | *is in (matches any value of a set)* | | `!=` | *not equal to* | | `>` | *greater than* | | `<` | *less than* | | `>=` | *greater than or equal to*| | `<=` | *less than or equal to*| ] .pull-right[ | Operator | Description | | :--- | :--- | | | | | `&` | *and* - When both conditions are met | | `\|` | *or* - When one or the other condition is met | ] --- # filter() ### **Case:** I need to restrict the universe to the population living in the _Ciudad Autónoma de Buenos Aires_ __or__ in the _Partidos del Gran Buenos Aires_. -- - Check the categories of the variable: ```r unique(b_eph_ind$AGLOMERADO) ``` ``` [1] 2 3 4 5 6 7 9 10 12 13 14 15 17 18 19 20 22 23 25 26 27 29 30 31 32 [26] 33 34 36 38 91 93 ``` -- - Look up the corresponding codes in the record layout (diseño de registro). 
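---

# filter()

As a complement to the operator tables, the sketch below combines `|`, `%in%` and `&` on a small made-up table. The object `personas` and its values are hypothetical (only the column names mirror the EPH variables), so treat it as an illustration rather than part of the course dataset.

```r
library(dplyr)

# Toy data: hypothetical values, column names borrowed from the EPH
personas <- tibble(
  AGLOMERADO = c(32, 33, 2, 32, 33, 10),
  CH06       = c(49, 9, 56, 72, 13, 30)
)

# `|`: keeps the rows where at least one of the conditions holds
con_o <- personas %>%
  filter(AGLOMERADO == 32 | AGLOMERADO == 33)

# `%in%`: the same selection, written as membership in a set of codes
con_in <- personas %>%
  filter(AGLOMERADO %in% c(32, 33))

# `&`: keeps the rows where both conditions hold at the same time
con_y <- personas %>%
  filter(AGLOMERADO %in% c(32, 33) & CH06 >= 14)

identical(con_o, con_in)  # both filters keep the same rows
nrow(con_y)               # rows in agglomerations 32/33 with age 14 or more
```

The `%in%` form tends to scale better as the list of codes grows, since the codes live in a single vector instead of a chain of `|` comparisons.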
--- count: false #filter .panel1-filter_1-auto[ ```r *b_eph_ind ``` ] .panel2-filter_1-auto[ ``` # A tibble: 57,229 × 177 CODUSU ANO4 TRIMESTRE NRO_HOGAR COMPONENTE H15 REGION MAS_500 AGLOMERADO <fct> <int> <int> <int> <int> <int> <int> <fct> <int> 1 TQRMNOQ… 2019 3 1 1 1 43 S 2 2 TQRMNOQ… 2019 3 1 2 1 43 S 2 3 TQRMNOQ… 2019 3 1 3 1 43 S 2 4 TQRMNOQ… 2019 3 1 4 1 43 S 2 5 TQRMNOQ… 2019 3 1 2 1 43 S 2 6 TQRMNOQ… 2019 3 1 3 0 43 S 2 7 TQRMNOQ… 2019 3 1 4 0 43 S 2 8 TQRMNOQ… 2019 3 1 5 0 43 S 2 9 TQRMNOS… 2019 3 1 1 1 43 S 2 10 TQRMNOS… 2019 3 1 2 1 43 S 2 # ℹ 57,219 more rows # ℹ 168 more variables: PONDERA <int>, CH03 <int>, CH04 <int>, CH05 <fct>, # CH06 <int>, CH07 <int>, CH08 <int>, CH09 <int>, CH10 <int>, CH11 <int>, # CH12 <int>, CH13 <int>, CH14 <chr>, CH15 <int>, CH15_COD <int>, CH16 <int>, # CH16_COD <int>, NIVEL_ED <int>, ESTADO <int>, CAT_OCUP <int>, # CAT_INAC <int>, IMPUTA <int>, PP02C1 <int>, PP02C2 <int>, PP02C3 <int>, # PP02C4 <int>, PP02C5 <int>, PP02C6 <int>, PP02C7 <int>, PP02C8 <int>, … ``` ] --- count: false #filter .panel1-filter_1-auto[ ```r b_eph_ind %>% * select(AGLOMERADO, CH04, CH06, ESTADO, PONDERA) ``` ] .panel2-filter_1-auto[ ``` # A tibble: 57,229 × 5 AGLOMERADO CH04 CH06 ESTADO PONDERA <int> <int> <int> <int> <int> 1 2 1 56 1 547 2 2 2 46 1 547 3 2 2 20 1 547 4 2 1 12 3 547 5 2 2 38 1 584 6 2 2 7 4 584 7 2 1 5 4 584 8 2 1 3 4 584 9 2 1 54 1 584 10 2 1 19 1 584 # ℹ 57,219 more rows ``` ] --- count: false #filter .panel1-filter_1-auto[ ```r b_eph_ind %>% select(AGLOMERADO, CH04, CH06, ESTADO, PONDERA) %>% * filter(AGLOMERADO == 32 | AGLOMERADO == 33) ``` ] .panel2-filter_1-auto[ ``` # A tibble: 10,097 × 5 AGLOMERADO CH04 CH06 ESTADO PONDERA <int> <int> <int> <int> <int> 1 32 2 49 1 1031 2 32 1 9 4 1031 3 32 2 81 3 1031 4 32 1 72 1 1234 5 32 2 73 1 1234 6 32 1 28 3 1234 7 32 2 69 3 640 8 32 2 87 3 1923 9 32 1 40 1 2424 10 32 2 41 1 2424 # ℹ 10,087 more rows ``` ] <style> .panel1-filter_1-auto { color: black; width: 55.5333333333333%; 
hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-filter_1-auto { color: black; width: 42.4666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-filter_1-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- count: false #filter .panel1-filter_2-auto[ ```r *b_eph_ind ``` ] .panel2-filter_2-auto[ ``` # A tibble: 57,229 × 177 CODUSU ANO4 TRIMESTRE NRO_HOGAR COMPONENTE H15 REGION MAS_500 AGLOMERADO <fct> <int> <int> <int> <int> <int> <int> <fct> <int> 1 TQRMNOQ… 2019 3 1 1 1 43 S 2 2 TQRMNOQ… 2019 3 1 2 1 43 S 2 3 TQRMNOQ… 2019 3 1 3 1 43 S 2 4 TQRMNOQ… 2019 3 1 4 1 43 S 2 5 TQRMNOQ… 2019 3 1 2 1 43 S 2 6 TQRMNOQ… 2019 3 1 3 0 43 S 2 7 TQRMNOQ… 2019 3 1 4 0 43 S 2 8 TQRMNOQ… 2019 3 1 5 0 43 S 2 9 TQRMNOS… 2019 3 1 1 1 43 S 2 10 TQRMNOS… 2019 3 1 2 1 43 S 2 # ℹ 57,219 more rows # ℹ 168 more variables: PONDERA <int>, CH03 <int>, CH04 <int>, CH05 <fct>, # CH06 <int>, CH07 <int>, CH08 <int>, CH09 <int>, CH10 <int>, CH11 <int>, # CH12 <int>, CH13 <int>, CH14 <chr>, CH15 <int>, CH15_COD <int>, CH16 <int>, # CH16_COD <int>, NIVEL_ED <int>, ESTADO <int>, CAT_OCUP <int>, # CAT_INAC <int>, IMPUTA <int>, PP02C1 <int>, PP02C2 <int>, PP02C3 <int>, # PP02C4 <int>, PP02C5 <int>, PP02C6 <int>, PP02C7 <int>, PP02C8 <int>, … ``` ] --- count: false #filter .panel1-filter_2-auto[ ```r b_eph_ind %>% * select(AGLOMERADO, CH04, CH06, ESTADO, PONDERA) ``` ] .panel2-filter_2-auto[ ``` # A tibble: 57,229 × 5 AGLOMERADO CH04 CH06 ESTADO PONDERA <int> <int> <int> <int> <int> 1 2 1 56 1 547 2 2 2 46 1 547 3 2 2 20 1 547 4 2 1 12 3 547 5 2 2 38 1 584 6 2 2 7 4 584 7 2 1 5 4 584 8 2 1 3 4 584 9 2 1 54 1 584 10 2 1 19 1 584 # ℹ 57,219 more rows ``` ] --- count: false #filter .panel1-filter_2-auto[ ```r b_eph_ind %>% select(AGLOMERADO, CH04, CH06, ESTADO, PONDERA) %>% * filter(AGLOMERADO %in% c(32,33)) ``` ] .panel2-filter_2-auto[ ``` # A tibble: 10,097 × 5 AGLOMERADO CH04 CH06 ESTADO PONDERA <int> 
<int> <int> <int> <int> 1 32 2 49 1 1031 2 32 1 9 4 1031 3 32 2 81 3 1031 4 32 1 72 1 1234 5 32 2 73 1 1234 6 32 1 28 3 1234 7 32 2 69 3 640 8 32 2 87 3 1923 9 32 1 40 1 2424 10 32 2 41 1 2424 # ℹ 10,087 more rows ``` ] <style> .panel1-filter_2-auto { color: black; width: 55.5333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-filter_2-auto { color: black; width: 42.4666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-filter_2-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- class: inverse, middle, center # _GROUP PRACTICE_ <html> <div style='float:left'></div> <hr color='#EB811B' size=1px width=1125px> </html> --- class: inverse, middle # Group Practice - Starting from the EPH dataset, create a new object that **contains** the variables __AGLOMERADO__ and __CH06__ and **filter** for the population that is _18 years old or older_ and lives in the agglomerations of _Neuquén_ or _Río Negro_ - Check that the operations succeeded _(hint: functions such as **unique()**, **table()** or **colnames()** can help)_ --- class: inverse, middle, center # _mutate()_ <html> <div style='float:left'></div> <hr color='#EB811B' size=1px width=1125px> </html> _<p style="color:grey;" align="center">Creates / edits variables (columns)</p>_ --- # mutate() - ### In base R: ```r base_de_datos$var_nueva <- base_de_datos$var_1 + base_de_datos$var_2 ``` <br> - ### In `tidyverse`: ```r base_de_datos %>% mutate(var_nueva = var_1 + var_2) ``` --- # mutate() <br><br> ### **Indicator:** Sum of income from the main and secondary occupation(s) <br><br> --- count: false # mutate() .panel1-mutate_1-auto[ ```r *b_eph_ind ``` ] .panel2-mutate_1-auto[ ``` # A tibble: 57,229 × 177 CODUSU ANO4 TRIMESTRE NRO_HOGAR COMPONENTE H15 REGION MAS_500 AGLOMERADO <fct> <int> <int> <int> <int> <int> <int> <fct> <int> 1 TQRMNOQ… 2019 3 1 1 1 43 S 2 2 TQRMNOQ… 2019 3 1 2 
1 43 S 2 3 TQRMNOQ… 2019 3 1 3 1 43 S 2 4 TQRMNOQ… 2019 3 1 4 1 43 S 2 5 TQRMNOQ… 2019 3 1 2 1 43 S 2 6 TQRMNOQ… 2019 3 1 3 0 43 S 2 7 TQRMNOQ… 2019 3 1 4 0 43 S 2 8 TQRMNOQ… 2019 3 1 5 0 43 S 2 9 TQRMNOS… 2019 3 1 1 1 43 S 2 10 TQRMNOS… 2019 3 1 2 1 43 S 2 # ℹ 57,219 more rows # ℹ 168 more variables: PONDERA <int>, CH03 <int>, CH04 <int>, CH05 <fct>, # CH06 <int>, CH07 <int>, CH08 <int>, CH09 <int>, CH10 <int>, CH11 <int>, # CH12 <int>, CH13 <int>, CH14 <chr>, CH15 <int>, CH15_COD <int>, CH16 <int>, # CH16_COD <int>, NIVEL_ED <int>, ESTADO <int>, CAT_OCUP <int>, # CAT_INAC <int>, IMPUTA <int>, PP02C1 <int>, PP02C2 <int>, PP02C3 <int>, # PP02C4 <int>, PP02C5 <int>, PP02C6 <int>, PP02C7 <int>, PP02C8 <int>, … ``` ] --- count: false # mutate() .panel1-mutate_1-auto[ ```r b_eph_ind %>% * select(P21, TOT_P12) ``` ] .panel2-mutate_1-auto[ ``` # A tibble: 57,229 × 2 P21 TOT_P12 <int> <int> 1 28000 700 2 9500 3600 3 -9 0 4 0 0 5 -9 0 6 0 0 7 0 0 8 0 0 9 -9 0 10 0 0 # ℹ 57,219 more rows ``` ] --- count: false # mutate() .panel1-mutate_1-auto[ ```r b_eph_ind %>% select(P21, TOT_P12) %>% * mutate(ingreso_ocup_tot = P21 + TOT_P12) ``` ] .panel2-mutate_1-auto[ ``` # A tibble: 57,229 × 3 P21 TOT_P12 ingreso_ocup_tot <int> <int> <int> 1 28000 700 28700 2 9500 3600 13100 3 -9 0 -9 4 0 0 0 5 -9 0 -9 6 0 0 0 7 0 0 0 8 0 0 0 9 -9 0 -9 10 0 0 0 # ℹ 57,219 more rows ``` ] <style> .panel1-mutate_1-auto { color: black; width: 55.5333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-mutate_1-auto { color: black; width: 42.4666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-mutate_1-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- # mutate() - case_when() ### Companion function: `case_when()`, mostly used for recoding variables <img src="../img/mutate_case.png" width="100%" style="display: block; margin: auto;" /> --- count: false # Recoding with mutate() and case_when() .panel1-mutate_2-auto[ ```r *b_eph_ind ``` ] .panel2-mutate_2-auto[ ``` # A tibble: 57,229 × 177 CODUSU ANO4 TRIMESTRE NRO_HOGAR COMPONENTE H15 REGION MAS_500 AGLOMERADO <fct> <int> <int> <int> <int> <int> <int> <fct> <int> 1 TQRMNOQ… 2019 3 1 1 1 43 S 2 2 TQRMNOQ… 2019 3 1 2 1 43 S 2 3 TQRMNOQ… 2019 3 1 3 1 43 S 2 4 TQRMNOQ… 2019 3 1 4 1 43 S 2 5 TQRMNOQ… 2019 3 1 2 1 43 S 2 6 TQRMNOQ… 2019 3 1 3 0 43 S 2 7 TQRMNOQ… 2019 3 1 4 0 43 S 2 8 TQRMNOQ… 2019 3 1 5 0 43 S 2 9 TQRMNOS… 2019 3 1 1 1 43 S 2 10 TQRMNOS… 2019 3 1 2 1 43 S 2 # ℹ 57,219 more rows # ℹ 168 more variables: PONDERA <int>, CH03 <int>, CH04 <int>, CH05 <fct>, # CH06 <int>, CH07 <int>, CH08 <int>, CH09 <int>, CH10 <int>, CH11 <int>, # CH12 <int>, CH13 <int>, CH14 <chr>, CH15 <int>, CH15_COD <int>, CH16 <int>, # CH16_COD <int>, NIVEL_ED <int>, ESTADO <int>, CAT_OCUP <int>, # CAT_INAC <int>, IMPUTA <int>, PP02C1 <int>, PP02C2 <int>, PP02C3 <int>, # PP02C4 <int>, PP02C5 <int>, PP02C6 <int>, PP02C7 <int>, PP02C8 <int>, … ``` ] --- count: false # Recoding with mutate() and case_when() .panel1-mutate_2-auto[ ```r b_eph_ind %>% * select(CH04, CH06) ``` ] .panel2-mutate_2-auto[ ``` # A tibble: 57,229 × 2 CH04 CH06 <int> <int> 1 1 56 2 2 46 3 2 20 4 1 12 5 2 38 6 2 7 7 1 5 8 1 3 9 1 54 10 1 19 # ℹ 57,219 more rows ``` ] --- count: false # Recoding with mutate() and case_when() .panel1-mutate_2-auto[ ```r b_eph_ind %>% select(CH04, CH06) %>% * mutate(sexo = case_when(CH04 == 1 ~ "Varón", * CH04 == 2 ~ "Mujer")) ``` ] .panel2-mutate_2-auto[ ``` # A tibble: 57,229 × 3 CH04 CH06 sexo <int> <int> <chr> 1 1 56 Varón 2 2 46 Mujer 3 2 20 Mujer 4 1 12 Varón 5 2 38 Mujer 6 2 7 Mujer 7 1 5 Varón 8 1 3 Varón 9 1 54 Varón 10 1 19 Varón # ℹ 57,219 more rows ``` ] <style> .panel1-mutate_2-auto { color: black; width: 55.5333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-mutate_2-auto { color: black; width: 42.4666666666667%; hight: 32%; float: left; padding-left: 
1%; font-size: 80% } .panel3-mutate_2-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- count: false # Recoding with mutate() and case_when() .panel1-mutate_3-auto[ ```r *b_eph_ind ``` ] .panel2-mutate_3-auto[ ``` # A tibble: 57,229 × 177 CODUSU ANO4 TRIMESTRE NRO_HOGAR COMPONENTE H15 REGION MAS_500 AGLOMERADO <fct> <int> <int> <int> <int> <int> <int> <fct> <int> 1 TQRMNOQ… 2019 3 1 1 1 43 S 2 2 TQRMNOQ… 2019 3 1 2 1 43 S 2 3 TQRMNOQ… 2019 3 1 3 1 43 S 2 4 TQRMNOQ… 2019 3 1 4 1 43 S 2 5 TQRMNOQ… 2019 3 1 2 1 43 S 2 6 TQRMNOQ… 2019 3 1 3 0 43 S 2 7 TQRMNOQ… 2019 3 1 4 0 43 S 2 8 TQRMNOQ… 2019 3 1 5 0 43 S 2 9 TQRMNOS… 2019 3 1 1 1 43 S 2 10 TQRMNOS… 2019 3 1 2 1 43 S 2 # ℹ 57,219 more rows # ℹ 168 more variables: PONDERA <int>, CH03 <int>, CH04 <int>, CH05 <fct>, # CH06 <int>, CH07 <int>, CH08 <int>, CH09 <int>, CH10 <int>, CH11 <int>, # CH12 <int>, CH13 <int>, CH14 <chr>, CH15 <int>, CH15_COD <int>, CH16 <int>, # CH16_COD <int>, NIVEL_ED <int>, ESTADO <int>, CAT_OCUP <int>, # CAT_INAC <int>, IMPUTA <int>, PP02C1 <int>, PP02C2 <int>, PP02C3 <int>, # PP02C4 <int>, PP02C5 <int>, PP02C6 <int>, PP02C7 <int>, PP02C8 <int>, … ``` ] --- count: false # Recoding with mutate() and case_when() .panel1-mutate_3-auto[ ```r b_eph_ind %>% * select(CH06) ``` ] .panel2-mutate_3-auto[ ``` # A tibble: 57,229 × 1 CH06 <int> 1 56 2 46 3 20 4 12 5 38 6 7 7 5 8 3 9 54 10 19 # ℹ 57,219 more rows ``` ] --- count: false # Recoding with mutate() and case_when() .panel1-mutate_3-auto[ ```r b_eph_ind %>% select(CH06) %>% * mutate(edad_rango = case_when(CH06 %in% c(0:18) ~ "0 a 18", * CH06 %in% c(19:29) ~ "19 a 29", * CH06 %in% c(30:39) ~ "30 a 39", * CH06 %in% c(40:49) ~ "40 a 49", * CH06 %in% c(50:59) ~ "50 a 59", * CH06 >= 60 ~ "60 o más")) ``` ] .panel2-mutate_3-auto[ ``` # A tibble: 57,229 × 2 CH06 edad_rango <int> <chr> 1 56 50 a 59 2 46 40 a 49 3 20 19 a 29 4 12 0 a 18 5 38 30 a 39 6 7 0 a 18 7 5 0 a 18 8 3 0 a 18 9 54 50 a 59 
10 19 19 a 29 # ℹ 57,219 more rows ``` ] <style> .panel1-mutate_3-auto { color: black; width: 55.5333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-mutate_3-auto { color: black; width: 42.4666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-mutate_3-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- class: inverse, middle, center # _GROUP PRACTICE_ *** --- class: inverse # Group Practice 1) Create a new variable with the labels that correspond to the values of **CAT_OCUP**: ```r 1 --> Patrón 2 --> Cuenta propia 3 --> Obrero o empleado 4 --> Trabajador familiar sin remuneración 9 --> Ns./Nr. ``` 2) Recode the income variable P21 into 5 ranges. --- class: inverse, middle, center # _summarise()_ <html> <div style='float:left'></div> <hr color='#EB811B' size=1px width=1125px> </html> _<p style="color:grey;" align:"center">Summarizes the information in a new table</p>_ --- # summarise() <br><br> <br><br> #### **Case:** - **Indicator 1:** I want to know how many employed people there are - **Indicator 2:** I want to know the mean income from the main occupation --- count: false # _summarise()_ .panel1-summarise_1-auto[ ```r *b_eph_ind ``` ] .panel2-summarise_1-auto[ ``` # A tibble: 57,229 × 177 CODUSU ANO4 TRIMESTRE NRO_HOGAR COMPONENTE H15 REGION MAS_500 AGLOMERADO <fct> <int> <int> <int> <int> <int> <int> <fct> <int> 1 TQRMNOQ… 2019 3 1 1 1 43 S 2 2 TQRMNOQ… 2019 3 1 2 1 43 S 2 3 TQRMNOQ… 2019 3 1 3 1 43 S 2 4 TQRMNOQ… 2019 3 1 4 1 43 S 2 5 TQRMNOQ… 2019 3 1 2 1 43 S 2 6 TQRMNOQ… 2019 3 1 3 0 43 S 2 7 TQRMNOQ… 2019 3 1 4 0 43 S 2 8 TQRMNOQ… 2019 3 1 5 0 43 S 2 9 TQRMNOS… 2019 3 1 1 1 43 S 2 10 TQRMNOS… 2019 3 1 2 1 43 S 2 # ℹ 57,219 more rows # ℹ 168 more variables: PONDERA <int>, CH03 <int>, CH04 <int>, CH05 <fct>, # CH06 <int>, CH07 <int>, CH08 <int>, CH09 <int>, CH10 <int>, CH11 <int>, # CH12 <int>, CH13 <int>, CH14 <chr>, CH15 <int>, 
CH15_COD <int>, CH16 <int>, # CH16_COD <int>, NIVEL_ED <int>, ESTADO <int>, CAT_OCUP <int>, # CAT_INAC <int>, IMPUTA <int>, PP02C1 <int>, PP02C2 <int>, PP02C3 <int>, # PP02C4 <int>, PP02C5 <int>, PP02C6 <int>, PP02C7 <int>, PP02C8 <int>, … ``` ] --- count: false # _summarise()_ .panel1-summarise_1-auto[ ```r b_eph_ind %>% * select(ESTADO, P21, PONDERA) ``` ] .panel2-summarise_1-auto[ ``` # A tibble: 57,229 × 3 ESTADO P21 PONDERA <int> <int> <int> 1 1 28000 547 2 1 9500 547 3 1 -9 547 4 3 0 547 5 1 -9 584 6 4 0 584 7 4 0 584 8 4 0 584 9 1 -9 584 10 1 0 584 # ℹ 57,219 more rows ``` ] --- count: false # _summarise()_ .panel1-summarise_1-auto[ ```r b_eph_ind %>% select(ESTADO, P21, PONDERA) %>% * summarise(cant_pob_tot = sum(PONDERA), * cant_ocupados = sum(PONDERA[ESTADO == 1]), * min_ingr_oc_princ = min(P21), * max_ingr_oc_princ = max(P21), * ingr_oc_princ_media = questionr::wtd.mean(x = P21, * weights = PONDERA)) ``` ] .panel2-summarise_1-auto[ ``` # A tibble: 1 × 5 cant_pob_tot cant_ocupados min_ingr_oc_princ max_ingr_oc_princ <int> <int> <int> <int> 1 27989128 11933503 -9 540000 # ℹ 1 more variable: ingr_oc_princ_media <dbl> ``` ] <style> .panel1-summarise_1-auto { color: black; width: 55.5333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-summarise_1-auto { color: black; width: 42.4666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-summarise_1-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- count: false # _summarise()_ .panel1-summarise_2-auto[ ```r *library(questionr) ``` ] .panel2-summarise_2-auto[ ] --- count: false # _summarise()_ .panel1-summarise_2-auto[ ```r library(questionr) *b_eph_ind ``` ] .panel2-summarise_2-auto[ ``` # A tibble: 57,229 × 177 CODUSU ANO4 TRIMESTRE NRO_HOGAR COMPONENTE H15 REGION MAS_500 AGLOMERADO <fct> <int> <int> <int> <int> <int> <int> <fct> <int> 1 TQRMNOQ… 2019 3 1 1 1 43 S 2 2 TQRMNOQ… 2019 3 1 2 1 43 S 2 3 
TQRMNOQ… 2019 3 1 3 1 43 S 2 4 TQRMNOQ… 2019 3 1 4 1 43 S 2 5 TQRMNOQ… 2019 3 1 2 1 43 S 2 6 TQRMNOQ… 2019 3 1 3 0 43 S 2 7 TQRMNOQ… 2019 3 1 4 0 43 S 2 8 TQRMNOQ… 2019 3 1 5 0 43 S 2 9 TQRMNOS… 2019 3 1 1 1 43 S 2 10 TQRMNOS… 2019 3 1 2 1 43 S 2 # ℹ 57,219 more rows # ℹ 168 more variables: PONDERA <int>, CH03 <int>, CH04 <int>, CH05 <fct>, # CH06 <int>, CH07 <int>, CH08 <int>, CH09 <int>, CH10 <int>, CH11 <int>, # CH12 <int>, CH13 <int>, CH14 <chr>, CH15 <int>, CH15_COD <int>, CH16 <int>, # CH16_COD <int>, NIVEL_ED <int>, ESTADO <int>, CAT_OCUP <int>, # CAT_INAC <int>, IMPUTA <int>, PP02C1 <int>, PP02C2 <int>, PP02C3 <int>, # PP02C4 <int>, PP02C5 <int>, PP02C6 <int>, PP02C7 <int>, PP02C8 <int>, … ``` ] --- count: false # _summarise()_ .panel1-summarise_2-auto[ ```r library(questionr) b_eph_ind %>% * select(ESTADO, P21, PONDERA) ``` ] .panel2-summarise_2-auto[ ``` # A tibble: 57,229 × 3 ESTADO P21 PONDERA <int> <int> <int> 1 1 28000 547 2 1 9500 547 3 1 -9 547 4 3 0 547 5 1 -9 584 6 4 0 584 7 4 0 584 8 4 0 584 9 1 -9 584 10 1 0 584 # ℹ 57,219 more rows ``` ] --- count: false # _summarise()_ .panel1-summarise_2-auto[ ```r library(questionr) b_eph_ind %>% select(ESTADO, P21, PONDERA) %>% * summarise(cant_pob_tot = sum(PONDERA), * cant_ocupados = sum(PONDERA[ESTADO == 1]), * min_ingr_oc_princ = min(P21), * max_ingr_oc_princ = max(P21), * ingr_oc_princ_media = wtd.mean(x = P21, # Paquete questionr * weights = PONDERA)) ``` ] .panel2-summarise_2-auto[ ``` # A tibble: 1 × 5 cant_pob_tot cant_ocupados min_ingr_oc_princ max_ingr_oc_princ <int> <int> <int> <int> 1 27989128 11933503 -9 540000 # ℹ 1 more variable: ingr_oc_princ_media <dbl> ``` ] <style> .panel1-summarise_2-auto { color: black; width: 55.5333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-summarise_2-auto { color: black; width: 42.4666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-summarise_2-auto { color: black; width: NA%; hight: 33%; float: 
left; padding-left: 1%; font-size: 80% } </style> --- class: inverse, middle, center # _group_by()_ *** _<p style="color:grey;" align:"center">Applies an operation to each segment of the population separately</p>_ --- # group_by() <br><br> <br><br> ```r base_de_datos %>% group_by(variable_de_corte) #<< ``` --- count: false # _group_by()_ .panel1-group_by_1-auto[ ```r *library(questionr) ``` ] .panel2-group_by_1-auto[ ] --- count: false # _group_by()_ .panel1-group_by_1-auto[ ```r library(questionr) *b_eph_ind ``` ] .panel2-group_by_1-auto[ ``` # A tibble: 57,229 × 177 CODUSU ANO4 TRIMESTRE NRO_HOGAR COMPONENTE H15 REGION MAS_500 AGLOMERADO <fct> <int> <int> <int> <int> <int> <int> <fct> <int> 1 TQRMNOQ… 2019 3 1 1 1 43 S 2 2 TQRMNOQ… 2019 3 1 2 1 43 S 2 3 TQRMNOQ… 2019 3 1 3 1 43 S 2 4 TQRMNOQ… 2019 3 1 4 1 43 S 2 5 TQRMNOQ… 2019 3 1 2 1 43 S 2 6 TQRMNOQ… 2019 3 1 3 0 43 S 2 7 TQRMNOQ… 2019 3 1 4 0 43 S 2 8 TQRMNOQ… 2019 3 1 5 0 43 S 2 9 TQRMNOS… 2019 3 1 1 1 43 S 2 10 TQRMNOS… 2019 3 1 2 1 43 S 2 # ℹ 57,219 more rows # ℹ 168 more variables: PONDERA <int>, CH03 <int>, CH04 <int>, CH05 <fct>, # CH06 <int>, CH07 <int>, CH08 <int>, CH09 <int>, CH10 <int>, CH11 <int>, # CH12 <int>, CH13 <int>, CH14 <chr>, CH15 <int>, CH15_COD <int>, CH16 <int>, # CH16_COD <int>, NIVEL_ED <int>, ESTADO <int>, CAT_OCUP <int>, # CAT_INAC <int>, IMPUTA <int>, PP02C1 <int>, PP02C2 <int>, PP02C3 <int>, # PP02C4 <int>, PP02C5 <int>, PP02C6 <int>, PP02C7 <int>, PP02C8 <int>, … ``` ] --- count: false # _group_by()_ .panel1-group_by_1-auto[ ```r library(questionr) b_eph_ind %>% * group_by(CH04) ``` ] .panel2-group_by_1-auto[ ``` # A tibble: 57,229 × 177 # Groups: CH04 [2] CODUSU ANO4 TRIMESTRE NRO_HOGAR COMPONENTE H15 REGION MAS_500 AGLOMERADO <fct> <int> <int> <int> <int> <int> <int> <fct> <int> 1 TQRMNOQ… 2019 3 1 1 1 43 S 2 2 TQRMNOQ… 2019 3 1 2 1 43 S 2 3 TQRMNOQ… 2019 3 1 3 1 43 S 2 4 TQRMNOQ… 2019 3 1 4 1 43 S 2 5 TQRMNOQ… 2019 3 1 2 1 43 S 2 6 TQRMNOQ… 2019 3 1 3 0 43 S 2 7 TQRMNOQ… 2019 3 1 4 
0 43 S 2 8 TQRMNOQ… 2019 3 1 5 0 43 S 2 9 TQRMNOS… 2019 3 1 1 1 43 S 2 10 TQRMNOS… 2019 3 1 2 1 43 S 2 # ℹ 57,219 more rows # ℹ 168 more variables: PONDERA <int>, CH03 <int>, CH04 <int>, CH05 <fct>, # CH06 <int>, CH07 <int>, CH08 <int>, CH09 <int>, CH10 <int>, CH11 <int>, # CH12 <int>, CH13 <int>, CH14 <chr>, CH15 <int>, CH15_COD <int>, CH16 <int>, # CH16_COD <int>, NIVEL_ED <int>, ESTADO <int>, CAT_OCUP <int>, # CAT_INAC <int>, IMPUTA <int>, PP02C1 <int>, PP02C2 <int>, PP02C3 <int>, # PP02C4 <int>, PP02C5 <int>, PP02C6 <int>, PP02C7 <int>, PP02C8 <int>, … ``` ] --- count: false # _group_by()_ .panel1-group_by_1-auto[ ```r library(questionr) b_eph_ind %>% group_by(CH04) %>% * summarise(cant_pob_tot = sum(PONDERA), * cant_ocupados = sum(PONDERA[ESTADO == 1]), * min_ingr_oc_princ = min(P21), * max_ingr_oc_princ = max(P21), * ingr_oc_princ_media = wtd.mean(x = P21, # Paquete questionr * weights = PONDERA)) ``` ] .panel2-group_by_1-auto[ ``` # A tibble: 2 × 6 CH04 cant_pob_tot cant_ocupados min_ingr_oc_princ max_ingr_oc_princ <int> <int> <int> <int> <int> 1 1 13528065 6793308 -9 540000 2 2 14461063 5140195 -9 300000 # ℹ 1 more variable: ingr_oc_princ_media <dbl> ``` ] <style> .panel1-group_by_1-auto { color: black; width: 42.4666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-group_by_1-auto { color: black; width: 55.5333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-group_by_1-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- # Step by Step <img src="https://media.tenor.com/images/6c8cf7404cd3fdc8f518221899116825/tenor.gif" width="60%" style="display: block; margin: auto;" /> --- # **Case** ### - **Indicator 1:** *Main labor market rates for the CABA and Partidos del GBA agglomerations* ### - **Indicator 2:** *Indicator 1 by the __sex__ and __age__ of the respondents.* --
According to the [**Diseño de registro**](https://www.indec.gob.ar/ftp/cuadros/menusuperior/eph/EPH_registro_3t19.pdf) (the record layout), the variables we will work with are: - **Agglomeration of residence** = `AGLOMERADO` - **Activity status** = `ESTADO` - **Sex** = `CH04` - **Age** = `CH06` - **Weighting factor** = `PONDERA` --- count: false # _group_by()_ .panel1-group_by_2-auto[ ```r *b_eph_ind ``` ] .panel2-group_by_2-auto[ ``` # A tibble: 57,229 × 177 CODUSU ANO4 TRIMESTRE NRO_HOGAR COMPONENTE H15 REGION MAS_500 AGLOMERADO <fct> <int> <int> <int> <int> <int> <int> <fct> <int> 1 TQRMNOQ… 2019 3 1 1 1 43 S 2 2 TQRMNOQ… 2019 3 1 2 1 43 S 2 3 TQRMNOQ… 2019 3 1 3 1 43 S 2 4 TQRMNOQ… 2019 3 1 4 1 43 S 2 5 TQRMNOQ… 2019 3 1 2 1 43 S 2 6 TQRMNOQ… 2019 3 1 3 0 43 S 2 7 TQRMNOQ… 2019 3 1 4 0 43 S 2 8 TQRMNOQ… 2019 3 1 5 0 43 S 2 9 TQRMNOS… 2019 3 1 1 1 43 S 2 10 TQRMNOS… 2019 3 1 2 1 43 S 2 # ℹ 57,219 more rows # ℹ 168 more variables: PONDERA <int>, CH03 <int>, CH04 <int>, CH05 <fct>, # CH06 <int>, CH07 <int>, CH08 <int>, CH09 <int>, CH10 <int>, CH11 <int>, # CH12 <int>, CH13 <int>, CH14 <chr>, CH15 <int>, CH15_COD <int>, CH16 <int>, # CH16_COD <int>, NIVEL_ED <int>, ESTADO <int>, CAT_OCUP <int>, # CAT_INAC <int>, IMPUTA <int>, PP02C1 <int>, PP02C2 <int>, PP02C3 <int>, # PP02C4 <int>, PP02C5 <int>, PP02C6 <int>, PP02C7 <int>, PP02C8 <int>, … ``` ] --- count: false # _group_by()_ .panel1-group_by_2-auto[ ```r b_eph_ind %>% * select(AGLOMERADO, CH04, CH06, ESTADO, P21, PONDERA) ``` ] .panel2-group_by_2-auto[ ``` # A tibble: 57,229 × 6 AGLOMERADO CH04 CH06 ESTADO P21 PONDERA <int> <int> <int> <int> <int> <int> 1 2 1 56 1 28000 547 2 2 2 46 1 9500 547 3 2 2 20 1 -9 547 4 2 1 12 3 0 547 5 2 2 38 1 -9 584 6 2 2 7 4 0 584 7 2 1 5 4 0 584 8 2 1 3 4 0 584 9 2 1 54 1 -9 584 10 2 1 19 1 0 584 # ℹ 57,219 more rows ``` ] --- count: false # _group_by()_ .panel1-group_by_2-auto[ ```r b_eph_ind %>% select(AGLOMERADO, CH04, CH06, ESTADO, P21, PONDERA) %>% * mutate(edad_rango = case_when(CH06 %in% c(0:18) ~ "0 a 18", * CH06 %in% 
c(19:29) ~ "19 a 29", * CH06 %in% c(30:39) ~ "30 a 39", * CH06 %in% c(40:49) ~ "40 a 49", * CH06 %in% c(50:59) ~ "50 a 59", * CH06 >= 60 ~ "60 o más"), * sexo = case_when(CH04 == 1 ~ "Varón", * CH04 == 2 ~ "Mujer")) ``` ] .panel2-group_by_2-auto[ ``` # A tibble: 57,229 × 8 AGLOMERADO CH04 CH06 ESTADO P21 PONDERA edad_rango sexo <int> <int> <int> <int> <int> <int> <chr> <chr> 1 2 1 56 1 28000 547 50 a 59 Varón 2 2 2 46 1 9500 547 40 a 49 Mujer 3 2 2 20 1 -9 547 19 a 29 Mujer 4 2 1 12 3 0 547 0 a 18 Varón 5 2 2 38 1 -9 584 30 a 39 Mujer 6 2 2 7 4 0 584 0 a 18 Mujer 7 2 1 5 4 0 584 0 a 18 Varón 8 2 1 3 4 0 584 0 a 18 Varón 9 2 1 54 1 -9 584 50 a 59 Varón 10 2 1 19 1 0 584 19 a 29 Varón # ℹ 57,219 more rows ``` ] --- count: false # _group_by()_ .panel1-group_by_2-auto[ ```r b_eph_ind %>% select(AGLOMERADO, CH04, CH06, ESTADO, P21, PONDERA) %>% mutate(edad_rango = case_when(CH06 %in% c(0:18) ~ "0 a 18", CH06 %in% c(19:29) ~ "19 a 29", CH06 %in% c(30:39) ~ "30 a 39", CH06 %in% c(40:49) ~ "40 a 49", CH06 %in% c(50:59) ~ "50 a 59", CH06 >= 60 ~ "60 o más"), sexo = case_when(CH04 == 1 ~ "Varón", CH04 == 2 ~ "Mujer")) %>% * filter(AGLOMERADO %in% c(32, 33)) ``` ] .panel2-group_by_2-auto[ ``` # A tibble: 10,097 × 8 AGLOMERADO CH04 CH06 ESTADO P21 PONDERA edad_rango sexo <int> <int> <int> <int> <int> <int> <chr> <chr> 1 32 2 49 1 30000 1031 40 a 49 Mujer 2 32 1 9 4 0 1031 0 a 18 Varón 3 32 2 81 3 0 1031 60 o más Mujer 4 32 1 72 1 0 1234 60 o más Varón 5 32 2 73 1 20000 1234 60 o más Mujer 6 32 1 28 3 0 1234 19 a 29 Varón 7 32 2 69 3 0 640 60 o más Mujer 8 32 2 87 3 0 1923 60 o más Mujer 9 32 1 40 1 -9 2424 40 a 49 Varón 10 32 2 41 1 -9 2424 40 a 49 Mujer # ℹ 10,087 more rows ``` ] --- count: false # _group_by()_ .panel1-group_by_2-auto[ ```r b_eph_ind %>% select(AGLOMERADO, CH04, CH06, ESTADO, P21, PONDERA) %>% mutate(edad_rango = case_when(CH06 %in% c(0:18) ~ "0 a 18", CH06 %in% c(19:29) ~ "19 a 29", CH06 %in% c(30:39) ~ "30 a 39", CH06 %in% c(40:49) ~ "40 a 49", CH06 %in% 
c(50:59) ~ "50 a 59", CH06 >= 60 ~ "60 o más"), sexo = case_when(CH04 == 1 ~ "Varón", CH04 == 2 ~ "Mujer")) %>% filter(AGLOMERADO %in% c(32, 33)) %>% * group_by(sexo, edad_rango) ``` ] .panel2-group_by_2-auto[ ``` # A tibble: 10,097 × 8 # Groups: sexo, edad_rango [14] AGLOMERADO CH04 CH06 ESTADO P21 PONDERA edad_rango sexo <int> <int> <int> <int> <int> <int> <chr> <chr> 1 32 2 49 1 30000 1031 40 a 49 Mujer 2 32 1 9 4 0 1031 0 a 18 Varón 3 32 2 81 3 0 1031 60 o más Mujer 4 32 1 72 1 0 1234 60 o más Varón 5 32 2 73 1 20000 1234 60 o más Mujer 6 32 1 28 3 0 1234 19 a 29 Varón 7 32 2 69 3 0 640 60 o más Mujer 8 32 2 87 3 0 1923 60 o más Mujer 9 32 1 40 1 -9 2424 40 a 49 Varón 10 32 2 41 1 -9 2424 40 a 49 Mujer # ℹ 10,087 more rows ``` ] --- count: false # _group_by()_ .panel1-group_by_2-auto[ ```r b_eph_ind %>% select(AGLOMERADO, CH04, CH06, ESTADO, P21, PONDERA) %>% mutate(edad_rango = case_when(CH06 %in% c(0:18) ~ "0 a 18", CH06 %in% c(19:29) ~ "19 a 29", CH06 %in% c(30:39) ~ "30 a 39", CH06 %in% c(40:49) ~ "40 a 49", CH06 %in% c(50:59) ~ "50 a 59", CH06 >= 60 ~ "60 o más"), sexo = case_when(CH04 == 1 ~ "Varón", CH04 == 2 ~ "Mujer")) %>% filter(AGLOMERADO %in% c(32, 33)) %>% group_by(sexo, edad_rango) %>% * summarise(cant_pob_tot = sum(PONDERA), * cant_ocupados = sum(PONDERA[ESTADO == 1]), * min_ingr_oc_princ = min(P21), * max_ingr_oc_princ = max(P21), * ingr_oc_princ_media = wtd.mean(x = P21, # Paquete questionr * weights = PONDERA)) ``` ] .panel2-group_by_2-auto[ ``` # A tibble: 14 × 7 # Groups: sexo [2] sexo edad_rango cant_pob_tot cant_ocupados min_ingr_oc_princ <chr> <chr> <int> <int> <int> 1 Mujer 0 a 18 1946718 17926 -9 2 Mujer 19 a 29 1192959 517320 -9 3 Mujer 30 a 39 1039620 637976 -9 4 Mujer 40 a 49 1076082 766799 -9 5 Mujer 50 a 59 817229 511513 -9 6 Mujer 60 o más 1692597 320630 -9 7 Mujer <NA> 67672 0 0 8 Varón 0 a 18 2113559 47708 -9 9 Varón 19 a 29 1252010 808136 -9 10 Varón 30 a 39 975293 858522 -9 11 Varón 40 a 49 1017797 895313 -9 12 Varón 50 a 59 
772758 671746 -9 13 Varón 60 o más 1229724 491218 -9 14 Varón <NA> 88090 0 0 # ℹ 2 more variables: max_ingr_oc_princ <int>, ingr_oc_princ_media <dbl> ``` ] <style> .panel1-group_by_2-auto { color: black; width: 55.5333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-group_by_2-auto { color: black; width: 42.4666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-group_by_2-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- class: middle, center, inverse <img src="../img/logo tidyr.png" width="30%" style="display: block; margin: auto;" /> --- # tidyr package functions: <br><br> <br><br> | __Function__ | __Action__ | | :--- | ---: | | `pivot_longer()` | *Turns several columns into rows*| | `pivot_wider()` | *Turns several rows into columns*| --- # Data structure <br> .pull-left[ <img src="../img/dato_ancho.png" width="80%" style="display: block; margin: auto;" /> ] .pull-right[ <img src="../img/dato_largo.png" width="80%" style="display: block; margin: auto;" /> ] --- class: inverse, middle, center # _pivot_longer()_ *** _<p style="color:grey;" align:"center">Restructures the dataset by stacking several columns into one. From wide to long</p>_
--- count: false # _pivot_longer()_ .panel1-pivot_longer_1-auto[ ```r *b_eph_ind ``` ] .panel2-pivot_longer_1-auto[ ``` # A tibble: 57,229 × 177 CODUSU ANO4 TRIMESTRE NRO_HOGAR COMPONENTE H15 REGION MAS_500 AGLOMERADO <fct> <int> <int> <int> <int> <int> <int> <fct> <int> 1 TQRMNOQ… 2019 3 1 1 1 43 S 2 2 TQRMNOQ… 2019 3 1 2 1 43 S 2 3 TQRMNOQ… 2019 3 1 3 1 43 S 2 4 TQRMNOQ… 2019 3 1 4 1 43 S 2 5 TQRMNOQ… 2019 3 1 2 1 43 S 2 6 TQRMNOQ… 2019 3 1 3 0 43 S 2 7 TQRMNOQ… 2019 3 1 4 0 43 S 2 8 TQRMNOQ… 2019 3 1 5 0 43 S 2 9 TQRMNOS… 2019 3 1 1 1 43 S 2 10 TQRMNOS… 2019 3 1 2 1 43 S 2 # ℹ 57,219 more rows # ℹ 168 more variables: PONDERA <int>, CH03 <int>, CH04 <int>, CH05 <fct>, # CH06 <int>, CH07 <int>, CH08 <int>, CH09 <int>, CH10 <int>, CH11 <int>, # CH12 <int>, CH13 <int>, CH14 <chr>, CH15 <int>, CH15_COD <int>, CH16 <int>, # CH16_COD <int>, NIVEL_ED <int>, ESTADO <int>, CAT_OCUP <int>, # CAT_INAC <int>, IMPUTA <int>, PP02C1 <int>, PP02C2 <int>, PP02C3 <int>, # PP02C4 <int>, PP02C5 <int>, PP02C6 <int>, PP02C7 <int>, PP02C8 <int>, … ``` ] --- count: false # _pivot_longer()_ .panel1-pivot_longer_1-auto[ ```r b_eph_ind %>% * group_by(CH04) ``` ] .panel2-pivot_longer_1-auto[ ``` # A tibble: 57,229 × 177 # Groups: CH04 [2] CODUSU ANO4 TRIMESTRE NRO_HOGAR COMPONENTE H15 REGION MAS_500 AGLOMERADO <fct> <int> <int> <int> <int> <int> <int> <fct> <int> 1 TQRMNOQ… 2019 3 1 1 1 43 S 2 2 TQRMNOQ… 2019 3 1 2 1 43 S 2 3 TQRMNOQ… 2019 3 1 3 1 43 S 2 4 TQRMNOQ… 2019 3 1 4 1 43 S 2 5 TQRMNOQ… 2019 3 1 2 1 43 S 2 6 TQRMNOQ… 2019 3 1 3 0 43 S 2 7 TQRMNOQ… 2019 3 1 4 0 43 S 2 8 TQRMNOQ… 2019 3 1 5 0 43 S 2 9 TQRMNOS… 2019 3 1 1 1 43 S 2 10 TQRMNOS… 2019 3 1 2 1 43 S 2 # ℹ 57,219 more rows # ℹ 168 more variables: PONDERA <int>, CH03 <int>, CH04 <int>, CH05 <fct>, # CH06 <int>, CH07 <int>, CH08 <int>, CH09 <int>, CH10 <int>, CH11 <int>, # CH12 <int>, CH13 <int>, CH14 <chr>, CH15 <int>, CH15_COD <int>, CH16 <int>, # CH16_COD <int>, NIVEL_ED <int>, ESTADO <int>, CAT_OCUP 
<int>, # CAT_INAC <int>, IMPUTA <int>, PP02C1 <int>, PP02C2 <int>, PP02C3 <int>, # PP02C4 <int>, PP02C5 <int>, PP02C6 <int>, PP02C7 <int>, PP02C8 <int>, … ``` ] --- count: false # _pivot_longer()_ .panel1-pivot_longer_1-auto[ ```r b_eph_ind %>% group_by(CH04) %>% * summarise(cant_pob_tot = sum(PONDERA), * cant_ocupados = sum(PONDERA[ESTADO == 1]), * min_ingr_oc_princ = min(P21), * max_ingr_oc_princ = max(P21), * ingr_oc_princ_media = wtd.mean(x = P21, # Paquete questionr * weights = PONDERA)) ``` ] .panel2-pivot_longer_1-auto[ ``` # A tibble: 2 × 6 CH04 cant_pob_tot cant_ocupados min_ingr_oc_princ max_ingr_oc_princ <int> <int> <int> <int> <int> 1 1 13528065 6793308 -9 540000 2 2 14461063 5140195 -9 300000 # ℹ 1 more variable: ingr_oc_princ_media <dbl> ``` ] --- count: false # _pivot_longer()_ .panel1-pivot_longer_1-auto[ ```r b_eph_ind %>% group_by(CH04) %>% summarise(cant_pob_tot = sum(PONDERA), cant_ocupados = sum(PONDERA[ESTADO == 1]), min_ingr_oc_princ = min(P21), max_ingr_oc_princ = max(P21), ingr_oc_princ_media = wtd.mean(x = P21, # Paquete questionr weights = PONDERA)) %>% * select(CH04, cant_ocupados, ingr_oc_princ_media) ``` ] .panel2-pivot_longer_1-auto[ ``` # A tibble: 2 × 3 CH04 cant_ocupados ingr_oc_princ_media <int> <int> <dbl> 1 1 6793308 10805. 2 2 5140195 5896. ``` ] --- count: false # _pivot_longer()_ .panel1-pivot_longer_1-auto[ ```r b_eph_ind %>% group_by(CH04) %>% summarise(cant_pob_tot = sum(PONDERA), cant_ocupados = sum(PONDERA[ESTADO == 1]), min_ingr_oc_princ = min(P21), max_ingr_oc_princ = max(P21), ingr_oc_princ_media = wtd.mean(x = P21, # Paquete questionr weights = PONDERA)) %>% select(CH04, cant_ocupados, ingr_oc_princ_media) %>% * pivot_longer(cols = c(cant_ocupados, ingr_oc_princ_media), #<< * names_to = "variable", * values_to = "valor") ``` ] .panel2-pivot_longer_1-auto[ ``` # A tibble: 4 × 3 CH04 variable valor <int> <chr> <dbl> 1 1 cant_ocupados 6793308 2 1 ingr_oc_princ_media 10805. 
3 2 cant_ocupados 5140195 4 2 ingr_oc_princ_media 5896. ``` ] <style> .panel1-pivot_longer_1-auto { color: black; width: 42.4666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-pivot_longer_1-auto { color: black; width: 55.5333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-pivot_longer_1-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- class: inverse, middle, center # _pivot_wider()_ *** _<p style="color:grey;" align:"center">Restructures the dataset by spreading the rows of one variable across several columns. From long to wide</p>_ --- count: false # _pivot_wider()_ .panel1-pivot_wider_1-auto[ ```r *base_largo ``` ] .panel2-pivot_wider_1-auto[ ``` # A tibble: 4 × 3 CH04 variable valor <int> <chr> <dbl> 1 1 cant_ocupados 6793308 2 1 ingr_oc_princ_media 10805. 3 2 cant_ocupados 5140195 4 2 ingr_oc_princ_media 5896. ``` ] --- count: false # _pivot_wider()_ .panel1-pivot_wider_1-auto[ ```r base_largo %>% * pivot_wider(names_from = "variable", #<< * values_from = "valor") ``` ] .panel2-pivot_wider_1-auto[ ``` # A tibble: 2 × 3 CH04 cant_ocupados ingr_oc_princ_media <int> <dbl> <dbl> 1 1 6793308 10805. 2 2 5140195 5896. ``` ] <style> .panel1-pivot_wider_1-auto { color: black; width: 42.4666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-pivot_wider_1-auto { color: black; width: 55.5333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-pivot_wider_1-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style>
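---

# Sketch: summarise, pivot_longer and pivot_wider on toy data

The `summarise()` outputs in this deck report a minimum `P21` of `-9`, and `base_largo` is used in the `pivot_wider()` slides without its definition being shown. The sketch below is one possible way to handle both points, not the deck's own code. It assumes that `-9` in `P21` is a non-response code (check the Diseño de registro before relying on this), uses a small made-up table `toy_eph` instead of `b_eph_ind`, and swaps `questionr::wtd.mean()` for base R's `weighted.mean()` so that it runs with only `dplyr` and `tidyr`. The names `toy_eph`, `resumen`, `ocupados_con_ingreso` and `base_ancho` are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data that reuses the EPH column names (values are invented)
toy_eph <- tibble(
  CH04    = c(1, 1, 1, 2, 2, 2),
  ESTADO  = c(1, 1, 3, 1, 1, 4),
  P21     = c(28000, -9, 0, 9500, 20000, 0),
  PONDERA = c(100, 200, 100, 100, 300, 100)
)

# Keep employed people (ESTADO == 1) with a declared positive income.
# Assumption: P21 == -9 marks non-response, so it is dropped here.
resumen <- toy_eph %>%
  filter(ESTADO == 1, P21 > 0) %>%
  group_by(CH04) %>%
  summarise(ocupados_con_ingreso = sum(PONDERA),
            ingr_oc_princ_media  = weighted.mean(P21, w = PONDERA),
            .groups = "drop")   # return an ungrouped table

# Wide to long: stack the two indicator columns into variable / valor
base_largo <- resumen %>%
  pivot_longer(cols = c(ocupados_con_ingreso, ingr_oc_princ_media),
               names_to = "variable",
               values_to = "valor")

# Long to wide: spread the indicators back into one column each
base_ancho <- base_largo %>%
  pivot_wider(names_from = "variable",
              values_from = "valor")
```

Because the non-response rows are filtered out before `summarise()`, the weighted mean is computed only over declared incomes. Whether that is the right treatment depends on the analysis, so it is shown here only as one option.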