dbConnect() vs tbl() for reading data from AWS Redshift

  • 29 April 2022
  • 2 replies
  • 47 views

Userlevel 3
Badge +2

Seems like to read data from Redshift I have (at least) 2 options:

 

DBI::dbConnect(connection, ‘table-name’)

vs

dplyr::tbl(connection, ‘table-name’)

 

The latter doesn’t read the whole table into memory and so you can perform all sorts of dplyr goodness on it before using collect() to read it into memory.

Why would you ever use dbConnect() in that case?


2 replies

Userlevel 4
Badge +2

You’ve opened my eyes to

dplyr::tbl

Being able to read and manipulate before committing to memory is extremely useful for monster data frames!

Userlevel 6
Badge +2

I use dbConnect with dbplyr which is a great way of not reading everything into memory too! So you can set it up like this:

library(dplyr); library(dbplyr); library(DBI);


conn ← dbConnect(odbc::odbc(), ‘database-name’)

table ← tbl(conn, in_schema(‘schema’, ‘table’))

And then the table isn’t a table with all the data in it, it just points to the table in redshift. And if you manipulate it and want to then pull it into memory you can use the collect() function.

Reply