Solved

Reading xlsx files in python

  • 13 April 2022
  • 1 reply
  • 36 views


I might be missing something obvious here but I can't find any info -- can anyone help me to read a .xlsx file from S3 to python? I know how to do it for csv files, but not xlsx.

Best answer by TomGrundy 13 April 2022, 13:48

Yes I can help!

The easiest way is to read the file into an io.BytesIO buffer and then pass that buffer to pandas, i.e.

import io
import pandas as pd
buff = io.BytesIO(source.read())
df = pd.read_excel(buff)
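
If you don't have a source object to hand, here is a rough sketch of the same buffer approach using boto3 directly. It assumes boto3, pandas and openpyxl are installed and that your AWS credentials are already configured; the function names, bucket and key are made up for illustration and are not part of any platform API.

```python
import io


def fetch_s3_object(client, bucket, key):
    """Download one S3 object and return its bytes in a seekable buffer."""
    response = client.get_object(Bucket=bucket, Key=key)
    # response["Body"] is a streaming body; read() pulls the whole file into memory.
    return io.BytesIO(response["Body"].read())


def read_xlsx_from_s3(bucket, key, **read_excel_kwargs):
    """Load an .xlsx object into a DataFrame (needs boto3, pandas and openpyxl)."""
    # Imported lazily so fetch_s3_object can be used without these packages.
    import boto3
    import pandas as pd

    buff = fetch_s3_object(boto3.client("s3"), bucket, key)
    return pd.read_excel(buff, **read_excel_kwargs)


# Hypothetical usage; the bucket and key below are placeholders:
# df = read_xlsx_from_s3("my-bucket", "reports/sales.xlsx", sheet_name=0)
```

Note that this reads the whole file into memory, which is normally fine for spreadsheets but worth keeping in mind for very large workbooks.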

You can also pass an S3 path (s3://bucket/key) to pandas directly, provided the s3fs package is installed. If you're using a source, you can grab its bucket and key and turn them into an S3 path.
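
A minimal sketch of the S3 path route, assuming pandas, s3fs and openpyxl are installed. The helper names are hypothetical, and how you get the bucket and key out of your source depends on your setup, so they are plain arguments here.

```python
def s3_url(bucket, key):
    """Join a bucket name and object key into an s3:// URL."""
    # Strip any leading slash on the key so the URL has a single separator.
    return f"s3://{bucket}/{key.lstrip('/')}"


def read_xlsx_from_s3_url(bucket, key, **read_excel_kwargs):
    """Let pandas fetch the object itself (it relies on s3fs for s3:// URLs)."""
    import pandas as pd  # imported lazily so s3_url works without pandas

    return pd.read_excel(s3_url(bucket, key), **read_excel_kwargs)


# Hypothetical usage; the bucket and key below are placeholders:
# df = read_xlsx_from_s3_url("my-bucket", "reports/sales.xlsx")
```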

FYI: pandas might complain and ask you to install an extra package to read the Excel file (for .xlsx that is usually openpyxl).
