I found some old code on a legacy server that writes a dataset of roughly 20 columns and >60K rows to an Excel file with openpyxl, item by item. It does this by filling each cell on its own (the slow way, but not that slow either, since the whole thing is done after a few minutes, including formatting of the cells):
cell = ws.cell('%s%s' % (col_idx, i + 3))
cell.value = unicode(value).encode("utf-8")
get_style(cell, "content", column_colors[col_id])
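For context, the pattern boils down to something like this self-contained sketch (synthetic data, placeholder styling and file name, not the original script):

# Regular (non-write-only) workbook, one styled cell at a time.
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill

wb = Workbook()          # regular mode keeps every Cell object in memory
ws = wb.active

font = Font(name="Calibri", size=10)
fill = PatternFill(fill_type="solid", start_color="DDEEFF")

for i in range(60000):                    # the ~60K rows
    for col in range(1, 21):              # the ~20 columns
        cell = ws.cell(row=i + 3, column=col, value="value %d/%d" % (i, col))
        cell.font = font                  # per-cell formatting, like get_style()
        cell.fill = fill

wb.save("output.xlsx")   # memory has already peaked during the loop above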
At roughly 59K rows it crashes and the console prints:
Killed
The logs show:
Received SIGTERM, shutting down.
The SIGTERM hints at too little memory, so that the server kills the task "from outside". It is a virtual machine. Checking the memory with the command free
shows that all of the free 3.5 GB get used up by writing that small dataset (20 MB on the server) to Excel.
I checked it while it was running and found: about 500 MB just to set up the file, plus about 700 MB for every 10K rows. Writing the whole dataset therefore needs roughly:
6 x 700 + 500 = 4700 MB of free RAM, and the VM has only 3.5 GB, which should anyway be more than enough for such a small output: the resulting workbook is roughly 20 MB. The output Excel format is xlsx (Excel 2007/2010), since this is outdated legacy code running on Python 2.7.3.
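The back-of-the-envelope estimate from those observed figures, assuming the usage grows linearly with the rows:

# 500 MB base cost plus ~700 MB per 10,000 rows written (observed values)
rows = 60000
estimated_mb = 500 + (rows // 10000) * 700
print(estimated_mb)   # 4700 MB, more than the VM's 3.5 GB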
Why does writing a 20 MB workbook with Python's openpyxl module eat gigabytes of RAM?
It is known that openpyxl requires a lot of RAM for reading and writing Excel files:
According to the openpyxl documentation on performance, the expected RAM usage can be estimated from the size of the file.
In your case the value seems to be even higher; maybe your usage is different, or the figures from the documentation are outdated.
But the documentation also contains hints on how to optimize and thus reduce the RAM usage:
- Install the Python package lxml: if it is installed, openpyxl will use it automatically. It is especially recommended when writing "large" Excel files.
- Use write-only mode (see the sketch below).
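A minimal sketch of the write-only approach, assuming per-cell styling is still needed; it streams rows to disk instead of keeping the whole worksheet in memory. The file name, column count and the Font/PatternFill styling are placeholders, not the original get_style() logic:

from openpyxl import Workbook
from openpyxl.cell import WriteOnlyCell
from openpyxl.styles import Font, PatternFill

wb = Workbook(write_only=True)   # write-only workbook: rows are streamed to disk
ws = wb.create_sheet()           # write-only workbooks start without a sheet

font = Font(name="Calibri", size=10)
fill = PatternFill(fill_type="solid", start_color="DDEEFF")

for i in range(60000):           # the ~60K data rows
    row = []
    for col in range(20):        # the ~20 columns
        cell = WriteOnlyCell(ws, value="value %d/%d" % (i, col))
        cell.font = font         # style the cell before it is appended
        cell.fill = fill
        row.append(cell)
    ws.append(row)               # rows can only be appended, never revisited

wb.save("big_file.xlsx")

With lxml installed, openpyxl picks it up automatically, which helps most in exactly this streaming scenario. The trade-off of write-only mode is that rows must be written strictly in order and cells cannot be touched again after append(), so the per-cell formatting has to happen while each row is being built.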