
Memory growth when converting a Pillow (8.3.1) Image to a NumPy array or an OpenCV image

Background introduction

Recently our project ran into a memory growth issue after upgrading Pillow from 8.2.0 to 8.3.1.

Memory consumption after upgrading Pillow (8.3.1):

[image: issue]

Memory consumption before upgrading Pillow (8.2.0):

[image: normal]

The new version of the library appears to cause a large increase in memory use. By the way, the issue cannot be reproduced before we build the project.

Solution

I spent a few days finding the root cause and a solution. The problem code turned out to be a Pillow image conversion.

At first I suspected a regression in Pillow, so I searched its GitHub issues:

https://github.com/python-pillow/Pillow/issues/2019

https://github.com/python-pillow/Pillow/issues/5180

Both threads indicate that the issues were fixed. I still tried pulling the fix code into our project, without luck.

To keep a long story short, I took two heap memory dumps and compared them to see which objects had grown the most. This greatly helped narrow the search scope.

[image: stack-trace]
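The post does not say which tool produced the heap dumps. As a rough sketch of the same "take two snapshots and diff them" idea, the standard-library `tracemalloc` module can be used instead. The `top_growth` helper and the `leaked` list are hypothetical names for illustration, and the list only stands in for the suspect code path.

```python
import tracemalloc


def top_growth(before, after, limit=5):
    """Return the source lines whose allocated size grew between two snapshots."""
    # compare_to() sorts by the absolute size difference, largest first.
    stats = after.compare_to(before, "lineno")
    return [stat for stat in stats if stat.size_diff > 0][:limit]


tracemalloc.start()
before = tracemalloc.take_snapshot()

# Stand-in for the suspect code path: keep some allocations alive.
leaked = [bytes(100_000) for _ in range(50)]

after = tracemalloc.take_snapshot()
growth = top_growth(before, after)
for stat in growth:
    print(stat)  # file, line number, size and size difference
tracemalloc.stop()
```

In a real service the two snapshots would be taken some time apart while the process handles traffic, and the top entries point at the lines to inspect first.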

The relevant code:

  np.array(pillow_image)  # optionally with dtype=...
  np.asarray(pillow_image)

Both of the conversions above leave the NumPy arrays alive until some later point at which Python finally releases the accumulated objects.

Then I checked the Pillow release notes; only one change relates directly to NumPy arrays. First I tried reverting that change on top of 8.3.1, which did not help. Then I tried downgrading to 8.3.0, which did not work either.

What now? I went through the Pillow, NumPy, and OpenCV APIs again and also read their source. An idea came up, and I wrote this workaround:

  import numpy as np
  numpy_array = np.frombuffer(pillow_image.tobytes(), dtype=np.uint8)

To convert the array into an OpenCV image:

  import cv2

  # Use 3 channels for RGB or 4 for RGBA, depending on the image mode.
  # pillow_image.size is (width, height); the array shape is (height, width, channels).
  numpy_array = numpy_array.reshape((*reversed(pillow_image.size), 3))

  # To get BGR channel order, call the OpenCV conversion.
  # COLOR_RGB2BGR requires a 3-channel (RGB) image.
  numpy_array = cv2.cvtColor(numpy_array, cv2.COLOR_RGB2BGR)

It works! Memory consumption is back to its normal level.

[image: fixed]

Several places in our code use this image conversion directly. We should consider consolidating that logic into a utility class and wrapping the third-party library to isolate its influence.
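One possible shape for such a wrapper is sketched below. It is not the project's actual utility. The name `pil_to_ndarray`, the `bgr` flag, and the `_CHANNELS` table are hypothetical, only 8-bit `L`, `RGB`, and `RGBA` images are assumed, and the RGB to BGR step uses NumPy slicing instead of `cv2.cvtColor` so the sketch depends only on NumPy and Pillow.

```python
import numpy as np
from PIL import Image

# Channels per pixel for the modes this sketch handles (assumption: 8-bit only).
_CHANNELS = {"L": 1, "RGB": 3, "RGBA": 4}


def pil_to_ndarray(image: Image.Image, bgr: bool = False) -> np.ndarray:
    """Convert a Pillow image to a NumPy array via tobytes(), as in the workaround."""
    if image.mode not in _CHANNELS:
        raise ValueError(f"unsupported image mode: {image.mode}")
    channels = _CHANNELS[image.mode]
    width, height = image.size  # Pillow reports (width, height)

    # The array is a read-only view over the bytes returned by tobytes().
    flat = np.frombuffer(image.tobytes(), dtype=np.uint8)
    if channels == 1:
        return flat.reshape(height, width)

    array = flat.reshape(height, width, channels)
    if bgr:
        if channels != 3:
            raise ValueError("BGR output requires an RGB image")
        # Reverse the channel order and copy so the result is contiguous.
        array = np.ascontiguousarray(array[:, :, ::-1])
    return array


# Usage: a 4x2 pure-red image becomes a (2, 4, 3) array.
red = Image.new("RGB", (4, 2), (255, 0, 0))
rgb_array = pil_to_ndarray(red)
bgr_array = pil_to_ndarray(red, bgr=True)
```

Keeping the conversion in one function means a future Pillow or NumPy change only has to be handled in a single place.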

Summary

Upgrading a third-party library is a risk to a product. This time we had to upgrade for a security reason. I also reviewed the Pillow changes and found several suspicious pieces of code that may cause the issue. They need time to test; I may post the results later.