opened 02:40AM - 17 Mar 20 UTC
closed 02:51AM - 24 Dec 21 UTC
Labels: stat:awaiting response, comp:data, type:performance, TF 2.7
**System information**
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04.3 LTS
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
- TensorFlow installed from (source or binary): binary
- TensorFlow version (use command below): v2.1.0-rc2-17-ge5bf8de 2.1.0
- Python version: Python 3.6.6
- Bazel version (if compiling from source):
- GCC/Compiler version (if compiling from source):
- CUDA/cuDNN version: CUDA 10.1 / cuDNN for CUDA 10.1
- GPU model and memory: TITAN RTX, 24190 MiB
**Describe the current behavior**
`tf.data.Dataset.from_generator` leaks memory on every call, even when the dataset is deleted and `gc.collect()` is called afterwards.
**Describe the expected behavior**
Memory should be released once no references to the dataset remain.
**Standalone code to reproduce the issue**
```
import gc
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"  # run on CPU so GPU memory is not a factor
import tensorflow as tf
import tracemalloc
import linecache


def display_top(snapshot, key_type='lineno', limit=3):
    snapshot = snapshot.filter_traces((
        tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
        tracemalloc.Filter(False, "<unknown>"),
    ))
    top_stats = snapshot.statistics(key_type)
    print("Top %s lines" % limit)
    for index, stat in enumerate(top_stats[:limit], 1):
        frame = stat.traceback[0]
        # replace "/path/to/module/file.py" with "module/file.py"
        filename = os.sep.join(frame.filename.split(os.sep)[-2:])
        print("#%s: %s:%s: %.1f KiB"
              % (index, filename, frame.lineno, stat.size / 1024))
        line = linecache.getline(frame.filename, frame.lineno).strip()
        if line:
            print('    %s' % line)
    other = top_stats[limit:]
    if other:
        size = sum(stat.size for stat in other)
        print("%s other: %.1f KiB" % (len(other), size / 1024))
    total = sum(stat.size for stat in top_stats)
    print("Total allocated size: %.1f KiB" % (total / 1024))


def generator():
    # The generator is never actually iterated in this repro; the leak comes
    # from constructing the dataset itself.
    yield tf.zeros([2], dtype=tf.int32)


tracemalloc.start()
for i in range(1000):
    dataset = tf.data.Dataset.from_generator(
        generator, output_types=tf.int32, output_shapes=[None])
    del dataset    # drop the only reference to the dataset
    gc.collect()   # force a collection; allocations still accumulate
    snapshot = tracemalloc.take_snapshot()
    display_top(snapshot)
```
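For a cross-check that does not depend on `tracemalloc`, process-level memory can be tracked around the same loop. The sketch below is an illustrative addition rather than part of the original report; it assumes `psutil` is installed and simply records the resident set size before and after the loop.

```
# Illustrative cross-check (not from the original report): measure process RSS
# growth around repeated from_generator calls. Assumes `pip install psutil`.
import gc
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

import psutil
import tensorflow as tf


def generator():
    yield tf.zeros([2], dtype=tf.int32)


process = psutil.Process(os.getpid())
rss_before = process.memory_info().rss

for i in range(1000):
    dataset = tf.data.Dataset.from_generator(
        generator, output_types=tf.int32, output_shapes=[None])
    del dataset
    gc.collect()

rss_after = process.memory_info().rss
# If from_generator released its per-call state, this delta should stay close
# to zero; a steadily growing delta points at the same leak.
print("RSS growth: %.1f KiB" % ((rss_after - rss_before) / 1024))
```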
**Other info / logs**
```
Top 3 lines
#1: python3.6/_weakrefset.py:84: 159.5 KiB
self.data.add(ref(item, self._remove))
#2: python3.6/_weakrefset.py:37: 38.2 KiB
self.data = set()
#3: python3.6/_weakrefset.py:48: 32.4 KiB
self._iterating = set()
461 other: 306.4 KiB
Total allocated size: 536.4 KiB
Top 3 lines
#1: python3.6/_weakrefset.py:84: 159.5 KiB
self.data.add(ref(item, self._remove))
#2: python3.6/_weakrefset.py:37: 38.2 KiB
self.data = set()
#3: python3.6/_weakrefset.py:48: 32.4 KiB
self._iterating = set()
516 other: 343.1 KiB
Total allocated size: 573.1 KiB
...
Top 3 lines
#1: python3.6/weakref.py:335: 257.8 KiB
self = ref.__new__(type, ob, callback)
#2: debug/tf_dataset_memory_leak.py:45: 189.7 KiB
dataset = tf.data.Dataset.from_generator(generator, output_types=tf.int32, output_shapes=[None])
#3: ops/script_ops.py:257: 174.7 KiB
return "pyfunc_%d" % uid
519 other: 2423.3 KiB
Total allocated size: 3045.5 KiB
```
The script above leaks about 3 MB over 1000 calls. In [some real projects](https://github.com/hankcs/HanLP/issues/1437) the leak can grow to 5 GB and keeps increasing.
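The traceback entries above (the weakref sets and the `pyfunc_%d` name in `ops/script_ops.py`) suggest that state registered per `from_generator` call is not released. A possible mitigation, offered here only as a sketch and not as a confirmed fix, is to call `from_generator` once and feed new data through a mutable source that the single generator closes over; the `source` container below is a hypothetical name introduced for illustration.

```
import tensorflow as tf

# Hypothetical workaround sketch: one from_generator call, reused many times.
source = {"data": None}  # mutable container the generator closes over


def generator():
    yield source["data"]


# Build the dataset once, outside the loop, so per-call state is registered
# only a single time.
dataset = tf.data.Dataset.from_generator(
    generator, output_types=tf.int32, output_shapes=[None])

for i in range(1000):
    source["data"] = tf.zeros([2], dtype=tf.int32)  # swap in fresh data
    for batch in dataset:  # creating a new iterator re-runs the generator
        pass               # consume the element as usual
```

This sidesteps the repeated registration rather than fixing the underlying leak, so whether it is acceptable depends on the application.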