I'm trying to run Gemma-2b on my computer with 16GB of RAM, but it fails with the error below:
2024-03-19 16:23:38.722556: I external/local_tsl/tsl/framework/bfc_allocator.cc:1112] Sum Total of in-use chunks: 4.78GiB
2024-03-19 16:23:38.722568: I external/local_tsl/tsl/framework/bfc_allocator.cc:1114] Total bytes in pool: 8589934592 memory_limit_: 16352886784 available bytes: 7762952192 curr_region_allocation_bytes_: 8589934592
2024-03-19 16:23:38.722589: I external/local_tsl/tsl/framework/bfc_allocator.cc:1119] Stats:
Limit:            16352886784
InUse:             5129633792
MaxInUse:          6157238272
NumAllocs:                509
MaxAllocSize:      1073741824
Reserved:                   0
PeakReserved:               0
LargestFreeBlock:           0
2024-03-19 16:23:38.722623: W external/local_tsl/tsl/framework/bfc_allocator.cc:499] ************************_______________________________________*************************************
...
tensorflow.python.framework.errors_impl.ResourceExhaustedError: Out of memory while trying to allocate 11685516680 bytes. [Op:__inference_generate_step_12720]
Also, I have around 160GB of swap.
$ swapon
NAME             TYPE      SIZE USED PRIO
/dev/nvme0n1p7   partition  60G   0B   -2
/media/.../ram   file      100G 768K    0
$ sysctl vm.swappiness
vm.swappiness = 100
I have more than enough swap to cover the memory the model needs, and I'm running it on the CPU, not the GPU. The strange part is that when I run the same code on my Mac with 16GB of RAM, it spills into swap and runs smoothly. Why can't Ubuntu manage memory the way macOS does here, and what can I do to make Ubuntu use the swap space so the code can run? On the Mac I even ran Gemma-7b, which used about 50GB of swap, without any trouble.
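For reference, this is roughly how I keep everything off the GPU (a minimal sketch using TensorFlow's standard device-visibility API; I include it only to show the GPU is definitely out of the picture, it is not part of the Gemma loading code below):

import tensorflow as tf

# Hide all GPUs from TensorFlow so every op is placed on the CPU.
tf.config.set_visible_devices([], 'GPU')

# Sanity check: this should print an empty list.
print(tf.config.get_visible_devices('GPU'))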
Here is the code (download the model first, then run it offline to reproduce the result):
import tensorflow as tf

tf.keras.backend.set_floatx('float16')

from keras_nlp.src.models import GemmaCausalLM
from keras_nlp.src.backend import keras


def read_config(path):
    import json
    with open(path) as config_file:
        return json.load(config_file)


if __name__ == '__main__':
    gemma_path = '/opt/gemma_2b'
    config_path = f'{gemma_path}/config.json'
    tokenizer_path = f'{gemma_path}/tokenizer.json'

    # Rebuild the backbone from the saved config and load its weights.
    config = read_config(config_path)
    tokenizer_config = read_config(tokenizer_path)
    cls = keras.saving.get_registered_object(config["registered_name"])
    backbone = keras.saving.deserialize_keras_object(config)
    backbone.load_weights(f'{gemma_path}/model.weights.h5')

    # Rebuild the tokenizer and wrap everything into a causal LM.
    tokenizer = keras.saving.deserialize_keras_object(tokenizer_config)
    tokenizer.load_assets(f'{gemma_path}/assets/tokenizer')
    preprocessor = GemmaCausalLM.preprocessor_cls(tokenizer=tokenizer)
    gemma_lm = GemmaCausalLM(backbone=backbone, preprocessor=preprocessor)

    response = gemma_lm.generate(["Keras is a"], max_length=10)
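To see how far memory gets before the allocator gives up, I watch RAM and swap from a second terminal while generate() runs (a rough sketch; it assumes psutil is installed, which is not part of the script above):

import time
import psutil

# Print RAM and swap usage once per second; run this in a separate
# process while the Gemma script is generating.
while True:
    vm = psutil.virtual_memory()
    sw = psutil.swap_memory()
    print(f"RAM used: {vm.used / 2**30:.1f} GiB / {vm.total / 2**30:.1f} GiB | "
          f"swap used: {sw.used / 2**30:.1f} GiB / {sw.total / 2**30:.1f} GiB")
    time.sleep(1)

Running the same watcher on both machines makes it easy to compare how Ubuntu and macOS behave while the model loads and generates.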
Thanks.