After having no issues with ArangoDB3 for a couple of years, suddenly, I am encountering an AQL IO Error of the form
[HTTP 500][ERR 1305] AQL: IO error: While open a file for random read: /ssd1/arangodb3/engine-rocksdb/22850496.sst: No file descriptors available (while finalizing)
This is while performing an insert of the form
insert { id: "foo", junk: [ 1, 2, 3 ] } in bar
This occurred after running a lengthy operation populating a new database.
Looking at syslog, I see the following (timestamps, etc., elided for readability):
ERROR [fae2c] {rocksdb} RocksDB encountered a background error during a compaction operation: IO error: While open a file for random read: /ssd1/arangodb3/engine-rocksdb/22850496.sst: No file descriptors available; The database will be put in read-only mode, and subsequent write errors are likely. It is advised to shut down this instance, resolve the error offline and then restart it.
ERROR [be9ea] {rocksdb} rocksdb: [db/db_impl/db_impl_compaction_flush.cc:2922] Waiting after background compaction error: IO error: While open a file for random read: /ssd1/arangodb3/engine-rocksdb/22850496.sst: No file descriptors available, Accumulated background error counts: 1
WARNING [afa17] {engines} could not sync metadata for collection 'OpenAlex_20240502/works'
WARNING [a3d0c] {engines} background settings sync failed: IO error: While open a file for random read: /ssd1/arangodb3/engine-rocksdb/22850496.sst: No file descriptors available
WARNING [afa17] {engines} could not sync metadata for collection 'OpenAlex_20240502/publishers'
The first message above seems indicative of something but I'm not sure what.
The file in question, /ssd1/arangodb3/engine-rocksdb/22850496.sst, does not exist, which would be an obvious source of the problem, but I'm not sure how to cure it.
Restarting both ArangoDB and the system does not clear the problem.
There is more than enough space on the filesystem
/dev/nvme0n1p1 7.3T 4.6T 2.8T 63% /ssd1
so that's not an issue.
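Since the error text names file descriptors rather than disk space, the analogous check would be on descriptor limits. Below is a minimal sketch of how that could be inspected from Python on Linux. The helper names `fd_limits` and `count_open_fds` are made up for illustration, and the limit reported is that of the Python process itself; to look at the database server, its PID would have to be passed in (reading another user's `/proc/<pid>/fd` may need root).

```python
import os
import resource


def fd_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def count_open_fds(pid="self"):
    """Count the entries in /proc/<pid>/fd (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


if __name__ == "__main__":
    soft, hard = fd_limits()
    print(f"soft limit: {soft}, hard limit: {hard}")
    # /proc is Linux-specific, so only count descriptors when it exists.
    if os.path.isdir("/proc/self/fd"):
        print(f"descriptors currently open: {count_open_fds()}")
```

Comparing the count for the server process against its limit on both machines might show whether the two systems are configured differently.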
arangodb --version
reports
Arango DB Version 0.18.2, build 3518b68, Go go1.21.5
arangosh --version
reports
3.11.8
architecture: 64bit
arm: false
asan: false
assertions: false
avx: true
avx2: false
boost-version: 1.78.0
build-date: 2024-02-22 14:43:37
build-repository: refs/tags/v3.11.8 eb715d099fb
compiler: gcc [11.2.1 20220219]
coverage: false
cplusplus: 202002
curl-version: none
debug: false
endianness: little
failure-tests: false
fd-client-event-handler: poll
fd-setsize: 1024
full-version-string: ArangoDB 3.11.8 [linux] 64bit, using jemalloc, build refs/tags/v3.11.8 eb715d099fb, VPack 0.2.1, RocksDB 7.2.0, ICU 64.2, V8 7.9.317, OpenSSL 3.0.13 30 Jan 2024
icu-version: 64.2
ipo: true
iresearch-version: 1.3.0.0
jemalloc: true
libunwind: true
license: community
maintainer-mode: false
memory-profiler: true
ndebug: true
openssl-version-compile-time: OpenSSL 3.0.13 30 Jan 2024
openssl-version-run-time: OpenSSL 3.0.13 30 Jan 2024
optimization-flags: -mfxsr -mmmx -msse -msse2 -mcx16 -msahf -mpopcnt -msse3 -msse4.1 -msse4.2 -mssse3 -mpclmul -mavx -mxsave
pic: 2
pie: 2
platform: linux
reactor-type: epoll
replication2-enabled: false
rocksdb-version: 7.2.0
server-version: 3.11.8
sizeof int: 4
sizeof long: 8
sizeof void*: 8
sse42: true
tsan: false
unaligned-access: true
v8-version: 7.9.317
vpack-version: 0.2.1
zlib-version: 1.2.13
I'm running Ubuntu 23.10:
`Linux servername 6.5.0-28-generic #29-Ubuntu SMP PREEMPT_DYNAMIC Thu Mar 28 23:46:48 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux`
I have tried reinstalling ArangoDB to no avail. I have restarted the application from a checkpoint and it now immediately and consistently fails. Even a simple insert as above throws the same error.
The application, written in Python, is multithreaded, using the multiprocessing module, and there are 64 threads/processes all performing uploads.
I have the identical code running on another system and it happily runs to completion, so I'm puzzled as to what might be going sideways here.
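For reference, the upload layout described above looks roughly like the sketch below. This is a simplified stand-in, not the actual application: `make_batches`, `upload_batch`, and `run_uploads` are hypothetical names, and the worker only builds the bind variables instead of talking to the database. The query string is the bind-parameter form of the insert shown earlier.

```python
import multiprocessing as mp

# Bind-parameter form of the insert: @doc is the document,
# @@collection is the target collection.
AQL_INSERT = "INSERT @doc IN @@collection"


def make_batches(docs, n_workers):
    """Split docs round-robin into n_workers lists."""
    return [docs[i::n_workers] for i in range(n_workers)]


def upload_batch(batch):
    """Stand-in worker. A real worker would open its own database
    connection and execute AQL_INSERT once per document."""
    bind_vars = [{"doc": doc, "@collection": "bar"} for doc in batch]
    return len(bind_vars)


def run_uploads(docs, n_workers=64):
    """Fan the batches out over n_workers processes and return the
    total number of documents handled."""
    with mp.Pool(processes=n_workers) as pool:
        return sum(pool.map(upload_batch, make_batches(docs, n_workers)))
```

With 64 workers, each process would hold its own connection to the server, so the server sees 64 concurrent clients for the duration of the load.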