Java – load data larger than memory size in H2O
I am trying to load data larger than the memory available to H2O.
The H2O blog mentions, under "Bigger Data and GC": we do a user-mode swap-to-disk when the Java heap gets too full, i.e., when you are using more big data than physical DRAM. We won't die with a GC death spiral, but we will degrade to out-of-core speeds; we'll go only as fast as the disk allows. I have personally tested loading a 12 GB dataset into a 2 GB (32-bit) JVM; it took about 5 minutes to load the data and another 5 minutes to run a logistic regression.
Here is the R code connecting to H2O 3.6.0.8:
h2o.init(max_mem_size = '60m')  # allotting 60 MB to H2O; R is running on an 8 GB RAM machine
gives
java version "1.8.0_65"
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)

Successfully connected to http://127.0.0.1:54321/

R is connected to the H2O cluster:
    H2O cluster uptime:         2 seconds 561 milliseconds
    H2O cluster version:        3.6.0.8
    H2O cluster name:           H2O_started_from_R_RILITS-HWLTP_tkn816
    H2O cluster total nodes:    1
    H2O cluster total memory:   0.06 GB
    H2O cluster total cores:    4
    H2O cluster allowed cores:  2
    H2O cluster healthy:        TRUE

Note: As started, H2O is limited to the CRAN default of 2 CPUs.
      Shut down and restart H2O as shown below to use all your CPUs.
          > h2o.shutdown()
          > h2o.init(nthreads = -1)

IP Address: 127.0.0.1
Port      : 54321
Session ID: _sid_b2e0af0f0c62cd64a8fcdee65b244d75
Key Count : 3
I tried to load a 169 MB CSV into H2O:
dat.hex <- h2o.importFile('dat.csv')
which fails with:
Error in .h2o.__checkConnectionHealth() :
  H2O connection has been severed. Cannot connect to instance at http://127.0.0.1:54321/
  Failed to connect to 127.0.0.1 port 54321: Connection refused
This indicates an out-of-memory error: the undersized H2O JVM evidently died during the import, so the R client's connection was refused.
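As a sanity check that the file itself is fine, one can restart H2O with a heap that actually fits it. A minimal sketch; the 1 GB figure is an illustrative choice (H2O's usual guidance is a heap of roughly 4x the data size):

library(h2o)
h2o.shutdown(prompt = FALSE)           # stop the starved 60 MB instance
h2o.init(max_mem_size = '1g',          # heap comfortably larger than the 169 MB CSV
         nthreads = -1)                # also lifts the CRAN default of 2 CPUs
dat.hex <- h2o.importFile('dat.csv')   # the import now completes in memory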
Solution
Swap-to-disk is disabled by default, because its performance is so poor. The bleeding edge (not the latest stable release) has a flag to enable it: "--cleaner" (for "memory cleaner").
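A hedged sketch of trying it: launch the bleeding-edge build by hand with the flag, then attach the R client to the already-running instance. The jar path, heap size, and exact flag spelling are assumptions based on the note above and may differ between builds:

# in a shell, start H2O standalone with swap-to-disk enabled (assumed flag spelling):
#   java -Xmx2g -jar h2o.jar --cleaner
# then connect R to the running cluster instead of starting a new one:
library(h2o)
h2o.init()   # finds and attaches to the existing instance at localhost:54321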
Cliff