Haskell – memory usage of lazy data types
I wrote a program to analyse and operate on data stored in a file. My first implementation uses Data.ByteString to read the file's contents and Data.Vector.Unboxed to convert those contents into a vector of samples. I then perform my processing and analysis on this (unboxed) vector of sample values.
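Schematically, this first version has roughly the following shape (a heavily simplified sketch with placeholder conversion and analysis steps; the real conversion is shown in the edit below):

    -- Simplified sketch of the strict/unboxed pipeline (not the actual code):
    -- read the whole file strictly, build an unboxed vector of samples,
    -- then analyse that vector.
    import qualified Data.ByteString     as BS
    import qualified Data.Vector.Unboxed as U
    import           Data.Word           (Word8)
    import           System.Environment  (getArgs)

    main :: IO ()
    main = do
      [file]   <- getArgs
      contents <- BS.readFile file          -- strict: whole file read into memory
      let bytes   = U.generate (BS.length contents) (BS.index contents) :: U.Vector Word8
          samples = U.map (\w -> fromIntegral w :: Float) bytes
          -- placeholder conversion: the real code decodes 3 bytes per Float (see the edit below)
      print (U.sum (U.filter (> 0) samples))  -- placeholder analysis step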
As an experiment, I wondered what would happen if I made use of Haskell's laziness. I decided to try Data.ByteString.Lazy instead of Data.ByteString, and Data.Vector instead of Data.Vector.Unboxed, in a simple test. I hoped to see some improvement in memory usage: even though my program eventually needs the value of every sample, I expected memory usage to grow gradually rather than all at once. When I profiled the program, the results surprised me.
My original version runs in about 20 ms, and its memory usage looks like this:
Using Data.Vector and Data.ByteString.Lazy gives the following results:
I suspected this was due to a misunderstanding on my part about boxed versus unboxed types, so I tried using Data.ByteString.Lazy together with Data.Vector.Unboxed. This is the result:
Can someone explain the results I'm getting?
Edit: I'm using hGet to read from the file, which gives me a Data.ByteString.Lazy ByteString. I convert this ByteString into a Vector of Floats with the following function:
    toVector :: ByteString -> Vector Float
    toVector bs = U.generate (BS.length bs `div` 3) $ \i ->
        myToFloat [BS.index bs (3*i), BS.index bs (3*i+1), BS.index bs (3*i+2)]
      where
        myToFloat :: [Word8] -> Float
        myToFloat words = ...
Each floating-point number is represented by 3 bytes.
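The exact encoding isn't important for the question; purely as a placeholder, a decoder of that shape could look like this (here pretending the 3 bytes form a big-endian 24-bit unsigned integer, which may not match my real format):

    -- Placeholder decoder: treats the 3 bytes as a big-endian 24-bit unsigned
    -- integer and converts it to Float. The real myToFloat may use a different
    -- encoding; the point is only the shape [Word8] -> Float.
    import Data.Bits (shiftL, (.|.))
    import Data.Word (Word8, Word32)

    myToFloat :: [Word8] -> Float
    myToFloat [b0, b1, b2] =
      fromIntegral ( (fromIntegral b0 `shiftL` 16)
                 .|. (fromIntegral b1 `shiftL` 8)
                 .|.  fromIntegral b2 :: Word32 )
    myToFloat _ = error "myToFloat: expected exactly 3 bytes"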
The rest of the processing mainly consists of applying higher-order functions (filter, map, etc.) to the data.
Edit 2: My parser contains a function that reads all the data from the file and returns it as a vector of samples (using the toVector function above). I wrote two versions of this function, one using Data.ByteString and the other using Data.ByteString.Lazy, and used them in a simple test:
    main = do
      [file] <- getArgs
      samples <- getSamplesFromFile file
      let slice = V.slice 0 100000 samples
      let filtered = V.filter (>0) slice
      print filtered
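The lazy version's conversion is the same idea; roughly like this (a sketch, not my exact code), the main mechanical difference being that Data.ByteString.Lazy uses Int64 for lengths and indices:

    -- Sketch of the lazy-ByteString variant of the conversion (not the exact code).
    -- The Int index from U.generate has to be widened to Int64 with fromIntegral.
    import qualified Data.ByteString.Lazy as BL
    import qualified Data.Vector.Unboxed  as U
    import           Data.Word            (Word8)

    toVectorLazy :: BL.ByteString -> U.Vector Float
    toVectorLazy bs = U.generate (fromIntegral (BL.length bs) `div` 3) $ \i ->
        myToFloat [ BL.index bs (fromIntegral (3*i))
                  , BL.index bs (fromIntegral (3*i + 1))
                  , BL.index bs (fromIntegral (3*i + 2)) ]
      where
        myToFloat :: [Word8] -> Float
        myToFloat = sum . map fromIntegral   -- placeholder for the real 3-byte decoder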
The strict version gives me the following memory usage:
Solution
So far we don't have enough material to reproduce the problem. Here I ran four versions of http://sprunge.us/PeIJ, changing strict to lazy and boxed to unboxed, compiling with ghc -O2 -rtsopts -prof. The only thing worth noting is that in the Data.Vector versions, every actual (pointer) element in the vector, or in a stream of vector elements, points to a nicely boxed Haskell Float, which takes up its own heap space. Everything looks basically the same, except that in the Data.Vector programs, as expected, those carefully boxed Floats account for a lot of blue in the heap profile.
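To make that layout difference concrete, here is a toy example (not the poster's program): a Data.Vector Float stores one pointer per element, each pointing at its own heap-allocated Float (or thunk), while a Data.Vector.Unboxed Float is a single flat buffer of raw 4-byte floats.

    -- Toy illustration of the layout difference (not the original program).
    -- boxed:   a Data.Vector of n Floats is an array of n pointers, each pointing
    --          at its own heap-allocated Float box (or thunk).
    -- unboxed: a Data.Vector.Unboxed of n Floats is one flat byte array of
    --          4*n bytes, with no per-element pointers or boxes.
    import qualified Data.Vector         as V
    import qualified Data.Vector.Unboxed as U

    main :: IO ()
    main = do
      let n = 1000000 :: Int
          boxed   = V.generate n (\i -> fromIntegral i :: Float)
          unboxed = U.generate n (\i -> fromIntegral i :: Float)
      -- Using each vector more than once keeps it materialized (defeats fusion),
      -- so a heap profile shows the per-element boxes of the boxed version.
      print (V.sum boxed,   boxed   V.! (n - 1))
      print (U.sum unboxed, unboxed U.! (n - 1))

Those per-element boxes are exactly the extra heap residency that shows up in the boxed profiles.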
Edit: if I compile with only ghc -prof -rtsopts, this is what I get: