Haskell – memory usage of lazy data types

I wrote a program that analyzes and operates on data from a file. My first implementation uses Data.ByteString to read the contents of the file. It then converts these contents to a vector of samples using Data.Vector.Unboxed. I then perform my processing and analysis on this (unboxed) vector of sample values.

As an experiment, I wondered what would happen if I made use of Haskell's laziness. For this simple test I decided to use Data.ByteString.Lazy instead of Data.ByteString, and Data.Vector instead of Data.Vector.Unboxed. I expected to see some improvement in memory usage. Even though my program eventually needs to know the value of every sample, I still expected memory usage to increase gradually. When I profiled my program, the results surprised me.

My original version completes in about 20 ms, and its memory usage is as follows:

Using Data.Vector and Data.ByteString gives the following results:

I suspected this was related to my misunderstanding of the boxed and unboxed types, so I tried combining Data.ByteString.Lazy with Data.Vector.Unboxed. This is the result:

Can anyone explain the results I got?

Edit: I'm using hGet to read from the file, which gives me a Data.ByteString.Lazy. I convert this ByteString to a Data.Vector of Floats with the following function:

toVector :: ByteString -> Vector Float
toVector bs = U.generate (BS.length bs `div` 3) $ \i ->
    myToFloat [BS.index bs (3*i), BS.index bs (3*i+1), BS.index bs (3*i+2)]
  where
    myToFloat :: [Word8] -> Float
    myToFloat words = ...

Each floating-point sample is represented by 3 bytes.
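For illustration, here is a minimal sketch of how a toVector-style conversion could look for a lazy ByteString. The decoder decode24LE and the name toVectorLazy are hypothetical: the sketch assumes each sample is a signed 24-bit little-endian integer scaled to [-1, 1), which may not match the real myToFloat. Note that Data.ByteString.Lazy uses Int64 for lengths and indices, so conversions from the Int index of U.generate are needed, and that indexing a lazy ByteString has to walk its chunk list.

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.Vector.Unboxed as U
import Data.Bits (shiftL, (.|.))
import Data.Int (Int32)
import Data.Word (Word8)

-- Hypothetical decoder (an assumption, not the original myToFloat):
-- reads three bytes as a signed 24-bit little-endian integer and
-- scales it to the range [-1, 1).
decode24LE :: Word8 -> Word8 -> Word8 -> Float
decode24LE b0 b1 b2 = fromIntegral signed / 8388608
  where
    raw :: Int32
    raw = fromIntegral b0
      .|. (fromIntegral b1 `shiftL` 8)
      .|. (fromIntegral b2 `shiftL` 16)
    -- Values with the top (24th) bit set are negative.
    signed = if raw >= 8388608 then raw - 16777216 else raw

-- Builds an unboxed vector with one Float per complete 3-byte group.
-- Trailing bytes that do not form a full group are ignored.
toVectorLazy :: BL.ByteString -> U.Vector Float
toVectorLazy bs = U.generate n $ \i ->
    let j = 3 * fromIntegral i  -- Int64 offset into the lazy ByteString
    in decode24LE (BL.index bs j) (BL.index bs (j + 1)) (BL.index bs (j + 2))
  where
    n = fromIntegral (BL.length bs `div` 3)

main :: IO ()
main = print (toVectorLazy (BL.pack [0, 0, 64, 0, 0, 192, 1, 2]))
-- prints [0.5,-0.5]
```

Because an unboxed vector stores the raw Float values, building it evaluates every sample, so the whole input is consumed even when it is read lazily.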

The rest of the processing mainly consists of applying higher-order functions (such as filter, map, etc.) to the data.

Edit 2: My parser contains a function that reads all the data from the file and returns it as a vector of samples (using the toVector function above). I wrote two versions of this program, one with Data.ByteString and the other with Data.ByteString.Lazy. I used these two versions to run a simple test:

main = do
  [file] <- getArgs
  samples <- getSamplesFromFile file
  let slice = V.slice 0 100000 samples
  let filtered = V.filter (>0) slice
  print filtered

The strict version gives me the following memory usage:

Solution

With the data we have so far, we can't reproduce the problem. Here, I ran four versions of http://sprunge.us/PeIJ, changing strict to lazy and boxed to unboxed. I'm compiling with ghc -O2 -rtsopts -prof. The only thing worth noting is that in the Data.Vector version, every real (pointer) element in the vector or stream points to a nice boxed Haskell Float, which takes up heap space. Everything is basically the same except for the Data.Vector program, which, as expected, shows a lot of blue for these nicely boxed Floats.
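The difference between the two vector types can be seen in a small sketch. It is not taken from the linked program; the names boxed and unboxed are made up for illustration. A boxed Data.Vector stores pointers to heap values that may still be unevaluated, whereas a Data.Vector.Unboxed stores the raw Float values and therefore has to evaluate every element when it is built.

```haskell
import Control.Exception (SomeException, evaluate, try)
import qualified Data.Vector as V
import qualified Data.Vector.Unboxed as U

-- Boxed vector: each slot is a pointer to a (possibly unevaluated) Float,
-- so an undefined element is harmless until it is actually inspected.
boxed :: V.Vector Float
boxed = V.fromList [1, undefined, 3]

-- Unboxed vector: the Floats are stored directly in a byte array, so
-- constructing the vector forces every element.
-- NOINLINE keeps the vector from being fused away under optimisation.
unboxed :: U.Vector Float
unboxed = U.fromList [1, undefined, 3]
{-# NOINLINE unboxed #-}

main :: IO ()
main = do
  -- Asking for the length never touches the undefined element.
  print (V.length boxed)
  -- Here the undefined element is hit while the vector is being built.
  r <- try (evaluate (U.length unboxed)) :: IO (Either SomeException Int)
  putStrLn (either (const "unboxed: element was forced") show r)
```

This is also why the boxed version has the larger heap footprint: each element costs a pointer plus a separate heap object, instead of a few bytes in a flat array.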

Edit: If I only compile with ghc -prof -rtsopts, this is what I get:
