Read a stream from S3 using Clojure / Java
I have a large file on S3. I would like to decode and parse it while it is downloading. I happen to be using a Clojure Amazon library, but any library will do.
I can easily get a stream:
(def stream (-> (get-object "some-s3-bucket" "some-object-key") :input-stream)) ; returns: #<S3ObjectInputStream com.amazonaws.services.s3.model.S3ObjectInputStream
But how do I read from the stream? Can I read it one line at a time (the decoded content is JSON, one document per line)?
(If there is any ambiguity in my question: I only care about reading the stream, not about the gzip decoding.)
Solution
Because S3ObjectInputStream just extends java.io.InputStream, you can:
1. Use Clojure's reader function (clojure.java.io/reader) to get a BufferedReader.
2. Read data from the reader in any way Clojure allows:
   - Get a lazy sequence of lines from the BufferedReader using line-seq, if this makes sense for your JSON. It may not.
   - Use a lazy JSON parser, such as clj-lazy-json. This particular parser can even handle raw streams, so you can safely skip step (1).
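The reader plus line-seq route could look roughly like the sketch below. It is only an illustration: fake-s3-stream and reduce-lines are hypothetical names, not part of any library, and a ByteArrayInputStream stands in for the S3 stream so the code runs without AWS credentials. With a real object you would pass the :input-stream value from get-object instead.

```clojure
(require '[clojure.java.io :as io])
(import '(java.io ByteArrayInputStream))

;; Hypothetical stand-in for the S3 stream: three JSON lines held in memory.
(defn fake-s3-stream []
  (ByteArrayInputStream.
    (.getBytes "{\"id\": 1}\n{\"id\": 2}\n{\"id\": 3}\n" "UTF-8")))

;; Hypothetical helper: folds f over the lines of input-stream, one line at a time.
(defn reduce-lines
  [input-stream f init]
  ;; io/reader wraps the InputStream in a BufferedReader;
  ;; with-open closes it (and the underlying stream) on exit.
  (with-open [rdr (io/reader input-stream)]
    ;; line-seq is lazy, so it must be consumed inside with-open.
    ;; reduce walks it eagerly without retaining lines already processed.
    (reduce f init (line-seq rdr))))

;; Count the lines without accumulating them.
(reduce-lines (fake-s3-stream) (fn [n _line] (inc n)) 0)
```

The one thing to watch is that the lazy sequence has to be fully consumed before with-open closes the reader, which is why the sketch reduces inside the with-open body rather than returning the sequence. To parse each line, pass a function that calls your JSON parser of choice in place of the counter.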