A first look at streams in Java 8
Lambda expressions are the foundation of streams. Beginners are advised to learn lambda expressions first: http://www.jb51.net/article/121129.htm
1. Getting to know streams
Let's start with a general outline:
There is a lot to cover. Stream is a very practical feature added in Java 8. At first I assumed it was related to the streams in java.io, but the two have nothing to do with each other. Let's get a feel for it with a small example:
List the names of the students in a class whose scores exceed 85, and output the names in descending order. Before Java 8, this took three steps:
1) Create a new List<Student> newList, traverse stuList in a for loop, and add the students whose scores exceed 85 to the new list
2) Sort the new list newList
3) Traverse and print newList
In Java 8, these three steps need only two statements. If you only need to print the result and do not need to keep the new list, a single statement is enough. Very convenient.
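The original listing is not reproduced here. The following is a minimal sketch of the Java 8 version under stated assumptions: the Student class, its name and score fields, and the sample data are hypothetical, and "descending order" is taken to mean descending by score.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class TopStudents {
    // Hypothetical Student type; the fields are assumptions for illustration.
    static class Student {
        final String name;
        final int score;
        Student(String name, int score) { this.name = name; this.score = score; }
        String getName() { return name; }
        int getScore() { return score; }
    }

    // Names of students scoring above 85, highest score first.
    static List<String> namesAbove85(List<Student> stuList) {
        return stuList.stream()
                .filter(s -> s.getScore() > 85)
                .sorted(Comparator.comparingInt(Student::getScore).reversed())
                .map(Student::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Student> stuList = Arrays.asList(
                new Student("Ann", 92), new Student("Bob", 70), new Student("Cid", 88));
        // Statement 1 builds the new list, statement 2 prints it.
        List<String> names = namesAbove85(stuList);
        names.forEach(System.out::println);
    }
}
```

If the list does not need to be kept, the collect call can be replaced with forEach(System.out::println), which gives the single-statement form.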
2. Characteristics of streams
We first list three characteristics of streams, and then describe each in detail:
1. A stream does not store data
2. A stream does not modify its source data
3. Stream operations are executed lazily
Generally, we create a stream from an array or a collection. The stream itself does not store data, and operations on the stream do not affect the array or collection it was created from. A terminal operation (aggregating, consuming, or collecting) can be performed on a stream only once; a second attempt raises an error, as shown in the following code:
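The original listing is missing, so here is a minimal sketch of the behaviour. The class and method names are made up for illustration; the IllegalStateException is what the standard library throws when a stream is reused.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class SingleUseStream {
    // Returns true if a second terminal operation on the same stream is rejected.
    static boolean secondUseFails(List<Integer> source) {
        Stream<Integer> stream = source.stream();
        stream.forEach(System.out::println);      // first terminal operation: works
        try {
            stream.forEach(System.out::println);  // second one: the stream is already consumed
            return false;
        } catch (IllegalStateException e) {
            System.out.println("Error: " + e.getMessage());
            return true;
        }
    }

    public static void main(String[] args) {
        secondUseFails(Arrays.asList(1, 2, 3));
    }
}
```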
The program prints once normally and then reports an error.
Stream operations are lazy. In the example that lists the names of students scoring above 85, the filter, sorted, and map methods are not executed until the collect method is called. Only the call to collect triggers the preceding intermediate operations.
See the following code:
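The original listing is not available. The sketch below reconstructs the experiment under assumptions: the method name over85, the sample scores, and the "split" marker placement are illustrative choices, and events are recorded in a list so the order can be inspected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyStreamDemo {
    // Records events in the order they happen.
    static final List<String> log = new ArrayList<>();

    // Filter logic extracted into a method, with logging added.
    static boolean over85(int score) {
        log.add("filter " + score);
        return score > 85;
    }

    static List<String> run() {
        log.clear();
        // Building the pipeline does not call over85 yet.
        Stream<Integer> pipeline = Stream.of(90, 60, 88).filter(LazyStreamDemo::over85);
        log.add("split");                        // marker between building and collecting
        pipeline.collect(Collectors.toList());   // terminal operation triggers the filter
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```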
We extract the logic of the filter into a method and add a print statement to that method. If the intermediate operations are lazy, the line "split" prints before the method's own output; otherwise it prints after it.
In the output, "split" comes first, so stream operations are indeed lazy.
TIP:
While a stream operation is running, we must not modify the stream's underlying collection (even if the collection is thread safe). If the source collection is modified during the operation, the result of the stream operation is undefined.
Because streams are executed lazily, it is permitted to modify the data source before the terminal operation is executed.
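The original listing is missing. The following sketch is one way to reproduce the behaviour; the seven-word source list is a hypothetical choice made so that the count matches the result quoted in the text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class ModifyBeforeTerminal {
    static long countAfterAdd() {
        // Hypothetical source data: seven distinct words.
        List<String> words = new ArrayList<>(
                Arrays.asList("a", "b", "c", "d", "e", "f", "g"));
        Stream<String> stream = words.stream();  // nothing is read from the list yet
        words.add("END");                        // allowed: no terminal operation has started
        return stream.distinct().count();        // the list is read only here
    }

    public static void main(String[] args) {
        System.out.println(countAfterAdd());
    }
}
```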
The final printed result is 8.
The following code is wrong:
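The original listing is missing. A minimal sketch of this kind of interference, assuming an ArrayList source and a lambda that removes elements from it during the terminal operation; the exact exception can depend on the collection type and JDK version, so the sketch only reports which one was raised.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class InterferenceDemo {
    // Returns the simple name of the exception raised by the interfering pipeline.
    static String run() {
        List<String> words = new ArrayList<>(Arrays.asList("a", "bb", "ccc"));
        try {
            // WRONG: the lambda modifies the list that is being streamed.
            words.stream().forEach(s -> {
                if (s.length() < 3) {
                    words.remove(s);
                }
            });
            return "none";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```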
The result is a NullPointerException.
3. Creating a stream
1) Create a stream from an array
2) Create a stream from a collection
3) Create an empty stream
5) Create an infinite stream that follows a rule
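The creation methods listed above can be sketched as follows. The sample values are arbitrary, and firstEvens is a hypothetical helper that shows how limit turns an infinite stream into a finite one.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CreateStreams {
    // Stream.iterate yields seed, f(seed), f(f(seed)), ...; limit makes it finite.
    static List<Integer> firstEvens(int n) {
        return Stream.iterate(0, x -> x + 2).limit(n).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // From an array (or from explicit values)
        Stream<String> fromArray = Arrays.stream(new String[] {"a", "b", "c"});
        Stream<String> fromValues = Stream.of("a", "b", "c");
        // From a collection
        Stream<String> fromList = Arrays.asList("x", "y").stream();
        // Empty stream
        Stream<String> empty = Stream.empty();

        System.out.println(fromArray.count() + " " + fromValues.count()
                + " " + fromList.count() + " " + empty.count());
        // Infinite stream that follows a rule, cut off after five elements
        System.out.println(firstEvens(5));
    }
}
```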
4. Operations on streams
1) Most commonly used
map: transforms the stream, converting a stream of one type into a stream of another
filter: filters the stream, keeping only the elements that match a condition
flatMap: flattens the stream, turning each element into a stream and merging the results
sorted: sorts the stream
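A short sketch of the four operations above. The method names and sample words are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class CommonOps {
    // map: Stream<String> -> Stream<Integer>, then sorted in natural order.
    static List<Integer> sortedLengths(List<String> words) {
        return words.stream()
                .map(String::length)
                .sorted()
                .collect(Collectors.toList());
    }

    // filter keeps words longer than one character; flatMap splits each word into letters.
    static List<String> sortedLetters(List<String> words) {
        return words.stream()
                .filter(w -> w.length() > 1)
                .flatMap(w -> Arrays.stream(w.split("")))
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("ba", "c", "ad");
        System.out.println(sortedLengths(words));
        System.out.println(sortedLetters(words));
    }
}
```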
2) Extracting and combining streams
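The text does not name the methods for this item. Assuming it refers to limit, skip, and Stream.concat, a minimal sketch looks like this:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SliceAndConcat {
    static List<Integer> demo() {
        // limit keeps the first n elements: 1, 2, 3
        Stream<Integer> firstThree = Stream.iterate(1, x -> x + 1).limit(3);
        // skip drops the first n elements: 30, 40
        Stream<Integer> afterTwo = Stream.of(10, 20, 30, 40).skip(2);
        // concat joins two streams end to end
        return Stream.concat(firstThree, afterTwo).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```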
3) Aggregation operations
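The text gives no listing for this item. As an illustration, here are a few of the standard terminal operations that reduce a stream to a single result; the summarize helper and its output format are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class Aggregations {
    static String summarize(List<Integer> nums) {
        long count = nums.stream().count();
        Optional<Integer> max = nums.stream().max(Comparator.naturalOrder());
        Optional<Integer> firstEven = nums.stream().filter(n -> n % 2 == 0).findFirst();
        boolean anyNegative = nums.stream().anyMatch(n -> n < 0);
        int sum = nums.stream().reduce(0, Integer::sum);
        // orElse(null) makes an absent value show up as "null" in the summary.
        return count + "," + max.orElse(null) + "," + firstEven.orElse(null)
                + "," + anyNegative + "," + sum;
    }

    public static void main(String[] args) {
        System.out.println(summarize(Arrays.asList(3, 8, 5)));
    }
}
```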
4) The Optional type
Aggregation operations generally return an Optional, which represents a null-safe result of the given type. "Safe" here means avoiding the NullPointerException caused by dereferencing a null return value. Call Optional.isPresent() to check whether a value is present, or call ifPresent(Consumer consumer) to consume the value only when one is present; call Optional.get() to obtain the value. The usual usage is as follows:
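The original listing is missing. A minimal sketch of the calls just described; describeMax is a hypothetical helper.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class OptionalUsage {
    static String describeMax(List<Integer> nums) {
        Optional<Integer> max = nums.stream().max(Integer::compare);
        if (max.isPresent()) {          // check before calling get()
            return "max=" + max.get();
        }
        return "empty";
    }

    public static void main(String[] args) {
        System.out.println(describeMax(Arrays.asList(3, 9, 4)));
        // ifPresent runs the consumer only when a value exists.
        Arrays.asList(3, 9, 4).stream()
                .max(Integer::compare)
                .ifPresent(v -> System.out.println("found " + v));
    }
}
```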
With Optional, you can specify a fallback value to return when no value is present, for example:
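The original example is missing. One way to do this, assuming orElse is the method meant here (maxOrDefault is a hypothetical helper):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class OptionalDefault {
    // Returns the maximum, or the fallback when the list is empty.
    static int maxOrDefault(List<Integer> nums, int fallback) {
        return nums.stream().max(Integer::compare).orElse(fallback);
    }

    public static void main(String[] args) {
        System.out.println(maxOrDefault(Arrays.asList(1, 7, 2), -1));
        System.out.println(maxOrDefault(Collections.<Integer>emptyList(), -1));
    }
}
```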
Creating an Optional
Optional.empty() creates an empty Optional, and Optional.of() creates an Optional holding the given value. You can also call the Optional's map method to transform the value it holds, and its flatMap method to chain a function that itself returns an Optional.
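A sketch of creation, map, and flatMap together. The helpers inverse and safeSqrt are hypothetical functions chosen only because each can fail to produce a value.

```java
import java.util.Optional;

class OptionalCreation {
    // Hypothetical helper: no result for division by zero.
    static Optional<Double> inverse(double x) {
        return x == 0 ? Optional.empty() : Optional.of(1 / x);
    }

    // Hypothetical helper: no result for negative input.
    static Optional<Double> safeSqrt(double x) {
        return x < 0 ? Optional.empty() : Optional.of(Math.sqrt(x));
    }

    // flatMap chains two Optional-returning steps without nesting Optionals.
    static Optional<Double> sqrtOfInverse(double x) {
        return inverse(x).flatMap(OptionalCreation::safeSqrt);
    }

    public static void main(String[] args) {
        // map transforms the held value, if any.
        Optional<Integer> length = Optional.of("stream").map(String::length);
        System.out.println(length);
        System.out.println(sqrtOfInverse(4.0));
        System.out.println(sqrtOfInverse(0.0));
    }
}
```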
5) Collect results
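The text gives no listing for this item. A few common ways to collect a stream into a container, sketched with arbitrary sample values (joined is a hypothetical helper):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CollectResults {
    static String joined(List<String> words) {
        return words.stream().collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        // Into a List
        List<String> list = Stream.of("b", "a", "b").collect(Collectors.toList());
        // Into a specific collection type
        Set<String> set = Stream.of("b", "a", "b").collect(Collectors.toCollection(TreeSet::new));
        // Into an array
        String[] array = Stream.of("b", "a").toArray(String[]::new);
        // Into a Map (keys must be unique, otherwise toMap throws)
        Map<String, Integer> lengths =
                Stream.of("aa", "b").collect(Collectors.toMap(w -> w, String::length));

        System.out.println(list + " " + set + " " + array.length + " " + lengths);
        System.out.println(joined(Arrays.asList("x", "y")));
    }
}
```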
6) Grouping and partitioning
Grouping and partitioning collect the results into a Map<key, val>. The general usage is as follows:
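The original listing is missing. A minimal sketch using Collectors.groupingBy and Collectors.partitioningBy; the method names and the threshold of 85 are illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupAndPartition {
    // Grouping: one map entry per distinct key (here, the word length).
    static Map<Integer, List<String>> byLength(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(String::length));
    }

    // Partitioning: exactly two entries, keyed by true and false.
    static Map<Boolean, List<Integer>> above85(List<Integer> scores) {
        return scores.stream().collect(Collectors.partitioningBy(s -> s > 85));
    }

    public static void main(String[] args) {
        System.out.println(byLength(Arrays.asList("a", "bb", "cc")));
        System.out.println(above85(Arrays.asList(90, 60, 88)));
    }
}
```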
5. Primitive type streams
With large amounts of data, boxing primitive values (int, double...) into a stream of the corresponding wrapper objects is inefficient. We can therefore create a primitive type stream directly. Operations on primitive type streams are similar to those on object streams. We only need to remember two points:
1. How to create a primitive type stream
2. How to convert between primitive type streams and object streams
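The two points above can be sketched with IntStream; sumOfSquares is a hypothetical helper and the sample values are arbitrary.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class PrimitiveStreams {
    // Works on int values directly, with no boxing.
    static int sumOfSquares(int n) {
        return IntStream.rangeClosed(1, n).map(x -> x * x).sum();
    }

    public static void main(String[] args) {
        // 1. Creation
        IntStream fromValues = IntStream.of(1, 2, 3);
        IntStream fromArray = Arrays.stream(new int[] {4, 5});
        System.out.println(fromValues.sum() + fromArray.sum());

        // 2. Conversion: primitive stream -> object stream
        List<Integer> boxed = IntStream.range(0, 3).boxed().collect(Collectors.toList());
        // 2. Conversion: object stream -> primitive stream
        int totalLength = Stream.of("a", "bb").mapToInt(String::length).sum();
        System.out.println(boxed + " " + totalLength);

        System.out.println(sumOfSquares(3));
    }
}
```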
6. Parallel streams
A stream that normally executes sequentially can be turned into a parallel stream by calling the parallel() method on the sequential stream, for example Stream.iterate(1, x -> x + 1).limit(10).parallel().
1) Execution order of parallel streams
We call the peek method to compare the execution order of parallel and serial streams. As its name suggests, peek lets us look at the data passing through the stream. It is declared as Stream<T> peek(Consumer<? super T> action); by adding a print statement we can observe the data in the stream, as shown in the following code:
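The original listing is missing. The sketch below follows the pipeline described in this section, with two changes that are assumptions of this sketch: it collects the result instead of printing it, and it includes the thread name in each peek message. The order of the printed lines in the parallel case varies from run to run.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PeekDemo {
    static void show(String stage, int x) {
        System.out.println(Thread.currentThread().getName() + " " + stage + " " + x);
    }

    static List<Integer> run(boolean parallel) {
        Stream<Integer> stream = Stream.iterate(1, x -> x + 1).limit(10);
        if (parallel) {
            stream = stream.parallel();
        }
        return stream
                .peek(x -> show("source", x))
                .filter(x -> x > 5)
                .peek(x -> show("after first filter", x))
                .filter(x -> x < 8)
                .collect(Collectors.toList());   // keeps encounter order even in parallel
    }

    public static void main(String[] args) {
        System.out.println("serial:   " + run(false));
        System.out.println("parallel: " + run(true));
    }
}
```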
The serial stream print results are as follows:
The print results of parallel stream are as follows:
This may be hard to follow, so let's use the following figure to explain it.
Imagine the processing of stream.filter(x -> x > 5).filter(x -> x < 8).forEach(System.out::println) as the pipeline in the figure above. Each peek we add to the pipeline acts like a valve through which we can observe the data flowing past.
1) With a sequential stream, the elements pass through the pipeline in the order of the source data. Only after one element has been rejected by a filter or has passed through the whole pipeline does the next element begin the same process.
2) With a parallel stream, the system starts seven threads in addition to the main thread to process the work (my computer has four cores and eight threads). The overall execution order is therefore not deterministic, but the elements handled by any one thread are processed in order.
2) The influence of sorted() and distinct() on parallel streams
sorted() and distinct() depend on the data as a whole. map, filter, and similar methods handle each element independently of the elements already processed, so they do not need to know what else is in the stream. Does parallel execution conflict with sorted()?
Conclusion:
1. Parallel streams and sorting do not conflict.
2. Whether a stream is ordered may improve execution efficiency for some APIs and reduce it for others.
3. If you want the output to be ordered, use forEachOrdered on a parallel stream (forEach is faster but does not guarantee order).
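The third point can be sketched as follows. The helper name is hypothetical, and the results are gathered into a list so the order can be checked.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class OrderedOutput {
    // forEachOrdered delivers elements in encounter order, even on a parallel stream.
    static List<Integer> sortedInParallel(List<Integer> source) {
        List<Integer> out = Collections.synchronizedList(new ArrayList<>());
        source.parallelStream().sorted().forEachOrdered(out::add);
        return out;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(5, 3, 9, 1, 7);
        System.out.println(sortedInParallel(data));
        // With forEach instead, the printed order is not guaranteed.
        data.parallelStream().sorted().forEach(System.out::println);
    }
}
```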
We did the following experiments:
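The original experiment code and its measurements are not reproduced here. The sketch below shows one way such a comparison could be set up; the data size, the random seed, and the filter are assumptions, and single unwarmed timings like these are only a rough indication.

```java
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SortedDistinctTiming {
    // Runs one pipeline, prints its elapsed time, and returns the element count.
    static long run(List<Integer> data, boolean parallel, boolean distinctSorted) {
        Stream<Integer> s = parallel ? data.parallelStream() : data.stream();
        s = s.filter(x -> x % 2 == 0);
        if (distinctSorted) {
            s = s.distinct().sorted();
        }
        long start = System.nanoTime();
        long count = s.count();                  // terminal operation: the work happens here
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.println((parallel ? "parallel" : "serial")
                + (distinctSorted ? " distinct+sorted: " : " plain: ") + micros + " us");
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical test data: one million random values in [0, 1000).
        List<Integer> data = new Random(1).ints(1_000_000, 0, 1_000)
                .boxed().collect(Collectors.toList());
        run(data, false, false);
        run(data, false, true);
        run(data, true, false);
        run(data, true, true);
    }
}
```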
As you can see, for serial streams the distinct().sorted() calls have no noticeable impact on the running time, but for parallel streams they greatly increase it. Parallel streams are therefore not recommended for operations that depend on the data as a whole, such as sorted() and distinct().
7. Stream vs Spark RDD
My first intuition on seeing streams was that they look like Spark, very much like Spark.
The code above is taken from Spark's official website and is written in Scala; it is the most basic word count program. A brief introduction to Spark: it is currently among the most popular in-memory big data processing frameworks. A core concept in Spark is the RDD (Resilient Distributed Dataset), which abstracts data distributed across different processors into a single data set. An RDD supports two types of operations: 1) transformations and 2) actions. Transformations on an RDD are not executed immediately; they are triggered only when an action is invoked.
Summary
That covers the basics of streams in Java 8. I hope it helps. If you have any questions, please leave a comment and I will reply as soon as I can.