Summary of easyexcel, an excel tool produced by Ali
premise
The author has been working as a small data and No. 0 data collection tool for some time. The service object is the boss of operation and business. Generally, the exported data is required to be Excel files. Considering that the machine resources of the start-up team are very limited, the author selects the excel tool easyexcel produced by Ali. Here is a brief share of the experience of using easyexcel. From its dependency tree, easyexcel encapsulates Apache poi. The author chose easyexcel from the beginning of contacting Excel to avoid the memory leakage caused by the widely spread Apache poi.
Introducing easyexcel dependency
Maven introduced into easyexcel is as follows:
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>easyexcel</artifactId>
<version>${easyexcel.version}</version>
</dependency>
The latest version of the current (2020-09-08) is 2.2.6.
API introduction
Excel files are mainly processed around read and write operations, and easyexcel API is also designed around these two aspects. First look at the API related to read operations:
// 新建一个ExcelReaderBuilder实例
ExcelReaderBuilder readerBuilder = EasyExcel.read();
// 读取的文件对象,可以是File、路径(字符串)或者InputStream实例
readerBuilder.file("");
// 文件的密码
readerBuilder.password("");
// 指定sheet,可以是数字序号sheetNo或者字符串sheetName,若不指定则会读取所有的sheet
readerBuilder.sheet("");
// 是否自动关闭输入流
readerBuilder.autoCloseStream(true);
// Excel文件格式,包括ExcelTypeEnum.XLSX和ExcelTypeEnum.XLS
readerBuilder.excelType(ExcelTypeEnum.XLSX);
// 指定文件的标题行,可以是Class对象(结合@ExcelProperty注解使用),或者List<List<String>>实例
readerBuilder.head(Collections.singletonList(Collections.singletonList("head")));
// 注册读取事件的监听器,默认的数据类型为Map<Integer,String>,第一列的元素的下标从0开始
readerBuilder.registerReadListener(new AnalysisEventListener() {
@Override
public void invokeHeadMap(Map headMap,AnalysisContext context) {
// 这里会回调标题行,文件内容的首行会认为是标题行
}
@Override
public void invoke(Object o,AnalysisContext analysisContext) {
// 这里会回调每行的数据
}
@Override
public void doAfterAllAnalysed(AnalysisContext analysisContext) {
}
});
// 构建读取器
ExcelReader excelReader = readerBuilder.build();
// 读取数据
excelReader.readAll();
excelReader.finish();
It can be seen that the read operation mainly uses the design of builder mode and event listening (or can be understood as observer mode). Generally, the above code can be simplified as follows:
Map<Integer,String> head = new HashMap<>();
List<Map<Integer,String>> data = new LinkedList<>();
EasyExcel.read("文件的绝对路径").sheet()
.registerReadListener(new AnalysisEventListener<Map<Integer,String>>() {
@Override
public void invokeHeadMap(Map<Integer,String> headMap,AnalysisContext context) {
head.putAll(headMap);
}
@Override
public void invoke(Map<Integer,String> row,AnalysisContext analysisContext) {
data.add(row);
}
@Override
public void doAfterAllAnalysed(AnalysisContext analysisContext) {
// 这里可以打印日志告知所有行读取完毕
}
}).doRead();
If you need to read the data and convert it to the corresponding object list, you need to specify the class of the title line, which is used in combination with the annotation @ excelproperty:
文件内容:
|订单编号|手机号|
|ORDER_ID_1|112222|
|ORDER_ID_2|334455|
@Data
private static class OrderDTO {
@ExcelProperty(value = "订单编号")
private String orderId;
@ExcelProperty(value = "手机号")
private String phone;
}
Map<Integer,String> head = new HashMap<>();
List<OrderDTO> data = new LinkedList<>();
EasyExcel.read("文件的绝对路径").head(OrderDTO.class).sheet()
.registerReadListener(new AnalysisEventListener<OrderDTO>() {
@Override
public void invokeHeadMap(Map<Integer,AnalysisContext context) {
head.putAll(headMap);
}
@Override
public void invoke(OrderDTO row,AnalysisContext analysisContext) {
data.add(row);
}
@Override
public void doAfterAllAnalysed(AnalysisContext analysisContext) {
// 这里可以打印日志告知所有行读取完毕
}
}).doRead();
If the amount of data is huge, it is recommended to use map < integer, string > type to read and operate data objects. Otherwise, a large number of reflection operations will greatly increase the time-consuming of reading data. In extreme cases, for example, when there are many attributes, the time-consuming of reflection operation may be longer than that of reading and traversal.
Next, look at the API of write operation:
// 新建一个ExcelWriterBuilder实例
ExcelWriterBuilder writerBuilder = EasyExcel.write();
// 输出的文件对象,可以是File、路径(字符串)或者OutputStream实例
writerBuilder.file("");
// 指定sheet,可以是数字序号sheetNo或者字符串sheetName,可以不设置,由下面提到的WriteSheet覆盖
writerBuilder.sheet("");
// 文件的密码
writerBuilder.password("");
// Excel文件格式,包括ExcelTypeEnum.XLSX和ExcelTypeEnum.XLS
writerBuilder.excelType(ExcelTypeEnum.XLSX);
// 是否自动关闭输出流
writerBuilder.autoCloseStream(true);
// 指定文件的标题行,可以是Class对象(结合@ExcelProperty注解使用),或者List<List<String>>实例
writerBuilder.head(Collections.singletonList(Collections.singletonList("head")));
// 构建ExcelWriter实例
ExcelWriter excelWriter = writerBuilder.build();
List<List<String>> data = new ArrayList<>();
// 构建输出的sheet
WriteSheet writeSheet = new WriteSheet();
writeSheet.setSheetName("target");
excelWriter.write(data,writeSheet);
// 这一步一定要调用,否则输出的文件有可能不完整
excelWriter.finish();
There are many methods in excelwriterbuilder, such as style, line processor and converter settings, which I think are not commonly used. There is no example here. The style of content is usually processed again after outputting the file, which will be easier to operate. Write operations can generally be simplified as follows:
List<List<String>> head = new ArrayList<>();
List<List<String>> data = new LinkedList<>();
EasyExcel.write("输出文件绝对路径")
.head(head)
.excelType(ExcelTypeEnum.XLSX)
.sheet("target")
.doWrite(data);
Practical skills
Here is a brief introduction to the practical skills used in production.
Multithreaded read
Using easyexcel multithreaded reading is recommended under limited preconditions:
A simple example is as follows:
@Slf4j
public class EasyExcelConcurrentRead {
static final int N_cpu = Runtime.getRuntime().availableProcessors();
public static void main(String[] args) throws Exception {
// 假设I盘的temp目录下有一堆同格式的Excel文件
String dir = "I:\\temp";
List<Map<Integer,String>> mergeResult = Lists.newLinkedList();
ThreadPoolExecutor executor = new ThreadPoolExecutor(N_cpu,N_cpu * 2,TimeUnit.SECONDS,new LinkedBlockingQueue<>(),new ThreadFactory() {
private final AtomicInteger counter = new AtomicInteger();
@Override
public Thread newThread(@NotNull Runnable r) {
Thread thread = new Thread(r);
thread.setDaemon(true);
thread.setName("ExcelReadWorker-" + counter.getAndIncrement());
return thread;
}
});
Path dirPath = Paths.get(dir);
if (Files.isDirectory(dirPath)) {
List<Future<List<Map<Integer,String>>>> futures = Files.list(dirPath)
.map(path -> path.toAbsolutePath().toString())
.filter(absolutePath -> absolutePath.endsWith(".xls") || absolutePath.endsWith(".xlsx"))
.map(absolutePath -> executor.submit(new ReadTask(absolutePath)))
.collect(Collectors.toList());
for (Future<List<Map<Integer,String>>> future : futures) {
mergeResult.addAll(future.get());
}
}
log.info("读取[{}]目录下的文件成功,一共加载:{}行数据",dir,mergeResult.size());
// 其他业务逻辑.....
}
@requiredArgsConstructor
private static class ReadTask implements Callable<List<Map<Integer,String>>> {
private final String location;
@Override
public List<Map<Integer,String>> call() throws Exception {
List<Map<Integer,String>> data = Lists.newLinkedList();
EasyExcel.read(location).sheet()
.registerReadListener(new AnalysisEventListener<Map<Integer,String>>() {
@Override
public void invoke(Map<Integer,AnalysisContext analysisContext) {
data.add(row);
}
@Override
public void doAfterAllAnalysed(AnalysisContext analysisContext) {
log.info("读取路径[{}]文件成功,一共[{}]行",location,data.size());
}
}).doRead();
return data;
}
}
}
Here, ThreadPoolExecutor #submit() is used to submit concurrent read tasks, and then future #get() is used to wait for all tasks to complete before merging the final read results.
Multi sheet writing
Writing multiple sheets is actually using the same excelwriter instance to write to multiple writesheet instances. The title line of each sheet can be overwritten by the configuration properties in the writesheet instance. The code is as follows:
public class EasyExcelMultiSheetWrite {
public static void main(String[] args) throws Exception {
ExcelWriterBuilder writerBuilder = EasyExcel.write();
writerBuilder.excelType(ExcelTypeEnum.XLSX);
writerBuilder.autoCloseStream(true);
writerBuilder.file("I:\\temp\\temp.xlsx");
ExcelWriter excelWriter = writerBuilder.build();
WriteSheet firstSheet = new WriteSheet();
firstSheet.setSheetName("first");
firstSheet.setHead(Collections.singletonList(Collections.singletonList("第一个Sheet的Head")));
// 写入第一个命名为first的Sheet
excelWriter.write(Collections.singletonList(Collections.singletonList("第一个Sheet的数据")),firstSheet);
WriteSheet secondSheet = new WriteSheet();
secondSheet.setSheetName("second");
secondSheet.setHead(Collections.singletonList(Collections.singletonList("第二个Sheet的Head")));
// 写入第二个命名为second的Sheet
excelWriter.write(Collections.singletonList(Collections.singletonList("第二个Sheet的数据")),secondSheet);
excelWriter.finish();
}
}
The effects are as follows:
@H_ 301_ 157@
Paging query and batch write
In some scenarios with a large amount of data, paging query and batch write can be considered. In fact, paging query original data - > data aggregation or conversion - > write target data - > query on the next page. In fact, when the amount of data is small, one-time full query and full write are only a special case of paging query and batch write. Therefore, query, conversion and write operations can be abstracted into a reusable template method:
int batchSize = 定义每篇查询的条数;
OutputStream outputStream = 定义写到何处;
ExcelWriter writer = new ExcelWriterBuilder()
.autoCloseStream(true)
.file(outputStream)
.excelType(ExcelTypeEnum.XLSX)
.head(ExcelModel.class);
for (;;){
List<OriginModel> list = originModelRepository.分页查询();
if (list.isEmpty()){
writer.finish();
break;
}else {
list 转换-> List<ExcelModel> excelModelList;
writer.write(excelModelList);
}
}
Please refer to a non titled party production application article "excel export optimization of million level data" written earlier by the author, which is applicable to the scenario of large amount of data export. The code is as follows:
@H_ 301_ 157@
The operation of uploading Excel files is similar to that of ordinary files, and then use the excel reader of easyexcel to read the abstract InputStream instance of the file part in the request object multiparthttpservletrequest:
@PostMapping(path = "/upload")
public ResponseEntity<?> upload(MultipartHttpServletRequest request) throws Exception {
Map<String,multipartfile> fileMap = request.getFileMap();
for (Map.Entry<String,multipartfile> part : fileMap.entrySet()) {
InputStream inputStream = part.getValue().getInputStream();
Map<Integer,String> head = new HashMap<>();
List<Map<Integer,String>> data = new LinkedList<>();
EasyExcel.read(inputStream).sheet()
.registerReadListener(new AnalysisEventListener<Map<Integer,String>>() {
@Override
public void invokeHeadMap(Map<Integer,AnalysisContext context) {
head.putAll(headMap);
}
@Override
public void invoke(Map<Integer,AnalysisContext analysisContext) {
data.add(row);
}
@Override
public void doAfterAllAnalysed(AnalysisContext analysisContext) {
log.info("读取文件[{}]成功,一共:{}行......",part.getKey(),data.size());
}
}).doRead();
// 其他业务逻辑
}
return ResponseEntity.ok("success");
}
The postman request is as follows:
@H_ 301_ 157@
Exporting Excel files using easyexcel is also relatively simple. You only need to attach the OutputStream object carried in the response object httpservletresponse to the excelwriter instance of easyexcel:
@GetMapping(path = "/download")
public void download(HttpServletResponse response) throws Exception {
// 这里文件名如果涉及中文一定要使用URL编码,否则会乱码
String fileName = URLEncoder.encode("文件名.xlsx",StandardCharsets.UTF_8.toString());
// 封装标题行
List<List<String>> head = new ArrayList<>();
// 封装数据
List<List<String>> data = new LinkedList<>();
response.setContentType("application/force-download");
response.setHeader("Content-Disposition","attachment;filename=" + fileName);
EasyExcel.write(response.getOutputStream())
.head(head)
.autoCloseStream(true)
.excelType(ExcelTypeEnum.XLSX)
.sheet("Sheet名字")
.doWrite(data);
}
Here we need to pay attention to:
Summary
Easyexcel's API design is simple and easy to use. It can be used to quickly develop scenes with Excel data import or export. It is one of the favorite tools of data extraction tools.
(c-3-d, e-a-20200909)