Java: the optimization process of exporting large quantities of Excel data

background

At present, the team is working on a user data Kanban (hereinafter referred to as Kanban), which basically covers all user behavior data and generates analysis reports. User behavior is composed of multiple data sources (catering, daily life, recharge consumption, transportation, logistics, transportation, medical care, housing property, sports health...),

Combination, sorting and statistics based on a large amount of data. According to the latest statistical report, nearly 100W + behavior data is generated every day, so the data base is very large.

This data center connects with many business teams. These teams filter some dimensions according to their own needs, and then directly download data (Excel) documents from our center for analysis. Therefore, it is very common for the next hundreds of thousands or millions of rows of data.

Problems and Solutions

Problems encountered

At present, the main problem is that with the gradual improvement of behavior ability, more and more user data is precipitated, and the business volume is also expanding.

Business teams sometimes download excessive data for analysis, so the data download ability on the platform is particularly important. Our problem is that the download efficiency is too slow. It takes more than 5 minutes to download 10W data, which is obviously a problem.

Solution steps

The code was left over by the previous team. The original functions were not open to use and there was no amount of data, so no problems were found. The following is the original export module. The original program is as follows. I have made a basic restore.

Now how to ensure the efficient export of data is our most important goal, which is also the most concern of the business team.

Look at the code marked in red. The way to generate excel is to render each cell in Excel and fill in the data line by line. The efficiency is too slow. According to the log analysis, it is found that the basic time is spent on data generation excel. Basic data of about 1W is generated every time

It takes 1 minute. It turns out that in other businesses, it is only exported and used as simple data. For example, hundreds of data will come out soon, but when a large amount of data is exported, the performance problem will appear immediately.

After discussion and reference, the team found that there are many powerful excel processing components in the industry. We give priority to Ali's easy Excel for a try.

Add easyexcel to POM as follows:

Code: dto content (Chinese is the configured header):

Steps for generating file stream (the code is very clear):

Generate and upload excel file stream completely:

After using Alibaba easyexcel component, it takes only 8 seconds for 10W + data to flow from generating excel file to uploading, which was about 8 minutes. The following is the log record of each step time point, which can be seen from the time consumption:

Sorting tool class

Tools and instructions

Referring to the tool classes sorted on the Internet, some classes and methods were OK in the previous version and marked as outdated in the new version.

reference material

Example document: https://www.yuque.com/easyexcel/doc/easyexcel Easyexcel GitHub address: https://github.com/alibaba/easyexcel

The content of this article comes from the network collection of netizens. It is used as a learning reference. The copyright belongs to the original author.
THE END
分享
二维码
< <上一篇
下一篇>>