MongoDB mass data CRUD optimization
1. Batch saving optimization
Avoid saving documents one by one; use bulkWrite based on ReplaceOneModel, and enable upsert:
public void batchSave(List<?> spoTriples, KgInstance kgInstance) {
    MongoConverter converter = mongoTemplate.getConverter();
    // Convert each entity into a BSON document and wrap it in a ReplaceOneModel
    // keyed on _id, with upsert enabled so missing documents are inserted.
    List<ReplaceOneModel<Document>> bulkOperationList = spoTriples.stream()
            .map(thing -> {
                org.bson.Document dbDoc = new org.bson.Document();
                converter.write(thing, dbDoc);
                return new ReplaceOneModel<>(
                        Filters.eq(UNDERSCORE_ID, dbDoc.get(UNDERSCORE_ID)),
                        dbDoc,
                        new UpdateOptions().upsert(true));
            })
            .collect(Collectors.toList());
    // Send all operations to MongoDB in a single bulk write.
    mongoTemplate.getCollection(getCollection(kgInstance)).bulkWrite(bulkOperationList);
}
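For very large input lists, it can also help to submit the bulk write in fixed-size chunks instead of one huge batch. A minimal sketch under that assumption; the chunk size of 1000 is arbitrary and not taken from the original code:
int batchSize = 1000;
for (int from = 0; from < bulkOperationList.size(); from += batchSize) {
    int to = Math.min(from + batchSize, bulkOperationList.size());
    // Each chunk is sent as its own bulk write.
    mongoTemplate.getCollection(getCollection(kgInstance))
            .bulkWrite(bulkOperationList.subList(from, to));
}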
2. Paging optimization
Fields that are frequently used in queries should be indexed.
For queries that filter on multiple keys, you can create a matching compound index, as in the sketch below.
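For example, indexes can be created through MongoTemplate's index operations. A minimal sketch; the collection name and the field names name and modifyTime are assumptions, not taken from the original code:
import org.springframework.data.domain.Sort;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.index.Index;

public void ensureQueryIndexes(MongoTemplate mongoTemplate, String collectionName) {
    // Single-field index for a field that is frequently filtered on.
    mongoTemplate.indexOps(collectionName)
            .ensureIndex(new Index().on("name", Sort.Direction.ASC));
    // Compound index matching queries that filter on name and sort by modifyTime.
    mongoTemplate.indexOps(collectionName)
            .ensureIndex(new Index()
                    .on("name", Sort.Direction.ASC)
                    .on("modifyTime", Sort.Direction.DESC));
}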
2.1 Avoid unnecessary count
Queries that hit an index are fast, but a paged response also needs a totalCount. When a single collection is very large, the count itself becomes expensive. Do you really need an exact number?
When you search for a keyword on Google, Baidu, or other search engines, only a limited number of results is returned. Likewise, we do not need an exact number; we set a threshold instead, for example 10000. When the total exceeds 10000, we return 10000 and the front end displays "more than 10000".
The principle is simple: skip MAX_PAGE_COUNT documents and check whether any data remains. If it does, the total is greater than MAX_PAGE_COUNT, so returning MAX_PAGE_COUNT is enough; otherwise, compute the real count.
private static final int MAX_PAGE_COUNT = 10000;

/**
 * When the total exceeds the threshold, stop computing the exact count.
 *
 * @param mongoTemplate  template used to execute the queries
 * @param query          the filter part of the paged query
 * @param collectionName target collection
 * @return MAX_PAGE_COUNT if the total is at least MAX_PAGE_COUNT, otherwise the exact count
 */
private long count(MongoTemplate mongoTemplate, Query query, String collectionName) {
    // Probe: skip MAX_PAGE_COUNT documents (page MAX_PAGE_COUNT, size 1) and try to fetch one.
    query = query.with(PageRequest.of(MAX_PAGE_COUNT, 1));
    if (!mongoTemplate.find(query, Thing.class, collectionName).isEmpty()) {
        // A document exists beyond the threshold, so the total is at least MAX_PAGE_COUNT.
        return MAX_PAGE_COUNT;
    }
    // Small result set: compute the exact count.
    return mongoTemplate.count(query, collectionName);
}
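A sketch of how this capped count could be wired into a page response, assuming Spring Data's PageImpl; the keyword filter and the variable names are hypothetical:
Query query = new Query(Criteria.where("name").regex(keyword));   // hypothetical filter
Pageable pageable = PageRequest.of(pageNumber, pageSize);
List<Thing> content = mongoTemplate.find(query.with(pageable), Thing.class, collectionName);
long total = count(mongoTemplate, query, collectionName);         // capped at MAX_PAGE_COUNT
Page<Thing> page = new PageImpl<>(content, pageable, total);
// The front end can render "10000+" whenever total equals MAX_PAGE_COUNT.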
Front-end display: totals above the threshold are shown as "more than 10000".
2.2 Avoid large skips
Paging requires skipping over earlier records first, and that takes time. A small trick lets you avoid the skip entirely.
For example, suppose a list is sorted in reverse order by last modification time, 100 items are shown per page, and page 100 is requested. The normal approach has to skip 99 * 100 documents, which is very expensive. Looked at another way, because the data is ordered, every document on page 100 has a modification time smaller than the minimum modification time on page 99. Add that condition to the query and you can directly take the first 100 documents that match, as in the sketch below.
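A minimal sketch of this range-based paging, assuming a modifyTime sort field and that the caller remembers the smallest modifyTime of the previous page; the field name and variables are hypothetical:
// Instead of query.with(PageRequest.of(99, 100)), which skips 9900 documents,
// filter on the sort key and only limit the result.
Query query = new Query(Criteria.where("modifyTime").lt(minModifyTimeOfPreviousPage))
        .with(Sort.by(Sort.Direction.DESC, "modifyTime"))
        .limit(100);
List<Thing> page100 = mongoTemplate.find(query, Thing.class, collectionName);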
3. Full export optimization
3.1 Remove unnecessary fields
When querying, include only the fields that are actually needed. This effectively reduces the amount of data transferred and speeds up the query. For example:
Query query = new Query();
query.fields().include("_id").include("name").include("hot").include("alias");
3.2 Avoid findAll and paging queries; use stream instead
There are two common pitfalls in full exports. The first is a plain findAll: when the data volume is too large, it can easily drive the server out of memory. Even without an OOM, it puts a heavy load on the server and affects sibling services. In addition, findAll loads all the data into memory at once, which makes the whole process slow, since nothing can be processed until every document has arrived.
The second pitfall is processing page by page with paging queries. Paging does reduce the load on the server and is a feasible approach, but as mentioned above, the later pages have to skip over all the earlier data, which is wasted work. A slightly better approach is to convert the skip into a condition, as described earlier. That is efficient, but it is still not recommended here because it introduces redundant code:
Page<Thing> dataList = entityDao.findAllByPage(kgDataStoreService.getKgCollectionByKgInstance(kg), page);
Map<String, Individual> thingId2Resource = new ConcurrentHashMap<>();
appendThingsToModel(model, concept2OntClass, hot, alias, dataList, thingId2Resource);
// Keep fetching the next page until the last one has been processed.
while (dataList.hasNext()) {
    page = PageRequest.of(page.getPageNumber() + 1, page.getPageSize());
    dataList = entityDao.findAllByPage(kgDataStoreService.getKgCollectionByKgInstance(kg), page);
    appendThingsToModel(model, concept2OntClass, hot, alias, dataList, thingId2Resource);
}
A better option is the stream method of MongoTemplate, which returns a CloseableIterator; documents are read and processed one at a time, which keeps the export efficient:
@Override
public <T> CloseableIterator<T> stream(final Query query, final Class<T> entityType, final String collectionName) {
    return doStream(query, entityType, collectionName, entityType);
}
The code becomes simpler and more efficient by using this method instead:
CloseableIterator<Thing> dataList = kgDataStoreService.getSimpleInfoIterator(kg);
// Entity import
// Page<Thing> dataList = entityDao.findAllByPage(kgDataStoreService.getKgCollectionByKgInstance(kg), thingId2Resource);
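The iterator wraps a server-side cursor, so it should be closed when the export finishes. A minimal consumption sketch; appendThingToModel is a hypothetical per-document variant of the helper used above:
CloseableIterator<Thing> iterator = kgDataStoreService.getSimpleInfoIterator(kg);
try {
    while (iterator.hasNext()) {
        Thing thing = iterator.next();
        // Process one document at a time; memory usage stays flat
        // no matter how many documents the collection contains.
        appendThingToModel(model, concept2OntClass, hot, alias, thing, thingId2Resource);
    }
} finally {
    // Always close the iterator to release the underlying cursor.
    iterator.close();
}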
To be continued...