10 common errors when Java developers write SQL statements

The coordination between Java developers' object-oriented programming thinking and command-line programming thinking depends on their ability levels as follows:

Skills (anyone can write command-line code) dogma (some people use the "pattern pattern" method, that is, patterns are everywhere and identified by names) emotional state (in the early days, real object-oriented code will be more difficult to understand than imperative code.)

However, when Java developers write SQL statements, everything becomes different. SQL is an illustrative language, which has nothing to do with object-oriented thought and imperative thought. In SQL language, query is very easy to express. But it is not so easy to write in the best or most correct way. Developers not only need to rethink their own programming mode, but also need to think deeply from the perspective of set theory.

The following is when Java developers make JDBC or jooq write SQL statements,

Several common errors

1. Forget null

Misunderstanding the meaning of null is probably the most common mistake Java developers make when writing SQL. This may be because null is also known as unknown, but there are other reasons. Of course, it would be easier to understand if it was only called unknown. Another reason is that when JDBC obtains data or binds variables, null in SQL is mapped to null in Java. This may lead people to think that, similar to the case of null = = null in Java, null = null also exists in SQL.

A more bizarre example of the misunderstanding of null is when null predicates are used in row valued expressions.

Another subtle problem is the misunderstanding of the meaning of null in notin anti join.

terms of settlement

Keep training yourself. The meaning of null should always be clear. Every time you write SQL, you should consider:

Is the predicate correct for null? Does null affect the result of this function?

2. Process data in Java Memory

Some Java developers know SQL features very well. Occasionally join, scattered Union, no problem. But what if you encounter window function, result set grouping and so on? Many Java developers will load SQL data into memory, convert the data into some suitable collection types, and perform annoying mathematical operations on the collection with a very lengthy loop structure (at least before Java 8 improves the container).

However, in addition to supporting SQL standards, some SQL databases also support advanced OLAP features, which are more efficient and easier to write. A non-standard example is the model clause of Oracle. Just let the database process the data and load the final results into JAVA memory. Because some very smart people have optimized these expensive products. So, in fact, by migrating to an OLAP database, you will get two benefits:

Concise. It may make it easier to write correct code in SQL than in Java. The database will probably be faster than your algorithm. More importantly, you don't have to transmit millions of records over the network.

terms of settlement

Every time you implement a data centric algorithm in Java, try to ask yourself: is there a way for the database to perform these tasks and only deliver the results to me?

3. Try to use union instead of union all

Compared with union, union all needs additional keywords, which is dwarfed. It would be much better if the following support has been defined in the SQL standard:

Union (allow duplication) union distinct generally requires little duplication removal (sometimes duplication removal or even error), and it is often very slow for large result sets with many columns, because the two sub queries need to be sorted, and each tuple needs to be compared with the subsequent tuples.

It should be noted that even though the SQL standard specifies intersectall and excelall, almost no database implements these useless operations.

terms of settlement

Every time you write about Union, consider whether you actually want to write union all.

4. Use JDBC paging function to page a large number of results

Most databases support access through limit OFFSET,TOP .. START AT、OFFSET.. Clauses such as fetch page the results in some way. Without support for these clauses, there is still rownum (Oracle) or row_ Number () over () (DB2, SQL Server 2008 and earlier), which is much faster than paging in memory. And this is more obvious for large data sets.

terms of settlement

As long as you use those clauses or tools (such as jooq), you can simulate the above paging clauses for you.

5. Implement the connection in JAVA memory

From the early stage of SQL development, some developers still feel uneasy when facing SQL connection. There has always been an inherent fear that the join speed is slow. If the cost based optimizer chooses to execute nested loops and load the complete table into database memory before creating a connection table source, it is indeed very slow. But it rarely happens. Mergejoin and hashjoin operations are very fast with appropriate predicates, constraints, and indexes. This is related to the correct metadata (I don't have to give Tom Kyte another example). However, it is still possible that many Java developers will load two tables from a separate query into the map container to connect in Java Memory in some way.

terms of settlement

If you select from multiple tables in multiple steps, you should carefully consider whether you can express the query function you need in one statement.

6. Use distinct or union to remove duplicates from a Cartesian product

The existence of lengthy connections will lead to the loose relationship in SQL statements. Specifically, if multiple column foreign key relationships are involved, it is likely to forget to add predicates to the join clause. This may lead to duplicate records, but perhaps only in special cases. Then some developers may choose to use distinct to delete these duplicate records again. This error has three hazards:

It may cure the symptoms rather than the root cause. Even in some marginal cases, the standard can not be cured, which will be very slow on a large result set with many columns. Distinct performs an order by operation to remove duplicates. This is also very slow in large Cartesian products, because doing so will still cause a large amount of data to be loaded into memory. terms of settlement

As a rule of thumb, when you get unwanted duplicate results, you should first check your join predicate. Because there may be an imperceptible Cartesian product somewhere.

7. Do not use merge statements

Strictly speaking, this is not a real mistake. It may just be a lack of sufficient cognition or some fear of powerful merge statements. Some databases include other forms of update statements, such as the on duplicate key update clause of MySQL. But merge is really powerful. The most important thing is that in the database, it extends SQL standards to a great extent, such as SQL server.

terms of settlement

If you link insert and update or link select For update to implement updating, you should think more about it. Regardless of the risks associated with operating conditions, you may be able to use a simple merge statement to achieve your goal.

8. Aggregate function is used instead of form function

Before introducing the window function, using the group by clause and the projection aggregation function is the only way to summarize data. This is very effective in most cases. If the aggregated data needs to be supplemented by conventional data, the grouped query can be placed in the join sub query.

However, SQL: 2003 defines the window function. At present, many mainstream database manufacturers have also implemented the window function. The window function can aggregate data not grouped in the result set. In fact, the function of each window supports its own independent partition by clause, which is a very useful tool for report application.

Using the window function will:

It leads to more readable SQL (reducing the existence of non dedicated group by clauses in subqueries) and improves performance. As an RDBMS, it is likely to be easier to optimize its window functions.

terms of settlement

When you write a group by clause in a subquery, think carefully about whether this can be done with a window function.

9. Use memory sorting method for indirect sorting

Multiple types of expressions are supported in the sqlorder by clause, including case statements, which is very useful for indirect sorting. You should always be able to sort data in JAVA memory because you think:

SQL sorting is too slow. SQL sorting cannot do this

terms of settlement

If you sort any SQL data in memory, think carefully about whether you can migrate the sorting to the database. This is the same reason for migrating pages to the database.

Insert a large number of records one by one

JDBC includes batch processing, and you should use it. In the face of thousands of records, do not create a new Preparedstatement for each record to insert. If you want to insert all records into the same table, use a single SQL statement and multiple bound value sets to establish a batch insert statement. Depending on your database and database configuration, you may need to commit after a certain number of inserted records. In order to keep the undo log not too large.

terms of settlement

Always batch insert large datasets.

There are 10 common errors that Java developers make when writing SQL statements. Do you have a general understanding? I hope you must pay special attention to them in the process of writing!

The content of this article comes from the network collection of netizens. It is used as a learning reference. The copyright belongs to the original author.
THE END
分享
二维码
< <上一篇
下一篇>>