Java network programming – HTTP protocol
Definition of HTTP protocol
This article sets aside the TCP/IP connection setup and teardown (the handshake and close sequences) that underlie HTTP, and analyzes the protocol only at the level of the request and response exchange.
HTTP stands for Hypertext Transfer Protocol. It is a standard that defines how a web client talks to a server and how data is transferred from the server back to the client. In day-to-day development, HTTP is often thought of as a means of transmitting HTML files and the images embedded in them. In fact, HTTP is a general-purpose data transfer protocol. Its payload is not limited to HTML files or images: it can just as well carry Microsoft Word documents or even Windows .exe files. Any data that can be represented as a sequence of bytes can be transmitted over HTTP.
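HTTP transmits data over TCP/IP. Ignoring the handshake and teardown of the underlying TCP connection, each request from the client to the server and its response go through the following steps in HTTP 1.0:
1. The client opens a TCP connection to the server.
2. The client sends a request to the server.
3. The server sends a response to the client.
4. The server closes the connection.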
In HTTP 1.1 (currently the most widely used version) and later, multiple requests and responses can be sent in sequence over a single TCP connection. That is, steps 2 and 3 can be repeated between steps 1 and 4 above. In addition, HTTP 1.1 allows request data and response data to be sent in chunks, which improves scalability.
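The HTTP 1.0 style exchange described above (connect, send a request, read the response, close) can be sketched with plain sockets. This is only an illustration under stated assumptions: the class name Http10Exchange, the throwaway loopback server and the fixed "ok" response are all hypothetical and exist only so the example runs without an external host.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class Http10Exchange {

    /** Runs one request/response exchange against a local one-shot server and returns the raw response. */
    static String exchange() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread serverThread = new Thread(() -> serveOnce(server));
            serverThread.start();

            // Step 1: the client opens a TCP connection to the server.
            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
                // Step 2: the client sends its request (request line, headers, blank line).
                OutputStream out = client.getOutputStream();
                out.write("GET / HTTP/1.0\r\nHost: localhost\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
                out.flush();
                // Step 3: the client reads the response. Reading ends at end of stream,
                // which happens when the server closes the connection (step 4).
                byte[] raw = client.getInputStream().readAllBytes();
                serverThread.join();
                return new String(raw, StandardCharsets.US_ASCII);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Hypothetical server side: accepts one connection, answers once, then closes it. */
    private static void serveOnce(ServerSocket server) {
        try (Socket socket = server.accept()) {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
            String line;
            while ((line = in.readLine()) != null && !line.isEmpty()) {
                // Skip the request line and headers up to the blank line.
            }
            OutputStream out = socket.getOutputStream();
            out.write("HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\nContent-Length: 2\r\n\r\nok"
                    .getBytes(StandardCharsets.US_ASCII));
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(exchange());
    }
}
```

With HTTP 1.1 keep-alive, the client would instead rely on Content-Length or chunked encoding to find the end of each response, since the connection stays open.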
HTTP request method
HTTP defines a variety of request methods that identify what type of operation the current request should perform. The common HTTP request methods are GET, HEAD, PUT, POST, PATCH, TRACE, OPTIONS and DELETE.
Two properties characterize each method. A method is "safe" if using it does not modify or update any data, that is, repeated requests do not affect the state of the resource. A method is "idempotent" if calling it any number of times leaves the resource in the same state as calling it once (put differently, only the first call actually changes the resource state, and later calls produce the same result as the first). Safety and idempotency are two key factors to consider when designing HTTP interfaces.
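As a rough illustration, the safety and idempotency of the common methods can be written down as a lookup table. The flags below follow the definitions in the HTTP specification (RFC 7231, and RFC 5789 for PATCH) as I understand them. The class name HttpMethods and the helper canRetryBlindly are hypothetical names introduced only for this sketch.

```java
import java.util.Locale;

class HttpMethods {

    /** Common request methods with their "safe" and "idempotent" flags. */
    enum Method {
        GET(true, true), HEAD(true, true), OPTIONS(true, true), TRACE(true, true),
        PUT(false, true), DELETE(false, true),
        POST(false, false), PATCH(false, false);

        final boolean safe;        // does not modify resource state
        final boolean idempotent;  // repeating the call leaves the same state as one call

        Method(boolean safe, boolean idempotent) {
            this.safe = safe;
            this.idempotent = idempotent;
        }
    }

    /** A client can repeat a request without extra care only if the method is idempotent. */
    static boolean canRetryBlindly(String methodName) {
        return Method.valueOf(methodName.toUpperCase(Locale.ROOT)).idempotent;
    }

    public static void main(String[] args) {
        for (Method m : Method.values()) {
            System.out.printf("%-8s safe=%-5b idempotent=%b%n", m, m.safe, m.idempotent);
        }
    }
}
```

Every safe method is also idempotent, but not the other way round: PUT and DELETE change state, yet repeating them yields the same state.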
It is worth noting that POST and PUT serve much the same purpose. The main difference is that POST is not idempotent, while PUT is. In current web development POST is overused, and few people use PUT unless they follow a RESTful style. PUT is also similar to PATCH: both update the specified document on the server with the data sent by the client. However, PUT replaces the whole document, while PATCH modifies only part of it.
PS: the above describes the conventions for request methods in the HTTP protocol. Nothing forces an implementation to follow them.
Common HTTP status codes
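The common HTTP status codes are defined as constants in java.net.HttpURLConnection in the JDK.
Briefly, the first digit of a status code gives its class: 1xx is informational, 2xx means success, 3xx means redirection, 4xx indicates a client error, and 5xx indicates a server error.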
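A small sketch of how the constants in java.net.HttpURLConnection map onto the status code classes. The constants used here (HTTP_OK, HTTP_MOVED_PERM, HTTP_NOT_FOUND, HTTP_INTERNAL_ERROR) are part of the JDK. The class StatusCodes and its category method are hypothetical helpers written for this example.

```java
import java.net.HttpURLConnection;

class StatusCodes {

    /** Maps a status code to its class based on the first digit. */
    static String category(int code) {
        switch (code / 100) {
            case 1: return "Informational";
            case 2: return "Success";
            case 3: return "Redirection";
            case 4: return "Client error";
            case 5: return "Server error";
            default: return "Unknown";
        }
    }

    public static void main(String[] args) {
        int[] samples = {
            HttpURLConnection.HTTP_OK,             // 200
            HttpURLConnection.HTTP_MOVED_PERM,     // 301
            HttpURLConnection.HTTP_NOT_FOUND,      // 404
            HttpURLConnection.HTTP_INTERNAL_ERROR  // 500
        };
        for (int code : samples) {
            System.out.println(code + " -> " + category(code));
        }
    }
}
```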
Common HTTP headers
Here are some common headers and their functions.
User-Agent
User-Agent is generally used as a request header that tells the server which browser or client software is making the request. It lets the server tailor the returned data or files to the type of client. For example, a request sent from Chrome carries a User-Agent like this:
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Safari/537.36
Host
Host is generally used as a request header to specify the host name and port number of the server that should receive the request. For example:
Host: www.importnew.com
Accept
The Accept family of headers is generally used in requests. These headers tell the server what the client can handle or prefers, and what it cannot handle or does not want. The family includes Accept, Accept-Charset, Accept-Encoding and Accept-Language.
The Accept header specifies the media types and subtypes the client will accept. Media (MIME) types are classified on two levels: for example, the media type of a JPEG image is image/jpeg, where the type is image and the subtype is jpeg. MIME defines eight top-level types: text, image, audio, video, model, application, message and multipart.
For example, if the client accepts only JSON data:
Accept: application/json
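In the JDK, request headers such as Accept and User-Agent can be set on an HttpURLConnection before the request is sent. The following is a minimal sketch: the class name AcceptHeaderExample, the URL http://example.com/ and the User-Agent value example-client/1.0 are placeholders chosen for illustration. No network traffic occurs here, because openConnection() only creates the connection object and nothing calls connect().

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

class AcceptHeaderExample {

    /** Prepares (but does not send) a GET request that asks for JSON only. */
    static HttpURLConnection prepare(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("GET");
            conn.setRequestProperty("Accept", "application/json");
            conn.setRequestProperty("User-Agent", "example-client/1.0"); // placeholder value
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection conn = prepare("http://example.com/");
        System.out.println("Accept: " + conn.getRequestProperty("Accept"));
        System.out.println("User-Agent: " + conn.getRequestProperty("User-Agent"));
    }
}
```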
Referer
Referer is generally used as a request header. It gives the URL of the document that contained the link to the currently requested URL, that is, the page the request came from. It is commonly used for hotlink protection. For example, when handling a request for www.baidu.com/search?name=doge, the server checks whether the Referer is www.baidu.com: the page leading to www.baidu.com/search must be www.baidu.com, otherwise the server should reject the request.
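The Referer check described above could look roughly like the following on the server side. This is a sketch, not a complete hotlink protection scheme: the class RefererCheck and the method isAllowed are hypothetical, and treating a missing or unparsable Referer as a rejection is an assumption made for this example (real sites often allow an empty Referer).

```java
import java.net.URI;

class RefererCheck {

    /** Returns true only if the Referer header value points at the expected host. */
    static boolean isAllowed(String referer, String expectedHost) {
        if (referer == null || referer.isEmpty()) {
            return false; // assumption: requests without a Referer are rejected
        }
        try {
            String host = URI.create(referer).getHost();
            return host != null && host.equalsIgnoreCase(expectedHost);
        } catch (IllegalArgumentException e) {
            return false; // the header value is not a valid URI
        }
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("http://www.baidu.com/", "www.baidu.com"));
        System.out.println(isAllowed("http://other.example/page.html", "www.baidu.com"));
    }
}
```

Because the client controls the Referer header, a check like this discourages casual hotlinking but should not be relied on as a security boundary.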
Cookie
Cookie is generally used as a request header, through which the client sends one or more tokens to the server. Cookies are not inherently secure, and their contents are also stored on the client. In servlet applications, cookies are generally the best way to identify the current user and implement persistent sessions. By expiration time, cookies are divided into session cookies and persistent cookies. Session cookies expire relatively quickly, while persistent cookies have a long expiration time or do not expire. The expiration policy should be controlled by the server. Because cookies are directly exposed to the client, they should generally not be used to store sensitive data. If sensitive data must be stored, consider encrypting it.
Cookie: uid=10086
Set-Cookie
Set-Cookie is generally used as a response header. It is the counterpart of Cookie: with it the server asks the client to store a cookie, which the client then sends back in later requests.
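The JDK can parse a Set-Cookie header into a java.net.HttpCookie object. The sketch below assumes a made-up header value (uid=10086 with Path and Max-Age attributes). The class SetCookieExample and its parseFirst helper are hypothetical names introduced for the example.

```java
import java.net.HttpCookie;
import java.util.List;

class SetCookieExample {

    /** Parses a Set-Cookie header line and returns the first cookie it contains. */
    static HttpCookie parseFirst(String header) {
        List<HttpCookie> cookies = HttpCookie.parse(header);
        return cookies.get(0);
    }

    public static void main(String[] args) {
        // Illustrative header value, not taken from a real server.
        HttpCookie cookie = parseFirst("Set-Cookie: uid=10086; Path=/; Max-Age=3600");
        System.out.println("name    = " + cookie.getName());
        System.out.println("value   = " + cookie.getValue());
        System.out.println("path    = " + cookie.getPath());
        System.out.println("max-age = " + cookie.getMaxAge());
    }
}
```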
Cache-Control
Cache-Control is a general header that can appear in both requests and responses. It states how the response to the current request may be cached. Cache-Control supports many directives, which are not covered in detail here. For example, no-cache means that a cached copy must not be used without first revalidating it with the origin server, and max-age gives the maximum time, in seconds, for which the response may be cached.
Content-Type
Content-Type is a general header that can appear in a request or a response. It tells the server or client the content (media) type of the request or response body.
Content-Length
Content-Length is a general header that can appear in a request or a response. It tells the server or client the length, in bytes, of the request or response body.
Content-Encoding
Content-Encoding is generally used as a response header, the counterpart of Accept-Encoding. The server uses it to tell the client which content encoding was applied to the response body.
Content-Language
Content-Language is generally used as a response header, the counterpart of Accept-Language. The server uses it to tell the client the language of the response content.
Connection
Connection is generally used as a request header to indicate whether a persistent connection is wanted. In HTTP 1.1, the value keep-alive requests a persistent connection, which improves socket reuse and reduces the overhead of opening many connections. The Keep-Alive section below covers this in detail.
Origin
Origin is generally used as a request header. It indicates that the current request is a cross-origin resource sharing (CORS) request. For such a request to succeed, the server must add an Access-Control-Allow-Origin header to the response, naming the origin that is allowed access.
Origin: http://www.baidu.com
Access-Control-Allow-Origin
Access-Control-Allow-Origin is generally used as a response header, the counterpart of Origin. It names the request origin that the server allows for cross-origin resource sharing.
Access-Control-Allow-Origin: http://www.baidu.com
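The relationship between Origin and Access-Control-Allow-Origin can be sketched as a server-side decision: echo the origin back only if it is on an allow list. The class CorsCheck, the method allowOriginHeader and the allow list itself are hypothetical. A real CORS implementation also has to handle preflight requests and further Access-Control-* headers, which are omitted here.

```java
import java.util.Optional;
import java.util.Set;

class CorsCheck {

    /**
     * Returns the value for the Access-Control-Allow-Origin response header,
     * or an empty Optional if the header should be omitted.
     */
    static Optional<String> allowOriginHeader(String origin, Set<String> allowedOrigins) {
        if (origin != null && allowedOrigins.contains(origin)) {
            return Optional.of(origin);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Set<String> allowed = Set.of("http://www.baidu.com"); // illustrative allow list
        System.out.println(allowOriginHeader("http://www.baidu.com", allowed));
        System.out.println(allowOriginHeader("http://other.example", allowed));
    }
}
```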
Server
Server is generally used as a response header to give the client information about the server software.
HTTP request message
With the GET method, only the URL needs to be given to the remote server: the path and query string in the URL identify the resource being requested. However, detailed client information cannot be carried in the URL. In addition, the data sent with methods such as POST and PUT may be large and cannot be placed in the query string. For these reasons an HTTP request is sent as a structured message made up of four parts: the request line, the request headers, a blank line, and the request body.
The text description may be abstract. Laid out line by line, the message looks like this:
request-method SP URL SP protocol-version \r\n
header-name: header-value \r\n
(further header lines)
\r\n
request body
PS: SP stands for a single space and \r\n for the line terminator (carriage return plus line feed).
For example:
GET /wp-admin/admin-ajax.php?postviews_id=23996&action=postviews&_=1538708851063 HTTP/1.1
Host: www.importnew.com
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Accept: */*
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Safari/537.36
Referer: http://www.importnew.com/23996.html
Accept-Encoding: gzip, deflate
Accept-Language: zh-CN,zh;q=0.9

postviews_id=23996&action=postviews&_=1538708851063
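To make the layout concrete, the following sketch assembles a request message from its parts: a request line, header lines, a blank line, and a body. The class RequestMessage and its format method are hypothetical, and the POST to /submit with the body a=1&b=2 is an invented example, not a request from the article.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RequestMessage {

    /** Joins request line, headers, blank line and body, using \r\n as the line terminator. */
    static String format(String method, String target, Map<String, String> headers, String body) {
        StringBuilder sb = new StringBuilder();
        // Request line: method SP target SP version
        sb.append(method).append(' ').append(target).append(" HTTP/1.1\r\n");
        // One "name: value" line per header
        headers.forEach((name, value) -> sb.append(name).append(": ").append(value).append("\r\n"));
        // The blank line separates the headers from the body
        sb.append("\r\n");
        sb.append(body);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>(); // keeps insertion order
        headers.put("Host", "www.importnew.com");
        headers.put("Content-Type", "application/x-www-form-urlencoded");
        headers.put("Content-Length", "7");
        System.out.println(format("POST", "/submit", headers, "a=1&b=2"));
    }
}
```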
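HTTP response message
The format of the response message is similar to that of the request message. It returns the server's response to the client, including some information about the server and the response data itself. An HTTP response message is made up of four parts: the status line, the response headers, a blank line, and the response body.
The text description may be abstract. Laid out line by line, the message looks like this:
protocol-version SP status-code SP reason-phrase \r\n
header-name: header-value \r\n
(further header lines)
\r\n
response body
PS: SP stands for a single space and \r\n for the line terminator (carriage return plus line feed).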
For example:
HTTP/1.1 200 OK
Server: Nginx
Date: Fri, 05 Oct 2018 03:07:37 GMT
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive
Keep-Alive: timeout=2
Vary: Accept-Encoding
X-Powered-By: PHP/5.3.3
X-Robots-Tag: noindex
X-Content-Type-Options: nosniff
x-frame-options: SAMEORIGIN
Content-Encoding: gzip

2995
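The first line of a response, the status line, splits into protocol version, status code and reason phrase at the first two spaces. A minimal parsing sketch follows. The class StatusLine is hypothetical and does no validation beyond what is shown.

```java
class StatusLine {
    final String version;
    final int code;
    final String reason;

    private StatusLine(String version, int code, String reason) {
        this.version = version;
        this.code = code;
        this.reason = reason;
    }

    /** Parses a line such as "HTTP/1.1 200 OK". The reason phrase may itself contain spaces. */
    static StatusLine parse(String line) {
        String[] parts = line.split(" ", 3); // at most three pieces
        if (parts.length < 2) {
            throw new IllegalArgumentException("Not a status line: " + line);
        }
        String reason = parts.length == 3 ? parts[2] : "";
        return new StatusLine(parts[0], Integer.parseInt(parts[1]), reason);
    }

    public static void main(String[] args) {
        StatusLine status = parse("HTTP/1.1 200 OK");
        System.out.println(status.version + " / " + status.code + " / " + status.reason);
    }
}
```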
Keep-Alive
With HTTP 1.0, a new TCP connection is opened for every request. In a typical web session, the time spent opening and closing all these connections can far exceed the time spent actually transmitting data, especially when the session involves many small documents. The problem is worse for HTTPS connections encrypted with SSL or TLS, because the handshake that sets up a secure socket takes much more work than setting up a regular socket.
In HTTP 1.1 and later, the server does not have to close the connection after returning a response. The connection can stay open and wait for new requests from the client on the same socket. Put simply, multiple requests and responses can be sent in sequence over one TCP connection.
The client can request socket reuse by adding a Connection header with the value Keep-Alive to the request:
Connection: Keep-Alive
In HTTP 1.1 and later, keep-alive is enabled by default and does not need to be specified explicitly. To turn it off, set the header to close:
Connection: close
Once keep-alive is enabled, the client reuses the socket if it connects to the same server again before the connection has been closed. In the JDK, HTTP keep-alive behavior can be controlled through system properties such as http.keepAlive and http.maxConnections.
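As a sketch of the system property approach, the snippet below sets two of the JDK networking properties, http.keepAlive and http.maxConnections. Both property names come from the JDK networking properties documentation. The class KeepAliveSettings, the configure method and the value 10 are illustrative assumptions. The properties are best set at startup, before any HTTP connection is made, since the HTTP handler may read them only once.

```java
class KeepAliveSettings {

    /** Sets the keep-alive related system properties for the JDK HTTP client. */
    static void configure(boolean keepAlive, int maxConnections) {
        // Whether idle HTTP connections are kept open for reuse.
        System.setProperty("http.keepAlive", Boolean.toString(keepAlive));
        // Maximum number of idle connections kept alive per destination.
        System.setProperty("http.maxConnections", Integer.toString(maxConnections));
    }

    public static void main(String[] args) {
        configure(true, 10); // 10 is an arbitrary example value
        System.out.println("http.keepAlive=" + System.getProperty("http.keepAlive"));
        System.out.println("http.maxConnections=" + System.getProperty("http.maxConnections"));
    }
}
```

The same values can be passed on the command line instead, for example with -Dhttp.keepAlive=false.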
Cookies and cookie management
Many websites use small text strings to store persistent client state between connections. These strings are called cookies. Cookies travel in the request and response headers: from the server to the client, and then from the client back to the server. Servers use cookies to carry session IDs, shopping cart contents, login credentials and so on. Besides the simple name=value pair, a cookie can have several attributes that control its scope, including expiration date, path, domain, port, version and security options.
In the JDK, the java.net.CookieStore interface provides operations for adding, removing and querying cookies. Its default implementation is java.net.InMemoryCookieStore, so unless a custom CookieStore is supplied, cookies are kept in memory. In addition, java.net.CookieManager internally holds a CookiePolicy and a CookieStore and defines a set of methods for managing cookies. Cookies are generally handled through CookieManager. Cookie management can also be customized by implementing CookieStore and replacing the default CookieManager.
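A short sketch of these classes in use. Passing null as the store to the CookieManager constructor selects the default in-memory store. The class CookieStoreExample, the URI http://www.example.com/ and the cookie uid=10086 are placeholders for illustration.

```java
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.CookieStore;
import java.net.HttpCookie;
import java.net.URI;

class CookieStoreExample {

    /** Creates a manager backed by the default in-memory store (null selects the default). */
    static CookieManager newManager() {
        return new CookieManager(null, CookiePolicy.ACCEPT_ORIGINAL_SERVER);
    }

    public static void main(String[] args) {
        CookieManager manager = newManager();
        // To apply the manager to all HttpURLConnection traffic in the JVM,
        // one would additionally call java.net.CookieHandler.setDefault(manager).
        CookieStore store = manager.getCookieStore();
        URI uri = URI.create("http://www.example.com/");

        HttpCookie cookie = new HttpCookie("uid", "10086");
        store.add(uri, cookie);                                   // add
        System.out.println("stored:  " + store.get(uri));         // query by URI
        System.out.println("removed: " + store.remove(uri, cookie)); // delete
        System.out.println("left:    " + store.getCookies());
    }
}
```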
Summary
(end of this article c-2-d e-20181005)