HTTP protocol

2019-01-10 • Java

This post belongs to the academy's back-end mini-class series. Each article covers its topic from eight angles:

[background introduction] [knowledge analysis] [common problems] [solutions] [coding practice] [extended thinking] [more discussion] [References]

Together they give an in-depth look at one piece of back-end knowledge or skill. This article covers:

[HTTP protocol]

Title:

[Xiuzhen Academy Java mini-class] HTTP protocol

Opening remarks:

Hello everyone, I'm Guo Jing, a pure and kind Java programmer from the fourth cohort of the Xi'an branch of the IT Academy. Today I'd like to share some deeper thinking on a topic from Java task 2 on the academy's official website: the HTTP protocol.

(1) Background:

What is the HTTP protocol?

HTTP, the Hypertext Transfer Protocol, is the most widely used application protocol on the Internet. It transfers hypertext from a World Wide Web server to a local browser, and it runs on top of TCP/IP to carry data such as HTML files, image files and query results. HTTP is an object-oriented protocol at the application layer. Because it is simple and fast, it suits distributed hypermedia information systems. HTTP follows a client-server architecture: the browser, acting as the HTTP client, sends a request for a URL to the HTTP server (the web server), and the web server, on receiving the request, sends a response back to the client.

(2) Knowledge analysis:

2.1 Working principle of HTTP

The HTTP protocol defines how a web client requests a web page from a web server and how the server transmits the page to the client. HTTP uses a request/response model. The client sends the server a request message containing the request method, URL, protocol version, request headers and request data. The server answers with a status line (protocol version plus a success or error code), followed by server information, response headers and response data.

HTTP request / response steps:

1. The client connects to the web server

An HTTP client, usually a browser, establishes a TCP socket connection to the HTTP port (80 by default) of the web server.

2. Send HTTP request

Over the TCP socket, the client sends a text request message to the web server. A request message consists of four parts: a request line, request headers, an empty line and the request data.

3. The server accepts the request and returns an HTTP response

The web server parses the request and locates the requested resource. The server writes a copy of the resource to the TCP socket, and the client reads it. A response consists of four parts: a status line, response headers, an empty line and the response data.

4. Release TCP connection

If the connection mode is close, the server actively closes the TCP connection and the client closes it passively, which releases the connection. If the connection mode is keep-alive, the connection stays open for a period of time, during which it can carry further requests.

5. Client browser parses HTML content

The browser first parses the status line to read the status code, which indicates whether the request succeeded. It then parses each response header; the headers describe the HTML document that follows, including its length in bytes and its character set. Finally the browser reads the response data (the HTML), lays it out according to HTML syntax and displays it in the browser window.
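The message layout described in steps 2 and 3 can be sketched in a few lines of Java. This is only an illustration, not code from the original task: the class name HttpMessageSketch, its two helper methods and the sample host are assumptions, and no socket is opened here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of the article's task): builds and reads raw HTTP/1.1 text. */
final class HttpMessageSketch {

    /** Request line + request headers + empty line + optional request data, with CRLF line ends. */
    static String buildRequest(String method, String target,
                               Map<String, String> headers, String body) {
        StringBuilder sb = new StringBuilder();
        sb.append(method).append(' ').append(target).append(" HTTP/1.1\r\n"); // request line
        for (Map.Entry<String, String> h : headers.entrySet()) {
            sb.append(h.getKey()).append(": ").append(h.getValue()).append("\r\n"); // one header per line
        }
        sb.append("\r\n"); // the empty line that ends the header section
        if (body != null) {
            sb.append(body); // request data, if any
        }
        return sb.toString();
    }

    /** Reads the three-digit status code out of a status line such as "HTTP/1.1 200 OK". */
    static int parseStatusCode(String statusLine) {
        String[] parts = statusLine.trim().split(" ", 3); // version, code, message
        if (parts.length < 2) {
            throw new IllegalArgumentException("Not a status line: " + statusLine);
        }
        return Integer.parseInt(parts[1]);
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Host", "www.wrox.com"); // sample host, used only for illustration
        headers.put("Connection", "close");
        System.out.print(buildRequest("GET", "/books/", headers, null));
        System.out.println(parseStatusCode("HTTP/1.1 200 OK"));
    }
}
```

In a real client the string returned by buildRequest would be written to the TCP socket, and the first line read back from the socket would be passed to parseStatusCode.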

For example, after you type a URL into the browser address bar and press Enter, the following steps take place:

1. The browser requests the DNS server to resolve the IP address corresponding to the domain name in the URL;

2. After resolving the IP address, establish a TCP connection with the server according to the IP address and the default port 80;

3. The browser sends an HTTP request for the file (the file named by the part of the URL after the domain name). The request message can be sent to the server as the data of the third segment of the TCP three-way handshake;

4. The server responds to the browser request and sends the corresponding HTML text to the browser;

5. Release TCP connection;

6. The browser parses the HTML text and displays the content;
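Steps 1 to 3 start from the pieces of the URL: the domain name to resolve, the port to connect to and the file to request. A minimal sketch of that split, using java.net.URI, might look like the following. The class name UrlPartsSketch and the example URLs are assumptions made for illustration; DNS resolution and the connection itself are left out.

```java
import java.net.URI;

/** Hypothetical helper: splits a URL into the parts the browser needs before connecting. */
final class UrlPartsSketch {
    static final int DEFAULT_HTTP_PORT = 80; // the default port named in step 2

    /** The domain name that would be handed to DNS. */
    static String host(String url) {
        return URI.create(url).getHost();
    }

    /** The port to connect to; URI.getPort() returns -1 when the URL names no port. */
    static int port(String url) {
        int p = URI.create(url).getPort();
        return p == -1 ? DEFAULT_HTTP_PORT : p;
    }

    /** The part after the domain name, i.e. the file the request asks for. */
    static String path(String url) {
        String p = URI.create(url).getRawPath();
        return (p == null || p.isEmpty()) ? "/" : p;
    }

    public static void main(String[] args) {
        String url = "http://www.example.com/books/index.html"; // sample URL
        System.out.println(host(url) + " " + port(url) + " " + path(url));
    }
}
```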

2.2 Request methods of the HTTP protocol:

According to the HTTP standard, a request can use one of several request methods. HTTP/1.0 defines three request methods: GET, POST and HEAD.

HTTP/1.1 adds five more: OPTIONS, PUT, DELETE, TRACE and CONNECT.

GET requests the specified page and returns the entity body.

HEAD is similar to GET, except that the response carries no body; it is used to fetch the headers only.

POST submits data to the specified resource for processing (for example, submitting a form or uploading a file). The data is contained in the request body. A POST request may create a new resource and/or modify an existing one.

PUT replaces the content of the specified document with the data sent from the client to the server.

DELETE asks the server to delete the specified page.

CONNECT is reserved in HTTP/1.1 for proxy servers that can switch the connection into a tunnel.

OPTIONS lets the client query the capabilities of the server.

TRACE echoes back the request as the server received it; it is mainly used for testing or diagnosis.

2.3 Status codes of HTTP responses:

The first line of a response message is called the status line. It consists of the HTTP protocol version, a status code and a status message. The status code tells the HTTP client whether the server produced the expected response.

HTTP/1.1 defines five classes of status codes. A status code has three digits, and the first digit defines the class of the response.

(a) 1xx (informational): these status codes are provisional responses. A client should be prepared to receive one or more 1xx responses before the regular response.

100 (Continue): the initial part of the request has been accepted, and the client should continue sending the rest of the request. (New in HTTP/1.1)

101 (Switching Protocols): the server is switching to another protocol, as the client requested. (New in HTTP/1.1)

(b) 2xx (success): these status codes indicate that the request was processed successfully.

200 (OK): the server has successfully processed the request. Usually this means that the server has returned the requested web page.

201 (Created): the request succeeded and the server created a new resource.

202 (Accepted): the server has accepted the request but has not yet processed it.

203 (Non-Authoritative Information): the server has successfully processed the request, but the information returned may come from another source.

204 (No Content): the server successfully processed the request but did not return any content.

205 (Reset Content): the server successfully processed the request but did not return any content. Unlike 204, this response asks the requester to reset the document view, for example to clear a form.

206 (Partial Content): the server successfully processed a partial GET request.

(c) 3xx (redirection): these status codes indicate that further action, usually a request to another location, is needed to complete the request.

300 (Multiple Choices): the server can respond to the request in several ways. It can pick one according to the user agent, or offer a list for the requester to choose from.

301 (Moved Permanently): the requested web page has moved permanently to a new location. When the server returns this response (to a GET or HEAD request), the requester is automatically forwarded to the new location.

302 (Found, temporary move): the server currently answers the request with a page from a different location, but the requester should keep using the original location for future requests.

303 (See Other): the server returns this code when the requester should issue a separate GET request to a different location to retrieve the response.

304 (Not Modified): the requested web page has not been modified since the last request. When the server returns this response, it does not return the page content.

305 (Use Proxy): the requester may access the requested page only through a proxy. When the server returns this response, it also indicates the proxy the requester should use.

(d) 4xx (request error): these status codes indicate that the request is probably faulty, which prevents the server from processing it.

400 (Bad Request): the server does not understand the syntax of the request.

401 (Unauthorized): the request requires authentication. The server may return this response for pages that require login.

403 (Forbidden): the server refused the request.

404 (Not Found): the server could not find the requested page.

405 (Method Not Allowed): the method specified in the request is not allowed.

406 (Not Acceptable): the server cannot respond with content that has the characteristics the request asked for.

407 (Proxy Authentication Required): similar to 401 (Unauthorized), but the requester must authenticate with the proxy.

408 (Request Timeout): the server timed out while waiting for the request.

(e) 5xx (server error): these status codes indicate that the server hit an internal error while trying to process the request. The fault usually lies with the server itself, not with the request.

500 (Internal Server Error): the server encountered an error and could not complete the request.

501 (Not Implemented): the server lacks the functionality to complete the request. For example, it may return this code when it does not recognize the request method.

502 (Bad Gateway): the server, acting as a gateway or proxy, received an invalid response from the upstream server.

503 (Service Unavailable): the server is currently unavailable (because of overload or maintenance downtime). Usually this is only a temporary state.

504 (Gateway Timeout): the server, acting as a gateway or proxy, did not receive a response from the upstream server in time.

505 (HTTP Version Not Supported): the server does not support the HTTP protocol version used in the request.
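Because the first digit defines the class of the response, a client can often decide what to do before looking at the exact code. A minimal sketch, assuming a hypothetical class StatusClassSketch and class labels chosen here for illustration:

```java
/** Hypothetical helper: maps a status code to one of the five classes by its first digit. */
final class StatusClassSketch {

    static String describe(int statusCode) {
        if (statusCode < 100 || statusCode > 599) {
            throw new IllegalArgumentException("Not an HTTP status code: " + statusCode);
        }
        switch (statusCode / 100) { // integer division keeps only the first digit
            case 1:  return "informational";
            case 2:  return "success";
            case 3:  return "redirection";
            case 4:  return "request error";
            default: return "server error"; // only 5 remains after the range check
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(200));
        System.out.println(describe(404));
        System.out.println(describe(503));
    }
}
```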

2.4 Three main content types of HTTP

1. application/x-www-form-urlencoded: the original form encoding, and the default for browser form submission. The parameters are written in the form key1=value1&key2=value2. For a GET form they are appended to the URL; for a POST form they are placed in the request body.

2. multipart/form-data: mostly used for file upload. The form data is carried in the HTTP body, and the form items are separated from each other by a boundary string.

3. application/json: tells the server that the message body is a serialized JSON string.
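To make the first content type concrete, here is a small sketch of how a key/value map might be turned into an application/x-www-form-urlencoded string. The class name FormBodySketch and the sample fields are assumptions; the code relies on URLEncoder.encode(String, Charset), which needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper: encodes form fields as key1=value1&key2=value2. */
final class FormBodySketch {

    static String encode(Map<String, String> fields) {
        StringJoiner joiner = new StringJoiner("&"); // pairs are joined with &
        for (Map.Entry<String, String> e : fields.entrySet()) {
            // URLEncoder turns a space into + and other reserved or non-ASCII characters into %XX
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps insertion order
        fields.put("name", "Professional Ajax");
        fields.put("publisher", "Wiley");
        System.out.println(encode(fields));
    }
}
```

The same string can be appended to a URL after a ? for a GET form or sent as the request body for a POST form.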

(3) Frequently asked questions:

3.1 what is the difference between URI and URL?

3.2. What is the difference between get and post requests?

(4) Solution:

4.1 what is the difference between URI and URL?

A URI is a uniform resource identifier, which uniquely identifies a resource. Every resource available on the web, such as an HTML document, an image, a video clip or a program, is identified by a URI.

A URI generally consists of three parts:

① Naming mechanism for accessing resources

② Host name where the resource is stored

③ The name of the resource itself, represented by the path, with emphasis on the resource.

A URL is a uniform resource locator. It is a specific kind of URI: a URL identifies a resource and also indicates how to locate it.

A URL is a string that describes an information resource on the Internet. It is used mainly in WWW client and server programs, notably the famous Mosaic browser.

URLs describe all kinds of information resources in one unified format, including files, server addresses and directories. A URL generally consists of three parts:

① The protocol (or service type)

② The IP address of the host where the resource is stored (sometimes including a port number)

③ The specific address of the resource on the host, such as the directory and file name

A URN, or uniform resource name, identifies a resource by name, such as mailto:java-net@java.sun.com.

URI is the abstract, high-level concept that defines uniform resource identification, while URL and URN are concrete ways of identifying a resource. Both URLs and URNs are URIs. Every URL is a URI, but not every URI is a URL, because URI has another subclass, the URN, which names a resource without specifying how to locate it.

In Java, a URI instance can be absolute or relative, as long as it conforms to the URI syntax rules. A URL instance not only conforms to the syntax but also contains the information needed to locate the resource, so it cannot be relative. In the Java class library, the URI class has no method for accessing the resource; its only job is parsing. The URL class, by contrast, can open a stream to the resource.
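The difference between the two Java classes can be observed with a short experiment. The sketch below is only illustrative: the class name UriVersusUrlSketch and the sample identifiers (including the urn:isbn value) are assumptions, and the result for the URN depends on the JDK having no protocol handler registered for the urn scheme, which is the case in a default installation.

```java
import java.net.MalformedURLException;
import java.net.URI;

/** Hypothetical demo: every URL is a URI, but not every URI can become a URL. */
final class UriVersusUrlSketch {

    /** True when the URI also works as a java.net.URL, i.e. it says how to locate the resource. */
    static boolean convertsToUrl(URI uri) {
        try {
            uri.toURL();
            return true;
        } catch (IllegalArgumentException | MalformedURLException e) {
            // IllegalArgumentException: the URI is relative (not absolute)
            // MalformedURLException: no protocol handler exists for the scheme
            return false;
        }
    }

    public static void main(String[] args) {
        URI absolute = URI.create("http://www.wrox.com/books/?sex=man"); // sample URL
        URI relative = URI.create("books/index.html");                   // valid URI, but relative
        URI urn = URI.create("urn:isbn:0451450523");                     // sample URN: a name only

        System.out.println(absolute.getHost() + " converts: " + convertsToUrl(absolute));
        System.out.println("relative, absolute? " + relative.isAbsolute()
                + " converts: " + convertsToUrl(relative));
        System.out.println("urn, opaque? " + urn.isOpaque() + " converts: " + convertsToUrl(urn));
    }
}
```

Only the URL obtained from the first URI could go on to open a stream with openStream(); the URI class offers nothing comparable.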

4.2. What is the difference between get and post requests?

GET request

GET /books/?sex=man&name=Professional HTTP/1.1
Host: www.wrox.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.6) Gecko/20050225 Firefox/1.0.1
Connection: Keep-Alive

Notice that the last line is blank.

POST request

POST / HTTP/1.1
Host: www.wrox.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.6) Gecko/20050225 Firefox/1.0.1
Content-Type: application/x-www-form-urlencoded
Content-Length: 40
Connection: Keep-Alive

name=Professional%20Ajax&publisher=Wiley

1. Submission: with GET, the request data is appended to the URL, so it travels in the request line at the start of the HTTP message. A ? separates the URL from the data, and multiple parameters are joined with &, for example login.action?name=hyddd&password=idontknow&verify=%E4%BD%A0%E5%A5%BD. English letters and digits are sent as they are, a space is converted to +, and Chinese or other characters are percent-encoded (this is URL encoding, not Base64 and not encryption): each byte of the character is written as %XX, where XX is the value of that byte in hexadecimal. In the example, %E4%BD%A0%E5%A5%BD is the UTF-8 encoding of 你好.

With POST, the submitted data is placed in the body of the HTTP message. In the POST example above, the last line (name=Professional%20Ajax&publisher=Wiley) is the data actually transmitted.

Therefore, data submitted with GET shows up in the address bar, while the address bar does not change on a POST submission.
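On the receiving side, the query string of a GET request has to be split and decoded again. The following sketch shows one way to do it with URLDecoder; the class name QueryStringSketch is an assumption, the sample input is the login example from this section, and URLDecoder.decode(String, Charset) needs Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: turns the part after ? into decoded key/value pairs. */
final class QueryStringSketch {

    static Map<String, String> parse(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {      // parameters are joined with &
            int eq = pair.indexOf('=');
            String key = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            // decode turns + back into a space and %XX back into bytes, read here as UTF-8
            params.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> p =
                parse("name=hyddd&password=idontknow&verify=%E4%BD%A0%E5%A5%BD");
        System.out.println(p.get("name") + " / " + p.get("verify"));
    }
}
```

The same method can decode the body of a POST request sent as application/x-www-form-urlencoded, since the format is identical.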

2. Size of transmitted data: first of all, the HTTP protocol specification limits neither the size of the transmitted data nor the length of a URL.

The limits met in actual development come mainly from the following:

GET: specific browsers and servers restrict the URL length. For example, IE limits the URL to 2083 characters (2K + 35). Other browsers, such as Netscape and Firefox, have no limit in theory; the limit depends on what the operating system supports.

Therefore, data submitted with GET is limited by the URL length.

POST: since the values are not transmitted in the URL, the data size is unlimited in theory. In practice, however, every web server limits the size of POST data; Apache and IIS6 each have their own configuration setting for it.

3. Security

POST is more secure than GET. For example, when login data is submitted with GET, the user name and password appear in plain text in the URL. This is a problem because (1) the login page may be cached by the browser, and (2) other people can obtain the account and password by viewing the browser history. In addition, submitting data with GET may also open the door to cross-site request forgery (CSRF) attacks.

4. HTTP GET, HTTP POST and SOAP all run on top of HTTP

(1) GET: the request parameters are attached to the URL as a sequence of key/value pairs (the query string).

The length of the query string is limited by the web browser and the web server (IE, for example, supports only about 2,000 characters), so GET is not suitable for transmitting large data sets. It is also unsafe for sensitive data.

(2) POST: the request parameters are transmitted in a separate part of the HTTP message (the entity body), which is used to carry form information. The Content-Type is therefore typically set to application/x-www-form-urlencoded. POST was designed to support the user fields of web forms, and its parameters are also transmitted as key/value pairs.

However, it does not support complex data types, because POST does not define the semantics or rules of a data structure.

(3) SOAP: a specialized use of HTTP POST that follows a specific XML message format.

The Content-Type is set to text/xml, and any data can be expressed as XML.

The HTTP protocol defines many methods for interacting with the server. The four basic ones are GET, POST, PUT and DELETE. A URL describes a resource on the network, and GET, POST, PUT and DELETE correspond to four operations on that resource: query, update, create and delete. The most common are GET and POST. GET is generally used to obtain or query resource information, while POST is generally used to update it.

Let's look at the differences between GET and POST:

Data submitted with GET is placed after the URL: a ? separates the URL from the data and & joins the parameters, as in EditPosts.aspx?name=test1&id=123456. The POST method puts the submitted data into the body of the HTTP message.

The size of data submitted with GET is limited (because browsers limit the length of the URL), while data submitted with POST has no such limit.

On the server (in ASP.NET, for example), values sent with GET are read with Request.QueryString, and values sent with POST are read with Request.Form.

Submitting data with GET brings security problems. For example, when a login page submits data with GET, the user name and password appear in the URL. If the page can be cached, or other people can access the machine, the account and password can be recovered from the history.

(5) Coding practice:
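As a small exercise, the GET and POST requests from section 4.2 can be rebuilt with the java.net.http API (Java 11 or later). This is a hedged sketch rather than the code of the original task: the class name RequestBuilderSketch and its two methods are assumptions, and the requests are only built here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper: builds the two kinds of request compared in section 4.2. */
final class RequestBuilderSketch {

    /** GET: the parameters, if any, are already part of the URL. */
    static HttpRequest get(String url) {
        return HttpRequest.newBuilder(URI.create(url)).GET().build();
    }

    /** POST: the URL stays clean and the form data travels in the request body. */
    static HttpRequest postForm(String url, String formBody) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest get = get("http://www.wrox.com/books/?sex=man&name=Professional");
        HttpRequest post = postForm("http://www.wrox.com/",
                "name=Professional%20Ajax&publisher=Wiley");
        System.out.println(get.method() + " " + get.uri());
        System.out.println(post.method() + " " + post.uri());
    }
}
```

To actually send one of these requests, it could be passed to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()); the status code and headers of the response are then available through statusCode() and headers().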

(6) Extended thinking:

6.1 HTTP features:

1. Simple and fast: when a client requests a service from the server, it only needs to send the request method and the path. The common request methods are GET, HEAD and POST, and each method specifies a different kind of interaction between client and server. Because the HTTP protocol is simple, an HTTP server program is small, so communication is fast.

2. Flexible: HTTP allows any type of data object to be transmitted. The type being transferred is marked by Content-Type.

3. Connectionless: each connection handles only one request. After the server has processed the client's request and the client has acknowledged the response, the connection is closed. This saves transmission time.

4. Stateless: HTTP is a stateless protocol, meaning that the protocol has no memory of earlier transactions. Because no state is kept, any earlier information that later processing needs must be sent again, which may increase the amount of data transmitted per connection. On the other hand, when the server does not need earlier information, it responds faster.

5. Supports both B/S (browser/server) and C/S (client/server) modes.

(7) References:

https://www.cnblogs.com/ranyonsue/p/5984001.html

https://blog.csdn.net/shiyongyue/article/details/773685

Baidu Encyclopedia

(8) More discussion:

8.1 Qin Yonghui asked: what is the difference between HTTP and HTTPS?

HTTP (Hypertext Transfer Protocol) transfers information between web browser and web server in clear text, without any data encryption. HTTPS therefore adds the SSL protocol to HTTP: SSL relies on a certificate to verify the identity of the server and encrypts the communication between browser and server. In most respects HTTP and HTTPS are used the same way; the visible difference is the protocol prefix of the URL (https).

8.2 Zhao Linai asked: what is a TCP socket?

TCP uses the IP address of a host plus a port number on that host as the endpoint of a TCP connection. This endpoint is called a socket and is written as (IP address: port number).

A socket is an abstract representation of an endpoint in network communication. A connection is described by five pieces of information: the protocol used for the connection, the IP address of the local host, the protocol port of the local process, the IP address of the remote host and the protocol port of the remote process.
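The five pieces of information can be written down as a small value class. The sketch below is an illustration only: the class name ConnectionTupleSketch and the addresses (taken from the range reserved for documentation) are assumptions, and InetSocketAddress.createUnresolved is used so that no DNS lookup or connection takes place.

```java
import java.net.InetSocketAddress;

/** Hypothetical value class: protocol + local endpoint + remote endpoint. */
final class ConnectionTupleSketch {
    final String protocol;          // 1. protocol used for the connection
    final InetSocketAddress local;  // 2. local IP address and 3. local port
    final InetSocketAddress remote; // 4. remote IP address and 5. remote port

    ConnectionTupleSketch(String protocol, InetSocketAddress local, InetSocketAddress remote) {
        this.protocol = protocol;
        this.local = local;
        this.remote = remote;
    }

    /** One endpoint, i.e. one socket in the (IP address: port number) sense. */
    static InetSocketAddress endpoint(String ip, int port) {
        return InetSocketAddress.createUnresolved(ip, port);
    }

    String describe() {
        return protocol + " " + local.getHostString() + ":" + local.getPort()
                + " -> " + remote.getHostString() + ":" + remote.getPort();
    }

    public static void main(String[] args) {
        ConnectionTupleSketch c = new ConnectionTupleSketch("tcp",
                endpoint("192.0.2.1", 50000),  // sample client address and port
                endpoint("192.0.2.10", 80));   // sample web server on the HTTP port
        System.out.println(c.describe());
    }
}
```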

8.3 rubrian asked: what does Content-Type mean?

Content-Type refers to the Internet media type. In the HTTP message headers, Content-Type states the media type of the data carried by that particular request or response.
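A Content-Type value usually has a media type followed by optional parameters, for example text/html; charset=UTF-8. A minimal sketch of picking it apart, assuming a hypothetical class ContentTypeSketch and ignoring the less common forms of the header:

```java
import java.util.Locale;

/** Hypothetical helper: splits a Content-Type value into media type and charset. */
final class ContentTypeSketch {

    /** The part before the first semicolon, lower-cased (media types are case-insensitive). */
    static String mediaType(String headerValue) {
        int semi = headerValue.indexOf(';');
        String type = semi >= 0 ? headerValue.substring(0, semi) : headerValue;
        return type.trim().toLowerCase(Locale.ROOT);
    }

    /** The value of the charset parameter, or null when the header has none. */
    static String charset(String headerValue) {
        String[] parts = headerValue.split(";");
        for (int i = 1; i < parts.length; i++) {       // parts[0] is the media type itself
            String p = parts[i].trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) { // case-insensitive prefix check
                return p.substring(8).replace("\"", "").trim();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String header = "text/html; charset=UTF-8";
        System.out.println(mediaType(header) + " / " + charset(header));
    }
}
```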

(9) Thanks:

(10) Conclusion:

That's all for today's sharing. You are welcome to like, share, leave comments and offer criticism!


The content of this article comes from the network collection of netizens. It is used as a learning reference. The copyright belongs to the original author.
