What is the HTTP protocol?

This article is part of the back-end mini-class series. Each article in the series examines one back-end topic or skill from eight angles:

[background introduction] [knowledge analysis] [common problems] [solutions] [coding practice] [extended thinking] [more discussion] [References]

This article covers:

[what is HTTP protocol?]

1. Background introduction

What is the HTTP protocol?

In a web application, when the server delivers a web page to the browser, it is actually sending the page's HTML code to the browser for display. HTTP is the protocol that carries this HTML over the network between browser and server.

HTTP stands for Hypertext Transfer Protocol. It is used to transfer hypertext from a World Wide Web server to the local browser.

HTTP runs on top of the TCP/IP protocol stack and transfers data such as HTML files, image files and query results.

HTTP is an application-layer protocol. Because it is simple and fast, it is well suited to distributed hypermedia information systems.

Introduction to the WWW

Wiki: the World Wide Web is a system of interlinked hypertext documents accessed through the Internet.

What we usually call "the web" is the World Wide Web: broadly, a technology for accessing resources through a browser. When people talk about surfing the Internet they usually mean browsing the World Wide Web, which is why the two terms are often confused.

The Internet is a network interconnection technology and refers more to interconnection at the physical level. Specifically, it comprises hardware devices and the technologies that connect them, such as routers, optical fiber, wireless base stations, mobile phones and Ethernet.

The World Wide Web is a service running on the Internet (common Internet services include e-mail, the WWW, FTP, remote login and e-commerce). In other words, the web is an information-sharing service built on the Internet. Its most visible form is browsing web resources (hypertext, images, audio): clicking a link in a web page takes you to another page or website.

The web depends first on the underlying network, because it is built on the Internet. The web's basic protocol is HTTP, which runs on top of TCP. TCP in turn relies on IP, and IP is supported by the underlying link layer. From top to bottom, the protocol stack is: HTTP -> TCP -> IP -> link layer.

2. Knowledge analysis

Main features of HTTP

① Simple and fast: when a client requests a service from the server, it only needs to send the request method and path. The common request methods are GET, HEAD and POST; each specifies a different kind of interaction between client and server. Because the protocol is simple, an HTTP server program can be small, and communication is fast.

② Flexible: HTTP allows any type of data object to be transmitted. The type being transferred is indicated by the Content-Type header.

③ Connectionless: each connection handles only one request. After the server has processed the client's request and the client has received the response, the connection is closed, which saves transmission time. (This is the HTTP/1.0 behavior; HTTP/1.1 keeps connections alive by default.)

④ Stateless: HTTP is a stateless protocol, meaning it keeps no memory of previous transactions. If later processing needs earlier information, that information must be retransmitted, which may increase the amount of data sent per connection. On the other hand, when the server does not need earlier information, it can respond faster.

⑤ Supports both B/S and C/S modes: B/S is browser/server and C/S is client/server.

HTTP protocol: URL

The resources on the network mentioned above are identified by a URI, although the form we see most often is the URL. A URI (Uniform Resource Identifier) is like an ID card number: it uniquely identifies a person. A URL (Uniform Resource Locator) identifies something by where it is. An analogy:

address-protocol://Earth/China/Zhejiang Province/Hangzhou City/Xihu District/a university/dormitory building 14/dormitory 525/Zhang San

This string also identifies exactly one person, so it serves as a URI. URLs are therefore a subset of URIs: whether you identify something by an ID number or by its location, either is a URI, and a URL is a URI that identifies by location.

The components of a typical URL, using the following example:

http://www.aspxfans.com:8080/news/index.asp?boardID=5&ID=24618&page=1#name

(1) Protocol part: the protocol part of this URL is "http:", which means the page is accessed over HTTP. The "//" after "http:" is a separator.

(2) Domain name part: the domain name part of this URL is "www.aspxfans.com". An IP address can also be used in place of a domain name.

(3) Port part: the port follows the domain name, separated from it by ":". The port is optional; if it is omitted, the default port is used (80 for HTTP). In this example the port is 8080.

(4) Virtual directory: the part from the first "/" after the domain name to the last "/". The virtual directory is optional. In this example it is "/news/".

(5) File name: the part from the last "/" after the domain name up to "?". If there is no "?", it runs up to "#"; if there is neither "?" nor "#", it runs to the end of the URL. In this example the file name is "index.asp". The file name is optional; if it is omitted, the server's default file name is used.

(6) Anchor part: the part from "#" to the end, which identifies a location within the web page. In this example the anchor is "name". The anchor is optional.

(7) Parameter part: the part from "?" up to "#", also called the search part or query part. In this example it is "boardID=5&ID=24618&page=1". There can be multiple parameters, separated by "&".
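The breakdown above can be checked with Python's standard urllib.parse module. This is only a sketch: splitting the path into a "virtual directory" and a "file name" follows this article's terminology and is done by hand here, since the library itself only exposes a single path component.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://www.aspxfans.com:8080/news/index.asp?boardID=5&ID=24618&page=1#name"
parts = urlsplit(url)

# Components as the library names them
print(parts.scheme)    # protocol part
print(parts.hostname)  # domain name part
print(parts.port)      # port part (None when the URL omits it)
print(parts.path)      # virtual directory + file name
print(parts.query)     # parameter part
print(parts.fragment)  # anchor part

# Split the path at its last "/" to mirror parts (4) and (5) above
directory, _, filename = parts.path.rpartition("/")
directory += "/"

# Parameters are separated by "&"; each value comes back as a list
params = parse_qs(parts.query)
print(directory, filename, params)
```

Note that parse_qs returns a list for every key because the same parameter name may appear more than once in a query string.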

HTTP protocol: the request message

An HTTP request message sent from the client to the server has the following format:

It consists of four parts: request line, request headers, blank line and request data.

Part I: the request line, which states the request method, the resource to be accessed and the HTTP version used.

Part II: the request headers, which follow the request line (the first line) and carry additional information for the server, such as Host (host name), User-Agent (basic information about the browser), Accept (response types the browser accepts), Accept-Language (the browser's preferred language), Accept-Encoding (compression methods the browser accepts), Referer (the originating page) and Connection (whether to keep the connection open).

Part III: the blank line. The blank line after the request headers is required, even if the request data in Part IV is empty.

Part IV: the request data, also called the body. It can carry arbitrary data and may be empty.
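To make the four parts concrete, here is a minimal sketch that assembles a raw request by hand. The helper name build_request and the User-Agent value are made up for illustration; a real client would normally use a library such as http.client instead of building the bytes itself.

```python
def build_request(method, path, headers, body=b""):
    """Assemble a raw HTTP/1.1 request (hypothetical helper, for illustration).

    Layout: request line, header lines, one blank line, then the body.
    """
    lines = [f"{method} {path} HTTP/1.1"]                              # Part I
    lines += [f"{name}: {value}" for name, value in headers.items()]  # Part II
    head = "\r\n".join(lines) + "\r\n\r\n"                             # Part III: blank line
    return head.encode("ascii") + body                                 # Part IV (may be empty)


# A GET request: the body is empty, but the blank line is still present.
raw = build_request(
    "GET",
    "/news/index.asp?boardID=5",
    {
        "Host": "www.aspxfans.com:8080",
        "User-Agent": "example-client/0.1",  # assumed value
        "Accept": "text/html",
        "Connection": "close",
    },
)
print(raw.decode("ascii"))

# A POST request: the form data travels in the body.
form = b"boardID=5&page=1"
post = build_request(
    "POST",
    "/news/index.asp",
    {
        "Host": "www.aspxfans.com:8080",
        "Content-Type": "application/x-www-form-urlencoded",
        "Content-Length": str(len(form)),
    },
    form,
)
print(post.decode("ascii"))
```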

HTTP protocol: the response message

After receiving and processing a client's request, the server normally returns an HTTP response message.

An HTTP response also consists of four parts: status line, message headers, blank line and response body.

Part I: the status line, made up of the HTTP version, the status code and the status message. For example, in "HTTP/1.1 200 OK" the HTTP version is 1.1, the status code is 200 and the status message is OK.

Part II: the message headers, which carry additional information for the client. For example, Date gives the date and time the response was generated, and "Content-Type: text/html; charset=UTF-8" states that the body is of MIME type text/html, encoded as UTF-8.

Part III: the blank line. The blank line after the message headers is required.

Part IV: the response body, the content returned by the server to the client. For an HTML page, the HTML after the blank line is the response body.
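The same structure can be seen by taking a response apart. The sketch below parses a hand-written sample response; the sample bytes and the parse_response helper are assumptions for illustration, and the parser ignores details a real client must handle (chunked bodies, repeated headers, and so on).

```python
RAW_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Date: Mon, 01 Jan 2024 00:00:00 GMT\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    b"\r\n"
    b"<html><body>Hello</body></html>"
)


def parse_response(raw):
    """Split a raw response into its four parts (hypothetical helper)."""
    # The first blank line separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    # Status line: version, status code, status message
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


version, code, reason, headers, body = parse_response(RAW_RESPONSE)
print(version, code, reason)
print(headers["content-type"])
print(body)
```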

HTTP protocol: request methods

The HTTP standard defines several request methods.

HTTP/1.0 defines three request methods: GET, POST and HEAD.

HTTP/1.1 adds five more: OPTIONS, PUT, DELETE, TRACE and CONNECT.

GET requests the specified resource and returns it in the response body.

HEAD is like GET, except that the response contains no body; it is used to retrieve only the headers.

POST submits data to the specified resource for processing (for example, submitting a form or uploading a file). The data is carried in the request body. A POST request may create new resources and/or modify existing ones.

PUT replaces the content of the specified resource with the data sent by the client.

DELETE asks the server to delete the specified resource.
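The difference between GET and HEAD can be observed with a throwaway local server. This is a sketch under assumptions: DemoHandler, PAGE and fetch are illustrative names, and the server is Python's built-in http.server bound to a free port on 127.0.0.1, not a production setup.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body>Hello</body></html>"


class DemoHandler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=UTF-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_GET(self):
        # GET: headers followed by the body
        self._send_headers()
        self.wfile.write(PAGE)

    def do_HEAD(self):
        # HEAD: the same headers, but no body
        self._send_headers()

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch(method):
    """Send one request to a temporary local server; return (status, length header, body)."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: let the OS pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        conn.request(method, "/")
        resp = conn.getresponse()
        result = (resp.status, resp.getheader("Content-Length"), resp.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


print(fetch("GET"))   # status, Content-Length, and the page bytes
print(fetch("HEAD"))  # same status and headers, empty body
```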

3. Frequently asked questions

(1) What is the difference between GET and POST?

(2) What are the HTTP status codes?

(3) What content types can an HTTP request use, and how do they differ?

4. Solutions

(1) Differences between GET and POST

1) How the data is sent: with GET, the submitted data is appended to the URL, so it travels in the request line rather than in the body. Form data is URL-encoded: English letters and digits are sent as they are, spaces are converted to "+", and Chinese or other non-ASCII characters are percent-encoded, for example %E4%BD%A0%E5%A5%BD, where the XX in each %XX is the hexadecimal value of one byte of the character's encoding (UTF-8 in this example).

With POST, the submitted data is placed in the body of the HTTP message.

As a result, data submitted with GET is visible in the address bar, while the address bar does not change when data is submitted with POST.
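How GET parameters are encoded can be verified with urllib.parse from the Python standard library. The %XX form is percent-encoding of the character's UTF-8 bytes; the parameter names q and page below are made up for the example.

```python
from urllib.parse import quote, unquote, urlencode

# Non-ASCII characters: each UTF-8 byte becomes %XX (hexadecimal)
encoded = quote("你好")
print(encoded)

# Form-style encoding of name/value pairs: spaces become "+", pairs are joined by "&"
query = urlencode({"q": "你好 world", "page": 1})
print(query)

# Decoding restores the original text
decoded = unquote(encoded)
print(decoded)

# The resulting GET URL carries the data after "?"
print("http://www.aspxfans.com:8080/news/index.asp?" + query)
```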

2) Size of transmitted data: the HTTP specification itself limits neither the amount of data transferred nor the length of a URL. The limits met in practice are:

GET: individual browsers and servers limit URL length.

POST: because the data is not sent in the URL, its size is unlimited in theory. In practice, each web server limits the size of POST bodies; Apache and IIS 6, for example, each have their own configuration settings.

3) Security: POST is somewhat safer than GET. For example, when a login form is submitted with GET, the user name and password appear in plaintext in the URL, so (1) the URL may be cached by the browser, and (2) anyone who looks at the browser's history can see your account and password. Submitting data with GET may also make cross-site request forgery (CSRF) attacks easier.

4) Reading the data on the server: in ASP, for example, GET parameters are read through Request.QueryString and POST parameters through Request.Form.

That said, most of the points above are simply what a web search turns up; a large number of articles describe GET and POST this way. There is a different view: apart from semantics, GET and POST do not differ; they are just names. If the server supported it, you could rename GET to GET2. The one fundamental difference is that one is for retrieving data and the other is for modifying data.

So where do the widely repeated claims come from? The HTML standard contains similar descriptions, but those are only the HTML standard's conventions for how HTTP is used; they should not be taken as differences between GET and POST in the HTTP protocol itself. For details, consult the HTML standard and the relevant RFCs (Request for Comments documents, sometimes called the bible of networking knowledge).

(2) HTTP status codes

The status code consists of three digits. The first digit defines the class of the response; there are five classes:

1xx: informational -- the request has been received and processing continues

2xx: success -- the request was successfully received, understood and accepted

3xx: redirection -- further action is required to complete the request

4xx: client error -- the request contains a syntax error or cannot be fulfilled

5xx: server error -- the server failed to fulfil a valid request

Common status codes:

200 OK // the client request succeeded

400 Bad Request // the client request has a syntax error and cannot be understood by the server

403 Forbidden // the server received the request but refuses to serve it

404 Not Found // the requested resource does not exist, e.g. an incorrect URL was entered

500 Internal Server Error // an unexpected error occurred on the server

503 Service Unavailable // the server cannot handle the request at the moment; it may recover after a while
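Since the class is determined by the first digit, it can be computed with integer division. The sketch below pairs that with the reason phrases from Python's http.HTTPStatus enum; the CLASSES table and the describe helper are illustrative names, not part of any library.

```python
from http import HTTPStatus

# First digit of the status code -> class of response
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code):
    """Return 'code phrase (class)' for a standard status code."""
    category = CLASSES[code // 100]   # the first digit selects the class
    phrase = HTTPStatus(code).phrase  # standard reason phrase
    return f"{code} {phrase} ({category})"


for code in (200, 400, 403, 404, 500, 503):
    print(describe(code))
```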

(3) Content types

A search suggests that nearly 200 content types exist.

The three main ones for submitting data are:

application/x-www-form-urlencoded: the data is encoded as name/value pairs. This is the standard (default) form encoding and the most commonly used.

multipart/form-data: the data is encoded as a multi-part message, with each control on the page corresponding to one part. This is generally used for uploading files.

text/plain: the data is sent as plain text without any control or formatting characters. (JSON sent to a server is normally labeled application/json rather than text/plain.)
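To compare how the same form data looks under different content types, here is a small sketch. The field names, the boundary string and the multipart_body helper are assumptions for illustration; real clients generate a random boundary and also handle file parts.

```python
import json
from urllib.parse import urlencode

form = {"user": "zhangsan", "note": "hello world"}

# application/x-www-form-urlencoded: name=value pairs joined by "&"
urlencoded_body = urlencode(form).encode("ascii")

# application/json: the usual label when JSON is sent to a server
json_body = json.dumps(form).encode("utf-8")


def multipart_body(fields, boundary):
    """Build a multipart/form-data body with one part per field (text fields only)."""
    chunks = []
    for name, value in fields.items():
        chunks.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n'
            "\r\n"
            f"{value}\r\n"
        )
    chunks.append(f"--{boundary}--\r\n")  # closing boundary
    return "".join(chunks).encode("utf-8")


BOUNDARY = "example-boundary"  # hypothetical boundary string
multipart = multipart_body(form, BOUNDARY)

print(urlencoded_body)
print(json_body)
print(multipart.decode("utf-8"))
```

The matching request header for the last body would be "Content-Type: multipart/form-data; boundary=example-boundary", so that the server knows where each part begins and ends.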

5. Coding practice

6. Extended thinking

7. References

8. More discussion

(1) What is the difference between HTTP and HTTPS?

Data transmitted over HTTP is unencrypted, that is, sent in plaintext, so using HTTP to transmit private information is very unsafe. To allow such data to be encrypted, Netscape designed the SSL (Secure Sockets Layer) protocol to encrypt data carried over HTTP, which gave rise to HTTPS.

(2) What is Connection: keep-alive mode?

In HTTP/1.0 the default is Connection: close; in HTTP/1.1 keep-alive is enabled by default.

HTTP uses a request/response model. Without keep-alive, the client and server must create a new connection for every request/response and close it immediately afterwards (in this mode HTTP is connectionless). With keep-alive (also known as persistent connections or connection reuse), the connection from client to server stays open, so subsequent requests to the same server avoid establishing a new connection. Keep-alive is therefore more efficient and performs better.
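Connection reuse can be observed from the server side: requests that arrive over one persistent connection share the same client port, while separate connections normally come from different ports. The sketch below uses Python's built-in http.server on 127.0.0.1; PortEchoHandler and observed_ports are illustrative names, and the handler simply echoes the client's port number as the response body.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class PortEchoHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep connections open

    def do_GET(self):
        # Reply with the client's source port so the caller can compare connections
        body = str(self.client_address[1]).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def observed_ports(reuse_connection):
    """Send two GET requests and return the client ports the server saw."""
    server = HTTPServer(("127.0.0.1", 0), PortEchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    ports = []
    try:
        if reuse_connection:
            # Keep-alive: both requests travel over one TCP connection
            conn = http.client.HTTPConnection(host, port, timeout=5)
            for _ in range(2):
                conn.request("GET", "/")
                ports.append(int(conn.getresponse().read()))
            conn.close()
        else:
            # No keep-alive: a new TCP connection for every request
            for _ in range(2):
                conn = http.client.HTTPConnection(host, port, timeout=5)
                conn.request("GET", "/", headers={"Connection": "close"})
                ports.append(int(conn.getresponse().read()))
                conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return ports


print(observed_ports(reuse_connection=True))   # the same port twice
print(observed_ports(reuse_connection=False))  # typically two different ports
```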

(3) What are the disadvantages of HTTP, and how does HTTPS address them?

a. Communication is in plaintext and unencrypted, so the content may be eavesdropped on.

b. The identity of the other party is not verified, so it may be impersonated.

c. The integrity of the message cannot be verified, so it may have been tampered with.

HTTPS is HTTP plus encryption (generally over an SSL/TLS secure channel), authentication and integrity protection.

That's all for today's sharing. Likes, shares, comments and criticism are all welcome~

The content of this article was collected from the web by users and is provided as a learning reference. Copyright belongs to the original author.