HTTP stands for Hypertext Transfer Protocol. It provides a set of rules and standards that govern how information is transmitted on the World Wide Web. Computers on the Web use HTTP to talk to each other. It is a stateless, application-level protocol for communication between distributed systems and network-based hypertext information systems. It is used to deliver virtually all files and data on the Web, such as HTML files and image files, and it is the foundation of data communication for the World Wide Web.
HTTP is called a stateless protocol because each request is executed independently, even when several requests come from the same client; the server does not remember previous requests. More generally, a stateless protocol is one that does not save session state between requests. Communication takes place over TCP/IP. The default port for HTTP is 80, but other ports can also be used.
Features of HTTP
Following are some of the features of HTTP:
- HTTP is an application layer protocol used for retrieving web pages, transferring files, and carrying other web application data such as web-based email.
- Consider the example http://www.google.com: the first part of the address (http) specifies the protocol used to retrieve the resource, which is typically a document written in Hypertext Markup Language (HTML).
- HTTP is a client-server protocol in which two machines communicate over a reliable transport service such as TCP.
- HTTP defines client authentication mechanisms; server authentication and data encryption are available when HTTP runs over SSL (HTTPS).
- A browser is an HTTP client: it takes part in a request and response mechanism in which the client sends a request to the server and the server generates a response.
- It supports resource identification: each HTTP request includes a URI (Uniform Resource Identifier).
- It is used to transmit resources, where a resource is some chunk of information that can be identified by a URL (it is the R in URL).
- Any type of data can be sent using HTTP, as long as the client and server know how to handle the content; the content type is specified using a MIME type.
- The standard default port for an HTTP server is 80, though servers can use any port.
- HTTP can be implemented on top of any other protocol on the internet or on other networks. HTTP only presumes a reliable transport service such as TCP.
Advantages of HTTP
- It is platform independent, which allows cross-platform porting.
- No runtime support is required to run it.
- It supports global applications.
- It is not connection oriented, so there is no network overhead to create and maintain session state and information.
Disadvantages of HTTP
- Anyone can see the content, so security problems may arise.
- Someone may alter the content, since no encryption methods are used.
- Authentication credentials are sent in the clear. Anyone who intercepts the request can determine the username and password.
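The last point can be illustrated with a minimal sketch. It assumes the Basic authentication scheme, which the text above does not name explicitly, and the class name and the credentials are hypothetical. Basic credentials are only Base64-encoded, not encrypted, so anyone who sees the header can reverse it:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class BasicAuthSketch {
    /** Builds the value of an Authorization header for the Basic scheme. */
    static String encode(String user, String password) {
        String pair = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    /** Recovers "user:password" from a Basic Authorization header value. */
    static String decode(String headerValue) {
        String token = headerValue.substring("Basic ".length());
        return new String(Base64.getDecoder().decode(token), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String header = encode("Aladdin", "open sesame"); // hypothetical credentials
        System.out.println("Authorization: " + header);
        System.out.println("Intercepted and decoded: " + decode(header));
    }
}
```

No key is needed for the decoding step, which is why such credentials are only safe on an encrypted connection.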
HTTP Conversation
The following diagram shows the HTTP conversation:
Figure 1: HTTP Conversation
HTTP is a request/response protocol based on a client/server architecture. The client opens a connection and sends a request to the server, identifying the resource by its URI. The server receives the resource location or web address sent by the client, processes the request, sends a response back to the client, and closes the connection.
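As a rough illustration of this exchange, the sketch below runs one request/response round trip entirely on the local machine. It assumes Java 11 or later (for java.net.http.HttpClient) and the JDK's built-in com.sun.net.httpserver.HttpServer; the class name, the /hello path, and the response body are hypothetical.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class HttpConversationSketch {
    /** Starts a throwaway local server, sends one GET, and returns "<status> <body>". */
    static String roundTrip() {
        HttpServer server;
        try {
            // Port 0 asks the operating system for any free port.
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Server side: process the request and generate a response.
        server.createContext("/hello", exchange -> {
            byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/plain");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        try {
            // Client side: open a connection and send the request for a URI.
            int port = server.getAddress().getPort();
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest
                    .newBuilder(URI.create("http://127.0.0.1:" + port + "/hello"))
                    .GET()
                    .build();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode() + " " + response.body();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip());
    }
}
```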
HTTPS stands for Hypertext Transfer Protocol Secure, that is, HTTP over Secure Sockets Layer (SSL). SSL acts as a sublayer under the HTTP application layer. HTTPS encrypts each message before it is sent and decrypts it upon arrival. HTTPS uses port 443 by default, as opposed to the standard HTTP port of 80. A URL beginning with https indicates that the connection between the browser and the server is encrypted using SSL. SSL is needed if you run an online store or accept online orders and credit cards, if users log in to your site, or if you need to comply with privacy and security requirements.
Setting up a server for secure HTTP connections consists of the following steps:
- Generating a key.
- Generating a certificate signing request.
- Obtaining a certificate signed by a Certificate Authority.
- Configuring the web server to use the certificate.
Below we will see the HTTP parameters.
HTTP uses a "major.minor" numbering scheme to indicate versions of the protocol. The minor number is incremented when changes made to the protocol add features that do not change the general message parsing algorithm, but that may add to the message semantics and imply additional capabilities of the sender. The major number is incremented when the format of a message within the protocol is changed.
The syntax of the HTTP-Version field can be written as follows:
HTTP-Version = "HTTP" "/" 1*DIGIT "." 1*DIGIT
For example: HTTP/1.0 or HTTP/1.1
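The grammar above can be checked with a short sketch. The class and method names are hypothetical; the point it tries to show is that the two numbers are separate integers, so versions should not be compared as decimal fractions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HttpVersionSketch {
    // HTTP-Version = "HTTP" "/" 1*DIGIT "." 1*DIGIT
    private static final Pattern VERSION = Pattern.compile("HTTP/([0-9]+)\\.([0-9]+)");

    /** Returns {major, minor}; throws IllegalArgumentException if the text does not match the grammar. */
    static int[] parse(String text) {
        Matcher m = VERSION.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not an HTTP-Version: " + text);
        }
        return new int[] { Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)) };
    }

    /** Compares the major numbers first, then the minor numbers, each as an integer. */
    static int compare(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        return x[0] != y[0] ? Integer.compare(x[0], y[0]) : Integer.compare(x[1], y[1]);
    }

    public static void main(String[] args) {
        int[] v = parse("HTTP/1.1");
        System.out.println("major=" + v[0] + " minor=" + v[1]);
        // A negative result means the first version is lower: 2.4 is lower than 2.13.
        System.out.println(compare("HTTP/2.4", "HTTP/2.13"));
    }
}
```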
Uniform Resource Identifier (URI)
URIs are known by many names, such as WWW addresses, universal document identifiers, and uniform resource locators. The http scheme is used to locate network resources via the HTTP protocol. It can be written as follows:
http_URL = "http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]
If the port is empty or not given, port 80 is assumed, and the request URI for the resource is abs_path. If abs_path is not present in the URL, it must be given as "/" when used as the request URI for the resource.
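These two defaulting rules can be sketched with java.net.URI. The class and method names are hypothetical, and the sketch assumes the input is an http URL, so it does not look at the scheme.

```java
import java.net.URI;

class HttpUrlSketch {
    /** Returns the explicit port, or 80 when the URL does not give one (http scheme assumed). */
    static int effectivePort(URI uri) {
        return uri.getPort() == -1 ? 80 : uri.getPort();
    }

    /** Returns abs_path plus the optional query, using "/" when abs_path is absent. */
    static String requestUri(URI uri) {
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        String query = uri.getRawQuery();
        return query == null ? path : path + "?" + query;
    }

    public static void main(String[] args) {
        URI bare = URI.create("http://www.google.com");
        System.out.println(effectivePort(bare) + " " + requestUri(bare));

        URI full = URI.create("http://www.google.com:8080/search?q=http");
        System.out.println(effectivePort(full) + " " + requestUri(full));
    }
}
```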
HTTP allows three different formats for the representation of date/time stamps:
- Thu, 10 Dec 1998 09:55:30 GMT ; RFC 822, updated by RFC 1123.
- Thursday, 10-Dec-98 09:55:30 GMT ; RFC 850, obsoleted by RFC 1036.
- Thu Dec 10 09:55:30 1998 ; ANSI C’s asctime() format.
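For the first (RFC 1123) format, a minimal sketch using the JDK's built-in DateTimeFormatter.RFC_1123_DATE_TIME is shown here. The class name is hypothetical. The sample value uses Thursday, the weekday on which 10 December 1998 actually fell. One caveat: this formatter does not zero-pad single-digit days, so a custom pattern may be needed for strict HTTP output.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

class HttpDateSketch {
    /** Parses a date in the RFC 1123 format, e.g. "Thu, 10 Dec 1998 09:55:30 GMT". */
    static ZonedDateTime parse(String text) {
        return ZonedDateTime.parse(text, DateTimeFormatter.RFC_1123_DATE_TIME);
    }

    /** Formats a date-time in the RFC 1123 format, always expressed in GMT. */
    static String format(ZonedDateTime time) {
        return DateTimeFormatter.RFC_1123_DATE_TIME.format(time.withZoneSameInstant(ZoneOffset.UTC));
    }

    public static void main(String[] args) {
        ZonedDateTime parsed = parse("Thu, 10 Dec 1998 09:55:30 GMT");
        System.out.println(parsed.getDayOfWeek());
        System.out.println(format(parsed));
    }
}
```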
An HTTP message consists of a header and an optional body. The message header of an HTTP request consists of a request line and header fields. The message header of a response consists of a status line and header fields.
HTTP request message
This message is sent from the client to the server. Its request line includes the method to apply to the resource, the identifier of the resource, and the version of the protocol. For example, with the Apache HttpCore library:
HttpRequest request = new BasicHttpRequest("GET", "/", HttpVersion.HTTP_1_1);
HTTP response message
This message is sent by the server back to the client after it has interpreted the request message. Its status line includes the protocol version followed by the HTTP status code and a textual reason phrase. For example, with the Apache HttpCore library:
HttpResponse response = new BasicHttpResponse(HttpVersion.HTTP_1_1, HttpStatus.SC_OK, "OK");
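To show what these objects correspond to on the wire, here is a small sketch that uses only plain strings rather than any HTTP library. The class and method names are hypothetical, and the Host header is included on the assumption that the request is HTTP/1.1.

```java
class HttpMessageSketch {
    /** Builds a minimal request: request line, one header field, and the blank line that ends the header. */
    static String buildRequest(String method, String requestUri, String host) {
        return method + " " + requestUri + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "\r\n";
    }

    /** Splits a status line into {version, status code, reason phrase}. */
    static String[] parseStatusLine(String line) {
        String[] parts = line.split(" ", 3); // the reason phrase may itself contain spaces
        if (parts.length < 2) {
            throw new IllegalArgumentException("Not a status line: " + line);
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.print(buildRequest("GET", "/", "www.google.com"));
        String[] status = parseStatusLine("HTTP/1.1 404 Not Found");
        System.out.println(status[0] + " | " + status[1] + " | " + status[2]);
    }
}
```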
There are some general headers that are shared by both request and response messages:
- Cache-Control: Specifies information about caching.
- Connection: Shows whether the connection should be closed or not.
- Date: Shows the date and time at which the message was generated.
- MIME-Version: Shows the MIME version used, such as 1.0.
- Upgrade: Specifies the preferred communication protocol.
The request and response messages also include entity headers, as follows:
- Allow: It lists the valid methods that can be used with a URL.
- Content-Encoding: It specifies the encoding scheme.
- Content-Length: It shows the length of the document.
- Content-Language: It specifies the language.
- Content-Location: It specifies the location of the created or moved document.
- Content-Range: It specifies the range of the document.
- Content-Type: It specifies the media type.
- Expires: It gives the date and time after which the content is considered stale, i.e. the expiry date and time.
- Last-Modified: It gives the date and time of the last change.
The request headers are as follows:
- Accept: It shows the media formats the client can accept.
- Accept-Charset: It shows the character sets that the client can handle.
- Accept-Encoding: It shows the encoding schemes the client can handle.
- Accept-Language: It shows the languages that the client can accept.
- Authorization: It carries the credentials with which the client authenticates itself.
- From: It shows the email address of the user.
- Host: It shows the host and port number of the server.
- If-Match: It sends the document only if it matches the given tag.
- If-Modified-Since: It sends the document only if it has changed since the specified date.
- If-Unmodified-Since: It sends the document only if it has not changed since the specified date.
- If-None-Match: It sends the document only if it doesn’t match the given tag.
- If-Range: It sends only the portion of the document that is missing, provided the document has not changed.
- Referer: It specifies the URL of the document that linked to the requested resource.
- User-Agent: It identifies the client program.
The response headers are as follows:
- Accept-Ranges: It shows whether the server accepts range requests from the client.
- Age: It shows the age of the document.
- Location: It specifies the location of the document, for example after a redirect or when a resource has been created.
- Proxy-Authenticate: It indicates the authentication scheme that should be used for connecting to a proxy.
- Retry-After: It specifies the date after which the server is expected to be available.
- Server: It shows the server name and version number.
- WWW-Authenticate: It indicates the authentication scheme that should be used to access the requested entity.
Following are some examples of the fields above:
- Host: www.google.com
- Date: Thu, 15 May 2008 10:30:45 GMT
- Server: Apache
- Last-Modified: Sat, 10 May 2008
- Content-Length: 30
- Content-Type: text/plain
- Expires: Tue, 01 Jul 2008 15:00:00 GMT
- Retry-After: Sat, 31 May 2008 20:00:00 GMT
- Referer: http://www.w3c.org/http/http_messages.htm
- Content-Encoding: gzip
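Header fields like these are simple "name: value" lines, which can be read into a map. The sketch below is one possible approach, with hypothetical class and method names. It splits each line at the first colon only, because values such as dates and URLs contain colons themselves, and it treats field names as case-insensitive.

```java
import java.util.Map;
import java.util.TreeMap;

class HeaderFieldsSketch {
    /** Parses "Name: value" lines into a map whose keys ignore case. */
    static Map<String, String> parse(String headerBlock) {
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (String line : headerBlock.split("\r?\n")) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // skip blank or malformed lines
            }
            String name = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            headers.put(name, value);
        }
        return headers;
    }

    public static void main(String[] args) {
        Map<String, String> headers = parse(
                "Host: www.google.com\r\n"
                + "Content-Length: 30\r\n"
                + "Content-Type: text/plain\r\n"
                + "Referer: http://www.w3c.org/http/http_messages.htm\r\n");
        System.out.println(headers.get("content-type"));
        System.out.println(headers.get("Referer"));
    }
}
```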
HTTP Request Methods
- GET: It is used to retrieve information from the specified resource.
- POST: It is used to submit data to the server.
- HEAD: It is the same as GET, but returns only the HTTP headers and no document body.
- PUT: It uploads a representation of the resource at the specified URI.
- DELETE: It deletes the target resource identified by the URI.
- CONNECT: It establishes a TCP/IP tunnel to the server identified by the given URI.
- OPTIONS: It describes the HTTP methods that the server supports.
- TRACE: It performs a remote, application-layer loop-back of the request message.
HTTP Status Codes
HTTP status codes are response codes given by a server on the internet. The term is also commonly used for the HTTP status line, which includes both the HTTP status code and the HTTP reason phrase.
Following is the list of codes:
1xx: Informational
- 100 Continue: The server has received the request headers, and the request has not yet been rejected by the server.
- 101 Switching Protocols: The client has asked the server to switch protocols, and the server is doing so.
- 102 Processing: The server is processing the request.
2xx: Success
- 200 OK: The request has succeeded.
- 201 Created: The request has been fulfilled and a new resource has been created.
- 202 Accepted: The request has been accepted, but the processing has not been completed.
- 203 Non-Authoritative Information: The request has been processed, but the returned information may be from another source.
- 204 No Content: The request has been processed, but no content is returned.
- 205 Reset Content: The request has been processed, but no content is returned, and the user agent should reset the document view.
- 206 Partial Content: The server is returning only part of the resource because of a range header in the request.
3xx: Redirection
- 300 Multiple Choices: It provides a list of options for the resource, from which the client can select one and go to that location.
- 301 Moved Permanently: The requested resource has been moved permanently to a new URI.
- 302 Found: The requested resource has been temporarily moved to a new URI.
- 303 See Other: The response to the request can be found via an alternative URI.
- 304 Not Modified: The resource has not been modified since it was last requested.
- 305 Use Proxy: The requested resource must be accessed through the proxy given by the Location field.
- 306 Unused: This code was used in a previous version of the specification; it is no longer used, and the code is reserved.
- 307 Temporary Redirect: The requested resource has been moved temporarily to a new URI.
4xx: Client Error
- 400 Bad Request: The request cannot be fulfilled because it is malformed.
- 401 Unauthorized: It is used when authentication is required and has failed or has not yet been provided.
- 402 Payment Required: Reserved for future use.
- 403 Forbidden: The request was valid, but the server is refusing to respond to it.
- 404 Not Found: The requested resource could not be found, but may be available in the future.
- 405 Method Not Allowed: The method specified in the request is not allowed for the resource.
- 406 Not Acceptable: The content is not acceptable according to the Accept headers of the request.
- 407 Proxy Authentication Required: The client must authenticate itself with the proxy before the request can be served.
- 408 Request Timeout: The server timed out waiting for the request.
- 409 Conflict: The request could not be completed due to a conflict in the resource.
- 410 Gone: The requested resource is no longer available at the server.
- 411 Length Required: The request did not specify the length of its content.
- 412 Precondition Failed: The precondition given in one or more request header fields evaluated to false when it was tested on the server.
- 413 Request Entity Too Large: The request is larger than the server is willing or able to process, so the server will not accept it.
- 414 Request-URI Too Long: The URI is too long, so the server will not accept the request.
- 415 Unsupported Media Type: The server is refusing the request because the request format is not supported by the server.
- 416 Requested Range Not Satisfiable: The client has asked for a portion of the file, but the requested byte range is not available.
- 417 Expectation Failed: The server cannot meet the requirements of the Expect request header field.
5xx: Server Error
- 500 Internal Server Error: A generic error message given when an unexpected condition occurs.
- 501 Not Implemented: The server does not support the functionality required to fulfill the request.
- 502 Bad Gateway: The server, while acting as a gateway or proxy, received an invalid response from the upstream server.
- 503 Service Unavailable: The service is temporarily unavailable, for example because of high load, but may be available in the future.
- 504 Gateway Timeout: The gateway didn’t receive a timely response from the upstream server.
- 505 HTTP Version Not Supported: The server doesn’t support the HTTP protocol version used in the request.
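Because the first digit of a status code determines its class, a client can react to a code it does not recognize by looking at that digit alone. A minimal sketch, with a hypothetical class name and with labels chosen to match the group headings used in this list:

```java
class StatusClassSketch {
    /** Maps a status code to the name of its class, based on the first digit. */
    static String statusClass(int code) {
        switch (code / 100) {
            case 1: return "Informational";
            case 2: return "Success";
            case 3: return "Redirection";
            case 4: return "Client Error";
            case 5: return "Server Error";
            default: return "Unknown";
        }
    }

    public static void main(String[] args) {
        int[] samples = { 100, 200, 301, 404, 503 };
        for (int code : samples) {
            System.out.println(code + " -> " + statusClass(code));
        }
    }
}
```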
Application developers, information providers, and users should be aware of the following security limitations in HTTP/1.1:
- Personal Information: HTTP clients often have access to large amounts of personal information, such as the user's name, location, mail address, and passwords, and should be careful to prevent unintentional leakage of this information via the HTTP protocol to other sources.
- Abuse of Server Log Information: A server is in a position to save personal data about users' requests, which might identify their reading patterns or subjects of interest. This information is clearly confidential in nature.
- Transfer of Sensitive Information: HTTP cannot regulate the content of the data that is transferred, nor is there any prior method of determining the sensitivity of any particular piece of information within the context of any given request. Therefore, applications should give the provider of the information as much control over it as possible.
- Attacks Based on File and Path Names: Implementations of HTTP servers should be careful to restrict the documents returned by HTTP requests to only those that were intended by the server administrators.
- DNS Spoofing: Clients using HTTP rely heavily on the Domain Name Service, and are thus generally prone to security attacks based on the deliberate mis-association of IP addresses and DNS names. HTTP clients should rely on their name resolver for confirmation of an IP number/DNS name association rather than caching the result of previous host name lookups.
- Location Headers and Spoofing: If a single server supports multiple organizations that do not trust one another, it must check the values of the Location headers in responses generated under the control of those organizations, to make sure they do not attempt to invalidate resources over which they have no authority.
- Authentication Credentials: HTTP doesn’t provide a method for a server to direct clients to discard cached credentials.
- Proxies and Caching: Proxies have access to security-related information and to personal information about users and organizations. Log information gathered at proxies often contains sensitive information about an organization and should be carefully guarded. Caching introduces an additional vulnerability, because cache contents persist after an HTTP request is complete, while the user may believe that the information has been removed from the network. Cache contents should therefore be protected as sensitive information.