CIS 209 Web Page Scripting Languages

Chapter 1: The Internet and the World Wide Web

This chapter will describe how domain names, IP addresses, domain name servers, Web servers, and various Internet protocols are joined together to form the World Wide Web. We'll also learn how to locate and select a Web hosting service provider, and how to create basic Web pages using HTML.

The History of the Internet

A web browser is a program that is used to surf the Internet. The most common web browsers are Microsoft's Internet Explorer and Firefox. The former market leader was Netscape Navigator. Opera is another popular browser. Since the introduction of Windows 95, all computers that run the Windows operating system have the Internet Explorer browser application built in. Professional Web developers often develop in Firefox, then test in Internet Explorer.

The browser is an example of a client application, which runs on your local computer and allows your computer to communicate with other computers on the Internet. Servers provide services and resources to clients. Email clients such as Outlook allow you to access your email on the email server. Browsers allow you to access web pages stored on a web server.

The original Internet, once called ARPANET, consisted of a group of computers that connected government agencies and a few participating universities, and was primarily used for science, research and education.

Hypertext documents on the Internet are called web pages. The location address of these documents is known as a Uniform Resource Locator (URL). Web pages can contain text, graphics, multimedia and links to other web pages. The WWW Consortium defines and maintains HTML and HTTP standards.

How the Internet Works

Computers connected to the Internet use a suite of protocols known as TCP/IP to communicate. The TCP/IP suite includes a variety of protocols, including TCP, IP, telnet, FTP, HTTP, Simple Mail Transfer Protocol (SMTP), and Network News Transfer Protocol (NNTP). Other network protocols, such as NetBEUI and IPX/SPX, establish rules for communication within private networks, called intranets. These network protocols do not allow computers to communicate over the Internet - ALL computers that connect to the Internet MUST use the TCP/IP protocol.

Transmission Control Protocol (TCP)

TCP controls how Internet data is transmitted. It splits the transmission into small packets, which are sent over the Internet to a destination computer where the packets are reassembled into the original transmission. On the way to the destination computer, each data packet makes several stops, called hops, at one or more computers along the way. The computer at each stop is called a gateway computer, because it plots the next leg of the journey and sends the packet on its way. This process is repeated until the packet reaches its destination.
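
To make the split-and-reassemble idea concrete, here is a small Python sketch. It is a simplified model, not real TCP: the function names are hypothetical, the "packets" are just numbered slices of a string, and real TCP also numbers bytes, acknowledges delivery, and retransmits lost segments. Shuffling the list stands in for packets arriving out of order.

```python
from __future__ import annotations

import random


def split_into_packets(message: str, size: int) -> list[tuple[int, str]]:
    """Split a message into (sequence_number, chunk) pairs of at most `size` characters."""
    return [(seq, message[start:start + size])
            for seq, start in enumerate(range(0, len(message), size))]


def reassemble(packets: list[tuple[int, str]]) -> str:
    """Rebuild the original message by putting packets back in sequence-number order."""
    return "".join(chunk for _, chunk in sorted(packets))


message = "Hypertext documents on the Internet are called web pages."
packets = split_into_packets(message, 8)
random.shuffle(packets)                # packets may arrive in any order
print(reassemble(packets) == message)  # True
```

The sequence number is what makes reassembly possible: without it, the receiver could not tell which chunk comes first.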

Internet Protocol (IP)

IP uses an addressing scheme to manage the addressing of each packet. Gateway computers use the IP address to determine where the packet should go next. Each packet may take a different route through the Internet, but they all end up at the same place. All Internet traffic travels via this IP routing model.
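
A gateway's decision can be sketched as a table lookup. The Python example below assumes a made-up routing table: the networks and next-hop addresses are illustrative, and the next hops use address ranges reserved for documentation. When several entries match, the most specific one (the one with the longest network prefix) wins. That longest-prefix rule is how real routers choose among overlapping routes, although they fill their tables through routing protocols rather than by hand.

```python
import ipaddress

# Hypothetical routing table for one gateway: destination network -> next hop.
# The 198.51.100.x and 203.0.113.x addresses are reserved for documentation.
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "203.0.113.1",         # default route
    ipaddress.ip_network("66.182.0.0/16"): "198.51.100.7",
    ipaddress.ip_network("66.182.132.0/24"): "198.51.100.9",
}


def next_hop(destination: str) -> str:
    """Return the next hop of the most specific route (longest prefix) matching destination."""
    address = ipaddress.ip_address(destination)
    matches = [network for network in ROUTES if address in network]
    best = max(matches, key=lambda network: network.prefixlen)
    return ROUTES[best]


print(next_hop("66.182.132.124"))  # 198.51.100.9
print(next_hop("8.8.8.8"))         # 203.0.113.1 (only the default route matches)
```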

IP Address

Every computer connected to the Internet has an IP address associated with it. This IP address is a unique 32-bit number, written in dotted decimal notation as four numbers (octets) separated by periods. Each number is between 0 and 255. For example, the IP address for my computer is 66.182.132.124. (See Understanding IP Addressing.)
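
Because an IP address is really one 32-bit number, the dotted decimal form is just that number split into four 8-bit pieces. This Python sketch (the function names are illustrative) converts in both directions. Python's standard ipaddress module offers the same conversion, for example int(ipaddress.IPv4Address("66.182.132.124")).

```python
def dotted_to_int(address: str) -> int:
    """Convert a dotted decimal IPv4 address to its 32-bit integer value."""
    octets = address.split(".")
    if len(octets) != 4:
        raise ValueError(f"expected four octets: {address!r}")
    value = 0
    for octet in octets:
        n = int(octet)
        if not 0 <= n <= 255:
            raise ValueError(f"octet out of range: {octet}")
        value = (value << 8) | n  # shift earlier octets left 8 bits, then append this one
    return value


def int_to_dotted(value: int) -> str:
    """Convert a 32-bit integer back to dotted decimal notation."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value does not fit in 32 bits")
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


print(dotted_to_int("66.182.132.124"))  # 1119257724
print(int_to_dotted(1119257724))        # 66.182.132.124
```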

You can type the following URL into the address bar of a browser to view my web site: http://66.182.132.124

An Internet Service Provider (ISP) assigns either a static or a dynamic IP address to each customer that connects to the Internet. Most ISPs assign dynamic IP addresses, which can change each time a client connects to the Internet. A static IP address is assigned to a specific computer and does not change. A Web server normally needs a static IP address so that the DNS servers on the Internet always map the Web site's domain name to the correct address. The IP addresses assigned follow the addressing scheme defined in version 4 of the Internet Protocol (IPv4). Because IPv4's 32-bit addresses allow only about 4.3 billion unique values, the Internet Protocol is being upgraded to a newer version, IPv6, which uses 128-bit addresses and can support vastly more of them. See IPv6.
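
The difference in scale between the two versions follows directly from the address sizes. This short Python sketch uses the standard ipaddress module to measure an IPv4 and an IPv6 address and prints how many addresses each size allows. The function name is hypothetical, and 2001:db8::1 comes from the range reserved for documentation.

```python
import ipaddress


def address_bits(address: str) -> int:
    """Return the size of an IP address in bits: 32 for IPv4, 128 for IPv6."""
    return len(ipaddress.ip_address(address).packed) * 8


for example in ("66.182.132.124", "2001:db8::1"):
    bits = address_bits(example)
    print(f"{example}: {bits}-bit address space holds {2 ** bits:,} addresses")
```

For IPv4 that works out to 2^32 = 4,294,967,296 addresses, about 4.3 billion, while 2^128 is a 39-digit number.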

The Internet Corporation for Assigned Names and Numbers (ICANN) is the non-profit corporation that was formed to assume responsibility for the IP address space allocation, protocol parameter assignment, domain name system management, and root server system management functions previously performed under U.S. Government contract by IANA and other entities.

A few IP addresses are reserved for special purposes. For example, the IP address 127.0.0.1 is not assigned to any computer on the Internet. It is known as the localhost, and is used to route packets to the local computer. This IP address is often used to test whether TCP/IP is installed and working correctly.

Internet programmers use the localhost IP address to represent the local web server during development. Typing http://127.0.0.1 or http://localhost into the browser's address bar will direct the user to the Web server located on the local computer. This allows programmers to view their web site locally and get the server response just as a web visitor would receive - including scripts run on the server.
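
To see the localhost idea in action without installing IIS or PWS, the Python sketch below starts a throwaway server from the standard library's http.server module on 127.0.0.1 and then requests a page from it, just as a browser would for http://127.0.0.1. The handler class, function name, and page text are hypothetical. Port 0 asks the operating system for any free port, so the URL in practice includes that port number.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answers every GET request with a tiny HTML page."""

    def do_GET(self):
        body = b"<html><body>Hello from localhost</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request log line


def fetch_from_localhost() -> str:
    # Listen on the loopback address only; port 0 asks the OS for any free port.
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    try:
        # The same request a browser makes for http://127.0.0.1:<port>/
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", "/")
        page = conn.getresponse().read().decode()
        conn.close()
        return page
    finally:
        server.shutdown()
        server.server_close()


print(fetch_from_localhost())
```

Binding to 127.0.0.1 rather than to all interfaces means only programs on the same computer can reach this test server.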

HTTP Protocol

The HTTP protocol is used to send HTML documents through the Internet. HTTP relies on TCP/IP, which breaks each document into packets for transmission. Each HTTP request carries a header, which contains information such as the name and location of the page being requested, the name of the Web server that contains the page, the HTTP version number, and the URL of the referring page. The Web server combines these header values with details about the connection, such as the IP addresses of the local client and of the server, into a set of values known as the SERVER VARIABLES. Later, we will learn how to retrieve these values.
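
The sketch below shows roughly what the header of a single request looks like and how a server might turn it into server variables. Assumptions: the request, the browser name, and the client address are made up, and the variable names follow the ones ASP's Request.ServerVariables collection uses (REQUEST_METHOD, URL, SERVER_PROTOCOL, REMOTE_ADDR, and HTTP_ followed by the header name). REMOTE_ADDR comes from the TCP/IP connection rather than from a header line.

```python
from __future__ import annotations

# A made-up request as it might arrive from a browser (lines end in CR LF).
RAW_REQUEST = (
    "GET /207/198lecnotes.htm HTTP/1.1\r\n"
    "Host: www.halharris.com\r\n"
    "User-Agent: ExampleBrowser/1.0\r\n"
    "Referer: http://www.halharris.com/\r\n"
    "\r\n"
)


def server_variables(raw_request: str, client_ip: str) -> dict[str, str]:
    """Turn a raw request plus the connection's client IP into ASP-style server variables."""
    head = raw_request.split("\r\n\r\n", 1)[0]  # the header ends at the first blank line
    request_line, *header_lines = head.split("\r\n")
    method, url, protocol = request_line.split(" ", 2)
    variables = {
        "REQUEST_METHOD": method,
        "URL": url,
        "SERVER_PROTOCOL": protocol,
        "REMOTE_ADDR": client_ip,  # taken from the TCP/IP connection, not from a header
    }
    for line in header_lines:
        name, _, value = line.partition(":")
        # "User-Agent" becomes HTTP_USER_AGENT, "Referer" becomes HTTP_REFERER
        variables["HTTP_" + name.strip().upper().replace("-", "_")] = value.strip()
    return variables


for key, value in server_variables(RAW_REQUEST, "203.0.113.50").items():
    print(f"{key} = {value}")
```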

HTTP version 1.0 is a stateless protocol. This means that when a client requests a document from a web server, the server returns the page to the client and then ends all communication with the client. When the client requests a second page, the server has no way of knowing that the same client just requested a previous page.

By using cookies, session variables, text files, and databases the server can maintain state - recognize the client over multiple transactions, and thereby remember information from previous transactions.
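
One common way to maintain state is a session ID stored in a cookie. This Python sketch is a simplified model, not how any particular server implements sessions: the cookie name SESSIONID, the in-memory SESSIONS dictionary, and the handle_request function are all hypothetical. The first response sends a Set-Cookie value, and the browser returns it with the next request, letting the server recognize the visitor.

```python
from __future__ import annotations

import secrets
from http.cookies import SimpleCookie

SESSIONS: dict[str, dict[str, int]] = {}  # server-side store, keyed by session ID


def handle_request(cookie_header: str | None) -> tuple[str, str | None]:
    """Handle one (stateless) request; return the page text and any Set-Cookie value."""
    cookie = SimpleCookie(cookie_header or "")
    set_cookie = None
    if "SESSIONID" in cookie and cookie["SESSIONID"].value in SESSIONS:
        session_id = cookie["SESSIONID"].value  # returning visitor
    else:
        session_id = secrets.token_hex(16)      # new visitor: start a session
        SESSIONS[session_id] = {"visits": 0}
        outgoing = SimpleCookie()
        outgoing["SESSIONID"] = session_id
        set_cookie = outgoing["SESSIONID"].OutputString()
    SESSIONS[session_id]["visits"] += 1
    return f"Visit number {SESSIONS[session_id]['visits']}", set_cookie


page, cookie = handle_request(None)  # first request: no cookie yet
print(page, "| Set-Cookie:", cookie)
page, _ = handle_request(cookie)     # the browser sends the cookie back
print(page)                          # Visit number 2
```

Only the random ID travels in the cookie; the remembered information stays on the server.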

In HTTP version 1.1, the Web server and the client can keep their connection open across multiple Web page requests, although each request is still handled independently (the protocol itself remains stateless). The Microsoft web server known as Internet Information Server (IIS) can be configured to support this "keep alive" HTTP 1.1 feature.
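
The keep-alive behavior can be observed with Python's standard library. In this sketch (handler and function names are hypothetical) the test server declares HTTP/1.1 and sends a Content-Length header, and http.client then reuses one TCP connection for two page requests. The server records the client's port number for each request; seeing the same port twice means the same connection was kept open.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class KeepAliveHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # allow persistent ("keep alive") connections
    seen_ports: list = []          # client-side TCP port of each request

    def do_GET(self):
        KeepAliveHandler.seen_ports.append(self.client_address[1])
        body = f"You requested {self.path}".encode()
        self.send_response(200)
        # A known Content-Length tells the client where this page ends,
        # so it can reuse the same connection for its next request.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass


def ports_for_two_pages() -> list:
    KeepAliveHandler.seen_ports = []
    server = HTTPServer(("127.0.0.1", 0), KeepAliveHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
    try:
        for path in ("/page1.htm", "/page2.htm"):
            conn.request("GET", path)
            conn.getresponse().read()  # finish reading before the next request
    finally:
        conn.close()                   # lets the server's per-connection loop end
        server.shutdown()
        server.server_close()
    return KeepAliveHandler.seen_ports


ports = ports_for_two_pages()
print("same connection reused:", ports[0] == ports[1])
```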

TCP/IP Utilities

When you (the client) request a service (for example, that a web page be sent) from another computer (a server) on the Internet, you are using the client/server model via TCP/IP. There are many TCP/IP utility programs that can be used to monitor the network and perform basic diagnostic tests on your web site. For example, if a customer cannot view your web site, you can use the ping utility to determine whether the server is connected to the Internet. Two of the more important TCP/IP utilities are ping and tracert.

Ping

The ping utility is used to test network connectivity. When you ping a site from Windows, four echo request packets are sent to its IP address by default. If the Web site is up and connected to the Internet, ping displays its IP address along with a reply, including the round-trip time, for each packet. If not, a Request Timed Out message will appear.

To use the ping utility, open a command prompt: click START, then RUN, and type COMMAND (or CMD on Windows NT/2000/XP). Then type the word PING, followed by a space, followed by an IP address or domain name, and press the ENTER key. To close the command window, type the word EXIT and press the ENTER key.
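
Ping relies on ICMP echo packets, which an ordinary program usually cannot send without administrator privileges. A common substitute when checking a Web site from a script is to attempt a TCP connection to the Web server's port (80 for HTTP). This is a different test from ping: it asks whether the Web server is accepting connections, and it can succeed even when a firewall blocks ping. A minimal Python sketch, with a hypothetical function name:

```python
import socket


def tcp_reachable(host: str, port: int = 80, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port is accepted within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, host unreachable, or name not found
        return False


# Hypothetical usage against a public site (needs Internet access):
# print(tcp_reachable("www.halharris.com"))
```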

In the example below, I pinged my web server from the same computer.

Tracert

The tracert utility is a packet-tracing utility used to locate the Web server. Tracert shows the path that an Internet packet travels to get from the client to the server. Tracert lists each stop, or hop, where the packet has been routed to another server (or a router). The last hop on the list shows the IP address of the Web site. An asterisk (*) on the list means that no reply was received from that hop before the request timed out; the router may be busy, or it may be configured not to answer tracert requests.

To use the tracert utility, from the command prompt, type tracert, followed by a space, followed by the IP or domain name, then press the ENTER key.

Here is a tracert to www.jeffstateonline.com

Notice the short route when I run a tracert to www.halharris.com from the same computer.

You can download a freeware program called PingPlotter that will show the same information in a graphical environment.
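
Tracert finds each hop by sending probes with an increasing time-to-live (TTL) value. Every router subtracts one from the TTL, and the router where it reaches zero discards the probe and reports back with an ICMP "time exceeded" message, so TTL 1 reveals the first hop, TTL 2 the second, and so on. The Python sketch below only simulates this over a made-up list of addresses taken from documentation ranges. It sends no real packets, and send_probe and tracert are hypothetical names. A router that never replies appears as an asterisk, as in tracert's own output.

```python
from __future__ import annotations


def send_probe(path: list[str], ttl: int) -> str:
    """Forward one simulated probe along `path` and return the address that answers it.

    `path` lists the routers between client and server, ending with the server.
    """
    for router in path[:-1]:
        ttl -= 1           # every router lowers the time-to-live by one
        if ttl == 0:
            return router  # this router discards the probe and reports back
    return path[-1]        # the probe survived all routers: the server answers


def tracert(path: list[str], silent: frozenset[str] = frozenset(), max_hops: int = 30) -> list[str]:
    """Probe with TTL 1, 2, 3, ... and list who answered; silent routers show as '*'."""
    hops = []
    for ttl in range(1, max_hops + 1):
        replier = send_probe(path, ttl)
        hops.append("*" if replier in silent else replier)
        if replier == path[-1]:
            break          # the destination answered, so the trace is complete
    return hops


route = ["192.0.2.1", "198.51.100.1", "203.0.113.10"]  # made-up documentation addresses
for number, hop in enumerate(tracert(route, silent=frozenset({"198.51.100.1"})), start=1):
    print(number, hop)
```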

TCP/IP Client Networking Software

Your computer will have TCP/IP networking client software installed, or you would not be able to surf the Internet. If you're not connected to the Internet on a computer at home, and you don't have TCP/IP installed, and would like to use that computer to do exercises in our book for this class, see our textbook pages 8-11 for instructions on how to set it up.

IP Configuration Utilities

The Windows operating system provides utilities to detect TCP/IP settings such as the IP address of the client. Winipcfg is the utility for Windows 95/98/ME. Ipconfig is the utility for Windows NT/2000/XP.

Domain Names

A group of related Web pages, files, and directories - called a Web site - is stored on the Internet on a Web server. A Web server is a computer that contains software that allows you to host Web pages on the Internet.

A Web site can be identified by its IP address or by its domain name. For example, typing http://66.182.132.124 or http://www.halharris.com will take you to the same Web site. Since it is difficult to remember IP addresses, people register domain names in a global database known as the registry. Each domain name is unique on the Internet.

In the early 1990s, all domain names had to be registered with InterNIC, which processed all registrations and maintained the registry database. In 1992, InterNIC transferred the responsibility for domain registration and registry maintenance to a company called Network Solutions. While Network Solutions currently maintains the registry database, there are many companies, called registrars, that can register domain names. InterNIC provides a list of these registrars at http://www.internic.net/regist.html. The registrant is the legal owner of the domain.

Here is some background information on the dispute in the late 1990s concerning who could register domain names.

Signing Up for a Domain Name

To sign up for a domain name, point your browser to the Web site of one of the registrars on the list. You will also find a number of secondary registrars (not listed) that have agreements with the primary registrars which allow them to register domain names. For example, I registered halharris.com with 123domains.com, which is not listed. This company offers very reasonable prices - $12.99 a year. If you're planning to host your site by renting web space from someone else, you might want to check out catalog.com, which offers free hosting if you pay their $35 a year domain name registration fee. We'll take a look at selecting a hosting service provider later in this chapter.

Domain Name Service Servers (DNS Servers)

A Non-Technical Explanation of the Domain Name System from InterNIC.

The Domain Name Service (DNS) is a network service that translates domain names to IP addresses. Each Web server is assigned an IP address and domain name. When you register a domain, you typically do not provide the IP address for your Web site. Instead, you provide the name and IP address of a DNS server. The DNS server contains a listing of domain names and their associated IP addresses. The ISP that hosts your Web site will enter your domain name and IP address in their DNS server. In my case, since I host the site from a DSL connection, I gave the registrar my IP address and they entered it into their DNS server.
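
A DNS server's job can be modeled as a lookup table with a fallback. In this Python sketch the ZONE dictionary and resolve function are illustrative: the address is the one quoted earlier in this chapter and may be out of date. Names not in the table are passed to the operating system's resolver through socket.gethostbyname, which returns an IPv4 address.

```python
import socket

# Illustrative "DNS server" table: domain name -> IP address.
# The address is the one quoted earlier in this chapter and may no longer be current.
ZONE = {"www.halharris.com": "66.182.132.124"}


def resolve(name: str) -> str:
    """Answer from the local table if possible, otherwise ask the system resolver (DNS)."""
    if name in ZONE:
        return ZONE[name]
    return socket.gethostbyname(name)  # IPv4-only lookup through the operating system


print(resolve("www.halharris.com"))  # 66.182.132.124 (from the table)
print(resolve("localhost"))          # normally 127.0.0.1
```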

The Whois Utility

To locate the DNS servers, InterNIC handle, or registrant information for a Web site, you can use the whois utility. The whois utility looks up the information associated with the individuals, domain names, and DNS servers in the registry. You can access the whois utility at http://www.networksolutions.com/cgi-bin/whois/whois. Or you can use http://www.accesswhois.com or http://www.betterwhois.com or http://www.dnsstuff.com.
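
Behind the Web pages, whois is a very simple protocol (RFC 3912): the client opens a TCP connection to port 43 on a whois server, sends the query followed by a carriage return and line feed, and reads until the server closes the connection. A minimal Python sketch follows. The function name is hypothetical, and whois.verisign-grs.com appears only as an example of a registry whois server for .com names.

```python
import socket


def whois_query(query: str, server: str, port: int = 43, timeout: float = 10.0) -> str:
    """Send one WHOIS query and return the server's full text reply."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(query.encode("utf-8") + b"\r\n")  # the query is a single CRLF-terminated line
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # an empty read means the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Example usage (needs Internet access; the server name is just an example):
# print(whois_query("halharris.com", "whois.verisign-grs.com"))
```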

The World Wide Web Service

A network server is a network software application that provides network related services such as e-mail, database, messaging, file, and print services. The network software that stores Web pages and processes the requests for Web pages is called the Web server or the World Wide Web (WWW) Service. The same server can host multiple Web sites - known as virtual Web servers. The web server that hosts your site can be on your local network or anywhere in the world. Companies that provide hosting services are called hosting service providers.
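
Virtual Web servers are commonly told apart by the Host header that the browser sends with each request (name-based virtual hosting); sites can also be separated by giving each one its own IP address. The Python sketch below shows the name-based idea. The host names, folder paths, and function name are hypothetical.

```python
# Hypothetical mapping of host names to the folders holding each site's pages.
VIRTUAL_HOSTS = {
    "www.halharris.com": "/sites/halharris",
    "www.example.com": "/sites/example",
}
DEFAULT_ROOT = "/sites/default"


def document_root(host_header: str) -> str:
    """Choose which site's folder answers a request, based on its Host header."""
    host = host_header.split(":", 1)[0].strip().lower()  # drop any ":port" and ignore case
    return VIRTUAL_HOSTS.get(host, DEFAULT_ROOT)


print(document_root("www.halharris.com"))     # /sites/halharris
print(document_root("WWW.EXAMPLE.COM:8080"))  # /sites/example
```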

Internet programming has been referred to as client/server programming because the client application communicates with the server application. A browser sends a request for an Internet document that exists on a remote Web server. The Web server receives and processes the request and returns the page, and the browser renders (or displays) it. Today, this basic model has become more complex because of the number and types of applications on the Internet.

Internet applications are no longer limited to a single server. E-commerce Internet applications often communicate with database servers that are not located on the same physical network server as the Web server. So, in addition to client/server programming, servers communicate with other servers.

Web Servers

Although there are many different types of Web servers, the most common are Apache and Internet Information Server (IIS). On the UNIX operating system, Apache is the most common Web server. On the Windows platform, IIS is the most common. Overall, Apache Web servers outnumber IIS Web servers at about the rate of two to one, according to Netcraft. Developers often use Microsoft Personal Web Server (PWS) to create a Web environment for testing. PWS can be found on the Windows 98 CD or it can be downloaded as part of the NT Option Pack.

You can find out what type of Web server is running by visiting http://uptime.netcraft.com/up/graph.
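
Many Web servers also identify themselves in the Server header of each response, typically with a product name such as Apache or Microsoft-IIS, often followed by a version number. Administrators can change or hide this header, so treat it as a hint. A minimal Python sketch (the function name is hypothetical) that sends a HEAD request and reads the header:

```python
from __future__ import annotations

import http.client


def server_software(host: str, port: int = 80, timeout: float = 10.0) -> str | None:
    """Send a HEAD request for "/" and return the Server response header, if any."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", "/")  # HEAD asks for the headers only, with no page body
        return conn.getresponse().getheader("Server")
    finally:
        conn.close()


# Hypothetical usage (needs Internet access):
# print(server_software("www.halharris.com"))
```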

Hosting Service Providers

A hosting service provider (HSP) is an ISP that specializes in Web site hosting. HSPs are also called Web presence providers. Some specialize in working with Internet programmers, while others provide only general Web hosting services. Some offer co-location whereby they will maintain the equipment connecting the server to the Internet, but not the web server software. The person who maintains the Web server is called the Webmaster.

Hosting Criteria

In order to pick the best HSP for your web site, you need to create a list of features that you feel are important. Here are some questions that will help you to evaluate HSPs.

  • Customer service 24/7, 365 days a year?
  • Toll Free number?
  • Documentation provided?
  • Reliable Internet Connection?
  • Redundancy?
  • Log Files?
  • Unix based or Windows?
  • Multiple User Accounts?
  • Can you change permissions on files and directories?
  • Does it provide support for Secure Sockets Layer (SSL) certificates for credit card orders?
  • Does it allow server-side includes (SSIs)?
  • Can you customize error messages?
  • FTP?
  • Telnet?
  • FrontPage support and FrontPage Server Extensions?
  • Dreamweaver support?
  • How many POP email accounts?
  • Web based email?
  • Unlimited email aliases?
  • Email forwarding?
  • Auto-responders?
  • Which database applications does it support (SQL, Oracle, Access, MySQL, DSNs)?
  • Third party e-commerce application support such as Miva Merchant or AbleCommerce?
  • Payment mechanisms, such as CyberCash or Authorize.Net?
  • Reseller or affiliate programs?
  • How much hard drive space?
  • How much is setup fee?
  • How much is monthly fee?
  • Support for CGI such as Perl, cgi-bin directory, free CGI scripts, page counters?
  • Support for Active Server Pages (ASP)?
  • Streaming audio and video?

Locating a Hosting Service Provider

After developing your evaluation criteria, you will want to compare several different hosting service providers - our book suggests a minimum of ten different ones. You may want to check the following links to locate HSPs.

Internet Programming User Groups

You may want to check the Internet for user groups that share information about Internet Programming.

Creating a Web Page

Web pages are created using HyperText Markup Language (HTML). The specifications for the most recent version of HTML are published by the World Wide Web Consortium at http://www.w3.org. If you are new to HTML, I suggest that you read through the study materials for CIS 198 - Web Page Development HTML at http://www.halharris.com/207/198lecnotes.htm.

Web pages can be created in:

  1. a simple text editor like Notepad
  2. FrontPage
  3. Dreamweaver

Web Development Tools

Although we can write Web pages in Notepad, and you will want to learn enough HTML to understand the code and make changes to it, you may want to use a tool to help you write the HTML code.

HTML Tags

See our textbook pages 20 to 29 for some basic HTML tags if you are new to HTML.

We can use the following META tag to redirect users to another page.

<META http-equiv="refresh" content="5;URL=newURLorPage.htm">

In this example, the browser will wait five seconds before jumping to the new page (newURLorPage.htm).
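
The content attribute holds two parts separated by a semicolon: the delay in seconds and the URL to load. When redirect pages are generated by a script, a helper like the Python sketch below can build the tag and pull it apart again. The function names are hypothetical, and html.escape guards against characters such as quotes breaking the attribute.

```python
from __future__ import annotations

import html


def meta_refresh(url: str, delay: int = 5) -> str:
    """Build a META refresh tag that loads `url` after `delay` seconds."""
    return f'<META http-equiv="refresh" content="{delay};URL={html.escape(url)}">'


def parse_refresh(content: str) -> tuple[int, str]:
    """Split a content value such as "5;URL=newURLorPage.htm" into (delay, url)."""
    delay, _, rest = content.partition(";")
    url = rest.strip()
    if url[:4].upper() == "URL=":
        url = url[4:]
    return int(delay), url


print(meta_refresh("newURLorPage.htm"))         # same tag as the META example above
print(parse_refresh("5;URL=newURLorPage.htm"))  # (5, 'newURLorPage.htm')
```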

Publishing Your Web Pages

Once you have created Web pages, you must publish them to a Web server. Some Web development tools, including FrontPage, InterDev, and Dreamweaver, allow you to publish your Web pages directly to a Web server. However, if you're using a text editor, you will need a program to copy your files to the Web server. You can use FTP software, such as WS_FTP at http://www.wsftple.com/download.htm, to transfer your files to the Web server.
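
Publishing can also be scripted. The Python sketch below uses the standard ftplib module to upload files in binary mode. The publish function, host name, credentials, and folder name are hypothetical. The function takes an FTP connection that is already logged in, which keeps the upload logic separate from the connection details.

```python
from __future__ import annotations

import ftplib
from pathlib import Path


def publish(ftp: ftplib.FTP, local_files: list[str], remote_dir: str = ".") -> list[str]:
    """Upload each local file into `remote_dir` over an already logged-in FTP connection."""
    ftp.cwd(remote_dir)
    uploaded = []
    for name in local_files:
        path = Path(name)
        with path.open("rb") as f:
            # storbinary switches to binary mode (TYPE I), so HTML files and
            # images alike are copied byte for byte.
            ftp.storbinary(f"STOR {path.name}", f)
        uploaded.append(path.name)
    return uploaded


# Hypothetical host and credentials:
# with ftplib.FTP("ftp.example.com") as ftp:
#     ftp.login("username", "password")
#     publish(ftp, ["index.htm", "logo.gif"], "public_html")
```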