The increasing use of databases causes a great deal of worry for individuals. For example, what if someone typed in your zip code and house number instead of those of someone else who was on a credit blacklist? What could you do once the data was on the database, given that this data could be sent around the world in seconds? Fortunately there are laws on what can be done with databases, such as those from the European Data Protection and Telecoms Protection Directives, which state:
And individuals have the following rights:
Special conditions also apply to personal data on racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, and information concerning the health or sex life of those still alive. For these areas the individual must give explicit permission for the data to be used for other purposes. Thus a company that wants to view the medical records of an individual would have to get the permission of the individual before it could do this.
Of course at one time this was easy to control, as requests for data were made via the postal service. These days data can be transferred from one place to another within a fraction of a second, and around the world in seconds. Thus it is important that electronic data, especially personal data, is kept in a secure form which cannot be tampered with. As much as possible, databases with personal data should be protected by passwords, firewalls, and all the other security methods that were discussed in a previous chapter. When the data is transmitted over the Internet, it should be encrypted, as TCP/IP on its own does not encrypt traffic, so anyone able to listen on the path can view the data. Any breach in security can leave the organization responsible, and it could receive a heavy fine.
The seven rules of providing safe data can be summarized as:
Notice. This should be given about the reasons for the data collection.
[Data Protection Act]
Databases store information in an easy-to-access format, and they have an increasing role in virtually every area of computing. Many WWW pages are now generated from a database: content is added to the database, and the page is built by reading that content when it is requested. This allows for more dynamic content within WWW pages. For example, a database could contain a list of recommended WWW links which is updated every hour. There may be many pages which use these links. Thus a good approach is to design the pages so that the links are generated from the database, and only the database needs the hourly update.
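The idea of generating a links list from a database can be sketched in a few lines of Python. This is only an illustrative sketch: the `links` table, its columns, the function name `build_links_page`, and the example URLs are all assumptions made for the example, and an in-memory SQLite database stands in for whatever database a real site would use.

```python
import sqlite3
from html import escape

def build_links_page(conn):
    """Render an HTML list from whatever rows are currently in the links table."""
    rows = conn.execute("SELECT title, url FROM links ORDER BY title").fetchall()
    items = "\n".join(
        f'<li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
        for title, url in rows
    )
    return f"<ul>\n{items}\n</ul>"

# Hypothetical data: an in-memory database with two recommended links.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (title TEXT, url TEXT)")
conn.executemany(
    "INSERT INTO links VALUES (?, ?)",
    [("Example A", "http://a.example/"), ("Example B", "http://b.example/")],
)
page = build_links_page(conn)

# The hourly update only touches the database; every page that calls
# build_links_page() picks up the new link on its next request.
conn.execute("INSERT INTO links VALUES (?, ?)", ("Example C", "http://c.example/"))
updated_page = build_links_page(conn)
```

The page code never changes when the links change, which is the point of the design.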
A great deal of information on us is already held in databases. For example, it is now possible to receive approval for a loan request in a matter of seconds. This is because there are databases around the Internet which contain many of your financial details, such as the number of times you have been late with your payments, your monthly salary, your current loan commitments, and so on. The loan application program thus gathers the information on you and quickly generates a score which relates to your ability to pay back the loan. This type of approach is known as data mining, where programs gather data on the user from several different sources. In the future, with data mining, it should be possible to determine many other things about a user, apart from their financial details. For example, if a user purchases movie tickets on-line, then a data-mining agent might determine the type of movie that the person prefers, and use this information for advertising new movie releases. This may lead to an increasing amount of personalization on WWW pages. For example, if a person accessed amazon.com then, if the WWW site recognized the user, they would be greeted with a page which displayed the books that were recommended for that user, based on their interests. At one time the user would have to enter these preferences, but in the future they may be generated automatically. A typical technique, these days, is to store user details in a database, such as their credit card number, bank details, and so on, so that when the user purchases something on-line, these details are automatically entered. This makes shopping on-line easier.
A growing trend is personalization, where the WWW pages are designed especially for the user. The information to generate these is contained in databases. This personalization can either be:
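As a rough illustration of how a loan program might combine data from several sources into a single score, consider the sketch below. Everything in it is an assumption made for the example: the three dictionaries stand in for separate databases, and the weights in `loan_score` are invented, not taken from any real credit-scoring method.

```python
def loan_score(late_payments, monthly_salary, monthly_commitments):
    """Toy score from 0 to 100; the weights are illustrative only."""
    if monthly_salary <= 0:
        return 0
    disposable = monthly_salary - monthly_commitments
    # Share of salary left after existing commitments, as a percentage.
    affordability = max(0.0, 100 * disposable / monthly_salary)
    # Each late payment costs 5 points, capped at 50 points.
    penalty = min(late_payments, 10) * 5
    return max(0, round(affordability - penalty))

# Hypothetical stand-ins for three separate databases keyed by user.
payments_db = {"u1": 2}        # times late with payments
salary_db = {"u1": 3000}       # monthly salary
loans_db = {"u1": 900}         # current monthly loan commitments

def score_user(user_id):
    """Gather the user's details from each source and score them."""
    return loan_score(payments_db[user_id], salary_db[user_id], loans_db[user_id])

score = score_user("u1")
```

With these made-up figures, 70% of the salary is uncommitted and two late payments cost 10 points, so `score` is 60.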
Implicit personalization. With this the pages are designed using the user's personal preferences. A good example of this is My Yahoo and MSN which allow the user to design the WWW page for their preferences, such as the news articles that they would like on the page, the stocks that they would like to view, and so on.
This personalization has many advantages. A good example is Amazon's One Click service, which allows the user's credit card details to be held in a secure manner, and used every time that the user wants to purchase a book. Users could also benefit from being offered products that another user with a similar profile has bought.
For organizations, there is also an increased amount of targeted marketing, where complementary services can be offered. For example, a company which sells a holiday booking might also offer travel insurance to the user. Marketing can also be targeted at specific users, rather than, as at present, using blanket marketing. Most users now ignore the advertising banners which appear at the top of many WWW pages. This database approach also leads to savings in WWW development, as template pages are produced, and the content for these is generated from the database. With a non-database driven system, the pages must be coded for every different type of page. Changes in products can also be made quickly, as they only require a single change to the database, rather than changes over many pages.
Examples of personalization include:
Customized user interface. This could be with fonts, colors, layout, and structure.
A good example of personalization is on the Dell.com WWW site (see Figure 1). With this the user enters the system service tag (or express service code) of their computer and the WWW site generates pages which relate specifically to the product. This allows Dell to provide details of software downloads, hardware updates, and so on, specifically for the computer. This overcomes one of the most annoying problems when trying to find the correct documentation and software downloads for a specific product. It also allows Dell to quickly target specific products for bug fixes, and product updates.
Figure 1: Dell site
The Dell database also keeps track of the complete history of a product, as the service tag is a unique code. This is shown as a text code, and also as a bar code, from which the servicing department can easily update and retrieve data on the system. The database knows the specification of the computer, and when it was shipped from Dell. It also tracks the product and stores details of its operating system, its specification, and so on. These provide important information for the Dell Support team, and make it easier for them to make the correct decisions in providing help. Database systems are even used to track a product that is being fixed, as the user can contact Dell, and they are able to track the actual location of the system, and its current status.
An example of a Dell cookie is (stored as firstname.lastname@example.org):
Cookies or remote storage of details
The personalization can be achieved in a number of ways. The simplest is cookies, which are stored on the user's computer. Cookies are simple text files (typically stored as TXT files in the WINDOWS\COOKIES folder on Microsoft Windows), and contain relevant details on the user, and any of their preferences. As these are text files, they cannot do any damage to the local computer. Sometimes users delete these cookies, and all the previous information is lost, so they must re-register for the system to be able to store their details. Other systems store details of the user on the server, and this allows the user to move around the Internet and still get access to their logged data. A good example of this is MSN Messenger, which allows the user to log in with a passport, after which their contacts and other data are downloaded from the server. This allows the user to read their e-mail or contact their buddies from any location on the Internet. As the user's details and preferences (such as where they live, and their favorite links) are stored on a server, they are not deleted when the user erases the cookies on their own computer. Unfortunately the logon process can be quite time consuming, as the program must contact the server, which is likely to be processing many other logins. Problems can also occur when the server goes down, as users will not be able to log in. These types of servers are targets for DoS (denial-of-service) attacks, and can be made to slow down their processing if they have to respond to too many logins at a time.
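To make the cookie mechanism more concrete, the sketch below uses Python's standard `http.cookies` module to build the header a server would send to store a preference, and then to parse the text a browser would send back on a later visit. The cookie names (`username`, `layout`) and their values are hypothetical, chosen only for illustration.

```python
from http.cookies import SimpleCookie

# Server side: ask the browser to store a small piece of text for one hour.
cookie = SimpleCookie()
cookie["username"] = "fred"
cookie["username"]["path"] = "/"
cookie["username"]["max-age"] = 3600
header = cookie.output()   # a "Set-Cookie: ..." line for the HTTP response

# Later visit: the browser returns the stored text in its Cookie header,
# and the server reads the values back out of it.
returned = SimpleCookie()
returned.load("username=fred; layout=compact")
prefs = {name: morsel.value for name, morsel in returned.items()}
```

The cookie is nothing more than named text values, which is why deleting it loses the stored preferences unless the server also keeps a copy.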
With cookies the data stored can involve:
User profile. This would typically store details of the user, such as their common name, date of birth, and so on. This could also store the user's login name and password (obviously these would not be shown in the cookie in a plain text form, and would be encoded in some way, so that the user's login and password details cannot be viewed from their cookies).
As cookies are just simple text files they cannot pass on information about a user's computer, other than the information contained within the cookie. Also the cookie generated by a WWW site cannot be used by another WWW site, as it can only be read by the WWW site that created it. Thus a cookie set by one site cannot, on its own, be used by other sites to track users around the Internet. The greatest drawback with cookies is that users typically do not get the opportunity to choose whether they want the cookie stored on their local disk, or not.
An example of the usage of centralized market information is DoubleClick, which specializes in generating banner advertisements aimed at specific users. Companies which subscribe to DoubleClick have a cookie request from DoubleClick on their page. If the user has an existing DoubleClick cookie, it is read for the user's details (otherwise a new one is generated). As DoubleClick has many organizations subscribing to it, it can search for the types of sites that the user has most frequently accessed. The user will then receive a targeted banner advertisement which is most relevant to them. Over time the advertising will become more focused as DoubleClick learns more about the user. Note that all the cookies are sent and received by DoubleClick, and not by the organization that subscribed to it.
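The logic of a centralized advertising service of this kind can be sketched as follows. This is not DoubleClick's actual system, whose details are not given here. It is a minimal model under stated assumptions: the class name `AdNetwork`, the method `serve_banner`, the site categories, and the banner naming are all invented for the example.

```python
import uuid
from collections import Counter

class AdNetwork:
    """Toy model of a central service embedded on many subscriber sites."""

    def __init__(self):
        self.history = {}  # cookie id -> Counter of site categories visited

    def serve_banner(self, cookie_id, site_category):
        # No cookie yet (or an unknown one): generate a new identifier.
        if cookie_id is None or cookie_id not in self.history:
            cookie_id = uuid.uuid4().hex
            self.history[cookie_id] = Counter()
        # Record the type of subscriber site the request came from.
        self.history[cookie_id][site_category] += 1
        # Target the banner at the category seen most often so far.
        top_category, _count = self.history[cookie_id].most_common(1)[0]
        return cookie_id, f"banner-for-{top_category}"

network = AdNetwork()
cid, first_ad = network.serve_banner(None, "travel")   # first visit, new cookie
cid, _ = network.serve_banner(cid, "cars")
cid, latest_ad = network.serve_banner(cid, "cars")     # cars now dominates
```

Because the same identifier comes back to the central service from every subscriber site, the profile grows with each page view, even though no individual site can read the cookie.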
Data warehousing is a method which stores vast quantities of raw data, such as that generated from log files. This data can then be prepared and reformatted for a data mining process, which will try to create meaningful information from the raw data. This is similar to traditional paper-based storage, but it is obviously easier to store large amounts of electronic data in a small physical space. For example, exam papers could be marked to get the final exam mark. The raw data for this would be the actual exam papers. This data could be analyzed for the average number of words per question, or the average mark for each question, or the number of pages used, and so on. With the raw data, it is possible to analyze the data in many different ways, and find new insights from it. Without the raw data it is often difficult to run different analyses. Another example relates to car sales. With the raw data on car sales, it would be possible to determine the percentage of people within a certain street that bought red cars, or the number of people in a city that bought a blue, 2000cc car, or the number of people with a surname that began with the letter 'C' that bought a Ford van. All this data in warehouses will make marketing more refined in the future.
There is obviously a very fine line between personalization and personal intrusion, and the collection of data must comply with current laws. Unfortunately many data collection programs confuse the user by displaying a great deal of text, which the user is asked to read, and then agree to. Most users now, typically, just click the accept button without even bothering to read the agreement statement. For example, when was the last time that someone actually read the license agreement for a software program that they had just bought?
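The car sales queries above can be sketched directly against raw records. The four sales records below are invented for illustration, as are the field names and the helper `percentage`. The point is only that, with the raw data kept, each new question is just another filter.

```python
# Hypothetical raw sales records; a warehouse would hold vastly more.
sales = [
    {"surname": "Clark", "city": "Edinburgh", "street": "High Street",
     "make": "Ford", "body": "van", "colour": "red", "engine_cc": 2000},
    {"surname": "Cooper", "city": "Edinburgh", "street": "High Street",
     "make": "Ford", "body": "van", "colour": "blue", "engine_cc": 2000},
    {"surname": "Smith", "city": "Edinburgh", "street": "Mill Lane",
     "make": "Fiat", "body": "saloon", "colour": "blue", "engine_cc": 2000},
    {"surname": "Jones", "city": "Glasgow", "street": "Park Road",
     "make": "Ford", "body": "saloon", "colour": "red", "engine_cc": 1600},
]

def percentage(records, predicate):
    """Percentage of records for which predicate(record) is true."""
    if not records:
        return 0.0
    return 100.0 * sum(1 for r in records if predicate(r)) / len(records)

# Percentage of buyers in a given street that bought red cars.
high_street = [r for r in sales if r["street"] == "High Street"]
red_share = percentage(high_street, lambda r: r["colour"] == "red")

# Number of people in a city that bought a blue, 2000cc car.
blue_2000 = sum(1 for r in sales
                if r["city"] == "Edinburgh"
                and r["colour"] == "blue" and r["engine_cc"] == 2000)

# Number of people whose surname begins with 'C' that bought a Ford van.
c_ford_vans = sum(1 for r in sales
                  if r["surname"].startswith("C")
                  and r["make"] == "Ford" and r["body"] == "van")
```

If only a summary such as total sales per month had been stored, none of these three questions could be answered.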
Anonymous profile data. This is generated whenever a user contacts a site, and might contain the network address (IP address), domain name, ISP, WWW browser version, and so on.
A registration form is an excellent method for an organization to get user data, and is an opportunity for the organization to ask questions about the user which it could use for marketing purposes. For example, how many times have you been asked if you were male or female when you registered for a WWW site? It should not matter to the registration if you were male or female. So why do they ask? For marketing and data mining purposes. This form of data mining is explicit, where the user actually knows that their data is being stored. Many users do not like this form of data mining, as they feel that it is obtrusive. A newer form of data mining is implicit, where the user does not know that they are being monitored for their usage patterns. This includes cookies, but WWW sites can also monitor how the user moves through a site, and the pages that they are most likely to spend time on. For example, if the user on a bank site spends more time looking at the corporate page, then they may be interested either in buying stocks in the company, or in looking for a job with it.
Future technologies may include spyware. With this, the code of a WWW page contains graphics files which are invisible to the user (as they may only contain a few pixels), but which reside on a data mining server. When the page is loaded the data mining server is contacted, and the details of the access can be logged. This will give details of where the user is located, their network address, their browser details, and so on. This could also be applied to e-mails which contain graphics that are held on data mining servers. The server can then log the accesses to the graphics, and thus log when the e-mail was read, and from which location.
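Implicit monitoring of how a user moves through a site can be sketched from a simple click log. The log format (a timestamp in seconds plus a page path), the page names, and the function `time_per_page` are assumptions made for this example. The time on a page is estimated as the gap until the next request.

```python
from collections import defaultdict

def time_per_page(clicks):
    """Total seconds per page for one user.

    clicks is a list of (timestamp_seconds, page) in visit order; the time
    on a page is taken as the gap until the user's next request, so the
    final page in the log gets no time.
    """
    totals = defaultdict(int)
    for (start, page), (end, _next_page) in zip(clicks, clicks[1:]):
        totals[page] += end - start
    return dict(totals)

# Hypothetical visit to a bank site.
clicks = [
    (0, "/home"),
    (20, "/corporate"),
    (200, "/accounts"),
    (230, "/corporate"),
    (350, "/logout"),
]
totals = time_per_page(clicks)
most_viewed = max(totals, key=totals.get)
```

Here the user spends 300 of the 350 seconds on the corporate page, which is the kind of pattern a site might use to guess at an interest in the company itself.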
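The invisible-graphic technique can be illustrated with a short sketch of both halves: the tag embedded in a page or e-mail, and the record the data mining server could keep when the graphic is fetched. The server name `tracker.example`, the file name `pixel.gif`, the `id` parameter, and both function names are hypothetical. A real server would take the address and browser details from the incoming HTTP request.

```python
from urllib.parse import urlencode, urlparse, parse_qs

def pixel_tag(server, message_id):
    """HTML for a 1x1 image held on the (hypothetical) data mining server."""
    query = urlencode({"id": message_id})
    return (f'<img src="http://{server}/pixel.gif?{query}" '
            f'width="1" height="1" alt="">')

def log_pixel_request(request_url, client_ip, user_agent, log):
    """Record what the server learns when the graphic is fetched."""
    params = parse_qs(urlparse(request_url).query)
    record = {
        "id": params.get("id", [""])[0],  # which page or e-mail was opened
        "ip": client_ip,                  # the reader's network address
        "browser": user_agent,            # the reader's browser details
    }
    log.append(record)
    return record

tag = pixel_tag("tracker.example", "mail-42")

access_log = []
record = log_pixel_request(
    "http://tracker.example/pixel.gif?id=mail-42",
    "192.0.2.10",
    "ExampleBrowser/1.0",
    access_log,
)
```

The reader sees nothing, yet simply displaying the page or e-mail is enough to create the log entry.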
Note: This essay was inspired by the excellent MSc thesis by Deborah Crompton [Technology and Implications of Dynamically Personalised Web-Applications, School of Computing, Napier University, March 2001], for which I was lucky enough to be part of the examination team for the MSc Viva. It thus contains some material which was used in the thesis. In my opinion it was one of the best MSc theses that I have ever read, and was interesting and extremely focused on achieving a goal.