The increasing usage of databases causes a great deal of worry for individuals. For example, what if someone typed in your zip code and house number instead of those of someone else who was on a credit blacklist? What could you do once the data was on the database, given that this data could be sent around the world in seconds? Fortunately there are laws on what can be done with databases, such as those from the European Data Protection and Telecoms Protection Directives, which state:
That the data is processed fairly and lawfully.
That the data is collected for specific, explicit and legitimate purposes.
That the data is adequate, relevant and not excessive in relation to the purposes for which it is collected and/or processed.
That the data is accurate and, where necessary, kept up-to-date.
That the data is kept in a form which permits identification of data subjects for no longer than is necessary.
That appropriate technical and organizational measures are taken by the data controller to protect personal data against accidental or unlawful destruction or loss, etc.
That where the data controller chooses a data processor to process personal data for him, he must take appropriate measures to ensure that the data processor complies with the obligations of the data controller.
That the transfer of personal data to a country outside the EEA (the 15 member states of the EU plus Iceland, Norway and Liechtenstein) may take place only if the country in question ensures an adequate level of protection or there are appropriate contractual arrangements in place or the data subjects have given consent.
And individuals have the following rights:
The right to information concerning the personal data held about them by a data controller.
The right to the alteration of personal data held by controllers where such personal data is incorrect.
Also, special conditions are made for personal data which reveals racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, and information concerning the health or sex life of those still alive. For these areas the individual must give explicit permission for the data to be used for other purposes. Thus a company that wants to view the medical records of an individual would have to get the permission of the individual before it could do this.
Of course, at one time this was easy to control, as requests for data were made via the postal service. These days data can be transferred from one place to another within a fraction of a second, and around the world in seconds. Thus it is important that electronic data, especially personal data, is kept in a secure form which cannot be tampered with. As much as possible, databases with personal data should be protected by passwords, firewalls, and all the other security methods that were discussed in a previous chapter. When the data is transmitted over the Internet, it must be encrypted, as TCP/IP does not itself encrypt data, and thus allows any listener on the route to view it. The organization is responsible for any breaches in security, and could receive a heavy fine.
The seven rules of providing safe data can be summarized as:
Notice. This should be given about the reasons for the data collection.
Choice. This gives individuals the choice as to whether they want their data collected, or not.
Onward transfer. This gives individuals the choice as to whether they want the data to be forwarded to third parties, or not.
Access. This gives individuals the right to access any data which the organization has on them.
Security. The data must be kept securely.
Data integrity. The data must be correct, and up-to-date.
Enforcement. There must be enforcement procedures for complaints.
// Check whether a cookie named "billscookie" was sent with this request
if (isset($_COOKIE['billscookie'])) {
    print "<p>A cookie exists on your computer</p>";
}
[Data Protection Act] Are you worried about the details that organisations have on you? You should be. Check this link to find out what organisations are allowed to do.
[Dell.com] I don't have a Dell anymore, but I miss the support that the Dell site gives users. When I had a Dell I could simply access my own customized page, download drivers, and get help for my specific notebook. Now I have to sift through the #@^% [name withheld] site to find the odd updated driver, or help manual.
[Msn.com] One of the great sites in the world, and it just keeps getting better. It's beautifully functional, has dynamic content and is informative. Unfortunately my MSN agent has been mining me for information, and knows that I live in the UK, so it kindly redirects me to the msn.co.uk site, which I do not want. The funny thing is that Internet Explorer redirects me to the msn.co.uk site, but Netscape Navigator does not.
This is also the place to get MSN Messenger, which is one of the most useful pieces of software I have ever found.
Databases store information in an easy-to-access format, and they have an increasing role in virtually every area of computing. Many WWW pages are now generated from a database, where content is added to the database, and the WWW page is generated by reading it. This allows for more dynamic content within WWW pages. For example, a database could contain a list of recommended WWW links, which is updated every hour. There may be many pages which use these links. Thus a good approach is to design the pages so that the links are generated from the database, which is updated hourly.
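The idea of generating a list of links from a database can be sketched as follows. This is only a minimal illustration: the table name links, its columns, and the function build_links_page are assumptions made for the example, and an in-memory SQLite database stands in for whatever database a real site would use.

```python
import sqlite3
from html import escape


def build_links_page(conn):
    """Render every row of the (hypothetical) links table as an HTML list."""
    rows = conn.execute("SELECT title, url FROM links ORDER BY title").fetchall()
    items = "\n".join(
        '  <li><a href="{}">{}</a></li>'.format(escape(url, quote=True), escape(title))
        for title, url in rows
    )
    return "<ul>\n" + items + "\n</ul>"


# In-memory database standing in for the real, hourly-updated one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (title TEXT, url TEXT)")
conn.executemany(
    "INSERT INTO links VALUES (?, ?)",
    [("Dell", "http://www.dell.com"), ("MSN", "http://www.msn.com")],
)

page = build_links_page(conn)
print(page)
```

Every page that calls build_links_page picks up a new link as soon as it is added to the table, without any of the pages being edited.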
There is already a great deal of information on us in databases. For example, it is now possible to receive approval for a loan request in a matter of seconds. This is because there are databases around the Internet which contain many of your financial details, such as the number of times you have been late with your payments, your monthly salary, your current loan commitments, and so on. The loan application program thus gathers the information on you and quickly generates a score which relates to your ability to pay back the loan. This type of approach is known as data mining, where programs gather data on the user from several different sources. In the future, with data mining, it should be possible to determine many other things about a user, apart from their financial details. For example, if a user purchases movie tickets on-line, then a data-mining agent might determine the type of movie that the person prefers, and use this information for advertising new movie releases. This may lead to an increasing amount of personalization on WWW pages. For example, if a person accessed amazon.com then, if the WWW site knew the user, they would be greeted with a page which displayed the books that were recommended for that user, based on their interests. At one time the user would have to enter these preferences, but in the future they may be generated automatically. A typical technique, these days, is to store user details on a database, such as their credit card number, bank details, and so on, so that when the user purchases something on-line, these details are automatically entered. This makes shopping on-line easier.
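As a rough sketch of how a loan application program might combine details gathered from several sources into a single score, consider the following. The record fields follow the details named above, but the class, the function names and, above all, the weighting are invented for illustration and do not reflect how any real credit-scoring system works.

```python
from dataclasses import dataclass


@dataclass
class FinancialRecord:
    late_payments: int
    monthly_salary: float
    monthly_loan_commitments: float


def gather_record(sources):
    """Merge the fragments held by several databases into one record."""
    merged = {}
    for source in sources:
        merged.update(source)
    return FinancialRecord(**merged)


def loan_score(record, monthly_repayment):
    """Return a 0-100 score; the weighting here is purely illustrative."""
    disposable = record.monthly_salary - record.monthly_loan_commitments
    if disposable <= 0:
        return 0
    affordability = max(0.0, 1.0 - monthly_repayment / disposable)
    late_penalty = min(0.1 * record.late_payments, 1.0)
    return round(100 * affordability * (1.0 - late_penalty))


# Three hypothetical sources, each holding part of the picture.
record = gather_record([
    {"late_payments": 1},
    {"monthly_salary": 2000.0},
    {"monthly_loan_commitments": 500.0},
])
score = loan_score(record, monthly_repayment=300.0)
print(score)
```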
An increasingly important concept is personalization, where the WWW pages are designed especially for the user. The information to generate these is contained in databases. This personalization can either be:
Explicit personalization. With this the pages are designed using the user's stated personal preferences. Good examples of this are My Yahoo and MSN, which allow the user to design the WWW page for their preferences, such as the news articles that they would like on the page, the stocks that they would like to view, and so on.
Implicit personalization. With this the pages are designed using data from the user's behavior, and it occurs automatically without the user having any direct influence on the choices. A good example of this is Amazon's book recommendation service, which offers customers books based on the books that they have purchased in the past.
This personalization has many advantages. A good example is Amazon's One Click service, which allows the user's credit card details to be held in a secure manner, and used every time that the user wishes to purchase a book. Users could also benefit from being offered products that another user with a similar profile has bought.
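Offering products that similar users have bought can be sketched with a simple overlap count. This is a toy illustration under stated assumptions, not Amazon's method: the purchase data and the recommend function are invented, and "similar" simply means "has bought some of the same items".

```python
def recommend(purchases, user):
    """Suggest items bought by users whose purchases overlap with the user's."""
    mine = purchases[user]
    scores = {}
    for other, theirs in purchases.items():
        if other == user:
            continue
        overlap = len(mine & theirs)  # how similar the two profiles are
        if overlap == 0:
            continue
        for item in theirs - mine:  # only items the user does not yet own
            scores[item] = scores.get(item, 0) + overlap
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda item: (-scores[item], item))


purchases = {
    "alice": {"Networks", "Java", "Databases"},
    "bob": {"Networks", "Java", "PHP"},
    "carol": {"Cooking"},
    "dave": {"Java", "XML", "PHP"},
}
suggestions = recommend(purchases, "alice")
print(suggestions)
```

Here bob shares two purchases with alice and dave shares one, so items that they bought are weighted accordingly, while carol, who shares none, contributes nothing.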
For organizations, there is also an increased amount of targeted marketing, where complementary services can be offered. For example, a company which takes a holiday booking might also offer travel insurance to the user. Marketing can also be targeted at specific users, rather than the blanket marketing used at present. Most users now ignore the advertising banners which appear at the top of many WWW pages. This database approach also leads to savings in WWW development, as template pages are produced, and the content for these is generated from the database. With a non-database driven system, each different type of page must be coded separately. Changes in products can also be made quickly, as they only require a single change to the database, rather than changes over many pages.
Examples of personalization include:
Customized user interface. This could be with fonts, colors, layout, and structure.
Stimuli. This could be related to the way that the content is delivered, such as differing ways of delivering content using images, video or audio.
Personalized content. This allows the user to select the types of content they wish to receive (such as with My Yahoo), such as news, sports, events, and so on.
Personalized services. This would allow registered users more access to services than a guest user.
Remembering details. This allows the details of the user to be stored. Many sites now remember the name of the person who is accessing the site. These details are typically stored as a text file, called a cookie, on the user's computer. When the user goes back to the site, the cookie is loaded back, and the details of the user can be remembered. Secure information, such as credit card details, will be stored on a secure database.
Match services or products. Services can be exactly matched to the user's preferences.
Pre-emptive customer service. With this, organizations can predict the requirements of the user, such as providing a graduation gown service in the month of June in the year that they graduate from college.
Product suggestions. This is based on products that the user has bought in the past, or ones that similar users have purchased.
Cater for individual needs. This caters for special needs that only apply to a few customers.
A good example of personalization is on the Dell.com WWW site (see Figure 1). With this the user enters the system service tag (or express service code) of their computer and the WWW site generates pages which relate specifically to the product. This allows Dell to provide details of software downloads, hardware updates, and so on, specifically for the computer. This overcomes one of the most annoying problems with product support, which is trying to find the correct documentation and software downloads for a specific product. It also allows Dell to quickly target specific products for bug fixes, and product updates.
Figure 1: Dell site
The Dell database also keeps track of the complete history of a product, as the service tag is a unique code. This is shown as a text code, and also as a bar code, from which the servicing department can easily update and retrieve data on the system. The database knows the specification of the computer, and when it was shipped from Dell. It will also track the product and store details of its operating system, its specification, and so on. These provide important information for the Dell Support team, and make it easier for them to make the correct decisions in providing help. Database systems are even used to track a product that is being fixed, as the user can contact Dell, and they are able to track the actual location of the system, and its current status.
An example of a Dell cookie is (stored as firstname.lastname@example.org):
The personalization can be achieved in a number of ways. The simplest is to use cookies, which are stored on the user's computer. Cookies are simple text files (typically stored as TXT files in the WINDOWS\COOKIES folder on Microsoft Windows), and contain relevant details on the user and any of their preferences. As these are text files, they cannot do any damage to the local computer. Sometimes users delete these cookies, so that all the previous information is lost, and they must then re-register for the system to be able to store their details. Other systems store details of the user on the server, which allows the user to move around the Internet and still get access to their logged data. A good example of this is MSN Messenger, which allows the user to log in with a passport, after which their contacts and other data are downloaded from the server. This allows the user to read their e-mail or contact their buddies from any location on the Internet. As the user's details and preferences (such as where they live, and their favorite links) are stored on a server, they are not lost when the user erases the cookies on their own computer. Unfortunately the logon process can be quite time consuming, as the program must contact the server, which is likely to be processing many other logins. Problems can also occur when the server goes down, as users will not be able to log in. These types of servers are targets for DoS (denial-of-service) attacks, and can be made to slow down their processing if they have to respond to too many logins at a time.
With cookies the data stored can involve:
User profile. This would typically store details of the user, such as their common name, date of birth, and so on. This could also store the user's login name and password (obviously these would not be shown in the cookie in a text form, and would be encoded in some way, so that the user's login and password details cannot be viewed from their cookies).
Session details. This would store the date and time that the user last accessed the site, and the time they have spent there.
Customer identification. This might contain a unique customer identification, which can be used to match up with the organization's database entry.
Advertising profiles. This typically defines the adverts that have already been displayed. For example, many sites show an initial 'flash' screen which is useful initially to present a good image for the organization, but should not be displayed again once the user has already seen it. If the user has viewed it, then the cookie stores this information, so that it will not be displayed again (obviously if they were to delete the cookie, it would appear again).
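The kinds of cookie data listed above can be sketched with Python's standard http.cookies module. The cookie names customer_id, last_visit and intro_seen, and the two helper functions, are assumptions chosen for this example; a real site would pick its own names and would not place unencoded login details in a cookie.

```python
from http.cookies import SimpleCookie


def make_cookie(customer_id, last_visit, intro_seen):
    """Build the cookies a server might send back to the browser."""
    cookie = SimpleCookie()
    cookie["customer_id"] = customer_id  # key into the organization's database
    cookie["last_visit"] = last_visit    # session details
    cookie["intro_seen"] = "1" if intro_seen else "0"  # advertising profile
    for name in cookie:
        cookie[name]["path"] = "/"
    return cookie


def should_show_intro(cookie_header):
    """Decide from the browser's Cookie header whether to show the intro screen."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("intro_seen")
    return morsel is None or morsel.value != "1"


cookie = make_cookie("C1042", "2001-03-01T10:15:00", intro_seen=True)
print(cookie.output())  # the Set-Cookie header lines
```

If the user deletes the cookie, the browser sends no intro_seen value, and should_show_intro returns True, so the screen appears again, as described above.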
As cookies are just simple text files, they cannot pass on information about a user's computer, other than the information contained within the cookie. Also, the cookie generated by one WWW site cannot be used by another WWW site, as it can only be read by the WWW site that created it. Thus a site's own cookies cannot, by themselves, be used to track users around the Internet. The greatest drawback with cookies is that users typically do not get the opportunity to choose whether they want the cookie stored to their local disk, or not.
An example of the usage of centralized market information is DoubleClick, who specialize in generating banner advertisements which are aimed at specific users. With this, companies who subscribe to DoubleClick have a cookie request from DoubleClick on their page. If the user has an existing cookie, it is read for the user's details (otherwise a new one is generated). As DoubleClick has many organizations subscribing to it, it can search for the types of sites that the user has most frequently accessed. The user will then receive a targeted banner advertisement which is most relevant to them. Over time the advertising will become more focused as DoubleClick learns more about the user. Note that all the cookies will be sent and received by DoubleClick, and not by the organization that subscribed to them.
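The general mechanism described above, in which one advertising server sees the same cookie from many subscribing sites, can be sketched as follows. This is a simplified model and not DoubleClick's actual system: the AdNetwork class, the site categories and the banner text are all invented for illustration.

```python
import uuid
from collections import Counter, defaultdict


class AdNetwork:
    """Toy advertising server that profiles users by its own cookie."""

    def __init__(self):
        # cookie id -> count of visits per category of subscribing site
        self.profiles = defaultdict(Counter)

    def serve_banner(self, cookie_id, site_category):
        """Record the visit and pick a banner for the most visited category."""
        if cookie_id is None:
            cookie_id = uuid.uuid4().hex  # no existing cookie: generate one
        profile = self.profiles[cookie_id]
        profile[site_category] += 1
        top_category = profile.most_common(1)[0][0]
        return cookie_id, "banner for " + top_category


network = AdNetwork()
cookie_id, first = network.serve_banner(None, "cars")     # visit to a car site
_, second = network.serve_banner(cookie_id, "travel")     # a travel site
_, third = network.serve_banner(cookie_id, "travel")      # another travel site
print(first, "/", third)
```

The subscribing sites never see the cookie. Only the advertising server does, which is why it can build a profile that spans all of them.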
Data warehousing is a method which stores vast quantities of raw data, such as that generated from log files. This data can then be prepared and reformatted for a data mining process, which will try to create meaningful information from the raw data. This is similar to traditional paper-based storage, but it is obviously easier to store large amounts of electronic data in a small physical space. For example, exam papers could be marked to get the final exam mark. The raw data for this would be the actual exam papers. This data could be analyzed for the average number of words per question, or the average mark for each question, or the number of pages used, and so on. With the raw data, it is possible to analyze the data in many different ways, and find new insights in it. Without the raw data it is often difficult to run different analyses. Another example relates to car sales. With the raw data on car sales, it would be possible to determine the percentage of people within a certain street that bought red cars, or the number of people in a city that bought a blue, 2000cc car, or the number of people with a surname that began with the letter 'C' that bought a Ford van. All this data in warehouses will make marketing more refined in the future.
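The car sales example can be made concrete with a few raw records and the three analyses mentioned above. The records and field names below are made up for illustration. The point is only that, with the raw data kept, new questions can be asked without collecting anything new.

```python
# Hypothetical raw sales records, one per car sold.
sales = [
    {"surname": "Clark", "street": "High Street", "city": "Edinburgh",
     "make": "Ford", "body": "van", "colour": "white", "engine_cc": 2000},
    {"surname": "Smith", "street": "High Street", "city": "Edinburgh",
     "make": "Rover", "body": "saloon", "colour": "red", "engine_cc": 1600},
    {"surname": "Campbell", "street": "Mill Lane", "city": "Edinburgh",
     "make": "Fiat", "body": "saloon", "colour": "blue", "engine_cc": 2000},
    {"surname": "Jones", "street": "High Street", "city": "Glasgow",
     "make": "Ford", "body": "saloon", "colour": "red", "engine_cc": 1800},
]


def matching(records, **criteria):
    """Return the records whose fields equal every given criterion."""
    return [r for r in records if all(r[k] == v for k, v in criteria.items())]


# Percentage of buyers in one street who bought a red car.
street = matching(sales, street="High Street", city="Edinburgh")
percent_red = 100.0 * len(matching(street, colour="red")) / len(street)

# Number of people in a city who bought a blue, 2000cc car.
blue_2000 = len(matching(sales, city="Edinburgh", colour="blue", engine_cc=2000))

# Number of people whose surname begins with 'C' who bought a Ford van.
c_ford_vans = sum(
    1 for r in matching(sales, make="Ford", body="van")
    if r["surname"].startswith("C")
)

print(percent_red, blue_2000, c_ford_vans)
```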
There is obviously a very fine line between personalization and personal intrusion, and the collection of data must comply with current laws. Unfortunately many data collection programs confuse the user by displaying a great deal of text, which the user is asked to read, and then agree to. Most users now, typically, just click the accept button without even bothering to read the agreement statement. For example, when was the last time that someone actually read the license agreement for a software program that they had just bought?
Data mining methods include:
Anonymous profile data. This is generated whenever a user contacts a site, and might contain the network address (IP address), domain name, ISP, WWW browser version, and so on.
Cookies. These provide information on the user.
Monitoring newsgroups and chat rooms. This can be used to determine information on the user. For example, if the user subscribes to many job-related newsgroups, then there is a good chance that the user is actively looking for another job.
Self-divulgence of information for a purchase. This is data that is completed when purchasing a product.
Self-divulgence of information for free merchandise. This typically relates to on-line prize draws, where the user completes a form, in order to win a prize or receive a free gift.
Self-divulgence of information to access a web site. This is where the user subscribes to a WWW site, and fills in a form.
A registration form is an excellent method for an organization to get user data, and is an opportunity for the organization to ask questions about the user, which they could use for marketing purposes. For example, how many times have you been asked if you were male or female when you registered for a WWW site? It should not matter to the registration if you were male or female. So why do they ask? For marketing and data mining purposes. This form of data mining is explicit, where the user actually knows that their data is being stored. Many users do not like this form of data mining, as they feel that it is obtrusive. A newer form of data mining is implicit, where the user does not know that they are being monitored for their usage patterns. This includes cookies, but WWW sites can also monitor how the user moves through a site, and the pages that they are most likely to spend time on. For example, if the user on a bank site spends more time looking at the corporate page, then they may possibly be interested either in buying stocks in the company, or in looking for a job with them.
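Monitoring how a user moves through a site can be sketched as a calculation of the time spent on each page from a time-ordered list of page requests. The page names and timings below are invented, and the function assumes a single session. The time spent on the final page is unknown, as there is no later request to measure it against.

```python
from collections import defaultdict


def time_per_page(clicks):
    """Total seconds spent on each page, from (time_in_seconds, page) pairs."""
    totals = defaultdict(int)
    # Pair each request with the next one; the gap is the time on that page.
    for (start, page), (next_start, _) in zip(clicks, clicks[1:]):
        totals[page] += next_start - start
    return dict(totals)


# One hypothetical session on a bank's WWW site.
session = [
    (0, "/home"),
    (20, "/corporate"),
    (200, "/accounts"),
    (230, "/corporate"),
    (400, "/logout"),
]
totals = time_per_page(session)
favourite = max(totals, key=totals.get)
print(totals, favourite)
```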
Future technologies may include spyware. With this, the code of a WWW page contains graphics files which are invisible to the user (as they may only contain a few pixels), but which are resident on a data mining server. When the page is loaded the data mining server is contacted, and the details of the access can be logged. This will give details of where the user is located, their network address, their browser details, and so on. This could also be applied to e-mails which contain graphics that are held on data mining servers. The server can then log the accesses to the graphics, and thus log when the e-mail was read, and from which location.
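The invisible graphic technique can be sketched in two parts: the tag embedded in the page or e-mail, and the record that the data mining server could log when the graphic is requested. The server name tracker.example, the file name pixel.gif and the id parameter are all hypothetical, and the request is modelled as a plain dictionary using CGI-style key names rather than a real server.

```python
from html import escape
from urllib.parse import parse_qs, urlencode


def pixel_tag(server, message_id):
    """HTML for a 1x1 image whose URL identifies the page or e-mail."""
    url = server + "/pixel.gif?" + urlencode({"id": message_id})
    return '<img src="{}" width="1" height="1" alt="">'.format(escape(url, quote=True))


def log_access(request, when):
    """Build the log record the server could keep for one image request."""
    params = parse_qs(request.get("QUERY_STRING", ""))
    return {
        "message_id": params.get("id", [None])[0],
        "network_address": request.get("REMOTE_ADDR"),
        "browser": request.get("HTTP_USER_AGENT"),
        "time": when,
    }


tag = pixel_tag("http://tracker.example", "msg-42")
record = log_access(
    {"QUERY_STRING": "id=msg-42", "REMOTE_ADDR": "192.0.2.7",
     "HTTP_USER_AGENT": "ExampleBrowser/1.0"},
    when="2001-03-01T10:15:00",
)
print(tag)
print(record)
```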
Note: This essay was inspired by the excellent MSc thesis by Deborah Crompton [Technology and Implications of Dynamically Personalised Web-Applications, School of Computing, Napier University, March 2001], for which I was lucky enough to be part of the examination team for the MSc Viva. It thus contains some material which was used in the thesis. In my opinion it was one of the best MSc theses that I have ever read, and was interesting and extremely focused on achieving a goal.