Friday, December 17, 2010

Apache log: log analysis

Although the log files contain a great deal of useful information, that information only becomes fully valuable after thorough mining.

This article first discusses what information can and cannot be obtained from the log files, then introduces several excellent log analysis tools and shows how to write your own programs to analyze log files.

1. What information can be obtained (4 April)

In the preceding articles of the "Apache logs" series, we discussed the standard Apache log files, the access log and the error log, and how to customize them. This article goes on to discuss how to analyze log files for valuable statistical information. The problem we face is that although the log files contain a wealth of information, in raw form that information is of little direct help in managing and planning a site.

To manage and plan a Web site, we need to know how many people visit it, what they look at, how long they stay, where they learned about the site, and so on. All of this information is hidden (or may be hidden) in the log files. The site's operators would also like to know each visitor's name, address, shoe size, and even credit card number, but that information cannot be obtained from the log files. As technicians, we must know how to explain this to them: not only is this information impossible to obtain from the log files, but the only way to obtain it is to ask the visitor directly and be prepared to be refused.

There is a great deal of information that the log files can record, including:

The address of the remote machine: "the remote machine's address" is almost, but not quite, the same as "who is visiting the site". Specifically, the remote machine's address tells us where a visitor comes from; for example, it might be buglet.rcbowen.com or proxy01.aol.com.

Time of visit: when do visitors access the site? We can learn a lot from the answer to this question. If most visitors come between 9:00 a.m. and 4:00 p.m., we can assume that most of them visit during working hours; if most access records fall between 7:00 p.m. and midnight, we can be fairly confident that visitors are browsing from home. Of course, a single access record tells us very little, but from thousands of records we can derive useful and important statistics.

Resources accessed: which parts of the site are most popular? Those are the parts we should continue to develop. Which parts are always neglected? The neglected parts may be hidden too deeply, or they may simply not be interesting; either way, we need to find ways to improve them. Of course, some content, such as legal notices, should not be changed arbitrarily even though few people visit it.

Broken links: the log files can also tell us which things are not working as we expect. Are there broken links within the site, or links from other sites that point to the wrong place? Are URLs mistyped? Are there CGI programs that do not run? Is a search engine crawler issuing thousands of requests per second and disrupting the site's normal service? The answers to these questions can be found in the log files.
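
As a rough illustration of how these questions might be answered programmatically, here is a minimal Python sketch. It assumes the access log uses Apache's Common Log Format (host, identity, user, timestamp, request line, status, size) and that month names in the timestamps are in English. The function names (`parse_line`, `summarize`) and the sample log lines are made up for this example; a customized log format would need a different pattern.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout: Common Log Format
#   host ident user [day/Mon/year:HH:MM:SS zone] "request" status size
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return a dict for one log line, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    when = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    parts = match.group("request").split()
    # The request line is normally "METHOD path PROTOCOL"; fall back to the
    # raw request string if it is malformed.
    path = parts[1] if len(parts) >= 2 else match.group("request")
    return {
        "host": match.group("host"),
        "time": when,
        "path": path,
        "status": int(match.group("status")),
    }


def summarize(lines):
    """Count requests per remote host, per hour of day, per path, and 404s per path."""
    hosts, hours, paths, not_found = Counter(), Counter(), Counter(), Counter()
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue  # skip lines that are not in the assumed format
        hosts[record["host"]] += 1
        hours[record["time"].hour] += 1
        paths[record["path"]] += 1
        if record["status"] == 404:
            not_found[record["path"]] += 1
    return {"hosts": hosts, "hours": hours, "paths": paths, "not_found": not_found}


# Made-up sample data for illustration only.
SAMPLE_LOG = [
    'buglet.rcbowen.com - - [17/Dec/2010:09:15:02 -0500] "GET /index.html HTTP/1.0" 200 2326',
    'buglet.rcbowen.com - - [17/Dec/2010:09:15:40 -0500] "GET /missing.html HTTP/1.0" 404 209',
    '192.0.2.10 - - [17/Dec/2010:20:30:11 -0500] "GET /index.html HTTP/1.0" 200 2326',
    'this line is not a valid log entry',
]

if __name__ == "__main__":
    stats = summarize(SAMPLE_LOG)
    print("Busiest hours:", stats["hours"].most_common(3))
    print("Most requested paths:", stats["paths"].most_common(3))
    print("Paths returning 404:", stats["not_found"].most_common(3))
```

To run this against a real log, the sample list could be replaced with an open file object, since `summarize` only needs an iterable of lines. Counting by hour uses the time as recorded in the log, including its own UTC offset, so the result reflects the server's clock rather than each visitor's local time.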
