Identifying Sources of Bandwidth Usage with AWStats

Link: https://support.brilliantdirectories.com/support/solutions/articles/12000101333-identifying-sources-of-bandwidth-usage-with-awstats

AWStats, a powerful web analytics tool integrated into cPanel, offers a wealth of information about a website's traffic. This article explains how to use AWStats to pinpoint traffic sources and identify which ones consume the most bandwidth.


Accessing AWStats


  1. In the Admin area of the website, navigate to Developer Hub > cPanel Dashboard to log in to cPanel.


  2. Navigate to the "Metrics" section and click on the "AWStats" icon.


  3. Select the domain you want to analyze from the dropdown menu and click "View."


  4. At the top of the screen, use the "Reported Period" option to choose "Monthly" and select the month and year to view. The current month is selected by default.



Identifying Sources of Bandwidth Consumption


The AWStats interface contains a large amount of information about the website's traffic. The following sections are the most useful for finding the primary sources of bandwidth consumption.


"Summary" Section


This section gives a high-level overview of how much traffic the site received from humans ("Viewed traffic") and bots ("Not viewed traffic").



Key Takeaways / Action Items



The most important data point here is the "Bandwidth" column, which shows how much bandwidth has been consumed by humans versus bots.


If the majority of the traffic is coming from humans, the "Locales" and "Hosts" sections below provide more information about the specific sources of that traffic and whether there are opportunities to block specific IPs or countries.


If the majority of the traffic is coming from bots, the "Robots/Spiders visitors" section below lists the specific bots that are consuming the bandwidth. This helps identify bots that can be blocked in the robots.txt file to prevent future bandwidth usage.



"Locales" Section


This section provides a breakdown of visitors based on their geographical location. You'll see the number of visits and the percentage of total traffic each country contributes.  



Key Takeaways / Action Items



If many visitors come from countries outside the website's target market, blocking traffic from those countries can have a large impact on the site's bandwidth usage.
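
Country-level blocking can sometimes be done with .htaccess rules. Below is a minimal sketch, assuming the server has Apache's GeoIP module enabled so that the GEOIP_COUNTRY_CODE variable is set on each request (availability varies by host; Cloudflare and similar services also offer country blocking in their dashboards). Replace XX and YY with the two-letter codes of the countries identified in the "Locales" section:

    # Tag requests from the countries to block (XX and YY are placeholders)
    SetEnvIf GEOIP_COUNTRY_CODE XX BlockCountry
    SetEnvIf GEOIP_COUNTRY_CODE YY BlockCountry

    # Apache 2.4 syntax: allow everyone except tagged requests
    <RequireAll>
        Require all granted
        Require not env BlockCountry
    </RequireAll>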


"Hosts" Section



This section lists the IP addresses that have accessed the site, which can help identify the specific IPs, companies, organizations, or internet service providers (ISPs) consuming bandwidth.



Key Takeaways / Action Items



Specific IPs can sometimes consume an outsized portion of a site's bandwidth. Use a tool like https://www.ip2location.com/ to look up a specific IP and find out what country it is from, whether it is associated with a specific company or ISP, and more.


If a single IP is found to be consuming a large amount of bandwidth and it is not a desirable source of traffic, that IP can be blocked.
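
On most cPanel installations this can be done with the "IP Blocker" tool in the Security section. Alternatively, a rule can be added to the site's .htaccess file directly. Below is a minimal sketch using Apache 2.4 syntax, where 203.0.113.45 is a placeholder IP from the documentation range:

    # Allow everyone except the single abusive IP
    <RequireAll>
        Require all granted
        Require not ip 203.0.113.45
    </RequireAll>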


Note: If a website is using Cloudflare, Sucuri, or another DNS provider that acts as a proxy, the IPs listed here will belong to the proxy service rather than the original visitors that made the requests. More information about the IPs accessing the site can be found in the DNS provider's dashboard.
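
For context on why this happens: Cloudflare forwards each visitor's original address in a CF-Connecting-IP request header, and servers with Apache's mod_remoteip module configured can log that address instead of Cloudflare's. This is server-level configuration that is typically not editable on shared cPanel hosting, shown here only as a sketch:

    # Server-level Apache sketch (mod_remoteip); Cloudflare's published IP
    # ranges should also be listed as trusted proxies for this to be safe.
    RemoteIPHeader CF-Connecting-IP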


"Robots/Spiders visitors" Section


This section shows the "user agent" of each bot that has visited the site and how much bandwidth each bot has consumed.



Key Takeaways / Action Items



If the "Summary" section described above indicates that most of the bandwidth usage of the website is coming from bots, this section will help identify which specific bots are consuming the bandwidth so a decision can be made whether to block them or not.


Once an undesirable bot is identified, it can be blocked in the robots.txt file.
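
For example, a single crawler can be disallowed by its user agent. Below is a minimal sketch using "ExampleBot" as a hypothetical user agent; substitute the bot name reported by AWStats:

    # Block one specific crawler from the entire site
    User-agent: ExampleBot
    Disallow: /

Keep in mind that robots.txt is advisory: well-behaved bots honor it, but abusive bots may ignore it, in which case blocking by IP (see the "Hosts" section above) is the fallback.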


The default robots.txt file blocks ALL bots except for Google, Yahoo, Bing, and Twitter/X. Using the default robots.txt rules is recommended for most websites and will prevent a large number of undesirable bots from crawling the site and consuming bandwidth.
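
As an illustration of that allowlist pattern (a simplified sketch, not the platform's exact default file), named bots are given full access with an empty Disallow line, and every other bot is blocked:

    # Allow specific crawlers (an empty Disallow means "allow everything")
    User-agent: Googlebot
    Disallow:

    # Slurp is Yahoo's crawler
    User-agent: Slurp
    Disallow:

    User-agent: Bingbot
    Disallow:

    User-agent: Twitterbot
    Disallow:

    # Block all other bots from the entire site
    User-agent: *
    Disallow: /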


Important: Facebook has a bot called "facebookexternalhit". This bot must be allowed in the robots.txt file in order for pages on the website to be shared on Facebook. However, it is known for consuming an enormous amount of bandwidth, far beyond what is required to share content on Facebook, and it is blocked by default. More information about the facebookexternalhit bot is available in the related support article.
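
If Facebook sharing is required, the bot can be re-allowed with an entry like the following sketch, weighing the sharing functionality against the bandwidth cost described above:

    # Re-allow Facebook's link-preview crawler (empty Disallow allows all)
    User-agent: facebookexternalhit
    Disallow: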