



Weighing the Internet

In 1798 Henry Cavendish, known for his scientific brilliance and terrible fear of women, developed a system for calculating the gravitational constant (G) by measuring the gravitational attraction between two small spheres. In essence, he was able to "weigh the earth" by comparing the relationship between two known objects.

This got me thinking about weighing the internet -- calculating the number of users online. Since I am by no means a brilliant scientist and am horribly attracted to women everywhere, there were obviously roadblocks in my path that Henry did not have to deal with.

Want to know how many internet users there are? Curious about how many people read a site like Slashdot every day? Read on!

Tools
Alexa has done a nice job collecting the browsing statistics for a sizeable sample of internet users. It’s not a perfect sample, as it relies on a browser plugin that requires a voluntary install, but it’s about as good a sample as is available.

Using Alexa, you can find the percentage of internet users that visited a particular site on a particular day. If we know the actual number of visitors that come to a particular site, and compare this with the Alexa data, we can extrapolate the total number of users on the internet for that day.

Reference Data
Now, measuring the gravitational attraction between relatively small masses is very difficult, because the actual gravitational force between tiny objects is infinitesimal. The larger the mass, the more measurable its gravitational effect. In other words, Cavendish needed some really large balls to weigh the earth. 350 pound balls, to be precise.

There are some similarities, here, to weighing the internet. Alexa data is only really valid for the top 100,000 sites, so you need the stats for a relatively large site to even attempt to make a measurement. Not a lot of sites in the top 100,000 are too keen about divulging their stats. This kind of information is what you might call "shock value bragging material," so it’s typically saved for special introductions and dinner party conversations. So when Vince and Eliot were podcasting about the number of daily hackaday viewers, I realized we now have the missing piece of the puzzle.

Results
The two of them seemed to be in a bit of a disagreement as to the number of page hits hackaday receives daily, with Eliot figuring 65k and Vince figuring 80k. Assuming they both were making a reasonable estimate, I’m going to average that to a whopping 72,500 page hits a day on hackaday.

This was sometime around July 8th. According to Alexa, around that same time they had a reach of about 110 people per million users. On average, people who visited hackaday viewed roughly 1.4 pages.

So we can figure out the number of people who view hackaday by dividing 72,500 by 1.4, which gives us roughly 51,800 daily viewers. The 110 per million figure tells us that they get about .011 percent of the internet’s viewers. 51,800 divided by .00011 leaves us with a result of about 471 million internet users.
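
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the post's calculation; the function and parameter names are made up for the example, and the inputs are the figures quoted above (72,500 page views, 1.4 pages per visitor, a reach of 110 per million).

```python
def estimate_internet_users(page_views, pages_per_visitor, reach_per_million):
    """Extrapolate total daily internet users from one site's traffic."""
    # Unique daily visitors: page views divided by pages viewed per visitor.
    visitors = page_views / pages_per_visitor
    # Alexa reach as a fraction of all users (110 per million -> 0.00011).
    reach_fraction = reach_per_million / 1_000_000
    return visitors / reach_fraction


hackaday_visitors = 72_500 / 1.4
total_users = estimate_internet_users(72_500, 1.4, 110)

print(f"hackaday daily visitors: {hackaday_visitors:,.0f}")
print(f"estimated internet users: {total_users / 1e6:,.0f} million")
```

Without the intermediate rounding to 51,800, the result still lands at about 471 million.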

With this knowledge, you can easily estimate the traffic to other sites. If we go by the 471 million estimate, Slashdot gets a whopping 380,000 daily readers.
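
Going the other direction is just a multiplication. The post does not quote Slashdot's Alexa reach, so the sketch below backs it out of the 380,000 figure instead of assuming one. The helper names are hypothetical.

```python
TOTAL_USERS = 471_000_000  # the post's estimate of daily internet users


def daily_readers(reach_per_million, total_users=TOTAL_USERS):
    """Estimated daily readers of a site, given its Alexa reach per million."""
    return total_users * reach_per_million / 1_000_000


def implied_reach_per_million(readers, total_users=TOTAL_USERS):
    """Reach per million implied by a daily reader count (inverse of the above)."""
    return readers / total_users * 1_000_000


# Reach implied by the 380,000 Slashdot readers quoted in the post.
slashdot_reach = implied_reach_per_million(380_000)
print(f"implied Slashdot reach: {slashdot_reach:.0f} per million")

# Sanity check: hackaday's 110 per million should give back ~51,800 readers.
print(f"hackaday readers: {daily_readers(110):,.0f}")
```

So the 380,000 figure corresponds to a reach of roughly 800 per million, provided the 471 million estimate holds.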

A Perplexing Conclusion
Unless my math is wrong, the result is way off from the 880+ million users Nielsen/NetRatings reports. Even if we go with Vince’s 80k/day estimate, that still leaves us with only 519 million users. It could be that the Alexa reach is exaggerated due to hackaday readers having the Alexa toolbar installed more than average, but I highly doubt that. I ran the numbers with BlogCadre (a statistically much smaller sample) from when we got boingboinged and it came out similarly, around 520 million users.

It would appear that either Nielsen is pretty inflated, or the Alexa toolbar is unavailable to a very large population of people -- a population that disproportionately doesn't read hackaday when compared to the population the toolbar is exposed to.

Cavendish’s measurement with the 350 pound lead spheres was so accurate that it stood for over a century. It had only a 1% error. It makes me jealous.

But who is off? Alexa? Nielsen? Both? Is hackaday still not a large enough site to produce statistically valid results? Are the available web use statistics really that inaccurate? I’m looking forward to your comments.

update 7-14-2005
After having a night to sleep on the problem, I realized there probably isn't one at all.

What is happening is that we are measuring a different value than what is reported by Nielsen. We are measuring the number of people that use the internet on a particular day. Nielsen is measuring the number of people who have access to a connection.

From what we calculated, it would appear that roughly 41 percent of internet users did not log in that day. It would be interesting to know the same stats for TV viewers. I.e. what percentage of all people with access to a TV don't watch it on a given day.

If we knew this information, we could calculate the internet junky to couch potato index!
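
As a rough check on the 41 percent figure, here is the subtraction spelled out, assuming Nielsen's number is taken as exactly 880 million (the post says "880+") and using both of the daily estimates from the post. The variable and function names are illustrative only.

```python
NIELSEN_USERS = 880_000_000  # people with access, per the Nielsen figure quoted above


def offline_fraction(online_that_day, with_access=NIELSEN_USERS):
    """Share of people with access who did not go online that day."""
    return 1 - online_that_day / with_access


# 519 million comes from Vince's 80k estimate, 471 million from the 72.5k average.
from_vince = offline_fraction(519_000_000)
from_average = offline_fraction(471_000_000)

print(f"using 519 million: {from_vince:.0%} stayed offline")
print(f"using 471 million: {from_average:.0%} stayed offline")
```

The 41 percent matches the 519 million estimate; the 471 million average would put the stay-offline share closer to 46 percent.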

update 7-14-2005 - post slashdot
Thanks for coming everyone! I've had a little time to think about all of your great comments. It really got me to thinking about some further areas of study.

Trackback URL for this post:

http://www.blogcadre.com/trackback/302
from guilty green on July 15, 2005 - 1:13pm

Jason Striegel has some interesting information over on his blog about how many internet users there are out there. He uses a mathematical formula similar to the one Henry Cavendish used when he estimated the weight of the earth back in 1798. He uses ...

from Texas Venture Capital Blog on July 14, 2005 - 8:47pm

Jason Striegel’s post titled ‘Weighing the Internet’. He concluded that during any one single day 520 million users were on the web. Roughly 41% of internet users do not log on to the internet during any particular day. Interes...

from Silly Science on July 14, 2005 - 6:44pm

In the true spirit of the name of my blog I think this is of some crazy interest... Weighing the Internet

Non English speaking individuals?

Could you be missing the large population of say Japan and China that do not read websites only done in english?

-Ryan

Re: Non English Speaking Individuals

Jason Striegel's picture

The statistics would only be affected if both these conditions are satisfied:

  • there are large populations that Alexa isn't sampling
  • the proportion of the missing population that visits the site you are using for measurement is different from the Alexa-measured proportion

If that's the case, you could get an idea for which populations are missing from the survey by weighing the internet with a site that has a higher percentage of people from the missing population. Because these populations would affect only the site stats and not Alexa, your result would be higher.

In other words, find the sites that weigh the internet high and you've found a population that isn't sampled by Alexa.
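
Here is a toy model of that argument. All of the population sizes and visit shares are made-up numbers for illustration; the only thing taken from the discussion is the structure: the site's own stats count everybody, while Alexa's reach is measured on the sampled population only.

```python
def weighed_internet(sampled_pop, unsampled_pop, share_sampled, share_unsampled):
    """Internet 'weight' you would compute when Alexa misses a population."""
    # The site's real visitors come from both populations.
    visitors = sampled_pop * share_sampled + unsampled_pop * share_unsampled
    # Alexa only sees the sampled population, so its reach is share_sampled.
    return visitors / share_sampled


sampled, unsampled = 500e6, 300e6  # hypothetical: 800 million users in total

same = weighed_internet(sampled, unsampled, 0.0001, 0.0001)
popular = weighed_internet(sampled, unsampled, 0.0001, 0.0003)
absent = weighed_internet(sampled, unsampled, 0.0001, 0.0)

print(f"same share in both groups:       {same / 1e6:,.0f} million")
print(f"3x as popular with missing group: {popular / 1e6:,.0f} million")
print(f"missing group never visits:       {absent / 1e6:,.0f} million")
```

In this model the estimate works out to sampled + (share_unsampled / share_sampled) * unsampled. It is exact when both groups visit at the same rate, too high when the missing group visits more, and it shrinks to the sampled population alone when the missing group never visits.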

speaking of which

Jason Striegel's picture

The Alexa toolbar is only available for IE. One might expect that a disproportionately high number of hackaday readers use Firefox. This should only inflate the measurement, though, which adds to my suspicion that somebody's numbers are wildly fudged.

Here's a more interesting thought, however. Our measurement is weighing the number of people who use the internet in a day. Perhaps the Nielsen report is higher because it includes people that rarely use the internet. Or maybe they are counting all the coasters that AOL has sent out over the years!

uniqueness

This also is a relatively high number, as you are assuming that your site visitors are totally different from other sites' visitors. Let's say my site receives 50000 daily visitors and your site 80000... are we talking about 130k total visitors? nah... I see some "intersection" on the "Venn" circles.

hmm. not sure i understand what you are getting at.

Jason Striegel's picture

The measurement relies on the site being part of a statistical sample of the entire net's traffic.

Let's call all the people online per day 'I', the page views per day on a particular site 's', and the number of views per user for that particular site 'v', and the number of users per day for a particular site 'i':

i is equal to s/v. I is equal to i/p, where p is the fraction of all internet users that visit the site in question. So if Alexa tells us that hackaday had a p of 110 people per million (.00011) and a v of 1.4 and Vince tells us that s is 80,000, we have all the information we need.

i(hackaday) = 80,000/1.4 = 57143 = how many unique visitors hackaday gets in a day.

I = 57143 / .00011 = 519 million = how many people used the internet that day.

Now, the potential error lies in Vince's page view estimate, and more importantly, whether Alexa is providing a statistically valid sample of the entire internet. I expect the former is pretty close, and the latter I really don't know.
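
To see how much the page view estimate matters, here is the same formula (I = s / v / p) run over Eliot's guess, Vince's guess, and their average. The v and p values are the Alexa figures quoted above; the dictionary and variable names are just for the sketch.

```python
PAGES_PER_VISITOR = 1.4  # v, from Alexa
REACH = 0.00011          # p, 110 people per million

page_view_guesses = {"Eliot": 65_000, "average": 72_500, "Vince": 80_000}

results = {}
for name, s in page_view_guesses.items():
    i = s / PAGES_PER_VISITOR  # unique visitors to the site per day
    results[name] = i / REACH  # I, people using the internet that day
    print(f"{name:>7}: s = {s:,} -> I = {results[name] / 1e6:.0f} million")
```

The spread between the two guesses moves the result from about 422 million to about 519 million, so the estimate scales directly with whatever page view number goes in.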

We could find out more about who Alexa is missing by performing this calculation for a number of different sites whose user base we suspect is disproportionately left out of the Alexa sample. Those sites would weigh the internet much heavier if our assumption were true that Alexa isn't sampling their users.

You could add some numbers...

http://news.bbc.co.uk/1/hi/technology/4630867.stm

That article shows 100 million Chinese online and more than 100 million in the USA, meaning we already have over 200 million.
BBC News also lists 15.6 million households with internet, so if you assume 4 people in a house, that makes 62.4 million UKers.

So our total so far could be almost 300 million, and that counts only the UK, USA, and China.

That doesn't work

There are fewer than 62.4 million people living in the UK... although all technically have some kind of internet access, even if they have to take a trip to the library or an internet cafe.

Actually, it makes more sense than I originally supposed

Jason Striegel's picture

What we're really measuring is the number of people online in a day. This number is bound to be smaller than the total number of people who have the capability of being online. From these calculations, if the Nielsen report is correct, it would appear that on this particular day there were roughly 360 million people (880 million minus 519 million) that decided not to log in.

Maybe they were watching TV.

I know

Maybe you are gay

Or just STUPID

I pity the fool!

Jason Striegel's picture

At least I know how to rap, Lawrence.

Suck my wang with gusto


Same using BBC news data

According to Alexa, BBC News has a daily reach of about 20,000 per million. After the London bombings last week, that shot up to about 32,000, or 0.032 of all users.

Now according to this article, the BBC news website had a record 115 million page views last Thursday, so with 5.9 page views per user (from Alexa), that's 19.49 million users.

Dividing 19.49 by 0.032 gives 609M.

Of course, something is totally out of whack because that article also states that the number of page views was 5 times normal, but that isn't reflected in either the reach or page views per user reported by Alexa.
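
For anyone who wants to rerun the BBC numbers, here is the same calculation as a short script. The inputs are the ones quoted in this comment (115 million page views, 5.9 pages per user, a reach of 32,000 per million); the variable names are made up.

```python
page_views = 115_000_000     # record day reported in the article
pages_per_user = 5.9         # from Alexa
reach_per_million = 32_000   # Alexa reach after the spike

# Unique users that day, then scale up by the fraction of all users reached.
bbc_users = page_views / pages_per_user
internet_users = bbc_users / (reach_per_million / 1_000_000)

print(f"BBC News users that day: {bbc_users / 1e6:.2f} million")
print(f"estimated internet users: {internet_users / 1e6:.0f} million")
```

Using the normal-day reach of 20,000 per million with the record page views would not be a fair comparison, since the page view count and the reach need to come from the same day.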

However...

Your numbers are skewed from the get-go simply by using Alexa.
How many people do YOU know who use Alexa?
Me? None. Have I used it? Yes, many, MANY moons ago.
Using numbers from Alexa means basing your measurement on a statistically insignificant number of users from square one. (How many of the total internet users use Alexa? I would bet 1 in a million.)
Just sayin'

Maybe the atomic weight of internet

the internet is just the electrons and photons that travel over the fiber and wire. So what is the atomic weight of the internet?

Re: Maybe the atomic weight of internet

>> the internet is just the electrons and photons that
>> travel over the fiber and wire. So what is the atomic
>> weight of the internet?

42?

are you high, Clarie?

Alexa has done a nice job collecting the browsing statistics for a sizeable sample of internet users. It’s not a perfect sample, as it relies on a browser plugin that requires a voluntary install, but it’s about as good a sample as is available.

Using Alexa, you can find the percentage of internet users that visited a particular site on a particular day. If we know the actual number of visitors that come to a particular site, and compare this with the Alexa data, we can extrapolate the total number of users on the internet for that day.

You assume, incorrectly:
1: The number of people using Alexa contrasted to the number of people hitting web sites vs american 'geek'/'popular' websites provides a good basis of statistics. HAH!
2: For the love of g-d, the web is NOT the internet. Repeat with me: WEB IS NOT THE INTERNET. The web is merely one service that has adopted a protocol that rides along the internet waves. How many messengers, FTP services, even gopher and YP, and other services exist that are not web-centric? OMGWTFLOL, The intarwebs is NOT 'the internet'.

From what we calculated, it would appear that roughly 41 percent of internet users did not log in that day. It would be interesting to know the same stats for TV viewers. I.e. what percentage of all people with access to a TV don't watch it on a given day.

Log on? HAHAH. Persistent connections, such as DSL and cable... how do you ascertain if someone's "logged on" or not? What constitutes logging on? Merely obtaining an IP, having a connection to your provider, or making a connection to some service out there? Heaven forbid my cable TV company try to gather statistics on what I watch ("he must LOVE HBO, it's on 24x7"), because I generally just kill the TV and stereo, leaving the cable box turned on and active.

Stupid fluff 'studies' and crack analysis idjuts. And blasted /. for thinking this was a worthy posting.

ah yeah...

Stella's picture

get a sense of humor!

humor

Show us something funny, and we'll laugh. Being presented someone's cranial-rectal syndrome is not necessarily a laugh riot.

Logged on and riding the internet waves

Jason Striegel's picture

Thanks for adopting your protocol of choice and riding the internet waves all the way over here to comment on my crack idjut fluff study. It means the world to me and I appreciate your thoughtful response.

The number of people using Alexa contrasted to the number of people hitting web sites vs american 'geek'/'popular' websites provides a good basis of statistics. HAH!

Well, Alexa has some of the best available browsing statistics. It's not perfect, but please let me know when you find better. Hackaday has an audience large enough to provide a stable comparison point. So yeah, knowing Hackaday's users per day along with its percentage of internet users (measured by web use) provides us with some interesting information. Run the numbers on another large site that you have page view information for - I'd like to compare the results.

For the love of g-d, the web is NOT the internet. Repeat with me: WEB IS NOT THE INTERNET. The web is merely one service that has adopted a protocol that rides along the internet waves. How many messengers, FTP services, even gopher and YP, and other services exist that are not web-centric?

Oh, snap! Damn... I really should have based an internet usage measurement on the number of people who use gopher. It's making a huge comeback, I hear.

I asked around and 10 out of 10 people like to ride their internet waves on the web. That's in addition to other things people do, but we're counting people, not protocol usage, right?

Speaking of protocols, thank goodness the web finally adopted one. I don't know how it ever rode internet waves around before.

What constitutes logging on?

In the context of the article, it constitutes using a web browser. You know, to look at a website like slashdot, to read an article, or maybe to post a sad flame in an attempt to ride on someone else's coat tails instead of writing something interesting of your own.

Maybe try another source...

Perhaps you can run your numbers using another page. Try Ranking.com - they use Alexa and combine it with other stats to provide their own. Alexa tried to do things their way, and their toolbar is available to truly a very small part of the web's population. If available, you could try to compare Alexa's population with the total internet population to see if Alexa's population is big enough to be statistically relevant for your purposes. There's a formula to do that... I just can't remember how it went...

Cheers!
Luis Alberto
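
One candidate for the kind of formula mentioned above is the standard error of a sampled proportion. This may not be the one Luis had in mind, and it assumes the toolbar users are a simple random sample of internet users, which a voluntary install is not. The panel sizes below are hypothetical, since the discussion never states how many people run the toolbar; only the 110 per million reach comes from the post.

```python
import math


def relative_standard_error(reach_fraction, panel_size):
    """Standard error of a measured reach, as a fraction of the reach itself."""
    # Expected number of panel members who visit the site on a given day.
    expected_hits = reach_fraction * panel_size
    # Binomial standard error sqrt(p(1-p)/n), divided by p.
    return math.sqrt((1 - reach_fraction) / expected_hits)


HACKADAY_REACH = 0.00011  # 110 per million, from the post

errors = {}
for panel_size in (100_000, 1_000_000, 10_000_000):  # hypothetical panel sizes
    errors[panel_size] = relative_standard_error(HACKADAY_REACH, panel_size)
    print(f"panel of {panel_size:>10,}: +/- {errors[panel_size]:.0%} on the reach")
```

Because the internet estimate is the visitor count divided by the reach, any relative error in the reach carries over to the final number at about the same size. This covers sampling noise only; it says nothing about bias from who chooses to install the toolbar.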