Richard Price's blog

SOPA doesn’t apply to foreign sites only, as NBCUniversal General Counsel claims

Richard Cotton, the General Counsel of NBC Universal, was on MSNBC yesterday saying that SOPA only applies to foreign internet sites. He says categorically that no US internet site would be affected. 

Laurence Tribe, a professor of constitutional law at Harvard Law School, disagrees:

    Although SOPA’s supporters have described the bill as directed at “foreign rogue websites,” the definitions in the bill are not in fact limited to foreign sites.

When you look at the definitions of the SOPA bill, a “domestic internet site” is defined as follows:

    The term `domestic Internet site’ means an Internet site for which the corresponding domain name or, if there is no domain name, the corresponding Internet Protocol address, is a domestic domain name or domestic Internet Protocol address.

A “domestic domain name” is defined as:

    The term `domestic domain name’ means a domain name that is registered or assigned by a domain name registrar, domain name registry, or other domain name registration authority, that is located within a judicial district of the United States.

So, a US corporation whose internet domain name was registered with a foreign domain name registrar would not count as a domestic site. A foreign internet site is defined as follows:

    The term `foreign Internet site’ means an Internet site that is not a domestic Internet site.

So any US corporation whose domain name is registered by a foreign domain name registry counts as a foreign site, and SOPA will apply to it. The letter of Richard Cotton’s assertion that SOPA does not apply to any domestic US sites may be true, as long as you define ‘domestic’ in the same way as the SOPA bill, but given that SOPA can apply to a US corporation with a domain registered abroad, the spirit of the assertion is wrong. 

One might think that a US corporation with a foreign-registered domain name can simply transfer their domain name to a US domain name registry, and avoid SOPA. However, if you have a country-specific domain name, like www.bit.ly (Libya), or www.justin.tv (Tuvalu), then you can’t just transfer your domain name to a US registry: the registries for those domain names are all foreign by law. 

So it looks like SOPA does apply to Bit.ly and Justin.tv, both US corporations that count as foreign internet sites according to the SOPA definitions. The spirit of Richard Cotton’s assertion that SOPA doesn’t apply to any domestic US sites isn’t true at all. 

Three issues with the current academic journal system

I think there are at least three problems with the academic journal system:

  • Peer review is very slow. The time lag between finishing a paper, and its getting in the hands of everyone in the research community worldwide, is 6 months to 2+ years. 
  • Peer review is sometimes of questionable quality - the two peer reviewers selected by the journal might be unqualified, biased or just in a bad mood when reviewing a paper. Also 2 people is a small set of judges for a piece of research; typically it is going to be more interesting to know the opinions of everyone in the research community. 
  • The whole process is very expensive. Universities and companies spend $8 billion a year on subscribing to journals. 

I would like to see a world where all these problems are fixed:

  • Instant distribution: The time lag between finishing a paper and everyone in the research community seeing it is measured in hours and days, rather than months and years. 
  • Crowd-sourced peer review: Peer review should be crowd-sourced, and should happen post-distribution. With hundreds of people peer reviewing a paper, individual academic biases or moods get evened out.
  • Free distribution. Distribution should be free. The idea that it should cost $5,000 on average to distribute an academic paper around the world is crazy to my mind. That is way too much. Distribution should be free with the internet, and the $8 billion that gets spent annually by universities and companies on journals could get spent on more research - e.g. curing diseases faster.

With Academia.edu, we are mainly working on the first bullet point right now: instant distribution. Once we have achieved the first bullet point, we’ll move onto the second one: crowd-sourced peer review. The third bullet point is part of our DNA, and that is taken for granted: the research on Academia.edu is free to access and distribute, and it always will be. 

Dec 5

Ph.D to Dot.com

(I wrote this article for the alumni magazine of the Oxford Philosophy department, where I did my Ph.D in philosophy. I’m reposting the article here.)

Richard Price talks about his move from philosopher of perception to web entrepreneur and founder of Academia.edu. 

I remember first becoming interested in philosophy when I was at school, at around the age of 15 or 16. Some friends and I would vigorously debate topics such as free will, the existence of God, and communism. I realized that there was this subject called ‘philosophy’ and that I loved thinking about philosophical questions. I did PPE as an undergraduate (Philosophy, Politics and Economics) at St Catherine’s College, and did as much philosophy as I could. I stayed on to do a two-year Master’s degree, the B.Phil, at St Catherine’s, and then a Ph.D at Corpus Christi College and All Souls College. Doing philosophy at Oxford was an incredibly exciting experience for me.

My other passion in life, aside from philosophy, is entrepreneurship. After I finished my B.Phil, I was very keen to set up a business during the summer. I created a company called ‘Richard’s Banana Bakery’, selling banana cakes to cafes and offices in London. I was doing all the baking of the cakes in my Mum’s kitchen in London. I had two Magimixes on the go for a few hours a day, and the oven on most of the time. There was banana cake mixture everywhere, and I think I drove my Mum up the wall a bit.

At the end of the summer, and over the course of the vacations of my first year of my Ph.D, I turned the cake business into a sandwich business called ‘Dashing Lunches’ (I was dashing around London on a bike delivering sandwiches to offices).

Running both of these businesses was exhausting work, but also the idea of making my own products and making money out of them was incredibly thrilling to me. After a year of running Dashing Lunches, I decided to try something on the internet, and I built a student accommodation site in Oxford, LiveOut.co.uk, with some other Oxford students.

My entrepreneurial ventures were a side interest while I was pursuing my Ph.D in philosophy, which I was having the time of my life with. Tim Williamson was my Ph.D advisor and his standards of precision, and approach to philosophy, had a huge impact on me. I was working on a question within the philosophy of perception regarding how rich the content of visual experience is. Does visual experience only represent a sparse set of properties, such as colours and shapes, or does it represent richer properties too, such as the property of being a tomato, the property of being sad, and so on?

The graduate community at Oxford was incredibly alive with passionate and brilliant people, and I benefited almost more than I can say from being immersed in the community, often talking about philosophy into the small hours of the night. I remember one philosophy conversation with a friend of mine, Hemdat Lerman, going on for 12 hours with one half an hour break. I found philosophy at Oxford to be an exhilarating experience.

As I was starting to finish my Ph.D, I realized I was extremely torn about what my next steps should be. I had been very fortunate to win an All Souls Prize Fellowship, and I had the option to pursue research on that fellowship for another few years after my Ph.D finished. I decided to try out entrepreneurship for a couple of years, and if my efforts failed, I would still have 3 years left on my All Souls fellowship to pursue research.

The business idea I had was Academia.edu. I saw sites like LinkedIn and Facebook growing incredibly quickly, and I wanted to build a platform where researchers could share their research with others, and keep up with research in their field, both with ease and minimal friction. That platform is Academia.edu. The mission of Academia.edu is to accelerate the world’s research. 

To get Academia.edu going, I raised $600,000 in venture capital funding from London (from Spark Ventures), and moved to San Francisco in order to be part of the Silicon Valley technology culture. Silicon Valley is to technology startups what the Oxford philosophy department is to philosophy: you can immerse yourself completely in a community of people who are all obsessively passionate about the same thing as you.

After 3 years in operation, and after raising another $6 million in venture capital, mostly from Spark Capital and True Ventures, we have over 800,000 registered users, and over 3 million monthly visitors. Every day about 3,500 academics sign up, and about 2,500 papers are added. We have 6 employees and are based in downtown San Francisco. There are many challenges to building an internet company:

  • ensuring that you have a clear product vision
  • recruiting the best software engineers you possibly can
  • ensuring the company is adequately financed so you can pay all the bills. 

It’s extremely enjoyable to face all these challenges and to try to overcome them.

Some people ask me whether there are any connections between philosophy and entrepreneurship. I think there is at least one connection, which is about attitudes towards problem-finding. Problem-finding comes before problem-solving: you have to find and clearly articulate the problem before you can set about trying to solve it. In everyday life, we often zoom along through logical transitions at such speed that we don’t notice minor glitches in those transitions. I think one thing philosophers do is try to slow those transitions down, so that we are more sensitive to glitches that may occur. After experiencing a glitch, something that doesn’t feel quite right, instead of marching ahead, philosophers will magnify that sensation of something not feeling quite right, in order to see whether there is a problem in the underlying rational transition.

When looking for business ideas, the analog is that we often zoom around in life and have adapted our behaviour around the constraints in life. We have adapted our behaviour so successfully that we don’t often notice the constraints that we are skillfully navigating around. When hunting for business ideas, one has to slow down when one feels that one is navigating around some constraint, and then examine that constraint to see whether it can be removed. This is one of the similarities between philosophy and entrepreneurship for me: in the case of philosophy, one is on the lookout for logical problems with a train of thought, and in the case of entrepreneurship, one is on the lookout for practical problems in a train of activity. 

The number of academics and graduate students in the world

By my calculations, there are about 17 million faculty members and graduate students in the world: 11 million graduate students and 6 million faculty members. 

# of graduate students

The data about graduate students comes from the NSF Science and Engineering Indicators report. According to this table in the report, there are 600,000 graduate students in the US in science and engineering fields. According to this table, science and engineering degrees represent 22% of the total. So there are about 2.7 million graduate students in the US (600,000 / 0.22). 

According to this page of the report, the United States represents 25% of the researchers in the world. This means that there are 2.7 * 4 = 10.8 million graduate students in the world. 

# of faculty members

According to the Bureau of Labor Statistics, there are 1.7 million teaching personnel at universities in the US. If you exclude teaching assistants, who tend to be graduate students, that number comes down to 1.54 million. Assuming that the US has 1/4 of the world’s faculty members, there are 1.54 * 4 = 6.16 million faculty members in the world. 

Size of science and engineering workforce

Another interesting stat is the size of the science and engineering workforce. According to this page of the NSF report, 12.9 million people in the US say that their job requires their science and engineering degree. Assuming that the United States is 1/4 of the global numbers, then the global science and engineering workforce is 12.9 * 4 = 51.6 million people.

Of this 51.6 million figure, 6.8 million are teaching personnel at universities (1.7 million * 4), and 44.8 million are in the private sector.
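The back-of-the-envelope estimates in this post can be reproduced with a short script. This is only a sketch: the input figures are the ones quoted above, the variable and function names are my own, and the 25% US share is the same assumption used throughout the post.

```python
# Figures quoted in the post (NSF report and Bureau of Labor Statistics).
US_SE_GRAD_STUDENTS = 600_000      # US graduate students in science and engineering
SE_SHARE_OF_DEGREES = 0.22         # science and engineering as a share of all degrees
US_SHARE_OF_WORLD = 0.25           # assumption: the US is 1/4 of the world total
US_TEACHING_PERSONNEL = 1_700_000  # including teaching assistants
US_FACULTY = 1_540_000             # excluding teaching assistants
US_SE_WORKFORCE = 12_900_000       # jobs requiring a science and engineering degree


def scale_to_world(us_count, us_share=US_SHARE_OF_WORLD):
    """Scale a US count up to a world estimate using the assumed US share."""
    return us_count / us_share


us_grad_students = US_SE_GRAD_STUDENTS / SE_SHARE_OF_DEGREES
world_grad_students = scale_to_world(us_grad_students)
world_faculty = scale_to_world(US_FACULTY)
world_se_workforce = scale_to_world(US_SE_WORKFORCE)
world_teaching_personnel = scale_to_world(US_TEACHING_PERSONNEL)
world_private_sector = world_se_workforce - world_teaching_personnel

for label, value in [
    ("Graduate students (world)", world_grad_students),
    ("Faculty members (world)", world_faculty),
    ("S&E workforce (world)", world_se_workforce),
    ("S&E private sector (world)", world_private_sector),
]:
    print(f"{label}: {value / 1e6:.1f} million")
```

Without the intermediate rounding to 2.7 million, the graduate student estimate comes out at about 10.9 million rather than 10.8 million; the difference is well within the precision of the method.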

Measuring monthly actives on a social site, and whether you count the signup visit

A lot of people measure how engaging a social app is according to what percentage of its users log in at least once in a given time period, e.g. 1 day, 1 week, 1 month etc. Fred Wilson wrote a post the other day about this issue that’s very interesting. He has exposure across a lot of great social sites and reported that it’s common to find 30% of the userbase logging in at least once a month, and 10% of the userbase logging in at least once a day. 

One thing about these numbers is that I don’t think yet there is an industry-agreed-upon standard for how to calculate the active users within a given time period. One interesting issue concerns the signup visit. Suppose your site has 85,000 users at the end of June, and in July 15,000 people signed up, so you have 100,000 users at the end of July. Furthermore, suppose that, throughout July, you saw 30,000 users visit the site (15,000 of whom were people signing up). 

If you include the signup visit as a visit from an ‘active user’, then you’ll report 30% monthly actives (30,000/100,000). However, you could argue that you’re only an active user if you visit the site during a non-signup visit. I.e. you could argue that the ‘monthly active’ figure is supposed to report what percentage of your users are engaged enough with your application to return to it after signing up.

If you have this conception of an active user, you should exclude the signup visit from your definition of ‘monthly active’. Suppose that, of the 15,000 people who signed up in July, 10% returned to the site during that same month - i.e. 1,500 users came back during the same month. Then your monthly actives are:

15,000 (people who signed up before July who were active)

 + 

1,500 (people who signed up during July and were active in that month)

 = 16,500

16,500 as a percentage of 100,000, which is the user count at the end of July, is 16.5%. 

So in this case, whether you count the signup visit or not makes a huge difference to what monthly active percentage you count. Either 30% of your users are active, or 16.5% of your users are active, depending on whether you count the signup visit as a visit from an ‘active user’.
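The two ways of counting in the example above can be written as one small function. This is an illustrative sketch using the example's numbers; the function and parameter names are made up for this post, not taken from any analytics tool.

```python
def monthly_active_share(total_users, returning_prior_users, signups,
                         returning_signups, count_signup_visit):
    """Share of the end-of-month userbase counted as active.

    returning_prior_users: users who signed up before the month and visited in it.
    returning_signups: users who signed up in the month and came back in the same month.
    """
    # If the signup visit counts, every new signup is active by definition.
    active_new_users = signups if count_signup_visit else returning_signups
    return (returning_prior_users + active_new_users) / total_users


# Numbers from the July example above.
with_signup_visit = monthly_active_share(
    total_users=100_000, returning_prior_users=15_000,
    signups=15_000, returning_signups=1_500, count_signup_visit=True)
without_signup_visit = monthly_active_share(
    total_users=100_000, returning_prior_users=15_000,
    signups=15_000, returning_signups=1_500, count_signup_visit=False)

print(f"Counting the signup visit: {with_signup_visit:.1%}")
print(f"Excluding the signup visit: {without_signup_visit:.1%}")
```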

At Academia.edu, we don’t count the signup visit in our definition of ‘monthly active’. So if someone signs up in a given month, we only count them as active in that month if they visit again during that month. About 30% of our registered users are active each month, using this definition of ‘active’. 

The impact that the signup visit has on your monthly active numbers grows as your growth rate grows. In the example above, the site was growing at around 18% a month (15,000 users signed up off the back of a userbase of 85,000 users). If your site is growing at 20% or 30% a month, whether you count the signup visit in monthly active counts becomes an even larger issue. 
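To see how the gap widens with growth, here is a rough sketch that varies the monthly growth rate. It assumes, purely for illustration, that the share of existing users who are active (15,000 out of 85,000) and the share of new signups who return in the same month (10%) stay fixed as growth changes; real sites will not behave this neatly.

```python
def active_shares(base_users, growth_rate, prior_active_share, signup_return_rate):
    """Return (share counting signup visits, share excluding them)."""
    signups = base_users * growth_rate
    total_users = base_users + signups
    prior_active = base_users * prior_active_share
    with_signup = (prior_active + signups) / total_users
    without_signup = (prior_active + signups * signup_return_rate) / total_users
    return with_signup, without_signup


BASE_USERS = 85_000
PRIOR_ACTIVE_SHARE = 15_000 / 85_000  # from the example above
SIGNUP_RETURN_RATE = 0.10             # from the example above

gaps = []
for growth_rate in (0.10, 0.20, 0.30):
    with_signup, without_signup = active_shares(
        BASE_USERS, growth_rate, PRIOR_ACTIVE_SHARE, SIGNUP_RETURN_RATE)
    gaps.append(with_signup - without_signup)
    print(f"growth {growth_rate:.0%}: {with_signup:.1%} vs {without_signup:.1%}")
```

Under these assumptions the difference between the two figures gets larger at each step up in growth rate.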

Another interesting issue that gives rise to varying results is whether you use cookies to track registered users, or whether you use your own custom stats framework. With Google Analytics you can record whether a visit comes from a registered user or not (using custom variables). Google Analytics can then tell you how many unique visits it saw from registered users in a given month. The problem with this approach is that you tend to see more unique cookies visiting your site in a given month than you see real users, the reason being that some percentage of your registered users will access your site via multiple devices. Google Analytics will record those visits as visits from multiple unique registered users, but in reality it is one user behind multiple cookies. 

The alternative is to have your own custom stats framework, where you record all hits from registered users, with the user_id, and then you turn the database of hits into a database of visits (ideally working with the standard ways of defining ‘visit’). With this framework you can track directly how many users visit your site during the month. 
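As a minimal sketch of the idea (not our actual framework), the snippet below turns a list of hits into visits and compares unique users with unique cookies. The hit format, the names, and the 30-minute inactivity timeout are all assumptions; the timeout is a common convention in web analytics for where one visit ends and the next begins.

```python
from datetime import datetime, timedelta

VISIT_TIMEOUT = timedelta(minutes=30)  # assumption: a gap longer than this starts a new visit


def hits_to_visits(hits):
    """Group hits into visits per user.

    hits: iterable of (user_id, cookie_id, timestamp) tuples.
    Returns a list of (user_id, visit_start) tuples, one per visit.
    """
    visits = []
    last_hit_at = {}
    for user_id, _cookie_id, timestamp in sorted(hits, key=lambda hit: hit[2]):
        previous = last_hit_at.get(user_id)
        if previous is None or timestamp - previous > VISIT_TIMEOUT:
            visits.append((user_id, timestamp))
        last_hit_at[user_id] = timestamp
    return visits


# Hypothetical hits: user 1 uses a laptop (cookie "a") and a phone (cookie "b").
hits = [
    (1, "a", datetime(2011, 7, 1, 9, 0)),
    (1, "a", datetime(2011, 7, 1, 9, 10)),
    (2, "c", datetime(2011, 7, 1, 12, 0)),
    (1, "b", datetime(2011, 7, 1, 20, 0)),
]

visits = hits_to_visits(hits)
unique_users = {user_id for user_id, _start in visits}
unique_cookies = {cookie_id for _user_id, cookie_id, _ts in hits}

print(len(visits), "visits from", len(unique_users), "users,",
      len(unique_cookies), "cookies")
```

Counting by user_id gives two active users here, while counting cookies would report three, which is the overcounting effect described above.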

The discrepancy between the number of visits reported by these two different methods can be large. And the discrepancy grows as the time period in question increases. The longer the time period, the greater the chance that a user will access the site from multiple devices.

In the case of Academia.edu, Google Analytics reports that it sees visits from 40% of our registered users every month during non-signup visits. We have our own stats framework for tracking registered user visits, and according to that framework, about 30% of our registered users visit at least once a month. So the unique cookie framework overestimates the actual numbers by 33% (40% instead of 30%). 

If you both count the signup visit AND use the cookie method to track unique registered user visits, then you could see massive discrepancies between counts of monthly active users. Let’s assume that Google Analytics’s unique cookie count overestimates true unique users by 33%. In the case above where 30% of the site’s users were seen in July, where the signup visit was being counted, Google Analytics would have reported that it saw visits from 40% of the registered userbase (assuming it overestimates true visits by 33%). If you use your own framework, and don’t count the signup visit, then we saw that, in this example, 16.5% of the user-base was active in the month. 40% vs 16.5% is a huge difference in monthly active percentages!

It goes to show that it’s really essential to know the methodology behind a certain ‘monthly active’ count before you know what that count means. 

Google Webmaster and custom crawl rates

I was looking at our Google Webmaster Dashboard a few months ago, and I saw that for a few weeks, Googlebot had basically completely stopped indexing new content on Academia.edu. We then saw in the Settings page that the default crawl rate was really low. I forget exactly what it was, but it was so low as to make it impossible for Google to index more than the tiniest fraction of new content that we were adding per day. 

We set a custom crawl rate which you can do on the Settings page. I think we increased the rate by a factor of about 100,000 or maybe 1 million. We also let someone at the Google sitemaps team know that we had experienced this issue. Soon after that, Googlebot’s indexing activity was back up to normal.

I spoke to a friend who runs a massive site, and he said that in his experience on his site, the default indexing rate for Googlebot has been way too low, and he has always set a custom crawl rate. Good to know that. 

How we plan stuff

Startups are extremely hard. There are all kinds of challenges, in building out the product, scaling up the engineering, and all the other parts of building a business (revenue, hiring, growth etc).

At Academia.edu we try to leverage everyone’s brainpower to work through problems. When it comes to product design, we encourage everyone to be thinking of ideas and how to prioritize those ideas against other things we could be working on.

We put all our ideas on sticky notes on giant whiteboards (see photo), and every 3-4 weeks we have a product roadmap discussion. 

To help bring some order to these discussions, we have a graph with two axes: on one axis we have Impact, and on the other we have Ease.

We then plot the sticky notes on the graph, first according to how easy they are to implement (1 day, 1 week, 1 month etc), and then according to how much impact they have. We end up with a cluster of sticky notes. The ones at the top right of the graph (high impact, high ease) are the ones we start working on, and then we work diagonally downwards. On a given diagonal you might find something that’s high impact, low ease, together with something that’s low impact, high ease.
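One way to picture "working diagonally downwards" is to rank ideas by the sum of their impact and ease scores, since everything on the same diagonal has the same sum. The sketch below is only an illustration: we use sticky notes and a whiteboard, not code, and the 1 to 5 scales and idea names here are made up.

```python
from typing import NamedTuple


class Idea(NamedTuple):
    name: str
    impact: int  # 1 (low impact) to 5 (high impact), hypothetical scale
    ease: int    # 1 (about a month of work) to 5 (about a day), hypothetical scale


def roadmap_order(ideas):
    """Order ideas from the top-right corner of the graph diagonally downwards."""
    # Ideas with the same impact + ease sit on the same diagonal; the sort is
    # stable, so ties keep the order in which they were listed.
    return sorted(ideas, key=lambda idea: idea.impact + idea.ease, reverse=True)


ideas = [
    Idea("Idea D", impact=2, ease=2),
    Idea("Idea B", impact=5, ease=1),
    Idea("Idea A", impact=5, ease=5),
    Idea("Idea C", impact=1, ease=5),
]

ordered_names = [idea.name for idea in roadmap_order(ideas)]
print(ordered_names)
```

Here "Idea B" (high impact, low ease) and "Idea C" (low impact, high ease) land on the same diagonal, which is where the discussion about what to do first actually happens.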

We’ve found this way of thinking to be invaluable in order to go from a wall of sticky notes to a specific roadmap. It’s one of those things that, now that we use it, we wouldn’t want to do without. 

Some thoughts about the culture at Academia.edu

Before I got into business, and when I was still at school, I would often read that such-and-such merger didn’t work out because the cultures between the two companies were too different. At the time I never understood what this meant. For me corporate culture just looked like one homogeneous whole.

Having now worked in the professional world for a few years, I think culture is super important, and varies massively and in important ways between companies. This is especially true at the startup stage when you just have a few people in the company. The culture between companies is as different as you find personality traits varying between groups of friends. What kinds of humor do you like? What kinds of degrees of effort do you like? What kinds of judgement calls do you make on product decisions? What goals and what balances are worth striving for?

In Academia.edu we care about a few things. One is passion for building great things, and working to a high standard. Some people really care about doing things to a high standard, and doing a good job, and some really don’t. We strive to hire people in the first bucket.

Another one is the ability to participate well in group dynamics and group decision-making. The goal of group dynamics, I think, is to channel individual passion into a common integrated whole, with everyone fully backing and supporting the end decision. Problems can occur when people aren’t sufficiently passionate, or when they are too passionate about the wrong thing, such as getting their idea selected rather than anyone else’s.

The kind of group dynamics we push for are ones where the question of who came up with the idea is completely irrelevant to everyone in the discussion, including to the person who came up with it. We strive for a dynamic where all that matters is the idea, and that there is no such thing as ego, or ‘giving ground’, or ‘saving face’, or any of those emotions associated with ego.

Another thing that is really important to get this kind of dynamic going is the ability to ensure that your words and contributions to the discussion, and the strength of them, correctly represent the strength of your true opinions: i.e. what you think in the cool light of day. A lot of group discussions go wrong because people get locked onto a certain track of thought, sometimes locked there by themselves, and sometimes locked there by other people, and they feel duty-bound, or saving-face-bound, to defend that track no matter where it takes them. In other words they lose track of what they think about the matter, and what would constitute a measured, balanced response to the situation, and what would constitute a good judgement call in the face of uncertainty, and instead get stuck in their line of argument, and end up defending a viewpoint that, in the cool light of day, they wouldn’t have subscribed to at all.

Maintaining a sense of good judgement when there are multiple points of view, and even when there is someone in the group who is looking to polarize things, is really important. It’s really important not to lose your head, and to retain your judgement, so if you find yourself defending something that no longer makes sense, you can say ‘wow, this line of thinking clearly has problems’.

It’s also really important for credibility. If you say something is good or bad, it’s really important for the rest of the group to know that is what you really think, and you’re not just being forced into that judgement by the dialectical situation you have found yourself in. Sometimes, even with the best intentions, and the most solid group of people, if people are tired, then you can get discussions that you can tell aren’t heading in a very promising direction. It’s really good for everyone to have a sense of that point in a discussion, and to be able to agree just to cut the discussion off, and delay the decision until a later point, perhaps the next day.

I feel that successful group decision-making requires constant vigilance with regard to the emotional state of the discussion. Are the emotions directed at the right things, i.e. the product getting better, users being happier etc? Or have they started to shift towards the wrong things, i.e. someone feeling rebuffed and then digging their heels in a bit harder as a result? Usually if everyone is vigilant to the emotional state of the discussion, and where it’s heading, any slight deviations from the path to a good decision can be picked up on and corrected.

The definition of successful group decision-making, I think, is where the decision that emerges is as good as or better than what any subset of the group could have come up with. I feel that we achieve that at Academia.edu. We are super careful with who we hire, and look really carefully for the ability to channel passion in the right direction, and maintain solid judgement no matter what craziness is going on around you.