I started writing code in 1990 (my first language was assembly because I could learn it for free from the Intel manual that accompanied their processors). I grew up in one of the poorest counties in Ohio. I wanted to be a great programmer and believed that I could do anything if I set my mind to it. At the age of 21, I developed an internet search engine that was ranked #2 behind Google in relevance, according to an internal review by Yahoo! The search engine was started in May, 1998 and sold in November, 1999.
In 1998, much of the internet didn't exist. Servers could support up to 4 GB of RAM. Google would start in September, 1998. I began with the simple premise of downloading web pages to my hard drive to see if I could figure out how to organize them. To do so, I wrote a crawler and an HTML parser to extract text and links, and began pulling down the web to my hard drives. I built a real-time indexing system along with all of the usual things that search engines had in 1999. In order to keep costs low, I had to build a distributed messaging system that allowed my crawler to be co-located away from the rest of the main system. The distributed messaging system (DMS) allowed me to isolate regions so that hackers could not reach the main search engine even if they compromised the remote crawlers. The DMS also allowed me to build out redundant data stores. I went through a hard drive almost every week in a system of approximately 100 hard drives and about 20 computers.
My search engine went live with zero pages on May 26th, 1999, one year after the search engine was started. By June 17th, 1999, the system had 65,244,000 pages. At this point, we had many companies interested and were asked to halt the index growth so that various tests could be run. I was the only engineer, and the search engine somehow grew to 65 million pages (3 million pages per day) without my attention while I was presenting the site to many well-known internet companies. In November, 1999, ExcelleRate, LLC was sold to Ask Jeeves.
After Ask Jeeves purchased my company in 1999, I moved to work out of the Emeryville, CA office. Upon joining Ask Jeeves, I began learning their systems and how I might integrate search into their question and answer system. About a month after I joined Ask, I was sitting in my office and Gary Culliss from DirectHit walked by my cubicle. It became immediately obvious to Ask Jeeves that I knew something about DirectHit, so I was brought into the decision of whether to buy DirectHit or license Google. At the time, I thought that Ask Jeeves wouldn't survive more than a couple of years if they did a deal with Google, so I recommended purchasing DirectHit. DirectHit was a click-based search engine and I felt that they could effectively compete with Google. DirectHit was purchased and I was relocated to Natick, MA to continue my work and to help facilitate the acquisition.
DirectHit had a popularity engine that sat on top of its own search engine, affectionately known as OOSE (our own search engine). The popularity engine provided great results, but OOSE was not so great. The search engine ordered its sets in a way that didn't allow for efficient processing. This meant that search didn't scan every possible document, which often led to missing and/or irrelevant results. I worked with a team to replace OOSE with my search engine. In the process of rewriting the core of the engine, we added better duplicate processing and internationalization. We had a team of three to five people working on web search at the time, grew the index to over a billion pages, and were handling over 50 million queries a day.
In July or August of 2001, Ask Jeeves began looking at a small search company called Teoma, just outside of New York City in Piscataway, NJ. Our search engine team was tiny and we needed to grow it to compete effectively in the industry. Teoma had 10 people who were invested in search, along with a clustering algorithm that used link communities. Ask Jeeves purchased Teoma on September 11, 2001 for $4.5 million. The sale obviously didn't make the news. On September 11th, I had a couple of co-workers who were on standby to get on the flight to San Francisco. Fortunately, they didn't get on the plane. I moved to Piscataway, NJ to figure out how to combine our technologies. As our index grew and web pages became more dynamic, we recognized that we needed to improve similar document recognition. I invented and developed an approach that allowed us to compare billions of documents to each other and detect documents that were semantically the same. I also helped with the process of integrating our crawling platform into the Teoma system.
I wrote a post yesterday at Search Engine Land, titled Why Do People Google Google? Understanding User Data to Measure Searcher Intent, ...
Before ↑ After ↓
A new patent from Ask.com also looks carefully at how people use their search engine.. Invented by Andy Curtis, Alan Levin, and Apostolos Gerasoulis
If you were searching for Andy Curtis Patent, you might see the above result and not know how or if "Andy Curtis" existed in the page (in the top result). It could very well exist as Alan Curtis and Andy Taylor. In 2002, Teoma, DirectHit, and many other search engines produced 10 search results, with each description generated from the top of the page or from the meta description if one existed. Google had begun showing the query in the context of the document. I was tasked with building the system to do this for Ask.com. I designed a system that would first break the content into chunks of text that could generally be considered the language of the document, and would then dynamically find the best piece or pieces of text to display to show the user how the query is present in the given document. User satisfaction improved dramatically once the system was launched.
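The idea of showing the query in context can be illustrated with a minimal sketch. This is not the system built for Ask.com; the function names (`split_chunks`, `best_snippet`), the sentence-based chunking, and the scoring by count of distinct query terms are all assumptions made purely for illustration.

```python
import re

def split_chunks(text):
    """Split text into sentence-like chunks (a crude stand-in for real segmentation)."""
    return [c.strip() for c in re.split(r"(?<=[.!?])\s+", text) if c.strip()]

def best_snippet(text, query):
    """Return the chunk containing the most distinct query terms, or None if no term matches."""
    terms = {t.lower() for t in re.findall(r"\w+", query)}
    best, best_score = None, 0
    for chunk in split_chunks(text):
        words = {w.lower() for w in re.findall(r"\w+", chunk)}
        score = len(terms & words)  # how many query terms appear in this chunk
        if score > best_score:      # earlier chunks win ties
            best, best_score = chunk, score
    return best

if __name__ == "__main__":
    page = ("I wrote a post yesterday about searcher intent. "
            "A new patent was invented by Andy Curtis and Alan Levin. "
            "Andy Taylor was not involved.")
    print(best_snippet(page, "Andy Curtis"))
```

A chunk that contains both "Andy" and "Curtis" outranks one that contains only "Andy", which is the difference between showing that the query really appears in the page and leaving the user to guess.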
In June, 2003, I had an idea for how to greatly improve relevance on Ask.com using clicks and session behavior. I took off part of June and July to build the project (as it wasn't my current work assignment). I was able to demonstrate that, using this system, we could match Google's relevance. Once I realized this system would work, I took it to the CTO, who would not allow me to work on the project as he felt it wouldn't really work in the end. I found an advocate in the founder of Teoma (Apostolos Gerasoulis) and we formed a second development group outside of the control of the CTO to develop the idea. The initial group was Apostolos Gerasoulis, Alan Levin, and myself.
Over the next few months, I built out a system, but still couldn't bring it live. At the time, I didn't have access to the system which delivered the final results to Ask.com (and the CTO wasn't willing to give it to me). I got a break when I called a new employee who had access and asked him to download it for me (because I was having trouble with permissions). Once I had the code, I spent the next three days modifying it and then put up a second version of Ask.com with the system my group created. I needed this to show the CEO that it was worth launching. On the 4th day, I was asked to return the code (fortunately I had already finished the demo).
Jim Lanzone championed the product to the CEO, Steve Berkowitz, and the product was fast-tracked for launch. The product launched in the early part of 2004 and Ask saw significant increases in traffic month over month. In 2004, I was given an award "for single-handedly saving the company" from Jim Lanzone. I wasn't the only one involved, but it was pretty cool to be recognized. For whatever reason, it wasn't initially launched in the UK and the UK traffic remained flat. My team grew in size and we began developing new and innovative products which continued to improve monetization and user satisfaction.
These patents served as the foundation for many new products.
One of the first projects that the newly formed team worked on was the spell checker. The query logs showed that a reasonably high percentage of queries were misspelled and that those queries were leading to undesirable search results. To complete this project, we had several engineers and a linguist. It was a very challenging project inasmuch as many words are incorrect only in context and are correct by themselves. We didn't have a dictionary which indicated every correct word. The dictionaries we did have didn't indicate the likelihood that a user meant the given word. They also didn't include places, people, slang, etc. In the end, we were able to produce a high quality spell checker that continued to help improve Ask.com's metrics and monetization.
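To make the "likelihood" problem concrete, here is a minimal sketch of a frequency-weighted corrector. It is a textbook edit-distance approach swapped in for illustration, not the spell checker we built; the names `edits1` and `suggest`, and the use of query-log counts as a stand-in for likelihood, are assumptions.

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one edit (delete, swap, replace, insert) away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    swaps = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in ALPHABET]
    inserts = [l + c + r for l, r in splits for c in ALPHABET]
    return set(deletes + swaps + replaces + inserts)

def suggest(word, counts):
    """Pick the most frequent known word among `word` and its one-edit neighbors.

    `counts` maps a word to how often it was seen (for example, in query logs).
    If nothing known is nearby, the word is returned unchanged.
    """
    candidates = [w for w in edits1(word) | {word} if w in counts]
    if not candidates:
        return word
    return max(candidates, key=lambda w: counts[w])

if __name__ == "__main__":
    # Hypothetical counts; note that a misspelling can itself appear in the logs.
    counts = Counter({"baseball": 900, "basketball": 700, "basebal": 3})
    print(suggest("basebal", counts))
```

A word-at-a-time approach like this cannot handle words that are wrong only in context, and it will happily "correct" a rare but valid word to a more common neighbor, which is part of why the real project was hard.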
As the spell checker was being completed, my team recognized the need to build Related Search. No matter how good the search results were, a user who entered the query "baseball" probably wouldn't be satisfied with the results. If the user wasn't satisfied with the regular results, they would either rephrase their query or leave. Related search gave users an opportunity to find what they wanted, perhaps without knowing what it was that they were searching for in the first place. Related search was complicated to build as we didn't necessarily know which queries were valid to show. We wanted to avoid showing queries which were too similar, queries which didn't seem related, and queries which were misspelled. In the end, related search proved to be a very successful project and improved Ask.com's metrics and boosted monetization.
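The three filters described above (too similar, not related, misspelled) can be sketched as follows. This is an illustrative assumption, not the production design: the function names, the use of session follow-up counts as the relatedness signal, the word-overlap (Jaccard) similarity, and all thresholds are hypothetical.

```python
def jaccard(a, b):
    """Word-level Jaccard similarity between two queries."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def related_queries(query, follow_counts, known_good,
                    max_sim=0.8, min_count=5, limit=3):
    """Choose related queries to show for `query`.

    follow_counts: candidate query -> number of sessions in which it followed `query`
    known_good:    set of queries believed to be correctly spelled
    """
    picked = []
    # Strongest evidence first; ties broken alphabetically for determinism.
    for cand, n in sorted(follow_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n < min_count:
            continue  # too little evidence that it is related
        if cand not in known_good:
            continue  # likely misspelled
        if jaccard(cand, query) >= max_sim:
            continue  # too similar to the original query
        if any(jaccard(cand, p) >= max_sim for p in picked):
            continue  # too similar to a suggestion already chosen
        picked.append(cand)
        if len(picked) == limit:
            break
    return picked

if __name__ == "__main__":
    follow_counts = {"baseball": 50, "baseball scores": 40, "basebal cards": 30,
                     "baseball cards": 20, "mlb standings": 10, "weather": 2}
    known_good = {"baseball", "baseball scores", "baseball cards",
                  "mlb standings", "weather"}
    print(related_queries("baseball", follow_counts, known_good))
```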
From December, 2004 through March, 2005, Apostolos, Joshua Frattarola, and I built out the initial image search system prototype for Ask Jeeves. After our initial proof of concept, the larger development team took over and we focused on a number of other projects including various clustering projects, blog search, recipe search, and news search. My team continued to research and develop new products for the rest of my employment. In 2005, IAC purchased Ask Jeeves for approximately $2 billion. After IAC purchased Ask Jeeves, Jim Lanzone became the CEO of the Ask Jeeves division within IAC.
Clicker was a startup that aimed to be the TV guide for the Web. I joined it in its early stages in July, 2009. Jim Lanzone (from Ask Jeeves) had just become the CEO of Clicker. Upon joining, I built a search engine from scratch that went live two months later for TechCrunch50 2009. I managed and/or directly built the majority of the backend for Clicker. I built several real-time systems, using personalized approaches to handle user behavior: tracking recent vs. long-term activity, correlated trends, implicit user actions, comments, list reordering, and more. I also built frameworks to serve general pages, search, and recommendations.
Clicker was one of the first Facebook instant personalization partners. I designed and built systems to utilize users’ interests to create a personalized recommendation experience. The system digested billions of user interests, matched them to TV shows and movies, correlated the data, and provided high quality recommendations.
In March, 2011, Clicker was sold to CBS Interactive. After joining CBSi, I managed a team which helped to integrate much of the Clicker technology into TV.com.
Shortly after I joined, I was asked to look at the advertising inventory system to see if I could solve some of their technology-related issues. I ended up building out an advertising inventory system that supported billions of ads in real time, optimized the price of ads sold, and reduced the time to compute overlapping inventory from many hours to a couple of seconds. The inventory was a challenge because there were billions of impressions being sold to hundreds of thousands of overlapping clients. The system needed to know not only what inventory was available, but also how best to deliver it to yield maximum monetization.
I led a team that helped re-platform last.fm, which has a base of over 106 billion scrobbles. As part of the process, I led the design of and wrote much of the backend for the last.fm system, including search, tagging, real-time recent activity, the primary user experience with scrobbles, and a pipeline to process scrobbles in real time. The system was deployed on a significantly smaller set of hardware and was more efficient and flexible than the older system.