- Feb 18, 2015
I'm going to be doing a project over the next month where I set up and test a few open source web crawlers and search engines. I've fooled around with Open Search Server on my laptop (MacBook Air, Intel Haswell i5 at 1.3 GHz, 8 GB RAM) and my poor little machine is completely tapped out.
I would like to build a Linux machine to run all of these tests. As the bulk of the time in web crawling is spent waiting for data, the more threads the merrier. I would like to spend about $500 on this project but could stretch a little higher if it would result in markedly better performance. I would prefer to buy off of eBay/Craigslist, as this is a one-time project for me and I'd like to keep the overall costs down.
What motherboards/CPUs/memory do you guys recommend for a project like this one? I don't mind two- or three-year-old technology if it can do what I need it to do (in fact I would prefer it, as it will be at a discount to 2014 product). I'd like a mid-sized case, as I am not sure my wife will appreciate a full tower case hanging around the apartment. I do not play computer games, so I do not think I will need a graphics card. Should I use a small SSD for system software and then a multi-terabyte HDD for data? Or will it work just fine if I keep everything on a single HDD?
I am planning on testing a few engines that are based on vector space models, but other than Apache Nutch/Solr, do you know of any other search engine packages that use an implementation of the PageRank algorithm or some other link analysis algorithm to judge the relevancy of search results? The vector space model may have worked well when the web was small, but it is giving me very poor results today.
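To make the "more threads the merrier" point concrete, here is a rough Python sketch of an I/O-bound workload. It makes no real network requests: `fake_fetch`, the example URLs, and the 20 ms latency are all made-up stand-ins for a crawler waiting on remote servers, so the timings only illustrate the idea and say nothing about any particular crawler.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url, latency=0.02):
    """Hypothetical stand-in for a page fetch: sleeps to simulate network wait."""
    time.sleep(latency)
    return url, len(url)


def crawl(urls, workers):
    """Fetch every URL using `workers` threads; return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fake_fetch, urls))
    return results, time.perf_counter() - start


# Placeholder URLs; nothing is actually requested.
urls = [f"http://example.com/page{i}" for i in range(20)]

results, serial = crawl(urls, workers=1)      # one fetch at a time
_, threaded = crawl(urls, workers=10)         # ten fetches waiting concurrently

print(f"1 thread:   {serial:.2f} s")
print(f"10 threads: {threaded:.2f} s")
```

Because the threads spend nearly all their time sleeping (waiting), adding workers cuts wall-clock time without needing more CPU cores, which is why thread count and memory may matter more than raw clock speed for this kind of job.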
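For anyone unfamiliar with what I mean by link analysis, below is a minimal sketch of PageRank by power iteration in Python. It is only an illustration under my own assumptions (a damping factor of 0.85, a fixed 50 iterations, and a made-up four-page `toy_web` graph); it is not how Nutch or any other package implements it.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping page -> list of outgoing links."""
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outgoing links): spread its rank over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Made-up toy graph: "c" is linked to by every other page.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

The point is that the score depends only on who links to whom, not on the words in the page, which is the signal a pure vector space model lacks.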
Answers for the questions in the sticky:
450 to 500 dollar budget
I will be buying parts in the United States
I have no brand preference. I just need the system to work as intended
I do not have any parts that I will be using
I definitely will not be overclocking
I will be using an old monitor that I will borrow from a friend. Not sure what its resolution is. It was a standard issue office/work monitor from about 5 years ago.
I will build it in the next couple of weeks
I'm planning on using open source software for everything. No Windows, MS Office, etc.