High end storage/archiving

AluminumStudios

Senior member
Sep 7, 2001
628
0
0
Hi,

I work for the Center for Biomedical Informatics at the University of Pittsburgh. One of our research groups has a project that collects tons of sales information on medication as well as hospital records (anonymized for those of you with security/big brother concerns) and analyzes them for trends that could indicate a disease outbreak. It's part of an early warning system that can also be used to detect bioterrorism before people realize what is going on.

Anyway, they are beginning to collect huge amounts of data and want to look into a large scale backup and archiving mechanism. They are considering looking for a massive tape library (the data doesn't need to be on-line all of the time but it needs to be retrievable.)

My thoughts were a two-tiered system with a massive RAID for daily backups and semi-current data, then a moderate-sized tape mechanism for archiving/long-term storage.

Currently they have a 1 terabyte RAID and a 9-tape DLT auto-loader. They are looking into a solution that could hold many terabytes of data.

They are also, like all projects, on a budget and fortunately fairly modern thinking. They aren't hung up on overpriced SCSI RAIDs and such (their current TB of storage is actually IDE RAID). They just need reliability and decent performance.

Does anyone have any suggestions? Particularly, does anyone have suggestions for tape loaders with several terabytes of capacity and RAID systems with several terabytes of capacity that aren't over-the-top expensive 15k SCSI? (I really don't think they need top-of-the-line speed when reliability and capacity are the focus.)

Thanks.
 

Matthias99

Diamond Member
Oct 7, 2003
8,808
0
0
Back again? How'd that computing cluster project turn out? Did you end up going with the dual Opteron systems?

If you're really looking for multi-terabyte RAID1 or RAID5 capacity with decent performance, high reliability, and good long-term backup flexibility, the best choice is a true SAN or NAS storage solution. Of course, I'm biased, because I work in the storage industry. Those start in the $50-100K range, depending on configuration, size, and feature set.

If performance is really not a concern (that is, the array is being used mostly as space to dump backups to before they're offloaded to tape), then you might be able to get away with just setting up a big Samba or NFS server with Gigabit Ethernet connections to the hosts, and a couple SCSI RAID controllers with 10KRPM (or even slower) disks. SATA might even be a possibility if you used 300GB drives and got several external controller cards. Then you just have the hosts mount the network drives, and back up everything that way. But it's a huge strain on your network, performance won't be great, and you may not be able to access it while the data is being dumped to tape. The hardware to do this would probably cost under $10,000 -- but you won't have any software or hardware support when something goes wrong.
 

zephyrprime

Diamond Member
Feb 18, 2001
7,512
2
81
Well, the cheapest thing would just be a bunch of Maxtor Maxline 2 drives. But the interesting thing with this problem is the relative slowness of the network. I did some calcs, and if you're only able to back up one drive at a time over a GigE network, you would be looking at 69 hours for a full backup! Even if you could manage to saturate GigE, it would still take ~28 hours, which is no good for a daily backup. But maybe differential backups would make this a non-issue.
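The arithmetic behind those figures can be sketched roughly as below. The post doesn't state the data size or the throughput it used, so the 10 TB total, the ~40 MB/s single-drive rate, and the ~100 MB/s effective GigE rate are all assumptions, picked only because they roughly reproduce the 69-hour and ~28-hour numbers.

```python
def transfer_hours(total_bytes: int, bytes_per_second: float) -> float:
    """Hours needed to move total_bytes at a sustained transfer rate."""
    return total_bytes / bytes_per_second / 3600

TB = 10**12
MB = 10**6

# All three values are assumptions, not figures from the thread.
DATA_SIZE = 10 * TB            # assumed size of a full backup
SINGLE_DRIVE_RATE = 40 * MB    # assumed sustained rate of one drive, per second
GIGE_RATE = 100 * MB           # assumed effective rate of a saturated GigE link

single_drive_hours = transfer_hours(DATA_SIZE, SINGLE_DRIVE_RATE)
saturated_gige_hours = transfer_hours(DATA_SIZE, GIGE_RATE)

print(f"one drive at a time: {single_drive_hours:.1f} hours")
print(f"saturated GigE:      {saturated_gige_hours:.1f} hours")
```

Plugging in the real data size and a measured throughput would show quickly whether a nightly full backup fits in the window or whether differentials are needed.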
 

RickH

Senior member
Aug 5, 2000
784
0
76
You contact Dell or HP/Compaq--you do not try to rig up some homemade system. Your job, university grants and contracts, patient information could be lost if the system fails. Don't put your ass on the line--they are crazy if they even consider letting you design a system. Rick
 

Pariah

Elite Member
Apr 16, 2000
7,357
20
81
Originally posted by: RickH
You contact Dell or HP/Compaq--you do not try to rig up some homemade system. Your job, university grants and contracts, patient information could be lost if the system fails. Don't put your ass on the line--they are crazy if they even consider letting you design a system. Rick

I agree completely. For hardware of this nature and for this type of "real" work, a message board where you have no idea of the qualifications of the people giving you advice is the wrong place to be asking. If the success of my project and possibly my job reputation were on the line, I wouldn't want some 13 year old who couldn't tell me what SCSI stood for telling me what I should be buying. Univ. of Pitt has to have contacts at IBM, Dell, or someone else who can tell you what you need.
 

loosbrew

Golden Member
Oct 30, 2000
1,336
1
0
OK, here is your answer. Look into backing up to disk first, then backing that disk up to tape later on. Use a company like CommVault in conjunction with either an HP tape library or an EMC SAN or XIOTECH SAN. Works like a charm; however, the key is the backup software. If you want to restore quickly, back up to disk first, then back that disk up to tape later that day or night, so you have instant restore capabilities. It's costly, though. If you only need to archive data, use a tape library with a bunch of LTO2 drives, whose tapes can hold 200-400 GB each. Your best bet would be to call a local or major vendor (CDW etc.) and have them spec out a solution for your specific needs and environment.
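To get a feel for how big a library that implies, here is a rough tape-count sketch. The 200 GB and 400 GB figures are the LTO2 native and 2:1 compressed capacities mentioned above; the 10 TB archive size is purely an assumed example, and real compression ratios depend on the data.

```python
LTO2_NATIVE_GB = 200       # native capacity per LTO2 tape
LTO2_COMPRESSED_GB = 400   # nominal capacity at 2:1 compression

def tapes_needed(data_gb: int, tape_capacity_gb: int) -> int:
    """Number of tapes required, rounding up to whole tapes."""
    return (data_gb + tape_capacity_gb - 1) // tape_capacity_gb

ARCHIVE_GB = 10_000  # assumed example: 10 TB of archive data

native_tapes = tapes_needed(ARCHIVE_GB, LTO2_NATIVE_GB)
compressed_tapes = tapes_needed(ARCHIVE_GB, LTO2_COMPRESSED_GB)

print(f"uncompressed:    {native_tapes} tapes")
print(f"2:1 compression: {compressed_tapes} tapes")
```

The slot count of the library then needs to cover that range plus whatever headroom you want for growth and cleaning cartridges.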

Good Luck!

Luis
 

drag

Elite Member
Jul 4, 2002
8,708
0
0
Originally posted by: RickH
You contact Dell or HP/Compaq--you do not try to rig up some homemade system. Your job, university grants and contracts, patient information could be lost if the system fails. Don't put your ass on the line--they are crazy if they even consider letting you design a system. Rick


Whatever.

What matters most in this decision is uptime, reliability, and disaster recovery time.

That means big bucks.

Now if you need 150% reliable media then definitely go with the corporate designed/supported SAN/NAS whatever. That stuff is hard to do, and engineers design that stuff and do a good job at it.

At my work we have an old guy who helped implement IBM mainframe-type stuff for the government. A big part was dealing with the government's payroll. They had multiple redundant facilities spread throughout the country and had to maintain them in such a way that if one facility was completely wiped out for whatever reason, another facility would be able to kick in and take over completely in some unreasonably short amount of time, with no loss of function or loss of data. Something like under 1 minute or 10 minutes.

And that was with 1970s technology (since it was the 1970s). Now that is some serious mojo.


big bucks big bucks big bucks.

Now the nature of the data is important. If you can deal with a major catastrophe and 1 or 2 days of lost functionality is acceptable, then you can get away with a much cheaper system. Cheaper even than that if speed is not critical.

For instance by running a SMB or NFS share from a Linux server(s) you could possibly get away with something fairly cheap.

AS LONG AS YOU KEEP GOOD BACKUPS, then this is acceptable.

Something like this:

You need to store a MAXIMUM of 4 terabytes of data. So you set up a server farm of PCs to deal with it.
So you get 12 PCs and put a terabyte of disk space on 10 of them using normal 250+ GB drives. That gives you 10 terabytes of space.

Then you design your server network. You have 2 PC servers that are a front end for your data. Behind them are the 10 data storage servers on their own private secure network.

The 10 storage PCs reside on a high-performance network and communicate with each other via the OpenAFS distributed network file system. 5 of the PCs are active; the other 5 are backup servers that contain either replica or backup volumes of the files stored on the 5 active servers. The 5 backups can also share the burden of serving AFS volumes that contain files that are read often and changed rarely.

The 5 backups will back up changed volumes every half hour or so, depending on network demands.
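The capacity math for this layout works out roughly as in the sketch below. The four-drives-per-PC figure is an assumption (it is simply what 250 GB drives need to reach about a terabyte per box); the other numbers come from the description above.

```python
DRIVE_GB = 250        # "normal 250+ gig drives"
DRIVES_PER_PC = 4     # assumed: four drives give roughly 1 TB per PC
STORAGE_PCS = 10      # storage servers behind the 2 front ends
ACTIVE_PCS = 5        # the other 5 hold replica/backup volumes
REQUIRED_TB = 4       # maximum data to store

per_pc_tb = DRIVE_GB * DRIVES_PER_PC / 1000
raw_tb = per_pc_tb * STORAGE_PCS      # total disk across all storage PCs
usable_tb = per_pc_tb * ACTIVE_PCS    # only the active half holds live data
headroom_tb = usable_tb - REQUIRED_TB

print(f"raw: {raw_tb} TB, usable: {usable_tb} TB, headroom: {headroom_tb} TB")
```

So the 10 TB of raw space is really about 5 TB of live capacity once half the boxes are mirrors, which still leaves some room over the 4 TB maximum.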

That way if a disk crashed or a file server blew up, caught on fire, or was assaulted by terrorists, then only a maximum of one half hour of changes would be lost. Plus AFS has a local cache feature, so changes to data will be cached on the front-end SMB servers; even if a hard drive bites the dust, there is a good chance that you can still recover the changes from the past few minutes.

If the 5 active PCs get wiped out, then the 5 backup ones can step in immediately and replace them. A maximum of one half hour of work would be lost; the switchover from the backups to the front-end servers would take less than 2 minutes.

The front-end PCs will have the AFS share's root mounted on their filesystem and would in turn share that out via Samba to the Windows desktop clients. The front-end servers would be hardened and contain a firewall, virus scanners, and extra intrusion detection software, and would be where you hook up the tape backup drives.

Every night you do a full backup of the data; every lunchtime (if it's possible) you do an incremental backup. That way if you have a complete server/network meltdown, only a maximum of 6 or so hours of changes is lost.

Every weekend you perform one last complete backup, and then that backup plus the backup of the OS partitions is delivered to an off-site storage place. That way if the building burned down or got hit by a natural disaster, only a maximum of a week of changes would be lost.

Quality of tape backups will be tested once a month by making sure that the active servers are healthy, then wiping the drives of the backup servers and performing the backup restoration procedures.

Also, critical or hard-to-get hardware like the tape drives will have a duplicate device stored alongside the backup media in the off-site storage.

For the servers, in case they get wiped out, you just run down to CompUSA or something, buy out their hard drive selection, get some PCs, and cobble together a server system that will work until you get proper servers set up.

Of course you could do lots of variations on that. For example, the servers could have RAID setups so that if a hard drive fails you don't lose any information, plus it would allow for a higher level of performance. It would take an entire server failing to lose the 30 minutes or so of changes.

Doing religious backups and testing them often will compensate for the quality of the commodity hardware.

You could use DVD media to do the backups: each AFS volume would have a quota that is slightly smaller than the size of a single DVD, so that you only back up the volumes that get changed. That can make backups quicker. Then you store the DVDs for a couple of years before you throw them away.
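A quick sketch of what the one-volume-per-DVD idea means in numbers. The 4.7 GB single-layer disc size is standard; the 4.3 GB quota, the 4 TB data set, and the `discs_to_burn` helper are assumptions for illustration and are not part of OpenAFS.

```python
DVD_CAPACITY_MB = 4700   # single-layer DVD, decimal megabytes
VOLUME_QUOTA_MB = 4300   # assumed quota, slightly under one disc

def volumes_for(data_mb: int) -> int:
    """Volumes needed to hold data_mb, rounding up to whole volumes."""
    return (data_mb + VOLUME_QUOTA_MB - 1) // VOLUME_QUOTA_MB

def discs_to_burn(changed_flags: list[bool]) -> int:
    """One disc per changed volume; unchanged volumes are skipped."""
    return sum(1 for changed in changed_flags if changed)

total_volumes = volumes_for(4_000_000)  # assumed 4 TB data set

# Hypothetical night where only the first 20 volumes were modified.
changed = [i < 20 for i in range(total_volumes)]
nightly_discs = discs_to_burn(changed)

print(f"{total_volumes} volumes total, {nightly_discs} discs tonight")
```

A full set is a lot of discs, so this only makes sense if most volumes really do sit unchanged from one backup to the next.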

However, if you can't do something like that and stick to it, or it seems excessive, or you need 150% uptime, then the only solution would be a pre-built SAN-like thing from IBM/HP/Dell/whatever.
 

AluminumStudios

Senior member
Sep 7, 2001
628
0
0
Originally posted by: Matthias99
Back again? How'd that computing cluster project turn out? Did you end up going with the dual Opteron systems?

We ordered two dual-Opteron 248 systems with 4 gigs of RAM each from Appro. We haven't received them yet, but I'm looking forward to it soon!

To address comments by others, this is not a hospital clinical system, it will NOT be housing patient records for clinical offices.

Lots of systems in research are "self designed and built," especially when money is tight, because commercial solutions are so expensive (corporate and enterprise level things are where tech companies make their money, after all). A SAN developed for a larger-scale operation is probably not what they are looking for.

This backup/storage system will only connect to a handful of other systems and can probably be connected via dedicated gigabit Ethernet lines, so killing our network isn't too much of a concern, nor is mega-performance. It will be more of a place to offload data to from the servers after it's analyzed, so it will accumulate terabytes of data over time rather than copying that much on a daily basis. Sorry if I wasn't clear on these things.

I'm thinking a system with a large RAID that holds data for the previous couple of weeks before offloading it to a big tape changer library would probably be best in this situation. I'm familiar enough with drive systems to know where to look for various RAID devices and how to build systems capable of holding large RAIDs, but one area I know little about is large tape libraries beyond things like 9-tape DLT auto-loaders. I guess the reason for this post was to see if anyone could link me directly to any products of this nature and give me a little background info/recommendations on them.
 

drag

Elite Member
Jul 4, 2002
8,708
0
0
What are your needs?

How much storage do you need, and how often does it get accessed?

What is an acceptable downtime, and what is an acceptable budget?
 