How do large companies capture, store, and validate their websites?

SarcasticDwarf

Diamond Member
Jun 8, 2001
9,574
1
76
This came up in a class this evening. How do large companies with highly dynamic websites capture, store, and validate their websites?

I ask because we were discussing how a "normal" backup method does not necessarily account for the minute-to-minute changes in a website.
 

FoBoT

No Lifer
Apr 30, 2001
63,089
12
76
fobot.com
dynamic websites are just databases, or many databases working together
databases by their nature are dynamic
not sure what you are asking
 

SarcasticDwarf

Diamond Member
Jun 8, 2001
9,574
1
76
Originally posted by: FoBoT
dynamic websites are just databases, or many databases working together
databases by their nature are dynamic
not sure what you are asking

Let's take an example:

You have a website with millions of products held within a database. Now for whatever reason (lawsuit, federal investigation, whatever) you need to produce a RECORD from your website. That record is a page on your website.

How do you prove what example.html looked like at exactly 19:27:54?
 

KingGheedora

Diamond Member
Jun 24, 2006
3,248
1
81
Data that changes minute to minute is usually in databases. Companies run those on redundant drive arrays, and also employ clustering and/or replication, where a separate server or servers mirror the data on the active ("live") server. These mirrors can act as warm spares; in more sophisticated setups the warm spare can take over almost immediately after the main server fails.

Replication is also used to keep fairly up-to-date redundant copies on other database servers.

Nightly, weekly, and monthly backups are also kept.
 

drinkmorejava

Diamond Member
Jun 24, 2004
3,567
7
81
Just to take a guess at one of the many ways to do it: at 19:27:54, have the master DB server stop replicating to a slave DB server, then back up the slave.
 

FoBoT

No Lifer
Apr 30, 2001
63,089
12
76
fobot.com
history

the tables in the database have to include history if the data is changing and needs to be kept over time

so whatever information needs to be retrieved is pulled from the database as it existed at 19:27:54 on whatever day is in question
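
As a rough illustration of the history idea, here is a minimal Python sketch using the standard-library sqlite3 module. The table name `product_history`, its columns, and the dates are all hypothetical, chosen only for the example. Changes are appended rather than overwritten, and an "as of" query picks the most recent row at or before the moment in question.

```python
import sqlite3

def make_db():
    # In-memory database; the schema is an assumption for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product_history ("
        " product_id INTEGER, price REAL, changed_at TEXT)"
    )
    return conn

def record_change(conn, product_id, price, changed_at):
    # Never UPDATE in place: every change is appended as a new row.
    conn.execute(
        "INSERT INTO product_history VALUES (?, ?, ?)",
        (product_id, price, changed_at),
    )

def price_as_of(conn, product_id, moment):
    # ISO-8601 timestamps sort lexicographically, so string comparison works.
    row = conn.execute(
        "SELECT price FROM product_history"
        " WHERE product_id = ? AND changed_at <= ?"
        " ORDER BY changed_at DESC LIMIT 1",
        (product_id, moment),
    ).fetchone()
    return row[0] if row else None

conn = make_db()
record_change(conn, 1, 9.99, "2008-01-15T12:00:00")   # hypothetical data
record_change(conn, 1, 12.49, "2008-01-15T19:30:00")
print(price_as_of(conn, 1, "2008-01-15T19:27:54"))
```

The query returns the 12:00:00 price because the 19:30:00 change had not happened yet at 19:27:54. A page could then be re-rendered from rows fetched this way, assuming the page template itself is also versioned.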
 

hellman69

Member
Feb 15, 2003
180
0
71
Most websites like that are dynamic, driven by a database. For example, take eBay. You see millions of listings, but they only have a few actual web pages (files that need to be backed up) that display that information. Those pages query the database and display all those millions of listings. So for backup purposes, the actual web page files probably change, at most, once a week. That's pretty easy to back up.

The database server is the important part, and here is the typical backup plan. Do a full backup once a day. During the day, store and back up the change log. The change log can be applied to the full backup to re-create the changes for the day.

Disclaimer: I program a site similar to eBay. We have about 200 distinct web page files that we back up daily. We have over 30,000 listings that change by the hour. The result is that, as a user, you will see 30,000+ different web pages that are changing minute by minute, but it is really an illusion. This is how we back up our website. Worst case, we will lose the last 5 minutes of changes before a disaster.
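
To make the "full backup plus change log" plan concrete, here is a small Python sketch. It is not how any particular database engine implements log replay; the in-memory dictionaries, the `restore` function, and the sample listings are all hypothetical stand-ins for a real backup file and transaction log.

```python
import copy

def restore(full_backup, change_log, up_to):
    """Rebuild state by replaying logged changes onto the last full backup."""
    state = copy.deepcopy(full_backup)  # leave the backup itself untouched
    for timestamp, listing_id, fields in change_log:
        if timestamp > up_to:
            break  # the log is assumed to be in timestamp order
        if fields is None:
            state.pop(listing_id, None)  # None marks a deleted listing
        else:
            state.setdefault(listing_id, {}).update(fields)
    return state

# Hypothetical nightly full backup and same-day change log (HH:MM:SS strings).
full_backup = {101: {"title": "Lamp", "bid": 5.00}}
change_log = [
    ("19:10:00", 101, {"bid": 7.50}),
    ("19:25:00", 102, {"title": "Chair", "bid": 20.00}),
    ("19:40:00", 101, None),
]

state = restore(full_backup, change_log, "19:27:54")
print(state)
```

Replaying up to 19:27:54 applies the first two log entries and stops before the deletion at 19:40:00, so listing 101 still exists with the 7.50 bid and listing 102 has been added.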

Trevor
 

KingGheedora

Diamond Member
Jun 24, 2006
3,248
1
81
Originally posted by: SarcasticDwarf
Originally posted by: FoBoT
dynamic websites are just databases, or many databases working together
databases by their nature are dynamic
not sure what you are asking

Let's take an example:

You have a website with millions of products held within a database. Now for whatever reason (lawsuit, federal investigation, whatever) you need to produce a RECORD from your website. That record is a page on your website.

How do you prove what example.html looked like at exactly 19:27:54?

Most companies cannot do this in as much detail as you describe. They'd need to track changes to all records (some companies might do this).

Most companies do keep historical versions of some of their data. The mechanism for doing this though is to just keep a version of each record with timestamp indicators of when the record became active and when it became invalid (changed).

This is one aspect of data warehousing. But data warehousing is not necessarily the only reason people would version their data.
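
A minimal sketch of that versioning mechanism, in Python with the standard-library sqlite3 module. The table `product_version`, the `valid_from` / `valid_to` column names, and the sample prices are assumptions made up for the example; a NULL `valid_to` is used here to mark the currently active version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_version ("
    " product_id INTEGER, price REAL,"
    " valid_from TEXT NOT NULL, valid_to TEXT)"  # NULL valid_to = still active
)

def set_price(conn, product_id, price, now):
    # Close the currently active version, then open a new one.
    conn.execute(
        "UPDATE product_version SET valid_to = ?"
        " WHERE product_id = ? AND valid_to IS NULL",
        (now, product_id),
    )
    conn.execute(
        "INSERT INTO product_version VALUES (?, ?, ?, NULL)",
        (product_id, price, now),
    )

def version_at(conn, product_id, moment):
    # The active version is the one whose interval contains the moment.
    return conn.execute(
        "SELECT price, valid_from, valid_to FROM product_version"
        " WHERE product_id = ? AND valid_from <= ?"
        " AND (valid_to IS NULL OR valid_to > ?)",
        (product_id, moment, moment),
    ).fetchone()

set_price(conn, 7, 19.99, "2008-01-15T08:00:00")  # hypothetical data
set_price(conn, 7, 24.99, "2008-01-15T19:30:00")
print(version_at(conn, 7, "2008-01-15T19:27:54"))
```

The lookup returns the 19.99 row together with the interval during which it was active, which is the kind of evidence the original question is after.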
 

Safeway

Lifer
Jun 22, 2004
12,081
9
81
Originally posted by: FoBoT
history

the tables in the database have to include history if the data is changing and needs to be kept over time

so what ever information needs to be retrieved is pulled from the database as it existed at 19:27:54 on whatever day is in question

Yup, like Wikipedia. You can see exactly what it looked like at a given time on a given date.
 

Safeway

Lifer
Jun 22, 2004
12,081
9
81
Originally posted by: KingGheedora
Originally posted by: SarcasticDwarf
Originally posted by: FoBoT
dynamic websites are just databases, or many databases working together
databases by their nature are dynamic
not sure what you are asking

Let's take an example:

You have a website with millions of products held within a database. Now for whatever reason (lawsuit, federal investigation, whatever) you need to produce a RECORD from your website. That record is a page on your website.

How do you prove what example.html looked like at exactly 19:27:54?

Most companies cannot do this in as much detail as you describe. They'd need to track changes to all records (some companies might do this).

Most companies do keep historical versions of some of their data. The mechanism for doing this though is to just keep a version of each record with timestamp indicators of when the record became active and when it became invalid (changed).

Tracking changes is easier than tracking multiple versions of the same content.
 

KingGheedora

Diamond Member
Jun 24, 2006
3,248
1
81
Originally posted by: Safeway
Originally posted by: KingGheedora
Originally posted by: SarcasticDwarf
Originally posted by: FoBoT
dynamic websites are just databases, or many databases working together
databases by their nature are dynamic
not sure what you are asking

Let's take an example:

You have a website with millions of products held within a database. Now for whatever reason (lawsuit, federal investigation, whatever) you need to produce a RECORD from your website. That record is a page on your website.

How do you prove what example.html looked like at exactly 19:27:54?

Most companies cannot do this in as much detail as you describe. They'd need to track changes to all records (some companies might do this).

Most companies do keep historical versions of some of their data. The mechanism for doing this though is to just keep a version of each record with timestamp indicators of when the record became active and when it became invalid (changed).

Tracking changes is easier than tracking multiple versions of the same content.

Do you mean with text? I'm not sure it's that easy if you are tracking database records (product IDs, product prices, customer info, etc.) rather than "content", like Wikipedia pages. It seems like it'd be easier to keep copies of each version of each record (and I know that people do this) than to come up with some way of tracking only the changes; I'm not sure how that would be done.
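
For what it's worth, here is one hypothetical way the two approaches could look for database-style records, sketched in Python. The helper names (`make_delta`, `apply_delta`, `reconstruct`) and the sample record are invented for illustration; real systems may track changes quite differently.

```python
def make_delta(old, new):
    # Field-level delta: fields that were added or changed, plus removed keys.
    changed = {k: v for k, v in new.items() if k not in old or old[k] != v}
    removed = [k for k in old if k not in new]
    return {"set": changed, "unset": removed}

def apply_delta(record, delta):
    result = dict(record)  # copy so earlier versions are not mutated
    result.update(delta["set"])
    for key in delta["unset"]:
        result.pop(key, None)
    return result

# Three successive versions of one hypothetical product record.
versions = [
    {"id": 42, "price": 19.99, "stock": 10},
    {"id": 42, "price": 24.99, "stock": 10},
    {"id": 42, "price": 24.99, "stock": 9, "on_sale": True},
]

# Approach 1: keep every full version (simple reads, more storage).
full_copies = list(versions)

# Approach 2: keep the first version plus deltas (less storage, replay on read).
base = versions[0]
deltas = [make_delta(a, b) for a, b in zip(versions, versions[1:])]

def reconstruct(base, deltas, version_index):
    record = base
    for delta in deltas[:version_index]:
        record = apply_delta(record, delta)
    return record

print(deltas)
print(reconstruct(base, deltas, 2) == full_copies[2])
```

Keeping full copies makes "what did it look like at time T" a single lookup, while deltas trade storage for the extra replay step, which matches the intuition that full copies are the simpler option for small records.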
 