How large would a plain text file be of the entire human genome?

JMWarren

Golden Member
Nov 6, 2003
1,201
0
0
I dunno, but Word doesn't like it when you try to copy and paste it out....
Apparently Word has a page maximum...who would have known!
 

bobsmith1492

Diamond Member
Feb 21, 2004
3,875
3
81
Well, so you know 1 letter obviously = 1 byte (right...?), and I leave the conclusion to you. Remember 1KB = 1024 bytes and so on and so forth...
 

Gigantopithecus

Diamond Member
Dec 14, 2004
7,665
0
71
Actually, it wasn't for a homework assignment (stay in school long enough & they stop giving you homework). I have the sequences of the chromosomes downloaded from Project Gutenberg, but the files only total 400mb, which I thought was way too small. Apparently I was right, if 1 letter = 1 byte, then 3,000,000,000 letters should be just under 3gb, right?
(3,000,000,000 bytes / 1024 ≈ 2,929,687 KB; 2,929,687 KB / 1024 ≈ 2,861 MB; 2,861 MB / 1024 ≈ 2.79 GB)
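
Here is the same arithmetic as a quick Python sketch. It assumes exactly one ASCII letter per base and nothing else in the file (no line breaks, no headers), which real downloads will not match exactly. The name `binary_sizes` is just made up for this example.

```python
# Assumption: one byte per base, 3,000,000,000 bases, no newlines or headers.
BASES = 3_000_000_000

def binary_sizes(n_bytes):
    """Return (KB, MB, GB) using 1024-based units, as in the post above."""
    kb = n_bytes / 1024
    mb = kb / 1024
    gb = mb / 1024
    return kb, mb, gb

kb, mb, gb = binary_sizes(BASES)
print(f"{mb:.2f} MB, {gb:.2f} GB")
```

So "just under 3 GB" holds up for a one-letter-per-byte text file.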
 

spidey07

No Lifer
Aug 4, 2000
65,469
5
76
one byte would equal one pair. Not letters.

Even then you could fit multiple pairs in a single byte. You can do the math. one byte = 256 combinations. divide that by the number of pair combinations and you have how many pairs you can fit in a byte.
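
A small sketch of that math, with one caveat: dividing 256 by 4 gives 64, which is not the number of letters per byte. What you want is the largest n such that (alphabet size)^n still fits in 256 combinations. The function name `symbols_per_byte` is hypothetical, and the 5-letter case is only there for comparison.

```python
def symbols_per_byte(alphabet_size):
    """Largest n such that alphabet_size ** n <= 256 (one byte)."""
    if alphabet_size < 2:
        raise ValueError("need at least two distinct symbols")
    n = 0
    while alphabet_size ** (n + 1) <= 256:
        n += 1
    return n

print(symbols_per_byte(4))  # 4-letter alphabet: A, C, G, T
print(symbols_per_byte(5))  # hypothetical 5-letter alphabet
```

With four letters, 4^4 = 256 exactly, so four of them fit in one byte with nothing wasted.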
 

eLiu

Diamond Member
Jun 4, 2001
6,407
1
0
I assume they don't put these into a straight text file. For one thing, a one-byte character can take 256 different values (plain ASCII only uses 128 of them). Well, in DNA (let's count RNA too for kicks), there are only 5 letters: A, C, G, T, and U.

And then sequences of 3 encode single amino acids (or something like that, I don't remember much biology), and there are what... like 20 of those?

So yeah there's tons of room for compression just by shrinking it down this way. Not to mention actual compression algorithms that can take these alphabets and assign them smaller codewords.
 

gsellis

Diamond Member
Dec 4, 2003
6,061
0
0
Originally posted by: spidey07
one byte would equal one pair. Not letters.

Even then you could fit multiple pairs in a single byte. You can do the math. one byte = 256 combinations. divide that by the number of pair combinations and you have how many pairs you can fit in a byte.
2 bits would be all you needed as raw (uncompressed). You have 4 possible answers. A, T, C, G. If A = 00, T = 01, C = 10, and G = 11, 2 bits. 4 pairs per byte. Remember that you do not need to store the pair. You only need one side and the sequence.
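
A minimal sketch of that packing in Python, using the A = 00, T = 01, C = 10, G = 11 mapping from the post. The `pack` and `unpack` names are made up for illustration, and this assumes the input contains only those four letters (real sequence files also contain things like N for unknown bases, which would need separate handling).

```python
# Mapping taken from the post above; everything else is illustrative.
CODE = {"A": 0b00, "T": 0b01, "C": 0b10, "G": 0b11}
LETTER = {bits: base for base, bits in CODE.items()}

def pack(seq):
    """Pack a string of A/T/C/G into bytes, 4 bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        # Left-align a short final chunk so bases always start at the top bits.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)

def unpack(data, n_bases):
    """Reverse of pack; n_bases is needed to drop padding in the last byte."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(LETTER[(byte >> shift) & 0b11])
    return "".join(bases[:n_bases])

packed = pack("GATTACA")
print(len(packed), unpack(packed, 7))
```

Note that the length has to be stored somewhere, since the padding bits in the last byte would otherwise decode as extra A's.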
 

Gibsons

Lifer
Aug 14, 2001
12,530
35
91
Originally posted by: eLiu
I assume they don't put these into a straight text file. For one thing, a one-byte character can take 256 different values (plain ASCII only uses 128 of them). Well, in DNA (let's count RNA too for kicks), there are only 5 letters: A, C, G, T, and U.

And then sequences of 3 encode single amino acids (or something like that, I don't remember much biology), and there are what... like 20 of those?

So yeah there's tons of room for compression just by shrinking it down this way. Not to mention actual compression algorithms that can take these alphabets and assign them smaller codewords.

It's available as plain text or a zip file, divided up by chromosome. hope this link works... Text

 

CycloWizard

Lifer
Sep 10, 2001
12,348
1
81
Originally posted by: gsellis
2 bits would be all you needed as raw (uncompressed). You have 4 possible answers. A, T, C, G. If A = 00, T = 01, C = 10, and G = 11, 2 bits. 4 pairs per byte. Remember that you do not need to store the pair. You only need one side and the sequence.
:thumbsup:

Take the total number of bases counting both strands, divide by two (you only need one side of the sequence), then multiply by two (two bits per base). That gives the size in bits; divide by eight for bytes.
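
Worked out as a sketch, assuming the 3,000,000,000 figure from earlier in the thread is the number of base pairs (so 6,000,000,000 bases counting both strands). The variable names are just for illustration.

```python
# Assumption: 3 billion base pairs, i.e. 6 billion bases over both strands.
BASES_BOTH_STRANDS = 6_000_000_000

one_strand = BASES_BOTH_STRANDS // 2   # only one side needs storing
total_bits = one_strand * 2            # two bits per base
total_bytes = total_bits // 8          # eight bits per byte

print(total_bytes)                     # size in bytes
print(round(total_bytes / 1024**2))    # size in MB (1024-based)
```

That comes to 750,000,000 bytes, roughly 715 MB, before any real compression.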
 

Googer

Lifer
Nov 11, 2004
12,571
4
81
I do know that if it were put on CD-ROMs and placed in standard plastic cases, the stack would reach over 20 feet in the air. The Museum of Science and Industry in Tampa has a display of CD cases in the shape of a double helix as a demonstration of its size.
 

JHutch

Golden Member
Oct 11, 1999
1,040
0
0
Originally posted by: Gigantopithecus
I have the sequences of the chromosomes downloaded from Project Gutenberg, but the files only total 400mb, which I thought was way too small.

Are you sure you downloaded everything? Chromosome 1 is 273.78 MB alone. Chromosome 2 is 245.99 MB, etc ... Quick glance shows that each chromosome averages about 200MB. Plus there is a separate X and Y sequence. Quite a bit more than 400MB of data there...

See http://www.gutenberg.org/etext/3501 thru 3524 for Gutenberg files.

JHutch

EDIT - Granted these are uncompressed text files. Simple zip compression would make it MUCH smaller.
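
A rough way to see this for yourself, sketched with Python's `zlib` module (the same DEFLATE method zip files normally use). The sequence here is random A/C/G/T generated on the spot, which is an assumption for the sake of a self-contained demo; the real chromosome files have repeats and line breaks, so their ratio will differ.

```python
import random
import zlib

# Assumption: a random 200,000-letter sequence stands in for real data.
random.seed(0)
seq = "".join(random.choice("ACGT") for _ in range(200_000)).encode("ascii")

compressed = zlib.compress(seq, 9)   # level 9 = best compression
ratio = len(compressed) / len(seq)

print(len(seq), len(compressed), f"{ratio:.2f}")
```

Even on random letters the output cannot get below a quarter of the original, since four equally likely letters carry two bits each, which lines up with the 2-bits-per-base argument earlier in the thread.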
 

Googer

Lifer
Nov 11, 2004
12,571
4
81
Originally posted by: JHutch
Originally posted by: Gigantopithecus
I have the sequences of the chromosomes downloaded from Project Gutenberg, but the files only total 400mb, which I thought was way too small.

Are you sure you downloaded everything? Chromosome 1 is 273.78 MB alone. Chromosome 2 is 245.99 MB, etc ... Quick glance shows that each chromosome averages about 200MB. Plus there is a separate X and Y sequence. Quite a bit more than 400MB of data there...

See http://www.gutenberg.org/etext/3501 thru 3524 for Gutenberg files.

JHutch



EDIT - Granted these are uncompressed text files. Simple zip compression would make it MUCH smaller.

It would, but WinZip does not work on real human genes.
 