I got an idea about how a program would work to compress already compressed lossless

BirdDad

Golden Member
Nov 25, 2004
1,131
0
71
My idea compresses data that has been RARed or Zipped or h.264 or whatever compression it is; my idea would further compress it. How do I approach a seasoned programmer to explain the concept to her/him while protecting myself from having my idea ripped off?
 
Feb 25, 2011
16,823
1,493
126
Presumably there's math involved. Patent the algorithm. Or keep it a trade secret like Google and their search. (Their employees would have to sign an NDA.)
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,284
3,905
75
I have to say that, whatever your method is, I doubt it's likely to work. Why? If your method can compress anything that's already compressed, then it should be able to compress something it has already compressed, right? In theory, this would lead to everything being compressed to one bit, and one bit can't represent every possible input.

Or read Wikipedia's more mathematical proof of the same kind of thing: No lossless compression can compress all possible inputs.
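The counting argument behind that proof can be checked in a few lines of Python. This is only an illustrative sketch; the function names are made up for the example and are not from any library.

```python
def count_inputs(n_bits: int) -> int:
    """Number of distinct bit strings that are exactly n_bits long."""
    return 2 ** n_bits


def count_shorter_outputs(n_bits: int) -> int:
    """Number of distinct bit strings strictly shorter than n_bits (lengths 0 to n_bits - 1)."""
    return sum(2 ** length for length in range(n_bits))


for n in (1, 8, 16):
    inputs = count_inputs(n)
    outputs = count_shorter_outputs(n)
    # There is always one fewer shorter string than there are inputs, so at
    # least two inputs would have to share an output if every input shrank.
    print(f"{n}-bit inputs: {inputs}, shorter outputs available: {outputs}")
```

Because there are always fewer shorter strings than inputs, a lossless scheme that shrinks some inputs has to leave others the same size or make them larger.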
 

BirdDad

Golden Member
Nov 25, 2004
1,131
0
71
It can compress something that has already been compressed, but it cannot further compress something that it has compressed itself; the file size would actually get bigger if you tried to do this. How would I go about forming an NDA?
I don't want to patent it. That would expose how it works, and I want to keep that a secret. Otherwise, there are many ways of doing it, and some company would rip me off.
 

Spungo

Diamond Member
Jul 22, 2012
3,217
2
81
I have to say that, whatever your method is, I doubt it's likely to work. Why? If your method can compress anything that's already compressed, then it should be able to compress something it has already compressed, right? In theory, this would lead to everything being compressed to one bit, and one bit can't represent every possible input.
I always wondered about this as a kid. I later tested it to find that, at some point, the zipped result is actually larger than the initial result.

Here's a real-world test I'm running. The initial file is a 67,509 KB .wmv file.
7zipped once: 67,135 KB
7zipped twice: 67,139 KB (larger than the first time zipped)
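A similar experiment can be sketched with Python's standard `lzma` module. The assumptions here are loud ones: the module is not 7-Zip itself, and the synthetic word data below is a stand-in, not the .wmv file from the test above, so the byte counts will differ.

```python
import lzma
import random

# Synthetic, compressible test data (an assumption made for this sketch).
rng = random.Random(0)
words = [b"frame", b"pixel", b"audio", b"video", b"codec", b"block", b"motion", b"sample"]
data = b" ".join(rng.choice(words) for _ in range(50_000))

once = lzma.compress(data)    # first pass removes the redundancy
twice = lzma.compress(once)   # second pass has almost nothing left to remove

print("original:        ", len(data), "bytes")
print("compressed once: ", len(once), "bytes")
print("compressed twice:", len(twice), "bytes")

# Both passes are lossless: undoing them in reverse order restores the input.
restored = lzma.decompress(lzma.decompress(twice))
print("round trip ok:", restored == data)
```

On data like this the second pass typically comes out slightly larger than the first, because the container overhead is added again while no redundancy is left to pay for it.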

It's important to ask what kind of compression is being used. What are you compressing? What kind of data is it? Compressing video and compressing text are completely different. For a video file, the compression thinks in terms of frames. For frames 1, 2, 3, 4 and 5, which parts are common? Which parts in the same frame are common? Cartoons often compress very easily because humans lack imagination. If Homer is wearing blue pants, his pants are blue. Just blue. In real life, his pants would be infinite shades of blue, they might have stains, they might have flaws, there might be holes in the cloth or lint stuck to it. This means a cartoon and real life should have slightly different compression methods, and this would be radically different from text compression, which is radically different from sound compression.
This is why 7-zip's LZMA compression on a .wmv file doesn't work very well. LZMA is not for video, and you can't use DivX for Word documents.

I said this in another thread, but the theory behind compression and all other computing is probably older than any of us on this forum. What holds us back is computer hardware. Sure, you can take 20 passes over a 1080p movie and encode something unbelievably tight, but we don't do that because it's not practical. Nobody wants a single video encoding to take a week or more. I find myself taking all kinds of shortcuts with LZMA compression. If I'm compressing 10 GB of stuff, I don't always want to wait an hour to compress it as much as possible. I want something that is done in the time it takes for me to walk to the coffee room, socialize a bit, and walk back to my desk. Sometimes I don't even bother to zip things. I want it done now, so it gets tarballed with no compression.
 

BirdDad

Golden Member
Nov 25, 2004
1,131
0
71
I need an expert who can optimize and write new code based on my requirements.
But I need to be protected and don't know how to go about that. I think that the procedure would be to get a lawyer to draw up the NDA document and then have potential programmers video chat with me. When I have found a good one, I would send them the document, have a notary witness that they have signed it, and have them send it back to me along with a fax of their ID (I have no money to travel and I live in a small town).
Please correct me if I am wrong about the procedure or if you see potential pitfalls with the signing procedure.
Thanks
 

repoman0

Diamond Member
Jun 17, 2010
4,544
3,472
136
Or read Wikipedia's more mathematical proof of the same kind of thing: No lossless compression can compress all possible inputs.

This. Specialized lossless compression such as FLAC achieves its compression ratio by linearly predicting future samples from past samples, on the assumption that an audio waveform won't change that much over time. It then stores the linear predictor and uses a specialized coding scheme to compress the residuals (the differences between the actual values and the predicted ones). That scheme does particularly well at storing small values, the idea being that the predictor is hopefully decent. The point is, any real compression on media sources like this requires a model to fit the data to.

We already have algorithms that are very close to or at the Shannon limit for data sources where the bits are assumed to be independent. So unless your algorithm uses a predictor or model, or otherwise assumes some structure in the data, it is already mathematically proven that it will do no better than what we already have, because what we already have is the best possible in this case. More importantly, the assumption has to be reasonable: you can compress any type of file with the FLAC algorithm, for example, but try compressing a text or EXE file with it and it will probably end up larger, because the model doesn't fit the data. If you do better than this, then it is necessarily not lossless, and the output of the decompression will not match the original for at least some inputs.
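The predict-then-store-residuals idea can be sketched as follows. This is a simplified illustration, not FLAC's actual encoder: it assumes a fixed second-order predictor (predict each sample as `2 * previous - the one before`), uses a synthetic sine wave as the "audio", and leaves out the stage that entropy-codes the residuals. The function names are hypothetical.

```python
import math


def encode_residuals(samples):
    """Replace each sample (after two warm-up samples) by its prediction error."""
    residuals = []
    for n, x in enumerate(samples):
        if n < 2:
            residuals.append(x)  # warm-up samples are stored verbatim
        else:
            prediction = 2 * samples[n - 1] - samples[n - 2]
            residuals.append(x - prediction)
    return residuals


def decode_residuals(residuals):
    """Rebuild the original samples exactly from the residuals."""
    samples = []
    for n, r in enumerate(residuals):
        if n < 2:
            samples.append(r)
        else:
            samples.append(r + 2 * samples[n - 1] - samples[n - 2])
    return samples


# Synthetic smooth waveform standing in for audio (an assumption of this sketch).
samples = [round(10000 * math.sin(2 * math.pi * n / 200)) for n in range(1000)]
residuals = encode_residuals(samples)

print("largest sample magnitude:  ", max(abs(s) for s in samples))
print("largest residual magnitude:", max(abs(r) for r in residuals[2:]))
print("lossless round trip:", decode_residuals(residuals) == samples)
```

The residuals are much smaller than the samples only because the smooth-waveform assumption fits this input. Feed the same functions noise or text bytes and the residuals are no smaller, which is the model-mismatch point made above.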

I'm no information theory expert, but I spent half a year or so studying it in detail. If you are one and you've thought about all this already, then good luck.
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,284
3,905
75
I think that the procedure would be to get a lawyer...I have no money...
I think I see (another) flaw in your plan. :sneaky:

But, OK, if you're serious about this...

1. Yes, you need an NDA. A lawyer would probably be good, but you might be able to find something online that might hold up in court. (Assuming you could either afford to go to court or you got to a point where the potential revenue/settlement from the invention was of sufficient value to interest a lawyer.) When I've signed NDAs, even when I did it remotely and faxed them in, I haven't needed a notary, but if you feel the need you can go ahead. You'll probably have to pay for the notarization, though.

2. You'll also want to draw up a contract. If you got a lawyer, they might be somewhat helpful in this. The contract spells out what the programmer is to do, and how you will be compensating them. (Money, stock, proceeds from potential sales, etc.) Realize that if you later discover that you need something done that isn't in the contract - which is likely - you'll need to add that to the contract later and add more compensation.
 

repoman0

Diamond Member
Jun 17, 2010
4,544
3,472
136
To add to my giant post above, since this is one of my favorite topics: a truly random source whose entropy equals its number of uncompressed bits is incompressible, period. That is, no model can be fit to the data; symbols appear roughly the same number of times whether the data is taken to be pure binary, ASCII text, hex digits, etc. Any compression algorithm you apply to the data will either fail to shrink it or make it larger via overhead. Fundamentally, compressing a source requires that its entropy in bits be less than the size of its current representation.
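A rough way to see this is to estimate entropy from byte frequencies and compare it with what a general-purpose compressor achieves. This is a sketch under assumptions: the per-byte frequency count is only a zero-order estimate (it ignores patterns spanning several bytes), `os.urandom` stands in for a "truly random source", and the helper name is made up for the example.

```python
import math
import os
import zlib
from collections import Counter


def byte_entropy_bits(data: bytes) -> float:
    """Zero-order Shannon entropy estimate, in bits per byte."""
    counts = Counter(data)
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


random_data = os.urandom(100_000)
text_data = b"the quick brown fox jumps over the lazy dog " * 2000

for name, data in (("random", random_data), ("text", text_data)):
    compressed = zlib.compress(data)
    print(f"{name}: {byte_entropy_bits(data):.3f} bits/byte, "
          f"{len(data)} -> {len(compressed)} bytes")
```

The random bytes sit near 8 bits per byte and do not shrink, while the repetitive text has far lower entropy and compresses well.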

An already compressed file will appear to be random to you, since that is the job of the encoder that was just applied to it -- unless again, you have some solid insight into the file structure and there is some sort of pattern that you can exploit. Since you want to apply your compression algorithm to seemingly every possible type of already compressed input, you have a lot of work ahead of you, learning the structure of the output of these existing algorithms and teaching your algorithm.
 

DigDog

Lifer
Jun 3, 2011
13,622
2,189
126
NDAs don't protect you; they just make people scared. If there's money involved, good luck. Keep it a trade secret or patent it.
 

Schmide

Diamond Member
Mar 7, 2002
5,590
724
126
You're never going to compress an already compressed data stream.

You can improve the sequence from many points in the algorithm. Choosing to quantify something different, seeding trees for better bit distribution, window size adjustments, and so on. Most of the tradeoffs of size come from time.

The other reality: if you get to the point where you can understand these extremely complex storage methods and improve on them, the coding and optimization would be child's play.
 

mikeymikec

Lifer
May 19, 2011
18,061
10,242
136
My idea compresses data that has been RARed or Zipped or h.264 or whatever compression it is; my idea would further compress it. How do I approach a seasoned programmer to explain the concept to her/him while protecting myself from having my idea ripped off?

The concept of compressing data has been in documented existence since the Roman Empire (forms of shorthand). For computers it looks like it's been around for nearly half a century, and I very much doubt that the ones that have already been developed ignored two thousand years of history.

Something tells me that, because you have no experience of developing compression algorithms and apparently no programming experience either, it's unlikely that you'll have thought up a viable new method of data compression, let alone one better than existing algorithms.

If you're truly serious about your idea, I would strongly suggest reading up about existing data compression algorithms first, and I suspect to be able to learn how those work you'll at least need to learn how to program (probably a degree in Maths as well).

The first apparent flaw in your logic is the way you say "or whatever compression they use". H264 and JPEG for example are data compression algorithms designed for specific types of data, and obviously they do a better job than more generic algorithms like zip/rar. If they didn't, they wouldn't be in use. Achieving a basic understanding of how JPEG works is quite easy and interesting in my experience, if you wanted an easy starting point, that's where I would start.
 

Cogman

Lifer
Sep 19, 2000
10,278
126
106
The concept of compressing data has been in documented existence since the Roman Empire (forms of shorthand). For computers it looks like it's been around for nearly half a century, and I very much doubt that the ones that have already been developed ignored two thousand years of history.

Something tells me that, because you have no experience of developing compression algorithms and apparently no programming experience either, it's unlikely that you'll have thought up a viable new method of data compression, let alone one better than existing algorithms.

If you're truly serious about your idea, I would strongly suggest reading up about existing data compression algorithms first, and I suspect to be able to learn how those work you'll at least need to learn how to program (probably a degree in Maths as well).

The first apparent flaw in your logic is the way you say "or whatever compression they use". H264 and JPEG for example are data compression algorithms designed for specific types of data, and obviously they do a better job than more generic algorithms like zip/rar. If they didn't, they wouldn't be in use. Achieving a basic understanding of how JPEG works is quite easy and interesting in my experience, if you wanted an easy starting point, that's where I would start.

My feeling exactly. General compression does not exist; to say "I can compress anything uncompressed or compressed" is pretty naive.
 

Markbnj

Elite Member, Moderator Emeritus
Moderator
Sep 16, 2005
15,682
13
81
www.markbetz.net
It can compress something that has already been compressed, but it cannot further compress something that it has compressed itself; the file size would actually get bigger if you tried to do this. How would I go about forming an NDA?
I don't want to patent it. That would expose how it works, and I want to keep that a secret. Otherwise, there are many ways of doing it, and some company would rip me off.

An NDA doesn't protect anything, but that doesn't stop people from signing them. You can find lots of boilerplate examples online.

As for the algorithm, it's always possible you've stumbled on something new, but the skepticism expressed in the other replies is well warranted.
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
I need an expert who can optimize and write new code based on my requirements.

No, what you need is a proof of concept. It doesn't matter how optimized it is or what the user interface is, you need to demonstrate that your compression algorithm actually works better than the competition for some interesting set of files.

I'm very skeptical you came up with a fully realized compression algorithm that you have any confidence works well but lack the means to implement it in anything. If you're serious about this learn how to program in something, anything. Getting a compression algorithm to just work requires very little esoteric programming, just a small amount of complete file loading and saving. The rest is just math and manipulating things in memory. If you can't learn this much then I doubt you can really properly express a compression algorithm. More likely it'll consist of some vague high level ideas where you're expecting someone else to make it work with leaps of logic.

Your request sounds like "I have this incredible idea for a math theorem, I just need someone else to write the proof for it."
 

Spungo

Diamond Member
Jul 22, 2012
3,217
2
81
The first apparent flaw in your logic is the way you say "or whatever compression they use". H264 and JPEG for example are data compression algorithms designed for specific types of data, and obviously they do a better job than more generic algorithms like zip/rar. If they didn't, they wouldn't be in use. Achieving a basic understanding of how JPEG works is quite easy and interesting in my experience, if you wanted an easy starting point, that's where I would start.

One should learn about image compression just for the sake of learning it. I find the information to be very practical, and I use it all the time, even if it's just to answer the question of which image format I should save something as. I always save computer screenshots as PNG, but real-life pictures are saved as JPG. I'll leave it up to others to figure out why.
PNG is lossless and exploits repeated pixel patterns and flat areas of a single color, so it works very well with screenshots. JPG is lossy and throws away fine detail within small blocks of the image, so it works best when saving something that contains millions of colors.
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,284
3,905
75
No, what you need is a proof of concept. It doesn't matter how optimized it is or what the user interface is, you need to demonstrate that your compression algorithm actually works better than the competition for some interesting set of files.
:thumbsup: How do you know your idea works if you don't have a proof-of-concept?

The other thing I realized is that I have actually seen a few programs that "work to compress already compressed" files losslessly. In all cases they work on optimizing specific compression schemes. The most general I've seen is Ben Jos Walbeehm's DeflOpt. It slightly improves compression of anything using the Deflate algorithm, which includes both zipfiles and PNG files.

There are also optimizers specifically for PNG (lots of them) and JPEG (which optimize data arrangement and Huffman compression.) I helped with one for JPEG.
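The general idea behind such optimizers, that the same data can have several valid encodings of different sizes within one format, can be sketched with Python's `zlib` module. To be clear about the assumptions: this is not how DeflOpt works internally (the thread doesn't describe that), the word data is synthetic, and simply decoding and re-encoding at a higher effort level is the crudest possible form of "optimizing".

```python
import random
import zlib

# Synthetic test data (an assumption made for this sketch).
rng = random.Random(1)
words = [b"header", b"chunk", b"palette", b"filter", b"pixel", b"stream", b"block", b"table"]
data = b" ".join(rng.choice(words) for _ in range(50_000))

# Stand-in for a file that was written quickly with low compression effort.
fast_stream = zlib.compress(data, 1)

# "Optimize": decode, then re-encode the same bytes with maximum effort.
recovered = zlib.decompress(fast_stream)
optimized_stream = zlib.compress(recovered, 9)

print("fast encoding:     ", len(fast_stream), "bytes")
print("optimized encoding:", len(optimized_stream), "bytes")
print("same contents:", zlib.decompress(optimized_stream) == data)
```

Both streams are ordinary zlib data and decode to identical bytes. The saving comes from knowing the format and redoing its work more carefully, not from compressing the compressed bytes as an opaque blob.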
 

KWiklund

Member
Oct 30, 2013
35
0
16
My idea compresses data that has been RARed or Zipped or h.264 or whatever compression it is; my idea would further compress it. How do I approach a seasoned programmer to explain the concept to her/him while protecting myself from having my idea ripped off?

Start by convincing them that the idea will actually work. Most compression algorithms are already pretty good. Basic Huffman coding, for example, works pretty close to the Shannon limit (the point at which you provably can't compress data any further). With a few clever tricks and optimizations, you might be able to squeeze out a compression ratio that is a few percent better. At that point though, is it worth the effort?
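How close Huffman coding gets can be checked numerically. The sketch below builds Huffman code lengths for the characters of a short message and compares the average code length with the Shannon entropy of the same character frequencies. It treats characters as independent symbols, which is an assumption; the message and the helper names are just examples.

```python
import heapq
import math
from collections import Counter


def huffman_code_lengths(counts):
    """Return {symbol: code length in bits} for a Huffman code built from symbol counts."""
    if len(counts) == 1:
        return {symbol: 1 for symbol in counts}
    # Heap entries: (total weight, unique tiebreaker, {symbol: depth so far}).
    heap = [(weight, index, {symbol: 0})
            for index, (symbol, weight) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_index = len(heap)
    while len(heap) > 1:
        w1, _, depths1 = heapq.heappop(heap)
        w2, _, depths2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**depths1, **depths2}.items()}
        heapq.heappush(heap, (w1 + w2, next_index, merged))
        next_index += 1
    return heap[0][2]


def shannon_entropy_bits(counts):
    """Entropy in bits per symbol of the empirical symbol distribution."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


message = "how do i approach a seasoned programmer to explain the concept"
counts = Counter(message)
lengths = huffman_code_lengths(counts)
total = sum(counts.values())
average_bits = sum(counts[s] * lengths[s] for s in counts) / total
entropy_bits = shannon_entropy_bits(counts)

print(f"entropy:         {entropy_bits:.3f} bits/symbol")
print(f"Huffman average: {average_bits:.3f} bits/symbol")
```

For a source of independent symbols, the Huffman average is never below the entropy and is always less than one bit per symbol above it, which leaves little room for a new general-purpose coder.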

If you need references on the theory, I would suggest Sayood's book on data compression. If you still *really* think you've got something, you might look into a provisional patent. Consult with someone who knows the math, and hell, publish it in IEEE Transactions on Information Theory.
 

Merad

Platinum Member
May 31, 2010
2,586
19
81
Are you expecting to make money off of this? I hate to burst your bubble, but that's pretty unlikely. If it's something revolutionary that Amazon or Netflix can use to drastically reduce streaming bandwidth or something like that, maybe, but don't expect to get rich by making a better zip file.

If your idea is viable, then I'm sure there are researchers who would be happy to develop it into a working algorithm and publish papers about it. That is probably your best bet unless you are planning to spend your own money to hire people to work on it.
 

BirdDad

Golden Member
Nov 25, 2004
1,131
0
71
What I should have said is that it is for RAR files but can be rewritten for h.264 or other compression. I did not mean that it can compress any source; nothing could be further from the truth. I only meant that it can be written to work on a specific source of already compressed data.
 

Mark R

Diamond Member
Oct 9, 1999
8,513
14
81
My idea compresses data that has been RARed or Zipped or h.264 or whatever compression it is; my idea would further compress it. How do I approach a seasoned programmer to explain the concept to her/him while protecting myself from having my idea ripped off?

The best thing to give a programmer to work with is a detailed specification and description of the process, with all the equations and mathematics worked out. Ideally, you would also have programmed a crude proof-of-concept system which can take some input and produce some output.

It is very important that you understand the mathematics, have checked it very carefully, and have identified any pitfalls. If you don't give your programmer detailed information about the pitfalls, they may get it wrong, and it may not be obvious that they have done so. For example, the data protection file format "PAR2" laid down a detailed specification for the polynomial coefficient matrix. Several programmers took that specification and made PAR2 programs from it. It turned out that the specification was not complete: not all matrices worked properly. Some programmers realised this. Not all did. A lot of "PAR2" compatible error checking programs would produce defective files, and it took many years for this to be noticed by the users.

This would allow your programmer to see exactly how the system is supposed to work, and exactly how it does what it is supposed to do. It also gives him something to check his work against.
 

Schmide

Diamond Member
Mar 7, 2002
5,590
724
126
Question: do you understand Lempel-Ziv, the Discrete Cosine Transform, Huffman coding, wavelets, etc.?

If not, no one will take you seriously until you do. You can't revolutionize a highly academic field without a more than trivial knowledge of it.

At the very least it is proof that you aren't reinventing something that is already out there.
 

uclabachelor

Senior member
Nov 9, 2009
448
0
71
My idea compresses data that has been RARed or Zipped or h.264 or whatever compression it is; my idea would further compress it. How do I approach a seasoned programmer to explain the concept to her/him while protecting myself from having my idea ripped off?

Sorry to burst your bubble but that line of thinking is flawed by design.

If your idea can reduce compressed data down even more, why can't I just use RAR to compress a txt file, then use your "idea" to compress that RAR file even more, and repeat until the file size reaches its minimal limit?
 