WO2017187244A1

WO2017187244A1 - Method of data compression and decompression

Info

Publication number: WO2017187244A1
Application number: PCT/IB2016/054294
Authority: WO
Inventors: Girik MALIK; Pawan K. DHAR
Original assignee: Malik Girik; Dhar Pawan K
Priority date: 2016-04-28
Filing date: 2016-07-19
Publication date: 2017-11-02

Abstract

The method of the present invention relates to data compression (and decompression) wherein, during compression, data is reshaped into a matrix and stored on an image, preferably a 32-bit floating point image, of the size of the matrix. The data is reshaped into a two-dimensional array by adding bits to the data so that it reaches its closest integer size. The method of compression and decompression provides in-built security features. It facilitates efficient memory management and minimal loss of data during data compression.

Description

METHOD OF DATA COMPRESSION AND DECOMPRESSION

FIELD OF THE INVENTION

The technology described herein relates to compression of data in floating-point for efficient data storage.

BACKGROUND OF THE INVENTION AND PRIOR ART

Data compression involves encoding information using fewer bits than the original representation of the data. It therefore reduces data storage space and lowers communication cost by using available bandwidth or transmission capacity effectively. Over the last decade, there has been an unprecedented explosion in the amount of digital data transmitted via the internet, representing text, images, video, sound, computer programs, and so on. It has therefore become important to develop a method of data compression that uses available network bandwidth effectively by maximally compressing the data.

The prior art disclosed below relates to methods of data compression, including methods that are patented or published in patent application publications. Some of the methods most relevant to the present disclosure are described here for the purpose of differentiating the unique aspects of the present invention and highlighting the drawbacks existing in the prior art.

KR 10-2013-0162743 discloses a method of converting data (a stream of numbers as input) into floating point numbers. The method first takes a 13-bit stream as input and removes the sign bit. It then works on the next stream and breaks it into sections of 5 bits and 3 bits. These sections are called the mantissa and exponent sections. The process of decompression is the reverse of the process of compression. KR 10-2013-0162743 further discloses a log table, which keeps track of and maps the data that is converted or sent back and forth. However, in this method, data loss and degradation may occur while converting the data bits into floating point values.

JP 4862799 discloses a device that quantizes the data, thereby compressing it, and then stores it. The compression and storage of an image is demonstrated by splitting it into blocks of 8x8 pixels and applying the Discrete Cosine Transform (DCT) with a pre-defined quantization table.

Yet another compression method, that of US 12485873, uses the repetitive structure of data in order to compress it. This requires either considerable human intervention to examine the data and determine patterns, or highly sophisticated pattern recognition and classification algorithms that take time to compute. Further, an initial starting point and the code length of the repetitive code must be set. Such prior knowledge of the data does not give true compression, because the entropy of the data is already lowered by drawing inferences from it; the method then tries to match the lowered entropy, and the data is compressed with reference to the earlier data.

Further, CN 200310102771.3 discloses an image floating data converting operation method for image compression and decompression. In the color mode conversion, the floating point part of the conversion matrix is first converted into integers for subsequent accumulation, multiplication and other operations, and is finally reduced to the original number range, raising the efficiency of integer operations. The image matrix holding floating values is converted to an integer matrix and then simply converted back to a floating point matrix.

Additionally, US 09797937 describes a system and method of scaling 16-bit floating point numbers to another native bit size, such as 8 bit or 32 bit, so that processing is faster with fewer bits and the representation is more accurate with more bits. Finally, in US 13706652 the inventor discusses a system and method of compressing (and decompressing) data by splitting it into bounding boxes. The floating point numbers in a bounding box are used to generate dimension-specific vectors (multiplicands and shared scale multipliers). The original bits in the bounding box are replaced with values generated using these vectors. A processor to manipulate the mantissa section of a floating point number is also disclosed in the said prior art.

The present invention solves the drawbacks in the prior art by providing a method of data compression that compresses the data onto images while also providing data security.

SUMMARY OF THE INVENTION

The present invention relates to data compression (and decompression) by converting data into its corresponding floating point value, and then storing it on an image, preferably a 32-bit floating point image. The said method of compression and decompression provides in-built security features.

The principal object of the present invention is to provide a method for compressing and decompressing data using floating point representation for efficient memory management.

Another object of the present invention is to provide a method of data compression whereby data loss caused by compression is minimized and data security is also provided.

An additional object of the present invention is to store the converted data, reshaped into a matrix, on a 32-bit floating point image.

Yet another object of the present invention is to reshape the data into a two-dimensional array by adding bits to the data so that the dimensions reach their closest integer size.

Other objects and advantages of the present invention will become apparent from the following description taken in connection with the accompanying drawings, wherein, by way of illustration and example, the embodiments of the present invention are disclosed.

BRIEF DESCRIPTION OF DRAWINGS

The present invention will be better understood after reading the following detailed description of the presently preferred aspects thereof with reference to the appended drawings, in which:

Figure 1 illustrates the process of reshaping the data.

Figure 2 illustrates the process of converting data to float type.

Figure 3 shows the compressed data stored on a TIFF format image.

DESCRIPTION OF THE INVENTION

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the presented concepts. The presented concepts may be practiced without some or all of these specific details. Well-known process operations have not been described in detail so as not to unnecessarily obscure the described concepts. While some concepts will be described in conjunction with the specific embodiments, it will be understood that these embodiments are not intended to be limiting.

The present invention relates to data compression and decompression by converting data into its corresponding floating point value, and then storing it on an image, preferably a 32-bit floating point image. The invention also facilitates secure data compression and decompression without any loss of data during the process. Data already in floating point form can also be stored directly onto the 32-bit floating-point image.

The main aspect of the present invention is to provide a method of data compression by which data is securely compressed without any loss. In the method, the data to be compressed is read from a file and stored in an array or list. The data is then reshaped into a matrix, an empty image of the size of the matrix is created, and the data from the matrix is stored onto the image.

The data is stored on an empty 32-bit floating point image. The data need not be grouped and only needs to be reshaped into a matrix. The empty image to be used is preferably two-dimensional, and can therefore be easily represented by a two-dimensional array or a list of lists. However, it is necessary to restructure or reshape the data in the form of a matrix.

The data is restructured or reshaped into a two-dimensional array by adding bits to the said data so that it reaches its closest integer size. To reshape the data, the square root of the length of the list is calculated; the integer part of the result is assigned to a first variable, and the same integer incremented by 1 is assigned to a second variable. The product of the first and second variables is assigned to a third variable. The difference between the third variable and the length of the list is then calculated. If the difference is less than zero, the first variable (i.e. the integer part of the square root of the length of the list) is incremented by 1; otherwise the first variable is kept the same. The third variable is then recalculated as the new product of the first and second variables. Subsequently, zeros (or any other character) are appended, their number being equal to the difference between the product and the length of the list.

For example: Length of a list is 12000

First variable= 109

Second variable= 110

Third variable= First variable X Second variable= 11990

Difference between third variable and length of list= 11990 - 12000 = -10

Since the difference is less than zero, the first variable is incremented by 1.

New first variable= 110

Third variable= First variable X Second variable= 12100

Number of Zeros/any other character appended = 12100 - 12000 = 100
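The reshaping steps and the worked example above can be sketched in Python as follows. This is a minimal illustration, not the applicants' implementation: the function names `padded_dimensions` and `reshape` are hypothetical, and treating the first variable as the row count and the second as the column count is an assumption, since the description does not say which is which.

```python
import math

def padded_dimensions(length):
    """Return (first, second, padding) following the steps described above."""
    first = math.isqrt(length)   # integer part of the square root
    second = first + 1           # same integer incremented by 1
    third = first * second       # product of the two variables
    if third - length < 0:       # product is too small to hold the data
        first += 1
        third = first * second   # recalculate the product
    return first, second, third - length

def reshape(data, pad=0):
    """Pad the list and split it into rows (assumed: first = rows, second = columns)."""
    rows, cols, padding = padded_dimensions(len(data))
    padded = list(data) + [pad] * padding
    return [padded[r * cols:(r + 1) * cols] for r in range(rows)]

# Worked example from the text: a list of length 12000.
first, second, padding = padded_dimensions(12000)
print(first, second, padding)
```

For a list of length 12000 this reproduces the figures in the example: both variables end at 110 and 100 padding values are appended.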

An empty image, preferably of float type, of the size of the reshaped matrix is created. The data from the said reshaped matrix is placed onto the image pixel by pixel.

The present invention supports encryption of the compressed data. The compressed data is in the form of an image, which cannot easily be decoded using normal pattern recognition methods. The image obtained after compression appears to be a random noisy image, and it is difficult to determine that data is stored on it unless the process is known. Further, due to its noise-like structure, it is extremely difficult to extract the information from such images using pattern recognition algorithms. Hence, the image so created is inherently encrypted without any additional effort.
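The pixel-wise placement of matrix values onto a float-type image can be illustrated with the Python standard library alone. The sketch below is an assumption-laden stand-in rather than the described implementation: it models the image as a list of rows of 32-bit floats using `array("f")`, it does not write an actual image file, and the function name `to_float_image` and the sample matrix of character codes are hypothetical.

```python
from array import array

def to_float_image(matrix):
    """Store each matrix entry as one 32-bit float 'pixel', row by row."""
    return [array("f", row) for row in matrix]

# Hypothetical input: character codes of a short message, already reshaped
# into a 2 x 3 matrix and padded with one zero.
matrix = [[72.0, 101.0, 108.0], [108.0, 111.0, 0.0]]
image = to_float_image(matrix)

print(len(image), len(image[0]))  # image height and width in pixels
print(image[0].itemsize)          # bytes per pixel on this platform
```

On common platforms the `"f"` type code is a 4-byte C float, so each pixel occupies 32 bits.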

Further, the compressed data is preferably stored in TIFF format, as this format is efficient for holding floating point values, as shown in Figure 3. The data can also be converted to .gipa format, which adds another layer of encryption for security, and is then mapped to the original image. For example, a matrix of 3000x3000 floating point numbers (up to 8 places of decimal) is analyzed, which can hold data correctly up to 16 places of decimal. The image file format used for storage in this case is TIFF, as the other formats are inefficient for holding floating point values. The image can be converted to GIF if required and then mapped to the original image. The compression achieved by the method of the present invention is much better and faster for large data than the traditional ZIP, RAR and 7Z data compression methods.

The method of the present invention provides the added advantage of data security due to its multilevel nature. The images produced by the method can be opened on any computer or mobile device without any additional interface; however, they cannot be easily decoded. Therefore, the invention provides data compression as well as encryption. The invention also facilitates data analysis for finding patterns in the data, similar to the already existing heat maps. The present invention also serves as a tool for the compression and decompression of big data/massive data. Using the present invention, floating point values are easily compressed or decompressed, contrary to the belief that floating-point values are generally difficult to handle.

The method of compression and decompression has two variants:

a) Low Precision method: in this method, floating point values up to 7 decimal places are handled, after which information loss may occur if the number is not in a representable form.

b) High Precision method: in this method, floating point values up to 22 decimal places are handled, without loss of information when decompressing the original data.
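The notion of a value being "in a representable form" can be illustrated with a round trip through a 32-bit float. The sketch below uses only the standard `struct` module and relies on the general property that IEEE 754 single precision carries roughly 7 significant decimal digits. It does not implement the Low Precision or High Precision methods themselves, and the helper name `through_float32` is hypothetical.

```python
import struct

def through_float32(value):
    """Round-trip a Python float (64-bit) through a 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

exact = through_float32(0.5)    # 0.5 is exactly representable in 32 bits
approx = through_float32(0.1)   # 0.1 is not; it comes back slightly changed
error = abs(approx - 0.1)

print(exact == 0.5)   # True: no information lost
print(approx == 0.1)  # False: a small rounding error appears
print(error)          # size of that rounding error
```

Values such as 0.5 survive the round trip unchanged, while values such as 0.1 acquire a small error beyond the seventh significant digit.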

If the floating-point values fall into one of the representable ranges, then even the Low Precision method works as well as the High Precision method. This technique is dependent on the computer architecture. If the computer has a wide-ranging architecture, then one may be able to compress the same data in both modes with equal precision and accuracy. The Low Precision method is computationally less expensive as it uses direct mapping, whereas the High Precision method uses scaled mapping.

Another aspect of the present invention is to provide a method of decompression of data at the receiving end. During compression, no higher-level encryption method is applied and the image is created using the data in floating-point format. Therefore, in the method of decompressing data, the image is read pixel by pixel in row-major order and the sequence of pixels obtained is stored in a list. Each pixel provides a corresponding floating-point number. The stopping criterion for reading the image can be a stream of zero characters (or any other character) given during construction of the image. The sequence obtained in the list is therefore the data that was encoded in the image. The data obtained can be mapped to the corresponding character given the floating-point code word, as shown in Figure 4.
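The decompression steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: the image is modeled as a list of rows of floats, the padding value is assumed to be 0.0, and each pixel is assumed to hold a character code that maps back through the ASCII table. The function name `decompress` and the sample image are hypothetical.

```python
def decompress(image, pad=0.0):
    """Read pixels in row-major order, drop trailing padding, map to text."""
    pixels = [value for row in image for value in row]  # row-major read
    while pixels and pixels[-1] == pad:                 # strip the trailing padding run
        pixels.pop()
    # Assumed mapping: each pixel value is a character code.
    return "".join(chr(int(value)) for value in pixels)

# Hypothetical 2 x 3 image holding five character codes and one padding zero.
image = [[72.0, 101.0, 108.0], [108.0, 111.0, 0.0]]
text = decompress(image)
print(text)
```

One consequence of using trailing padding as the stopping criterion is that data which legitimately ends in the padding value would be truncated, so a padding character that cannot occur at the end of the data is the safer choice.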

While the present invention has been described with reference to one or more aspects and has been set forth in considerable detail for the purposes of making a complete disclosure of the invention, such aspects are merely exemplary and are not intended to be limiting or to represent an exhaustive enumeration of all aspects of the invention. The scope of the invention, therefore, shall be defined solely by the following claims. Further, it will be apparent to those skilled in the art that numerous changes may be made in such details without departing from the spirit and the principles of the invention.

Claims

CLAIM

1) A method of data compression comprising:

reading the data to be compressed from a file and storing it in an array or list;

reshaping the data stored in the array or list into a matrix;

creating an empty image of size of the matrix; and

storing the data from the matrix onto the image.

2) The method as claimed in claim 1, wherein the empty image is of 32-bit floating point and preferably two-dimensional.

3) The method as claimed in claim 1, wherein the data is securely stored onto the image during data compression and said image appears to be a noisy image.

4) A method for reshaping data into a two-dimensional array comprising:

adding bits to the data in order to reach to its closest integer size;

calculating the square-root of length of list and assigning the integer part of the result to a first variable;

assigning the value of the integer incremented by 1, to a second variable;

assigning the product of the first and second variable to a third variable;

calculating the difference between the values of third variable and length of the list;

wherein if the difference is less than zero, incrementing the first variable, i.e. the integer part of the square root of the length of the list, by 1;

calculating the new product of the variable in the third variable and subsequently, appending zero (or any other character) equal to the number in the difference of product and length of the list.

5) A method of decompression of data comprising:

reading the image pixel by pixel in a row major order and storing the sequence of pixels obtained in a list;

providing a corresponding floating-point number to each pixel;

stopping the reading of the image on occurrence of a stream of zero characters or any other character given during construction of the image;

storing the stream of characters obtained which is the data, in a list; and

mapping the data obtained to the corresponding character as per floating-point code word or the ASCII Table.

6) The method as claimed in claim 5, wherein the floating-point code word is of variable length and trailing zeros can be ignored.

7) The method as claimed in claims 1 and 5, wherein data compression and decompression is of low precision and high precision type depending upon the decimal places up to which data is successfully handled without information loss.

8) The method as claimed in claim 7, wherein the low precision method handles data up to 7 decimal places while the high precision method handles data up to 22 decimal places.