The OpenNET Project / Index page

[ новости /+++ | форум | wiki | теги | ]

Интерактивная система просмотра системных руководств (man-ов)

 ТемаНаборКатегория 
 
 [Cписок руководств | Печать]

bzip2 ()
  • >> bzip2 (1) ( Solaris man: Команды и прикладные программы пользовательского уровня )
  • bzip2 (1) ( FreeBSD man: Команды и прикладные программы пользовательского уровня )
  • bzip2 (1) ( Linux man: Команды и прикладные программы пользовательского уровня )
  • 
    NAME
         bzip2, bunzip2 - a block-sorting file compressor, v0.9.0
         bzcat - decompresses files to stdout
         bzip2recover - recovers data from damaged bzip2 files
    
    
    SYNOPSIS
         bzip2 [ -cdfkstvzVL123456789 ] [ filenames ... ]
         bunzip2 [ -fkvsVL ] [ filenames ... ]
         bzcat [ -s ] [ filenames ... ]
         bzip2recover filename
    
    
    DESCRIPTION
         bzip2 compresses  files  using  the  Burrows-Wheeler  block-
         sorting  text  compression  algorithm,  and  Huffman coding.
         Compression  is  generally  considerably  better  than  that
         achieved  by  more conventional LZ77/LZ78-based compressors,
         and approaches the performance of the PPM family of statist-
         ical compressors.
    
         The command-line options are deliberately  very  similar  to
         those of GNU Gzip, but they are not identical.
    
         bzip2  expects  a  list  of  file  names  to  accompany  the
         command-line  flags.  Each  file is replaced by a compressed
         version of itself, with the name "original_name.bz2".   Each
         compressed  file  has the same modification date and permis-
         sions as the corresponding original, so that  these  proper-
         ties  can be correctly restored at decompression time.  File
         name handling is naive in the sense that there is no mechan-
         ism  for  preserving  original  file  names, permissions and
         dates in filesystems which  lack  these  concepts,  or  have
         serious file name length restrictions, such as MS-DOS.
    
         bzip2 and bunzip2 will by  default  not  overwrite  existing
         files; if you want this to happen, specify the -f flag.
    
         If no file names are specified, bzip2 compresses from  stan-
         dard  input  to  standard  output.  In this case, bzip2 will
         decline to write compressed output to a  terminal,  as  this
         would be entirely incomprehensible and therefore pointless.
    
         bunzip2 (or bzip2 -d ) decompresses and restores all  speci-
         fied  files  whose  names end in ".bz2".  Files without this
         suffix are ignored. Again,  supplying  no  filenames  causes
         decompression from standard input to standard output.
    
         bunzip2 will correctly decompress a file which is  the  con-
         catenation  of  two or more compressed files.  The result is
         the concatenation of the corresponding  uncompressed  files.
         Integrity  testing  (-t) of concatenated compressed files is
         also supported.
    
         You can also compress or decompress files  to  the  standard
         output  by  giving  the  -c  flag.   Multiple  files  may be
         compressed and decompressed like this.  The  resulting  out-
         puts  are fed sequentially to stdout.  Compression of multi-
         ple files in this manner generates a stream containing  mul-
         tiple compressed file representations.  Such a stream can be
         decompressed correctly only by bzip2 version 0.9.0 or later.
         Earlier  versions of bzip2 will stop after decompressing the
         first file in the stream.
    
         bzcat (or bzip2 -dc ) decompresses all  specified  files  to
         the standard output.
    
         Compression is always performed, even if the compressed file
         is  slightly  larger  than the original.  Files of less than
         about one hundred  bytes  tend  to  get  larger,  since  the
         compression  mechanism has a constant overhead in the region
         of 50 bytes.  Random data (including the output of most file
         compressors) is coded at about 8.05 bits per byte, giving an
         expansion of around 0.5%.
    
         As a self-check for your protection, bzip2 uses 32-bit  CRCs
         to  make  sure  that  the  decompressed version of a file is
         identical to the original. This guards against corruption of
         the  compressed  data,  and against undetected bugs in bzip2
         (hopefully very unlikely).  The chances of  data  corruption
         going  undetected  is  microscopic, about one chance in four
         billion for each file processed.  Be aware, though, that the
         check  occurs  upon  decompression,  so it can only tell you
         that that something is wrong.  It can't help you recover the
         original uncompressed data.  You can use bzip2recover to try
         to recover data from damaged files.
    
         Return values: 0 for a  normal  exit,  1  for  environmental
         problems  (file not found, invalid flags, I/O errors, &c), 2
         to indicate a corrupt compressed file,  3  for  an  internal
         consistency error (eg, bug) which caused bzip2 to panic.
    
    
    MEMORY MANAGEMENT
         Bzip2 compresses large files  in  blocks.   The  block  size
         affects  both the compression ratio achieved, and the amount
         of memory needed both  for  compression  and  decompression.
         The flags -1 through -9 specify the block size to be 100,000
         bytes through 900,000 bytes (the default) respectively.   At
         decompression-time,  the  block size used for compression is
         read from the header of the  compressed  file,  and  bunzip2
         then  allocates  itself just enough memory to decompress the
         file.  Since block sizes are stored in compressed files,  it
         follows  that  the  flags  -1 to -9 are irrelevant to and so
         ignored during decompression.  Compression and decompression
         requirements, in bytes, can be estimated as:
    
               Compression:   400k + ( 7 x block size )
    
               Decompression: 100k + ( 4 x block size ), or
                              100k + ( 2.5 x block size )
    
         Larger  block  sizes  give  rapidly   diminishing   marginal
         returns; most of the compression comes from the first two or
         three hundred k of block size, a fact worth bearing in  mind
         when using bzip2 on small machines.  It is also important to
         appreciate that the decompression memory requirement is  set
         at compression-time by the choice of block size.
    
         For files compressed with the default 900k block size,  bun-
         zip2  will require about 3700 kbytes to decompress.  To sup-
         port decompression of any file on a 4 megabyte machine, bun-
         zip2  has  an  option to decompress using approximately half
         this amount of memory,  about  2300  kbytes.   Decompression
         speed  is  also  halved,  so you should use this option only
         where necessary.  The relevant flag is -s.
    
         In general, try and use the largest block size  memory  con-
         straints   allow,   since  that  maximises  the  compression
         achieved.  Compression and decompression speed are virtually
         unaffected by block size.
    
         Another significant point applies to files which  fit  in  a
         single  block -- that means most files you'd encounter using
         a large block size.  The amount of real  memory  touched  is
         proportional  to  the  size  of  the file, since the file is
         smaller than a  block.   For  example,  compressing  a  file
         20,000 bytes long with the flag -9 will cause the compressor
         to allocate around 6700k of memory, but only  touch  400k  +
         20000  *  7 = 540 kbytes of it.  Similarly, the decompressor
         will allocate 3700k but only touch 100k + 20000 *  4  =  180
         kbytes.
    
         Here is a table which summarises the  maximum  memory  usage
         for  different  block  sizes.   Also  recorded  is the total
         compressed size for 14 files of the Calgary Text Compression
         Corpus  totalling  3,141,622  bytes.  This column gives some
         feel for how compression varies with block size.  These fig-
         ures  tend to understate the advantage of larger block sizes
         for larger files, since the Corpus is dominated  by  smaller
         files.
    
                    Compress   Decompress   Decompress   Corpus
             Flag     usage      usage       -s usage     Size
    
              -1      1100k       500k         350k      914704
              -2      1800k       900k         600k      877703
              -3      2500k      1300k         850k      860338
              -4      3200k      1700k        1100k      846899
              -5      3900k      2100k        1350k      845160
              -6      4600k      2500k        1600k      838626
              -7      5400k      2900k        1850k      834096
              -8      6000k      3300k        2100k      828642
              -9      6700k      3700k        2350k      828642
    
    
    OPTIONS
         -c --stdout
              Compress or decompress to  standard  output.   -c  will
              decompress  multiple  files  to  stdout,  but will only
              compress a single file to stdout.
    
         -d --decompress
              Force decompression.   bzip2,  bunzip2  and  bzcat  are
              really  the  same  program, and the decision about what
              actions to take is done on the basis of which  name  is
              used.   This  flag overrides that mechanism, and forces
              bzip2 to decompress.
    
         -z --compress
              The complement to -d: forces compression, regardless of
              the invokation name.
    
         -t --test
              Check integrity of the  specified  file(s),  but  don't
              decompress   them.    This   really  performs  a  trial
              decompression and throws away the result.
    
         -f --force
              Force overwrite of output files.  Normally, bzip2  will
              not overwrite existing output files.
    
         -k --keep
              Keep (don't delete) input files during  compression  or
              decompression.
    
         -s --small
              Reduce memory usage, for compression, decompression and
              testing.   Files  are  decompressed  and tested using a
              modified algorithm which only requires  2.5  bytes  per
              block byte.  This means any file can be decompressed in
              2300k of memory, albeit at about half the normal speed.
    
              During compression, -s selects a block  size  of  200k,
              which  limits  memory use to around the same figure, at
              the expense of your compression ratio.   In  short,  if
              your  machine  is  low on memory (8 megabytes or less),
              use -s for everything.  See MEMORY MANAGEMENT above.
    
         -v --verbose
              Verbose mode -- show the  compression  ratio  for  each
              file  processed.   Further  -v's increase the verbosity
              level, spewing out lots of information  which  is  pri-
              marily of interest for diagnostic purposes.
    
         -L --license -V --version
              Display the software version, license terms and  condi-
              tions.
    
         -1 to -9
              Set the block size to 100  k,  200  k  ..  900  k  when
              compressing.   Has  no  effect when decompressing.  See
              MEMORY MANAGEMENT above.
    
         --repetitive-fast
              bzip2 injects some small pseudo-random variations  into
              very  repetitive blocks to limit worst-case performance
              during compression.  If sorting runs into difficulties,
              the block is randomised, and sorting is restarted. Very
              roughly, bzip2 persists for three times as  long  as  a
              well-behaved  input would take before resorting to ran-
              domisation.  This flag makes it give up much sooner.
    
    
         --repetitive-best
              Opposite of --repetitive-fast; try a lot harder  before
              resorting to randomisation.
    
    
    RECOVERING DATA FROM DAMAGED FILES
         bzip2 compresses files in blocks,  usually  900kbytes  long.
         Each   block  is  handled  independently.   If  a  media  or
         transmission error causes a multi-block .bz2 file to  become
         damaged,  it may be possible to recover data from the undam-
         aged blocks in the file.
    
         The compressed representation of each block is delimited  by
         a  48-bit pattern, which makes it possible to find the block
         boundaries with reasonable certainty.  Each block also  car-
         ries  its  own  32-bit  CRC,  so  damaged blocks can be dis-
         tinguished from undamaged ones.
    
         bzip2recover is a simple program whose purpose is to  search
         for  blocks in .bz2 files, and write each block out into its
         own .bz2 file.  You can  then  use  bzip2  -t  to  test  the
         integrity of the resulting files, and decompress those which
         are undamaged.
    
         bzip2recover takes a single argument, the name of  the  dam-
         aged  file,  and writes a number of files "rec0001file.bz2",
         "rec0002file.bz2", etc,  containing  the  extracted  blocks.
         The  output  filenames are designed so that the use of wild-
         cards in subsequent processing -- for  example,  "bzip2  -dc
         rec*file.bz2  >  recovered_data"  --  lists the files in the
         "right" order.
    
         bzip2recover should be of most use dealing with  large  .bz2
         files,  as  these  will  contain many blocks.  It is clearly
         futile to use it on damaged single-block files, since a dam-
         aged block cannot be recovered.  If you wish to minimise any
         potential data loss through media  or  transmission  errors,
         you might consider compressing with a smaller block size.
    
    
    PERFORMANCE NOTES
         The sorting phase of compression  gathers  together  similar
         strings in the file.  Because of this, files containing very
         long runs  of  repeated  symbols,  like  "aabaabaabaab  ..."
         (repeated  several  hundred  times)  may compress extraordi-
         narily slowly.  You can use the  -vvvvv  option  to  monitor
         progress  in great detail, if you want.  Decompression speed
         is unaffected.
    
         Such pathological cases seem  rare  in  practice,  appearing
         mostly  in  artificially-constructed test files, and in low-
         level disk images.  It may be inadvisable to  use  bzip2  to
         compress  the  latter.  If  you  do  get a file which causes
         severe slowness in compression, try making the block size as
         small as possible, with flag -1.
    
         bzip2 usually  allocates  several  megabytes  of  memory  to
         operate  in, and then charges all over it in a fairly random
         fashion.  This means that performance, both for  compressing
         and  decompressing,  is  largely  determined by the speed at
         which your machine can  service  cache  misses.  Because  of
         this, small changes to the code to reduce the miss rate have
         been observed to give disproportionately  large  performance
         improvements.  I imagine bzip2 will perform best on machines
         with very large caches.
    
    
    CAVEATS
         I/O error messages are not as  helpful  as  they  could  be.
         Bzip2  tries hard to detect I/O errors and exit cleanly, but
         the details of what the problem  is  sometimes  seem  rather
         misleading.
    
         This  manual  page  pertains  to  version  0.9.0  of  bzip2.
         Compressed data created by this version is entirely forwards
         and backwards compatible with the previous  public  release,
         version  0.1pl2, but with the following exception: 0.9.0 can
         correctly decompress multiple concatenated compressed files.
         0.1pl2 cannot do this; it will stop after decompressing just
         the first file in the stream.
    
         Wildcard expansion for Windows 95 and NT is flaky.
    
         bzip2recover uses 32-bit integers to represent bit positions
         in  compressed  files,  so it cannot handle compressed files
         more than 512 megabytes long.  This could easily be fixed.
    
    
    AUTHOR
         Julian Seward, jseward@acm.org.
    
         http://www.muraroa.demon.co.uk
    
         The ideas embodied in bzip2 are due to (at least)  the  fol-
         lowing  people:   Michael Burrows and David Wheeler (for the
         block sorting transformation), David Wheeler (again, for the
         Huffman  coder),  Peter  Fenwick  (for the structured coding
         model in the original bzip, and many refinements), and Alis-
         tair Moffat, Radford Neal and Ian Witten (for the arithmetic
         coder in the original bzip). I am much  indebted  for  their
         help, support and advice.  See the manual in the source dis-
         tribution for pointers to sources of documentation.   Chris-
         tian  von  Roques  encouraged  me to look for faster sorting
         algorithms, so as to  speed  up  compression.   Bela  Lubkin
         encouraged  me to improve the worst-case compression perfor-
         mance.  Many people sent patches,  helped  with  portability
         problems,  lent  machines,  gave  advice  and were generally
         helpful.
    
    NOTES
         Source for bzip2 is available in the SUNWbzipS package.
    
    
    
    


    Поиск по тексту MAN-ов: 




    Спонсоры:
    PostgresPro
    Inferno Solutions
    Hosting by Hoster.ru
    Хостинг:

    Закладки на сайте
    Проследить за страницей
    Created 1996-2022 by Maxim Chirkov
    Добавить, Поддержать, Вебмастеру