Module hippy

Calculate the sub-piece hashes for large package files.

Run this script in the directory where the extrapieces files are to be stored. Its only command-line argument is the Berkeley database containing the cached data from previous runs. Pass the paths of the Release files to process on standard input.

For example:
   find /var/www/debian -maxdepth 3 -name "Release" | hippy ../hippycache.bdb


Functions

dictionary, list of string
read_release(filename)
Read the headers and Packages file names from a Release file.

string, list of (string, int)
hash(file, piece_size)
Read a file and hash its sub-pieces.

int
optimal_piece_size(size)
Calculate the optimal piece size to use for a file.

list of string
cache2list(cache_value)
Convert a cache value to a list of package names.

string
list2cache(deb_list)
Convert a list of package names to a cacheable value.

string, list of (string, int)
cache2hash(cache_value)
Convert a cache value to a list of sub-piece hashes.

string
hash2cache(sha1, piece_list)
Convert a list of sub-piece hashes to a cacheable value.

sub_piece(filename)
Calculate and print the sub-pieces for a single file.

list of string
get_packages(filename)
Read the new piece data from a Packages file.

run(cache, releasefile)
Process a single Release file.

Variables
  MAX_PIECE_SIZE = 524288
  CHUNK_SIZE = 16384
  EXTENSION = '.gz'

Imports: bsddb, sha, binascii, os, sys, gzip, struct, bz2.BZ2File, math.ceil


Function Details

read_release(filename)

Read the headers and Packages file names from a Release file.
Parameters:
  • filename (string) - the Release file to read
Returns: dictionary, list of string
the headers and full file names of Packages files

hash(file, piece_size)

Read a file and hash its sub-pieces.
Parameters:
  • file (file) - an already opened file-like object to read from
  • piece_size (int) - the piece size to divide the file into
Returns: string, list of (string, int)
the 40-byte hex representation of the SHA1 hash of the whole file, followed by a list that gives, for each sub-piece of the file, the 40-byte hex representation of the SHA1 hash of the piece and the length of the piece
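
The return shape described above can be illustrated with a minimal sketch. It is not the module's implementation: it assumes Python 3 and `hashlib` (the module itself imports the older `sha` module), and the name `hash_pieces` and its `chunk_size` parameter are hypothetical. How the real function treats files that fit in a single piece is not stated here, so this sketch simply reports every piece it reads.

```python
import hashlib
import io

def hash_pieces(file, piece_size, chunk_size=16384):
    """Return (file_sha1_hex, [(piece_sha1_hex, piece_length), ...])."""
    file_sha1 = hashlib.sha1()
    pieces = []
    while True:
        piece_sha1 = hashlib.sha1()
        piece_len = 0
        # Fill one piece using reads of at most chunk_size bytes.
        while piece_len < piece_size:
            data = file.read(min(chunk_size, piece_size - piece_len))
            if not data:
                break
            file_sha1.update(data)
            piece_sha1.update(data)
            piece_len += len(data)
        if piece_len == 0:
            break  # nothing left to read
        pieces.append((piece_sha1.hexdigest(), piece_len))
        if piece_len < piece_size:
            break  # a short piece means end of file
    return file_sha1.hexdigest(), pieces

# Usage: a 15-byte stream split into 10-byte pieces gives pieces of 10 and 5 bytes.
total_hash, piece_list = hash_pieces(io.BytesIO(b"a" * 10 + b"b" * 5), 10, chunk_size=4)
```

Each hash in the result is a 40-character hex string, matching the return description above.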

optimal_piece_size(size)

Calculate the optimal piece size to use for a file.

The optimal piece size is the largest possible piece size such that the piece size is larger than the extra piece, the piece size is a multiple of the chunk size, and the difference between the piece size and the extra piece size is a minimum.

This function currently contains an error: it returns a non-optimal piece size when the file size is an exact multiple of the maximum piece size. The error is kept for backwards compatibility with previous versions. To correct it:
   n = 1 + (size-1) / MAX_PIECE_SIZE
Parameters:
  • size (long) - the file size
Returns: int
the optimal piece size
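
The effect of the correction can be seen in a small sketch. Two assumptions are made here that the documentation does not confirm: that `n` is the number of pieces the file is divided into, and that the existing code computes it as `1 + size / MAX_PIECE_SIZE`. The function names are hypothetical, and `//` is used because the sketch targets Python 3, where `/` is no longer integer division.

```python
MAX_PIECE_SIZE = 524288

def piece_count_existing(size):
    # Assumed form of the existing calculation (not confirmed by the docs).
    return 1 + size // MAX_PIECE_SIZE

def piece_count_corrected(size):
    # The correction given in the documentation, written for Python 3.
    return 1 + (size - 1) // MAX_PIECE_SIZE

# The two forms differ only when size is an exact multiple of MAX_PIECE_SIZE.
exact = 2 * MAX_PIECE_SIZE
counts_exact = (piece_count_existing(exact), piece_count_corrected(exact))
counts_other = (piece_count_existing(exact + 1), piece_count_corrected(exact + 1))
```

Under these assumptions, a file of exactly two maximum-size pieces is counted as three pieces by the existing form and as two by the corrected form, while any size that is not an exact multiple gives the same count in both.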

cache2list(cache_value)

Convert a cache value to a list of package names.

The cache is stored as a string. The list is a repeating sequence of a one-byte length followed by a string of that length. Therefore, the longest string that can be stored is 255 bytes.
Parameters:
  • cache_value (string) - the cached value for this file
Returns: list of string
the list of package names stored in the cache
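
The length-prefixed layout described above can be sketched as a round trip. This is an illustration only, assuming Python 3 `bytes` and UTF-8 names; `encode_names` and `decode_names` are hypothetical stand-ins for `list2cache` and `cache2list`, not their actual code.

```python
def encode_names(names):
    """Pack names as repeated (one-byte length, name bytes) records."""
    out = bytearray()
    for name in names:
        raw = name.encode("utf-8")
        # A single length byte can only hold values 0 through 255.
        if len(raw) > 255:
            raise ValueError("name too long for a one-byte length prefix")
        out.append(len(raw))
        out += raw
    return bytes(out)

def decode_names(value):
    """Unpack the records produced by encode_names."""
    names = []
    pos = 0
    while pos < len(value):
        length = value[pos]  # indexing bytes yields an int in Python 3
        pos += 1
        names.append(value[pos:pos + length].decode("utf-8"))
        pos += length
    return names

# Usage: the encoded form of ["ab", "xyz"] is b"\x02ab\x03xyz".
packed = encode_names(["ab", "xyz"])
unpacked = decode_names(packed)
```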

list2cache(deb_list)

Convert a list of package names to a cacheable value.
Parameters:
  • deb_list (list of string) - the package names to create a cache value for
Returns: string
the cacheable string

cache2hash(cache_value)

Convert a cache value to a list of sub-piece hashes.

The cache is stored as a string. The first 20 bytes are the SHA1 hash of the entire file. Then there are repeating 24 byte sequences, the first 4 bytes being the length of the piece in network (big-endian) order, the next 20 bytes being the SHA1 hash of the piece. If there are no sub-pieces for the file, the cached string is empty.
Parameters:
  • cache_value (string) - the cached value for this file
Returns: string, list of (string, int)
the 40-byte hex representation of the SHA1 hash of the whole file, followed by a list that gives, for each sub-piece of the file, the 40-byte hex representation of the SHA1 hash of the piece and the length of the piece
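
The binary layout described above (a 20-byte file hash, then 24-byte records of a big-endian length and a 20-byte piece hash) can be sketched with `struct` and `binascii`, both of which the module imports. The names `pack_hashes` and `unpack_hashes` are hypothetical stand-ins for `hash2cache` and `cache2hash`. Returning an empty hash and an empty list for an empty cache value is an assumption, since the documentation only says that the cached string is empty when there are no sub-pieces.

```python
import binascii
import struct

def pack_hashes(sha1_hex, piece_list):
    """Build the cache value from a hex file hash and (hex hash, length) pairs."""
    if not piece_list:
        return b""  # no sub-pieces: the cached string is empty
    value = binascii.a2b_hex(sha1_hex)
    for piece_hex, length in piece_list:
        # "!I" is a 4-byte unsigned integer in network (big-endian) order.
        value += struct.pack("!I", length) + binascii.a2b_hex(piece_hex)
    return value

def unpack_hashes(value):
    """Recover (file hash hex, [(piece hash hex, length), ...]) from a cache value."""
    if len(value) < 20:
        return "", []  # assumed behaviour for an empty cache value
    sha1_hex = binascii.b2a_hex(value[:20]).decode("ascii")
    pieces = []
    # Step through complete 24-byte records only.
    for offset in range(20, len(value) - 23, 24):
        (length,) = struct.unpack("!I", value[offset:offset + 4])
        piece_hex = binascii.b2a_hex(value[offset + 4:offset + 24]).decode("ascii")
        pieces.append((piece_hex, length))
    return sha1_hex, pieces

# Usage: two pieces give a 20 + 2 * 24 = 68 byte cache value.
example_pieces = [("bb" * 20, 16384), ("cc" * 20, 5)]
cached = pack_hashes("aa" * 20, example_pieces)
restored = unpack_hashes(cached)
```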

hash2cache(sha1, piece_list)

Convert a list of sub-piece hashes to a cacheable value.
Parameters:
  • sha1 (string) - the 40-byte hex representation of the SHA1 hash of the file
  • piece_list (list of (string, int)) - for each sub-piece of the file, the 40-byte hex representation of the SHA1 hash and the length of the piece
Returns: string
the cacheable string

sub_piece(filename)

Calculate and print the sub-pieces for a single file.
Parameters:
  • filename (string) - the file to calculate sub-pieces for

get_packages(filename)

Read the new piece data from a Packages file.
Parameters:
  • filename (string) - the Packages file to open and parse
Returns: list of string
the package files listed in the Packages file

run(cache, releasefile)

Process a single Release file.
Parameters:
  • cache (bsddb.BTree) - an already opened Berkeley DB B-tree
  • releasefile (string) - the Release file to process