Module hippy
Calculate the sub-piece hashes for large package files.
Run this script in the directory where the extrapieces files are to be
stored. Its only command-line argument is the Berkeley database
containing the cached data from previous runs. Pass the paths of the Release
files to process on standard input.
For example:
find /var/www/debian -maxdepth 3 -name "Release" | hippy ../hippycache.bdb
MAX_PIECE_SIZE = 524288
CHUNK_SIZE = 16384
EXTENSION = '.gz'
Imports: bsddb, sha, binascii, os, sys, gzip, struct, bz2.BZ2File, math.ceil
Read the headers and Packages file names from a Release file.
- Parameters:
filename (string) - the Release file to read
- Returns:
dictionary, list of string - the headers and full file names of Packages files
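As a rough illustration of the parsing described above, the following Python 3 sketch separates a Release file's headers from the Packages files it lists. The function name `parse_release_text`, the `base_dir` parameter, the choice to take the file contents as a string, and the decision to keep only uncompressed `Packages` entries are assumptions made for this example, not details taken from hippy itself.

```python
import os

def parse_release_text(text, base_dir=""):
    """Split Release file text into (headers, Packages file names).

    Hypothetical helper: assumes 'Key: value' header lines and checksum
    sections whose indented lines read '<hash> <size> <relative path>'.
    """
    headers = {}
    packages = []
    for line in text.splitlines():
        if line.startswith(" "):
            # Indented line inside a checksum section.
            fields = line.split()
            if len(fields) == 3:
                path = os.path.join(base_dir, fields[2])
                # Keep each uncompressed Packages file once.
                if os.path.basename(path) == "Packages" and path not in packages:
                    packages.append(path)
        elif ":" in line:
            key, _, value = line.partition(":")
            # Section markers such as "MD5Sum:" carry no value and are skipped.
            if value.strip():
                headers[key.strip()] = value.strip()
    return headers, packages

sample = (
    "Origin: Debian\n"
    "Suite: stable\n"
    "MD5Sum:\n"
    " 0123456789abcdef0123456789abcdef 1024 main/binary-i386/Packages\n"
    " fedcba9876543210fedcba9876543210 512 main/binary-i386/Packages.gz\n"
    "SHA1:\n"
    " 0123456789abcdef0123456789abcdef01234567 1024 main/binary-i386/Packages\n"
)
headers, packages = parse_release_text(sample, "/var/www/debian/dists/stable")
```

The same Packages path appears once per checksum section in a Release file, which is why the sketch filters duplicates.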
Read a file and hash its sub-pieces.
- Parameters:
file (file) - an already opened file-like object to read from
piece_size (int) - the piece size to divide the file into
- Returns:
string, list of (string, int) - the 40-byte hex representation of the SHA1 hash of the file, and, for each sub-piece of the file, the 40-byte hex representation of the SHA1 hash of the piece and the length of the piece
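The hashing step can be sketched in Python 3 as below. This is an illustration only: it uses `hashlib` rather than the `sha` module listed in the imports, the name `hash_sub_pieces` and the `read_size` parameter are hypothetical, and it always lists at least one piece for a non-empty file, which may differ from how the module treats files that fit in a single piece.

```python
import hashlib
import io

def hash_sub_pieces(fileobj, piece_size, read_size=16384):
    """Return (file_sha1_hex, [(piece_sha1_hex, piece_length), ...])."""
    file_hash = hashlib.sha1()
    pieces = []
    while True:
        piece_hash = hashlib.sha1()
        piece_len = 0
        # Read one piece in small chunks so memory use stays bounded.
        while piece_len < piece_size:
            data = fileobj.read(min(read_size, piece_size - piece_len))
            if not data:
                break
            file_hash.update(data)
            piece_hash.update(data)
            piece_len += len(data)
        if piece_len == 0:
            break  # nothing left to read
        pieces.append((piece_hash.hexdigest(), piece_len))
        if piece_len < piece_size:
            break  # a short piece means the end of the file was reached
    return file_hash.hexdigest(), pieces

data = b"a" * 40 + b"b" * 10
file_sha1, pieces = hash_sub_pieces(io.BytesIO(data), piece_size=16)
```

Both the whole-file hash and the per-piece hashes are updated from the same reads, so the file is traversed only once.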
Calculate the optimal piece size to use for a file.
The optimal piece size is the largest possible piece size such that
the piece size is larger than the extra piece, the piece size is a
multiple of the chunk size, and the difference between the piece size and
the extra piece size is a minimum.
This function currently contains an error: it returns a non-optimal
piece size when the file size is an exact multiple of the maximum piece
size. The error is kept for backwards compatibility with previous
versions. To correct it (using integer division):
n = 1 + (size-1) / MAX_PIECE_SIZE
- Parameters:
size (long) - the file size
- Returns:
int - the optimal piece size
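To make the boundary case concrete, here is a hedged Python 3 sketch. It assumes that `n` is the number of pieces, that the uncorrected form is `n = 1 + size / MAX_PIECE_SIZE`, and that the piece size is `size / n` rounded up to a multiple of `CHUNK_SIZE`. None of these details are spelled out above, and the name `piece_size_sketch` is hypothetical; the real function may apply further limits.

```python
MAX_PIECE_SIZE = 524288
CHUNK_SIZE = 16384

def piece_size_sketch(size, corrected=False):
    # n is assumed to be the number of pieces the file is split into.
    if corrected:
        n = 1 + (size - 1) // MAX_PIECE_SIZE
    else:
        n = 1 + size // MAX_PIECE_SIZE  # assumed uncorrected form
    # Round size / n up to the next multiple of CHUNK_SIZE.
    chunks_per_piece = -(-size // (n * CHUNK_SIZE))
    return chunks_per_piece * CHUNK_SIZE

size = 2 * MAX_PIECE_SIZE            # exact multiple of the maximum
legacy = piece_size_sketch(size)     # 360448
fixed = piece_size_sketch(size, corrected=True)  # 524288
```

Under these assumptions, a file of exactly two maximum-size pieces is split three ways by the uncorrected form, while the corrected form uses the full 524288-byte piece size. For sizes that are not exact multiples, both forms give the same result.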
Convert a cache value to a list of package names.
The cache is stored as a string. The list is a repeating sequence of a
one-byte length followed by a string of that length. Therefore, the
longest string that can be stored is 255 bytes.
- Parameters:
cache_value (string) - the cached value for this file
- Returns:
list of string - the list of package names stored in the cache
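A minimal Python 3 sketch of decoding this length-prefixed layout follows. The name `decode_name_list`, the use of `bytes` for the cache value, and the ASCII decoding are assumptions for illustration.

```python
def decode_name_list(cache_value):
    """Decode repeated <1-byte length><name> records from bytes."""
    names = []
    pos = 0
    while pos < len(cache_value):
        length = cache_value[pos]  # indexing bytes yields an int in Python 3
        start = pos + 1
        names.append(cache_value[start:start + length].decode("ascii"))
        pos = start + length
    return names

decoded = decode_name_list(b"\x03foo\x06barbaz")  # ['foo', 'barbaz']
```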
Convert a list of package names to a cacheable value.
- Parameters:
deb_list (list of string) - the package names to create a cache value for
- Returns:
string - the cacheable string
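The reverse direction, using the one-byte length prefix described for the package-name cache, might look like the following Python 3 sketch. The name `encode_name_list` and the explicit length check are assumptions; the source does not say how over-long names are handled.

```python
def encode_name_list(names):
    """Encode names as repeated <1-byte length><name> records."""
    out = bytearray()
    for name in names:
        raw = name.encode("ascii")
        if len(raw) > 255:
            # A single length byte cannot describe more than 255 bytes.
            raise ValueError("name too long for a one-byte length prefix")
        out.append(len(raw))
        out += raw
    return bytes(out)

encoded = encode_name_list(["foo", "barbaz"])  # b'\x03foo\x06barbaz'
```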
Convert a cache value to a list of sub-piece hashes.
The cache is stored as a string. The first 20 bytes are the SHA1 hash
of the entire file. Then there are repeating 24 byte sequences, the first
4 bytes being the length of the piece in network (big-endian) order, the
next 20 bytes being the SHA1 hash of the piece. If there are no
sub-pieces for the file, the cached string is empty.
- Parameters:
cache_value (string) - the cached value for this file
- Returns:
string, list of (string, int) - the 40-byte hex representation of the SHA1 hash of the file, and, for each sub-piece of the file, the 40-byte hex representation of the SHA1 hash of the piece and the length of the piece
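The binary layout above maps directly onto `struct`. The Python 3 sketch below is illustrative: the name `decode_piece_cache` is hypothetical, and returning an empty hash string for an empty cache value is an assumption.

```python
import binascii
import struct

def decode_piece_cache(cache_value):
    """Decode <20-byte file SHA1> followed by <4-byte length><20-byte SHA1> records."""
    if not cache_value:
        return "", []  # assumption: an empty entry yields empty results
    file_sha1 = binascii.hexlify(cache_value[:20]).decode("ascii")
    pieces = []
    for offset in range(20, len(cache_value), 24):
        # "!I20s": big-endian unsigned 32-bit length, then 20 raw hash bytes.
        length, digest = struct.unpack_from("!I20s", cache_value, offset)
        pieces.append((binascii.hexlify(digest).decode("ascii"), length))
    return file_sha1, pieces

cache = (bytes(range(20))
         + struct.pack("!I", 16) + b"\xaa" * 20
         + struct.pack("!I", 5) + b"\xbb" * 20)
file_sha1, pieces = decode_piece_cache(cache)
```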
Convert a list of sub-piece hashes to a cacheable value.
- Parameters:
sha1 (string) - the 40-byte hex representation of the SHA1 hash of the file
piece_list (list of (string, int)) - for each sub-piece of the file, the 40-byte hex representation of the SHA1 hash and the length of the piece
- Returns:
string - the cacheable string
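Encoding into the cache layout (a 20-byte file hash followed by 24-byte records of big-endian length and piece hash) can be sketched in Python 3 as follows. The name `encode_piece_cache` is hypothetical; the empty result for a file without sub-pieces follows the cache description, while everything else is a plain reading of that layout.

```python
import binascii
import struct

def encode_piece_cache(sha1_hex, piece_list):
    """Pack a file hash and its (piece_hash_hex, length) pairs into bytes."""
    if not piece_list:
        return b""  # no sub-pieces: the cached value is empty
    out = [binascii.unhexlify(sha1_hex)]
    for piece_hex, length in piece_list:
        out.append(struct.pack("!I", length))      # 4-byte network-order length
        out.append(binascii.unhexlify(piece_hex))  # 20-byte raw SHA1
    return b"".join(out)

encoded = encode_piece_cache("00" * 20, [("aa" * 20, 16), ("bb" * 20, 5)])
```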
Calculate and print the sub-pieces for a single file.
- Parameters:
filename (string) - the file to calculate sub-pieces for
Read the new piece data from a Packages file.
- Parameters:
filename (string) - the Packages file to open and parse
- Returns:
list of string - the package files listed in the Packages file
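For orientation, the Python 3 sketch below collects the `Filename:` field from each stanza of Packages text. It assumes that "package files" refers to those fields and works on already decompressed text; the name `list_package_files` is hypothetical, and opening or decompressing the file is left out.

```python
def list_package_files(packages_text):
    """Collect the value of every 'Filename:' field in Packages text."""
    files = []
    for line in packages_text.splitlines():
        if line.startswith("Filename:"):
            files.append(line.split(":", 1)[1].strip())
    return files

sample = (
    "Package: foo\n"
    "Filename: pool/main/f/foo/foo_1.0_i386.deb\n"
    "Size: 1234\n"
    "\n"
    "Package: bar\n"
    "Filename: pool/main/b/bar/bar_2.0_i386.deb\n"
)
files = list_package_files(sample)
```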
Process a single Release file.
- Parameters:
cache (bsddb.BTree) - an already opened Berkeley DB B-tree
releasefile (string) - the Release file to process