CRC implementation (packed Base64 encoded MD5) #2

Open
opened 2024-09-28 03:35:47 +02:00 by ae · 1 comment
Owner

Currently, missing files have to be deleted manually once the download process has finished. By using the `md5` field the API provides (a 24-character, packed Base64-encoded MD5), this could be streamlined. Besides file integrity checking, this would also be useful for preventing duplicates.

The only issue is that I haven't yet figured out what's wrong with the conversion implementation. For example, the following should produce the same packed representation as the API does, but in practice it doesn't:

import base64
import hashlib

def md5(fname, hex=True):
    # construct a hash object by calling the appropriate constructor function
    hash_md5 = hashlib.md5()

    # open file in read-only byte-mode
    with open(fname, "rb") as f:
        # only read in chunks of size 4096 bytes
        for chunk in iter(lambda: f.read(4096), b""):
            # update it with the data by calling update() on the object
            # as many times as you need to iteratively update the hash
            hash_md5.update(chunk)

    # get digest out of the object by calling digest() (or hexdigest() for hex-encoded string)
    if hex:
        return hash_md5.hexdigest()
    else:
        return hash_md5.digest()

# base64-encode the raw 16-byte digest (not the hex string) to get 24 characters
md5_cleartext = base64.b64encode(md5("file-xyz.jpg", hex=False)).decode("utf-8")
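
To narrow down where a mismatch might come from, here is a minimal sketch of the conversion in both directions. It assumes that the API's `md5` field is the standard Base64 encoding of the raw 16-byte digest, which is what the 24-character length suggests but is not confirmed here. The helper names (`md5_packed_b64`, `hex_to_packed_b64`, `packed_b64_to_hex`, `matches_api_md5`) are hypothetical and not part of the project.

```python
import base64
import hashlib


def md5_packed_b64(path, chunk_size=4096):
    """Return the file's MD5 as Base64 of the raw 16-byte digest (24 characters)."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # read the file in chunks so large downloads are not loaded into memory at once
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


def hex_to_packed_b64(hex_digest):
    """Convert a 32-character hex digest to the packed Base64 form."""
    return base64.b64encode(bytes.fromhex(hex_digest)).decode("ascii")


def packed_b64_to_hex(packed):
    """Convert a packed Base64 digest back to its 32-character hex form."""
    return base64.b64decode(packed).hex()


def matches_api_md5(path, api_md5):
    """Compare a local file against an md5 value taken from the API (assumed format)."""
    return md5_packed_b64(path) == api_md5
```

If the values still differ, decoding the API value with `packed_b64_to_hex` and comparing it against the output of `md5sum` on the downloaded file may help tell apart an encoding problem from a file that simply differs from the one the API describes. One pitfall worth ruling out is Base64-encoding the hex string instead of the raw digest, which yields 44 characters rather than 24.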
ae added the enhancement label 2024-09-28 03:35:47 +02:00
Author
Owner
- [API docs](https://github.com/4chan/4chan-API/blob/d9a619833e1ef31ca9bdc353989dc0b1dd99970f/pages/Threads.md)
- [Reference implementation?](https://github.com/nilfoer/fourcdl/tree/master)
Reference: ae/dlrs#2