Reflection

Key-Value Store

Serialization: converting an in-memory data structure into a format that can be stored or transmitted. Before serialization, the data lives in memory as complex objects (different types, different structures); after serialization, it is a uniform byte stream, a sequence of 0s and 1s. A file system can only store bytes, not Python or Java objects directly, and a network can only transmit byte streams, not language-specific objects. A raw byte stream has no way to mark boundaries (e.g. 0101010010001); the fix is a length prefix.

Suppose we want to store key = "A", value = "BC". Raw data: key = "A" (1 byte), value = "BC" (2 bytes). Serialized byte stream (each length prefix is 4 bytes; 1 byte is 8 bits):

  • key length prefix (len('A').to_bytes(4, 'big')): 00000000 00000000 00000000 00000001
  • key bytes ('A'.encode('utf-8')): 01000001
  • value length prefix (len('BC').to_bytes(4, 'big')): 00000000 00000000 00000000 00000010
  • value bytes ('BC'.encode('utf-8')): 01000010 01000011

The core of the problem is a two-way conversion between a hash map and a byte stream. The length prefixes are fixed-size; the actual data is variable-size.
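The length-prefix protocol above can be sketched end to end; `serialize` and `deserialize` are illustrative names for the two directions:

```python
def serialize(store: dict) -> bytes:
    out = bytearray()
    for key, value in store.items():
        for field in (key.encode('utf-8'), value.encode('utf-8')):
            out.extend(len(field).to_bytes(4, 'big'))  # fixed-size length prefix
            out.extend(field)                          # variable-size payload
    return bytes(out)

def deserialize(data: bytes) -> dict:
    store, i = {}, 0
    while i < len(data):
        fields = []
        for _ in range(2):  # key, then value
            n = int.from_bytes(data[i:i + 4], 'big')
            i += 4
            fields.append(data[i:i + n].decode('utf-8'))
            i += n
        store[fields[0]] = fields[1]
    return store

blob = serialize({"A": "BC"})
# 4-byte prefix + 1-byte key + 4-byte prefix + 2-byte value = 11 bytes
```

Round-tripping `deserialize(serialize(d)) == d` is the invariant the interview asks for.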

Follow Up

  • Split blindly: cut the blob into fixed-size chunks, ignoring record boundaries
  • Record metadata: store how many chunks were written
  • Concatenate first: on restore, stitch the chunks back into the full byte stream in order
  • Parse second: deserialize the reassembled data with the original protocol
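The four steps above can be sketched for a 1 KB chunk size (`split_blob`/`join_blob` are illustrative names): split the blob blindly, record the chunk count, then concatenate before parsing.

```python
def split_blob(blob: bytes, chunk_size: int = 1024):
    # cut on fixed boundaries, ignoring where records begin or end
    chunks = [blob[i:i + chunk_size] for i in range(0, len(blob), chunk_size)]
    metadata = len(chunks)  # record how many pieces were written
    return metadata, chunks

def join_blob(metadata: int, chunks) -> bytes:
    # reassemble in order; only then is the blob safe to deserialize
    return b''.join(chunks[:metadata])

meta, parts = split_blob(b'x' * 2500)
# meta == 3: two full 1024-byte chunks plus one 452-byte tail
```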

The core task is to implement an in-memory Key-Value (KV) store class that supports persistence. This means the data held in memory (e.g., in a hash map or dictionary) must be able to be saved to a storage “medium” and then loaded back.

Core Requirements (API)

You need to implement a class with the following methods:

  • put(key, value) or set(key, value): Adds or updates a key-value pair in the in-memory store. Keys and values are typically strings and/or integers.
  • get(key): Retrieves the value associated with a given key
  • shutdown(): This method is responsible for serializing the entire in-memory KV store into a byte array and writing it to the persistent medium using a provided helper function
  • restore(): This method reads the byte array from the persistent medium and deserializes it to reconstruct the KV store in memory

Key Constraints & Environment

  • Provided helpers: you are given helper functions, so you do not need to handle file I/O (like opening files, naming them, or choosing paths).
    • Storage helpers: Functions like save_blob(bytes) and get_blob() are provided to handle the actual writing and reading of byte arrays to the persistent store
    • Type Conversion helpers: Utility functions are provided to convert primitive types like strings and integers to and from byte arrays (e.g., string_to_bytes, bytes_to_int).
  • Major Restriction: You are explicitly forbidden from using standard, high-level serialization libraries like JSON, Python’s pickle, or Java’s built-in serialization. The main point of the exercise is for you to design your own custom serialization protocol
def string_to_bytes(value: str) -> bytes:
    """
    Convert a string to bytes when storing.
    """
    pass

def bytes_to_string(value: bytes) -> str:
    """
    Convert bytes back to a string when reading.
    """
    pass

def save_blob(data: bytes) -> None:
    """
    Persist the byte array.
    """
    pass

def get_blob() -> bytes:
    """
    Read the persisted byte array.
    """
    pass
class KVstore:
    def __init__(self):
        self.dictionary = {}  # {} creates a dict; [] would create a list

    def put(self, key: str, value: str) -> None:
        self.dictionary[key] = value

    def get(self, key: str) -> str:
        return self.dictionary[key]

    def shutdown(self):
        chunks = []
        for key, value in self.dictionary.items():
            key_len = len(key)
            # int.to_bytes(length, byteorder) is a built-in method on Python
            # integers that converts an integer into bytes
            chunks.append(key_len.to_bytes(4, 'big'))
            # str.encode() is a built-in method on Python strings that
            # converts a string into bytes
            chunks.append(key.encode())
            value_len = len(value)
            chunks.append(value_len.to_bytes(4, 'big'))
            chunks.append(value.encode())
        # ''.join(...) joins strings ('' is the empty string);
        # b''.join(...) joins byte strings (b'' is the empty byte string)
        save_blob(b''.join(chunks))

    def restore(self):
        byte_array = get_blob()
class Bucket:
    def save_blob(self, filename, data):
        pass
    
    def load_blob(self, filename):
        pass

class KVStore:
    def __init__(self):
        self.store = {} 
        self.bucket = Bucket()
        # {} creates a dictionary (map); [] would create a list

    def put(self, key, value):
        self.store[key] = value

    def get(self, key):
        return self.store.get(key) 
        # self.store[key] raises KeyError if the key is missing;
        # self.store.get(key) returns None instead, which is safer here

    def shutdown(self):
        data = bytearray() # mutable byte array; bytes can be appended dynamically
        for key, value in self.store.items(): 
            # items() yields (key, value) pairs; "for key in self.store" iterates keys only
            key_bytes = key.encode('utf-8')
            value_bytes = value.encode('utf-8')
            # str.encode() converts the string to bytes

            # Encode first, then take the length of the bytes (not of the string):
            # the same string can have different byte lengths under different encodings
            data.extend(len(key_bytes).to_bytes(4, 'big'))
            data.extend(key_bytes)
            data.extend(len(value_bytes).to_bytes(4, 'big'))
            data.extend(value_bytes)
        self.bucket.save_blob("data", bytes(data)) 
        # bytes and bytearray are both byte sequences; bytearray is mutable, bytes is immutable
        
    
    def restore(self):
        data = self.bucket.load_blob("data")
        self.store = {}
        i = 0

        while i < len(data):
            # Read key
            key_len = int.from_bytes(data[i: i + 4], 'big')
            i += 4
            key = data[i:i + key_len].decode('utf-8')
            i += key_len

            # Read value
            value_len = int.from_bytes(data[i: i + 4], 'big')
            i += 4
            value = data[i:i + value_len].decode('utf-8')
            i += value_len

            self.store[key] = value
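The Bucket above is only a stub. A minimal in-memory stand-in (hypothetical; the real helpers are provided by the interviewer) makes it easy to exercise the shutdown/restore round trip locally:

```python
# Hypothetical in-memory stand-in for the provided Bucket helper, so the
# shutdown/restore round trip can be tested without real storage.
class DictBucket:
    def __init__(self):
        self._blobs = {}  # filename -> bytes

    def save_blob(self, filename, data):
        self._blobs[filename] = data

    def load_blob(self, filename):
        return self._blobs[filename]
```

Swapping `self.bucket = Bucket()` for `DictBucket()` is enough to assert that `restore()` rebuilds exactly what `shutdown()` saved.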
# Follow-up: Multiple files when size > 1KB
class KVstoreMultiFile:
    def __init__(self):
        self.store = {}
        self.bucket = Bucket()
    
    def put(self, key, value):
        self.store[key] = value
    
    def get(self, key):
        return self.store.get(key)
    
    def shutdown(self):
        data = bytearray()
        for key, value in self.store.items():
            key_bytes = key.encode('utf-8')
            value_bytes = value.encode('utf-8')
            data.extend(len(key_bytes).to_bytes(4, 'big'))
            # str.encode(): string -> bytes
            # int.to_bytes(): integer -> bytes, fixed width, so deserialization knows how many bytes to read
            data.extend(key_bytes)
            data.extend(len(value_bytes).to_bytes(4, 'big'))
            data.extend(value_bytes)

        chunk_size = 1024 # 1KB
        chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
        self.bucket.save_blob("metadata", str(len(chunks)).encode())

        for i, chunk in enumerate(chunks):
            self.bucket.save_blob(f"data_{i}", chunk)
        
    def restore(self):
        # Read metadata
        num_chunks = int(self.bucket.load_blob("metadata").decode())

        # Read and Combine chunks
        data = bytearray()
        for i in range(num_chunks):
            data.extend(self.bucket.load_blob(f"data_{i}"))
        
        # Deserialize
        self.store = {}
        i = 0
        while i < len(data):
            # Read key
            key_len = int.from_bytes(data[i: i + 4], 'big')
            i += 4
            key = data[i:i + key_len].decode('utf-8')
            i += key_len

            # Read value
            value_len = int.from_bytes(data[i: i + 4], 'big')
            i += 4
            value = data[i:i + value_len].decode('utf-8')
            i += value_len
        
            self.store[key] = value

# Exercise 1
class Bucket:
    def save_blob(self, data):
        pass

    def get_blob(self):
        pass

class KVStore:
    def __init__(self):
        self.store = {} 
        self.bucket = Bucket()
    # Pitfall: def __init__(self, store={}) would make every instance share
    # the same dictionary, because default arguments are created once, at
    # function definition time. Never use a mutable object as a default:
    # kv1 = KVStore()
    # kv2 = KVStore()
    # kv1.put("key", "value")
    # print(kv2.get("key"))  # unexpectedly prints "value"!

    def put(self, key: str, value: str) -> None:
        self.store[key] = value
    
    def get(self, key: str) -> str:
        return self.store.get(key)
    
    def shutdown(self):
        data = bytearray()  # mutable byte array (bytearray(), not bytes())
        for key, value in self.store.items():
            key_bytes = key.encode('utf-8')
            value_bytes = value.encode('utf-8')

            key_bytes_len = len(key_bytes).to_bytes(4, 'big')  # (4, 'big'), not (4, big)
            value_bytes_len = len(value_bytes).to_bytes(4, 'big')
            data.extend(key_bytes_len)  # extend, not append: append takes a single int
            data.extend(key_bytes)
            data.extend(value_bytes_len)
            data.extend(value_bytes)
        self.bucket.save_blob(bytes(data))  # convert the bytearray to immutable bytes

    def restore(self):
        data = self.bucket.get_blob()
        self.store = {}  # clear any existing data first
        i = 0
        while i < len(data):
            key_len = int.from_bytes(data[i:i + 4], 'big')
            # int.from_bytes(data[i:i+4], 'big'), not a method on the slice;
            # i is advanced by 4 below, so no special stride logic is needed
            i += 4
            key = data[i:i + key_len].decode('utf-8')
            i += key_len

            value_len = int.from_bytes(data[i:i + 4], 'big')
            i += 4
            value = data[i:i + value_len].decode('utf-8')
            i += value_len  # advance past the value before the next record

            self.store[key] = value

# Follow Up:
class Bucket:
    def save_blob(self, filename, data):
        pass

    def get_blob(self, filename):
        pass

class KVStore:
    def __init__(self):
        self.store = {} 
        self.bucket = Bucket()
    # Pitfall: def __init__(self, store={}) would make every instance share
    # the same dictionary, because default arguments are created once, at
    # function definition time. Never use a mutable object as a default:
    # kv1 = KVStore()
    # kv2 = KVStore()
    # kv1.put("key", "value")
    # print(kv2.get("key"))  # unexpectedly prints "value"!

    def put(self, key: str, value: str) -> None:
        self.store[key] = value
    
    def get(self, key: str) -> str:
        return self.store.get(key)
    
    def shutdown(self):
        data = bytearray()  # mutable byte array
        for key, value in self.store.items():
            key_bytes = key.encode('utf-8')
            value_bytes = value.encode('utf-8')

            key_bytes_len = len(key_bytes).to_bytes(4, 'big')
            value_bytes_len = len(value_bytes).to_bytes(4, 'big')
            data.extend(key_bytes_len)
            data.extend(key_bytes)
            data.extend(value_bytes_len)
            data.extend(value_bytes)
        chunk_size = 1024
        chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

        self.bucket.save_blob("metadata", str(len(chunks)).encode('utf-8'))

        for i in range(len(chunks)):
            self.bucket.save_blob(f"data_{i}", chunks[i])

    def restore(self):
        num_chunks = int(self.bucket.get_blob("metadata").decode('utf-8'))

        data = bytearray()
        for i in range(num_chunks):
            data.extend(self.bucket.get_blob(f'data_{i}'))
        i = 0
        self.store = {}
        while i < len(data):
            key_len = int.from_bytes(data[i:i + 4], 'big')
            # i is advanced by 4 below, so no special stride logic is needed
            i += 4
            key = data[i:i + key_len].decode('utf-8')
            i += key_len

            value_len = int.from_bytes(data[i:i + 4], 'big')
            i += 4
            value = data[i:i + value_len].decode('utf-8')
            i += value_len

            self.store[key] = value

GPU Credit Scheduling System

You need to implement a class with the following three methods:

  • add_credit(grant_id: str, amount: int, timestamp: int, expire: int): This function records that a certain amount of credit, identified by grant_id, becomes available at timestamp. The expire value is a duration, meaning the credit is valid for the time interval [timestamp, timestamp + expire]
  • charge(amount: int, timestamp: int): This function records a usage of a certain amount of credit at a specific timestamp
  • get_balance(timestamp: int) -> int: This function calculates and returns the total available credit at a given timestamp

grant_id: a unique identifier for each credit grant, distinguishing different batches of credit. timestamp: the point in time at which an event occurred.

import heapq
class GPUSolution:
    def __init__(self):
        self.events = [] # record every event

    def add_credit(self, grant_id: str, amount: int, timestamp: int, expire: int):
        # just record the event; defer all computation
        self.events.append(('add', timestamp, amount, timestamp + expire, grant_id))
    
    def charge(self, amount: int, timestamp: int):
        # just record the event; defer all computation
        self.events.append(('charge', timestamp, amount))

    def get_balance(self, timestamp: int) -> int: 
        # Balance at `timestamp`: replay every event that happened at or
        # before `timestamp` and return the remaining credit at that moment.
        # Filter and sort the events.
        valid_events = [e for e in self.events if e[1] <= timestamp]
        valid_events.sort(key = lambda x: x[1]) # sort by timestamp
        # (descending would be: valid_events.sort(key=lambda x: x[1], reverse=True))

        # min-heap of (expire_ts, remaining_amount, grant_id)
        active_grants = []

        # process the events in chronological order
        for event in valid_events:
            if event[0] == 'add':
                _, ts, amount, expire_ts, grant_id = event
                if expire_ts > timestamp: # not yet expired at query time
                    heapq.heappush(active_grants, (expire_ts, amount, grant_id))
            elif event[0] == 'charge':
                _, ts, charge_amount = event
                remaining_charge = charge_amount
                
                # consume the credit that expires earliest first
                temp_grants = []
                while active_grants and remaining_charge > 0:
                    expire_ts, available, grant_id = heapq.heappop(active_grants) # pops the tuple with the smallest first element

                    if available <= remaining_charge:
                        remaining_charge -= available
                    else:
                        # partially consumed
                        temp_grants.append((expire_ts, available - remaining_charge, grant_id))
                        remaining_charge = 0
                # push the remaining grants back
                for grant in temp_grants:
                    heapq.heappush(active_grants, grant)
        total = 0
        for expire_ts, amount, _ in active_grants:
            if expire_ts > timestamp:
                total += amount
        return total
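The replay inside get_balance can be condensed into a self-contained sketch (the function name `balance` and the event-tuple layout are illustrative): charges always drain the grant that expires earliest.

```python
import heapq

def balance(events, t):
    # events: ('add', ts, amount, expire_ts) or ('charge', ts, amount)
    heap = []  # min-heap of (expire_ts, remaining_amount)
    for e in sorted((e for e in events if e[1] <= t), key=lambda e: e[1]):
        if e[0] == 'add' and e[3] > t:
            heapq.heappush(heap, (e[3], e[2]))
        elif e[0] == 'charge':
            need = e[2]
            while heap and need > 0:
                exp, avail = heapq.heappop(heap)  # earliest expiry first
                if avail <= need:
                    need -= avail
                else:
                    heapq.heappush(heap, (exp, avail - need))
                    need = 0
    return sum(amount for _, amount in heap)

events = [('add', 0, 10, 5), ('charge', 2, 3)]
# balance(events, 3) -> 7 (10 granted, 3 charged, grant still live)
# balance(events, 6) -> 0 (the grant expired before t=6)
```

Recomputing from the event log on every query is O(n log n) per call but keeps add_credit and charge O(1), which matches the "record now, compute later" design above.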
# exercise
import heapq
class GPUSolution:
    def __init__(self):
        self.events = []
    
    def add_credit(self, grant_id: str, amount: int, timestamp: int, expire: int):
        self.events.append(('add', timestamp, amount, timestamp + expire, grant_id))

    def charge(self, amount: int, timestamp: int):
        self.events.append(('charge', timestamp, amount))

    def get_balance(self, timestamp: int) -> int:
        valid_events = [e for e in self.events if e[1] <= timestamp]
        valid_events.sort(key = lambda x: x[1])
        # valid_events records events that have already happened;
        # activate_grants holds the not-yet-expired grants among them
        activate_grants = []
        for event in valid_events:
            if event[0] == 'add':
                _, ts, amt, exp_ts, grant_id = event
                if exp_ts > timestamp:
                    # only grants that have already happened and have not
                    # yet expired; iterating over self.events instead of
                    # valid_events could include grants not yet in effect
                    heapq.heappush(activate_grants, (exp_ts, amt, grant_id))
            elif event[0] == 'charge':
                _, ts, charge_amount = event
                remaining_charge = charge_amount
                while activate_grants and remaining_charge > 0:
                    exp_ts, available, grant_id = heapq.heappop(activate_grants)
                    if available <= remaining_charge:
                        remaining_charge -= available
                    else:
                        heapq.heappush(activate_grants, (exp_ts, available - remaining_charge, grant_id))
                        remaining_charge = 0
        total = 0
        for e in activate_grants:
            total += e[1]
        return total

Machine Tree

Counting the nodes of a tree with a distributed algorithm. Traditional approach: all nodes live in one machine’s memory, so you can walk the tree directly, e.g. root.children[0].children[1].count(). Distributed approach: nodes live on different machines, so a communication primitive such as sendAsyncMessage() is needed to ask child machines for their counts.

Synchronous: A waits for B’s reply, then waits for C’s reply (serial). Asynchronous: A waits on B and C at the same time (parallel, faster).

You are given a system of machines organized in a tree structure. Each machine is a node in the tree, has a unique nodeId, and knows about its children and its parent (except for the root node)

You need to implement the logic for two main functionalities:

  • count: Count the total number of machines (nodes) in the tree.
  • topology: Generate a string representation of the entire tree structure (e.g., using nested maps or a similar format)
import uuid

def sendAsyncMessage(nodeId, message):
    """Simulate sending message to another machine"""
    pass

class MachineTree:
    def __init__(self, nodeId, children = None, parent = None):
        self.nodeId = nodeId
        self.children = children or []
        self.parent = parent
        self.pending = {}
        self.results = {}
    
    def count(self):
        request_id = str(uuid.uuid4())
        self.receiveMessage({
            'type': 'count_request',
            'request_id': request_id,
            'from': None
        })
        return request_id
    
    def topology(self):
        request_id = str(uuid.uuid4())
        self.receiveMessage({
            'type': 'topology_request', 
            'request_id': request_id,
            'from': None
        })
        return request_id

    def receiveMessage(self, message):
        msg_type = message['type']
        request_id = message['request_id']

        if msg_type in ['count_request', 'topology_request']:
            if not self.children:
                result = 1 if msg_type == 'count_request' else {self.nodeId: {}}
                self._send_response(message['from'], request_id, msg_type.replace('request', 'response'), result)
            else:
                self.pending[request_id] = len(self.children)
                self.results[request_id] = {}
                for child in self.children:
                    sendAsyncMessage(child, {
                        'type': msg_type,
                        'request_id': request_id,
                        'from': self.nodeId
                    })
        elif msg_type in ['count_response', 'topology_response']:
            self.results[request_id][message['from']] = message['result']
            self.pending[request_id] -= 1
            if self.pending[request_id] == 0:
                if msg_type == 'count_response':
                    total = sum(self.results[request_id].values()) + 1
                else:
                    total = {self.nodeId: {}}
                    for child_topo in self.results[request_id].values():
                        total[self.nodeId].update(child_topo)
                
                if self.parent:
                    self._send_response(self.parent, request_id, msg_type, total)
                else:
                    print(f"Final result: {total}")
                
                del self.pending[request_id]
                del self.results[request_id]

    def _send_response(self, to_node, request_id, msg_type, result):
        if to_node:
            sendAsyncMessage(to_node, {
                'type': msg_type,
                'request_id': request_id,
                'from': self.nodeId,
                'result': result
            })

def main():
    # Create tree: root(1) -> [child1(2), child2(3)] -> [grandchild(4)]
    root = MachineTree("1")
    child1 = MachineTree("2", parent="1")
    child2 = MachineTree("3", parent="1") 
    grandchild = MachineTree("4", parent="2")
    
    root.children = ["2", "3"]
    child1.children = ["4"]
    # Test count
    print("Starting count...")
    root.count()
    
    # Test topology
    print("Starting topology...")
    root.topology()

if __name__ == "__main__":
    main()
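Since sendAsyncMessage is only a stub above, nothing actually flows when main() runs. A hypothetical synchronous registry (the `NODES` dict, the `Node` class, and the delivery behavior are all assumptions, not the interview harness) makes the count protocol observable end to end:

```python
import uuid

NODES = {}  # nodeId -> Node, so messages can be delivered by id

def sendAsyncMessage(nodeId, message):
    NODES[nodeId].receiveMessage(message)  # synchronous stand-in for async delivery

class Node:
    def __init__(self, nodeId, parent=None):
        self.nodeId, self.parent = nodeId, parent
        self.children, self.pending, self.results = [], {}, {}
        self.final = None  # set on the root once every response has arrived
        NODES[nodeId] = self

    def count(self):
        rid = str(uuid.uuid4())
        self.receiveMessage({'type': 'count_request', 'request_id': rid, 'from': None})
        return self.final  # valid here only because delivery is synchronous

    def receiveMessage(self, msg):
        rid = msg['request_id']
        if msg['type'] == 'count_request':
            if not self.children:  # leaf: count itself and reply
                self._respond(msg['from'], rid, 1)
            else:                  # fan the request out to every child
                self.pending[rid] = len(self.children)
                self.results[rid] = []
                for child in self.children:
                    sendAsyncMessage(child, {'type': 'count_request',
                                             'request_id': rid, 'from': self.nodeId})
        else:  # 'count_response': aggregate child counts
            self.results[rid].append(msg['result'])
            self.pending[rid] -= 1
            if self.pending[rid] == 0:
                total = sum(self.results[rid]) + 1  # +1 for this node
                if self.parent:
                    self._respond(self.parent, rid, total)
                else:
                    self.final = total

    def _respond(self, to_node, rid, result):
        if to_node:
            sendAsyncMessage(to_node, {'type': 'count_response', 'request_id': rid,
                                       'from': self.nodeId, 'result': result})

root = Node('1'); child1 = Node('2', parent='1')
child2 = Node('3', parent='1'); grandchild = Node('4', parent='2')
root.children = ['2', '3']; child1.children = ['4']
# root.count() returns 4: every machine contributes exactly once
```

With truly asynchronous delivery, count() could not return the result directly; the pending/results bookkeeping per request_id is what lets responses be matched up whenever they arrive.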

import uuid
class MachineTree:
    def __init__(self, nodeId, children = None, parent = None):
        self.nodeId = nodeId
        self.children = children or []
        self.parent = parent
        self.pending = {}
        self.results = {}

    def count(self):
        request_id = str(uuid.uuid4()) # one id per request
        # nodeId identifies the node; request_id identifies one round of communication
        self.receiveMessage({
            'type': 'count_request',
            'request_id': request_id,
            'from': None
        })
        return request_id

    def topology(self):
        request_id = str(uuid.uuid4())
        self.receiveMessage({
            'type': 'topology_request',
            'request_id': request_id,
            'from': None
        })
        return request_id

    def receiveMessage(self, message):
        msg_type = message['type']
        request_id = message['request_id']

        if msg_type in ['count_request', 'topology_request']:
            if not self.children:
                # leaf node
                result = 1 if msg_type == 'count_request' else {self.nodeId: {}}
                self._send_response(message['from'], request_id, msg_type.replace('request', 'response'), result)
            else:
                # not leaf node
                self.pending[request_id] = len(self.children) 
                # how many child replies we are waiting for: a node with
                # three children sets pending[request_id] = 3 and decrements
                # it by one for each reply received
                self.results[request_id] = {}
                for child in self.children:
                    sendAsyncMessage(child, {
                        'type': msg_type,
                        'request_id': request_id,
                        'from': self.nodeId
                    })
        elif msg_type in ['count_response', 'topology_response']:
            self.results[request_id][message['from']] = message['result']
            self.pending[request_id] -= 1
            if self.pending[request_id] == 0:
                if msg_type == 'count_response':
                    total = sum(self.results[request_id].values()) + 1
                else:
                    total = {self.nodeId: {}}
                    for child_topo in self.results[request_id].values():
                        total[self.nodeId].update(child_topo)
                if self.parent:
                    self._send_response(self.parent, request_id, msg_type, total)
                del self.pending[request_id]
                del self.results[request_id]

def main():
    root = MachineTree('1', children = ['2', '3'], parent = None)
    child1 = MachineTree('2', children = ['4'], parent = '1')
    child2 = MachineTree('3', children = None, parent = '1')
    grandchild = MachineTree('4', children = None, parent = '2')



BQ Interview

Highest-Priority Questions

  1. Failed project + what you learned from it - asked in almost every round
  2. Making impact outside your scope - high-frequency, demonstrates leadership
  3. Biggest achievement - standard question
  4. How you resolve conflict - classic BQ question
  5. Why OpenAI + views on AI - company-specific question

Medium-Priority Questions

  1. A project you led + the challenges - prepare alongside the deep dive
  2. Personal project experience - shows initiative
  3. How you handle stress and move fast - OpenAI moves quickly

Preparation Strategy

Answer format: • prepare a polished 30-second version plus an expandable detailed version • use the STAR method and stress quantifiable results • highlight execution and leadership

Key traits to demonstrate: • rapid iteration and execution • performance under pressure • cross-team collaboration • learning quickly from failure

Notes: • the interviewer may be short on time, so be ready to be interrupted • emphasize business value over pure technical detail • show that you have thought about AI safety

Q1: Tell me about a failed project and what you learned from it

Logic Framework

Design system → Over-engineer → Performance issue → Simplify → Better results → Lesson learned

Concept Combinations

**System Type:**
- real-time transaction analytics system
- fraud detection platform
**Problem Description:**
- overly complex architecture
- multi-layered caching
- complex data processing pipelines
- handling all edge cases
**Solution:**
- simplified architecture
- WebSocket connection
- optimized data flows
- streamlined design
**Results:**
- response time: 15 minutes → 30 seconds
- fraud interception increased by 40%
**Learning:**
- simple and effective
- driven by business requirements
- core functionality first


30-second STAR response: “During my internship at Bank of China, I initially designed an overly complex architecture for a real-time transaction analytics system, resulting in 15-minute response times that couldn’t meet fraud detection requirements. I redesigned the architecture with a simplified approach using WebSocket and optimized data flows, ultimately reducing response time to 30 seconds and increasing fraud interception by 40%. I learned that simple and effective solutions are better than complex designs.”

Follow-up: What specifically made the architecture too complex? “I designed multi-layered caching and complex data processing pipelines, trying to handle all edge cases. But the business actually only needed fast anomaly detection, not all those layers.”

Follow-up: How did you realize you needed to change approach? “When I saw the 15-minute delay, I proactively reached out to the business team to understand the real requirements and found they only needed core functionality with second-level response.”

Q2: What’s your biggest achievement?

Logic Framework

Accept challenge → Identify bottleneck → Innovative solution → Significant improvement → User benefit

Concept Combinations

**Project Type:**
- end-to-end AI feature
- scalable system
- low-latency service
**Technical Challenge:**
- Lambda cold starts
- high-concurrency AI inference
- performance bottleneck
**Solution:**
- innovative caching strategies
- load balancing algorithms
- intelligent warm-up mechanisms
- request routing optimization
**Results:**
- API throughput: 3.1x improvement
- response time: 500ms → 160ms
- QPS: 1000 → 3100
- high reliability
**Business Impact:**
- improved user experience
- direct impact


30-second STAR response: “During my AWS internship, I built an end-to-end AI feature and improved API throughput by 3.1x through architectural optimization. I designed innovative caching strategies and load balancing algorithms that significantly boosted performance while maintaining high reliability, directly improving user experience.”

Follow-up: What was the technical challenge? “The main issues were Lambda cold starts and latency in high-concurrency AI inference. I implemented intelligent warm-up mechanisms and request routing optimization.”

Follow-up: How did you measure the 3.1x improvement? “Through API response time monitoring and QPS metrics - average response time dropped from 500ms to 160ms, and concurrent processing capacity increased from 1000 QPS to 3100 QPS.”

Q3: Tell me about a time you had to resolve a conflict

Logic Framework

Discover disagreement → Affect progress → Organize discussion → Data validation → Reach consensus

Concept Combinations

**Conflict Background:**
- disagreement over algorithm selection
- project progress affected
- technical team
**Different Viewpoints:**
- traditional PID control
- machine learning approaches
- stable and reliable
- handles complex scenarios
**Resolution Method:**
- technical evaluation meeting
- A/B test design
- actual flight data
- let the data speak
- objective comparison
**Results:**
- gradient-boosted model
- 25% higher precision
- team consensus
- fairest approach


30-second STAR response: “In the Purdue robotics team, there was disagreement about drone control algorithm selection that was affecting project progress. I organized a technical evaluation meeting and designed A/B tests to compare different approaches. Using data, I proved that the gradient-boosted model had 25% higher precision than traditional methods, and the team reached consensus.”

Follow-up: What were the different viewpoints? “Some people favored traditional PID control, believing it was stable and reliable; others supported machine learning approaches, thinking they could handle complex scenarios.”

Follow-up: How did you convince them to do A/B testing? “I proposed using actual flight data for comparison, letting data speak rather than subjective judgment. Everyone agreed this was the fairest approach.”

Q4: Tell me about a time you made impact outside your scope

Logic Framework

Discover problem → Affect team → Take initiative → Deep optimization → Overall benefit

Concept Combinations

**Problem Identification:**
- physics engine bottleneck
- team experimental efficiency
- outside research scope
**Impact Observation:**
- team members waiting
- slow physics simulations
- research progress affected
**Proactive Action:**
- proactively optimized
- analyzed source code
- collision detection algorithm
- spatial partitioning refactor
**Technical Details:**
- Pymunk engine
- collision latency
- collision detection logic
**Results:**
- 70% latency reduction
- support for 10+ objects
- overall team efficiency
- personal values


30-second STAR response: “During my AI Lab research, I discovered that physics engine performance bottlenecks were affecting the entire team’s experimental efficiency. Although it wasn’t within my research scope, I proactively optimized the Pymunk engine, reducing collision latency by 70% and supporting 10+ object processing, significantly improving overall team efficiency.”

Follow-up: Why did you decide to work on this? “I saw team members frequently waiting due to slow physics simulations, affecting research progress. I felt solving this problem was more valuable than my individual research.”

Follow-up: How did you approach the optimization? “I analyzed the engine source code and found the collision detection algorithm could be optimized. I refactored the spatial partitioning and collision detection logic.”

Q5: Why OpenAI?

Logic Framework

Mission alignment → Experience match → Values alignment → Contribution vision → Development plan

Concept Combinations

**Mission Values:**
- AGI benefits all humanity
- values alignment
- AI safety development
- broadly distributed benefits
**Personal Background:**
- AI research experience
- engineering practice
- technical forefront
**AI Safety Perspective:**
- crucial importance
- model unpredictability
- balancing technological advancement
- safety assurance
- rigorous testing
- gradual deployment
**Contribution Plan:**
- technical leadership
- engineering capabilities
- balancing rapid development
- safety considerations
- team collaboration
- AGI challenges


30-second response: “OpenAI’s mission to ensure AGI benefits all of humanity aligns perfectly with my values. I have experience in both AI research and engineering practice, and I want to be at the technical forefront while ensuring AI develops safely. I particularly resonate with OpenAI’s efforts in AI safety and broadly distributed benefits.”

Follow-up: What do you think about AI safety? “AI safety is crucial. In my research, I’ve seen the unpredictability of AI models. We need to balance technological advancement with safety assurance through rigorous testing and gradual deployment to ensure reliability.”

Follow-up: How do you see yourself contributing in the first year? “I hope to contribute my engineering capabilities in technical leadership, learn how to balance rapid development with safety considerations, and participate in team collaboration to address AGI challenges.”

Q6: Tell me about a project you’re proud of

Logic Framework

Build system → Performance challenge → Innovative optimization → Model improvement → Overall enhancement

Concept Combinations

**System Features:**
- sketch-based control
- drone control system
- 1K+ concurrent users
**Performance Optimization:**
- React pipeline
- Web Workers
- parallel processing
- 62% reduction in CPU usage
- maintaining 60 FPS
- 8K-point sketches
**Algorithm Improvement:**
- gradient-boosted ranking model
- 25% improvement in command precision
**Technical Challenge:**
- real-time sketch processing
- performance issues
- main thread blocking
**Validation Method:**
- flight test data
- comparison measurement
- drone execution accuracy
- complex paths
- significantly better performance


30-second STAR response: “I built a sketch-based drone control system serving 1K+ concurrent users. I optimized the React pipeline using Web Workers, reducing CPU usage by 62% and maintaining 60 FPS for 8K-point sketches. I also developed a gradient-boosted ranking model that improved command precision by 25%.”

Follow-up: What was the biggest technical challenge? “The main issue was real-time sketch processing performance. 8K-point sketches would block the main thread, so I used Web Workers for parallel processing to solve this problem.”

Follow-up: How did you validate the 25% precision improvement? “Through flight test data comparison, measuring the accuracy of drone execution of sketch commands. The new model performed significantly better on complex paths.”

Interview pacing reminders:

• be ready to be interrupted within 30 seconds of each answer • keep follow-ups to 1-2 sentences • emphasize data and results • demonstrate fast execution and leadership • be ready to switch topics quickly



