snsd0805
  • Joined on 2022-12-29
snsd0805 pushed to master at snsd0805/gpu-contract 2024-05-21 15:17:25 +08:00
5eef033600 feat: getTasks & register event
snsd0805 pushed to master at snsd0805/Distributed-Training-Example 2024-05-20 23:29:11 +08:00
20fd2fbe08 fix: move trainer
snsd0805 pushed to master at snsd0805/gpu-provider 2024-05-18 18:08:13 +08:00
68123809bb feat: add cluster_info function
01e673c07c feat: finish the communication between master and workers
snsd0805 pushed to master at snsd0805/gpu-provider 2024-05-18 03:02:36 +08:00
a947f86c91 feat: extract action from node manager
snsd0805 pushed to master at snsd0805/gpu-provider 2024-05-18 01:46:58 +08:00
4fa614776d feat: complete service exploration
b1de4dcacd feat: service exploration module
f253a0d9df feat: UDP broadcast test
9b3ca71b70 feat: TCP client-server
snsd0805 pushed to socket at snsd0805/gpu-provider 2024-05-18 01:46:46 +08:00
4fa614776d feat: complete service exploration
snsd0805 pushed to socket at snsd0805/gpu-provider 2024-05-18 00:40:19 +08:00
b1de4dcacd feat: service exploration module
snsd0805 created branch socket in snsd0805/gpu-provider 2024-05-17 23:58:07 +08:00
snsd0805 pushed to socket at snsd0805/gpu-provider 2024-05-17 23:58:07 +08:00
f253a0d9df feat: UDP broadcast test
9b3ca71b70 feat: TCP client-server
snsd0805 pushed to master at snsd0805/Distributed-Training-Example 2024-05-16 23:31:32 +08:00
86e0c50a65 fix: global rank
snsd0805 pushed to master at snsd0805/Distributed-Training-Example 2024-05-16 23:17:46 +08:00
d4b9aaa1d6 feat: change rank when multi machine training
snsd0805 pushed to master at snsd0805/Distributed-Training-Example 2024-05-16 23:09:44 +08:00
874c160eae fix: delete matplotlib dependence
snsd0805 pushed to master at snsd0805/Distributed-Training-Example 2024-05-16 20:55:29 +08:00
233bec6d1c feat: torchrun on single machine
snsd0805 pushed to master at snsd0805/Distributed-Training-Example 2024-05-16 20:26:08 +08:00
24240f1c3a feat: trainer class for single/multi GPU
snsd0805 pushed to master at snsd0805/Distributed-Training-Example 2024-05-16 16:28:31 +08:00
8f3253ff24 fix: single machine parallel training success
snsd0805 pushed to master at snsd0805/Distributed-Training-Example 2024-05-16 15:59:27 +08:00
aedc6b46e9 feat: single machine DDP (test)
snsd0805 pushed to master at snsd0805/Distributed-Training-Example 2024-05-15 20:55:13 +08:00
939aa6d92e docs: add some hint comment
snsd0805 pushed to master at snsd0805/Distributed-Training-Example 2024-05-12 23:33:39 +08:00
e7572347c9 docs: update .gitignore
953d3ce1ee docs: add .gitignore
snsd0805 pushed to master at snsd0805/Distributed-Training-Example 2024-05-12 01:48:44 +08:00
fc01163995 feat: train on single GPU
bf905e9e03 feat: dataset
snsd0805 created branch master in snsd0805/Distributed-Training-Example 2024-05-11 21:56:02 +08:00