
Projects

BitNet on GPT

An attempt to implement the BitNet paper on GPT, built on top of NanoGPT in PyTorch. It also contains an implementation of "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits".

I have not yet been able to train it at a large scale.
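For context, the core idea of BitNet b1.58 is to replace nn.Linear with a layer whose weights are quantized to {-1, 0, +1} during the forward pass while full-precision latent weights are kept for the optimizer. Below is a minimal, illustrative sketch of such a layer in PyTorch; the class name, initialization, and the per-tensor activation quantization are my own simplifications, not the repository's exact code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BitLinear158(nn.Module):
    """Illustrative 1.58-bit linear layer: ternary weights, 8-bit activations."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # Latent full-precision weights are what the optimizer updates.
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.norm = nn.LayerNorm(in_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.norm(x)

        # Absmean quantization of the weights to {-1, 0, +1}.
        w = self.weight
        w_scale = w.abs().mean().clamp(min=1e-5)
        w_q = (w / w_scale).round().clamp(-1, 1)
        # Straight-through estimator: quantized values in the forward pass,
        # gradients flow to the full-precision latent weights.
        w_q = w + (w_q - w).detach()

        # Absmax quantization of activations to 8 bits (per tensor, for brevity).
        a_scale = 127.0 / x.abs().max().clamp(min=1e-5)
        x_q = (x * a_scale).round().clamp(-128, 127)
        x_q = x + (x_q - x).detach()

        # Rescale so the output matches the magnitude of the unquantized matmul.
        return F.linear(x_q, w_q) * (w_scale / a_scale)
```

In a NanoGPT-style model, a layer like this would stand in for the linear projections inside the attention and MLP blocks.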

My LazyVim Config

A perfectly curated Neovim config. Built with Neovim, LazyVim and Mason.

AI on Web

Running AI models on the web.

Using onnxruntime-web to run BERT for sentiment analysis in the browser.

This repository contains a simple example of using ONNX (Open Neural Network Exchange) with the microsoft/xtremedistil-l6-h256-uncased model. The model code is located in onnx/model.py, and the exported classifier is provided in both onnx/classifier.onnx and onnx/classifier_int8.onnx formats.

Model Information

  • Model Used: microsoft/xtremedistil-l6-h256-uncased
  • ONNX Model Location: onnx/model.py
  • Exported Classifier Models: onnx/classifier.onnx and onnx/classifier_int8.onnx
  • Colab Notebook: notebook
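As a rough illustration of how the two exported files could be produced, here is a hedged sketch of exporting the Hugging Face checkpoint to ONNX and then applying dynamic int8 quantization. The paths, num_labels, and opset version are assumptions for the example, and the real classifier would be fine-tuned on sentiment data before export; the repository's onnx/model.py may differ.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from onnxruntime.quantization import quantize_dynamic, QuantType

name = "microsoft/xtremedistil-l6-h256-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
# Assumption: the classifier head is fine-tuned for sentiment before export;
# here we only load the architecture to show the export step itself.
model = AutoModelForSequenceClassification.from_pretrained(
    name, num_labels=2, return_dict=False
)
model.eval()

# Dummy inputs trace the graph; dynamic axes keep batch/sequence flexible.
dummy = tokenizer("hello world", return_tensors="pt")
torch.onnx.export(
    model,
    (dummy["input_ids"], dummy["attention_mask"]),
    "onnx/classifier.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "logits": {0: "batch"},
    },
    opset_version=14,
)

# Dynamic int8 quantization shrinks the weights for faster in-browser loading.
quantize_dynamic(
    "onnx/classifier.onnx", "onnx/classifier_int8.onnx", weight_type=QuantType.QInt8
)
```

The int8 file is the one you would typically serve to onnxruntime-web, since download size dominates startup time in the browser.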

transformer

Just another implementation of the Transformer model as introduced in the paper "Attention Is All You Need". This is a step-by-step walkthrough of building a transformer.

A tutorial project for understanding how the transformer works.
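The heart of any such walkthrough is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. The sketch below is an illustrative multi-head self-attention module in PyTorch, not the repository's exact code; the names and shapes are my own.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiHeadSelfAttention(nn.Module):
    """Illustrative multi-head scaled dot-product self-attention."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)   # fused Q, K, V projections
        self.proj = nn.Linear(d_model, d_model)      # output projection

    def forward(self, x, mask=None):
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split each projection into (batch, heads, time, head_dim).
        q, k, v = (t.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
                   for t in (q, k, v))
        # softmax(Q K^T / sqrt(d_k)) V
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        out = F.softmax(scores, dim=-1) @ v
        # Re-merge the heads and project back to d_model.
        out = out.transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(out)
```

A full encoder block wraps this in residual connections, layer normalization, and a position-wise feed-forward network.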

trBPE: A Byte Pair Encoder tailored for Turkish

The current landscape of Large Language Models (LLMs) predominantly caters to the English language. This bias can be attributed to extensive training on English datasets and the efficacy of English tokenization. Notably, OpenAI's tokenizer for GPT-4 excels at contextualizing tokens along syllabic divisions, enhancing comprehension and generation capabilities.

However, for languages like Turkish this advantage diminishes, because tokenization becomes effectively random. To address this, this repository develops a BPE tokenizer tailored to Turkish, trained on rich Turkish-language datasets.

This was used by KomRade in the competition...

This is an attempt to replicate the methods outlined in this paper, with the following exceptions:

  • Non-agglutinative pieces are preceded by a space, and agglutinative pieces are not prefixed with #.
  • Tokenization is case-insensitive.
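
To make the idea concrete, here is a toy sketch of the BPE merge-learning loop, lowercasing words and marking word starts with a leading space in line with the exceptions above. It is illustrative only; the actual trBPE training code, corpus handling, and agglutination-aware rules live in the repository.

```python
from collections import Counter


def learn_bpe(corpus_words, num_merges):
    """Toy BPE merge learning: repeatedly merge the most frequent symbol pair."""
    # Start from character-level symbols; a leading space marks word starts,
    # so suffix pieces (common in agglutinative Turkish) need no '#' prefix.
    vocab = Counter(
        tuple(" " + word.lower()) for word in corpus_words  # case-insensitive
    )
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the winning merge to every word in the vocabulary.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges


# Example: merges learned from a tiny Turkish word list (hypothetical data).
print(learn_bpe(["evler", "evlerde", "evde", "okullarda"], 10))
```

On real corpora the same loop runs for tens of thousands of merges, which is where a Turkish-specific training set pays off over an English-centric vocabulary.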