Writing an LLM from scratch, part 25 – instruction fine-tuning (gilesthomas.com)
2 points by gpjt 7 hours ago
|
Writing an LLM from scratch, part 24 – the transcript hack (gilesthomas.com)
1 point by gpjt 1 day ago
|
Retro Language Models: Rebuilding Karpathy's RNN in PyTorch (gilesthomas.com)
1 point by ibobev 2 days ago
|
Writing an LLM from scratch, part 23 – fine-tuning for classification (gilesthomas.com)
1 point by ibobev 2 days ago
|
Retro Language Models: Rebuilding Karpathy's RNN in PyTorch (gilesthomas.com)
3 points by gpjt 5 days ago
|
Writing an LLM from scratch, part 23 – fine-tuning for classification (gilesthomas.com)
1 point by gpjt 7 days ago
|
Writing an LLM from scratch, part 22 – training our LLM (gilesthomas.com)
254 points by gpjt 14 days ago | 10 comments
|
Revisiting Karpathy's 'The Unreasonable Effectiveness of RNNs' (gilesthomas.com)
1 point by ibobev 17 days ago
|
Revisiting Karpathy's 'Unreasonable Effectiveness of Recurrent Neural Networks' (gilesthomas.com)
2 points by gpjt 19 days ago
|
Writing an LLM from scratch, part 21 – perplexed by perplexity (gilesthomas.com)
1 point by ibobev 21 days ago
|
Writing an LLM from scratch, part 21 – perplexed by perplexity (gilesthomas.com)
1 point by gpjt 22 days ago
|
Writing an LLM from scratch, part 20 – starting training, and cross entropy loss (gilesthomas.com)
41 points by gpjt 27 days ago | 3 comments
|
How Do LLMs Work? (gilesthomas.com)
2 points by gpjt 42 days ago | 1 comment
|
How Do LLMs Work? (gilesthomas.com)
1 point by ibobev 43 days ago
|
The maths you need to start understanding LLMs (gilesthomas.com)
616 points by gpjt 57 days ago | 120 comments
|
What AI chatbots are doing under the hood (gilesthomas.com)
2 points by gpjt 61 days ago
|
LLM from scratch, part 18 – residuals, shortcut connections, and the Talmud (gilesthomas.com)
2 points by gpjt 72 days ago
|
The fixed length bottleneck and the feed forward network (gilesthomas.com)
1 point by gpjt 76 days ago
|
Writing an LLM from scratch, part 17 – the feed-forward network (gilesthomas.com)
8 points by gpjt 78 days ago
|
Writing an LLM from scratch, part 16 – layer normalisation (gilesthomas.com)
1 point by gpjt 3 months ago
|
Leaving PythonAnywhere (gilesthomas.com)
3 points by gpjt 4 months ago
|
Writing an LLM from scratch, part 15 – from context vectors to logits (gilesthomas.com)
7 points by gpjt 5 months ago
|
Writing an LLM from scratch, part 14 – the complexity of self-attention at scale (gilesthomas.com)
1 point by gpjt 5 months ago
|
Writing an LLM from scratch, part 13 – attention heads are dumb (gilesthomas.com)
351 points by gpjt 5 months ago | 67 comments
|
Writing an LLM from scratch, part 12 – multi-head attention (gilesthomas.com)
3 points by gpjt 6 months ago
|
Writing an LLM from scratch, part 11 – batches (gilesthomas.com)
2 points by gpjt 6 months ago
|
Writing an LLM from scratch, part 10 – dropout (gilesthomas.com)
90 points by gpjt 7 months ago | 8 comments
|
Adding /llms.txt (gilesthomas.com)
1 point by gpjt 7 months ago
|
Writing an LLM from scratch, part 9 – causal attention (gilesthomas.com)
4 points by gpjt 7 months ago
|
Writing an LLM from scratch, part 8 – trainable self-attention (gilesthomas.com)
380 points by gpjt 7 months ago | 31 comments
|
|