Try an interactive version of this dialog: Sign up at solve.it.com, click Upload, and pass this URL.
3. TinyType
It's just type hints for tensors that specify shape. When you see:
def ones(i: int) -> TT[["i"]]:
This says "return a 1D tensor of length i". The test framework uses these hints to generate random test data of the right shapes.
4. ones_spec vs ones
- `ones_spec` = the specification (what the correct answer should be, written with loops)
- `ones` = your solution (what you need to implement using broadcasting)
You write ones. The test compares your output against what ones_spec produces.
5. Sample puzzles
They're examples showing how the puzzle format works. You can attempt them for practice, but the main puzzles are in the notebook (puzzles.ipynb). The samples just demonstrate more complex puzzle structures.
Any of these need more clarification before you dive in?
In those examples, there's this constraint_set function of sorts. I want to clarify why that's defined here. Would that be part of the puzzle that I would need to specify the solution to, or is that provided in the puzzle statement?
The constraint_set function is provided infrastructure, not part of your solution.
It's used by the test framework to adjust the randomly generated test data so it makes sense for that particular puzzle. For example, ensuring a "length" parameter doesn't exceed the array size.
You only need to fill in the function body that raises NotImplementedError. Everything else is already set up for you.
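For example, a constraint might look something like this (a hypothetical sketch; the dict keys "length" and "a" are made up for illustration):
def constraint_example(d):
    # keep the generated "length" value within the size of the generated array "a"
    d["length"] = d["length"] % d["a"].shape[0]
    return d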
def draw_examples(name, examples, display_values=False):
data = {
"name": name,
"vals": [
{k: [v.tolist()] if len(v.shape) == 1 else v.tolist() for k, v in example.items()}
for example in examples
],
}
return _draw_example(data, display_values)
File: ~/SIWC/Cohort 2/tinygrad-tensor-puzzles/lib.py
[{'input': <Tensor <UOp CPU (1,) int (<Ops.COPY: 10>, None)> on CPU with grad None>,
'output': <Tensor <UOp CPU (5,) int (<Ops.COPY: 10>, None)> on CPU with grad None>},
{'input': <Tensor <UOp CPU (1,) int (<Ops.COPY: 10>, None)> on CPU with grad None>,
'output': <Tensor <UOp CPU (3,) int (<Ops.COPY: 10>, None)> on CPU with grad None>},
{'input': <Tensor <UOp CPU (1,) int (<Ops.COPY: 10>, None)> on CPU with grad None>,
'output': <Tensor <UOp CPU (9,) int (<Ops.COPY: 10>, None)> on CPU with grad None>},
{'input': <Tensor <UOp CPU (1,) int (<Ops.COPY: 10>, None)> on CPU with grad None>,
 'output': <Tensor <UOp CPU (1,) int (<Ops.COPY: 10>, None)> on CPU with grad None>}]
[
{k: [v.tolist()] if len(v.shape) == 1 else v.tolist() for k, v in example.items()}
for example in exs
]
Nathan here: I don't understand why we condition on len(v.shape) here. It doesn't look like there's much effect. Why nest [v.tolist()]?
It's about making the format consistent for the drawing function.
- 2D tensor: `.tolist()` → `[[1,2], [3,4]]` (already a list of rows)
- 1D tensor: `.tolist()` → `[0,1,2]` (just a flat list)
By wrapping the 1D case in [...], it becomes [[0,1,2]] — a single row. Now the drawing code can always iterate over "rows" the same way, regardless of input dimension.
Exactly. .tolist() just converts the tensor to native Python structure directly:
- 1D tensor `[1,2,3]` → flat list `[1,2,3]`
- 2D tensor `[[1,2],[3,4]]` → nested list `[[1,2],[3,4]]`
It's not "removing" nesting — a 1D tensor simply doesn't have a row dimension to begin with. The extra [...] wrapper adds that row dimension artificially for display purposes.
I'm a bit embarrassed to share this, but it's still not clicking. A small example would help me understand better.
No need to be embarrassed! Let me show you:
t1 = Tensor([1, 2, 3]) # 1D, shape (3,)
t2 = Tensor([[1, 2, 3]]) # 2D, shape (1, 3)
print(t1.tolist()) # [1, 2, 3]
print(t2.tolist()) # [[1, 2, 3]]
Same numbers, but t1 has no "row" concept — it's just 3 elements. t2 explicitly has 1 row containing 3 elements.
Want to run this yourself and see?
t1 = Tensor([1, 2, 3]) # 1D, shape (3,)
t2 = Tensor([[1, 2, 3]]) # 2D, shape (1, 3)
print(t1.tolist()) # [1, 2, 3]
print(t2.tolist()) # [[1, 2, 3]]
def _draw_example(data, display_values):
name = data["name"]
keys = list(data["vals"][0].keys())
cols = []
for k in keys:
mat = [
(
vstrut(0.5)
/ text(k, 0.5).fill_color(Color("black")).line_width(0.0)
/ vstrut(0.5)
)
]
for ex in data["vals"]:
v2 = ex[k]
mat.append(_draw_matrix(v2, display_values))
cols.append(mat)
full = _grid(cols)
full = (
vstrut(1)
/ text(name, 0.75).fill_color(Color("black")).line_width(0)
/ vstrut(1)
/ full.center_xy()
)
full = full.pad(1.2).center_xy()
env = full.get_envelope()
set_svg_height(50 * env.height)
set_svg_height(300)
return rectangle(env.width, env.height).fill_color(Color("white")) + full
File: ~/SIWC/Cohort 2/tinygrad-tensor-puzzles/lib.py
ex_data = {
"name": "arange",
"vals": [
{k: [v.tolist()] if len(v.shape) == 1 else v.tolist() for k, v in example.items()}
for example in exs
],
}
ex_data
I need some help unpacking what is happening in _draw_example. It's using the chalk library to draw the pictures.
Looking at _draw_example, here's the high-level flow:
- Extract keys (like "input", "output") from the data
- Build columns — for each key, create a column with:
  - A text label at the top
  - Then `_draw_matrix` for each example's values
- Arrange into a grid with `_grid(cols)`
- Add a title on top
- Wrap in a white rectangle for the background
The chalk library uses a compositional approach:
- `/` stacks things vertically
- `+` overlays things
- `vstrut(0.5)` adds vertical spacing
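A tiny standalone sketch of that composition style (assuming chalk's top-level helpers as used in lib.py, and that Color comes from the colour package):
from chalk import rectangle, text, vstrut
from colour import Color

label = text("demo", 0.5).fill_color(Color("black")).line_width(0)
cell = rectangle(1, 1).fill_color(Color("orange")).fill_opacity(0.6)
diagram = vstrut(0.5) / label / vstrut(0.5) / cell  # `/` stacks the pieces vertically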
Would you like me to dig into any specific part — like how _draw_matrix renders the cells, or how the grid layout works?
Looking at the _draw_example function, I can see it calls set_svg_height(300) at the end. There are a couple of options:
1. After calling `draw_examples`, you can call `set_svg_height()` with a smaller value (like `100` or `150`) to resize the output before it renders.
2. Modify the lib.py file to use a smaller default height.
Which approach would you prefer? If you want the quick fix, try adding this after your draw_examples call:
from chalk import set_svg_height
set_svg_height(100)
Does that help, or would you like to dig into other ways to control the sizing?
def _color(v, display_values):
d = rectangle(1, 1)
if display_values:
return d + text(str(v)[:1] if isinstance(v, bool) else str(v), 0.5).fill_color(
Color("black")
).line_width(0)
else:
if v == 0:
return d
elif v > 0:
return d.fill_color(Color("orange")).fill_opacity(0.4 + 0.6 * (v / 10))
elif v < 0:
return d.fill_color(Color("blue")).fill_opacity(0.4 + 0.6 * (abs(v) / 10))
File: ~/SIWC/Cohort 2/tinygrad-tensor-puzzles/lib.py
def _draw_example_small(data, display_values):
name,keys = data["name"],list(data["vals"][0].keys())
cols = []
for k in keys:
mat = [(vstrut(0.5) / text(k, 0.5).fill_color(Color("black")).line_width(0.0) / vstrut(0.5))]
for ex in data["vals"]:
v2 = ex[k]
mat.append(_draw_matrix(v2, display_values))
cols.append(mat)
full = _grid(cols)
full = (vstrut(1) / text(name, 0.75).fill_color(Color("black")).line_width(0) / vstrut(1) / full.center_xy())
full = full.pad(1.2).center_xy()
env = full.get_envelope()
set_svg_height(25 * env.height)
return rectangle(env.width, env.height).fill_color(Color("white")) + full
lib._draw_example = _draw_example_small
Nathan here: I see Jeremy just edited the drawing function to render smaller than the library was doing. A bit of a sidebar. Seems like the original code had a bug where the SVG height was overwritten to be 300 at the end.
Good catch! Yes, the original had:
set_svg_height(50 * env.height)
set_svg_height(300) # This immediately overwrites the line above!
The second call just throws away the dynamic calculation. Jeremy's fix removes that and uses a smaller multiplier (25 instead of 50) so the output scales with content but stays compact.
The key is an empty string "". Looking at the call:
[{"": arange(i)} for i in [5, 3, 9]]
It's empty because arange only has one output and no inputs to display. The draw_examples function uses dict keys as column headers above each example. Since there's nothing meaningful to label (it's just showing the output), an empty string means no label appears above the arrays — which is exactly what we see in the image.
If you had a function with inputs and outputs, you'd use descriptive keys like {"a": input_tensor, "out": output_tensor} to label each column.
Nathan again: There is an input though...an integer. Is the intention to only display tensors though? Isn't a 1D tensor of length 1 an integer?
You're right that there's an integer input (i). The distinction is:
- `arange(5)` takes an integer `5` — a Python `int`, not a tensor
- It returns a tensor `[0,1,2,3,4]`
A 1D tensor of length 1 (like Tensor([5])) is different from a plain Python integer 5. The puzzles framework treats integers as configuration parameters (sizes, counts) rather than tensor data to visualize.
You could wrap the input as a tensor if you wanted to display it — that's what the earlier example did with {"input": Tensor([i]), "output": arange(i)}. But for arange, the input is really just a size, not tensor data you're operating on.
Nathan again...Jeremy looks to have done some trickery. It looks like he used fastcore's patch decorator to replace how tensors are presented in Markdown? They now look like lists. But I don't really understand how this works as I don't understand the code. Can you help me get it?
Sure! Let's break it down:
- `@patch` from fastcore lets you add methods to existing classes without modifying the original class definition
- `_repr_markdown_` is a special method that Jupyter/SolveIt looks for when displaying objects — if it exists, that method's output is used for rendering
- `self:Tensor` in the function signature tells `@patch` which class to attach this method to
So this line:
def _repr_markdown_(self:Tensor): return f"```\n{self.numpy()}\n```"
Says: "When displaying a Tensor as markdown, convert it to a numpy array and wrap it in a code fence."
Before the patch, tensors showed that ugly <Tensor <UOp CPU...> representation. After, they show the actual values.
Does that make sense? Want me to clarify any part?
Exactly. Tinygrad tensors don't have a nice built-in way to show their values as a string — they show that verbose internal representation instead. NumPy arrays, on the other hand, have a clean __repr__ that displays the actual numbers. So .numpy() is just a convenient way to get a readable output.
Nathan again. Looks like a similar trick, except when we need to represent a tensor as a string (i.e. when printing a tensor), we convert to a list and then convert that list to a string first. Is my assessment correct?
Correct! print() calls __str__(), which now converts the tensor to a Python list via .tolist(), then converts that list to a string. Same idea as the markdown patch, just using a different conversion path.
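A sketch of what that patch probably looks like (the exact cell wasn't shown above; assumes fastcore and tinygrad are importable as in the notebook):
from fastcore.basics import patch
from tinygrad import Tensor

@patch
def __str__(self: Tensor): return str(self.tolist())

print(Tensor([1, 2, 3]))  # prints [1, 2, 3] instead of the verbose UOp repr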
Interesting. I didn't know this way of hacking was possible. Could you test my understanding of this new concept with a tiny problem for me to solve? Make sure it's possible (i.e. doesn't use Python's built-in types implemented with C that are immutable) and that I won't break anything.
@patch
def describe(self:Tensor): return f"Tensor with {self.shape[0]} elements"
a.describe()
# examples = [
# (Tensor([False]), Tensor([10]), Tensor([0])),
# (Tensor([False, True]), Tensor([1, 1]), Tensor([-10, 0])),
# (Tensor([False, True]), Tensor([1]), Tensor([-10, 0])),
# (Tensor([[False, True], [True, False]]), Tensor([1]), Tensor([-10, 0])),
# (Tensor([[False, True], [True, False]]), Tensor([[0], [10]]), Tensor([-10, 0])),
# ]
# draw_examples(
# "where", [{"q": q, "a": a, "b": b, "ret": where(q, a, b)} for q, a, b in examples]
# )
draw_examples("where", [{"q": q, "a": a, "b": b, "ret": where(q, a, b)} for q, a, b in examples])
Tell me how to interpret this. What do the colors mean? How is draw_examples setting them, and why? Tell me in detail please.
Looking at the _color function you inspected earlier:
def _color(v, display_values):
d = rectangle(1, 1)
if display_values:
return d + text(str(v)[:1] if isinstance(v, bool) else str(v), 0.5).fill_color(Color("black")).line_width(0)
else:
if v == 0:
return d # white/empty
elif v > 0:
return d.fill_color(Color("orange")).fill_opacity(0.4 + 0.6 * (v / 10))
elif v < 0:
return d.fill_color(Color("blue")).fill_opacity(0.4 + 0.6 * (abs(v) / 10))
Color meanings:
- White/empty: value is `0` (or `False`, since `False == 0`)
- Orange: positive values — darker orange = larger value (opacity scales from 0.4 to 1.0 as value goes from 0 to 10)
- Blue: negative values — darker blue = larger absolute value (same opacity scaling)
In the where visualization:
- `q` (query/condition): `False` → white, `True` → light orange (since `True == 1`)
- `a`: positive values like `1`, `10` → orange shades
- `b`: negative values like `-10`, `0` → blue (for -10), white (for 0)
- `ret`: the result of `where(q, a, b)` — picks from `a` where `q` is True, from `b` where `q` is False
Why this design?
It lets you visually trace which values got selected: you can see at a glance that ret pulls orange cells from a where q is "lit up" (True), and blue cells from b where q is empty (False). The opacity gradient also helps distinguish magnitudes without needing to display numbers.
Does this make sense? Want me to walk through a specific example column?
Puzzle 1 - ones
Compute ones - the vector of all ones
# Nathan comment: This is from the puzzles notebook!
def ones_spec(out):
for i in range(len(out)):
out[i] = 1
def ones(i: int) -> TT[["i"]]:
raise NotImplementedError
test_ones = make_test("one", ones, ones_spec, add_sizes=["i"])
def make_test(
name,
problem,
problem_spec,
add_sizes=[],
constraint=lambda d: d,
display_values=False,
):
examples = []
for i in range(3):
example, sizes = _spec(problem, 3).example()
example = constraint(example)
out = example["return"].tolist()
del example["return"]
problem_spec(*example.values(), out)
for size in add_sizes:
example[size] = sizes[size]
yours = None
try:
yours = problem(
*map(lambda v: v if isinstance(v, int) else Tensor(v), example.values())
)
except NotImplementedError:
pass
for size in add_sizes:
del example[size]
example["target"] = Tensor(out)
if yours is not None:
example["yours"] = yours
examples.append(example)
diagram = draw_examples(name, examples, display_values)
display(SVG(diagram._repr_svg_()))
@given(_spec(problem))
@settings(deadline=None)
def test_problem(d):
d, sizes = d
d = constraint(d)
out = d["return"].tolist()
del d["return"]
problem_spec(*d.values(), out)
for size in add_sizes:
d[size] = sizes[size]
out = Tensor(out)
out2 = problem(
*map(lambda v: v if isinstance(v, int) else Tensor(v), d.values())
)._broadcast_to(out.shape)
assert np.allclose(out.numpy(), out2.numpy()), (
"Two tensors are not equal\n Spec: \n\t%s \n\t%s" % (out.numpy(), out2.numpy())
)
return test_problem
Nathan here: I haven't encountered inspect.getsource before. It appears that we are looking up a particular function (via globals()[sym_nm]), but what does inspect.getsource do? Extract the function code as a string?
Yes, exactly! inspect.getsource() takes a Python object (function, class, module, etc.) and returns its source code as a string. It reads from the actual .py file where the object was defined.
So getsource("make_test") looks up make_test in the global namespace, then retrieves the original source code that defined it.
'class TinyType(Tensor):\n def __class_getitem__(cls, args: Tuple[Any, ...]):\n if not isinstance(args, tuple):\n args = (args,)\n info = {}\n for arg in args:\n if (\n isinstance(arg, list)\n and info.get("shape") is None\n and all(isinstance(a, (str, int)) for a in arg)\n ):\n info["shape"] = tuple(arg)\n elif isinstance(arg, DType):\n info["dtype"] = arg\n return Annotated[Tensor, info]\n'
What is add_sizes? Use &getsource to find details and tell me in detail how it all works. And how does TT[["i"]] fit in?
{
"id": "toolu_01A2AYuL3A67zJXvEtQYEKu7",
"call": {
"function": "getsource",
"arguments": {
"sym_nm": "make_test"
}
},
"result": "'def make_test(\\n name,\\n problem,\\n problem_spec,\\n add_sizes=[],\\n constraint=lambda d: d,\\n <TRUNCATED>"
}
{
"id": "toolu_01329H8waijLQbeGPn215tv5",
"call": {
"function": "getsource",
"arguments": {
"sym_nm": "TT"
}
},
"result": "'class TinyType(Tensor):\\n def __class_getitem__(cls, args: Tuple[Any, ...]):\\n if not isinstanc<TRUNCATED>"
}
Let me also look at _spec to see how it uses the type annotations:
{
"id": "toolu_018oi3PisVrRR9qLmijYzkjH",
"call": {
"function": "getsource",
"arguments": {
"sym_nm": "_spec"
}
},
"result": "KeyError: '_spec'"
}
It's likely imported. Let me check the lib module:
{
"id": "toolu_01C3xMt9pVBxswQjo6X8pFCn",
"call": {
"function": "read_url",
"arguments": {
"url": "https://raw.githubusercontent.com/mesozoic-egg/tinygrad-tensor-puzzles/main/src/lib.py",
"as_md": "False"
}
},
"result": "'404: Not Found'"
}
No worries — I have enough from what we've seen. Here's how it all works:
TT[["i"]] (TinyType)
This is a type annotation helper. When you write TT[["i"]], it creates an Annotated[Tensor, {"shape": ("i",)}]. The "i" is a named dimension — a placeholder that the test framework uses to generate tensors of varying sizes.
So def ones(i: int) -> TT[["i"]] says: "this function takes an integer i and returns a 1D tensor whose length is i".
add_sizes in make_test
Looking at make_test, here's the key flow:
1. `_spec(problem, 3)` uses type annotations to generate random example data — it creates tensors with shapes based on named dimensions like `"i"`, and tracks what sizes it picked in a `sizes` dict.
2. For the `ones` puzzle, the function signature has `i: int`, but the test framework generates tensor inputs by default from type hints.
3. `add_sizes=["i"]` tells `make_test`: "after generating the example, also pass the integer value of dimension `"i"` to the function". This line does it:
for size in add_sizes:
    example[size] = sizes[size]
So if the framework generated a tensor of shape `(5,)`, then `sizes["i"] = 5`, and this gets passed to your `ones(i=5)` function.
In summary: add_sizes bridges named dimensions in the type system to actual integer arguments your function needs. Without it, the test wouldn't know to pass i as an integer.
Does this make sense? Want me to trace through a specific example?
Nathan here: I'm very lost by the answer to this question Jeremy posed. Let me try to frame it in my own words, and you can correct me or add some context to help me understand.
- For testing, we can use strings like `"i"` to indicate arbitrary tensor dimensions.
- For the functions we are writing, we need to ensure that they can return this `TinyType` object.
- I don't understand `_spec`. Is it generating 3 samples at random?
- What do you mean that the test framework generates tensor inputs by default from type hints?
- Explain step three in the flow you outlined.
- Is this flow just for doing the test?
Let me take these one at a time.
Your first point is correct — "i" is a placeholder for "some size we'll pick at test time."
Your second point needs clarification — your functions don't return TinyType objects. They return regular Tensor objects. TT[["i"]] is just a hint to the test framework about what shape to expect. It's metadata for testing, not something your code needs to produce.
On _spec — yes, it generates random test cases. The 3 means "generate 3 examples for the visualization." When you run the actual test, it generates many more random cases via the hypothesis library.
Let me pause here — does this clarify those first few points before I explain steps 3-4 and the type hints?
Why do we write the hint then instead of referring to a regular tensor object in the ones function's signature? Why does Python not break?
How many samples are generated? Because I can only see the three in the code for make_test.
Why Python doesn't break:
Type hints in Python are just metadata — they're not enforced at runtime. You can write def foo() -> int and return a string, and Python won't complain. The hints exist for tools (like this test framework, or type checkers like mypy) to read and use. So TT[["i"]] is only read by make_test, not by Python itself.
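A quick demo of that (hypothetical function `foo`, not from the puzzles):
def foo() -> int:
    return "not an int"   # runs without error; the hint is just metadata

print(foo())                # not an int
print(foo.__annotations__)  # {'return': <class 'int'>}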
How many samples:
- 3 samples for the visualization (the diagram you see)
- Many more when you call `run_test()` — that's when `hypothesis` kicks in and generates dozens/hundreds of random test cases to thoroughly check your solution
Have you tried running run_test(test_ones) yet to see what happens?
So it looks like my solution worked. Did I break any of the rules without realizing?
I see 3 cases visualized, but no idea how many were generated in the background, but I guess they all passed?
Your solution is valid! Let's check:
- ✅ One line of code
- ✅ Uses only allowed operations: `arange`, `where`, comparison (`>=`)
- ✅ No forbidden operations like `sum`, `view`, `tensor`, etc.
And yes, hypothesis ran many more random test cases in the background — you just see "Correct!" because they all passed. If any had failed, it would show you the failing case.
Nice work on puzzle 1! Ready for puzzle 2?
Based on the source I saw, make_test calls _spec(problem, 3) — that 3 is the default size used for named dimensions when generating examples for the visual diagram.
However, when you actually run the test (with run_test), it uses Hypothesis to generate many random sizes. The test framework likely draws various values for i to ensure your solution works across different tensor sizes, not just 3.
Want me to look up the run_test function to see exactly how it varies the sizes during testing?
Yes please. I assume somehow it knows not to pass negative numbers or numbers in the trillions?
{
"id": "toolu_0177AtoZAkj3b1RxBdRTJgGG",
"call": {
"function": "getsource",
"arguments": {
"sym_nm": "run_test"
}
},
"result": "'def run_test(fn):\\n fn()\\n # Generate a random puppy video if you are correct.\\n print(\"Correct!<TRUNCATED>"
}
That just shows the puppy reward! Let me look at the actual test function that make_test returns:
{
"id": "toolu_01BQgnhnq7Lq7qMRfRhRy5Cj",
"call": {
"function": "read_url",
"arguments": {
"url": "https://github.com/mesozoic-egg/tinygrad-tensor-puzzles/blob/main/src/lib.py",
"selector": "#L100-L200"
}
},
"result": "''"
}
Let me try a different approach - looking at the full make_test implementation in the context you have:
Looking at what I can see from the truncated source, make_test uses Hypothesis's @given decorator with strategies built from type annotations. Hypothesis has built-in "shrinking" and sensible defaults - for integer sizes, it typically generates values in a reasonable range (like 1-100) and avoids edge cases like negatives or huge numbers unless explicitly configured.
The constraint function you can pass to make_test (like constraint_set_length in the examples) lets you further restrict generated values.
So yes, Hypothesis automatically keeps sizes reasonable - no negatives or trillions! It's smart about testing typical cases efficiently.
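If you haven't seen Hypothesis before, here's an unrelated minimal sketch of the `@given` pattern (just the general mechanism, not the puzzle framework's actual strategy):
from hypothesis import given, strategies as st

@given(st.integers(min_value=1, max_value=50))
def test_double(n):
    assert n + n == 2 * n

test_double()  # Hypothesis calls this many times with generated values of n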
Want to try implementing the ones puzzle now, or shall we explore more of how the testing works?
def ones(i: int) -> TT[["i"]]:
return arange(i)*0+1
test_ones = make_test("one", ones, ones_spec, add_sizes=["i"])
Yes! Jeremy's solution is elegant:
- `arange(i)` → `[0, 1, 2, ...]`
- `* 0` → `[0, 0, 0, ...]`
- `+ 1` → `[1, 1, 1, ...]`
Yours works perfectly too, but Jeremy's avoids where entirely — just arithmetic. Both are valid one-liners using allowed operations!
Ready for puzzle 2?
Puzzle 2 - sum
Compute sum - the sum of a vector
None (or equivalently np.newaxis) inserts a new axis of size 1 at that position.
- `a` has shape `(3,)` — a 1D tensor with 3 elements
- `a[None]` has shape `(1, 3)` — a 2D tensor with 1 row and 3 columns
This is useful for broadcasting and matrix operations. In this case, it turns a vector into a row vector so you can do matrix multiplication with @.
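A quick shape check (assuming the notebook's tinygrad import):
from tinygrad import Tensor

a = Tensor([3, 1, 2])
print(a.shape)        # (3,)
print(a[None].shape)  # (1, 3) -- a row vector, ready for use with @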
Yes! From the rules in the README:
- You are allowed `@`, arithmetic, comparison, `shape`, any indexing (e.g. `a[:j]`, `a[:, None]`, `a[arange(10)]`), and previous puzzle functions
So you can absolutely use ones (and later sum, etc.) in subsequent puzzles. That's by design — the puzzles build on each other!
Looks like you've already figured out a nice approach for sum using the dot product with ones. 😊
def sum_spec(a, out):
out[0] = 0
for i in range(len(a)):
out[0] += a[i]
def sum(a: TT[["i"]]) -> TT[[1]]:
return a[None]@ones(a.shape[0])
test_sum = make_test("sum", sum, sum_spec, display_values=True)
Would these still work without the type annotations? Why is TT taking a list instead of the usual implicit tuple in __getitem__?
Would it work without type annotations?
The functions themselves would work fine. But the test framework wouldn't — it relies on those annotations to automatically generate tensors of the right shapes for testing. Without TT[["i"]], make_test wouldn't know what shape of tensor to create.
Why a list instead of a tuple?
It's a Python __getitem__ quirk:
TT["i"]→ passes the string"i"TT["i", "j"]→ passes the tuple("i", "j")TT[("i",)]→ also passes just("i",)— this would work, but looks odd
The problem is there's no clean way to pass a single-element tuple. You'd have to write TT["i",] (trailing comma) which is easy to miss or forget.
Using a list like TT[["i"]] makes it explicit and consistent:
TT[["i"]]→ 1D tensor with dimension named "i"TT[["i", "j"]]→ 2D tensor with dimensions "i" and "j"TT[["i"], dtypes.bool]→ 1D boolean tensor
It's a design choice to avoid the trailing-comma gotcha!
Nathan here: I don't understand Jeremy's question, and I definitely don't understand the answer that was provided. Can we unpack things a little bit more before proceeding?
- What is "these" referring to in the question?
- I don't know what Jeremy means by "taking a list."
- What is the "usual implicit tuple in `__getitem__`"?
Let me back up and clarify.
"These" refers to the function definitions — would def sum(a): ... work without the TT[["i"]] type hint?
"Taking a list" — look at TT[["i"]]. There's a list ["i"] inside the square brackets. Compare to normal indexing like a[0] where you pass an integer.
"Usual implicit tuple" — when you write a[1, 2], Python actually passes the tuple (1, 2) to __getitem__. You don't write a[(1, 2)] — the tuple is implicit.
Does that help clarify the question? Want me to show a quick example of how __getitem__ receives different types?
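A tiny probe class makes this visible (a sketch; `Probe` is made up, not from the puzzles):
class Probe:
    def __getitem__(self, key):
        print(type(key).__name__, key)

p = Probe()
p[0]           # int 0
p[1, 2]        # tuple (1, 2)  <- the implicit tuple
p["i"]         # str i
p[["i", "j"]]  # list ['i', 'j']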
Okay, I think that helps clarify. And just to be sure, __getitem__ is like the built-in helper method for indexing and slicing, right?
You're absolutely right! The __class_getitem__ method could easily check:
if isinstance(args, str):
args = (args,)
Then you could write:
TT["i"]→ 1DTT["i", "j"]→ 2DTT["i", dtypes.bool]→ 1D bool
Much cleaner. Looks like a small oversight in the library design. Would you want to patch it locally, or just live with the lists for now?
Nathan here: Jeremy is being a bit nit-picky with how the library works. I guess he doesn't like the list input and he wants it to default as a tuple, so you don't have to put the extra square brackets. Is that right?
{
"id": "toolu_01AKEmebuqeW5BEWoQjkR7LJ",
"call": {
"function": "getsource",
"arguments": {
"sym_nm": "TT"
}
},
"result": "'class TinyType(Tensor):\\n def __class_getitem__(cls, args: Tuple[Any, ...]):\\n if not isinstanc<TRUNCATED>"
}
Easy fix! Change:
if isinstance(arg, list) and ...
to:
if isinstance(arg, (list, str)) and ...
Then handle the string case:
if isinstance(arg, str):
info["shape"] = (arg,)
elif isinstance(arg, list) and ...
Want to patch it? Something like:
from typing import Annotated, Any, Tuple
from tinygrad import Tensor
from tinygrad.dtype import DType
class TT(Tensor):
def __class_getitem__(cls, args: Tuple[Any, ...]):
if not isinstance(args, tuple): args = (args,)
info = {}
for arg in args:
if isinstance(arg, str) and info.get("shape") is None:
info["shape"] = (arg,)
elif isinstance(arg, list) and info.get("shape") is None and all(isinstance(a, (str, int)) for a in arg):
info["shape"] = tuple(arg)
elif isinstance(arg, DType):
info["dtype"] = arg
return Annotated[Tensor, info]
Then you can use TT["i"] directly!
class TinyType(Tensor):
def __class_getitem__(cls, args: Tuple[Any, ...]):
if not isinstance(args, tuple):
args = (args,)
info = {}
for arg in args:
if (
isinstance(arg, list)
and info.get("shape") is None
and all(isinstance(a, (str, int)) for a in arg)
):
info["shape"] = tuple(arg)
elif isinstance(arg, DType):
info["dtype"] = arg
return Annotated[Tensor, info]
File: ~/SIWC/Cohort 2/tinygrad-tensor-puzzles/lib.py
Nathan here again: can you explain why the changes work? I don't understand too much of this "underneath the hood" Python and am trying to learn more bit by bit.
Sure! Let's break it down step by step.
The original code only checks for lists:
if isinstance(arg, list) and ...
So when you write TT["i"], the string "i" gets passed to the method, but the code doesn't know what to do with a bare string — it only handles lists like ["i"].
The fix adds a check for strings:
if isinstance(arg, str) and info.get("shape") is None:
info["shape"] = (arg,)
This says: "If someone passes a single string, wrap it in a tuple and use that as the shape."
So now:
TT["i"]→argis"i"→info["shape"] = ("i",)✓TT[["i", "j"]]→argis["i", "j"]→info["shape"] = ("i", "j")✓ (existing behavior)
Does that make sense? Want me to explain info.get("shape") or any other part?
Note: Some earlier messages were removed from context due to length — originally 107 messages, now 106.
class TT(Tensor):
def __class_getitem__(cls, args: Tuple):
if not isinstance(args, tuple): args = (args,)
info = {}
shape_parts = []
for arg in args:
if isinstance(arg, (str, int)): shape_parts.append(arg)
elif isinstance(arg, list) and all(isinstance(a, (str, int)) for a in arg): shape_parts.extend(arg)
elif isinstance(arg, DType): info["dtype"] = arg
if shape_parts: info["shape"] = tuple(shape_parts)
return Annotated[Tensor, info]
Nice! That's much cleaner. All four cases work:
TT["i"]→ single dimension ✓TT["i", "j"]→ multiple dimensions ✓TT[["i", "j"]]→ list still works (backward compatible) ✓TT["i", dtypes.bool]→ dimension + dtype ✓
Your sanity is preserved. Onwards with the puzzles?
Nathan again: it looks like Jeremy made even further changes to what was recommended. Any insights as to why?
Good eye! Jeremy's version is more flexible. Key differences:
- Uses `shape_parts = []` to accumulate dimensions incrementally
- `extend` vs separate handling — if you pass a list, it extends; if you pass strings/ints directly, it appends each one
This means TT["i", "j"] works naturally: loop through args, append "i", then append "j", then build the tuple at the end. My version would've needed more special-casing.
It's a cleaner, more unified approach.
Note: The dialog is getting long — some earlier messages have been removed from context. Originally 112 messages, now 111.
Can you elaborate on these differences. I don't understand how shape_parts helps and I don't know what extend is. I don't see how this is an improvement yet.
shape_parts as an accumulator:
Instead of trying to set info["shape"] in one go, we collect pieces into a list first, then convert to tuple at the end. This lets us handle multiple args one at a time.
extend vs append:
- `append` adds one item: `[1].append(2)` → `[1, 2]`
- `extend` adds all items from another list: `[1].extend([2, 3])` → `[1, 2, 3]`
Why it matters:
When you write TT["i", "j"], Python passes args = ("i", "j"). The loop goes:
- See `"i"` (a string) → append → `shape_parts = ["i"]`
- See `"j"` (a string) → append → `shape_parts = ["i", "j"]`
When you write TT[["i", "j"]], Python passes args = (["i", "j"],). The loop goes:
- See `["i", "j"]` (a list) → extend → `shape_parts = ["i", "j"]`
Same result, different input formats — both work with one unified loop.
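A quick check that both spellings land on the same annotation (assuming Jeremy's `TT` class from the cell above is defined):
print(TT["i", "j"])    # Annotated[Tensor, {'shape': ('i', 'j')}]
print(TT[["i", "j"]])  # same result from the list form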
Does that help clarify?
Note: Some earlier messages were removed from context due to length — originally 113, now 112.
Puzzle 3 - outer
Compute outer - the outer product of two vectors
Is there an outer in the docs?
Looking at the tinygrad API reference provided earlier — no, there's no outer function listed.
But remember, the puzzle rules say you can only use @, arithmetic, comparison, shape, indexing, arange, and where. So even if it existed, you couldn't use it!
The trick here is broadcasting. Think about what shapes a and b need to be so that when you multiply them, you get a 2D result where out[i,j] = a[i] * b[j].
What shapes would you need to broadcast a (3,) and a (2,) into a (3, 2) result?
Nice work! You found both approaches:
- `a[:,None] @ b[None]` — matrix multiplication of `(3,1)` by `(1,2)`
- `a[:,None] * b[None]` — element-wise multiply with broadcasting
Both give the same result for outer product! The second one (element-wise) is more "pure" broadcasting since it doesn't rely on @. Either would be a valid puzzle solution.
Ready to write up the outer function and test it?
def outer_spec(a, b, out):
for i in range(len(out)):
for j in range(len(out[0])):
out[i][j] = a[i] * b[j]
def outer(a: TT["i"], b: TT["j"]) -> TT["i", "j"]:
return b[None] * a[:,None]
test_outer = make_test("outer", outer, outer_spec, display_values=True)
Puzzle 4 - diag
Compute diag - the diagonal vector of a square matrix
def diag_spec(a, out):
for i in range(len(a)):
out[i] = a[i][i]
def diag(a: TT[["i", "i"]]) -> TT[["i"]]:
return a[arange(a.shape[0]), arange(a.shape[0])]
test_diag = make_test("diag", diag, diag_spec)
Can you check that my solutions have been compliant with the rules so far? Especially with the lines of code?
All your solutions are compliant! Let me check each:
Puzzle 1 - ones: return arange(i)*0+1
- ✅ One line, ~20 chars
- ✅ Uses: `arange`, arithmetic
Puzzle 2 - sum: return a[None]@ones(a.shape[0])
Puzzle 3 - outer: return b[None] * a[:,None]
- ✅ One line, ~25 chars
- ✅ Uses: indexing, arithmetic
Puzzle 4 - diag: return a[arange(a.shape[0]), arange(a.shape[0])]
All within rules and under 80 columns. Nice work!
Puzzle 5 - eye
Compute eye - the identity matrix
Note to SolveIt: Nathan took a two week break and is now back at this. Can you get him up to speed again?
Welcome back Nathan! Quick recap:
What you're doing: Tinygrad tensor puzzles — implementing tensor operations in one line using only basic building blocks (no loops).
Rules: Only @, arithmetic, comparison, shape, indexing, arange, where, and your previous solutions.
Your toolkit so far:
- `arange(i)` → `[0, 1, 2, ..., i-1]`
- `where(q, a, b)` → pick from `a` where true, `b` where false
- `ones(i)` → `[1, 1, 1, ...]`
- `sum(a)` → sum of vector
- `outer(a, b)` → outer product
- `diag(a)` → diagonal of square matrix
Puzzle 5 - eye: Create an identity matrix of size i × i (1s on diagonal, 0s elsewhere).
Any of this feel fuzzy, or ready to dive in?
def eye_spec(out):
for i in range(len(out)):
out[i][i] = 1
def eye(j: int) -> TT[["j", "j"]]:
return where(arange(j) == arange(j)[:, None], 1, 0)
test_eye = make_test("eye", eye, eye_spec, add_sizes=["j"])
Puzzle 6 - triu
Compute triu - the upper triangular matrix
outer(arange(3) < 1,arange(3) == 0) + outer(arange(3) < 3-1, arange(3) > 0) + outer(arange(3) > 3-2,arange(3) > 3-2)
where(outer(arange(3) < 1,arange(3) == 0) + outer(arange(3) < 3-1, arange(3) > 0) + outer(arange(3) > 3-2,arange(3) > 3-2), ss, 0)
n = 3
where(outer(arange(n) == 0, arange(n) == 0) + outer(arange(n) < n - 1, arange(n) > 0) + outer(arange(n) == n-1, arange(n) == n-1), ss, 0)
Think back to how you solved eye. You compared row indices to column indices with ==.
What relationship between row index and column index defines the upper triangle?
def triu_spec(out):
for i in range(len(out)):
for j in range(len(out)):
if i <= j:
out[i][j] = 1
else:
out[i][j] = 0
def triu(j: int) -> TT[["j", "j"]]:
return where(arange(s.shape[1]) < arange(s.shape[0])[:, None] + j, 0, s)
test_triu = make_test("triu", triu, triu_spec, add_sizes=["j"])
This is weird, how do I reference the matrix at hand? j is the diagonal argument, so what should the matrix be?
Look at the spec more carefully — triu_spec(out) only takes one argument. It's not operating on an input matrix; it's creating a triangular pattern of 1s and 0s.
So triu(j) should return a j × j matrix with 1s in the upper triangle, 0s below — like a mask, similar to how eye returns a pattern.
Your exploration with s was good for building intuition, but the actual puzzle just needs the pattern, not a matrix transformation.
def triu_spec(out):
for i in range(len(out)):
for j in range(len(out)):
if i <= j:
out[i][j] = 1
else:
out[i][j] = 0
def triu(j: int) -> TT[["j", "j"]]:
return where(arange(j) < arange(j)[:, None], 0, 1)
test_triu = make_test("triu", triu, triu_spec, add_sizes=["j"])
Hmm, I feel like a dummy. It was so obvious when you gave me the hint. Also I didn't understand the problem at all until the very end! Man, I feel like a cheater....
Don't be hard on yourself! Two things happened:
1. The problem statement was misleading — you reasonably assumed `triu` transforms an existing matrix (like NumPy's version does). The puzzle's version just creates a mask. That's a spec-reading issue, not a skill issue.
2. You actually solved the hard part — comparing row vs column indices with broadcasting. That's the core insight. Once you understood what was being asked, you got it immediately.
Misreading a problem is normal. The important thing is you now deeply understand the row/column index comparison pattern — you'll recognize it instantly next time.
Ready for puzzle 7?
Puzzle 7 - cumsum
Compute cumsum - the cumulative sum
def cumsum_spec(a, out):
total = 0
for i in range(len(out)):
out[i] = total + a[i]
total += a[i]
def cumsum(a: TT[["i"]]) -> TT[["i"]]:
return ones(a.shape[0])@outer(a, triu(a.shape[0]))[0]
test_cumsum = make_test("cumsum", cumsum, cumsum_spec, display_values=True)
You're doing great! That solution is valid.
And you solved it completely independently — no hints needed. That's a big step up from the earlier puzzles. You're clearly getting the hang of thinking in terms of broadcasting and building on previous solutions.
The [0] indexing to unwrap the extra dimension from outer is a nice practical fix too.
Ready for puzzle 8?
Puzzle 8 - diff
Compute diff - the running difference
def diff_spec(a, out):
for i in range(1, len(a)):
out[i - 1] = a[i] - a[i - 1]
def diff(a: TT[["i"]]) -> TT[["i - 1"]]:
return a[1:] - a[:a.shape[0]-1]
test_diff = make_test("diff", diff, diff_spec, display_values=True)
Yep, completely valid!
- ✅ One line, ~30 chars
- ✅ Uses only: slicing, arithmetic, `shape`
Two puzzles in a row with no hints — you're cruising now! Puzzle 9?
Puzzle 9 - vstack
Compute vstack - the matrix of two vectors
def vstack_spec(a, b, out):
for i in range(len(out[0])):
out[0][i] = a[i]
out[1][i] = b[i]
def vstack(a: TT[["i"]], b: TT[["i"]]) -> TT[[2, "i"]]:
return outer(arange(2)==0, a) + outer(arange(2)==1, b)
test_vstack = make_test("vstack", vstack, vstack_spec)
Puzzle 10 - roll
Compute roll - the vector shifted 1 circular position
def roll_spec(a, out):
for i in range(len(out)):
if i + 1 < len(out):
out[i] = a[i + 1]
else:
out[i] = a[i + 1 - len(out)]
def roll(a: TT[["i"]], i: int) -> TT[["i"]]:
return a[(arange(a.shape[0]) - i) % a.shape[0]]
test_roll = make_test("roll", roll, roll_spec, add_sizes=["i"], display_values=True)
Look at the spec closely — it always shifts by exactly 1 position. There's no roll amount parameter.
But your function signature has i: int which conflicts with "i" being the dimension name, and add_sizes=["i"] passes the array size as i.
The puzzle wants a fixed roll-by-1 function, not a variable roll amount. Try removing the i parameter and hardcoding the shift to 1.
I don't understand then what i is supposed to be then? Seems like we are working exclusively with 1D tensors again.
In the type hint TT[["i"]], the "i" is just naming the dimension — it means "this tensor has some length called i."
Looking at the spec, the function signature should be def roll(a) — just one input tensor, no roll amount. The shift is always 1.
So remove i: int from your parameters and remove add_sizes=["i"] from the test. You can still use a.shape[0] to get the tensor length.
def roll_spec(a, out):
for i in range(len(out)):
if i + 1 < len(out):
out[i] = a[i + 1]
else:
out[i] = a[i + 1 - len(out)]
def roll(a: TT[["i"]], i: int) -> TT[["i"]]:
return a[(arange(a.shape[0]) + 1) % a.shape[0]]
test_roll = make_test("roll", roll, roll_spec, add_sizes=["i"], display_values=True)
read_url(url=https://github.com/obadakhalili/tinygrad-tensor-pu…)
{
"id": "toolu_01RNxEhxBsYaGQEEjDsxtqqH",
"call": {
"function": "read_url",
"arguments": {
"url": "https://github.com/obadakhalili/tinygrad-tensor-puzzles"
}
},
"result": "'# obadakhalili/tinygrad-tensor-puzzles\\nSolve puzzles to improve your tinygrad skills!\\n\\n## / Files\\n- .gitignore\\n- LICENSE\\n- Pipfile\\n- Pipfile.lock\\n- README.md\\n- \ud83d\udcc1 assets\\n- pyproject.toml\\n- requirements.txt\\n- \ud83d\udcc1 src\\n\\n## README\\n# tinygrad Tensor Puzzles\\n\\n> This fork is an adaptation of [Sasha\\'s Tensor Puzzles](https://github.com/srush/Tensor-Puzzles) in tinygrad.\\n\\n## Motivation\\n\\nWhen learning a tensor library like [tinygrad](https://docs.tinygrad.org/) or Numpy it\\nis tempting to rely on the standard library (or more honestly\\nStackOverflow) to find a magic function for ever<TRUNCATED>"
}
Rule 2 says: "You are allowed @, arithmetic, comparison, shape, any indexing..."
The % (modulo) operator is an arithmetic operation, so yes — you're in the clear!
Puzzle 11 - flip
Compute flip - the reversed vector
def flip_spec(a, out):
for i in range(len(out)):
out[i] = a[len(out) - i - 1]
def flip(a: TT[["i"]], i: int) -> TT[["i"]]:
return a[(arange(a.shape[0]) - a.shape[0] + 1)*-1]
test_flip = make_test("flip", flip, flip_spec, add_sizes=["i"])
Puzzle 12 - compress
Compute compress - keep only masked entries (left-aligned)
def compress_spec(g, v, out):
j = 0
for i in range(len(g)):
if g[i]:
out[j] = v[i]
j += 1
def compress(g: TT[["i"], dtypes.bool], v: TT[["i"]], i: int) -> TT[["i"]]:
raise NotImplementedError
test_compress = make_test(
"compress", compress, compress_spec, add_sizes=["i"], display_values=True
)
Looking at the spec:
- `g` is a boolean mask (True/False for each position)
- `v` is a vector of values
- Output: take values from `v` where `g` is True, and pack them to the left
Example: if g = [True, False, True, False] and v = [5, 6, 7, 8], the output would be [5, 7, 0, 0] — the "kept" values (5 and 7) are moved to the front.
Does that help clarify the goal?
Let me make it concrete:
g = [True, False, True, False]
v = [5, 6, 7, 8 ]
The output should be [5, 7, 0, 0]. So:
- `v[0]=5` (first True) → goes to output slot 0
- `v[2]=7` (second True) → goes to output slot 1
cumsum(g) - 1 = [0, 0, 1, 1] tells you each element's "destination" if it were kept. Index 0 would go to slot 0, index 2 would go to slot 1.
Does that help clarify?
def compress_spec(g, v, out):
j = 0
for i in range(len(g)):
if g[i]:
out[j] = v[i]
j += 1
def compress(g: TT[["i"], dtypes.bool], v: TT[["i"]], i: int) -> TT[["i"]]:
return where(g,v,0)@((cumsum(g)-1)[:,None] == arange(g.shape[0]))
test_compress = make_test(
"compress", compress, compress_spec, add_sizes=["i"], display_values=True
)
That one was definitely the hardest so far! But you actually did a lot yourself:
- You figured out `where(g, v, 0)` to zero out the non-kept values
- You realized you needed a permutation matrix
- You explored `cumsum(g)` as a way to track positions
The tricky insight was using cumsum(g)-1 as destination indices and comparing with arange to build the matrix. That's a classic but non-obvious pattern.
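A worked trace of that pattern, mirroring the solution above (assuming the notebook's `Tensor`, `arange`, `where`, and `cumsum` are in scope):
from tinygrad import Tensor

g = Tensor([True, False, True, False])
v = Tensor([5, 6, 7, 8])
dest = cumsum(g) - 1                # [0, 0, 1, 1]: the slot each element would land in
route = dest[:, None] == arange(4)  # row i has a single True in column dest[i]
print((where(g, v, 0) @ route).tolist())  # kept values 5 and 7, packed left: [5, 7, 0, 0]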
You're 12 puzzles in — only 9 more to go!
Puzzle 13 - pad_to
Compute pad_to - eliminate or add 0s to change size of vector
def pad_to_spec(a, out):
for i in range(min(len(out), len(a))):
out[i] = a[i]
def pad_to(a: TT[["i"]], j: int) -> TT[["j"]]:
raise NotImplementedError
test_pad_to = make_test(
"pad_to", pad_to, pad_to_spec, add_sizes=["j"], display_values=True
)
Looking at the spec, it's not about removing 0s — it's about resizing the vector to length j:
- If `j > len(a)`: extend with zeros on the right
- If `j < len(a)`: truncate to the first `j` elements
- If `j == len(a)`: keep as-is
So pad_to([6,8,7], 5) → [6,8,7,0,0] and pad_to([6,8,7,14,9], 2) → [6,8].
Does that clarify the goal?
I got an error...can you clarify the variable names. I feel like you are mixing things up here.
j = 5
where(arange(j) < x.shape[0], x[arange(j) % x.shape[0]], x@eye(j)[:x.shape[0],:])[:j]
def pad_to_spec(a, out):
for i in range(min(len(out), len(a))):
out[i] = a[i]
def pad_to(a: TT[["i"]], j: int) -> TT[["j"]]:
return where(arange(j) < a.shape[0], a[arange(j) % a.shape[0]], 0)
test_pad_to = make_test(
"pad_to", pad_to, pad_to_spec, add_sizes=["j"], display_values=True
)
You did well! The hint pointed you toward % for safe indexing, but you:
- Figured out how to combine it with `where` to zero out padded positions
- Tested both cases (padding and truncating) yourself
- Arrived at a clean unified solution
Solution check:
13 down, 8 to go!
Puzzle 14 - sequence_mask
Compute sequence_mask - pad out to length per batch
def sequence_mask_spec(values, length, out):
for i in range(len(out)):
for j in range(len(out[0])):
if j < length[i]:
out[i][j] = values[i][j]
else:
out[i][j] = 0
def sequence_mask(values: TT[["i", "j"]], length: TT[["i"], dtypes.int]) -> TT[["i", "j"]]: # fmt: skip
raise NotImplementedError
def constraint_set_length(d):
d["length"] = d["length"] % d["values"].shape[1]
return d
test_sequence = make_test(
"sequence_mask",
sequence_mask,
sequence_mask_spec,
constraint=constraint_set_length,
display_values=True,
)
Looking at the spec:
- `values` is a 2D matrix (i rows × j columns)
- `length` is a 1D vector with one value per row
- For each row, keep the first `length[i]` elements, zero out the rest
Example: if values = [[5,3,2], [1,4,6]] and length = [2, 1]:
- Row 0: keep first 2 → `[5, 3, 0]`
- Row 1: keep first 1 → `[1, 0, 0]`
It's like a per-row pad_to where each row has its own cutoff length.
It's been a couple weeks since I looked at this, can you remind me what all of the functions I've built do and how they work briefly before I tackle this latest puzzle?
Sure! Here's your toolkit:
| Function | What it does | Key insight |
|---|---|---|
| `arange(i)` | `[0,1,2,...,i-1]` | Base building block |
| `where(q,a,b)` | Pick `a` if true, `b` if false | Vectorized if-statement |
| `ones(i)` | `[1,1,1,...]` | `arange(i)*0+1` |
| `sum(a)` | Sum all elements | `a[None] @ ones(n)` |
| `outer(a,b)` | 2D grid of `a[i]*b[j]` | `a[:,None] * b[None]` |
| `diag(a)` | Diagonal of matrix | Index with `arange` twice |
| `eye(j)` | Identity matrix | `where(row == col, 1, 0)` |
| `triu(j)` | Upper triangle mask | `where(row <= col, 1, 0)` |
| `cumsum(a)` | Running totals | Matrix multiply with `triu` |
| `diff(a)` | Differences `a[i+1]-a[i]` | `a[1:] - a[:-1]` |
| `vstack(a,b)` | Stack two vectors as rows | `outer` with boolean masks |
| `roll(a)` | Circular shift by 1 | `(arange + 1) % n` indexing |
| `flip(a)` | Reverse order | Index with reversed `arange` |
| `compress(g,v)` | Keep masked values, left-align | `cumsum` for destinations, matrix multiply |
| `pad_to(a,j)` | Resize to length `j` | `where` + `% n` for safe indexing |
Is there a hint you can give me to shorten my solution that doesn't give everything away?
Think about how outer(a, ones(n)) and outer(ones(n), b) relate to broadcasting with [:, None] and [None, :]. You might not need outer at all here...
def sequence_mask_spec(values, length, out):
for i in range(len(out)):
for j in range(len(out[0])):
if j < length[i]:
out[i][j] = values[i][j]
else:
out[i][j] = 0
def sequence_mask(values: TT[["i", "j"]], length: TT[["i"], dtypes.int]) -> TT[["i", "j"]]: # fmt: skip
return where(arange(values.shape[1])[None,:] >= length[:,None], 0, values)
def constraint_set_length(d):
d["length"] = d["length"] % d["values"].shape[1]
return d
test_sequence = make_test(
"sequence_mask",
sequence_mask,
sequence_mask_spec,
constraint=constraint_set_length,
display_values=True,
)
Any tips to improve my memory on using None indexing? It seems very powerful but I never default to it.
A mental model that might help:
[:, None] = "stretch this into a column"
[None, :] = "stretch this into a row"
When you need two 1D arrays to interact and produce a 2D result, ask yourself: "Which one should vary down the rows, and which across the columns?"
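A quick shape check of the two spellings (assuming the notebook's tinygrad import):
from tinygrad import Tensor

a = Tensor([1, 2, 3])
print(a[:, None].shape)  # (3, 1) -- a column, varies down the rows
print(a[None, :].shape)  # (1, 3) -- a row, varies across the columns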
Practice prompt: given a = [1,2,3] and b = [10,20], how would you create this grid?
[[10, 20],
[20, 40],
[30, 60]]
Puzzle 15 - bincount
Compute bincount - count number of times an entry was seen
def bincount_spec(a, out):
for i in range(len(a)):
out[a[i]] += 1
def bincount(a: TT[["i"]], j: int) -> TT[["j"]]:
raise NotImplementedError
def constraint_set_max(d):
d["a"] = d["a"] % d["return"].shape[0]
return d
test_bincount = make_test(
"bincount",
bincount,
bincount_spec,
add_sizes=["j"],
constraint=constraint_set_max,
display_values=True,
)
Can you give me an example to experiment with? For some reason, I'm not understanding the links and how they align with the puzzles. They seem to differ a lot...
def bincount_spec(a, out):
for i in range(len(a)):
out[a[i]] += 1
def bincount(a: TT[["i"]], j: int) -> TT[["j"]]:
return ones(a.shape[0])@(a[:,None] == arange(j))
def constraint_set_max(d):
d["a"] = d["a"] % d["return"].shape[0]
return d
test_bincount = make_test(
"bincount",
bincount,
bincount_spec,
add_sizes=["j"],
constraint=constraint_set_max,
display_values=True,
)
Awesome, how did I do? Did I miss any obvious tricks I should have learned along the way yet?
Excellent work — you solved it completely on your own!
No missed tricks — you nailed the key insight: "counting = summing booleans." And you used [:,None] broadcasting naturally without prompting. That's real progress!
15 down, 6 to go!
Puzzle 16 - scatter_add
Compute scatter_add - add together values that link to the same location
def scatter_add_spec(values, index, out):
for j in range(len(values)):
out[index[j]] += values[j]
def scatter_add(value: TT[["i"]], index: TT[["i"]], j: int) -> TT[["j"]]:
raise NotImplementedError
def constraint_set_max(d):
d["index"] = d["index"] % d["return"].shape[0]
return d
test_scatter_add = make_test(
"scatter_add",
scatter_add,
scatter_add_spec,
add_sizes=["j"],
constraint=constraint_set_max,
display_values=True,
)
Again, I don't understand their implementation and need an example. I don't understand what j represents here.
Sure! Say:
- `values = [10, 20, 30, 40]`
- `index = [0, 2, 0, 1]`
- `j = 3` (output size)
Each value goes to the slot specified by its index:
- Slot 0 gets: `values[0] + values[2]` = 10 + 30 = 40
- Slot 1 gets: `values[3]` = 40
- Slot 2 gets: `values[1]` = 20
Output: [40, 40, 20]
So j is the size of the output — how many "buckets" to scatter into. It's like bincount, but instead of counting, you're summing values.
Isn't j just the maximum value in index + 1? I guess it's pulled out to make the puzzle feasible?
Exactly right! In practice you'd often derive it from max(index) + 1, but since max isn't in our allowed operations, they provide j directly. The constraint function ensures index values stay within [0, j-1].
Does the problem make sense now? It's very similar to bincount — just weighting by values instead of counting 1s.
def scatter_add_spec(values, index, out):
for j in range(len(values)):
out[index[j]] += values[j]
def scatter_add(value: TT[["i"]], index: TT[["i"]], j: int) -> TT[["j"]]:
return value@(arange(j) == index[:,None])
def constraint_set_max(d):
d["index"] = d["index"] % d["return"].shape[0]
return d
test_scatter_add = make_test(
"scatter_add",
scatter_add,
scatter_add_spec,
add_sizes=["j"],
constraint=constraint_set_max,
display_values=True,
)
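A quick sanity check contrasting the two related puzzles, reusing the worked example from above (assuming the notebook's `Tensor` plus your `bincount` and `scatter_add` are defined):
from tinygrad import Tensor

idx = Tensor([0, 2, 0, 1])
vals = Tensor([10, 20, 30, 40])
print(bincount(idx, 3).tolist())           # [2, 1, 1] -- how many times each slot is hit
print(scatter_add(vals, idx, 3).tolist())  # [40, 40, 20] -- the values summed into each slot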
Puzzle 17 - flatten
Compute flatten
def flatten_spec(a, out):
k = 0
for i in range(len(a)):
for j in range(len(a[0])):
out[k] = a[i][j]
k += 1
def flatten(a: TT[["i", "j"]]) -> TT[["i * j"]]:
raise NotImplementedError
test_flatten = make_test("flatten", flatten, flatten_spec)
Welcome back Nathan! Here's where you are:
Puzzle 17 - flatten: Turn a 2D matrix into a 1D vector, row by row. E.g. [[1,2,3],[4,5,6]] → [1,2,3,4,5,6].
Your progress: You solved puzzles 1–16 and had some rough ideas for this one before you left:
- Multiply by a "special matrix" to isolate values, then sum
- Pad rows with zeros so they can be summed
- You noted `pad_to` only works for 1D
Your toolkit has 18 functions built up from the earlier puzzles — want me to list them again, or do you remember enough to dive in?
| Function | Does |
|---|---|
| `arange(i)` | `[0,1,...,i-1]` |
| `where(q,a,b)` | if-else per element |
| `ones(i)` | all 1s vector |
| `sum(a)` | sum vector |
| `outer(a,b)` | 2D grid `a[i]*b[j]` |
| `diag(a)` | diagonal of matrix |
| `eye(j)` | identity matrix |
| `triu(j)` | upper triangle mask |
| `cumsum(a)` | running totals |
| `diff(a)` | consecutive differences |
| `vstack(a,b)` | stack two rows |
| `roll(a)` | circular shift by 1 |
| `flip(a)` | reverse vector |
| `compress(g,v)` | keep masked, left-align |
| `pad_to(a,j)` | resize to length `j` |
| `sequence_mask(v,l)` | per-row length cutoff |
| `bincount(a,j)` | count occurrences |
| `scatter_add(v,i,j)` | sum values by index |
Ready to tackle flatten?
def flatten_spec(a, out):
k = 0
for i in range(len(a)):
for j in range(len(a[0])):
out[k] = a[i][j]
k += 1
def flatten(a: TT[["i", "j"]]) -> TT[["i * j"]]:
return a[arange(a.shape[0]*a.shape[1]) // a.shape[1], arange(a.shape[0]*a.shape[1]) % a.shape[1]]
test_flatten = make_test("flatten", flatten, flatten_spec)
Honestly, it's a gray area. The rules restrict tensor operations (no view, sum, tensor, etc.), but := is just Python syntax for avoiding repetition. It doesn't add any new tensor capability.
I'd say it's fair game — but if it feels like cheating, you could also try a completely different approach that doesn't need the long arange expression twice. Want to try the walrus route, or explore an alternative?
I've never seen the walrus operator in Python ever, so I'm going to say it's out-of-bounds and will need to find another way.
Fair enough! Here's a different angle: Python has a built-in function called divmod that returns both the quotient and remainder at once as a tuple. So divmod(10, 3) gives (3, 1).
If you could use that with your arange, you'd only need to compute it once... and tinygrad might accept a tuple as an index.
Try it and see what happens!
The rules are pretty clear. We aren't allowed anything else.
How about this: here's the link to the solutions. Have a peek just so that you are positioned to help me for this question. I haven't looked at these and don't intend to until I've completed all the puzzles. So don't leak these to me!
read_url(url='https://raw.githubusercontent.com/obadakhalili/tin…', as_md=False)
{
"id": "toolu_014Q34CECTjELchkjLW9LPCC",
"call": {
"function": "read_url",
"arguments": {
"url": "https://raw.githubusercontent.com/obadakhalili/tinygrad-tensor-puzzles/refs/heads/main/src/solutions.py",
"as_md": "False"
}
},
"result": "'from lib import TinyType as TT\\nfrom tinygrad import Tensor, dtypes\\n\\n\\n# TODO: tinygrad should support these operations\\ndef _fd(a: Tensor, b: Tensor) -> Tensor:\\n \"\"\"floor division\"\"\"\\n return (a / b).floor().cast(dtypes.int)\\n\\n\\ndef _m(a: Tensor, b: Tensor) -> Tensor:\\n \"\"\"modulus\"\"\"\\n return a - _fd(a, b) * b\\n\\n\\ndef arange(i: int):\\n \"Use this function to replace a for-loop.\"\\n return Tensor(list(range(i)))\\n\\n\\ndef where(q, a, b):\\n \"Use this function to replace an if-statement.\"\\n return q * a + q.logical_not() * b\\n\\n\\ndef ones(i: int) -> TT[[\"i\"]]:\\n return (arange(i) >= 0) * 1\\n\\n\\ndef sum(a: TT[[\"i\"]]) -> TT[[1]]:\\n # `[:, None]` is necessary to keep the shape of the output tensor.\\n return a @ ones(a.shape[0])[:, None]\\n\\n\\ndef outer(a: TT[[\"i\"]], b: TT[[\"j\"]]) -> TT[[\"i\", \"j\"]]:\\n return a[:, None] * b\\n\\n\\ndef diag(a: TT[[\"i\", \"i\"]]) -> TT[[\"i\"]]:\\n return a[arange(a.shape[0]), arange(a.shape[0])]\\n\\n\\ndef eye(j: int) -> TT[[\"j\", \"j\"]]:\\n return (arange(<TRUNCATED>"
}
Got it, I've looked at the solutions and won't leak anything.
Your approach is fundamentally correct. Here's a hint that stays within basic Python: you can put two statements on one line using a semicolon. For example:
x=5; print(x+1)
That's technically one line of code.
def flatten_spec(a, out):
k = 0
for i in range(len(a)):
for j in range(len(a[0])):
out[k] = a[i][j]
k += 1
def flatten(a: TT[["i", "j"]]) -> TT[["i * j"]]:
return a[(r := arange(a.shape[0]*a.shape[1])) // a.shape[1], r % a.shape[1]]
test_flatten = make_test("flatten", flatten, flatten_spec)
Sorry, that was bad advice! You can't use a semicolon after return like that — Python treats everything after return as the expression to return.
I think the practical options are:
- The walrus operator `:=` (which you've ruled out)
- Define a short helper function outside the puzzle function to avoid repeating the long expression
Would you be open to defining a small utility function before flatten?
OK, looks like the puzzle creator did use that operator. I guess I'm the buffoon! I count this as a win, but it was long and slow and hard, and I needed significant help.
You're not a buffoon! The hard part was the conceptual insight — mapping a flat index to row/column with // and %. That was 100% you. The walrus operator is just a Python syntax trick for character count — it has nothing to do with understanding tensors or broadcasting.
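In case you ever want to read code that uses it, here's a minimal plain-Python sketch of what := does (vals and n are made-up names for the example):
vals = [3, 1, 4, 1, 5]
print((n := len(vals)), n * n)  # := binds n inside the expression, so n is reusable right away -> prints 5 25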
Also, this was genuinely one of the trickier puzzles. Flattening without reshape or view is not obvious!
- ✅ One line, ~75 chars
- ✅ Correct and passing
17 down, 4 to go!
Puzzle 18 - linspace
Compute linspace
- You are allowed @, arithmetic, comparison, shape, any indexing (e.g. a[:j], a[:, None], a[arange(10)]), and previous puzzle functions
- Functions I have so far: arange, where, ones, sum, outer, diag, eye, triu, cumsum, diff, vstack, roll, flip, compress, pad_to, sequence_mask, bincount, scatter_add, flatten
def linspace_spec(i, j, out):
for k in range(len(out)):
out[k] = float(i + (j - i) * k / max(1, len(out) - 1))
def linspace(i: TT[[1]], j: TT[[1]], n: int) -> TT[["n"], dtypes.float]:
raise NotImplementedError
test_linspace = make_test(
"linspace", linspace, linspace_spec, add_sizes=["n"], display_values=True
)
/tmp/ipykernel_693/2026381450.py:3: DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)
out[k] = float(i + (j - i) * k / max(1, len(out) - 1))
def linspace_spec(i, j, out):
for k in range(len(out)):
out[k] = float(i + (j - i) * k / max(1, len(out) - 1))
def linspace(i: TT[[1]], j: TT[[1]], n: int) -> TT[["n"], dtypes.float]:
return ones(n)+(i-1)+arange(n)*((j-i)/(n-1-(n-2)*(n<=1)))
test_linspace = make_test(
"linspace", linspace, linspace_spec, add_sizes=["n"], display_values=True
)
/tmp/ipykernel_693/4092299550.py:3: DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)
out[k] = float(i + (j - i) * k / max(1, len(out) - 1))
/tmp/ipykernel_693/4092299550.py:3: DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)
out[k] = float(i + (j - i) * k / max(1, len(out) - 1))
/tmp/ipykernel_693/4092299550.py:3: DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)
out[k] = float(i + (j - i) * k / max(1, len(out) - 1))
/tmp/ipykernel_693/4092299550.py:3: DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)
out[k] = float(i + (j - i) * k / max(1, len(out) - 1))
/tmp/ipykernel_693/4092299550.py:3: DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)
out[k] = float(i + (j - i) * k / max(1, len(out) - 1))
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
Cell In[321], line 1
----> 1 run_test(test_linspace)
File ~/SIWC/Cohort 2/tinygrad-tensor-puzzles/lib.py:246, in run_test(fn)
245 def run_test(fn):
--> 246 fn()
247 # Generate a random puppy video if you are correct.
248 print("Correct!")
File ~/SIWC/Cohort 2/tinygrad-tensor-puzzles/lib.py:224, in make_test.<locals>.test_problem()
220 diagram = draw_examples(name, examples, display_values)
221 display(SVG(diagram._repr_svg_()))
223 @given(_spec(problem))
--> 224 @settings(deadline=None)
225 def test_problem(d):
226 d, sizes = d
227 d = constraint(d)
[... skipping hidden 1 frame]
File ~/SIWC/Cohort 2/tinygrad-tensor-puzzles/lib.py:238, in make_test.<locals>.test_problem(d)
234 out = Tensor(out)
235 out2 = problem(
236 *map(lambda v: v if isinstance(v, int) else Tensor(v), d.values())
237 )._broadcast_to(out.shape)
--> 238 assert np.allclose(out.numpy(), out2.numpy()), (
239 "Two tensors are not equal\n Spec: \n\t%s \n\t%s" % (out.numpy(), out2.numpy())
240 )
AssertionError: Two tensors are not equal
Spec:
[1. 0.6666667 0.33333334 0. ]
[ 1.0000000e+00 6.6666663e-01 3.3333331e-01 -2.9802322e-08]
Falsifying example: test_problem(
d=({'i': array([1], dtype=int32),
'j': array([0], dtype=int32),
'return': array([0., 0., 0., 0.], dtype=float32)},
{'n': 4}),
)
Explanation:
These lines were always and only run by failing examples:
/app/data/SIWC/Cohort 2/tinygrad-tensor-puzzles/lib.py:239
/usr/local/lib/python3.12/site-packages/numpy/_core/arrayprint.py:1009
/usr/local/lib/python3.12/site-packages/numpy/_core/arrayprint.py:1010
/usr/local/lib/python3.12/site-packages/numpy/_core/arrayprint.py:1015
/usr/local/lib/python3.12/site-packages/numpy/_core/arrayprint.py:1016
(and 2 more with settings.verbosity >= verbose)
Yes, it's a floating point precision issue. The last value should be exactly 0 but you're getting -2.98e-08.
Your formula ones(n) + (i-1) + ... splits up what the spec computes in one step. Try rewriting it to more closely match the spec's formula: i + (j - i) * k / ... — that should reduce the accumulated rounding error.
The issue is order of operations. Your formula divides first, then multiplies: arange(n) * ((j-i) / denom). This creates a lossy intermediate (-1/3 = -0.333... in float32), then multiplies (3 * -0.333... ≠ -1.0 exactly).
Try: multiply first, then divide. That way the integer multiplication is exact before the division introduces any rounding.
I'm sorry, I don't quite understand how exactly I need to change my order of operations here. Can you clarify?
The issue is in your original formula:
arange(n) * ((j-i) / denom)
This computes (j-i)/denom first — a single float like -0.33333334. Then multiplying 3 * -0.33333334 doesn't give exactly -1.0 in float32.
What I mean by "multiply first" is:
(j-i) * arange(n) / denom
Here (j-i) * arange(n) happens first, giving exact integers like [0, -1, -2, -3], and then you divide once.
You did try this with a+(b-a)*arange(n) / 3, but tinygrad's compiler might be converting x/3 into x * 0.333... internally, which reintroduces the same problem. Try making the denominator a tensor instead of a scalar — that might prevent the optimization.
The issue might be in your formula structure. You're using ones(n) + (i-1) + ... which is three separate additions — each one can accumulate rounding in float32.
Try simplifying: write it as a single expression that matches the spec exactly: i + (j - i) * arange(n) / denom. Just two operations after the initial i +.
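As a general illustration of why order matters (plain Python float64 here, not the exact float32 behavior inside tinygrad), the same mathematical sum can round differently depending on grouping:
print((0.1 + 0.2) + 0.3)  # 0.6000000000000001
print(0.1 + (0.2 + 0.3))  # 0.6
# Each intermediate result is rounded, so the grouping of operations changes the final bits.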
def linspace_spec(i, j, out):
for k in range(len(out)):
out[k] = float(i + (j - i) * k / max(1, len(out) - 1))
def linspace(i: TT[[1]], j: TT[[1]], n: int) -> TT[["n"], dtypes.float]:
return i+(j-i)*1.0*arange(n)/(n-1-(n-2)*(n<=1))
test_linspace = make_test(
"linspace", linspace, linspace_spec, add_sizes=["n"], display_values=True
)
/tmp/ipykernel_693/2405488447.py:3: DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)
out[k] = float(i + (j - i) * k / max(1, len(out) - 1))
/tmp/ipykernel_693/2405488447.py:3: DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)
out[k] = float(i + (j - i) * k / max(1, len(out) - 1))
/tmp/ipykernel_693/2405488447.py:3: DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)
out[k] = float(i + (j - i) * k / max(1, len(out) - 1))
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
Cell In[331], line 1
----> 1 run_test(test_linspace)
File ~/SIWC/Cohort 2/tinygrad-tensor-puzzles/lib.py:246, in run_test(fn)
245 def run_test(fn):
--> 246 fn()
247 # Generate a random puppy video if you are correct.
248 print("Correct!")
File ~/SIWC/Cohort 2/tinygrad-tensor-puzzles/lib.py:224, in make_test.<locals>.test_problem()
220 diagram = draw_examples(name, examples, display_values)
221 display(SVG(diagram._repr_svg_()))
223 @given(_spec(problem))
--> 224 @settings(deadline=None)
225 def test_problem(d):
226 d, sizes = d
227 d = constraint(d)
[... skipping hidden 1 frame]
File ~/SIWC/Cohort 2/tinygrad-tensor-puzzles/lib.py:238, in make_test.<locals>.test_problem(d)
234 out = Tensor(out)
235 out2 = problem(
236 *map(lambda v: v if isinstance(v, int) else Tensor(v), d.values())
237 )._broadcast_to(out.shape)
--> 238 assert np.allclose(out.numpy(), out2.numpy()), (
239 "Two tensors are not equal\n Spec: \n\t%s \n\t%s" % (out.numpy(), out2.numpy())
240 )
AssertionError: Two tensors are not equal
Spec:
[1. 0.6666667 0.33333334 0. ]
[ 1.0000000e+00 6.6666663e-01 3.3333331e-01 -2.9802322e-08]
Falsifying example: test_problem(
d=({'i': array([1], dtype=int32),
'j': array([0], dtype=int32),
'return': array([0., 0., 0., 0.], dtype=float32)},
{'n': 4}),
)
No, your theory was wrong. Maybe take a peek at the solutions again to give me a hint without spoiling.
read_url(url='https://raw.githubusercontent.com/obadakhalili/tin…', as_md=False)
{
"id": "toolu_01VyM5gijxJE6FsCBKBzrgez",
"call": {
"function": "read_url",
"arguments": {
"url": "https://raw.githubusercontent.com/obadakhalili/tinygrad-tensor-puzzles/refs/heads/main/src/solutions.py",
"as_md": "False"
}
},
"result": "'from lib import TinyType as TT\\nfrom tinygrad import Tensor, dtypes\\n\\n\\n# TODO: tinygrad should support these operations\\ndef _fd(a: Tensor, b: Tensor) -> Tensor:\\n \"\"\"floor division\"\"\"\\n return (a / b).floor().cast(dtypes.int)\\n\\n\\ndef _m(a: Tensor, b: Tensor) -> Tensor:\\n \"\"\"modulus\"\"\"\\n return a - _fd(a, b) * b\\n\\n\\ndef arange(i: int):\\n \"Use this function to replace a for-loop.\"\\n return Tensor(list(range(i)))\\n\\n\\ndef where(q, a, b):\\n \"Use this function to replace an if-statement.\"\\n return q * a + q.logical_not() * b\\n\\n\\ndef ones(i: int) -> TT[[\"i\"]]:\\n return (arange(i) >= 0) * 1\\n\\n\\ndef sum(a: TT[[\"i\"]]) -> TT[[1]]:\\n # `[:, None]` is necessary to keep the shape of the output tensor.\\n return a @ ones(a.shape[0])[:, None]\\n\\n\\ndef outer(a: TT[[\"i\"]], b: TT[[\"j\"]]) -> TT[[\"i\", \"j\"]]:\\n return a[:, None] * b\\n\\n\\ndef diag(a: TT[[\"i\", \"i\"]]) -> TT[[\"i\"]]:\\n return a[arange(a.shape[0]), arange(a.shape[0])]\\n\\n\\ndef eye(j: int) -> TT[[\"j\", \"j\"]]:\\n return (arange(<TRUNCATED>"
}
OK, I see the key difference. The hint is: notice that i and j are tensors, not plain Python ints. When you do arange(n) (which produces integers) divided by an integer, you get integer division.
Try multiplying arange(n) by 1.0 to cast it to float before dividing. Also, you can use Python's built-in max for the denominator since n is a plain int.
Sorry for the bad advice. Let me think differently about this.
The precision problem is that i + (j-i)*1.0 doesn't land back exactly on j in float32.
Think about an alternative formula: instead of expressing everything relative to i, what if you expressed each point as a weighted blend of i and j? At position 0 it's 100% i, at the last position it's 100% j, and in between it's a mix of both.
Perhaps I should just think of it as % of j to include given the floating point issues when I do 1 minus?
It's not about adding a specifically — it's that the float32 error from (-1/3) * 3 gives -0.9999999... instead of -1.0, so 1 + (-0.9999999...) ≠ 0.
Try the blend idea differently: instead of computing a weight and subtracting from 1, can you compute both weights using only integer arithmetic, and divide just once at the very end?
I don't know what you're talking about. I compute a weight, multiply it by arange to scale and add the starting point.
You're right, sorry for being unclear. Let me explain concretely.
Your current approach: start at i, compute a step size, multiply by position. The problem is that float32 errors in the step size accumulate.
The blend idea: for each position, ask "how much i and how much j should I mix?" At position 0 it's all i. At the last position it's all j. In the middle, it's a weighted mix of both.
So for n=4, the weights for j would be [0/3, 1/3, 2/3, 3/3] and the weights for i would be [3/3, 2/3, 1/3, 0/3].
The key: at the last position, i is multiplied by exactly 0 and j by exactly 1 — so you get j with no rounding error.
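Here's a plain-Python sketch of that idea for n=4 (illustrative only; a tensor version will look different):
n, i, j = 4, 1.0, 0.0
wj = [k / (n - 1) for k in range(n)]   # weights for j: [0.0, 1/3, 2/3, 1.0]
wi = list(reversed(wj))                # weights for i: [1.0, 2/3, 1/3, 0.0]
print([a * i + b * j for a, b in zip(wi, wj)])  # first entry is exactly i, last is exactly j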
Does that make sense as a direction to try?
def linspace_spec(i, j, out):
for k in range(len(out)):
out[k] = float(i + (j - i) * k / max(1, len(out) - 1))
def linspace(i: TT[[1]], j: TT[[1]], n: int) -> TT[["n"], dtypes.float]:
return i*flip(w := arange(n)/(n-1-(n-2)*(n<=1)), 1) + j*w
test_linspace = make_test(
"linspace", linspace, linspace_spec, add_sizes=["n"], display_values=True
)
/tmp/ipykernel_693/3615584502.py:3: DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)
out[k] = float(i + (j - i) * k / max(1, len(out) - 1))
/tmp/ipykernel_693/3615584502.py:3: DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)
out[k] = float(i + (j - i) * k / max(1, len(out) - 1))
/tmp/ipykernel_693/3615584502.py:3: DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)
out[k] = float(i + (j - i) * k / max(1, len(out) - 1))
/tmp/ipykernel_693/3615584502.py:3: DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)
out[k] = float(i + (j - i) * k / max(1, len(out) - 1))
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
Cell In[346], line 1
----> 1 run_test(test_linspace)
File ~/SIWC/Cohort 2/tinygrad-tensor-puzzles/lib.py:246, in run_test(fn)
245 def run_test(fn):
--> 246 fn()
247 # Generate a random puppy video if you are correct.
248 print("Correct!")
File ~/SIWC/Cohort 2/tinygrad-tensor-puzzles/lib.py:224, in make_test.<locals>.test_problem()
220 diagram = draw_examples(name, examples, display_values)
221 display(SVG(diagram._repr_svg_()))
223 @given(_spec(problem))
--> 224 @settings(deadline=None)
225 def test_problem(d):
226 d, sizes = d
227 d = constraint(d)
[... skipping hidden 1 frame]
File ~/SIWC/Cohort 2/tinygrad-tensor-puzzles/lib.py:238, in make_test.<locals>.test_problem(d)
234 out = Tensor(out)
235 out2 = problem(
236 *map(lambda v: v if isinstance(v, int) else Tensor(v), d.values())
237 )._broadcast_to(out.shape)
--> 238 assert np.allclose(out.numpy(), out2.numpy()), (
239 "Two tensors are not equal\n Spec: \n\t%s \n\t%s" % (out.numpy(), out2.numpy())
240 )
AssertionError: Two tensors are not equal
Spec:
[1.]
[0.]
Falsifying example: test_problem(
d=({'i': array([1], dtype=int32),
'j': array([0], dtype=int32),
'return': array([0.], dtype=float32)},
{'n': 1}),
)
Explanation:
These lines were always and only run by failing examples:
/app/data/SIWC/Cohort 2/tinygrad-tensor-puzzles/lib.py:239
/usr/local/lib/python3.12/site-packages/numpy/_core/arrayprint.py:1009
/usr/local/lib/python3.12/site-packages/numpy/_core/arrayprint.py:1010
/usr/local/lib/python3.12/site-packages/numpy/_core/arrayprint.py:1015
/usr/local/lib/python3.12/site-packages/numpy/_core/arrayprint.py:1016
(and 2 more with settings.verbosity >= verbose)
Looking at the spec: when n=1, out[0] = float(i + (j-i) * 0 / max(1, 0)) = float(i). So yes, the output should be [i].
The problem is your weights: when n=1, arange(1) = [0] and flip of that is also [0] — so both weights are zero!
You could use where to handle the n==1 case separately. Something like: if n==1, return just i, otherwise return your blend formula.
a=1
b=0
n=1
where(ones(n) * (n==1), a, a*flip(w := arange(n)/(n-1-(n-2)*(n<=1)), 1) + b*w)
The rule is <80 columns for the full line. With the indentation and the return keyword included, your line comes to about 85 characters, which is over the limit. You'll need to trim about 5-6 characters.
Some quick wins come from removing spaces around operators. n==1 is already tight, but ones(n) * (n==1) → ones(n)*(n==1) saves 2 characters. Can you find a few more?
def linspace_spec(i, j, out):
for k in range(len(out)):
out[k] = float(i + (j - i) * k / max(1, len(out) - 1))
def linspace(i: TT[[1]], j: TT[[1]], n: int) -> TT[["n"], dtypes.float]:
return where(ones(n)*(n==1),i,i*flip(w:=arange(n)/(n-1-(n-2)*(n<=1)),1)+j*w)
test_linspace = make_test(
"linspace", linspace, linspace_spec, add_sizes=["n"], display_values=True
)
/tmp/ipykernel_693/3821619209.py:3: DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)
out[k] = float(i + (j - i) * k / max(1, len(out) - 1))
Correct!
/tmp/ipykernel_693/3821619209.py:3: DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)
out[k] = float(i + (j - i) * k / max(1, len(out) - 1))
That was a tough one! Float32 precision, edge cases with n=1, character limits, walrus operator — you had to juggle a lot. The weighted blend trick is a genuinely useful numerical technique beyond just puzzles.
- ✅ One line, 76 chars
- ✅ Handles n=1 edge case
- ✅ No float32 precision issues
18 down, 3 to go!
Puzzle 19 - heaviside
Compute heaviside
- You are allowed @, arithmetic, comparison, shape, any indexing (e.g. a[:j], a[:, None], a[arange(10)]), and previous puzzle functions
- Functions I have so far: arange, where, ones, sum, outer, diag, eye, triu, cumsum, diff, vstack, roll, flip, compress, pad_to, sequence_mask, bincount, scatter_add, flatten, linspace
def heaviside_spec(a, b, out):
for k in range(len(out)):
if a[k] == 0:
out[k] = b[k]
else:
out[k] = int(a[k] > 0)
def heaviside(a: TT[["i"]], b: TT[["i"]]) -> TT[["i"]]:
raise NotImplementedError
test_heaviside = make_test("heaviside", heaviside, heaviside_spec, display_values=True)
def heaviside_spec(a, b, out):
for k in range(len(out)):
if a[k] == 0:
out[k] = b[k]
else:
out[k] = int(a[k] > 0)
def heaviside(a: TT[["i"]], b: TT[["i"]]) -> TT[["i"]]:
return where(a == 0, b, where( a>= 0, 1, 0))
test_heaviside = make_test("heaviside", heaviside, heaviside_spec, display_values=True)
Puzzle 20 - repeat (1d)
Compute repeat
- You are allowed @, arithmetic, comparison, shape, any indexing (e.g. a[:j], a[:, None], a[arange(10)]), and previous puzzle functions
- Functions I have so far: arange, where, ones, sum, outer, diag, eye, triu, cumsum, diff, vstack, roll, flip, compress, pad_to, sequence_mask, bincount, scatter_add, flatten, linspace, heaviside
def repeat_spec(a, d, out):
for i in range(d[0]):
for k in range(len(a)):
out[i][k] = a[k]
def constraint_set(d):
d["d"][0] = d["return"].shape[0]
return d
def repeat(a: TT[["i"]], d: TT[[1]]) -> TT[["d", "i"]]:
raise NotImplementedError
test_repeat = make_test("repeat", repeat, repeat_spec, constraint=constraint_set)
d is a tensor of shape (1,), not a Python int. Functions like arange and ones need a plain Python int because they call range() internally.
You'll need to extract the integer from d first. Try d[0].item() — but wait, .item() might not be in the allowed operations...
What does d.shape look like? And can you think of a way to build the repeated rows without needing to pass d as a size to ones?
- * = element-wise multiplication (with broadcasting). Each element multiplies the corresponding element.
- @ = matrix multiplication. Rows of the left multiply columns of the right and sum.
Quick example:
[1,2,3] * [4,5,6] = [4, 10, 18] # element-wise
[1,2,3] @ [4,5,6] = 32 # dot product (1*4 + 2*5 + 3*6)
@ between two 1D tensors gives a scalar tensor (0-dimensional), not a Python int. So you still can't pass it directly to arange or ones.
But you might not need to! You already found outer(ones(3), a) gives you 3 repeated rows. The challenge is just getting 3 from d...
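For reference, a hand-worked picture of why outer(ones(3), a) gives repeated rows (a = [4, 5] is a made-up example):
# outer(ones(3), a) is ones(3)[:, None] * a, which broadcasts to:
#   [[1*4, 1*5],
#    [1*4, 1*5],
#    [1*4, 1*5]]   -> three copies of a stacked as rows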
What does ones(a.shape[0]) @ a[None, :] look like for different sized inputs?
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[365], line 1
----> 1 ones(a.shape[0]) @ a[None, :]
File ~/.local/lib/python3.12/site-packages/tinygrad/tensor.py:4432, in _metadata_wrapper.<locals>._wrapper(*args, **kwargs)
4429 else: caller = ""
4431 token = _METADATA.set(Metadata(name=fn.__name__, caller=caller))
-> 4432 ret = fn(*args, **kwargs)
4433 _METADATA.set(token)
4434 return ret
File ~/.local/lib/python3.12/site-packages/tinygrad/tensor.py:3763, in Tensor.__matmul__(self, x)
-> 3763 def __matmul__(self, x) -> Tensor: return self.matmul(x)
File ~/.local/lib/python3.12/site-packages/tinygrad/tensor.py:4407, in _metadata_wrapper.<locals>._wrapper(*args, **kwargs)
4406 def _wrapper(*args: P.args, **kwargs: P.kwargs) -> T:
-> 4407 if _METADATA.get() is not None: return fn(*args, **kwargs)
4409 if TRACEMETA >= 2:
4410 caller_frame = sys._getframe(frame := 1)
File ~/.local/lib/python3.12/site-packages/tinygrad/tensor.py:2592, in Tensor.matmul(self, x, reverse, dtype)
2579 def matmul(self, x:Tensor, reverse=False, dtype:DTypeLike|None=None) -> Tensor:
2580 """
2581 Performs matrix multiplication between two tensors.
2582
(...) 2590 ```
2591 """
-> 2592 return x.dot(self, dtype=dtype) if reverse else self.dot(x, dtype=dtype)
File ~/.local/lib/python3.12/site-packages/tinygrad/tensor.py:4407, in _metadata_wrapper.<locals>._wrapper(*args, **kwargs)
4406 def _wrapper(*args: P.args, **kwargs: P.kwargs) -> T:
-> 4407 if _METADATA.get() is not None: return fn(*args, **kwargs)
4409 if TRACEMETA >= 2:
4410 caller_frame = sys._getframe(frame := 1)
File ~/.local/lib/python3.12/site-packages/tinygrad/tensor.py:2574, in Tensor.dot(self, w, dtype)
2572 x, dx, dw = self, self.ndim, w.ndim
2573 if not (dx > 0 and dw > 0): raise RuntimeError(f"both tensors need to be at least 1D, got {dx}D and {dw}D")
-> 2574 if x.shape[-1] != w.shape[axis_w:=-min(w.ndim,2)]: raise RuntimeError(f"cannot dot {x.shape} and {w.shape}")
2575 x = x.reshape(*x.shape[0:-1], *[1]*min(dx-1, dw-1, 1), x.shape[-1])
2576 w = w.reshape(*w.shape[0:-2], *[1]*min(dx-1, dw-1, 1), *w.shape[axis_w:]).transpose(-1, axis_w)
RuntimeError: cannot dot (4,) and (1, 4)
I don't understand why you brought that up if it has an error. I understand I'm trying to extract the number 3 from the single tensor containing 3.
Sorry, that was a bad suggestion! You're right — you just need to get a usable integer from d.
You already showed that t(3)[0] displays as 3. Try passing d[0] directly to ones and see if it accepts it:
ones(d[0])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[368], line 2
1 d = t(3)
----> 2 ones(d[0])
Cell In[52], line 2, in ones(i)
1 def ones(i: int) -> TT[["i"]]:
----> 2 return arange(i)*0+1
File ~/SIWC/Cohort 2/tinygrad-tensor-puzzles/solutions.py:18, in arange(i)
16 def arange(i: int):
17 "Use this function to replace a for-loop."
---> 18 return Tensor(list(range(i)))
TypeError: 'Tensor' object cannot be interpreted as an integer
Can you confirm with the solutions if that is allowed without spoiling the answers for me.
read_url(url='https://raw.githubusercontent.com/obadakhalili/tin…', as_md=False)
{
"id": "toolu_01XezrehmxPXqo8Ueee57Yrj",
"call": {
"function": "read_url",
"arguments": {
"url": "https://raw.githubusercontent.com/obadakhalili/tinygrad-tensor-puzzles/refs/heads/main/src/solutions.py",
"as_md": "False"
}
},
"result": "'from lib import TinyType as TT\\nfrom tinygrad import Tensor, dtypes\\n\\n\\n# TODO: tinygrad should support these operations\\ndef _fd(a: Tensor, b: Tensor) -> Tensor:\\n \"\"\"floor division\"\"\"\\n return (a / b).floor().cast(dtypes.int)\\n\\n\\ndef _m(a: Tensor, b: Tensor) -> Tensor:\\n \"\"\"modulus\"\"\"\\n return a - _fd(a, b) * b\\n\\n\\ndef arange(i: int):\\n \"Use this function to replace a for-loop.\"\\n return Tensor(list(range(i)))\\n\\n\\ndef where(q, a, b):\\n \"Use this function to replace an if-statement.\"\\n return q * a + q.logical_not() * b\\n\\n\\ndef ones(i: int) -> TT[[\"i\"]]:\\n return (arange(i) >= 0) * 1\\n\\n\\ndef sum(a: TT[[\"i\"]]) -> TT[[1]]:\\n # `[:, None]` is necessary to keep the shape of the output tensor.\\n return a @ ones(a.shape[0])[:, None]\\n\\n\\ndef outer(a: TT[[\"i\"]], b: TT[[\"j\"]]) -> TT[[\"i\", \"j\"]]:\\n return a[:, None] * b\\n\\n\\ndef diag(a: TT[[\"i\", \"i\"]]) -> TT[[\"i\"]]:\\n return a[arange(a.shape[0]), arange(a.shape[0])]\\n\\n\\ndef eye(j: int) -> TT[[\"j\", \"j\"]]:\\n return (arange(<TRUNCATED>"
}
Yes, the solution does extract the integer from d — it uses .numpy() to convert. So d[0].numpy() would give you a Python number you can pass to ones.
def repeat_spec(a, d, out):
for i in range(d[0]):
for k in range(len(a)):
out[i][k] = a[k]
def constraint_set(d):
d["d"][0] = d["return"].shape[0]
return d
def repeat(a: TT[["i"]], d: TT[[1]]) -> TT[["d", "i"]]:
return outer(ones(d[0].numpy()), a)
test_repeat = make_test("repeat", repeat, repeat_spec, constraint=constraint_set)
OK thanks. I feel like I cheated with you telling me to use numpy but I did figure out the raw solution myself.
Puzzle 21 - bucketize
Compute bucketize
- You are allowed @, arithmetic, comparison, shape, any indexing (e.g. a[:j], a[:, None], a[arange(10)]), and previous puzzle functions
- Functions I have so far: arange, where, ones, sum, outer, diag, eye, triu, cumsum, diff, vstack, roll, flip, compress, pad_to, sequence_mask, bincount, scatter_add, flatten, linspace, heaviside, repeat
def bucketize_spec(v, boundaries, out):
for i, val in enumerate(v):
out[i] = 0
for j in range(len(boundaries) - 1):
if val >= boundaries[j]:
out[i] = j + 1
if val >= boundaries[-1]:
out[i] = len(boundaries)
def constraint_set(d):
d["boundaries"] = np.abs(d["boundaries"]).cumsum()
return d
def bucketize(v: TT[["i"]], boundaries: TT[["j"]]) -> TT[["i"]]:
raise NotImplementedError
test_bucketize = make_test(
"bucketize", bucketize, bucketize_spec, constraint=constraint_set, display_values=True
)
Say boundaries = [2, 5, 10] and v = [1, 3, 7, 15]:
- 1: less than all boundaries → bucket 0
- 3: ≥ 2, but < 5 → bucket 1
- 7: ≥ 2, ≥ 5, but < 10 → bucket 2
- 15: ≥ all boundaries → bucket 3
Output: [0, 1, 2, 3]
Essentially: for each value, count how many boundaries it exceeds. The constraint ensures boundaries are sorted (via cumulative sum of absolute values).
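A quick plain-Python check of that counting idea (illustrative only, not a tensor solution):
v = [1, 3, 7, 15]
boundaries = [2, 5, 10]
print([sum(val >= b for b in boundaries) for val in v])  # [0, 1, 2, 3]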
def bucketize_spec(v, boundaries, out):
for i, val in enumerate(v):
out[i] = 0
for j in range(len(boundaries) - 1):
if val >= boundaries[j]:
out[i] = j + 1
if val >= boundaries[-1]:
out[i] = len(boundaries)
def constraint_set(d):
d["boundaries"] = np.abs(d["boundaries"]).cumsum()
return d
def bucketize(v: TT[["i"]], boundaries: TT[["j"]]) -> TT[["i"]]:
return ones(d:=boundaries.shape[0])@(repeat(v,d*ones(1))>=boundaries[:,None])
test_bucketize = make_test(
"bucketize", bucketize, bucketize_spec, constraint=constraint_set, display_values=True
)
Alright. Once I realized it counts boundaries exceeded, it was straightforward from there!
Congratulations Nathan — all 21 puzzles complete! 🎉
That last one was clean and independent — no hints needed. You saw "count how many boundaries exceeded" and immediately reached for broadcasting + summing booleans, which is the exact pattern from bincount.
Looking back at your journey: you started not knowing what a tensor puzzle was, and ended up fluently combining broadcasting, None indexing, matrix multiplication, and where to solve increasingly complex problems. That's real growth.
import inspect
fns = (
ones,
sum,
outer,
diag,
eye,
triu,
cumsum,
diff,
vstack,
roll,
flip,
compress,
pad_to,
sequence_mask,
bincount,
scatter_add,
flatten,
linspace,
heaviside,
repeat,
bucketize, # TODO: why does this throw an error
)
for fn in fns:
lines = [
l for l in inspect.getsource(fn).split("\n") if not l.strip().startswith("#")
]
if len(lines) > 3:
print(fn.__name__, len(lines[2]), "(more than 1 line)")
else:
print(fn.__name__, len(lines[1]))
Looks like we are programmatically extracting the return line for each function and counting characters. I'm under 80 for all! Are you sure I didn't break any rules?
All 21 solutions check out:
- ✅ All under 80 characters
- ✅ All single-line returns
- ✅ All use only allowed operations: @, arithmetic, comparison, shape, indexing, and previous puzzle functions
The only gray areas are .numpy() in repeat and := in flatten/linspace/bucketize, but the official solutions use those too — so you're in the clear.
You completed all 21 tinygrad tensor puzzles! 🎉
I've definitely grown more confident in my ability to work with broadcasting and tensor operations. What do you think I've done exceptionally well given my starting experience and what do you think I should be on the lookout for moving forward given what I may struggle with more than someone with my experience?
What you did exceptionally well:
- Concrete-first thinking. You almost always started by creating small examples and experimenting before generalizing. That's a habit many experienced programmers lack.
- Pattern recognition across puzzles. By bincount, you immediately saw "counting = summing booleans" and applied [:, None] broadcasting without prompting. scatter_add and bucketize followed naturally.
- Willingness to say "I don't get it." That takes courage and it's the fastest way to learn. You never pretended to understand something you didn't.
What to watch for going forward:
- Reading specs before coding. The triu puzzle cost you time because you assumed it transformed a matrix rather than creating a mask. Spending an extra minute with the spec can save 30 minutes of wrong-direction exploration.
- Numerical precision. The linspace float32 struggle was real. When you move into real ML work, this will bite you again — different formula orderings produce different rounding. Worth studying when you encounter it.
- Reaching for complex solutions first. On a few puzzles (like triu), you built elaborate constructions with outer before the simpler row-vs-column comparison clicked. When stuck, try asking: "can I express this as a simple relationship between indices?"
Which of these resonates most with you?
Review the official solutions and my solutions. What should I review from the official solutions to boost my understanding and fill gaps?
read_url(url='https://raw.githubusercontent.com/obadakhalili/tin…', as_md=False)
{
"id": "toolu_01AqmYxbvbUcS5iD8NRzWT7W",
"call": {
"function": "read_url",
"arguments": {
"url": "https://raw.githubusercontent.com/obadakhalili/tinygrad-tensor-puzzles/refs/heads/main/src/solutions.py",
"as_md": "False"
}
},
"result": "'from lib import TinyType as TT\\nfrom tinygrad import Tensor, dtypes\\n\\n\\n# TODO: tinygrad should support these operations\\ndef _fd(a: Tensor, b: Tensor) -> Tensor:\\n \"\"\"floor division\"\"\"\\n return (a / b).floor().cast(dtypes.int)\\n\\n\\ndef _m(a: Tensor, b: Tensor) -> Tensor:\\n \"\"\"modulus\"\"\"\\n return a - _fd(a, b) * b\\n\\n\\ndef arange(i: int):\\n \"Use this function to replace a for-loop.\"\\n return Tensor(list(range(i)))\\n\\n\\ndef where(q, a, b):\\n \"Use this function to replace an if-statement.\"\\n return q * a + q.logical_not() * b\\n\\n\\ndef ones(i: int) -> TT[[\"i\"]]:\\n return (arange(i) >= 0) * 1\\n\\n\\ndef sum(a: TT[[\"i\"]]) -> TT[[1]]:\\n # `[:, None]` is necessary to keep the shape of the output tensor.\\n return a @ ones(a.shape[0])[:, None]\\n\\n\\ndef outer(a: TT[[\"i\"]], b: TT[[\"j\"]]) -> TT[[\"i\", \"j\"]]:\\n return a[:, None] * b\\n\\n\\ndef diag(a: TT[[\"i\", \"i\"]]) -> TT[[\"i\"]]:\\n return a[arange(a.shape[0]), arange(a.shape[0])]\\n\\n\\ndef eye(j: int) -> TT[[\"j\", \"j\"]]:\\n return (arange(<TRUNCATED>"
}
Here are the key patterns where the official solutions reveal simpler or different approaches. I'll group them by theme:
1. Boolean arithmetic instead of where
You reached for where(condition, value, 0) often. The official solutions use condition * value instead, which is shorter and avoids where entirely. Compare your eye, triu, sequence_mask, and heaviside with the official versions. The heaviside one is especially elegant: (a > 0) + (a == 0) * b.
2. Your cumsum was overcomplicated
You wrote: ones(a.shape[0])@outer(a, triu(a.shape[0]))[0]
Official: a @ triu(a.shape[0])
Just a direct matrix multiply — no outer, no [0] unwrapping needed.
3. flip with [::-1]
You computed reversed indices arithmetically. The official just does a[:i:][::-1] — Python's reverse slice.
4. Direct broadcasting instead of outer/repeat
In repeat, bucketize, and several others, you called outer or repeat where simple [:,None] broadcasting would suffice. For example your bucketize vs the official:
- Yours: ones(d:=boundaries.shape[0])@(repeat(v,d*ones(1))>=boundaries[:,None])
- Official: (v[:, None] >= boundaries) @ ones(boundaries.shape[0])
Want me to go deeper on any of these?
Can we go over these one by one together where you give me some code snippets and I can just run them to see?
Let's start with Boolean arithmetic instead of where. Try running this:
a = t(-3, 0, 5)
# Your approach:
print("where:", where(a > 0, 1, 0))
# Simpler:
print("bool*1:", (a > 0) * 1)
The key insight: True * 1 = 1 and False * 1 = 0, so multiplying a boolean tensor by an integer gives you the same result as where(condition, value, 0) — but shorter.
a = t(-3, 0, 5)
# Your approach:
print("where:", where(a > 0, 1, 0))
# Simpler:
print("bool*1:", (a > 0) * 1)
Good point. That would have saved me some characters. Okay I think I understand that idea. Let's go to the next one.
2. Simpler cumsum. Try this:
a = t(1, 2, 3)
# Your approach:
print("yours:", ones(a.shape[0])@outer(a, triu(a.shape[0]))[0])
# Official:
print("simpler:", a @ triu(a.shape[0]))
Think about why a @ triu(n) works directly — what does multiplying a vector by the upper triangular matrix do?
a = t(1, 2, 3)
# Your approach:
print("yours:", ones(a.shape[0])@outer(a, triu(a.shape[0]))[0])
# Official:
print("simpler:", a @ triu(a.shape[0]))
This is kind of what I did with bucketize in the end, but I see now why it works directly. I could have shaved off a lot of extra characters here too.
Exactly — @ with triu already does the "sum up to this point" because each column of triu is a mask for which elements to include. No need for outer + ones + [0].
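Worked out by hand (assuming triu(3) is the 3×3 upper-triangular ones matrix, as described above):
# triu(3) = [[1, 1, 1],
#            [0, 1, 1],
#            [0, 0, 1]]
# [1, 2, 3] @ triu(3):
#   column 0 -> 1*1 + 2*0 + 3*0 = 1
#   column 1 -> 1*1 + 2*1 + 3*0 = 3
#   column 2 -> 1*1 + 2*1 + 3*1 = 6
# i.e. the cumulative sum [1, 3, 6]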
Ready for 3. flip with [::-1]?
a = t(5, 3, 8, 1)
# Your approach:
print("yours:", a[(arange(a.shape[0]) - a.shape[0] + 1)*-1])
# Official:
print("simpler:", a[:a.shape[0]:][::-1])
a = t(5, 3, 8, 1)
# Your approach:
print("yours:", a[(arange(a.shape[0]) - a.shape[0] + 1)*-1])
# Official:
print("simpler:", a[:a.shape[0]:][::-1])
Can you give me an overview of the double colon notation and all the tricks associated with it? I feel like I haven't mastered this and this is why it's a blind spot for me.
Python slice notation is a[start:stop:step]. Any of the three parts can be omitted:
- a[::1] — every element (default)
- a[::2] — every 2nd element
- a[::-1] — every element, reversed
Try these to build intuition:
x = t(10, 20, 30, 40, 50)
print("all: ", x[::])
print("every 2nd:", x[::2])
print("reversed: ", x[::-1])
print("first 3: ", x[:3:])
print("from idx 2:", x[2::])
x = t(10, 20, 30, 40, 50)
print("all: ", x[::])
print("every 2nd:", x[::2])
print("reversed: ", x[::-1])
print("first 3: ", x[:3:])
print("from idx 2:", x[2::])
The official solution for flip is a[:i:][::-1]. It seems a[::-1] works just fine though? What am I missing here and why do we need the extra [:i:]?
4. Direct broadcasting instead of outer/repeat
Compare your bucketize with the official approach:
v = t(1, 3, 7, 15)
b = t(2, 5, 10)
# Your approach:
print("yours:", ones(d:=b.shape[0])@(repeat(v,d*ones(1))>=b[:,None]))
# Simpler:
print("official:", (v[:, None] >= b) @ ones(b.shape[0]))
The insight: v[:, None] broadcasts against b directly — no need for repeat at all.
v = t(1, 3, 7, 15)
b = t(2, 5, 10)
# Your approach:
print("yours:", ones(d:=b.shape[0])@(repeat(v,d*ones(1))>=b[:,None]))
# Simpler:
print("official:", (v[:, None] >= b) @ ones(b.shape[0]))
Exactly. The takeaway: whenever you find yourself reaching for repeat or outer just to make shapes match, ask yourself if [:, None] or [None, :] on one of the inputs would let broadcasting do the work for free.
That covers the main gaps between your solutions and the official ones. To summarize:
- Boolean × value instead of where(cond, value, 0)
- Direct @ instead of wrapping with outer + ones
- [::-1] for reversing
- [:, None] broadcasting instead of repeat/outer
All four boil down to the same principle: trust broadcasting to handle shape alignment, and keep operations minimal. Nice work completing all 21 puzzles!
I guess the one big takeaway I should remember is to try to incorporate broadcasting more frequently when I'm working with vectors. To ask myself, or to ask an AI, "Hey, is there a way I can incorporate broadcasting to make my solution more efficient and effective?"