Hello readers, and welcome to my new blog.
Inheritance (of OO) is a tricky business. When creating a sub-class, nothing clear is said about the purpose of this new sub-class, or the relation between it and its parent. One is forced to guess, based on how the sub-class is written and used. This has been well said before, so I will not repeat it. But if it is not clear to humans, why should it be clear to a compiler?
Allow me to be concrete, by telling you my (sad) story. This all started as I was writing a tree class in C#. To simplify matters, suppose that this is how it looked:
class TreeNode
{
    public TreeData data { get; set; }
    public IEnumerable<TreeNode> subnodes();
    public void AddSubNode(TreeData data);
    /* ... A lot of useful methods for trees, such as bfs/dfs visitors ... */
}
This was very useful, but then I needed some new functionality. This functionality did not fit inside TreeNode (for reasons I will spare you, as you could probably make some up yourself), so I decided to create a new class, called BetterTreeNode.
So, suppose I wrote this code:
class BetterTreeNode : TreeNode { /* ... Additional functionality ... */ }
I was happy, until horror struck: the class was flawed. I will spare you the detective work (pause now if you want to guess it by yourself): whenever a method traversed the tree, the subnodes() method would yield TreeNode elements rather than BetterTreeNode ones. The same went for other methods. This required an explicit cast after every call to an inherited method. Also, methods such as AddSubNode were written to create a new TreeNode instance. In short, a mess. To solve it, I had the following options:
1. Re-declare every inherited method (hiding it with the new keyword) so that it returns the proper type. Rewrite AddSubNode (in the hope that it is possible).
2. Not inherit. Use TreeNode as a shadow tree. This means shadowing all the methods I want to use.
3. Rewrite everything from scratch. While the first two are annoying, this is a complete failure at code-reuse.
But none of these options seems good; none of them answers my need for easy and concise code reuse.
I felt the true solution would be to have the methods automatically rewritten to return BetterTreeNode. But here enters the ambiguity and obscurity of inheritance: perhaps for some methods I would not want it. The compiler cannot guess this.
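To make the problem concrete, here is a minimal sketch of it. The List-backed storage, the string payload (standing in for TreeData) and the Shout method are assumptions made for illustration only; they are not the real class.

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-in for the TreeNode described above.
class TreeNode
{
    private readonly List<TreeNode> children = new List<TreeNode>();

    public string Data { get; set; }

    public IEnumerable<TreeNode> Subnodes() { return children; }

    // Hard-codes TreeNode, so subclasses still get plain TreeNode children.
    public void AddSubNode(string data)
    {
        children.Add(new TreeNode { Data = data });
    }
}

class BetterTreeNode : TreeNode
{
    // Hypothetical extra functionality.
    public string Shout() { return Data.ToUpperInvariant() + "!"; }
}

static class CastDemo
{
    public static void Main()
    {
        var root = new BetterTreeNode { Data = "root" };
        root.AddSubNode("child");

        foreach (TreeNode node in root.Subnodes())
        {
            // The static type is TreeNode, so Shout() needs a cast, and
            // here the cast fails: the child really is a plain TreeNode.
            var better = node as BetterTreeNode;
            Console.WriteLine(better == null ? "plain TreeNode" : better.Shout());
        }
    }
}
```

Running this prints "plain TreeNode": the inherited AddSubNode never creates a BetterTreeNode, so even an explicit cast cannot help.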
This problem was demonstrated using C#, but it exists in every statically-typed object-oriented language I know (granted, not that many).
Therefore, I propose a new feature for these languages: a "thisclass" keyword.
So this is how I would write TreeNode:
class TreeNode
{
    public TreeData data { get; set; }
    public IEnumerable<thisclass> subnodes();
    public void AddSubNode(TreeData data) { ... new thisclass(...) ... }
    ...
}
I also have a suggestion: Take another look at code-reuse. Inheritance is a great way to re-use and extend code in classes, but perhaps it is not always the best way. Perhaps other ways should be sought after.
9 Comments
hi erez
gz on new blog!
why do you need to work with BetterTreeNode rather than TreeNode, if all functionality inside TreeNode should work for inherited classes anyway?
thanks!
Because BetterTreeNode has additional functionality I want to use. You can imagine it to be a new algorithm, or a new data attribute. And it couldn't fit inside TreeNode for many possible reasons, maybe because of decoupling, or because of its other subclasses who aren't supposed to have it, or maybe because TreeNode is a library class, unchangeable.
This code is just a metaphor for a more complicated one, where this problem still exists.
Hey, I think I found the solution to your problem, have a look at http://en.wikipedia.org/wiki/C%2B%2B0x under the header of "Type Determination", seems like just what you need!
You just have to wait till 2009 till the standard is published and then till 2011 till someone implements it, and then till 2015 till it will be implemented *well*. 🙂
Have you thought of using a named constructor/factory function? There could be a method named "CreateNew" in both TreeNode and BetterTreeNode that would function the same as "new thisclass", and the correct function would be called by polymorphism and virtual functions and the lot.
I'm not sure if C# allows this - how about typeof(this)?
Noam:
Yes, a CreateNew method would be the same "new thisclass", but would require a cast on the return value.
In C#, typeof doesn't accept instances, only types.
You can do this.GetType(), but you can't use expressions for "new", "typeof" and of course not for method declarations.
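A rough sketch of what this.GetType() does allow, at run time only. It uses Activator.CreateInstance and assumes every subclass has a public parameterless constructor; the CreateNew name is the one suggested above.

```csharp
using System;

class TreeNode
{
    // Run-time approximation of "new thisclass": instantiate whatever
    // type this object actually is. The declared type stays TreeNode.
    public TreeNode CreateNew()
    {
        return (TreeNode)Activator.CreateInstance(GetType());
    }
}

class BetterTreeNode : TreeNode { }

static class GetTypeDemo
{
    public static void Main()
    {
        TreeNode made = new BetterTreeNode().CreateNew();
        Console.WriteLine(made.GetType().Name); // prints "BetterTreeNode"

        // The compiler only knows TreeNode, so the caller still casts.
        BetterTreeNode better = (BetterTreeNode)made;
        Console.WriteLine(ReferenceEquals(better, made)); // prints "True"
    }
}
```

The object has the right type, but nothing about that is visible in the method signature, which is exactly the cast problem again.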
Forget C#, do you know of any static language which allows something like that?
Brodie:
Good input! C++0x's "decltype" seems like it's a solution. But its details are obscure and I'm not sure it would be allowed to be used for the "new" keyword or for declarations.
Ah. So what you want is covariant return types, that is, the ability to override a [public TreeNode CreateNew()] with [public BetterTreeNode CreateNew()]. C++ does this.
In C#... I don't know. Generics hackery, maybe? It is possible to implement a CreateNew hierarchy by having them both take themselves as template parameters, pardon my C++ terminology. That way CreateNew is declared to return T, TreeNode inherits TreeNodeTemplate<TreeNode> and BetterTreeNode inherits TreeNodeTemplate<BetterTreeNode>. But I'll be the first to argue against this method; it's ugly and it probably creates more problems than it solves, the way most ugly workarounds do.
No, it is not what I want. Covariant return types (which Java also has afaik) will solve the problem with the CreateNew method, but will not solve the bigger one.
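For reference, a small sketch of that distinction. It assumes a compiler with covariant return support (C# gained it in version 9, on .NET 5 or later); MakeSibling is a hypothetical inherited method invented for this example.

```csharp
using System;

class TreeNode
{
    public virtual TreeNode CreateNew() { return new TreeNode(); }

    // Inherited unchanged: its declared return type stays TreeNode.
    public TreeNode MakeSibling() { return CreateNew(); }
}

class BetterTreeNode : TreeNode
{
    // Covariant override: same method, narrower return type.
    public override BetterTreeNode CreateNew() { return new BetterTreeNode(); }
}

static class CovariantDemo
{
    public static void Main()
    {
        // The overridden method needs no cast...
        BetterTreeNode a = new BetterTreeNode().CreateNew();

        // ...but every method that was not overridden is still typed TreeNode.
        TreeNode b = new BetterTreeNode().MakeSibling();

        Console.WriteLine(a.GetType().Name); // prints "BetterTreeNode"
        Console.WriteLine(b.GetType().Name); // prints "BetterTreeNode"
    }
}
```

Both objects are BetterTreeNode instances at run time, yet only the explicitly overridden method says so in its signature. Every other inherited method would need its own override.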
The workaround you suggest is amusing (though I admit to having fallen for it too); trying to create TreeNode with itself as T results in an infinite recursion: TreeNode<TreeNode>. T is not optional. But even if it were solvable, as you said, it's ugly.
I meant:
public class TreeNodeTemplate<T> where T : new()
{
    public static T GetNew()
    {
        return new T();
    }
}
public class TreeNode : TreeNodeTemplate<TreeNode>
{
}
public class BetterTreeNode : TreeNodeTemplate<BetterTreeNode>
{
}
(also here, bloody brackets: http://rafb.net/p/vCcjeW43.html )
Which does work. This way you can use T whenever you'd have used "thisclass". I'm not sure if it'll work for something larger than a minimal example (the need of the new() constraint may cause some worries...)
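A hedged sketch of how this might scale to the tree from the post. The extra constraint T : TreeNodeTemplate<T>, the List-backed storage, the string payload and the Shout method are all assumptions added for illustration.

```csharp
using System;
using System.Collections.Generic;

// T plays the role of "thisclass": each concrete class passes itself as T.
class TreeNodeTemplate<T> where T : TreeNodeTemplate<T>, new()
{
    private readonly List<T> children = new List<T>();

    public string Data { get; set; }

    public IEnumerable<T> Subnodes() { return children; }

    public T AddSubNode(string data)
    {
        var child = new T();   // allowed by the new() constraint
        child.Data = data;     // allowed by the base-class constraint
        children.Add(child);
        return child;
    }
}

class TreeNode : TreeNodeTemplate<TreeNode> { }

class BetterTreeNode : TreeNodeTemplate<BetterTreeNode>
{
    // Hypothetical extra functionality.
    public string Shout() { return Data.ToUpperInvariant() + "!"; }
}

static class TemplateDemo
{
    public static void Main()
    {
        var root = new BetterTreeNode { Data = "root" };
        root.AddSubNode("child");

        // No cast: Subnodes() is typed IEnumerable<BetterTreeNode>.
        foreach (BetterTreeNode node in root.Subnodes())
        {
            Console.WriteLine(node.Shout()); // prints "CHILD!"
        }
    }
}
```

Note the cost: BetterTreeNode is no longer a subclass of TreeNode, so the two cannot be used interchangeably.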
Do you think the "thisclass" solution solves more than this? It seems rather problem-specific. Additions to languages should be generally useful. Moreover, I think thisclass as a solution to this problem works backwards. You'd normally write your code for a certain class C such that methods take and return C objects, you'll have IEnumerable<C>s everywhere, et cetera. When you need to extend that class, you have to change every appearance of C in the code to "thisclass", or else your derived classes will be flawed. A good solution would not require any change of code in the base class.
Oh, I get it now. It's clever.
But still doesn't address the big problem (unless I missed something?)
"thisclass" is as specific as "this", though perhaps not as useful in languages where types are not "high-order" (that is, objects themselves). Perhaps a more general solution would be to allow basic type expressions. That will allow not only typeof(this), but also typeof(base), and maybe some compile-time testing.
I don't see how that is a problem. The base class wouldn't have to change, because it would be written properly the first time. That is, every self-reference will use "thisclass" (or typeof(this)), unless it explicitly requires class C (which is pretty rare).