Putting parsing aside for a bit, I wish to rant about some issues with open-source which have been bothering me for a while now.
What drives a person to work on an open-source project is usually the strong wish to contribute back to the community and be part of it. It's somewhat of a moral decision. So very often, perhaps more often than not, a volunteer would decide that he wants to join a project before he even knows which project it'd be.
And so in selecting a project, some of the choices he faces are these:
- Should he join an old and stable project -- in which he will have to spend some time learning the existing code base, and will have to work very hard to be significant -- or should he join a new project -- in which everyone starts from scratch, and in which he can more easily leave his mark? Perhaps he should even create a new project where he'd have the strongest say?
- Should he join an infrastructure project which is meant for programmers, or should he join a GUI-based project that could appeal to anyone -- in which he has much higher chances of fame and recognition?
- Should he join an innovative project that has to convince its users that they need it, or should he join a project with proven demand (such as IM) -- which is more likely to be used and is easier to advertise?
There are many other choices an open-source contributor faces in selecting a project, but I've chosen these three because I think that the decisions commonly made for them, while understandable, are damaging to the world of open-source software.
This is a strong claim, I know, and so I'll spend the rest of this post trying to establish it.
Common Decisions
I cannot prove that the common decisions are what I claim they are; I have no statistics to use. It is just my impression from my own experience with open-source.
Without proof, I can only try and persuade you by using examples as evidence. While presenting these examples I'll try to explain why I think they are so damaging to open-source.
Effects - Example 1: Attack of the Clones
Clones are possibly the most common pattern in open-source. For every purpose, need and audience there are many projects, and most of them are not different enough to technologically justify being separate projects. They do mostly the same thing but don't share libraries, design or knowledge. A quick look at this page can demonstrate this easily: Can you really tell all of these parsers apart? I can't. They can probably all be combined into two or three projects with some additional modules and options. This is just an easy example. There are dozens of IMs and bittorrent clients and hundreds of text editors and code editors with the same feature list and only a slightly different GUI. I see a new CMS announced every week, and that's just for Python. The clones have gotten beyond the point of healthy variety. There are just too many.
Why is this so bad? Isn't competition beneficial for product quality?
Yes, competition and variety are important, but open-source is not a capitalistic system: The work force is very limited (and even more limited in work hours) and is not necessarily driven by profit, and the goals of all competitors are fairly well-defined and mostly similar. So, the capitalistic model doesn't necessarily apply (and let's not forget, open-source is communism 🙂).
Open-source acts like it's got good programmers to spare, and it doesn't. If we took all the developers of bittorrent clients (for example), removed them from their projects and divided them between only three bittorrent clients, we would have significantly better clients - in stability, features, support, etc. Variety would come as options within the projects.
Effects - Example 2: Are there really 30 ways to do it?
Similar goals should give rise to cooperation. Differences are important, but similarities should be exploited to reduce the amount of code written in each project (more code = more time required and more bugs).
There are so many IMs, and each one is implementing its protocols on its own. The same goes for bittorrent clients, as this table demonstrates (remember: different features == different implementation). Think of all this wasted time that could've been used to make two or three good, stable and feature-rich bittorrent libraries. GUIs can then use these libraries and be as varied as they want. The amount of time saved would be incredible, without any loss of variety, and with a gain in quality.
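To make the "one library, many GUIs" idea a bit more concrete, here is a toy sketch in Python. Everything in it is made up for illustration: `TransferEngine`, `Transfer`, `render_plain` and `render_bar` are hypothetical names, and no real bittorrent protocol is implemented. It only shows the shape I have in mind: the shared logic lives in one place, and the front-ends differ purely in how they present it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Transfer:
    """State of one download, owned by the shared library (hypothetical)."""
    name: str
    total_bytes: int
    done_bytes: int = 0

    @property
    def progress(self) -> float:
        # An empty transfer counts as complete.
        return self.done_bytes / self.total_bytes if self.total_bytes else 1.0


class TransferEngine:
    """Hypothetical shared library: all state and logic, no UI code."""

    def __init__(self) -> None:
        self._transfers: Dict[str, Transfer] = {}
        self._listeners: List[Callable[[Transfer], None]] = []

    def subscribe(self, listener: Callable[[Transfer], None]) -> None:
        # Any front-end registers a callback here.
        self._listeners.append(listener)

    def add(self, name: str, total_bytes: int) -> Transfer:
        transfer = Transfer(name, total_bytes)
        self._transfers[name] = transfer
        return transfer

    def receive(self, name: str, nbytes: int) -> None:
        # Stand-in for the real protocol work; just counts bytes.
        transfer = self._transfers[name]
        transfer.done_bytes = min(transfer.total_bytes, transfer.done_bytes + nbytes)
        for listener in self._listeners:
            listener(transfer)


# Two "GUIs" that differ only in presentation.
def render_plain(t: Transfer) -> str:
    return f"{t.name}: {t.progress:.0%}"


def render_bar(t: Transfer, width: int = 10) -> str:
    filled = int(t.progress * width)
    return f"{t.name} [{'#' * filled}{'.' * (width - filled)}]"


engine = TransferEngine()
plain_log: List[str] = []
bar_log: List[str] = []
engine.subscribe(lambda t: plain_log.append(render_plain(t)))
engine.subscribe(lambda t: bar_log.append(render_bar(t)))

engine.add("example.iso", 1000)
engine.receive("example.iso", 250)
engine.receive("example.iso", 750)
```

Both front-ends stay as different as they like, while a bug fixed in `receive` is fixed for all of them at once.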
Remember: Every time you write a library for programmers, you are cooperating.
But who wants to write tedious libraries and APIs? GUIs are so much more fun. *Click*
I'm picking on IMs and bittorrent clients because it's easy. But this applies to most categories of open-source projects that I know.
Effects - Innovation
I cannot give an example to demonstrate that there is hardly any innovation. I can only note that I hardly run into any. Most new open-source projects I see have been done before, and in a fairly similar way. Lists of features in projects seem almost identical to their competitors' lists. Sometimes I'll find out a project has a little innovative feature -- nothing radical -- and it would disproportionately brag about it.
But I do know that clones must hurt innovation. The evolutionary process depends on a large variety of "mistakes" for it to choose from. When most of the projects are the same as their "predecessors", natural selection has little to select from. We need odd projects that nobody thinks they need. Perhaps only one would succeed out of a hundred, but having true innovation might be worth the sacrifice.
Perhaps innovation is out there. If it is, please let me know.
But these are natural trends! What can we possibly do?
Suppose I convinced you that these are actual problems in open-source. Is there anything to do about them?
Well, obviously we cannot tell volunteers which projects to join and how to run their projects. We can't force projects to cooperate. But we can try and persuade them. We can show them our reasoning and ask them politely.
Mature projects could try and be more friendly to newcomers. They can give them more freedom to do forks or optional modules. They can actively encourage ideas and involvement (even if not always directly beneficial). They can make their code base more accessible to new readers.
If you have any more ideas I'd be happy to hear them, but basically this is all in the hands of the masses. If you think my ideas are correct and want to see the trends shift, spread the word. (And if you think I'm incorrect, please let me know.)
Open-Source Programmers Of The World - UNITE!