Unpopular Positions: One Geek's Take
A space to ponder, discuss, speculate, disagree, and learn.

Expected topics include:
  • Technology
  • Politics
  • Defense
  • Philosophy
  • Humanism and Transhumanism
  • The Future

Overview of Software Projects

Posted 06-06-2013 at 09:10 PM by ttk

Writing software is one of my favorite pastimes. I derive great satisfaction from designing and implementing tools, giving people new and needed capabilities.

Unfortunately I have more projects underway than time and energy to give them all the attention I'd like. I struggle to apply discipline to finish some projects before starting others, with mixed success. Some of my tools are useful now, while others are works in various states of progress (or neglect).

Some (perhaps a fifth) of these are showcased on my Code Closet page, which I started more than ten years ago, when the norm for software projects was to provide simple tarballs of source code and documentation.

The times have changed, and a few of my friends have asked that I get on GitHub to make my projects more transparent, more available, and more open to end-users' participation and feedback.

It's a great idea, and much like the transition from composition notebooks to this blog, I intend to transition from CVS and a static webpage to GitHub.

The first step is to enumerate the projects to transition, and write a little about each of them (providing the basis for the per-project GitHub documentation).

The Projects:
  • mssh
    One of my most-used and useful tools to date, what mssh lacks in originality it makes up for in scalability and versatility. It is a parallel-ssh command tool, similar to pssh and cssh (and many others).

    The idea is simple: If you have many networked computers, and want to run a command on all of them at the same time and see the results, mssh gives you that capability.

    What distinguishes mssh is its ability to scale to clusters of thousands of servers, tolerate faults (such as down or merely sick target hosts), and deal gracefully with large or infinite command output.

    What mssh currently needs is automatic fan-out. Instead of opening all of the ssh connections from the local host, it needs to open intermediate connections with some hosts, and use processes there to open the rest. Thus, instead of starting thousands (or tens of thousands) of processes locally, it would open only a few dozen locally and a few dozen on each of a few dozen other machines, funneling command results through the intermediate connections to the local host.

    It also needs a better way of mapping target hosts to the usernames and ssh keys it uses to open the ssh connections. Currently it takes a single username and ssh key and uses them for all ssh connections, and in some environments that just won't cut it.
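To make the fan-out idea concrete, here is a rough sketch (in Python, purely illustrative -- mssh's real fan-out logic does not exist yet) of how a host list might be partitioned into intermediate groups, so the coordinator contacts only a few relays and each relay contacts its share of targets:

```python
import math

def plan_fanout(hosts, max_local=64):
    """Split a host list into groups for two-level fan-out.

    Returns a dict mapping each relay host to the targets it should
    contact on the coordinator's behalf.  Hypothetical sketch only.
    """
    if len(hosts) <= max_local:
        return {h: [] for h in hosts}  # small cluster: connect directly
    # aim for roughly sqrt(N) relays, each handling roughly sqrt(N) targets
    n_groups = min(max_local, math.ceil(math.sqrt(len(hosts))))
    size = math.ceil(len(hosts) / n_groups)
    plan = {}
    for i in range(0, len(hosts), size):
        group = hosts[i:i + size]
        plan[group[0]] = group[1:]  # first host in each group relays to the rest
    return plan
```

For a thousand hosts this yields a few dozen relays with a few dozen targets each, matching the shape described above.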
  • calc
    This might be the tool I use the most, even if nobody else does. It is a simple REPL-style perl calculator with a stack, tickertape, function library, and some other useful features. Think of it as Octave/MATLAB with perl as its command language.

    I've started the second rewrite of the program (calc3), with the aim of giving it "modes", such that switching it into a different mode makes it a REPL interface to a different kind of environment. It will retain "perl calculator" as the main mode, but add modes for "C calculator", "SQL DBMS client", and "command shell".

    The advantage of implementing these as different modules in one program is the ability to more easily and powerfully share data between different tasks, using a different approach from the traditional shell-pipes model. It will also provide a more consistent and configurable user interface, and share state between multiple and remote instances of itself. One of my biggest beefs with bash is my inability to seamlessly share command history between shell instances on different computers.

    I will consider calc3 a success when it is my primary login shell.
  • select
    Similar to SQL's "SELECT" statement, the "sel" utility takes formatted data as input, transforms or filters it, and produces formatted data as output. It can be used as a smarter grep ("show only rows where column A's value is between those of column B and column C"), to add, remove, rename, or reorder data fields, or to translate from one data format to another (JSON to XML, for instance, or CSV to HTML, etc).

    I have used this tool in production environments to great effect, but it still needs tremendous work. There are features which I developed for it as an employee of Discovery Mining which are not available to the public, and I want to re-implement those features fresh so they can be shared. It also fails mightily in converting between some formats (XML to CSV, for instance, is hardly useful at all).

    I have also made two attempts thus far at parallelizing sel's operation, and have not been satisfied with the results. I think my mistake has been trying to be too clever. When I next tackle the problem, it will be much simpler.
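The "smarter grep" example above can be sketched like this (a Python illustration of the idea, not sel's actual implementation or interface):

```python
import csv, io

def select_rows(csv_text, col, lo_col, hi_col):
    """Return the rows of a CSV document where col's value lies between
    the values of lo_col and hi_col.  Illustrates the kind of filter
    sel performs; not sel's actual code."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return [r for r in rows
            if float(r[lo_col]) <= float(r[col]) <= float(r[hi_col])]
```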
  • dy
    "dy" is like a cross between the standard UNIX utilities "ls" and "du", with additional capabilities akin to "md5sum" and "file". It descends through file hierarchies, displaying characteristics of each file and directory, each on one line. It can also extract metadata about some files from a metadata file and populate those files with the metadata extracted. It has output modes for human-readable and machine-readable formatting.

    I have had opportunity to compare dy's file format identification capabilities to the standard UNIX "file" command and to Oracle's Stellent file identification utility, and each has its advantages: each identifies file types the other two do not. dy's edge is mostly in subtypes -- it will attempt to identify C source code, bash source code, perl source code, html, css, etc, all of which "file" simply identifies as "text". dy's file format identification can also be much faster than either Stellent or "file".

    That having been said, it can still use a lot of improvement.
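To show what subtype identification of "text" files looks like, here is a toy content-sniffing classifier (the heuristics are my own guesses for the example, not dy's actual rules):

```python
import re

def identify_text_subtype(text):
    """Very rough sketch of content-based subtype detection for text
    files, in the spirit of dy's identifier.  Checks are ordered from
    most to least distinctive."""
    head = text[:4096]
    if re.search(r'^#!.*\b(ba)?sh\b', head):
        return 'bash source'
    if re.search(r'^#!.*\bperl\b', head) or re.search(r'\buse strict;', head):
        return 'perl source'
    if re.search(r'^\s*#include\s*<\w+\.h>', head, re.M):
        return 'C source'
    if re.search(r'<html\b', head, re.I):
        return 'html'
    return 'text'  # fall back to what "file" would say anyway
```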
  • The Black Tea Project
    This is actually a collection of projects, only two of which I have actually started. The idea is to re-implement some useful Java-based infrastructure components in Python (or perhaps Perl).

    Java applications can be a pain to configure, administer, and maintain, and Oracle is taking Java down a dubious trajectory. The hope is that by re-implementing these, other projects will have a choice to divest themselves of dependency on Java. The secondary objectives are to reduce memory overhead and eliminate some of these tools' other known pain-points.

    The tools under The Black Tea Project so far are: Isabella (to replace Lucene, started), DragonScale (to replace Zookeeper, started), and Persephone (to replace Cassandra, not started).
  • zacl

    Zacl's A Concurrency Language.
    Zacl's A Compiled Language.
    Zacl's A Compact Language.

    I already have a lot written elsewhere about Zacl, so won't dwell on it too much here (but will doubtless post more about it in the future). This is by far my most ambitious project, and the only one so far which has attracted collaborators.
  • oddmuse hacks
    I really like OddMuse for a lightweight wiki. I can deploy a new OddMuse instance on my server in about five minutes. It is based on text files rather than a database, so is much less likely to break than other wiki systems, and easier to extend and administer.

    On the other hand, OddMuse also has shortcomings which prevent me from using it for some of my larger data publishing projects (like the wikification of my military technology website, which has 20GB of new content waiting to be incorporated).

    Fortunately OddMuse is very easy to modify, so modify it I have. Some of those modifications might be of interest to others, so I'll be exposing them on GitHub.

    This project needs a name!
  • porkusbot
    One of my more frivolous projects is an Internet Citizen's Band robot named porkusbot. It has gone through numerous revisions, of which this is the third. All previous versions exposed its host platform to sundry security vulnerabilities, so I could never share the source code.

    In this most recent attempt to "do it right", I am aiming for a high degree of extensibility (adding functionality without restarting the bot process) and security.

    I have documented some of the older version's capabilities.
  • slackhammer
    Slackware is my Linux distribution of choice. For the desktop, laptop, and server, it is stable, simple, complete, and hassle-free. It is the most sane of all Linux distributions.

    Unfortunately it isn't quite perfect. A Slackware installation includes all of the libraries and other dependencies the system needs to operate all of its applications, but there is no dependency-tracking. This is actually a strength of Slackware, as many Slackware users will tell you, but sometimes I need to figure out the dependencies of official Slackware packages and of third party Slackware packages.

    It also bothers me to not have unit tests for Slackware and its packages. Transitioning to a new version of Slackware necessitates months of stress-testing on one of my less-critical systems before I trust the new version enough to use it in a mission-critical capacity.

    Furthermore, sbopkg (the premier third-party Slackware package management utility, similar to Debian's apt-get) isn't really suitable for unattended operation. Even though it has a non-interactive option, there are several contingencies where it behaves interactively anyway, which is a problem when installing several packages on dozens or hundreds of computers simultaneously.

    So, I started writing some tools and databases to help automate all of these things, and to automate the management of local Slackbuild repositories (so that those hundreds of servers don't have to each reach across the internet individually to download their packages).

    As a whole, they make a framework called Slackhammer. Slackhammer allows the tracking, management, dependency tracking, archival, building, and testing of Slackware packages.

    Also, I have occasionally bumped up against Slackware's limitations in a datacenter environment (which demands more than just being a good server OS), and have wanted to fork off my own "Datacenter Linux" distribution.

    If I ever pull the trigger on that, Slackhammer will help make it happen. It could also help others who wish to fork Slackware for their own purposes.
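As an illustration of one way dependency discovery can work on a system without built-in dependency tracking, here is a sketch that extracts shared-library names from `ldd` output run against a package's binaries (an assumption about the general approach, not Slackhammer's actual code):

```python
import re

def parse_ldd_output(ldd_text):
    """Extract shared-library names from `ldd` output, one way to
    discover a binary's runtime dependencies.  Illustrative sketch;
    not Slackhammer's actual implementation."""
    libs = set()
    for line in ldd_text.splitlines():
        # typical line: "\tlibc.so.6 => /lib64/libc.so.6 (0x00007f...)"
        m = re.match(r'\s*(\S+\.so\S*)\s*=>', line)
        if m:
            libs.add(m.group(1))
    return sorted(libs)
```

Mapping each extracted library back to the Slackware package which provides it is then a matter of consulting the packages' file manifests.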
  • spamfilter
    About a dozen users get their email through my server, and blocking spam is a never-ending struggle. Several years ago I wrote a simple system-wide spamfilter which works okay, but needs a rewrite to address pressing problems:

    Users need a way to browse messages which have been marked as spam, restore mistakenly filtered mail to their inboxes, and add account-specific spam filtering rules. The filter also needs a better way of tracking which rules actually get used, so that stale rules can be retired -- it currently runs 1067 rules against each email a user receives! Finally, it needs to do a better job of recognizing when the exact same message is being sent to multiple users (or to a decoy account), a frequent indicator of spam.
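That last piece, duplicate-message detection, could work along these lines (an illustrative sketch; the hashing, normalization, and threshold are assumptions, not the filter's actual rules):

```python
import hashlib, collections

class DuplicateDetector:
    """Flag messages whose bodies arrive for many distinct recipients,
    a frequent spam indicator.  Sketch only; threshold and
    normalization are assumptions for the example."""
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.recipients = collections.defaultdict(set)

    def observe(self, body, recipient):
        """Record one delivery; return True once the same body has been
        seen for at least `threshold` distinct recipients."""
        # normalize whitespace so trivial reflowing doesn't defeat the hash
        digest = hashlib.sha256(" ".join(body.split()).encode()).hexdigest()
        self.recipients[digest].add(recipient)
        return len(self.recipients[digest]) >= self.threshold
```

A delivery to a decoy account could simply count as an extra "recipient" with a lower threshold.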
  • jay
    Jove is my favorite text editor, providing the core functionality of Emacs without all the bloat. Jove is comparable in size and overhead to vim.

    Unfortunately Jove also has some shortcomings (line length limit, improper use of tabs, no control language, etc), so I patch it for my personal use. About a year ago I decided to make my patched Jove its own project, under the name Jay.

    The primary goals of the project are to address Jove's line length limit, eliminate tab characters from its output, extend its use of regular expressions, and enhance its ability to use external executables as control scripts.
  • Headcrab
    I prefer curses-based mail clients. Mutt is currently the best of them, but it's far from perfect. In 2007 I had a framework of lynxcgi scripts which made for a halfway-decent mail client, but I never committed them to CVS, and now I cannot find them.

    Every time I use Mutt, I wish it used Lynx for its user interface. Headcrab is a fresh re-implementation of those lynxcgi scripts, aiming to provide a reasonably complete email client behind a Lynx front-end.
  • Parallel::RCVM
    Parallel::RCVM is a collection of perl modules implementing a thing I call the "Remote Computing Virtual Machine". It provides a multithreaded environment for perl applications, and a messaging mechanism for communication between threads and between remote instances (including a service similar to Python/Twisted's Manhole).

    The primary goal of the Parallel::RCVM::* modules is to provide a runtime environment similar to Zacl's so its strengths and weaknesses may be assessed before Zacl development gets too far along. Parallel::RCVM::* is also intended to be a useful framework in its own right for distributed systems perl programmers.
  • Physics::Ballistics
    Pursuant to my interest in armor and military technology, I have developed several functions related to the physics of armor, projectiles, and armor/penetration interaction, in both the ballistic and hypervelocity domains.

    Some of these are simple encodings of publicly available data or formulae, while others are my own invention and either have no equivalent in the world, or are only available as commercial software.

    These modules are very close to being published to CPAN. I just need to add a few more unit tests and get the packaging right.

    For the curious, the latest sources are here:
  • File::Valet
    Another perl module soon to be published to CPAN, implementing some functions for File::Slurp-like operations, file-locking, and locating executable files and temporary directories.
  • Text::KVP
    Yet another perl module, implementing functions for reading and writing hashes as key/value-pair ("KVP") text data.
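For illustration, here is a minimal parser for one plausible "key = value" text format (the exact format Text::KVP handles is an assumption here, not taken from its documentation):

```python
def parse_kvp(text, sep="="):
    """Parse simple key/value-pair text (one "key = value" per line)
    into a dict.  Hypothetical format for illustration only."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition(sep)
        result[key.strip()] = value.strip()
    return result
```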
  • Sys::DynaConst
    One more perl module, providing wrappers to ioctl() and fcntl() which allow for dynamically discovering the names of system constants ("FIONREAD", "F_GETFL", etc) and mapping them to platform-specific values.

    This is useful when using perl in an environment lacking the system's .ph header files, and when generating/installing them is either undesirable or politically unfeasible (for instance, when only the IT department can install these files, and they don't want to do it).
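Python happens to ship these name-to-value mappings in its standard library, which makes the underlying idea easy to demonstrate (this illustrates the concept, not Sys::DynaConst's interface):

```python
import fcntl, os, termios

def constant_value(name):
    """Map a system-constant name (e.g. "F_GETFL", "FIONREAD") to its
    platform-specific integer value.  Python's fcntl/termios/os modules
    already carry these values; Sys::DynaConst's job is to discover
    them for perl at runtime instead."""
    for module in (fcntl, os, termios):
        if hasattr(module, name):
            return getattr(module, name)
    raise KeyError("unknown constant: %s" % name)
```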

That's a long entry, but it was worth getting it all down. I'll write more about each of these projects in more depth as I get to them.

Please don't feel shy about commenting or asking questions! Anything that encourages me to type up more information about these projects is a good thing.
Posted in Technology, Defense