If you still haven't had any real suggestions, I think I have a basic idea that should work.
You need your program to parse the HTML into elements forming a hierarchy. This can be done with a DOM parser library, preferably an HTML-aware one, since real-world HTML is often not well-formed XML.
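As a rough sketch of that parsing step, here is how it might look with Python's standard html.parser module. The Node and TreeBuilder classes, the parse_html helper, and the short VOID_TAGS list are names I've made up for illustration, and a real editor would need much more forgiving handling of malformed markup.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag (partial list, for illustration only).
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    """One element in the parsed hierarchy (hypothetical helper class)."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a tree of Node objects from the parser's tag events."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node  # descend into the new element

    def handle_endtag(self, tag):
        # Walk up to the matching open element; stray close tags are ignored.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.text += data.strip()


def parse_html(source):
    builder = TreeBuilder()
    builder.feed(source)
    builder.close()
    return builder.root


tree = parse_html('<div id="a"><p style="color:red">Hi <b>there</b></p><br></div>')
div = tree.children[0]
print(div.tag, div.attrs, [child.tag for child in div.children])
```

Keeping a parent link on every node is what later lets you walk upwards from a clicked element to its containers.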
Then for each element that should be visible you need a graphical class that calculates its appearance. Some of these depend upon the elements contained within them, so you will need to work out how attributes like colour are inherited through the tree and how container elements eventually calculate their sizes.
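To give a feel for the inheritance and sizing part, here is a deliberately tiny sketch. It assumes only colour and font size are inherited, that children are stacked vertically, and that a container is as tall as its own content plus its children. The Element class, the INHERITED set, and the own_height field are all invented for the example; real CSS layout is far more involved.

```python
# Properties passed down from parent to child (a simplifying assumption).
INHERITED = {"color", "font-size"}


class Element:
    """Hypothetical graphical element with its own style and content height."""

    def __init__(self, tag, style=None, children=None, own_height=0):
        self.tag = tag
        self.style = style or {}
        self.children = children or []
        self.own_height = own_height


def computed_style(element, parent_style=None):
    # Start from the inheritable part of the parent's style,
    # then let the element's own declarations override it.
    style = {k: v for k, v in (parent_style or {}).items() if k in INHERITED}
    style.update(element.style)
    return style


def layout(element, x=0, y=0, width=800, parent_style=None):
    """Assign a position and size to every element; return this element's height."""
    element.computed = computed_style(element, parent_style)
    element.x, element.y, element.width = x, y, width
    height = element.own_height
    for child in element.children:
        # Each child is placed below whatever has been laid out so far.
        height += layout(child, x, y + height, width, element.computed)
    element.height = height
    return height


span = Element("span", own_height=10)
div = Element("div", {"color": "red"}, [span], own_height=5)
para = Element("p", own_height=20)
page = Element("body", {"color": "blue", "border": "1px"}, [para, div])
layout(page)
print(para.computed, span.computed, page.height)
```

Styles flow down the tree while sizes flow back up, which is why the container's height is only known after its children have been laid out.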
Then you need to use this to render the HTML graphically as a webpage, as a browser does, but with listeners/event handlers to detect where a user clicks and find the element most associated with that location.
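Finding the element under a click can be sketched as a recursive search for the deepest box containing the point. This assumes every element already has a rectangle from the layout step and that later children are drawn on top of earlier ones; Box and hit_test are hypothetical names, not part of any GUI toolkit.

```python
class Box:
    """Hypothetical laid-out element: a rectangle plus child boxes."""

    def __init__(self, name, x, y, width, height, children=None):
        self.name = name
        self.x, self.y = x, y
        self.width, self.height = width, height
        self.children = children or []

    def contains(self, px, py):
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def hit_test(box, px, py):
    """Return the deepest box containing the point, or None if outside."""
    if not box.contains(px, py):
        return None
    # Check children first; later children are assumed to be drawn on top.
    for child in reversed(box.children):
        hit = hit_test(child, px, py)
        if hit is not None:
            return hit
    return box


page = Box("body", 0, 0, 800, 600, [
    Box("p", 0, 0, 800, 20),
    Box("div", 0, 20, 800, 100, [Box("span", 10, 30, 50, 10)]),
])
clicked = hit_test(page, 15, 35)
print(clicked.name if clicked else "nothing")
```

Your toolkit's mouse listener would supply the click coordinates, and the returned box is the one to mark as selected.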
Then when the user selects another element to add or attributes to change, you can update your tree, update your graphical objects, and rerender the page.
By detecting where the user clicked you can also highlight the relevant HTML text in another window/frame/tab where they can manually alter the HTML, again with changes tracked and added back into the parsed tree to regenerate the GUI.
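For the highlighting, the parser needs to remember where each element came from in the source text. One possible way, again with Python's html.parser, is to record getpos() and get_starttag_text() for every start tag. The PositionRecorder class and the absolute_offset helper are my own illustrative names.

```python
from html.parser import HTMLParser


class PositionRecorder(HTMLParser):
    """Records where each start tag appears in the source text."""

    def __init__(self):
        super().__init__()
        self.positions = []  # (tag, line, column, raw start tag text)

    def handle_starttag(self, tag, attrs):
        line, col = self.getpos()  # line is 1-based, column is 0-based
        self.positions.append((tag, line, col, self.get_starttag_text()))


def absolute_offset(source, line, col):
    """Convert a (line, column) pair into a character index into source."""
    lines = source.splitlines(keepends=True)
    return sum(len(text) for text in lines[:line - 1]) + col


source = "<div>\n  <p>Hi</p>\n</div>"
recorder = PositionRecorder()
recorder.feed(source)
recorder.close()

for tag, line, col, raw in recorder.positions:
    start = absolute_offset(source, line, col)
    # The range start..start+len(raw) is what the text view would highlight.
    print(tag, line, col, repr(source[start:start + len(raw)]))
```

If you store these positions on the tree nodes, a click in the rendered view can select the matching range in the text view, and an edit in the text view tells you which subtree to reparse.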
So it's even more work than making a web browser.