I’m writing a small app (NoteSearch) to search through OneNote pages better than the standard OneNote search function does. Since OneNote returns HTML strings and I’m only interested in the text itself, I needed a simple function to remove the HTML tags. wxWidgets provides nearly everything, but I couldn’t find a function that does this job. So I crawled the internet and found code on some webpage (it’s actually from the book Thinking in C++, Volume 2: Practical Programming by Bruce Eckel and Chuck Allison), which I “rewrote” for the wxWidgets library.
    wxString& stripHTMLTags(wxString& s, bool reset)
    {
        // Remembers across calls whether the previous line ended inside a tag.
        static bool inTag = false;
        bool done = false;
        if (reset)
            inTag = false;
        while (!done) {
            if (inTag) {
                // The previous line started an HTML tag
                // but didn't finish. Must search for '>'.
                size_t rightPos = s.find('>');
                if (rightPos != wxString::npos) {
                    inTag = false;
                    s.erase(0, rightPos + 1);
                } else {
                    done = true;
                    s.erase();
                }
            } else {
                // Look for start of tag:
                size_t leftPos = s.find('<');
                if (leftPos != wxString::npos) {
                    // See if tag close is in this line
                    // (search from leftPos so a stray '>' before the tag is ignored):
                    size_t rightPos = s.find('>', leftPos);
                    if (rightPos == wxString::npos) {
                        inTag = done = true;
                        s.erase(leftPos);
                    } else
                        s.erase(leftPos, rightPos - leftPos + 1);
                } else
                    done = true;
            }
        }
        // Replace some special HTML characters
        s.Replace("&lt;", "<", true);
        s.Replace("&gt;", ">", true);
        s.Replace("&amp;", "&", true);
        s.Replace("&nbsp;", " ", true);
        s.Replace("&quot;", "'", true);
        return s;
    }
You might need to replace some more special characters depending on your needs. On a side note, I didn’t manage to get SyntaxHighlighter Evolved (SHE) to run well with the Gutenberg editor of WordPress 5.x :(. I just can’t change the type of an existing block to SHE. You need to press the + above or below a block to add a SHE block, and even then the source code is not nicely formatted. So I use the EnlighterJS source code formatter and I’m happy.
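The last part, the replacement of special HTML characters, is obviously too special for the code highlighters, which tend to decode the entities. The first argument of each of those Replace calls is meant to be the literal entity: &lt;, &gt;, &amp;, &nbsp; and &quot;.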
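For readers who want to try the entity replacement step without wxWidgets, here is a minimal sketch using std::string. The names replaceAll and decodeEntities are my own and do not appear in the book or in wxWidgets, and the entity table is only the small set discussed in this post. One deliberate difference from the listing above: &amp; is handled last, on the assumption that double-escaped text such as &amp;lt; should come out as the literal text &lt; and not as <.

```cpp
#include <array>
#include <string>
#include <utility>

// Hypothetical helper: replace every occurrence of `from` in `s` with `to`.
void replaceAll(std::string& s, const std::string& from, const std::string& to)
{
    if (from.empty())
        return;
    std::size_t pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, from.size(), to);
        pos += to.size(); // continue searching after the inserted text
    }
}

// Hypothetical helper: decode a small, fixed set of HTML entities in place.
// "&amp;" comes last so that "&amp;lt;" ends up as "&lt;" rather than "<".
std::string& decodeEntities(std::string& s)
{
    static const std::array<std::pair<const char*, const char*>, 5> entities = {{
        {"&lt;", "<"},
        {"&gt;", ">"},
        {"&nbsp;", " "},
        {"&quot;", "'"}, // the post maps the double quote entity to a single quote
        {"&amp;", "&"},
    }};
    for (const auto& e : entities)
        replaceAll(s, e.first, e.second);
    return s;
}
```

Adding another special character then only means adding a row to the table. With this ordering, the input `&amp;lt;` is decoded to `&lt;`, while a plain `&lt;` becomes `<`.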