VHLT source code cleaned up

Created 2018-12-02 17:05:39 UTC by Solokiller

Posted 2018-12-02 17:05:39 UTC Post #341376
I've cleaned up the VHLT V34 source code to make it easier to read: https://github.com/SamVanheer/VHLT-clean

The remove-ifdef branch removes all feature ifdefs. Where the original code had a #define for a feature and blocks wrapped in #ifdef feature ... #endif, there is now either just the code the ifdef surrounded, or, if the feature was disabled, the code was removed entirely.

There's also a branch remove-xash that removes all Xash specific code.

I ran a tool that counts lines of code on it; here are the results.

For the original codebase with no changes:
Language        files    blank    comment     code
C++                51     5437       4046    50560
C/C++ Header       33      964        378     5533
Total                                        56093
For the cleaned up version in the remove-xash branch:
Language        files    blank    comment     code
C++                51     4512       3272    35487
C/C++ Header       33      937        347     4045
Total                                        39532
So about 16,500 lines of code in the regular version of VHLT are dead code, never used at all. Some of it is Xash-specific, but not that much.

Some features were disabled, so their code was removed. Here they are:
  • ZHLT_DETAIL: some kind of old func_detail variant; it was already obsolete
  • ZHLT_PROGRESSFILE: never implemented beyond command line argument handling, so it didn't work at all
  • ZHLT_NSBOB: the only thing I found was the definition; no code appears to exist for it
  • ZHLT_HIDDENSOUNDTEXTURE: would allow you to mark faces as hidden by setting zhlt_hidden on an entity. You can already do this by ending a texture name with _HIDDEN, so I guess it was obsolete
  • HLBSP_SUBDIVIDE_INMID: seems to be intended to reduce the number of faces, but it contributes to AllocBlock:Full errors, so it was disabled
The cmdlib.h header is where all of these definitions were; it used to be 712 lines and is now 172. There are 2 definitions left in place because they depend on platform-specific functionality.
One is game_text's UTF-8 conversion support, which relies on a Windows API function. It's not that hard to replace it with a cross-platform alternative.

The other is Ripent's -pause parameter, which was implemented only on Windows for some reason. This may have to do with the fact that it's implemented using atexit, so it may not work on Linux due to differences in how console resources are managed during program shutdown. Reworking Ripent's code to avoid using exit should solve this problem.

I don't see any more #define statements used to control features anywhere, so I guess it's all cleaned up now.

To speed this process up I used a couple of tools. First I used sunifdef to process all files and remove definitions one by one. I wrote a batch file that does this and also commits all changes to Git automatically, so I could just run removeifdefs.bat <definition name>. You can find the batch file in the remove-ifdefs branch in src.

Note that to remove disabled sections you must modify the batch file to pass -Udefinition instead of -Ddefinition or it will turn on the feature.

All in all this took about an hour and a half to do.

My reason for doing this is that the ZHLT/VHLT source code has never been readable; you have to read past the definitions and keep track of which ones are active. More recent versions of Visual Studio do a lot of that work for you, but it's still hard. For example, the file wadpath.cpp is 92 lines now, but was 174 lines before: nearly twice as long, containing code that isn't even used.

wadinclude.cpp is even worse. It used to be 212 lines; now it's 5 lines and there's no actual code left in it. This is because there are 3 or more different versions of the same feature (wad inclusion) in the source code. Various long files are much easier to read now that they've been cleaned up.

I hope to use this to rebuild the codebase in C# so that the tools can be integrated more easily, and perhaps deal with some issues that exist in the current implementation. I don't know whether I'll have time to do it or not, but in any case, this cleaned-up version is available for anyone to check out. I will not be making a build of this since it's identical to the original V34 build; if you want one you'll have to make it yourself.
Posted 2018-12-02 17:18:21 UTC Post #341377
This is really nice. :D
Admer456 - If it ain't broken, don't fox it!
Posted 2018-12-02 20:49:17 UTC Post #341378
Woo Hoo!
Posted 2018-12-02 21:39:39 UTC Post #341379
too much 4 drunk person
Posted 2018-12-02 21:52:12 UTC Post #341380
I was literally talking to Bruce the other day about the mess that is the code for compilers.
Now the only thing that's left to do is figure out the process...
rufee - Sledge fanboy
Posted 2018-12-03 07:14:46 UTC Post #341382
Nice work as always Solokiller ^^

Do you have any "refactoring" project in mind for these compilers? Like porting them to C#, cleaning/documenting them more like you did with HL:E, or something else?
Posted 2018-12-03 13:19:36 UTC Post #341383
Yeah, I'm giving it a shot:
User posted image
Each tool is a library; the MapCompiler library provides a compiler, and the MapCompilerFrontEnd is the command-line interface to it.
You could invoke a compile programmatically with this. You could potentially generate BSP files dynamically this way, though it would still take time to compile everything.

I need to rework the Tokenizer class first so it can read .map files properly since apparently QuArK embeds texture info in comments and stuff.
Posted 2018-12-08 16:40:40 UTC Post #341406
I'm looking at the code that handles .map file loading and there's special code for QuArK style texture coordinates. However, according to its official documentation this is only used when you use the "Quark etp" format. The Valve 220 format uses the correct values.

I'm bringing this up because the code used to handle this has a pretty substantial presence in the map loading and CSG stages. The script tokenizer has to handle it specially, and CSG has to convert texture coordinates differently for this format, which means the map data structures need to store two different formats.

I found that Quark saves .map files correctly when used in "Half-Life" mode, which is when you select it as the active game: http://quark.sourceforge.net/infobase/maped.tutorial.starting.html#entering

So I will not be porting the Quark-specific //TX# way of specifying texture data in those files. If these tools ever get finished, you'll need to properly configure Quark for Half-Life to use them.
Posted 2018-12-09 14:47:54 UTC Post #341408
I think I've nailed down the design of the compiler and tools to make them easy to use as a library as well as a command-line program:
ILogger logger = ...;
var providers = new ICompileToolProvider[]
{
    CSGTool.Provider,
    BSPTool.Provider,
    VISTool.Provider,
    RADTool.Provider
};

using (var compiler = new MapCompiler(logger, providers))
{
    try
    {
        compiler.Compile(mapData, sharedOptions);
    }
    catch (CompilationFailureException e)
    {
        logger.Error(e, "An exception occurred while compiling");
        throw;
    }
}
The compiler takes a logger and a list of tool providers. A provider specifies the tool's name and its type, and can create instances of the tool.
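To give an idea of the shape, a provider boils down to something like this (a simplified sketch only; ICompileTool and the member names here are illustrative, not necessarily the real ones):
// Illustrative sketch of the provider contract described above.
public interface ICompileToolProvider
{
    // The tool's name, e.g. "CSG".
    string Name { get; }

    // The concrete type of the tool this provider creates.
    Type ToolType { get; }

    // Creates a new instance of the tool (ICompileTool is a placeholder name).
    ICompileTool CreateTool();
}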

Tools can be configured using an initializer:
var provider = CSGTool.Provider.WithInitializer(csg => csg.HullData = myHullData);
This creates a provider that invokes an initializer function on the tool. In this case the hull data used by CSG is overridden to use your own settings, but more options will be available for each tool.

This approach also allows you to create custom tools that can be inserted between other tools. As long as your tool returns the correct data type for the next tool, this will work just fine. You could use this to add tools like analyzers that output the data returned by another tool. For instance, if RAD were to return lighting data that isn't stored in the BSP file, you could output it to a file.
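For example, a custom analyzer could be slotted into the provider list from the example above (AnalyzerTool and its OutputFileName option are made up purely for illustration):
var providers = new ICompileToolProvider[]
{
    CSGTool.Provider,
    BSPTool.Provider,
    // Hypothetical custom tool that inspects the BSP data before VIS runs.
    AnalyzerTool.Provider.WithInitializer(a => a.OutputFileName = "analysis.txt"),
    VISTool.Provider,
    RADTool.Provider
};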

As far as command line usage goes, the library I'm using doesn't support the older -option value syntax, only -ovalue or --option=value. Since you'd need a new interface to use this compiler properly anyway, I don't see this as a problem. With this compiler you only need to invoke the frontend once to run all the tools you want, so having a different syntax can actually prevent misuse by erroring out on the old syntax.

This also lets me use better names for options. Whereas the old tools would often have -no<whatever> arguments, this one just has --<whatever>, for instance -noresetlog becomes --resetlog.

I'm trying to avoid doing any file I/O in the compiler itself, so any files that need loading will be loaded either before the compiler is created or in the initializer for the tool that needs the data. If it's output data then it will need to be handled as a post-tool operation, possibly a custom tool.

This also means that any data that is being output to a file should be available to you programmatically when invoking the compiler as a library. This can make it much easier to access some data, like point file output.

As far as log output goes, I'm currently using the standard Serilog message template, modified to include the name of the tool that's logging the data:
[14:54:40 INF] [MapCompilerFrontEnd] test
The format is:
[time log_level] [tool_name] message newline exception
Since this drastically alters the output in log files, I'll probably add an option to remove the extra information and just log what the old tools did.
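Roughly, the setup looks something like this (a simplified sketch, not the exact code; the ToolName property name is just what I'm using for illustration):
using Serilog;

// Simplified sketch of the logger configuration with the extended output template.
var logger = new LoggerConfiguration()
    .WriteTo.Console(outputTemplate:
        "[{Timestamp:HH:mm:ss} {Level:u3}] [{ToolName}] {Message:lj}{NewLine}{Exception}")
    .CreateLogger();

// Each tool logs through a contextual logger that carries its name.
logger.ForContext("ToolName", "MapCompilerFrontEnd").Information("test");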

I've looked into how work can be spread over multiple threads, and it looks like C# has a class for that: https://docs.microsoft.com/en-us/dotnet/api/system.threading.threadpool?view=netframework-4.7.2

It's possible to configure this to match the original's settings, and then dispatching work and awaiting completion should be a pretty simple task. However, according to what I've found, thread priority should not be changed, since this class has a monitoring system to help manage how threads are used. It may not even be necessary to change the priority if that system accounts for other processes, so that will require some closer inspection once there's actual work being done.
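As a rough sketch of what dispatching work could look like (threadCount, faceCount and ProcessFace are placeholders, not the real names):
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Cap the worker thread count roughly like the original tools' thread setting.
ThreadPool.GetMaxThreads(out _, out int completionPortThreads);
ThreadPool.SetMaxThreads(threadCount, completionPortThreads);

// Queue one work item per face onto the thread pool and wait for all of them.
var work = Enumerable.Range(0, faceCount)
    .Select(i => Task.Run(() => ProcessFace(i)))
    .ToArray();

Task.WaitAll(work);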
Posted 2018-12-09 22:51:43 UTC Post #341410
won't C# make it slower than C/C++?
Posted 2018-12-09 23:56:29 UTC Post #341411
AFAIK, since Solokiller is using .NET Core, the performance differences should be minor. Also there is code cleanup involved so that needs to be taken into account.

I guess we'll have the answer once he finishes his work and someone benchmarks compile times.
Posted 2018-12-10 07:45:40 UTC Post #341412
There are a lot of things in this design that make it faster. It only has to parse the command line once, data is passed between tools directly instead of being written out to files and parsed back in, and there are no fixed-size buffers, so it's easier to remove stuff without having to touch and move a bunch of lists.

Also, memory allocation in C# is much faster than in C++: http://codebetter.com/stevehebert/2006/03/03/raw-memory-allocation-speed-c-vs-c

The articles and discussions I've found on this are pretty old, and .NET Core is known to be much faster and more efficient, so it may actually be the best tool for the job.

If I can use hardware acceleration for maths-intensive stuff like VIS and RAD, it'll go even faster. You could then compile maps on your GPU.

I'd also consider any slowdowns to be worth the cost if it means having tools whose inner workings you can actually understand and modify as needed. Very few people know how to make adjustments to these tools.
Posted 2018-12-10 08:48:48 UTC Post #341413
I've always been interested in getting the tools to work with GPUs, though obvious lack of knowledge etc...

Anyway, what parts could be run in parallel and what potential gains could we be talking about?
I ran benchmarks on various CPUs some time ago and noticed that increasing the core count past 4 offered no tangible decrease in compile time (4 - 10 sec).
rufee - Sledge fanboy
Posted 2018-12-10 09:30:19 UTC Post #341414
The tools are already running as much as possible on worker threads. The advantage of using GPU calculations is that the GPU has cores made for mathematical calculations and can handle the spreading of work by itself since it already does that to optimize graphics rendering.

I don't know how much difference there could be in terms of performance since I can't find anything on that, so we'll have to wait and see how it works out. I should be able to get the CSG threading part done soon, then I can do some tests there to see how it compares. Unfortunately, since CSG usually finishes so quickly, it probably won't show much difference in performance; it may even be slower due to the overhead of creating GPU resources. RAD is really where compiles take the longest, so that's where it can come in handy.

I'm not sure why having more cores wouldn't have an effect; it could be that each core's frequency is lower, which offsets the gains. Perhaps the OS is under-reporting the processor count so it might not be taking advantage of all cores, but you should be able to see that in the compiler settings output.

It's also possible that the worker thread code isn't optimized for a large number of threads and is locking too much to take full advantage of the threads it gets; I'm not sure.
Posted 2018-12-10 09:38:34 UTC Post #341415
Granted I did test on a 7700K which only has 4 physical cores. Once I get a chance to test with more I will report on my findings.
rufee - Sledge fanboy
Posted 2018-12-10 15:14:03 UTC Post #341416
I can do a benchmark on a 12 thread Ryzen 5 2600 - can also test against an overclock and normal clock. RAD is definitely going to be one that benefits the most from multithreading. It's totally radical, man
Instant Mix - Title commitment issues
Posted 2018-12-10 16:24:24 UTC Post #341418
Sorry, Solokiller, that article is useless. It's common in high-performance C++ projects to use custom allocators, and the cost of deallocation wasn't measured. A more complicated allocation can make the deallocation cheaper and vice versa, since a common goal of a memory allocation scheme is to avoid memory fragmentation, and that requires a lot of work. C# is of course garbage collected, and the cost of cleaning up and reorganizing needs to be taken into account.

Most implementations of the standard C++ operator new are designed to be thread-safe and conservative - so there's likely a big overhead from locking every time you call new - which is something a custom allocator can minimize by being greedy and preallocating large amounts of memory.

The costs of dealing with dynamic memory are paid at different points in the lifetime of the program. With non-garbage-collected C++ (yes, C++ can be garbage collected according to the standard; no idea why anyone would want that though) it's very predictable: you only pay when you allocate and deallocate. With C# you leave a lot of the work to the garbage collector. There have been a few interesting CppCon talks on allocators.

Making something like this in C# is fine of course, use whatever you prefer. Ease of development is a good reason to pick one language over the other and C++ can be painful. As you know design usually matters much, much more than choice of language when it comes to performance. I'm excited to see the results of your work!
Oskar Potatis 🦔
Posted 2018-12-12 12:57:10 UTC Post #341435
Yeah we'll just have to wait and see for performance.

I've been converting CSG's code and I've noticed some areas where performance can improve by not having to write stuff to files. CSG normally writes plane data to two files: the BSP file and a separate .pln file. BSP loads the .pln file or, if it doesn't exist, converts the data from the BSP file.

Since this data is now shared directly it eliminates the file I/O overhead and the need to support conversion from the BSP file.

This is probably why CSG was merged into BSP for Source.

I've also found a comment indicating a possible race condition in the plane generation code: https://github.com/SamVanheer/VHLT-clean/blob/remove-xash/src/zhlt-vluzacn/hlcsg/brush.cpp#L28

There is a possibility that one thread is modifying the plane list while another is reading from it, which could cause it to read garbage.

To fix this the code needs to use a reader-writer lock: https://docs.microsoft.com/en-us/dotnet/api/system.threading.readerwriterlockslim?redirectedfrom=MSDN&view=netframework-4.7.2

That's the best solution for this type of problem: it eliminates any race conditions and garbage reads, though acquiring and releasing the lock will slow things down a bit.
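Something along these lines is what I have in mind (a simplified sketch; Plane here is just a stand-in for the real plane data):
using System.Collections.Generic;
using System.Threading;

// Stand-in for the compiler's plane data.
public readonly struct Plane { }

public class PlaneSet
{
    private readonly List<Plane> _planes = new List<Plane>();
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();

    // Any number of threads can read concurrently.
    public Plane GetPlane(int index)
    {
        _lock.EnterReadLock();
        try { return _planes[index]; }
        finally { _lock.ExitReadLock(); }
    }

    // Only one thread at a time can add a plane, and no reads happen while it does.
    public int AddPlane(Plane plane)
    {
        _lock.EnterWriteLock();
        try
        {
            _planes.Add(plane);
            return _planes.Count - 1;
        }
        finally { _lock.ExitWriteLock(); }
    }
}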
Posted 2018-12-13 16:17:43 UTC Post #341444
I've been keeping track of code I've found in the original version that could be optimized; here's what I've got so far:
  • The aforementioned use of a reader-writer lock to ensure thread-safe access to the planes array eliminates potential garbage reads
  • A fair amount of code leaks memory by not freeing the faces list created by MakeBrushPlanes. Using a std::unique_ptr with FreeFaceList as its custom deleter eliminates the leak. Though I've never heard of CSG running out of memory, in a shared process this matters a lot more, especially once RAD has started. This isn't a problem in my version of the tools because it's garbage collected
  • The faces list could be converted to a std::vector<bface_t> to automatically manage the memory and to allow faster iteration by storing the faces by value. Note that this will increase the cost of sorting the sides (the entire face needs to be copied, instead of just changing its position in the list), but that cost may be offset by the speed increase resulting from all faces being located in the same memory range, and thus more likely to be in the CPU cache. Sorting is only done once per brush per hull (4 hulls) at the most (disabling clipping can skip the whole process, some contents types don't create faces at all either), but iteration is done many times, and additional faces are added during the brush expansion process
  • Origin brush processing and zhlt_minsmaxs handling both use the CreateBrush function to get the bounding box from a brush. However CreateBrush does a lot more work than just calculating the bounding box, work that can be avoided entirely thus speeding up the parsing of the .map file. If properly reworked the number of planes generated can also be reduced by not letting these two features add planes to the array used by the actual brush creation operation. This can reduce the chances of hitting the internal and/or engine limit on the number of planes, since the planes that contain origin and zhlt_minsmaxs brushes are not often going to be used by other brushes
  • Merging CSG and BSP will cut down on work done to save and load the data generated by CSG, speeding up both tools and eliminating a possible data corruption issue if the plane data is modified between CSG and BSP execution
  • As is the case with the engine, there's code that uses constants like 99999/-99999 (with a varying number of nines) to denote maximum/minimum values, which can cause problems when the engine limit is increased and larger maps are possible. For instance the function isInvalidHull uses 99999 as its limit, so brushes that are partially or completely beyond this limit can end up marked as invalid. Using the INT_MIN and INT_MAX constants from <climits> (or std::numeric_limits from <limits>) appropriately should solve the problem
Posted 2018-12-13 20:49:23 UTC Post #341446
I'm posting this separately for visibility: does anyone have a large map that takes a while (~5 or more minutes) to fully compile using a full run of all tools? I'm going to run a performance analysis tool to find the code that's taking the longest so I can see if anything can be optimized. Ideally with default textures only, so I don't need to reference other wad files.
Posted 2018-12-14 07:04:19 UTC Post #341448
I've used Stalkfire: https://twhl.info/vault/view/6170
Takes around 5-7 mins. Though custom textures are necessary.
rufee - Sledge fanboy
Posted 2018-12-17 02:45:45 UTC Post #341463
I wonder if anyone's ever explored the idea of using a rendering engine like the one in Blender to generate light maps for old games
Oskar Potatis 🦔
Posted 2018-12-21 09:22:02 UTC Post #341471
A compiler? In .NET?? :facepalm:
What did C ever do to you? :_(
Posted 2018-12-21 11:06:50 UTC Post #341472
It's easier to write everything in the same language so the code can be shared. The engine's data structures for BSP data can also be used by the compiler, so I can increase limits very easily.
It's also much easier to provide debug info when the compiler breaks, and it's easier to maintain this codebase since it's designed and written by one person in one go, without a bunch of #ifdefs and varying code styles.

Also, C is very poorly suited to writing tools and games these days. A fair amount of code in these tools does nothing more than free memory, something that even C++ could do automatically through deterministic destructors.