If you find your script isn’t running fast enough, there are several possibilities to improve its performance.
It is always a good idea to begin with finding which part
of the code is consuming the execution time. Gauche has
a couple of basic tools to help it. A built-in sampling profiler,
explained in the following subsection, can show how much time
is spent in each procedure, and how many times it is called.
The gauche.time
module (gauche.time
- Measure timings) provides
APIs to measure execution time of specific parts of the code.
Optimization is specialization—you look for the most common patterns of execution, and put a special path to support those patterns efficiently. Gauche itself is no exception, so there are some patterns Gauche can handle very efficiently, while some patterns it cannot. The next subsection, Performance tips, will give you some tips of how to adapt your code to fit the patterns Gauche can execute well.
• Using profiler: | ||
• Performance tips: |
As of 0.8.4, Gauche has a built-in profiler. It is still experimental quality and only be tested on Linux. It isn’t available for all platforms. It works only in single-thread applications for now.
To use the profiler non-interactively, give -ptime
command-line option to gosh.
% gosh -ptime your-script.scm
After the execution of your-script.scm is completed, Gauche prints out the table of functions with its call count and its consumed time, sorted by the total consumed time.
Profiler statistics (total 1457 samples, 14.57 seconds) num time/ total Name calls call(ms) samples ---------------------------------------------------+------+-------+----------- combinations* 237351 0.0142 337( 23%) (lset-difference #f) 1281837 0.0020 256( 17%) (make-anchor make-anchor) 3950793 0.0005 198( 13%) member 4627246 0.0004 190( 13%) filter 273238 0.0030 81( 5%) every 1315131 0.0004 59( 4%) (lset-difference #f #f) 1281837 0.0004 54( 3%) (make-entry make-entry) 730916 0.0005 40( 2%) (clear? #f) 730884 0.0005 33( 2%) (initialize #f) 599292 0.0005 32( 2%) fold 237307 0.0013 30( 2%) acons 806406 0.0004 29( 1%) clear? 33294 0.0084 28( 1%) (combinations* #f) 805504 0.0002 15( 1%) (make-exit make-exit) 730884 0.0002 15( 1%) lset-difference 237318 0.0006 15( 1%) reverse! 475900 0.0001 6( 0%) (fold <top> <top> <list>) 237323 0.0003 6( 0%) procedure? 238723 0.0002 4( 0%) pair? 237307 0.0001 3( 0%) : :
Note that the time profiler uses statistic sampling. Every 10ms the profiler interrupts the process and records the function that is executed then. Compared to the individual execution time per function call, which is the order of nanoseconds, this sampling rate is very sparse. However, if we run the program long enough, we can expect the distribution of samples per each function approximately reflects the distribution of time spent in each function.
Keep in mind that the number is only approximation; the number of sample counts for a function may easily vary if the program deals with different data sets. It should also be noted that, for now, GC time is included in the function in which GC is triggered. This sometimes causes a less important function to "float up" to near-top of the list. To know the general pattern, it is a good custom to run the program with several different data sets.
On the other hand, the call count is accurate since Gauche actually counts each call.
Because all functions are basically anonymous in Scheme, the ’name’ field of the profiler result is only a hint. The functions bound at toplevel is generally printed with the global variable name it is bound at the first time. Internal functions are printed as a list of names, reflecting the nesting of functions. Methods are also printed as a list of the name and specializers.
The profiler has its own overhead; generally the total process time will increase 20-30%. If you want to turn on the profiler selectively, or you’re running a non-stop server program and want to obtain the statistics without exiting the server, you can call the profiler API from your program; see Profiler API, for the details.
Don’t guess, just benchmark. It is the first rule of performance tuning. Especially for the higher-level languages like Scheme, what impacts on performance greatly depends on the implementation. Certain operations that are very cheap on an implementation may be costly on others. Gauche has such implementation-specific characteristics, and to know some of them would help to see what to look out in the benchmark results.
"80% of execution time is spent in 20% of the code" is another old saying. Don’t obscure your code by "potential" optimization that has little impact on the actual execution. We describe some tips below, but it doesn’t mean you need to watch them all the time. It is better to keep most of the code clean and easy to understand, and only do tricks on the innermost loop.
Ports: To satisfy the specification of SRFI-18 (Threading),
every call to I/O primitives of Gauche locks the port.
This overhead may be visible if the application does a lot of I/O
with smaller units (e.g. every bytes). The primitives that deals
with larger unit, such as read
and read-uvector
, are less
problematic, since usually they just lock the port once per call and do
all the low-level I/O without the lock overhead.
(Note: this doesn’t mean those primitives guarantee
to lock the port throughout the execution of the function;
furthermore, the port locking feature is optimized for the case
that port accesses rarely collide. If you know it is possible that
more than one threads read from or write to the same port, it is
your responsibility to use mutex explicitly to avoid the collision.)
If you find out the locking is indeed a bottleneck, there are couple
of things you can consider: (1) Try using the larger-unit primitives,
instead of calling the smaller-unit ones. (2) Use with-port-locking
(see Port and threads) to lock the port in larger context.
Strings: Because of the multibyte strings, two operations
are particularly heavy in Gauche: string mutation and indexed string access.
It is a design choice; we encourage the programming style that
avoids those operations. When you sequentially access the string,
string ports (see String ports) provide a cleaner and
more efficient way.
When you search and retrieve a substring, there are various
higher-level primitives are provided (see String utilities,
Regular expressions, and srfi.13
- String library, for example).
If you’re using strings to represent an octet sequence, use uniform
vectors (see Uniform vectors) instead.
Deep recursion: Gauche’s VM uses a stack for efficient local frame allocation. If recursion goes very deep (depending on the code, but usually several hundreds to a thousand), the stack overflows and Gauche moves the content of the stack into the heap. This incurs some overhead. If you observe a performance degradation beyond a certain amount of data, check out this possibility.
Generic functions: Because of its dynamic nature, generic function calls are slower than procedure calls. Not only because of the runtime dispatch overhead, but also because Gauche’s compile-time optimizer can’t do much optimization for generic function calls. You don’t need to avoid generic functions because of performance reasons in general, but if you do find single function call consuming a large part of execution time and it is calling a generic function in its inner loop—then it may be worth to modify it.
Redefining builtin functions: Gauche inlines some builtin functions if they are not redefined. Although sometimes it is useful to redefine basic functions, you may want to limit the effect. For example, put redefined functions in a separate module and use the module in the code that absolutely needs those functions replaced.
Closure creation: When you create a closure, its closing environment
is copied to the heap. This overhead is small, but it still may be
visible when a closure is created within an innermost loop that is
called millions of times. If you suspect this is a problem,
try disassemble the function. Gauche’s compiler uses some basic
techniques of closure analysis to avoid creating closures for
typical cases, in which case you see the local function’s bodies
are inlined. If you see a CLOSURE
instruction, though, it
means a closure is created.
This list isn’t complete, and may change when Gauche’s implementation is improved, so don’t take this as fixed features. We’ll adjust it occasionally.