#2882 Fantom vs. C, D, Basic, Fortran -- optimized large CSV read, parse, and write back to file

SpaceGhost Sat 17 Dec 2022

Thank you Henry! You got me motivated to do some testing.

This is a follow-up to Forum #2881 util::CsvInStream readAllrows: https://fantom.org/forum/topic/2881#c5/ but I wanted to give the topic a different title for search purposes.

I read a large CSV into one of: a list (Fantom and Basic), a pure array (Basic), or a "helper" list-of-arrays combo (somewhat unique to Basic). Then I wrote it to my SSD from the list and/or array as fast as possible.

I programmed and ran this in Fantom (code shown below for both cases, with help from Henry and SlimerDude), and previously, on my own, in C, D, and Basic, all compiler-optimized (production rather than debug builds).

Since Basic was slightly faster than C and D, I only listed Basic below. In my experience, FORTRAN can hang with Basic for apps like these; they are about the same.

Compilation times (not run time) for Basic were basically instant, and the native binary is only 3KB (you gotta respect Basic compilers).

Fantom, of course, runs on the JVM, which JIT-compiles its bytecode at run time, so there is no separate compilation step.

Input CSV (input.csv) details:
size: 50MB
rows: 207,361
columns: 32
cells: 6,635,552

Informal Runtime Benchmark Results

Fantom: lists for-each: 2.687s (good job Henry!)
Fantom: lists for-next: 2.778s
Basic: arrays for-next: 2.786s
Basic: arrays for-next: 3.280s
Basic: lists for-next: 5.025s

I have to admit I am shocked that Fantom (on the JVM) just edged out optimized C, D, and Basic solutions. Extremely impressive!

Here is the source code for the two Fantom configurations:

Fantom: lists for-each: 2.687s (credit to Henry)

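using util

class Main
{
  Void main()
  {
    // Read and parse the whole CSV into memory as a list of rows
    Str[][] csvIn := CsvInStream(File(`./src/input.csv`).in).readAllRows

    // Remove any previous output (no-op if the file does not exist)
    outFile := File(`./src/output.csv`)
    outFile.delete

    outStream := CsvOutStream(outFile.out(true))

    // Write each row back out with a for-each closure
    csvIn.each |Str[] row| {
      outStream.writeRow(row)
    }
    outStream.close
  }
}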

Fantom: lists for-next: 2.778s (my take on Henry's code w/ for-next)

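using util

class Main
{
  Void main()
  {
    // Read and parse the whole CSV into memory as a list of rows
    Str[][] csvIn := CsvInStream(File(`./src/input.csv`).in).readAllRows
    rows := csvIn.size
    cols := csvIn[0].size
    echo("csvIn number of rows: $rows")
    echo("csvIn number of columns: $cols")

    // Remove any previous output (no-op if the file does not exist)
    outFile := File(`./src/output.csv`)
    outFile.delete

    outStream := CsvOutStream(outFile.out(true))

    // Write each row back out with a classic indexed for loop
    for (i := 0; i < rows; i++)
    {
      outStream.writeRow(csvIn[i])
    }
    outStream.close
  }
}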
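Neither snippet shows how the times above were taken. Here is a minimal sketch, assuming the same ./src/input.csv and ./src/output.csv paths, of one way to time the round trip from inside Fantom itself. The class name CsvTiming is hypothetical (not from the thread). It measures each readAllRows/writeRow pass with Duration.now and repeats the pass three times, because on the JVM the first pass usually also pays for class loading and JIT warm-up that later passes do not.

```fantom
using util

** Hypothetical timing harness (not from the original post): repeats the
** read-all-rows / write-all-rows round trip and prints each pass's time.
class CsvTiming
{
  Void main()
  {
    inFile  := File(`./src/input.csv`)
    outFile := File(`./src/output.csv`)

    3.times |Int pass|
    {
      start := Duration.now

      // Read and parse the whole file into memory
      Str[][] rows := CsvInStream(inFile.in).readAllRows

      // Write every row back out; out() without append overwrites the file
      csvOut := CsvOutStream(outFile.out)
      rows.each |Str[] row| { csvOut.writeRow(row) }
      csvOut.close

      elapsed := Duration.now - start
      echo("pass ${pass + 1}: ${rows.size} rows in ${elapsed.toMillis}ms")
    }
  }
}
```

If later passes come in noticeably faster than the first, the single-run numbers include warm-up cost, which is worth keeping in mind when comparing against native binaries.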

End of Reply

SlimerDude Sun 18 Dec 2022

Interesting, it reminds me of a much older post where someone else benchmarked parsing .csv files and found that Fantom was faster than Java!

See Fantom speed impressive...

Gary Tue 20 Dec 2022

Cool study! Which version of BASIC are you using, since there are so many versions?

SpaceGhost Tue 20 Dec 2022

Hi Chikega!

I have two "production ready" Basic platforms and one "hobby" platform. I love all three; I'm a sucker for a good Basic platform. That said, the production-ready ones are Power Basic and Pure Basic. The hobby one is QB64 (a 64-bit version compatible with historic QBasic code, but GREATLY extended). All three have a built-in IDE and debugger.

Pure Basic is the most up-to-date and has immediate F1 help on whatever the cursor is on, straight to the documentation (>1,000 pages); very nice. It has a Pythonic-style syntax. Power Basic has a really old-school Basic syntax and also has amazing help. I like the syntax of Pure better than Power, and Pure's IDE has a dark mode and is very nice. QB64 is lovely and has F1 help, but it has a more limited built-in function list, i.e. hundreds vs. >1,000. It has an old-school IDE that works just fine, but it is retro.

For production work I would recommend Pure Basic. It transpiles to C and then to machine code (behind the scenes), and it is 64-bit. Power Basic goes right to machine code. Neither needs static or dynamic libraries on Windows, so the .exe files are super tiny. Both compilers are basically instant, like Go's; you cannot believe they actually compile that fast. QB64 transpiles to C++ and then to machine code. Its compiler is slower than the other two but still OK. QB64 is just really fun and reminds me of yesteryear. It is still very capable, and you can literally run MS QBasic code that is decades old.

Gary Sat 24 Dec 2022

Hi SpaceGhost! I hate to say I'm old enough to have watched Space Ghost cartoons as a kid. :D Well, now I know why you have such blistering speeds! I've heard about the history of PowerBasic (assembly language backend) and its genius developer, Bob Zale, may he rest in peace. The BASIC variants you describe are definitely not the interpreted type.

I've followed a few YouTube video tutorials, PowerBasic for Beginners, and the gentleman is a great instructor and very prolific despite not getting a lot of likes. I'm more familiar with PureBasic and have toyed around with QB64. What a mess they had recently, with QB64 Phoenix Edition rising after the squabble.

I absolutely love PureBasic for the reasons you describe: F1 for help, the syntax is great, and the IDE is small, clean, and snappy. Well, everything about PureBasic is snappy. I'm by no means a professional programmer and I'm still learning; I'm perhaps at an intermediate level. I enjoyed watching the YouTube video series Pure Programming by Guillaume, but for whatever reason Guillaume stopped making tutorials last year. :-/

I've been playing around with Fantom as well, and it's interesting to see that a managed language like Fantom is so fast too. :)

Cheers,

Gary

Gary Thu 17 Aug 2023

I watched an episode of Dave's Garage on YouTube, "Top 5 Fastest Programming Languages". Dave was surprised that Java, a managed language, was able to crack the top 5, especially in the company of unmanaged languages like C++, C, Rust, and Zig. So it appears the JVM is highly optimized, which is a win for Fantom.

Interesting watch: https://youtu.be/pSvSXBorw4A
