RomRaider

Open Source ECU Tools

All times are UTC - 5 hours [ DST ]





 Post subject: UTILITY: ScoobyTables - tables markup utility
PostPosted: Fri Mar 22, 2024 10:28 am 
RomRaider Donator

Joined: Fri Aug 26, 2016 4:21 am
Posts: 153
Hi guys!

I've made a utility that should significantly reduce the time spent on initial ROM analysis, namely table markup. It can detect the table type: boost, ignition timing, tip-in and others. Of course, it must first be trained: you need to prepare several definitions for it to learn from. For best results, these definitions should come from the same CPU architecture, the same car model and nearby model years.

You can get ScoobyTables here:

https://github.com/aalesv/ScoobyTables

You can get pre-trained model(s) here:

https://github.com/aalesv/ScoobyTables-pretrained

ScoobyTables

This software is designed for automatic Subaru Denso ROM table markup using machine learning. Currently, only the k-nearest neighbors algorithm is supported because it has distance metrics that can effectively eliminate false positives.

What ROMs are supported

For now, only Subaru Denso ROMs (and possibly other ROMs that use the same table format) are supported. This includes ROMs for the SH7055, SH7058 and SH72531.

How to use, briefly

ScoobyTables works with defs in ScoobyRom XML or CSV format. You can get ScoobyRom here. This is a fork of ScoobyRom that has support for CSV export and some other improvements.

  • Get ScoobyRom if you don't have it
  • Download ScoobyTables.py to a separate directory
  • Install Python and then pandas, numpy, pyarrow, scikit-learn and whatever else is needed
  • Train your own model or get a pre-trained one from the ScoobyTables-pretrained repository linked above
  • Copy your ROM to the directory where ScoobyTables.py is located
  • Open it with ScoobyRom, select all 2D and all 3D tables, and save them by pressing 'Ctrl+S' or by clicking the menu 'File\Save XML'. Defs will be saved to file 'YOUR-ROM-NAME.xml' located in the same directory.
  • Run
    Code:
    ScoobyTables.py --predict YOUR-ROM-NAME.xml -v
  • Predicted XML defs should be in the output.xml file
  • Back up YOUR-ROM-NAME.xml if you want, then rename output.xml to YOUR-ROM-NAME.xml
  • Reopen your ROM with ScoobyRom
  • Inspect tables; you should see tables that ScoobyTables has found.
  • The software is not 100% accurate. There will be incorrectly detected or named tables, some tables could be missing. Please verify every table before proceeding further!
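
The backup-and-rename step can also be scripted. This is a minimal sketch, not part of ScoobyTables; the function name `swap_in_predicted` and the `.bak` suffix are my own choices for illustration.

```python
import shutil
from pathlib import Path


def swap_in_predicted(rom_xml: Path, predicted_xml: Path) -> Path:
    """Back up rom_xml (if it exists) and move predicted_xml into its place.

    Returns the path the backup is written to.
    """
    backup = rom_xml.with_name(rom_xml.name + ".bak")
    if rom_xml.exists():
        # Keep the original definitions so they can be restored later.
        shutil.copy2(rom_xml, backup)
    # output.xml becomes YOUR-ROM-NAME.xml, which ScoobyRom opens with the ROM.
    shutil.move(str(predicted_xml), str(rom_xml))
    return backup


# Example call, run from the directory that holds the ROM:
# swap_in_predicted(Path("YOUR-ROM-NAME.xml"), Path("output.xml"))
```

After running it, reopen the ROM with ScoobyRom as described above.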

How it works and why exactly this way and not otherwise

ScoobyTables uses the k-nearest neighbors algorithm. This means that every table is represented as a point in n-dimensional space. Then the distance from each new, unknown table to every known table is calculated. Based on this distance, a decision is made about class membership.
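
To make the idea concrete, here is a tiny pure-Python sketch of k-nearest neighbors. It is not the ScoobyTables code (which, judging by the install list, relies on scikit-learn); the feature values and labels are made up, and Euclidean distance is an assumption.

```python
import math
from collections import Counter


def knn_predict(known, query, k=2):
    """Classify `query` by majority vote among its k nearest known points.

    known: list of (point, label) pairs, where point is a tuple of numbers.
    Returns (label, distance to the closest neighbor that carries that label).
    """
    ranked = sorted((math.dist(point, query), label) for point, label in known)
    nearest = ranked[:k]
    votes = Counter(label for _, label in nearest)
    label = votes.most_common(1)[0][0]
    distance = min(d for d, lbl in nearest if lbl == label)
    return label, distance


# Made-up points: (relative position, X axis length, Y axis length)
known_tables = [
    ((0.10, 16, 16), "base_timing"),
    ((0.12, 16, 16), "base_timing"),
    ((0.80, 8, 1), "tip_in"),
]
label, distance = knn_predict(known_tables, (0.11, 16, 16), k=2)
```

The returned distance is what makes it possible to reject a prediction when even the closest known table is far away.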

The idea is that table placement does not change significantly from ROM to ROM. Of course, I'm talking about ROMs that are close enough: same car models, same CPU models, nearby years. The best illustration of this is the Subaru Forester Gen 4 with the SH72531 CPU. All the tables in those ROMs are located very similarly relative to each other. That's why this approach works.

The table contents themselves do not matter, only the data that describes the table:

  • Relative placement of the table structure in the ROM
  • Length of table axes
  • Data type of X and Y axes - RPM, temperature, engine load, etc.
  • Data type of table data itself
  • Table multiplier and offset, which are needed to convert from integer to float

The relative placement of a table structure is easy to calculate if its address is known, but only ScoobyRom stores addresses. The length of the table axes is defined in most defs except ScoobyRom's. The data type of the X or Y axis can be approximated by the average of the axis values, which can be calculated; fortunately, ScoobyRom stores min and max values in the defs' comments. The data type of the table data itself is defined by every tool. The table multiplier and offset are not stored directly in any definition.
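
A rough sketch of how such a description could be turned into a numeric point follows. All field names here are hypothetical and are not taken from ScoobyTables; using the midpoint of min and max as a stand-in for the axis average, and `raw * multiplier + offset` as the conversion order, are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass
class TableDescription:
    # Hypothetical field names, chosen for this illustration only.
    rel_position: float    # placement relative to other tables, 0.0..1.0
    x_len: int             # number of points on the X axis
    y_len: int             # number of points on the Y axis (0 if there is none)
    x_min: float
    x_max: float
    y_min: float
    y_max: float
    data_type_code: int    # numeric code standing in for the storage type
    multiplier: float
    offset: float

    def to_vector(self):
        """Flatten the description into a list of floats for a distance metric."""
        x_mid = (self.x_min + self.x_max) / 2   # rough proxy for the axis average
        y_mid = (self.y_min + self.y_max) / 2
        return [
            self.rel_position, float(self.x_len), float(self.y_len),
            x_mid, y_mid, float(self.data_type_code),
            self.multiplier, self.offset,
        ]


def raw_to_float(raw, multiplier, offset):
    """Convert a stored integer to a physical value (assumed linear scaling)."""
    return raw * multiplier + offset
```

Note that the table cells themselves never enter the vector, which matches the point that only the describing data matters.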

So, no existing defs contain all the information. ScoobyRom XML defs are more or less suitable, but some info needs to be extracted from the binary ROM file. Alternatively, I could modify ScoobyRom to export all the data I need in some format, for example CSV, which pandas can import easily. Well, I did both.

That's why, if you want to use the XML format, you need both the .bin and .xml files. But you can use the modified ScoobyRom to export to CSV, and then you don't need the .bin file. By the way, I highly recommend using the modified ScoobyRom because it saves all annotated tables and all selected tables, whereas the original ScoobyRom saves only annotated tables. And to calculate the relative table position, we need to have all the tables in the def.
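
Here is one possible way to define a relative position, just to show why missing tables matter. I am assuming a rank-based definition for this sketch; the real calculation in ScoobyTables may differ, and the addresses are made up.

```python
def relative_positions(addresses):
    """Map each table address to its rank among all tables, scaled to 0.0..1.0."""
    ordered = sorted(addresses)
    last = max(len(ordered) - 1, 1)   # avoid division by zero for a single table
    return {addr: index / last for index, addr in enumerate(ordered)}


# Made-up addresses of five tables, and the same ROM with two tables left out.
all_tables = [0x1000, 0x1040, 0x1100, 0x1200, 0x1300]
some_tables = [0x1100, 0x1200, 0x1300]

full = relative_positions(all_tables)
partial = relative_positions(some_tables)
```

With all five tables present, the table at 0x1100 sits in the middle of the range; with the first two missing, the same table moves to the very start, so its point lands somewhere else entirely.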

And now a few words about results. KNN predicts very well, but it makes many wrong predictions on incomplete data, and hardly anyone has defined all the tables in a ROM. To filter out those wrong predictions, distance thresholds are used: one for 2D tables (because they are more similar to each other) and another for 3D tables. See the CLI help.
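
The filtering step could look roughly like this. It is a sketch with made-up predictions; the two default values come from the CLI help, while the tuple layout and the inclusive comparison are assumptions.

```python
# Defaults as shown in the CLI help for the 2D and 3D reliable-metric options.
DEFAULT_THRESHOLDS = {"2D": 0.5, "3D": 5.0}


def filter_predictions(predictions, thresholds=DEFAULT_THRESHOLDS):
    """Keep only predictions whose neighbor distance is within the threshold.

    predictions: iterable of (table_kind, label, distance) tuples,
    where table_kind is "2D" or "3D".
    """
    kept = []
    for kind, label, distance in predictions:
        if distance <= thresholds[kind]:
            kept.append((kind, label, distance))
    return kept


# Made-up predictions for illustration.
predictions = [
    ("2D", "tip_in", 0.2),
    ("2D", "tip_in", 0.9),         # too far for a 2D table, dropped
    ("3D", "base_timing", 3.0),
    ("3D", "boost", 7.5),          # too far for a 3D table, dropped
]
reliable = filter_predictions(predictions)
```

A table that is dropped here simply stays unnamed, which is safer than a wrong name.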

How to train

First, you need to prepare the data. It is crucial that the same table has the same name in all ROMs. Letter case does not matter, and a table name may end with '_1', '_2' etc. or '_A', '_B' etc.; any such suffix is stripped. For example, the names 'Base_Timing_1' and 'Base_Timing_A' are good, while 'BaseTimingA' and 'Base_Timing1' are not. This greatly affects prediction accuracy.
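
The naming rule can be illustrated with a short sketch. The regular expression is my guess at the described behavior (lower-casing, then stripping one trailing '_<digits>' or '_<single letter>'), not the actual ScoobyTables code.

```python
import re

# One trailing suffix: an underscore followed by digits or by a single letter.
_SUFFIX = re.compile(r"_(\d+|[a-z])$")


def normalize_table_name(name):
    """Lower-case the name and strip a trailing '_1' / '_A' style suffix."""
    return _SUFFIX.sub("", name.lower())


good = [normalize_table_name(n) for n in ("Base_Timing_1", "Base_Timing_A")]
bad = [normalize_table_name(n) for n in ("BaseTimingA", "Base_Timing1")]
```

Both good names collapse to the same class name, while the two bad names end up as two different classes that also differ from the good one, so the model treats them as unrelated tables.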

I assume that you use the modified ScoobyRom 0.8.5 or later.

  • Open the ROM and select all 3D and 2D tables
  • Check that there are no erroneous tables at the start and at the end of the lists. Unselect any erroneous tables
  • Save XML or export to CSV
  • Put your CSV files or ROM+XML files in a separate directory, for example, directory 'dataset'
  • Repeat for all your training ROMs
  • Run
    Code:
    ScoobyTables.py --train dataset -v --test-accuracy -i <xml|csv>
  • This means: load all data files from the directory 'dataset', be verbose, and test accuracy
  • If you use XML, you can omit the '-i' parameter. Otherwise, you should specify '-i csv'
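
Before training, it may help to check that the dataset directory is complete. This is a small sketch and not part of ScoobyTables; it assumes that each ROM is stored as NAME.bin next to its NAME.xml, and the function name is my own.

```python
from pathlib import Path


def check_dataset(directory, input_format="xml"):
    """Return a list of human-readable problems found in the dataset directory."""
    root = Path(directory)
    problems = []
    if input_format == "csv":
        # CSV exports are self-contained, so no ROM file is needed.
        if not any(root.glob("*.csv")):
            problems.append("no CSV files found")
        return problems
    xml_files = sorted(root.glob("*.xml"))
    if not xml_files:
        problems.append("no XML files found")
    for xml in xml_files:
        # Assumption: the ROM has the same base name with a .bin extension.
        if not xml.with_suffix(".bin").exists():
            problems.append(f"{xml.name}: matching .bin ROM file is missing")
    return problems
```

An empty list means the directory looks usable for the chosen input format.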

Command line parameters

One of the arguments '--train' or '--predict' is required. Run with '--help' to see help.

If you specify '--predict', the file format is guessed from the extension. Only ScoobyRom XML and CSV formats are supported.

You can set a distance threshold above which a classification is ignored: '--knn-min-2d-reliable-metric' for 2D tables and '--knn-min-3d-reliable-metric' for 3D tables.

Full CLI help:
Code:
usage: ScoobyTables.py [-h] (--train <dirname> | --predict <filename>) [-i {xml,csv}] [--model-dump-file <filename>]
                       [-v] [--neighbors <number of neighbors>] [--test-accuracy]
                       [--test-sample-size <float number from 0.0 to 1.0>] [--random-state <int number>] [--dry-run]
                       [--knn-min-2d-reliable-metric <float number>] [--knn-min-3d-reliable-metric <float number>]
                       [--pre-xml-filename <filename>] [--dump-txt] [--pre-txt-filename <filename>] [--version]

options:
  -h, --help            show this help message and exit
  --train <dirname>     Train, test and dump model. Specify --test-accuracy to test accuracy. (default: None)
  --predict <filename>  Predict. Get data from <filename>. XML and CVS formats are supported and autodetected based on
                        the extension. (default: None)
  -i {xml,csv}, --input-format {xml,csv}
                        Input data format. (default: xml)
  --model-dump-file <filename>
                        Model dump file name. (default: scoobytables.dmp)
  -v, --verbose         Be verbose. (default: False)
  --neighbors <number of neighbors>
                        Number of neighbors for KNN model. (default: 2)
  --test-accuracy       Test model accuracy during model training. (default: False)
  --test-sample-size <float number from 0.0 to 1.0>
                        Test sample size. (default: 0.1)
  --random-state <int number>
                        Test sample random state pseudo-random number. If not set, test sample is purely random.
                        (default: None)
  --dry-run             Do not save anything to files. (default: False)
  --knn-min-2d-reliable-metric <float number>
                        Minimum reliable metric for 2D tables. (default: 0.5)
  --knn-min-3d-reliable-metric <float number>
                        Minimum reliable metric for 3D tables. (default: 5)
  --pre-xml-filename <filename>
                        Predicted XML definitions file name. (default: output.xml)
  --dump-txt            Write predicted data to text file. (default: False)
  --pre-txt-filename <filename>
                        Predicted dataframe text file name. (default: output.txt)
  --version             Print version number.
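
The help text above has the shape that Python's argparse produces, so an interface like this could be declared roughly as follows. This is only a guess covering a few of the options, not the actual ScoobyTables source; the function name `build_parser` is made up.

```python
import argparse


def build_parser():
    """Declare a subset of the options shown in the help text above."""
    parser = argparse.ArgumentParser(
        prog="ScoobyTables.py",
        # Appends "(default: ...)" to every help string, as in the help above.
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    # Exactly one of --train / --predict must be given.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--train", metavar="<dirname>",
                      help="Train, test and dump model.")
    mode.add_argument("--predict", metavar="<filename>",
                      help="Predict. Get data from <filename>.")
    parser.add_argument("-i", "--input-format", choices=["xml", "csv"],
                        default="xml", help="Input data format.")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Be verbose.")
    parser.add_argument("--neighbors", type=int, default=2,
                        metavar="<number of neighbors>",
                        help="Number of neighbors for KNN model.")
    parser.add_argument("--knn-min-2d-reliable-metric", type=float, default=0.5,
                        metavar="<float number>",
                        help="Minimum reliable metric for 2D tables.")
    parser.add_argument("--knn-min-3d-reliable-metric", type=float, default=5,
                        metavar="<float number>",
                        help="Minimum reliable metric for 3D tables.")
    return parser
```

With a required mutually exclusive group, argparse itself rejects a command line that has neither '--train' nor '--predict', or both.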


There are also two additional scripts:

beautify_xml.py makes the predicted ScoobyRom XML file more readable:
Code:
usage: beautify_xml.py [-h] [-i <filename>] [-o <filename>] [-F <symbol>] [--numeric-suffix] [--version]

Add suffixes "_A", "_B" etc or "_1", "_2" etc to same table names.

options:
  -h, --help            show this help message and exit
  -i <filename>, --input <filename>
                        Input filename (default: None)
  -o <filename>, --output <filename>
                        Output filename. stdout if not specified (default: None)
  -F <symbol>, --word-separator <symbol>
                        Word separator (default: _)
  --numeric-suffix      Suffix is numeric (default: False)
  --version             Print version number.

For better results please clean and rename tables first!


def_to_sr.py is an IDA 6.8 (Python 2.7) script that converts RomRaider definitions to ScoobyRom definitions. Run it inside IDA.

Happy hacking!

_________________
2Boost Subaru mod


 Post subject: Re: UTILITY: ScoobyTables - tables markup utility
PostPosted: Tue Mar 26, 2024 1:29 pm 
Newbie

Joined: Tue Apr 05, 2022 12:57 pm
Posts: 58
Cool stuff!

I did this manually to compare a 2013 vs 2014 Tribeca, so I would be interested in trying out the script. I probably won't be able to get to it for a few weeks due to many other automotive side-quests at the moment.

https://www.romraider.com/forum/viewtopic.php?f=8&t=15179

_________________
2000 Subaru Impreza 2.5RS EJ251 5MT TY754
2005 Saab(aru) 9-2x Linear EJ253 5MT TY754
2014 Subaru Tribeca EZ36D 5EAT TG5D


 Post subject: Re: UTILITY: ScoobyTables - tables markup utility
PostPosted: Wed Mar 27, 2024 10:48 am 
RomRaider Donator

Joined: Fri Aug 26, 2016 4:21 am
Posts: 153
I have not tested training with only one ROM, but it could work. The following could improve accuracy: do not test accuracy when training (omit the '--test-accuracy' flag) and set the number of neighbors to 1 (by specifying '--neighbors 1').

_________________
2Boost Subaru mod

