How can I use docker without sudo?

Question

gatorback

Asked: 2019-08-15 09:23:34 +0800 CST2019-08-15 09:23:34 +0800 CST 2019-08-15 09:23:34 +0800 CST

CSV table creation from image

772

What are the Ubuntu options to create CSV tables from an image:

Ideally, options would be simple and quick. Image is from Yield Curve article

1 Answers

Voted

BeastOfCaerbannog · Answer 1 · 2019-08-18T05:48:36+08:00

I wrote a small Python script that can do what you want.

For the OCR part you would need tesseract. You can install it running:

sudo apt install tesseract-ocr

Then, run tesseract to create a txt file with the image-read data. I am naming this file tesseract_output (tesseract will add the .txt extension), you can name it as you wish.

tesseract /path/to/image /path/to/tesseract_output

Then copy and paste the following script and save it to a file ending with .py (for example script.py).

import csv

def split_list(l, n):
    """Split list l in size-n lists.
    Returns a list containing the size-n lists.
    """
    splitted = []

    for i in range(0, len(l), n):
        splitted.append(l[i:i + n])

    return splitted

######################   USER INPUT   ######################
input_file = '/path/to/tesseract_output.txt'
output_file = '/path/to/table.csv'
rows = 10
delimiter = ';'
############################################################

# read input file
with open(input_file, 'r') as f:
    data = f.readlines()

# remove trailing whitespace and newlines
data = list(map(lambda x: x.strip(), data))
# remove empty elements 
data = list(filter(None, data))
# split data to rows-sized lists
data = split_list(data, rows)
# shape data
data = list(map(list, zip(*data)))

# write to csv
with open(output_file, 'w', newline='\n') as f:
    wr = csv.writer(f, delimiter=delimiter)
    for i in range(len(data)):
        wr.writerow(data[i])

For the script to work, you have to enter the following in the USER INPUT section:

input_file: the complete path to the tesseract output.
output_file: the complete path to the final csv file.
rows: the number of table rows. In your example image it's 10.
delimiter: the delimiter to be used in the csv. Here I use ;. You can use any 1-character string you need.

Run the file:

python /path/to/your/script.py

You should now have a csv with the following contents:

Date;1Mo;2Mo;3Mo;6 Mo;4Yr;2Yr;3Yr;5Yr;TYr;10 Yr;20Yr;30 Yr
08/01/19;2.14;214;2.07;2.04;1.88;1.73;1.67;1.68;177;1.90;2.21;2.44
08/02/19;2.44;2.12;2.08;2.02;1.85;1.72;1.67;1.66;1.75;1.86;2.16;2.39
08/05/19;2.07;2.08;2.05;1.99;1.78;1.59;1.55;1.55;1.63;1.75;2.07;2.30
08/06/19;2.05;2.08;2.05;2.00;1.80;1.60;1.54;1.53;1.62;1.73;2.03;2.25
08/07/19;2.02;2.04;2.02;1.95;1.75;1.59;1.51;1.52;1.60;171;2.01;2.22
08/08/19;2.09;2.07;2.02;1.96;1.79;1.62;1.54;1.54;1.62;1.72;2.02;2.25
08/09/19;2.05;2.06;2.00;1.95;1.78;1.63;1.58;1.57;1.65;174;2.03;2.26
08/12/19;2.09;2.06;2.00;1.94;1.75;1.58;1.51;1.49;1.56;1.65;1.92;2.14
08/13/19;2.05;2.04;2.00;1.96;1.86;1.66;1.60;1.57;1.62;1.68;1.94;2.15

CAUTION

As you can see, the result is satisfactory, but depends on tesseract's output. It is almost certain that tesseract won't detect everything correctly, as you can easily see in the csv output. You will have to compare the results with the original image and fix them manually, either in the tesseract output or in the csv output at the end.

Also, in the script I am taking care of trailing whitespace and redundant newlines that tesseract spits out, which works fine for your example image. However, if a table cell was empty, it would be completely removed, effectively destroying the whole table structure. In this case, if I were you, I would edit the tesseract_output.txt file and manually change the empty cells to containing a -, so it wouldn't get deleted.

CSV table creation from image

How to install Google Chrome

Is there a command to list all users? Also to add, delete, modify users, in the terminal?

How to delete a non-empty directory in Terminal?

How to unzip a zip file from the Terminal?

How can I copy the contents of a folder to another folder in a different directory using terminal?

How do I install a .deb file via the command line?

How do I run .sh scripts?

How do I install a .tar.gz (or .tar.bz2) file?

How to list all installed packages

Unable to lock the administration directory (/var/lib/dpkg/) is another process using it?