Ahmad

Asked: 2020-07-02 08:41:17 +0800 CST2020-07-02 08:41:17 +0800 CST 2020-07-02 08:41:17 +0800 CST

Copy from Google Chrome doesn't paste all contents

In ubuntu 20.04 I try to copy a block of Google Colabratory opened in Chrome:

This was copied from Firefox, which is the complete text:

import fastai
from fastai.core import PathOrStr
from fastai.text.data import ListRules, get_default_size, quotemark, full_char_coverage_langs
from typing import Collection

def train_sentencepiece(texts:Collection[str], path:PathOrStr, pre_rules: ListRules=None, post_rules:ListRules=None,
    vocab_sz:int=None, max_vocab_sz:int=30000, model_type:str='unigram', max_sentence_len:int=20480, lang='en',
    char_coverage=None, tmp_dir='tmp', enc='utf8'):
    "Train a sentencepiece tokenizer on `texts` and save it in `path/tmp_dir`"
    from sentencepiece import SentencePieceTrainer
    cache_dir = Path(path)/tmp_dir
    cache_dir = Path("data/wiki/fa-2/models/fsp15k")
    quotemark = ''
    print("It's me:", cache_dir)
    os.makedirs(cache_dir, exist_ok=True)
    if vocab_sz is None: vocab_sz=get_default_size(texts, max_vocab_sz)
    raw_text_path = cache_dir / 'all_text.out'
    with open(raw_text_path, 'w', encoding=enc) as f: f.write("\n".join(texts))
    spec_tokens = ['\u2581'+s for s in defaults.text_spec_tok]
    SentencePieceTrainer.Train(" ".join([
        f"--input={quotemark}{raw_text_path}{quotemark} --max_sentence_length={max_sentence_len}",
        f"--character_coverage={ifnone(char_coverage, 0.99999 if lang in full_char_coverage_langs else 0.9998)}",
        f"--unk_id={len(defaults.text_spec_tok)} --pad_id=-1 --bos_id=-1 --eos_id=-1",
        f"--user_defined_symbols={','.join(spec_tokens)}",
        f"--model_prefix={cache_dir/'spm'} --hard_vocab_limit=false --vocab_size={vocab_sz} --model_type={model_type}"]))
    raw_text_path.unlink()
    return cache_dir
        
fastai.text.data.train_sentencepiece = train_sentencepiece

It's copied from Chrome:

import fastai
from fastai.core import PathOrStr
from fastai.text.data import ListRules, get_default_size, quotemark, full_char_coverage_langs
from typing import Collection

def train_sentencepiece(texts:Collection[str], path:PathOrStr, pre_rules: ListRules=None, post_rules:ListRules=None,
    vocab_sz:int=None, max_vocab_sz:int=30000, model_type:str='unigram', max_sentence_len:int=20480, lang='en',
    char_coverage=None, tmp_dir='tmp', enc='utf8'):
    "Train a sentencepiece tokenizer on `texts…",
        f"--character_coverage={ifnone(char_coverage, 0.99999 if lang in full_char_coverage_langs else 0.9998)}",
        f"--unk_id={len(defaults.text_spec_tok)} --pad_id=-1 --bos_id=-1 --eos_id=-1",
        f"--user_defined_symbols={','.join(spec_tokens)}",
        f"--model_prefix={cache_dir/'spm'} --hard_vocab_limit=false --vocab_size={vocab_sz} --model_type={model_type}"]))
    raw_text_path.unlink()
    return cache_dir
        
fastai.text.data.train_sentencepiece = train_sentencepiece

As you see, when I copy the block from Chrome and then paste it here, parts of the text is absent!!

Copy from Google Chrome doesn't paste all contents

How to install Google Chrome

Is there a command to list all users? Also to add, delete, modify users, in the terminal?

How to delete a non-empty directory in Terminal?

How to unzip a zip file from the Terminal?

How can I copy the contents of a folder to another folder in a different directory using terminal?

How do I install a .deb file via the command line?

How do I run .sh scripts?

How do I install a .tar.gz (or .tar.bz2) file?

How to list all installed packages

Unable to lock the administration directory (/var/lib/dpkg/) is another process using it?

Copy from Google Chrome doesn't paste all contents

0 Answers