Tool for collecting words

jap

Hello,

Does anyone know if there is a tool for collecting words from text files? What I need to do is to collect all unique words from a large text file and save them as a list: one word for each line.

jaokim

You can use a combination of sed, sort and uniq. Something like:

  sed "s/ /\n/g" /RAM/FileToSort.txt | SDK:local/C/sort | uniq

sed substitutes each " " with a "\n", and the g tells sed to perform the substitution for all matches, not just the first (s/THIS/THAT/g). sort sorts it all. uniq collapses adjacent duplicate lines, so each entry appears only once.

There is an Amiga-command "sort" clashing with the SDK sort, that's why you'd need to issue the full path. I'm assuming you have OS4 and the SDK.
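
For anyone who wants to see the three stages spelled out, here is a minimal Python sketch of the same idea. It is only an illustration of the pipeline, not something from the SDK; the function name unique_words and the sample text are made up. Like the sed command it splits on spaces only, so punctuation stays attached to the words and upper/lower case is not folded. Unlike the shell pipeline, it drops empty lines.

```python
def unique_words(text):
    # Hypothetical helper mirroring: sed (split) | sort | uniq
    # Stage 1 (sed): turn every space into a line break, then split into lines.
    lines = text.replace(" ", "\n").split("\n")
    # Stage 2 (sort): order the lines so that duplicates end up adjacent.
    lines.sort()
    # Stage 3 (uniq): keep a line only if it differs from the previous one.
    # Empty lines (from runs of spaces) are skipped here as a simplification.
    result = []
    for line in lines:
        if line and (not result or result[-1] != line):
            result.append(line)
    return result


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    print("\n".join(unique_words(sample)))
```

The sort step matters: uniq only compares neighbouring lines, so without sorting first the duplicates would not be removed.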

kas1e

Since no one has answered you (I think because the question has many easy solutions, and Google can help with most of them), here is a detailed answer to avoid silence in the forum:

1. Any scripting language (Python, Perl, ARexx and so on) can do it in a few lines, for example:

Perl

  aos4shell:> perl -nle "$w{$_}++ for grep /\w/, map { s/[\. ,]*$//g; lc($_) } split; END { print for keys %w }" < input_file.txt >words.txt

You can of course save it as script.pl and write it as a small script rather than a single line, like:

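  #!/usr/bin/perl

  use strict;

  # Count how often each word occurs.
  my %count = ();
  # Read in 4095-byte chunks, then finish the current line so no word is cut in half.
  while (read(STDIN, $_, 4095) and $_ .= <STDIN>) {
    tr/A-Za-z/ /cs;    # squeeze every run of non-letters into one space
    ++$count{$_} foreach split(' ', lc $_);
  }

  # Format as "count<TAB>word" and print in descending order.
  my @lines = ();
  my ($w, $c);
  push(@lines, sprintf("%7d\t%s\n", $c, $w)) while (($w, $c) = each(%count));
  print sort { $b cmp $a } @lines;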

The command to run it will be:

  aos4shell:> perl sort.pl < input.txt >output.txt
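
Since Python was mentioned as well, here is a rough sketch of the same counting idea in Python. It is an assumption-laden illustration rather than a port: it is written for Python 3 (the Python on your system may be older and need adjustments), the names count_words and format_counts and the file name wordcount.py are made up, and words with equal counts are listed alphabetically, which differs slightly from the Perl sort above.

```python
import re
import sys
from collections import Counter


def count_words(text):
    """Count lowercase words, treating every non-letter as a separator."""
    # Same idea as tr/A-Za-z/ /cs followed by lc and split in the Perl script.
    words = re.sub(r"[^A-Za-z]+", " ", text).lower().split()
    return Counter(words)


def format_counts(counts):
    """Return 'count<TAB>word' lines, highest count first."""
    # Sort by count (descending), then alphabetically by word.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ["%7d\t%s" % (count, word) for word, count in ordered]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (placeholder names): python wordcount.py input.txt > output.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as src:
        for line in format_counts(count_words(src.read())):
            print(line)
```

If you only need the word list without the counts, printing the sorted keys of the counter is enough.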

2. A simple bash script using the Unix "sort" command-line tool, like:

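  #!/usr/bin/bash
  # Prompt on stderr so it does not end up in redirected output.
  echo "Enter the filename" >&2
  read fn

  # Word splitting on $(cat ...) yields one word per loop iteration.
  for WORD in $(cat "$fn")
  do
    echo "$WORD"
  done | sdk:local/c/sort -u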

Running it will be something like:

  aos4shell:> sh sort.sh

Then type the name of the file that needs to be parsed. Or redirect the output, like sh sort.sh >ready, and type the filename as before. By the way, don't confuse the AOS4 version of the "sort" binary, which is in system:c/, with the Unix "sort" binary, which is in sdk:local/c/.

3. Unix command-line programs which AOS4 has, like sed, awk, gawk.

Example on awk combined with sort:

  aos4shell:> awk "{c=split($0, s); for(n=1; n<=c; ++n) print s[n] }" <file.txt | sdk:local/c/sort -u

4. You can also write the same in C or anything else.
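
As one example of "anything else", a standalone Python script could look roughly like the sketch below. It reads the input line by line, which should suit a large file, and writes one unique word per line. Everything here is an assumption for illustration: Python 3 is assumed, the names collect_unique_words and words.py are made up, and a "word" is taken to be a run of letters with optional apostrophes inside it.

```python
import re
import sys

# Assumption: a word is a run of letters, optionally joined by apostrophes
# (so "don't" stays one word, while digits and punctuation are separators).
WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def collect_unique_words(lines):
    """Return the sorted unique lowercase words from an iterable of lines."""
    seen = set()
    for line in lines:
        # Handling one line at a time keeps memory use tied to the size of
        # the vocabulary rather than to the size of the input file.
        seen.update(word.lower() for word in WORD_RE.findall(line))
    return sorted(seen)


def main(in_path, out_path):
    with open(in_path, encoding="utf-8", errors="replace") as src:
        words = collect_unique_words(src)
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.writelines(word + "\n" for word in words)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (placeholder names): python words.py input.txt words.txt
    main(sys.argv[1], sys.argv[2])
```

A set is used instead of sort plus uniq because membership is checked as the words come in, so the whole text never has to be sorted, only the final list.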

P.S. Blah, two days of silence, and when I write my post I realise that jaokim answered at the same time :) But maybe my answer can be helpful too.

jap

I managed to create a list. Thanks jaokim and kas1e!

JosDuchIt

You could do it easily in Gui4Cli too (using any of the suggested tools, or using xlistview events and the Gui4Cli command set).
You would then just as easily have a GUI that you can expand upon. Working bottom-up, you can integrate new ideas and have a working GUI all the time.

I am interested in the possibility of collecting words too. A lot of applications can use this: as a basis for document search, selecting the most useful tag words, spellchecking and translation.

Why not integrate it into the OS, as a background task working on the set of text files (directories) you indicate? Maybe an application for the X1000's second processor?

JosDuchIt

I just added a tool written today in Gui4Cli

http://www.os4coding.net/source/211
