LinuxQuestions.org

-   Linux - Newbie (https://www.linuxquestions.org/questions/linux-newbie-8/)
-   -   Select files on S3 since a certain date in bash (https://www.linuxquestions.org/questions/linux-newbie-8/select-files-on-s3-since-a-certain-date-in-bash-4175558025/)

eulaersi 11-05-2015 01:40 AM

Select files on S3 since a certain date in bash
 
I would like to select all files in an S3 folder that have been created since a certain date. I can do that with:

Code:

    aws s3 ls --recursive s3://my-s3-folder/ | awk '$1 > "2015-11-03 15:46:37" {print $0}' | sort -n
I'm trying to get the same result in a Bash script by using

Code:

    function select_s3_files() {
      prev_run_date="2015-11-03 15:46:37"
      $AWS_BIN s3 ls --recursive $S3_FOLDER | awk ''$1' > "$prev_run_date" {print '$0'}' | sort -n
    }

But I got the following error:

Code:

    awk:  > $prev_run_date {print ./cp-s3-folder.sh}
    awk:  ^ syntax error
    [Errno 32] Broken pipe
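The error happens because the `''` pair closes and reopens the single-quoted awk program, so the shell (not awk) expands `$1` and `$0`. Inside a script, the shell's `$0` is the script's own path, which is why `./cp-s3-folder.sh` shows up inside the awk program, while `$prev_run_date` (still inside single quotes) is never expanded. A minimal sketch of what the shell actually hands to awk:

```shell
# Reproduce the expansion the shell performs on the broken quoting.
# The '' pair ends the single-quoted string, so $1 and $0 are expanded
# by the shell before awk ever sees the program text.
set --   # clear positional parameters: $1 now expands to nothing
program=''$1' > "$prev_run_date" {print '$0'}'
printf '%s\n' "$program"
# $prev_run_date survives literally (it sat inside the single quotes),
# while $0 is replaced by the shell/script name -- hence the syntax error.
```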


translator1111 11-05-2015 04:23 AM

Quote:

aws s3 ls --recursive s3://my-s3-folder/ | awk '$1 > "2015-11-03 15:46:37" {print $0}' | sort -n
Quote:

$AWS_BIN s3 ls --recursive $S3_FOLDER | awk ''$1' > "$prev_run_date" {print '$0'}' | sort -n
Dear eulaersi
did you notice that the code is not the same?
The quote
Code:

'
is in a different place. Try this:
Code:

      $AWS_BIN s3 ls --recursive $S3_FOLDER | awk  '$1 > "$prev_run_date" {print '$0'}' | sort -n
Faithfully,
M.

eulaersi 11-05-2015 08:36 AM

Your modification doesn't throw an error, but it doesn't filter the output: it shows all files, not just those modified since that date.

When I echo the command

Code:

echo "$AWS_BIN s3 ls --recursive $S3_FOLDER | awk '$1 > "$prev_run_date" {print '$4'}' | sort -n"
I get

Code:

/usr/local/bin/aws s3 ls --recursive s3://my-folder/archive/ | awk ' > 2015-11-03 15:46:37 {print ''}' | sort -n
The $1 and $4 have disappeared in the echo.
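They vanish because double quotes do not protect `$1` and `$4` from the shell either: both are expanded (to nothing here) before echo runs. Escaping the dollar signs keeps them literal; a small sketch:

```shell
# Unescaped, the shell substitutes $1 and $4 (empty) before printing.
# Escaped with a backslash, the dollar signs reach the output intact:
printf '%s\n' "awk '\$1 > x {print \$4}'"
# prints: awk '$1 > x {print $4}'
```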

allend 11-05-2015 08:53 AM

I like using find in conjunction with a temporary file to do this.
Code:

touch -t 201511031546.37 /tmp/last
find $S3_FOLDER -newer /tmp/last
rm /tmp/last


eulaersi 11-05-2015 09:28 AM

I cannot use find in aws S3. The find command is not supported.

It works perfectly on the command line, but I'm struggling to get it to work in a bash script with the different $1 and $4 variables and all the different ' and " quotes.

syg00 11-05-2015 04:23 PM

You can't (easily) use bash variables like that in awk - pass them in as assigned awk variables using "-v"
Code:

awk -v prev="$prev_run_date" '$1 > prev {print $4}'
Not that I think date checks using space separated fields like that will work as you want ...
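Building on the `-v` suggestion, the date and time fields can be joined inside awk so the whole timestamp is compared rather than just the date field; a sketch, assuming the usual `aws s3 ls` columns of date, time, size and key (the sample lines are made up):

```shell
prev_run_date="2015-11-03 15:46:37"
# Fake "aws s3 ls --recursive" output: date, time, size, key.
printf '%s\n' \
    '2015-11-01 09:00:00 123 archive/old.txt' \
    '2015-11-04 10:30:00 456 archive/new.txt' |
awk -v prev="$prev_run_date" '($1 " " $2) > prev {print $4}'
# prints: archive/new.txt
```

The string comparison works here only because the timestamp format is fixed-width and sorts lexicographically.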

Habitual 11-05-2015 05:21 PM

you could mount the bucket using sshfs and then use find on that mount.

eulaersi 11-06-2015 08:41 AM

[SOLVED]
Thanks for the replies. I've solved it by

Code:

function select_s3_files() {
  prev_run_date="2015-11-05 23:20:34"
  echo "Previous run date: $prev_run_date"
  $AWS_BIN s3 ls --recursive $S3_FOLDER | awk -v prev="$prev_run_date" '$0 > prev {print $0}' | sort -n
}


Habitual 11-06-2015 10:12 AM

Good job and well done.
Glad it worked out!

eulaersi 03-11-2016 01:23 AM

Quote:

Originally Posted by eulaersi (Post 5445573)
[SOLVED]
Thanks for the replies. I've solved it by

Code:

function select_s3_files() {
  prev_run_date="2015-11-05 23:20:34"
  echo "Previous run date: $prev_run_date"
  $AWS_BIN s3 ls --recursive $S3_FOLDER | awk -v prev="$prev_run_date" '$0 > prev {print $0}' | sort -n
}


Apparently, this line doesn't work if there are spaces in the filenames. Can you help me edit this line of code?
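Keys containing spaces get truncated because `$4` stops at the next space. One workaround is to strip the first three fields and print the rest of the line; a sketch under the same assumed `date time size key` layout:

```shell
prev_run_date="2015-11-05 23:20:34"
# A made-up listing line whose key contains a space:
printf '%s\n' '2015-11-06 08:00:00 789 archive/my report.txt' |
awk -v prev="$prev_run_date" '($1 " " $2) > prev {
    # drop date, time and size so the full key survives, spaces and all
    sub(/^[^ ]+ +[^ ]+ +[^ ]+ +/, "")
    print
}'
# prints: archive/my report.txt
```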

syg00 03-11-2016 01:50 AM

Hmmm - I did warn about that some months ago.

