Old 03-22-2021, 11:04 AM   #1
hz36t5
LQ Newbie
 
Registered: Apr 2014
Posts: 7

Rep: Reputation: Disabled
AWS S3: Selectively copy files based on timestamp


I have struggled with this for some time now.
My goal is to do the following:

Copy files that are newer than a date given in the CLI command from my AWS bucket to my local Linux server.

Code:
aws s3 ls --recursive s3://my bucket/ | awk '$1 > "2021-03-18 00:00:00" {print $0}' | aws s3 cp --recursive my bucket/{} localfolder

I keep receiving: [Errno 32] Broken pipe

Any ideas?
 
Old 03-22-2021, 11:29 AM   #2
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,792

Rep: Reputation: 7306
missing xargs:
Code:
aws ... | awk ... | xargs aws ....
probably helps
 
Old 03-22-2021, 11:30 AM   #3
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,294
Blog Entries: 3

Rep: Reputation: 3719
I'm not familiar with AWS but just as a guess you would need xargs for a pipe like that:

Code:
aws s3 ls --recursive s3://my bucket/ \
| awk '$1 > "2021-03-18 00:00:00" {print $0}' \
| xargs -I{} aws s3 cp --recursive "my bucket/{}" localfolder
What is preventing use of find and either the -newer or -newermt option?
 
Old 03-22-2021, 11:34 AM   #4
computersavvy
Senior Member
 
Registered: Aug 2016
Posts: 3,342

Rep: Reputation: 1484
Quote:
Originally Posted by hz36t5 View Post
I have struggled with this for some time now.
My goal is to do the following:

Copy files that are newer than a date given in the CLI command from my AWS bucket to my local Linux server.

aws s3 ls --recursive s3://my bucket/ | awk '$1 > "2021-03-18 00:00:00" {print $0}' | aws s3 cp --recursive my bucket/{} localfolder

I keep receiving: [Errno 32] Broken pipe

Any ideas?
IIRC, find will easily locate files based on timestamp with one of its test options, and the list can then be piped to cp. I haven't used it for a while, so the man page can help better than I can with the syntax.
 
Old 03-22-2021, 12:06 PM   #5
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,792

Rep: Reputation: 7306
Quote:
Originally Posted by Turbocapitalist View Post
What is preventing use of find and either the -newer or -newermt option?
Quote:
Originally Posted by computersavvy View Post
IIRC find will easily locate files based on timestamp with one of the tests options and the list can then be piped to cp. Have not used that for a while so the man page can help better than I can with syntax.
Sorry guys, AWS S3 is a remote service, so there is no direct access to that filesystem. That's why the OP wanted to first execute an [aws] ls, filter the result, and finally [aws] cp the required files.
On the other hand, I wanted to mention that the date comparison is probably not that simple, but you will know whether it works for you.
 
Old 03-22-2021, 02:14 PM   #6
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 3,670

Rep: Reputation: Disabled
The awk expression is wrong. What you probably want is
Code:
awk '$1>="2021-03-18"{print$NF}'
Instead of awk, here I'd rather filter the aws output with JMESPath using the --query option:
Code:
aws s3api list-objects-v2 --bucket 'my bucket' --query 'Contents[?LastModified>=`2021-03-18`].Key' --output text --no-paginate
Not sure this filter will work, though. The JMESPath spec states
Quote:
Ordering operators >, >=, <, <= are only valid for numbers.
OTOH, the AWS CLI User Guide includes an example similar to the above. Also see jmespath.py issues #124, #126 and jmespath.rb issues #47, #49.

What certainly will work in a JMESPath query is sorting by date:
Code:
aws s3api list-objects-v2 --bucket 'my bucket' --query 'sort_by(Contents, &LastModified)[].Key' --output text --no-paginate
At least, jq should always work as expected:
Code:
aws s3api list-objects-v2 --bucket 'my bucket' --output json|
jq '.Contents[]|select(.LastModified>="2021-03-18").Key'
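To finish the copy step, the keys from jq can be fed to xargs much as in post #3. A minimal sketch along those lines ('my bucket' and localfolder are the thread's placeholders; jq's -r flag prints the keys raw, without JSON quoting, and each key names a single object, so cp needs no --recursive):
Code:
aws s3api list-objects-v2 --bucket 'my bucket' --output json |
jq -r '.Contents[]|select(.LastModified>="2021-03-18").Key' |
xargs -I{} aws s3 cp "s3://my bucket/{}" localfolder
Since xargs -I{} consumes one whole line per substitution, keys containing spaces survive; keys containing newlines would not.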

Last edited by shruggy; 03-22-2021 at 05:25 PM.
 
1 member found this post helpful.
Old 03-22-2021, 02:28 PM   #7
hz36t5
LQ Newbie
 
Registered: Apr 2014
Posts: 7

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by Turbocapitalist View Post
I'm not familiar with AWS but just as a guess you would need xargs for a pipe like that:

Code:
aws s3 ls --recursive s3://my bucket/ \
| awk '$1 > "2021-03-18 00:00:00" {print $0}' \
| xargs -I{} aws s3 cp --recursive "my bucket/{}" localfolder
What is preventing use of find and either the -newer or -newermt option?
Nothing. I didn't pursue that as an option. Would it be a simpler solution?

I am now running the following, and now I get this message over and over until I abort:
fatal error: An error occurred (InvalidToken) when calling the ListObjectsV2 operation: The provided token is malformed or otherwise invalid

Code:
aws s3 ls --recursive s3://"my bucket" | awk '$1>="2021-03-01"{print$NF}'| xargs -I{} aws s3 cp --recursive "s3:"my bucket/{}" localfolder
 
Old 03-23-2021, 09:11 AM   #8
hz36t5
LQ Newbie
 
Registered: Apr 2014
Posts: 7

Original Poster
Rep: Reputation: Disabled
So I now run the script:

Code:
aws s3 ls --recursive s3://bucket name \
| awk '$1 >"2021-03-22 00:00:00" {print $0}' \
| xargs -I {} aws s3 cp --recursive "s3://bucket name/{}" folder

But no files get copied to the location specified (folder).

If I run the first two parts, I do get the expected results:
Code:
2021-03-23 00:35:33 104742 source/7700ABJIBA442720210811934.20210323003442.txt
2021-03-23 00:35:37 33 source/7700ABJIBA442720210811934.md5.20210323003443.txt

What am I missing?

Last edited by hz36t5; 03-23-2021 at 09:16 AM.
 
Old 03-23-2021, 09:18 AM   #9
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,792

Rep: Reputation: 7306
I guess the filename itself is $2 (or $3 ?), not $0, so you need to print $2 in your awk.
Probably better to print $NF, which is the last field.

Last edited by pan64; 03-23-2021 at 09:20 AM.
 
1 member found this post helpful.
Old 03-23-2021, 09:21 AM   #10
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 3,670

Rep: Reputation: Disabled
No, I guess it's $4, because $1 is the date, $2 the time, and $3 the size. That means $1>"2021-03-22 00:00:00" won't work either. See my post #6 above.
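If the cutoff really needs both the date and the time, the two fields can be concatenated before comparing. A minimal sketch combining that with the xargs step (bucket-name and folder are the thread's placeholders; $NF assumes keys without embedded whitespace, and cp needs no --recursive since each key is a single object):
Code:
aws s3 ls --recursive s3://bucket-name \
| awk '$1" "$2 >= "2021-03-22 00:00:00" {print $NF}' \
| xargs -I{} aws s3 cp "s3://bucket-name/{}" folder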

Last edited by shruggy; 03-23-2021 at 09:22 AM.
 
Old 03-23-2021, 11:23 PM   #11
allend
LQ 5k Club
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware64-15.0
Posts: 6,367

Rep: Reputation: 2748
I am new to AWS, but it seems that the AWS tools are deficient in handling date/time stamps.
It is possible to use awk to convert the text date and time fields to a numeric timestamp, as sketched below. Perhaps there are some ideas here.
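For instance, GNU awk's mktime() can turn both the cutoff and each listing's date/time fields into epoch seconds before comparing. A sketch of that conversion (gawk-specific; mktime() is not in POSIX awk, and bucket-name is a placeholder):
Code:
aws s3 ls --recursive s3://bucket-name \
| gawk -v cutoff='2021-03-22 00:00:00' '
    BEGIN { gsub(/[-:]/, " ", cutoff); cut = mktime(cutoff) }  # "2021 03 22 00 00 00" -> epoch seconds
    { ts = $1 " " $2; gsub(/[-:]/, " ", ts); if (mktime(ts) >= cut) print $NF }'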
 
Old 03-23-2021, 11:48 PM   #12
rnturn
Senior Member
 
Registered: Jan 2003
Location: Illinois (SW Chicago 'burbs)
Distribution: openSUSE, Raspbian, Slackware. Previous: MacOS, Red Hat, Coherent, Consensys SVR4.2, Tru64, Solaris
Posts: 2,800

Rep: Reputation: 550
Quote:
What is preventing use of find and either the -newer or -newermt option?
Quote:
Nothing. I didn't pursue that as an option. Would it be a simpler solution?
I'm unsure of the AWS command syntax you'll need, but on generic Linux, you could do something like:
Code:
$ SRCDIR=your-source-location
$ TGTDIR=your-target-location
$ MARKER=some.file
$ cd ${TGTDIR}
$ touch -d "2021-03-22 00:00:00" ${MARKER}
$ find ${SRCDIR} -type f -newer ${MARKER} -exec cp -p {} . \;
$ rm ${MARKER}
Tweak as needed to work with your AWS buckets.
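With GNU find, the -newermt test mentioned in post #3 skips the marker file entirely; a sketch using the same placeholder variables (GNU findutils assumed):
Code:
$ find ${SRCDIR} -type f -newermt "2021-03-22 00:00:00" -exec cp -p {} ${TGTDIR} \;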

HTH...
 
  

