
Let's talk about file transfers


Zenity    0

Heya!

 

I'm loving Plastic so far and I'm really excited about its potential to disrupt the market for small distributed teams working on big-ass projects, which is becoming really common with Unreal Engine 4. I consult for a few clients, and so far everybody has been happy to jump onto the Plastic bandwagon, since the alternatives are really not all that great.

 

There is just one thing that worries me a lot, so I'd like to find out if my concerns are justified and if so, whether this is something that could be improved upon soon.

 

An important consideration when working with distributed teams online is efficient file transfers. I was excited to find that Plastic seems to do a lot to optimise this by sending files in bulk and showing useful progress indicators (sadly that's not exactly common among other VCSs...). But aside from that, there seem to be no features to make large file transfers more bearable for remote developers, unless I am just missing something.

 

When it comes to file transfer usability, I think there are three major levels:

 

1) A file transfer has to go through in one go; if it's interrupted, you have to start from scratch.

 

2) The ability to resume aborted file transfers.

 

3) A system that analyses local files and re-uses existing blobs of data whenever possible.

 

Now 3) would be an absolute killer feature for this kind of system, because a very common situation is that you have to clone a huge repository containing files you already have on disk. Seeing the entire thing being downloaded from scratch is just painful. I don't know if this would be technically possible, but since the data is already bundled, why shouldn't it be? Dropbox, Steam or Backblaze would be examples of tools using such a system.
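
To make 3) a bit more concrete, here is a rough sketch of the idea in Python. It is purely hypothetical on my part (the block size and function names are made up), not how Plastic or any of those tools actually implement it: split files into blocks, hash them, and only download the blocks that can't be rebuilt from what is already on disk.

import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB blocks; an arbitrary size chosen for this sketch

def chunk_hashes(path):
    """Split one file into fixed-size blocks and hash each block."""
    hashes = []
    with open(path, "rb") as f:
        while True:
            block = f.read(CHUNK_SIZE)
            if not block:
                break
            hashes.append(hashlib.sha256(block).hexdigest())
    return hashes

def blocks_to_download(remote_blocks, local_files):
    """remote_blocks: hashes of the blocks making up the remote revision.
    local_files: paths already on disk (e.g. an existing copy of the project).
    Returns only the block hashes that cannot be rebuilt from local data."""
    available = set()
    for path in local_files:
        available.update(chunk_hashes(path))
    return [h for h in remote_blocks if h not in available]

With something like this, cloning a repo whose files you mostly already have would turn into a lot of local hashing and very little downloading.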

 

But anyway, that's the wishful thinking part. The concern part is that 2) does not seem to be supported either. Whenever I had to cancel a replication or large checkin/checkout so far, it seems that it started from scratch on the next try.

 

If this is true, it is a huge issue, because sometimes projects get so big that it actually becomes difficult for remote developers to transfer everything in one go (and it's never pleasant to begin with). If there is a way to make this work, please let me know. If not, please let me know if this situation could be improved soon. :) Other VCSs are notoriously bad at communicating how they handle resumed file transfers, but both Subversion and Git LFS seem to have at least basic capabilities to avoid downloading everything from scratch after a failure.

 

Next, there's a little UI issue: when I start a long file transfer, I cannot use the Plastic GUI for any other work. Oddly, I can simply open a second GUI and use that one instead, but it would be fantastic if this could be handled a bit more asynchronously (even if it has to block actions on this particular repository, it should at least allow me to use others).

 

And last (for now :)), a slightly related issue: since local repositories are created in the installation folder by default, this can lead to the system drive filling up quite unexpectedly. When you have a bunch of huge repositories and your system drive is a small SSD, that is a big problem. A solution, of course, is to install Plastic on a different drive, but this is hard to foresee before installing, and even then it doesn't appear to be the most elegant solution. If there is a way to change the location of the local (SQLite) repositories, I haven't found it, so it would be great if this could be made easily available from the GUI somehow.

 

Zenity    0

Case in point... today I tried to replicate a repo from a cloud that isn't very close to me (we first tried my nearby cloud region, which however was way too slow for my client). I had already cancelled the attempt earlier because it took too long, so I went to the coworking space early and had the download running for the whole day. It looked like I was barely going to make it for most of the time. Then it suddenly slowed down dramatically at 99% with no further indication of what was happening. Finally it switched to 100% and I thought I could go home, but now it's sitting there at "checking in" while my food at home is getting cold, and I have no idea if this will even finish in time or whether it is safe to cancel now.

 

This is depressing; I haven't experienced anything this excessively slow before. The repo should be about 20-30 GB and it's a clean import. I have a good internet connection here, but the connection to the data center was abysmal. The laptop has a high-end desktop CPU and M.2 SSDs. If this is already taking a full day, it just doesn't seem workable without partial downloads.

 

Please tell me that there is a better way of doing things (or one in the works).

 

One thing I tried earlier was to replicate from one cloud to another, but it didn't let me. Did I just do something wrong, or is that not supported yet? Because right now that is the only idea I have to make this workable, although it's kind of too late for that by now. The most frustrating part of the whole experience was the lack of clear indication of what it was doing and how much longer it would take.

psantosl    31
> I'm loving Plastic so far and I'm really excited about

> its potential to disrupt the market for small distributed

> teams working on big ass projects, which is becoming really

> common with Unreal Engine 4.

 

Thanks! It is really rewarding to hear it.

 

> An important consideration when working with distributed

> teams online is efficient file transfers. I was excited

> to find that Plastic seems to do a lot to optimise this

> by sending files in bulk and showing useful progress

> indicators (sadly that's not completely common among

> other VCS...).

 

Well, yes, we have improved over time; we used to be really bad with progress indicators long ago :)

 

> But aside from that there seem to be no features to make

> large file transfers more bearable for remote developers,

> unless I am just missing it.

 

Well, there's an additional feature you should check, although it won't be useful in every scenario: UDT-based data transfer. It works only in Windows-to-Windows scenarios, but it is really good for high-bandwidth, high-latency networks.

 


 

Additionally, we also support proxy servers, which in some scenarios are good for avoiding downloads from the central server.

 

> When it comes to file transfer usability, I think

> there are three major levels:

> 1) A file transfer has to go through in one go, if it's

>    interrupted you have to start from scratch.

> 2) The ability to resume aborted file transfers.

> 3) A system that analyses local files and re-uses

>    existing blobs of data whenever possible.

 

 

All very good points.

 

We need to think more about resuming. We did implement a retry mechanism during replica and checkin: you can be connected to the LAN while doing a big checkin, disconnect, and it continues on the wifi seamlessly (it reconnects and retries, and so on).

 

This works for regular servers only, though, not Cloud.

 

But, there's one thing about "restarting":

 

* Suppose you are doing a big checkin, introducing 1000 new files and 1GB.

* Ok, then the transfer gets interrupted after 500MB.

* The regular retry lets you resume almost immediately, but if you need, let's say, 10 minutes before retrying, then we need to abort the transaction on the server side; otherwise we could end up storing inconsistent data. See what I mean? We do not only transfer data, we also deal with tons of metadata, and we need to preserve consistency (roughly the shape of the sketch below).
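
Just to make that trade-off concrete, the client-side shape of it is roughly this (a simplified Python sketch, not our actual code; send_pending, abort_transaction and the 10-minute window are illustrative placeholders only):

import time

class TransferRestartNeeded(Exception):
    pass

def checkin_with_retry(send_pending, abort_transaction,
                       retry_window_s=600, backoff_s=5):
    """Keep retrying the data transfer while the connection comes and goes.
    If the transfer cannot finish within the retry window, the server-side
    transaction is aborted so no inconsistent metadata is left behind, and
    the checkin has to start from scratch."""
    deadline = time.monotonic() + retry_window_s
    while True:
        try:
            send_pending()        # pushes the remaining data; raises OSError on a drop
            return
        except OSError:
            if time.monotonic() >= deadline:
                abort_transaction()   # roll back the partial checkin on the server
                raise TransferRestartNeeded("retry window exceeded")
            time.sleep(backoff_s)     # wait, reconnect, then resume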

 

 

> Now 3) would be an absolute killer feature for this

> kind of system, because a very common situation is

> that you have to clone a huge repository containing

> files you already have on disk. Seeing the entire

> thing being downloaded from scratch is just painful.

> I don't know if this would be technically possible,

> but since data is already bundled, why shouldn't

> it? Dropbox, Steam or Backblaze would be examples

> of tools using such a system. 

 

Yes, it is doable; in fact, it is on our roadmap.

 

Now, let me distinguish between replica and update.

 

When you update (the equivalent of "checkout" in Git and SVN: simply downloading stuff to your workspace from a server), we DO reuse content. Sample:

 

* You have a workspace with 80k files, totalling 32GB.

* You copy it to a new folder (you do NOT copy the hidden .plastic folder, so the copy is not a workspace, just a plain folder with files).

* You create a new workspace on it, and run update.

* Update will try to reuse every single file on disk. It will take some time, because it has to "rehash" the workspace contents, but much less than re-downloading.

** I mean: you have big-foo.3ds on disk and you need to download it: Plastic will try to avoid downloading it at all costs, checking whether the hash of the local file matches the remote one.

** I use this often to avoid redownloading stuff when re-creating workspaces.

 

We do this on an entire-file basis, not on chunks, which is something we'd like to do in the future.
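
Conceptually, the per-file decision during update looks like this (a simplified Python sketch, not the real implementation; download_revision is just a placeholder for the actual transfer):

import hashlib
import os

def file_hash(path):
    """Hash the file's full contents in 1 MB blocks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1024 * 1024), b""):
            h.update(block)
    return h.hexdigest()

def update_one_file(workspace_path, expected_hash, download_revision):
    """Reuse the file already on disk when its hash matches the revision the
    update needs; only fall back to downloading when it does not."""
    if os.path.isfile(workspace_path) and file_hash(workspace_path) == expected_hash:
        return "reused"                   # nothing to transfer
    download_revision(workspace_path)     # placeholder for the real download
    return "downloaded"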

 

> But anyway, that's the wishful thinking part. The concern part

> is that 2) does not seem to be supported either. Whenever I had

> to cancel a replication or large checkin/checkout so far,

> it seems that it started from scratch on the next try.

 

Yes, if you CANCEL, then we restart, because we have to get rid of the previous data for consistency.

 

What would be your scenario? I mean, obviously your "cancel" is not "throw away this stuff, I don't want to check in after all"; it is something else... is it some sort of "pause", maybe? More info would be appreciated.

 

> If this is true, this is a huge issue because sometimes

> projects get so big, that it actually becomes difficult

> for remote developers to transfer it all in one go

> (and it's never pleasant to begin with). If there is

> a way to make this work, please let me know. If not,

> please let me know if this situation could be improved

> soon. :) Other VCS are notoriously bad at communicating

> how they handle resumed file transfers, but both

> Subversion and Git LFS seem to have at least basic

> capabilities not to download everything from scratch

> in case of a failure.

 

In update you are covered; in replica you are not. I'm not sure what your scenario is.

 

Remember, though, that unlike in Git you can do a "partial replica": you can replicate only a few branches, making the replica much lighter.

 

Also, for artists, remember we have Gluon, where you can easily say you only want to download certain parts of the repo.

 

> Next, there's a little UI issue: When I start a long

> file transfer, I cannot use the Plastic GUI for any

> other work. Oddly I can simply open a second GUI and

> use this one instead, but it would be fantastic if

> this could be handled a bit more asynchronously (even

> if it has to block actions on this particular repository,

> it should allow me to use others at least).

 

Absolutely. In fact, this is how the Mac and Linux GUIs work already... it will be coming to Windows... soon! :-)

 

We were experimenting with new concepts on Linux/Mac first, that's why.

 

> If there is a possibility to change the location of

> the local (sqlite) repositories

 

Sure: just edit your db.conf file; here is how mine looks:

 

<DbConfig>

  <ProviderName>sqlite</ProviderName>

  <ConnectionString>Data Source={0};Synchronous=FULL;Journal Mode=WAL;Pooling=true</ConnectionString>

  <DatabasePath>c:\users\pablo\plastic\server\databases</DatabasePath>

</DbConfig>

 

"DatabasePath" is what you are looking for :-)

psantosl    31
Hi again,

 

I answered the first post; on to the second one now:

 

> Case in point... today I tried to replicate

> a repo from a cloud that isn't very close

> to me (we tried at first on my cloud near

> me which however was way too slow for my

> client). I had cancelled the attempt earlier

> already because it took too long, so I went

> to the coworking space early and had the

> download running for the whole day. It

> looked like I was barely making it for most

> of the time. Then it suddenly slowed down

> extremely at 99% with no further input.

 

Ok, if there are no network issues there, we should take a look.

 

How big was the repo?

 

Do you know that you can use a small trick: create a remote branch and replicate only that branch. It will bring a full working copy, but not the entire history, saving time. You can replicate more branches later.

 

 

> Finally it switched to 100% and I thought I

> could go home, but now it's sitting there at

> "checking in" while my food at home is getting

> cold and I have no idea if this will even

> finish in time or whether it is save to cancel now.

 

Ouch! We can take a deeper look into it and have someone from support connect with you to double-check.

 

> This is depressing, I haven't experienced anything

> this excessively slow before. The repo should be

> about 20-30 GB and it's a clean import. I have a

> good internet connection here but the connection

> to the data center was abysmal. The laptop got a

> high end desktop CPU and m2 SSDs. If this is

> already taking a full day, this just doesn't seem

> workable without partial downloads.

 

This is definitely not the expected behavior.

 

Look, when working with Cloud we do this: we transfer data (blobs) directly from your machine to Azure blob storage, without intermediate cloud servers, which should be the fastest way of doing it.

 

I will double check and come back to you.

 

We will contact you privately to ask for the specifics of your data center.

Zenity    0

Hi, thanks for the detailed response. To be clear, I am only talking about the simplest possible setup of a single cloud repository with distributed developers using a local sqlite database each (or Gluon). I believe that this is the most interesting setup for small development teams working on large Unreal projects, as it bridges the gap between the simplicity of something like Dropbox and enterprise-level version control setups.

 

The repository was freshly set up and only had a single commit in a single branch, which is why it was particularly surprising that it was so slow. When the progress indicator reached 99%, the download rate notably dropped to about a third of what it had been (consistently), and it sat there for about half an hour. I just assumed it was doing something special near the end that wasn't expected to take long enough to deserve its own section in the progress wheel. Perhaps it was just a freak coincidence that made the last one percent so excessively slow, but that seems a bit odd.

 

Edit: The size of the repository is about 23 GB. Had it been any larger, I would have been screwed. It already took skipping sleep to get to the office as early as possible and staying until late at night, which is why my tone was perhaps a little grumpy last night. :)

 

The checkin finished eventually; it was just frustrating that it wasn't accounted for in the progress wheel (which screwed up my plans to go home and eat, since I expected it to be done at 100% :)). When I observed the initial checkin on the other person's computer (over TeamViewer), I already saw how long it takes, so I had a vague idea, and since that showed detailed progress, it was doubly frustrating that I wasn't getting any indication now.

 

The reason I had to cancel the download earlier was simply that I had to shut down the laptop to move between locations. With worldwide distributed teams this can easily happen, and not all places have good or reliable internet connections. A previous client had similar issues: he tried to check in the project to the cloud server in Singapore (where he got a really slow connection, despite his massive broadband), but eventually his computer had to restart for updates. There are a number of reasons why it can be difficult for regular users to complete a really long operation.

 

In the end it comes down to two specific use cases:

 

1) Exactly as you suggested, the ability to pause an operation, especially with the option to shut down the application and still resume later. This would already cover a lot of cases.

 

2) The ability to resume after a failure, like a power outage or network failure. Your response suggests that this already works in some situations but not in others (solid support for this with the cloud would be particularly important). Communicating this clearly in the UI and/or the basic user documentation is quite important as well, IMO, so that the regular user can make informed decisions in difficult circumstances (like whether it's worth even attempting an operation when the network is likely to be interrupted, or the computer has to be shut down before it can finish). A rough sketch of what I mean follows below.
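
To illustrate 2): even the classic FTP/HTTP approach would already help, i.e. remember how many bytes made it across and continue from that offset on the next attempt instead of starting at byte zero. A rough sketch in Python (generic, nothing Plastic-specific; it assumes the server honours range requests):

import os
import urllib.request

def resume_download(url, dest_path, block_size=1024 * 1024):
    """Continue a download from wherever the previous attempt stopped by
    asking the server for the remaining byte range and appending to the
    partial file on disk. (A file that is already complete would need an
    extra check, since the server answers 416 for an empty range.)"""
    offset = os.path.getsize(dest_path) if os.path.exists(dest_path) else 0
    request = urllib.request.Request(url, headers={"Range": "bytes=%d-" % offset})
    with urllib.request.urlopen(request) as response, open(dest_path, "ab") as out:
        while True:
            block = response.read(block_size)
            if not block:
                break
            out.write(block)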

 

Also, thanks for letting me know about changing the database location. It would be great if you could add this as an option to the GUI for less technical users, even if they should be using Gluon to begin with (for some reason my client wasn't able to make the initial checkin with Gluon; I'm still not sure exactly what the issue was, but he was in contact with your support and in the end he used the standard GUI). Speaking of which, do you have an official name for the "standard GUI"? :)

 

One scenario I have found myself in pretty frequently lately is that a small project is run (read: bankrolled) by a very non-technical person who sets up the repository primarily for the benefit of hired developers, but would also like to keep their own copy updated and at most will occasionally check in some new content (like an asset pack bought from the Unreal marketplace). Gluon goes a long way in making those use cases more accessible, but there are still some complications, like having to configure the repository to update all files, which make me wonder if this couldn't be simplified even more. Some random ideas that come to mind would be a Dropbox-style read-only file sync from the main branch on the cloud server, or a web interface to download and upload files directly (like Perforce is doing with their Helix Cloud service).

 

I understand that this is all pretty unusual as far as the usual audience for SCM systems goes, but this is exactly why I am so excited about Plastic. There is a big market gap which seems ripe for the taking. I'd love to be able to sum up Plastic to clients as "the Dropbox of version control", and it's really not far off! Meanwhile, Perforce seems to be playing catch-up by adding DVCS capabilities and working on a cloud service, but with their enterprise focus and lack of agility, it still seems to me that Plastic is in the perfect position to disrupt that market.

ironbelly    1

I'll chime in here and throw my vote behind a resume feature. I have been trying to commit 63 GB of data over the last two days to no avail, and I'm really wishing Plastic could resume interrupted transfers right now. To be clear, I'm not splitting this commit up into two smaller commits; however, the reality is still that with remotely distributed teams, where members are 4000+ km away from the server, you are going to see transfer speeds of 10-30 Mbps, and when it comes to transferring such large quantities of data at those speeds over those distances, things are going to take a lot longer and the chances of being interrupted are going to be a lot higher.

Something else that I've brought up is multi-threaded uploads. We currently have the ability to increase download threads, which is awesome, but if we could get off this single TCP stream for uploads, most people would be able to cut their transfer times down dramatically, which could be part of the solution for what we're talking about here (rough sketch below). Other than that, taking a page out of the FTP handbook and allowing the resumption of interrupted transfers would be glorious.
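
Just to illustrate what I mean by multi-threaded uploads (a generic Python sketch, nothing to do with Plastic's actual transport; upload_chunk is a made-up placeholder): split the payload into chunks and keep several of them in flight at once instead of one TCP stream doing all the work.

from concurrent.futures import ThreadPoolExecutor

def upload_in_parallel(chunks, upload_chunk, max_workers=4):
    """chunks: list of (offset, data) pairs covering the payload.
    upload_chunk(offset, data): pushes one chunk over its own connection.
    Keeping several chunks in flight stops a single slow TCP stream from
    capping the total upload throughput."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(upload_chunk, offset, data) for offset, data in chunks]
        for future in futures:
            future.result()   # surface any chunk that failed to upload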

ironbelly    1

Another option would be to not lock the entire folder tree/workspace when committing and to allow us to submit multiple commits simultaneously by opening multiple Plastic clients. So if I have 1000 files, I open one client, select files 1-500, and start committing those. Then I open a second client, select files 501-1000, and start submitting those at the same time.

