More spalloc rest calls #1222

Merged
merged 90 commits into from
Mar 21, 2025
Commits
48624f0
Add a way to read data from and write data to a job
rowleya Feb 10, 2025
8316d1c
Try raw bytes
rowleya Feb 10, 2025
a1bb2ee
Update read return to bytes
rowleya Feb 10, 2025
468cd52
Get the bytes from a non-array-backed buffer
rowleya Feb 10, 2025
f8610ed
Fix docs
rowleya Feb 11, 2025
6d1c407
Add a custom transceiver for Spalloc
rowleya Feb 12, 2025
6b5d398
Update the downloader to move the connection creation a bit
rowleya Feb 12, 2025
795a29b
Attempt to add other functions (started)
rowleya Feb 13, 2025
71f4bfb
Put these back as not needed here
rowleya Feb 14, 2025
3e1e5c3
Put this in the right place and fill in the gaps
rowleya Feb 14, 2025
2df79f2
Add some logs for testing
rowleya Feb 17, 2025
f3b3bcf
Use CSFR Token and make sure connection actually happens!
rowleya Feb 18, 2025
6354de7
Move thing around a bit to make using in spalloc server easier
rowleya Feb 19, 2025
affe8e2
Handle bytebuffer correctly
rowleya Feb 20, 2025
0e65b32
Reset the buffer before returning
rowleya Feb 20, 2025
82d4bb5
Do more via the server
rowleya Feb 20, 2025
67b6b93
Client and server side of Fast Data Write
rowleya Feb 24, 2025
b01f76d
Fix file
rowleya Feb 25, 2025
22cfc2c
Sync access
rowleya Feb 25, 2025
95b8980
Add connection to try to autoclose it
rowleya Feb 25, 2025
b05a0e6
Just go via proxy for fast data; don't try to go around
rowleya Feb 25, 2025
fce568f
Fix tabs
rowleya Feb 25, 2025
df32858
Style fixes
rowleya Feb 25, 2025
e1260b3
More style fixes
rowleya Feb 25, 2025
13ce069
Correct suppression of checkstyle here
rowleya Feb 25, 2025
15bbbb7
First attempt at moving download protocol into Transceiver (not quite
rowleya Feb 26, 2025
8e83126
No need for this to set the IP tag now
rowleya Feb 27, 2025
9cd816d
First is set by anding, not mod
rowleya Feb 27, 2025
2278b9f
Can be useful to see these messages
rowleya Feb 27, 2025
50618d4
Add Header
rowleya Feb 27, 2025
927c8fd
Update the server to make it able to do fast data reads
rowleya Feb 27, 2025
7310c1a
Fix style issues
rowleya Feb 27, 2025
0112715
Add Client-side interaction with spalloc for reading
rowleya Feb 27, 2025
f10baea
Add package info
rowleya Feb 27, 2025
7788d86
Style fixes
rowleya Feb 27, 2025
1efed0e
Fix style
rowleya Feb 28, 2025
9c77dc2
Finally?
rowleya Feb 28, 2025
0bc6e61
Fix docs
rowleya Feb 28, 2025
8795a68
Make transceiver with all connections
rowleya Feb 28, 2025
86a56a0
Fix to create the right type of socket
rowleya Mar 3, 2025
d79dbe9
Reduce verbosity
rowleya Mar 3, 2025
79ca2ec
Create a pool of transceivers
rowleya Mar 3, 2025
27b848f
Fix test
rowleya Mar 3, 2025
0fd234e
Style fix
rowleya Mar 4, 2025
d0adf46
Finish implementation
rowleya Mar 4, 2025
d0b232a
Attempt to fix authorization errors
rowleya Mar 4, 2025
e1b09d6
Check the chip exists, then the error is better if it doesn't
rowleya Mar 5, 2025
4036d12
Avoid asking for the size of the machine after configuring for no drop
rowleya Mar 5, 2025
4536a9c
Major change in operations!
rowleya Mar 7, 2025
5243757
Fix test
rowleya Mar 9, 2025
4f36f91
Avoid issues with null in tests
rowleya Mar 10, 2025
3e19f3f
Fix test again
rowleya Mar 10, 2025
f1867d8
Add missing comments
rowleya Mar 10, 2025
d43b66c
Style fixes
rowleya Mar 10, 2025
d24a53f
Make code less "error prone"
rowleya Mar 10, 2025
0d5d8a7
Remove unused import
rowleya Mar 10, 2025
73b6137
Fix "missing" 255,255
rowleya Mar 10, 2025
8e4dc3f
In fact, get the dimensions from here where we put them!
rowleya Mar 10, 2025
35ae73a
Retry tracker *can* be null!
rowleya Mar 10, 2025
2d7348b
Get the logic right!
rowleya Mar 11, 2025
a11fe52
Try to detect issues with reusing connections in processes
rowleya Mar 11, 2025
1587770
This transceiver needs a machine!
rowleya Mar 11, 2025
f9ecbc4
Avoid logging when boot connection doesn't exist!
rowleya Mar 11, 2025
9f6d586
Debugging
rowleya Mar 11, 2025
55e74c2
More debug
rowleya Mar 12, 2025
cf6b1f5
Try just writing at service side...
rowleya Mar 12, 2025
52e0c1b
Is it really just the traffic identifier?
rowleya Mar 12, 2025
05844c3
More debugging!
rowleya Mar 13, 2025
31f5fa8
Try this again...
rowleya Mar 13, 2025
3eac39c
Undo this
rowleya Mar 13, 2025
742946c
X is X and Y is Y!
rowleya Mar 13, 2025
355273c
Fix test
rowleya Mar 13, 2025
3edd222
Remove unused
rowleya Mar 13, 2025
8cbfd22
Less info verbosity
rowleya Mar 13, 2025
5794e36
No need to "end" the class
rowleya Mar 13, 2025
a633890
Make it clear this is done on purpose!
rowleya Mar 13, 2025
8caa836
Remove more vebosity
rowleya Mar 13, 2025
56d64de
Another removal of verbosity
rowleya Mar 13, 2025
9e2a1ba
Allow setting up routers via REST call
rowleya Mar 14, 2025
f6c865b
Add a dot
rowleya Mar 14, 2025
ab16a64
Use delete as makes more sense
rowleya Mar 14, 2025
c009453
Fix docs
rowleya Mar 14, 2025
5e0e806
Fix docs
rowleya Mar 14, 2025
530c979
Allow not generating resources
rowleya Mar 14, 2025
2f463d3
Try setting a request param to get the right type
rowleya Mar 14, 2025
225a86d
It actually doesn't consume anything...
rowleya Mar 14, 2025
e49ae61
Use better parameters for this API
rowleya Mar 14, 2025
e7ffdcd
Remove unused imports
rowleya Mar 14, 2025
d4cc242
Allow reuse of the downloader by resetting the tag
rowleya Mar 18, 2025
bb0cf45
Fix issues from review
rowleya Mar 20, 2025
58 changes: 34 additions & 24 deletions SpiNNaker-allocserv/pom.xml
@@ -165,30 +165,6 @@ limitations under the License.
</execution>
</executions>
</plugin>
- <plugin>
- <groupId>uk.co.automatictester</groupId>
- <artifactId>truststore-maven-plugin</artifactId>
- <executions>
- <execution>
- <id>generate-truststore</id>
- <goals>
- <goal>generate-truststore</goal>
- </goals>
- <phase>generate-resources</phase>
- <configuration>
- <truststoreFile>${truststore.file}</truststoreFile>
- <truststorePassword>${truststore.pass}</truststorePassword>
- <servers>
- <server>${ebrains.wellknown.server}:443</server>
- <!-- Skip things that are down -->
- <!--
- <server>${ebrains.wellknown.server.dev}:443</server>
- -->
- </servers>
- </configuration>
- </execution>
- </executions>
- </plugin>
</plugins>

<pluginManagement>
@@ -563,6 +539,40 @@ limitations under the License.
</plugins>
</build>
</profile>
+ <profile>
+ <id>generate-truststore</id>
+ <activation>
+ <activeByDefault>true</activeByDefault>
+ </activation>
+ <build>
+ <plugins>
+ <plugin>
+ <groupId>uk.co.automatictester</groupId>
+ <artifactId>truststore-maven-plugin</artifactId>
+ <executions>
+ <execution>
+ <id>generate-truststore</id>
+ <goals>
+ <goal>generate-truststore</goal>
+ </goals>
+ <phase>generate-resources</phase>
+ <configuration>
+ <truststoreFile>${truststore.file}</truststoreFile>
+ <truststorePassword>${truststore.pass}</truststorePassword>
+ <servers>
+ <server>${ebrains.wellknown.server}:443</server>
+ <!-- Skip things that are down -->
+ <!--
+ <server>${ebrains.wellknown.server.dev}:443</server>
+ -->
+ </servers>
+ </configuration>
+ </execution>
+ </executions>
+ </plugin>
+ </plugins>
+ </build>
+ </profile>
<profile>
<id>windows-specific-bits</id>
<activation>
@@ -125,7 +125,7 @@ public class AllocatorTask extends DatabaseAwareBean
private HistoricalDataProperties historyProps;

@Autowired
- private ProxyRememberer rememberer;
+ private JobObjectRememberer rememberer;

@Autowired
private TaskScheduler scheduler;
@@ -984,7 +984,7 @@ private Collection<BMPAndMachine> destroyJob(Connection conn, int id,
return bmps;
} finally {
quotaManager.finishJob(id);
- rememberer.killProxies(id);
+ rememberer.closeJob(id);
}
}

@@ -1245,7 +1245,7 @@ private Collection<BMPAndMachine> setAllocation(AllocSQL sql, int jobId,
log.info("allocated {} boards to {}; issuing power up commands",
boardsToAllocate.size(), jobId);
// Any proxies that existed are now defunct; user must make anew
- rememberer.killProxies(jobId);
+ rememberer.closeJob(jobId);
return setPower(sql, jobId, ON, READY);
}
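
The `destroyJob` hunk above calls `rememberer.closeJob(id)` from a `finally` block, so the job's remembered objects are released even when the body throws. A minimal sketch of that control flow follows; `FinallyCleanupSketch`, its `events` list, and the `fail` flag are hypothetical names used only for illustration and are not part of this pull request.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not code from the pull request. */
class FinallyCleanupSketch {
    /** Records the order of calls so the behaviour can be inspected. */
    final List<String> events = new ArrayList<>();

    /** Stand-in (assumption) for rememberer.closeJob(id). */
    void closeJob(int id) {
        events.add("closeJob(" + id + ")");
    }

    /** Stand-in (assumption) for the work done when destroying a job. */
    void destroyJob(int id, boolean fail) {
        try {
            events.add("destroy(" + id + ")");
            if (fail) {
                throw new IllegalStateException("simulated failure");
            }
        } finally {
            // Runs on both the normal and the exceptional path.
            closeJob(id);
        }
    }
}
```

In this sketch the cleanup call is recorded whether or not the body fails, and the exception still propagates to the caller.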

@@ -0,0 +1,238 @@
/*
* Copyright (c) 2025 The University of Manchester
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* https://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package uk.ac.manchester.spinnaker.alloc.allocator;

import static java.util.Objects.nonNull;

import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import javax.annotation.PreDestroy;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.springframework.stereotype.Component;

import com.google.errorprone.annotations.concurrent.GuardedBy;

import uk.ac.manchester.spinnaker.alloc.proxy.ProxyCore;
import uk.ac.manchester.spinnaker.machine.ChipLocation;
import uk.ac.manchester.spinnaker.protocols.FastDataIn;
import uk.ac.manchester.spinnaker.protocols.download.Downloader;
import uk.ac.manchester.spinnaker.transceiver.TransceiverInterface;

/**
* Remembers job objects so that they can be closed when the state of a
* job invalidates them. This class takes care to be thread-safe. The
* information it holds is <em>not</em> persistent.
*
* @author Donal Fellows
*/
@Component
class JobObjectRememberer {

private static final Log log = LogFactory.getLog(JobObjectRememberer.class);

@GuardedBy("this")
private final Map<Integer, List<ProxyCore>> proxies = new HashMap<>();

@GuardedBy("this")
private final Map<Integer, TransceiverInterface> transceivers =
new HashMap<>();

private final Map<Integer, Map<ChipLocation, FastDataIn>> fastDataCache =
new HashMap<>();

private final Map<Integer, Map<ChipLocation, Downloader>> downloaders =
new HashMap<>();

/**
* Called when service is shutting down. Kill <em>everything!</em>
*/
@PreDestroy
private synchronized void closeAll() {
proxies.values().forEach(list -> list.forEach(ProxyCore::close));
proxies.clear(); // Just in case
transceivers.values().forEach(txrx -> {
try {
txrx.close();
} catch (IOException e) {
log.error("Error closing Transceiver", e);
}
});
transceivers.clear(); // Just in case
fastDataCache.values().forEach(map -> map.values().forEach(fdi -> {
try {
fdi.close();
} catch (IOException e) {
// Log and carry on so the remaining objects still get closed
log.error("Error closing FastDataIn", e);
}
}));
fastDataCache.clear(); // Just in case
downloaders.values().forEach(map -> map.values().forEach(dl -> {
try {
dl.close();
} catch (IOException e) {
// Log and carry on so the remaining objects still get closed
log.error("Error closing Downloader", e);
}
}));
}

/**
* Note down that a job has a websocket proxy active.
*
* @param jobId
* The job ID.
* @param proxy
* The websocket proxy.
*/
synchronized void rememberProxyForJob(Integer jobId, ProxyCore proxy) {
proxies.computeIfAbsent(jobId, __ -> new ArrayList<>()).add(proxy);
}

/**
* Stop remembering a job's particular websocket proxy.
*
* @param jobId
* The job ID.
* @param proxy
* The websocket proxy.
*/
synchronized void removeProxyForJob(Integer jobId, ProxyCore proxy) {
var list = proxies.get(jobId);
if (nonNull(list)) {
list.remove(proxy);
}
}

/**
* Get the transceiver for a job.
*
* @param jobId The job ID.
*
* @return The transceiver or null if none.
*/
synchronized TransceiverInterface getTransceiverForJob(int jobId) {
return transceivers.get(jobId);
}

/** Set the transceiver for a job.
*
* @param jobId The job ID.
* @param txrx The transceiver.
*/
synchronized void rememberTransceiverForJob(Integer jobId,
TransceiverInterface txrx) {
transceivers.put(jobId, txrx);
}

/** Get the fast data in for a job.
*
* @param jobId The job ID.
* @param chip The ethernet chip to get the fast data in for.
* @return The fast data in or null if none.
*/
synchronized FastDataIn getFastDataIn(Integer jobId, ChipLocation chip) {
return fastDataCache.getOrDefault(jobId, Map.of()).get(chip);
}

/**
* Remember the fast data in for a job.
*
* @param jobId The job ID.
* @param chip The ethernet chip to remember the fast data in for.
* @param fdi The fast data in.
*/
synchronized void rememberFastDataIn(Integer jobId, ChipLocation chip,
FastDataIn fdi) {
fastDataCache.computeIfAbsent(jobId, __ -> new HashMap<>()).put(
chip, fdi);
}

/**
* Get the downloader for a job.
*
* @param jobId The job ID.
* @param chip The ethernet chip to get the downloader for.
* @return The downloader or null if none.
*/
synchronized Downloader getDownloader(Integer jobId, ChipLocation chip) {
return downloaders.getOrDefault(jobId, Map.of()).get(chip);
}

/**
* Remember the downloader for a job.
*
* @param jobId The job ID.
* @param chip The ethernet chip to remember the downloader for.
* @param downloader The downloader.
*/
synchronized void rememberDownloader(Integer jobId, ChipLocation chip,
Downloader downloader) {
downloaders.computeIfAbsent(jobId, __ -> new HashMap<>()).put(
chip, downloader);
}

/**
* Close all remembered objects for a job. This is called when the
* state of a job changes significantly (i.e., when the set of boards that
* may be communicated with changes).
*
* @param jobId
* The job ID.
*/
void closeJob(Integer jobId) {
synchronized (this) {
var proxyList = proxies.remove(jobId);
if (nonNull(proxyList)) {
proxyList.forEach(ProxyCore::close);
}
var txrx = transceivers.remove(jobId);
if (nonNull(txrx)) {
try {
txrx.close();
} catch (IOException e) {
log.error("Error closing Transceiver", e);
}
}
var fdc = fastDataCache.remove(jobId);
if (nonNull(fdc)) {
fdc.values().forEach(fdi -> {
try {
fdi.close();
} catch (IOException e) {
log.error("Error closing FastDataIn", e);
}
});
}
var dl = downloaders.remove(jobId);
if (nonNull(dl)) {
dl.values().forEach(downloader -> {
try {
downloader.close();
} catch (IOException e) {
log.error("Error closing Downloader", e);
}
});
}
}
}
}
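
The remember-then-close pattern used by `JobObjectRememberer` can be tried in isolation with a reduced sketch. The version below is an assumption-laden simplification: `JobRegistrySketch` and `FakeResource` are hypothetical names, a single map of `Closeable` objects stands in for the four typed maps in the diff, and errors go to `System.err` rather than a logger.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a per-job object such as a transceiver. */
class FakeResource implements Closeable {
    private boolean closed;

    @Override
    public void close() throws IOException {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}

/** Hypothetical, simplified registry; not the class from the diff. */
class JobRegistrySketch {
    /** All access goes through synchronized methods. */
    private final Map<Integer, List<Closeable>> byJob = new HashMap<>();

    /** Note down that a job owns a resource. */
    synchronized void remember(int jobId, Closeable resource) {
        byJob.computeIfAbsent(jobId, k -> new ArrayList<>()).add(resource);
    }

    /** How many resources are currently remembered for a job. */
    synchronized int countFor(int jobId) {
        return byJob.getOrDefault(jobId, List.of()).size();
    }

    /** Forget and close everything remembered for a job. */
    synchronized void closeJob(int jobId) {
        List<Closeable> resources = byJob.remove(jobId);
        if (resources == null) {
            return; // Nothing remembered; closing twice is harmless
        }
        for (Closeable c : resources) {
            try {
                c.close();
            } catch (IOException e) {
                // Report and continue so the other resources still close
                System.err.println("Error closing resource: " + e);
            }
        }
    }
}
```

Removing the entry from the map before closing means a later lookup for the same job finds nothing, which matches the intent stated in the class comment that objects are dropped once a job's state invalidates them.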